Skip to content

Conversation

@yj-40-56
Copy link

Student project for DIA at TU Berlin: improving copy()

@yj-40-56
Copy link
Author

Replaced Locking on ValueType.BOOLEAN with locking based on ArrayType:

  • BitSetArray requires locking due to underlying storage in longs. Small Array are safe if they are shorter than 64 rows (using underlying Byte[]) a. Once above that threshold, the array is turned into a BitSetArray

  • OptionalArray has an underlying null-mask, indicating the null entries. That null-mask is based on a BitSetArray and is not thread-safe.

Additional ArrayTypes that are locked:

  • RaggedArray: internal resizing should cause race conditions. The set()/append() methods all contained a reset() into set() segment which has no guarantee of being atomic.
  • DDCArray/ACompressedArray: Similar to RaggedArrays, modifications are not atomic.

I have verified the fix for BitSetSArray and OptionalArray.

I did not include tests for RaggedArray and ACompressedArray. After looking at the implementations, I am convinced they are not thread-safe, but I was not able to create unit tests to show that specific behaviour.

@janniklinde
Copy link
Contributor

Thank you for your analysis and contribution @yj-40-56. It's good to know that you identified OptionalArray, BitSetArray, RaggedArray, and CompressedArray as array types that may require locking. I think it would be safer to allow a whitelist of array types to bypass locking (also for future proofing). I'll leave the PR open to fix some formatting issues and revisit some aspects that could be done more efficiently. Good luck in your exam!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

2 participants