Overview
There are multiple open PRs that collectively aim to significantly improve VectorType (custom type) deserialization performance for vector search workloads. This issue tracks them, describes their contributions, and proposes a merge priority/order.
Vector search workloads (e.g., 768/1536-dimension float vectors for embeddings) are a key use case where deserialization overhead is substantial. These PRs attack the problem from multiple angles: pure Python fast-paths, Cython C-level operations, numpy integration, type-resolution caching, and general read-path optimization.
Directly Vector-Related PRs
1. #689 — (Improvement) Improve performance of Vector type parsing
What: Original umbrella PR containing multiple commits across Python, Cython, and numpy layers for vector deserialization optimization.
2. #730 — Optimize VectorType deserialization with struct.unpack and numpy
What: Pure Python path optimization. Replaces element-by-element deserialization with bulk struct.unpack for known numeric types, and adds a numpy.frombuffer().tolist() fast-path for vectors >= 32 elements.
Impact: 3.5x speedup for small vectors (Vector<float, 3>), 4.0x for large vectors (Vector<float, 1536>).
Scope: cassandra/cqltypes.py only. No Cython dependency.
Priority: HIGH — This is the foundational optimization that benefits all users (pure Python path is always available).
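The difference between element-by-element decoding and the bulk paths described for #730 can be sketched as follows. This is a minimal illustration, not the driver's code: it assumes float vectors arrive as consecutive big-endian 4-byte IEEE 754 values, and the function names and the way the 32-element threshold is applied are hypothetical. numpy is treated as optional, so the sketch falls back to struct.unpack when it is not installed.

```python
import struct

try:
    import numpy as np  # optional; only the large-vector path uses it
except ImportError:
    np = None

NUMPY_THRESHOLD = 32  # element count taken from the PR description


def decode_elementwise(buf, dim):
    """Baseline: one struct.unpack_from call per element."""
    return [struct.unpack_from(">f", buf, 4 * i)[0] for i in range(dim)]


def decode_bulk(buf, dim):
    """One struct.unpack call for the whole vector."""
    return list(struct.unpack(">%df" % dim, buf))


def decode_numpy(buf, dim):
    """View the bytes as big-endian float32, then convert to a Python list."""
    return np.frombuffer(buf, dtype=">f4", count=dim).tolist()


def decode_float_vector(buf, dim):
    """Hypothetical dispatcher: numpy for long vectors, struct otherwise."""
    if np is not None and dim >= NUMPY_THRESHOLD:
        return decode_numpy(buf, dim)
    return decode_bulk(buf, dim)


payload = struct.pack(">3f", 1.0, 2.5, -0.25)
print(decode_float_vector(payload, 3))  # [1.0, 2.5, -0.25]
```

All three decoders return the same values; the bulk variants only reduce the number of Python-level calls per vector.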
3. #731 — Add VectorType support to numpy_parser for 2D array parsing
What: Extends the Cython NumPy row parser to produce 2D masked arrays (num_rows, vector_dimension) for vector columns, using memcpy of raw wire bytes directly into pre-allocated numpy buffers.
Impact: Zero-copy path for ML/AI workloads consuming results as numpy arrays. Fastest possible path when the consumer is numpy.
Scope: cassandra/numpy_parser.pyx + new tests.
Priority: MEDIUM — Benefits users who opt into the numpy result path. Independent of other PRs.
4. #732 — Optimize Cython deserialization primitives and add VectorType Cython deserializer
What: Six commits: (1) ntohs/ntohl intrinsics for Cython byte unpacking, (2) float-specific ntohl byte-swap, (3) from_ptr_and_size() refactor, (4) new DesVectorType Cython class with type-specialized C-level deserialization, (5) Windows portability, (6) buffer bounds validation.
Impact: 4.4–4.7x faster than pure Python for small vectors. For large vectors, both paths use numpy so the per-vector gain is marginal, but per-row dispatch overhead is eliminated. Also ~4-5% general row throughput improvement from byte-swap intrinsics.
Scope: cassandra/deserializers.pyx, cassandra/cython_marshal.pyx, cassandra/ioutils.pyx, cassandra/buffer.pxd.
5. #733 — Add VectorType deserialization benchmarks and expand test coverage
Supporting/Infrastructure PRs (Benefit Vector Workloads Indirectly)
These PRs optimize the general deserialization pipeline that vector data flows through. They benefit all types but have particular impact on vector workloads due to the high volume of data.
6. #690 — Optimize custom type parsing with LRU caching
What: Caches lookup_casstype() results to avoid repeated string manipulation and regex scanning. VectorType is a custom/parameterized type, so this directly reduces per-query type resolution overhead.
Priority: MEDIUM — Prerequisite-like optimization. Benefits vector type resolution specifically.
7. #729 — Fast-path lookup_casstype() for simple type names
What: Skips the regex scanner and stack-based parser for non-parameterized types (direct dict lookup). While VectorType itself is parameterized, its subtypes (FloatType, etc.) are simple and benefit from this.
Priority: LOW-MEDIUM — Incremental improvement to type resolution.
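A fast path of this kind might look like the sketch below. It assumes that a simple type name never contains an opening parenthesis; the registry, the function names, and the toy parser for parameterized names are all hypothetical stand-ins, and the toy parser does not handle nested parameterized types.

```python
# Hypothetical registry of simple (non-parameterized) type names.
SIMPLE_TYPES = {"FloatType": "float", "Int32Type": "int", "UTF8Type": "text"}

slow_path_calls = 0  # counts how often the full parser runs


def parse_parameterized(type_name):
    """Stand-in for the full parser; handles only Name(arg, arg, ...)."""
    global slow_path_calls
    slow_path_calls += 1
    head, _, rest = type_name.partition("(")
    args = [a.strip() for a in rest.rstrip(")").split(",")] if rest else []
    return (head, tuple(lookup(a) if a in SIMPLE_TYPES else a for a in args))


def lookup(type_name):
    """Try a direct dict lookup first; fall back to the full parser."""
    if "(" not in type_name:
        known = SIMPLE_TYPES.get(type_name)
        if known is not None:
            return known
    return parse_parameterized(type_name)


print(lookup("FloatType"))  # float
print(lookup("VectorType(FloatType, 3)"))  # ('VectorType', ('float', '3'))
print(slow_path_calls)  # 1
```

The vector type itself still goes through the slow path, but its FloatType subtype is resolved by the dict lookup alone.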
8. #734 — Remove copies on the read path
What: Reduces memory copies in the read path. Up to 5.3x speedup for large payloads. Vector results with 768/1536-dim float columns are large payloads.
Impact: 1.2x–5.3x depending on payload size.
Priority: HIGH — Significant general improvement that directly benefits vector query result processing.
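As a general illustration of why copies matter on a read path (not a description of what #734 changes), the sketch below splits a response body into cells in two ways: slicing bytes, which copies each cell, and slicing a memoryview, which does not. The function names and the fixed cell sizes are hypothetical.

```python
import struct


def split_copying(body, sizes):
    """Slice a bytes object: every cell is a fresh copy."""
    out, pos = [], 0
    for n in sizes:
        out.append(body[pos:pos + n])
        pos += n
    return out


def split_views(body, sizes):
    """Slice a memoryview: every cell references the original buffer."""
    view = memoryview(body)
    out, pos = [], 0
    for n in sizes:
        out.append(view[pos:pos + n])
        pos += n
    return out


body = struct.pack(">3f", 1.0, 2.0, 3.0) + struct.pack(">2f", 4.0, 5.0)
cells = split_views(body, [12, 8])

# struct.unpack accepts any object that supports the buffer protocol.
print(struct.unpack(">3f", cells[0]))  # (1.0, 2.0, 3.0)
print(cells[1].obj is body)  # True
```

For a 1536-dimension float vector each cell is about 6 KB, so avoiding one copy per cell per row adds up quickly on large result sets.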
9. #741 — Cache deserializer instances in find_deserializer (Cython only)
What: Caches find_deserializer() and make_deserializers() results, avoiding repeated class lookups and Deserializer object creation. The DesVectorType from #732 would be cached here.
Impact: [...] for find_deserializer, 29x for make_deserializers (10 types).
10. #742 — Cache ParseDesc for prepared statements (Cython only)
[...] make_deserializers().
11. #743 — Direct PyUnicode_DecodeUTF8/ASCII from C buffer (Cython only)
Proposed Merge Order
The ordering considers dependencies, risk, and impact:
Notes