MOD-15578 Track shared SVS thread pool memory & expose it through public API#972
MOD-15578 Track shared SVS thread pool memory & expose it through public API#972dor-forer wants to merge 15 commits into
Conversation
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
There was a problem hiding this comment.
Pull request overview
This PR improves VecSim memory accounting by routing the shared SVS thread pool singleton allocations through a tracked allocator and exposing that process-wide (non-index) memory via a new public C API and additional VECSIM_INFO fields.
Changes:
- Added
VecSim_GetGlobalMemory()and appendedGLOBAL_MEMORYto the top-level debug info iterator returned byVecSimIndex_DebugInfoIterator. - Tracked shared SVS thread pool allocations and exposed them via
SHARED_SVS_THREADPOOL_MEMORYin SVS debug info. - Updated unit-test comparators/field-order expectations and added a test to enforce the invariant between
GLOBAL_MEMORY,SHARED_SVS_THREADPOOL_MEMORY, andVecSim_GetGlobalMemory().
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/unit_test_utils.h | Extends debug-info comparator signatures to optionally expect GLOBAL_MEMORY. |
| tests/unit/unit_test_utils.cpp | Updates debug-info comparators and expected field orders to include GLOBAL_MEMORY / SHARED_SVS_THREADPOOL_MEMORY. |
| tests/unit/test_svs.cpp | Adds a typed test validating global-memory vs shared-threadpool-memory invariants. |
| tests/unit/test_svs_tiered.cpp | Updates SVS threadpool wrapper construction to pass a VecSimAllocator. |
| tests/unit/test_svs_threadpool.cpp | Updates threadpool wrapper tests to pass an allocator and resets shared pool state in setup. |
| src/VecSim/vec_sim.h | Adds the new public API declaration VecSim_GetGlobalMemory(). |
| src/VecSim/vec_sim.cpp | Implements VecSim_GetGlobalMemory() and appends GLOBAL_MEMORY in the C API debug iterator wrapper. |
| src/VecSim/vec_sim_common.h | Clarifies that per-index memory excludes process-wide/global allocations. |
| src/VecSim/utils/vec_utils.h | Adds new debug field name constants for GLOBAL_MEMORY / SHARED_SVS_THREADPOOL_MEMORY. |
| src/VecSim/utils/vec_utils.cpp | Defines the new debug field name strings. |
| src/VecSim/index_factories/svs_factory.cpp | Accounts for per-index threadpool wrapper allocation in SVS initial size estimation. |
| src/VecSim/algorithms/svs/svs.h | Constructs the per-index SVS threadpool wrapper with the index allocator and adds SHARED_SVS_THREADPOOL_MEMORY to debug info. |
| src/VecSim/algorithms/svs/svs_utils.h | Introduces a tracked allocator for the shared pool, tracks allocations, and changes wrapper ctor to allocate parallelism via index allocator. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
introduce VecSim_GlobalStatsInfo
31c0e0a to
0adca87
Compare
The doc string claimed the field is exposed only in SVS tiered indexes, but `SVSIndex::debugInfoIterator()` emits it for flat SVS indexes too (and inside the BACKEND_INDEX section of tiered SVS). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #972 +/- ##
==========================================
- Coverage 96.99% 96.97% -0.03%
==========================================
Files 130 130
Lines 7793 7829 +36
==========================================
+ Hits 7559 7592 +33
- Misses 234 237 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
* Drop the unconditional `#include "VecSim/algorithms/svs/svs_utils.h"` at the top of `unit_test_utils.cpp` (added in 3c2023a). The header pulls in `<svs/core/distance.h>`, `<svs/index/vamana/dynamic_index.h>`, etc. — none of which are available when `USE_SVS=OFF`. The HAVE_SVS-guarded include lower in the file already provides the symbols `validateSVSIndexAttributesInfo` needs. * Wrap `compareSVSIndexInfoToIterator` (definition + declaration in the header) and its call site inside `chooseCompareIndexInfoToIterator` with `#if HAVE_SVS`. The function references `VecSimSVSThreadPool`, which is undefined when SVS is disabled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0adca87 to
9d082f0
Compare
- svs_factory.cpp: mark unused 'p' variable with [[maybe_unused]] to silence -Wunused-variable under -Werror (Copilot review). - vec_utils.h: rewrite SHARED_SVS_THREADPOOL_MEMORY_STRING doc comment to match actual emission behavior — emitted by SVSIndex::debugInfoIterator() for both non-tiered (top-level) and tiered (BACKEND_INDEX) SVS responses (Copilot review).
| * (e.g. the shared SVS thread pool singleton). 0 if no such allocations | ||
| * have been made. | ||
| */ | ||
| size_t VecSim_GetGlobalMemory(void); |
There was a problem hiding this comment.
Problematic name, implies that it also includes the per-index memory
| // Capacity hint for the iterator. Must equal the number of addInfoField() | ||
| // calls below (1 for ALGORITHM + 9 from addCommonInfoToIterator + 16 SVS-specific). | ||
| // Update this number when fields are added or removed. | ||
| size_t numberOfInfoFields = 26; |
There was a problem hiding this comment.
Only one pair was added no? How come it increased by 3?
There was a problem hiding this comment.
It's actually +1 for the new field and +2 because the old 23 was already wrong.
Before the change the real count was 25 (1 ALGORITHM + 9 from addCommonInfoToIterator + 15 SVS-specific), but the comment just said "update this when needed" and it looks like nobody had.
So adding the shared-memory field bumped the correct count to 26, not 24.
I added a little breakdown in the comment so it doesn't drift again. Worth noting it's only a reserve() hint, not a hard size, so the stale value wasn't actually causing any bug — just an extra realloc — which is probably why it slipped by.
| VecSimSVSThreadPool::resize(1); | ||
| } | ||
|
|
There was a problem hiding this comment.
Is there a validation that the tracked memory actually increased/decreased as expected (rather than just that both APIs return the same value) ?
There was a problem hiding this comment.
Added a new test sharedMemoryTracksThreadPoolResize that specifically validates directional tracking: it asserts that VecSim_GetSharedMemory() increases when the pool grows and decreases when it shrinks, ensuring the accounting is live and not just a static value that happens to match both APIs
| std::shared_ptr<VecSimAllocator> allocator_; // pool's own allocator for memory tracking | ||
| mutable std::mutex pool_mutex_; | ||
| std::vector<std::shared_ptr<ThreadSlot>> slots_; | ||
| std::vector<SlotPtr, SlotVecAllocator> slots_; |
There was a problem hiding this comment.
Wouldn't using vecsim_stl::vector simplify the whole thing?
There was a problem hiding this comment.
Done — slots_ is now vecsim_stl::vector<SlotPtr> (constructed with the pool's allocator), which removed the explicit SlotVecAllocator alias entirely.
…size - Rename VecSim_GetGlobalMemory -> VecSim_GetSharedMemory and the GLOBAL_MEMORY field -> SHARED_MEMORY; "global" wrongly implied it included per-index memory. Doc now states it excludes per-index memory. - Simplify VecSimSVSThreadPoolImpl::slots_ to vecsim_stl::vector<SlotPtr>, dropping the SlotVecAllocator alias and the explicit allocator spelling in the ctor; add the now-needed vecsim_stl.h include. - Fix the SVS debugInfoIterator field count (was a stale 23; true count was 25, now 26 with the new shared-memory field) and pin a breakdown comment so it can't drift again. - Add SVSTest.sharedMemoryTracksThreadPoolResize: assert shared memory grows when the pool grows and shrinks when it shrinks, not just that the SHARED_MEMORY / SHARED_SVS_THREADPOOL_MEMORY / VecSim_GetSharedMemory readouts agree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…size - Rename VecSim_GetGlobalMemory -> VecSim_GetSharedMemory and the GLOBAL_MEMORY field -> SHARED_MEMORY; "global" wrongly implied it included per-index memory. Doc now states it excludes per-index memory. - Simplify VecSimSVSThreadPoolImpl::slots_ to vecsim_stl::vector<SlotPtr>, dropping the SlotVecAllocator alias and the explicit allocator spelling in the ctor; add the now-needed vecsim_stl.h include. - Fix the SVS debugInfoIterator field count (was a stale 23; true count was 25, now 26 with the new shared-memory field) and pin a breakdown comment so it can't drift again. - Add SVSTest.sharedMemoryTracksThreadPoolResize: assert shared memory grows when the pool grows and shrinks when it shrinks, not just that the SHARED_MEMORY / SHARED_SVS_THREADPOOL_MEMORY / VecSim_GetSharedMemory readouts agree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2f308d1 to
7c8b871
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 7c8b871. Configure here.
…field SVSIndex::debugInfoIterator() reserved 26 fields, but the C API wrapper VecSimIndex_DebugInfoIterator appends one more (SHARED_MEMORY) after the method returns, forcing a reallocation on the top-level path. Bump the capacity hint to 27 and document the +1. Addresses Cursor Bugbot finding. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
compareTieredIndexInfoToIterator expects the top-level SHARED_MEMORY field that VecSimIndex_DebugInfoIterator (C API) appends, but the tiered tests fed it a C++-method iterator (tiered_index->debugInfoIterator()) which omits that field, causing an off-by-one field-count failure (HNSW tiered 16 vs 17, SVS tiered 18 vs 19). Obtain the iterator via the C API like the non-tiered tests and the real RediSearch consumer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lead with 27 = 26 + 1 so the comment no longer reads as a 26-vs-27 mismatch against numberOfInfoFields. No functional change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The shared
VecSimSVSThreadPoolsingleton was previously created via rawnewwith the default allocator, so its slot vector and per-slotThreadSlotobjects bypassedVecSimAllocatorand were invisible to any memory accounting downstream (FT.INFO,INFO MODULES, etc.).This PR:
VecSimAllocator(usingVecsimSTLAllocatorfor the slot vector andstd::allocate_sharedfor thread objects).size_t VecSim_GetSharedMemory(void)returning the total bytes of process-wide VecSim allocations not tied to any single index — today equal to the shared SVS thread pool's tracked allocation size.VECSIM_INFOvia two new fields:SHARED_MEMORY— appended at the top level of every algorithm's debug response by the C API wrapperVecSimIndex_DebugInfoIterator. Always present (value may be 0).SHARED_SVS_THREADPOOL_MEMORY— appended at the end of any SVS algorithm section bySVSIndex::debugInfoIterator(). Present at the top level of a non-tiered SVS response, or insideBACKEND_INDEXof a tiered SVS response.Public API change
Before
// (no API to query process-wide VecSim memory)After
Callers (e.g. RediSearch) can fold this into per-spec or process-wide memory metrics without depending on which algorithm contributes.
VECSIM_INFO(FT.DEBUG) output changeCommon header (every algorithm, unchanged)
FLAT — 11 → 12 fields
<common header × 10> BLOCK_SIZE + SHARED_MEMORYHNSW — 18 → 19 fields
<common header × 10> BLOCK_SIZE, M, EF_CONSTRUCTION, EF_RUNTIME, MAX_LEVEL, ENTRYPOINT, EPSILON, NUMBER_OF_MARKED_DELETED + SHARED_MEMORYSVS (non-tiered) — 25 → 27 fields
Tiered HNSW — 16 → 17 fields
<common header × 10> (ALGORITHM = "TIERED") MANAGEMENT_LAYER_MEMORY, BACKGROUND_INDEXING, TIERED_BUFFER_LIMIT FRONTEND_INDEX = nested [<FLAT fields, 11>] (no SHARED_MEMORY in nested) BACKEND_INDEX = nested [<HNSW fields, 18>] (no SHARED_MEMORY in nested) TIERED_HNSW_SWAP_JOBS_THRESHOLD + SHARED_MEMORYTiered SVS — 18 → 19 fields
Two emission rules
SHARED_MEMORYis appended exactly once, at the outermost iterator level, by the C API wrapper. Never appears inside a nestedFRONTEND_INDEX/BACKEND_INDEX.SHARED_SVS_THREADPOOL_MEMORYis appended at the end of any SVS algorithm section bySVSIndex::debugInfoIterator(). So it shows up at the top level of a non-tiered SVS response, or insideBACKEND_INDEXof a tiered SVS response — never duplicated.Stats / API output change
VecSim_GetSharedMemory()Before: API did not exist. The pool's slot vector and per-slot
ThreadSlotobjects went through the default allocator and were not tracked anywhere.After:
Per-index
getAllocationSize()(SVS only)Before: Did not include any per-index portion of the parallelism slot, since the pool was untracked entirely.
After: Each SVS index's per-index allocator now tracks its own
parallelismslot (a small fixed-size structure inside the index, allocated through the index'sVecSimAllocator). The shared pool itself remains process-wide and reported viaVecSim_GetSharedMemory().Cross-field invariant
Since the SVS thread pool is currently the only contributor to shared memory:
This invariant is enforced by the new gtest
SVSTest.debugInfoSharedMemoryEqualsSharedSVSThreadPoolMemory. If a future contributor is added toVecSim_GetSharedMemory()without updating breakdowns, the test will catch the drift.Tests
SVSTest.debugInfoSharedMemoryEqualsSharedSVSThreadPoolMemory(typed test, runs per SVS data type) — asserts both new fields exist exactly once in a non-tiered SVS response and report the same bytes asVecSim_GetSharedMemory().SVSTest.sharedMemoryTracksThreadPoolResize(typed test) — asserts the reported shared memory actually grows when the pool is resized up and shrinks when resized down, so the value tracks real allocation rather than just agreeing across the three readouts.compareFlatIndexInfoToIterator,compareHNSWIndexInfoToIterator,compareSVSIndexInfoToIteratorto take anexpect_shared_memoryparameter (defaulttrue) — needed because these comparators are called both with the C API iterator (top level, has SHARED_MEMORY) and as nested-backend comparators insidecompareTieredIndexInfoToIterator(no SHARED_MEMORY).SVS25 → 26.TIERED_SVSconstant is unchanged — the newSHARED_SVS_THREADPOOL_MEMORYfield lives inside the nested SVSBACKEND_INDEXiterator (already counted as one entry at the tiered level), andSHARED_MEMORYis added via the comparator's+1when called with the C API iterator.FLAT/HNSW/TIERED_HNSWfield-count constants represent the C++ method count (no SHARED_MEMORY); the comparators add+1when called with the C API iterator.getFlatFields(),getHNSWFields(),getTieredHNSWFields(),getSVSFields(),getTieredSVSFields()updated to appendSHARED_MEMORY(andSHARED_SVS_THREADPOOL_MEMORYfor SVS) at the new field positions.Compatibility
VECSIM_INFO. Existing consumers parsing by field name are unaffected; consumers indexing by position must shift their expectations accordingly (covered above).VecSim_GetSharedMemory()is purely additive.Mark if applicable
Supersedes #967 — ownership transferred to @dor-forer. Original branch authored by @meiravgri; identical commits cherry-pushed to
dor-forer-MOD-15578-track-svs-thpool-memoryso a non-self reviewer can be assigned.Note
Low Risk
Additive API and debug fields with allocator routing for the shared pool; no changes to search/index correctness paths beyond memory accounting.
Overview
This PR makes shared SVS thread-pool memory visible to callers that already use VecSim memory accounting and debug output.
The shared
VecSimSVSThreadPoolsingleton now allocates its slot vector andThreadSlotobjects through a dedicatedVecSimAllocator(and routes per-indexparallelism_through each index’s allocator). That fixes previously untracked bytes that could be missing fromFT.INFO/INFO MODULESstyle totals.A new C API
VecSim_GetSharedMemory()reports process-wide VecSim bytes not attributed to any single index (today: the shared pool).VecSimIndex_DebugInfoIteratorappendsSHARED_MEMORYon every algorithm’s top-level debug response; SVS also emitsSHARED_SVS_THREADPOOL_MEMORYin its own section (top-level SVS or nested tieredBACKEND_INDEX).VecSimIndexStatsInfo.memoryis documented as per-index only so aggregators don’t double-count the pool.Initial-size estimation for SVS indexes includes the per-index thread-pool wrapper allocation; tests and debug field-order helpers were updated for the new fields and invariants (including resize tracking).
Reviewed by Cursor Bugbot for commit b17a7b8. Bugbot is set up for automated code reviews on this repo. Configure here.