feat(vector): pre-filtered ANN via similar_to(..., filter: <var>) #9737
Open
shaunpatterson wants to merge 2 commits into
Push a uid-set scope INTO the HNSW traversal instead of post-filtering, so a scoped vector search returns k in-scope neighbors instead of fetching a fixed k and discarding most of them. This is the standard multi-tenant RAG pattern (restrict retrieval to a tenant / graph_name / permission set).

Surface:

    allowed as var(func: type(Chunk)) @filter(eq(graph_name, "kb"))
    q(func: similar_to(emb, 10, $vec, filter: allowed)) { uid expand(_all_) }

Mechanism:
- parser: `filter: <var>` registers the var as a UidVar dependency and records Function.VectorFilterVar; the var-dependency scheduler resolves it first. A plain `filter` marker arg is emitted so the worker can tell an empty scope (reject all) from no filter (accept all); the var name is never an arg.
- query: fillVars routes the resolved var's uids to sg.vectorFilterUids (not into DestUIDs); createTaskQuery ships them in the task UidList.
- worker: builds a membership SearchFilter from the allow-set and applies it during the HNSW walk. The search layer already grows its candidate budget to still return k unfiltered (in-scope) results and traverses through filtered nodes to reach in-scope ones. Membership uses a sorted slice + binary search (compact, no per-query map) so whole-tenant scopes don't thrash GC.

Existing similar_to behavior is unchanged when no filter option is given.

Reviewed by GPT-5 and Gemini: fixed the empty-scope-returns-global-results bug (now rejects all) and replaced the per-query uid map with a sorted-slice binary search. Bitmap-backed-list and replaceVarInFunc concerns verified non-applicable.

Tests: membership filter unit tests, parser tests (filter option, undefined-var dependency error), and integration tests (scope restriction with out-of-scope nearest vectors, empty scope, all-admitting scope == unfiltered).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt sort
Follow-up review fixes to pre-filtered ANN (similar_to(..., filter: <var>)).
- worker: route a filtered search through SearchWithOptions even without an
ef override. SearchWithOptions uses a max(k, efSearch) bottom-layer candidate
budget; the legacy Search path uses just k. Pre-filtering needs the wider
budget to traverse past out-of-scope nodes and still return k in-scope
neighbors, so a filter-only query (no ef:) previously under-returned.
- worker: uidMembershipFilter now skips the O(n log n) sort when the input is
already sorted ascending (the common case — uid variables reach the worker
pre-sorted via algo.MergeSorted). The defensive copy is retained, and an
unsorted input is still sorted for binary-search correctness. Saves work on
whole-tenant scopes (millions of uids per query).
- tests: cover the membership filter's safety claims that the existing tests
never exercised (unsorted input, no input mutation, race-free concurrent
reads of a shared input); add parser tests for GraphQuery-level NeedsVar
propagation, duplicate filter:, and non-name filter values; add an
integration test for the filter-only-without-ef recall path.
- test fixture: the prefilter integration setup inserted triples at UIDs
>16000, exceeding the zero's default 10000 lease ("Uid cannot be greater
than lease"), so the integration tests could not run. Use small UIDs
matching the sibling vector tests' convention.
Verified: dql/query/worker unit tests, and query/vector + worker + query
integration suites (real cluster) all pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pre-filtered ANN: `similar_to(..., filter: <var>)`
Adds a `filter:` option to the `similar_to` vector-search function that scopes the ANN search to the uid set of a DQL variable. The membership predicate is applied during the HNSW graph traversal (not as a post-filter), so a scoped search explores enough of the graph to return `k` in-scope neighbors instead of post-filtering a fixed `k` and under-returning.
DQL
The filter var is registered as a `NeedsVar` uid dependency, so the query scheduler resolves it before the search runs; its resolved uids travel to the worker via the task `UidList` and become the search allow-set.
Design notes
- `uidMembershipFilter` builds a `SearchFilter` from a sorted copy of the allow-set + binary search (O(log n)); it copies before sorting so it never mutates the shared var list, and skips the sort when the input is already ascending (the common case: uid vars reach the worker sorted). A compact 8-bytes/uid slice avoids the GC pressure of a multi-million-entry map for whole-tenant scopes.
- A filtered search is routed through `SearchWithOptions` even without an `ef:` override, so it uses the `max(k, efSearch)` bottom-layer budget rather than the narrow `k`-only budget, which is needed to traverse past out-of-scope nodes and still return `k` in-scope results.
- A plain `filter` marker arg distinguishes "filter requested, scope empty" from "no filter".
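For readers who want the membership design in concrete form, here is a minimal sketch of the sorted-copy + binary-search idea from the first note. It is not the PR's code: `membershipFilter` and its `func(uint64) bool` return type are hypothetical stand-ins for `uidMembershipFilter` and `SearchFilter`, whose real signatures are not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// membershipFilter is a hypothetical stand-in for the PR's uidMembershipFilter.
// It returns a predicate that reports whether a uid is in the allow-set.
func membershipFilter(allowed []uint64) func(uint64) bool {
	// Defensive copy so the caller's (possibly shared) slice is never mutated.
	sorted := make([]uint64, len(allowed))
	copy(sorted, allowed)

	// Skip the O(n log n) sort when the input is already ascending.
	less := func(i, j int) bool { return sorted[i] < sorted[j] }
	if !sort.SliceIsSorted(sorted, less) {
		sort.Slice(sorted, less)
	}

	// Membership is an O(log n) binary search over the sorted copy.
	return func(uid uint64) bool {
		i := sort.Search(len(sorted), func(i int) bool { return sorted[i] >= uid })
		return i < len(sorted) && sorted[i] == uid
	}
}

func main() {
	scope := []uint64{30, 10, 20} // deliberately unsorted
	accept := membershipFilter(scope)
	fmt.Println(accept(20), accept(25)) // true false
	fmt.Println(scope)                  // input order is untouched: [30 10 20]
}
```

An empty or nil allow-set yields a predicate that rejects every uid, which lines up with the "filter requested, scope empty" case; deciding whether a filter was requested at all has to happen outside this function.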
Tests
- Parser (`dql`): filter option parsing, `ef:` combination, undefined-var error, duplicate-filter error, non-name value error, `NeedsVar` propagation.
- Integration (`query/vector`): scopes to filter var (returns `k` in-scope despite closer out-of-scope vectors), works without an `ef:` override, empty scope returns nothing, and a superset scope equals the unfiltered top-k.
All parser unit tests and the 4 integration tests pass.
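The behaviors those integration tests check can be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not the PR's test code: it replaces HNSW with a brute-force scan over one-dimensional "embeddings", uses a map for the scope purely for brevity, and every name in it (`point`, `scopedTopK`, `postFilterTopK`) is hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// point is a toy vector: a uid plus a one-dimensional embedding.
type point struct {
	uid uint64
	val float64
}

// nearest returns up to k points ordered by distance to q (brute force).
func nearest(points []point, q float64, k int) []point {
	out := append([]point(nil), points...)
	sort.SliceStable(out, func(i, j int) bool {
		return math.Abs(out[i].val-q) < math.Abs(out[j].val-q)
	})
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func uids(ps []point) []uint64 {
	out := make([]uint64, 0, len(ps))
	for _, p := range ps {
		out = append(out, p.uid)
	}
	return out
}

// scopedTopK applies the scope before ranking (pre-filter).
// filtered=false means no filter was requested, so everything is accepted;
// filtered=true with an empty scope rejects everything.
func scopedTopK(points []point, q float64, k int, filtered bool, allowed map[uint64]bool) []uint64 {
	if !filtered {
		return uids(nearest(points, q, k))
	}
	inScope := make([]point, 0, len(points))
	for _, p := range points {
		if allowed[p.uid] {
			inScope = append(inScope, p)
		}
	}
	return uids(nearest(inScope, q, k))
}

// postFilterTopK ranks first and filters afterwards, which can under-return.
func postFilterTopK(points []point, q float64, k int, allowed map[uint64]bool) []uint64 {
	kept := make([]point, 0, k)
	for _, p := range nearest(points, q, k) {
		if allowed[p.uid] {
			kept = append(kept, p)
		}
	}
	return uids(kept)
}

func main() {
	pts := []point{{1, 0.0}, {2, 0.1}, {3, 0.2}, {4, 5.0}, {5, 6.0}}
	scope := map[uint64]bool{4: true, 5: true}

	fmt.Println(scopedTopK(pts, 0, 2, true, scope))               // [4 5]
	fmt.Println(postFilterTopK(pts, 0, 2, scope))                 // []
	fmt.Println(scopedTopK(pts, 0, 2, true, map[uint64]bool{}))   // []
	fmt.Println(scopedTopK(pts, 0, 2, false, nil))                // [1 2]
}
```

The explicit `filtered` flag plays the role of the `filter` marker arg: without it, an empty scope and an absent filter would be indistinguishable.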
🤖 Generated with Claude Code