feat: expand unifiedSearch to accept object input with optional vector (#855)#1184

Open
pyramation wants to merge 1 commit into main from feat/unified-search-vector

@pyramation
Contributor

Summary

Resolves the unifiedSearch input modality mismatch (#855). Previously, unifiedSearch accepted a plain String that could only be dispatched to text-compatible adapters (tsvector, BM25, trgm). pgvector couldn't participate because it needs a vector array, not text.

Before:

allDocuments(where: {
  unifiedSearch: "machine learning"          # text only
  vectorEmbedding: { vector: [0.12, ...] }   # separate filter, AND'd
})

After:

# Text-only (new object syntax)
allDocuments(where: { unifiedSearch: { text: "machine learning" } })

# Vector-only
allDocuments(where: { unifiedSearch: { vector: [0.12, ...], metric: COSINE } })

# True hybrid (text OR vector, OR-combined)
allDocuments(where: {
  unifiedSearch: {
    text: "machine learning"
    vector: [0.12, ...]
    metric: COSINE
    distance: 1.5
  }
})

Changes:

  • New UnifiedSearchInput GraphQL input type registered in the init hook with fields: text, vector, metric, distance, includeChunks
  • unifiedSearch filter field type changed from String to UnifiedSearchInput
  • Text path dispatches to all text-compatible adapters (tsvector, BM25, trgm) — same as before
  • Vector path dispatches to pgvector adapter — new
  • WHERE clauses from all active paths are OR-combined (true hybrid retrieval)
  • searchScore blends all active signals (text + vector) into a single 0..1 relevance number
  • Vector fields (vector, metric, distance, includeChunks) only appear when a pgvector adapter is registered
  • Hook point for future Graphile LLM plugin: when vector is omitted and an LLM plugin is loaded, it can intercept the text input and auto-generate the vector before the search plugin processes it
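The OR semantics of the hybrid path can be illustrated with a small in-memory sketch. The `Row` shape, the `unifiedFilter` helper, and the substring match standing in for the text adapters are assumptions made for illustration only; the plugin itself builds SQL WHERE clauses rather than filtering in memory.

```typescript
// Hypothetical row shape; not taken from the plugin's schema.
interface Row {
  id: number;
  body: string;
  embedding: number[];
}

interface UnifiedSearchInput {
  text?: string;
  vector?: number[];
  distance?: number; // maximum cosine distance for the vector path
}

// Cosine distance as pgvector defines it: 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A row is kept when ANY active path matches (OR), not when all match (AND).
function unifiedFilter(rows: Row[], input: UnifiedSearchInput): Row[] {
  const predicates: Array<(row: Row) => boolean> = [];
  if (input.text !== undefined) {
    const needle = input.text.toLowerCase();
    // Stand-in for the text adapters (tsvector, BM25, trgm).
    predicates.push((row) => row.body.toLowerCase().includes(needle));
  }
  if (input.vector !== undefined) {
    const query = input.vector;
    const maxDistance = input.distance ?? Infinity;
    predicates.push((row) => cosineDistance(row.embedding, query) <= maxDistance);
  }
  // Assumption: with no active path, nothing is filtered out.
  if (predicates.length === 0) return rows;
  return rows.filter((row) => predicates.some((p) => p(row)));
}

const rows: Row[] = [
  { id: 1, body: "Intro to machine learning", embedding: [0, 1] },
  { id: 2, body: "Cooking pasta", embedding: [1, 0] },
  { id: 3, body: "Gardening tips", embedding: [0, -1] },
];

// Row 1 matches only the text path, row 2 only the vector path.
const hybrid = unifiedFilter(rows, {
  text: "machine learning",
  vector: [1, 0],
  distance: 0.5,
});
```

Under the previous AND-combined behaviour, the same input would return no rows here, since no single row satisfies both paths.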

Breaking change: unifiedSearch no longer accepts a plain string. Callers must use unifiedSearch: { text: "..." } instead of unifiedSearch: "...".
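For callers that build query variables in code, a small normalizer may ease the migration. This is a sketch: the field names follow the PR description, while `toUnifiedSearchInput` is a hypothetical helper name and not part of the plugin.

```typescript
// Metric values as listed in this PR; passed as strings in GraphQL variables.
type VectorMetric = "COSINE" | "L2" | "IP";

interface UnifiedSearchInput {
  text?: string;
  vector?: number[];
  metric?: VectorMetric;
  distance?: number;
  includeChunks?: boolean;
}

// Accepts either the old plain-string form or the new object form and
// always returns the object form required after this change.
function toUnifiedSearchInput(value: string | UnifiedSearchInput): UnifiedSearchInput {
  return typeof value === "string" ? { text: value } : value;
}

const migrated = toUnifiedSearchInput("machine learning");
const untouched = toUnifiedSearchInput({ vector: [0.12, 0.5], metric: "COSINE" });
```

Here `migrated` becomes `{ text: "machine learning" }`, and an input already in object form is passed through unchanged.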

Review & Testing Checklist for Human

Medium risk — API shape change for existing unifiedSearch consumers.

  • Verify any existing code using unifiedSearch: "text" is updated to unifiedSearch: { text: "text" }
  • Test hybrid query with both text and vector — verify rows matching EITHER path are returned (OR semantics)
  • Test vector-only query via unifiedSearch: { vector: [...], distance: 1.0 } — verify vector distance scoring works
  • Test that searchScore correctly blends text + vector signals when both are active
  • Verify includeChunks option works for vector search through unifiedSearch input

Notes

  • The UnifiedSearchInput type conditionally includes vector fields — only when a pgvector adapter is registered. Without pgvector, only text is available.
  • The existing per-adapter filter fields (tsvTsv, bm25Body, vectorEmbedding, etc.) continue to work independently and can be combined with unifiedSearch.
  • This is a plugin-only change — no database-side changes needed.
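The conditional shape of UnifiedSearchInput can be sketched as a pure function of whether a pgvector adapter is registered. The helper name `unifiedSearchFieldNames` and the boolean flag are hypothetical; the plugin's real registration code in the init hook is not shown in this PR description.

```typescript
// Field names are taken from the PR description.
const TEXT_FIELDS = ["text"] as const;
const VECTOR_FIELDS = ["vector", "metric", "distance", "includeChunks"] as const;

// Hypothetical helper: returns the field names UnifiedSearchInput would expose.
// Vector fields are only included when a pgvector adapter is registered.
function unifiedSearchFieldNames(hasVectorAdapter: boolean): string[] {
  return hasVectorAdapter ? [...TEXT_FIELDS, ...VECTOR_FIELDS] : [...TEXT_FIELDS];
}

const withoutPgvector = unifiedSearchFieldNames(false);
const withPgvector = unifiedSearchFieldNames(true);
```

A consequence worth checking in review: a query that sends `vector` against a schema built without pgvector should fail GraphQL validation instead of being silently ignored.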

Link to Devin session: https://app.devin.ai/sessions/2b5a29d83d3f478e8d3d972653b4879c
Requested by: @pyramation

feat: expand unifiedSearch to accept object input with optional vector (#855)

Changes unifiedSearch from String to UnifiedSearchInput:
- text: keyword query dispatched to tsvector + BM25 + trgm (text adapters)
- vector: query vector dispatched to pgvector (semantic search)
- metric: COSINE/L2/IP (default COSINE)
- distance: max distance threshold for vector results
- includeChunks: enable/disable chunk-aware vector search
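As a sketch of how metric and distance could translate into SQL, the mapping below uses pgvector's distance operators (`<=>` cosine distance, `<->` L2 distance, `<#>` negative inner product). The `vectorDistancePredicate` helper, the `SqlFragment` shape, and the placeholder numbering are assumptions for illustration and do not reflect the plugin's actual SQL generation.

```typescript
type VectorMetric = "COSINE" | "L2" | "IP";

// pgvector distance operators. Note that <#> returns the NEGATIVE inner product.
const PGVECTOR_OPERATORS: Record<VectorMetric, string> = {
  COSINE: "<=>",
  L2: "<->",
  IP: "<#>",
};

interface SqlFragment {
  text: string;
  values: unknown[];
}

// Builds a parameterized "distance within threshold" predicate.
// The column name must come from trusted code, never from user input,
// because it is interpolated directly into the SQL text.
function vectorDistancePredicate(
  column: string,
  vector: number[],
  maxDistance: number,
  metric: VectorMetric = "COSINE", // default metric per this PR
): SqlFragment {
  const op = PGVECTOR_OPERATORS[metric];
  return {
    text: `(${column} ${op} $1::vector) <= $2`,
    // pgvector accepts the text form '[x,y,...]', which JSON.stringify produces.
    values: [JSON.stringify(vector), maxDistance],
  };
}

const predicate = vectorDistancePredicate("embedding", [0.12, 0.5], 1.5);
```

With the default metric, `predicate.text` is `(embedding <=> $1::vector) <= $2`. Because `<#>` yields a negated inner product, a threshold for IP is interpreted differently from COSINE or L2.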

When both text and vector are provided, WHERE clauses are OR-combined
for true hybrid retrieval. searchScore blends all active signals
(text + vector) into a single 0..1 relevance number.
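The blending formula is not spelled out in this description. One plausible reading, shown here purely as an assumption, is to normalize each active signal into 0..1 and average them; `blendSearchScore` and `cosineDistanceToScore` are hypothetical names.

```typescript
// Clamp helper keeps every signal inside the 0..1 range.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Cosine distance lies in [0, 2]; map 0 to 1 (identical) and 2 to 0 (opposite).
function cosineDistanceToScore(distance: number): number {
  return clamp01(1 - distance / 2);
}

// Averages whichever signals are active. Returns null when none are,
// in line with searchScore being null when no search filter is active.
function blendSearchScore(signals: Array<number | null | undefined>): number | null {
  const active = signals.filter((s): s is number => typeof s === "number");
  if (active.length === 0) return null;
  const sum = active.reduce((acc, s) => acc + clamp01(s), 0);
  return sum / active.length;
}

const textOnlyScore = blendSearchScore([0.8, null]);
const hybridScore = blendSearchScore([0.4, cosineDistanceToScore(0.4)]);
const noSignal = blendSearchScore([null, undefined]);
```

An unweighted mean treats text and vector relevance as equally important; a weighted variant would be a natural extension if one signal proves more reliable.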

Hook point for future Graphile LLM plugin: when vector is omitted and
an LLM plugin is loaded, it can intercept the text input and auto-generate
the vector before the search plugin processes it.
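The hook point could look roughly like the sketch below: a step that runs before the search plugin and fills in `vector` only when it is missing. The `Embedder` type, `withAutoVector`, and the toy embedder are all hypothetical; no LLM plugin API exists in this PR, and a real embedder would most likely be asynchronous.

```typescript
interface UnifiedSearchInput {
  text?: string;
  vector?: number[];
}

// Hypothetical embedder signature, kept synchronous for brevity.
type Embedder = (text: string) => number[];

// Fills in the vector from the text when the caller omitted it and an
// embedder is available. An explicit vector is never overwritten.
function withAutoVector(input: UnifiedSearchInput, embed?: Embedder): UnifiedSearchInput {
  if (input.vector !== undefined || input.text === undefined || embed === undefined) {
    return input;
  }
  return { ...input, vector: embed(input.text) };
}

// Toy embedder for demonstration only: length features, not a real model.
const toyEmbed: Embedder = (text) => [text.length, text.split(" ").length];

const enriched = withAutoVector({ text: "machine learning" }, toyEmbed);
const explicit = withAutoVector({ text: "machine learning", vector: [0.12, 0.5] }, toyEmbed);
const noPlugin = withAutoVector({ text: "machine learning" });
```

With this shape, a text-only query transparently becomes a hybrid query once an embedder is loaded, while callers that supply their own vector keep full control.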

@blacksmith-sh
Contributor

blacksmith-sh Bot commented May 17, 2026

Found 59 test failures on Blacksmith runners:

Failures

Test
BM25 search (pg_textsearch)/BM25 filter field exists on LocationFilter
BM25 search (pg_textsearch)/BM25 orderBy sorts by relevance
BM25 search (pg_textsearch)/BM25 score field is populated when filter is active
BM25 search (pg_textsearch)/BM25 score is null when no BM25 filter is active
BM25 search (pg_textsearch)/BM25 search filters by body text
CLI E2E — search commands against real DB/should combine search filter with --limit for paginated results
CLI E2E — search commands against real DB/should expose search fields on Article type via introspection
CLI E2E — search commands against real DB/should filter articles by trgm similarity via dot-notation where
CLI E2E — search commands against real DB/should filter articles by tsvector search via --where.tsvTsv
CLI E2E — search commands against real DB/should filter via unifiedSearch composite filter
Kitchen sink (multi-plugin queries)/combines BM25 + scalar filter
Kitchen sink (multi-plugin queries)/combines OR with tsvector and scalar filters
Kitchen sink (multi-plugin queries)/combines tsvector search + scalar filter + relation filter
Kitchen sink (multi-plugin queries)/mega query: BM25 + tsvector + pgvector + PostGIS + pg_trgm + relation filter + scalar in ONE query, with multi-signal orderBy
Kitchen sink (multi-plugin queries)/pagination works with filters
Kitchen sink (multi-plugin queries)/vector search + filter on results
Many-to-many type name collision resilience/opted-in junction tables produce m2m types in the schema
Many-to-many type name collision resilience/produces distinct m2m edge types for each junction path
Many-to-many type name collision resilience/schema builds without crashing when two junction tables target the same pair
pgvector/exposes embedding as array of floats
pgvector/vector search function returns results ordered by similarity
pgvector/vector search respects result_limit
PostGIS spatial filters/geom column is exposed as GeoJSON
Relation filters/backward relation exists: locations that have tags
Relation filters/backward relation none: categories with NO inactive locations
Relation filters/backward relation some: categories with at least one active location
Relation filters/combines relation filter with scalar filter
Relation filters/forward relation: filter locations by category name
Scalar and logical filters/filters by boolean field
Scalar and logical filters/filters by numeric comparison
Scalar and logical filters/filters by string equalTo
Scalar and logical filters/isNull filters for NULL rating
Scalar and logical filters/no filter returns all rows
Scalar and logical filters/NOT negates a condition
Scalar and logical filters/OR combines conditions
Schema introspection/Location type has vector embedding field
Schema introspection/LocationFilter has relation filter fields
Schema introspection/LocationFilter has tsvector search field
Schema introspection/LocationFilter type exists and has scalar filter fields
Schema introspection/locations connection exists and has where argument but no condition
tsvector search (PgSearchPlugin)/tsvector search combined with scalar filter
tsvector search (PgSearchPlugin)/tsvTsv matches filters by text search
tsvector search (PgSearchPlugin)/tsvTsv with broad term matches multiple rows
Unified Search — server integration/should query all articles
Unified Search — server integration › composite search/should expose searchScore when any search filter is active
Unified Search — server integration › composite search/should filter via unifiedSearch composite filter
Unified Search — server integration › composite search/should order by SEARCH_SCORE_DESC
Unified Search — server integration › composite search/should return searchScore as null when no search filter is active
Unified Search — server integration › Mega Query v1 — per-algorithm filters/should combine tsvector + trgm filters with multi-column ordering
Unified Search — server integration › Mega Query v2 — unifiedSearch composite/should use unifiedSearch + SEARCH_SCORE_DESC
Unified Search — server integration › pg_trgm fuzzy matching/should filter articles by trgm similarity on title (trgmTitle)
Unified Search — server integration › pg_trgm fuzzy matching/should filter by trgm on body (trgmBody)
Unified Search — server integration › pg_trgm fuzzy matching/should order by TITLE_TRGM_SIMILARITY_DESC
Unified Search — server integration › Schema introspection/should expose expected filter fields on ArticleFilter type
Unified Search — server integration › Schema introspection/should expose expected orderBy enum values
Unified Search — server integration › Schema introspection/should expose expected search fields on Article type
Unified Search — server integration › tsvector search/should filter articles by tsvector search (tsvTsv)
Unified Search — server integration › tsvector search/should order by TSV_RANK_DESC
Unified Search — server integration › tsvector search/should return tsvRank as null when no tsvector filter is active
