v5.4.0 — Hybrid search + reranking for RAG (plus MFA, email verification, and test fixes)#15

Open
AdrianCurtin wants to merge 1 commit into main from dev2

@AdrianCurtin AdrianCurtin commented Jun 6, 2026

v5.4.0 — Hybrid search and reranking for RAG, plus MFA, email verification, and test fixes

This branch is the 5.4.0 release. The headline addition is hybrid (lexical + vector) search with reciprocal-rank fusion and cross-encoder reranking for the retrieval layer; it also carries a set of MFA, email-verification, Parse::Audience, and test-tooling fixes.

Retrieval (RAG): hybrid search + reranking

  • NEW: Class.hybrid_search(text:, lexical:, vector:, k:, fusion:) fuses a lexical Atlas Search ($search) branch with a $vectorSearch branch using reciprocal-rank fusion (RRF). Two independent aggregations are required because $vectorSearch cannot live inside $facet/$lookup/$unionWith and must be stage 0. Each branch enforces ACL/CLP/protectedFields independently before fusion, so fused rows are already access-filtered — there is no separate hydration fetch. Results carry #hybrid_score, #hybrid_ranks, and #vector_score/#search_score.
  • NEW: Parse::VectorSearch::Hybrid.rrf (pure fusion math) and Parse::VectorSearch::Hybrid.rank_fusion_supported? (Atlas 8.0+ native $rankFusion detection via a cached behavioural probe, 1-hour TTL — not version-string parsing).
  • NEW: Parse::Retrieval::Reranker cross-encoder reranking protocol with a deterministic Reranker::Fixture and a Reranker::Cohere adapter (/v2/rerank).
  • NEW: Parse::Retrieval.retrieve accepts hybrid: and rerank: (previously reserved and raising NotImplementedError); tenant scope is folded into both branches.
  • NEW: Parse::Embeddings::SpendCap — opt-in per-tenant cumulative embedding token cap with hard-refuse, charged at the semantic_search agent-tool boundary (admin agents exempt; a breach surfaces as a rate-limited tool error).
  • CHANGED: PipelineSecurity::ALLOWED_STAGES and STAGE0_ONLY_ATLAS_STAGES admit $rankFusion for the opt-in native path.
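
To make the fusion step concrete, here is a minimal sketch of reciprocal-rank fusion in plain Ruby. It assumes each branch has already returned an ordered list of object IDs; the helper name `rrf_fuse`, its signature, and the smoothing constant `k: 60` are illustrative assumptions, not the actual interface of Parse::VectorSearch::Hybrid.rrf.

```ruby
# Reciprocal-rank fusion over any number of ranked ID lists.
# Hypothetical helper for illustration only; not the gem's API.
# `k` is the usual RRF smoothing constant (60 is a commonly used default).
def rrf_fuse(rankings, k: 60)
  scores = Hash.new(0.0)
  rankings.each do |ranked_ids|
    ranked_ids.each_with_index do |id, index|
      rank = index + 1                  # ranks are 1-based
      scores[id] += 1.0 / (k + rank)    # each list contributes 1 / (k + rank)
    end
  end
  # Highest fused score first; ties broken by id so the output is deterministic.
  scores.sort_by { |id, score| [-score, id] }
end

lexical = %w[a b c]   # order returned by the lexical branch (made-up data)
vector  = %w[b d a]   # order returned by the vector branch (made-up data)

fused = rrf_fuse([lexical, vector])
fused.each { |id, score| puts format("%s %.5f", id, score) }
```

In this toy input, `b` ends up first because it sits near the top of both lists, while `c` and `d` each appear in only one list and score lower.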

Retrieval (RAG): completeness

  • NEW: Class.embed_pending! backfills embeddings for records whose managed :vector field is null, using objectId-cursor pagination.
  • NEW: Parse::Object#compute_embedding! forces an in-place recompute without a save (digest-tracked).
  • NEW: vector_visibility :owner_only | :public controls whether a class's :vector properties appear in as_json by default; an explicit include_vectors: always wins.
  • IMPROVED: Webhook trigger payloads strip declared :vector columns from object/original/update/objects by default (a :public class keeps them).
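
The objectId-cursor pagination mentioned for the backfill can be sketched as follows. This is an illustration under stated assumptions: `fetch_page` is a hypothetical stand-in for a query of the form "vector field is null, objectId greater than the cursor, ordered by objectId, limited to N", and the in-memory `RECORDS` array replaces a real class.

```ruby
# Walk pending records in objectId order, one page at a time, resuming
# after the last objectId seen. All names here are hypothetical.
def each_pending_batch(fetch_page, batch_size: 2)
  cursor = nil
  loop do
    page = fetch_page.call(cursor, batch_size)
    break if page.empty?
    yield page
    cursor = page.last[:objectId]   # advance past the last record processed
  end
end

# Made-up records; a nil :embedding marks a record that still needs a backfill.
RECORDS = [
  { objectId: "a1", embedding: nil },
  { objectId: "b2", embedding: [0.1] },
  { objectId: "c3", embedding: nil },
  { objectId: "d4", embedding: nil },
].freeze

fetch_page = lambda do |cursor, limit|
  RECORDS
    .select { |r| r[:embedding].nil? && (cursor.nil? || r[:objectId] > cursor) }
    .sort_by { |r| r[:objectId] }
    .first(limit)
end

batches = []
each_pending_batch(fetch_page) { |page| batches << page.map { |r| r[:objectId] } }
p batches
```

One reason to prefer a cursor over skip/limit in a job like this is that the loop keeps advancing even if a record remains null after processing, for example when an embedding call fails.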

Auth and accounts

  • FIXED: Parse::User MFA lifecycle — setup_mfa!, setup_sms_mfa!, confirm_sms_mfa!, disable_mfa!, and disable_mfa_master_key! no longer raise an internal argument error before reaching the server; mfa_enabled?/mfa_status report correctly after an ordinary fetch (a leak-safe {status: "enabled"} projection is preserved while the TOTP secret and recovery codes are stripped).
  • NEW: Interactive console MFA login — rake client:console prompts for a TOTP/recovery code (or reads PARSE_LOGIN_MFA) when logging into an enrolled account.
  • NEW: Parse::User.request_email_verification(email) (and the instance form) re-send the verification email for a registered, unverified user, mirroring request_password_reset.
  • FIXED: Parse::Audience#query is stored as a JSON string on the wire to match Parse Server's _Audience.query column type, so saving a hash query no longer fails the server schema check. The public API is unchanged.
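
The Parse::Audience#query change amounts to a serialise-on-write, parse-on-read round trip. A minimal sketch, assuming hypothetical helper names (`query_to_wire`, `query_from_wire`) that are not taken from the gem:

```ruby
require "json"

# Hypothetical helpers for illustration: store a Hash query as the JSON
# string the _Audience.query column expects, and parse it back on read.
def query_to_wire(query)
  query.is_a?(String) ? query : JSON.generate(query)
end

def query_from_wire(raw)
  raw.is_a?(String) ? JSON.parse(raw) : raw
end

wire = query_to_wire({ "deviceType" => "ios" })
round_trip = query_from_wire(wire)

puts wire.class         # String goes over the wire
puts round_trip.class   # callers still see a Hash
```

Keeping the conversion at the wire boundary is what lets the public API continue to accept and return a Hash.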

Performance and tooling

  • CHANGED: Parse::AtlasSearch role_cache_ttl now defaults to 30 seconds (was 120) so role grants/revokes reflect in $search ACL decisions sooner.
  • CHANGED: Test tasks run through Bundler (bundle exec ruby) to avoid a minitest activation/load error when running individual test files; README documents the requirement.
  • IMPROVED: ACL documentation in lib/parse/model/acl.rb and lib/parse/model/object.rb clarifies the default :owner_else_private policy, its private fallback, and how to override it via set_default_acl / acl_policy.
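
For readers unfamiliar with how a TTL bounds staleness, the sketch below shows a small time-based cache with an injectable clock. It is a generic illustration, not the Parse::AtlasSearch role cache; the class name `TtlCache` and its interface are assumptions.

```ruby
# Minimal TTL cache with an injectable clock so expiry can be exercised
# without sleeping. Hypothetical class for illustration only.
class TtlCache
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  # Returns the cached value, recomputing via the block once the entry
  # is at least `ttl` seconds old.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    return entry[:value] if entry && (now - entry[:at]) < @ttl

    value = yield
    @entries[key] = { value: value, at: now }
    value
  end
end

now = 0.0
cache = TtlCache.new(ttl: 30, clock: -> { now })
lookups = 0
roles_for = -> { cache.fetch("user:1") { lookups += 1; ["editor"] } }

roles_for.call   # miss: computes the roles
now = 10.0
roles_for.call   # hit: still within 30 seconds
now = 31.0
roles_for.call   # miss: entry expired, recomputed
puts lookups
```

With a 30-second TTL, a revoked role can remain visible to the cached path for at most about 30 seconds, which is the trade-off the shorter default is aiming at.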

Notes for reviewers

  • Hybrid fusion runs client-side by default. The native single-roundtrip $rankFusion path is opt-in (fusion: { method: :rrf_native }) and falls back to client-side fusion when the cluster does not support it. Detection and the native pipeline shape are unit-tested, but native execution is not exercised in CI (no Atlas 8.0 cluster), so live results route through the always-enforced two-aggregate client path unless native is explicitly requested.
  • The Reranker::Cohere /v2/rerank response parsing is tested against a stubbed HTTP layer rather than a live key.
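
The opt-in and fallback behaviour described in the first note can be summarised in a few lines. This is a sketch only: `resolve_fusion_method` is a hypothetical helper, `native_supported` stands in for the result of a capability probe such as rank_fusion_supported?, and the `:rrf` symbol for the client-side path is an assumption.

```ruby
# Decide which fusion path to run. Hypothetical helper for illustration;
# :rrf as the name of the client-side method is an assumption.
def resolve_fusion_method(requested, native_supported:)
  return :rrf unless requested == :rrf_native   # client-side fusion is the default
  native_supported ? :rrf_native : :rrf          # fall back when the cluster lacks support
end

puts resolve_fusion_method(nil, native_supported: true)           # default stays client-side
puts resolve_fusion_method(:rrf_native, native_supported: true)   # opt-in honoured
puts resolve_fusion_method(:rrf_native, native_supported: false)  # opt-in falls back
```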

Copilot AI review requested due to automatic review settings June 6, 2026 04:19
@AdrianCurtin AdrianCurtin changed the title from "Use bundle exec for tests and clarify ACL docs" to "v5.4.0 — Hybrid search + reranking for RAG (plus MFA, email verification, and test fixes)" Jun 6, 2026
Copilot AI left a comment

Pull request overview

This PR updates the test/developer workflow to run reliably under Bundler while also shipping a sizable 5.4.0 feature set: hybrid lexical+vector search (RRF + optional $rankFusion), retrieval reranking, embedding spend caps, tighter vector exposure defaults (serialization + webhooks), and expanded MFA/email/push integration coverage plus supporting docker test-stack wiring.

Changes:

  • Run test files via bundle exec and add test helper support for post-signup login to obtain a live session token under Parse Server 9.x behavior.
  • Add major RAG/search and security-related features: hybrid search + reranking, spend-cap metering, vector visibility controls, and vector scrubbing in webhook payloads.
  • Expand integration/unit coverage, examples, and docs; bump version to 5.4.0 and update Parse Server docker/test-stack configuration (MFA/auth, push, capturing email).

Reviewed changes

Copilot reviewed 52 out of 54 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/test_helper_integration.rb | Adds login_after_signup! helper to ensure live session tokens in integration tests. |
| test/lib/parse/vector_visibility_test.rb | Adds unit tests for new vector_visibility behavior and webhook redaction expectations. |
| test/lib/parse/vector_searchable_hybrid_test.rb | Tests Class.hybrid_search wrapper kwargs threading and object hydration with hybrid metadata. |
| test/lib/parse/vector_search_hybrid_test.rb | Tests RRF math, $rankFusion probe caching, native pipeline shape, and orchestration behavior. |
| test/lib/parse/user_save_signup_integration_test.rb | Adjusts integration tests to log in after signup to obtain session tokens. |
| test/lib/parse/user_authdata_strip_test.rb | Adds test ensuring MFA authData is reduced to leak-safe status only. |
| test/lib/parse/retrieval_retrieve_test.rb | Updates retrieval tests for hybrid routing and rerank behavior. |
| test/lib/parse/retrieval_reranker_test.rb | Adds unit tests for reranker protocol, fixture, and Cohere adapter parsing/redaction. |
| test/lib/parse/push_integration_test.rb | Fixes integration setup and adds server-backed push/install/audience lifecycle tests with cleanup. |
| test/lib/parse/mfa_totp_flow_integration_test.rb | Adds end-to-end TOTP MFA integration tests against MFA-enabled server. |
| test/lib/parse/mfa_test.rb | Fixes provisioning URI assertion to account for URL encoding. |
| test/lib/parse/live_query_integration_test.rb | Makes LiveQuery event tests deterministic by using public ACL objects + timeouts. |
| test/lib/parse/embeddings_spend_cap_test.rb | Adds unit tests for embedding token spend cap behavior. |
| test/lib/parse/embed_pending_test.rb | Adds unit tests for embedding backfill and compute_embedding!. |
| test/lib/parse/email_verification_disruptive_test.rb | Adds disruptive integration test that recreates server with email verification enabled. |
| test/lib/parse/client_rest_password_reset_integration_test.rb | Adds client-mode password reset integration coverage using capturing adapter. |
| test/cloud/dummy-push-adapter.js | Adds test-only push adapter enabling deterministic _PushStatus lifecycle tests. |
| test/cloud/capturing-email-adapter.js | Adds test-only email adapter capturing outgoing messages into Parse for assertions. |
| scripts/start-parse.sh | Wires push adapter, MFA auth config file, capturing email adapter, and public server URL for test stack. |
| scripts/docker/Dockerfile.parse | Pins Parse Server image tag to 9.9.0 for specific security fixes/features. |
| scripts/docker/docker-compose.verifyemail.yml | Adds compose override enabling verifyUserEmails for disruptive test. |
| README.md | Adds “What’s new in 5.4” summary, examples section, and bundle exec testing guidance; bumps version text. |
| Rakefile | Routes per-file test execution through bundle exec and adds optional MFA login flow for console. |
| lib/parse/webhooks/payload.rb | Adds vector column scrubbing for webhook payload object/original/update/objects. |
| lib/parse/vector_search/hybrid.rb | Introduces hybrid search module (RRF + optional native $rankFusion path + probe cache). |
| lib/parse/two_factor_auth/user_extension.rb | Fixes MFA API calls to pass session/master opts correctly; revises disable flow. |
| lib/parse/stack/version.rb | Bumps gem version to 5.4.0. |
| lib/parse/retrieval/retriever.rb | Enables hybrid: and rerank: in retrieval, adds rerank integration and hybrid wiring. |
| lib/parse/retrieval/reranker/cohere.rb | Adds Cohere /v2/rerank adapter with hardened HTTP handling. |
| lib/parse/retrieval/reranker.rb | Adds reranker protocol/base, fixture reranker, and Cohere autoload. |
| lib/parse/retrieval/agent_tool.rb | Charges embedding spend cap for semantic_search and maps breaches to rate-limit errors. |
| lib/parse/pipeline_security.rb | Allows $rankFusion as an Atlas stage-0 operator. |
| lib/parse/model/object.rb | Documents default ACL policy and adds hybrid score/ranks accessors; updates vector serialization default logic. |
| lib/parse/model/core/vector_searchable.rb | Adds vector_visibility DSL and hybrid_search API + hybrid hit builder. |
| lib/parse/model/core/embed_managed.rb | Adds compute_embedding! and embed_pending! bulk backfill API. |
| lib/parse/model/classes/user.rb | Preserves leak-safe MFA status projection while stripping sensitive authData; adds email verification request APIs. |
| lib/parse/model/classes/audience.rb | Fixes _Audience.query persistence by storing JSON string on wire and exposing Hash API. |
| lib/parse/model/acl.rb | Clarifies default ACL policy documentation (:owner_else_private). |
| lib/parse/embeddings/spend_cap.rb | Adds per-tenant embedding spend cap implementation (disabled by default). |
| lib/parse/embeddings.rb | Requires new spend cap module. |
| lib/parse/atlas_search.rb | Reduces role cache TTL default from 120s to 30s. |
| lib/parse/api/users.rb | Adds POST /verificationEmailRequest client method with rate-limit tracking. |
| Gemfile.lock | Bumps gem version and adds rotp/rqrcode (and deps). |
| Gemfile | Adds rotp and rqrcode to test/development group for MFA tests/QR. |
| examples/README.md | Adds index of runnable example scripts and common setup. |
| examples/rag_chatbot.rb | Adds end-to-end RAG example using managed embeddings + agent retrieval + LLM add-in. |
| examples/live_query_listener.rb | Adds interactive LiveQuery listener example scoped to a user session. |
| examples/basic_server.rb | Adds privileged (master-key) setup + schema/CRUD example. |
| examples/basic_client.rb | Adds unprivileged client example demonstrating ACL enforcement. |
| docs/mongodb_direct_guide.md | Documents enforcement behavior for Atlas index stages including $rankFusion and hybrid search. |
| docs/client_sdk_guide.md | Links to new runnable client/server examples. |
| docs/atlas_vector_search_guide.md | Documents hybrid search, reranking, and spend cap behavior; links to RAG example. |
| CHANGELOG.md | Adds detailed 5.4.0 changelog entry. |
| .gitignore | Ensures examples/README.md is not ignored. |

Comment thread lib/parse/retrieval/retriever.rb Outdated
Comment on lines +175 to +179
lexical[:query] ||= query
lexical[:filter] = merge_filters(filter, tenant_filter_hash(klass, tenant_scope), lexical[:filter])
vector[:field] ||= field unless field.nil?
vector[:filter] ||= filter
vector[:vector_filter] ||= merged_vector_filter
Comment on lines +125 to +127
# The account label is URL-encoded in a valid otpauth URI ("@" -> "%40"),
# so decode before asserting the address is present.
assert CGI.unescape(uri).include?("test@example.com"), "Should include account name"
@AdrianCurtin AdrianCurtin force-pushed the dev2 branch 3 times, most recently from 8268e03 to e773968 Compare June 6, 2026 04:50
Bump to v5.4.0 and add major RAG/vector-search features, security improvements, MFA fixes, examples, and tests. Introduces hybrid vector+lexical search with reciprocal-rank fusion and optional native $rankFusion support, a Retrieval Reranker protocol (Cohere adapter + test fixture), and Parse::VectorSearch::Hybrid. Adds embeddings spend-cap enforcement, Class.embed_pending!/compute_embedding!, the vector_visibility DSL, and webhook payload redaction. Fixes Parse::Audience JSON query persistence and multiple MFA write/status/disable bugs; console login now handles MFA prompts. Adds runnable examples (basic client/server, live-query, rag_chatbot) and related docs/README updates, test coverage for new features, and dev/test tooling tweaks (rotp/rqrcode deps, bundle exec in Rakefile). Misc: new helper libs (reranker, hybrid, spend_cap), many tests, and assorted docs and integration changes to reflect the 5.4.0 release.