
feat: add MockTextEmbedder and MockDocumentEmbedder #11709

Open
julian-risch wants to merge 6 commits into v3 from feat/mock-embedders


julian-risch (Member) commented Jun 22, 2026

Related Issues

  • Part of deepset-ai/haystack-private#376

Proposed Changes:

Adds MockTextEmbedder and MockDocumentEmbedder, embedders that produce embeddings without calling any API. They are drop-in replacements for real embedders (e.g. OpenAITextEmbedder) in tests and prototypes.

Embedding selection modes (consistent across both components, and mirroring MockChatGenerator's fixed / *_fn / default trio):

  • Deterministic (default) – with no configuration, the embedding is derived from a stable SHA-256 hash of the prepared text. The same text always yields the same unit-length embedding and different texts yield different embeddings, so the mocks rank documents distinctly in retrieval pipelines and results are reproducible across runs.
  • Fixed – pass an embedding vector; returned/assigned for every input.
  • Dynamic – pass embedding_fn=callable(text) -> list[float].
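A minimal sketch of the three selection modes, assuming a hypothetical `select_embedding` helper and a default `dimension` of 8 (the real helper names and default dimension in `mock_utils.py` may differ). The SHA-256 expansion scheme here (hashing `counter:text` and mapping each 4-byte chunk onto [-1, 1]) is an illustrative choice, not necessarily the one the PR uses, and rejecting `embedding` and `embedding_fn` together is also an assumption.

```python
import hashlib
import math
from typing import Callable, Optional, Sequence


def deterministic_embedding(text: str, dimension: int = 8) -> list[float]:
    """Derive a unit-length vector from SHA-256 digests of `text` (illustrative scheme)."""
    values: list[float] = []
    counter = 0
    # Each digest yields 32 bytes = eight 4-byte chunks; hash "counter:text"
    # repeatedly until `dimension` values have been produced.
    while len(values) < dimension:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        for i in range(0, len(digest), 4):
            if len(values) == dimension:
                break
            chunk = int.from_bytes(digest[i : i + 4], "big")
            # Map the unsigned 32-bit integer onto [-1.0, 1.0].
            values.append(chunk / 0xFFFFFFFF * 2.0 - 1.0)
        counter += 1
    norm = math.sqrt(sum(v * v for v in values))
    # Scale to unit length (skip if the vector happens to be all zeros).
    return [v / norm for v in values] if norm > 0 else values


def select_embedding(
    text: str,
    embedding: Optional[Sequence[float]] = None,
    embedding_fn: Optional[Callable[[str], list[float]]] = None,
    dimension: int = 8,
) -> list[float]:
    """Hypothetical helper choosing between fixed, dynamic, and deterministic modes."""
    if embedding is not None and embedding_fn is not None:
        # Assumption: the two explicit modes are mutually exclusive.
        raise ValueError("Pass either `embedding` or `embedding_fn`, not both.")
    if embedding is not None:
        return list(embedding)  # Fixed mode: same vector for every input
    if embedding_fn is not None:
        return list(embedding_fn(text))  # Dynamic mode: the caller decides
    return deterministic_embedding(text, dimension)  # Deterministic default
```

Because the default vector depends only on the bytes of the text, `select_embedding("hello")` returns the same list in every run and every process.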

Both components accept the parameters of real embedders (prefix, suffix; plus meta_fields_to_embed / embedding_separator for the document embedder). They report approximate token usage in meta, consistent with MockChatGenerator. They implement the full embedder interface: run, run_async, to_dict / from_dict (including embedding_fn named-callable serialization).
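The prepared text is what actually gets embedded (and, in deterministic mode, hashed). The sketch below assumes the mock document embedder builds it the way Haystack's real document embedders do: the non-empty values of `meta_fields_to_embed` are joined with the content by `embedding_separator`, then wrapped in `prefix` and `suffix`. `Doc`, `prepare_text`, and the whitespace-based `approximate_tokens` heuristic are hypothetical; the PR does not say how its approximate token count is computed.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document (content + meta only)."""
    content: Optional[str]
    meta: dict[str, Any] = field(default_factory=dict)


def prepare_text(
    doc: Doc,
    prefix: str = "",
    suffix: str = "",
    meta_fields_to_embed: Optional[list[str]] = None,
    embedding_separator: str = "\n",
) -> str:
    """Build the string that gets embedded for `doc`."""
    # Keep only requested meta fields that exist and are not None.
    meta_values = [
        str(doc.meta[key])
        for key in (meta_fields_to_embed or [])
        if doc.meta.get(key) is not None
    ]
    body = embedding_separator.join(meta_values + [doc.content or ""])
    return prefix + body + suffix


def approximate_tokens(texts: list[str]) -> int:
    """Hypothetical usage heuristic: count whitespace-separated words."""
    return sum(len(text.split()) for text in texts)
```

With `meta_fields_to_embed=["title"]`, a document titled "France" is embedded as `"France\nParis is the capital."`, so changing a listed meta value also changes the deterministic vector.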

How did you test it?

Added unit tests

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title.
  • I have documented my code.
  • I have added a release note file.
  • I have run pre-commit hooks and fixed any issue.

🤖 Generated with Claude Code


Add `MockTextEmbedder` and `MockDocumentEmbedder`, embedders that return
deterministic embeddings without calling any API. They are zero-cost
drop-in replacements for real embedders in tests, smoke tests, and quick
prototypes, following the same design as `MockChatGenerator` (#11708)
and inspired by model-layer fakes in other frameworks (LangChain
`DeterministicFakeEmbedding`/`FakeEmbeddings`, LlamaIndex `MockEmbedding`).

By default each embedding is derived from a stable SHA-256 hash of the
(prepared) text, so the same text always yields the same unit-length
embedding and different texts yield different embeddings. This makes the
mocks usable in retrieval pipelines and reproducible across runs and
processes. A fixed `embedding` vector or a custom `embedding_fn` callable
can be provided instead.
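To illustrate why stable, text-dependent vectors are enough for retrieval pipelines, this sketch ranks a few documents against a query by dot product, using a simplified single-digest embedding (a hypothetical eight-dimensional `embed`, not the PR's helper). It uses SHA-256 rather than Python's built-in `hash()` because `str` hashing is randomized per process (see `PYTHONHASHSEED`), which would break reproducibility across processes.

```python
import hashlib
import math


def embed(text: str) -> list[float]:
    """Hypothetical 8-dimensional deterministic embedding from one SHA-256 digest."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
    raw = [int.from_bytes(digest[i : i + 4], "big", signed=True) for i in range(0, 32, 4)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]  # unit length, so dot product = cosine similarity


def rank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Score each document by dot product with the query, best first."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(doc))), doc) for doc in documents]
    return sorted(scored, reverse=True)
```

For example, `rank("Berlin", ["Paris", "Berlin", "Rome"])` puts "Berlin" first with a score of about 1.0. The ordering of the other documents is stable but not semantic: "cat" and "cats" get unrelated vectors, so the mocks exercise pipeline wiring rather than retrieval quality.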

Both implement the full embedder interface: `run`, `run_async`, and
serialization. The deterministic-embedding helpers are shared via
`haystack/components/embedders/mock_utils.py`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
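Serializing `embedding_fn` only works for named, importable callables. Haystack ships `serialize_callable` / `deserialize_callable` in `haystack.utils`, which the components likely rely on; the stdlib-only version below is a hypothetical stand-in that shows the idea: store the callable as a `module.name` path in `to_dict` and re-import it in `from_dict`. `MockTextEmbedderSketch` is illustrative and omits the actual embedding logic.

```python
import importlib
from typing import Any, Callable, Optional


def serialize_callable(fn: Callable) -> str:
    """Turn a module-level named callable into an importable 'module.name' path."""
    qualname = fn.__qualname__
    if qualname == "<lambda>" or "." in qualname:
        # Lambdas, nested functions and methods have no simple top-level import path.
        raise ValueError(f"Cannot serialize {qualname!r}; use a module-level function.")
    return f"{fn.__module__}.{qualname}"


def deserialize_callable(path: str) -> Callable:
    """Re-import the callable named by a 'module.name' path."""
    module_name, _, name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), name)


class MockTextEmbedderSketch:
    """Hypothetical stand-in showing only the to_dict/from_dict round trip."""

    def __init__(
        self,
        embedding: Optional[list[float]] = None,
        embedding_fn: Optional[Callable[[str], list[float]]] = None,
        prefix: str = "",
        suffix: str = "",
    ) -> None:
        self.embedding = embedding
        self.embedding_fn = embedding_fn
        self.prefix = prefix
        self.suffix = suffix

    def to_dict(self) -> dict[str, Any]:
        fn = self.embedding_fn
        return {
            "type": type(self).__name__,  # real Haystack dicts use the full import path
            "init_parameters": {
                "embedding": self.embedding,
                "embedding_fn": serialize_callable(fn) if fn else None,
                "prefix": self.prefix,
                "suffix": self.suffix,
            },
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "MockTextEmbedderSketch":
        params = dict(data["init_parameters"])
        if params.get("embedding_fn"):
            params["embedding_fn"] = deserialize_callable(params["embedding_fn"])
        return cls(**params)
```

In practice the `embedding_fn` would be a module-level function in your own package, so that its path can be imported again when a pipeline is loaded.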
vercel Bot commented Jun 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
haystack-docs Ignored Ignored Preview Jun 22, 2026 2:44pm


@github-actions github-actions Bot added topic:tests type:documentation Improvements on the docs labels Jun 22, 2026
github-actions Bot (Contributor) commented Jun 22, 2026

Coverage report


File                              Statements  Missing  Coverage  Coverage (new stmts)  Lines missing
  haystack/components/embedders
    __init__.py
    mock_document_embedder.py
    mock_text_embedder.py
    mock_utils.py
Project Total

This report was generated by python-coverage-comment-action

Reduce the number of test functions via parametrization (init validation,
serialization roundtrips) and by testing the shared deterministic-embedding
algorithm once instead of duplicating it across both embedder test files.
Coverage is unchanged: both components stay at 100% and mock_utils reaches
100% (the zero-vector normalization guard is now pinned by a direct test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
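The zero-vector normalization guard mentioned above can be sketched as follows, together with a table-driven check in the spirit of `@pytest.mark.parametrize` (kept stdlib-only here). Returning the zero vector unchanged is an assumption about what the guard does; the commit only says that such a guard exists and is now tested directly.

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale `vector` to unit length, guarding against a zero norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        # Zero-vector guard (assumed behavior): dividing by zero would raise
        # ZeroDivisionError, so return an unchanged copy instead.
        return list(vector)
    return [v / norm for v in vector]


# Table-driven cases: one test body, many inputs.
CASES = [
    ([3.0, 4.0], [0.6, 0.8]),
    ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]),  # exercises the guard
    ([-2.0], [-1.0]),
]

for raw, expected in CASES:
    result = normalize(raw)
    assert len(result) == len(expected), (raw, result)
    assert all(math.isclose(r, e, abs_tol=1e-12) for r, e in zip(result, expected)), (raw, result)
```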
@julian-risch julian-risch changed the title feat: add MockTextEmbedder and MockDocumentEmbedder for testing and prototyping feat: add MockTextEmbedder and MockDocumentEmbedder Jun 22, 2026
@julian-risch julian-risch marked this pull request as ready for review June 22, 2026 11:34
@julian-risch julian-risch requested a review from a team as a code owner June 22, 2026 11:34
@julian-risch julian-risch requested review from bogdankostic and sjrl and removed request for a team and bogdankostic June 22, 2026 11:34
julian-risch and others added 4 commits June 22, 2026 15:47
Document the empty-`embedding` ValueError and the non-numeric-`embedding`
TypeError that `_validate_embedding` raises during construction, matching
the docstring-completeness pass done for MockChatGenerator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Real embedders that load a local model (e.g. SentenceTransformers) expose
warm_up(); add a no-op warm_up to MockTextEmbedder and MockDocumentEmbedder
so they are drop-in replacements for those too and an explicit warm_up()
call never fails. Matches MockChatGenerator, which already has warm_up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror MockChatGenerator's warm-up handling: initialize `_is_warmed_up`
to False, set it True in warm_up(), and auto-warm in run() (run_async
delegates to run, so it is covered too).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
warm_up() is an idempotent no-op, so guarding it with `if not self._is_warmed_up`
is unnecessary; call it directly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
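The warm-up behavior these last commits converge on can be sketched as below. `WarmUpSketch` and its placeholder embedding (`[float(len(text))]`) are hypothetical; only `_is_warmed_up`, `warm_up`, `run`, and `run_async` and their relationships come from the commit messages.

```python
import asyncio


class WarmUpSketch:
    """Hypothetical embedder skeleton showing only the warm-up lifecycle."""

    def __init__(self) -> None:
        self._is_warmed_up = False

    def warm_up(self) -> None:
        # Nothing to load for a mock; just record that warm-up happened.
        # Idempotent: calling it again has no further effect.
        self._is_warmed_up = True

    def run(self, text: str) -> dict:
        self.warm_up()  # auto-warm, called directly without an `if` guard
        return {"embedding": [float(len(text))], "meta": {}}  # placeholder vector

    async def run_async(self, text: str) -> dict:
        # Delegates to run(), so the auto-warm-up applies here too.
        return self.run(text)
```

Because `warm_up()` exists and is harmless, pipelines that call it explicitly (as they would for a SentenceTransformers embedder) keep working when the mock is swapped in.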