feat: remove mode argument from Semble#119
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests.
... and 1 file with indirect coverage changes 🚀 New features to boost your workflow:
|
Pringled
left a comment
There was a problem hiding this comment.
LGTM! Just some small questions.
| return [SearchResult(chunk=chunks[i], score=float(scores[i])) for i in indices if scores[i] > 0] | ||
|
|
||
|
|
||
| def search_hybrid( |
There was a problem hiding this comment.
IMO we could even just rename this to search to make it clearer that we just support one search (since the way I read the function name as is implies it's just hybrid search but it's hybrid search + all our tricks, search reads more natureally to me).
| # alpha=0.0 → hybrid pipeline, BM25-only input | ||
| # alpha=1.0 → hybrid pipeline, semantic-only input | ||
| _MODE_PARAMS: dict[str, tuple[SearchMode, float | None]] = { | ||
| "bm25": (SearchMode.BM25, None), |
There was a problem hiding this comment.
We also need to update the actual results I guess if we do this? The readme etc? I'm a bit torn on removing these, since specially bm25 is a very strong and standard baseline to report. But we can just call the private functions here to to keep supporting these ablations I think, wdyt?
This PR removes the mode argument from the search. The mode just led to weaker results because it disabled the ranking pipeline. Because we also expose alpha, users can safely choose full bm25 or full semantic by setting alpha to 0.0 or 1.0, and still get the benefits of the ranking pipeline.
For ablations, I opted to remove the model results as well, I think the results are still pretty clear and actionable, although I can see the value in patching it for evaluation purposes (e.g., maybe disabling the reranking with a private boolean?)