feat(search): add semantic search for AI-powered tool discovery #142
Shashikant86 wants to merge 25 commits into main from
Conversation
Pull request overview
Adds first-class semantic search capabilities to the StackOne AI SDK to improve natural-language tool discovery at scale (10k+ actions), with optional agent-facing utility tools and a benchmark harness to compare semantic vs local hybrid (BM25+TF‑IDF) search.
Changes:
- Introduces SemanticSearchClient (+ Pydantic response models) for /actions/search, and exposes it via StackOneToolSet.semantic_client.
- Adds StackOneToolSet.search_tools() and search_action_names() with connector-aware filtering and optional local fallback.
- Extends Tools/StackOneTool with connector helpers and adds a semantic variant of the tool_search utility tool; includes benchmark script + documented benchmark results.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| stackone_ai/semantic_search.py | New semantic search HTTP client + typed response models. |
| stackone_ai/toolset.py | Adds lazy semantic client + semantic search APIs (search_tools, search_action_names) with fallback logic. |
| stackone_ai/utility_tools.py | Adds create_semantic_tool_search() utility tool for agent-driven semantic discovery. |
| stackone_ai/models.py | Adds StackOneTool.connector plus Tools.get_connectors() / filter_by_connector() and semantic-search support in utility_tools(). |
| stackone_ai/__init__.py | Re-exports semantic search public API symbols. |
| tests/test_semantic_search.py | Unit + integration tests for the semantic client and toolset integration, plus connector helper tests. |
| tests/benchmark_search.py | New benchmark runner comparing local vs semantic search. |
| tests/BENCHMARK_RESULTS.md | Captures benchmark outcomes and methodology. |
| README.md | Documents semantic search at a high level. |
stackone_ai/toolset.py
Outdated
    # Fallback to local search
    all_tools = self.fetch_tools(account_ids=account_ids)
    utility = all_tools.utility_tools()
    search_tool = utility.get_tool("tool_search")

    if search_tool:
        result = search_tool.execute(
            {
                "query": query,
                "limit": top_k,
                "minScore": min_score,
            }
        )
        matched_names = [t["name"] for t in result.get("tools", [])]
        return Tools([t for t in all_tools if t.name in matched_names])
In the local-search fallback path, connector filtering is silently ignored and the returned Tools are not ordered by local relevance (they’re returned in all_tools iteration order, not matched_names/score order). This makes fallback behavior diverge from the semantic path and can return poorly ordered results. Consider applying the connector filter before building the local index (when connector is provided) and sorting the returned tools to match matched_names order.
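A minimal sketch of the suggested fix, reusing the names from the hunk above (the import path and the helper itself are assumptions for illustration, not the PR's code):

    # Hypothetical helper for the fallback path: apply the connector filter first,
    # then return matches in the relevance order produced by the local tool_search.
    from stackone_ai.models import Tools  # assumed import path


    def order_local_matches(all_tools: Tools, matched_names: list[str], connector: str | None = None) -> Tools:
        if connector is not None:
            all_tools = Tools([t for t in all_tools if t.connector == connector])
        rank = {name: i for i, name in enumerate(matched_names)}
        matched = [t for t in all_tools if t.name in rank]
        return Tools(sorted(matched, key=lambda t: rank[t.name]))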
stackone_ai/toolset.py
Outdated
    except SemanticSearchError:
        if not fallback_to_local:
            raise

        # Fallback to local search
        all_tools = self.fetch_tools(account_ids=account_ids)
        utility = all_tools.utility_tools()
        search_tool = utility.get_tool("tool_search")

        if search_tool:
            result = search_tool.execute(
                {
                    "query": query,
                    "limit": top_k,
                    "minScore": min_score,
                }
            )
            matched_names = [t["name"] for t in result.get("tools", [])]
            return Tools([t for t in all_tools if t.name in matched_names])

        return all_tools
The new fallback_to_local behavior in search_tools() is not covered by tests: tests/test_semantic_search.py doesn’t exercise the except SemanticSearchError branch (ordering, min_score handling, and connector behavior). Adding a unit test that forces SemanticSearchClient.search to raise SemanticSearchError would help prevent regressions in the documented fallback feature.
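A rough unit-test sketch along those lines; the StackOneToolSet constructor arguments, the fetch_tools signature, and the import paths are assumptions based on the snippets in this thread:

    from stackone_ai import StackOneToolSet
    from stackone_ai.models import Tools  # assumed import path
    from stackone_ai.semantic_search import SemanticSearchError


    def test_search_tools_falls_back_to_local(monkeypatch):
        toolset = StackOneToolSet(api_key="test-key")  # assumed constructor signature

        def raise_semantic_error(*args, **kwargs):
            raise SemanticSearchError("forced failure")

        # Force the semantic path to fail and keep the local fallback offline.
        monkeypatch.setattr(toolset.semantic_client, "search", raise_semantic_error)
        monkeypatch.setattr(toolset, "fetch_tools", lambda account_ids=None: Tools([]))

        tools = toolset.search_tools("onboard a new team member", fallback_to_local=True)
        assert isinstance(tools, Tools)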
    def search(
        self,
        query: str,
        connector: str | None = None,
        top_k: int = 10,
    ) -> SemanticSearchResponse:
        """Search for relevant actions using semantic search.

        Args:
            query: Natural language query describing what tools/actions you need
            connector: Optional connector/provider filter (e.g., "bamboohr", "slack")
            top_k: Maximum number of results to return (1-500, default: 10)

        Returns:
            SemanticSearchResponse containing matching actions with similarity scores

        Raises:
            SemanticSearchError: If the API call fails

        Example:
            response = client.search("onboard a new team member", top_k=5)
            for result in response.results:
                print(f"{result.action_name}: {result.similarity_score:.2f}")
        """
        url = f"{self.base_url}/actions/search"
        headers = {
            "Authorization": self._build_auth_header(),
            "Content-Type": "application/json",
        }
        payload: dict[str, Any] = {"query": query, "top_k": top_k}
        if connector:
            payload["connector"] = connector

        try:
            response = httpx.post(url, json=payload, headers=headers, timeout=self.timeout)
            response.raise_for_status()
SemanticSearchClient.search() documents top_k as 1–500 but does not validate inputs. Passing top_k<=0 or top_k>500 will currently be sent to the API and likely fail at runtime. Consider validating query (non-empty) and top_k range early (raising ValueError) so callers get consistent, client-side errors and you avoid unnecessary network calls.
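A small validation sketch along those lines; the constant name and standalone function are illustrative, and the 1-500 bound is taken from the docstring above:

    MAX_TOP_K = 500  # assumed API maximum, per the docstring


    def validate_search_args(query: str, top_k: int) -> None:
        """Fail fast with a ValueError instead of sending a doomed request."""
        if not query or not query.strip():
            raise ValueError("query must be a non-empty string")
        if not 1 <= top_k <= MAX_TOP_K:
            raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}, got {top_k}")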
| "default": 5, | ||
| }, | ||
| "minScore": { | ||
| "type": "number", | ||
| "description": "Minimum similarity score (0-1) to filter results (default: 0.0)", | ||
| "default": 0.0, |
In create_semantic_tool_search(), limit and minScore have defaults but are implicitly treated as required in StackOneTool.to_openai_function() (fields are required unless nullable=True). That means agents may be forced to always provide them despite defaults. Consider marking limit and minScore as nullable=True (leaving query required) so the generated tool schema matches the intended optional-argument behavior.
| "default": 5, | |
| }, | |
| "minScore": { | |
| "type": "number", | |
| "description": "Minimum similarity score (0-1) to filter results (default: 0.0)", | |
| "default": 0.0, | |
| "default": 5, | |
| "nullable": True, | |
| }, | |
| "minScore": { | |
| "type": "number", | |
| "description": "Minimum similarity score (0-1) to filter results (default: 0.0)", | |
| "default": 0.0, | |
| "nullable": True, |
tests/benchmark_search.py
Outdated
    connector: str | None = None


# 103 semantically-challenging evaluation queries
The comment # 103 semantically-challenging evaluation queries is inconsistent with the PR description / BENCHMARK_RESULTS.md (94 tasks) and will get stale as the list changes. Consider deriving this from len(EVALUATION_TASKS) (or updating the comment to match the actual count) to avoid confusion.
Suggested change:
-# 103 semantically-challenging evaluation queries
+# Semantically-challenging evaluation queries
tests/BENCHMARK_RESULTS.md
Outdated
| Method | Hit@5 | MRR | Avg Latency | Hits |
| ----------------- | ---------- | ---------- | ----------- | ------- |
| Local BM25+TF-IDF | 66.0% | 0.538 | 1.2ms | 62/94 |
| Semantic Search | 76.6% | 0.634 | 279.6ms | 72/94 |
| **Improvement** | **+10.6%** | **+0.096** | | **+10** |
The markdown tables in this file use leading || (double pipes), which renders as an extra empty column in GitHub-flavored markdown. Use single | table delimiters so the tables render correctly.
stackone_ai/toolset.py
Outdated
        # Step 2: Over-fetch from semantic API to account for connector filtering
        # We fetch 3x to ensure we get enough results after filtering
        over_fetch_multiplier = 3
        over_fetch_k = top_k * over_fetch_multiplier

        response = self.semantic_client.search(
            query=query,
            connector=connector,
            top_k=over_fetch_k,
        )
search_tools() over-fetches top_k by 3x without clamping to the semantic search API limit (docstring in SemanticSearchClient.search() says 1–500). For sufficiently large top_k, over_fetch_k can exceed the API max and cause avoidable failures. Consider validating top_k and clamping over_fetch_k to the API maximum (and also guarding against non-positive values).
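A possible clamping sketch; the names are illustrative and 500 is the documented maximum from SemanticSearchClient.search():

    MAX_TOP_K = 500  # documented API maximum for top_k
    OVER_FETCH_MULTIPLIER = 3


    def clamped_over_fetch(top_k: int) -> int:
        """Over-fetch for connector filtering without exceeding the API limit."""
        if top_k < 1:
            raise ValueError(f"top_k must be a positive integer, got {top_k}")
        return min(top_k * OVER_FETCH_MULTIPLIER, MAX_TOP_K)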
stackone_ai/toolset.py
Outdated
        # Over-fetch if filtering by available_connectors
        fetch_k = top_k * 3 if available_connectors else top_k

        response = self.semantic_client.search(
            query=query,
            connector=connector,
            top_k=fetch_k,
        )
search_action_names() also multiplies top_k by 3 when available_connectors is provided, but doesn’t clamp to the semantic API’s max top_k (1–500). This can trigger API errors for larger top_k. Add validation/clamping for fetch_k similarly to search_tools().
3 issues found across 9 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="tests/benchmark_search.py">
<violation number="1" location="tests/benchmark_search.py:982">
P2: Connector-specific tasks aren’t filtered in local search, so local benchmark hits can come from the wrong connector and skew the comparison with semantic search. Filter local results by connector when task.connector is set.</violation>
</file>
<file name="stackone_ai/toolset.py">
<violation number="1" location="stackone_ai/toolset.py:347">
P2: Clamp the semantic search over-fetch to the API’s documented max (500). As written, `top_k * 3` can exceed the API limit and cause semantic search to fail for larger requests.</violation>
<violation number="2" location="stackone_ai/toolset.py:392">
P2: Preserve the relevance ordering from `tool_search` when falling back to local search; the current filtering returns tools in original toolset order instead of relevance order.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Ask questions if you need clarification on any suggestion
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="stackone_ai/utility_tools.py">
<violation number="1" location="stackone_ai/utility_tools.py:205">
P2: `limit`/`minScore` are now nullable in the semantic tool schema, but the execution path casts them directly to int/float. A null value will raise a TypeError at runtime. Either disallow null in the schema or add None handling before casting.</violation>
</file>
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="stackone_ai/utility_tools.py">
<violation number="1" location="stackone_ai/utility_tools.py:225">
P2: Using `or 5` overrides an explicit `limit=0`, so callers can no longer request zero results. Preserve 0 while still defaulting when the key is missing or None.</violation>
</file>
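One way to address both this and the earlier None-cast comment is to default only when the value is missing or None, which also preserves an explicit limit=0. A sketch only; `arguments` stands in for the tool-call payload and the function name is hypothetical:

    from typing import Any


    def parse_search_args(arguments: dict[str, Any]) -> tuple[int, float]:
        """Hypothetical None-safe parsing for the semantic tool_search arguments."""
        raw_limit = arguments.get("limit")
        limit = 5 if raw_limit is None else int(raw_limit)  # explicit 0 is preserved

        raw_min_score = arguments.get("minScore")
        min_score = 0.0 if raw_min_score is None else float(raw_min_score)
        return limit, min_score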
tests/BENCHMARK_RESULTS.md
Outdated
| Method | Hit@5 | MRR | Avg Latency | Hits |
| ----------------- | ---------- | ---------- | ----------- | ------- |
| Local BM25+TF-IDF | 66.0% | 0.538 | 1.2ms | 62/94 |
| Semantic Search | 76.6% | 0.634 | 279.6ms | 72/94 |
That's higher latency than I would have hoped. I'm assuming most of it is the network round trip, not the actual search in the backend.
For the benchmark against the local lambda this is still high, as it performs embedding and then semantic search over prebuilt embeddings. We haven't done the prod benchmarks yet, but optimized infra for embedding will likely improve the results in prod.
The benchmark needs to be re-done at some point, so removing this for now.
stackone_ai/models.py
Outdated
    def utility_tools(
        self,
        hybrid_alpha: float | None = None,
        use_semantic_search: bool = False,
Do we need two new params? Could we instead treat the presence of semantic_client as the use_semantic_search flag?
Good point. They actually take different paths, and passing semantic_client clarifies which path to take. I need to fix that, thanks 👍.
This is now based on the presence of the semantic client. Thanks.
tests/BENCHMARK_RESULTS.md
Outdated
| | "see who applied for the role" | greenhouse_list_applied_candidate_tags | ashby_add_hiring_team_member | | ||
| | "advance someone to the next round" | greenhouse_move_application | factorial_invite_employee | | ||
| | "see open positions" | teamtailor_list_jobs | hibob_create_position_opening | | ||
| | "close a deal" | zohocrm_get_deal | shopify_close_order | | ||
| | "check course completion" | saba_delete_recurring_completion | saba_get_course | | ||
| | "update deal and notify team" | zohocrm_get_deal | microsoftteams_update_team | | ||
| | "look up customer" | linear_update_customer_need | shopify_search_customers | |
All of these are surprising; it's odd that semantic search gets them so wrong. Is that because we're just looking at a single matched result?
Is it possible to commit the benchmark results in terms of the actual returned matches? Some of these results are puzzling.
Benchmarks need to be revisited later.
stackone_ai/models.py
Outdated
| """ | ||
| return {tool.connector for tool in self.tools} | ||
|
|
||
| def filter_by_connector(self, connectors: list[str] | set[str]) -> Tools: |
Why do we need this function? AFAIK we already support glob filtering, which allows this type of filtering (and more), and I don't think this is related to the semantic search.
This is removed now, as glob filtering via fetch_tools already covers that. Great catch; it was likely added speculatively.
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def _build_auth_header(self) -> str:
I'm not as familiar with the Python SDK as with the Node one, but it's worth double checking whether we already have an HTTP client for calling the StackOne API (considering we're calling the MCP server in that same SDK, I'd assume we do).
This is a good point about potential duplication, but it's a good candidate for a further refactor of the SDK to add a shared HTTP client for making requests. A shared HTTP client could be a follow-up refactor, as it doesn't exist in the Python version.
stackone_ai/semantic_search.py
Outdated
        self,
        query: str,
        connector: str | None = None,
        top_k: int = 10,
I think that's too low a default; arguably we shouldn't have a default here at all, since we have one in the backend and this property should be optional (I'm assuming it is indeed optional on the backend).
Yes, made it optional now so it uses the value set in the backend. Good point that the magic number shouldn't be there; the more results we allow, the more it affects the context. Let's use the backend default here.
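For reference, a sketch of what deferring to the backend default could look like in the payload construction; this is an assumption about the eventual shape, not the merged code:

    from typing import Any


    def build_search_payload(query: str, connector: str | None = None, top_k: int | None = None) -> dict[str, Any]:
        """Omit top_k entirely when unset so the backend default applies."""
        payload: dict[str, Any] = {"query": query}
        if connector:
            payload["connector"] = connector
        if top_k is not None:
            payload["top_k"] = top_k
        return payload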
stackone_ai/toolset.py
Outdated
        connector: str | None = None,
        available_connectors: set[str] | None = None,
Why do we need two connector params? What's "available_connectors" and how is it different from connector?
available_connectors shouldn't be there (now or ever), as filtering is done via fetch_tools; it's replaced by account IDs. It now resolves connectors internally. Fixed this.
1 issue found across 10 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="stackone_ai/toolset.py">
<violation number="1" location="stackone_ai/toolset.py:476">
P2: When `account_ids` resolve to zero connectors, the empty set is falsy so connector filtering is skipped and results are returned from unrelated connectors. This breaks account scoping for accounts with no linked connectors; consider returning an empty list or treating an empty set as a valid filter state.</violation>
</file>
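A sketch of the distinction cubic is suggesting here, with illustrative names (allowed_connectors, results, apply_connector_scope) rather than the PR's actual variables:

    from stackone_ai.models import Tools  # assumed import path


    def apply_connector_scope(results: Tools, allowed_connectors: set[str] | None) -> Tools:
        if allowed_connectors is None:
            return results  # no account scoping requested
        if not allowed_connectors:
            return Tools([])  # accounts resolved to zero connectors: nothing can match
        return Tools([t for t in results if t.connector in allowed_connectors])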
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="examples/utility_tools_example.py">
<violation number="1" location="examples/utility_tools_example.py:103">
P3: This comment is misleading ("onboard new hire" should not map to termination tools). Update it to reflect onboarding tools to avoid confusing users of the example.</violation>
</file>
| print(" matching (BM25+TF-IDF) to cloud-based semantic vector search.") | ||
| utility = tools.utility_tools(semantic_client=toolset.semantic_client) |
Feels like this should be one call. Why are we passing in the semantic search client and then needing to get tool_search separately?
The two-step call is intentional in this example, as utility_tools returns multiple tools (tool_search and tool_execute) and we are optionally passing semantic_client. The get_tool("tool_search") call is only for demo purposes, as in the other examples without semantic search. In an actual run the tools will be converted into a framework-specific format, e.g. for OpenAI:

    utility = tools.utility_tools(semantic_client=toolset.semantic_client)
    openai_tools = utility.to_openai()
If we do this in one call then we might lose the flexibility of the local search, having to execute tools along with tool_search. The example is just demonstrating each step separately, like in the other examples.
I understand the need for the separation between the search and the execute functionality, but it feels like the nicer experience is something like my_search_tool = tools.search(client=semantic_search) or my_search_tool = tools.search(client=local_search), and then my_execute_tool = tools.execute().
This might be unachievable due to framework requirements, but I don't like that you have to pass the semantic client above the hierarchy of actually getting the tool_search tool.
    from typing import TYPE_CHECKING, Annotated, Any, ClassVar, TypeAlias, cast

    if TYPE_CHECKING:
        from stackone_ai.semantic_search import SemanticSearchClient
Either import or don't; we don't want to import TYPE_CHECKING.
We can remove this, but there is a small risk of running into a circular import, and it keeps type checkers like mypy happy. Happy to remove it if you think there's no need at all.
Yeah, I think it's best, as I'm not sure we do this elsewhere.
            return Tools([])

        # Step 2: Fetch results from semantic API, then filter client-side
        response = self.semantic_client.search(
The semantic search client already has a top_k parameter; why are we not using it instead?
This is a good point, but as we discussed earlier, in the first call we should fetch all the results across all connectors from the backend (with whatever top_k is set by default); then we do client-side filtering on the user's available connectors, which could discard a large portion that isn't relevant. If we pass top_k at this stage then we might restrict the results from the backend. The client-side top_k cap is applied at the end, after filtering and deduplication. This way we never lose relevant results from the backend and still have the ability to process them client-side.
Shouldn't we avoid fetching all results across all connectors and then filtering client-side on the available connectors? If we know what connectors they have available, then we should only search those connectors, as we can do that server-side. That way we can definitely pass in the top_k parameter and not have to do this filtering.
Problem

StackOne has over 10,000 actions across all connectors and growing; some connectors have 2,000+ actions alone. Keyword matching breaks down when someone searches "onboard new hire" but the action is called hris_create_employee. The StackOne AI SDK already supports keyword search, and we need to add support for semantic search using the action search service.

Implementation Details
Change Summary

- SemanticSearchClient interfacing with StackOne's /actions/search API for natural-language tool discovery, with Pydantic models for type-safe results
- search_tools() on StackOneToolSet with over-fetching and connector filtering; only returns tools the user has linked accounts for, sorted by semantic relevance (see the usage sketch after this list)
- search_action_names() for lightweight lookups without loading full tool definitions
- Utility tools (tool_search, tool_execute) for AI agents to dynamically discover and execute tools at runtime, supporting both local and semantic search modes
- Extends Tools with utility_tools(), get_connectors(), and filter_by_connector() for connector-aware tool management
- Benchmark coverage across connector categories (…, messaging, docs, marketing, LMS), currently benchmarked against local endpoints; semantic search achieves 76.6% Hit@5 vs 66.0% for local search (+10.6% improvement)
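A hypothetical end-to-end usage sketch of search_tools() as described above; the constructor/credential handling and the account id are illustrative assumptions, while the parameter names follow the hunks in this thread:

    from stackone_ai import StackOneToolSet

    toolset = StackOneToolSet()  # assumes credentials are picked up from the environment

    tools = toolset.search_tools(
        "onboard a new team member",
        account_ids=["acc_123"],  # illustrative account id
        top_k=5,
        fallback_to_local=True,   # fall back to the local BM25+TF-IDF search on API errors
    )

    for tool in tools:
        print(tool.name)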
How to Test

- just test to verify all existing and new unit tests pass
- just lint to confirm no linting regressions
- Verify search_tools() returns relevant tools filtered to available connectors
- Exercise both use_semantic_search=True and the default local mode

Summary by cubic
Adds semantic search to the SDK for natural-language tool discovery across connectors, improving relevance over keyword search. Benchmarks show +10.6% Hit@5 vs local BM25 (76.6% vs 66.0%).
New Features
Bug Fixes
Written for commit c4f8f34. Summary will update on new commits.