LCORE-86: Prioritize BYOK content over built-in content #1208
are-ces wants to merge 1 commit into lightspeed-core:main
Conversation
Walkthrough

This PR restructures the RAG system to support two independent operational modes: Always RAG (pre-query context injection) and Tool RAG (on-demand retrieval via tools). Configuration is reorganized from a Solr-based layout to a hierarchical Rag structure with Always and Tool subsections. A new RAGContext model encapsulates pre-query context data. Multiple utilities are refactored to support both BYOK vector stores and Solr OKP integration with weighted relevance scoring.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 3 passed
Chunks from Solr did not include [...]. Printout of the Solr chunk structure returned:

```python
Chunk(
    chunk_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
    content = '# Chapter 9. Admission plugins\n\n\n\n\nAdmission plugins are used to help regulate how [..]',
    chunk_metadata = ChunkChunkMetadata(
        chunk_embedding_dimension = None,
        chunk_embedding_model = None,
        chunk_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
        chunk_tokenizer = None,
        chunk_window = None,
        content_token_count = None,
        created_timestamp = None,
        document_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index',
        metadata_token_count = None,
        source = None,
        updated_timestamp = None,
    ),
    embedding = [],
    metadata = {},
    embedding_model = 'sentence-transformers/ibm-granite/granite-embedding-30m-english',
    embedding_dimension = 384,
)
```
- Add configurable RAG strategies: always RAG, performed at each query (OKP Solr + BYOK), and tool RAG; they can be used independently or together
- Add chunk prioritization with score multipliers per vector store for always RAG
- Add knobs in config to select the RAG strategy
- Tool RAG defaults to enabled=True for backward compatibility
- Update the lightspeed stack configuration enrichment script to build the solr section in llama stack and fix bugs in building the vector stores
- Update BYOK and RAG documentation
- Update unit tests
Actionable comments posted: 5
🧹 Nitpick comments (3)
tests/unit/utils/test_responses.py (1)
2248-2275: Add an explicit regression test for `vector_store_ids=[]`.

Current cases cover `None` and non-empty lists, but not an explicit empty list. A dedicated test will prevent accidental “fetch all stores” behavior regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/utils/test_responses.py` around lines 2248 - 2275, Add a regression test that verifies passing an explicit empty list to get_vector_store_ids returns an empty list and does not call the API: create an AsyncMock client (client_mock), call result = await get_vector_store_ids(client_mock, []), assert result == [], and assert client_mock.vector_stores.list.assert_not_called(); name the test clearly (e.g., test_returns_empty_for_explicit_empty_list_no_fetch) and place it alongside the existing tests in the same test module.

tests/unit/test_llama_stack_configuration.py (1)
362-451: Good test coverage for the `enrich_solr` function.

The tests comprehensively cover the main scenarios: disabled state, empty config, adding providers/stores/models, and preserving existing configuration. The duplicate detection tests for providers and vector stores are present.

Consider adding a test for duplicate embedding model detection to match the logic in `enrich_solr` (lines 470-473 in src/llama_stack_configuration.py), which checks `existing_model_ids` before adding.

Optional test for duplicate embedding model

```python
def test_enrich_solr_skips_duplicate_embedding_model() -> None:
    """Test enrich_solr does not add duplicate embedding model."""
    ls_config: dict[str, Any] = {
        "registered_resources": {
            "models": [{"model_id": "existing", "provider_model_id": "all-mpnet-base-v2"}]
        }
    }
    enrich_solr(ls_config, {"enabled": True})
    model_ids = [
        m["provider_model_id"] for m in ls_config["registered_resources"]["models"]
    ]
    assert model_ids.count("all-mpnet-base-v2") == 1
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_llama_stack_configuration.py` around lines 362 - 451, Add a unit test to cover duplicate embedding model detection in enrich_solr: create ls_config with "registered_resources": {"models": [{"model_id": "existing", "provider_model_id": "all-mpnet-base-v2"}]}, call enrich_solr(ls_config, {"enabled": True}), then assert that the list of provider_model_id values in ls_config["registered_resources"]["models"] contains "all-mpnet-base-v2" only once (count == 1); reference enrich_solr and the existing_model_ids/provider_model_id logic to locate where duplicates are checked.

src/models/config.py (1)
1730-1740: Minor: Lambda wrappers in `default_factory` are unnecessary.

The `# pylint: disable=unnecessary-lambda` comments indicate these lambdas could be simplified. Since these are class constructors with no arguments, you can pass the class directly.

Optional simplification

```diff
-    solr: SolrRagConfiguration = Field(
-        default_factory=lambda: SolrRagConfiguration(),  # pylint: disable=unnecessary-lambda
+    solr: SolrRagConfiguration = Field(
+        default_factory=SolrRagConfiguration,
         title="Solr RAG configuration",
         description="Configuration for Solr RAG (pre-query).",
     )
-    byok: ByokRagConfiguration = Field(
-        default_factory=lambda: ByokRagConfiguration(),  # pylint: disable=unnecessary-lambda
+    byok: ByokRagConfiguration = Field(
+        default_factory=ByokRagConfiguration,
         title="BYOK RAG configuration",
         description="Configuration for BYOK RAG (pre-query).",
     )
```

The same applies to lines 1764-1773. However, line 1750 requires the lambda since it passes `enabled=True`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/models/config.py` around lines 1730 - 1740, Replace the unnecessary lambda wrappers used as default_factory for SolrRagConfiguration and ByokRagConfiguration by passing the classes themselves (use default_factory=SolrRagConfiguration and default_factory=ByokRagConfiguration) in the Field definitions for the solr and byok attributes; also apply the same simplification for any other fields that use no-arg constructors (e.g., the similar fields around the RAG config group), but keep the lambda where a constructor requires arguments (such as the field that calls a class with enabled=True).
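The same pattern can be illustrated with stdlib dataclasses (hypothetical class names; pydantic's `Field(default_factory=...)` accepts a no-argument callable in exactly the same way):

```python
from dataclasses import dataclass, field


@dataclass
class SolrRagConfiguration:
    enabled: bool = False


@dataclass
class RagGroup:
    # Passing the class itself works: field() calls it with no arguments.
    solr: SolrRagConfiguration = field(default_factory=SolrRagConfiguration)
    # A lambda remains necessary only when the constructor needs arguments.
    byok: SolrRagConfiguration = field(
        default_factory=lambda: SolrRagConfiguration(enabled=True)
    )


cfg = RagGroup()
```

Each instantiation gets a fresh default object, which is why a factory (rather than a shared default instance) is used in the first place.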
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/rag_guide.md`:
- Around line 335-339: The fenced code block in docs/rag_guide.md is missing a
language specifier causing markdownlint MD040; update the block start to include
a language (e.g., bash) so the snippet begins with ```bash and leave the
contents unchanged—this addresses the lint warning for the curl POST example
used to call /v1/query.
In `@lightspeed-stack.yaml`:
- Around line 46-49: The YAML comment and value disagree for the BYOK flag: the
comment says "Default is true for backward compatibility" while the config sets
tool.byok.enabled: false; either change the comment to indicate the feature is
intentionally disabled (e.g., update the line near the tool/byok block to state
the default is false or that this base config disables BYOK) or change the value
to true to match the comment (set tool.byok.enabled: true); update only the
comment or only the boolean so they are consistent and leave the surrounding
YAML structure unchanged.
In `@src/app/endpoints/query.py`:
- Around line 163-167: The code mutates query_request.query by appending
rag_context.context_text; instead, avoid changing the model: after making the
deep copy (or even without copying), construct a separate retrieval string like
retrieval_query = query_request.query + rag_context.context_text and use
retrieval_query for retrieval/search/embedding calls only (where you currently
rely on the mutated value), leaving query_request.query intact for later
operations such as topic summary generation and storing the original user query;
update calls that currently read query_request.query for retrieval to use
retrieval_query instead and ensure persisted/stored values continue to use the
original query_request.query.
In `@src/utils/responses.py`:
- Around line 205-207: prepare_tools is calling vector_stores.list() via
get_vector_store_ids even when the feature toggle rag.tool.byok.enabled is
false, causing unnecessary failures; change the logic so that when
rag.tool.byok.enabled is False you skip any vector store listing and simply
return the passed-in vector_store_ids (or an empty list) instead of calling
vector_stores.list(); update either prepare_tools to short-circuit before
calling get_vector_store_ids or modify get_vector_store_ids to check the
rag.tool.byok.enabled flag and avoid calling vector_stores.list() when the flag
is disabled (referencing prepare_tools, get_vector_store_ids,
vector_stores.list, and rag.tool.byok.enabled to locate the code).
- Around line 125-126: The current truthiness check "if vector_store_ids:"
treats an explicit empty list as False and causes the code to fall through and
fetch all vector stores; change the condition to test for presence explicitly
(e.g., check for None) so that an empty list is preserved and returned. Locate
the check around the vector_store_ids variable in src/utils/responses.py and
replace the truthy conditional with an explicit None check (use
"vector_store_ids is not None" or equivalent) so the function returns the
provided vector_store_ids even when it's an empty list.
---
Nitpick comments:
In `@src/models/config.py`:
- Around line 1730-1740: Replace the unnecessary lambda wrappers used as
default_factory for SolrRagConfiguration and ByokRagConfiguration by passing the
classes themselves (use default_factory=SolrRagConfiguration and
default_factory=ByokRagConfiguration) in the Field definitions for the solr and
byok attributes; also apply the same simplification for any other fields that
use no-arg constructors (e.g., the similar fields around the RAG config group),
but keep the lambda where a constructor requires arguments (such as the field
that calls a class with enabled=True).
In `@tests/unit/test_llama_stack_configuration.py`:
- Around line 362-451: Add a unit test to cover duplicate embedding model
detection in enrich_solr: create ls_config with "registered_resources":
{"models": [{"model_id": "existing", "provider_model_id":
"all-mpnet-base-v2"}]}, call enrich_solr(ls_config, {"enabled": True}), then
assert that the list of provider_model_id values in
ls_config["registered_resources"]["models"] contains "all-mpnet-base-v2" only
once (count == 1); reference enrich_solr and the
existing_model_ids/provider_model_id logic to locate where duplicates are
checked.
In `@tests/unit/utils/test_responses.py`:
- Around line 2248-2275: Add a regression test that verifies passing an explicit
empty list to get_vector_store_ids returns an empty list and does not call the
API: create an AsyncMock client (client_mock), call result = await
get_vector_store_ids(client_mock, []), assert result == [], and assert
client_mock.vector_stores.list.assert_not_called(); name the test clearly (e.g.,
test_returns_empty_for_explicit_empty_list_no_fetch) and place it alongside the
existing tests in the same test module.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- docs/byok_guide.md
- docs/config.md
- docs/rag_guide.md
- examples/lightspeed-stack-byok-rag.yaml
- lightspeed-stack.yaml
- run.yaml
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/configuration.py
- src/constants.py
- src/llama_stack_configuration.py
- src/models/config.py
- src/utils/responses.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/models/config/test_byok_rag.py
- tests/unit/models/config/test_dump_configuration.py
- tests/unit/models/responses/test_rag_chunk.py
- tests/unit/test_configuration.py
- tests/unit/test_llama_stack_configuration.py
- tests/unit/utils/test_responses.py
```diff
 curl -sX POST http://localhost:8080/v1/query \
   -H "Content-Type: application/json" \
-  -d '{"query" : "how do I secure a nodejs application with keycloak?", "no_tools":true}' | jq .
+  -d '{"query" : "how do I secure a nodejs application with keycloak?"}' | jq .
```
Add a language to the fenced code block.
This block is missing a language specifier and triggers markdownlint MD040.
🛠️ Proposed fix
````diff
-```
+```bash
 curl -sX POST http://localhost:8080/v1/query \
   -H "Content-Type: application/json" \
   -d '{"query" : "how do I secure a nodejs application with keycloak?"}' | jq .
````

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 335-335: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/rag_guide.md` around lines 335 - 339, The fenced code block in
docs/rag_guide.md is missing a language specifier causing markdownlint MD040;
update the block start to include a language (e.g., bash) so the snippet begins
with ```bash and leave the contents unchanged—this addresses the lint warning
for the curl POST example used to call /v1/query.
```yaml
tool:
  byok:
    # Default is true for backward compatibility
    enabled: false
```
Comment contradicts the configured value.
The comment states "Default is true for backward compatibility" but the value is set to false. Either update the comment to reflect that this base config intentionally disables the feature, or change the value to true to match the stated backward compatibility intent.
Suggested fix
Option 1 - Keep disabled, fix comment:
```diff
 tool:
   byok:
-    # Default is true for backward compatibility
+    # Disabled in base config; defaults to true when not specified
     enabled: false
```

Option 2 - Enable for backward compatibility:

```diff
 tool:
   byok:
     # Default is true for backward compatibility
-    enabled: false
+    enabled: true
```
+ enabled: true📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tool: | |
| byok: | |
| # Default is true for backward compatibility | |
| enabled: false | |
| tool: | |
| byok: | |
| # Disabled in base config; defaults to true when not specified | |
| enabled: false |
| tool: | |
| byok: | |
| # Default is true for backward compatibility | |
| enabled: false | |
| tool: | |
| byok: | |
| # Default is true for backward compatibility | |
| enabled: true |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lightspeed-stack.yaml` around lines 46 - 49, The YAML comment and value
disagree for the BYOK flag: the comment says "Default is true for backward
compatibility" while the config sets tool.byok.enabled: false; either change the
comment to indicate the feature is intentionally disabled (e.g., update the line
near the tool/byok block to state the default is false or that this base config
disables BYOK) or change the value to true to match the comment (set
tool.byok.enabled: true); update only the comment or only the boolean so they
are consistent and leave the surrounding YAML structure unchanged.
```python
    if rag_context.context_text:
        # Mutate a local copy to avoid surprising other logic
        query_request = query_request.model_copy(deep=True)
        query_request.query = query_request.query + rag_context.context_text
```
Do not overwrite query_request.query with injected RAG context.
This mutation propagates into downstream uses of query_request.query (e.g., Line 219 topic summary and Line 249 stored query), so non-user context gets persisted as if it were user input.
🛠️ Proposed fix
```diff
-    if rag_context.context_text:
-        # Mutate a local copy to avoid surprising other logic
-        query_request = query_request.model_copy(deep=True)
-        query_request.query = query_request.query + rag_context.context_text
+    original_query = query_request.query
+    llm_query_request = query_request
+    if rag_context.context_text:
+        llm_query_request = query_request.model_copy(deep=True)
+        llm_query_request.query = query_request.query + rag_context.context_text
@@
-    responses_params = await prepare_responses_params(
+    responses_params = await prepare_responses_params(
         client,
-        query_request,
+        llm_query_request,
         user_conversation,
         token,
@@
-    topic_summary = await get_topic_summary(
-        query_request.query, client, responses_params.model
-    )
+    topic_summary = await get_topic_summary(
+        original_query, client, responses_params.model
+    )
@@
-        query=query_request.query,
+        query=original_query,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/query.py` around lines 163 - 167, The code mutates
query_request.query by appending rag_context.context_text; instead, avoid
changing the model: after making the deep copy (or even without copying),
construct a separate retrieval string like retrieval_query = query_request.query
+ rag_context.context_text and use retrieval_query for
retrieval/search/embedding calls only (where you currently rely on the mutated
value), leaving query_request.query intact for later operations such as topic
summary generation and storing the original user query; update calls that
currently read query_request.query for retrieval to use retrieval_query instead
and ensure persisted/stored values continue to use the original
query_request.query.
that is actually pretty accurate
@tisnik, thank you for confirming! The mutation of query_request.query with the injected RAG context could indeed cause the enriched query (rather than the original user input) to be persisted in the topic summary and stored query records.
If you'd like assistance implementing the fix or would like me to create code changes for this issue, just let me know! The suggested approach of maintaining original_query separately and using llm_query_request for the LLM call should preserve the original user query for storage and topic generation.
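A minimal sketch of the suggested separation (hypothetical helper name, not the actual endpoint code):

```python
def split_queries(user_query: str, context_text: str) -> tuple[str, str]:
    """Return (stored_query, llm_query): the original user input is kept for
    persistence and topic-summary generation, while only the LLM call sees
    the RAG-enriched variant."""
    llm_query = user_query + context_text if context_text else user_query
    return user_query, llm_query
```

The stored value and the topic summary then always reflect what the user actually typed.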
🧠 Learnings used
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
```python
    if vector_store_ids:
        return vector_store_ids
```
Preserve explicit empty-list semantics for vector_store_ids.
Using truthiness here makes [] fall through to “fetch all vector stores”, which can unintentionally broaden retrieval scope.
🛠️ Proposed fix
```diff
-    if vector_store_ids:
+    if vector_store_ids is not None:
         return vector_store_ids
```
return vector_store_ids📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if vector_store_ids: | |
| return vector_store_ids | |
| if vector_store_ids is not None: | |
| return vector_store_ids |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 125 - 126, The current truthiness check
"if vector_store_ids:" treats an explicit empty list as False and causes the
code to fall through and fetch all vector stores; change the condition to test
for presence explicitly (e.g., check for None) so that an empty list is
preserved and returned. Locate the check around the vector_store_ids variable in
src/utils/responses.py and replace the truthy conditional with an explicit None
check (use "vector_store_ids is not None" or equivalent) so the function returns
the provided vector_store_ids even when it's an empty list.
again, this might be relevant
🧩 Analysis chain
@tisnik, good catch! You're right that this pattern might appear elsewhere. Let me search for similar truthiness checks on optional list/collection parameters that could have the same issue.
🏁 Script executed:
```bash
#!/bin/bash
# Search for similar patterns where optional list parameters use truthiness checks
# Pattern: "if <param>:" where param is likely Optional[list[...]]

# First, find functions with Optional list parameters
echo "=== Functions with Optional[list[...]] parameters ==="
rg -n "Optional\[list\[" --type=py -A 5 | head -50

echo -e "\n=== Truthiness checks that might affect optional lists ==="
# Look for "if <identifier>:" patterns in the relevant files
rg -n "^\s+if [a-z_]+:" --type=py src/utils/responses.py src/app/endpoints/ src/utils/vector_search.py -B 3 -A 1
```
Length of output: 13082
Let me check the implementation of the _fetch_byok_rag function that also has an optional vector_store_ids parameter:
🏁 Script executed:
```bash
#!/bin/bash
# Check the _fetch_byok_rag function implementation for similar patterns
echo "=== _fetch_byok_rag function implementation ==="
rg -n -A 30 "async def _fetch_byok_rag" src/utils/vector_search.py

echo -e "\n=== Looking for truthiness checks on vector_store_ids parameter in vector_search.py ==="
rg -n "^\s+if vector_store_ids" src/utils/vector_search.py -B 5 -A 2
```
Length of output: 1746
@tisnik, you're absolutely right to flag this! I've searched the codebase and this is indeed the primary instance where an optional list parameter is checked with truthiness in this PR.
The semantic issue:
- `vector_store_ids=None` → should fetch all vector stores (default behavior)
- `vector_store_ids=[]` → should use no vector stores (explicit empty list)
- Current code: treats both cases as "fetch all" ❌
The bot's suggested fix is correct:
```python
    if vector_store_ids is not None:
        return vector_store_ids
```

This preserves the explicit empty-list semantics, so callers can pass `[]` to mean "no vector stores" vs `None` to mean "fetch all".
Other findings:
- Line 435 in `src/utils/vector_search.py` also has `if vector_store_ids:`, but that's checking a local variable returned from a function call, not a parameter, so it's a different situation.
- The `_fetch_byok_rag` function has the same parameter type but doesn't appear to have this truthiness issue in its implementation.
Recommendation: Apply the bot's fix at line 125. This is a correctness issue that could cause unexpected behavior if a caller explicitly passes an empty list.
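The semantics under discussion can be sketched with a hypothetical synchronous helper (the real function is async and talks to the llama-stack client):

```python
from typing import Callable, Optional


def resolve_vector_store_ids(
    vector_store_ids: Optional[list[str]],
    fetch_all: Callable[[], list[str]],
) -> list[str]:
    """None means "use all stores"; an explicit [] means "use no stores"."""
    if vector_store_ids is not None:  # not `if vector_store_ids:`
        return vector_store_ids
    return fetch_all()
```

With the truthy check, the first case below would incorrectly fall through to `fetch_all()`.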
🧠 Learnings used
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
```python
    # Get vector stores for RAG tools - use specified ones or fetch all
    vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
```
Skip vector store listing when Tool RAG is disabled.
prepare_tools currently calls vector_stores.list() even when rag.tool.byok.enabled is false. That can fail the whole request for a feature that is disabled.
🛠️ Proposed fix
```diff
-    # Get vector stores for RAG tools - use specified ones or fetch all
-    vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
-
-    # Add RAG tools if vector stores are available
-    rag_tools = get_rag_tools(vector_store_ids)
-    if rag_tools:
-        toolgroups.extend(rag_tools)
+    # Add RAG tools only when Tool RAG is enabled
+    if configuration.rag.tool.byok.enabled:
+        resolved_vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
+        rag_tools = get_rag_tools(resolved_vector_store_ids)
+        if rag_tools:
+            toolgroups.extend(rag_tools)
```
+ toolgroups.extend(rag_tools)Also applies to: 383-385
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 205 - 207, prepare_tools is calling
vector_stores.list() via get_vector_store_ids even when the feature toggle
rag.tool.byok.enabled is false, causing unnecessary failures; change the logic
so that when rag.tool.byok.enabled is False you skip any vector store listing
and simply return the passed-in vector_store_ids (or an empty list) instead of
calling vector_stores.list(); update either prepare_tools to short-circuit
before calling get_vector_store_ids or modify get_vector_store_ids to check the
rag.tool.byok.enabled flag and avoid calling vector_stores.list() when the flag
is disabled (referencing prepare_tools, get_vector_store_ids,
vector_stores.list, and rag.tool.byok.enabled to locate the code).
Irrelevant IMO. Solved in get_rag_tools()
@asimurka, understood! You're right that get_rag_tools() checks configuration.rag.tool.byok.enabled and returns None when disabled, so the vector store IDs won't be used.
While there's a minor efficiency consideration (calling vector_stores.list() when the result won't be used), if you're comfortable with the current flow, that works.
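For reference, the short-circuit the bot proposed could look like this (illustrative synchronous sketch with hypothetical names; the real code is async and reads the flag from configuration):

```python
from typing import Callable, Optional


def prepare_rag_store_ids(
    byok_enabled: bool,
    vector_store_ids: Optional[list[str]],
    fetch_all: Callable[[], list[str]],
) -> list[str]:
    """Skip the vector-store listing entirely when Tool RAG is disabled."""
    if not byok_enabled:
        return []  # feature off: no remote listing call is made
    if vector_store_ids is not None:
        return vector_store_ids  # caller's explicit choice, even if empty
    return fetch_all()  # default: fetch all available stores
```

The trade-off is the one discussed above: the current flow is correct because `get_rag_tools()` checks the flag, but it still pays for the listing call.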
🧠 Learnings used
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
tisnik left a comment:

Overall LGTM, nice work!
asimurka left a comment:

Great work, LGTM overall. I suggested some minor improvements and there are also things worth a discussion.
```python
)
from llama_stack_api.openai_responses import (
    OpenAIResponseOutputMessageMCPCall as MCPCall,
)
```
Merge this into a single import, please.
| """ | ||
|
|
||
| always: AlwaysRagConfiguration = Field( | ||
| default_factory=lambda: AlwaysRagConfiguration(), # pylint: disable=unnecessary-lambda |
I think this could be done like `default_factory=AlwaysRagConfiguration` without using a lambda.
```python
)
from llama_stack_api.openai_responses import (
    OpenAIResponseUsage as ResponseUsage,
    OpenAIResponseInputTool as InputTool,
```
```python
    if vector_store_ids:
        return vector_store_ids
```
| """ | ||
| # Check if Tool RAG is enabled in configuration | ||
| if not (configuration and configuration.rag.tool.byok.enabled): | ||
| return None |
Worth a discussion: returning an empty list here would make more sense. I see it does not affect current logic but possibly can in the future (responses endpoint). By returning `None` you signal "overwrite by config" and by an empty list "do not use vector stores".
```python
    # Get vector stores for RAG tools - use specified ones or fetch all
    vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
```
Irrelevant IMO. Solved in get_rag_tools()
@are-ces @tisnik Ideally we should try this feature from the Ansible side before it is merged, just to detect early any incompatibilities or problems, so new changes can still be added as part of the PR. On the other hand, it may take a while, so if you have urgency on merging this, we are also fine with merging, but if any changes are necessary once tested, then you'll have to create new PRs and potentially new minor releases. I am busy on a top-priority customer issue, and @TamiTakamiya is also quite busy I think, but I can guarantee that at some point during next week I can give this new feature a try for our Ansible use-case. Let's keep in touch!
I am trying to set up a test environment on my CRC with this, #1179, and lightspeed-providers#52. Right now I am still unable to make authentication work (it should work with #1179, though)... Hopefully I will be able to find a solution soon.
TL;DR
Motivation
To prioritize BYOK content, a mechanism to tune chunk scoring per vector store is necessary. The alternative was to create a client-side tool to change the behavior of the RAG tool, but that is not optimal for two reasons:
Always RAG with LCORE-side chunk prioritization avoids both issues.
Description
Adds configurable RAG strategies and chunk prioritization. Control which documentation sources to search, how to search them, and which sources are more important.
How It Works
Always RAG retrieves chunks from configured sources (BYOK and/or Solr) and injects them into the query before sending to the LLM. The LLM always has documentation context without calling tools.
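The prioritization flow described in this section can be sketched as a weighted merge (hypothetical data shapes; the actual implementation works on the chunk objects returned by each vector store):

```python
def merge_weighted_chunks(results_by_source, score_multipliers, top_n):
    """Multiply each chunk's score by its source's multiplier, merge all
    sources, sort by weighted score descending, and keep the top N."""
    merged = []
    for source, chunks in results_by_source.items():
        weight = score_multipliers.get(source, 1.0)
        for content, score in chunks:
            merged.append((content, score * weight, source))
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged[:top_n]
```

With a 2.0 multiplier on a BYOK source, a chunk scoring 0.8 (weighted to 1.6) outranks a 0.9 chunk from an unweighted source.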
Tool RAG lets the AI call the `file_search` tool during generation. This is the original behavior, enabled by default for backward compatibility.

Chunk Prioritization applies a `score_multiplier` per BYOK source. All sources are queried in parallel, chunk scores are multiplied by their source's weight, then merged and sorted. The top N chunks are selected across all sources. Solr chunks are appended after BYOK chunks without cross-source ranking (TBD, needs discussion / spike).

Configuration
In `lightspeed-stack.yaml`:

Chunk limits in `src/constants.py`:

Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Prerequisites: BYOK vector stores (FAISS) created with rag-content tool, OKP Solr instance, Llama Stack 0.4.3+, Lightspeed Stack Providers installed, OpenAI API key.
1. Configure chunk limits in `src/constants.py`.
2. Configure Lightspeed Stack - add BYOK RAG sources and the RAG strategy to `lightspeed-stack.yaml`.
3. Run the enrichment script - it reads the lightspeed-stack config and generates an enriched llama-stack `run.yaml` with BYOK vector stores and the Solr provider registered.
4. Install Lightspeed Stack Providers.
5. Start Llama Stack.
6. Start Lightspeed Stack.
7. Test query - send to `/v1/query` or the streaming endpoint.

Response:
{ "conversation_id": "...", "response": "Admission plugins in Red Hat OpenShift Container Platform are [...]", "rag_chunks": [ { "content": "...", "source": "openshift-docs-part1", "score": 1.0038, "attributes": { "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt", "title": "Admission plugins", "document_id": "file-7a70ef22c4a646f2a6f657c66961ba2c" } }, { "content": "...", "source": "openshift-docs-part2", "score": 0.926, "attributes": { "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt", "title": "Overview of dynamic plugins", "document_id": "file-b266f575a95a4da19d7ba058fd980f00" } }, { "content": "...", "source": "OKP Solr", "score": 63.996, "attributes": { "document_id": "/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index" } } ], "referenced_documents": [ { "doc_title": "Admission plugins", "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt", "source": "openshift-docs-part1" }, { "doc_title": "Overview of dynamic plugins", "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt", "source": "openshift-docs-part2" }, { "doc_title": null, "doc_url": "https://mimir.corp.redhat.com/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index", "source": "OKP Solr" } ], "truncated": false, "input_tokens": 3736, "output_tokens": 448 }Summary by CodeRabbit
Release Notes
New Features
Documentation