LCORE-86: Prioritize BYOK content over built-in content #1208
are-ces wants to merge 1 commit into lightspeed-core:main
Conversation
Walkthrough

This PR restructures the RAG system to support two independent operational modes: Always RAG (pre-query context injection) and Tool RAG (on-demand retrieval via tools). Configuration is reorganized from a Solr-based layout to a hierarchical Rag structure with Always and Tool subsections. A new RAGContext model encapsulates pre-query context data. Multiple utilities are refactored to support both BYOK vector stores and Solr OKP integration with weighted relevance scoring.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 3 passed
Chunks from Solr did not include [...]. Printout of the Solr chunk structure returned:

```python
Chunk(
    chunk_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
    content = '# Chapter 9. Admission plugins\n\n\n\n\nAdmission plugins are used to help regulate how [..]',
    chunk_metadata = ChunkChunkMetadata(
        chunk_embedding_dimension = None,
        chunk_embedding_model = None,
        chunk_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index_chunk_89',
        chunk_tokenizer = None,
        chunk_window = None,
        content_token_count = None,
        created_timestamp = None,
        document_id = '/documentation/en-us/openshift_container_platform/4.20/html-single/architecture/index',
        metadata_token_count = None,
        source = None,
        updated_timestamp = None,
    ),
    embedding = [],
    metadata = {},
    embedding_model = 'sentence-transformers/ibm-granite/granite-embedding-30m-english',
    embedding_dimension = 384,
)
```
- Add configurable RAG strategies: always RAG, performed at each query (OKP Solr + BYOK), and tool RAG; they can be used independently or together
- Add chunk prioritization with score multipliers per vector store for always RAG
- Add knobs in config to select the RAG strategy
- Tool RAG defaults to enabled=True for backward compatibility
- Update the lightspeed stack configuration enrichment script to build the solr section in llama stack and fix bugs in building the vector stores
- Update BYOK and RAG documentation
- Update unit tests
Actionable comments posted: 5
🧹 Nitpick comments (3)
tests/unit/utils/test_responses.py (1)
2248-2275: Add an explicit regression test for `vector_store_ids=[]`.

Current cases cover `None` and non-empty lists, but not an explicit empty list. A dedicated test will prevent accidental “fetch all stores” behavior regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/utils/test_responses.py` around lines 2248 - 2275, Add a regression test that verifies passing an explicit empty list to get_vector_store_ids returns an empty list and does not call the API: create an AsyncMock client (client_mock), call result = await get_vector_store_ids(client_mock, []), assert result == [], and assert client_mock.vector_stores.list.assert_not_called(); name the test clearly (e.g., test_returns_empty_for_explicit_empty_list_no_fetch) and place it alongside the existing tests in the same test module.

tests/unit/test_llama_stack_configuration.py (1)
362-451: Good test coverage for the `enrich_solr` function.

The tests comprehensively cover the main scenarios: disabled state, empty config, adding providers/stores/models, and preserving existing configuration. The duplicate detection tests for providers and vector stores are present.

Consider adding a test for duplicate embedding model detection to match the logic in `enrich_solr` (lines 470-473 in src/llama_stack_configuration.py), which checks `existing_model_ids` before adding.

Optional test for duplicate embedding model

```python
def test_enrich_solr_skips_duplicate_embedding_model() -> None:
    """Test enrich_solr does not add duplicate embedding model."""
    ls_config: dict[str, Any] = {
        "registered_resources": {
            "models": [{"model_id": "existing", "provider_model_id": "all-mpnet-base-v2"}]
        }
    }
    enrich_solr(ls_config, {"enabled": True})
    model_ids = [
        m["provider_model_id"] for m in ls_config["registered_resources"]["models"]
    ]
    assert model_ids.count("all-mpnet-base-v2") == 1
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_llama_stack_configuration.py` around lines 362 - 451, Add a unit test to cover duplicate embedding model detection in enrich_solr: create ls_config with "registered_resources": {"models": [{"model_id": "existing", "provider_model_id": "all-mpnet-base-v2"}]}, call enrich_solr(ls_config, {"enabled": True}), then assert that the list of provider_model_id values in ls_config["registered_resources"]["models"] contains "all-mpnet-base-v2" only once (count == 1); reference enrich_solr and the existing_model_ids/provider_model_id logic to locate where duplicates are checked.

src/models/config.py (1)
1730-1740: Minor: Lambda wrappers in `default_factory` are unnecessary.

The `# pylint: disable=unnecessary-lambda` comments indicate these lambdas could be simplified. Since these are class constructors with no arguments, you can pass the class directly.

Optional simplification

```diff
-    solr: SolrRagConfiguration = Field(
-        default_factory=lambda: SolrRagConfiguration(),  # pylint: disable=unnecessary-lambda
+    solr: SolrRagConfiguration = Field(
+        default_factory=SolrRagConfiguration,
         title="Solr RAG configuration",
         description="Configuration for Solr RAG (pre-query).",
     )
-    byok: ByokRagConfiguration = Field(
-        default_factory=lambda: ByokRagConfiguration(),  # pylint: disable=unnecessary-lambda
+    byok: ByokRagConfiguration = Field(
+        default_factory=ByokRagConfiguration,
         title="BYOK RAG configuration",
         description="Configuration for BYOK RAG (pre-query).",
     )
```

The same applies to lines 1764-1773. However, line 1750 requires the lambda since it passes `enabled=True`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/models/config.py` around lines 1730 - 1740, Replace the unnecessary lambda wrappers used as default_factory for SolrRagConfiguration and ByokRagConfiguration by passing the classes themselves (use default_factory=SolrRagConfiguration and default_factory=ByokRagConfiguration) in the Field definitions for the solr and byok attributes; also apply the same simplification for any other fields that use no-arg constructors (e.g., the similar fields around the RAG config group), but keep the lambda where a constructor requires arguments (such as the field that calls a class with enabled=True).
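The same pattern can be illustrated with stdlib dataclasses (hypothetical class names; pydantic's `Field(default_factory=...)` accepts a no-argument callable in exactly the same way):

```python
from dataclasses import dataclass, field


@dataclass
class SolrRagConfiguration:
    enabled: bool = False


@dataclass
class RagGroup:
    # Passing the class itself works: field() calls it with no arguments.
    solr: SolrRagConfiguration = field(default_factory=SolrRagConfiguration)
    # A lambda remains necessary only when the constructor needs arguments.
    byok: SolrRagConfiguration = field(
        default_factory=lambda: SolrRagConfiguration(enabled=True)
    )


cfg = RagGroup()
```

Each instantiation gets a fresh default object, which is why a factory (rather than a shared default instance) is used in the first place.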
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/rag_guide.md`:
- Around line 335-339: The fenced code block in docs/rag_guide.md is missing a
language specifier causing markdownlint MD040; update the block start to include
a language (e.g., bash) so the snippet begins with ```bash and leave the
contents unchanged—this addresses the lint warning for the curl POST example
used to call /v1/query.
In `@lightspeed-stack.yaml`:
- Around line 46-49: The YAML comment and value disagree for the BYOK flag: the
comment says "Default is true for backward compatibility" while the config sets
tool.byok.enabled: false; either change the comment to indicate the feature is
intentionally disabled (e.g., update the line near the tool/byok block to state
the default is false or that this base config disables BYOK) or change the value
to true to match the comment (set tool.byok.enabled: true); update only the
comment or only the boolean so they are consistent and leave the surrounding
YAML structure unchanged.
In `@src/app/endpoints/query.py`:
- Around line 163-167: The code mutates query_request.query by appending
rag_context.context_text; instead, avoid changing the model: after making the
deep copy (or even without copying), construct a separate retrieval string like
retrieval_query = query_request.query + rag_context.context_text and use
retrieval_query for retrieval/search/embedding calls only (where you currently
rely on the mutated value), leaving query_request.query intact for later
operations such as topic summary generation and storing the original user query;
update calls that currently read query_request.query for retrieval to use
retrieval_query instead and ensure persisted/stored values continue to use the
original query_request.query.
In `@src/utils/responses.py`:
- Around line 205-207: prepare_tools is calling vector_stores.list() via
get_vector_store_ids even when the feature toggle rag.tool.byok.enabled is
false, causing unnecessary failures; change the logic so that when
rag.tool.byok.enabled is False you skip any vector store listing and simply
return the passed-in vector_store_ids (or an empty list) instead of calling
vector_stores.list(); update either prepare_tools to short-circuit before
calling get_vector_store_ids or modify get_vector_store_ids to check the
rag.tool.byok.enabled flag and avoid calling vector_stores.list() when the flag
is disabled (referencing prepare_tools, get_vector_store_ids,
vector_stores.list, and rag.tool.byok.enabled to locate the code).
- Around line 125-126: The current truthiness check "if vector_store_ids:"
treats an explicit empty list as False and causes the code to fall through and
fetch all vector stores; change the condition to test for presence explicitly
(e.g., check for None) so that an empty list is preserved and returned. Locate
the check around the vector_store_ids variable in src/utils/responses.py and
replace the truthy conditional with an explicit None check (use
"vector_store_ids is not None" or equivalent) so the function returns the
provided vector_store_ids even when it's an empty list.
---
Nitpick comments:
In `@src/models/config.py`:
- Around line 1730-1740: Replace the unnecessary lambda wrappers used as
default_factory for SolrRagConfiguration and ByokRagConfiguration by passing the
classes themselves (use default_factory=SolrRagConfiguration and
default_factory=ByokRagConfiguration) in the Field definitions for the solr and
byok attributes; also apply the same simplification for any other fields that
use no-arg constructors (e.g., the similar fields around the RAG config group),
but keep the lambda where a constructor requires arguments (such as the field
that calls a class with enabled=True).
In `@tests/unit/test_llama_stack_configuration.py`:
- Around line 362-451: Add a unit test to cover duplicate embedding model
detection in enrich_solr: create ls_config with "registered_resources":
{"models": [{"model_id": "existing", "provider_model_id":
"all-mpnet-base-v2"}]}, call enrich_solr(ls_config, {"enabled": True}), then
assert that the list of provider_model_id values in
ls_config["registered_resources"]["models"] contains "all-mpnet-base-v2" only
once (count == 1); reference enrich_solr and the
existing_model_ids/provider_model_id logic to locate where duplicates are
checked.
In `@tests/unit/utils/test_responses.py`:
- Around line 2248-2275: Add a regression test that verifies passing an explicit
empty list to get_vector_store_ids returns an empty list and does not call the
API: create an AsyncMock client (client_mock), call result = await
get_vector_store_ids(client_mock, []), assert result == [], and assert
client_mock.vector_stores.list.assert_not_called(); name the test clearly (e.g.,
test_returns_empty_for_explicit_empty_list_no_fetch) and place it alongside the
existing tests in the same test module.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- docs/byok_guide.md
- docs/config.md
- docs/rag_guide.md
- examples/lightspeed-stack-byok-rag.yaml
- lightspeed-stack.yaml
- run.yaml
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/configuration.py
- src/constants.py
- src/llama_stack_configuration.py
- src/models/config.py
- src/utils/responses.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/models/config/test_byok_rag.py
- tests/unit/models/config/test_dump_configuration.py
- tests/unit/models/responses/test_rag_chunk.py
- tests/unit/test_configuration.py
- tests/unit/test_llama_stack_configuration.py
- tests/unit/utils/test_responses.py
```diff
 curl -sX POST http://localhost:8080/v1/query \
   -H "Content-Type: application/json" \
-  -d '{"query" : "how do I secure a nodejs application with keycloak?", "no_tools":true}' | jq .
+  -d '{"query" : "how do I secure a nodejs application with keycloak?"}' | jq .
```
Add a language to the fenced code block.
This block is missing a language specifier and triggers markdownlint MD040.
🛠️ Proposed fix
````diff
-```
+```bash
 curl -sX POST http://localhost:8080/v1/query \
   -H "Content-Type: application/json" \
   -d '{"query" : "how do I secure a nodejs application with keycloak?"}' | jq .
````

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 335-335: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/rag_guide.md` around lines 335 - 339, The fenced code block in
docs/rag_guide.md is missing a language specifier causing markdownlint MD040;
update the block start to include a language (e.g., bash) so the snippet begins
with ```bash and leave the contents unchanged—this addresses the lint warning
for the curl POST example used to call /v1/query.
```yaml
tool:
  byok:
    # Default is true for backward compatibility
    enabled: false
```
Comment contradicts the configured value.
The comment states "Default is true for backward compatibility" but the value is set to false. Either update the comment to reflect that this base config intentionally disables the feature, or change the value to true to match the stated backward compatibility intent.
Suggested fix
Option 1 - Keep disabled, fix comment:
```diff
 tool:
   byok:
-    # Default is true for backward compatibility
+    # Disabled in base config; defaults to true when not specified
     enabled: false
```

Option 2 - Enable for backward compatibility:

```diff
 tool:
   byok:
     # Default is true for backward compatibility
-    enabled: false
+    enabled: true
```
+ enabled: true📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tool: | |
| byok: | |
| # Default is true for backward compatibility | |
| enabled: false | |
| tool: | |
| byok: | |
| # Disabled in base config; defaults to true when not specified | |
| enabled: false |
| tool: | |
| byok: | |
| # Default is true for backward compatibility | |
| enabled: false | |
| tool: | |
| byok: | |
| # Default is true for backward compatibility | |
| enabled: true |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lightspeed-stack.yaml` around lines 46 - 49, The YAML comment and value
disagree for the BYOK flag: the comment says "Default is true for backward
compatibility" while the config sets tool.byok.enabled: false; either change the
comment to indicate the feature is intentionally disabled (e.g., update the line
near the tool/byok block to state the default is false or that this base config
disables BYOK) or change the value to true to match the comment (set
tool.byok.enabled: true); update only the comment or only the boolean so they
are consistent and leave the surrounding YAML structure unchanged.
```python
    if rag_context.context_text:
        # Mutate a local copy to avoid surprising other logic
        query_request = query_request.model_copy(deep=True)
        query_request.query = query_request.query + rag_context.context_text
```
Do not overwrite query_request.query with injected RAG context.
This mutation propagates into downstream uses of query_request.query (e.g., Line 219 topic summary and Line 249 stored query), so non-user context gets persisted as if it were user input.
🛠️ Proposed fix
```diff
-    if rag_context.context_text:
-        # Mutate a local copy to avoid surprising other logic
-        query_request = query_request.model_copy(deep=True)
-        query_request.query = query_request.query + rag_context.context_text
+    original_query = query_request.query
+    llm_query_request = query_request
+    if rag_context.context_text:
+        llm_query_request = query_request.model_copy(deep=True)
+        llm_query_request.query = query_request.query + rag_context.context_text
@@
-    responses_params = await prepare_responses_params(
+    responses_params = await prepare_responses_params(
         client,
-        query_request,
+        llm_query_request,
         user_conversation,
         token,
@@
-    topic_summary = await get_topic_summary(
-        query_request.query, client, responses_params.model
-    )
+    topic_summary = await get_topic_summary(
+        original_query, client, responses_params.model
+    )
@@
-        query=query_request.query,
+        query=original_query,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/query.py` around lines 163 - 167, The code mutates
query_request.query by appending rag_context.context_text; instead, avoid
changing the model: after making the deep copy (or even without copying),
construct a separate retrieval string like retrieval_query = query_request.query
+ rag_context.context_text and use retrieval_query for
retrieval/search/embedding calls only (where you currently rely on the mutated
value), leaving query_request.query intact for later operations such as topic
summary generation and storing the original user query; update calls that
currently read query_request.query for retrieval to use retrieval_query instead
and ensure persisted/stored values continue to use the original
query_request.query.
that is actually pretty accurate
@tisnik, thank you for confirming! The mutation of query_request.query with the injected RAG context could indeed cause the enriched query (rather than the original user input) to be persisted in the topic summary and stored query records.
If you'd like assistance implementing the fix or would like me to create code changes for this issue, just let me know! The suggested approach of maintaining original_query separately and using llm_query_request for the LLM call should preserve the original user query for storage and topic generation.
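A minimal sketch of the suggested separation (hypothetical helper name, not the actual endpoint code):

```python
def split_queries(user_query: str, context_text: str) -> tuple[str, str]:
    """Return (stored_query, llm_query): the original user input is kept for
    persistence and topic-summary generation, while only the LLM call sees
    the RAG-enriched variant."""
    llm_query = user_query + context_text if context_text else user_query
    return user_query, llm_query
```

The stored value and the topic summary then always reflect what the user actually typed.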
🧠 Learnings used
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
```python
    if vector_store_ids:
        return vector_store_ids
```
Preserve explicit empty-list semantics for vector_store_ids.
Using truthiness here makes [] fall through to “fetch all vector stores”, which can unintentionally broaden retrieval scope.
🛠️ Proposed fix
```diff
-    if vector_store_ids:
+    if vector_store_ids is not None:
         return vector_store_ids
```
return vector_store_ids📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if vector_store_ids: | |
| return vector_store_ids | |
| if vector_store_ids is not None: | |
| return vector_store_ids |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 125 - 126, The current truthiness check
"if vector_store_ids:" treats an explicit empty list as False and causes the
code to fall through and fetch all vector stores; change the condition to test
for presence explicitly (e.g., check for None) so that an empty list is
preserved and returned. Locate the check around the vector_store_ids variable in
src/utils/responses.py and replace the truthy conditional with an explicit None
check (use "vector_store_ids is not None" or equivalent) so the function returns
the provided vector_store_ids even when it's an empty list.
again, this might be relevant
🧩 Analysis chain
@tisnik, good catch! You're right that this pattern might appear elsewhere. Let me search for similar truthiness checks on optional list/collection parameters that could have the same issue.
🏁 Script executed:
```bash
#!/bin/bash
# Search for similar patterns where optional list parameters use truthiness checks
# Pattern: "if <param>:" where param is likely Optional[list[...]]

# First, find functions with Optional list parameters
echo "=== Functions with Optional[list[...]] parameters ==="
rg -n "Optional\[list\[" --type=py -A 5 | head -50

echo -e "\n=== Truthiness checks that might affect optional lists ==="
# Look for "if <identifier>:" patterns in the relevant files
rg -n "^\s+if [a-z_]+:" --type=py src/utils/responses.py src/app/endpoints/ src/utils/vector_search.py -B 3 -A 1
```
Length of output: 13082
Let me check the implementation of the _fetch_byok_rag function that also has an optional vector_store_ids parameter:
🏁 Script executed:
```bash
#!/bin/bash
# Check the _fetch_byok_rag function implementation for similar patterns
echo "=== _fetch_byok_rag function implementation ==="
rg -n -A 30 "async def _fetch_byok_rag" src/utils/vector_search.py

echo -e "\n=== Looking for truthiness checks on vector_store_ids parameter in vector_search.py ==="
rg -n "^\s+if vector_store_ids" src/utils/vector_search.py -B 5 -A 2
```
Length of output: 1746
@tisnik, you're absolutely right to flag this! I've searched the codebase and this is indeed the primary instance where an optional list parameter is checked with truthiness in this PR.
The semantic issue:
- `vector_store_ids=None` → should fetch all vector stores (default behavior)
- `vector_store_ids=[]` → should use no vector stores (explicit empty list)
- Current code: treats both cases as "fetch all" ❌
The bot's suggested fix is correct:
```python
    if vector_store_ids is not None:
        return vector_store_ids
```

This preserves the explicit empty-list semantics, so callers can pass `[]` to mean "no vector stores" vs `None` to mean "fetch all".
Other findings:
- Line 435 in `src/utils/vector_search.py` also has `if vector_store_ids:`, but that's checking a local variable returned from a function call, not a parameter, so it's a different situation.
- The `_fetch_byok_rag` function has the same parameter type but doesn't appear to have this truthiness issue in its implementation.
Recommendation: Apply the bot's fix at line 125. This is a correctness issue that could cause unexpected behavior if a caller explicitly passes an empty list.
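The semantics under discussion can be sketched with a hypothetical synchronous helper (the real function is async and talks to the llama-stack client):

```python
from typing import Callable, Optional


def resolve_vector_store_ids(
    vector_store_ids: Optional[list[str]],
    fetch_all: Callable[[], list[str]],
) -> list[str]:
    """None means "use all stores"; an explicit [] means "use no stores"."""
    if vector_store_ids is not None:  # not `if vector_store_ids:`
        return vector_store_ids
    return fetch_all()
```

With the truthy check, the first case below would incorrectly fall through to `fetch_all()`.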
🧠 Learnings used
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
```python
    # Get vector stores for RAG tools - use specified ones or fetch all
    vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
```
Skip vector store listing when Tool RAG is disabled.
prepare_tools currently calls vector_stores.list() even when rag.tool.byok.enabled is false. That can fail the whole request for a feature that is disabled.
🛠️ Proposed fix
```diff
-    # Get vector stores for RAG tools - use specified ones or fetch all
-    vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
-
-    # Add RAG tools if vector stores are available
-    rag_tools = get_rag_tools(vector_store_ids)
-    if rag_tools:
-        toolgroups.extend(rag_tools)
+    # Add RAG tools only when Tool RAG is enabled
+    if configuration.rag.tool.byok.enabled:
+        resolved_vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
+        rag_tools = get_rag_tools(resolved_vector_store_ids)
+        if rag_tools:
+            toolgroups.extend(rag_tools)
```
+ toolgroups.extend(rag_tools)Also applies to: 383-385
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 205 - 207, prepare_tools is calling
vector_stores.list() via get_vector_store_ids even when the feature toggle
rag.tool.byok.enabled is false, causing unnecessary failures; change the logic
so that when rag.tool.byok.enabled is False you skip any vector store listing
and simply return the passed-in vector_store_ids (or an empty list) instead of
calling vector_stores.list(); update either prepare_tools to short-circuit
before calling get_vector_store_ids or modify get_vector_store_ids to check the
rag.tool.byok.enabled flag and avoid calling vector_stores.list() when the flag
is disabled (referencing prepare_tools, get_vector_store_ids,
vector_stores.list, and rag.tool.byok.enabled to locate the code).
Irrelevant IMO. Solved in get_rag_tools()
@asimurka, understood! You're right that get_rag_tools() checks configuration.rag.tool.byok.enabled and returns None when disabled, so the vector store IDs won't be used.
While there's a minor efficiency consideration (calling vector_stores.list() when the result won't be used), if you're comfortable with the current flow, that works.
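For reference, the short-circuit the bot proposed could look like this (illustrative synchronous sketch with hypothetical names; the real code is async and reads the flag from configuration):

```python
from typing import Callable, Optional


def prepare_rag_store_ids(
    byok_enabled: bool,
    vector_store_ids: Optional[list[str]],
    fetch_all: Callable[[], list[str]],
) -> list[str]:
    """Skip the vector-store listing entirely when Tool RAG is disabled."""
    if not byok_enabled:
        return []  # feature off: no remote listing call is made
    if vector_store_ids is not None:
        return vector_store_ids  # caller's explicit choice, even if empty
    return fetch_all()  # default: fetch all available stores
```

The trade-off is the one discussed above: the current flow is correct because `get_rag_tools()` checks the flag, but it still pays for the listing call.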
🧠 Learnings used
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
tisnik left a comment:

Overall LGTM, nice work!
asimurka left a comment:

Great work, LGTM overall. I suggested some minor improvements and there are also things worth a discussion.
```python
)
from llama_stack_api.openai_responses import (
    OpenAIResponseOutputMessageMCPCall as MCPCall,
)
```
Merge this into a single import, please.
| """ | ||
|
|
||
| always: AlwaysRagConfiguration = Field( | ||
| default_factory=lambda: AlwaysRagConfiguration(), # pylint: disable=unnecessary-lambda |
I think this could be done like `default_factory=AlwaysRagConfiguration` without using a lambda.
```python
)
from llama_stack_api.openai_responses import (
    OpenAIResponseUsage as ResponseUsage,
    OpenAIResponseInputTool as InputTool,
```
```python
    if vector_store_ids:
        return vector_store_ids
```
| """ | ||
| # Check if Tool RAG is enabled in configuration | ||
| if not (configuration and configuration.rag.tool.byok.enabled): | ||
| return None |
Worth a discussion: returning an empty list here would make more sense. I see it does not affect current logic but possibly can in the future (responses endpoint). By returning `None` you signal "overwrite by config" and by an empty list "do not use vector stores".
```python
    # Get vector stores for RAG tools - use specified ones or fetch all
    vector_store_ids = await get_vector_store_ids(client, vector_store_ids)
```
Irrelevant IMO. Solved in get_rag_tools()
@are-ces @tisnik Ideally we should try this feature from the Ansible side before it is merged, just to detect early any incompatibilities or problems, so new changes can still be added as part of the PR. On the other hand, it may take a while, so if you have urgency on merging this, we are also fine with merging, but if any changes are necessary once tested, then you'll have to create new PRs and potentially new minor releases. I am busy on a top-priority customer issue, and @TamiTakamiya is also quite busy I think, but I can guarantee that at some point during next week I can give this new feature a try for our Ansible use-case. Let's keep in touch!
I am trying to set up a test environment on my CRC with this, #1179, and lightspeed-providers#52. Right now I am still unable to make authentication work (it should work with #1179, though)... Hopefully I will be able to find a solution soon.
TL;DR
Motivation
To prioritize BYOK content, a mechanism to tune chunk scoring per vector store is necessary. The alternative was to create a client-side tool to change the behavior of the RAG tool, but that is not optimal for two reasons:
Always RAG with LCORE-side chunk prioritization avoids both issues.
Description
Adds configurable RAG strategies and chunk prioritization. Control which documentation sources to search, how to search them, and which sources are more important.
How It Works
Always RAG retrieves chunks from configured sources (BYOK and/or Solr) and injects them into the query before sending to the LLM. The LLM always has documentation context without calling tools.
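The prioritization flow described in this section can be sketched as a weighted merge (hypothetical data shapes; the actual implementation works on the chunk objects returned by each vector store):

```python
def merge_weighted_chunks(results_by_source, score_multipliers, top_n):
    """Multiply each chunk's score by its source's multiplier, merge all
    sources, sort by weighted score descending, and keep the top N."""
    merged = []
    for source, chunks in results_by_source.items():
        weight = score_multipliers.get(source, 1.0)
        for content, score in chunks:
            merged.append((content, score * weight, source))
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged[:top_n]
```

With a 2.0 multiplier on a BYOK source, a chunk scoring 0.8 (weighted to 1.6) outranks a 0.9 chunk from an unweighted source.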
Tool RAG lets the AI call the `file_search` tool during generation. This is the original behavior, enabled by default for backward compatibility.

Chunk Prioritization applies a `score_multiplier` per BYOK source. All sources are queried in parallel, chunk scores are multiplied by their source's weight, then merged and sorted. The top N chunks are selected across all sources. Solr chunks are appended after BYOK chunks without cross-source ranking (TBD, needs discussion / spike).

Configuration
In `lightspeed-stack.yaml`:

Chunk limits in `src/constants.py`:

Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Prerequisites: BYOK vector stores (FAISS) created with rag-content tool, OKP Solr instance, Llama Stack 0.4.3+, Lightspeed Stack Providers installed, OpenAI API key.
1. Configure chunk limits in `src/constants.py`.
2. Configure Lightspeed Stack - add BYOK RAG sources and the RAG strategy to `lightspeed-stack.yaml`.
3. Run the enrichment script - it reads the lightspeed-stack config and generates an enriched llama-stack `run.yaml` with BYOK vector stores and the Solr provider registered.
4. Install Lightspeed Stack Providers.
5. Start Llama Stack.
6. Start Lightspeed Stack.
7. Test query - send to `/v1/query` or the streaming endpoint.

Response:
{ "conversation_id": "...", "response": "Admission plugins in Red Hat OpenShift Container Platform are [...]", "rag_chunks": [ { "content": "...", "source": "openshift-docs-part1", "score": 1.0038, "attributes": { "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt", "title": "Admission plugins", "document_id": "file-7a70ef22c4a646f2a6f657c66961ba2c" } }, { "content": "...", "source": "openshift-docs-part2", "score": 0.926, "attributes": { "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt", "title": "Overview of dynamic plugins", "document_id": "file-b266f575a95a4da19d7ba058fd980f00" } }, { "content": "...", "source": "OKP Solr", "score": 63.996, "attributes": { "document_id": "/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index" } } ], "referenced_documents": [ { "doc_title": "Admission plugins", "doc_url": "https://www.redhat.com/data/architecture/admission-plug-ins.txt", "source": "openshift-docs-part1" }, { "doc_title": "Overview of dynamic plugins", "doc_url": "https://www.redhat.com/web_console/dynamic-plugin/overview-dynamic-plugin.txt", "source": "openshift-docs-part2" }, { "doc_title": null, "doc_url": "https://mimir.corp.redhat.com/documentation/en-us/openshift_container_platform/4.19/html-single/architecture/index", "source": "OKP Solr" } ], "truncated": false, "input_tokens": 3736, "output_tokens": 448 }Summary by CodeRabbit
Release Notes
New Features
Documentation