
feat: add e2e test with LSC api#170

Open
VladimirKadlec wants to merge 1 commit intolightspeed-core:mainfrom
VladimirKadlec:e2etests

Conversation

@VladimirKadlec (Member) commented Feb 25, 2026

Description

Adds an e2e test that runs the evaluation suite against the Lightspeed Stack API.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: N/A
  • Generated by: N/A

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Tests
    • Added a CI workflow to run end-to-end evaluation tests with query and streaming modes, automated health checks, and result validation.
    • Added comprehensive integration test configurations to simulate a full runtime environment (API, storage, metrics, outputs, logging) for evaluations.
    • Added a sample evaluation dataset for automated runs.
  • Chores
    • Updated ignore rules to exclude local end-to-end testing secrets.

@coderabbitai bot commented Feb 25, 2026

Walkthrough

Adds end-to-end test infrastructure: a GitHub Actions workflow to run Lightspeed evaluation tests (query and streaming), new Lightspeed service and evaluation configs, an integration runtime and test data, and a .gitignore entry for local e2e secrets.

Changes

  • CI / E2E workflow (.github/workflows/e2e_tests.yaml, .gitignore): Adds a GitHub Actions workflow that runs a matrixed e2e evaluation (modes: query, streaming), launches a Lightspeed Stack container, waits for health, runs tests with uv, validates eval_output/*_summary.json, and cleans up. .gitignore gains a .secrets entry.
  • Lightspeed Stack runtime (tests/integration/lightspeed-stack.yaml): New LCS config: server on 0.0.0.0:8080, auth disabled, llama_stack configured as a remote library client, feedback/transcript storage paths, and a noop auth module.
  • Integration runtime & providers (tests/integration/run.yaml): New comprehensive integration runtime with API groups, providers (remote/inline), KV/SQL backends, vector stores, registered resources/namespaces, RAG tool runtime, shields, datasets, and benchmarks for end-to-end testing.
  • Evaluation configs (tests/integration/system-config-query.yaml, tests/integration/system-config-streaming.yaml): Two evaluation configs (query vs streaming) specifying judge LLM, embedding defaults, lightspeed-stack API settings, metrics metadata, a detailed CSV output schema, logging, and environment controls.
  • Test data (tests/integration/test_evaluation_data.yaml): Sample evaluation dataset with one conversation group and a single turn ("What is the capital of France?") expecting "Paris".

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant GHA as GitHub Actions
    participant Repo as Repo Checkout
    participant Docker as Docker Host
    participant LCS as Lightspeed Stack (container)
    participant Runner as Test runner (uv)
    participant EvalFiles as Eval output JSON
    GHA->>Repo: checkout repository
    GHA->>Docker: run LCS image (mount configs, env)
    Docker->>LCS: start service (expose :8080)
    GHA->>LCS: wait for health endpoint
    GHA->>Runner: set CONFIG (query/streaming) and install deps
    Runner->>LCS: send evaluation requests
    LCS->>Runner: respond (streaming or non-stream)
    Runner->>EvalFiles: write evaluation_*_summary.json
    GHA->>EvalFiles: inspect PASS value
    GHA->>Docker: stop & remove container
```
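The "wait for health endpoint" step in the diagram can be sketched as a small polling loop. This is an illustrative sketch only: the /v1/health path, the curl flags, and the 60-second default budget are assumptions, not taken from the actual workflow.

```shell
#!/bin/bash
# Hedged sketch: poll a health URL with curl until it answers
# successfully or a deadline passes. Returns 0 once healthy, 1 on timeout.
wait_for_health() {
  local url="$1"
  local deadline=$(( $(date +%s) + ${2:-60} ))   # second arg = timeout in seconds
  until curl -sf "$url" > /dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "service at $url never became healthy" >&2
      return 1
    fi
    sleep 2
  done
}

# Example (hypothetical endpoint): wait_for_health "http://localhost:8080/v1/health" 120
```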

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • tisnik
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'feat: add e2e test with LSC api' directly and accurately summarizes the main change: adding end-to-end tests using the Lightspeed Stack API, confirmed by the workflow, configuration files, and test data files added.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files; docstring coverage check skipped.


@VladimirKadlec force-pushed the e2etests branch 4 times, most recently from e6c6208 to 1c145e9 on February 27, 2026 at 10:15
@VladimirKadlec marked this pull request as ready for review on February 27, 2026 at 10:17
@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
tests/integration/test_evaluation_data.yaml (1)

21-21: Add trailing newline.

YAML files should end with a newline for POSIX compliance and to avoid potential issues with some tools.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_evaluation_data.yaml` at line 21, The YAML file ends
without a trailing newline; update the
tests/integration/test_evaluation_data.yaml file (the block containing the
cleanup_script: null entry) to ensure the file terminates with a single trailing
newline character by adding a newline after the final line so the file ends with
a POSIX-compliant newline.
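The trailing-newline property is easy to verify locally. A generic sketch (not part of the PR): `tail -c 1` emits the file's last byte, and command substitution strips a trailing newline, so an empty result means the file is newline-terminated.

```shell
#!/bin/bash
# Hedged helper: succeed if a non-empty file ends with a POSIX newline.
ends_with_newline() {
  [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Example: ends_with_newline tests/integration/test_evaluation_data.yaml || echo "missing newline"
```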
.github/workflows/e2e_tests.yaml (2)

58-61: Remove unnecessary -it flags for detached container.

The -it flags allocate a pseudo-TTY and keep STDIN open, which are unnecessary when running a container in detached mode (--detach). This can cause warnings or unexpected behavior in some CI environments.

🔧 Proposed fix

```diff
          docker run \
            --name $LSC_IMAGE_NAME \
-           -it \
            -p 8080:8080 \
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 58 - 61, The docker run
invocation that starts the container for $LSC_IMAGE_NAME includes the
interactive flags `-it` while the job runs the container detached; remove the
`-it` flags and ensure `--detach` (or `-d`) is used instead so the container
doesn't allocate a TTY or keep STDIN open, avoiding CI warnings/unexpected
behavior in the workflow step that invokes `docker run`.

130-137: Harden shell script for robustness.

The glob expansion assumes exactly one matching file. If multiple summary files exist or none match, the script may behave unexpectedly. Also, unquoted variables in the comparison could cause issues with whitespace or special characters.

🔧 Proposed fix

```diff
-          OUT_FILE=$( echo eval_output/evaluation_*_summary.json )
-          PASS=$( jq .summary_stats.overall.PASS $OUT_FILE )
+          OUT_FILE=$(ls eval_output/evaluation_*_summary.json 2>/dev/null | head -n1)
+          if [ -z "$OUT_FILE" ] || [ ! -f "$OUT_FILE" ]; then
+            echo "No summary file found"
+            exit 1
+          fi
+          PASS=$( jq .summary_stats.overall.PASS "$OUT_FILE" )
           EXPECTED="1"
-          if [ ${PASS} != ${EXPECTED} ] ; then
+          if [ "${PASS}" != "${EXPECTED}" ] ; then
             echo "============"
             echo "Wrong PASS number in ${OUT_FILE}: got ${PASS}, expected ${EXPECTED}"
             echo "============"
             exit 1
           fi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 130 - 137, The script assumes
exactly one file from the glob and does unquoted comparisons; update the
OUT_FILE handling to collect matches into an array (e.g.,
evaluation_files=($(echo eval_output/evaluation_*_summary.json))) then check the
array length and error if zero or >1, set OUT_FILE to the single element only
when length==1, and use quoted variables in the jq call and comparison (e.g.,
PASS="$(jq ... "$OUT_FILE")" and compare with if [ "$PASS" != "$EXPECTED" ] ) so
whitespace/special chars and missing/multiple files are handled robustly.
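The "exactly one glob match" pattern described above can be tried standalone. This is a hedged sketch against a scratch directory, not the real eval_output/ layout; the function name is hypothetical.

```shell
#!/bin/bash
# Hedged sketch: require exactly one evaluation_*_summary.json match in a
# directory; print its path on success, fail with a count otherwise.
check_single_summary() {
  local dir="$1"
  shopt -s nullglob                                 # unmatched globs expand to nothing
  local files=( "$dir"/evaluation_*_summary.json )
  if [ "${#files[@]}" -ne 1 ]; then
    echo "expected exactly 1 summary file, found ${#files[@]}" >&2
    return 1
  fi
  printf '%s\n' "${files[0]}"
}
```

Without nullglob, a zero-match glob would stay as the literal pattern string, which is exactly the failure mode the review flags.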
tests/integration/system-config-streaming.yaml (1)

1-132: Consider reducing duplication between query and streaming configs.

The streaming config is nearly identical to the query config, differing only in endpoint_type. You could use YAML anchors/aliases or a templating approach to reduce maintenance burden. However, keeping them separate is also acceptable for test configuration clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/system-config-streaming.yaml` around lines 1 - 132, The api
block duplicates the same settings for query vs streaming; create a single YAML
anchor for the common api configuration (e.g., &api_common under the existing
api mapping) and then reuse it with an alias for both variants, overriding only
endpoint_type where needed (reference the api mapping and the endpoint_type key)
so you avoid repeating provider/model/timeout/cache_* fields across query and
streaming configs.
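For reference, the anchor/alias pattern the bot suggests could look roughly like this. All keys below are illustrative placeholders, not the real config schema:

```yaml
# Illustrative sketch only: hypothetical keys, not the actual system-config schema.
api_defaults: &api_defaults
  provider: some-provider
  model: some-model
  timeout: 300

query:
  <<: *api_defaults          # merge the shared settings
  endpoint_type: query

streaming:
  <<: *api_defaults
  endpoint_type: streaming
```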
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/e2e_tests.yaml:
- Line 123: The workflow prints an undefined matrix property `matrix.lsc-image`;
replace that reference with the correct matrix key `matrix.lsc_image_path` so
the echo uses the existing variable. Update the echo line that currently
references `matrix.lsc-image` to `matrix.lsc_image_path`, and scan for any other
occurrences of `matrix.lsc-image` (e.g., in steps or env entries) to ensure they
all use `matrix.lsc_image_path` consistently.

In `@tests/integration/test_evaluation_data.yaml`:
- Around line 14-18: The test YAML config is inconsistent: turn_metrics lists
only "ragas:response_relevancy" while turn_metrics_metadata defines
"ragas:faithfulness". Either add "ragas:faithfulness" to the turn_metrics
sequence so it will be evaluated, or remove the "ragas:faithfulness" entry from
turn_metrics_metadata; update the "turn_metrics" key accordingly (or delete the
"turn_metrics_metadata.ragas:faithfulness" block) to keep keys consistent.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e395459 and 1c145e9.

📒 Files selected for processing (7)
  • .github/workflows/e2e_tests.yaml
  • .gitignore
  • tests/integration/lightspeed-stack.yaml
  • tests/integration/run.yaml
  • tests/integration/system-config-query.yaml
  • tests/integration/system-config-streaming.yaml
  • tests/integration/test_evaluation_data.yaml

Comment on lines 14 to 18

```yaml
turn_metrics:
- ragas:response_relevancy
turn_metrics_metadata:
  ragas:faithfulness:
    threshold: 0.9
```
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Search for faithfulness references to understand intended usage
rg -n "faithfulness" tests/integration/
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 155

🏁 Script executed:

```bash
#!/bin/bash
# Read the full test_evaluation_data.yaml file to understand context
cat -n tests/integration/test_evaluation_data.yaml
```

Length of output: 844

🏁 Script executed:

```bash
#!/bin/bash
# Search for response_relevancy usage to understand the pattern
rg -n "response_relevancy" tests/integration/
```

Length of output: 320

🏁 Script executed:

```bash
#!/bin/bash
# Search for code that processes turn_metrics and turn_metrics_metadata to understand the relationship
rg -n "turn_metrics" --type py tests/
```

Length of output: 11339

🏁 Script executed:

```bash
#!/bin/bash
# Search for processing of turn_metrics_metadata in the evaluation code
rg -n "turn_metrics_metadata" --type py tests/ src/
```

Length of output: 5306

🏁 Script executed:

```bash
#!/bin/bash
# Look for other examples of turn_metrics_metadata configuration
fd "\.yaml" tests/ | xargs grep -l "turn_metrics_metadata" 2>/dev/null
```

Length of output: 124


Add ragas:faithfulness to turn_metrics or remove its metadata configuration.

turn_metrics lists only ragas:response_relevancy, but turn_metrics_metadata configures ragas:faithfulness with a threshold of 0.9. Either add ragas:faithfulness to the turn_metrics list if it should be evaluated, or remove the metadata entry if it's not needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_evaluation_data.yaml` around lines 14 - 18, The test
YAML config is inconsistent: turn_metrics lists only "ragas:response_relevancy"
while turn_metrics_metadata defines "ragas:faithfulness". Either add
"ragas:faithfulness" to the turn_metrics sequence so it will be evaluated, or
remove the "ragas:faithfulness" entry from turn_metrics_metadata; update the
"turn_metrics" key accordingly (or delete the
"turn_metrics_metadata.ragas:faithfulness" block) to keep keys consistent.

@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
.github/workflows/e2e_tests.yaml (1)

123-123: ⚠️ Potential issue | 🟠 Major

Use the defined matrix key name (lsc_image_path).

Line 123 references matrix.lsc-image_path, but the matrix defines lsc_image_path (Line 18). This causes expression resolution failure.

🔧 Proposed fix

```diff
-          echo "  LSC image: ${{ matrix.lsc-image_path }}"
+          echo "  LSC image: ${{ matrix.lsc_image_path }}"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml at line 123, The workflow references the
wrong matrix key name; update the echo step that uses matrix.lsc-image_path to
use the defined matrix key matrix.lsc_image_path (replace the hyphen with an
underscore) so the expression resolves correctly (look for the echo line that
prints "LSC image: ${{ matrix.lsc-image_path }}" and change it to use
matrix.lsc_image_path).
tests/integration/test_evaluation_data.yaml (1)

14-18: ⚠️ Potential issue | 🟠 Major

Fix metric metadata key typo to ensure threshold is applied.

Line 17 uses ragas:respose_relevancy, but Line 15 configures ragas:response_relevancy. This mismatch can cause the threshold config to be ignored.

🔧 Proposed fix

```diff
       turn_metrics_metadata:
-        ragas:respose_relevancy:
+        ragas:response_relevancy:
           threshold: 0.9
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_evaluation_data.yaml` around lines 14 - 18, There's a
typo in the metric metadata key: change "ragas:respose_relevancy" to
"ragas:response_relevancy" so the metadata key matches the metric listed in
turn_metrics (ragas:response_relevancy) and the threshold: 0.9 is applied;
update the YAML key name in the block under turn_metrics_metadata to the
corrected spelling.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c145e9 and 1bc8ee3.

📒 Files selected for processing (2)
  • .github/workflows/e2e_tests.yaml
  • tests/integration/test_evaluation_data.yaml

Comment on lines +4 to +38

```yaml
on: [push, pull_request_target]

jobs:
  ##########
  e2e_tests:
    runs-on: ubuntu-latest

    #name: "Lightspeed-stack setup"

    strategy:
      # For local testing use matrix with just one variant, "act" doesn't separate runs
      matrix:
        mode: ["query", "streaming"]
        eval-data: ["tests/integration/test_evaluation_data.yaml"]
        lsc_image_path: ["quay.io/lightspeed-core/lightspeed-stack:latest"]

    name: "E2E Lightspeed Evaluation Test, mode: ${{ matrix.mode }}"

    env:
      LSC_IMAGE_NAME: "lightspeed-stack-test-mode-${{ matrix.mode }}"

    steps:
      # Stolen from lightspeed-stack
      - uses: actions/checkout@v4
        with:
          # On PR_TARGET → the fork (or same repo) that opened the PR.
          # On push → falls back to the current repository.
          repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}

          # On PR_TARGET → the PR head *commit* (reproducible).
          # On push → the pushed commit that triggered the workflow.
          ref: ${{ github.event.pull_request.head.ref || github.sha }}

          # Don't keep credentials when running untrusted PR code under PR_TARGET.
          persist-credentials: ${{ github.event_name != 'pull_request_target' }}
```
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```bash
cat -n .github/workflows/e2e_tests.yaml
```

Length of output: 6224

🏁 Script executed:

```bash
# Verify the matrix key usage - check if line 123 really has the typo
sed -n '123p' .github/workflows/e2e_tests.yaml
```

Length of output: 139


Avoid executing untrusted PR code with repository secrets.

Using pull_request_target (Line 4), checking out PR head code (Lines 31-35), and passing OPENAI_API_KEY (Lines 64, 118) allows secret exfiltration from forked PRs.

Additionally, the matrix variable reference on line 123 uses the wrong key name (lsc-image_path with hyphen instead of lsc_image_path with underscore), causing an empty value to be logged.

🔒 Safer workflow pattern

```diff
-on: [push, pull_request_target]
+on:
+  push:
+  pull_request:

@@
-          repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
+          repository: ${{ github.repository }}

@@
-          ref: ${{ github.event.pull_request.head.ref || github.sha }}
+          ref: ${{ github.event.pull_request.head.sha || github.sha }}
```

Also fix line 123: change matrix.lsc-image_path to matrix.lsc_image_path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 4 - 38, The workflow uses
pull_request_target and checks out PR head code while passing secrets, which
risks secret exfiltration: change the event from pull_request_target to
pull_request (or split jobs so untrusted PRs run without secrets), ensure
actions/checkout is using the safe ref for pull_request (use
github.event.pull_request.head.repo vs checkout of head only when safe), and
remove or avoid passing repository secrets like OPENAI_API_KEY into jobs that
run untrusted PR code; additionally correct the matrix key typo by replacing any
use of matrix.lsc-image_path (hyphen) with matrix.lsc_image_path (underscore) so
the image path resolves correctly (look for references to LSC_IMAGE_NAME,
OPENAI_API_KEY, and the matrix definition).

@coderabbitai bot left a comment

Actionable comments posted: 3


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bc8ee3 and 9a1a40c.

📒 Files selected for processing (7)
  • .github/workflows/e2e_tests.yaml
  • .gitignore
  • tests/integration/lightspeed-stack.yaml
  • tests/integration/run.yaml
  • tests/integration/system-config-query.yaml
  • tests/integration/system-config-streaming.yaml
  • tests/integration/test_evaluation_data.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/integration/lightspeed-stack.yaml
  • tests/integration/system-config-query.yaml
  • .gitignore

Comment on lines +58 to +66

```shell
docker run \
  --name $LSC_IMAGE_NAME \
  -it \
  -p 8080:8080 \
  -v $(pwd)/tests/integration/lightspeed-stack.yaml:/app-root/lightspeed-stack.yaml:Z \
  -v $(pwd)/tests/integration/run.yaml:/app-root/run.yaml:Z \
  -e OPENAI_API_KEY="${{ secrets.OPENAI_API_KEY }}" \
  --detach \
  ${{ matrix.lsc_image_path }}
```
⚠️ Potential issue | 🟡 Minor

Remove -it flag from detached container.

The -it flags allocate a pseudo-TTY and keep STDIN open, which is unnecessary for a detached container in CI and can cause unexpected behavior.

🔧 Proposed fix

```diff
          docker run \
            --name $LSC_IMAGE_NAME \
-           -it \
            -p 8080:8080 \
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 58 - 66, The docker run
invocation currently includes the interactive flags "-it" while also using
"--detach", which can cause CI issues; remove the "-it" flags from the docker
run command (the block that starts with "docker run --name $LSC_IMAGE_NAME ...
--detach ${{ matrix.lsc_image_path }}") so the container runs detached without
allocating a pseudo-TTY or keeping STDIN open, leaving all other options (-p,
-v, -e, --name, --detach) unchanged.

Comment on lines +128 to +138

```yaml
      - name: Check test result
        run: |
          OUT_FILE=$( echo eval_output/evaluation_*_summary.json )
          PASS=$( jq .summary_stats.overall.PASS $OUT_FILE )
          EXPECTED="1"
          if [ ${PASS} != ${EXPECTED} ] ; then
            echo "============"
            echo "Wrong PASS number in ${OUT_FILE}: got ${PASS}, expected ${EXPECTED}"
            echo "============"
            exit 1
          fi
```
⚠️ Potential issue | 🟡 Minor

Handle multiple or missing summary files gracefully.

The glob eval_output/evaluation_*_summary.json could match multiple files or none, causing jq to fail or read the wrong file.

🔧 Proposed fix with explicit file count check

```diff
       - name: Check test result
         run: |
-          OUT_FILE=$( echo eval_output/evaluation_*_summary.json )
-          PASS=$( jq .summary_stats.overall.PASS $OUT_FILE )
+          shopt -s nullglob
+          FILES=(eval_output/evaluation_*_summary.json)
+          if [ ${#FILES[@]} -ne 1 ]; then
+            echo "Expected exactly 1 summary file, found ${#FILES[@]}"
+            ls -la eval_output/
+            exit 1
+          fi
+          OUT_FILE="${FILES[0]}"
+          PASS=$( jq .summary_stats.overall.PASS "$OUT_FILE" )
           EXPECTED="1"
-          if [ ${PASS} != ${EXPECTED} ] ; then
+          if [ "${PASS}" != "${EXPECTED}" ] ; then
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 128 - 138, The test check
assumes a single summary file but the glob eval_output/evaluation_*_summary.json
may match zero or multiple files; replace the current OUT_FILE assignment with a
safe existence/count check: build an array of matches
(files=(eval_output/evaluation_*_summary.json)), verify that ${#files[@]} equals
1 and error out if not, then set OUT_FILE="${files[0]}" and run PASS=$(jq
.summary_stats.overall.PASS "$OUT_FILE") (quoted) and handle jq failure by
exiting non-zero; update references to OUT_FILE and PASS accordingly.

@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
.github/workflows/e2e_tests.yaml (3)

128-138: ⚠️ Potential issue | 🟡 Minor

Handle multiple or missing summary files gracefully.

The glob eval_output/evaluation_*_summary.json (line 130) could match multiple files or none, causing jq to fail or read incorrect data. The variable should be quoted in the comparison (line 133).

🔧 Proposed fix with explicit file count check

```diff
       - name: Check test result
         run: |
-          OUT_FILE=$( echo eval_output/evaluation_*_summary.json )
-          PASS=$( jq .summary_stats.overall.PASS $OUT_FILE )
+          shopt -s nullglob
+          FILES=(eval_output/evaluation_*_summary.json)
+          if [ ${#FILES[@]} -ne 1 ]; then
+            echo "Expected exactly 1 summary file, found ${#FILES[@]}"
+            ls -la eval_output/
+            exit 1
+          fi
+          OUT_FILE="${FILES[0]}"
+          PASS=$( jq .summary_stats.overall.PASS "$OUT_FILE" )
           EXPECTED="1"
-          if [ ${PASS} != ${EXPECTED} ] ; then
+          if [ "${PASS}" != "${EXPECTED}" ] ; then
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 128 - 138, The script assumes
a single matching summary file and uses unquoted vars in the test; change to
explicitly collect matches into an array (e.g.,
(files=(eval_output/evaluation_*_summary.json)), check that ${#files[@]} equals 1
and fail with a clear message if zero or multiple matches, then set
OUT_FILE="${files[0]}" and call PASS=$(jq .summary_stats.overall.PASS
"$OUT_FILE"); also quote variables in the comparison: if [ "${PASS}" !=
"${EXPECTED}" ] ; then ... to avoid word-splitting and ensure correct behavior.

58-66: ⚠️ Potential issue | 🟡 Minor

Remove -it flag from detached container.

The -it flags (line 60) allocate a pseudo-TTY and keep STDIN open, which is unnecessary and potentially problematic for a detached container in CI.

🔧 Proposed fix
          docker run \
            --name $LSC_IMAGE_NAME \
-           -it \
            -p 8080:8080 \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 58 - 66, Remove the
unnecessary interactive flags from the docker run invocation: in the docker run
block that creates the container named via $LSC_IMAGE_NAME and uses ${{
matrix.lsc_image_path }}, delete the "-it" option so the container is started
detached without allocating a pseudo-TTY or keeping STDIN open.

4-4: ⚠️ Potential issue | 🔴 Critical

Security risk: pull_request_target with PR head checkout and secrets.

This workflow uses pull_request_target (line 4), checks out the PR head code (lines 31-35), and passes OPENAI_API_KEY secret to both the container (line 64) and test execution (line 118). This combination allows malicious PRs from forks to exfiltrate secrets.

🔒 Safer workflow pattern
-on: [push, pull_request_target]
+on: [push, pull_request]

With pull_request, secrets are not available for fork PRs, preventing exfiltration while still running tests on trusted pushes.
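One way to keep the workflow runnable for fork PRs after switching triggers is to let secret-dependent steps skip themselves when the secret resolves empty. A sketch only, with hypothetical job and step names, not the repository's actual workflow:

```yaml
on: [push, pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run e2e evaluation
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          # On fork PRs the secret resolves to an empty string, so the
          # job degrades to a no-op instead of failing or leaking anything.
          if [ -z "$OPENAI_API_KEY" ]; then
            echo "No API key available (fork PR); skipping e2e run."
            exit 0
          fi
          # ...start the container and run the evaluation here...
```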

🧹 Nitpick comments (2)
.github/workflows/e2e_tests.yaml (2)

141-148: Container cleanup only runs locally, not on GitHub Actions.

The cleanup step (lines 142-148) only executes when env.ACT is set, meaning containers are orphaned on actual GitHub Actions runs. While GitHub-hosted runners are ephemeral, explicitly cleaning up is a good practice and helps with self-hosted runners.

♻️ Proposed fix to always clean up
       # Cleanup
-      - name: Stop the LSC if in local devel
-        if: ${{ always() && env.ACT }}
+      - name: Stop the LSC
+        if: always()
         run: |
           echo "Stopping LSC container $LSC_IMAGE_NAME"
           echo "++++++++++++++++++++++"
           docker stop $LSC_IMAGE_NAME || true
           docker rm $LSC_IMAGE_NAME || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 141 - 148, The cleanup job
currently only runs when env.ACT is set (if: ${{ always() && env.ACT }}),
leaving containers orphaned on non-local GH Actions; change the step "Stop the
LSC if in local devel" to run unconditionally after the job by removing the
env.ACT conditional and using always() (e.g., if: ${{ always() }}), so the
docker stop/rm commands that reference LSC_IMAGE_NAME run on every workflow run
including GitHub-hosted and self-hosted runners.

76-90: Consider adding --silent to health check curl.

The curl command on line 80 may pollute logs with progress output. Adding --silent keeps logs cleaner while --fail still ensures proper exit codes.

♻️ Proposed fix
-            if curl --output /dev/null --fail http://localhost:8080/v1/models ; then
+            if curl --silent --output /dev/null --fail http://localhost:8080/v1/models ; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 76 - 90, The curl health-check
in the "Wait for the LSC" loop uses "curl --fail
http://localhost:8080/v1/models" and can emit progress output into logs; update
the curl invocation in that for-loop to include the --silent flag (i.e., use
--silent --fail) so it still fails on non-2xx responses but suppresses progress
output while polling.
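The effect of the two flags together can be seen with a self-contained check against a port that is assumed closed (`localhost:1` here, standing in for the not-yet-ready service):

```shell
# --silent drops the progress meter; --fail turns HTTP errors and
# connection failures into a non-zero exit, so the if-branch is reliable.
if curl --silent --output /dev/null --fail --max-time 2 http://localhost:1/v1/models; then
  STATUS="up"
else
  STATUS="not up yet"
fi
echo "$STATUS"
```

In the workflow's polling loop the only change needed is adding `--silent` next to `--fail`; the exit-code behavior the loop depends on is unchanged.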
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/run.yaml`:
- Around line 50-53: The braintrust provider block (provider_id: braintrust,
provider_type: inline::braintrust) is using a placeholder openai_api_key
('********') which will cause silent failures in e2e runs; replace the hardcoded
placeholder with an environment variable reference (e.g., use a runtime
substitution like BRAINTRUST_OPENAI_API_KEY) so tests pick up a real key, or
explicitly remove/disable the braintrust provider entry and add a comment or
test config flag stating that braintrust scoring is not exercised in these
integration tests to avoid running it with an invalid key.

---

Duplicate comments:
In @.github/workflows/e2e_tests.yaml:
- Around line 128-138: The script assumes a single matching summary file and
uses unquoted vars in the test; change to explicitly collect matches into an
array (e.g., files=(eval_output/evaluation_*_summary.json)), check that
${#files[@]} equals 1 and fail with a clear message if zero or multiple matches,
then set OUT_FILE="${files[0]}" and call PASS=$(jq .summary_stats.overall.PASS
"$OUT_FILE"); also quote variables in the comparison: if [ "${PASS}" !=
"${EXPECTED}" ] ; then ... to avoid word-splitting and ensure correct behavior.
- Around line 58-66: Remove the unnecessary interactive flags from the docker
run invocation: in the docker run block that creates the container named via
$LSC_IMAGE_NAME and uses ${{ matrix.lsc_image_path }}, delete the "-it" option
so the container is started detached without allocating a pseudo-TTY or keeping
STDIN open.

---

Nitpick comments:
In @.github/workflows/e2e_tests.yaml:
- Around line 141-148: The cleanup job currently only runs when env.ACT is set
(if: ${{ always() && env.ACT }}), leaving containers orphaned on non-local GH
Actions; change the step "Stop the LSC if in local devel" to run unconditionally
after the job by removing the env.ACT conditional and using always() (e.g., if:
${{ always() }}), so the docker stop/rm commands that reference LSC_IMAGE_NAME
run on every workflow run including GitHub-hosted and self-hosted runners.
- Around line 76-90: The curl health-check in the "Wait for the LSC" loop uses
"curl --fail http://localhost:8080/v1/models" and can emit progress output into
logs; update the curl invocation in that for-loop to include the --silent flag
(i.e., use --silent --fail) so it still fails on non-2xx responses but
suppresses progress output while polling.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a1a40c and f575116.

📒 Files selected for processing (7)
  • .github/workflows/e2e_tests.yaml
  • .gitignore
  • tests/integration/lightspeed-stack.yaml
  • tests/integration/run.yaml
  • tests/integration/system-config-query.yaml
  • tests/integration/system-config-streaming.yaml
  • tests/integration/test_evaluation_data.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/integration/system-config-query.yaml
  • tests/integration/system-config-streaming.yaml
  • .gitignore

Comment on lines +50 to +53
- provider_id: braintrust
provider_type: inline::braintrust
config:
openai_api_key: '********'

⚠️ Potential issue | 🟡 Minor

Placeholder API key may cause silent failures.

The braintrust provider has openai_api_key: '********' which appears to be a placeholder. If this provider is exercised during tests, it will likely fail. Either provide a valid key via environment variable substitution (similar to line 25), or document that braintrust scoring is not exercised in these e2e tests.

🔧 Suggested fix using environment variable
   - provider_id: braintrust
     provider_type: inline::braintrust
     config:
-      openai_api_key: '********'
+      openai_api_key: ${env.OPENAI_API_KEY:=}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/run.yaml` around lines 50 - 53, The braintrust provider
block (provider_id: braintrust, provider_type: inline::braintrust) is using a
placeholder openai_api_key ('********') which will cause silent failures in e2e
runs; replace the hardcoded placeholder with an environment variable reference
(e.g., use a runtime substitution like BRAINTRUST_OPENAI_API_KEY) so tests pick
up a real key, or explicitly remove/disable the braintrust provider entry and
add a comment or test config flag stating that braintrust scoring is not
exercised in these integration tests to avoid running it with an invalid key.

