feat: add e2e test with LSC api #170

VladimirKadlec wants to merge 1 commit into lightspeed-core:main
Conversation
Walkthrough

Adds end-to-end test infrastructure: a GitHub Actions workflow to run Lightspeed evaluation tests (query and streaming), new Lightspeed service and evaluation configs, and an integration runtime and test data.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant GHA as GitHub Actions
  participant Repo as Repo Checkout
  participant Docker as Docker Host
  participant LCS as Lightspeed Stack (container)
  participant Runner as Test runner (uv)
  participant EvalFiles as Eval output JSON
  GHA->>Repo: checkout repository
  GHA->>Docker: run LCS image (mount configs, env)
  Docker->>LCS: start service (expose :8080)
  GHA->>LCS: wait for health endpoint
  GHA->>Runner: set CONFIG (query/streaming) and install deps
  Runner->>LCS: send evaluation requests
  LCS->>Runner: respond (streaming or non-stream)
  Runner->>EvalFiles: write evaluation_*_summary.json
  GHA->>EvalFiles: inspect PASS value
  GHA->>Docker: stop & remove container
```
Force-pushed from e6c6208 to 1c145e9
Actionable comments posted: 2
🧹 Nitpick comments (4)
tests/integration/test_evaluation_data.yaml (1)
21-21: Add trailing newline. YAML files should end with a newline for POSIX compliance and to avoid potential issues with some tools.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_evaluation_data.yaml` at line 21, the YAML file ends without a trailing newline; update the tests/integration/test_evaluation_data.yaml file (the block containing the cleanup_script: null entry) to ensure the file terminates with a single trailing newline character by adding a newline after the final line so the file ends with a POSIX-compliant newline.

.github/workflows/e2e_tests.yaml (2)
58-61: Remove unnecessary `-it` flags for detached container. The `-it` flags allocate a pseudo-TTY and keep STDIN open, which are unnecessary when running a container in detached mode (`--detach`). This can cause warnings or unexpected behavior in some CI environments.

🔧 Proposed fix
```diff
           docker run \
             --name $LSC_IMAGE_NAME \
-            -it \
             -p 8080:8080 \
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/e2e_tests.yaml around lines 58 - 61, The docker run invocation that starts the container for $LSC_IMAGE_NAME includes the interactive flags `-it` while the job runs the container detached; remove the `-it` flags and ensure `--detach` (or `-d`) is used instead so the container doesn't allocate a TTY or keep STDIN open, avoiding CI warnings/unexpected behavior in the workflow step that invokes `docker run`.
130-137: Harden shell script for robustness. The glob expansion assumes exactly one matching file. If multiple summary files exist or none match, the script may behave unexpectedly. Also, unquoted variables in the comparison could cause issues with whitespace or special characters.
🔧 Proposed fix
```diff
-          OUT_FILE=$( echo eval_output/evaluation_*_summary.json )
-          PASS=$( jq .summary_stats.overall.PASS $OUT_FILE )
+          OUT_FILE=$(ls eval_output/evaluation_*_summary.json 2>/dev/null | head -n1)
+          if [ -z "$OUT_FILE" ] || [ ! -f "$OUT_FILE" ]; then
+            echo "No summary file found"
+            exit 1
+          fi
+          PASS=$( jq .summary_stats.overall.PASS "$OUT_FILE" )
           EXPECTED="1"
-          if [ ${PASS} != ${EXPECTED} ] ; then
+          if [ "${PASS}" != "${EXPECTED}" ] ; then
             echo "============"
-            echo "Wrong PASS number in ${OUT_FILE}: got ${PASS}, expected ${EXPECTED}"
+            echo "Wrong PASS number in ${OUT_FILE}: got ${PASS}, expected ${EXPECTED}"
             echo "============"
             exit 1
           fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/e2e_tests.yaml around lines 130 - 137, The script assumes exactly one file from the glob and does unquoted comparisons; update the OUT_FILE handling to collect matches into an array (e.g., evaluation_files=($(echo eval_output/evaluation_*_summary.json))) then check the array length and error if zero or >1, set OUT_FILE to the single element only when length==1, and use quoted variables in the jq call and comparison (e.g., PASS="$(jq ... "$OUT_FILE")" and compare with if [ "$PASS" != "$EXPECTED" ] ) so whitespace/special chars and missing/multiple files are handled robustly.

tests/integration/system-config-streaming.yaml (1)
1-132: Consider reducing duplication between query and streaming configs. The streaming config is nearly identical to the query config, differing only in `endpoint_type`. You could use YAML anchors/aliases or a templating approach to reduce maintenance burden. However, keeping them separate is also acceptable for test configuration clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/system-config-streaming.yaml` around lines 1 - 132, The api block duplicates the same settings for query vs streaming; create a single YAML anchor for the common api configuration (e.g., &api_common under the existing api mapping) and then reuse it with an alias for both variants, overriding only endpoint_type where needed (reference the api mapping and the endpoint_type key) so you avoid repeating provider/model/timeout/cache_* fields across query and streaming configs.
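The anchor/alias approach the comment suggests can be sketched as below; the keys here are illustrative placeholders, not the actual fields of the system configs:

```yaml
# Hypothetical shared api block (field names invented for the sketch).
api_defaults: &api_common
  provider: openai
  model: gpt-4o-mini
  timeout: 300

query_api:
  <<: *api_common          # reuse the shared settings
  endpoint_type: query

streaming_api:
  <<: *api_common
  endpoint_type: streaming  # only this key differs
```

Note that the merge key (`<<:`) is a YAML 1.1 convention; whether it works depends on the parser the evaluation framework uses (PyYAML supports it), so this would need verifying before adopting it in the test configs.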
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/e2e_tests.yaml:
- Line 123: The workflow prints an undefined matrix property `matrix.lsc-image`;
replace that reference with the correct matrix key `matrix.lsc_image_path` so
the echo uses the existing variable. Update the echo line that currently
references `matrix.lsc-image` to `matrix.lsc_image_path`, and scan for any other
occurrences of `matrix.lsc-image` (e.g., in steps or env entries) to ensure they
all use `matrix.lsc_image_path` consistently.
In `@tests/integration/test_evaluation_data.yaml`:
- Around line 14-18: The test YAML config is inconsistent: turn_metrics lists
only "ragas:response_relevancy" while turn_metrics_metadata defines
"ragas:faithfulness". Either add "ragas:faithfulness" to the turn_metrics
sequence so it will be evaluated, or remove the "ragas:faithfulness" entry from
turn_metrics_metadata; update the "turn_metrics" key accordingly (or delete the
"turn_metrics_metadata.ragas:faithfulness" block) to keep keys consistent.
---
Nitpick comments:
In @.github/workflows/e2e_tests.yaml:
- Around line 58-61: The docker run invocation that starts the container for
$LSC_IMAGE_NAME includes the interactive flags `-it` while the job runs the
container detached; remove the `-it` flags and ensure `--detach` (or `-d`) is
used instead so the container doesn't allocate a TTY or keep STDIN open,
avoiding CI warnings/unexpected behavior in the workflow step that invokes
`docker run`.
- Around line 130-137: The script assumes exactly one file from the glob and
does unquoted comparisons; update the OUT_FILE handling to collect matches into
an array (e.g., evaluation_files=($(echo
eval_output/evaluation_*_summary.json))) then check the array length and error
if zero or >1, set OUT_FILE to the single element only when length==1, and use
quoted variables in the jq call and comparison (e.g., PASS="$(jq ...
"$OUT_FILE")" and compare with if [ "$PASS" != "$EXPECTED" ] ) so
whitespace/special chars and missing/multiple files are handled robustly.
In `@tests/integration/system-config-streaming.yaml`:
- Around line 1-132: The api block duplicates the same settings for query vs
streaming; create a single YAML anchor for the common api configuration (e.g.,
&api_common under the existing api mapping) and then reuse it with an alias for
both variants, overriding only endpoint_type where needed (reference the api
mapping and the endpoint_type key) so you avoid repeating
provider/model/timeout/cache_* fields across query and streaming configs.
In `@tests/integration/test_evaluation_data.yaml`:
- Line 21: The YAML file ends without a trailing newline; update the
tests/integration/test_evaluation_data.yaml file (the block containing the
cleanup_script: null entry) to ensure the file terminates with a single trailing
newline character by adding a newline after the final line so the file ends with
a POSIX-compliant newline.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- .github/workflows/e2e_tests.yaml
- .gitignore
- tests/integration/lightspeed-stack.yaml
- tests/integration/run.yaml
- tests/integration/system-config-query.yaml
- tests/integration/system-config-streaming.yaml
- tests/integration/test_evaluation_data.yaml
```yaml
turn_metrics:
  - ragas:response_relevancy
turn_metrics_metadata:
  ragas:faithfulness:
    threshold: 0.9
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Search for faithfulness references to understand intended usage
rg -n "faithfulness" tests/integration/
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 155
🏁 Script executed:
```shell
#!/bin/bash
# Read the full test_evaluation_data.yaml file to understand context
cat -n tests/integration/test_evaluation_data.yaml
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 844
🏁 Script executed:
```shell
#!/bin/bash
# Search for response_relevancy usage to understand the pattern
rg -n "response_relevancy" tests/integration/
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 320
🏁 Script executed:
```shell
#!/bin/bash
# Search for code that processes turn_metrics and turn_metrics_metadata to understand the relationship
rg -n "turn_metrics" --type py tests/
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 11339
🏁 Script executed:
```shell
#!/bin/bash
# Search for processing of turn_metrics_metadata in the evaluation code
rg -n "turn_metrics_metadata" --type py tests/ src/
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 5306
🏁 Script executed:
```shell
#!/bin/bash
# Look for other examples of turn_metrics_metadata configuration
fd "\.yaml" tests/ | xargs grep -l "turn_metrics_metadata" 2>/dev/null
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 124
Add ragas:faithfulness to turn_metrics or remove its metadata configuration.
turn_metrics lists only ragas:response_relevancy, but turn_metrics_metadata configures ragas:faithfulness with a threshold of 0.9. Either add ragas:faithfulness to the turn_metrics list if it should be evaluated, or remove the metadata entry if it's not needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/test_evaluation_data.yaml` around lines 14 - 18, The test
YAML config is inconsistent: turn_metrics lists only "ragas:response_relevancy"
while turn_metrics_metadata defines "ragas:faithfulness". Either add
"ragas:faithfulness" to the turn_metrics sequence so it will be evaluated, or
remove the "ragas:faithfulness" entry from turn_metrics_metadata; update the
"turn_metrics" key accordingly (or delete the
"turn_metrics_metadata.ragas:faithfulness" block) to keep keys consistent.
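For illustration, the first of the two options (evaluating faithfulness) would make the block look like this; the threshold value is the one already present in the snippet:

```yaml
turn_metrics:
  - ragas:response_relevancy
  - ragas:faithfulness        # now listed, so the metadata below is used
turn_metrics_metadata:
  ragas:faithfulness:
    threshold: 0.9
```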
Actionable comments posted: 1
♻️ Duplicate comments (2)
.github/workflows/e2e_tests.yaml (1)
123-123: ⚠️ Potential issue | 🟠 Major

Use the defined matrix key name (`lsc_image_path`). Line 123 references `matrix.lsc-image_path`, but the matrix defines `lsc_image_path` (Line 18). This causes expression resolution failure.

🔧 Proposed fix
```diff
-          echo "  LSC image: ${{ matrix.lsc-image_path }}"
+          echo "  LSC image: ${{ matrix.lsc_image_path }}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/e2e_tests.yaml at line 123, The workflow references the wrong matrix key name; update the echo step that uses matrix.lsc-image_path to use the defined matrix key matrix.lsc_image_path (replace the hyphen with an underscore) so the expression resolves correctly (look for the echo line that prints "LSC image: ${{ matrix.lsc-image_path }}" and change it to use matrix.lsc_image_path).

tests/integration/test_evaluation_data.yaml (1)
14-18: ⚠️ Potential issue | 🟠 Major

Fix metric metadata key typo to ensure threshold is applied. Line 17 uses `ragas:respose_relevancy`, but Line 15 configures `ragas:response_relevancy`. This mismatch can cause the threshold config to be ignored.

🔧 Proposed fix
```diff
 turn_metrics_metadata:
-  ragas:respose_relevancy:
+  ragas:response_relevancy:
     threshold: 0.9
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_evaluation_data.yaml` around lines 14 - 18, There's a typo in the metric metadata key: change "ragas:respose_relevancy" to "ragas:response_relevancy" so the metadata key matches the metric listed in turn_metrics (ragas:response_relevancy) and the threshold: 0.9 is applied; update the YAML key name in the block under turn_metrics_metadata to the corrected spelling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/e2e_tests.yaml:
- Around line 4-38: The workflow uses pull_request_target and checks out PR head
code while passing secrets, which risks secret exfiltration: change the event
from pull_request_target to pull_request (or split jobs so untrusted PRs run
without secrets), ensure actions/checkout is using the safe ref for pull_request
(use github.event.pull_request.head.repo vs checkout of head only when safe),
and remove or avoid passing repository secrets like OPENAI_API_KEY into jobs
that run untrusted PR code; additionally correct the matrix key typo by
replacing any use of matrix.lsc-image_path (hyphen) with matrix.lsc_image_path
(underscore) so the image path resolves correctly (look for references to
LSC_IMAGE_NAME, OPENAI_API_KEY, and the matrix definition).
---
Duplicate comments:
In @.github/workflows/e2e_tests.yaml:
- Line 123: The workflow references the wrong matrix key name; update the echo
step that uses matrix.lsc-image_path to use the defined matrix key
matrix.lsc_image_path (replace the hyphen with an underscore) so the expression
resolves correctly (look for the echo line that prints "LSC image: ${{
matrix.lsc-image_path }}" and change it to use matrix.lsc_image_path).
In `@tests/integration/test_evaluation_data.yaml`:
- Around line 14-18: There's a typo in the metric metadata key: change
"ragas:respose_relevancy" to "ragas:response_relevancy" so the metadata key
matches the metric listed in turn_metrics (ragas:response_relevancy) and the
threshold: 0.9 is applied; update the YAML key name in the block under
turn_metrics_metadata to the corrected spelling.
```yaml
on: [push, pull_request_target]

jobs:
  ##########
  e2e_tests:
    runs-on: ubuntu-latest

    #name: "Lightspeed-stack setup"

    strategy:
      # For local testing use matrix with just one variant, "act" doesn't separate runs
      matrix:
        mode: ["query", "streaming"]
        eval-data: ["tests/integration/test_evaluation_data.yaml"]
        lsc_image_path: ["quay.io/lightspeed-core/lightspeed-stack:latest"]

    name: "E2E Lightspeed Evaluation Test, mode: ${{ matrix.mode }}"

    env:
      LSC_IMAGE_NAME: "lightspeed-stack-test-mode-${{ matrix.mode }}"

    steps:
      # Stolen from lightspeed-stack
      - uses: actions/checkout@v4
        with:
          # On PR_TARGET → the fork (or same repo) that opened the PR.
          # On push → falls back to the current repository.
          repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}

          # On PR_TARGET → the PR head *commit* (reproducible).
          # On push → the pushed commit that triggered the workflow.
          ref: ${{ github.event.pull_request.head.ref || github.sha }}

          # Don’t keep credentials when running untrusted PR code under PR_TARGET.
          persist-credentials: ${{ github.event_name != 'pull_request_target' }}
```
🧩 Analysis chain
🏁 Script executed:
```shell
cat -n .github/workflows/e2e_tests.yaml
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 6224
🏁 Script executed:
```shell
# Verify the matrix key usage - check if line 123 really has the typo
sed -n '123p' .github/workflows/e2e_tests.yaml
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 139
Avoid executing untrusted PR code with repository secrets.
Using pull_request_target (Line 4), checking out PR head code (Lines 31-35), and passing OPENAI_API_KEY (Lines 64, 118) allows secret exfiltration from forked PRs.
Additionally, the matrix variable reference on line 123 uses the wrong key name (lsc-image_path with hyphen instead of lsc_image_path with underscore), causing an empty value to be logged.
🔒 Safer workflow pattern
```diff
-on: [push, pull_request_target]
+on:
+  push:
+  pull_request:
@@
-          repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
+          repository: ${{ github.repository }}
@@
-          ref: ${{ github.event.pull_request.head.ref || github.sha }}
+          ref: ${{ github.event.pull_request.head.sha || github.sha }}
```

Also fix line 123: change `matrix.lsc-image_path` to `matrix.lsc_image_path`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/e2e_tests.yaml around lines 4 - 38, The workflow uses
pull_request_target and checks out PR head code while passing secrets, which
risks secret exfiltration: change the event from pull_request_target to
pull_request (or split jobs so untrusted PRs run without secrets), ensure
actions/checkout is using the safe ref for pull_request (use
github.event.pull_request.head.repo vs checkout of head only when safe), and
remove or avoid passing repository secrets like OPENAI_API_KEY into jobs that
run untrusted PR code; additionally correct the matrix key typo by replacing any
use of matrix.lsc-image_path (hyphen) with matrix.lsc_image_path (underscore) so
the image path resolves correctly (look for references to LSC_IMAGE_NAME,
OPENAI_API_KEY, and the matrix definition).
Force-pushed from 1bc8ee3 to 9a1a40c
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/e2e_tests.yaml:
- Around line 58-66: The docker run invocation currently includes the
interactive flags "-it" while also using "--detach", which can cause CI issues;
remove the "-it" flags from the docker run command (the block that starts with
"docker run --name $LSC_IMAGE_NAME ... --detach ${{ matrix.lsc_image_path }}")
so the container runs detached without allocating a pseudo-TTY or keeping STDIN
open, leaving all other options (-p, -v, -e, --name, --detach) unchanged.
- Around line 128-138: The test check assumes a single summary file but the glob
eval_output/evaluation_*_summary.json may match zero or multiple files; replace
the current OUT_FILE assignment with a safe existence/count check: build an
array of matches (files=(eval_output/evaluation_*_summary.json)), verify that
${#files[@]} equals 1 and error out if not, then set OUT_FILE="${files[0]}" and
run PASS=$(jq .summary_stats.overall.PASS "$OUT_FILE") (quoted) and handle jq
failure by exiting non-zero; update references to OUT_FILE and PASS accordingly.
In `@tests/integration/test_evaluation_data.yaml`:
- Around line 16-18: Fix the typo in the metric key under turn_metrics_metadata:
replace the incorrect "ragas:respose_relevancy" with "ragas:response_relevancy"
so the threshold metadata is applied consistently (matches the existing
"ragas:response_relevancy" key).
📒 Files selected for processing (7)
- .github/workflows/e2e_tests.yaml
- .gitignore
- tests/integration/lightspeed-stack.yaml
- tests/integration/run.yaml
- tests/integration/system-config-query.yaml
- tests/integration/system-config-streaming.yaml
- tests/integration/test_evaluation_data.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/integration/lightspeed-stack.yaml
- tests/integration/system-config-query.yaml
- .gitignore
```shell
          docker run \
            --name $LSC_IMAGE_NAME \
            -it \
            -p 8080:8080 \
            -v $(pwd)/tests/integration/lightspeed-stack.yaml:/app-root/lightspeed-stack.yaml:Z \
            -v $(pwd)/tests/integration/run.yaml:/app-root/run.yaml:Z \
            -e OPENAI_API_KEY="${{ secrets.OPENAI_API_KEY }}" \
            --detach \
            ${{ matrix.lsc_image_path }}
```
Remove -it flag from detached container.
The -it flags allocate a pseudo-TTY and keep STDIN open, which is unnecessary for a detached container in CI and can cause unexpected behavior.
🔧 Proposed fix
```diff
           docker run \
             --name $LSC_IMAGE_NAME \
-            -it \
             -p 8080:8080 \
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
          docker run \
            --name $LSC_IMAGE_NAME \
            -p 8080:8080 \
            -v $(pwd)/tests/integration/lightspeed-stack.yaml:/app-root/lightspeed-stack.yaml:Z \
            -v $(pwd)/tests/integration/run.yaml:/app-root/run.yaml:Z \
            -e OPENAI_API_KEY="${{ secrets.OPENAI_API_KEY }}" \
            --detach \
            ${{ matrix.lsc_image_path }}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/e2e_tests.yaml around lines 58 - 66, The docker run
invocation currently includes the interactive flags "-it" while also using
"--detach", which can cause CI issues; remove the "-it" flags from the docker
run command (the block that starts with "docker run --name $LSC_IMAGE_NAME ...
--detach ${{ matrix.lsc_image_path }}") so the container runs detached without
allocating a pseudo-TTY or keeping STDIN open, leaving all other options (-p,
-v, -e, --name, --detach) unchanged.
```shell
      - name: Check test result
        run: |
          OUT_FILE=$( echo eval_output/evaluation_*_summary.json )
          PASS=$( jq .summary_stats.overall.PASS $OUT_FILE )
          EXPECTED="1"
          if [ ${PASS} != ${EXPECTED} ] ; then
            echo "============"
            echo "Wrong PASS number in ${OUT_FILE}: got ${PASS}, expected ${EXPECTED}"
            echo "============"
            exit 1
          fi
```
Handle multiple or missing summary files gracefully.
The glob eval_output/evaluation_*_summary.json could match multiple files or none, causing jq to fail or read the wrong file.
🔧 Proposed fix with explicit file count check
```diff
       - name: Check test result
         run: |
-          OUT_FILE=$( echo eval_output/evaluation_*_summary.json )
-          PASS=$( jq .summary_stats.overall.PASS $OUT_FILE )
+          shopt -s nullglob
+          FILES=(eval_output/evaluation_*_summary.json)
+          if [ ${#FILES[@]} -ne 1 ]; then
+            echo "Expected exactly 1 summary file, found ${#FILES[@]}"
+            ls -la eval_output/
+            exit 1
+          fi
+          OUT_FILE="${FILES[0]}"
+          PASS=$( jq .summary_stats.overall.PASS "$OUT_FILE" )
           EXPECTED="1"
-          if [ ${PASS} != ${EXPECTED} ] ; then
+          if [ "${PASS}" != "${EXPECTED}" ] ; then
```
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/e2e_tests.yaml around lines 128 - 138, The test check
assumes a single summary file but the glob eval_output/evaluation_*_summary.json
may match zero or multiple files; replace the current OUT_FILE assignment with a
safe existence/count check: build an array of matches
(files=(eval_output/evaluation_*_summary.json)), verify that ${#files[@]} equals
1 and error out if not, then set OUT_FILE="${files[0]}" and run PASS=$(jq
.summary_stats.overall.PASS "$OUT_FILE") (quoted) and handle jq failure by
exiting non-zero; update references to OUT_FILE and PASS accordingly.
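The count-checked approach can be exercised as a standalone sketch. The script below fabricates one summary file so it runs anywhere, and it substitutes python3 for jq purely to keep the demo dependency-free (the workflow itself uses jq); all paths here are illustrative:

```shell
#!/usr/bin/env bash
# Demo of the suggested nullglob + count check (illustrative, not the workflow step itself).
set -euo pipefail
shopt -s nullglob

# Fabricate one summary file so the demo is self-contained.
mkdir -p eval_output
printf '{"summary_stats":{"overall":{"PASS":1}}}\n' > eval_output/evaluation_demo_summary.json

FILES=(eval_output/evaluation_*_summary.json)
if [ "${#FILES[@]}" -ne 1 ]; then
  echo "Expected exactly 1 summary file, found ${#FILES[@]}" >&2
  exit 1
fi
OUT_FILE="${FILES[0]}"

# python3 stands in for jq here so the demo has no extra dependencies.
PASS=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["summary_stats"]["overall"]["PASS"])' "$OUT_FILE")
if [ "${PASS}" != "1" ]; then
  echo "Wrong PASS number in ${OUT_FILE}: got ${PASS}, expected 1" >&2
  exit 1
fi
echo "PASS check OK: ${OUT_FILE}"
```

With `nullglob` set, a glob with no matches expands to an empty array instead of the literal pattern, which is what makes the count check meaningful.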
Force-pushed from 9a1a40c to f575116
Actionable comments posted: 1
♻️ Duplicate comments (3)
.github/workflows/e2e_tests.yaml (3)
128-138: ⚠️ Potential issue | 🟡 Minor

Handle multiple or missing summary files gracefully. The glob `eval_output/evaluation_*_summary.json` (line 130) could match multiple files or none, causing `jq` to fail or read incorrect data. The variable should be quoted in the comparison (line 133).

🔧 Proposed fix with explicit file count check
```diff
       - name: Check test result
         run: |
-          OUT_FILE=$( echo eval_output/evaluation_*_summary.json )
-          PASS=$( jq .summary_stats.overall.PASS $OUT_FILE )
+          shopt -s nullglob
+          FILES=(eval_output/evaluation_*_summary.json)
+          if [ ${#FILES[@]} -ne 1 ]; then
+            echo "Expected exactly 1 summary file, found ${#FILES[@]}"
+            ls -la eval_output/
+            exit 1
+          fi
+          OUT_FILE="${FILES[0]}"
+          PASS=$( jq .summary_stats.overall.PASS "$OUT_FILE" )
           EXPECTED="1"
-          if [ ${PASS} != ${EXPECTED} ] ; then
+          if [ "${PASS}" != "${EXPECTED}" ] ; then
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/e2e_tests.yaml around lines 128 - 138, The script assumes a single matching summary file and uses unquoted vars in the test; change to explicitly collect matches into an array (e.g., files=(eval_output/evaluation_*_summary.json)), check that ${#files[@]} equals 1 and fail with a clear message if zero or multiple matches, then set OUT_FILE="${files[0]}" and call PASS=$(jq .summary_stats.overall.PASS "$OUT_FILE"); also quote variables in the comparison: if [ "${PASS}" != "${EXPECTED}" ] ; then ... to avoid word-splitting and ensure correct behavior.
58-66: ⚠️ Potential issue | 🟡 Minor

Remove `-it` flag from detached container. The `-it` flags (line 60) allocate a pseudo-TTY and keep STDIN open, which is unnecessary and potentially problematic for a detached container in CI.

🔧 Proposed fix
```diff
           docker run \
             --name $LSC_IMAGE_NAME \
-            -it \
             -p 8080:8080 \
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/e2e_tests.yaml around lines 58 - 66, Remove the unnecessary interactive flags from the docker run invocation: in the docker run block that creates the container named via $LSC_IMAGE_NAME and uses ${{ matrix.lsc_image_path }}, delete the "-it" option so the container is started detached without allocating a pseudo-TTY or keeping STDIN open.
4-4: ⚠️ Potential issue | 🔴 Critical

Security risk: `pull_request_target` with PR head checkout and secrets. This workflow uses `pull_request_target` (line 4), checks out the PR head code (lines 31-35), and passes the `OPENAI_API_KEY` secret to both the container (line 64) and test execution (line 118). This combination allows malicious PRs from forks to exfiltrate secrets.

🔒 Safer workflow pattern

```diff
-on: [push, pull_request_target]
+on: [push, pull_request]
```

With `pull_request`, secrets are not available for fork PRs, preventing exfiltration while still running tests on trusted pushes.
🧹 Nitpick comments (2)
.github/workflows/e2e_tests.yaml (2)
141-148: Container cleanup only runs locally, not on GitHub Actions. The cleanup step (lines 142-148) only executes when `env.ACT` is set, meaning containers are orphaned on actual GitHub Actions runs. While GitHub-hosted runners are ephemeral, explicitly cleaning up is a good practice and helps with self-hosted runners.

♻️ Proposed fix to always clean up
```diff
       # Cleanup
-      - name: Stop the LSC if in local devel
-        if: ${{ always() && env.ACT }}
+      - name: Stop the LSC
+        if: always()
         run: |
           echo "Stopping LSC container $LSC_IMAGE_NAME"
           echo "++++++++++++++++++++++"
           docker stop $LSC_IMAGE_NAME || true
           docker rm $LSC_IMAGE_NAME || true
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/e2e_tests.yaml around lines 141 - 148, The cleanup job currently only runs when env.ACT is set (if: ${{ always() && env.ACT }}), leaving containers orphaned on non-local GH Actions; change the step "Stop the LSC if in local devel" to run unconditionally after the job by removing the env.ACT conditional and using always() (e.g., if: ${{ always() }}), so the docker stop/rm commands that reference LSC_IMAGE_NAME run on every workflow run including GitHub-hosted and self-hosted runners.
76-90: Consider adding `--silent` to health check curl. The curl command on line 80 may pollute logs with progress output. Adding `--silent` keeps logs cleaner while `--fail` still ensures proper exit codes.

♻️ Proposed fix
```diff
-          if curl --output /dev/null --fail http://localhost:8080/v1/models ; then
+          if curl --silent --output /dev/null --fail http://localhost:8080/v1/models ; then
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/e2e_tests.yaml around lines 76 - 90, The curl health-check in the "Wait for the LSC" loop uses "curl --fail http://localhost:8080/v1/models" and can emit progress output into logs; update the curl invocation in that for-loop to include the --silent flag (i.e., use --silent --fail) so it still fails on non-2xx responses but suppresses progress output while polling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/run.yaml`:
- Around line 50-53: The braintrust provider block (provider_id: braintrust,
provider_type: inline::braintrust) is using a placeholder openai_api_key
('********') which will cause silent failures in e2e runs; replace the hardcoded
placeholder with an environment variable reference (e.g., use a runtime
substitution like BRAINTRUST_OPENAI_API_KEY) so tests pick up a real key, or
explicitly remove/disable the braintrust provider entry and add a comment or
test config flag stating that braintrust scoring is not exercised in these
integration tests to avoid running it with an invalid key.
---
Duplicate comments:
In @.github/workflows/e2e_tests.yaml:
- Around line 128-138: The script assumes a single matching summary file and
uses unquoted vars in the test; change to explicitly collect matches into an
array (e.g., files=(eval_output/evaluation_*_summary.json)), check that
${#files[@]} equals 1 and fail with a clear message if zero or multiple matches,
then set OUT_FILE="${files[0]}" and call PASS=$(jq .summary_stats.overall.PASS
"$OUT_FILE"); also quote variables in the comparison: if [ "${PASS}" !=
"${EXPECTED}" ] ; then ... to avoid word-splitting and ensure correct behavior.
- Around line 58-66: Remove the unnecessary interactive flags from the docker
run invocation: in the docker run block that creates the container named via
$LSC_IMAGE_NAME and uses ${{ matrix.lsc_image_path }}, delete the "-it" option
so the container is started detached without allocating a pseudo-TTY or keeping
STDIN open.
---
Nitpick comments:
In @.github/workflows/e2e_tests.yaml:
- Around line 141-148: The cleanup job currently only runs when env.ACT is set
(if: ${{ always() && env.ACT }}), leaving containers orphaned on non-local GH
Actions; change the step "Stop the LSC if in local devel" to run unconditionally
after the job by removing the env.ACT conditional and using always() (e.g., if:
${{ always() }}), so the docker stop/rm commands that reference LSC_IMAGE_NAME
run on every workflow run including GitHub-hosted and self-hosted runners.
- Around line 76-90: The curl health-check in the "Wait for the LSC" loop uses
"curl --fail http://localhost:8080/v1/models" and can emit progress output into
logs; update the curl invocation in that for-loop to include the --silent flag
(i.e., use --silent --fail) so it still fails on non-2xx responses but
suppresses progress output while polling.
📒 Files selected for processing (7)
- .github/workflows/e2e_tests.yaml
- .gitignore
- tests/integration/lightspeed-stack.yaml
- tests/integration/run.yaml
- tests/integration/system-config-query.yaml
- tests/integration/system-config-streaming.yaml
- tests/integration/test_evaluation_data.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/integration/system-config-query.yaml
- tests/integration/system-config-streaming.yaml
- .gitignore
```yaml
  - provider_id: braintrust
    provider_type: inline::braintrust
    config:
      openai_api_key: '********'
```
Placeholder API key may cause silent failures.
The braintrust provider has openai_api_key: '********' which appears to be a placeholder. If this provider is exercised during tests, it will likely fail. Either provide a valid key via environment variable substitution (similar to line 25), or document that braintrust scoring is not exercised in these e2e tests.
🔧 Suggested fix using environment variable
```diff
   - provider_id: braintrust
     provider_type: inline::braintrust
     config:
-      openai_api_key: '********'
+      openai_api_key: ${env.OPENAI_API_KEY:=}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/run.yaml` around lines 50 - 53, The braintrust provider
block (provider_id: braintrust, provider_type: inline::braintrust) is using a
placeholder openai_api_key ('********') which will cause silent failures in e2e
runs; replace the hardcoded placeholder with an environment variable reference
(e.g., use a runtime substitution like BRAINTRUST_OPENAI_API_KEY) so tests pick
up a real key, or explicitly remove/disable the braintrust provider entry and
add a comment or test config flag stating that braintrust scoring is not
exercised in these integration tests to avoid running it with an invalid key.
Description
Adding e2e test with Lightspeed Stack
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit