Conversation
Pull request overview
Adds a lightweight, scenario-based shell test framework for validating a running foc-devnet, and wires it into CI (with optional GitHub issue reporting) plus local shell/Rust pre-commit checks.
Changes:
- Introduce `scenarios/` runner, shared assertion library, and initial scenarios (containers + balances).
- Extend CI to run scenario tests (nightly + optional workflow dispatch reporting) and add a shell lint/format job.
- Add repo-local git hooks (`.githooks/`) with an installer and a pre-commit quality gate.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| scenarios/lib.sh | Shared helpers for scenario tests (assertions, devnet-info access, result tracking). |
| scenarios/run.sh | Runs scenarios in order, writes a markdown report, optionally opens a GitHub issue. |
| scenarios/order.sh | Defines the serial execution order for scenarios. |
| scenarios/test_containers.sh | Validates expected containers are running and attempts to detect unexpected ones. |
| scenarios/test_basic_balances.sh | Checks FIL + MockUSDFC balances for all users (and installs Foundry if needed). |
| README_ADVANCED.md | Documentation for running/writing scenarios and CI behavior. |
| .github/workflows/ci.yml | Adds nightly/dispatch triggers, shellcheck/shfmt job, scenario execution + artifact upload, and issue-write permission. |
| .githooks/pre-commit | Pre-commit checks for Rust + shell (with optional auto-fix mode). |
| .githooks/install.sh | Installs hooks by setting core.hooksPath to .githooks/. |
scenarios/test_basic_balances.sh
Outdated
```bash
# ── Ensure Foundry is available ──────────────────────────────
if ! command -v cast &>/dev/null; then
    info "Installing Foundry …"
    curl -sSL https://foundry.paradigm.xyz | bash
```

After installing Foundry via `curl … | bash`, `cast` may still not be available, because the standard install flow requires running `foundryup` (the builder Dockerfile does this explicitly). Without that, `assert_ok command -v cast` can fail even though the install command succeeded. Run `foundryup` after installation (and ensure the PATH update is in place), or alternatively use the `cast` binary from the existing `foc-<run_id>-builder` container instead of installing on the host.

Suggested change:

```diff
 curl -sSL https://foundry.paradigm.xyz | bash
+"$HOME/.foundry/bin/foundryup"
```
scenarios/test_basic_balances.sh
Outdated
```bash
SCENARIO_DIR="$(cd "$(dirname "$0")" && pwd)"
source "${SCENARIO_DIR}/lib.sh"
scenario_start "basic_balances"
```

The scenario name used for reporting (`scenario_start "basic_balances"`) doesn't match the script basename (`test_basic_balances`) used by the runner. This causes the per-scenario "Details" table (from `results.csv`) and the "Failed scenarios" list (from the runner) to use different identifiers, making reports harder to correlate. Consider using the same name as the entry in `order.sh`.

Suggested change:

```diff
-scenario_start "basic_balances"
+scenario_start "test_basic_balances"
```
scenarios/lib.sh
Outdated
```bash
# ── devnet-info helpers ──────────────────────────────────────

# Shorthand: jq_devnet '.info.users[0].evm_addr'
jq_devnet() { jq -r "$1" "$DEVNET_INFO"; }
```

Because scripts run with `set -e`, any unexpected non-zero exit from `jq`, `docker`, `cast`, etc. will terminate the scenario immediately and skip `scenario_end`, which means `results.csv` won't be updated and the runner report can miss details. Consider adding a trap-based fallback that records a failure on EXIT/ERR, and/or wrapping external commands in assertion helpers so failures increment `_FAIL` instead of aborting the script.

Suggested change:

```diff
-jq_devnet() { jq -r "$1" "$DEVNET_INFO"; }
+jq_devnet() {
+    if ! jq -r "$1" "$DEVNET_INFO"; then
+        fail "jq_devnet: jq failed for filter '$1' on '$DEVNET_INFO'"
+        return 1
+    fi
+}
```
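To illustrate the trap-based fallback mentioned above, here is a minimal, self-contained sketch. The names (`_SCENARIO`, `_DONE`, `scenario_end`) and the echoed stand-in for the `results.csv` writer are illustrative assumptions, not the PR's actual `lib.sh` code:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: an EXIT trap that records a result even when
# `set -e` aborts the scenario before scenario_end runs.
set -euo pipefail

_SCENARIO="demo"
_DONE=0

scenario_end() {
    _DONE=1
    echo "${_SCENARIO},pass"   # stand-in for the real results.csv writer
}

_on_exit() {
    # If scenario_end never ran (e.g. set -e aborted the script),
    # still emit a failure row so the runner report is complete.
    if [[ "$_DONE" -eq 0 ]]; then
        echo "${_SCENARIO},fail (aborted early)"
    fi
}
trap _on_exit EXIT

# A command guarded so a non-zero exit cannot abort the scenario:
missing="$(grep -c nonexistent /dev/null || true)"
echo "guarded command ok (matches: ${missing})"

scenario_end
```

With this in place, an unguarded `jq`/`docker` failure still produces a `fail` row via the trap instead of silently vanishing from the report.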
scenarios/test_containers.sh
Outdated
```bash
# ── Check no unexpected foc-c-* containers are running ───────
# All foc-c-* containers should belong to the expected set
RUNNING=$(docker ps --filter "name=foc-c-" --format '{{.Names}}')
```

The "unexpected container" check won't match the actual container naming scheme. Containers are named like `foc-<run_id>-curio-<idx>` / `foc-<run_id>-lotus` (see `src/docker/containers.rs`), so `docker ps --filter "name=foc-c-"` will typically return nothing and zombie containers won't be detected. Consider filtering by the run-scoped prefix from devnet-info (e.g., `foc-${run_id}-`) or at least `name=foc-` and then comparing against the expected set (allowlisting known extras like portainer/builder if needed).

Suggested change:

```diff
-# ── Check no unexpected foc-c-* containers are running ───────
-# All foc-c-* containers should belong to the expected set
-RUNNING=$(docker ps --filter "name=foc-c-" --format '{{.Names}}')
+# ── Check no unexpected foc-* containers are running ─────────
+# All foc-* containers for this devnet run should belong to the expected set.
+# Prefer the run-scoped prefix from devnet-info when available, fall back to foc-.
+RUN_ID="$(jq_devnet '.info.run_id // ""')"
+if [[ -n "$RUN_ID" ]]; then
+    NAME_FILTER="foc-${RUN_ID}-"
+else
+    NAME_FILTER="foc-"
+fi
+RUNNING=$(docker ps --filter "name=${NAME_FILTER}" --format '{{.Names}}')
```
scenarios/test_containers.sh
Outdated
```bash
SCENARIO_DIR="$(cd "$(dirname "$0")" && pwd)"
source "${SCENARIO_DIR}/lib.sh"
scenario_start "containers"
```

The scenario name used for reporting (`scenario_start "containers"`) doesn't match the script basename (`test_containers`) used by the runner. This causes the per-scenario "Details" table (from `results.csv`) and the "Failed scenarios" list (from the runner) to use different identifiers, making reports harder to correlate. Consider using the same name as the entry in `order.sh`.

Suggested change:

```diff
-scenario_start "containers"
+scenario_start "test_containers"
```
scenarios/run.sh

```bash
REPORTING="${REPORTING:-false}"
SKIP_REPORT_ON_PASS="${SKIP_REPORT_ON_PASS:-true}"

# ── Bootstrap ────────────────────────────────────────────────
```

`run.sh` assumes `${REPORT_DIR}` exists when writing the markdown report (redirecting into `${REPORT_DIR}/scenario_<ts>.md`). If the directory doesn't exist (e.g., a fresh machine without `~/.foc-devnet/state/latest` yet), the runner will fail after executing scenarios. Consider `mkdir -p "${REPORT_DIR}"` during bootstrap before cleaning/writing report artifacts.

Suggested change:

```diff
 # ── Bootstrap ────────────────────────────────────────────────
+# Ensure report directory exists before cleaning/writing artifacts
+mkdir -p "${REPORT_DIR}"
```
.githooks/pre-commit
Outdated
```bash
# ── Rust checks ──────────────────────────────────────────────
HAS_RS=$(echo "$STAGED" | grep -c '\.rs$' || true)
if [[ $HAS_RS -gt 0 ]]; then
    if command -v cargo &>/dev/null; then
        if cargo fmt --all -- --check &>/dev/null; then
            pass "cargo fmt"
        elif [[ "$FIX" == "1" ]]; then
            cargo fmt --all
            git diff --name-only -- '*.rs' | xargs -r git add
```

In FIX mode, the Rust formatter restages all modified `*.rs` files (`git diff --name-only -- '*.rs' | xargs …`), including unstaged changes that weren't part of the commit. This can accidentally add unrelated work to the commit. Consider re-staging only the originally staged Rust files (or only the files modified by `cargo fmt` among the staged set).

Suggested change:

```diff
+STAGED_RS=$(echo "$STAGED" | grep '\.rs$' || true)
 # ── Rust checks ──────────────────────────────────────────────
-HAS_RS=$(echo "$STAGED" | grep -c '\.rs$' || true)
+HAS_RS=$(echo "$STAGED_RS" | grep -c '\.rs$' || true)
 if [[ $HAS_RS -gt 0 ]]; then
     if command -v cargo &>/dev/null; then
         if cargo fmt --all -- --check &>/dev/null; then
             pass "cargo fmt"
         elif [[ "$FIX" == "1" ]]; then
             cargo fmt --all
-            git diff --name-only -- '*.rs' | xargs -r git add
+            echo "$STAGED_RS" | xargs -r git add
```
scenarios/lib.sh
Outdated
```bash
info() { _log "[INFO]" "$*"; }
ok() {
    _log "[ OK ]" "$*"
    ((_PASS++)) || true
}
fail() {
    _log "[FAIL]" "$*"
```

Logging currently produces doubled brackets (e.g., `info` passes `"[INFO]"`, but `_log` wraps the level in brackets again). This results in output like `[[INFO]] message`. Consider passing plain level names (e.g., `INFO`, `OK`, `FAIL`) or removing the extra brackets in `_log` so log lines are consistent and readable.

Suggested change:

```diff
-info() { _log "[INFO]" "$*"; }
-ok() {
-    _log "[ OK ]" "$*"
-    ((_PASS++)) || true
-}
-fail() {
-    _log "[FAIL]" "$*"
+info() { _log "INFO" "$*"; }
+ok() {
+    _log " OK " "$*"
+    ((_PASS++)) || true
+}
+fail() {
+    _log "FAIL" "$*"
```
scenarios/test_containers.sh
Outdated
```bash
# Verifies that all foc-* containers reported in devnet-info.json
# are actually running, healthy, and that no zombie foc-*
# containers exist outside the current run.
```

The header comment says this script verifies containers are "healthy" and that no zombie `foc-*` containers exist, but the implementation only checks `.State.Status == running` and only scans for names matching `foc-c-`. Either update the comment to match the actual checks, or extend the implementation to check Docker health status (when present) and to scan the intended `foc-` namespace.
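As a sketch of the extended check, the status/health logic can be isolated in a small helper. The function name and the idea of feeding it values obtained from `docker inspect` are assumptions, not the PR's actual code; the canned arguments below make the logic testable without Docker:

```shell
# Hypothetical helper: a container counts as OK when its status is
# "running" and, if a healthcheck is defined, the health status is
# "healthy". In a real scenario the two arguments would come from e.g.:
#   docker inspect --format \
#     '{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{end}}' "$name"
container_ok() {
    local status="$1" health="${2:-none}"   # "none" = no healthcheck defined
    [[ "$status" == "running" ]] || return 1
    [[ "$health" == "none" || "$health" == "healthy" ]]
}

container_ok running healthy  && echo "healthy container: ok"
container_ok running          && echo "no healthcheck: ok"
container_ok exited           || echo "stopped container: flagged"
container_ok running starting || echo "still starting: flagged"
```

Treating "no healthcheck defined" as OK keeps the check usable for images that don't declare a HEALTHCHECK, while still flagging `starting`/`unhealthy` states.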
scenarios/test_basic_balances.sh
Outdated
```bash
# ── Ensure Foundry is available ──────────────────────────────
if ! command -v cast &>/dev/null; then
    info "Installing Foundry …"
    curl -sSL https://foundry.paradigm.xyz | bash
```

This script downloads and executes remote code via `curl -sSL https://foundry.paradigm.xyz | bash` without any integrity or authenticity verification. Because scenarios run automatically in CI (with access to repository contents and CI secrets), a compromise of foundry.paradigm.xyz or its DNS/TLS could lead to arbitrary code execution in your pipeline and exfiltration of secrets. To mitigate this supply-chain risk, avoid `curl | bash` installers in CI and instead use a pinned Foundry version with checksum/signature verification or a trusted package/source baked into the CI image.
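A minimal sketch of the checksum step in a pinned-download flow. The helper name is hypothetical, and the pinned version plus expected digest would be committed to the repo; here a local file is verified so the logic itself is demonstrable:

```shell
#!/usr/bin/env bash
# Hypothetical checksum gate: verify a downloaded artifact against a
# digest committed to the repo before executing anything from it.
verify_sha256() {
    local file="$1" expected="$2"
    local actual
    actual="$(sha256sum "$file" | awk '{print $1}')"
    if [[ "$actual" != "$expected" ]]; then
        echo "checksum mismatch for ${file}" >&2
        return 1
    fi
}

tmp="$(mktemp)"
printf 'hello' >"$tmp"
# Well-known SHA-256 of the string "hello":
expected="2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
verify_sha256 "$tmp" "$expected" && echo "checksum ok"
rm -f "$tmp"
```

In CI the same gate would run between downloading the pinned release archive and unpacking it, so a tampered download fails loudly instead of executing.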
BigLep
left a comment
Thanks for getting the draft out. Things coming to mind:

- Can we expand #8 with the set of steps/tasks we're going to do, to make sure we're on the same page about the big picture? (This could be actual sub-issues or just a checklist task list.)
- I want to make sure we're accounting for the ability to run tests against the latest set of tagged releases and the latest changes in "main" branches. (I know that isn't part of this PR, but I want to make sure we are tracking this need.)
- I want to make sure we exercise the E2E Synapse flow, including the two-copy version. (Again, not part of this PR, but I want to make sure that is in scope and we're tracking it.)
- Why bash scripting vs. a more robust language that we're more familiar reading and working with? I appreciate that you're enforcing formatting, static analysis, and some library functions, but it also seems like we're using a language that is not developer-ergonomic. (If other devs like @rvagg like this stuff in Bash, then I'll defer, but I generally shy away from long bash scripts for anything I have to maintain long term.) If we keep Bash, could we add a sentence or two in README_ADVANCED.md on why we chose shell over Node/Python for the runner? I think that would help future contributors.
What is the value of adding git hooks if we can't rely on them being installed and we're running these checks in CI anyway? Does it speed up development? Since this basically duplicates what is set up in CI, let's add comments explaining why we're doing this.
On my system, the hooks fix things like fmt before commits. These do not strictly duplicate CI, but there is an argument here. We could instead deduplicate and run the same script with FIX=0 and FIX=1, I guess.
```bash
if [[ "$SHOULD_REPORT" == "true" ]]; then
    RUN_URL="${GITHUB_SERVER_URL:-https://github.com}/${GITHUB_REPOSITORY:-unknown}/actions/runs/${GITHUB_RUN_ID:-0}"
    STATUS_EMOJI="✅"
    [[ $FAILED -gt 0 ]] && STATUS_EMOJI="❌"
    ISSUE_TITLE="${STATUS_EMOJI} Scenario report: ${PASSED}/${TOTAL} passed ($(date -u +%Y-%m-%d))"
    ISSUE_BODY="$(cat "$REPORT")
```
I think we want to dedupe the issue creation. Basically, if we fail 3 nights in a row, I don't think we want 3 issues. We want one issue with two additional comments. That way there's less to manage and the dialog happens in one place. I believe that is what happens in https://github.com/filecoin-project/lotus/blob/master/.github/workflows/very-expensive-test.yml
Got it! Will add that (ipdxco/create-or-update-issue@v1) to the workflow, potentially in the Python variant if that seems better.
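The dedup decision is roughly the following (the label name and the `gh` invocation in the comments are assumptions; ipdxco/create-or-update-issue@v1 would handle this internally):

```shell
# Hypothetical create-or-comment decision for nightly failure reports.
# The open-issue count would come from something like:
#   gh issue list --label scenario-nightly --state open --json number --jq length
decide_issue_action() {
    local open_count="$1"
    if [[ "$open_count" -gt 0 ]]; then
        echo "comment"   # append tonight's report to the existing issue
    else
        echo "create"    # open a fresh tracking issue
    fi
}

decide_issue_action 0   # first failing night
decide_issue_action 2   # subsequent failing nights
```

This keeps one long-lived issue per failure streak, with each subsequent night's report attached as a comment.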
Yes, that will be part of this PR or another PR targeting this branch (haven't made up my mind yet). It essentially needs foc-devnet to understand that it needs to figure out the latest tag / latest commits. Maybe that can be just a flag; these will be used within the "scheduled" runs every night. As for the number of tests, I feel the following should make sense as a first set of items. I am trying to keep this PR focused on getting something running in terms of framework; the exact coverage and number of tests can be improved over time, albeit still being tracked in #8.

I just intended this to be "not Rust", so that almost everyone can contribute, with or without knowledge of Rust. I had a conversation with @rvagg yesterday and, as per my understanding, he is okay with shell scripts. But I agree with your point that bash may not be the most ergonomic. At the same time, thinking of the local dev experience (for reproduction of issues): shell scripts run straight out of the box, need no additional toolchain even locally, and are often smaller / more compact, albeit not the simplest to understand (if you opt into the nastiness of bash). If each test is kept small and separate, with foc-devnet start / stop being done in Rust, then I think what remains taps not so much into the high complexity of shell scripts and should be easy enough to understand. I feel it is right to assume these scenario tests will stay small (<150 lines per test) and independent, and AI systems can quickly understand them. Hence, for allowing more developer participation, this may not be the perfect middle ground, but an okay one. In the worst case, if a scenario does need Python / JS because we are doing something complex, that specific scenario can, under this model, opt into those other runtimes (JS/TS/Python/Go). The invariant, however, should be that "scenarios" are self-contained, and failure of a "scenario" is not necessarily a failure of "foc-devnet".

This also taps into the model of scenarios themselves and who owns them, and I would like to ask your thoughts on this. Scenario ownership: thought needs to be put into how we think of "scenarios" and the ownership model surrounding them. Do we think of "scenarios" as something "unbreakable", owned by FilOz only, that "foc-devnet" has to take ownership of? Or are they simply fixtures that stay in "foc-devnet" and run every so often, with issues arising from them triaged, even across teams? I feel it is the latter. For example, there may be a scenario where the Curio team wants to check what happens to Curio under a specific setup in FOC. In that case, the Curio team becomes the CODEOWNER of that scenario and its triaging. In such a setup, "scenarios" should be as lightweight as possible, or as heavy as they want. I feel other users here could be @wjmelements and/or @ZenGround0 as well. I am not too opinionated on shell scripts being the de facto choice here, and if there are suggestions in another direction, I am willing to pivot.
#72 is an attempt at pythonic testing scripts |
@redpanda-f : I know I need to review your comments. I'll do so before you're back 2026-03-05. |
Part of #8
Scenarios Testing Framework + Nightlies
Introduces a lightweight shell-based test framework that runs scenarios against a live devnet.
- `scenarios/` — runner (`run.sh`), shared helpers (`lib.sh`), execution order (`order.sh`), and two initial scenarios: container health checks and balance verification
- `shellcheck` CI job enforces formatting (`shfmt`) and lint (`shellcheck`) on all `.sh` files
- Pre-commit hook runs `cargo fmt`, `clippy`, `shfmt`, `shellcheck`, and `+x` checks; auto-fix mode via `FIX=1 git commit`
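To make the scenario shape concrete, here is a self-contained sketch of what a test under this framework looks like. The real scripts source `scenarios/lib.sh` instead of defining helpers inline; the helper bodies below are simplified stand-ins, not the actual library code:

```shell
#!/usr/bin/env bash
# Minimal, self-contained sketch of a scenario. Helper names mirror
# the framework's conventions; bodies are simplified stand-ins.
set -euo pipefail

_PASS=0
_FAIL=0

ok()   { echo "[ OK ] $*"; ((_PASS++)) || true; }
fail() { echo "[FAIL] $*"; ((_FAIL++)) || true; }

# Run a command; record pass/fail instead of aborting under set -e.
assert_ok() {
    local msg="$1"; shift
    if "$@"; then ok "$msg"; else fail "$msg"; fi
}

scenario_start() { echo "=== scenario: $1 ==="; }
scenario_end()   { echo "passed=${_PASS} failed=${_FAIL}"; [[ "$_FAIL" -eq 0 ]]; }

scenario_start "test_example"
assert_ok "shell arithmetic works" test "$((1 + 1))" -eq 2
scenario_end
```

Keeping each scenario to this start/assert/end shape is what makes the runner's per-scenario reporting and the "<150 lines, independent" goal above realistic.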