
(WIP) feat: scenarios based testing #71

Open
redpanda-f wants to merge 9 commits into main from feat/redpanda/feat-scenarios

Conversation


@redpanda-f redpanda-f commented Mar 2, 2026

Part of #8

Scenarios Testing Framework + Nightlies

Introduces a lightweight shell-based test framework that runs scenarios against a live devnet.

  • scenarios/ — runner (run.sh), shared helpers (lib.sh), execution order (order.sh), and two initial scenarios: container health checks and balance verification
  • CI — scenarios run after cluster start; nightly schedule + manual dispatch with optional GitHub issue reporting on failure
  • Linting — new shellcheck CI job enforces formatting (shfmt) and lint (shellcheck) on all .sh files
  • Pre-commit hooks — cargo fmt, clippy, shfmt, shellcheck, and +x checks; auto-fix mode via FIX=1 git commit
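For a feel of the intended shape, a scenario under this framework can look roughly like the snippet below. This is a sketch, not the PR's actual code: the helper names (scenario_start, ok, fail, scenario_end) follow the descriptions in this thread, and inline stand-ins are included so the snippet runs without lib.sh.

```shell
#!/usr/bin/env bash
# Sketch of a minimal scenario in the style described above.
# Inline stand-ins replace lib.sh so this runs standalone.
set -euo pipefail

_PASS=0; _FAIL=0
scenario_start() { echo "=== scenario: $1 ==="; }
ok()   { echo "[ OK ] $*"; ((_PASS++)) || true; }
fail() { echo "[FAIL] $*"; ((_FAIL++)) || true; }
scenario_end() { echo "=== done: ${_PASS} passed, ${_FAIL} failed ==="; }

scenario_start "example"
# A real scenario would assert on cast calls / docker state instead:
[[ -d /tmp ]] && ok "/tmp exists" || fail "/tmp missing"
scenario_end
```

The actual scenarios source lib.sh for these helpers instead of defining them inline.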

Copilot AI review requested due to automatic review settings March 2, 2026 14:56
@FilOzzy FilOzzy added this to FOC Mar 2, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Mar 2, 2026
@redpanda-f redpanda-f self-assigned this Mar 2, 2026

Copilot AI left a comment


Pull request overview

Adds a lightweight, scenario-based shell test framework for validating a running foc-devnet, and wires it into CI (with optional GitHub issue reporting) plus local shell/Rust pre-commit checks.

Changes:

  • Introduce scenarios/ runner, shared assertion library, and initial scenarios (containers + balances).
  • Extend CI to run scenario tests (nightly + optional workflow dispatch reporting) and add a shell lint/format job.
  • Add repo-local git hooks (.githooks/) with an installer and a pre-commit quality gate.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| scenarios/lib.sh | Shared helpers for scenario tests (assertions, devnet-info access, result tracking). |
| scenarios/run.sh | Runs scenarios in order, writes a markdown report, optionally opens a GitHub issue. |
| scenarios/order.sh | Defines the serial execution order for scenarios. |
| scenarios/test_containers.sh | Validates expected containers are running and attempts to detect unexpected ones. |
| scenarios/test_basic_balances.sh | Checks FIL + MockUSDFC balances for all users (and installs Foundry if needed). |
| README_ADVANCED.md | Documentation for running/writing scenarios and CI behavior. |
| .github/workflows/ci.yml | Adds nightly/dispatch triggers, shellcheck/shfmt job, scenario execution + artifact upload, and issue-write permission. |
| .githooks/pre-commit | Pre-commit checks for Rust + shell (with optional auto-fix mode). |
| .githooks/install.sh | Installs hooks by setting core.hooksPath to .githooks/. |
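As a rough sketch of how a serial execution list like order.sh might read (the contents here are illustrative, inferred from the file descriptions above, not copied from the PR):

```shell
#!/usr/bin/env bash
# Hypothetical order.sh sketch: a plain array naming scenarios in the
# order the runner should execute them. Names mirror the files above.
SCENARIOS=(
    test_containers
    test_basic_balances
)

for s in "${SCENARIOS[@]}"; do
    echo "would run: ${s}.sh"
done
```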

# ── Ensure Foundry is available ──────────────────────────────
if ! command -v cast &>/dev/null; then
    info "Installing Foundry …"
    curl -sSL https://foundry.paradigm.xyz | bash

Copilot AI Mar 2, 2026


After installing Foundry via curl ... | bash, cast may still not be available because the standard install flow requires running foundryup (the builder Dockerfile does this explicitly). Without that, assert_ok command -v cast can fail even though the install command succeeded. Run foundryup after installation (and ensure the PATH update is in place), or alternatively use the cast binary from the existing foc-<run_id>-builder container instead of installing on the host.

Suggested change:
-curl -sSL https://foundry.paradigm.xyz | bash
+curl -sSL https://foundry.paradigm.xyz | bash
+"$HOME/.foundry/bin/foundryup"


SCENARIO_DIR="$(cd "$(dirname "$0")" && pwd)"
source "${SCENARIO_DIR}/lib.sh"
scenario_start "basic_balances"

Copilot AI Mar 2, 2026


The scenario name used for reporting (scenario_start "basic_balances") doesn’t match the script basename (test_basic_balances) used by the runner. This causes the per-scenario "Details" table (from results.csv) and the "Failed scenarios" list (from the runner) to use different identifiers, making reports harder to correlate. Consider using the same name as the entry in order.sh.

Suggested change:
-scenario_start "basic_balances"
+scenario_start "test_basic_balances"

scenarios/lib.sh Outdated
# ── devnet-info helpers ──────────────────────────────────────

# Shorthand: jq_devnet '.info.users[0].evm_addr'
jq_devnet() { jq -r "$1" "$DEVNET_INFO"; }

Copilot AI Mar 2, 2026


Because scripts run with set -e, any unexpected non-zero exit from jq, docker, cast, etc. will terminate the scenario immediately and skip scenario_end, which means results.csv won’t be updated and the runner report can miss details. Consider adding a trap-based fallback that records a failure on EXIT/ERR, and/or wrapping external commands in assertion helpers so failures increment _FAIL instead of aborting the script.

Suggested change:
-jq_devnet() { jq -r "$1" "$DEVNET_INFO"; }
+jq_devnet() {
+    if ! jq -r "$1" "$DEVNET_INFO"; then
+        fail "jq_devnet: jq failed for filter '$1' on '$DEVNET_INFO'"
+        return 1
+    fi
+}
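The trap-based fallback mentioned in this comment can be sketched roughly as below. This is a standalone illustration with an assumed CSV row format, not the PR's code; the child bash process stands in for a scenario that aborts under set -e before reaching scenario_end.

```shell
#!/usr/bin/env bash
# Sketch: even when `set -e` kills a scenario mid-run, an EXIT trap
# still records a row in the results file so the runner's report stays
# complete. File layout and CSV columns are assumptions.
set -u

RESULTS="$(mktemp)"

# Child script stands in for a scenario that dies before scenario_end.
# Inside it, $0 is the results path passed as the first argument.
bash -c '
    set -euo pipefail
    _DONE=0
    trap "[[ \$_DONE -eq 0 ]] && echo \"demo,fail,aborted\" >>\"$0\"" EXIT
    false            # unexpected non-zero exit (jq/docker/cast failing)
    _DONE=1          # never reached
    echo "demo,pass," >>"$0"
' "$RESULTS" || true

cat "$RESULTS"   # the fail row is present despite the early abort
```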

Comment on lines +32 to +34
# ── Check no unexpected foc-c-* containers are running ───────
# All foc-c-* containers should belong to the expected set
RUNNING=$(docker ps --filter "name=foc-c-" --format '{{.Names}}')

Copilot AI Mar 2, 2026


The "unexpected container" check won’t match the actual container naming scheme. Containers are named like foc-<run_id>-curio-<idx> / foc-<run_id>-lotus (see src/docker/containers.rs), so docker ps --filter "name=foc-c-" will typically return nothing and zombie containers won’t be detected. Consider filtering by the run-scoped prefix from devnet-info (e.g., foc-${run_id}-) or at least name=foc- and then comparing against the expected set (allowlisting known extras like portainer/builder if needed).

Suggested change:
-# ── Check no unexpected foc-c-* containers are running ───────
-# All foc-c-* containers should belong to the expected set
-RUNNING=$(docker ps --filter "name=foc-c-" --format '{{.Names}}')
+# ── Check no unexpected foc-* containers are running ─────────
+# All foc-* containers for this devnet run should belong to the expected set.
+# Prefer the run-scoped prefix from devnet-info when available, fall back to foc-.
+RUN_ID="$(jq_devnet '.info.run_id // ""')"
+if [[ -n "$RUN_ID" ]]; then
+    NAME_FILTER="foc-${RUN_ID}-"
+else
+    NAME_FILTER="foc-"
+fi
+RUNNING=$(docker ps --filter "name=${NAME_FILTER}" --format '{{.Names}}')


SCENARIO_DIR="$(cd "$(dirname "$0")" && pwd)"
source "${SCENARIO_DIR}/lib.sh"
scenario_start "containers"

Copilot AI Mar 2, 2026


The scenario name used for reporting (scenario_start "containers") doesn’t match the script basename (test_containers) used by the runner. This causes the per-scenario "Details" table (from results.csv) and the "Failed scenarios" list (from the runner) to use different identifiers, making reports harder to correlate. Consider using the same name as the entry in order.sh.

Suggested change:
-scenario_start "containers"
+scenario_start "test_containers"

REPORTING="${REPORTING:-false}"
SKIP_REPORT_ON_PASS="${SKIP_REPORT_ON_PASS:-true}"

# ── Bootstrap ────────────────────────────────────────────────

Copilot AI Mar 2, 2026


run.sh assumes ${REPORT_DIR} exists when writing the markdown report (redirecting into ${REPORT_DIR}/scenario_<ts>.md). If the directory doesn’t exist (e.g., a fresh machine without ~/.foc-devnet/state/latest yet), the runner will fail after executing scenarios. Consider mkdir -p "${REPORT_DIR}" during bootstrap before cleaning/writing report artifacts.

Suggested change:
-# ── Bootstrap ────────────────────────────────────────────────
+# ── Bootstrap ────────────────────────────────────────────────
+# Ensure report directory exists before cleaning/writing artifacts
+mkdir -p "${REPORT_DIR}"

Comment on lines +34 to +43

# ── Rust checks ──────────────────────────────────────────────
HAS_RS=$(echo "$STAGED" | grep -c '\.rs$' || true)
if [[ $HAS_RS -gt 0 ]]; then
    if command -v cargo &>/dev/null; then
        if cargo fmt --all -- --check &>/dev/null; then
            pass "cargo fmt"
        elif [[ "$FIX" == "1" ]]; then
            cargo fmt --all
            git diff --name-only -- '*.rs' | xargs -r git add

Copilot AI Mar 2, 2026


In FIX mode, the Rust formatter restages all modified *.rs files (git diff --name-only -- '*.rs' | xargs ...), including unstaged changes that weren’t part of the commit. This can accidentally add unrelated work to the commit. Consider re-staging only the originally staged Rust files (or only the files that were modified by cargo fmt among the staged set).

Suggested change:
-# ── Rust checks ──────────────────────────────────────────────
-HAS_RS=$(echo "$STAGED" | grep -c '\.rs$' || true)
-if [[ $HAS_RS -gt 0 ]]; then
-    if command -v cargo &>/dev/null; then
-        if cargo fmt --all -- --check &>/dev/null; then
-            pass "cargo fmt"
-        elif [[ "$FIX" == "1" ]]; then
-            cargo fmt --all
-            git diff --name-only -- '*.rs' | xargs -r git add
+STAGED_RS=$(echo "$STAGED" | grep '\.rs$' || true)
+# ── Rust checks ──────────────────────────────────────────────
+HAS_RS=$(echo "$STAGED_RS" | grep -c '\.rs$' || true)
+if [[ $HAS_RS -gt 0 ]]; then
+    if command -v cargo &>/dev/null; then
+        if cargo fmt --all -- --check &>/dev/null; then
+            pass "cargo fmt"
+        elif [[ "$FIX" == "1" ]]; then
+            cargo fmt --all
+            echo "$STAGED_RS" | xargs -r git add

scenarios/lib.sh Outdated
Comment on lines +54 to +60
info() { _log "[INFO]" "$*"; }
ok() {
    _log "[ OK ]" "$*"
    ((_PASS++)) || true
}
fail() {
    _log "[FAIL]" "$*"

Copilot AI Mar 2, 2026


Logging currently produces doubled brackets (e.g., info passes "[INFO]", but _log wraps the level in brackets again). This results in output like [[INFO]] message. Consider passing plain level names (e.g., INFO, OK, FAIL) or removing the extra brackets in _log so log lines are consistent and readable.

Suggested change:
-info() { _log "[INFO]" "$*"; }
-ok() {
-    _log "[ OK ]" "$*"
-    ((_PASS++)) || true
-}
-fail() {
-    _log "[FAIL]" "$*"
+info() { _log "INFO" "$*"; }
+ok() {
+    _log " OK " "$*"
+    ((_PASS++)) || true
+}
+fail() {
+    _log "FAIL" "$*"

Comment on lines +5 to +7
# Verifies that all foc-* containers reported in devnet-info.json
# are actually running, healthy, and that no zombie foc-*
# containers exist outside the current run.

Copilot AI Mar 2, 2026


The header comment says this script verifies containers are "healthy" and that no zombie foc-* containers exist, but the implementation only checks .State.Status == running and only scans for names matching foc-c-. Either update the comment to match the actual checks, or extend the implementation to check Docker health status (when present) and to scan the intended foc- namespace.

# ── Ensure Foundry is available ──────────────────────────────
if ! command -v cast &>/dev/null; then
    info "Installing Foundry …"
    curl -sSL https://foundry.paradigm.xyz | bash

Copilot AI Mar 2, 2026


This script downloads and executes remote code via curl -sSL https://foundry.paradigm.xyz | bash without any integrity or authenticity verification. Because scenarios are run automatically in CI (with access to repository contents and CI secrets), a compromise of foundry.paradigm.xyz or its DNS/TLS could lead to arbitrary code execution in your pipeline and exfiltration of secrets. To mitigate this supply chain risk, avoid curl | bash installers in CI and instead use a pinned Foundry version with checksum/signature verification or a trusted package/source baked into the CI image.

@redpanda-f redpanda-f changed the title feat: scenarios based testing (WIP) feat: scenarios based testing Mar 2, 2026
@BigLep BigLep moved this from 📌 Triage to ⌨️ In Progress in FOC Mar 2, 2026

@BigLep BigLep left a comment


Thanks for getting the draft out. Things coming to mind:

  1. Can we expand #8 with the set of steps/tasks we're going to do to make sure we're on the same page about the big picture? (This could be actual sub-issues or just a checklist task list).

    • I'm wanting to make sure we're accounting for the ability to run tests against the latest set of tagged releases and the latest changes in "main" branches. (I know that isn't part of this PR, but I want to make sure we are tracking this need.)
    • I want to make sure we exercise doing the E2E Synapse flow, including the two-copy version. (Again, not part of this PR, but I want to make sure that is in scope and we're tracking it.)
  2. Why bash scripting vs. using a more robust language that we're more familiar with reading and working in? I appreciate that you're enforcing formatting, static analysis, and some library functions, but it also seems like we're using a language that is not developer-ergonomic. (If other devs like @rvagg like this stuff in Bash, then I'll defer, but I generally shy away from long bash scripts for anything I have to maintain long term.) If we keep Bash, could we add a sentence or two in README_ADVANCED.md on why we chose shell over Node/Python for the runner? I think that would help future contributors.

Contributor


What is the value of adding git hooks if we can't rely on them being installed and we're running these checks in CI anyway? Does it speed up development? As this basically duplicates what is set up in CI, let's add comments explaining why we're doing this.


@redpanda-f redpanda-f Mar 3, 2026


On my system, it fixes things like fmt before commits. These do not strictly duplicate CI, but there is an argument here. We could instead deduplicate and run the same script with FIX=0 and FIX=1, I guess.
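The FIX=0 / FIX=1 idea can be sketched as one check runner shared by CI (check-only) and the hook (auto-fix). Everything below is illustrative, not the PR's hook: the trailing-whitespace check is a tool-free stand-in for cargo fmt / shfmt so the sketch runs anywhere.

```shell
#!/usr/bin/env bash
# Sketch: one runner, two modes. CI calls with FIX=0 (fail on violation),
# the pre-commit hook calls with FIX=1 (repair and continue).
set -euo pipefail
FIX="${FIX:-0}"

run_check() {
    local name="$1" check_cmd="$2" fix_cmd="$3"
    if eval "$check_cmd"; then
        echo "[ OK ] $name"
    elif [[ "$FIX" == "1" ]]; then
        eval "$fix_cmd"
        echo "[FIXED] $name"
    else
        echo "[FAIL] $name"
        return 1
    fi
}

# Tool-free stand-in for a formatter: a file with trailing whitespace.
tmp="$(mktemp)"
printf 'hello \n' >"$tmp"
FIX=1 run_check "trailing-whitespace" \
    "! grep -q ' $' '$tmp'" \
    "sed -i.bak 's/[[:space:]]*$//' '$tmp'"
```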

Comment on lines +129 to +134
if [[ "$SHOULD_REPORT" == "true" ]]; then
    RUN_URL="${GITHUB_SERVER_URL:-https://github.com}/${GITHUB_REPOSITORY:-unknown}/actions/runs/${GITHUB_RUN_ID:-0}"
    STATUS_EMOJI="✅"
    [[ $FAILED -gt 0 ]] && STATUS_EMOJI="❌"
    ISSUE_TITLE="${STATUS_EMOJI} Scenario report: ${PASSED}/${TOTAL} passed ($(date -u +%Y-%m-%d))"
    ISSUE_BODY="$(cat "$REPORT")
Contributor


I think we want to dedupe the issue creation? Basically, if we fail 3 nights in a row, I don't think we want 3 issues. We want one issue with two additional comments. That way there is less to manage and the dialog happens in one place. I believe that is what happens in https://github.com/filecoin-project/lotus/blob/master/.github/workflows/very-expensive-test.yml
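The comment-or-create pattern can be sketched with the gh CLI roughly as below. The label name and report variables are assumptions, and gh is stubbed so the snippet runs without GitHub access; drop the stub to exercise the real CLI.

```shell
#!/usr/bin/env bash
# Sketch: comment on an existing open report issue instead of opening a
# new one each night. Label name / variables are illustrative.
set -euo pipefail

gh() { # stub standing in for the real CLI: pretend issue #42 is open
    case "$1 $2" in
        "issue list")    echo "42" ;;
        "issue comment") echo "commented on #$3" ;;
        "issue create")  echo "created new issue" ;;
    esac
}

REPORT="report.md"
EXISTING="$(gh issue list --state open --label scenario-failure \
    --json number --jq '.[0].number // empty')"
if [[ -n "$EXISTING" ]]; then
    ACTION="$(gh issue comment "$EXISTING" --body-file "$REPORT")"
else
    ACTION="$(gh issue create --title "Scenario report" --body-file "$REPORT")"
fi
echo "$ACTION"
```

The ipdxco/create-or-update-issue action mentioned below in the thread packages this same idea for workflows.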

Collaborator Author


Got it! Will add that (ipdxco/create-or-update-issue@v1) to the workflow, potentially in a Python variant if that seems better.


redpanda-f commented Mar 3, 2026

> Thanks for getting the draft out. Things coming to mind:
>
> 1. Can we expand CI/Nightly End-to-End Validation of FOC as a whole #8 with the set of steps/tasks we're going to do to make sure we're on the same page about the big picture? (This could be actual sub-issues or just a checklist task list).
>
>    • I'm wanting to make sure we're accounting for the ability to run tests against the latest set of tagged releases and the latest changes in "main" branches. (I know that isn't part of this PR, but I want to make sure we are tracking this need.)
>    • I want to make sure we exercise doing the E2E Synapse flow, including the two-copy version. (Again, not part of this PR, but I want to make sure that is in scope and we're tracking it.)

Yes, that will be part of this PR or another PR targeting this branch (haven't made up my mind yet). It essentially needs foc-devnet to be able to resolve the latest tag / latest commits. Maybe that can be just a flag like --curio latestCommit --lotus latestCommit etc., or --curio latestTag --lotus latestTag or something. That way, we can have:

foc-devnet start \
    --curio latestCommit \
    --filecoin-services latestCommit \
    ...

These will be used within the "scheduled" runs every night.

As for the number of tests, I feel like the following should make sense:

  1. Check Balances
  2. E2E Synapse Flow (basically porting over synapse, what we tested before)

as the first set of items. I am trying to keep this PR focused on getting something running in terms of framework; the exact coverage and number of tests can be improved over time, albeit still tracked in #8.

> 1. Why bash scripting vs. using a more robust language that we're more familiar with reading and working in? I appreciate that you're enforcing formatting, static analysis, and some library functions, but it also seems like we're using a language that is not developer-ergonomic. (If other devs like @rvagg like this stuff in Bash, then I'll defer, but I generally shy away from long bash scripts for anything I have to maintain long term.) If we keep Bash, could we add a sentence or two in README_ADVANCED.md on why we chose shell over Node/Python for the runner? I think that would help future contributors.

I just intended this to be "not Rust", so that almost everyone can contribute, with or without knowledge of Rust. I had a conversation with @rvagg yesterday and, as per my understanding, he is okay with shell scripts. But I agree with your point that bash may not be the most ergonomic. At the same time, however, thinking of the local dev experience (for reproduction of issues):

  1. Python brings complexity around requirements.txt and venv, and is not trivial to set up.
  2. JS / TS are okay, but need pnpm / npm, and we have had a lot of issues with dependencies marked vulnerable (and hence something breaking), which already led us to take synapse-sdk out of this repo's dependencies.
  3. Rust and Go are not the best for this kind of work, especially given toolchain overhead and the verbosity arising from error handling.

Shell scripts, however, run straight out of the box, need no additional toolchain (even locally), and are often smaller / more compact, albeit not the simplest to understand (if you opt into the nastiness of bash). If each test is kept small and separate, with foc-devnet start / stop being done in Rust, then all that remains, I think, is:

  1. Cast calls
  2. Some conditionals, checks

not tapping much into the high complexity of shell scripts, and that should be easy enough to understand.

I feel it is right to assume these scenario tests will stay small (<150 lines per test) and independent, and AI systems can quickly understand them. Hence, for allowing broader developer participation, this may not be perfect, but it is an okay middle ground.

In the worst case, if a scenario does need Python / JS because we are doing something complex, that specific scenario can, under this model, opt into those other runtimes (JS/TS/Python/Go). However, the invariant should be that "scenarios" are "self-contained", and failure of a "scenario" is not necessarily a failure of "foc-devnet". This also taps into the model of scenarios themselves and who owns them, and I would like your thoughts on this.

Scenario Ownership: Thought needs to be put in here on how we think of "scenarios" and the ownership model surrounding them. Do we think of "scenarios" as something "unbreakable", owned by FilOz only, that "foc-devnet" has to take ownership of? Or is a scenario simply a fixture that stays in "foc-devnet", runs every so often, and has issues arising from it triaged, even across teams? I feel it is the latter. For example, there may be a scenario where the curio team wants to check what happens to curio under a specific setup in FOC. In that case, the curio team becomes the CODEOWNER of that scenario and its triaging. In such a setup, "scenarios" should be as lightweight as possible, or as heavy as they want.

I feel other users here could be @wjmelements and/or @ZenGround0 as well. I am not too opinionated on shell scripts being the de facto choice here, and if there are suggestions in another direction, I am willing to pivot.

@redpanda-f
Collaborator Author

#72 is an attempt at pythonic testing scripts


BigLep commented Mar 4, 2026

@redpanda-f : I know I need to review your comments. I'll do so before you're back 2026-03-05.
