
(WIP) feat: scenarios based testing #71

Open
redpanda-f wants to merge 9 commits into main from feat/redpanda/feat-scenarios

Conversation


@redpanda-f redpanda-f commented Mar 2, 2026

Part of #8

Scenarios Testing Framework + Nightlies

Introduces a lightweight shell-based test framework that runs scenarios against a live devnet.

  • scenarios/ — runner (run.sh), shared helpers (lib.sh), execution order (order.sh), and two initial scenarios: container health checks and balance verification
  • CI — scenarios run after cluster start; nightly schedule + manual dispatch with optional GitHub issue reporting on failure
  • Linting — new shellcheck CI job enforces formatting (shfmt) and lint (shellcheck) on all .sh files
  • Pre-commit hooks — cargo fmt, clippy, shfmt, shellcheck, and +x checks; auto-fix mode via FIX=1 git commit
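For a feel of the intended shape, a scenario under this framework can look roughly like the snippet below. This is a sketch, not the PR's actual code: the helper names (scenario_start, ok, fail, scenario_end) follow the descriptions in this thread, and inline stand-ins are included so the snippet runs without lib.sh.

```shell
#!/usr/bin/env bash
# Sketch of a minimal scenario in the style described above.
# Inline stand-ins replace lib.sh so this runs standalone.
set -euo pipefail

_PASS=0; _FAIL=0
scenario_start() { echo "=== scenario: $1 ==="; }
ok()   { echo "[ OK ] $*"; ((_PASS++)) || true; }
fail() { echo "[FAIL] $*"; ((_FAIL++)) || true; }
scenario_end() { echo "=== done: ${_PASS} passed, ${_FAIL} failed ==="; }

scenario_start "example"
# A real scenario would assert on cast calls / docker state instead:
[[ -d /tmp ]] && ok "/tmp exists" || fail "/tmp missing"
scenario_end
```

The actual scenarios source lib.sh for these helpers instead of defining them inline.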

Copilot AI review requested due to automatic review settings March 2, 2026 14:56
@FilOzzy FilOzzy added this to FOC Mar 2, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Mar 2, 2026
@redpanda-f redpanda-f self-assigned this Mar 2, 2026

Copilot AI left a comment


Pull request overview

Adds a lightweight, scenario-based shell test framework for validating a running foc-devnet, and wires it into CI (with optional GitHub issue reporting) plus local shell/Rust pre-commit checks.

Changes:

  • Introduce scenarios/ runner, shared assertion library, and initial scenarios (containers + balances).
  • Extend CI to run scenario tests (nightly + optional workflow dispatch reporting) and add a shell lint/format job.
  • Add repo-local git hooks (.githooks/) with an installer and a pre-commit quality gate.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| scenarios/lib.sh | Shared helpers for scenario tests (assertions, devnet-info access, result tracking). |
| scenarios/run.sh | Runs scenarios in order, writes a markdown report, optionally opens a GitHub issue. |
| scenarios/order.sh | Defines the serial execution order for scenarios. |
| scenarios/test_containers.sh | Validates expected containers are running and attempts to detect unexpected ones. |
| scenarios/test_basic_balances.sh | Checks FIL + MockUSDFC balances for all users (and installs Foundry if needed). |
| README_ADVANCED.md | Documentation for running/writing scenarios and CI behavior. |
| .github/workflows/ci.yml | Adds nightly/dispatch triggers, shellcheck/shfmt job, scenario execution + artifact upload, and issue-write permission. |
| .githooks/pre-commit | Pre-commit checks for Rust + shell (with optional auto-fix mode). |
| .githooks/install.sh | Installs hooks by setting core.hooksPath to .githooks/. |
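As a rough sketch of how a serial execution list like order.sh might read (the contents here are illustrative, inferred from the file descriptions above, not copied from the PR):

```shell
#!/usr/bin/env bash
# Hypothetical order.sh sketch: a plain array naming scenarios in the
# order the runner should execute them. Names mirror the files above.
SCENARIOS=(
    test_containers
    test_basic_balances
)

for s in "${SCENARIOS[@]}"; do
    echo "would run: ${s}.sh"
done
```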

# ── Ensure Foundry is available ──────────────────────────────
if ! command -v cast &>/dev/null; then
    info "Installing Foundry …"
    curl -sSL https://foundry.paradigm.xyz | bash

Copilot AI Mar 2, 2026


After installing Foundry via curl ... | bash, cast may still not be available because the standard install flow requires running foundryup (the builder Dockerfile does this explicitly). Without that, assert_ok command -v cast can fail even though the install command succeeded. Run foundryup after installation (and ensure the PATH update is in place), or alternatively use the cast binary from the existing foc-<run_id>-builder container instead of installing on the host.

Suggested change:
-curl -sSL https://foundry.paradigm.xyz | bash
+curl -sSL https://foundry.paradigm.xyz | bash
+"$HOME/.foundry/bin/foundryup"


SCENARIO_DIR="$(cd "$(dirname "$0")" && pwd)"
source "${SCENARIO_DIR}/lib.sh"
scenario_start "basic_balances"

Copilot AI Mar 2, 2026


The scenario name used for reporting (scenario_start "basic_balances") doesn’t match the script basename (test_basic_balances) used by the runner. This causes the per-scenario "Details" table (from results.csv) and the "Failed scenarios" list (from the runner) to use different identifiers, making reports harder to correlate. Consider using the same name as the entry in order.sh.

Suggested change:
-scenario_start "basic_balances"
+scenario_start "test_basic_balances"

scenarios/lib.sh Outdated
# ── devnet-info helpers ──────────────────────────────────────

# Shorthand: jq_devnet '.info.users[0].evm_addr'
jq_devnet() { jq -r "$1" "$DEVNET_INFO"; }

Copilot AI Mar 2, 2026


Because scripts run with set -e, any unexpected non-zero exit from jq, docker, cast, etc. will terminate the scenario immediately and skip scenario_end, which means results.csv won’t be updated and the runner report can miss details. Consider adding a trap-based fallback that records a failure on EXIT/ERR, and/or wrapping external commands in assertion helpers so failures increment _FAIL instead of aborting the script.

Suggested change:
-jq_devnet() { jq -r "$1" "$DEVNET_INFO"; }
+jq_devnet() {
+    if ! jq -r "$1" "$DEVNET_INFO"; then
+        fail "jq_devnet: jq failed for filter '$1' on '$DEVNET_INFO'"
+        return 1
+    fi
+}
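The trap-based fallback mentioned in this comment can be sketched roughly as below. This is a standalone illustration with an assumed CSV row format, not the PR's code; the child bash process stands in for a scenario that aborts under set -e before reaching scenario_end.

```shell
#!/usr/bin/env bash
# Sketch: even when `set -e` kills a scenario mid-run, an EXIT trap
# still records a row in the results file so the runner's report stays
# complete. File layout and CSV columns are assumptions.
set -u

RESULTS="$(mktemp)"

# Child script stands in for a scenario that dies before scenario_end.
# Inside it, $0 is the results path passed as the first argument.
bash -c '
    set -euo pipefail
    _DONE=0
    trap "[[ \$_DONE -eq 0 ]] && echo \"demo,fail,aborted\" >>\"$0\"" EXIT
    false            # unexpected non-zero exit (jq/docker/cast failing)
    _DONE=1          # never reached
    echo "demo,pass," >>"$0"
' "$RESULTS" || true

cat "$RESULTS"   # the fail row is present despite the early abort
```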

Comment on lines +32 to +34
# ── Check no unexpected foc-c-* containers are running ───────
# All foc-c-* containers should belong to the expected set
RUNNING=$(docker ps --filter "name=foc-c-" --format '{{.Names}}')

Copilot AI Mar 2, 2026


The "unexpected container" check won’t match the actual container naming scheme. Containers are named like foc-<run_id>-curio-<idx> / foc-<run_id>-lotus (see src/docker/containers.rs), so docker ps --filter "name=foc-c-" will typically return nothing and zombie containers won’t be detected. Consider filtering by the run-scoped prefix from devnet-info (e.g., foc-${run_id}-) or at least name=foc- and then comparing against the expected set (allowlisting known extras like portainer/builder if needed).

Suggested change:
-# ── Check no unexpected foc-c-* containers are running ───────
-# All foc-c-* containers should belong to the expected set
-RUNNING=$(docker ps --filter "name=foc-c-" --format '{{.Names}}')
+# ── Check no unexpected foc-* containers are running ─────────
+# All foc-* containers for this devnet run should belong to the expected set.
+# Prefer the run-scoped prefix from devnet-info when available, fall back to foc-.
+RUN_ID="$(jq_devnet '.info.run_id // ""')"
+if [[ -n "$RUN_ID" ]]; then
+    NAME_FILTER="foc-${RUN_ID}-"
+else
+    NAME_FILTER="foc-"
+fi
+RUNNING=$(docker ps --filter "name=${NAME_FILTER}" --format '{{.Names}}')


SCENARIO_DIR="$(cd "$(dirname "$0")" && pwd)"
source "${SCENARIO_DIR}/lib.sh"
scenario_start "containers"

Copilot AI Mar 2, 2026


The scenario name used for reporting (scenario_start "containers") doesn’t match the script basename (test_containers) used by the runner. This causes the per-scenario "Details" table (from results.csv) and the "Failed scenarios" list (from the runner) to use different identifiers, making reports harder to correlate. Consider using the same name as the entry in order.sh.

Suggested change:
-scenario_start "containers"
+scenario_start "test_containers"

REPORTING="${REPORTING:-false}"
SKIP_REPORT_ON_PASS="${SKIP_REPORT_ON_PASS:-true}"

# ── Bootstrap ────────────────────────────────────────────────

Copilot AI Mar 2, 2026


run.sh assumes ${REPORT_DIR} exists when writing the markdown report (redirecting into ${REPORT_DIR}/scenario_<ts>.md). If the directory doesn’t exist (e.g., a fresh machine without ~/.foc-devnet/state/latest yet), the runner will fail after executing scenarios. Consider mkdir -p "${REPORT_DIR}" during bootstrap before cleaning/writing report artifacts.

Suggested change:
-# ── Bootstrap ────────────────────────────────────────────────
+# ── Bootstrap ────────────────────────────────────────────────
+# Ensure report directory exists before cleaning/writing artifacts
+mkdir -p "${REPORT_DIR}"

Comment on lines +34 to +43

# ── Rust checks ──────────────────────────────────────────────
HAS_RS=$(echo "$STAGED" | grep -c '\.rs$' || true)
if [[ $HAS_RS -gt 0 ]]; then
    if command -v cargo &>/dev/null; then
        if cargo fmt --all -- --check &>/dev/null; then
            pass "cargo fmt"
        elif [[ "$FIX" == "1" ]]; then
            cargo fmt --all
            git diff --name-only -- '*.rs' | xargs -r git add

Copilot AI Mar 2, 2026


In FIX mode, the Rust formatter restages all modified *.rs files (git diff --name-only -- '*.rs' | xargs ...), including unstaged changes that weren’t part of the commit. This can accidentally add unrelated work to the commit. Consider re-staging only the originally staged Rust files (or only the files that were modified by cargo fmt among the staged set).

Suggested change:
-# ── Rust checks ──────────────────────────────────────────────
-HAS_RS=$(echo "$STAGED" | grep -c '\.rs$' || true)
-if [[ $HAS_RS -gt 0 ]]; then
-    if command -v cargo &>/dev/null; then
-        if cargo fmt --all -- --check &>/dev/null; then
-            pass "cargo fmt"
-        elif [[ "$FIX" == "1" ]]; then
-            cargo fmt --all
-            git diff --name-only -- '*.rs' | xargs -r git add
+STAGED_RS=$(echo "$STAGED" | grep '\.rs$' || true)
+# ── Rust checks ──────────────────────────────────────────────
+HAS_RS=$(echo "$STAGED_RS" | grep -c '\.rs$' || true)
+if [[ $HAS_RS -gt 0 ]]; then
+    if command -v cargo &>/dev/null; then
+        if cargo fmt --all -- --check &>/dev/null; then
+            pass "cargo fmt"
+        elif [[ "$FIX" == "1" ]]; then
+            cargo fmt --all
+            echo "$STAGED_RS" | xargs -r git add

scenarios/lib.sh Outdated
Comment on lines +54 to +60
info() { _log "[INFO]" "$*"; }
ok() {
    _log "[ OK ]" "$*"
    ((_PASS++)) || true
}
fail() {
    _log "[FAIL]" "$*"

Copilot AI Mar 2, 2026


Logging currently produces doubled brackets (e.g., info passes "[INFO]", but _log wraps the level in brackets again). This results in output like [[INFO]] message. Consider passing plain level names (e.g., INFO, OK, FAIL) or removing the extra brackets in _log so log lines are consistent and readable.

Suggested change:
-info() { _log "[INFO]" "$*"; }
-ok() {
-    _log "[ OK ]" "$*"
-    ((_PASS++)) || true
-}
-fail() {
-    _log "[FAIL]" "$*"
+info() { _log "INFO" "$*"; }
+ok() {
+    _log " OK " "$*"
+    ((_PASS++)) || true
+}
+fail() {
+    _log "FAIL" "$*"

Comment on lines +5 to +7
# Verifies that all foc-* containers reported in devnet-info.json
# are actually running, healthy, and that no zombie foc-*
# containers exist outside the current run.

Copilot AI Mar 2, 2026


The header comment says this script verifies containers are "healthy" and that no zombie foc-* containers exist, but the implementation only checks .State.Status == running and only scans for names matching foc-c-. Either update the comment to match the actual checks, or extend the implementation to check Docker health status (when present) and to scan the intended foc- namespace.

# ── Ensure Foundry is available ──────────────────────────────
if ! command -v cast &>/dev/null; then
    info "Installing Foundry …"
    curl -sSL https://foundry.paradigm.xyz | bash

Copilot AI Mar 2, 2026


This script downloads and executes remote code via curl -sSL https://foundry.paradigm.xyz | bash without any integrity or authenticity verification. Because scenarios are run automatically in CI (with access to repository contents and CI secrets), a compromise of foundry.paradigm.xyz or its DNS/TLS could lead to arbitrary code execution in your pipeline and exfiltration of secrets. To mitigate this supply chain risk, avoid curl | bash installers in CI and instead use a pinned Foundry version with checksum/signature verification or a trusted package/source baked into the CI image.

@redpanda-f redpanda-f changed the title feat: scenarios based testing (WIP) feat: scenarios based testing Mar 2, 2026
@BigLep BigLep moved this from 📌 Triage to ⌨️ In Progress in FOC Mar 2, 2026

@BigLep BigLep left a comment


Thanks for getting the draft out. Things coming to mind:

  1. Can we expand #8 with the set of steps/tasks we're going to do to make sure we're on the same page about the big picture? (This could be actual sub-issues or just a checklist task list).

    • I'm wanting to make sure we're accounting for the ability to run tests against the latest set of tagged releases and the latest changes in "main" branches. (I know that isn't part of this PR, but I want to make sure we are tracking this need.)
    • I want to make sure we exercise doing the E2E Synapse flow, including the two-copy version. (Again, not part of this PR, but I want to make sure that is in scope and we're tracking it.)
  2. Why bash scripting vs. using a more robust language that we're more familiar with reading and working in? I appreciate that you're enforcing formatting, static analysis, and some library functions, but it also seems like we're using a language that is not developer-ergonomic. (If other devs like @rvagg like this stuff in Bash, then I'll defer, but I generally shy away from long bash scripts for anything I have to maintain long term.) If we keep Bash, could we add a sentence or two in README_ADVANCED.md on why we chose shell over Node/Python for the runner? I think that would help future contributors.

Contributor


What is the value of adding git hooks if we can't rely on them being installed and we're running these checks in CI anyway? Does it speed up development? As this basically duplicates what is set up in CI, let's add comments explaining why we're doing this.


@redpanda-f redpanda-f Mar 3, 2026


On my system, it fixes things like fmt before commits. These do not strictly duplicate CI, but there is an argument here. We could instead deduplicate and run the same script with FIX=0 and FIX=1, I guess.
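The FIX=0 / FIX=1 idea can be sketched as one check runner shared by CI (check-only) and the hook (auto-fix). Everything below is illustrative, not the PR's hook: the trailing-whitespace check is a tool-free stand-in for cargo fmt / shfmt so the sketch runs anywhere.

```shell
#!/usr/bin/env bash
# Sketch: one runner, two modes. CI calls with FIX=0 (fail on violation),
# the pre-commit hook calls with FIX=1 (repair and continue).
set -euo pipefail
FIX="${FIX:-0}"

run_check() {
    local name="$1" check_cmd="$2" fix_cmd="$3"
    if eval "$check_cmd"; then
        echo "[ OK ] $name"
    elif [[ "$FIX" == "1" ]]; then
        eval "$fix_cmd"
        echo "[FIXED] $name"
    else
        echo "[FAIL] $name"
        return 1
    fi
}

# Tool-free stand-in for a formatter: a file with trailing whitespace.
tmp="$(mktemp)"
printf 'hello \n' >"$tmp"
FIX=1 run_check "trailing-whitespace" \
    "! grep -q ' $' '$tmp'" \
    "sed -i.bak 's/[[:space:]]*$//' '$tmp'"
```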

Comment on lines +129 to +134
if [[ "$SHOULD_REPORT" == "true" ]]; then
    RUN_URL="${GITHUB_SERVER_URL:-https://github.com}/${GITHUB_REPOSITORY:-unknown}/actions/runs/${GITHUB_RUN_ID:-0}"
    STATUS_EMOJI="✅"
    [[ $FAILED -gt 0 ]] && STATUS_EMOJI="❌"
    ISSUE_TITLE="${STATUS_EMOJI} Scenario report: ${PASSED}/${TOTAL} passed ($(date -u +%Y-%m-%d))"
    ISSUE_BODY="$(cat "$REPORT")
Contributor


I think we want to dedupe the issue creation? Basically, if we fail 3 nights in a row, I don't think we want 3 issues. We want one issue with two additional comments. That way there is less to manage and the dialog happens in one place. I believe that is what happens in https://github.com/filecoin-project/lotus/blob/master/.github/workflows/very-expensive-test.yml
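The comment-or-create pattern can be sketched with the gh CLI roughly as below. The label name and report variables are assumptions, and gh is stubbed so the snippet runs without GitHub access; drop the stub to exercise the real CLI.

```shell
#!/usr/bin/env bash
# Sketch: comment on an existing open report issue instead of opening a
# new one each night. Label name / variables are illustrative.
set -euo pipefail

gh() { # stub standing in for the real CLI: pretend issue #42 is open
    case "$1 $2" in
        "issue list")    echo "42" ;;
        "issue comment") echo "commented on #$3" ;;
        "issue create")  echo "created new issue" ;;
    esac
}

REPORT="report.md"
EXISTING="$(gh issue list --state open --label scenario-failure \
    --json number --jq '.[0].number // empty')"
if [[ -n "$EXISTING" ]]; then
    ACTION="$(gh issue comment "$EXISTING" --body-file "$REPORT")"
else
    ACTION="$(gh issue create --title "Scenario report" --body-file "$REPORT")"
fi
echo "$ACTION"
```

The ipdxco/create-or-update-issue action mentioned below in the thread packages this same idea for workflows.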

Collaborator Author


Got it! Will add that (ipdxco/create-or-update-issue@v1) to the workflow, potentially in a Python variant if that seems better.


redpanda-f commented Mar 3, 2026

> Thanks for getting the draft out. Things coming to mind:
>
> 1. Can we expand CI/Nightly End-to-End Validation of FOC as a whole #8 with the set of steps/tasks we're going to do to make sure we're on the same page about the big picture? (This could be actual sub-issues or just a checklist task list).
>
>    • I'm wanting to make sure we're accounting for the ability to run tests against the latest set of tagged releases and the latest changes in "main" branches. (I know that isn't part of this PR, but I want to make sure we are tracking this need.)
>    • I want to make sure we exercise doing the E2E Synapse flow, including the two-copy version. (Again, not part of this PR, but I want to make sure that is in scope and we're tracking it.)

Yes, that will be part of this PR or another PR targeting this branch (haven't made up my mind yet). It essentially needs foc-devnet to be able to resolve the latest tag / latest commits. Maybe that can be just a flag like --curio latestCommit --lotus latestCommit etc., or --curio latestTag --lotus latestTag or something. That way, we can have:

foc-devnet start \
    --curio latestCommit \
    --filecoin-services latestCommit \
    ...

These will be used within the "scheduled" runs every night.

As for the number of tests, I feel like the following should make sense:

  1. Check Balances
  2. E2E Synapse Flow (basically porting over synapse, what we tested before)

as the first set of items. I am trying to keep this PR focused on getting something running in terms of framework; the exact coverage and number of tests can be improved over time, albeit still tracked in #8.

> 1. Why bash scripting vs. using a more robust language that we're more familiar with reading and working in? I appreciate that you're enforcing formatting, static analysis, and some library functions, but it also seems like we're using a language that is not developer-ergonomic. (If other devs like @rvagg like this stuff in Bash, then I'll defer, but I generally shy away from long bash scripts for anything I have to maintain long term.) If we keep Bash, could we add a sentence or two in README_ADVANCED.md on why we chose shell over Node/Python for the runner? I think that would help future contributors.

I just intended this to be "not Rust", so that almost everyone can contribute, with or without knowledge of Rust. I had a conversation with @rvagg yesterday and, as per my understanding, he is okay with shell scripts. But I agree with your point that bash may not be the most ergonomic. At the same time, however, thinking of the local dev experience (for reproduction of issues):

  1. Python brings complexity around requirements.txt and venv, and is not trivial to set up.
  2. JS / TS are okay, but need pnpm / npm, and we have had a lot of issues with dependencies marked vulnerable (and hence something breaking), which already led us to take synapse-sdk out of this repo's dependencies.
  3. Rust and Go are not the best for this kind of work, especially given toolchain overhead and the verbosity arising from error handling.

Shell scripts, however, run straight out of the box, need no additional toolchain (even locally), and are often smaller / more compact, albeit not the simplest to understand (if you opt into the nastiness of bash). If each test is kept small and separate, with foc-devnet start / stop being done in Rust, then all that remains, I think, is:

  1. Cast calls
  2. Some conditionals, checks

not tapping much into the high complexity of shell scripts, and that should be easy enough to understand.

I feel it is right to assume these scenario tests will stay small (<150 lines per test) and independent, and AI systems can quickly understand them. Hence, for allowing broader developer participation, this may not be perfect, but it is an okay middle ground.

In the worst case, if a scenario does need Python / JS because we are doing something complex, that specific scenario can, under this model, opt into those other runtimes (JS/TS/Python/Go). However, the invariant should be that "scenarios" are "self-contained", and failure of a "scenario" is not necessarily a failure of "foc-devnet". This also taps into the model of scenarios themselves and who owns them, and I would like your thoughts on this.

Scenario Ownership: Thought needs to be put in here on how we think of "scenarios" and the ownership model surrounding them. Do we think of "scenarios" as something "unbreakable", owned by FilOz only, that "foc-devnet" has to take ownership of? Or is a scenario simply a fixture that stays in "foc-devnet", runs every so often, and has issues arising from it triaged, even across teams? I feel it is the latter. For example, there may be a scenario where the curio team wants to check what happens to curio under a specific setup in FOC. In that case, the curio team becomes the CODEOWNER of that scenario and its triaging. In such a setup, "scenarios" should be as lightweight as possible, or as heavy as they want.

I feel other users here could be @wjmelements and/or @ZenGround0 as well. I am not too opinionated on shell scripts being the de facto choice here, and if there are suggestions in another direction, I am willing to pivot.

@redpanda-f
Collaborator Author

#72 is an attempt at pythonic testing scripts


BigLep commented Mar 4, 2026

@redpanda-f : I know I need to review your comments. I'll do so before you're back 2026-03-05.
