Skip to content

ci: add consensus determinism gate and PR coverage for system tests#111

Draft
mateeullahmalik wants to merge 9 commits intomasterfrom
ci/consensus-determinism-gate
Draft

ci: add consensus determinism gate and PR coverage for system tests#111
mateeullahmalik wants to merge 9 commits intomasterfrom
ci/consensus-determinism-gate

Conversation

@mateeullahmalik
Copy link
Copy Markdown
Contributor

Summary

This PR strengthens pre-merge safety for consensus determinism regressions.

Changes

  • Enable existing test workflows on pull requests to master:
    • .github/workflows/test.yml
    • .github/workflows/systemtests.yaml
  • Add a new workflow: .github/workflows/consensus-determinism.yml
    • Builds chain binary
    • Starts a 6-validator ephemeral network
    • Submits deterministic canary tx pack (including map-bearing audit evidence key-order permutations)
    • Fails on app-hash divergence or stalled progress
  • Add map-bearing inventory workflow and helper:
    • .github/workflows/map-consensus-inventory.yml
    • .github/scripts/map_consensus_inventory.sh
    • Surfaces map-bearing proto/generated marshal paths and determinism test coverage gaps

Validation

  • bash -n .github/scripts/map_consensus_inventory.sh
  • YAML parse check for all modified/new workflow files (PyYAML)

Notes

  • The inventory check is currently informational/warn-oriented for identified risk conditions, to avoid immediate disruption while we roll out stricter policies incrementally.

- run test + systemtests workflows on pull requests to master
- add consensus-determinism workflow (6-validator canary, app-hash camp check)
- add map-consensus-inventory workflow and script to inventory map-bearing proto/state paths and surface determinism risks
@roomote-v0
Copy link
Copy Markdown

roomote-v0 bot commented Mar 25, 2026

Rooviewer Clock   See task

All 3 issues from the previous review have been fixed in 375fcea. No new issues found.

  • keys show and submit_tx use --home "$OUT" but the per-node keyrings live at $OUT/node{i}/lumerad (lines 84, 94)
  • Back-to-back submit_tx calls with --broadcast-mode sync will hit an account sequence mismatch (lines 107-108)
Previous reviews

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

done
curl -sf http://127.0.0.1:26657/status >/dev/null

SUBJECT=$("$BIN" keys show node1 -a --home "$OUT" --keyring-backend test)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--home "$OUT" points to the testnet output root (.ci-determinism/testnet), but the keyring for node1 lives at $OUT/node1/lumerad (the testnet init-files command creates per-node keyrings under $OUT/<nodeDirPrefix><i>/<nodeDaemonHome>/). This will fail at runtime with a "key not found" error because there is no keyring at $OUT.

Suggested change
SUBJECT=$("$BIN" keys show node1 -a --home "$OUT" --keyring-backend test)
SUBJECT=$("$BIN" keys show node1 -a --home "$OUT/node1/lumerad" --keyring-backend test)

Fix it with Roo Code or mention @roomote and request a fix.

Comment on lines +92 to +100
"$BIN" tx audit submit-evidence "$SUBJECT" cascade-client-failure "$action_id" "$meta_json" \
--from node0 \
--home "$OUT" \
--keyring-backend test \
--chain-id "$CHAIN_ID" \
--node tcp://127.0.0.1:26657 \
--fees 1ulume \
--broadcast-mode sync \
--yes -o json > "$WORK/${action_id}.json"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same --home issue as the keys show call above: $OUT is the testnet root, but the keyring for node0 is at $OUT/node0/lumerad. The --from node0 lookup will fail because there is no keyring at $OUT.

Suggested change
"$BIN" tx audit submit-evidence "$SUBJECT" cascade-client-failure "$action_id" "$meta_json" \
--from node0 \
--home "$OUT" \
--keyring-backend test \
--chain-id "$CHAIN_ID" \
--node tcp://127.0.0.1:26657 \
--fees 1ulume \
--broadcast-mode sync \
--yes -o json > "$WORK/${action_id}.json"
"$BIN" tx audit submit-evidence "$SUBJECT" cascade-client-failure "$action_id" "$meta_json" \
--from node0 \
--home "$OUT/node0/lumerad" \
--keyring-backend test \
--chain-id "$CHAIN_ID" \
--node tcp://127.0.0.1:26657 \
--fees 1ulume \
--broadcast-mode sync \
--yes -o json > "$WORK/${action_id}.json"

Fix it with Roo Code or mention @roomote and request a fix.

Comment on lines +107 to +108
submit_tx ci-canary-1 "$META1"
submit_tx ci-canary-2 "$META2"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both transactions are submitted back-to-back with --broadcast-mode sync. The CLI auto-detects the account sequence from committed state, so if the first tx hasn't landed in a block yet, the second tx will be signed with the same sequence number and rejected with an account sequence mismatch. Adding a short sleep (or switching to --broadcast-mode block for the first tx, or using --sequence explicitly) would prevent this race.

Suggested change
submit_tx ci-canary-1 "$META1"
submit_tx ci-canary-2 "$META2"
submit_tx ci-canary-1 "$META1"
sleep 6
submit_tx ci-canary-2 "$META2"

Fix it with Roo Code or mention @roomote and request a fix.

- use per-node home paths for key lookup and tx signing
- switch canary tx broadcast mode to block to avoid account sequence race between back-to-back submissions
roomote-v0[bot]
roomote-v0 bot previously approved these changes Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant