Skip to content

fix: enforce deterministic protobuf map marshaling in consensus paths#110

Open
mateeullahmalik wants to merge 6 commits intomasterfrom
hotfix/deterministic-map-marshal-consensus
Open

fix: enforce deterministic protobuf map marshaling in consensus paths#110
mateeullahmalik wants to merge 6 commits intomasterfrom
hotfix/deterministic-map-marshal-consensus

Conversation

@mateeullahmalik
Copy link
Copy Markdown
Contributor

@mateeullahmalik mateeullahmalik commented Mar 25, 2026

Summary

Fix consensus-critical nondeterminism caused by protobuf map field marshaling in audit/supernode paths.

Root cause

CascadeClientFailureEvidenceMetadata.details (map) and MetricsAggregate.metrics (map) were encoded via non-deterministic map iteration in generated marshal paths, which can produce different bytes across validators and lead to app-hash divergence.

What changed

  • Use deterministic metadata marshaling in x/audit/v1/keeper/evidence.go for evidence metadata encoding.
  • Update generated marshal code to sort map keys before encoding in:
    • x/audit/v1/types/evidence_metadata.pb.go (details)
    • x/supernode/v1/types/metrics_aggregate.pb.go (metrics)
  • Add regression tests:
    • x/audit/v1/types/evidence_metadata_determinism_test.go
    • x/supernode/v1/types/metrics_aggregate_determinism_test.go
    • keeper-level deterministic evidence metadata test in x/audit/v1/keeper/evidence_test.go

Validation

  • go test ./x/audit/v1/types ./x/audit/v1/keeper ./x/supernode/v1/types

Scope

  • Minimal hotfix; no schema changes, no migrations, no state key changes.

- use deterministic marshal helper for audit evidence metadata encoding in keeper
- sort map keys in generated marshal code for audit and supernode proto types
- add determinism regression tests for cascade failure metadata and metrics aggregate
@roomote-v0
Copy link
Copy Markdown

roomote-v0 bot commented Mar 25, 2026

Rooviewer Clock   See task

Commit 9980140 adds system-level determinism tests (A-E) covering chain progress, JSON permutation stability, height transitions, restart consistency, and reserved evidence type rejection. No new issues found. All previously flagged issues remain resolved.

  • Hand-edited .pb.go files (evidence_metadata.pb.go, metrics_aggregate.pb.go) will lose the sort.Strings fix on protobuf regeneration. Add a CI guard or post-generation script to prevent silent regression.
  • marshalEvidenceMetadataDeterministic now calls gogoproto.Marshal, which uses the generated Marshal() -> MarshalToSizedBuffer() path. The sort.Strings calls were removed from MarshalToSizedBuffer in this same commit, so map fields are encoded in random iteration order again. Either restore the sorted .pb.go edits or revert to the XXX_Marshal(b, true) deterministic path.
  • MetricsAggregate determinism test (metrics_aggregate_determinism_test.go) was deleted with no replacement. The sort.Strings patch in metrics_aggregate.pb.go is now the sole protection for this consensus-critical map field, with no test to catch regression if protobuf is regenerated.
Previous reviews

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

…ics types

- route audit metadata marshal through generated marshal path
- patch map-bearing generated XXX_Marshal to avoid proto internal nondeterministic branch
- sort map keys in MarshalToSizedBuffer for audit details and supernode metrics
Comment on lines +174 to +178
func marshalEvidenceMetadataDeterministic(msg gogoproto.Message) ([]byte, error) {
// We use generated Marshal() paths here. Map-bearing generated types are
// patched to sort keys before encoding, giving deterministic bytes.
return gogoproto.Marshal(msg)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit removes the sort.Strings patches from both evidence_metadata.pb.go and metrics_aggregate.pb.go, then replaces the previous XXX_Marshal(b, true) deterministic path here with plain gogoproto.Marshal(msg). The comment says generated types are "patched to sort keys before encoding," but those patches were removed in this same commit. gogoproto.Marshal calls the generated Marshal() -> MarshalToSizedBuffer(), which now iterates Go maps in random order. This reintroduces the non-determinism bug the PR is meant to fix. Either restore the sort.Strings edits in the .pb.go files, or revert to the previous XXX_Marshal(b, true) approach which delegates to gogoproto's built-in deterministic marshaling.

Fix it with Roo Code or mention @roomote and request a fix.

- use deterministic marshal helper via XXX_Marshal(true) with deterministic buffer fallback
- restore map-key sorting for CascadeClientFailureEvidenceMetadata.details
- restore map-key sorting for MetricsAggregate.metrics
- force map-bearing XXX_Marshal methods through sorted MarshalToSizedBuffer path
roomote-v0[bot]
roomote-v0 bot previously approved these changes Mar 25, 2026
- add explicit deterministic protobuf wire encoder for CascadeClientFailureEvidenceMetadata (sorted details map keys)
- route EVIDENCE_TYPE_CASCADE_CLIENT_FAILURE through canonical encoder in CreateEvidence path
- add integration coverage asserting stable metadata bytes across JSON map key order permutations
- drop brittle generated-proto determinism tests that fail after build-time code generation
Comment on lines +147 to +152
keysForMetrics := make([]string, 0, len(m.Metrics))
for k := range m.Metrics {
keysForMetrics = append(keysForMetrics, k)
}
sort.Strings(keysForMetrics)
for iNdEx := len(keysForMetrics) - 1; iNdEx >= 0; iNdEx-- {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort.Strings patch in generated code is now the sole protection for MetricsAggregate map determinism -- the unit test that previously verified it (metrics_aggregate_determinism_test.go) was deleted in this commit without a replacement. The cascade type got keeper-level and integration tests as replacements, but MetricsAggregate did not. If a future make proto-gen regenerates this file, the sort patch disappears and there is no test to catch the regression. Consider adding a determinism test for MetricsAggregate similar to the one added for cascade metadata.

Fix it with Roo Code or mention @roomote and request a fix.

Add integration regression test asserting MetricsAggregate marshals identically across equivalent map insertion orders. This guards against generated protobuf map-order regressions in consensus state paths.
roomote-v0[bot]
roomote-v0 bot previously approved these changes Mar 25, 2026
Adds system-level determinism coverage for map-bearing audit evidence:
- A: chain progress + single app-hash camp after cascade-client-failure submit
- B: JSON map-key permutation yields identical stored metadata bytes
- C: replay across later height transition remains deterministic
- D: restart without reset preserves deterministic state/app-hash consistency
- E: reserved evidence-type submission is rejected while consensus continues
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant