Skip to content

feat(dkg): Implement bcast#290

Draft
iamquang95 wants to merge 9 commits intomainfrom
iamquang95/dkb/bcast2
Draft

feat(dkg): Implement bcast#290
iamquang95 wants to merge 9 commits intomainfrom
iamquang95/dkb/bcast2

Conversation

@iamquang95
Copy link
Collaborator

Should fix: #132

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend using bootnode example as a starting point, and try to connect by relays by doing this we will have similar interface across all examples, and it will be closer to production

@iamquang95
Copy link
Collaborator Author

@claude

@claude
Copy link

claude bot commented Mar 23, 2026

Claude finished @iamquang95's task in 3m 3s —— View job


Code Review

  • Gather context and understand the PR
  • Read bcast implementation files
  • Read bootnode example for comparison
  • Post comprehensive review

Summary

This PR implements the DKG reliable-broadcast (bcast) protocol in Rust, porting charon/dkg/bcast. The overall structure — Behaviour, Handler, Component, protocol helpers — is clean and well-layered. The integration test in behaviour.rs covers the core round-trip and duplicate semantics. The example uses relay-based connectivity matching the bootnode example's shape (addressing @varex83's review comment).

A few issues are noted below.


Findings

[Medium] TODO comment must be removed before merge

  • crates/dkg/src/bcast/protocol.rs:20
  • Per AGENTS.md: "Avoid leaving TODOs in merged code." The comment // TODO: It's duplicated with peerinfo/src/protocol.rs:write_protobuf. must be resolved or removed. If deduplication is tracked separately, move it to an issue.
  • Fix this →

[Medium] next_op_id overflow resets to 0 — potential id collision

  • crates/dkg/src/bcast/behaviour.rs:88
  • self.next_op_id.checked_add(1).unwrap_or(0) resets next_op_id to 0 on overflow. If there happens to be an active op with id 0, this causes a collision. With u64::MAX possible values this is astronomically unlikely in practice, but semantically wrapping_add(1) is more correct — it cycles without resetting to a previous value. The current code also uses unwrap_or which is disallowed in production code by workspace lints (the unwrap_used = "deny" lint). This will fail clippy.
  • Fix this →

[Medium] .expect() in merge_known_peers panics on invalid peer ID

  • crates/dkg/examples/bcast.rs:273
  • PeerId::from_str(peer).expect("Failed to parse peer ID") — this is inside a plain fn (not main), called from a non-example path. While .expect() is tolerated in example main() functions, this helper is reusable and should return a Result or at minimum use anyhow::bail!. Panics inside merge_known_peers will not be caught gracefully.
  • Fix this →

[Low] handle_inbound_sig_request runs check before dedup recording

  • crates/dkg/src/bcast/handler.rs:386-402
  • The handler.check(peer_id, &any) call happens before the dedup store is consulted/updated. A peer that repeatedly fails the check can send arbitrary hashes without ever being deduplicated. The dedup protection only engages after a check passes. This is likely a parity gap with the Go implementation — worth verifying against charon/dkg/bcast. If the Go code performs dedup first, the order should be swapped.

[Low] Reviewer comment already addressed

  • @varex83's comment asked to use bootnode as a starting point and connect via relays. The current bcast.rs example already does this: it uses bootnode::new_relays, MutableRelayReservation, RelayRouter, and Node::new with QUIC transport — matching the bootnode example's interface. No action needed here.

[Low] ExampleBehaviour omits identify and ping fields

  • crates/dkg/examples/bcast.rs:96-101
  • The bootnode example includes identify and ping in its custom ExampleBehaviour. The bcast example handles PlutoBehaviourEvent::Ping and PlutoBehaviourEvent::Identify in the event loop (lines 600–611), but these events come from Node's built-in Pluto behaviour — not from a custom behaviour field. This is functionally correct. However, if a future ExampleBehaviour variant is added, the handler arms need to stay in sync. Minor consistency gap with bootnode example.

Parity Matrix

Component Go Rust Notes
Protocol IDs /charon/dkg/bcast/1.0.0/{sig,msg} present present (mod.rs:29-32) match
Timeouts (60s receive, 62s send) unverified (no Go ref locally) present (mod.rs:35-38) Go ref needed
Dedup by (peer, msg_id) unverified present (handler.rs:43-48) Go ref needed
Signature = SHA256(type_url || value) unverified present (protocol.rs:61-66) has test vector
Sequential broadcast (one at a time) unverified present (behaviour.rs:58) Go ref needed
Fail on disconnect during active broadcast unverified present (behaviour.rs:104-112) Go ref needed

Note: The Go reference checkout (charon at v1.7.1) is not present in the working directory. Parity items marked "unverified" should be confirmed once the Go reference is available per AGENTS.md requirements.


Tests

  • Unit tests: protocol.rs has hash_any_matches_go_expected_value (cross-language vector) and signature rejection tests. ✅
  • Integration test: behaviour.rs::broadcast_round_trip_and_duplicate_semantics covers the full sig-collect → broadcast flow, dedup semantics, and error paths. ✅
  • No Go reference was available locally to run cross-language parity checks.

Open Questions

  1. Should write_protobuf be extracted to a shared utility crate (e.g., pluto-p2p) to eliminate the duplication with peerinfo?
  2. Does the Go implementation perform the dedup check before or after the application-level check() callback?
  3. Are the 60s/62s timeout values confirmed from the Go source?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement dkg/bcast

2 participants