Skip to content

anounciness metric for peer selection#53

Open
coot wants to merge 8 commits into
mainfrom
coot/sig-metric
Open

anounciness metric for peer selection#53
coot wants to merge 8 commits into
mainfrom
coot/sig-metric

Conversation

@coot
Copy link
Copy Markdown
Contributor

@coot coot commented Apr 21, 2026

List of changes

This PR introduces announcyness metric for peer selection which is used to
decide which hot peers we should demote.

For each peer we count how many valid signatures were announced before any
other peer announced it. This is implemented into two steps: first we account
at which time a peer announce a given signature id, then once we receive the
corresponding signature, and it turns out to be valid, we give a score of one
to the peer that announced it first. Each score is kept for an hour (the same
as churn rate).

Checklist

  • related issue
  • My changes generate no new warnings
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works

@github-project-automation github-project-automation Bot moved this to In Progress in Ouroboros Network Apr 21, 2026
@coot coot linked an issue Apr 21, 2026 that may be closed by this pull request
@coot coot force-pushed the coot/sig-metric branch 2 times, most recently from 6a60f59 to 24d0083 Compare April 21, 2026 15:07
@coot coot force-pushed the coot/sig-metric branch from 24d0083 to 6268bbd Compare May 8, 2026 14:47
@coot coot marked this pull request as ready for review May 8, 2026 14:47
@coot coot force-pushed the coot/sig-metric branch from 6268bbd to eb59e7b Compare May 8, 2026 15:17
@coot coot force-pushed the coot/sig-metric branch from eb59e7b to f0904b4 Compare May 12, 2026 13:00
@coot coot self-assigned this May 12, 2026
@coot coot force-pushed the coot/sig-metric branch from f0904b4 to 5fe8ce8 Compare May 12, 2026 13:12
@coot coot force-pushed the coot/sig-metric branch 3 times, most recently from c2b2518 to 1efece4 Compare May 12, 2026 15:07
@coot coot force-pushed the coot/sig-metric branch 2 times, most recently from bca4d1a to 142a39e Compare May 12, 2026 15:18
coot and others added 3 commits May 12, 2026 17:26
Adds a property test that verifies `announciness peerMetric` agrees with
a standalone pure model at every metric-state snapshot emitted during the
simulation.

Key design choices:
- `runSigSubmissionV2WithMetric` (IOSim-specific, leaves the generic
  `runSigSubmissionV2` untouched) wires a real `PeerMetric` into each
  inbound peer and registers a `traceTVarIO` callback on the metric TVar.
- Both peer events (`TraceTxInboundReceivedTxIds`, `TraceTxInboundAddedToMempool`)
  and metric snapshots are emitted as a single `SimTraceEvent = Either
  SimInboundEvent SimMetricSnapshot` type.  A single
  `selectTraceEventsDynamicWithTime` call then extracts them in their
  original in-thread order, preserving the `sig_add → snapshot →
  sig_add → snapshot` interleaving that two separate extraction passes
  would destroy.
- The pure model (`PureModelState`, `updatePureModel`,
  `expectedAnnounciness`) is decoupled from the implementation and
  mirrors `reportSigImpl` independently.
- `peerMetricVar` (field accessor) exported from `PeerMetric` for
  testing; constructor remains unexported.
- `TraceTxInboundReceivedTxIds` trace added at both sigid-receipt sites
  in `sigSubmissionInbound`.
- `SimPeerAddr ~ Int` enforced via type equality so a type change causes
  a compile error rather than silently empty traces.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an “announciness” peer metric used by the peer selection policy to demote hot peers that are slower to announce signatures, and wires metric reporting into the sig-submission inbound client. It also adds property tests around the new metric and updates a couple CI formatting scripts for broader fd compatibility.

Changes:

  • Add a new DMQ.Diffusion.PeerSelection.PeerMetric module (announciness metric + STM state + reporting hooks) and integrate it into NodeKernel / NodeToNode and peer selection demotion.
  • Extend sigSubmissionInbound to record sig-id announcement times and report accepted/rejected outcomes into the metric.
  • Add QuickCheck / IOSim property tests validating metric behavior and update CI scripts to use fdfind/fd.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
scripts/ci/run-stylish-haskell.sh Use fdfind/fd variable instead of hardcoding fd.
scripts/ci/run-nixpkgs-fmt.sh Same fdfind/fd resolution for nix formatting.
dmq-node/test/Test/DMQ/SigSubmission/App.hs Adds an IOSim property that cross-checks metric snapshots against a pure model; wires metrics into the simulated inbound.
dmq-node/test/Test/DMQ/PeerSelection/PeerMetric.hs New property tests for the metric’s pure implementation (pruning, announciness, competing peers, etc.).
dmq-node/test/Main.hs Registers the new peer-metric test group.
dmq-node/src/DMQ/SigSubmissionV2/Inbound.hs Threads per-peer local metric state; records sig-id receipt time; reports mempool results into the metric.
dmq-node/src/DMQ/Protocol/SigSubmission/Type.hs Fixes a typo in a comment.
dmq-node/src/DMQ/Policy.hs Adds peerMetricConfiguration (1h window).
dmq-node/src/DMQ/NodeToNode.hs Wires metric reporting into sig-submission inbound; erases peer metric state on disconnect via finally.
dmq-node/src/DMQ/Diffusion/PeerSelectionPolicy.hs Uses announciness scores to bias hot peer demotion (lower score demoted first).
dmq-node/src/DMQ/Diffusion/PeerSelection/PeerMetric.hs New metric implementation module (local state, global state, reporting, announciness extraction).
dmq-node/src/DMQ/Diffusion/NodeKernel/Types.hs Adds peerMetric to NodeKernel.
dmq-node/src/DMQ/Diffusion/NodeKernel.hs Instantiates the shared peerMetric in newNodeKernel.
dmq-node/dmq-node.cabal Exposes new modules and adds psqueues dependency; includes new tests.
dmq-node/changelog.d/20260512_150257_coot_sig_metric.md Adds a changelog fragment describing the new feature.
dmq-node/app/Main.hs Switches to PeerSelectionPolicy module and passes NodeKernel’s peerMetric into policy.
cabal.project Pins ouroboros-network from a specific git tag and adds an allow-newer entry.
Comments suppressed due to low confidence (1)

dmq-node/src/DMQ/Diffusion/PeerSelection/PeerMetric.hs:246

  • As implemented, credit is only assigned when a peer’s signature is accepted (reportSigImpl uses that peer’s LocalPeerMetricState to determine the announcement time). This does not (by itself) implement “give a score to the peer that announced it first” across all peers unless the system guarantees that the accepting peer is always the earliest announcer. Either clarify this in the PR description/docs, or adjust the metric to track the earliest announcement globally across peers and use that when a signature becomes valid.
reportSigImpl
  :: forall sigid peeraddr.
     Ord sigid
  => PeerMetricConfiguration
  -> LocalPeerMetricState sigid
  -> TraceLabelPeer peeraddr (sigid, TxMempoolResult)
  -> PeerMetricState sigid peeraddr
  -> (LocalPeerMetricState sigid, PeerMetricState sigid peeraddr)
reportSigImpl
    config
    localState
    (TraceLabelPeer _peeraddr (sigid, TxRejected))
    metricState
    =
    ( localState'
    , prune config mbTime metricState
    )
  where
    localState' :: LocalPeerMetricState sigid
    (mbTime, localState') = unreportSigId sigid localState
reportSigImpl
    config
    localState
    (TraceLabelPeer peeraddr (sigid, TxAccepted))
    st
    =
    ( localState'
    , PeerMetricState { metricState = metricState' }
    )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

set -euo pipefail

# First, try to find the 'fd' command
FD="$(which fdfind 2>/dev/null || which fd 2>/dev/null)"

fd -e nix -X nixpkgs-fmt
# First, try to find the 'fd' command
FD="$(which fdfind 2>/dev/null || which fd 2>/dev/null)"
Comment on lines +194 to +215
-- | An internal function which prunes the `PeerMetricState` keeping only
-- `timeWindowToKeep` of entries.
--
-- Note: we can trust the time, since it's coming from the host
-- `getMonotonicTime`.
prune :: Ord sigid
=> PeerMetricConfiguration
-> Maybe Time
-- ^ time when sigid was inserted to `LocalPeerMetricState`
-> PeerMetricState sigid peeraddr
-> PeerMetricState sigid peeraddr
prune PeerMetricConfiguration { timeWindowToKeep }
mbTime
PeerMetricState {
metricState
}
= PeerMetricState {
metricState =
case mbTime of
Nothing -> metricState
Just t -> snd $ OrdPSQ.atMostView ((-timeWindowToKeep) `addTime` t) metricState
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the protocol becomes stale, we preserve the current information until connectivity resumes, which is a feature not a bug.

This change would require changes in PeerSelectionPolicy API: ight now each policy runs in STM m rather than m.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Churn metric

2 participants