Skip to content

Add probing service#815

Open
randomlogin wants to merge 5 commits intolightningdevkit:mainfrom
randomlogin:add-probing-service
Open

Add probing service#815
randomlogin wants to merge 5 commits intolightningdevkit:mainfrom
randomlogin:add-probing-service

Conversation

@randomlogin
Copy link

@randomlogin randomlogin commented Mar 1, 2026

Added a probing service which is used to send probes to estimate channels' capacities.

Related issue: #765.

Probing is intended to be used in two ways:

  • on a 'normal' node that allocates some liquidity to probe channels;
  • on a probing node which doesn't make any real payments, just observes the network.


For probing a new abstraction Prober is defined and is (optionally) created during node building.
Prober periodically sends probes to feed the data to the scorer.
Prober sends probes using a ProbingStrategy.

ProbingStrategy trait has only one method: fn next_probe(&self) -> Option<Probe>; every tick it generates a probe, where Probe represents how to send a probe.

To accommodate two different ways the probing is used, we either construct a probing route manually (Probe::PrebuiltRoute) or rely on the router/scorer (Probe::Destination).

Prober tracks how much liquidity is locked in-flight in probes, prevents the new probes from firing if the cap is reached.

There are two probing strategies implemented:

  • Random probing strategy, it picks a random route from the current node, the route is probed via send_probe, thus ignores scoring parameters (what hops to pick), it also ignores liquidity_limit_multiplier which prohibits taking a hop if its capacity is too small. It is a true random route.

  • High degree probing strategy, it examines the graph and finds the nodes with the biggest number of (public) channels and probes routes to them using send_spontaneous_preflight_probes which uses the current router/scorer.

The former is meant to be used on payment nodes, while the latter on probing nodes. For the HighDegreeStrategy to work it is recommended to set probing_diversity_penalty_msat to some nonzero value to prevent routes reuse, however it may fail to find any available routes.

There are three tests added:

  • check the probing locked amount increases/decreases
  • check that the new probes are not fired if the current locked amount cap is reached
  • performance testing which sets up a network of nodes and 4 observing nodes

Example output (runs for ~1 minute, needs --nocapture flag):

SCID            Direction         Probing Random               Probing HiDeg                Probing HiDeg+P              Probing nostrat              Real outbound msat
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
114349209354241 C→Random          [0, 90 100 235]              unknown                      unknown                      unknown                      0
114349209354241 Random→C          [9 899 765, 100 000 000]     unknown                      unknown                      unknown                      917 184 187
114349209419777 D→HiDeg+P         unknown                      unknown                      [0, 91 127 725]              unknown                      0
114349209419777 HiDeg+P→D         unknown                      unknown                      [8 872 275, 100 000 000]     unknown                      905 313 697
114349209485312 B→F               [0, 3 418 855]               unknown                      unknown                      unknown                      160 288 291
114349209485312 F→B               [96 581 145, 100 000 000]    unknown                      unknown                      unknown                      819 711 709
114349209550848 B→D               [0, 6 141 420]               [0, 2 988 399]               [0, 91 127 725]              unknown                      30 052 300
114349209550848 D→B               [93 858 580, 100 000 000]    [97 011 601, 100 000 000]    [8 872 275, 100 000 000]     unknown                      874 555 512
114349209616384 E→HiDeg           [0, 2 097 950]               [0, 90 455 967]              unknown                      unknown                      0
114349209616384 HiDeg→E           [97 902 050, 100 000 000]    [9 544 033, 100 000 000]     unknown                      unknown                      913 366 836
114349209681920 B→C               [98 411 698, 100 000 000]    unknown                      unknown                      unknown                      86 511 108
114349209681920 C→B               [0, 1 588 302]               unknown                      unknown                      unknown                      812 521 355
114349209747457 B→E               [0, 6 625 595]               [0, 90 455 967]              [0, 4 795 311]               unknown                      0
114349209747457 E→B               [93 374 405, 100 000 000]    [9 544 033, 100 000 000]     [95 204 689, 100 000 000]    unknown                      907 339 924
114349209812993 C→nostrat         [0, 1 113 591]               unknown                      unknown                      unknown                      0
114349209812993 nostrat→C         [98 886 409, 100 000 000]    unknown                      unknown                      unknown                      990 000 000
114349209878528 C→E               [2 097 950, 5 335 740]       [0, 90 808 929]              unknown                      unknown                      55 569 306
114349209878528 E→C               [94 664 260, 97 902 050]     [9 191 071, 100 000 000]     unknown                      unknown                      878 423 826
114349209944065 B→C               [97 243 524, 100 000 000]    unknown                      [1 698 550, 6 086 572]       unknown                      938 012 264
114349209944065 C→B               [0, 2 756 476]               unknown                      [93 913 428, 98 301 450]     unknown                      34 367 637
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Known directions                   18/20                        8/20                         8/20                         0/20                        

For performance testing I had to expose the scoring data (scorer_channel_liquidity).
Also exposed scoring_fee_params: ProbabilisticScoringFeeParameters to Config.

TODOs:

  • adjust default parameters
  • improve HighDegree strategy to take into account channel capacities
  • improve HighDegree strategy to cache results
  • improve performance test to better estimate real channel capacity

Create new background task for probing, add its initialization to the
builder.

Add new trait ProobingStrategy with two default implementations.

Total amount of sats in current non-finished probes is tracked, it is
changed on ProbeSucceded and ProbeFailed events.
There are 3 probing tests:
- check that probing lock liquidity amount changes
- test that new probes are not fired if the max locked liquidity is
  reached (for example if on of the nodes goes down)
- probing perfomance test which sets up 4 probing nodes (random
  strategy, high-degree without penalty, high-degree with penalty,
  control node without probing). A network of several nodes is set up;
  these nodes make payments to one another and probing nodes observe
  their behaviour. Next the scorer estimates are printed out.

Scorer channel estimates are exposed for the purposes of the test.

Scoring parameters (probing penalty) is exposed to be set up during node
builiding.

Random probing strategy constructs the maximal possible route (up to the
set limit) instead of failing when the next hop is not possible to
construct.
Probing locked amount involves hops' fees, not only the last hop.

Remove usage of cursor in favor of Atomicusize for pseudorandom pick of
channels for random strategy.

Add check that min_probe amount is indeed smaller than max_probe_amount.

Add liquidity_limit_multiplier to probing config instead of hardocded
None value.
Remove Prober::run method in favor of distinct function.

Tidy probing tests.
@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Mar 1, 2026

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@randomlogin randomlogin marked this pull request as ready for review March 1, 2026 12:10
@tnull tnull self-requested a review March 2, 2026 07:56
@ldk-reviews-bot
Copy link

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 2nd Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 3rd Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 4th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 5th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot
Copy link

🔔 6th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Copy link
Contributor

@enigbe enigbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @randomlogin, thanks for the work on this! I've reviewed the first two commits:

  1. 7f3ce11: "Create probing service" and
  2. 6574bf9: "Add probing tests"

I've left a bunch of inline comments addressing configuration and public API, commit hygiene, testing infrastructure, and test flakiness.

In summary:

  1. A couple of items are exposed publicly that seem like they should be scoped to probing or gated for tests only (see scoring_fee_params in Config and scorer_channel_liquidity on Node).
  2. The probing tests duplicate existing test helpers (setup_node, MockLogFacadeLogger). Reusing and extending what's already in tests/common/ would reduce duplication and keep the test file focused on the tests themselves.
  3. test_probe_budget_blocks_when_node_offline has a race condition where the prober dispatches probes before the baseline capacity is measured, causing the assertion between the baseline and stuck capacities to fail. Details in the inline comment.
  4. A few nits about commit hygiene, import structure, and suggestions for renaming stuff.

Also needs to be rebased.

src/builder.rs Outdated
use crate::payment::asynchronous::om_mailbox::OnionMessageMailbox;
use crate::peer_store::PeerStore;
use crate::probing;
use crate::probing::ProbingStrategy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ProbingStrategy is imported but never used because it qualified at all usage sites. You could either remove it and use the qualified form, or do the opposite. The rest of builder.rs follows the convention of importing types and using short names and I suggest doing the same here just for consistency.


use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::time::Duration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Import ordering doesn't match the codebase's convention. The rest of the repo follows: std -> external deps -> crate (see e.g. builder.rs, event.rs) but here it's reversed. Please reorder to match.

pub struct HighDegreeStrategy {
network_graph: Arc<Graph>,
/// How many of the highest-degree nodes to cycle through.
pub top_n: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could top_n be renamed to num_top_nodes? The latter reads less generic to me but up to you to modify or not.

src/probing.rs Outdated
let graph = self.network_graph.read_only();

// Collect (pubkey, channel_count) for all nodes.
// wtf it does why we need to iterate here and then sort? maybe we can go just once?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, and in other locations, we have what looks like leftover development notes. Could you fix-up its removal into this commit so it doesn't end up in git history?

async_payments_role: None,
pathfinding_scores_sync_config,
recovery_mode,
probing_strategy: None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Most optional fields here are initialized as local variables before the struct literal (e.g. let chain_data_source_config = None;). Could probing_strategy follow the same pattern for consistency? While at it, async_payments_role has the same inconsistency. Would be nice to align both.

}
}

// helpers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of helpers defined here. Could we move the general-purpose ones to tests/common/ and the probing-specific to tests/common/probing.rs to keep this file focused on the tests? For example, configure_chain_source duplicates the chain source setup already in setup_node (tests/common/mod.rs).

.set_probing_interval(Duration::from_millis(PROBING_INTERVAL_MILLISECONDS))
.set_max_probe_locked_msat(MAX_LOCKED_MSAT);
build_and_start(builder, config)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_and_start and configure_chain_source (and other build_node_* functions) duplicate logic that setup_node (chain source setup, store selection, custom logger, node start) already handles. Could we augment setup_node / TestConfig to support probing configuration and reuse it here instead?


/// Test change of locked_msat amount
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn test_probe_budget_increments_and_decrements() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can drop the test_ prefix since #[tokio::test] already marks it as one.


/// Test that probing stops if the upper locked in flight probe limit is reached
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn test_probe_budget_blocks_when_node_offline() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming to exhausted_probe_budget_blocks_new_probes and then updating the documentation to highlight how exhaustion occurs, i.e. node going offline?

.find(|ch| ch.counterparty_node_id == node_b.node_id())
.map(|ch| ch.outbound_capacity_msat)
.expect("A→B channel not found after B stopped");
assert!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion fails with:

HTLC not visible in channel state: capacity unchanged (988999000 msat)

I think this is because the prober dispatches a probe every 250ms in the background as soon as a viable path exists, which happens before initial_capacity is read. By the time the test reads initial_capacity, a probe is already in-flight, so the value is already reduced. The wait_until resolves quickly because locked_msat is already > 0. Thus, by the time we stop node_b, the same probe is still in-flight, and stuck_capacity reads the same value as initial_capacity. Possibly why the assertion fails.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants