Skip to content

htlcswitch: separate onchain and offchain intercpeted HTLCs cleanly#10895

Open
ziggie1984 wants to merge 11 commits into
lightningnetwork:masterfrom
ziggie1984:fix-onchain-interceptor-held-htlc
Open

htlcswitch: separate onchain and offchain intercpeted HTLCs cleanly#10895
ziggie1984 wants to merge 11 commits into
lightningnetwork:masterfrom
ziggie1984:fix-onchain-interceptor-held-htlc

Conversation

@ziggie1984

@ziggie1984 ziggie1984 commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Change Description

Fixes #10892

Look at the first commit which shows via itests the failure cases of the old Interceptor Implementation

The second commit introduces a new interface and distingishes between onchain and offchain HTLC for the interceptor. It does not change any public interface.

@ziggie1984 ziggie1984 self-assigned this Jun 11, 2026
@github-actions github-actions Bot added the severity-critical Requires expert review - security/consensus critical label Jun 11, 2026
@github-actions

Copy link
Copy Markdown

PR Severity: CRITICAL

Highest-severity file determines level | 4 non-test files | 467 lines changed

Critical (2 files):

  • htlcswitch/held_htlc_set.go - HTLC forwarding state machine
  • htlcswitch/interceptable_switch.go - HTLC forwarding/payment routing interceptable switch

Medium (1 file):

  • witness_beacon.go - Root-level Go file

Low (4 files - excluded from severity bump counts):

  • docs/release-notes/release-notes-0.21.1.md
  • htlcswitch/held_htlc_set_test.go
  • itest/list_on_test.go
  • itest/lnd_forward_interceptor_test.go

Analysis

This PR modifies core htlcswitch package files. Changes to held_htlc_set.go and interceptable_switch.go affect the HTLC forwarding and payment routing state machine, warranting expert review.

Bump check: 4 non-test files (threshold >20), 467 lines changed (threshold >500), 1 critical package. No bump applied.

To override, add a severity-override-{critical,high,medium,low} label.
<!-- pr-severity-bot -->

@ziggie1984 ziggie1984 added this to the v0.21.1 milestone Jun 11, 2026
@ziggie1984 ziggie1984 added this to v0.21 Jun 11, 2026
@gemini-code-assist

Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the HTLC interception mechanism to accurately manage HTLCs that transition from off-chain forwarding to on-chain resolution, particularly in scenarios involving channel force-closures. By introducing distinct handling for off-chain and on-chain held HTLCs, the changes prevent incorrect auto-failing or misdirection of settlement attempts, thereby enhancing the robustness and reliability of HTLC management within the system.

Highlights

  • Differentiated HTLC Holding: The heldHtlcSet now distinguishes between offChainHeld and onChainHeld entries, allowing for different behaviors based on whether an HTLC is in the off-chain forwarding flow or the on-chain contractcourt flow.
  • Correct On-Chain HTLC Resolution: On-chain held HTLCs are now settle-only and expire by pruning local interceptor state, preventing issues like premature auto-failing after a restart or incorrect routing of settlement messages to the link mailbox instead of the witness beacon.
  • Dynamic HTLC State Transition: If an HTLC is initially held off-chain and then re-offered on-chain (e.g., due to a channel force-close), the stored entry in the heldHtlcSet is replaced to ensure future settlement attempts correctly reach the witness beacon.
  • Robust Resolution Logic: HTLC resolution now removes an entry from the held set only after the selected action (settle, fail, resume) successfully completes, improving atomicity and reliability.
  • Integration Test Coverage: New integration tests have been added to cover critical scenarios, including settling an on-chain intercepted HTLC after a node restart and settling an HTLC that transitions from off-chain to on-chain without a restart.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the heldHtlcSet to distinguish between off-chain and on-chain held HTLCs using a new heldEntry interface, ensuring proper handling of resolutions and expirations for both flows. It also updates the InterceptableSwitch and witness_beacon to support this, and adds comprehensive unit and integration tests. A critical deadlock risk was identified in witness_beacon.go where SubscribeUpdates holds a lock while making a blocking call to the interceptor, which can conflict with the switch's event loop attempting to acquire the same lock during resolution; releasing the lock early is recommended.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread witness_beacon.go Outdated
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch 2 times, most recently from cf2e20c to 02347b0 Compare June 11, 2026 01:35
@ziggie1984

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the heldHtlcSet to track held HTLCs as either off-chain or on-chain entries using a new heldEntry interface. This change ensures that on-chain re-offers can replace old off-chain holds, allowing settlements to reach the witness beacon after an incoming channel force-closes. Additionally, integration tests are added to verify on-chain settlement scenarios, and minor fixes are made to the witness beacon subscription flow. The reviewer's feedback correctly points out a style guide violation where the newHeldHtlcSet function lacks a documentation comment.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread htlcswitch/held_htlc_set.go
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch from 02347b0 to 6495243 Compare June 11, 2026 01:50
@saubyk saubyk removed this from v0.21 Jun 11, 2026
@saubyk saubyk added this to lnd v0.22 Jun 11, 2026
@saubyk saubyk removed this from the v0.21.1 milestone Jun 11, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in lnd v0.22 Jun 11, 2026
@saubyk saubyk added this to the v0.22.0 milestone Jun 11, 2026
@saubyk saubyk moved this from Backlog to In progress in lnd v0.22 Jun 11, 2026
@ziggie1984 ziggie1984 modified the milestones: v0.22.0, v0.21.1 Jun 11, 2026
@saubyk saubyk added this to v0.21 Jun 11, 2026
@saubyk saubyk removed this from lnd v0.22 Jun 11, 2026
@saubyk saubyk moved this to In progress in v0.21 Jun 11, 2026
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch 4 times, most recently from 5e3c3c3 to 424d1a8 Compare June 11, 2026 20:35
@ziggie1984

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the heldHtlcSet to support both off-chain and on-chain held HTLCs via a new heldEntry interface, resolving an issue where on-chain forward interceptor settlements failed to reach the witness beacon after a channel force-close. The changes also include updates to InterceptableSwitch and witness_beacon.go, comprehensive unit and integration tests, and updated documentation. Feedback is provided to add a documentation comment to newHeldHtlcSet to align with the repository's style guide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread htlcswitch/held_htlc_set.go
When an off-chain held HTLC is promoted to on-chain, the interceptor is
not notified again because it already has the circuit key. A client may
therefore still answer with an off-chain decision such as resume or fail.

Those actions cannot be applied to the on-chain resolver, but returning an
error tears down the whole interceptor stream. Log and ignore non-settle
actions instead, keeping the entry available for a later settle or expiry.
On-chain held entries are replay handles for the interceptor while
contractcourt waits for a preimage or on-chain expiry. Once the resolver
tears down, keeping the handle until the refund timeout can replay a stale
HTLC to a reconnecting interceptor.

Thread a dedicated cleanup signal from the witness subscription cancel path
back through the interceptable switch event loop. The held set only removes
on-chain entries for that signal, leaving off-chain entries under the link
flow lifecycle.
Use distinct internal deadline types for off-chain auto-fail heights and
on-chain settlement deadlines instead of overloading the intercepted packet
field.

Project both variants back into the existing router RPC auto_fail_height
field to preserve wire compatibility. Reject mismatched held HTLC deadline
types in tests.
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch from f6d505a to dcc17a9 Compare June 19, 2026 10:10
@ziggie1984

ziggie1984 commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator Author

/gateway dismiss F2

This is documented and was chosen to not duplicate notification for current clients. Since the onchain timeout is always larger and the htlc when onchain cannot be failed anyways its ok

@lightninglabs-gateway

Copy link
Copy Markdown

🚫 Dismissed F2 (minor) by @ziggie1984no reason given

Open findings on this PR: 🟠 F1 (major) · 🟡 F3 (minor) · 🟡 F4 (minor)

@ziggie1984

Copy link
Copy Markdown
Collaborator Author

/gateway dismiss F3

@ziggie1984

Copy link
Copy Markdown
Collaborator Author

/gateway re-review

@lightninglabs-gateway

Copy link
Copy Markdown

🚫 Dismissed F3 (minor) by @ziggie1984no reason given

Open findings on this PR: 🟠 F1 (major) · 🟡 F4 (minor)

@lightninglabs-gateway

lightninglabs-gateway Bot commented Jun 19, 2026

Copy link
Copy Markdown

✅ Re-review posted: #10895 (review)

Prior findings: 2 addressed, 0 still unresolved, 0 withdrawn.
New findings: 4.
Total inline: 4; in body: 0.

🔁 Need another re-review after pushing changes? Reply with /gateway re-review.
Maintainers can also /gateway dismiss <id> to silence specific findings, or anyone can /gateway explain <id> for elaboration.

@lightninglabs-gateway lightninglabs-gateway Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both threaded prior findings are resolved: the on-chain resolve path now returns the documented ErrCannotFailOnChain/ErrCannotResumeOnChain errors instead of silently forwarding non-settle actions (F1), and the release notes now document the InterceptedPacket field change and Deadline/AutoFailHeight() migration (F4). The design is sound and the specialist sweep surfaced no new fund-loss, race, or deadlock on the settlement path — several alarming-looking candidates turned out to be intentional, documented behavior.

The one genuine blocker-to-merge is a committed Git merge-conflict artifact in the release notes. Beyond that, only minor sharp edges remain: the stream-fatal nature of the now-surfaced on-chain errors (a maintainer-discussed tradeoff whose blast radius isn't documented for clients), an implicit demotion-safety invariant in addOffChain, and a trailing-whitespace lint nit.

Findings: 🔴 0 Blocker · 🟠 1 Major · 🟡 3 Minor · 🔵 0 Nit


Status of prior findings

  • F1 addressed: Resolved in htlcswitch/held_htlc_set.goonChainHeld.resolve now returns ErrCannotFailOnChain for FwdActionFail and ErrCannotResumeOnChain for FwdActionResume/FwdActionResumeModified, matching the router.proto contract that on-chain HTLCs are settle-only. Covered by TestHeldHtlcSetOnChainResolve, which asserts each non-settle action returns the typed error and leaves the entry held.
  • F4 addressed: Resolved in docs/release-notes/release-notes-0.21.1.md — the new bug-fix bullet now documents the breaking htlcswitch.InterceptedPacket change: "Go callers ... should use the new Deadline field to distinguish off-chain auto-fail heights from on-chain settlement deadlines, or AutoFailHeight() if they only need the legacy flattened value." This steers callers to the typed Deadline as recommended.

Comment thread docs/release-notes/release-notes-0.21.1.md Outdated
Comment thread docs/release-notes/release-notes-0.21.1.md Outdated
case FwdActionSettle:
return h.fwd.Settle(res.Preimage)

case FwdActionFail:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gateway dismiss F7, we are aware of the interceptor stream error feedback capabilities right now

// expire expires held forwards whose deadline has passed.
func (h *heldHtlcSet) expire(height uint32) []heldHtlcExpireError {
var errs []heldHtlcExpireError

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gateway dismiss F8, onchain is always the final state in case a channel goes onchain, your explicit case can not happen or only in some edge case race scenario where I think it is ok no not log anything but just ignore this call and exit early.

On-chain intercepted HTLCs can only be settled. Resume and fail actions
already return concrete errors through the on-chain intercepted forward, so
let those errors propagate to the interceptor client instead of converting
them to success.

Keep the held entry tracked on these errors so the client can reconnect and
settle the HTLC later. Update the router RPC docs to describe the error
behavior.
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch from dcc17a9 to b6bf686 Compare June 19, 2026 11:12
case FwdActionSettle:
return h.fwd.Settle(res.Preimage)

case FwdActionFail:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gateway dismiss F7, we are aware of the interceptor stream error feedback capabilities right now

// expire expires held forwards whose deadline has passed.
func (h *heldHtlcSet) expire(height uint32) []heldHtlcExpireError {
var errs []heldHtlcExpireError

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gateway dismiss F8, onchain is always the final state in case a channel goes onchain, your explicit case can not happen or only in some edge case race scenario where I think it is ok no not log anything but just ignore this call and exit early.

@ziggie1984

Copy link
Copy Markdown
Collaborator Author

/gateway re-review

@lightninglabs-gateway

lightninglabs-gateway Bot commented Jun 19, 2026

Copy link
Copy Markdown

✅ Re-review posted: #10895 (review)

Prior findings: 4 addressed, 0 still unresolved, 0 withdrawn.
New findings: 0.
Total inline: 0; in body: 0.

🔁 Need another re-review after pushing changes? Reply with /gateway re-review.
Maintainers can also /gateway dismiss <id> to silence specific findings, or anyone can /gateway explain <id> for elaboration.

@lightninglabs-gateway lightninglabs-gateway Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review of the on-chain/off-chain interceptor split. All prior findings are now resolved or have been dismissed by the maintainer: the two major items (F1 on-chain non-settle rejection, F4 exported-field breaking-change note) stay addressed, and the two release-note defects (F5 Git conflict markers, F6 hard-line-break separator) are fixed in docs/release-notes/release-notes-0.21.1.md.

A fresh pass over the current diff surfaced no genuinely-new defects. The concerns the specialists raised either restate findings the maintainer already dismissed with sound reasoning (the auto_fail_height overload — F3; the stream-terminating on-chain rejection — F7; the broad addOffChain exists-guard — F8), describe the PR's intended and itest-covered design (on-chain holds surviving interceptor disconnect; type-aware expire pruning on-chain entries rather than failing them), or were already debated and settled in PR review (the deliberately unbuffered onchainInterceptDone channel).


Status of prior findings

  • F1 addressed: Resolved — onChainHeld.resolve in htlcswitch/held_htlc_set.go now returns ErrCannotFailOnChain for Fail and ErrCannotResumeOnChain for Resume/ResumeModified, matching the ForwardHtlcInterceptResponse contract documented in router.proto. Non-settle actions no longer silently forward to a dead link.
  • F4 addressed: Resolved — the bug-fix bullet in docs/release-notes/release-notes-0.21.1.md now documents the htlcswitch.InterceptedPacket change for Go callers, directing them to the new Deadline field to distinguish off-chain auto-fail heights from on-chain settlement deadlines, or AutoFailHeight() for the legacy flattened value.
  • F5 addressed: Resolved — the Deprecations section of docs/release-notes/release-notes-0.21.1.md no longer contains the <<<<<<</=======/>>>>>>> Git conflict markers; both ### ⚠️ Warning deprecation entries render cleanly.
  • F6 addressed: Resolved — the separator between the tor bullet and the new interceptor bullet is now a genuine empty line rather than a two-space hard line break.

@saubyk

saubyk commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

/gateway approve

@lightninglabs-gateway

Copy link
Copy Markdown

❌ Cannot approve: 2 finding(s) still open.

  • F7 (minor) — unresolved
  • F8 (minor) — unresolved

Address them and run /gateway re-review to refresh status, then /gateway approve.

@saubyk

saubyk commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

/gateway dismiss F7, we are aware of the interceptor stream error feedback capabilities right now

@lightninglabs-gateway

Copy link
Copy Markdown

/gateway dismiss first argument must be a finding id like F3 (got: F7,).

@saubyk

saubyk commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

/gateway dismiss F7 we are aware of the interceptor stream error feedback capabilities right now

@lightninglabs-gateway

Copy link
Copy Markdown

🚫 Dismissed F7 (minor) by @saubyk — we are aware of the interceptor stream error feedback capabilities right now

Open findings on this PR: 🟡 F8 (minor)

@saubyk

saubyk commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

/gateway dismiss F8 onchain is always the final state in case a channel goes onchain, your explicit case can not happen or only in some edge case race scenario where I think it is ok no not log anything but just ignore this call and exit early.

@lightninglabs-gateway

Copy link
Copy Markdown

🚫 Dismissed F8 (minor) by @saubyk — onchain is always the final state in case a channel goes onchain, your explicit case can not happen or only in some edge case race scenario where I think it is ok no not log anything but just ignore this call and exit early.

All findings on this PR are addressed or dismissed.

@saubyk

saubyk commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

/gateway approve

@lightninglabs-gateway lightninglabs-gateway Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Approved — all prior findings addressed.

Findings recap

Findings:

  • F1 (major) — addressed: onChainHeld.resolve accepts FwdActionResume, FwdActionResumeModified, and FwdActionFail and forwards them to h.fwd.Resume()/`h.fwd…
  • F2 (minor) — unresolved: interceptOnChain only calls sendForward when !wasHeld, so when an existing off-chain hold is promoted to on-chain the interceptor clie…
  • F3 (minor) — unresolved: InterceptedPacket.AutoFailHeight() projects both either-arms to a single int32, and the RPC layer (forward_interceptor.go) maps that i…
  • F4 (minor) — addressed: The exported AutoFailHeight int32 field on htlcswitch.InterceptedPacket (stable since 2022) is removed in favor of the `Deadline fn.Eith…
  • F5 (major) — addressed: The Deprecations section contains unresolved Git conflict markers committed into the file: <<<<<<< HEAD (line 63), ======= (line 68), an…
  • F6 (minor) — addressed: The separator line between the tor bullet and the new interceptor bullet is (two spaces) rather than an empty line. Two trailing spaces…
  • F7 (minor) — unresolved: onChainHeld.resolve returning ErrCannotFailOnChain/ErrCannotResumeOnChain is correct per the documented contract, but that error propa…
  • F8 (minor) — unresolved: addOffChain does if h.exists(key) { return nil }, which no-ops for any existing entry — including an onChainHeld. That correctly pre…

Dismissed:

  • F2 by @ziggie1984
  • F3 by @ziggie1984
  • F7 by @saubyk — we are aware of the interceptor stream error feedback capabilities right now
  • F8 by @saubyk — onchain is always the final state in case a channel goes onchain, your explicit case can not happen or only in some edge case race scenario where I think it is ok no not log anything but just ignore this call and exit early.

Approved by @saubyk via /gateway approve. Last reviewed at b6bf686. Skill v0.3.0, model claude-opus-4-8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-v0.21.x-branch This label triggers a backport to branch `v0.21.x-branch ` bug fix gateway-active severity-critical Requires expert review - security/consensus critical

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

[bug]: on-chain intercepts evicted after one block due to unset AutoFailHeight

4 participants