confrisk: preparatory changes#1154
Conversation
What: - Cancel a private swap invoice if init fails before storage. - Use a detached timeout context for invoice cleanup. - Reuse the helper for monitor-driven invoice cancellation. - Add tests for early-init invoice cancellation. Why: - A failed pre-storage swap cannot be recovered on restart. - Leaving its invoice active can accumulate orphan invoices.
What: - Send finalization notifications from a goroutine. - Stop blocking the deposit FSM on the manager receive loop. - Add tests for blocked manager and shutdown races. Why: - Final states should not stall while deposit locks are held. - Startup recovery can handle shutdown before notification delivery.
What: - Fix state-list query formatting for final loop-in states. - Add a failed loop-in to the final-state store test. Why: - The previous formatting excluded the last state in the list. - Failed swaps must remain visible to callers querying final states.
What: - Make reservation fanout nonblocking for slow subscribers. - Keep recovery and sweep notifications blocking until delivered. - Make required fanout cancellation-aware. - Cover slow and required subscribers in manager tests. Why: - A slow optional subscriber must not stall the manager lock. - Required work requests still need reliable local delivery.
|
@claude please review |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces preparatory fixes for dynamic confirmation work by improving the robustness of static loop-in and notification delivery flows. The changes focus on preventing system stalls caused by slow subscribers, ensuring proper cleanup of transient resources like invoices during failure scenarios, and improving the reliability of state tracking for loop-in swaps. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces non-blocking notification delivery for subscribers, asynchronous deposit finalization, and automatic invoice cleanup on initialization failures. Specifically, a per-subscriber FIFO queue is added to prevent slow subscribers from blocking the main loop, and FinalizeDepositAction is updated to run asynchronously. Additionally, InitHtlcAction now ensures that pre-created invoices are canceled if an error occurs before persistence. Feedback on the changes highlights a potential memory leak in the FIFO queue implementation where sliced elements are not zeroed out, which could prevent garbage collection of pointer types.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| ntfn := q.pending[0] | ||
| q.pending = q.pending[1:] | ||
| q.Unlock() |
There was a problem hiding this comment.
In Go, slicing a slice (e.g., q.pending = q.pending[1:]) does not zero out the retired element. If T is a pointer type, the underlying array will retain a reference to the popped notification, preventing it from being garbage collected until that index in the underlying array is overwritten or the queue is garbage collected. To avoid this memory leak, explicitly zero out the popped element before slicing.
ntfn := q.pending[0]
var zero T
q.pending[0] = zero
q.pending = q.pending[1:]
q.Unlock()
Code ReviewOverall this is a well-scoped preparatory PR that addresses real failure modes. The fixes are independent and the PR description maps cleanly to the diffs. A few things worth flagging before merge:
|
adfd82a to
283bc5f
Compare
What: - Add per-subscriber queues for must-deliver notifications. - Use queued delivery for sweep requests and unfinished swaps. - Match subscribers by receive channel when removing them. - Add tests for queued delivery and cleanup. Why: - Required fanout should not block the manager lock. - Subscribers still need ordered delivery after brief backpressure.
f321b97 to
cb83ba9
Compare
|
@claude please re-review after I added queued notification handling of reservations as well. |
Code ReviewPR: confrisk: preparatory changes The overall direction is sound — each fix targets a real failure mode, test coverage is strong, and the changes are well-scoped. A few areas worth discussing:
|
| Area | Verdict |
|---|---|
| SQL state query bug fix | ✅ Correct and important |
| Async deposit finalization | ✅ Clean fix, good tests |
| Invoice cleanup on init failure | ✅ Correct, minor naming suggestion |
| Notification queue design | |
| Test coverage | ✅ Good |
The main items worth resolving before merge are the PR description/code mismatch on reservation notification queuing, and the recover() approach for the closed-channel race.
starius
left a comment
There was a problem hiding this comment.
LGTM! 💾
A couple of comments left.
|
|
||
| func toJointStringStates(states []fsm.StateType) string { | ||
| return "{" + strings.Join(toStrings(states), ",") + "}" | ||
| return strings.Join(toStrings(states), ",") |
There was a problem hiding this comment.
Good fix! Can you adjust the commit message to explain that now it covers the first and the last states as well, and add a test for first state, please?
| return | ||
| } | ||
|
|
||
| f.cancelSwapInvoice(ctx) |
There was a problem hiding this comment.
This fix is correct for pre-storage situations.
What happens with an invoice in SignHtlcTx/MonitorInvoiceAndHtlcTx → OnError → UnlockDeposits paths?
| case <-deadlineChan: | ||
| // If the server didn't pay the invoice on time, we | ||
| // cancel the invoice and keep monitoring the htlc tx | ||
| // confirmation. We also need to unlock the deposits to | ||
| // re-enable them for loop-ins and withdrawals. | ||
| cancelInvoice() | ||
|
|
||
| event := f.UnlockDepositsAction(ctx, nil) | ||
| if event != fsm.OnError { | ||
| f.Errorf("unable to unlock deposits after " + | ||
| "payment deadline") | ||
| } | ||
|
|
There was a problem hiding this comment.
This guard is dead code. UnlockDepositsAction returns fsm.OnError on both paths — the success path returns fsm.OnError (line 879), and the failure path returns f.HandleError(err), which is also fsm.OnError. So event != fsm.OnError is never true and the "unable to unlock deposits after payment deadline" error is never logged, including when the unlock genuinely fails.
Concretely: when the payment deadline fires we cancel the invoice, try to unlock the deposits, and keep monitoring the htlc. If TransitionDeposits(..., deposit.Deposited) fails here we get no signal for it — the contextual log can't fire — and the FSM keeps running as though the deposits were freed while they actually stay locked until the swap resolves another way (e.g. the htlc timeout sweep). The deposits are the user's funds, so this is a "don't silently fail an operation" issue: low severity, but strictly worse than what's trivially available, since we can just check the real error.
I confirmed the premise on a live regtest run: on a failed init the loop-in FSM entered UnlockDeposits, the deposit transitioned LoopingIn -> Deposited (a successful unlock), and the FSM still emitted OnError -> Failed — i.e. UnlockDepositsAction returns fsm.OnError even on success, which is exactly why this guard can never fire.
Suggested fix: https://gist.github.com/starius/915dee5b117c07f2042170d992274ec6
extracted commits from #1141
This PR pulls out general correctness and notification-delivery fixes ahead of the dynamic confirmation work.
It fixes a few independent issues in static address loop-in/deposit handling and hardens notification fanout so slow subscribers cannot
block unrelated notification delivery.
What
Why
These changes reduce failure modes that are not specific to dynamic confirmations, but become more important as static loop-ins rely more
heavily on notification-driven flows.
Moving them first keeps the dynamic confirmation PR focused on the actual feature behavior while landing independent fixes separately.