Stale splice commitment_signed after local splice abort force-closes channel #4730

@joostjager

This was discovered while developing force-close fuzzing. This report is AI-generated and may contain mistakes.

Summary

During splice negotiation, the initiator can reach FundingTransactionReadyForSigning after the acceptor has already sent its initial commitment_signed. If the initiator then aborts the splice locally with cancel_funding_contributed before that commitment_signed is delivered, the stale in-flight message still arrives afterwards, and handling it currently force-closes the channel with:

Invalid commitment tx signature from peer

The stale commitment_signed belongs to the splice negotiation that was just aborted. It should be ignored or otherwise treated as stale splice state, not validated against the post-abort live-channel state.

Impact

This can turn a valid local splice abort plus normal message reordering into a channel force-close. The peer did not necessarily send a malformed signature. Instead, the signature appears invalid because it is checked against the wrong channel state after the splice abort.

Why this is an LDK splice/state-machine issue

The reproducer does not depend on force-close fuzzing behavior. It uses normal splice APIs and normal message delivery:

  1. Start a splice-out where only the initiator contributes funds.
  2. Complete the splice interactive transaction negotiation.
  3. Observe the initiator's FundingTransactionReadyForSigning.
  4. Hold the acceptor's already-generated initial commitment_signed in flight.
  5. Locally abort the splice with cancel_funding_contributed.
  6. Keep the outbound tx_abort queued.
  7. Deliver the stale acceptor commitment_signed to the initiator.

This is a plausible transport ordering: the acceptor's response can already be in flight while the initiator's local caller chooses to cancel, and the initiator's later tx_abort may not reach the acceptor first.

The force-close fuzz harness found this ordering, but the root behavior is in the splice state machine: after local abort, LDK still treats the old splice commitment_signed as a live commitment update and validates it against the wrong state.
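The ordering can be illustrated with a deliberately simplified model. This is only a sketch of the suspected mechanism, not LDK code: `FundingId`, `NaiveChannel`, `Outcome`, and `replay_reported_ordering` are hypothetical names, and the "signature" is reduced to the funding it was produced for. It assumes that, once the pending splice is dropped, an incoming commitment_signed is checked against the live funding.

```rust
/// Toy stand-in for the funding a commitment transaction spends.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FundingId(u32);

/// Toy `commitment_signed`: the "signature" is just the funding it commits to.
#[derive(Clone, Copy, Debug)]
struct CommitmentSigned {
	signed_for: FundingId,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
	Accepted,
	ForceClosed(&'static str),
}

/// Hypothetical channel that forgets the splice entirely on local abort.
struct NaiveChannel {
	live_funding: FundingId,
	pending_splice: Option<FundingId>,
}

impl NaiveChannel {
	/// Local abort: the pending splice state is discarded immediately.
	fn abort_splice(&mut self) {
		self.pending_splice = None;
	}

	/// With no pending splice, the message is validated against live funding.
	fn handle_commitment_signed(&self, msg: CommitmentSigned) -> Outcome {
		let expected = self.pending_splice.unwrap_or(self.live_funding);
		if msg.signed_for == expected {
			Outcome::Accepted
		} else {
			Outcome::ForceClosed("Invalid commitment tx signature from peer")
		}
	}
}

/// Replays the reported ordering: the acceptor signs for the splice, the
/// initiator aborts locally, then the in-flight message is delivered.
fn replay_reported_ordering() -> Outcome {
	let mut initiator =
		NaiveChannel { live_funding: FundingId(1), pending_splice: Some(FundingId(2)) };
	let in_flight = CommitmentSigned { signed_for: FundingId(2) };
	initiator.abort_splice();
	initiator.handle_commitment_signed(in_flight)
}
```

In this model the peer's message is well-formed for the splice funding, yet `replay_reported_ordering()` ends in a force-close because the comparison target changed underneath it.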

Expected behavior

After local splice abort, a stale in-flight commitment_signed for the aborted splice should not force-close the channel. It should be ignored, treated as stale, or otherwise handled as part of the aborted splice negotiation.
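One possible shape for this handling is sketched below, again as a toy model rather than a proposed LDK patch. It assumes two things the report does not confirm: that an incoming commitment_signed can be attributed to a specific funding (the hypothetical `funding` field), and that the peer eventually acknowledges the abort with its own tx_abort, at which point the stale window closes. All type and method names are hypothetical.

```rust
/// Toy stand-in for the funding a commitment transaction spends.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FundingId(u32);

/// Toy `commitment_signed` tagged with the funding it was produced for.
#[derive(Clone, Copy, Debug)]
struct CommitmentSigned {
	funding: FundingId,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
	Accepted,
	IgnoredStale,
	ForceClosed,
}

/// Hypothetical channel that remembers a locally aborted splice until the
/// peer has acknowledged the abort.
struct Channel {
	live_funding: FundingId,
	pending_splice: Option<FundingId>,
	aborted_splice: Option<FundingId>,
}

impl Channel {
	/// Local abort: move the pending splice into the "aborted" slot instead
	/// of forgetting it.
	fn abort_splice(&mut self) {
		if let Some(funding) = self.pending_splice.take() {
			self.aborted_splice = Some(funding);
		}
	}

	/// Peer acknowledged the abort: nothing for the old splice can still be
	/// in flight, so stop tolerating messages for it.
	fn handle_tx_abort_ack(&mut self) {
		self.aborted_splice = None;
	}

	fn handle_commitment_signed(&self, msg: CommitmentSigned) -> Outcome {
		// Messages for the aborted splice are dropped, not validated.
		if self.aborted_splice == Some(msg.funding) {
			return Outcome::IgnoredStale;
		}
		let expected = self.pending_splice.unwrap_or(self.live_funding);
		if msg.funding == expected {
			Outcome::Accepted
		} else {
			Outcome::ForceClosed
		}
	}
}
```

The design choice here is to bound the tolerance window: a message for the aborted splice is ignored only between the local abort and the peer's acknowledgement, so a genuinely invalid commitment_signed outside that window is still rejected.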

Actual behavior

LDK validates the stale commitment_signed against the post-abort channel state, emits an error message containing "Invalid commitment tx signature from peer", broadcasts a channel update marking the channel disabled, and surfaces:

ChannelClosed {
    reason: ProcessingError { err: "Invalid commitment tx signature from peer" },
    ...
}

Focused regression test

The following unit test captures the desired behavior:

#[test]
fn cancel_funding_contributed_then_inflight_commitment_signed_does_not_close_channel() {
	let chanmon_cfgs = create_chanmon_cfgs(2);
	let node_cfgs = create_node_cfgs(2, &chanmon_cfgs);
	let node_chanmgrs = create_node_chanmgrs(2, &node_cfgs, &[None, None]);
	let nodes = create_network(2, &node_cfgs, &node_chanmgrs);

	let initiator = &nodes[0];
	let acceptor = &nodes[1];

	let node_id_initiator = initiator.node.get_our_node_id();
	let node_id_acceptor = acceptor.node.get_our_node_id();

	let initial_channel_capacity = 100_000;
	let (_, _, channel_id, _) =
		create_announced_chan_between_nodes_with_value(&nodes, 0, 1, initial_channel_capacity, 0);

	let outputs = vec![TxOut {
		value: Amount::from_sat(1_000),
		script_pubkey: initiator.wallet_source.get_change_script().unwrap(),
	}];
	let funding_contribution =
		initiate_splice_out(initiator, acceptor, channel_id, outputs).unwrap();
	let new_funding_script = complete_splice_handshake(initiator, acceptor);
	complete_interactive_funding_negotiation(
		initiator,
		acceptor,
		channel_id,
		funding_contribution.clone(),
		new_funding_script,
	);

	// Both peers completed the interactive transaction exchange. Since only the
	// initiator contributed splice funds, the initiator must still surface the
	// unsigned funding transaction before it may send its initial
	// `commitment_signed`.
	let _ = get_event!(initiator, Event::FundingTransactionReadyForSigning);
	assert!(acceptor.node.get_and_clear_pending_events().is_empty());
	assert!(initiator.node.get_and_clear_pending_msg_events().is_empty());

	// The acceptor has no funding contribution, so it can send its initial
	// `commitment_signed` immediately. Hold that message to model it racing with
	// the local caller's decision to cancel instead of sign.
	let acceptor_commit_sig = get_htlc_update_msgs(acceptor, &node_id_initiator);
	assert_eq!(acceptor_commit_sig.commitment_signed.len(), 1);

	// Cancel before signing. This is a valid API flow: local contribution is
	// discarded, the splice negotiation fails locally, and LDK queues a
	// `tx_abort` for the peer.
	initiator.node.cancel_funding_contributed(&channel_id, &node_id_acceptor).unwrap();
	let reason = NegotiationFailureReason::LocallyCanceled;
	expect_splice_failed_events(initiator, &channel_id, funding_contribution, reason);

	// Keep our `tx_abort` queued. The fuzz failure has this exact ordering: our
	// abort is outbound, but the acceptor's earlier `commitment_signed` reaches
	// us first.
	let tx_abort = get_event_msg!(initiator, MessageSendEvent::SendTxAbort, node_id_acceptor);
	assert_eq!(tx_abort.channel_id, channel_id);

	initiator
		.node
		.handle_commitment_signed(node_id_acceptor, &acceptor_commit_sig.commitment_signed[0]);

	// The delayed `commitment_signed` belonged to the splice we just aborted. It
	// should not be validated against the post-abort channel state and should
	// not force-close the live channel as an invalid commitment signature.
	let msg_events = initiator.node.get_and_clear_pending_msg_events();
	let got_invalid_sig_error = msg_events.iter().any(|event| {
		matches!(
			event,
			MessageSendEvent::HandleError {
				action: msgs::ErrorAction::SendErrorMessage { msg },
				..
			} if msg.data.contains("Invalid commitment tx signature from peer")
		)
	});
	let events = initiator.node.get_and_clear_pending_events();
	let got_invalid_sig_close = events.iter().any(|event| {
		matches!(
			event,
			Event::ChannelClosed {
				reason: ClosureReason::ProcessingError { err },
				..
			} if err == "Invalid commitment tx signature from peer"
		)
	});
	let got_channel_close = events.iter().any(|event| matches!(event, Event::ChannelClosed { .. }));
	assert!(
		!got_invalid_sig_error,
		"stale commitment_signed generated invalid-signature error: {msg_events:?}; events: {events:?}"
	);
	assert!(
		!got_invalid_sig_close,
		"stale commitment_signed force-closed with invalid-signature processing error: {events:?}"
	);
	assert!(!got_channel_close, "stale commitment_signed should not close the channel: {events:?}");
}
