Persistent MonitorEvents#4491
Persistent MonitorEvents#4491valentinewallace wants to merge 16 commits intolightningdevkit:mainfrom
MonitorEvents#4491Conversation
Helps when debugging to know which variants failed.
Ack monitor events besides HTLCUpdate XXX
No test coverage
No test coverage
Also add provided_monitor_event_ids tracking in ChannelMonitor, which is in-memory only.
Monitor events will keep being provided on restart to the ChannelManager until they are explicitly ACK'd. This new code will run randomly in tests, to ensure we still support the old paths. XXX
|
👋 Hi! I see this is a draft PR. |
| match channel_monitor.pending_monitor_events.iter().find(|ev| match ev { | ||
| // Strip IDs from pending_monitor_events for backward-compatible TLV field 5. | ||
| // TODO: can we remove the legacy monitor event variant? | ||
| let pending_monitor_events: Vec<MonitorEvent> = match channel_monitor |
| (35, channel_monitor.is_manual_broadcast, required), | ||
| (37, channel_monitor.funding_seen_onchain, required), | ||
| (39, channel_monitor.next_monitor_event_id, required), | ||
| (41, monitor_event_ids, required_vec), |
There was a problem hiding this comment.
ISTM we could do this simpler by just switching based on the persistent_events_enabled flag - if its set, write the full pending_monitor_events to a new even TLV, if its not, store the old thing.
There was a problem hiding this comment.
I'm fine with this, but I don't think it's simpler because MonitorEvent is MaybeReadable and there isn't a writeable impl for tuples with elements that are MaybeReadable, just Readable. So we'd need some kind of read wrapper or something. Wdyt?
| next_monitor_event_id: u64, | ||
| /// Tracks which monitor events have already been returned to the caller during this runtime | ||
| /// session. In-memory only — on restart this is empty, so all unacked events are re-provided. | ||
| provided_monitor_event_ids: HashSet<u64>, |
There was a problem hiding this comment.
bleh, just move the provided events into a new vec.
| FailHTLC { | ||
| htlc_id: u64, | ||
| err_packet: msgs::OnionErrorPacket, | ||
| monitor_event_id: Option<(u64, ChannelId)>, |
There was a problem hiding this comment.
Is the code cleaner to have some type for this? Maybe just a MonitorUpdateCompletionAction but could also be some MonitorEventSource?
| (17, in_flight_monitor_updates, option), | ||
| (19, peer_storage_dir, optional_vec), | ||
| (21, async_receive_offer_cache, (default_value, async_receive_offer_cache)), | ||
| (23, persistent_monitor_events, (default_value, false)), |
There was a problem hiding this comment.
Given the logic isn't complete, should we write this as an even tlv on the write side and deliberately not implement read for it yet?
There was a problem hiding this comment.
Do you mean use an even tlv and conditionally write it only if it's set to true? That makes sense to me
There was a problem hiding this comment.
Is this the plan for upgrades/downgrades?
- Until ~everything's in place, use even TLVs for things but don't write them in prod, and error in prod if they are set on read
- Then have a version or two that allows the even TLVs to be set on read, but doesn't write them, to allow downgrades
- Then have a version that allows the even TLVs to be missing, runs the legacy reconstruction logic once, then sets
persistent_monitor_eventsin the manager/monitors at the end of manager read so even TLVs will be henceforth written
As part of #4482, we're looking into changing our architecture -- currently, the
Channel{Manager}is responsible for managing the resolution of off-chain HTLCs, and theChannelMonitoris responsible for them once they're on-chain. See the issue description but there's complexity that results from this design.Quoting the issue, "Instead, we want to do all resolution in ChannelMonitors (in response to ChannelMonitorUpdates) and pass them back to ChannelManager in the form of MonitorEvents (similar to how HTLCs are resolved after channels are closed). In order to have reliable resolution, we'll need to keep MonitorEvents around in the ChannelMonitor until the ChannelManager has finished processing them - adding a new MonitorEvent resolution path through a new method (rather than via ChannelMonitorUpdates)."
Here we don't add any new
MonitorEvents but add support for making the ones we do have persistent across restarts until they are explicitly ack'd, via a newchain::WatchAPI.