Fix signaling races during peer connection setup#364
Conversation
Serialize incoming signaling messages so WebRTC descriptions and candidates are handled in order. Queue ICE candidates until a remote description is available, ignore stale answers that no longer match the current signaling state, and avoid creating duplicate Peer instances while async TURN credential lookup is in flight. On the server, have JoinLobby return the peers that existed before the join and only request connections to that snapshot. This avoids racing newly joined peers against each other while the lobby membership changes. Credential request responses can still resolve immediately because _addPeer may wait for them while handling a queued connect packet.
| private makingOffer: boolean = false | ||
| private ignoreOffer: boolean = false | ||
| private isSettingRemoteAnswerPending: boolean = false | ||
| private readonly pendingCandidates: RTCIceCandidate[] = [] |
There was a problem hiding this comment.
| private readonly pendingCandidates: RTCIceCandidate[] = [] | |
| private readonly pendingRemoteCandidates: RTCIceCandidate[] = [] |
I would always include local/remote prefixes for these kinds of state.
|
Issue 1: New lobby joiners could both initiate the same peer connection Before the change:
That matters because The fix is that |
|
Issue 2: Client signaling handlers could interleave and corrupt WebRTC setup Before the change, every websocket message started A bad sequence looked like this:
That led to a few concrete failure modes:
The fix addresses those pieces together:
|
|
As discussed in voice. Let's try and find a metric in the analytics that can verify that this change is going to be a positive one. |
Serialize incoming signaling messages so WebRTC descriptions and candidates are handled in order. Queue ICE candidates until a remote description is available, ignore stale answers that no longer match the current signaling state, and avoid creating duplicate Peer instances while async TURN credential lookup is in flight.
On the server, have
JoinLobbyreturn the peers that existed before the join and only request connections to that snapshot. This avoids racing newly joined peers against each other while the lobby membership changes.Credential request responses can still resolve immediately because
_addPeermay wait for them while handling a queued connect packet.There are no new tests as these race conditions aren't easy to replicate with simple tests. I do have a test framework I used to find them, but this runs for several minutes and uses multiple headless Chrome browsers. I'm cleaning up this code for a separate pull request.