TQ: Shrink the ack requirements for partial commit #9852
andrewjstone wants to merge 1 commit into main from
Conversation
Currently we wait for N-Z acked commits to allow a new configuration to be installed in the `Committing` state. For a 32 node system this means that 29 nodes have to ack. If more than 3 nodes are offline, then a customer cannot add or remove any sleds. This is an unnecessarily harsh restriction.

This PR changes it so that only K+Z nodes must ack commits to go from `Preparing` to `CommittedPartially`, similar to the prepare phase of the protocol that takes us from `Preparing` to `Committing`. We bump Z a little bit for larger clusters to balance things out. Now, only 21 nodes must ack commits for new configurations to be accepted. This allows up to 11 nodes to be offline and eliminates the need for an escape hatch for Oxide support if 4 or 5 nodes rather than 3 are offline. If 11 nodes are offline, something major is going on and Oxide support really should be involved. We will not allow automatic triage via omdb in this situation.

This PR also makes acking commits idempotent in the datastore layer, which fixes a potential bug in the case of multiple Nexus instances trying to commit the same nodes simultaneously.

Fixes #9826
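To make the rule change concrete, here is a minimal sketch of the two threshold rules described above. The function names and the particular K and Z values are illustrative assumptions chosen only to reproduce the ack counts quoted in this description (29 and 21 for a 32-node system); they are not the constants from the actual implementation.

```rust
/// Old rule: wait for N - Z commit acks before installing the new
/// configuration in the `Committing` state.
fn old_commit_threshold(n: usize, z: usize) -> usize {
    n - z
}

/// New rule: only K + Z commit acks are needed to move from
/// `Preparing` to `CommittedPartially`, mirroring the prepare phase.
fn new_commit_threshold(k: usize, z: usize) -> usize {
    k + z
}

fn main() {
    // Illustrative 32-node system; the K and Z used here are assumptions
    // that reproduce the numbers quoted in the PR description.
    let n = 32;
    assert_eq!(old_commit_threshold(n, 3), 29); // tolerates 3 offline nodes
    assert_eq!(new_commit_threshold(17, 4), 21); // tolerates 11 offline nodes
    println!("offline tolerance: {} -> {}", n - 29, n - 21);
}
```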
```rust
9..=16 => 2,
_ => 3,
4..=7 => 1,
8..=15 => 2,
```
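For context, the hunk above shows match arms mapping cluster size to Z. Below is a minimal sketch of how such a lookup might read in full; only the `4..=7 => 1` and `8..=15 => 2` arms come from the diff, and the remaining arms are assumptions, not the values chosen in this PR.

```rust
// Sketch only: maps cluster size `n` to the redundancy margin Z.
// The `4..=7` and `8..=15` arms are taken from the hunk above; the
// other arms are illustrative assumptions.
fn z_for_cluster_size(n: usize) -> usize {
    match n {
        0..=3 => 0,
        4..=7 => 1,
        8..=15 => 2,
        _ => 3,
    }
}

fn main() {
    // Only the arms shown in the diff are exercised here.
    assert_eq!(z_for_cluster_size(5), 1);
    assert_eq!(z_for_cluster_size(8), 2);
}
```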
This might be too harsh. It would only allow 1 node to be offline for an 8-node cluster. Then again, we should never have 8-node clusters in production.
This actually isn't strictly a bug, except in the contrived test here. It can make commit occur more slowly, though. In the case of competing BG tasks, one can ack, say, nodes 1, 2, 3 and a second can ack 2, 3, 4, which previously would have failed. The next time around, a BG task will only ask for node 4 and record that, since it will know 2 and 3 have already committed. But this requires an extra loop through the BG task, which in our case is up to 1 minute. In any event, there was no reason for the removed filter, and so it is gone.
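To illustrate the idempotent acking this comment and the PR description refer to, here is a toy sketch of the semantics. The type and method names are hypothetical; this is not the actual datastore schema or API, only a model of why overlapping acks from competing background tasks no longer fail.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the per-configuration commit-ack state kept in the
/// datastore layer. Sketch only; the real implementation lives in Nexus.
#[derive(Default)]
struct CommitAcks {
    acked: BTreeSet<u32>,
}

impl CommitAcks {
    /// Record acks for `nodes`. Re-acking an already-acked node is a
    /// no-op rather than an error, so two background tasks racing with
    /// overlapping node sets both succeed.
    fn ack_commits(&mut self, nodes: &[u32]) {
        self.acked.extend(nodes.iter().copied());
    }
}

fn main() {
    let mut acks = CommitAcks::default();
    // First background task acks nodes 1, 2, 3.
    acks.ack_commits(&[1, 2, 3]);
    // A competing task acks 2, 3, 4; with idempotent acking the overlap
    // is harmless and only node 4 is newly recorded.
    acks.ack_commits(&[2, 3, 4]);
    assert_eq!(acks.acked, BTreeSet::from([1, 2, 3, 4]));
}
```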