fence_kdump: Try best to send fence message before rebooting #71
pfliu wants to merge 3 commits into rhkdump:main
pfliu commented on Feb 18, 2025
Resolves: https://issues.redhat.com/browse/RHEL-46337

As man 8 fence_kdump_send says:

  -i, --interval=INTERVAL
      Time to wait between sending a message. The value for INTERVAL must be greater than zero. (default: 10)

The default interval of 10 seconds is too large, especially when local dumping finishes quickly. Consider the following scenario:

- the network is not ready
- fence_kdump_notify &
- the network becomes ready
- local dumping finishes and the machine reboots within 10 seconds

In that case we miss the chance to send out the fence kdump messages. Shorten the interval to one second to ease this issue.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
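To make the timing argument concrete, here is a minimal sketch under a simplified model. It assumes send attempts happen at t = 0, I, 2I, ... seconds, which is an illustration and not necessarily how fence_kdump_send schedules its messages. The function names first_attempt_after and message_sent are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Simplified model (an assumption, not fence_kdump_send's actual scheduler):
# send attempts happen at t = 0, I, 2I, ... seconds after the sender starts.

# first_attempt_after READY INTERVAL
# Prints the time of the first attempt at or after READY (all in seconds).
first_attempt_after() {
    ready=$1
    interval=$2
    # Round READY up to the next multiple of INTERVAL.
    echo $(( (ready + interval - 1) / interval * interval ))
}

# message_sent READY REBOOT INTERVAL
# Succeeds if an attempt lands once the network is ready and no later
# than the reboot.
message_sent() {
    attempt=$(first_attempt_after "$1" "$3")
    [ "$attempt" -le "$2" ]
}
```

With the network ready at 3 seconds and the reboot at 8 seconds, `message_sent 3 8 10` fails because the next attempt would only come at 10 seconds, while `message_sent 3 8 1` succeeds because an attempt lands at 3 seconds.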
Resolves: https://issues.redhat.com/browse/RHEL-46337 Signed-off-by: Pingfan Liu <piliu@redhat.com>
Resolves: https://issues.redhat.com/browse/RHEL-46337 fence_kdump_send may fail to send a message if the network is slow to initialize while local dumping completes quickly. To address this, add an additional wait for the network and make a best effort to send the message before rebooting. Signed-off-by: Pingfan Liu <piliu@redhat.com>
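The "additional wait plus best effort" idea described in this commit message can be sketched roughly as follows. This is not the code from the patch: wait_until and best_effort_notify are hypothetical helpers, and the readiness check and send command are passed in as arguments so the sketch makes no assumption about how the real script detects the network or invokes fence_kdump_send.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Polls COMMAND once per second until it succeeds or the timeout expires.
# Returns 0 if COMMAND succeeded, 1 on timeout.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    while ! "$@"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# best_effort_notify TIMEOUT_SECONDS READY_CHECK SEND_COMMAND
# Waits a bounded time for READY_CHECK, then runs SEND_COMMAND once.
# Always returns 0 so that a failed notification never blocks the reboot.
best_effort_notify() {
    limit=$1
    check=$2
    send=$3
    if wait_until "$limit" "$check"; then
        "$send" || echo "fence message could not be sent" >&2
    else
        echo "network not ready after ${limit}s, giving up" >&2
    fi
    return 0
}
```

The bounded timeout is the key design choice here: the wait happens only after the vmcore has been saved, and it gives up after a fixed time so the reboot is delayed by a known maximum.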
@@ -533,10 +533,6 @@ wait_online_network() {
get_host_ip() {
Hi @pfliu, get_host_ip will wait for the network to be ready. Maybe a simpler solution is to modify get_host_ip so that it waits for the network to be ready for fence_kdump? Note that fence_kdump_notify already has code to check whether fence_kdump is in use, so get_host_ip can reuse that code. Btw, I assume fence_kdump_notify is used for sending the fence message, so it would be better to move get_host_ip before it.
@coiby, sorry for the late reply. The assumption here is that saving the vmcore is more important than sending out the fence message.
As we have observed long waits for the network to become ready, I am a little worried that the cluster manager may forcefully reboot the crashed machine during that period. If that happens, the vmcore will not be saved.
But I am open to this option. What is your opinion now?
Thanks
Thanks for the clarification! If I understand correctly, the reason we send out the fence message is precisely to tell the cluster manager not to reboot the machine because we are in the process of dumping the vmcore, right? But I agree there is no need to wait for the network to be ready first, since FENCE_KDUMP_SEND can keep sending messages until it succeeds. So if the purpose of fence kdump is to make sure vmcore dumping is not interrupted, does that mean there is no need to wait for the network to be ready after vmcore dumping has finished?