kdump: Introduce new kdump_emergency option and /etc/kdump/emergency.d directory interface#143
kdump: Introduce new kdump_emergency option and /etc/kdump/emergency.d directory interface#143nishibem wants to merge 9 commits intorhkdump:mainfrom
Conversation
…ing error handling Update kdump.sh to execute binary and script files located under /etc/kdump/emergency.d and the file specified by kdump_emergency option when the error handler is invoked in emergency mode. Running these files during error handling ensures that emergency-specific actions and configuration are applied even when kdump encounters failures prior to dump process. This allows administrators to define custom recovery or logging behavior in error scenarios. Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
…to kdump initramfs Includes files installed under /etc/kdump/emergency.d directory and the file specified by kdump_emergency option into kdump initramfs. Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
Add support for kdump_emergency option to kdump.conf parser. This allows the parser to recognize the option as valid. Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
Consolidate EXTRA_BINS assignments into a single location and remove the CHECK_FILES variable. And also remove the directories from the HOOKS. No functional change. Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
Add update checks for binary and script files located under /etc/kdump/emergency.d directory, and for the file specified by kdump_emergency option. Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
This patch adds descriptions of the kdump_emergency option to the documentation. Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
Replace "All" with "all" to correct grammar. Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
Add a description of the Kdump Emergency Executable to kexec-kdump-howto.txt. This documentation explains the purpose of the Kdump Emergency Executable. Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
Add setup logic to ensure that the kdump/emergency.d directory is properly created and prepared. Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
Summary of ChangesHello @nishibem, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the kdump functionality by introducing a mechanism to execute custom scripts during emergency handling, allowing administrators to perform specific actions before the system shuts down. It adds a new configuration option and directory interface, similar to the existing pre and post kdump mechanisms, to provide greater flexibility in managing emergency scenarios. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new kdump_emergency option and /etc/kdump/emergency.d directory interface, enhancing kdump's emergency handling capabilities. The changes include modifications to the Makefile, kdump.sh, module-setup.sh, gen-kdump-conf.sh, kdump-utils.spec, kdump.conf.5, kdumpctl, and kexec-kdump-howto.txt. The review identified a few areas for improvement, primarily focusing on error handling and code clarity.
| EXTRA_BINS="$EXTRA_BINS $CHECK_FILES" | ||
| CHECK_FILES=${OPT[extra_bins]} | ||
| EXTRA_BINS="$EXTRA_BINS $CHECK_FILES" | ||
| EXTRA_BINS="${OPT[kdump_post]} ${OPT[kdump_pre]} ${OPT[kdump_emergency]} ${OPT[extra_bins]}" |
There was a problem hiding this comment.
The variables OPT[kdump_post], OPT[kdump_pre], OPT[kdump_emergency], and OPT[extra_bins] are concatenated without any separators. This could lead to issues if the values contain spaces or special characters.
| EXTRA_BINS="${OPT[kdump_post]} ${OPT[kdump_pre]} ${OPT[kdump_emergency]} ${OPT[extra_bins]}" | |
| EXTRA_BINS="${OPT[kdump_post]} ${OPT[kdump_pre]} ${OPT[kdump_emergency]} ${OPT[extra_bins]}" |
There was a problem hiding this comment.
The original code did not take into account cases where the values contain spaces or special characters.
Could you share your thoughts on how we should handle this?
| do_kdump_emergency() { | ||
| if [ -d /etc/kdump/emergency.d ]; then | ||
| for file in /etc/kdump/emergency.d/*; do | ||
| "$file" | ||
| _ret=$? | ||
| if [ $_ret -ne 0 ]; then | ||
| derror "$file exited with $_ret status" | ||
| fi | ||
| done | ||
| fi | ||
|
|
||
| if [ -n "$KDUMP_EMERGENCY" ]; then | ||
| "$KDUMP_EMERGENCY" | ||
| _ret=$? | ||
| if [ $_ret -ne 0 ]; then | ||
| derror "$KDUMP_EMERGENCY exited with $_ret status" | ||
| fi | ||
| fi |
There was a problem hiding this comment.
The do_kdump_emergency function does not have any error handling for the directory iteration. If /etc/kdump/emergency.d does not exist or is not a directory, the script will continue without any warning. This could lead to unexpected behavior if the administrator expects the emergency scripts to be executed.
do_kdump_emergency() {
if [ ! -d /etc/kdump/emergency.d ]; then
derror "/etc/kdump/emergency.d is not a directory or does not exist"
return 1 # Indicate failure to the caller
fi
for file in /etc/kdump/emergency.d/*; do
"$file"
_ret=$?
if [ $_ret -ne 0 ]; then
derror "$file exited with $_ret status"
fi
done
if [ -n "$KDUMP_EMERGENCY" ]; then
"$KDUMP_EMERGENCY"
_ret=$?
if [ $_ret -ne 0 ]; then
derror "$KDUMP_EMERGENCY exited with $_ret status"
fi
fi
}There was a problem hiding this comment.
Since this is implemented exactly the same way as pre.d and post.d, if we fix this, we also need to apply the same fix to those as well.
I think it would be better to bundle the changes for pre.d and post.d into a separate pull request.
| elif [[ $file != "/etc/kdump/emergency.d/*" ]]; then | ||
| echo "$file is not executable" |
There was a problem hiding this comment.
The echo statement does not redirect to standard error, so the message will not be visible in logs. This could make it difficult to diagnose issues with the emergency scripts.
| elif [[ $file != "/etc/kdump/emergency.d/*" ]]; then | |
| echo "$file is not executable" | |
| echo "$file is not executable" >&2 |
There was a problem hiding this comment.
This isn’t an emergency-time message; it can be logged during initramfs generation when kdump.service starts.
We’ve confirmed it appears in the journal log under kdump.service, consistent with pre.d/post.d.
|
Hi @nishibem Thanks for creating the PR! I don't see anything wrong with the PR. But the logic of kdump_pre, kdump_post and kdump_emergency is pretty the same. So for the sake of readability and maintainability, I will appreciate it if you can refactor the code so they re-use common functions. If you don't have time, that's fine and I'll merge this first and create a separate PR. |
|
Hi @coiby Thank you for your feedback. You are right that |
prudo1
left a comment
There was a problem hiding this comment.
Hi @nishibem,
all in all the implementation looks good to me. There is only one line in the documentation I find a bit confuting. Please see below.
But I would like to better understand the use case for this feature. Looking at the code in kdump.sh, there aren't that many things that can cause it to fail prior to the dump process. If I'm not mistaken it's basically only get_host_ip. Could you please give a specific example when this feature would have been useful (i.e. what has failed and what additional logging/recovery you would have done).
Furthermore one thing I'm not fully sure myself but think we should at least discuss. The way I see it the new kdump_emergency you propose is a generalized version of failure_action. In the way that it allows to run arbitrary scripts rather than performing pre-defined actions. So I'm asking myself if it makes sense to extend failure_action to allow running scripts/binaries instead rather than introducing the new kdump_emergency. Thing is that this is a user facing change. And I'd like to keep changes to kdump.conf to a minimum.
Thanks
Philipp
| # kdump_emergency <binary | script> | ||
| # - This directive allows you to run a specified executable | ||
| # in emergency mode within the kdump capture kernel, | ||
| # prior to the dump process. |
There was a problem hiding this comment.
I find the prior to the dump process a bit confusing. When kdump.sh end up in the error handler there is no more dump process (unless you specify failure_action dump_to_rootfs). Should that be prior to executing the failure_action.?
Same for the man page.
There was a problem hiding this comment.
As you pointed out, once the kdump capture kernel enters the error handler, the dump process will never be started.
Therefore, I am considering revising the description to state that the executable runs in the emergency mode that the kdump capture kernel transitions into before the dump capture process is executed.
How about wording it as follows?
--- a/kdump.conf.5
+++ b/kdump.conf.5
@@ -133,8 +133,9 @@ than the system's RAM.
.B kdump_emergency <binary | script>
.RS
This directive allows you to run a specified executable
-in emergency mode within the kdump capture kernel,
-prior to the dump process.
+in the emergency mode entered due to an error that occurs
+before the dump collection process is executed in the
+kdump capture kernel.
.PP
All files under /etc/kdump/emergency.d are collectively sorted
and executed in lexical order, before binary or script
|
Hi @prudo1
We are considering scenarios where an issue occurs during the dracut initqueue stage before the kdump collection process starts, resulting in the dump target becoming unavailable. In such cases, kdump‑emergency.service is triggered before the kdump dump process is executed, and the kdump error handler runs instead. Since the existing kdump_post script is not executed in this situation, we believe that the new kdump_emergency directive would be effective and useful.
The failure_action option is currently executed both in the error handler that runs before the kdump capture process starts and in the failure handling that occurs after the process has begun. What we would like to do is to add specific handling exclusively for the error handler that runs before the kdump capture process starts. |
This patch set introduces a new kdump_emergency option along with its
corresponding directory interface, emergency.d.
The design follows the existing kdump_pre and kdump_post mechanisms.
The kdump.sh script is extended to execute binaries and scripts provided
via emergency.d and kdump_emergency configuration values during emergency
handling.
With this enhancement, administrators can run custom logic and
emergency-specific actions during emergency handling before system
shutdown starts.
The series also includes documentation updates that describe the
kdump_emergency interface.