ENT-14108: drain cf-agent in hub preremove.sh before stopping cfengine3 umbrella#2262
Draft
larsewi wants to merge 1 commit into
A cf-agent process spawned by cf-execd can keep running after systemctl stop cf-execd.service, finish a policy run, and then call systemctl start cf-php-fpm.service. cf-php-fpm has Wants=cf-postgres.service, so systemd pulls cf-postgres back in as a dependency. cf-postgres fails to start (its data dir is being torn down), and Restart=always with RestartSec=10 keeps it looping.

The loop continues into the next install's postinst, collides with the postinst-launched postgres, and the postinst's final pg_ctl stop fails with "PID file does not exist". dpkg sees exit 1 and aborts:

    dpkg: error processing package cfengine-nova-hub (--install):
     installed cfengine-nova-hub package post-installation script subprocess returned error exit status 1

Fix: in prerm, stop cf-execd first so no new cf-agent runs spawn, then wait up to 60s for any in-progress cf-agent to drain (SIGKILL any survivor), and only then run the cfengine3 umbrella stop. cf-php-fpm stays up the whole time, so policy passes without re-triggering anything.

Ticket: ENT-14108
Signed-off-by: Lars Erik Wik <lars.erik.wik@northern.tech>
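The ordering described in the fix can be sketched roughly as follows. This is an illustration only, not the code from this PR: the helper names `agent_running` and `drain_process`, the one-second poll interval, the use of `pgrep`/`pkill`, and the `cfengine3.service` unit name for the umbrella stop are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the prerm drain step; helper names, poll interval
# and the use of pgrep/pkill are assumptions, not taken from the PR diff.

# agent_running NAME
# Succeeds while at least one process whose name is exactly NAME exists.
agent_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# drain_process NAME TIMEOUT
# Polls about once per second until no process called NAME remains, or
# until roughly TIMEOUT seconds have passed. Any survivor is sent SIGKILL.
# Returns 0 if the process drained on its own, 1 if a kill was attempted.
drain_process() {
    name=$1
    timeout=$2
    waited=0
    while agent_running "$name"; do
        if [ "$waited" -ge "$timeout" ]; then
            # "|| true" keeps a "no process matched" status from aborting
            # a maintainer script that runs under "set -e".
            pkill -KILL -x "$name" 2>/dev/null || true
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Intended ordering in prerm (commented out so the sketch has no side effects):
#   systemctl stop cf-execd.service      # 1. no new cf-agent runs are spawned
#   drain_process cf-agent 60 || true    # 2. let the in-flight run finish
#   systemctl stop cfengine3.service     # 3. umbrella stop (unit name assumed)
```

Stopping cf-execd before polling matters: if the scheduler were still running, a new cf-agent could start between the last poll and the umbrella stop, reopening the race.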
@cf-bottom Jenkins please :)

Sure, I triggered a build:
Jenkins: https://ci.cfengine.com/job/pr-pipeline/13821/
Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-13821/
Race: prerm stops cf-execd and cf-postgres, but a still-running cf-agent then starts cf-php-fpm, which pulls cf-postgres back in as a dependency (Wants=cf-postgres.service). The start fails (data dir being torn down) and Restart=always keeps it looping into the next install's postinst, eventually clobbering postinst's postgres so the final pg_ctl stop fails and dpkg aborts.

This PR drains in-flight cf-agent in prerm before the umbrella stop, so policy can't re-trigger services during teardown. See the commit message for the full mechanism.