Skip to content

fix(stop): clean up systemd service and data on unit removal#773

Merged
cbartz merged 7 commits intomainfrom
fix/cleanup-systemd-service-on-stop
Mar 31, 2026
Merged

fix(stop): clean up systemd service and data on unit removal#773
cbartz merged 7 commits intomainfrom
fix/cleanup-systemd-service-on-stop

Conversation

@cbartz
Copy link
Copy Markdown
Collaborator

@cbartz cbartz commented Mar 30, 2026

Applicable spec: N/A

Overview

On unit removal (stop hook), disable and remove the per-unit systemd service file, reload the daemon, and clean up the per-unit data directory and log file.

Rationale

On co-located machines (dedicated machine shared by multiple github-runner charms), removing a unit left its systemd service file enabled with Restart=on-failure. Systemd would restart the stale service, and if the binary was still compatible (installed by another charm unit on the same machine), it silently kept managing runners with outdated configuration.

Juju Events Changes

No new events. The existing stop hook handler now calls manager_service.cleanup() instead of manager_service.stop().

Module Changes

  • manager_service.py: Added cleanup() (public) and _remove_unit_data() (private helper). cleanup() stops the service, disables it, removes the service file, reloads systemd, and removes the per-unit data dir and log file.
  • charm.py: _on_stop calls cleanup() instead of stop().

Library Changes

None.

Checklist

On co-located machines with multiple github-runner charms, removing a
unit left its systemd service file enabled. Systemd would restart the
stale service, which — if the binary was still compatible — silently
kept managing runners with outdated configuration.

The stop hook now disables the service, removes the service file,
reloads systemd, and cleans up the per-unit data and log files.
Replace 'per-unit' with 'the unit' and backtick-quote 'systemd'.
@cbartz cbartz requested review from Copilot and removed request for Copilot March 30, 2026 10:38
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the charm’s unit removal (stop hook) behavior to fully clean up per-unit systemd artifacts and on-disk state so co-located units don’t leave behind enabled/restarting stale services.

Changes:

  • Add manager_service.cleanup() to stop, disable, remove the unit systemd service file, reload systemd, and remove per-unit data/logs.
  • Switch the charm’s _on_stop handler to call cleanup() instead of stop().
  • Add unit tests for cleanup(), and bump manager version + changelog entry.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/manager_service.py Introduces cleanup() and per-unit data removal logic.
src/charm.py Uses cleanup() during the stop hook.
tests/unit/test_manager_service.py Adds coverage for cleanup success/idempotency/systemd error.
tests/unit/test_charm.py Updates stop-hook assertion to expect cleanup().
github-runner-manager/pyproject.toml Bumps package version to 0.17.1.
docs/changelog.md Records the unit-removal cleanup fix.

cbartz added 2 commits March 30, 2026 12:53
Catch OSError from service file unlink in cleanup(), and replace
shutil.rmtree(ignore_errors=True) with explicit FileNotFoundError
handling so permission errors surface as RunnerManagerApplicationStopError.
Distinguish cleanup errors from stop errors in log output to make
diagnosing unit removal failures easier.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Skip service_disable when the service file is already absent to ensure
idempotent cleanup. Wrap log file unlink in OSError handling for
consistent error reporting.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Reuse instance_service to build the service file path, removing
duplicated naming logic. Rename charm test to reflect cleanup behavior.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Call service_disable regardless of service file presence to clean up
dangling enablement symlinks from partial manual cleanup. Treat
SystemdError from disable as non-fatal (log warning), matching the
pattern in _disable_legacy_service().
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

@cbartz cbartz marked this pull request as ready for review March 30, 2026 13:17
@cbartz cbartz added the bug Something isn't working label Mar 31, 2026
@cbartz cbartz merged commit 54d811a into main Mar 31, 2026
93 of 99 checks passed
@cbartz cbartz deleted the fix/cleanup-systemd-service-on-stop branch March 31, 2026 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Libraries: OK

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants