feat: reload configuration on SIGHUP without pod restart by danolivo · Pull Request #84 · pgEdge/ace

danolivo · 2026-03-05T09:14:51Z

ACE scheduler previously required a full restart to pick up any changes to ace.yaml or pg_service.conf. This adds live reload support via SIGHUP, satisfying the requirement for database-driven config management and Kubernetes ConfigMap updates.

Changes:

pkg/config: add thread-safe Reload/Set/Get API around the global Cfg; store CfgPath in Init so the reload handler knows where to read from
internal/cli: replace the flat runner loop in StartSchedulerCLI with a schedulerReloadLoop that handles SIGHUP independently of SIGTERM/SIGINT:
- on SIGHUP: load new config, validate via dry-run BuildJobsFromConfig, drain in-flight ops via gocron.Shutdown, then atomically swap config
- on invalid config: log the error and keep the current scheduler running
- API server is started once and unaffected by reloads
tests: add pkg/config/config_reload_test.go covering Load, Init, Reload, Set, Get, DefaultCluster and concurrent access; add internal/cli/sighup_test.go covering shutdown, invalid-config rejection, valid reload, and multiple rapid SIGHUPs

Note: pg_service.conf is read lazily at task-execution time and is therefore reloaded implicitly on the next job run after a SIGHUP.

ACE scheduler previously required a full restart to pick up any changes to ace.yaml or pg_service.conf. This adds live reload support via SIGHUP, satisfying the requirement for database-driven config management and Kubernetes ConfigMap updates. Changes: - pkg/config: add thread-safe Reload/Set/Get API around the global Cfg; store CfgPath in Init so the reload handler knows where to read from - internal/cli: replace the flat runner loop in StartSchedulerCLI with a schedulerReloadLoop that handles SIGHUP independently of SIGTERM/SIGINT: * on SIGHUP: load new config, validate via dry-run BuildJobsFromConfig, drain in-flight ops via gocron.Shutdown, then atomically swap config * on invalid config: log the error and keep the current scheduler running * API server is started once and unaffected by reloads - tests: add pkg/config/config_reload_test.go covering Load, Init, Reload, Set, Get, DefaultCluster and concurrent access; add internal/cli/sighup_test.go covering shutdown, invalid-config rejection, valid reload, and multiple rapid SIGHUPs Note: pg_service.conf is read lazily at task-execution time and is therefore reloaded implicitly on the next job run after a SIGHUP.

coderabbitai · 2026-03-05T09:15:11Z

📝 Walkthrough

Walkthrough

Introduces a SIGHUP-driven scheduler reload loop that validates and atomically swaps configs with graceful draining, and adds concurrency-safe config APIs (Reload, Set, Get) plus a mutex-protected Cfg/CfgPath for live reloads.

Changes

Cohort / File(s)	Summary
Scheduler reload mechanism `internal/cli/cli.go`	Adds `schedulerReloadLoop(runCtx context.Context, sighupCh <-chan os.Signal) error` and rewrites CLI startup to run the API server in a separate goroutine; replaces one-shot scheduler start with a SIGHUP-driven reload loop that dry-runs new configs, drains in-flight jobs, and atomically swaps active config.
Concurrency-safe configuration `pkg/config/config.go`	Introduces package-level RWMutex, `CfgPath` and exported `Cfg` accessors. Adds `Reload(path string) (Config,error)`, `Set(c Config)`, and `Get() Config`. Converts `Load()` to return parsed `Config` without mutating global state and updates callers (e.g., `DefaultCluster`) to use `Get()`.

Poem

🐇 I sniff the SIGHUP on the breeze,
I hop, I pause, I check with ease,
Locks held tight, old jobs drain slow,
New config blooms — now off I go! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and accurately describes the main change: adding live configuration reload support on SIGHUP without requiring pod restart, which is the primary objective of this pull request.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, explaining the problem, solution, implementation details, and noting implicit behavior with pg_service.conf.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ace-160

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

internal/cli/cli.go (1)

1402-1404: ⚠️ Potential issue | 🟠 Major

Use a config.Get() snapshot instead of direct config.Cfg access.

These lines bypass the synchronized accessor pattern introduced for live reloads. Capture one snapshot (cfg := config.Get()) and use that for nil-checks, canStartAPIServer, and server.New in both CLI entry points.

Suggested fix

 func StartSchedulerCLI(ctx *cli.Context) error {
-	if config.Cfg == nil {
+	cfg := config.Get()
+	if cfg == nil {
 		return fmt.Errorf("configuration not loaded; run inside a directory with ace.yaml or set ACE_CONFIG")
 	}
@@
 	if runAPI {
-		if ok, apiErr := canStartAPIServer(config.Cfg); ok {
-			apiServer, err := server.New(config.Cfg)
+		if ok, apiErr := canStartAPIServer(cfg); ok {
+			apiServer, err := server.New(cfg)
 			if err != nil {
 				return fmt.Errorf("api server init failed: %w", err)
 			}
@@
 func StartAPIServerCLI(ctx *cli.Context) error {
-	if config.Cfg == nil {
+	cfg := config.Get()
+	if cfg == nil {
 		return fmt.Errorf("configuration not loaded; run inside a directory with ace.yaml or set ACE_CONFIG")
 	}
 
-	if ok, err := canStartAPIServer(config.Cfg); !ok {
+	if ok, err := canStartAPIServer(cfg); !ok {
 		return err
 	}
 
-	apiServer, err := server.New(config.Cfg)
+	apiServer, err := server.New(cfg)
 	if err != nil {
 		return err
 	}

Also applies to: 1433-1435, 1557-1565

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/cli/cli.go` around lines 1402 - 1404, Replace direct uses of
config.Cfg with a single snapshot from the synchronized accessor: call cfg :=
config.Get() at the start of each CLI entry point and use that local cfg for the
nil-check, the call to canStartAPIServer(cfg) and for server.New(cfg, ...)
instead of config.Cfg; update the other occurrences noted (around the other
blocks reported) to follow the same pattern so live-reload synchronization is
respected.

🧹 Nitpick comments (2)

internal/cli/cli.go (1)
1521-1524: Prefer config.Reload("") over direct config.CfgPath reads.

Reading config.CfgPath directly bypasses the synchronized path access logic already encapsulated in Reload. Calling Reload("") keeps the lock/lookup behavior in one place.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cli/cli.go` around lines 1521 - 1524, The code calls
config.Reload(config.CfgPath) which bypasses the synchronized path lookup inside
Reload; change the call to config.Reload("") so Reload uses its internal
lock/lookup logic instead of reading config.CfgPath directly. Update the
surrounding logic that inspects loadErr (and the logger.Info call) to continue
operating unchanged, but ensure any places that assumed a returned path from
Reload still work with the Reload("") semantics; specifically modify the
invocation in the scheduler SIGHUP handler that currently references
config.CfgPath to call config.Reload("") and rely on Reload to resolve the
effective path.
pkg/config/config.go (1)
116-125: Tighten the config access contract to avoid future unsafe reads.

Line 117 currently suggests direct reads of Cfg, and CfgPath is also publicly readable. That weakens the new synchronization model and makes accidental lock bypasses likely. Prefer documenting Get()/Set() as the only supported access path (and optionally add a getter for path).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/config.go` around lines 116 - 125, Update the package-level
comment and API so callers do not read Cfg or CfgPath directly: change the doc
on the exported vars (Cfg and CfgPath) to state that direct access is forbidden
and that callers must use the provided Get()/Set() functions for config access;
additionally either make CfgPath unexported or add an explicit getter (e.g.,
GetCfgPath) and document it as the supported way to obtain the config file path,
and ensure cfgMu protection is referenced in the comments for Get/Set.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/cli/cli.go`:
- Around line 1505-1518: The select inside the reloaded loop currently ignores
schedDone so an early exit from scheduler.RunJobs can silently leave the loop
blocked; update the select (the loop that checks runCtx and sighupCh) to also
handle the schedDone channel returned by the goroutine that calls
scheduler.RunJobs(schedCtx, jobs), read the error value, call schedCancel if
needed, and return the error when it is non-nil and not errors.Is(err,
context.Canceled) (mirroring the existing shutdown handling around schedCancel
and <-schedDone) so the main loop properly terminates on scheduler failures.

---

Outside diff comments:
In `@internal/cli/cli.go`:
- Around line 1402-1404: Replace direct uses of config.Cfg with a single
snapshot from the synchronized accessor: call cfg := config.Get() at the start
of each CLI entry point and use that local cfg for the nil-check, the call to
canStartAPIServer(cfg) and for server.New(cfg, ...) instead of config.Cfg;
update the other occurrences noted (around the other blocks reported) to follow
the same pattern so live-reload synchronization is respected.

---

Nitpick comments:
In `@internal/cli/cli.go`:
- Around line 1521-1524: The code calls config.Reload(config.CfgPath) which
bypasses the synchronized path lookup inside Reload; change the call to
config.Reload("") so Reload uses its internal lock/lookup logic instead of
reading config.CfgPath directly. Update the surrounding logic that inspects
loadErr (and the logger.Info call) to continue operating unchanged, but ensure
any places that assumed a returned path from Reload still work with the
Reload("") semantics; specifically modify the invocation in the scheduler SIGHUP
handler that currently references config.CfgPath to call config.Reload("") and
rely on Reload to resolve the effective path.

In `@pkg/config/config.go`:
- Around line 116-125: Update the package-level comment and API so callers do
not read Cfg or CfgPath directly: change the doc on the exported vars (Cfg and
CfgPath) to state that direct access is forbidden and that callers must use the
provided Get()/Set() functions for config access; additionally either make
CfgPath unexported or add an explicit getter (e.g., GetCfgPath) and document it
as the supported way to obtain the config file path, and ensure cfgMu protection
is referenced in the comments for Get/Set.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3641b1bc-6df6-4047-9778-3de107f21c62

📥 Commits

Reviewing files that changed from the base of the PR and between fac0d31 and d240478.

📒 Files selected for processing (2)

internal/cli/cli.go
pkg/config/config.go

internal/cli/cli.go

coderabbitai · 2026-03-05T09:23:01Z

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #85

@danolivo

Docstrings generation was requested by @danolivo. * #84 (comment) The following files were modified: * `internal/cli/cli.go` * `pkg/config/config.go`

The select inside schedulerReloadLoop only watched runCtx.Done() and sighupCh, leaving schedDone unmonitored. If scheduler.RunJobs returned early on its own (e.g. a gocron internal error), the goroutine would write to schedDone and the inner loop would block indefinitely waiting for a signal that never arrives. Add a schedDone case to the select: call schedCancel for cleanup, then propagate a real error or return nil for a clean/canceled exit, matching the existing shutdown handling pattern.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/cli/cli.go`:
- Around line 1438-1454: The goroutine calling apiServer.Run can fail and
currently only calls stop(), which allows the surrounding function to still
return nil; change this to propagate that runtime error: create an error channel
(e.g., errCh) or shared atomic/variable to capture the non-canceled error from
apiServer.Run inside the goroutine (reference apiServer.Run, logger.Error,
stop()), send the error to errCh when it occurs, and then after waiting for
runCtx.Done()/shutdown check errCh and return the captured error instead of nil;
also apply the same propagation for the API-only branch so failures from
apiServer.Run are returned to the caller rather than masked.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4299f572-ddf1-4649-846d-494562f579e8

📥 Commits

Reviewing files that changed from the base of the PR and between d240478 and 848d154.

📒 Files selected for processing (1)

internal/cli/cli.go

internal/cli/cli.go

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

internal/cli/cli.go Show resolved Hide resolved

coderabbitai bot added a commit that referenced this pull request Mar 5, 2026

📝 Add docstrings to ace-160

6be6182

Docstrings generation was requested by @danolivo. * #84 (comment) The following files were modified: * `internal/cli/cli.go` * `pkg/config/config.go`

coderabbitai bot mentioned this pull request Mar 5, 2026

📝 Add docstrings to ace-160 #85

Closed

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

internal/cli/cli.go Show resolved Hide resolved

mason-sharp requested a review from ibrarahmad March 6, 2026 01:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: reload configuration on SIGHUP without pod restart#84

feat: reload configuration on SIGHUP without pod restart#84
danolivo wants to merge 2 commits intomainfrom
ace-160

danolivo commented Mar 5, 2026

Uh oh!

coderabbitai bot commented Mar 5, 2026 •

edited

Loading

Walkthrough

Changes

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot commented Mar 5, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

danolivo commented Mar 5, 2026

Uh oh!

coderabbitai bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Mar 5, 2026 •

edited

Loading

coderabbitai bot commented Mar 5, 2026 •

edited

Loading