Conversation
ACE scheduler previously required a full restart to pick up any changes
to ace.yaml or pg_service.conf. This adds live reload support via
SIGHUP, satisfying the requirement for database-driven config management
and Kubernetes ConfigMap updates.
Changes:
- pkg/config: add thread-safe Reload/Set/Get API around the global Cfg;
store CfgPath in Init so the reload handler knows where to read from
- internal/cli: replace the flat runner loop in StartSchedulerCLI with
a schedulerReloadLoop that handles SIGHUP independently of SIGTERM/SIGINT:
* on SIGHUP: load new config, validate via dry-run BuildJobsFromConfig,
drain in-flight ops via gocron.Shutdown, then atomically swap config
* on invalid config: log the error and keep the current scheduler running
* API server is started once and unaffected by reloads
- tests: add pkg/config/config_reload_test.go covering Load, Init, Reload,
Set, Get, DefaultCluster and concurrent access; add
internal/cli/sighup_test.go covering shutdown, invalid-config rejection,
valid reload, and multiple rapid SIGHUPs
Note: pg_service.conf is read lazily at task-execution time and is
therefore reloaded implicitly on the next job run after a SIGHUP.
📝 WalkthroughWalkthroughIntroduces a SIGHUP-driven scheduler reload loop that validates and atomically swaps configs with graceful draining, and adds concurrency-safe config APIs (Reload, Set, Get) plus a mutex-protected Cfg/CfgPath for live reloads. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/cli/cli.go (1)
1402-1404:⚠️ Potential issue | 🟠 MajorUse a
config.Get()snapshot instead of directconfig.Cfgaccess.These lines bypass the synchronized accessor pattern introduced for live reloads. Capture one snapshot (
cfg := config.Get()) and use that for nil-checks,canStartAPIServer, andserver.Newin both CLI entry points.Suggested fix
func StartSchedulerCLI(ctx *cli.Context) error { - if config.Cfg == nil { + cfg := config.Get() + if cfg == nil { return fmt.Errorf("configuration not loaded; run inside a directory with ace.yaml or set ACE_CONFIG") } @@ if runAPI { - if ok, apiErr := canStartAPIServer(config.Cfg); ok { - apiServer, err := server.New(config.Cfg) + if ok, apiErr := canStartAPIServer(cfg); ok { + apiServer, err := server.New(cfg) if err != nil { return fmt.Errorf("api server init failed: %w", err) } @@ func StartAPIServerCLI(ctx *cli.Context) error { - if config.Cfg == nil { + cfg := config.Get() + if cfg == nil { return fmt.Errorf("configuration not loaded; run inside a directory with ace.yaml or set ACE_CONFIG") } - if ok, err := canStartAPIServer(config.Cfg); !ok { + if ok, err := canStartAPIServer(cfg); !ok { return err } - apiServer, err := server.New(config.Cfg) + apiServer, err := server.New(cfg) if err != nil { return err }Also applies to: 1433-1435, 1557-1565
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/cli/cli.go` around lines 1402 - 1404, Replace direct uses of config.Cfg with a single snapshot from the synchronized accessor: call cfg := config.Get() at the start of each CLI entry point and use that local cfg for the nil-check, the call to canStartAPIServer(cfg) and for server.New(cfg, ...) instead of config.Cfg; update the other occurrences noted (around the other blocks reported) to follow the same pattern so live-reload synchronization is respected.
🧹 Nitpick comments (2)
internal/cli/cli.go (1)
1521-1524: Preferconfig.Reload("")over directconfig.CfgPathreads.Reading
config.CfgPathdirectly bypasses the synchronized path access logic already encapsulated inReload. CallingReload("")keeps the lock/lookup behavior in one place.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/cli/cli.go` around lines 1521 - 1524, The code calls config.Reload(config.CfgPath) which bypasses the synchronized path lookup inside Reload; change the call to config.Reload("") so Reload uses its internal lock/lookup logic instead of reading config.CfgPath directly. Update the surrounding logic that inspects loadErr (and the logger.Info call) to continue operating unchanged, but ensure any places that assumed a returned path from Reload still work with the Reload("") semantics; specifically modify the invocation in the scheduler SIGHUP handler that currently references config.CfgPath to call config.Reload("") and rely on Reload to resolve the effective path.pkg/config/config.go (1)
116-125: Tighten the config access contract to avoid future unsafe reads.Line 117 currently suggests direct reads of
Cfg, andCfgPathis also publicly readable. That weakens the new synchronization model and makes accidental lock bypasses likely. Prefer documentingGet()/Set()as the only supported access path (and optionally add a getter for path).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/config/config.go` around lines 116 - 125, Update the package-level comment and API so callers do not read Cfg or CfgPath directly: change the doc on the exported vars (Cfg and CfgPath) to state that direct access is forbidden and that callers must use the provided Get()/Set() functions for config access; additionally either make CfgPath unexported or add an explicit getter (e.g., GetCfgPath) and document it as the supported way to obtain the config file path, and ensure cfgMu protection is referenced in the comments for Get/Set.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/cli/cli.go`:
- Around line 1505-1518: The select inside the reloaded loop currently ignores
schedDone so an early exit from scheduler.RunJobs can silently leave the loop
blocked; update the select (the loop that checks runCtx and sighupCh) to also
handle the schedDone channel returned by the goroutine that calls
scheduler.RunJobs(schedCtx, jobs), read the error value, call schedCancel if
needed, and return the error when it is non-nil and not errors.Is(err,
context.Canceled) (mirroring the existing shutdown handling around schedCancel
and <-schedDone) so the main loop properly terminates on scheduler failures.
---
Outside diff comments:
In `@internal/cli/cli.go`:
- Around line 1402-1404: Replace direct uses of config.Cfg with a single
snapshot from the synchronized accessor: call cfg := config.Get() at the start
of each CLI entry point and use that local cfg for the nil-check, the call to
canStartAPIServer(cfg) and for server.New(cfg, ...) instead of config.Cfg;
update the other occurrences noted (around the other blocks reported) to follow
the same pattern so live-reload synchronization is respected.
---
Nitpick comments:
In `@internal/cli/cli.go`:
- Around line 1521-1524: The code calls config.Reload(config.CfgPath) which
bypasses the synchronized path lookup inside Reload; change the call to
config.Reload("") so Reload uses its internal lock/lookup logic instead of
reading config.CfgPath directly. Update the surrounding logic that inspects
loadErr (and the logger.Info call) to continue operating unchanged, but ensure
any places that assumed a returned path from Reload still work with the
Reload("") semantics; specifically modify the invocation in the scheduler SIGHUP
handler that currently references config.CfgPath to call config.Reload("") and
rely on Reload to resolve the effective path.
In `@pkg/config/config.go`:
- Around line 116-125: Update the package-level comment and API so callers do
not read Cfg or CfgPath directly: change the doc on the exported vars (Cfg and
CfgPath) to state that direct access is forbidden and that callers must use the
provided Get()/Set() functions for config access; additionally either make
CfgPath unexported or add an explicit getter (e.g., GetCfgPath) and document it
as the supported way to obtain the config file path, and ensure cfgMu protection
is referenced in the comments for Get/Set.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3641b1bc-6df6-4047-9778-3de107f21c62
📒 Files selected for processing (2)
internal/cli/cli.gopkg/config/config.go
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @danolivo. * #84 (comment) The following files were modified: * `internal/cli/cli.go` * `pkg/config/config.go`
The select inside schedulerReloadLoop only watched runCtx.Done() and sighupCh, leaving schedDone unmonitored. If scheduler.RunJobs returned early on its own (e.g. a gocron internal error), the goroutine would write to schedDone and the inner loop would block indefinitely waiting for a signal that never arrives. Add a schedDone case to the select: call schedCancel for cleanup, then propagate a real error or return nil for a clean/canceled exit, matching the existing shutdown handling pattern.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/cli/cli.go`:
- Around line 1438-1454: The goroutine calling apiServer.Run can fail and
currently only calls stop(), which allows the surrounding function to still
return nil; change this to propagate that runtime error: create an error channel
(e.g., errCh) or shared atomic/variable to capture the non-canceled error from
apiServer.Run inside the goroutine (reference apiServer.Run, logger.Error,
stop()), send the error to errCh when it occurs, and then after waiting for
runCtx.Done()/shutdown check errCh and return the captured error instead of nil;
also apply the same propagation for the API-only branch so failures from
apiServer.Run are returned to the caller rather than masked.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4299f572-ddf1-4649-846d-494562f579e8
📒 Files selected for processing (1)
internal/cli/cli.go
ACE scheduler previously required a full restart to pick up any changes to ace.yaml or pg_service.conf. This adds live reload support via SIGHUP, satisfying the requirement for database-driven config management and Kubernetes ConfigMap updates.
Changes:
Note: pg_service.conf is read lazily at task-execution time and is therefore reloaded implicitly on the next job run after a SIGHUP.