Skip to content

tower: cron condition evaluator throws ReferenceError: exitCode is not defined #1048

@amrmelsayed

Description

@amrmelsayed

Symptom

Tower log shows the cron evaluator throwing on every tick for tasks whose condition references exitCode:

```
[WARN] Condition evaluation failed for 'Service Health Check': ReferenceError: exitCode is not defined
[WARN] Cron command failed for 'Service Health Check': FAILURES: ...
[INFO] Cron task 'Service Health Check' completed: failure
```

Repeats reliably every cron tick. The task ends up marked failed, so notification logic for failure cases also fires repeatedly, generating noise.

Root cause

packages/codev/src/agent-farm/servers/tower-cron.ts:293-297:

```typescript
export function evaluateCondition(condition: string, output: string): boolean {
// eslint-disable-next-line no-new-func
const fn = new Function('output', 'return ' + condition);
return !!fn(output);
}
```

The function takes only output as a parameter. Any cron task whose condition references additional variables (e.g. exitCode, stderr, durationMs) hits a ReferenceError at evaluation time.

At least one shipped task — Service Health Check — has a condition that references exitCode. The task author reasonably expected that the cron's full execution result (stdout + stderr + exit code) would be in scope, since the cron also captures all of them for logging.

Proposed fix

Two halves, both worth doing:

  1. Widen the evaluator's scope to include exit code and stderr. Update evaluateCondition to accept (condition, { output, stderr, exitCode, durationMs }) and pass them all to the constructed function. Callers in tower-cron.ts:249 pass the full result instead of just trimmed stdout.

  2. Document the available variables in the condition DSL. Update wherever cron-task condition syntax is documented (skeleton's team-update.yaml/cron docs, role docs, README) to enumerate exactly what's in scope so the next task author isn't guessing.

The fix is small (mechanical scope widening + caller update). Estimated < 100 LOC including the test.

Acceptance criteria

  • evaluateCondition accepts an object of { output, stderr, exitCode, durationMs } as the second parameter, exposes all four as variables in scope of the condition expression.
  • The caller in tower-cron.ts:249 passes the full result object.
  • Unit tests cover: existing output-only conditions still work (back-compat), exitCode-based conditions work, stderr-based conditions work, durationMs-based conditions work, references to undeclared variables still throw (and the error is logged with the offending condition string for diagnosability).
  • At least one shipped condition (the Service Health Check example) is verified to evaluate without ReferenceError after the fix.
  • Cron-condition DSL documentation explicitly lists the four in-scope variables.

Suggested protocol

BUGFIX. The fix surface and root cause are both known; the scope is small and mechanical. No architectural decisions, no UI verification needed. Single builder.

Out of scope

  • The wider "Tower terminals go non-responsive over time" hang. Filed as a separate area/tower PIR.
  • Restricting / sandboxing the condition DSL further (currently new Function(...) evaluates arbitrary JS; that's a separate hardening conversation).

Metadata

Metadata

Assignees

No one assigned

    Labels

    area/towerArea: Tower server / agent farm CLI

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions