diff --git a/CHANGELOG.md b/CHANGELOG.md index 5defe20..d387d39 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,19 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), ## [Unreleased] +### Breaking Changes + +- **`--show-reasoning` / `HYPERAGENT_SHOW_REASONING` removed** — the flag was misnamed (it controls reasoning effort, not the display of reasoning). Use `--reasoning-effort [level]` / `HYPERAGENT_REASONING_EFFORT` instead. Same accepted levels (`low` / `medium` / `high` / `xhigh`, default `high`), same wiring into the Copilot SDK session — the old name is gone, no silent fallback. + +### Added + +- **`--very-verbose` / `-vv` / `HYPERAGENT_VERY_VERBOSE`** — extends `--verbose` so the full result body is printed for **every** tool (audit progress, plugin enable/disable, module registration, intent reports, handler registration, …), not just the sandbox tools. `--verbose` on its own keeps the leaner default (sandbox tool bodies only; one-line `✅ Done` for everything else). +- **`--base-dir ` / `HYPERAGENT_BASE_DIR`** — auto-enables both the `fs-read` and `fs-write` plugins at startup with the supplied directory as their `baseDir`. The directory is created if missing and symlinks are still rejected. Independent of `--auto-approve` — the flag itself is the approval signal for the two first-party path-jailed plugins. + +### Fixed + +- **`--verbose` ignored non-sandbox tool result bodies** — the event handler returned early for anything other than `execute_javascript` / `execute_bash`, so `plugin_info`, `module_info`, `report_intent`, `register_handler`, and friends always rendered as a terse `✅ Done` even in verbose mode. The early-return is gone; `--verbose` now prints sandbox tool bodies and `--very-verbose` prints every tool body. 
+ ## [v0.6.1] - 2026-05-15 ### Fixed diff --git a/docs/USAGE.md b/docs/USAGE.md index f0fa8d1..4b58d3a 100644 --- a/docs/USAGE.md +++ b/docs/USAGE.md @@ -6,31 +6,33 @@ Complete reference for HyperAgent configuration, features, and commands. | Flag | Description | | -------------------------- | -------------------------------------------------------------- | -| `--model ` | LLM model (default: `claude-opus-4.6`) | -| `--cpu-timeout ` | CPU time limit per JS execution (default: 1000) | -| `--wall-timeout ` | Wall-clock backstop per execution (default: 5000) | -| `--send-timeout ` | Agent inactivity timeout (default: 300000) | -| `--heap-size ` | Guest heap size (default: 16) | -| `--scratch-size ` | Guest scratch size, includes stack (default: 16) | -| `--profile ` | Apply resource profile at startup (stackable) | -| `--skill ` | Invoke skill(s) before the prompt | -| `--auto-approve` | Auto-approve all interactive prompts | -| `--prompt ""` | Non-interactive: send prompt, wait for completion, exit | -| `--prompt-file ` | Read the non-interactive prompt from a file | -| `--show-code` | Log generated JS to a timestamped file | -| `--show-timing` | Log timing breakdown to a timestamped file | -| `--show-reasoning [level]` | Set reasoning effort (low\|medium\|high\|xhigh, default: high) | -| `--verbose` | Verbose output mode (scrolling reasoning, turn details) | -| `--transcript` | Record session transcript to `~/.hyperagent/logs/` | -| `--tune` | Capture LLM decision/reasoning logs to JSONL | -| `--plugins-dir ` | Custom plugins directory (default: `./plugins`) | -| `--list-models` | List available models and exit | -| `--resume [id]` | Resume a previous session (latest if no ID given) | -| `--skip-suggest` | Skip mandatory suggest_approach/API-discovery enforcement | -| `--output-threshold ` | Large output threshold in bytes (default: 20480) | -| `--debug` | Enable debug event/lifecycle logging | -| `--version` | Show version and exit | -| `--help` | Show help message 
| +| `--model ` | LLM model (default: `claude-opus-4.6`) | +| `--cpu-timeout ` | CPU time limit per JS execution (default: 1000) | +| `--wall-timeout ` | Wall-clock backstop per execution (default: 5000) | +| `--send-timeout ` | Agent inactivity timeout (default: 300000) | +| `--heap-size ` | Guest heap size (default: 16) | +| `--scratch-size ` | Guest scratch size, includes stack (default: 16) | +| `--profile ` | Apply resource profile at startup (stackable) | +| `--skill ` | Invoke skill(s) before the prompt | +| `--auto-approve` / `--yolo` | Auto-approve all interactive prompts | +| `--base-dir ` | Auto-enable fs-read + fs-write with this directory as their base | +| `--prompt ""` | Non-interactive: send prompt, wait for completion, exit | +| `--prompt-file ` | Read the non-interactive prompt from a file | +| `--show-code` | Log generated JS to a timestamped file | +| `--show-timing` | Log timing breakdown to a timestamped file | +| `--reasoning-effort [level]` | Set reasoning effort (low\|medium\|high\|xhigh, default: high) | +| `--verbose` | Stream reasoning + show sandbox tool result bodies | +| `--very-verbose` / `-vv` | Like `--verbose` plus full result bodies for **every** tool (audit/registration/…) | +| `--transcript` | Record session transcript to `~/.hyperagent/logs/` | +| `--tune` | Capture LLM decision/reasoning logs to JSONL | +| `--plugins-dir ` | Custom plugins directory (default: `./plugins`) | +| `--list-models` | List available models and exit | +| `--resume [id]` | Resume a previous session (latest if no ID given) | +| `--skip-suggest` | Skip mandatory suggest_approach/API-discovery enforcement | +| `--output-threshold ` | Large output threshold in bytes (default: 20480) | +| `--debug` | Enable debug event/lifecycle logging | +| `--version` | Show version and exit | +| `--help` | Show help message | ## Environment Variables @@ -52,8 +54,10 @@ All configuration is also available via environment variables (overridden by CLI | 
`HYPERAGENT_PROMPT_FILE` | _(none)_ | File containing the non-interactive prompt | | `HYPERAGENT_SKILL` | _(none)_ | Skill name(s) to invoke | | `HYPERAGENT_TUNE` | _(none)_ | Set to `1` to capture LLM decision logs | -| `HYPERAGENT_SHOW_REASONING` | _(none)_ | Reasoning effort level (low/medium/high/xhigh) | +| `HYPERAGENT_REASONING_EFFORT` | _(none)_ | Reasoning effort level (low/medium/high/xhigh) | | `HYPERAGENT_VERBOSE` | _(none)_ | Set to `1` for verbose output mode | +| `HYPERAGENT_VERY_VERBOSE` | _(none)_ | Set to `1` for very-verbose output (full body for **every** tool) | +| `HYPERAGENT_BASE_DIR` | _(none)_ | Base directory for fs-read/fs-write plugins (auto-enables both) | | `HYPERAGENT_LIST_MODELS` | _(none)_ | Set to `1` to list models and exit | | `HYPERAGENT_RESUME_SESSION` | _(none)_ | Session ID to resume, or `__last__` for latest | | `HYPERAGENT_PLUGINS_DIR` | _(none)_ | Custom plugins directory path | diff --git a/src/agent/cli-parser.ts b/src/agent/cli-parser.ts index 301496d..90869b4 100644 --- a/src/agent/cli-parser.ts +++ b/src/agent/cli-parser.ts @@ -10,6 +10,38 @@ import { readFileSync } from "node:fs"; import type { MCPSetupCommand } from "./mcp/setup-commands.js"; +/** + * Reasoning-effort levels accepted by the Copilot SDK session config. + * The CLI flag handler and the env-var initialiser both validate against + * this list so an unexpected value (e.g. `HYPERAGENT_REASONING_EFFORT=potato`) + * is rejected before it can reach the SDK and cause a runtime error. + */ +const VALID_REASONING_EFFORTS = ["low", "medium", "high", "xhigh"] as const; + +/** + * Normalise a reasoning-effort string from env or CLI: lowercase, then + * accept only values in {@link VALID_REASONING_EFFORTS}. Anything else + * (including `undefined`/empty) maps to `""` which the agent treats as + * "unset" — the SDK falls back to the model's default. 
+ */ +function normaliseReasoningEffort(raw: string | undefined): string { + if (!raw) return ""; + const lower = raw.toLowerCase(); + return (VALID_REASONING_EFFORTS as readonly string[]).includes(lower) + ? lower + : ""; +} + +/** + * Trim a base-dir string from env or CLI. A value of `" "` (whitespace-only) + * becomes `""`, which the agent treats as "unset" and skips the fs-read / + * fs-write auto-enable block — avoiding the surprising failure mode where + * `resolve("")` returns `process.cwd()` and silently makes CWD the sandbox. + */ +function normaliseBaseDir(raw: string | undefined): string { + return raw?.trim() ?? ""; +} + export interface CliConfig { model: string; cpuTimeout: string; @@ -19,8 +51,23 @@ export interface CliConfig { scratchSize: string; showCode: boolean; showTiming: boolean; - showReasoning: string; + /** + * Reasoning effort level requested for the session, one of + * "low" | "medium" | "high" | "xhigh" — or "" when unset. + * Wired to `state.reasoningEffort` at startup. CLI: `--reasoning-effort ` + * (default: "high" when flag given without value). Env: HYPERAGENT_REASONING_EFFORT. + */ + reasoningEffort: string; verbose: boolean; + /** + * Very-verbose output: show full result bodies for *all* tools, including + * non-sandbox protocol tools (plugin_info, module_info, register_handler, + * suggest_approach, etc.). Plain `--verbose` only shows full bodies for + * `execute_javascript` / `execute_bash`; `--very-verbose` adds the rest. + * CLI: `--very-verbose` or `-vv`. Env: HYPERAGENT_VERY_VERBOSE. + * Implies `--verbose`; standalone `-vv` enables both. + */ + veryVerbose: boolean; /** Render LLM markdown output with ANSI formatting (headings, code, lists). */ markdown: boolean; transcript: boolean; @@ -40,6 +87,15 @@ export interface CliConfig { * audit approvals, module registration). YOLO mode. 🎸 */ autoApprove: boolean; + /** + * Base directory for the fs-read and fs-write plugins. 
When set, both + * plugins are auto-enabled at startup with this directory as their + * `baseDir` config, replacing the default "unique-temp-dir" sandbox. + * CLI: `--base-dir `. Env: HYPERAGENT_BASE_DIR. + * Independent of `--auto-approve` / `--yolo` (works on its own). + * Path is resolved relative to cwd; created if missing; symlinks rejected. + */ + baseDir: string; /** * Non-interactive prompt — send this message, wait for completion, exit. * Combines with --auto-approve for fully autonomous operation. @@ -102,8 +158,12 @@ Options: --scratch-size Guest scratch size (default: ${defaults.scratchSize}) --show-code Log generated JS to ~/.hyperagent/logs/ --show-timing Log timing breakdown to ~/.hyperagent/logs/ - --show-reasoning [level] Set reasoning effort (low|medium|high|xhigh, default: high) - --verbose Verbose output mode (scrolling reasoning, turn details) + --reasoning-effort [level] Set reasoning effort (low|medium|high|xhigh, default: high) + Env: HYPERAGENT_REASONING_EFFORT + --verbose Stream reasoning + show sandbox tool result bodies + --very-verbose, -vv Like --verbose, plus show full bodies for ALL tools + (including plugin_info, module_info, register_handler, etc.) 
+ Env: HYPERAGENT_VERY_VERBOSE --[no-]markdown Toggle markdown rendering (default: on, env: HYPERAGENT_MARKDOWN) Aliases: --md, --no-md --transcript Record session transcript to ~/.hyperagent/logs/ @@ -115,7 +175,10 @@ Options: --profile Apply resource profile at startup (limits only) Stack: --profile "web-research heavy-compute" Profiles: default, file-builder, web-research, heavy-compute, mcp-network - --auto-approve Auto-approve all interactive prompts (YOLO mode) + --auto-approve, --yolo Auto-approve all interactive prompts (YOLO mode) + --base-dir Base dir for fs-read + fs-write (auto-enables both plugins, + created if missing, symlinks rejected) + Env: HYPERAGENT_BASE_DIR --prompt "" Send a prompt non-interactively and exit after completion --prompt-file Read prompt from a file (avoids shell quoting issues) --skill Invoke skill(s) before the prompt (e.g. --skill pptx-expert) @@ -151,7 +214,10 @@ Environment variables (overridden by CLI flags): HYPERLIGHT_HEAP_SIZE_MB Heap size (megabytes) HYPERLIGHT_SCRATCH_SIZE_MB Scratch size (megabytes) HYPERAGENT_DEBUG Set to '1' for debug logging + HYPERAGENT_REASONING_EFFORT Reasoning effort level (low/medium/high/xhigh) HYPERAGENT_VERBOSE Set to '1' for verbose output mode + HYPERAGENT_VERY_VERBOSE Set to '1' for very-verbose output (all tool bodies) + HYPERAGENT_BASE_DIR Base dir for fs-read + fs-write plugins HYPERAGENT_PROFILE Profile name(s) to apply at startup HYPERAGENT_PROMPT Non-interactive prompt text HYPERAGENT_PROMPT_FILE Path to file containing prompt text @@ -180,8 +246,18 @@ export function parseCliArgs( scratchSize: process.env.HYPERLIGHT_SCRATCH_SIZE_MB || "16", showCode: false, showTiming: false, - showReasoning: process.env.HYPERAGENT_SHOW_REASONING || "", - verbose: process.env.HYPERAGENT_VERBOSE === "1", + reasoningEffort: normaliseReasoningEffort( + process.env.HYPERAGENT_REASONING_EFFORT, + ), + // HYPERAGENT_VERY_VERBOSE implies HYPERAGENT_VERBOSE — keeps the env-var + // path symmetric with the 
CLI flag (--very-verbose implies --verbose). + // Without this, env-var-only --very-verbose would set `veryVerbose=true` + // but `verbose=false`, and the event-handler gate (`verboseOutput && ...`) + // would silently suppress all tool bodies. + verbose: + process.env.HYPERAGENT_VERBOSE === "1" || + process.env.HYPERAGENT_VERY_VERBOSE === "1", + veryVerbose: process.env.HYPERAGENT_VERY_VERBOSE === "1", markdown: process.env.HYPERAGENT_MARKDOWN !== "0", transcript: process.env.HYPERAGENT_TRANSCRIPT === "1", listModels: process.env.HYPERAGENT_LIST_MODELS === "1", @@ -191,6 +267,7 @@ export function parseCliArgs( tune: process.env.HYPERAGENT_TUNE === "1", profile: process.env.HYPERAGENT_PROFILE || "", autoApprove: process.env.HYPERAGENT_AUTO_APPROVE === "1", + baseDir: normaliseBaseDir(process.env.HYPERAGENT_BASE_DIR), prompt: process.env.HYPERAGENT_PROMPT || "", promptFile: process.env.HYPERAGENT_PROMPT_FILE || "", skill: process.env.HYPERAGENT_SKILL || "", @@ -251,22 +328,34 @@ export function parseCliArgs( case "--show-timing": config.showTiming = true; break; - case "--show-reasoning": { - // --show-reasoning can optionally take an effort level argument + case "--reasoning-effort": { + // --reasoning-effort can optionally take an effort level argument const nextArg = argv[i + 1]; - const validEfforts = ["low", "medium", "high", "xhigh"]; - if (nextArg && validEfforts.includes(nextArg.toLowerCase())) { - config.showReasoning = nextArg.toLowerCase(); + if ( + nextArg && + (VALID_REASONING_EFFORTS as readonly string[]).includes( + nextArg.toLowerCase(), + ) + ) { + config.reasoningEffort = nextArg.toLowerCase(); i++; } else { // No argument or invalid → default to "high" - config.showReasoning = "high"; + config.reasoningEffort = "high"; } break; } case "--verbose": config.verbose = true; break; + case "--very-verbose": + case "-vv": + // --very-verbose implies --verbose: standalone -vv enables both. 
+ // Plain --verbose without --very-verbose only shows full bodies for + // sandbox tools (execute_javascript / execute_bash). + config.verbose = true; + config.veryVerbose = true; + break; case "--no-markdown": case "--no-md": config.markdown = false; @@ -316,6 +405,18 @@ export function parseCliArgs( case "--yolo": config.autoApprove = true; break; + case "--base-dir": { + // Trim at parse-time: --base-dir " " should be rejected, not + // silently resolved to process.cwd() later via resolve(""). + const raw = argv[++i] ?? ""; + const trimmed = raw.trim(); + if (!trimmed) { + console.error("--base-dir requires a non-empty path"); + process.exit(1); + } + config.baseDir = trimmed; + break; + } case "--prompt": config.prompt = argv[++i] ?? ""; if (!config.prompt) { diff --git a/src/agent/commands.ts b/src/agent/commands.ts index ac697d1..cefcdcf 100644 --- a/src/agent/commands.ts +++ b/src/agent/commands.ts @@ -70,7 +70,7 @@ const COMMANDS: readonly CommandEntry[] = Object.freeze([ "\n" + "Levels: low | medium | high | xhigh\n" + "\n" + - "Also: --show-reasoning [level] CLI flag.", + "Also: --reasoning-effort [level] CLI flag.", }, { completion: "/reasoning audit ", diff --git a/src/agent/event-handler.ts b/src/agent/event-handler.ts index 3e7319f..72ff1cf 100644 --- a/src/agent/event-handler.ts +++ b/src/agent/event-handler.ts @@ -309,22 +309,30 @@ export function registerEventHandler( : "unknown"; if (callId) pendingTools.delete(callId); - // Skip noisy protocol tools in non-debug mode + // ── Result-body gating ───────────────────────────────────── + // Sandbox tools (execute_javascript / execute_bash) are the + // LLM's primary work — `--verbose` is enough to see their + // full output. Non-sandbox tools (plugin_info, module_info, + // register_handler, suggest_approach, etc.) are protocol / + // infrastructure with frequently-huge JSON payloads — gate + // their bodies behind `--very-verbose` so plain `--verbose` + // stays readable. 
Terse `✅ Done` / `❌ error` lines are + // emitted for every tool in every mode, so the user always + // sees that the tool completed. const isSandboxTool = toolName === "execute_javascript" || toolName === "execute_bash"; - if (!isSandboxTool) { - if (state.debugEnabled) { - const status = event.data?.success ? "✅" : "❌"; - debugLog(`${status} ${toolName} complete`); - } - break; + const showFullBody = + state.verboseOutput && (isSandboxTool || state.veryVerboseOutput); + if (state.debugEnabled) { + const status = event.data?.success ? "✅" : "❌"; + debugLog(`${status} ${toolName} complete`); } - // Show result summary for our sandbox tools + // Show result summary if (event.data?.success) { // In non-verbose mode, still show errors — but skip verbose // result display since the LLM will summarise for the user. - if (!state.verboseOutput) { + if (!showFullBody) { const content = event.data?.result?.content ?? ""; let parsed; try { diff --git a/src/agent/index.ts b/src/agent/index.ts index af5bf6b..3e12bb6 100644 --- a/src/agent/index.ts +++ b/src/agent/index.ts @@ -294,6 +294,7 @@ if (cli.profile) { } if (cli.verbose) process.env.HYPERAGENT_VERBOSE = "1"; +if (cli.veryVerbose) process.env.HYPERAGENT_VERY_VERBOSE = "1"; if (cli.debug) process.env.HYPERAGENT_DEBUG = "1"; // Conditionally allow the tuning tool through the gate @@ -312,6 +313,7 @@ import { copyFileSync, unlinkSync, rmSync, + lstatSync, type WriteStream, } from "node:fs"; @@ -780,6 +782,112 @@ if (discoveredCount > 0 && cli.verbose) { } } +// ── Apply --base-dir to fs-read + fs-write at startup ──────────────── +// When --base-dir is set, both plugins are auto-audited (LOW risk — +// they are first-party path-jailed builtins), auto-approved (the user +// supplied the CLI flag, which IS the approval signal), configured +// with the supplied path as `baseDir`, and enabled. Per-plugin failures +// log a warning and skip rather than aborting startup. 
+if (cli.baseDir) { + // cli.baseDir was already trimmed in the parser. + const resolvedBaseDir = resolve(cli.baseDir); + + // Up-front validation of the supplied directory: create it if missing, + // reject if it's a symlink (consistent with the fs-read/fs-write runtime + // no-symlink policy), reject if it's not a directory at all. If any of + // these checks fail we skip the whole auto-enable block rather than + // printing a misleading "Enabled" line for a baseDir that won't work. + let baseDirOk = true; + try { + const stat = lstatSync(resolvedBaseDir); + if (stat.isSymbolicLink()) { + console.warn( + ` ${C.warn("⚠️")} --base-dir: "${resolvedBaseDir}" is a symlink — symlinks are rejected for security. Skipping fs-read/fs-write auto-enable.`, + ); + baseDirOk = false; + } else if (!stat.isDirectory()) { + console.warn( + ` ${C.warn("⚠️")} --base-dir: "${resolvedBaseDir}" exists but is not a directory. Skipping fs-read/fs-write auto-enable.`, + ); + baseDirOk = false; + } + } catch (err) { + if ((err as NodeJS.ErrnoException).code === "ENOENT") { + // Doesn't exist yet — create it (recursive). + try { + mkdirSync(resolvedBaseDir, { recursive: true }); + } catch (mkErr) { + console.warn( + ` ${C.warn("⚠️")} --base-dir: failed to create "${resolvedBaseDir}": ${(mkErr as Error).message}. Skipping fs-read/fs-write auto-enable.`, + ); + baseDirOk = false; + } + } else { + console.warn( + ` ${C.warn("⚠️")} --base-dir: cannot access "${resolvedBaseDir}": ${(err as Error).message}. 
Skipping fs-read/fs-write auto-enable.`, + ); + baseDirOk = false; + } + } + + if (baseDirOk) { + for (const name of ["fs-read", "fs-write"] as const) { + const plugin = pluginManager.getPlugin(name); + if (!plugin) { + console.warn( + ` ${C.warn("⚠️")} --base-dir: plugin "${name}" not discovered, skipping`, + ); + continue; + } + const hash = computePluginHash(plugin.dir); + if (!hash) { + console.warn( + ` ${C.warn("⚠️")} --base-dir: could not hash "${name}" source, skipping`, + ); + continue; + } + pluginManager.setAuditResult(name, { + contentHash: hash, + auditedAt: new Date().toISOString(), + findings: [], + riskLevel: "LOW", + summary: `First-party ${name} plugin with path-jail enforcement.`, + descriptionAccurate: true, + capabilities: + name === "fs-read" + ? ["Read files under baseDir"] + : ["Write files under baseDir"], + riskReasons: [ + "Operates only under the configured baseDir (symlinks rejected)", + ], + recommendation: { + verdict: "approve", + reason: "Auto-approved via --base-dir CLI flag", + }, + }); + pluginManager.approve(name); + pluginManager.setConfig(name, { + ...plugin.config, + baseDir: resolvedBaseDir, + }); + // Load source into plugin.source — required for verifySourceHash() + // to pass when syncPluginsToSandbox runs. Without this, the hash + // check fails ("source changed since audit") and the plugin is + // silently rejected. Mirrors the fast-path in /plugin enable. 
+ if (!pluginManager.loadSource(name)) { + console.warn( + ` ${C.warn("⚠️")} --base-dir: failed to load "${name}" source, skipping enable`, + ); + continue; + } + pluginManager.enable(name); + console.log( + ` ${C.ok("✅")} Enabled ${name} with baseDir: ${resolvedBaseDir}`, + ); + } + } +} + // ── MCP Integration ────────────────────────────────────────────────── import { parseMCPConfig } from "./mcp/config.js"; import { @@ -1067,9 +1175,9 @@ const state = createAgentState(cli, { showTiming: !!sandbox.config.timingLogPath, }); -// Wire CLI --show-reasoning to state.reasoningEffort -if (cli.showReasoning) { - state.reasoningEffort = cli.showReasoning as +// Wire CLI --reasoning-effort to state.reasoningEffort +if (cli.reasoningEffort) { + state.reasoningEffort = cli.reasoningEffort as | "low" | "medium" | "high" diff --git a/src/agent/state.ts b/src/agent/state.ts index fe2244c..c9bed10 100644 --- a/src/agent/state.ts +++ b/src/agent/state.ts @@ -94,6 +94,16 @@ export interface AgentState { */ verboseOutput: boolean; + /** + * Very-verbose output mode. When true, full result bodies are shown + * for *all* tools (including non-sandbox protocol tools like + * `plugin_info`, `module_info`, `register_handler`, `suggest_approach`). + * Only meaningful when `verboseOutput` is also true — plain `--verbose` + * only shows full bodies for sandbox tools (execute_javascript / execute_bash). + * Set via `--very-verbose` / `-vv` CLI flag (which also enables verbose). + */ + veryVerboseOutput: boolean; + /** * Markdown rendering mode. When true, LLM output is buffered * (not streamed character-by-character) and rendered through @@ -358,6 +368,7 @@ export function createAgentState( scratchOverride: null, reasoningEffort: null, verboseOutput: cli.verbose, + veryVerboseOutput: cli.veryVerbose, markdownEnabled: cli.markdown ?? 
true, auditReasoningEffort: null, diff --git a/tests/cli-parser.test.ts b/tests/cli-parser.test.ts new file mode 100644 index 0000000..c024627 --- /dev/null +++ b/tests/cli-parser.test.ts @@ -0,0 +1,280 @@ +// ── CLI Parser Tests ───────────────────────────────────────────────── +// +// Covers the breaking-change flag cleanup: +// - `--reasoning-effort` (renamed from `--show-reasoning`) +// - `--very-verbose` / `-vv` (new) +// - `--base-dir` / `HYPERAGENT_BASE_DIR` (new) +// - rejection of the removed `--show-reasoning` flag +// +// Also exercises the `--yolo` alias for `--auto-approve`. +// ───────────────────────────────────────────────────────────────────── + +import { afterEach, beforeEach, describe, expect, it, vi } from "vitest"; +import { parseCliArgs } from "../src/agent/cli-parser.js"; + +// Env vars that parseCliArgs consults at call-time. Snapshot and restore +// around each test so the host environment can't leak into assertions. +const ENV_KEYS = [ + "HYPERAGENT_REASONING_EFFORT", + "HYPERAGENT_VERBOSE", + "HYPERAGENT_VERY_VERBOSE", + "HYPERAGENT_BASE_DIR", + "HYPERAGENT_AUTO_APPROVE", +] as const; + +describe("parseCliArgs — breaking flag cleanup", () => { + let savedEnv: Record<string, string | undefined>; + + beforeEach(() => { + savedEnv = {}; + for (const key of ENV_KEYS) { + savedEnv[key] = process.env[key]; + delete process.env[key]; + } + }); + + afterEach(() => { + for (const key of ENV_KEYS) { + if (savedEnv[key] === undefined) { + delete process.env[key]; + } else { + process.env[key] = savedEnv[key]; + } + } + vi.restoreAllMocks(); + }); + + // ── --reasoning-effort ─────────────────────────────────────────── + describe("--reasoning-effort", () => { + it("defaults to empty when not given (env nor flag)", () => { + const cfg = parseCliArgs([]); + expect(cfg.reasoningEffort).toBe(""); + }); + + it("defaults to 'high' when flag given without a level", () => { + const cfg = parseCliArgs(["--reasoning-effort"]); + expect(cfg.reasoningEffort).toBe("high"); + }); + +
it("accepts low/medium/high/xhigh (case-insensitive)", () => { + for (const level of ["low", "medium", "high", "xhigh"]) { + expect( + parseCliArgs(["--reasoning-effort", level]).reasoningEffort, + ).toBe(level); + expect( + parseCliArgs(["--reasoning-effort", level.toUpperCase()]) + .reasoningEffort, + ).toBe(level); + } + }); + + it("falls back to 'high' when the next arg is not a valid level", () => { + // Next token is treated as belonging to a later flag; parser + // defaults the effort to 'high' and does NOT consume the token. + const cfg = parseCliArgs(["--reasoning-effort", "--verbose"]); + expect(cfg.reasoningEffort).toBe("high"); + expect(cfg.verbose).toBe(true); + }); + + it("reads HYPERAGENT_REASONING_EFFORT env var", () => { + process.env.HYPERAGENT_REASONING_EFFORT = "medium"; + expect(parseCliArgs([]).reasoningEffort).toBe("medium"); + }); + + it("CLI flag overrides env var", () => { + process.env.HYPERAGENT_REASONING_EFFORT = "low"; + expect( + parseCliArgs(["--reasoning-effort", "xhigh"]).reasoningEffort, + ).toBe("xhigh"); + }); + + it("HYPERAGENT_REASONING_EFFORT with an invalid value falls back to ''", () => { + // Regression: an unexpected env value (e.g. typo, leftover from an + // older flag schema) must NOT propagate verbatim to the SDK union + // type — it should be treated as unset so the SDK falls back to its + // default reasoning level instead of throwing at session-config time. + process.env.HYPERAGENT_REASONING_EFFORT = "potato"; + expect(parseCliArgs([]).reasoningEffort).toBe(""); + }); + + it("HYPERAGENT_REASONING_EFFORT is case-normalised (HIGH → high)", () => { + // Symmetry with the CLI flag handler which lowercases its argument. 
+ process.env.HYPERAGENT_REASONING_EFFORT = "HIGH"; + expect(parseCliArgs([]).reasoningEffort).toBe("high"); + }); + }); + + // ── --show-reasoning is REMOVED (hard break) ───────────────────── + describe("--show-reasoning (removed)", () => { + it("rejects --show-reasoning with 'Unknown option' and exits", () => { + const exitSpy = vi.spyOn(process, "exit").mockImplementation((( + code?: number, + ) => { + throw new Error(`__exit_${code}`); + }) as never); + const errSpy = vi.spyOn(console, "error").mockImplementation(() => {}); + + expect(() => parseCliArgs(["--show-reasoning"])).toThrow("__exit_1"); + expect(errSpy).toHaveBeenCalledWith( + expect.stringContaining("Unknown option: --show-reasoning"), + ); + exitSpy.mockRestore(); + errSpy.mockRestore(); + }); + + it("ignores HYPERAGENT_SHOW_REASONING (old env var is dead)", () => { + // The old env var should not be wired anywhere. Setting it must not + // affect reasoningEffort. + process.env.HYPERAGENT_SHOW_REASONING = "xhigh"; + const cfg = parseCliArgs([]); + expect(cfg.reasoningEffort).toBe(""); + delete process.env.HYPERAGENT_SHOW_REASONING; + }); + }); + + // ── --very-verbose / -vv ───────────────────────────────────────── + describe("--very-verbose / -vv", () => { + it("defaults to false when not given", () => { + const cfg = parseCliArgs([]); + expect(cfg.veryVerbose).toBe(false); + expect(cfg.verbose).toBe(false); + }); + + it("--very-verbose sets BOTH verbose AND veryVerbose", () => { + const cfg = parseCliArgs(["--very-verbose"]); + expect(cfg.verbose).toBe(true); + expect(cfg.veryVerbose).toBe(true); + }); + + it("-vv is equivalent to --very-verbose", () => { + const cfg = parseCliArgs(["-vv"]); + expect(cfg.verbose).toBe(true); + expect(cfg.veryVerbose).toBe(true); + }); + + it("--verbose on its own does NOT enable veryVerbose", () => { + const cfg = parseCliArgs(["--verbose"]); + expect(cfg.verbose).toBe(true); + expect(cfg.veryVerbose).toBe(false); + }); + + it("HYPERAGENT_VERY_VERBOSE=1 enables 
veryVerbose (env)", () => { + process.env.HYPERAGENT_VERY_VERBOSE = "1"; + const cfg = parseCliArgs([]); + expect(cfg.veryVerbose).toBe(true); + }); + + it("HYPERAGENT_VERY_VERBOSE=1 ALSO enables verbose (env-path symmetry)", () => { + // Regression: without this, env-var-only very-verbose would set + // veryVerbose=true but verbose=false, and the event-handler gate + // (`verboseOutput && (isSandbox || veryVerbose)`) would silently + // suppress all tool bodies — defeating the whole flag. + process.env.HYPERAGENT_VERY_VERBOSE = "1"; + delete process.env.HYPERAGENT_VERBOSE; + const cfg = parseCliArgs([]); + expect(cfg.verbose).toBe(true); + expect(cfg.veryVerbose).toBe(true); + }); + + it("HYPERAGENT_VERY_VERBOSE=0 leaves veryVerbose false", () => { + process.env.HYPERAGENT_VERY_VERBOSE = "0"; + const cfg = parseCliArgs([]); + expect(cfg.veryVerbose).toBe(false); + }); + }); + + // ── --base-dir ─────────────────────────────────────────────────── + describe("--base-dir", () => { + it("defaults to empty string when not given", () => { + expect(parseCliArgs([]).baseDir).toBe(""); + }); + + it("accepts a path argument", () => { + const cfg = parseCliArgs(["--base-dir", "/tmp/sandbox"]); + expect(cfg.baseDir).toBe("/tmp/sandbox"); + }); + + it("preserves the raw value (no resolution at parse-time)", () => { + // Path resolution happens in index.ts after parse — keep the parser + // pure so it can be unit-tested without filesystem context. 
+ const cfg = parseCliArgs(["--base-dir", "./relative/path"]); + expect(cfg.baseDir).toBe("./relative/path"); + }); + + it("exits when --base-dir has no value", () => { + const exitSpy = vi.spyOn(process, "exit").mockImplementation((( + code?: number, + ) => { + throw new Error(`__exit_${code}`); + }) as never); + const errSpy = vi.spyOn(console, "error").mockImplementation(() => {}); + + expect(() => parseCliArgs(["--base-dir"])).toThrow("__exit_1"); + expect(errSpy).toHaveBeenCalledWith( + "--base-dir requires a non-empty path", + ); + exitSpy.mockRestore(); + errSpy.mockRestore(); + }); + + it("rejects whitespace-only --base-dir value with exit", () => { + // Regression: `--base-dir " "` was previously truthy and let + // `index.ts` call `resolve("".trim())` → `process.cwd()`, silently + // making CWD the sandbox root. Parser must trim+reject at the boundary. + const exitSpy = vi.spyOn(process, "exit").mockImplementation((( + code?: number, + ) => { + throw new Error(`__exit_${code}`); + }) as never); + const errSpy = vi.spyOn(console, "error").mockImplementation(() => {}); + + expect(() => parseCliArgs(["--base-dir", " "])).toThrow("__exit_1"); + expect(errSpy).toHaveBeenCalledWith( + "--base-dir requires a non-empty path", + ); + exitSpy.mockRestore(); + errSpy.mockRestore(); + }); + + it("trims whitespace around --base-dir value", () => { + // Tabs / spaces around the path are stripped — keeps the parser + // forgiving for shell-mangled args while still rejecting empty. 
+ const cfg = parseCliArgs(["--base-dir", " /tmp/foo "]); + expect(cfg.baseDir).toBe("/tmp/foo"); + }); + + it("reads HYPERAGENT_BASE_DIR env var", () => { + process.env.HYPERAGENT_BASE_DIR = "/var/data"; + expect(parseCliArgs([]).baseDir).toBe("/var/data"); + }); + + it("treats whitespace-only HYPERAGENT_BASE_DIR as unset", () => { + // Symmetry with the CLI flag: env-var path must also trim and treat + // an empty-after-trim string as missing rather than letting it flow + // into `resolve("".trim())` → `process.cwd()`. + process.env.HYPERAGENT_BASE_DIR = " "; + expect(parseCliArgs([]).baseDir).toBe(""); + }); + + it("CLI flag overrides env var", () => { + process.env.HYPERAGENT_BASE_DIR = "/from/env"; + const cfg = parseCliArgs(["--base-dir", "/from/cli"]); + expect(cfg.baseDir).toBe("/from/cli"); + }); + }); + + // ── --yolo (alias) ─────────────────────────────────────────────── + describe("--yolo / --auto-approve", () => { + it("--yolo is equivalent to --auto-approve", () => { + expect(parseCliArgs(["--yolo"]).autoApprove).toBe(true); + expect(parseCliArgs(["--auto-approve"]).autoApprove).toBe(true); + }); + + it("--yolo does NOT auto-enable --base-dir", () => { + // Sanity check: the two flags are independent. + const cfg = parseCliArgs(["--yolo"]); + expect(cfg.autoApprove).toBe(true); + expect(cfg.baseDir).toBe(""); + }); + }); +});
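
Review note: the result-body gating in `src/agent/event-handler.ts` is easier to check as a pure function. The sketch below restates the `showFullBody` expression from the diff as a standalone helper. The names `shouldShowFullBody` and `VerbosityState` are hypothetical and do not appear in the patch, which computes the value inline. It is an illustration of the intended truth table under that assumption, not the shipped implementation.

```typescript
// Sketch only: `shouldShowFullBody` and `VerbosityState` are hypothetical names.
// The diff computes the same boolean inline as `showFullBody`.
interface VerbosityState {
  verboseOutput: boolean; // set by --verbose (and implied by --very-verbose)
  veryVerboseOutput: boolean; // set by --very-verbose / -vv
}

function shouldShowFullBody(toolName: string, state: VerbosityState): boolean {
  // Same sandbox-tool test as the event handler in the diff.
  const isSandboxTool =
    toolName === "execute_javascript" || toolName === "execute_bash";
  return state.verboseOutput && (isSandboxTool || state.veryVerboseOutput);
}

const modes: Array<[string, VerbosityState]> = [
  ["default", { verboseOutput: false, veryVerboseOutput: false }],
  ["--verbose", { verboseOutput: true, veryVerboseOutput: false }],
  ["--very-verbose", { verboseOutput: true, veryVerboseOutput: true }],
  // The state the parser avoids by making very-verbose imply verbose.
  ["veryVerbose only", { verboseOutput: false, veryVerboseOutput: true }],
];

// Print, per mode, whether a sandbox tool and a protocol tool show full bodies.
for (const [label, state] of modes) {
  console.log(
    label,
    "| sandbox:",
    shouldShowFullBody("execute_bash", state),
    "| protocol:",
    shouldShowFullBody("plugin_info", state),
  );
}
```

The last row is the case the parser comments call out: with `veryVerbose` set but `verbose` unset, the gate yields `false` for every tool, which is why both the `-vv` flag handler and the `HYPERAGENT_VERY_VERBOSE` initialiser also set `verbose`.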