
benchmark/v2: v2.5 delta — S-02 16k padding + preflight, per-hook sweep, run plans, AWS docs #57

Open
Kaushik985 wants to merge 1 commit into AlphaBitCore:main from Kaushik985:feat/bench-v25-delta


@Kaushik985


Incremental v2.5 benchmark work on top of the benchmark/v2 harness already on main: ~29 new files plus small additive edits (+4651/-27 lines), with no reverts of upstream changes.

Highlights

  • S-02: dataset token preflight (hard-stop on under-padded data), ~16k real-token padding, mock co-location note, logged VU-halving.
  • per_hook_sweep.sh + hooks_toggle.sh (snapshot-restore) for per-hook latency isolation.
  • v2.5 run/ablation plans, AWS runbook hooks-workflow section, James demo artifacts, results + invalid-run quarantine.
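The S-02 preflight in the first highlight (hard-stop on under-padded data) can be sketched as below. This is a minimal illustration, not the code in scenarios/s02_long_context.py: `preflight`, `whitespace_tokens`, `TARGET_TOKENS` and the 90% tolerance `MIN_FRACTION` are all hypothetical, and the default counter is a crude whitespace split standing in for the cl100k tokenizer the PR's padding targets.

```python
from typing import Callable, Iterable

# Hypothetical values: ~16k target tokens, with prompts under 90% of that rejected.
TARGET_TOKENS = 16_000
MIN_FRACTION = 0.9


def whitespace_tokens(text: str) -> int:
    """Crude stand-in counter; the PR's padding targets real cl100k tokens."""
    return len(text.split())


def preflight(
    prompts: Iterable[str],
    count_tokens: Callable[[str], int] = whitespace_tokens,
    target: int = TARGET_TOKENS,
    min_fraction: float = MIN_FRACTION,
) -> int:
    """Return the smallest prompt length in tokens; hard-stop if any prompt is under-padded."""
    floor = int(target * min_fraction)
    counts = [count_tokens(p) for p in prompts]
    if not counts:
        raise SystemExit("preflight: dataset is empty")
    short = [i for i, n in enumerate(counts) if n < floor]
    if short:
        # Raising SystemExit with a message exits with status 1 and prints the message,
        # so an under-padded dataset can never silently produce a "long-context" result.
        raise SystemExit(
            f"preflight: {len(short)} prompt(s) below {floor} tokens "
            f"(min={min(counts)}); re-pad the dataset before benchmarking"
        )
    return min(counts)
```

With `tiktoken` installed, a counter closer to the PR's cl100k counting would be `enc = tiktoken.get_encoding("cl100k_base")` passed as `preflight(prompts, lambda t: len(enc.encode(t)))`.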

Not included: the ai-gateway env flags NEXUS_TRACE_LATENCY / NEXUS_AUDIT_DISABLED. proxy.go was refactored upstream into a stage pipeline, so those flags belong on the new structure (for the proxy owner to apply).

AWS IPs and instance IDs are redacted (public repo).

🤖 Generated with Claude Code

…sweep, run plans, AWS docs

Incremental v2.5 work on top of the benchmark/v2 harness already on main. Mostly
additive (29 new files); the few modified files extend prior versions, and nothing is reverted.

Harness:
- scenarios/s02_long_context.py: dataset token preflight (SystemExit on under-padded
  data) + mock co-location methodology note + logged VU-halving (BENCH_S02_NO_HALVE).
- scenarios/s04_concurrency_sweep.py: BENCH_VU_LEVELS / BENCH_LEVEL_DURATION overrides.
- scripts/pad_long_context_dataset.py: token-targeted padding (~16k real cl100k tokens).
- scripts/per_hook_sweep.sh, hooks_toggle.sh: per-hook isolation + snapshot-restore toggle.
- tests/test_s02_preflight.py: preflight guard unit tests.
- datasets/long_context_v2*.json: regenerated to ~16k real tokens.
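The env-var overrides (BENCH_VU_LEVELS, BENCH_LEVEL_DURATION) and the logged, opt-out VU-halving (BENCH_S02_NO_HALVE) listed above might look roughly like this. It is a sketch under stated assumptions, not the scenarios' actual code: the function names, default levels, logger name, the seconds unit for the duration, and the set of truthy values accepted for BENCH_S02_NO_HALVE are all hypothetical.

```python
import logging
import os
from typing import List, Mapping

log = logging.getLogger("bench.s02")  # hypothetical logger name

# Hypothetical defaults; the real scenarios define their own.
DEFAULT_VU_LEVELS = [1, 5, 10, 25, 50]
DEFAULT_LEVEL_DURATION_S = 60  # unit (seconds) is an assumption


def vu_levels_from_env(env: Mapping[str, str] = os.environ) -> List[int]:
    """Parse BENCH_VU_LEVELS as a comma-separated list such as "1,4,16"."""
    raw = env.get("BENCH_VU_LEVELS", "").strip()
    if not raw:
        return list(DEFAULT_VU_LEVELS)
    levels = [int(part) for part in raw.split(",") if part.strip()]
    if not levels or any(n <= 0 for n in levels):
        raise ValueError(f"BENCH_VU_LEVELS must list positive integers, got {raw!r}")
    return levels


def level_duration_from_env(env: Mapping[str, str] = os.environ) -> int:
    """Per-level duration override via BENCH_LEVEL_DURATION."""
    return int(env.get("BENCH_LEVEL_DURATION", DEFAULT_LEVEL_DURATION_S))


def s02_vus(requested: int, env: Mapping[str, str] = os.environ) -> int:
    """Halve VUs for the long-context scenario unless BENCH_S02_NO_HALVE is truthy; log either way."""
    if env.get("BENCH_S02_NO_HALVE", "").lower() in {"1", "true", "yes"}:
        log.info("S-02: VU halving disabled, running %d VUs", requested)
        return requested
    halved = max(1, requested // 2)  # never drop below one VU
    log.info("S-02: halving VUs %d -> %d (set BENCH_S02_NO_HALVE=1 to disable)", requested, halved)
    return halved
```

Taking `env` as a parameter (defaulting to `os.environ`) keeps the parsing testable without mutating the process environment, and logging both branches means every run records which VU count it actually used.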

Docs/plans: V25_RUN_PLAN, V25_ABLATION_PLAN, CLAUDE-CODE-V25-* task specs, AWS_RUNBOOK
hooks-workflow section, AWS_DEPLOYMENT_PROMPTS, JAMES_* demo artifacts, results, and a
results/invalid quarantine for invalid runs. AWS IPs / instance IDs redacted (public repo).

NOTE: the two ai-gateway env flags (NEXUS_TRACE_LATENCY / NEXUS_AUDIT_DISABLED) are
NOT in this PR — proxy.go was refactored upstream into a stage pipeline; those flags
should be re-applied onto the new structure by the proxy owner.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>