Skip to content

feat(loadgen): adding loadgen binary#3335

Open
chatton wants to merge 15 commits into
mainfrom
cian/benchmarks-binary
Open

feat(loadgen): adding loadgen binary#3335
chatton wants to merge 15 commits into
mainfrom
cian/benchmarks-binary

Conversation

@chatton
Copy link
Copy Markdown
Contributor

@chatton chatton commented Jun 2, 2026

Overview

Adding a load generating binary that uses spamoor to spam transactions both sustained over a period of time, as well as bursts.

Summary by CodeRabbit

  • New Features

    • Added a standalone load generator CLI with commands to run, start scheduled workloads, and perform health checks against the backend.
    • Provided container and compose configurations for running the load generator.
  • Documentation

    • Added comprehensive usage, configuration, matrix format, and quick-start docs for the load generator.
  • Tests

    • Added unit tests covering CLI flags, config parsing, matrix validation, runner behavior, and sync/check logic.
  • Chores

    • Updated build/CI to build and publish the load generator and bumped a compression dependency across modules.

auricom and others added 11 commits May 27, 2026 15:28
…spamoor

Standalone Go binary that orchestrates spamoor-daemon via HTTP API to
drive sustained and burst transaction load against ev-reth. Structured
as a cobra CLI with subcommands: check (connectivity), regular (~1M
tx/day), burst (500K probabilistic), and run (custom matrix).

Includes Dockerfile, docker-compose, baseline/burst matrices, crontab
for supercronic scheduling, just targets for build and smoke testing.
Replace separate regular/burst subcommands invoked by supercronic with
a single `start` command that runs an infinite scheduling loop. Regular
workloads fire on a ticker, bursts at random times within rolling 24h
windows. All timing controlled via CLI flags with env fallbacks.

- remove supercronic from Dockerfile
- delete regular.go, burst.go, crontab
- add start.go with scheduler loop
- plumb context for graceful shutdown
- unify ExecuteMatrix/ExecuteMatrixWithOverrides via matrixOpts
- bump fees (BASE_FEE 20→500, TIP_FEE 2→50)
- remove healthcheck from docker-compose, rely on WaitForSync
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a new ev-loadgen CLI app: build and CI entries, Justfile recipes, Docker assets, a Cobra CLI (run/start/check), a Spamoor client abstraction, matrix parsing/validation, a runner that creates/polls spammers, scheduling (regular and burst), tests, and documentation.

Changes

ev-loadgen Load Generator Application

Layer / File(s) Summary
Build infrastructure and dependency updates
.github/workflows/ci.yml, justfile, .gitignore, .just/loadgen.just, .just/build.just, apps/loadgen/go.mod, apps/evm/go.mod, apps/grpc/go.mod, apps/testapp/go.mod, go.mod
CI dashboard adds ev-node-loadgen Docker image; justfile imports loadgen recipes; .just recipes build and docker-build loadgen; .gitignore adds .junie and *.diff; compress dependency bumped to v1.18.5 across modules.
CLI root command and flag handling
apps/loadgen/cmd/root.go, apps/loadgen/cmd/flags_test.go
Root cobra command with persistent --spamoor-url and subcommands (run, start, check). Tests validate flag parsing and run-command argument validation.
Run command: execute matrix
apps/loadgen/cmd/run.go
run <matrix.json> loads a matrix file and executes it via the runner using a Spamoor client.
Spamoor client abstraction
apps/loadgen/internal/client.go
Defines SpamoorClient interface and spamoorAPIClient wrapper; implements GetClients() via HTTP /api/clients with JSON mapping.
Config, scenario building, and matrix parsing
apps/loadgen/internal/config.go, apps/loadgen/internal/config_test.go, apps/loadgen/internal/matrix.go, apps/loadgen/internal/matrix_test.go
Environment-driven SpamoorURL and scenario config builder; Matrix/Entry types and JSON loading; validation of scenarios, derived counts, probability bounds, and timeouts with tests.
Start command: scheduler and runtime
apps/loadgen/cmd/start.go, apps/loadgen/cmd/start_test.go
start implements a scheduler for periodic regular workloads and randomized bursts, config validation, env helpers, signal handling for cleanup, and unit tests for validation logic.
Runner: execution, polling, and sync
apps/loadgen/internal/runner.go
Matrix execution entrypoints, RunCheck connectivity test, spammer lifecycle management, baseline metric capture, delta-based completion polling, and wait-for-sync via client height stabilization.
Runner and support tests
apps/loadgen/internal/runner_test.go
Tests for baseline seeding, spammer start failure handling, completion polling with context timeouts, wait-for-sync behavior, and a fake SpamoorClient test double.
Main entry point and initialization
apps/loadgen/main.go
Program entrypoint runs the CLI root command and initializes standard logger (UTC timestamps).
Docker build and deployment configuration
apps/loadgen/Dockerfile, apps/loadgen/docker-compose.yml, apps/loadgen/README.md, apps/loadgen/matrices/baseline.json, apps/loadgen/matrices/burst.json
Multi-stage Dockerfile builds static binary; docker-compose wires spamoor-daemon and benchmarks; README documents architecture, commands, scheduling semantics, matrix format, and quick-start; baseline and burst JSON matrices included.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • tac0turtle
  • julienrbrt

Poem

🐰 A rabbit built a loadgen bright,
Hopping through spammers day and night,
Matrices, bursts, and steady pace,
It polled the metrics—kept the pace—
Benchmarks bloom in moonlit light.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'feat(loadgen): adding loadgen binary' clearly describes the main change: addition of a loadgen binary. It follows semantic commit conventions and directly reflects the primary objective of the PR.
Description check ✅ Passed The PR description provides an overview section that explains the purpose and functionality of the loadgen binary (load generation using spamoor for sustained and burst transactions), but it lacks details about specific implementation aspects, technical decisions, or related issues.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cian/benchmarks-binary

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 2, 2026

Claude finished @chatton's task in 3m 13s —— View job


Code Review

This is a well-structured addition. The package layout is clean, the SpamoorClient interface is appropriately scoped for testability, and the fake test double is well-implemented. Several issues worth addressing before merge:


🔴 High Priority

1. README documents mutex serialization that doesn't exist [cmd/start.go]

The README states: "a mutex prevents concurrent spamoor access. If burst fires during regular (or vice versa), it waits for the lock." But runWorkload contains no mutex — ExecuteMatrixWithOverridesFromFile is called directly from goroutines with no synchronization. If a burst fires while a regular workload is running, they will execute concurrently, which can confuse the delta-based completion polling (both workloads read from the same getSpamoorCounters baseline). Either add the documented mutex or remove the claim from the README.

Fix this →

2. Goroutine accumulation on slow workloads [cmd/start.go:107-111]

case <-ticker.C:
    go runWorkload("regular", cfg.regularMatrix, regularTxPerRun)

Goroutines are launched unconditionally on every tick. If a workload takes longer than interval, goroutines queue up. Even with a mutex (see #1), they'd all run back-to-back after the previous workload finishes, creating a burst of work with no spacing. Adding the mutex alone doesn't fully fix this — you also need to skip the tick when already in-flight.

Fix this →

3. DeleteAllSpammers stops at first error, orphaning remaining spammers [internal/runner.go:253-264]

for _, sp := range existing {
    if err := api.DeleteSpammer(sp.ID); err != nil {
        return fmt.Errorf("delete spammer %d: %w", sp.ID, err) // ← early return
    }
}

On shutdown cleanup, if deleting one spammer fails (transient network error, etc.), all remaining spammers are left running. This is particularly bad for the shutdown path in runScheduler. Collect errors and continue:

var errs []error
for _, sp := range existing {
    if err := api.DeleteSpammer(sp.ID); err != nil {
        errs = append(errs, fmt.Errorf("delete spammer %d: %w", sp.ID, err))
    }
}
return errors.Join(errs...)

Fix this →


🟡 Medium Priority

4. validateMatrix called twice on the run path [internal/runner.go:57-60, internal/matrix.go:53-65]

LoadMatrix already calls validateMatrix internally. executeMatrix then calls validateMatrix again at line 58. The second call is harmless but redundant — for entries loaded via ExecuteMatrixFromFile, validation runs twice.

5. Stale lastHeight inflates sync delta after fetch error [internal/runner.go:336-351]

height, err := getSpamoorHeight(api)
if err != nil {
    log.Printf("warning: %v", err)
} else {
    if lastHeight > 0 && height > 0 {
        delta := height - lastHeight  // ← lastHeight not reset on error

If getSpamoorHeight errors mid-sync, lastHeight keeps its old value. The next successful call computes delta across two poll intervals, inflating it and potentially triggering a spurious "still syncing" log. Reset lastHeight = 0 on error so the next success establishes a fresh baseline.

6. Potential panic in nextBurstTimer with large burstPerDay [cmd/start.go:154]

delay := time.Duration(rand.Int64N(int64(window) / int64(remaining)))

If int64(window) / int64(remaining) rounds to 0 (possible when remaining > window in nanoseconds, which can't happen with normal values — but validateStartConfig doesn't bound burstPerDay), rand.Int64N(0) panics. Add an upper bound validation:

if cfg.burstPerDay > int(burstWindow.Seconds()) {
    return fmt.Errorf("burst-per-day %d exceeds maximum (%d)", cfg.burstPerDay, int(burstWindow.Seconds()))
}

🔵 Low Priority / Style

7. envInt silently swallows parse errors [internal/matrix.go:111-121]

A typo like BENCH_NUM_SPAMMERS=2x silently falls back to the default (1 spammer) rather than failing fast. This is consistent with envIntOr in start.go (same pattern), but since these are matrix env vars, a malformed value would produce surprising behavior. Consider at least logging a warning when strconv.Atoi fails.

8. Inconsistent HTTP timeout [internal/client.go:35]

GetClients uses a custom *http.Client with a 2s timeout. All other methods (ListSpammers, GetMetrics, CreateSpammer, etc.) delegate to c.api which may have a different or no timeout. This inconsistency means GetClients failures surface quickly while other calls may hang.

9. Burst reset is 24h from startup, not midnight UTC [README.md:41]

The README says: "at startup + each midnight UTC, generates N random times across the day". The actual implementation uses time.NewTimer(burstWindow) from startup, which resets 24h from when the process started — not at midnight UTC. The implementation is fine; the docs need updating.

10. No context propagation in SpamoorClient [internal/client.go:16-24]

None of the interface methods accept context.Context, so cancellation from upstream (e.g., cmd.Context()) only propagates to GetClients via the 2s HTTP client timeout, not to other API calls. This is acceptable for a tool this scope but worth a follow-up.


✅ Already Addressed (from prior CodeRabbit review)

  • Non-root user in Dockerfile (USER ev) ✓
  • rebroadcast parsed as uint64 via strconv.ParseUint
  • Language identifiers added to README code blocks ✓
  • Errors wrapped in GetClients
  • Timeout validated in Entry.validate()

Testing

The test coverage for the core paths is solid — baseline counter seeding, spammer start failure, completion polling, sync detection, and context cancellation are all covered. The fakeSpamoorClient sequence-based approach is clean and easy to extend.

Missing coverage:

  • DeleteAllSpammers when a delete fails mid-loop
  • runCheck path (no test exercises the full check flow)
  • BuildScenarioConfig env mapping (no test for the rebroadcast → uint64 conversion)

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedJun 2, 2026, 2:32 PM

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.72%. Comparing base (6b2629e) to head (9f5d982).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3335      +/-   ##
==========================================
- Coverage   60.75%   60.72%   -0.03%     
==========================================
  Files         127      127              
  Lines       13781    13781              
==========================================
- Hits         8372     8368       -4     
- Misses       4497     4500       +3     
- Partials      912      913       +1     
Flag Coverage Δ
combined 60.72% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@chatton chatton marked this pull request as ready for review June 2, 2026 11:51
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (10)
apps/loadgen/cmd/flags_test.go (1)

10-12: 💤 Low value

Redundant (duplicate) start registration.

NewRootCmd() already calls rootCmd.AddCommand(newRunCmd(), newStartCmd(), newCheckCmd()), so root already owns a start subcommand. Adding another startCmd here registers a second, duplicate start command on root. The test still passes because the assertions run against the local startCmd, but root is never used in this test — both root and the extra AddCommand can be dropped.

♻️ Suggested simplification
 func TestStartFlags(t *testing.T) {
-	root := NewRootCmd()
 	startCmd := newStartCmd()
-	root.AddCommand(startCmd)
 
 	err := startCmd.ParseFlags([]string{"--regular-matrix", "custom.json"})
 	require.NoError(t, err)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/cmd/flags_test.go` around lines 10 - 12, Remove the redundant
registration of the "start" subcommand: NewRootCmd() already adds newStartCmd(),
so drop the creation of root and the extra root.AddCommand(startCmd) call from
the test; keep the local startCmd used for assertions (created via
newStartCmd()) and remove references to NewRootCmd(), root, and root.AddCommand
to avoid registering a duplicate start command.
apps/loadgen/main.go (1)

17-21: 💤 Low value

Banner side-effect in init() fires on every invocation.

init() runs before main() for all entrypoints, so "ev-loadgen starting" is emitted even for --help, shell completion, and other non-running subcommands, polluting their output. Consider keeping only logger configuration in init() and moving the startup message into main() (or a PersistentPreRun) so it prints only when a workload actually starts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/main.go` around lines 17 - 21, The init() currently configures
logging and also prints a startup banner ("ev-loadgen starting"), which runs for
every entrypoint; remove the fmt.Fprintln(os.Stderr, "ev-loadgen starting") from
init() so init() only does log.SetFlags and log.SetOutput, and relocate the
banner to main() (or a Cobra PersistentPreRun if using Cobra) so the message is
emitted only when the workload actually starts; update references in main() or
the command's PersistentPreRun to print the same startup message to os.Stderr.
apps/loadgen/internal/client.go (2)

67-81: ⚡ Quick win

Wrap returned errors with context.

GetClients returns bare errors from the HTTP GET and the JSON decode. Wrapping them makes failures actionable in logs/callers.

♻️ Proposed wrapping
 	url := fmt.Sprintf("%s/api/clients", c.api.BaseURL)
 	resp, err := c.client.Get(url)
 	if err != nil {
-		return nil, err
+		return nil, fmt.Errorf("get clients: %w", err)
 	}
 	defer func() { _ = resp.Body.Close() }()
 	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
 		body, _ := io.ReadAll(resp.Body)
 		return nil, fmt.Errorf("get clients failed: %s", string(body))
 	}
 	var raw []clientResponse
 	if err := json.NewDecoder(resp.Body).Decode(&raw); err != nil {
-		return nil, err
+		return nil, fmt.Errorf("decode clients response: %w", err)
 	}

As per coding guidelines: "Wrap errors with context using fmt.Errorf in Go code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/internal/client.go` around lines 67 - 81, GetClients currently
returns raw errors from the HTTP GET and JSON decode; update
spamoorAPIClient.GetClients to wrap those errors with contextual messages (e.g.,
using fmt.Errorf) so callers see what operation failed. Specifically, wrap the
error returned by c.client.Get(url) with context like "GET /api/clients failed"
and wrap the error from json.NewDecoder(resp.Body).Decode(&raw) with context
like "decode GetClients response failed"; keep existing behavior for non-2xx
handling and closing resp.Body, but replace bare err returns with fmt.Errorf
including the original err.

16-24: ⚖️ Poor tradeoff

Consider propagating context.Context through the client.

The SpamoorClient interface and GetClients rely solely on the 2s http.Client timeout and ignore the caller's context (e.g., cmd.Context() from RunCheck). Threading a context.Context into these methods would honor cancellation and deadlines from upstream callers. This touches the interface surface, so it can be deferred, but it aligns with the guideline.

As per coding guidelines: "Use context.Context for cancellation in Go code".

Also applies to: 67-69

apps/loadgen/docker-compose.yml (1)

5-5: ⚡ Quick win

Pin spamoor-daemon to an immutable version/digest instead of latest.

Using :latest makes load tests non-reproducible and can silently change behavior between runs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/docker-compose.yml` at line 5, Replace the floating image
reference "image: ethpandaops/spamoor:latest" with an immutable image reference
(either a specific version tag like "ethpandaops/spamoor:vX.Y.Z" or a
content-addressable digest "ethpandaops/spamoor@sha256:...") so the loadgen
service uses a reproducible artifact; update the compose service's image field
where "image: ethpandaops/spamoor:latest" appears and ensure any CI/docs that
trigger builds are updated to pin or update the tag/digest when intentional.
apps/loadgen/internal/matrix.go (2)

104-114: 💤 Low value

envInt silently masks malformed values.

A non-integer value for BENCH_NUM_SPAMMERS/BENCH_COUNT_PER_SPAMMER (e.g. a typo like "2x") silently falls back to the default instead of being rejected, so a misconfigured matrix can run with unintended values. Since these are required, validated inputs, surfacing the parse error would be safer.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/internal/matrix.go` around lines 104 - 114, The envInt helper
currently swallows parse errors; change envInt(key) to return (int, error)
instead of silently falling back so callers can fail fast for required inputs
like BENCH_NUM_SPAMMERS and BENCH_COUNT_PER_SPAMMER; specifically, update envInt
to: 1) return fallback value only when the key is absent (or keep current
semantics if fallback is intended) but return a non-nil error when strconv.Atoi
fails, 2) update all callers of envInt (e.g., where BENCH_NUM_SPAMMERS and
BENCH_COUNT_PER_SPAMMER are read) to handle the error and surface it (return
error up the stack or log and exit), and 3) ensure error messages include the
env key and the invalid value to aid debugging.

79-102: ⚡ Quick win

Consider validating Timeout during matrix validation.

Timeout is accepted as a free-form string but never parsed here, so an invalid duration (e.g. "2x") passes validation and only surfaces as an error later when the runner parses it. Validating it up front (when non-empty) gives faster, clearer feedback at load time.

♻️ Proposed early validation
 	if e.Probability != nil && (*e.Probability < 0 || *e.Probability > 1) {
 		return fmt.Errorf("probability must be between 0.0 and 1.0, got %f", *e.Probability)
 	}
 
+	if e.Timeout != "" {
+		if _, err := time.ParseDuration(e.Timeout); err != nil {
+			return fmt.Errorf("invalid timeout %q: %w", e.Timeout, err)
+		}
+	}
+
 	return nil

Add "time" to the imports.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/internal/matrix.go` around lines 79 - 102, The validate method
on Entry should also parse and validate the Timeout string early: when e.Timeout
is non-empty call time.ParseDuration(e.Timeout) (add the "time" import) and
return a clear fmt.Errorf if parsing fails (e.g., "invalid timeout %q: %v"), so
invalid durations like "2x" fail during Entry.validate rather than later in the
runner.
apps/loadgen/cmd/start.go (2)

56-129: ⚖️ Poor tradeoff

Use structured logging instead of log.Printf.

This scheduler (and the runner package) logs exclusively via the standard log package with no levels. Consider a leveled/structured logger so operational output (info vs warning/error such as the workload-error and cleanup-failure cases) is distinguishable.

As per coding guidelines: "Use structured logging with appropriate log levels".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/cmd/start.go` around lines 56 - 129, The function runScheduler
uses plain log.Printf; replace it with a leveled/structured logger and use
appropriate levels for events (info for normal lifecycle messages, error for
failures, warn for cleanup warnings). Instantiate or accept a structured logger
(e.g., logger variable or parameter) in runScheduler, then change all log.Printf
calls inside runScheduler and the nested runWorkload closure to
logger.Infof/logger.Warnf/logger.Errorf (or equivalent) and include structured
fields like "label", "txCount", "interval", "regularMatrix", "burstMatrix", and
"spamoor" to make messages machine-readable; also propagate or pass the logger
into any calls that need it (or include it in context) for internal.WaitForSync,
internal.ExecuteMatrixWithOverridesFromFile, and internal.DeleteAllSpammers so
their failures are logged with error level and contextual fields.

79-119: ⚖️ Poor tradeoff

Bursts/regular workloads launched on every tick can queue up unboundedly.

Each ticker.C/burstTimer.C fire spawns a new go runWorkload(...). These serialize on mu, but if a workload runs longer than interval the goroutines pile up and later run stale workloads against a chain that has already moved on. Consider skipping (or coalescing) a tick when a workload is already in flight, e.g. a TryLock/busy flag rather than launching unconditionally.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/cmd/start.go` around lines 79 - 119, The tick handlers currently
spawn goroutines unconditionally which can pile up because runWorkload
serializes on mu; add a short-circuit "in-flight" guard (e.g., an int32 atomic
flag or a token channel) to skip launching when a workload is already running:
declare inFlight int32 (or a chan struct{} semaphore) near the function, change
the ticker and burstTimer cases to do an atomic.CompareAndSwapInt32(&inFlight,
0, 1) (or try to receive/send on the token channel) and only call go
runWorkload(...) when the CAS succeeds, and ensure runWorkload clears the flag
on exit (defer atomic.StoreInt32(&inFlight, 0) or release the token) so
subsequent ticks can run; keep existing mu locking inside runWorkload and
preserve cleanup on ctx.Done(). Use the same mechanism for both "regular" and
"burst" launches to coalesce/skips overlapping runs.
apps/loadgen/internal/runner.go (1)

337-353: 💤 Low value

waitForSync reuses a stale lastHeight after a fetch error.

If getSpamoorHeight errors on an iteration, lastHeight is not updated, so the next successful read computes delta = height - lastHeight across a longer wall-clock gap (two+ poll intervals). That inflates the delta and can prevent the delta < syncThreshold exit even when synced. Minor, but resetting lastHeight = 0 on error (forcing a fresh baseline) would make the heuristic more robust.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/internal/runner.go` around lines 337 - 353, In waitForSync's
polling loop, when getSpamoorHeight returns an error the code currently leaves
lastHeight unchanged which can inflate the next delta; update the error branch
in waitForSync to reset lastHeight to 0 (or another sentinel) after logging the
error so the next successful getSpamoorHeight call establishes a fresh baseline
before computing delta against syncThreshold using pollInterval; modify the
block that handles err from getSpamoorHeight to set lastHeight = 0.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/loadgen/Dockerfile`:
- Around line 11-24: The image runs as root; change the Dockerfile to create and
switch to a non-root user before executing CMD: add an unprivileged group/user
(e.g., create group "ev" and user "ev"), change WORKDIR to a non-root directory
(e.g., /home/ev), chown the copied files (ev-loadgen and the JSON matrices) to
that user (references: WORKDIR, COPY lines that place /usr/bin/ev-loadgen and
/root/*.json), and add a USER instruction so the container runs as that
unprivileged user when invoking CMD ["ev-loadgen","start"].

In `@apps/loadgen/go.mod`:
- Line 42: apps/loadgen/go.mod pulls vulnerable transitive modules
(github.com/cometbft/cometbft and github.com/cosmos/cosmos-sdk) via
github.com/celestiaorg/tastora; update the transitive chain by bumping tastora
to v0.20.0 (which brings cometbft to v1.0.1 and cosmos-sdk to v0.54.3), either
by running `go get github.com/celestiaorg/tastora@v0.20.0` or editing
apps/loadgen/go.mod to require the new versions, then run `go mod tidy` and `go
list -m all` to verify the upgraded versions and re-run the vulnerability scan
(e.g., `govulncheck ./...`) to confirm the fixes.

In `@apps/loadgen/internal/config.go`:
- Around line 39-49: The loop that maps environment variables into cfg currently
uses strconv.Atoi and falls back to the original string, which turns
BENCH_REBROADCAST="true" into the JSON string "true"; change the numeric parsing
to use strconv.ParseUint(val, 10, 64) (for example in the same loop that
iterates envMapping) and, on success, assign the parsed uint64 value into
cfg[cfgKey]; on parse failure keep the original string as before. Also update
the corresponding test in config_test.go to expect a numeric value (uint64/JSON
number) for the BENCH_REBROADCAST entry instead of the string "true". Ensure
references include the envMapping loop and the BENCH_REBROADCAST key so the
change is applied to the correct mapping.

In `@apps/loadgen/README.md`:
- Around line 7-12: The README contains fenced code blocks with ASCII diagrams
(e.g., the block starting with "ev-loadgen (this binary)  -->  spamoor-daemon 
-->  ev-reth RPC") that are missing language identifiers and trigger MD040;
update each such triple-backtick fence (including the block at lines referenced
and the other block around lines 19-23) to include an appropriate language tag
like `text` or `bash` (e.g., ```text) so markdown lint passes and the blocks
render correctly.

---

Nitpick comments:
In `@apps/loadgen/cmd/flags_test.go`:
- Around line 10-12: Remove the redundant registration of the "start"
subcommand: NewRootCmd() already adds newStartCmd(), so drop the creation of
root and the extra root.AddCommand(startCmd) call from the test; keep the local
startCmd used for assertions (created via newStartCmd()) and remove references
to NewRootCmd(), root, and root.AddCommand to avoid registering a duplicate
start command.

In `@apps/loadgen/cmd/start.go`:
- Around line 56-129: The function runScheduler uses plain log.Printf; replace
it with a leveled/structured logger and use appropriate levels for events (info
for normal lifecycle messages, error for failures, warn for cleanup warnings).
Instantiate or accept a structured logger (e.g., logger variable or parameter)
in runScheduler, then change all log.Printf calls inside runScheduler and the
nested runWorkload closure to logger.Infof/logger.Warnf/logger.Errorf (or
equivalent) and include structured fields like "label", "txCount", "interval",
"regularMatrix", "burstMatrix", and "spamoor" to make messages machine-readable;
also propagate or pass the logger into any calls that need it (or include it in
context) for internal.WaitForSync, internal.ExecuteMatrixWithOverridesFromFile,
and internal.DeleteAllSpammers so their failures are logged with error level and
contextual fields.
- Around line 79-119: The tick handlers currently spawn goroutines
unconditionally which can pile up because runWorkload serializes on mu; add a
short-circuit "in-flight" guard (e.g., an int32 atomic flag or a token channel)
to skip launching when a workload is already running: declare inFlight int32 (or
a chan struct{} semaphore) near the function, change the ticker and burstTimer
cases to do an atomic.CompareAndSwapInt32(&inFlight, 0, 1) (or try to
receive/send on the token channel) and only call go runWorkload(...) when the
CAS succeeds, and ensure runWorkload clears the flag on exit (defer
atomic.StoreInt32(&inFlight, 0) or release the token) so subsequent ticks can
run; keep existing mu locking inside runWorkload and preserve cleanup on
ctx.Done(). Use the same mechanism for both "regular" and "burst" launches to
coalesce/skips overlapping runs.

In `@apps/loadgen/docker-compose.yml`:
- Line 5: Replace the floating image reference "image:
ethpandaops/spamoor:latest" with an immutable image reference (either a specific
version tag like "ethpandaops/spamoor:vX.Y.Z" or a content-addressable digest
"ethpandaops/spamoor@sha256:...") so the loadgen service uses a reproducible
artifact; update the compose service's image field where "image:
ethpandaops/spamoor:latest" appears and ensure any CI/docs that trigger builds
are updated to pin or update the tag/digest when intentional.

In `@apps/loadgen/internal/client.go`:
- Around line 67-81: GetClients currently returns raw errors from the HTTP GET
and JSON decode; update spamoorAPIClient.GetClients to wrap those errors with
contextual messages (e.g., using fmt.Errorf) so callers see what operation
failed. Specifically, wrap the error returned by c.client.Get(url) with context
like "GET /api/clients failed" and wrap the error from
json.NewDecoder(resp.Body).Decode(&raw) with context like "decode GetClients
response failed"; keep existing behavior for non-2xx handling and closing
resp.Body, but replace bare err returns with fmt.Errorf including the original
err.

In `@apps/loadgen/internal/matrix.go`:
- Around line 104-114: The envInt helper currently swallows parse errors; change
envInt(key) to return (int, error) instead of silently falling back so callers
can fail fast for required inputs like BENCH_NUM_SPAMMERS and
BENCH_COUNT_PER_SPAMMER; specifically, update envInt to: 1) return fallback
value only when the key is absent (or keep current semantics if fallback is
intended) but return a non-nil error when strconv.Atoi fails, 2) update all
callers of envInt (e.g., where BENCH_NUM_SPAMMERS and BENCH_COUNT_PER_SPAMMER
are read) to handle the error and surface it (return error up the stack or log
and exit), and 3) ensure error messages include the env key and the invalid
value to aid debugging.
- Around line 79-102: The validate method on Entry should also parse and
validate the Timeout string early: when e.Timeout is non-empty call
time.ParseDuration(e.Timeout) (add the "time" import) and return a clear
fmt.Errorf if parsing fails (e.g., "invalid timeout %q: %v"), so invalid
durations like "2x" fail during Entry.validate rather than later in the runner.

In `@apps/loadgen/internal/runner.go`:
- Around line 337-353: In waitForSync's polling loop, when getSpamoorHeight
returns an error the code currently leaves lastHeight unchanged which can
inflate the next delta; update the error branch in waitForSync to reset
lastHeight to 0 (or another sentinel) after logging the error so the next
successful getSpamoorHeight call establishes a fresh baseline before computing
delta against syncThreshold using pollInterval; modify the block that handles
err from getSpamoorHeight to set lastHeight = 0.

In `@apps/loadgen/main.go`:
- Around line 17-21: The init() currently configures logging and also prints a
startup banner ("ev-loadgen starting"), which runs for every entrypoint; remove
the fmt.Fprintln(os.Stderr, "ev-loadgen starting") from init() so init() only
does log.SetFlags and log.SetOutput, and relocate the banner to main() (or a
Cobra PersistentPreRun if using Cobra) so the message is emitted only when the
workload actually starts; update references in main() or the command's
PersistentPreRun to print the same startup message to os.Stderr.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 864cc402-06e5-49ca-83c4-83444b6132aa

📥 Commits

Reviewing files that changed from the base of the PR and between 6b2629e and 3e6d05e.

⛔ Files ignored due to path filters (5)
  • apps/evm/go.sum is excluded by !**/*.sum
  • apps/grpc/go.sum is excluded by !**/*.sum
  • apps/loadgen/go.sum is excluded by !**/*.sum
  • apps/testapp/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (29)
  • .github/workflows/ci.yml
  • .gitignore
  • .just/build.just
  • .just/loadgen.just
  • apps/evm/go.mod
  • apps/grpc/go.mod
  • apps/loadgen/Dockerfile
  • apps/loadgen/README.md
  • apps/loadgen/cmd/check.go
  • apps/loadgen/cmd/flags_test.go
  • apps/loadgen/cmd/root.go
  • apps/loadgen/cmd/run.go
  • apps/loadgen/cmd/start.go
  • apps/loadgen/cmd/start_test.go
  • apps/loadgen/docker-compose.yml
  • apps/loadgen/go.mod
  • apps/loadgen/internal/client.go
  • apps/loadgen/internal/config.go
  • apps/loadgen/internal/config_test.go
  • apps/loadgen/internal/matrix.go
  • apps/loadgen/internal/matrix_test.go
  • apps/loadgen/internal/runner.go
  • apps/loadgen/internal/runner_test.go
  • apps/loadgen/main.go
  • apps/loadgen/matrices/baseline.json
  • apps/loadgen/matrices/burst.json
  • apps/testapp/go.mod
  • go.mod
  • justfile

Comment thread apps/loadgen/Dockerfile
Comment thread apps/loadgen/go.mod
github.com/cockroachdb/pebble v1.1.5 // indirect
github.com/cockroachdb/redact v1.1.5 // indirect
github.com/cockroachdb/tokenbucket v0.0.0-20230807174530-cc333fc44b06 // indirect
github.com/cometbft/cometbft v0.38.17 // indirect
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify why vulnerable modules are present and whether upgrades are available.
# 1) Confirm dependency path
go mod why -m github.com/cometbft/cometbft
go mod why -m github.com/cosmos/cosmos-sdk

# 2) List available updates impacting these modules
go list -m -u all | rg 'github.com/(celestiaorg/tastora|cometbft/cometbft|cosmos/cosmos-sdk)'

# 3) Re-run vuln scan after update in CI/local pipeline
# (command depends on tooling in your environment)
# govulncheck ./...

Repository: evstack/ev-node

Length of output: 2478


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Snippet from apps/loadgen/go.mod =="
nl -ba apps/loadgen/go.mod | sed -n '35,60p'

echo
echo "== go env (quick context) =="
(cd apps/loadgen && go env GOMOD GOPATH)

echo
echo "== Dependency paths within apps/loadgen =="
(cd apps/loadgen && go mod why -m github.com/cometbft/cometbft || true)
(cd apps/loadgen && go mod why -m github.com/cosmos/cosmos-sdk || true)

echo
echo "== Available updates impacting these modules (within apps/loadgen) =="
(cd apps/loadgen && go list -m -u all | rg 'github.com/(celestiaorg/tastora|cometbft/cometbft|cosmos/cosmos-sdk)' || true)

echo
echo "== govulncheck (if available) =="
if command -v govulncheck >/dev/null 2>&1; then
  (cd apps/loadgen && govulncheck ./... || true)
else
  echo "govulncheck not installed in this environment."
fi

Repository: evstack/ev-node

Length of output: 159


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Snippet from apps/loadgen/go.mod (lines 35-60) =="
awk 'NR>=35 && NR<=60 {printf "%d\t%s\n", NR, $0}' apps/loadgen/go.mod

echo
echo "== go env (apps/loadgen context) =="
(cd apps/loadgen && go env GOMOD GOPATH)

echo
echo "== Dependency paths within apps/loadgen =="
(cd apps/loadgen && go mod why -m github.com/cometbft/cometbft || true)
(cd apps/loadgen && go mod why -m github.com/cosmos/cosmos-sdk || true)

echo
echo "== Available updates impacting these modules (within apps/loadgen) =="
if command -v rg >/dev/null 2>&1; then
  (cd apps/loadgen && go list -m -u all | rg 'github.com/(celestiaorg/tastora|cometbft/cometbft|cosmos/cosmos-sdk)' || true)
else
  (cd apps/loadgen && go list -m -u all | grep -E 'github.com/(celestiaorg/tastora|cometbft/cometbft|cosmos/cosmos-sdk)' || true)
fi

echo
echo "== govulncheck (if available) =="
if command -v govulncheck >/dev/null 2>&1; then
  (cd apps/loadgen && govulncheck ./... || true)
else
  echo "govulncheck not installed in this environment."
fi

Repository: evstack/ev-node

Length of output: 5228


Upgrade vulnerable transitive deps in apps/loadgen/go.mod before release

apps/loadgen/go.mod lists github.com/cometbft/cometbft v0.38.17 (line 42) and github.com/cosmos/cosmos-sdk v0.50.12 (line 49) as indirect dependencies, pulled in via github.com/celestiaorg/tastora/framework/... (see go mod why paths). go list -m -u all shows available upgrades: github.com/celestiaorg/tastora v0.20.0, github.com/cometbft/cometbft v1.0.1, and github.com/cosmos/cosmos-sdk v0.54.3. Upgrade the chain and re-run the vulnerability scan in CI (e.g., govulncheck ./...).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/go.mod` at line 42, apps/loadgen/go.mod pulls vulnerable
transitive modules (github.com/cometbft/cometbft and
github.com/cosmos/cosmos-sdk) via github.com/celestiaorg/tastora; update the
transitive chain by bumping tastora to v0.20.0 (which brings cometbft to v1.0.1
and cosmos-sdk to v0.54.3), either by running `go get
github.com/celestiaorg/tastora@v0.20.0` or editing apps/loadgen/go.mod to
require the new versions, then run `go mod tidy` and `go list -m all` to verify
the upgraded versions and re-run the vulnerability scan (e.g., `govulncheck
./...`) to confirm the fixes.

Comment thread apps/loadgen/internal/config.go
Comment thread apps/loadgen/README.md Outdated
chatton and others added 2 commits June 2, 2026 13:33
- run container as non-root user
- use ParseUint for numeric env values in scenario config
- add language identifiers to fenced code blocks (MD040)
- wrap bare errors with context in client.go
- remove redundant root cmd in flags_test.go
- validate timeout duration during matrix validation
- remove startup banner from init()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
apps/loadgen/cmd/start.go (3)

110-120: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reschedule bursts against the remaining day, not a fresh 24h window.

After a burst fires, nextBurstTimer(burstsRemaining, burstWindow) ignores how much time is left before resetTimer fires. If an earlier burst happened late, the next timer is frequently pushed past the pending reset and gets canceled, so burstPerDay is not actually honored. Use time.Until(nextReset) for the follow-up delay window instead of another full burstWindow.

Also applies to: 141-156

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/cmd/start.go` around lines 110 - 120, The burst scheduling
currently calls nextBurstTimer(burstsRemaining, burstWindow) after each burst
which resets the follow-up delay to a full burstWindow and ignores the remaining
time until resetTimer fires; update the logic in the run loop where burstTimer
and burstsRemaining are updated (references: burstTimer, resetTimer,
burstsRemaining, nextBurstTimer, burstWindow, resetTimer.Reset) to compute the
remaining window until the next reset (e.g., time.Until(nextReset)) and pass
that reduced window into nextBurstTimer so subsequent bursts are scheduled only
within the remaining day; apply the same change to the other similar block
handling bursts (the later occurrence around where burstsRemaining is
decremented and burstTimer reset) so both places use the time until the pending
reset rather than a fresh burstWindow.

66-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Zero txCount here means "use matrix defaults", not "run nothing".

validateStartConfig allows zero values, and regularTxPerRun truncates fractional daily budgets down to zero. Downstream, apps/loadgen/internal/runner.go only applies the BENCH_COUNT_PER_SPAMMER override when totalTxTarget > 0, so --tx-per-day=0, --burst-tx-count=0, or --tx-per-day=1 --interval=1h will still execute the matrix at its baked-in counts. Skip launches when txCount == 0, and either reject or carry fractional budget forward instead of truncating it away. As per coding guidelines, "Validate all inputs from external sources in Go code".

Also applies to: 77-80, 84-85, 110-111, 125-137

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/cmd/start.go` around lines 66 - 67, The calculation in start.go
(runsPerDay and regularTxPerRun) currently truncates fractional budgets to zero
and treats zero as "run nothing"; instead preserve fractional budget or reject
ambiguous zeroes and treat zero as the sentinel meaning "use matrix defaults".
Update the logic around regularTxPerRun (the variables runsPerDay,
regularTxPerRun) and validateStartConfig so that: 1) inputs are validated (no
negatives; reject or require explicit flags if the caller intended zero vs
defaults), 2) per-run budget does not silently truncate fractional daily budgets
to zero — either carry the fractional remainder forward using a float
accumulator on the start config (e.g., txBudgetFloat / runsPerDay) and
distribute it across runs, or use rounding (math.Ceil) when appropriate so small
budgets aren’t lost, and 3) explicitly treat txCount==0 as the sentinel to "use
matrix defaults" (so do not skip applying matrix defaults) and ensure
runner.go’s BENCH_COUNT_PER_SPAMMER behavior is preserved by checking this
sentinel instead of relying on totalTxTarget > 0.

77-85: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wait for workload goroutines to finish before calling internal.DeleteAllSpammers during shutdown.

In apps/loadgen/cmd/start.go, runWorkload is launched as detached goroutines (immediately and on ticker/burst), but when ctx.Done() fires the scheduler immediately calls internal.DeleteAllSpammers(api) and returns without waiting. Because SpamoorClient methods don’t take context.Context, in-flight matrix execution (and per-entry deferred api.DeleteSpammer) can still be running while global cleanup deletes the same spammers, leading to shutdown races, cancellation/timed-out errors, and spurious cleanup warnings. Track launched goroutines with a sync.WaitGroup or errgroup and join them before DeleteAllSpammers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/cmd/start.go` around lines 77 - 85, The shutdown races occur
because runWorkload goroutines (launched from start.go, including the immediate
goroutine and those started by the ticker/burst logic) are not awaited before
calling internal.DeleteAllSpammers(api); update the scheduler to track all
launched workers with a sync.WaitGroup or an errgroup tied to ctx, incrementing
the counter inside the code that invokes runWorkload and every goroutine started
from the ticker/burst paths, then on ctx.Done() wait for the group to finish
(wg.Wait() or group.Wait()) before calling internal.DeleteAllSpammers(api) so
in-flight ExecuteMatrixWithOverridesFromFile and per-entry deferred
api.DeleteSpammer calls complete first.
♻️ Duplicate comments (1)
apps/loadgen/internal/config.go (1)

39-48: ⚠️ Potential issue | 🟠 Major

Reject invalid numeric BENCH_* overrides instead of serializing strings.

The else branch still forwards bad external input as the wrong JSON type. With the current mapping, values like BENCH_REBROADCAST=true or BENCH_THROUGHPUT=-1 become strings here and the failure is deferred to the spamoor API call instead of being caught at config build time. Please fail fast on parse errors rather than populating cfg[cfgKey] = val. As per coding guidelines, "Validate all inputs from external sources in Go code".

What JSON types does spamoor expect for the scenario config fields total_count, throughput, max_pending, max_wallets, gas_units_to_burn, base_fee, tip_fee, and rebroadcast?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/loadgen/internal/config.go` around lines 39 - 48, The code currently
converts env values to uint64 with strconv.ParseUint and on parse failure writes
the raw string into cfg (envMapping/cfg), which allows invalid types to pass
through; instead, fail fast: when strconv.ParseUint returns an error for an
envKey mapped to a numeric cfgKey, return (or propagate) an error describing the
invalid override (include envKey and val) rather than assigning cfg[cfgKey]=val;
for boolean-like keys such as rebroadcast, use strconv.ParseBool and likewise
return an error on failure; update the block referencing envMapping, envKey,
cfgKey, strconv.ParseUint and cfg to perform strict parsing and error
propagation so only correctly typed numeric/boolean values populate cfg.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@apps/loadgen/cmd/start.go`:
- Around line 110-120: The burst scheduling currently calls
nextBurstTimer(burstsRemaining, burstWindow) after each burst which resets the
follow-up delay to a full burstWindow and ignores the remaining time until
resetTimer fires; update the logic in the run loop where burstTimer and
burstsRemaining are updated (references: burstTimer, resetTimer,
burstsRemaining, nextBurstTimer, burstWindow, resetTimer.Reset) to compute the
remaining window until the next reset (e.g., time.Until(nextReset)) and pass
that reduced window into nextBurstTimer so subsequent bursts are scheduled only
within the remaining day; apply the same change to the other similar block
handling bursts (the later occurrence around where burstsRemaining is
decremented and burstTimer reset) so both places use the time until the pending
reset rather than a fresh burstWindow.
- Around line 66-67: The calculation in start.go (runsPerDay and
regularTxPerRun) currently truncates fractional budgets to zero and treats zero
as "run nothing"; instead preserve fractional budget or reject ambiguous zeroes
and treat zero as the sentinel meaning "use matrix defaults". Update the logic
around regularTxPerRun (the variables runsPerDay, regularTxPerRun) and
validateStartConfig so that: 1) inputs are validated (no negatives; reject or
require explicit flags if the caller intended zero vs defaults), 2) per-run
budget does not silently truncate fractional daily budgets to zero — either
carry the fractional remainder forward using a float accumulator on the start
config (e.g., txBudgetFloat / runsPerDay) and distribute it across runs, or use
rounding (math.Ceil) when appropriate so small budgets aren’t lost, and 3)
explicitly treat txCount==0 as the sentinel to "use matrix defaults" (so do not
skip applying matrix defaults) and ensure runner.go’s BENCH_COUNT_PER_SPAMMER
behavior is preserved by checking this sentinel instead of relying on
totalTxTarget > 0.
- Around line 77-85: The shutdown races occur because runWorkload goroutines
(launched from start.go, including the immediate goroutine and those started by
the ticker/burst logic) are not awaited before calling
internal.DeleteAllSpammers(api); update the scheduler to track all launched
workers with a sync.WaitGroup or an errgroup tied to ctx, incrementing the
counter inside the code that invokes runWorkload and every goroutine started
from the ticker/burst paths, then on ctx.Done() wait for the group to finish
(wg.Wait() or group.Wait()) before calling internal.DeleteAllSpammers(api) so
in-flight ExecuteMatrixWithOverridesFromFile and per-entry deferred
api.DeleteSpammer calls complete first.

---

Duplicate comments:
In `@apps/loadgen/internal/config.go`:
- Around line 39-48: The code currently converts env values to uint64 with
strconv.ParseUint and on parse failure writes the raw string into cfg
(envMapping/cfg), which allows invalid types to pass through; instead, fail
fast: when strconv.ParseUint returns an error for an envKey mapped to a numeric
cfgKey, return (or propagate) an error describing the invalid override (include
envKey and val) rather than assigning cfg[cfgKey]=val; for boolean-like keys
such as rebroadcast, use strconv.ParseBool and likewise return an error on
failure; update the block referencing envMapping, envKey, cfgKey,
strconv.ParseUint and cfg to perform strict parsing and error propagation so
only correctly typed numeric/boolean values populate cfg.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fb41e19f-036f-4e69-a32b-b815da44cd89

📥 Commits

Reviewing files that changed from the base of the PR and between 3e6d05e and 9f5d982.

📒 Files selected for processing (10)
  • apps/loadgen/Dockerfile
  • apps/loadgen/README.md
  • apps/loadgen/cmd/flags_test.go
  • apps/loadgen/cmd/start.go
  • apps/loadgen/internal/client.go
  • apps/loadgen/internal/config.go
  • apps/loadgen/internal/config_test.go
  • apps/loadgen/internal/matrix.go
  • apps/loadgen/internal/runner.go
  • apps/loadgen/main.go
💤 Files with no reviewable changes (2)
  • apps/loadgen/cmd/flags_test.go
  • apps/loadgen/main.go
✅ Files skipped from review due to trivial changes (1)
  • apps/loadgen/README.md
🚧 Files skipped from review as they are similar to previous changes (5)
  • apps/loadgen/Dockerfile
  • apps/loadgen/internal/matrix.go
  • apps/loadgen/internal/config_test.go
  • apps/loadgen/internal/client.go
  • apps/loadgen/internal/runner.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants