feat(simulator): deterministic seed based workload fuzz harness #3272

Open
krishvishal wants to merge 4 commits into apache:master from krishvishal:workload-v2

Conversation

@krishvishal
Contributor

  • Adds a deterministic fuzz driver. A single PRNG drives every choice (action, target replica, payload bytes); a name-keyed
    Shadow predicts server entity state; an Auditor matches replies to in-flight requests and tracks per-(client, namespace) bookkeeping.
  • 25 server commands wired through a uniform per-op surface (sample / build_message / classify_reply / predicted_effect). The op_dispatch! macro generates the dispatch table from the op list so forgetting an op is a compile error.
  • Multi-client support; per-(client, namespace) last-commit cursor reflects that each VSR group keeps its own op counter. Reply lookup runs before any cursor mutation, and the reply's header.namespace is cross-checked against the in-flight request_namespace.
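
A minimal sketch of the single-PRNG idea from the first bullet, assuming a SplitMix64 generator as a stand-in (the harness's actual PRNG is not shown on this page). `Prng`, `Action`, `next_choice`, and `trace` are hypothetical names used only for illustration. Because action, replica, and payload are all drawn from one stream, replaying a seed reproduces the same trace.

```rust
/// Stand-in generator (SplitMix64). Assumption: the real harness may use a different PRNG.
pub struct Prng(u64);

impl Prng {
    pub fn new(seed: u64) -> Self {
        Prng(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in 0..n. Modulo bias is ignored in this sketch; n must be non-zero.
    pub fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Hypothetical subset of actions, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    CreateStream,
    DeleteStream,
    SendMessages,
}

/// One fuzz step: action, target replica, and payload bytes all come from the same PRNG.
pub fn next_choice(prng: &mut Prng, replicas: u64) -> (Action, u64, [u8; 4]) {
    const ACTIONS: [Action; 3] = [
        Action::CreateStream,
        Action::DeleteStream,
        Action::SendMessages,
    ];
    let action = ACTIONS[prng.below(ACTIONS.len() as u64) as usize];
    let replica = prng.below(replicas);
    let payload = (prng.next_u64() as u32).to_le_bytes();
    (action, replica, payload)
}

/// Replays `steps` choices from a seed against three replicas.
pub fn trace(seed: u64, steps: usize) -> Vec<(Action, u64, [u8; 4])> {
    let mut prng = Prng::new(seed);
    (0..steps).map(|_| next_choice(&mut prng, 3)).collect()
}
```

Calling `trace(seed, n)` twice with the same seed yields identical vectors, which is the property a failing run relies on for reproduction.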

@github-actions

Thanks for the pull request. It is now waiting for review, labeled S-waiting-on-review.

You can update that label as the review goes back and forth, with slash commands - each on its own line, in a regular PR comment (not an inline review reply):

  • /ready - mark it S-waiting-on-review again, after addressing feedback
  • /author - mark it S-waiting-on-author (maintainers, or anyone who has had a PR merged before)
  • /request-review @user ... - request reviewers (@user or @org/team)

Commands take up to ~90s to apply. If no reaction (👍 or 😕) appears on your comment, the apply step likely failed - check the repo's Actions tab for the PR Triage Apply run. Commands posted inside a review body (rather than a normal comment) cannot be reacted to, so they stay log-only.

See CONTRIBUTING.md for details.

@github-actions github-actions Bot added the S-waiting-on-review (PR is waiting on a reviewer) label May 18, 2026
@codecov

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 49.27169% with 801 lines in your changes missing coverage. Please review.
✅ Project coverage is 24.11%. Comparing base (3628cec) to head (e89f7a6).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/simulator/src/client.rs | 3.62% | 186 Missing ⚠️ |
| core/simulator/src/workload/shadow.rs | 33.65% | 136 Missing and 2 partials ⚠️ |
| core/simulator/src/workload/mod.rs | 68.80% | 31 Missing and 3 partials ⚠️ |
| ...imulator/src/workload/ops/create_consumer_group.rs | 0.00% | 29 Missing ⚠️ |
| ...imulator/src/workload/ops/delete_consumer_group.rs | 0.00% | 29 Missing ⚠️ |
| ...mulator/src/workload/ops/delete_consumer_offset.rs | 0.00% | 28 Missing ⚠️ |
| core/simulator/src/workload/ops/change_password.rs | 0.00% | 27 Missing ⚠️ |
| core/simulator/src/workload/ops/create_user.rs | 0.00% | 26 Missing ⚠️ |
| ...re/simulator/src/workload/ops/delete_partitions.rs | 0.00% | 25 Missing ⚠️ |
| core/simulator/src/workload/ops/update_stream.rs | 0.00% | 24 Missing ⚠️ |

... and 18 more
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3272       +/-   ##
=============================================
- Coverage     73.78%   24.11%   -49.68%     
  Complexity      943      943               
=============================================
  Files          1200     1229       +29     
  Lines        109116    96425    -12691     
  Branches      86007    73333    -12674     
=============================================
- Hits          80515    23251    -57264     
- Misses        25874    72587    +46713     
+ Partials       2727      587     -2140     
| Components | Coverage Δ |
|---|---|
| Rust Core | 8.96% <49.27%> (-65.97%) ⬇️ |
| Java SDK | 58.44% <ø> (ø) |
| C# SDK | 69.16% <ø> (-0.29%) ⬇️ |
| Python SDK | 81.43% <ø> (ø) |
| Node SDK | 91.41% <ø> (ø) |
| Go SDK | 39.91% <ø> (ø) |
| Files with missing lines | Coverage Δ |
|---|---|
| core/simulator/src/workload/ids.rs | 100.00% <100.00%> (ø) |
| ...r/src/workload/ops/create_personal_access_token.rs | 100.00% <100.00%> (ø) |
| core/simulator/src/workload/ops/create_stream.rs | 100.00% <100.00%> (ø) |
| core/simulator/src/workload/ops/delete_stream.rs | 100.00% <100.00%> (ø) |
| core/simulator/src/workload/options.rs | 100.00% <100.00%> (ø) |
| core/simulator/src/workload/ops/send_messages.rs | 97.14% <97.14%> (ø) |
| ...imulator/src/workload/ops/store_consumer_offset.rs | 97.29% <97.29%> (ø) |
| ...lator/src/workload/ops/store_consumer_offset_v2.rs | 97.67% <97.67%> (ø) |
| core/simulator/src/workload/ops/mod.rs | 92.85% <92.85%> (ø) |
| core/simulator/src/lib.rs | 92.46% <98.75%> (+4.34%) ⬆️ |

... and 23 more

... and 670 files with indirect coverage changes

@numinnex
Contributor

LGTM +1 binding

// submitted to. A mismatch means the reply landed in the wrong
// VSR group's bookkeeping; refuse to apply effects against the
// wrong shadow bucket.
if entry.request_namespace != header.namespace {

The ns-mismatch path removes the in_flight entry and then returns None; the caller at mod.rs:152-156 skips the in_flight_per_client decrement on that branch. This is unreachable today (the server echoes the request namespace verbatim, and request ids are globally monotonic per SimClient), but a future routing or dedup bug would manifest as a silent single-client wedge at CLIENT_REQUEST_QUEUE_MAX=1 instead of a clean replies_unknown count. The cleanest fix is to return an enum, Match(InFlight) | NsMismatch | Unknown, so the caller decrements on NsMismatch without applying effects.
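
A rough sketch of the three-way return suggested above. Everything except the variant names is an assumption: the key and namespace types, the `submit` helper, and `handle_reply` are hypothetical stand-ins, not the PR's actual signatures.

```rust
use std::collections::HashMap;

/// Stand-in for the in-flight entry; the real struct has more fields.
#[derive(Debug, PartialEq, Eq)]
pub struct InFlight {
    pub request_namespace: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplyMatch {
    Match(InFlight),
    NsMismatch,
    Unknown,
}

#[derive(Default)]
pub struct Auditor {
    in_flight: HashMap<u64, InFlight>, // keyed by request id (assumed)
    pub replies_unknown: u64,
}

impl Auditor {
    pub fn submit(&mut self, request: u64, namespace: u64) {
        self.in_flight.insert(request, InFlight { request_namespace: namespace });
    }

    /// Consumes the in-flight entry and reports which of the three cases occurred.
    pub fn on_reply(&mut self, request: u64, reply_namespace: u64) -> ReplyMatch {
        match self.in_flight.remove(&request) {
            None => {
                self.replies_unknown += 1;
                ReplyMatch::Unknown
            }
            Some(entry) if entry.request_namespace != reply_namespace => ReplyMatch::NsMismatch,
            Some(entry) => ReplyMatch::Match(entry),
        }
    }
}

/// Caller side: returns true only when effects should be applied.
pub fn handle_reply(
    auditor: &mut Auditor,
    in_flight_per_client: &mut HashMap<u128, u32>,
    client: u128,
    request: u64,
    reply_namespace: u64,
) -> bool {
    let outcome = auditor.on_reply(request, reply_namespace);
    if matches!(outcome, ReplyMatch::Unknown) {
        return false;
    }
    // Match and NsMismatch both consumed an entry, so both release the client's slot.
    if let Some(count) = in_flight_per_client.get_mut(&client) {
        *count = count.checked_sub(1).expect("in_flight underflow");
    }
    matches!(outcome, ReplyMatch::Match(_))
}
```

With this shape the mismatch branch cannot leave a client stuck at its queue limit, because the decrement is tied to entry consumption rather than to effect application.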

match outcome {
    Outcome::Success => {
        let user = shadow.pick_user_name(prng)?;
        let current_password = format!("pw-{user}");

format!("pw-{user}") is correct only before the first commit. Once the first ChangePassword succeeds, the server stores new_password, but the next sample() rebuilds pw-{user}, so a second ChangePassword on the same user mismatches the server. This is dormant under all-Success classification and the weight=0 default. Fix: have the shadow track each user's current password; seed pw-{user} on AddUser, and update it on the ChangePassword effect and on RenameUser.
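
One way the per-user password tracking could look, as a sketch only. `UserShadow` and its methods are hypothetical; the real Shadow keeps users alongside other entity sets, and this sketch simply carries the stored password across a rename rather than modelling the effect payload.

```rust
use std::collections::HashMap;

/// Hypothetical slice of the shadow: user name -> password the server is expected to hold.
#[derive(Default)]
pub struct UserShadow {
    passwords: HashMap<String, String>,
}

impl UserShadow {
    /// AddUser effect: seed the initial password convention.
    pub fn add_user(&mut self, user: &str) {
        self.passwords.insert(user.to_string(), format!("pw-{user}"));
    }

    /// ChangePassword effect: remember what the server now stores.
    /// Returns false if the user is not in the shadow.
    pub fn change_password(&mut self, user: &str, new_password: &str) -> bool {
        match self.passwords.get_mut(user) {
            Some(current) => {
                *current = new_password.to_string();
                true
            }
            None => false,
        }
    }

    /// RenameUser effect: the password follows the user to the new name.
    pub fn rename_user(&mut self, old: &str, new: &str) -> bool {
        match self.passwords.remove(old) {
            Some(password) => {
                self.passwords.insert(new.to_string(), password);
                true
            }
            None => false,
        }
    }

    /// What sample() would read instead of rebuilding pw-{user}.
    pub fn current_password(&self, user: &str) -> Option<&str> {
        self.passwords.get(user).map(String::as_str)
    }
}
```

sample() would then call `current_password(&user)` when building a ChangePassword request, so a second change or a change after a rename sends the password the server actually holds.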

        new: String,
    },
    RenameUser {
        old: String,

RenameUser { old, new } carries no pw field. Combined with the pw-{user} reconstruction at change_password.rs:52, a ChangePassword issued after RenameUser derives the wrong password. Add a pw payload to the RenameUser effect and propagate it through shadow.rs:227-231. Bundle this with the change_password fix.

} => {
    if self.stream_names.contains(&stream) {
        self.topic_names.insert((stream, name));
    }

AddTopic silently no-ops when the parent stream is gone; the same pattern appears at AddConsumerGroup (lines 197-205). auditor.note_committed at mod.rs:174 runs unconditionally, so commits_per_action diverges from the shadow under multi-client interleaving. This has the same root cause as the multi-client rename collision below. A single fix covers both: Shadow::apply returns ApplyResult { applied: bool }, and note_committed fires only when applied = true.
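
A sketch of the ApplyResult gating described above. `BTreeSet` stands in for the shadow's `IndexSet`, and the `Action`, `Effect`, and `on_commit` shapes are simplified assumptions rather than the PR's real definitions.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ApplyResult {
    pub applied: bool,
}

/// Hypothetical one-variant enums, enough to show the control flow.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Action {
    CreateTopic,
}

pub enum Effect {
    AddTopic { stream: String, name: String },
}

#[derive(Default)]
pub struct Shadow {
    pub stream_names: BTreeSet<String>,
    pub topic_names: BTreeSet<(String, String)>,
}

impl Shadow {
    /// Reports whether the effect actually changed the shadow.
    pub fn apply(&mut self, effect: Effect) -> ApplyResult {
        match effect {
            Effect::AddTopic { stream, name } => {
                if !self.stream_names.contains(&stream) {
                    // Parent stream is gone: tell the caller instead of dropping silently.
                    return ApplyResult { applied: false };
                }
                self.topic_names.insert((stream, name));
                ApplyResult { applied: true }
            }
        }
    }
}

#[derive(Default)]
pub struct Auditor {
    pub commits_per_action: HashMap<Action, u64>,
}

impl Auditor {
    pub fn note_committed(&mut self, action: Action) {
        *self.commits_per_action.entry(action).or_insert(0) += 1;
    }
}

/// Commit counting is gated on the shadow having applied the effect.
pub fn on_commit(shadow: &mut Shadow, auditor: &mut Auditor, action: Action, effect: Effect) {
    if shadow.apply(effect).applied {
        auditor.note_committed(action);
    }
}
```

The same return value would let the rename paths report a dropped second rename, which is why the review treats both findings as one change.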


fn rename_stream(&mut self, old: &str, new: &str) {
    if !self.stream_names.shift_remove(old) {
        return;

rename_stream's if !shift_remove(old) { return; } silently drops the second concurrent rename when two clients picked the same old name. The same pattern appears at rename_topic (lines 271-277). CLIENT_REQUEST_QUEUE_MAX=1 is per-client; the second client's pick_*_name reads the shadow before the first rename's reply lands, so collisions are possible across clients. This is dormant under default weights, and the same ApplyResult change as the AddTopic finding above closes it.

self.stream_names.insert(new.to_string());
// Rename in (stream, topic) and (stream, topic, group):
// collect-then-rebuild keeps the loop borrow simple.
let old_topics: Vec<(String, String)> = self

rename_stream calls shift_remove inside the topic loop. Each shift_remove is O(N), so the total is O(N*M). This is a cold path today (UpdateStream weight=0 by default). A single rebuild pass collapses it to O(N).
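
A sketch of the single rebuild pass, using a `Vec` as a stand-in for the insertion-ordered set so the example needs only the standard library. `rename_stream_in_topics` is a hypothetical helper name; with `IndexSet` the same `into_iter().map(..).collect()` shape should apply.

```rust
/// Rewrites the stream component of every (stream, topic) key in one O(N) pass,
/// preserving the original order. No per-element removal is involved.
pub fn rename_stream_in_topics(
    topics: Vec<(String, String)>,
    old: &str,
    new: &str,
) -> Vec<(String, String)> {
    topics
        .into_iter()
        .map(|(stream, topic)| {
            if stream == old {
                (new.to_string(), topic)
            } else {
                (stream, topic)
            }
        })
        .collect()
}
```

The caller would swap the rebuilt collection in with something like `std::mem::take` on the field, which avoids the collect-then-remove loop entirely.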

pub namespaces_live: IndexSet<IggyNamespace>,

/// Live streams by name. `CreateStream` inserts; `DeleteStream` removes.
pub stream_names: IndexSet<String>,

IndexSet<String> for entity names grows linearly with creates, and fresh_name allocates a new String each time. This is bounded today by tick_budget per run. Flagging for Arc<str> interning when the long-running harness lands.
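
For reference, a minimal sketch of what Arc<str> interning could look like. `NameInterner` is a hypothetical type, and a plain `HashSet` is used here instead of the shadow's `IndexSet`.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical interner: each distinct name is allocated once and shared afterwards.
#[derive(Default)]
pub struct NameInterner {
    names: HashSet<Arc<str>>,
}

impl NameInterner {
    /// Returns the shared handle for `name`, allocating only on first sight.
    pub fn intern(&mut self, name: &str) -> Arc<str> {
        if let Some(existing) = self.names.get(name) {
            return Arc::clone(existing);
        }
        let fresh: Arc<str> = Arc::from(name);
        self.names.insert(Arc::clone(&fresh));
        fresh
    }
}
```

Keys such as (stream, topic) could then hold `Arc<str>` clones, so renames and lookups copy a pointer instead of a `String`.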


self.auditor.note_committed(entry.action);

if let Some(count) = self.in_flight_per_client.get_mut(&header.client) {

count.saturating_sub(1) will mask any future double-decrement once the auditor on_reply consumed-but-rejected hardening lands. Switch to checked_sub(1).expect("in_flight underflow") so a violated invariant fails loudly.
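
The suggested change, sketched in isolation. `release_slot` is a hypothetical helper, and the `u128` client id and `u32` counter types are assumptions.

```rust
use std::collections::HashMap;

/// Releases one in-flight slot for `client`.
/// Panics on underflow instead of clamping at zero, so a double decrement is visible.
pub fn release_slot(in_flight_per_client: &mut HashMap<u128, u32>, client: u128) {
    if let Some(count) = in_flight_per_client.get_mut(&client) {
        *count = count.checked_sub(1).expect("in_flight underflow");
    }
}
```

`saturating_sub` would leave the counter at 0 on a second decrement and keep running; `checked_sub` returns `None` there, and `expect` turns that into a panic at the point of the bug.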

}

#[must_use]
pub const fn predicted_effect(_input: &Input, outcome: Outcome) -> Effect {

predicted_effect = Effect::None is acceptable today because shadow.sends_committed is keyed by IggyNamespace (packed stream/topic/partition) and the purge request carries only a name-based WireIdentifier. The shadow has no name -> ns reverse index, so it cannot zero the right keys. Add a TODO so this surfaces when that reverse index lands and the offset clamp at store_consumer_offset.rs:63-64 starts validating against post-purge state.

Outcome::Success
}

#[must_use]

Same as purge_stream.rs:62. Add a matching TODO.

@github-actions github-actions Bot added the S-waiting-on-author (PR is waiting on author response) label and removed the S-waiting-on-review (PR is waiting on a reviewer) label May 20, 2026

Labels

S-waiting-on-author: PR is waiting on author response

3 participants