diff --git a/vale_styles/config/vocabularies/Base/accept.txt b/vale_styles/config/vocabularies/Base/accept.txt index 6f77a50a3..b371bf10c 100644 --- a/vale_styles/config/vocabularies/Base/accept.txt +++ b/vale_styles/config/vocabularies/Base/accept.txt @@ -220,3 +220,4 @@ test_run unprojected libpg_query heredocs +Jaccard diff --git a/versioned_docs/version-4.0.0/running-keploy/mock-matching.md b/versioned_docs/version-4.0.0/running-keploy/mock-matching.md new file mode 100644 index 000000000..15da60253 --- /dev/null +++ b/versioned_docs/version-4.0.0/running-keploy/mock-matching.md @@ -0,0 +1,91 @@ +--- +id: mock-matching +title: How Keploy Matches Mocks per Protocol +sidebar_label: Mock Matching Reference +description: Reference for what Keploy compares when matching outgoing calls to recorded Mocks, per protocol, and how to control noise and fuzzy matching. +tags: + - mocks + - mock matching + - noise + - fuzzy matching + - deterministic replay +keywords: + - mock matching + - noise + - fuzzy match + - mock mismatch + - deterministic replay + - req_body_noise +--- + +During replay, Keploy intercepts every outgoing call your application makes and answers it from the recorded Mocks instead of the real dependency. This page is the reference for **what is compared per protocol**, what happens when nothing matches (a _mock miss_), and which knobs control the behavior. + +> Keploy has two separate matching layers. **Test assertions** compare your API's recorded response with the live response and decide pass/fail — configured with `globalNoise` (see [Configuration file](configuration-file.md)). **Mock matching** (this page) selects which recorded Mock answers each outgoing call. The two layers share one noise vocabulary: a field path such as `body.user.created_at` or `header.date` means the same thing in both. + +## What each protocol checks + +| Protocol | Matched on | Ignored by default | On a mock miss, your app sees | +| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------- | +| **HTTP** | URL path, method, content type, header **keys**, query parameter **keys**, request body (exact, then JSON field structure, then fuzzy) | ~60 known volatile headers (`Authorization`, `X-Amz-Date`, `Traceparent`, `X-Request-Id`, webhook signatures, and similar), header **values**, learned `req_body_noise` fields | A synthetic `502 Bad Gateway` with the body `keploy: no matching mock found for this HTTP request` | +| **gRPC** | Method, metadata keys, decoded message structure and content | Volatile metadata, compression details | A `NotFound` gRPC status | +| **MySQL** | Handshake (character set, max packet size), then query text — exact first, then query _structure_ (literals and bind parameters normalized), with recorded-order fallbacks for repeated identical queries | Bind parameter values when the query structure is unambiguous; fresh generated values (for example a UUID written then read back) resolve by recorded order | A closed connection (drivers report a communication error such as SQLSTATE 08S01) | +| **PostgreSQL** | Wire message shape and query fingerprint; bind values act as a tiebreaker; repeated identical statements resolve in recorded order | Volatile bind positions (for example freshly generated UUIDs) inferred automatically | An in-protocol error response (`KP001`) naming the unmatched query | +| **MongoDB** | Command name, collection, and message structure; values are compared by type, not content | Timestamps, ObjectIDs, cursor ids, and session identifiers cannot break a match | The connection closes and the driver reconnects; misses are logged but do not interrupt the driver | +| **DNS** | Query name and type | — | A DNS failure for the query | +| **Generic (any other TCP)** | Raw bytes — exact first, then similarity (Jaccard) with thresholds 0.9 (per-test Mocks) and 0.4 (session Mocks) | Byte ranges covered by the Mock's `noise:` regular expressions | The connection closes with no bytes | + +A few practical consequences of this table: + +- **HTTP header and query values are not matched** — only their key sets. If a dependency call fails to match, the cause is the path, the method, the key sets, or the body. +- **You usually don't need noise for database traffic.** The MySQL, PostgreSQL, and MongoDB matchers absorb drifting values (timestamps, generated ids) through structure-aware matching and recorded-order fallbacks. Noise configuration matters most for HTTP request bodies and for response assertions. +- **Protocol coverage depends on the binary.** The official release binaries and the Enterprise agent include all protocol matchers listed above. A Keploy binary you build from the open-source repository yourself handles HTTP, MySQL, DNS, and everything else through the Generic byte matcher. + +## The noise vocabulary + +One grammar describes an ignorable field everywhere: + +- `body.` — a JSON field, for example `body.data.created_at` +- `header.` — a header, for example `header.date` + +Three places accept it: + +1. **`globalNoise` in `keploy.yml`** — applies to response assertions, and (for the `body` bucket) to HTTP request-body matching during replay, so a field you declared noisy can never cause a mock miss either. +2. **`spec.assertions.noise` in a Test-Case YAML file** — per-test noise, written automatically at record time for timestamp-like fields. +3. **`req_body_noise` on an HTTP Mock** — request-body fields that Keploy _learned_ drift between recording and replay (see the next section). + +When a mock miss is reported, the field paths in the report use this same grammar — you can copy a path from the report straight into `globalNoise`. + +## Learned request-body noise + +Run replay with `--schema-noise-detection` and Keploy diffs each matched HTTP Mock's recorded request body against the live one, recording drifting field paths as `req_body_noise` on that Mock. The learned noise persists onto `mocks.yaml` automatically. + +`--schema-noise-strict` (or `schemaNoiseStrict` in `keploy.yml`) turns the learned noise into an enforcement contract: a Mock that carries `req_body_noise` only matches when every request-body field outside the learned (and user-configured) noise is identical. A drift on an unmarked field fails the test instead of being served silently. The Keploy Kubernetes agent enables this automatically on its replay path. + +## Fuzzy matching and deterministic replay + +When the exact and structural checks leave more than one candidate — or none — some protocols fall back to **similarity matching**: Levenshtein/Jaccard for HTTP, Jaccard for Generic, and partial-shape scoring for MySQL. A similarity pick is a guess, and a wrong guess surfaces as a confusing downstream test failure. + +The `fuzzyMatch` setting (`keploy.yml`) or `--fuzzy-match` flag controls this: + +| Value | Behavior | +| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `warn` (default) | Similarity fallbacks run as before, but every fuzzy-served Mock logs a visible warning with the Mock name, match type, and score. Use these warnings to audit whether your suite depends on fuzzy matching. | +| `on` | Legacy behavior — similarity fallbacks run silently. | +| `off` | **Deterministic replay.** No similarity guessing. Byte-equal matches still serve; several equally valid candidates resolve in recorded order; anything else becomes a structured mock miss that names the closest candidate and the exact fields that differ. | + +`off` is the right setting once your suite is stable: every served Mock is then either an exact match, a structural match, or a recorded-order pick — reproducible run to run. + +## Debugging a mock miss + +When an outgoing call has no matching Mock, Keploy reports it in four places: + +1. **A warning log at the moment of the miss**, naming the call, how far matching got (`no_schema_candidates`, `body_mismatch`, `strict_noise_reject`, `no_match`, or `fuzzy_match_disabled`), the closest recorded Mock, and the per-field differences. +2. **The `MOCKS MISMATCH SUMMARY` table** at the end of the run — printed whenever any test had mock differences, including runs that passed. +3. **The Test-Run report YAML file** — each failed test carries `failure_info.unmatched_calls` with the same structured data (protocol, closest Mock, match phase, field diffs). +4. **`keploy report`** — failed tests render their unmatched outgoing calls _above_ the response diff, because the response diff is usually a downstream symptom of the miss. + +Reading a miss report: + +- Field diffs of kind `value_changed` on dynamic-looking fields (ids, timestamps, tokens) → add the reported path to `globalNoise`, or run with `--schema-noise-detection` to learn it. +- Diffs of kind `missing_in_mock` / `missing_in_live` (fields added or removed) → the request structure changed since recording; re-record the Test-Set with `keploy record`. +- `Missing mocks:` rows in the summary table → `mappings.yaml` references Mocks that no longer exist in the Mock file (pruned, deduplicated, or edited away); re-record, or refresh the mapping with `--update-test-mapping`. diff --git a/versioned_sidebars/version-4.0.0-sidebars.json b/versioned_sidebars/version-4.0.0-sidebars.json index 55f463295..e9f6a2abd 100644 --- a/versioned_sidebars/version-4.0.0-sidebars.json +++ b/versioned_sidebars/version-4.0.0-sidebars.json @@ -48,6 +48,7 @@ "running-keploy/configuration-file", "running-keploy/recording-filters", "running-keploy/custom-mocks", + "running-keploy/mock-matching", "running-keploy/keploy-templatize", "running-keploy/risk-profile-analysis", "keploy-cloud/time-freezing",