Merged
24 changes: 12 additions & 12 deletions DRIFT.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,19 @@
# Live API Drift Detection

llmock produces responses shaped like real LLM APIs. Providers change their APIs over time. **Drift** means the mock no longer matches reality — your tests pass against llmock but break against the real API.
aimock produces responses shaped like real LLM APIs. Providers change their APIs over time. **Drift** means the mock no longer matches reality — your tests pass against aimock but break against the real API.

## Three-Layer Approach

Drift detection compares three independent sources to triangulate the cause of any mismatch:

| SDK types = Real API? | Real API = llmock? | Diagnosis |
| SDK types = Real API? | Real API = aimock? | Diagnosis |
| --------------------- | ------------------ | -------------------------------------------------------------------- |
| Yes | No | **llmock drift** — response builders need updating |
| Yes | No | **aimock drift** — response builders need updating |
| No | No | **Provider changed before SDK update** — flag, wait for SDK catch-up |
| Yes | Yes | **No drift** — all clear |
| No | Yes | **SDK drift** — provider deprecated something SDK still references |

Two-way comparison (mock vs real) can't distinguish between "we need to fix llmock" and "the SDK hasn't caught up yet." Three-way comparison can.
Two-way comparison (mock vs real) can't distinguish between "we need to fix aimock" and "the SDK hasn't caught up yet." Three-way comparison can.
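
For readers of this diff, the diagnosis table above reduces to a lookup on two booleans. The following is a minimal sketch only; the names `Diagnosis` and `diagnose` are hypothetical and are not taken from the repository's `schema.ts`.

```typescript
// Hypothetical illustration of the three-way diagnosis table.
type Diagnosis =
  | "aimock drift"
  | "provider changed before SDK update"
  | "no drift"
  | "SDK drift";

function diagnose(sdkMatchesReal: boolean, realMatchesMock: boolean): Diagnosis {
  // SDK, real API, and mock all agree.
  if (sdkMatchesReal && realMatchesMock) return "no drift";
  // SDK and real API agree, but the mock differs: the mock is stale.
  if (sdkMatchesReal) return "aimock drift";
  // Mock matches the real API, but the SDK still references something else.
  if (realMatchesMock) return "SDK drift";
  // The real API matches neither the SDK nor the mock.
  return "provider changed before SDK update";
}
```

A two-way comparison only sees `realMatchesMock`, so it would collapse the first and last branches above into a single "mismatch" result.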

## Running Drift Tests

@@ -40,9 +40,9 @@ Each provider's tests skip independently if its key is not set. You can run drif

### Severity levels

- **critical** — Test fails. llmock produces a different shape than the real API for a field that both the SDK and real API agree on. This means llmock needs an update.
- **warning** — Test passes (unless `STRICT_DRIFT=1`). The real API has a field that neither the SDK nor llmock knows about, or the SDK and real API disagree. Usually means a provider added something new.
- **info** — Always passes. Known intentional differences (usage fields are always zero, optional fields llmock omits, etc.).
- **critical** — Test fails. aimock produces a different shape than the real API for a field that both the SDK and real API agree on. This means aimock needs an update.
- **warning** — Test passes (unless `STRICT_DRIFT=1`). The real API has a field that neither the SDK nor aimock knows about, or the SDK and real API disagree. Usually means a provider added something new.
- **info** — Always passes. Known intentional differences (usage fields are always zero, optional fields aimock omits, etc.).
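
The pass/fail rule described by these severity levels can be sketched as a small function. This is an assumption-laden illustration: `Severity` and `failsTest` are hypothetical names, and the real test suite may read `STRICT_DRIFT` differently.

```typescript
// Hypothetical sketch of how a severity maps to a test outcome.
type Severity = "critical" | "warning" | "info";

function failsTest(severity: Severity, env: Record<string, string | undefined>): boolean {
  // Assumption: strict mode is enabled only by the exact value "1".
  const strict = env["STRICT_DRIFT"] === "1";
  if (severity === "critical") return true; // always fails
  if (severity === "warning") return strict; // fails only in strict mode
  return false; // info never fails
}
```

Passing the environment as an argument (for example `failsTest("warning", process.env)` in Node) keeps the rule easy to test in isolation.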

### Example report output

@@ -86,7 +86,7 @@ When a `critical` drift is detected:

## Model Deprecation

The `models.drift.ts` test scrapes model names referenced in llmock's test files, README, and fixtures, then checks each provider's model listing API to verify they still exist.
The `models.drift.ts` test scrapes model names referenced in aimock's test files, README, and fixtures, then checks each provider's model listing API to verify they still exist.

When a model is deprecated:

@@ -106,7 +106,7 @@ When a model is deprecated:

## WebSocket Drift Coverage

In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover llmock's WS protocols (4 verified + 2 canary = 6 WS tests):
In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (4 verified + 2 canary = 6 WS tests):

| Protocol | Text | Tool Call | Real Endpoint | Status |
| ------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
@@ -118,13 +118,13 @@ In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model dep

**Auth**: Uses the same `OPENAI_API_KEY` and `GOOGLE_API_KEY` environment variables as HTTP tests. No new secrets needed.

**How it works**: A TLS WebSocket client (`ws-providers.ts`) connects to real provider endpoints using `node:tls` with RFC 6455 framing. Each protocol function handles the setup sequence (e.g., Realtime session negotiation, Gemini Live setup/setupComplete) and collects messages until a terminal event. The mock side uses the existing `ws-test-client.ts` plaintext client against the local llmock server.
**How it works**: A TLS WebSocket client (`ws-providers.ts`) connects to real provider endpoints using `node:tls` with RFC 6455 framing. Each protocol function handles the setup sequence (e.g., Realtime session negotiation, Gemini Live setup/setupComplete) and collects messages until a terminal event. The mock side uses the existing `ws-test-client.ts` plaintext client against the local aimock server.

### Gemini Live: unverified

llmock's Gemini Live handler implements the text-based `BidiGenerateContent` protocol as documented in Google's [Live API reference](https://ai.google.dev/api/live) — `setup`/`setupComplete` handshake, `clientContent` with turns, `serverContent` with `modelTurn.parts[].text`, and `toolCall` responses. The protocol format is correct per the docs.
aimock's Gemini Live handler implements the text-based `BidiGenerateContent` protocol as documented in Google's [Live API reference](https://ai.google.dev/api/live) — `setup`/`setupComplete` handshake, `clientContent` with turns, `serverContent` with `modelTurn.parts[].text`, and `toolCall` responses. The protocol format is correct per the docs.

However, as of March 2026, the only models that support `bidiGenerateContent` are native-audio models (`gemini-2.5-flash-native-audio-*`), which reject text-only requests. No text-capable model exists for this endpoint yet, so we cannot triangulate llmock's output against a real API response.
However, as of March 2026, the only models that support `bidiGenerateContent` are native-audio models (`gemini-2.5-flash-native-audio-*`), which reject text-only requests. No text-capable model exists for this endpoint yet, so we cannot triangulate aimock's output against a real API response.

A canary test (`ws-gemini-live.drift.ts`) queries the Gemini model listing API on each drift run and checks for a non-audio model that supports `bidiGenerateContent`. When Google ships one, the canary will flag it and the full drift tests can be enabled.
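
The canary check described above amounts to a filter over the model listing. The sketch below is not the code in `ws-gemini-live.drift.ts`; it assumes each listed model exposes `name` and `supportedGenerationMethods`, and that audio-only models can be recognized by "native-audio" in the name. `ListedModel` and `findTextCapableLiveModels` are hypothetical names.

```typescript
// Hypothetical shape of one entry in the model listing response (assumed fields).
interface ListedModel {
  name: string;
  supportedGenerationMethods?: string[];
}

// Returns the names of models that support bidiGenerateContent
// and are not native-audio models.
function findTextCapableLiveModels(models: ListedModel[]): string[] {
  return models
    .filter((m) => (m.supportedGenerationMethods ?? []).indexOf("bidiGenerateContent") !== -1)
    .filter((m) => m.name.indexOf("native-audio") === -1)
    .map((m) => m.name);
}
```

An empty result means the full Gemini Live drift tests stay disabled; a non-empty result is the signal that they can be enabled.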

2 changes: 1 addition & 1 deletion packages/aimock-pytest/README.md
@@ -1,6 +1,6 @@
# aimock-pytest

pytest fixtures for [aimock](https://github.com/CopilotKit/llmock) — mock LLM APIs, MCP tools, A2A agents, vector databases, and more.
pytest fixtures for [aimock](https://github.com/CopilotKit/aimock) — mock LLM APIs, MCP tools, A2A agents, vector databases, and more.

## Install

12 changes: 6 additions & 6 deletions scripts/update-competitive-matrix.ts
@@ -172,7 +172,7 @@ const DOCS_PATH = resolve(import.meta.dirname ?? __dirname, "../docs/index.html"
const GITHUB_TOKEN = process.env.GITHUB_TOKEN ?? "";
const HEADERS: Record<string, string> = {
Accept: "application/vnd.github.v3+json",
"User-Agent": "llmock-competitive-matrix-updater",
"User-Agent": "aimock-competitive-matrix-updater",
...(GITHUB_TOKEN ? { Authorization: `Bearer ${GITHUB_TOKEN}` } : {}),
};

@@ -403,7 +403,7 @@ function parseCurrentMatrix(html: string): {
while ((m = thRegex.exec(tableHtml)) !== null) {
headers.push(m[1].trim());
}
// headers[0] = "llmock", headers[1] = "MSW", headers[2..] = competitors
// headers[0] = "aimock", headers[1] = "MSW", headers[2..] = competitors

// Extract rows
const rows = new Map<string, Map<string, string>>();
@@ -422,7 +422,7 @@ function parseCurrentMatrix(html: string): {

const rowLabel = tds[0];
const rowMap = new Map<string, string>();
// tds[1] = llmock, tds[2] = MSW, tds[3..5] = competitors
// tds[1] = aimock, tds[2] = MSW, tds[3..5] = competitors
for (let i = 1; i < tds.length && i - 1 < headers.length; i++) {
rowMap.set(headers[i - 1], tds[i]);
}
@@ -433,7 +433,7 @@ function parseCurrentMatrix(html: string): {
}

/**
* Updates only competitor cells (not llmock or MSW) where:
* Updates only competitor cells (not aimock or MSW) where:
* - The current value indicates "No" (class="no">No</td>)
* - The feature was detected in the competitor's README
*
@@ -495,9 +495,9 @@ function applyChanges(html: string, changes: DetectedChange[]): string {
while ((m = thRegex.exec(theadMatch[1])) !== null) {
headers.push(m[1].trim());
}
// Column indices: "Capability" = 0 (no header link), then llmock=1, MSW=2,
// Column indices: "Capability" = 0 (no header link), then aimock=1, MSW=2,
// VidaiMock=3, mock-llm=4, piyook/llm-mock=5
// In the <td> array: index 0 = capability, 1 = llmock, 2 = MSW, 3+ = competitors
// In the <td> array: index 0 = capability, 1 = aimock, 2 = MSW, 3+ = competitors
const compColumnIndex = (name: string): number => {
const idx = headers.indexOf(name);
return idx === -1 ? -1 : idx + 1; // +1 because first <td> is the row label
4 changes: 2 additions & 2 deletions src/__tests__/aimock-cli.test.ts
@@ -456,7 +456,7 @@ describe("runAimockCli: onReady and shutdown", () => {
});
});

it("shutdown calls llmock.stop()", async () => {
it("shutdown calls aimock.stop()", async () => {
const mockStop = vi.fn().mockResolvedValue(undefined);
const startFromConfigFn = vi.fn().mockResolvedValue({
llmock: { stop: mockStop },
@@ -496,7 +496,7 @@ describe("runAimockCli: onReady and shutdown", () => {
});
});

it("shutdown logs error and exits 1 when llmock.stop() rejects", async () => {
it("shutdown logs error and exits 1 when aimock.stop() rejects", async () => {
const mockStop = vi.fn().mockRejectedValue(new Error("close ENOTCONN"));
const startFromConfigFn = vi.fn().mockResolvedValue({
llmock: { stop: mockStop },
2 changes: 1 addition & 1 deletion src/__tests__/cli.test.ts
@@ -94,7 +94,7 @@ function writeFixture(dir: string, name: string): string {
describe.skipIf(!CLI_AVAILABLE)("CLI: --help", () => {
it("prints usage text and exits with code 0", async () => {
const { stdout, code } = await runCli(["--help"]);
expect(stdout).toContain("Usage: llmock");
expect(stdout).toContain("Usage: aimock");
expect(stdout).toContain("--port");
expect(stdout).toContain("--fixtures");
expect(code).toBe(0);
2 changes: 1 addition & 1 deletion src/__tests__/competitive-matrix-summary.test.ts
@@ -73,7 +73,7 @@ function writeSummary(summaryPath: string, changes: DetectedChange[]): void {
// ── Helpers ─────────────────────────────────────────────────────────────────

function tmpPath(suffix: string): string {
return join(tmpdir(), `llmock-cm-test-${suffix}-${Date.now()}.md`);
return join(tmpdir(), `aimock-cm-test-${suffix}-${Date.now()}.md`);
}

const tempFiles: string[] = [];
2 changes: 1 addition & 1 deletion src/__tests__/config-loader.test.ts
@@ -772,7 +772,7 @@ describe("startFromConfig", () => {
expect(card.name).toBe("no-events-agent");
});

it("with record config, llmock receives record settings", async () => {
it("with record config, aimock receives record settings", async () => {
const config: AimockConfig = {
llm: {
record: {
2 changes: 1 addition & 1 deletion src/__tests__/drift/anthropic.drift.ts
@@ -1,7 +1,7 @@
/**
* Anthropic Claude Messages API drift tests.
*
* Three-way comparison: SDK types × real API × llmock output.
* Three-way comparison: SDK types × real API × aimock output.
*/

import { describe, it, expect, beforeAll, afterAll } from "vitest";
2 changes: 1 addition & 1 deletion src/__tests__/drift/bedrock-stream.drift.ts
@@ -1,7 +1,7 @@
/**
* AWS Bedrock drift tests.
*
* Three-way comparison: SDK types x real API x llmock output.
* Three-way comparison: SDK types x real API x aimock output.
* Covers invoke-with-response-stream and converse endpoints.
*/

2 changes: 1 addition & 1 deletion src/__tests__/drift/cohere.drift.ts
@@ -1,7 +1,7 @@
/**
* Cohere drift tests.
*
* Three-way comparison: expected shape x real API x llmock output.
* Three-way comparison: expected shape x real API x aimock output.
* Covers /v2/chat non-streaming and streaming endpoints.
*
* Requires: COHERE_API_KEY
2 changes: 1 addition & 1 deletion src/__tests__/drift/gemini.drift.ts
@@ -1,7 +1,7 @@
/**
* Google Gemini GenerateContent API drift tests.
*
* Three-way comparison: SDK types × real API × llmock output.
* Three-way comparison: SDK types × real API × aimock output.
*/

import { describe, it, expect, beforeAll, afterAll } from "vitest";
8 changes: 4 additions & 4 deletions src/__tests__/drift/models.drift.ts
@@ -1,5 +1,5 @@
/**
* Model deprecation checks — verify that models referenced in llmock's
* Model deprecation checks — verify that models referenced in aimock's
* tests, docs, and examples still exist at each provider.
*/

@@ -43,7 +43,7 @@ const sourceFiles = [
// ---------------------------------------------------------------------------

describe.skipIf(!process.env.OPENAI_API_KEY)("OpenAI model availability", () => {
it("models used in llmock tests are still available", async () => {
it("models used in aimock tests are still available", async () => {
const models = await listOpenAIModels(process.env.OPENAI_API_KEY!);
const referenced = scrapeModels(/\b(gpt-4o(?:-mini)?|gpt-4|gpt-3\.5-turbo)\b/g, sourceFiles);

@@ -62,7 +62,7 @@ describe.skipIf(!process.env.OPENAI_API_KEY)("OpenAI model availability", () =>
// ---------------------------------------------------------------------------

describe.skipIf(!process.env.ANTHROPIC_API_KEY)("Anthropic model availability", () => {
it("models used in llmock tests are still available", async () => {
it("models used in aimock tests are still available", async () => {
const models = await listAnthropicModels(process.env.ANTHROPIC_API_KEY!);
const referenced = scrapeModels(
/\b(claude-3(?:\.\d+)?-(?:opus|sonnet|haiku)(?:-\d{8})?)\b/g,
@@ -83,7 +83,7 @@ describe.skipIf(!process.env.ANTHROPIC_API_KEY)("Anthropic model availability",
// ---------------------------------------------------------------------------

describe.skipIf(!process.env.GOOGLE_API_KEY)("Gemini model availability", () => {
it("models used in llmock tests are still available", async () => {
it("models used in aimock tests are still available", async () => {
const models = await listGeminiModels(process.env.GOOGLE_API_KEY!);
const referenced = scrapeModels(/\b(gemini-(?:[\w.-]+))\b/g, sourceFiles);

2 changes: 1 addition & 1 deletion src/__tests__/drift/ollama.drift.ts
@@ -1,7 +1,7 @@
/**
* Ollama drift tests.
*
* Compares llmock's Ollama endpoint output shapes against a real local
* Compares aimock's Ollama endpoint output shapes against a real local
* Ollama instance. Skips automatically if Ollama is not reachable.
*
* Requires: local Ollama running at http://localhost:11434
2 changes: 1 addition & 1 deletion src/__tests__/drift/openai-chat.drift.ts
@@ -1,7 +1,7 @@
/**
* OpenAI Chat Completions API drift tests.
*
* Three-way comparison: SDK types × real API × llmock output.
* Three-way comparison: SDK types × real API × aimock output.
*/

import { describe, it, expect, beforeAll, afterAll } from "vitest";
2 changes: 1 addition & 1 deletion src/__tests__/drift/openai-embeddings.drift.ts
@@ -1,7 +1,7 @@
/**
* OpenAI Embeddings API drift tests.
*
* Three-way comparison: SDK types × real API × llmock output.
* Three-way comparison: SDK types × real API × aimock output.
*/

import { describe, it, expect, beforeAll, afterAll } from "vitest";
2 changes: 1 addition & 1 deletion src/__tests__/drift/openai-responses.drift.ts
@@ -1,7 +1,7 @@
/**
* OpenAI Responses API drift tests.
*
* Three-way comparison: SDK types × real API × llmock output.
* Three-way comparison: SDK types × real API × aimock output.
*/

import { describe, it, expect, beforeAll, afterAll } from "vitest";
6 changes: 3 additions & 3 deletions src/__tests__/drift/schema.ts
@@ -1,6 +1,6 @@
/**
* Shape extraction, three-way comparison, severity classification, and reporting
* for drift detection between SDK types, real API responses, and llmock output.
* for drift detection between SDK types, real API responses, and aimock output.
*/

// ---------------------------------------------------------------------------
@@ -23,7 +23,7 @@ export interface ShapeDiff {
issue: string;
expected: string; // from SDK types
real: string; // from real API
mock: string; // from llmock
mock: string; // from aimock
}

export interface SSEEventShape {
@@ -248,7 +248,7 @@ function triangulateAt(
// All absent — nothing to compare
if (!sdk && !real && !mock) return diffs;

// Field in SDK + real but not mock → llmock drift (critical)
// Field in SDK + real but not mock → aimock drift (critical)
if (sdk && real && !mock) {
diffs.push({
path: displayPath,
4 changes: 2 additions & 2 deletions src/__tests__/drift/vertex-ai.drift.ts
@@ -1,7 +1,7 @@
/**
* Vertex AI / Gemini drift tests.
*
* Verifies that llmock's Vertex AI routing produces response shapes
* Verifies that aimock's Vertex AI routing produces response shapes
* consistent with the Gemini generateContent endpoint.
*
* Requires: GOOGLE_APPLICATION_CREDENTIALS or (VERTEX_AI_PROJECT + VERTEX_AI_LOCATION)
@@ -71,7 +71,7 @@ describe.skipIf(!HAS_CREDENTIALS)("Vertex AI drift", () => {
it("generateContent mock shape matches Gemini format", async () => {
const sdkShape = geminiGenerateContentShape();

// Vertex AI routing in llmock follows the path pattern:
// Vertex AI routing in aimock follows the path pattern:
// /v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent
const mockRes = await httpPost(
`${instance.url}/v1/projects/test-project/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent`,
2 changes: 1 addition & 1 deletion src/__tests__/drift/ws-gemini-live.drift.ts
@@ -1,7 +1,7 @@
/**
* Gemini Live BidiGenerateContent WebSocket drift tests.
*
* Three-way comparison: SDK types × real API (WS) × llmock output (WS).
* Three-way comparison: SDK types × real API (WS) × aimock output (WS).
*
* Currently, the Gemini Live API only supports native-audio models
* (those with "native-audio" in the name) which cannot return TEXT responses.
2 changes: 1 addition & 1 deletion src/__tests__/drift/ws-realtime.drift.ts
@@ -1,7 +1,7 @@
/**
* OpenAI Realtime API WebSocket drift tests.
*
* Three-way comparison: SDK types x real API (WS) x llmock output (WS).
* Three-way comparison: SDK types x real API (WS) x aimock output (WS).
*/

import { describe, it, expect, beforeAll, afterAll } from "vitest";
2 changes: 1 addition & 1 deletion src/__tests__/drift/ws-responses.drift.ts
@@ -1,7 +1,7 @@
/**
* OpenAI Responses API WebSocket drift tests.
*
* Three-way comparison: SDK types × real API (WS) × llmock output (WS).
* Three-way comparison: SDK types × real API (WS) × aimock output (WS).
* The Responses WS protocol uses the same event shapes as HTTP SSE.
*/

2 changes: 1 addition & 1 deletion src/__tests__/llmock.test.ts
@@ -46,7 +46,7 @@ function chatBody(userMessage: string, stream = true) {
}

function makeTmpDir(): string {
return mkdtempSync(join(tmpdir(), "llmock-test-"));
return mkdtempSync(join(tmpdir(), "aimock-test-"));
}

// ---- Tests ----