Merged
30 commits
e7e1d92
docs: add Kimi-K2.6 ADE-Bench behavioral analysis + dbt skill improve…
anandgupta42 May 11, 2026
df9a3d5
docs: add benchmark/ade-bench/ reproduction scaffolding
anandgupta42 May 11, 2026
d8a1add
feat: auto-load skills via applyPaths frontmatter + new dbt-develop p…
anandgupta42 May 11, 2026
6107c3b
docs: document alwaysApply / applyPaths skill frontmatter fields
anandgupta42 May 11, 2026
c647876
feat: reorder auto-loaded skill bodies + add pre-completion checklist
anandgupta42 May 11, 2026
f8fd33f
revert(skill): roll back pre-completion checklist; document negative …
anandgupta42 May 11, 2026
644d0b2
docs(skill): swap dbt-textbook airbnb names for abstract placeholders
anandgupta42 May 26, 2026
267bf4b
feat(skill): schema fidelity + CTE-refactor row preservation + spec-d…
anandgupta42 May 26, 2026
a191b80
feat(dbt-tools): altimate-dbt schema-verify — mechanical column-shape…
anandgupta42 May 26, 2026
519b817
feat(skill): extract dbt-schema-verify into a dedicated auto-load skill
anandgupta42 May 26, 2026
3924009
feat(dbt-tools): auto-run schema-verify after build --model
anandgupta42 May 26, 2026
f784f69
feat(dbt-tools): extend schema-verify auto-trigger to project-wide build
anandgupta42 May 26, 2026
9e09cc2
feat(session): harness-side validator framework (off by default)
anandgupta42 May 26, 2026
a096b97
fix(validators): explicit registration + diagnostic log
anandgupta42 May 26, 2026
091218d
fix(validators): stderr diagnostic so harness logs capture the signal
anandgupta42 May 26, 2026
7ca5a36
feat(validators): dbt-tests-pass + schema-verify hardening + marker f…
anandgupta42 May 27, 2026
0724de3
fix: [#849] address code-review findings in validator framework
anandgupta42 May 27, 2026
04abef0
test: [#849] adversarial test expansion for validator utilities
anandgupta42 May 27, 2026
8a33919
fix: remove upstream product name from research/kimi-k26 findings
anandgupta42 May 28, 2026
81b6df2
fix: [#849] address PR review comments — validator hardening + agent …
anandgupta42 May 28, 2026
c772e99
test: [#849] deflake `work can be started after cancel` runner test
anandgupta42 May 30, 2026
9eb6bc7
fix: [#849] 11 real bugs found via adversarial validator-utils testing
anandgupta42 May 30, 2026
132db73
fix: [#849] address remaining 39 adversarial bugs from waves 2-12
anandgupta42 May 30, 2026
0ba9c0e
test: [#849] add 51 E2E test cases (`.skip`) documenting real-world bugs
anandgupta42 May 30, 2026
01cb979
fix: [#849] gate validators on opt-in flag + enrich result details
anandgupta42 May 30, 2026
390fecb
docs: [#849] document the validator framework + opt-in defaults
anandgupta42 May 30, 2026
44d65db
bench: [#849] enable validators by default in ade-bench setup
anandgupta42 May 30, 2026
2a03484
Merge origin/main into feat/validator-framework
anandgupta42 May 30, 2026
bc5bca6
fix: [#849] mark altimate-backend providerID in transform.ts upstream…
anandgupta42 May 30, 2026
4d5c76d
docs: [#849] address review feedback on validator docs
anandgupta42 May 30, 2026
275 changes: 274 additions & 1 deletion .opencode/skills/dbt-develop/SKILL.md

Large diffs are not rendered by default.

146 changes: 146 additions & 0 deletions .opencode/skills/dbt-schema-verify/SKILL.md
@@ -0,0 +1,146 @@
---
name: dbt-schema-verify
applyPaths:
  - "dbt_project.yml"
  - "**/dbt_project.yml"
description: |
  REQUIRED after building or modifying ANY dbt model that has columns declared
  in `schema.yml` / `_models.yml`. Run `altimate-dbt schema-verify --model
  <name>` to diff actual columns against the spec, and treat any `mismatch`
  verdict as "not done."

  The most common reason "the build is green but the tests still fail" is
  that the model produces the right *data values* in the wrong *column
  shape* — extra columns, missing columns, wrong order, wrong types. Many
  dbt equality tests grade the column tuple `(name, type, position)`
  exactly, and the agent's prior bias is to add "helpful" extras
  (`p1`/`p2`/`p3` rank breakdowns, name-resolved variants, lineage
  metadata) or reorder columns "more logically." Both break the contract.

  This skill enforces the mechanical check that catches those bugs. Run it
  before declaring any model task complete.
---

# dbt schema-verify

## When to invoke this skill — every time

Run `altimate-dbt schema-verify --model <name>` before declaring any of the
following tasks complete:

- Creating a new dbt model that has (or will have) a `schema.yml` entry
- Modifying an existing model whose `schema.yml` declares columns
- Refactoring a CTE into its own intermediate model
- Renaming columns or changing their order
- Changing materialization config in a way that re-creates the table
- Any task that says "match the schema", "produce these columns", "the
output should have columns X, Y, Z", or references a `_models.yml`
- Any task with `AUTO_*_equality` or `AUTO_*_existence` tests on a model

If the task touched N models, run schema-verify on **all N of them**, not
just the last one. A `build` is not a verify.
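
The "all N models" rule can be sketched as a small loop. This is a minimal Python sketch, not part of the skill itself: `run_schema_verify`, `verify_all`, and `unfinished` are hypothetical helper names, and the sketch assumes that `altimate-dbt schema-verify --model <name>` writes the JSON result shown under "How to run it" to stdout.

```python
import json
import subprocess
from typing import Callable, Dict, List


def run_schema_verify(model: str) -> dict:
    """Invoke the CLI for one model and parse its JSON output.

    Assumption: the command prints the JSON result to stdout.
    json.loads raises if the output is not valid JSON.
    """
    proc = subprocess.run(
        ["altimate-dbt", "schema-verify", "--model", model],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(proc.stdout)


def verify_all(
    models: List[str],
    run: Callable[[str], dict] = run_schema_verify,
) -> Dict[str, str]:
    """Return {model: verdict} for every touched model, not just the last one."""
    return {model: run(model)["verdict"] for model in models}


def unfinished(verdicts: Dict[str, str]) -> List[str]:
    """Models whose verdict is 'mismatch' still need work."""
    return [model for model, verdict in verdicts.items() if verdict == "mismatch"]
```

Passing a stub as `run` makes the loop easy to exercise without a dbt project; the default runner shells out to the real command.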

## How to run it

```bash
altimate-dbt schema-verify --model <name>
```

**Note**: `altimate-dbt build --model <name>` already runs schema-verify
automatically after a successful build and includes the verdict in its
response under a `schema_verify` field. You will see the diff in the same
result that reported the build outcome — read it there before deciding
the task is done. If you need to re-check after editing, call
`schema-verify` directly.

Returns a structured JSON result:

```json
{
  "model": "int_asana__project_user_agg",
  "verdict": "mismatch",
  "expected_columns": ["project_id", "users", "number_of_users_involved"],
  "actual_columns": ["project_id", "users"],
  "columns_extra": [],
  "columns_missing": ["number_of_users_involved"],
  "columns_reordered": [],
  "type_mismatches": []
}
```

## How to read the verdict

| verdict | meaning | what to do |
|---|---|---|
| `match` | actual columns match the spec exactly (case-insensitive on names) | DONE — proceed |
| `mismatch` | one or more of `columns_extra`, `columns_missing`, `columns_reordered`, `type_mismatches` is non-empty | NOT DONE — read the diff, fix the model SQL, rebuild, re-run schema-verify |
| `no-spec` | the model has no columns declared in `schema.yml` | DONE for shape-fidelity purposes — no contract to verify against |

## How to act on a `mismatch`

For each non-empty list, the fix is mechanical:

| Field | What it means | What to change in the model SQL |
|---|---|---|
| `columns_extra` | columns in your model NOT in the spec | REMOVE them from the `SELECT` |
| `columns_missing` | columns in the spec NOT in your model | ADD them to the `SELECT` (compute them, or rename an existing column if you used a synonym) |
| `columns_reordered` | columns present in both but at different positions | REORDER the columns in your `SELECT` to match the spec's order |
| `type_mismatches` | declared `data_type` in spec disagrees with the warehouse's reported type | CAST in the `SELECT` or change the upstream source |

Then run `altimate-dbt build --model <name>` again, then re-run
`altimate-dbt schema-verify --model <name>` until verdict is `match`.
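
The field-to-fix mapping above is mechanical enough to sketch in a few lines. The following Python sketch is illustrative only: `FIXES` and `fix_actions` are hypothetical names, the fix strings paraphrase the table, and the shape of `type_mismatches` entries is not specified in this document, so they are printed as-is.

```python
from typing import Dict, List

# One fix instruction per diff field, paraphrasing the table above.
FIXES: Dict[str, str] = {
    "columns_extra": "remove from the SELECT",
    "columns_missing": "add to the SELECT",
    "columns_reordered": "reorder in the SELECT to match the spec",
    "type_mismatches": "CAST in the SELECT or change the upstream source",
}


def fix_actions(result: dict) -> List[str]:
    """List one action per non-empty diff field; empty when nothing to fix."""
    if result.get("verdict") != "mismatch":
        return []
    actions = []
    for field, fix in FIXES.items():
        items = result.get(field) or []
        if items:
            actions.append(f"{field}: {fix} -> {items}")
    return actions


# The result from "How to run it", trimmed to the fields used here.
example = {
    "model": "int_asana__project_user_agg",
    "verdict": "mismatch",
    "columns_extra": [],
    "columns_missing": ["number_of_users_involved"],
    "columns_reordered": [],
    "type_mismatches": [],
}

for action in fix_actions(example):
    print(action)
```

An empty return value only means there is nothing to fix in the column shape; the rebuild and re-verify steps still apply after any edit.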

## Iron Rules

1. **The verdict is the source of truth, not your inspection.** Reading the
columns yourself and concluding "looks right to me" does not count.
Run the command and read its output.
2. **A `mismatch` is "not done", even if the build is green.** dbt build
only proves the SQL compiled and ran without errors. It does not prove
the column shape is correct. Equality tests grade shape AND values.
3. **Do not reinterpret the spec to make the model right.** The spec is
the contract. If the spec lists `supplier_company` and your model has
`supplier_id`, the answer is to fix your model, not to argue that
`supplier_id` is more useful.
4. **Run schema-verify on every model touched, not just the last one.**
The most common "almost-pass" is N-1 models passing and the Nth one
silently failing on column shape. Walk the list.
5. **Skip only on `no-spec`.** Do not skip on the grounds that the model
is small, or trivial, or "obvious." The spec is small only because
the dbt project author already curated it.

## Fallback when altimate-dbt is unavailable

If `which altimate-dbt` returns nothing, do the same diff by hand:

```bash
# 1. Read expected columns from any YAML spec under models/
# dbt allows any .yml filename; common patterns include schema.yml,
# _models.yml, models.yml, sources.yml, etc.
# grep -r recurses without relying on the shell's globstar setting
grep -r -A 50 --include='*.yml' "name: <name>" models/   # or: yq eval '...' on the matching file

# 2. Read actual columns from the model's output (dbt show runs the compiled
#    SQL and previews the result; it does not query the materialized table)
dbt show --select <name> --limit 0
```

Compare the two ordered lists. Produce the same four-bucket diff
(`columns_extra`, `columns_missing`, `columns_reordered`,
`type_mismatches`) in your head, and apply the same fix logic. The
mechanics don't change; only the tool name does.
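
If doing the diff in your head feels error-prone, the four buckets can be computed from the two ordered lists. This is a minimal Python sketch under stated assumptions, not the `altimate-dbt` implementation: `diff_columns` is a hypothetical helper, "reordered" is judged on the shared columns only, and types are compared as lowercased strings (so warehouse aliases such as `varchar` vs `character varying` would need normalizing first).

```python
from typing import Dict, List, Optional


def diff_columns(
    expected: List[str],
    actual: List[str],
    expected_types: Optional[Dict[str, str]] = None,
    actual_types: Optional[Dict[str, str]] = None,
) -> dict:
    """Compare two ordered column lists, case-insensitively on names."""
    if not expected:
        return {"verdict": "no-spec"}  # nothing declared, nothing to verify

    exp = [c.lower() for c in expected]
    act = [c.lower() for c in actual]
    exp_set, act_set = set(exp), set(act)

    extra = [c for c in act if c not in exp_set]
    missing = [c for c in exp if c not in act_set]

    # Assumption: compare order among shared columns only, so an extra or
    # missing column does not also flag its neighbours as moved.
    shared_exp = [c for c in exp if c in act_set]
    shared_act = [c for c in act if c in exp_set]
    reordered = [e for e, a in zip(shared_exp, shared_act) if e != a]

    # Types are compared only where both sides report one.
    et = {k.lower(): v.lower() for k, v in (expected_types or {}).items()}
    at = {k.lower(): v.lower() for k, v in (actual_types or {}).items()}
    type_mismatches = [
        {"column": c, "expected": et[c], "actual": at[c]}
        for c in shared_exp
        if c in et and c in at and et[c] != at[c]
    ]

    clean = not (extra or missing or reordered or type_mismatches)
    return {
        "verdict": "match" if clean else "mismatch",
        "columns_extra": extra,
        "columns_missing": missing,
        "columns_reordered": reordered,
        "type_mismatches": type_mismatches,
    }
```

Feeding it the expected and actual lists from the JSON example in "How to run it" yields a `mismatch` with `number_of_users_involved` in `columns_missing`.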

## What this skill does NOT cover

- **Value-level correctness** — passing schema-verify only proves shape;
whether the *values* in each column are right is a separate check
(`altimate-dbt test` + dbt unit tests). Generate unit tests with the
`dbt-unit-tests` skill when the model has non-trivial transformation
logic.
- **Row count** — schema-verify compares columns, not rows. If a refactor
drops rows that should be preserved (common when extracting a CTE into
its own model — see `dbt-develop`'s "Refactoring a CTE into its own
model" section), schema-verify will pass while equality tests fail.
Check row counts separately.
- **Custom tests** — `check_*` and other non-AUTO tests check
task-specific business rules, not column shape. schema-verify can pass
while a custom test fails. Read the custom test SQL to understand
what's being asserted.
13 changes: 13 additions & 0 deletions .opencode/skills/dbt-unit-tests/SKILL.md
@@ -32,6 +32,19 @@ description: Generate dbt unit tests automatically for any model. Analyzes SQL l
3. **Use sql format for ephemeral models.** Dict format fails silently for ephemeral upstreams.
4. **Never weaken a test to make it pass.** If the test fails, the model logic may be wrong. Investigate before changing expected values.
5. **Compile before committing.** Always run `altimate-dbt test --model <name>` to verify tests compile and execute.
6. **Mock data MUST exercise the failure modes of every SQL construct in the model.** A unit test that only covers the happy path validates that the model handles easy inputs — it does not validate correctness. Before writing `given:` rows, list every SQL construct in the model and the boundary case it can mishandle, then ensure at least one mock row triggers each. Universal cases to always cover when the construct appears:
   - **`LEFT JOIN` / `LEFT OUTER JOIN`** → at least one parent row with **no matching child** (catches `COUNT(*)` phantom rows, `SUM` over `NULL`, fan-out / dropout)
   - **`INNER JOIN`** → at least one parent row whose child is filtered out by the JOIN condition (catches missing rows)
   - **`COUNT(*)` / `COUNT(<col>)`** → row where the counted column is `NULL` (catches `COUNT(*)` vs `COUNT(col)` divergence)
   - **`NULLIF(x, y)`** → row where `x = y` (so the result is `NULL`, exercising downstream `NULL`-handling)
   - **`/` division** → row where the denominator is `0` or `NULL`
   - **`CASE WHEN`** → at least one row matching each branch, including the implicit `ELSE NULL` if no explicit `ELSE` is set
   - **`COALESCE` / `IFNULL`** → row where every argument is `NULL`
   - **Window functions (`OVER`)** → a partition of size 1 (single-row group exercises rank/first/last edge cases), a row at the partition boundary, and a tie-break row (two rows with the same ORDER BY key)
   - **Date arithmetic / date spines** → a row at the start of range, end of range, and a gap day with no events
   - **Aggregations with `GROUP BY`** → at least one group of size 1 (often masks fan-out bugs) and one group whose key is `NULL`
   - **Incremental merge keys** → both an "insert" row and an "update" row matching an existing key

   If you can't think of a failure mode for a construct, you don't yet understand it well enough to test it — read the SQL again before guessing inputs.
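
To see why the childless-parent row matters, the `LEFT JOIN` and `COUNT` cases above can be reproduced in a few lines. This is a hedged illustration rather than a dbt unit test: it uses Python's built-in `sqlite3` as a stand-in for the warehouse, and the `parents` / `children` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parents (parent_id INTEGER);
    CREATE TABLE children (child_id INTEGER, parent_id INTEGER);
    INSERT INTO parents VALUES (1), (2);          -- parent 2 has no children
    INSERT INTO children VALUES (10, 1), (11, 1);
    """
)

rows = conn.execute(
    """
    SELECT p.parent_id,
           COUNT(*)          AS count_star,   -- counts the NULL-padded row too
           COUNT(c.child_id) AS count_child   -- skips rows where child_id is NULL
    FROM parents p
    LEFT JOIN children c ON c.parent_id = p.parent_id
    GROUP BY p.parent_id
    ORDER BY p.parent_id
    """
).fetchall()

print(rows)  # [(1, 2, 2), (2, 1, 0)]
```

With only parent 1 in the mock data, both counts agree and the test cannot tell them apart. Parent 2 is the row that exposes the phantom `COUNT(*)` of 1.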

## Core Workflow: Analyze -> Generate -> Refine -> Validate -> Write

20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,26 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased

### Added

- **Completion-gate validator framework.** A new opt-in harness-side check
that runs after the LLM declares `finish === "stop"`. Two built-in
validators for dbt projects: `dbt-tests-pass` (runs `altimate-dbt test`
against modified models) and `dbt-schema-verify` (runs `altimate-dbt
schema-verify` against modified models). On failure, the framework
injects a synthetic user turn so the agent gets one more chance to fix
the issue, bounded by a per-session retry budget. Two opt-in modes:
`ALTIMATE_VALIDATORS_ENABLED=1` (enforcement + retries) and
`ALTIMATE_VALIDATORS_SHADOW=1` (telemetry-only — measure "would have
caught" rates without blocking). Default is **off** with zero overhead.
Two new telemetry events (`validator_check`, `validator_retries_exhausted`).
Configuration via `ALTIMATE_VALIDATORS_{MAX_RETRIES,TIMEOUT_MS,CONCURRENCY,DEBUG}`.
See [Validators docs](https://docs.altimate.sh/data-engineering/validators/)
for the full reference, performance characteristics, and the phased
rollout plan. (#849)

## [0.7.3] - 2026-05-24

A telemetry-driven hardening release. Five P0 fixes merged from a `telemetry-analysis-2026-05-21` pass — every one tied to a measured failure number from the App Insights pipeline. The headline wins are user-visible: `finops_*` tools now work without an explicit `warehouse=` parameter (auto-pick the first compatible connection); `project_scan` no longer crashes on hosts where `git` isn't in PATH (the silent 437-user regression was masked by the PII filter collapsing the binary name to `?` in error messages); and `webfetch` caches 404/410/451 responses for up to 30 minutes so the agent stops re-asking dead URLs. Two telemetry-only fixes (build-agent name normalization, Anthropic token-count semantics) clean up dashboard mis-bucketing without changing user-visible behavior.