PRAMANA takes its name from Sanskrit, where it means measure, proof, or means of knowledge. The tool aims to execute the hardware verification lifecycle autonomously — from harness quality assessment through solver selection to failure explanation — as a fully closed-loop agentic system. This repository implements the foundational tier of that vision: autonomous formal verification via four cooperating agents.
| Agent | Name | Purpose |
|---|---|---|
| Agent II | Mutation-Guided Refinement (MGR) | Measure and improve formal harness quality via mutation analysis |
| Agent III | Predictive Solver Portfolio (PSP) | Select the best SMT solver for the design; generate a ready-to-run .sby file |
| Agent IV | Causal Narrative Synthesis (CNS) | Explain formal verification failures with causal, step-by-step narratives |
Starting point: Agent I (harness generation) is a prerequisite but is assumed complete — the user provides a working formal harness
.svfile. This README begins at Agent II (MGR).
- Prerequisites
- Repository Structure
- Agent II — Mutation-Guided Refinement (MGR)
- Agent III — Predictive Solver Portfolio (PSP)
- Agent IV — Causal Narrative Synthesis (CNS)
It is recomned to use the VSCode devcontainer for a pre-configured environment. The most convient method is to open this repository on GitHub with the "Open with Codespaces" button, which will launch a cloud-based devcontainer. Alternatively, you can set up the devcontainer locally with Docker and VSCode's Remote Containers extension. If you prefer a local setup without Docker, ensure all the tools listed below are installed and available on your system.
The following tools must be installed and available on PATH:
| Tool | Purpose | Install reference |
|---|---|---|
yosys |
RTL synthesis and statistics extraction | yosys-install.md |
sby (SymbiYosys) |
Formal verification runner | yosys-install.md |
smtbmc with Yices / Z3 / Bitwuzla backends |
SMT solvers | z3-install.md |
| Python 3.9+ | Script execution | system package manager |
LLM SDK (e.g. anthropic) |
Agent IV (CNS) LLM calls | pip install anthropic or equivalent |
| LLM API key | Agent IV (CNS) narrative generation | set LLM_API_KEY env var |
Verify tool availability:
yosys --version
sby --version
python3 --versionFor CNS, set your API key:
export LLM_API_KEY=<your-key>Note: Agents II and III require only the open-source tools listed above — no LLM or API key is needed. Agent IV (CNS) requires an API key to generate narratives. All pre-computed CNS responses for the 8 benchmark cases are already committed to the repository (in
cns/responses_r2/), so scoring and inspection can be done without re-running the LLM.
digital-tools/
│
├── manage_formal.py # Agent III: PSP orchestrator (Yosys → metrics → solver → .sby)
├── extract_stat_json.py # Helper: extract JSON stats block from a Yosys log
├── extract_complexity_metrics.py # Helper: compute D_A, D_M, W, D_R, I_C, CHI_NORM
├── generate_metrics_table.py # Batch metrics printer/CSV exporter for all designs
│
├── examples/ # Reference RTL designs and their formal setups
│ ├── uart/ # UART (tx + rx + top)
│ ├── i2c/ # I2C master
│ ├── generic_fifo_lfsr/ # Synchronous FIFO with LFSR
│ │ └── repro_todo2_aw16_d15/active/ # ← canonical FIFO formal files (use these)
│ ├── sdram/
│ ├── sha3/
│ ├── pipelined_fft_256_latest/
│ ├── vga_lcd_latest/
│ └── up8_minimal/ # 8-bit microprocessor (special handling required)
│
├── cns/
│ ├── cns_agent.py # Agent IV: CNS — prompt builder, API caller, scorer
│ ├── ground_truth.json # Ground truth for 8 benchmark cases
│ ├── cases/ # Input case JSON files (C1–C8)
│ ├── responses/ # Raw CNS responses (Round 2, default)
│ ├── responses_r1/ # Round 1 (zero-shot baseline) responses
│ ├── responses_r2/ # Round 2 (refined prompt) responses
│ ├── scored/ # Per-case binary scores (Round 2)
│ ├── scored_r1/ # Per-case binary scores (Round 1)
│ └── scored_r2/ # Per-case binary scores (Round 2)
│
└── mgr/
├── tools/
│ ├── mutant_generator.py # Step 1: generate mutants from a single RTL file
│ └── audit_runner.py # Step 2: run sby per mutant, classify, summarise
└── campaigns/ # One subdirectory per design under verification
├── and2bit/ # 2-bit AND gate (trivial sanity check)
├── uart_rx/ # UART receiver FSM
├── uart_full/ # UART top-level wrapper
├── up8/ # 8-bit pipelined CPU
├── sha3/ # Keccak-f permutation core
├── sdram/ # SDRAM controller (iterative, 4 rounds)
├── vga_lcd/ # VGA vertical timing FSM
└── pipelined_fft_256/ # FFT-256 CNORM normalisation unit
Campaign round convention: each refinement_vN/ directory is immutable once
created — never overwrite a completed round. Start a new refinement_v(N+1)/ for
each new harness iteration.
MGR measures how well a formal harness can detect bugs by injecting small artificial bugs (mutants) into the RTL and checking whether the harness catches them:
- KILLED — the harness found a counterexample for this mutant (good).
- SURVIVED — the harness passed despite the bug (the harness needs strengthening).
- INVALID — the mutant could not be elaborated (e.g., a reset-sensitivity change broke compilation); excluded from the kill-rate denominator.
- TIMEOUT / INCONCLUSIVE — the solver did not finish within the time budget.
Two kill rates are reported:
Raw Kill Rate = KILLED / TOTAL
Effective Kill Rate = KILLED / (TOTAL − INVALID)
The effective kill rate is the primary metric. The target threshold is T = 50 %. If the effective kill rate is below T, strengthen the harness and run a new round.
mgr/tools/mutant_generator.py targets one RTL file (the primary state/logic
file of the design) and applies conservative, one-change-at-a-time transformations.
CLI:
python3 mgr/tools/mutant_generator.py \
--rtl <path/to/target_rtl_file.v> \
--out <output_directory> \
--design <design_name> \
[--max-mutants <N>] # default: 30Outputs written to <output_directory>/:
manifest.json— metadata for every generated mutant (used by audit_runner)mutants/— one.vfile per mutant
Important: --rtl is the single file to mutate (e.g., uart_tx.v), not the full
RTL set. The tool reads this file, applies mutations, and records the absolute path in
manifest.json so audit_runner.py knows which file to swap per mutant.
mgr/tools/audit_runner.py creates an isolated workspace for each mutant, runs
sby, and classifies the result.
CLI:
python3 mgr/tools/audit_runner.py \
--manifest <path/to/manifest.json> \
--rtl-files <file1.v> <file2.v> ... \
--harness-files <harness.sv> \
--sby <path/to/prove.sby> \
--workdir <path/to/runs_directory> \
--timeout <seconds_per_mutant>--rtl-files— all RTL files needed to compile the design (timescale, sub-modules, top-level wrappers, etc.). The tool copies them all into each mutant's workspace.--harness-files— formal harness.svfile(s).--sby— the.sbyconfiguration file for this campaign round.
Outputs written to <workdir>/:
summary.json— total, per-class counts, raw/effective kill rates, list of survived mutantssummary.csv— same data in tabular form<mutant_id>/logs/stdout.log,stderr.log— per-mutant sby output
Read <workdir>/summary.json:
{
"total_mutants": 20,
"counts": { "KILLED": 18, "SURVIVED": 2, "INVALID": 0 },
"raw_kill_rate": 0.9,
"effective_mutants": 20,
"effective_kill_rate": 0.9,
"survived_mutants": ["M003", "M011"]
}Decision logic:
effective_kill_rate >= 0.50→ harness quality is sufficient. Proceed to Agent III (PSP).effective_kill_rate < 0.50→ inspectsurvived_mutants, identify what property is missing, update the harness, create a newrefinement_v(N+1)/directory, and repeat from Step 1.
All commands below are run from the repository root (/workspaces/digital-tools/).
RTL target for mutation: and2bit.v (trivial 2-bit AND gate).
Step 1 — Generate mutants:
python3 mgr/tools/mutant_generator.py \
--rtl examples/and2bit/and2bit.v \
--out mgr/campaigns/and2bit/mutants/r1 \
--design and2bit \
--max-mutants 20Step 2 — Audit mutants:
python3 mgr/tools/audit_runner.py \
--manifest mgr/campaigns/and2bit/mutants/r1/manifest.json \
--rtl-files mgr/campaigns/and2bit/baseline/and2bit.v \
--harness-files mgr/campaigns/and2bit/baseline/and2bit_formal.sv \
--sby mgr/campaigns/and2bit/baseline/and2bit_prove.sby \
--workdir mgr/campaigns/and2bit/runs/r1 \
--timeout 60Results (R1): 20/20 KILLED → effective kill rate 100 % ✓
RTL target for mutation: uart_rx.v (serial receiver FSM).
Step 1 — Generate mutants:
python3 mgr/tools/mutant_generator.py \
--rtl mgr/campaigns/uart_rx/baseline/uart_rx.v \
--out mgr/campaigns/uart_rx/mutants/r1 \
--design uart_rx \
--max-mutants 20Step 2 — Audit mutants:
python3 mgr/tools/audit_runner.py \
--manifest mgr/campaigns/uart_rx/mutants/r1/manifest.json \
--rtl-files mgr/campaigns/uart_rx/baseline/uart_rx.v \
--harness-files mgr/campaigns/uart_rx/refinement_v1/uart_rx_formal.sv \
--sby mgr/campaigns/uart_rx/refinement_v1/uart_rx_prove.sby \
--workdir mgr/campaigns/uart_rx/runs/r1 \
--timeout 60Results (R1): 18/20 KILLED → effective kill rate 90 % ✓
(2 survivors are #1→#0 delay true equivalents, unkillable at RTL-formal level.)
RTL target for mutation: uart_tx.v (TX state machine, the primary logic file of the full-UART design).
Full RTL set: uart_full.v, uart_tx.v, uart_rx.v.
Step 1 — Generate mutants:
python3 mgr/tools/mutant_generator.py \
--rtl mgr/campaigns/uart_full/baseline/uart_tx.v \
--out mgr/campaigns/uart_full/mutants/r1 \
--design uart_full \
--max-mutants 20Step 2 — Audit mutants:
python3 mgr/tools/audit_runner.py \
--manifest mgr/campaigns/uart_full/mutants/r1/manifest.json \
--rtl-files mgr/campaigns/uart_full/baseline/uart_full.v \
mgr/campaigns/uart_full/baseline/uart_tx.v \
mgr/campaigns/uart_full/baseline/uart_rx.v \
--harness-files mgr/campaigns/uart_full/baseline/uart_full_formal.sv \
--sby mgr/campaigns/uart_full/baseline/uart_full_prove.sby \
--workdir mgr/campaigns/uart_full/runs/r1 \
--timeout 120Results (R1): 18/20 KILLED → effective kill rate 90 % ✓
RTL target for mutation: up8_cpu.v (8-bit pipelined CPU).
The UP8 core requires preprocessor flags (FORMAL, UP8_INLINE_ROM, UP8_FORMAL_ROMONLY)
that are already baked into the baseline .sby file in the campaign directory.
Step 1 — Generate mutants:
python3 mgr/tools/mutant_generator.py \
--rtl mgr/campaigns/up8/baseline/up8_cpu.v \
--out mgr/campaigns/up8/mutants/r1 \
--design up8 \
--max-mutants 20Step 2 — Audit mutants:
python3 mgr/tools/audit_runner.py \
--manifest mgr/campaigns/up8/mutants/r1/manifest.json \
--rtl-files mgr/campaigns/up8/baseline/up8_cpu.v \
mgr/campaigns/up8/baseline/up8_inline_rom.vh \
--harness-files mgr/campaigns/up8/baseline/up8_add1_formal.sv \
--sby mgr/campaigns/up8/baseline/up8_add1_prove.sby \
--workdir mgr/campaigns/up8/runs/r1 \
--timeout 120Results (R1): 20/20 KILLED → effective kill rate 100 % ✓
RTL target for mutation: keccak.v (Keccak-f permutation core).
Full RTL set: all .v files in mgr/campaigns/sha3/baseline/.
Step 1 — Generate mutants:
python3 mgr/tools/mutant_generator.py \
--rtl mgr/campaigns/sha3/baseline/keccak.v \
--out mgr/campaigns/sha3/mutants/r1 \
--design sha3 \
--max-mutants 20Step 2 — Audit mutants:
python3 mgr/tools/audit_runner.py \
--manifest mgr/campaigns/sha3/mutants/r1/manifest.json \
--rtl-files mgr/campaigns/sha3/baseline/keccak.v \
mgr/campaigns/sha3/baseline/padder.v \
mgr/campaigns/sha3/baseline/padder1.v \
mgr/campaigns/sha3/baseline/f_permutation.v \
mgr/campaigns/sha3/baseline/round2in1.v \
mgr/campaigns/sha3/baseline/rconst2in1.v \
--harness-files mgr/campaigns/sha3/baseline/sha3_keccak_prove_formal.sv \
--sby mgr/campaigns/sha3/baseline/sha3_prove.sby \
--workdir mgr/campaigns/sha3/runs/r1 \
--timeout 120Results (R1): 20/20 KILLED → effective kill rate 100 % ✓
RTL target for mutation: sdram.v (SDRAM controller with Xilinx IO stubs).
Full RTL set: sdram.v, iobuf_stub.v, oddr2_stub.v. Iterative campaign over 4 rounds.
Step 1 — Generate mutants:
python3 mgr/tools/mutant_generator.py \
--rtl mgr/campaigns/sdram/baseline/sdram.v \
--out mgr/campaigns/sdram/mutants/r1 \
--design sdram \
--max-mutants 20Step 2 — Audit mutants (R4 shown):
python3 mgr/tools/audit_runner.py \
--manifest mgr/campaigns/sdram/mutants/r1/manifest.json \
--rtl-files mgr/campaigns/sdram/refinement_v4/sdram.v \
mgr/campaigns/sdram/baseline/iobuf_stub.v \
mgr/campaigns/sdram/baseline/oddr2_stub.v \
--harness-files mgr/campaigns/sdram/refinement_v4/sdram_prove_formal.sv \
--sby mgr/campaigns/sdram/refinement_v4/sdram_prove.sby \
--workdir mgr/campaigns/sdram/runs/r4 \
--timeout 120Results (R4, plateau): 7/19 effective → 36.8 % ✓ (8 INVALID; 4 surviving mutants are timing-parameter true equivalents beyond practical BMC depth.)
RTL target for mutation: vga_vtim.v (5-state VGA timing FSM, 174 lines).
Step 1 — Generate mutants:
python3 mgr/tools/mutant_generator.py \
--rtl mgr/campaigns/vga_lcd/baseline/vga_vtim.v \
--out mgr/campaigns/vga_lcd/mutants/r1 \
--design vga_vtim \
--max-mutants 20Step 2 — Audit mutants (R2 shown):
python3 mgr/tools/audit_runner.py \
--manifest mgr/campaigns/vga_lcd/mutants/r1/manifest.json \
--rtl-files mgr/campaigns/vga_lcd/baseline/timescale.v \
mgr/campaigns/vga_lcd/refinement_v2/vga_vtim.v \
--harness-files mgr/campaigns/vga_lcd/refinement_v2/vga_vtim_prove_formal.sv \
--sby mgr/campaigns/vga_lcd/refinement_v2/vga_vtim_prove.sby \
--workdir mgr/campaigns/vga_lcd/runs/r2 \
--timeout 120Results (R2): 18/20 KILLED → effective kill rate 90 % ✓ R1 was 14/20 (70 %); cycle-count gate assertions added in R2 killed 4 more mutants.
Yosys SMTBMC note: Use
$past(dut_output)rather than$past(harness_tracking_reg)— the solver treats local tracking registers as unconstrained in the initial state, making such assertions vacuously true at step 0.
RTL target for mutation: cnorm.v (CNORM normalisation + OVF detection unit, 133 lines).
The FFT256_CONFIG.inc macro file (defines nb=10) lives in the campaign directory along with the RTL copy.
Step 1 — Generate mutants:
python3 mgr/tools/mutant_generator.py \
--rtl mgr/campaigns/pipelined_fft_256/baseline/cnorm.v \
--out mgr/campaigns/pipelined_fft_256/mutants/r1 \
--design cnorm \
--max-mutants 20Step 2 — Audit mutants (R2 shown):
python3 mgr/tools/audit_runner.py \
--manifest mgr/campaigns/pipelined_fft_256/mutants/r1/manifest.json \
--rtl-files mgr/campaigns/pipelined_fft_256/refinement_v2/cnorm.v \
mgr/campaigns/pipelined_fft_256/refinement_v2/FFT256_CONFIG.inc \
--harness-files mgr/campaigns/pipelined_fft_256/refinement_v2/cnorm_prove_formal.sv \
--sby mgr/campaigns/pipelined_fft_256/refinement_v2/cnorm_prove.sby \
--workdir mgr/campaigns/pipelined_fft_256/runs/r2 \
--timeout 60Results (R2): 15/16 effective → 93.8 % ✓
R1 was 14/16 (87.5 %); adding A_ovf_shift3_nb1_fire (OVF must fire when DR[nb+3] ≠ DR[nb+1] in SHIFT=11) killed M016 in R2.
The single survivor (M001) mutates code inside an untaken `ifdef FFT256round block — a true equivalent.
If the effective kill rate is below the threshold, create a new round directory, copy in the RTL files and the updated harness, then repeat Steps 1–2:
# Example: starting refinement_v3 for uart_rx
ROUND=mgr/campaigns/uart_rx/refinement_v3
mkdir -p "$ROUND"
cp mgr/campaigns/uart_rx/refinement_v2/uart_rx.v "$ROUND/"
# Place the new harness and .sby in $ROUND, then run:
python3 mgr/tools/mutant_generator.py \
--rtl "$ROUND/uart_rx.v" \
--out mgr/campaigns/uart_rx/mutants/r3 \
--design uart_rx \
--max-mutants 20
python3 mgr/tools/audit_runner.py \
--manifest mgr/campaigns/uart_rx/mutants/r3/manifest.json \
--rtl-files "$ROUND/uart_rx.v" \
--harness-files "$ROUND/<new_harness>.sv" \
--sby "$ROUND/<new_prove>.sby" \
--workdir mgr/campaigns/uart_rx/runs/r3 \
--timeout 60Convention: each
refinement_vN/directory is immutable once created — never overwrite a completed round. Always start a newrefinement_v(N+1)/.
| Design | Round | Total | KILLED | SURVIVED | INVALID | Effective Kill Rate |
|---|---|---|---|---|---|---|
and2bit |
R1 (baseline) |
20 | 20 | — | — | 100 % ✓ |
uart_rx |
R1 (refinement_v1) |
20 | 18 | 2 | — | 90 % ✓ ‡ |
uart_full |
R1 (refinement_v1) |
20 | 18 | 2 | — | 90 % ✓ ‡ |
up8 |
R1 (refinement_v1) |
20 | 20 | — | — | 100 % ✓ |
sha3 |
R1 (refinement_v1) |
20 | 20 | — | — | 100 % ✓ |
sdram |
R1 (refinement_v1) |
27 | 6 | 13 | 8 | 31.6 % |
sdram |
R4 (refinement_v4) |
19 | 7 | 4 | 8 | 36.8 % ✓ § |
vga_lcd |
R1 (refinement_v1) |
20 | 14 | 6 | — | 70 % |
vga_lcd |
R2 (refinement_v2) |
20 | 18 | 2 | — | 90 % ✓ ‡ |
pipelined_fft_256 (CNORM) |
R1 (refinement_v1) |
20 | 14 | 2 | 4 | 87.5 % |
pipelined_fft_256 (CNORM) |
R2 (refinement_v2) |
20 | 15 | 1 | 4 | 93.8 % ✓ ‡ |
‡ Remaining survivors are true equivalents (non-blocking delay #1→#0 or dead `ifdef branches compiled out by the preprocessor); no assertion can kill them.
§ Plateau is structural: 4 surviving mutants alter timing-parameter logic whose observable effect lies beyond any practical BMC depth. 8 INVALID mutants excluded from effective count.
— indicates the count was zero (not recorded as a key in the summary JSON).
PSP measures the structural complexity of an RTL design, maps it to a complexity
score (CHI_NORM), selects the best SMT solver from a portfolio, chooses a BMC
depth, and writes a ready-to-run .sby file — all automatically.
Six complexity metrics are extracted from a Yosys stat -json report:
| Metric | Meaning |
|---|---|
D_A |
AND-tree depth (combinational complexity) |
D_M |
Multi-driver depth (fan-out complexity) |
W |
Register width (state-space width) |
D_R |
RAM size in bits (logarithm of state-space depth) |
I_C |
Internal connectivity ratio |
CHI_NORM |
Weighted composite score in [0, 1] |
Solver selection rules (evaluated in priority order, first match wins):
| Priority | Condition | Solver | Rationale |
|---|---|---|---|
| 1 | W_norm > 0.50 |
bitwuzla | Very wide datapath — bit-vector native reasoning |
| 2 | D_M > 0.35 |
yices | High mux/control density — fast SAT-style search |
| 3 | I_C > 0.35 AND W_norm < 0.10 |
bitwuzla | Symbolic-pointer / high-index narrow designs |
| 4 | D_A > 0.30 AND W_norm < 0.25 |
yices | Arithmetic-heavy moderate-width pipelines |
| 5 | D_R_norm > 0.30 |
yices | Memory-heavy design |
| default | — | yices | General control-oriented or mixed RTL |
python3 manage_formal.py \
<top_module> \
<k_depth> \
<v_file1> [<v_file2> ...] \
[--formal_sv <harness.sv>] \
[--formal_top <harness_top_module>] \
[--prep_flags <extra_prep_flags>]Artifacts written to the directory of --formal_sv (or the first .v file):
| File | Description |
|---|---|
design_dump.log |
Full Yosys output |
design_stats.json |
Parsed JSON statistics |
<top>_auto.sby |
Ready-to-run SymbiYosys configuration |
After the file is created, run the proof:
sby -f <top>_auto.sbyTo profile all designs in examples/ and update metrics_table.csv:
python3 generate_metrics_table.pyOutput is printed to stdout and saved to metrics_table.csv in the repository root.
All commands below are run from /workspaces/digital-tools/.
python3 manage_formal.py uart_full 25 \
examples/uart/uart_full/uart_full.v \
examples/uart/uart_full/uart_rx.v \
examples/uart/uart_full/uart_tx.v \
--formal_sv examples/uart/uart_full/formal/uart_full_formal.sv \
--formal_top uart_full_formalOutput: examples/uart/uart_full/formal/uart_full_auto.sby
Run the proof:
sby -f examples/uart/uart_full/formal/uart_full_auto.sbypython3 manage_formal.py i2c_master_top_formal 80 \
examples/i2c/i2c_master_defines.v \
examples/i2c/i2c_master_bit_ctrl_formal.v \
examples/i2c/i2c_master_byte_ctrl_formal.v \
examples/i2c/i2c_master_top_formal.v \
--formal_sv examples/i2c/formal/i2c_master_top_prove_formal.sv \
--formal_top i2c_master_top_prove_formalOutput: examples/i2c/formal/i2c_master_top_formal_auto.sby
The SDRAM design references Xilinx primitives (IOBUF, ODDR2) that are not part
of the design. Stub files in examples/sdram/ provide empty module definitions so
Yosys can elaborate the hierarchy.
python3 manage_formal.py sdram 30 \
examples/sdram/iobuf_stub.v \
examples/sdram/oddr2_stub.v \
examples/sdram/sdram.v \
--formal_sv examples/sdram/formal/sdram_prove_formal.sv \
--formal_top sdram_prove_formalOutput: examples/sdram/formal/sdram_auto.sby
Exception: Use the canonical RTL from
repro_todo2_aw16_d15/active/, not the top-levelexamples/generic_fifo_lfsr/formal/directory.
FIFO=examples/generic_fifo_lfsr/repro_todo2_aw16_d15/active
python3 manage_formal.py generic_fifo_lfsr 40 \
"$FIFO/timescale.v" \
"$FIFO/generic_dpram_formal.v" \
"$FIFO/generic_fifo_lfsr.v" \
--formal_sv "$FIFO/generic_fifo_lfsr_prove_formal.sv" \
--formal_top generic_fifo_lfsr_prove_formalOutput: <FIFO>/generic_fifo_lfsr_auto.sby
python3 manage_formal.py keccak 80 \
examples/sha3/f_permutation.v \
examples/sha3/keccak.v \
examples/sha3/padder.v \
examples/sha3/padder1.v \
examples/sha3/rconst2in1.v \
examples/sha3/round2in1.v \
--formal_sv examples/sha3/formal/sha3_keccak_prove_formal.sv \
--formal_top sha3_keccak_prove_formalOutput: examples/sha3/formal/keccak_auto.sby
Note: The RTL module is named
keccak(lowercase); passkeccakas thetop_moduleargument, notsha3_keccak_prove_formal.
The vga_vtim module is self-contained and can be run without pulling in the full
vga_enh_top design hierarchy.
python3 manage_formal.py vga_vtim 20 \
examples/vga_lcd_latest/timescale.v \
examples/vga_lcd_latest/vga_vtim.v \
--formal_sv mgr/campaigns/vga_lcd/refinement_v2/vga_vtim_prove_formal.sv \
--formal_top vga_vtim_prove_formalOutput: mgr/campaigns/vga_lcd/refinement_v2/vga_vtim_auto.sby
The CNORM module (note: uppercase) includes FFT256_CONFIG.inc via a `include
directive. Run manage_formal.py from the campaign directory where both cnorm.v
and FFT256_CONFIG.inc are co-located.
python3 manage_formal.py CNORM 30 \
mgr/campaigns/pipelined_fft_256/refinement_v2/cnorm.v \
--formal_sv mgr/campaigns/pipelined_fft_256/refinement_v2/cnorm_prove_formal.sv \
--formal_top cnorm_prove_formalOutput: mgr/campaigns/pipelined_fft_256/refinement_v2/CNORM_auto.sby
The UP8 core requires preprocessor flags (FORMAL, UP8_INLINE_ROM,
UP8_FORMAL_ROMONLY) that are passed via a special read command already present
in the campaign .sby file. manage_formal.py cannot elaborate up8_cpu.v without
those flags, so solver selection is done via the campaign artifacts instead.
Use the pre-built proof directly:
sby -f mgr/campaigns/up8/baseline/up8_add1_prove.sbyThe .sby file already contains the correct solver choice (yices, via the
D_M > 0.35 rule) and preprocessor flags.
All 8 benchmarks verified against Table 4 (observed ASL in s/step, lower = faster).
| Design | W_norm | D_A | D_M | I_C | DR_norm | Rule fired | PSP pick | Table 4 winner |
|---|---|---|---|---|---|---|---|---|
| I2C Master | 0.047 | 0.143 | 0.390 | 0.122 | 0.000 | D_M > 0.35 | yices | yices (0.038) |
| UART (Full) | 0.047 | 0.239 | 0.349 | 0.143 | 0.000 | default | yices | yices (0.008) |
| Sync. FIFO | 0.031 | 0.132 | 0.289 | 0.423 | 0.000 | I_C > 0.35 ∧ W_norm < 0.10 | bitwuzla | bitwuzla (2.733) |
| SDRAM Controller | 0.062 | 0.180 | 0.384 | 0.161 | 0.000 | D_M > 0.35 | yices | yices (0.024) |
| Pipelined FFT 256 | 0.156 | 0.374 | 0.096 | 0.054 | 0.035 | D_A > 0.30 ∧ W_norm < 0.25 | yices | yices (0.613) |
| SHA3 (Keccak) | 1.000 | 0.017 | 0.019 | 0.158 | 0.000 | W_norm > 0.50 | bitwuzla | bitwuzla (0.250) |
| VGA LCD | 0.078 | 0.146 | 0.257 | 0.326 | 0.016 | default | yices | yices (0.150) |
| uP8 (Add/ISA) | 0.094 | 0.224 | 0.598 | 0.033 | 0.500 | D_M > 0.35 | yices | yices (0.044/0.556) |
Note: This agent calls an LLM to generate explanations. To reproduce results, set the
LLM_API_KEYenvironment variable. Without it,cns_agent.pywrites prompts to disk so they can be submitted manually or via another API client.
CNS takes a failed BMC counterexample — the failing assertion, the mutation applied, and the signal trace — and produces a concise, structured root-cause narrative:
- Cycle identification — the step at which the assertion fires.
- Signal attribution — the signals directly responsible for the failure.
- Failure mechanism class — one of six categories:
missing_state_transition,stuck_signal,overflow_flag_error,handshake_failure,pointer_index_bug,timing_progression_error. - Root-cause sentence — a one-sentence causal explanation.
Install the LLM SDK you intend to use (the default backend uses anthropic) and set the API key:
pip install anthropic # or whichever SDK your model provider requires
export LLM_API_KEY=<your-key>Each case is a JSON file in cns/cases/. Required fields:
{
"case_id": "C1",
"failing_assertion_text": "assert (OVF == 0);",
"failing_assertion_name": "A_no_spurious_ovf",
"failing_assertion_file": "cnorm_prove_formal.sv",
"failing_assertion_line": 42,
"original_snippet": "if (A > B)",
"mutated_snippet": "if (!(A > B))",
"target_module": "CNORM",
"source_line": 78,
"mutation_class": "predicate_inversion",
"trace_path": "<path/to/trace_tb.v>",
"key_signals": ["OVF", "START", "ED"],
"first_bad_cycle": 1
}trace_path points to the trace_tb.v file produced by sby when a BMC
counterexample is found. key_signals are the signal names to emphasise in the trace table.
Run a single case:
python3 cns/cns_agent.py --case cns/cases/C1.jsonRun all 8 benchmark cases:
python3 cns/cns_agent.py --allResponses are saved to cns/responses_r2/ (the default round directory).
Score responses against ground truth:
python3 cns/cns_agent.py --scoreOutput: per-case JSON files in cns/scored_r2/ with binary flags
(correct_cycle, correct_signal, correct_mechanism, unsupported_claims,
overall_faithful), and a summary table.
The --round flag controls which response/scored subdirectory is used:
# Round 1: zero-shot baseline
python3 cns/cns_agent.py --all --round 1
# Round 2: refined prompt (default)
python3 cns/cns_agent.py --all --round 2Round 1 responses are already cached in cns/responses_r1/.
Round 2 responses are in cns/responses_r2/.
Eight BMC failures across four designs are included as benchmarks:
| ID | Design | Mutation class | First bad cycle | Root cause class |
|---|---|---|---|---|
| C1 | FFT CNORM | predicate_inversion | 1 | overflow_flag_error |
| C2 | FFT CNORM | equality_flip | 1 | overflow_flag_error |
| C3 | FFT CNORM | predicate_inversion | 1 | stuck_signal |
| C4 | VGA LCD | predicate_inversion | 1 | missing_state_transition |
| C5 | VGA LCD | predicate_inversion | 7 | missing_state_transition |
| C6 | UART RX | equality_flip | 13 | timing_progression_error |
| C7 | UART RX | predicate_inversion | 31 | handshake_failure |
| C8 | SHA3 | constant_perturbation | 27 | pointer_index_bug |
Scoring uses four binary metrics per case. A response is overall_faithful only if
all four pass. Ground truth was written before any LLM response was generated
(blind evaluation).
| Round | Correct Cycle | Correct Signals | Correct Mechanism | Unsupported Claims | Overall Faithful |
|---|---|---|---|---|---|
| R1 (zero-shot) | 4/8 | 8/8 | 5/8 | 0 | 3/8 |
| R2 (refined prompt) | 8/8 | 8/8 | 8/8 | 0 | 8/8 |
The prompt refinements that drove R1 → R2 improvement:
- Step anchor (R1 fix): Explicitly telling the model that the assertion fires at
step
first_bad_cycle— not at the first cycle of signal divergence — fixed all four cycle mis-attributions. - Mechanism guide (R2 fix): Providing tight definitions for each of the six
failure classes, with distinguishing symptoms and explicit warnings (e.g., "use
stuck_signalwhen a data register is frozen; usemissing_state_transitiononly for the FSM state register itself"), fixed all three mechanism mis-classifications (C3: stuck_signal; C7: handshake_failure; C8: pointer_index_bug).