buildo · GorlemZ · Jul 1, 2026
diff --git a/.env.example b/.env.example
@@ -0,0 +1,16 @@
+# spec-code-analyzer configuration — copy to .env (gitignored) and fill in.
+# Single source of truth for the whole preflight. Load it with:  set -a; . ./.env; set +a
+
+# Atlassian (Confluence/Jira fetch — workflows/fetch_atlassian.py)
+ATLASSIAN_BASE_URL=https://<org>.atlassian.net
+ATLASSIAN_EMAIL=you@example.com
+ATLASSIAN_API_TOKEN=
+
+# vibingwithclaude MCP knowledge-graph cache (workflows/check_mcp.py, MCP-CACHE.md).
+# NEVER hardcoded in the workflow — change the url/key here and the preflight re-registers the server.
+MCP_URL=https://mcp.vibingwithclaude.it/mcp
+MCP_API_KEY=
+
+# Where the workflow writes output. Absolute path recommended so it always lands in this repo
+# regardless of the launch cwd (passed to the workflow as args.outputDir).
+SPEC_OUTPUT_DIR=./output
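A minimal sketch of how a stdlib-only preflight might parse this file and apply the exit-code contract described in this PR (2 when `MCP_*` is incomplete, 0 otherwise). `parse_env` and `preflight_exit_code` are hypothetical names, not taken from `workflows/check_mcp.py`, and the parsing rules (comment handling, optional quote stripping) are assumptions.

```python
REQUIRED_MCP_KEYS = ("MCP_URL", "MCP_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: surrounding quotes are optional and stripped.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def preflight_exit_code(env: dict[str, str]) -> int:
    """Return 2 (blocking) if any MCP_* value is missing or empty, else 0."""
    missing = [k for k in REQUIRED_MCP_KEYS if not env.get(k)]
    return 2 if missing else 0


if __name__ == "__main__":
    sample = "MCP_URL=https://mcp.vibingwithclaude.it/mcp\nMCP_API_KEY=\n"
    print(preflight_exit_code(parse_env(sample)))  # prints 2: the key is empty
```

An empty `MCP_API_KEY=` line, as shipped in the template, counts as incomplete here, which matches the "fill it and re-run" behaviour the PR describes.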
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,14 @@
+# spec-code-analyzer secrets & outputs — never commit
+.env
+.env.*
+!.env.example
+
+# workflow output dirs
+output/
+output-goals/
+.spec-analyze*
+
+# misc
+.DS_Store
+node_modules/
+workflows/__pycache__/
diff --git a/README.md b/README.md
@@ -41,43 +41,50 @@ The two variants write to separate directories (`./.spec-analyze` vs `./.spec-an
 ## Components
 
 - **`workflows/spec-analyze.js`**: the orchestration workflow (described above).
-- **`workflows/fetch_atlassian.py`**: deterministic Confluence/Jira fetch (Python 3 stdlib only, no dependencies). Produces `srs.md` (plus `cards.md` when applicable). Credentials come from the environment or `<out>/.env` and are never printed.
+- **`workflows/fetch_atlassian.py`**: deterministic Confluence/Jira fetch (Python 3 stdlib only, no dependencies). Produces `srs.md` (plus `cards.md` when applicable). Credentials come from the environment or `.env` and are never printed.
+- **`workflows/check_mcp.py`**: MCP cache preflight (stdlib only). Reads `MCP_URL`/`MCP_API_KEY` from the root `.env` (never hardcoded in the workflow), (re)registers the MCP server from `.env`, and verifies that it is reachable. Missing config → creates the `.env` template and stops; unreachable MCP → warning (the workflow degrades to no-cache). See `workflows/MCP-CACHE.md`.
 - **`workflows/run_cost.py`**: post-processing of the actual per-agent/per-phase cost, reconstructed from the run's JSONL transcripts (internally the workflow sees only the total output-token count; this script recovers input + cache tokens for the RR-5 breakdown).
 
 ## Usage
 
 The workflow does **not** perform the fetch or the user confirmation itself: it assumes an interactive preflight. Typical flow:
 
-**1. Preflight**: verify the Atlassian credentials and that `gh` is authenticated:
+All configuration lives in a **single `.env` at the repo root** (gitignored): `ATLASSIAN_*`, `MCP_URL`, `MCP_API_KEY`, `SPEC_OUTPUT_DIR`. If it is missing, `check_mcp.py` creates it as a template and stops; load it into the environment with `set -a; . ./.env; set +a` so the variables apply to every preflight script.
+
+**1. Preflight**: `.env` config, MCP cache, Atlassian credentials, `gh`:
 
 ```bash
-export ATLASSIAN_BASE_URL="https://<org>.atlassian.net"
-export ATLASSIAN_EMAIL="you@example.com"
-export ATLASSIAN_API_TOKEN="..."        # or in <out>/.env (gitignored)
+python3 workflows/check_mcp.py        # creates/validates .env, (re)registers and pings the MCP; prints SPEC_OUTPUT_DIR
+set -a; . ./.env; set +a              # loads ATLASSIAN_*, MCP_*, SPEC_OUTPUT_DIR into the environment
 gh auth status
 ```
 
+`check_mcp.py` exits with **2** (blocking) if `.env` is missing or `MCP_*` is incomplete, and with **0** even when the MCP is unreachable (warning: the workflow degrades to no-cache via `useIndex`). `MCP_URL` is never hardcoded: change it in `.env` and the preflight re-registers the server.
+
 **2. Fetch the specification** from Confluence (plus the Jira card, if any):
 
 ```bash
 python3 workflows/fetch_atlassian.py \
   --confluence <id|url> \
   --jira <KEY|url> \
-  --out ./.spec-analyze/<slug>
+  --out "$SPEC_OUTPUT_DIR/<slug>"
 ```
 
 **3. Segment** the SRS into **≤10 units** (sections), confirmed with the user.
 
-**4. Launch the workflow** (from the Claude Code main loop), passing the `args`:
+**4. Launch the workflow** (from the Claude Code main loop), passing the `args` (`outputDir` = `SPEC_OUTPUT_DIR`, absolute → output always lands in this repo):
 
 ```jsonc
 {
   "variant": "prescriptive",            // or "goals"
   "repo": "owner/repo",
   "branch": "develop",
   "slug": "draft-srs-...",
-  "srsPath": "./.spec-analyze/<slug>/srs.md",
-  "cardsPath": "./.spec-analyze/<slug>/cards.md",  // or null
+  "outputDir": "<SPEC_OUTPUT_DIR>",     // from check_mcp.py; absolute recommended
+  "workspace": "owner/repo",            // optional; sanitized to [a-z0-9-] for the MCP
+  "useIndex": true,                     // false = bypass the MCP cache
+  "srsPath": "<SPEC_OUTPUT_DIR>/<slug>/srs.md",
+  "cardsPath": "<SPEC_OUTPUT_DIR>/<slug>/cards.md",  // or null
   "units": [ { "idx": "01", "titolo": "...", "prose": "..." } ],
   "mergeNote": "any section merges performed"  // or null
 }
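The `args` object above could be assembled by the launcher along these lines. This is only a sketch: `build_args` is a hypothetical helper that does not appear in this PR, and it assumes the launcher already holds the `SPEC_OUTPUT_DIR` value printed by `check_mcp.py`.

```python
import json
from pathlib import Path
from typing import Optional


def build_args(output_dir: str, slug: str, repo: str, branch: str,
               units: list, variant: str = "prescriptive",
               has_cards: bool = True, use_index: bool = True,
               merge_note: Optional[str] = None) -> dict:
    """Assemble the workflow args with paths rooted at an absolute outputDir."""
    out = Path(output_dir).resolve()  # absolute, so a later cwd change cannot move it
    spec_dir = out / slug
    return {
        "variant": variant,
        "repo": repo,
        "branch": branch,
        "slug": slug,
        "outputDir": str(out),
        "workspace": repo,  # optional; per the README it is sanitized for the MCP
        "useIndex": use_index,
        "srsPath": str(spec_dir / "srs.md"),
        "cardsPath": str(spec_dir / "cards.md") if has_cards else None,
        "units": units,
        "mergeNote": merge_note,
    }


if __name__ == "__main__":
    units = [{"idx": "01", "titolo": "...", "prose": "..."}]
    print(json.dumps(build_args("./output", "draft-srs", "owner/repo", "develop", units), indent=2))
```

Resolving the directory once, in the launcher, keeps `srsPath` and `cardsPath` consistent with `outputDir` regardless of where the workflow is started.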

diff --git a/workflows/MCP-CACHE.md b/workflows/MCP-CACHE.md
@@ -0,0 +1,113 @@
+# MCP knowledge-graph cache for `spec-analyze`
+
+Wires the `spec-analyze` workflow (`workflows/spec-analyze.js`) to the **vibingwithclaude**
+MCP knowledge-graph as a cache in front of the discovery phase.
+
+- **On start**, look up indexed context and, on a **fresh** hit, reuse it and skip discovery.
+- **On a miss**, run discovery as usual and **populate** the graph from its output.
+
+Design constraint: the Workflow JS body can only call `agent()/parallel()/pipeline()/phase()/log()`;
+the MCP tools are reachable **only inside spawned agents**. So every MCP read/write lives in a
+dedicated agent (`context-broker`, `indexer`), never in the JS body. (Verified by spike, 2026-07.)
+
+## Flow
+
+```
+Context lookup ──fresh?──► reuse: broker materializes repo-map/ + comments.md → SKIP discovery
+      │ miss / MCP down / useIndex=false
+      ▼
+Context (cartographer ‖ crawler)  →  Index write-back (ingest into the graph, non-blocking)
+      ▼
+Analysis → Verification → Reverse diff → SRS improved → Report   (unchanged)
+```
+
+The `Context` barrier is preserved: `repo-map/` + `comments.md` must exist before the Analysis
+pipeline, whether produced by discovery or materialized from the cache.
+
+## Config (`args`)
+
+| arg | default | meaning |
+|---|---|---|
+| `useIndex` | `true` | set `false` to bypass the MCP entirely (flow is then exactly the original) |
+| `workspace` | `repo` | tenant scope; **sanitized** to `[a-z0-9-]` (the MCP rejects slashes), e.g. `pagopa/interop-be-monorepo` → `pagopa-interop-be-monorepo` |
+| `outputDir` | `./output` | where output lands; the preflight passes `SPEC_OUTPUT_DIR` (absolute → always this repo) |
+
+If the MCP is unreachable the workflow degrades to the original discovery flow — no regressions.
+
+## Config & preflight (`.env` + `check_mcp.py`)
+
+The MCP url/key are **never hardcoded** in the workflow — the single source of truth is the root
+`.env` (gitignored; see `.env.example`):
+
+```
+MCP_URL=…      MCP_API_KEY=…      SPEC_OUTPUT_DIR=…      ATLASSIAN_*=…
+```
+
+`workflows/check_mcp.py` (stdlib) is the preflight step:
+- **`.env` missing** → creates the template + ensures `.gitignore`, **exits 2** (fill it and re-run).
+- **`MCP_*` incomplete** → exits 2.
+- **otherwise** → idempotently **(re)registers** the `vibingwithclaude` Claude MCP server *from `.env`*
+  (so changing the url/key in `.env` propagates to the agents, which reach the server by name), then
+  **pings** it (JSON-RPC `initialize`). Reachable → exit 0; unreachable/unauthorized → **warning**,
+  exit 0 (the workflow degrades to no-cache via `useIndex`, so a down MCP never aborts an analysis).
+- prints the resolved absolute `SPEC_OUTPUT_DIR` on stdout for the launcher to pass as `args.outputDir`.
+
+The workflow JS body cannot read `.env`/env, so `outputDir` (and the useIndex/workspace knobs) arrive
+via `args`; the preflight is what reads `.env` and wires them in.
+
+## Node / workspace contract
+
+Workspace = sanitized `owner/repo` (repo-scoped, reusable across features/slugs). Staleness is
+**repo-level** (FASE 1): the indexed bundle's `commit_sha` (stored in `node.extra`) is compared to
+the current branch HEAD; equal ⇒ fresh, else miss. `get_context` returns `card.extra.commit_sha`
+and `body_md` verbatim, so no server change is needed.
+
+Bundles are built deterministically by `workflows/repo_map_to_bundle.py`:
+
+- **cartographer bundle** (`source_kind: cartographer`): one `area` node per area —
+  `name=area_key`, `summary=purpose`, `body_md=<area>.md` (verbatim), `extra={paths, dependsOn,
+  commit_sha, branch}`, `links: dependsOn→area`.
+- **crawler bundle** (`source_kind: crawler`): one `pr` node per enriched PR —
+  `name=pr-<n>`, `extra={number, state, signals, paths, commit_sha}`, `links: touches→area`
+  (mapped by path prefix); plus one doc node `comments-md` (type `pr`, number 0) whose `body_md`
+  is the **entire `comments.md` verbatim**, so a fresh hit restores it exactly.
+
+Materialization on a fresh hit rebuilds `repo-map/index.md` (table from the area cards),
+`repo-map/<area>.md` (each area card's `body_md`), and `comments.md` (the `comments-md` node's
+`body_md`). The broker **self-verifies** all targets exist and are non-empty before reporting
+`fresh` — the JS body cannot check the filesystem, so the broker is the only guard.
+
+## Machine-readable sidecars (M4)
+
+For deterministic parsing, cartographer/crawler also emit JSON sidecars alongside the prose:
+- `repo-map/index.json` — `{ "areas": [{ area_key, purpose, paths[], dependsOn[] }] }`
+  (`area_key` = the `<area>.md` filename stem, so JSON and node file join).
+- `comments.index.json` — `{ "prs": [{ number, title, state, signals, paths[] }] }` (enriched PRs only).
+
+## Robustness invariants
+
+- The cache never overrides as-is truth: any doubt ⇒ miss ⇒ full discovery.
+- Write-back is best-effort/non-fatal: an MCP failure logs a warning, never aborts the analysis.
+- `ingest_bundle` is the primary write path (`schema_version: "1.0"`, validated); the indexer
+  falls back to `upsert_node`/`add_link` if `ingest_bundle` is unavailable.
+- `replace_edges` is asserted only for bundles that actually carry the nodes being re-linked, and
+  empty bundles are omitted, so a partial/failed discovery never wipes prior edges.
+
+## Status
+
+**FASE 1 — implemented, reviewed, spikes green (2026-07).** Repo-level fresh/miss, write-back,
+sidecars, deterministic transform. Not yet: committed to `main`, live E2E on a real repo.
+
+## FASE 2 backlog (deferred)
+
+Per-area staleness instead of whole-repo fresh/miss:
+- diff changed paths (`gh … compare <indexedSha>…<HEAD>`) ∩ each area's key paths → stale areas only.
+- **Requires new contracts that do not exist today**: a cartographer mode for *scoped* re-generation
+  of only the stale areas, and a deterministic re-synthesis of `repo-map/index.md` merging fresh +
+  cached areas. Crawler staleness = PRs merged since `indexedSha`.
+- Configurable tolerance (`maxStaleAreas` / commit budget).
+
+Other open items:
+- `workflows/run_cost.py` `ROLE_PHASE`/`PHASE_ORDER` are stale (old Italian role names, missing
+  `SRS improved`) and don't know the new phases → the RR-5 cost breakdown misclassifies the MCP
+  phases. Pre-existing drift; fix separately.
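Two rules from `workflows/MCP-CACHE.md` are small enough to sketch: workspace sanitization and the FASE 1 repo-level freshness check. The helper names are hypothetical. The exact sanitization (lowercasing, collapsing each run of disallowed characters into one hyphen, trimming hyphens at the ends) is an assumption that goes beyond the documented `[a-z0-9-]` target, and treating a missing SHA as a miss is one reading of the "any doubt ⇒ miss" invariant.

```python
import re
from typing import Optional


def sanitize_workspace(repo: str) -> str:
    """Map 'owner/repo' onto the [a-z0-9-] alphabet (assumed rules, see above)."""
    slug = re.sub(r"[^a-z0-9-]+", "-", repo.lower())
    return slug.strip("-")


def is_fresh(indexed_sha: Optional[str], head_sha: Optional[str]) -> bool:
    """Repo-level staleness: fresh only when both SHAs are known and equal."""
    return bool(indexed_sha) and bool(head_sha) and indexed_sha == head_sha


if __name__ == "__main__":
    print(sanitize_workspace("pagopa/interop-be-monorepo"))  # pagopa-interop-be-monorepo
```

The example input is the one given in the `workspace` row of the config table, so the output can be compared against the documented result.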