diff --git a/README.md b/README.md index ced1577..de7a247 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,11 @@ [![CI](https://github.com/blindhacker99/code-reason/actions/workflows/ci.yml/badge.svg)](https://github.com/blindhacker99/code-reason/actions/workflows/ci.yml) -**code-reason** is an MCP server that gives coding agents real program-analysis reasoning — so they verify data flow, trace evidence chains, and navigate call graphs from ground truth, not guesswork. +> *Instead of grep-and-guess, provide your agents program analysis capabilities with code-reason.* + +**code-reason** is an MCP server that gives coding agents real program-analysis primitives — data-flow reachability, call-graph traversal, evidence-chain construction — so they verify code behavior from ground truth instead of speculation. + +## Why code-reason Coding agents are good at reading code, but they struggle with whole-program questions: @@ -10,11 +14,24 @@ Coding agents are good at reading code, but they struggle with whole-program que - *Who really invokes this function across the codebase?* - *What is the complete evidence chain from source to sink?* -These are program-analysis questions. Answering them with grep-and-guess, or with ever-larger context windows, scales poorly and produces false confidence. They are better answered with a code property graph and a few well-chosen graph queries — exposed as MCP tools the agent drives directly. +Without a code-analysis tool, the agent answers these by **manual grep-based tracing.** It can read code, but it can't actually trace data flow, walk a control graph, or verify that user input reaches a sink. Ask any modern coding agent how it traces taint without a tool, and the answer is some variant of *"I read files and follow string matches."* + +That works for simple cases. The cracks show on anything non-trivial — aliased variables, inter-procedural flow, sanitization checks, framework-injected inputs. The agent still produces an answer, often with high confidence, but it's reading 6+ files to confirm a single chain, burning context on speculation, and silently missing flows it never thought to grep for. For security-sensitive work, a confident wrong answer is more dangerous than no answer at all — and that's exactly what grep-based tracing produces at scale. + +code-reason closes the gap. The agent stays in charge of *what's interesting*; code-reason answers *what's actually true* — backed by a code property graph parsed once and queried cheaply. The result: agents that are more deterministic, more token-efficient, and faster — and agentic security workflows you can actually trust. + +Where traditional SAST tools produce findings reports for humans to triage, code-reason exposes the underlying analysis primitives for an agent to drive its own investigation. + +Built on [Fraunhofer AISEC's Code Property Graph](https://github.com/Fraunhofer-AISEC/cpg) for multi-language data-flow, control-flow, and taint analysis, and the [Kotlin MCP SDK](https://github.com/modelcontextprotocol/kotlin-sdk) to expose those capabilities as agent-callable tools. + +## What you get -**Tech stack.** code-reason uses [Fraunhofer AISEC's Code Property Graph](https://github.com/Fraunhofer-AISEC/cpg) for multi-language data-flow, control-flow, and taint analysis, and the [Kotlin MCP SDK](https://github.com/modelcontextprotocol/kotlin-sdk) to expose those capabilities as agent-callable tools. +From dogfood sessions on real Java and Python codebases: -**Long-term vision.** Make program-analysis primitives a first-class capability for coding agents — sharpening vulnerability detection precision, eliminating tokens wasted on speculative grep-and-guess exploration, and turning vulnerability hunting into a directed, evidence-driven workflow. +- **30-40% fewer agent tokens** on multi-step security reviews, mostly from call-graph queries that would otherwise take 5-10 grep iterations to confirm by hand. +- **Compact structured answers, not file dumps.** A call-graph query returns reachable methods in JSON; the grep-and-read equivalent forces the agent to read 6+ files to confirm one chain. +- **One analysis pass per service, unlimited queries.** `reason_analyze_project` builds the CPG once; every other `reason_*` tool queries it cheaply. +- **Real evidence chains.** `reason_trace_taint_path` returns the full source-to-sink path with intermediate steps and code context — not "looks like SQLi maybe." ## How it works @@ -49,15 +66,17 @@ code-reason exposes nine MCP tools, grouped by purpose: | Group | Tool | Purpose | |---|---|---| -| Analysis | `reason_analyze_project` | Parse a project into a CPG | -| Discovery | `reason_find_entry_points` | Locate HTTP handlers and CLI entry points | -| Discovery | `reason_list_supported_checks` | Enumerate built-in vulnerability checks | -| Graph | `reason_find_callers` | "Who calls this function?" | -| Graph | `reason_find_callees` | "What does this function call?" | -| Graph | `reason_query_dataflow` | Forward/backward reachability over the data-flow graph | -| Vulnerability | `reason_scan_injections` | Catalog-driven taint analysis (SQLi, XSS, command injection) | -| Vulnerability | `reason_trace_taint_path` | Full source-to-sink evidence chain for a finding | -| Vulnerability | `reason_get_finding_detail` | Complete finding description and remediation | +| Setup | `reason_analyze_project` | Parse a project into a code property graph | +| Navigation | `reason_find_entry_points` | Locate HTTP handlers, CLI entries, framework hooks | +| Navigation | `reason_find_callers` | "Who calls this function?" | +| Navigation | `reason_find_callees` | "What does this function call?" | +| Data flow | `reason_query_dataflow` | Forward/backward reachability over the data-flow graph | +| Data flow | `reason_trace_taint_path` | Full source-to-sink evidence chain between any two points | +| Catalog scan (convenience) | `reason_scan_injections` | Catalog-driven taint analysis (SQLi/XSS/command injection) | +| Catalog scan (convenience) | `reason_list_supported_checks` | Enumerate built-in vulnerability checks | +| Catalog scan (convenience) | `reason_get_finding_detail` | Description + remediation for a scan finding | + +The marquee value is the **navigation** and **data flow** primitives — the agent composes them to answer whole-program questions on its own. The **catalog scan** tools are a convenience baseline for quick first-pass triage; the agent's own reasoning over the primitives is what makes the difference on real codebases. ## Prerequisites @@ -79,7 +98,7 @@ The launcher lands at `build/install/code-reason/bin/code-reason`. ./gradlew test ``` -Twelve integration tests run the full pipeline against small Java and Python fixtures. +Integration tests run the full pipeline against small Java and Python fixtures. ## Claude Code setup @@ -100,15 +119,15 @@ Restart Claude Code; the `reason_*` tools will appear in its tool list. ## Example workflow -A typical agent session looks like this: +A typical agent-driven session — primitives composing into evidence: -1. Agent calls `reason_analyze_project` on the target codebase to build the CPG. -2. Agent calls `reason_find_entry_points` to enumerate reachable inputs. -3. Agent calls `reason_scan_injections` to surface candidate data flows. -4. For each candidate, agent calls `reason_trace_taint_path` to confirm reachability and retrieve the evidence chain. -5. Agent reasons about exploitability from the structured result. +1. Agent calls `reason_analyze_project` to build the CPG. +2. Agent calls `reason_find_entry_points` to enumerate where external input enters the codebase. +3. For a suspicious entry point, agent uses `reason_find_callees` and `reason_query_dataflow` to map the downstream reach. +4. When the data reaches a sensitive call, agent calls `reason_trace_taint_path` for the full source-to-sink evidence chain. +5. Agent reasons about exploitability from the structured result and decides what to investigate next. -At any step the agent can pivot through the graph on its own using `reason_find_callers`, `reason_find_callees`, or `reason_query_dataflow`. +For quick first-pass triage, the agent can also call `reason_scan_injections` to surface candidate flows from the built-in catalog, then verify each one with `reason_trace_taint_path`. ## Supported languages @@ -119,7 +138,7 @@ Additional CPG frontends (C/C++, Go, TypeScript, JVM, LLVM, Ruby) can be enabled ## Status -code-reason is v0.1.0 — early, research-grade. The build is currently pinned to CPG `main-SNAPSHOT`; this will move to a stable `11.x` release once Fraunhofer publishes one to Maven Central. +code-reason is v0.1.0 — early, research-grade. CI runs on every push and pull request. The build is currently pinned to CPG `main-SNAPSHOT`; this will move to a stable `11.x` release once Fraunhofer publishes one to Maven Central. ## License