CodeWell Project Guide

This guide collects the project direction, technical stack, and engineering protocol for maintainers. Keep user-facing setup and usage details in README.md, contribution workflow in CONTRIBUTING.md, and release notes in CHANGELOG.md.

Current Status

CodeWell is pre-alpha. The current implementation covers the core local loop:

  • local indexing for Python, TypeScript, and JavaScript source files
  • incremental SQLite storage with FTS5 search
  • token-authenticated GitHub repository archive ingest for public and private repositories
  • detached-library intake for ZIP archives, source folders, bare code files, papers, and loose documents through one managed inbox
  • read-only protection for managed imported code, with best-effort Windows ACL hardening
  • optional auto-indexing immediately after intake import
  • context packs with graph metadata, budget-aware symbol traces, and multi-file expansion
  • workspace-local document attachments with explicit manual project/file/symbol links
  • intake-imported papers/documents surfaced as lightweight relevant references in context and MCP
  • revision memory for failed adaptations and verified fixes
  • CLI, MCP, and a read-only local UI over the same local engine

Detached-library mode, repair/admin surfaces, the read-only local ops UI, and the managed intake path are now part of the baseline. Still future work: broader multi-language parsing depth, detached-library ergonomics, optional embeddings, reranking, richer exploration views, and PDF-specific paper extraction, which is intentionally shallow today. See docs/ARCHITECTURE_V2.md for the next-phase plan and docs/IMPROVEMENT_PLAN.md for the current prioritized execution order.

MVP Goal

Build a small but complete local memory loop for AI coding agents:

  1. Index code.
  2. Retrieve useful context.
  3. Record failed reuse.
  4. Save a fixed revision with evidence.
  5. Recall that revision in a later task.

The MVP should validate product usefulness before optimizing for paper benchmarks.

MVP Scope

Phase 0: Repository Skeleton

  • Create a Python package layout.
  • Add pyproject.toml.
  • Add a basic CLI entry point named codewell.
  • Add test fixtures for small Python projects.

Phase 1: Local Project Index

  • Parse Python files with the standard-library ast parser for the MVP.
  • Keep the parser boundary modular enough to add Tree-sitter later.
  • Extract files, classes, functions, methods, imports, basic call edges, and line spans.
  • Store results in SQLite.
  • Add FTS5 indexing for file paths, symbol names, and source content.
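The extraction step in this phase can be illustrated with a minimal sketch built on the standard-library ast module. The `Symbol` dataclass and `extract_symbols` function are hypothetical names chosen for illustration, not CodeWell's actual parser interface, and the sketch covers only classes, functions, and methods with their line spans (no imports or call edges).

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical record for one extracted symbol."""
    kind: str        # "class", "function", or "method"
    name: str        # qualified name, e.g. "Greeter.greet"
    start_line: int
    end_line: int


def extract_symbols(source: str) -> list:
    """Collect classes, functions, and methods with their line spans."""
    symbols = []

    def visit(node: ast.AST, prefix: str, in_class: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                qualified = prefix + child.name
                end = child.end_lineno or child.lineno
                symbols.append(Symbol("class", qualified, child.lineno, end))
                visit(child, qualified + ".", True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                qualified = prefix + child.name
                kind = "method" if in_class else "function"
                end = child.end_lineno or child.lineno
                symbols.append(Symbol(kind, qualified, child.lineno, end))
                visit(child, qualified + ".", False)
            else:
                # Keep walking so definitions nested in if/try blocks are found.
                visit(child, prefix, in_class)

    visit(ast.parse(source), "", False)
    return symbols
```

A file with a syntax error raises `SyntaxError` from `ast.parse`; a real indexer would presumably catch that and record the file as unparsed rather than abort the run.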

Phase 2: Context Pack Retrieval

  • Implement lexical search over the local index.
  • Add graph expansion from seed symbols to neighboring imports, calls, tests, and files.
  • Return a structured context pack with selected files, selected symbols, graph metadata, source provenance, token-budget estimates, symbol traces, and selection explanations.
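As a rough illustration of graph expansion under a token budget, the sketch below walks outward from seed nodes breadth-first and records why each node was selected. The graph shape (a dict of `(edge_kind, neighbor)` lists), the `token_cost` mapping, and the function name `expand_context` are assumptions made for this example. CodeWell's real ranking and budgeting logic may differ.

```python
from collections import deque


def expand_context(graph, seeds, token_cost, budget, max_hops=2):
    """Breadth-first expansion from seeds, bounded by hops and a token budget.

    graph: dict mapping node -> list of (edge_kind, neighbor) pairs.
    token_cost: dict mapping node -> estimated tokens for that node.
    Returns (selected, spent) where selected maps node -> selection reason.
    """
    selected = {}
    spent = 0
    queue = deque((seed, 0, "seed") for seed in seeds)
    while queue:
        node, hops, reason = queue.popleft()
        if node in selected:
            continue
        cost = token_cost.get(node, 0)
        if spent + cost > budget:
            continue  # skip nodes that do not fit; cheaper ones may still fit
        selected[node] = reason
        spent += cost
        if hops < max_hops:
            for edge_kind, neighbor in graph.get(node, []):
                queue.append((neighbor, hops + 1, f"{edge_kind} from {node}"))
    return selected, spent
```

Keeping the reason string next to each selected node is one simple way to produce the selection explanations mentioned above without a second pass over the graph.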

Phase 3: GitHub URL Indexing

  • Accept a public GitHub repository URL.
  • Download the repository archive into a local cache.
  • Index it as a reference graph.
  • Record repository URL, branch or commit, file path, license when available, and retrieval time.

Phase 4: Revision Memory

  • Add records for snippet use, failure, and fix.
  • Store original snippet ID, target project, error log, failed command, patch diff, explanation, test command, test result, verification state, and applicability notes.
  • Never overwrite the original snippet.
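One way to honor the "never overwrite" rule is to keep snippets and revisions in separate tables and only ever insert into the revisions table. The schema and the helper names `open_memory` and `add_revision` below are hypothetical and only loosely follow the field list above; they are not CodeWell's actual schema or API.

```python
import sqlite3

# Hypothetical schema: snippets are written once, revisions are append-only.
SCHEMA = """
CREATE TABLE snippets (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL
);
CREATE TABLE revisions (
    id INTEGER PRIMARY KEY,
    snippet_id INTEGER NOT NULL REFERENCES snippets(id),
    error_log TEXT,
    patch_diff TEXT NOT NULL,
    explanation TEXT NOT NULL,
    test_command TEXT,
    test_result TEXT,
    verification_state TEXT NOT NULL
);
"""


def open_memory():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_revision(conn, snippet_id, patch_diff, explanation, *,
                 error_log=None, test_command=None, test_result=None):
    """Append a revision row; the original snippet row is never updated."""
    # Assumption: a revision counts as verified only when a test result says so.
    state = "verified" if test_result == "passed" else "unverified"
    cursor = conn.execute(
        "INSERT INTO revisions (snippet_id, error_log, patch_diff, explanation,"
        " test_command, test_result, verification_state)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (snippet_id, error_log, patch_diff, explanation,
         test_command, test_result, state),
    )
    conn.commit()
    return cursor.lastrowid
```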

Phase 5: MCP Server

  • Expose MCP tools:
    • index_workspace
    • search_code
    • trace_symbol
    • get_context_pack
    • get_database_status
    • record_failure
    • record_revision
    • search_revision_memory
  • Keep tool outputs structured and compact for coding agents.
  • Treat attached documents as lightweight optional references in context output, not as a second primary retrieval corpus.

Phase 6: Local UI

  • Build a simple local web UI.
  • Show repositories, files, symbols, snippets, failures, revisions, and verification states.
  • Add a graph view after the core loop is stable.

Success Criteria

  • A user can index a local source repository with supported languages.
  • A user can index a public GitHub repository by URL.
  • A coding agent can retrieve a useful context pack through MCP.
  • A failed snippet adaptation can be recorded.
  • A fixed revision can be recalled later with its explanation and verification evidence.
  • The default flow runs locally without paid APIs or a GPU.

Improvement TODO

Use this list to improve the project before public launch. Keep the default path local-first and avoid adding required LLM APIs, vector databases, hosted services, GPUs, or background daemons.

P0: Release Confidence

  • Add GitHub Actions CI for the full release gate.

    • Run python scripts/check_release.py on pull requests and pushes to main.
    • Cache Python dependencies where useful, but keep the workflow simple enough to debug.
    • Done when CI catches test, lint, type, package, and evaluation failures automatically.
  • Run real-project evaluations on 2-3 medium Python repositories.

    • See docs/REAL_PROJECT_EVALUATION.md.
    • Use scripts/evaluate_real_projects.py with a local manifest when evaluating multiple projects.
    • Create task JSON files with known expected files, symbols, and trace relationships.
    • Track context recall, context precision, trace recall, latency, and manual search steps.
    • Done when results show where CodeWell helps and where retrieval still fails.
  • Publish a concise release-readiness report.

    • See docs/RELEASE_READINESS.md.
    • Summarize current feature coverage, known limitations, evaluation results, and package smoke status.
    • Done when a new maintainer can decide whether the project is ready to publish from one page.

P1: Retrieval Quality

  • Address open-source evaluation misses.

    • Prefer defining files for qualified method queries such as Console.print.
    • Ensure trace output and trace evaluation can account for later outgoing calls such as HTTPAdapter.send -> self.build_response.
    • Done when the committed Click, Requests, and Rich task files pass or the remaining failures are documented as accepted limitations.
  • Improve graph expansion beyond direct call edges.

    • Add better links between imports, callers, callees, and related support files.
    • Keep context packs compact and budget-aware.
    • Done when real-project evaluations show fewer missing support files without lowering precision.
  • Add route, command, and test relationship extraction for Python projects.

    • Start with common static patterns instead of broad framework-specific logic.
    • Done when context packs can find likely entry points and tests for common bug-fix tasks.
  • Strengthen context pack selection explanations.

    • Explain why each selected file was included: symbol hit, path hit, call edge, import, revision memory, or fallback.
    • Done when an agent or maintainer can audit context selection without reading the ranking code.
  • Improve revision memory applicability checks.

    • Distinguish reusable fixes from one-off fixes more clearly.
    • Add stale/rejected workflows for revisions that no longer apply.
    • Done when revision search results carry enough evidence to decide whether to reuse them.
  • Expand TypeScript and JavaScript evaluation coverage on real projects.

    • Add at least 2 task files against local JS/TS repositories or stable fixtures with expected files, symbols, and trace relationships.
    • Include queries that depend on object methods, namespace/module blocks, decorated methods, accessors, barrel-file expansion, route entrypoints, and re-export chains.
    • Bundled fixture-suite manifest coverage now exists in evaluations/fixture_suite_manifest.json, so TS/JS retrieval changes can be checked at the task-report level. Broader external-project evaluation is still pending.
    • Done when retrieval quality is measured on non-trivial TS/JS tasks instead of fixture-only confidence.

P1: Developer Experience

  • Add clearer first-run examples for installed users.

    • Include a short copy-paste flow: create a tiny project, index, search, trace, context, and record revision memory.
    • Done when a user can verify the package in under five minutes after installation.
  • Improve error messages for missing databases, unsupported URLs, and empty search results.

    • Suggest the next command to run, such as codewell index <path>.
    • Done when common mistakes produce actionable CLI guidance.
  • Add stable JSON examples for CLI and MCP outputs.

    • Keep examples short and update them when output contracts change.
    • Done when downstream agent integrations can use the docs as contract examples.
  • Start the detached-library path model.

    • codewell index ... --library-root <path> now stores derived artifacts outside the raw source tree and writes a workspace manifest for the indexed source root.
    • This is the first step toward the Raw / Derived / Manifest architecture in docs/ARCHITECTURE_V2.md.

P2: Planned Product Scope

  • Add private GitHub repository support with explicit token handling.

    • codewell index ... --github-token now supports token-authenticated GitHub metadata and archive requests.
    • CODEWELL_GITHUB_TOKEN and GITHUB_TOKEN are supported as environment-variable fallbacks.
    • Credential-bearing GitHub URLs are rejected so tokens are not stored in provenance metadata.
    • Richer auth flows and host variants remain future work.
  • Add TypeScript and JavaScript parsing.

    • Keep parser boundaries modular; avoid weakening Python indexing.
    • Initial support uses Python ast for Python and lightweight heuristic parsing for TS/JS.
    • Current coverage includes common functions, classes, methods, accessors, object literal methods, namespace/module blocks, decorator-prefixed methods, class-field arrow methods, object-property arrow functions, class-local this. and super. call resolution, import/export forms, re-exports, and relative-import graph expansion.
    • Done when mixed-language repositories can be indexed and covered by fixture and evaluation tests.
  • Deepen the TS/JS parser beyond the current heuristic baseline.

    • Next execution order:
      1. export { x as y } from ... and mixed multi-hop re-export chains
      2. route entrypoint and router-mount fixture evaluation promoted to task-level evaluation
      3. broader real-project JS/TS evaluations before adding new languages
      4. only then consider additional languages or optional reranking
    • Improve trace usefulness where static inference is cheap and low-risk, but avoid pretending to understand dynamic dispatch that the current heuristic parser cannot prove.
    • Current local heuristic baseline already covers:
      • functions, classes, methods, accessors, object literal methods
      • namespace/module blocks and decorator-prefixed methods
      • class-field arrow methods and object-property arrow functions
      • class-local this. and super. call resolution
      • object-method-local this. call resolution for non-arrow object methods
      • route registration handlers for .get/.post/.put/.patch/.delete/.use/.all
      • framework-style export signatures such as export default async function handler(...), export const GET = ..., and export async function POST(...)
      • relative imports, alias imports such as @/, ~/, #/, barrel files, re-exports, export * chains, and importer-side graph expansion
    • Done when additional TS/JS syntax can be covered without materially increasing false symbols or false call edges.

Continuation Notes

Use this section as the handoff snapshot after clearing chat history.

Last Verified Snapshot

Last broad release-gate snapshot was verified on May 12, 2026.

  • python -m pytest: 110 passed
  • python -m ruff check .: passed
  • python -m mypy: passed
  • python scripts/check_release.py: passed

Last focused productization snapshot was verified on May 13, 2026.

  • python -m pytest tests/test_library_status.py tests/test_cli.py tests/test_ui.py: passed
  • python -m pytest tests/test_archive_ingest.py tests/test_cli.py: passed
  • python -m ruff check src/codewell/commands.py src/codewell/library_status.py src/codewell/cli.py src/codewell/ui.py tests/test_library_status.py tests/test_cli.py tests/test_ui.py: passed
  • python -m mypy src tests: passed

Important note:

  • The full release gate was not rerun after the latest ingest-recovery, UI affordance, and shared command-builder changes. Before any release decision, rerun python scripts/check_release.py.

Important Recent Changes

  • Context-pack graph expansion now works in multiple rounds instead of a single hop.
  • Relative TS/JS imports are normalized correctly, including paths such as ../auth.
  • TS/JS context expansion now supports:
    • alias import to source
    • alias -> barrel -> source
    • alias -> export * -> source
    • source -> routes -> app/router-entry via reverse importer expansion
  • TS/JS parser now extracts useful edges for:
    • object-method this. resolution
    • common route registration handlers
    • framework-style route export signatures
    • router mount calls such as app.use('/auth', router)
  • Detached-library mode is now the intended trusted path:
    • codewell init-library
    • raw/derived boundary docs
    • manifest-based DB discovery
    • UI/library-status/repair-plan coverage
  • Ingest recovery is now substantially stronger:
    • local folder, GitHub URL, and ZIP ingest all return more explicit recovery hints
    • failed ingest runs are recorded with stage history and surfaced in CLI and UI
    • empty ZIP archives are rejected explicitly
    • unsafe ZIP paths such as absolute entries or .. are rejected explicitly
    • runs that index 0 supported files now emit a strong warning instead of a silent low-signal success
  • UI productization now includes:
    • workspace health
    • repair queue
    • repair audit filters
    • provenance and raw/derived boundary views
    • ingest stage drill-down
    • onboarding hints
    • copyable inspect/suggested commands for failed ingest runs
  • Shared command generation now exists in src/codewell/commands.py:
    • CLI, UI, and detached-library repair/status surfaces no longer hand-build these command strings independently
  • Detached-library rebuild hints now include --library-root consistently.
  • When a manifest is missing but the last ingest failed, library status can now recover the source path from the last materialized root and still emit a useful rebuild hint.
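Among the changes listed above, normalizing relative TS/JS imports such as ../auth is easy to illustrate in isolation. The sketch below is a simplified guess at the idea, not the project's implementation: `resolve_relative_import` is a hypothetical helper that resolves a specifier against the importing file's directory and leaves file-extension and index-file probing to the caller.

```python
import posixpath
from typing import Optional


def resolve_relative_import(importer: str, specifier: str) -> Optional[str]:
    """Resolve a relative import specifier against the importing file.

    Returns a normalized POSIX-style path without extension, or None for
    bare or alias imports (for example "react" or "@/auth"), which need a
    different resolution strategy.
    """
    if not specifier.startswith("."):
        return None
    base = posixpath.dirname(importer)
    return posixpath.normpath(posixpath.join(base, specifier))
```

For example, `resolve_relative_import("src/routes/login.ts", "../auth")` yields `"src/auth"`, which a caller could then match against candidates such as src/auth.ts or src/auth/index.ts.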

Important Design Ideas

  • Keep TS/JS parsing heuristic and conservative. Do not replace it with a large parser rewrite unless real-project evaluation proves the current ceiling is too low.
  • Prefer retrieval wins that improve search, trace, and context together.
  • Treat context expansion as graph navigation, not just lexical ranking:
    • imported file
    • importer file
    • callee definition file
    • caller file
    • route / command / test entrypoints
  • Promote useful fixture-only wins into task-level evaluation before claiming them as stable product capability.
  • Do not add required LLM APIs or embeddings to solve current parser/retrieval gaps. Measure the lexical + graph baseline first.
  • Preserve raw-source immutability as a product invariant, not just a documentation preference.
  • Prefer one shared command-builder/helper over duplicated UI/CLI string assembly whenever a user needs to copy or rerun commands.
  • Do not expand MCP/agent orchestration aggressively yet; keep interfaces modular, but prioritize local product trust, evaluation depth, and operational clarity first.

Recommended Next Tasks

  1. Run at least 2 real JS/TS project evaluations with task JSON files, not just unit fixtures.
  2. Review graph precision after raising TS/JS graph-expansion depth and candidate limits.
  3. Add mixed named re-export plus export * task coverage once a real project exposes it.
  4. If JS/TS evaluation exposes real misses, improve retrieval using conservative graph changes before introducing embeddings or reranking.
  5. Only after retrieval evidence improves, expand UI detail/file/graph views further.

Recommended Conversation Restart Prompt

If chat history is cleared, resume with a prompt like:

Continue CodeWell from docs/PROJECT_GUIDE.md Continuation Notes. Focus on the current next tasks, do not expand agent/MCP scope yet, preserve raw-source immutability, and prefer detached-library product completion and JS/TS evaluation depth over new surface area.

  • Build a read-only local UI for repositories, workspace health, provenance, repair state, and revision inspection.

    • codewell serve --ui now serves repository status, code search, revision-memory search, ingest history, detached workspace health, repair queue state, repair-audit summaries, provenance, and raw/derived boundary views.
    • The current UI is intentionally read-only and optimized for inspection, not mutation.
    • Remaining scope: file detail views, failure browsing, and graph views.
  • Add optional embeddings or reranking only after real-project evaluations justify it.

    • Keep lexical search as the default and embeddings as an explicit enhancement.
    • Done when an evaluation shows measurable retrieval improvement over the local lexical baseline.
  • Add richer GitHub ingest strategies.

    • Consider Git Trees API, partial clone, sparse checkout, and better cache invalidation.
    • Done when large public repositories can be indexed without downloading unnecessary files.

Evaluation

Use docs/EVALUATION.md for task-level usefulness evaluation. The fixture baseline is runnable with:

python scripts/evaluate_fixture.py

Project-specific task lists can be evaluated with:

python scripts/evaluate_project.py /path/to/python-project tasks.json

The maintained self-evaluation and fixture task lists are run with:

python scripts/evaluate_project.py . evaluations/codewell_self.json
python scripts/evaluate_project.py . evaluations/codewell_natural.json
python scripts/evaluate_project.py tests/fixtures/typescript_basic tests/fixtures/evaluation_tasks/typescript_basic.json
python scripts/evaluate_project.py tests/fixtures/typescript_barrel tests/fixtures/evaluation_tasks/typescript_barrel.json
python scripts/evaluate_project.py tests/fixtures/typescript_extended tests/fixtures/evaluation_tasks/typescript_extended.json
python scripts/evaluate_project.py tests/fixtures/typescript_arrow tests/fixtures/evaluation_tasks/typescript_arrow.json
python scripts/evaluate_project.py tests/fixtures/typescript_routes tests/fixtures/evaluation_tasks/typescript_routes.json
python scripts/evaluate_project.py tests/fixtures/typescript_reexport tests/fixtures/evaluation_tasks/typescript_reexport.json

Unit tests verify correctness of individual modules. Evaluation checks whether the full local loop retrieves useful context and revision memory for a task, then reports recall, precision, budget, and latency metrics for comparison across project snapshots.
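To make the recall and precision terms concrete, here is a minimal sketch of how they could be computed for one task from the expected and selected file sets. The function name `context_metrics` and the handling of empty sets are assumptions. The evaluation scripts may define these metrics differently.

```python
def context_metrics(expected: set, selected: set) -> dict:
    """Compute context recall and precision for a single evaluation task.

    recall    = share of expected items that were selected
    precision = share of selected items that were expected
    """
    hits = expected & selected
    # Assumption: a task with no expectations is trivially fully recalled,
    # and an empty selection has zero precision.
    recall = len(hits) / len(expected) if expected else 1.0
    precision = len(hits) / len(selected) if selected else 0.0
    return {"recall": recall, "precision": precision}
```

For instance, expecting `{"auth.py", "session.py"}` and selecting `{"auth.py", "utils.py"}` gives a recall of 0.5 and a precision of 0.5.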

Out of Scope for MVP

  • Cloud hosting.
  • Automatic execution of untrusted external code.
  • Large-scale benchmark runs.
  • Deep multi-language support beyond the current Python and TS/JS baseline.
  • Required vector databases.
  • Required LLM API calls.

Technology Stack

Use a local-first, lightweight stack. The default path must run on a personal computer without paid APIs, GPUs, or a hosted service.

Language

  • Python 3.10+ for the MVP.
  • Keep the core modular enough to migrate hot paths to Rust or Go later if needed.

Parsing

  • MVP parser: Python's standard-library ast module.
  • Keep the parser interface modular enough to add stronger language parsers later.
  • Current baseline: Python via ast, plus heuristic TypeScript/JavaScript extraction.
  • Near-term target: strengthen TS/JS coverage before adding more languages.

Storage

  • SQLite as the primary local database.
  • SQLite FTS5 for lexical search over paths, symbols, docstrings, comments, and code slices.
  • Search ranking should weight path and symbol matches above body-only matches.
  • Natural-language queries should be normalized into code-search terms without requiring external embeddings.
  • Avoid a required vector database in the MVP.
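The ranking rule above (path and symbol matches outrank body-only matches) maps naturally onto per-column weights in FTS5's built-in `bm25()` function. The sketch below assumes a Python build whose SQLite includes FTS5. The table name `code_fts`, its three columns, and the weights 5.0, 10.0, and 1.0 are illustrative choices, not CodeWell's actual schema or tuning.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table from (path, symbol, body) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE code_fts USING fts5(path, symbol, body)")
    conn.executemany(
        "INSERT INTO code_fts (path, symbol, body) VALUES (?, ?, ?)", rows
    )
    return conn


def search(conn, query, limit=10):
    """Return (path, symbol) pairs, best match first.

    bm25() returns lower (more negative) values for better matches, so an
    ascending sort puts the best match first. The weights follow column
    order: path, symbol, body.
    """
    sql = """
        SELECT path, symbol
        FROM code_fts
        WHERE code_fts MATCH ?
        ORDER BY bm25(code_fts, 5.0, 10.0, 1.0)
        LIMIT ?
    """
    return conn.execute(sql, (query, limit)).fetchall()
```

With these weights, a row whose symbol is `login` should rank above a row that only mentions `login` in its body.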

Optional Embeddings

  • Embeddings are optional query-time enhancements, not a required indexing dependency.
  • Prefer small local models such as MiniLM, bge-small, or a lightweight code embedding model.
  • External embedding APIs can be added through a provider interface.

Optional LLM Providers

  • LLM calls are optional and should not run during default indexing.
  • Supported use cases include query rewriting, result reranking, revision summaries, applicability notes, and failure pattern classification.
  • Provider interfaces should support BYOK for OpenAI, Anthropic, Gemini, local Ollama, and compatible APIs.

GitHub Ingest

  • Current baseline: download repository archives by URL and cache them by owner, repo, and branch or commit.
  • GitHub token auth now supports private repositories and higher API limits without storing credentials in provenance metadata.
  • Store source provenance: URL, commit SHA when available, license, file path, and retrieval time.
  • Later: add detached-library defaults, richer cache management, Git Trees API, partial clone, and sparse checkout.
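The token-handling rules can be sketched as two small helpers: one that picks a token from an explicit value or the documented environment variables, and one that rejects credential-bearing URLs before they reach provenance metadata. The function names are hypothetical, and the lookup order (explicit value, then CODEWELL_GITHUB_TOKEN, then GITHUB_TOKEN) is an assumption, since the guide does not state a precedence.

```python
import os
from urllib.parse import urlsplit

# Assumed precedence; the guide only says both variables are supported.
TOKEN_ENV_VARS = ("CODEWELL_GITHUB_TOKEN", "GITHUB_TOKEN")


def resolve_token(explicit=None, environ=None):
    """Return the explicit token if given, else the first env fallback."""
    env = os.environ if environ is None else environ
    if explicit:
        return explicit
    for name in TOKEN_ENV_VARS:
        value = env.get(name)
        if value:
            return value
    return None


def reject_credential_url(url):
    """Raise ValueError if the URL embeds a username or password."""
    parts = urlsplit(url)
    if parts.username or parts.password:
        raise ValueError(
            "credential-bearing GitHub URLs are rejected; pass the token separately"
        )
    return url
```

Checking the URL before any network call keeps a token pasted into the URL from being written to the cache key or the stored repository URL.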

Agent Integration

  • MCP is the primary integration protocol.
  • CLI should exist for direct human testing and scripting.
  • CodeWell receives records from coding agents but does not replace them.

UI

  • Current baseline: read-only local UI for repository status, search, revision memory, ingest history, workspace health, repair queue state, repair audit, provenance, and raw/derived boundary inspection.
  • Later: file detail views and a graph view covering repo nodes, snippets, sources, failures, fixes, and revision branches.

Performance Rules

  • Skip ignored and generated directories such as .git, .venv, node_modules, dist, build, and cache folders.
  • Hash files and only re-index changed content.
  • Store compact source spans instead of duplicating full files where possible.
  • Keep all long-running indexing and verification tasks cancellable.
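The first two rules (skip ignored directories, re-index only changed content) can be sketched with `os.walk` and SHA-256 content hashes. The ignore set, the helper names `hash_tree` and `changed_files`, and the choice of SHA-256 are assumptions for illustration. The real indexer may use a different hash or ignore list.

```python
import hashlib
import os

# Illustrative ignore set based on the directories named above.
IGNORED_DIRS = {".git", ".venv", "node_modules", "dist", "build", "__pycache__"}


def hash_tree(root):
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    hashes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            hashes[os.path.relpath(full, root)] = digest
    return hashes


def changed_files(previous, current):
    """Return paths that are new or whose content hash differs."""
    return sorted(p for p, digest in current.items() if previous.get(p) != digest)
```

A caller would persist the result of `hash_tree` after each run and pass it as `previous` on the next one. Deleted files are the keys present in `previous` but missing from `current`.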

Ingest Recovery Rules

  • Raw workspace trees and original ZIP archives are immutable inputs; recovery workflows must not mutate them.
  • Failed detached-library ingest runs should leave enough history for codewell ingest-history to explain whether failure happened in plan, materialize, or index.
  • Empty archives, unsafe archive paths, and zero-supported-source runs should produce explicit operator guidance instead of silent low-signal results.
  • When indexing succeeds with 0 supported files, prefer a strong warning plus next steps over turning that case into a hard failure.
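A pre-extraction check for empty archives and unsafe entry paths could look like the sketch below. It only reads the archive's entry list, so the original ZIP stays untouched. `UnsafeArchiveError` and `check_archive` are hypothetical names, and the exact rules CodeWell applies may be stricter.

```python
import io
import zipfile
from pathlib import PurePosixPath


class UnsafeArchiveError(ValueError):
    """Hypothetical error for archives that must not be materialized."""


def check_archive(data: bytes) -> list:
    """Return file entry names, or raise if the archive is empty or unsafe."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    if not names:
        raise UnsafeArchiveError("archive contains no files")
    for name in names:
        normalized = name.replace("\\", "/")
        path = PurePosixPath(normalized)
        has_drive = len(normalized) > 1 and normalized[1] == ":"
        if path.is_absolute() or ".." in path.parts or has_drive:
            raise UnsafeArchiveError(f"unsafe archive path: {name}")
    return names
```

Raising a dedicated error type makes it straightforward for a caller to record the failure stage and print explicit operator guidance instead of a generic traceback.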

Engineering Protocol

Use this protocol before implementing non-trivial CodeWell changes.

1. Define The Boundary

State which subsystem is being changed:

  • parser
  • index store
  • GitHub ingest
  • retrieval
  • context packing
  • revision memory
  • verification
  • MCP
  • UI

Keep each change focused on one boundary unless integration work is required.

2. Identify Edge Cases

Before coding, consider at least two relevant edge cases, such as:

  • syntax errors in source files
  • unreadable files
  • generated files
  • duplicate symbol names
  • dynamic imports
  • missing GitHub metadata
  • unsupported licenses
  • stale cached repositories
  • failing test commands
  • unverifiable agent summaries

3. Define Verification

Every change should have a clear verification path:

  • unit test
  • fixture-based parser output
  • CLI smoke test
  • SQLite schema check
  • MCP tool call test
  • revision state transition test

4. Keep The Core Lightweight

Do not add required LLM calls, vector databases, background daemons, or hosted services to the default path.

Optional providers are allowed only behind explicit configuration.

5. Preserve Provenance

Any external code, snippet, or revision must record:

  • source URL or local path
  • commit or snapshot ID when available
  • file path and line span
  • license metadata when available
  • retrieval time
  • verification state
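The required fields above could be captured in a single immutable record. The frozen dataclass below is a hypothetical shape for such a record, with field names invented for illustration. Optional fields default to `None` to reflect the "when available" wording.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record; frozen so it cannot be edited later."""
    source: str                          # source URL or local path
    file_path: str
    start_line: int
    end_line: int
    retrieved_at: datetime
    verification_state: str = "unverified"
    snapshot_id: Optional[str] = None    # commit or snapshot ID when available
    license: Optional[str] = None        # license metadata when available

    def __post_init__(self):
        if self.start_line < 1 or self.end_line < self.start_line:
            raise ValueError("line span must satisfy 1 <= start_line <= end_line")
```

Because the dataclass is frozen, attempts to assign to a field after construction raise `dataclasses.FrozenInstanceError`, which fits the rule that original records are immutable.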

6. Do Not Pollute Source Memory

Original snippets and repositories are immutable records. Fixes must be stored as revision branches with explanation and evidence.