This guide collects the project direction, technical stack, and engineering protocol for
maintainers. Keep user-facing setup and usage details in README.md, contribution workflow in
CONTRIBUTING.md, and release notes in CHANGELOG.md.
CodeWell is pre-alpha. The current implementation covers the core local loop:
- local indexing for Python, TypeScript, and JavaScript source files
- incremental SQLite storage with FTS5 search
- token-authenticated GitHub repository archive ingest for public and private repositories
- detached-library intake for ZIP archives, source folders, bare code files, papers, and loose documents through one managed inbox
- read-only protection for managed imported code, with best-effort Windows ACL hardening
- optional auto-indexing immediately after intake import
- context packs with graph metadata, budget-aware symbol traces, and multi-file expansion
- workspace-local document attachments with explicit manual project/file/symbol links
- intake-imported papers/documents surfaced as lightweight relevant references in context and MCP
- revision memory for failed adaptations and verified fixes
- CLI, MCP, and a read-only local UI over the same local engine
Broader multi-language parsing depth, detached-library ergonomics, optional embeddings, reranking,
and richer exploration views remain future work. Detached-library mode, repair/admin surfaces, the
read-only local ops UI, and the managed intake path are now part of the baseline. PDF-specific
paper extraction is still intentionally shallow and remains future work. See
docs/ARCHITECTURE_V2.md for the next-phase plan. For the current prioritized execution order,
see docs/IMPROVEMENT_PLAN.md.
Build a small but complete local memory loop for AI coding agents:
- Index code.
- Retrieve useful context.
- Record failed reuse.
- Save a fixed revision with evidence.
- Recall that revision in a later task.
The MVP should validate product usefulness before optimizing for paper benchmarks.
- Create a Python package layout.
- Add `pyproject.toml`.
- Add a basic CLI entry point named `codewell`.
- Add test fixtures for small Python projects.
- Parse Python files with the standard-library `ast` parser for the MVP.
- Keep the parser boundary modular enough to add Tree-sitter later.
- Extract files, classes, functions, methods, imports, basic call edges, and line spans.
- Store results in SQLite.
- Add FTS5 indexing for file paths, symbol names, and source content.
- Implement lexical search over the local index.
- Add graph expansion from seed symbols to neighboring imports, calls, tests, and files.
- Return a structured context pack with selected files, selected symbols, graph metadata, source provenance, token-budget estimates, symbol traces, and selection explanations.
- Accept a public GitHub repository URL.
- Download the repository archive into a local cache.
- Index it as a reference graph.
- Record repository URL, branch or commit, file path, license when available, and retrieval time.
- Add records for snippet use, failure, and fix.
- Store original snippet ID, target project, error log, failed command, patch diff, explanation, test command, test result, verification state, and applicability notes.
- Never overwrite the original snippet.
- Expose MCP tools:
  - `index_workspace`
  - `search_code`
  - `trace_symbol`
  - `get_context_pack`
  - `get_database_status`
  - `record_failure`
  - `record_revision`
  - `search_revision_memory`
- Keep tool outputs structured and compact for coding agents.
- Treat attached documents as lightweight optional references in context output, not as a second primary retrieval corpus.
- Build a simple local web UI.
- Show repositories, files, symbols, snippets, failures, revisions, and verification states.
- Add a graph view after the core loop is stable.
- A user can index a local source repository with supported languages.
- A user can index a public GitHub repository by URL.
- A coding agent can retrieve a useful context pack through MCP.
- A failed snippet adaptation can be recorded.
- A fixed revision can be recalled later with its explanation and verification evidence.
- The default flow runs locally without paid APIs or a GPU.
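The parser step listed above (extracting classes, functions, methods, imports, and line spans with the standard-library `ast` module) can be sketched roughly as follows. This is a minimal illustration, not CodeWell's actual parser: `SymbolRecord` and `extract_symbols` are hypothetical names, and call-edge extraction is left out.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolRecord:
    """Hypothetical record shape; the real schema may differ."""

    kind: str  # "class", "function", or "method"
    name: str  # qualified name such as "Greeter.greet"
    start_line: int
    end_line: int


def extract_symbols(source: str) -> tuple[list[SymbolRecord], list[str]]:
    """Return (symbols, imported module names) for one Python source string."""
    tree = ast.parse(source)
    symbols: list[SymbolRecord] = []
    imports: list[str] = []

    def visit(node: ast.AST, prefix: str, in_class: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                qualified = prefix + child.name
                end = child.end_lineno or child.lineno
                symbols.append(SymbolRecord("class", qualified, child.lineno, end))
                visit(child, qualified + ".", True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                qualified = prefix + child.name
                kind = "method" if in_class else "function"
                end = child.end_lineno or child.lineno
                symbols.append(SymbolRecord(kind, qualified, child.lineno, end))
                visit(child, qualified + ".", False)
            elif isinstance(child, ast.Import):
                imports.extend(alias.name for alias in child.names)
            elif isinstance(child, ast.ImportFrom):
                # Relative imports keep their leading dots, e.g. "." or "..pkg".
                imports.append("." * child.level + (child.module or ""))
            else:
                visit(child, prefix, in_class)

    visit(tree, "", False)
    return symbols, imports
```

Keeping the result as plain records is one way to leave room for a different parser (for example Tree-sitter) behind the same boundary later.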
Use this list to improve the project before public launch. Keep the default path local-first and avoid adding required LLM APIs, vector databases, hosted services, GPUs, or background daemons.
- Add GitHub Actions CI for the full release gate.
  - Run `python scripts/check_release.py` on pull requests and pushes to `main`.
  - Cache Python dependencies where useful, but keep the workflow simple enough to debug.
  - Done when CI catches test, lint, type, package, and evaluation failures automatically.
- Run real-project evaluations on 2-3 medium Python repositories.
  - See `docs/REAL_PROJECT_EVALUATION.md`.
  - Use `scripts/evaluate_real_projects.py` with a local manifest when evaluating multiple projects.
  - Create task JSON files with known expected files, symbols, and trace relationships.
  - Track context recall, context precision, trace recall, latency, and manual search steps.
  - Done when results show where CodeWell helps and where retrieval still fails.
- Publish a concise release-readiness report.
  - See `docs/RELEASE_READINESS.md`.
  - Summarize current feature coverage, known limitations, evaluation results, and package smoke status.
  - Done when a new maintainer can decide whether the project is ready to publish from one page.
- Address open-source evaluation misses.
  - Prefer defining files for qualified method queries such as `Console.print`.
  - Ensure trace output and trace evaluation can account for later outgoing calls such as `HTTPAdapter.send -> self.build_response`.
  - Done when the committed Click, Requests, and Rich task files pass or the remaining failures are documented as accepted limitations.
- Improve graph expansion beyond direct call edges.
  - Add better links between imports, callers, callees, and related support files.
  - Keep context packs compact and budget-aware.
  - Done when real-project evaluations show fewer missing support files without lowering precision.
- Add route, command, and test relationship extraction for Python projects.
  - Start with common static patterns instead of broad framework-specific logic.
  - Done when context packs can find likely entry points and tests for common bug-fix tasks.
- Strengthen context pack selection explanations.
  - Explain why each selected file was included: symbol hit, path hit, call edge, import, revision memory, or fallback.
  - Done when an agent or maintainer can audit context selection without reading the ranking code.
- Improve revision memory applicability checks.
  - Distinguish reusable fixes from one-off fixes more clearly.
  - Add stale/rejected workflows for revisions that no longer apply.
  - Done when revision search results carry enough evidence to decide whether to reuse them.
- Expand TypeScript and JavaScript evaluation coverage on real projects.
  - Add at least 2 task files against local JS/TS repositories or stable fixtures with expected files, symbols, and trace relationships.
  - Include queries that depend on object methods, namespace/module blocks, decorated methods, accessors, barrel-file expansion, route entrypoints, and re-export chains.
  - Bundled fixture-suite manifest coverage now exists in `evaluations/fixture_suite_manifest.json` so TS/JS retrieval changes can be checked at the task-report level, but broader external project evaluation is still pending.
  - Done when retrieval quality is measured on non-trivial TS/JS tasks instead of fixture-only confidence.
- Add clearer first-run examples for installed users.
  - Include a short copy-paste flow: create a tiny project, index, search, trace, context, and record revision memory.
  - Done when a user can verify the package in under five minutes after installation.
- Improve error messages for missing databases, unsupported URLs, and empty search results.
  - Suggest the next command to run, such as `codewell index <path>`.
  - Done when common mistakes produce actionable CLI guidance.
- Add stable JSON examples for CLI and MCP outputs.
  - Keep examples short and update them when output contracts change.
  - Done when downstream agent integrations can use the docs as contract examples.
- Start the detached-library path model.
  - `codewell index ... --library-root <path>` now stores derived artifacts outside the raw source tree and writes a workspace manifest for the indexed source root.
  - This is the first step toward the `Raw / Derived / Manifest` architecture in `docs/ARCHITECTURE_V2.md`.
- Add private GitHub repository support with explicit token handling.
  - `codewell index ... --github-token` now supports token-authenticated GitHub metadata and archive requests.
  - `CODEWELL_GITHUB_TOKEN` and `GITHUB_TOKEN` are supported as environment-variable fallbacks.
  - Credential-bearing GitHub URLs are rejected so tokens are not stored in provenance metadata.
  - Richer auth flows and host variants remain future work.
- Add TypeScript and JavaScript parsing.
  - Keep parser boundaries modular; avoid weakening Python indexing.
  - Initial support uses Python `ast` for Python and lightweight heuristic parsing for TS/JS.
  - Current coverage includes common functions, classes, methods, accessors, object literal methods, namespace/module blocks, decorator-prefixed methods, class-field arrow methods, object-property arrow functions, class-local `this.` and `super.` call resolution, import/export forms, re-exports, and relative-import graph expansion.
  - Done when mixed-language repositories can be indexed and covered by fixture and evaluation tests.
- Deepen the TS/JS parser beyond the current heuristic baseline.
  - Next execution order:
    - `export { x as y } from ...` and mixed multi-hop re-export chains
    - route entrypoint and router-mount fixture evaluation promoted to task-level evaluation
    - broader real-project JS/TS evaluations before adding new languages
    - only then consider additional languages or optional reranking
  - Improve trace usefulness where static inference is cheap and low-risk, but avoid pretending to understand dynamic dispatch that the current heuristic parser cannot prove.
  - Current local heuristic baseline already covers:
    - functions, classes, methods, accessors, object literal methods
    - namespace/module blocks and decorator-prefixed methods
    - class-field arrow methods and object-property arrow functions
    - class-local `this.` and `super.` call resolution
    - object-method-local `this.` call resolution for non-arrow object methods
    - route registration handlers for `.get/.post/.put/.patch/.delete/.use/.all`
    - framework-style export signatures such as `export default async function handler(...)`, `export const GET = ...`, and `export async function POST(...)`
    - relative imports, alias imports such as `@/`, `~/`, `#/`, barrel files, re-exports, `export *` chains, and importer-side graph expansion
  - Done when additional TS/JS syntax can be covered without materially increasing false symbols or false call edges.
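To illustrate what "heuristic and conservative" can mean for route registration handlers, here is a minimal pattern-based sketch. It is an assumption about the general approach, not CodeWell's parser: `ROUTE_CALL` and `find_route_registrations` are hypothetical names, and the pattern deliberately recognizes only a string-literal path followed by a single identifier handler.

```python
import re

# Matches calls such as `router.get('/users', listUsers)`.
# Anything more dynamic (computed paths, inline handlers, middleware chains)
# is skipped on purpose rather than guessed at.
ROUTE_CALL = re.compile(
    r"""\b(?P<receiver>[A-Za-z_$][\w$]*)\s*\.\s*
        (?P<method>get|post|put|patch|delete|use|all)\s*\(\s*
        (?P<quote>['"])(?P<path>[^'"]*)(?P=quote)\s*,\s*
        (?P<handler>[A-Za-z_$][\w$]*)\s*\)""",
    re.VERBOSE,
)


def find_route_registrations(source: str) -> list[tuple[int, str, str, str]]:
    """Return (line, method, path, handler) for each simple route registration."""
    found = []
    for match in ROUTE_CALL.finditer(source):
        line = source.count("\n", 0, match.start()) + 1
        found.append((line, match["method"], match["path"], match["handler"]))
    return found
```

A pattern this narrow misses real registrations, but every edge it reports is one the source text directly supports, which matches the goal of not increasing false call edges.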
Use this section as the handoff snapshot after clearing chat history.
Last broad release-gate snapshot was verified on May 12, 2026.
- `python -m pytest`: 110 passed
- `python -m ruff check .`: passed
- `python -m mypy`: passed
- `python scripts/check_release.py`: passed
Last focused productization snapshot was verified on May 13, 2026.
- `python -m pytest tests/test_library_status.py tests/test_cli.py tests/test_ui.py`: passed
- `python -m pytest tests/test_archive_ingest.py tests/test_cli.py`: passed
- `python -m ruff check src/codewell/commands.py src/codewell/library_status.py src/codewell/cli.py src/codewell/ui.py tests/test_library_status.py tests/test_cli.py tests/test_ui.py`: passed
- `python -m mypy src tests`: passed
Important note:
- The full release gate was not rerun after the latest ingest-recovery, UI affordance, and shared command-builder changes. Before any release decision, rerun `python scripts/check_release.py`.
- Context-pack graph expansion now works in multiple rounds instead of a single hop.
- Relative TS/JS imports are normalized correctly, including paths such as `../auth`.
- TS/JS context expansion now supports:
  - alias import to source
  - alias -> barrel -> source
  - alias -> `export *` -> source
  - source -> routes -> app/router-entry via reverse importer expansion
- TS/JS parser now extracts useful edges for:
  - object-method `this.` resolution
  - common route registration handlers
  - framework-style route export signatures
  - router mount calls such as `app.use('/auth', router)`
- Detached-library mode is now the intended trusted path:
  - `codewell init-library`
  - raw/derived boundary docs
  - manifest-based DB discovery
  - UI/library-status/repair-plan coverage
- Ingest recovery is now substantially stronger:
  - local folder, GitHub URL, and ZIP ingest all return more explicit recovery hints
  - failed ingest runs are recorded with stage history and surfaced in CLI and UI
  - empty ZIP archives are rejected explicitly
  - unsafe ZIP paths such as absolute entries or `..` are rejected explicitly
  - runs that index `0` supported files now emit a strong warning instead of a silent low-signal success
- UI productization now includes:
  - workspace health
  - repair queue
  - repair audit filters
  - provenance and raw/derived boundary views
  - ingest stage drill-down
  - onboarding hints
  - copyable inspect/suggested commands for failed ingest runs
- Shared command generation now exists in `src/codewell/commands.py`:
  - CLI, UI, and detached-library repair/status surfaces no longer hand-build these command strings independently
- Detached-library rebuild hints now include `--library-root` consistently.
- When a manifest is missing but the last ingest failed, library status can now recover the source path from the last materialized root and still emit a useful rebuild hint.
- Keep TS/JS parsing heuristic and conservative. Do not replace it with a large parser rewrite unless real-project evaluation proves the current ceiling is too low.
- Prefer retrieval wins that improve search, trace, and context together.
- Treat context expansion as graph navigation, not just lexical ranking:
  - imported file
  - importer file
  - callee definition file
  - caller file
  - route / command / test entrypoints
- Promote useful fixture-only wins into task-level evaluation before claiming them as stable product capability.
- Do not add required LLM APIs or embeddings to solve current parser/retrieval gaps. Measure the lexical + graph baseline first.
- Preserve raw-source immutability as a product invariant, not just a documentation preference.
- Prefer one shared command-builder/helper over duplicated UI/CLI string assembly whenever a user needs to copy or rerun commands.
- Do not expand MCP/agent orchestration aggressively yet; keep interfaces modular, but prioritize local product trust, evaluation depth, and operational clarity first.
- Run at least 2 real JS/TS project evaluations with task JSON files, not just unit fixtures.
- Review graph precision after raising TS/JS graph-expansion depth and candidate limits.
- Add mixed named re-export plus `export *` task coverage once a real project exposes it.
- If JS/TS evaluation exposes real misses, improve retrieval using conservative graph changes before introducing embeddings or reranking.
- Only after retrieval evidence improves, expand UI detail/file/graph views further.
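The notes above describe multi-round graph expansion bounded by depth and candidate limits. A minimal sketch of that idea, assuming a plain adjacency mapping from file to neighboring files (imports, importers, callers, callees), might look like this. `expand_context` and its parameters are hypothetical and do not reflect the actual ranking code.

```python
from collections import deque


def expand_context(
    graph: dict[str, list[str]],
    seeds: list[str],
    max_depth: int = 2,
    max_candidates: int = 10,
) -> dict[str, int]:
    """Breadth-first expansion; returns {file: round in which it was reached}."""
    reached: dict[str, int] = {}
    queue: deque[str] = deque()
    for seed in seeds:
        if seed not in reached and len(reached) < max_candidates:
            reached[seed] = 0
            queue.append(seed)
    while queue:
        current = queue.popleft()
        depth = reached[current]
        if depth >= max_depth:
            continue  # do not expand past the configured number of rounds
        for neighbor in graph.get(current, []):
            if neighbor in reached:
                continue
            if len(reached) >= max_candidates:
                return reached  # candidate limit keeps the pack compact
            reached[neighbor] = depth + 1
            queue.append(neighbor)
    return reached
```

Recording the round in which each file was reached gives a cheap signal for reviewing precision after changing the depth or candidate limits.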
If chat history is cleared, resume with a prompt like:
Continue CodeWell from docs/PROJECT_GUIDE.md Continuation Notes. Focus on the current next tasks, do not expand agent/MCP scope yet, preserve raw-source immutability, and prefer detached-library product completion and JS/TS evaluation depth over new surface area.
- Build a read-only local UI for repositories, workspace health, provenance, repair state, and revision inspection.
  - `codewell serve --ui` now serves repository status, code search, revision-memory search, ingest history, detached workspace health, repair queue state, repair-audit summaries, provenance, and raw/derived boundary views.
  - The current UI is intentionally read-only and optimized for inspection, not mutation.
  - Remaining scope: file detail views, failure browsing, and graph views.
- Add optional embeddings or reranking only after real-project evaluations justify it.
  - Keep lexical search as the default and embeddings as an explicit enhancement.
  - Done when an evaluation shows measurable retrieval improvement over the local lexical baseline.
- Add richer GitHub ingest strategies.
  - Consider Git Trees API, partial clone, sparse checkout, and better cache invalidation.
  - Done when large public repositories can be indexed without downloading unnecessary files.
Use docs/EVALUATION.md for task-level usefulness evaluation. The fixture baseline is runnable with:

```
python scripts/evaluate_fixture.py
```

Project-specific task lists can be evaluated with:

```
python scripts/evaluate_project.py /path/to/python-project tasks.json
```

The maintained self-evaluation task list is:

```
python scripts/evaluate_project.py . evaluations/codewell_self.json
python scripts/evaluate_project.py . evaluations/codewell_natural.json
python scripts/evaluate_project.py tests/fixtures/typescript_basic tests/fixtures/evaluation_tasks/typescript_basic.json
python scripts/evaluate_project.py tests/fixtures/typescript_barrel tests/fixtures/evaluation_tasks/typescript_barrel.json
python scripts/evaluate_project.py tests/fixtures/typescript_extended tests/fixtures/evaluation_tasks/typescript_extended.json
python scripts/evaluate_project.py tests/fixtures/typescript_arrow tests/fixtures/evaluation_tasks/typescript_arrow.json
python scripts/evaluate_project.py tests/fixtures/typescript_routes tests/fixtures/evaluation_tasks/typescript_routes.json
python scripts/evaluate_project.py tests/fixtures/typescript_reexport tests/fixtures/evaluation_tasks/typescript_reexport.json
```

Unit tests verify correctness of individual modules. Evaluation checks whether the full local loop retrieves useful context and revision memory for a task, then reports recall, precision, budget, and latency metrics for comparison across project snapshots.
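For readers unfamiliar with the recall and precision figures, the sketch below shows the standard set-based definitions applied to expected versus selected files. It is an assumption about how such metrics are typically computed; the evaluation scripts may define them differently, and `context_metrics` is a hypothetical helper.

```python
def context_metrics(expected: set[str], selected: list[str]) -> dict[str, float]:
    """Compute set-based recall and precision for one evaluation task."""
    selected_set = set(selected)
    hits = expected & selected_set
    # Recall: share of expected files that the context pack actually included.
    recall = len(hits) / len(expected) if expected else 1.0
    # Precision: share of included files that were actually expected.
    precision = len(hits) / len(selected_set) if selected_set else 0.0
    return {"recall": recall, "precision": precision}
```

A pack that includes every expected file plus unrelated ones scores high recall but low precision, which is why both figures are worth tracking together.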
- Cloud hosting.
- Automatic execution of untrusted external code.
- Large-scale benchmark runs.
- Deep multi-language support beyond the current Python and TS/JS baseline.
- Required vector databases.
- Required LLM API calls.
Use a local-first, lightweight stack. The default path must run on a personal computer without paid APIs, GPUs, or a hosted service.
- Python 3.10+ for the MVP.
- Keep the core modular enough to migrate hot paths to Rust or Go later if needed.
- MVP parser: Python's standard-library `ast` module.
- Keep the parser interface modular enough to add stronger language parsers later.
- Current baseline: Python via `ast`, plus heuristic TypeScript/JavaScript extraction.
- Near-term target: strengthen TS/JS coverage before adding more languages.
- SQLite as the primary local database.
- SQLite FTS5 for lexical search over paths, symbols, docstrings, comments, and code slices.
- Search ranking should weight path and symbol matches above body-only matches.
- Natural-language queries should be normalized into code-search terms without requiring external embeddings.
- Avoid a required vector database in the MVP.
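The ranking preference above (path and symbol matches weighted above body-only matches) maps naturally onto FTS5's built-in `bm25()` function, which accepts one weight per column. The sketch below is illustrative only: the table name, column layout, and weights are assumptions rather than CodeWell's schema, and it requires a Python build whose SQLite includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column index; the real schema may differ.
conn.execute("CREATE VIRTUAL TABLE symbols USING fts5(path, symbol, body)")
conn.executemany(
    "INSERT INTO symbols (path, symbol, body) VALUES (?, ?, ?)",
    [
        ("src/auth/login.py", "login", "def login(user): return check(user)"),
        ("src/util/text.py", "slugify", "def slugify(s): pass  # used by login page"),
        ("src/db/pool.py", "connect", "def connect(): pass"),
        ("src/db/users.py", "find_user", "def find_user(name): pass"),
        ("src/cli/main.py", "main", "def main(): pass"),
    ],
)


def search(connection: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Rank matches with path weighted 10x and symbol 5x relative to body."""
    rows = connection.execute(
        "SELECT path, symbol FROM symbols WHERE symbols MATCH ? "
        # bm25() returns lower (more negative) values for better matches.
        "ORDER BY bm25(symbols, 10.0, 5.0, 1.0) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

With these weights, a file whose path and symbol name both contain the term ranks ahead of one that only mentions it in a comment.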
- Embeddings are optional query-time enhancements, not a required indexing dependency.
- Prefer small local models such as MiniLM, bge-small, or a lightweight code embedding model.
- External embedding APIs can be added through a provider interface.
- LLM calls are optional and should not run during default indexing.
- Supported use cases include query rewriting, result reranking, revision summaries, applicability notes, and failure pattern classification.
- Provider interfaces should support BYOK for OpenAI, Anthropic, Gemini, local Ollama, and compatible APIs.
- Current baseline: download repository archives by URL and cache them by owner, repo, branch, or commit.
- GitHub token auth now supports private repositories and higher API limits without storing credentials in provenance metadata.
- Store source provenance: URL, commit SHA when available, license, file path, and retrieval time.
- Later: add detached-library defaults, richer cache management, Git Trees API, partial clone, and sparse checkout.
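The token handling described in this guide (an explicit token, environment-variable fallbacks, and rejection of credential-bearing URLs) can be sketched as below. The function names are hypothetical, and the precedence between `CODEWELL_GITHUB_TOKEN` and `GITHUB_TOKEN` is an assumption based on the order in which the guide lists them.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit


def resolve_github_token(
    explicit: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick the first non-empty token: explicit value, then env fallbacks."""
    source = os.environ if env is None else env
    # Assumed precedence: explicit flag, CODEWELL_GITHUB_TOKEN, GITHUB_TOKEN.
    candidates = (explicit, source.get("CODEWELL_GITHUB_TOKEN"), source.get("GITHUB_TOKEN"))
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def reject_credential_url(url: str) -> str:
    """Return the URL unchanged, or raise if it embeds a username or password."""
    parts = urlsplit(url)
    if parts.username is not None or parts.password is not None:
        raise ValueError("GitHub URL must not embed credentials; pass a token separately")
    return url
```

Rejecting such URLs before they are recorded is what keeps tokens out of stored provenance metadata.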
- MCP is the primary integration protocol.
- CLI should exist for direct human testing and scripting.
- CodeWell receives records from coding agents but does not replace them.
- Current baseline: read-only local UI for repository status, search, revision memory, ingest history, workspace health, repair queue state, repair audit, provenance, and raw/derived boundary inspection.
- Later: file detail views, graph view for repo nodes, snippets, sources, failures, fixes, and revision branches.
- Skip ignored and generated directories such as `.git`, `.venv`, `node_modules`, `dist`, `build`, and cache folders.
- Hash files and only re-index changed content.
- Store compact source spans instead of duplicating full files where possible.
- Keep all long-running indexing and verification tasks cancellable.
- Raw workspace trees and original ZIP archives are immutable inputs; recovery workflows must not mutate them.
- Failed detached-library ingest runs should leave enough history for `codewell ingest-history` to explain whether failure happened in plan, materialize, or index.
- Empty archives, unsafe archive paths, and zero-supported-source runs should produce explicit operator guidance instead of silent low-signal results.
- When indexing succeeds with `0` supported files, prefer a strong warning plus next steps over turning that case into a hard failure.
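The first two indexing rules in this list (skip generated directories, hash files, re-index only changed content) can be sketched as follows. The directory set, suffix set, and `changed_files` function are illustrative assumptions; in particular, `__pycache__` stands in for "cache folders" and the suffix list is not the project's actual supported-language table.

```python
import hashlib
import os
from pathlib import Path

# Assumed values for illustration only.
SKIP_DIRS = {".git", ".venv", "node_modules", "dist", "build", "__pycache__"}
SUPPORTED_SUFFIXES = {".py", ".ts", ".js"}


def changed_files(root: Path, known_hashes: dict[str, str]) -> dict[str, str]:
    """Return {relative_path: sha256} for files whose content hash changed."""
    changed: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [name for name in dirnames if name not in SKIP_DIRS]
        for filename in filenames:
            path = Path(dirpath) / filename
            if path.suffix not in SUPPORTED_SUFFIXES:
                continue
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            relative = path.relative_to(root).as_posix()
            if known_hashes.get(relative) != digest:
                changed[relative] = digest
    return changed
```

An empty result on a non-empty tree means nothing changed, while an empty result with no known hashes is the zero-supported-files case that should trigger the warning described above.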
Use this protocol before implementing non-trivial CodeWell changes.
State which subsystem is being changed:
- parser
- index store
- GitHub ingest
- retrieval
- context packing
- revision memory
- verification
- MCP
- UI
Keep each change focused on one boundary unless integration work is required.
Before coding, consider at least two relevant edge cases, such as:
- syntax errors in source files
- unreadable files
- generated files
- duplicate symbol names
- dynamic imports
- missing GitHub metadata
- unsupported licenses
- stale cached repositories
- failing test commands
- unverifiable agent summaries
Every change should have a clear verification path:
- unit test
- fixture-based parser output
- CLI smoke test
- SQLite schema check
- MCP tool call test
- revision state transition test
Do not add required LLM calls, vector databases, background daemons, or hosted services to the default path.
Optional providers are allowed only behind explicit configuration.
Any external code, snippet, or revision must record:
- source URL or local path
- commit or snapshot ID when available
- file path and line span
- license metadata when available
- retrieval time
- verification state
Original snippets and repositories are immutable records. Fixes must be stored as revision branches with explanation and evidence.
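One way to express this immutability rule in code is with frozen dataclasses, where a fix references the original record instead of editing it. The sketch below is a hypothetical shape: the class names, field names, and the `"unverified"` state value are assumptions for illustration, not CodeWell's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical immutable record for an external snippet or source."""

    source: str  # source URL or local path
    file_path: str
    line_span: tuple[int, int]
    retrieved_at: datetime
    verification_state: str  # state names here are assumed, e.g. "unverified"
    commit: Optional[str] = None  # commit or snapshot ID when available
    license: Optional[str] = None  # license metadata when available


@dataclass(frozen=True)
class Revision:
    """A fix stored alongside the original, never in place of it."""

    original: Provenance
    patch_diff: str
    explanation: str
    evidence: str


snippet = Provenance(
    source="https://github.com/owner/repo",
    file_path="src/a.py",
    line_span=(10, 20),
    retrieved_at=datetime(2026, 5, 12, tzinfo=timezone.utc),
    verification_state="unverified",
)
fix = Revision(
    original=snippet,
    patch_diff="--- a/src/a.py\n+++ b/src/a.py\n",
    explanation="Guard against a missing key.",
    evidence="test command passed",
)
```

Because both classes are frozen, any attempt to assign to a field raises an error, so a fix can only be added as a new `Revision` that points back to the untouched original.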