Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@ name: CI

on:
push:
branches: [main]
branches: [main, dev]
pull_request:
branches: [main]
branches: [main, dev]

env:
CARGO_TERM_COLOR: always
Expand Down
58 changes: 53 additions & 5 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,15 +6,53 @@ Codeilus is a single Rust binary that analyzes any codebase and transforms it in
## Key Docs
- `NORTH_STAR.md` — Purpose, architecture, data flow, sprints, acceptance criteria
- `docs/AGENT_PROMPTS.md` — Copy-paste prompts for parallel Cursor agents (6 waves)
- `docs/OPTIMIZATIONS.md` — Performance optimization plan (12 items, 6 phases)
- `docs/SCHEMATIC_DESIGN.md` — Unified schematic explorer design
- `docs/adr/0002-unified-schematic-explorer.md` — ADR for schematic unification
- `CLAUDE.md` — This file (read by all agents)

## Git Workflow (MUST FOLLOW)

**Branching model:**
```
main (production — protected, receives PRs from dev only)
└── dev (integration — receives PRs from feature branches)
└── feat/xxx, fix/xxx, docs/xxx (short-lived feature branches)
```

**Rules for all agents:**
1. **NEVER commit directly to `main` or `dev`.** Always create a feature branch.
2. Create branches from `dev`: `git checkout -b feat/my-feature dev`
3. Push and open a PR targeting `dev` (not `main`).
4. PRs require CI to pass: `cargo build`, `cargo clippy` (zero warnings), `cargo test`.
5. Use squash merge when merging to `dev`. Feature branches auto-delete after merge.
6. Only `dev → main` PRs are used for releases.

**Branch naming conventions:**
- `feat/short-description` — new features
- `fix/short-description` — bug fixes
- `docs/short-description` — documentation only
- `refactor/short-description` — code restructuring
- `perf/short-description` — performance improvements

**Commit message format:**
```
type: concise description

Optional body with details.

Co-Authored-By: <agent name> <email>
```
Types: `feat`, `fix`, `docs`, `refactor`, `perf`, `test`, `ci`, `chore`

## Build & Test
```bash
cd /Users/bm/codeilus/codeilus

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The build command includes a hardcoded absolute path to a user's home directory. This will not work for other developers or in a CI environment. It should be replaced with a relative path or a placeholder variable to make the instructions more generic and portable.

Suggested change
cd /Users/bm/codeilus/codeilus
cd <project_root> # e.g., /Users/bm/codeilus/codeilus

cargo build # build all crates
cargo test # run all tests
cargo clippy # must be zero warnings
cargo test -p codeilus-parse # test single crate
cd frontend && pnpm build # frontend build
```

## Architecture Rules
Expand All @@ -33,17 +71,27 @@ cargo test -p codeilus-parse # test single crate
- Events flow through EventBus (tokio broadcast) — never direct state mutation
- Tests use in-memory SQLite (`DbPool::in_memory()`)

## Performance Patterns (established)
- **API pagination**: All list endpoints accept `?limit=` and `?offset=` (default 50, max 200)
- **Moka cache**: 500-entry, 10min TTL in `AppCache`. Cache reads before DB queries.
- **HTTP caching**: `Cache-Control: public, max-age=300` on all GET routes via middleware
- **No N+1 queries**: Batch-load with JOINs or `WHERE IN (...)`, group in Rust with HashMap
- **Frontend fetch cache**: `cachedGet()` in `api.ts` with 5min TTL for read-heavy endpoints
- **Vite**: Brotli compression, manual chunk splitting for three/shiki/3d-force-graph
- **Shiki**: Languages loaded per-file on demand, not all upfront

## Current State
Waves 1-4 complete. All 16 crates functional with 268 tests passing, zero clippy warnings.
Waves 1-4 complete. All 16 crates functional with 220+ tests passing, zero clippy warnings.
- Parse: Tree-sitter for 12 languages, incremental parsing
- Graph: Call graph, Louvain communities, 3-level zoom visualization
- Metrics: SLOC, complexity, fan-in/out, modularity
- Narrate: 8 narrative types via Claude Code CLI
- Learn: Curriculum, quizzes, XP/badges/streaks
- API: 50+ REST endpoints, SSE streaming Q&A
- Frontend: SvelteKit 5 with graph explorer, learning path, Ask AI
- Infrastructure: r2d2 pool, moka cache, pipeline checkpoints, structured logging
Next: Wave 5-6 polish, release pipeline, documentation refresh.
- API: 50+ REST endpoints, SSE streaming Q&A, schematic lazy-load API
- Frontend: SvelteKit 5 with graph explorer, learning path, Ask AI, schematic views
- Infrastructure: r2d2 pool, moka cache (500 entries), pipeline checkpoints, structured logging
- Performance: Pagination on all list endpoints, N+1 fixes, client+server caching, brotli compression
Next: Unified schematic explorer (ADR-0002), Wave 5-6 polish, release pipeline.

## Parallel Agent Waves
- **Wave 1** (3 agents): codeilus-parse, codeilus-db repos, frontend skeleton
Expand Down
43 changes: 26 additions & 17 deletions crates/codeilus-api/src/routes/ask.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,6 @@ use tracing::info;

use crate::state::AppState;

type SymbolRow = (String, String, String, i64, i64, Option<String>);

#[derive(Deserialize)]
struct AskRequest {
question: String,
Expand Down Expand Up @@ -72,24 +70,35 @@ async fn ask_stream(
}).into_response();
}

// Build context from selected symbols
// Build context from selected symbols (batch query)
let mut context_parts = Vec::new();
if !body.context_symbol_ids.is_empty() {
let conn = state.db.connection();
for sid in &body.context_symbol_ids {
let result: Result<SymbolRow, _> = conn.query_row(
"SELECT s.name, s.kind, f.path, s.start_line, s.end_line, s.signature
FROM symbols s JOIN files f ON s.file_id = f.id WHERE s.id = ?1",
[sid],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?, row.get(4)?, row.get(5)?)),
);
if let Ok((name, kind, path, start, end, sig)) = result {
context_parts.push(format!(
"- {} `{}` in `{}` (lines {}-{}){}",
kind, name, path, start, end,
sig.map(|s| format!("\n Signature: {}", s)).unwrap_or_default()
));
}
let placeholders: Vec<String> = body.context_symbol_ids.iter().enumerate().map(|(i, _)| format!("?{}", i + 1)).collect();
let sql = format!(
"SELECT s.name, s.kind, f.path, s.start_line, s.end_line, s.signature \
FROM symbols s JOIN files f ON s.file_id = f.id \
WHERE s.id IN ({})",
placeholders.join(", ")
);
let mut stmt = conn.prepare(&sql).unwrap();
let params: Vec<&dyn rusqlite::types::ToSql> = body.context_symbol_ids.iter().map(|id| id as &dyn rusqlite::types::ToSql).collect();
let rows = stmt.query_map(params.as_slice(), |row| {
Ok((
row.get::<_, String>(0)?,
row.get::<_, String>(1)?,
row.get::<_, String>(2)?,
row.get::<_, i64>(3)?,
row.get::<_, i64>(4)?,
row.get::<_, Option<String>>(5)?,
))
}).unwrap();
Comment on lines +84 to +95

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The use of .unwrap() on lines 84 and 95 can cause the server to panic if preparing the SQL statement or mapping the query results fails. This is especially risky with dynamically generated SQL. Please handle these Results gracefully, for example by using a match statement or map_err, and returning an appropriate error response to prevent the service from crashing.

for (name, kind, path, start, end, sig) in rows.flatten() {
context_parts.push(format!(
"- {} `{}` in `{}` (lines {}-{}){}",
kind, name, path, start, end,
sig.map(|s| format!("\n Signature: {}", s)).unwrap_or_default()
));
}
}

Expand Down
53 changes: 37 additions & 16 deletions crates/codeilus-api/src/routes/files.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,16 +17,29 @@ use crate::state::AppState;
#[derive(Deserialize)]
pub struct FileListQuery {
pub language: Option<String>,
pub limit: Option<i64>,
pub offset: Option<i64>,
}

/// GET /api/v1/files — List all files, optional ?language= filter
/// GET /api/v1/files — List files with pagination, optional ?language= filter
async fn list_files(
State(state): State<AppState>,
Query(query): Query<FileListQuery>,
) -> Result<Json<Vec<FileRow>>, ApiError> {
) -> Result<Json<serde_json::Value>, ApiError> {
let limit = query.limit.unwrap_or(50).clamp(1, 200);
let offset = query.offset.unwrap_or(0).max(0);
let cache_key = format!("files:l={:?}:l={}:o={}", query.language, limit, offset);

if let Some(cached) = state.cache.json.get(&cache_key) {
return Ok(Json(cached));
}

let repo = FileRepo::new(Arc::clone(&state.db));
let files = repo.list(query.language.as_deref())?;
Ok(Json(files))
let files = repo.list_paginated(query.language.as_deref(), limit, offset)?;
let value = serde_json::to_value(&files)
.map_err(|e| ApiError::from(codeilus_core::error::CodeilusError::Internal(e.to_string())))?;
state.cache.json.insert(cache_key, value.clone());
Ok(Json(value))
}

/// GET /api/v1/files/:id — Get a single file by ID
Expand Down Expand Up @@ -83,24 +96,32 @@ async fn get_file_source(
message: "No repository has been analyzed".to_string(),
})?;

// Resolve the file path relative to repo root
// Resolve the file path — handle both relative and absolute paths
let clean_path = file.path.strip_prefix("./").unwrap_or(&file.path);
let full_path = repo_root.join(clean_path);
let full_path = if std::path::Path::new(clean_path).is_absolute() {
std::path::PathBuf::from(clean_path)
} else {
repo_root.join(clean_path)
};

// Canonicalize and verify the path stays within repo root (prevent path traversal)
// Canonicalize and verify the path exists and is safe
let canonical = full_path.canonicalize().map_err(|e| ApiError {
status: StatusCode::NOT_FOUND,
message: format!("Could not resolve file path: {}", e),
})?;
let canonical_root = repo_root.canonicalize().map_err(|e| ApiError {
status: StatusCode::INTERNAL_SERVER_ERROR,
message: format!("Could not resolve repo root: {}", e),
})?;
if !canonical.starts_with(&canonical_root) {
return Err(ApiError {
status: StatusCode::FORBIDDEN,
message: "Path traversal detected".to_string(),
});

// For relative paths, verify within repo root (prevent traversal)
if !std::path::Path::new(clean_path).is_absolute() {
let canonical_root = repo_root.canonicalize().map_err(|e| ApiError {
status: StatusCode::INTERNAL_SERVER_ERROR,
message: format!("Could not resolve repo root: {}", e),
})?;
if !canonical.starts_with(&canonical_root) {
return Err(ApiError {
status: StatusCode::FORBIDDEN,
message: "Path traversal detected".to_string(),
});
}
}
Comment on lines +114 to 125

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

This logic introduces a path traversal vulnerability. By providing an absolute path, an attacker can bypass the directory traversal check because the check !canonical.starts_with(&canonical_root) is only performed for relative paths. This could allow reading arbitrary files on the server. The traversal check should be performed for all paths, regardless of whether they are absolute or relative.

    // Verify path is safe and within repo root (prevent traversal)
    let canonical_root = repo_root.canonicalize().map_err(|e| ApiError {
        status: StatusCode::INTERNAL_SERVER_ERROR,
        message: format!("Could not resolve repo root: {}", e),
    })?;
    if !canonical.starts_with(&canonical_root) {
        return Err(ApiError {
            status: StatusCode::FORBIDDEN,
            message: "Path traversal detected".to_string(),
        });
    }


let content = std::fs::read_to_string(&canonical).map_err(|e| {
Expand Down
Loading
Loading