Skip to content

fix: prevent findDbPath from escaping git worktree boundary#457

Merged
carlos-alm merged 24 commits intomainfrom
fix/db-path-worktree-boundary
Mar 16, 2026
Merged

fix: prevent findDbPath from escaping git worktree boundary#457
carlos-alm merged 24 commits intomainfrom
fix/db-path-worktree-boundary

Conversation

@carlos-alm
Copy link
Contributor

@carlos-alm carlos-alm commented Mar 16, 2026

Summary

  • Add findRepoRoot() to src/db/connection.js using git rev-parse --show-toplevel, which returns the correct root for both regular repos and git worktrees. Results are cached per-process.
  • Use findRepoRoot() as a ceiling in findDbPath() so the walk-up stops at the git boundary — worktree sessions resolve to their own DB, not the parent repo's.
  • Add unit tests for findRepoRoot (caching, non-git fallback) and findDbPath ceiling behavior (worktree simulation, non-git fallback).

Context

When codegraph runs inside a git worktree (e.g. .claude/worktrees/agent-xxx/), findDbPath() walked up from process.cwd() and crossed the worktree boundary, finding the main repo's database. This made codegraph unusable in parallel worktree sessions — the exact workflow it's designed to support.

Behavior change

When no existing .codegraph/graph.db is found during the walk-up, the default DB creation path is now the git repo root (from findRepoRoot()) instead of process.cwd(). For example, running codegraph build from /repo/packages/foo/ with no pre-existing DB now creates /repo/.codegraph/graph.db instead of /repo/packages/foo/.codegraph/graph.db. When not inside a git repo, the previous behavior (process.cwd()) is preserved. This is intentional — the repo root is the correct default for most use cases.

Test plan

  • npm test — all existing tests pass
  • In a worktree: cd .claude/worktrees/xxx && npx codegraph build . && npx codegraph stats — uses worktree DB
  • In main repo: codegraph stats — still works as before
  • MCP tools from a worktree session resolve to the correct DB

Compatibility

Only 2 files changed (src/db/connection.js, tests/unit/db.test.js). No overlap with #456 on connection.js. Minor merge conflict possible in tests/unit/db.test.js import lines — trivial to resolve.

findDbPath() walks up from cwd looking for .codegraph/graph.db. In a git
worktree (e.g. .claude/worktrees/agent-xxx/), this crosses the worktree
boundary and finds the main repo's database instead.

Add findRepoRoot() using `git rev-parse --show-toplevel` (which returns
the correct root for both repos and worktrees) and use it as a ceiling
in findDbPath(). The walk-up now stops at the git boundary, so each
worktree resolves to its own database.
@claude
Copy link

claude bot commented Mar 16, 2026

Claude finished @carlos-alm's task —— View job


PR Review: fix: prevent findDbPath from escaping git worktree boundary

Todo List:

  • Read and analyze modified files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity assessment
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance evaluation
  • Final assessment and rating

Root Cause Analysis

Problem Being Solved: When codegraph runs inside a git worktree (e.g., .claude/worktrees/agent-xxx/), the findDbPath() function would traverse up the directory tree and cross the worktree boundary, finding the main repo's database instead of creating/using a worktree-specific database. This made codegraph unusable in parallel worktree sessions — the exact workflow it's designed to support.

Why This Approach: The solution uses git rev-parse --show-toplevel to find the git boundary and uses it as a ceiling for the directory walk. This is the correct approach because:

  • git rev-parse --show-toplevel returns the proper root for both regular repos and worktrees
  • The ceiling-based approach prevents directory traversal beyond the git boundary
  • Results are cached for performance (important for repeated calls)

Risk Assessment: ✅ Low risk. The implementation is conservative with proper fallbacks when not in a git repo.

Phase 2: Configuration Change Detection

No configuration changes detected. This PR only adds functionality without modifying any linting rules, build config, or relaxing any constraints.

Phase 3: Test Integrity Assessment

Test integrity is strong. The PR adds comprehensive test coverage including:

  • findRepoRoot caching behavior and non-git fallback scenarios
  • findDbPath ceiling behavior with worktree simulation
  • Edge cases for non-git environments

The tests are well-designed and don't weaken existing assertions. Lines 266-290 in the test file specifically test the core worktree boundary functionality.

Phase 4: Code Quality Review

Strengths:

  • Clean separation of concerns: findRepoRoot() is a focused utility function
  • Proper error handling with try-catch and graceful fallbacks
  • Good caching strategy with per-process cache and explicit cache reset for testing
  • Comprehensive test coverage with edge cases
  • Security-conscious: uses execFileSync instead of shell execution to prevent command injection

Potential Issues:

  • Performance: execFileSync calls are blocking. However, the caching mitigates repeated calls for the common case (src/db/connection.js:19, 32)
  • Error handling: Silent catch blocks (lines 29-31) could potentially hide git configuration issues, but this is acceptable for graceful degradation

Code Style: Follows existing patterns in the codebase. Uses proper JSDoc documentation.

Phase 5: Backlog Compliance

Zero-dep: ✅ No new runtime dependencies. Uses only Node.js built-ins (node:child_process, node:path) and existing dependencies.

Foundation-aligned: ✅ Fully aligned with FOUNDATION.md principles:

  • Maintains zero-infrastructure requirement (uses local git commands)
  • No cloud API calls or external services
  • Purely enhances existing local functionality
  • Doesn't change the programmatic API contract

Problem-fit: 4/5 — This directly addresses a critical workflow issue for AI coding assistants using parallel sessions. When agents can't properly isolate their graph databases in worktrees, they interfere with each other and produce incorrect results. This fix enables the intended parallel session workflow.

Breaking: No — This is a pure enhancement. The findDbPath() function signature remains unchanged, and the behavior only improves for worktree scenarios while maintaining backward compatibility for regular repos.

Tier classification: Tier 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a well-engineered fix for a legitimate workflow issue.

Minor Suggestions

  1. Consider adding a debug log when the git ceiling is applied (for troubleshooting edge cases)
  2. The test at lines 218-224 could be more robust by testing against a known non-git directory rather than assuming os.tmpdir() isn't a repo

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning: This PR solves a critical workflow issue with minimal risk, excellent test coverage, and clean implementation. It directly enables the parallel worktree sessions that codegraph is designed to support, with no breaking changes or dependencies added. The code quality is high with proper error handling and performance considerations.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR fixes a real correctness bug where findDbPath() would escape a git worktree boundary and resolve to the parent repository's database — breaking parallel worktree workflows. The fix adds findRepoRoot() (backed by git rev-parse --show-toplevel, with cwd-keyed caching) and threads its result as a ceiling into the findDbPath() walk-up loop.

Key changes:

  • findRepoRoot() resolves symlinks (realpathSync) and uses inode-based comparison (isSameDirectory) to handle macOS /tmp → /private/tmp and Windows 8.3 short-name edge cases robustly.
  • Cache is keyed on process.cwd() at call time, so an MCP server serving multiple worktree sessions doesn't reuse a stale ceiling.
  • Default DB creation path changes from process.cwd() to the git root when inside a git repo (documented in PR description under Behavior change).
  • _resetRepoRootCache is correctly excluded from the public barrel (src/db/index.js) and imported directly from connection.js in tests.
  • Tests cover ceiling stop, within-boundary discovery, null-ceiling fallback, spy-verified caching, and macOS/Windows path equivalence — all previously-raised gaps have been addressed.

Confidence Score: 4/5

  • Safe to merge — the ceiling logic is correct and all previously-raised issues have been resolved; one minor test-hermetics gap remains.
  • Production code in connection.js is correct and handles all identified edge cases (symlinks, 8.3 short names, MCP multi-session cwd changes). All critical issues from prior review rounds have been addressed. Score is 4 rather than 5 solely because the pre-existing 'finds .codegraph/graph.db walking up parent directories' test still relies implicitly on the test-runner not being in a git-controlled temp directory, unlike the newly-added tests that control the ceiling explicitly.
  • No files require special attention — src/db/connection.js is the only production file changed and its logic is sound.

Important Files Changed

Filename Overview
src/db/connection.js Adds findRepoRoot() (with cwd-keyed caching) and _resetRepoRootCache(), then updates findDbPath() to use the git root as a ceiling. Logic is correct: ceiling check fires after DB-existence check on the same directory, isSameDirectory handles macOS symlinks and Windows 8.3 names via inode comparison, and process.cwd() fallback is preserved when not in a git repo. No issues found in production code.
src/db/index.js Adds findRepoRoot to the public barrel export (intentional public API). Correctly excludes _resetRepoRootCache (test helper), which is now imported directly from connection.js in tests. No issues.
tests/unit/db.test.js Comprehensive test coverage for findRepoRoot (caching, bypass, null path) and findDbPath ceiling behavior using a real git init. All previously-flagged issues addressed (spy-based cache verification, macOS symlink handling via findRepoRoot() for expected path, mockImplementationOnce for hermetic null-ceiling tests). One remaining minor gap: the pre-existing 'finds .codegraph/graph.db walking up parent directories' test doesn't mock git, relying implicitly on os.tmpdir() not being inside a git repo.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["findDbPath(customPath)"] --> B{customPath?}
    B -- yes --> C["return path.resolve(customPath)"]
    B -- no --> D["ceiling = findRepoRoot()"]
    D --> D1{cached &\ncwd unchanged?}
    D1 -- yes --> D2["return _cachedRepoRoot"]
    D1 -- no --> D3["git rev-parse --show-toplevel"]
    D3 -- success --> D4["realpathSync(raw) → ceiling"]
    D3 -- throws --> D5["ceiling = null"]
    D4 --> E
    D5 --> E
    D2 --> E
    E["dir = realpathSync(cwd)"]
    E --> F["candidate = dir/.codegraph/graph.db"]
    F --> G{exists?}
    G -- yes --> H["return candidate"]
    G -- no --> I{ceiling &&\nisSameDirectory\ndirceiling?}
    I -- yes --> J["break — at ceiling"]
    I -- no --> K{parent === dir?}
    K -- yes --> J
    K -- no --> L["dir = parent"]
    L --> F
    J --> M["base = ceiling ∥ cwd()"]
    M --> N["return base/.codegraph/graph.db"]
Loading

Last reviewed commit: b87a12b

Comment on lines +266 to +290
it('stops walking at git ceiling and does not find parent DB', () => {
const origCwd = process.cwd;
// Mock cwd to be inside the inner "worktree"
process.cwd = () => innerDir;
// Mock findRepoRoot to return the worktree root (one level up from innerDir)
const worktreeRoot = path.join(outerDir, 'worktree');
const origFindRepoRoot = findRepoRoot;

// We can't easily mock findRepoRoot since findDbPath calls it internally,
// but we can test the behavior by creating a DB at the worktree level
fs.mkdirSync(path.join(worktreeRoot, '.codegraph'), { recursive: true });
fs.writeFileSync(path.join(worktreeRoot, '.codegraph', 'graph.db'), '');

try {
_resetRepoRootCache();
const result = findDbPath();
// Should find the worktree DB, not the outer one
expect(result).toContain('worktree');
expect(result).toContain('.codegraph');
} finally {
process.cwd = origCwd;
// Clean up the worktree DB
fs.rmSync(path.join(worktreeRoot, '.codegraph'), { recursive: true, force: true });
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ceiling test doesn't exercise ceiling logic

This test runs inside a temp directory that is not a git repo, so findRepoRoot() returns null and the ceiling check on connection.js:104 (if (ceiling && ...)) is never triggered. The test passes because both DBs exist in the walk-up path and the worktree-level DB is found first — this would happen with or without the ceiling fix.

To actually verify that the ceiling prevents crossing the worktree boundary, you'd need to either:

  1. Mock findRepoRoot (e.g., via vi.mock or dependency injection) to return the worktreeRoot path, then remove the worktree-level DB so the only DB is the outer one — and assert it is NOT found.
  2. Initialize an actual git repo (via git init) in the worktreeRoot so findRepoRoot() returns it as the ceiling.

Without this, the core feature of this PR (stopping the walk at the git boundary) has no effective test coverage.

Additionally, origFindRepoRoot on line 272 is assigned but never used — leftover from an abandoned mocking approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 83efde5. The ceiling test now uses git init in the worktree root so findRepoRoot() returns a real ceiling. The test no longer creates a worktree-level DB — the only DB is in the outer dir (beyond the ceiling), and the assertion verifies the walk stops at the ceiling and returns the default path at the worktree root, not the outer DB. Also removed the unused origFindRepoRoot variable.

Comment on lines +218 to +224
it('returns null when not in a git repo', () => {
const root = findRepoRoot(os.tmpdir());
// os.tmpdir() is typically not a git repo; if it is, skip gracefully
if (root === null) {
expect(root).toBeNull();
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test silently passes without asserting

If os.tmpdir() happens to be inside a git repo (possible on some CI environments or developer machines), this test passes without making any assertion at all — the if block is skipped. Consider using vi.spyOn to mock execFileSync to throw, ensuring the null path is always exercised regardless of the host environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 83efde5. Now uses vi.mock('node:child_process') with mockImplementationOnce to force execFileSync to throw, ensuring the null path is always exercised regardless of host environment.

process.cwd = () => innerDir;
// Mock findRepoRoot to return the worktree root (one level up from innerDir)
const worktreeRoot = path.join(outerDir, 'worktree');
const origFindRepoRoot = findRepoRoot;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable

origFindRepoRoot is assigned but never used. This appears to be a leftover from an abandoned attempt to mock findRepoRoot.

Suggested change
const origFindRepoRoot = findRepoRoot;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 83efde5. Removed the unused origFindRepoRoot variable.

…t test

- Ceiling test now uses `git init` to create a real git repo boundary,
  and verifies the outer DB is NOT found (without the ceiling fix it
  would be). Added separate test for finding DB within the boundary.
- Non-git test uses a fresh mkdtemp dir instead of os.tmpdir().
- Added debug log when ceiling stops the walk-up.
- Removed unused `origFindRepoRoot` variable.
@carlos-alm
Copy link
Contributor Author

Addressed all three review comments in deac23e:

1. Ceiling test doesn't exercise ceiling logic — Rewrote the test to use git init in the worktree dir so findRepoRoot() returns a real ceiling. The outer DB (beyond the ceiling) is the only one — asserts it is NOT found and the default path resolves at the ceiling root instead. Added a separate test that creates a DB within the boundary to verify normal walk-up still works.

2. Test silently passes without asserting — Replaced os.tmpdir() with a fresh mkdtemp directory that is guaranteed not to be inside a git repo. The assertion is now unconditional.

3. Unused variable origFindRepoRoot — Removed.

Also added a debug() log when the git ceiling stops the walk-up, for troubleshooting edge cases.

@greptileai

Comment on lines +112 to +113
const base = ceiling || process.cwd();
return path.join(base, '.codegraph', 'graph.db');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undocumented behavior change for non-worktree users

When no existing DB is found during the walk-up, the default creation path has silently changed from process.cwd() to ceiling (the git repo root) for any user inside a git repository. Previously, running codegraph build from /repo/sub/ with no DB would create /repo/sub/.codegraph/graph.db. After this PR, it creates /repo/.codegraph/graph.db.

This is likely the desired long-term behavior, but it's a breaking change for users who intentionally scope their codegraph DB to a project subdirectory rather than the repo root. The PR description only mentions the worktree ceiling fix, not the changed default creation location.

Consider documenting this in the PR description or CHANGELOG, or preserving the previous fallback for non-worktree scenarios:

const base = ceiling || process.cwd();
return path.join(base, '.codegraph', 'graph.db');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on documenting this. Updated the PR description with a Behavior change section explaining that the default DB creation path is now the git repo root instead of process.cwd() when inside a git repo. The process.cwd() fallback is preserved when not in a git repo (line 109: const base = ceiling || process.cwd()).

Comment on lines +230 to +235
it('caches results when called without arguments', () => {
_resetRepoRootCache();
const first = findRepoRoot();
const second = findRepoRoot();
expect(first).toBe(second);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching test does not actually verify caching

JavaScript string primitives are compared by value, not reference, so expect(first).toBe(second) passes even if execFileSync is called twice and returns two distinct string objects with the same content. The test does not prove that the cache is being used.

To truly verify the caching behaviour, spy on execFileSync and assert it was called exactly once:

it('caches results when called without arguments', () => {
  _resetRepoRootCache();
  const spy = vi.spyOn(childProcess, 'execFileSync');
  const first = findRepoRoot();
  const second = findRepoRoot();
  expect(first).toBe(second);
  expect(spy).toHaveBeenCalledTimes(1); // cached on second call
  spy.mockRestore();
});

This requires importing node:child_process as a namespace (import * as childProcess from 'node:child_process') so the spy can intercept the module-level binding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 83efde5. The caching test now uses vi.mock('node:child_process') to create a spy on execFileSync, clears it before the test, and asserts toHaveBeenCalledTimes(1) — proving the second call hits the cache.

Comment on lines +237 to +243
it('does not use cache when called with explicit dir', () => {
_resetRepoRootCache();
const fromCwd = findRepoRoot();
// Calling with an explicit dir should still work (not use cwd cache)
const fromExplicit = findRepoRoot(process.cwd());
expect(fromExplicit).toBe(fromCwd);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test name does not match what is being verified

The test is named "does not use cache when called with explicit dir", implying it verifies that execFileSync is invoked again when fromDir is provided. However, the test only asserts that both calls return the same value — which would pass whether or not the cache is bypassed.

Consider either renaming the test to reflect what is actually checked ('returns same value whether using cache or explicit dir'), or adding a spy to assert that execFileSync is called a second time when an explicit fromDir is passed.

Suggested change
it('does not use cache when called with explicit dir', () => {
_resetRepoRootCache();
const fromCwd = findRepoRoot();
// Calling with an explicit dir should still work (not use cwd cache)
const fromExplicit = findRepoRoot(process.cwd());
expect(fromExplicit).toBe(fromCwd);
});
it('returns same value whether using cache or explicit dir', () => {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 83efde5. Renamed to "bypasses cache when called with explicit dir" and added a spy assertion: expect(execFileSyncSpy).toHaveBeenCalledTimes(2) — confirming the second call with an explicit dir actually invokes execFileSync again instead of using the cache.

- Mock execFileSync via vi.mock to verify caching calls exactly once
  and cache bypass calls twice (not just comparing return values)
- Mock execFileSync to throw in "not in git repo" test so the null
  path is always exercised regardless of host environment
- Rename "does not use cache" test to "bypasses cache" with spy assertion
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +317 to +330
it('falls back gracefully when not in a git repo', () => {
const emptyDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-nogit-'));
const origCwd = process.cwd;
process.cwd = () => emptyDir;
_resetRepoRootCache();
try {
const result = findDbPath();
// Should return default path at cwd since there's no git ceiling
expect(result).toBe(path.join(emptyDir, '.codegraph', 'graph.db'));
} finally {
process.cwd = origCwd;
fs.rmSync(emptyDir, { recursive: true, force: true });
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flaky non-git fallback test — same issue that was fixed in findRepoRoot

This test creates emptyDir inside os.tmpdir() but never mocks execFileSync to throw. The spy's default implementation calls the real execFileSync, so if the test host's os.tmpdir() happens to be inside a git repository (possible on some CI setups or Docker images), findRepoRoot() will return a non-null ceiling. findDbPath() would then return ceiling/.codegraph/graph.db instead of emptyDir/.codegraph/graph.db, making the expect(result).toBe(path.join(emptyDir, ...)) assertion fail.

The analogous test in findRepoRoot ('returns null when not in a git repo') was already fixed by using mockImplementationOnce to force execFileSync to throw. The same fix should be applied here:

it('falls back gracefully when not in a git repo', () => {
  const emptyDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-nogit-'));
  const origCwd = process.cwd;
  process.cwd = () => emptyDir;
  _resetRepoRootCache();
  execFileSyncSpy.mockImplementationOnce(() => {
    throw new Error('not a git repo');
  });
  try {
    const result = findDbPath();
    expect(result).toBe(path.join(emptyDir, '.codegraph', 'graph.db'));
  } finally {
    process.cwd = origCwd;
    fs.rmSync(emptyDir, { recursive: true, force: true });
  }
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e3c1d4a. Added \ before the test, ensuring \ always returns null regardless of host environment.

* Unit tests for src/db.js — build_meta helpers included
*/

import { execFileSync as realExecFileSync } from 'node:child_process';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realExecFileSync is not the real function — it's the spy

Due to Vitest's vi.mock hoisting, vi.mock('node:child_process', ...) is applied before all import statements are resolved, including this one. As a result, realExecFileSync is bound to execFileSyncSpy, not the unspied original.

This is currently harmless — the spy's default mockImplementation delegates to the real execFileSync, and realExecFileSync is only used in beforeAll (before any mockImplementationOnce calls). However, the name realExecFileSync is misleading: it implies the import bypasses the mock, which it does not. A future developer who relies on this assumption (e.g., calling it after a mockReset()) would get unexpected behavior.

Consider renaming it to execFileSyncUnspied and obtaining it via importOriginal inside the factory, or at minimum add a comment:

Suggested change
import { execFileSync as realExecFileSync } from 'node:child_process';
import { execFileSync as execFileSyncForSetup } from 'node:child_process'; // Note: this is the spy's delegate, not the original — only safe to call before test-level mockImplementationOnce overrides

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e3c1d4a. Renamed realExecFileSyncexecFileSyncForSetup and added a comment explaining it resolves to the spy due to vi.mock hoisting, and is only safe to call before test-level mockImplementationOnce overrides.

…rt name

- Mock execFileSync to throw in "falls back gracefully when not in a git
  repo" test, preventing flakiness when os.tmpdir() is inside a git repo
- Rename realExecFileSync → execFileSyncForSetup with comment explaining
  it resolves to the spy due to vi.mock hoisting
@carlos-alm
Copy link
Contributor Author

@greptileai

Combine PR's connection.js imports (findRepoRoot, _resetRepoRootCache)
with main's barrel db/index.js pattern. Add missing re-exports to barrel.

Impact: 22 functions changed, 10 affected
Comment on lines 155 to 168
@@ -144,11 +160,13 @@ describe('findDbPath', () => {
const origCwd = process.cwd;
process.cwd = () => emptyDir;
try {
_resetRepoRootCache();
const result = findDbPath();
expect(result).toContain('.codegraph');
expect(result).toContain('graph.db');
} finally {
process.cwd = origCwd;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findDbPath fallback test doesn't control the ceiling

This test calls _resetRepoRootCache() but never mocks execFileSync to throw, so findRepoRoot() will call the real git rev-parse --show-toplevel from emptyDir. If the test host's os.tmpdir() is inside a git repo (common in some CI setups), ceiling will be set to that repo root and the default path returned will be <git-root>/.codegraph/graph.db rather than emptyDir/.codegraph/graph.db. The toContain assertions pass either way, so the test silently misses validating the actual fallback location.

The analogous 'falls back gracefully when not in a git repo' test in the new findDbPath with git ceiling describe already solves this correctly with execFileSyncSpy.mockImplementationOnce. The same treatment should be applied here:

Suggested change
it('returns default path when no DB found', () => {
const emptyDir = fs.mkdtempSync(path.join(tmpDir, 'empty-'));
const origCwd = process.cwd;
process.cwd = () => emptyDir;
execFileSyncSpy.mockImplementationOnce(() => {
throw new Error('not a git repo');
});
try {
_resetRepoRootCache();
const result = findDbPath();
expect(result).toBe(path.join(emptyDir, '.codegraph', 'graph.db'));
} finally {
process.cwd = origCwd;
_resetRepoRootCache();
}
});

This ensures ceiling = null, the process.cwd() fallback is exercised deterministically, and the assertion is precise enough to detect regressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0d0fdd9. The test now mocks execFileSync to throw (via mockImplementationOnce) so findRepoRoot() always returns null, ensuring the process.cwd() fallback is deterministically exercised. Also tightened the assertion from toContain to toBe with the exact expected path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already fixed in 0d0fdd9 — the test now mocks execFileSync to throw via mockImplementationOnce, ensuring findRepoRoot() always returns null regardless of host environment. The assertion was also tightened from toContain to toBe with the exact expected path.

On macOS, os.tmpdir() returns /var/folders/... but git rev-parse
returns /private/var/folders/... (resolved symlink). The ceiling
comparison failed because the paths didn't match. Use fs.realpathSync
on cwd to normalize symlinks before comparing against the ceiling.

Impact: 1 functions changed, 1 affected
Impact: 1 functions changed, 1 affected
…boundary

Impact: 9 functions changed, 19 affected
The 'returns default path when no DB found' test didn't control the git
ceiling — if tmpDir was inside a git repo, findRepoRoot() would return
a non-null ceiling and the default path would differ from emptyDir.
Mock execFileSync to throw so the cwd fallback is always exercised.
@carlos-alm
Copy link
Contributor Author

@greptileai

…m ceiling

On macOS, os.tmpdir() returns /var/... but git resolves symlinks to
/private/var/..., and on Windows, 8.3 short names (RUNNER~1) differ
from long names (runneradmin). findDbPath normalizes dir via
fs.realpathSync but findRepoRoot only used path.resolve, causing the
ceiling comparison to fail — the walk crossed the worktree boundary.

Fix findRepoRoot to use fs.realpathSync on git output, and resolve
test paths after directory creation so assertions match.

Impact: 1 functions changed, 77 affected
Comment on lines +300 to +301
try {
_resetRepoRootCache();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test assertion will fail on macOS when /tmp is the temp dir

result is built from ceiling, which is the real (symlink-resolved) path returned by git rev-parse --show-toplevel. On macOS, if os.tmpdir() resolves through /tmp (which is a symlink to /private/tmp), git will return /private/tmp/cg-ceiling-xxx/worktree while worktreeRoot is still /tmp/cg-ceiling-xxx/worktree. The toBe assertion then compares:

  • actual: /private/tmp/cg-ceiling-xxx/worktree/.codegraph/graph.db
  • expected: /tmp/cg-ceiling-xxx/worktree/.codegraph/graph.db

These are not equal, so the test fails. The production code in connection.js is actually correct — realpathSync is applied to dir precisely to ensure ceiling (from git) and dir agree on symlinks. The test expectation just needs to match the same resolved form:

Suggested change
try {
_resetRepoRootCache();
expect(result).toBe(path.join(fs.realpathSync(worktreeRoot), '.codegraph', 'graph.db'));
expect(result).not.toContain(`${path.basename(outerDir)}${path.sep}.codegraph`);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 651c164. The test now uses \ for the expected ceiling instead of the test's , so the assertion always matches the path form that \ uses internally. This handles both the macOS symlink case (\ → ) and the Windows 8.3 short name case (\ → ). Additionally, the production code now uses a stat-based \ comparison (dev+ino fallback) instead of string equality for the ceiling check, which is robust against all path representation differences.

git rev-parse resolves 8.3 short names (RUNNER~1 → runneradmin) but
fs.realpathSync may not, causing the string comparison to fail and the
walk to escape the git ceiling boundary. Replace string equality with
isSameDirectory() that falls back to dev+ino comparison when paths
differ textually but refer to the same directory.

Also fix the test assertion to use findRepoRoot() for the expected
ceiling path instead of the test's worktreeRoot, since git may resolve
paths differently than realpathSync.

Impact: 2 functions changed, 137 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +2 to +9
export {
_resetRepoRootCache,
closeDb,
findDbPath,
findRepoRoot,
openDb,
openReadonlyOrFail,
} from './connection.js';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test-only utility exported through the public barrel

_resetRepoRootCache is a testing helper (signalled by the leading underscore) but is now re-exported from the public barrel src/db/index.js. Any consumer that imports from …/db/index.js will see this function in their IDE completions and bundled types, and could accidentally call it in production code to blow away the per-process cache.

A cleaner approach is to have the test files import directly from the implementation module:

// in tests/unit/db.test.js
import { _resetRepoRootCache, findRepoRoot, ... } from '../../src/db/connection.js';

Then remove _resetRepoRootCache from the barrel export, keeping the public API free of test infrastructure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest push. Removed _resetRepoRootCache from the src/db/index.js barrel export. The test file now imports it directly from ../../src/db/connection.js, keeping the public API free of test infrastructure.

@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +315 to +330
it('finds DB within the ceiling boundary', () => {
// Create a DB inside the worktree — should be found normally
fs.mkdirSync(path.join(worktreeRoot, '.codegraph'), { recursive: true });
fs.writeFileSync(path.join(worktreeRoot, '.codegraph', 'graph.db'), '');
const origCwd = process.cwd;
process.cwd = () => innerDir;
try {
_resetRepoRootCache();
const result = findDbPath();
expect(result).toContain('worktree');
expect(result).toContain('.codegraph');
} finally {
process.cwd = origCwd;
fs.rmSync(path.join(worktreeRoot, '.codegraph'), { recursive: true, force: true });
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loose assertions don't validate the found DB location precisely

The two toContain assertions verify that the result string contains 'worktree' and '.codegraph', but they don't confirm the DB was found at worktreeRoot rather than innerDir. A path like path.join(innerDir, '.codegraph', 'graph.db') would also satisfy both checks. Every other assertion in this describe block uses toBe with an exact path. Consider using the same here for consistency and precision:

Suggested change
it('finds DB within the ceiling boundary', () => {
// Create a DB inside the worktree — should be found normally
fs.mkdirSync(path.join(worktreeRoot, '.codegraph'), { recursive: true });
fs.writeFileSync(path.join(worktreeRoot, '.codegraph', 'graph.db'), '');
const origCwd = process.cwd;
process.cwd = () => innerDir;
try {
_resetRepoRootCache();
const result = findDbPath();
expect(result).toContain('worktree');
expect(result).toContain('.codegraph');
} finally {
process.cwd = origCwd;
fs.rmSync(path.join(worktreeRoot, '.codegraph'), { recursive: true, force: true });
}
});
it('finds DB within the ceiling boundary', () => {
// Create a DB inside the worktree — should be found normally
fs.mkdirSync(path.join(worktreeRoot, '.codegraph'), { recursive: true });
fs.writeFileSync(path.join(worktreeRoot, '.codegraph', 'graph.db'), '');
const origCwd = process.cwd;
process.cwd = () => innerDir;
try {
_resetRepoRootCache();
const result = findDbPath();
const ceiling = findRepoRoot();
expect(result).toBe(path.join(ceiling, '.codegraph', 'graph.db'));
} finally {
process.cwd = origCwd;
fs.rmSync(path.join(worktreeRoot, '.codegraph'), { recursive: true, force: true });
}
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — tightened the assertion to use toBe with the exact ceiling-based path, consistent with the rest of the describe block.

Comment on lines +18 to +41
export function findRepoRoot(fromDir) {
const dir = fromDir || process.cwd();
if (!fromDir && _cachedRepoRoot !== undefined) return _cachedRepoRoot;
let root = null;
try {
const raw = execFileSync('git', ['rev-parse', '--show-toplevel'], {
cwd: dir,
encoding: 'utf-8',
stdio: ['pipe', 'pipe', 'pipe'],
}).trim();
// Use realpathSync to resolve symlinks (macOS /var → /private/var) and
// 8.3 short names (Windows RUNNER~1 → runneradmin) so the ceiling path
// matches the realpathSync'd dir in findDbPath.
try {
root = fs.realpathSync(raw);
} catch {
root = path.resolve(raw);
}
} catch {
root = null;
}
if (!fromDir) _cachedRepoRoot = root;
return root;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale cache if process.cwd() changes after first call

findRepoRoot() (no args) caches its result the first time it's called and returns that value on every subsequent no-arg call — even if process.cwd() has changed in the meantime. In a long-running process (e.g. an MCP server that serves multiple worktree sessions sequentially), a second session with a different cwd would receive the ceiling from the first session. findDbPath() then computes dir from the current process.cwd() but compares it against a ceiling from the old cwd, allowing the walk to escape the expected boundary.

The per-process cache is intentional and documented, and it's benign if each codegraph invocation is a fresh process. However, for the MCP-server use case explicitly called out in the PR, the lifetime of the process spans multiple client connections. A defensive fix would be to key the cache on the cwd value captured at call time:

let _cachedRepoRoot; // undefined = not computed
let _cachedRepoRootCwd; // cwd when the cache was populated

export function findRepoRoot(fromDir) {
  const dir = fromDir || process.cwd();
  if (!fromDir && _cachedRepoRoot !== undefined && _cachedRepoRootCwd === dir) {
    return _cachedRepoRoot;
  }
  // ... rest of implementation ...
  if (!fromDir) {
    _cachedRepoRoot = root;
    _cachedRepoRootCwd = dir;
  }
  return root;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — the cache is now keyed on the cwd value at call time. If process.cwd() changes between calls (e.g. MCP server serving multiple sessions), the cache is invalidated and recomputed. _resetRepoRootCache clears both the cached value and the cwd key.

Impact: 2 functions changed, 2 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

…hort names

Impact: 1 functions changed, 1 affected
…tave/codegraph into fix/db-path-worktree-boundary

Impact: 13 functions changed, 3 affected
On Windows CI, fs.realpathSync(process.cwd()) and git rev-parse
--show-toplevel can resolve 8.3 short names (RUNNER~1) differently
than long names (runneradmin). Apply realpathSync to both sides of
the assertion so the comparison is path-equivalent.
…boundary

# Conflicts:
#	src/db/index.js

Impact: 10 functions changed, 11 affected
@carlos-alm carlos-alm merged commit c6e75e3 into main Mar 16, 2026
14 checks passed
@carlos-alm carlos-alm deleted the fix/db-path-worktree-boundary branch March 16, 2026 11:10
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2026
@carlos-alm
Copy link
Contributor Author

@greptileai

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant