Cache Playwright seed repo clones for FalkorDB test data by Copilot · Pull Request #552 · FalkorDB/code-graph

Copilot · 2026-03-10T07:57:48Z

Playwright setup was spending most of its time re-cloning and re-analyzing the same two repositories on every run. This change avoids repeated network clone cost by reusing cached local clones and invalidating that cache only when the seed inputs change.

Workflow cache for seeded repositories
- Add a GitHub Actions cache for the repositories/ directory used by e2e/seed_test_data.py
- Key the cache by:
  - e2e/seed_test_data.py contents
  - current HEAD SHA of FalkorDB/GraphRAG-SDK
  - current HEAD SHA of pallets/flask
- This keeps cache reuse automatic while refreshing when either upstream seed repo changes
Seed script clone reuse
- Update e2e/seed_test_data.py to load projects from cached local clones instead of calling Project.from_git_repository(...) every run
- On cache miss, do a fresh shallow clone into repositories/<repo>
- Keep the existing source analysis and graph edge seeding behavior unchanged
Focused regression coverage
- Add targeted tests for:
  - reusing an existing cached clone
  - cloning into the cache path when no cached repo is present

Example of the new seed path:

def load_project(url: str) -> Project:
    repo_path = REPOSITORIES_DIR / repo_name_from_url(url)

    if (repo_path / ".git").exists():
        logger.info("Using cached repository clone at %s", repo_path)
    else:
        logger.info("Cloning repository into cache at %s", repo_path)
        fresh_clone_repository(url, repo_path)

    return Project.from_local_repository(repo_path)

Original prompt

This section details on the original issue you should resolve

<issue_title>Cache or pre-seed FalkorDB test data</issue_title>
<issue_description>### 1. Cache or pre-seed FalkorDB test data (saves ~3.5 min/shard = ~7 min total)

seed_test_data.py clones 2 GitHub repos (GraphRAG-SDK, Flask) and runs full source analysis every run. This is the single biggest time sink at 37% of Playwright runtime.

Options:

Best: Export the seeded graph as an RDB dump, commit it as a test fixture, and restore with redis-cli. Eliminates the 220s step entirely.

Good: Cache the cloned repos + analysis output with actions/cache keyed on the seed script hash + repo commit SHAs.

Minimum: Cache just the git clones to skip network time.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes Cache or pre-seed FalkorDB test data #551

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

vercel · 2026-03-10T07:57:53Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
code-graph	Error		Mar 10, 2026 9:45am

Copilot

Pull request overview

This PR implements the "Good" caching option from issue #551, reducing Playwright CI runtime by avoiding repeated network clones of two seeded repositories (FalkorDB/GraphRAG-SDK and pallets/flask) on every run. It introduces a repositories/ directory cache in GitHub Actions keyed on the seed script hash and upstream HEAD SHAs, and updates the seeding script to reuse existing local clones instead of always calling Project.from_git_repository(...).

Changes:

Workflow cache step added with a composite cache key (OS + seed script hash + both upstream repo SHAs)
e2e/seed_test_data.py refactored to check for a cached .git directory and skip re-cloning when present; introduces repo_name_from_url, fresh_clone_repository, and load_project helpers
New unit tests for the two main load_project code paths (cache hit and cache miss)

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`.github/workflows/playwright.yml`	Adds SHA-resolution step and `actions/cache@v4` step for `repositories/` before seeding
`e2e/seed_test_data.py`	Replaces unconditional `Project.from_git_repository` with `load_project` which reuses cached shallow clones
`tests/test_seed_test_data.py`	New test file covering the cache-hit and cache-miss branches of `load_project`

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

gkorland · 2026-03-10T08:52:29Z

@CodeRabbit review

coderabbitai · 2026-03-10T08:52:40Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-03-10T08:52:50Z

Warning

Rate limit exceeded

@gkorland has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 29 minutes and 16 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4843d400-c296-4e22-b81e-9b9455c53f44

📥 Commits

Reviewing files that changed from the base of the PR and between c372a5e and 5f45258.

📒 Files selected for processing (3)

.github/workflows/playwright.yml
e2e/seed_test_data.py
tests/test_seed_test_data.py

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch copilot/cache-pre-seed-falkordb-test-data

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gkorland

Code Review

Must Fix

actions/cache@v4 should be @v5 — The existing Playwright browser cache in this workflow uses actions/cache@v5. The new seed-repo cache step uses @v4. Should be consistent.

Considerations

restore-keys partial match can serve stale clones — The restore key matches on script hash but ignores upstream repo SHAs. On a partial cache hit, load_project will use the cached clone without pulling, so tests may run against outdated code. Probably acceptable since these repos don't change often, but worth documenting as a known trade-off.
Limited CI time savings — This implements the "Minimum" option from the issue (cache git clones). The bulk of the ~220s comes from proj.analyze_sources(), which still runs on every invocation. This PR saves network clone time (likely seconds, not minutes). For dramatic improvement, the analysis output or an RDB dump would need to be cached as well.

Nice to Have

repo_name_from_url edge case tests — No tests for trailing slashes, .git suffix, etc. Not a blocker but would improve confidence.

gkorland

Code Review

Must Fix

actions/cache@v4 should be @v5 — The existing Playwright browser cache in this workflow uses actions/cache@v5. The new seed-repo cache step uses @v4. Should be consistent.

Considerations

restore-keys partial match can serve stale clones — The restore key matches on script hash but ignores upstream repo SHAs. On a partial cache hit, load_project will use the cached clone without pulling, so tests may run against outdated code. Probably acceptable since these repos don't change often, but worth documenting as a known trade-off.
Limited CI time savings — This implements the "Minimum" option from the issue (cache git clones). The bulk of the ~220s comes from proj.analyze_sources(), which still runs on every invocation. This PR saves network clone time (likely seconds, not minutes). For dramatic improvement, the analysis output or an RDB dump would need to be cached as well.

Nice to Have

repo_name_from_url edge case tests — No tests for trailing slashes, .git suffix, etc. Not a blocker but would improve confidence.

gkorland · 2026-03-10T10:42:33Z

@copilot please resolve the conflicts

Copilot · 2026-03-10T10:43:55Z

Code Review

Must Fix...

Updated in 8bd70fe to use actions/cache@v5 for the seed repo cache. The current workflow also keeps the restore key behavior, with the stale-clone tradeoff limited to cache reuse and no pull-on-restore refresh.

Code Review

Must Fix...

Addressed in 8bd70fe: the seed repository cache step now uses actions/cache@v5. I left the partial restore behavior in place as the current tradeoff for clone reuse, and did not expand this PR to cache analysis output or RDB fixtures.

@copilot please resolve the conflicts

Resolved on this branch in 0b541c6. I checked the current tree and there are no unresolved conflict files, and the latest Playwright run for the branch succeeded at 8bd70fe.

gkorland · 2026-03-10T11:01:42Z

@copilot please resolve the conflicts

Copilot · 2026-03-10T11:02:47Z

@copilot please resolve the conflicts

Resolved on this branch in 0b541c6. I re-checked the current tree and there are no unresolved conflict files, and the latest Playwright run for the branch succeeded at 8bd70fe.

Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Match the version used by the existing Playwright browser cache step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gkorland · 2026-03-21T21:19:41Z

@CodeRabbit review

coderabbitai · 2026-03-21T21:19:47Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copilot AI assigned Copilot and gkorland Mar 10, 2026

Copilot started work on behalf of gkorland March 10, 2026 07:57 View session

vercel Bot had a problem deploying to Preview March 10, 2026 07:58 Failure

vercel Bot had a problem deploying to Preview March 10, 2026 08:02 Failure

vercel Bot had a problem deploying to Preview March 10, 2026 08:03 Failure

Copilot AI changed the title ~~[WIP] Cache or pre-seed FalkorDB test data to improve performance~~ Cache Playwright seed repo clones for FalkorDB test data Mar 10, 2026

Copilot finished work on behalf of gkorland March 10, 2026 08:05

gkorland marked this pull request as ready for review March 10, 2026 08:27

gkorland requested a review from Copilot March 10, 2026 08:28

Copilot started reviewing on behalf of gkorland March 10, 2026 08:29 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

Comment thread .github/workflows/playwright.yml

Comment thread e2e/seed_test_data.py Outdated

Comment thread e2e/seed_test_data.py Outdated

vercel Bot had a problem deploying to Preview March 10, 2026 08:54 Failure

vercel Bot had a problem deploying to Preview March 10, 2026 08:55 Failure

vercel Bot had a problem deploying to Preview March 10, 2026 08:56 Failure

vercel Bot had a problem deploying to Preview March 10, 2026 09:05 Failure

gkorland reviewed Mar 10, 2026

View reviewed changes

vercel Bot had a problem deploying to Preview March 10, 2026 09:45 Failure

Copilot started work on behalf of gkorland March 10, 2026 10:42 View session

Copilot finished work on behalf of gkorland March 10, 2026 10:44

Copilot started work on behalf of gkorland March 10, 2026 11:01 View session

Copilot finished work on behalf of gkorland March 10, 2026 11:03

Copilot AI and others added 6 commits March 21, 2026 12:57

Initial plan

966467f

feat: cache Playwright seed repository clones

7fdcf98

Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>

chore: refine seed cache workflow details

d1c700a

Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>

Update e2e/seed_test_data.py

aaa9e59

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update e2e/seed_test_data.py

32e71cf

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix: use actions/cache@v5 for seed repo cache

5f45258

Match the version used by the existing Playwright browser cache step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gkorland force-pushed the copilot/cache-pre-seed-falkordb-test-data branch from 8bd70fe to 5f45258 Compare March 21, 2026 10:58

Merge branch 'staging' into copilot/cache-pre-seed-falkordb-test-data

0305ec5

Conversation

Copilot AI commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comments on the Issue (you are @copilot in this section)

Uh oh!

vercel Bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gkorland commented Mar 10, 2026

Uh oh!

coderabbitai Bot commented Mar 10, 2026

Uh oh!

coderabbitai Bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Uh oh!

gkorland left a comment

Choose a reason for hiding this comment

Code Review

Must Fix

Considerations

Nice to Have

Uh oh!

gkorland left a comment

Choose a reason for hiding this comment

Code Review

Must Fix

Considerations

Nice to Have

Uh oh!

gkorland commented Mar 10, 2026

Uh oh!

Copilot AI commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Must Fix...

Code Review

Must Fix...

Uh oh!

gkorland commented Mar 10, 2026

Uh oh!

Copilot AI commented Mar 10, 2026

Uh oh!

gkorland commented Mar 21, 2026

Uh oh!

coderabbitai Bot commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Mar 10, 2026 •

edited

Loading

vercel Bot commented Mar 10, 2026 •

edited

Loading

coderabbitai Bot commented Mar 10, 2026 •

edited

Loading

Copilot AI commented Mar 10, 2026 •

edited

Loading