Cache Playwright seed repo clones for FalkorDB test data#552
Cache Playwright seed repo clones for FalkorDB test data#552
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR implements the "Good" caching option from issue #551, reducing Playwright CI runtime by avoiding repeated network clones of two seeded repositories (FalkorDB/GraphRAG-SDK and pallets/flask) on every run. It introduces a repositories/ directory cache in GitHub Actions keyed on the seed script hash and upstream HEAD SHAs, and updates the seeding script to reuse existing local clones instead of always calling Project.from_git_repository(...).
Changes:
- Workflow cache step added with a composite cache key (OS + seed script hash + both upstream repo SHAs)
e2e/seed_test_data.pyrefactored to check for a cached.gitdirectory and skip re-cloning when present; introducesrepo_name_from_url,fresh_clone_repository, andload_projecthelpers- New unit tests for the two main
load_projectcode paths (cache hit and cache miss)
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
.github/workflows/playwright.yml |
Adds SHA-resolution step and actions/cache@v4 step for repositories/ before seeding |
e2e/seed_test_data.py |
Replaces unconditional Project.from_git_repository with load_project which reuses cached shallow clones |
tests/test_seed_test_data.py |
New test file covering the cache-hit and cache-miss branches of load_project |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
gkorland
left a comment
There was a problem hiding this comment.
Code Review
Must Fix
actions/cache@v4should be@v5— The existing Playwright browser cache in this workflow usesactions/cache@v5. The new seed-repo cache step uses@v4. Should be consistent.
Considerations
-
restore-keyspartial match can serve stale clones — The restore key matches on script hash but ignores upstream repo SHAs. On a partial cache hit,load_projectwill use the cached clone without pulling, so tests may run against outdated code. Probably acceptable since these repos don't change often, but worth documenting as a known trade-off. -
Limited CI time savings — This implements the "Minimum" option from the issue (cache git clones). The bulk of the ~220s comes from
proj.analyze_sources(), which still runs on every invocation. This PR saves network clone time (likely seconds, not minutes). For dramatic improvement, the analysis output or an RDB dump would need to be cached as well.
Nice to Have
repo_name_from_urledge case tests — No tests for trailing slashes,.gitsuffix, etc. Not a blocker but would improve confidence.
gkorland
left a comment
There was a problem hiding this comment.
Code Review
Must Fix
actions/cache@v4should be@v5— The existing Playwright browser cache in this workflow usesactions/cache@v5. The new seed-repo cache step uses@v4. Should be consistent.
Considerations
-
restore-keyspartial match can serve stale clones — The restore key matches on script hash but ignores upstream repo SHAs. On a partial cache hit,load_projectwill use the cached clone without pulling, so tests may run against outdated code. Probably acceptable since these repos don't change often, but worth documenting as a known trade-off. -
Limited CI time savings — This implements the "Minimum" option from the issue (cache git clones). The bulk of the ~220s comes from
proj.analyze_sources(), which still runs on every invocation. This PR saves network clone time (likely seconds, not minutes). For dramatic improvement, the analysis output or an RDB dump would need to be cached as well.
Nice to Have
repo_name_from_urledge case tests — No tests for trailing slashes,.gitsuffix, etc. Not a blocker but would improve confidence.
Match the version used by the existing Playwright browser cache step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@copilot please resolve the conflicts |
Updated in 8bd70fe to use
Addressed in 8bd70fe: the seed repository cache step now uses
Resolved on this branch in 0b541c6. I checked the current tree and there are no unresolved conflict files, and the latest Playwright run for the branch succeeded at 8bd70fe. |
|
@copilot please resolve the conflicts |
Playwright setup was spending most of its time re-cloning and re-analyzing the same two repositories on every run. This change avoids repeated network clone cost by reusing cached local clones and invalidating that cache only when the seed inputs change.
Workflow cache for seeded repositories
repositories/directory used bye2e/seed_test_data.pye2e/seed_test_data.pycontentsFalkorDB/GraphRAG-SDKpallets/flaskSeed script clone reuse
e2e/seed_test_data.pyto load projects from cached local clones instead of callingProject.from_git_repository(...)every runrepositories/<repo>Focused regression coverage
Example of the new seed path:
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.