fix: clone all batches before processing commits to avoid broken range [CM-1200] #4138
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Pull request overview
This PR adjusts the git integration flow so that, for batched (shallow + deepen) clones, the worker completes all deepening first and only then runs commit processing, avoiding issues with unstable/invalid commit ranges while the shallow boundary moves.
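A minimal sketch of that ordering follows. It assumes the generator is async (the worker code in this PR uses `await`) and uses a hypothetical `deepen_steps` count to stand in for however the real `CloneService` decides that deepening is done. `CloneBatchInfo` here is a stripped-down stand-in for the PR's model, and `run_worker` is a hypothetical consumer that only records whether each batch would trigger commit processing.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class CloneBatchInfo:
    # Hypothetical subset of the real model: only the flag the worker checks.
    is_final_batch: bool


async def clone_batches_generator(deepen_steps: int) -> AsyncIterator[CloneBatchInfo]:
    # Hypothetical stand-in for CloneService.clone_batches_generator().
    # First yield: the initial minimal (shallow) clone.
    yield CloneBatchInfo(is_final_batch=deepen_steps == 0)
    if deepen_steps == 0:
        return  # the initial clone already covered everything
    # Deepen repeatedly WITHOUT yielding, so no consumer sees a moving shallow boundary.
    for _ in range(deepen_steps):
        await asyncio.sleep(0)  # placeholder for one deepen (fetch) round
    # Yield again only once deepening is complete.
    yield CloneBatchInfo(is_final_batch=True)


async def run_worker(deepen_steps: int) -> list[str]:
    # Hypothetical consumer mirroring RepositoryWorker: commits are processed
    # only when the final batch arrives.
    events: list[str] = []
    async for batch in clone_batches_generator(deepen_steps):
        events.append("process_commits" if batch.is_final_batch else "skip")
    return events


print(asyncio.run(run_worker(3)))
```

Because the consumer never observes an intermediate batch once deepening starts, the commit range it processes is computed against a history that has stopped changing.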
Changes:
- Move commit processing in `RepositoryWorker` so that it runs only when `batch_info.is_final_batch` is reached.
- Simplify commit range selection in `CommitService` to use `last_processed_commit..HEAD` for batched clones, and remove the edge-commit range optimization logic.
- Change `CloneService.clone_batches_generator()` to stop yielding a batch after each deepen step and instead yield again only after deepening completes.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `services/apps/git_integration/src/crowdgit/worker/repository_worker.py` | Defers commit processing until the final clone batch. |
| `services/apps/git_integration/src/crowdgit/services/commit/commit_service.py` | Removes edge-based range logic; uses `last_processed_commit..HEAD` for batched clones; simplifies commit skipping. |
| `services/apps/git_integration/src/crowdgit/services/clone/clone_service.py` | Stops yielding intermediate batches; yields after deepening completes. |
```python
# Advance the stored checkpoint to the tip that CloneService recorded during the initial minimal clone.
await update_last_processed_commit(
    repo_id=repository.id,
    commit_hash=batch_info.latest_commit_in_repo,
    branch=await get_default_branch(batch_info.repo_path),
)
```
`latest_commit_in_repo` is set by `CloneService` during the initial minimal clone, so this is out of scope and currently working as expected.
This pull request refactors and simplifies the batched cloning and commit processing logic for repositories. The main changes remove unnecessary tracking of shallow clone boundaries, streamline batch management, and improve error handling and metrics reporting. The changes also clarify the conditions for completing a clone and standardize the commit processing flow.
Clone batch management and logic simplification:
- Removed the `edge_commit` and `prev_batch_edge_commit` fields from the `CloneBatchInfo` model and all related logic, eliminating the need to track shallow clone boundaries for batch processing.
- Reworked `_check_if_final_batch` to determine more clearly when a batched clone is complete, including handling of timeouts and force-push scenarios, and raising `ReOnboardingRequiredError` as needed.

Commit processing and metrics:
- Consolidated commit processing into `process_batch_commits`, removing complex metrics context resetting and error tracking, and always updating the last processed commit at the end.
- Updated `_execute_git_log` to use a consistent commit range (`last_processed_commit..HEAD`) for batched clones, removing the need for shallow boundary logic.

Configuration and error handling:
These changes make the clone and commit processing code easier to maintain and less error-prone by removing unnecessary complexity and clarifying the core logic.
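The final-batch check described above can be sketched as a pure decision function. Everything here is a hypothetical stand-in: `check_if_final_batch` is not the PR's `_check_if_final_batch`, `RepoState` and its fields are invented inputs, and `ReOnboardingRequiredError` is redefined locally. The comments name real git commands a caller could use to gather each input (`git rev-parse --is-shallow-repository`, `git merge-base --is-ancestor`), but the decision order is an assumption, not the PR's exact logic.

```python
from dataclasses import dataclass


class ReOnboardingRequiredError(Exception):
    """Local stand-in for the exception raised by the PR's clone service."""


@dataclass
class RepoState:
    # Hypothetical inputs a caller would gather from git before each check.
    is_shallow: bool                  # e.g. `git rev-parse --is-shallow-repository`
    has_last_processed: bool          # last_processed_commit is present locally
    last_processed_is_ancestor: bool  # `git merge-base --is-ancestor <sha> HEAD` exits 0
    elapsed_seconds: float            # time spent deepening so far
    stuck_timeout_seconds: float      # configured "stuck" limit


def check_if_final_batch(state: RepoState) -> bool:
    """Return True when deepening can stop, False to keep deepening."""
    if state.has_last_processed:
        if not state.last_processed_is_ancestor:
            # The commit exists but is no longer in HEAD's history: treat as a force-push.
            raise ReOnboardingRequiredError("last processed commit is not an ancestor of HEAD")
        # Simplification: treat last_processed_commit..HEAD as available once the commit is
        # reachable from HEAD (merged side branches could still reach past a shallow boundary).
        return True
    if not state.is_shallow:
        # History is complete, yet the commit is missing: it was rewritten away.
        raise ReOnboardingRequiredError("last processed commit not found in full history")
    if state.elapsed_seconds > state.stuck_timeout_seconds:
        raise ReOnboardingRequiredError("deepening exceeded the configured timeout")
    return False  # keep deepening


print(check_if_final_batch(RepoState(True, False, False, 10.0, 60.0)))
```

Keeping the check free of I/O makes each branch (complete range, keep deepening, force-push, stuck timeout) easy to exercise in isolation.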
Note
Medium Risk
Changes the core incremental clone/commit pipeline and alters when commits are processed and how `last_processed_commit` is advanced; mistakes could cause missed commits or unnecessary re-onboarding.

Overview
Batched cloning now deepens to completion before commit processing.
`CloneBatchInfo` drops the shallow-boundary fields (`edge_commit`, `prev_batch_edge_commit`), and `CloneService` reworks final-batch detection to verify that the full `last_processed_commit..HEAD` range is available, raising `ReOnboardingRequiredError` on force-push or configured “stuck” timeouts.

Commit extraction is simplified and runs only once per repo update.
`CommitService` consolidates into `process_batch_commits`, runs `git log` either for the full ref (full clone) or `last_processed_commit..HEAD` (batched), records per-run execution metrics, and moves `update_last_processed_commit` into the commit service; `RepositoryWorker` now calls commit processing only for the final batch.

Tests/fixtures are updated to reflect the new flow and activity payloads (notably adding `username` fields and updating expected outputs).

Reviewed by Cursor Bugbot for commit 214b637. Bugbot is set up for automated code reviews on this repo.