[CI] diff driven test selection by stas00 · Pull Request #8077 · deepspeedai/DeepSpeed

stas00 · 2026-06-18T22:56:04Z

TLDR: Analyze PR's diff and filter out tests that aren't exercising what has changed, potentially cutting down runtime and expense by 95-99% most of the time.

Detailed:

Deepspeed's CI takes forever - most of the time burning $$ and wastes dev time for no reason as most changes require just a few tests to run.

HF Transformers has a system to select which tests to run based on the diff of the PR - Sylvain Gugger wrote it many years ago since that repo has now probably thousands of tests. Deepspeed's CI isn't too bad but can easily take hours.

So I asked Claude Opus 4.8 to replicate the system for Deepspeed. Please have a look. It looks super complicated, so I'm not sure how easy it'd be to maintain/operate unless we always use AI to continue maintaining it. I asked Claude to leave a detailed state file so that it or another model could pick it up where it left.

I started with just the slowest costliest workload .github/workflows/modal-torch-latest.yml to see if it works well. If you're happy we can replicate it to the rest of the workloads.

CC: @loadams, @tjruwase - please tag others if you think they would be helpful to discuss this.

Signed-off-by: Stas Bekman <stas@stason.org>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f84aec4ea5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Signed-off-by: Stas Bekman <stas@stason.org>

stas00 · 2026-06-18T23:53:40Z

this is absolutely hilarious - why do we even bother to be in the loop - AI CI flags problems another AI then fixes them, I push the change. We are going to be so replaced really soon now.

I guess it's important that the CI and the code writing aren't done by the same model, otherwise it'd be pointless to check quality and detect problems if it's the same entity.

PKUWZP · 2026-06-19T03:48:32Z

@stas00 May I ask why we are trying to replicate HF Transformers' system? It would be great to learn from your insights on this:)

Copilot

Pull request overview

This PR introduces a diff-driven test selection mechanism for the modal-torch-latest GPU CI workflow to reduce CI runtime by running only tests/unit/v1 tests that are (statically or dynamically) impacted by a PR’s changes, with conservative fail-safe fallbacks to running the full suite.

Changes:

Add a new selector (ci/tests_fetcher.py) that computes impacted tests from the PR diff via a repo-wide import graph (+ curated dynamic edges), emitting mode=all|subset|none and a test target list artifact.
Update the modal runner (ci/torch_latest.py) and workflow (.github/workflows/modal-torch-latest.yml) to consume the generated test target list and gate the deploy job accordingly.
Add docs and self-tests for maintainability (.github/workflows/TEST_SELECTION*.md, ci/test_tests_fetcher.py, CONTRIBUTING update, and gitignore for selector outputs).

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`CONTRIBUTING.md`	Documents how diff-based test selection works and how to debug/override it.
`ci/torch_latest.py`	Reads selected test targets and passes them to pytest (fallback to full suite).
`ci/tests_fetcher.py`	New diff/import-graph-based test selection engine + GitHub Actions outputs/summary.
`ci/test_tests_fetcher.py`	New stdlib self-tests that validate selector behavior on synthetic repos.
`.gitignore`	Ignores selector output directory `ci/.test_selection/`.
`.github/workflows/TEST_SELECTION.md`	Design/usage/security documentation for the selector system.
`.github/workflows/TEST_SELECTION_STATE.md`	Implementation handoff/state notes for ongoing maintenance.
`.github/workflows/modal-torch-latest.yml`	Adds `collect-tests` job, artifacts, and deploy gating based on selection mode.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+* **Testable.** All repo/config state lives on ``TestSelector`` (constructed with
+  an arbitrary ``repo_root``), so the logic can be unit-tested against synthetic
+  repos -- see ``tests/unit/test_tests_fetcher.py``.
+


+            status = parts[0]
+            if status.startswith("R") and len(parts) >= 3:
+                deleted.append(parts[1])
+                changed.append(parts[2])
+            elif status.startswith("D"):
+                deleted.append(parts[1])
+            elif len(parts) >= 2:
+                changed.append(parts[1])


+
+    @staticmethod
+    def _reachable_with_parents(seed: Path, reverse: dict[Path, set[Path]]) -> dict[Path, Path | None]:
+        """BFS from ``seed`` recording a predecessor for each node (for --explain)."""


+          # NOTE: under pull_request_target this runs PR-authored code; the modal/HF
+          # secrets live on this runner, which is why the workflow is gated on
+          # review_requested / ready_for_review (a maintainer vets the PR first). The
+          # CI orchestration itself is restored from base in the next step so a PR
+          # can't repoint it at the secrets.


PKUWZP · 2026-06-19T03:52:25Z

@stas00 I think it's a great idea to trigger different CI tests based on the PRs. I am actually wondering how accurate the algorithms can be. For example, what's the potential risk of under-testing or over-testing?

stas00 · 2026-06-19T04:06:24Z

HF Transformers has been using it for many years now, this is the same code (from their repo) adapted to DeepSpeed. You can make it much more conservative and run more tests.

What Claude also proposed is to always run the full test suite either on PR merge or have a nightly run that always does the full thing. So in the worse case scenario you'd catch a problem at most 24h later - so basically make no new release until you have run the full test suite. This covers all bases, but waiting for sometimes days to get a ready PR merged because of flakey tests and meanwhile someone merging a PR and needing to start from scratch is a madness - and when we run out of modal.com free credits who will pay for CI?

I'm going to make another version for the new Deepspeed-based RL training repo we are working on https://github.com/Snowflake-AI-Research/Arctic-Platform since there are too many tests already, so I will be testing it out there should the DS maintainers not wanting to take this very small risk. A PR there is at Snowflake-AI-Research/Arctic-Platform#14

stas00 · 2026-06-19T04:10:11Z

@stas00 May I ask why we are trying to replicate HF Transformers' system? It would be great to learn from your insights on this:)

What do you mean? We aren't replicating their system, we are trying to reuse a sub-system that can reduce the CI run time+$$ by 95+%. It's been on my list of things I wanted to port since forever, since it'd be so time consuming, but with a good AI model this is now a matter of a few hours of tokens.

If you'd like to write one from scratch that's a totally valid option.

[CI] diff driven test selection

f84aec4

Signed-off-by: Stas Bekman <stas@stason.org>

stas00 requested a review from loadams as a code owner June 18, 2026 22:56

chatgpt-codex-connector Bot reviewed Jun 18, 2026

View reviewed changes

Comment thread .github/workflows/modal-torch-latest.yml Outdated

Comment thread .github/workflows/modal-torch-latest.yml

address feedback

6577c03

Signed-off-by: Stas Bekman <stas@stason.org>

PKUWZP requested review from PKUWZP, Copilot and tohtana June 19, 2026 03:46

Copilot started reviewing on behalf of PKUWZP June 19, 2026 03:46 View session

PKUWZP requested a review from sfc-gh-truwase June 19, 2026 03:46

Copilot AI reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] diff driven test selection#8077

[CI] diff driven test selection#8077
stas00 wants to merge 2 commits into
masterfrom
stas00/diff-driven-test-selection

stas00 commented Jun 18, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

stas00 commented Jun 18, 2026

Uh oh!

PKUWZP commented Jun 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

PKUWZP commented Jun 19, 2026

Uh oh!

stas00 commented Jun 19, 2026 •

edited

Loading

Uh oh!

stas00 commented Jun 19, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

stas00 commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

stas00 commented Jun 18, 2026

Uh oh!

PKUWZP commented Jun 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

PKUWZP commented Jun 19, 2026

Uh oh!

stas00 commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stas00 commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stas00 commented Jun 18, 2026 •

edited

Loading

stas00 commented Jun 19, 2026 •

edited

Loading

stas00 commented Jun 19, 2026 •

edited

Loading