ci: add breaking change detector #21499

Open
rluvaton wants to merge 25 commits into apache:main from rluvaton:automate-semver-breaking-api

Member

@rluvaton rluvaton commented Apr 9, 2026

Which issue does this PR close?

Partially closes:

Rationale for this change

Detect breaking API changes automatically in CI.

What changes are included in this PR?

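Add a new GitHub workflow that runs cargo-semver-checks on the changed crates and posts a PR comment when the check does not succeed.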

Are these changes tested?

Looks like it is working

Are there any user-facing changes?

no

@github-actions github-actions bot added the development-process Related to development process of DataFusion label Apr 9, 2026
@rluvaton rluvaton changed the title ci: add breaking change detector ci: add breaking change detector (WIP) Apr 9, 2026
Member Author

Added here so I can test it

@github-actions github-actions bot added the common Related to common crate label Apr 9, 2026
rluvaton and others added 7 commits April 9, 2026 14:27
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rluvaton rluvaton marked this pull request as ready for review April 9, 2026 12:42
@github-actions github-actions bot removed the common Related to common crate label Apr 9, 2026
@rluvaton rluvaton changed the title ci: add breaking change detector (WIP) ci: add breaking change detector Apr 9, 2026
Member Author

rluvaton commented Apr 9, 2026

@alamb, @comphead, would love your review.
Once merged, I will iterate on this, but it works from what I tested.

Contributor

@alamb alamb left a comment

Thanks @rluvaton -- I left some ideas

> once merged, I will iterate on this, but it works from what I tested

How did you test?

My biggest concern for all these CI jobs / improvements is the bandwidth to maintain them. We have such limited maintainer bandwidth in general.

Have you considered running this check on one of your own machines (rather than as a github action), polling for PRs to test? We have had good luck with that model and the run benchmark commands.

I think external runners are better because:

  1. They are easier to debug
  2. They are maintained outside of the normal DataFusion code repo
  3. They aren't limited to the github workflow syntax / logic (which I find very hard to debug)

What do you think?
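
As a rough illustration of the polling model described above (not the actual benchmark runner), the core decision an external runner makes is which open PRs still need a check. The sketch below assumes the runner already has a list of open PRs with their head commit SHAs and keeps a record of the last SHA it checked per PR; `prs_needing_check` and both data shapes are hypothetical, and fetching the PR list from GitHub is deliberately left out.

```python
def prs_needing_check(open_prs, checked_shas):
    """Return the PR numbers whose head commit has not been checked yet.

    open_prs:     iterable of (pr_number, head_sha) pairs (hypothetical shape)
    checked_shas: dict mapping pr_number -> last head_sha that was checked
    """
    # A PR needs a (re)check when it was never seen, or when new commits
    # were pushed since the last run, i.e. the recorded SHA differs.
    return [number for number, sha in open_prs if checked_shas.get(number) != sha]


if __name__ == "__main__":
    open_prs = [(1, "aaa"), (2, "bbb"), (3, "ccc")]
    checked = {1: "aaa", 2: "old"}
    print(prs_needing_check(open_prs, checked))
```

Because the selection is a pure function, it can be debugged on any machine without touching GitHub workflow syntax, which is the property the list above argues for.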

- name: Determine changed crates
  id: changed_crates
  run: |
    # Parse workspace members from root Cargo.toml, excluding internal crates
Contributor

I am not sure how to debug scripts like this. Can you please put this logic into an existing script (like maybe ci/scripts/changed_packages.sh)?

Member Author

Extracted, along with the logic that creates the comments.
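
For readers who want to reason about the "determine changed crates" step outside the workflow, here is a minimal sketch of the idea, not the script from this PR. It assumes the list of changed files and the workspace member directories are already known; the function name `changed_crates`, the `excluded` parameter, and the example paths are illustrative assumptions.

```python
from pathlib import PurePosixPath


def changed_crates(changed_files, workspace_members, excluded=()):
    """Return the workspace members that contain at least one changed file."""
    # Internal crates (hypothetical `excluded` list) are dropped up front.
    members = [PurePosixPath(m) for m in workspace_members if m not in excluded]
    hit = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        # Compare whole path components rather than string prefixes, so
        # "datafusion/functions" does not match "datafusion/functions-aggregate".
        owners = [m for m in members if m in path.parents]
        if owners:
            # If members are nested, attribute the file to the deepest one.
            hit.add(max(owners, key=lambda m: len(m.parts)).as_posix())
    return sorted(hit)


if __name__ == "__main__":
    files = ["datafusion/common/src/lib.rs", "README.md", "benchmarks/src/main.rs"]
    members = ["datafusion/common", "datafusion/expr", "benchmarks"]
    print(changed_crates(files, members, excluded=("benchmarks",)))
```

Keeping this logic in a standalone function or script makes it possible to run it locally against a hand-written file list, which addresses the debugging concern raised in the review.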


# Only install toolchain and cargo-semver-checks if there are crates to check
- name: Install Rust toolchain
  if: steps.changed_crates.outputs.packages != ''
Contributor

I think the GitHub runner already has Rust installed, so this step is probably unnecessary?

steps:
  - name: Comment
    if: ${{ needs.check-semver.result != 'success' }}
    uses: marocchino/sticky-pull-request-comment@v2
Contributor

I thought that, in general, ASF CI jobs are supposed to avoid non-GitHub actions (unless they are on the whitelist), so I am surprised this works.

Member Author

I did not test the sticky pull request comment itself; I only tested the breaking change detector.

Member Author

rluvaton commented Apr 9, 2026

> Thanks @rluvaton -- I left some ideas
>
> > once merged, I will iterate on this, but it works from what I tested
>
> How did you test?

Locally.

> My biggest concern for all these CI jobs / improvements is the bandwidth to maintain them. We have such limited maintainer bandwidth in general.

Usually CI workflows do not really change.

> Have you considered running this check on one of your own machines (rather than as a github action), polling for PRs to test? We have had good luck with that model and the run benchmark commands.

I don't have one.

> I think external runners are better because:
>
>   1. They are easier to debug
>   2. They are maintained outside of the normal DataFusion code repo
>   3. They aren't limited to the github workflow syntax / logic (which I find very hard to debug)
>
> What do you think?

Unfortunately, I don't have my own machines, and maintaining my own script that reimplements logic already built into GitHub Actions sounds harder to maintain. I would also need to manage my own machines, debug them, etc., rather than using the existing GitHub infra.

Contributor

alamb commented Apr 10, 2026

> > My biggest concern for all these CI jobs / improvements is the bandwidth to maintain them. We have such limited maintainer bandwidth in general.
>
> Usually CI workflows do not really change

Maybe that is true, but I have spent more hours of my life trying to debug / fix GitHub workflows than I would like to admit.

> Unfortunately, I don't have my own machines, and maintaining my own script that reimplements logic already built into GitHub Actions sounds harder to maintain. I would also need to manage my own machines, debug them, etc., rather than using the existing GitHub infra.

Yes, it definitely shifts the maintenance burden (aka who pays for the maintenance).
