Migrate Python usage to uv workspace#20414

Open
adriangb wants to merge 3 commits into apache:main from pydantic:migrate-python-to-uv-workspace

Conversation

@adriangb
Contributor

@adriangb adriangb commented Feb 17, 2026

I was having trouble getting the benchmarks to generate data.

Summary

  • Replace three independent requirements.txt files with a uv workspace (benchmarks, dev, docs projects)
  • Single uv.lock lockfile for reproducible dependency resolution
  • Simplify bench.sh by removing all ad-hoc venv/pip logic in favor of uv run

Test plan

  • uv sync resolves all deps from repo root
  • uv run --project benchmarks python3 benchmarks/compare.py works
  • uv run --project docs sphinx-build docs/source docs/build builds docs
  • Run a benchmark from bench.sh that uses Python (e.g., h2o data gen or compare flow)
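The first three test-plan items can be collected into a short script run from the repository root. This is only a sketch: it assumes uv is on PATH and that the workspace layout matches this PR. The `run_step` helper and the `RUN` variable are hypothetical names introduced here; by default the script only prints each command, and it executes them when `RUN=1` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the PR): print each step, and execute it
# only when RUN=1 is set in the environment.
run_step() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Commands copied from the test plan in the PR description.
run_step uv sync
run_step uv run --project benchmarks python3 benchmarks/compare.py
run_step uv run --project docs sphinx-build docs/source docs/build
```

Because of `set -e`, a run with `RUN=1` stops at the first failing step, which makes it easy to see which item of the test plan did not pass.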

🤖 Generated with Claude Code

@adriangb adriangb self-assigned this Feb 17, 2026
@github-actions github-actions bot added the documentation and development-process labels Feb 17, 2026
@adriangb adriangb marked this pull request as ready for review February 17, 2026 14:55
adriangb and others added 2 commits February 17, 2026 08:55
Replace three independent requirements.txt files with a uv workspace
containing benchmarks, dev, and docs projects. This provides a single
lockfile, eliminates ad-hoc venv/pip logic in bench.sh, and simplifies
dependency management across all Python code in the repo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace setup-python + venv + pip install with astral-sh/setup-uv
and uv sync/run, matching the new uv workspace structure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adriangb adriangb force-pushed the migrate-python-to-uv-workspace branch from eda8580 to 353a8d1 Compare February 17, 2026 14:55
@adriangb adriangb requested review from alamb and timsaucer February 17, 2026 14:56
Member

@timsaucer timsaucer left a comment

Overall looks good to me. I have a couple of minor suggestions. From the PR description it isn't clear to me if you've run all of those commands to verify they work as expected. Maybe just update the description if they've all been manually verified.

Should we also update our release documentation to tell people to run uv run python generate-changelog.py instead? Also we can do a drive-by delete of download-python-wheels.py since it isn't used (and maybe also check-rat-report.py since it isn't mentioned anywhere in the release documents)

  with:
    python-version: "3.12"
- name: Setup uv
  uses: astral-sh/setup-uv@f0ec1fc3b38f5e7cd731bb6ce540c5af426746bb # v6.1.0
Member

Is the specific commit a requirement? I think astral-sh/setup-uv@v6 is pretty stable

Contributor Author

No, but it is generally encouraged to pin to commits in GHA: a commit provides an immutable reference to "safe" code, whereas a tag is mutable. If a malicious actor gains control of an action repository, they can publish a new v6 and infect everyone. If everyone is pinned to the commit, the attacker can't force the malicious code into anyone's CI unless that user opts in by updating the hash. TL;DR: because there are no lockfiles for CI, and because CI is a critical vector for supply chain attacks, it's best to pin to a hash.
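The distinction between a pinned and a mutable reference can be checked mechanically. The sketch below assumes that "pinned" means the part after `@` is a full 40-character hexadecimal commit SHA. The `is_sha_pinned` helper is a hypothetical name introduced here, not something in this PR or in GitHub Actions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if a `uses:` reference is pinned to a
# full 40-character commit SHA rather than a mutable tag or branch.
is_sha_pinned() {
  printf '%s\n' "$1" | grep -Eq '@[0-9a-f]{40}([[:space:]]|$)'
}

# The two forms discussed in this thread.
for ref in \
  'astral-sh/setup-uv@f0ec1fc3b38f5e7cd731bb6ce540c5af426746bb # v6.1.0' \
  'astral-sh/setup-uv@v6'
do
  if is_sha_pinned "$ref"; then
    echo "pinned:  $ref"
  else
    echo "mutable: $ref"
  fi
done
```

The trailing `# v6.1.0` comment is what keeps the pinned form readable: the hash is what CI resolves, and the comment records which release it is meant to correspond to.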

  with:
    python-version: "3.12"
- name: Setup uv
  uses: astral-sh/setup-uv@f0ec1fc3b38f5e7cd731bb6ce540c5af426746bb # v6.1.0
Member

Same as below, do we need to specify a specific SHA here?

Comment on lines +1 to +2
[tool.uv.workspace]
members = ["benchmarks", "dev", "docs"]
Member

Is there a reason to use three pyproject.toml files instead of just one here at the root and three dependency groups?

Contributor Author

Dunno, maybe personal preference, I feel it better mirrors the cargo workspace structure and since we use cargo workspaces here I felt it fit better.
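For readers comparing the two layouts, the three-project structure can be sketched by scaffolding it in a scratch directory. Only the two root lines come from the diff under review. The member project names, versions, `requires-python` bound, and empty dependency lists are placeholders and do not reflect the PR's actual files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory so nothing in a real checkout is touched.
ws="$(mktemp -d)"

# Root pyproject.toml: the two lines under review in this thread.
cat > "$ws/pyproject.toml" <<'EOF'
[tool.uv.workspace]
members = ["benchmarks", "dev", "docs"]
EOF

# One minimal pyproject.toml per member. All field values below are
# placeholders for illustration, not the PR's real contents.
for member in benchmarks dev docs; do
  mkdir -p "$ws/$member"
  cat > "$ws/$member/pyproject.toml" <<EOF
[project]
name = "example-$member"
version = "0.0.0"
requires-python = ">=3.9"
dependencies = []
EOF
done

# Show the resulting layout: one root file plus one file per member.
find "$ws" -name pyproject.toml | sort
```

The alternative raised in the review would keep a single root pyproject.toml and declare the three sets of dependencies under a `[dependency-groups]` table, selecting one with something like `uv sync --group docs`. Both approaches share one lockfile; the workspace form gives each directory its own project file, much like crates in a Cargo workspace.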

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>