Leverage Iceberg-Rust for all the transforms #1833
kevinjqliu left a comment:
LGTM! I fixed CI and added pyiceberg-core to the pyarrow install group (extra) for Poetry:
[tool.poetry.extras]
pyarrow = ["pyarrow", "pyiceberg-core"]
Diff context:
 [tool.poetry.extras]
-pyarrow = ["pyarrow"]
+pyarrow = ["pyarrow", "pyiceberg-core"]
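The confusion argument is easiest to see as a concrete failure mode: a user who installs only the pyarrow extra and then writes to a partitioned table would otherwise hit a bare ImportError deep in the write path. Below is a minimal sketch of an import guard that names the extra to install. The helper require_optional is hypothetical (not part of PyIceberg), and the import name pyiceberg_core for the pyiceberg-core package is an assumption.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable message.

    Hypothetical helper, not part of PyIceberg.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise ModuleNotFoundError(
            f"{module_name} is required for partitioned writes; "
            f"install it with: pip install 'pyiceberg[{extra}]'"
        )
    return importlib.import_module(module_name)


# Assumption: the Rust bindings from pyiceberg-core import as `pyiceberg_core`.
# transform_module = require_optional("pyiceberg_core", "pyarrow")
```

Bundling pyiceberg-core into the pyarrow extra makes this error path rare in practice; a guard like this only matters for installs that skip the extra.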
Technically this is not required, because we only use the transforms when writing to a partitioned table. But I think leaving it out would cause a lot of confusion.
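To make the write-path point concrete, here is a hypothetical sketch of what determining partitions involves: apply the partition transform to every value of the source column, then group row indices by the result. The names group_rows_by_partition and day_transform are illustrative and are not PyIceberg's _determine_partitions. The per-row Python loop is the work a vectorized Arrow or Rust kernel would take over for large batches. Naive datetimes are assumed.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")
K = TypeVar("K")

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Iceberg `day` transform: whole days since 1970-01-01."""
    return (ts.date() - EPOCH).days


def group_rows_by_partition(
    values: Sequence[T], transform: Callable[[T], K]
) -> Dict[K, List[int]]:
    """Hypothetical sketch: map each partition value to the row indices it covers."""
    groups: Dict[K, List[int]] = defaultdict(list)
    # One transform call per row; a vectorized kernel would instead
    # compute the transformed column in a single call.
    for row, value in enumerate(values):
        groups[transform(value)].append(row)
    return dict(groups)
```

For example, grouping [datetime(2025, 1, 1, 19, 25), datetime(2025, 1, 2, 8, 0), datetime(2025, 1, 1, 23, 59)] with day_transform returns {20089: [0, 2], 20090: [1]}: one data file per day partition.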
Agreed, I was considering this too and came to the same conclusion :)
Also, I think the pyarrow_transform functions are used on the read path too (see iceberg-python/pyiceberg/io/pyarrow.py, line 2696 in a67c559).
> Also I think the pyarrow_transform functions are used on the read path too
For completeness, I don't think that's true. _determine_partitions is only used by _dataframe_to_data_files. When reading, we usually have a single value, as in SELECT * FROM tbl WHERE created_at > '2025-01-01 19:25:00', so performance matters much less there, and we use the non-Arrow transform (see the example for MonthPartitioning).
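On the read path the transform is applied once, to the predicate literal, rather than to every row. A minimal sketch, assuming naive datetimes and using hypothetical helper names (this is not PyIceberg's projection code): the month transform counts whole months since 1970-01, and created_at > literal needs only one scalar call to derive a lower bound on the month partition value.

```python
from datetime import datetime, timedelta


def month_transform(ts: datetime) -> int:
    """Iceberg `month` transform: whole months since 1970-01."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


def project_greater_than(literal: datetime) -> int:
    """Lower bound on the month partition value for `col > literal`.

    Rows matching `col > literal` satisfy col >= literal + 1 microsecond,
    so their month is at least month_transform(literal + 1 microsecond).
    """
    return month_transform(literal + timedelta(microseconds=1))
```

With the literal from the query above, project_greater_than(datetime.fromisoformat("2025-01-01 19:25:00")) returns 660, so only partitions with a month value of at least 660 need scanning. A single scalar call like this gains nothing from vectorization.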
# Rationale for this change

Trying out Iceberg Rust for all of the transforms. I think we have a rounding error (see apache/iceberg-rust#1128).

Closes apache#1591

# Are these changes tested?

# Are there any user-facing changes?

---------

Co-authored-by: Kevin Liu <kevin.jq.liu@gmail.com>
Rationale for this change
Trying out Iceberg Rust for all of the transforms. I think we have a rounding error (see apache/iceberg-rust#1128).
Closes #1591
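One common way temporal transforms drift apart across implementations is integer rounding for timestamps before the epoch. The Iceberg spec counts hours, days, months and years from 1970-01-01, so a value one microsecond before the epoch belongs to hour -1, while truncating integer division (signed / in Rust or C) yields 0. Whether this is the discrepancy behind apache/iceberg-rust#1128 is an assumption. The sketch below only illustrates the difference, with hypothetical function names operating on microseconds since the epoch.

```python
US_PER_HOUR = 3_600_000_000  # microseconds per hour


def hour_floor(micros: int) -> int:
    """Hours since the epoch, rounding toward negative infinity (Python `//`)."""
    return micros // US_PER_HOUR


def hour_truncating(micros: int) -> int:
    """Hours since the epoch, rounding toward zero (like signed `/` in Rust or C)."""
    q = abs(micros) // US_PER_HOUR
    return q if micros >= 0 else -q
```

Both agree for non-negative inputs and for exact multiples of an hour; they differ only for pre-epoch values that fall inside an hour, such as hour_floor(-1) == -1 versus hour_truncating(-1) == 0.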
Are these changes tested?
Are there any user-facing changes?