Fix support for writing to nested field partition #2204
kevinjqliu merged 3 commits into apache:main
kevinjqliu left a comment:
Generally LGTM! Good catch!
Here's the corresponding spec on partition columns:
The source columns, selected by ids, must be a primitive type and cannot be contained in a map or list, but may be nested in a struct. For details on how to serialize a partition spec to JSON, see Appendix C.
Maybe we should also gate on map/list, i.e. reject partition source fields that are nested inside a map or a list.
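A minimal sketch of what gating on map/list could look like, following the spec text quoted above. It uses a tiny stand-in type model (`Primitive`, `Struct`, `ListOf`, `MapOf`, all hypothetical and not PyIceberg's real `pyiceberg.types` classes) and a hypothetical `check_partition_source` helper that walks a dot-separated source name and fails as soon as the path enters a list or map.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical stand-in type model for illustration only (NOT pyiceberg.types).
@dataclass
class Primitive:
    type_name: str


@dataclass
class Struct:
    fields: Dict[str, "Type"]


@dataclass
class ListOf:
    element: "Type"


@dataclass
class MapOf:
    key: "Type"
    value: "Type"


Type = Union[Primitive, Struct, ListOf, MapOf]


def check_partition_source(schema: Struct, dotted_name: str) -> Primitive:
    """Hypothetical check of the spec rule: the source must be a primitive,
    may be nested in structs, but must not be contained in a list or map."""
    current: Type = schema
    for part in dotted_name.split("."):
        # Descending further from a list or map is exactly what the spec forbids.
        if isinstance(current, (ListOf, MapOf)):
            raise ValueError(f"{dotted_name!r} is contained in a list or map")
        if not isinstance(current, Struct) or part not in current.fields:
            raise KeyError(f"cannot resolve {dotted_name!r}")
        current = current.fields[part]
    if not isinstance(current, Primitive):
        raise ValueError(f"{dotted_name!r} is not a primitive type")
    return current


schema = Struct({
    "id": Primitive("long"),
    "event": Struct({"ts": Primitive("timestamp")}),
    "tags": ListOf(Struct({"ts": Primitive("timestamp")})),
    "props": MapOf(Primitive("string"), Primitive("string")),
})
print(check_partition_source(schema, "event.ts"))  # Primitive(type_name='timestamp')
```

Here `event.ts` passes because it is only nested in a struct, while `tags.ts` or `props.value` would raise `ValueError`.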
pyiceberg/io/pyarrow.py
This is fine, since we use "." to implicitly reference nested fields:
iceberg-python/pyiceberg/expressions/parser.py
Lines 100 to 102 in f475b8e
iceberg-python/pyiceberg/table/update/schema.py
Lines 167 to 171 in f475b8e
pyiceberg/io/pyarrow.py
Interesting, so we first reference the struct field in the pa.Table and then navigate to the nested field using struct_field, resolving the indices by name.
Maybe add this as a comment, since it was not obvious.
Closes #2095
Rationale for this change
Currently, we can only partition on top-level fields of valid types. This PR adds support for partitioning on primitive fields nested in a struct type, using dot notation to resolve the partition source columns against the nested structure.
Are these changes tested?
Yes. Added tests and tested a write against the problem in the above issue:
```
> aws s3 ls s3://myBucket/demo1/nestedPartition/data/
PRE timestamp_hour=2024-01-15-10/
PRE timestamp_hour=2024-01-15-11/
PRE timestamp_hour=2024-04-15-11/
PRE timestamp_hour=2024-05-15-10/
```
Are there any user-facing changes?
No, but data can now be written to tables that are partitioned by a source column nested in a struct.
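To illustrate the end result, here is a sketch of how a timestamp nested in a struct maps to an hour partition directory like those listed above. It assumes plain Python dicts for records, UTC timestamps, and a hypothetical source path `meta.timestamp`; `resolve`, `hour_transform` and `hour_partition_path` are illustrative helpers, not PyIceberg APIs. `hour_transform` computes whole hours since the Unix epoch, as the Iceberg hour transform does, and the path renders that hour back as `YYYY-MM-DD-HH`.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def resolve(record: Dict[str, Any], dotted_name: str) -> Any:
    """Hypothetical helper: follow a dotted path such as "meta.timestamp"."""
    value: Any = record
    for part in dotted_name.split("."):
        value = value[part]
    return value


def hour_transform(ts: datetime) -> int:
    """Whole hours since 1970-01-01T00:00 UTC (floor division)."""
    return (ts - EPOCH) // timedelta(hours=1)


def hour_partition_path(partition_name: str, hours: int) -> str:
    """Render an hour ordinal as a human-readable directory name."""
    hour_start = EPOCH + timedelta(hours=hours)
    return f"{partition_name}={hour_start.strftime('%Y-%m-%d-%H')}"


# Hypothetical record whose partition source lives inside a struct.
record = {"meta": {"timestamp": datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)}}
hours = hour_transform(resolve(record, "meta.timestamp"))
print(hour_partition_path("timestamp_hour", hours))  # timestamp_hour=2024-01-15-10
```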