Conversation
A faulty refactor: apache@59742e0#diff-e9df836e263024ba54e2706853fb25c00269fbfe3726b440ba57f4a929c995dcR927 This produces incorrect metadata, but was hidden by the writer.
3f04eb9 to
1d1a68d
Compare
|
nit: do we want to add a test for this? |
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM
status=ManifestEntryStatus.ADDED was previously added in _wrap_append. We missed this line when replacing _wrap_append, everywhere else looks good.
|
Thanks for the suggestion @stevie9868 of adding a test, and after a second look, I believe the code is in a dead branch. Looks like we exclusively write new data with inheritance enabled, so the branch is actually never hit 👍 I've refactored the code a bit by pruning the dead branch. |
|
Did a manual check by running V1: {"status":1,"snapshot_id":3550151653852953585,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-fc7ceb8e-f7da-4932-be94-5157b701ec4e.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}
{"status":0,"snapshot_id":4327648111228143457,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-5a3f45d7-3963-4e61-bb86-348120b264c2.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}
{"status":0,"snapshot_id":5751151500317885948,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-906a39db-b725-4ee3-a223-8522be204d67.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}
{"status":0,"snapshot_id":2322795646469314858,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-c96d54d7-25e7-4a57-b3fd-b206ce551c68.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}
{"status":0,"snapshot_id":776868969565186897,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-6d2aa946-dafa-4cd1-ba43-9092082c78ac.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}V2: The only thing I notice is that the snapshot-id doesn't need to be materialized for the V2 entry with |
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
A faulty refactor:
59742e0#diff-e9df836e263024ba54e2706853fb25c00269fbfe3726b440ba57f4a929c995dcR927
This produces incorrect metadata, but was hidden by the writer.
Rationale for this change
Are these changes tested?
Are there any user-facing changes?