-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Enable parquet filter pushdown by default #19477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
run benchmarks |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
What a 🎄 🎁 ! I see mostly speedups, vastly outweighing the slowdowns, but there are still slowdowns... any idea what those cases have in common? Any way I can help brainstorm solutions? |
Thanks @adriangb -- I am not working full time the next few days but I am hoping to work on this particular item as it is something I desparately want to be done with (and I find it fun!) I spent some time analyzing where the time is going. I haven't written up my notes yet, but I will do so (probably tomorrow as I have a long train ride). I found a few places to improve, but nothing that will obviously get back all the time (yet). I'll kep at it |
# Which issue does this PR close? - RElated to apache/datafusion#19477 # Rationale for this change While profiling apache/datafusion#19477 I noticed some additional clones we could avoid <img width="1504" height="927" alt="Screenshot 2025-12-26 at 12 03 00 PM" src="https://github.com/user-attachments/assets/958f76e2-53d0-4008-b224-d9275984cd1a" /> I doubt this will be a huge deal but it does remove some allocations int he parquet read path # What changes are included in this PR? Use `into_data` rather than `to_data` # Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> # Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. -->
|
I also wrote up some notes about current status and plan here: |
|
run benchmark tpch |
|
🤖 |
|
run benchmark tpcds |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
Benchmark script failed with exit code 1. Last 10 lines of output: Click to expand |
This doesn't look great.... |
|
run benchmark tpch |
|
🤖 |
|
🤖: Benchmark completed Details
|
Which issue does this PR close?
filter_pushdown) by default #3463Rationale for this change
I am testing where we currently are with performance
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?