
[arrow-select] Replace ArrayData with direct Array construction in filter kernels #9986

Open
liamzwbao wants to merge 3 commits into apache:main from liamzwbao:issue-9298-repalce-array-data-arrow-select

Conversation

@liamzwbao
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  • Replaces several ArrayDataBuilder paths in arrow-select/src/filter.rs with direct typed array constructors.
  • Adds a small helper for filtered null buffers that reuses the already-computed null count.

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No

@github-actions github-actions Bot added the arrow Changes to the arrow crate label May 16, 2026
@liamzwbao liamzwbao marked this pull request as ready for review May 16, 2026 16:30
@liamzwbao liamzwbao changed the title [arrow-select] Replace ArrayData with direct Array construction [arrow-select] Replace ArrayData with direct Array construction in filter May 16, 2026
@liamzwbao liamzwbao changed the title [arrow-select] Replace ArrayData with direct Array construction in filter [arrow-select] Replace ArrayData with direct Array construction in filter kernels May 16, 2026
@alamb
Contributor

alamb commented May 20, 2026

run benchmark filter_kernels

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4501184291-232-42vmp 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing issue-9298-repalce-array-data-arrow-select (34b1837) to accb1cf (merge-base) diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench filter_kernels
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

Contributor

@alamb alamb left a comment


Thank you @liamzwbao -- this is great. It is hard not to love a PR that:

  1. Makes the code simpler (fewer lines)
  2. Removes uses of unsafe
  3. Makes things faster

I left a few stylistic comments and am running the benchmarks. Assuming the benchmark results look good, I think this PR is good to merge.

Thank you again

let (null_count, nulls) = filter_null_mask(nulls, predicate)?;
let buffer = BooleanBuffer::new(nulls, 0, predicate.count);

Some(unsafe { NullBuffer::new_unchecked(buffer, null_count) })

Can we please add a safety comment here explaining why this is safe:

Suggested change
- Some(unsafe { NullBuffer::new_unchecked(buffer, null_count) })
+ // Safety: null_count returned from filter_null_mask is correct
+ Some(unsafe { NullBuffer::new_unchecked(buffer, null_count) })

It might also be nice to add a debug assert here to verify in debug builds

debug_assert_eq!(null_count, nulls.num_zeros())
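
The invariant behind the requested safety comment can be illustrated with a simplified, std-only model. This is only a sketch under stated assumptions, not the arrow-rs code: `filter_validity` is a hypothetical stand-in for `filter_null_mask`, `Nulls` and `new_trusted` are hypothetical stand-ins for `NullBuffer` and its unchecked constructor, and a plain `Vec<bool>` replaces the packed bitmap.

```rust
/// Hypothetical stand-in for `filter_null_mask`: keeps the validity bits
/// selected by `predicate` and counts the nulls (false bits) in the same pass.
fn filter_validity(validity: &[bool], predicate: &[bool]) -> (usize, Vec<bool>) {
    assert_eq!(validity.len(), predicate.len());
    let mut null_count = 0;
    let mut out = Vec::new();
    for (&valid, &keep) in validity.iter().zip(predicate) {
        if keep {
            if !valid {
                null_count += 1;
            }
            out.push(valid);
        }
    }
    (null_count, out)
}

/// Simplified model of a null buffer that trusts a caller-supplied count.
struct Nulls {
    validity: Vec<bool>,
    null_count: usize,
}

impl Nulls {
    /// Models an unchecked constructor: the caller promises that `null_count`
    /// equals the number of false bits. Debug builds re-count to verify it;
    /// release builds skip the scan entirely.
    fn new_trusted(validity: Vec<bool>, null_count: usize) -> Self {
        debug_assert_eq!(null_count, validity.iter().filter(|v| !**v).count());
        Self { validity, null_count }
    }
}
```

Because the count is produced by the same loop that writes the filtered bits, the promise made to the trusted constructor holds by construction, and the debug assertion only costs a second scan in debug builds.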

.len(predicate.count)
.add_buffer(filter.dst_offsets.into())
.add_buffer(filter.dst_values.into());
let offsets = unsafe { OffsetBuffer::new_unchecked(filter.dst_offsets.into()) };

Here it would also be nice to comment about why this is safe (what assumptions it relies on). However, I see the existing code doesn't have a safety comment

// Safety: offsets are correctly constructed
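
To spell out what "correctly constructed" could mean here, the following std-only sketch builds filtered offsets and checks the invariants an offsets buffer is assumed to require: non-empty, a non-negative first offset, and non-decreasing values. Both `filter_offsets` and `offsets_are_valid` are hypothetical helpers written for illustration; they are not part of arrow-rs.

```rust
/// Builds output offsets for the kept values of a variable-length array.
/// `offsets` has `n + 1` entries and `keep[i]` selects value `i`.
fn filter_offsets(offsets: &[i32], keep: &[bool]) -> Vec<i32> {
    assert_eq!(offsets.len(), keep.len() + 1);
    let mut out = vec![0i32];
    let mut end = 0i32;
    for (i, &k) in keep.iter().enumerate() {
        if k {
            // Each kept value contributes its (non-negative) length.
            end += offsets[i + 1] - offsets[i];
            out.push(end);
        }
    }
    out
}

/// Checks the assumed invariants: non-empty, first offset >= 0,
/// and monotonically non-decreasing.
fn offsets_are_valid(offsets: &[i32]) -> bool {
    match offsets.first() {
        Some(&first) if first >= 0 => offsets.windows(2).all(|w| w[0] <= w[1]),
        _ => false,
    }
}
```

If the input offsets are valid, every added length is non-negative, so the output starts at 0 and never decreases. That is the kind of argument a safety comment on the unchecked constructor would need to state.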

predicate: &FilterPredicate,
) -> GenericByteViewArray<T> {
let new_view_buffer = filter_native(array.views(), predicate);
let views = ScalarBuffer::new(new_view_buffer, 0, predicate.count);

Here (and other places) you can probably use the unchecked variants too to skip some checks, if we need to get additional speed (ScalarBuffer::new_unchecked)

However, given your PR removes an allocation (the buffers array) I suspect this is already going to be faster and avoiding unsafe is a nice bonus ❤️

}

/// Filters `nulls` and reuses the computed `null_count` to avoid scanning the bitmap.
fn filter_nulls(nulls: Option<&NullBuffer>, predicate: &FilterPredicate) -> Option<NullBuffer> {

What do you think about making this a method on FilterPredicate? That would make it easier to find / reuse I think.

impl FilterPredicate {
    fn filter_nulls(&self, nulls: Option<&NullBuffer>) -> Option<NullBuffer> {
        ...
    }
}

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                         issue-9298-repalce-array-data-arrow-select    main
-----                                                                         ------------------------------------------    ----
filter context decimal128 (kept 1/2)                                          1.03     20.9±0.05µs        ? ?/sec           1.00     20.2±0.11µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     19.0±0.39µs        ? ?/sec           1.01     19.2±0.12µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.00    148.3±1.39ns        ? ?/sec           1.30    193.1±0.90ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.02     85.2±6.64µs        ? ?/sec           1.00     83.2±5.34µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.00      5.5±0.01µs        ? ?/sec           1.03      5.7±0.01µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00   326.3±14.90ns        ? ?/sec           1.31    428.5±9.47ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.02     73.1±6.45µs        ? ?/sec           1.00     71.4±5.26µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.03     73.4±6.50µs        ? ?/sec           1.00     71.0±5.30µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.02     73.1±6.45µs        ? ?/sec           1.00     71.4±5.25µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.03     73.4±6.34µs        ? ?/sec           1.00     71.6±5.14µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.02     73.2±6.53µs        ? ?/sec           1.00     71.6±5.26µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.03     73.3±6.37µs        ? ?/sec           1.00     71.4±5.20µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.03     73.3±6.34µs        ? ?/sec           1.00     71.3±5.30µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.03     73.2±6.56µs        ? ?/sec           1.00     71.4±5.19µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.03     73.3±6.58µs        ? ?/sec           1.00     71.4±5.25µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.01     13.5±0.80µs        ? ?/sec           1.00     13.4±0.79µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.00      3.7±0.01µs        ? ?/sec           1.00      3.7±0.00µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.00    145.5±2.19ns        ? ?/sec           1.40    203.6±4.16ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.02     85.8±6.56µs        ? ?/sec           1.00     83.9±5.19µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00      5.5±0.01µs        ? ?/sec           1.01      5.6±0.01µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.00   328.3±14.30ns        ? ?/sec           1.30    426.4±9.55ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.00     92.4±5.29µs        ? ?/sec           1.03     94.9±6.46µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00     21.1±0.20µs        ? ?/sec           1.04     22.0±0.11µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00   415.4±16.62ns        ? ?/sec           1.44   597.7±11.42ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00     93.9±5.36µs        ? ?/sec           1.00     94.3±6.54µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.09     22.7±0.15µs        ? ?/sec           1.00     20.8±0.17µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00    349.2±9.92ns        ? ?/sec           1.36   474.1±11.37ns        ? ?/sec
filter context string (kept 1/2)                                              1.01    421.9±5.27µs        ? ?/sec           1.00    419.4±5.84µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00     12.7±0.01µs        ? ?/sec           1.14     14.5±0.30µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.04      4.3±0.01µs        ? ?/sec           1.00      4.1±0.01µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.00    520.1±0.86ns        ? ?/sec           1.11    576.1±5.06ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.04     87.9±6.53µs        ? ?/sec           1.00     84.9±5.27µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00      6.0±0.02µs        ? ?/sec           1.00      6.0±0.02µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00   707.6±12.71ns        ? ?/sec           1.14    805.1±9.35ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.01    315.0±2.74µs        ? ?/sec           1.00    313.0±2.44µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.00   758.2±10.14ns        ? ?/sec           1.18    895.1±7.41ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     12.1±0.03µs        ? ?/sec           1.01     12.1±0.03µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.01   1098.3±5.06ns        ? ?/sec           1.00   1090.6±3.43ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.00    128.5±1.37ns        ? ?/sec           1.40    179.3±0.89ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.03     87.1±6.65µs        ? ?/sec           1.00     84.4±5.47µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      2.9±0.01µs        ? ?/sec           1.03      3.0±0.01µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.00   313.3±11.58ns        ? ?/sec           1.28   399.7±10.34ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.01     35.4±0.06µs        ? ?/sec           1.00     35.0±0.07µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.04     19.8±0.44µs        ? ?/sec           1.00     19.2±0.20µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00   1567.6±1.64ns        ? ?/sec           1.03   1610.8±2.17ns        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    106.6±0.41µs        ? ?/sec           1.01    108.1±0.43µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.03     80.2±0.05µs        ? ?/sec           1.00     78.0±0.18µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.04     25.6±0.63µs        ? ?/sec           1.00     24.6±0.87µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00   1620.4±2.48ns        ? ?/sec           1.03   1665.2±4.96ns        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.02     79.4±0.06µs        ? ?/sec           1.00     78.0±0.19µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.02      6.2±0.05µs        ? ?/sec           1.00      6.1±0.09µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00   1563.9±0.79ns        ? ?/sec           1.03   1605.5±9.64ns        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.06    128.5±0.56µs        ? ?/sec           1.00    120.7±0.44µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.14     97.1±8.84µs        ? ?/sec           1.00     85.1±6.79µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.00   1626.5±2.14ns        ? ?/sec           1.03   1678.6±3.49ns        ? ?/sec
filter i32 (kept 1/2)                                                         1.00     29.1±0.04µs        ? ?/sec           1.01     29.4±0.05µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.00      4.8±0.07µs        ? ?/sec           1.03      5.0±0.06µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.00   1506.9±4.41ns        ? ?/sec           1.04   1564.1±2.53ns        ? ?/sec
filter optimize (kept 1/2)                                                    1.00     28.0±0.02µs        ? ?/sec           1.00     28.1±0.36µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.00   1330.7±1.19ns        ? ?/sec           1.40  1866.6±1237.75ns        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00   1326.1±1.68ns        ? ?/sec           1.00   1319.5±2.57ns        ? ?/sec
filter run array (kept 1/2)                                                   1.00    285.1±1.97µs        ? ?/sec           1.00    284.5±2.85µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.03    292.0±7.11µs        ? ?/sec           1.00    284.7±5.38µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    236.5±1.01µs        ? ?/sec           1.00    235.9±1.20µs        ? ?/sec
filter single record batch                                                    1.00     29.6±0.06µs        ? ?/sec           1.00     29.6±0.09µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.01     29.6±0.05µs        ? ?/sec           1.00     29.3±0.09µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.00      2.2±0.03µs        ? ?/sec           1.03      2.3±0.04µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00   1461.0±5.05ns        ? ?/sec           1.04   1523.2±2.16ns        ? ?/sec
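
As a reading aid for the table above: the leading number in each column appears to be the mean time relative to the faster side of that row, so the faster side reads 1.00. The sketch below reproduces that arithmetic under this assumption; `relative` is a hypothetical helper, not part of the benchmark tooling.

```rust
/// Relative cost of each measurement against the faster one in the row,
/// so that the faster side reads 1.00.
fn relative(branch_ns: f64, main_ns: f64) -> (f64, f64) {
    let fastest = branch_ns.min(main_ns);
    (branch_ns / fastest, main_ns / fastest)
}
```

For the row "filter context decimal128 low selectivity", `relative(148.3, 193.1)` gives about 1.00 for the branch and 1.30 for main, matching the reported columns. Ratios recomputed from the rounded means can differ slightly from the reported ones in other rows.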

Resource Usage

base (merge-base)

Metric Value
Wall time 655.1s
Peak memory 3.1 GiB
Avg memory 3.0 GiB
CPU user 651.8s
CPU sys 0.9s
Peak spill 0 B

branch

Metric Value
Wall time 675.2s
Peak memory 3.0 GiB
Avg memory 3.0 GiB
CPU user 671.4s
CPU sys 0.2s
Peak spill 0 B

File an issue against this benchmark runner

@alamb
Contributor

alamb commented May 20, 2026

@liamzwbao -- some of these results look great, but some look like they got slightly slower. Can you investigate (perhaps by using ::new_unchecked to skip redundant validation)?

@alamb
Contributor

alamb commented May 20, 2026

run benchmark filter_kernels

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4501463694-234-wjwbf 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing issue-9298-repalce-array-data-arrow-select (34b1837) to accb1cf (merge-base) diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench filter_kernels
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                         issue-9298-repalce-array-data-arrow-select    main
-----                                                                         ------------------------------------------    ----
filter context decimal128 (kept 1/2)                                          1.00     19.9±0.12µs        ? ?/sec           1.06     21.1±0.12µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     19.4±0.24µs        ? ?/sec           1.04     20.1±0.15µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.00    151.2±1.59ns        ? ?/sec           1.31    198.8±1.41ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     85.2±5.69µs        ? ?/sec           1.02     86.7±6.81µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.03      5.7±0.01µs        ? ?/sec           1.00      5.6±0.01µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00   324.1±12.51ns        ? ?/sec           1.34   435.1±12.10ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00    73.7±14.65µs        ? ?/sec           1.00     73.5±6.57µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00    73.6±14.67µs        ? ?/sec           1.00     73.5±6.61µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00    73.5±14.64µs        ? ?/sec           1.00     73.4±6.48µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00    73.7±14.62µs        ? ?/sec           1.00     73.4±6.61µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00    73.7±14.65µs        ? ?/sec           1.00     73.4±6.60µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.00    73.5±14.65µs        ? ?/sec           1.00     73.2±6.55µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00    73.5±14.68µs        ? ?/sec           1.00     73.4±6.47µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00    73.4±14.66µs        ? ?/sec           1.00     73.4±6.50µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00    73.6±14.59µs        ? ?/sec           1.00     73.3±6.52µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     12.3±0.01µs        ? ?/sec           1.01     12.4±0.01µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.00      3.7±0.00µs        ? ?/sec           1.00      3.7±0.01µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.00    150.0±1.66ns        ? ?/sec           1.33    198.9±1.48ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     84.3±5.46µs        ? ?/sec           1.02     85.9±6.60µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00      5.5±0.01µs        ? ?/sec           1.02      5.6±0.01µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.00   328.9±12.03ns        ? ?/sec           1.34   442.0±11.62ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.02     94.1±6.72µs        ? ?/sec           1.00     92.1±5.57µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00     20.6±0.17µs        ? ?/sec           1.01     20.9±0.38µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00   417.7±17.91ns        ? ?/sec           1.43   599.3±10.68ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.02     93.9±6.61µs        ? ?/sec           1.00     92.1±5.47µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.01     20.4±0.26µs        ? ?/sec           1.00     20.1±0.31µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00   354.2±12.74ns        ? ?/sec           1.35   477.6±10.84ns        ? ?/sec
filter context string (kept 1/2)                                              1.00   427.8±10.49µs        ? ?/sec           1.01    430.1±8.03µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.02     13.0±0.65µs        ? ?/sec           1.00     12.7±0.03µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00      4.1±0.01µs        ? ?/sec           1.01      4.1±0.01µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.00    528.4±7.92ns        ? ?/sec           1.10    579.4±4.23ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00     84.5±5.43µs        ? ?/sec           1.02     86.4±6.57µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00      6.0±0.01µs        ? ?/sec           1.00      6.0±0.01µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00   718.2±11.94ns        ? ?/sec           1.14   821.2±13.79ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.01   327.9±27.24µs        ? ?/sec           1.00   325.5±14.40µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.00   757.3±10.42ns        ? ?/sec           1.19    903.1±9.32ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     12.0±0.01µs        ? ?/sec           1.01     12.1±0.02µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.00   1100.9±3.17ns        ? ?/sec           1.00   1096.4±3.61ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.00    128.6±1.46ns        ? ?/sec           1.42    182.9±1.28ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00     85.5±6.45µs        ? ?/sec           1.00     85.8±6.55µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      2.9±0.01µs        ? ?/sec           1.02      2.9±0.01µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.00   314.4±12.14ns        ? ?/sec           1.31   411.1±12.46ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.00     34.9±0.08µs        ? ?/sec           1.02     35.8±0.09µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.01     19.4±0.08µs        ? ?/sec           1.00     19.2±0.11µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00   1567.2±1.70ns        ? ?/sec           1.04   1624.5±5.10ns        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    105.7±0.54µs        ? ?/sec           1.02    107.8±0.42µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.03     80.3±0.16µs        ? ?/sec           1.00     78.1±0.22µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.03     25.2±0.73µs        ? ?/sec           1.00     24.5±0.78µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00   1617.5±2.39ns        ? ?/sec           1.02   1655.4±5.60ns        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.02     79.6±0.07µs        ? ?/sec           1.00     78.0±0.08µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.00      6.1±0.54µs        ? ?/sec           1.03      6.3±0.09µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00   1586.3±4.48ns        ? ?/sec           1.03   1638.7±3.39ns        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.00    123.4±0.66µs        ? ?/sec           1.00    123.5±0.20µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.03     90.4±5.97µs        ? ?/sec           1.00     87.9±8.43µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.00   1622.1±2.05ns        ? ?/sec           1.03   1664.8±4.19ns        ? ?/sec
filter i32 (kept 1/2)                                                         1.00     29.3±0.02µs        ? ?/sec           1.01     29.6±0.02µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.00      4.9±0.08µs        ? ?/sec           1.01      4.9±0.07µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.00   1516.5±6.90ns        ? ?/sec           1.02   1550.4±3.70ns        ? ?/sec
filter optimize (kept 1/2)                                                    1.02     28.2±0.07µs        ? ?/sec           1.00     27.6±0.12µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.00   1330.5±1.38ns        ? ?/sec           1.00   1336.0±2.37ns        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.01   1326.1±1.57ns        ? ?/sec           1.00   1317.9±1.52ns        ? ?/sec
filter run array (kept 1/2)                                                   1.00    286.4±2.82µs        ? ?/sec           1.00    286.1±3.29µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.01    288.3±8.41µs        ? ?/sec           1.00    286.8±5.90µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.01    237.4±2.88µs        ? ?/sec           1.00    236.2±0.79µs        ? ?/sec
filter single record batch                                                    1.00     29.4±0.07µs        ? ?/sec           1.01     29.6±0.05µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.01     29.5±0.07µs        ? ?/sec           1.00     29.2±0.03µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.00      2.2±0.04µs        ? ?/sec           1.04      2.3±0.04µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00   1463.6±4.80ns        ? ?/sec           1.04   1526.3±2.92ns        ? ?/sec
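
A note on reading the table above: the leading factor in each column appears to be that side's mean time divided by the faster of the two means, so the faster side reads 1.00. A minimal sketch of that arithmetic, assuming this normalization (the names `relative_factors` and `round2` are hypothetical and not part of the benchmark runner):

```rust
/// Normalize two mean timings (same unit) so the faster side reads 1.00.
/// Returns (left_factor, right_factor) for the left-hand and right-hand columns.
fn relative_factors(left: f64, right: f64) -> (f64, f64) {
    let fastest = left.min(right);
    (left / fastest, right / fastest)
}

/// Round to two decimals, the precision the table prints.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

For the `filter context i32 low selectivity (kept 1/1024)` row, `relative_factors(150.0, 198.9)` rounds to (1.00, 1.33), which matches the printed factors.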

Resource Usage

Metric         base (merge-base)    branch
Wall time      670.1s               685.2s
Peak memory    3.0 GiB              3.0 GiB
Avg memory     3.0 GiB              3.0 GiB
CPU user       667.5s               681.0s
CPU sys        0.8s                 0.2s
Peak spill     0 B                  0 B


@alamb
Contributor

alamb commented May 20, 2026

🤔 Hmm, that run (#9986 (comment)) looks very good. I will run another.

@alamb
Contributor

alamb commented May 20, 2026

run benchmark filter_kernels

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4501756916-238-fwm4b 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing issue-9298-repalce-array-data-arrow-select (34b1837) to accb1cf (merge-base) diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench filter_kernels
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                                                         issue-9298-repalce-array-data-arrow-select    main
-----                                                                         ------------------------------------------    ----
filter context decimal128 (kept 1/2)                                          1.00     20.3±0.06µs        ? ?/sec           1.01     20.5±0.11µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     19.7±0.10µs        ? ?/sec           1.02     20.0±0.13µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.00    149.4±1.16ns        ? ?/sec           1.35    201.3±9.12ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     83.5±5.57µs        ? ?/sec           1.02     85.1±6.59µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.00      5.5±0.01µs        ? ?/sec           1.06      5.8±0.01µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00   325.9±11.76ns        ? ?/sec           1.34   437.4±17.67ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00     71.5±5.33µs        ? ?/sec           1.03     73.4±6.38µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00     71.6±5.40µs        ? ?/sec           1.02     73.4±6.37µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00     71.5±5.33µs        ? ?/sec           1.03     73.4±6.36µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00     71.6±5.33µs        ? ?/sec           1.03     73.5±6.48µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00     71.5±5.39µs        ? ?/sec           1.03     73.4±6.48µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.00     71.6±5.38µs        ? ?/sec           1.03     73.4±6.46µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00     71.5±5.33µs        ? ?/sec           1.03     73.4±6.39µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00     71.3±5.44µs        ? ?/sec           1.03     73.4±6.49µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00     71.4±5.37µs        ? ?/sec           1.03     73.4±6.38µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     12.3±0.01µs        ? ?/sec           1.00     12.4±0.01µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.00      3.7±0.00µs        ? ?/sec           1.01      3.8±0.01µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.00    146.9±2.07ns        ? ?/sec           1.38    202.7±8.87ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     84.0±5.47µs        ? ?/sec           1.04     87.3±6.59µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00      5.5±0.01µs        ? ?/sec           1.01      5.6±0.01µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.00   326.3±11.86ns        ? ?/sec           1.33   435.3±17.96ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.02     94.7±6.66µs        ? ?/sec           1.00     92.7±5.27µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00     20.9±0.32µs        ? ?/sec           1.07     22.4±0.07µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00   420.7±17.74ns        ? ?/sec           1.44    605.2±9.56ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.02     94.6±6.59µs        ? ?/sec           1.00     92.8±5.31µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.00     21.7±0.08µs        ? ?/sec           1.00     21.8±0.13µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00   354.2±11.27ns        ? ?/sec           1.32   466.7±10.47ns        ? ?/sec
filter context string (kept 1/2)                                              1.00    424.6±7.11µs        ? ?/sec           1.00    422.9±6.59µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.01     12.9±0.02µs        ? ?/sec           1.00     12.8±0.02µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.01      4.2±0.02µs        ? ?/sec           1.00      4.1±0.01µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.00   532.2±10.20ns        ? ?/sec           1.10    587.2±4.78ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00     84.5±5.47µs        ? ?/sec           1.02     86.4±6.58µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00      6.0±0.01µs        ? ?/sec           1.00      6.1±0.01µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00   711.8±11.63ns        ? ?/sec           1.15   816.8±12.15ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.00    312.4±5.00µs        ? ?/sec           1.02    319.1±1.51µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.00   761.4±10.62ns        ? ?/sec           1.19    904.2±8.94ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.16     14.0±0.08µs        ? ?/sec           1.00     12.1±0.02µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.00   1098.5±2.88ns        ? ?/sec           1.00  1101.4±10.19ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.00    129.0±1.60ns        ? ?/sec           1.44    185.6±9.43ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00     85.5±6.42µs        ? ?/sec           1.00     85.7±6.48µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      2.9±0.01µs        ? ?/sec           1.02      2.9±0.01µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.00   319.4±13.02ns        ? ?/sec           1.34   428.4±20.01ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.00     35.7±0.10µs        ? ?/sec           1.00     35.7±0.07µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.01     19.4±0.30µs        ? ?/sec           1.00     19.1±0.16µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00   1568.0±2.98ns        ? ?/sec           1.03   1609.3±3.83ns        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    106.8±0.44µs        ? ?/sec           1.01    108.2±0.35µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.02     80.0±0.08µs        ? ?/sec           1.00     78.3±0.09µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.04     25.3±0.57µs        ? ?/sec           1.00     24.3±0.82µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00   1622.0±4.19ns        ? ?/sec           1.02   1655.0±5.24ns        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.02     79.4±0.05µs        ? ?/sec           1.00     78.0±0.13µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.00      6.0±0.05µs        ? ?/sec           1.00      6.0±0.09µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00   1582.0±1.87ns        ? ?/sec           1.03  1628.3±10.51ns        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.00    124.6±0.55µs        ? ?/sec           1.00    124.6±0.18µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.08     92.7±6.98µs        ? ?/sec           1.00     86.1±9.40µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.00   1629.0±4.19ns        ? ?/sec           1.04   1687.8±5.42ns        ? ?/sec
filter i32 (kept 1/2)                                                         1.00     29.2±0.04µs        ? ?/sec           1.01     29.6±0.03µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.00      4.9±0.08µs        ? ?/sec           1.00      4.9±0.07µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.00   1505.5±4.99ns        ? ?/sec           1.03   1545.8±2.55ns        ? ?/sec
filter optimize (kept 1/2)                                                    1.03     28.2±0.05µs        ? ?/sec           1.00     27.5±0.04µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.01   1339.3±3.48ns        ? ?/sec           1.00   1328.8±1.69ns        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.01   1328.4±1.87ns        ? ?/sec           1.00   1317.9±1.50ns        ? ?/sec
filter run array (kept 1/2)                                                   1.00    285.7±2.67µs        ? ?/sec           1.04    296.0±1.15µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.04    297.4±4.59µs        ? ?/sec           1.00    285.8±5.59µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    236.3±0.94µs        ? ?/sec           1.00    236.3±0.92µs        ? ?/sec
filter single record batch                                                    1.01     29.6±0.03µs        ? ?/sec           1.00     29.4±0.06µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.01     29.6±0.04µs        ? ?/sec           1.00     29.3±0.04µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.00      2.2±0.04µs        ? ?/sec           1.03      2.2±0.04µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00   1460.0±4.14ns        ? ?/sec           1.05   1532.7±3.47ns        ? ?/sec
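
One rough way to read the `±` figures when judging whether a row shows a real difference is to check whether the two `mean ± spread` ranges overlap. This is only a heuristic sketch, not the statistical test that criterion itself applies, and `Timing` and `intervals_overlap` are hypothetical names:

```rust
/// A timing as printed in the table: mean ± spread, both in the same unit.
#[derive(Clone, Copy)]
struct Timing {
    mean: f64,
    spread: f64,
}

/// True when the ranges [mean - spread, mean + spread] of the two timings intersect.
/// Overlapping ranges suggest the difference may be within run-to-run noise.
fn intervals_overlap(a: Timing, b: Timing) -> bool {
    (a.mean - a.spread) <= (b.mean + b.spread) && (b.mean - b.spread) <= (a.mean + a.spread)
}
```

Under this heuristic, `filter context fsb with value length 20 (kept 1/2)` (71.5±5.33µs vs 73.4±6.38µs) overlaps, while `filter context decimal128 low selectivity (kept 1/1024)` (149.4±1.16ns vs 201.3±9.12ns) and `filter context u8 (kept 1/2)` (14.0±0.08µs vs 12.1±0.02µs) do not.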

Resource Usage

Metric         base (merge-base)    branch
Wall time      670.2s               685.2s
Peak memory    3.0 GiB              3.0 GiB
Avg memory     3.0 GiB              3.0 GiB
CPU user       662.6s               681.2s
CPU sys        0.9s                 0.2s
Peak spill     0 B                  0 B


@alamb
Contributor

alamb commented May 20, 2026

I would say the benchmarks don't show a clear win or downside one way or the other.

I do think it would be good to avoid the new checks, just to be sure, but otherwise this PR is ready to go.

Thank you again
