chore: remove dead native_iceberg_compat code path#4363
Merged
andygrove merged 22 commits intoMay 19, 2026
Conversation
COMET_NATIVE_SCAN_IMPL config now only allows auto or native_datafusion. CometScanRule always uses native_datafusion. Tests no longer parameterize on scan impl and iceberg_compat-specific tests are removed. Golden files are regenerated in a follow-up commit.
Plan-stability suites no longer parameterize on scan impl, so each query has a single golden directory. native_datafusion dirs are renamed to the unsuffixed name, and native_iceberg_compat dirs are removed.
# Conflicts: # .github/workflows/pr_build_linux.yml # .github/workflows/spark_sql_test.yml
# Conflicts: # .github/workflows/spark_sql_test.yml
Removes the four deprecated symbols from CometConf along with all references in main code, tests, and benchmarks: - COMET_NATIVE_SCAN_IMPL (spark.comet.scan.impl) - SCAN_NATIVE_DATAFUSION - SCAN_NATIVE_ICEBERG_COMPAT - SCAN_AUTO With a single Parquet scan implementation, the scanImpl field on CometScanExec is dropped, CometScanTypeChecker becomes parameterless, and per-impl conditionals in CometScanRule and CometExecRule collapse.
…impl-constants # Conflicts: # spark/src/main/scala/org/apache/comet/CometConf.scala # spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala # spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala # spark/src/test/scala/org/apache/comet/CometCsvExpressionSuite.scala # spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala # spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala # spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala # spark/src/test/scala/org/apache/comet/rules/CometScanRuleSuite.scala # spark/src/test/scala/org/apache/spark/sql/benchmark/CometBenchmarkBase.scala # spark/src/test/scala/org/apache/spark/sql/comet/ParquetEncryptionITCase.scala
Now that native_iceberg_compat is gone, the JVM-mediated V1 Parquet read path (CometScanExec → CometScanWrapper → CometParquetFileFormat → NativeBatchReader / column-level Native APIs) is unreachable. Iceberg also no longer integrates against Comet's @IcebergApi surface. - CometExecRule: V1 CometScanExec fallback now reverts to wrapped FileSourceScanExec (Spark) instead of CometScanWrapper. - CometScanExec: keep as planning intermediate, throw on doExecute/doExecuteColumnar/executeCollect/inputRDDs. - Drop file-format swap to CometParquetFileFormat in both CometScanExec.apply and CometNativeScanExec.apply. - Delete CometParquetFileFormat and the JVM Parquet reader chain (NativeBatchReader, NativeColumnReader, ColumnReader, RowGroupReader, FileReader, plus all supporting page/index/options/utility classes). - Delete IcebergCometNativeBatchReader and CometLazyVector. - Trim Native.java to the Arrow-native scan APIs. - Delete IcebergApi annotation, CometSchemaImporter, AbstractCometSchemaImporter; strip @IcebergApi from CometVector. - Delete now-orphaned tests (TestFileReader, TestUtils, TestCometInputFile); drop closeColumnReader NPE assertion in CometNativeSuite.
The JVM-side Parquet column readers were deleted in the previous commit; the Rust JNI methods and supporting modules they called are now dead. - parquet/mod.rs: drop JNI methods initColumnReader, setDictionaryPage, setPageV1, setPageV2, resetBatch, readBatch, skipBatch, currentBatch, closeColumnReader, plus the Context/get_reader helpers. - Delete parquet/read/ (column, levels, values, mod), mutable_vector.rs, data_type.rs, and the util sub-modules bit_packing, buffer, memory, test_common that only existed for the column readers. - Trim parquet/util/jni.rs to deserialize_schema (the only helper used by the surviving Arrow-native scan path). - Delete native/core/src/common/ entirely; bit.rs and buffer.rs were only consumed by the deleted parquet column-reader chain. - Drop the parquet_read, bit_util, and parquet_decode benches that exercised the removed code; deregister them from Cargo.toml.
Updates the Spark test diffs to compile against the trimmed CometConf API and the new 10-arg CometScanExec signature: - Drop references to COMET_NATIVE_SCAN_IMPL, SCAN_NATIVE_DATAFUSION, SCAN_NATIVE_ICEBERG_COMPAT, and SCAN_AUTO in patched Spark sources. - Fix the CometScanExec extractor pattern in SubquerySuite to use 10 placeholders now that scanImpl is gone. - Collapse the SQLTestUtils per-impl branching to a single check for IgnoreCometNativeDataFusion / IgnoreCometNativeScan, since native_datafusion is the only Parquet scan impl. - Remove the unused IgnoreCometNativeIcebergCompat tag.
With native_datafusion as the only Parquet scan implementation, the per-impl tag variants are equivalent to the plain IgnoreComet tag. - Replace all IgnoreCometNativeDataFusion and IgnoreCometNativeScan call sites with IgnoreComet. - Drop the corresponding case-class definitions from IgnoreComet.scala. - Remove the now-redundant secondary check in SQLTestUtils; the existing IgnoreComet check handles all usages.
CometScanExec is a planning-only intermediate after the native_iceberg_compat removal, so it never reaches the executed plan inspected by ParquetRowIndexSuite. Its inputRDD lazy val is also gone, breaking the sql_core/sql_hive test compile.
CometScanRule.validateObjectStoreConfig had no production caller; only NativeConfigSuite invoked it. Drop the Scala wrapper, its configValidity cache, the Native.java JNI declaration (which leaves the file empty), the backing Rust JNI impl, and the tests that exercised the wrapper. Keep the extractObjectStoreOptions tests since that utility is still used by the native scan paths.
ParquetFilters and its helper SourceFilterSerde have no instantiation sites left after the scan-impl cleanups; CometReaderThreadPool / CometFileReaderThreadPool likewise has no callers. Drop all three.
comphead
approved these changes
May 19, 2026
Contributor
comphead
left a comment
There was a problem hiding this comment.
Thanks @andygrove CI pending
mbutrovich
approved these changes
May 19, 2026
Contributor
mbutrovich
left a comment
There was a problem hiding this comment.
Approved pending CI. I will get #4201 into reviewable shape after this merges.
This was referenced May 19, 2026
This was referenced May 20, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Part of #4020.
Rationale for this change
With #4019 already restricting V1 Parquet scans to
native_datafusion, the rest of thenative_iceberg_compatmachinery is dead: the scan-impl constants/config, the JVM-mediated V1 read path (CometScanWrapper+CometParquetFileFormat+ JVM column readers), the column-levelNativeJNI methods, the corresponding Rust modules, and the@IcebergApisurface (Iceberg no longer integrates against Comet).What changes are included in this PR?
COMET_NATIVE_SCAN_IMPL,SCAN_NATIVE_DATAFUSION,SCAN_NATIVE_ICEBERG_COMPAT,SCAN_AUTO, thescanImplparameter, and all test/benchmarkwithSQLConfusages.CometExecRuleV1 fallback now reverts to wrappedFileSourceScanExecinstead ofCometScanWrapper;CometScanExecbecomes a planning-only intermediate (execution methods throw); file-format swap toCometParquetFileFormatremoved.CometParquetFileFormat, the JVM Parquet reader chain (NativeBatchReader,NativeColumnReader,ColumnReader,RowGroupReader,FileReader, plus all page/index/bloom/options/utility classes),IcebergCometNativeBatchReader,CometLazyVector,IcebergApi,CometSchemaImporter,AbstractCometSchemaImporter.Native.javatrimmed to the Arrow-native scan APIs;@IcebergApistripped fromCometVector.initColumnReader,setPageV1/V2,setDictionaryPage,readBatch,skipBatch,currentBatch,closeColumnReader,resetBatch); theparquet::{read, mutable_vector, data_type}modules;parquet::util::{bit_packing, buffer, memory, test_common}; the entirecommon/module; theparquet_read/bit_util/parquet_decodebenches.TestFileReader,TestUtils,TestCometInputFile; drop thecloseColumnReaderNPE assertion inCometNativeSuite.Docs cleanup is in #4362.
Follow-ups: merging
CometScanExecintoCometNativeScanExec(still kept as a planning intermediate).How are these changes tested?
./mvnw -pl spark -am test-compile -Pspark-3.5andcargo test --no-run --allboth pass. CI will run on this branch.