
WIP: add bolt backend in gluten#11261

Open
WangGuangxin wants to merge 32 commits into apache:main from WangGuangxin:add_bolt_backend

Conversation

@WangGuangxin
Contributor

No description provided.

@afterincomparableyum

Thanks for your hard work, really appreciate it @WangGuangxin. I know there are certain queries where Velox's results do not match vanilla Spark's output. Is this the same with Bolt? I think it would be nice to include some docs about any potential mismatches so users can be cautious.

@FelixYBW
Contributor

FelixYBW commented Dec 7, 2025

@WangGuangxin can you change the time metric from "total time of" to "time of"?

@FelixYBW
Contributor

FelixYBW commented Dec 7, 2025

Initial performance numbers from our test (higher is better), FYI.

[image: performance comparison results]

@taiyang-li
Contributor

Thanks for your hard work, really appreciate it @WangGuangxin. I know there are certain queries where Velox's results do not match vanilla Spark's output. Is this the same with Bolt? I think it would be nice to include some docs about any potential mismatches so users can be cautious.

@afterincomparableyum thank you for your suggestion. We will add those documents before this PR is merged.

@taiyang-li taiyang-li added the BOLT label Dec 8, 2025
@FelixYBW
Contributor

FelixYBW commented Dec 9, 2025

Can you create documents similar to Velox's? Here is the list of Velox docs:

docs/developers/ProfileMemoryOfGlutenWithVelox.md
docs/developers/velox-backend-CI.md
docs/developers/velox-backend-build-in-docker.md
docs/developers/velox-function-development-guide.md
docs/developers/VeloxDynamicSizingOffheap.md
docs/developers/VeloxUDF.md
docs/get-started/VeloxABFS.md
docs/get-started/VeloxGCS.md
docs/get-started/VeloxIceberg.md
docs/get-started/VeloxLocalCache.md
docs/get-started/VeloxQAT.md
docs/get-started/VeloxS3.md
docs/get-started/VeloxStageResourceAdj.md
docs/get-started/Velox.md
docs/get-started/VeloxGPU.md
docs/image/velox_apply_stage_resource.png
docs/image/velox_decision_support_bench1_10query_performance.png
docs/image/velox_decision_support_bench1_22queries_performance.png
docs/image/velox_profile_memory_gif.gif
docs/image/velox_profile_memory_text.png
docs/image/veloxbe_memory_layout.png
docs/velox-backend-aggregate-function-support.md
docs/velox-backend-generator-function-support.md
docs/velox-backend-limitations.md
docs/velox-backend-troubleshooting.md
docs/velox-backend-window-function-support.md
docs/velox-parquet-write-configuration.md
docs/velox-spark-configuration.md
docs/velox-backend-scalar-function-support.md
docs/velox-backend-support-progress.md
docs/velox-configuration.md

@FelixYBW
Contributor

FelixYBW commented Dec 9, 2025

Does Bolt pass Spark 3.2, 3.3, 3.4, and 3.5 UTs?

We are going to drop 3.2 support and are fixing 4.0 UTs, so Bolt may start from 3.3 support.

@WangGuangxin WangGuangxin force-pushed the add_bolt_backend branch 2 times, most recently from b871610 to be4494f on December 11, 2025 03:27
@taiyang-li
Contributor

Does Bolt pass Spark 3.2, 3.3, 3.4, and 3.5 UTs?

We are going to drop 3.2 support and are fixing 4.0 UTs, so Bolt may start from 3.3 support.

Currently we mainly run UTs on Spark 3.5. For now most of the 3.5 UTs pass, and we are fixing the failed ones. We will follow the community's support for different Spark versions.

@metegenez

Is there any documentation/script to build the Gluten jar with the Bolt backend, like dev/buildbundle-veloxbe.sh? I want to benchmark it against the ClickBench dataset on an ARM chip.

@taiyang-li
Contributor

Is there any documentation/script to build the Gluten jar with the Bolt backend, like dev/buildbundle-veloxbe.sh? I want to benchmark it against the ClickBench dataset on an ARM chip.

@metegenez please refer to the steps in https://github.com/WangGuangxin/gluten/blob/d4ee706eb51a250f7bbacae70b46dffba62470b8/README.md

@taiyang-li
Contributor

Build with S3 failed:

```
make release ENABLE_S3=True
```


```
CMake Error at CMakeLists.txt:362 (find_package):
  By not providing "FindAWSSDK.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "AWSSDK", but
  CMake did not find one.

  Could not find a package configuration file provided by "AWSSDK" with any
  of the following names:

    AWSSDKConfig.cmake
    awssdk-config.cmake

  Add the installation prefix of "AWSSDK" to CMAKE_PREFIX_PATH or set
  "AWSSDK_DIR" to a directory containing one of the above files.  If "AWSSDK"
  provides a separate development package or SDK, be sure it has been
  installed.
```

@FelixYBW it has been fixed. Please get the latest code and run:

```
make bolt-recipe
make release ENABLE_S3=True
```

@WangGuangxin WangGuangxin marked this pull request as ready for review January 13, 2026 01:30
@FelixYBW
Contributor

FelixYBW commented Jan 31, 2026

Iceberg table S3 read failed. Hive table can be read successfully.

```
An error occurred while calling o497.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 25 in stage 0.0 failed 4 times, most recent failure: Lost task 25.3 in stage 0.0 (TID 121) (10.0.1.5 executor 0): org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: BoltRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (unknown vs. unknown)
Retriable: False
Expression: baseReaderOpts_.getFileFormat() != dwio::common::FileFormat::UNKNOWN
Context: Split [Hive: s3a://presto-workload/tpcds_sf2500_parquet_zstd_180_iceberg_part_gluten/customer/data/ac343273-743b-40be-a652-31eae91b56db.parquet 4 - 8861227] Task Gluten_Stage_0_TID_121_VTID_68
Additional Context: Operator: TableScan[0] 0
Function: prepareSplit
File: /root/.conan2/p/b/bolt537ff2fba4fea/b/bolt/connectors/hive/SplitReader.cpp
Line: 255
Stack trace:
# 0  _ZN9bytedance4bolt7process10StackTraceC1Ei
# 1  _ZN9bytedance4bolt13BoltExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN9bytedance4bolt6detail13boltCheckFailINS0_16BoltRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_17BoltCheckFailArgsET0_
# 3  _ZN9bytedance4bolt9connector4hive11SplitReader12prepareSplitESt10shared_ptrINS0_6common14MetadataFilterEERNS0_4dwio6common17RuntimeStatisticsERNS0_11filesystems11FileOptionsEbRSt6vectorIiSaIiEEPKNS2_28HiveConnectorSplitCacheLimitE
# 4  _ZN9bytedance4bolt9connector4hive14HiveDataSource27createConfiguredSplitReaderERKSt10shared_ptrINS2_18HiveConnectorSplitEEb
# 5  _ZN9bytedance4bolt9connector4hive14HiveDataSource8addSplitERKSt10shared_ptrINS2_18HiveConnectorSplitEE
# 6  _ZN9bytedance4bolt9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
# 7  _ZN9bytedance4bolt4exec9TableScan9getOutputEv
# 8  _ZZN9bytedance4bolt4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE3_clEv
# 9  _ZN9bytedance4bolt4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 10 _ZN9bytedance4bolt4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
# 11 _ZN9bytedance4bolt4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 12 _ZN6gluten24WholeStageResultIterator12nextInternalEv
# 13 _ZN6gluten24WholeStageResultIterator4nextEv
# 14 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
```
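As a side note for readers, frames like these are mangled C++ symbols and can be decoded with `c++filt` from binutils; for example, frame #0 of the trace:

```shell
# Demangle one frame of the stack trace with binutils' c++filt.
echo '_ZN9bytedance4bolt7process10StackTraceC1Ei' | c++filt
# → bytedance::bolt::process::StackTrace::StackTrace(int)
```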

@FelixYBW
Contributor

FelixYBW commented Feb 4, 2026

Bolt PRs submitted by Gluten committers and related:
bytedance/bolt#191

@FelixYBW
Contributor

FelixYBW commented Feb 4, 2026

Iceberg table S3 read failed. Hive table can be read successfully.

Iceberg read on S3 isn't enabled in Bolt yet.

kexianda and others added 8 commits February 12, 2026 12:33
Currently, to make sure LLVM IR can call C/C++ code, we added a customized library loader which loads the Bolt backend with the RTLD_GLOBAL flag. Since bolt_backend is a shared library, it exposes its entire dynamic symbol table to the Java process, which can cause symbol conflicts if the user has other native libraries.

Use a version script to control symbol exposure.

How to check:
```
readelf -d -s --dyn-syms libbolt.so | grep 'jit_'
```
Signed-off-by: fangzhuhe <fangzhuhe@bytedance.com>
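For readers unfamiliar with the technique, here is a minimal self-contained sketch of the version-script approach; the symbol names (`jit_entry`, `internal_helper`) and file names are illustrative, not Bolt's actual ones:

```shell
# Build a tiny shared library where only jit_-prefixed symbols are exported.
cat > demo.c <<'EOF'
void jit_entry(void) {}        /* should stay in the dynamic symbol table */
void internal_helper(void) {}  /* should be hidden (local) */
EOF

cat > demo.map <<'EOF'
{
  global: jit_*;   /* export only JIT entry points */
  local:  *;       /* hide everything else */
};
EOF

cc -shared -fPIC -o libdemo.so demo.c -Wl,--version-script=demo.map

# Only jit_entry should appear among the exported (T) dynamic symbols:
nm -D libdemo.so | grep ' T '
```

The same idea applies to libbolt.so: a `global:` pattern keeps the JIT entry points callable from the custom loader while `local: *` prevents the rest of the backend's symbols from clashing with other native libraries in the Java process.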
…rc files to make sure Bolt can correctly handle legacy orc files
This change introduces all of the necessary Paimon-specific changes that were included in olap/gluten's master branch but were not ported over when switching to OSS Gluten's main branch.

The following changes were made:

- PaimonScanTransformer is now AbstractPaimonScanTransformer in order to support other backends' implementations for Paimon. Bolt-specific features are now included in an extended class, BoltPaimonScanTransformer, to facilitate the Bolt-specific requirements.

- Added support for adding extension info from the scan
transformer into the advanced_extension field for a ReadRel.
Also added protos for the paimon-specific advanced extension
which are automatically generated and compiled for both Java
and C++ versions.
- The above also coincides with adding support for passing
the "tableParameters" field of a HiveTableHandle when
converting the substrait plan into bolt.

- Added protos specific to paimon splits in order to serialize
and deserialize paimon-specific split information from gluten
into bolt. Previously this was done in a hacky way using
strings and comma-separated lists. This new version uses
protobuf both on the Java and C++ side to communicate
paimon-specific split information via a LocalFiles' file_format
field oneof definition in algebra.proto. One of the new options
added is for "PaimonReadOptions" which contains the required
paimon information.

- Added a bolt-specific paimon suite which mirrors all of
the previous test cases from the master branch.
Removed handling for StructType, ArrayType, and MapType in data type matching.
