Skip to content

[improvement](query) Align olap scan schema with storage keys#64413

Open
BiteTheDDDDt wants to merge 17 commits into
apache:masterfrom
BiteTheDDDDt:codex/fe-filled-key-columns
Open

[improvement](query) Align olap scan schema with storage keys#64413
BiteTheDDDDt wants to merge 17 commits into
apache:masterfrom
BiteTheDDDDt:codex/fe-filled-key-columns

Conversation

@BiteTheDDDDt

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary:

This is an alternative to keeping separate storage and execution schemas in the BE segment iterator for storage expression pushdown.

For AGG key tables and non-MOW unique key tables, the planner now expands the OLAP scan tuple so the scan schema starts with the storage key columns in storage order. Missing key slots are marked in thrift as FE-filled key slots. The scan node projects back to the original execution output after storage filters and runtime filters are translated.

On the BE side, the marked key slots are mapped to tablet column ids. Direct reader paths can fill those storage-only key columns without reading their data pages, while merge and aggregation reader paths still keep real key columns for storage semantics. Since FE now provides the storage-aligned key prefix, the non-direct scanner no longer rebuilds the return column list as "all keys plus value outputs"; it only checks the key prefix and keeps the existing sequence-column expansion.

Release note

None

Check List (For Author)

  • Test: Manual test
    • build-support/clang-format.sh
    • git diff --check
    • ./build.sh --fe
    • ./build.sh --be
    • build-support/run-clang-tidy.sh --build-dir be/build_Release --base 23e21f44f0080e25deff5dcdda2e61af3efc9480 was attempted, but this checkout reports existing file-level clang-tidy diagnostics in the touched storage files before producing a clean changed-line gate.
  • Behavior changed: No user-visible behavior change. Internal OLAP scan tuple planning and storage-read column handling are changed.
  • Does this need documentation: No

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: Storage expression pushdown previously rebound slot refs to the expanded reader schema by mutating expression column ids inside segment iterators. That violates the assumption that pushed-down expressions are immutable and can produce surprising behavior when the same expression objects are reused. This change keeps the full scan schema unchanged, adds a project schema aligned with the final projected columns, and evaluates pushed-down expressions against a temporary project block. Slot ordinal to tablet column id mapping now uses the project schema, so no expression mutation is required.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - ./build.sh --be
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64010

Problem Summary: Address review comments for avoiding storage expression column id rebinding. The project-schema column-id to block-position map was initialized in multiple places, and project-block construction still ran on ordinary scan paths where the project schema matched the reader schema. Virtual column materialization also rebuilt the project block for each expression. This change centralizes schema block mapping initialization, reuses the original block when the project schema is identical to the reader schema, reuses the projected block during virtual column materialization, and keeps expression exceptions converted to Status at the virtual-column evaluation boundary.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./build.sh --be
    - build-support/run-clang-tidy.sh attempted; it reported pre-existing/analysis diagnostics in segment_iterator.cpp, segment_iterator.h, and be/src/util/jni-util.h unrelated to the changed logic.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#64010

Problem Summary: Review feedback pointed out that the segment iterator should work from an explicit project schema, and that the project block helper should fill caller-owned storage instead of returning a selected block pointer. Follow-up coverage showed that direct RowsetReader and Segment::new_iterator callers do not pass the TabletReader-expanded project schema fields. The reader now keeps the SegmentIterator project_columns invariant, while BetaRowsetReader maps direct callers to return_columns and Segment::new_iterator maps direct callers to schema column IDs. The project block helper remains caller-owned and the ANN source column ordinal translation is documented.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - build-support/clang-format.sh
    - git diff --check
    - ./run-be-ut.sh --run --filter=TestDeltaWriterClusterKey.vec_sequence_col
- Behavior changed: No
- Does this need documentation: No
Issue Number: N/A

Related PR: N/A

Problem Summary: Avoid maintaining separate execution and storage schemas in the BE segment iterator for pushed-down storage expressions. The planner expands AGG-key and non-MOW unique-key olap scan tuples with missing storage key columns and marks the extra key slots in thrift. The scan node projects back to the original output tuple, while BE maps the marked slots to tablet column ids and, in direct reader mode, fills those storage-only key columns without reading them. Merge and aggregation reader paths still read real key columns.

None

- Test: Manual test
    - ./build.sh --fe
    - ./build.sh --be
    - git diff --check
    - build-support/clang-format.sh
    - build-support/run-clang-tidy.sh --build-dir be/build_Release --base 23e21f4 (attempted; failed on existing file-level clang-tidy diagnostics before producing a clean changed-line gate)
- Behavior changed: No
- Does this need documentation: No
@BiteTheDDDDt BiteTheDDDDt force-pushed the codex/fe-filled-key-columns branch from b0ca37e to bbdb261 Compare June 11, 2026 08:48
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64413

Problem Summary: SegmentIterator no longer needs to maintain a second Schema object for the expression slot layout after the FE-side scan schema is expanded to match storage keys. This change removes the project schema state and replaces it with a lightweight expression ordinal to storage column id mapping. The mapping still preserves the old FE compatibility path when TabletReader expands return columns before segment iteration, while the common aligned case is normalized to the storage schema pointer so per-batch checks stay O(1).

### Release note

None

### Check List (For Author)

- Test: Manual test
    - git diff --check
    - ./build.sh --be
    - build-support/run-clang-tidy.sh (fails on existing diagnostics in touched Doris headers/functions, including jni-util.h static_assert diagnostics and pre-existing complexity/modernize warnings)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: apache#64413

Problem Summary: FE now aligns scan slots to the storage schema and marks filled key columns explicitly. SegmentIterator no longer needs to keep a separate expression-column layout for pushed-down expressions, so this change removes the expression layout fallback from StorageReadOptions, RowsetReaderContext, and SegmentIterator. Pushed-down expression, ANN, index, score, and virtual-column paths now use the scan schema column ids directly.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - ./build.sh --be
    - git diff --check
    - build-support/run-clang-tidy.sh --base HEAD (fails on pre-existing diagnostics in touched files, including jni-util.h static_assert(false), deprecated headers, and existing function complexity warnings)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64413

Problem Summary: FE-filled key column schema keeps scan output aligned with the storage schema, so BE no longer needs to carry virtual-column block index/type mappings through scan and storage readers. This removes those mapping fields from OlapScanner, TabletReader, RowsetReaderContext, StorageReadOptions, and SegmentIterator, derives virtual column type from its expression context, routes common expression filtering through VExprContext::execute_conjuncts, and materializes virtual columns through VExprContext::execute.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - build-support/clang-format.sh
    - git diff --check
    - ./build.sh --be
    - build-support/run-clang-tidy.sh --base HEAD (stopped after noisy existing full-file/header diagnostics; fixed the reported changed-line use-after-move in vcollect_iterator.cpp)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64413

Problem Summary: FE-filled key columns are carried in the scan node thrift only when non-empty, and BE no longer keeps a separate LocalState copy of those filled key slot ids. OlapScanner reads the thrift field directly while initializing return columns and still records filled storage column ids for direct-mode readers to avoid reading FE-added key columns when storage merge is unnecessary.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - build-support/clang-format.sh
    - git diff --check
    - ./build.sh --be --fe
    - build-support/run-clang-tidy.sh --base HEAD (failed on existing full-function/header/JNI static_assert diagnostics in the touched translation units)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64413

Problem Summary: After aligning scan slots with the storage key schema, the scan path still kept a separate filled-key scanner cache and a dedicated SegmentIterator helper for filled columns. This removes the redundant scanner member, records filled key columns directly in ReaderParams only for direct reads, and folds filled-column default generation into the generic prune path. Other BlockReader and SegmentIterator mapping state remains because it still maps expanded storage-read blocks, delete-predicate columns, virtual columns, and expression columns back to caller blocks.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - build-support/clang-format.sh be/src/exec/scan/olap_scanner.cpp be/src/exec/scan/olap_scanner.h be/src/storage/segment/segment_iterator.cpp be/src/storage/segment/segment_iterator.h
    - git diff --check
    - ./build.sh --be
- Behavior changed: No
- Does this need documentation: No
@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

run buildall

@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two blocking correctness issues and am requesting changes.

Critical checkpoint conclusions:

  • Goal/test: the PR aims to align OLAP scan schemas with storage keys for AGG and non-MOW UNIQUE tables, but the current implementation has correctness regressions in lazy materialization and common-expression filtering. I did not find tests covering those edge cases.
  • Scope/focus: the PR diff is focused on scan-schema alignment and related BE reader cleanup.
  • Concurrency/lifecycle/config: no new thread, lock, special lifecycle, or config-item risk was found in the PR diff.
  • Compatibility/FE-BE variables: the new optional thrift field is directionally compatible and the main FE/BE send path is wired, but the FE projection remapping conflicts with the lazy-materialized scan path.
  • Parallel paths/conditions: direct/non-direct scan paths were reviewed; the lazy-materialization path and common-expression pushdown path need fixes before this can be considered correct.
  • Testing/result files: git diff --check FETCH_HEAD..HEAD passed. Full Doris tests were not run in this review runner. Please add coverage for non-MOW UNIQUE scans with filled keys plus TopN lazy materialization, and for common expr pushdown when column 0 is not materialized before expression evaluation.
  • Data correctness/version/delete bitmap/memory/error handling: no visible-version, delete-bitmap, transaction, or memory-accounting issue was found in the touched code, but the two execution-path issues below can silently produce wrong query results. Status handling generally remains explicit.
  • Observability/performance: no additional observability requirement was found; the intended performance optimization is reasonable after the correctness issues are addressed.

User focus: .code-review.2nSCsc/review_focus.txt contained no additional focus points.

common_ctxs.push_back(ctx.get());
}
_output_index_result_column(common_ctxs, _sel_rowid_idx.data(), _selected_size, block);
RETURN_IF_ERROR(_execute_common_expr(_sel_rowid_idx.data(), _selected_size, block));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This drops rows for common-expression predicates when the first block column has not been materialized. _process_common_expr() can still be called with selected_size > 0 while block->rows() == 0: for example, when the common expr reads a later column and column 0 is neither a predicate/common-expr column nor otherwise output before this point. The removed code immediately above this hunk installed a temporary column 0 exactly to give the block selected_size rows. In release builds this new code builds an empty filter from block->rows(), _evaluate_common_expr_filter(..., selected_size, filter) returns 0, and the whole batch is silently filtered out. Please preserve a selected_size-sized row-count column, or otherwise make the conjunct execution/filtering operate on selected_size rows instead of block->rows().

context.addScanNode(olapScanNode, olapScan);

translateRuntimeFilter(olapScan, olapScanNode, context);
if (!storageAlignedScanSlots.filledKeyExprIds.isEmpty()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This projection remaps every output ExprId in PlanTranslatorContext to the new projection tuple slot ids. That breaks visitPhysicalLazyMaterializeOlapScan: after computePhysicalOlapScan(lazyScan.getScan(), context) returns, it computes scanIds with context.findSlotRef(...) and then prunes olapScanNode.getTupleDesc() (the original scan tuple) using those ids. When a filled storage key is added for a non-MOW UNIQUE scan and another value column is lazy-materialized, the ids in scanIds are projection tuple ids, so the real scan tuple slots are removed while the project list still points at the old scan slots. Please avoid this projection/remap for the lazy-materialized scan path, or preserve/use the original scan tuple slot mapping when pruning.

@hello-stephen

Copy link
Copy Markdown
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 77.32% (1889/2443)
Line Coverage 64.38% (33946/52725)
Region Coverage 64.77% (17455/26948)
Branch Coverage 53.93% (9338/17316)

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 68.54% (61/89) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 158309 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bd652553e1e0db7fd1ef25da40ead6f62e096410, data reload: false

query5	4305	617	475	475
query6	428	190	167	167
query7	
query8	
query9	9739	3995	3965	3965
query10	539	306	249	249
query11	6520	2329	2148	2148
query12	157	105	95	95
query13	3217	631	442	442
query14	7290	5347	5038	5038
query14_1	4375	4377	4402	4377
query15	209	197	180	180
query16	985	459	419	419
query17	1073	704	575	575
query18	2457	468	339	339
query19	206	192	157	157
query20	111	106	110	106
query21	212	140	115	115
query22	13706	14374	14557	14374
query23	17602	16684	16357	16357
query23_1	16363	16420	16471	16420
query24	8236	1839	1318	1318
query24_1	1374	1353	1362	1353
query25	591	451	368	368
query26	
query27	2455	559	343	343
query28	5904	2024	2016	2016
query29	1113	610	474	474
query30	309	239	205	205
query31	1152	1080	976	976
query32	118	55	55	55
query33	505	329	253	253
query34	
query35	805	791	664	664
query36	1497	1377	1227	1227
query37	137	98	86	86
query38	3253	3191	3049	3049
query39	914	903	885	885
query39_1	871	877	870	870
query40	234	118	97	97
query41	60	59	56	56
query42	90	86	89	86
query43	337	336	276	276
query44	
query45	189	184	168	168
query46	
query47	
query48	
query49	599	444	337	337
query50	955	360	256	256
query51	4381	4390	4276	4276
query52	83	85	73	73
query53	
query54	
query55	131	79	66	66
query56	218	214	205	205
query57	
query58	227	204	194	194
query59	1597	1669	1441	1441
query60	286	246	220	220
query61	140	144	140	140
query62	693	656	586	586
query63	
query64	2608	743	596	596
query65	
query66	1794	444	336	336
query67	29624	29582	28890	28890
query68	
query69	440	285	251	251
query70	1019	951	960	951
query71	305	232	207	207
query72	3007	2594	2276	2276
query73	
query74	5137	4918	4743	4743
query75	2603	2503	2239	2239
query76	4139	1186	772	772
query77	336	378	273	273
query78	12391	12361	11767	11767
query79	
query80	493	361	357	357
query81	451	238	247	238
query82	430	141	114	114
query83	348	263	243	243
query84	
query85	799	461	452	452
query86	363	285	283	283
query87	3397	3332	3152	3152
query88	
query89	
query90	2029	173	163	163
query91	
query92	61	56	53	53
query93	1425	1401	913	913
query94	525	337	300	300
query95	678	374	434	374
query96	2005	829	336	336
query97	2737	2707	2561	2561
query98	208	206	199	199
query99	1183	1139	1039	1039
Total cold run time: 239905 ms
Total hot run time: 158309 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 29.31% (34/116) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.92% (21117/39167)
Line Coverage 37.63% (201351/535025)
Region Coverage 33.62% (157679/469037)
Branch Coverage 34.71% (69130/199161)

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.21% (42/116) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.71% (22073/38251)
Line Coverage 41.07% (218476/531903)
Region Coverage 37.10% (174839/471201)
Branch Coverage 37.94% (75661/199424)

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 60.67% (54/89) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants