Skip to content

[refactor](be) Update column nullable semantics#64407

Merged
zclllyybb merged 2 commits into
apache:masterfrom
zclllyybb:nullable
Jun 12, 2026
Merged

[refactor](be) Update column nullable semantics#64407
zclllyybb merged 2 commits into
apache:masterfrom
zclllyybb:nullable

Conversation

@zclllyybb

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: IColumn::is_nullable() only identified physical ColumnNullable, while callers also need nullable semantics for ColumnConst(ColumnNullable). The old split between semantic nullable checks and physical nullable checks made call sites ambiguous. This change removes is_concrete_nullable(), makes ColumnConst report nullable when its nested data column is nullable, and migrates physical-column branches to is_column_nullable() or check_and_get_column<ColumnNullable>(). Const nullable join keys and row-copy paths are materialized or unwrapped explicitly so null maps are propagated correctly.

Release note

None

Check List (For Author)

  • Test: Unit Test / Regression test / Build / Style
    • ./run-be-ut.sh --run --filter=ColumnConstTest.* -j 90
    • ./run-be-ut.sh --run --filter=ColumnConstTest.*:RuntimeFilterWrapperTest.TestBitMap:HashJoinProbeOperatorTest.*:HashJoinBuildSinkTest.*:NLJAppendProbeDataWithNullTest.*:VIcebergDeleteSinkTest.*:IcebergDeleteFileReaderHelperTest.* -j 90
    • ./build.sh --be --fe
    • PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/check-format.sh
    • ./run-regression-test.sh --run -f regression-test/suites/correctness_p0/test_runtime_filter_outer_join_nullable_side.groovy
    • ./run-regression-test.sh --run -f regression-test/suites/correctness_p0/test_null_equal.groovy
    • ./run-regression-test.sh --run -f regression-test/suites/correctness_p0/test_outer_join_with_null_value.groovy
    • ./run-regression-test.sh --run -f regression-test/suites/query_p0/sql_functions/cast_function/test_nullable_functions.groovy
    • ./run-regression-test.sh --run -f regression-test/suites/query_p0/sql_functions/array_functions/test_array_functions_by_literal.groovy
    • ./run-regression-test.sh --run -f regression-test/suites/query_p0/sql_functions/json_functions/test_json_extract.groovy
    • ./run-regression-test.sh --run -f regression-test/suites/query_p0/aggregate/map_agg_v1.groovy
    • ./run-regression-test.sh --run -f regression-test/suites/dictionary_p0/test_dict_get_nullable.groovy
    • ./run-regression-test.sh --run -f regression-test/suites/schema_change_p0/test_alter_table_column_nullable.groovy
  • Behavior changed: Yes, internal column nullability predicate now treats const nullable as nullable; SQL behavior is intended to stay unchanged
  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Issue Number: N/A

Related PR: N/A

Problem Summary: `IColumn::is_nullable()` only identified physical `ColumnNullable`, while callers also need nullable semantics for `ColumnConst(ColumnNullable)`. The old split between semantic nullable checks and physical nullable checks made call sites ambiguous. This change removes `is_concrete_nullable()`, makes `ColumnConst` report nullable when its nested data column is nullable, and migrates physical-column branches to `is_column_nullable()` or `check_and_get_column<ColumnNullable>()`. Const nullable join keys and row-copy paths are materialized or unwrapped explicitly so null maps are propagated correctly.

None

- Test: Unit Test / Regression test / Build / Style
    - `./run-be-ut.sh --run --filter=ColumnConstTest.* -j 90`
    - `./run-be-ut.sh --run --filter=ColumnConstTest.*:RuntimeFilterWrapperTest.TestBitMap:HashJoinProbeOperatorTest.*:HashJoinBuildSinkTest.*:NLJAppendProbeDataWithNullTest.*:VIcebergDeleteSinkTest.*:IcebergDeleteFileReaderHelperTest.* -j 90`
    - `./build.sh --be --fe`
    - `PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/check-format.sh`
    - `./run-regression-test.sh --run -f regression-test/suites/correctness_p0/test_runtime_filter_outer_join_nullable_side.groovy`
    - `./run-regression-test.sh --run -f regression-test/suites/correctness_p0/test_null_equal.groovy`
    - `./run-regression-test.sh --run -f regression-test/suites/correctness_p0/test_outer_join_with_null_value.groovy`
    - `./run-regression-test.sh --run -f regression-test/suites/query_p0/sql_functions/cast_function/test_nullable_functions.groovy`
    - `./run-regression-test.sh --run -f regression-test/suites/query_p0/sql_functions/array_functions/test_array_functions_by_literal.groovy`
    - `./run-regression-test.sh --run -f regression-test/suites/query_p0/sql_functions/json_functions/test_json_extract.groovy`
    - `./run-regression-test.sh --run -f regression-test/suites/query_p0/aggregate/map_agg_v1.groovy`
    - `./run-regression-test.sh --run -f regression-test/suites/dictionary_p0/test_dict_get_nullable.groovy`
    - `./run-regression-test.sh --run -f regression-test/suites/schema_change_p0/test_alter_table_column_nullable.groovy`
- Behavior changed: Yes, internal column nullability predicate now treats const nullable as nullable; SQL behavior is intended to stay unchanged
- Does this need documentation: No
@zclllyybb

Copy link
Copy Markdown
Contributor Author

run buildall

@zclllyybb

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes. The PR goal is valid, but the nullable semantic flip is not complete enough to be safe yet.

Critical checkpoints:

  • Goal alignment: Partially met. The change makes Const(Nullable) visible to semantic nullable checks, but some physical-nullable assumptions remain.
  • Correctness/regressions: Blocking issues found. One parser path copies a one-row const null map as row-aligned, and at least one existing function family now casts ColumnConst as ColumnNullable.
  • Tests: Existing tests cover many nullable paths, but they do not cover mixed vector plus const-nullable arguments for these failures. I did not run the full suite locally; git diff --check passed. The observed BE UT macOS GitHub check failed before tests due a JDK 25 vs JDK 17 runner mismatch, so I did not treat that as a product regression.
  • Scope/compatibility: The blast radius is broad because IColumn::is_nullable() is a core virtual method. Remaining unmodified callers must be audited before changing this contract.
  • Concurrency/lifecycle/locks: No new concurrency or lifecycle issue found in the reviewed BE paths.
  • Config/FE-BE/protocol/security: No config, protocol, or security-boundary change found.
  • Storage/txn/write paths: No transaction or write-consistency issue found beyond the nullable-physical-column checks reviewed in storage readers/schema change.
  • Parallel/old-new paths: Hash join and nested loop join changes appear intentional, but parallel existing function paths are not all migrated.
  • Performance/observability: No standalone performance or observability blocker found.
  • User focus: No additional user-provided focus points were present.

Comment thread be/src/exprs/function/function_jsonb.cpp Outdated
Comment thread be/src/core/column/column_const.h
@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29398 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 127fbe5f6d2638460bb8f80e935d15142e242d57, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17592	4075	4078	4075
q2	q3	10745	1417	861	861
q4	4680	493	360	360
q5	7546	880	587	587
q6	181	173	138	138
q7	787	872	621	621
q8	9330	1641	1532	1532
q9	5951	4553	4555	4553
q10	6761	1845	1524	1524
q11	430	265	246	246
q12	629	425	294	294
q13	18109	3465	2816	2816
q14	260	262	255	255
q15	q16	819	764	709	709
q17	992	933	900	900
q18	7190	5918	5580	5580
q19	1368	1347	1014	1014
q20	497	402	265	265
q21	6243	2837	2757	2757
q22	464	375	311	311
Total cold run time: 100574 ms
Total hot run time: 29398 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5070	4812	4805	4805
q2	q3	4864	5416	4790	4790
q4	2183	2209	1425	1425
q5	4994	4728	4667	4667
q6	230	176	134	134
q7	1897	1751	1531	1531
q8	2388	2106	2080	2080
q9	7809	7418	7379	7379
q10	4801	4695	4252	4252
q11	543	385	351	351
q12	737	743	562	562
q13	2993	3330	2826	2826
q14	267	281	256	256
q15	q16	676	692	607	607
q17	1295	1259	1255	1255
q18	7790	6720	6728	6720
q19	1185	1113	1132	1113
q20	2215	2249	1969	1969
q21	5313	4588	4437	4437
q22	522	471	444	444
Total cold run time: 57772 ms
Total hot run time: 51603 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169311 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 127fbe5f6d2638460bb8f80e935d15142e242d57, data reload: false

query5	4361	620	471	471
query6	442	186	168	168
query7	4923	580	286	286
query8	358	212	201	201
query9	8797	4039	4042	4039
query10	440	304	261	261
query11	5907	2382	2176	2176
query12	160	104	95	95
query13	1254	593	447	447
query14	6400	5395	5045	5045
query14_1	4418	4397	4392	4392
query15	205	203	176	176
query16	978	431	421	421
query17	952	716	621	621
query18	2438	503	354	354
query19	216	188	147	147
query20	107	104	106	104
query21	227	140	114	114
query22	13637	13614	13487	13487
query23	17385	16461	16213	16213
query23_1	16294	16304	16328	16304
query24	7491	1754	1270	1270
query24_1	1329	1311	1327	1311
query25	582	480	393	393
query26	1298	331	178	178
query27	2656	539	342	342
query28	4468	2071	2057	2057
query29	1076	620	488	488
query30	302	232	236	232
query31	1101	1068	942	942
query32	103	57	57	57
query33	516	315	242	242
query34	1180	1144	658	658
query35	738	784	672	672
query36	1374	1419	1217	1217
query37	154	101	86	86
query38	3188	3127	3057	3057
query39	931	907	914	907
query39_1	872	869	871	869
query40	221	118	97	97
query41	65	60	58	58
query42	95	94	95	94
query43	315	316	271	271
query44	
query45	196	180	178	178
query46	1103	1219	736	736
query47	2386	2383	2229	2229
query48	399	419	304	304
query49	624	457	352	352
query50	991	349	267	267
query51	4323	4297	4220	4220
query52	87	85	76	76
query53	245	260	191	191
query54	258	211	189	189
query55	82	73	67	67
query56	229	230	205	205
query57	1397	1429	1348	1348
query58	254	206	202	202
query59	1544	1653	1418	1418
query60	275	247	228	228
query61	152	141	140	140
query62	688	641	584	584
query63	232	184	190	184
query64	2548	769	594	594
query65	
query66	1795	440	342	342
query67	29713	29664	29444	29444
query68	
query69	420	295	265	265
query70	938	956	892	892
query71	307	220	204	204
query72	2989	2651	2345	2345
query73	875	740	453	453
query74	5117	4912	4772	4772
query75	2642	2572	2252	2252
query76	2311	1161	813	813
query77	342	398	290	290
query78	12477	12492	11949	11949
query79	1257	1035	773	773
query80	518	471	372	372
query81	455	284	254	254
query82	229	150	120	120
query83	352	275	243	243
query84	
query85	829	497	414	414
query86	330	294	276	276
query87	3373	3331	3256	3256
query88	3600	2780	2764	2764
query89	403	391	327	327
query90	2181	178	176	176
query91	173	153	157	153
query92	59	58	57	57
query93	1515	1527	933	933
query94	525	365	314	314
query95	683	465	352	352
query96	1104	792	360	360
query97	2693	2679	2606	2606
query98	218	207	203	203
query99	1138	1184	1046	1046
Total cold run time: 249860 ms
Total hot run time: 169311 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 43.04% (164/381) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.91% (21115/39170)
Line Coverage 37.64% (201387/535085)
Region Coverage 33.66% (157898/469112)
Branch Coverage 34.72% (69160/199197)

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 81.58% (310/380) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.94% (28287/38257)
Line Coverage 57.94% (308217/531966)
Region Coverage 54.74% (257958/471279)
Branch Coverage 56.15% (111987/199460)

@zclllyybb zclllyybb marked this pull request as ready for review June 11, 2026 12:43
@zclllyybb

Copy link
Copy Markdown
Contributor Author

/review

@zclllyybb

Copy link
Copy Markdown
Contributor Author

run buildall

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two blocking correctness issues in dictionary loading. Both are separate dictionary-specific instances of the new ColumnConst(ColumnNullable) semantics and are not duplicates of the existing broad ColumnConst::is_nullable() review thread.

Critical checkpoints:

  • Goal and behavior: The PR aims to make const nullable columns semantically nullable and migrate physical-column checks. That migration is incomplete in dictionary load paths, causing NULL keys/values to be mishandled.
  • Correctness and edge cases: Constant nullable dictionary keys can be retained instead of skipped, and constant nullable attribute values can lose their null map and become non-null defaults.
  • Scope and consistency: Most touched sites move toward explicit physical checks or const-aware null-map helpers, but the dictionary paths need the same const-aware handling.
  • Concurrency/lifecycle/config/storage compatibility: I did not find new issues in those areas for the reviewed changes.
  • Tests: The current coverage does not appear to exercise const-nullable dictionary keys or values; please add targeted coverage for these cases.
  • User focus: No additional user-provided review focus was present in .code-review.i1G8lP/review_focus.txt.

Comment thread be/src/exprs/function/dictionary.cpp
Comment thread be/src/exprs/function/dictionary_util.h
@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29454 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e6834a56dc385a375237bb2184d91256c42b6106, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17836	4157	4127	4127
q2	q3	10807	1437	812	812
q4	4678	476	338	338
q5	7524	894	574	574
q6	186	174	139	139
q7	783	845	634	634
q8	9425	1698	1623	1623
q9	5883	4530	4499	4499
q10	6770	1858	1509	1509
q11	436	278	262	262
q12	640	422	295	295
q13	18144	3455	2807	2807
q14	265	262	245	245
q15	q16	829	771	714	714
q17	931	962	959	959
q18	6992	5713	5583	5583
q19	1317	1263	1109	1109
q20	528	413	260	260
q21	6542	2918	2638	2638
q22	484	394	327	327
Total cold run time: 101000 ms
Total hot run time: 29454 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5424	5007	4859	4859
q2	q3	5015	5294	4587	4587
q4	2234	2269	1388	1388
q5	5096	4812	4853	4812
q6	243	182	140	140
q7	1963	1754	1527	1527
q8	2484	2204	2086	2086
q9	7743	7503	7419	7419
q10	4732	4702	4190	4190
q11	535	409	400	400
q12	732	743	522	522
q13	3137	3383	2809	2809
q14	270	289	249	249
q15	q16	694	708	618	618
q17	1315	1296	1312	1296
q18	7699	7037	6922	6922
q19	1117	1110	1099	1099
q20	2227	2218	1943	1943
q21	5395	4662	4578	4578
q22	530	496	426	426
Total cold run time: 58585 ms
Total hot run time: 51870 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 168469 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e6834a56dc385a375237bb2184d91256c42b6106, data reload: false

query5	4332	624	470	470
query6	427	193	183	183
query7	4850	577	300	300
query8	361	210	195	195
query9	8744	4115	4086	4086
query10	454	316	247	247
query11	5919	2344	2180	2180
query12	152	100	98	98
query13	1276	621	406	406
query14	6367	5331	5034	5034
query14_1	4422	4345	4320	4320
query15	203	194	178	178
query16	973	447	414	414
query17	890	675	537	537
query18	2409	463	339	339
query19	192	175	139	139
query20	106	114	109	109
query21	221	137	118	118
query22	13614	13586	13349	13349
query23	17339	16554	16165	16165
query23_1	16297	16252	16327	16252
query24	7544	1780	1296	1296
query24_1	1291	1306	1312	1306
query25	548	439	360	360
query26	1298	315	159	159
query27	2709	555	322	322
query28	4494	2012	2000	2000
query29	1043	596	505	505
query30	302	237	200	200
query31	1142	1075	968	968
query32	100	63	60	60
query33	523	314	254	254
query34	1203	1203	670	670
query35	765	778	687	687
query36	1381	1382	1250	1250
query37	155	105	93	93
query38	3219	3136	3064	3064
query39	939	923	901	901
query39_1	875	879	889	879
query40	216	125	106	106
query41	75	68	68	68
query42	97	97	96	96
query43	319	327	281	281
query44	
query45	201	197	178	178
query46	1115	1235	729	729
query47	2362	2350	2226	2226
query48	413	432	305	305
query49	652	477	369	369
query50	1014	364	253	253
query51	4378	4339	4168	4168
query52	90	94	82	82
query53	256	263	192	192
query54	276	233	213	213
query55	82	82	72	72
query56	244	243	242	242
query57	1450	1423	1321	1321
query58	257	228	223	223
query59	1603	1637	1438	1438
query60	287	261	242	242
query61	173	174	178	174
query62	723	648	597	597
query63	236	196	196	196
query64	2605	758	586	586
query65	
query66	1775	472	341	341
query67	29781	29665	29576	29576
query68	
query69	422	317	274	274
query70	987	985	926	926
query71	292	210	211	210
query72	2955	2523	2355	2355
query73	831	791	441	441
query74	5174	4987	4766	4766
query75	2659	2562	2231	2231
query76	2341	1183	783	783
query77	349	375	283	283
query78	12369	12449	11848	11848
query79	1287	1020	740	740
query80	588	461	383	383
query81	450	289	239	239
query82	576	156	124	124
query83	351	270	248	248
query84	
query85	848	512	412	412
query86	359	308	270	270
query87	3427	3324	3142	3142
query88	3635	2724	2735	2724
query89	418	382	332	332
query90	1953	183	174	174
query91	170	156	132	132
query92	62	60	56	56
query93	1467	1467	830	830
query94	543	341	319	319
query95	661	470	341	341
query96	1061	792	357	357
query97	2676	2674	2640	2640
query98	212	208	203	203
query99	1133	1207	1006	1006
Total cold run time: 250452 ms
Total hot run time: 168469 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 38.31% (195/509) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.90% (21115/39174)
Line Coverage 37.63% (201343/535090)
Region Coverage 33.67% (157961/469125)
Branch Coverage 34.71% (69149/199219)

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 87.01% (442/508) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.84% (28251/38261)
Line Coverage 57.83% (307635/531971)
Region Coverage 54.72% (257880/471292)
Branch Coverage 56.05% (111802/199482)

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 12, 2026
@github-actions

Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@csun5285 csun5285 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zclllyybb zclllyybb merged commit 8c9e952 into apache:master Jun 12, 2026
37 of 38 checks passed
@zclllyybb zclllyybb deleted the nullable branch June 12, 2026 03:23
zclllyybb added a commit that referenced this pull request Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.x dev/4.1.x-conflict

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants