[Feature](func) Support table function json_each, json_each_text#60910
[Feature](func) Support table function json_each, json_each_text#60910linrrzqqq wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 28775 ms |
TPC-DS: Total hot run time: 184699 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 28938 ms |
TPC-DS: Total hot run time: 184511 ms |
FE UT Coverage ReportIncrement line coverage |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
|
/review |
There was a problem hiding this comment.
Code Review Summary: [Feature](func) Support table function json_each, json_each_text
Overall this is a well-structured addition of two new table-generating functions (json_each and json_each_text) that expand JSON objects into key/value rows. The implementation follows existing patterns in the codebase (similar to explode_json_object). The C++ implementation is correct, the FE Java classes follow the established TableGeneratingFunction pattern, and the tests are comprehensive.
Issues Found
1. [Indentation] BuiltinTableGeneratingFunctions.java — json_each_text line has inconsistent indentation
The json_each_text entry has 20 spaces of indentation vs 12 spaces for every other entry in the list. Minor formatting issue.
2. [Missing outer variants] json_each and json_each_text lack _outer variants
Every other table-generating function in Doris (explode, explode_map, explode_json_object, explode_split, explode_numbers, posexplode, etc.) has a corresponding _outer variant that provides LEFT JOIN-like semantics (emitting a NULL row when the input is NULL or empty). json_each/json_each_text are the only table functions without outer variants.
- The BE side (
table_function_factory.cpp) already supports outer semantics automatically (it strips the_outersuffix, looks up the base function, and callsset_outer()). No BE changes are needed. - However, on the FE side: (a)
function_fake.cppusesregister_functioninstead ofregister_table_function_expand_outer, sojson_each_outer/json_each_text_outerare not registered in the BE function factory. (b) NoJsonEachOuter.java/JsonEachTextOuter.javaFE classes exist. (c) No registration inBuiltinTableGeneratingFunctions.javafor the outer variants. - This means
LATERAL VIEW OUTER json_each(...)will not work. - Consider whether this is intentional or an oversight. If intentional, a comment explaining why would be helpful.
3. [Performance] Const column optimization is incomplete
When _is_const = true, the process_row method still re-parses the JSON and rebuilds _kv_pairs on every row. The vexplode_numbers function demonstrates the pattern of pre-computing results in process_init and returning early in process_row when const. This is functionally correct but sub-optimal for queries with const JSON columns and many rows. Not a blocker, but worth noting as a follow-up optimization.
| @@ -89,7 +93,8 @@ | |||
There was a problem hiding this comment.
Nit: Indentation inconsistency. This line has 20 spaces of indentation (5 levels), but every other entry in this list uses 12 spaces (3 levels). Should be aligned with the line above:
tableGenerating(JsonEachText.class, "json_each_text"),| register_function<FunctionJsonEachText>(factory, "json_each_text"); | ||
| register_table_function_expand_outer_default<DataTypeString, false>(factory, "explode_split"); | ||
| register_table_function_expand_outer_default<DataTypeInt32, false>(factory, "explode_numbers"); | ||
| register_table_function_expand_outer_default<DataTypeInt64, false>(factory, |
There was a problem hiding this comment.
Missing outer variant registration. Every other table function uses register_table_function_expand_outer (e.g., explode_json_object on the line above), which registers both json_each and json_each_outer. Using plain register_function means json_each_outer / json_each_text_outer are not available, so LATERAL VIEW OUTER json_each(...) will not work.
Consider changing to:
register_table_function_expand_outer<FunctionJsonEach>(factory, "json_each");
register_table_function_expand_outer<FunctionJsonEachText>(factory, "json_each_text");And adding corresponding JsonEachOuter.java / JsonEachTextOuter.java FE classes + registrations.
If omitting outer variants is intentional, please add a comment explaining why.
There was a problem hiding this comment.
Code Review Summary: Feature Support table function json_each, json_each_text
Overall this is a clean, well-structured PR that follows existing patterns for adding table generating functions. The code is functionally correct, has good test coverage (unit tests + regression tests), and handles edge cases (NULL input, empty objects, non-object JSON, unicode, nested values) properly.
Issues Found
-
[Style] Indentation inconsistency in BuiltinTableGeneratingFunctions.java — The
JsonEachTextregistration line has extra indentation (double-indented) compared to all surrounding entries. Minor style nit. -
[Suggestion] Missing
_outervariants — Unlike most otherexplode_*table functions,json_eachandjson_each_textdo not provide_outervariants (json_each_outer,json_each_text_outer). In the BE,register_functionis used instead ofregister_table_function_expand_outer, and noJsonEachOuter/JsonEachTextOuterFE classes exist. This meansLATERAL VIEW json_each_outer(...)will fail. While there is precedent (e.g.,ExplodeVariantArray,Unnest), this is a usability gap that users may expect to work, especially those coming from PostgreSQL. Consider adding outer variants, or explicitly documenting why they are omitted. -
[Suggestion]
_is_constoptimization not fully utilized —process_initcorrectly detects const columns viaunpack_if_constand sets_is_const, butprocess_rowstill unconditionally re-parses JSON and rebuilds_kv_pairson every call. Since the base classTableFunction::process_rowpreserves_cur_sizewhen_is_constis true, the derived class could skip re-parsing after the first row for const inputs. This matches the existing pattern invexplode_json_object(which also doesn't optimize), but is a missed opportunity for performance improvement with literal/const JSON inputs.
Positive Observations
- Clean template design using
TEXT_MODEbool to share code betweenjson_eachandjson_each_text - Good handling of nullable columns, null JSON values, and edge cases
- Comprehensive regression tests covering: basic types, nulls, empty objects, non-objects, unicode, negative numbers, booleans, nested values, and inline literals
- Unit tests cover both
get_valueandget_same_many_valuespaths - Follows the established visitor pattern and function signature conventions correctly
fe/fe-core/src/main/java/org/apache/doris/catalog/BuiltinTableGeneratingFunctions.java
Show resolved
Hide resolved
|
|
||
| register_table_function_expand_outer<FunctionExplodeJsonObject>(factory, "explode_json_object"); | ||
| register_function<FunctionJsonEach>(factory, "json_each"); | ||
| register_function<FunctionJsonEachText>(factory, "json_each_text"); |
There was a problem hiding this comment.
Unlike the adjacent explode_json_object which uses register_table_function_expand_outer, json_each and json_each_text use plain register_function. This means json_each_outer / json_each_text_outer won't be recognized by the FE. If outer semantics are intentionally not supported, consider adding a comment explaining why. Otherwise, consider using register_table_function_expand_outer (and adding corresponding JsonEachOuter/JsonEachTextOuter FE classes) for consistency with the rest of the table function ecosystem.
| } | ||
|
|
||
| _kv_pairs.first = ColumnNullable::create(ColumnString::create(), ColumnUInt8::create()); | ||
| _kv_pairs.second = ColumnNullable::create(ColumnString::create(), ColumnUInt8::create()); |
There was a problem hiding this comment.
When _is_const is true, this code still re-parses JSON and rebuilds _kv_pairs for every row. Consider adding a short-circuit like:
if (_is_const && _cur_size > 0) {
return; // reuse _kv_pairs from first row
}before the JSON parsing (after TableFunction::process_row()). This would avoid redundant work for constant/literal inputs like LATERAL VIEW json_each('{"a":1}').
Note: The base class process_row already preserves _cur_size when _is_const is true, so the _kv_pairs from the first parse would remain valid.
|
/review |
There was a problem hiding this comment.
Code Review Summary: json_each / json_each_text Table Functions
Overall this is a well-structured implementation following existing patterns (particularly explode_json_object). The FE and BE implementations are consistent, the template approach (TEXT_MODE) is clean, and test coverage is reasonable. Below are findings organized by the critical review checkpoints.
Critical Checkpoint Conclusions
1. Goal & Correctness: The PR adds two table functions json_each and json_each_text that expand top-level JSON objects into key/value pairs (JSONB vs text value columns respectively). The implementation accomplishes this goal. Regression tests and unit tests prove the basic functionality.
2. Modification size/focus: The PR is focused and minimal for a new feature. 11 files changed, all related to the feature.
3. Concurrency: No concurrency concerns — table functions execute in a single-threaded pipeline context.
4. Lifecycle management: No special lifecycle concerns. Column pointers are properly nulled in process_close().
5. Configuration items: None added — N/A.
6. Incompatible changes: No incompatible changes — this is purely additive.
7. Parallel code paths (Missing _outer variants): explode_* table function has a corresponding _outer variant (e.g., explode_json_object / explode_json_object_outer). The _outer variant preserves the input row with NULL values when the generator produces zero rows (left-join semantics). json_each and json_each_text are missing their _outer variants. This is a functional gap — users have no way to get outer join behavior with these functions. See inline comments.
8. BE fake function registration: json_each / json_each_text are registered with register_function instead of register_table_function_expand_outer, which is consistent with not having outer variants, but inconsistent with how explode_json_object is registered. If outer variants are added, this needs to change.
9. DCHECK usage in function_fake.cpp: The new FunctionJsonEach and FunctionJsonEachText use DCHECK_EQ for argument type validation. Per coding standards, DCHECK disappears in RELEASE builds, leaving no protection. This should use a proper error return or DORIS_CHECK for invariant assertions. Note: this matches the existing pattern in FunctionExplodeJsonObject, so it's a pre-existing issue, but new code should not perpetuate it.
10. Test coverage: Good breadth — covers basic strings, numbers, booleans, null values, empty objects, SQL NULL input, non-object inputs (arrays, strings), unicode, negative numbers, inline literals, and multi-row scenarios. The BE unit test also covers get_same_many_values. Missing: FE unit tests (pattern exists in ExplodeTest.java), and the id=9 nested object/array test case only runs sql (no qt_ assertion).
11. Observability: N/A for a simple table function.
12. Transaction/persistence: N/A.
13. FE-BE variable passing: The function signatures are correctly matched between FE (JsonType input, Struct(String, JsonType/String) return) and BE (FunctionJsonEach/FunctionJsonEachText fake functions).
14. Formatting: Minor indentation issue in BuiltinTableGeneratingFunctions.java.
Issues Found (by severity)
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | Medium | Multiple | Missing json_each_outer / json_each_text_outer variants |
| 2 | Low | function_fake.cpp |
DCHECK_EQ should be DORIS_CHECK (disappears in RELEASE) |
| 3 | Nit | BuiltinTableGeneratingFunctions.java |
Extra indentation on json_each_text line |
| 4 | Nit | vjson_each.cpp |
compile_check_end.h placed after closing brace of namespace — inconsistent with header |
|
|
||
| register_table_function_expand_outer<FunctionExplodeJsonObject>(factory, "explode_json_object"); | ||
| register_function<FunctionJsonEach>(factory, "json_each"); | ||
| register_function<FunctionJsonEachText>(factory, "json_each_text"); |
There was a problem hiding this comment.
Missing outer variant registration: explode_json_object is registered with register_table_function_expand_outer which automatically registers both the base and _outer variants. json_each and json_each_text are registered with plain register_function, meaning no json_each_outer / json_each_text_outer variants exist.
Every other explode_* table function has an _outer variant. Without it, users cannot get left-join semantics (preserving input rows when the function generates zero rows). Consider:
register_table_function_expand_outer<FunctionJsonEach>(factory, "json_each");
register_table_function_expand_outer<FunctionJsonEachText>(factory, "json_each_text");This also requires adding the corresponding FE classes (JsonEachOuter.java, JsonEachTextOuter.java), visitor methods, builtin registrations, and BE factory entries.
| struct FunctionJsonEach { | ||
| static DataTypePtr get_return_type_impl(const DataTypes& arguments) { | ||
| DCHECK_EQ(arguments[0]->get_primitive_type(), PrimitiveType::TYPE_JSONB) | ||
| << " json_each " << arguments[0]->get_name() << " not supported"; |
There was a problem hiding this comment.
Nit: DCHECK_EQ vanishes in RELEASE builds, leaving no argument type validation. Per coding standards, DORIS_CHECK is preferred for invariant assertions. This is a pre-existing pattern from FunctionExplodeJsonObject (line 143), but new code should prefer DORIS_CHECK.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
doc: apache/doris-website#3422
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)