fix: Serialize native LIST/STRUCT/ARRAY/UNION columns per-row in REST JSON (#89) #90
Merged
- convertVectorListToJson read the entire list child vector for every row instead of the row's own slice, so multi-row LIST(VARCHAR) columns reported the concatenation of all rows' lists.
- convertVectorStructToJson always read child element 0, so structs nested in a list (LIST(STRUCT)) repeated the first element N times.
- Use the per-row duckdb_list_entry offset/length for lists and pass row_idx through to struct children; honor row validity (NULL lists and structs now serialize as null).
- Add regression tests covering single/multi-row LIST(STRUCT), LIST(VARCHAR), multi-row STRUCT, and NULL list entries.
Closes #89
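The per-row slice logic described in this commit can be sketched in isolation. The following is a minimal, self-contained C++ model and not the extension's actual code: ListEntry stands in for duckdb_list_entry, plain std::vector stands in for DuckDB vectors, listRowToJson is a hypothetical name, and JSON string escaping is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for DuckDB's per-row list metadata: where this row's elements
// start in the shared child vector, and how many there are.
struct ListEntry {
    uint64_t offset;
    uint64_t length;
};

// Serialize one row of a LIST(VARCHAR) column as a JSON array.
// `child` is the flattened child vector shared by every row in the chunk.
// (Hypothetical helper; no string escaping, for brevity.)
std::string listRowToJson(const std::vector<std::string>& child,
                          const std::vector<ListEntry>& entries,
                          const std::vector<bool>& valid,
                          std::size_t row_idx) {
    if (!valid[row_idx]) return "null";  // NULL list serializes as null
    const ListEntry& e = entries[row_idx];
    assert(e.offset + e.length <= child.size());
    std::string out = "[";
    // Only walk this row's slice, never the whole child vector.
    for (uint64_t i = 0; i < e.length; ++i) {
        if (i > 0) out += ",";
        out += "\"" + child[e.offset + i] + "\"";
    }
    return out + "]";
}
```

Iterating from 0 to child.size() instead of over the row's slice reproduces the reported bug: every row would emit all four elements of a four-element child vector.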
ARRAY columns were routed through the LIST serializer, which after the list fix reads per-row duckdb_list_entry offset/length metadata. ARRAY vectors have no such metadata (each row is a constant array_size run in the child vector), so this read garbage and risked UB.
- Add convertVectorArrayToJson using duckdb_array_type_array_size and duckdb_array_vector_get_child, indexing child at row_idx*array_size+i.
- Route DUCKDB_TYPE_ARRAY to the new serializer; honor row validity.
- Add a multi-row fixed-size ARRAY regression test.
Found in codex review of the LIST/STRUCT fix.
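The fixed-size indexing rule (child index = row_idx * array_size + i) can be illustrated with a small model. This is a sketch under assumptions, not the code from this PR: arrayRowToJson is a hypothetical name, the child vector is a plain std::vector<int>, and array_size is passed in directly rather than read from the logical type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Serialize one row of a fixed-size ARRAY column (modelled as INTEGER[N]).
// There is no per-row offset/length: row r owns the child elements
// [r * array_size, (r + 1) * array_size).
std::string arrayRowToJson(const std::vector<int>& child,
                           std::size_t array_size,
                           const std::vector<bool>& valid,
                           std::size_t row_idx) {
    if (!valid[row_idx]) return "null";  // honor row validity
    assert((row_idx + 1) * array_size <= child.size());
    std::string out = "[";
    for (std::size_t i = 0; i < array_size; ++i) {
        if (i > 0) out += ",";
        out += std::to_string(child[row_idx * array_size + i]);
    }
    return out + "]";
}
```

With a child vector of {1, 2, 3, 4, 5, 6} and array_size 3, row 0 covers the first three elements and row 1 the last three.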
UNION columns were routed through the generic struct serializer, which
emitted the tag plus every candidate member for each row, exposing
inactive members instead of the one selected by the row's tag.
- Add convertVectorUnionToJson: read the tag (struct child 0, uint8),
resolve the active member (struct child tag+1) and emit it as
{member_name: value}, matching DuckDB's own to_json output.
- Route DUCKDB_TYPE_UNION to the new serializer; honor row validity and
fail safe on out-of-range tags.
- Add a multi-row UNION regression test.
Found in codex review of the LIST/STRUCT fix.
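The tag-based member selection can be sketched as follows. This is a simplified model, not the PR's implementation: UnionColumn and unionRowToJson are hypothetical names, member values are assumed to be already serialized to JSON text, and members are indexed directly by tag (in the real layout the tag is struct child 0, so the active member is struct child tag+1). Emitting null for an out-of-range tag is an assumption about what "fail safe" means here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified model of a UNION column.
struct UnionColumn {
    std::vector<std::string> member_names;          // one name per member
    std::vector<std::vector<std::string>> members;  // members[m][row], JSON text
    std::vector<uint8_t> tags;                      // active member per row
    std::vector<bool> valid;                        // row validity
};

// Emit only the member selected by the row's tag, as {member_name: value}.
std::string unionRowToJson(const UnionColumn& col, std::size_t row_idx) {
    if (!col.valid[row_idx]) return "null";
    const uint8_t tag = col.tags[row_idx];
    // Assumption: an out-of-range tag is serialized as null rather than
    // reading past the member list.
    if (tag >= col.member_names.size()) return "null";
    return "{\"" + col.member_names[tag] + "\":" +
           col.members[tag][row_idx] + "}";
}
```

The generic struct path would instead have emitted the tag and every entry of members for each row, which is the behavior this commit removes.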
This was referenced Jun 25, 2026
Summary
Fixes #89. The REST JSON serializer corrupted native nested-type columns:
- LIST(VARCHAR): each row reported the concatenation of every row's list in the chunk (e.g. 4 rows × 3 elements → a 12-element array per row).
- LIST(STRUCT): correct length, but every element was a copy of element [0] (the original report).
- STRUCT: multi-row results always read element [0].
- ARRAY (found in review): routed through the LIST path, which after the fix reads duckdb_list_entry metadata that fixed-size arrays don't have.
- UNION (found in review): routed through the struct path, which emitted the tag plus every candidate member instead of only the active one.

to_json(col) was unaffected (JSON-typed column path), which matched the reporter's workaround and pointed straight at per-row child-vector handling.

Root cause
- convertVectorListToJson iterated the entire child vector (duckdb_list_vector_get_size) instead of the row's own duckdb_list_entry {offset, length} slice.
- convertVectorStructToJson always passed child index 0 instead of row_idx.
- ARRAY/UNION were aliased onto the list/struct paths, which is incorrect for their physical layouts.

Fix
- Lists: read the row's duckdb_list_entry offset/length and only emit that slice.
- Structs: pass row_idx through to each child vector.
- ARRAY: add convertVectorArrayToJson using duckdb_array_type_array_size + duckdb_array_vector_get_child (child_idx = row_idx * array_size + i).
- UNION: add convertVectorUnionToJson, which reads the tag (struct child 0, uint8), resolves the active member (struct child tag+1), and emits it as {member_name: value} to match DuckDB's own to_json.
- Honor row validity: NULL lists and structs now serialize as null.

Test plan
- [query_executor][list] regression tests: single/multi-row LIST(STRUCT), multi-row LIST(VARCHAR), multi-row STRUCT, fixed-size ARRAY, multi-row UNION, and NULL list entries (red before fix, green after).

Closes #89
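
The LIST(STRUCT) case from the original report needs both the list fix and the struct fix: the list serializer has to hand each element's child index to the struct serializer. A minimal model of that interaction, assuming a hypothetical STRUCT(id INTEGER, name VARCHAR) element type and hypothetical helper names (this is not the extension's code, and string escaping is omitted):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for duckdb_list_entry.
struct ListEntry {
    uint64_t offset;
    uint64_t length;
};

// Hypothetical STRUCT(id INTEGER, name VARCHAR) child, stored column-wise.
struct StructColumn {
    std::vector<int> id;
    std::vector<std::string> name;
};

// Serialize the struct at position `idx` of the child columns.
// The original bug was equivalent to always calling this with idx = 0.
std::string structToJson(const StructColumn& s, std::size_t idx) {
    return "{\"id\":" + std::to_string(s.id[idx]) +
           ",\"name\":\"" + s.name[idx] + "\"}";
}

// Serialize one row of a LIST(STRUCT) column: walk the row's slice and
// pass each element's child index through to the struct serializer.
std::string listOfStructRowToJson(const StructColumn& child,
                                  const std::vector<ListEntry>& entries,
                                  std::size_t row_idx) {
    const ListEntry& e = entries[row_idx];
    assert(e.offset + e.length <= child.id.size());
    std::string out = "[";
    for (uint64_t i = 0; i < e.length; ++i) {
        if (i > 0) out += ",";
        out += structToJson(child, static_cast<std::size_t>(e.offset + i));
    }
    return out + "]";
}
```

Replacing e.offset + i with 0 in the loop gives a list of the right length whose elements all repeat the first struct, which matches the symptom described in #89.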