fix: Serialize native LIST/STRUCT/ARRAY/UNION columns per-row in REST JSON (#89) #90

Merged: jrosskopf merged 3 commits into main from fix/gh-89-list-struct-serialization on Jun 25, 2026
@jrosskopf jrosskopf commented Jun 25, 2026

Contributor

Summary

Fixes #89 — the REST JSON serializer corrupted native nested-type columns:

  • LIST(VARCHAR) — each row reported the concatenation of every row's list in the chunk (e.g. 4 rows × 3 elements → a 12-element array per row).
  • LIST(STRUCT) — correct length, but every element was a copy of element[0] (the original report).
  • STRUCT — multi-row results always read element[0].
  • ARRAY (found in review) — routed through the LIST path, which after the fix reads duckdb_list_entry metadata that fixed-size arrays don't have.
  • UNION (found in review) — routed through the struct path, which emitted the tag + every candidate member instead of only the active one.

to_json(col) was unaffected (JSON-typed column path), which matched the reporter's workaround and pointed straight at per-row child-vector handling.

Root cause

  • convertVectorListToJson iterated the entire child vector (duckdb_list_vector_get_size) instead of the row's own duckdb_list_entry {offset, length} slice.
  • convertVectorStructToJson always passed child index 0 instead of row_idx.
  • ARRAY/UNION were aliased onto the list/struct paths, which is incorrect for their physical layouts.
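The list bug can be reproduced in a small model that does not touch the DuckDB C API. This is only a sketch: `ListEntry` stands in for `duckdb_list_entry`, and `ListColumn`, `rowToJsonBuggy`, and `rowToJsonFixed` are hypothetical names, not the functions changed in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for duckdb_list_entry: where one row's elements
// live inside the shared child vector.
struct ListEntry {
    std::uint64_t offset;
    std::uint64_t length;
};

// Hypothetical model of one chunk of a LIST(VARCHAR) column.
struct ListColumn {
    std::vector<ListEntry> entries;  // one entry per row
    std::vector<std::string> child;  // every row's elements, back to back
};

static std::string quote(const std::string &s) { return "\"" + s + "\""; }

// Pre-fix shape: each row walks the whole child vector, so every row
// reports the concatenation of all rows' lists.
std::string rowToJsonBuggy(const ListColumn &col, std::size_t row_idx) {
    assert(row_idx < col.entries.size());
    std::string out = "[";
    for (std::size_t i = 0; i < col.child.size(); ++i) {
        if (i) out += ",";
        out += quote(col.child[i]);
    }
    return out + "]";
}

// Post-fix shape: only the row's own {offset, length} slice is emitted.
std::string rowToJsonFixed(const ListColumn &col, std::size_t row_idx) {
    assert(row_idx < col.entries.size());
    const ListEntry &e = col.entries[row_idx];
    std::string out = "[";
    for (std::uint64_t i = 0; i < e.length; ++i) {
        if (i) out += ",";
        out += quote(col.child[e.offset + i]);
    }
    return out + "]";
}
```

With two rows holding `["a","b"]` and `["c"]`, the buggy variant returns all three elements for either row, while the fixed variant returns only the requested row's slice.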

Fix

  • List: read the per-row duckdb_list_entry offset/length and only emit that slice.
  • Struct: pass row_idx through to each child vector.
  • Array: new convertVectorArrayToJson using duckdb_array_type_array_size + duckdb_array_vector_get_child (child_idx = row_idx * array_size + i).
  • Union: new convertVectorUnionToJson — read the tag (struct child 0, uint8), resolve the active member (struct child tag+1), emit it as {member_name: value} to match DuckDB's own to_json.
  • All paths now honor row validity, so NULL nested values serialize as null.
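The struct and validity points can be illustrated with a minimal model, assuming a struct column with two fields stored as parallel child vectors. `StructColumn` and `structRowToJson` are hypothetical names for illustration and the field names are invented; the real serializer works on DuckDB vectors rather than `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a STRUCT(id BIGINT, name VARCHAR) column:
// one child vector per field, plus a per-row validity flag.
struct StructColumn {
    std::vector<bool> valid;       // false means the whole struct is NULL
    std::vector<std::int64_t> id;  // child vector for field "id"
    std::vector<std::string> name; // child vector for field "name"
};

// Reads each child at row_idx (not at a fixed index 0) and emits
// JSON null for an invalid row.
std::string structRowToJson(const StructColumn &col, std::size_t row_idx) {
    assert(row_idx < col.valid.size());
    if (!col.valid[row_idx]) {
        return "null";
    }
    return "{\"id\":" + std::to_string(col.id[row_idx]) +
           ",\"name\":\"" + col.name[row_idx] + "\"}";
}
```

Replacing `row_idx` with `0` in the two child reads reproduces the pre-fix symptom, where every row serializes as the first row.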

Test plan

  • New [query_executor][list] regression tests: single/multi-row LIST(STRUCT), multi-row LIST(VARCHAR), multi-row STRUCT, fixed-size ARRAY, multi-row UNION, and NULL list entries (red before fix, green after).
  • Full C++ unit suite: 643/643 passing.
  • Reviewed with codex (surfaced the ARRAY + UNION regressions, both fixed and re-reviewed LGTM).

Closes #89

- convertVectorListToJson read the entire list child vector for every
  row instead of the row's own slice, so multi-row LIST(VARCHAR)
  columns reported the concatenation of all rows' lists.
- convertVectorStructToJson always read child element 0, so structs
  nested in a list (LIST(STRUCT)) repeated the first element N times.
- Use the per-row duckdb_list_entry offset/length for lists and pass
  row_idx through to struct children; honor row validity (NULL lists
  and structs now serialize as null).
- Add regression tests covering single/multi-row LIST(STRUCT),
  LIST(VARCHAR), multi-row STRUCT, and NULL list entries.

Closes #89

ARRAY columns were routed through the LIST serializer, which after the
list fix reads per-row duckdb_list_entry offset/length metadata. ARRAY
vectors have no such metadata (each row is a constant array_size run in
the child vector), so this read garbage and risked UB.

- Add convertVectorArrayToJson using duckdb_array_type_array_size and
  duckdb_array_vector_get_child, indexing child at row_idx*array_size+i.
- Route DUCKDB_TYPE_ARRAY to the new serializer; honor row validity.
- Add a multi-row fixed-size ARRAY regression test.

Found in codex review of the LIST/STRUCT fix.

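The fixed-size layout described in this commit can be sketched as follows. This is a simplified model, not the PR's code: `ArrayColumn` and `arrayRowToJson` are hypothetical names, and an `int` element type is assumed for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an ARRAY column: no per-row offset/length,
// just a constant run of array_size elements per row in the child vector.
struct ArrayColumn {
    std::size_t array_size;  // elements per row, fixed by the type
    std::vector<bool> valid; // per-row validity
    std::vector<int> child;  // row_count * array_size elements
};

// Emits one row by indexing the child at row_idx * array_size + i.
std::string arrayRowToJson(const ArrayColumn &col, std::size_t row_idx) {
    assert(row_idx < col.valid.size());
    if (!col.valid[row_idx]) {
        return "null";
    }
    std::string out = "[";
    for (std::size_t i = 0; i < col.array_size; ++i) {
        const std::size_t child_idx = row_idx * col.array_size + i;
        if (i) out += ",";
        out += std::to_string(col.child[child_idx]);
    }
    return out + "]";
}
```

Because the row's position is computed from `array_size` alone, no list-entry metadata is read, which is the reason the LIST path is unsuitable for this type.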
UNION columns were routed through the generic struct serializer, which
emitted the tag plus every candidate member for each row, exposing
inactive members instead of the one selected by the row's tag.

- Add convertVectorUnionToJson: read the tag (struct child 0, uint8),
  resolve the active member (struct child tag+1) and emit it as
  {member_name: value}, matching DuckDB's own to_json output.
- Route DUCKDB_TYPE_UNION to the new serializer; honor row validity and
  fail safe on out-of-range tags.
- Add a multi-row UNION regression test.

Found in codex review of the LIST/STRUCT fix.
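A rough model of the tag-based selection, under stated assumptions: `UnionColumn` and `unionRowToJson` are hypothetical names, member values are stored as already-rendered JSON strings, and an out-of-range tag is emitted as `null`. The commit only says the serializer fails safe, so that last choice is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a UNION column. In the physical layout the tag
// is struct child 0 and member k is struct child k + 1; here the members
// are kept in their own vector, so member k is members[k].
struct UnionColumn {
    std::vector<bool> valid;                       // per-row validity
    std::vector<std::uint8_t> tags;                // active member per row
    std::vector<std::string> member_names;         // one name per member
    std::vector<std::vector<std::string>> members; // members[k][row], as JSON text
};

// Emits only the member selected by the row's tag, as {member_name: value}.
std::string unionRowToJson(const UnionColumn &col, std::size_t row_idx) {
    assert(row_idx < col.valid.size());
    if (!col.valid[row_idx]) {
        return "null";
    }
    const std::size_t tag = col.tags[row_idx];
    if (tag >= col.member_names.size()) {
        return "null"; // assumed fail-safe behaviour for a bad tag
    }
    return "{\"" + col.member_names[tag] + "\":" +
           col.members[tag][row_idx] + "}";
}
```

For a union with members `num` and `str`, a row tagged 0 yields `{"num":1}` and a row tagged 1 yields `{"str":"x"}`; inactive members never appear in the output.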
@jrosskopf jrosskopf changed the title fix: Serialize native LIST/STRUCT/ARRAY columns per-row in REST JSON (#89) fix: Serialize native LIST/STRUCT/ARRAY/UNION columns per-row in REST JSON (#89) Jun 25, 2026
@jrosskopf jrosskopf merged commit ab902db into main Jun 25, 2026
21 checks passed
@jrosskopf jrosskopf deleted the fix/gh-89-list-struct-serialization branch June 25, 2026 13:54
Linked issue (closed by this pull request): REST JSON serializer repeats the first element of a LIST(STRUCT) column
