Add optional fieldId and aliases to FieldSpec for column identity#18833
Add optional fieldId and aliases to FieldSpec for column identity#18833navina wants to merge 2 commits into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18833 +/- ##
============================================
+ Coverage 64.76% 64.78% +0.01%
Complexity 1322 1322
============================================
Files 3393 3393
Lines 211025 211044 +19
Branches 33136 33140 +4
============================================
+ Hits 136678 136728 +50
+ Misses 63332 63295 -37
- Partials 11015 11021 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
xiangfu0
left a comment
There was a problem hiding this comment.
Found one high-signal correctness issue; see inline comment.
| @@ -585,6 +613,16 @@ public ObjectNode toJsonObject() { | |||
| } | |||
There was a problem hiding this comment.
This still does not round-trip the new metadata for legacy TIME columns. Schema.toJsonObject() serializes _timeFieldSpec through TimeFieldSpec.toJsonObject(), and that override never calls super.toJsonObject(), so fieldId / aliases are silently dropped for time columns even though this PR adds them generically on FieldSpec. Please thread these fields through TimeFieldSpec.toJsonObject() (and add a schema-level test) before merging.
There was a problem hiding this comment.
@xiangfu0 unrelated to my change, looks like tags,description and virtualColumnProvider are also not being serialized. Should I fix that in a separate PR?
Extend FieldSpec with stable fieldId and alias metadata for logical-to-physical column mapping, following the description/tags serde pattern and excluding these fields from backward compatibility checks. Co-authored-by: Cursor <cursoragent@cursor.com>
TimeFieldSpec.toJsonObject() builds its JSON without calling super.toJsonObject(), so the new fieldId/aliases were silently dropped for legacy TIME columns. Extract the fieldId/aliases serialization into a protected FieldSpec#appendFieldIdAndAliases helper and call it from TimeFieldSpec.toJsonObject(). Adds a schema-level round-trip test. Addresses review feedback on apache#18833.
What
Adds two optional, nullable metadata fields to
FieldSpec(pinot-spi):fieldIdIntegeraliasesList<String>Both follow the existing
description/tagsconvention onFieldSpec:fieldIdis serialized with@JsonInclude(NON_NULL),aliaseswith@JsonInclude(NON_EMPTY).toJsonObject()serialization and included inequals()/hashCode().Why
fieldIdprovides a stable column identity that is decoupled from the column name. Together withaliases, it lays the groundwork for advanced schema evolutions like column rename in Pinot — a rename can preserve the column's identity and record the prior name as an alias, instead of looking like a drop-plus-add.Compatibility
null/empty and are omitted from JSON, so the serialized schema and wire format are unchanged for any column that doesn't set them.Tests
FieldSpecTestadds coverage for the new getters/setters, JSON round-trip (including omission when unset), andequals/hashCodeparticipation.