Add optional fieldId and aliases to FieldSpec for column identity #18833

Open
navina wants to merge 2 commits into apache:master from navina:column-mapping

@navina navina commented Jun 22, 2026

What

Adds two optional, nullable metadata fields to FieldSpec (pinot-spi):

| Field | Type | Meaning |
|---|---|---|
| fieldId | Integer | A stable, name-independent identifier for the column. |
| aliases | List<String> | Alternate / historical names for the column. |

Both follow the existing description / tags convention on FieldSpec:

  • fieldId is serialized with @JsonInclude(NON_NULL), aliases with @JsonInclude(NON_EMPTY).
  • Both are mirrored in the manual toJsonObject() serialization and included in equals() / hashCode().
  • They are not part of schema backward-compatibility validation, so adding or changing them doesn't fail a schema update today.
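
The serialization convention described above can be sketched with a small stand-in class. This is only an illustration: `SketchFieldSpec` and `toJsonMap()` are hypothetical names, a plain `Map` stands in for the Jackson `ObjectNode`, and the omission rules are written by hand to mimic what `NON_NULL` and `NON_EMPTY` would do. It is not the code in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for illustration; this is not Pinot's FieldSpec.
class SketchFieldSpec {
  private final String name;
  private Integer fieldId;       // null means "not set"
  private List<String> aliases;  // null or empty means "not set"

  SketchFieldSpec(String name) {
    this.name = name;
  }

  Integer getFieldId() {
    return fieldId;
  }

  void setFieldId(Integer fieldId) {
    this.fieldId = fieldId;
  }

  List<String> getAliases() {
    return aliases;
  }

  void setAliases(List<String> aliases) {
    // Defensive copy so later changes to the caller's list do not leak in.
    this.aliases = aliases == null ? null : new ArrayList<>(aliases);
  }

  // Manual serialization: skip fieldId when null (NON_NULL-style) and
  // aliases when null or empty (NON_EMPTY-style).
  Map<String, Object> toJsonMap() {
    Map<String, Object> json = new LinkedHashMap<>();
    json.put("name", name);
    if (fieldId != null) {
      json.put("fieldId", fieldId);
    }
    if (aliases != null && !aliases.isEmpty()) {
      json.put("aliases", new ArrayList<>(aliases));
    }
    return json;
  }

  // Both new fields take part in equality, as the description states.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SketchFieldSpec)) {
      return false;
    }
    SketchFieldSpec that = (SketchFieldSpec) o;
    return name.equals(that.name)
        && Objects.equals(fieldId, that.fieldId)
        && Objects.equals(aliases, that.aliases);
  }

  @Override
  public int hashCode() {
    return Objects.hash(name, fieldId, aliases);
  }
}
```

A column that never sets either field serializes to a map with only `name`, which is the property the Compatibility section relies on. In this sketch a null alias list and an empty one serialize identically but are not equal under `equals()`; whether the real class normalizes the two is not stated in the description.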

Why

fieldId provides a stable column identity that is decoupled from the column name. Together with aliases, it lays the groundwork for advanced schema evolutions like column rename in Pinot — a rename can preserve the column's identity and record the prior name as an alias, instead of looking like a drop-plus-add.
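
To make the rename idea concrete, here is a minimal sketch of how identity-preserving renames could work. Everything in it is hypothetical: `RenamableColumn`, `renameTo`, and `ColumnResolver` are invented for illustration, and this PR does not implement rename or any lookup by alias.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column model: identity lives in fieldId, not in the name.
class RenamableColumn {
  final int fieldId;
  String name;
  final List<String> aliases = new ArrayList<>();

  RenamableColumn(int fieldId, String name) {
    this.fieldId = fieldId;
    this.name = name;
  }

  // A rename keeps fieldId and records the prior name as an alias.
  void renameTo(String newName) {
    if (newName.equals(name)) {
      return;
    }
    aliases.remove(newName); // the current name should not also be an alias
    if (!aliases.contains(name)) {
      aliases.add(name);
    }
    name = newName;
  }
}

// Hypothetical lookup: current names win, aliases are the fallback.
class ColumnResolver {
  static RenamableColumn resolve(List<RenamableColumn> columns, String requested) {
    for (RenamableColumn c : columns) {
      if (c.name.equals(requested)) {
        return c;
      }
    }
    for (RenamableColumn c : columns) {
      if (c.aliases.contains(requested)) {
        return c;
      }
    }
    return null; // no column answers to this name
  }
}
```

With this shape, renaming `userId` to `user_id` leaves the `fieldId` untouched, and a lookup for the old name still resolves to the same column rather than looking like a dropped column plus a new one.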

Compatibility

  • No impact on existing tables. Both fields default to null/empty and are omitted from JSON, so the serialized schema and wire format are unchanged for any column that doesn't set them.
  • Mixed-version safe. Older components ignore the unknown JSON properties on read; newer components emit nothing for columns without these fields. They are excluded from backward-compatibility checks, so a schema carrying them validates against one that doesn't.
  • No updates to query/ingestion paths or segment metadata today — this change only adds the fields to the schema model.
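
The mixed-version claim can be illustrated with a toy reader. This is a sketch under assumptions: `LegacyReaderSketch` and its key set are hypothetical, and a `Map` stands in for parsed JSON. It is not Pinot's deserializer; it only shows why ignoring unknown properties makes the new fields harmless to an older reader.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical "older" reader that predates fieldId and aliases.
class LegacyReaderSketch {
  // Keys this reader knows about (chosen for illustration only).
  static final Set<String> KNOWN_KEYS = Set.of("name", "dataType");

  // Copies only recognized keys and silently skips everything else.
  static Map<String, Object> read(Map<String, Object> json) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : json.entrySet()) {
      if (KNOWN_KEYS.contains(entry.getKey())) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }
}
```

Reading a document that carries `fieldId` and `aliases` yields the same result as reading one without them, so such a reader sees the same column either way.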

Tests

FieldSpecTest adds coverage for the new getters/setters, JSON round-trip (including omission when unset), and equals/hashCode participation.

@navina navina marked this pull request as ready for review June 22, 2026 21:18

codecov-commenter commented Jun 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.78%. Comparing base (e72b074) to head (adeb373).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18833      +/-   ##
============================================
+ Coverage     64.76%   64.78%   +0.01%     
  Complexity     1322     1322              
============================================
  Files          3393     3393              
  Lines        211025   211044      +19     
  Branches      33136    33140       +4     
============================================
+ Hits         136678   136728      +50     
+ Misses        63332    63295      -37     
- Partials      11015    11021       +6     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (?) |
| java-21 | 64.78% <100.00%> (+0.01%) ⬆️ |
| temurin | 64.78% <100.00%> (+0.01%) ⬆️ |
| unittests | 64.78% <100.00%> (+0.01%) ⬆️ |
| unittests1 | 56.98% <100.00%> (+0.01%) ⬆️ |
| unittests2 | 37.19% <25.00%> (+<0.01%) ⬆️ |

@xiangfu0 xiangfu0 left a comment

Found one high-signal correctness issue; see inline comment.

@@ -585,6 +613,16 @@ public ObjectNode toJsonObject() {
}

This still does not round-trip the new metadata for legacy TIME columns. Schema.toJsonObject() serializes _timeFieldSpec through TimeFieldSpec.toJsonObject(), and that override never calls super.toJsonObject(), so fieldId / aliases are silently dropped for time columns even though this PR adds them generically on FieldSpec. Please thread these fields through TimeFieldSpec.toJsonObject() (and add a schema-level test) before merging.
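
The failure mode described in this comment can be reproduced with a small stand-in hierarchy. The classes below are hypothetical (`PitfallBaseSpec` and `PitfallTimeSpec` are not Pinot classes, and a `Map` replaces `ObjectNode`); they only show how an override that never calls the superclass method drops fields added to the base class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base class that gained a new optional field.
class PitfallBaseSpec {
  String name;
  Integer fieldId;

  Map<String, Object> toJsonMap() {
    Map<String, Object> json = new LinkedHashMap<>();
    json.put("name", name);
    if (fieldId != null) {
      json.put("fieldId", fieldId);
    }
    return json;
  }
}

// Hypothetical legacy subclass with its own serialization layout.
class PitfallTimeSpec extends PitfallBaseSpec {
  String timeUnit;

  @Override
  Map<String, Object> toJsonMap() {
    // Builds the map from scratch and never calls super.toJsonMap(),
    // so fieldId is silently left out of the output.
    Map<String, Object> json = new LinkedHashMap<>();
    json.put("name", name);
    json.put("timeUnit", timeUnit);
    return json;
  }
}
```

Setting `fieldId` on a `PitfallTimeSpec` and serializing it produces a map with no `fieldId` key, so a round trip through this output loses the value without any error.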


@xiangfu0 Unrelated to my change, but it looks like tags, description, and virtualColumnProvider are also not being serialized. Should I fix that in a separate PR?

Extend FieldSpec with stable fieldId and alias metadata for logical-to-physical column mapping, following the description/tags serde pattern and excluding these fields from backward compatibility checks.

Co-authored-by: Cursor <cursoragent@cursor.com>

TimeFieldSpec.toJsonObject() builds its JSON without calling super.toJsonObject(),
so the new fieldId/aliases were silently dropped for legacy TIME columns. Extract
the fieldId/aliases serialization into a protected FieldSpec#appendFieldIdAndAliases
helper and call it from TimeFieldSpec.toJsonObject(). Adds a schema-level round-trip
test.

Addresses review feedback on apache#18833.
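
A rough sketch of the shape of this fix, under assumptions: the commit message names a protected `appendFieldIdAndAliases` helper, but its signature is not shown here, so the one below (taking the JSON container as its only argument) is a guess. The class names are hypothetical and a `Map` stands in for `ObjectNode`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class; the helper signature is an assumption.
class HelperBaseSpec {
  String name;
  Integer fieldId;
  List<String> aliases;

  // Shared by the base serializer and by subclasses with custom layouts.
  protected void appendFieldIdAndAliases(Map<String, Object> json) {
    if (fieldId != null) {
      json.put("fieldId", fieldId);
    }
    if (aliases != null && !aliases.isEmpty()) {
      json.put("aliases", new ArrayList<>(aliases));
    }
  }

  Map<String, Object> toJsonMap() {
    Map<String, Object> json = new LinkedHashMap<>();
    json.put("name", name);
    appendFieldIdAndAliases(json);
    return json;
  }
}

// Hypothetical legacy subclass that keeps its own layout.
class HelperTimeSpec extends HelperBaseSpec {
  String timeUnit;

  @Override
  Map<String, Object> toJsonMap() {
    Map<String, Object> json = new LinkedHashMap<>();
    json.put("name", name);
    json.put("timeUnit", timeUnit);
    // Reuse the shared helper instead of calling super.toJsonMap().
    appendFieldIdAndAliases(json);
    return json;
  }
}
```

Calling the helper rather than `super.toJsonMap()` lets the subclass keep its existing output layout while still picking up the two new fields.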
@navina navina requested a review from xiangfu0 June 24, 2026 22:02