Support multi-branch Avro union types via customProperties in ODCS #1126

@smtwilio

Motivation

Currently, the Avro importer in datacontract-cli raises an error when it encounters a union type with more than one non-null branch, because ODCS has no union/oneOf construct. This blocks common Avro schema authoring patterns and prevents round-tripping to and from Avro. Multi-branch unions can be supported losslessly by encoding them in customProperties.

Builds on the fix in #1123, which replaced silent data loss with an explicit error.

Proposed Solution

Use customProperties to encode full union branch structure following these principles:

  • Use avroUnionBranches (JSON array) on the field to preserve the exact Avro branch order, e.g. ["null", "string", "array", "DetailedPayload", "ErrorPayload"].
  • For each complex branch (record, array, map, enum, fixed):
    • Add a sub-property to the union field's properties list.
    • Set the property name to the Avro branch type: the Avro name for named types (record, enum, fixed), or "array" / "map" for anonymous types.
    • Model the sub-property's structure on the corresponding Avro schema.
  • Primitives (string, int, etc.) need only be listed in avroUnionBranches; no sub-property is required.
  • Preserve union field nullability via required: false.
  • Preserve Avro default value (if any) via avroDefault customProperty.
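
The import side of these rules could look roughly like the sketch below. It is illustrative only and is not existing datacontract-cli code: the function names import_union_field and branch_name are hypothetical, the avroDefault value is assumed to be stored as JSON text, and required: true for unions without a "null" branch is an assumption as well. Conversion of each branch's nested structure is left out.

```python
import json

NAMED_TYPES = {"record", "enum", "fixed"}   # keyed by their Avro name
ANONYMOUS_TYPES = {"array", "map"}          # keyed by "array" / "map"


def branch_name(branch):
    """Return the label a union branch gets in avroUnionBranches."""
    if isinstance(branch, str):
        return branch                # primitive such as "null" or "string"
    if branch["type"] in NAMED_TYPES:
        return branch["name"]        # e.g. "DetailedPayload"
    return branch["type"]            # "array" or "map"


def import_union_field(avro_field):
    """Hypothetical helper: map an Avro union field to an ODCS-style dict."""
    branches = avro_field["type"]
    names = [branch_name(b) for b in branches]
    prop = {
        "name": avro_field["name"],
        "logicalType": "object",
        "physicalType": "union",
        # Assumption: the field is optional exactly when "null" is a branch.
        "required": "null" not in names,
        "customProperties": [
            {"property": "avroUnionBranches", "value": json.dumps(names)},
        ],
        "properties": [],
    }
    if "default" in avro_field:
        # Assumption: the default is kept as JSON text so it can be parsed back.
        prop["customProperties"].append(
            {"property": "avroDefault", "value": json.dumps(avro_field["default"])}
        )
    for branch, name in zip(branches, names):
        if isinstance(branch, dict) and branch["type"] in NAMED_TYPES | ANONYMOUS_TYPES:
            # Only the name and physical type are set here; a real importer
            # would also convert the branch's nested schema.
            prop["properties"].append({"name": name, "physicalType": branch["type"]})
    return prop
```

Primitive branches appear only in the avroUnionBranches list, while each complex branch additionally gets one entry in properties, in the same order as in the Avro union.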

Avro Spec Limitation (Uniqueness of Types in Union)

Per the Avro specification:

"Unions may not contain more than one schema with the same type, except for the named types record, fixed and enum."

This means there can be at most one anonymous array or map in a union, but multiple named records, enums, or fixed types are allowed.

  • Example: ["null", "array", "array"] is invalid in Avro.
  • Example: ["RecordA", "RecordB"] is valid.

This constraint keeps the design unambiguous and round-trippable: named types are keyed by their Avro name and anonymous types by 'array' or 'map', and a valid union can never contain two 'array' or two 'map' branches.
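
As a rough illustration of why the branch keys stay unique, the sketch below derives one key per branch and rejects duplicates. The names union_branch_keys and check_unambiguous are hypothetical, and the sketch uses each named type's "name" attribute as written, ignoring namespaces, which a real implementation would likely need to account for.

```python
from collections import Counter

NAMED_TYPES = {"record", "enum", "fixed"}


def union_branch_keys(branches):
    """Derive the key used for each union branch, in branch order."""
    keys = []
    for branch in branches:
        if isinstance(branch, str):
            keys.append(branch)            # primitive, e.g. "null"
        elif branch["type"] in NAMED_TYPES:
            keys.append(branch["name"])    # named type: use its Avro name
        else:
            keys.append(branch["type"])    # anonymous type: "array" or "map"
    return keys


def check_unambiguous(branches):
    """Return the branch keys, or raise ValueError if any key repeats."""
    keys = union_branch_keys(branches)
    duplicates = sorted(k for k, count in Counter(keys).items() if count > 1)
    if duplicates:
        raise ValueError(f"ambiguous union branches: {duplicates}")
    return keys
```

A union with two anonymous arrays fails this check, matching the Avro rule quoted above, while a union of two differently named records passes.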

Example

Given Avro:

{
  "name": "payload",
  "type": [
    "null",
    "string",
    { "type": "array", "items": "int" },
    { "type": "record", "name": "DetailedPayload", "fields": [...] },
    { "type": "record", "name": "ErrorPayload", "fields": [...] }
  ]
}

ODCS YAML would be:

- name: payload
  logicalType: object
  physicalType: union
  required: false
  customProperties:
    - property: avroUnionBranches
      value: '["null", "string", "array", "DetailedPayload", "ErrorPayload"]'
    - property: avroDefault
      value: "null"
  properties:
    - name: array
      logicalType: array
      physicalType: array
      items: { ... }  # modeled from Avro "items" schema
    - name: DetailedPayload
      logicalType: object
      physicalType: record
      properties: [...]
    - name: ErrorPayload
      ...  # as above

Roundtrip Guarantee

  • The field and all complex branches are modeled natively in ODCS, so no Avro information is lost: the ODCS → Avro exporter can reconstruct the original union, including branch order, structure, nullability, and defaults.
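
The export direction might be sketched as follows. This is not the actual exporter: export_union_field is a hypothetical name, convert_branch stands in for whatever logic turns an ODCS sub-property back into an Avro schema, and avroDefault is assumed to hold JSON text.

```python
import json


def export_union_field(odcs_prop, convert_branch):
    """Hypothetical helper: rebuild an Avro union field from an ODCS-style dict.

    convert_branch is a caller-supplied function (an assumption of this
    sketch) that converts one ODCS sub-property into an Avro schema.
    """
    custom = {c["property"]: c["value"] for c in odcs_prop.get("customProperties", [])}
    order = json.loads(custom["avroUnionBranches"])   # original branch order
    complex_by_name = {p["name"]: p for p in odcs_prop.get("properties", [])}

    branches = []
    for name in order:
        if name in complex_by_name:
            branches.append(convert_branch(complex_by_name[name]))
        else:
            branches.append(name)                     # primitive branch

    field = {"name": odcs_prop["name"], "type": branches}
    if "avroDefault" in custom:
        # Assumption: the default was stored as JSON text on import.
        field["default"] = json.loads(custom["avroDefault"])
    return field
```

Because the branch order comes from avroUnionBranches rather than from the order of the properties list, the reconstructed union keeps the original ordering even if sub-properties are rearranged in the contract.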

Benefits

  • Lossless Avro round-trip, enabling powerful schema evolution and interop scenarios.
  • No change required in the ODCS core spec — enhancement is contained in import/export logic and customProperties usage.
  • Fully supports all Avro types permissible in unions.
  • Enables annotating every complex branch's schema fields, supporting ODCS quality rules, descriptions, tags, etc.

References

  • Prior issue (completed — added explicit error for multi-branch unions): #1123

Request

  • Please consider supporting Avro multi-branch unions via this customProperties approach.
  • Willing to contribute the implementation for both importer and exporter.

Labels: requires-specification-change (The Data Contract Specification or ODCS must be changed before this feature can be implemented)