Motivation
Currently, the Avro importer in datacontract-cli throws an error when it encounters a union type with more than one non-null branch, because ODCS has no union/oneOf support. This blocks common Avro schema authoring patterns and prevents round-tripping to and from Avro. Multi-branch union types can be supported losslessly by leveraging customProperties.
Builds on the fix in #1123, which replaced silent data loss with an explicit error.
Proposed Solution
Use customProperties to encode full union branch structure following these principles:
- Use avroUnionBranches (JSON array) on the field to preserve the exact Avro branch order, e.g. ["null", "string", "array", "DetailedPayload", "ErrorPayload"].
- For each complex branch (record, array, map, enum, fixed):
  - Add a sub-property to the union field's properties list.
  - Set the property name to the Avro branch type: the Avro name for named types (record, enum, fixed), or "array" / "map" for anonymous types.
  - Structure matches the corresponding Avro schema structure.
- Primitives (string, int, etc.) need only be referenced in avroUnionBranches; no sub-property is required.
- Preserve union field nullability via required: false.
- Preserve the Avro default value (if any) via the avroDefault customProperty.
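The principles above could be sketched on the import side roughly as follows. This is a minimal illustration, not datacontract-cli's actual API: the names branch_key and import_union_field are hypothetical, the result is a plain dict rather than the real ODCS model classes, nested branch structure is not converted, and storing the default as a JSON-encoded string is an assumption.

```python
import json

# Hypothetical names for illustration; not datacontract-cli's actual API.
NAMED_TYPES = {"record", "enum", "fixed"}
COMPLEX_TYPES = NAMED_TYPES | {"array", "map"}


def branch_key(branch):
    """Label for one union branch as it would appear in avroUnionBranches."""
    if isinstance(branch, str):
        return branch  # primitive, or a reference to a named type
    if branch["type"] in NAMED_TYPES:
        return branch["name"]
    return branch["type"]  # e.g. "array" or "map"


def import_union_field(avro_field):
    """Map an Avro field whose type is a union (a list) to an ODCS-style dict."""
    branches = avro_field["type"]
    keys = [branch_key(b) for b in branches]
    custom = [{"property": "avroUnionBranches", "value": json.dumps(keys)}]
    if "default" in avro_field:
        # Assumption: the default is stored as a JSON-encoded string.
        custom.append(
            {"property": "avroDefault", "value": json.dumps(avro_field["default"])}
        )
    odcs = {
        "name": avro_field["name"],
        "logicalType": "object",
        "physicalType": "union",
        "customProperties": custom,
        # Only complex branches get a sub-property; their nested
        # fields/items are not converted in this sketch.
        "properties": [
            {"name": branch_key(b), "physicalType": b["type"]}
            for b in branches
            if isinstance(b, dict) and b["type"] in COMPLEX_TYPES
        ],
    }
    if "null" in keys:
        odcs["required"] = False  # a null branch makes the field optional
    return odcs
```

A real importer would additionally convert each branch's fields or items into nested ODCS properties instead of recording only the name and physical type.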
Avro Spec Limitation (Uniqueness of Types in Union)
Per the Avro specification:
"Unions may not contain more than one schema with the same type, except for the named types record, fixed and enum."
This means a union can contain at most one array and at most one map (both are anonymous types), but any number of distinctly named records, enums, or fixed types.
- Example: ["null", "array", "array"] is invalid in Avro.
- Example: ["RecordA", "RecordB"] is valid.
This constraint ensures that the design — which uses the Avro type name for named types and just 'array'/'map' for anonymous types — is always unambiguous and round-trippable, since you can't have two 'array' or two 'map' types in a union.
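To make the uniqueness argument concrete, here is a small sketch that derives one key per branch and reports any key that appears more than once. The function names are hypothetical, and namespaces are ignored for simplicity (a fuller version would compare fully qualified names).

```python
from collections import Counter

# Hypothetical helpers for illustration only.
NAMED_TYPES = {"record", "enum", "fixed"}


def branch_key(branch):
    """Key for one union branch: the name for named types, else the type."""
    if isinstance(branch, str):
        return branch
    if branch["type"] in NAMED_TYPES:
        return branch["name"]
    return branch["type"]


def find_ambiguous_branches(branches):
    """Return the sorted branch keys that occur more than once in a union."""
    counts = Counter(branch_key(b) for b in branches)
    return sorted(key for key, n in counts.items() if n > 1)
```

For a union that is valid under the Avro rule quoted above, the returned list is empty, which is what lets each key double as a unique sub-property name.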
Example
Given Avro:
{
  "name": "payload",
  "type": [
    "null",
    "string",
    { "type": "array", "items": "int" },
    { "type": "record", "name": "DetailedPayload", "fields": [...] },
    { "type": "record", "name": "ErrorPayload", "fields": [...] }
  ]
}
ODCS YAML would be:
- name: payload
  logicalType: object
  physicalType: union
  required: false
  customProperties:
    - property: avroUnionBranches
      value: '["null", "string", "array", "DetailedPayload", "ErrorPayload"]'
    - property: avroDefault
      value: "null"
  properties:
    - name: array
      logicalType: array
      physicalType: array
      items: { ... } # modeled from Avro "items" schema
    - name: DetailedPayload
      logicalType: object
      physicalType: record
      properties: [...]
    - name: ErrorPayload
      ... # as above
Roundtrip Guarantee
- The field and all complex branches are modeled natively in ODCS; all Avro information is round-trippable: ODCS → Avro exporter reconstructs the original union, including ordering, structure, nullability, and defaults.
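The export direction could look roughly like the sketch below. It assumes the ODCS field has been parsed into a plain dict, that avroDefault holds a JSON-encoded value, and that the caller supplies a convert_branch callback that turns one ODCS sub-property back into an Avro schema. Both export_union_field and convert_branch are hypothetical names, not existing datacontract-cli functions.

```python
import json


def export_union_field(odcs_field, convert_branch):
    """Rebuild an Avro union field from an ODCS-style dict (illustrative only)."""
    custom = {
        c["property"]: c["value"] for c in odcs_field.get("customProperties", [])
    }
    # avroUnionBranches carries the original branch order.
    order = json.loads(custom["avroUnionBranches"])
    complex_by_name = {p["name"]: p for p in odcs_field.get("properties", [])}

    union = []
    for key in order:
        if key in complex_by_name:
            # Complex branch: rebuild its Avro schema from the sub-property.
            union.append(convert_branch(complex_by_name[key]))
        else:
            # Primitive (or a bare type reference): emitted as-is.
            union.append(key)

    avro_field = {"name": odcs_field["name"], "type": union}
    if "avroDefault" in custom:
        # Assumption: the default was stored as a JSON-encoded string.
        avro_field["default"] = json.loads(custom["avroDefault"])
    return avro_field
```

Because the order comes from avroUnionBranches rather than from the properties list, reordering sub-properties in the contract would not change the exported union.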
Benefits
- Lossless Avro round-trip, enabling powerful schema evolution and interop scenarios.
- No change required in the ODCS core spec — enhancement is contained in import/export logic and customProperties usage.
- Fully supports all Avro types permissible in unions.
- Enables annotating every complex branch's schema fields, supporting ODCS quality rules, descriptions, tags, etc.
References
- Prior issue (completed — added explicit error for multi-branch unions): #1123
Request
- Please consider supporting Avro multi-branch unions via this customProperties approach.
- Willing to contribute the implementation for both importer and exporter.