Summary
FloatingPointDecoders$.decodeIbmSingleBigEndian has a copy-paste bug introduced in
commit #128 (July 2019) and present in every release up to and including 2.10.5.
IBM32_EXPONENT_MASK is set to 0x80000000 — identical to IBM32_SIGN_MASK —
instead of the correct 0x7F000000. The IBM64 decoder next to it has the right mask
(0x7F00000000000000L), confirming this is a copy-paste error.
Affected versions
All versions from 2.x to 2.10.5 (current latest). Tested on 2.9.7 and 2.10.5.
Symptoms
| Input (COMP-1 IBM HFP) | Expected | cobrix decodes as |
|---|---|---|
| 4.5 (bytes 41 48 00 00) | 4.5 | 18.0 (×4) |
| -3.75 (bytes C1 3C 00 00) | -3.75 | 0.0 |
| 2.5 (bytes 41 28 00 00) | 2.5 | 10.0 (×4) |
| 0.0 (bytes 00 00 00 00) | 0.0 | 0.0 ✓ |
| Any negative IBM HFP | negative float | 0.0 |
COMP-2 (8-byte IBM HFP double) is NOT affected — decodeIbmDoubleBigEndian has the correct mask.
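The "Expected" column can be checked independently of cobrix with a small reference decoder. The sketch below is not the library's code: `IbmHfpReference`, `decodeIbmSingle` and `hex` are hypothetical names, and it applies the textbook IBM HFP single-precision layout (1 sign bit, 7-bit excess-64 base-16 exponent, 24-bit fraction), returning a Double for simplicity.

```scala
import java.nio.ByteBuffer

// Hypothetical reference decoder, written only to cross-check expected values.
object IbmHfpReference {
  /** Decodes a 4-byte big-endian IBM HFP single using the textbook formula. */
  def decodeIbmSingle(bytes: Array[Byte]): Double = {
    require(bytes.length == 4, "IBM HFP single precision is exactly 4 bytes")
    val bits     = ByteBuffer.wrap(bytes).getInt              // ByteBuffer is big-endian by default
    val sign     = if ((bits & 0x80000000) != 0) -1.0 else 1.0 // bit 31
    val exponent = (bits & 0x7F000000) >>> 24                  // bits 24..30, excess-64, base 16
    val fraction = (bits & 0x00FFFFFF).toDouble / (1 << 24)    // 24-bit fraction in [0, 1)
    sign * fraction * math.pow(16.0, exponent - 64)
  }

  /** Parses a space-separated hex string such as "41 48 00 00" into bytes. */
  def hex(s: String): Array[Byte] =
    s.split(" ").map(h => Integer.parseInt(h, 16).toByte)
}
```

For the byte sequences in the table, `IbmHfpReference.decodeIbmSingle(IbmHfpReference.hex("41 48 00 00"))` yields 4.5, `C1 3C 00 00` yields -3.75, and `41 28 00 00` yields 2.5.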
Root cause
// FloatingPointDecoders.scala — decodeIbmSingleBigEndian
val IBM32_SIGN_MASK = 0x80000000 // correct
val IBM32_EXPONENT_MASK = 0x80000000 // ← BUG: same as sign mask, should be 0x7F000000
...
var exponent = (mantissa & IBM32_EXPONENT_MASK) >> 22
Because IBM32_EXPONENT_MASK == IBM32_SIGN_MASK:
- Positive IBM HFP (bit 31 = 0): exponent = 0 instead of the real biased exponent (e.g. 65 for value 4.5) → convertedExp is off → result is inflated ×4
- Negative IBM HFP (bit 31 = 1): exponent = 0x80000000 >> 22 = -512 → convertedExp ≈ -383 → falls into the else { 0.0f } branch → always returns 0.0
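The two cases above can be reproduced in isolation. The sketch below evaluates only the mask-and-shift expression quoted from the decoder, not the rest of the conversion. `MaskDemo` and `scaledExponent` are hypothetical names, and the assumption is that the mask is applied to the raw 32-bit word.

```scala
// Hypothetical demo of the mask-and-shift step only; not the library's code.
object MaskDemo {
  val BuggyExponentMask = 0x80000000 // what the decoder currently uses (same as the sign mask)
  val FixedExponentMask = 0x7F000000 // bits 24..30, the actual exponent field

  /** Mirrors `(word & mask) >> 22` from the decoder; `>>` is an arithmetic shift on Int. */
  def scaledExponent(bits: Int, mask: Int): Int = (bits & mask) >> 22
}

// Positive input 4.5 (0x41480000):
//   MaskDemo.scaledExponent(0x41480000, MaskDemo.BuggyExponentMask) == 0
//   MaskDemo.scaledExponent(0x41480000, MaskDemo.FixedExponentMask) == 260
// Negative input -3.75 (0xC13C0000):
//   MaskDemo.scaledExponent(0xC13C0000, MaskDemo.BuggyExponentMask) == -512
//   MaskDemo.scaledExponent(0xC13C0000, MaskDemo.FixedExponentMask) == 260
```

With the corrected mask both inputs give 260, which is 65 × 4. That is consistent with a shift by 22 rather than 24 converting the base-16 exponent into a base-2 one, although that reading of the decoder's intent is an inference from the arithmetic.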
Fix
One-line change:
// Before (wrong):
val IBM32_EXPONENT_MASK = 0x80000000
// After (correct):
val IBM32_EXPONENT_MASK = 0x7F000000
The IBM64 decoder already has the correct pattern for reference:
val IBM64_EXPONENT_MASK = 0x7F00000000000000L // ← correct
Workaround
Until fixed: use .option("floating_point_format", "ieee754") and encode COMP-1
fields as IEEE 754 in the source data. This bypasses the broken decoder. Note this
affects all floating-point fields simultaneously, so COMP-2 source data must also
be re-encoded as IEEE 754.
For genuine mainframe EBCDIC files (where COMP-1 is always IBM HFP), there is no
workaround: positive values decode as 4 times the true value and negative values decode as 0.0.