
Bug: COMP-1 (IBM HFP single precision) decodes incorrectly — IBM32_EXPONENT_MASK wrong (positive → ×4, negative → 0.0) #853

@vishalshirsath03


Summary

FloatingPointDecoders$.decodeIbmSingleBigEndian has a copy-paste bug introduced in
commit #128 (July 2019) and present in every release up to and including 2.10.5.

IBM32_EXPONENT_MASK is set to 0x80000000, which is identical to IBM32_SIGN_MASK,
instead of the correct 0x7F000000. The IBM64 decoder next to it uses the right mask
(0x7F00000000000000L), confirming this is a copy-paste error.
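For reference, here is the standard IBM HFP single-precision layout and value formula, with the 4.5 example from the Symptoms section worked through. The sign bit is selected by 0x80000000, the 7-bit excess-64 base-16 exponent by 0x7F000000 and the 24-bit fraction by 0x00FFFFFF. The sign and exponent therefore live in disjoint bits, which is why the two masks cannot be equal.

```math
\begin{aligned}
&\text{word} = \underbrace{s}_{\text{bit }31}\ \underbrace{e}_{\text{bits }30..24}\ \underbrace{f}_{\text{bits }23..0},
\qquad v = (-1)^{s}\,\frac{f}{2^{24}}\,16^{\,e-64}\\
&\texttt{41 48 00 00}:\quad s = 0,\quad e = \texttt{0x41} = 65,\quad \frac{f}{2^{24}} = \frac{\texttt{0x480000}}{2^{24}} = 0.28125\\
&v = 0.28125 \cdot 16^{65-64} = 4.5
\end{aligned}
```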

Affected versions

All versions from 2.x to 2.10.5 (current latest). Tested on 2.9.7 and 2.10.5.

Symptoms

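| Input (COMP-1 IBM HFP)      | Expected                     | cobrix decodes as |
|-----------------------------|------------------------------|-------------------|
| 4.5 (bytes 41 48 00 00)     | 4.5                          | 18.0 (×4)         |
| -3.75 (bytes C1 3C 00 00)   | -3.75                        | 0.0               |
| 2.5 (bytes 41 28 00 00)     | 2.5                          | 10.0 (×4)         |
| 0.0 (bytes 00 00 00 00)     | 0.0                          | 0.0               |
| Any negative IBM HFP value  | The corresponding negative float | 0.0           |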
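The "Expected" column can be reproduced independently of cobrix. The following is a minimal reference decoder that applies the IBM HFP definition directly in double arithmetic. The names `IbmHfpReference` and `decode` are hypothetical and not part of cobrix. It is meant only for checking values, not as a replacement decoder.

```scala
object IbmHfpReference {
  // Field masks for a 32-bit IBM HFP (COMP-1) word.
  val SignMask: Int     = 0x80000000
  val ExponentMask: Int = 0x7F000000
  val FractionMask: Int = 0x00FFFFFF

  /** Decodes 4 big-endian IBM HFP bytes with the textbook formula
    * (-1)^sign * (fraction / 2^24) * 16^(exponent - 64). */
  def decode(bytes: Array[Byte]): Double = {
    require(bytes.length == 4, "COMP-1 fields are 4 bytes")
    val word = java.nio.ByteBuffer.wrap(bytes).getInt // ByteBuffer is big-endian by default
    val sign = if ((word & SignMask) != 0) -1.0 else 1.0
    val exponent = (word & ExponentMask) >>> 24 // excess-64, base 16
    val fraction = (word & FractionMask).toDouble / (1 << 24)
    sign * fraction * math.pow(16.0, (exponent - 64).toDouble)
  }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      Array(0x41, 0x48, 0x00, 0x00),
      Array(0xC1, 0x3C, 0x00, 0x00),
      Array(0x41, 0x28, 0x00, 0x00),
      Array(0x00, 0x00, 0x00, 0x00)
    ).map(_.map(_.toByte))
    // prints 4.5, -3.75, 2.5 and 0.0, one per line
    samples.foreach(b => println(decode(b)))
  }
}
```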

COMP-2 (8-byte IBM HFP double) is NOT affected: decodeIbmDoubleBigEndian has the correct mask.

Root cause

```scala
// FloatingPointDecoders.scala: decodeIbmSingleBigEndian
val IBM32_SIGN_MASK     = 0x80000000   // correct
val IBM32_EXPONENT_MASK = 0x80000000   // ← BUG: same as sign mask, should be 0x7F000000
...
var exponent = (mantissa & IBM32_EXPONENT_MASK) >> 22
```

Because IBM32_EXPONENT_MASK == IBM32_SIGN_MASK:

  • Positive IBM HFP (bit 31 = 0): exponent = 0 instead of 260 for the value 4.5 (biased exponent 0x41 = 65, scaled by 4 because of the >> 22 shift) → convertedExp is off → result is inflated ×4
  • Negative IBM HFP (bit 31 = 1): exponent = 0x80000000 >> 22 = -512 → convertedExp ≈ -383 → falls into the else { 0.0f } branch → always returns 0.0
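The exponent values in the two bullets can be checked in isolation. This sketch reproduces only the quoted line `(mantissa & IBM32_EXPONENT_MASK) >> 22` with both mask values. It does not model the rest of decodeIbmSingleBigEndian, so convertedExp is not computed here. The object and helper names are hypothetical.

```scala
object ExponentMaskCheck {
  val BuggyMask: Int   = 0x80000000 // current value of IBM32_EXPONENT_MASK
  val CorrectMask: Int = 0x7F000000

  // Mirrors `(mantissa & IBM32_EXPONENT_MASK) >> 22`. Shifting by 22 rather than 24
  // yields 4 x the biased hex exponent. `>>` is arithmetic, so a set bit 31 stays negative.
  def exponentWith(mask: Int, word: Int): Int = (word & mask) >> 22

  def main(args: Array[String]): Unit = {
    val positive = 0x41480000 // IBM HFP 4.5
    val negative = 0xC13C0000 // IBM HFP -3.75
    println(s"buggy:   ${exponentWith(BuggyMask, positive)} ${exponentWith(BuggyMask, negative)}")     // buggy:   0 -512
    println(s"correct: ${exponentWith(CorrectMask, positive)} ${exponentWith(CorrectMask, negative)}") // correct: 260 260
  }
}
```

With the correct mask, positive and negative values with the same magnitude exponent yield the same result (260), because the sign bit is no longer captured.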

Fix

One-line change:

```scala
// Before (wrong):
val IBM32_EXPONENT_MASK = 0x80000000

// After (correct):
val IBM32_EXPONENT_MASK = 0x7F000000
```
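The report does not show the rest of decodeIbmSingleBigEndian. The following is therefore only a self-contained sketch of a bit-level IBM HFP to IEEE 754 single conversion with the corrected mask. It is modelled on the normalise-then-rebias approach of the open-source ibm2ieee project. The `>> 22` shift quoted above suggests cobrix uses a similar scheme, but that is an assumption. `FixedIbmSingle`, `BitCountMagic` and the `require` on out-of-range exponents are illustrative only. Subnormal and overflowing results are deliberately not handled.

```scala
import java.nio.ByteBuffer

object FixedIbmSingle {
  // Field masks for a 32-bit IBM HFP word, with the exponent mask corrected.
  val IBM32_SIGN_MASK: Int      = 0x80000000
  val IBM32_EXPONENT_MASK: Int  = 0x7F000000
  val IBM32_FRACTION_MASK: Int  = 0x00FFFFFF
  val IBM32_TOP_DIGIT_MASK: Int = 0x00F00000
  // Packed 2-bit table: entry d holds the leading-zero bit count of hex digit d (d = 1..15).
  val BitCountMagic: Int = 0x000055AF

  /** Converts 4 big-endian IBM HFP bytes to an IEEE 754 Float (normal range only). */
  def decode(bytes: Array[Byte]): Float = {
    val word = ByteBuffer.wrap(bytes).getInt // big-endian by default
    val sign = word & IBM32_SIGN_MASK
    val rawFraction = word & IBM32_FRACTION_MASK
    if (rawFraction == 0) {
      java.lang.Float.intBitsToFloat(sign) // +0.0f or -0.0f
    } else {
      var fraction = rawFraction
      var exponent = (word & IBM32_EXPONENT_MASK) >> 22 // 4 x biased hex exponent
      // Normalise: shift whole hex digits until the top digit of the fraction is non-zero.
      var topDigit = fraction & IBM32_TOP_DIGIT_MASK
      while (topDigit == 0) {
        fraction <<= 4
        exponent -= 4
        topDigit = fraction & IBM32_TOP_DIGIT_MASK
      }
      // Shift out the remaining 0-3 leading zero bits so that bit 23 is set.
      val leadingZeros = (BitCountMagic >> (topDigit >> 19)) & 3
      fraction <<= leadingZeros
      // Rebias. Bit 23 of `fraction` adds 1 to the exponent field when summed below.
      val ieeeExponent = exponent - 131 - leadingZeros
      require(ieeeExponent >= 0 && ieeeExponent < 254,
        s"IEEE exponent $ieeeExponent is outside the normal Float range (not handled here)")
      java.lang.Float.intBitsToFloat(sign + (ieeeExponent << 23) + fraction)
    }
  }
}
```

For example, `FixedIbmSingle.decode(Array(0x41, 0x48, 0x00, 0x00).map(_.toByte))` gives 4.5f. The unnormalised input 42 01 00 00 exercises the hex-digit normalisation loop and gives 1.0f.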

The IBM64 decoder already has the correct pattern for reference:

```scala
val IBM64_EXPONENT_MASK = 0x7F00000000000000L  // ← correct
```
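IBM HFP single and double precision place the sign and the 7-bit exponent in the same first byte. The upper 32 bits of the IBM64 mask should therefore equal the corrected IBM32 mask. This small check, with hypothetical names, confirms that relationship.

```scala
object MaskConsistency {
  val IBM64_EXPONENT_MASK: Long      = 0x7F00000000000000L
  val IBM32_EXPONENT_MASK_FIXED: Int = 0x7F000000
  val IBM32_EXPONENT_MASK_BUGGY: Int = 0x80000000

  // Upper 32 bits of the 64-bit mask, i.e. the bits covering the first 4 bytes.
  def high32: Int = (IBM64_EXPONENT_MASK >>> 32).toInt

  def main(args: Array[String]): Unit = {
    println(high32 == IBM32_EXPONENT_MASK_FIXED) // true
    println(high32 == IBM32_EXPONENT_MASK_BUGGY) // false
  }
}
```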

Workaround

Until fixed: use .option("floating_point_format", "ieee754") and encode COMP-1
fields as IEEE 754 in the source data. This bypasses the broken decoder. Note this
affects all floating-point fields simultaneously, so COMP-2 source data must also
be re-encoded as IEEE 754.
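If the system producing the source data runs on the JVM, the re-encoding this workaround requires could look like the following sketch. COMP-1 values are written as 4-byte big-endian IEEE 754 singles and COMP-2 values as 8-byte big-endian doubles. The object and method names are hypothetical, and the sketch assumes the record layout is otherwise unchanged. Locating each field inside a record is left out.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object IeeeFieldEncoder {
  // COMP-1 replacement: 4-byte big-endian IEEE 754 single precision.
  def comp1(value: Float): Array[Byte] =
    ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putFloat(value).array()

  // COMP-2 replacement: 8-byte big-endian IEEE 754 double precision.
  def comp2(value: Double): Array[Byte] =
    ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN).putDouble(value).array()

  def hex(bytes: Array[Byte]): String = bytes.map(b => f"${b & 0xFF}%02X").mkString(" ")

  def main(args: Array[String]): Unit = {
    println(hex(comp1(4.5f)))  // 40 90 00 00 (the IBM HFP encoding would be 41 48 00 00)
    println(hex(comp2(-3.75))) // C0 0E 00 00 00 00 00 00
  }
}
```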

For genuine mainframe EBCDIC files (where COMP-1 is always IBM HFP), there is no
workaround — the field will decode as ×4 for positive values and 0.0 for negatives.
