
bug: datafusion-spark substring returns wrong result for large negative start positions #21510

@andygrove

Describe the bug

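The datafusion-spark implementation of substring does not match Apache Spark when the start position is negative and its magnitude exceeds the string length. In the validation run below, substring('Spark SQL', -300, 3) returns an empty string, while Spark returns Spa. The other nine rows of the test match.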

This was discovered by running a PySpark validation script against the .slt test files (see #17045, #21508).

To Reproduce

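Run the PySpark validation script. It reports the following mismatch; only the seventh row (start position -300, length 3) differs: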

```
  Line 129: MISMATCH
    SQL: SELECT substring(column1, column2, column3)
FROM VALUES
('Spark SQL'::string, -3::int, 2::int),
('Spark SQL'::string, 3::int, 1::int),
('Spark SQL'::string, 3::int, 700::int),
('Spark SQL'::string, 3::int, -1::int),
('Spark SQL'::string, 3::int, 0::int),
('Spark SQL'::string, 300::int, 3::int),
('Spark SQL'::string, -300::int, 3::int),
(NULL::string, 3::int, 1::int),
('Spark SQL'::string, NULL::int, 1::int),
('Spark SQL'::string, 3::int, NULL::int);
    Translated: SELECT substring(col1, col2, col3)
FROM VALUES
(CAST('Spark SQL' AS STRING), CAST(-3 AS INT), CAST(2 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(1 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(700 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(-1 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(0 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(300 AS INT), CAST(3 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(-300 AS INT), CAST(3 AS INT)),
(CAST(NULL AS STRING), CAST(3 AS INT), CAST(1 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(NULL AS INT), CAST(1 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(NULL AS INT));
  Expected (10 lines):
    SQ
    a
    ark SQL
    (empty)
    (empty)
    (empty)
    Spa
    NULL
    NULL
    NULL
  Actual (10 lines):
    SQ
    a
    ark SQL
    (empty)
    (empty)
    (empty)
    (empty)
    NULL
    NULL
    NULL
```
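To make the difference concrete, the ten rows above can be replayed against two candidate start/length rules. This is a minimal Rust sketch, not the datafusion-spark code: the function names (substring_end_first, substring_clamp_first, resolve_start, slice, apply) are hypothetical, and both rules are inferred from the Expected and Actual columns rather than taken from either codebase. It assumes 1-based positions, negative positions counting back from the end, position 0 behaving like 1, and character (not byte) indexing. The two rules differ only in whether a negative start is clamped to 0 before or after the length is added.

```rust
// Hypothetical sketch: two candidate rules for Spark-style
// substring(str, pos, len), inferred from the Expected/Actual columns.

/// Convert a 1-based SQL position into a 0-based character index.
/// Assumption: pos > 0 counts from the start, pos < 0 counts back from the
/// end, and pos == 0 behaves like pos == 1. The result may be negative.
fn resolve_start(num_chars: i64, pos: i32) -> i64 {
    let pos = pos as i64;
    if pos > 0 {
        pos - 1
    } else if pos < 0 {
        num_chars + pos
    } else {
        0
    }
}

/// Return chars[start..end), clamping `end` to the string length.
/// Callers guarantee start >= 0; an empty or inverted window yields "".
fn slice(chars: &[char], start: i64, end: i64) -> String {
    let end = end.min(chars.len() as i64);
    if start >= end {
        return String::new();
    }
    chars[start as usize..end as usize].iter().collect()
}

/// Rule matching the "Actual" column: the end is computed from the
/// unclamped start, and only then is the start clamped to 0.
fn substring_end_first(s: &str, pos: i32, len: i32) -> String {
    let chars: Vec<char> = s.chars().collect();
    let start = resolve_start(chars.len() as i64, pos);
    let end = start + len as i64; // i64 arithmetic cannot overflow here
    slice(&chars, start.max(0), end)
}

/// Rule matching the "Expected" column: the start is clamped to 0 first,
/// so `len` characters are taken from the beginning of the string.
fn substring_clamp_first(s: &str, pos: i32, len: i32) -> String {
    let chars: Vec<char> = s.chars().collect();
    let start = resolve_start(chars.len() as i64, pos).max(0);
    let end = start + len as i64;
    slice(&chars, start, end)
}

/// A NULL in any argument yields NULL, as in the last three rows.
fn apply(
    rule: fn(&str, i32, i32) -> String,
    s: Option<&str>,
    pos: Option<i32>,
    len: Option<i32>,
) -> Option<String> {
    Some(rule(s?, pos?, len?))
}

fn main() {
    // The ten rows from the query in the validation output.
    let rows: [(Option<&str>, Option<i32>, Option<i32>); 10] = [
        (Some("Spark SQL"), Some(-3), Some(2)),
        (Some("Spark SQL"), Some(3), Some(1)),
        (Some("Spark SQL"), Some(3), Some(700)),
        (Some("Spark SQL"), Some(3), Some(-1)),
        (Some("Spark SQL"), Some(3), Some(0)),
        (Some("Spark SQL"), Some(300), Some(3)),
        (Some("Spark SQL"), Some(-300), Some(3)),
        (None, Some(3), Some(1)),
        (Some("Spark SQL"), None, Some(1)),
        (Some("Spark SQL"), Some(3), None),
    ];
    for (s, pos, len) in rows {
        let end_first = apply(substring_end_first, s, pos, len);
        let clamp_first = apply(substring_clamp_first, s, pos, len);
        // Flag the rows where the two rules disagree.
        let note = if end_first == clamp_first { "" } else { "  <-- differs" };
        println!(
            "{:?}, {:?}, {:?}: end_first={:?} clamp_first={:?}{}",
            s, pos, len, end_first, clamp_first, note
        );
    }
}
```

For these rows, substring_end_first reproduces the Actual column and substring_clamp_first reproduces the Expected column. They disagree only on the (-300, 3) row: the end-first rule computes the window [-291, -288), which lies entirely before the 9-character string, while the clamp-first rule moves the start to 0 and keeps three characters.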

Additional context
