Run the validation script.
Line 129: MISMATCH
SQL: SELECT substring(column1, column2, column3)
FROM VALUES
('Spark SQL'::string, -3::int, 2::int),
('Spark SQL'::string, 3::int, 1::int),
('Spark SQL'::string, 3::int, 700::int),
('Spark SQL'::string, 3::int, -1::int),
('Spark SQL'::string, 3::int, 0::int),
('Spark SQL'::string, 300::int, 3::int),
('Spark SQL'::string, -300::int, 3::int),
(NULL::string, 3::int, 1::int),
('Spark SQL'::string, NULL::int, 1::int),
('Spark SQL'::string, 3::int, NULL::int);
Translated: SELECT substring(col1, col2, col3)
FROM VALUES
(CAST('Spark SQL' AS STRING), CAST(-3 AS INT), CAST(2 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(1 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(700 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(-1 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(0 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(300 AS INT), CAST(3 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(-300 AS INT), CAST(3 AS INT)),
(CAST(NULL AS STRING), CAST(3 AS INT), CAST(1 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(NULL AS INT), CAST(1 AS INT)),
(CAST('Spark SQL' AS STRING), CAST(3 AS INT), CAST(NULL AS INT));
Expected (10 lines):
SQ
a
ark SQL
(empty)
(empty)
(empty)
Spa
NULL
NULL
NULL
Actual (10 lines):
SQ
a
ark SQL
(empty)
(empty)
(empty)
(empty)
NULL
NULL
NULL
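The two result lists differ in exactly one position. A minimal Python sketch that pairs each input row with its expected and actual value makes this easy to see. It assumes the results are listed in the same order as the input rows, which the output above does not state explicitly. The names `rows`, `expected`, `actual`, and `find_mismatches` are illustrative and not taken from the validation script.

```python
# Input rows copied from the query above: (string, position, length).
rows = [
    ("Spark SQL", -3, 2),
    ("Spark SQL", 3, 1),
    ("Spark SQL", 3, 700),
    ("Spark SQL", 3, -1),
    ("Spark SQL", 3, 0),
    ("Spark SQL", 300, 3),
    ("Spark SQL", -300, 3),
    (None, 3, 1),
    ("Spark SQL", None, 1),
    ("Spark SQL", 3, None),
]

# "(empty)" is written as "" and NULL as None.
expected = ["SQ", "a", "ark SQL", "", "", "", "Spa", None, None, None]
actual = ["SQ", "a", "ark SQL", "", "", "", "", None, None, None]


def find_mismatches(rows, expected, actual):
    """Return (row_number, input_row, expected, actual) for each differing row."""
    return [
        (i, row, exp, act)
        for i, (row, exp, act) in enumerate(zip(rows, expected, actual), start=1)
        if exp != act
    ]


mismatches = find_mismatches(rows, expected, actual)
for number, row, exp, act in mismatches:
    print(f"row {number}: input={row!r} expected={exp!r} actual={act!r}")
```

Under that ordering assumption, the only differing row is row 7, `('Spark SQL', -300, 3)`: the expected value is `Spa` and the actual value is the empty string.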
Describe the bug
The datafusion-spark implementation of substring does not match Apache Spark behavior in some cases.
This was discovered by running a PySpark validation script against the .slt test files (see #17045, #21508).
To Reproduce
Run the validation script.
Additional context
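One reading of the semantics that reproduces the Actual column is sketched below. It is an assumption, not code taken from Spark or from datafusion-spark: positions are 1-based, negative positions count from the end of the string, and an out-of-range negative start is not clamped to the beginning before the length is applied. `spark_like_substring` is a hypothetical name used only for illustration.

```python
from typing import Optional


def spark_like_substring(
    s: Optional[str], pos: Optional[int], length: Optional[int]
) -> Optional[str]:
    """Hypothetical model of the semantics that yield the Actual column."""
    # Any NULL argument gives NULL.
    if s is None or pos is None or length is None:
        return None
    n = len(s)
    # Positive positions are 1-based, negative ones count from the end,
    # and 0 is treated as the start of the string.
    if pos > 0:
        start = pos - 1
    elif pos < 0:
        start = n + pos  # may be negative; deliberately NOT clamped to 0 here
    else:
        start = 0
    end = start + length
    if end <= start or start >= n:
        return ""
    # Only the part of [start, end) that overlaps the string is returned.
    return s[max(start, 0):max(end, 0)]


print(repr(spark_like_substring("Spark SQL", -300, 3)))
```

With this model, `('Spark SQL', -300, 3)` gives a window that lies entirely before the string, so the result is empty. An implementation that first clamps the start to the beginning of the string and then takes `length` characters would return `Spa` for the same input, which is the value shown in the Expected column.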