Skip to content

Add ARRAY_TO_CSV helper function #4587#5404

Open
ishag4 wants to merge 3 commits into
opensearch-project:mainfrom
ishag4:issue-4587
Open

Add ARRAY_TO_CSV helper function #4587#5404
ishag4 wants to merge 3 commits into
opensearch-project:mainfrom
ishag4:issue-4587

Conversation

@ishag4
Copy link
Copy Markdown

@ishag4 ishag4 commented May 4, 2026

Description

This PR implements the ARRAY_TO_CSV function for OpenSearch SQL, which converts arrays to comma-separated value strings with optional custom delimiters.

Function Signature

// Single parameter - uses default comma delimiter
ARRAY_TO_CSV(array) -> string

// Two parameters - uses custom delimiter
ARRAY_TO_CSV(array, delimiter) -> string

Usage Examples

// Basic usage with default comma delimiter
source=people | eval result = array_to_csv(array('apple', 'banana', 'cherry'))
// Returns: "apple,banana,cherry"

// Custom delimiter
source=people | eval result = array_to_csv(array('apple', 'banana', 'cherry'), '|')
// Returns: "apple|banana|cherry"

// Mixed data types
source=people | eval result = array_to_csv(array('item', 123, 45.67, true))
// Returns: "item,123,45.67,true"

// Multi-character delimiter
source=people | eval result = array_to_csv(array('a', 'b', 'c'), ' -> ')
// Returns: "a -> b -> c"

Related Issues

Resolves #4587

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Isha Gupta <igupta24@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit 67bc7e0)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Null element handling

When an array element is null, the function appends nothing to the result (lines 102-104). This creates an empty position between delimiters (e.g., "a,,c" for array('a', null, 'c')). While this behavior is documented and tested, it may be unexpected for users who might expect "null" as a string or the element to be skipped entirely. Consider if this is the intended behavior or if explicit "null" string representation would be clearer.

Object element = array.get(i);
if (element != null) {
  result.append(element.toString());
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

PR Code Suggestions ✨

Latest suggestions up to 67bc7e0

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle CSV special characters properly

The current implementation doesn't handle special characters in array elements that
could break CSV format. Elements containing the delimiter, quotes, or newlines
should be properly escaped or quoted to maintain valid CSV structure.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayToCsvFunctionImpl.java [83-108]

 public static String arrayToCsv(List<Object> array, String delimiter) {
   if (array == null) {
     return null;
   }
 
   if (delimiter == null) {
     delimiter = ",";
   }
 
   if (array.isEmpty()) {
     return "";
   }
 
   StringBuilder result = new StringBuilder();
   for (int i = 0; i < array.size(); i++) {
     if (i > 0) {
       result.append(delimiter);
     }
     Object element = array.get(i);
     if (element != null) {
-      result.append(element.toString());
+      String value = element.toString();
+      // Escape values containing delimiter, quotes, or newlines
+      if (value.contains(delimiter) || value.contains("\"") || value.contains("\n")) {
+        value = "\"" + value.replace("\"", "\"\"") + "\"";
+      }
+      result.append(value);
     }
   }
 
   return result.toString();
 }
Suggestion importance[1-10]: 4

__

Why: While the suggestion correctly identifies that special characters aren't handled, the implementation may not align with the intended behavior of array_to_csv. The function appears designed for simple string joining rather than RFC 4180 CSV compliance. The lack of tests for special character handling suggests this may be intentional. The suggestion is valid but may not be necessary depending on requirements.

Low

Previous suggestions

Suggestions up to commit 9193758
CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle CSV special characters properly

The current implementation doesn't handle special characters in CSV values. Elements
containing the delimiter, quotes, or newlines should be escaped or quoted according
to CSV standards to prevent parsing errors.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayToCsvFunctionImpl.java [85-110]

 public static String arrayToCsv(List<Object> array, String delimiter) {
     if (array == null) {
       return null;
     }
 
     if (delimiter == null) {
       delimiter = ",";
     }
 
     if (array.isEmpty()) {
       return "";
     }
 
     StringBuilder result = new StringBuilder();
     for (int i = 0; i < array.size(); i++) {
       if (i > 0) {
         result.append(delimiter);
       }
       Object element = array.get(i);
       if (element != null) {
-        result.append(element.toString());
+        String value = element.toString();
+        // Quote value if it contains delimiter, quotes, or newlines
+        if (value.contains(delimiter) || value.contains("\"") || value.contains("\n")) {
+          value = "\"" + value.replace("\"", "\"\"") + "\"";
+        }
+        result.append(value);
       }
     }
 
     return result.toString();
   }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the current implementation doesn't handle CSV special characters like delimiters, quotes, or newlines within values. However, the implementation choice depends on the intended use case - whether this function should produce RFC 4180 compliant CSV or a simple delimiter-joined string. The suggestion assumes CSV compliance is needed, which may or may not be the requirement.

Medium
Suggestions up to commit baa3553
CategorySuggestion                                                                                                                                    Impact
General
Use explicit null policy for consistent behavior

Using NullPolicy.ARG0 means the function will return null if the first argument (the
array) is null, which is the intended behavior. However, when a two-argument form is
used and the second argument (delimiter) is null, the NullPolicy.ARG0 policy will
NOT short-circuit on a null delimiter — the arrayToCsv method handles this case by
defaulting to ",". This is correct, but it should be verified that NullPolicy.ARG0
does not also short-circuit on ARG1 being null in the two-argument case, potentially
bypassing the null-delimiter fallback logic. Consider using NullPolicy.NONE and
handling all null checks explicitly in arrayToCsv to ensure consistent behavior.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayToCsvFunctionImpl.java [30-32]

 public ArrayToCsvFunctionImpl() {
-  super(new ArrayToCsvImplementor(), NullPolicy.ARG0);
+  super(new ArrayToCsvImplementor(), NullPolicy.NONE);
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about NullPolicy.ARG0 potentially short-circuiting on the first argument while the second argument's null handling is done manually in arrayToCsv. Switching to NullPolicy.NONE ensures all null checks are handled explicitly in the method, providing more predictable behavior. This is a moderate correctness/consistency concern.

Low
Clarify null element handling in loop

The current implementation skips null elements entirely (no output for null), but
the tests expect null elements to produce empty strings between delimiters (e.g.,
"GET,,WRITE"). The if (element != null) block should use an else branch to append an
empty string, or simply call result.append(element != null ? element.toString() :
""). The current code actually does produce the correct output since the delimiter
is appended regardless, but the logic is misleading — the if (element != null) block
effectively appends nothing for null, which is correct. However, to make the intent
explicit and avoid future bugs, use a ternary expression.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayToCsvFunctionImpl.java [99-106]

 public static String arrayToCsv(List<Object> array, String delimiter) {
   if (array == null) {
     return null;
   }
 
   if (delimiter == null) {
     delimiter = ",";
   }
 
   if (array.isEmpty()) {
     return "";
   }
 
   StringBuilder result = new StringBuilder();
   for (int i = 0; i < array.size(); i++) {
     if (i > 0) {
       result.append(delimiter);
     }
     Object element = array.get(i);
-    if (element != null) {
-      result.append(element.toString());
-    }
+    result.append(element != null ? element.toString() : "");
   }
 
   return result.toString();
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that the current code produces the right output but the intent is unclear. Replacing the if (element != null) block with a ternary expression makes the null-handling intent explicit and reduces the chance of future bugs. This is a minor readability/maintainability improvement.

Low
Suggestions up to commit 1a85890
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix null policy to cover all arguments

The class uses NullPolicy.ARG0, which means the implementor is only called when the
first argument (the array) is non-null. However, the second argument (delimiter)
could still be null at runtime, and the arrayToCsv method handles that by defaulting
to ",". This is fine, but the NotNullImplementor contract assumes non-null inputs —
if the delimiter expression evaluates to null at the Calcite expression level, it
may not be passed correctly. Consider using NullPolicy.ANY or NullPolicy.NONE to
ensure consistent null handling for both arguments.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayToCsvFunctionImpl.java [30-32]

-public static class ArrayToCsvImplementor implements NotNullImplementor {
-  @Override
-  public Expression implement(
-      RexToLixTranslator translator, RexCall call, List<Expression> translatedOperands) {
-    // Handle both 1-argument (with default delimiter) and 2-argument cases
-    if (translatedOperands.size() == 1) {
-      ...
-    } else if (translatedOperands.size() == 2) {
-      ...
-    } else {
-      throw new IllegalArgumentException(
-          "ARRAY_TO_CSV expects 1 or 2 arguments, got " + translatedOperands.size());
-    }
-  }
+public ArrayToCsvFunctionImpl() {
+  super(new ArrayToCsvImplementor(), NullPolicy.ANY);
 }
Suggestion importance[1-10]: 5

__

Why: Using NullPolicy.ARG0 means only the first argument's nullability is checked, but the second argument (delimiter) could be null at the Calcite expression level. Changing to NullPolicy.ANY would ensure consistent null handling for both arguments, though the arrayToCsv method already handles null delimiters gracefully.

Low
General
Use safer null-tolerant string conversion

When an element is null, the delimiter is still appended but no value is added,
which produces correct output (e.g., "a,,b"). However, the leading-null case (e.g.,
[null, "a"]) produces ",a" — a leading delimiter — which may be unexpected. The
current behavior is tested and accepted, but consider using String.join with
null-to-empty-string mapping via streams for clarity and correctness consistency.
More critically, using String.valueOf(element) instead of element.toString() avoids
a potential NullPointerException if the null check is ever removed or refactored.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayToCsvFunctionImpl.java [99-107]

 for (int i = 0; i < array.size(); i++) {
   if (i > 0) {
     result.append(delimiter);
   }
   Object element = array.get(i);
   if (element != null) {
-    result.append(element.toString());
+    result.append(String.valueOf(element));
   }
 }
Suggestion importance[1-10]: 2

__

Why: The change from element.toString() to String.valueOf(element) is functionally equivalent here since there's already a null check (if (element != null)) guarding the call. The suggestion provides minimal safety improvement since the null check already prevents NPE.

Low

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit baa3553

@ishag4
Copy link
Copy Markdown
Author

ishag4 commented May 8, 2026

Hi @LantaoJin @penghuo @RyanL1997 @Swiddis Could you please review?

Copy link
Copy Markdown
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some integration tests in CalciteArrayFunctionIT?

Copy link
Copy Markdown
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add IT, Docs, PR descriptioni.

Signed-off-by: Isha Gupta <igupta24@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 9193758

Signed-off-by: Isha Gupta <igupta24@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 67bc7e0

@ishag4
Copy link
Copy Markdown
Author

ishag4 commented May 16, 2026

Hi @LantaoJin @penghuo Could you please re-review?

Copy link
Copy Markdown
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mvjoin function (added in 3.3.0, docs) already provides this functionality.
#4587 (comment)

I would suggest close this PR and use mvjoin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add ARRAY_TO_CSV helper function

3 participants