fix: handle non-BMP Unicode codepoints in foldl, foldr, and %c format by JoshRosen · Pull Request #606 · databricks/sjsonnet

JoshRosen · 2026-02-18T08:20:21Z

This PR fixes two more non-BMP Unicode bugs:

foldl/foldr iterated over strings one UTF-16 code unit at a time (for (char <- s.value)), splitting non-BMP characters such as emoji into the two halves of a surrogate pair. The fix uses codePointAt/codePointBefore together with Character.charCount to iterate by codepoint instead.
The %c format conversion used s.toChar.toString, which truncates codepoints above U+FFFF to their low 16 bits. The fix uses Character.toString(s.toInt) instead.
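For illustration, here is a minimal Scala sketch of the codepoint iteration described for foldl/foldr. It is not the code from this PR: the names `CodepointFold`, `foldlCodepoints`, `foldrCodepoints`, and `foldlCodeUnits` are hypothetical, and each callback receives one codepoint as a `String` on the assumption that the fold functions see one-character strings. The left fold steps forward with `codePointAt`, the right fold steps backward with `codePointBefore`, and both advance by `Character.charCount`.

```scala
// Hypothetical helpers (not sjsonnet's actual implementation) showing
// codepoint-correct folds over a String versus the code-unit pattern.
object CodepointFold {

  // Left fold: walk forward with codePointAt and advance by 1 or 2 UTF-16
  // code units, depending on whether the codepoint lies outside the BMP.
  def foldlCodepoints[A](s: String, init: A)(f: (A, String) => A): A = {
    var acc = init
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      acc = f(acc, new String(Character.toChars(cp)))
      i += Character.charCount(cp)
    }
    acc
  }

  // Right fold: walk backward with codePointBefore, so a surrogate pair is
  // read as one codepoint instead of two separate halves.
  def foldrCodepoints[A](s: String, init: A)(f: (String, A) => A): A = {
    var acc = init
    var i = s.length
    while (i > 0) {
      val cp = s.codePointBefore(i)
      acc = f(new String(Character.toChars(cp)), acc)
      i -= Character.charCount(cp)
    }
    acc
  }

  // The old pattern for comparison: iterating a String yields Chars, i.e.
  // UTF-16 code units, so non-BMP characters arrive as two lone surrogates.
  def foldlCodeUnits[A](s: String, init: A)(f: (A, String) => A): A = {
    var acc = init
    for (char <- s) acc = f(acc, char.toString)
    acc
  }
}
```

For a string built as "a" + U+1F600 + "b" (four UTF-16 code units), `foldlCodeUnits` visits four elements, while both codepoint folds visit three and keep the emoji intact.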
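A similarly hedged sketch of the %c change follows. `PercentC`, `formatCharOld`, and `formatCharNew` are hypothetical names, and the argument is assumed to be a `Double` (Jsonnet numbers are doubles); the formatter touched by this PR may receive the value differently. Note that the `Character.toString(int)` overload requires Java 11 or later.

```scala
// Hypothetical helpers (not sjsonnet's actual formatter) contrasting the old
// and new %c conversions for a codepoint held as a Double.
object PercentC {

  // Old pattern: toChar narrows to a 16-bit Char, keeping only the low
  // 16 bits, so U+1F600 becomes U+F600.
  def formatCharOld(n: Double): String = n.toChar.toString

  // New pattern: Character.toString(int) (Java 11+) encodes the full
  // codepoint, producing a surrogate pair for codepoints above U+FFFF.
  def formatCharNew(n: Double): String = Character.toString(n.toInt)
}
```

Both versions agree for BMP codepoints such as 65 ("A"); for 128512 (U+1F600) the old version yields the single char U+F600, while the new one yields the two-code-unit string for the emoji.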

All code written by Claude Opus 4.6.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

JoshRosen wants to merge 1 commit into databricks:master from JoshRosen:fix-unicode-foldl-foldr-format-c
