[Resources] Support BCP 47 script code qualifiers in resource folders #5582

Open
2KAbhishek wants to merge 14 commits into JetBrains:master from 2KAbhishek:fix/bcp47-locale-qualifiers

Conversation

@2KAbhishek

@2KAbhishek 2KAbhishek commented Apr 20, 2026

Resource folder parsing rejected valid Android BCP 47 folder names like values-b+sr+Latn (script code), values-b+zh+Hans+CN (multi-segment), and values-b+es+419 (numeric region) with "unknown qualifier" errors when placed under commonMain/composeResources.
The Gradle plugin split folder names on - without understanding the b+lang+code segment format, and qualifier validation only accepted 2-letter region codes.

This PR adds support for BCP 47 script codes (ISO 15924) and numeric region codes in resource folder qualifiers, and propagates the script through both the suspend (getString()) and Composable (stringResource()) lookup paths.
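
As a rough illustration of the folder-name handling described above, here is a minimal Kotlin sketch. It is not the code in this PR: `LocaleQualifiers`, `parseBcpSegment`, and `parseFolderLocale` are hypothetical names, and the regexes assume 2-3 letter language codes, 4-letter ISO 15924 script codes, and either a 2-letter or a 3-digit region code.

```kotlin
// Hypothetical result type for illustration; not the PR's actual data model.
data class LocaleQualifiers(
    val language: String,
    val script: String? = null,
    val region: String? = null,
)

private val languageRegex = Regex("[a-z]{2,3}")
private val scriptRegex = Regex("[A-Z][a-z]{3}")      // ISO 15924, e.g. Latn
private val regionRegex = Regex("[A-Z]{2}|[0-9]{3}")  // 2-letter or 3-digit numeric region

/** Parses one "b+lang[+Script][+REGION]" segment; returns null if it is malformed. */
fun parseBcpSegment(segment: String): LocaleQualifiers? {
    if (!segment.startsWith("b+")) return null
    val subtags = segment.removePrefix("b+").split('+')
    val language = subtags.first()
    if (!languageRegex.matches(language)) return null
    var script: String? = null
    var region: String? = null
    for (subtag in subtags.drop(1)) {
        when {
            // A script is only accepted before any region has been seen.
            script == null && region == null && scriptRegex.matches(subtag) -> script = subtag
            region == null && regionRegex.matches(subtag) -> region = subtag
            else -> return null // unknown, duplicated, or out-of-order subtag
        }
    }
    return LocaleQualifiers(language, script, region)
}

/** Splits the folder name on '-' and parses the first qualifier as a b+ segment. */
fun parseFolderLocale(folderName: String): LocaleQualifiers? =
    folderName.split('-').drop(1).firstOrNull()?.let(::parseBcpSegment)
```

With these assumptions, `parseFolderLocale("values-b+zh+Hans+CN")` yields language `zh`, script `Hans`, region `CN`, while an out-of-order name such as `values-b+sr+RS+Latn` yields `null`.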

Fixes CMP-4449
Fixes #4449

Testing

  • testGetPathByEnvironmentWithScript in ResourceTest covers script matching (sr-Latn, sr-Cyrl, zh-Hans, zh-Hant), region narrowing, empty-script fallback, cross-language default resolution, and the script-priority guard.
  • testDefaultComposeEnvironmentPropagatesSystemScript and testDefaultComposeEnvironmentDropsSystemScriptOnLanguageOverride cover the Composable stringResource() script lookup path.
  • testBcpFolderQualifiers and testBcpFolderQualifiersInvalid integration tests cover language-only, script, alpha-2 and numeric regions, multi-segment chains, and trailing theme/density qualifiers, plus rejection of malformed segments, out-of-order subtags, trailing locale qualifiers after b+..., and standalone script qualifiers without b+.
  • Verified end-to-end on Android emulator (API 36) with a sample app
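
For readers unfamiliar with the lookup order these tests exercise (language, then script, then region, with a guard against falling back across scripts), the idea can be sketched as follows. This is a simplified model under stated assumptions, not the PR's filterByLocale: `Item` and `pickByLocale` are hypothetical, `null` stands for "no qualifier of that kind on the folder", and an empty `script` argument stands for an environment whose script is unknown.

```kotlin
// Hypothetical, simplified resource item; null means the folder has no such qualifier.
data class Item(
    val path: String,
    val language: String? = null,
    val script: String? = null,
    val region: String? = null,
)

fun pickByLocale(items: List<Item>, language: String, script: String, region: String): List<Item> {
    val defaults = items.filter { it.language == null }
    val byLanguage = items.filter { it.language == language }
    if (byLanguage.isEmpty()) return defaults

    val candidates = if (script.isNotEmpty()) {
        // Script guard: items in a different script are never eligible,
        // even if their region matches; script-less items are the fallback.
        byLanguage.filter { it.script == script }
            .ifEmpty { byLanguage.filter { it.script == null } }
    } else {
        // Unknown environment script: prefer script-less items, then allow any script.
        byLanguage.filter { it.script == null }.ifEmpty { byLanguage }
    }

    // Region narrowing: exact region first, then region-less items, then the defaults.
    return candidates.filter { it.region == region }
        .ifEmpty { candidates.filter { it.region == null } }
        .ifEmpty { defaults }
}
```

In this model, a zh-Hans-TW environment offered `values-b+zh+Hans` and `values-b+zh+Hant+TW` resolves to the Hans folder: the matching region on the Hant folder does not override the requested script.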

Release Notes

Fixes - Resources

  • Support BCP 47 script (values-b+sr+Latn, values-b+zh+Hans) and numeric region (values-b+es+419) qualifiers, including multi-segment locales and trailing theme/density qualifiers, in commonMain resource folders

Authored with AI assistance, reviewed and tested manually before submitting.

@terrakok terrakok self-requested a review April 20, 2026 09:32
@2KAbhishek
Author

@terrakok thank you for the review comments. I will resolve those and get back to you.

@westnordost

westnordost commented Apr 21, 2026

In one comment you write you also added support for UN M49 region codes.
From a multiplatform perspective, all you have in order to decide which resource to use is the androidx.compose.ui.text.intl.Locale, i.e. ISO 639 language code, ISO 15924 script code and ISO 3166 region code. To be able to interpret UN M49 region codes correctly, you'd need a mapping from these to ISO 3166 region codes and I don't see this anywhere in this PR.
Anyway, you do mention that support for these "is not fully baked in yet", but I'd suggest leaving this out of this PR completely. The utility of ISO 15924 script codes is acknowledged by the maintainers; support for UN M49 identifiers would be a separate issue.

Addendum: Note, however, that IETF BCP 47 language tags actually allow UN M49 region codes in place of an ISO 3166 region code, e.g. es-Latn-419 is a valid IETF BCP 47 language tag. This means that any platform with full support for parsing such language tags would also understand UN M49 region codes out of the box.

(see #5582 (comment))

@westnordost

westnordost commented Apr 21, 2026

As a side note, I always wondered about the odd b+<language code>+<script code> syntax in Android. It seems to me that a unique identifier for language + script + region would be perfectly possible without the b+ in front. The same goes for the <language code>-r<region code> syntax, i.e. with the r in the middle.

Reading the Android documentation on this matter, it looks like this syntax simply came to be for historical reasons. The b+<language code>+<script code> form was only added with Android 7.0.

So, wouldn't it be fine to just use 1:1 IETF BCP47 language tags as directory names instead of inheriting this legacy Android syntax? E.g.

  • sr-Latn-RS
  • zh-Hans
  • en-GB

instead of

  • b+sr+Latn+RS
  • b+zh+Hans
  • en-rGB

Of course, using en-GB instead of en-rGB would be a breaking change, so, to not break existing software using Compose Multiplatform resources, en-rGB would still need to be understood correctly for backward compatibility.

Disclaimer: I am not a maintainer here; this is just my opinion. It is up to maintainers such as @terrakok to make any decisions in this regard. I just want to point out that now is probably the only sensible opportunity to drop that legacy syntax.

@2KAbhishek 2KAbhishek force-pushed the fix/bcp47-locale-qualifiers branch 2 times, most recently from 787cf07 to 8ef360f Compare April 27, 2026 15:05
@2KAbhishek
Author

2KAbhishek commented Apr 28, 2026

> In one comment you write you also added support for UN M49 region codes. From a multiplatform perspective, all you have in order to decide which resource to use is the androidx.compose.ui.text.intl.Locale, i.e. ISO 639 language code, ISO 15924 script code and ISO 3166 region code. To be able to interpret UN M49 region codes correctly, you'd need a mapping from these to ISO 3166 region codes and I don't see this anywhere in this PR. Anyway, you do mention that support for these "is not fully baked in yet", but I'd suggest leaving this out of this PR completely. The utility of ISO 15924 script codes is acknowledged by the maintainers; support for UN M49 identifiers would be a separate issue.

@westnordost can you please help by creating an issue for this on YouTrack? I think it is better to handle the UN M49 mapping as a subsequent fix and not include it in this PR. I can work on adding it.

@2KAbhishek
Author

2KAbhishek commented Apr 28, 2026

> As a side note, I always wondered about the odd b+<language code>+<script code> syntax in Android. It seems to me that a unique identifier for language + script + region would be perfectly possible without the b+ in front. The same goes for the <language code>-r<region code> syntax, i.e. with the r in the middle.

@westnordost Appreciate the suggestion, but for this PR I'd like to keep scope tight: support what Android's resource system already supports - b+ and lang-rREGION, so resources can be ported from Android to CMP without extra steps.

Moving to IETF-pure folder syntax (sr-Latn-RS, en-GB) is a worthwhile discussion but I think it's a separate concern from script-code support and would need its own deprecation story for the existing region syntax. Happy to revisit as a follow-up issue if needed.

@2KAbhishek 2KAbhishek requested a review from terrakok April 28, 2026 08:51
@2KAbhishek
Author

@terrakok I have resolved the suggestions and done some cleanup. Can you please take another look? 🙏🏽

@westnordost

> @westnordost Appreciate the suggestion, but for this PR I'd like to keep scope tight: support what Android's resource system already supports - b+ and lang-rREGION, so resources can be ported from Android to CMP without extra steps.

Right, but the resource directory names for Compose Multiplatform resources already deviate from Android's: Android uses night and notnight, CMP uses dark and light. Given that such a rather unnecessary change was already made, I infer that the CMP folks are more open to deviating from the legacy Android resource directory naming scheme. All of that is just speculation on my part, though.

@westnordost

westnordost commented Apr 28, 2026

One note about UN M49 support:

I wrote earlier...

> Addendum: Note, however, that IETF BCP 47 language tags actually allow UN M49 region codes in place of an ISO 3166 region code, e.g. es-Latn-419 is a valid IETF BCP 47 language tag. This means that any platform with full support for parsing such language tags would also understand UN M49 region codes out of the box.

This is wrong, or at least not helpful. While es-419 is parsed without error in e.g. JavaScript (i.e. it is supported), "419" is simply put into the region field. What's actually missing is an API to check whether, say, CL is in 419. Hence, regardless, a mapping between UN M49 and ISO 3166 codes in Kotlin would still be necessary for UN M49 support in directory names. Sorry for the noise.
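
To make the gap concrete, here is a minimal Kotlin sketch of such a containment check. Everything in it is an assumption for illustration: `m49Members` and `regionMatches` are hypothetical names, and the table is deliberately incomplete (a real one would have to come from a maintained data source).

```kotlin
// Hypothetical and deliberately partial table:
// UN M49 area code -> ISO 3166-1 alpha-2 codes of some member countries.
val m49Members: Map<String, Set<String>> = mapOf(
    "419" to setOf("AR", "BO", "BR", "CL", "CO", "MX", "PE"), // Latin America and the Caribbean
    "150" to setOf("DE", "ES", "FR", "IT", "PL"),             // Europe
)

/**
 * Returns true if a folder's region qualifier applies to the device region:
 * either the codes are identical, or the folder uses a UN M49 area code
 * that contains the device's ISO 3166 region.
 */
fun regionMatches(folderRegion: String, deviceRegion: String): Boolean =
    folderRegion == deviceRegion ||
        m49Members[folderRegion]?.contains(deviceRegion) == true
```

Under these assumptions, `regionMatches("419", "CL")` is true and `regionMatches("419", "ES")` is false, which is the check a plain region-string comparison cannot provide.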

@2KAbhishek 2KAbhishek force-pushed the fix/bcp47-locale-qualifiers branch from f31d8c9 to bb1e878 Compare April 28, 2026 11:05
@2KAbhishek
Author

@terrakok just wanted to follow up on the re-review. Can you please take another look?

2KAbhishek added 14 commits May 12, 2026 16:41
Compose Multiplatform resource parsing rejected valid Android BCP 47 folder
names like values-b+sr+Latn (script code) and values-b+es+419 (numeric
region) with "unknown qualifier" errors. The Gradle plugin split folder
names on '-' without understanding the b+lang+code segment format, and
qualifier validation only accepted 2-letter region codes.

Gradle plugin:
- Add parseAndroidFolderName() to handle BCP 47 b+lang+code segments
- Expand qualifier regexes for 4-letter ISO 15924 script codes and
  3-digit numeric region codes
- Fix region code extraction (takeLast(2) -> removePrefix("r"))
- Emit ScriptQualifier in generated accessors with path validation

Runtime library:
- Add ScriptQualifier class
- Thread script through ResourceEnvironment on all platforms
- Extend filterByLocale: language -> script -> region, with fallback
  to default when no match; empty-script environments (Compose Locale
  does not expose script yet) prefer non-script items first

Tests:
- Add testGetPathByEnvironmentWithScript (script matching, empty-script
  fallback, cross-language default)
- Add testBcpFolderQualifiers integration test for BCP folder parsing

Fixes https://youtrack.jetbrains.com/issue/CMP-4449/
Fixes JetBrains#4449

Split parseAndroidFolderName into parseComposeResourceLocaleQualifiers
(orchestrator) and expandBcpQualifier (BCP segment expansion). The new
structure supports folders that mix standard qualifiers with BCP
segments (values-b+zh+Hant-dark) and multi-segment BCP chains
(values-b+sr+Latn+RS).

- Tighten path validation in addQualifiers with pathContainsBcpSubtag
  regex helper so multi-segment BCP paths are verified precisely
- Malformed BCP segments now fall through to addQualifiers which
  reports "unknown qualifier", instead of being silently dropped
- Expand integration tests: positive cases for script-only, numeric
  region, multi-segment, and mixed-qualifier BCP folders; negative
  test for malformed segments

Rewrite filterByLocale using val + early-return style consistent with
filterBy and filterByDensity in the same file. Extract filterByRegion
helper to avoid repeating region filter logic. Extract noLocaleItems
once to avoid the duplicated final-fallback filter.

The ScriptQualifier("") case from DefaultComposeEnvironment naturally
falls through the chain without needing a dedicated branch.

- Expand unit test with labeled assertions for each filter case:
  language+script+region (exact and region-fallback), language
  without script, language+region ignoring script, and no-language
  default

DefaultComposeEnvironment.rememberEnvironment() hardcoded
ScriptQualifier(""), which made BCP 47 script-qualified folders (e.g.
values-b+zh+Hant) unreachable from stringResource() because
androidx.compose.ui.text.intl.Locale doesn't expose a script field.
Source the script from getSystemResourceEnvironment() instead, so the
Composable lookup path uses the same script the suspend
getString()/getDrawable()/etc. lookups already do.

addQualifiers verified "script after language" / "region after language"
by re-scanning the resource folder path with a regex
(pathContainsBcpSubtag), even though the parser had already produced the
qualifiers in source order.
- Reject locale-shaped qualifiers (lang / Script / rRR / r000) that follow
  a b+... segment, e.g. values-b+sr+Latn-rRS or values-b+sr-rRS. Mixing the
  two locale syntaxes in one folder is ambiguous; the BCP segment must
  carry the full locale.
- Enforce BCP 47 subtag ordering inside the b+... segment: language must
  come first, and subsequent subtags must follow language < script < region.
  Out-of-order combos like values-b+sr+RS+Latn now fall through to
  "unknown qualifier" instead of being silently accepted.

Two correctness fixes around the new ScriptQualifier support:

- filterByLocale: when the environment requests a specific script, never fall
  back across scripts via region match. Script (e.g. Hans vs Hant, Latn vs Cyrl)
  is a stronger signal than region; falling across scripts can show Traditional
  Chinese to a user who explicitly asked for Simplified just because the region
  matched. The cross-script byRegion fallback now runs only when the env script
  is empty.

- DefaultComposeEnvironment: only borrow the system's script when the compose
  locale language still matches the system's. If the app overrides Locale.current
  (e.g. an in-app language picker) to a different language, ScriptQualifier("")
  is used instead, so e.g. switching to "en" on a "ru-Cyrl" system no longer
  produces a nonsensical "en-Cyrl" environment.

values-sr-Latn / values-sr-rRS-Latn were accepted as
non-BCP script syntax. Reject them in parseComposeResourceQualifiers
so the parser stays scoped to Android's b+... segment, and add
matching invalid cases to testBcpFolderQualifiersInvalid.

Also:
- Add ScriptQualifier.isEmpty() helper and use it in filterByLocale.
- Hoist BCP 47 / Android qualifier regexes to file-private vals
  shared by parseComposeResourceQualifiers, isLocaleShapedQualifier,
  and expandBcpQualifier.
- Match the existing "Forbidden directory name '$dirName'! ..."
  error style for the two new validation messages.

filterBy already does exact-match-then-default-fallback for any
Qualifier subtype, which is what filterByRegion was reimplementing
specifically for RegionQualifier.

Bad BCP 47 folders were previously laundered into the generic
"unknown qualifier" message. Now expandBcpQualifier raises explicit
errors for malformed subtags, missing leading language, and out-of-order
subtags, and parseComposeResourceQualifiers rejects 'b+...' segments
that aren't the first qualifier instead of letting them fall through.

Adds the missing positive case for a single-language b+ segment, plus
invalid cases for empty subtags, uppercase-only language slot, and
multi-region segments.

Two related cleanups around parseComposeResourceQualifiers:

1. Drop the early `if (parts.first().isEmpty()) return null`. Folders like
   '-foo' were silently skipped, leaving developers without diagnostics.

2. The six BCP-related errors had inconsistent location context: three
   pointed at `$dirName`, three at `$segment`, one had no suffix. Pass
   dirName through expandBcpQualifier so every error ends with
   "in '$dirName'." and names the offending qualifier or subtag when
   useful.
@2KAbhishek 2KAbhishek force-pushed the fix/bcp47-locale-qualifiers branch from 3efc792 to b5110e2 Compare May 12, 2026 11:14
@2KAbhishek
Author

@igordmn @eymar @Schahen @MatkovIvan
Sorry for the noise. I wanted to get some eyes on this PR, as it has been waiting for some time.

It would be super helpful if you could take a look or let me know who can help with this 🙏🏽
Thanks

Development

Successfully merging this pull request may close these issues.

Ability to support more language and region qualifiers

3 participants