Skip to content

Normalize user CRDT writes to NFD#2146

Open
myieye wants to merge 7 commits into
developfrom
claude/refactor-feature-implementation-1fFGI
Open

Normalize user CRDT writes to NFD#2146
myieye wants to merge 7 commits into
developfrom
claude/refactor-feature-implementation-1fFGI

Conversation

@myieye
Copy link
Copy Markdown
Collaborator

@myieye myieye commented Jan 30, 2026

Introduces normalizing new CRDT data to NFD, like LibLcm.

Note: the new API wrapper only auto-implements the READ API, so that future necessary normalization is not overlooked.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 30, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 10da2312-b26a-4b6d-bed8-795704190c81

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces Unicode NFD text normalization wrappers for the MiniLcm write API. It adds a new MiniLcmWriteApiNormalizationWrapper that automatically normalizes user-entered text before database operations, updates service layer constructor wiring to apply normalization in the correct order, and includes comprehensive test coverage validating normalization across all entity types.

Changes

Cohort / File(s) Summary
Write Normalization Infrastructure
backend/FwLite/MiniLcm/Normalization/MiniLcmWriteApiNormalizationWrapper.cs, backend/FwLite/MiniLcm/Normalization/StringNormalizer.cs
New wrapper factory and wrapper class implementing IMiniLcmApi to apply NFD normalization across all write operations; includes per-entity normalization helpers and patch normalization logic. New StringNormalizer utility class with static methods for normalizing strings, MultiString, RichString, RichMultiString, and string arrays.
Service/Hub Wiring Updates
backend/FwLite/FwLiteShared/Services/MiniLcmJsInvokable.cs, backend/FwLite/FwLiteWeb/Hubs/CrdtMiniLcmApiHub.cs, backend/FwLite/FwLiteWeb/Hubs/FwDataMiniLcmHub.cs, backend/FwLite/FwLiteWeb/Hubs/MiniLcmApiHubBase.cs
Updated constructor signatures to accept separate read and write normalization wrapper factories; modified wrapper ordering to apply read normalization first, then write normalization, then validation. Added using directives for normalization namespace and removed unused imports.
Wrapper Extension & DI Registration
backend/FwLite/MiniLcm/Wrappers/MiniLcmWrappers.cs, backend/FwLite/MiniLcm/Validators/MiniLcmValidators.cs
Updated WrapWith extension method signature to accept nullable wrapper factory sequence and filter out null factories at runtime; added XML documentation. Registered MiniLcmWriteApiNormalizationWrapperFactory as transient service in DI container.
Test Infrastructure
backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs
Added comprehensive test suite with WriteNormalizationTests covering all entity types and write operations, NormalizationAssertTests validating assertion utilities, NfcTestData factory methods for test data generation, and NormalizationAssert helper class for reflection-based normalization verification.
Test Base Updates
backend/FwLite/MiniLcm.Tests/MiniLcmTestBase.cs, backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
Added MiniLcmWriteApiNormalizationWrapperFactory to test wrapper chain; removed manual string normalization from tests since write API now normalizes automatically.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • imnasnainaec
  • hahn-kev

Poem

🐰 Whiskers twitching with glee, the rabbit hops forth:
"Text now normalizes to NFD, letter by letter so clean,
From write to the database, each string finds its form,
No more chaos in Unicode—just harmony seen!" 🌿✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly summarizes the main change: implementing NFD normalization for user text on write operations in the CRDT system, which aligns with all the substantial changes in the changeset.
Description check ✅ Passed The PR description clearly relates to the changeset, describing the introduction of NFD normalization for CRDT data writes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/refactor-feature-implementation-1fFGI

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Jan 30, 2026
@myieye myieye changed the base branch from copilot/normalize-text-to-nfd to develop January 30, 2026 14:37
@myieye myieye closed this Jan 30, 2026
@myieye myieye reopened this Jan 30, 2026
@argos-ci
Copy link
Copy Markdown

argos-ci Bot commented Jan 30, 2026

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build Status Details Updated (UTC)
default (Inspect) ✅ No changes detected - May 12, 2026, 8:21 AM

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jan 30, 2026

UI unit Tests

  1 files   59 suites   31s ⏱️
176 tests 176 ✅ 0 💤 0 ❌
245 runs  245 ✅ 0 💤 0 ❌

Results for commit f3497d6.

♻️ This comment has been updated with latest results.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs`:
- Around line 66-70: The CRDT API wrapper order is inconsistent with
MiniLcmApiHubBase: replace the current nested creation that inserts
readNormalization between writeNormalization and validation and instead wrap
crdtApi so writeNormalization directly wraps validation (i.e., use
writeNormalizationWrapperFactory.Create(validationWrapperFactory.Create(crdtApi)))
to match MiniLcmApiHubBase; keep fwdataApi as-is (readNormalization +
validation). Update the line that assigns crdtApi which currently calls
writeNormalizationWrapperFactory.Create(readNormalizationWrapperFactory.Create(validationWrapperFactory.Create(crdtApi))).

In `@backend/FwLite/FwLiteShared/Services/MiniLcmJsInvokable.cs`:
- Line 26: The wrapper factories are applied in reverse by WrapWith, causing
validation to run before write normalization; update the factory list passed to
WrapWith when constructing _wrappedApi so the runtime order becomes Client →
Write Normalization → Read Normalization → Validation → Core API by using the
factory sequence [notificationWrapperFactory, validationWrapperFactory,
readNormalizationWrapperFactory, writeNormalizationWrapperFactory] (so after
WrapWith reversal it yields the documented write→read→validation order); ensure
this change in IMiniLcmApi _wrappedApi construction mirrors the ordering used in
MiniLcmApiHubBase.

In `@backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs`:
- Around line 536-557: The test exclusion for JsonPatch overloads leaves
NormalizePatch untested; remove or adjust the IsJsonPatchOverload filter in
GetMethodsThatNeedTests so UpdateObjectInput<> overloads are included, and add a
test (e.g., UpdateWritingSystem) that sends a JsonPatchDocument containing
non-normalized strings through the same path the
MiniLcmWriteApiNormalizationWrapper uses (or invoke the wrapper directly) and
asserts the underlying API receives normalized values; reference
UpdateObjectInput<>, IsJsonPatchOverload, GetMethodsThatNeedTests,
MiniLcmWriteApiNormalizationWrapper, NormalizePatch, and the UpdateWritingSystem
test name when implementing the change.

In `@backend/FwLite/MiniLcm/Normalization/README.md`:
- Line 189: Replace the placeholder issue link "Issue:
https://github.com/sillsdev/languageforge-lexbox/issues/[issue-number]" in the
README by updating the bracketed token to the real GitHub issue number (or the
correct URL) so the reference resolves; locate the exact line containing that
string in Normalization/README.md and substitute [issue-number] with the actual
numeric ID (or remove the line if no issue exists).
🧹 Nitpick comments (4)
backend/FwLite/MiniLcm/Normalization/README.md (2)

24-26: Replace ASCII diagram with Mermaid.

As per coding guidelines, use Mermaid diagrams instead of ASCII art for flowcharts and architecture documentation.

📊 Proposed Mermaid diagram
-```
-Client → Write Normalization → Read Normalization → Validation → Core API → Database
-```
+```mermaid
+flowchart LR
+    Client --> WN[Write Normalization]
+    WN --> RN[Read Normalization]
+    RN --> V[Validation]
+    V --> CA[Core API]
+    CA --> DB[Database]
+```

186-189: Convert bare URLs to markdown links.

Static analysis flagged bare URLs. Consider using proper markdown link syntax for consistency.

📝 Proposed fix
 ## References

-- Unicode Normalization: https://unicode.org/reports/tr15/
-- Issue: https://github.com/sillsdev/languageforge-lexbox/issues/[issue-number]
+- [Unicode Normalization (TR15)](https://unicode.org/reports/tr15/)
+- [Related Issue](https://github.com/sillsdev/languageforge-lexbox/issues/[issue-number])
 - FwLite Architecture: `backend/FwLite/AGENTS.md`
backend/FwLite/MiniLcm/Normalization/StringNormalizer.cs (1)

53-65: Inconsistent handling of null RichString values in RichMultiString.

When Normalize(value) returns null (which happens when the input RichString is null per lines 41), the entry is silently dropped from the result. This differs from Normalize(MultiString) which preserves all keys including those with empty/null values.

Consider whether this asymmetry is intentional. If null RichStrings should be preserved, the logic should be updated.

🔄 Proposed fix to preserve null entries
     public static RichMultiString Normalize(RichMultiString richMultiString)
     {
         var normalized = new RichMultiString(richMultiString.Count);
         foreach (var (key, value) in richMultiString)
         {
             var normalizedRichString = Normalize(value);
-            if (normalizedRichString is not null)
-            {
-                normalized[key] = normalizedRichString;
-            }
+            // Preserve all keys, even if the value is null
+            normalized[key] = normalizedRichString;
         }
         return normalized;
     }
backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs (1)

928-1171: Consider handling IEnumerable<string> collections.

Currently only string[] is validated; List<string>/IReadOnlyList<string> would be skipped. If such fields exist now or later, normalization checks could miss them.

♻️ Possible enhancement
-                // Check collections of model objects
-                else if (value is IEnumerable enumerable && value is not string)
+                // Check collections of strings
+                else if (value is IEnumerable<string> stringEnumerable)
+                {
+                    int index = 0;
+                    foreach (var item in stringEnumerable)
+                    {
+                        CheckString(item, $"{propPath}[{index++}]", expectNfc, issues);
+                    }
+                }
+                // Check collections of model objects
+                else if (value is IEnumerable enumerable && value is not string)
                 {
                     int index = 0;
                     foreach (var item in enumerable)

Comment thread backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs Outdated
Comment thread backend/FwLite/FwLiteShared/Services/MiniLcmJsInvokable.cs Outdated
Comment thread backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs Outdated
Comment thread backend/FwLite/MiniLcm/Normalization/README.md Outdated
@myieye myieye changed the title Consolidate normalization coverage tests into single test class Normalize user CRDT writes to NFD Jan 30, 2026
@myieye myieye marked this pull request as draft January 30, 2026 15:38
@myieye myieye force-pushed the claude/refactor-feature-implementation-1fFGI branch from 62b303e to 5616334 Compare March 17, 2026 09:15
@myieye
Copy link
Copy Markdown
Collaborator Author

myieye commented Mar 17, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/FwLite/FwLiteWeb/Hubs/MiniLcmApiHubBase.cs`:
- Around line 16-22: The constructor parameter projectIdentifier must not be
force-unwrapped; change the MiniLcmApiHubBase constructor signature to accept a
non-null IProjectIdentifier (remove the nullable ?) and remove the
null-forgiving operator from the WrapWith call so WrapWith(projectIdentifier)
gets a true non-null value; update any derived classes (e.g., FwDataMiniLcmHub)
to validate/obtain a non-null IProjectIdentifier and pass it into the base
constructor (do the null checks before calling base) so that WrapWith can be
invoked safely in the base ctor without runtime exceptions from
projectIdentifier!.

In `@backend/FwLite/MiniLcm/Normalization/MiniLcmWriteApiNormalizationWrapper.cs`:
- Around line 324-340: NormalizeEntry and its helper calls (NormalizeSense,
NormalizeComplexFormComponent and StringNormalizer.Normalize) currently
dereference nested properties and collections directly which throws if fields
like LexemeForm, Senses, Components or ComplexForms are null; update these
methods to be null-safe by checking for null before normalizing (use
conditional/null-conditional operators or null-coalescing to empty collections
and only call StringNormalizer.Normalize when the source string is non-null),
return null or empty collections as appropriate instead of throwing, and apply
the same pattern to the other affected blocks around lines referenced (the other
Normalize* wrapper uses at 397-410 and 460-470) so validation can catch missing
required fields instead of producing a 500.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b8cdb204-005f-462d-9c44-808b6c3f7328

📥 Commits

Reviewing files that changed from the base of the PR and between c520a94 and 5616334.

📒 Files selected for processing (11)
  • backend/FwLite/FwLiteShared/Services/MiniLcmJsInvokable.cs
  • backend/FwLite/FwLiteWeb/Hubs/CrdtMiniLcmApiHub.cs
  • backend/FwLite/FwLiteWeb/Hubs/FwDataMiniLcmHub.cs
  • backend/FwLite/FwLiteWeb/Hubs/MiniLcmApiHubBase.cs
  • backend/FwLite/MiniLcm.Tests/MiniLcmTestBase.cs
  • backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
  • backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs
  • backend/FwLite/MiniLcm/Normalization/MiniLcmWriteApiNormalizationWrapper.cs
  • backend/FwLite/MiniLcm/Normalization/StringNormalizer.cs
  • backend/FwLite/MiniLcm/Validators/MiniLcmValidators.cs
  • backend/FwLite/MiniLcm/Wrappers/MiniLcmWrappers.cs

Comment thread backend/FwLite/FwLiteWeb/Hubs/MiniLcmApiHubBase.cs Outdated
@myieye myieye force-pushed the claude/refactor-feature-implementation-1fFGI branch 2 times, most recently from c09b772 to 10647af Compare March 17, 2026 16:10
@myieye myieye force-pushed the claude/refactor-feature-implementation-1fFGI branch from 10647af to 1615b05 Compare May 5, 2026 15:37
myieye and others added 5 commits May 6, 2026 17:42
* Add CustomView forwarding to normalization wrapper; apply write normalization to all user-facing entry points

- MiniLcmWriteApiNormalizationWrapper: add #region CustomView with pass-through
  CreateCustomView/UpdateCustomView/DeleteCustomView (no normalization needed;
  Name is view configuration metadata, not linguistic user text)
- MiniLcmApiHubBase: make MiniLcmWriteApiNormalizationWrapperFactory non-nullable
- FwDataMiniLcmHub: inject and pass write normalization wrapper (FwData now uses
  it too, for consistent forwarding-new-objects behaviour)
- MiniLcmRoutes: apply read + write normalization wrappers via WrapWith, matching
  MiniLcmJsInvokable / MiniLcmApiHubBase pattern
- CrdtFwdataProjectSyncService: add comment documenting that write normalization
  is intentionally not applied during sync

https://claude.ai/code/session_01LN7JMAeHPcoXp7r7xVR8FH

* Introduce MiniLcmApiUserFacingWrappers to centralise wrapper ordering and rationale

- New MiniLcm/Wrappers/MiniLcmApiUserFacingWrappers.cs groups the three standard
  factories (read normalisation → write normalisation → validation) with a doc comment
  explaining why write normalisation is applied to both CRDT and FwData backends,
  and an Apply(api, project, params innerWrappers) method for callers that need extras
  (e.g. MiniLcmJsInvokable adds the notification wrapper)
- Registered via AddMiniLcmValidators()
- All user-facing entry points simplified to inject/call MiniLcmApiUserFacingWrappers:
  MiniLcmJsInvokable, MiniLcmApiHubBase, CrdtMiniLcmApiHub, FwDataMiniLcmHub, MiniLcmRoutes
- CrdtFwdataProjectSyncService: remove read normalisation wrapper (data is already
  normalised on both sides); simplify comment
- CustomView wrapper comment broadened from Name-specific to the whole object

https://claude.ai/code/session_01LN7JMAeHPcoXp7r7xVR8FH

* Fix MiniLcmApiUserFacingWrappers comment: explain transitive normalisation, drop api??this jargon

https://claude.ai/code/session_01LN7JMAeHPcoXp7r7xVR8FH

* Correct MiniLcmApiUserFacingWrappers comment: upfront normalisation covers all sub-objects

The "transitive coverage" claim was wrong: NormalizeEntry normalises the full object
tree before forwarding, so sub-objects from the sync logic are already normalised.
The real reason for uniform FwData normalisation is a simpler invariant, not coverage.

https://claude.ai/code/session_01LN7JMAeHPcoXp7r7xVR8FH

* Don't pass 'this' in normalization wrapper; note new-instance behaviour in comment

Upfront normalisation covers the full object tree, so sub-operations triggered by
Update* sync logic use data that is already normalised. Passing 'this' back into the
recursion caused a redundant second normalisation pass with no benefit. Now the wrapper
forwards 'api' as-is; the inner validation wrapper still inserts itself for sub-calls.

Also adds a note to MiniLcmApiUserFacingWrappers that the wrapper produces new instances,
so the caller's original is never mutated.

https://claude.ai/code/session_01LN7JMAeHPcoXp7r7xVR8FH

* Remove ! on projectIdentifier; null-guard Normalize* helpers against bad JSON input

Item 1: MiniLcmApiHubBase now takes non-nullable IProjectIdentifier; the null-forgiving
operator is gone. FwDataMiniLcmHub throws a clear InvalidOperationException at connection
time if context.Project is null rather than silently propagating null through !.

Item 3: NormalizeEntry/NormalizeSense/NormalizeExampleSentence now use ?? new() for
MultiString/RichMultiString fields and ?? [] for collection fields. JSON deserialization
from external input can produce null values at runtime despite non-nullable annotations
and = new() initializers; the guards prevent a 500 NRE and let the validation layer
downstream report the problem properly.

https://claude.ai/code/session_01LN7JMAeHPcoXp7r7xVR8FH

* Move write normalisation after validation; revert null guards

Validation now runs before write normalisation (read norm → validation → write norm).
Bad input is rejected with a 400 before the normaliser sees it; the normaliser can
therefore assume structurally valid data and needs no null guards of its own.

Reverts the ?? new() / ?? [] guards added in the previous commit.

https://claude.ai/code/session_01LN7JMAeHPcoXp7r7xVR8FH

* Add inline comment explaining validation-before-write-normalisation ordering

https://claude.ai/code/session_01LN7JMAeHPcoXp7r7xVR8FH

* Rename normalization wrappers for consistent naming

MiniLcmApiStringNormalizationWrapper → MiniLcmApiQueryNormalizationWrapper
MiniLcmWriteApiNormalizationWrapper  → MiniLcmApiWriteNormalizationWrapper

Both now follow the MiniLcmApi* prefix convention used across all wrapper
factories. "Query" better describes the role of the read-side wrapper
(normalises search/exemplar query parameters, not all strings).

https://claude.ai/code/session_01LN7JMAeHPcoXp7r7xVR8FH

---------

Co-authored-by: Claude <noreply@anthropic.com>
@myieye myieye marked this pull request as ready for review May 8, 2026 08:35
@myieye
Copy link
Copy Markdown
Collaborator Author

myieye commented May 8, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

✅ Actions performed

Full review triggered.

Comment thread backend/FwLite/MiniLcm.Tests/Helpers/NfcTestData.cs
@myieye myieye marked this pull request as draft May 11, 2026 15:17
@myieye myieye marked this pull request as ready for review May 13, 2026 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant