Simplify interlinear model by alex-rawlings-yyc · Pull Request #63 · sillsdev/interlinearizer-extension

alex-rawlings-yyc · 2026-05-12T23:10:05Z

Interlinear Model — Design Justification

Overview

The interlinear model is built around two types:

InterlinearProject — the persisted envelope: analysis data, alignment links, and references to Platform.Bible source/target projects.
ActiveProject — the runtime object: the envelope paired with reconstructed Book[] text hierarchies (never serialized).

This document explains the key design decisions and their rationale.

Decision 1: Single `analysis` field; optional `target`

InterlinearProject holds one analysis: TextAnalysis field and an optional targetProjectId?. At runtime, ActiveProject exposes source: Book[] and target?: Book[].

InterlinearProject
├─ sourceProjectId
├─ targetProjectId?  — absent for analysis-only projects (LCM, PT9)
├─ analysisLanguages : string[]
├─ analysis : TextAnalysis
└─ links?   : AlignmentLink[]

ActiveProject
├─ project : InterlinearProject
├─ source  : Book[]
└─ target? : Book[]   — present only when targetProjectId is set

Why not split analysis into sourceAnalysis / targetAnalysis? A single-source project (LCM, PT9 glossing) has no target text. Two sibling fields would leave targetAnalysis permanently empty and force callers to always write to sourceAnalysis by convention rather than by structure. A single analysis field is unambiguous regardless of whether the project is single-source or bilateral.

Why not require target? LCM and single-sided PT9 projects have no target text. Making target required would force callers to construct a placeholder — semantically dishonest and a source of drift. An optional field lets single-source projects simply omit it while bilateral projects (BT Extension) set targetProjectId and get target populated at runtime.

Decision 2: `InterlinearProject` + `ActiveProject` split (persisted vs. runtime)

The text hierarchy (Book / Segment / Token) lives only on ActiveProject, not on InterlinearProject. It is rebuilt from Platform.Bible's USJ on every load and never serialized.

Why not serialize the text layer? Platform.Bible owns the source text — duplicating it in storage would create a second copy that can drift from the canonical version. Storing only the analysis layer keeps InterlinearProject small and makes the baseline authoritative. Drift is detected via tokenSnapshot fields on TokenAnalysis and AlignmentEndpoint, which record the token's surface text at analysis time; a mismatch flips status to 'stale'.

Why a separate type rather than computed fields on InterlinearProject? ActiveProject makes the serialization boundary explicit in the type system. Callers that hold an ActiveProject know they have live text; callers that hold an InterlinearProject know they have only the persisted envelope. This prevents accidental serialization of Book[].

Things to Keep in Mind During Implementation

1. `gloss` / `glossSenseRef` co-presence: callers must enforce rendering priority

The type system allows both fields to be set simultaneously (intentional, for PT9/LCM round-trips). When rendering or exporting, gloss takes precedence over glossSenseRef. This rule is not enforced by the model — UI and export code must apply it consistently.

2. `target` may be absent even when `targetProjectId` is set

If a target Platform.Bible project is deleted after the InterlinearProject was created, ActiveProject.target will be undefined at runtime while project.targetProjectId remains set. Code that reads AlignmentLink.targetEndpoints must guard against this case.

3. `updateProjectMetadata`: omitting `targetProjectId` clears it

targetProjectId is an optional trailing parameter. Omitting it (or passing undefined) silently deletes the field — the same behavior as name and description. Callers that only intend to update other fields must explicitly forward the current targetProjectId value to avoid inadvertently unbinding the target project.

4. `saveProjectAnalysis` is a blind read-modify-write with no conflict detection

saveProjectAnalysis reads the current project, merges in the new analysis and links, and writes back. If two WebView sessions for the same project both load state and then save, the second write silently overwrites the first. There is no last-modified timestamp, version field, or conflict signal. This is unlikely to matter in practice on a single-user Electron desktop, but would become a real hazard if the platform ever supports concurrent sessions or cloud-synced storage.

Changes Made Since Original Description

The following changes were applied to src/types/interlinearizer.d.ts as part of the model refactor. Call sites (bookTokenizer.ts, SegmentView.tsx, PhraseBox.tsx, ContinuousView.tsx, and all tests) still reference the old names and will fail tsc --noEmit until updated in a follow-on pass.

`Token.id` → `Token.ref`; cross-references updated throughout

tokenId sounded like it might be content-derived (a hash or similar). In this codebase ref already means a positional identifier (bookRef, startRef, endRef); Token.ref follows that pattern and makes "position-based, not content-derived" explicit. All downstream cross-reference fields were updated: tokenId → tokenRef, tokenIds → tokenRefs.

Token.ref embeds the verse SID and the token's zero-based character offset (e.g. "GEN 1:1:0"), making refs unique within an InterlinearProject (not just within a Book). Segment.id is similarly project-wide unique — it is set to the verse SID (e.g. "GEN 1:1").

`AlignmentEndpoint` restructured; `TokenSnapshot` and `MorphemeLink` extracted

The old AlignmentEndpoint was a flat struct with an & ({ morphemeId?: never } | { tokenAnalysisId: string; morphemeId: string }) intersection that mixed the token reference, a surface-text snapshot, and morpheme pointers into one type. The new shape separates concerns into three interfaces:

AlignmentEndpoint — holds token: TokenSnapshot and morphemeLink?: MorphemeLink
TokenSnapshot — carries tokenRef and surfaceText (the snapshot; previously tokenSnapshot?: string on the endpoint itself, now required)
MorphemeLink — carries tokenAnalysisId + morphemeId; its presence/absence is the discriminant

This removes the flat-intersection invariant that tokenAnalysisId and morphemeId must appear together, and makes surfaceText non-optional (a snapshot is always required when creating an endpoint).

`Morpheme` → `MorphemeAnalysis`

Renamed to be consistent with TokenAnalysis and SegmentAnalysis. MorphemeAnalysis does not extend Analysis — the unit of review is the TokenAnalysis record, not individual morphemes. An MSA tool produces a TokenAnalysis with status: 'suggested'; the morphemes array is data within that record.

`Phrase` → `PhraseAnalysis`; `tokenSnapshots` removed

Phrase was renamed PhraseAnalysis to match the naming convention of TokenAnalysis and SegmentAnalysis. The tokenSnapshots?: [string, ...string[]] parallel-array field — which had to be kept in sync with tokenIds by convention with no typed enforcement — was dropped. Phrase drift detection remains an open question (see Open Questions below).

`Analysis` base interface added

SegmentAnalysis, TokenAnalysis, and PhraseAnalysis all carried duplicated id, status, confidence?, producer?, sourceUser? fields. These are extracted into a shared Analysis base that all three extend.

`type` → `interface` for `TokenAnalysis`, `PhraseAnalysis`, `AlignmentEndpoint`

These were declared as type aliases but described object shapes with no union or intersection logic. Converted to interface for consistency with the rest of the model.

`InterlinearProject.links` is now optional

links was previously AlignmentLink[] (always present). It is now links?: AlignmentLink[]. For analysis-only projects (LCM, PT9) that have no targetProjectId, the field is absent rather than an empty array. This makes analysis-only projects structurally distinct from bilateral alignment projects, though the compiler does not yet enforce the targetProjectId ↔ links co-presence invariant (see Open Questions below).

Open Questions for Team Discussion

1. Phrase staleness

PhraseAnalysis currently has no drift-detection mechanism. The removed tokenSnapshots?: [string, ...string[]] parallel-array was fragile — it had to be kept in sync with tokenIds by convention and had no typed enforcement. When a member token's surface text changes the phrase has no way to detect it independently.

Proposed fix: add tokens: [TokenSnapshot, ...TokenSnapshot[]] to PhraseAnalysis, reusing the existing TokenSnapshot type. This replaces the parallel-array with a paired structure that bundles the token reference and surface text together — exactly the same pattern AlignmentEndpoint.token already uses. Staleness detection then works uniformly: compare each TokenSnapshot.surfaceText against the current Token.surfaceText and flip status to 'stale' on any mismatch.

Alternative: derive phrase staleness from member TokenAnalysis.status. This is fragile because a phrase token may have no TokenAnalysis at all, and is not recommended.

2. Project mutability and `links` placement

Two tightly coupled questions that need a decision before the type can be fully locked down.

Can a project transition from analysis-only to bilateral? updateProjectMetadata currently accepts an optional targetProjectId, so the answer today is yes. The question is whether that should be permitted — gaining a target is a significant structural change (it creates a new field category, links, that did not previously exist on the project) and may warrant creating a new project instead. If targetProjectId is removed from updateProjectMetadata, projects become immutable in type at creation.

Should the targetProjectId ↔ links invariant be enforced by the type system? InterlinearProject.links is only meaningful when targetProjectId is set, but the current type does not enforce this co-presence. A discriminated union — AnalysisProject | AlignmentProject — would let the compiler reject links on a project with no target and reject a missing links on a project that has one. This removes an entire category of runtime errors at the cost of a slightly more complex type.

These two questions are tightly coupled:

If projects are immutable at creation, the discriminated union is clean: createProject picks the right variant at call time, and a project is either always an AnalysisProject or always an AlignmentProject. No migration path is needed.
If targetProjectId can be added post-creation, the union needs an explicit conversion step. The storage layer must handle reading an AnalysisProject and writing back an AlignmentProject atomically, and any in-memory holders of the old reference must be invalidated. This is feasible on a single-user desktop but adds non-trivial ceremony.

The recommended path is immutability at creation, but this is a product decision: if the UX calls for letting users attach a target project after the fact, the type must accommodate it.

This change is

coderabbitai · 2026-05-12T23:10:12Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d09ab507-8455-4ecb-abfc-4f5ce756393a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch simplify-interlinear-model

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

imnasnainaec

@imnasnainaec reviewed 10 files and all commit messages, and made 2 comments.
Reviewable status: 10 of 17 files reviewed, 1 unresolved discussion (waiting on alex-rawlings-yyc and jasonleenaylor).

src/components/CreateProjectModal.tsx line 126 at r1 (raw file):

        </label>
        <input
          id="analysis-language"

🌱 We may want to allow multiple langs (comma-separated?) on creation.

src/components/ProjectMetadataModal.tsx line 95 at r1 (raw file):

    const newLanguage = editLanguage.trim();
    try {
      const updatedProjectJson = await papi.commands.sendCommand(

Doesn't this fall into the known trap of inadvertently clearing the target?

alex-rawlings-yyc

@alex-rawlings-yyc made 2 comments.
Reviewable status: 10 of 17 files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

src/components/CreateProjectModal.tsx line 126 at r1 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

🌱 We may want to allow multiple langs (comma-separated?) on creation.

I haven't gotten around to doing a lot of adjustment of how the model is handled yet. My main goal with this PR was to present some changes to it and make changes to the code accordingly if we agree to change it

src/components/ProjectMetadataModal.tsx line 95 at r1 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

Doesn't this fall into the known trap of inadvertently clearing the target?

Oversight. I haven't gone through the code very closely yet as I just want to focus on the model itself for now.

alex-rawlings-yyc

@alex-rawlings-yyc reviewed 18 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

alex-rawlings-yyc

@alex-rawlings-yyc reviewed 12 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

alex-rawlings-yyc

@alex-rawlings-yyc reviewed 6 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

alex-rawlings-yyc

@alex-rawlings-yyc reviewed 2 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

alex-rawlings-yyc

@alex-rawlings-yyc reviewed 2 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

alex-rawlings-yyc

@alex-rawlings-yyc reviewed 11 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

alex-rawlings-yyc

@alex-rawlings-yyc reviewed 2 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

…ect and add associated test, fix schema inconsistency

…update mocks to match real API signatures - If the user picks the same project for both source and target roles, a warning notification is shown and the target picker re-opens until distinct projects are chosen or cancelled. - Includes new tests for this flow, updated platform-bible-react/utils mocks to match current API shapes, a new localized error string, and minor test/parser cleanups.

Introduces three modal components (CreateProjectModal, ProjectMetadataModal, SelectInterlinearProjectModal) with supporting menu contributions, localized strings, command registrations in main.ts, storage helpers in projectStorage.ts, and updated type definitions. Adds full test coverage for all new components and expands existing test suites to cover the new flows.

…oss/senseRef constraint - Replace `<div>` containers with `<dialog open aria-labelledby="…">` in `CreateProjectModal`, `ProjectMetadataModal`, and `SelectInterlinearProjectModal` for proper accessibility semantics. - In `ProjectMetadataModal.handleSave`, return early when `updateProjectMetadata` resolves with a falsy value (mirrors the existing guard in `CreateProjectModal.handleCreate`). - In `interlinearizer.d.ts`, replace the discriminated union on `Token` and `Phrase` that forbade setting both `gloss`/`glossSenseRef` (or `gloss`/`senseRef`) with optional fields, reflecting the updated rule that `gloss` serves as a local override when both are present. - Add a test for the falsy-return early-exit path in `ProjectMetadataModal`, and add the missing `waitFor` before negative assertions in the `CreateProjectModal` falsy-return test.

…zero-check in ContinuousView explicit

Introduces three modal components (CreateProjectModal, ProjectMetadataModal, SelectInterlinearProjectModal) with supporting menu contributions, localized strings, command registrations in main.ts, storage helpers in projectStorage.ts, and updated type definitions. Adds full test coverage for all new components and expands existing test suites to cover the new flows.

…zero-check in ContinuousView explicit

…xt, add ActiveProject - Removes the unused bilingual container types (`InterlinearAlignment`, `InterlinearText`) whose `target` side had no role in the current single-source annotation workflow. - Introduces `ActiveProject` as the runtime pairing of a persisted `InterlinearProject` envelope with its reconstructed `Book[]` source text. - Adds `saveProjectAnalysis` to `projectStorage` as the designated save path for analysis/link mutations.

…0% coverage - InterlinearProject: replace analysisWritingSystem: string with analysisLanguages: string[] (LCM/PT9 carry multiple analysis languages per project; a single string was lossy) and add optional targetProjectId so BT Extension's two-sided Translation can be represented without dropping the target text layer - ActiveProject: add target?: Book[] alongside source so AlignmentLink.targetEndpoints resolve to real tokens at runtime - TokenAnalysis / Phrase: remove gloss ↔ senseRef discriminated union; both fields are now independently optional and may coexist — local gloss takes precedence over the lexicon-derived value when both are present - TokenAnalysis: add excluded?: boolean to carry PT9's Cluster.Excluded flag projectStorage: update createProject / updateProjectMetadata signatures and logic to match; updateProjectMetadata clears targetProjectId when called without it (mirrors name/description clear semantics) - papi-backend mock: add papi.storage (readUserData / writeUserData / deleteUserData) which was missing entirely - Remove two dead branches: parseStrictNumber's n >= 0 guard (the ^\d+$ regex makes negative results impossible) and stableStringify's undefined guard (never reachable via the typed call site) - Add tests for duplicate verse SID throw and new storage behaviour; all 268 tests pass at 100% branch coverage

Cluster.Excluded is a deprecated PT9 tag; carrying it into the model adds noise with no consumer value.

…error handling

…ag from PT9 XML docs

…s, improve error handling

alex-rawlings-yyc added 🟪Idea Idea-priority PR: can be closed... up next Auto adds an issue to the PT Lexical Extensions project labels May 12, 2026

alex-rawlings-yyc requested a review from imnasnainaec May 12, 2026 23:10

alex-rawlings-yyc self-assigned this May 12, 2026

alex-rawlings-yyc requested a review from jasonleenaylor May 12, 2026 23:10

imnasnainaec requested changes May 13, 2026

View reviewed changes