Skip to content

Simplify interlinear model#63

Open
alex-rawlings-yyc wants to merge 28 commits into
create-interlinearizer-projectfrom
simplify-interlinear-model
Open

Simplify interlinear model#63
alex-rawlings-yyc wants to merge 28 commits into
create-interlinearizer-projectfrom
simplify-interlinear-model

Conversation

@alex-rawlings-yyc
Copy link
Copy Markdown
Contributor

@alex-rawlings-yyc alex-rawlings-yyc commented May 12, 2026

Interlinear Model — Design Justification


Overview

The interlinear model is built around two types:

  • InterlinearProject — the persisted envelope: analysis data, alignment links, and references to Platform.Bible source/target projects.
  • ActiveProject — the runtime object: the envelope paired with reconstructed Book[] text hierarchies (never serialized).

This document explains the key design decisions and their rationale.


Decision 1: Single analysis field; optional target

InterlinearProject holds one analysis: TextAnalysis field and an optional targetProjectId?. At runtime, ActiveProject exposes source: Book[] and target?: Book[].

InterlinearProject
├─ sourceProjectId
├─ targetProjectId?  — absent for analysis-only projects (LCM, PT9)
├─ analysisLanguages : string[]
├─ analysis : TextAnalysis
└─ links?   : AlignmentLink[]

ActiveProject
├─ project : InterlinearProject
├─ source  : Book[]
└─ target? : Book[]   — present only when targetProjectId is set

Why not split analysis into sourceAnalysis / targetAnalysis? A single-source project (LCM, PT9 glossing) has no target text. Two sibling fields would leave targetAnalysis permanently empty and force callers to always write to sourceAnalysis by convention rather than by structure. A single analysis field is unambiguous regardless of whether the project is single-source or bilateral.

Why not require target? LCM and single-sided PT9 projects have no target text. Making target required would force callers to construct a placeholder — semantically dishonest and a source of drift. An optional field lets single-source projects simply omit it while bilateral projects (BT Extension) set targetProjectId and get target populated at runtime.


Decision 2: InterlinearProject + ActiveProject split (persisted vs. runtime)

The text hierarchy (Book / Segment / Token) lives only on ActiveProject, not on InterlinearProject. It is rebuilt from Platform.Bible's USJ on every load and never serialized.

Why not serialize the text layer? Platform.Bible owns the source text — duplicating it in storage would create a second copy that can drift from the canonical version. Storing only the analysis layer keeps InterlinearProject small and makes the baseline authoritative. Drift is detected via tokenSnapshot fields on TokenAnalysis and AlignmentEndpoint, which record the token's surface text at analysis time; a mismatch flips status to 'stale'.

Why a separate type rather than computed fields on InterlinearProject? ActiveProject makes the serialization boundary explicit in the type system. Callers that hold an ActiveProject know they have live text; callers that hold an InterlinearProject know they have only the persisted envelope. This prevents accidental serialization of Book[].


Things to Keep in Mind During Implementation

1. gloss / glossSenseRef co-presence: callers must enforce rendering priority

The type system allows both fields to be set simultaneously (intentional, for PT9/LCM round-trips). When rendering or exporting, gloss takes precedence over glossSenseRef. This rule is not enforced by the model — UI and export code must apply it consistently.

2. target may be absent even when targetProjectId is set

If a target Platform.Bible project is deleted after the InterlinearProject was created, ActiveProject.target will be undefined at runtime while project.targetProjectId remains set. Code that reads AlignmentLink.targetEndpoints must guard against this case.

3. updateProjectMetadata: omitting targetProjectId clears it

targetProjectId is an optional trailing parameter. Omitting it (or passing undefined) silently deletes the field — the same behavior as name and description. Callers that only intend to update other fields must explicitly forward the current targetProjectId value to avoid inadvertently unbinding the target project.

4. saveProjectAnalysis is a blind read-modify-write with no conflict detection

saveProjectAnalysis reads the current project, merges in the new analysis and links, and writes back. If two WebView sessions for the same project both load state and then save, the second write silently overwrites the first. There is no last-modified timestamp, version field, or conflict signal. This is unlikely to matter in practice on a single-user Electron desktop, but would become a real hazard if the platform ever supports concurrent sessions or cloud-synced storage.


Changes Made Since Original Description

The following changes were applied to src/types/interlinearizer.d.ts as part of the model refactor. Call sites (bookTokenizer.ts, SegmentView.tsx, PhraseBox.tsx, ContinuousView.tsx, and all tests) still reference the old names and will fail tsc --noEmit until updated in a follow-on pass.

Token.idToken.ref; cross-references updated throughout

tokenId sounded like it might be content-derived (a hash or similar). In this codebase ref already means a positional identifier (bookRef, startRef, endRef); Token.ref follows that pattern and makes "position-based, not content-derived" explicit. All downstream cross-reference fields were updated: tokenIdtokenRef, tokenIdstokenRefs.

Token.ref embeds the verse SID and the token's zero-based character offset (e.g. "GEN 1:1:0"), making refs unique within an InterlinearProject (not just within a Book). Segment.id is similarly project-wide unique — it is set to the verse SID (e.g. "GEN 1:1").

AlignmentEndpoint restructured; TokenSnapshot and MorphemeLink extracted

The old AlignmentEndpoint was a flat struct with an & ({ morphemeId?: never } | { tokenAnalysisId: string; morphemeId: string }) intersection that mixed the token reference, a surface-text snapshot, and morpheme pointers into one type. The new shape separates concerns into three interfaces:

  • AlignmentEndpoint — holds token: TokenSnapshot and morphemeLink?: MorphemeLink
  • TokenSnapshot — carries tokenRef and surfaceText (the snapshot; previously tokenSnapshot?: string on the endpoint itself, now required)
  • MorphemeLink — carries tokenAnalysisId + morphemeId; its presence/absence is the discriminant

This removes the flat-intersection invariant that tokenAnalysisId and morphemeId must appear together, and makes surfaceText non-optional (a snapshot is always required when creating an endpoint).

MorphemeMorphemeAnalysis

Renamed to be consistent with TokenAnalysis and SegmentAnalysis. MorphemeAnalysis does not extend Analysis — the unit of review is the TokenAnalysis record, not individual morphemes. An MSA tool produces a TokenAnalysis with status: 'suggested'; the morphemes array is data within that record.

PhrasePhraseAnalysis; tokenSnapshots removed

Phrase was renamed PhraseAnalysis to match the naming convention of TokenAnalysis and SegmentAnalysis. The tokenSnapshots?: [string, ...string[]] parallel-array field — which had to be kept in sync with tokenIds by convention with no typed enforcement — was dropped. Phrase drift detection remains an open question (see Open Questions below).

Analysis base interface added

SegmentAnalysis, TokenAnalysis, and PhraseAnalysis all carried duplicated id, status, confidence?, producer?, sourceUser? fields. These are extracted into a shared Analysis base that all three extend.

typeinterface for TokenAnalysis, PhraseAnalysis, AlignmentEndpoint

These were declared as type aliases but described object shapes with no union or intersection logic. Converted to interface for consistency with the rest of the model.

InterlinearProject.links is now optional

links was previously AlignmentLink[] (always present). It is now links?: AlignmentLink[]. For analysis-only projects (LCM, PT9) that have no targetProjectId, the field is absent rather than an empty array. This makes analysis-only projects structurally distinct from bilateral alignment projects, though the compiler does not yet enforce the targetProjectId ↔ links co-presence invariant (see Open Questions below).


Open Questions for Team Discussion

1. Phrase staleness

PhraseAnalysis currently has no drift-detection mechanism. The removed tokenSnapshots?: [string, ...string[]] parallel-array was fragile — it had to be kept in sync with tokenIds by convention and had no typed enforcement. When a member token's surface text changes the phrase has no way to detect it independently.

Proposed fix: add tokens: [TokenSnapshot, ...TokenSnapshot[]] to PhraseAnalysis, reusing the existing TokenSnapshot type. This replaces the parallel-array with a paired structure that bundles the token reference and surface text together — exactly the same pattern AlignmentEndpoint.token already uses. Staleness detection then works uniformly: compare each TokenSnapshot.surfaceText against the current Token.surfaceText and flip status to 'stale' on any mismatch.

Alternative: derive phrase staleness from member TokenAnalysis.status. This is fragile because a phrase token may have no TokenAnalysis at all, and is not recommended.

2. Project mutability and links placement

Two tightly coupled questions that need a decision before the type can be fully locked down.

Can a project transition from analysis-only to bilateral? updateProjectMetadata currently accepts an optional targetProjectId, so the answer today is yes. The question is whether that should be permitted — gaining a target is a significant structural change (it creates a new field category, links, that did not previously exist on the project) and may warrant creating a new project instead. If targetProjectId is removed from updateProjectMetadata, projects become immutable in type at creation.

Should the targetProjectId ↔ links invariant be enforced by the type system? InterlinearProject.links is only meaningful when targetProjectId is set, but the current type does not enforce this co-presence. A discriminated union — AnalysisProject | AlignmentProject — would let the compiler reject links on a project with no target and reject a missing links on a project that has one. This removes an entire category of runtime errors at the cost of a slightly more complex type.

These two questions are tightly coupled:

  • If projects are immutable at creation, the discriminated union is clean: createProject picks the right variant at call time, and a project is either always an AnalysisProject or always an AlignmentProject. No migration path is needed.
  • If targetProjectId can be added post-creation, the union needs an explicit conversion step. The storage layer must handle reading an AnalysisProject and writing back an AlignmentProject atomically, and any in-memory holders of the old reference must be invalidated. This is feasible on a single-user desktop but adds non-trivial ceremony.

The recommended path is immutability at creation, but this is a product decision: if the UX calls for letting users attach a target project after the fact, the type must accommodate it.


This change is Reviewable

@alex-rawlings-yyc alex-rawlings-yyc added 🟪Idea Idea-priority PR: can be closed... up next Auto adds an issue to the PT Lexical Extensions project labels May 12, 2026
@alex-rawlings-yyc alex-rawlings-yyc self-assigned this May 12, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d09ab507-8455-4ecb-abfc-4f5ce756393a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch simplify-interlinear-model

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@imnasnainaec imnasnainaec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imnasnainaec reviewed 10 files and all commit messages, and made 2 comments.
Reviewable status: 10 of 17 files reviewed, 1 unresolved discussion (waiting on alex-rawlings-yyc and jasonleenaylor).


src/components/CreateProjectModal.tsx line 126 at r1 (raw file):

        </label>
        <input
          id="analysis-language"

🌱 We may want to allow multiple langs (comma-separated?) on creation.


src/components/ProjectMetadataModal.tsx line 95 at r1 (raw file):

    const newLanguage = editLanguage.trim();
    try {
      const updatedProjectJson = await papi.commands.sendCommand(

Doesn't this fall into the known trap of inadvertently clearing the target?

Copy link
Copy Markdown
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc made 2 comments.
Reviewable status: 10 of 17 files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).


src/components/CreateProjectModal.tsx line 126 at r1 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

🌱 We may want to allow multiple langs (comma-separated?) on creation.

I haven't gotten around to doing a lot of adjustment of how the model is handled yet. My main goal with this PR was to present some changes to it and make changes to the code accordingly if we agree to change it


src/components/ProjectMetadataModal.tsx line 95 at r1 (raw file):

Previously, imnasnainaec (D. Ror.) wrote…

Doesn't this fall into the known trap of inadvertently clearing the target?

Oversight. I haven't gone through the code very closely yet as I just want to focus on the model itself for now.

@alex-rawlings-yyc alex-rawlings-yyc force-pushed the simplify-interlinear-model branch from 6884c8b to 50d0e8b Compare May 13, 2026 16:03
@alex-rawlings-yyc alex-rawlings-yyc force-pushed the create-interlinearizer-project branch from 253c32e to f5083ae Compare May 13, 2026 16:13
@alex-rawlings-yyc alex-rawlings-yyc force-pushed the simplify-interlinear-model branch from 50d0e8b to 12707a8 Compare May 13, 2026 16:13
Copy link
Copy Markdown
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc reviewed 18 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

@alex-rawlings-yyc alex-rawlings-yyc force-pushed the simplify-interlinear-model branch 2 times, most recently from a9576db to 9d7c847 Compare May 13, 2026 19:38
Copy link
Copy Markdown
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc reviewed 12 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

Copy link
Copy Markdown
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc reviewed 6 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

Copy link
Copy Markdown
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc reviewed 2 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

Copy link
Copy Markdown
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc reviewed 2 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

@alex-rawlings-yyc alex-rawlings-yyc force-pushed the simplify-interlinear-model branch from cda9f96 to 9c494fc Compare May 13, 2026 22:56
Copy link
Copy Markdown
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc reviewed 11 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

@alex-rawlings-yyc alex-rawlings-yyc force-pushed the simplify-interlinear-model branch from 271581e to 69c889d Compare May 13, 2026 23:18
Copy link
Copy Markdown
Contributor Author

@alex-rawlings-yyc alex-rawlings-yyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-rawlings-yyc reviewed 2 files and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on imnasnainaec and jasonleenaylor).

@alex-rawlings-yyc alex-rawlings-yyc force-pushed the create-interlinearizer-project branch from a6816b4 to a17e5fb Compare May 14, 2026 19:43
@alex-rawlings-yyc alex-rawlings-yyc force-pushed the simplify-interlinear-model branch from d40f053 to b3db9a2 Compare May 14, 2026 19:47
@alex-rawlings-yyc alex-rawlings-yyc force-pushed the create-interlinearizer-project branch from a17e5fb to 9a38c5d Compare May 14, 2026 21:10
…ect and add associated test, fix schema inconsistency
…update mocks to match real API signatures

- If the user picks the same project for both source and target roles, a warning notification is shown and the target picker re-opens until distinct projects are chosen or cancelled.
- Includes new tests for this flow, updated platform-bible-react/utils mocks to match current API shapes, a new localized error string, and minor test/parser cleanups.
Introduces three modal components (CreateProjectModal, ProjectMetadataModal, SelectInterlinearProjectModal) with supporting menu contributions, localized strings, command registrations in main.ts, storage helpers in projectStorage.ts, and updated type definitions. Adds full test coverage for all new components and expands existing test suites to cover the new flows.
…oss/senseRef constraint

- Replace `<div>` containers with `<dialog open aria-labelledby="…">` in `CreateProjectModal`, `ProjectMetadataModal`, and `SelectInterlinearProjectModal` for proper accessibility semantics.
- In `ProjectMetadataModal.handleSave`, return early when `updateProjectMetadata` resolves with a falsy value (mirrors the existing guard in `CreateProjectModal.handleCreate`).
- In `interlinearizer.d.ts`, replace the discriminated union on `Token` and `Phrase` that forbade setting both `gloss`/`glossSenseRef` (or `gloss`/`senseRef`) with optional fields, reflecting the updated rule that `gloss` serves as a local override when both are present.
- Add a test for the falsy-return early-exit path in `ProjectMetadataModal`, and add the missing `waitFor` before negative assertions in the `CreateProjectModal` falsy-return test.
alex-rawlings-yyc and others added 23 commits May 14, 2026 15:11
Introduces three modal components (CreateProjectModal, ProjectMetadataModal, SelectInterlinearProjectModal) with supporting menu contributions, localized strings, command registrations in main.ts, storage helpers in projectStorage.ts, and updated type definitions. Adds full test coverage for all new components and expands existing test suites to cover the new flows.
…xt, add ActiveProject

- Removes the unused bilingual container types (`InterlinearAlignment`, `InterlinearText`) whose `target` side had no role in the current single-source annotation workflow.
- Introduces `ActiveProject` as the runtime pairing of a persisted `InterlinearProject` envelope with its reconstructed `Book[]` source text.
- Adds `saveProjectAnalysis` to `projectStorage` as the designated save path for analysis/link mutations.
…0% coverage

- InterlinearProject: replace analysisWritingSystem: string with analysisLanguages: string[] (LCM/PT9 carry multiple analysis languages per project; a single string was lossy) and add optional targetProjectId so BT Extension's two-sided Translation can be represented without dropping the target text layer
- ActiveProject: add target?: Book[] alongside source so AlignmentLink.targetEndpoints resolve to real tokens at runtime
- TokenAnalysis / Phrase: remove gloss ↔ senseRef discriminated union; both fields are now independently optional and may coexist — local gloss takes precedence over the lexicon-derived value when both are present
- TokenAnalysis: add excluded?: boolean to carry PT9's Cluster.Excluded flag projectStorage: update createProject / updateProjectMetadata signatures and logic to match; updateProjectMetadata clears targetProjectId when called without it (mirrors name/description clear semantics)
- papi-backend mock: add papi.storage (readUserData / writeUserData / deleteUserData) which was missing entirely
- Remove two dead branches: parseStrictNumber's n >= 0 guard (the ^\d+$ regex makes negative results impossible) and stableStringify's undefined guard (never reachable via the typed call site)
- Add tests for duplicate verse SID throw and new storage behaviour; all 268 tests pass at 100% branch coverage
Cluster.Excluded is a deprecated PT9 tag; carrying it into the model adds noise with no consumer value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🟪Idea Idea-priority PR: can be closed... up next Auto adds an issue to the PT Lexical Extensions project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants