Skip to content

refactor(core): add shared runtime orchestration#1002

Open
phernandez wants to merge 442 commits into
mainfrom
codex/repository-explicit-sessions
Open

refactor(core): add shared runtime orchestration#1002
phernandez wants to merge 442 commits into
mainfrom
codex/repository-explicit-sessions

Conversation

@phernandez

@phernandez phernandez commented Jun 14, 2026

Copy link
Copy Markdown
Member

Why

This PR started as the explicit-session repository refactor and grew into the core half of the local/cloud runtime parity work. The goal is for basic-memory to own portable orchestration while cloud and local runtimes provide their own dependency adapters.

Paired cloud PR: basicmachines-co/basic-memory-cloud#1345.
Cloud parity document: basic-memory-cloud/docs/local-cloud-runtime-parity.md.

Closes #750.

What Changed

  • Convert repositories and call sites to explicit caller-owned sessions so cloud can provide tenant sessions without implicit session-maker state.
  • Add shared runtime contracts in basic_memory.runtime for storage events, note-content payloads, job identity/status, workflow metadata, materialization, and project/file indexing values.
  • Add basic_memory.cloud facades for note-content reads/writes, directory deletes, project deletes, and materialization so cloud/local APIs can share route-facing orchestration.
  • Add basic_memory.index as the event-based indexing orchestration layer for storage events, local filesystem watching, project fanout, and inline local processing.
  • Expand basic_memory.indexing with provider-neutral runners for accepted-note mutation/write flows, file indexing, project indexing, directory deletes, relation/vector follow-ups, and progress/result accounting.
  • Move legacy basic_memory.sync code and tests out of the active package/test tree so it is reference material only while the new index path becomes the clear file-indexing path.
  • Update API, MCP, CLI, dependency composition, and tests to use the shared runtime/index/gateway surfaces.

Public Surface

  • Local Basic Memory now has the same core note-content and event-indexing shapes that cloud uses through adapters.
  • Snapshots, file versions/history, tenant provisioning, Tigris bucket snapshots, and hosted admin behavior remain cloud-only and are not implemented in core.
  • The CLI/API route shapes have changed where needed to align with accepted note-content and async indexing behavior; the paired cloud branch consumes those new shapes.

Verification

Not rerun during PR publication. The branch has been validated incrementally during the refactor with focused core tests, just fast-check, just package-check, and paired cloud checks. Final broad core and cloud checks should run before merge.

Review Notes

This PR should be reviewed with the paired cloud PR because the ownership split is intentional: core owns portable orchestration; cloud owns hosted adapters and tenant/runtime wiring.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb65be817b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/services/entity_service.py Outdated
Comment thread src/basic_memory/sync/sync_service.py Outdated
@phernandez phernandez changed the title refactor(core): use explicit sessions in repositories refactor(core): add shared runtime orchestration Jun 23, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f338a6d382

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/indexing/project_index_workflow.py Outdated
Comment thread src/basic_memory/index/local_watch.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 703206a432

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/cloud/note_content_materialization.py Outdated
@phernandez phernandez force-pushed the codex/repository-explicit-sessions branch from 703206a to 07a8882 Compare June 23, 2026 04:47
@phernandez

Copy link
Copy Markdown
Member Author

@codex review

1 similar comment
@phernandez

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58e93ca7c8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/runtime/contracts.py Outdated
Comment thread src/basic_memory/api/v2/routers/knowledge_router.py
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

3 similar comments
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@phernandez

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@phernandez phernandez force-pushed the codex/repository-explicit-sessions branch from c3c5a7b to bd4b25c Compare June 23, 2026 14:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd4b25ca3f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/runtime/contracts.py Outdated
Comment thread src/basic_memory/repository/accepted_note_search_repository.py Outdated
Comment thread src/basic_memory/indexing/project_index_workflow.py Outdated
Comment thread src/basic_memory/indexing/accepted_note_write_runner.py
Comment thread src/basic_memory/indexing/directory_delete_runner.py
@phernandez

Copy link
Copy Markdown
Member Author

Addressed the five latest Codex P2 findings in 07c25b2:

  • Deleted-note delete responses now fall back to file_path when permalink is missing, while external_id/title remain required.
  • SQLite accepted-note search inserts now store metadata as JSON text; Postgres keeps the JSONB cast.
  • Inline project indexing now consumes batch vector_targets through the existing embedding batch planner when semantic search is enabled.
  • Accepted-note deletes clear semantic vectors before deleting the entity.
  • Directory delete batches clear semantic vectors before deleting entity rows.

Regression coverage was added for each bug. Verification:

  • just pr-1002-feedback-test
  • just local-project-index-test
  • just fast-check
  • just doctor

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. More of your lovely PRs please.

Reviewed commit: 07c25b2144

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
The change-routing loop matched project roots in arbitrary repository order and
broke on the first containing root, so a parent listed before a nested child stole
the child's file events and indexed them into the wrong project. Match deepest
(longest root) first so the most specific project wins. Test covers a nested
child with the parent listed first. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>
The directory-delete enqueuer ran the guarded file delete inline and discarded
its RuntimeFileDeleteResult, so finish_directory_delete_acceptance reported every
DB-deleted row as a successful file delete — even when the guard intentionally
left a file on disk (checksum changed), which then reappears on the next index.
Return the result up through enqueue_directory_file_delete_jobs, aggregate skipped
files in the accepted result, and report them as failed_deletes with reasons.
Cloud (queue-based) enqueuers return None and stay deferred. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>
The relation-resolution scheduler coalesced any write arriving while a pass was
pending into that pass. A write that committed after the scan read the unresolved
rows (but before the pending flag cleared) was dropped, leaving its forward
references unresolved until an unrelated later trigger. Track a dirty bit, cleared
once the scan starts, and re-arm exactly one follow-up pass when a write lands
during the scan. Bursts still coalesce to one pass. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>
Local writes accept and materialize (write the markdown file + index) off the
request path via the bounded worker pool. One-shot clients (bm tool write-note,
importers) returned right after enqueuing, and run_with_cleanup then shut the DB
and event loop down with no drain, so the source-of-truth file could be lost even
though the API reported the note accepted. Add drain_pending_materializations()
(awaits the pool) and call it in run_with_cleanup before db.shutdown_db. Long-lived
servers keep the loop alive and are unaffected. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1394bfce55

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/index/local_project.py Outdated
The router scheduled relation resolution right after the accept enqueued the
deferred materialization. Under write load the 0.5s debounce could scan before the
queued index_file inserted the new note's rows, so inbound forward references to it
stayed unresolved until an unrelated later write. Inject the relation scheduler
into the local materializer and schedule a pass after index_file completes; the
scheduler (with the dirty-bit re-arm) coalesces it with the eager router pass. The
router call is kept for cloud parity. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez

Copy link
Copy Markdown
Member Author

Worked through the remaining Codex threads — all twelve fixed, each with a regression test:

Finding Fix Commit
Prune ignored dirs during scans os.walk + dirnames prune 6009070d
Don't follow symlinked files is_symlink() skip (no-follow) 6009070d
Route nested watch events to deepest project deepest-root-first matching 52ec66f9
Drain materializations before one-shot exit drain_pending_materializations() in run_with_cleanup 936d02b0
Respect search-only reindex thread embeddings into the project-index request 2886fab2
Mark all storage write failures as failed broaden except to (FileOperationError, FileError, OSError) 9c635bee
Skip API file watching in cloud mode mirror the MCP container's cloud guard 4b687e7b
Treat skipped guarded deletes as failures return + aggregate RuntimeFileDeleteResult, report skips 1394bfce
Preserve legacy sync_delay extend the before-validator (sync_delay -> index_delay) 56444e04
Close resource DB session before file reads short scoped session, copy fields before file I/O d1f255a1
Run relation repair after deferred indexing materializer schedules resolution after index_file 39a3ac28
Remember relation writes during a pending pass dirty-bit re-arm in the resolution scheduler e29a92d4

(Plus the four earlier threads resolved as already-addressed / not-applicable.) Resolving these threads now.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 39a3ac28b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/indexing/accepted_note_mutation_runner.py
The os.walk-based scan returned an empty snapshot when the project root was
missing/unmounted (os.walk swallows the error), so the project-index coordinator
classified every indexed path as deleted and removed the project's entities. Add
an onerror that re-raises for the root (deeper failures still prune that subtree),
and re-raise rather than return an empty delete-everything snapshot when nothing
was discovered. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez

Copy link
Copy Markdown
Member Author

Re: Resolve permalink-prefixed local watch events — not applicable on this branch. The local watcher path resolves events with LocalStorageEventProjectResolver, constructed from the same local_project_prefix(project) the watcher emits (local_runtime.py:257,316), so the permalink prefix round-trips and always matches. RepositoryStorageEventProjectResolver (which matches by path/name/suffix but not permalink) is never instantiated in src/ — only in tests. Realtime indexing for spaced/cased names (e.g. Alpha Notes) is covered by the passing test_local_watch_request_uses_project_permalink_for_duplicate_leaf_roots. Resolving.

DirectoryDeleteService opened its transaction with a raw session_maker() session,
where SQLite leaves PRAGMA foreign_keys OFF. The bulk delete issues a Core DELETE
on entity and relies on ON DELETE CASCADE for note_content/observations/relations,
so those child rows were orphaned (and later reindexed). Route through
db.scoped_session, which enables foreign_keys for SQLite (and manages commit/
rollback), matching every other FK-dependent delete. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>
When a background materialization wrote the markdown file but publish then found
the note had moved (stale_file_path) or disappeared (missing) before publish, the
just-written file was left on disk while the DB no longer owned that path, so the
watcher/project index could re-import it as a duplicate. Flag those outcomes
(written_file_orphaned) and enqueue a checksum-guarded cleanup of the written file;
stale_db_version (same path, re-materialized by a newer pending version) is left
alone. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>
The accepted-note move only checked for a DB conflict on the destination, so a
move onto a file that exists on disk but is not indexed committed the entity/
note_content path to the destination while the file stayed unchanged, leaving
DB/search pointing at it and the source to be reindexed as a duplicate. Add a
storage-existence guard (verify_move_destination_absent) mirroring the create/PUT
check, gated by verify_storage_absent_on_create (local True, cloud DB-first), and
reject with 409. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez

Copy link
Copy Markdown
Member Author

Addressed the five follow-up threads from the latest Codex re-review:

Finding Resolution Commit
Enable SQLite cascades for directory deletes route DirectoryDeleteService through db.scoped_session (FK ON) f6... (SQLite cascades)
Resolve permalink-prefixed local watch events not applicable — local watcher uses LocalStorageEventProjectResolver (same prefix), resolved
Clean up files written by stale materializations flag missing/stale_file_path outcomes, checksum-guarded cleanup of the written file (orphaned cleanup)
Abort scans when the project root cannot be read os.walk onerror re-raises for the root; never return an empty delete-everything snapshot (root abort)
Reject moves onto unindexed destination files apply the create/PUT storage guard to moves (verify_move_destination_absent) (move guard)

All have regression tests. Resolving the four real ones; the watch-permalink one is resolved as not-applicable.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7feb3cd0e0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/cli/commands/db.py Outdated
Threading the CLI embeddings flag into the search/FTS rebuild (the search-only fix)
meant a full reindex embedded twice: the project-index phase ran inline embedding
sync, then the explicit embeddings phase called reindex_vectors(force_full=True),
discarding and rebuilding the vectors. The FTS rebuild must never embed — the
explicit phase owns vectors — so pass embeddings=False there. Test asserts a full
reindex calls the embedding provider once. Addresses Codex on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94812485e5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# local writes accept and materialize off the request path, so a
# one-shot client would otherwise exit before the markdown file is
# written (the API already reported the note accepted).
await drain_pending_materializations()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Drain relation follow-ups before CLI shutdown

When a local one-shot client such as bm tool write-note accepts a write, run_with_cleanup() now waits for the deferred materialization/indexing, but that materializer schedules relation resolution via an untracked debounced asyncio.create_task. After this line returns, asyncio.run() closes the loop and cancels the pending relation task, so inbound forward references to the newly written note remain unresolved until another write or full reindex runs. Add a drain/foreground path for relation-resolution follow-ups before shutting the DB/loop down.

Useful? React with 👍 / 👎.

phernandez and others added 9 commits June 25, 2026 12:26
…icit-sessions

# Conflicts:
#	tests/services/test_initialization_cloud_mode_branches.py
Add the new verify_move_destination_absent method to the move-preparer test fakes
(_CreatePreparer, _MovePreparer) so they still satisfy the AcceptedNoteMovePreparer
protocol, and guard the new reindex/materialization tests against Optional
await_args/materialization. Fixes the Static Checks (ty) failure on #1002.

Signed-off-by: phernandez <paul@basicmachines.co>
- change planning: move destinations must be paths with no indexed row,
  restoring new-files-only semantics so an in-place edit matching a deleted
  file's checksum is never reclassified as a move (regression test added)
- background schedulers and startup indexing hold strong task references so
  GC cannot cancel vector sync, project index, search reindex, or initial
  index tasks mid-flight
- move_entity validates destinations through
  normalize_note_move_destination_path, matching move_note semantics
- markdown checks consolidated onto runtime contracts helpers; link
  resolution now registers extensionless aliases for .markdown files too
- normalize_note_move_destination_path returns a plain str; the single-field
  RuntimeNoteMoveDestination wrapper is gone
- dead RuntimeStorageEventSkipReason.non_markdown removed (never emitted by
  the planner)
- watch_coordinator deferred import documented

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
run_note_materialization enqueues old-path cleanup after the materialized
write and its DB state are durable. A transient queue error there used to
propagate and mark an already-successful job as failed (and drop the
cleanup silently). The runner now catches the enqueue failure, logs it,
and surfaces cleanup_enqueue_failed on the result so runtimes can alert
on the orphaned old-path file instead of re-running finished work.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
NoteContentMutationService resolves the actor context for each mutation
through an optional NoteContentMutationActorResolver before building the
accepted-note actor. Routes keep passing their defaults; a runtime
adapter can replace them with request-derived identity (user profile,
source header, MCP actor headers) without subclassing the service.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…to focused modules

Mechanical module split, no behavior change (Step 2 of the identity split plan):

- runtime/contracts.py (2703 lines) is deleted and split into one-way layered
  modules: storage.py -> projects.py -> jobs.py -> workflows.py / note_content.py
  -> cleanup.py. runtime/__init__.py re-exports the same public names from the
  new modules (temporary barrel until Step 4).
- indexing/project_index_workflow.py (1919 lines) is split into
  project_index_coordinator.py (fan-out + workflow request),
  project_index_maintenance.py (move/delete maintenance + SQL), and a shrunken
  project_index_workflow.py (workflow metadata planning + queued/logical-key
  helpers that move to cloud later). indexing/__init__.py re-exports the same
  names.
- Cloud-only symbols (RuntimeWorkflowBroker/Transport, RuntimeQueuedWorkflowMetadata,
  RuntimeJobStatus, RuntimeJobReference, TenantId) are relocated, not removed;
  Step 3 deletes them.
- All direct `runtime.contracts` / `project_index_workflow` imports across src
  and tests are repointed to the new concrete modules; maintenance/coordinator
  tests move to test_project_index_maintenance.py / test_project_index_coordinator.py.

Placement notes vs the design doc, forced by the one-way layer order:
- WorkflowId lives in runtime/jobs.py (job request shapes carry workflow ids;
  workflows.py imports it from jobs).
- ProjectIndexMaintenanceRunner lives in project_index_maintenance.py because
  StoreProjectIndexMaintenanceRunner subclasses it and the coordinator imports
  maintenance, never the reverse.
- RuntimePendingNoteFileDelete lives in note_content.py (its planners are
  note-content write planning); cleanup.py imports it.
- Project* aliases live in storage.py since storage-level values reference them.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ith safe semantics

The change-planning fix classifies a rename onto an existing indexed path
as modify+delete (never a move), because the scan snapshot cannot
distinguish it from an in-place edit matching a deleted note's checksum —
and misclassifying that edit as a move drops the edit. The project-index
test now locks in the safe semantics. The startup-index test fake gains
add_done_callback for the strong-task-reference scheduler.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…d contracts

Identity split (Step 3 of the runtime-contracts decomposition, #1002):
basic-memory core is single-tenant and queue-agnostic; tenancy and PGQ
transport identity belong to the cloud runtime that owns them.

De-tenanted contracts (fields removed, not made optional):
- runtime/jobs.py: RuntimeStorageFileIndexJobIdentity/Context/Observation,
  RuntimeProjectIndexJobRequest, RuntimeIndexFileBatchJobRequest,
  RuntimeProjectDeleteJobRequest. Dedupe keys are now project-scoped
  (index-file:{project_id}:..., index-file-batch:{project_id}:{batch_index},
  index-project:{project_id}, delete-project:{project_id}) and routing
  headers carry project identity only.
- runtime/note_content.py + runtime/cleanup.py: note materialization and
  note-file delete job requests and their plan_* builders.
- runtime/job_payloads.py + indexing/job_payloads.py: worker payload models
  lose tenant_id/workflow_id; cloud subclasses re-add them on its wire.
- indexing runners/contexts: IndexFileRuntimeRequest, live-update and
  relation-resolution contexts/plans, embedding index requests, directory
  and project delete acceptance, accepted-note enqueue orchestration.
- ProjectIndexWorkflowRequest/Source renamed to ProjectIndexRequest(Source)
  and now carry only the project reference plus force_full/search/embeddings.
  ProjectIndexFanoutFailureRecorder no longer takes workflow_id; scope-owning
  adapters carry their own workflow identity.

Opaque queue transport:
- build_project_index_workflow_start / plan_project_index_workflow_start take
  transport_metadata + transport_event_data mappings instead of
  (transport_broker, transport_entrypoint, transport_job_id).
- RuntimeWorkflowAttemptMetadata.attempt_started_event_data takes an opaque
  transport_event_data mapping instead of pgq_job_id. Event/metadata key
  ordering is preserved so persisted rows stay byte-identical when cloud
  passes its pgq fields.

Deleted cloud-only modules/symbols (re-homed by cloud in Step 5):
runtime/job_identity.py, runtime/job_status.py, runtime/workflow_metadata.py,
RuntimeWorkflowBroker/Transport, RuntimeQueuedWorkflowMetadata,
RuntimeJobStatus, RuntimeWorkflowQueueEnvelope, RuntimeJobReference (dead),
TenantId alias, indexing/project_index_enqueue_runner.py,
project_index_workflow_logical_key + build_project_index_workflow_queued +
ProjectIndexWorkflowQueued, importers/job_payloads.py. Their unit tests moved
out of the core tree alongside them.

Local sentinel removal: LOCAL_EVENT_INDEX_TENANT_ID,
LOCAL_NOTE_CONTENT_TENANT_ID, uuid4() workflow ids in local project index,
get_runtime_tenant_id/RuntimeTenantIdDep, and the tenant param on the
delete-directory route/service are gone; local code now builds plain
project-scoped requests.

Barrels (runtime/indexing/index __init__) stay importable with the deleted
names removed; full barrel removal is a follow-up step. The workflow-gating
behavior for per-file live updates and relation resolution moves to the
runtime adapters that track workflow membership.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reduce basic_memory.runtime, basic_memory.indexing, and basic_memory.index
__init__.py modules to docstring-only packages with no re-exports or
__all__. Every consumer now imports each name from its concrete owning
module (e.g. basic_memory.runtime.storage, basic_memory.indexing.models,
basic_memory.index.local_project), including the documented deferred
imports in services/initialization.py, cli/commands/db.py, and
index/watch_coordinator.py, and TYPE_CHECKING-only imports.

The index package's cross-package re-exports (runtime storage-event
contracts and the project_index shim over basic_memory.indexing) stop
too: src/basic_memory/index/project_index.py was a pure re-export shim
and is deleted; its names resolve to their basic_memory.indexing owners.
The basic_memory.cloud package facade is unchanged - it re-exports its
own submodules, which is a real package surface, not a cross-package
barrel.

Test changes:
- delete tests/indexing/test_indexing_exports.py (barrel contract test)
- drop the runtime __all__ assertion test and pure export-surface tests
  in tests/index/test_project_index_surface.py; keep the behavioral
  signature/type-hint assertions with concrete imports
- repoint monkeypatch targets from barrel attributes to owning modules
  (basic_memory.index.watch_service.WatchService,
  basic_memory.index.local_project.*)
- entrypoint-name assertions import the constants from
  runtime.job_payloads / indexing.job_payloads directly

50 src files and 75 test files rewritten. No import cycles surfaced;
the latent index <-> services.initialization cycle stays handled by the
existing deferred imports, now pointed at concrete submodules.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ab8139c92

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# local writes accept and materialize off the request path, so a
# one-shot client would otherwise exit before the markdown file is
# written (the API already reported the note accepted).
await drain_pending_materializations()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Drain vector sync tasks before CLI shutdown

When semantic_search_enabled is true and a one-shot local client such as bm tool write-note accepts a note, the router schedules LocalEntityVectorSyncScheduler.schedule_entity_vector_sync() via _background_tasks, but this cleanup path only waits for note materializations before shutting down the DB and closing the event loop. In that CLI context the vector-sync coroutine can be cancelled before it refreshes embeddings, so the markdown/FTS state is current while semantic search remains stale until a later reindex; drain the shared scheduler tasks or run the vector follow-up synchronously before shutdown.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ecf8040 (commit 3d4d632): run_with_cleanup now drains the shared scheduler task set via a new drain_background_tasks() after pending materializations and before DB shutdown, so one-shot clients wait for vector sync instead of cancelling it at loop close. This also covers relation resolution, which had the same exposure. The drain filters on task state rather than set membership (completed tasks are pruned by a call_soon done-callback that may not have fired yet, and awaiting only done tasks never suspends — membership checks alone would busy-spin), and loops in waves so follow-up tasks scheduled by a draining task are covered too. Regression tests added in tests/cli/test_command_utils.py and tests/services/test_task_scheduler_semantic.py.

phernandez and others added 2 commits July 2, 2026 09:03
A one-shot client like 'bm tool write-note' closes the event loop right
after the command returns, cancelling in-flight vector sync and relation
resolution scheduled by the write path. The accept path had already
reported success, so semantic search and inbound wikilinks stayed stale
until an unrelated reindex.

Drain the shared scheduler task set in run_with_cleanup after pending
materializations and before DB shutdown. The drain filters on task state
rather than set membership: completed tasks are pruned by a call_soon
done-callback that may not have fired yet, and awaiting only done tasks
never suspends, so membership checks alone would busy-spin.

Addresses PR #1002 review (discussion r3511346127).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cloud main writes source="web_v2" for note edits made in the web-v2 UI
(basic-memory-cloud #1407), and the source vocabulary now lives here
after the parity split. Distinguishing web_v2 from api lets clients tell
a genuine web-UI edit apart from api/materialization round-trips.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecf8040111

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

request.file_path,
source=source,
)
except FileOperationError:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle markdown files deleted during indexing

When a local markdown file is removed after the metadata check but before file_indexer.index_file() reads it, the local markdown indexer unwraps that read failure and re-raises the underlying FileNotFoundError, so this except FileOperationError branch is skipped. That turns an expected watcher/project-index race into a failed index job instead of the intended missing result, leaving noisy failure status and delaying delete reconciliation until a later event/reindex; catch the file-not-found case here as well as FileOperationError.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor repositories to support caller-owned AsyncSession flows

1 participant