
fix(storage): JSON-serialize date/datetime metadata from YAML frontmatter#25

Open
pk8189 wants to merge 1 commit into ArcadeAI:main from portofcontext:fix/date-frontmatter-serialization

pk8189 commented May 20, 2026

What

`Database.insert_document` and `update_document` call `json.dumps(document.metadata)` to persist the metadata parsed from frontmatter. PyYAML parses frontmatter dates such as `last_push: 2026-05-19` into `datetime.date` objects, which the standard-library JSON encoder cannot serialize, so any markdown file with a date in its frontmatter fails to index:

```
TypeError: Object of type date is not JSON serializable
```
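The failure can be reproduced with the standard library alone. This sketch does not call PyYAML; instead it hand-builds the dictionary PyYAML yields for the repro frontmatter (a `datetime.date` for `last_push`). The `metadata` name is an illustrative stand-in for `document.metadata`.

```python
import datetime
import json

# Stand-in for document.metadata: what PyYAML yields for the frontmatter
# `title: My note` / `last_push: 2026-05-19` (unquoted dates become datetime.date).
metadata = {"title": "My note", "last_push": datetime.date(2026, 5, 19)}

error_message = None
try:
    json.dumps(metadata)  # the same call Database.insert_document makes
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # Object of type date is not JSON serializable
```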

Repro

This hits anyone using Obsidian, where frontmatter fields such as `created: YYYY-MM-DD` are very common. A minimal example:

```
---
title: My note
last_push: 2026-05-19
---
```

`librarian add /path/to/that/note.md` → `Error: ... Object of type date is not JSON serializable`.

Fix

Pass a small `default=` callback to `json.dumps` that converts `date` / `datetime` values to ISO 8601 strings and falls back to `str()` for any other unexpected type. Values are stored as strings and come back as strings when read, which is acceptable because metadata is informational and is not queried as dates.

Tests

New `tests/test_database.py` with regression tests for both `insert_document` and `update_document`. Verified the tests fail on `main` (reproducing `TypeError`) and pass with this change. Existing `test_parser.py` continues to pass.

Notes

  • Helper is named `_json_default` and lives next to `get_effective_embedding_dimension` so other call sites that serialize metadata can reuse it if needed.
  • No new dependencies.
  • Falls under "bug fix with regression test" per CONTRIBUTING.md.


Frontmatter in Obsidian and other markdown notes commonly contains
`YYYY-MM-DD` values that PyYAML parses into `datetime.date`, e.g.:

    ---
    last_push: 2026-05-19
    ---

When `MarkdownParser` extracted the frontmatter and `Database.insert_document`
ran `json.dumps(document.metadata)`, this crashed with:

    TypeError: Object of type date is not JSON serializable

Add a small `_json_default` fallback that converts `date` / `datetime`
to ISO strings (and falls back to `str()` for anything else). Round-tripped
values come back as strings — acceptable because metadata is informational
and not queried as dates.

Includes a regression test that fails before this change and passes after,
covering both `insert_document` and `update_document`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>