Long-doc (PageIndex) images are extracted but never surface in the rendered wiki #166

## Problem

For long documents that route through PageIndex (`doc_type: pageindex`), images *are* correctly extracted to `wiki/sources/images/<doc_name>/` and referenced with correct wiki-relative paths inside `wiki/sources/<doc_name>.json` (each page object has `"images": [{"path": "sources/images/<doc>/pX_imgY.png"}]`, and the paths are also inlined as `![image](...)` in each page's `content`).

However, `tree_renderer.py`'s `render_summary_md()` — which builds `wiki/summaries/<doc_name>.md`, the actual page a user opens in Obsidian — never reads or embeds any of this. Its per-node renderer (`_render_nodes_summary`) explicitly strips `![]()` syntax found in `node["text"]` (necessary, since PageIndex's own embedded refs point into a private `.openkb/files/{doc_id}/images/...` cache that doesn't resolve from the wiki), but never re-inserts the *correctly*-pathed images that live in the page JSON.

Net effect: images are on disk, and technically "referenced" in a JSON data file, but **invisible everywhere a human actually browses the vault** — not in the summary, not in any concept/entity page, not in `index.md`. `wiki/sources/<doc_name>.json` isn't rendered as a wiki page by Obsidian (or anything else), so those references are effectively inert.

## Reproduction

1. `openkb add` a PDF long enough to trigger PageIndex (`pageindex_threshold`, default 20 pages) with no `PAGEINDEX_API_KEY` set, so that it falls back to local pymupdf extraction, which does extract images (see `images.py:convert_pdf_to_pages`).
2. Open the resulting `wiki/summaries/<doc>.md` in Obsidian, or run `grep '!\[' wiki/summaries/*.md wiki/concepts/*.md wiki/entities/*.md wiki/index.md`.
3. Observe zero image references anywhere, even though `wiki/sources/images/<doc>/` contains real extracted files and `wiki/sources/<doc>.json` references them correctly.
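
The check in steps 2 and 3 can also be scripted. The following is a minimal sketch, not part of `openkb`: the helper names are hypothetical, and it assumes only what is described above, namely that each page object carries an `"images"` list of `{"path": ...}` entries and that rendered pages use standard `![alt](path)` syntax.

```python
import re

# Matches standard Markdown image syntax and captures the path,
# e.g. ![image](sources/images/doc/p1_img1.png)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def json_image_paths(pages):
    """Collect every image path referenced by the per-page objects."""
    paths = []
    for page in pages:
        for image in page.get("images", []):
            paths.append(image["path"])
    return paths


def markdown_image_paths(markdown_text):
    """Return the image paths embedded in a rendered Markdown page."""
    return IMAGE_REF.findall(markdown_text)


def missing_from_summary(pages, summary_text):
    """List image paths present in the page data but absent from the summary."""
    rendered = set(markdown_image_paths(summary_text))
    return [path for path in json_image_paths(pages) if path not in rendered]
```

To run this against a real ingest, load the page objects from `wiki/sources/<doc>.json` with `json.load` and read `wiki/summaries/<doc>.md` as text. How the page list is nested inside the JSON file is not specified here, so that lookup is left to the caller. With the bug present, `missing_from_summary` should return every extracted path.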

Confirmed on `openkb` 0.4.2 with a 31-page manual (35 images extracted across 21 pages, 0 surfaced in the summary).

## Relation to existing issues

- #74 is about giving the LLM *vision* over referenced images during compilation — a different problem (that's about compile-time understanding; this is about images not being visible in the rendered output at all, regardless of whether the LLM ever "saw" them).
- #135 is about unifying the PDF *extraction* backend between short/long paths — doesn't touch the summary-rendering gap described here.

## Suggested fix

`_write_long_doc_artifacts` in `indexer.py` already has the per-page `pages` list (with images) in scope when it calls `render_summary_md`; the list is just not passed through. `render_summary_md` and `_render_nodes_summary` could accept that list, build a `page_num -> [image paths]` map, and embed the images for each node's page range inline. To keep a page that is split across many sibling nodes from repeating the same figure at every one of them, the renderer could track already-emitted paths, the same way duplicate summaries are already collapsed.
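
A rough sketch of that approach, as an illustration rather than the actual patch. The function names are hypothetical, and the field names are assumptions: each page object is assumed to expose its number under `"page"`, and each tree node is assumed to carry `title`, `summary`, `start_index`, `end_index`, and child `nodes`. The real `tree_renderer.py` structures may differ.

```python
from collections import defaultdict


def build_page_image_map(pages):
    """Map page number -> list of wiki-relative image paths.

    Assumes each page dict has a "page" number (hypothetical key) and an
    optional "images" list of {"path": ...} entries.
    """
    page_images = defaultdict(list)
    for page in pages:
        for image in page.get("images", []):
            page_images[page["page"]].append(image["path"])
    return dict(page_images)


def images_for_node(node, page_images, emitted):
    """Return Markdown embeds for a node's page range, skipping repeats.

    `emitted` is shared across the whole render so that sibling nodes
    covering the same page do not repeat a figure.
    """
    lines = []
    for page_num in range(node["start_index"], node["end_index"] + 1):
        for path in page_images.get(page_num, []):
            if path in emitted:
                continue
            emitted.add(path)
            lines.append(f"![image]({path})")
    return lines


def render_nodes(nodes, page_images, emitted=None, depth=2):
    """Render a node tree to Markdown lines with images embedded inline."""
    if emitted is None:
        emitted = set()
    lines = []
    for node in nodes:
        lines.append(f"{'#' * depth} {node['title']}")
        if node.get("summary"):
            lines.append(node["summary"])
        lines.extend(images_for_node(node, page_images, emitted))
        lines.extend(
            render_nodes(node.get("nodes", []), page_images, emitted, depth + 1)
        )
    return lines
```

Because the `emitted` set is shared across the whole traversal, each image appears once, at the first node whose page range covers it. In this sketch a parent node is rendered before its children, so a parent spanning several pages would claim those pages' images ahead of its sub-sections; emitting images only at leaf nodes is an alternative worth considering.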

Happy to share a working patch/diff if useful — implemented and verified this locally against a real ingest (35/35 images now appear in the rendered summary, none duplicated across sibling nodes on the same page).
