feat(tiptap): convert Story Block content to Markdown (#35727) #35728

Open
wezell wants to merge 2 commits into main from issue-35727-tiptap-markdown-converter

wezell (Member) commented May 15, 2026

Closes #35727

Summary

  • Adds com.dotcms.tiptap.TiptapMarkdown — bidirectional converter between Tiptap JSON (Story Block / ProseMirror) and Markdown.
  • Exposes it to Velocity templates:
    • $contentlet.storyBlock.toMarkdown() (StoryBlockMap)
    • $markdownTool.blockToMarkdown(json) (MarkdownTool)
  • Adds org.commonmark:commonmark + -ext-gfm-tables + -ext-gfm-strikethrough (0.22.0). Zero transitive runtime deps (~250KB total).

What it handles

Nodes: paragraph, heading 1-6, blockquote, bulletList, orderedList, listItem, codeBlock (with language), horizontalRule, hardBreak, image, table/tableRow/tableHeader/tableCell, plus dotCMS-specific dotImage and youtube.

Marks: bold, italic, strike, code, link.

Graceful degradation: marks with no markdown equivalent (underline, highlight, subscript, superscript, textStyle, color) are dropped silently. Any other unknown node/mark logs once at INFO via Logger.info and is skipped — Tiptap is extensible, so the converter never throws on user-extended schemas.

Notable correctness details

  • Whitespace lifting. Markdown emphasis cannot close after a space (*x * is invalid). The serializer extracts trailing whitespace out of mark spans before emitting closers, and leading whitespace before openers, so output is always well-formed and parses back to the same structure.
  • Code-context escaping. Text inside inline code marks or codeBlock nodes is emitted literally — special chars are NOT backslash-escaped.
  • Dynamic fence width. A codeBlock whose body contains triple backticks gets a longer fence (4+ ticks) so the fence can't collide.
  • Pipe escaping in table cells, and a deterministic mark precedence (link > bold > italic > strike > code, outer→inner).
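
To make the whitespace-lifting rule above concrete, here is a minimal sketch. It is not the code in TiptapMarkdown: the class and method names (WhitespaceLift, wrap) are hypothetical, and it assumes a single mark applied to a single text run, treating only spaces and tabs as liftable.

```java
final class WhitespaceLift {

    /**
     * Wraps text in a mark delimiter (for example "*" or "**"), moving any
     * leading and trailing spaces/tabs outside the delimiters so the emphasis
     * run never starts or ends on whitespace.
     */
    static String wrap(String text, String delimiter) {
        int start = 0;
        int end = text.length();
        while (start < end && isLiftable(text.charAt(start))) {
            start++;
        }
        while (end > start && isLiftable(text.charAt(end - 1))) {
            end--;
        }
        if (start == end) {
            // Empty or whitespace-only run: emit it without any delimiters.
            return text;
        }
        return text.substring(0, start)
                + delimiter + text.substring(start, end) + delimiter
                + text.substring(end);
    }

    // Assumption: only spaces and tabs are lifted; newlines are left alone.
    private static boolean isLiftable(char c) {
        return c == ' ' || c == '\t';
    }
}
```

With this sketch, WhitespaceLift.wrap("x ", "*") yields "*x* " instead of the invalid "*x *".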

Test plan

  • TiptapMarkdownTest — 49 synthetic unit tests covering every supported node, every mark, escaping, fence-width, JSON-string overload, round-trip stability per node type.
  • TiptapMarkdownBlogContentTest — 7 tests against blog-test.json (trimmed to 2 real Story Block bodies, 122KB), verifying:
    • every node/mark in real content is supported
    • non-empty markdown output for every blog
    • re-parses to a non-empty Tiptap doc
    • reaches a stable fixed point after one normalization pass
    • distinctive text survives the round-trip
    • inline-code content emitted literally
  • Full module compile (./mvnw compile -pl :dotcms-core) clean.
  • Reviewer: manually hit a Story Block field via $contentlet.storyBlock.toMarkdown() in a Velocity template to sanity-check end-to-end wiring.

Out of scope (documented)

  • HTML blocks inside markdown are preserved as a paragraph of raw HTML text (no DOM parsing — Tiptap.js relies on the browser DOM here; the Java port does not).
  • youtube renders as a plain link to the video src (markdown has no native embed). Reviewer call: switch to an <iframe> HTML block if richer rendering is wanted.
  • Marks like underline etc. are intentionally lossy on the JSON→MD direction since markdown lacks the syntax.

🤖 Generated with Claude Code

…35727)

Adds TiptapMarkdown (com.dotcms.tiptap) which converts Story Block /
ProseMirror documents to markdown and back. Wires it into the existing
renderable surface so Velocity can call:

  $contentlet.storyBlock.toMd
  $markdownTool.blockToMarkdown(json)

Supports paragraph, heading 1-6, blockquote, bullet/ordered lists,
codeBlock with language, horizontalRule, hardBreak, image, GFM tables,
plus the dotCMS-specific dotImage and youtube extensions. Marks:
bold, italic, strike, code, link. Marks with no markdown equivalent
(underline, highlight, sub/superscript, textStyle, color) are dropped
silently; truly unknown nodes/marks log once at INFO and are skipped
so the converter never throws on user-extended Tiptap schemas.

Markdown -> Tiptap uses commonmark-java with GFM tables and
strikethrough extensions (zero transitive runtime deps).

Tests: 56 passing -- 49 synthetic + 7 against a trimmed real-blog
fixture (blog-test.json) covering every node and mark plus
fixed-point round-trip stability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot added the "Area : Backend" label (PR changes Java/Maven backend code) May 15, 2026
claude Bot (Contributor) commented May 15, 2026

Claude finished @wezell's task in 3m 9s


PR Review — feat(tiptap): convert Story Block content to Markdown

  • Read TiptapMarkdown.java (converter core, 1071 LOC)
  • Read MarkdownTool, StoryBlockMap (Velocity surface)
  • Read tests (unit + blog-content)
  • Verify POM dependency wiring + real dotImage shape

Solid PR overall — converter is well-organized, tests are real (49 unit + 7 fixture-driven), no obvious correctness landmines on the happy path. Below is what I'd want a second pair of eyes on before merge.


Worth a second look

1. Dead code: ListCtx and listStack - TiptapMarkdown.java:461,1066-1069
listStack is declared and never pushed/popped/read. The ListCtx inner class is never instantiated. Only listDepth is actually used. Either delete both or wire them up. Right now they imply nested list/indent logic that doesn't exist.

2. Inline-code body can collide with its own fence — TiptapMarkdown.java:991,1002
codeBlock does smart fence widening (pickFence), but inline code always opens/closes with a single backtick. A Tiptap code mark whose text contains a backtick (`) will emit broken markdown like `a`b`. The CommonMark fix is to wrap with the smallest run-of-backticks that's longer than any run inside the body, plus a space pad if it starts/ends with a backtick. Worth handling — code marks containing backticks are common in technical writing about shell/markdown itself.
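
A minimal sketch of that fix, assuming the mark body arrives as a plain String. The class and method names (InlineCodeFence, wrap) are hypothetical and not part of the PR; String.repeat requires Java 11+.

```java
final class InlineCodeFence {

    /**
     * Wraps an inline-code body in a backtick run one longer than the longest
     * backtick run inside the body, padding with a space on each side when the
     * body itself starts or ends with a backtick.
     */
    static String wrap(String body) {
        int longest = 0;
        int run = 0;
        for (int i = 0; i < body.length(); i++) {
            if (body.charAt(i) == '`') {
                run++;
                longest = Math.max(longest, run);
            } else {
                run = 0;
            }
        }
        String fence = "`".repeat(longest + 1);
        boolean pad = !body.isEmpty()
                && (body.charAt(0) == '`' || body.charAt(body.length() - 1) == '`');
        String inner = pad ? " " + body + " " : body;
        return fence + inner + fence;
    }
}
```

For example, InlineCodeFence.wrap("a`b") produces a two-backtick fence around a`b. One case the sketch leaves out: CommonMark strips one space from each end of a code span whose content both begins and ends with a space, so a body like " x " would also need padding to round-trip exactly.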

3. dotImage rendering ignores href (and data) - TiptapMarkdown.java:521-523
Real fixture shape (verified in blog-test.json:3005-3015) includes href, target, data on dotImage.attrs. The renderer only reads src/alt/title, so a linked dotImage (href set) loses the link on conversion. If the Story Block UI allows wrapping images in links, this is a content loss bug. Either render as [![](src)](href) when href is set, or document the loss.
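
The linked-image form could look roughly like this. It is a sketch only: LinkedImage and render are hypothetical names, the attributes are assumed to have already been read from dotImage.attrs as Strings, and escaping of alt text and URLs is deliberately left out.

```java
final class LinkedImage {

    /**
     * Renders an image as markdown, wrapping it in a link when href is set.
     * Assumption: src is non-null; alt and href may be null.
     */
    static String render(String src, String alt, String href) {
        String image = "![" + (alt == null ? "" : alt) + "](" + src + ")";
        if (href == null || href.isBlank()) {
            // No link target: plain image, same as the current behavior.
            return image;
        }
        return "[" + image + "](" + href + ")";
    }
}
```

LinkedImage.render("/img/a.png", "Logo", "https://example.com") gives [![Logo](/img/a.png)](https://example.com), while a null or blank href falls back to the plain image form.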

4. Table column alignment is silently dropped - TiptapMarkdown.java:751-754
GFM tables support :--, :-:, --: alignment markers. The Markdown → Tiptap path doesn't store alignment on cells, and the Tiptap → Markdown path always emits ---. So | col |\n|:---:| round-trips to a left-aligned table. Fine if you don't care about alignment fidelity, but the PR description implies bidirectional round-trip stability — call out the exception or fix it.
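
If alignment fidelity matters, the Tiptap → Markdown side reduces to emitting a per-column delimiter cell. The sketch below assumes the Markdown → Tiptap path has been extended to store a per-column alignment value of "left", "center", "right", or null; that attribute, and the names TableDelimiter and row, are hypothetical and do not exist in the PR.

```java
import java.util.List;

final class TableDelimiter {

    /** Builds the GFM delimiter row for the given per-column alignments. */
    static String row(List<String> alignments) {
        StringBuilder sb = new StringBuilder("|");
        for (String alignment : alignments) {
            sb.append(' ').append(marker(alignment)).append(" |");
        }
        return sb.toString();
    }

    // Null or unrecognized values fall back to the unaligned marker.
    private static String marker(String alignment) {
        switch (alignment == null ? "" : alignment) {
            case "left":
                return ":---";
            case "center":
                return ":---:";
            case "right":
                return "---:";
            default:
                return "---";
        }
    }
}
```

A column list of left, center, right, and unset yields | :--- | :---: | ---: | --- |.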

5. HtmlBlock round-trip is corrupting, not just lossy - TiptapMarkdown.java:232-236 + 1023-1039
Markdown → Tiptap parses an HTML block into a paragraph containing a text node with raw HTML literal. Tiptap → Markdown then runs that text through escapeText, which escapes < and >. So <div>x</div> round-trips to \<div\>x\</div\>. The PR description says HTML blocks are "preserved as a paragraph of raw HTML text" — but they're not preserved, they're escaped on the way back out. Either mark these text nodes with a "raw" flag and skip escaping, or be honest in the description.

6. StoryBlockMap.toMarkdown() returns raw HTML when the field holds HTML - StoryBlockMap.java:133-138
Method named toMarkdown returns un-converted HTML when jsonContFieldValue is null. That's a footgun in a Velocity template where the author expects markdown. Either:

  • run HTML through a markdown converter (the existing txtmark/commonmark pipeline can go the other way with a flexmark HTML→MD or jsoup), or
  • rename to toMarkdownOrHtml() / throw / return empty string.

Current behavior is documented in the javadoc but the method name lies.

7. MarkdownTool.blockToMarkdown(String) silently returns the input on non-JSON - MarkdownTool.java:136-144
If parse isn't valid JSON, the method returns it unchanged. Undocumented. A template author passing a broken JSON string will see their JSON appear in rendered output instead of an error or empty string. At minimum log a warning; better, return empty string and log so the failure is visible.

8. Inconsistent error semantics across overloads

  • TiptapMarkdown.toMarkdown(JsonNode) — null-safe, returns ""
  • TiptapMarkdown.toMarkdown(String) — throws IllegalArgumentException on bad JSON
  • TiptapMarkdown.toMarkdown(JSONObject) — null-safe, returns "", but routes through the String overload so a malformed JSONObject.toString() would still throw

Pick one contract and apply it consistently. The unit test render_string_overload_throws_on_invalid_json documents this, but it's a sharp edge for callers.

9. escapeText over-escapes - TiptapMarkdown.java:1023-1039
Escapes {, }, <, >, +, -, !, #, (, ) unconditionally. Most of these are only special at specific positions (e.g. # only at start-of-line, - only as list marker, ! only before [). The output parses correctly but reads like noise — a plain paragraph "Cost: $10 (USD)" becomes Cost: $10 \(USD\). Not a bug, but the rendered markdown will look ugly to humans inspecting it. Consider context-aware escaping if you care about readability of the markdown output itself.
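
A rough sketch of position-aware escaping, to show the shape of the idea rather than a complete rule set. ContextEscaper and escape are hypothetical names. The sketch assumes the text passed in begins at the start of a line, and it covers only the characters discussed here plus the core emphasis and link characters; ordered-list markers, entities, table pipes, and strikethrough tildes are not handled.

```java
final class ContextEscaper {

    /** Escapes markdown-significant characters only where they can take effect. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        boolean lineStart = true; // true until the first non-space char of a line
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean special;
            switch (c) {
                case '\\': case '`': case '*': case '_':
                case '[': case ']': case '<':
                    special = true; // can be significant anywhere in a line
                    break;
                case '#': case '-': case '+': case '>':
                    special = lineStart; // heading, list, or quote marker only at line start
                    break;
                case '!':
                    special = i + 1 < text.length() && text.charAt(i + 1) == '[';
                    break;
                default:
                    special = false; // includes ( ) { }, left untouched
            }
            if (special) {
                out.append('\\');
            }
            out.append(c);
            if (c == '\n') {
                lineStart = true;
            } else if (c != ' ') {
                lineStart = false;
            }
        }
        return out.toString();
    }
}
```

Under these rules "Cost: $10 (USD)" passes through unchanged, because parentheses cannot form a link once [ and ] are escaped, while "# title" still becomes "\# title".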

10. Image src/href aren't escaped for ) - TiptapMarkdown.java:711-721, 997-1009
A URL containing a literal ) (rare but legal — Wikipedia URLs, etc.) breaks the markdown link syntax. Either URL-encode or use the angle-bracket form <url> when the URL contains ). Edge case but easy fix.
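
The angle-bracket option might be sketched like this. LinkDestination and format are hypothetical names; the sketch conservatively brackets any URL containing parentheses, spaces, angle brackets, or backslashes, and does not handle line breaks, which CommonMark does not allow in either destination form.

```java
final class LinkDestination {

    /**
     * Returns the URL as-is when it is safe inside (...), otherwise the
     * angle-bracket form with backslash, '<' and '>' escaped.
     */
    static String format(String url) {
        boolean needsBrackets = false;
        for (int i = 0; i < url.length() && !needsBrackets; i++) {
            char c = url.charAt(i);
            needsBrackets = c == '(' || c == ')' || c == ' '
                    || c == '<' || c == '>' || c == '\\';
        }
        if (!needsBrackets) {
            return url;
        }
        // Escape the backslash first so later escapes are not doubled.
        String escaped = url.replace("\\", "\\\\")
                .replace("<", "\\<")
                .replace(">", "\\>");
        return "<" + escaped + ">";
    }
}
```

A caller would then emit "[" + label + "](" + LinkDestination.format(url) + ")", so a Wikipedia-style URL ending in _(bar) is written in angle brackets rather than closing the link early.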

11. No tests for the Velocity surface
MarkdownTool.blockToMarkdown(...) and StoryBlockMap.toMarkdown() have zero direct tests. The converter is well-tested in isolation, but the Velocity wrappers (null handling, the HTML-passthrough branch in StoryBlockMap, the silent-passthrough branch in MarkdownTool) have logic that's untested. Worth a small unit test class for each.

12. TiptapMarkdown.toMarkdown(JSONObject) is a wasteful double-parse - TiptapMarkdown.java:116-119
tiptap.toString() (org.json) → MAPPER.readTree(json) (Jackson) for every Story Block render. For a 100KB Story Block body that's measurable. Consider walking the org.json.JSONObject directly with a small adapter, or caching the parsed JsonNode on StoryBlockMap so repeated toHtml/toMarkdown calls don't re-parse.


Minor / nits

  • renderInline lifts only leading/trailing spaces and tabs; newlines aren't handled. In practice text nodes won't contain \n, but worth a comment.
  • repeatStr is dead-equivalent to String.repeat(int) available on Java 11+. Same for the repeat(char, int) helper.
  • The PR description claims "zero transitive runtime deps" — commonmark-java is leaf, but worth confirming with mvn dependency:tree in CI rather than asserting it.
  • Round-trip tests prove fixed-point convergence after one normalization pass, not literal idempotence on the raw input. That's the right call (markdown has many equivalent representations) but the PR description should be explicit.

Security / risk

Nothing scary. No reflection, no eval, no user-controlled URLs reaching a network call. Worst case is a malformed Tiptap doc producing odd-looking markdown — converter never throws on unknown nodes/marks (good).


Recommendation

Approve after addressing #3 (dotImage href), #5 (HtmlBlock corruption), and #6 (StoryBlockMap.toMarkdown misleading return). The rest are nice-to-have but the three above are actual content-loss/correctness issues users will hit.
· Branch: issue-35727-tiptap-markdown-converter

Mirrors the naming of the existing toHtml() Renderable surface so
Velocity callers use `$contentlet.storyBlock.toMarkdown` alongside
`$contentlet.storyBlock.toHtml`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Convert Tiptap Story Block content to Markdown
