23 Ways the Google Docs API Will Silently Corrupt Your Document

If you've built anything that programmatically edits Google Docs, you've hit these. In most of these cases the API doesn't return an error and doesn't throw an exception. It accepts your batchUpdate request, returns 200, and silently produces wrong output.

This repository contains 23 reproducible mutation pairs — before state, API request, after state — demonstrating every major failure mode in the Google Docs batchUpdate API. Each was captured against a live Google Doc. Each is deterministically reproducible.

These are not edge cases. These are the normal operations: insert text, add a table, apply bold, create a list. The API makes every single one of them dangerous.

The Core Problem

The Google Docs API operates on absolute UTF-16 index positions. Every character, every structural element, every table cell boundary occupies a numbered position in a flat index space. When you insert 10 characters at position 50, every index after position 50 shifts by 10.

The API does not adjust subsequent requests in a batch. If you send a batchUpdate with three requests, and the first one shifts indices, the second and third requests are now pointing at the wrong locations. The API will happily execute them at those wrong locations. No error. No warning. Corrupted document.

This means:

Every insertion invalidates every subsequent index in the batch. You must compute cascading offsets yourself.
Every deletion shifts indices backward. Miss one and you delete the wrong paragraph.
Compound operations (insert table + fill cells) require reading the document between steps. The first step changes the index landscape in ways you cannot predict without re-reading.
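Indices count UTF-16 code units, not characters. Emoji and anything else outside the Basic Multilingual Plane (including some rarer CJK ideographs) consume 2 index positions (a surrogate pair). Most CJK characters sit inside the BMP and consume 1. Python's len() counts code points, so it gives you the wrong number for any text containing a surrogate pair.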
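
The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not code from this repository: utf16_len and cascade_inserts are hypothetical helper names, and the sketch assumes every edit is a plain insertText whose index was computed against the unmodified document.

```python
def utf16_len(text: str) -> int:
    """Number of UTF-16 code units in text, which is what the Docs API counts."""
    return len(text.encode("utf-16-le")) // 2


def cascade_inserts(inserts: list[tuple[int, str]]) -> list[dict]:
    """Turn (original_index, text) pairs into insertText requests.

    Indices are given against the unmodified document. Each request's index
    is shifted by the total length of everything inserted before it.
    """
    requests = []
    offset = 0
    # sorted() is stable, so inserts at the same index keep their given order.
    for index, text in sorted(inserts, key=lambda pair: pair[0]):
        requests.append(
            {"insertText": {"location": {"index": index + offset}, "text": text}}
        )
        offset += utf16_len(text)
    return requests
```

A common alternative is to send the inserts in descending index order instead, so that each insertion only shifts content that has already been handled and no offsets need tracking.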

Fixture Structure

fixtures/
  01_plain_text.json              # 13 document fixtures (documents.get snapshots)
  02_heading_hierarchy.json
  ...
  13_bookmarks.json
  manifest.json                   # Document IDs for each fixture

  mutations/                      # 23 mutation pairs
    T1_insert_text_start/
      before.json                 # Full document state before mutation
      request.json                # Exact batchUpdate request(s) sent
      after.json                  # Full document state after mutation
      description.md              # What the mutation does and why it matters

    T2_insert_text_end/
    ...
    N3_delete_named_range/

  validation/                     # 8 multi-step validation scenarios
    insert_text_after_heading/
      compiled_request.json       # The request sequence
      after.json                  # Resulting document state
      result.txt                  # PASS/FAIL
    ...

Every mutation directory contains four files: the document state before, the exact API request, the document state after, and a description of what the operation does. You can diff before.json and after.json to see exactly what changed.
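
One way to do that diff without leaving Python is a small recursive comparison. This is a sketch under simple assumptions: diff_json and diff_mutation are hypothetical names, missing keys are treated the same as None, and lists are compared position by position.

```python
import json
from pathlib import Path


def diff_json(before, after, path="$"):
    """Yield (path, before_value, after_value) for every value that differs."""
    if isinstance(before, dict) and isinstance(after, dict):
        for key in sorted(before.keys() | after.keys()):
            yield from diff_json(before.get(key), after.get(key), f"{path}.{key}")
    elif isinstance(before, list) and isinstance(after, list):
        for i in range(max(len(before), len(after))):
            b = before[i] if i < len(before) else None
            a = after[i] if i < len(after) else None
            yield from diff_json(b, a, f"{path}[{i}]")
    elif before != after:
        yield path, before, after


def diff_mutation(mutation_dir):
    """Diff before.json against after.json inside one mutation directory."""
    folder = Path(mutation_dir)
    before = json.loads((folder / "before.json").read_text(encoding="utf-8"))
    after = json.loads((folder / "after.json").read_text(encoding="utf-8"))
    return list(diff_json(before, after))
```

Because lists are compared by position, one inserted paragraph makes every later element show up as changed. That is noisy, but it also makes the index cascade visible: every startIndex and endIndex after the edit point appears in the output.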

All 23 Failure Modes

Text Operations

ID	Operation	What you want	What goes wrong	Severity
T1	Insert text at document start	Add a paragraph at the top	All subsequent indices in the batch shift by `len(text)`. Second request in batch targets wrong position.	Data loss
T2	Insert text at document end	Append a paragraph	Must find the correct end position. The document's `endIndex` is not a valid insertion point: the trailing `\n` sits at `endIndex - 1`, and `endIndex` itself is one past the end of the segment. Inserting at `endIndex` fails silently or corrupts.	Corruption
T3	Insert after a heading	Add text under "Revenue Analysis"	Requires walking the document tree to find the heading's `endIndex`. No semantic addressing — only raw indices. Wrong index = text lands in wrong section.	Corruption
T4	Insert between paragraphs	Add a paragraph between two existing ones	Same index arithmetic problem as T1-T3. Must count paragraphs manually (skip the implicit `sectionBreak` at index 0).	Corruption
T5	Replace a section	Replace body text under a heading	Two-step: delete range, then insert. The delete shifts all subsequent indices. The insert must use the post-delete index. Get it wrong and you overwrite the wrong section.	Data loss
T6	Delete a paragraph	Remove a paragraph	Cannot delete the `sectionBreak` (index 0-1) or the final trailing `\n`. Attempting either fails silently. All subsequent indices shift backward — miss one and the next operation corrupts.	Data loss
T7	Replace all text	Find and replace	The only safe text operation. The API handles index arithmetic internally. Zero complexity. Everything else should be this simple.	Cosmetic
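
For T2 and T3, the index lookups can be sketched against a documents.get response. The function names here are hypothetical, and the sketch assumes headings are paragraphs whose paragraphStyle.namedStyleType starts with HEADING_ and that an exact, whitespace-trimmed text match is good enough.

```python
from typing import Optional


def end_insertion_index(document: dict) -> int:
    """Last valid insertion index in the body: just before the final newline."""
    last = document["body"]["content"][-1]
    return last["endIndex"] - 1


def paragraph_text(paragraph: dict) -> str:
    """Concatenate the text runs of a paragraph."""
    return "".join(
        el.get("textRun", {}).get("content", "")
        for el in paragraph.get("elements", [])
    )


def index_after_heading(document: dict, heading_text: str) -> Optional[int]:
    """Return the endIndex of the first matching heading, or None if absent."""
    for element in document["body"]["content"]:
        paragraph = element.get("paragraph")
        if paragraph is None:
            continue  # section breaks, tables, and so on
        style = paragraph.get("paragraphStyle", {}).get("namedStyleType", "")
        if style.startswith("HEADING_") and paragraph_text(paragraph).strip() == heading_text:
            return element["endIndex"]
    return None
```

If the heading is the last paragraph in the document, its endIndex equals the body's endIndex, so a caller would still need to clamp the result to end_insertion_index(document).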

Formatting Operations

ID	Operation	What you want	What goes wrong	Severity
F1	Apply bold	Bold a phrase	The `fields` parameter is a required field mask. Omit it and the API silently clears every other style property on the range — font, size, color, everything. The API returns 200.	Data loss
F2	Change heading level	Promote/demote a heading	Google auto-generates a `headingId` that you cannot predict or control. If your code references heading IDs downstream, they're now stale.	Corruption
F4	Change font	Set font family and size	Range must exclude the trailing `\n` of the paragraph. Include it and you bleed the style into the next paragraph. No error.	Corruption
F5	Add hyperlink	Link text to a URL	Google silently auto-applies underline and blue foreground color (rgb 0.067, 0.333, 0.8). If you also apply these manually, you double-apply and create a styling conflict that survives link removal.	Cosmetic
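
The F4 rule (stop before the paragraph's trailing newline) is easy to encode once. A minimal sketch, assuming you already have the paragraph's startIndex and endIndex from a documents.get response; font_request is a hypothetical helper name.

```python
def font_request(para_start: int, para_end: int, family: str, size_pt: float) -> dict:
    """Build an updateTextStyle request that excludes the trailing newline."""
    end = para_end - 1  # the paragraph's last index position holds its "\n"
    if end <= para_start:
        raise ValueError("paragraph has no text besides its trailing newline")
    return {
        "updateTextStyle": {
            "range": {"startIndex": para_start, "endIndex": end},
            "textStyle": {
                "weightedFontFamily": {"fontFamily": family},
                "fontSize": {"magnitude": size_pt, "unit": "PT"},
            },
            # Only these two properties are touched; everything else is kept.
            "fields": "weightedFontFamily,fontSize",
        }
    }
```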

Structural Operations

ID	Operation	What you want	What goes wrong	Severity
S1	Insert table	Add a table at a location	Insertion index must be strictly less than the segment's `endIndex`. Insert at the end → undocumented failure. A 3x3 table with 4-character cells consumes 60 index positions. Miscalculate and every subsequent operation in the batch corrupts.	Data loss
S2	Add table row	Append a row to a table	Must know the table's `startIndex` (not the row's), the row index, and the column index. A new row consumes `1 + cols * 2` indices. These cascade into everything after the table.	Corruption
S5	Insert bullet list	Create a bulleted list	Two-step operation: insert text first, then apply bullet formatting with `createParagraphBullets`. Send them in the wrong order and the bullet range points at nonexistent text.	Corruption
S6	Insert numbered list	Create a numbered list	Same two-step problem as S5. Numbered list preset is `NUMBERED_DECIMAL_ALPHA_ROMAN`. Not discoverable from the API — you need the docs.	Corruption
S7	Convert paragraphs to list	Bullet existing paragraphs	Must calculate the exact `startIndex` and `endIndex` spanning all target paragraphs. Off by one and you either miss a paragraph or bullet the wrong one.	Cosmetic
S8	Insert page break	Add a page break	A page break creates a two-element paragraph: a `pageBreak` element + a trailing `textRun` containing `\n`. Consumes 2 index positions, not 1. Downstream index arithmetic is wrong if you assume 1.	Corruption
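
The two-step list pattern from S5 and S6 can be sketched as a function that always emits the requests in the safe order. bullet_list_requests is a hypothetical name; the sketch assumes the insertion index is the start of a paragraph and uses the BULLET_DISC_CIRCLE_SQUARE preset for unordered lists.

```python
def utf16_len(text: str) -> int:
    """Number of UTF-16 code units in text."""
    return len(text.encode("utf-16-le")) // 2


def bullet_list_requests(index: int, items: list[str], numbered: bool = False) -> list[dict]:
    """Insert one paragraph per item, then apply list formatting to exactly that range."""
    text = "".join(item + "\n" for item in items)
    preset = "NUMBERED_DECIMAL_ALPHA_ROMAN" if numbered else "BULLET_DISC_CIRCLE_SQUARE"
    return [
        # Step 1: the text must exist before anything can point at it.
        {"insertText": {"location": {"index": index}, "text": text}},
        # Step 2: the range covers the inserted text and nothing after it.
        {
            "createParagraphBullets": {
                "range": {"startIndex": index, "endIndex": index + utf16_len(text)},
                "bulletPreset": preset,
            }
        },
    ]
```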

Object Operations

ID	Operation	What you want	What goes wrong	Severity
O1	Insert image	Embed an image	Image consumes exactly 1 index position regardless of file size. But Google re-hosts the image — `sourceUri` and `contentUri` diverge silently. If you track images by URI, your references break.	Cosmetic
O2	Create header/footer	Add page header or footer	Creates an entirely new index segment with its own independent index space starting at 0 (which is omitted from the JSON due to proto3 default value suppression). If you don't handle missing `startIndex`, you get `KeyError`.	Corruption
O4	Insert footnote	Add a footnote	Consumes 1 index in the body. But creates a new footnote segment with its own index space. Compound operation: you must re-read the document after creation to get the footnote segment ID before you can insert content into it.	Corruption

Named Range Operations

ID	Operation	What you want	What goes wrong	Severity
N1	Create named range	Bookmark a section	Named ranges store absolute index positions. Any insertion or deletion before the range silently invalidates it. The range doesn't move. Your named range now points at the wrong text.	Data loss
N2	Replace named range content	Update a bookmarked section	Two-step: delete old content, insert new. The delete invalidates the named range entirely. You must re-create it if you need it again. The API does not warn you.	Data loss
N3	Delete named range	Remove a bookmark	The only safe named range operation. No index impact.

The 5 Worst Offenders

These are the failure modes most likely to corrupt real documents in production.

1. The Silent Style Wipe (F1)

You want to bold a phrase. You send:

{
  "updateTextStyle": {
    "range": {"startIndex": 143, "endIndex": 146},
    "textStyle": {"bold": true}
  }
}

The API returns 200. The text is now bold. It is also now in the default font, default size, default color. Every other style property on that range has been silently cleared.

The fix is a fields parameter:

{
  "updateTextStyle": {
    "range": {"startIndex": 143, "endIndex": 146},
    "textStyle": {"bold": true},
    "fields": "bold"
  }
}

The fields parameter is a field mask that tells the API which properties to modify. Omit it and the API interprets that as "set bold to true, set everything else to default." This is documented in a single paragraph buried in the field masks guide. It is not mentioned in the updateTextStyle reference.
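
One way to make the mask impossible to forget is to derive it from the style you pass in. This is a sketch, not part of any Google client library: text_style_request is a hypothetical helper, and it only handles top-level textStyle properties, not nested mask paths.

```python
def text_style_request(start: int, end: int, text_style: dict) -> dict:
    """Build an updateTextStyle request whose fields mask matches text_style."""
    if not text_style:
        raise ValueError("text_style must set at least one property")
    return {
        "updateTextStyle": {
            "range": {"startIndex": start, "endIndex": end},
            "textStyle": text_style,
            # The mask lists exactly the properties being set, nothing else.
            "fields": ",".join(sorted(text_style)),
        }
    }
```

Calling text_style_request(143, 146, {"bold": True}) produces the corrected request shown above.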

Fixture: fixtures/mutations/F1_apply_bold/

2. The Off-By-One Table Insertion (S1)

You want to insert a table at the end of a section. You calculate the insertion index as the section's endIndex. You send:

{
  "insertTable": {
    "location": {"index": 181},
    "rows": 2,
    "columns": 3
  }
}

If that index equals the segment's endIndex, the API returns: "Index 181 must be less than the end index of the referenced segment, 181." This off-by-one behavior is not documented. You must insert at endIndex - 1 — before the trailing \n, not at the segment boundary.

But the real danger is what happens after the table is inserted. A 2x3 empty table consumes approximately 19 index positions (table start + row markers + cell markers + cell newlines). A 3x3 table with 4-character cells consumes 60. Every operation after the table in the same batch is now pointing at an index that is 19-60 positions too low. The API executes them anyway.
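
Rather than predicting how many positions a table will consume, you can measure it by re-reading the document after the insertion. A minimal sketch, assuming before and after are documents.get responses; all three function names are hypothetical.

```python
def safe_table_index(requested: int, segment_end: int) -> int:
    """Clamp an insertion index so it is strictly less than the segment's endIndex."""
    return min(requested, segment_end - 1)


def body_end(document: dict) -> int:
    """endIndex of the last structural element in the body."""
    return document["body"]["content"][-1]["endIndex"]


def total_shift(before: dict, after: dict) -> int:
    """How far the end of the body moved between two snapshots."""
    return body_end(after) - body_end(before)


def table_spans(document: dict) -> list[tuple[int, int]]:
    """(startIndex, endIndex) of every top-level table in the body."""
    return [
        (element.get("startIndex", 0), element["endIndex"])
        for element in document["body"]["content"]
        if "table" in element
    ]
```

Running total_shift on this fixture's before.json and after.json gives the measured offset to apply to anything positioned after the table.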

Fixture: fixtures/mutations/S1_insert_table/

3. The Cascading Section Replace (T5)

You want to replace the body text under a heading. This is a two-step operation in a single batch:

[
  {"deleteContentRange": {"range": {"startIndex": 116, "endIndex": 181}}},
  {"insertText": {"location": {"index": 116}, "text": "REPLACEMENT SECTION CONTENT.\n"}}
]

The delete removes 65 characters. Every index after position 116 just shifted backward by 65. The insert must use the post-delete index (116 is correct here because we're inserting at the same position we deleted from). But if you had a third operation in this batch targeting, say, the "Conclusion" heading at its original position, that position is now 36 characters earlier than your code thinks it is (65 deleted - 29 inserted = 36 net shift). The API will execute your third operation at the wrong location.
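
The arithmetic above can be written down as a small sketch. net_shift and shift_index are hypothetical names, and the sketch assumes a single delete-then-insert at the same start position, as in this example.

```python
def utf16_len(text: str) -> int:
    """Number of UTF-16 code units in text."""
    return len(text.encode("utf-16-le")) // 2


def net_shift(start: int, end: int, new_text: str) -> int:
    """Net index change caused by replacing [start, end) with new_text."""
    return utf16_len(new_text) - (end - start)


def shift_index(original_index: int, start: int, end: int, new_text: str) -> int:
    """Translate an index from pre-replace to post-replace coordinates."""
    if original_index <= start:
        return original_index  # content before the replaced range never moves
    if original_index < end:
        raise ValueError("index falls inside the deleted range")
    return original_index + net_shift(start, end, new_text)
```

With the values above, net_shift(116, 181, "REPLACEMENT SECTION CONTENT.\n") is -36, so a third request aimed at original index 181 has to be sent with index 145.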

Fixture: fixtures/mutations/T5_replace_section/

4. The Proto3 Zero-Value Trap (O2)

When you create a header or footer, the API creates a new segment with its own index space. The segment's startIndex is 0. But due to proto3 serialization rules, zero-valued fields are omitted from the JSON response. The startIndex field simply does not appear.

{
  "headers": {
    "kix.nznd4d573jt5": {
      "content": [
        {
          "endIndex": 1,
          "paragraph": { ... }
        }
      ]
    }
  }
}

There is no startIndex on that paragraph element. If your code does element["startIndex"], you get a KeyError. If your code does element.get("startIndex") without a default, you get None and your index arithmetic produces garbage. The correct pattern is element.get("startIndex", 0) — every time, for every element, in every segment. This applies to headers, footers, footnotes, and any future segment type.

This is standard proto3 behavior, but Google's API documentation does not mention it in the context of the Docs API. You discover it when your code crashes on the first document with a header.
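
A sketch of the defensive pattern, wrapped so it cannot be forgotten. The helper names are hypothetical; the headers, footers, and footnotes keys are the maps that appear in a documents.get response.

```python
def start_index(element: dict) -> int:
    """startIndex of a structural element, treating an omitted field as 0."""
    return element.get("startIndex", 0)


def element_length(element: dict) -> int:
    """Number of index positions the element occupies."""
    return element["endIndex"] - start_index(element)


def iter_segment_spans(document: dict):
    """Yield (kind, segment_id, start, end) for every element outside the body."""
    for kind in ("headers", "footers", "footnotes"):
        for segment_id, segment in document.get(kind, {}).items():
            for element in segment.get("content", []):
                yield kind, segment_id, start_index(element), element["endIndex"]
```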

Fixture: fixtures/mutations/O2_create_header/

5. Named Range Index Drift (N1 + N2)

Named ranges store absolute index positions. They do not update when the document changes.

Create a named range spanning indices 14-99:

{"createNamedRange": {"name": "executive_summary", "range": {"startIndex": 14, "endIndex": 99}}}

Now insert a paragraph at the beginning of the document (index 1). The document content shifts forward by 20 characters. Your named range still says 14-99. It now points at different text than what you bookmarked. The API does not adjust it. The API does not warn you. The next time you read the named range and operate on its indices, you are operating on the wrong content.

It gets worse with N2 (replace named range content): the delete-and-insert invalidates the named range entirely. The range metadata still exists, but its indices are stale. If you need the range again, you must delete it and re-create it with the new indices. The API does not do this for you.
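
The delete, insert, and re-create sequence can be sketched as a single request list. replace_named_range_requests is a hypothetical helper, and the sketch assumes the range is identified by name and that start and end are its current, freshly read indices. Whether deleteNamedRange complains when no range with that name is left is worth checking against the N2 fixture before relying on this.

```python
def utf16_len(text: str) -> int:
    """Number of UTF-16 code units in text."""
    return len(text.encode("utf-16-le")) // 2


def replace_named_range_requests(name: str, start: int, end: int, new_text: str) -> list[dict]:
    """Replace the content of [start, end) and re-create the named range over the new text."""
    new_end = start + utf16_len(new_text)
    return [
        {"deleteContentRange": {"range": {"startIndex": start, "endIndex": end}}},
        {"insertText": {"location": {"index": start}, "text": new_text}},
        # Drop the stale range metadata, then bookmark the new content.
        {"deleteNamedRange": {"name": name}},
        {"createNamedRange": {"name": name, "range": {"startIndex": start, "endIndex": new_end}}},
    ]
```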

Fixtures: fixtures/mutations/N1_create_named_range/ and fixtures/mutations/N2_replace_named_range_content/

How to Run These Fixtures

Prerequisites

A Google Cloud project with the Google Docs API enabled
An OAuth 2.0 Client ID (Desktop application type)
Python 3.12+
The credentials.json file from Google Cloud Console

Setup

git clone https://github.com/ConvergentMethods/google-docs-api-fixtures.git
cd google-docs-api-fixtures
pip install google-auth google-auth-oauthlib google-api-python-client

Reproducing the Mutations

Each mutation directory contains the exact batchUpdate request that was sent. To reproduce:

Create a test document using the Google Docs API
Apply the request.json contents via batchUpdate
Compare the result with after.json

The before.json file shows the document state before the mutation, so you can reconstruct the starting document. The description.md file explains what the operation does and what to watch for.
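
The apply step can be sketched as follows. This is not a script shipped with the repository: apply_mutation and as_batch_body are hypothetical names, and the shape of request.json is an assumption (a list of requests, a single request object, or an object with a requests key). The service argument is a Docs API client, for example one built with googleapiclient.discovery.build("docs", "v1", credentials=creds).

```python
import json
from pathlib import Path


def as_batch_body(raw) -> dict:
    """Normalize the contents of request.json into a batchUpdate body.

    Assumption: the file holds a list of requests, one request object,
    or an object that already has a "requests" key.
    """
    if isinstance(raw, dict) and "requests" in raw:
        return {"requests": list(raw["requests"])}
    if isinstance(raw, list):
        return {"requests": raw}
    return {"requests": [raw]}


def apply_mutation(service, document_id: str, mutation_dir: str) -> dict:
    """Send request.json to document_id and return the document as re-read afterwards."""
    folder = Path(mutation_dir)
    raw = json.loads((folder / "request.json").read_text(encoding="utf-8"))
    service.documents().batchUpdate(
        documentId=document_id, body=as_batch_body(raw)
    ).execute()
    return service.documents().get(documentId=document_id).execute()
```

When comparing the returned document with after.json, expect document IDs, revision IDs, and generated object IDs to differ. Comparing body content and index positions is more meaningful than comparing the files byte for byte.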

Exploring the Fixtures

The document fixtures (01_plain_text.json through 13_bookmarks.json) are raw documents.get responses from the Google Docs API. They cover:

Plain text, headings, inline formatting
Bullet lists, numbered lists, nested lists
Tables (simple and complex)
Images, headers, footers, footnotes
Named ranges, bookmarks, page breaks
A kitchen-sink document combining all of the above

These are useful as reference material for understanding how Google represents document structure in JSON — something the official documentation is remarkably vague about.

Why These Exist

We built Arezzo, a deterministic compiler for Google Docs API operations. It compiles semantic intent ("insert a paragraph after the Revenue heading") into correct batchUpdate request sequences with proper UTF-16 index arithmetic, cascading offset tracking, and OT-compatible mutation ordering.

These fixtures are the empirical foundation that Arezzo was built on. They exist because the only way to understand how the Google Docs API actually behaves is to send requests and inspect the results. The documentation tells you what the API accepts. These fixtures show you what it actually does.

License

MIT - Convergent Methods, LLC
