ArithmeticError in ServeShapePlug.end_telemetry_span/2 when parse_body halts before telemetry span starts #3919

@K-Mistele

Bug Description

An ArithmeticError occurs in ServeShapePlug.end_telemetry_span/2 when parse_body halts a request before start_telemetry_span has executed. The custom halt/1 unconditionally calls end_telemetry_span(), which then evaluates System.monotonic_time() - nil.

This may be causing crashes: it correlates with some Electric container crashes, but I am not highly confident in that conclusion.

[error] ** (ArithmeticError) bad argument in arithmetic expression
    :erlang.-(-576447606478025047, nil)
    (electric 1.4.7) lib/electric/plug/serve_shape_plug.ex:286:
        Electric.Plug.ServeShapePlug.end_telemetry_span/2

Root Cause

The plug pipeline order places parse_body (position 2) before start_telemetry_span (position 3):

plug :fetch_query_params     # position 1
plug :parse_body              # position 2 - CAN HALT HERE
plug :start_telemetry_span   # position 3 - sets conn.private[:electric_telemetry_span]

Line 18 even has a comment: # start_telemetry_span needs to always be the first plug after fetching query params. — but parse_body sits between them.

When parse_body encounters an error (oversized body, bad JSON, etc.), it calls halt(). The custom halt/1 override (line 319-323) unconditionally calls end_telemetry_span(), which computes duration as:

duration: System.monotonic_time() - conn.private[:electric_telemetry_span][:start_time]

Since start_telemetry_span never ran, conn.private[:electric_telemetry_span] is nil, and so is its [:start_time] lookup. The expression therefore evaluates to System.monotonic_time() - nil, which raises ArithmeticError.
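The failure can be reproduced in isolation with a few lines of plain Elixir. This is only a minimal sketch: SpanDemo is a hypothetical module, and conn is a plain map standing in for a Plug.Conn whose private data was never populated by start_telemetry_span.

```elixir
# Hypothetical stand-in for the plug; only the duration expression quoted
# above is mirrored here, not the real ServeShapePlug code.
defmodule SpanDemo do
  def end_telemetry_span(conn) do
    System.monotonic_time() - conn.private[:electric_telemetry_span][:start_time]
  end
end

# A plain map standing in for a Plug.Conn where start_telemetry_span never ran.
conn = %{private: %{}}

# Access on a missing key returns nil, and nil[:start_time] is nil as well.
nil = conn.private[:electric_telemetry_span][:start_time]

# Subtracting nil from an integer raises ArithmeticError.
result =
  try do
    SpanDemo.end_telemetry_span(conn)
  rescue
    e in ArithmeticError -> {:raised, Exception.message(e)}
  end

IO.inspect(result)
```

Note that the nested Access lookup itself does not fail; the error only surfaces at the subtraction, which is why the stack trace points at end_telemetry_span/2 rather than at parse_body.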

How to Reproduce

Send a POST request to any shape endpoint with a body exceeding the Plug read limit (default 8MB). Any of the 4 error paths in parse_body (lines 36-81) will trigger it:

  1. Non-object JSON body (line 50-51)
  2. Invalid JSON (line 57-64)
  3. Body exceeds size limit (line 68-72) — this was the production trigger
  4. Failed to read body (line 75-79)

Commits That Created This

  1. 841922d (Oct 2, 2024, PR #1736, "chore(sync-service): Expand the set of OT span attributes assigned in ServeShapePlug"): Introduced start_telemetry_span, end_telemetry_span, and the custom halt/1 override. At the time, all plugs that could halt were positioned after start_telemetry_span, so the assumption was safe.

  2. 3f257aa (Jan 27, 2026, PR #3777, "Add POST support for subset snapshots to avoid URL length limits"): Added parse_body before start_telemetry_span in the pipeline, creating a code path where halt() → end_telemetry_span() runs without the span having been started.
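One possible mitigation, sketched here under assumptions and not taken from the Electric codebase, is to make end_telemetry_span tolerate a span that was never started. SpanGuardSketch and its return values (:no_span, {:ok, duration}) are hypothetical, and conn is again a plain map rather than a real Plug.Conn.

```elixir
# Hypothetical sketch of a nil-safe span end; not the actual ServeShapePlug code.
defmodule SpanGuardSketch do
  # Records the start time under the same private key the issue describes.
  def start_telemetry_span(conn) do
    span = %{start_time: System.monotonic_time()}
    put_in(conn, [:private, :electric_telemetry_span], span)
  end

  # Computes a duration only when a span was actually started.
  def end_telemetry_span(conn) do
    case conn.private[:electric_telemetry_span] do
      %{start_time: start_time} -> {:ok, System.monotonic_time() - start_time}
      nil -> :no_span
    end
  end
end

# A request halted before the span started no longer raises.
halted_early = %{private: %{}}
:no_span = SpanGuardSketch.end_telemetry_span(halted_early)

# A request that did start the span still gets a duration.
started = SpanGuardSketch.start_telemetry_span(%{private: %{}})
{:ok, duration} = SpanGuardSketch.end_telemetry_span(started)
IO.inspect(duration)
```

An alternative that matches the existing comment on line 18 would be to move start_telemetry_span ahead of parse_body in the pipeline, so that every halting plug runs with the span already started. Which of the two the maintainers prefer is an open question.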

Key Code References

Affected Versions

The bug exists in every version from 1.4.0 onward (the first release containing PR #3777). The file has zero diff between @core/sync-service@1.4.0 and @core/sync-service@1.4.6 / HEAD.
