Skip to content

feat(preprod): Add snapshot image comparison task with odiff#109150

Closed
NicoHinderling wants to merge 1 commit intomasterfrom
create-compare-task-v1
Closed

feat(preprod): Add snapshot image comparison task with odiff#109150
NicoHinderling wants to merge 1 commit intomasterfrom
create-compare-task-v1

Conversation

@NicoHinderling
Copy link
Contributor

@NicoHinderling NicoHinderling commented Feb 23, 2026

Update

Now broken into 3 PRs:

#109380
#109381
#109382


Summary

Adds visual regression detection for preprod snapshots by comparing images between head and base artifacts using the odiff binary.

New modules:

  • image_diff — wraps odiff in a persistent server mode for efficient batch comparisons, using a dual-threshold approach (base + color-sensitive) to catch both structural and subtle color changes
  • compare_snapshots task — async Celery task that fetches manifests from object storage, matches images by filename, and categorizes results as changed/added/removed/unchanged. Diff masks are stored as base64-encoded PNGs in object storage alongside a comparison manifest JSON

How it works:

  1. When triggered, the task loads head and base snapshot manifests
  2. Images present in both are compared — if hashes differ, pixel-level diff is computed via odiff
  3. Each comparison runs at two sensitivity thresholds; the one detecting more differences wins
  4. Results (diff masks, scores, dimensions) are persisted to object storage and the PreprodSnapshotComparison model is updated with summary counts

Test plan

  • Unit tests for compare_images and compare_images_batch covering identical, different, size-mismatched, and threshold-sensitive inputs
  • Tests with real before/after images (skipped if not available)
  • Manual test with uploaded snapshot pairs to verify end-to-end comparison flow

Copy link
Contributor Author

NicoHinderling commented Feb 23, 2026

@github-actions github-actions bot added Scope: Frontend Automatically applied to PRs that change frontend components Scope: Backend Automatically applied to PRs that change backend components labels Feb 23, 2026
@github-actions
Copy link
Contributor

🚨 Warning: This pull request contains Frontend and Backend changes!

It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently.

Have questions? Please ask in the #discuss-dev-infra channel.

@NicoHinderling NicoHinderling changed the title create compare task v1 feat(preprod): Add snapshot image comparison task with odiff Feb 23, 2026
from dataclasses import dataclass


@dataclass(frozen=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use dataclass over a pydantic model?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My (very) limited understanding is that Pydantic is important for data models used within APIs boundaries because Pydantic validates and coerces incoming data at runtime (like when you receive a JSON), with things like type coercion (string "42" → int 42), required/optional field enforcement, custom validators, and clear error messages when validation fails.

For this class, we construct it ourselves from known-correct values inside our own code, so there's nothing to validate and it's not validated from external input or serialized to JSON.

Supposedly there's a slight performance overhead to using pydantic, so I think for this use case this is technically ideal.

lmk if you think im missing something

Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

response = self._read_json(line)
except RuntimeError:
self._process = None
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Process not killed when nulling reference on error

Low Severity

When _read_json raises RuntimeError (e.g., due to JSONDecodeError from malformed output), self._process is set to None without killing the underlying process. The close() method will then no-op since self._process is already None, and the process local variable in compare() is the only remaining reference. When it goes out of scope, Popen.__del__ does not kill the subprocess — it only warns. The same pattern applies in the BrokenPipeError handler, though there the process is likely already dead. Explicitly terminating the process before nulling the reference would prevent orphaned processes.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — both error handlers now explicitly process.kill() + process.wait() before nulling the reference, preventing orphaned subprocesses.

  • Claude Code

extra={"head_artifact_id": head_artifact_id, "base_artifact_id": base_artifact_id},
)
comparison.state = PreprodSnapshotComparison.State.PROCESSING
comparison.save(update_fields=["state"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created comparison lacks atomic state guard against duplicates

Low Severity

The created=True branch transitions the comparison from PENDING to PROCESSING via a non-atomic save, while the not created branch correctly uses an atomic filter().update() guard. If two Celery workers dispatch the same task concurrently, one gets created=True and proceeds unconditionally, while the other gets created=False and can also pass the atomic PENDING check before the first worker's save completes. Both workers then fully execute the comparison, and if one fails, its FAILED state save could overwrite the other's SUCCESS.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a false positive. get_or_create is backed by a unique constraint on (head_snapshot_metrics, base_snapshot_metrics), so only one worker can ever get created=True — the other will either get created=False or hit the IntegrityError handler, both of which route through the atomic filter().update() path. The plain save() on the created=True branch is safe because we definitively own the row we just inserted. Added a comment in code to clarify this.

Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

extra={"head_artifact_id": head_artifact_id, "base_artifact_id": base_artifact_id},
)
comparison.state = PreprodSnapshotComparison.State.PROCESSING
comparison.save(update_fields=["state"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race allows duplicate processing between created and not-created paths

Medium Severity

The get_or_create inserts the comparison with PENDING state, then a separate save() transitions it to PROCESSING. Between these two non-atomic steps, a concurrent worker taking the created=False path can atomically filter(state=PENDING).update(state=PROCESSING) and succeed, causing both workers to proceed with the comparison simultaneously. Creating with PROCESSING state directly (or using an atomic update in the created=True path as well) would close this window.

Fix in Cursor Fix in Web

comparison.state = PreprodSnapshotComparison.State.FAILED
comparison.error_code = PreprodSnapshotComparison.ErrorCode.INTERNAL_ERROR
comparison.save(update_fields=["state", "error_code"])
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handler defeats retry by persisting terminal state

Medium Severity

The task is configured with Retry(times=3) and processing_deadline_duration=300, so it's designed to retry on timeouts. However, the except Exception: handler catches TimeoutError, sets comparison.state to FAILED, and then re-raises. On retry, the state-guard at line 126–136 finds the comparison in FAILED state (not PENDING), so the atomic update returns updated=0 and the task exits early without reprocessing. This makes the retry configuration completely ineffective for the one scenario it's meant to handle.

Additional Locations (1)

Fix in Cursor Fix in Web

try:
return json.loads(line)
except JSONDecodeError as e:
raise RuntimeError(f"odiff server returned invalid JSON: {line!r}") from e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Startup error loses stderr due to unset process reference

Low Severity

_read_json reads stderr from self._process, but when called from _start(), self._process is still None (it's assigned only at line 81 after successful startup). If odiff exits immediately, the actual stderr content from the local proc variable is never captured, and the RuntimeError message contains an empty string, making startup failures hard to debug.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components Scope: Frontend Automatically applied to PRs that change frontend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants