feat(preprod): Add snapshot image comparison task with odiff #109150
NicoHinderling wants to merge 1 commit into master from
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking.
🚨 Warning: This pull request contains Frontend and Backend changes! Making changes to Sentry's Frontend and Backend in a single pull request is discouraged: the Frontend and Backend are not atomically deployed. If the changes are interdependent, they must be split into two pull requests and made forward- or backward-compatible, so that the Backend or Frontend can be safely deployed independently. Have questions? Please ask in the
Force-pushed 5bc85e9 to eb716d6
Force-pushed eb716d6 to 6ced8f1
Force-pushed 6ced8f1 to 66b9902
Force-pushed 2eb8356 to 9f93bed
Force-pushed b3074bd to 09db923
Force-pushed 09db923 to b3e1513
```python
from dataclasses import dataclass


@dataclass(frozen=True)
```
Why use dataclass over a pydantic model?
My (very) limited understanding is that Pydantic matters for data models at API boundaries, because it validates and coerces incoming data at runtime (e.g. when you receive JSON): type coercion (string "42" → int 42), required/optional field enforcement, custom validators, and clear error messages when validation fails.
For this class, we construct it ourselves from known-correct values inside our own code, so there's nothing to validate; it isn't built from external input or serialized to JSON.
Pydantic also carries a slight runtime overhead, so for this use case a plain dataclass seems like the better fit.
Let me know if you think I'm missing something.
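As a quick sketch of the trade-off (the class and field names below are illustrative, not from this PR): a frozen dataclass gives an immutable, low-overhead value object with no runtime validation, which is exactly the property being relied on here.

```python
from dataclasses import FrozenInstanceError, dataclass


# Hypothetical internal value object: constructed only from known-correct
# values inside our own code, so there is nothing for Pydantic-style
# runtime validation to check.
@dataclass(frozen=True)
class ComparisonResult:
    filename: str
    diff_pixel_count: int


result = ComparisonResult(filename="home.png", diff_pixel_count=42)

# frozen=True makes instances immutable and hashable; assigning to
# result.diff_pixel_count after construction raises FrozenInstanceError.
```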
Force-pushed db87543 to 8f0eb58
Force-pushed 8f0eb58 to b277064
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
Force-pushed b277064 to bb07625
Force-pushed bb07625 to 5c143a6
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```python
    response = self._read_json(line)
except RuntimeError:
    self._process = None
    raise
```
Process not killed when nulling reference on error
Low Severity
When _read_json raises RuntimeError (e.g., due to JSONDecodeError from malformed output), self._process is set to None without killing the underlying process. The close() method will then no-op since self._process is already None, and the process local variable in compare() is the only remaining reference. When it goes out of scope, Popen.__del__ does not kill the subprocess — it only warns. The same pattern applies in the BrokenPipeError handler, though there the process is likely already dead. Explicitly terminating the process before nulling the reference would prevent orphaned processes.
Additional Locations (1)
Fixed — both error handlers now explicitly process.kill() + process.wait() before nulling the reference, preventing orphaned subprocesses.
- Claude Code
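The described fix can be sketched roughly like this (class and method names are assumptions, not the PR's actual code): terminate and reap the child before dropping the last strong reference to it.

```python
import subprocess


class OdiffServer:
    """Illustrative sketch of the cleanup pattern, not the real implementation."""

    def __init__(self):
        self._process = None

    def _shutdown_on_error(self):
        # Kill before nulling the reference: Popen.__del__ only emits a
        # ResourceWarning for a still-running child, it never kills it.
        process = self._process
        self._process = None
        if process is not None and process.poll() is None:
            process.kill()
            process.wait()  # reap to avoid a zombie
```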
```python
    extra={"head_artifact_id": head_artifact_id, "base_artifact_id": base_artifact_id},
)
comparison.state = PreprodSnapshotComparison.State.PROCESSING
comparison.save(update_fields=["state"])
```
Created comparison lacks atomic state guard against duplicates
Low Severity
The created=True branch transitions the comparison from PENDING to PROCESSING via a non-atomic save, while the not created branch correctly uses an atomic filter().update() guard. If two Celery workers dispatch the same task concurrently, one gets created=True and proceeds unconditionally, while the other gets created=False and can also pass the atomic PENDING check before the first worker's save completes. Both workers then fully execute the comparison, and if one fails, its FAILED state save could overwrite the other's SUCCESS.
Additional Locations (1)
This is a false positive. get_or_create is backed by a unique constraint on (head_snapshot_metrics, base_snapshot_metrics), so only one worker can ever get created=True — the other will either get created=False or hit the IntegrityError handler, both of which route through the atomic filter().update() path. The plain save() on the created=True branch is safe because we definitively own the row we just inserted. Added a comment in code to clarify this.
Force-pushed b2c0d14 to 266ccd9
Force-pushed 266ccd9 to f96f69a
Cursor Bugbot has reviewed your changes and found 3 potential issues.
```python
    extra={"head_artifact_id": head_artifact_id, "base_artifact_id": base_artifact_id},
)
comparison.state = PreprodSnapshotComparison.State.PROCESSING
comparison.save(update_fields=["state"])
```
Race allows duplicate processing between created and not-created paths
Medium Severity
The get_or_create inserts the comparison with PENDING state, then a separate save() transitions it to PROCESSING. Between these two non-atomic steps, a concurrent worker taking the created=False path can atomically filter(state=PENDING).update(state=PROCESSING) and succeed, causing both workers to proceed with the comparison simultaneously. Creating with PROCESSING state directly (or using an atomic update in the created=True path as well) would close this window.
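The suggested fix amounts to an atomic compare-and-set on the state column. A minimal sketch, with raw sqlite3 standing in for the Django ORM's `filter(...).update(...)` (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comparison (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO comparison (id, state) VALUES (1, 'PENDING')")


def claim_for_processing(conn, comparison_id):
    # Atomic compare-and-set: only the caller that finds PENDING wins;
    # everyone else sees zero rows updated and backs off.
    cur = conn.execute(
        "UPDATE comparison SET state = 'PROCESSING' "
        "WHERE id = ? AND state = 'PENDING'",
        (comparison_id,),
    )
    return cur.rowcount == 1


first = claim_for_processing(conn, 1)   # row was PENDING: claim succeeds
second = claim_for_processing(conn, 1)  # already PROCESSING: claim fails
```

Routing both the `created=True` and `created=False` paths through a guard like this closes the window the review comment describes.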
```python
comparison.state = PreprodSnapshotComparison.State.FAILED
comparison.error_code = PreprodSnapshotComparison.ErrorCode.INTERNAL_ERROR
comparison.save(update_fields=["state", "error_code"])
raise
```
Error handler defeats retry by persisting terminal state
Medium Severity
The task is configured with Retry(times=3) and processing_deadline_duration=300, so it's designed to retry on timeouts. However, the except Exception: handler catches TimeoutError, sets comparison.state to FAILED, and then re-raises. On retry, the state-guard at line 126–136 finds the comparison in FAILED state (not PENDING), so the atomic update returns updated=0 and the task exits early without reprocessing. This makes the retry configuration completely ineffective for the one scenario it's meant to handle.
Additional Locations (1)
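One possible shape for a fix, sketched with a stand-in model object (all names here are assumptions, not the PR's code): restore PENDING on retryable timeouts so the retry's PENDING guard can pass, and persist FAILED only for non-retryable errors.

```python
class Comparison:
    """Minimal stand-in for the model; only `state` matters here."""

    def __init__(self):
        self.state = "PROCESSING"


def finish_comparison(comparison, work):
    try:
        work()
        comparison.state = "SUCCESS"
    except TimeoutError:
        # Retryable: reset to PENDING so the retry's atomic
        # PENDING -> PROCESSING guard succeeds on the next attempt.
        comparison.state = "PENDING"
        raise
    except Exception:
        # Non-retryable: a terminal FAILED state is appropriate.
        comparison.state = "FAILED"
        raise
```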
```python
try:
    return json.loads(line)
except JSONDecodeError as e:
    raise RuntimeError(f"odiff server returned invalid JSON: {line!r}") from e
```
Startup error loses stderr due to unset process reference
Low Severity
_read_json reads stderr from self._process, but when called from _start(), self._process is still None (it's assigned only at line 81 after successful startup). If odiff exits immediately, the actual stderr content from the local proc variable is never captured, and the RuntimeError message contains an empty string, making startup failures hard to debug.
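The fix implied by this report is to read startup diagnostics from the local process handle rather than `self._process`. A hedged sketch (the command and line-based handshake are illustrative, not odiff's actual protocol):

```python
import subprocess


def start_server(argv):
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    line = proc.stdout.readline()
    if not line:
        # The child exited before producing any output; read stderr from
        # the local handle (self._process would still be None here), so
        # the error message actually explains the startup failure.
        stderr = proc.stderr.read()
        proc.wait()
        raise RuntimeError(f"server failed to start: {stderr!r}")
    return proc, line
```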



Update
Now broken into 3 PRs:
#109380
#109381
#109382
Summary
Adds visual regression detection for preprod snapshots by comparing images between head and base artifacts using the odiff binary.
New modules:
- image_diff — wraps odiff in a persistent server mode for efficient batch comparisons, using a dual-threshold approach (base + color-sensitive) to catch both structural and subtle color changes
- compare_snapshots task — async Celery task that fetches manifests from object storage, matches images by filename, and categorizes results as changed/added/removed/unchanged. Diff masks are stored as base64-encoded PNGs in object storage alongside a comparison manifest JSON

How it works:
- The PreprodSnapshotComparison model is updated with summary counts

Test plan
- Tests for compare_images and compare_images_batch covering identical, different, size-mismatched, and threshold-sensitive inputs
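The filename-matching and categorization step described in the summary can be sketched as follows; `differs` stands in for the odiff comparison, and all names here are illustrative:

```python
def categorize(head_images, base_images, differs):
    """Bucket images into changed/added/removed/unchanged by filename."""
    head, base = set(head_images), set(base_images)
    result = {
        "added": sorted(head - base),    # only in the head artifact
        "removed": sorted(base - head),  # only in the base artifact
        "changed": [],
        "unchanged": [],
    }
    for name in sorted(head & base):
        key = "changed" if differs(name) else "unchanged"
        result[key].append(name)
    return result
```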