CRC of objects with references is not comparable across separate databases

## Summary

`objects.crc32` is meant to be a content fingerprint, and the [comparing-builds](Documentation/comparing-builds.md) workflow assumes you can analyze two builds into two separate databases and diff their CRCs to find which objects changed. That works for leaf assets without references (Texture2D, Mesh, AudioClip), but it is **broken for any object that contains references** (Materials, prefabs/GameObjects, MonoBehaviours, etc.): identical content produces different CRCs in two separate `analyze` runs.

## Cause

When `PPtrAndCrcProcessor.ExtractPPtr` folds a reference into the CRC, it uses the **resolved analyzer/database object id** returned by the callback, not the PPtr's own identity:

```csharp
var refId = m_Callback(m_ObjectId, fileId, pathId, ...);   // analyzer db id
m_Crc32 = Crc32Algorithm.Append(m_Crc32, <refId bytes>);
```

That id comes from `ObjectIdProvider.GetId((m_LocalToDbFileId[fileId], pathId))`, and both the serialized-file id and the object id are assigned **sequentially per analyze run**. The same logical object can therefore get different ids in db1 and db2 (any file or object that is added, removed, or processed earlier in one run shifts every id assigned after it). That yields a different CRC for identical content, so cross-database comparison reports spurious differences for objects that have references.
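A minimal, self-contained sketch of the failure mode, assuming the reference is folded in as the native-order bytes of a `long` id (the excerpt above elides the exact encoding). `SequentialIdProvider` is a hypothetical stand-in for `ObjectIdProvider`, the bitwise `Append` stands in for `Crc32Algorithm.Append`, and the archive names are made up. The material's serialized bytes are identical in both runs, but the second run happens to see one extra object first:

```csharp
using System;
using System.Collections.Generic;

static class Crc32Sketch
{
    // Bitwise CRC-32 (reflected polynomial 0xEDB88320); Append(0, data) is the CRC of data,
    // and Append(Append(0, a), b) equals the CRC of a followed by b.
    public static uint Append(uint crc, byte[] data)
    {
        uint c = ~crc;
        foreach (byte b in data)
        {
            c ^= b;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        }
        return ~c;
    }
}

// Hypothetical stand-in for ObjectIdProvider: ids are handed out in first-seen order,
// so an object's id depends on what the same run processed before it.
class SequentialIdProvider
{
    readonly Dictionary<(string file, long pathId), long> m_Ids = new();

    public long GetId(string file, long pathId)
    {
        if (!m_Ids.TryGetValue((file, pathId), out long id))
        {
            id = m_Ids.Count + 1;
            m_Ids[(file, pathId)] = id;
        }
        return id;
    }
}

static class CrossDbDemo
{
    // CRC of an object's own bytes plus one reference, folding in the *resolved* id.
    public static uint CrcWithResolvedRef(byte[] payload, long refId) =>
        Crc32Sketch.Append(Crc32Sketch.Append(0, payload), BitConverter.GetBytes(refId));

    public static (uint db1, uint db2) Run()
    {
        byte[] material = { 1, 2, 3, 4 };  // identical serialized content in both builds

        var run1 = new SequentialIdProvider();
        long target1 = run1.GetId("archive:/CAB-shared", 42);

        var run2 = new SequentialIdProvider();
        run2.GetId("archive:/CAB-extra", 7);  // build 2 has one extra object, seen first
        long target2 = run2.GetId("archive:/CAB-shared", 42);

        return (CrcWithResolvedRef(material, target1), CrcWithResolvedRef(material, target2));
    }
}
```

The same target resolves to id 1 in the first run and id 2 in the second, so `Run()` returns two different CRCs even though nothing about the material or its target changed.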

## Why we can't just hash the raw PPtr (the tradeoff)

The obvious fix is to hash the raw on-disk PPtr (`fileId` + `pathId`) instead of the resolved id. But the resolved id is currently what makes **within-database** duplicate detection (`view_potential_duplicates`) work across bundles: two copies of the same object in different bundles reference the same target, and resolving through `m_LocalToDbFileId` (keyed by filename) + `pathId` normalizes them to the same id → same CRC → detected as duplicates.

`fileId` is a **local index** into a serialized file's external-reference list, so two copies of an object in different bundles can have different `fileId` values for the same target. Hashing the raw PPtr would therefore weaken duplicate detection. Deduplication is an important feature and is probably not well covered by tests yet, so we don't want to risk regressing it.
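The tradeoff can be sketched with one in-memory database holding two bundles whose copies of a material reference the same shader. This is an illustration under assumptions, not the analyzer's code: `Bundle`, `ResolveDbId`, and the archive names are hypothetical, `fileId` N > 0 is assumed to index entry N - 1 of the external-reference list, and ids are keyed by target file name + `pathId` to mimic `m_LocalToDbFileId` + `ObjectIdProvider` within a single run:

```csharp
using System;
using System.Collections.Generic;

static class DedupTradeoffDemo
{
    // Bitwise CRC-32 (reflected polynomial 0xEDB88320); Append(0, data) is the CRC of data.
    public static uint Append(uint crc, byte[] data)
    {
        uint c = ~crc;
        foreach (byte b in data)
        {
            c ^= b;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        }
        return ~c;
    }

    // Hypothetical bundle model: fileId 0 is the bundle itself, fileId N > 0 is Externals[N - 1].
    public record Bundle(string Name, string[] Externals);

    // Hypothetical ids for ONE database, keyed by target file name + pathId,
    // so the bundle-local fileId is resolved away.
    static readonly Dictionary<(string file, long pathId), long> s_DbIds = new();

    static long ResolveDbId(Bundle bundle, int fileId, long pathId)
    {
        string file = fileId == 0 ? bundle.Name : bundle.Externals[fileId - 1];
        if (!s_DbIds.TryGetValue((file, pathId), out long id))
        {
            id = s_DbIds.Count + 1;
            s_DbIds[(file, pathId)] = id;
        }
        return id;
    }

    // Option 1: hash the raw on-disk PPtr.
    public static uint CrcRaw(byte[] payload, int fileId, long pathId) =>
        Append(Append(Append(0, payload), BitConverter.GetBytes(fileId)), BitConverter.GetBytes(pathId));

    // Current behavior: hash the resolved database id.
    public static uint CrcResolved(byte[] payload, Bundle bundle, int fileId, long pathId) =>
        Append(Append(0, payload), BitConverter.GetBytes(ResolveDbId(bundle, fileId, pathId)));

    public static (bool rawEqual, bool resolvedEqual) Run()
    {
        byte[] material = { 1, 2, 3, 4 };  // identical content in both bundles
        var bundleA = new Bundle("archive:/CAB-a", new[] { "archive:/CAB-shared" });
        var bundleB = new Bundle("archive:/CAB-b", new[] { "archive:/CAB-other", "archive:/CAB-shared" });

        // Same target (CAB-shared, pathId 42), but local fileId 1 in A and 2 in B.
        bool rawEqual = CrcRaw(material, 1, 42) == CrcRaw(material, 2, 42);
        bool resolvedEqual = CrcResolved(material, bundleA, 1, 42) == CrcResolved(material, bundleB, 2, 42);
        return (rawEqual, resolvedEqual);
    }
}
```

`Run()` returns `(false, true)`: hashing the raw PPtr misses the duplicate because the local index differs, while the resolved id catches it, at the cost of being run-specific.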

## Options to evaluate

1. **Raw PPtr (`fileId` + `pathId`)**: simplest; fixes cross-db comparison in the common case; risks weakening `view_potential_duplicates` (the local `fileId` differs between bundles).
2. **Stable target identity + `pathId`**: resolve `fileId` to a stable identifier for the target file and hash that plus `pathId`, so the CRC is independent of the local index. This fixes cross-db comparison AND preserves cross-bundle duplicate detection, but the "stable identifier" differs by source:
   * **Build output** external references carry a **path** (e.g. `archive:/CAB-...`), not a GUID.
   * **Editor / Library** references carry a **GUID** (the source asset's GUID).

   So the CRC needs to mix in whichever of `ExternalReference.Path` / `ExternalReference.Guid` is populated (and a fixed marker for local refs, `fileId == 0`). This relies on those fields being present and stable, and it is more code: the external-reference info from `sf.ExternalReferences` has to be threaded into the CRC.
3. **Status quo**: cross-db comparison stays broken for referenced objects.
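A sketch of how option 2 might fold one reference into the CRC, under stated assumptions: `ExternalReference` here is a hypothetical record mirroring only the `Path` / `Guid` fields named above (the analyzer's real type may differ), `Path` is preferred when both are set, `fileId` N > 0 indexes entry N - 1 of the external-reference list, and the marker bytes and length prefix are illustrative choices rather than an agreed format:

```csharp
using System;
using System.Text;

static class StableRefCrc
{
    // Bitwise CRC-32 (reflected polynomial 0xEDB88320); Append(0, data) is the CRC of data.
    public static uint Append(uint crc, byte[] data)
    {
        uint c = ~crc;
        foreach (byte b in data)
        {
            c ^= b;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        }
        return ~c;
    }

    // Hypothetical mirror of the fields the issue names (ExternalReference.Path / .Guid).
    public record ExternalReference(string Path, string Guid);

    // Illustrative markers distinguishing a local ref, a path identity, and a GUID identity.
    const byte LocalMarker = 0, PathMarker = 1, GuidMarker = 2;

    // Folds one reference into the CRC using a stable target identity + pathId,
    // never the per-run database id.
    public static uint AppendReference(uint crc, ExternalReference[] externals, int fileId, long pathId)
    {
        if (fileId == 0)
        {
            crc = Append(crc, new[] { LocalMarker });  // reference into the same file
        }
        else
        {
            var ext = externals[fileId - 1];
            bool usePath = !string.IsNullOrEmpty(ext.Path);  // build output: archive:/CAB-...
            string identity = usePath ? ext.Path : ext.Guid;  // Editor/Library: source asset GUID
            if (string.IsNullOrEmpty(identity))
                throw new InvalidOperationException($"External reference {fileId} has neither Path nor Guid");

            byte[] bytes = Encoding.UTF8.GetBytes(identity);
            crc = Append(crc, new[] { usePath ? PathMarker : GuidMarker });
            crc = Append(crc, BitConverter.GetBytes(bytes.Length));  // length prefix keeps field boundaries unambiguous
            crc = Append(crc, bytes);
        }
        return Append(crc, BitConverter.GetBytes(pathId));
    }
}
```

Because no per-run database id enters the hash, the result does not depend on processing order, so it can match between db1 and db2; because the local `fileId` is resolved to the target's path or GUID, two copies in different bundles that reference the same target still hash identically. The marker byte keeps a path and a GUID with the same text from colliding, and the explicit failure when both fields are empty surfaces the "relies on those fields being present" assumption instead of hiding it.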

## Prerequisite

Add test coverage for `view_potential_duplicates` / cross-bundle deduplication before changing the CRC, so that any fix can be checked for regressions there.

## Context

Discovered while reviewing #73 / #70. Note that this is independent of the CRC changes made there (the ManagedReferenceData size fix, the ComputeCRC chunking fix, and the `cah:/` stream hashing). Those changes also alter CRC values relative to older tool versions, so CRCs are not comparable across tool versions regardless.

Related: #44 (refs table).

CRC of objects with references is not comparable across separate databases #74