Skip to content

[#70] Fix analyze CRC for cah:/ ContentDirectory resources; add --skip-crc#73

Merged
SkowronskiAndrew merged 10 commits into
mainfrom
analyze-fix-resource-crc
Jun 12, 2026
Merged

[#70] Fix analyze CRC for cah:/ ContentDirectory resources; add --skip-crc#73
SkowronskiAndrew merged 10 commits into
mainfrom
analyze-fix-resource-crc

Conversation

@SkowronskiAndrew

@SkowronskiAndrew SkowronskiAndrew commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Fixes #70. Also incorporates CRC fixes from #66 and improved the comments and readability of the PPTR / CRC traversal code.

Issue 70 — cah:/ content-addressed resources

Unity 6.6 ContentDirectory builds write resource streams whose StreamingInfo/StreamedResource path is cah:/<hash> while the file on disk is <hash>.resS / <hash>.resource. The old resource lookup stripped the cah:/ prefix and tried archive:/<hash> and <folder>/<hash> (no extension), both of which fail — producing many Error opening resource file messages and omitting those bytes from the CRC.

The hash in a cah:/ path is the content hash of the whole resource file, so we fold the path string into the CRC instead of opening the file. The prefix is matched case-insensitively. Legacy CAB-*.resS/.resource streams still read their bytes as before.

A cah:/ stream always references the entire resource file (offset 0, full size); the offset/size fields only remain for backward compatibility with the older format that packed multiple resources into one file, which ContentDirectory builds do not do. This is documented in code so the path-only CRC choice is clear.

Performance rework (from #66)

Kept the CRC fixes

  • UnityFileReader.ComputeCRC: fixed the chunking bug — the file offset never advanced and the final partial chunk was mishandled, so any range larger than the buffer produced a wrong CRC and over-read. Added a unit test covering buffer-boundary and partial-final-chunk ranges.
  • ProcessManagedReferenceData: fixed the CRC size argument (stringSize + 4, was m_Offset + stringSize + 4).

New --skip-crc flag, independent of --skip-references

--skip-references now only skips reference extraction; CRC is computed unless --skip-crc is also given. (Previously -s did both.) The reference walk still resolves referenced object ids so the CRC stays stable, but only inserts refs rows when extracting references. All four flag combinations were verified on a real ContentDirectory build.

Comments and refactoring

The PPtrAndCrcProcessor code was very difficult to understand and actually covers some very fundamental aspects of how Unity serialization works (including SerializeReference support). So this PR adds comments, reorders the code for readability and removes some dead code.

⚠️ CRC values are not comparable across this change

The CRCs calculation fixes means that the database produced by an older build will show spurious differences in a comparing-builds workflow:

Validation

Verified on a content directory build : no resource errors, deterministic CRCs across runs, streamed Texture2D/AudioClip get non-zero CRCs with sizes including the stream bytes, and all --skip-crc / --skip-references permutations behave correctly.

Existing test suites pass (UnityFileSystem.Tests, Analyzer.Tests, UnityDataTool.Tests).

Follow-up

Recognize content-addressed stream paths (cah:/<hash>) produced by Unity
6.6 ContentDirectory builds: fold the path (which contains the content
hash) into the CRC instead of opening the differently named resource
file, which was failing with "Error opening resource file". Legacy
CAB-*.resS/.resource streams still read their bytes for the CRC.

Also addresses the performance issues from PR 66 without losing CRC
coverage of external streams:
- Fix UnityFileReader.ComputeCRC chunking (advance the file offset and
  handle the partial final chunk) so ranges larger than the buffer no
  longer produce a wrong CRC or over-read.
- Fix the ProcessManagedReferenceData CRC size argument (stringSize + 4,
  not m_Offset + stringSize + 4).
- Keep journal_mode = MEMORY (drop PR 66's ineffective WAL change).

Add a --skip-crc option, fully independent of --skip-references:
--skip-references now only skips reference extraction and no longer
skips the CRC. The reference walk still resolves referenced object ids
(so the CRC stays stable) but only inserts refs rows when extracting.

Add a ComputeCRC unit test for buffer-boundary ranges and update the
analyze documentation for the new flag semantics.

Fixes #70.
…mption

The content-addressed scheme casing is not guaranteed, so compare the
prefix with OrdinalIgnoreCase. Also document why path-only CRC is correct:
a cah:/ stream always references the entire resource file, and the
offset/size fields only remain for backward compatibility with the older
format that packed multiple resources into a single file.
Rewrite the class comment to cover both responsibilities (PPtr extraction
and CRC fingerprinting, including external stream content), document the
constructor arguments and Process(), comment each member variable, and
group the fields by lifetime (constructor config, reused caches/scratch,
and per-object processing state).
Reorder methods to a more logical ordering.
Refine the CallbackDelegate comment to clarify the id spaces (objectId and
the return value are analyzer/database ids; fileId/pathId are raw PPtr
fields) and document the return value.

Add a block comment in front of ProcessManagedReferenceRegistry with C# and
YAML examples showing the [SerializeReference] "references:" layout, and
explain throughout that each entry's data follows the referenced type's own
TypeTree (obtained via GetRefTypeTypeTreeRoot) - the reason walking the
registry jumps between type trees and is more involved than the rest of the
object. Also document the version 1/2 layouts, the terminating sentinel, the
FQN string reads, and the registry re-entry guard.
…comments

Drop the unused referencedTypeDataNode parameter (the referenced object's
layout comes from its own TypeTree via GetRefTypeTypeTreeRoot, not from this
node) and update both call sites.

Add comments for the trickier mechanics: 4-byte alignment, the
vector/map/staticvector array wrapper, Array node child layout, the
StreamingInfo 32/64-bit offset field and its field-order difference from
StreamedResource, PPtr<T> type-name parsing, and the >2GB size
truncation/overflow assumptions in the stream and array handling.
@SkowronskiAndrew SkowronskiAndrew merged commit 99ce157 into main Jun 12, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UnityDataTool analyze - fix error when analyzing Unity 6.6 ContentDirectory output

1 participant