From cb6cb7544c41a3f43746c030e381789c6ce6f210 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 4 Jun 2026 12:32:02 -0400 Subject: [PATCH 01/10] Add reference files for Unity 6.6 output --- .../contentdirectory-zstd/BuildManifestHash.txt | 1 + .../Data/contentdirectory-zstd/content0.archive | Bin 0 -> 8118 bytes 2 files changed, 1 insertion(+) create mode 100644 TestCommon/Data/contentdirectory-zstd/BuildManifestHash.txt create mode 100644 TestCommon/Data/contentdirectory-zstd/content0.archive diff --git a/TestCommon/Data/contentdirectory-zstd/BuildManifestHash.txt b/TestCommon/Data/contentdirectory-zstd/BuildManifestHash.txt new file mode 100644 index 0000000..776b546 --- /dev/null +++ b/TestCommon/Data/contentdirectory-zstd/BuildManifestHash.txt @@ -0,0 +1 @@ +57c4c06634292c7a29331bb7099856cb \ No newline at end of file diff --git a/TestCommon/Data/contentdirectory-zstd/content0.archive b/TestCommon/Data/contentdirectory-zstd/content0.archive new file mode 100644 index 0000000000000000000000000000000000000000..8cc524a2912a30f20e7559f15e5553ee239ecc94 GIT binary patch literal 8118 zcmV;nA4%X38rF*i74 zF=jO|F*r0~Gi5n6I5#e1W-tH%0RSLEI5+_I5jiDwh zHa0UfGC4A1H(@e4Gcz$_VmB~3IXE>oV`469b8l_{00000000000000ewJ-f(DQV^K z06GmFK{*hRS_c3!0TKWb0Mgcx1qcQtDzwG)_+28`CV?TB*={ z)kQEywY(r-I!jUw1pfmr1mD1}cB~`d0d??4`|@D0?#KP)I`SJ?_xt4E2iD8~s{@Y1 z;Lr7v^|GGEi*-M`9n62Te0lJ|m@k8Qv~2e5Ec_K5{Joa|~C-d7tIzGVfN9NrQDU}QvLMnu$htMC?C@9XCKt*Bz{XdO<`dC53 zhl%5nJxGWq^k|5X%!3Dc$Ici*9Xd!Y{-49+CFQ8ak=B6o#!Z_dCz~~D(ilR6<_vO4 zo0%dZXv&ZoBPJP!2+S8EFI}8BkX&Bm*rFv1mMd1Ol%O(2N)#whoM33eWJzNh-Jg|^ zlQ1GVVq~$%u1I8vkRU*Q-1+E{$YY0&9BnvmAlj%wV}^_vXE0v4Xu)EIiWDgD6Ne@Y zOOO~LBE$`X=<7qF(nC=P4picW2BM@RhBaKw7c4}vfJ|2CDHxd$M8XawA&kKY zM1m%gff^dif)9wuNE%}ZV!bdN8Z7t>@cweOJwQIu_{-t->q{=Xd&KZNcgsbG;2_NR z0r!?x-sm}(G+>aoh+JR;sQ~ZJ&P3e^x({<4XZUP!HcO?L4-fwzi z$}aIGF(s{}s2oUjGx?QYh?2x{;xtMtC)pxN=HE6Na32`bYwIWfx*HBDS-q;##(dR&^8|9aK<0L8e9 z3}}UAfnC+WDzFsrhQRU*ne+2L|KyaWh_Fev 
zjWi=Y_CRoCHd4~@J1au!(fCLrq$Kkv5=$1L`N^}t6g2;26lwmC=Kpc?M8PK7>;SdsRLFKf7lk8P`|fgrHWbAk|l}@TC~{G`yir~)L6YD9|Aqp(*kRH z0rfn>5L3?^I$kf7z)F`7Sy)$~DK0{7UG+?$H0E7IRK+(wVP+MK{h6@9)q)>Vnr;w8 z)nd`uvBwdFiejx4<%-2XsY#x$uLcHbG(H+v<{A~c1{uA+#gkJmBFcF-G^2q9xdgo& z8EbknOZ!SeO`desBBNQiKG2Jopou(QF%!00s)Fq+OX300_0NxxG_x*%g(7MF9+>Nl zB(1`gE=ZdzADt@qn3s|y9+a*)mM65EEBvaJfkuyUyb~j5^h@>yavp+5dZ6RrY4%*i z>|M8tBG5#T`t&1oaL;@+C@ERe=w)?0i#Aou)Rn1oH5`=X8(*#pA4rvKVh z{espK*>DZ-(Lt-g({}!#Cz^w?An1%moZ)`~abHE>89{IySjHQZ#}FVm2#dktu!9GH z|G{^E@Zitu=ePI$F3&sI_0Qd|={ldD0|t1X?=#p_+tbr)&;V}%gFL*Sw_t$+JG{F) zI=VT#I=TdOb8~WVad2#Ka9{gY_6xX`9ccRqv`jRWfj#F@T;h#rkqky1EM~ZbHIQP z)f52r13W!BHJd+)xol!$TK>OWT2e|%LNYosQW||_QBjhp$ozBtkAOT51{pcuQ=bS) z?B51vcQ5x$=~ESM0b--$1d-uu1^G7Jn-zz@g5y9mF8}4}_Hf%5(|tXg>m8SQx4(P8 zgS;U4fu7~Zab+=FAK3dK9B%u1$K)|uSr32JzWdOga&B61= zle-#Ek~pdRpYXc{JmczOGMTJ@$-Fo`jvMp)7%q^@XJK7wv8tk)av~MwG&7L^x#7aF zLz4oN{HH5e!l~s^uxHY!iDa5T66z?8mJ+T^MhKY^{o|oOfN~VP&4kIEFkf@Rp5hOF zuy_clNbRwDd?cTZwkQW6t$yOw%)uNzR;QXVE0LAqRl;MOur)_J5WP(0XuvSFD*M2> zj~8#Q6yRvjh5$%VNTXkXImpc_sPZ$R0VTB@R%imEBL!%bF+nI z1n~}x0apVn8w#GbX1Wyr@Z|yqy9#%lE$opo>)LJTBt(jsfE)BqVG#-BK}Za94KZ&@ zmNZ&*NRm=S(FuxK#%bNHc=TU*up;I#i)xBdEm|;Xt8SPXno%a;9^<+IWp#5^hOSdX z2b-`SOcw=X)e<2gXU*(`9%>cr0uEPZ-GCnIMHbR%9aL6##YzOp4;?x)J{p#smKG17 z@bgAR7byRe`)3P$V18Vy7SH{;Ygfo`0fbJfIx|Bv;Fc&3n8A_jqA2#~P6;oEig2tn zorEYp?Ve8PEppe~qEHn6@*8fduhK&-5(O5wl%7+H(*~sMiJ87e5&EO&sI95-34{n# zSypPGxaP?Nz$IdYN)ONt(Tz6x=S+~_TJrlEHAV!$-GK{+D;gi-Ra>5Qy6Sy$EH-mRK4RLYnu{D;bmUvh z_y}_w4j~fpv}dK#hbdljBl=d-*(j^l<+W@-ai7^qbR z9yU26%&>^f)jvOr8t$wXij4J6GbCwYuNq|Fu56eYn!y&}0w5(SGYfn~=WgAkSMnR9 ztV(Jq#2o6$%|MGahOi31(|T?|!!4Xsr4PK}A;-ayzl6=|Mm=CflEK=yqE%(Zv5)nI zU7$EeRK%*%i?aSv$kIfSb4e*zFfr-4)&N#y#nWCI=ayUUHLYJ+Sem&#=9u7b36e-g z&)EjI=u#6!QsppF7)8eaPYi*K8iOI6J$JV?JvU7(PgSZ3fSrjq`x?_YO zEoQ{*4YfoGqhHtwM)in_TUv*F|6-52P5=}Y^eQ}|`(1B&X&1lsy&Ik0I$o_pxBRs3 z+IzPu-BNEoI*n56U20cteA}(^Qz>+dM{#^g)orMh8|_YU9@qE18?~$YmZwgu(Ouu~ zX?F|V*0{8ePPw%Xg;MqE&b_&JRkx+Kz1z~Mlv=IcPn;U9@>IK)c5N;dx4Tp**S@J$ 
zUWfACo$vBee5ylrX?MEo-6-^~sZ!~5d)u~-HOJyBf8nHMFkUPyoYU5+x+x70+ zXIOOq@^n?G)V}N8u5Bu{mg+TL^SV@eZ*i)9cb^lUVS3Md3e}@hZr!a;dETy4?Jl%i zQ>%AfcX7Su>i8bz*8CQ)@>HsgN2NHmTkBOR&gG-_lfg5r0v|XolksB0I=`|PulyW) z)4*Z0C)slX+4JiS%s#_xvgeq;drrW1h~S0(Qy`)nBX6&u1Vf2j?tz#J7GN;cB+Z<0 z|Cq4y!caSc#0&Na*K^^^GE=PmPc1tqLoA| z1P)Z=FwvIHY8@a(+U(h*Z!Oj&w_QX?Ml>!{aqCptnt|1M+=b zejK;+Vlf?A>~-V&bU-a%=D!I1%DeZg>weVSX*1OxymB%q8=uN}|8w|goiaglt`a^ZkZ=j#_RSlAjcraq0W zSS%!9)G}+xR_l*HBr~ZCgD>C+Y&O=>UkoL(pr)=%xNtC985J%&%8D zXK@a3huzP?AR@|H?dqJA{RosE+unP!>5|0|7Zrjrz8Vu~<_eqY2X|HK%-V)Xrr#IA zb!|Qkp987I{n$_t*?f`g#Jrn+9(kFQvI9h!$BIkjbf#oT2{9J3|d+~hh@EZ zxEn5|h-+!N-Tt@KD?7srGHAE1G!8=~7wYmw>|9kxqv>?V@u0*;5v8&sqKFN#&oJ~O zB0rIGx{205l5aO1GbD>KYNGRF$!da9to%>a{|38f$K1a7qNACdJnhjx-~gu$oxM6f zE@OZ$T`-dXiWUDODl<&}Gbd~k{++!6#*Oi-$*roELx)p#9uoYS_&1IB0#b#%(+k?_ zLr3uMWH2YITZFR&qtlwyc1mq)6Fr7KmD!3ML6!L1q1|SCziF-0Lh9uo#D?W_2udxe z8^as1Gx*TrjR?bGLl`{a2cV3!=|7PCnH9k^fI|he1q{jyR(FqP_Qo6x5h7s0X|&5- z*$PDr*KBo$B~lY-wX{3F+=8&^ur4YG+9q8zu*m> zfPyUgg@Th0G9QrsD~ref2*8W>B@D+mj}fNC5y}RS5q4)RXO11c|66BVN4*L8+u(E7 zkivMR%l0T^6olBslpn?pRddh_k7(@>QhQz=v6|UiUIq`QG0mvkhqvq9U75X&ZI`?7 zBg&1xx*%b#yGUO))4ZHuhYPAgF1tp#@Ue(EMekMAP+2JFMyTc(ej>`;KBjX&_8=c_ zAv*VbbZfULC9|ci6gN`m5^;FYb+4Faq?_G5t{=TZ+>c~tg~093PxAj=ZJ@Nz<6Hmi z`Y?KLNkNPWYUh&yU?3X75R{i`8zq?(ONk;xR69>X0p_$Cnx_t5%OxaCG9WuxRLWy9 zzJ(wo!yUm+G_tGo=&V9E10jg1bnNL7WE^00mJnwQXU~i2;7wq_)RUE(T?X7Y6`~1h z`^jdNG8Fluu%a=GjOM6N-k2h&KO^FIB3NwZm?2NSHx)iR5Q>H!7>cb<9eJ4%W(d%c ziO|S4MuH60R%eZ$ub&N}Jti?4ZEG?Jb*)uf7)3jyrTK>?h5|!*ap5q3hND48T!(*c zdt_ZYta4N2Fw&xgO&xuz99o=wy(UUpcE=FHo55uCz1fdFwF59s;VorFGp2i^N_*nc zdD*1Fuyz1@IN8bL%Zw9?=wHac0YS(ysXJP^H$xFFBCdZq2CSfY={3({DNyDdAcA>j zKuGI18zHJPvjIg|y#36s)2VJ6m+14na*)~oubUf^r`v)y#~tZ{Dn>{zI>m#U47dfE zx!rvOtCkb$3RI2!UoWoj)}`$mV0b;HgQl9>KeVNbvhA}3uwx4;r^kEZc5-~c*B2CiXeV=!O*!jIDG_*q_CTla zfI*XOk_oJl5slV{5C54P!;ZY|2~4t2sIoT{6Lk{RlR|Ee+?jX?%^-kX(c2{yPnRdGncht=+&A)yiaziK)XII6g(_?Wy+Eg>KNtQFJ zW`CJ0R!LI<9I6FL?gGoEcyZjhiCyJ2%gAL)>>r$B$uI$GvmH?97V#=g9u#m)Y*VT( 
zJ}a(3Z4uph5}#;G)O~j3)^uoUM6O~x5Q%E5QuC1xA(dTuG8u0nz1>|Xd+MKi1x0wC zu#dx;-9o@Ny#n$T${bNPc&B{JEK28esl|1K{{zyF83W^f7Hoq=&Ish1Yd%29FvoE? z_A!fioQqTu%76Taz~_Y8|{*KL_=#z0Eyc@vsIl z`{Wl6>(s?yqncGm$Ke1Fp5hGj@M3~pa)DV$X}&TdB;Pz}D@9bh0og#*hseQBNd|K$ptzb2b!Jk@t)nOhOkzIB| zXMexW!wonZ>5jmLYY)~}rW4g8(;kWPt)a3R zd*2*t5)1-HWr9FI4D%gq7al(C1km8%J>bYz8CFrI0Tn~M=XhVP#gIC|J{w(W44g@9 z7=I1iFA?KWqe*xNF)8()bG@u}Wamnryuz)>NgRY%2!+FtP!?mwKhxYu^&H@-Z;xO! zAjN#hw;89;Jnd5%Z1|))Ajh9om;O)niPCM1U?J}ET5AEbgVMF7omnOZ&3orP#-4CG z2#<-Fi3K%?$0`(H%JkZo#H{*M-**-L$bSN376=}{bahVmA35^BvpLq#FNnQ^iZ;t#&Bh|M& z#JW_U*BLqNe=KXwxqi8%zS__rOgE;)`U)HO@*Mg-32o_WbQrRK`D-PkhWmWKy-nBI zrmZ_8+w}n&PAR5zkl2h;-7R0<)Tqn5C$*m@xg+76Ewz2FuI6FL^@K06vpw;o`#mb) zNF_G=7l{yIf5$!BtEeeqXP?R)7zMuIzar2*iKe^iMB4$i@N$`1bLT*Z)WXU!9C4q( zTSm3bTI304<|0N^WTI?NO0XDU%qm@n_Z^4Yi+=w}P>swa2m}zo7TlRtE$8itTi`3t zr!|)PZ_d{J1L)pmGL+lT?%JlHBqsiiz#tAaIcyWLB*|x)H8jOLRy*ZuV=DBb-)dgc z;2sGety`mZg4?$aOdezo@Ri8YSbh4pBIC1s5c5zYsABd>{+RSUy8^9b63~!1W#`niN7U3E?O?q_n)BRPfL1 z?s|#Kyd(z)R~2QF8diEis&-ijnj!mzA%!MNi_DNtK@OUKdXx+(gA{=M5cfLjh_K%= zL~7^;=SWJ1aU7vbhZ^t-at|GXbnKyw57DAiOusvlHGd>EX##=o! z$`F1SxrPZDZ|2iA^s^F?w!oM?F$HU&_uohfhJ%j5$@YTAd&vy`<3F;iGzbcMP$u8V ze~N{!*(1rnI zFvT>WjW?Ay+unVDAl)`h8KzFx4)-Cr^MJAiVQ@H7bgb*-9YO#C#s`uc|CMTZ{;EIY zGfU_VAB!Hf821m&kwbnX7>yz{c;kV(z-I@9@W{|55DDJ>H0z9?G`}e<;t-lKNZ)%f z`$y$)HMbFh5Ip8#lBk%qc!YyFf(3@$_@K}f`b6mGQ1yOld*i{iU#8LQ52VXMv%wmp z0W3W4+9H|tJP^yy|Nf(~p)&RE*x@-UBwr57?M*yEiXg;^sc3)%fv6)^f&mBwfR4U! 
z=Lko71|v@RVE@6{f%sW{JP7KLM#wXQ8KYO*8OD;e2OZw1^fDlaJ!qYefP{k+6tVO@ zF<>gNHV|@X8J=ZEXggkJ9lgirIY)H5k=l}#dx$bCLxEEf|*`FqZi6B&^2u^d3!=K!`){9G4E0ot(zHdhe1~peBcn- z>+_nt4zU6cDW%pNW@GII0Xz~mbf>Er82U%>( zc96%~3$e>kH~l9mw<*y82m(+qqJ#xd6rhe+2^K&YL!QP5n1N8@G*h8bmt)08_0AY5 z85zSzopE(6{#EO3)IidRKru7H9RwP!$hMCG#DFne4>+wiC>@_0d=41{t03HulxVCq z2Q!CfdV#=e!N2#XB@EWAz&*YnyA};vpF!(84Td;w^7urSLulebVDCIL@B{6>r2jFR Q-tR)*4_np(O~z&}0{2KPR{#J2 literal 0 HcmV?d00001 From 5e2854f7ff68083252c7cddc94f322cab9ad5419 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Jun 2026 16:09:22 -0400 Subject: [PATCH 02/10] [#70] Fix analyze CRC for cah:/ resources; add independent --skip-crc Recognize content-addressed stream paths (cah:/) produced by Unity 6.6 ContentDirectory builds: fold the path (which contains the content hash) into the CRC instead of opening the differently named resource file, which was failing with "Error opening resource file". Legacy CAB-*.resS/.resource streams still read their bytes for the CRC. Also addresses the performance issues from PR 66 without losing CRC coverage of external streams: - Fix UnityFileReader.ComputeCRC chunking (advance the file offset and handle the partial final chunk) so ranges larger than the buffer no longer produce a wrong CRC or over-read. - Fix the ProcessManagedReferenceData CRC size argument (stringSize + 4, not m_Offset + stringSize + 4). - Keep journal_mode = MEMORY (drop PR 66's ineffective WAL change). Add a --skip-crc option, fully independent of --skip-references: --skip-references now only skips reference extraction and no longer skips the CRC. The reference walk still resolves referenced object ids (so the CRC stays stable) but only inserts refs rows when extracting. Add a ComputeCRC unit test for buffer-boundary ranges and update the analyze documentation for the new flag semantics. Fixes #70. 
--- Analyzer/AnalyzerTool.cs | 2 + Analyzer/PPtrAndCrcProcessor.cs | 81 ++++++++++++------- Analyzer/SQLite/Handlers/ISQLiteHandler.cs | 1 + .../Parsers/AddressablesBuildLayoutParser.cs | 1 + .../SQLite/Parsers/SerializedFileParser.cs | 3 +- .../Writers/SerializedFileSQLiteWriter.cs | 30 ++++--- Documentation/analyzer.md | 2 +- Documentation/command-analyze.md | 31 ++++--- UnityDataTool/Program.cs | 36 ++++++--- UnityFileSystem.Tests/UnityFileSystemTests.cs | 19 +++++ UnityFileSystem/UnityFileReader.cs | 14 ++-- 11 files changed, 154 insertions(+), 66 deletions(-) diff --git a/Analyzer/AnalyzerTool.cs b/Analyzer/AnalyzerTool.cs index 4a8efe6..6bc6903 100644 --- a/Analyzer/AnalyzerTool.cs +++ b/Analyzer/AnalyzerTool.cs @@ -26,6 +26,7 @@ public int Analyze( string databaseName, string searchPattern, bool skipReferences, + bool skipCrc, bool verbose, bool noRecursion) { @@ -40,6 +41,7 @@ public int Analyze( { parser.Verbose = verbose; parser.SkipReferences = skipReferences; + parser.SkipCrc = skipCrc; parser.Init(writer.Connection); } diff --git a/Analyzer/PPtrAndCrcProcessor.cs b/Analyzer/PPtrAndCrcProcessor.cs index a9d5b13..07c1964 100644 --- a/Analyzer/PPtrAndCrcProcessor.cs +++ b/Analyzer/PPtrAndCrcProcessor.cs @@ -13,12 +13,18 @@ public class PPtrAndCrcProcessor : IDisposable { public delegate int CallbackDelegate(long objectId, int fileId, long pathId, string propertyPath, string propertyType); + // Content-addressed stream paths (new ContentDirectory build output) look like + // "cah:/". The hash already identifies the content, so the path itself is + // folded into the CRC instead of opening the (differently named) resource file. 
+ private const string ContentAddressedPrefix = "cah:/"; + private SerializedFile m_SerializedFile; private UnityFileReader m_Reader; private long m_Offset; private long m_ObjectId; private uint m_Crc32; private string m_Folder; + private bool m_SkipCrc; private StringBuilder m_StringBuilder = new(); private byte[] m_pptrBytes = new byte[4]; @@ -27,11 +33,12 @@ public class PPtrAndCrcProcessor : IDisposable private Dictionary m_resourceReaders = new(); public PPtrAndCrcProcessor(SerializedFile serializedFile, UnityFileReader reader, string folder, - CallbackDelegate callback) + bool skipCrc, CallbackDelegate callback) { m_SerializedFile = serializedFile; m_Reader = reader; m_Folder = folder; + m_SkipCrc = skipCrc; m_Callback = callback; } @@ -79,6 +86,32 @@ private UnityFileReader GetResourceReader(string filename) return reader; } + // Extends the CRC with a range of the main serialized file, unless CRC is disabled. + private void AppendCrc(long offset, int size) + { + if (!m_SkipCrc) + m_Crc32 = m_Reader.ComputeCRC(offset, size, m_Crc32); + } + + // Extends the CRC with the content of an external stream segment (StreamingInfo / + // StreamedResource), unless CRC is disabled. Content-addressed paths fold in the path + // string; other paths read the actual bytes from the companion resource file. 
+ private void AppendStreamCrc(long offset, int size, string path) + { + if (m_SkipCrc) + return; + + if (path.StartsWith(ContentAddressedPrefix)) + { + m_Crc32 = Crc32Algorithm.Append(m_Crc32, Encoding.UTF8.GetBytes(path)); + return; + } + + var resourceFile = GetResourceReader(path); + if (resourceFile != null) + m_Crc32 = resourceFile.ComputeCRC(offset, size, m_Crc32); + } + public uint Process(long objectId, long offset, TypeTreeNode node) { m_Offset = offset; @@ -99,7 +132,7 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) { if (node.IsBasicType) { - m_Crc32 = m_Reader.ComputeCRC(m_Offset, node.Size, m_Crc32); + AppendCrc(m_Offset, node.Size); m_Offset += node.Size; } else if (node.IsArray) @@ -136,12 +169,7 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) if (size > 0) { - var resourceFile = GetResourceReader(filename); - - if (resourceFile != null) - { - m_Crc32 = resourceFile.ComputeCRC(offset, size, m_Crc32); - } + AppendStreamCrc(offset, size, filename); } } else if (node.Type == "StreamedResource") @@ -162,19 +190,14 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) if (size > 0) { - var resourceFile = GetResourceReader(filename); - - if (resourceFile != null) - { - m_Crc32 = resourceFile.ComputeCRC(offset, size, m_Crc32); - } + AppendStreamCrc(offset, size, filename); } } else if (node.CSharpType == typeof(string)) { var prevOffset = m_Offset; m_Offset += m_Reader.ReadInt32(m_Offset) + 4; - m_Crc32 = m_Reader.ComputeCRC(prevOffset, (int)(m_Offset - prevOffset), m_Crc32); + AppendCrc(prevOffset, (int)(m_Offset - prevOffset)); } else if (node.IsManagedReferenceRegistry) { @@ -210,12 +233,12 @@ private void ProcessArray(TypeTreeNode node, bool isManagedReferenceRegistry, bo if (dataNode.IsBasicType) { var arraySize = m_Reader.ReadInt32(m_Offset); - m_Crc32 = m_Reader.ComputeCRC(m_Offset, dataNode.Size * arraySize + 4, m_Crc32); + AppendCrc(m_Offset, 
dataNode.Size * arraySize + 4); m_Offset += dataNode.Size * arraySize + 4; } else { - m_Crc32 = m_Reader.ComputeCRC(m_Offset, 4, m_Crc32); + AppendCrc(m_Offset, 4); var arraySize = m_Reader.ReadInt32(m_Offset); m_Offset += 4; @@ -239,7 +262,7 @@ private void ProcessArray(TypeTreeNode node, bool isManagedReferenceRegistry, bo // First child is rid. long rid = m_Reader.ReadInt64(m_Offset); - m_Crc32 = m_Reader.ComputeCRC(m_Offset, 8, m_Crc32); + AppendCrc(m_Offset, 8); m_Offset += 8; ProcessManagedReferenceData(dataNode.Children[1], dataNode.Children[2], rid); @@ -255,7 +278,7 @@ private void ProcessManagedReferenceRegistry(TypeTreeNode node) // First child is version number. var version = m_Reader.ReadInt32(m_Offset); - m_Crc32 = m_Reader.ComputeCRC(m_Offset, node.Children[0].Size, m_Crc32); + AppendCrc(m_Offset, node.Children[0].Size); m_Offset += node.Children[0].Size; if (version == 1) @@ -301,19 +324,19 @@ bool ProcessManagedReferenceData(TypeTreeNode refTypeNode, TypeTreeNode referenc throw new Exception("Invalid ReferencedManagedType"); var stringSize = m_Reader.ReadInt32(m_Offset); - m_Crc32 = m_Reader.ComputeCRC(m_Offset, (int)(m_Offset + stringSize + 4), m_Crc32); + AppendCrc(m_Offset, stringSize + 4); var className = m_Reader.ReadString(m_Offset + 4, stringSize); m_Offset += stringSize + 4; m_Offset = (m_Offset + 3) & ~(3); stringSize = m_Reader.ReadInt32(m_Offset); - m_Crc32 = m_Reader.ComputeCRC(m_Offset, (int)(m_Offset + stringSize + 4), m_Crc32); + AppendCrc(m_Offset, stringSize + 4); var namespaceName = m_Reader.ReadString(m_Offset + 4, stringSize); m_Offset += stringSize + 4; m_Offset = (m_Offset + 3) & ~(3); stringSize = m_Reader.ReadInt32(m_Offset); - m_Crc32 = m_Reader.ComputeCRC(m_Offset, (int)(m_Offset + stringSize + 4), m_Crc32); + AppendCrc(m_Offset, stringSize + 4); var assemblyName = m_Reader.ReadString(m_Offset + 4, stringSize); m_Offset += stringSize + 4; m_Offset = (m_Offset + 3) & ~(3); @@ -347,11 +370,15 @@ private void 
ExtractPPtr(string referencedType) if (fileId != 0 || pathId != 0) { var refId = m_Callback(m_ObjectId, fileId, pathId, m_StringBuilder.ToString(), referencedType); - m_pptrBytes[0] = (byte)(refId >> 24); - m_pptrBytes[1] = (byte)(refId >> 16); - m_pptrBytes[2] = (byte)(refId >> 8); - m_pptrBytes[3] = (byte)(refId); - m_Crc32 = Crc32Algorithm.Append(m_Crc32, m_pptrBytes); + + if (!m_SkipCrc) + { + m_pptrBytes[0] = (byte)(refId >> 24); + m_pptrBytes[1] = (byte)(refId >> 16); + m_pptrBytes[2] = (byte)(refId >> 8); + m_pptrBytes[3] = (byte)(refId); + m_Crc32 = Crc32Algorithm.Append(m_Crc32, m_pptrBytes); + } } } } diff --git a/Analyzer/SQLite/Handlers/ISQLiteHandler.cs b/Analyzer/SQLite/Handlers/ISQLiteHandler.cs index 147e15e..2026d56 100644 --- a/Analyzer/SQLite/Handlers/ISQLiteHandler.cs +++ b/Analyzer/SQLite/Handlers/ISQLiteHandler.cs @@ -29,4 +29,5 @@ public interface ISQLiteFileParser : IDisposable void Parse(string filename); public bool Verbose { get; set; } public bool SkipReferences { get; set; } + public bool SkipCrc { get; set; } } diff --git a/Analyzer/SQLite/Parsers/AddressablesBuildLayoutParser.cs b/Analyzer/SQLite/Parsers/AddressablesBuildLayoutParser.cs index 4ac13e6..4941dae 100644 --- a/Analyzer/SQLite/Parsers/AddressablesBuildLayoutParser.cs +++ b/Analyzer/SQLite/Parsers/AddressablesBuildLayoutParser.cs @@ -15,6 +15,7 @@ public class AddressablesBuildLayoutParser : ISQLiteFileParser public bool Verbose { get; set; } public bool SkipReferences { get; set; } + public bool SkipCrc { get; set; } public void Dispose() { diff --git a/Analyzer/SQLite/Parsers/SerializedFileParser.cs b/Analyzer/SQLite/Parsers/SerializedFileParser.cs index dcb0128..c2d232c 100644 --- a/Analyzer/SQLite/Parsers/SerializedFileParser.cs +++ b/Analyzer/SQLite/Parsers/SerializedFileParser.cs @@ -15,6 +15,7 @@ public class SerializedFileParser : ISQLiteFileParser public bool Verbose { get; set; } public bool SkipReferences { get; set; } + public bool SkipCrc { get; set; } public 
bool CanParse(string filename) { @@ -36,7 +37,7 @@ public void Dispose() public void Init(SqliteConnection db) { - m_Writer = new SerializedFileSQLiteWriter(db, SkipReferences); + m_Writer = new SerializedFileSQLiteWriter(db, SkipReferences, SkipCrc); } public void Parse(string filename) diff --git a/Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs b/Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs index f91bcd4..e569813 100644 --- a/Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs +++ b/Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs @@ -19,6 +19,7 @@ public class SerializedFileSQLiteWriter : IDisposable private int m_NextAssetBundleId = 0; private bool m_SkipReferences; + private bool m_SkipCrc; private IdProvider m_SerializedFileIdProvider = new(); private ObjectIdProvider m_ObjectIdProvider = new(); @@ -54,11 +55,12 @@ public class SerializedFileSQLiteWriter : IDisposable private SqliteConnection m_Database; private SqliteCommand m_LastId = new SqliteCommand(); private SqliteTransaction m_CurrentTransaction = null; - public SerializedFileSQLiteWriter(SqliteConnection database, bool skipReferences) + public SerializedFileSQLiteWriter(SqliteConnection database, bool skipReferences, bool skipCrc) { m_Initialized = false; m_Database = database; m_SkipReferences = skipReferences; + m_SkipCrc = skipCrc; } public void Init() @@ -116,7 +118,7 @@ public void WriteSerializedFile(string relativePath, string fullPath, string con { using var sf = UnityFileSystem.OpenSerializedFile(fullPath); using var reader = new UnityFileReader(fullPath, 64 * 1024 * 1024); - using var pptrReader = new PPtrAndCrcProcessor(sf, reader, containingFolder, AddReference); + using var pptrReader = new PPtrAndCrcProcessor(sf, reader, containingFolder, m_SkipCrc, AddReference); int serializedFileId = m_SerializedFileIdProvider.GetId(Path.GetFileName(fullPath).ToLower()); int sceneId = -1; @@ -228,7 +230,10 @@ public void WriteSerializedFile(string relativePath, string fullPath, 
string con m_AddObjectCommand.SetValue("game_object", ""); } - if (!m_SkipReferences) + // The walk both extracts references and accumulates the CRC, so it is needed + // unless both are disabled. When CRC is on but references are off, the walk + // still resolves referenced object ids (AddReference skips the insert). + if (!m_SkipReferences || !m_SkipCrc) { crc32 = pptrReader.Process(currentObjectId, offset, root); } @@ -266,13 +271,20 @@ public void WriteSerializedFile(string relativePath, string fullPath, string con private int AddReference(long objectId, int fileId, long pathId, string propertyPath, string propertyType) { + // Always resolve the id so the CRC stays stable; only persist the row when references + // are being extracted. var referencedObjectId = m_ObjectIdProvider.GetId((m_LocalToDbFileId[fileId], pathId)); - m_AddReferenceCommand.SetTransaction(m_CurrentTransaction); - m_AddReferenceCommand.SetValue("object", objectId); - m_AddReferenceCommand.SetValue("referenced_object", referencedObjectId); - m_AddReferenceCommand.SetValue("property_path", propertyPath); - m_AddReferenceCommand.SetValue("property_type", propertyType); - m_AddReferenceCommand.ExecuteNonQuery(); + + if (!m_SkipReferences) + { + m_AddReferenceCommand.SetTransaction(m_CurrentTransaction); + m_AddReferenceCommand.SetValue("object", objectId); + m_AddReferenceCommand.SetValue("referenced_object", referencedObjectId); + m_AddReferenceCommand.SetValue("property_path", propertyPath); + m_AddReferenceCommand.SetValue("property_type", propertyType); + m_AddReferenceCommand.ExecuteNonQuery(); + } + return referencedObjectId; } diff --git a/Documentation/analyzer.md b/Documentation/analyzer.md index 2e95ed3..dc6056e 100644 --- a/Documentation/analyzer.md +++ b/Documentation/analyzer.md @@ -47,7 +47,7 @@ case, Unity will include the asset in all the AssetBundles with a reference to i view_potential_duplicates provides the number of instances and the total size of the potentially duplicated 
assets. It also lists all the AssetBundles where the asset was found. -If the skipReferences option is used, there will be a lot of false positives in that view. Otherwise, +If the `--skip-crc` option is used, there will be a lot of false positives in that view. Otherwise, it should be very accurate because CRCs are used to determine if objects are identical. ## asset_view (AssetBundleProcessor) diff --git a/Documentation/command-analyze.md b/Documentation/command-analyze.md index 78692d8..b4a08f7 100644 --- a/Documentation/command-analyze.md +++ b/Documentation/command-analyze.md @@ -13,7 +13,8 @@ UnityDataTool analyze [options] | `` | Path to folder containing files to analyze | *(required)* | | `-o, --output-file ` | Output database filename | `database.db` | | `-p, --search-pattern ` | File search pattern (`*` and `?` supported) | `*` | -| `-s, --skip-references` | Skip CRC and reference extraction (faster, smaller DB) | `false` | +| `-s, --skip-references` | Do not extract references (smaller DB, no `refs` table). CRC is still computed. | `false` | +| `--skip-crc` | Skip the CRC32 checksum calculation (faster; `objects.crc32` will be 0) | `false` | | `-v, --verbose` | Show more information during analysis | `false` | | `--no-recurse` | Do not recurse into sub-directories | `false` | | `-d, --typetree-data ` | Load an external TypeTree data file before processing (Unity 6.5+) | — | @@ -30,9 +31,9 @@ Analyze only `.bundle` files and specify a custom database name: UnityDataTool analyze /path/to/asset/bundles -o my_database.db -p "*.bundle" ``` -Fast analysis (skip reference tracking): +Fastest analysis (skip both reference extraction and CRC): ```bash -UnityDataTool analyze /path/to/bundles -s +UnityDataTool analyze /path/to/bundles --skip-references --skip-crc ``` See also [Analyze Examples](../../Documentation/analyze-examples.md). 
@@ -121,23 +122,27 @@ See [Comparing Builds](../../Documentation/comparing-builds.md) for strategies t ### Slow Analyze times, large output database -Consider using the `--skip-references` argument. +Two independent flags reduce analyze time and database size: -A real life analyze of a big Addressables build shows how large a difference this can make: +* `--skip-crc` skips the CRC32 calculation. This is usually the largest time saver, because computing a CRC requires reading the full content of every object, including large texture, mesh and audio data in companion `.resS`/`.resource` files. +* `--skip-references` skips reference extraction, which is the largest contributor to database size (the `refs` table). The references are not needed for core asset inventory and size information. -* 208 seconds and producted a 500MB database (not specifying --skip-reference) -* 9 seconds and produced a 68 MB file (with --skip-reference) +For the fastest, smallest result, combine them. -The references are not needed for core asset inventory and size information. +A real life analyze of a big Addressables build, skipping both references and CRC, shows how large a difference this can make: -Note: When specifying `--skip-reference` some functionality is lost: +* 208 seconds and produced a 500MB database (default) +* 9 seconds and produced a 68 MB file (with `--skip-references --skip-crc`) + +When `--skip-references` is used, some functionality is lost: * the `find-refs` command will not work * `view_material_shader_refs` and `view_material_texture_refs` will be empty +* `script_object_view` will be empty * Queries that look at the relationship between objects will not work. For example the refs table is required to link between a `MonoBehaviour` and its `MonoScript`. -* The `objects.crc32` column will be NULL/0 for all objects. 
This means: - * No detection of identical objects by content hash (See [Comparing Builds](../../Documentation/comparing-builds.md)) - * The `view_potential_duplicates` view relies partially on CRC32 to distinguish true duplicates -Future work: The refs table contains a lot of repeated strings and could be made smaller and more efficient. It might also be prudent to control the CRC32 calculation using an independent flag. +When `--skip-crc` is used, the `objects.crc32` column will be 0 for all objects. This means: + +* No detection of identical objects by content hash (See [Comparing Builds](../../Documentation/comparing-builds.md)) +* The `view_potential_duplicates` view relies partially on CRC32 to distinguish true duplicates diff --git a/UnityDataTool/Program.cs b/UnityDataTool/Program.cs index 2d5d591..9a0ef04 100644 --- a/UnityDataTool/Program.cs +++ b/UnityDataTool/Program.cs @@ -1,5 +1,6 @@ using System; using System.CommandLine; +using System.CommandLine.Invocation; using System.IO; using System.Threading.Tasks; using UnityDataTools.Analyzer; @@ -41,7 +42,8 @@ static Command BuildAnalyzeCommand() { var pathArg = new Argument("path", "The path to the directory containing the files to analyze").ExistingOnly(); var oOpt = new Option(aliases: new[] { "--output-file", "-o" }, description: "Filename of the output database", getDefaultValue: () => "database.db"); - var sOpt = new Option(aliases: new[] { "--skip-references", "-s" }, description: "Skip CRC and do not extract references"); + var sOpt = new Option(aliases: new[] { "--skip-references", "-s" }, description: "Do not extract references (CRC is still computed unless --skip-crc is also given)"); + var scOpt = new Option(aliases: new[] { "--skip-crc" }, description: "Skip CRC checksum calculation"); var rOpt = new Option(aliases: new[] { "--extract-references", "-r" }) { IsHidden = true }; var pOpt = new Option(aliases: new[] { "--search-pattern", "-p" }, description: "File search pattern", getDefaultValue: 
() => "*"); var vOpt = new Option(aliases: new[] { "--verbose", "-v" }, description: "Verbose output"); @@ -53,6 +55,7 @@ static Command BuildAnalyzeCommand() pathArg, oOpt, sOpt, + scOpt, rOpt, pOpt, vOpt, @@ -61,14 +64,28 @@ static Command BuildAnalyzeCommand() }; analyzeCommand.AddAlias("analyse"); - analyzeCommand.SetHandler( - (DirectoryInfo di, string o, bool s, bool r, string p, bool v, bool noRecurse, FileInfo d) => + // Bound via InvocationContext because the option count exceeds the strongly-typed + // SetHandler overloads. + analyzeCommand.SetHandler((InvocationContext context) => + { + var d = context.ParseResult.GetValueForOption(dOpt); + var ttResult = LoadTypeTreeDataFile(d); + if (ttResult != 0) { - var ttResult = LoadTypeTreeDataFile(d); - if (ttResult != 0) return Task.FromResult(ttResult); - return Task.FromResult(HandleAnalyze(di, o, s, r, p, v, noRecurse)); - }, - pathArg, oOpt, sOpt, rOpt, pOpt, vOpt, recurseOpt, dOpt); + context.ExitCode = ttResult; + return; + } + + context.ExitCode = HandleAnalyze( + context.ParseResult.GetValueForArgument(pathArg), + context.ParseResult.GetValueForOption(oOpt), + context.ParseResult.GetValueForOption(sOpt), + context.ParseResult.GetValueForOption(scOpt), + context.ParseResult.GetValueForOption(rOpt), + context.ParseResult.GetValueForOption(pOpt), + context.ParseResult.GetValueForOption(vOpt), + context.ParseResult.GetValueForOption(recurseOpt)); + }); return analyzeCommand; } @@ -293,6 +310,7 @@ static int HandleAnalyze( DirectoryInfo path, string outputFile, bool skipReferences, + bool skipCrc, bool extractReferences, string searchPattern, bool verbose, @@ -305,7 +323,7 @@ static int HandleAnalyze( Console.WriteLine("WARNING: --extract-references, -r option is deprecated (references are now extracted by default)"); } - return analyzer.Analyze(path.FullName, outputFile, searchPattern, skipReferences, verbose, noRecurse); + return analyzer.Analyze(path.FullName, outputFile, searchPattern, skipReferences, 
skipCrc, verbose, noRecurse); } static int HandleFindReferences(FileInfo databasePath, string outputFile, long? objectId, string objectName, string objectType, bool findAll) diff --git a/UnityFileSystem.Tests/UnityFileSystemTests.cs b/UnityFileSystem.Tests/UnityFileSystemTests.cs index 47fd784..7001a91 100644 --- a/UnityFileSystem.Tests/UnityFileSystemTests.cs +++ b/UnityFileSystem.Tests/UnityFileSystemTests.cs @@ -244,6 +244,25 @@ public void ReadFile_InvalidHandle_ThrowsException() Assert.Throws(() => file.Read(10, new byte[10])); } + // Ranges that cross the internal buffer boundary (and a partial final chunk) must + // produce the same CRC as a single-buffer read. TextFile.txt is 21 bytes; an 8-byte + // buffer forces three chunks (8 + 8 + 5). + [TestCase(0, 21)] // whole file, partial final chunk + [TestCase(0, 16)] // exact multiple of the buffer size + [TestCase(3, 15)] // unaligned start, crosses two boundaries + [TestCase(0, 8)] // exactly one buffer + [TestCase(2, 5)] // entirely within one buffer + public void ComputeCRC_RangeCrossingBuffer_MatchesSingleBufferRead(long offset, int size) + { + var path = Path.Combine(Context.TestDataFolder, "TextFile.txt"); + + using var singleBufferReader = new UnityFileReader(path, 1024 * 1024); + var expected = singleBufferReader.ComputeCRC(offset, size); + + using var smallBufferReader = new UnityFileReader(path, 8); + Assert.AreEqual(expected, smallBufferReader.ComputeCRC(offset, size)); + } + [Test] public void OpenFile_ArchiveFileSystem_ReturnsFile() { diff --git a/UnityFileSystem/UnityFileReader.cs b/UnityFileSystem/UnityFileReader.cs index bf46145..73221be 100644 --- a/UnityFileSystem/UnityFileReader.cs +++ b/UnityFileSystem/UnityFileReader.cs @@ -117,16 +117,18 @@ public byte ReadUInt8(long fileOffset) return m_Buffer[offset]; } + // Computes the CRC32 over a contiguous range, reading the file in buffer-sized chunks. 
public uint ComputeCRC(long fileOffset, int size, uint crc32 = 0) { - var readSize = size > m_Buffer.Length ? m_Buffer.Length : size; - var readBytes = 0; + var remaining = size; - while (readBytes < size) + while (remaining > 0) { - var offset = GetBufferOffset(fileOffset, readSize); - crc32 = Crc32Algorithm.Append(crc32, m_Buffer, offset, readSize); - readBytes += readSize; + var chunk = (int)Math.Min(m_Buffer.Length, remaining); + var offset = GetBufferOffset(fileOffset, chunk); + crc32 = Crc32Algorithm.Append(crc32, m_Buffer, offset, chunk); + fileOffset += chunk; + remaining -= chunk; } return crc32; From b169ea626204ba5a9dc5934729e23cb39ec86079 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Jun 2026 16:35:09 -0400 Subject: [PATCH 03/10] [#70] Match cah:/ prefix case-insensitively; document whole-file assumption The content-addressed scheme casing is not guaranteed, so compare the prefix with OrdinalIgnoreCase. Also document why path-only CRC is correct: a cah:/ stream always references the entire resource file, and the offset/size fields only remain for backward compatibility with the older format that packed multiple resources into a single file. --- Analyzer/PPtrAndCrcProcessor.cs | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/Analyzer/PPtrAndCrcProcessor.cs b/Analyzer/PPtrAndCrcProcessor.cs index 07c1964..be71829 100644 --- a/Analyzer/PPtrAndCrcProcessor.cs +++ b/Analyzer/PPtrAndCrcProcessor.cs @@ -16,6 +16,7 @@ public class PPtrAndCrcProcessor : IDisposable // Content-addressed stream paths (new ContentDirectory build output) look like // "cah:/". The hash already identifies the content, so the path itself is // folded into the CRC instead of opening the (differently named) resource file. + // Matched case-insensitively since the scheme casing is not guaranteed. 
private const string ContentAddressedPrefix = "cah:/"; private SerializedFile m_SerializedFile; @@ -101,7 +102,13 @@ private void AppendStreamCrc(long offset, int size, string path) if (m_SkipCrc) return; - if (path.StartsWith(ContentAddressedPrefix)) + // A cah:/ stream always references the entire resource file: the hash in the path + // is the hash of the whole file, so the path uniquely identifies the bytes and we + // fold it into the CRC rather than reading them. The offset/size fields only exist + // for backward compatibility with the older output format that packed multiple + // resources into one file; ContentDirectory builds never do this (offset is 0 and + // size is the full file), which is why ignoring offset/size here is correct. + if (path.StartsWith(ContentAddressedPrefix, StringComparison.OrdinalIgnoreCase)) { m_Crc32 = Crc32Algorithm.Append(m_Crc32, Encoding.UTF8.GetBytes(path)); return; From 84e4394215a789d194ab3cc687aca6ddb080889e Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Jun 2026 17:42:19 -0400 Subject: [PATCH 04/10] [#70] Clarify PPtrAndCrcProcessor comments (CRC role, API, fields) Rewrite the class comment to cover both responsibilities (PPtr extraction and CRC fingerprinting, including external stream content), document the constructor arguments and Process(), comment each member variable, and group the fields by lifetime (constructor config, reused caches/scratch, and per-object processing state). --- Analyzer/PPtrAndCrcProcessor.cs | 47 ++++++++++++++++++++++++--------- 1 file changed, 34 insertions(+), 13 deletions(-) diff --git a/Analyzer/PPtrAndCrcProcessor.cs b/Analyzer/PPtrAndCrcProcessor.cs index be71829..c83592d 100644 --- a/Analyzer/PPtrAndCrcProcessor.cs +++ b/Analyzer/PPtrAndCrcProcessor.cs @@ -7,8 +7,13 @@ namespace UnityDataTools.Analyzer; -// This class is used to extract all the PPtrs in a serialized object. It executes a callback whenever a PPtr is found. 
-// It provides a string representing the property path of the property (e.g. "m_MyObject.m_MyArray[2].m_PPtrProperty"). +// Walks the TypeTree of a serialized object to do two things in a single pass: +// 1. Extract every PPtr (object reference). A callback is executed for each one, receiving the +// property path that leads to it (e.g. "m_MyObject.m_MyArray[2].m_PPtrProperty"). +// 2. Accumulate a CRC32 over the object's serialized bytes, including the content of external +// streams (texture/mesh/audio data stored in companion .resS/.resource files). This CRC is a +// content fingerprint used to detect whether two objects are identical. +// CRC computation can be disabled (skipCrc) while still extracting references. public class PPtrAndCrcProcessor : IDisposable { public delegate int CallbackDelegate(long objectId, int fileId, long pathId, string propertyPath, string propertyType); @@ -19,20 +24,33 @@ public class PPtrAndCrcProcessor : IDisposable // Matched case-insensitively since the scheme casing is not guaranteed. private const string ContentAddressedPrefix = "cah:/"; - private SerializedFile m_SerializedFile; - private UnityFileReader m_Reader; - private long m_Offset; - private long m_ObjectId; - private uint m_Crc32; - private string m_Folder; - private bool m_SkipCrc; - private StringBuilder m_StringBuilder = new(); - private byte[] m_pptrBytes = new byte[4]; - - private CallbackDelegate m_Callback; + // Configuration shared across all objects, set once in the constructor. 
+ private SerializedFile m_SerializedFile; // file being analyzed; used to resolve referenced managed type trees + private UnityFileReader m_Reader; // reader over the serialized file holding the object data + private string m_Folder; // directory of the serialized file; used to find companion resource files + private bool m_SkipCrc; // when true, skip CRC computation (references are still extracted) + private CallbackDelegate m_Callback; // invoked for each PPtr; returns the referenced object's id + // Readers for external resource (.resS/.resource) files, opened on demand, reused across + // objects, and disposed in Dispose(). private Dictionary<string, UnityFileReader> m_resourceReaders = new(); + // Reusable scratch buffers, kept as fields to avoid allocating per object/property. + private StringBuilder m_StringBuilder = new(); // builds the current property path during the walk + private byte[] m_pptrBytes = new byte[4]; // holds a referenced object id while feeding it to the CRC + + // State for the object currently being processed, (re)initialized by each Process() call. + private long m_Offset; // current read position within m_Reader + private long m_ObjectId; // analyzer id of the object being processed, passed to the callback + private uint m_Crc32; // CRC accumulated so far for this object + + // serializedFile: the file whose objects are analyzed (used to resolve referenced managed types). + // reader: reader over that file's bytes; Process() walks each object through it. + // folder: directory containing the serialized file; companion .resS/.resource files are + // looked up here when a non-content-addressed external stream contributes to the CRC. + // skipCrc: when true, the tree is still walked to emit references but no CRC is computed. + // callback: called for every PPtr found; its return value (the referenced object's id) is + // folded into the CRC.
public PPtrAndCrcProcessor(SerializedFile serializedFile, UnityFileReader reader, string folder, bool skipCrc, CallbackDelegate callback) { @@ -119,6 +137,9 @@ private void AppendStreamCrc(long offset, int size, string path) m_Crc32 = resourceFile.ComputeCRC(offset, size, m_Crc32); } + // Walks the serialized object rooted at `node`, whose data starts at `offset` in the reader, + // emitting every PPtr through the callback. Returns a CRC32 fingerprint of the object's content + // (0 when CRC is disabled). `objectId` is the analyzer id of this object, forwarded to the callback. public uint Process(long objectId, long offset, TypeTreeNode node) { m_Offset = offset; From 541a6e91b0cb0d6e36f747bdf7e7a535b94a6ce1 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Jun 2026 19:16:49 -0400 Subject: [PATCH 05/10] PPtrAndCrcProcessor.cs Reorder methods to a more logical ordering. --- Analyzer/PPtrAndCrcProcessor.cs | 133 ++++++++++++++++---------------- 1 file changed, 67 insertions(+), 66 deletions(-) diff --git a/Analyzer/PPtrAndCrcProcessor.cs b/Analyzer/PPtrAndCrcProcessor.cs index c83592d..fcd110f 100644 --- a/Analyzer/PPtrAndCrcProcessor.cs +++ b/Analyzer/PPtrAndCrcProcessor.cs @@ -71,72 +71,6 @@ public void Dispose() m_resourceReaders.Clear(); } - private UnityFileReader GetResourceReader(string filename) - { - var slashPos = filename.LastIndexOf('/'); - if (slashPos > 0) - { - filename = filename.Remove(0, slashPos + 1); - } - - if (!m_resourceReaders.TryGetValue(filename, out var reader)) - { - try - { - reader = new UnityFileReader("archive:/" + filename, 4 * 1024 * 1024); - } - catch (Exception) - { - try - { - reader = new UnityFileReader(Path.Join(m_Folder, filename), 4 * 1024 * 1024); - } - catch (Exception) - { - Console.Error.WriteLine(); - Console.Error.WriteLine($"Error opening resource file {filename}"); - reader = null; - } - } - - m_resourceReaders[filename] = reader; - } - - return reader; - } - - // Extends the CRC with a range of the main 
serialized file, unless CRC is disabled. - private void AppendCrc(long offset, int size) - { - if (!m_SkipCrc) - m_Crc32 = m_Reader.ComputeCRC(offset, size, m_Crc32); - } - - // Extends the CRC with the content of an external stream segment (StreamingInfo / - // StreamedResource), unless CRC is disabled. Content-addressed paths fold in the path - // string; other paths read the actual bytes from the companion resource file. - private void AppendStreamCrc(long offset, int size, string path) - { - if (m_SkipCrc) - return; - - // A cah:/ stream always references the entire resource file: the hash in the path - // is the hash of the whole file, so the path uniquely identifies the bytes and we - // fold it into the CRC rather than reading them. The offset/size fields only exist - // for backward compatibility with the older output format that packed multiple - // resources into one file; ContentDirectory builds never do this (offset is 0 and - // size is the full file), which is why ignoring offset/size here is correct. - if (path.StartsWith(ContentAddressedPrefix, StringComparison.OrdinalIgnoreCase)) - { - m_Crc32 = Crc32Algorithm.Append(m_Crc32, Encoding.UTF8.GetBytes(path)); - return; - } - - var resourceFile = GetResourceReader(path); - if (resourceFile != null) - m_Crc32 = resourceFile.ComputeCRC(offset, size, m_Crc32); - } - // Walks the serialized object rooted at `node`, whose data starts at `offset` in the reader, // emitting every PPtr through the callback. Returns a CRC32 fingerprint of the object's content // (0 when CRC is disabled). `objectId` is the analyzer id of this object, forwarded to the callback. @@ -409,4 +343,71 @@ private void ExtractPPtr(string referencedType) } } } + + // Extends the CRC with a range of the main serialized file, unless CRC is disabled. 
+ private void AppendCrc(long offset, int size) + { + if (!m_SkipCrc) + m_Crc32 = m_Reader.ComputeCRC(offset, size, m_Crc32); + } + + // Extends the CRC with the content of an external stream segment (StreamingInfo / + // StreamedResource), unless CRC is disabled. Content-addressed paths fold in the path + // string; other paths read the actual bytes from the companion resource file. + private void AppendStreamCrc(long offset, int size, string path) + { + if (m_SkipCrc) + return; + + // A cah:/ stream always references the entire resource file: the hash in the path + // is the hash of the whole file, so the path uniquely identifies the bytes and we + // fold it into the CRC rather than reading them. The offset/size fields only exist + // for backward compatibility with the older output format that packed multiple + // resources into one file; ContentDirectory builds never do this (offset is 0 and + // size is the full file), which is why ignoring offset/size here is correct. + if (path.StartsWith(ContentAddressedPrefix, StringComparison.OrdinalIgnoreCase)) + { + m_Crc32 = Crc32Algorithm.Append(m_Crc32, Encoding.UTF8.GetBytes(path)); + return; + } + + var resourceFile = GetResourceReader(path); + if (resourceFile != null) + m_Crc32 = resourceFile.ComputeCRC(offset, size, m_Crc32); + } + + private UnityFileReader GetResourceReader(string filename) + { + var slashPos = filename.LastIndexOf('/'); + if (slashPos > 0) + { + filename = filename.Remove(0, slashPos + 1); + } + + if (!m_resourceReaders.TryGetValue(filename, out var reader)) + { + try + { + reader = new UnityFileReader("archive:/" + filename, 4 * 1024 * 1024); + } + catch (Exception) + { + try + { + reader = new UnityFileReader(Path.Join(m_Folder, filename), 4 * 1024 * 1024); + } + catch (Exception) + { + Console.Error.WriteLine(); + Console.Error.WriteLine($"Error opening resource file {filename}"); + reader = null; + } + } + + m_resourceReaders[filename] = reader; + } + + return reader; + } + } From 
0b4ceb0674fbe0e86747bb6888a6f04d15eaefd7 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Jun 2026 19:33:09 -0400 Subject: [PATCH 06/10] [#70] Document ManagedReferenceRegistry parsing and PPtr callback Refine the CallbackDelegate comment to clarify the id spaces (objectId and the return value are analyzer/database ids; fileId/pathId are raw PPtr fields) and document the return value. Add a block comment in front of ProcessManagedReferenceRegistry with C# and YAML examples showing the [SerializeReference] "references:" layout, and explain throughout that each entry's data follows the referenced type's own TypeTree (obtained via GetRefTypeTypeTreeRoot) - the reason walking the registry jumps between type trees and is more involved than the rest of the object. Also document the version 1/2 layouts, the terminating sentinel, the FQN string reads, and the registry re-entry guard. --- Analyzer/PPtrAndCrcProcessor.cs | 79 +++++++++++++++++-- .../Writers/SerializedFileSQLiteWriter.cs | 1 + 2 files changed, 75 insertions(+), 5 deletions(-) diff --git a/Analyzer/PPtrAndCrcProcessor.cs b/Analyzer/PPtrAndCrcProcessor.cs index fcd110f..edcee7d 100644 --- a/Analyzer/PPtrAndCrcProcessor.cs +++ b/Analyzer/PPtrAndCrcProcessor.cs @@ -16,6 +16,14 @@ namespace UnityDataTools.Analyzer; // CRC computation can be disabled (skipCrc) while still extracting references. public class PPtrAndCrcProcessor : IDisposable { + // Invoked for each PPtr (object reference) found while walking an object. + // objectId - analyzer/database id of the object that contains the reference (the source) + // fileId - PPtr m_FileID: index into the file's external-reference table; 0 means this (local) file + // pathId - PPtr m_PathID: the referenced object's local file id (LFID) within that file + // propertyPath - dotted path to the reference, e.g. "m_MyObject.m_MyArray[2].m_PPtrProperty" + // propertyType - the referenced type, e.g. 
"Texture2D" + // Returns the analyzer/database id of the referenced object (same id space as objectId), which the + // caller folds into the CRC. public delegate int CallbackDelegate(long objectId, int fileId, long pathId, string propertyPath, string propertyType); // Content-addressed stream paths (new ContentDirectory build output) look like @@ -51,8 +59,12 @@ public class PPtrAndCrcProcessor : IDisposable // skipCrc: when true, the tree is still walked to emit references but no CRC is computed. // callback: called for every PPtr found; its return value (the referenced object's id) is // folded into the CRC. - public PPtrAndCrcProcessor(SerializedFile serializedFile, UnityFileReader reader, string folder, - bool skipCrc, CallbackDelegate callback) + public PPtrAndCrcProcessor( + SerializedFile serializedFile, + UnityFileReader reader, + string folder, + bool skipCrc, + CallbackDelegate callback) { m_SerializedFile = serializedFile; m_Reader = reader; @@ -163,7 +175,10 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) } else if (node.IsManagedReferenceRegistry) { - // ManagedReferenceRegistry are never nested + // The registry holds this object's [SerializeReference] instances (see + // ProcessManagedReferenceRegistry). It only appears at the top level of the object; + // the guard prevents re-entering it when we are already walking referenced-object + // data through another type tree (isInManagedReferenceRegistry == true). if (!isInManagedReferenceRegistry) ProcessManagedReferenceRegistry(node); } @@ -219,10 +234,12 @@ private void ProcessArray(TypeTreeNode node, bool isManagedReferenceRegistry, bo } else { + // This is the version-2 "RefIds" array. Each element is a ReferencedObject + // whose children are [rid, type, data]; read the rid here and hand the type + // and data nodes to ProcessManagedReferenceData. if (dataNode.Children.Count < 3) throw new Exception("Invalid ReferencedObject"); - // First child is rid. 
long rid = m_Reader.ReadInt64(m_Offset); AppendCrc(m_Offset, 8); m_Offset += 8; @@ -233,6 +250,47 @@ private void ProcessArray(TypeTreeNode node, bool isManagedReferenceRegistry, bo } } + // A ManagedReferenceRegistry holds the [SerializeReference] instances owned by this object. + // In YAML/JSON it is the "references:" section that always appears at the end of a + // MonoBehaviour/ScriptableObject. Each instance is stored here exactly once; the fields that + // point at it (elsewhere in the object) only store its "rid", so shared instances and cycles + // collapse to the same rid. + // + // Given this C# source: + // + // [Serializable] public class MyClass { public string m_string; } + // + // public class MyScriptableObject : ScriptableObject + // { + // [SerializeReference] public MyClass m_refA, m_refB, m_refC; // m_refC assigned m_refB + // } + // + // the serialized layout looks like this (YAML shown; the binary we walk has the same shape): + // + // m_refA: { rid: 4862042034409046192 } + // m_refB: { rid: 4862042034409046193 } + // m_refC: { rid: 4862042034409046193 } // shared instance -> same rid as m_refB + // references: + // version: 2 + // RefIds: + // - rid: 4862042034409046192 + // type: { class: MyClass, ns: , asm: MyAssembly } + // data: { m_string: foo } + // - rid: 4862042034409046193 + // type: { class: MyClass, ns: , asm: MyAssembly } + // data: { m_string: bar } + // + // The complication: TypeTrees cannot express polymorphism, so the layout of each "data" block + // is NOT described by this object's own TypeTree. Each RefId entry names its concrete type + // (class/namespace/assembly), and the "data" bytes follow a SEPARATE TypeTree obtained via + // SerializedFile.GetRefTypeTypeTreeRoot(...). Walking the registry therefore means jumping into + // a different TypeTree for every entry (see ProcessManagedReferenceData) - which is exactly why + // finding references inside the registry is so much more involved than for the rest of the object. 
+ // + // Two on-disk versions exist: + // version 1 - entries stored back to back and terminated by a sentinel type (see + // ProcessManagedReferenceData); the rid is implied by position. + // version 2 - entries stored as a "RefIds" array, each element carrying its own rid. private void ProcessManagedReferenceRegistry(TypeTreeNode node) { if (node.Children.Count < 2) @@ -251,6 +309,8 @@ private void ProcessManagedReferenceRegistry(TypeTreeNode node) var refTypeNode = refObjNode.Children[0]; var refObjData = refObjNode.Children[1]; + // Read entries until ProcessManagedReferenceData hits the sentinel; here the rid is + // simply the entry's position. int i = 0; while (ProcessManagedReferenceData(refTypeNode, refObjData, i++)) { @@ -280,11 +340,18 @@ private void ProcessManagedReferenceRegistry(TypeTreeNode node) } } + // Reads one registry entry: the concrete type's fully-qualified name (class, namespace, + // assembly) followed by the object's data. The data is laid out according to that type's own + // TypeTree, so we fetch it and recurse into it. Returns false at the end of a version-1 + // registry - marked either by the "Terminus" sentinel type or by a null/unknown rid (-1 / -2) + // - and true otherwise. bool ProcessManagedReferenceData(TypeTreeNode refTypeNode, TypeTreeNode referencedTypeDataNode, long rid) { if (refTypeNode.Children.Count < 3) throw new Exception("Invalid ReferencedManagedType"); + // The type's fully-qualified name is stored as three consecutive strings: class, namespace, + // then assembly. Each is a length-prefixed string, padded to a 4-byte boundary. var stringSize = m_Reader.ReadInt32(m_Offset); AppendCrc(m_Offset, stringSize + 4); var className = m_Reader.ReadString(m_Offset + 4, stringSize); @@ -303,15 +370,17 @@ bool ProcessManagedReferenceData(TypeTreeNode refTypeNode, TypeTreeNode referenc m_Offset += stringSize + 4; m_Offset = (m_Offset + 3) & ~(3); + // Sentinel that terminates a version-1 registry, plus the null/unknown rids. 
if ((className == "Terminus" && namespaceName == "UnityEngine.DMAT" && assemblyName == "FAKE_ASM") || rid == -1 || rid == -2) { return false; } + // The data block follows the referenced type's own TypeTree, not this object's, so look it + // up by FQN and walk it (isInManagedReferenceRegistry = true so we don't re-enter the registry). var refTypeTypeTree = m_SerializedFile.GetRefTypeTypeTreeRoot(className, namespaceName, assemblyName); - // Process the ReferencedObject using its own TypeTree. var size = m_StringBuilder.Length; m_StringBuilder.Append("rid("); m_StringBuilder.Append(rid); diff --git a/Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs b/Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs index e569813..fe15ab1 100644 --- a/Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs +++ b/Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs @@ -269,6 +269,7 @@ public void WriteSerializedFile(string relativePath, string fullPath, string con } } + // Callback from PPtrAndCrcProcessor for each reference discovered in the SerializedFile private int AddReference(long objectId, int fileId, long pathId, string propertyPath, string propertyType) { // Always resolve the id so the CRC stays stable; only persist the row when references From 1533c0d07fa658768ff3f083011e729b1782821a Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Jun 2026 20:48:10 -0400 Subject: [PATCH 07/10] [#70] Remove dead ProcessManagedReferenceData param; add explanatory comments Drop the unused referencedTypeDataNode parameter (the referenced object's layout comes from its own TypeTree via GetRefTypeTypeTreeRoot, not from this node) and update both call sites. 
Add comments for the trickier mechanics: 4-byte alignment, the vector/map/staticvector array wrapper, Array node child layout, the StreamingInfo 32/64-bit offset field and its field-order difference from StreamedResource, PPtr type-name parsing, and the >2GB size truncation/overflow assumptions in the stream and array handling. --- Analyzer/PPtrAndCrcProcessor.cs | 36 +++++++++++++++++++++++---------- bash.exe.stackdump | 9 +++++++++ 2 files changed, 34 insertions(+), 11 deletions(-) create mode 100644 bash.exe.stackdump diff --git a/Analyzer/PPtrAndCrcProcessor.cs b/Analyzer/PPtrAndCrcProcessor.cs index edcee7d..844d349 100644 --- a/Analyzer/PPtrAndCrcProcessor.cs +++ b/Analyzer/PPtrAndCrcProcessor.cs @@ -115,10 +115,12 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) } else if (node.Type == "vector" || node.Type == "map" || node.Type == "staticvector") { + // These containers wrap an Array node as their single child; process that array. ProcessArray(node.Children[0], false, isInManagedReferenceRegistry); } else if (node.Type.StartsWith("PPtr<")) { + // Extract T from the "PPtr<T>" type string. var startIndex = node.Type.IndexOf('<') + 1; var endIndex = node.Type.Length - 1; var referencedType = node.Type.Substring(startIndex, endIndex - startIndex); @@ -127,12 +129,15 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) } else if (node.Type == "StreamingInfo") { + // StreamingInfo (Texture2D/Mesh) points at external stream data: offset, size, path. if (node.Children.Count != 3) throw new Exception("Invalid StreamingInfo"); + // The offset field is 32- or 64-bit depending on the type tree version. var offset = node.Children[0].Size == 4 ? m_Reader.ReadInt32(m_Offset) : m_Reader.ReadInt64(m_Offset); m_Offset += node.Children[0].Size; + // size is an unsigned 32-bit field read as a signed int; streams >2GB are not handled.
var size = m_Reader.ReadInt32(m_Offset); m_Offset += 4; @@ -148,6 +153,8 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) } else if (node.Type == "StreamedResource") { + // Like StreamingInfo but used by AudioClip/VideoClip; the fields are in a different + // order - path first, then 64-bit offset and size. if (node.Children.Count != 3) throw new Exception("Invalid StreamedResource"); @@ -159,6 +166,7 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) var offset = m_Reader.ReadInt64(m_Offset); m_Offset += 8; + // 64-bit size truncated to int; streams >2GB are not handled. var size = (int)m_Reader.ReadInt64(m_Offset); m_Offset += 8; @@ -169,6 +177,7 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) } else if (node.CSharpType == typeof(string)) { + // A string is serialized as a 4-byte length followed by its bytes; CRC the whole span. var prevOffset = m_Offset; m_Offset += m_Reader.ReadInt32(m_Offset) + 4; AppendCrc(prevOffset, (int)(m_Offset - prevOffset)); @@ -194,6 +203,8 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) } } + // Unity pads certain fields to a 4-byte boundary. Re-align after the node if it, or any of + // its children, is flagged to align. if ( ((int)node.MetaFlags & (int)TypeTreeMetaFlags.AlignBytes) != 0 || ((int)node.MetaFlags & (int)TypeTreeMetaFlags.AnyChildUsesAlignBytes) != 0 @@ -205,10 +216,13 @@ private void ProcessNode(TypeTreeNode node, bool isInManagedReferenceRegistry) private void ProcessArray(TypeTreeNode node, bool isManagedReferenceRegistry, bool isInManagedReferenceRegistry) { + // An Array node has two children: [0] is the int element count, [1] the element template. var dataNode = node.Children[1]; if (dataNode.IsBasicType) { + // Fixed-size elements are stored contiguously, so CRC the 4-byte count plus all element + // bytes in one range. (size * count can overflow int for very large arrays.) 
var arraySize = m_Reader.ReadInt32(m_Offset); AppendCrc(m_Offset, dataNode.Size * arraySize + 4); m_Offset += dataNode.Size * arraySize + 4; @@ -235,8 +249,9 @@ private void ProcessArray(TypeTreeNode node, bool isManagedReferenceRegistry, bo else { // This is the version-2 "RefIds" array. Each element is a ReferencedObject - // whose children are [rid, type, data]; read the rid here and hand the type - // and data nodes to ProcessManagedReferenceData. + // whose children are [rid, type, data]; read the rid here and pass the type + // node to ProcessManagedReferenceData (the data node isn't needed - the layout + // comes from the referenced type's own TypeTree). if (dataNode.Children.Count < 3) throw new Exception("Invalid ReferencedObject"); @@ -244,7 +259,7 @@ private void ProcessArray(TypeTreeNode node, bool isManagedReferenceRegistry, bo AppendCrc(m_Offset, 8); m_Offset += 8; - ProcessManagedReferenceData(dataNode.Children[1], dataNode.Children[2], rid); + ProcessManagedReferenceData(dataNode.Children[1], rid); } } } @@ -303,16 +318,14 @@ private void ProcessManagedReferenceRegistry(TypeTreeNode node) if (version == 1) { - // Second child is the ReferencedObject. + // Second child is the ReferencedObject; its first child describes the referenced type. var refObjNode = node.Children[1]; - // And its children are the referenced type and data nodes. var refTypeNode = refObjNode.Children[0]; - var refObjData = refObjNode.Children[1]; // Read entries until ProcessManagedReferenceData hits the sentinel; here the rid is // simply the entry's position. int i = 0; - while (ProcessManagedReferenceData(refTypeNode, refObjData, i++)) + while (ProcessManagedReferenceData(refTypeNode, i++)) { } } @@ -342,10 +355,11 @@ private void ProcessManagedReferenceRegistry(TypeTreeNode node) // Reads one registry entry: the concrete type's fully-qualified name (class, namespace, // assembly) followed by the object's data. 
The data is laid out according to that type's own - // TypeTree, so we fetch it and recurse into it. Returns false at the end of a version-1 - // registry - marked either by the "Terminus" sentinel type or by a null/unknown rid (-1 / -2) - // - and true otherwise. - bool ProcessManagedReferenceData(TypeTreeNode refTypeNode, TypeTreeNode referencedTypeDataNode, long rid) + // TypeTree, which we look up by name and recurse into - so the data node from the registry's + // own TypeTree is not needed here; refTypeNode is used only to sanity-check the entry's shape. + // Returns false at the end of a version-1 registry - marked either by the "Terminus" sentinel + // type or by a null/unknown rid (-1 / -2) - and true otherwise. + bool ProcessManagedReferenceData(TypeTreeNode refTypeNode, long rid) { if (refTypeNode.Children.Count < 3) throw new Exception("Invalid ReferencedManagedType"); diff --git a/bash.exe.stackdump b/bash.exe.stackdump new file mode 100644 index 0000000..c998532 --- /dev/null +++ b/bash.exe.stackdump @@ -0,0 +1,9 @@ +Stack trace: +Frame Function Args +000FFFFC200 00210062B0E (00210297178, 00210275E3E, 00000000000, 000FFFFB100) +000FFFFC200 0021004846A (00000000000, 00000000000, 00000000000, 00000000000) +000FFFFC200 002100484A2 (00210297229, 000FFFFC0B8, 00000000000, 00000000000) +000FFFFC200 002100D2FFE (00000000000, 00000000000, 00000000000, 00000000000) +000FFFFC200 002100D3125 (000FFFFC210, 00000000000, 00000000000, 00000000000) +001004F84B7 002100D46E5 (000FFFFC210, 00000000000, 00000000000, 00000000000) +End of stack trace From 505eb481e2411c891a9eec31a0bd69d34ad19722 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Jun 2026 20:48:34 -0400 Subject: [PATCH 08/10] Remove accidentally committed bash.exe.stackdump; ignore *.stackdump --- .gitignore | 2 ++ bash.exe.stackdump | 9 --------- 2 files changed, 2 insertions(+), 9 deletions(-) delete mode 100644 bash.exe.stackdump diff --git a/.gitignore b/.gitignore index e71a9f7..de49506 
100644 --- a/.gitignore +++ b/.gitignore @@ -40,3 +40,5 @@ UnityFileSystemTestData/UserSettings/ UnityFileSystemTestData/Packages/ *.db *.csv + +*.stackdump diff --git a/bash.exe.stackdump b/bash.exe.stackdump deleted file mode 100644 index c998532..0000000 --- a/bash.exe.stackdump +++ /dev/null @@ -1,9 +0,0 @@ -Stack trace: -Frame Function Args -000FFFFC200 00210062B0E (00210297178, 00210275E3E, 00000000000, 000FFFFB100) -000FFFFC200 0021004846A (00000000000, 00000000000, 00000000000, 00000000000) -000FFFFC200 002100484A2 (00210297229, 000FFFFC0B8, 00000000000, 00000000000) -000FFFFC200 002100D2FFE (00000000000, 00000000000, 00000000000, 00000000000) -000FFFFC200 002100D3125 (000FFFFC210, 00000000000, 00000000000, 00000000000) -001004F84B7 002100D46E5 (000FFFFC210, 00000000000, 00000000000, 00000000000) -End of stack trace From d71c0119bb8bf2aacceb8b1756fa2f425e35d473 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Jun 2026 21:00:56 -0400 Subject: [PATCH 09/10] [#70] Note CRC is a within-database fingerprint (ref #74) --- Analyzer/PPtrAndCrcProcessor.cs | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/Analyzer/PPtrAndCrcProcessor.cs b/Analyzer/PPtrAndCrcProcessor.cs index 844d349..ee3e1e2 100644 --- a/Analyzer/PPtrAndCrcProcessor.cs +++ b/Analyzer/PPtrAndCrcProcessor.cs @@ -13,6 +13,8 @@ namespace UnityDataTools.Analyzer; // 2. Accumulate a CRC32 over the object's serialized bytes, including the content of external // streams (texture/mesh/audio data stored in companion .resS/.resource files). This CRC is a // content fingerprint used to detect whether two objects are identical. +// NOTE: references contribute their resolved analyzer object id (see ExtractPPtr), so the CRC +// is only comparable within a single analyze database, not between separate runs - see issue #74. // CRC computation can be disabled (skipCrc) while still extracting references. 
public class PPtrAndCrcProcessor : IDisposable { @@ -416,6 +418,10 @@ private void ExtractPPtr(string referencedType) { var refId = m_Callback(m_ObjectId, fileId, pathId, m_StringBuilder.ToString(), referencedType); + // The CRC folds in the resolved analyzer object id rather than the raw PPtr + // (fileId/pathId). This normalizes references so duplicate objects in different bundles + // hash the same within a database, but it makes the CRC depend on per-run id assignment, + // so CRCs are not comparable between separate databases. See issue #74. if (!m_SkipCrc) { m_pptrBytes[0] = (byte)(refId >> 24); From 4f64dce02b0795e27315e0792ea229068620b45d Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Jun 2026 21:22:09 -0400 Subject: [PATCH 10/10] Remove contentdirectory-zstd test fixtures (belong on another branch) --- .../contentdirectory-zstd/BuildManifestHash.txt | 1 - .../Data/contentdirectory-zstd/content0.archive | Bin 8118 -> 0 bytes 2 files changed, 1 deletion(-) delete mode 100644 TestCommon/Data/contentdirectory-zstd/BuildManifestHash.txt delete mode 100644 TestCommon/Data/contentdirectory-zstd/content0.archive diff --git a/TestCommon/Data/contentdirectory-zstd/BuildManifestHash.txt b/TestCommon/Data/contentdirectory-zstd/BuildManifestHash.txt deleted file mode 100644 index 776b546..0000000 --- a/TestCommon/Data/contentdirectory-zstd/BuildManifestHash.txt +++ /dev/null @@ -1 +0,0 @@ -57c4c06634292c7a29331bb7099856cb \ No newline at end of file diff --git a/TestCommon/Data/contentdirectory-zstd/content0.archive b/TestCommon/Data/contentdirectory-zstd/content0.archive deleted file mode 100644 index 8cc524a2912a30f20e7559f15e5553ee239ecc94..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 8118 zcmV;nA4%X38rF*i74 zF=jO|F*r0~Gi5n6I5#e1W-tH%0RSLEI5+_I5jiDwh zHa0UfGC4A1H(@e4Gcz$_VmB~3IXE>oV`469b8l_{00000000000000ewJ-f(DQV^K z06GmFK{*hRS_c3!0TKWb0Mgcx1qcQtDzwG)_+28`CV?TB*={ 
z)kQEywY(r-I!jUw1pfmr1mD1}cB~`d0d??4`|@D0?#KP)I`SJ?_xt4E2iD8~s{@Y1 z;Lr7v^|GGEi*-M`9n62Te0lJ|m@k8Qv~2e5Ec_K5{Joa|~C-d7tIzGVfN9NrQDU}QvLMnu$htMC?C@9XCKt*Bz{XdO<`dC53 zhl%5nJxGWq^k|5X%!3Dc$Ici*9Xd!Y{-49+CFQ8ak=B6o#!Z_dCz~~D(ilR6<_vO4 zo0%dZXv&ZoBPJP!2+S8EFI}8BkX&Bm*rFv1mMd1Ol%O(2N)#whoM33eWJzNh-Jg|^ zlQ1GVVq~$%u1I8vkRU*Q-1+E{$YY0&9BnvmAlj%wV}^_vXE0v4Xu)EIiWDgD6Ne@Y zOOO~LBE$`X=<7qF(nC=P4picW2BM@RhBaKw7c4}vfJ|2CDHxd$M8XawA&kKY zM1m%gff^dif)9wuNE%}ZV!bdN8Z7t>@cweOJwQIu_{-t->q{=Xd&KZNcgsbG;2_NR z0r!?x-sm}(G+>aoh+JR;sQ~ZJ&P3e^x({<4XZUP!HcO?L4-fwzi z$}aIGF(s{}s2oUjGx?QYh?2x{;xtMtC)pxN=HE6Na32`bYwIWfx*HBDS-q;##(dR&^8|9aK<0L8e9 z3}}UAfnC+WDzFsrhQRU*ne+2L|KyaWh_Fev zjWi=Y_CRoCHd4~@J1au!(fCLrq$Kkv5=$1L`N^}t6g2;26lwmC=Kpc?M8PK7>;SdsRLFKf7lk8P`|fgrHWbAk|l}@TC~{G`yir~)L6YD9|Aqp(*kRH z0rfn>5L3?^I$kf7z)F`7Sy)$~DK0{7UG+?$H0E7IRK+(wVP+MK{h6@9)q)>Vnr;w8 z)nd`uvBwdFiejx4<%-2XsY#x$uLcHbG(H+v<{A~c1{uA+#gkJmBFcF-G^2q9xdgo& z8EbknOZ!SeO`desBBNQiKG2Jopou(QF%!00s)Fq+OX300_0NxxG_x*%g(7MF9+>Nl zB(1`gE=ZdzADt@qn3s|y9+a*)mM65EEBvaJfkuyUyb~j5^h@>yavp+5dZ6RrY4%*i z>|M8tBG5#T`t&1oaL;@+C@ERe=w)?0i#Aou)Rn1oH5`=X8(*#pA4rvKVh z{espK*>DZ-(Lt-g({}!#Cz^w?An1%moZ)`~abHE>89{IySjHQZ#}FVm2#dktu!9GH z|G{^E@Zitu=ePI$F3&sI_0Qd|={ldD0|t1X?=#p_+tbr)&;V}%gFL*Sw_t$+JG{F) zI=VT#I=TdOb8~WVad2#Ka9{gY_6xX`9ccRqv`jRWfj#F@T;h#rkqky1EM~ZbHIQP z)f52r13W!BHJd+)xol!$TK>OWT2e|%LNYosQW||_QBjhp$ozBtkAOT51{pcuQ=bS) z?B51vcQ5x$=~ESM0b--$1d-uu1^G7Jn-zz@g5y9mF8}4}_Hf%5(|tXg>m8SQx4(P8 zgS;U4fu7~Zab+=FAK3dK9B%u1$K)|uSr32JzWdOga&B61= zle-#Ek~pdRpYXc{JmczOGMTJ@$-Fo`jvMp)7%q^@XJK7wv8tk)av~MwG&7L^x#7aF zLz4oN{HH5e!l~s^uxHY!iDa5T66z?8mJ+T^MhKY^{o|oOfN~VP&4kIEFkf@Rp5hOF zuy_clNbRwDd?cTZwkQW6t$yOw%)uNzR;QXVE0LAqRl;MOur)_J5WP(0XuvSFD*M2> zj~8#Q6yRvjh5$%VNTXkXImpc_sPZ$R0VTB@R%imEBL!%bF+nI z1n~}x0apVn8w#GbX1Wyr@Z|yqy9#%lE$opo>)LJTBt(jsfE)BqVG#-BK}Za94KZ&@ zmNZ&*NRm=S(FuxK#%bNHc=TU*up;I#i)xBdEm|;Xt8SPXno%a;9^<+IWp#5^hOSdX z2b-`SOcw=X)e<2gXU*(`9%>cr0uEPZ-GCnIMHbR%9aL6##YzOp4;?x)J{p#smKG17 
z@bgAR7byRe`)3P$V18Vy7SH{;Ygfo`0fbJfIx|Bv;Fc&3n8A_jqA2#~P6;oEig2tn zorEYp?Ve8PEppe~qEHn6@*8fduhK&-5(O5wl%7+H(*~sMiJ87e5&EO&sI95-34{n# zSypPGxaP?Nz$IdYN)ONt(Tz6x=S+~_TJrlEHAV!$-GK{+D;gi-Ra>5Qy6Sy$EH-mRK4RLYnu{D;bmUvh z_y}_w4j~fpv}dK#hbdljBl=d-*(j^l<+W@-ai7^qbR z9yU26%&>^f)jvOr8t$wXij4J6GbCwYuNq|Fu56eYn!y&}0w5(SGYfn~=WgAkSMnR9 ztV(Jq#2o6$%|MGahOi31(|T?|!!4Xsr4PK}A;-ayzl6=|Mm=CflEK=yqE%(Zv5)nI zU7$EeRK%*%i?aSv$kIfSb4e*zFfr-4)&N#y#nWCI=ayUUHLYJ+Sem&#=9u7b36e-g z&)EjI=u#6!QsppF7)8eaPYi*K8iOI6J$JV?JvU7(PgSZ3fSrjq`x?_YO zEoQ{*4YfoGqhHtwM)in_TUv*F|6-52P5=}Y^eQ}|`(1B&X&1lsy&Ik0I$o_pxBRs3 z+IzPu-BNEoI*n56U20cteA}(^Qz>+dM{#^g)orMh8|_YU9@qE18?~$YmZwgu(Ouu~ zX?F|V*0{8ePPw%Xg;MqE&b_&JRkx+Kz1z~Mlv=IcPn;U9@>IK)c5N;dx4Tp**S@J$ zUWfACo$vBee5ylrX?MEo-6-^~sZ!~5d)u~-HOJyBf8nHMFkUPyoYU5+x+x70+ zXIOOq@^n?G)V}N8u5Bu{mg+TL^SV@eZ*i)9cb^lUVS3Md3e}@hZr!a;dETy4?Jl%i zQ>%AfcX7Su>i8bz*8CQ)@>HsgN2NHmTkBOR&gG-_lfg5r0v|XolksB0I=`|PulyW) z)4*Z0C)slX+4JiS%s#_xvgeq;drrW1h~S0(Qy`)nBX6&u1Vf2j?tz#J7GN;cB+Z<0 z|Cq4y!caSc#0&Na*K^^^GE=PmPc1tqLoA| z1P)Z=FwvIHY8@a(+U(h*Z!Oj&w_QX?Ml>!{aqCptnt|1M+=b zejK;+Vlf?A>~-V&bU-a%=D!I1%DeZg>weVSX*1OxymB%q8=uN}|8w|goiaglt`a^ZkZ=j#_RSlAjcraq0W zSS%!9)G}+xR_l*HBr~ZCgD>C+Y&O=>UkoL(pr)=%xNtC985J%&%8D zXK@a3huzP?AR@|H?dqJA{RosE+unP!>5|0|7Zrjrz8Vu~<_eqY2X|HK%-V)Xrr#IA zb!|Qkp987I{n$_t*?f`g#Jrn+9(kFQvI9h!$BIkjbf#oT2{9J3|d+~hh@EZ zxEn5|h-+!N-Tt@KD?7srGHAE1G!8=~7wYmw>|9kxqv>?V@u0*;5v8&sqKFN#&oJ~O zB0rIGx{205l5aO1GbD>KYNGRF$!da9to%>a{|38f$K1a7qNACdJnhjx-~gu$oxM6f zE@OZ$T`-dXiWUDODl<&}Gbd~k{++!6#*Oi-$*roELx)p#9uoYS_&1IB0#b#%(+k?_ zLr3uMWH2YITZFR&qtlwyc1mq)6Fr7KmD!3ML6!L1q1|SCziF-0Lh9uo#D?W_2udxe z8^as1Gx*TrjR?bGLl`{a2cV3!=|7PCnH9k^fI|he1q{jyR(FqP_Qo6x5h7s0X|&5- z*$PDr*KBo$B~lY-wX{3F+=8&^ur4YG+9q8zu*m> zfPyUgg@Th0G9QrsD~ref2*8W>B@D+mj}fNC5y}RS5q4)RXO11c|66BVN4*L8+u(E7 zkivMR%l0T^6olBslpn?pRddh_k7(@>QhQz=v6|UiUIq`QG0mvkhqvq9U75X&ZI`?7 zBg&1xx*%b#yGUO))4ZHuhYPAgF1tp#@Ue(EMekMAP+2JFMyTc(ej>`;KBjX&_8=c_ zAv*VbbZfULC9|ci6gN`m5^;FYb+4Faq?_G5t{=TZ+>c~tg~093PxAj=ZJ@Nz<6Hmi 
z`Y?KLNkNPWYUh&yU?3X75R{i`8zq?(ONk;xR69>X0p_$Cnx_t5%OxaCG9WuxRLWy9 zzJ(wo!yUm+G_tGo=&V9E10jg1bnNL7WE^00mJnwQXU~i2;7wq_)RUE(T?X7Y6`~1h z`^jdNG8Fluu%a=GjOM6N-k2h&KO^FIB3NwZm?2NSHx)iR5Q>H!7>cb<9eJ4%W(d%c ziO|S4MuH60R%eZ$ub&N}Jti?4ZEG?Jb*)uf7)3jyrTK>?h5|!*ap5q3hND48T!(*c zdt_ZYta4N2Fw&xgO&xuz99o=wy(UUpcE=FHo55uCz1fdFwF59s;VorFGp2i^N_*nc zdD*1Fuyz1@IN8bL%Zw9?=wHac0YS(ysXJP^H$xFFBCdZq2CSfY={3({DNyDdAcA>j zKuGI18zHJPvjIg|y#36s)2VJ6m+14na*)~oubUf^r`v)y#~tZ{Dn>{zI>m#U47dfE zx!rvOtCkb$3RI2!UoWoj)}`$mV0b;HgQl9>KeVNbvhA}3uwx4;r^kEZc5-~c*B2CiXeV=!O*!jIDG_*q_CTla zfI*XOk_oJl5slV{5C54P!;ZY|2~4t2sIoT{6Lk{RlR|Ee+?jX?%^-kX(c2{yPnRdGncht=+&A)yiaziK)XII6g(_?Wy+Eg>KNtQFJ zW`CJ0R!LI<9I6FL?gGoEcyZjhiCyJ2%gAL)>>r$B$uI$GvmH?97V#=g9u#m)Y*VT( zJ}a(3Z4uph5}#;G)O~j3)^uoUM6O~x5Q%E5QuC1xA(dTuG8u0nz1>|Xd+MKi1x0wC zu#dx;-9o@Ny#n$T${bNPc&B{JEK28esl|1K{{zyF83W^f7Hoq=&Ish1Yd%29FvoE? z_A!fioQqTu%76Taz~_Y8|{*KL_=#z0Eyc@vsIl z`{Wl6>(s?yqncGm$Ke1Fp5hGj@M3~pa)DV$X}&TdB;Pz}D@9bh0og#*hseQBNd|K$ptzb2b!Jk@t)nOhOkzIB| zXMexW!wonZ>5jmLYY)~}rW4g8(;kWPt)a3R zd*2*t5)1-HWr9FI4D%gq7al(C1km8%J>bYz8CFrI0Tn~M=XhVP#gIC|J{w(W44g@9 z7=I1iFA?KWqe*xNF)8()bG@u}Wamnryuz)>NgRY%2!+FtP!?mwKhxYu^&H@-Z;xO! zAjN#hw;89;Jnd5%Z1|))Ajh9om;O)niPCM1U?J}ET5AEbgVMF7omnOZ&3orP#-4CG z2#<-Fi3K%?$0`(H%JkZo#H{*M-**-L$bSN376=}{bahVmA35^BvpLq#FNnQ^iZ;t#&Bh|M& z#JW_U*BLqNe=KXwxqi8%zS__rOgE;)`U)HO@*Mg-32o_WbQrRK`D-PkhWmWKy-nBI zrmZ_8+w}n&PAR5zkl2h;-7R0<)Tqn5C$*m@xg+76Ewz2FuI6FL^@K06vpw;o`#mb) zNF_G=7l{yIf5$!BtEeeqXP?R)7zMuIzar2*iKe^iMB4$i@N$`1bLT*Z)WXU!9C4q( zTSm3bTI304<|0N^WTI?NO0XDU%qm@n_Z^4Yi+=w}P>swa2m}zo7TlRtE$8itTi`3t zr!|)PZ_d{J1L)pmGL+lT?%JlHBqsiiz#tAaIcyWLB*|x)H8jOLRy*ZuV=DBb-)dgc z;2sGety`mZg4?$aOdezo@Ri8YSbh4pBIC1s5c5zYsABd>{+RSUy8^9b63~!1W#`niN7U3E?O?q_n)BRPfL1 z?s|#Kyd(z)R~2QF8diEis&-ijnj!mzA%!MNi_DNtK@OUKdXx+(gA{=M5cfLjh_K%= zL~7^;=SWJ1aU7vbhZ^t-at|GXbnKyw57DAiOusvlHGd>EX##=o! 
z$`F1SxrPZDZ|2iA^s^F?w!oM?F$HU&_uohfhJ%j5$@YTAd&vy`<3F;iGzbcMP$u8V ze~N{!*(1rnI zFvT>WjW?Ay+unVDAl)`h8KzFx4)-Cr^MJAiVQ@H7bgb*-9YO#C#s`uc|CMTZ{;EIY zGfU_VAB!Hf821m&kwbnX7>yz{c;kV(z-I@9@W{|55DDJ>H0z9?G`}e<;t-lKNZ)%f z`$y$)HMbFh5Ip8#lBk%qc!YyFf(3@$_@K}f`b6mGQ1yOld*i{iU#8LQ52VXMv%wmp z0W3W4+9H|tJP^yy|Nf(~p)&RE*x@-UBwr57?M*yEiXg;^sc3)%fv6)^f&mBwfR4U! z=Lko71|v@RVE@6{f%sW{JP7KLM#wXQ8KYO*8OD;e2OZw1^fDlaJ!qYefP{k+6tVO@ zF<>gNHV|@X8J=ZEXggkJ9lgirIY)H5k=l}#dx$bCLxEEf|*`FqZi6B&^2u^d3!=K!`){9G4E0ot(zHdhe1~peBcn- z>+_nt4zU6cDW%pNW@GII0Xz~mbf>Er82U%>( zc96%~3$e>kH~l9mw<*y82m(+qqJ#xd6rhe+2^K&YL!QP5n1N8@G*h8bmt)08_0AY5 z85zSzopE(6{#EO3)IidRKru7H9RwP!$hMCG#DFne4>+wiC>@_0d=41{t03HulxVCq z2Q!CfdV#=e!N2#XB@EWAz&*YnyA};vpF!(84Td;w^7urSLulebVDCIL@B{6>r2jFR Q-tR)*4_np(O~z&}0{2KPR{#J2