Commit 576fd03
perf: Speed up purge_table by deduplicating manifest reads and parallelizing file deletion
Three changes to reduce purge_table wall time from ~7s to ~0.13s (54x) on a table with 200 snapshots:
1. Deduplicate manifests by path before iterating in delete_data_files().
The same manifest appears across many snapshots' manifest lists.
For 200 snapshots this reduces 20,100 manifest opens to 200.
2. Parallelize file deletion using the existing ExecutorFactory
ThreadPoolExecutor, matching the pattern already used for manifest
reading in plan_files() and data file reading in to_arrow().
This aligns with the Java reference implementation (CatalogUtil.dropTableData)
which also deletes files concurrently via a worker thread pool.
3. Cache Avro-to-Iceberg schema conversion and reader tree resolution.
All manifests of the same type share the same Avro schema, but it was
being JSON-parsed, converted, and resolved into a reader tree on every
open. Uses explicit threading.Lock for thread safety across all Python
implementations.
1 parent 1a54e9c commit 576fd03
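The three changes described in the commit message can be illustrated with a small standalone sketch. This is not the code from this commit: the names `unique_manifest_paths`, `delete_files_concurrently`, and `cached_schema` are hypothetical, `json.loads` stands in for the real Avro schema conversion, and a plain `ThreadPoolExecutor` stands in for the project's ExecutorFactory. It only shows the general shape of each technique.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def unique_manifest_paths(manifest_lists: Iterable[Iterable[str]]) -> List[str]:
    """Change 1 (sketch): return each manifest path once, in first-seen order."""
    seen = set()
    unique = []
    for manifest_list in manifest_lists:
        for path in manifest_list:
            if path not in seen:
                seen.add(path)
                unique.append(path)
    return unique


def delete_files_concurrently(
    paths: Iterable[str],
    delete_fn: Callable[[str], None],
    max_workers: int = 8,
) -> List[str]:
    """Change 2 (sketch): delete paths on a thread pool.

    Returns the paths whose deletion raised OSError, so that one failed
    delete does not abort the rest.
    """

    def _try_delete(path: str):
        try:
            delete_fn(path)
            return None
        except OSError:
            return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [p for p in pool.map(_try_delete, paths) if p is not None]


# Change 3 (sketch): cache the parsed schema, guarded by an explicit lock.
_schema_cache: Dict[str, dict] = {}
_schema_cache_lock = threading.Lock()


def cached_schema(schema_json: str, parse: Callable[[str], dict] = json.loads) -> dict:
    """Parse a schema string once; later calls return the cached object."""
    with _schema_cache_lock:
        cached = _schema_cache.get(schema_json)
        if cached is None:
            cached = parse(schema_json)
            _schema_cache[schema_json] = cached
        return cached
```

One way to reproduce the counts quoted above is to assume that each of the 200 snapshots adds one manifest and keeps all earlier ones. Under that assumption, `[[f"m{j}.avro" for j in range(i + 1)] for i in range(200)]` contains 1 + 2 + ... + 200 = 20,100 manifest references, and `unique_manifest_paths` reduces them to 200 distinct paths.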
2 files changed, +68 -14 lines changed