_overlap_all() allocates two massive, dense (n_truth × n_pred) NumPy arrays, even though the STRtree query immediately before it proves that only a tiny fraction of pairs actually intersect.
https://github.com/weecology/DeepForest/blob/main/src/deepforest/IoU.py#L54-L58
These dense matrices are then passed to linear_sum_assignment() and used to compute a third dense iou_mat, further tripling the memory cost.
https://github.com/weecology/DeepForest/blob/main/src/deepforest/IoU.py#L115
https://github.com/weecology/DeepForest/blob/main/src/deepforest/IoU.py#L122-L128
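The shape of the problem can be seen in a small standalone sketch (simplified, with illustrative names; this is not the actual `IoU.py` code): the STRtree query already narrows the candidates down to a handful of pairs, yet dense `(n_truth, n_pred)` arrays are allocated regardless.

```python
import numpy as np
from shapely import box, STRtree

# 100 truth boxes and 100 predictions, each prediction overlapping
# exactly one truth box.
truth = [box(i * 100, 0, i * 100 + 50, 50) for i in range(100)]
preds = [box(i * 100 + 10, 10, i * 100 + 60, 60) for i in range(100)]

tree = STRtree(preds)
# The spatial index already identifies only the candidate pairs...
candidate_idx = [tree.query(t) for t in truth]
n_candidates = sum(len(ix) for ix in candidate_idx)

# ...but the dense allocation pattern ignores that information.
intersections = np.zeros((len(truth), len(preds)))
unions = np.zeros((len(truth), len(preds)))

print(n_candidates, intersections.size)  # far fewer candidates than cells
```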
## Impact
For a realistic large-tile evaluation with 5,000 ground truth boxes and 10,000 predictions, this allocates:
- `intersections`: 5,000 × 10,000 × 8 bytes = ~381 MB
- `unions`: same = ~381 MB
- `iou_mat` (line 123): same = ~381 MB
- `np.zeros_like` (line 126): same = ~381 MB
- Total: ~1.5 GB for a single image evaluation
In reality, only a tiny fraction of these pairs actually intersect (typically <1%), so >99% of that memory stores zeros. `_overlap_all` is called per image via `evaluate_boxes` → `evaluate_image_boxes` → `compute_IoU`, so this becomes a bottleneck when evaluating large NEON plots or aerial survey tiles containing thousands of trees.
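The arithmetic behind those figures can be checked directly:

```python
# Four dense float64 arrays of shape (5_000, 10_000):
# intersections, unions, iou_mat, and the np.zeros_like copy.
n_truth, n_pred = 5_000, 10_000
bytes_per_array = n_truth * n_pred * 8  # 8 bytes per float64
mb = bytes_per_array / 1024 / 1024
total_gb = 4 * mb / 1024

print(f"per array: {mb:.0f} MB, total for 4 arrays: {total_gb:.2f} GB")
```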
## Steps to Reproduce
Here is a standalone script that measures the memory waste using `tracemalloc`:

```python
import time
import tracemalloc

import numpy as np
import geopandas as gpd
from shapely.geometry import box

from deepforest.IoU import _overlap_all


def make_dummy_data(n, img_size=10000, box_size=50):
    xs = np.random.uniform(0, img_size - box_size, n)
    ys = np.random.uniform(0, img_size - box_size, n)
    geoms = [box(x, y, x + box_size, y + box_size) for x, y in zip(xs, ys)]
    return gpd.GeoDataFrame({"score": np.random.uniform(0.5, 1.0, n)}, geometry=geoms)


np.random.seed(42)
truth = make_dummy_data(5000)
preds = make_dummy_data(10000)

tracemalloc.start()
t0 = time.time()
inter, _, _, _ = _overlap_all(preds, truth)
t1 = time.time()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

sparsity = np.count_nonzero(inter) / inter.size
print(f"Shape: {inter.shape}")
print(f"Sparsity: {sparsity:.4f}")
print(f"Peak memory: {peak / 1024 / 1024:.1f} MB")
print(f"Time: {t1 - t0:.2f}s")
```
## Proposed Fix
- Use `scipy.sparse.csr_matrix` instead of `np.zeros()` to store the intersections and unions, reducing the space complexity from O(N*M) to O(k), where k is the number of actually overlapping pairs.
- For `linear_sum_assignment`, which requires a dense matrix, extract only the dense sub-matrix of rows and columns that have at least one intersection, then map the assignment indices back to the original arrays. This greatly reduces the problem size for the cubic-time Hungarian algorithm.
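A minimal sketch of both steps, assuming the STRtree stage has already produced the overlapping (truth, pred) pairs and their IoU values (the pair data below is fabricated for illustration; this is not a drop-in patch for `IoU.py`):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.optimize import linear_sum_assignment

n_truth, n_pred = 1000, 2000

# Hypothetical output of the spatial-index stage: only 4 candidate pairs.
rows = np.array([0, 0, 5, 7])   # truth indices
cols = np.array([3, 4, 10, 7])  # prediction indices
ious = np.array([0.6, 0.2, 0.9, 0.5])

# O(k) storage instead of a dense (n_truth, n_pred) array.
iou_sparse = coo_matrix((ious, (rows, cols)), shape=(n_truth, n_pred))

# Extract the small dense sub-matrix covering only active rows/columns.
active_rows = np.unique(rows)
active_cols = np.unique(cols)
sub = iou_sparse.tocsr()[active_rows][:, active_cols].toarray()

# Hungarian algorithm on the reduced problem (3x4 instead of 1000x2000)...
r, c = linear_sum_assignment(sub, maximize=True)

# ...then map assignment indices back to the original coordinates,
# keeping only pairs with a real (nonzero) overlap.
matches = [(active_rows[i], active_cols[j]) for i, j in zip(r, c) if sub[i, j] > 0]
print(matches)
```

Restricting the assignment to active rows/columns is safe here because a truth box with no candidate predictions can never contribute to a match anyway.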