perf: add direct-mapped node cache to BTreeMap #416
Conversation
Add a 32-slot direct-mapped node cache to BTreeMap that avoids re-loading hot nodes from stable memory. Modeled after CPU caches: O(1) lookup via `(address / page_size) % 32`; a collision evicts the resident node. Read paths (`get`, `contains_key`, `first/last_key_value`) use a take+return pattern to borrow nodes from the cache without RefCell lifetime issues. Write paths (`insert`, `remove`, split, merge) invalidate affected cache slots.

Key changes:
- Switch `get()` from the destructive `extract_entry_at` to `node.value()`
- Remove the now-unused `extract_entry_at` method
- Change the `traverse()` closure from `Fn(&mut Node)` to `Fn(&Node)`
- Invalidate the cache in `save_node`, `deallocate_node`, `merge`, and `clear_new`

Expected improvement: ~15-20% for random reads, ~65% for hot-key workloads, ~0% overhead for writes (`cache.get_mut()` bypasses RefCell).
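The slot mapping described above can be sketched as follows. `PAGE_SIZE` and `NUM_SLOTS` are illustrative constants for this sketch, not necessarily the crate's actual values:

```rust
// Sketch of the direct-mapped slot lookup described above.
// PAGE_SIZE and NUM_SLOTS are illustrative, not the crate's actual constants.
const PAGE_SIZE: u64 = 1024;
const NUM_SLOTS: u64 = 32;

fn slot_index(address: u64) -> usize {
    ((address / PAGE_SIZE) % NUM_SLOTS) as usize
}

fn main() {
    // Addresses on the same page share a slot.
    assert_eq!(slot_index(2048), slot_index(2050));
    // Addresses exactly NUM_SLOTS pages apart collide: direct-mapped means
    // a collision simply evicts the resident node (no LRU tracking).
    assert_eq!(slot_index(0), slot_index(PAGE_SIZE * NUM_SLOTS));
}
```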
…expose hit/miss stats
src/btreemap.rs
Outdated
```rust
if !self.is_enabled() {
    self.misses += 1;
    return None;
}
```
Just an idea, feel free to ignore: rather than disable the cache (and incur the cost of a branch), might it be preferable to have a cache of size 1?
(As a side note, I would actually be curious to see the difference in benchmark scores from adding this and the stats. Maybe it's all a storm in a teacup. But maybe not.)
If we have a single entry, that will strictly be more computationally expensive (always), and it's unlikely to give any benefit since we'll keep overwriting it all the time.
The size-0 execution path will be correctly predicted every time and costs ~0 cycles in practice.
A size-1 cache, on the other hand, would still allocate a Node on the heap and would collide on every operation (every node maps to slot 0), producing worse miss behaviour than even a small real cache.
So I'd recommend we go with size 0 by default in the next version, and then turn it up to size 32 or larger by default once we get some feedback from production runs.
btw, I now see in the benchmarks with the default 0-size cache some increase in instruction cost, up to +8%.
Would it be possible to have some kind of short circuit so we don't do extra work when the cache size is zero?
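One possible shape for such a short-circuit, as a sketch: a slot vector whose length is the configured cache size, so size 0 bails out on a single well-predicted branch. `NodeCache` and its fields here are illustrative stand-ins, not the PR's actual types:

```rust
// Hypothetical sketch: a zero-sized cache returns immediately from get(),
// skipping slot lookup and stat bookkeeping entirely.
struct NodeCache {
    // (address, node) pairs; String stands in for the real Node<K>.
    slots: Vec<(u64, Option<String>)>,
    hits: u64,
    misses: u64,
}

impl NodeCache {
    fn get(&mut self, addr: u64) -> Option<&String> {
        // Short-circuit: with zero slots there is nothing to look up.
        if self.slots.is_empty() {
            return None;
        }
        let idx = (addr as usize) % self.slots.len();
        if self.slots[idx].0 == addr {
            self.hits += 1;
            return self.slots[idx].1.as_ref();
        }
        self.misses += 1;
        None
    }
}

fn main() {
    let mut disabled = NodeCache { slots: Vec::new(), hits: 0, misses: 0 };
    assert!(disabled.get(1024).is_none());
    // The disabled path does no bookkeeping at all.
    assert_eq!(disabled.misses, 0);
}
```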
> btw I now see in the benchmarks with default 0-size cache size some increase in instruction cost, up to +8%.
This is what I was referring to. Sure, a size-1 cache is going to be more expensive than a size-0 cache. But having the option of a size-0 cache makes all other cache sizes more expensive. And the size-0 cache is more expensive than not having this feature at all.
So I guess it's a trade-off between optimizing for no caching vs. optimizing for caching. I was going for the latter, since caching appears to make a positive change in performance, but we could also stick with no caching as the default.
By inlining one function, we're now roughly in the negative range (marginally improved performance) even with cache disabled.
re: size 1 cache
In the current implementation the direct-mapped cache does not prioritise top-level nodes over lower-level nodes, meaning that a cache of size 1 will always be overwritten on each tree traversal and will always have 0 hits, which is a pure waste of cycles.
Moreover, any cache smaller than the tree height will have 0 hits. So the rough range of cache sizes that make it practically usable:
- lower bound: at least the tree height, maybe x2 to account for collisions
- upper bound: any cache size that reaches a 90-95% hit ratio; above that the ROI is too small
(optional exploration idea) Based on that, I suppose one interesting idea to explore (not now, in future PRs) is to also store in the cache the node's height (or level) and prioritize top nodes (closer to the root) over lower-level ones. Or, if we decide to use a set-associative cache, store cache lines of different levels with prioritization by level.
src/btreemap.rs
Outdated
```rust
pub fn cache_stats(&self) -> CacheStats {
    self.cache.borrow().stats()
}
```
Might be useful to return the cache size in bytes. The end user might have a hard time figuring out the average node size or even just the page size.
There were some requests from users to add methods to stable structures that report memory usage: heap usage, stable memory allocated and actually used. If that is implemented, it also covers any cache, so maybe this should not report back the cache size. Also, if the user configures the cache size, it's expected to be full.
I'm fine if this PR does not implement reporting the cache size.
Reporting the cache size in bytes is easy and cheap, so I just added it.
it wasn't so easy after all, I'll look for another way to do it.
I added a MemSize trait implementation to this branch.
It calculates heap usage with (in my opinion) an acceptable tradeoff between the cost and precision.
Please take a look and let me know what you think. It's ok to revert my changes if they don't work.
src/btreemap.rs
Outdated
````rust
/// map.set_cache_size(stats.size_bytes * 2);
/// }
/// ```
pub fn set_cache_size(&mut self, bytes: u64) {
````
After thinking about this a bit more, I realized we cannot guarantee this cache size limit, because each node holds 11 key-value pairs, which can be unbounded and take any amount of space. So we should probably go back to defining the cache size as an actual number of nodes.
Based on that, if the user wants to know the exact (or at least realistic) cache size but can only define it as a number of nodes, then we need to provide a way to report this data; maybe it should be returned as a field in the stats. We can use something similar to a DataSize trait implementation, or maybe some existing crate.
And with such a way to dynamically change the cache size, I suppose these are the methods we need to cover the cache lifecycle:

```rust
let stats = map.node_cache_stats();      // provides: hits, misses, hit ratio, memory usage
...
map.node_cache_resize(number_of_nodes);  // should also clear hit/miss counters
map.node_cache_clear();                  // same as resizing to the current cache size
```

I don't think we should stick to power-of-2 cache sizes, it's too restrictive; let's allow users to choose their favourite number, the same way one can resize a vector to any number.
Calculating the exact cache size is not trivial after all.
- You'd need to walk all slots and sum up the actual heap allocations inside each `Node` — the `Vec<LazyEntry>` for entries, the `Vec` for children, and each `LazyEntry`'s `OnceCell` contents (which may or may not be materialized)
- This is inherently imprecise: a Rust `Vec` allocates by capacity, not len, so you'd be measuring a lower bound unless you account for allocator overhead, which you can't do portably
- A `Node` doesn't own its values in the cache (that's the whole point of `read_value_uncached`), so the "big" part of the data isn't even cached. The cache holds mostly keys + child addresses + metadata
- Adding a `DataSize` trait or pulling in a crate like `deepsize`/`datasize` is dependency bloat for a feature that not everyone will use
How about this instead?
```rust
/// Returns an estimate of the cache's heap usage in bytes.
///
/// This is a rough upper bound: `num_slots * (page_size + overhead)`.
/// Actual usage is typically lower because cached nodes don't
/// materialize values (only keys and child pointers).
pub fn node_cache_size_bytes_approx(&self) -> usize {
    self.cache_num_slots
        * (self.version.page_size().get() as usize + size_of::<(Address, Option<Node<K>>)>())
}
```
I have committed to the branch an implementation of a MemSize trait which makes the calculation of node cache memory usage approximately accurate and fast, without much cost.
- it does not iterate over all the nodes each time you need to read memory usage; internally it keeps track of memory usage when adding/removing nodes to/from the cache
- the imprecision of vec len vs vec capacity is good enough; I suppose with MemSize one could even measure capacity instead of len
- incorporating MemSize into lazy objects (keys and values) will properly calculate actual heap usage (only keys, when values are not cached)
- MemSize is currently an in-place implementation, so no dependency-bloat issue

The approach of approximate measurement via assuming that each node is a single 'virtual page' (1 KiB) is too imprecise in my opinion.
Let's take a look at those options closely and discuss the tradeoffs.
```rust
+ self.version.mem_size()
+ self.allocator.mem_size()
+ self.length.mem_size()
+ self.node_cache_memory_used()
```
FYI: this line is O(1), it does not traverse all the slots to calculate memory usage, because it tracks the usage on adding/removing the nodes.
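A minimal sketch of that bookkeeping, under the assumption that the cache adjusts a running byte counter whenever a node enters or leaves a slot (names and types are illustrative, not the branch's actual API):

```rust
// Illustrative O(1) memory accounting: put/take adjust a running counter,
// so reading the usage never walks the slots.
struct NodeCache {
    slots: Vec<(u64, Option<Vec<u8>>)>, // Vec<u8> stands in for Node<K>
    memory_used: usize,
}

impl NodeCache {
    fn put(&mut self, addr: u64, node: Vec<u8>) {
        let idx = (addr as usize) % self.slots.len();
        // Evicting a resident node must release its accounted bytes first.
        if let Some(old) = self.slots[idx].1.take() {
            self.memory_used -= old.len();
        }
        self.memory_used += node.len();
        self.slots[idx] = (addr, Some(node));
    }

    fn take(&mut self, addr: u64) -> Option<Vec<u8>> {
        let idx = (addr as usize) % self.slots.len();
        if self.slots[idx].0 == addr {
            if let Some(node) = self.slots[idx].1.take() {
                self.memory_used -= node.len();
                return Some(node);
            }
        }
        None
    }
}

fn main() {
    let mut cache = NodeCache { slots: vec![(0, None); 4], memory_used: 0 };
    cache.put(1, vec![0u8; 100]);
    cache.put(2, vec![0u8; 50]);
    assert_eq!(cache.memory_used, 150); // O(1) read, no slot traversal
    cache.take(1);
    assert_eq!(cache.memory_used, 50);
}
```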
src/btreemap.rs
Outdated
```diff
  }
- let root = self.load_node(self.root_addr);
+ let root = self.take_or_load_node(self.root_addr);
  let (k, encoded_v) = root.get_min(self.memory());
```
Node::get_min and Node::get_max have their own "fast" tree traversal that skips key search, but they always load nodes without leveraging the cache.
Maybe a future optimization would be to move this logic from node level to btreemap level, so that it can still traverse without searching while using the cache.
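The suggested btreemap-level traversal could look roughly like this sketch, where every step goes through the same lookup a cache-aware loader (such as take_or_load_node) would use; a `HashMap` stands in for stable memory plus the cache, and the types are deliberately simplified:

```rust
use std::collections::HashMap;

// Simplified stand-in for a B-tree node: keys plus child addresses.
struct Node {
    keys: Vec<u64>,
    children: Vec<u64>, // empty for leaves
}

// Descend along the leftmost edge without searching keys; because each
// load goes through the shared lookup, hot upper nodes can hit the cache.
fn get_min(nodes: &HashMap<u64, Node>, root: u64) -> Option<u64> {
    let mut addr = root;
    loop {
        let node = nodes.get(&addr)?; // stands in for take_or_load_node
        match node.children.first() {
            Some(&child) => addr = child, // internal node: keep going left
            None => return node.keys.first().copied(), // leaf: smallest key
        }
    }
}

fn main() {
    let mut nodes = HashMap::new();
    nodes.insert(1, Node { keys: vec![10], children: vec![2, 3] });
    nodes.insert(2, Node { keys: vec![3, 7], children: vec![] });
    nodes.insert(3, Node { keys: vec![15], children: vec![] });
    assert_eq!(get_min(&nodes, 1), Some(3));
}
```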
src/btreemap.rs
Outdated
```diff
  }
- let root = self.load_node(self.root_addr);
+ let root = self.take_or_load_node(self.root_addr);
  let (k, encoded_v) = root.get_max(self.memory());
```
```rust
fn put(&mut self, addr: Address, node: Node<K>) {
    debug_assert!(self.is_enabled());
    self.metrics.add_memory_used(node.heap_memory_used());
```
(optional exploration) Temporarily, for debugging, we could add a debug assert checking that no element in a cached node contains a value; if some do, we can inspect how they find their way into the cache and maybe remove them where possible (as in the get_min/get_max traversal case).
This is not critical for this PR and can be done later.
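The debug-only check suggested above could be a one-liner in `put`. A sketch with simplified types — the real `LazyEntry` uses `OnceCell`, here an `Option` stands in, and `Vec<Node>` stands in for the slot array:

```rust
// Simplified entry: value is None until materialized.
struct LazyEntry {
    key: u64,
    value: Option<Vec<u8>>,
}

struct Node {
    entries: Vec<LazyEntry>,
}

fn put(cache: &mut Vec<Node>, node: Node) {
    // Debug-only invariant: nodes entering the cache should carry no
    // materialized values, only keys and child pointers.
    debug_assert!(
        node.entries.iter().all(|e| e.value.is_none()),
        "cached node unexpectedly holds a materialized value"
    );
    cache.push(node);
}

fn main() {
    let mut cache = Vec::new();
    put(&mut cache, Node { entries: vec![LazyEntry { key: 1, value: None }] });
    assert_eq!(cache.len(), 1);
}
```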
```rust
fn take(&mut self, addr: Address) -> Option<Node<K>> {
    debug_assert!(self.is_enabled());
    let idx = self.slot_index(addr);
    if self.slots[idx].0 == addr {
```
This comparison looks a bit suspicious due to NULL, which is Address(0): if take is called with NULL, it'll record a hit, which I don't think is correct. But then again, maybe at this point it's always called for non-NULL values? Maybe it should have an extra check for non-null.
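A sketch of the extra check, assuming NULL is `Address(0)` and slots are zero-initialized (types simplified; the slot array and its contents are illustrative):

```rust
const NULL: u64 = 0;

// Slots are zero-initialized, so an empty slot's address field equals NULL;
// without a guard, take(NULL) would match slot 0 and record a false hit.
fn take(slots: &mut [(u64, Option<String>)], addr: u64) -> Option<String> {
    if addr == NULL {
        return None; // never treat the null address as a cache hit
    }
    let idx = (addr as usize) % slots.len();
    if slots[idx].0 == addr {
        return slots[idx].1.take();
    }
    None
}

fn main() {
    let mut slots = vec![(0u64, Some("resident".to_string())), (0, None)];
    // Without the guard this would have returned the slot-0 contents.
    assert!(take(&mut slots, NULL).is_none());
}
```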
Summary
- Direct-mapped: `(address / page_size) % 32`, collision = eviction (no LRU tracking)
- Read paths (`get`, `contains_key`, `first/last_key_value`) use a take+return pattern to avoid re-loading hot upper-tree nodes from stable memory
- Write paths invalidate affected slots in `save_node`, `deallocate_node`, `merge`, and `clear_new`
- Switched `get()` from the destructive `extract_entry_at` (swap_remove) to the non-destructive `node.value()` (borrows via OnceCell)
- Removed the unused `extract_entry_at` method

This subsumes all four previous caching approaches (root-only, LRU+clone, LRU+Rc, page cache) into a single design that:
- stores `Node<K>` directly (no Rc, no Clone, no heap allocation per cache entry)
- uses `cache.get_mut()` on write paths (zero RefCell overhead)

Expected improvement: ~15-20% for random reads, ~65% for hot-key workloads, ~0% overhead for writes.