perf: add direct-mapped node cache to BTreeMap #416

Open
sasa-tomic wants to merge 23 commits into main from perf/direct-mapped-node-cache

Member

@sasa-tomic sasa-tomic commented Mar 18, 2026

Summary

  • Add a 32-slot direct-mapped node cache to BTreeMap, modeled after CPU caches: O(1) lookup via (address / page_size) % 32, collision = eviction (no LRU tracking)
  • Read paths (get, contains_key, first/last_key_value) use a take+return pattern to avoid re-loading hot upper-tree nodes from stable memory
  • Write paths invalidate affected cache slots in save_node, deallocate_node, merge, and clear_new
  • Switch get() from destructive extract_entry_at (swap_remove) to non-destructive node.value() (borrows via OnceCell)
  • Remove now-unused extract_entry_at method

This subsumes all four previous caching approaches (root-only, LRU+clone, LRU+Rc, page cache) into a single design that:

  • Has ~5 instructions overhead per cache lookup (vs ~330 for the Rc LRU's linear scan)
  • Stores Node<K> directly (no Rc, no Clone, no heap allocation per cache entry)
  • Uses cache.get_mut() on write paths (zero RefCell overhead)

Expected improvement: ~15-20% for random reads, ~65% for hot-key workloads, ~0% overhead for writes.
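
To make the slot mapping concrete, here is a minimal sketch of a direct-mapped cache as described in the summary. It is not the PR's code: `Address` and `Node` are simplified stand-ins for the real `Address` and `Node<K>`, and the struct and method bodies are assumptions written only to illustrate `(address / page_size) % 32` and "collision = eviction".

```rust
// Hypothetical stand-ins for the PR's `Address` and `Node<K>` types.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Address(u64);

#[derive(Debug, PartialEq)]
struct Node {
    keys: Vec<u64>,
}

const NUM_SLOTS: u64 = 32;

struct DirectMappedCache {
    page_size: u64,
    // Each slot remembers which address it holds, plus the node itself.
    slots: Vec<Option<(Address, Node)>>,
}

impl DirectMappedCache {
    fn new(page_size: u64) -> Self {
        assert!(page_size > 0, "page size must be non-zero");
        Self {
            page_size,
            slots: (0..NUM_SLOTS).map(|_| None).collect(),
        }
    }

    // O(1) slot selection: (address / page_size) % 32.
    fn slot_index(&self, addr: Address) -> usize {
        ((addr.0 / self.page_size) % NUM_SLOTS) as usize
    }

    // Removes and returns the node if the slot currently holds `addr`.
    fn take(&mut self, addr: Address) -> Option<Node> {
        let idx = self.slot_index(addr);
        match self.slots[idx].take() {
            Some((cached, node)) if cached == addr => Some(node),
            other => {
                // Different address (or empty slot): restore it and report a miss.
                self.slots[idx] = other;
                None
            }
        }
    }

    // Stores a node; a colliding occupant is simply dropped (no LRU tracking).
    fn put(&mut self, addr: Address, node: Node) {
        let idx = self.slot_index(addr);
        self.slots[idx] = Some((addr, node));
    }

    // Write paths would call this so that stale nodes are never served.
    fn invalidate(&mut self, addr: Address) {
        let idx = self.slot_index(addr);
        if matches!(&self.slots[idx], Some((cached, _)) if *cached == addr) {
            self.slots[idx] = None;
        }
    }
}
```

With a 1024-byte page, addresses 1024 and 33 * 1024 both land in slot 1, so inserting the second evicts the first.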

Add a 32-slot direct-mapped node cache to BTreeMap that avoids
re-loading hot nodes from stable memory. Modeled after CPU caches:
O(1) lookup via (address / page_size) % 32, collision = eviction.

Read paths (get, contains_key, first/last_key_value) use a
take+return pattern to borrow nodes from the cache without
RefCell lifetime issues. Write paths (insert, remove, split,
merge) invalidate affected cache slots.

Key changes:
- Switch get() from destructive extract_entry_at to node.value()
- Remove unused extract_entry_at method
- Change traverse() closure from Fn(&mut Node) to Fn(&Node)
- Invalidate cache in save_node, deallocate_node, merge, clear_new

Expected improvement: ~15-20% for random reads, ~65% for hot-key
workloads, ~0% overhead for writes (cache.get_mut() bypasses RefCell).
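
A rough sketch of the take+return pattern and of write-path invalidation, under simplifying assumptions: `ToyMap`, `take_or_load`, `return_node` and the `HashMap` standing in for stable memory are hypothetical, and slots are indexed by `addr % 32` rather than by page. The point is only that read paths hold the `RefCell` borrow for a single statement, while write paths with `&mut self` can use `get_mut()` and skip the runtime borrow check.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

type Address = u64;
const NUM_SLOTS: u64 = 32;

#[derive(Clone, Debug, PartialEq)]
struct Node {
    keys: Vec<u64>,
}

fn slot(addr: Address) -> usize {
    (addr % NUM_SLOTS) as usize
}

// Toy map: `storage` plays the role of stable memory, `loads` counts reads from it.
struct ToyMap {
    storage: HashMap<Address, Node>,
    cache: RefCell<Vec<Option<(Address, Node)>>>,
    loads: Cell<u64>,
}

impl ToyMap {
    fn new() -> Self {
        Self {
            storage: HashMap::new(),
            cache: RefCell::new((0..NUM_SLOTS).map(|_| None).collect()),
            loads: Cell::new(0),
        }
    }

    // Take: the RefCell borrow ends with this statement, so the caller owns the node.
    fn take_or_load(&self, addr: Address) -> Node {
        let cached = self.cache.borrow_mut()[slot(addr)].take();
        match cached {
            Some((a, node)) if a == addr => node,
            // Miss: a colliding occupant is dropped here, since the node
            // returned later goes into the same slot anyway.
            _ => {
                self.loads.set(self.loads.get() + 1);
                self.storage[&addr].clone()
            }
        }
    }

    // Return: hand the node back so the next read can reuse it.
    fn return_node(&self, addr: Address, node: Node) {
        self.cache.borrow_mut()[slot(addr)] = Some((addr, node));
    }

    fn contains_key(&self, addr: Address, key: u64) -> bool {
        let node = self.take_or_load(addr);
        let found = node.keys.contains(&key);
        self.return_node(addr, node);
        found
    }

    // Write path: `&mut self` lets `get_mut()` reach the cache directly.
    fn save_node(&mut self, addr: Address, node: Node) {
        let cache = self.cache.get_mut();
        let idx = slot(addr);
        if cache[idx].as_ref().map(|(a, _)| *a) == Some(addr) {
            cache[idx] = None; // invalidate the stale copy
        }
        self.storage.insert(addr, node);
    }
}
```

In this toy, two consecutive `contains_key` calls on the same node cost one load, and a `save_node` in between forces a reload.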
@sasa-tomic sasa-tomic requested a review from a team as a code owner March 18, 2026 17:46
@sasa-tomic sasa-tomic marked this pull request as draft March 18, 2026 17:52

github-actions bot commented Mar 19, 2026

canbench 🏋 (dir: ./benchmarks/btreeset) f0e3cb4 2026-03-29 14:46:06 UTC

./benchmarks/btreeset/canbench_results.yml is up to date
📦 canbench_results_btreeset.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 100 | regressed 0 | improved 0 | new 0 | unchanged 100]
    change:   [max +141.73K | p75 0 | median 0 | p25 0 | min -1.41K]
    change %: [max +0.03% | p75 0.00% | median 0.00% | p25 0.00% | min -0.00%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 100 | regressed 0 | improved 0 | new 0 | unchanged 100]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 100 | regressed 0 | improved 0 | new 0 | unchanged 100]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv


github-actions bot commented Mar 19, 2026

canbench 🏋 (dir: ./benchmarks/nns) f0e3cb4 2026-03-29 14:45:57 UTC

./benchmarks/nns/canbench_results.yml is up to date
📦 canbench_results_nns.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max +361.55K | p75 0 | median 0 | p25 -610.45K | min -19.63M]
    change %: [max +0.04% | p75 0.00% | median 0.00% | p25 -0.22% | min -0.72%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv


github-actions bot commented Mar 19, 2026

canbench 🏋 (dir: ./benchmarks/vec) f0e3cb4 2026-03-29 14:45:49 UTC

./benchmarks/vec/canbench_results.yml is up to date
📦 canbench_results_vec.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv


github-actions bot commented Mar 19, 2026

canbench 🏋 (dir: ./benchmarks/memory_manager) f0e3cb4 2026-03-29 14:45:42 UTC

./benchmarks/memory_manager/canbench_results.yml is up to date
📦 canbench_results_memory-manager.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 3 | regressed 0 | improved 0 | new 0 | unchanged 3]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 3 | regressed 0 | improved 0 | new 0 | unchanged 3]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 3 | regressed 0 | improved 0 | new 0 | unchanged 3]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv


github-actions bot commented Mar 20, 2026

canbench 🏋 (dir: ./benchmarks/io_chunks) f0e3cb4 2026-03-29 14:46:30 UTC

./benchmarks/io_chunks/canbench_results.yml is up to date
📦 canbench_results_io_chunks.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   Regressions and improvements 🔴🟢
    counts:   [total 18 | regressed 1 | improved 1 | new 0 | unchanged 16]
    change:   [max +13.00M | p75 +11 | median 0 | p25 0 | min -1.25B]
    change %: [max +2.12% | p75 0.00% | median 0.00% | p25 0.00% | min -3.05%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 18 | regressed 0 | improved 0 | new 0 | unchanged 18]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 18 | regressed 0 | improved 0 | new 0 | unchanged 18]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                    | calls |     ins |  ins Δ% | HI |  HI Δ% | SMI |  SMI Δ% |
|--------|-------------------------|-------|---------|---------|----|--------|-----|---------|
|   +    | read_chunks_btreemap_1k |       | 508.81M |  +2.12% |  0 |  0.00% |   0 |   0.00% |
|   -    | read_chunks_btreemap_1m |       |  39.69B |  -3.05% |  0 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
CSV results saved to canbench_results.csv

@sasa-tomic sasa-tomic marked this pull request as ready for review March 20, 2026 11:23

github-actions bot commented Mar 20, 2026

canbench 🏋 (dir: ./benchmarks/btreemap) f0e3cb4 2026-03-29 14:47:45 UTC

./benchmarks/btreemap/canbench_results.yml is up to date
📦 canbench_results_btreemap.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   Regressions, improvements, and new benchmarks 🔴🟢➕
    counts:   [total 315 | regressed 67 | improved 27 | new 12 | unchanged 209]
    change:   [max +248.00M | p75 +17.08M | median +65.62K | p25 -1.37M | min -174.02M]
    change %: [max +6.44% | p75 +1.32% | median 0.01% | p25 -0.21% | min -6.59%]

  heap_increase:
    status:   New benchmarks added ➕
    counts:   [total 315 | regressed 0 | improved 0 | new 12 | unchanged 303]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   New benchmarks added ➕
    counts:   [total 315 | regressed 0 | improved 0 | new 12 | unchanged 303]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                                            | calls |     ins |  ins Δ% | HI |  HI Δ% | SMI |  SMI Δ% |
|--------|-------------------------------------------------|-------|---------|---------|----|--------|-----|---------|
|   +    | btreemap_v2_pop_last_principal                  |       | 830.52M |  +6.44% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_first_blob8_u64                 |       | 633.12M |  +5.65% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_blob8_u64                  |       | 607.50M |  +5.53% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_contains_blob8_u64                  |       | 292.22M |  +5.44% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_first_blob_32_0                 |       | 792.12M |  +5.41% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_u64_u64                    |       | 691.81M |  +5.23% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_blob_32_0                  |       | 759.98M |  +5.07% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_first_blob_32_4                 |       | 818.17M |  +5.05% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_first_u64_u64                   |       | 715.57M |  +5.04% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_blob_32_8                  |       | 805.94M |  +4.92% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_blob_32_4                  |       | 792.05M |  +4.89% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_mem_manager_contains_vec512_u64     |       |   1.26B |  +4.86% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_blob_32_16                 |       | 803.97M |  +4.83% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_u64_blob8                  |       | 676.54M |  +4.76% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_blob_16_128                |       | 749.97M |  +4.66% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_u64_vec8                   |       | 678.25M |  +4.62% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_blob_4_128                 |       | 374.33M |  +4.59% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_first_blob_32_8                 |       | 833.30M |  +4.58% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_first_blob_16_128               |       | 776.90M |  +4.55% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_first_u64_blob8                 |       | 699.58M |  +4.53% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_blob_8_128                 |       | 625.85M |  +4.49% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_first_principal                 |       | 836.01M |  +4.48% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_last_vec8_u64                   |       | 793.37M |  +4.45% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_contains_vec_32_8                   |       | 368.29M |  +4.44% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_pop_first_u64_vec8                  |       | 701.38M |  +4.40% |  0 |  0.00% |   0 |   0.00% |
|  ...   | ... 56 rows omitted ...                         |       |         |         |    |        |     |         |
|   -    | btreemap_v2_mem_manager_get_u64_vec512          |       | 375.21M |  -3.50% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_32_128                     |       | 329.59M |  -3.53% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_vec_32_128                      |       | 420.26M |  -3.62% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_256_128                    |       |   1.32B |  -3.75% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_512_128                    |       |   2.29B |  -3.84% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_1024_128                   |       |   4.28B |  -3.90% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_mem_manager_get_blob512_u64         |       |   2.36B |  -4.07% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_32_256                     |       | 328.46M |  -4.14% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_32_512                     |       | 328.46M |  -4.15% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_32_1024                    |       | 335.29M |  -4.34% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_blob_64_128                     |       | 406.31M |  -5.23% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_get_vec_32_1024                     |       | 509.48M |  -6.43% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_contains_vec_32_1024                |       | 486.20M |  -6.59% |  0 |  0.00% |   0 |   0.00% |
|  new   | btreemap_v2_contains_blob_32_128_cached_32entry |       | 284.20M |         |  0 |        |   0 |         |
|  new   | btreemap_v2_contains_u64_u64_cached_32entry     |       | 215.19M |         |  0 |        |   0 |         |
|  new   | btreemap_v2_contains_vec_32_128_cached_32entry  |       | 331.88M |         |  0 |        |   0 |         |
|  new   | btreemap_v2_get_blob_32_128_cached_32entry      |       | 288.86M |         |  0 |        |   0 |         |
|  new   | btreemap_v2_get_u64_u64_cached_32entry          |       | 219.84M |         |  0 |        |   0 |         |
|  new   | btreemap_v2_get_vec_32_128_cached_32entry       |       | 338.38M |         |  0 |        |   0 |         |
|  new   | btreemap_v2_insert_blob_32_128_cached_32entry   |       | 543.53M |         |  0 |        |  28 |         |
|  new   | btreemap_v2_insert_u64_u64_cached_32entry       |       | 415.56M |         |  0 |        |   6 |         |
|  new   | btreemap_v2_insert_vec_32_128_cached_32entry    |       | 755.75M |         |  0 |        |  33 |         |
|  new   | btreemap_v2_remove_blob_32_128_cached_32entry   |       | 763.55M |         |  0 |        |   0 |         |
|  new   | btreemap_v2_remove_u64_u64_cached_32entry       |       | 606.02M |         |  0 |        |   0 |         |
|  new   | btreemap_v2_remove_vec_32_128_cached_32entry    |       |   1.05B |         |  0 |        |   0 |         |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
CSV results saved to canbench_results.csv

src/btreemap.rs Outdated
Comment on lines +169 to +172
if !self.is_enabled() {
    self.misses += 1;
    return None;
}

Just an idea, feel free to ignore: rather than disable the cache (and incur the cost of a branch), might it be preferable to have a cache of size 1?

(As a side note, I would actually be curious to see the difference in benchmark scores from adding this and the stats. Maybe it's all a storm in a teacup. But maybe not.)

Member Author

A single-entry cache would always be strictly more computationally expensive, and it is unlikely to give any benefit, since we would keep overwriting it all the time.
The size-0 execution path will be correctly predicted every time and costs ~0 cycles in practice.
A size-1 cache, on the other hand, would still allocate a Node on the heap and would collide on every operation (every node maps to slot 0), producing worse miss behaviour than even a small real cache.

So I'd recommend we go with size-0 by default, in the next version. And then we can turn it on to size-32 or larger by default once we get some feedback from production runs.

Contributor

btw I now see in the benchmarks with default 0-size cache size some increase in instruction cost, up to +8%.
would it be possible to have some kind of short circuit not to do extra work when cache size is zero?


> btw I now see in the benchmarks with default 0-size cache size some increase in instruction cost, up to +8%.

This is what I was referring to. Sure, a size 1 cache is going to be more expensive than a size 0 cache. But having the option of a size 0 cache makes all other cache sizes more expensive. And the size 0 cache more expensive than not having this feature at all.

So I guess it's a trade-off between optimizing for no caching vs. optimizing for caching. I was going for the latter, since caching appears to make a positive change in performance, but we could also stick with no caching as the default.

Member Author

By inlining one function, we're now roughly in the negative range (marginally improved performance) even with cache disabled.
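
For illustration, a sketch of what a zero-size short circuit could look like. The thread does not say which function was inlined, so the `#[inline]` placement, the `NodeCache` type and its methods are all assumptions; `String` stands in for a node and `u64` for an address.

```rust
// Toy cache; `String` stands in for a node, `u64` for an address.
struct NodeCache {
    slots: Vec<Option<(u64, String)>>,
}

impl NodeCache {
    fn with_slots(n: usize) -> Self {
        Self {
            slots: (0..n).map(|_| None).collect(),
        }
    }

    // Trivial predicate; inlining may let the optimizer fold it into the caller's branch.
    #[inline(always)]
    fn is_enabled(&self) -> bool {
        !self.slots.is_empty()
    }

    #[inline]
    fn take(&mut self, addr: u64) -> Option<String> {
        if !self.is_enabled() {
            return None; // short circuit: no index math (and no modulo by zero)
        }
        let idx = (addr % self.slots.len() as u64) as usize;
        match self.slots[idx].take() {
            Some((a, node)) if a == addr => Some(node),
            other => {
                self.slots[idx] = other;
                None
            }
        }
    }

    #[inline]
    fn put(&mut self, addr: u64, node: String) {
        if !self.is_enabled() {
            return; // a disabled cache silently drops the node
        }
        let idx = (addr % self.slots.len() as u64) as usize;
        self.slots[idx] = Some((addr, node));
    }
}
```

Whether this actually removes the disabled-cache overhead has to be confirmed with the benchmarks; the sketch only shows where the early return sits.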

Contributor

re: size 1 cache

In the current implementation, the direct-mapped cache does not prioritise top-level nodes over lower-level nodes, so a cache of size 1 is overwritten on every tree traversal and always has 0 hits, which is a pure waste of cycles.

Moreover, any cache smaller than the tree height will have 0 hits. So the rough range of cache sizes that are usable in practice:

  • lower bound: at least the tree height, maybe x2 to account for collisions
  • upper bound: any cache size that reaches a 90-95% hit ratio; above that the ROI is too small

(optional exploration idea) Based on that, one interesting idea to explore (not now, in future PRs) is to also store the current node height (or level) in the cache and prioritize top nodes (closer to the root) over lower-level nodes. Or, if we decide to use a set-associative cache, store cache lines of different levels with prioritization by level.
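
A toy simulation of the argument above, not taken from the PR: it replays repeated traversals of one root-to-leaf path against a direct-mapped cache and counts hits. `SimCache`, `simulate` and the `prioritize_by_depth` flag (which keeps an occupant that is closer to the root) are hypothetical names; pages are mapped with `page % num_slots`.

```rust
// Toy direct-mapped cache: each slot stores (page, depth), with depth 0 = root.
struct SimCache {
    slots: Vec<Option<(u64, u32)>>,
    prioritize_by_depth: bool,
    hits: u64,
    misses: u64,
}

impl SimCache {
    fn new(num_slots: usize, prioritize_by_depth: bool) -> Self {
        assert!(num_slots > 0);
        Self {
            slots: vec![None; num_slots],
            prioritize_by_depth,
            hits: 0,
            misses: 0,
        }
    }

    fn access(&mut self, page: u64, depth: u32) {
        let idx = (page % self.slots.len() as u64) as usize;
        let occupant = self.slots[idx];
        match occupant {
            Some((cached, _)) if cached == page => self.hits += 1,
            // Optional policy: an occupant closer to the root is not evicted.
            Some((_, cached_depth)) if self.prioritize_by_depth && cached_depth < depth => {
                self.misses += 1;
            }
            _ => {
                self.misses += 1;
                self.slots[idx] = Some((page, depth));
            }
        }
    }
}

// Replays `rounds` traversals of `path` (root first) and returns (hits, misses).
fn simulate(num_slots: usize, prioritize: bool, path: &[u64], rounds: u32) -> (u64, u64) {
    let mut cache = SimCache::new(num_slots, prioritize);
    for _ in 0..rounds {
        for (depth, &page) in path.iter().enumerate() {
            cache.access(page, depth as u32);
        }
    }
    (cache.hits, cache.misses)
}
```

For the path `[1, 2, 3]` over 10 rounds, one slot without prioritization yields 0 hits, one slot with depth prioritization keeps the root and yields 9 hits, and four slots yield 27 hits after the 3 cold misses.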

src/btreemap.rs Outdated
Comment on lines +486 to +488
pub fn cache_stats(&self) -> CacheStats {
    self.cache.borrow().stats()
}

Might be useful to return the cache size in bytes. The end user might have a hard time figuring out the average node size or even just the page size.

Contributor

There have been some user requests for stable-structures methods that report memory usage: heap usage, and stable memory allocated and actually used. If that is implemented it would also cover any cache, so maybe this API should not report the cache size. Also, if the user configures the cache size, the cache is expected to be full.

I'm fine if this PR does not implement reporting cache size.

Member Author

Reporting the cache size in bytes is easy and cheap, so I just added it.

Member Author

It wasn't so easy after all; I'll look for another way to do it.

Contributor

I added a MemSize trait implementation to this branch.
It calculates heap usage with (in my opinion) an acceptable tradeoff between the cost and precision.
Please take a look and let me know what you think. It's ok to revert my changes if they don't work.

src/btreemap.rs Outdated
/// map.set_cache_size(stats.size_bytes * 2);
/// }
/// ```
pub fn set_cache_size(&mut self, bytes: u64) {
Contributor

After thinking about this a bit more, I realized we cannot guarantee this cache size limit: each node holds 11 key-value pairs, which can be unbounded and take any amount of space. So we should probably go back to defining the cache size as a number of nodes.

Based on that, if the user wants to know the exact (or at least realistic) cache size and can only define it as a number of nodes, then we need to provide a way to report this data; maybe it should be returned as a field in the stats. We can use something similar to a DataSize trait implementation, or maybe some existing crate.

With a way to change the cache size dynamically, I suppose these are the methods we need to cover the cache lifecycle:

let stats = map.node_cache_stats(); // provides: hits, misses, hit ratio, memory usage
...
map.node_cache_resize(number_of_nodes); // should also clear hit/miss counters.
map.node_cache_clear(); // same as resizing to the current cache size

I don't think we should stick to power of 2 cache size, it's too restrictive, let's allow users to choose their favourite number, the same way one can resize vector to any number.
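
A hypothetical sketch of how that lifecycle could behave, using the method names from the comment above (`node_cache_stats`, `node_cache_resize`, `node_cache_clear`). Everything else is assumed: `ToyMap`, `NodeCacheStats`, the `touch` helper, and the omission of the memory-usage field. Slots are picked with a plain modulo, so any slot count works, not just powers of two.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeCacheStats {
    hits: u64,
    misses: u64,
}

impl NodeCacheStats {
    fn hit_ratio(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            self.hits as f64 / total as f64
        }
    }
}

struct ToyMap {
    slots: Vec<Option<(u64, String)>>,
    stats: NodeCacheStats,
}

impl ToyMap {
    fn new() -> Self {
        Self {
            slots: Vec::new(),
            stats: NodeCacheStats { hits: 0, misses: 0 },
        }
    }

    fn node_cache_stats(&self) -> NodeCacheStats {
        self.stats
    }

    // Any slot count is accepted (0 disables the cache); counters are reset.
    fn node_cache_resize(&mut self, number_of_nodes: usize) {
        self.slots = (0..number_of_nodes).map(|_| None).collect();
        self.stats = NodeCacheStats { hits: 0, misses: 0 };
    }

    // Same as resizing to the current size.
    fn node_cache_clear(&mut self) {
        let n = self.slots.len();
        self.node_cache_resize(n);
    }

    // Toy lookup: records a hit or a miss and caches the node on a miss.
    fn touch(&mut self, addr: u64) {
        if self.slots.is_empty() {
            self.stats.misses += 1;
            return;
        }
        let idx = (addr % self.slots.len() as u64) as usize;
        if matches!(&self.slots[idx], Some((a, _)) if *a == addr) {
            self.stats.hits += 1;
        } else {
            self.stats.misses += 1;
            self.slots[idx] = Some((addr, format!("node@{addr}")));
        }
    }
}
```

With 5 slots, addresses 3 and 8 collide, so the sequence 3, 3, 8, 3 produces one hit and three misses (hit ratio 0.25) until `node_cache_clear` resets the counters.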

Member Author

Calculating the exact cache size is not trivial after all.

  • You'd need to walk all slots and sum up the actual heap allocations inside each Node — the Vec<LazyEntry> for entries, Vec for children, and each LazyEntry's OnceCell contents (which may or may not be materialized)
  • This is inherently imprecise - Rust Vec allocates with capacity, not len. You'd be measuring a lower bound unless you account for allocator overhead, which you can't portably
  • Node doesn't own its values in the cache (that's the whole point of read_value_uncached) — so the "big" part of the data isn't even cached. The cache holds mostly keys + children addresses + metadata
  • Adding a DataSize trait or pulling in a crate like deepsize/datasize is dependency bloat for a feature that not everyone will use

How about this instead?

  /// Returns an estimate of the cache's heap usage in bytes.
  ///
  /// This is a rough upper bound: `num_slots * (page_size + overhead)`.
  /// Actual usage is typically lower because cached nodes don't
  /// materialize values (only keys and child pointers).
  pub fn node_cache_size_bytes_approx(&self) -> usize {
      self.cache_num_slots * (self.version.page_size().get() as usize + size_of::<(Address, Option<Node<K>>)>())
  }
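
As a worked instance of that upper bound, here is the same formula as a free function. The inputs below are illustrative assumptions only: 32 slots, a 1 KiB page, and a made-up 64-byte per-slot overhead (the real overhead is `size_of::<(Address, Option<Node<K>>)>()` and depends on `K`).

```rust
// Upper-bound estimate: num_slots * (page_size + per-slot overhead).
fn node_cache_size_bytes_approx(num_slots: usize, page_size: usize, slot_overhead: usize) -> usize {
    num_slots * (page_size + slot_overhead)
}
```

Under those assumed inputs the estimate is 32 * (1024 + 64) = 34,816 bytes, i.e. 34 KiB.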

Contributor

I have committed to the branch an implementation of a MemSize trait, which makes calculating the node cache memory usage approximately accurate and fast, at little cost.

  • it does not iterate over all the nodes each time you need to read the memory usage; internally it keeps track of memory usage when adding/removing nodes to/from the cache
  • the imprecision of vec len vs vec capacity is good enough; I suppose with MemSize one can even measure capacity instead of len
  • incorporating MemSize into lazy objects (keys and values) will properly calculate actual heap usage (only keys, when values are not cached)
  • MemSize is currently an in-place implementation, no dependency bloat issue

The approach of approximately measuring via assuming that each node is a single 'virtual page' (1KiB) is too imprecise in my opinion.

Let's take a look at those options closely and discuss the tradeoffs.
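
To help compare the options, a minimal sketch of the incremental-tracking idea. The trait shape is an assumption (the thread only names `MemSize`, and `mem_size()` appears in the diff), and `Node`, `TrackedCache` and `recompute` are hypothetical. The counter is adjusted on every insert and removal, so reading it is O(1).

```rust
use std::mem::size_of;

// Assumed shape of the trait; the thread only names `MemSize`.
trait MemSize {
    fn mem_size(&self) -> usize;
}

impl MemSize for u64 {
    fn mem_size(&self) -> usize {
        size_of::<u64>()
    }
}

impl<T: MemSize> MemSize for Vec<T> {
    // Counts `len` elements, not `capacity`, so this is a lower bound.
    fn mem_size(&self) -> usize {
        size_of::<Self>() + self.iter().map(|item| item.mem_size()).sum::<usize>()
    }
}

struct Node {
    keys: Vec<u64>,
    children: Vec<u64>,
}

impl MemSize for Node {
    fn mem_size(&self) -> usize {
        self.keys.mem_size() + self.children.mem_size()
    }
}

struct TrackedCache {
    slots: Vec<Option<(u64, Node)>>,
    memory_used: usize,
}

impl TrackedCache {
    fn new(num_slots: usize) -> Self {
        assert!(num_slots > 0);
        Self {
            slots: (0..num_slots).map(|_| None).collect(),
            memory_used: 0,
        }
    }

    fn index(&self, addr: u64) -> usize {
        (addr % self.slots.len() as u64) as usize
    }

    fn put(&mut self, addr: u64, node: Node) {
        let idx = self.index(addr);
        let added = node.mem_size();
        // Subtract whatever gets evicted, then add the newcomer.
        if let Some((_, evicted)) = self.slots[idx].replace((addr, node)) {
            self.memory_used -= evicted.mem_size();
        }
        self.memory_used += added;
    }

    fn take(&mut self, addr: u64) -> Option<Node> {
        let idx = self.index(addr);
        match self.slots[idx].take() {
            Some((a, node)) if a == addr => {
                self.memory_used -= node.mem_size();
                Some(node)
            }
            other => {
                self.slots[idx] = other;
                None
            }
        }
    }

    // O(1): just reads the running counter.
    fn memory_used(&self) -> usize {
        self.memory_used
    }

    // O(n) reference computation, used here only to check the counter.
    fn recompute(&self) -> usize {
        self.slots.iter().flatten().map(|(_, n)| n.mem_size()).sum()
    }
}
```

The trade-off is a size computation per cache insert and removal in exchange for a constant-time read.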

+ self.version.mem_size()
+ self.allocator.mem_size()
+ self.length.mem_size()
+ self.node_cache_memory_used()
Contributor

FYI: this line is O(1), it does not traverse all the slots to calculate memory usage, because it tracks the usage on adding/removing the nodes.

src/btreemap.rs Outdated
}
- let root = self.load_node(self.root_addr);
+ let root = self.take_or_load_node(self.root_addr);
let (k, encoded_v) = root.get_min(self.memory());
Contributor

Node::get_min and Node::get_max have their own "fast" tree traversal without searching the keys, but they always load without using the power of cache.
Maybe a future optimization would be to move this logic from node-level to btreemap-level, so that it can traverse without search but using the cache.
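
A rough sketch of what a map-level leftmost traversal through the cache might look like. It is a toy under stated assumptions: `ToyTree`, `return_node`, `first_key` and `sample_tree` are hypothetical, a `HashMap` stands in for stable memory, and `&mut self` is used for simplicity where the real read paths take `&self`. Only `take_or_load_node` is a name from the diff, and its body here is invented.

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct Node {
    keys: Vec<u64>,
    children: Vec<u64>, // child node addresses; empty for leaves
}

struct ToyTree {
    root_addr: u64,
    storage: HashMap<u64, Node>, // stands in for stable memory
    cache: Vec<Option<(u64, Node)>>,
    loads: u64, // number of reads that had to go to `storage`
}

impl ToyTree {
    fn take_or_load_node(&mut self, addr: u64) -> Node {
        let idx = (addr % self.cache.len() as u64) as usize;
        match self.cache[idx].take() {
            Some((a, node)) if a == addr => node,
            // Miss: any colliding occupant is dropped, the node is loaded instead.
            _ => {
                self.loads += 1;
                self.storage[&addr].clone()
            }
        }
    }

    fn return_node(&mut self, addr: u64, node: Node) {
        let idx = (addr % self.cache.len() as u64) as usize;
        self.cache[idx] = Some((addr, node));
    }

    // Follows the first child at every level, with no key search, via the cache.
    fn first_key(&mut self) -> Option<u64> {
        let mut addr = self.root_addr;
        loop {
            let node = self.take_or_load_node(addr);
            let next = node.children.first().copied();
            let min_key = node.keys.first().copied();
            self.return_node(addr, node);
            match next {
                Some(child) => addr = child,
                None => return min_key,
            }
        }
    }
}

// Two-level sample: root at address 1 with leaves at addresses 2 and 3.
fn sample_tree() -> ToyTree {
    let mut storage = HashMap::new();
    storage.insert(1, Node { keys: vec![50], children: vec![2, 3] });
    storage.insert(2, Node { keys: vec![10, 20], children: vec![] });
    storage.insert(3, Node { keys: vec![60], children: vec![] });
    ToyTree {
        root_addr: 1,
        storage,
        cache: (0..8).map(|_| None).collect(),
        loads: 0,
    }
}
```

On the sample tree the first call loads the root and the left leaf, and a second call is served entirely from the cache.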

src/btreemap.rs Outdated
}
- let root = self.load_node(self.root_addr);
+ let root = self.take_or_load_node(self.root_addr);
let (k, encoded_v) = root.get_max(self.memory());
Contributor

ditto.


fn put(&mut self, addr: Address, node: Node<K>) {
    debug_assert!(self.is_enabled());
    self.metrics.add_memory_used(node.heap_memory_used());
Contributor

@maksymar maksymar Mar 29, 2026

(optional exploration) Temporarily, for debugging, we could add a debug assert that checks that no element in the node contains a value; if some do, we can inspect how they ended up in the cache and try to remove them if possible (as in the case of the get_min/get_max traversal).

this is not critical for this PR and can be done later.

fn take(&mut self, addr: Address) -> Option<Node<K>> {
    debug_assert!(self.is_enabled());
    let idx = self.slot_index(addr);
    if self.slots[idx].0 == addr {
Contributor

This comparison looks a bit suspicious because NULL is Address(0): if take is called with NULL, it will record a hit, which I don't think is correct. At the same time, maybe at this point it is only ever called with non-NULL values? Maybe it should have an extra non-null check.
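
To illustrate the concern, a small sketch that assumes empty slots are initialized as `(Address(0), None)`; that initialization, the hit/miss counters and both method names are assumptions, with `String` standing in for `Node<K>`. It contrasts the bare address comparison with a variant that rejects NULL and only counts a hit when a node is actually present.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Address(u64);

const NULL: Address = Address(0);

struct Cache {
    // Assumption: empty slots start out as (NULL, None).
    slots: Vec<(Address, Option<String>)>,
    hits: u64,
    misses: u64,
}

impl Cache {
    fn new(num_slots: usize) -> Self {
        assert!(num_slots > 0);
        Self {
            slots: (0..num_slots).map(|_| (NULL, None)).collect(),
            hits: 0,
            misses: 0,
        }
    }

    fn slot_index(&self, addr: Address) -> usize {
        (addr.0 % self.slots.len() as u64) as usize
    }

    // Bare comparison: an empty slot "matches" NULL and counts as a hit.
    fn take_unguarded(&mut self, addr: Address) -> Option<String> {
        let idx = self.slot_index(addr);
        if self.slots[idx].0 == addr {
            self.hits += 1;
            self.slots[idx].1.take()
        } else {
            self.misses += 1;
            None
        }
    }

    // Guarded variant: NULL never matches, and a hit requires a node.
    fn take_guarded(&mut self, addr: Address) -> Option<String> {
        let idx = self.slot_index(addr);
        let node = if addr != NULL && self.slots[idx].0 == addr {
            self.slots[idx].1.take()
        } else {
            None
        };
        if node.is_some() {
            self.hits += 1;
        } else {
            self.misses += 1;
        }
        node
    }
}
```

On a fresh cache, `take_unguarded(NULL)` returns `None` yet records a hit, while `take_guarded(NULL)` records a miss.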
