Skip to content

perf(chain): refactor TxGraph.spends from BTreeMap to HashMap#2145

Open
shubham-shinde-442 wants to merge 1 commit intobitcoindevkit:masterfrom
shubham-shinde-442:perf/spends-hashmap-optimization
Open

perf(chain): refactor TxGraph.spends from BTreeMap to HashMap#2145
shubham-shinde-442 wants to merge 1 commit intobitcoindevkit:masterfrom
shubham-shinde-442:perf/spends-hashmap-optimization

Conversation

@shubham-shinde-442
Copy link

Resolves: #2026

Description

Refactors TxGraph.spends field from BTreeMap to HashMap for improved
performance while maintaining all existing behavioral guarantees.

Problem:
The spends field was a BTreeMap<OutPoint, HashSet<Txid>>, providing O(log n)
lookups. For transaction graph traversal this is unnecessary overhead.

Solution:

  • Switch to HashMap<OutPoint, HashSet<Txid>> for O(1) average-case lookups
  • Add secondary index spend_vouts_by_txid: HashMap<Txid, BTreeSet<u32>> to:
    • Provide O(1) lookup of all vouts spent by a given txid
    • Preserve deterministic vout iteration order (required by tx_spends() and
      descendant walking)
  • Update three methods to use the new index instead of range queries

Notes to the reviewers

Changelog notice

Checklists

All Submissions:

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@evanlinjin
Copy link
Member

Does the performance improvement justify the added complexity?

@lucasdbr05
Copy link

I reviewed the implementation, and indeed the tx_spends function had a complexity improvement from O(log(n) + k) to O(k) (where k is the number of spent vouts for the txid), in addition to the direct performance improvements from switching from a BTreeMap to a HashMap (insertion and lookup in amortized O(1)).

Furthermore, in terms of developer experience, I believe the performance improvement justifies the change made, as it keeps spends and spend_vouts_by_txid synchronized.

@evanlinjin
Copy link
Member

evanlinjin commented Mar 14, 2026

Thanks @lucasdbr05 for the review — a few pushbacks though.

The O(log(n) + k) → O(k) improvement for tx_spends is real, but the framing
obscures what was actually traded away. The original BTreeMap gave us range queries for free as a consequence of OutPoint's natural sort order. This PR recovers that same capability by introducing a secondary index that must be kept manually synchronized. The complexity didn't disappear — it moved from the data structure into our invariant maintenance.

Also worth noting: the reviewer's conclusion that "it keeps spends and spend_vouts_by_txid synchronized" is describing the requirement, not the guarantee. There's nothing enforcing that synchronization except the correctness of the insertion code.

A few remaining concerns before this can move forward:

  1. No benchmarks. The asymptotic argument only holds at scale. For realistic wallet graphs the constant factors of hashing OutPoint and Txid twice per lookup may dominate. Real numbers would settle this.

  2. DoubleEndedIterator regression. The original tx_spends() returned impl DoubleEndedIterator via BTreeMap::range. The new implementation uses flat_map, which doesn't preserve that bound. Was this intentional? If the return type was quietly downgraded it's a breaking change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

The TxGraph spends field does not need to be a BTreeMap, could be a HashMap instead.

3 participants