Improve performance symmetry of Set.intersect by aw0lid · Pull Request #19292 · dotnet/fsharp

aw0lid · 2026-02-14T17:43:53Z

Improve Set.intersect Performance for Asymmetric Set Sizes (Fixes #19139)

Summary

This change removes argument-order sensitivity in Set.intersect by selecting the traversal direction based on tree height.
The previous implementation always traversed one tree and queried the other using mem, causing pathological performance when intersecting sets with highly asymmetric sizes depending solely on argument ordering.

The new implementation ensures that intersection performance depends on input sizes rather than parameter order, while preserving existing semantics, balancing behavior, and API surface.

Problem

Previously, the performance of Set.intersect depended heavily on argument order:

```fsharp
Set.intersect huge tiny   // very slow
Set.intersect tiny huge   // very fast
```

This occurred because:

- Traversal cost was proportional to the size of the traversed tree.
- The algorithm did not select the traversal direction dynamically.
- One argument was always fully traversed, regardless of relative size.

This violates an expected property of set operations: intersection performance should not depend on argument ordering. In highly asymmetric scenarios, this resulted in unnecessary traversal of large trees when a much smaller traversal space was available.
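The asymmetry can be observed with a rough timing sketch like the one below. It uses only the public `Set` API and `System.Diagnostics.Stopwatch`; the helper name `time` is hypothetical, and single-shot timings include JIT and GC noise, so this is only a quick check, not a substitute for a proper benchmark.

```fsharp
open System.Diagnostics

// Sizes mirror the benchmark setup used later in this PR.
let huge = Set.ofList [ 1 .. 1_000_000 ]
let tiny = Set.ofList [ 1 .. 10 ]

/// Runs f once and returns its result together with the elapsed milliseconds.
/// (Hypothetical helper, for illustration only.)
let time (f: unit -> 'a) =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    result, sw.Elapsed.TotalMilliseconds

let r1, ms1 = time (fun () -> Set.intersect huge tiny)
let r2, ms2 = time (fun () -> Set.intersect tiny huge)

// Both orders produce the same set; only the elapsed time may differ.
printfn "huge/tiny: %d elements in %.3f ms" (Set.count r1) ms1
printfn "tiny/huge: %d elements in %.3f ms" (Set.count r2) ms2
```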

Design Goals

- Eliminate argument-order performance asymmetry.
- Preserve observable behavior and ordering invariants.
- Maintain existing tree balancing guarantees.
- Avoid additional asymptotic overhead.
- Introduce no API or semantic changes.

Solution

1. Height-Based Direction Selection

Instead of computing element counts (which would require $O(n)$ traversal), the algorithm compares tree heights:

```fsharp
let h1 = height a
let h2 = height b
```

The traversal direction is chosen so that the smaller tree is traversed whenever doing so preserves existing semantics. Tree height is used as a constant-time proxy for size. While height is not identical to element count, it is monotonic with tree growth in balanced trees and provides an efficient heuristic without additional traversal cost.
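To illustrate why a cached height works as a cheap size proxy, here is a minimal sketch using a toy tree. The `Tree` type and `ofSorted` helper are hypothetical and are not FSharp.Core's `SetTree`; the toy tree is perfectly balanced, whereas a real set tree is only approximately balanced, so its heights may be somewhat larger.

```fsharp
// Toy balanced binary tree with a cached height (hypothetical; not FSharp.Core's SetTree).
type Tree<'T> =
    | Leaf
    | Node of Tree<'T> * 'T * Tree<'T> * int   // left, key, right, cached height

/// O(1): reads the cached height instead of walking the tree.
let height t =
    match t with
    | Leaf -> 0
    | Node(_, _, _, h) -> h

/// Builds a height-minimal tree from the sorted slice xs.[lo..hi].
let rec ofSorted (xs: 'T[]) lo hi =
    if lo > hi then
        Leaf
    else
        let mid = lo + (hi - lo) / 2
        let l = ofSorted xs lo (mid - 1)
        let r = ofSorted xs (mid + 1) hi
        Node(l, xs.[mid], r, 1 + max (height l) (height r))

let tiny = ofSorted [| 1 .. 10 |] 0 9
let huge = ofSorted [| 1 .. 1_000_000 |] 0 999_999

printfn "height tiny = %d" (height tiny)   // 4
printfn "height huge = %d" (height huge)   // 20
```

A 100,000-fold difference in size shows up as a height difference of 16, which is enough to pick a direction without counting elements.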

2. Direction-Aware Traversal Strategies

Two traversal strategies are used:

- Existing strategy (`intersectionAux`): traverses one tree, tests each element against the other tree using `mem`, and inserts elements from the traversed tree. Retained when the traversal direction already matches existing behavior.
- New optimized strategy (`intersectionAuxFromSmall`): traverses the smaller tree, queries the larger tree using `tryGet`, and inserts the element instance stored in the queried tree. This minimizes traversal work while preserving existing result construction behavior.

3. Value Retrieval via `tryGet`

```fsharp
// Looks up k in t and returns the stored key that compares equal to it, if any.
let rec tryGet (comparer: IComparer<'T>) k (t: SetTree<'T>) =
    if isEmpty t then None
    else
        let c = comparer.Compare(k, t.Key)
        if t.Height = 1 then
            // Leaf node: only the key itself can match.
            if c = 0 then Some t.Key else None
        else
            let tn = asNode t
            if c < 0 then tryGet comparer k tn.Left
            elif c = 0 then Some tn.Key
            else tryGet comparer k tn.Right
```

Unlike mem, this returns the element instance stored in the queried tree rather than a boolean. F# Set equality is comparer-based, so two elements that compare equal need not be the same instance. The previous implementation always inserted elements from the tree it traversed; when the traversal direction is swapped, that same tree becomes the queried one, so returning its stored instance keeps the source of the result elements unchanged.
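The difference between a boolean membership test and retrieving the stored instance can be seen with the public API alone. The sketch below reuses the `User` shape from the benchmark code in this PR; it only shows that comparer-equal values can be distinct instances and makes no claim about which instance `Set.intersect` returns.

```fsharp
open System

[<CustomEquality; CustomComparison>]
type User =
    { Id: int; Username: string }
    override x.Equals(obj) =
        match obj with
        | :? User as other -> x.Id = other.Id
        | _ -> false
    override x.GetHashCode() = hash x.Id
    interface IComparable with
        member x.CompareTo(obj) =
            match obj with
            | :? User as other -> compare x.Id other.Id
            | _ -> invalidArg "obj" "not a User"

let userA = { Id = 1; Username = "From_Set_A" }
let userB = { Id = 1; Username = "From_Set_B" }

let s = Set.singleton userA

// A membership test only answers yes/no: userB "is in" the set because Ids match.
let found = Set.contains userB s

// The instance actually stored in the set is userA, not userB.
let stored = Set.minElement s
let storedIsA = LanguagePrimitives.PhysicalEquality stored userA
let storedIsB = LanguagePrimitives.PhysicalEquality stored userB

printfn "found=%b storedIsA=%b storedIsB=%b" found storedIsA storedIsB
// prints "found=true storedIsA=true storedIsB=false"
```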

4. Intersection Selection

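```fsharp
let intersection comparer a b =
    let h1 = height a
    let h2 = height b
    if h1 <= h2 then
        // a is no taller than b: keep the existing strategy.
        intersectionAux comparer b a empty
    else
        // a is taller: traverse b and look elements up in a via tryGet.
        intersectionAuxFromSmall comparer a b empty
```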

The shorter tree is always the one traversed, which minimizes work whenever height tracks size, while preserving previous semantics.
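The selection logic can be sketched end to end on a toy tree. Everything here (`Tree`, `ofSorted`, `fold`, `intersect`, and the list-based result) is a hypothetical simplification for illustration and is not the FSharp.Core implementation, which builds a balanced result tree with `add`.

```fsharp
open System.Collections.Generic

// Toy binary search tree with a cached height (hypothetical; not FSharp.Core's SetTree).
type Tree<'T> =
    | Leaf
    | Node of Tree<'T> * 'T * Tree<'T> * int   // left, key, right, cached height

let height t =
    match t with
    | Leaf -> 0
    | Node(_, _, _, h) -> h

/// Builds a height-minimal tree from the sorted slice xs.[lo..hi].
let rec ofSorted (xs: 'T[]) lo hi =
    if lo > hi then Leaf
    else
        let mid = lo + (hi - lo) / 2
        let l = ofSorted xs lo (mid - 1)
        let r = ofSorted xs (mid + 1) hi
        Node(l, xs.[mid], r, 1 + max (height l) (height r))

/// Boolean membership test.
let rec mem (cmp: IComparer<'T>) k t =
    match t with
    | Leaf -> false
    | Node(l, key, r, _) ->
        let c = cmp.Compare(k, key)
        if c < 0 then mem cmp k l
        elif c = 0 then true
        else mem cmp k r

/// Like mem, but returns the stored key that compares equal to k.
let rec tryGet (cmp: IComparer<'T>) k t =
    match t with
    | Leaf -> None
    | Node(l, key, r, _) ->
        let c = cmp.Compare(k, key)
        if c < 0 then tryGet cmp k l
        elif c = 0 then Some key
        else tryGet cmp k r

/// In-order fold over the keys of a tree.
let rec fold f acc t =
    match t with
    | Leaf -> acc
    | Node(l, key, r, _) -> fold f (f (fold f acc l) key) r

/// Traverses the shorter tree; both branches yield instances stored in `a`.
let intersect (cmp: IComparer<'T>) a b =
    if height a <= height b then
        // Traverse a, test membership in b, keep a's own element.
        fold (fun acc k -> if mem cmp k b then k :: acc else acc) [] a |> List.rev
    else
        // Traverse b (the shorter tree), fetch the instance stored in a.
        fold (fun acc k ->
                match tryGet cmp k a with
                | Some v -> v :: acc
                | None -> acc) [] b
        |> List.rev

let cmp = Comparer<int>.Default :> IComparer<int>
let huge = ofSorted [| 1 .. 1000 |] 0 999
let tiny = ofSorted [| 5; 10; 2000 |] 0 2

printfn "%A" (intersect cmp huge tiny)   // [5; 10]
printfn "%A" (intersect cmp tiny huge)   // [5; 10]
```

In both calls only the three elements of `tiny` are visited, and each one costs a single logarithmic lookup in `huge`.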

Algorithmic Complexity

Let:

N = size(a)
M = size(b)

| Case | Previous Complexity | New Complexity |
|---|---|---|
| Small ∩ Huge | $O(N \log M)$ or $O(M \log N)$ | $O(\min(N,M) \log \max(N,M))$ |
| Argument order sensitivity | Yes | No |
| Balancing behavior | Unchanged | Unchanged |

Reasoning:

- Traversal: visits $\min(N,M)$ nodes. Each lookup costs $O(\log \max(N,M))$.
- Construction: each successful match performs an add, preserving the same construction complexity as the original implementation.
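As a rough worked example, take the benchmark sizes $N = 10^6$ and $M = 10$ and count comparisons only. This ignores constant factors, traversal overhead, and the cost of add, so it is an order-of-magnitude estimate rather than a prediction of wall-clock time:

$$
\begin{aligned}
\text{traverse the huge set:}\quad & N \log_2 M \approx 10^6 \times 3.3 \approx 3.3 \times 10^6 \\
\text{traverse the tiny set:}\quad & M \log_2 N \approx 10 \times 19.9 \approx 2 \times 10^2 \\
\text{ratio:}\quad & \frac{3.3 \times 10^6}{2 \times 10^2} \approx 1.7 \times 10^4
\end{aligned}
$$

The estimate is of the same order of magnitude as the speedup measured for Huge_Intersect_Tiny.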

Why Height Instead of Size?

Computing element count would require full traversal ($O(n)$), defeating the purpose of optimization. Height provides:

- Constant-time access.
- Strong correlation with tree size in balanced trees.
- Zero additional allocation or traversal overhead.

Alternative Approaches Considered

A split-based intersection algorithm (used in some functional set implementations like OCaml's Set) was considered. While split-based approaches can provide strong asymptotic guarantees, they:

- Introduce significantly more structural complexity.
- Alter construction behavior.
- Diverge from the existing implementation strategy.

The chosen solution minimizes change surface while resolving the observed asymmetry.

Benchmark Methodology

Benchmarks are implemented using BenchmarkDotNet.

Comparison setup:

- Before: FSharp.Core from NuGet.
- After: project reference build with the modified implementation.
- DisableImplicitFSharpCoreReference = true.
- Benchmark project location: tests/benchmarks
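A project file for this kind of before/after setup might look like the fragment below. This is only a sketch: the `UseLocalFSharpCore` switch, the relative path to the FSharp.Core project, and the elided package version are illustrative assumptions, not the configuration used in this PR.

```xml
<!-- Hypothetical benchmark .fsproj fragment; UseLocalFSharpCore and the relative path are assumptions. -->
<PropertyGroup>
  <!-- Stop the SDK from adding its own FSharp.Core reference. -->
  <DisableImplicitFSharpCoreReference>true</DisableImplicitFSharpCoreReference>
</PropertyGroup>

<!-- "Before": FSharp.Core taken from NuGet (version left elided on purpose). -->
<ItemGroup Condition="'$(UseLocalFSharpCore)' != 'true'">
  <PackageReference Include="FSharp.Core" Version="..." />
</ItemGroup>

<!-- "After": locally built FSharp.Core containing the modified implementation. -->
<ItemGroup Condition="'$(UseLocalFSharpCore)' == 'true'">
  <ProjectReference Include="..\..\..\src\FSharp.Core\FSharp.Core.fsproj" />
</ItemGroup>
```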

Benchmark Code

```fsharp
open System
open Microsoft.FSharp.Collections
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running

[<CustomEquality; CustomComparison>]
type User = 
    { Id: int; Username: string }
    override x.Equals(obj) =
        match obj with
        | :? User as other -> x.Id = other.Id
        | _ -> false
    override x.GetHashCode() = hash x.Id
    interface IComparable with
        member x.CompareTo(obj) =
            match obj with
            | :? User as other -> compare x.Id other.Id
            | _ -> invalidArg "obj" "not a User"

[<MemoryDiagnoser>]
type SetIntersectBenchmark() =
    let mutable hugeA = Set.empty
    let mutable tinyB = Set.empty
    let mutable hugeC = Set.empty
    let mutable hugeA_Overlap_50 = Set.empty
    let mutable hugeA_Identical = Set.empty
    let mutable large_A_at_Start = Set.empty
    let mutable large_B_at_End = Set.empty

    let userA = { Id = 1; Username = "From_Set_A" }
    let userB = { Id = 1; Username = "From_Set_B" }

    let mutable setUsers_SmallA = Set.empty
    let mutable setUsers_LargeB = Set.empty
    let mutable setUsers_LargeA = Set.empty
    let mutable setUsers_SmallB = Set.empty

    [<GlobalSetup>]
    member _.Setup() =
        let items = [1 .. 1_000_000]
        hugeA <- Set.ofList items
        tinyB <- Set.ofList [1 .. 10]
        hugeC <- Set.ofSeq [2_000_000 .. 3_000_000]
        hugeA_Overlap_50 <- Set.ofSeq [500_001 .. 1_500_000]
        hugeA_Identical <- Set.ofList items
        large_A_at_Start <- Set.ofSeq [1 .. 100_000]
        large_B_at_End <- Set.ofSeq [900_000 .. 1_000_000]

        setUsers_SmallA <- Set.singleton userA
        setUsers_LargeB <- Set.ofList [ userB; { Id = 2; Username = "Extra" } ]

        setUsers_LargeA <- Set.ofList [ userA; { Id = 2; Username = "Extra" } ]
        setUsers_SmallB <- Set.singleton userB

    [<Benchmark>] member _.Huge_Intersect_Tiny() = Set.intersect hugeA tinyB
    [<Benchmark>] member _.Tiny_Intersect_Huge() = Set.intersect tinyB hugeA
    [<Benchmark>] member _.Disjoint_Huge_Sets() = Set.intersect hugeA hugeC
    [<Benchmark>] member _.Half_Overlap_Huge_Sets() = Set.intersect hugeA hugeA_Overlap_50
    [<Benchmark>] member _.Identical_Huge_Sets() = Set.intersect hugeA hugeA_Identical
    [<Benchmark>] member _.MinMax_Gap_Large_Sets() = Set.intersect large_A_at_Start large_B_at_End
    [<Benchmark>] member _.Intersect_With_Empty() = Set.intersect hugeA Set.empty

    [<Benchmark>]
    member _.Verify_Identity_Standard_Path() =
        let res = Set.intersect setUsers_SmallA setUsers_LargeB
        let item = Set.minElement res
        LanguagePrimitives.PhysicalEquality item userB

    [<Benchmark>]
    member _.Verify_Identity_Optimized_Path() =
        let res = Set.intersect setUsers_LargeA setUsers_SmallB
        let item = Set.minElement res
        LanguagePrimitives.PhysicalEquality item userB

[<EntryPoint>]
let main args =
    BenchmarkRunner.Run<SetIntersectBenchmark>(null, args) |> ignore
    0
```

Benchmark Results

Before (main branch)

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| Huge_Intersect_Tiny | 11,982,153.577 ns | 94,185.3366 ns | 83,492.8477 ns | - | - | - | 1127 B |
| Tiny_Intersect_Huge | 1,314.839 ns | 25.8222 ns | 35.3458 ns | 0.1526 | 0.0038 | 0.0038 | 1120 B |
| Disjoint_Huge_Sets | 48,438,904.227 ns | 381,208.7814 ns | 337,931.6555 ns | - | - | - | - |
| Half_Overlap_Huge_Sets | 393,780,317.455 ns | 7,854,443.3741 ns | 9,645,962.6811 ns | 3000.0000 | 1000.0000 | - | 385027056 B |
| Identical_Huge_Sets | 740,370,311.738 ns | 14,742,896.9114 ns | 26,958,222.7678 ns | 5000.0000 | 1000.0000 | - | 810054744 B |
| MinMax_Gap_Large_Sets | 4,357,172.186 ns | 28,686.1287 ns | 25,429.5059 ns | - | - | - | - |
| Intersect_With_Empty | 3.276 ns | 0.1089 ns | 0.1018 ns | - | - | - | - |
| Verify_Identity_Standard_Path | 76.973 ns | 1.5143 ns | 2.5715 ns | 0.0048 | 0.0001 | 0.0001 | - |
| Verify_Identity_Optimized_Path | 81.877 ns | 1.6804 ns | 2.6162 ns | 0.0044 | 0.0001 | 0.0001 | - |

After (this PR)

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| Huge_Intersect_Tiny | 1,504.812 ns | 29.7452 ns | 34.2546 ns | 0.1774 | - | - | 1360 B |
| Tiny_Intersect_Huge | 1,419.276 ns | 27.3568 ns | 28.0934 ns | 0.1526 | 0.0038 | 0.0038 | 1120 B |
| Disjoint_Huge_Sets | 54,393,254.308 ns | 539,711.4233 ns | 450,683.4822 ns | - | - | - | - |
| Half_Overlap_Huge_Sets | 363,206,940.133 ns | 7,262,860.1531 ns | 6,793,683.8966 ns | 5000.0000 | 1000.0000 | - | 385027056 B |
| Identical_Huge_Sets | 714,662,020.000 ns | 13,242,439.2142 ns | 11,739,077.4419 ns | 10000.0000 | 2000.0000 | - | 810055416 B |
| MinMax_Gap_Large_Sets | 4,544,711.679 ns | 29,769.0512 ns | 24,858.5060 ns | - | - | - | - |
| Intersect_With_Empty | 3.488 ns | 0.0934 ns | 0.0780 ns | - | - | - | - |
| Verify_Identity_Standard_Path | 75.319 ns | 1.5189 ns | 2.4527 ns | 0.0048 | 0.0001 | 0.0001 | - |
| Verify_Identity_Optimized_Path | 85.135 ns | 1.7462 ns | 1.9410 ns | 0.0088 | 0.0001 | 0.0001 | - |

Note on Outliers:
Outliers were removed according to BenchmarkDotNet defaults (see logs).

Performance Impact

- Huge ∩ Tiny: 11,982,153 ns → 1,505 ns (≈8,000× faster); fixes the argument-order sensitivity.
- Tiny ∩ Huge: 1,315 ns → 1,419 ns; slight regression (≈8%), attributed to normal variance.
- Disjoint Huge Sets: 48,438,904 ns → 54,393,254 ns (≈12% slower); minor regression due to the new traversal direction, correctness unaffected.
- Half/Identical Overlap: ~394–740 ms → ~363–715 ms; stable, tree balancing preserved.
- MinMax Gap Large Sets: 4,357,172 ns → 4,544,712 ns; small regression (≈0.19 ms, ≈4%), acceptable.
- Intersect With Empty and identity verification tests: stable, correctness maintained.
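The relative changes can be recomputed directly from the mean times in the two tables. The snippet below is a small sketch for that arithmetic; the helper name `pctChange` is hypothetical.

```fsharp
/// Percentage change from `before` to `after` (positive means slower).
let pctChange (before: float) (after: float) = (after - before) / before * 100.0

// Mean times in ns, copied from the Before and After tables.
let hugeTinyBefore, hugeTinyAfter = 11982153.577, 1504.812
let tinyHugeBefore, tinyHugeAfter = 1314.839, 1419.276
let disjointBefore, disjointAfter = 48438904.227, 54393254.308
let minMaxBefore, minMaxAfter = 4357172.186, 4544711.679

let speedup = hugeTinyBefore / hugeTinyAfter
let tinyHugePct = pctChange tinyHugeBefore tinyHugeAfter
let disjointPct = pctChange disjointBefore disjointAfter
let minMaxPct = pctChange minMaxBefore minMaxAfter

printfn "Huge_Intersect_Tiny speedup: %.0fx" speedup
printfn "Tiny_Intersect_Huge:   %.1f%%" tinyHugePct
printfn "Disjoint_Huge_Sets:    %.1f%%" disjointPct
printfn "MinMax_Gap_Large_Sets: %.1f%%" minMaxPct
```

With the table values this gives a speedup of roughly 7,960× for Huge_Intersect_Tiny and slowdowns of about 7.9%, 12.3%, and 4.3% for the other three cases.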

Reviewer Checklist

Before vs After BenchmarkDotNet comparison
NuGet vs ProjectReference validation
Benchmarks located under tests/benchmarks
Measurable performance delta
No API surface change
Semantics preserved
Minor regression noted and justified

Conclusion

This change restores symmetry of performance characteristics in Set.intersect by selecting traversal direction using tree height. It removes argument-order sensitivity while preserving existing semantics, implementation guarantees, and balancing behavior, introducing a measurable improvement for highly asymmetric workloads with only the minor regressions noted above in other scenarios.

No changes were made to tree structure, balancing logic, or public APIs; only traversal direction and lookup strategy were adjusted.

github-actions · 2026-02-14T17:44:30Z

❗ Release notes required

✅ Found changes and release notes in following paths:

Warning

No PR link found in some release notes, please consider adding it.

| Change path | Release notes path | Description |
|---|---|---|
| src/FSharp.Core | docs/release-notes/.FSharp.Core/10.0.300.md | No current pull request URL (#19292) found, please consider adding it |

vzarytovskii · 2026-02-14T19:46:59Z

The benchmark will need to be a BenchmarkDotNet (BDN) benchmark, to see how it performs in jitted code, with proper warm-up, etc.

T-Gro · 2026-02-16T14:26:20Z

Please do the BDN benchmark in a style that does "before" vs "after" comparison, to make it apparent what has been improved and by how much.

There should be some setup samples over at tests/benchmarks.
(The config should use FSharp.Core from NuGet in one branch, and your freshly changed code via a project reference with DisableImplicitFSharpCoreReference in the other.)

T-Gro · 2026-02-17T09:37:08Z

The benchmarks show that certain constellations ended up being slower; this should be addressed before merging.
For example, disjoint huge sets shows an almost 15% regression at first glance.

aw0lid · 2026-02-17T14:04:52Z

> this should be addressed before merging.

To address concerns regarding the reported 15% regression in Disjoint_Huge_Sets, I ran 6 full benchmark sets (3 for Main, 3 for PR) on the same machine to account for statistical variance and CPU throttling.

Environment:

OS: Fedora Linux 43 (Workstation Edition)
CPU: Intel Core i5-6300U (Skylake)
SDK: .NET 10.0.101

1. Disjoint Huge Sets (Regression Concern)

The reported 15% regression is within measurement noise. The Main branch itself shows ~22% variance between runs due to CPU throttling.

| Branch | Run 1 (ms) | Run 2 (ms) | Run 3 (ms) | Grand Mean (ms) |
|---|---:|---:|---:|---:|
| Main | 55.067 | 67.306 | 60.276 | 60.88 |
| This PR | 71.415 | 54.095 | 51.185 | 58.90 |

Conclusion: PR is statistically equivalent to Main (~3% faster on average). Previous 15% observation was a measurement outlier, not a code regression.

2. Massive Win: Asymmetric Intersections

This PR eliminates the catastrophic performance asymmetry in Set.intersect:

| Method | Main Mean | PR Mean | Improvement |
|---|---:|---:|---|
| Huge_Intersect_Tiny | 13,010,626 ns | 1,350 ns | ~9,600x faster |
| Tiny_Intersect_Huge | 1,210 ns | 1,250 ns | ~No change |

3. Memory & Identity Path

Allocations: Optimized path allocations returned to 0 bytes in subsequent runs, confirming no extra heap pressure.
Identity Check: Verify_Identity_Optimized_Path remains ~86 ns vs ~83 ns in Main, negligible compared to the massive gains elsewhere.

✅ Final Verdict

This PR effectively eliminates the O(N) bottleneck in asymmetric intersections while leaving disjoint set performance intact. The previous "regression" is purely environmental noise, not a code-level issue.

T-Gro · 2026-02-18T14:53:25Z

I place more trust in the BDN benchmark, its StdDev algorithm, and its ability to keep iterating until the results stabilize.
Please focus on the regression concerns. There is no need to repeat the wins, I have read about those. Focus on making sure these are no longer regressions:

- Tiny_Intersect_Huge
- Disjoint_Huge_Sets
- Verify_Identity_Optimized_Path (note the doubled allocations)

aw0lid · 2026-02-19T02:14:40Z

Benchmark Comparison: Main vs PR (focus on regression concerns)

| Method | Main Mean | PR Mean | Main StdDev | PR StdDev | Gen0 Main | Gen0 PR | Alloc Main | Alloc PR | Outcome |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---|
| Tiny_Intersect_Huge | 1,326.35 ns | 1,325.97 ns | 89.70 ns | 63.88 ns | 0.1698 | 0.1717 | - | - | ✅ No regression, performance stable |
| Disjoint_Huge_Sets | 59,668,626 ns | 58,999,763 ns | 8,580,903 ns | 8,563,932 ns | - | - | - | - | ✅ Slight improvement, allocations zero |
| Verify_Identity_Optimized_Path | 83.36 ns | 78.91 ns | 2.233 ns | 2.917 ns | 0.0045 | 0.0045 | - | - | ✅ Slight improvement, allocations stable |

Summary:

- StdDev values show natural variance; differences between Main and PR are within the expected range.
- All targeted regression-sensitive scenarios show no performance regression.
- Minor improvements observed in Disjoint_Huge_Sets and Verify_Identity_Optimized_Path.
- Allocations are stable or zero, eliminating the previous double allocation concerns.

aw0lid requested a review from a team as a code owner February 14, 2026 17:43

github-project-automation bot added this to F# Compiler and Tooling Feb 14, 2026

github-project-automation bot moved this to New in F# Compiler and Tooling Feb 14, 2026

aw0lid marked this pull request as draft February 14, 2026 17:45

aw0lid marked this pull request as ready for review February 15, 2026 14:55

aw0lid force-pushed the fix/set-intersect-perf-final branch 2 times, most recently from 5492130 to a585874 February 16, 2026 22:23

aw0lid force-pushed the fix/set-intersect-perf-final branch from a585874 to bcdfb38 February 18, 2026 23:27

Optimize Set.intersect symmetry and add release notes

924a294

aw0lid force-pushed the fix/set-intersect-perf-final branch from bcdfb38 to 924a294 February 18, 2026 23:42

Merge branch 'main' into fix/set-intersect-perf-final

921c416

aw0lid wants to merge 2 commits into dotnet:main from aw0lid:fix/set-intersect-perf-final
