Optimize CSS parser performance #212

Open
jafin wants to merge 1 commit into AngleSharp:devel from jafin:perf/css-parser-optimizations-clean
jafin commented Mar 29, 2026

Prerequisites

Please make sure you can check the following two boxes:

  • I have read the CONTRIBUTING document
  • My code follows the code style of this project

Contribution Type

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue, please reference the issue id)
  • New feature (non-breaking change which adds functionality, make sure to open an associated issue first)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

Description

  • Refactor ContentFrom() to scan chars directly instead of re-invoking the full tokenizer, eliminating double-tokenization
  • Cache single-char token strings via static lookup table
  • Add ToLowerFast() to skip allocation when strings are already lowercase; use StringBuilder for custom property name concatenation
  • Replace @-rule if-else chain with dictionary dispatch; add dictionary index to CssStyleDeclaration for O(1) property lookups
  • Avoid double-array allocation in PeriodicValueConverter when all 4 values are present
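
To illustrate the static lookup table for single-char token strings, here is a minimal C# sketch. It is not the code from this PR: the class name `SingleCharStrings`, the method `Get`, and the choice of a 128-entry ASCII table are assumptions made for illustration only.

```csharp
using System;

// Hypothetical helper, not AngleSharp.Css code: keeps one shared string per
// ASCII character so single-character tokens do not need a fresh allocation.
public static class SingleCharStrings
{
    // Built once when the type is first used; 128 entries cover ASCII.
    private static readonly string[] Cache = BuildCache();

    private static string[] BuildCache()
    {
        var table = new string[128];
        for (var i = 0; i < table.Length; i++)
        {
            table[i] = ((char)i).ToString();
        }
        return table;
    }

    // Returns the shared instance for ASCII input; anything outside the
    // table falls back to a regular char-to-string conversion.
    public static string Get(char c)
    {
        return c < Cache.Length ? Cache[c] : c.ToString();
    }
}
```

With a table like this, repeated delimiters such as `;`, `{`, or `:` resolve to the same string instance on every call, which is presumably where the allocation savings come from.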

| CSS File | Before | After | Speedup | Mem Before | Mem After | Mem Saved |
|---|---|---|---|---|---|---|
| cdnjs.cloudflare | 2.21 ms | 1.95 ms | 1.13x | 1.99 MB | 1.81 MB | 9% |
| csszengarden | 2.35 ms | 1.82 ms | 1.29x | 1.84 MB | 1.63 MB | 11% |
| florian-rappl | 4.73 ms | 4.19 ms | 1.13x | 3.47 MB | 3.34 MB | 4% |
| maxcdn.bootstrapcdn | 8.66 ms | 7.69 ms | 1.13x | 6.69 MB | 5.72 MB | 15% |
| s.yimg | 2.02 ms | 1.91 ms | 1.06x | 2.83 MB | 2.50 MB | 12% |
| static.licdn | 2.28 ms | 2.09 ms | 1.09x | 2.27 MB | 2.03 MB | 11% |
| style.aliunicorn | 1.59 ms | 1.54 ms | 1.03x | 2.17 MB | 2.04 MB | 6% |
| z-ecx.images-amazon | 8.46 ms | 8.06 ms | 1.05x | 6.85 MB | 6.05 MB | 12% |

AngleSharp vs ExCss comparison (after optimizations)

| CSS File | AngleSharp | ExCss | Ratio | Memory Ratio |
|---|---|---|---|---|
| cdnjs.cloudflare | 1.95 ms | 2.24 ms | 0.87x | 2.00x less |
| csszengarden | 1.82 ms | 1.43 ms | 1.28x | 1.35x less |
| florian-rappl | 4.19 ms | 11.57 ms | 0.36x | 4.26x less |
| maxcdn.bootstrapcdn | 7.69 ms | 9.56 ms | 0.80x | 2.03x less |
| s.yimg | 1.91 ms | 11.25 ms | 0.17x | 5.63x less |
| static.licdn | 2.09 ms | 6.10 ms | 0.34x | 3.89x less |
| style.aliunicorn | 1.54 ms | 10.76 ms | 0.14x | 6.83x less |
| z-ecx.images-amazon | 8.06 ms | 10.45 ms | 0.77x | 2.09x less |

Summary:
AngleSharp is now faster than ExCss on 7 of 8 benchmark files (csszengarden is the exception) and allocates less memory on all of them, from roughly 1.4× to 6.8× less.

- Rewrite ContentFrom() to scan chars directly instead of
  re-invoking the full tokenizer, eliminating double-tokenization
- Cache single-char token strings via static lookup table,
  remove unused CssStringToken/CssUrlToken/CssCommentToken subclasses
- Add ToLowerFast() to skip allocation when strings are already
  lowercase, use StringBuilder for custom property name concatenation
- Replace @-rule if-else chain with dictionary dispatch, add
  dictionary index to CssStyleDeclaration for O(1) property lookups
- Avoid double-array allocation in PeriodicValueConverter when all
  4 values are present
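
The dictionary-dispatch idea for @-rules can be sketched in C# as follows. This is an assumed illustration rather than the PR's implementation: `AtRuleDispatch`, `Create`, and the string-returning handlers are hypothetical stand-ins for whatever rule factories the parser actually uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; this only demonstrates the dispatch pattern.
public static class AtRuleDispatch
{
    // One hash lookup replaces a chain of string comparisons.
    // The comparer makes rule names case-insensitive.
    private static readonly Dictionary<string, Func<string, string>> Handlers =
        new Dictionary<string, Func<string, string>>(StringComparer.OrdinalIgnoreCase)
        {
            ["media"] = prelude => "media rule: " + prelude,
            ["import"] = prelude => "import rule: " + prelude,
            ["font-face"] = prelude => "font-face rule",
        };

    // Looks up the handler for the given @-rule name and falls back to an
    // "unknown rule" result when no handler is registered.
    public static string Create(string name, string prelude)
    {
        return Handlers.TryGetValue(name, out var handler)
            ? handler(prelude)
            : "unknown rule: " + name;
    }
}
```

A call such as `AtRuleDispatch.Create("MEDIA", "screen")` resolves through a single lookup, so the cost no longer grows with the number of supported rule names the way an if-else chain does.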
@jogibear9988

ToLowerFast could maybe still be optimized further:

https://dotnetfiddle.net/ToLmEm

jafin commented Mar 29, 2026

@jogibear9988 I'll take a look and see what the benchmark says.
I'll report back with the results for the ToLower changes you've posted. If it's a win, I'll amend my PR.

jafin commented Mar 29, 2026

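| Scenario | Method | Run 1 | Run 2 | Allocated |
|---|---|---|---|---|
| Already lowercase | Original | 7.66 ns | 7.62 ns | 0 B |
| Already lowercase | Span | 4.29 ns | 4.26 ns | 0 B |
| Already lowercase | SIMD | 8.01 ns | 8.19 ns | 0 B |
| Mixed case | Original | 20.18 ns | 20.00 ns | 56 B |
| Mixed case | Span | 19.07 ns | 19.49 ns | 56 B |
| Mixed case | SIMD | 20.12 ns | 20.22 ns | 56 B |
| All uppercase | Original | 14.16 ns | 14.37 ns | 56 B |
| All uppercase | Span | 15.05 ns | 15.00 ns | 56 B |
| All uppercase | SIMD | 15.25 ns | 15.21 ns | 56 B |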

Both runs confirm the same pattern:

  • Span is ~44% faster in the already-lowercase case, which matters most because CSS identifiers are mostly lowercase
  • Mixed/uppercase cases: negligible difference (dominated by the ToLowerInvariant allocation)
  • SIMD adds overhead for these short strings
  • Zero regressions in the full test suite
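
Neither the PR's ToLowerFast() nor the code behind the linked fiddle is reproduced in this thread, so the following C# is only an assumed sketch of the general technique being compared: scan first, and call ToLowerInvariant only when the scan finds something that might change. The class and method names are hypothetical, and sending every non-ASCII character to the slow path is a simplification chosen here to keep the result identical to ToLowerInvariant.

```csharp
using System;

// Hypothetical sketch; not the PR's code and not the linked fiddle.
public static class LowercaseSketch
{
    // Index-based scan: returns the input instance untouched when it holds
    // only ASCII characters and none of them is an uppercase letter.
    public static string ToLowerFast(string value)
    {
        for (var i = 0; i < value.Length; i++)
        {
            if (NeedsSlowPath(value[i]))
            {
                return value.ToLowerInvariant();
            }
        }
        return value;
    }

    // Same logic, iterating over a ReadOnlySpan<char> view of the string.
    public static string ToLowerFastSpan(string value)
    {
        foreach (var c in value.AsSpan())
        {
            if (NeedsSlowPath(c))
            {
                return value.ToLowerInvariant();
            }
        }
        return value;
    }

    // ASCII uppercase letters must be converted; non-ASCII characters are
    // handed to ToLowerInvariant so it can decide (an assumption of this sketch).
    private static bool NeedsSlowPath(char c)
    {
        return (c >= 'A' && c <= 'Z') || c > 0x7F;
    }
}
```

In both variants the already-lowercase path returns the original string, which is consistent with the 0 B allocation reported for that scenario; the mixed-case and uppercase paths still pay for the new string that ToLowerInvariant creates.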

CSS file benchmark comparison: existing PR vs. the Span variant

| CSS File | Before (Original) | After (Span) | Delta |
|---|---|---|---|
| cdnjs.cloudflare | 42 ms | 43 ms | +1 ms |
| csszengarden | 10 ms | 11 ms | +1 ms |
| florian-rappl | 22 ms | 21 ms | -1 ms |
| maxcdn.bootstrapcdn | 32 ms | 32 ms | 0 ms |
| s.yimg | 10 ms | 9 ms | -1 ms |
| static.licdn | 9 ms | 9 ms | 0 ms |
| style.aliunicorn | 8 ms | 8 ms | 0 ms |
| z-ecx.images-amazon | 32 ms | 43 ms | +11 ms |
| Total | 167 ms | 180 ms | +13 ms |

Outcome: in the benchmark against real-world CSS files, the Span variant was marginally slower overall. Not worth changing.
