
Pratt parser #240

Open
dkryaklin wants to merge 13 commits into postcss:master from dkryaklin:v11/spec-first-playground

Conversation

@dkryaklin

@dkryaklin dkryaklin commented Apr 22, 2026

What

A decoupled prototype of new calc() internals under playground/pratt/

Why

v10 uses a JISON-generated LR parser and an ad-hoc reducer. It doesn't cover min() / max() / clamp(), typed division (calc(100vw / 1px)), or calc-keyword folding (pi, e, infinity). CSS Values & Units 4 §10 gives a deterministic simplification algorithm — implementing it directly makes correctness reviewable section-by-section rather than invented-and-patched.

How

Five TS modules, each mapped one-to-one to a spec section:

  • tokenizer.ts — §10.1 tokens
  • parser.ts — §10.1 grammar (Pratt + parselet registry)
  • type.ts — §10.2 calculation types
  • simplify.ts — §10.10 simplification
  • serialize.ts — §10.12 output

169 tests (parser / tokenizer / simplify / typed / opaque / serialize / WPT crib), run via pnpm test:pratt. tsx runtime, no build step.

See ROADMAP.md for more details

@dkryaklin dkryaklin changed the title from "v11 spec-first playground: Pratt parser + §10.10 simplifier" to "v11 Pratt parser" Apr 24, 2026
Collaborator

@ludofischer ludofischer left a comment


Very interesting! How much is AI generated? I am asking because I had thought of trying to use AI to generate a "hand-written" parser, but never tried it.

Collaborator

@ludofischer ludofischer left a comment


It's good that you are comparing against the csstools output, as I suppose they checked that they're fairly spec-compliant. Are you also testing against the current postcss-calc test suite? There's a lot of code here. When do you think it's going to stabilize so I can look at it more thoroughly?

Comment thread scripts/benchmark-v10.ts Outdated

async function main(): Promise<void> {
// Dynamic imports so CJS/ESM interop works under tsx.
const v10Plugin = (await import('../src/index.js')).default as () => AcceptedPlugin;
Collaborator


I find the naming (version 10? version 3?) strange. I don't think we need the dynamic import either; we can drop Node 20 compatibility if we release a major version.

Comment thread package.json
"fast-check": "^4.7.0",
"jison-gho": "0.6.1-216",
"postcss": "^8.5.10",
"prettier": "^3.8.3",
Collaborator


I think you can update TypeScript to the latest version. cssnano is also using it. You could even try TypeScript go.

Comment thread tsconfig.pratt.json Outdated
"target": "es2022",
"lib": ["es2022"],
"module": "esnext",
"moduleResolution": "bundler",
Collaborator


Why are there two tsconfig files?

@dkryaklin dkryaklin force-pushed the v11/spec-first-playground branch from e362c9e to 16c1db0 Compare May 5, 2026 11:59
@dkryaklin
Author

I am pretty much done with the coding part. I will provide more context over the weekend. Sorry for the delayed response.

@dkryaklin
Author

dkryaklin commented May 23, 2026

Quick note on the reasoning before the deeper review.

Why replace jison. The generated parser worked, but the simplifier lived inside its action rules — parsing and reduction were tangled together. Ten years of edge-case patches piled on, and the output approximated the spec rather than following it. Hard to reason about, harder to extend.

Why Pratt. A Pratt parser handles operator precedence with one short priority table instead of a long list of grammar rules (one per operator arrangement). For the calc grammar that collapses ~60 jison rules down to ~11 small handlers. The whole parser is ~250 lines you can read top-to-bottom. With parsing isolated, simplification and serialization become separate narrow modules instead of being mixed into parse actions.

Net effect: each spec section maps to one file. If you find a bug, you know within a minute where to look.
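
To make the "one short priority table" point concrete, here is a minimal sketch of the Pratt technique for plain numbers, the four arithmetic operators, and parentheses. It is not the PR's parser.ts: the names `BINDING_POWER`, `parseArith`, and `show` are hypothetical, and the toy tokenizer silently skips characters it does not recognize.

```typescript
// Minimal Pratt parser sketch. Hypothetical names; not the PR's parser.ts.

type Expr =
  | { kind: 'num'; value: number }
  | { kind: 'bin'; op: string; left: Expr; right: Expr };

// The single priority table: a higher number binds tighter.
const BINDING_POWER: Record<string, number> = { '+': 10, '-': 10, '*': 20, '/': 20 };

function tokenizeArith(src: string): string[] {
  // Toy tokenizer: numbers, operators, parentheses. Anything else is skipped.
  return src.match(/\d+(?:\.\d+)?|[-+*/()]/g) ?? [];
}

function parseArith(src: string): Expr {
  const tokens = tokenizeArith(src);
  let pos = 0;

  // Prefix position: a number or a parenthesized sub-expression.
  function parsePrefix(): Expr {
    const tok = tokens[pos++];
    if (tok === undefined) throw new Error('unexpected end of input');
    if (tok === '(') {
      const inner = parseExpr(0);
      if (tokens[pos++] !== ')') throw new Error('expected )');
      return inner;
    }
    const value = Number(tok);
    if (Number.isNaN(value)) throw new Error(`unexpected token ${tok}`);
    return { kind: 'num', value };
  }

  // Infix loop: keep consuming operators that bind tighter than minPower.
  function parseExpr(minPower: number): Expr {
    let left = parsePrefix();
    for (;;) {
      const op = tokens[pos];
      if (op === undefined) break;
      const power = BINDING_POWER[op];
      if (power === undefined || power <= minPower) break;
      pos++;
      // Passing `power` as the new minimum makes operators left-associative.
      const right = parseExpr(power);
      left = { kind: 'bin', op, left, right };
    }
    return left;
  }

  const expr = parseExpr(0);
  if (pos < tokens.length) throw new Error(`unexpected token ${tokens[pos]}`);
  return expr;
}

// Fully parenthesized rendering, handy for checking precedence.
function show(e: Expr): string {
  return e.kind === 'num' ? String(e.value) : `(${show(e.left)} ${e.op} ${show(e.right)})`;
}
```

Adding an operator in this style means adding one table entry; `show(parseArith('1 + 2 * 3'))` yields `(1 + (2 * 3))` without any per-arrangement grammar rule.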

@dkryaklin dkryaklin force-pushed the v11/spec-first-playground branch from cfcc02e to 1e8afd8 Compare May 23, 2026 17:43
@dkryaklin
Author

Top-level architecture is a pipeline. Each stage is one file under src/pratt/src/core/:

calc body string
   → tokenizer.ts → tokens
   → parser.ts → AST
   → simplify.ts (+ simplify/*.ts, one module per math function) → reduced AST
   → serialize.ts → output string

Supporting files: node.ts defines the AST shape, type.ts handles the CSS unit type system.

The PostCSS adapter at src/pratt/src/plugin/plugin.ts wraps the pipeline — walks declarations, finds calc(), feeds the body in, writes the result back. The core knows nothing about PostCSS.
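
The split between a PostCSS-agnostic core and a thin adapter can be sketched as follows. This is an illustration under assumptions, not the code in plugin.ts: `CorePipeline`, `composeStages`, and `rewriteCalc` are hypothetical names, every stage is typed as string to string only to keep the sketch self-contained, and a real adapter would likely use a proper value parser instead of `indexOf`.

```typescript
// Sketch of the adapter idea: find each calc(...) in a declaration value,
// hand its body to a core pipeline, and splice the result back in.
// Hypothetical names; not the PR's API.

type CorePipeline = (body: string) => string;

// Composes stages left to right, mirroring tokenizer → parser → simplify → serialize.
function composeStages(...stages: CorePipeline[]): CorePipeline {
  return (body) => stages.reduce((acc, stage) => stage(acc), body);
}

function rewriteCalc(value: string, run: CorePipeline): string {
  let out = '';
  let i = 0;
  while (i < value.length) {
    const start = value.indexOf('calc(', i);
    if (start === -1) break;
    // Scan forward to the parenthesis that closes this calc(.
    let depth = 0;
    let end = -1;
    for (let j = start + 4; j < value.length; j++) {
      if (value[j] === '(') {
        depth++;
      } else if (value[j] === ')') {
        depth--;
        if (depth === 0) {
          end = j;
          break;
        }
      }
    }
    if (end === -1) break; // unbalanced: leave the rest of the value untouched
    // The pipeline receives only the body and returns the full replacement text.
    out += value.slice(i, start) + run(value.slice(start + 5, end));
    i = end + 1;
  }
  return out + value.slice(i);
}
```

Because `rewriteCalc` only sees strings, the core stages can be tested without PostCSS installed, which is the property the description above relies on.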

@dkryaklin
Author

dkryaklin commented May 23, 2026

Three opt-in flags bridge spec-compliant defaults with legacy jison behavior. Each lives in exactly one stage of the pipeline:

  • strictWhitespace (tokenizer) — reject 2px+3px per §10.1; set false to accept the legacy form.
  • preserveOrder (simplifier) — keep input order instead of canonical reordering.
  • dropZeroIdentities (simplifier) — drop + 0px when other terms carry the unit; default preserves it for the type signal.

Defaults follow the spec. test/index.js opts into all three so the existing fixtures keep expressing the current version's contract.
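
A possible shape for these options is sketched below. Only the three flag names and the legacy values come from this thread; the `PrattOptions` type, the constant names, `resolveOptions`, and the `false` defaults for `preserveOrder` and `dropZeroIdentities` are assumptions inferred from the flags being opt-in.

```typescript
// Hypothetical options shape for the three flags described above.

interface PrattOptions {
  strictWhitespace: boolean;   // tokenizer: reject 2px+3px when true
  preserveOrder: boolean;      // simplifier: keep input order when true
  dropZeroIdentities: boolean; // simplifier: drop "+ 0px" when true
}

// Spec-following defaults (the last two values are assumed).
const SPEC_DEFAULTS: PrattOptions = {
  strictWhitespace: true,
  preserveOrder: false,
  dropZeroIdentities: false,
};

// The combination that test/index.js is described as opting into.
const LEGACY_FLAGS: Partial<PrattOptions> = {
  strictWhitespace: false,
  preserveOrder: true,
  dropZeroIdentities: true,
};

// User-supplied flags override the defaults one by one.
function resolveOptions(user: Partial<PrattOptions> = {}): PrattOptions {
  return { ...SPEC_DEFAULTS, ...user };
}
```

Keeping each flag a plain boolean that is read by exactly one stage makes it easy to delete a legacy flag later without touching the other stages.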

@dkryaklin
Author

dkryaklin commented May 23, 2026

Testing runs across seven layers:

  • Unit tests (~900) — src/pratt/test/unit/, one file per module in src/pratt/src/. Tokenizer, parser, serializer, every simplify fold module.
  • Property tests — src/pratt/test/property/, fast-check generates random ASTs and asserts algebraic laws (commutativity, associativity, round-trip stability).
  • Differential — every random input is also compared against @csstools/css-calc; disagreement counts as a failure (outputs canonicalized through our parser at shared precision to absorb cosmetic differences).
  • Conformance — src/pratt/test/conformance/, subsets of WPT (css-values) and the csstools test corpus.
  • Framework corpus — ~155 calc() expressions extracted from 8 CSS frameworks (Bootstrap, Bulma, Foundation, Milligram, Picnic, Semantic UI, Turret, UIkit). Polished, expert-written inputs.
  • GitHub harvest — scripts/harvest-github.ts searches public repos via gh search code with ~60 diversifying queries (works around GitHub's 1000-result-per-query cap), pulls 5,704 CSS/SCSS/LESS files and extracts 23,581 raw calc() bodies, split into:
    • 21,260 valid CSS — fed to the pipeline and compared against @csstools/css-calc.
    • 2,228 preprocessor-only (Sass/Less interpolation like calc(#{$x} * 2)) — must error cleanly, not crash.
    • 93 outright malformed — also fed in to verify the parser fails gracefully.
  • Legacy regression — test/index.js, all 190 existing fixtures pass with the three legacy flags enabled.

Plus mutation testing via Stryker (must kill ≥85% of injected mutations) and quality gates: lint, typecheck, depcruise for module boundaries, type-coverage ≥99.5%, knip for dead code.
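
The differential layer depends on comparing two engines' outputs after cosmetic differences are absorbed. A rough sketch of that comparison follows. The PR is described as canonicalizing by re-parsing both outputs; this stand-in uses a regular expression instead, and `canonicalize` and `agree` are hypothetical helpers.

```typescript
// Sketch of a differential comparison at shared precision.
// Stand-in technique: regex rounding rather than re-parsing.

function canonicalize(output: string, precision = 5): string {
  return output
    // Round every numeric literal to the shared precision and drop trailing zeros.
    .replace(/\d*\.\d+|\d+/g, (num) => String(Number(Number(num).toFixed(precision))))
    // Collapse whitespace runs and ignore unit case.
    .replace(/\s+/g, ' ')
    .trim()
    .toLowerCase();
}

// True when both outputs are equal after canonicalization.
function agree(ours: string, oracle: string, precision = 5): boolean {
  return canonicalize(ours, precision) === canonicalize(oracle, precision);
}
```

With this helper, `33.333333%` and `33.33333%` count as agreement, while `3px` versus `4px` is still reported as a divergence to audit.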

@dkryaklin
Author

dkryaklin commented May 23, 2026

Five support scripts live under scripts/:

  • harvest-github.ts — pulls calc() expressions from public GitHub repos via gh search code. Source of the 21k+ real-world corpus.
  • split-corpus.ts — splits the raw harvest into three buckets (pure CSS / preprocessor / malformed) that drive the three corpus test layers.
  • show-divergences.ts — buckets harvest divergences against @csstools/css-calc so each one can be audited: real bug, or known design choice?
  • randomizer.ts — long-running differential fuzzer. Hammers random inputs against csstools and logs disagreements to reports/randomizer-finds.jsonl. Catches edge cases fast-check misses on a single CI run.
  • benchmark.ts — pratt vs @csstools/css-calc perf on the harvested corpus.
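
The three-bucket split can be pictured with a small classifier. The heuristics below are assumptions chosen for illustration (Sass interpolation and variables, Less variables and escapes, parenthesis balance); the actual rules in split-corpus.ts are not shown in this thread and may differ. `Bucket`, `hasBalancedParens`, and `classifyCalcBody` are hypothetical names.

```typescript
// Sketch of three-way bucketing for harvested calc() bodies.
// Heuristics are illustrative assumptions, not the PR's rules.

type Bucket = 'css' | 'preprocessor' | 'malformed';

function hasBalancedParens(body: string): boolean {
  let depth = 0;
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth--;
      if (depth < 0) return false; // closing paren with nothing open
    }
  }
  return depth === 0;
}

function classifyCalcBody(body: string): Bucket {
  // Sass interpolation or $variables, Less @{...}, @variables, or ~"..." escapes.
  if (/#\{|\$[\w-]+|@\{|@[\w-]+|~["']/.test(body)) return 'preprocessor';
  if (body.trim() === '' || !hasBalancedParens(body)) return 'malformed';
  return 'css';
}
```

The preprocessor check runs first so that an interpolated body with odd parentheses still lands in the preprocessor bucket rather than the malformed one.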

@dkryaklin dkryaklin force-pushed the v11/spec-first-playground branch from 8e07933 to 8a96a40 Compare May 23, 2026 18:28
@dkryaklin
Author

dkryaklin commented May 23, 2026

Performance. Two comparisons on the harvested corpus.

Raw pipeline (tokenize → simplify → serialize, no PostCSS overhead) over 21,260 expressions:

pratt     3.76 µs/expr     accepted 21,260   threw 0
csstools  4.94 µs/expr     accepted 21,260   threw 0

End-to-end PostCSS run (full plugin, 100 iterations × 21,416 declarations):

legacy jison   157.29 ms / run
new pratt      155.25 ms / run
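
For readers who want to reproduce a per-expression figure like the ones above, a measurement loop might look like this. It is a sketch, not the PR's benchmark.ts: `BenchResult` and `benchmarkPerExpr` are hypothetical, there is no warm-up pass, and the result depends on the machine and runtime.

```typescript
// Sketch of a µs/expr measurement over a corpus of calc() bodies.

interface BenchResult {
  microsPerExpr: number;
  accepted: number;
  threw: number;
}

function benchmarkPerExpr(
  corpus: readonly string[],
  run: (body: string) => string,
  iterations = 10,
): BenchResult {
  let accepted = 0;
  let threw = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    for (const body of corpus) {
      try {
        run(body);
        if (i === 0) accepted++; // count outcomes once, on the first pass
      } catch {
        if (i === 0) threw++;
      }
    }
  }
  const elapsedMs = performance.now() - start;
  const calls = iterations * corpus.length;
  // Average wall-clock time per call, converted from milliseconds to microseconds.
  return { microsPerExpr: calls === 0 ? 0 : (elapsedMs * 1000) / calls, accepted, threw };
}
```

Averaging over several iterations smooths out timer resolution, which matters when a single call takes only a few microseconds.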

@dkryaklin dkryaklin changed the title from "v11 Pratt parser" to "Pratt parser" May 23, 2026
@dkryaklin dkryaklin marked this pull request as ready for review May 23, 2026 18:44
@dkryaklin dkryaklin requested a review from ludofischer May 23, 2026 18:44
@dkryaklin
Author

dkryaklin commented May 23, 2026

Scaffolding.

  • TypeScript-first source with strict + noUncheckedIndexedAccess. Three tsconfigs separate linting from emit: tsconfig.base.json is shared by a .json config for lint/IDE and a .build.json config for emit.
  • tsx runtime — tests and scripts run .ts directly with no pre-build step.
  • Quality gates under npm run quality:
    • eslint (flat config, @typescript-eslint type-aware + eslint-plugin-sonarjs)
    • dependency-cruiser (.dependency-cruiser.cjs) — keeps core/ PostCSS-free, blocks test→prod imports
    • jscpd — copy-paste detector across the per-function fold modules under simplify/
    • knip — dead code
    • type-coverage ≥99.5% — measures leaks of the any type
  • Stryker mutation testing (stryker.conf.mjs) — mutates simplify/serialize/node, ≥85% kill threshold.
  • Package boundary — exports field in package.json gates the public API to one default import; prebuild: "rm -rf dist" for clean emits.
  • Deps — added @csstools/css-calc (differential oracle), fast-check (property tests). Dropped postcss-selector-parser (dead after the rewrite).
  • Node 22+ in engines (Node 20 compat dropped per earlier review).

@dkryaklin
Author

dkryaklin commented May 23, 2026

Open-issues sweep.

Fixed by the rewrite (parser errors or unsupported syntax that now work cleanly):

  • #77 — nested calc() inside a var() fallback
  • #104 — 3+ variable fallbacks
  • #117 — calc(min(max(var(--foo), 0), 100))
  • #130 — calc(1 * clamp(1, ((1*1)*1), 1))
  • #132 — calc(120rpx - 41.7rpx) (floating + unknown unit)
  • #142 — calc(var(--x) * -1) (negative multiplication)
  • #190 — max(var(1, var(2, 3)), var(4, var(5, 6))) * 1
  • #198 — meta "better parser" request — this PR is the answer
  • #216 — Color level 5 relative colors flow through opaquely; inner calc() simplifies
  • #233 — calc(l * 0.7) inside oklch(from …)
  • #236 — lvh/lvw no longer throw UNKNOWN_DIMENSION

Newly supported as features:

  • #189 — min() / max() reduce the same way as calc(). min(360px, 100% - 24px - 24px) → min(360px, 100% - 48px).
  • #222 — pow() reduces. calc(1rem * pow(1.618, 3)) → 4.2358rem.

Improved (closer to the reporter's intent, not byte-identical):

  • #238 — calc(100vh - calc(100vh - 100%)) no longer collapses to 100%. New default output: calc(0vh + 100%) — preserves the length-percentage typing the reporter wanted. Not the literal input shape, but the underlying complaint is resolved.

Won't fix / design call:

  • #144 — 100% / 3 reduces to 33.33333%; spec-aligned. Could be put behind an opt-out option if needed.
  • #62 — allowRounding option request. Not implemented; reasonable to add later.
  • #67 — Browserslist support is out of scope (cssnano territory).
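
Two of the outputs in this sweep (min(360px, 100% - 48px) for #189 and calc(0vh + 100%) for #238) come down to summing like terms per unit while keeping zero-valued units. A minimal sketch of that folding is below. It assumes a flat sum of already-signed terms, keeps units in input order rather than any canonical order, and uses hypothetical names (`Term`, `foldSum`, `serializeSum`); it is not the PR's simplify.ts.

```typescript
// Sketch of like-term folding for a flat sum of signed terms.

interface Term {
  value: number;
  unit: string; // '' would mean a plain number
}

function foldSum(terms: readonly Term[]): Term[] {
  const totals = new Map<string, number>();
  for (const { value, unit } of terms) {
    totals.set(unit, (totals.get(unit) ?? 0) + value);
  }
  // Map keeps first-insertion order, so units appear in input order.
  // Units that sum to zero are kept on purpose: they carry the type signal.
  return Array.from(totals, ([unit, value]) => ({ value, unit }));
}

function serializeSum(terms: readonly Term[]): string {
  const parts = terms.map((t, i) => {
    const magnitude = `${Math.abs(t.value)}${t.unit}`;
    if (i === 0) return t.value < 0 ? `-${magnitude}` : magnitude;
    return t.value < 0 ? `- ${magnitude}` : `+ ${magnitude}`;
  });
  return parts.join(' ');
}
```

Flattening 100vh - (100vh - 100%) into the terms 100vh, -100vh, +100% and folding gives `0vh + 100%`, which keeps the vh term visible instead of collapsing to a bare percentage.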

@dkryaklin
Author

test/index.js diff (the legacy regression suite):

  • Input fixtures unchanged. Every calc(...) test input is identical to master.
  • Header wraps the plugin with three legacy flags — strictWhitespace: false, preserveOrder: true, dropZeroIdentities: true — so the file expresses the current version's contract against the new pipeline.
  • ~39 expected outputs changed. Each has a one-line inline annotation explaining the change (spec-style spaces, /2 → *0.5 reciprocal, fold pi, calc unwrap §10.6, unit case lowercased, etc.) and keeps the previous expected value as /* legacy */ new so the diff is readable in place.
  • 3 previously-skipped comments tests re-enabled — the new pipeline strips CSS comments cleanly.
  • testThrows helper removed. Three tests no longer throw under the new spec rules and now use testValue:
    • calc(500px/0) → calc(infinity * 1px) (§10.13)
    • calc(500px/2px) → 250 (dim/dim → number)
    • calc(10pc + unknown) → preserved opaque (no lex error)

All 190 fixtures pass.
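
The first two no-longer-throwing cases can be illustrated with a small division helper. This is a sketch under assumptions: it only handles a divisor that is unitless or has exactly the same unit as the dividend, it skips unit conversion entirely, and `Dim` and `divide` are hypothetical names rather than anything from type.ts or simplify.ts.

```typescript
// Sketch of the division outcomes listed above.

interface Dim {
  value: number;
  unit: string; // '' means a plain number
}

function divide(a: Dim, b: Dim): string {
  if (b.unit !== '' && b.unit !== a.unit) {
    throw new Error('sketch only handles same-unit or unitless divisors');
  }
  // dimension / same dimension gives a plain number; otherwise the unit of `a` survives.
  const unit = b.unit === a.unit ? '' : a.unit;
  const value = a.value / b.value;
  if (!Number.isFinite(value)) {
    // Degenerate results serialize with calc keywords instead of throwing.
    const keyword = Number.isNaN(value) ? 'NaN' : value < 0 ? '-infinity' : 'infinity';
    return unit === '' ? `calc(${keyword})` : `calc(${keyword} * 1${unit})`;
  }
  return `${value}${unit}`;
}
```

For example, dividing 500px by a unitless 0 yields `calc(infinity * 1px)`, and dividing 500px by 2px yields `250`, matching the two expected values in the list.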

@dkryaklin
Author

The ~39 changed-output cases cluster into seven categories, all spec-aligned:

  • Spec-style operator spacing (~15 cases). The serializer emits spaces around the * and / binary operators per §10.12 canonical form. Old: 100px*var(--x). New: 100px * var(--x). Semantics unchanged.

  • Reciprocal conversion (~8 cases). Division by a numeric constant rewrites to multiplication by its reciprocal — mathematically equivalent; simplifies downstream folding. Old: var(--x) / 2. New: var(--x) * 0.5.

  • Constant folding on literals (~5 cases). Compile-time arithmetic on numeric literals collapses. Old: (var(--a) + 4px) * 2 * 2. New: (var(--a) + 4px) * 4.

  • Math constant folding (3 cases, §10.7.1). pi and e reduce to numeric values. Old: calc(43 + pi) kept symbolic. New: 46.14159.

  • Calc unwrap (~5 cases, §10.6). A calc() containing exactly one value drops the wrapper. Old: calc(var(--foo)) preserved. New: var(--foo). Same for constant(…), env(…), unknown opaque calls.

  • Unit case lowercased (2 cases). CSS units are case-insensitive; the spec recommends lowercase. Old: 2PX. New: 2px. Same for Q → q.

  • Same-unit arithmetic on unknown units (1 case). When both operands share an unrecognized unit, the math still applies. Old: 1unknown + 2unknown kept opaque. New: 3unknown (matches @csstools/css-calc).
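
The reciprocal-conversion and constant-folding categories can be shown together in a few lines. The helper below is a hypothetical illustration (`Factor` and `foldFactors` are not names from the PR): it treats the non-numeric operand as an opaque string and does not handle division by zero.

```typescript
// Sketch of two rewrites from the list above: division by a constant becomes
// multiplication by its reciprocal, and adjacent numeric factors fold into one.

type Factor = { op: '*' | '/'; n: number };

function foldFactors(operand: string, factors: readonly Factor[]): string {
  let k = 1;
  for (const { op, n } of factors) {
    k *= op === '*' ? n : 1 / n; // "/ n" is rewritten as "* (1 / n)"
  }
  // A combined factor of exactly 1 is the identity and can be dropped.
  return k === 1 ? operand : `${operand} * ${k}`;
}
```

Applied to the examples above, `(var(--a) + 4px)` with factors `* 2 * 2` becomes `(var(--a) + 4px) * 4`, and `var(--x)` with `/ 2` becomes `var(--x) * 0.5`.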

@dkryaklin
Author

Closed-issues sanity check. Ran 48 testable repros from the 82 closed issues through the new pipeline (skipped non-bug entries — release notes, README links, etc.):

  • 47 process cleanly. Every historical bug with a concrete calc(...) reduces without throwing.
  • 1 warn: #8 (calc(2 ^ 3)) — feature request for ^ as exponent. CSS never adopted ^; our parser correctly warns about the invalid character. The spec answer is pow(2, 3), which we support.

Notable historical bugs now producing correct results:

  • #41 calc(100vw - (100vw - 100%)) → calc(0vw + 100%) (preserves length+% typing; was lossy → 100%)
  • #88 calc(0px - (100vw - 10px) / 2) → calc(5px - 50vw)
  • #68 calc(10px/16px) → 0.625 (unitless, §10.9 dim/dim rule)
  • #92 calc(2px * 0) → 0px (zero preserves type signal per WPT)
  • #50 var(--tooltip-calculated-offset) — variable names containing "calc" work

No regressions, no unexpected throws.
