Add x64 backend #1

Merged
StachuDotNet merged 23 commits into main from x64 on Apr 10, 2026

@StachuDotNet StachuDotNet commented Apr 9, 2026

Add x86_64 native backend

Adds a complete x86_64 (Intel/AMD) backend to the Dark compiler alongside the existing ARM64 backend. The compiler auto-detects the host CPU and emits the correct native code — same Dark source, same test suite, two architectures.

Shared work

  • ARM64 files moved to passes/arm64/ and new work lives in passes/x64/ for symmetry.
  • PlatformTypes.fs — extracted OS/Arch DU for compile-order availability.
  • ArchConfig.fs — per-architecture calling convention configs.
  • Platform.fs — x86_64 syscall numbers, architecture detection.
  • 5_RegisterAllocation.fs — now takes an Arch DU parameter instead of runtime CPU checks, so the register allocator can be driven by configuration rather than host detection.
  • CompilerLibrary.fs — dual-architecture compilation pipeline.
  • R11 scratch aliasing. x86_64 has fewer GP registers than ARM64 — LIR registers X8-X17 all map to R11 (scratch). Codegen uses SaveRegs/RestoreRegs patterns when multiple scratch values are needed simultaneously.
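
The aliasing above can be modeled in a few lines. This is a minimal sketch, not the code in 6_CodeGen.fs: the X0-X7 targets are taken from the codegen commit message in this PR, while the names `map_lir_register` and `needs_save_restore` are hypothetical and exist only for illustration.

```python
# X0-X7 targets as listed in the codegen commit message of this PR.
X0_TO_X7 = ["RAX", "RDI", "RSI", "RCX", "R8", "R9", "R10", "RDX"]


def map_lir_register(n: int) -> str:
    """Return the x86_64 register that LIR register Xn lowers to (hypothetical helper)."""
    if 0 <= n <= 7:
        return X0_TO_X7[n]
    if 8 <= n <= 17:
        # All ten LIR scratch registers share a single physical register.
        return "R11"
    raise ValueError(f"X{n} is outside the range covered by this sketch")


def needs_save_restore(lir_regs: list[int]) -> bool:
    """True when an instruction uses two or more distinct X8-X17 registers.

    In that case the values cannot all live in R11 at once, which is the
    situation the SaveRegs/RestoreRegs pattern is described as handling.
    """
    scratch = {n for n in lir_regs if 8 <= n <= 17}
    return len(scratch) > 1
```

For instance, `needs_save_restore([8, 9])` is true because X8 and X9 would both land in R11, while `needs_save_restore([8, 1])` is false.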

New x86_64 passes (passes/x64/)

  • 6_CodeGen.fs — LIR to x86_64 instruction translation (3,065 lines)
  • 7_Encoding.fs — instruction to machine code bytes (646 lines)
  • 7_Resolve.fs — jump label resolution (132 lines)
  • 8_Binary_Generation_ELF.fs — ELF executable output (110 lines)
  • X86_64.fs — instruction type definitions (registers, conditions, ~47 instruction variants)

Tests

4 new test files (encoding, codegen, resolve, binary) with 556 lines of x86_64-specific unit tests.

Tooling

Diagnostic scripts (debug-stack.sh, debug-x86-crash.sh, disasm-func.sh, dump-lir-func.sh).

Test results

4,529 / 4,530 (99.98%)

1 known failure: memReclaimBurn — requires reference counting (heap memory reclamation). The RC helpers are implemented but disabled (enabling them currently causes 37 test regressions in crypto/other areas). See docs/x64-refcounting.md.

The hard part

The bulk of this work was handling x86_64's two-operand instruction format. ARM64 is three-operand (ADD X0, X1, X2 = "X0 = X1 + X2"), but x86_64 is two-operand (ADD RAX, RCX = "RAX += RCX"). When the LIR says dest = left OP right and dest == right, a naive MOV dest, left; OP dest, right clobbers right before reading it. Every binary operation needed careful handling:

  • Commutative ops (Add, Mul, And, Or, Xor): swap operands
  • Non-commutative ops (Sub, Div): use scratch/temp register
  • Float non-commutative (FSub, FDiv): use XMM15 as temp
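
The three cases above can be sketched as a small lowering function. This is an illustrative model only, assuming integer operations and assuming the temp register (R11 here) is not itself one of the operands; `lower_binop` and the tuple instruction format are hypothetical and do not reflect the actual F# code. Division is left out because IDIV has its own fixed-register rules.

```python
COMMUTATIVE = {"ADD", "IMUL", "AND", "OR", "XOR"}


def lower_binop(op, dest, left, right, temp="R11"):
    """Lower `dest = left OP right` into two-operand pseudo-instructions.

    Each instruction is a tuple (mnemonic, destination, source), where the
    destination is also the first input, as in x86_64.
    """
    if dest == left:
        # Already in two-operand shape: dest OP= right.
        return [(op, dest, right)]
    if dest != right:
        # No conflict: copy left into dest, then apply the operation.
        return [("MOV", dest, left), (op, dest, right)]
    # dest == right: MOV dest, left would destroy right before it is read.
    if op in COMMUTATIVE:
        # left OP right == right OP left, so operate on dest directly.
        return [(op, dest, left)]
    # Non-commutative: park right in a temp before overwriting dest.
    return [("MOV", temp, right), ("MOV", dest, left), (op, dest, temp)]
```

With `RCX = RAX - RCX`, the sketch yields `MOV R11, RCX; MOV RCX, RAX; SUB RCX, R11`, whereas `RCX = RAX + RCX` collapses to a single `ADD RCX, RAX`. The float case follows the same shape with XMM15 as the temp.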

Benchmarks: ARM64 vs x86_64

Same run_benchmarks.sh cachegrind pipeline on both arches. ARM numbers are from paul @ 453aaafe (2026-03-05); x64 numbers are from major (Threadripper 3960X) @ 518d00e2 (2026-04-10), so some delta is commit drift, not pure codegen. edigits, fasta, mandelbrot failed to run on x64 and are excluded; reduced-size benchmarks are skipped.

| Benchmark | ARM64 (paul) | x64 (major) | x64 / ARM |
|---|---:|---:|---:|
| ackermann | 11,450,298,027 | 12,881,626,223 | 1.13× |
| binary_trees | 154,007,725 | 160,561,244 | 1.04× |
| collatz | 81,441,905 | 113,644,282 | 1.40× |
| factorial | 4,420,203 | 4,790,258 | 1.08× |
| fib | 686,796,263 | 746,517,690 | 1.09× |
| leibniz | 1,200,000,148 | 1,800,000,168 | 1.50× |
| merkletrees | 733,993,597 | 704,505,266 | 0.96× |
| nqueen | 724,301,430 | 936,840,115 | 1.29× |
| pisum | 65,014,671 | 70,012,708 | 1.08× |
| primes | 5,443,919 | 8,563,311 | 1.57× |
| sum_to_n | 7,002,526 | 8,002,248 | 1.14× |
| tak | 635,804,177 | 604,637,268 | 0.95× |

Geometric mean x64/ARM ≈ 1.15× (x64 emits ~15% more instructions on average). tak and merkletrees are slightly better on x64; primes, leibniz, collatz, and nqueen are the clearest regressions and the best targets for x64 codegen tuning. Aggregate Dark-vs-Rust speedup in RESULTS.md moved from 3.99× (ARM) to 4.57× (x64).
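
For anyone who wants to re-derive the summary figure, the per-benchmark ratios and their geometric mean can be recomputed from the table rows. The script below is a standalone sketch and is not part of run_benchmarks.sh; the variable names are made up for illustration.

```python
import math

# (benchmark, ARM64 instruction count, x64 instruction count), copied from the table.
ROWS = [
    ("ackermann", 11_450_298_027, 12_881_626_223),
    ("binary_trees", 154_007_725, 160_561_244),
    ("collatz", 81_441_905, 113_644_282),
    ("factorial", 4_420_203, 4_790_258),
    ("fib", 686_796_263, 746_517_690),
    ("leibniz", 1_200_000_148, 1_800_000_168),
    ("merkletrees", 733_993_597, 704_505_266),
    ("nqueen", 724_301_430, 936_840_115),
    ("pisum", 65_014_671, 70_012_708),
    ("primes", 5_443_919, 8_563_311),
    ("sum_to_n", 7_002_526, 8_002_248),
    ("tak", 635_804_177, 604_637_268),
]


def geometric_mean(values):
    """Geometric mean computed via logs to avoid overflow on long products."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: x64 / arm for name, arm, x64 in ROWS}
gmean = geometric_mean(ratios.values())
worst = max(ratios, key=ratios.get)

print(f"geometric mean x64/ARM: {gmean:.2f}")
print(f"largest ratio: {worst} ({ratios[worst]:.2f})")
```

With the twelve rows as listed, the geometric mean comes out at roughly 1.17, in the same range as the figure quoted above, and primes has the largest ratio.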

Known gaps

  • Encoding test coverage: 11 of 47 instruction types have byte-level unit tests. The remaining 36 are covered by E2E tests but lack encoding-specific regression tests.
  • Reference counting: x86_64 emits no RC instructions. Programs work because the 512MB heap suffices for nearly all tests.
  • No Mach-O output for x86_64 (Linux ELF only). macOS x86_64 is effectively EOL.

StachuDotNet and others added 22 commits April 8, 2026 21:03
- PlatformTypes.fs: OS/Arch DUs available early in compile order
- Platform.fs: x86_64 syscall numbers, architecture detection
- ArchConfig.fs: per-architecture calling convention configs
- DarkCompiler.fsproj: include new files, arm64/x64 directory structure
- Binary_ELF.fs: EM_X86_64 constant
- Dockerfile: add enscript/ghostscript for PDF code review
- devcontainer.json: VS Code devcontainer configuration
- .gitignore: scope x64/x86 patterns to repo root only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- X86_64.fs: instruction DU with ~47 variants (MOV, ADD, SSE2, etc.),
  registers (RAX-R15, XMM0-XMM15), conditions, and sizes
- 7_Encoding.fs: instruction-to-bytes encoder handling REX prefixes,
  ModRM/SIB bytes, and variable-length x86_64 encoding
- 7_Resolve.fs: two-pass label resolution for jump targets
- 8_Binary_Generation_ELF.fs: x86_64 Linux ELF executable output
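
As a rough illustration of two-pass label resolution, the sketch below assumes every jump is a fixed-size near `JMP rel32` (opcode `E9` followed by a 32-bit displacement measured from the end of the instruction). The item format and the function name are hypothetical; 7_Resolve.fs may size and encode jumps differently.

```python
JMP_SIZE = 5  # E9 opcode byte + 4-byte signed displacement


def resolve_labels(items):
    """Resolve jumps in a list of ("label", name), ("jmp", name) or ("bytes", raw) items."""
    # Pass 1: walk the stream once to record the byte offset of every label.
    offsets = {}
    pc = 0
    for kind, arg in items:
        if kind == "label":
            offsets[arg] = pc
        elif kind == "jmp":
            pc += JMP_SIZE
        else:
            pc += len(arg)

    # Pass 2: emit bytes, now that forward and backward targets are both known.
    out = bytearray()
    for kind, arg in items:
        if kind == "jmp":
            # rel32 is relative to the address of the next instruction.
            rel = offsets[arg] - (len(out) + JMP_SIZE)
            out += b"\xE9" + rel.to_bytes(4, "little", signed=True)
        elif kind == "bytes":
            out += arg
    return bytes(out)
```

A backward jump over a single NOP, `[("label", "top"), ("bytes", b"\x90"), ("jmp", "top")]`, resolves to `90 E9 FA FF FF FF`, a displacement of -6.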

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete LIR-to-x86_64 translation (~3,000 lines) handling:
- Two-operand conflict resolution (dest==right clobbering) for all
  binary ops: commutative swap, non-commutative temp registers
- Register mapping: X0-X7 to RAX,RDI,RSI,RCX,R8,R9,R10,RDX
- X8-X17 aliasing to R11 (scratch) with SaveRegs/RestoreRegs for
  multi-operand instructions (RawSet, Msub, Madd)
- Heap allocation via bump pointer (R14) + free list (R15)
- String literal emission, file I/O syscalls (read/write/append/exists)
- IDIV with RDX save/restore, INT64_MIN/-1 overflow detection
- Reference counting helpers (implemented but disabled)
- Float operations via SSE2 with XMM15 temp for non-commutative ops
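
Some background on the IDIV item: the 64-bit form of IDIV divides RDX:RAX by its operand and writes the quotient to RAX and the remainder to RDX, which is why RDX has to be preserved around it. It also faults when the quotient does not fit in 64 bits, and INT64_MIN / -1 is the one signed case where that happens. The sketch below models the guard in Python; what the Dark backend does once it detects the case is not stated in this PR, so raising an exception here is purely an assumption.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def checked_sdiv(left: int, right: int) -> int:
    """Signed 64-bit division with the overflow guard that IDIV needs."""
    if left == INT64_MIN and right == -1:
        # The true quotient is 2**63, one past INT64_MAX; IDIV would fault here.
        # Raising is an assumption; the backend's actual response is not documented.
        raise OverflowError("INT64_MIN / -1 does not fit in 64 bits")
    # IDIV truncates toward zero, whereas Python's // rounds toward negative
    # infinity, so divide the magnitudes and reapply the sign.
    quotient = abs(left) // abs(right)
    return quotient if (left < 0) == (right < 0) else -quotient
```

Note the truncation difference: `checked_sdiv(-7, 2)` gives -3, matching hardware behavior, while Python's `-7 // 2` gives -4.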

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- 5_RegisterAllocation.fs: take Arch DU parameter instead of runtime
  CPU detection; x86_64-aware operand rewriting (keep right operand
  as StackSlot for two-operand codegen); reduced callee-saved set
  (3 vs ARM64's 8) due to R14/R15 reservation for heap/free-list
- CompilerLibrary.fs: dual-architecture compilation pipeline with
  runtime arch detection branching to x86_64 or ARM64 codegen
- Runtime.fs: x86_64 print routines, heap initialization via mmap,
  leak counter support, string refcounting intrinsics
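
One way to picture an allocator driven by an explicit architecture parameter is sketched below. The System V AMD64 callee-saved list (RBX, RBP, R12-R15) is standard; everything else is an assumption made for illustration. In particular, this PR gives only the counts (3 on x86_64 versus 8 on ARM64), so the specific registers returned here, the exclusion of RBP as a frame pointer, and the ARM64 placeholder range are guesses, not the contents of 5_RegisterAllocation.fs.

```python
from enum import Enum


class Arch(Enum):
    ARM64 = "arm64"
    X86_64 = "x86_64"


# Callee-saved registers under the System V AMD64 ABI.
SYSV_CALLEE_SAVED = ["RBX", "RBP", "R12", "R13", "R14", "R15"]
# Reserved by this backend for the heap bump pointer and the free list.
RESERVED_X64 = {"R14", "R15"}


def callee_saved_regs_for(arch: Arch) -> list[str]:
    """Return the callee-saved registers the allocator may hand out (sketch)."""
    if arch is Arch.X86_64:
        # Assumption: RBP stays a frame pointer, leaving RBX, R12 and R13.
        return [r for r in SYSV_CALLEE_SAVED if r not in RESERVED_X64 and r != "RBP"]
    if arch is Arch.ARM64:
        # Placeholder: eight registers, chosen only to match the stated count.
        return [f"X{n}" for n in range(19, 27)]
    raise ValueError(f"unsupported architecture: {arch}")
```

Because the architecture is an argument rather than something read from the host, a test can ask for the x86_64 set while running on an ARM64 machine and vice versa.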

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename arch-specific pass files into passes/arm64/ and passes/x64/
directories. Shared passes (1-5) remain in passes/. Makes the
architecture boundary explicit in the file structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- X86_64EncodingTests.fs: byte-level verification for 11 instruction families
- X86_64CodeGenTests.fs: end-to-end compile-and-execute tests
- X86_64ResolveTests.fs: label resolution and instruction sizing
- X86_64BinaryTests.fs: ELF generation and execution tests
- TestRunner.fs: architecture-conditional test execution (qemu fallback)
- PhiResolutionTests.fs: pass Arch parameter to register allocator
- floats.e2e: tolerance fix for cross-platform float formatting
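
To give a sense of what a byte-level encoding check looks like, here is a small sketch for one instruction form, the register-to-register `ADD r/m64, r64` (encoded as REX.W, opcode `01`, then a ModRM byte). It is written in Python for brevity and is not taken from X86_64EncodingTests.fs; `encode_add_rr` is a hypothetical helper.

```python
# Hardware register numbers for the 64-bit general-purpose registers.
REG = {
    "RAX": 0, "RCX": 1, "RDX": 2, "RBX": 3, "RSP": 4, "RBP": 5, "RSI": 6, "RDI": 7,
    "R8": 8, "R9": 9, "R10": 10, "R11": 11, "R12": 12, "R13": 13, "R14": 14, "R15": 15,
}


def encode_add_rr(dst: str, src: str) -> bytes:
    """Encode `ADD dst, src` for two 64-bit registers."""
    d, s = REG[dst], REG[src]
    # REX = 0100WRXB: W=1 selects 64-bit operands, R extends the ModRM reg
    # field (src), B extends the ModRM rm field (dst).
    rex = 0x48 | ((s >> 3) << 2) | (d >> 3)
    # mod=11 means register-direct; reg holds src, rm holds dst.
    modrm = 0xC0 | ((s & 7) << 3) | (d & 7)
    return bytes([rex, 0x01, modrm])


assert encode_add_rr("RAX", "RCX") == bytes([0x48, 0x01, 0xC8])
assert encode_add_rr("R11", "RCX") == bytes([0x49, 0x01, 0xCB])
assert encode_add_rr("RAX", "R11") == bytes([0x4C, 0x01, 0xD8])
```

The second and third assertions exercise the REX.B and REX.R bits, which matter whenever R8-R15 appear and are a typical source of encoder regressions.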

4,529/4,530 tests pass (99.98%). 1 known failure: memReclaimBurn
(requires reference counting, deferred).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLAUDE.md: x86_64 backend architecture, register mapping, known
  issues, devcontainer build instructions
- INITIATIVES.md: x86_64 progress tracking, remaining work, debugging
  clues for memReclaimBurn and RC implementation
- Diagnostic scripts: debug-stack.sh (callee-saved corruption),
  debug-x86-crash.sh (GDB crash analysis), disasm-func.sh,
  dump-lir-func.sh, generate-review-pdfs.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README: note dual-architecture support (ARM64 + x86_64)
- INITIATIVES.md: own ARC/RC implementation on x86_64 rather than
  waiting for ARM64 contributor; add phased plan for fixing the
  37-test regression and generic RefCountDec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- benchmarks/HISTORY.md: add machine registry (paul=ARM64, major=x86_64),
  append x86_64 instruction counts from cachegrind quick check
- benchmarks/QUICK_BASELINE.txt: replace with x86_64 baseline for
  future regression detection on this architecture
- README.md: note dual-architecture support
- INITIATIVES.md: own ARC/RC implementation on x86_64 rather than
  waiting for ARM64 contributor; add phased plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLAUDE.md: generic build/test instructions, architecture overview,
  link to TODOs.md. No longer x64-specific.
- TODOs.md: thin index replacing INITIATIVES.md, links to detail docs
- docs/x64-refcounting.md: deep-dive on RC implementation status,
  37-test regression analysis, debugging clues, phased plan
- Remove INITIATIVES.md (content moved to TODOs.md + docs/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generate PDFs for everything in the PR diff: new files in full, modified
files as diffs. 38 PDFs covering source, tests, docs, infra, scripts,
and benchmarks. 127 pages / 64 sheets double-sided.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TODOs.md: flatten groups into single checkboxed list, drop personal items
- CLAUDE.md: reduce to pointer to AGENTS.md and README.md
- AGENTS.md: remove x64 branch reference, absorb devcontainer quickstart
- Dockerfile: reword qemu comment, drop enscript/ghostscript (host handles PDFs)
- scripts/run-in-container: detects host vs container and dispatches
  accordingly; used by the debug and dump scripts
- scripts/x64/: new dir for backend-specific tools
  - debug-stack.sh: callee-saved corruption watcher
  - debug-crash.sh: renamed from debug-x86-crash.sh
  - disasm-func.sh: moved, both now use run-in-container
- scripts/dump-lir-func.sh: stays top-level (LIR is shared), uses run-in-container
- benchmarks/HISTORY.md: add machine column to every row (paul was missing),
  fill in major (AMD Threadripper 3960X), drop redundant separator
- Platform.fs now defines OS, Arch, detection helpers, and syscall numbers
  in one place. It has no ARM64.fs dependency, so it sits early enough in
  the compile order that the register allocator can reference Platform.Arch
  directly. Callers use Platform.MacOS, Platform.X86_64, Platform.detectArch ().
- PlatformTypes.fs deleted (was a workaround for the old ARM64 dependency).
- The ARM64-specific SyscallConfig (SyscallRegister + SvcImmediate) moved
  into ARM64.fs as ARM64.SyscallConfig / ARM64.syscallConfigFor, which
  wraps Platform.syscallNumbersFor.
- Callers updated: Runtime.fs, arm64/6_CodeGen.fs, arm64/7_Emit.fs,
  arm64/7_Encoding.fs, CompilerLibrary.fs, Program.fs, register allocator,
  and x64 tests.
- DarkCompiler.fsproj reorganized: arm64 passes and x64 passes grouped
  together rather than interleaved by pass number. Section comments added.
- scripts/generate-review-pdfs.sh removed (moved to host ~/bin).

Tests: 4529/4530 (baseline preserved).
ArchConfig defined per-arch calling convention records (IntArgRegs,
CalleeSavedRegs, etc.) but nothing referenced them — the register
allocator has its own inline lists via calleeSavedRegsFor. Remove the
file and its fsproj entry.
- EncodingTests.fs -> ARM64EncodingTests.fs (tests ARM64 encoder only)
- BinaryTests.fs -> ARM64BinaryTests.fs (tests Mach-O output only)
- TestRunner.fs: group ARM64 and x64 tests, use consistent naming
- Tests.fsproj: add section comments for shared / ARM64 / x64 tests
- PhiResolutionTests.fs: drop the Error->ARM64 fallback hack; the test
  is arch-independent, just hardcode ARM64 for the full register set.
- TODOs.md: add CircleCI setup
- passes/x64/6_CodeGen.fs: rewrite file header to stand on its own
  without comparing to ARM64; update comments on heapPtr/freeListBase,
  Sdiv, SaveRegs, FVirtual 2000 temp, and RawSet ownership inc to stop
  referencing the arm64 backend.
- passes/x64/8_Binary_Generation_ELF.fs: drop the "differences from
  ARM64 variant" header; state the x64 file's own design notes.
- .circleci/config.yml: minimal CI — build devcontainer image, run full
  test suite inside it. Keeps CI and local dev on the same toolchain.
- README.md: 338 -> 67 lines. Intro, quick start, pointer to docs.
  Removes the inline CLI reference, Docker/Codex/Claude walkthrough,
  ARM64-specific troubleshooting, and outdated test count.
- docs/quick-start.md: new. CLI reference, dump flags, binary inspection.
- docs/docker.md: new. Devcontainer usage, run-in-container helper,
  Codex/Claude, worktree volumes.
- docs/architecture.md: make the pipeline and platform sections
  arch-neutral; mention x64 passes under passes/{arm64,x64}/.
docs/compiler-passes.md:
- Replace noisy ASCII box pipeline diagram with a compact table
- Rewrite Pass 4/5/6/7+8 sections to be arch-neutral
- Add per-backend file matrix (arm64 vs x64 Encoding/Resolve/Binary)
- Update register classes to describe the LIR abstraction (X0-X30)
  and explain the x86_64 R11 aliasing of X8-X17
- Add Platform.fs, X86_64.fs, Binary_ELF.fs to Data Structures table

AGENTS.md:
- Architecture overview mentions the pass split
- Devcontainer section now points at scripts/run-in-container
- Documentation References table adds quick-start.md, docker.md, x64-refcounting.md
- Key Files table adds Platform.fs, 5_RegisterAllocation.fs, and the
  per-arch codegen files

TODOs.md: add follow-up doc tasks (adding-a-backend.md, x64-codegen.md,
stale feature-doc audit).
CircleCI's machine executor checks out code as uid 1001, but the
container runs as dark (uid 1000), so the bind-mounted workspace
was read-only to the container and dotnet restore failed creating
obj/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@StachuDotNet StachuDotNet changed the title from "X64" to "Add x64 backend" Apr 10, 2026
@StachuDotNet StachuDotNet marked this pull request as ready for review April 10, 2026 20:47
Requires reference counting, which is implemented but currently gated
off because enabling it regresses 37 other tests. Tracked in
docs/x64-refcounting.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@StachuDotNet StachuDotNet merged commit 777d22f into main Apr 10, 2026
1 of 2 checks passed