Merged
- PlatformTypes.fs: OS/Arch DUs available early in compile order
- Platform.fs: x86_64 syscall numbers, architecture detection
- ArchConfig.fs: per-architecture calling convention configs
- DarkCompiler.fsproj: include new files, arm64/x64 directory structure
- Binary_ELF.fs: EM_X86_64 constant
- Dockerfile: add enscript/ghostscript for PDF code review
- devcontainer.json: VS Code devcontainer configuration
- .gitignore: scope x64/x86 patterns to repo root only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- X86_64.fs: instruction DU with ~47 variants (MOV, ADD, SSE2, etc.), registers (RAX-R15, XMM0-XMM15), conditions, and sizes
- 7_Encoding.fs: instruction-to-bytes encoder handling REX prefixes, ModRM/SIB bytes, and variable-length x86_64 encoding
- 7_Resolve.fs: two-pass label resolution for jump targets
- 8_Binary_Generation_ELF.fs: x86_64 Linux ELF executable output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
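For reviewers unfamiliar with REX prefixes and ModRM bytes, here is a minimal sketch of how a register-to-register `ADD` is encoded. It is written in Python for brevity and is not taken from 7_Encoding.fs; the names `REG`, `rex`, `modrm_direct`, and `encode_add_rr` are hypothetical, and only the 64-bit register-direct form is covered.

```python
# Hypothetical register table: name -> 4-bit hardware register number.
_NAMES = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"] + [
    f"R{n}" for n in range(8, 16)
]
REG = {name: number for number, name in enumerate(_NAMES)}


def rex(w: int, reg: int, rm: int) -> int:
    """Build a REX prefix byte, layout 0100WRXB.

    W selects 64-bit operand size, R extends ModRM.reg, B extends ModRM.rm.
    X (SIB index extension) is left at 0 because no SIB byte is used here.
    """
    return 0x40 | (w << 3) | ((reg >> 3) << 2) | (rm >> 3)


def modrm_direct(reg: int, rm: int) -> int:
    """Build a ModRM byte with mod=11 (register-direct addressing)."""
    return 0xC0 | ((reg & 7) << 3) | (rm & 7)


def encode_add_rr(dst: str, src: str) -> bytes:
    """Encode `ADD dst, src` as ADD r/m64, r64 (opcode 0x01 /r).

    In this form the destination goes in ModRM.rm and the source in ModRM.reg.
    """
    d, s = REG[dst], REG[src]
    return bytes([rex(1, s, d), 0x01, modrm_direct(s, d)])


print(encode_add_rr("RAX", "RCX").hex())  # 4801c8
print(encode_add_rr("R8", "R9").hex())    # 4d01c8
```

The second call shows why the prefix matters: R8 and R9 share their low three bits with RAX and RCX, so the ModRM byte is identical and only the REX.R and REX.B bits tell the two instructions apart.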
Complete LIR-to-x86_64 translation (~3,000 lines) handling:
- Two-operand conflict resolution (dest==right clobbering) for all binary ops: commutative swap, non-commutative temp registers
- Register mapping: X0-X7 to RAX,RDI,RSI,RCX,R8,R9,R10,RDX
- X8-X17 aliasing to R11 (scratch) with SaveRegs/RestoreRegs for multi-operand instructions (RawSet, Msub, Madd)
- Heap allocation via bump pointer (R14) + free list (R15)
- String literal emission, file I/O syscalls (read/write/append/exists)
- IDIV with RDX save/restore, INT64_MIN/-1 overflow detection
- Reference counting helpers (implemented but disabled)
- Float operations via SSE2 with XMM15 temp for non-commutative ops

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
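The two-operand conflict resolution can be sketched as follows. This is an illustrative Python model, not the code in 6_CodeGen.fs: `lower_binop`, `execute`, the tuple instruction format, and the choice of R11 as the temp are all assumptions made for the example, and the temp is assumed never to be one of the operands.

```python
# Ops where `a OP b == b OP a`, so the operands can be swapped freely.
COMMUTATIVE = {"ADD", "IMUL", "AND", "OR", "XOR"}


def lower_binop(op, dest, left, right, temp="R11"):
    """Lower three-operand `dest = left OP right` to two-operand form.

    Returns a list of (mnemonic, dst, src) tuples, where `OP dst, src`
    means `dst = dst OP src`. `temp` is a hypothetical scratch register.
    """
    if dest == left:
        # Already in two-operand shape: dest = dest OP right.
        return [(op, dest, right)]
    if dest != right:
        # No conflict: copy left into dest, then apply the op.
        return [("MOV", dest, left), (op, dest, right)]
    # dest == right: `MOV dest, left` would clobber right before it is read.
    if op in COMMUTATIVE:
        # dest already holds right, so compute right OP left instead.
        return [(op, dest, left)]
    # Non-commutative: save right in the temp before overwriting dest.
    return [("MOV", temp, right), ("MOV", dest, left), (op, dest, temp)]


# Tiny interpreter used only to check the lowering on concrete values.
OPS = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b}


def execute(instrs, regs):
    regs = dict(regs)
    for op, dst, src in instrs:
        regs[dst] = regs[src] if op == "MOV" else OPS[op](regs[dst], regs[src])
    return regs


start = {"RAX": 10, "RCX": 3, "R11": 0}
naive = [("MOV", "RCX", "RAX"), ("SUB", "RCX", "RCX")]
print(execute(naive, start)["RCX"])                                    # 0 (wrong)
print(execute(lower_binop("SUB", "RCX", "RAX", "RCX"), start)["RCX"])  # 7
```

The naive sequence computes `RCX = RAX - RCX` as 0 because RCX is overwritten before it is read; the lowered sequence yields the expected 7.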
- 5_RegisterAllocation.fs: take Arch DU parameter instead of runtime CPU detection; x86_64-aware operand rewriting (keep right operand as StackSlot for two-operand codegen); reduced callee-saved set (3 vs ARM64's 8) due to R14/R15 reservation for heap/free-list
- CompilerLibrary.fs: dual-architecture compilation pipeline with runtime arch detection branching to x86_64 or ARM64 codegen
- Runtime.fs: x86_64 print routines, heap initialization via mmap, leak counter support, string refcounting intrinsics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename arch-specific pass files into passes/arm64/ and passes/x64/ directories. Shared passes (1-5) remain in passes/. Makes the architecture boundary explicit in the file structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- X86_64EncodingTests.fs: byte-level verification for 11 instruction families
- X86_64CodeGenTests.fs: end-to-end compile-and-execute tests
- X86_64ResolveTests.fs: label resolution and instruction sizing
- X86_64BinaryTests.fs: ELF generation and execution tests
- TestRunner.fs: architecture-conditional test execution (qemu fallback)
- PhiResolutionTests.fs: pass Arch parameter to register allocator
- floats.e2e: tolerance fix for cross-platform float formatting

4,529/4,530 tests pass (99.98%). 1 known failure: memReclaimBurn (requires reference counting, deferred).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
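As background for the label-resolution and instruction-sizing tests, a two-pass resolver can be sketched like this. It is a simplified Python model rather than the logic in 7_Resolve.fs: the `resolve` function and its item format are hypothetical, and every jump is assumed to be a fixed-size `JMP rel32` (opcode 0xE9 plus a 4-byte displacement), whereas a real encoder may also choose shorter forms.

```python
JMP_SIZE = 5  # 0xE9 opcode + 4-byte signed displacement (assumed fixed size)


def resolve(items):
    """Resolve jump targets in two passes.

    `items` is a list of ("label", name), ("jmp", name), or ("bytes", b"...").
    Pass 1 sizes every item to learn each label's byte offset.
    Pass 2 emits bytes, patching each jump with a displacement measured
    from the end of the jump instruction, as x86_64 requires.
    """
    # Pass 1: record the offset of every label.
    offsets = {}
    pos = 0
    for kind, arg in items:
        if kind == "label":
            offsets[arg] = pos
        elif kind == "jmp":
            pos += JMP_SIZE
        else:
            pos += len(arg)

    # Pass 2: emit bytes now that forward labels are known.
    out = bytearray()
    for kind, arg in items:
        if kind == "jmp":
            rel = offsets[arg] - (len(out) + JMP_SIZE)
            out += b"\xE9" + rel.to_bytes(4, "little", signed=True)
        elif kind == "bytes":
            out += arg
    return bytes(out)


program = [
    ("label", "top"),
    ("bytes", b"\x90"),  # NOP
    ("jmp", "end"),      # forward jump
    ("jmp", "top"),      # backward jump
    ("label", "end"),
    ("bytes", b"\xC3"),  # RET
]
print(resolve(program).hex())  # 90e905000000e9f5ffffffc3
```

The forward jump is the reason for two passes: the offset of `end` is not known when the jump to it is first encountered.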
- CLAUDE.md: x86_64 backend architecture, register mapping, known issues, devcontainer build instructions
- INITIATIVES.md: x86_64 progress tracking, remaining work, debugging clues for memReclaimBurn and RC implementation
- Diagnostic scripts: debug-stack.sh (callee-saved corruption), debug-x86-crash.sh (GDB crash analysis), disasm-func.sh, dump-lir-func.sh, generate-review-pdfs.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README: note dual-architecture support (ARM64 + x86_64)
- INITIATIVES.md: own ARC/RC implementation on x86_64 rather than waiting for ARM64 contributor; add phased plan for fixing the 37-test regression and generic RefCountDec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- benchmarks/HISTORY.md: add machine registry (paul=ARM64, major=x86_64), append x86_64 instruction counts from cachegrind quick check
- benchmarks/QUICK_BASELINE.txt: replace with x86_64 baseline for future regression detection on this architecture
- README.md: note dual-architecture support
- INITIATIVES.md: own ARC/RC implementation on x86_64 rather than waiting for ARM64 contributor; add phased plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLAUDE.md: generic build/test instructions, architecture overview, link to TODOs.md. No longer x64-specific.
- TODOs.md: thin index replacing INITIATIVES.md, links to detail docs
- docs/x64-refcounting.md: deep-dive on RC implementation status, 37-test regression analysis, debugging clues, phased plan
- Remove INITIATIVES.md (content moved to TODOs.md + docs/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generate PDFs for everything in the PR diff: new files in full, modified files as diffs. 38 PDFs covering source, tests, docs, infra, scripts, and benchmarks. 127 pages / 64 sheets double-sided.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TODOs.md: flatten groups into single checkboxed list, drop personal items
- CLAUDE.md: reduce to pointer to AGENTS.md and README.md
- AGENTS.md: remove x64 branch reference, absorb devcontainer quickstart
- Dockerfile: reword qemu comment, drop enscript/ghostscript (host handles PDFs)

- scripts/run-in-container: detects host vs container and dispatches accordingly; used by the debug and dump scripts
- scripts/x64/: new dir for backend-specific tools
  - debug-stack.sh: callee-saved corruption watcher
  - debug-crash.sh: renamed from debug-x86-crash.sh
  - disasm-func.sh: moved, both now use run-in-container
- scripts/dump-lir-func.sh: stays top-level (LIR is shared), uses run-in-container
- benchmarks/HISTORY.md: add machine column to every row (paul was missing), fill in major (AMD Threadripper 3960X), drop redundant separator

- Platform.fs now defines OS, Arch, detection helpers, and syscall numbers in one place. It has no ARM64.fs dependency, so it sits early enough in the compile order that the register allocator can reference Platform.Arch directly. Callers use Platform.MacOS, Platform.X86_64, Platform.detectArch ().
- PlatformTypes.fs deleted (was a workaround for the old ARM64 dependency).
- The ARM64-specific SyscallConfig (SyscallRegister + SvcImmediate) moved into ARM64.fs as ARM64.SyscallConfig / ARM64.syscallConfigFor, which wraps Platform.syscallNumbersFor.
- Callers updated: Runtime.fs, arm64/6_CodeGen.fs, arm64/7_Emit.fs, arm64/7_Encoding.fs, CompilerLibrary.fs, Program.fs, register allocator, and x64 tests.
- DarkCompiler.fsproj reorganized: arm64 passes and x64 passes grouped together rather than interleaved by pass number. Section comments added.
- scripts/generate-review-pdfs.sh removed (moved to host ~/bin).

Tests: 4529/4530 (baseline preserved).
ArchConfig defined per-arch calling convention records (IntArgRegs, CalleeSavedRegs, etc.) but nothing referenced them — the register allocator has its own inline lists via calleeSavedRegsFor. Remove the file and its fsproj entry.
- EncodingTests.fs -> ARM64EncodingTests.fs (tests ARM64 encoder only)
- BinaryTests.fs -> ARM64BinaryTests.fs (tests Mach-O output only)
- TestRunner.fs: group ARM64 and x64 tests, use consistent naming
- Tests.fsproj: add section comments for shared / ARM64 / x64 tests
- PhiResolutionTests.fs: drop the Error->ARM64 fallback hack; the test is arch-independent, just hardcode ARM64 for the full register set.
- TODOs.md: add CircleCI setup

- passes/x64/6_CodeGen.fs: rewrite file header to stand on its own without comparing to ARM64; update comments on heapPtr/freeListBase, Sdiv, SaveRegs, FVirtual 2000 temp, and RawSet ownership inc to stop referencing the arm64 backend.
- passes/x64/8_Binary_Generation_ELF.fs: drop the "differences from ARM64 variant" header; state the x64 file's own design notes.
- .circleci/config.yml: minimal CI: build devcontainer image, run full test suite inside it. Keeps CI and local dev on the same toolchain.

- README.md: 338 -> 67 lines. Intro, quick start, pointer to docs.
Removes the inline CLI reference, Docker/Codex/Claude walkthrough,
ARM64-specific troubleshooting, and outdated test count.
- docs/quick-start.md: new. CLI reference, dump flags, binary inspection.
- docs/docker.md: new. Devcontainer usage, run-in-container helper,
Codex/Claude, worktree volumes.
- docs/architecture.md: make the pipeline and platform sections
arch-neutral; mention x64 passes under passes/{arm64,x64}/.
docs/compiler-passes.md:
- Replace noisy ASCII box pipeline diagram with a compact table
- Rewrite Pass 4/5/6/7+8 sections to be arch-neutral
- Add per-backend file matrix (arm64 vs x64 Encoding/Resolve/Binary)
- Update register classes to describe the LIR abstraction (X0-X30) and explain the x86_64 R11 aliasing of X8-X17
- Add Platform.fs, X86_64.fs, Binary_ELF.fs to Data Structures table

AGENTS.md:
- Architecture overview mentions the pass split
- Devcontainer section now points at scripts/run-in-container
- Documentation References table adds quick-start.md, docker.md, x64-refcounting.md
- Key Files table adds Platform.fs, 5_RegisterAllocation.fs, and the per-arch codegen files

TODOs.md: add follow-up doc tasks (adding-a-backend.md, x64-codegen.md, stale feature-doc audit).
CircleCI's machine executor checks out code as uid 1001, but the container runs as dark (uid 1000), so the bind-mounted workspace was read-only to the container and dotnet restore failed creating obj/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Requires reference counting, which is implemented but currently gated off because enabling it regresses 37 other tests. Tracked in docs/x64-refcounting.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add x86_64 native backend
Adds a complete x86_64 (Intel/AMD) backend to the Dark compiler alongside the existing ARM64 backend. The compiler auto-detects the host CPU and emits the correct native code — same Dark source, same test suite, two architectures.
Shared work
- ARM64-specific passes moved into `passes/arm64/` and new work lives in `passes/x64/` for symmetry.
- `PlatformTypes.fs`: extracted OS/Arch DU for compile-order availability.
- `ArchConfig.fs`: per-architecture calling convention configs.
- `Platform.fs`: x86_64 syscall numbers, architecture detection.
- `5_RegisterAllocation.fs`: now takes an `Arch` DU parameter instead of runtime CPU checks, so the register allocator can be driven by configuration rather than host detection.
- `CompilerLibrary.fs`: dual-architecture compilation pipeline.
New x86_64 passes (`passes/x64/`)
- `6_CodeGen.fs`: LIR to x86_64 instruction translation (3,065 lines)
- `7_Encoding.fs`: instruction to machine code bytes (646 lines)
- `7_Resolve.fs`: jump label resolution (132 lines)
- `8_Binary_Generation_ELF.fs`: ELF executable output (110 lines)
- `X86_64.fs`: instruction type definitions (registers, conditions, ~47 instruction variants)
Tests
4 new test files (encoding, codegen, resolve, binary) with 556 lines of x86_64-specific unit tests.
Tooling
Diagnostic scripts (`debug-stack.sh`, `debug-x86-crash.sh`, `disasm-func.sh`, `dump-lir-func.sh`).
Test results
4,529 / 4,530 (99.98%)
1 known failure: `memReclaimBurn`, which requires reference counting (heap memory reclamation). The RC helpers are implemented but disabled (enabling them currently causes 37 test regressions in crypto/other areas). See `docs/x64-refcounting.md`.
The hard part
The bulk of this work was handling x86_64's two-operand instruction format. ARM64 is three-operand (`ADD X0, X1, X2` = "X0 = X1 + X2"), but x86_64 is two-operand (`ADD RAX, RCX` = "RAX += RCX"). When the LIR says `dest = left OP right` and `dest == right`, a naive `MOV dest, left; OP dest, right` clobbers `right` before reading it. Every binary operation needed careful handling: commutative operations swap their operands, and non-commutative operations go through a temp register.
Benchmarks: ARM64 vs x86_64
Same `run_benchmarks.sh` cachegrind pipeline on both arches. ARM numbers are from paul @ `453aaafe` (2026-03-05); x64 numbers are from major (Threadripper 3960X) @ `518d00e2` (2026-04-10), so some delta is commit drift, not pure codegen. `edigits`, `fasta`, and `mandelbrot` failed to run on x64 and are excluded; reduced-size benchmarks are skipped.
Geometric mean x64/ARM ≈ 1.15× (x64 emits ~15% more instructions on average).
`tak` and `merkletrees` are slightly better on x64; `primes`, `leibniz`, `collatz`, and `nqueen` are the clearest regressions and the best targets for x64 codegen tuning. Aggregate Dark-vs-Rust speedup in `RESULTS.md` moved from 3.99× (ARM) to 4.57× (x64).
Known gaps