New build system using nix #1304

Draft
mvachhar wants to merge 24 commits into main from pr/mvachhar/new-build-system


@mvachhar commented Feb 24, 2026

This PR is a continuation of the work started by @daniel-noland to move to a proper nix based build system.

Most of this PR was built based on #1275 and the work of Claude Code using Opus 4.6. As such it should be reviewed carefully. I have tried to do the work in small chunks with the AI to get some review as we go along, but I am not a nix expert and had to rely a bit on the AI's judgement as to the best approach for certain things.

TODO:

  • Make failing new sanitizer runs optional - the sanitizers found real bugs we need to fix in separate PRs
    • These got commented out; the GitHub Actions work needed to make them optional is too hard for this PR
  • ~~ Create cachix "githedgehog" cache so that these runs come from the cache ~~ DONE
  • Have @Fredi-raspall, @qmonnet, and @daniel-noland rebase on this branch to make sure their workflow still works
  • Careful manual review of this PR before signing off
  • Copilot review of this PR before signing off. DONE
  • Remove scripts/todo.sh. DONE
  • Remove scripts/install-real-nix.sh. DONE
  • Make sure the proper just targets for building and pushing containers are there (I believe we are good, but I want to confirm)

@mvachhar requested a review from a team as a code owner February 24, 2026 16:06
@mvachhar self-assigned this Feb 24, 2026
@mvachhar requested review from sergeymatov and removed request for a team February 24, 2026 16:06
@mvachhar marked this pull request as draft February 24, 2026 16:06
@mvachhar requested review from Copilot and removed request for sergeymatov February 24, 2026 16:07
Copilot AI left a comment

Pull request overview

This PR continues the migration to a Nix-based build and CI workflow, replacing the prior compile-env/docker-based approach and wiring sysroot/toolchain configuration through Nix shells and Nix builds.

Changes:

  • Replaces the legacy compile-env + fake-nix workflow with default.nix/overlays, nix-shell, and updated just recipes.
  • Updates CI (dev.yml) to build/test via Nix targets and introduces new Nix packaging pieces (FRR packaging, platform/profile plumbing).
  • Refactors sysroot usage in Rust build scripts and updates docs to match the new Nix-first workflow.

Reviewed changes

Copilot reviewed 55 out of 56 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| testing.md | Updates testing instructions to assume nix-shell tooling. |
| sysfs/build.rs | Removes sysroot build script logic. |
| sysfs/Cargo.toml | Drops dpdk-sysroot-helper build-dependency. |
| shell.nix | Switches shell entrypoint to default.nix devenv. |
| scripts/update-doc-headers.sh | Bumps KaTeX version used in docs. |
| scripts/todo.sh | Adds a Nix-based build/test “checklist” script. |
| scripts/test-runner.sh | Removes legacy docker-based test runner wrapper. |
| scripts/rust.env | Removes legacy RUSTFLAGS/profile env file. |
| scripts/k8s-crd.env | Updates gateway CRD ref env file (now likely legacy). |
| scripts/installl-real-nix.sh | Adds helper to replace “fake nix” with real Nix install. |
| scripts/dpdk-sys.env | Updates pinned dpdk-sys commit. |
| scripts/doc/custom-header.html | Updates KaTeX CDN links and integrity hashes. |
| rust-toolchain.toml | Removes rustup toolchain file in favor of Nix toolchain sourcing. |
| routing/Cargo.toml | Cleans tokio features and adds dev tokio “full”. |
| npins/sources.json | Updates Nix pins (crane, frr, gateway, nixpkgs, rust, rust-overlay). |
| nix/profiles.nix | Adjusts compile/link/security profile flags and profile mapping. |
| nix/platforms.nix | Adds platform name mapping for bluefield2 → bluefield. |
| nix/pkgs/frr/patches/yang-hack.patch | Adds FRR/libyang-related patch. |
| nix/pkgs/frr/patches/xrelifo.py.fix.patch | Adds FRR python/xrelfo patch. |
| nix/pkgs/frr/default.nix | Introduces FRR derivation with configurable protocol support. |
| nix/pkgs/frr/clippy-helper.nix | Adds split derivation for FRR “clippy” tool for cross builds. |
| nix/pkgs/dpdk/default.nix | Simplifies DPDK build params and uses platform-provided properties. |
| nix/overlays/llvm.nix | Reworks LLVM+Rust toolchain overlay to source versions from pins. |
| nix/overlays/frr.nix | Adds overlay customizing dependencies for FRR static/cross builds. |
| nix/overlays/default.nix | Registers new overlays (rust/llvm/dataplane/frr). |
| nix/overlays/dataplane.nix | Wires platform/profile into DPDK build and tweaks deps. |
| nix/overlays/dataplane-dev.nix | Uses llvmPackages' stdenv and adds a static-leaning gdb override. |
| net/src/buffer/test_buffer.rs | Cleans doc-only import; adds explicit PacketBuffer doc link. |
| mgmt/tests/reconcile.rs | Adds VM-runner attribute to a test. |
| mgmt/src/tests/mgmt.rs | Removes unused imports and disables a VM test during refactor. |
| mgmt/Cargo.toml | Adds n-vm + tracing-subscriber for tests. |
| k8s-intf/build.rs | Refactors CRD generation to OUT_DIR and env-driven inputs. |
| k8s-intf/Cargo.toml | Swaps build deps to dpdk-sysroot-helper. |
| justfile | Replaces compile-env/sterile/docker flows with Nix build/test/container commands. |
| init/build.rs | Switches to dpdk_sysroot_helper::use_sysroot() behind feature gate. |
| init/Cargo.toml | Introduces sysroot feature and makes sysroot helper optional. |
| hardware/src/os/mod.rs | Fixes a typo in a clippy lint comment. |
| hardware/build.rs | Switches to centralized use_sysroot(). |
| dpdk/src/lcore.rs | Updates lcore ID call to rte_lcore_id(). |
| dpdk/build.rs | Switches to centralized use_sysroot(). |
| dpdk-sysroot-helper/src/lib.rs | Changes sysroot discovery to DATAPLANE_SYSROOT and adds use_sysroot(). |
| dpdk-sys/build.rs | Updates bindgen/sysroot handling and link libs list. |
| development/code/running-tests.md | Updates test-running docs to Nix-first commands. |
| default.nix | Major Nix build definition: dev shell env, profiles, test archives, container tars. |
| dataplane/src/drivers/dpdk.rs | Gates DPDK driver file behind dpdk feature. |
| dataplane/build.rs | Switches to centralized use_sysroot() behind dpdk feature. |
| dataplane/Cargo.toml | Makes dpdk deps optional behind a dpdk feature (default on). |
| cli/build.rs | Removes sysroot build script logic. |
| cli/Cargo.toml | Drops dpdk-sysroot-helper build-dependency. |
| README.md | Updates developer setup/docs to nix-shell workflow. |
| Cargo.toml | Updates workspace version and dependency versions. |
| Cargo.lock | Updates lockfile to match dependency/version changes. |
| .github/workflows/dev.yml.old | Keeps old workflow as .old (new file added). |
| .github/workflows/dev.yml | Reworks CI to use Nix builds and archives. |
| .envrc | Simplifies direnv env vars for the new devroot/sysroot layout. |
| .cargo/config.toml | Updates env vars and rustflags for sysroot/devroot-based builds. |

@mvachhar force-pushed the pr/mvachhar/new-build-system branch 7 times, most recently from d2a1beb to cddb251 February 24, 2026 21:12
@daniel-noland force-pushed the pr/mvachhar/new-build-system branch from cddb251 to 3591e49 February 24, 2026 21:27
@mvachhar force-pushed the pr/mvachhar/new-build-system branch from 3591e49 to 921adf0 February 24, 2026 21:49
@mvachhar added ci:+vlab (Enable VLAB tests) labels Feb 24, 2026
@mvachhar closed this Feb 24, 2026
@mvachhar reopened this Feb 24, 2026
@daniel-noland force-pushed the pr/mvachhar/new-build-system branch from e3be498 to eb71953 February 24, 2026 22:25
@mvachhar added the ci:-upgrade (Disable VLAB upgrade tests) label Feb 24, 2026
@daniel-noland force-pushed the pr/mvachhar/new-build-system branch 2 times, most recently from bae29e6 to 6a688dd February 24, 2026 23:09
@mvachhar force-pushed the pr/mvachhar/new-build-system branch 2 times, most recently from 81e9456 to 0059740 February 24, 2026 23:19
@daniel-noland force-pushed the pr/mvachhar/new-build-system branch from 628c5d8 to e56e5c9 February 25, 2026 06:52
@mvachhar force-pushed the pr/mvachhar/new-build-system branch from 219e2e2 to dd0a0ca March 2, 2026 23:04
mvachhar and others added 23 commits March 2, 2026 17:49
Replace the old FHS-based shell.nix with a minimal wrapper that imports
devenv from default.nix.  Simplify .envrc to just set RUSTC_BOOTSTRAP
and add devroot/bin to PATH, removing the old compile-env docker-based
environment setup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Replace rust-toolchain.toml with nix-managed rust toolchain configuration.
The rust version and components are now sourced from the npins rust pin via
rust-overlay's fromRustupToolchain, and targets come from the nix platform
config.

Also fix llvmPackages -> llvmPackages' references throughout llvm.nix,
add rustPlatform'-dev for host builds, and switch prev -> final where
appropriate to respect overlay ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Add nix overlay and package definition for building FRR (Free Range
Routing) as a cross-compiled dependency.  Includes clippy-helper for
lint integration and patches directory for any needed source fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Simplify the DPDK nix package build, pass platform through to the
dpdk derivation, and fix llvmPackages -> llvmPackages' in the
dataplane-dev overlay.  Update dpdk-sysroot-helper to simplify linker
search path handling, fix lcore hwloc type usage, and remove unnecessary
build.rs files from cli and sysfs crates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Add platform name mapping so bluefield2 compiles with the name
"bluefield" as DPDK expects.  The cross compile file is still generated
correctly for bluefield2 (cortex-a72 / armv8.2-a), but DPDK's internal
naming requires the shorter form.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Add a size-optimized, statically-linked gdb build (gdb') to the
dataplane-dev overlay for use in debug containers.  Built with LTO
and --gc-sections, with source-highlight disabled to enable static
compilation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Move --gc-sections and --as-needed from performance-only link flags to
common RUSTFLAGS (they work fine for Rust; FRR has its own build).
Enable -fcf-protection=full and -Zcf-protection=full for all builds.
Add fuzz profile as an alias for release, and make profile-map use rec
to enable self-references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Major rework of the top-level nix build:

- Add fuzz cargo profile mapping and frr-pkgs package set
- Add skopeo to devroot for container operations
- Replace devenv shellHook with env attribute, exporting sysroot and
  devroot paths directly from the nix store so .cargo/config.toml's
  force=false env vars are properly overridden
- Switch cc to cxx (clang++) for C++ linking support
- Simplify build-std features (remove conditional llvm-libunwind)
- Remove sanitizer-conditional RUSTFLAGS and libgcc logic
- Add --as-needed,--gc-sections to linker flags
- Rework test-builder to support both per-package and workspace-wide
  test archive generation
- Split dataplane-tar into min-tar (base filesystem) and dataplane-tar
  (adds binaries), enabling reuse of the base layer
- Fix tar --mode flags for consistent permissions
- Add debug container definitions (libc and dataplane-debugger)
- Export new derivations (containers, min-tar)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Simplify build scripts across the workspace:

- k8s-intf: Read CRD from GW_CRD_PATH env var instead of fetching from
  URL, remove reqwest/tokio build dependencies
- dpdk-sys: Simplify bindgen configuration, use DATAPLANE_SYSROOT
- dataplane, dpdk, hardware, init: Simplify sysroot path handling to
  use DATAPLANE_SYSROOT env var consistently
- cli, sysfs: Remove unnecessary build.rs files entirely

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
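
As a rough illustration of the centralized sysroot handling described in the commit above, a build-script helper might look like the sketch below. Only the `DATAPLANE_SYSROOT` variable and the `use_sysroot()` name come from this PR; the function signature, the `lib` subdirectory, and the helper names `sysroot_from` and `sysroot_directives` are assumptions made for illustration, not the actual contents of dpdk-sysroot-helper.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Environment variable named in the commit message above.
const SYSROOT_VAR: &str = "DATAPLANE_SYSROOT";

/// Resolve the sysroot from an optional raw value (hypothetical helper).
/// Missing or blank values are rejected so the build fails early.
fn sysroot_from(raw: Option<String>) -> Result<PathBuf, String> {
    match raw {
        Some(v) if !v.trim().is_empty() => Ok(PathBuf::from(v)),
        _ => Err(format!("{SYSROOT_VAR} is not set")),
    }
}

/// Cargo directives a build script could print for the given sysroot.
/// The `lib` subdirectory is an assumption about the sysroot layout.
fn sysroot_directives(sysroot: &Path) -> Vec<String> {
    vec![
        format!("cargo:rerun-if-env-changed={SYSROOT_VAR}"),
        format!(
            "cargo:rustc-link-search=native={}",
            sysroot.join("lib").display()
        ),
    ]
}

/// Hypothetical stand-in for `use_sysroot()`; the real signature is not
/// shown on this page.
pub fn use_sysroot() {
    let sysroot = sysroot_from(env::var(SYSROOT_VAR).ok()).expect("sysroot lookup failed");
    for line in sysroot_directives(&sysroot) {
        println!("{line}");
    }
}
```

Under these assumptions, each crate's build.rs would reduce to a `main` that calls `use_sysroot()`, which is one way the per-crate path handling could collapse into a single place.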
Replace the old compile-env/docker-based build environment with
nix-native paths:

- .cargo/config.toml: Point env vars at sysroot/devroot relative paths
  with force=false so nix env vars take precedence
- scripts/rust.env: Gut most content, keep only what justfile needs
- Remove scripts/dpdk-sys.env as it is no longer used
- justfile: Add shell recipe for nix-shell entry
- Delete scripts/test-runner.sh (replaced by nix-based testing)
- Add scripts/todo.sh (build verification helper)
- Add scripts/installl-real-nix.sh (nix installation helper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Replace the old just/docker-based CI workflow with a nix-based build
using cachix and install-nix-action.  The new workflow uses a matrix
strategy across nix targets and build profiles.

The old workflow is preserved as dev.yml.old for reference during the
transition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
De-duplicate tokio feature flags in routing/Cargo.toml and add tokio
with full features to dev-dependencies for test support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Update mgmt tests for compatibility with the nix build environment:
add n_vm test dependencies, simplify test_sample_config, and add

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Update npins sources (crane, frr, gateway, nixpkgs, rust,
rust-overlay) and refresh Cargo.lock.  Bump workspace version and
update dependency versions in Cargo.toml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Manish Vachharajani <manish@githedgehog.com>
Update KaTeX version in custom-header.html and update-doc-headers.sh.
Fix a doc typo in hardware/src/os/mod.rs and clean up an unnecessary
include in net/src/buffer/test_buffer.rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Nix now sets up the entire environment, so rust.env is not needed.
Remove all docker/compile-env recipes and variables that are dead code
after the migration to nix-based builds. Rewrite build-container and
push recipes to use nix build and skopeo directly, and update remaining
recipes to call cargo without the old wrapper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all references to the old docker/compile-env workflow with the
new nix-shell based development environment across README.md, testing.md,
and development/code/running-tests.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a justfile recipe to create devroot and sysroot symlinks via nix
build, making it easy to set up the local development environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use continue-on-error with a per-matrix optional flag so that
sanitize/address and sanitize/thread failures show as warnings
instead of blocking the workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Manish Vachharajani <manish@githedgehog.com>
Previously, we were using the committed generated file
without updating it.  This fixes that so that we now
generate the kopium gateway_agent_crd.rs file in the
target directory and properly use it.

A big change here is that the gateway agent version
now comes from npins/sources.json and not
scripts/k8s-crd.env.  The procedure to update the CRD
is now in the README.md.

Also, JSON files must not be excluded from the nix sources;
otherwise the npins files are not available within nix build.

Co-authored-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Manish Vachharajani <manish@githedgehog.com>
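
The generate-into-the-target-directory flow described in the commit above could be sketched roughly as follows. The `GW_CRD_PATH` variable (from an earlier commit in this series), `OUT_DIR`, and the `gateway_agent_crd.rs` file name come from this PR; everything else is an assumption. In particular, the `generate` closure is a hypothetical stand-in for the real generator (kopium), whose invocation is not shown on this page.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File name taken from the commit message; placing it directly under
/// OUT_DIR is an assumption.
const GENERATED_FILE: &str = "gateway_agent_crd.rs";

/// Read the CRD at `crd_path`, run `generate` over it, and write the result
/// into `out_dir`. `generate` is a hypothetical stand-in for kopium.
fn write_generated(
    crd_path: &Path,
    out_dir: &Path,
    generate: impl Fn(&str) -> String,
) -> io::Result<PathBuf> {
    let crd = fs::read_to_string(crd_path)?;
    fs::create_dir_all(out_dir)?;
    let dest = out_dir.join(GENERATED_FILE);
    fs::write(&dest, generate(&crd))?;
    Ok(dest)
}

/// Look up a required environment variable, mapping absence to an I/O error.
fn required_var(name: &str) -> io::Result<String> {
    env::var(name).map_err(|e| io::Error::new(io::ErrorKind::NotFound, format!("{name}: {e}")))
}

/// What a build script's `main` might call (hypothetical wiring).
pub fn generate_crd_bindings(generate: impl Fn(&str) -> String) -> io::Result<PathBuf> {
    // Re-run when the variable changes, and when the file it names changes.
    println!("cargo:rerun-if-env-changed=GW_CRD_PATH");
    let crd_path = required_var("GW_CRD_PATH")?;
    println!("cargo:rerun-if-changed={crd_path}");
    let out_dir = required_var("OUT_DIR")?;
    write_generated(Path::new(&crd_path), Path::new(&out_dir), generate)
}
```

With a layout like this, the consuming crate would typically pull the file in with `include!(concat!(env!("OUT_DIR"), "/gateway_agent_crd.rs"))`, so no generated file has to be committed. Whether k8s-intf does exactly this is not stated here.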
The earlier series of commits adds the address and thread sanitizers
to the dev workflows.  These fail due to real bugs that need to be
addressed.  However, that is for later commits.

While the sanitizer jobs are marked as optional and do not cause
build failure, the summary job still sees them as failed and fails.
A future commit should make the summary job somehow look at the
optional flag and not fail.

Signed-off-by: Manish Vachharajani <manish@githedgehog.com>
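
The rule the summary job would need, as described in the commit above, is small: a failure should block only when the failing job is not marked optional. The sketch below expresses that decision in Rust purely to make the logic concrete. The real fix would live in the workflow file, and the `JobResult` type and function names here are hypothetical.

```rust
/// Result of one matrix job, reduced to what a summary step needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Success,
    Failure,
}

/// Hypothetical record for one matrix entry; `optional` mirrors the
/// per-matrix optional flag mentioned in the commit messages.
#[derive(Debug, Clone)]
pub struct JobResult {
    pub name: String,
    pub optional: bool,
    pub outcome: Outcome,
}

/// Names of jobs that failed and are not optional.
pub fn blocking_failures(jobs: &[JobResult]) -> Vec<&str> {
    jobs.iter()
        .filter(|j| j.outcome == Outcome::Failure && !j.optional)
        .map(|j| j.name.as_str())
        .collect()
}

/// The summary passes when no required job failed.
pub fn summary_passes(jobs: &[JobResult]) -> bool {
    blocking_failures(jobs).is_empty()
}
```

Under this rule, failing sanitize/address and sanitize/thread entries marked optional would leave the summary green, while any required job failing would still turn it red.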
Remove install-real-nix.sh and todo.sh as these are not
needed.

Signed-off-by: Manish Vachharajani <manish@githedgehog.com>
@mvachhar force-pushed the pr/mvachhar/new-build-system branch 3 times, most recently from c718a55 to 4b3211c March 3, 2026 16:47
Rewrite the build section of dev.yml so that each task happens
in a proper step inside the build job. Also, fix how container
pushes work with docker and the official oci registry and ghcr.io.
Add back the cargo deny check as well.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mvachhar force-pushed the pr/mvachhar/new-build-system branch from 4b3211c to 5661eaf March 3, 2026 16:51
Labels

ci:-upgrade (Disable VLAB upgrade tests), ci:+vlab (Enable VLAB tests)

3 participants