Skip to content

Add UID-based nftables firewall for NATS and monit connections#399

Open
rkoster wants to merge 5 commits intocloudfoundry:mainfrom
rkoster:feature/uid-based-firewall
Open

Add UID-based nftables firewall for NATS and monit connections#399
rkoster wants to merge 5 commits intocloudfoundry:mainfrom
rkoster:feature/uid-based-firewall

Conversation

@rkoster
Copy link
Contributor

@rkoster rkoster commented Feb 6, 2026

Summary

This PR implements a UID-based nftables firewall to protect NATS (mbus) and monit connections, replacing the previous cgroup-based iptables approach.

Motivation

The existing cgroup-based firewall approach has limitations in nested container environments (like Garden containers running on BOSH VMs). In these environments:

  • The cgroup hierarchy may not be accessible or reliable
  • iptables cgroup matching doesn't work consistently
  • This leaves NATS and monit ports exposed to unprivileged processes

The UID-based approach solves this by using nftables with meta skuid matching, which works reliably regardless of container nesting.

Changes

New platform/firewall Package

  • firewall.go - Defines Manager and NatsFirewallHook interfaces
  • nftables_firewall.go - Linux implementation using github.com/google/nftables (netlink-based, no CLI required)
  • nftables_firewall_other.go - Stub for non-Linux platforms

Firewall Rules

Creates an nftables table bosh_agent with two output chains:

Chain Purpose Rule
monit_access Protect monit (port 2822) Allow only UID 0, drop others
nats_access Protect NATS/director connection Allow only UID 0, drop others

Platform Integration

  • Added GetNatsFirewallHook() to Platform interface
  • LinuxPlatform initializes firewall and implements the hook
  • Stub implementations for Windows and dummy platforms

NATS Handler Integration

  • nats_handler.go calls BeforeConnect hook before NATS connection and on each reconnection
  • Supports DNS re-resolution on reconnect for HA director failover scenarios

Testing

  • 23 unit tests for firewall functionality (setup, cleanup, error handling)
  • 4 new tests for NATS handler firewall hook integration
  • All tests use dependency injection for nftables connection and DNS resolver

Technical Details

  • Uses github.com/google/nftables library which communicates via netlink (no nft CLI needed)
  • Works on Ubuntu Jammy and Noble without additional package installation
  • IPv4 and IPv6 support
  • Firewall rules are idempotent (chains are flushed before adding rules)

Testing Performed

  • Unit tests: ginkgo -r platform/firewall mbus
  • Manual testing on Noble stemcell in nested Garden environment:
    • Non-root user: curl 127.0.0.1:2822 hangs (blocked)
    • Root user: curl 127.0.0.1:2822 returns 401 (allowed through firewall)

Implement a firewall mechanism that restricts NATS (mbus) connections
to the bosh-agent process only, using UID-based filtering with nftables.

Key changes:
- Add platform/firewall package with Manager and NatsFirewallHook interfaces
- Implement NftablesFirewall that creates UID-based egress rules
- Add GetNatsFirewallHook() to Platform interface
- Integrate BeforeConnect hook in nats_handler.go for connection/reconnection
- Support DNS re-resolution on reconnect for HA failover scenarios
- Add stub implementations for Windows and dummy platforms

The firewall rules allow only the agent's UID to connect to NATS/director
ports while blocking other processes, improving security posture.
Add comprehensive unit tests for the new firewall functionality:

platform/firewall tests (23 tests):
- SetupMonitFirewall: table/chain/rule creation, error handling
- SetupNATSFirewall: IPv4/IPv6, DNS resolution, https/empty URL handling
- BeforeConnect: delegation to SetupNATSFirewall
- Cleanup: table deletion and error handling

mbus/nats_handler tests (4 new tests):
- Firewall hook is called on Start
- BeforeConnect receives correct mbus URL
- Handler still starts when hook returns nil
- Warning logged but no failure when BeforeConnect errors

Also:
- Add DNSResolver interface for testable DNS resolution
- Inject resolver dependency via NewNftablesFirewallWithDeps
- Configure test logging to use GinkgoWriter for visibility
- Fix ST1023 linter error: omit type from var declaration
- Add linux_header.txt for counterfeiter to add build tags to Linux-only fakes
- Regenerate fake_nftables_conn.go and fake_dnsresolver.go with //go:build linux tag
- This fixes macOS/Windows build failures due to google/nftables being Linux-only
var natsOptions = []nats.Option{
nats.RetryOnFailedConnect(true),
nats.DisconnectErrHandler(func(c *nats.Conn, err error) {
h.logger.Debug(natsHandlerLogTag, "Nats disconnected with Error: %v", err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was not introduced in your PR, but I see you handle err nil on line 170 and this seems to have similar issue?

}

// Update firewall rules before initial connection
h.updateFirewallForNATS()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SetupNetworking in ubuntu_net_manager is still calling SetupNatsFirewall. Will this be an issue to have both iptables with cgroup matching and nftables?

&expr.Cmp{
Op: expr.CmpOpEq,
Register: 1,
Data: net.ParseIP("127.0.0.1").To4(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to add a rule for ipv6 for loopback?

SetupNATSFirewall(mbusURL string) error

// Cleanup removes all agent-managed firewall rules
Cleanup() error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this is being called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, have removed it, since even during a restart of the agent (which sometimes happens during an update_settings) it is better to keep the firewall rules in place.

p.firewallManager = mgr

// Set up monit firewall rules immediately
if err := mgr.SetupMonitFirewall(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this can be more explicitly called during setup and not in a getter?

return f.conn.Flush()
}

func (f *NftablesFirewall) ensureTable() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these 2 methods always return nil?

@Alphasite
Copy link

Im a little worried by the general approach of teaching the agent about os-version specific things. Specifically I worry that it will (further?) violate layering by pushing version specific customisation from the stemcell into the agent.

Its probably ok here since this somewhat sits between agent setup and stemcell config, but i wanted to at least mention it even if nothing comes of it.

- Fix nil pointer dereference in DisconnectErrHandler when err is nil
- Remove iptables-based SetupNatsFirewall code (replaced by nftables)
- Remove unused Cleanup() method from firewall interface
- Move firewall initialization from lazy getter to explicit SetupFirewall()
- Add comment explaining IPv6 loopback is intentionally not protected
  (monit only binds to 127.0.0.1:2822)
@rkoster
Copy link
Contributor Author

rkoster commented Feb 6, 2026

Im a little worried by the general approach of teaching the agent about os-version specific things. Specifically I worry that it will (further?) violate layering by pushing version specific customisation from the stemcell into the agent.

There currently is nothing OS specific about this feature, because it works on both noble an jammy. So this is an effort to simplify and centralise all the different nats and monit firewall codepaths into the agent, where it can more easily be tested (compared to the stemcell builder).

The nftables library batches operations until Flush() is called, so
AddTable/AddChain/AddRule never return errors. Removing the misleading
error return types from these internal helper methods.
@rkoster rkoster requested a review from mariash February 6, 2026 20:14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was confusing as I was expecting a C header based on naming.

Maybe clearer would be to put it in firewallfakes/ and name it linux_build_flag_header.txt or - if not in the directory - have the name indicate counterfeiter or fakes: {counterfeiter|fakes}_linux_build_flag_header.txt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Pending Review | Discussion

Development

Successfully merging this pull request may close these issues.

4 participants