This document provides a deep dive into the go-microvm networking subsystem, including the in-process userspace network stack, wire protocol, firewall architecture, and extension points.
- Overview
- Architecture
- QEMU Wire Protocol
- Network Topology
- VirtualNetwork Lifecycle
- Firewall Architecture
- Performance
- Usage Examples
- Provider Interface
go-microvm uses a userspace network stack powered by gvisor-tap-vsock. All VM traffic flows as Ethernet frames with no kernel networking between host and guest, and no separate gvproxy binary is needed.
The gvisor-tap-vsock library (used by podman machine, lima, and libkrun) is imported directly as a Go dependency. It provides a complete virtual network stack including DHCP, DNS, and TCP port forwarding.
There are two networking modes:
- Runner-side (default): When no WithNetProvider() is set, the runner process creates a VirtualNetwork connected to libkrun via a socketpair. Port forwards are passed in the runner config JSON. This is the simplest path: no Unix socket and no external process coordination.
- Hosted (caller-side): When WithNetProvider() is set (e.g., with net/hosted.Provider), the VirtualNetwork runs in the caller's process and exposes a Unix socket that the runner connects to. This lets the caller access the VirtualNetwork directly for gonet listeners and HTTP services on the gateway IP.
Key properties:
- Userspace only: All packet processing happens in Go. No iptables, no eBPF, no network namespaces.
- Frame-level access: Every Ethernet frame passes through Go code, enabling the optional firewall to inspect and filter traffic.
- Shared topology: Network constants (subnet, gateway, IPs, MTU) are centralized in the net/topology package.
The networking subsystem has two modes, selected by how the caller configures it. In hosted mode, an optional firewall relay can also be inserted.
When no WithNetProvider() is set, the runner creates a VirtualNetwork
in-process and connects it to libkrun via a socketpair:
```text
+------------------- go-microvm-runner process -------------------+
|                                                                 |
|  +-----------+   socketpair   +-------------------+   Go net    |
|  |  libkrun  |   (fd pair)    |  VirtualNetwork   |-----------+ |
|  |  virtio-  |<==============>| (gVisor netstack) |           | |
|  |  net      | Ethernet frames|  DHCP, DNS,       |           | |
|  +-----+-----+                |  port forwarding  |           | |
|        |                      +-------------------+           | |
|  +-----v-----+                                                | |
|  | Guest VM  |                127.0.0.1:<port>                | |
|  | eth0:     |                port forward listeners          | |
|  | 192.168.  |                                                | |
|  | 127.2     |                                                | |
|  +-----------+                                                | |
+----------------------------------------------------------------+-+
                                                                |
                                                           +----v----+
                                                           |  Host   |
                                                           | Network |
                                                           +---------+
```
Port forwards are configured via the runner config JSON. The VirtualNetwork runs as goroutines in the runner process, tied to the VM's lifetime.
When WithNetProvider() is set (e.g., net/hosted.Provider), the
VirtualNetwork runs in the caller's process and exposes a Unix socket:
```text
+----------+                    +-------------------+          +---------+
| Guest VM |                    |  VirtualNetwork   |          |  Host   |
|          |    Unix socket     | (gVisor netstack) |  Go net  | Network |
| virtio-  |===(QEMU wire)=====>|                   |--------->|         |
| net      |    SOCK_STREAM     |  DHCP, DNS,       |          |         |
|          |   4B BE + frame    |  port forwarding  |          |         |
+----------+                    +-------------------+          +---------+
(in runner process)             (in caller's process)
```
When firewall rules are configured via WithFirewallRules() with a hosted
provider, a relay is inserted between the VM socket and the VirtualNetwork.
The relay intercepts every Ethernet frame, parses headers, and applies
allow/deny rules with stateful connection tracking:
```text
+----------+                  +-----------------+                +-------------------+
| Guest VM |                  | Relay + Filter  |                |  VirtualNetwork   |
|          |   Unix socket    |                 |    net.Pipe    | (gVisor netstack) |
| virtio-  |===(QEMU wire)===>| egress gor. --->|===(in-mem)====>|                   |
| net      |   SOCK_STREAM    | ingress gor.<---|<===(in-mem)====|  DHCP, DNS,       |
|          |  4B BE + frame   |                 |                |  port forwarding  |
+----------+                  | - parse ETH/IP  |                +---------+---------+
                              | - conntrack     |                          |
                              | - rule matching |                     +----v----+
                              | - metrics       |                     |  Host   |
                              +-----------------+                     | Network |
                                                                      +---------+
```
The relay creates a net.Pipe() -- one end is passed to
VirtualNetwork.AcceptQemu(), the other is used by the relay. Two goroutines
handle egress (VM to network) and ingress (network to VM) independently.
The QEMU transport is a stream protocol over a Unix domain socket
(SOCK_STREAM). Every Ethernet frame is prefixed with a 4-byte big-endian
length header.
```text
+----+----+----+----+----+----+----+----+----+-- ...
|  Length (4B BE)   |  Ethernet frame (N bytes)  ...
|<---- 4 bytes ---->|<---------- N bytes ------- ...
```
- Length field: uint32, big-endian. The value is the number of bytes in the Ethernet frame that follows; it does NOT include the 4-byte header itself.
- Ethernet frame: Raw L2 frame starting with the destination MAC address.
- No handshake: Data flows immediately after the socket connects.
- Max frame size: Practically bounded by the MTU (default 1500 bytes), i.e. at most 1514 bytes including the 14-byte Ethernet header.
libkrun's krun_add_net_unixstream speaks the QEMU wire protocol: a
SOCK_STREAM Unix socket with 4-byte big-endian length-prefixed Ethernet
frames. The gvisor-tap-vsock library's AcceptQemu() method uses the same
framing, making them directly compatible.
| Protocol | Header | Byte Order | Socket Type | Use Case |
|---|---|---|---|---|
| QEMU | 4 bytes | Big-endian | SOCK_STREAM | libkrun, QEMU |
| VfKit | None | N/A | SOCK_DGRAM | macOS Virtualization.framework |
| BESS | None | N/A | SOCK_SEQPACKET | User Mode Linux |
```text
+-------------------------------------------------------+
|                     Host Machine                      |
|                                                       |
|  +----------------+   Unix socket    +-----------+    |
|  | VirtualNetwork |--(SOCK_STREAM)-->|  libkrun  |    |
|  | (in-process)   |  4-byte BE len   |  virtio-  |    |
|  |                |  prefix frames   |  net      |    |
|  | Gateway:       |                  +-----+-----+    |
|  | 192.168.127.1  |                        |          |
|  |                |                  +-----v-----+    |
|  | DHCP server    |                  | Guest VM  |    |
|  | DNS server     |                  |           |    |
|  | Port forwards  |                  | eth0:     |    |
|  +-------+--------+                  | 192.168.  |    |
|          |                           | 127.2     |    |
|          | Port forwards:            |           |    |
|          +-- localhost:8080 -------->| guest:80  |    |
|          +-- localhost:2222 -------->| guest:22  |    |
|                                      +-----------+    |
+-------------------------------------------------------+
```
| Property | Value |
|---|---|
| Gateway | 192.168.127.1 (VirtualNetwork, in-process) |
| Guest IP | 192.168.127.2 (DHCP assigned) |
| Subnet | 192.168.127.0/24 |
| Socket type | Unix domain, SOCK_STREAM |
| Wire format | 4-byte big-endian length prefix + Ethernet frame |
| DHCP | Built into VirtualNetwork |
| DNS | Built into VirtualNetwork |
| Port forwarding | TCP, host-to-guest only |
When no custom provider is set, the runner's setupInProcessNetworking()
creates the VirtualNetwork during VM startup:
- Builds the port forward map from runner.Config.PortForwards.
- Creates a virtualnetwork.New() instance using constants from net/topology (subnet, gateway IP/MAC, MTU).
- Creates a socketpair(AF_UNIX, SOCK_STREAM).
- Wraps one end as a net.Conn and passes it to AcceptQemu() in a background goroutine.
- Returns the other fd to be passed to krun_add_net_unixstream().
The VirtualNetwork goroutines run alongside krun_start_enter() and are
torn down when the runner process exits (i.e., when the guest shuts down).
When using net/hosted.Provider, the lifecycle is managed in the caller's
process:
Provider.Start() performs the following:
- Creates a virtualnetwork.New() instance with the network configuration (subnet, gateway, port forwards, DHCP, DNS) using net/topology constants.
- Starts any registered HTTP services on the VirtualNetwork via VirtualNetwork.Listen().
- Creates a Unix listener at the socket path (hosted-net.sock in the data directory).
- If firewall rules are configured, creates a firewall.Filter and firewall.Relay, and starts the conntrack expiry goroutine.
- Starts an accept loop goroutine. For each runner connection:
  - With firewall: creates a net.Pipe(), runs the relay between the runner connection and the pipe, and passes the pipe to AcceptQemu().
  - Without firewall: passes the runner connection directly to AcceptQemu().
- Returns once the listener is ready.
Provider.Stop() tears down everything:
- Gracefully shuts down HTTP services (5-second timeout per service).
- Cancels the context, which signals all goroutines to exit.
- Closes the Unix listener.
- Waits for the accept loop and all connection handlers to finish.
- Removes the socket file.
All goroutines are tracked via sync.WaitGroup and context cancellation.
The firewall provides frame-level packet filtering with stateful connection tracking. It operates entirely in userspace by intercepting Ethernet frames as they pass between the VM socket and the VirtualNetwork.
The firewall inserts a relay between the VM's Unix socket and the VirtualNetwork. The relay reads each frame, parses the Ethernet and IP headers, applies firewall rules, and either forwards or drops the frame.
Each Ethernet frame is parsed at fixed offsets with zero allocations:
- Ethernet header (14 bytes): Destination MAC (6B), Source MAC (6B), EtherType (2B). EtherType 0x0800 = IPv4, 0x0806 = ARP, 0x86DD = IPv6.
- IPv4 header (20+ bytes, starts at offset 14): Protocol field at byte 23 (6=TCP, 17=UDP, 1=ICMP). Source IP at bytes 26-29, destination IP at bytes 30-33. The IHL field (low 4 bits of byte 14) gives the header length in 32-bit words.
- TCP/UDP header (starts at offset 14 + IHL*4): Source port (2B), destination port (2B).
Non-IPv4 frames (ARP, IPv6, LLDP) are always passed through without filtering. They are essential for the network stack to function (ARP resolution, etc.).
Each firewall rule specifies:
| Field | Type | Description |
|---|---|---|
| Direction | Ingress/Egress | Ingress = outside to VM, Egress = VM to outside |
| Action | Allow/Deny | What to do when matched |
| Protocol | uint8 | 6=TCP, 17=UDP, 1=ICMP; 0=any |
| SrcCIDR | net.IPNet | Source IP range |
| DstCIDR | net.IPNet | Destination IP range |
| SrcPort | uint16 | Source port; 0=any |
| DstPort | uint16 | Destination port; 0=any |
Rules are evaluated in order. First match wins (same as iptables). If no
rule matches, the default action applies (configurable via
WithFirewallDefaultAction(); defaults to Allow when no rules are set).
The firewall tracks active connections using a 5-tuple key:
connKey = { protocol, srcIP, dstIP, srcPort, dstPort }
When a rule allows a packet, the connection tracker records the flow. Return traffic (with source and destination swapped) is automatically allowed via a reverse-lookup in the connection table. This means you do not need explicit ingress rules for return traffic from allowed egress connections.
TTLs:
- TCP connections: 5 minutes idle timeout
- UDP flows: 30 seconds idle timeout
An expiry goroutine periodically sweeps the connection table to remove stale entries.
Memory: Each conntrack entry is approximately 100 bytes. A typical VM workload of 200-500 concurrent flows therefore uses roughly 20-50 KB.
For each frame, the filter follows this path:
- Conntrack fast path: Check if the packet belongs to an already-allowed flow via reverse-lookup. If yes, allow immediately (most common case for established connections).
- Rule walk: Iterate through rules in order. First match wins. If the matching rule allows the packet, record it in the connection tracker.
- Default action: If no rule matches, apply the configured default action (Allow or Deny).
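The three-step order can be condensed into one decide method. In this sketch, flow, matcher, and filter are hypothetical; rules are reduced to functions, the tracker has no TTLs, and flows accepted only by the default action are not tracked (the text does not say either way).

```go
package main

import "fmt"

type verdict bool

const (
	allow verdict = true
	deny  verdict = false
)

// flow is a 5-tuple; IPs are strings for brevity.
type flow struct {
	proto        uint8
	src, dst     string
	sport, dport uint16
}

func (f flow) reverse() flow { return flow{f.proto, f.dst, f.src, f.dport, f.sport} }

// matcher reports a verdict and whether the rule matched at all.
type matcher func(flow) (verdict, bool)

type filter struct {
	tracked map[flow]bool // flows allowed by a rule
	rules   []matcher
	def     verdict
}

func (f *filter) decide(p flow) verdict {
	// 1. Conntrack fast path: p is a reply to a flow a rule already allowed.
	if f.tracked[p.reverse()] {
		return allow
	}
	// 2. Rule walk: first match wins; allowed flows are recorded.
	for _, m := range f.rules {
		if v, ok := m(p); ok {
			if v == allow {
				f.tracked[p] = true
			}
			return v
		}
	}
	// 3. Default action.
	return f.def
}

func main() {
	f := &filter{
		tracked: map[flow]bool{},
		def:     deny,
		rules: []matcher{
			func(p flow) (verdict, bool) { return allow, p.proto == 6 && p.dport == 443 },
		},
	}
	out := flow{6, "192.168.127.2", "203.0.113.10", 40000, 443}
	in := flow{6, "203.0.113.10", "192.168.127.2", 5555, 22}
	fmt.Println(f.decide(out) == allow)           // true (rule walk)
	fmt.Println(f.decide(out.reverse()) == allow) // true (conntrack fast path)
	fmt.Println(f.decide(in) == allow)            // false (default deny)
}
```

Checking conntrack first is what keeps established flows cheap: a reply never reaches the rule walk, so its cost does not grow with the number of rules.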
The relay runs two goroutines:
- Egress goroutine: Reads frames from the VM socket, applies filter, writes to the VirtualNetwork pipe.
- Ingress goroutine: Reads frames from the VirtualNetwork pipe, applies filter, writes to the VM socket.
Each goroutine uses a buffered reader (64 KB) and a reusable frame buffer. Frames that the filter denies are silently dropped (not forwarded). Atomic counters track forwarded frames, dropped frames, and bytes forwarded.
The firewall adds minimal overhead per Ethernet frame:
| Operation | Cost |
|---|---|
| Read frame (4-byte prefix + payload) | Required regardless -- no added cost |
| Parse Ethernet + IPv4 headers | ~10ns -- fixed-offset reads, no allocations |
| Connection tracker lookup | ~20ns -- map lookup under RLock |
| Rule matching (per rule, on miss) | ~5ns -- simple comparisons |
| Write frame (forward) | Required regardless -- no added cost |
Total added latency: ~50-100ns per frame. At 1 Gbps with 1500-byte frames (~83,000 frames/sec), the firewall adds roughly 4-8 ms of CPU time per second, i.e. under 1% of one core. This is negligible at typical VM throughput.
Memory overhead:
- Connection tracker: ~100 bytes per entry; a typical 200-500 entries = ~20-50 KB
- Frame buffer: ~2 KB per direction, reused (in addition to the 64 KB buffered reader per direction)
- Rule slice: typically <20 rules = negligible
Allow the VM to make DNS queries and HTTPS connections, but deny all other outbound traffic. Inbound traffic is denied except on explicitly allowed ports. Return traffic for allowed connections is automatically permitted via connection tracking.
```go
import "github.com/stacklok/go-microvm/net/firewall"

vm, err := microvm.Run(ctx, "my-app:latest",
	microvm.WithPorts(
		microvm.PortForward{Host: 8080, Guest: 80},
		microvm.PortForward{Host: 2222, Guest: 22},
	),
	microvm.WithFirewallDefaultAction(firewall.Deny),
	microvm.WithFirewallRules(
		// Egress: allow DNS and HTTPS
		firewall.Rule{
			Direction: firewall.Egress,
			Action:    firewall.Allow,
			Protocol:  17, // UDP
			DstPort:   53, // DNS
		},
		firewall.Rule{
			Direction: firewall.Egress,
			Action:    firewall.Allow,
			Protocol:  6,   // TCP
			DstPort:   443, // HTTPS
		},
		// Ingress: allow SSH and HTTP
		firewall.Rule{
			Direction: firewall.Ingress,
			Action:    firewall.Allow,
			Protocol:  6,
			DstPort:   22,
		},
		firewall.Rule{
			Direction: firewall.Ingress,
			Action:    firewall.Allow,
			Protocol:  6,
			DstPort:   80,
		},
	),
)
```

Allow all egress, so the VM can reach the internet, but accept ingress only on specific TCP ports:

```go
vm, err := microvm.Run(ctx, "my-server:latest",
	microvm.WithPorts(
		microvm.PortForward{Host: 8443, Guest: 443},
		microvm.PortForward{Host: 6443, Guest: 6443},
	),
	microvm.WithFirewallDefaultAction(firewall.Deny),
	microvm.WithFirewallRules(
		// Allow all egress (VM can reach the internet)
		firewall.Rule{
			Direction: firewall.Egress,
			Action:    firewall.Allow,
		},
		// Allow specific ingress ports
		firewall.Rule{
			Direction: firewall.Ingress,
			Action:    firewall.Allow,
			Protocol:  6,
			DstPort:   443,
		},
		firewall.Rule{
			Direction: firewall.Ingress,
			Action:    firewall.Allow,
			Protocol:  6,
			DstPort:   6443,
		},
		firewall.Rule{
			Direction: firewall.Ingress,
			Action:    firewall.Allow,
			Protocol:  6,
			DstPort:   22,
		},
	),
)
```

When no firewall rules are configured, all traffic passes through unrestricted. This is the default behavior:

```go
vm, err := microvm.Run(ctx, "alpine:latest",
	microvm.WithPorts(microvm.PortForward{Host: 8080, Guest: 80}),
)
```

WithEgressPolicy() restricts VM outbound traffic to a set of allowed DNS
hostnames. Instead of writing firewall rules for specific IPs (which change
often), you specify hostnames and let go-microvm handle the rest.

```go
vm, err := microvm.Run(ctx, "my-app:latest",
	microvm.WithPorts(microvm.PortForward{Host: 8080, Guest: 80}),
	microvm.WithEgressPolicy(microvm.EgressPolicy{
		AllowedHosts: []microvm.EgressHost{
			{Name: "api.github.com", Ports: []uint16{443}},
			{Name: "*.docker.io"},
			{Name: "ntp.ubuntu.com", Ports: []uint16{123}, Protocol: 17},
		},
	}),
)
```

How it works:
- The firewall default action is forced to Deny. A hosted network provider is auto-created if none was configured.
- Implicit firewall rules are added for DNS (to the gateway), DHCP, and port-forwarded ingress ports.
- A DNSInterceptor is wired into the relay between the VM and the VirtualNetwork.
- Egress DNS queries: The interceptor parses each outbound DNS query. If the queried hostname is not in the allowlist, it returns an NXDOMAIN response directly to the VM. Allowed queries pass through normally.
- Ingress DNS responses: For allowed hostnames, the interceptor parses A records from responses and creates temporary firewall rules for those IPs. The rule TTL matches the DNS record TTL (minimum 60 seconds).
- The VM can only connect to IPs that were resolved from allowed hostnames. All other egress traffic is denied by the default-deny policy.
Interaction with static firewall rules:
Static rules added via WithFirewallRules() are evaluated before dynamic
rules. You can use static rules alongside an egress policy to allow
additional traffic (e.g., specific IP ranges) that doesn't go through DNS.
Implicit rules (DNS, DHCP, port forwards) are placed ahead of user rules.
Limitations:
- Hardcoded IPs bypass DNS: If the VM connects to an IP address directly (without DNS resolution), the hostname allowlist is never consulted. The default-deny policy still blocks such connections, because only IPs learned from allowed DNS responses get dynamic allow rules. The exception is an IP that a static rule permits or that was recently resolved from an allowed hostname.
- DNS-over-HTTPS (DoH): Blocked by the default-deny policy since HTTPS to DoH servers would need to be in the allowlist. Standard DNS over UDP port 53 is the only supported resolution path.
- IPv6: Only IPv4 A records create dynamic rules. AAAA records are ignored.
The networking layer is abstracted behind the net.Provider interface:
```go
type Provider interface {
	// Start launches the network provider. Must block until ready.
	Start(ctx context.Context, cfg Config) error
	// SocketPath returns the Unix socket path for virtio-net.
	SocketPath() string
	// Stop terminates the provider and cleans up.
	Stop()
}
```

Config contains:
- LogDir: directory for log files
- Forwards: slice of PortForward{Host, Guest} for TCP forwarding
- FirewallRules: optional packet filtering rules for frame-level filtering
- FirewallDefaultAction: default action when no rule matches (Allow or Deny)
By default (no WithNetProvider()), networking runs inside the runner
process. The net/hosted package provides a ready-made hosted provider
that runs the VirtualNetwork in the caller's process with support for
HTTP services on the gateway IP.
To replace the default runner-side networking with an alternative backend (e.g., passt, slirp4netns, or a custom bridge):
- Implement the net.Provider interface.
- Start() must block until the Unix socket is ready for connections.
- The socket must use SOCK_STREAM with 4-byte big-endian length-prefixed Ethernet frames (the QEMU transport protocol).
- Pass your provider via microvm.WithNetProvider(myProvider).
The SocketPath() return value is passed to the runner as the Unix socket
path for krun_add_net_unixstream. See net/hosted/provider.go for the
reference implementation.