Problem Statement
When a sandbox denies a network connection, the current remediation loop is entirely client-side: an agent skill must parse logs, detect denials, generate a policy update, and push it via openshell sandbox policy set. This requires significant client-side work, relies on correct agent skill execution, and provides no system-level enforcement for the approval step (no HITL gate). The best UX would be for sandboxes to auto-detect when network policies are preventing connections, generate structured policy change recommendations, and send them through the existing sandbox → gateway → user communication channel with an explicit human-in-the-loop approval step before any policy change takes effect.
Architecture: Sandbox Aggregator + Sandbox PolicyAdvisor + Gateway Persistence
The sandbox runs both a DenialAggregator and a lightweight PolicyAdvisor agent harness. The aggregator groups denials locally (typed data, precise timing, L7 samples). The PolicyAdvisor analyzes summaries using the cluster-wide inference model via inference.local, validates proposals against the local OPA engine, and submits proposed chunks to the gateway. The gateway is a thin persistence + validation + approval layer — it never calls an LLM.
Why Sandbox-Side LLM
Zero LLM configuration. Uses the existing cluster inference setting (openshell cluster inference set). Every sandbox already has inference.local access via the proxy fast path (proxy.rs:217, bypasses OPA). No new env vars, API keys, or network policy entries.
Distributed scaling. Each sandbox makes its own LLM calls via inference.local. N sandboxes = N independent analysis pipelines. No gateway bottleneck.
Pre-validated proposals. The sandbox has the OPA engine — it validates proposed rules locally before submission. The gateway doesn't run OPA. Only the sandbox can catch complete-rule conflicts, L7 config inconsistencies, and schema errors before they reach the user.
Deep policy knowledge. The PolicyAdvisor embeds extracted knowledge from the generate-sandbox-policy skill — validation rules, decision trees, access presets, glob patterns, private IP handling, 20+ reference examples (~4000-5000 tokens, bundled in sandbox binary).
Thin gateway. Gateway is purely persistence + validation + approval workflow. No LLM client, no analysis triggers, no context window management.
Trust Model
The sandbox proposes policy chunks, but dual control ensures safety:
The PolicyAdvisor is our code (baked into the sandbox binary), not untrusted user code
The gateway validates all proposed chunks (rejects loopback/link-local, rate-limits, format checks)
The user approves every chunk (human-in-the-loop is mandatory)
LLM calls traverse the audited inference.local path
Core Concepts
Living Draft Policy
The system maintains a continuously-evolving draft policy per sandbox — not a queue of individual recommendations. The draft is composed of granular chunks, each a proposed rule addition tied to specific denial events. Users can view the full draft, inspect individual chunks with rationale, and selectively approve/reject at any time.
PolicyChunk Lifecycle
pending --> approved --> (superseded by Stage 2 refinement)
        \-> rejected --> (superseded if re-analysis produces new chunk)
Each chunk carries: proposed rule, rationale, security notes, confidence score, denial references, stage (initial/refined), and optional supersession pointer.
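A minimal Rust sketch of what such a chunk record could look like (type and field names are illustrative, not the actual proto definitions):

```rust
// Illustrative sketch only: names do not match the real proto/navigator.proto types.

#[derive(Debug, Clone, PartialEq)]
enum ChunkState {
    Pending,
    Approved,
    Rejected,
    Superseded { by_chunk_id: String }, // supersession pointer
}

#[derive(Debug, Clone)]
enum Stage {
    Initial, // Stage 1: L4 denial -> audit-mode rule
    Refined, // Stage 2: L7 audit data -> narrowed rule
}

#[derive(Debug, Clone)]
struct PolicyChunk {
    id: String,
    proposed_rule: String,    // serialized rule addition
    rationale: String,
    security_notes: Vec<String>,
    confidence: f32,          // 0.0..=1.0
    denial_refs: Vec<String>, // denial events that motivated this chunk
    stage: Stage,
    supersedes_chunk_id: Option<String>,
    state: ChunkState,
}

impl PolicyChunk {
    fn new(id: &str, proposed_rule: &str, stage: Stage) -> Self {
        PolicyChunk {
            id: id.into(),
            proposed_rule: proposed_rule.into(),
            rationale: String::new(),
            security_notes: Vec::new(),
            confidence: 0.0,
            denial_refs: Vec::new(),
            stage,
            supersedes_chunk_id: None,
            state: ChunkState::Pending,
        }
    }

    /// Mark this chunk as replaced by a newer one (e.g. a Stage 2 refinement).
    fn supersede(&mut self, by: &PolicyChunk) {
        self.state = ChunkState::Superseded { by_chunk_id: by.id.clone() };
    }
}
```

The Superseded state carries a pointer to the replacing chunk, matching the supersession pointer described above.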
Progressive L7 Visibility
Policy recommendations follow a two-stage pipeline that leverages L7 inspection for data-driven refinement:
Stage 1 (L4 denial → initial recommendation): When a new host:port is first denied and the port supports HTTP inspection (80, 443, 8080, etc.), the system recommends a rule with L7 audit mode: protocol: rest, tls: terminate (for 443), enforcement: audit, broad access. Traffic flows immediately while every HTTP request is logged with method + path by the existing L7 relay.
Stage 2 (L7 audit data → refined recommendation): The aggregator collects L7 audit events with (method, path) samples. Once enough data accumulates, a refined chunk replaces the audit-mode rule with specific access presets or explicit L7 rules based on observed traffic patterns. The refined chunk supersedes the Stage 1 chunk.
This eliminates the two-approval-cycle problem (read-only approved, then write needed) and produces data-driven L7 recommendations instead of guessing access levels. For L4-only protocols (SSH, databases, Kafka, etc.), Stage 1 produces a plain L4 rule with no Stage 2.
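As an illustration, the two stages might produce rules like the following (the YAML schema and field values here are a sketch built from the fields named above, not the actual policy format):

```yaml
# Stage 1: broad audit-mode rule proposed after the first L4 denial of api.example.com:443.
# Traffic flows immediately; the L7 relay logs every request's method + path.
- host: api.example.com
  port: 443
  protocol: rest
  tls: terminate
  enforcement: audit
  access: full        # deliberately broad while auditing

# Stage 2: refined rule generated from the observed (method, path) samples;
# supersedes the Stage 1 chunk on approval.
- host: api.example.com
  port: 443
  protocol: rest
  tls: terminate
  enforcement: enforce
  access: read-only   # all observed requests were GET/HEAD/OPTIONS
```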
DenialAggregator
Groups by primary key (host, port, binary) with dedup windows. Key features:
Accurate counts: suppressed_count + total_count (not just threshold value)
Memory bounds: max_keys=1000 cap with overflow detection
Entry expiry: persistent_threshold + periodic stale-flush
Credential sanitization: strips Authorization, API keys, cookies from cmdline before storage
L7 event ingestion: collects per-request (method, path, decision) samples from L7 relay events (capped at 50 per entry)
DNS probe: one-shot lookup_host() at entry creation to detect private IPs early, enabling allowed_ips in initial chunks
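The bookkeeping above can be sketched in Rust (constants, field names, and the dedup window length are assumptions drawn from the bullets, not the real implementation):

```rust
// Illustrative sketch of the DenialAggregator bookkeeping; the real crate's
// types, window sizes, and expiry logic may differ.
use std::collections::HashMap;
use std::time::{Duration, Instant};

const MAX_KEYS: usize = 1000;      // memory bound
const MAX_L7_SAMPLES: usize = 50;  // per-entry L7 sample cap
const DEDUP_WINDOW: Duration = Duration::from_secs(5); // assumed window

#[derive(Hash, PartialEq, Eq, Clone)]
struct Key { host: String, port: u16, binary: String }

struct Entry {
    total_count: u64,
    suppressed_count: u64,
    last_seen: Instant,
    l7_samples: Vec<(String, String)>, // (method, path), capped
}

#[derive(Default)]
struct DenialAggregator {
    entries: HashMap<Key, Entry>,
    overflowed: bool, // set when MAX_KEYS is exceeded
}

impl DenialAggregator {
    /// Record a denial; returns true if it was deduplicated (suppressed).
    fn record(&mut self, key: Key, now: Instant) -> bool {
        if !self.entries.contains_key(&key) && self.entries.len() >= MAX_KEYS {
            self.overflowed = true; // overflow detection: drop, but remember
            return true;
        }
        let e = self.entries.entry(key).or_insert_with(|| Entry {
            total_count: 0,
            suppressed_count: 0,
            last_seen: now - DEDUP_WINDOW, // first record always surfaces
            l7_samples: Vec::new(),
        });
        e.total_count += 1;
        let suppressed = now.duration_since(e.last_seen) < DEDUP_WINDOW;
        if suppressed { e.suppressed_count += 1; } else { e.last_seen = now; }
        suppressed
    }

    /// Ingest a per-request L7 audit sample, capped per entry.
    fn add_l7_sample(&mut self, key: &Key, method: &str, path: &str) {
        if let Some(e) = self.entries.get_mut(key) {
            if e.l7_samples.len() < MAX_L7_SAMPLES {
                e.l7_samples.push((method.to_string(), path.to_string()));
            }
        }
    }
}
```

Tracking suppressed_count alongside total_count is what lets summaries report accurate denial volume rather than just "threshold reached".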
PolicyAdvisor Agent Harness
A lightweight agent — not general-purpose tool-calling, but a fixed 1-2 call pattern:
First call: Analyze denials + propose chunks (using embedded skill context as system prompt via inference.local)
Validate each proposed chunk against the local OPA engine (check conflicts, L7 config, breadth warnings)
Second call (if needed): Fix chunks that failed validation, including the error messages
Submit validated chunks + summaries to gateway via SubmitPolicyAnalysis RPC
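That fixed pattern can be sketched as a small control loop; here propose, validate, and fix stand in for the inference.local call, the local OPA engine, and the follow-up fix call, and the per-proposal retry is a simplification of the single batched second call:

```rust
// Illustrative control flow only; the real harness batches all validation
// failures into one second LLM call rather than fixing per proposal.

struct Proposal { rule: String }
struct ValidationError { message: String }

/// Fixed 1-2 call pattern: propose, validate locally, optionally one fix pass.
/// Returns only proposals that passed validation.
fn advise(
    propose: impl Fn() -> Vec<Proposal>,
    validate: impl Fn(&Proposal) -> Result<(), ValidationError>,
    fix: impl Fn(&Proposal, &ValidationError) -> Proposal,
) -> Vec<Proposal> {
    let mut valid = Vec::new();
    for p in propose() {                      // first call: analyze + propose
        match validate(&p) {
            Ok(()) => valid.push(p),
            Err(e) => {
                let fixed = fix(&p, &e);      // second call, with error message
                if validate(&fixed).is_ok() { // re-validate; drop if still broken
                    valid.push(fixed);
                }
            }
        }
    }
    valid
}
```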
The skill context (~4000-5000 tokens) is extracted from the existing generate-sandbox-policy skill and bundled in the sandbox binary. It includes: validation rules, access preset definitions, L4 vs L7 decision tree, glob pattern translation, private IP / SSRF rules, protocol reference table, auth chain patterns, and reference examples.
Mechanistic mode: When cluster inference is not configured, the PolicyAdvisor skips LLM calls and runs rule-based analysis. Stage 1 produces L7 audit-mode rules for HTTP ports; Stage 2 computes access level from observed methods (all GET/HEAD/OPTIONS → read-only, POST but no DELETE → read-write, etc.).
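The Stage 2 access heuristic reduces to a pure function over the observed methods (the exact preset mapping beyond the examples given above is an assumption):

```rust
#[derive(Debug, PartialEq)]
enum AccessLevel { ReadOnly, ReadWrite, Full }

/// Mechanistic Stage 2: derive an access preset from observed HTTP methods.
/// Assumed mapping: safe methods only -> read-only; writes without DELETE
/// -> read-write; anything else (e.g. DELETE observed) -> full.
fn access_from_methods(methods: &[&str]) -> AccessLevel {
    let all_safe = methods.iter().all(|m| matches!(*m, "GET" | "HEAD" | "OPTIONS"));
    let has_delete = methods.iter().any(|m| *m == "DELETE");
    if all_safe {
        AccessLevel::ReadOnly
    } else if !has_delete {
        AccessLevel::ReadWrite
    } else {
        AccessLevel::Full
    }
}
```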
DNS Resolution: Sandbox Probe + Gateway Verification
The DenialAggregator performs a speculative DNS lookup when it first observes a new (host, port) key. This enables the PolicyAdvisor to include allowed_ips in Stage 1 chunks for hosts that resolve to private IPs — without a second approval cycle.
Sandbox probe (untrusted, best-effort): One-shot tokio::net::lookup_host() at entry creation. If DNS resolves to RFC1918, the proposed rule includes allowed_ips: ["x.x.x.x/32"].
Gateway verification (trusted): Re-resolves independently on SubmitPolicyAnalysis receipt. If sandbox and gateway DNS diverge, gateway's resolution wins and a security warning is added.
Trusted CIDRs: --trusted-cidr at sandbox creation pre-authorizes known-good ranges, replacing per-host /32s with subnet CIDRs for environments with many internal services on the same subnet.
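The probe's private-IP handling reduces to a small check; this sketch uses a hand-rolled RFC 1918 test (std's Ipv4Addr::is_private covers the same ranges) and synchronous types in place of the tokio::net::lookup_host call:

```rust
use std::net::Ipv4Addr;

/// RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
/// (Equivalent to std's Ipv4Addr::is_private.)
fn is_rfc1918(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    o[0] == 10
        || (o[0] == 172 && (16..=31).contains(&o[1]))
        || (o[0] == 192 && o[1] == 168)
}

/// For each probed address that is private, pin it as a /32 entry for the
/// proposed rule's allowed_ips field.
fn allowed_ips_for(resolved: &[Ipv4Addr]) -> Vec<String> {
    resolved
        .iter()
        .filter(|ip| is_rfc1918(**ip))
        .map(|ip| format!("{ip}/32"))
        .collect()
}
```

Because the sandbox probe is untrusted, these /32s are only a suggestion; the gateway re-resolves and its answer wins on divergence.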
Affected Components
proto/navigator.proto, proto/sandbox.proto: new messages (L7RequestSample, DenialSummary, PolicyChunk, SubmitPolicyAnalysis), new RPCs, DraftPolicyUpdate stream event
crates/navigator-server/src/grpc.rs, persistence/: DraftPolicy store, SubmitPolicyAnalysis handler, draft query/approval RPCs, DNS re-verification
crates/navigator-sandbox/src/proxy.rs, l7/relay.rs
crates/navigator-sandbox/src/ (new modules): DenialAggregator, PolicyAdvisor agent harness, SubmitPolicyAnalysis client
crates/navigator-cli/src/main.rs, run.rs: sandbox draft commands, cluster policy-advisor toggle
crates/navigator-tui/src/app.rs, ui/
Key Design Decisions
LLM location: sandbox-side via inference.local (not gateway)
Aggregation key: (host, port, binary) primary + L7 (method, path) sub-samples
DNS resolution: sandbox DNS probe at aggregation + gateway re-verification
Chunk granularity: coarse by default (LLM groups related services)
Auto-expiry: 1h default for pending chunks
Rejection semantics: backoff after 2 rejections for same (host, port); draft retry to re-queue
Risks & Open Questions
Trust & Security
Privilege escalation pathway: A compromised sandbox could craft denial patterns to manipulate operators into approving access to sensitive endpoints. Mitigation: gateway validates proposals (reject loopback/link-local/metadata IPs), dual-control approval, audit trail.
Sandbox DNS untrusted: Speculative DNS probe results could be poisoned. Mitigation: gateway re-verifies independently; gateway's resolution is the trust anchor.
Rate limiting: Max 10 pending chunks per sandbox, adaptive analysis intervals (10s cold-start → 2m steady-state). Gateway-enforced.
Strictly additive: Recommendations can only propose adding entries to network_policies. Cannot modify static fields, remove restrictions, or change network mode.
Open Questions
1. Chunk granularity — fine (per host:port) or coarse (per activity pattern)? Recommendation: coarse by default; the LLM groups related services.
2. Auto-expiry for pending chunks? Recommendation: yes, 1h default with configurable TTL.
3. Cross-sandbox learning? Recommendation: defer to v2.
4. Agent workflow UX — autonomous agents can't pause for approval. Recommendation: policy templates + audit mode as interim; needs further design.
5. Wildcard host matching (*.googleapis.com)? Recommendation: defer to v2; each unique hostname is a separate rule entry for now.
Test Considerations
Unit tests: DenialAggregator dedup/threshold/cooldown/L7 sample logic, OPA validation of proposed chunks, cmdline sanitization, DNS probe
Security tests: SSRF protections after approved endpoints, static field immutability through merge path, rate limiting, gateway DNS re-verification vs sandbox probe divergence
Integration tests: sandbox draft / draft approve flow, TUI draft panel rendering
Implementation Plan
This feature is split into two issues for incremental delivery:
Issue 1 is fully functional on its own — mechanistic recommendations (denied host:port to allow rule, with L7 audit-then-refine). Issue 2 upgrades the analysis to produce intelligent, grouped, security-aware recommendations: inference.local, skill context extraction, OPA validation loop, context window management, progressive L7 intelligence (~5-6 days).
Full design: https://gitlab-master.nvidia.com/-/snippets/12930
Proto Changes
New messages:
L7RequestSample — observed HTTP method+path pattern from L7 inspection
DenialSummary — with l7_request_samples, l7_inspection_active, denial_stage (l4_deny | l7_deny | l7_audit | ssrf), sandbox-probed resolved_ips
PolicyChunk — with stage (initial/refined), supersedes_chunk_id, security_notes, confidence
SubmitPolicyAnalysisRequest/Response — atomic submission of summaries + proposed chunks + analysis mode
DraftPolicyUpdate — new SandboxStreamEvent variant for real-time notifications
New RPCs:
SubmitPolicyAnalysis — sandbox → gateway
GetDraftPolicy, ApproveDraftChunk, RejectDraftChunk, ApproveAllDraftChunks, EditDraftChunk, UndoDraftChunk, GetDraftHistory — CLI/TUI → gateway
CLI UX
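For concreteness, a hedged sketch of the PolicyChunk message using the fields listed above (field numbering, enum shape, and exact types are assumptions, not the real proto/navigator.proto contents):

```protobuf
// Sketch only: numbering and types are illustrative assumptions.
message PolicyChunk {
  string chunk_id = 1;
  string proposed_rule = 2;       // serialized rule addition
  string rationale = 3;
  repeated string security_notes = 4;
  float confidence = 5;
  repeated string denial_refs = 6;
  enum Stage {
    STAGE_UNSPECIFIED = 0;
    INITIAL = 1;                  // Stage 1: L4 denial -> audit-mode rule
    REFINED = 2;                  // Stage 2: refined from L7 audit data
  }
  Stage stage = 7;
  string supersedes_chunk_id = 8; // empty unless this chunk refines another
}
```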
Phased Implementation
Phase 0 (cold-start friction reduction): policy templates (--policy-template python-dev/node-dev/agent-coding) + trusted networks (--trusted-cidr)
Phase 1 (gateway MVP): SubmitPolicyAnalysis RPC + draft query/approval RPCs + basic CLI (sandbox draft, draft approve, draft reject) + gateway-side DNS verification + cmdline sanitization
Phase 2 (sandbox MVP, mechanistic recommendations): DenialAggregator (counts, caps, expiry, stale-flush, L7 event ingestion, DNS probe) + mechanistic PolicyAdvisor (Stage 1 L7 audit + Stage 2 access refinement, no LLM) + SubmitPolicyAnalysis client
Phase 3 (LLM intelligence): PolicyAdvisor agent harness — skill context extraction + inference.local LLM calls + OPA validation loop + fix-and-retry + adaptive triggers + pre-filtering + context window management. CLI: cluster policy-advisor enable/disable
Phase 4 (full UX): advanced draft commands (draft edit, draft reject --reason, draft undo, draft history) + TUI draft panel with keybindings + chunk supersession UX
Phase 5 (security hardening): approve --all safety gate + rejection backoff + hostname normalization + sensitive endpoint blocklist
Total: ~24-30 days. Phases 0-1 ship independently.