Problem Statement
Operators need a durable way to collect sandbox agent and supervisor logs, especially on Kubernetes. Today openshell logs is useful for interactive diagnosis, but it is backed by a bounded in-memory gateway buffer and is not sufficient as the production log export path. OpenShell already writes the complete sandbox log record to files inside the sandbox, so the investigation should look for the simplest architecture that lets operators collect those files across compute drivers.
This is a focused follow-up to #1055 and was motivated by the OCSF JSONL validation work in #1917 / #1921.
Technical Context
OpenShell currently has two sandbox log paths:
- The supervisor process writes local files inside the sandbox, including the shorthand log and opt-in OCSF JSONL file.
- A tracing layer pushes sandbox log lines over gRPC to the gateway, where they are kept in a per-sandbox in-memory tail buffer for openshell logs, the TUI, and watch streams.
The file-backed path is the complete record. The gRPC push path is best-effort and volatile. On Kubernetes, operators naturally expect cluster log collection via pod logs, sidecars, daemonsets, or OpenTelemetry collectors, but the OpenShell files live inside the agent container filesystem and are not directly exposed to kubectl logs or standard file collectors unless the pod has an appropriate shared volume or sidecar.
Affected Components
| Component | Key Files | Role |
| --- | --- | --- |
| Public API / CLI log access | proto/openshell.proto, crates/openshell-cli/src/run.rs | Defines GetSandboxLogs, PushSandboxLogs, WatchSandbox, and implements openshell logs. |
| Gateway log buffer | crates/openshell-server/src/tracing_bus.rs, crates/openshell-server/src/grpc/policy.rs | Stores pushed logs in memory and serves tail/watch requests. |
| Sandbox logging | crates/openshell-sandbox/src/main.rs, crates/openshell-supervisor-process/src/log_push.rs, crates/openshell-ocsf/src/tracing_layers/jsonl_layer.rs | Writes local files, emits OCSF shorthand/JSONL, and pushes best-effort lines to the gateway. |
| Settings | crates/openshell-core/src/settings.rs | Defines ocsf_json_enabled, which controls full OCSF JSONL output. |
| Compute-driver contract | proto/compute_driver.proto | Driver-neutral contract currently has lifecycle/status operations but no log collection/export contract. |
| Kubernetes driver and Helm chart | crates/openshell-driver-kubernetes/src/driver.rs, crates/openshell-driver-kubernetes/src/config.rs, deploy/helm/openshell/values.yaml | Builds sandbox pod specs and is the natural place for sidecar/shared-volume support. |
| Docker / Podman drivers | crates/openshell-driver-docker/src/lib.rs, crates/openshell-driver-podman/src/container.rs, crates/openshell-core/src/driver_mounts.rs | Already support validated driver-owned mounts that could support file-based log collection on local drivers. |
| Docs / architecture | docs/observability/accessing-logs.mdx, docs/observability/ocsf-json-export.mdx, architecture/sandbox.md, architecture/gateway.md | Document that gateway log storage is bounded/volatile and files are durable. |
Technical Investigation
Architecture Overview
openshell logs resolves a sandbox name to an ID and then either calls WatchSandbox for tailing or GetSandboxLogs for one-shot reads. Both paths read from gateway-side streams, not from the sandbox filesystem. The gateway receives sandbox-originated logs through PushSandboxLogs, normalizes their source to sandbox, and publishes them into TracingLogBus.
TracingLogBus maintains a broadcast channel and a per-sandbox VecDeque tail buffer. The default tail cap is 2000 lines. The buffer is process-local memory and is dropped on gateway restart, sandbox deletion, or gateway rotation.
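The bounded-tail behavior can be illustrated with a minimal sketch. The `TailBus` type and its methods below are hypothetical stand-ins written for this issue, not the actual TracingLogBus API; the sketch only assumes a per-sandbox VecDeque that evicts the oldest line once a cap is reached, and it omits the broadcast channel entirely.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a per-sandbox, in-memory tail buffer.
/// Names and signatures are illustrative, not taken from tracing_bus.rs.
pub struct TailBus {
    cap: usize,
    tails: HashMap<String, VecDeque<String>>,
}

impl TailBus {
    /// `cap` is the maximum number of lines kept per sandbox (2000 by default in the gateway).
    pub fn new(cap: usize) -> Self {
        Self { cap, tails: HashMap::new() }
    }

    /// Append a line, evicting the oldest line once the cap is reached.
    pub fn publish(&mut self, sandbox_id: &str, line: &str) {
        if self.cap == 0 {
            return; // a zero cap keeps nothing
        }
        let tail = self.tails.entry(sandbox_id.to_string()).or_default();
        if tail.len() == self.cap {
            tail.pop_front();
        }
        tail.push_back(line.to_string());
    }

    /// Return up to `n` of the most recent lines, oldest first.
    pub fn tail(&self, sandbox_id: &str, n: usize) -> Vec<String> {
        match self.tails.get(sandbox_id) {
            Some(tail) => {
                let skip = tail.len().saturating_sub(n);
                tail.iter().skip(skip).cloned().collect()
            }
            None => Vec::new(),
        }
    }

    /// Deleting a sandbox discards everything buffered for it.
    pub fn remove(&mut self, sandbox_id: &str) {
        self.tails.remove(sandbox_id);
    }
}
```

Anything older than the cap, or buffered when the process exits, is gone, which is why this path cannot serve as the archival record.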
Inside the sandbox, tracing is layered so the shorthand formatter writes to stderr and the local file appender, while the JSONL layer writes OCSF JSONL when enabled. The same subscriber also installs the best-effort LogPushLayer. That push layer uses bounded channels, try_send, batching, reconnect, and backoff. Its behavior is intentionally non-blocking, so it can drop logs when the sandbox is under pressure or disconnected.
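A rough sketch of the drop-on-pressure pattern follows. It swaps in the standard library's bounded `sync_channel` so the example stays self-contained; the real LogPushLayer may use a different channel type, and `PushHandle`, `push_channel`, and `drain_batch` are hypothetical names. Reconnect and backoff are left out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical producer side of a best-effort log push channel.
pub struct PushHandle {
    tx: SyncSender<String>,
    dropped: AtomicU64,
}

impl PushHandle {
    /// Try to queue a line without blocking the caller.
    /// Returns false (and counts a drop) if the queue is full or the receiver is gone.
    pub fn push(&self, line: String) -> bool {
        match self.tx.try_send(line) {
            Ok(()) => true,
            Err(_) => {
                self.dropped.fetch_add(1, Ordering::Relaxed);
                false
            }
        }
    }

    /// Number of lines dropped so far.
    pub fn dropped(&self) -> u64 {
        self.dropped.load(Ordering::Relaxed)
    }
}

/// Create a bounded channel; `capacity` should be at least 1.
pub fn push_channel(capacity: usize) -> (PushHandle, Receiver<String>) {
    let (tx, rx) = sync_channel(capacity);
    (PushHandle { tx, dropped: AtomicU64::new(0) }, rx)
}

/// Consumer side: collect up to `max_batch` queued lines without blocking.
pub fn drain_batch(rx: &Receiver<String>, max_batch: usize) -> Vec<String> {
    rx.try_iter().take(max_batch).collect()
}
```

The design choice this illustrates is that the producer never waits: a full queue costs a log line, not sandbox latency. That trade-off is why the file-backed path, not the push path, holds the complete record.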
Kubernetes sandbox pods are built by the Kubernetes driver from a generated pod template. The driver injects a single agent container, required volumes for TLS/bootstrap identity, optional SPIFFE mounts, supervisor side-loading, and workspace persistence. Current Kubernetes platform_config support is intentionally narrow and does not expose arbitrary sidecars or arbitrary shared log mounts. The Helm chart exposes gateway and sandbox driver settings, but no log collector configuration today.
Docker and Podman already have validated driver mount configuration. They can map named volumes, bind mounts, and tmpfs/image mounts depending on driver support and safety settings. This makes file-based log collection plausible outside Kubernetes, but there is no OpenShell-owned log directory/mount convention yet.
Code References
| Location | Description |
| --- | --- |
| proto/openshell.proto:156 | Public API defines GetSandboxLogs as the one-shot recent log fetch path. |
| proto/openshell.proto:159 | Public API defines PushSandboxLogs as the supervisor-to-gateway client-streaming path. |
| proto/openshell.proto:768 | WatchSandboxRequest.follow_logs streams gateway-correlated logs. |
| proto/openshell.proto:1329 | GetSandboxLogsRequest supports tail count, time, source, and level filters only. |
| crates/openshell-cli/src/run.rs:7211 | sandbox_logs resolves sandbox identity and calls gateway log APIs. |
| crates/openshell-cli/src/run.rs:7255 | Tail mode uses WatchSandbox. |
| crates/openshell-cli/src/run.rs:7283 | One-shot mode uses GetSandboxLogs. |
| crates/openshell-server/src/tracing_bus.rs:17 | Gateway log bus is keyed by sandbox ID. |
| crates/openshell-server/src/tracing_bus.rs:88 | Tail reads from the in-memory buffer. |
| crates/openshell-server/src/tracing_bus.rs:114 | Default tail capacity is 2000 lines per sandbox. |
| crates/openshell-server/src/grpc/policy.rs:2075 | handle_get_sandbox_logs serves the gateway buffer. |
| crates/openshell-server/src/grpc/policy.rs:2114 | handle_push_sandbox_logs ingests supervisor-pushed log batches. |
| crates/openshell-supervisor-process/src/log_push.rs:17 | LogPushLayer captures tracing events for gateway push. |
| crates/openshell-supervisor-process/src/log_push.rs:87 | Log push is best-effort and drops if the channel is full. |
| crates/openshell-supervisor-process/src/log_push.rs:97 | Background task batches and streams logs to the gateway. |
| crates/openshell-sandbox/src/main.rs:276 | Sandbox sets up local file logging guards. |
| crates/openshell-sandbox/src/main.rs:279 | OCSF JSONL file appender writes openshell-ocsf*.log under /var/log. |
| crates/openshell-sandbox/src/main.rs:299 | Tracing subscriber combines shorthand file output, JSONL output, and gateway push. |
| crates/openshell-ocsf/src/tracing_layers/jsonl_layer.rs:21 | JSONL output is gated by a runtime-enabled flag. |
| crates/openshell-core/src/settings.rs:118 | ocsf_json_enabled is documented as writing /var/log/openshell-ocsf*.log. |
| proto/compute_driver.proto:18 | Compute driver service has lifecycle/status operations but no log export RPC. |
| proto/compute_driver.proto:117 | DriverSandboxTemplate.platform_config is the platform-specific escape hatch. |
| crates/openshell-driver-kubernetes/src/driver.rs:1249 | Kubernetes driver builds the sandbox pod template. |
| crates/openshell-driver-kubernetes/src/driver.rs:1418 | Kubernetes driver assembles the agent container volume mounts. |
| crates/openshell-driver-kubernetes/src/driver.rs:1454 | Kubernetes driver assembles pod volumes. |
| crates/openshell-driver-kubernetes/src/driver.rs:1515 | Kubernetes workspace persistence uses an injected PVC/init-container pattern. |
| crates/openshell-driver-kubernetes/src/driver.rs:1525 | Kubernetes pod driver config currently merges node selector, priority class, and tolerations. |
| crates/openshell-driver-kubernetes/src/config.rs:161 | Kubernetes compute config schema has no log collector fields today. |
| deploy/helm/openshell/values.yaml:147 | Helm server values expose sandbox driver settings but no log collection options. |
| crates/openshell-driver-docker/src/lib.rs:1656 | Docker driver parses and validates mount config. |
| crates/openshell-driver-docker/src/lib.rs:2142 | Docker driver builds the container spec and host config. |
| crates/openshell-driver-podman/src/container.rs:787 | Podman driver builds the container spec. |
| crates/openshell-driver-podman/src/container.rs:810 | Podman driver already creates a named /sandbox workspace volume. |
| crates/openshell-core/src/driver_mounts.rs:56 | Shared mount target validation does not currently reserve /var/log. |
| docs/observability/accessing-logs.mdx:36 | Docs state that gateway log storage is bounded and not persisted. |
| docs/observability/accessing-logs.mdx:57 | Docs state that files contain the complete record and the push channel can drop events. |
| docs/observability/ocsf-json-export.mdx:36 | Docs state OCSF JSONL writes to /var/log/openshell-ocsf.YYYY-MM-DD.log. |
| architecture/sandbox.md:91 | Architecture says sandbox logs are local and can also be pushed to the gateway. |
| architecture/gateway.md:104 | Gateway observability scope includes pushing/streaming sandbox logs, not durable archival. |
Current Behavior
- Operators can use openshell logs and the TUI for recent or live logs, but those tools read the gateway's in-memory log bus.
- Gateway restarts, gateway rotation, buffer overflow, sandbox deletion, or push-channel pressure can lose data from the gateway-access path.
- The durable logs are inside the sandbox filesystem at /var/log/openshell*.log and /var/log/openshell-ocsf*.log.
- Kubernetes pod logs do not expose those files unless the sandbox process also writes all desired logs to stdout/stderr or another container tails a shared volume.
- Existing Docker/Podman mount support may let advanced users build a file-collection workaround, but OpenShell does not provide a documented, portable log collection model.
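One detail a file collector has to handle is that the glob /var/log/openshell*.log also matches the OCSF files, so the two streams need to be told apart by prefix. The sketch below is a hypothetical helper, not existing OpenShell code; it assumes only the two patterns listed above and checks the more specific OCSF prefix first.

```rust
use std::path::Path;

/// Which of the two file-backed log streams a file belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogKind {
    Shorthand,
    OcsfJsonl,
}

/// Classify a path by file name, mirroring the globs
/// `openshell*.log` and `openshell-ocsf*.log`.
pub fn classify(path: &Path) -> Option<LogKind> {
    let name = path.file_name()?.to_str()?;
    if !name.ends_with(".log") {
        return None;
    }
    // Check the OCSF prefix first: it is a special case of `openshell*`.
    if name.starts_with("openshell-ocsf") {
        Some(LogKind::OcsfJsonl)
    } else if name.starts_with("openshell") {
        Some(LogKind::Shorthand)
    } else {
        None
    }
}

/// Pick the paths of one kind, sorted by name.
pub fn select(paths: &[&str], kind: LogKind) -> Vec<String> {
    let mut out: Vec<String> = paths
        .iter()
        .copied()
        .filter(|p| classify(Path::new(p)) == Some(kind))
        .map(str::to_string)
        .collect();
    out.sort();
    out
}
```

A collector that ships only the machine-readable stream could call `select(&paths, LogKind::OcsfJsonl)` and ignore the shorthand files.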
What Would Need to Change
A buildable design should answer these questions:
- Define the canonical log collection source. The simplest answer is the existing files, but the design should decide whether /var/log remains the path or whether OpenShell introduces a configurable log directory such as /var/log/openshell to simplify shared-volume mounts.
- Define the user/operator configuration surface. This may be gateway TOML, Helm values, sandbox template fields, or settings. If it lands in gateway TOML or Helm values, docs/reference/gateway-config.mdx and the Helm docs would need updates.
- Define how Kubernetes exposes file-backed logs. Likely options include a shared emptyDir/PVC for OpenShell log files and an optional sidecar that tails/ships them, or support for operator-provided collector sidecars in the sandbox pod template.
- Define how Docker/Podman/VM drivers expose file-backed logs. Options include generated named volumes, documented driver mounts, or a driver-neutral log directory convention that host collectors can read.
- Decide whether OpenTelemetry log shipping is implemented directly in the supervisor, delegated to an external collector sidecar/daemon, or deferred behind a later sink interface.
- Preserve the existing openshell logs behavior as interactive/recent diagnostics rather than making it the archival export mechanism.
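If the design settles on a configurable log directory, its parsing and defaults could look roughly like the sketch below. Everything here is hypothetical: `DEFAULT_LOG_DIR` uses the /var/log/openshell example from the list above, and `resolve_log_dir` and `LogDirError` are illustrative names, not existing settings or APIs. The checks assume Unix-style paths inside the sandbox.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical default, taken from the example path discussed in this issue.
pub const DEFAULT_LOG_DIR: &str = "/var/log/openshell";

/// Reasons a configured log directory is rejected in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum LogDirError {
    NotAbsolute,
    ParentTraversal,
    IsRoot,
}

/// Resolve the log directory from an optional configured value.
/// Falls back to `DEFAULT_LOG_DIR` when nothing is configured.
pub fn resolve_log_dir(configured: Option<&str>) -> Result<PathBuf, LogDirError> {
    let raw = configured.unwrap_or(DEFAULT_LOG_DIR);
    let path = Path::new(raw);
    if !path.is_absolute() {
        return Err(LogDirError::NotAbsolute);
    }
    // Reject `..` so the mount target cannot escape its intended location.
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(LogDirError::ParentTraversal);
    }
    // Mounting a shared volume over `/` would mask the whole filesystem.
    if path.parent().is_none() {
        return Err(LogDirError::IsRoot);
    }
    Ok(path.to_path_buf())
}
```

A dedicated subdirectory is easier to mount as a shared volume than /var/log itself, since mounting over /var/log would hide unrelated system logs.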
Alternative Approaches Considered
File-driven collection with driver-specific exposure. Keep sandbox files as source of truth. Add a driver-neutral log directory convention and expose that directory through Kubernetes shared volumes/sidecars and local driver mounts. This is the simplest operational model and aligns with current implementation, but needs careful path/backward-compatibility decisions.
Kubernetes collector sidecar first. Add Helm/Kubernetes driver support for a configurable sidecar that tails OpenShell log files and writes to stdout, Fluent Bit, Vector, or an OpenTelemetry Collector. This solves the immediate Kubernetes operator problem, but should be framed as one driver implementation of a broader log sink model so Docker, Podman, and VM are not left behind.
Supervisor-owned OTLP exporter. Add an OpenTelemetry logs exporter in the supervisor that ships directly from the tracing/event stream. This is portable across drivers and avoids sidecars, but introduces network policy, credentials, batching/backpressure, retry, and configuration complexity inside every sandbox.
Gateway persistence/export. Persist pushed logs at the gateway or add an export API. This improves openshell logs, but it keeps the gateway in the hot path and does not solve file-backed collection or Kubernetes-native log collection. It should not be the primary archival design.
Patterns to Follow
- Logging must remain non-blocking for sandbox execution. Existing push behavior drops under pressure rather than blocking the supervisor.
- OCSF structured events should remain the machine-readable/security-audit format, while shorthand remains optimized for humans and agents.
- Kubernetes driver changes should follow the existing pod-template transform pattern used for supervisor side-loading and workspace persistence.
- Driver mount behavior should reuse crates/openshell-core/src/driver_mounts.rs validation and existing Docker/Podman driver config patterns.
- Operator-facing config must be documented in docs/reference/gateway-config.mdx and Helm values/README if it affects gateway TOML or Helm rendering.
Proposed Approach
Investigate a small, file-first log collection architecture. Treat local sandbox log files as the durable source of truth, keep openshell logs as a recent/live diagnostic view, and add a driver-neutral concept of a sandbox log export directory or log sink. For Kubernetes, explore mounting that directory on a shared volume and optionally adding an operator-configurable collector sidecar that can tail files to stdout or ship to OTLP-compatible collectors. For Docker, Podman, and VM, explore how the same directory can map to host-visible volumes or driver state paths without requiring the gateway to persist log streams.
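For the Kubernetes part, the pod-template transform might take a shape like the following sketch. It uses plain structs instead of real Kubernetes API types, and every name in it (`PodTemplate`, `add_log_collection`, the `openshell-logs` volume, the `log-collector` container) is a hypothetical placeholder; only the `agent` container name comes from the current driver behavior described above.

```rust
/// Simplified, hypothetical model of a pod template (not the real k8s types).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VolumeMount {
    pub name: String,
    pub mount_path: String,
    pub read_only: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Container {
    pub name: String,
    pub image: String,
    pub volume_mounts: Vec<VolumeMount>,
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct PodTemplate {
    /// Names of pod-level volumes; the log volume is assumed to be an emptyDir.
    pub volumes: Vec<String>,
    pub containers: Vec<Container>,
}

/// Hypothetical name for the shared log volume.
pub const LOG_VOLUME: &str = "openshell-logs";

fn log_mount(log_dir: &str, read_only: bool) -> VolumeMount {
    VolumeMount {
        name: LOG_VOLUME.to_string(),
        mount_path: log_dir.to_string(),
        read_only,
    }
}

/// Add a shared log volume, mount it read-write in the agent container, and
/// optionally add a collector sidecar that mounts it read-only.
/// The transform is idempotent and leaves existing mounts untouched.
pub fn add_log_collection(pod: &mut PodTemplate, log_dir: &str, collector_image: Option<&str>) {
    if !pod.volumes.iter().any(|v| v == LOG_VOLUME) {
        pod.volumes.push(LOG_VOLUME.to_string());
    }
    for c in pod.containers.iter_mut().filter(|c| c.name == "agent") {
        if !c.volume_mounts.iter().any(|m| m.name == LOG_VOLUME) {
            c.volume_mounts.push(log_mount(log_dir, false));
        }
    }
    if let Some(image) = collector_image {
        if !pod.containers.iter().any(|c| c.name == "log-collector") {
            pod.containers.push(Container {
                name: "log-collector".to_string(),
                image: image.to_string(),
                volume_mounts: vec![log_mount(log_dir, true)],
            });
        }
    }
}
```

Mounting the volume read-only in the sidecar keeps the supervisor as the only writer, and making the sidecar optional lets operators who already run a node-level collector use the shared volume alone.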
Scope Assessment
- Complexity: Medium-High
- Confidence: Medium — the existing file path is clear, but the right portable configuration surface needs human review.
- Estimated files to change: 8-15 for an initial implementation, depending on whether the first build includes Kubernetes only or a driver-neutral config surface.
- Issue type: feat
Risks & Open Questions
- Should OpenShell keep writing directly under /var/log, or move to/configure a subdirectory that can be mounted cleanly without masking unrelated system logs?
- Should the first implementation support Kubernetes only, or define the driver-neutral log sink model before adding the Kubernetes sidecar?
- Should OpenTelemetry support be implemented by shipping files through a collector sidecar, or by adding OTLP export directly in the supervisor?
- How should collector credentials and network egress be modeled without weakening sandbox isolation or leaking secrets into logs?
- What are the retention and rotation expectations when a collector is enabled? The OpenShell log files currently rotate daily and keep 3 files.
- How should this interact with OCSF JSONL enablement? Operators likely need to enable full JSONL and log collection independently.
- LSM impact: shared volume and sidecar access should be checked under AppArmor/SELinux. Podman already has SELinux-aware mount behavior; Kubernetes sidecar access depends on pod security context, volume type, and labels.
Test Considerations
- Unit tests for any new log directory or sink config parsing/defaults.
- Kubernetes pod-template unit tests proving the shared log volume, agent mount, and optional sidecar are rendered correctly and do not disturb TLS/bootstrap/SPIFFE mounts.
- Helm lint/docs tests if Helm values are added.
- Docker/Podman unit tests if log directory mounts are generated or reserved paths change.
- Kubernetes e2e test that creates a sandbox, emits OCSF events, and verifies a collector sidecar or shared volume can read the file-backed logs.
- Optional OTLP integration test with a mock collector if direct OTLP export is selected.
- Docs update for operator workflows under docs/observability/ and any gateway config changes under docs/reference/gateway-config.mdx.
Created by spike investigation. Use build-from-issue to plan and implement after human review.