379 changes: 379 additions & 0 deletions .claude/wip/grpc-plan.md


62 changes: 62 additions & 0 deletions .claude/wip/grpc-server.md
@@ -0,0 +1,62 @@
# Standalone gRPC Server

Goal: replace snyk-broker with a native Go gRPC streaming interface for tunneling HTTP traffic.

## Current architecture

This project currently communicates with the Cortex backend by hosting an instance of the snyk-broker component, which connects to another snyk-broker in the Cortex infrastructure via a websocket. Its only functions are to:

* Initiate a two-way websocket tunnel by reaching out to the server
* Maintain the health of this tunnel
* Serve as a dumb pipe for HTTP traffic from the server side to these agent instances

So the flow is:

- Client [Axon] initializes `snyk-broker`
- `snyk-broker` contacts the server side (e.g. https://relay.cortex.io) and establishes a websocket tunnel
- The server side can then dispatch HTTP calls through the tunnel, which Axon executes inside the customer's network
- Responses are then played back through the tunnel.

On the server side, snyk-broker interfaces with an HTTP server called the `BROKER_SERVER`, to which it reports a set of operations:

- server-connected / deleted: a server-side snyk-broker instance is registering (or deregistering) itself with the broker server
- client-connected / deleted: a client-side instance has registered (or deregistered) with the server-side snyk-broker, and this information is sent to the broker server. This includes a BROKER_TOKEN, which can be used to route traffic back to a specific client instance

On the server side, the BROKER_SERVER also supports a dispatching operation like:

`GET http://broker-server:8080/broker/$token/some/path`

The broker server then takes the token and determines which server-side snyk-broker instance owns that connection. It then wraps the HTTP request in an envelope, which is sent over the websocket to the client side. The client side compares the path and method against its accept.json file and either rejects the call (no matching config) or rewrites it to a local call, then tunnels the response back.
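The token-routing step can be sketched as a small path parser. This sketch assumes the dispatch path always has the shape `/broker/{token}/{downstream path}` shown above. `SplitBrokerPath` and `ErrNotBrokerPath` are hypothetical names, and the real dispatcher may handle extra cases, such as query strings or hashed tokens, differently.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNotBrokerPath reports a path that is not /broker/{token}/...
var ErrNotBrokerPath = errors.New("not a /broker/{token}/... path")

// SplitBrokerPath extracts the routing token and the downstream path from a
// dispatch path such as /broker/$token/some/path.
func SplitBrokerPath(p string) (token, rest string, err error) {
	const prefix = "/broker/"
	if !strings.HasPrefix(p, prefix) {
		return "", "", ErrNotBrokerPath
	}
	tok, tail, found := strings.Cut(strings.TrimPrefix(p, prefix), "/")
	if tok == "" {
		return "", "", ErrNotBrokerPath
	}
	if !found {
		// "/broker/$token" with nothing after it maps to the root path.
		return tok, "/", nil
	}
	return tok, "/" + tail, nil
}

func main() {
	tok, rest, err := SplitBrokerPath("/broker/abc123/some/path")
	fmt.Println(tok, rest, err) // prints: abc123 /some/path <nil>
}
```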

### Problems with this

- There is a lot of code in snyk-broker we don't care about. We only want the HTTP tunneling, none of the other features.
- Snyk-broker is written in Node.js, which complicates development and installation.
- The Node websocket stack is complicated and has proven very fragile. The semantics between server and client have been difficult to get right, and there are many failure modes that have been difficult to anticipate.

## New plan

Rather than take on this complexity, I'd like to move to an exactly compatible system, written in Go and gRPC, with the following high-level architecture:

- build a set of protobuf service files that define the flow between client and server
- in the axon project add a new root folder called server that implements the server side
- in the axon /agent folder, implement a new client to talk to this server. Based on a flag, we will instantiate either that or the existing snyk relay_instance_manager. Ideally this sits behind a very abstract interface, so most of axon has no idea which one was injected
- this should be designed for durability of connection
- one of the message types is "heartbeat" and both sides regularly send the other a heartbeat to validate a working tunnel
- when a side can't heartbeat it should aggressively try to establish a new tunnel
- client side should support multiple (probably 2) of these running concurrently, e.g. if one tunnel has problems, it can switch to a healthy tunnel, kill the existing one, and re-establish it. Since the server side is sticky, this will hopefully allow connecting to multiple remote heads.
- goal: we should never need to restart client instances to get them to reconnect, they should know they are in a disconnected state
- should support bearer authentication, starting with expecting a valid, non-expired JWT signed by Cortex. This should be optional to start with, and the server side should support specifying a file containing the JWT public key (or secret) to validate the JWT against. We don't need to protect all traffic, but ideally we can require a valid Cortex token.
- need to add a new server/docker/Dockerfile for building just the server component.
- the server side should be able to safely run any number of server components.
- the server side should emit prometheus metrics for its primary operations. It should use uber/tally as the main interface for emitting metrics from code, backed by a prometheus reporter.
- the server side should emit structured zap JSON logging.

### Investigation

- the client routing side should support the existing accept.json format. See examples in agent/server/snykbroker/accept_files for the format to handle. This stack will replace the snyk-broker and reflector pieces of the existing architecture.


- please investigate the BROKER_SERVER interface here: https://github.com/cortexapps/snyk-broker/blob/16805ee1f3318c783df7ed35085ec9aa941bff6e/lib/server/infra/dispatcher.ts#L178. We want the server to support interacting with a server that implements this interface, given a host:port, e.g. BROKER_SERVER_URL
- In https://github.com/cortexapps/snyk-broker/, understand how the raw and hashed versions of BROKER_TOKEN are used in the API. Each server instance will need to keep track of its client connections and the raw and hashed token for each.
- We want to be very paranoid about how we deal with problems. For example, if the GCP load balancer's TTLs aren't long enough, can we recognize infrastructure closing our connections? Can we handle the server-side instances rolling?
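One way a server instance could track its client connections by both token forms is sketched below. It assumes the hashed token is the hex-encoded SHA-256 of the raw BROKER_TOKEN, which still needs to be confirmed against snyk-broker. `Registry`, `ClientConn`, and `HashToken` are hypothetical. The sketch keeps only one stream per token, whereas a client running several tunnels would need a set of streams per token.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// HashToken returns the hex SHA-256 of a raw token. Assumption: this matches
// snyk-broker's hashed-token format; verify before relying on it.
func HashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// ClientConn is a placeholder for one connected client tunnel.
type ClientConn struct {
	RawToken    string
	HashedToken string
	StreamID    string
}

// Registry indexes this instance's connections by raw and by hashed token.
type Registry struct {
	mu       sync.RWMutex
	byRaw    map[string]*ClientConn
	byHashed map[string]*ClientConn
}

func NewRegistry() *Registry {
	return &Registry{byRaw: map[string]*ClientConn{}, byHashed: map[string]*ClientConn{}}
}

// Add registers a connection under both token forms.
func (r *Registry) Add(rawToken, streamID string) *ClientConn {
	c := &ClientConn{RawToken: rawToken, HashedToken: HashToken(rawToken), StreamID: streamID}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byRaw[c.RawToken] = c
	r.byHashed[c.HashedToken] = c
	return c
}

// Remove drops a connection from both indexes, e.g. on client disconnect.
func (r *Registry) Remove(rawToken string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if c, ok := r.byRaw[rawToken]; ok {
		delete(r.byRaw, rawToken)
		delete(r.byHashed, c.HashedToken)
	}
}

// LookupHashed finds a connection by its hashed token.
func (r *Registry) LookupHashed(hashed string) (*ClientConn, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	c, ok := r.byHashed[hashed]
	return c, ok
}

func main() {
	reg := NewRegistry()
	c := reg.Add("my-broker-token", "stream-1")
	got, ok := reg.LookupHashed(HashToken("my-broker-token"))
	fmt.Println(ok, got == c, len(c.HashedToken)) // prints: true true 64
}
```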
79 changes: 79 additions & 0 deletions .claude/wip/grpc-tunnel-e2e-test.md
@@ -0,0 +1,79 @@
# gRPC Tunnel E2E Test

## Status: PASSING (both proxy and no-proxy modes)

## Fixes Applied to Get Tests Passing

### 1. `python:3.8-alpine` → `python:3.13-alpine`
Python 3.8 is EOL and the Docker image was removed from Docker Hub.
- **File**: `agent/test/relay/docker-compose.grpc.yml`

### 2. Missing `CORTEX_TENANT_ID` env var
The server requires `tenant_id` in ClientHello, but the docker-compose file didn't set `CORTEX_TENANT_ID` for the axon-relay container.
- **File**: `agent/test/relay/docker-compose.grpc.yml` — added `CORTEX_TENANT_ID: test-tenant`

### 3. Separated gRPC TLS from HTTP TLS config
`DISABLE_TLS` controlled both gRPC transport credentials and HTTP client TLS verification. When running with proxy (`CA_CERT_PATH` set), `http_client.go` panicked: "Cannot use custom CA cert with TLS verification disabled". Added a new `GRPC_INSECURE` config field specifically for gRPC tunnel connections.
- **Files**:
- `agent/config/config.go` — added `GrpcInsecure bool` field, read from `GRPC_INSECURE` env var
- `agent/server/grpctunnel/tunnel_client.go` — uses `GrpcInsecure` instead of `HttpDisableTLS`
- `agent/test/relay/docker-compose.grpc.yml` — uses `GRPC_INSECURE: "true"` instead of `DISABLE_TLS: "true"`
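Here is a sketch of the separated flags, assuming both are plain booleans read from the environment. `LoadConfig` and `envBool` are hypothetical helpers; the real fields live in `agent/config/config.go`. The point is that `GRPC_INSECURE` only affects the tunnel transport. Setting it alongside `CA_CERT_PATH` therefore no longer trips the HTTP client's CA/TLS conflict. Where `http_client.go` panicked, the sketch returns an error instead.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config holds the two independent TLS switches plus the proxy CA path.
type Config struct {
	HttpDisableTLS bool   // DISABLE_TLS: skip TLS verification for outbound HTTP
	GrpcInsecure   bool   // GRPC_INSECURE: plaintext transport for the gRPC tunnel only
	CACertPath     string // CA_CERT_PATH: custom CA for outbound HTTP (e.g. a proxy)
}

// envBool treats an unset variable as false and rejects unparsable values.
func envBool(getenv func(string) string, key string) (bool, error) {
	v := getenv(key)
	if v == "" {
		return false, nil
	}
	return strconv.ParseBool(v)
}

// LoadConfig reads the flags via getenv (pass os.Getenv in production).
func LoadConfig(getenv func(string) string) (Config, error) {
	var c Config
	var err error
	if c.HttpDisableTLS, err = envBool(getenv, "DISABLE_TLS"); err != nil {
		return c, err
	}
	if c.GrpcInsecure, err = envBool(getenv, "GRPC_INSECURE"); err != nil {
		return c, err
	}
	c.CACertPath = getenv("CA_CERT_PATH")
	// Only the HTTP flag can conflict with a custom CA; GRPC_INSECURE cannot.
	if c.CACertPath != "" && c.HttpDisableTLS {
		return c, errors.New("cannot use custom CA cert with TLS verification disabled")
	}
	return c, nil
}

func main() {
	env := map[string]string{"GRPC_INSECURE": "true", "CA_CERT_PATH": "/certs/proxy.pem"}
	cfg, err := LoadConfig(func(k string) string { return env[k] })
	fmt.Println(cfg.GrpcInsecure, cfg.HttpDisableTLS, err) // prints: true false <nil>
}
```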

### 4. Removed snyk-broker-specific header check
The test checked for the `x-axon-relay-instance` header, which is injected by the snyk-broker reflector, not by the gRPC tunnel path.
- **File**: `agent/test/relay/relay_test.grpc.sh`

### 5. macOS compatibility fix
`stat -c%s` doesn't work on macOS (BSD stat). Changed to `wc -c <` which is portable.
- **File**: `agent/test/relay/relay_test.grpc.sh`

## Running the Tests

```bash
# No-proxy mode
cd agent/test/relay && PROXY=0 ./relay_test.grpc.sh

# With proxy mode
cd agent/test/relay && PROXY=1 ./relay_test.grpc.sh

# Both (via Makefile)
cd agent && make grpc-relay-test
```

## Test Architecture

```
Host
|
v
grpc-tunnel-server (HTTP :8080, gRPC :50052)
|
gRPC bidirectional stream
|
v
axon-relay (RELAY_MODE=grpc-tunnel)
|
HTTP request execution
|
v
python-server (:80, serves /tmp)
or GitHub (HTTPS)
or cortex-fake (:8081, echo endpoint)
```

## Test Cases
1. Text file relay (write to /tmp, fetch via tunnel)
2. Binary file relay (1MB, SHA-256 checksum verification)
3. HTTPS relay (GitHub README fetch)
4. Proxy header injection (PROXY=1 only) — verifies `x-proxy-mitmproxy`
5. Accept file header injection (PROXY=1 only) — verifies `added-fake-server`
6. Plugin header injection (PROXY=1 only) — verifies `HOME=/root`
7. gRPC tunnel stream establishment (PROXY=1 only) — log check

## Remaining Phase 2 Tasks
- None — Phase 2 is complete (code + e2e tests passing)

## Phase 3 & 4 (Not Started)
- Phase 3: JWT auth, graceful shutdown hardening, health endpoints
- Phase 4: Migration, cleanup
- Plan doc: `.claude/wip/grpc-plan.md`
50 changes: 39 additions & 11 deletions .vscode/launch.json
@@ -64,20 +64,42 @@
"console": "integratedTerminal",

},

{
"name": "Launch Go EV-Data Example",
"name": "Launch Agent (relay)",
"type": "go",
"request": "launch",
"mode": "auto",
"program": "${workspaceFolder}/examples/go/axon-ev-sync/main.go",
"program": "${workspaceFolder}/agent/main.go",
"args": [
"relay",
"-i",
"github",
"--alias",
"relay2",
"-v"
],
"envFile": "${workspaceFolder}/.env",
"env": {
"LOG_LEVEL": "info"
}
},

"SCAFFOLD_DIR": "${workspaceFolder}/scaffold",
"SCAFFOLD_DOCKER_IMAGE": "cortex-axon-agent:local",
"PORT" : "7399",
"BUILTIN_PLUGIN_DIR": "${workspaceFolder}/agent/server/snykbroker/plugins",
}
},
{
"name": "Launch gRPC Tunnel Server",
"type": "go",
"request": "launch",
"mode": "auto",
"program": "${workspaceFolder}/server/cmd/main.go",
"env": {
"LOG_LEVEL": "info"
}
},

{
"name": "Launch Agent (relay)",
"name": "Launch Agent (relay gRPC)",
"type": "go",
"request": "launch",
"mode": "auto",
@@ -87,18 +109,24 @@
"-i",
"github",
"--alias",
"relay2",
"relay-grpc",
"-v"
],
"envFile": "${workspaceFolder}/.env",
"env": {
"SCAFFOLD_DIR": "${workspaceFolder}/scaffold",
"SCAFFOLD_DOCKER_IMAGE": "cortex-axon-agent:local",
"PORT" : "7399",
"PORT": "7399",
"BUILTIN_PLUGIN_DIR": "${workspaceFolder}/agent/server/snykbroker/plugins",
}
"RELAY_MODE": "grpc-tunnel",
"BROKER_SERVER_URL": "localhost:50052",
"BROKER_TOKEN": "4f49654b-000-0000-000-9deef1d9f2f6",
"CORTEX_TENANT_ID": "1",
"GRPC_INSECURE": "true",
"TUNNEL_COUNT": "1",
"CORTEX_API_TOKEN": "abc-123"
}
},


]
}
2 changes: 2 additions & 0 deletions Makefile
@@ -4,6 +4,7 @@ all: setup

proto:
$(MAKE) -C agent proto
$(MAKE) -C server proto
$(MAKE) -C sdks/python proto
.PHONY: proto

@@ -25,6 +26,7 @@ setup:

test:
$(MAKE) -C agent test
$(MAKE) -C server test
@echo "TODO: sdk go test"
$(MAKE) -C sdks/go test
$(MAKE) -C scaffold test
27 changes: 25 additions & 2 deletions agent/Makefile
@@ -11,7 +11,20 @@ GO_FILES := $(patsubst proto/%.proto,$(GENERATED_PATH)/%.pb.go,$(PROTO_FILES))
GOPATH ?= $(HOME)/go
GOBIN ?= $(GOPATH)/bin

proto: setup $(GO_FILES) version
TUNNEL_PROTO = ../proto/tunnel/tunnel.proto
TUNNEL_GENERATED = $(GENERATED_DIR)/github.com/cortexapps/axon/tunnelpb

proto: setup $(GO_FILES) tunnel-proto version

tunnel-proto: $(TUNNEL_PROTO)
@echo "Generating tunnel protobuf for agent"
@mkdir -p $(GENERATED_DIR)
@protoc -I=../proto/tunnel \
--go_out=$(GENERATED_DIR) --go-grpc_out=$(GENERATED_DIR) \
--go_opt=Mtunnel.proto=github.com/cortexapps/axon/tunnelpb \
--go-grpc_opt=Mtunnel.proto=github.com/cortexapps/axon/tunnelpb \
$(TUNNEL_PROTO)
.PHONY: tunnel-proto

version: $(GO_SDK_DIR)/version/agentversion.txt $(PYTHON_SDK_DIR)/cortex_axon/agentversion.py

@@ -83,7 +96,17 @@ relay-test-with-proxy:

relay-test: relay-test-no-proxy relay-test-with-proxy

.PHONY: relay-test relay-test-no-proxy relay-test-with-proxy
grpc-relay-test-no-proxy:
@echo "Running gRPC relay tests: no proxy"
cd test/relay && export PROXY=0 && ./relay_test.grpc.sh

grpc-relay-test-with-proxy:
@echo "Running gRPC relay tests: with proxy"
cd test/relay && export PROXY=1 && ./relay_test.grpc.sh

grpc-relay-test: grpc-relay-test-no-proxy grpc-relay-test-with-proxy

.PHONY: relay-test relay-test-no-proxy relay-test-with-proxy grpc-relay-test grpc-relay-test-no-proxy grpc-relay-test-with-proxy

run: proto
go run main.go serve