diff --git a/README.md b/README.md index fcf14f0..a1d8f17 100644 --- a/README.md +++ b/README.md @@ -1,36 +1,35 @@ # dp-cli -Command-line client providing *handy helper tools* for the ONS Dissemination Platform software engineering team +> [!WARNING] +> This tool is primarily for internal use at ONS but feel free to fork for your own use. +> +> If you notice any bugs/issues please open a GitHub issue. -:warning: Still in active development. If you notice any bugs/issues please open a GitHub issue. +Command-line client providing *handy helper tools* for the ONS Dissemination Platform software engineering team ## Getting started -Clone the code (not needed if you [brew install on macOS](#brew-installation) :warning:) - -```shell -git clone git@github.com:ONSdigital/dp-cli.git -``` +If using macOS, you can install using `brew`: -:warning: `dp-cli` uses Go Modules and **must** be cloned to a location outside of your `$GOPATH`. +- Create tap -### Prerequisites + ```shell + brew tap ONSdigital/homebrew-dp-cli git@github.com:ONSdigital/homebrew-dp-cli + ``` -**Required:** +- Run brew install -Check that `session-manager-plugin` is installed by running the following command + ```shell + brew install dp-cli + ``` -```shell -which session-manager-plugin -``` +### Prerequisites -if not installed, you can install it using the following: +The cli tool will do its best to check you have the required supporting tools installed, but you will need to have the following installed to use the tool: -```shell -brew install --cask session-manager-plugin -``` - -or by follow this [doc](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html#install-plugin-macos). +- **aws cli** - Either `brew install awscli` or follow the [AWS docs](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-mac.html) +- **aws session manager plugin** - Either `brew install --cask session-manager-plugin` or follow the [AWS docs](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html#install-plugin-macos) +- **socat** - Either `brew install socat` #### Optional but common requirements @@ -56,7 +55,7 @@ In order to use the `dp ssh` sub-command you will need: git clone git@github.com:ONSdigital/dp-nisra-infrastructure ``` -Note: Make sure your repo's are on the right branches and are uptodate: +Note: Make sure your repo's are on the right branches and are up-to-date: - `dp-setup` is on the `awsb` (or `main`) branch - `dp-ci` is on the `main` branch @@ -116,31 +115,20 @@ update the paths and `user-name`: You can uncomment more `environments` values as and when you get access to them. -### Brew Installation - -If using macOS, you can install using `brew`: - -- Create tap - - ```shell - brew tap ONSdigital/homebrew-dp-cli git@github.com:ONSdigital/homebrew-dp-cli - ``` - -- Run brew install - - ```shell - brew install dp-cli - ``` +## Binary build and run -### Build and run - -If not using the *brew* installation (above), you can build, install and start the CLI thus: +```shell +git clone git@github.com:ONSdigital/dp-cli.git +``` ```shell make install dp ``` +> [!IMPORTANT] +> `dp-cli` uses Go Modules and **must** be cloned to a location outside of your `$GOPATH`. + - If you get: `command not found: dp` @@ -170,14 +158,16 @@ Usage: Available Commands: clean Delete data from your local environment - create-repo Creates a new repository with the typical Dissemination Platform configurations + completion Generate the autocompletion script for the specified shell + create-repo Creates a new repository with the typical Dissemination Platform configurations + eks EKS cluster management commands generate-project Generates the boilerplate for a given project type help Help about any command import Import data into your local developer environment + override-key Generates an overrideKey to bypass the Florence dataset version validation step when approving a collection remote Allow or deny remote access to environment - scp Push (or `--pull`) a file to (from) an environment using scp + remote2 (NEW) Allow or deny remote access to environment using the new remote allow service spew Log out some useful debugging info - ssh Access an environment using ssh version Print the app version Flags: @@ -188,9 +178,9 @@ Use "dp [command] --help" for more information about a command. Use the available commands for more info on the functionality available. -### Common issues +## Common issues -#### Credentials error +### Credentials error 1. If sandbox/prod/staging are not in the dp cli output try unsetting `AWS_REGION` and `AWS_DEFAULT_REGION` @@ -219,7 +209,7 @@ Use the available commands for more info on the functionality available. profile: ``` -#### SSH/SCP command fails +### SSH/SCP command fails ```shell $ dp ssh sandbox @@ -229,7 +219,7 @@ ssh to sandbox If the SSH or SCP command fails, ensure that the `dp remote allow` command has been run for the environment you want to connect to. -#### Remote Allow security group error +### Remote Allow security group error `Error: no security groups matching environment: "sandbox" with name "sandbox - bastion"` @@ -246,7 +236,7 @@ Example: export AWS_PROFILE=dp-staging ``` -#### Remote Allow security group rule already exists error +### Remote Allow security group rule already exists error ```shell $ dp remote allow sandbox @@ -258,13 +248,14 @@ Error: error adding rules to bastionSG: InvalidPermission.Duplicate: the specifi The error occurs when rules have previously been added and the command is run again. Use (e.g.) `dp remote deny sandbox` to clear out existing rules and try again. -Note: *This error should no longer appear* - the code should now avoid re-adding existing rules. -However, it is possible that the rule has been added with a description that does not match your username. -If so, you will have to use the AWS web UI/console to remove any offending Security Group rules. +> [!NOTE] +> *This error should no longer appear* - the code should now avoid re-adding existing rules. +> However, it is possible that the rule has been added with a description that does not match your username. +> If so, you will have to use the AWS web UI/console to remove any offending Security Group rules. -### Advanced use +## Advanced use -#### ssh commands +### ssh commands You can run ssh commands from the command-line, for example to determine the time on a given host: @@ -285,7 +276,7 @@ $ dp ssh sandbox web 1 --to 0 -- ls -la # runs `ls -la` on ALL web boxes ``` -#### Manually configuring your IP or user +### Manually configuring your IP or user Optionally, (e.g. to avoid the program looking-up your IP), you can use the `--ip` flag (or an environment variable `MY_IP`) to force the IP used when running `dp remote allow`. @@ -303,7 +294,7 @@ Similarly, use the `--user` flag to change the label attached to the IP that is dp remote --user MyColleaguesName --ip 192.168.44.55 --http-only allow sandbox ``` -#### Remote allow extra ports +### Remote allow extra ports You can expand the allowed ports in your config for `publishing`, `web` or `bastion` with: @@ -315,7 +306,7 @@ environments: - 80 ``` -#### AWS Command Line Access +### AWS Command Line Access Follow the guide in [dp](https://github.com/ONSdigital/dp/blob/main/guides/AWS_ACCOUNT_ACCESS.md) diff --git a/aws/aws.go b/aws/aws.go index 8fb7a66..a512657 100644 --- a/aws/aws.go +++ b/aws/aws.go @@ -7,7 +7,10 @@ import ( "github.com/aws/aws-sdk-go-v2/config" ) -func getAWSConfig(ctx context.Context, profile string) aws.Config { +// GetAWSConfig loads the AWS SDK config for the given profile. +// Region is determined by the SDK's default resolution chain +// (environment variables, shared config file, then falls back to eu-west-2). +func GetAWSConfig(ctx context.Context, profile string) (aws.Config, error) { var configOpts []func(*config.LoadOptions) error configOpts = append(configOpts, config.WithRegion("eu-west-2")) @@ -16,12 +19,15 @@ func getAWSConfig(ctx context.Context, profile string) aws.Config { configOpts = append(configOpts, config.WithSharedConfigProfile(profile)) } - cfg, err := config.LoadDefaultConfig(ctx, - configOpts..., - ) + return config.LoadDefaultConfig(ctx, configOpts...) +} + +// Deprecated: The use of getAWSConfig will be removed, this was the legacy internal helper that panics on error. +// Existing code in this package uses it; new code should use GetAWSConfig and legacy refactored. +func getAWSConfig(ctx context.Context, profile string) aws.Config { + cfg, err := GetAWSConfig(ctx, profile) if err != nil { panic(err) } - return cfg } diff --git a/aws/lambda.go b/aws/lambda.go new file mode 100644 index 0000000..b1e9399 --- /dev/null +++ b/aws/lambda.go @@ -0,0 +1,27 @@ +package aws + +import ( + "context" + + "github.com/aws/aws-sdk-go-v2/aws" + "github.com/aws/aws-sdk-go-v2/service/lambda" +) + +func getLambdaClient(ctx context.Context, profile string) *lambda.Client { + return lambda.NewFromConfig(getAWSConfig(ctx, profile)) +} + +// InvokeLambda invokes a Lambda function with the provided JSON payload and returns the raw response payload as string. +func InvokeLambda(ctx context.Context, profile string, functionName string, payload []byte) (string, error) { + client := getLambdaClient(ctx, profile) + + out, err := client.Invoke(ctx, &lambda.InvokeInput{ + FunctionName: aws.String(functionName), + Payload: payload, + }) + if err != nil { + return "", err + } + + return string(out.Payload), nil +} diff --git a/ci/build.yml b/ci/build.yml index 204b4b2..3f76733 100644 --- a/ci/build.yml +++ b/ci/build.yml @@ -6,7 +6,7 @@ image_resource: type: docker-image source: repository: golang - tag: 1.24.11-bookworm + tag: 1.26.3-bookworm inputs: - name: dp-cli diff --git a/ci/lint.yml b/ci/lint.yml index bd02503..2e62e0d 100644 --- a/ci/lint.yml +++ b/ci/lint.yml @@ -6,7 +6,7 @@ image_resource: type: docker-image source: repository: golang - tag: 1.24.11-bookworm + tag: 1.26.3-bookworm inputs: - name: dp-cli diff --git a/ci/unit.yml b/ci/unit.yml index 0a3b642..beb6107 100644 --- a/ci/unit.yml +++ b/ci/unit.yml @@ -6,7 +6,7 @@ image_resource: type: docker-image source: repository: golang - tag: 1.24.11-bookworm + tag: 1.26.3-bookworm inputs: - name: dp-cli diff --git a/command/eks.go b/command/eks.go new file mode 100644 index 0000000..4544e7c --- /dev/null +++ b/command/eks.go @@ -0,0 +1,388 @@ +package command + +import ( + "context" + "fmt" + "os" + "os/signal" + "strings" + "syscall" + + "github.com/ONSdigital/dp-cli/config" + "github.com/ONSdigital/dp-cli/eks" + "github.com/ONSdigital/dp-cli/out" + "github.com/spf13/cobra" +) + +func eksCommand(ctx context.Context, cfg *config.Config) *cobra.Command { + cmd := &cobra.Command{ + Use: "eks", + Short: "EKS cluster management commands", + } + + cmd.AddCommand(eksSessionCommand(ctx, cfg)) + + return cmd +} + +func eksSessionCommand(ctx context.Context, cfg *config.Config) *cobra.Command { + cmd := &cobra.Command{ + Use: "session", + Short: "Manage EKS tunnel sessions via bastion", + } + + cmd.AddCommand(eksSessionStartCommand(ctx, cfg)) + cmd.AddCommand(eksSessionStopCommand(ctx, cfg)) + cmd.AddCommand(eksSessionStatusCommand()) + + return cmd +} + +func eksSessionStartCommand(ctx context.Context, cfg *config.Config) *cobra.Command { + cmd := &cobra.Command{ + Use: "start", + Short: "Start EKS tunnel sessions for an environment", + } + + // Create a subcommand for each environment + for _, e := range cfg.Environments { + env := e + cmd.AddCommand(&cobra.Command{ + Use: env.Name, + Short: fmt.Sprintf("Start EKS tunnel sessions for %s", env.Name), + RunE: func(cmd *cobra.Command, args []string) error { + return runSessionStart(ctx, cfg, env) + }, + }) + } + + return cmd +} + +func eksSessionStopCommand(ctx context.Context, cfg *config.Config) *cobra.Command { + cmd := &cobra.Command{ + Use: "stop [environment]", + Short: "Stop EKS tunnel sessions (all if no environment specified)", + RunE: func(cmd *cobra.Command, args []string) error { + if len(args) > 0 { + return runSessionStopEnv(args[0]) + } + return runSessionStopAll() + }, + } + + return cmd +} + +func eksSessionStatusCommand() *cobra.Command { + return &cobra.Command{ + Use: "status", + Short: "Show active EKS tunnel sessions", + RunE: func(cmd *cobra.Command, args []string) error { + return runSessionStatus() + }, + } +} + +func runSessionStart(ctx context.Context, cfg *config.Config, env config.Environment) error { + // Check dependencies + missing := eks.CheckDependencies() + if len(missing) > 0 { + out.ErrorFHighlight("Missing required dependencies:") + for _, dep := range missing { + out.ErrorFHighlight(" ✗ %s (%s) - %s", dep.Name, dep.Command, dep.InstallHint) + } + return fmt.Errorf("install missing dependencies before continuing") + } + + profile := cfg.GetProfile(env.Name) + + out.InfoFHighlight("Starting EKS session for environment: %s", env.Name) + + // Discover bastion + out.Info(" Discovering bastion...") + bastion, err := eks.FindBastion(ctx, profile) + if err != nil { + return fmt.Errorf("bastion discovery failed: %w", err) + } + out.InfoFHighlight(" ✓ Found bastion: %s (%s)", bastion.Name, bastion.InstanceID) + + // Discover clusters + out.Info(" Discovering EKS clusters...") + clusters, err := eks.FindClusters(ctx, profile) + if err != nil { + return fmt.Errorf("cluster discovery failed: %w", err) + } + out.InfoFHighlight(" ✓ Found %s cluster(s)", fmt.Sprintf("%d", len(clusters))) + for _, c := range clusters { + out.InfoFHighlight(" - %s", c.Name) + } + + // Setup signal handler for graceful cleanup + sigChan := make(chan os.Signal, 1) + signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM) + go func() { + <-sigChan + out.Warn("\n Caught interrupt, cleaning up...") + runSessionStopAll() + os.Exit(0) + }() + + // Ensure tunnel directory exists + if err := eks.EnsureTunnelDir(); err != nil { + return fmt.Errorf("failed to create tunnel directory: %w", err) + } + + // Ensure sudo credentials are cached before any privileged operations + if err := eks.EnsureSudo("binding port 443 (socat), updating /etc/hosts, and creating loopback aliases"); err != nil { + return err + } + + // Clean up any stale tunnels before starting + eks.CleanupStaleTunnels() + + // Start tunnels for each cluster + for _, cluster := range clusters { + // Allocate an available loopback IP + loopbackIP, err := eks.AllocateLoopbackIP() + if err != nil { + out.WarnFHighlight(" ⚠ %s", err.Error()) + continue + } + + out.InfoFHighlight(" Setting up tunnel for: %s", cluster.Name) + + // Check if tunnel already running and healthy + existing, err := eks.LoadTunnelState(cluster.Name) + if err == nil && eks.IsProcessAlive(existing.SSMPid) && eks.IsProcessAlive(existing.SocatPid) { + out.InfoFHighlight(" Tunnel already active (SSM PID: %s)", fmt.Sprintf("%d", existing.SSMPid)) + continue + } + // If state exists but processes are dead, clean up first + if err == nil { + stopTunnel(*existing) + } + + // Resolve IPv4 via bastion + out.Info(" Resolving endpoint IPv4 via bastion...") + ipv4, err := eks.ResolveEndpointIPv4(ctx, profile, bastion.InstanceID, cluster.Endpoint) + if err != nil { + out.WarnFHighlight(" ⚠ Failed to resolve %s: %s", cluster.Name, err.Error()) + continue + } + out.InfoFHighlight(" IPv4: %s", ipv4) + + // Allocate local port + localPort, err := eks.AllocateLocalPort() + if err != nil { + out.WarnFHighlight(" ⚠ %s", err.Error()) + continue + } + out.InfoFHighlight(" Local port: %s", fmt.Sprintf("%d", localPort)) + + // Start SSM port forward + out.Info(" Starting SSM session...") + ssmPid, err := eks.StartSSMPortForward(bastion.InstanceID, ipv4, localPort, profile) + if err != nil { + out.WarnFHighlight(" ⚠ SSM session failed: %s", err.Error()) + continue + } + out.InfoFHighlight(" SSM session established (PID: %s)", fmt.Sprintf("%d", ssmPid)) + + // Ensure loopback alias + if err := eks.EnsureLoopbackAlias(loopbackIP); err != nil { + out.WarnFHighlight(" ⚠ Failed to create loopback alias: %s", err.Error()) + eks.KillProcess(ssmPid, false) + continue + } + + // Start socat + out.InfoFHighlight(" Starting socat (%s:443 → 127.0.0.1:%s)...", loopbackIP, fmt.Sprintf("%d", localPort)) + socatPid, err := eks.StartSocat(loopbackIP, localPort) + if err != nil { + out.WarnFHighlight(" ⚠ socat failed: %s", err.Error()) + eks.KillProcess(ssmPid, false) + continue + } + + // Add hosts entry + if err := eks.AddHostsEntry(loopbackIP, cluster.Endpoint, cluster.Name); err != nil { + out.WarnFHighlight(" ⚠ Failed to update /etc/hosts: %s", err.Error()) + eks.KillProcess(ssmPid, false) + eks.KillProcess(socatPid, true) + continue + } + + // Save state + state := eks.TunnelState{ + ClusterName: cluster.Name, + SSMPid: ssmPid, + SocatPid: socatPid, + Endpoint: cluster.Endpoint, + LoopbackIP: loopbackIP, + IPv4: ipv4, + LocalPort: localPort, + } + if err := eks.SaveTunnelState(state); err != nil { + out.WarnFHighlight(" ⚠ Failed to save tunnel state: %s", err.Error()) + } + + out.InfoFHighlight(" ✓ Tunnel active: %s → %s:443 → bastion → %s:443", cluster.Endpoint, loopbackIP, ipv4) + } + + // Flush DNS cache + eks.FlushDNSCache() + + // Update kubeconfig for each cluster + out.Info(" Updating kubeconfig...") + for _, cluster := range clusters { + msg, err := eks.UpdateKubeconfig(cluster.Name, profile) + if err != nil { + out.WarnFHighlight(" ⚠ Failed to configure %s: %s", cluster.Name, err.Error()) + continue + } + out.InfoFHighlight(" ✓ %s", msg) + } + + out.Info(" ✓ All tunnels active. kubectl, Terraform, and k9s can route through the bastion.") + out.Info(" All access is auditable via CloudTrail SSM session logs.") + out.Info("") + out.Info(" Run 'dp eks session status' to check tunnel health.") + out.Info(" Run 'dp eks session stop' to tear down all tunnels.") + + return nil +} + +func runSessionStopEnv(environment string) error { + tunnels, err := eks.ListActiveTunnels() + if err != nil { + return err + } + + found := false + for _, t := range tunnels { + if strings.Contains(t.ClusterName, environment) { + if !found { + // Only prompt for sudo on first match + if err := eks.EnsureSudo("killing socat processes and cleaning /etc/hosts"); err != nil { + return err + } + } + stopTunnel(t) + found = true + } + } + + if !found { + out.WarnFHighlight(" No active tunnels found for environment: %s", environment) + } + + eks.FlushDNSCache() + return nil +} + +func runSessionStopAll() error { + tunnels, err := eks.ListActiveTunnels() + if err != nil { + return err + } + + if len(tunnels) == 0 { + out.Info(" No active tunnels") + return nil + } + + if err := eks.EnsureSudo("killing socat processes and cleaning /etc/hosts"); err != nil { + return err + } + + out.Info("Stopping all EKS tunnels...") + for _, t := range tunnels { + stopTunnel(t) + } + + eks.FlushDNSCache() + out.Info(" ✓ All tunnels stopped") + return nil +} + +func stopTunnel(t eks.TunnelState) { + out.InfoFHighlight(" Stopping tunnel: %s", t.ClusterName) + + if eks.IsProcessAlive(t.SocatPid) { + eks.KillProcess(t.SocatPid, true) + out.InfoFHighlight(" Killed socat (PID: %s)", fmt.Sprintf("%d", t.SocatPid)) + } + if eks.IsProcessAlive(t.SSMPid) { + eks.KillProcess(t.SSMPid, false) + out.InfoFHighlight(" Killed SSM session (PID: %s)", fmt.Sprintf("%d", t.SSMPid)) + } + if t.Endpoint != "" { + eks.RemoveHostsEntry(t.Endpoint) + out.Info(" Removed /etc/hosts entry") + } + eks.CleanupTunnelState(t.ClusterName) +} + +func runSessionStatus() error { + tunnels, err := eks.ListActiveTunnels() + if err != nil { + return err + } + + if len(tunnels) == 0 { + out.Info(" No active EKS tunnels") + return nil + } + + out.Info("Checking EKS tunnel status...") + + for _, t := range tunnels { + ssmAlive := eks.IsProcessAlive(t.SSMPid) + socatAlive := eks.IsProcessAlive(t.SocatPid) + + ssmStatus := "RUNNING" + socatStatus := "RUNNING" + if !ssmAlive { + ssmStatus = "DEAD" + } + if !socatAlive { + socatStatus = "DEAD" + } + + out.InfoFHighlight(" Cluster: %s", t.ClusterName) + if ssmAlive { + out.InfoFHighlight(" SSM: %s (PID: %s, port: %s)", ssmStatus, fmt.Sprintf("%d", t.SSMPid), fmt.Sprintf("%d", t.LocalPort)) + } else { + out.ErrorFHighlight(" SSM: %s (PID: %s, port: %s)", ssmStatus, fmt.Sprintf("%d", t.SSMPid), fmt.Sprintf("%d", t.LocalPort)) + } + if socatAlive { + out.InfoFHighlight(" socat: %s (PID: %s, %s:443)", socatStatus, fmt.Sprintf("%d", t.SocatPid), t.LoopbackIP) + } else { + out.ErrorFHighlight(" socat: %s (PID: %s, %s:443)", socatStatus, fmt.Sprintf("%d", t.SocatPid), t.LoopbackIP) + } + out.InfoFHighlight(" Endpoint: %s", t.Endpoint) + + // End-to-end connectivity check: curl the EKS API via the tunnel + apiStatus := "SKIPPED" + if ssmAlive && socatAlive && t.Endpoint != "" { + apiReachable := eks.CheckAPIConnectivity(t.Endpoint) + if apiReachable { + apiStatus = "REACHABLE" + } else { + apiStatus = "UNREACHABLE" + } + if apiReachable { + out.InfoFHighlight(" API: %s", apiStatus) + } else { + out.ErrorFHighlight(" API: %s (tunnel may be stale)", apiStatus) + } + } else { + out.WarnFHighlight(" API: %s (processes not healthy)", apiStatus) + } + + out.InfoFHighlight(" Route: %s → %s:443 → 127.0.0.1:%s → bastion → EKS", t.Endpoint, t.LoopbackIP, fmt.Sprintf("%d", t.LocalPort)) + } + + return nil +} diff --git a/command/remote2_access.go b/command/remote2_access.go new file mode 100644 index 0000000..5207efb --- /dev/null +++ b/command/remote2_access.go @@ -0,0 +1,272 @@ +package command + +import ( + "context" + "encoding/json" + "fmt" + "time" + + "github.com/ONSdigital/dp-cli/aws" + "github.com/ONSdigital/dp-cli/config" + "github.com/ONSdigital/dp-cli/out" + "github.com/spf13/cobra" +) + +// buildRemoteAccessPayload collects user/IPs from cfg and builds the JSON payload. +// action must be one of: "add", "revoke". +func buildRemoteAccessPayload(cfg *config.Config, action string) ([]byte, error) { + if cfg.UserName == nil || len(*cfg.UserName) == 0 { + return nil, fmt.Errorf("no user provided (use --user)") + } + + // If explicit IPs were provided, use only those (avoid external lookup) + ipv4 := "" + if cfg.IPv4Address != nil && len(*cfg.IPv4Address) > 0 { + ipv4 = *cfg.IPv4Address + } + ipv6 := "" + if cfg.IPv6Address != nil && len(*cfg.IPv6Address) > 0 { + ipv6 = *cfg.IPv6Address + } + if ipv4 == "" && ipv6 == "" { + var err error + ipv4, ipv6, err = cfg.GetMyIPs2() + if err != nil { + return nil, err + } + } + + ips := make([]string, 0) + if len(ipv4) > 0 { + ips = append(ips, ipv4) + } + if len(ipv6) > 0 { + ips = append(ips, ipv6) + } + if len(ips) == 0 { + return nil, fmt.Errorf("no IPv4 or IPv6 address resolved") + } + + payload := map[string]interface{}{ + "action": action, + "user": *cfg.UserName, + "ips": ips, + } + + return json.Marshal(payload) +} + +// LambdaResult represents a single item from the Lambda response array +type LambdaResult struct { + Status string `json:"status"` + Message string `json:"message"` + ResourceID string `json:"resource_id"` + ResourceType string `json:"resource_type"` + Action string `json:"action"` + TableUpdated bool `json:"table_updated"` + SGGroupName string `json:"sg_group_name,omitempty"` + CloudflareListName string `json:"cloudflare_list_name,omitempty"` + IPVersion string `json:"ip_version,omitempty"` + ExpiresAt *int64 `json:"expires_at,omitempty"` + RevokedAt *int64 `json:"revoked_at,omitempty"` +} + +func formatUnix(ts int64) string { + // dd/mm/yyyy hh:mm:ss + return time.Unix(ts, 0).Local().Format("02/01/2006 15:04:05") +} + +func renderLambdaResults(lvl out.Level, envName string, user string, results []LambdaResult) { + for _, r := range results { + verb := "allowing" + if r.Action == "revoke" { + verb = "denying" + } + + // Resource label: prefer SG group name for SGs, otherwise type(id) + label := r.ResourceType + switch r.ResourceType { + case "sg": + if r.SGGroupName != "" { + label = fmt.Sprintf("sg %s (%s)", r.SGGroupName, r.ResourceID) + } else { + label = fmt.Sprintf("sg (%s)", r.ResourceID) + } + case "cloudflare": + if r.CloudflareListName != "" { + label = fmt.Sprintf("cloudflare %s (%s)", r.CloudflareListName, r.ResourceID) + } else { + label = fmt.Sprintf("cloudflare (%s)", r.ResourceID) + } + default: + if r.ResourceID != "" { + label = fmt.Sprintf("%s (%s)", r.ResourceType, r.ResourceID) + } + } + + // Optional timestamp suffix + suffix := "" + if r.ExpiresAt != nil { + suffix = fmt.Sprintf(" (expires: %s)", formatUnix(*r.ExpiresAt)) + } else if r.RevokedAt != nil { + suffix = fmt.Sprintf(" (revoked: %s)", formatUnix(*r.RevokedAt)) + } + + // Compose line similar to existing output, using the Lambda message + // Example: [dp] allowing bob via sandbox - remote-allow-test-1 (sg-...) (expires: ...) + out.Highlight(lvl, "%s %s via %s - %s %s%s", verb, user, envName, label, r.Message, suffix) + } +} + +// remote2Access creates the new remote2 command for the new remote allow service. +func remote2Access(ctx context.Context, cfg *config.Config) *cobra.Command { + cmd := &cobra.Command{ + Use: "remote2", + Short: "(NEW) Allow or deny remote access to environment using the new remote allow service", + } + ipv4Default := "" + if cfg.IPv4Address != nil { + ipv4Default = *cfg.IPv4Address + } + ipv4Flag := cmd.PersistentFlags().String("ipv4", ipv4Default, "The IPv4 address for remote2 sub-commands") + if ipv4Flag != nil { + cfg.IPv4Address = ipv4Flag + } + + ipv6Default := "" + if cfg.IPv6Address != nil { + ipv6Default = *cfg.IPv6Address + } + ipv6Flag := cmd.PersistentFlags().String("ipv6", ipv6Default, "The IPv6 address for remote2 sub-commands") + if ipv6Flag != nil { + cfg.IPv6Address = ipv6Flag + } + + userDefault := "" + if cfg.UserName != nil { + userDefault = *cfg.UserName + } + userFlag := cmd.PersistentFlags().String("user", userDefault, "The user for access lists") + if userFlag != nil { + cfg.UserName = userFlag + } + + cmd.AddCommand(remote2AllowCommand(ctx, cfg)) + cmd.AddCommand(remote2DenyCommand(ctx, cfg)) + + return cmd +} + +// remote2AllowCommand creates the allow subcommand for remote2 +func remote2AllowCommand(ctx context.Context, cfg *config.Config) *cobra.Command { + + cmd := &cobra.Command{ + Use: "allow", + Short: "allow access to environment", + } + + envSubCmds := make([]*cobra.Command, 0) + + // create subcommands for each environment from the config + for _, e := range cfg.Environments { + env := e + envSubCmds = append(envSubCmds, &cobra.Command{ + Use: e.Name, + Short: "allow access to " + env.Name, + RunE: func(cmd *cobra.Command, args []string) error { + lvl := out.Level(out.GetLevel(env)) + + // Build payload + payload, err := buildRemoteAccessPayload(cfg, "add") + if err != nil { + out.Warn(fmt.Sprintf("Warning: %v. Aborting allow.", err)) + return nil + } + + // Lambda function name pattern: dis--remote-allow + functionName := fmt.Sprintf("dis-%s-remote-allow", env.Name) + + out.Highlight(lvl, "invoking lambda %s for %s", functionName, env.Name) + resp, err := aws.InvokeLambda(ctx, cfg.GetProfile(env.Name), functionName, payload) + if err != nil { + return fmt.Errorf("lambda invoke failed: %w", err) + } + // Parse and render response as highlighted lines + var results []LambdaResult + if err := json.Unmarshal([]byte(resp), &results); err != nil { + // Fallback to raw output if not JSON array + cmd.Printf("%s\n", resp) + return nil + } + user := "" + if cfg.UserName != nil { + user = *cfg.UserName + } + renderLambdaResults(lvl, env.Name, user, results) + return nil + }, + }) + } + + if len(envSubCmds) == 0 { + out.Warn("Warning: No subcommands found for envs - missing envs in config?") + } + + cmd.AddCommand(envSubCmds...) + return cmd +} + +// remote2DenyCommand creates the deny subcommand for remote2 +func remote2DenyCommand(ctx context.Context, cfg *config.Config) *cobra.Command { + cmd := &cobra.Command{ + Use: "deny", + Short: "deny access to environment", + } + + envSubCmds := make([]*cobra.Command, 0) + + for _, e := range cfg.Environments { + env := e + envSubCmds = append(envSubCmds, &cobra.Command{ + Use: e.Name, + Short: "deny access to " + env.Name, + RunE: func(cmd *cobra.Command, args []string) error { + lvl := out.Level(out.GetLevel(env)) + + // Build payload with action revoke + payload, err := buildRemoteAccessPayload(cfg, "revoke") + if err != nil { + out.Warn(fmt.Sprintf("Warning: %v. Aborting deny.", err)) + return nil + } + + functionName := fmt.Sprintf("dis-%s-remote-allow", env.Name) + profile := cfg.GetProfile(env.Name) + + out.Highlight(lvl, "invoking lambda %s for %s", functionName, env.Name) + resp, err := aws.InvokeLambda(ctx, profile, functionName, payload) + if err != nil { + return fmt.Errorf("lambda invoke failed: %w", err) + } + var results []LambdaResult + if err := json.Unmarshal([]byte(resp), &results); err != nil { + cmd.Printf("%s\n", resp) + return nil + } + user := "" + if cfg.UserName != nil { + user = *cfg.UserName + } + renderLambdaResults(lvl, env.Name, user, results) + return nil + }, + }) + } + + if len(envSubCmds) == 0 { + out.Warn("Warning: No subcommands found for envs - missing envs in config?") + } + + cmd.AddCommand(envSubCmds...) + return cmd +} diff --git a/command/root_command.go b/command/root_command.go index dfe92e1..a99e984 100644 --- a/command/root_command.go +++ b/command/root_command.go @@ -19,6 +19,8 @@ func Load(ctx context.Context, cfg *config.Config) (*cobra.Command, error) { root = &cobra.Command{ Use: "dp", Short: "dp is a command-line client providing handy helper tools for ONS Dissemination Platform software engineers", + // TODO: The following arg as it makes the output cleaner on errors, but needs regression testing + // SilenceUsage: true, //silence usage when an error occurs } // register the root sub-commands. @@ -40,6 +42,8 @@ func getSubCommands(ctx context.Context, cfg *config.Config) ([]*cobra.Command, generateProjectSubCommand(), spew(), remoteAccess(ctx, cfg), + remote2Access(ctx, cfg), // This is for dual running with the new remote allow lambda + eksCommand(ctx, cfg), overrideKey(), } diff --git a/config/config.go b/config/config.go index 853a5a5..a2e3b5e 100644 --- a/config/config.go +++ b/config/config.go @@ -33,6 +33,8 @@ type Config struct { SSHUser *string `yaml:"ssh-user"` UserName *string `yaml:"user-name"` IPAddress *string `yaml:"ip-address"` + IPv4Address *string `yaml:"ipv4-address"` + IPv6Address *string `yaml:"ipv6-address"` HttpOnly *bool `yaml:"http-only"` DPSetupPath string `yaml:"dp-setup-path"` NisraPath string `yaml:"dp-nisra-path"` @@ -179,6 +181,59 @@ func (cfg Config) GetMyIP() (string, error) { return string(b), nil } +// GetMyIPs2 returns both IPv4 and IPv6 addresses, checking config > cli > env, then external service if needed. +func (cfg Config) GetMyIPs2() (ipv4 string, ipv6 string, err error) { + // 1. Check config/env for explicit values + if cfg.IPv4Address != nil && len(*cfg.IPv4Address) > 0 { + ipv4 = *cfg.IPv4Address + } else if cfg.IPAddress != nil && len(*cfg.IPAddress) > 0 { // legacy fallback + ipv4 = *cfg.IPAddress + } else if ip := os.Getenv("MY_IPV4"); len(ip) > 0 { + ipv4 = ip + } + + if cfg.IPv6Address != nil && len(*cfg.IPv6Address) > 0 { + ipv6 = *cfg.IPv6Address + } else if ip := os.Getenv("MY_IPV6"); len(ip) > 0 { + ipv6 = ip + } + + // 2. If not set, fetch from external service + if ipv4 == "" { + res, err4 := httpClient.Get("https://api.ipify.org") + if err4 == nil && res.StatusCode == 200 { + b, errRead := io.ReadAll(res.Body) + res.Body.Close() + if errRead == nil { + s := string(b) + if isValidIP, _ := cfg.checkGotIP(s); isValidIP { + ipv4 = s + } + } + } + } + if ipv6 == "" { + res, err6 := httpClient.Get("https://api64.ipify.org") + if err6 == nil && res.StatusCode == 200 { + b, errRead := io.ReadAll(res.Body) + res.Body.Close() + if errRead == nil { + s := string(b) + // TODO: improve IPv6 validation + if len(s) > 0 { + ipv6 = s + } + } + } + } + + // 3. Validate at least one IP found + if ipv4 == "" && ipv6 == "" { + err = fmt.Errorf("could not determine IPv4 or IPv6 address from config, env, or external service") + } + return +} + func (env Environment) hasTag(tag string) bool { for _, eachTag := range env.Tags { if eachTag == tag { diff --git a/eks/dependencies.go b/eks/dependencies.go new file mode 100644 index 0000000..f91e7ad --- /dev/null +++ b/eks/dependencies.go @@ -0,0 +1,44 @@ +package eks + +import ( + "os/exec" +) + +// Dependency represents a required external tool +type Dependency struct { + Name string + Command string + InstallHint string +} + +// SessionDependencies lists all external tools needed for EKS tunnel operations +var SessionDependencies = []Dependency{ + { + Name: "AWS CLI", + Command: "aws", + InstallHint: "Install from https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html", + }, + { + Name: "SSM Session Manager Plugin", + Command: "session-manager-plugin", + InstallHint: "Install from https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html", + }, + { + Name: "socat", + Command: "socat", + InstallHint: "Install with: brew install socat", + }, +} + +// TODO: Currently this is for EKS but can be elevated for use across all sub cmds +// CheckDependencies verifies all required tools are available for the sub cmd. +// Returns a list of missing dependencies. +func CheckDependencies() []Dependency { + var missing []Dependency + for _, dep := range SessionDependencies { + if _, err := exec.LookPath(dep.Command); err != nil { + missing = append(missing, dep) + } + } + return missing +} diff --git a/eks/discovery.go b/eks/discovery.go new file mode 100644 index 0000000..22b14dc --- /dev/null +++ b/eks/discovery.go @@ -0,0 +1,201 @@ +package eks + +import ( + "context" + "fmt" + "strings" + "time" + + "github.com/ONSdigital/dp-cli/aws" + sdkaws "github.com/aws/aws-sdk-go-v2/aws" + "github.com/aws/aws-sdk-go-v2/service/ec2" + ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" + "github.com/aws/aws-sdk-go-v2/service/eks" + "github.com/aws/aws-sdk-go-v2/service/ssm" +) + +const ( + // BastionRoleTag is the tag value used to identify EKS bastion instances + BastionRoleTag = "eks-bastion" + // ClusterAccessTag is the tag key used to identify clusters available for tunnel access + ClusterAccessTag = "ssm-tunnel-access" +) + +// ClusterInfo holds discovered EKS cluster information +type ClusterInfo struct { + Name string + Endpoint string +} + +// BastionInfo holds discovered bastion instance information +type BastionInfo struct { + InstanceID string + Name string +} + +// FindBastion discovers the EKS bastion instance by tags. +// Scoped to the AWS account via the profile. +func FindBastion(ctx context.Context, profile string) (*BastionInfo, error) { + cfg, err := aws.GetAWSConfig(ctx, profile) + if err != nil { + return nil, fmt.Errorf("failed to load AWS config: %w", err) + } + + client := ec2.NewFromConfig(cfg) + + result, err := client.DescribeInstances(ctx, &ec2.DescribeInstancesInput{ + Filters: []ec2types.Filter{ + { + Name: sdkaws.String("tag:Role"), + Values: []string{BastionRoleTag}, + }, + { + Name: sdkaws.String("instance-state-name"), + Values: []string{"running"}, + }, + }, + }) + if err != nil { + return nil, fmt.Errorf("failed to describe instances: %w", err) + } + + for _, reservation := range result.Reservations { + for _, instance := range reservation.Instances { + if instance.InstanceId == nil { + continue + } + name := "" + for _, tag := range instance.Tags { + if tag.Key != nil && *tag.Key == "Name" && tag.Value != nil { + name = *tag.Value + } + } + return &BastionInfo{ + InstanceID: *instance.InstanceId, + Name: name, + }, nil + } + } + + return nil, fmt.Errorf("no running EKS bastion found (looking for tag Role=%s)", BastionRoleTag) +} + +// FindClusters discovers EKS clusters available for tunnel access. +// Scoped to the AWS account via the profile — uses ssm-tunnel-access tag as opt-in. +func FindClusters(ctx context.Context, profile string) ([]ClusterInfo, error) { + cfg, err := aws.GetAWSConfig(ctx, profile) + if err != nil { + return nil, fmt.Errorf("failed to load AWS config: %w", err) + } + + client := eks.NewFromConfig(cfg) + + listResult, err := client.ListClusters(ctx, &eks.ListClustersInput{}) + if err != nil { + return nil, fmt.Errorf("failed to list EKS clusters: %w", err) + } + + var clusters []ClusterInfo + for _, clusterName := range listResult.Clusters { + descResult, err := client.DescribeCluster(ctx, &eks.DescribeClusterInput{ + Name: sdkaws.String(clusterName), + }) + if err != nil { + continue + } + + cluster := descResult.Cluster + if cluster == nil || cluster.Tags == nil { + continue + } + + accessTag, hasAccess := cluster.Tags[ClusterAccessTag] + if !hasAccess || accessTag != "true" { + continue + } + + endpoint := "" + if cluster.Endpoint != nil { + endpoint = strings.TrimPrefix(*cluster.Endpoint, "https://") + } + + clusters = append(clusters, ClusterInfo{ + Name: clusterName, + Endpoint: endpoint, + }) + } + + if len(clusters) == 0 { + return nil, fmt.Errorf("no EKS clusters found with tag %s=true", ClusterAccessTag) + } + + return clusters, nil +} + +// ResolveEndpointIPv4 resolves an EKS endpoint to its IPv4 address via the bastion using SSM RunCommand +func ResolveEndpointIPv4(ctx context.Context, profile, bastionID, endpoint string) (string, error) { + cfg, err := aws.GetAWSConfig(ctx, profile) + if err != nil { + return "", fmt.Errorf("failed to load AWS config: %w", err) + } + + client := ssm.NewFromConfig(cfg) + + sendResult, err := client.SendCommand(ctx, &ssm.SendCommandInput{ + InstanceIds: []string{bastionID}, + DocumentName: sdkaws.String("AWS-RunShellScript"), + Parameters: map[string][]string{ + "commands": {fmt.Sprintf("dig +short A %s | head -1", endpoint)}, + }, + }) + if err != nil { + return "", fmt.Errorf("failed to send command to bastion: %w", err) + } + + if sendResult.Command == nil || sendResult.Command.CommandId == nil { + return "", fmt.Errorf("no command ID returned from SSM") + } + + commandID := *sendResult.Command.CommandId + + var ipv4 string + for i := 0; i < 10; i++ { + select { + case <-ctx.Done(): + return "", ctx.Err() + default: + } + + if i == 0 { + time.Sleep(3 * time.Second) + } else { + time.Sleep(2 * time.Second) + } + + getResult, err := client.GetCommandInvocation(ctx, &ssm.GetCommandInvocationInput{ + CommandId: sdkaws.String(commandID), + InstanceId: sdkaws.String(bastionID), + }) + if err != nil { + continue + } + + if getResult.Status == "Success" && getResult.StandardOutputContent != nil { + ipv4 = strings.TrimSpace(*getResult.StandardOutputContent) + break + } + if getResult.Status == "Failed" || getResult.Status == "Cancelled" { + stderr := "" + if getResult.StandardErrorContent != nil { + stderr = *getResult.StandardErrorContent + } + return "", fmt.Errorf("command failed on bastion: %s", stderr) + } + } + + if ipv4 == "" { + return "", fmt.Errorf("failed to resolve IPv4 for %s (timed out waiting for bastion response)", endpoint) + } + + return ipv4, nil +} diff --git a/eks/kubeconfig.go b/eks/kubeconfig.go new file mode 100644 index 0000000..9639101 --- /dev/null +++ b/eks/kubeconfig.go @@ -0,0 +1,23 @@ +package eks + +import ( + "os/exec" + "strings" +) + +// UpdateKubeconfig runs aws eks update-kubeconfig for the given cluster. +// Region is determined by the AWS profile configuration. +func UpdateKubeconfig(clusterName, profile string) (string, error) { + args := []string{ + "eks", "update-kubeconfig", + "--name", clusterName, + "--alias", clusterName, + } + if profile != "" { + args = append(args, "--profile", profile) + } + + cmd := exec.Command("aws", args...) + output, err := cmd.CombinedOutput() + return strings.TrimSpace(string(output)), err +} diff --git a/eks/sudo.go b/eks/sudo.go new file mode 100644 index 0000000..7e029b3 --- /dev/null +++ b/eks/sudo.go @@ -0,0 +1,34 @@ +package eks + +import ( + "fmt" + "os" + "os/exec" +) + +// Prompts the user with an explanation and caches sudo credentials using -v (validate flag). +// This should be called once at the start of operations that require elevated privileges. +// Subsequent sudo calls within the timeout window (typically 5 minutes) won't prompt again. +func EnsureSudo(reason string) error { + // Check if sudo is already cached (no prompt needed) + if exec.Command("sudo", "-n", "true").Run() == nil { + return nil + } + + // Explain why we need sudo + fmt.Println() + fmt.Printf(" sudo access required: %s\n", reason) + fmt.Println(" Enter your password when prompted.") + fmt.Println() + + // Run sudo -v to cache credentials (this will prompt) + cmd := exec.Command("sudo", "-v") + cmd.Stdin = os.Stdin + cmd.Stdout = os.Stdout + cmd.Stderr = os.Stderr + if err := cmd.Run(); err != nil { + return fmt.Errorf("failed to obtain sudo access: %w", err) + } + + return nil +} diff --git a/eks/tunnel.go b/eks/tunnel.go new file mode 100644 index 0000000..6c8f0a3 --- /dev/null +++ b/eks/tunnel.go @@ -0,0 +1,413 @@ +package eks + +import ( + "fmt" + "net" + "os" + "os/exec" + "path/filepath" + "runtime" + "strconv" + "strings" + "syscall" + "time" +) + +const ( + tunnelDir = "/tmp/eks-tunnels-ssm" + hostsMarker = "# EKS-TUNNEL-MANAGED" + basePort = 9443 + maxPort = 9500 +) + +// TunnelState represents the persisted state of an active tunnel +type TunnelState struct { + ClusterName string + SSMPid int + SocatPid int + Endpoint string + LoopbackIP string + IPv4 string + LocalPort int +} + +// EnsureTunnelDir creates the tunnel state directory if it doesn't exist +func EnsureTunnelDir() error { + return os.MkdirAll(tunnelDir, 0755) +} + +// AllocateLocalPort finds an available port in the range 9443-9500 +func AllocateLocalPort() (int, error) { + for port := basePort; port < maxPort; port++ { + cmd := exec.Command("lsof", "-i", fmt.Sprintf(":%d", port)) + if err := cmd.Run(); err != nil { + // lsof returns non-zero if port is not in use + return port, nil + } + } + return 0, fmt.Errorf("no available ports in range %d-%d", basePort, maxPort) +} + +// AllocateLoopbackIP finds the next available loopback IP where port 443 is not in use. +// Checks 127.0.0.1 through 127.0.0.254. +// Uses sudo lsof to detect root-owned processes (socat runs as root). +func AllocateLoopbackIP() (string, error) { + for i := 1; i <= 254; i++ { + ip := fmt.Sprintf("127.0.0.%d", i) + // Check if port 443 is already bound on this IP (need sudo to see root processes) + cmd := exec.Command("sudo", "lsof", "-i", fmt.Sprintf("@%s:443", ip)) + if err := cmd.Run(); err != nil { + // lsof returns non-zero if nothing is bound — this IP is available + return ip, nil + } + } + return "", fmt.Errorf("no available loopback IP found (all 127.0.0.1-254 have port 443 in use)") +} + +// EnsureLoopbackAlias creates a loopback alias for addresses other than 127.0.0.1 +func EnsureLoopbackAlias(ip string) error { + if ip == "127.0.0.1" { + return nil + } + + switch runtime.GOOS { + case "darwin": + out, err := exec.Command("ifconfig", "lo0").Output() + if err != nil { + return fmt.Errorf("failed to check loopback interfaces: %w", err) + } + if strings.Contains(string(out), "inet "+ip+" ") { + return nil + } + cmd := exec.Command("sudo", "ifconfig", "lo0", "alias", ip) + cmd.Stdout = os.Stdout + cmd.Stderr = os.Stderr + return cmd.Run() + case "linux": + out, err := exec.Command("ip", "addr", "show", "dev", "lo").Output() + if err != nil { + return fmt.Errorf("failed to check loopback interfaces: %w", err) + } + if strings.Contains(string(out), " "+ip+"/") { + return nil + } + cmd := exec.Command("sudo", "ip", "addr", "add", ip+"/8", "dev", "lo") + cmd.Stdout = os.Stdout + cmd.Stderr = os.Stderr + return cmd.Run() + default: + return fmt.Errorf("unsupported OS: %s", runtime.GOOS) + } +} + +// StartSSMPortForward starts an SSM port forwarding session in the background. +func StartSSMPortForward(bastionID, targetIP string, localPort int, profile string) (int, error) { + logFile := filepath.Join(tunnelDir, fmt.Sprintf("ssm-%d.log", localPort)) + + args := []string{ + "ssm", "start-session", + "--target", bastionID, + "--document-name", "AWS-StartPortForwardingSessionToRemoteHost", + "--parameters", fmt.Sprintf(`{"host":["%s"],"portNumber":["443"],"localPortNumber":["%d"]}`, targetIP, localPort), + } + if profile != "" { + args = append(args, "--profile", profile) + } + + cmd := exec.Command("aws", args...) + + f, err := os.Create(logFile) + if err != nil { + return 0, fmt.Errorf("failed to create log file: %w", err) + } + cmd.Stdout = f + cmd.Stderr = f + + if err := cmd.Start(); err != nil { + f.Close() + return 0, fmt.Errorf("failed to start SSM session: %w", err) + } + f.Close() + + // Wait for session to establish + pid := cmd.Process.Pid + for i := 0; i < 15; i++ { + time.Sleep(1 * time.Second) + content, _ := os.ReadFile(logFile) + if strings.Contains(string(content), "Waiting for connections") { + return pid, nil + } + // Check if process is still alive + if cmd.ProcessState != nil && cmd.ProcessState.Exited() { + content, _ := os.ReadFile(logFile) + return 0, fmt.Errorf("SSM session died: %s", string(content)) + } + } + + return pid, nil // Return pid even if we didn't see the message — it might still be starting +} + +// StartSocat starts socat to bind a loopback IP on port 443 to a local port +func StartSocat(loopbackIP string, localPort int) (int, error) { + listenAddr := fmt.Sprintf("TCP-LISTEN:443,bind=%s,fork,reuseaddr", loopbackIP) + targetAddr := fmt.Sprintf("TCP:127.0.0.1:%d", localPort) + + cmd := exec.Command("sudo", "socat", listenAddr, targetAddr) + cmd.Stdout = nil + cmd.Stderr = nil + // Detach from the controlling terminal so socat survives after dp exits + cmd.SysProcAttr = &syscall.SysProcAttr{ + Setpgid: true, + } + + if err := cmd.Start(); err != nil { + return 0, fmt.Errorf("failed to start socat: %w", err) + } + + // Give socat a moment to bind + time.Sleep(500 * time.Millisecond) + + // Find the actual socat PID (sudo may have forked it) + pid := findSocatPid(loopbackIP, localPort) + if pid > 0 { + return pid, nil + } + + // Fallback to the sudo PID if we can't find the socat process + return cmd.Process.Pid, nil +} + +// findSocatPid finds the PID of a socat process bound to the given IP on port 443 +// that forwards to the specified local port (to avoid matching unrelated socat instances) +func findSocatPid(loopbackIP string, localPort int) int { + // Match socat processes with both our bind address AND our target port + pattern := fmt.Sprintf("socat.*bind=%s.*127.0.0.1:%d", loopbackIP, localPort) + out, err := exec.Command("pgrep", "-f", pattern).Output() + if err != nil { + return 0 + } + lines := strings.Split(strings.TrimSpace(string(out)), "\n") + if len(lines) > 0 && lines[0] != "" { + pid, err := strconv.Atoi(lines[0]) + if err == nil { + return pid + } + } + return 0 +} + +// AddHostsEntry adds an entry to /etc/hosts for the given IP and hostname +func AddHostsEntry(ip, hostname, clusterName string) error { + // Remove any existing entry first + RemoveHostsEntry(hostname) + + entry := fmt.Sprintf("%s %s %s %s\n", ip, hostname, hostsMarker, clusterName) + cmd := exec.Command("sudo", "tee", "-a", "/etc/hosts") + cmd.Stdin = strings.NewReader(entry) + cmd.Stdout = nil // suppress tee output + return cmd.Run() +} + +// RemoveHostsEntry removes a managed hosts entry for the given hostname +func RemoveHostsEntry(hostname string) { + content, err := os.ReadFile("/etc/hosts") + if err != nil { + return + } + var filtered []string + for _, line := range strings.Split(string(content), "\n") { + if !(strings.Contains(line, hostname) && strings.Contains(line, hostsMarker)) { + filtered = append(filtered, line) + } + } + cmd := exec.Command("sudo", "tee", "/etc/hosts") + cmd.Stdin = strings.NewReader(strings.Join(filtered, "\n")) + cmd.Stdout = nil + cmd.Run() +} + +// FlushDNSCache flushes the OS DNS cache. Behaviour is OS-specific and best-effort. +func FlushDNSCache() { + switch runtime.GOOS { + case "darwin": + exec.Command("sudo", "dscacheutil", "-flushcache").Run() + exec.Command("sudo", "killall", "-HUP", "mDNSResponder").Run() + case "linux": + // Try resolvectl (systemd 239+), then systemd-resolve, then nscd — all best-effort + if exec.Command("resolvectl", "flush-caches").Run() != nil { + if exec.Command("systemd-resolve", "--flush-caches").Run() != nil { + exec.Command("sudo", "systemctl", "restart", "nscd").Run() + } + } + } +} + +// SaveTunnelState persists tunnel state to disk +func SaveTunnelState(state TunnelState) error { + prefix := filepath.Join(tunnelDir, state.ClusterName) + writes := map[string]string{ + prefix + ".ssm.pid": strconv.Itoa(state.SSMPid), + prefix + ".socat.pid": strconv.Itoa(state.SocatPid), + prefix + ".endpoint": state.Endpoint, + prefix + ".loopback": state.LoopbackIP, + prefix + ".ipv4": state.IPv4, + prefix + ".port": strconv.Itoa(state.LocalPort), + } + for path, content := range writes { + if err := os.WriteFile(path, []byte(content), 0644); err != nil { + return fmt.Errorf("failed to write %s: %w", path, err) + } + } + return nil +} + +// LoadTunnelState reads tunnel state from disk +func LoadTunnelState(clusterName string) (*TunnelState, error) { + prefix := filepath.Join(tunnelDir, clusterName) + + ssmPidBytes, err := os.ReadFile(prefix + ".ssm.pid") + if err != nil { + return nil, err + } + socatPidBytes, _ := os.ReadFile(prefix + ".socat.pid") + endpointBytes, _ := os.ReadFile(prefix + ".endpoint") + loopbackBytes, _ := os.ReadFile(prefix + ".loopback") + ipv4Bytes, _ := os.ReadFile(prefix + ".ipv4") + portBytes, _ := os.ReadFile(prefix + ".port") + + ssmPid, _ := strconv.Atoi(strings.TrimSpace(string(ssmPidBytes))) + socatPid, _ := strconv.Atoi(strings.TrimSpace(string(socatPidBytes))) + localPort, _ := strconv.Atoi(strings.TrimSpace(string(portBytes))) + + return &TunnelState{ + ClusterName: clusterName, + SSMPid: ssmPid, + SocatPid: socatPid, + Endpoint: strings.TrimSpace(string(endpointBytes)), + LoopbackIP: strings.TrimSpace(string(loopbackBytes)), + IPv4: strings.TrimSpace(string(ipv4Bytes)), + LocalPort: localPort, + }, nil +} + +// ListActiveTunnels returns all tunnel states from disk +func ListActiveTunnels() ([]TunnelState, error) { + if err := EnsureTunnelDir(); err != nil { + return nil, err + } + + matches, err := filepath.Glob(filepath.Join(tunnelDir, "*.ssm.pid")) + if err != nil { + return nil, err + } + + var tunnels []TunnelState + for _, match := range matches { + name := strings.TrimSuffix(filepath.Base(match), ".ssm.pid") + state, err := LoadTunnelState(name) + if err != nil { + continue + } + tunnels = append(tunnels, *state) + } + return tunnels, nil +} + +// IsProcessAlive checks if a process with the given PID is still running +// and is actually one of our managed processes (aws ssm or socat) +func IsProcessAlive(pid int) bool { + if pid <= 0 { + return false + } + // Get the command name for this PID + out, err := exec.Command("ps", "-p", strconv.Itoa(pid), "-o", "command=").Output() + if err != nil { + return false + } + cmd := strings.TrimSpace(string(out)) + // Verify it's one of our processes + return strings.Contains(cmd, "aws ssm") || + strings.Contains(cmd, "session-manager-plugin") || + strings.Contains(cmd, "socat") +} + +// KillProcess sends SIGTERM to a process +func KillProcess(pid int, useSudo bool) { + if pid <= 0 { + return + } + if useSudo { + exec.Command("sudo", "kill", strconv.Itoa(pid)).Run() + } else { + if p, err := os.FindProcess(pid); err == nil { + p.Kill() + } + } +} + +// CleanupTunnelState removes all state files for a cluster +func CleanupTunnelState(clusterName string) { + prefix := filepath.Join(tunnelDir, clusterName) + + // Read the port before deleting so we can clean up the log file + portBytes, _ := os.ReadFile(prefix + ".port") + port := strings.TrimSpace(string(portBytes)) + if port != "" { + os.Remove(filepath.Join(tunnelDir, fmt.Sprintf("ssm-%s.log", port))) + } + + extensions := []string{".ssm.pid", ".socat.pid", ".endpoint", ".loopback", ".ipv4", ".port"} + for _, ext := range extensions { + os.Remove(prefix + ext) + } +} + +// CleanupStaleTunnels finds and kills any orphaned socat/SSM processes from previous runs +func CleanupStaleTunnels() { + tunnels, err := ListActiveTunnels() + if err != nil { + return + } + + for _, t := range tunnels { + ssmAlive := IsProcessAlive(t.SSMPid) + socatAlive := IsProcessAlive(t.SocatPid) + + // If both are dead, just clean up state + if !ssmAlive && !socatAlive { + if t.Endpoint != "" { + RemoveHostsEntry(t.Endpoint) + } + CleanupTunnelState(t.ClusterName) + continue + } + + // If only one is alive, kill it and clean up + if socatAlive && !ssmAlive { + KillProcess(t.SocatPid, true) + if t.Endpoint != "" { + RemoveHostsEntry(t.Endpoint) + } + CleanupTunnelState(t.ClusterName) + } + if ssmAlive && !socatAlive { + KillProcess(t.SSMPid, false) + if t.Endpoint != "" { + RemoveHostsEntry(t.Endpoint) + } + CleanupTunnelState(t.ClusterName) + } + } +} + +// CheckAPIConnectivity performs an end-to-end check by opening a TCP connection +// to the EKS API endpoint on port 443. A successful dial confirms the tunnel is +// routing traffic without requiring TLS validation or valid auth credentials. +func CheckAPIConnectivity(endpoint string) bool { + conn, err := net.DialTimeout("tcp", endpoint+":443", 5*time.Second) + if err != nil { + return false + } + conn.Close() + return true +} diff --git a/go.mod b/go.mod index 7c00087..44e29b6 100644 --- a/go.mod +++ b/go.mod @@ -1,12 +1,15 @@ module github.com/ONSdigital/dp-cli -go 1.24.0 +go 1.26.0 require ( github.com/ONSdigital/log.go/v2 v2.5.1 - github.com/aws/aws-sdk-go-v2 v1.41.0 + github.com/aws/aws-sdk-go-v2 v1.41.7 github.com/aws/aws-sdk-go-v2/config v1.32.5 github.com/aws/aws-sdk-go-v2/service/ec2 v1.276.1 + github.com/aws/aws-sdk-go-v2/service/eks v1.83.0 + github.com/aws/aws-sdk-go-v2/service/lambda v1.87.0 + github.com/aws/aws-sdk-go-v2/service/ssm v1.68.6 github.com/fatih/color v1.18.0 github.com/google/go-github/v66 v66.0.0 github.com/johnnadratowski/golang-neo4j-bolt-driver v0.0.0-20200323142034-807201386efa @@ -21,10 +24,11 @@ require ( require ( github.com/ONSdigital/dp-api-clients-go/v2 v2.266.0 // indirect github.com/ONSdigital/dp-net/v3 v3.3.0 // indirect + github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.4 // indirect github.com/aws/aws-sdk-go-v2/credentials v1.19.5 // indirect github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.16 // indirect - github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.16 // indirect - github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.16 // indirect + github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.23 // indirect + github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.23 // indirect github.com/aws/aws-sdk-go-v2/internal/ini v1.8.4 // indirect github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.4 // indirect github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.16 // indirect @@ -32,7 +36,7 @@ require ( github.com/aws/aws-sdk-go-v2/service/sso v1.30.7 // indirect github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.12 // indirect github.com/aws/aws-sdk-go-v2/service/sts v1.41.5 // indirect - github.com/aws/smithy-go v1.24.0 // indirect + github.com/aws/smithy-go v1.25.1 // indirect github.com/go-logr/logr v1.4.2 // indirect github.com/go-logr/stdr v1.2.2 // indirect github.com/google/go-querystring v1.1.0 // indirect diff --git a/go.sum b/go.sum index 451c67d..ae1c64a 100644 --- a/go.sum +++ b/go.sum @@ -4,36 +4,44 @@ github.com/ONSdigital/dp-net/v3 v3.3.0 h1:NAH9z+nvbJxoK6OnDpOyJJ+52dqBhVtaugk5bq github.com/ONSdigital/dp-net/v3 v3.3.0/go.mod h1:ur4LLCvd2xW2jpa785pElE6HB2bPvszZxdAjqv0XFGg= github.com/ONSdigital/log.go/v2 v2.5.1 h1:GCM270UHSP5+mv4OaQ2oHiWp0FiSgfnM6imW2zpAslw= github.com/ONSdigital/log.go/v2 v2.5.1/go.mod h1:KZNEweCUHD8dKwhlvoRvgd2Y2aUIuU3H9/MmbFyVzW8= -github.com/aws/aws-sdk-go-v2 v1.41.0 h1:tNvqh1s+v0vFYdA1xq0aOJH+Y5cRyZ5upu6roPgPKd4= -github.com/aws/aws-sdk-go-v2 v1.41.0/go.mod h1:MayyLB8y+buD9hZqkCW3kX1AKq07Y5pXxtgB+rRFhz0= +github.com/aws/aws-sdk-go-v2 v1.41.7 h1:DWpAJt66FmnnaRIOT/8ASTucrvuDPZASqhhLey6tLY8= +github.com/aws/aws-sdk-go-v2 v1.41.7/go.mod h1:4LAfZOPHNVNQEckOACQx60Y8pSRjIkNZQz1w92xpMJc= +github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.4 h1:489krEF9xIGkOaaX3CE/Be2uWjiXrkCH6gUX+bZA/BU= +github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.4/go.mod h1:IOAPF6oT9KCsceNTvvYMNHy0+kMF8akOjeDvPENWxp4= github.com/aws/aws-sdk-go-v2/config v1.32.5 h1:pz3duhAfUgnxbtVhIK39PGF/AHYyrzGEyRD9Og0QrE8= github.com/aws/aws-sdk-go-v2/config v1.32.5/go.mod h1:xmDjzSUs/d0BB7ClzYPAZMmgQdrodNjPPhd6bGASwoE= github.com/aws/aws-sdk-go-v2/credentials v1.19.5 h1:xMo63RlqP3ZZydpJDMBsH9uJ10hgHYfQFIk1cHDXrR4= github.com/aws/aws-sdk-go-v2/credentials v1.19.5/go.mod h1:hhbH6oRcou+LpXfA/0vPElh/e0M3aFeOblE1sssAAEk= github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.16 h1:80+uETIWS1BqjnN9uJ0dBUaETh+P1XwFy5vwHwK5r9k= github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.16/go.mod h1:wOOsYuxYuB/7FlnVtzeBYRcjSRtQpAW0hCP7tIULMwo= -github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.16 h1:rgGwPzb82iBYSvHMHXc8h9mRoOUBZIGFgKb9qniaZZc= -github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.16/go.mod h1:L/UxsGeKpGoIj6DxfhOWHWQ/kGKcd4I1VncE4++IyKA= -github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.16 h1:1jtGzuV7c82xnqOVfx2F0xmJcOw5374L7N6juGW6x6U= -github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.16/go.mod h1:M2E5OQf+XLe+SZGmmpaI2yy+J326aFf6/+54PoxSANc= +github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.23 h1:GpT/TrnBYuE5gan2cZbTtvP+JlHsutdmlV2YfEyNde0= +github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.23/go.mod h1:xYWD6BS9ywC5bS3sz9Xh04whO/hzK2plt2Zkyrp4JuA= +github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.23 h1:bpd8vxhlQi2r1hiueOw02f/duEPTMK59Q4QMAoTTtTo= +github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.23/go.mod h1:15DfR2nw+CRHIk0tqNyifu3G1YdAOy68RftkhMDDwYk= github.com/aws/aws-sdk-go-v2/internal/ini v1.8.4 h1:WKuaxf++XKWlHWu9ECbMlha8WOEGm0OUEZqm4K/Gcfk= github.com/aws/aws-sdk-go-v2/internal/ini v1.8.4/go.mod h1:ZWy7j6v1vWGmPReu0iSGvRiise4YI5SkR3OHKTZ6Wuc= github.com/aws/aws-sdk-go-v2/service/ec2 v1.276.1 h1:P7db/Z55pXvwnueLuHUuVlxnqjbAtiadm01+QIC42OA= github.com/aws/aws-sdk-go-v2/service/ec2 v1.276.1/go.mod h1:Wg68QRgy2gEGGdmTPU/UbVpdv8sM14bUZmF64KFwAsY= +github.com/aws/aws-sdk-go-v2/service/eks v1.83.0 h1:mS5rkyFt+NYryy0p4n8o80tJjBmXiQrRCQjP8jZcSLY= +github.com/aws/aws-sdk-go-v2/service/eks v1.83.0/go.mod h1:JQcyECIV9iZHm+GMrWn1pTPTJYRavOVsqPvlCbjt+Fg= github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.4 h1:0ryTNEdJbzUCEWkVXEXoqlXV72J5keC1GvILMOuD00E= github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.4/go.mod h1:HQ4qwNZh32C3CBeO6iJLQlgtMzqeG17ziAA/3KDJFow= github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.16 h1:oHjJHeUy0ImIV0bsrX0X91GkV5nJAyv1l1CC9lnO0TI= github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.16/go.mod h1:iRSNGgOYmiYwSCXxXaKb9HfOEj40+oTKn8pTxMlYkRM= +github.com/aws/aws-sdk-go-v2/service/lambda v1.87.0 h1:E5UXxF3vK3JuViwKCHfTJBIiFjvE4aytSucZjI2UAlQ= +github.com/aws/aws-sdk-go-v2/service/lambda v1.87.0/go.mod h1:6f64Y1BEf6e1uCI+LtGbcZSKDK1GvgJ+iI4vP/bbE8s= github.com/aws/aws-sdk-go-v2/service/signin v1.0.4 h1:HpI7aMmJ+mm1wkSHIA2t5EaFFv5EFYXePW30p1EIrbQ= github.com/aws/aws-sdk-go-v2/service/signin v1.0.4/go.mod h1:C5RdGMYGlfM0gYq/tifqgn4EbyX99V15P2V3R+VHbQU= +github.com/aws/aws-sdk-go-v2/service/ssm v1.68.6 h1:0LPJjbSNEDHidGOXa0LfvSVbdn9/GdlJUQTgE0kFpso= +github.com/aws/aws-sdk-go-v2/service/ssm v1.68.6/go.mod h1:SrZAopBP5/lyQ6NBVXKlRp8wPIXhzBCZU98sEozmv8Y= github.com/aws/aws-sdk-go-v2/service/sso v1.30.7 h1:eYnlt6QxnFINKzwxP5/Ucs1vkG7VT3Iezmvfgc2waUw= github.com/aws/aws-sdk-go-v2/service/sso v1.30.7/go.mod h1:+fWt2UHSb4kS7Pu8y+BMBvJF0EWx+4H0hzNwtDNRTrg= github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.12 h1:AHDr0DaHIAo8c9t1emrzAlVDFp+iMMKnPdYy6XO4MCE= github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.12/go.mod h1:GQ73XawFFiWxyWXMHWfhiomvP3tXtdNar/fi8z18sx0= github.com/aws/aws-sdk-go-v2/service/sts v1.41.5 h1:SciGFVNZ4mHdm7gpD1dgZYnCuVdX1s+lFTg4+4DOy70= github.com/aws/aws-sdk-go-v2/service/sts v1.41.5/go.mod h1:iW40X4QBmUxdP+fZNOpfmkdMZqsovezbAeO+Ubiv2pk= -github.com/aws/smithy-go v1.24.0 h1:LpilSUItNPFr1eY85RYgTIg5eIEPtvFbskaFcmmIUnk= -github.com/aws/smithy-go v1.24.0/go.mod h1:LEj2LM3rBRQJxPZTB4KuzZkaZYnZPnvgIhb4pu07mx0= +github.com/aws/smithy-go v1.25.1 h1:J8ERsGSU7d+aCmdQur5Txg6bVoYelvQJgtZehD12GkI= +github.com/aws/smithy-go v1.25.1/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc= github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g= github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=