feat: Add Google Cloud Vertex AI provider support #22

Open

itdove wants to merge 15 commits into LobsterTrap:midstream from
Conversation
- Add vertex provider plugin with `ANTHROPIC_VERTEX_PROJECT_ID` credential
- Add vertex inference profile with Anthropic-compatible protocols
- Register vertex in provider registry and CLI
- Add vertex to supported inference provider types
- Fix scripts/podman.env to use correct env var names for local registry
- Update docs for simplified CLI install workflow

Known limitation: GCP OAuth authentication not yet implemented. The vertex provider can be created and configured, but API calls will fail until OAuth token generation is added.
- Note that `mise run cluster:build:full` builds AND starts the gateway
- Add verification step after build completes
- Clarify that gateway is already running before sandbox creation
- Add vertex to supported provider types table in manage-providers.md
- Add Vertex AI provider tab in inference configuration docs
- Clarify two usage modes: direct API calls vs inference.local routing
- Document prerequisites (GCP project, Application Default Credentials)
- Note OAuth limitation only affects inference routing, not direct calls
- Keep Vertex docs in provider/inference pages, not installation guides
- Add gcp_auth dependency for OAuth token generation
- Generate OAuth tokens from Application Default Credentials in vertex provider
- Store tokens as VERTEX_OAUTH_TOKEN credential for router authentication
- Update inference profile to use Bearer auth with OAuth tokens
- Construct Vertex-specific URLs with :streamRawPredict endpoint
- Support project ID from credentials for URL construction
- Add model parameter to build_backend_url for Vertex routing
Avoid tokio runtime nesting panic by spawning OAuth token generation in a separate OS thread with its own runtime. This allows provider discovery to work when called from within an existing tokio context.
…r ordering

- Delete all sandboxes before destroying gateway
- Explicitly stop and remove cluster and registry containers by name
- Remove images by specific tags (localhost/openshell/*)
- Run cargo clean for build artifacts
- Add reinstall instructions to completion message
- Better error handling with 2>/dev/null redirects
…iables

Add selective direct injection for provider credentials that need to be accessible as real environment variables (not placeholders). This allows tools like the `claude` CLI to read Vertex AI credentials directly.

Changes:
- Add direct_inject_credentials() list for credentials requiring direct access
- Modify from_provider_env() to support selective direct injection
- Inject ANTHROPIC_VERTEX_PROJECT_ID, VERTEX_OAUTH_TOKEN, and ANTHROPIC_VERTEX_REGION as actual values instead of placeholders
- Other credentials continue using openshell:resolve:env:* placeholders for HTTP proxy resolution

Security note: Directly injected credentials are visible via /proc/*/environ, unlike placeholder-based credentials which are only resolved within HTTP requests. Only credentials essential for CLI tool compatibility are included.
- Add CLAUDE_CODE_USE_VERTEX to direct injection list
- Automatically set CLAUDE_CODE_USE_VERTEX=1 in Vertex provider credentials
- Enables claude CLI to auto-detect Vertex AI without manual config

Now sandboxes with the Vertex provider will automatically have:
- ANTHROPIC_VERTEX_PROJECT_ID (from env)
- VERTEX_OAUTH_TOKEN (generated from GCP ADC)
- CLAUDE_CODE_USE_VERTEX=1 (auto-set)

The claude CLI can now use Vertex AI with zero manual configuration.
…rmance

- Change Podman machine default memory from 8 GB to 12 GB
- Update documentation to reflect 12 GB default
- Update troubleshooting to suggest 16 GB for build issues

12 GB provides better performance for Rust compilation and reduces out-of-memory issues during parallel builds.
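For an existing machine, the new default can be applied without re-running setup. A sketch using stock Podman commands, wrapped in a function so it can be read without side effects (the 12288 MiB value mirrors the 12 GB default above):

```shell
# Resize an existing Podman machine to the new 12 GB default.
# The machine must be stopped before its memory can be changed.
resize_podman_machine() {
  podman machine stop
  podman machine set --memory 12288   # value is in MiB
  podman machine start
}
```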
- Replace manual 'cargo build + cp' with 'cargo install --path'
- Add verification step with 'openshell gateway info'
- Keep correct 'mise run cluster:build:full' command
Vertex AI's :streamRawPredict endpoint expects the model in the URL path, not in the request body. The router was incorrectly inserting the model field, causing "Extra inputs are not permitted" errors.

Changes:
- Router now detects Vertex AI endpoints and removes model field
- Added bash 3 compatibility fix for cluster-deploy-fast.sh
- Added scripts/rebuild-cluster.sh for development workflow
- Updated documentation for Vertex AI setup and rebuild process

Fixes inference routing to Vertex AI via inference.local endpoint.
Added examples/vertex-ai/ directory with:
- sandbox-policy.yaml: Network policy for Vertex AI endpoints
- README.md: Quick start guide with links to full documentation

Provides a ready-to-use policy file for Vertex AI integration.
Podman does not support the --push flag in the build command like Docker buildx. This commit fixes two issues:

1. docker-build-image.sh: Filter out the --push flag and execute push as a separate command after the build completes
2. docker-publish-multiarch.sh: Use safe array expansion syntax to avoid unbound variable errors with set -u when EXTRA_TAGS is empty

Note: Multi-arch builds with Podman still require a manual workflow due to cross-compilation toolchain issues. Use /tmp/build-multiarch-local.sh for local multi-arch builds with QEMU emulation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
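The unbound-variable fix in point 2 relies on a standard bash idiom: `${arr[@]+"${arr[@]}"}` expands to the array's elements, or to nothing when the array is empty, instead of tripping `set -u`. A self-contained illustration (the tag names are invented):

```shell
set -u
EXTRA_TAGS=()   # often empty; a plain "${EXTRA_TAGS[@]}" can abort under set -u on bash 3/4.3
# The + form expands to the elements, or to nothing when unset/empty.
args=(push "img:latest" ${EXTRA_TAGS[@]+"${EXTRA_TAGS[@]}"})
echo "${#args[@]} args"   # prints "2 args"

EXTRA_TAGS=("img:v1" "img:v2")
args=(push "img:latest" ${EXTRA_TAGS[@]+"${EXTRA_TAGS[@]}"})
echo "${#args[@]} args"   # prints "4 args"
```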
…h.sh

Add Podman-specific multi-architecture build logic to complement existing Docker buildx support. Podman builds each platform sequentially using manifest lists, while Docker buildx builds in parallel.

Changes:
- Detect Podman and use manifest-based approach for multi-arch builds
- Build each platform (arm64, amd64) separately with explicit TARGETARCH
- Create and push manifest list combining all architectures
- Preserve existing Docker buildx workflow unchanged
- Add informative logging about sequential vs parallel builds

Build times:
- Podman: sequential builds (~30-40 min on Linux, ~45-60 min on macOS)
- Docker buildx: parallel builds (~20-30 min)

This enables multi-arch image publishing on systems using Podman as the container runtime, supporting both Apple Silicon and Intel architectures.
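The sequential manifest-based flow described above can be sketched with stock `podman` subcommands; the image tag and platform list are illustrative, and the sketch is wrapped in a function rather than presented as the script's actual contents:

```shell
# Sketch of the manifest-based multi-arch flow for Podman:
# build each platform separately, collect them in a manifest list, push once.
publish_multiarch() {
  local image="localhost/openshell/gateway:latest"   # illustrative tag
  podman manifest create "$image"
  for arch in arm64 amd64; do
    # --manifest adds each per-arch image to the manifest list as it builds
    podman build --platform "linux/$arch" --build-arg "TARGETARCH=$arch" \
      --manifest "$image" .
  done
  podman manifest push --all "$image" "docker://$image"
}
```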
Fix CI formatting check failures:
- Split long .insert() calls across multiple lines
- Reformat MockDiscoveryContext initialization

No functional changes, formatting only.
Summary
Adds complete support for Google Cloud Vertex AI as an inference provider, enabling OpenShell sandboxes to use Claude models via GCP Vertex AI with OAuth authentication.
This implementation includes full end-to-end testing and supports both direct Claude CLI usage and inference routing via `inference.local`.

Features

Vertex AI Provider
- OAuth token generation from Application Default Credentials
- `ANTHROPIC_VERTEX_REGION` (defaults to `us-central1`)
- Sets `CLAUDE_CODE_USE_VERTEX=1` automatically

Inference Routing
- Routes requests to the `:streamRawPredict` endpoint
- Uses the `vertex-2023-10-16` Anthropic API version
- Supports model names with the `@` separator (e.g., `claude-sonnet-4-5@20250929`)

Direct Credential Injection
- Injects `ANTHROPIC_VERTEX_PROJECT_ID`, `VERTEX_OAUTH_TOKEN`, `CLAUDE_CODE_USE_VERTEX`, and `ANTHROPIC_VERTEX_REGION` as real environment variables
- Other credentials continue to use `openshell:resolve:env:*` placeholders

Network Policy Support
- Allows `oauth2.googleapis.com`, `accounts.google.com`, and regional Vertex endpoints (`*-aiplatform.googleapis.com`)
- Supports the `inference.local` endpoint for privacy-aware routing

Changes
Core Implementation
- `crates/openshell-providers/src/providers/vertex.rs` - Vertex AI provider plugin with OAuth generation
- `crates/openshell-core/src/inference.rs` - VERTEX_PROFILE with Bearer auth and vertex API version
- `crates/openshell-server/src/inference.rs` - Vertex URL construction with project ID and region
- `crates/openshell-router/src/backend.rs` - Critical fix: removes model field from request body for Vertex AI
- `crates/openshell-sandbox/src/secrets.rs` - Direct credential injection for CLI compatibility
- `crates/openshell-providers/Cargo.toml` - Add `gcp_auth` dependency
- `crates/openshell-providers/src/lib.rs` - Register vertex provider
- `crates/openshell-cli/src/main.rs` - Add Vertex to provider type enum

Examples
- `examples/vertex-ai/sandbox-policy.yaml` - New: network policy for Vertex AI endpoints
- `examples/vertex-ai/README.md` - New: quick start guide with documentation references

Development Improvements
- `tasks/scripts/cluster-deploy-fast.sh` - Bash 3 compatibility fix (replaces `mapfile`)
- `scripts/rebuild-cluster.sh` - New: quick rebuild script for development workflow
- `scripts/setup-podman-macos.sh` - Increase default memory from 8 GB to 12 GB for better build performance
- `cleanup-openshell-podman-macos.sh` - Improved cleanup with sandbox deletion

Documentation
- `docs/sandboxes/manage-providers.md` - Updated Vertex provider documentation, removed OAuth limitation note
- `docs/inference/configure.md` - Updated Vertex AI setup guide with OAuth token generation
- `docs/get-started/install-podman-macos.md` - Added rebuild/cleanup workflow documentation
- `CONTRIBUTING.md` - Added development rebuild workflow

Usage
Prerequisites
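A sketch of the GCP setup implied by the prerequisites (GCP project plus Application Default Credentials); the project ID is a placeholder, `gcloud` must already be installed, and the function wrapper is only so the steps can be read without executing:

```shell
# One-time GCP setup for Application Default Credentials (ADC).
setup_gcp_adc() {
  gcloud auth application-default login        # writes ADC under ~/.config/gcloud/
  gcloud config set project "my-gcp-project"   # placeholder project ID
  export ANTHROPIC_VERTEX_PROJECT_ID="my-gcp-project"
  export ANTHROPIC_VERTEX_REGION="us-central1" # optional; this is the default
}
```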
Quick Start
Inference Routing (Optional)
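With routing enabled, a request through the local endpoint might look like the following; the path, port, and payload shape are hypothetical (modeled on the Anthropic Messages API) and are not confirmed by this PR. The router strips the `model` field before forwarding to Vertex:

```shell
# Hypothetical request via the inference.local router; URL path is illustrative.
query_inference_local() {
  curl -sS http://inference.local/v1/messages \
    -H 'content-type: application/json' \
    -d '{"model":"claude-sonnet-4-5@20250929","max_tokens":64,"messages":[{"role":"user","content":"ping"}]}'
}
```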
Testing
✅ Fully tested end-to-end on macOS with:
- Inference routing via `inference.local`

Key Test Results:
Technical Details
Router Fix (Critical)
The router was incorrectly inserting the `model` field into request bodies for all providers. Vertex AI's `:streamRawPredict` endpoint expects the model in the URL path, not the request body, causing "Extra inputs are not permitted" errors.

Fix: The router now detects Vertex AI endpoints (`aiplatform.googleapis.com`) and removes the model field from the request body while keeping it in the URL path.

Credential Flow
- Sets `CLAUDE_CODE_USE_VERTEX=1` in the sandbox automatically
- Generates OAuth tokens from the Application Default Credentials stored in the `~/.config/gcloud/` directory

URL Structure
Vertex AI requests go to:
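Based on the components described in this PR (region, project ID, the anthropic publisher, and a model name using the `@` separator), the endpoint follows Google's documented pattern for partner models. A bash reconstruction with illustrative values (the project ID is a placeholder):

```shell
# Construct the Vertex AI streaming endpoint from its components.
REGION="us-central1"
PROJECT_ID="my-gcp-project"          # placeholder project ID
MODEL="claude-sonnet-4-5@20250929"

URL="https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/anthropic/models/${MODEL}:streamRawPredict"
echo "$URL"
```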
The router constructs this URL and removes the `model` field from the JSON body.

Development Workflow
Rebuilding After Changes
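A sketch of the rebuild loop described elsewhere in this PR; the task and command names come from the commit messages, and the function wrapper is only so the steps can be read without executing:

```shell
# Rebuild-and-verify loop after local changes.
rebuild_after_changes() {
  mise run cluster:build:full   # builds AND starts the gateway (per this PR)
  openshell gateway info        # verify the gateway is up before creating sandboxes
}
# Alternatively, this PR adds scripts/rebuild-cluster.sh for the same workflow.
```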
Breaking Changes
None. All changes are additive.
Related Issues
Addresses the need for Vertex AI provider support for users who:
Checklist
- `cargo check` and `cargo test` succeed