
[Review] Model Engine OnPrem Support and vLLM 0.11.1 + Model Engine Integration Fixes#1

Open
govambam wants to merge 1 commit into base-for-pr-744 from review-pr-744

Conversation

@govambam
Owner

Recreated from scaleapi#744 for Macroscope review.

Original PR: scaleapi#744 by @charlesahn-scale
Status: closed (merged)

Recreated using squash merge commit - exact merged state preserved.


…n Fixes (scaleapi#744)

* add support for on-prem

* clean up on-prem artifacts

* add back comments from initial code

* fix lint

* use ecr image repo:tag directly

* fix: isort import ordering

* fix: remove unused infra_config import

* fix: mypy type annotation errors

* fix: remove type annotation causing mypy no-redef error

* fix: mypy type errors in s3_utils.py and io.py - use botocore.config.Config

* fix: mypy typeddict-item errors - use broad type ignore

* fix: update test mocks to use get_s3_resource from s3_utils

* test: add unit tests for s3_utils, onprem_docker_repository, and onprem_queue_endpoint_resource_delegate

* style: format test files with black

* refactor: use filesystem_gateway abstraction for S3 operations

Address review feedback to use the filesystem_gateway abstraction layer
instead of directly calling get_s3_client. This ensures on-prem S3
configuration logic is properly encapsulated in the gateway.

Changes:
- Add head_object, delete_object, list_objects methods to S3FilesystemGateway
- Update S3FileStorageGateway to use self.filesystem_gateway for all S3 ops
- Remove direct import of get_s3_client from s3_file_storage_gateway
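
To make the bullet points above concrete, here is a minimal sketch of what the extended gateway interface could look like. The method names come from the commit message; the signatures and parameter names are assumptions, not the actual model-engine API:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class FilesystemGateway(ABC):
    """Hypothetical slice of the abstraction; real signatures may differ."""

    @abstractmethod
    def head_object(self, bucket: str, key: str) -> Dict[str, Any]:
        """Return object metadata without fetching the body."""

    @abstractmethod
    def delete_object(self, bucket: str, key: str) -> None:
        """Delete a single object."""

    @abstractmethod
    def list_objects(self, bucket: str, prefix: str) -> List[str]:
        """Return all keys under the prefix."""
```

With this shape, the storage gateway calls `self.filesystem_gateway.head_object(...)` and so on, and the on-prem endpoint configuration stays inside the gateway implementation.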

* fix: deduplicate S3 client config by using centralized s3_utils

Refactor open_wrapper to use get_s3_client from s3_utils instead of
duplicating the on-prem S3 configuration logic. This ensures a single
source of truth for S3 client creation across the codebase.

* fix: add pagination to list_objects to handle >1000 objects

S3 list_objects_v2 returns max 1000 objects per request. Use paginator
to iterate through all pages and return complete results. Without this
fix, directories with >1000 files would silently return truncated results.
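
The paginator pattern described above looks roughly like this (a sketch; the function name and placement are assumptions, but `get_paginator("list_objects_v2")` is the standard boto3 way to page past the 1000-object cap):

```python
def list_all_objects(s3_client, bucket: str, prefix: str) -> list:
    """Collect keys across every page; list_objects_v2 caps a single call at 1000."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page (or the whole prefix) is empty
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```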

* fix: make OnPremDockerRepository.get_image_url consistent with ECR/ACR

Include docker_repo_prefix in image URL to match behavior of ECR and ACR
implementations. Also change image_exists logging from warning to debug
to reduce log noise on every deployment.

Updated tests to mock infra_config and verify prefix handling.

* refactor: add explicit on-prem branches in dependencies.py for clarity

Add explicit elif branches for on-prem cloud provider to make it clear
that S3-based gateways are intentionally used for on-prem (with MinIO
configuration applied via s3_utils). This improves code readability
and makes the on-prem support more discoverable.

* feat: implement Redis LLEN for queue depth in OnPremQueueEndpointResourceDelegate

Replace hardcoded queue depth with actual Redis LLEN call to enable
proper autoscaling based on queue metrics. Falls back to 0 gracefully
if Redis client is unavailable.

- Add optional redis_client parameter to constructor
- Implement lazy Redis client initialization
- Add tests for both with and without Redis scenarios
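
Condensed into a sketch, the behavior the bullets describe is roughly the following (the class name is hypothetical; the real delegate lives in model-engine and also handles lazy client creation):

```python
from typing import Optional


class OnPremQueueDelegateSketch:
    """Hypothetical condensed version of the queue-depth logic described above."""

    def __init__(self, redis_client: Optional[object] = None):
        self._redis = redis_client  # the real code initializes this lazily

    def get_queue_depth(self, queue_name: str) -> int:
        if self._redis is None:
            return 0  # graceful fallback when no Redis client is available
        # LLEN returns the length of the Redis list backing the queue
        return int(self._redis.llen(queue_name))
```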

* fix: replace mutable default argument with None in _get_client

Using {} as a default argument is a Python anti-pattern that can cause
subtle bugs since the same dict instance is shared across calls.
Use Optional[Dict] = None pattern instead.
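
A generic illustration of the bug class (not the actual `_get_client` code): the default dict is created once at function-definition time, so mutations leak across calls, while the `None` pattern creates a fresh dict per call.

```python
def counter_bad(state={}):  # anti-pattern: one shared dict for every call
    state["hits"] = state.get("hits", 0) + 1
    return state


def counter_good(state=None):  # Optional[Dict] = None pattern
    if state is None:
        state = {}  # fresh dict on every call
    state["hits"] = state.get("hits", 0) + 1
    return state
```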

* refactor: extract inline import to module-level helper function

Move the infra_config import from inside the validator to a module-level
helper function _is_onprem_deployment(). This improves testability,
avoids repeated import overhead on each validation call, and follows
Python best practices for imports.

* fix: reduce excessive debug logging in s3_utils

Replace per-call debug logs with a one-time info log when S3 is
configured for on-prem. This prevents log spam from debug messages
firing on every S3 client creation.

- Extract common on-prem config to _get_onprem_client_kwargs helper
- Add _s3_config_logged flag to log endpoint only once
- Add return type annotations to get_s3_client and get_s3_resource
- Update tests to reset logging flag between tests
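
The one-time-log guard can be sketched as follows (names follow the commit message; returning a bool is an addition here purely so the behavior is observable):

```python
import logging

logger = logging.getLogger(__name__)
_s3_config_logged = False  # module-level guard, per the commit above


def log_onprem_endpoint_once(endpoint_url: str) -> bool:
    """Log the configured endpoint the first time only.

    Returns True if a log record was emitted on this call.
    """
    global _s3_config_logged
    if _s3_config_logged:
        return False
    logger.info("S3 configured for on-prem endpoint %s", endpoint_url)
    _s3_config_logged = True
    return True
```

Tests then reset `_s3_config_logged` between cases, which is why the commit mentions a reset step.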

* chore: remove unused TYPE_CHECKING import

Clean up unused import left over from refactoring the inline import.

* fix: make Dockerfile multi-arch compatible for ARM/AMD64

Use architecture detection to download the correct binaries for
aws-iam-authenticator and kubectl. This enables building the image
for both ARM64 (Mac M1/M2) and AMD64 (CI/production) platforms.
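
The architecture-detection step typically maps `uname -m` output to the token release URLs use; a sketch (the kubectl version and URL here are illustrative, not taken from the Dockerfile):

```shell
# Map `uname -m` output to the arch token used in binary download URLs.
arch_to_target() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

TARGET="$(arch_to_target "$(uname -m)")"
# In the Dockerfile this feeds the download step, e.g. (version illustrative):
#   curl -fsSLo kubectl "https://dl.k8s.io/release/v1.28.0/bin/linux/${TARGET}/kubectl"
echo "resolved target arch: ${TARGET}"
```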

* style: fix black formatting in test_onprem_queue_endpoint_resource_delegate

* fix: restore AWS_PROFILE env var fallback in s3_utils

The original code checked os.getenv('AWS_PROFILE') as a fallback when
no aws_profile kwarg was provided. This was accidentally removed during
refactoring, breaking S3 operations in CI where AWS_PROFILE may be set
via environment variable.

Restores the original behavior for AWS deployments while maintaining
the new on-prem path.
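
The restored fallback amounts to a precedence rule: an explicit `aws_profile` kwarg wins, and the `AWS_PROFILE` environment variable is consulted only when no kwarg is given. A sketch (the function name is hypothetical):

```python
import os
from typing import Optional


def resolve_aws_profile(aws_profile: Optional[str] = None) -> Optional[str]:
    """Explicit kwarg wins; otherwise fall back to the AWS_PROFILE env var."""
    if aws_profile is not None:
        return aws_profile
    return os.getenv("AWS_PROFILE")  # the fallback this commit restores
```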

* fix: correct isort ordering in s3_filesystem_gateway.py

* fix: use Literal type for s3 addressing_style to satisfy mypy
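
For context on the Literal fix: botocore's `Config` accepts `"auto"`, `"virtual"`, or `"path"` for `addressing_style`, and annotating the value with `typing.Literal` lets mypy reject anything else. A sketch (the helper name is an assumption):

```python
from typing import Dict, Literal

# The three values botocore accepts; Literal lets mypy catch typos statically.
AddressingStyle = Literal["auto", "virtual", "path"]


def s3_addressing_config(style: AddressingStyle = "path") -> Dict[str, Dict[str, str]]:
    """Build the `s3` section of a botocore Config.

    MinIO-style on-prem endpoints generally require path-style addressing.
    """
    return {"s3": {"addressing_style": style}}
```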

* Onprem Compatibility Change

---------

Co-authored-by: Tarun <tarun.ravikumar@scale.com>
@macroscopeapp

macroscopeapp Bot commented Feb 10, 2026

Add on‑prem deployment support across Model Engine and extend vLLM parsing to OpenAI formats while injecting S3_ENDPOINT_URL into all S3 and s5cmd paths

Introduce on‑prem selections for queues, Docker repository, Redis, and Postgres; centralize S3 client/resource creation with S3_ENDPOINT_URL; add s5cmd endpoint flags for model downloads; and update vLLM sync/stream parsing to accept OpenAI completion formats. Tests and Helm templates are updated accordingly.

📍Where to Start

Start with on‑prem behavior wiring and S3 endpoint handling in llm_model_endpoint_use_cases and s3_utils: see CreateLLMModelBundleV1UseCase and _get_s3_endpoint_flag in llm_model_endpoint_use_cases.py, then review S3 client/resource factories in s3_utils.py. After that, scan on‑prem wiring in dependencies.py and the Docker/image URL flow in live_endpoint_builder_service.py.


📊 Macroscope summarized a8cd1ab. 32 files reviewed, 20 issues evaluated, 2 issues filtered, 1 comment posted. View details

if infra_config().cloud_provider == "onprem":
user = os.environ.get("DB_USER", "postgres")
password = os.environ.get("DB_PASSWORD", "postgres")
host = os.environ.get("DB_HOST_RO") or os.environ.get("DB_HOST", "localhost")

🟡 Medium

db/base.py:65 The onprem branch ignores the read_only parameter when selecting the host. Consider using conditional logic like the AWS branch: creds.get("clusterHostRo") if read_only else creds.get("clusterHost"). Currently, if DB_HOST_RO is set, write operations could be routed to a read-only replica.

Suggested change
- host = os.environ.get("DB_HOST_RO") or os.environ.get("DB_HOST", "localhost")
+ host = os.environ.get("DB_HOST_RO") if read_only else os.environ.get("DB_HOST", "localhost")



