[Review] Model Engine OnPrem Support and vLLM 0.11.1 + Model Engine Integration Fixes#1
govambam wants to merge 1 commit into `base-for-pr-744`
Conversation
Model Engine OnPrem Support and vLLM 0.11.1 + Model Engine Integration Fixes (scaleapi#744)

* add support for on-prem
* clean up on-prem artifacts
* add back comments from initial code
* fix lint
* use ECR image repo:tag directly
* fix: isort import ordering
* fix: remove unused infra_config import
* fix: mypy type annotation errors
* fix: remove type annotation causing mypy no-redef error
* fix: mypy type errors in s3_utils.py and io.py - use botocore.config.Config
* fix: mypy typeddict-item errors - use broad type ignore
* fix: update test mocks to use get_s3_resource from s3_utils
* test: add unit tests for s3_utils, onprem_docker_repository, and onprem_queue_endpoint_resource_delegate
* style: format test files with black
* refactor: use filesystem_gateway abstraction for S3 operations. Addresses review feedback to use the filesystem_gateway abstraction layer instead of directly calling get_s3_client, so on-prem S3 configuration logic stays encapsulated in the gateway. Adds head_object, delete_object, and list_objects methods to S3FilesystemGateway; updates S3FileStorageGateway to use self.filesystem_gateway for all S3 operations; removes the direct import of get_s3_client from s3_file_storage_gateway.
* fix: deduplicate S3 client config by using centralized s3_utils. Refactors open_wrapper to use get_s3_client from s3_utils instead of duplicating the on-prem S3 configuration logic, giving a single source of truth for S3 client creation across the codebase.
* fix: add pagination to list_objects to handle >1000 objects. S3 list_objects_v2 returns at most 1000 objects per request; use a paginator to iterate through all pages and return complete results. Without this fix, directories with >1000 files would silently return truncated listings.
* fix: make OnPremDockerRepository.get_image_url consistent with ECR/ACR. Includes docker_repo_prefix in the image URL to match the ECR and ACR implementations, and changes image_exists logging from warning to debug to reduce log noise on every deployment.
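The pagination fix above can be sketched in a self-contained way. The fake client below stands in for `boto3.client("s3")` so the example runs without AWS; the `list_objects_all` helper name is an assumption, not the PR's actual function.

```python
class _FakePaginator:
    def __init__(self, pages):
        self._pages = pages

    def paginate(self, **kwargs):
        return iter(self._pages)


class FakeS3Client:
    """Stand-in for boto3.client("s3"): splits keys into 1000-item pages,
    mimicking the list_objects_v2 response cap."""

    def __init__(self, keys, page_size=1000):
        self._keys = keys
        self._page_size = page_size

    def get_paginator(self, operation_name):
        assert operation_name == "list_objects_v2"
        pages = [
            {"Contents": [{"Key": k} for k in self._keys[i:i + self._page_size]]}
            for i in range(0, len(self._keys), self._page_size)
        ]
        return _FakePaginator(pages)


def list_objects_all(client, bucket, prefix=""):
    # list_objects_v2 caps each response at 1000 keys; the paginator
    # follows continuation pages so no keys are silently dropped.
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


client = FakeS3Client([f"models/part-{i:05d}" for i in range(2500)])
all_keys = list_objects_all(client, "my-bucket", "models/")
assert len(all_keys) == 2500  # a single un-paginated call would stop at 1000
```

With a real boto3 client the same loop applies unchanged, since `get_paginator("list_objects_v2")` handles continuation tokens internally.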
  Updated tests to mock infra_config and verify prefix handling.
* refactor: add explicit on-prem branches in dependencies.py for clarity. Adds explicit elif branches for the on-prem cloud provider to make it clear that S3-based gateways are intentionally used for on-prem (with MinIO configuration applied via s3_utils). This improves readability and makes the on-prem support more discoverable.
* feat: implement Redis LLEN for queue depth in OnPremQueueEndpointResourceDelegate. Replaces the hardcoded queue depth with an actual Redis LLEN call to enable proper autoscaling based on queue metrics; falls back to 0 gracefully if the Redis client is unavailable. Adds an optional redis_client parameter to the constructor, lazy Redis client initialization, and tests for both with- and without-Redis scenarios.
* fix: replace mutable default argument with None in _get_client. Using {} as a default argument is a Python anti-pattern that can cause subtle bugs, since the same dict instance is shared across calls. Uses the Optional[Dict] = None pattern instead.
* refactor: extract inline import to module-level helper function. Moves the infra_config import from inside the validator to a module-level helper function _is_onprem_deployment(). This improves testability, avoids repeated import overhead on each validation call, and follows Python best practices for imports.
* fix: reduce excessive debug logging in s3_utils. Replaces per-call debug logs with a one-time info log when S3 is configured for on-prem, preventing log spam from debug messages firing on every S3 client creation. Extracts common on-prem config to a _get_onprem_client_kwargs helper, adds an _s3_config_logged flag to log the endpoint only once, adds return type annotations to get_s3_client and get_s3_resource, and updates tests to reset the logging flag between tests.
* chore: remove unused TYPE_CHECKING import. Cleans up an unused import left over from refactoring the inline import.
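The mutable-default-argument fix can be demonstrated with a minimal sketch. The function names below are hypothetical stand-ins for the PR's `_get_client`; only the pattern is the point.

```python
from typing import Any, Dict, Optional


def get_client_kwargs_buggy(extra: Dict[str, Any] = {}) -> Dict[str, Any]:
    # Anti-pattern: the default dict is created once, at function
    # definition time, and shared by every call that omits `extra`.
    extra.setdefault("region_name", "us-west-2")
    return extra


def get_client_kwargs_fixed(extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Fix: default to None and build a fresh dict on each call.
    if extra is None:
        extra = {}
    extra.setdefault("region_name", "us-west-2")
    return extra


a = get_client_kwargs_buggy()
a["endpoint_url"] = "http://minio:9000"  # mutates the shared default dict
b = get_client_kwargs_buggy()
assert "endpoint_url" in b               # bug: state leaked between calls

c = get_client_kwargs_fixed()
c["endpoint_url"] = "http://minio:9000"
d = get_client_kwargs_fixed()
assert "endpoint_url" not in d           # fixed: each call is isolated
```

The leak is invisible in simple tests that call the function once, which is why this class of bug tends to surface only under reuse.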
* fix: make Dockerfile multi-arch compatible for ARM/AMD64. Uses architecture detection to download the correct binaries for aws-iam-authenticator and kubectl, enabling image builds for both ARM64 (Mac M1/M2) and AMD64 (CI/production) platforms.
* style: fix black formatting in test_onprem_queue_endpoint_resource_delegate
* fix: restore AWS_PROFILE env var fallback in s3_utils. The original code checked os.getenv('AWS_PROFILE') as a fallback when no aws_profile kwarg was provided; this was accidentally removed during refactoring, breaking S3 operations in CI where AWS_PROFILE may be set via environment variable. Restores the original behavior for AWS deployments while keeping the new on-prem path.
* fix: correct isort ordering in s3_filesystem_gateway.py
* fix: use Literal type for s3 addressing_style to satisfy mypy
* Onprem Compatibility Change

Co-authored-by: Tarun <tarun.ravikumar@scale.com>
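The restored AWS_PROFILE fallback is simple enough to sketch. The `resolve_aws_profile` name is an assumption for illustration; the real logic lives inside s3_utils.

```python
import os
from typing import Optional


def resolve_aws_profile(aws_profile: Optional[str] = None) -> Optional[str]:
    # An explicit kwarg wins; otherwise fall back to the AWS_PROFILE
    # environment variable, matching the behavior the commit restores.
    # Returns None when neither is set, in which case boto3 would use
    # its default credential chain.
    return aws_profile if aws_profile is not None else os.getenv("AWS_PROFILE")


os.environ["AWS_PROFILE"] = "ci-profile"
assert resolve_aws_profile() == "ci-profile"       # env fallback (the CI case)
assert resolve_aws_profile("explicit") == "explicit"  # kwarg takes precedence
```

The CI breakage described in the commit is exactly the first assertion: with no kwarg passed, dropping the `os.getenv` fallback made the profile silently resolve to None.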
Add on-prem deployment support across Model Engine and extend vLLM parsing to OpenAI formats while injecting …
The review comment below targets this snippet:

```python
if infra_config().cloud_provider == "onprem":
    user = os.environ.get("DB_USER", "postgres")
    password = os.environ.get("DB_PASSWORD", "postgres")
    host = os.environ.get("DB_HOST_RO") or os.environ.get("DB_HOST", "localhost")
```
🟡 Medium
model-engine/model_engine_server/db/base.py:65: The `onprem` branch ignores the `read_only` parameter when selecting the host. Consider using conditional logic like the AWS branch: `creds.get("clusterHostRo") if read_only else creds.get("clusterHost")`. Currently, if `DB_HOST_RO` is set, write operations could be routed to a read-only replica.
Suggested change:

```diff
- host = os.environ.get("DB_HOST_RO") or os.environ.get("DB_HOST", "localhost")
+ host = os.environ.get("DB_HOST_RO") if read_only else os.environ.get("DB_HOST", "localhost")
```
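A self-contained sketch of read-only-aware host selection, using the env-var names from the snippet. Falling back to `DB_HOST` when `DB_HOST_RO` is unset is an assumption here (the reviewer's one-line suggestion does not spell out that fallback); the key property is that writers never see the replica.

```python
import os


def select_db_host(read_only: bool) -> str:
    # Mirrors the AWS branch's pattern: the read-only replica host is
    # used only for read-only connections; writers always get the
    # primary. Fallback to DB_HOST when DB_HOST_RO is unset is an
    # assumption for this sketch.
    primary = os.environ.get("DB_HOST", "localhost")
    if read_only:
        return os.environ.get("DB_HOST_RO") or primary
    return primary


os.environ["DB_HOST"] = "pg-primary"
os.environ["DB_HOST_RO"] = "pg-replica"
assert select_db_host(read_only=True) == "pg-replica"
assert select_db_host(read_only=False) == "pg-primary"  # write path avoids the replica
```

Under the original `or`-based code, both calls would return `pg-replica` whenever `DB_HOST_RO` is set, which is exactly the misrouting the review flags.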
Recreated from scaleapi#744 for Macroscope review.
Original PR: scaleapi#744 by @charlesahn-scale
Status: closed (merged)
Recreated using the squash merge commit; the exact merged state is preserved.