
Conversation

Contributor

@mkaic mkaic commented Dec 23, 2025

Description

Motivation: enable only the smaller Perception Encoder model sizes on serverless.

There is already an inference implementation and workflow block for Perception Encoder, but to the best of my knowledge it is disabled on serverless because the largest checkpoint we support is large enough that it could cause problems (correct me if I'm wrong here).

This PR makes it possible to enable only the two smaller Perception Encoder checkpoints on serverless via a new environment variable, PERCEPTION_ENCODER_DISALLOWED_VERSION_IDS, which is referenced in the pydantic validators of the PerceptionEncoderInferenceRequest class.
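The gating logic can be sketched roughly as follows. This is a minimal sketch, not the actual implementation: the field name `perception_encoder_version_id` and the comma-separated parsing of the environment variable are assumptions.

```python
import os

from pydantic import BaseModel, field_validator

# Parse the disallow-list from the environment once at import time.
# The comma-separated format is an assumption for this sketch.
DISALLOWED_VERSION_IDS = {
    v.strip()
    for v in os.environ.get("PERCEPTION_ENCODER_DISALLOWED_VERSION_IDS", "").split(",")
    if v.strip()
}


class PerceptionEncoderInferenceRequest(BaseModel):
    # Hypothetical field name; the real request class has more fields.
    perception_encoder_version_id: str

    @field_validator("perception_encoder_version_id")
    @classmethod
    def check_version_allowed(cls, value: str) -> str:
        # Reject any version ID the deployment has explicitly disabled,
        # so the user gets a clear error instead of an OOM-prone load.
        if value in DISALLOWED_VERSION_IDS:
            raise ValueError(
                f"Perception Encoder version '{value}' is disabled on this server."
            )
        return value
```

With this shape, setting the environment variable to `"PE-Core-G14-448"` makes requests for that checkpoint fail validation with a descriptive error, while requests for the smaller checkpoints pass through unchanged.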

By setting CORE_MODEL_PE_ENABLED to True and PERCEPTION_ENCODER_DISALLOWED_VERSION_IDS to "PE-Core-G14-448" in roboflow-infra/gcp/serverless-inference/appstack/chart/rf-svrls/values-staging.yaml, serverless will serve PE with the exception of the 9.1 GB G14-448 variant.
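For illustration, the relevant chart values would look something like the fragment below. The surrounding key structure (`env:`) is an assumption about the chart layout, not a copy of the real values file.

```yaml
# Hypothetical fragment of values-staging.yaml; key nesting is assumed.
env:
  CORE_MODEL_PE_ENABLED: "True"
  PERCEPTION_ENCODER_DISALLOWED_VERSION_IDS: "PE-Core-G14-448"
```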

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested, please provide a testcase or example of how you tested the change?

Tested in a workflow running on a locally hosted inference server. Verified that it works properly with 0, 1, and 2 different version IDs in the environment variable, and that the user sees the intended error message when trying to run a disallowed model variant.

Any specific deployment considerations

N/A

Docs

N/A
