Allow for specific perception encoder version IDs to be disallowed #1845
Description
Motivation: so that we can enable only the smaller model sizes on serverless!
Currently there's already an inference implementation and workflow block for Perception Encoder, but to the best of my knowledge they're disabled on serverless because the largest model size we support is big enough that it could cause problems (correct me if I'm wrong here).

This PR makes it possible to enable only the two smaller checkpoints for Perception Encoder on serverless using a new environment variable, PERCEPTION_ENCODER_DISALLOWED_VERSION_IDS, which is referenced in the PerceptionEncoderInferenceRequest class's pydantic validators.

Setting CORE_MODEL_PE_ENABLED to True and PERCEPTION_ENCODER_DISALLOWED_VERSION_IDS to "PE-Core-G14-448" in roboflow-infra/gcp/serverless-inference/appstack/chart/rf-svrls/values-staging.yaml will make serverless start PE with every variant except the 9.1GB g14-448 one.

Type of change
How has this change been tested, please provide a testcase or example of how you tested the change?
Tested in a workflow running on a locally hosted inference server. Verified that it works properly with 0, 1, and 2 different version IDs in the environment variable. Verified that the user sees the intended error message if they try to run an unsupported model variant.

Any specific deployment considerations
N/A
Docs
N/A
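
Illustrative sketch (not the code in this PR): the check described under Description could look roughly like the following. It is written with the standard library only so it runs on its own; in the PR the equivalent check lives in the pydantic validators of PerceptionEncoderInferenceRequest. The comma-separated format of the environment variable, the helper names parse_disallowed_version_ids and check_version_id, and the error message wording are all assumptions made for illustration.

```python
import os

ENV_VAR = "PERCEPTION_ENCODER_DISALLOWED_VERSION_IDS"


def parse_disallowed_version_ids(raw):
    """Turn the raw env value into a set of version IDs.

    Assumption: IDs are comma-separated; an empty or unset value
    means no version is disallowed.
    """
    if not raw:
        return set()
    return {item.strip() for item in raw.split(",") if item.strip()}


def check_version_id(version_id, disallowed):
    """Return version_id unchanged, or raise ValueError if it is disallowed.

    Hypothetical helper: a pydantic validator would perform the same
    membership test and raise the same kind of error.
    """
    if version_id in disallowed:
        raise ValueError(
            f"Perception Encoder version '{version_id}' is not enabled "
            "on this deployment."
        )
    return version_id


# Read the variable once; an unset variable disallows nothing.
DISALLOWED = parse_disallowed_version_ids(os.getenv(ENV_VAR, ""))
```

With the variable set to "PE-Core-G14-448", check_version_id("PE-Core-G14-448", DISALLOWED) would raise, while any other version ID would pass through unchanged.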