feat: auto-load HF ONNX artifacts on CPU #402
Open
aidamian wants to merge 8 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d7cba8f217
What changed:
- Make auto ONNX startup opportunistic and fall back to Transformers/PT on ONNX init or warmup failure.
- Keep explicit ONNX runtimes fail-fast while explicit PT skips manifest lookup.
- Gate decoder and tokenizer remote code on global and runtime trust flags.
- Confine manifest-declared artifact paths to the downloaded HF snapshot and filter broad/framework-weight allow patterns.
- Forward runtime metadata consistently for privacy-filter responses and add focused regression coverage.
Why:
- Preserve seamless CPU ONNX when available without breaking Transformers fallback or weakening remote-code/path safety.
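The runtime selection policy in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `select_runtime`, the `"auto"`/`"onnx"`/`"pt"` names, and the two loader callables are hypothetical stand-ins for whatever the serving base actually uses.

```python
from typing import Callable, Tuple


def select_runtime(
    requested: str,
    load_onnx: Callable[[], object],
    load_pt: Callable[[], object],
) -> Tuple[str, object]:
    """Return (runtime_name, backend) for a requested runtime.

    Hypothetical sketch: `requested` is assumed to be "auto", "onnx" or "pt".
    """
    if requested == "pt":
        # Explicit PT: the ONNX path (and its manifest lookup) is never touched.
        return "pt", load_pt()
    if requested == "onnx":
        # Explicit ONNX: fail fast, any init or warmup error propagates.
        return "onnx", load_onnx()
    if requested != "auto":
        raise ValueError(f"unknown runtime: {requested!r}")
    try:
        # Auto: try ONNX opportunistically.
        return "onnx", load_onnx()
    except Exception:
        # Auto: any ONNX init or warmup failure falls back to Transformers/PT.
        return "pt", load_pt()
```

Only the `"auto"` branch swallows errors, so an operator who asks for ONNX explicitly still sees the underlying failure.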
What changed:
- Require selected ONNX runtime config trust_remote_code=True before executing artifact decoder or tokenizer remote code.
- Add regression coverage proving a top-level manifest trust flag cannot enable runtime code execution by itself.
Why:
- Avoid remote-code trust bypasses from broad manifest metadata; the selected runtime must explicitly opt in.
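As a rough illustration of the rule in this commit, the check below reads the trust flag only from the selected runtime's own config. The manifest layout (a top-level `runtimes` mapping keyed by runtime name) and the function name are assumptions made for the example; the real manifest schema is not shown in this PR page.

```python
from typing import Any, Mapping


def runtime_allows_remote_code(
    manifest: Mapping[str, Any],
    runtime_name: str,
    global_trust: bool,
) -> bool:
    """Return True only if the selected runtime explicitly opts in.

    Hypothetical schema: manifest["runtimes"][runtime_name]["trust_remote_code"].
    """
    runtime = manifest.get("runtimes", {}).get(runtime_name, {})
    # A top-level manifest["trust_remote_code"] is deliberately never consulted.
    return global_trust and runtime.get("trust_remote_code") is True
```

Comparing with `is True` means a missing key, `None`, or a truthy string does not count as an opt-in.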
What changed:
- Added subclass ONNX fallback hooks in the HF serving base.
- Added local privacy-filter ONNX discovery and BIOES/Viterbi span decoding.
- Covered fallback runtime selection and privacy-filter decoder behavior with tests.
Why:
- Allow openai/privacy-filter ONNX artifacts to run without a remote artifact manifest or remote Python decoder code.
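For readers unfamiliar with BIOES/Viterbi span decoding, here is a generic sketch of the technique. It is not the PR's decoder: the label names, the hard transition constraints (no learned transition scores), and both function names are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def _allowed(prev: str, cur: str) -> bool:
    """BIOES constraint: B-X/I-X must be followed by I-X or E-X of the same type."""
    p_kind, _, p_type = prev.partition("-")
    c_kind, _, c_type = cur.partition("-")
    if p_kind in ("B", "I"):
        return c_kind in ("I", "E") and c_type == p_type
    return c_kind in ("O", "B", "S")


def viterbi_bioes(scores: Sequence[Sequence[float]], labels: Sequence[str]) -> List[str]:
    """Best valid tag path for per-token label scores (rows: tokens, cols: labels)."""
    if not scores:
        return []
    n = len(labels)
    # A sequence may only start with O, B-* or S-*.
    best = [s if labels[j].partition("-")[0] in ("O", "B", "S") else NEG_INF
            for j, s in enumerate(scores[0])]
    back: List[List[int]] = []
    for row in scores[1:]:
        new_best, ptrs = [], []
        for j in range(n):
            cands = [(best[i], i) for i in range(n) if _allowed(labels[i], labels[j])]
            score, i = max(cands) if cands else (NEG_INF, 0)
            new_best.append(score + row[j])
            ptrs.append(i)
        best = new_best
        back.append(ptrs)
    # A sequence may only end with O, E-* or S-*.
    _, j = max((best[k], k) for k in range(n)
               if labels[k].partition("-")[0] in ("O", "E", "S"))
    path = [j]
    for ptrs in reversed(back):
        j = ptrs[j]
        path.append(j)
    return [labels[k] for k in reversed(path)]


def spans_from_tags(tags: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Convert a valid BIOES tag path into (start, end_exclusive, type) spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        kind, _, typ = tag.partition("-")
        if kind == "S":
            spans.append((i, i + 1, typ))
        elif kind == "B":
            start = i
        elif kind == "E" and start is not None:
            spans.append((start, i + 1, typ))
            start = None
    return spans
```

Compared with taking the per-token argmax, the constrained search cannot emit an `I-` tag with no opening `B-`, or leave a span unclosed at the end of the sequence.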
What changed:
- Keep HF artifact path traversal checks lexical so valid snapshot symlinks into the cache blob store are accepted.
- Merge exact manifest files with recommended ONNX allow patterns after filtering broad or framework-weight downloads.
- Add regression coverage for both behaviors.
Why:
- Live PR image validation showed Sentinel and privacy-filter ONNX startup falling back because valid HF snapshot files were rejected as escaping the snapshot.
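A lexical traversal check of the kind described here might look like the sketch below. It is an assumption-laden illustration: `resolve_artifact_path` is a hypothetical name, manifest paths are assumed to be POSIX-style relative paths, and Windows drive-letter paths are not handled.

```python
import posixpath
from pathlib import Path


def resolve_artifact_path(snapshot_dir: Path, declared: str) -> Path:
    """Join a manifest-declared relative path onto the snapshot directory.

    The check is purely lexical: it normalizes ".." segments in the string
    and never calls Path.resolve(), so a snapshot entry that is a symlink
    into the HF cache blob store is not treated as escaping the snapshot.
    """
    normalized = posixpath.normpath(declared.replace("\\", "/"))
    if (
        posixpath.isabs(normalized)
        or normalized in (".", "..")
        or normalized.startswith("../")
    ):
        raise ValueError(f"artifact path escapes snapshot: {declared!r}")
    return snapshot_dir / normalized
```

Resolving symlinks before comparing against the snapshot root is what rejects valid files in the HF cache layout, because snapshot entries typically point into a sibling `blobs` directory.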
What changed:
- Temporarily allow ONNX artifact decoders without runtime-level trust_remote_code to inherit global TRUST_REMOTE_CODE=True.
- Keep explicit runtime trust_remote_code=False as a hard block.
- Add a TODO documenting the security concern and declarative decoder replacement path.
Why:
- The current Sentinel ONNX artifact predates runtime-level trust metadata and uses a reviewed contract decoder, so it needs a compatibility path until the artifact moves to declarative decoding.
What changed:
- Split ONNX remote-code trust between tokenizer/model loading and decoder execution.
- Keep tokenizer/model loading tied to runtime-level trust_remote_code.
- Temporarily allow Python decoder execution when global TRUST_REMOTE_CODE=True, even for legacy runtimes that mark ONNX trust_remote_code=False.
Why:
- Current Sentinel ONNX artifacts use trust_remote_code=False for tokenizer/model loading but still declare a Python contract decoder. This keeps the temporary compatibility path narrow until declarative decoding replaces it.
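The split described in this commit can be pictured as two independent decisions. The sketch below is only one reading of the commit message: the class and function names are hypothetical, and requiring the global flag for tokenizer/model loading as well is an assumption carried over from the earlier commits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RemoteCodePolicy:
    load_tokenizer_and_model: bool  # remote code during tokenizer/model loading
    run_python_decoder: bool        # executing the artifact's Python decoder


def remote_code_policy(global_trust: bool, runtime_trust: Optional[bool]) -> RemoteCodePolicy:
    """Derive both decisions from global TRUST_REMOTE_CODE and the runtime flag.

    `runtime_trust` is the selected runtime's trust_remote_code (None if unset).
    """
    return RemoteCodePolicy(
        # Loading stays tied to the runtime-level flag (assumed: plus the global flag).
        load_tokenizer_and_model=global_trust and runtime_trust is True,
        # Temporary compatibility path: the decoder follows the global flag only,
        # even when a legacy runtime sets trust_remote_code=False.
        run_python_decoder=global_trust,
    )
```

For a legacy artifact with `trust_remote_code=False` and the global flag enabled, this yields "no remote code while loading, decoder allowed", which matches the Sentinel case the commit describes.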
What changed:
- Prepare HF ONNX artifacts in an edge-node-owned materialized cache before creating ONNX Runtime sessions.
- Hardlink resolved HF cache blobs when possible and copy as fallback.
- Preserve runtime relative layout for .onnx and external data sidecars.
- Add regression coverage for symlinked external data files.
Why:
- ONNX Runtime rejects HF snapshot symlinks for external data because resolved sidecars can escape the model directory.
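The hardlink-then-copy materialization step could be sketched as below. The function name, its arguments, and the choice to overwrite existing targets are assumptions for illustration; the PR's actual cache layout and naming are not visible here.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable


def materialize_artifacts(
    snapshot_dir: Path,
    relative_files: Iterable[str],
    dest_dir: Path,
) -> None:
    """Place real files (not symlinks) under dest_dir, keeping relative layout."""
    for rel in relative_files:
        # Follow the snapshot symlink to the underlying cache blob.
        source = (snapshot_dir / rel).resolve(strict=True)
        target = dest_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.exists() or target.is_symlink():
            target.unlink()
        try:
            # Hardlink when source and target share a filesystem.
            os.link(source, target)
        except OSError:
            # Cross-device or unsupported hardlinks: fall back to a copy.
            shutil.copy2(source, target)
```

Because the `.onnx` file and its external data sidecars keep the same relative paths under `dest_dir`, a session created from the materialized model path finds the sidecars inside the model directory.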
No description provided.