**Walkthrough**

Reworks the custom inference runtime doc to use Xinference as the standard workflow example.
**Sequence Diagram(s)**

(omitted — changes are documentation examples and do not introduce new multi-component runtime control flow)

**Estimated code review effort**

🎯 2 (Simple) | ⏱️ ~12 minutes
Actionable comments posted: 3
🤖 Fix all issues with AI Agents
In
@docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx:
- Around line 597-599: The table entry uses an annotation key with an erroneous
trailing colon; change `storage.kserve.io/readonly:` to
`storage.kserve.io/readonly` in the Markdown table so the annotation key is
valid in Kubernetes manifests and docs (update the cell under "Configuration
Key" where `storage.kserve.io/readonly:` appears). Ensure no other instances of
the key include the trailing colon.
- Around line 601-603: The "Root Access" subsection currently states MindIE must
run as root but lacks the actual configuration; update the text under the
heading to either clarify that the existing YAML example already runs as root or
add an explicit securityContext snippet showing how to run as root: set
securityContext with runAsUser: 0 and runAsNonRoot: false and, if required for
NPU access, privileged: true (apply this to the ClusterServingRuntime or
InferenceService pod/container spec), and include a short note about security
implications and when privileged is necessary.
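The root-access configuration requested above could be sketched roughly as follows. This is a hypothetical fragment, not taken from the doc under review: the container name `kserve-container` is the conventional KServe name, and `privileged` should be enabled only if the Ascend device plugin actually requires it.

```yaml
# Hypothetical securityContext fragment for the MindIE container spec
# in the ClusterServingRuntime (or InferenceService) definition.
containers:
  - name: kserve-container
    securityContext:
      runAsUser: 0          # run as root, as MindIE requires
      runAsNonRoot: false
      # privileged: true    # uncomment only if direct NPU device access needs it
```

Granting root (and especially `privileged`) widens the attack surface of the pod, so the accompanying prose should note when each setting is actually necessary.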
🧹 Nitpick comments (4)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (4)
605-613: **Consider relocating the comparison table earlier in the document.** The comparison table currently appears after all the detailed runtime examples. For better user experience, consider moving this table to appear right after the "Prerequisites" section (around line 49) or at the beginning of the "Standard Workflow" section. This allows readers to understand the differences between runtimes before diving into specific configurations.
324-561: **Consider alternative approaches for the large embedded script.** The MindIE startup script is embedded as a 230+ line environment variable, with `__LT__` placeholders to work around YAML parsing issues. While functional, this approach has maintainability drawbacks:

- Difficult to version control, test, and debug separately
- The `__LT__` workaround adds cognitive overhead
- Changes require redeploying the entire ClusterServingRuntime
Alternative approaches to consider:
- ConfigMap: Store the script in a ConfigMap and mount it as a volume
- Image bundling: Include the script in the MindIE image itself
- Init container: Use an init container to prepare the script from a ConfigMap
These alternatives improve maintainability while preserving functionality. However, if portability and self-contained configuration are priorities, the current approach is acceptable.
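The ConfigMap alternative listed above might look roughly like the following. This is an illustrative sketch, not content from the doc: the names `mindie-startup`, `start.sh`, and the mount path are all hypothetical.

```yaml
# Hypothetical: store the startup script in a ConfigMap instead of an
# environment variable, so "<" needs no placeholder and the script can
# be updated without editing the ClusterServingRuntime.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mindie-startup
data:
  start.sh: |
    #!/bin/bash
    # ... full MindIE startup logic goes here, written with plain "<" ...
---
# Corresponding fragments in the ClusterServingRuntime container spec:
# volumeMounts:
#   - name: startup-script
#     mountPath: /opt/scripts
# volumes:
#   - name: startup-script
#     configMap:
#       name: mindie-startup
#       defaultMode: 0755
```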
318-318: **Be aware of the security implications of file permissions in the MindIE script.** The embedded startup script sets broad permissions (`chmod -R 755`) on `MODEL_PATH` and modifies permissions on the MindIE installation directory. Combined with running as root (documented in section 3), this creates a more permissive security posture compared to other runtimes like MLServer, which runs as non-root user 1000. While these permissions may be necessary for NPU hardware access and the Ascend driver requirements, ensure that:
- The model storage volume doesn't contain sensitive data beyond the model files
- Network policies restrict access to the MindIE pods
- Users understand the security tradeoffs when deploying MindIE
Also applies to: 528-530
290-603: **Consider consistent subsection formatting across runtime examples.** The MindIE section uses numbered subsections (1. ClusterServingRuntime, 2. Mandatory Annotations, 3. User Privileges), while the Xinference and MLServer examples don't use this format. For consistency, consider either:
- Using the same numbered subsection format for all runtime examples, or
- Using unnumbered subsections or plain headings throughout
Consistent formatting improves document readability and makes it easier for users to navigate between different runtime configurations.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.
Applied to files:
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pipelines as Code CI / doc-pr-build-ai
Deploying alauda-ai with:

- Latest commit: 6547a2f
- Status: ✅ Deploy successful!
- Preview URL: https://8d5be20d.alauda-ai.pages.dev
- Branch Preview URL: https://add-310p-rt.alauda-ai.pages.dev
Actionable comments posted: 4
🤖 Fix all issues with AI Agents
In
@docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx:
- Around line 318-320: Replace the recursive permission change (chmod -R 755
"${MODEL_PATH}") with a non-recursive change to the directory (chmod 755 on
MODEL_PATH) or remove it entirely and instead investigate/fix the storage
initializer that sets model file permissions; document the security implications
of making model files world-readable and only widen permissions when absolutely
required; also reduce noisy logging by keeping a single ls -ld "${MODEL_PATH}"
(or remove pre-change listing) rather than logging before and after for every
startup.
- Line 453: The default PORT value currently set to 8080 conflicts with the
port-range validation that enforces 1025-65535; either change the default PORT
to a value that satisfies the validator (e.g., set PORT to 1025 or 8081) or
relax the validator to include 8080 by adjusting the lower-bound check in the
port validation logic (the check enforcing 1025-65535 on lines 512-514) so that
PORT (environment variable PORT) can be accepted.
- Around line 582-589: The startupProbe's per-attempt timeout is excessively
long (timeoutSeconds: 180); change the startupProbe definition so timeoutSeconds
is reduced to a short value (e.g., 10) while keeping failureThreshold: 60 and
periodSeconds: 10 so overall retry window remains unchanged; update the
startupProbe block (the httpGet path/port/scheme and timeoutSeconds field) to
set timeoutSeconds: 10 to ensure individual probes fail fast but the probe still
retries up to failureThreshold × periodSeconds.
- Line 394: The error message string "Error: Unable to retrieve device info.
Please check if npu-smi is available for current user (id 1001), or if you are
specifying an occupied device." should be updated to remove or clarify the
hard-coded user id; replace it with a generic reference like "current user" or
interpolate the actual runtime user variable if available (e.g., runtime UID) so
it matches how the MindIE runtime runs (often root). Locate and update that
exact echo message in custom_inference_runtime.mdx to either omit "(id 1001)" or
substitute a dynamic UID placeholder and adjust the surrounding text to remain
consistent.
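The non-recursive permission change requested in the first item above can be sketched as follows. The paths and file name are illustrative, not taken from the runtime YAML; the point is that only the directory mode is widened while file modes set by the storage initializer are left alone.

```shell
# Hypothetical sketch, not the documented MindIE script: widen only the
# directory mode, leaving individual model-file modes untouched.
MODEL_PATH="${MODEL_PATH:-/tmp/demo-model}"
mkdir -p "${MODEL_PATH}"
touch "${MODEL_PATH}/weights.bin"
chmod 600 "${MODEL_PATH}/weights.bin"  # file mode stays as the initializer set it
chmod 755 "${MODEL_PATH}"              # non-recursive: directory only
ls -ld "${MODEL_PATH}"                 # single listing instead of before/after logging
```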
🧹 Nitpick comments (2)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (2)
310-312: **Consider simplifying the `__LT__` workaround.** The current approach uses `__LT__` as a placeholder for `<` in the embedded script, which is then replaced at runtime. While functional, this adds cognitive overhead and an extra processing step. Alternative approaches to consider:

- Use a YAML multiline literal (`|`) with proper escaping – the current approach already uses `|`, so shell comparison operators like `<` should work directly in most cases
- Use arithmetic comparison syntax that doesn't require `<`: `if [ "$PORT" -lt 1025 ] || [ "$PORT" -gt 65535 ]; then`
- Test whether the current `<` actually causes YAML parsing issues – modern YAML parsers handle `<` in literal blocks without problems

If you choose to keep the current approach, ensure it's documented with a comment explaining why the workaround is necessary.
Also applies to: 512-513
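The arithmetic-comparison alternative can be sketched as a small shell helper. The function name is hypothetical; the 1025-65535 bounds mirror the validator described in the review. Because `-lt`/`-gt` contain no `<` character, no `__LT__` placeholder would be needed inside the YAML literal.

```shell
# Hypothetical helper: port-range check using -lt/-gt instead of "<".
validate_port() {
  port="$1"
  if [ "$port" -lt 1025 ] || [ "$port" -gt 65535 ]; then
    echo "Error: PORT must be in 1025-65535, got: ${port}" >&2
    return 1
  fi
}

validate_port "${PORT:-8080}" && echo "port ok"
```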
310-561: **Consider externalizing the MindIE startup script.** The embedded bash script spans ~235 lines (lines 326-561), making the YAML configuration difficult to read and maintain. While functionally correct, this approach has several drawbacks:
- Harder to review and test the script independently
- YAML syntax highlighting doesn't help with bash code
- Version control diffs become less readable
- The `__LT__` workaround adds complexity

Consider these alternatives:

- Ship the script as part of the container image at a known path like `/opt/mindie/startup.sh`, then reference it directly
- Use a ConfigMap to store the script and mount it into the container
- Keep the current approach if dynamic script generation from the YAML is a hard requirement
Note: This is a maintainability suggestion, not a functional issue. The current implementation works correctly.
🔇 Additional comments (5)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (5)
50-53: **LGTM! Clear section restructuring.** The renaming of the section to "Standard Workflow (Example: Xinference)" effectively establishes Xinference as the baseline reference implementation while making it clear that the workflow applies more broadly.
201-283: **LGTM! Well-structured MLServer runtime example.** The MLServer configuration demonstrates best practices with proper security context (non-root user), startup probes, and flexible model directory handling for different storage initializers.
609-613: **LGTM! Root access requirements clearly documented.** The explanation effectively clarifies that MindIE requires root privileges by contrasting it with MLServer's explicit non-root security context. The note that the absence of a `securityContext` in the YAML means the container runs with image defaults (typically root) is helpful.
615-623: **LGTM! Helpful comparison table.** The comparison table provides a clear, actionable summary of the key differences between runtimes, making it easy for users to understand special requirements at a glance before diving into detailed configurations.
601-608: **No action required — the annotation documentation is complete.** The section correctly identifies `storage.kserve.io/readonly` as the required InferenceService metadata annotation for NPU binding. A codebase search for other Ascend/NPU-related annotations found no additional requirements for InferenceService metadata. The `aml-model-repo` and `aml-pipeline-tag` annotations referenced in the runtime templates are custom application-level annotations, not NPU-specific binding requirements.
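For reference, the annotation discussed above sits on the InferenceService metadata roughly as follows. This is an illustrative fragment: the service name is hypothetical, and the appropriate value ("true" or "false") depends on whether the model storage should be mounted read-only.

```yaml
# Hypothetical InferenceService fragment showing where the
# storage.kserve.io/readonly annotation is placed.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: mindie-demo                       # illustrative name
  annotations:
    storage.kserve.io/readonly: "false"   # value depends on your storage setup
```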
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`:
- Around line 619-623: The MLServer row in the comparison table is inconsistent
with the MLServer YAML's supportedModelFormats; update the table entry for
MLServer (the table row containing "MLServer") so its Supported Frameworks cell
matches the YAML's supportedModelFormats (use "mlflow, transformers"), or
alternatively add "sklearn" and "xgboost" into the YAML's supportedModelFormats
if you intend to advertise those; ensure you update the table cell or the
supportedModelFormats array (symbol: supportedModelFormats) so both sources list
the same frameworks.
♻️ Duplicate comments (3)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (3)
317-320: **Reconsider recursive chmod on model directory.** This concern was raised in a previous review. Executing `chmod -R 755` on the entire model directory could be slow for large models and creates security concerns by making all files world-readable/executable. Consider changing to non-recursive `chmod 755 "${MODEL_PATH}"` (directory only) or documenting why recursive permissions are necessary.
394-396: **Clarify user ID reference.** This was flagged in a previous review. The error message mentions "user (id 1001)" but MindIE runs as root (user 0). Update the message to be generic or accurate:

```diff
- echo "Error: Unable to retrieve device info. Please check if npu-smi is available for current user (id 1001), or if you are specifying an occupied device."
+ echo "Error: Unable to retrieve device info. Please check if npu-smi is available for the current user, or if you are specifying an occupied device."
```
582-589: **Reduce excessive startup probe timeout.** This was flagged in a previous review. `timeoutSeconds: 180` (3 minutes) per probe attempt is excessive and can mask unresponsive containers. Startup probes should fail fast per attempt, relying on `failureThreshold` for overall patience.

```diff
 startupProbe:
   failureThreshold: 60
   httpGet:
     path: /v1/models
     port: 8080
     scheme: HTTP
   periodSeconds: 10
-  timeoutSeconds: 180
+  timeoutSeconds: 10
```

With `failureThreshold: 60` and `periodSeconds: 10`, the probe will retry for up to 10 minutes total, which should be sufficient for MindIE's initialization.
🧹 Nitpick comments (1)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (1)
209-212: **Remove auto-generated metadata field from template example.** The `creationTimestamp` field is auto-generated by Kubernetes when a resource is created and should not be included in example YAML templates. Including it may confuse users who copy this template verbatim.

🔧 Suggested fix

```diff
 metadata:
   annotations:
     cpaas.io/display-name: mlserver-cuda11.6-x86-arm
-  creationTimestamp: 2026-01-05T07:02:33Z
   generation: 1
```

Also consider removing `generation: 1` as it's similarly auto-managed by Kubernetes.
🔇 Additional comments (1)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (1)
601-614: **LGTM!** The mandatory annotations table and root privileges explanation are now clear. The note comparing MindIE's default root configuration with MLServer's explicit non-root settings provides helpful context for users.
| Runtime | Target Hardware | Supported Frameworks | Special Requirements |
| :--- | :--- | :--- | :--- |
| **Xinference** | CPU / NVIDIA GPU | transformers, pytorch | **Must** set `MODEL_FAMILY` environment variable |
| **MLServer** | CPU / NVIDIA GPU | sklearn, xgboost, mlflow | Standard configuration |
| **MindIE** | Huawei Ascend NPU | mindspore, transformers | **Must** add NPU required Annotations to InferenceService |
MLServer supported frameworks mismatch with YAML example.
The comparison table lists MLServer as supporting "sklearn, xgboost, mlflow", but the MLServer YAML example (lines 277-281) defines supportedModelFormats as only "mlflow" and "transformers". This inconsistency could mislead users.
Update the table to match the actual YAML configuration, or adjust the YAML to include all advertised formats.
🔧 Suggested fix (align table with YAML)
```diff
 | Runtime | Target Hardware | Supported Frameworks | Special Requirements |
 | :--- | :--- | :--- | :--- |
 | **Xinference** | CPU / NVIDIA GPU | transformers, pytorch | **Must** set `MODEL_FAMILY` environment variable |
-| **MLServer** | CPU / NVIDIA GPU | sklearn, xgboost, mlflow | Standard configuration |
+| **MLServer** | CPU / NVIDIA GPU | mlflow, transformers | Standard configuration |
 | **MindIE** | Huawei Ascend NPU | mindspore, transformers | **Must** add NPU required Annotations to InferenceService |
```
* Add ascend 310p runtime
* take advice
* Fix runtime name
* Add ascend 310p runtime (#78)
  * Add ascend 310p runtime
  * take advice
  * Fix runtime name
* fix upgrade version and prepare
* Fix version
* fix upgrade version and prepare