
cld2labs/Qwen3-8b#91

Open
arpannookala-12 wants to merge 2 commits into opea-project:main from cld2labs:cld2labs/Qwen3-8b

@arpannookala-12
Contributor

@arpannookala-12 arpannookala-12 commented Apr 21, 2026

Summary

  • Adds model card for Qwen3-8B (Alibaba Cloud / Qwen Team) under third_party/Dell/model-deployment/Qwen3-8b/
  • Adds Helm-based deployment guide for deploying Qwen3-8B via vLLM on CPU (Xeon) with Keycloak OIDC and APISIX ingress

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@arpannookala-12 arpannookala-12 changed the title from "feat: Add Qwen3-8B model card and deployment guide" to "cld2labs/Qwen3-8b" Apr 21, 2026
@alexsin368 alexsin368 self-requested a review April 29, 2026 21:48
@alexsin368
Collaborator

The model deployed successfully, but testing inference returns a 504 Gateway Time-out error.

The vLLM pod shows no issues. The ingress-nginx-controller log reports that the upstream timed out:

2026/05/18 22:53:31 [error] 153281#153281: *6540963 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 172.17.23.1, server: api.example.com, request: "POST /Qwen3-8B-vllmcpu/v1/completions HTTP/2.0", upstream: "http://10.233.104.80:9080/Qwen3-8B-vllmcpu/v1/completions", host: "api.example.com"
172.17.23.1 - - [18/May/2026:22:53:31 +0000] "POST /Qwen3-8B-vllmcpu/v1/completions HTTP/2.0" 504 160 "-" "curl/7.81.0" 1204 60.001 [auth-apisix-auth-apisix-gateway-80] [] 10.233.104.80:9080 0 60.001 504 b2fca060a21ad0bc39ba893529293cbd
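For triage, the access log line above can be broken into named fields. The sketch below is not part of the PR. It assumes the controller uses the default ingress-nginx access log format, and the helper names (`parse_access_line`, `looks_like_read_timeout`) are hypothetical. It checks whether the 504 coincides with an upstream wait of about 60 s, which is the default ingress-nginx proxy read timeout. Whether this cluster runs with that default is an assumption.

```python
import re

# Access log line copied from the comment above.
LOG_LINE = (
    '172.17.23.1 - - [18/May/2026:22:53:31 +0000] '
    '"POST /Qwen3-8B-vllmcpu/v1/completions HTTP/2.0" 504 160 "-" "curl/7.81.0" '
    '1204 60.001 [auth-apisix-auth-apisix-gateway-80] [] 10.233.104.80:9080 '
    '0 60.001 504 b2fca060a21ad0bc39ba893529293cbd'
)

# Assumption: default ingress-nginx log format (upstream-log fields at the end).
ACCESS_RE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<request_length>\d+) (?P<request_time>[\d.]+) '
    r'\[(?P<upstream_name>[^\]]*)\] \[(?P<alt_upstream_name>[^\]]*)\] '
    r'(?P<upstream_addr>\S+) (?P<upstream_response_length>\S+) '
    r'(?P<upstream_response_time>\S+) (?P<upstream_status>\S+) (?P<req_id>\S+)$'
)


def parse_access_line(line):
    """Return the named fields of one access log line, or None if it does not match."""
    match = ACCESS_RE.match(line.strip())
    return match.groupdict() if match else None


def looks_like_read_timeout(fields, timeout_s=60.0, tolerance_s=0.5):
    """True when a 504 was returned after waiting roughly timeout_s on the upstream."""
    if fields is None or fields["status"] != "504":
        return False
    try:
        waited = float(fields["upstream_response_time"])
    except ValueError:
        # nginx logs "-" or a comma-separated list when there was no single upstream attempt.
        return False
    return abs(waited - timeout_s) <= tolerance_s


if __name__ == "__main__":
    fields = parse_access_line(LOG_LINE)
    print(fields["upstream_name"], fields["upstream_addr"], fields["upstream_response_time"])
    print("matches 60 s read timeout:", looks_like_read_timeout(fields))
```

If the check matches, the request was cut off by the ingress timeout rather than failing inside vLLM, which is plausible for a slow CPU completion. One option to investigate is raising the `nginx.ingress.kubernetes.io/proxy-read-timeout` annotation on the Ingress that fronts the APISIX gateway. Any timeout configured in APISIX itself would need to be checked separately.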


2 participants