Add SGLang Ray inference example#40

Open
xyuzh wants to merge 5 commits into main from sglang_ray

Conversation

@xyuzh
Contributor

@xyuzh xyuzh commented Feb 11, 2026

Add offline and online inference drivers with Dockerfile and Anyscale job configs for running SGLang on Ray.

xyuzh and others added 5 commits February 10, 2026 19:02
Add offline and online inference drivers with Dockerfile and Anyscale job configs for running SGLang on Ray.
…stness

- Dockerfile: use sglang[all]==0.5.8 + sgl-kernel==0.3.21 instead of git fork
- Drivers: add logging, named placement groups, exit codes, better error handling
- Job configs: add NCCL_DEBUG, fix submit path comment
- README: add How It Works, Troubleshooting, local run examples
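The pinned-install change described above could be sketched as the following Dockerfile fragment (the base image is an assumption for illustration; only the two pinned package versions come from the commit message):

```dockerfile
# Base image is an assumption, not taken from the PR.
FROM anyscale/ray:2.40.0-py311-cu121

# Pin SGLang to released wheels instead of a git fork, per the commit above.
RUN pip install --no-cache-dir "sglang[all]==0.5.8" "sgl-kernel==0.3.21"
```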

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename sglang_ray_inference -> sglang_inference
- Batch inference (job.yaml + driver_offline.py) fully working with
  multi-node TP=4, PP=2 using SGLang's use_ray=True mode
- Ray Serve deployment (service.yaml + serve.py) uses same pattern as
  official Ray LLM SGLang integration with signal monkey-patching
- Add query.py script for testing the service
- Simplify configuration with environment variables
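A minimal sketch of what a query.py-style client might look like, reading the service URL from environment variables and posting an OpenAI-compatible chat payload. The endpoint path, environment variable names, and payload shape are assumptions for illustration, not taken from the PR:

```python
import json
import os
import urllib.request


def build_payload(prompt: str, model: str = "default") -> dict:
    """Build an OpenAI-style chat-completions payload (shape is an assumption)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def query(prompt: str) -> str:
    # SERVICE_URL / SERVICE_TOKEN are hypothetical names for illustration.
    base = os.environ.get("SERVICE_URL", "http://localhost:8000")
    token = os.environ.get("SERVICE_TOKEN", "")
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```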

The serving example is still being validated with multi-replica
autoscaling. Single replica works; investigating occasional timeouts
with multiple replicas.
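One common client-side mitigation for occasional timeouts while replicas warm up is retry with exponential backoff. A hedged sketch of such a policy (purely illustrative; not what the PR implements):

```python
import time


def backoff_schedule(retries: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Delays double each attempt, capped at `cap`; illustrative policy only."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def call_with_retries(fn, retries: int = 4):
    """Call fn(), retrying on TimeoutError with the schedule above."""
    last_exc = None
    for delay in backoff_schedule(retries):
        try:
            return fn()
        except TimeoutError as exc:  # e.g. a replica still warming up
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```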

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Robert Nishihara <rkn@anyscale.com>
