expert distributions #3709

Closed
CUHKSZzxy wants to merge 11 commits into InternLM:main from CUHKSZzxy:expert-distribution

@CUHKSZzxy CUHKSZzxy commented Jul 4, 2025

The recorder is adapted from dlBLAS; the HTTP endpoints are modeled on SGLang's.

Usage

  1. Start the server with the following environment variables set:
LMDEPLOY_DUMP_EXPERT_DISTRIBUTION=1 \
LMDEPLOY_EXPERT_DUMP_DIR="./path_to_save_dump" \
LMDEPLOY_EXPERT_DUMP_VISUALIZE=1 \
lmdeploy serve api_server Qwen/Qwen3-30B-A3B --backend pytorch --tp 2 --log-level INFO
  • LMDEPLOY_DUMP_EXPERT_DISTRIBUTION enables the expert distribution recorder.
  • LMDEPLOY_EXPERT_DUMP_DIR specifies the directory where dump files are saved.
  • LMDEPLOY_EXPERT_DUMP_VISUALIZE turns on visualization, which writes heatmap figures to the dump directory.
  2. Start, dump, or stop recording through the HTTP endpoints:
curl -X POST http://localhost:23333/start_expert_distribution_record
curl -X POST http://localhost:23333/dump_expert_distribution_record
curl -X POST http://localhost:23333/stop_expert_distribution_record
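
The three curl calls above could also be issued from a script. The following is a minimal sketch using only the Python standard library. The host and port are taken from the curl examples. The helper names `record_url` and `post_action` are hypothetical and not part of lmdeploy, and the response body format is not assumed.

```python
import urllib.request

# Base URL taken from the curl examples above; adjust for your deployment.
BASE_URL = "http://localhost:23333"
ACTIONS = ("start", "dump", "stop")


def record_url(action, base_url=BASE_URL):
    """Build the endpoint URL for one of the three recorder actions."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}, expected one of {ACTIONS}")
    return f"{base_url}/{action}_expert_distribution_record"


def post_action(action, base_url=BASE_URL, timeout=10.0):
    """Send an empty POST (like `curl -X POST`) and return the HTTP status."""
    request = urllib.request.Request(
        record_url(action, base_url), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With a server running, a typical session would call `post_action("start")`, send some traffic, then call `post_action("dump")` and `post_action("stop")`.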

Result

Data: ShareGPT, 1000 prompts, random input length 1024, random output length 1024.

  • Model: Qwen3-30B-A3B, TP=2.
    [Figure: rank0_step375_expert_counts_heatmap]
  • Model: Qwen3.5-35B-A3B, TP=2.
    [Figure: rank0_step139_expert_counts_heatmap]

@CUHKSZzxy CUHKSZzxy marked this pull request as ready for review July 4, 2025 04:16
Copilot AI review requested due to automatic review settings March 30, 2026 12:54
@CUHKSZzxy CUHKSZzxy marked this pull request as draft March 30, 2026 12:55

Copilot AI left a comment

Pull request overview

Adds an MoE expert-dispatch distribution recorder to the PyTorch backend, enabling periodic dumps of routed expert counts to JSON for debugging/analysis (intended for eager-mode runs).

Changes:

  • Introduce ExpertsDistributionRecorder utility (real vs no-op based on env flags) that aggregates and dumps expert token counts.
  • Hook the recorder into Qwen3 MoE and DeepSeek V2 MoE forward passes to record topk_ids.
  • Add new environment variables for enabling/disabling and configuring dump output.
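
To illustrate the "real vs no-op" pattern described in the changes above, here is a minimal pure-Python sketch. It is not the PR's implementation: the class names, the `record` signature, and the rule that the flag must equal "1" are assumptions, and nested lists stand in for the `topk_ids` tensor that the real recorder receives.

```python
import json
import os


class NoOpRecorder:
    """Used when recording is disabled; every call is a cheap no-op."""

    def record(self, topk_ids, layer_idx, num_experts):
        pass

    def dump(self):
        return None


class CountingRecorder:
    """Accumulates, per layer, how many tokens were routed to each expert."""

    def __init__(self):
        self.counts = {}  # layer_idx -> list of per-expert token counts

    def record(self, topk_ids, layer_idx, num_experts):
        layer_counts = self.counts.setdefault(layer_idx, [0] * num_experts)
        for token_expert_ids in topk_ids:  # one row of top-k ids per token
            for expert_id in token_expert_ids:
                layer_counts[expert_id] += 1

    def dump(self):
        # Sort by layer index so the JSON output is stable across runs.
        ordered = {str(k): v for k, v in sorted(self.counts.items())}
        return json.dumps(ordered)


def make_recorder(environ=os.environ):
    """Pick the real or no-op recorder from the enable flag (assumed '1' = on)."""
    enabled = environ.get("LMDEPLOY_DUMP_EXPERT_DISTRIBUTION", "0") == "1"
    return CountingRecorder() if enabled else NoOpRecorder()
```

Selecting the recorder once at construction time keeps the disabled path free of per-forward branching, which is the usual motivation for a no-op variant.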

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File | Description
lmdeploy/pytorch/models/utils/expert_distribution_recorder.py | New recorder implementation that tracks expert token counts and periodically dumps JSON.
lmdeploy/pytorch/models/qwen3_moe.py | Records MoE router topk_ids per forward for Qwen3 MoE blocks.
lmdeploy/pytorch/models/deepseek_v2.py | Records MoE router topk_ids per forward for DeepSeek V2 MoE blocks; stores layer_idx.
lmdeploy/pytorch/envs.py | Adds env configuration knobs for expert distribution dumping.

5 comment threads on lmdeploy/pytorch/models/utils/expert_distribution_recorder.py (all Outdated)
CUHKSZzxy and others added 7 commits March 30, 2026 21:27
- Sort by (layer_index, num_experts) for stable JSON output
- Use topk_ids.device instead of hard-coded 'cuda'
- Use reshape(-1) instead of view(-1) for non-contiguous safety
- Move all_reduce inside dump block to avoid per-step sync overhead
- Replace minute-modulo dump guard with absolute timestamp; validate dump_frequency >= 1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
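
The last bullet of the commit message above (absolute-timestamp dump guard with a validated `dump_frequency`) could look roughly like the sketch below. The class name `DumpGuard`, the injectable `clock` parameter, and reading `dump_frequency` as minutes are assumptions for illustration; the commit itself is not shown on this page.

```python
import time


class DumpGuard:
    """Decide when the next dump is due by comparing absolute timestamps.

    A guard of the form `minute % dump_frequency == 0` can fire many times
    within one minute or skip a window entirely; storing the next due time
    avoids both problems.
    """

    def __init__(self, dump_frequency, clock=time.monotonic):
        if dump_frequency < 1:
            raise ValueError("dump_frequency must be >= 1")
        self._interval = dump_frequency * 60.0  # assumed unit: minutes
        self._clock = clock
        self._next_dump = clock() + self._interval

    def should_dump(self):
        now = self._clock()
        if now < self._next_dump:
            return False
        # Schedule the next dump relative to now, so a late check never
        # triggers a burst of back-to-back dumps.
        self._next_dump = now + self._interval
        return True
```

Placing any cross-rank reduction inside the `if guard.should_dump():` branch would also match the commit's note about avoiding per-step synchronization.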
@CUHKSZzxy CUHKSZzxy marked this pull request as ready for review April 1, 2026 06:27
@CUHKSZzxy CUHKSZzxy closed this Apr 2, 2026