Refactor turbomind engine #4223
base: main
Conversation
Pull request overview
This PR performs a major refactoring of the turbomind engine architecture with the following key changes:
- Replaces `LlamaTritonModel` with a new `TurboMind` class providing a cleaner API
- Removes the old batch processing implementation (`LlamaBatch`, `LlamaV2`)
- Introduces new model abstractions: `LanguageModel`, `InputProcessor`, and `OutputProcessor` to better separate concerns
- Updates `RequestMetrics` fields to use atomic operations for thread-safe access
- Consolidates model-related code into a unified `models` CMake target
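
For orientation, the sketch below shows one way these components could fit together. Only the class names `TurboMind`, `LanguageModel`, `InputProcessor`, and `OutputProcessor` appear in the PR summary; the `Request`/`Output` types, method names, and wiring are illustrative assumptions, not the actual API introduced by this PR.

```cpp
// Hypothetical sketch only: every member, method, and the Request/Output
// placeholder types below are assumptions made for illustration.
#include <memory>
#include <utility>

struct Request {};  // an inference request (assumed placeholder)
struct Output {};   // a decoded result (assumed placeholder)

class InputProcessor {   // prepares model inputs from a raw request
public:
    virtual ~InputProcessor() = default;
    virtual void Process(const Request& req) = 0;
};

class LanguageModel {    // runs the transformer forward pass
public:
    virtual ~LanguageModel() = default;
    virtual void Forward() = 0;
};

class OutputProcessor {  // converts model outputs into user-visible results
public:
    virtual ~OutputProcessor() = default;
    virtual Output Collect() = 0;
};

// TurboMind (replacing LlamaTritonModel) owns the three components, keeping
// input handling, the forward pass, and output handling separated.
class TurboMind {
public:
    TurboMind(std::unique_ptr<InputProcessor>  in,
              std::unique_ptr<LanguageModel>   lm,
              std::unique_ptr<OutputProcessor> out)
        : input_(std::move(in)), model_(std::move(lm)), output_(std::move(out)) {}

    Output Infer(const Request& req)
    {
        input_->Process(req);       // 1. build model-ready inputs
        model_->Forward();          // 2. run the language model
        return output_->Collect();  // 3. package the result
    }

private:
    std::unique_ptr<InputProcessor>  input_;
    std::unique_ptr<LanguageModel>   model_;
    std::unique_ptr<OutputProcessor> output_;
};
```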
Reviewed changes
Copilot reviewed 102 out of 102 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/turbomind/utils/metrics.h | Changed metric fields to atomic types and fixed typo in field name |
| src/turbomind/turbomind.h/cc | New TurboMind class interface replacing LlamaTritonModel |
| src/turbomind/triton_backend/llama/* | Removed old triton backend files |
| src/turbomind/python/bind.cpp | Updated Python bindings to use new TurboMind class |
| src/turbomind/models/language_model.* | New LanguageModel abstraction for inference |
| src/turbomind/models/input_processor.* | New component for handling input processing |
| src/turbomind/models/output_processor.* | New component for handling output processing |
| src/turbomind/models/llama/unified_decoder.* | Updated to work with new architecture |
| src/turbomind/models/llama/unified_attention_layer.* | Refactored attention layer implementation |
| src/turbomind/models/llama/llama_utils.cu | Changed isTuning() from thread_local to static |
| src/turbomind/layers/sampling_layers/* | Removed old sampling layer files |
| src/turbomind/kernels/sampling_kernels.h | Changed sampled_indexes/nums types from uint32_t to int |
struct RequestMetrics {
-    int64_t enque_time{};      // when a request is enqued
-    int64_t scheduled_time{};  // when a request is scheduled for inference
+    std::atomic<int64_t> enqueue_time{};  // when a request is enqued
Copilot AI commented on Dec 26, 2025:
Spelling error in comment: "enqued" should be "enqueued".
os << "RequestMetrics { ";
-os << "enque_time=" << m.enque_time;
-os << ", scheduled_time=" << m.scheduled_time;
+os << "enque_time=" << m.enqueue_time.load(std::memory_order_relaxed);
Copilot AI commented on Dec 26, 2025:
Inconsistent naming: the output stream operator still uses the old field name "enque_time" in the message, but it should be "enqueue_time" to match the renamed field.
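
One hedged way to address this, assuming `scheduled_time` was also converted to an atomic (the excerpt above only shows `enqueue_time`), is to label each field with its new name and read both atomics with relaxed loads:

```cpp
// Sketch of a consistent stream operator; the exact field set of
// RequestMetrics is assumed from the diff excerpt and PR summary.
#include <atomic>
#include <cstdint>
#include <ostream>

struct RequestMetrics {
    std::atomic<int64_t> enqueue_time{};    // when a request is enqueued
    std::atomic<int64_t> scheduled_time{};  // when a request is scheduled for inference
};

inline std::ostream& operator<<(std::ostream& os, const RequestMetrics& m)
{
    os << "RequestMetrics { ";
    os << "enqueue_time=" << m.enqueue_time.load(std::memory_order_relaxed);
    os << ", scheduled_time=" << m.scheduled_time.load(std::memory_order_relaxed);
    os << " }";
    return os;
}
```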
bool& isTuning()
{
-    thread_local bool value{};
+    static bool value{};
Copilot AI commented on Dec 26, 2025:
The isTuning() function changed from thread_local to static, which removes thread safety. If this function is accessed from multiple threads simultaneously in the new architecture, this could cause race conditions. Consider whether thread-local storage is still needed or if synchronization is required.
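
If the flag really is read and written from multiple threads after the refactor, one option (a sketch only, not what this PR implements) is to back it with a `std::atomic<bool>`. This changes the accessor from a `bool&` to getter/setter functions, so existing call sites would need updating:

```cpp
// Hypothetical alternative to the plain static flag; the names
// TuningFlag/setTuning are assumptions introduced for this sketch.
#include <atomic>

namespace turbomind {

inline std::atomic<bool>& TuningFlag()
{
    static std::atomic<bool> value{false};  // initialized once; C++11 guarantees thread-safe init
    return value;
}

inline bool isTuning()
{
    return TuningFlag().load(std::memory_order_relaxed);
}

inline void setTuning(bool on)
{
    TuningFlag().store(on, std::memory_order_relaxed);
}

}  // namespace turbomind
```

If each worker thread is instead meant to keep its own independent tuning flag, as the original `thread_local` suggests, then keeping `thread_local` is the simpler fix.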