duj12/Fun-ASR-vllm

Fun-ASR vLLM Acceleration

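This repository provides an accelerated implementation of Fun-ASR using vLLM. vLLM manages the KV cache with PagedAttention and batches requests continuously. Running the model's LLM decoder under vLLM reduces decoding time compared with the HuggingFace PyTorch implementation. On the benchmark below, decoding time drops from 218.2 s to 145.6 s at batch size 1 (about 1.5×) and from 45.4 s to 26.3 s at batch size 16 (about 1.7×), while CER stays at about 7%.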

Environment Setup 🐍

To get started, clone the repository and install the required dependencies:

git clone https://github.com/yuekaizhang/Fun-ASR-vllm.git
cd Fun-ASR-vllm
apt-get install -y ffmpeg
uv pip install -r requirements.txt
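Before running anything, you may want to confirm that the `ffmpeg` binary installed above is on `PATH` and that key Python packages resolve. This is a minimal sketch and is not a script shipped with this repository. The package list (`vllm`, `torch`) is an assumption, so adjust it to match `requirements.txt`.

```python
import importlib.util
import shutil


def check_environment(packages=("vllm", "torch")):
    """Report whether ffmpeg is on PATH and whether each package is importable."""
    status = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located;
        # it does not actually import the package
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    # The default package list is an assumption; edit it to match requirements.txt
    for name, ok in check_environment().items():
        print(f"{name:8s} {'OK' if ok else 'MISSING'}")
```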

Features 📝

  • vLLM inference backend
  • Batched inference (batch size > 1)
  • SenseVoice encoder acceleration
  • Integration with NVIDIA Triton Inference Server

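Batch size > 1 support means several audio files can be decoded per call. Below is a minimal sketch of grouping a file list into fixed-size batches. `batched` is a hypothetical helper, and the file names are placeholders.

```python
from typing import Iterator, List, Sequence


def batched(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(paths), batch_size):
        # The last chunk may be shorter than batch_size
        yield list(paths[start:start + batch_size])


if __name__ == "__main__":
    wav_paths = [f"clip_{i:03d}.wav" for i in range(5)]  # hypothetical file names
    for batch in batched(wav_paths, 2):
        print(batch)
```

With the model set up as in the Python API example, each batch would then be passed as `m.inference(data_in=batch, **kwargs)`. This assumes `inference` accepts a multi-element list, as the single-element list in that example suggests. On Python 3.12+, `itertools.batched` provides the same grouping but yields tuples.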
Usage 🛠️

Python API Inference

You can run inference directly using the Python API:

from model import FunASRNano
from vllm import LLM, SamplingParams

def main():
    model_dir = "FunAudioLLM/Fun-ASR-Nano-2512"
    # Load the base model
    m, kwargs = FunASRNano.from_pretrained(model=model_dir, device="cuda:0")
    m.eval()
    
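    # Initialize vLLM. enable_prompt_embeds lets it accept input embeddings instead
    # of token IDs; gpu_memory_utilization caps vLLM's share of GPU memory so the
    # model loaded above still fits on the same device.
    llm = LLM(model="yuekai/Fun-ASR-Nano-2512-vllm", enable_prompt_embeds=True, gpu_memory_utilization=0.4)
    # top_p=0.001 keeps only the most likely token at each step (near-greedy decoding)
    sampling_params = SamplingParams(
        top_p=0.001,
        max_tokens=500,
    )

    # Attach the vLLM engine and sampling settings to the model
    m.vllm = llm
    m.vllm_sampling_params = sampling_params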

    # Run inference
    wav_path = f"{kwargs['model_path']}/example/zh.mp3"
    res = m.inference(data_in=[wav_path], **kwargs)
    print(res)
    text = res[0][0]["text"]
    print(text)


if __name__ == "__main__":
    main()
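The example above sets `top_p=0.001`. Under nucleus (top-p) sampling, the candidates at each step are the smallest group of most probable tokens whose cumulative probability reaches `top_p`. With a threshold this small, only the single most likely token survives, so decoding is effectively greedy. The pure-Python sketch below illustrates that filtering rule on a toy distribution. It is a simplification for explanation only; vLLM's implementation works on batched logit tensors.

```python
from typing import Dict


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Return the top-p (nucleus) candidate set, renormalized to sum to 1."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        # Stop once the most probable tokens cover at least top_p of the mass
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


if __name__ == "__main__":
    toy = {"a": 0.6, "b": 0.3, "c": 0.1}  # toy next-token distribution
    print(nucleus_filter(toy, 0.001))  # only the most likely token remains
    print(nucleus_filter(toy, 0.8))    # "a" and "b" remain, renormalized
```

If strictly greedy decoding is the goal, vLLM selects greedy sampling directly when `SamplingParams(temperature=0)` is used.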

Running Benchmarks

To evaluate performance on a dataset (e.g., SpeechIO):

dataset_name="yuekai/speechio"
subset_name="SPEECHIO_ASR_ZH00007"
split_name="test"

uv run python \
    infer.py \
    --model_dir FunAudioLLM/Fun-ASR-Nano-2512 \
    --huggingface_dataset $dataset_name \
    --subset_name $subset_name \
    --split_name $split_name \
    --batch_size 16 \
    --log_dir "./logs_vllm_${dataset_name}_${subset_name}" \
    --vllm_model_dir yuekai/Fun-ASR-Nano-2512-vllm
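To drive the benchmark from Python, for example over several SpeechIO subsets, you can assemble the same command programmatically. This is a sketch: `build_infer_cmd` and `run_benchmarks` are hypothetical helpers that reuse only the flags shown in the shell example. Other `infer.py` options are not covered.

```python
import shlex
import subprocess
from typing import List


def build_infer_cmd(subset_name: str,
                    dataset_name: str = "yuekai/speechio",
                    split_name: str = "test",
                    batch_size: int = 16) -> List[str]:
    """Assemble the infer.py command line used in the benchmark example."""
    # Join dataset and subset names with an underscore for the log directory
    log_dir = f"./logs_vllm_{dataset_name}_{subset_name}"
    return [
        "uv", "run", "python", "infer.py",
        "--model_dir", "FunAudioLLM/Fun-ASR-Nano-2512",
        "--huggingface_dataset", dataset_name,
        "--subset_name", subset_name,
        "--split_name", split_name,
        "--batch_size", str(batch_size),
        "--log_dir", log_dir,
        "--vllm_model_dir", "yuekai/Fun-ASR-Nano-2512-vllm",
    ]


def run_benchmarks(subsets: List[str]) -> None:
    """Run infer.py once per subset, stopping at the first failure."""
    for subset in subsets:
        cmd = build_infer_cmd(subset)
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Dry run: print the command without executing it
    print(shlex.join(build_infer_cmd("SPEECHIO_ASR_ZH00007")))
```

Calling `run_benchmarks(["SPEECHIO_ASR_ZH00007"])` actually executes the run. Add further subset names to the list as needed.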

Performance 🚀

We compared the performance of the standard HuggingFace PyTorch implementation against our vLLM-accelerated version.

Benchmark Details:

| Mode | Decoding time (s) | RTF | RTFx | CER | Batch size |
|---|---|---|---|---|---|
| HuggingFace PyTorch | 218.2 | 0.06 | 16.5 | 7.02% | 1 |
| HuggingFace PyTorch | 45.4 | 0.013 | 79.3 | 8.53% | 16 |
| vLLM (Qwen3-0.6B) | 145.6 | 0.04 | 24.7 | 6.99% | 1 |
| vLLM (Qwen3-0.6B) | 26.3 | 0.007 | 136.9 | 7.03% | 16 |

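Note: RTF (Real Time Factor) is decoding time divided by audio duration, so lower is better. RTFx is its inverse (1 / RTF), the number of seconds of audio processed per second of compute, so higher is better.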
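You can reproduce the table's metrics with a few lines of arithmetic. RTF is decoding time divided by audio duration, and RTFx is its inverse. The table does not state the audio duration, but multiplying any row's decoding time by its RTFx gives roughly 3600 s (for example, 218.2 s × 16.5 ≈ 3600 s). The sketch below uses that inferred value as an assumption. CER is the character-level edit distance (substitutions + deletions + insertions) divided by the number of reference characters. The sketch applies no text normalization, although `infer.py` may normalize text before scoring.

```python
from typing import List


def rtf(decoding_seconds: float, audio_seconds: float) -> float:
    """Real Time Factor: processing time / audio duration (lower is better)."""
    return decoding_seconds / audio_seconds


def rtfx(decoding_seconds: float, audio_seconds: float) -> float:
    """Inverse RTF: seconds of audio processed per second of compute (higher is better)."""
    return audio_seconds / decoding_seconds


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, using one rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


if __name__ == "__main__":
    audio_s = 3600.0  # assumption inferred from the table: decoding time × RTFx ≈ 3600 s
    for label, t in [("PyTorch, batch 16", 45.4), ("vLLM, batch 16", 26.3)]:
        print(f"{label}: RTF={rtf(t, audio_s):.4f} RTFx={rtfx(t, audio_s):.1f}")
    print(f"CER = {cer(['abcd'], ['abed']):.2%}")
```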

About

Fun-ASR is an end-to-end large speech recognition model released by Tongyi Lab.
