I. Environment Setup

Table 1. Version compatibility

Component	Version	Environment setup guide
CANN	8.2.RC1	-
Python	3.10.12	-
torch	2.8.0	-
torch_npu	2.8.0rc1	-
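
The pip-installable rows of Table 1 can be checked against the current environment with a short script. This is a minimal sketch: the helper names `installed_version` and `find_mismatches` are hypothetical, and CANN and Python are left out because they are not pip packages.

```python
from importlib import metadata

# Versions copied from Table 1 (pip-installable components only).
EXPECTED = {
    "torch": "2.8.0",
    "torch_npu": "2.8.0rc1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each mismatching package to an (expected, found) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Ignore a local build suffix such as "+cpu" when comparing.
        base = found.split("+", 1)[0] if found else None
        if base != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` returns an empty dict when both packages match the table, and otherwise lists each package with the expected and the found version.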

1. Install vllm

MiDashengLM-7B requires vllm 0.10.2 or later.

Run the following commands:

pip config set global.extra-index-url "https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"
pip install vllm==0.10.2
pip install vllm-ascend==0.10.2rc1
pip install "vllm[audio]"

2. Install dependencies

pip install torchaudio==2.8.0

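II. Code Modifications

The MiDashengLM model code and the torchaudio library it uses are not yet fully adapted to Ascend, so the code needs a few modifications.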

1. midashenglm.py

Initialize window_fn so that it creates the window with data type torch.float32.

/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/midashenglm.py:339

    def _init_front_end(self, config):
        with set_default_torch_dtype(torch.float32):
            window_fn = lambda win_len: torch.hann_window(win_len, dtype=torch.float32)
            self.front_end = nn.Sequential(
                audio_transforms.MelSpectrogram(
                    f_min=config.f_min,
                    f_max=config.f_max,
                    center=config.center,
                    win_length=config.win_length,
                    hop_length=config.hop_length,
                    sample_rate=config.sample_rate,
                    n_fft=config.n_fft,
                    n_mels=config.n_mels,
                    window_fn=window_fn,
                ),
                audio_transforms.AmplitudeToDB(top_db=120),
            )

2. midashenglm.py

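Cast x to torch.float32 before the front end, because torch.stft does not support DT_BFLOAT16. After the front end has run, x is cast back to the dtype of self.time_pos_embed.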

/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/midashenglm.py

    def forward(
        self,
        x: torch.Tensor,
        x_length: Optional[torch.Tensor] = None,
    ) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
        x = x.to(torch.float32)
        x = self.front_end(x)
        x = x.to(self.time_pos_embed.dtype)
        target_length_in_patches = self.target_length // 4
        x = x.unsqueeze(1)
        x = torch.permute(x, (0, 2, 1, 3))
        x = self.init_bn(x)
        x = torch.permute(x, (0, 2, 1, 3))

        x = self.patch_embed(x)
        t = x.shape[-1]

3. functional.py

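Support taking the absolute value (ABS) of complex tensors. For a complex spectrogram, the magnitude is computed with torch.hypot on the real and imaginary parts instead of calling abs() on the complex tensor.

/usr/local/lib/python3.10/dist-packages/torchaudio/functional/functional.py:145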

    spec_f = spec_f.reshape(shape[:-1] + spec_f.shape[-2:])

    if window_norm:
        spec_f /= window.pow(2.0).sum().sqrt()

    if power is not None:
        if not spec_f.is_complex():
            if power == 1.0:
                return spec_f.abs()
            return spec_f.abs().pow(power)
        else:
            # Complex input: compute the magnitude from the real and imaginary
            # parts instead of calling abs() on the complex tensor.
            real_part = spec_f.real
            imag_part = spec_f.imag
            abs_tensor = torch.hypot(real_part, imag_part)
            if power == 1.0:
                return abs_tensor
            return abs_tensor.pow(power)

    return spec_f
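
The patch relies on the identity |z| = sqrt(re^2 + im^2), which is what a hypot of the real and imaginary parts computes. The sketch below checks that identity on plain Python complex numbers using only the standard library; the function name `magnitude` is hypothetical and this is not the torchaudio code itself.

```python
import cmath
import math


def magnitude(z, power=1.0):
    """Return |z| ** power, computed from the real and imaginary parts only."""
    abs_value = math.hypot(z.real, z.imag)
    return abs_value if power == 1.0 else abs_value ** power


samples = [complex(3.0, 4.0), complex(-1.5, 2.0), cmath.rect(2.0, 0.7)]
for z in samples:
    # abs() on a Python complex number serves as the reference result.
    assert math.isclose(magnitude(z), abs(z))
    assert math.isclose(magnitude(z, power=2.0), abs(z) ** 2)
```

The `power` branch mirrors the structure of the patched function: the magnitude is returned directly for `power == 1.0` and raised to `power` otherwise.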

III. Model Inference

1. Download the model weights

modelscope download --model midasheng/midashenglm-7b --local_dir ./MiDashengLM-7B

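2. Start the service

Start the service with bfloat16 precision. The model path passed to vllm serve must point to the directory that holds the weights; adjust it if the weights were downloaded to a different location.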

vllm serve models/MiDashengLM-7B-bf16 \
 --served-model-name midashenglm-7b-bf16 \
 --tensor-parallel-size 1 \
 --max-model-len 4096 \
 --trust-remote-code \
 --dtype bfloat16 \
 --enforce-eager \
 --port 8106
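
Loading the weights can take a while, so it may help to wait until the server responds before sending requests. The following is a minimal sketch under stated assumptions: the helper names `server_is_up` and `wait_until_ready` are hypothetical, and polling the `/v1/models` route of the OpenAI-compatible API is an assumed readiness signal, not something this guide prescribes.

```python
import time
import urllib.error
import urllib.request


def server_is_up(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(url, attempts=60, delay=5.0,
                     probe=server_is_up, sleep=time.sleep):
    """Poll `url` until the probe succeeds.

    Returns the number of tries used, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        sleep(delay)
    return None
```

With the port used above, the call would be `wait_until_ready("http://127.0.0.1:8106/v1/models")`.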

3. Verify

curl http://127.0.0.1:8106/v1/chat/completions \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $INF_API_KEY" \
 -d '{
  "model": "midashenglm-7b-bf16",
  "messages": [
    {"role": "system", "content": "You are a helpful language and speech assistant."},
    {"role": "user", "content": [
      {"type": "audio_url", "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-Omni/cough.wav"}},
      {"type": "text", "text": "Caption the audio."}
    ]}
  ],
  "temperature": 0.7,
  "max_tokens": 2048
}'

The MiDashengLM-7B model is fairly sensitive to the temperature parameter. For recognition-type evaluations, temperature can be set to 0.
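
For scripted evaluation, the same request can be built in Python with temperature set to 0. This is a minimal sketch using only the standard library: the endpoint, model name, and audio URL are taken from the curl command above, while the function name `build_request` and its parameters are hypothetical.

```python
import json
import urllib.request

# Endpoint, model name, and audio URL are taken from the curl command above.
ENDPOINT = "http://127.0.0.1:8106/v1/chat/completions"
MODEL = "midashenglm-7b-bf16"
AUDIO_URL = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-Omni/cough.wav"


def build_request(prompt, audio_url=AUDIO_URL, temperature=0.0, api_key=""):
    """Build (but do not send) a chat completion request with one audio input."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system",
             "content": "You are a helpful language and speech assistant."},
            {"role": "user", "content": [
                {"type": "audio_url", "audio_url": {"url": audio_url}},
                {"type": "text", "text": prompt},
            ]},
        ],
        # 0 keeps decoding deterministic for recognition-type evaluation.
        "temperature": temperature,
        "max_tokens": 2048,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send it, pass the result of `build_request("Caption the audio.")` to `urllib.request.urlopen` and parse the body with `json.load`; in the OpenAI-style response format the generated text is at `["choices"][0]["message"]["content"]`.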