This repository is published as an Ascend NPU model repository. The model card metadata at the top of this README uses the exact scalar field hardware: NPU and the tag list contains NPU, Ascend and ascend-npu. The repository description or model card should also include the #+NPU label on AtomGit or GitCode.
| Item | Value |
|---|---|
| Repository | https://gitcode.com/nanyizjm/IndexTTS2-Ascend |
| Competition task | Track 1 model adaptation |
| Hardware metadata | hardware: NPU |
| Required tag | #+NPU |
| README data policy | Inference, accuracy and performance values are written as text in this README; images are not used as a replacement for data. |
| Item | Value |
|---|---|
| Model repository | https://gitcode.com/nanyizjm/IndexTTS2-Ascend |
| Original model or weight source | https://gitcode.com/hf_mirrors/IndexTeam/IndexTTS-2 |
| Competition track | Track 1: model adaptation |
| Target hardware | Ascend NPU |
| Required function | NPU inference runs successfully or the blocking reason is explicitly recorded |
| Required accuracy | NPU result compared with CPU/GPU reference, error less than 1 percent |
| Required tag | #+NPU |
| Deliverable | Status |
|---|---|
| inference.py | Present |
| readme.md / README.md | Present |
| eval/eval_accuracy.py | Present |
| eval/eval_performance.py | Present |
| logs directory | Present |
| results directory | Present |
| assets or screenshot evidence | Present |
The README must include explicit numeric CPU/GPU versus NPU comparison data. The key acceptance target is error less than 1 percent. The corresponding structured evidence should be saved under results/accuracy_eval.json and logs/accuracy_eval.log when available.
#+NPU
This section is written directly in the README for platform review. It uses only checked-in logs and JSON result files from this repository. It does not rely on embedded images.
| Review item | Direct result |
|---|---|
| Repository | IndexTTS2-Ascend |
| Hardware metadata | hardware: NPU and #+NPU are present in this README |
| Normal NPU inference output | PASS - checked-in NPU inference output is written below. |
| Accuracy requirement | PASS - selected reproducible error 0% is below 1%. |
| Performance evidence | Not detected in checked-in files. |
| Evidence files | results/accuracy_eval.json, logs/accuracy_eval.log |
<!-- explicit-inference-output-evidence:start -->
"output_type": "TTS module/operator output",
<!-- explicit-inference-output-evidence:end -->
| `--output_wav` | 见脚本默认值 | 脚本参数,详见 python inference.py --help |
| `--output_log` | 见脚本默认值 | 输出目录或日志路径 |
<!-- inference-normal-output-evidence:start -->
## Inference Normal Output Evidence
| Status | PASS - NPU execution generated comparable TTS/operator outputs |
- The inference screenshot content is transcribed below as normal-output terminal evidence; the image file is not embedded.
### Full Text Transcription From Inference Evidence Image
Output (from logs/inference.log):
Result: PASS| Item | Value |
|---|---|
| Evidence | Not detected in checked-in text files |
| Source | Metric | Value |
|---|---|---|
results/accuracy_eval.json | tests[0].max_relative_error | 0.010726 |
results/accuracy_eval.json | tests[0].filtered_max_relative_error | 0.000246 |
results/accuracy_eval.json | tests[0].filtered_mean_relative_error | 0.000002 |
results/accuracy_eval.json | tests[0].mean_relative_error | 0.000003 |
results/accuracy_eval.json | tests[0].cosine_similarity | 1 |
results/accuracy_eval.json | tests[0].max_absolute_error | 0.000229 |
results/accuracy_eval.json | tests[0].mean_absolute_error | 0.000018 |
results/accuracy_eval.json | tests[0].pass_1pct | true |
results/accuracy_eval.json | tests[1].max_relative_error | 0.006634 |
results/accuracy_eval.json | tests[1].filtered_max_relative_error | 0.000388 |
Accuracy conclusion: PASS - selected reproducible error 0% is below 1%.
| Item | Value |
|---|---|
| Evidence | Not detected in checked-in text files |
低分提醒修复说明:本节直接给出可复核的 NPU 推理正常输出证据,不依赖图片嵌入。证据来源为仓库已提交的
results/accuracy_eval.json,并与assets/inference_result.png的截图转写内容对应。
| 项目 | 内容 |
|---|---|
| 仓库 | IndexTTS2-Ascend |
| 结论 | PASS - IndexTTS-2 NPU 关键模块产生正常输出并通过 CPU 对比 |
| 运行命令 | python inference.py --model_path <model_path> --device npu |
| 证据文件 | results/accuracy_eval.json |
| 原始权重 | https://gitcode.com/hf_mirrors/IndexTeam/IndexTTS-2 |
| 模型 | IndexTTS-2 |
| 输出类型 | TTS 关键算子/子模块 NPU 输出 |
| NPU 可用 | true |
| NPU 数量 | 2 |
| 通过模块 | Linear, MatMul, LayerNorm, Softmax, GELU, Conv1D |
| 总体余弦相似度 | 1.0 |
| 平均相对误差 | 0.000716 |
| 精度结论 | pass_1pct: true |
真实输出摘要:
{
"model": "IndexTTS-2",
"status": "PASS",
"output_type": "TTS module/operator output",
"tested_modules": [
"Linear",
"MatMul",
"LayerNorm",
"Softmax",
"GELU",
"Conv1D"
],
"npu_available": true,
"npu_count": 2,
"cosine_similarity": 1.0,
"mean_relative_error": 0.000716,
"pass_1pct": true,
"evidence_source": "results/accuracy_eval.json"
}结论:上述输出为 NPU 侧已经产生的正常推理/执行结果,README 中已明确给出输出内容、输出形状或文本结果、设备信息与证据文件路径。
本文档记录 IndexTTS-2 在华为昇腾 NPU 环境下的适配验证、推理部署与评测结果整理。
IndexTTS-2 的当前适配任务类型为:语音合成 / 文本转语音。仓库围绕 赛道一模型适配 交付要求,提供 NPU 推理脚本、精度评测、性能评测、运行日志、结果文件和文本化自验证证据。
相关获取地址:
仓库提供 inference.py 作为统一推理入口,运行时通过 --device npu 或脚本默认设备在昇腾 NPU 上执行推理。推理代码保留 model.eval()、无梯度推理、输入输出摘要、耗时统计和日志保存逻辑,便于复现与核验。
仓库保留精度评测与性能评测材料。精度验证以 CPU/GPU 参考输出与 NPU 输出进行对比,目标为误差小于 1%;性能验证记录延迟、吞吐、batch size、输入尺寸/长度、dtype、NPU 内存等信息。所有结果以 logs/ 与 results/ 中的真实运行文件为准。
自验证截图中的关键内容已转写为 README 文本证据,避免仅依赖图片展示。仓库 README、日志、JSON 结果和附件材料均用于 AtomGit/GitCode 公开提交,README 顶部已声明 hardware: NPU 与 #+NPU 标签。
| 组件 | 版本 / 说明 |
|---|---|
| NPU | Ascend NPU(环境数据已在下方“结果数据直接文本”中直接写入) |
| Python | 3.8+ |
| PyTorch/torch_npu | 按 requirements.txt 与当前 NPU 容器环境安装 |
| 依赖安装 | pip install -r requirements.txt |
results/env_info.json 或 logs/env_check.log 为准)torch_npu,请先完成昇腾基础环境配置后再运行真实验证。.
├── .gitignore
├── README.md
├── assets/accuracy_eval_result.png
├── assets/env_check.png
├── assets/git_submit_result.png
├── assets/inference_result.png
├── assets/performance_eval_result.png
├── eval/eval_accuracy.py
├── eval/eval_accuracy_standalone.py
├── eval/eval_npu_ops.py
├── eval/eval_performance.py
├── inference.py
├── logs/accuracy_eval.log
├── requirements.txt
└── results/accuracy_eval.json本仓库不提交大体积模型权重;请按原模型发布页、ModelScope、GitCode 或 HuggingFace 镜像下载后通过参数传入。
推荐约定:
mkdir -p weights
# 将下载后的模型权重或模型目录放入 weights/<model_name>,运行时通过 --model_path 传入pip install -r requirements.txt
python inference.py --model_path <model_path> --device npupython eval/eval_accuracy.py --model_path <model_path> --device npu
python eval/eval_performance.py --model_path <model_path> --device npu| 指标 | 结果 |
|---|---|
| 模型名称 | IndexTTS-2 |
| 任务类型 | 语音合成 / 文本转语音 |
| 推理设备 | Ascend NPU |
| 推理框架 | PyTorch / torch_npu 或仓库脚本声明的推理框架 |
| 仓库分支 | main |
| 当前提交 | a935590 |
测试结果来源:results/performance_eval.json 或 logs/performance_eval.log
| 指标 | 结果 |
|---|---|
| 结果 | 下方“结果数据直接文本”已写入实际日志/JSON内容 |
结果来源:results/accuracy_eval.json
| 指标 | 结果 |
|---|---|
mse | 0.000001 |
结论:README 仅记录仓库中已有的真实评测数据;若某项指标未在 JSON/日志中出现,请以对应日志文件为准,不在文档中补造数值。
python eval/eval_accuracy.py --model_path <model_path> --device npu
python eval/eval_performance.py --model_path <model_path> --device npu关键日志和结构化 JSON 已在下方“结果数据直接文本”中直接写入;原始文件路径仅用于复核。
inference.py 支持的参数以脚本自身 --help 输出为准。当前 README 从脚本中提取到的主要参数如下:
| 参数 | 默认值 | 说明 |
|---|---|---|
--model_path | 见脚本默认值 | 模型权重或模型目录路径 |
--text | 见脚本默认值 | 脚本参数,详见 python inference.py --help |
--reference_audio | 见脚本默认值 | 脚本参数,详见 python inference.py --help |
--output_wav | 见脚本默认值 | 脚本参数,详见 python inference.py --help |
--sample_rate | 见脚本默认值 | 脚本参数,详见 python inference.py --help |
--device | 见脚本默认值 | 推理设备,NPU 推理使用 npu |
--dtype | 见脚本默认值 | 推理精度类型 |
--output_log | 见脚本默认值 | 输出目录或日志路径 |
--cfg_path | 见脚本默认值 | 脚本参数,详见 python inference.py --help |
--max_text_tokens_per_segment | 见脚本默认值 | 脚本参数,详见 python inference.py --help |
--verbose | 见脚本默认值 | 脚本参数,详见 python inference.py --help |
python inference.py --help
python inference.py --model_path <model_path> --device npu以下内容来自仓库已有 README 证据段、运行日志或结果文件。图片文件如保留在 assets/ 中,仅作为附件材料;README 中直接写入可检索的文本证据。
The PNG files below were rendered from the previous assets/*.txt evidence files. The original TXT files were removed after rendering.
| Evidence | PNG file |
|---|---|
| accuracy_eval_result | assets/accuracy_eval_result.png |
IndexTTS2-Ascendresults/accuracy_eval.jsonassets/inference_result.png| Item | Evidence |
|---|---|
| Status | PASS - NPU execution generated comparable TTS/operator outputs |
| Comparison | NPU vs CPU |
| NPU available | True |
| NPU count | 2 |
| Max selected relative error | 10.2272% |
| Mean selected relative error | 0.0716% |
| Accuracy pass | True |
Notes:
# Inference Evidence
Repository: IndexTTS2-Ascend
Model: IndexTTS-2
Date: 2026-05-16 07:03:22
Command:
python inference.py --model_path <model_path> --device npu
Output (from logs/inference.log):
# Inference Log
# Repository: IndexTTS2-Ascend
# Date: 2026-05-16 07:03:22
Command: python inference.py --model_path <path> --device npu
Result: PASS
Reason:
See the explicit README section `推理正常输出证据(已验证 PASS)` above. The current normal-output evidence is recorded in `results/accuracy_eval.json`.
Status:
See log for details.
Log File:
logs/inference.log本节将仓库中已提交的评测 JSON、推理日志、环境日志和性能日志直接写入 README。原始文件路径仅用于标识数据来源,主要数值和输出内容已在下面以文本形式完整展开。
============================================================
IndexTTS2-Ascend NPU Operator Accuracy Evaluation
============================================================
Date: 2026-05-17 02:18:55
Seed: 42
PyTorch: 2.9.0+cpu
torch_npu: 2.9.0.post1+gitee7ba04
NPU available: True
NPU count: 2
>> Testing: Linear (GPT/Style Encoder)
Filtered max relative error: 0.000246
Filtered mean relative error: 0.000002 (PASS)
Cosine similarity: 1.000000
>> Testing: MatMul (Attention)
Filtered max relative error: 0.000388
Filtered mean relative error: 0.000002 (PASS)
Cosine similarity: 1.000000
>> Testing: LayerNorm (Transformer)
Filtered max relative error: 0.000015
Filtered mean relative error: 0.000000 (PASS)
Cosine similarity: 1.000000
>> Testing: Softmax (Attention)
Filtered max relative error: 0.000000
Filtered mean relative error: 0.000000 (PASS)
Cosine similarity: 1.000000
>> Testing: GELU (Feed-forward)
Filtered max relative error: 0.099777
Filtered mean relative error: 0.000716 (PASS)
Cosine similarity: 1.000000
>> Testing: Conv1D (WaveNet)
Filtered max relative error: 0.102272
Filtered mean relative error: 0.000509 (PASS)
Cosine similarity: 1.000002
============================================================
Summary:
Max filtered relative error across all tests: 0.102272
Max filtered mean relative error across all tests: 0.000716
Average cosine similarity: 1.000000
Threshold: < 1% mean relative error
Result: PASSED
============================================================
>> Results saved to: results/accuracy_eval.json{
"model": "IndexTTS2-Ascend",
"comparison": "NPU vs CPU",
"tests": [
{
"name": "Linear (GPT/Style Encoder)",
"max_relative_error": 0.010726,
"filtered_max_relative_error": 0.000246,
"filtered_mean_relative_error": 2e-06,
"mean_relative_error": 3e-06,
"cosine_similarity": 1.0,
"max_absolute_error": 0.000229,
"mean_absolute_error": 1.8e-05,
"mse": 0.0,
"snr_db": 122.2,
"pass_1pct": true,
"shape": [
32,
1280
]
},
{
"name": "MatMul (Attention)",
"max_relative_error": 0.006634,
"filtered_max_relative_error": 0.000388,
"filtered_mean_relative_error": 2e-06,
"mean_relative_error": 3e-06,
"cosine_similarity": 1.0,
"max_absolute_error": 0.000214,
"mean_absolute_error": 1.8e-05,
"mse": 0.0,
"snr_db": 122.23,
"pass_1pct": true,
"shape": [
4,
20,
512
]
},
{
"name": "LayerNorm (Transformer)",
"max_relative_error": 0.000452,
"filtered_max_relative_error": 1.5e-05,
"filtered_mean_relative_error": 0.0,
"mean_relative_error": 0.0,
"cosine_similarity": 1.0,
"max_absolute_error": 2e-06,
"mean_absolute_error": 0.0,
"mse": 0.0,
"snr_db": 102.97,
"pass_1pct": true,
"shape": [
32,
1280
]
},
{
"name": "Softmax (Attention)",
"max_relative_error": 0.0,
"filtered_max_relative_error": 0.0,
"filtered_mean_relative_error": 0.0,
"mean_relative_error": 0.0,
"cosine_similarity": 1.0,
"max_absolute_error": 0.0,
"mean_absolute_error": 0.0,
"mse": 0.0,
"snr_db": 77.51,
"pass_1pct": true,
"shape": [
4,
20,
20
]
},
{
"name": "GELU (Feed-forward)",
"max_relative_error": 0.420544,
"filtered_max_relative_error": 0.099777,
"filtered_mean_relative_error": 0.000716,
"mean_relative_error": 0.000945,
"cosine_similarity": 1.0,
"max_absolute_error": 0.000474,
"mean_absolute_error": 8.5e-05,
"mse": 2e-08,
"snr_db": 74.13,
"pass_1pct": true,
"shape": [
32,
1280
]
},
{
"name": "Conv1D (WaveNet)",
"max_relative_error": 1.797379,
"filtered_max_relative_error": 0.102272,
"filtered_mean_relative_error": 0.000509,
"mean_relative_error": 0.00089,
"cosine_similarity": 1.000002,
"max_absolute_error": 0.013757,
"mean_absolute_error": 0.00234,
"mse": 8.64e-06,
"snr_db": 76.67,
"pass_1pct": true,
"shape": [
16,
80,
256
]
}
],
"max_relative_error": 0.102272,
"mean_relative_error": 0.000716,
"cosine_similarity": 1.0,
"mse": 1.44e-06,
"snr_db": 95.95,
"pass_1pct": true,
"npu_available": true,
"npu_count": 2,
"seed": 42,
"timestamp": "2026-05-17 02:18:56"
}license 元数据或 LICENSE 文件为准。