ChatTTS实现文本转语音(TTS)全流程教程【附完整代码 & 环境配置】

言简意赅的讲解ChatTTS解决的痛点

‍ 本教程手把手带你从零上手ChatTTS,实现文本到语音(TTS)转换,适合自媒体配音、有声内容创作、AI语音实验等场景。配套提供完整代码和环境配置方法,一键复现,无压力!


什么是ChatTTS?

ChatTTS 是由清华大学团队开源的一款中文文本转语音(Text-to-Speech, TTS)模型。它的特点包括:

  • ️ 语音自然流畅,情感丰富
  • ️ 支持自定义发音人音色(speaker embedding)
  • ⚡ 推理速度快,易于部署
  • 完全开源,适合学习与研究

GitHub地址: https://github.com/2noise/ChatTTS

⚠️ 注意:本教程仅限学习、科研或非商业用途。ChatTTS模型可能存在版权限制,不建议直接用于商业项目。


⚙️ 快速环境配置(可复现)

✅ 推荐使用虚拟环境

创建 Python 虚拟环境,避免污染系统环境:

# Windows
python -m venv env
.\env\Scripts\activate

# Linux / macOS
python3 -m venv env
source env/bin/activate

安装必要依赖

将以下内容保存为 requirements.txt

torch==2.7.1
torchaudio==2.7.1
ChatTTS==0.2.4
numpy==2.2.6

执行安装:

pip install -r requirements.txt

ChatTTS完整演示代码(可直接运行)

保存以下内容为 chattts_demo.py,即刻运行:

import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=True) # Set to True for better performance

speaker_vector = '3.281,2.916,2.316,2.280,-0.884,0.723,-10.224,-0.266,2.000,4.442,1.427,-1.075,-3.027,0.438,1.224,1.043,-0.423,-3.808,0.740,-4.704,10.824,-0.362,-0.291,10.761,1.843,6.068,-4.458,5.142,4.183,4.333,-4.859,-1.444,0.901,3.424,-0.992,-1.645,0.230,0.818,-2.294,1.417,2.839,-0.533,-2.302,-1.379,-3.898,0.998,-0.196,14.249,1.641,-6.927,-1.283,6.005,-0.125,1.531,3.420,8.003,-0.202,-2.823,4.435,2.378,5.186,4.951,1.945,-2.100,1.644,1.921,-1.138,6.229,-5.203,3.610,-0.919,-5.326,2.130,-4.196,21.073,-0.012,11.642,-4.142,-0.581,-0.138,0.882,-1.235,2.377,6.930,3.499,-0.239,4.493,0.411,-2.122,-7.575,0.809,0.885,0.008,0.900,9.930,-3.983,-0.462,-0.699,-4.674,5.071,-3.361,-2.448,0.742,-3.553,-6.620,-0.570,2.859,-3.424,8.462,2.013,-0.204,6.866,-1.946,0.946,-0.440,-4.125,1.194,-2.351,5.368,-5.819,0.049,-0.918,-1.812,-2.288,4.262,12.440,2.084,-0.156,-5.392,0.106,3.279,4.475,0.139,1.006,5.619,0.716,-5.918,-0.774,-5.805,-7.207,-0.402,0.228,-1.422,4.098,1.244,-1.481,-0.672,-3.778,1.220,-0.947,0.807,-1.658,5.977,7.529,-1.070,-2.112,-2.946,-1.739,-2.284,-1.045,-0.442,-1.813,1.269,-2.797,0.321,0.155,-0.379,-2.554,-1.409,2.421,0.318,4.154,0.647,0.610,-3.516,4.489,3.286,3.552,-0.358,1.132,-2.382,-0.612,-5.613,-0.168,-0.680,1.723,-20.408,-2.504,5.870,1.631,4.703,-0.359,0.367,-1.661,-2.172,-9.155,-3.521,-1.058,4.984,-4.086,-3.736,8.959,-1.759,-1.117,17.161,1.497,1.160,2.355,2.584,-2.312,-2.436,0.787,-4.522,5.730,-2.144,1.276,-5.419,-5.749,8.496,5.873,-0.472,-1.497,1.162,8.649,0.326,-0.714,4.123,-3.431,-1.579,0.016,-4.171,-15.907,0.607,1.909,-3.616,-0.803,-0.046,1.836,5.704,-0.730,3.511,7.312,-0.066,0.972,0.199,1.777,-3.750,0.089,18.306,1.156,-2.320,1.280,-2.491,-2.765,-19.535,-1.302,-8.004,-1.833,0.941,-3.547,-0.741,-1.860,-1.626,-2.555,-0.086,0.221,7.494,8.971,-2.007,-2.635,-2.494,-1.794,-5.080,4.963,-1.599,-1.862,-0.225,0.518,2.853,1.657,-0.257,-1.165,-7.366,2.681,5.232,0.592,-2.189,1.606,-1.407,-2.102,-2.945,2.853,8.901,0.029,-5.533,2.786,-0.961,2.544,1.200,-1.171,1.578,-1.681,-0.955,-2.784,2.281,-3.195,2.811,-1.718,0.810,-0.603,-1.736,-1.373,-0.891,-5.330,-1.381,9.614,4.155,0.047,-1.053,3.823,1.010,0.042,-2.849,1.482,0.798,19.538,-12.677,3.608,-1.660,4.493,2.087,-0.906,1.919,1.318,-1.815,-1.287,9.502,0.361,-4.073,-0.589,-1.339,5.567,1.668,0.479,0.333,-26.911,-3.391,-1.020,-4.389,-4.838,2.512,1.478,-3.802,-1.676,-0.469,0.898,3.632,-1.620,0.432,-1.906,-1.674,-7.568,0.361,1.675,-1.164,-13.232,-1.971,-3.983,-2.921,8.597,-0.231,-0.449,0.392,4.953,-5.841,0.047,4.419,-5.355,-4.528,-1.681,-8.136,0.295,3.060,-7.177,-1.483,1.266,2.680,2.265,-8.495,-2.133,1.485,-1.022,-3.286,-1.811,-1.127,-4.633,0.473,6.305,-4.399,-0.452,1.234,-0.245,4.392,2.415,-2.063,3.707,-4.495,3.831,29.219,-11.790,-0.259,1.078,-7.773,1.709,-1.397,-0.067,0.768,2.663,-1.454,-3.022,-1.810,5.015,1.889,1.119,-1.413,-3.240,4.198,14.864,-2.950,1.651,-3.228,-1.095,-2.941,1.231,-8.138,-1.936,-2.160,5.722,-5.481,0.544,-0.786,-1.140,-1.677,2.252,0.031,-2.385,-1.336,3.705,-3.270,-0.495,2.222,1.482,-6.423,-0.509,-1.519,5.775,0.420,-2.140,-3.850,2.567,8.821,-6.976,-6.974,-1.999,-3.226,-1.778,-6.607,9.911,-9.480,1.611,-0.308,-9.211,-1.476,1.751,-0.466,-2.363,-2.358,-4.866,-1.147,-3.444,-6.753,-0.507,-1.245,-2.413,-1.962,2.347,-0.094,-0.745,0.668,2.071,-0.755,-2.450,6.657,7.899,-6.277,2.189,-2.127,2.291,6.095,0.927,-0.250,-1.662,4.031,6.098,-17.201,-9.840,-0.729,1.427,3.222,-0.208,2.380,1.103,1.147,4.051,2.980,0.371,4.143,-1.133,0.486,-0.588,-0.392,-1.507,-0.803,-6.217,-0.947,-5.456,-9.972,-1.221,-1.677,0.376,-0.066,-0.987,-1.144,2.161,-0.282,1.924,1.364,10.975,-6.569,-5.145,-1.047,-1.040,-6.781,-0.387,-21.836,0.683,3.180,-0.757,-0.025,3.418,0.626,1.140,-4.496,-2.867,-2.973,10.865,2.019,-6.465,9.939,-0.107,-2.146,4.367,0.148,-1.867,1.160,-13.273,7.023,0.068,-1.266,-4.770,-1.669,-0.643,1.833,-0.005,-3.790,3.982,-4.962,-0.175,2.481,-0.260,-1.224,1.529,1.225,5.921,5.203,2.303,-2.092,-0.826,5.299,-0.106,0.513,-3.272,1.970,-2.140,-0.033,-3.860,2.144,2.398,-1.478,4.514,1.553,0.901,-2.297,4.627,-0.360,-0.540,0.280,4.952,-2.520,-3.369,19.334,-0.363,0.924,-0.099,-1.544,1.592,-0.163,6.088,-2.863,2.072,-0.336,-1.897,4.755,-3.300,-6.699,-3.852,0.276,-5.298,-0.012,-6.344,3.159,9.078,-2.398,-1.005,-5.604,1.775,4.505,1.172,-3.949,-1.302,5.698,-0.228,-8.495,-4.654,2.533,-4.065,0.098,0.392,2.624,0.928,-0.316,-0.798,-1.814,-3.183,-3.962,-2.339,-2.051,3.898,-1.971,-2.940,8.407,0.673,2.367,7.541,3.822,-6.822,-2.951,-2.204,10.548,-0.255,-1.850,1.263,1.779,1.912,0.608,-1.325,2.220,2.637,1.699,-8.412,-1.397,-0.422,-4.485,3.247,0.382,7.712,-2.262,-1.700,-0.506,7.307,7.266,-0.498,-6.427,-5.879,14.950,0.991,3.057,10.976,-3.662,-0.781,-0.486,7.302,1.052,-3.824,-9.337,-0.783,3.696,-1.935,1.468,-2.889,-1.950,-1.511,2.164,2.791,0.533,-0.986,0.150,1.650,0.909,1.949,-4.902,1.626,0.463,2.622,-1.272,2.164,2.691,-4.966,-0.418,-0.016,-8.490,-3.339,5.198,-3.525,2.392,0.445,1.293,3.217,6.489,1.553,2.088,-1.272,-1.077,3.093,-7.780,4.537,0.824,-1.956,1.215,0.527,2.576,-3.828,2.519,3.089,-4.081,-2.068,-0.241,0.574,-0.245,-1.646,-0.643,3.113,0.861,9.045,-4.593,5.160,3.660,-1.495'
speaker = torch.tensor([float(x) for x in speaker_vector.split(',')])
my_text = """如果帮助到了您,请一键三连[lbreak]关注CSDN博客 文浩(楠搏万)[uv_break],感谢您的支持[laugh]。"""
params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb = speaker, # add sampled speaker
    temperature = .3,   # using custom temperature
    top_P = 0.7,        # top P decode
    top_K = 20,         # top K decode
)
params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)

wavs = chat.infer(
    my_text,
    params_refine_text=params_refine_text,
    params_infer_code=params_infer_code,
)
torchaudio.save("word_level_output.wav", torch.from_numpy(wavs[0]).unsqueeze(0), 24000)


speaker_vector 是什么?如何自定义音色?

speaker_vector 是 ChatTTS 用于控制语音音色的 768维向量,可以看作“说话者的声音特征”。

你可以理解为:

  • 同样的文字,不同的 speaker_vector 会生成不同“人”在说话的语音。
  • 你可以使用更柔和的、更磁性的、甚至更幽默的音色,只需换个向量。

推荐做法:

  • 本文提供的是一个通用的好听音色,适合新手使用。

  • 想探索更多?访问:ChatTTS GitHub 项目:

    • 获取官方预设音色向量
    • 学习如何从自己的语音中提取 speaker_vector
    • 解锁更多语气提示词(如 [angry], [shy], [whisper]

参数详解与优化建议

参数名 作用说明
temperature 控制语音多样性,值越低越稳定,推荐 0.2~0.5
top_P 控制采样概率,越小生成越保守,推荐 0.6~0.8
top_K 限制候选token数量,推荐 20~50
prompt 用于控制语气,如 [laugh], [break], [shy], [oral_2]

✅ 推荐组合用于自然口语:

prompt = '[oral_2][laugh_0][break_6]'
temperature = 0.3
top_P = 0.7
top_K = 20

示例语音试听(上传你的语音效果)

GitHub地址: 效果展示


总结

在本教程中,我们从零开始介绍了:

  1. ✅ ChatTTS模型的安装与环境复现
  2. ✅ speaker_vector的意义与获取方式
  3. ✅ 参数调优技巧,提升语音表现
  4. ✅ 提供完整可运行的代码与音频保存方式

通过上述内容,你就已经基本理解了这个方法,基础用法我也都有展示。如果你能融会贯通,我相信你会很强

Best
Wenhao (楠博万)

你可能感兴趣的:(语言模型,Chattts,大语言模型,AI,人工智能,python,生成)