[TTS] First Look at OuteTTS
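Contents
I. Environment
II. Installation
1. Install outetts
2. Test script
3. Problems encountered
3.1 ckpt file download fails
3.2 OuteTTS-0.1-350M model file download fails
3.3 Driver problem
3.4 Playback problem
4. Final script
Summary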
I. Environment
Operating system: Red Hat Enterprise Linux release 8.8 (Ootpa)
Python: 3.10.15
II. Installation
1. Install outetts
pip install outetts
2. Test script
from outetts.v0_1.interface import InterfaceHF, InterfaceGGUF
# Initialize the interface with the Hugging Face model
interface = InterfaceHF("OuteAI/OuteTTS-0.1-350M")
# Or initialize the interface with a GGUF model
# interface = InterfaceGGUF("path/to/model.gguf")
# Generate TTS output
# Without a speaker reference, the model generates speech with random speaker characteristics
output = interface.generate(
    text="Hello, am I working?",
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=4096
)
# Play the generated audio
output.play()
# Save the generated audio to a file
output.save("output.wav")
3. Problems encountered
3.1 ckpt file download fails
Fix: download the file manually from the URL below and place it under $HOME/.cache/outeai/tts/wavtokenizer_large_speech_75_token/
https://huggingface.co/novateur/WavTokenizer-large-speech-75token/resolve/main/wavtokenizer_large_speech_320_24k.ckpt
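The manual step above can also be scripted. The following is a minimal sketch, not part of outetts: `checkpoint_target` and `fetch_checkpoint` are hypothetical helper names, and the sketch assumes the file should keep its original name inside the cache directory. It has to run on a machine that can actually reach huggingface.co.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL and cache directory are the ones given in the fix above.
CKPT_URL = (
    "https://huggingface.co/novateur/WavTokenizer-large-speech-75token"
    "/resolve/main/wavtokenizer_large_speech_320_24k.ckpt"
)
CACHE_SUBDIR = ".cache/outeai/tts/wavtokenizer_large_speech_75_token"


def checkpoint_target(home: Path) -> Path:
    """Return where the checkpoint should live under the given home directory."""
    # Assumption: the file keeps the name it has in the URL.
    filename = Path(urlparse(CKPT_URL).path).name
    return home / CACHE_SUBDIR / filename


def fetch_checkpoint(home: Path) -> Path:
    """Download the checkpoint unless it is already present (hypothetical helper)."""
    target = checkpoint_target(home)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(CKPT_URL, str(target))
    return target
```

Calling `fetch_checkpoint(Path.home())` would place the file in the directory named above. If the download has to happen on another machine, `checkpoint_target(Path.home())` still shows where to copy the file.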
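3.2 OuteTTS-0.1-350M model file download fails
Fix: download the model files offline, place them in the same directory as the script, change the script to load the model from the local path, and set these environment variables before running it: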
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1
Download every file from this URL:
https://huggingface.co/OuteAI/OuteTTS-0.1-350M/
├── OuteTTS-0.1-350M
│ ├── config.json
│ ├── generation_config.json
│ ├── gitattributes
│ ├── model.safetensors
│ ├── README.md
│ ├── special_tokens_map.json
│ ├── tokenizer_config.json
│ └── tokenizer.json
└── tts_exp.py
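Before running in offline mode, it can help to confirm that the local directory is complete, since a missing file would otherwise trigger another download attempt. A small sketch follows; `missing_model_files` is a hypothetical helper, and leaving README.md and gitattributes out of the check is an assumption that they are not needed to load the model.

```python
from pathlib import Path

# File names taken from the directory listing above.
# Assumption: README.md and gitattributes are not required for loading.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "model.safetensors",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_model_files("./OuteTTS-0.1-350M")` returns an empty list when every listed file is in place.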
3.3 Driver problem
Running the script fails with: OSError: PortAudio library not found
Fix: install portaudio-devel
yum install portaudio-devel
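To check whether the install took effect without running the whole TTS script, one option is to ask the dynamic loader directly. This is only a sketch: `portaudio_available` is a hypothetical helper, and it assumes the audio backend locates PortAudio through the standard system library search.

```python
import ctypes.util


def portaudio_available() -> bool:
    """Return True if the system library search can locate PortAudio."""
    # find_library returns a library name such as "libportaudio.so.2",
    # or None when nothing is found.
    return ctypes.util.find_library("portaudio") is not None


if __name__ == "__main__":
    print("PortAudio found:", portaudio_available())
```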
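3.4 Playback problem
Running the script fails with: sounddevice.PortAudioError: Error querying device -1
Fix: comment out output.play(). This error usually means no default audio device is available, as on a headless server; the audio is still written to output.wav.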
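An alternative to commenting the call out is to attempt playback and fall back to saving only. The sketch below assumes nothing about outetts beyond the `play()` method used in the test script; `try_play` is a hypothetical helper, and it catches `Exception` broadly on purpose so it does not depend on importing sounddevice.

```python
def try_play(output) -> bool:
    """Try to play the audio; return False instead of crashing if playback fails."""
    try:
        output.play()
        return True
    except Exception as exc:  # broad on purpose: covers PortAudioError as well
        print(f"Skipping playback: {exc}")
        return False
```

In the script, `try_play(output)` would replace the bare `output.play()` line, so the same file works both on a desktop and on a server without a sound card.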
4. Final script
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1
python tts_exp.py
from outetts.v0_1.interface import InterfaceHF, InterfaceGGUF
# Initialize the interface with the Hugging Face model
# interface = InterfaceHF("OuteAI/OuteTTS-0.1-350M")
interface = InterfaceHF("./OuteTTS-0.1-350M")
# Or initialize the interface with a GGUF model
# interface = InterfaceGGUF("/data/tts/OuteTTS-0.1-350M-FP16.gguf")
# Generate TTS output
# Without a speaker reference, the model generates speech with random speaker characteristics
output = interface.generate(
    text="Hello, am I working?",
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=4096
)
# Play the generated audio
# output.play()
# Save the generated audio to a file
output.save("output.wav")
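The two exported variables can also be set from inside the script, which avoids forgetting them in a new shell. This is a sketch under one assumption: the variables are read when the libraries are imported, so the call has to come before the `from outetts...` import line. `enable_offline_mode` is a hypothetical helper.

```python
import os


def enable_offline_mode() -> None:
    """Set the same offline flags as the export commands above."""
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    os.environ["HF_DATASETS_OFFLINE"] = "1"


# Call this first, then import outetts (assumption: the flags are read at import time).
enable_offline_mode()
```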
Summary
Deployment is simple, but Chinese is not supported. Let's wait and see how it develops.