
[TTS] First Look at OuteTTS

article 2025/2/28 9:50:33

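I. Environment

II. Installation

1. Install outetts

2. Test script

3. Problems encountered

3.1 The ckpt file fails to download

3.2 The OuteTTS-0.1-350M model files fail to download

3.3 Driver problem

3.4 play() problem

4. Final script

Summary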

I. Environment

Operating system: Red Hat Enterprise Linux release 8.8 (Ootpa)

Python: 3.10.15

II. Installation

1. Install outetts

pip install outetts
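To confirm that the install landed in the interpreter you intend to use, a small check like the one below can help. It is only a sketch: `installed_version` is a hypothetical helper name, and it relies solely on the standard library's `importlib.metadata`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: print(installed_version("outetts"))
```

If this returns None for "outetts", pip most likely installed into a different Python environment than the one running the script.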

2. Test script

from outetts.v0_1.interface import InterfaceHF, InterfaceGGUF

# Initialize the interface with the Hugging Face model
interface = InterfaceHF("OuteAI/OuteTTS-0.1-350M")

# Or initialize the interface with a GGUF model
# interface = InterfaceGGUF("path/to/model.gguf")

# Generate TTS output
# Without a speaker reference, the model generates speech with random speaker characteristics
output = interface.generate(
    text="Hello, am I working?",
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=4096
)

# Play the generated audio
output.play()

# Save the generated audio to a file
output.save("output.wav")

3. Problems encountered

3.1 The ckpt file fails to download

Fix: download the file from the URL below and place it under $HOME/.cache/outeai/tts/wavtokenizer_large_speech_75_token/

https://huggingface.co/novateur/WavTokenizer-large-speech-75token/resolve/main/wavtokenizer_large_speech_320_24k.ckpt
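The manual step above can be sketched in Python as follows. This is an illustration under assumptions: the helper names `ckpt_destination` and `fetch_ckpt` are hypothetical, the file is assumed to keep the name it has in the URL, and the download only works from a machine that can actually reach huggingface.co (otherwise fetch it elsewhere and copy it into the same directory).

```python
from pathlib import Path
import urllib.request

# URL and cache directory as given in the text above.
CKPT_URL = (
    "https://huggingface.co/novateur/WavTokenizer-large-speech-75token/"
    "resolve/main/wavtokenizer_large_speech_320_24k.ckpt"
)
CACHE_SUBDIR = Path(".cache/outeai/tts/wavtokenizer_large_speech_75_token")


def ckpt_destination(home=None):
    """Build the target path; assumes the file keeps its name from the URL."""
    home = Path(home) if home is not None else Path.home()
    filename = CKPT_URL.rsplit("/", 1)[-1]
    return home / CACHE_SUBDIR / filename


def fetch_ckpt(home=None):
    """Download the checkpoint unless it is already in place; return its path."""
    dest = ckpt_destination(home)
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer
    # does not leave a truncated file at the final location.
    tmp = dest.with_suffix(dest.suffix + ".part")
    urllib.request.urlretrieve(CKPT_URL, tmp)
    tmp.replace(dest)
    return dest
```

Calling `fetch_ckpt()` with no arguments targets the current user's home directory; nothing is downloaded until the function is called.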

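3.2 The OuteTTS-0.1-350M model files fail to download

Fix: download the model files offline, place them in the same directory as the script, change the script to load the model from that local path, and set the following environment variables before running it: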

export TRANSFORMERS_OFFLINE=1

export HF_DATASETS_OFFLINE=1
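As an alternative to exporting the variables in the shell, they can be set from Python. The sketch below uses a hypothetical helper, `enable_offline_mode`. To my understanding these variables are read when the Hugging Face libraries are imported, so the helper would need to run before `outetts` is imported.

```python
import os

# The two variables named in the text above.
OFFLINE_VARS = ("TRANSFORMERS_OFFLINE", "HF_DATASETS_OFFLINE")


def enable_offline_mode(env=None):
    """Set the offline flags to "1" in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    for name in OFFLINE_VARS:
        env[name] = "1"
    return env


# Hypothetical usage at the very top of tts_exp.py, before other imports:
# enable_offline_mode()
```

Passing a plain dict instead of `os.environ` makes it easy to inspect what would be set without touching the real environment.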

Download every file from this URL:

https://huggingface.co/OuteAI/OuteTTS-0.1-350M/

├── OuteTTS-0.1-350M
│   ├── config.json
│   ├── generation_config.json
│   ├── gitattributes
│   ├── model.safetensors
│   ├── README.md
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── tokenizer.json
└── tts_exp.py
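Before running the script offline, it can save time to check that the local directory is complete. The sketch below compares the directory against the file list in the tree above; `missing_model_files` is a hypothetical helper, and the list deliberately leaves out README.md and gitattributes on the assumption that they are not needed to load the model.

```python
from pathlib import Path

# File names taken from the directory listing above
# (documentation and git metadata omitted by assumption).
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "model.safetensors",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]


def missing_model_files(model_dir):
    """Return the expected file names that are not present in `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


# Hypothetical usage from the script's directory:
# print(missing_model_files("./OuteTTS-0.1-350M"))
```

An empty list means every expected file was found; any names returned are the ones still to download.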

3.3 Driver problem

Error at runtime: OSError: PortAudio library not found

Fix: install portaudio-devel

yum install portaudio-devel
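To check whether the shared library is now discoverable, a lookup like the following may be useful. It is a sketch with a hypothetical helper name, and it assumes the audio backend locates PortAudio through the system library search, which is what `ctypes.util.find_library` does on Linux.

```python
import ctypes.util


def portaudio_available():
    """Return True if a PortAudio shared library can be located on this system."""
    return ctypes.util.find_library("portaudio") is not None


# Hypothetical usage: print(portaudio_available())
```

If this still returns False after the install, the library cache may need refreshing or the package may have been installed for a different architecture.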

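3.4 play() problem

Error at runtime: sounddevice.PortAudioError: Error querying device -1

Fix: comment out output.play(). The error most likely means PortAudio found no default audio output device, which is typical of a server without a sound card, so the audio is only saved to a file.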
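Instead of commenting the call out, playback can be made conditional. The sketch below is an assumption-laden illustration: `default_output_available` and `save_then_maybe_play` are hypothetical helpers, and `output` is only assumed to expose the `save()` and `play()` methods used in the test script. It relies on `sounddevice.query_devices(kind="output")`, which raises when no default output device exists.

```python
def default_output_available():
    """Return True only if sounddevice imports and reports a default output device."""
    try:
        import sounddevice as sd  # raises OSError when PortAudio is missing
        sd.query_devices(kind="output")  # raises when there is no default device
    except Exception:
        return False
    return True


def save_then_maybe_play(output, path, can_play=None):
    """Always save the audio; play it only when an output device is usable."""
    output.save(path)
    if can_play is None:
        can_play = default_output_available()
    if can_play:
        output.play()
    return can_play


# Hypothetical usage in tts_exp.py:
# save_then_maybe_play(output, "output.wav")
```

Saving before playing means the WAV file is written even if playback later fails for some other reason.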

4. Final script

export TRANSFORMERS_OFFLINE=1

export HF_DATASETS_OFFLINE=1

python tts_exp.py

from outetts.v0_1.interface import InterfaceHF, InterfaceGGUF

# Initialize the interface with the Hugging Face model
# interface = InterfaceHF("OuteAI/OuteTTS-0.1-350M")
interface = InterfaceHF("./OuteTTS-0.1-350M")

# Or initialize the interface with a GGUF model
# interface = InterfaceGGUF("/data/tts/OuteTTS-0.1-350M-FP16.gguf")

# Generate TTS output
# Without a speaker reference, the model generates speech with random speaker characteristics
output = interface.generate(
    text="Hello, am I working?",
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=4096
)

# Play the generated audio
# output.play()

# Save the generated audio to a file
output.save("output.wav")