
flow-matching based TTS : VoiceBox, E2-TTS, maskGCT

article 2025/2/28 15:41:01

Contents

VoiceBox
- abstract
- method
E2 TTS
- method
- extension of method
- results

VoiceBox

Meta
2023.10
demo page

abstract


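Supports a range of generation tasks, including audio editing, denoising, zero-shot TTS, and style transfer.
Compared with VALL-E, WER is lower (5.9% for VALL-E vs 1.9% for VoiceBox), speaker similarity is at least comparable (0.580 vs 0.681), and inference is up to 20x faster.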

method


Uses MFA (Montreal Forced Aligner) together with G2P to obtain frame-level phones that are aligned with the mel spectrogram.
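To make the frame-level alignment concrete, here is a minimal sketch of expanding phone durations into one phone label per mel frame. The function name `phones_to_frames`, the millisecond durations, and the 10 ms hop are assumptions for illustration, not values taken from the VoiceBox paper; in practice the durations would come from the aligner's output.

```python
def phones_to_frames(phones, durations_ms, hop_ms=10):
    """Repeat each phone once per mel frame covered by its aligned duration."""
    frame_phones = []
    end_ms = 0
    for phone, dur in zip(phones, durations_ms):
        end_ms += dur
        # Round the cumulative end time so rounding errors do not accumulate
        # across phones; the total frame count stays tied to the total time.
        n_frames_so_far = round(end_ms / hop_ms)
        frame_phones.extend([phone] * (n_frames_so_far - len(frame_phones)))
    return frame_phones


# Toy alignment (hypothetical durations): four phones covering 200 ms.
frames = phones_to_frames(["HH", "AH", "L", "OW"], [30, 50, 20, 100])
print(len(frames), frames[:4])
```

The resulting list has the same length as the mel sequence, so each mel frame can be paired with exactly one phone label.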

E2 TTS

Microsoft
2024.9

method


stage	condition_1	condition_2	target
train	[text, filler tokens], padded to the same length as the mel	masked mel	predict the masked mel frames
infer	[prompt_text, target_text, filler tokens]	prompt_mel	predict the mel for target_text
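The table above can be sketched as plain data preparation code. This is an illustration under assumptions, not the authors' implementation: the filler symbol `<F>`, the function names, the uniform random mask span, and the caller-supplied `n_target_frames` are all hypothetical, and mel frames are represented as plain lists of floats.

```python
import random

FILLER = "<F>"  # hypothetical filler-token symbol, not taken from the paper


def pad_text_to_mel(chars, n_frames):
    """Append filler tokens so the character sequence has n_frames entries."""
    chars = list(chars)
    if len(chars) > n_frames:
        raise ValueError("text is longer than the mel sequence")
    return chars + [FILLER] * (n_frames - len(chars))


def make_training_example(text, mel, rng):
    """Mask a random contiguous span of mel frames for the model to fill in."""
    n = len(mel)
    start = rng.randrange(n)               # first masked frame
    end = rng.randrange(start + 1, n + 1)  # one past the last masked frame
    mask = [start <= i < end for i in range(n)]
    masked_mel = [[0.0] * len(f) if m else f for f, m in zip(mel, mask)]
    return {
        "text": pad_text_to_mel(text, n),  # condition_1
        "masked_mel": masked_mel,          # condition_2
        "mask": mask,                      # True where frames must be predicted
        "target": mel,                     # ground truth for the masked frames
    }


def make_inference_input(prompt_text, target_text, prompt_mel, n_target_frames):
    """The prompt mel is visible context; every target frame is masked."""
    n = len(prompt_mel) + n_target_frames
    dim = len(prompt_mel[0])
    mel = prompt_mel + [[0.0] * dim for _ in range(n_target_frames)]
    mask = [i >= len(prompt_mel) for i in range(n)]
    text = pad_text_to_mel(prompt_text + " " + target_text, n)
    return {"text": text, "masked_mel": mel, "mask": mask}


rng = random.Random(0)
mel = [[float(i), float(i)] for i in range(12)]  # toy 12-frame, 2-bin "mel"
example = make_training_example("hi there", mel, rng)
print(len(example["text"]), sum(example["mask"]))
```

Note that no phone-level alignment is needed here: the text is only padded to the mel length, and the model is left to learn where each character falls in time.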

extension of method


motivation: remove the need to transcribe the prompt audio.

stage	condition_1	condition_2	target
train	[text of the masked region, filler tokens], padded to the same length as the mel	prompt mel	predict the mel for the given text
infer	[target_text, filler tokens]	prompt_mel	predict the mel for target_text
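A minimal sketch of the training-side change in this extension: only the text belonging to the masked region is given as the condition, while the unmasked mel acts as an untranscribed prompt. It assumes word-level frame spans are available (for example from a forced aligner) and that the mask is snapped to word boundaries; both are assumptions of this sketch, as are the function name and the `<F>` filler symbol.

```python
FILLER = "<F>"  # hypothetical filler-token symbol


def condition_for_masked_words(words, first, last, n_frames):
    """Build the text condition and mel mask for words[first..last].

    words: list of (word, start_frame, end_frame) tuples in time order.
    Returns (text_condition, mask), both of length n_frames.
    """
    selected = words[first:last + 1]
    mask_start = selected[0][1]
    mask_end = selected[-1][2]
    # Only the words inside the masked region appear in the text condition.
    chars = list(" ".join(word for word, _, _ in selected))
    text_condition = chars + [FILLER] * (n_frames - len(chars))
    # Frames outside the mask stay visible and play the role of the prompt.
    mask = [mask_start <= i < mask_end for i in range(n_frames)]
    return text_condition, mask


# Toy utterance (hypothetical spans): 100 frames, three aligned words.
words = [("hello", 0, 40), ("big", 40, 60), ("world", 60, 100)]
text_condition, mask = condition_for_masked_words(words, 1, 2, 100)
print("".join(text_condition[:9]), sum(mask))
```

At inference the same layout applies with the roles fixed: the prompt mel is the visible context and the text condition contains only the target text.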


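motivation: because the input is character-based, the pronunciation of some words needs explicit control.
During training, some words are randomly replaced with their G2P output.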
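The random replacement can be sketched as follows. The toy `g2p` dictionary, the replacement probability, and the parentheses used to mark a phoneme span are all assumptions for illustration; a real system would call an actual G2P tool and pick its own marker and rate.

```python
import random


def mix_in_phonemes(words, g2p, replace_prob, rng):
    """Randomly replace some words with their phoneme sequence.

    g2p: dict mapping a word to a list of phoneme strings (toy lookup here).
    Replaced words are wrapped in parentheses so that characters and
    phonemes remain distinguishable in the mixed input.
    """
    mixed = []
    for word in words:
        if word in g2p and rng.random() < replace_prob:
            mixed.append("(" + " ".join(g2p[word]) + ")")
        else:
            mixed.append(word)
    return mixed


# Hypothetical pronunciations for a two-word example.
toy_g2p = {"read": ["R", "EH", "D"], "book": ["B", "UH", "K"]}
rng = random.Random(0)
print(mix_in_phonemes(["i", "read", "a", "book"], toy_g2p, 0.5, rng))
```

Because the model sees both spellings and phoneme spans during training, a user can later force a specific pronunciation by writing the phoneme form for just the word in question.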

results


