
Artificial Intelligence: Setting Up and Using the PaddleSpeech Speech Recognition Toolkit

article 2025/1/16 16:43:32

Introduction to PaddleSpeech

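PaddleSpeech is one of the projects in Baidu's open-source PaddlePaddle deep learning platform. Built on PaddlePaddle's speech model library, it is used to develop a range of key speech and audio tasks and includes many cutting-edge, influential deep learning models. PaddleSpeech ships application examples for speech recognition, speech translation (English to Chinese), speech synthesis, punctuation restoration, and more.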

Installing PaddleSpeech

There are two quick ways to install PaddleSpeech: with pip, or by building from source (the officially recommended method).

Installing with pip

$ pip install pytest-runner
$ pip3 install paddleaudio==1.0.1
$ pip3 install paddlespeech==1.0.1
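After the pip install, a quick way to confirm which versions actually landed in the environment is to read the installed package metadata. This is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+; on Python 3.7 the `importlib_metadata` backport offers the same API). The helper name `installed_version` is hypothetical, and the expected version 1.0.1 is simply the one pinned above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The versions pinned by the pip commands above.
    pinned = {"paddleaudio": "1.0.1", "paddlespeech": "1.0.1"}
    for name, expected in pinned.items():
        found = installed_version(name)
        status = "OK" if found == expected else "MISMATCH"
        print(f"{name}: expected {expected}, found {found} -> {status}")
```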

Installing from source

$ git clone https://github.com/PaddlePaddle/PaddleSpeech.git
$ cd PaddleSpeech
$ pip install pytest-runner
$ pip install .
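A source install should also put a `paddlespeech` command on your PATH. The sketch below, assuming a standard pip entry-point install, only checks that the executable can be found; `find_cli` is a hypothetical helper built on the standard library's `shutil.which`.

```python
import shutil
import sys
from typing import Optional


def find_cli(name: str = "paddlespeech") -> Optional[str]:
    """Return the absolute path of an executable on PATH, or None if not found."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_cli()
    if path is None:
        # Often means the pip scripts directory is not on PATH.
        sys.exit("paddlespeech not found on PATH; check the pip install output")
    print("paddlespeech CLI found at:", path)
```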

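Tip: the installation may fail because of various missing libraries, for example the system libraries that librosa depends on, gcc toolchain problems, or Kaldi installation issues. Solutions for these are usually easy to find online.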

Downloading sample audio

$ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
$ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
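Before feeding your own recordings to the models, it can help to compare their format with the sample files. This sketch reads the WAV header with the standard-library `wave` module. The 16 kHz mono target is an assumption (it is, as far as I know, the CLI's default sample rate for the pretrained ASR models), and `inspect_wav` / `WavInfo` are hypothetical names.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    sample_rate: int
    channels: int
    sample_width: int  # bytes per sample
    duration_s: float


def inspect_wav(path: str) -> WavInfo:
    """Read the header of a PCM WAV file with the standard-library wave module."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return WavInfo(rate, wf.getnchannels(), wf.getsampwidth(), frames / rate)


if __name__ == "__main__":
    for name in ("zh.wav", "en.wav"):
        info = inspect_wav(name)
        # Assumption: the pretrained ASR models expect 16 kHz mono input.
        ok = info.sample_rate == 16000 and info.channels == 1
        print(name, info, "looks ASR-ready" if ok else "may need resampling")
```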

Basic usage

Speech synthesis

$ paddlespeech tts --input "你好，欢迎使用百度飞桨深度学习框架！" --output output.wav
$ paddlespeech tts --input "你好税软" --output sr.wav
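To synthesize several sentences in one go, the same CLI call can be driven from Python. This is a sketch that shells out to the `paddlespeech tts` command exactly as shown above; `tts_cmd`, `synthesize_all`, and the `tts_0.wav`, `tts_1.wav` naming scheme are hypothetical.

```python
import subprocess
from typing import List


def tts_cmd(text: str, output: str) -> List[str]:
    """Build the same command line as `paddlespeech tts --input ... --output ...`."""
    return ["paddlespeech", "tts", "--input", text, "--output", output]


def synthesize_all(sentences: List[str], prefix: str = "tts") -> List[str]:
    """Synthesize each sentence to prefix_0.wav, prefix_1.wav, ... via the CLI."""
    outputs = []
    for i, text in enumerate(sentences):
        out = f"{prefix}_{i}.wav"
        # Passing an argument list (no shell=True) avoids quoting problems
        # with Chinese punctuation or spaces inside the text.
        subprocess.run(tts_cmd(text, out), check=True)
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    synthesize_all(["你好，欢迎使用百度飞桨深度学习框架！", "你好税软"])
```

Each call starts a new process and reloads the model. For larger batches, the Python API described in the PaddleSpeech README (`TTSExecutor` in `paddlespeech.cli.tts.infer`) keeps the model loaded between calls.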

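If this fails, pin NumPy to 1.23.0 (NumPy 1.24 removed the `np.complex` alias that older librosa releases still use, as the deprecation warning in the log below shows) and install the libsndfile system library: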

$ pip install numpy==1.23.0
$ sudo apt-get install libsndfile1
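To check whether these two fixes are in place without rerunning synthesis, the following sketch asks the dynamic loader for libsndfile and reads the installed NumPy version. It uses only `ctypes.util.find_library` and `importlib.metadata` from the standard library; the helper names are hypothetical.

```python
import ctypes.util
from importlib import metadata
from typing import Optional


def libsndfile_name() -> Optional[str]:
    """Return the libsndfile library name (e.g. 'libsndfile.so.1') if the loader finds it."""
    return ctypes.util.find_library("sndfile")


def numpy_version() -> Optional[str]:
    """Return the installed NumPy version, or None if NumPy is missing."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("libsndfile:", libsndfile_name() or "NOT FOUND (try: apt-get install libsndfile1)")
    print("numpy:", numpy_version())
```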

Sample run

$ paddlespeech tts --input "你好，欢迎使用百度飞桨深度学习框架！" --output output.wav
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python3.7/dist-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype=np.complex,
100%|██████████| 489M/489M [01:01<00:00, 7.96MB/s]
100%|██████████| 915M/915M [01:51<00:00, 8.22MB/s]
100%|██████████| 589M/589M [01:01<00:00, 9.57MB/s]
100%|██████████| 107k/107k [00:00<00:00, 1.33MB/s]
W0606 13:22:41.408085  2451 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.7, Runtime API Version: 11.7
W0606 13:22:41.412684  2451 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
/paddle/PaddleSpeech/output.wav

Speech recognition

Recognizing Chinese

$ paddlespeech asr --lang zh --input zh.wav

Recognizing English with a specific model

$ paddlespeech asr --lang en --model deepspeech2offline_librispeech --input en.wav
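Both recognition commands can be wrapped so the transcript is available as a Python string. This sketch uses only the `--lang`, `--model`, and `--input` flags shown above. It assumes the transcript is printed on stdout; if your version also writes log lines there, you will need to filter them. `asr_cmd` and `transcribe` are hypothetical names.

```python
import subprocess
from typing import List, Optional


def asr_cmd(audio: str, lang: str = "zh", model: Optional[str] = None) -> List[str]:
    """Build a `paddlespeech asr` command line from the flags used in this article."""
    cmd = ["paddlespeech", "asr", "--lang", lang]
    if model is not None:
        cmd += ["--model", model]
    cmd += ["--input", audio]
    return cmd


def transcribe(audio: str, lang: str = "zh", model: Optional[str] = None) -> str:
    """Run the CLI and return its stdout (assumed to contain the transcript)."""
    proc = subprocess.run(asr_cmd(audio, lang, model),
                          capture_output=True, text=True, check=True)
    return proc.stdout.strip()


if __name__ == "__main__":
    print(transcribe("zh.wav"))
    print(transcribe("en.wav", lang="en", model="deepspeech2offline_librispeech"))
```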

Punctuation restoration

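Restores punctuation in unpunctuated text and can be used together with an ASR model, whose raw output has no punctuation.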

$ paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
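Chaining recognition and punctuation restoration means passing the ASR transcript as the `--input` of the `text --task punc` command. This is a sketch of that pipeline through the CLI. It assumes that each command prints its result as the last non-empty line of stdout, which is an assumption about the output format rather than documented behavior; all helper names are hypothetical.

```python
import subprocess
from typing import List


def punc_cmd(text: str) -> List[str]:
    """Build `paddlespeech text --task punc --input <text>`."""
    return ["paddlespeech", "text", "--task", "punc", "--input", text]


def last_line(output: str) -> str:
    """Return the last non-empty line of CLI output (assumed to be the result)."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def run(cmd: List[str]) -> str:
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return last_line(proc.stdout)


def asr_with_punc(audio: str) -> str:
    """Transcribe Chinese audio, then restore punctuation in the transcript."""
    raw = run(["paddlespeech", "asr", "--lang", "zh", "--input", audio])
    return run(punc_cmd(raw))


if __name__ == "__main__":
    print(asr_with_punc("zh.wav"))
```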

Sound classification

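An open-domain sound classification tool suited to many scenarios, built on a model that classifies sounds into the 527 categories of the AudioSet dataset.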

$ paddlespeech cls --input zh.wav
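To classify a whole folder of recordings, you can loop over its WAV files and call the command above once per file. This is a sketch; `classify_folder` and the non-recursive `*.wav` glob are assumptions, and the raw stdout is stored as-is because the exact output format is not described here.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def cls_cmd(audio: str) -> List[str]:
    """Build `paddlespeech cls --input <audio>`."""
    return ["paddlespeech", "cls", "--input", audio]


def wav_files(folder: str) -> List[str]:
    """Sorted list of .wav files directly inside a folder (non-recursive)."""
    return sorted(str(p) for p in Path(folder).glob("*.wav"))


def classify_folder(folder: str) -> Dict[str, str]:
    """Run sound classification on every .wav file and keep the raw CLI output."""
    results = {}
    for path in wav_files(folder):
        proc = subprocess.run(cls_cmd(path), capture_output=True, text=True, check=True)
        results[path] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    for path, label in classify_folder(".").items():
        print(path, "->", label)
```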

Speaker embedding (voiceprint) extraction

An industrial-grade tool for extracting speaker embeddings (voiceprints).

$ paddlespeech vector --task spk --input zh.wav
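A speaker embedding is a fixed-length vector, and a common way to decide whether two recordings come from the same speaker is to compare their embeddings with cosine similarity against a threshold. This plain-Python sketch assumes you already have the two embeddings as lists of floats (for example, extracted with the command above). How you obtain them and the 0.7 threshold are assumptions, not PaddleSpeech defaults.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.7) -> bool:
    # Hypothetical threshold: tune it on your own enrollment/test recordings.
    return cosine_similarity(a, b) >= threshold
```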

Speech translation

An end-to-end English-to-Chinese speech translation tool. Because it uses precompiled Kaldi-related tools, it can only be tried on Ubuntu.

$ paddlespeech st --input en.wav
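Since speech translation is limited to Ubuntu, a script can check the OS before calling it. This sketch reads the standard `/etc/os-release` file and looks for `ID=ubuntu`. The helper names are hypothetical, and treating `ID=ubuntu` as the only supported value is a simplification of the article's "Ubuntu only" statement.

```python
import platform
import subprocess
from typing import Dict, List


def st_cmd(audio: str) -> List[str]:
    """Build `paddlespeech st --input <audio>`."""
    return ["paddlespeech", "st", "--input", audio]


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from /etc/os-release, stripping surrounding quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def running_on_ubuntu(path: str = "/etc/os-release") -> bool:
    """True only on Linux systems whose os-release ID is 'ubuntu'."""
    if platform.system() != "Linux":
        return False
    try:
        with open(path, encoding="utf-8") as f:
            return parse_os_release(f.read()).get("ID") == "ubuntu"
    except OSError:
        return False


if __name__ == "__main__":
    if not running_on_ubuntu():
        raise SystemExit("paddlespeech st is only supported on Ubuntu")
    subprocess.run(st_cmd("en.wav"), check=True)
```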
