
Real-Time Speech-to-Text Recognition on Huawei HarmonyOS

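Scenario

Convert a piece of audio (up to 60 s in short-speech mode, up to 8 h in long-speech mode) into text. The audio can be a PCM audio file or real-time speech.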
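To check up front whether a PCM file fits within those limits, its duration can be derived from its size. The following is a minimal sketch, not part of Core Speech Kit: the names `PcmFormat`, `pcmDurationSeconds`, and `fitsMode` are hypothetical, and the only values taken from the text are the 60 s and 8 h limits.

```typescript
// Hypothetical helper types and functions; none of these come from Core Speech Kit.
interface PcmFormat {
  sampleRate: number; // samples per second, e.g. 16000
  channels: number;   // 1 = mono
  sampleBits: number; // bits per sample, e.g. 16
}

const SHORT_MODE_LIMIT_S = 60;         // short-speech mode: at most 60 s
const LONG_MODE_LIMIT_S = 8 * 60 * 60; // long-speech mode: at most 8 h

// Duration of raw PCM data = bytes / (bytes produced per second).
function pcmDurationSeconds(byteLength: number, fmt: PcmFormat): number {
  const bytesPerSecond = fmt.sampleRate * fmt.channels * (fmt.sampleBits / 8);
  return byteLength / bytesPerSecond;
}

// True when the audio is no longer than the limit of the chosen mode.
function fitsMode(byteLength: number, fmt: PcmFormat, mode: 'short' | 'long'): boolean {
  const limit = mode === 'short' ? SHORT_MODE_LIMIT_S : LONG_MODE_LIMIT_S;
  return pcmDurationSeconds(byteLength, fmt) <= limit;
}
```

With the 16 kHz, 16-bit, mono format used throughout this article, one second of audio is 32000 bytes, so a short-mode file can hold at most 1920000 bytes.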

Development Steps

  1. Before using speech recognition, import the speech recognition classes into your project.

      import { speechRecognizer } from '@kit.CoreSpeechKit';
      import { BusinessError } from '@kit.BasicServicesKit';

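  2. Call the createEngine method to initialize the engine and create a SpeechRecognitionEngine instance.

    createEngine can be called in two forms. One of them is shown here; see the API reference for the other.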

      let asrEngine: speechRecognizer.SpeechRecognitionEngine;
      let sessionId: string = '123456';
      // Create the engine; the result is returned through a callback
      // Set the engine creation parameters
      let extraParam: Record<string, Object> = {"locate": "CN", "recognizerMode": "short"};
      let initParamsInfo: speechRecognizer.CreateEngineParams = {
        language: 'zh-CN',
        online: 1,
        extraParams: extraParam
      };
      // Call createEngine
      speechRecognizer.createEngine(initParamsInfo, (err: BusinessError, speechRecognitionEngine: speechRecognizer.SpeechRecognitionEngine) => {
        if (!err) {
          console.info('Succeeded in creating engine.');
          // Keep the created engine instance
          asrEngine = speechRecognitionEngine;
        } else {
          // Error code 1002200008: the engine cannot be created because it is being destroyed
          console.error(`Failed to create engine. Code: ${err.code}, message: ${err.message}.`);
        }
      });

  3. After obtaining the SpeechRecognitionEngine instance, create a RecognitionListener object and call setListener to register the callbacks that receive speech recognition events.

      // Create the listener object
      let setListener: speechRecognizer.RecognitionListener = {
        // Called when recognition starts successfully
        onStart(sessionId: string, eventMessage: string) {
          console.info(`onStart, sessionId: ${sessionId} eventMessage: ${eventMessage}`);
        },
        // Event callback
        onEvent(sessionId: string, eventCode: number, eventMessage: string) {
          console.info(`onEvent, sessionId: ${sessionId} eventCode: ${eventCode} eventMessage: ${eventMessage}`);
        },
        // Recognition result callback; receives both intermediate and final results
        onResult(sessionId: string, result: speechRecognizer.SpeechRecognitionResult) {
          console.info(`onResult, sessionId: ${sessionId} result: ${JSON.stringify(result)}`);
        },
        // Called when recognition completes
        onComplete(sessionId: string, eventMessage: string) {
          console.info(`onComplete, sessionId: ${sessionId} eventMessage: ${eventMessage}`);
        },
        // Error callback; error codes are returned through this method
        // For example, error code 1002200006 means the recognition engine is busy because recognition is in progress
        // For more error codes, see the error code reference
        onError(sessionId: string, errorCode: number, errorMessage: string) {
          console.error(`onError, sessionId: ${sessionId} errorCode: ${errorCode} errorMessage: ${errorMessage}`);
        }
      }
      // Register the listener
      asrEngine.setListener(setListener);

  4. Set the start parameters for audio-file-to-text and microphone-to-text recognition respectively, then call startListening to start recognition.

      // Start recognition
      private startListeningForWriteAudio() {
        // Set the start parameters
        let recognizerParams: speechRecognizer.StartParams = {
          sessionId: this.sessionId,
          audioInfo: { audioType: 'pcm', sampleRate: 16000, soundChannel: 1, sampleBit: 16 } // For audioInfo settings, see AudioInfo
        }
        // Call startListening
        asrEngine.startListening(recognizerParams);
      };
      private startListeningForRecording() {
        let audioParam: speechRecognizer.AudioInfo = { audioType: 'pcm', sampleRate: 16000, soundChannel: 1, sampleBit: 16 }
        let extraParam: Record<string, Object> = {
          "recognitionMode": 0,
          "vadBegin": 2000,
          "vadEnd": 3000,
          "maxAudioDuration": 20000
        }
        let recognizerParams: speechRecognizer.StartParams = {
          sessionId: this.sessionId,
          audioInfo: audioParam,
          extraParams: extraParam
        }
        console.info('startListening start');
        asrEngine.startListening(recognizerParams);
      };

  5. Pass in the audio stream by calling the writeAudio method. To read from an audio file, prepare a PCM audio file in advance.

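      let uint8Array: Uint8Array = new Uint8Array();
      // The audio stream can be obtained in two ways: 1. from a recording; 2. from an audio file
      // For reading the audio stream from an audio file, see the demo
      // Write the audio stream; only stream lengths of 640 or 1280 are supported
      asrEngine.writeAudio(sessionId, uint8Array);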

  6. (Optional) To query the languages supported by the speech recognition service, call the listLanguages method.

    listLanguages can be called in two forms. One of them is shown here; see the API reference for the other.

    // Set the query parameters
    let languageQuery: speechRecognizer.LanguageQuery = {
      sessionId: sessionId
    };
    // Call listLanguages
    asrEngine.listLanguages(languageQuery).then((res: Array<string>) => {
      console.info(`Succeeded in listing languages, result: ${JSON.stringify(res)}.`);
    }).catch((err: BusinessError) => {
      console.error(`Failed to list languages. Code: ${err.code}, message: ${err.message}.`);
    });

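  7. (Optional) To finish recognition, call the finish method.

      // Finish recognition
      asrEngine.finish(sessionId);

  8. (Optional) To cancel recognition, call the cancel method.

      // Cancel recognition
      asrEngine.cancel(sessionId);

  9. (Optional) To release the speech recognition engine resources, call the shutdown method.

    // Release the recognition engine resources
    asrEngine.shutdown();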

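  10. Add the ohos.permission.MICROPHONE permission to the module.json5 configuration file so that the microphone works properly. For detailed steps, see the section on declaring permissions.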

    
    //...
    "requestPermissions": [
    {
    "name" : "ohos.permission.MICROPHONE",
    "reason": "$string:reason",
    "usedScene": {
    "abilities": [
    "EntryAbility"
    ],
    "when":"inuse"
    }
    }
    ],
    //...
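Step 5 notes that writeAudio only accepts stream lengths of 640 or 1280. The sketch below shows one way to cut an in-memory PCM buffer into pieces of that size before writing them. It is an illustration only: `splitIntoFrames` and `frameDurationMs` are hypothetical names, the lengths are assumed to be in bytes (the data is a Uint8Array), and an incomplete trailing piece is simply dropped, as the file-reading loop in the full example also does.

```typescript
// Hypothetical helpers; they do not call Core Speech Kit.
const FRAME_BYTES = 1280; // one of the two lengths the text says writeAudio supports

// Cuts PCM data into fixed-size frames; an incomplete trailing frame is dropped.
function splitIntoFrames(pcm: Uint8Array, frameBytes: number = FRAME_BYTES): Uint8Array[] {
  const frames: Uint8Array[] = [];
  for (let offset = 0; offset + frameBytes <= pcm.length; offset += frameBytes) {
    // slice() copies, so each frame owns its bytes.
    frames.push(pcm.slice(offset, offset + frameBytes));
  }
  return frames;
}

// Playback time covered by one frame, in milliseconds.
function frameDurationMs(frameBytes: number, sampleRate: number, channels: number, sampleBits: number): number {
  const bytesPerSecond = sampleRate * channels * (sampleBits / 8);
  return (frameBytes * 1000) / bytesPerSecond;
}
```

At 16 kHz, 16-bit, mono, a 1280-byte frame covers 40 ms of audio, which is consistent with the 40 ms pause between writes in the full example. Each returned frame could then be passed to `asrEngine.writeAudio(sessionId, frame)`.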

Development Example

Tap the buttons to convert a piece of audio into text. The index.ets file is as follows:

import { speechRecognizer } from '@kit.CoreSpeechKit';
import { BusinessError } from '@kit.BasicServicesKit';
import { fileIo } from '@kit.CoreFileKit';
import { hilog } from '@kit.PerformanceAnalysisKit';
import AudioCapturer from './AudioCapturer';
const TAG = 'CoreSpeechKitDemo';
let asrEngine: speechRecognizer.SpeechRecognitionEngine;
@Entry
@Component
struct Index {
@State createCount: number = 0;
@State result: boolean = false;
@State voiceInfo: string = "";
@State sessionId: string = "123456";
private mAudioCapturer = new AudioCapturer();
build() {
Column() {
Scroll() {
Column() {
Button() {
Text("CreateEngineByCallback")
.fontColor(Color.White)
.fontSize(20)
}
.type(ButtonType.Capsule)
.backgroundColor("#317AE7")
.width("80%")
.height(50)
.margin(10)
.onClick(() => {
this.createCount++;
hilog.info(0x0000, TAG, `CreateAsrEngine:createCount:${this.createCount}`);
this.createByCallback();
})
Button() {
Text("setListener")
.fontColor(Color.White)
.fontSize(20)
}
.type(ButtonType.Capsule)
.backgroundColor("#317AE7")
.width("80%")
.height(50)
.margin(10)
.onClick(() => {
this.setListener();
})
Button() {
Text("startRecording")
.fontColor(Color.White)
.fontSize(20)
}
.type(ButtonType.Capsule)
.backgroundColor("#317AE7")
.width("80%")
.height(50)
.margin(10)
.onClick(() => {
this.startRecording();
})
Button() {
Text("writeAudio")
.fontColor(Color.White)
.fontSize(20)
}
.type(ButtonType.Capsule)
.backgroundColor("#317AE7")
.width("80%")
.height(50)
.margin(10)
.onClick(() => {
this.writeAudio();
})
Button() {
Text("queryLanguagesCallback")
.fontColor(Color.White)
.fontSize(20)
}
.type(ButtonType.Capsule)
.backgroundColor("#317AE7")
.width("80%")
.height(50)
.margin(10)
.onClick(() => {
this.queryLanguagesCallback();
})
Button() {
Text("finish")
.fontColor(Color.White)
.fontSize(20)
}
.type(ButtonType.Capsule)
.backgroundColor("#317AE7")
.width("80%")
.height(50)
.margin(10)
.onClick(() => {
// Finish recognition
hilog.info(0x0000, TAG, "finish click:-->");
asrEngine.finish(this.sessionId);
})
Button() {
Text("cancel")
.fontColor(Color.White)
.fontSize(20)
}
.type(ButtonType.Capsule)
.backgroundColor("#317AE7")
.width("80%")
.height(50)
.margin(10)
.onClick(() => {
// Cancel recognition
hilog.info(0x0000, TAG, "cancel click:-->");
asrEngine.cancel(this.sessionId);
})
Button() {
Text("shutdown")
.fontColor(Color.White)
.fontSize(20)
}
.type(ButtonType.Capsule)
.backgroundColor("#317AA7")
.width("80%")
.height(50)
.margin(10)
.onClick(() => {
// Release the engine
asrEngine.shutdown();
})
}
.layoutWeight(1)
}
.width('100%')
.height('100%')
}
}
// Create the engine; the result is returned through a callback
private createByCallback() {
// Set the engine creation parameters
let extraParam: Record<string, Object> = {"locate": "CN", "recognizerMode": "short"};
let initParamsInfo: speechRecognizer.CreateEngineParams = {
language: 'zh-CN',
online: 1,
extraParams: extraParam
};
// Call createEngine
speechRecognizer.createEngine(initParamsInfo, (err: BusinessError, speechRecognitionEngine:
speechRecognizer.SpeechRecognitionEngine) => {
if (!err) {
hilog.info(0x0000, TAG, 'Succeeded in creating engine.');
// Keep the created engine instance
asrEngine = speechRecognitionEngine;
} else {
// Error code 1002200001: engine creation failed because the language or mode is not supported, initialization timed out, a resource does not exist, or similar
// Error code 1002200006: the engine is busy, typically when several applications call the speech recognition engine at the same time
// Error code 1002200008: the engine is being destroyed
hilog.error(0x0000, TAG, `Failed to create engine. Code: ${err.code}, message: ${err.message}.`);
}
});
}
// Query the supported languages; the result is returned through a callback
private queryLanguagesCallback() {
// Set the query parameters
let languageQuery: speechRecognizer.LanguageQuery = {
sessionId: '123456'
};
// Call listLanguages
asrEngine.listLanguages(languageQuery, (err: BusinessError, languages: Array<string>) => {
if (!err) {
// Receive the currently supported languages
hilog.info(0x0000, TAG, `Succeeded in listing languages, result: ${JSON.stringify(languages)}`);
} else {
hilog.error(0x0000, TAG, `Failed to list languages. Code: ${err.code}, message: ${err.message}.`);
}
});
};
// Start recognition
private startListeningForWriteAudio() {
// Set the start parameters
let recognizerParams: speechRecognizer.StartParams = {
sessionId: this.sessionId,
audioInfo: { audioType: 'pcm', sampleRate: 16000, soundChannel: 1, sampleBit: 16 } // For audioInfo settings, see AudioInfo
}
// Call startListening
asrEngine.startListening(recognizerParams);
};
private startListeningForRecording() {
let audioParam: speechRecognizer.AudioInfo = { audioType: 'pcm', sampleRate: 16000, soundChannel: 1, sampleBit: 16 }
let extraParam: Record<string, Object> = {
"recognitionMode": 0,
"vadBegin": 2000,
"vadEnd": 3000,
"maxAudioDuration": 20000
}
let recognizerParams: speechRecognizer.StartParams = {
sessionId: this.sessionId,
audioInfo: audioParam,
extraParams: extraParam
}
hilog.info(0x0000, TAG, 'startListening start');
asrEngine.startListening(recognizerParams);
};
// Write the audio stream
private async writeAudio() {
this.startListeningForWriteAudio();
let ctx = getContext(this);
let filenames: string[] = fileIo.listFileSync(ctx.filesDir);
if (filenames.length <= 0) {
hilog.error(0x0000, TAG, 'No audio file found in the application files directory.');
return;
}
let filePath: string = `${ctx.filesDir}/${filenames[0]}`;
let file = fileIo.openSync(filePath, fileIo.OpenMode.READ_WRITE);
try {
let buf: ArrayBuffer = new ArrayBuffer(1280);
let offset: number = 0;
while (1280 == fileIo.readSync(file.fd, buf, {
offset: offset
})) {
let uint8Array: Uint8Array = new Uint8Array(buf);
asrEngine.writeAudio("123456", uint8Array);
await this.countDownLatch(1);
offset = offset + 1280;
}
} catch (err) {
hilog.error(0x0000, TAG, `Failed to read from file. Code: ${err.code}, message: ${err.message}.`);
} finally {
if (null != file) {
fileIo.closeSync(file);
}
}
}
// Microphone speech to text
private async startRecording() {
this.startListeningForRecording();
// Capture audio from the microphone
let data: ArrayBuffer;
hilog.info(0x0000, TAG, 'create capture success');
this.mAudioCapturer.init((dataBuffer: ArrayBuffer) => {
hilog.info(0x0000, TAG, 'start write');
hilog.info(0x0000, TAG, 'ArrayBuffer ' + JSON.stringify(dataBuffer));
data = dataBuffer
let unit8Array: Uint8Array = new Uint8Array(data);
hilog.info(0x0000, TAG, 'ArrayBuffer unit8Array ' + JSON.stringify(unit8Array));
// Write the audio stream, using the same session ID that was passed to startListening
asrEngine.writeAudio(this.sessionId, unit8Array);
});
};
// Wait for `count` intervals of 40 ms
public async countDownLatch(count: number) {
while (count > 0) {
await this.sleep(40);
count--;
}
}
// Sleep
private sleep(ms: number):Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
// Register the listener
private setListener() {
// Create the listener object
let setListener: speechRecognizer.RecognitionListener = {
// Called when recognition starts successfully
onStart(sessionId: string, eventMessage: string) {
hilog.info(0x0000, TAG, `onStart, sessionId: ${sessionId} eventMessage: ${eventMessage}`);
},
// Event callback
onEvent(sessionId: string, eventCode: number, eventMessage: string) {
hilog.info(0x0000, TAG, `onEvent, sessionId: ${sessionId} eventCode: ${eventCode} eventMessage: ${eventMessage}`);
},
// Recognition result callback; receives both intermediate and final results
onResult(sessionId: string, result: speechRecognizer.SpeechRecognitionResult) {
hilog.info(0x0000, TAG, `onResult, sessionId: ${sessionId} result: ${JSON.stringify(result)}`);
},
// Called when recognition completes
onComplete(sessionId: string, eventMessage: string) {
hilog.info(0x0000, TAG, `onComplete, sessionId: ${sessionId} eventMessage: ${eventMessage}`);
},
// Error callback; error codes are returned through this method
// Error code 1002200002: failed to start recognition; triggered when startListening is called repeatedly
// For more error codes, see the error code reference
onError(sessionId: string, errorCode: number, errorMessage: string) {
hilog.error(0x0000, TAG, `onError, sessionId: ${sessionId} errorCode: ${errorCode} errorMessage: ${errorMessage}`);
},
}
// Register the listener
asrEngine.setListener(setListener);
};
}
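The onResult callback receives both intermediate and final results, but the example only logs them. The sketch below shows one possible way to turn that stream of results into a single transcript string. It rests on assumptions: the result object is assumed to carry a `result` string and an `isFinal` flag (check the SpeechRecognitionResult API reference), each intermediate result is assumed to replace the previous one for the same sentence, and `RecognitionResultLike` and `TranscriptBuffer` are hypothetical names defined here so that the code runs on its own.

```typescript
// Assumed shape of a recognition result; verify against the real SpeechRecognitionResult type.
interface RecognitionResultLike {
  isFinal: boolean; // assumed: true when this is the final text of the current sentence
  result: string;   // assumed: the recognized text
}

// Hypothetical helper that merges intermediate and final results into one transcript.
class TranscriptBuffer {
  private committed: string[] = []; // sentences whose final result has arrived
  private pending = '';             // latest intermediate text of the current sentence

  push(r: RecognitionResultLike): void {
    if (r.isFinal) {
      this.committed.push(r.result);
      this.pending = '';
    } else {
      // Assumption: a newer intermediate result supersedes the older one.
      this.pending = r.result;
    }
  }

  text(): string {
    return this.committed.join('') + this.pending;
  }
}
```

Inside onResult, a call such as `buffer.push(result)` followed by assigning `buffer.text()` to the `voiceInfo` state variable would make the recognized text available to the UI.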

Add an AudioCapturer.ts file to obtain the microphone audio stream.


'use strict';
/*
* Copyright (c) Huawei Technologies Co., Ltd. 2023-2023. All rights reserved.
*/

import {audio} from '@kit.AudioKit';
import { hilog } from '@kit.PerformanceAnalysisKit';

const TAG = 'AudioCapturer';

/**
* Audio collector tool
*/
export default class AudioCapturer {
/**
* Collector object
*/
private mAudioCapturer = null;

/**
* Audio Data Callback Method
*/
private mDataCallBack: (data: ArrayBuffer) => void = null;

/**
* Indicates whether recording data can be obtained.
*/
private mCanWrite: boolean = true;

/**
* Audio stream information
*/
private audioStreamInfo = {
samplingRate: audio.AudioSamplingRate.SAMPLE_RATE_16000,
channels: audio.AudioChannel.CHANNEL_1,
sampleFormat: audio.AudioSampleFormat.SAMPLE_FORMAT_S16LE,
encodingType: audio.AudioEncodingType.ENCODING_TYPE_RAW
}

/**
* Audio collector information
*/
private audioCapturerInfo = {
source: audio.SourceType.SOURCE_TYPE_MIC,
capturerFlags: 0
}

/**
* Audio Collector Option Information
*/
private audioCapturerOptions = {
streamInfo: this.audioStreamInfo,
capturerInfo: this.audioCapturerInfo
}

/**
* Initialize
* @param dataCallBack callback that receives each captured audio buffer
*/
public async init(dataCallBack: (data: ArrayBuffer) => void) {
if (null != this.mAudioCapturer) {
hilog.error(0x0000, TAG, 'AudioCapturerUtil already init');
return;
}
this.mDataCallBack = dataCallBack;
this.mAudioCapturer = await audio.createAudioCapturer(this.audioCapturerOptions).catch(error => {
hilog.error(0x0000, TAG, `AudioCapturerUtil init createAudioCapturer failed, code is ${error.code}, message is ${error.message}`);
});
}

/**
* start recording
*/
public async start() {
hilog.info(0x0000, TAG, `AudioCapturerUtil start`);
let stateGroup = [audio.AudioState.STATE_PREPARED, audio.AudioState.STATE_PAUSED, audio.AudioState.STATE_STOPPED];
if (stateGroup.indexOf(this.mAudioCapturer.state) === -1) {
hilog.error(0x0000, TAG, `AudioCapturerUtil start failed`);
return;
}
this.mCanWrite = true;
await this.mAudioCapturer.start();
while (this.mCanWrite) {
let bufferSize = await this.mAudioCapturer.getBufferSize();
let buffer = await this.mAudioCapturer.read(bufferSize, true);
this.mDataCallBack(buffer)
}
}

/**
* stop recording
*/
public async stop() {
if (this.mAudioCapturer.state !== audio.AudioState.STATE_RUNNING && this.mAudioCapturer.state !== audio.AudioState.STATE_PAUSED) {
hilog.error(0x0000, TAG, `AudioCapturerUtil stop Capturer is not running or paused`);
return;
}
this.mCanWrite = false;
await this.mAudioCapturer.stop();
if (this.mAudioCapturer.state === audio.AudioState.STATE_STOPPED) {
hilog.info(0x0000, TAG, `AudioCapturerUtil Capturer stopped`);
} else {
hilog.error(0x0000, TAG, `Capturer stop failed`);
}
}

/**
* release
*/
public async release() {
if (this.mAudioCapturer.state === audio.AudioState.STATE_RELEASED || this.mAudioCapturer.state === audio.AudioState.STATE_NEW) {
hilog.error(0x0000, TAG, `Capturer already released`);
return;
}
await this.mAudioCapturer.release();
// Check the state before dropping the reference; reading .state after setting it to null would throw
if (this.mAudioCapturer.state == audio.AudioState.STATE_RELEASED) {
hilog.info(0x0000, TAG, `Capturer released`);
} else {
hilog.error(0x0000, TAG, `Capturer release failed`);
}
this.mAudioCapturer = null;
}
}
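The capturer above hands over buffers whose size is whatever getBufferSize reports, while the steps earlier state that writeAudio only supports lengths of 640 or 1280. If the two sizes differ, the captured data has to be regrouped first. The following is a minimal sketch of such a regrouping step, under the assumption that the lengths are in bytes. `FrameChunker` is a hypothetical class, not part of Audio Kit or Core Speech Kit.

```typescript
// Hypothetical helper: regroups arbitrarily sized audio buffers into fixed-size frames.
class FrameChunker {
  private readonly frameBytes: number;
  private pending: Uint8Array = new Uint8Array(0); // bytes not yet forming a full frame

  constructor(frameBytes: number = 1280) {
    if (frameBytes <= 0) {
      throw new RangeError('frameBytes must be positive');
    }
    this.frameBytes = frameBytes;
  }

  // Adds one captured buffer and returns every complete frame now available.
  push(bytes: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.pending.length + bytes.length);
    merged.set(this.pending, 0);
    merged.set(bytes, this.pending.length);

    const frames: Uint8Array[] = [];
    let offset = 0;
    while (offset + this.frameBytes <= merged.length) {
      frames.push(merged.slice(offset, offset + this.frameBytes));
      offset += this.frameBytes;
    }
    // Keep the leftover bytes for the next call.
    this.pending = merged.slice(offset);
    return frames;
  }

  // Number of bytes waiting for the next frame.
  get buffered(): number {
    return this.pending.length;
  }
}
```

In the data callback, the captured buffer could be wrapped as `new Uint8Array(dataBuffer)`, passed to `push`, and each returned frame written with writeAudio. Copying on every call keeps the sketch short; a ring buffer would avoid the repeated allocation.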

Request the microphone permission in the EntryAbility.ets file.


import { abilityAccessCtrl, AbilityConstant, UIAbility, Want } from '@kit.AbilityKit';
import { hilog } from '@kit.PerformanceAnalysisKit';
import { window } from '@kit.ArkUI';
import { BusinessError } from '@kit.BasicServicesKit';

export default class EntryAbility extends UIAbility {
onCreate(want: Want, launchParam: AbilityConstant.LaunchParam): void {
hilog.info(0x0000, 'testTag', '%{public}s', 'Ability onCreate');
}

onDestroy(): void {
hilog.info(0x0000, 'testTag', '%{public}s', 'Ability onDestroy');
}

onWindowStageCreate(windowStage: window.WindowStage): void {
// Main window is created, set main page for this ability
hilog.info(0x0000, 'testTag', '%{public}s', 'Ability onWindowStageCreate');

let atManager = abilityAccessCtrl.createAtManager();
atManager.requestPermissionsFromUser(this.context, ['ohos.permission.MICROPHONE']).then((data) => {
hilog.info(0x0000, 'testTag', 'data:' + JSON.stringify(data));
hilog.info(0x0000, 'testTag', 'data permissions:' + data.permissions);
hilog.info(0x0000, 'testTag', 'data authResults:' + data.authResults);
}).catch((err: BusinessError) => {
hilog.error(0x0000, 'testTag', 'errCode: ' + err.code + ', errMessage: ' + err.message);
});

windowStage.loadContent('pages/Index', (err, data) => {
if (err.code) {
hilog.error(0x0000, 'testTag', 'Failed to load the content. Cause: %{public}s', JSON.stringify(err) ?? '');
return;
}
hilog.info(0x0000, 'testTag', 'Succeeded in loading the content. Data: %{public}s', JSON.stringify(data) ?? '');
});
}

onWindowStageDestroy(): void {
// Main window is destroyed, release UI related resources
hilog.info(0x0000, 'testTag', '%{public}s', 'Ability onWindowStageDestroy');
}

onForeground(): void {
// Ability has brought to foreground
hilog.info(0x0000, 'testTag', '%{public}s', 'Ability onForeground');
}

onBackground(): void {
// Ability has back to background
hilog.info(0x0000, 'testTag', '%{public}s', 'Ability onBackground');
}
}
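The ability above logs the permission request result but does not act on it. The sketch below shows one way to find out which permissions the user refused, so that the microphone path can be disabled when needed. It assumes that an authResults entry of 0 means "granted" and that the entries line up by index with the permissions array; both should be checked against the requestPermissionsFromUser reference. `PermissionResultLike` and `deniedPermissions` are hypothetical names.

```typescript
// Assumed shape of the permission request result, reduced to the two fields logged above.
interface PermissionResultLike {
  permissions: string[];
  authResults: number[];
}

const GRANTED = 0; // assumption: 0 marks a granted permission

// Returns the permissions whose matching authResults entry is not "granted".
function deniedPermissions(r: PermissionResultLike): string[] {
  return r.permissions.filter((_name, i) => r.authResults[i] !== GRANTED);
}
```

For example, inside the `then` callback, a non-empty `deniedPermissions(data)` could be used to show a hint instead of starting microphone recognition.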

