
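A Deep Dive into the ESPnet AIShell Recipe: Implementation and Analysis of the ASR Script asr.sh (Part 1): The First 564 Lines, Which Define Configuration Options, Functions, and Conditional Logic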

article 2024/11/6 3:01:03

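Contents

- Script header
- Function definitions
- Timer
- Global configuration
- Data preparation configuration
- Speed perturbation
- Feature extraction configuration
- Tokenization configuration
- N-gram model
- Language model
- ASR model
- Model upload
- Decoding configuration
- Inference configuration
- Dataset configuration
- BPE and language model text file paths
- Language and other configuration
- Logging and argument parsing
- Required argument checks
- Test set check
- Training and validation set check
- Test set processing
- Feature type check
- Reference text file handling
- BPE and language model training text paths
- Token list path setup
- Tokenization type selection
- Language model handling
- Model tags and experiment paths
- Statistics directories
- Training directory setup
- Inference tag
- Skipped stage handling
- Cleanup and output
- Summary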

Script header

#!/usr/bin/env bash

Tells the operating system to run the script with Bash, located through env on the current PATH.

set -e
set -u
set -o pipefail

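set -e: exit immediately when a command returns a non-zero status (commands tested by if, while, && or || are exempt).
set -u: treat the expansion of an unset variable as an error and exit.
set -o pipefail: a pipeline returns the status of the last command in it that failed, rather than only the status of its final command.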
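
The effect of these options is easiest to see on a failing command. The sketch below is not part of asr.sh; it runs small snippets in child shells and records what happens, so that the demonstration itself is not aborted.

```shell
#!/usr/bin/env bash
# Each snippet runs in its own child shell so that a failure there
# cannot terminate this demonstration script.

# Without pipefail, a pipeline reports only the status of its last command.
status_plain=$(bash -c 'false | true; echo $?')

# With pipefail, the failure of "false" becomes the pipeline's status.
status_pipefail=$(bash -c 'set -o pipefail; false | true; echo $?')

# With -e, the shell exits at the failing command, so "after" is never printed.
output_errexit=$(bash -c 'set -e; false; echo after' || true)

# With -u, expanding an unset variable is an error and the child shell fails.
if bash -c 'set -u; echo "${not_defined_anywhere}"' 2>/dev/null; then
    unset_ok=true
else
    unset_ok=false
fi

echo "plain=${status_plain} pipefail=${status_pipefail}"
echo "errexit output='${output_errexit}' unset_ok=${unset_ok}"
```

For a long recipe this combination means that a failed step stops the run at once instead of letting later stages work on missing or partial files.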

Function definitions

log() {
    local fname=${BASH_SOURCE[1]##*/}
    echo -e "$(date '+%Y-%m-%dT%H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

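Defines a log function that writes a timestamped message.
- local fname=${BASH_SOURCE[1]##*/}: takes the path of the file that called log and strips everything up to the last slash, leaving the file name.
- echo -e ...: prints the current time, then in parentheses the file name, the line number of the call (BASH_LINENO[0]) and the name of the calling function (FUNCNAME[1]), then the arguments passed to log.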
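
To see the format this produces, the function can be called from another function. In the sketch below, demo_stage is a hypothetical caller introduced only for illustration; the file name and line number in the output depend on where the code is saved and run.

```shell
#!/usr/bin/env bash
# The log function as defined in the script.
log() {
    local fname=${BASH_SOURCE[1]##*/}
    echo -e "$(date '+%Y-%m-%dT%H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

# demo_stage is a hypothetical caller used only for this illustration.
demo_stage() {
    log "preparing data"
}

# Capture the line so that it can be inspected. It has the form
#   <timestamp> (<file>:<line>:demo_stage) preparing data
message=$(demo_stage)
echo "${message}"
```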

min() {
  local a b
  a=$1
  for b in "$@"; do
      if [ "${b}" -le "${a}" ]; then
          a="${b}"
      fi
  done
  echo "${a}"
}

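Defines a min function that prints the smallest of its integer arguments.
- local a b: declares local variables.
- a=$1: starts with the first argument as the current minimum.
- The for loop visits every argument and updates a whenever the argument is less than or equal to it. Because -le is an integer comparison, non-integer arguments cause an error.
- Finally, the minimum is printed.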
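
A typical use of such a helper is to cap the number of parallel jobs by the number of items to process. The values below are made up for illustration, and the variable names num_utts and _nj are assumptions rather than quotes from the script.

```shell
#!/usr/bin/env bash
min() {
  local a b
  a=$1
  for b in "$@"; do
      if [ "${b}" -le "${a}" ]; then
          a="${b}"
      fi
  done
  echo "${a}"
}

nj=32        # requested number of parallel jobs
num_utts=10  # hypothetical number of utterances to split across jobs

# Never start more jobs than there are utterances.
_nj=$(min "${nj}" "${num_utts}")
echo "running ${_nj} jobs"
```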

Timer

SECONDS=0

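Resets Bash's built-in SECONDS variable to 0. Bash increments it once per second, so reading it later gives the elapsed run time of the script.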
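
A minimal sketch of how the counter can be read back at the end of a run. The sleep stands in for real work, and the wording of the final message is an assumption for this example.

```shell
#!/usr/bin/env bash
SECONDS=0

# Stand-in for the real work of the script.
sleep 1

# SECONDS now holds the whole seconds elapsed since it was reset.
elapsed=${SECONDS}
echo "Finished. [elapsed=${elapsed}s]"
```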

Global configuration

# General configuration
stage=1                 # Processes starts from the specified stage.
stop_stage=10000        # Processes is stopped at the specified stage.

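stage=1: the stage at which processing starts.
stop_stage=10000: the stage at which processing stops; the large default means that all stages run.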

skip_stages=            # Specify the stage to be skipped
skip_data_prep=false    # Skip data preparation stages.
skip_train=false        # Skip training stages.
skip_eval=false         # Skip decoding and evaluation stages.

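skip_stages: initialized empty; it is later filled with the numbers of the stages to skip.
The skip_* variables that follow control whether the data preparation, training, and evaluation stages are skipped.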
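
The excerpt does not show how these variables are consumed. A common pattern for stage-driven recipes, sketched below as an assumption, is to run stage N only if it lies between stage and stop_stage and is not listed in skip_stages. The helper should_run is a hypothetical name introduced for this sketch.

```shell
#!/usr/bin/env bash
# Sample settings for illustration.
stage=2
stop_stage=4
skip_stages="3 "

# should_run N: succeed if stage N is inside [stage, stop_stage]
# and N is not listed in skip_stages.
should_run() {
    local n=$1
    [ "${stage}" -le "${n}" ] && [ "${stop_stage}" -ge "${n}" ] \
        && ! [[ " ${skip_stages} " =~ [[:space:]]${n}[[:space:]] ]]
}

executed=""
for n in 1 2 3 4 5; do
    if should_run "${n}"; then
        executed+="${n} "
    fi
done
echo "executed stages: ${executed}"
```

Padding skip_stages with a space on each side lets the pattern match whole numbers only, so skipping stage 1 does not also skip stage 11.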

ngpu=1                  # The number of gpus ("0" uses cpu, otherwise use gpu).
num_nodes=1             # The number of nodes.
nj=32                   # The number of parallel jobs.
inference_nj=32         # The number of parallel jobs in decoding.

ngpu=1: number of GPUs to use; 0 means run on CPU.
num_nodes=1: number of compute nodes.
nj=32: number of parallel jobs.
inference_nj=32: number of parallel jobs for decoding.

gpu_inference=false     # Whether to perform gpu decoding.
dumpdir=dump            # Directory to dump features.
expdir=exp              # Directory to save experiments.
python=python3          # Specify python to execute espnet commands.

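gpu_inference: whether to decode on GPU.
dumpdir: directory where features are dumped.
expdir: directory where experiment results are saved.
python: Python interpreter used to run ESPnet commands.

Data preparation configuration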

local_data_opts= # The options given to local/data.sh.
post_process_local_data_opts= # The options given to local/data.sh for additional processing in stage 4.
auxiliary_data_tags= # the names of training data for auxiliary tasks

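Options related to data preparation, all initialized empty: extra options passed to local/data.sh, and the names of training data for auxiliary tasks.

Speed perturbation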

speed_perturb_factors=  # perturbation factors, e.g. "0.9 1.0 1.1" (separated by space).

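The speed perturbation factors, initialized empty, which means that no speed perturbation is applied.

Feature extraction configuration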

feats_type=raw       # Feature type (raw, raw_copy, fbank_pitch, or extracted).
audio_format=flac    # Audio format: wav, flac, wav.ark, flac.ark  (only in feats_type=raw).
fs=16k               # Sampling rate.
min_wav_duration=0.1 # Minimum duration in second.
max_wav_duration=20  # Maximum duration in second.

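feats_type: feature type. The check later in the script also accepts fbank, which the comment above does not list.
audio_format: audio file format.
fs: sampling rate.
min_wav_duration and max_wav_duration: minimum and maximum utterance duration in seconds.

Tokenization configuration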

token_type=bpe      # Tokenization type (char or bpe).
nbpe=30             # The number of BPE vocabulary.
bpemode=unigram     # Mode of BPE (unigram or bpe).
oov="<unk>"         # Out of vocabulary symbol.
blank="<blank>"     # CTC blank symbol
sos_eos="<sos/eos>" # sos and eos symbols
bpe_input_sentence_size=100000000 # Size of input sentence for BPE.

Configures the tokenization type, the BPE (Byte Pair Encoding) parameters, and the special symbols.

N-gram model

use_ngram=false
ngram_exp=
ngram_num=3

use_ngram: whether to use an N-gram model.
ngram_exp and ngram_num: the N-gram experiment directory and the N-gram order (3 means trigram).

Language model

use_lm=true       # Use language model for ASR decoding.
lm_tag=           # Suffix to the result dir for language model training.
lm_exp=           # Specify the directory path for LM experiment.

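use_lm: whether to use a language model for decoding.
lm_tag and lm_exp: the result-directory suffix and the experiment directory for language model training.

ASR model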

asr_task=asr   # ASR task mode. Either 'asr' or 'asr_transducer'.
asr_tag=       # Suffix to the result dir for asr model training.
asr_exp=       # Specify the directory path for ASR experiment.

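asr_task: ASR task mode, either asr or asr_transducer.
asr_tag and asr_exp: the result-directory suffix and the experiment directory for ASR training.

Model upload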

hf_repo=

The Hugging Face repository to which the model is uploaded, initialized empty.

Decoding configuration

use_k2=false      # Whether to use k2 based decoder
k2_ctc_decoding=true
use_nbest_rescoring=true # use transformer-decoder and transformer language model for nbest rescoring

Options for the k2-based decoder and for N-best rescoring.

Inference configuration

inference_tag=    # Suffix to the result dir for decoding.
inference_config= # Config for decoding.
inference_args=   # Arguments for decoding, e.g., "--lm_weight 0.1".

Placeholders for the inference tag, config file, and extra arguments.

Dataset configuration

train_set=       # Name of training set.
valid_set=       # Name of validation set used for monitoring/tuning network training.
test_sets=       # Names of test sets. Multiple items (e.g., both dev and eval sets) can be specified.

Names of the training, validation, and test sets.

BPE and language model text file paths

bpe_train_text=  # Text file path of bpe training set.
lm_train_text=   # Text file path of language model training set.
lm_dev_text=     # Text file path of language model development set.
lm_test_text=    # Text file path of language model evaluation set.

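Placeholders for the text files used to train BPE and to train, validate, and evaluate the language model.

Language and other configuration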

lang=noinfo      # The language type of corpus.

The language of the corpus; the default noinfo means that no language is specified.

help_message=$(cat << EOF
Usage: $0 --train-set "<train_set_name>" --valid-set "<valid_set_name>" --test_sets "<test_set_names>"

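Defines the help message, which explains how to invoke the script and lists its options. Only the opening of the here-document is shown here; the option list and the closing EOF are omitted.

Logging and argument parsing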

log "$0 $*"
run_args=$(scripts/utils/print_args.sh $0 "$@")
. utils/parse_options.sh

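Logs the command line, saves the arguments in run_args, and sources utils/parse_options.sh, which overrides the variables defined above with any --name value options given on the command line.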
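
To make the override mechanism concrete, here is a much simplified sketch of what a Kaldi-style option parser does. It is not the code of utils/parse_options.sh, and parse_opts_sketch is a hypothetical name. The sketch assumes that dashes in option names are mapped to underscores, which would explain why the usage line can mix --train-set and --test_sets.

```shell
#!/usr/bin/env bash
# Defaults, as in the configuration sections above.
stage=1
train_set=

# parse_opts_sketch --name value ...: assign value to the already defined
# variable "name"; dashes in the option name become underscores.
parse_opts_sketch() {
    local _opt
    while [ $# -gt 0 ]; do
        case "$1" in
            --*)
                _opt=$(echo "$1" | sed -e 's/^--//' -e 's/-/_/g')
                # Reject options that have no matching default above.
                if [ -z "${!_opt+x}" ]; then
                    echo "Error: invalid option $1" >&2
                    return 1
                fi
                printf -v "${_opt}" '%s' "$2"
                shift 2
                ;;
            *)
                break
                ;;
        esac
    done
}

parse_opts_sketch --train-set train --stage 3
echo "train_set=${train_set} stage=${stage}"
```

Because only variables that already exist can be overridden, every option must be given a default before the parser is sourced, which is why the configuration block comes first in the script.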

Required argument checks

if [ $# -ne 0 ]; then
    log "${help_message}"
    log "Error: No positional arguments are required."
    exit 2
fi

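After option parsing, no positional arguments should remain. If any are left, the script prints the help message and exits with status 2.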

if ! "${skip_train}"; then
    [ -z "${train_set}" ] && { log "${help_message}"; log "Error: --train_set is required"; exit 2; };
    [ -z "${valid_set}" ] && { log "${help_message}"; log "Error: --valid_set is required"; exit 2; };
fi

Unless training is skipped, both --train_set and --valid_set must be provided.

Test set check

if ! "${eval_valid_set}"; then
    [ -z "${test_sets}" ] && { log "${help_message}"; log "Error: --test_sets is required"; exit 2; };
else
    [ -z "${valid_set}" ] && { log "${help_message}"; log "Error: --valid_set is required"; exit 2; };
fi

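If the validation set is not being evaluated (eval_valid_set is false), --test_sets is required; otherwise --valid_set is required. The default of eval_valid_set is not shown in the excerpts above.

Training and validation set check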

if [ -n "${train_set}" ] && [ "${train_set}" = "${valid_set}" ]; then
    log "Error: train_set and valid_set must be different. --train_set ${train_set} --valid_set ${valid_set}"
    exit 1
fi

The training set and the validation set must differ; if they are the same, the script exits with status 1.

Test set processing

_test_sets=
for dset in ${test_sets}; do
    if [ "${dset}" = "${train_set}" ]; then
        log "Error: train_set and test_sets must be different. --train_set ${train_set} --test_sets ${test_sets}"
        exit 1
    fi
    if [ "${dset}" = "${valid_set}" ]; then
        log "Info: The valid_set '${valid_set}' is included in the test_sets. '--eval_valid_set true' is set and '${valid_set}' is removed from the test_sets"
        eval_valid_set=true
    elif [[ " ${_test_sets} " =~ [[:space:]]${dset}[[:space:]] ]]; then
        log "Info: ${dset} is duplicated in the test_sets. One is removed"
    else
        _test_sets+="${dset} "
    fi
done
test_sets=${_test_sets}

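Loops over the test sets. A test set equal to the training set is an error. If the validation set appears in the list, it is not an error: the entry is removed and eval_valid_set is set to true instead. Duplicate entries are dropped.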
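
The following sketch runs the same loop on sample values to show the result. The set names are made up, and log is replaced by echo so that the block is self-contained.

```shell
#!/usr/bin/env bash
# Sample values, chosen for illustration only.
train_set=train
valid_set=dev
test_sets="dev test test"
eval_valid_set=false

_test_sets=
for dset in ${test_sets}; do
    if [ "${dset}" = "${train_set}" ]; then
        echo "Error: train_set and test_sets must be different." >&2
        exit 1
    fi
    if [ "${dset}" = "${valid_set}" ]; then
        # The validation set is handled through eval_valid_set instead.
        eval_valid_set=true
    elif [[ " ${_test_sets} " =~ [[:space:]]${dset}[[:space:]] ]]; then
        echo "Info: ${dset} is duplicated in the test_sets. One is removed"
    else
        _test_sets+="${dset} "
    fi
done
test_sets=${_test_sets}

echo "test_sets='${test_sets}' eval_valid_set=${eval_valid_set}"
```

With these inputs, dev is removed from the list, the second test is dropped as a duplicate, and a single test entry remains.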

Feature type check

if [ "${feats_type}" = raw ]; then
    data_feats=${dumpdir}/raw
elif [ "${feats_type}" = raw_copy ]; then
    data_feats=${dumpdir}/raw_copy
elif [ "${feats_type}" = fbank_pitch ]; then
    data_feats=${dumpdir}/fbank_pitch
elif [ "${feats_type}" = fbank ]; then
    data_feats=${dumpdir}/fbank
elif [ "${feats_type}" == extracted ]; then
    data_feats=${dumpdir}/extracted
else
    log "${help_message}"
    log "Error: not supported: --feats_type ${feats_type}"
    exit 2
fi

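Sets the feature directory data_feats under dumpdir according to the feature type; an unsupported type prints the help message and exits with status 2.

Reference text file handling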

num_inf=${num_inf:=${num_ref}}
if [ ${num_ref} -eq 1 ]; then
    ref_text_files_str="text "
    ref_text_names_str="text "
else
    ref_text_files_str="text_spk1 "
    ref_text_names_str="text "
    for n in $(seq 2 ${num_ref}); do
        ref_text_files_str+="text_spk${n} "
        ref_text_names_str+="text_spk${n} "
    done
fi
# shellcheck disable=SC2206
ref_text_files=(${ref_text_files_str// / })
# shellcheck disable=SC2206
ref_text_names=(${ref_text_names_str// / })

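Builds the lists of reference text files and names. With a single reference (num_ref equal to 1) both are just text; with several, the files are text_spk1, text_spk2, and so on, while the names are text, text_spk2, and so on. num_inf defaults to num_ref when it is unset or empty.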
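
As a worked example, the block below runs the same logic for a sample value of num_ref and prints the two resulting arrays. The value 2 is chosen only for illustration.

```shell
#!/usr/bin/env bash
num_ref=2  # sample value: two reference transcripts per utterance

if [ ${num_ref} -eq 1 ]; then
    ref_text_files_str="text "
    ref_text_names_str="text "
else
    ref_text_files_str="text_spk1 "
    ref_text_names_str="text "
    for n in $(seq 2 ${num_ref}); do
        ref_text_files_str+="text_spk${n} "
        ref_text_names_str+="text_spk${n} "
    done
fi
# Word splitting turns each space-separated string into an array.
# shellcheck disable=SC2206
ref_text_files=(${ref_text_files_str// / })
# shellcheck disable=SC2206
ref_text_names=(${ref_text_names_str// / })

echo "files: ${ref_text_files[*]}"
echo "names: ${ref_text_names[*]}"
```

Note that the first name stays text even in the multi-reference case, while the first file becomes text_spk1.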

BPE and language model training text paths

[ -z "${bpe_train_text}" ] && bpe_train_text="${data_feats}/org/${train_set}/${ref_text_files[0]}"
[ -z "${lm_train_text}" ] && lm_train_text="${data_feats}/org/${train_set}/${ref_text_files[0]}"
[ -z "${lm_dev_text}" ] && lm_dev_text="${data_feats}/org/${valid_set}/${ref_text_files[0]}"
if [ -z "${lm_test_text}" ]; then
    if [ -z "${test_sets}" ]; then
        lm_test_text="${data_feats}/org/${valid_set}/${ref_text_files[0]}"
    else
        lm_test_text="${data_feats}/${test_sets%% *}/${ref_text_files[0]}"
    fi
fi

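Sets the BPE and language model text paths; any path left empty gets a default. The training and development texts default to the reference text of the training and validation sets under ${data_feats}/org, and the test text defaults to the first test set, or to the validation set when no test set is given.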
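
The expansion ${test_sets%% *} is what picks the first test set. A small sketch with made-up set names shows how it behaves; first_test_set is a name introduced only for this example.

```shell
#!/usr/bin/env bash
# Sample values for illustration.
data_feats=dump/raw
test_sets="test_a test_b "
lm_test_text=

# "%% *" removes the longest suffix that starts at a space,
# which leaves only the first name in the list.
first_test_set=${test_sets%% *}

# Fill in the default only when the user supplied nothing.
if [ -z "${lm_test_text}" ]; then
    lm_test_text="${data_feats}/${first_test_set}/text"
fi
echo "${lm_test_text}"
```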

Token list path setup

if [ "${lang}" != noinfo ]; then
    token_listdir=data/${lang}_token_list
else
    token_listdir=data/token_list
fi
bpedir="${token_listdir}/bpe_${bpemode}${nbpe}"
bpeprefix="${bpedir}"/bpe
bpemodel="${bpeprefix}".model
bpetoken_list="${bpedir}"/tokens.txt
chartoken_list="${token_listdir}"/char/tokens.txt
hugging_face_token_list="${token_listdir}/hugging_face_"${hugging_face_model_name_or_path/\//-}/tokens.txt

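Chooses the token list directory according to the language, then derives the paths of the BPE model and of the BPE, character, and Hugging Face token lists.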
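
To illustrate the resulting paths, the sketch below evaluates the same expressions with the default settings. The Hugging Face model name is only an example value, and hf_dirname is a helper variable introduced here.

```shell
#!/usr/bin/env bash
# Defaults from the configuration section, plus a sample model name.
lang=noinfo
bpemode=unigram
nbpe=30
hugging_face_model_name_or_path="facebook/mbart-large-50"  # illustrative value

if [ "${lang}" != noinfo ]; then
    token_listdir=data/${lang}_token_list
else
    token_listdir=data/token_list
fi
bpedir="${token_listdir}/bpe_${bpemode}${nbpe}"
bpemodel="${bpedir}/bpe.model"

# ${var/\//-} replaces only the first "/" with "-", so the model name
# can be used as a single directory component.
hf_dirname=${hugging_face_model_name_or_path/\//-}
hugging_face_token_list="${token_listdir}/hugging_face_${hf_dirname}/tokens.txt"

echo "${bpemodel}"
echo "${hugging_face_token_list}"
```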

Tokenization type selection

if [ "${token_type}" = bpe ]; then
    token_list="${bpetoken_list}"
elif [ "${token_type}" = char ]; then
    token_list="${chartoken_list}"
    bpemodel=none
elif [ "${token_type}" = word ]; then
    token_list="${wordtoken_list}"
    bpemodel=none
elif [ "${token_type}" = whisper_en ]; then
    token_list="${token_listdir}"/whisper_en/tokens.txt
    bpemodel=whisper_en
    hyp_cleaner=${cleaner}
elif [ "${token_type}" = whisper_multilingual ]; then
    token_list="${token_listdir}"/whisper_multilingual/tokens.txt
    bpemodel=whisper_multilingual
    hyp_cleaner=${cleaner}
elif [ "${token_type}" = hugging_face ]; then
    token_list="${hugging_face_token_list}"
    bpemodel=${hugging_face_model_name_or_path}
else
    log "Error: not supported --token_type '${token_type}'"
    exit 2
fi

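Selects the token list, and the matching bpemodel value, for the configured tokenization type; an unsupported type is an error. The word branch uses wordtoken_list, whose definition is not shown in the excerpt above.

Language model handling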

if ${use_word_lm}; then
    log "Error: Word LM is not supported yet"
    exit 2
else
    lm_token_list="${token_list}"
    lm_token_type="${token_type}"
fi

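Word-level language models are not supported: if use_word_lm is true, the script logs an error and exits. Otherwise the language model uses the same token list and token type as the ASR model.

Model tags and experiment paths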

if [ -z "${asr_tag}" ]; then
    if [ -n "${asr_config}" ]; then
        asr_tag="$(basename "${asr_config}" .yaml)_${feats_type}"
    else
        asr_tag="train_${feats_type}"
    fi
    if [ "${lang}" != noinfo ]; then
        asr_tag+="_${lang}_${token_type}"
    else
        asr_tag+="_${token_type}"
    fi
    if [ "${token_type}" = bpe ]; then
        asr_tag+="${nbpe}"
    fi
    if [ "${token_type}" = hugging_face ]; then
        asr_tag+="_"${hugging_face_model_name_or_path/\//-}
    fi
    if [ -n "${asr_args}" ]; then
        asr_tag+="$(echo "${asr_args}" | sed -e "s/--/\_/g" -e "s/[ |=/]//g")"
    fi
    if [ -n "${speed_perturb_factors}" ]; then
        asr_tag+="_sp"
    fi
fi

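If asr_tag is empty, generates it from the config file name (or train), the feature type, the language, the tokenization type (plus the vocabulary size for BPE), any extra asr_args, and an _sp suffix when speed perturbation is used.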
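
A worked example makes the naming scheme easier to follow. The values below are samples chosen for illustration, asr_config and asr_args are options whose defaults are not shown in the excerpts, and args_suffix is a helper variable introduced here.

```shell
#!/usr/bin/env bash
# Sample values for illustration.
asr_config=conf/train_asr_conformer.yaml
feats_type=raw
lang=zh
token_type=char
asr_args="--max_epoch 10 --batch_size=32"
speed_perturb_factors="0.9 1.0 1.1"

asr_tag="$(basename "${asr_config}" .yaml)_${feats_type}_${lang}_${token_type}"

# "--" becomes "_", then spaces, "|", "=" and "/" are deleted, so the
# arguments collapse into a string that is safe to use in a directory name.
args_suffix="$(echo "${asr_args}" | sed -e "s/--/\_/g" -e "s/[ |=/]//g")"
asr_tag+="${args_suffix}"

if [ -n "${speed_perturb_factors}" ]; then
    asr_tag+="_sp"
fi
echo "${asr_tag}"
```

Encoding the settings in the tag keeps experiments that differ in config, tokenization, or extra arguments in separate directories.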

if [ -z "${lm_tag}" ]; then
    if [ -n "${lm_config}" ]; then
        lm_tag="$(basename "${lm_config}" .yaml)"
    else
        lm_tag="train"
    fi
    if [ "${lang}" != noinfo ]; then
        lm_tag+="_${lang}_${lm_token_type}"
    else
        lm_tag+="_${lm_token_type}"
    fi
    if [ "${lm_token_type}" = bpe ]; then
        lm_tag+="${nbpe}"
    fi
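    if [ -n "${lm_args}" ]; then
        lm_tag+="$(echo "${lm_args}" | sed -e "s/--/\_/g" -e "s/[ |=/]//g")"
    fi
fi

If lm_tag is empty, generates the language model tag from the config file name (or train), the language, the token type (plus the vocabulary size for BPE), and any extra lm_args.

Statistics directories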

# The directory used for collect-stats mode
if [ -z "${asr_stats_dir}" ]; then
    if [ "${lang}" != noinfo ]; then
        asr_stats_dir="${expdir}/asr_stats_${feats_type}_${lang}_${token_type}"
    else
        asr_stats_dir="${expdir}/asr_stats_${feats_type}_${token_type}"
    fi
    if [ "${token_type}" = bpe ]; then
        asr_stats_dir+="${nbpe}"
    fi
    if [ "${token_type}" = hugging_face ]; then
        asr_stats_dir+="_"${hugging_face_model_name_or_path/\//-}
    fi
    if [ -n "${speed_perturb_factors}" ]; then
        asr_stats_dir+="_sp"
    fi
fi

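If asr_stats_dir is empty, builds the ASR statistics directory (used in collect-stats mode) from the feature type, the language, and the tokenization type.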

if [ -z "${lm_stats_dir}" ]; then
    if [ "${lang}" != noinfo ]; then
        lm_stats_dir="${expdir}/lm_stats_${lang}_${lm_token_type}"
    else
        lm_stats_dir="${expdir}/lm_stats_${lm_token_type}"
    fi
    if [ "${lm_token_type}" = bpe ]; then
        lm_stats_dir+="${nbpe}"
    fi
fi

If lm_stats_dir is empty, builds the language model statistics directory from the language and the token type.

Training directory setup

# The directory used for training commands
if [ -z "${asr_exp}" ]; then
    asr_exp="${expdir}/asr_${asr_tag}"
fi
if [ -z "${lm_exp}" ]; then
    lm_exp="${expdir}/lm_${lm_tag}"
fi
if [ -z "${ngram_exp}" ]; then
    ngram_exp="${expdir}/ngram"
fi

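Sets the experiment directories for the ASR model, the language model, and the N-gram model, unless they were given explicitly.

Inference tag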

if [ -z "${inference_tag}" ]; then
    if [ -n "${inference_config}" ]; then
        inference_tag="$(basename "${inference_config}" .yaml)"
    else
        inference_tag=inference
    fi
    if [ -n "${inference_args}" ]; then
        inference_tag+="$(echo "${inference_args}" | sed -e "s/--/\_/g" -e "s/[ |=/]//g")"
    fi
    if "${use_lm}"; then
        inference_tag+="_lm_$(basename "${lm_exp}")_$(echo "${inference_lm}" | sed -e "s/\//_/g" -e "s/\.[^.]*$//g")"
    fi
    if "${use_ngram}"; then
        inference_tag+="_ngram_$(basename "${ngram_exp}")_$(echo "${inference_ngram}" | sed -e "s/\//_/g" -e "s/\.[^.]*$//g")"
    fi
    inference_tag+="_asr_model_$(echo "${inference_asr_model}" | sed -e "s/\//_/g" -e "s/\.[^.]*$//g")"

    if "${use_k2}"; then
      inference_tag+="_use_k2"
      inference_tag+="_k2_ctc_decoding_${k2_ctc_decoding}"
      inference_tag+="_use_nbest_rescoring_${use_nbest_rescoring}"
    fi
fi

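If inference_tag is empty, generates it by concatenating the inference config name, any extra inference arguments, the language model and N-gram experiment names with their checkpoint names, the ASR checkpoint name, and the k2 options.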
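
The two sed expressions applied to the checkpoint names replace every slash with an underscore and drop the final file extension. The sketch below shows the effect on sample values; the config path and checkpoint names are assumptions made for this example, and strip_ckpt is a hypothetical helper that wraps the sed call.

```shell
#!/usr/bin/env bash
# Sample values; the checkpoint names are assumptions made for this example.
inference_config=conf/decode_asr.yaml
lm_exp=exp/lm_train_lm_zh_char
inference_lm=valid.loss.ave.pth
inference_asr_model=valid.acc.ave.pth

# strip_ckpt is a hypothetical helper wrapping the sed call used above:
# "/" becomes "_" and the final extension is removed.
strip_ckpt() {
    echo "$1" | sed -e "s/\//_/g" -e "s/\.[^.]*$//g"
}

inference_tag="$(basename "${inference_config}" .yaml)"
inference_tag+="_lm_$(basename "${lm_exp}")_$(strip_ckpt "${inference_lm}")"
inference_tag+="_asr_model_$(strip_ckpt "${inference_asr_model}")"
echo "${inference_tag}"
```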

Skipped stage handling

if "${skip_data_prep}"; then
    skip_stages+="1 2 3 4 5 "
fi
if "${skip_train}"; then
    skip_stages+="2 4 5 6 7 8 9 10 11 "
elif ! "${use_lm}"; then
    skip_stages+="6 7 8 "
fi
if ! "${use_ngram}"; then
    skip_stages+="9 "
fi
if "${skip_eval}"; then
    skip_stages+="12 13 "
fi

if "${skip_packing}"; then
    skip_stages+="14 "
fi
if "${skip_upload_hf}"; then
    skip_stages+="15 "
fi

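Builds the list of stages to skip from the skip_* options. Stages 6 to 8 are also skipped when no language model is used, and stage 9 when no N-gram model is used.

Cleanup and output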

skip_stages=$(echo "${skip_stages}" | tr ' ' '\n' | sort -nu | tr '\n' ' ')
log "Skipped stages: ${skip_stages}"

Finally, the list of skipped stages is sorted numerically and de-duplicated, then logged.
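
The sketch below traces this for one sample combination of options: training is skipped and no N-gram model is used, so stage 9 is added twice before the clean-up. The variable normalized is introduced here only to print the list without surrounding whitespace.

```shell
#!/usr/bin/env bash
# Sample settings: training is skipped and no N-gram model is used.
skip_stages=
skip_train=true
use_lm=true
use_ngram=false

if "${skip_train}"; then
    skip_stages+="2 4 5 6 7 8 9 10 11 "
elif ! "${use_lm}"; then
    skip_stages+="6 7 8 "
fi
if ! "${use_ngram}"; then
    skip_stages+="9 "   # already in the list, so it is now duplicated
fi

# One number per line, numeric sort with duplicates removed, back to one line.
skip_stages=$(echo "${skip_stages}" | tr ' ' '\n' | sort -nu | tr '\n' ' ')

# Word splitting normalizes the surrounding whitespace for display.
# shellcheck disable=SC2086
set -- ${skip_stages}
normalized="$*"
echo "Skipped stages: ${normalized}"
```

The numeric sort matters: a plain lexical sort would place 10 and 11 before 2.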

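Summary

This part of asr.sh sets up the whole speech recognition workflow before any stage runs. It defines the configuration options and helper functions, validates the arguments, derives the tags, directories, and file paths from them, and works out which stages to skip. Because every default can be overridden from the command line, the same script serves many different training and evaluation setups.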

