kongdeqiang
5 days ago  28ccfbfc51068a663a80764e14074df5edf2b5ba
Commit
4 files modified
23 files added
2 files renamed
9 files deleted
Changed files: 322666 lines
Acknowledge.md  10
Contribution.md  46
README.md  396
SECURITY.md  11
START_USE.md  51
data/list/train.jsonl  4
data/list/train_emo.txt  4
data/list/train_event.txt  4
data/list/train_text.txt  6
data/list/train_text_language.txt  4
data/list/train_wav.scp  6
data/list/val.jsonl  2
data/train/train.jsonl
data/train/train_emo.txt  2
data/train/train_event.txt  2
data/train/train_text.txt  2
data/train/train_text_language.txt  2
data/train/train_wav.scp  2
data/val/val.jsonl
data/val/val_text.txt
data/val/val_wav.scp
demo1.py  21
examples/industrial_data_pretraining/paraformer/finetune.sh  40
examples/industrial_data_pretraining/paraformer/infer_from_local.sh  6
gen_funasr_file.py  100
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/.mdl
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/.msc
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/.mv  1
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md  411
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/am.mvn  8
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/config.yaml  134
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/configuration.json  17
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/fig/struct.png
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/seg_dict  312968
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/tokens.json  8406
音频标注数据.xlsx
Acknowledge.md
File was deleted
Contribution.md
File was deleted
README.md
File was deleted
SECURITY.md
File was deleted
START_USE.md
New file
@@ -0,0 +1,51 @@
# Using this tool for training (operations guide, not development)
## I. Preparation:
#### 1. Start the backend dataset-collection service:
```
Run java -version first to confirm Java 21 is active; if not, switch with sudo update-alternatives --config java.
Backend:
cd /home/boying/IdeaProjects/asr_datasets
nohup java -jar AutoLabelASR > output.log 2>&1 &
Frontend:
Type idea in a console to open the IDE, start the AutoLabelASR project. Make sure Node 20.19 is selected (nvm use 20.19.5), then npm run dev serves the HTTPS endpoint.
```
#### 2. Clear the database and files
```
The asr_datasets database on 192.168.0.5 must be cleared before every new training run, otherwise duplicates of already-trained data remain.
Likewise, the upload folder inside the project launch directory (cd /home/boying/IdeaProjects/asr_datasets) must be emptied.
```
#### 3. Collect data
```
Open the https://192.168.0.5:1443 page in a browser, enter the reference text to be recognized, click "Start recording" and record the matching speech,
then click "Save to database". Record at least 100 utterances per session, then click "Export Excel".
```
## II. Training workflow:
#### 1. Generate the files:
```
Open the FunASRxl-0313 project in VS Code, copy the exported Excel into the project root and name it "音频标注数据.xlsx", open a terminal (toggle panel, top right), then run:
conda activate fun_asr_xl   # switch to the virtual environment
python gen_funasr_file.py   # generate the files
Afterwards, check whether train_text.txt and train_wav.scp were generated under data/train.
Copy the audio files from the upload folder in the project launch directory into data/train/wav.
```
#### 2. Start training:
```
Two models are trained, Paraformer and SenseVoice (Whisper and nano-2512 may be added later); the procedure is the same. Taking Paraformer as the example:
cd /home/boying/IdeaProjects/FunASRxl-0313/examples/industrial_data_pretraining/paraformer
./finetune.sh   # the console prints "starting FunASR training"
When training finishes, the models are saved under the xxx directory; it contains many model files (one per training epoch) plus model.pt.best (the best checkpoint); if that file is present, training succeeded.
Save the whole folder to another location (the next run overwrites this directory) and keep only the best model.
```
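Before starting a run, the two generated list files can be cross-checked quickly. This is a minimal sketch under the guide's file layout; the helper names (`read_ids`, `check_lists`) are ours, not part of the project:

```python
# Sanity check for the generated FunASR list files (illustrative, not part of the repo).
# Both files use "utt_id<space>value" per line; the two must cover the same utterance IDs.

def read_ids(path):
    """Return the set of utterance IDs (first whitespace-separated field) in a Kaldi-style list file."""
    with open(path, encoding="utf-8") as f:
        return {line.split(maxsplit=1)[0] for line in f if line.strip()}

def check_lists(text_path, scp_path):
    """Return (transcripts without a wav entry, wav entries without a transcript)."""
    text_ids = read_ids(text_path)
    scp_ids = read_ids(scp_path)
    return text_ids - scp_ids, scp_ids - text_ids
```

If both returned sets are empty, every transcript has audio and vice versa; otherwise the offending IDs are listed.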
data/list/train.jsonl
File was deleted
data/list/train_emo.txt
File was deleted
data/list/train_event.txt
File was deleted
data/list/train_text.txt
@@ -1,4 +1,2 @@
BAC009S0764W0121 甚至出现交易几乎停滞的情况
BAC009S0916W0489 湖北一公司以员工名义贷款数十员工负债千万
asr_example_cn_en 所有只要处理 data 不管你是做 machine learning 做 deep learning 做 data analytics 做 data science 也好 scientist 也好通通都要都做的基本功啊那 again 先先对有一些也许对
ID0012W0014 he tried to think how it could be
96ed5ed7-2602-46c5-b5cb-52e737b6c19e 数据集打标系统
f09683b9-7e1f-4cca-a1d5-7b17f5503116 测测四十
data/list/train_text_language.txt
File was deleted
data/list/train_wav.scp
@@ -1,4 +1,2 @@
BAC009S0764W0121 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav
BAC009S0916W0489 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav
asr_example_cn_en https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_cn_en.wav
ID0012W0014 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_en.wav
96ed5ed7-2602-46c5-b5cb-52e737b6c19e wav/96ed5ed7-2602-46c5-b5cb-52e737b6c19e.wav
f09683b9-7e1f-4cca-a1d5-7b17f5503116 wav/f09683b9-7e1f-4cca-a1d5-7b17f5503116.wav
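Each wav.scp entry above maps an utterance ID to either a remote URL or a path relative to the data directory. A small sketch (ours, not FunASR's actual loader) of how such Kaldi-style scp lines can be parsed:

```python
# Parse Kaldi-style wav.scp lines into an {utt_id: source} mapping (illustrative sketch).
def parse_wav_scp(lines):
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, source = line.split(maxsplit=1)
        entries[utt_id] = source
    return entries

def is_remote(source):
    """A source is remote if it is an http(s) URL, otherwise it is a local path."""
    return source.startswith(("http://", "https://"))
```

The mixed entries in this commit (OSS URLs for the stock samples, `wav/<uuid>.wav` for the newly collected ones) both fit this shape.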
data/list/val.jsonl
File was deleted
data/train/train.jsonl
data/train/train_emo.txt
New file
@@ -0,0 +1,2 @@
BAC009S0764W0121 <|NEUTRAL|>
BAC009S0916W0489 <|NEUTRAL|>
data/train/train_event.txt
New file
@@ -0,0 +1,2 @@
BAC009S0764W0121 <|Speech|>
BAC009S0916W0489 <|Speech|>
data/train/train_text.txt
New file
@@ -0,0 +1,2 @@
BAC009S0764W0121 甚至出现交易几乎停滞的情况
BAC009S0916W0489 湖北一公司以员工名义贷款数十员工负债千万
data/train/train_text_language.txt
New file
@@ -0,0 +1,2 @@
BAC009S0764W0121 <|zh|>
BAC009S0916W0489 <|zh|>
data/train/train_wav.scp
New file
@@ -0,0 +1,2 @@
BAC009S0764W0121 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav
BAC009S0916W0489 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav
data/val/val.jsonl
data/val/val_text.txt
data/val/val_wav.scp
demo1.py
New file
@@ -0,0 +1,21 @@
import os
# [Key change 1] Set the model cache directory from inside the code.
# Note: this must point at the 'models' level, not at a specific model folder,
# because the loader looks under <this directory>/iic/<model name>.
os.environ['MODELSCOPE_CACHE'] = '/home/boying/IdeaProjects/FunASRxl-0313/models'
from funasr import AutoModel
model = AutoModel(model=r"iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
        vad_model=None,
        punc_model=None,
        disable_download=True,
        disable_update=True,
)
res = model.generate(
    input=r"1.wav",
    batch_size_s=300,
    hotword='贷款'
)
print('Recognition result:', res)
examples/industrial_data_pretraining/paraformer/finetune.sh
old mode 100644 new mode 100755
@@ -2,15 +2,15 @@
#  MIT License  (https://opensource.org/licenses/MIT)
workspace=`pwd`
export MODELSCOPE_CACHE="/home/boying/IdeaProjects/FunASRxl-0313/models/"
# which gpu to train or finetune
export CUDA_VISIBLE_DEVICES="0,1"
export CUDA_VISIBLE_DEVICES="0"
gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
# model_name from model_hub, or model_dir in local path
## option 1, download model automatically
model_name_or_model_dir="iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
model_name_or_model_dir="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
## option 2, download model by git
#local_path_root=${workspace}/modelscope_models
@@ -20,26 +20,29 @@
# data dir, which contains: train.json, val.json
data_dir="../../../data/list"
data_dir="../../../data"
train_data="${data_dir}/train.jsonl"
val_data="${data_dir}/val.jsonl"
train_data="${data_dir}/train/train.jsonl"
val_data="${data_dir}/val/val.jsonl"
# generate train.jsonl and val.jsonl from wav.scp and text.txt
scp2jsonl \
++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt"]' \
++scp_file_list='["../../../data/train/train_wav.scp", "../../../data/train/train_text.txt"]' \
++data_type_list='["source", "target"]' \
++jsonl_file_out="${train_data}"
scp2jsonl \
++scp_file_list='["../../../data/list/val_wav.scp", "../../../data/list/val_text.txt"]' \
++scp_file_list='["../../../data/val/val_wav.scp", "../../../data/val/val_text.txt"]' \
++data_type_list='["source", "target"]' \
++jsonl_file_out="${val_data}"
# exp output dir
output_dir="./outputs"
output_dir="/home/boying/IdeaProjects/FunASRxl-0313/exp/paraformer_train"
log_file="${output_dir}/log.txt"
BATCH_SIZE=16
LR=0.0005
deepspeed_config=${workspace}/../../deepspeed_conf/ds_stage1.json
@@ -56,6 +59,14 @@
echo $DISTRIBUTED_ARGS
echo "=========================================="
echo "Starting FunASR training..."
echo "📁 Data: $train_data"
echo "💾 Output directory: $output_dir"
echo "Pretrained model: $model_name_or_model_dir"
echo "🎯 Batch Size: $BATCH_SIZE, LR: $LR, Epochs: $MAX_EPOCH"
echo "=========================================="
torchrun $DISTRIBUTED_ARGS \
../../../funasr/bin/train_ds.py \
++model="${model_name_or_model_dir}" \
@@ -65,7 +76,7 @@
++dataset_conf.index_ds="IndexDSJsonl" \
++dataset_conf.data_split_num=1 \
++dataset_conf.batch_sampler="BatchSampler" \
++dataset_conf.batch_size=6000  \
++dataset_conf.batch_size="${BATCH_SIZE}"  \
++dataset_conf.sort_size=1024 \
++dataset_conf.batch_type="token" \
++dataset_conf.num_workers=4 \
@@ -78,5 +89,10 @@
++train_conf.avg_nbest_model=10 \
++train_conf.use_deepspeed=false \
++train_conf.deepspeed_config=${deepspeed_config} \
++optim_conf.lr=0.0002 \
++output_dir="${output_dir}" &> ${log_file}
++optim_conf.lr="${LR}" \
++output_dir="${output_dir}" &> ${log_file}
echo "=========================================="
echo "✅ Training finished! Model saved at: $output_dir"
echo "=========================================="
examples/industrial_data_pretraining/paraformer/infer_from_local.sh
@@ -2,11 +2,11 @@
#  MIT License  (https://opensource.org/licenses/MIT)
# method2, inference from local model
export MODELSCOPE_CACHE="/home/boying/IdeaProjects/FunASRxl-0313/models/"
# for more input type, please ref to readme.md
input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav"
output_dir="./outputs/debug"
output_dir="/home/boying/IdeaProjects/FunASRxl-0313/exp/paraformer_train/debug"
workspace=`pwd`
@@ -22,7 +22,7 @@
cmvn_file="${local_path}/am.mvn"
config="config.yaml"
init_param="${local_path}/model.pt"
init_param="${local_path}/model.pt.best"
python -m funasr.bin.inference \
--config-path "${local_path}" \
gen_funasr_file.py
New file
@@ -0,0 +1,100 @@
import pandas as pd
import os
import uuid as uuid_lib  # aliased to avoid clashing with the column name
def validate_uuid(uuid_str):
    """Check whether a UUID string is well-formed."""
    try:
        uuid_lib.UUID(uuid_str)
        return True
    except ValueError:
        return False
def process_excel_to_funasr_files(excel_path, output_dir="."):
    """
    Generate the train_text.txt and train_wav.scp files that FunASR training needs from an Excel sheet.
    :param excel_path: path to the Excel file
    :param output_dir: output directory (defaults to the current directory)
    """
    # 1. Read the Excel file
    try:
        if excel_path.endswith(".xlsx"):
            df = pd.read_excel(excel_path, engine="openpyxl")
        elif excel_path.endswith(".csv"):
            df = pd.read_csv(excel_path)
        else:
            raise ValueError("Only .xlsx and .csv files are supported")
    except FileNotFoundError:
        print(f"Error: file not found: {excel_path}")
        return
    except Exception as e:
        print(f"Failed to read file: {str(e)}")
        return
    # 2. Column names (adjust these to match the actual headers in your Excel sheet)
    uuid_col = "音频唯一标识 (UUID)"
    text_col = "音频对应的文字内容"
    path_col = "音频保存的路径"
    # Check that the required columns exist
    required_cols = [uuid_col, text_col, path_col]
    missing_cols = [col for col in required_cols if col not in df.columns]
    if missing_cols:
        print(f"Error: Excel sheet is missing required columns: {missing_cols}")
        return
    # 3. Data cleaning
    # Drop rows with empty values
    df_clean = df.dropna(subset=required_cols).copy()
    # Deduplicate (by UUID)
    df_clean = df_clean.drop_duplicates(subset=[uuid_col], keep="first")
    # Filter out rows with malformed UUIDs
    df_clean["uuid_valid"] = df_clean[uuid_col].apply(validate_uuid)
    invalid_uuid_rows = df_clean[~df_clean["uuid_valid"]]
    if not invalid_uuid_rows.empty:
        print(f"Warning: {len(invalid_uuid_rows)} rows have malformed UUIDs and were filtered out:")
        print(invalid_uuid_rows[uuid_col].tolist())
    df_clean = df_clean[df_clean["uuid_valid"]].drop(columns=["uuid_valid"])
    # 4. Generate the files
    os.makedirs(output_dir, exist_ok=True)
    text_file_path = os.path.join(output_dir, "train_text.txt")
    scp_file_path = os.path.join(output_dir, "train_wav.scp")
    # Write train_text.txt
    with open(text_file_path, "w", encoding="utf-8") as f_text:
        for _, row in df_clean.iterrows():
            uuid = str(row[uuid_col]).strip()
            text = str(row[text_col]).strip()
            # Skip empty transcripts
            if text:
                f_text.write(f"{uuid} {text}\n")
    # Write train_wav.scp (keep only the final path component, uuid.wav)
    with open(scp_file_path, "w", encoding="utf-8") as f_scp:
        for _, row in df_clean.iterrows():
            uuid = str(row[uuid_col]).strip()
            full_path = str(row[path_col]).strip()
            # Extract uuid.wav; normalize \ to / first so Windows-style paths also split correctly
            wav_file = os.path.basename(full_path.replace("\\", "/"))
            # Make sure it is a .wav file
            if wav_file.endswith(".wav"):
                f_scp.write(f"{uuid} wav/{wav_file}\n")
            else:
                print(f"Warning: audio path {full_path} for {uuid} is not a .wav file, skipped")
    # 5. Print summary statistics
    print("\n=== Done ===")
    print(f"Raw rows: {len(df)}")
    print(f"Valid rows after cleaning: {len(df_clean)}")
    print(f"Generated train_text.txt: {text_file_path}")
    print(f"Generated train_wav.scp: {scp_file_path}")
# ===================== Entry point =====================
if __name__ == "__main__":
    # Change to the path of your Excel file
    EXCEL_FILE = "音频标注数据.xlsx"
    # Output directory (defaults to the current directory; can be an absolute path such as "/home/boying/funasr_data")
    OUTPUT_DIR = "./data/list/"
    process_excel_to_funasr_files(EXCEL_FILE, OUTPUT_DIR)
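The per-row transformation gen_funasr_file.py performs can be illustrated with a stdlib-only sketch; `row_to_lines` is our hypothetical helper, not a function in the script:

```python
# Sketch of the per-row Excel -> list-file transformation (stdlib only, illustrative).
import posixpath

def row_to_lines(uuid, text, audio_path):
    """Return the (text_line, scp_line) pair for one cleaned row, or None for non-.wav paths."""
    # Normalize Windows separators so basename splits correctly on any platform.
    wav_file = posixpath.basename(audio_path.replace("\\", "/"))
    if not wav_file.endswith(".wav"):
        return None
    return f"{uuid} {text}", f"{uuid} wav/{wav_file}"
```

For example, a row with path `C:\data\u1.wav` yields the scp line `u1 wav/u1.wav`, matching the `wav/` layout the training guide expects.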
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/.mdl
Binary files differ
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/.msc
Binary files differ
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/.mv
New file
@@ -0,0 +1 @@
Revision:master,CreatedAt:1706753553
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md
New file
@@ -0,0 +1,411 @@
---
tasks:
- auto-speech-recognition
domain:
- audio
model-type:
- Non-autoregressive
frameworks:
- pytorch
backbone:
- transformer/conformer
metrics:
- CER
license: Apache License 2.0
language:
- cn
tags:
- FunASR
- Paraformer
- Alibaba
- INTERSPEECH 2022
datasets:
  train:
  - 60,000 hour industrial Mandarin task
  test:
  - AISHELL-1 dev/test
  - AISHELL-2 dev_android/dev_ios/dev_mic/test_android/test_ios/test_mic
  - WenetSpeech dev/test_meeting/test_net
  - SpeechIO TIOBE
  - 60,000 hour industrial Mandarin task
indexing:
   results:
   - task:
       name: Automatic Speech Recognition
     dataset:
       name: 60,000 hour industrial Mandarin task
       type: audio    # optional
       args: 16k sampling rate, 8404 characters  # optional
     metrics:
       - type: CER
         value: 8.53%  # float
          description: greedy search, without LM, avg.
         args: default
       - type: RTF
         value: 0.0251  # float
         description: GPU inference on V100
         args: batch_size=1
widgets:
  - task: auto-speech-recognition
    model_revision: v2.0.4
    inputs:
      - type: audio
        name: input
        title: Audio
    examples:
      - name: 1
        title: Example 1
        inputs:
          - name: input
            data: git://example/asr_example.wav
    inferencespec:
      cpu: 8 # number of CPUs
      memory: 4096
finetune-support: True
---
# Highlights
- The Paraformer-large long-audio model integrates VAD, ASR, punctuation and timestamping; it can transcribe audio several hours long directly, outputting punctuated text with timestamps:
  - ASR model: the [Paraformer-large model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) is a non-autoregressive speech recognition model that achieves SOTA results on multiple public Chinese datasets and can be quickly fine-tuned and run for inference via ModelScope.
  - Hotword version: the [Paraformer-large hotword model](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) supports hotword customization, boosting the supplied hotword list to improve hotword recall and precision.
## <strong>[The FunASR open-source project](https://github.com/alibaba-damo-academy/FunASR)</strong>
<strong>[FunASR](https://github.com/alibaba-damo-academy/FunASR)</strong> aims to build a bridge between academic research and industrial application of speech recognition. By releasing the training and fine-tuning of industrial-grade speech recognition models, it lets researchers and developers study and productionize ASR models more conveniently and pushes the speech recognition ecosystem forward. Make speech recognition fun!
[**GitHub repo**](https://github.com/alibaba-damo-academy/FunASR)
| [**What's new**](https://github.com/alibaba-damo-academy/FunASR#whats-new)
| [**Installation**](https://github.com/alibaba-damo-academy/FunASR#installation)
| [**Service deployment**](https://www.funasr.com)
| [**Model zoo**](https://github.com/alibaba-damo-academy/FunASR/tree/main/model_zoo)
| [**Contact us**](https://github.com/alibaba-damo-academy/FunASR#contact)
## Model overview
Paraformer is an efficient non-autoregressive end-to-end speech recognition framework proposed by the speech team at DAMO Academy. This project is the Paraformer Chinese general-purpose model, trained on tens of thousands of hours of industrially labeled audio to ensure good general recognition quality. The model can be applied to scenarios such as voice input methods, voice navigation, and intelligent meeting minutes.
<p align="center">
<img src="fig/struct.png" alt="Paraformer model structure"  width="500" />
</p>
As shown above, Paraformer consists of five parts: Encoder, Predictor, Sampler, Decoder, and the loss function. The Encoder can use different network structures, e.g. self-attention, Conformer, or SAN-M. The Predictor is a two-layer FFN that predicts the number of target tokens and extracts the acoustic vector corresponding to each target token. The Sampler has no learnable parameters; from the input acoustic vectors and target embeddings it generates feature vectors carrying semantic information. The Decoder is structured like its autoregressive counterpart but models context bidirectionally (autoregressive decoders are unidirectional). Besides cross-entropy (CE) and the discriminative MWER objective, the loss also includes the Predictor's MAE objective.
Its core ideas are:
- Predictor module: a predictor based on Continuous Integrate-and-Fire (CIF) extracts the acoustic feature vector for each target token, giving a more accurate estimate of the number of tokens in the speech.
- Sampler: through sampling, acoustic feature vectors and target token embeddings are transformed into semantically informed feature vectors, which, together with the bidirectional Decoder, strengthen the model's context modeling.
- An MWER training criterion based on negative-sample sampling.
For more details see:
- Paper: [Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition](https://arxiv.org/abs/2206.08317)
- Paper walkthrough: [Paraformer: a single-pass non-autoregressive end-to-end ASR model with high accuracy and high compute efficiency](https://mp.weixin.qq.com/s/xQ87isj5_wxWiQs4qUXtVw)
#### Inference with ModelScope
- Supported audio input formats:
  - wav file path, e.g.: data/test/audios/asr_example.wav
  - pcm file path, e.g.: data/test/audios/asr_example.pcm
  - wav file URL, e.g.: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav
  - wav binary data as bytes, e.g. bytes read straight from a file or recorded from a microphone.
  - already-decoded audio, e.g.: audio, rate = soundfile.read("asr_example_zh.wav"), of type numpy.ndarray or torch.Tensor.
  - a wav.scp file, which must satisfy the following format:
```sh
cat wav.scp
asr_example1  data/test/audios/asr_example1.wav
asr_example2  data/test/audios/asr_example2.wav
...
```
- For a wav file URL input, the API can be called as in the following example:
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    model_revision="v2.0.4")
rec_result = inference_pipeline('https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_vad_punc_example.wav')
print(rec_result)
```
- For pcm input, the audio sampling rate must be passed when calling the API, e.g.:
```python
rec_result = inference_pipeline('https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_vad_punc_example.pcm', fs=16000)
```
- For wav input, the API can be called as in the following example:
```python
rec_result = inference_pipeline('asr_vad_punc_example.wav')
```
- For a wav.scp file input (note: the file name must end in .scp), the output_dir parameter can be added to write the recognition results to files; for example:
```python
inference_pipeline("wav.scp", output_dir='./output_dir')
```
The recognition output directory is structured as follows:
```sh
tree output_dir/
output_dir/
└── 1best_recog
    ├── score
    ├── text
1 directory, 4 files
```
score: score of the recognition path
text: file with the speech recognition results
- For already-decoded audio input, the API can be called as in the following example:
```python
import soundfile
waveform, sample_rate = soundfile.read("asr_vad_punc_example.wav")
rec_result = inference_pipeline(waveform)
```
- Free combination of the ASR, VAD and PUNC models
The VAD and punctuation (PUNC) models can be combined freely as needed:
```python
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch', model_revision="v2.0.4",
    vad_model='iic/speech_fsmn_vad_zh-cn-16k-common-pytorch', vad_model_revision="v2.0.4",
    punc_model='iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch', punc_model_revision="v2.0.3",
    # spk_model="iic/speech_campplus_sv_zh-cn_16k-common",
    # spk_model_revision="v2.0.2",
)
```
To run without the punctuation model, set punc_model="" or simply omit the punc_model argument. To add an LM, configure lm_model='damo/speech_transformer_lm_zh-cn-common-vocab8404-pytorch' and set the lm_weight and beam_size parameters.
## Inference with FunASR
Quick-start guide below; test audio: ([Chinese](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav), [English](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_en.wav))
### Command line
Run in a terminal:
```shell
funasr +model=paraformer-zh +vad_model="fsmn-vad" +punc_model="ct-punc" +input=vad_example.wav
```
Note: both single audio files and file lists are supported; lists use the Kaldi-style wav.scp format: `wav_id   wav_path`
### Python examples
#### Non-streaming speech recognition
```python
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", model_revision="v2.0.4",
                  vad_model="fsmn-vad", vad_model_revision="v2.0.4",
                  punc_model="ct-punc-c", punc_model_revision="v2.0.4",
                  # spk_model="cam++", spk_model_revision="v2.0.2",
                  )
res = model.generate(input=f"{model.model_path}/example/asr_example.wav",
            batch_size_s=300,
            hotword='魔搭')
print(res)
```
Note: `model_hub` selects the download hub: `ms` for ModelScope, `hf` for Hugging Face.
#### Streaming speech recognition
```python
from funasr import AutoModel
chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention
decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention
model = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.4")
import soundfile
import os
wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960 # 600ms
cache = {}
total_chunk_num = int(len((speech)-1)/chunk_stride+1)
for i in range(total_chunk_num):
    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
```
Note: `chunk_size` configures the streaming latency. `[0,10,5]` means the real-time display granularity is `10*60=600ms`, with `5*60=300ms` of lookahead. Each inference step consumes `600ms` of input (`16000*0.6=9600` samples) and emits the corresponding text; the last audio chunk must be sent with `is_final=True` to force the final words out.
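The chunk bookkeeping in the note above can be checked with a few lines; this sketch covers the arithmetic only, not the model API:

```python
# Verify the streaming chunk bookkeeping (illustrative).
sample_rate = 16000
frame_ms = 60                 # each chunk unit is 60 ms
chunk_size = [0, 10, 5]       # [left, current, lookahead] in 60 ms units

current_ms = chunk_size[1] * frame_ms    # on-screen output granularity, in ms
lookahead_ms = chunk_size[2] * frame_ms  # future context, in ms
chunk_stride = chunk_size[1] * 960       # samples per inference step (960 samples = 60 ms at 16 kHz)
```

The `chunk_size[1] * 960` expression matches the `chunk_stride` line in the streaming example, since `16000 * 0.06 = 960` samples per 60 ms unit.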
#### Voice activity detection (non-streaming)
```python
from funasr import AutoModel
model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
wav_file = f"{model.model_path}/example/asr_example.wav"
res = model.generate(input=wav_file)
print(res)
```
#### Voice activity detection (streaming)
```python
from funasr import AutoModel
chunk_size = 200 # ms
model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
import soundfile
wav_file = f"{model.model_path}/example/vad_example.wav"
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = int(chunk_size * sample_rate / 1000)
cache = {}
total_chunk_num = int(len((speech)-1)/chunk_stride+1)
for i in range(total_chunk_num):
    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)
    if len(res[0]["value"]):
        print(res)
```
#### æ ‡ç‚¹æ¢å¤
```python
from funasr import AutoModel
model = AutoModel(model="ct-punc", model_revision="v2.0.4")
res = model.generate(input="那今天的会就到这里吧 happy new year 明年见")
print(res)
```
#### Timestamp prediction
```python
from funasr import AutoModel
model = AutoModel(model="fa-zh", model_revision="v2.0.4")
wav_file = f"{model.model_path}/example/asr_example.wav"
text_file = f"{model.model_path}/example/text.txt"
res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)
```
More detailed usage ([examples](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining))
## Fine-tuning
Detailed usage ([examples](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining))
## Benchmark
Combining big data with large-model optimizations, Paraformer achieves current SOTA results on a series of speech recognition benchmarks. Below are results on the academic datasets AISHELL-1, AISHELL-2 and WenetSpeech, and on the SpeechIO TIOBE white-box test sets. On common academic Chinese ASR evaluation tasks its results far exceed those in currently published papers and those of models trained on the individual closed datasets alone. These numbers are for the [Paraformer-large model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell1-vocab8404-pytorch/summary) without VAD or punctuation models.
### AISHELL-1
| AISHELL-1 test                                   | w/o LM                                | w/ LM                                 |
|:------------------------------------------------:|:-------------------------------------:|:-------------------------------------:|
| <div style="width: 150pt">Espnet</div>           | <div style="width: 150pt">4.90</div>  | <div style="width: 150pt">4.70</div>  |
| <div style="width: 150pt">Wenet</div>            | <div style="width: 150pt">4.61</div>  | <div style="width: 150pt">4.36</div>  |
| <div style="width: 150pt">K2</div>               | <div style="width: 150pt">-</div>     | <div style="width: 150pt">4.26</div>  |
| <div style="width: 150pt">Blockformer</div>      | <div style="width: 150pt">4.29</div>  | <div style="width: 150pt">4.05</div>  |
| <div style="width: 150pt">Paraformer-large</div> | <div style="width: 150pt">1.95</div>  | <div style="width: 150pt">1.68</div>     |
### AISHELL-2
|           | dev_ios| test_android| test_ios|test_mic|
|:-------------------------------------------------:|:-------------------------------------:|:-------------------------------------:|:------------------------------------:|:------------------------------------:|
| <div style="width: 150pt">Espnet</div>            | <div style="width: 70pt">5.40</div>  |<div style="width: 70pt">6.10</div>  |<div style="width: 70pt">5.70</div>  |<div style="width: 70pt">6.10</div>  |
| <div style="width: 150pt">WeNet</div>             | <div style="width: 70pt">-</div>     |<div style="width: 70pt">-</div>     |<div style="width: 70pt">5.39</div>  |<div style="width: 70pt">-</div>    |
| <div style="width: 150pt">Paraformer-large</div>  | <div style="width: 70pt">2.80</div>  |<div style="width: 70pt">3.13</div>  |<div style="width: 70pt">2.85</div>  |<div style="width: 70pt">3.06</div>  |
### WenetSpeech
|           | dev| test_meeting| test_net|
|:-------------------------------------------------:|:-------------------------------------:|:-------------------------------------:|:------------------------------------:|
| <div style="width: 150pt">Espnet</div>            | <div style="width: 100pt">9.70</div>  |<div style="width: 100pt">15.90</div>  |<div style="width: 100pt">8.80</div>  |
| <div style="width: 150pt">WeNet</div>             | <div style="width: 100pt">8.60</div>  |<div style="width: 100pt">17.34</div>  |<div style="width: 100pt">9.26</div>  |
| <div style="width: 150pt">K2</div>                | <div style="width: 100pt">7.76</div>  |<div style="width: 100pt">13.41</div>  |<div style="width: 100pt">8.71</div>  |
| <div style="width: 150pt">Paraformer-large</div>  | <div style="width: 100pt">3.57</div>  |<div style="width: 100pt">6.97</div>   |<div style="width: 100pt">6.74</div>  |
### [SpeechIO TIOBE](https://github.com/SpeechColab/Leaderboard)
Paraformer-large combined with the Transformer-LM via shallow fusion achieves current SOTA results on the SpeechIO TIOBE white-box tests; the [Transformer-LM model](https://modelscope.cn/models/damo/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/summary) is open-sourced on ModelScope. Below are the SpeechIO TIOBE white-box results without LM and with the Transformer-LM:
- Decode config w/o LM:
  - Decode without LM
  - Beam size: 1
- Decode config w/ LM:
  - Decode with [Transformer-LM](https://modelscope.cn/models/damo/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/summary)
  - Beam size: 10
  - LM weight: 0.15
| testset | w/o LM | w/ LM |
|:------------------:|:----:|:----:|
|<div style="width: 200pt">SPEECHIO_ASR_ZH00001</div>| <div style="width: 150pt">0.49</div> | <div style="width: 150pt">0.35</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH00002</div>| <div style="width: 150pt">3.23</div> | <div style="width: 150pt">2.86</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH00003</div>| <div style="width: 150pt">1.13</div> | <div style="width: 150pt">0.80</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH00004</div>| <div style="width: 150pt">1.33</div> | <div style="width: 150pt">1.10</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH00005</div>| <div style="width: 150pt">1.41</div> | <div style="width: 150pt">1.18</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH00006</div>| <div style="width: 150pt">5.25</div> | <div style="width: 150pt">4.85</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH00007</div>| <div style="width: 150pt">5.51</div> | <div style="width: 150pt">4.97</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH00008</div>| <div style="width: 150pt">3.69</div> | <div style="width: 150pt">3.18</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH00009</div>| <div style="width: 150pt">3.02</div> | <div style="width: 150pt">2.78</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH000010</div>| <div style="width: 150pt">3.35</div> | <div style="width: 150pt">2.99</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH000011</div>| <div style="width: 150pt">1.54</div> | <div style="width: 150pt">1.25</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH000012</div>| <div style="width: 150pt">2.06</div> | <div style="width: 150pt">1.68</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH000013</div>| <div style="width: 150pt">2.57</div> | <div style="width: 150pt">2.25</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH000014</div>| <div style="width: 150pt">3.86</div> | <div style="width: 150pt">3.08</div> |
|<div style="width: 200pt">SPEECHIO_ASR_ZH000015</div>| <div style="width: 150pt">3.34</div> | <div style="width: 150pt">2.67</div> |
## Usage and scope
Supported platforms
- Runs on Linux x86_64, macOS and Windows.
Usage modes
- Direct inference: decode input audio directly and output the target text.
- Fine-tuning: load the trained model and continue training on private or open-source data.
Intended scenarios
- Suited to offline speech recognition, such as transcribing recorded files; GPU inference works even better; input length is unrestricted and may be several hours of audio.
## Model limitations and possible bias
Differences in feature extraction pipelines and tooling can shift CER slightly (<0.1%), and RTF figures vary with the GPU inference environment.
## Related papers and citation
```BibTeX
@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}
```
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/am.mvn
New file
@@ -0,0 +1,8 @@
<Nnet>
<Splice> 560 560
[ 0 ]
<AddShift> 560 560
<LearnRateCoef> 0 [ -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 
-13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 
-8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 -8.311879 -8.600912 -9.615928 -10.43595 -11.21292 -11.88333 -12.36243 -12.63706 -12.8818 -12.83066 -12.89103 -12.95666 -13.19763 -13.40598 -13.49113 -13.5546 -13.55639 -13.51915 -13.68284 -13.53289 -13.42107 -13.65519 -13.50713 -13.75251 -13.76715 -13.87408 -13.73109 -13.70412 -13.56073 -13.53488 -13.54895 -13.56228 -13.59408 -13.62047 -13.64198 -13.66109 -13.62669 -13.58297 -13.57387 -13.4739 -13.53063 -13.48348 -13.61047 -13.64716 -13.71546 -13.79184 -13.90614 -14.03098 -14.18205 -14.35881 -14.48419 -14.60172 -14.70591 -14.83362 -14.92122 -15.00622 -15.05122 -15.03119 -14.99028 -14.92302 -14.86927 -14.82691 -14.7972 -14.76909 -14.71356 -14.61277 -14.51696 -14.42252 -14.36405 -14.30451 -14.23161 -14.19851 -14.16633 -14.15649 -14.10504 -13.99518 -13.79562 -13.3996 -12.7767 -11.71208 ]
<Rescale> 560 560
<LearnRateCoef> 0 [ 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 
0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 
0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 0.155775 0.154484 0.1527379 0.1518718 0.1506028 0.1489256 0.147067 0.1447061 0.1436307 0.1443568 0.1451849 0.1455157 0.1452821 0.1445717 0.1439195 0.1435867 0.1436018 0.1438781 0.1442086 0.1448844 0.1454756 0.145663 0.146268 0.1467386 0.1472724 0.147664 0.1480913 0.1483739 0.1488841 0.1493636 0.1497088 0.1500379 0.1502916 0.1505389 0.1506787 0.1507102 0.1505992 0.1505445 0.1505938 0.1508133 0.1509569 0.1512396 0.1514625 0.1516195 0.1516156 0.1515561 0.1514966 0.1513976 0.1512612 0.151076 0.1510596 0.1510431 0.151077 0.1511168 0.1511917 0.151023 0.1508045 0.1505885 0.1503493 0.1502373 0.1501726 0.1500762 0.1500065 0.1499782 0.150057 0.1502658 0.150469 0.1505335 0.1505505 0.1505328 0.1504275 0.1502438 0.1499674 0.1497118 0.1494661 0.1493102 0.1493681 0.1495501 0.1499738 0.1509654 ]
</Nnet>
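The `<AddShift>`/`<Rescale>` statistics above implement Kaldi-style CMVN: each 560-dim low-frame-rate feature vector is normalized element-wise as `y = (x + shift) * scale`. A minimal sketch, using toy 2-dim values for illustration (the real file carries 560 per-dimension stats):

```python
def apply_cmvn(frames, shift, scale):
    """Apply AddShift then Rescale per dimension, as encoded in am.mvn."""
    return [[(x + s) * r for x, s, r in zip(frame, shift, scale)]
            for frame in frames]

# Toy 2-dim example; real shift/scale are the 560-dim vectors above.
shift = [-8.311879, -8.600912]
scale = [0.155775, 0.154484]
frames = [[8.311879, 8.600912]]
print(apply_cmvn(frames, shift, scale))  # → [[0.0, 0.0]]
```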
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/config.yaml
New file
@@ -0,0 +1,134 @@
# This is an example that demonstrates how to configure a model file.
# You can modify the configuration according to your own requirements.
# to print the register_table:
# from funasr.register import tables
# tables.print()
# network architecture
#model: funasr.models.paraformer.model:Paraformer
model: BiCifParaformer
model_conf:
    ctc_weight: 0.0
    lsm_weight: 0.1
    length_normalized_loss: true
    predictor_weight: 1.0
    predictor_bias: 1
    sampling_ratio: 0.75
# encoder
encoder: SANMEncoder
encoder_conf:
    output_size: 512
    attention_heads: 4
    linear_units: 2048
    num_blocks: 50
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: pe
    pos_enc_class: SinusoidalPositionEncoder
    normalize_before: true
    kernel_size: 11
    sanm_shfit: 0
    selfattention_layer_type: sanm
# decoder
decoder: ParaformerSANMDecoder
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 16
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.1
    src_attention_dropout_rate: 0.1
    att_layer_num: 16
    kernel_size: 11
    sanm_shfit: 0
predictor: CifPredictorV3
predictor_conf:
    idim: 512
    threshold: 1.0
    l_order: 1
    r_order: 1
    tail_threshold: 0.45
    smooth_factor2: 0.25
    noise_threshold2: 0.01
    upsample_times: 3
    use_cif1_cnn: false
    upsample_type: cnn_blstm
# frontend related
frontend: WavFrontend
frontend_conf:
    fs: 16000
    window: hamming
    n_mels: 80
    frame_length: 25
    frame_shift: 10
    lfr_m: 7
    lfr_n: 6
specaug: SpecAugLFR
specaug_conf:
    apply_time_warp: false
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    lfr_rate: 6
    num_freq_mask: 1
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 12
    num_time_mask: 1
train_conf:
  accum_grad: 1
  grad_clip: 5
  max_epoch: 150
  val_scheduler_criterion:
      - valid
      - acc
  best_model_criterion:
  -   - valid
      - acc
      - max
  keep_nbest_models: 10
  log_interval: 50
optim: adam
optim_conf:
   lr: 0.0005
scheduler: warmuplr
scheduler_conf:
   warmup_steps: 30000
dataset: AudioDataset
dataset_conf:
    index_ds: IndexDSJsonl
    batch_sampler: DynamicBatchLocalShuffleSampler
    batch_type: example # example or length
    batch_size: 1 # if batch_type is "example", batch_size is the number of samples; if "length", it is source_token_len+target_token_len
    max_token_length: 2048 # filter out samples whose source_token_len+target_token_len > max_token_length
    buffer_size: 500
    shuffle: True
    num_workers: 0
tokenizer: CharTokenizer
tokenizer_conf:
  unk_symbol: <unk>
  split_with_space: true
ctc_conf:
    dropout_rate: 0.0
    ctc_type: builtin
    reduce: true
    ignore_nan_grad: true
normalize: null
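The `frontend_conf` settings above determine the encoder input shape: with `n_mels: 80`, `lfr_m: 7`, `lfr_n: 6` and a 10 ms frame shift, each low-frame-rate frame stacks 7 fbank frames (80 × 7 = 560 dims, matching the 560-dim `am.mvn` stats) and advances 60 ms. A quick check:

```python
n_mels = 80            # frontend_conf: n_mels
lfr_m, lfr_n = 7, 6    # LFR: stack 7 frames, hop 6 frames
frame_shift_ms = 10    # frontend_conf: frame_shift

feat_dim = n_mels * lfr_m              # encoder input dimension
lfr_shift_ms = frame_shift_ms * lfr_n  # effective frame shift after LFR
print(feat_dim, lfr_shift_ms)  # → 560 60
```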
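The `dataset_conf` comments describe a length filter: samples whose `source_token_len + target_token_len` exceeds `max_token_length` are dropped before batching. An illustrative sketch (field names follow the config comments, not necessarily FunASR's internals):

```python
def filter_by_length(samples, max_token_length=2048):
    """Drop samples whose combined token length exceeds the limit."""
    return [s for s in samples
            if s["source_token_len"] + s["target_token_len"] <= max_token_length]

samples = [
    {"key": "utt1", "source_token_len": 1500, "target_token_len": 400},  # 1900, kept
    {"key": "utt2", "source_token_len": 1900, "target_token_len": 300},  # 2200, dropped
]
print([s["key"] for s in filter_by_length(samples)])  # → ['utt1']
```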
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/configuration.json
New file
@@ -0,0 +1,17 @@
{
  "framework": "pytorch",
  "task" : "auto-speech-recognition",
  "model": {"type" : "funasr"},
  "pipeline": {"type":"funasr-pipeline"},
  "vad_model": "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
  "punc_model": "iic/punc_ct-transformer_cn-en-common-vocab471067-large",
  "lm_model": "iic/speech_transformer_lm_zh-cn-common-vocab8404-pytorch",
  "model_name_in_hub": {
    "ms":"iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    "hf":""},
  "file_path_metas": {
    "init_param":"model.pt",
    "config":"config.yaml",
    "tokenizer_conf": {"token_list": "tokens.json", "seg_dict_file": "seg_dict"},
    "frontend_conf":{"cmvn_file": "am.mvn"}}
}
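`file_path_metas` maps config keys to file names inside the downloaded model directory. A hypothetical resolver sketches the idea (this is not FunASR's actual loader; names are illustrative):

```python
import os

def resolve_metas(model_dir, metas):
    """Recursively join relative file names in file_path_metas onto model_dir."""
    resolved = {}
    for key, val in metas.items():
        resolved[key] = (resolve_metas(model_dir, val) if isinstance(val, dict)
                         else os.path.join(model_dir, val))
    return resolved

metas = {"init_param": "model.pt",
         "config": "config.yaml",
         "frontend_conf": {"cmvn_file": "am.mvn"}}
paths = resolve_metas("models/iic/paraformer-large", metas)
print(paths["frontend_conf"]["cmvn_file"])
```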
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav
Binary files differ
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/fig/struct.png
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
Binary files differ
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/seg_dict
New file
Diff too large
models/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/tokens.json
New file
Diff too large
音频标注数据.xlsx
Binary files differ