nichongjia-2007
2023-07-07 1ce704d8c09bd4d4c7e5ab087f951f31fad9fca6
Merge branch 'main' of https://github.com/alibaba-damo-academy/FunASR
37 files modified
9 files added
2 files renamed
1 file deleted
1896 lines changed
Acknowledge 8
README.md 10
README_zh.md 207
docs/images/dingding.jpg
docs/model_zoo/modelscope_models.md 2
docs/reference/papers.md 1
egs/aishell/bat/README.md 16
egs/aishell/bat/conf/decode_bat_conformer.yaml 1
egs/aishell/bat/conf/train_conformer_bat.yaml 108
egs/aishell/bat/local/aishell_data_prep.sh 66
egs/aishell/bat/path.sh 5
egs/aishell/bat/run.sh 210
egs/aishell/bat/utils 1
egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/demo.py 4
funasr/bin/asr_infer.py 14
funasr/bin/asr_inference_launch.py 27
funasr/bin/build_trainer.py 4
funasr/bin/diar_infer.py 2
funasr/bin/diar_inference_launch.py 15
funasr/datasets/large_datasets/dataset.py 18
funasr/datasets/large_datasets/utils/hotword_utils.py 3
funasr/datasets/large_datasets/utils/tokenize.py 12
funasr/models/decoder/contextual_decoder.py 3
funasr/models/e2e_asr_contextual_paraformer.py 4
funasr/models/e2e_asr_paraformer.py 2
funasr/models/encoder/rnn_encoder.py 4
funasr/modules/data2vec/data_utils.py 2
funasr/modules/frontends/mask_estimator.py 2
funasr/modules/nets_utils.py 8
funasr/runtime/deploy_tools/funasr-runtime-deploy-offline-cpu-zh.sh 583
funasr/runtime/docs/SDK_advanced_guide_offline.md 2
funasr/runtime/docs/SDK_advanced_guide_offline_zh.md 230
funasr/runtime/docs/SDK_tutorial.md 6
funasr/runtime/docs/SDK_tutorial_zh.md 47
funasr/runtime/html5/readme.md 2
funasr/runtime/html5/readme_cn.md 2
funasr/runtime/python/onnxruntime/setup.py 2
funasr/runtime/python/websocket/README.md 29
funasr/runtime/python/websocket/funasr_wss_client.py 93
funasr/runtime/python/websocket/funasr_wss_server.py 58
funasr/runtime/python/websocket/parse_args.py 50
funasr/runtime/python/websocket/requirements_client.txt 2
funasr/runtime/readme_cn.md 4
funasr/runtime/run_server.sh
funasr/runtime/websocket/CMakeLists.txt 2
funasr/runtime/websocket/funasr-wss-client.cpp 5
funasr/utils/misc.py 2
funasr/version.txt 2
tests/test_asr_inference_pipeline.py 16
Acknowledge
New file
@@ -0,0 +1,8 @@
## Acknowledge
1. We borrowed a lot of code from [Kaldi](http://kaldi-asr.org/) for data preparation.
2. We borrowed a lot of code from [ESPnet](https://github.com/espnet/espnet). FunASR follows the training and fine-tuning pipelines of ESPnet.
3. We referred to [Wenet](https://github.com/wenet-e2e/wenet) when building the dataloader for large-scale data training.
4. We acknowledge [ChinaTelecom](https://github.com/zhuzizyf/damo-fsmn-vad-infer-httpserver) for contributing the VAD runtime.
5. We acknowledge [RapidAI](https://github.com/RapidAI) for contributing the Paraformer and CT_Transformer-punc runtime.
6. We acknowledge [AiHealthx](http://www.aihealthx.com/) for contributing the websocket service and html5.
README.md
@@ -1,5 +1,7 @@
[//]: # (<div align="left"><img src="docs/images/funasr_logo.jpg" width="400"/></div>)
([简体中文](./README_zh.md)|English)
# FunASR: A Fundamental End-to-End Speech Recognition Toolkit
<p align="left">
    <a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Win%2C%20Mac-brightgreen.svg"></a>
@@ -23,7 +25,7 @@
### FunASR runtime-SDK
-- 2023.07.02:
+- 2023.07.03:
We have released FunASR runtime-SDK-0.1.0; the file transcription service (Mandarin) is now supported ([ZH](funasr/runtime/readme_cn.md)/[EN](funasr/runtime/readme.md))
### Multi-Channel Multi-Party Meeting Transcription 2.0 (M2MeT2.0) Challenge
@@ -109,13 +111,13 @@
For the server:
```shell
cd funasr/runtime/python/websocket
-python wss_srv_asr.py --port 10095
+python funasr_wss_server.py --port 10095
```
For the client:
```shell
-python wss_client_asr.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "5,10,5"
-#python wss_client_asr.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "8,8,4" --audio_in "./data/wav.scp" --output_dir "./results"
+python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "5,10,5"
+#python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "8,8,4" --audio_in "./data/wav.scp" --output_dir "./results"
```
More examples can be found in the [docs](https://alibaba-damo-academy.github.io/FunASR/en/runtime/websocket_python.html#id2)
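The client streams audio to the server in small fixed-duration chunks. A minimal sketch of that chunking step, assuming 16 kHz 16-bit mono PCM and a hypothetical 60 ms chunk duration (the real `funasr_wss_client.py` derives its stride from `--chunk_size` and the server protocol):

```python
def pcm_chunks(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 60,
               bytes_per_sample: int = 2) -> list:
    """Split raw mono PCM into fixed-duration chunks for streaming.

    The last chunk may be shorter; a real client would flag it as final
    so the server can flush its decoder state and emit the last result.
    """
    chunk_bytes = sample_rate * chunk_ms // 1000 * bytes_per_sample
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]

# one second of silence (32000 bytes) -> seventeen 60 ms chunks, last one partial
chunks = pcm_chunks(b"\x00" * 32000)
```

In a real client each chunk would be sent over the websocket in order, with a short sleep between sends to simulate real-time capture.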
## Contact
README_zh.md
New file
@@ -0,0 +1,207 @@
[//]: # (<div align="left"><img src="docs/images/funasr_logo.jpg" width="400"/></div>)
(Simplified Chinese|[English](./README.md))
# FunASR: A Fundamental End-to-End Speech Recognition Toolkit
<p align="left">
    <a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Win%2C%20Mac-brightgreen.svg"></a>
    <a href=""><img src="https://img.shields.io/badge/Python->=3.7,<=3.10-aff.svg"></a>
    <a href=""><img src="https://img.shields.io/badge/Pytorch-%3E%3D1.11-blue"></a>
</p>
FunASR aims to build a bridge between academic research and industrial applications of speech recognition. By supporting the training and fine-tuning of the industrial-grade speech recognition models released on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), it lets researchers and developers conduct research on and production of speech recognition models more conveniently, and promotes the growth of the speech recognition ecosystem. ASR for fun!
<div align="center">
<h4>
<a href="#最新动态"> What's New </a>
|<a href="#安装教程"> Installation </a>
|<a href="#快速开始"> Quick Start </a>
|<a href="https://alibaba-damo-academy.github.io/FunASR/en/index.html"> Tutorial Docs </a>
|<a href="#核心功能"> Highlights </a>
|<a href="./docs/model_zoo/modelscope_models.md"> Model Zoo </a>
|<a href="./funasr/runtime/readme_cn.md"> Service Deployment </a>
|<a href="#联系我们"> Contact Us </a>
</h4>
</div>
<a name="最新动态"></a>
## What's New
### Runtime SDK
- 2023.07.03:
The Chinese offline file transcription service (CPU edition) has been released, supporting one-click deployment and testing ([docs](funasr/runtime/readme_cn.md))
### ASRU 2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0
For details, see the documentation ([docs](https://alibaba-damo-academy.github.io/FunASR/m2met2_cn/index.html))
### Academic model updates
### Industrial model updates
- 2023/07/06
<a name="核心功能"></a>
## Highlights
- FunASR is a fundamental speech recognition toolkit offering a variety of features, including speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker conversational ASR.
- We have released a large number of academic and industrial pretrained models on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), accessible through our [model zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md). The representative [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) model achieves SOTA performance on many speech recognition tasks.
- FunASR provides an easy-to-use interface for running inference and fine-tuning directly on models hosted in ModelScope. Moreover, FunASR's optimized dataloader speeds up training on large-scale datasets.
<a name="安装教程"></a>
## Installation
Install the released package directly:
```shell
pip3 install -U funasr
# For users in mainland China, if you run into network issues, use:
# pip3 install -U funasr -i https://mirror.sjtu.edu.cn/pypi/web/simple
```
You can also install from source:
``` sh
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./
# For users in mainland China, if you run into network issues, use:
# pip3 install -e ./ -i https://mirror.sjtu.edu.cn/pypi/web/simple
```
If you want to use the pretrained models released on ModelScope, you also need to install ModelScope:
```shell
pip3 install -U modelscope
# For users in mainland China, if you run into network issues, use:
# pip3 install -U modelscope -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html -i https://mirror.sjtu.edu.cn/pypi/web/simple
```
For a more detailed installation guide, see ([docs](https://alibaba-damo-academy.github.io/FunASR/en/installation/installation.html))
<a name="快速开始"></a>
## Quick Start
You can use FunASR in the following ways:
- Runtime SDK for service deployment
- Industrial model recipes
- Academic model recipes
### Runtime SDK
#### Python example
Supports real-time streaming speech recognition, corrects the streaming result with a non-streaming model, and outputs punctuated text. Currently only a single client is supported; for concurrent requests, see the C++ runtime SDK below.
##### Server deployment
```shell
cd funasr/runtime/python/websocket
python funasr_wss_server.py --port 10095
```
##### Client test
```shell
python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "5,10,5"
#python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "8,8,4" --audio_in "./data/wav.scp"
```
More examples can be found in the [docs](https://alibaba-damo-academy.github.io/FunASR/en/runtime/websocket_python.html#id2)
<a name="cpp版本示例"></a>
#### C++ example
The offline file transcription service (CPU) is now supported and can handle hundreds of concurrent requests.
##### Server deployment
Deploy the service in one step with the following commands:
```shell
curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-offline-cpu-zh.sh
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh install --workspace ./funasr-runtime-resources
```
##### Client test
```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
More examples can be found in the [docs](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/docs/SDK_tutorial_zh.md)
### Industrial model recipes
If you want to run inference or fine-tuning with the industrial pretrained models on ModelScope, you can refer to the following code:
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
)
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
# {'text': '欢迎大家来体验达摩院推出的语音识别模型'}
```
More examples can be found in the [docs](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html)
### Academic model recipes
If you want to train from scratch, usually for academic models, you can launch training and inference with the following commands:
```shell
cd egs/aishell/paraformer
. ./run.sh --CUDA_VISIBLE_DEVICES="0,1" --gpu_num=2
```
More examples can be found in the [docs](https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html)
<a name="联系我们"></a>
## Contact Us
If you run into problems while using FunASR, you can reach us through the following channels:
- Email: [funasr@list.alibaba-inc.com](mailto:funasr@list.alibaba-inc.com)
|                             DingTalk group                             |                        WeChat                         |
|:---------------------------------------------------------------------:|:-----------------------------------------------------:|
| <div align="left"><img src="docs/images/dingding.jpg" width="250"/>   | <img src="docs/images/wechat.png" width="232"/></div> |
## Community Contributors
| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div>  | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> |
|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|
For the list of contributors, see ([Acknowledge](./Acknowledge))
## License
This project is licensed under [The MIT License](https://opensource.org/licenses/MIT). For the industrial model license, see ([MODEL_LICENSE](./MODEL_LICENSE))
## Stargazers over time
[![Stargazers over time](https://starchart.cc/alibaba-damo-academy/FunASR.svg)](https://starchart.cc/alibaba-damo-academy/FunASR)
## Citations
``` bibtex
@inproceedings{gao2023funasr,
  author={Zhifu Gao and Zerui Li and Jiaming Wang and Haoneng Luo and Xian Shi and Mengzhe Chen and Yabin Li and Lingyun Zuo and Zhihao Du and Zhangyu Xiao and Shiliang Zhang},
  title={FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  year={2023},
  booktitle={INTERSPEECH},
}
@inproceedings{gao22b_interspeech,
  author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
  title={{Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={2063--2067},
  doi={10.21437/Interspeech.2022-9996}
}
```
docs/images/dingding.jpg

docs/model_zoo/modelscope_models.md
@@ -15,7 +15,7 @@
|                                                                     Model Name                                                                     | Language |          Training Data           | Vocab Size | Parameter | Offline/Online | Notes                                                                                                                           |
|:--------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:--------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
|        [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)        | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Duration of input wav <= 20s                                                                                                    |
-| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Which ould deal with arbitrary length input wav                                                                                 |
+| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Which would deal with arbitrary length input wav                                                                                 |
| [Paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Which supports the hotword customization based on the incentive enhancement, and improves the recall and precision of hotwords. |
|              [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary)              | CN & EN  | Alibaba Speech Data (50000hours) |    8358    |    68M    |    Offline     | Duration of input wav <= 20s                                                                                                    |
|           [Paraformer-online](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary)           | CN & EN  | Alibaba Speech Data (50000hours) |    8404    |    68M    |     Online     | Which could deal with streaming input                                                                                           |
docs/reference/papers.md
@@ -4,6 +4,7 @@
### Speech Recognition
- [FunASR: A Fundamental End-to-End Speech Recognition Toolkit](https://arxiv.org/abs/2305.11013), INTERSPEECH 2023
+- [BAT: Boundary aware transducer for memory-efficient and low-latency ASR](https://arxiv.org/abs/2305.11571), INTERSPEECH 2023
- [Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition](https://arxiv.org/abs/2206.08317), INTERSPEECH 2022
- [Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model](https://arxiv.org/abs/2010.14099), arXiv preprint arXiv:2010.14099, 2020.
- [San-m: Memory equipped self-attention for end-to-end speech recognition](https://arxiv.org/pdf/2006.01713), INTERSPEECH 2020
egs/aishell/bat/README.md
New file
@@ -0,0 +1,16 @@
# Boundary Aware Transducer (BAT) Result
## Training Config
- 8 GPUs (Tesla V100)
- Feature info: 80-dim fbank, global CMVN, speed perturbation (0.9, 1.0, 1.1), SpecAugment
- Train config: conf/train_conformer_bat.yaml
- LM config: LM was not used
- Model size: 90M
## Results (CER)
- Decode config: conf/decode_bat_conformer.yaml
|   testset   |  CER(%) |
|:-----------:|:-------:|
|     dev     |  4.56   |
|    test     |  4.97   |
egs/aishell/bat/conf/decode_bat_conformer.yaml
New file
@@ -0,0 +1 @@
beam_size: 10
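`beam_size: 10` bounds how many partial hypotheses the decoder keeps alive at each step. A toy illustration of that pruning on hand-made per-step log-probabilities (a generic beam search sketch, not the transducer search FunASR actually runs):

```python
import math

def beam_search(step_logprobs, beam_size):
    """Keep only the beam_size best partial hypotheses after each step.

    step_logprobs: list of dicts mapping token -> log-probability.
    Returns (tokens, total_logprob) of the best complete hypothesis.
    """
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    for logprobs in step_logprobs:
        candidates = [
            (tokens + (tok,), score + lp)
            for tokens, score in beams
            for tok, lp in logprobs.items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]  # prune to the beam width
    return beams[0]

steps = [
    {"a": math.log(0.6), "b": math.log(0.4)},
    {"a": math.log(0.1), "b": math.log(0.9)},
]
best = beam_search(steps, beam_size=2)  # greedy would also need "a" then "b" here
```

A larger beam explores more alternatives at higher decoding cost; `beam_size: 10` is a common middle ground for AISHELL-scale models.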
egs/aishell/bat/conf/train_conformer_bat.yaml
New file
@@ -0,0 +1,108 @@
encoder: chunk_conformer
encoder_conf:
      activation_type: swish
      time_reduction_factor: 2
      embed_vgg_like: false
      subsampling_factor: 4
      linear_units: 2048
      output_size: 512
      attention_heads: 8
      dropout_rate: 0.5
      positional_dropout_rate: 0.5
      attention_dropout_rate: 0.5
      cnn_module_kernel: 15
      num_blocks: 12
# decoder related
rnnt_decoder: rnnt
rnnt_decoder_conf:
    embed_size: 512
    hidden_size: 512
    embed_dropout_rate: 0.5
    dropout_rate: 0.5
    use_embed_mask: true
predictor: bat_predictor
predictor_conf:
  idim: 512
  threshold: 1.0
  l_order: 1
  r_order: 1
  return_accum: true
joint_network_conf:
    joint_space_size: 512
# frontend related
frontend: wav_frontend
frontend_conf:
    fs: 16000
    window: hamming
    n_mels: 80
    frame_length: 25
    frame_shift: 10
    lfr_m: 1
    lfr_n: 1
# Auxiliary CTC
model: bat
model_conf:
    auxiliary_ctc_weight: 0.0
    cif_weight: 1.0
    r_d: 3
    r_u: 5
# minibatch related
use_amp: true
# optimization related
accum_grad: 1
grad_clip: 5
max_epoch: 100
val_scheduler_criterion:
    - valid
    - loss
best_model_criterion:
-   - valid
    - cer_transducer
    - min
keep_nbest_models: 10
optim: adam
optim_conf:
   lr: 0.001
scheduler: warmuplr
scheduler_conf:
   warmup_steps: 25000
specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 40
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 50
    num_time_mask: 5
dataset_conf:
    data_names: speech,text
    data_types: sound,text
    shuffle: True
    shuffle_conf:
        shuffle_size: 2048
        sort_size: 500
    batch_conf:
        batch_type: token
        batch_size: 25000
    num_workers: 8
log_interval: 50
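The `warmuplr` entry with `warmup_steps: 25000` ramps the learning rate up before decaying it. A sketch of the Noam-style rule used by ESPnet-lineage trainers, peaking at exactly `lr` when `step == warmup_steps` (the exact formula is an assumption based on ESPnet's `WarmupLR`, not read from this commit):

```python
def warmup_lr(base_lr: float, step: int, warmup_steps: int) -> float:
    """Noam-style warmup: linear ramp to base_lr, then inverse-sqrt decay."""
    step = max(step, 1)  # avoid division by zero at step 0
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5,
                                               step * warmup_steps ** -1.5)

# with lr=0.001 and warmup_steps=25000 as in the config above:
# half the peak rate midway through warmup, peak at 25000, then decay
mid, peak, late = (warmup_lr(0.001, s, 25000) for s in (12500, 25000, 100000))
```

Warmup keeps early updates small while batch statistics are still noisy, which matters for large-batch token-bucketed training like the `batch_size: 25000` setting above.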
egs/aishell/bat/local/aishell_data_prep.sh
New file
@@ -0,0 +1,66 @@
#!/bin/bash
# Copyright 2017 Xingyu Na
# Apache 2.0
#. ./path.sh || exit 1;
if [ $# != 3 ]; then
  echo "Usage: $0 <audio-path> <text-path> <output-path>"
  echo " $0 /export/a05/xna/data/data_aishell/wav /export/a05/xna/data/data_aishell/transcript data"
  exit 1;
fi
aishell_audio_dir=$1
aishell_text=$2/aishell_transcript_v0.8.txt
output_dir=$3
train_dir=$output_dir/data/local/train
dev_dir=$output_dir/data/local/dev
test_dir=$output_dir/data/local/test
tmp_dir=$output_dir/data/local/tmp
mkdir -p $train_dir
mkdir -p $dev_dir
mkdir -p $test_dir
mkdir -p $tmp_dir
# data directory check
if [ ! -d $aishell_audio_dir ] || [ ! -f $aishell_text ]; then
  echo "Error: $0 requires a valid audio directory and transcript file"
  exit 1;
fi
# find wav audio file for train, dev and test resp.
find $aishell_audio_dir -iname "*.wav" > $tmp_dir/wav.flist
n=`cat $tmp_dir/wav.flist | wc -l`
[ $n -ne 141925 ] && \
  echo "Warning: expected 141925 data files, found $n"
grep -i "wav/train" $tmp_dir/wav.flist > $train_dir/wav.flist || exit 1;
grep -i "wav/dev" $tmp_dir/wav.flist > $dev_dir/wav.flist || exit 1;
grep -i "wav/test" $tmp_dir/wav.flist > $test_dir/wav.flist || exit 1;
rm -r $tmp_dir
# Transcriptions preparation
for dir in $train_dir $dev_dir $test_dir; do
  echo Preparing $dir transcriptions
  sed -e 's/\.wav//' $dir/wav.flist | awk -F '/' '{print $NF}' > $dir/utt.list
  paste -d' ' $dir/utt.list $dir/wav.flist > $dir/wav.scp_all
  utils/filter_scp.pl -f 1 $dir/utt.list $aishell_text > $dir/transcripts.txt
  awk '{print $1}' $dir/transcripts.txt > $dir/utt.list
  utils/filter_scp.pl -f 1 $dir/utt.list $dir/wav.scp_all | sort -u > $dir/wav.scp
  sort -u $dir/transcripts.txt > $dir/text
done
mkdir -p $output_dir/data/train $output_dir/data/dev $output_dir/data/test
for f in wav.scp text; do
  cp $train_dir/$f $output_dir/data/train/$f || exit 1;
  cp $dev_dir/$f $output_dir/data/dev/$f || exit 1;
  cp $test_dir/$f $output_dir/data/test/$f || exit 1;
done
echo "$0: AISHELL data preparation succeeded"
exit 0;
egs/aishell/bat/path.sh
New file
@@ -0,0 +1,5 @@
export FUNASR_DIR=$PWD/../../..
# NOTE(kan-bayashi): Use UTF-8 in Python to avoid UnicodeDecodeError when LC_ALL=C
export PYTHONIOENCODING=UTF-8
export PATH=$FUNASR_DIR/funasr/bin:$PATH
egs/aishell/bat/run.sh
New file
@@ -0,0 +1,210 @@
#!/usr/bin/env bash
. ./path.sh || exit 1;
# machines configuration
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
gpu_num=8
count=1
gpu_inference=true  # Whether to perform gpu decoding, set false for cpu decoding
# for gpu decoding, inference_nj=ngpu*njob; for cpu decoding, inference_nj=njob
njob=5
train_cmd=utils/run.pl
infer_cmd=utils/run.pl
# general configuration
feats_dir="../DATA" # feature output directory
exp_dir="."
lang=zh
token_type=char
type=sound
scp=wav.scp
speed_perturb="0.9 1.0 1.1"
stage=0
stop_stage=5
# feature configuration
feats_dim=80
nj=64
# data
raw_data=../raw_data
data_url=www.openslr.org/resources/33
# exp tag
tag="exp1"
. utils/parse_options.sh || exit 1;
# Set bash to 'debug' mode, it will exit on :
# -e 'error', -u 'undefined variable', -o ... 'error in pipeline', -x 'print commands',
set -e
set -u
set -o pipefail
train_set=train
valid_set=dev
test_sets="dev test"
asr_config=conf/train_conformer_bat.yaml
model_dir="baseline_$(basename "${asr_config}" .yaml)_${lang}_${token_type}_${tag}"
inference_config=conf/decode_bat_conformer.yaml
inference_asr_model=valid.cer_transducer.ave_10best.pb
# you can set gpu num for decoding here
gpuid_list=$CUDA_VISIBLE_DEVICES  # set gpus for decoding, the same as training stage by default
ngpu=$(echo $gpuid_list | awk -F "," '{print NF}')
if ${gpu_inference}; then
    inference_nj=$((ngpu * njob))
    _ngpu=1
else
    inference_nj=$njob
    _ngpu=0
fi
if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
    echo "stage -1: Data Download"
    local/download_and_untar.sh ${raw_data} ${data_url} data_aishell
    local/download_and_untar.sh ${raw_data} ${data_url} resource_aishell
fi
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    echo "stage 0: Data preparation"
    # Data preparation
    local/aishell_data_prep.sh ${raw_data}/data_aishell/wav ${raw_data}/data_aishell/transcript ${feats_dir}
    for x in train dev test; do
        cp ${feats_dir}/data/${x}/text ${feats_dir}/data/${x}/text.org
        paste -d " " <(cut -f 1 -d" " ${feats_dir}/data/${x}/text.org) <(cut -f 2- -d" " ${feats_dir}/data/${x}/text.org | tr -d " ") \
            > ${feats_dir}/data/${x}/text
        utils/text2token.py -n 1 -s 1 ${feats_dir}/data/${x}/text > ${feats_dir}/data/${x}/text.org
        mv ${feats_dir}/data/${x}/text.org ${feats_dir}/data/${x}/text
    done
fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    echo "stage 1: Feature and CMVN Generation"
    utils/compute_cmvn.sh --cmd "$train_cmd" --nj $nj --feats_dim ${feats_dim} ${feats_dir}/data/${train_set}
fi
token_list=${feats_dir}/data/${lang}_token_list/char/tokens.txt
echo "dictionary: ${token_list}"
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    echo "stage 2: Dictionary Preparation"
    mkdir -p ${feats_dir}/data/${lang}_token_list/char/
    echo "make a dictionary"
    echo "<blank>" > ${token_list}
    echo "<s>" >> ${token_list}
    echo "</s>" >> ${token_list}
    utils/text2token.py -s 1 -n 1 --space "" ${feats_dir}/data/$train_set/text | cut -f 2- -d" " | tr " " "\n" \
        | sort | uniq | grep -a -v -e '^\s*$' | awk '{print $0}' >> ${token_list}
    echo "<unk>" >> ${token_list}
fi
# LM Training Stage
world_size=$gpu_num  # run on one machine
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    echo "stage 3: LM Training"
fi
# ASR Training Stage
world_size=$gpu_num  # run on one machine
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
    echo "stage 4: ASR Training"
    mkdir -p ${exp_dir}/exp/${model_dir}
    mkdir -p ${exp_dir}/exp/${model_dir}/log
    INIT_FILE=./ddp_init
    if [ -f $INIT_FILE ];then
        rm -f $INIT_FILE
    fi
    init_method=file://$(readlink -f $INIT_FILE)
    echo "$0: init method is $init_method"
    for ((i = 0; i < $gpu_num; ++i)); do
        {
            rank=$i
            local_rank=$i
            gpu_id=$(echo $CUDA_VISIBLE_DEVICES | cut -d',' -f$[$i+1])
            train.py \
                --task_name asr \
                --gpu_id $gpu_id \
                --use_preprocessor true \
                --token_type char \
                --token_list $token_list \
                --data_dir ${feats_dir}/data \
                --train_set ${train_set} \
                --valid_set ${valid_set} \
                --data_file_names "wav.scp,text" \
                --cmvn_file ${feats_dir}/data/${train_set}/cmvn/cmvn.mvn \
                --speed_perturb ${speed_perturb} \
                --resume true \
                --output_dir ${exp_dir}/exp/${model_dir} \
                --config $asr_config \
                --ngpu $gpu_num \
                --num_worker_count $count \
                --dist_init_method $init_method \
                --dist_world_size $world_size \
                --dist_rank $rank \
                --local_rank $local_rank 1> ${exp_dir}/exp/${model_dir}/log/train.log.$i 2>&1
        } &
        done
        wait
fi
# Testing Stage
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
    echo "stage 5: Inference"
    for dset in ${test_sets}; do
        asr_exp=${exp_dir}/exp/${model_dir}
        inference_tag="$(basename "${inference_config}" .yaml)"
        _dir="${asr_exp}/${inference_tag}/${inference_asr_model}/${dset}"
        _logdir="${_dir}/logdir"
        if [ -d ${_dir} ]; then
            echo "${_dir} already exists. If you want to decode again, please delete this dir first."
            exit 0
        fi
        mkdir -p "${_logdir}"
        _data="${feats_dir}/data/${dset}"
        key_file=${_data}/${scp}
        num_scp_file="$(<${key_file} wc -l)"
        _nj=$([ $inference_nj -le $num_scp_file ] && echo "$inference_nj" || echo "$num_scp_file")
        split_scps=
        for n in $(seq "${_nj}"); do
            split_scps+=" ${_logdir}/keys.${n}.scp"
        done
        # shellcheck disable=SC2086
        utils/split_scp.pl "${key_file}" ${split_scps}
        _opts=
        if [ -n "${inference_config}" ]; then
            _opts+="--config ${inference_config} "
        fi
        ${infer_cmd} --gpu "${_ngpu}" --max-jobs-run "${_nj}" JOB=1:"${_nj}" "${_logdir}"/asr_inference.JOB.log \
            python -m funasr.bin.asr_inference_launch \
                --batch_size 1 \
                --ngpu "${_ngpu}" \
                --njob ${njob} \
                --gpuid_list ${gpuid_list} \
                --data_path_and_name_and_type "${_data}/${scp},speech,${type}" \
                --cmvn_file ${feats_dir}/data/${train_set}/cmvn/cmvn.mvn \
                --key_file "${_logdir}"/keys.JOB.scp \
                --asr_train_config "${asr_exp}"/config.yaml \
                --asr_model_file "${asr_exp}"/"${inference_asr_model}" \
                --output_dir "${_logdir}"/output.JOB \
                --mode bat \
                ${_opts}
        for f in token token_int score text; do
            if [ -f "${_logdir}/output.1/1best_recog/${f}" ]; then
                for i in $(seq "${_nj}"); do
                    cat "${_logdir}/output.${i}/1best_recog/${f}"
                done | sort -k1 >"${_dir}/${f}"
            fi
        done
        python utils/proce_text.py ${_dir}/text ${_dir}/text.proc
        python utils/proce_text.py ${_data}/text ${_data}/text.proc
        python utils/compute_wer.py ${_data}/text.proc ${_dir}/text.proc ${_dir}/text.cer
        tail -n 3 ${_dir}/text.cer > ${_dir}/text.cer.txt
        cat ${_dir}/text.cer.txt
    done
fi
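Stage 5 scores the sorted hypotheses with `utils/compute_wer.py`; character error rate reduces to Levenshtein distance over characters divided by the reference length. A self-contained sketch of that metric (not the script's actual implementation, which also reports insertion/deletion/substitution breakdowns):

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance / reference length."""
    m, n = len(ref), len(hyp)
    # dp[j] holds the edit distance between the processed ref prefix and hyp[:j]
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n] / m if m else 0.0

score = cer("欢迎体验", "欢迎体研")  # one substitution over four characters
```

With this definition the dev/test CERs in the BAT README above (4.56% / 4.97%) are per-character edit distances aggregated over the whole set.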
egs/aishell/bat/utils
New file
@@ -0,0 +1 @@
../transformer/utils
egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/demo.py
@@ -3,6 +3,10 @@
param_dict = dict()
param_dict['hotword'] = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/hotword.txt"
param_dict['clas_scale'] = 1.00  # set it larger (e.g. 1.50) for higher hotword recall, at some cost to general accuracy
# 13% relative recall gain over an internal hotword test set (45% -> 51%)
# CER may rise when the utterance contains no hotword
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model="damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404",
funasr/bin/asr_infer.py
@@ -280,6 +280,7 @@
            nbest: int = 1,
            frontend_conf: dict = None,
            hotword_list_or_file: str = None,
            clas_scale: float = 1.0,
            decoding_ind: int = 0,
            **kwargs,
    ):
@@ -376,6 +377,7 @@
        # 6. [Optional] Build hotword list from str, local file or url
        self.hotword_list = None
        self.hotword_list = self.generate_hotwords_list(hotword_list_or_file)
        self.clas_scale = clas_scale
        is_use_lm = lm_weight != 0.0 and lm_file is not None
        if (ctc_weight == 0.0 or asr_model.ctc == None) and not is_use_lm:
@@ -439,16 +441,20 @@
        pre_token_length = pre_token_length.round().long()
        if torch.max(pre_token_length) < 1:
            return []
-        if not isinstance(self.asr_model, ContextualParaformer) and not isinstance(self.asr_model,
-                                                                                   NeatContextualParaformer):
+        if not isinstance(self.asr_model, ContextualParaformer) and \
+            not isinstance(self.asr_model, NeatContextualParaformer):
            if self.hotword_list:
                logging.warning("Hotword is given but asr model is not a ContextualParaformer.")
            decoder_outs = self.asr_model.cal_decoder_with_predictor(enc, enc_len, pre_acoustic_embeds,
                                                                     pre_token_length)
            decoder_out, ys_pad_lens = decoder_outs[0], decoder_outs[1]
        else:
-            decoder_outs = self.asr_model.cal_decoder_with_predictor(enc, enc_len, pre_acoustic_embeds,
-                                                                     pre_token_length, hw_list=self.hotword_list)
+            decoder_outs = self.asr_model.cal_decoder_with_predictor(enc,
+                                                                     enc_len,
+                                                                     pre_acoustic_embeds,
+                                                                     pre_token_length,
+                                                                     hw_list=self.hotword_list,
+                                                                     clas_scale=self.clas_scale)
            decoder_out, ys_pad_lens = decoder_outs[0], decoder_outs[1]
        if isinstance(self.asr_model, BiCifParaformer):
funasr/bin/asr_inference_launch.py
@@ -255,8 +255,10 @@
    if param_dict is not None:
        hotword_list_or_file = param_dict.get('hotword')
        export_mode = param_dict.get("export_mode", False)
        clas_scale = param_dict.get('clas_scale', 1.0)
    else:
        hotword_list_or_file = None
        clas_scale = 1.0
    if kwargs.get("device", None) == "cpu":
        ngpu = 0
@@ -289,6 +291,7 @@
        penalty=penalty,
        nbest=nbest,
        hotword_list_or_file=hotword_list_or_file,
        clas_scale=clas_scale,
    )
    speech2text = Speech2TextParaformer(**speech2text_kwargs)
@@ -617,10 +620,27 @@
            sorted_data = sorted(data_with_index, key=lambda x: x[0][1] - x[0][0])
            results_sorted = []
            
            if not len(sorted_data):
                key = keys[0]
                # no active segments after VAD
                if writer is not None:
                    # Write empty results
                    ibest_writer["token"][key] = ""
                    ibest_writer["token_int"][key] = ""
                    ibest_writer["vad"][key] = ""
                    ibest_writer["text"][key] = ""
                    ibest_writer["text_with_punc"][key] = ""
                    if use_timestamp:
                        ibest_writer["time_stamp"][key] = ""
                logging.info("decoding, utt: {}, empty speech".format(key))
                continue
            batch_size_token_ms = batch_size_token*60
            if speech2text.device == "cpu":
                batch_size_token_ms = 0
            batch_size_token_ms = max(batch_size_token_ms, sorted_data[0][0][1] - sorted_data[0][0][0])
            if len(sorted_data) > 0 and len(sorted_data[0]) > 0:
                batch_size_token_ms = max(batch_size_token_ms, sorted_data[0][0][1] - sorted_data[0][0][0])
            
            batch_size_token_ms_cum = 0
            beg_idx = 0
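The batching above budgets each decode batch in token-milliseconds (`batch_size_token * 60`, forced to 0 on CPU) and clamps the budget to at least the first sorted segment so the first batch is never empty. A standalone sketch of grouping durations under such a budget (the helper and its inputs are illustrative, not FunASR's API):

```python
def batch_segments(durations_ms, budget_ms):
    # durations_ms: VAD segment lengths in ms, sorted ascending (as above)
    if durations_ms:
        # clamp the budget so the first segment always fits
        budget_ms = max(budget_ms, durations_ms[0])
    batches, cur, cum = [], [], 0
    for dur in durations_ms:
        if cur and cum + dur > budget_ms:
            batches.append(cur)
            cur, cum = [], 0
        cur.append(dur)
        cum += dur
    if cur:
        batches.append(cur)
    return batches

assert batch_segments([500, 800, 900], 1500) == [[500, 800], [900]]
```

An over-budget segment still lands in its own batch, mirroring the clamp above.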
@@ -1349,10 +1369,7 @@
        left_context=left_context,
        right_context=right_context,
    )
    speech2text = Speech2TextTransducer.from_pretrained(
        model_tag=model_tag,
        **speech2text_kwargs,
    )
    speech2text = Speech2TextTransducer(**speech2text_kwargs)
    def _forward(data_path_and_name_and_type,
                 raw_inputs: Union[np.ndarray, torch.Tensor] = None,
funasr/bin/build_trainer.py
@@ -85,7 +85,9 @@
        finetune_configs = yaml.safe_load(f)
        # set data_types
        if dataset_type == "large":
            finetune_configs["dataset_conf"]["data_types"] = "sound,text"
            # finetune_configs["dataset_conf"]["data_types"] = "sound,text"
            if 'data_types' not in finetune_configs['dataset_conf']:
                finetune_configs["dataset_conf"]["data_types"] = "sound,text"
    finetune_configs = update_dct(configs, finetune_configs)
    for key, value in finetune_configs.items():
        if hasattr(args, key):
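The guarded assignment above (for the "large" dataset type) only fills in `data_types` when the fine-tune config does not already define it, so user-supplied values survive. The same pattern, expressed with `dict.setdefault`:

```python
# a config that already sets data_types keeps its value
finetune_configs = {"dataset_conf": {"data_types": "kaldi_ark,text"}}
finetune_configs["dataset_conf"].setdefault("data_types", "sound,text")
assert finetune_configs["dataset_conf"]["data_types"] == "kaldi_ark,text"

# a config without it gets the default
empty = {"dataset_conf": {}}
empty["dataset_conf"].setdefault("data_types", "sound,text")
assert empty["dataset_conf"]["data_types"] == "sound,text"
```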
funasr/bin/diar_infer.py
@@ -179,7 +179,7 @@
    @staticmethod
    def seq2arr(seq, vec_dim=8):
        def int2vec(x, vec_dim=8, dtype=np.int):
        def int2vec(x, vec_dim=8, dtype=np.int32):
            b = ('{:0' + str(vec_dim) + 'b}').format(x)
            # little-endian order: lower bit first
            return (np.array(list(b)[::-1]) == '1').astype(dtype)
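`int2vec` above (with `np.int32` replacing the `np.int` alias removed in NumPy 1.24) encodes an integer as a fixed-width bit vector, lowest bit first. Reproduced standalone for a quick check:

```python
import numpy as np

def int2vec(x, vec_dim=8, dtype=np.int32):
    b = ('{:0' + str(vec_dim) + 'b}').format(x)
    # little-endian order: lower bit first
    return (np.array(list(b)[::-1]) == '1').astype(dtype)

# 5 = 0b101 -> bits 0 and 2 set
assert int2vec(5).tolist() == [1, 0, 1, 0, 0, 0, 0, 0]
```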
funasr/bin/diar_inference_launch.py
@@ -92,10 +92,7 @@
            embedding_node="resnet1_dense"
        )
        logging.info("speech2xvector_kwargs: {}".format(speech2xvector_kwargs))
        speech2xvector = Speech2Xvector.from_pretrained(
            model_tag=model_tag,
            **speech2xvector_kwargs,
        )
        speech2xvector = Speech2Xvector(**speech2xvector_kwargs)
        speech2xvector.sv_model.eval()
    # 2b. Build speech2diar
@@ -109,10 +106,7 @@
        dur_threshold=dur_threshold,
    )
    logging.info("speech2diarization_kwargs: {}".format(speech2diar_kwargs))
    speech2diar = Speech2DiarizationSOND.from_pretrained(
        model_tag=model_tag,
        **speech2diar_kwargs,
    )
    speech2diar = Speech2DiarizationSOND(**speech2diar_kwargs)
    speech2diar.diar_model.eval()
    def output_results_str(results: dict, uttid: str):
@@ -257,10 +251,7 @@
        dtype=dtype,
    )
    logging.info("speech2diarization_kwargs: {}".format(speech2diar_kwargs))
    speech2diar = Speech2DiarizationEEND.from_pretrained(
        model_tag=model_tag,
        **speech2diar_kwargs,
    )
    speech2diar = Speech2DiarizationEEND(**speech2diar_kwargs)
    speech2diar.diar_model.eval()
    def output_results_str(results: dict, uttid: str):
funasr/datasets/large_datasets/dataset.py
@@ -202,14 +202,7 @@
    data_types = conf.get("data_types", "kaldi_ark,text")
    pre_hwfile = conf.get("pre_hwlist", None)
    pre_prob = conf.get("pre_prob", 0)  # unused yet
    hw_config = {"sample_rate": conf.get("sample_rate", 0.6),
                 "double_rate": conf.get("double_rate", 0.1),
                 "hotword_min_length": conf.get("hotword_min_length", 2),
                 "hotword_max_length": conf.get("hotword_max_length", 8),
                 "pre_prob": conf.get("pre_prob", 0.0)}
    # pre_prob = conf.get("pre_prob", 0)  # unused yet
    if pre_hwfile is not None:
        pre_hwlist = []
        with open(pre_hwfile, 'r') as fin:
@@ -218,6 +211,15 @@
    else:
        pre_hwlist = None
    hw_config = {"sample_rate": conf.get("sample_rate", 0.6),
                 "double_rate": conf.get("double_rate", 0.1),
                 "hotword_min_length": conf.get("hotword_min_length", 2),
                 "hotword_max_length": conf.get("hotword_max_length", 8),
                 "pre_prob": conf.get("pre_prob", 0.0),
                 "pre_hwlist": pre_hwlist}
    dataset = AudioDataset(scp_lists, 
                           data_names, 
                           data_types, 
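The reordering above matters because `hw_config` now carries the loaded `pre_hwlist`, so the preset hotwords actually reach the sampling code. A minimal sketch of that load-then-assemble step (the in-memory file stands in for `pre_hwfile`):

```python
import io

# stands in for open(pre_hwfile): one hotword per line
fin = io.StringIO("阿里巴巴\n达摩院\n")
pre_hwlist = [line.strip() for line in fin if line.strip()]

hw_config = {
    "sample_rate": 0.6,
    "double_rate": 0.1,
    "hotword_min_length": 2,
    "hotword_max_length": 8,
    "pre_prob": 0.0,
    "pre_hwlist": pre_hwlist,  # None disables preset-hotword sampling
}
assert hw_config["pre_hwlist"] == ["阿里巴巴", "达摩院"]
```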
funasr/datasets/large_datasets/utils/hotword_utils.py
@@ -6,7 +6,8 @@
                   sample_rate,
                   double_rate,
                   pre_prob,
                   pre_index=None):
                   pre_index=None,
                   pre_hwlist=None):
        if length < hotword_min_length:
            return [-1]
        if random.random() < sample_rate:
funasr/datasets/large_datasets/utils/tokenize.py
@@ -54,7 +54,17 @@
    length = len(text)
    if 'hw_tag' in data:
        hotword_indxs = sample_hotword(length, **hw_config)
        if hw_config['pre_hwlist'] is not None and hw_config['pre_prob'] > 0:
            # enable preset hotword detection in sampling
            pre_index = None
            for hw in hw_config['pre_hwlist']:
                hw = " ".join(seg_tokenize(hw, seg_dict))
                _find = " ".join(text).find(hw)
                if _find != -1:
                    # _find = text[:_find].count(" ")  # bpe sometimes
                    pre_index = [_find, _find + max(hw.count(" "), 1)]
                    break
        hotword_indxs = sample_hotword(length, **hw_config, pre_index=pre_index)
        data['hotword_indxs'] = hotword_indxs
        del data['hw_tag']
    for i in range(length):
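The lookup above joins tokens with spaces, searches for the space-joined hotword, and derives a `pre_index` span from the match position; the commented-out line hints the char-to-token conversion is still approximate for BPE. A hypothetical token-level alternative that sidesteps the string arithmetic (`find_token_span` is illustrative, not part of FunASR):

```python
def find_token_span(tokens, hotword_tokens):
    # locate the hotword as a contiguous token subsequence; return [start, end) or None
    n, m = len(tokens), len(hotword_tokens)
    for i in range(n - m + 1):
        if tokens[i:i + m] == hotword_tokens:
            return [i, i + m]
    return None

tokens = list("打开阿里巴巴网站")
assert find_token_span(tokens, list("阿里巴巴")) == [2, 6]
assert find_token_span(tokens, list("腾讯")) is None
```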
funasr/models/decoder/contextual_decoder.py
@@ -244,6 +244,7 @@
        ys_in_pad: torch.Tensor,
        ys_in_lens: torch.Tensor,
        contextual_info: torch.Tensor,
        clas_scale: float = 1.0,
        return_hidden: bool = False,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """Forward decoder.
@@ -283,7 +284,7 @@
        cx, tgt_mask, _, _, _ = self.bias_decoder(x_self_attn, tgt_mask, contextual_info, memory_mask=contextual_mask)
        if self.bias_output is not None:
            x = torch.cat([x_src_attn, cx], dim=2)
            x = torch.cat([x_src_attn, cx*clas_scale], dim=2)
            x = self.bias_output(x.transpose(1, 2)).transpose(1, 2)  # 2D -> D
            x = x_self_attn + self.dropout(x)
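The one-line change above scales the contextual (CLAS) embedding `cx` before it is concatenated with the source-attention output, so `clas_scale < 1` damps hotword biasing, `clas_scale > 1` strengthens it, and 1.0 reproduces the old behaviour. A NumPy sketch of just the fusion step (shapes are illustrative; the real model projects the concatenation back to `d` with `bias_output`):

```python
import numpy as np

batch, tgt_len, d = 2, 5, 4
x_src_attn = np.random.randn(batch, tgt_len, d)  # source-attention output
cx = np.random.randn(batch, tgt_len, d)          # contextual bias from hotword attention
clas_scale = 0.5

fused = np.concatenate([x_src_attn, cx * clas_scale], axis=2)  # 2D along features
assert fused.shape == (batch, tgt_len, 2 * d)

# clas_scale = 0 disables the hotword bias entirely
fused_off = np.concatenate([x_src_attn, cx * 0.0], axis=2)
assert not fused_off[:, :, d:].any()
```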
funasr/models/e2e_asr_contextual_paraformer.py
@@ -341,7 +341,7 @@
            input_mask_expand_dim, 0)
        return sematic_embeds * tgt_mask, decoder_out * tgt_mask
    def cal_decoder_with_predictor(self, encoder_out, encoder_out_lens, sematic_embeds, ys_pad_lens, hw_list=None):
    def cal_decoder_with_predictor(self, encoder_out, encoder_out_lens, sematic_embeds, ys_pad_lens, hw_list=None, clas_scale=1.0):
        if hw_list is None:
            hw_list = [torch.Tensor([1]).long().to(encoder_out.device)]  # empty hotword list
            hw_list_pad = pad_list(hw_list, 0)
@@ -363,7 +363,7 @@
            hw_embed = h_n.repeat(encoder_out.shape[0], 1, 1)
        
        decoder_outs = self.decoder(
            encoder_out, encoder_out_lens, sematic_embeds, ys_pad_lens, contextual_info=hw_embed
            encoder_out, encoder_out_lens, sematic_embeds, ys_pad_lens, contextual_info=hw_embed, clas_scale=clas_scale
        )
        decoder_out = decoder_outs[0]
        decoder_out = torch.log_softmax(decoder_out, dim=-1)
funasr/models/e2e_asr_paraformer.py
@@ -2107,7 +2107,7 @@
        return loss_att, acc_att, cer_att, wer_att, loss_pre
    def cal_decoder_with_predictor(self, encoder_out, encoder_out_lens, sematic_embeds, ys_pad_lens, hw_list=None):
    def cal_decoder_with_predictor(self, encoder_out, encoder_out_lens, sematic_embeds, ys_pad_lens, hw_list=None, clas_scale=1.0):
        if hw_list is None:
            # default hotword list
            hw_list = [torch.Tensor([self.sos]).long().to(encoder_out.device)]  # empty hotword list
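Both `cal_decoder_with_predictor` variants substitute a one-token "empty" hotword list when none is given, which keeps the contextual branch's tensor shapes valid even without hotwords. The fallback pattern in plain Python (`SOS = 1` is a hypothetical stand-in id; the real code uses `self.sos` or `1` and torch tensors):

```python
SOS = 1  # hypothetical token id standing in for self.sos

def default_hotword_list(hw_list):
    # mirror the fallback above: a single "empty" hotword keeps shapes valid
    if hw_list is None:
        hw_list = [[SOS]]
    return hw_list

assert default_hotword_list(None) == [[1]]
assert default_hotword_list([[5, 6], [7]]) == [[5, 6], [7]]
```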
funasr/models/encoder/rnn_encoder.py
@@ -46,12 +46,12 @@
            raise ValueError(f"Not supported rnn_type={rnn_type}")
        if subsample is None:
            subsample = np.ones(num_layers + 1, dtype=np.int)
            subsample = np.ones(num_layers + 1, dtype=np.int32)
        else:
            subsample = subsample[:num_layers]
            # Append 1 at the beginning because the second or later is used
            subsample = np.pad(
                np.array(subsample, dtype=np.int),
                np.array(subsample, dtype=np.int32),
                [1, num_layers - len(subsample)],
                mode="constant",
                constant_values=1,
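The padding above (unchanged except for `np.int32`) prepends a 1 for the input frame rate and right-pads with 1 (no subsampling) up to `num_layers + 1`. Worked through for a hypothetical `subsample=[2, 2]` with four layers:

```python
import numpy as np

num_layers = 4
subsample = [2, 2]                         # user-specified factors for the first layers
subsample = subsample[:num_layers]
subsample = np.pad(
    np.array(subsample, dtype=np.int32),   # np.int alias removed in NumPy 1.24
    [1, num_layers - len(subsample)],      # one 1 in front, the rest behind
    mode="constant",
    constant_values=1,
)
assert subsample.tolist() == [1, 2, 2, 1, 1]
```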
funasr/modules/data2vec/data_utils.py
@@ -105,7 +105,7 @@
            for length in sorted(lengths, reverse=True):
                lens = np.fromiter(
                    (e - s if e - s >= length + min_space else 0 for s, e in parts),
                    np.int,
                    np.int32,
                )
                l_sum = np.sum(lens)
                if l_sum == 0:
funasr/modules/frontends/mask_estimator.py
@@ -13,7 +13,7 @@
class MaskEstimator(torch.nn.Module):
    def __init__(self, type, idim, layers, units, projs, dropout, nmask=1):
        super().__init__()
        subsample = np.ones(layers + 1, dtype=np.int)
        subsample = np.ones(layers + 1, dtype=np.int32)
        typ = type.lstrip("vgg").rstrip("p")
        if type[-1] == "p":
funasr/modules/nets_utils.py
@@ -407,7 +407,7 @@
    elif mode == "mt" and arch == "rnn":
        # +1 means input (+1) and layers outputs (train_args.elayer)
        subsample = np.ones(train_args.elayers + 1, dtype=np.int)
        subsample = np.ones(train_args.elayers + 1, dtype=np.int32)
        logging.warning("Subsampling is not performed for machine translation.")
        logging.info("subsample: " + " ".join([str(x) for x in subsample]))
        return subsample
@@ -417,7 +417,7 @@
            or (mode == "mt" and arch == "rnn")
            or (mode == "st" and arch == "rnn")
    ):
        subsample = np.ones(train_args.elayers + 1, dtype=np.int)
        subsample = np.ones(train_args.elayers + 1, dtype=np.int32)
        if train_args.etype.endswith("p") and not train_args.etype.startswith("vgg"):
            ss = train_args.subsample.split("_")
            for j in range(min(train_args.elayers + 1, len(ss))):
@@ -432,7 +432,7 @@
    elif mode == "asr" and arch == "rnn_mix":
        subsample = np.ones(
            train_args.elayers_sd + train_args.elayers + 1, dtype=np.int
            train_args.elayers_sd + train_args.elayers + 1, dtype=np.int32
        )
        if train_args.etype.endswith("p") and not train_args.etype.startswith("vgg"):
            ss = train_args.subsample.split("_")
@@ -451,7 +451,7 @@
    elif mode == "asr" and arch == "rnn_mulenc":
        subsample_list = []
        for idx in range(train_args.num_encs):
            subsample = np.ones(train_args.elayers[idx] + 1, dtype=np.int)
            subsample = np.ones(train_args.elayers[idx] + 1, dtype=np.int32)
            if train_args.etype[idx].endswith("p") and not train_args.etype[
                idx
            ].startswith("vgg"):
funasr/runtime/deploy_tools/funasr-runtime-deploy-offline-cpu-zh.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash
scriptVersion="0.0.4"
scriptDate="20230702"
scriptVersion="0.0.6"
scriptDate="20230705"
# Set color
@@ -22,16 +22,20 @@
UNDERLINE="\033[4m"
# Current folder
cur_dir=`pwd`
CUR_DIR=`pwd`
SUDO_CMD="sudo"
DEFAULT_DOCKER_OFFLINE_CPU_ZH_LISTS="https://raw.githubusercontent.com/alibaba-damo-academy/FunASR/main/funasr/runtime/docs/docker_offline_cpu_zh_lists"
DEFAULT_DOCKER_IMAGE_LISTS=$DEFAULT_DOCKER_OFFLINE_CPU_ZH_LISTS
DEFAULT_DOCKER_OFFLINE_CPU_ZH_LISTS_OSS="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/docker_lists/docker_offline_cpu_zh_lists"
DEFAULT_DOCKER_OFFLINE_CPU_ZH_LISTS_GIT="https://raw.githubusercontent.com/alibaba-damo-academy/FunASR/main/funasr/runtime/docs/docker_offline_cpu_zh_lists"
DEFAULT_DOCKER_IMAGE_LISTS=$DEFAULT_DOCKER_OFFLINE_CPU_ZH_LISTS_OSS
DEFAULT_FUNASR_DOCKER_URL="registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr"
DEFAULT_FUNASR_RUNTIME_RESOURCES="funasr-runtime-resources"
DEFAULT_FUNASR_LOCAL_WORKSPACE="${cur_dir}/${DEFAULT_FUNASR_RUNTIME_RESOURCES}"
DEFAULT_FUNASR_CONFIG_DIR="/var/funasr"
DEFAULT_FUNASR_LOCAL_WORKSPACE=${CUR_DIR}/${DEFAULT_FUNASR_RUNTIME_RESOURCES}
DEFAULT_FUNASR_CONFIG_DIR=""
DEFAULT_FUNASR_CONFIG_DIR_BAK="/var/funasr"
DEFAULT_FUNASR_CONFIG_FILE="${DEFAULT_FUNASR_CONFIG_DIR}/config"
DEFAULT_FUNASR_SERVER_CONFIG_FILE="${DEFAULT_FUNASR_CONFIG_DIR}/server_config"
DEFAULT_FUNASR_PROGRESS_TXT="${DEFAULT_FUNASR_CONFIG_DIR}/progress.txt"
DEFAULT_FUNASR_SERVER_LOG="${DEFAULT_FUNASR_CONFIG_DIR}/server_console.log"
DEFAULT_FUNASR_WORKSPACE_DIR="/workspace/models"
@@ -308,11 +312,38 @@
}
initConfiguration(){
    if [ ! -z "$DEFAULT_FUNASR_CONFIG_DIR" ]; then
        mkdir -p $DEFAULT_FUNASR_CONFIG_DIR
    if [ -z "$DEFAULT_FUNASR_CONFIG_DIR" ];then
        DEFAULT_FUNASR_CONFIG_DIR="$HOME"
        if [ -z "$DEFAULT_FUNASR_CONFIG_DIR" ];then
            DEFAULT_FUNASR_CONFIG_DIR=$(echo ~/)
            if [ -z "$DEFAULT_FUNASR_CONFIG_DIR" ];then
                DEFAULT_FUNASR_CONFIG_DIR=$DEFAULT_FUNASR_CONFIG_DIR_BAK
            fi
        fi
        DEFAULT_FUNASR_CONFIG_DIR=${DEFAULT_FUNASR_CONFIG_DIR}/.funasr
    fi
    if [ ! -z "$DEFAULT_FUNASR_CONFIG_DIR" ]; then
        $SUDO_CMD mkdir -p $DEFAULT_FUNASR_CONFIG_DIR
    else
        echo -e "    ${RED}DEFAULT_FUNASR_CONFIG_DIR is empty!${PLAIN}"
        exit 1
    fi
    if [ ! -d "$DEFAULT_FUNASR_CONFIG_DIR" ]; then
        echo -e "    ${RED}${DEFAULT_FUNASR_CONFIG_DIR} does not exist!${PLAIN}"
        exit 2
    fi
    DEFAULT_FUNASR_CONFIG_FILE="${DEFAULT_FUNASR_CONFIG_DIR}/config"
    DEFAULT_FUNASR_SERVER_CONFIG_FILE="${DEFAULT_FUNASR_CONFIG_DIR}/server_config"
    DEFAULT_FUNASR_PROGRESS_TXT="${DEFAULT_FUNASR_CONFIG_DIR}/progress.txt"
    DEFAULT_FUNASR_SERVER_LOG="${DEFAULT_FUNASR_CONFIG_DIR}/server_console.log"
    if [ ! -f $DEFAULT_FUNASR_CONFIG_FILE ]; then
        touch $DEFAULT_FUNASR_CONFIG_FILE
        $SUDO_CMD touch $DEFAULT_FUNASR_CONFIG_FILE
    fi
    if [ ! -f $DEFAULT_FUNASR_SERVER_CONFIG_FILE ]; then
        $SUDO_CMD touch $DEFAULT_FUNASR_SERVER_CONFIG_FILE
    fi
}
@@ -346,14 +377,21 @@
# Get a list of docker images.
readDockerInfoFromUrl(){
    list_url=$DEFAULT_DOCKER_IMAGE_LISTS
    while true
    do
        list_url=$DEFAULT_DOCKER_IMAGE_LISTS
        content=$(curl --connect-timeout 10 -m 10 -s $list_url)
        if [ ! -z "$content" ]; then
            break
        else
            echo -e "    ${RED}Unable to get docker image list due to network issues, try again.${PLAIN}"
            # switch sources of docker image lists
            if [ "$list_url" = "$DEFAULT_DOCKER_OFFLINE_CPU_ZH_LISTS_OSS" ]; then
                DEFAULT_DOCKER_IMAGE_LISTS=$DEFAULT_DOCKER_OFFLINE_CPU_ZH_LISTS_GIT
            else
                DEFAULT_DOCKER_IMAGE_LISTS=$DEFAULT_DOCKER_OFFLINE_CPU_ZH_LISTS_OSS
            fi
        fi
    done
    array=($(echo "$content"))
@@ -397,7 +435,12 @@
        echo -e "  ${ERROR} MUST RUN AS ${RED}ROOT${PLAIN} USER!"
    fi
    cd $cur_dir
    check_sudo=$(which sudo | wc -l)
    if [ $check_sudo -eq 0 ]; then
        SUDO_CMD=""
    fi
    cd $CUR_DIR
    echo
}
@@ -408,22 +451,42 @@
    readDockerInfoFromUrl
    echo
    the_latest_docker_image=$PARAMS_DOCKER_IMAGE
    echo -e "  ${YELLOW}Please choose the Docker image.${PLAIN}"
    menuSelection ${DOCKER_IMAGES[*]}
    result=$?
    index=`expr $result - 1`
    index=`expr ${result} - 1`
    PARAMS_DOCKER_IMAGE=${DOCKER_IMAGES[${index}]}
    echo -e "  ${UNDERLINE}You have chosen the Docker image:${PLAIN} ${GREEN}${PARAMS_DOCKER_IMAGE}${PLAIN}"
    checkDockerExist
    result=$?
    result=`expr $result + 0`
    if [ ${result} -eq 50 ]; then
        return 50
    if [ -z "$the_latest_docker_image" ] && [ -z "$PARAMS_FUNASR_DOCKER_ID" ]; then
        result=0
    else
        #  0: DOCKER is not running
        # 60: DOCKER_ID is empty
        # 61: DOCKER_IMAGE is empty
        # 62: DOCKER is running
        # 63: DOCKER_ID and DOCKER_IMAGE are empty
        checkDockerIdExist "install"
        result=$?
        result=`expr ${result} + 0`
        if [ $result -eq 60 ]; then
            result=0
        elif [ $result -eq 61 ]; then
            echo
            echo -e "  ${RED}Please run (${PLAIN}${GREEN}${SUDO_CMD} bash funasr-runtime-deploy-offline-cpu-zh.sh install${PLAIN}${RED}) to install Docker first.${PLAIN}"
        elif [ $result -eq 62 ]; then
            echo
            echo -e "  ${RED}Docker: ${PARAMS_DOCKER_IMAGE} ${PARAMS_FUNASR_DOCKER_ID} has been launched, please run (${PLAIN}${GREEN}${SUDO_CMD} bash funasr-runtime-deploy-offline-cpu-zh.sh remove${PLAIN}${RED}) to remove Docker first and then install.${PLAIN}"
        elif [ $result -eq 63 ]; then
            result=0
        fi
    fi
    echo
    return $result
}
# Configure FunASR server host port setting.
@@ -530,10 +593,6 @@
    if [ ! -z "$funasr_local_models_dir" ]; then
        PARAMS_FUNASR_LOCAL_MODELS_DIR=$funasr_local_models_dir
    fi
    funasr_config_path=`sed '/^PARAMS_FUNASR_CONFIG_PATH=/!d;s/.*=//' ${DEFAULT_FUNASR_CONFIG_FILE}`
    if [ ! -z "$funasr_config_path" ]; then
        PARAMS_FUNASR_CONFIG_PATH=$funasr_config_path
    fi
    docker_image=`sed '/^PARAMS_DOCKER_IMAGE=/!d;s/.*=//' ${DEFAULT_FUNASR_CONFIG_FILE}`
    if [ ! -z "$docker_image" ]; then
@@ -599,6 +658,15 @@
    if [ ! -z "$io_thread_num" ]; then
        PARAMS_IO_THREAD_NUM=$io_thread_num
    fi
    ssl_flag=`sed '/^PARAMS_SSL_FLAG=/!d;s/.*=//' ${DEFAULT_FUNASR_CONFIG_FILE}`
    if [ ! -z "$ssl_flag" ]; then
        PARAMS_SSL_FLAG=$ssl_flag
    fi
    docker_id=`sed '/^PARAMS_FUNASR_DOCKER_ID=/!d;s/.*=//' ${DEFAULT_FUNASR_CONFIG_FILE}`
    if [ ! -z "$docker_id" ]; then
        PARAMS_FUNASR_DOCKER_ID=$docker_id
    fi
}
saveParams(){
@@ -611,7 +679,6 @@
    echo "PARAMS_FUNASR_SAMPLES_LOCAL_DIR=${PARAMS_FUNASR_SAMPLES_LOCAL_DIR}" >> $DEFAULT_FUNASR_CONFIG_FILE
    echo "PARAMS_FUNASR_SAMPLES_LOCAL_PATH=${PARAMS_FUNASR_SAMPLES_LOCAL_PATH}" >> $DEFAULT_FUNASR_CONFIG_FILE
    echo "PARAMS_FUNASR_LOCAL_MODELS_DIR=${PARAMS_FUNASR_LOCAL_MODELS_DIR}" >> $DEFAULT_FUNASR_CONFIG_FILE
    echo "PARAMS_FUNASR_CONFIG_PATH=${PARAMS_FUNASR_CONFIG_PATH}" >> $DEFAULT_FUNASR_CONFIG_FILE
    echo "PARAMS_DOWNLOAD_MODEL_DIR=${PARAMS_DOWNLOAD_MODEL_DIR}" >> $DEFAULT_FUNASR_CONFIG_FILE
@@ -640,12 +707,19 @@
    echo "PARAMS_DOCKER_PORT=${PARAMS_DOCKER_PORT}" >> $DEFAULT_FUNASR_CONFIG_FILE
    echo "PARAMS_DECODER_THREAD_NUM=${PARAMS_DECODER_THREAD_NUM}" >> $DEFAULT_FUNASR_CONFIG_FILE
    echo "PARAMS_IO_THREAD_NUM=${PARAMS_IO_THREAD_NUM}" >> $DEFAULT_FUNASR_CONFIG_FILE
    echo "PARAMS_SSL_FLAG=${PARAMS_SSL_FLAG}" >> $DEFAULT_FUNASR_CONFIG_FILE
    echo "PARAMS_FUNASR_DOCKER_ID=${PARAMS_FUNASR_DOCKER_ID}" >> $DEFAULT_FUNASR_CONFIG_FILE
    serverConfigGeneration
    echo "${daemon_server_config}" > $DEFAULT_FUNASR_SERVER_CONFIG_FILE
}
showAllParams(){
    echo -e "${UNDERLINE}${BOLD}[3/5]${PLAIN}"
    echo -e "  ${YELLOW}Show parameters of FunASR server setting and confirm to run ...${PLAIN}"
    echo
    only_show_flag=$1
    if [ ! -z "$PARAMS_DOCKER_IMAGE" ]; then
        echo -e "  The current Docker image is                                    : ${GREEN}${PARAMS_DOCKER_IMAGE}${PLAIN}"
@@ -696,8 +770,19 @@
    if [ ! -z "$PARAMS_FUNASR_SAMPLES_LOCAL_DIR" ]; then
        echo -e "  Sample code will be stored locally                             : ${GREEN}${PARAMS_FUNASR_SAMPLES_LOCAL_DIR}${PLAIN}"
    fi
    if [ ! -z "$PARAMS_SSL_FLAG" ]; then
        echo -e "  The flag for the use of SSL                                    : ${GREEN}${PARAMS_SSL_FLAG}${PLAIN}"
    fi
    if [ "$only_show_flag" = "only_show" ] && [ ! -z "$PARAMS_FUNASR_DOCKER_ID" ]; then
        echo -e "  The docker ID that already exists is                           : ${GREEN}${PARAMS_FUNASR_DOCKER_ID}${PLAIN}"
    fi
    echo
    if [ "$only_show_flag" = "only_show" ]; then
        return 0
    fi
    while true
    do
        params_confirm="y"
@@ -746,22 +831,22 @@
        case "$lowercase_osid" in
            ubuntu)
                DOCKER_INSTALL_CMD="curl -fsSL https://test.docker.com -o test-docker.sh"
                DOCKER_INSTALL_RUN_CMD="sudo sh test-docker.sh"
                DOCKER_INSTALL_RUN_CMD="${SUDO_CMD} sh test-docker.sh"
                ;;
            centos)
                DOCKER_INSTALL_CMD="curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun"
                ;;
            debian)
                DOCKER_INSTALL_CMD="curl -fsSL https://get.docker.com -o get-docker.sh"
                DOCKER_INSTALL_RUN_CMD="sudo sh get-docker.sh"
                DOCKER_INSTALL_RUN_CMD="${SUDO_CMD} sh get-docker.sh"
                ;;
            \"alios\")
                DOCKER_INSTALL_CMD="curl -fsSL https://get.docker.com -o get-docker.sh"
                DOCKER_INSTALL_RUN_CMD="sudo sh get-docker.sh"
                DOCKER_INSTALL_RUN_CMD="${SUDO_CMD} sh get-docker.sh"
                ;;
            \"alinux\")
                DOCKER_INSTALL_CMD="sudo yum -y install dnf"
                DOCKER_INSTALL_RUN_CMD="sudo dnf -y install docker"
                DOCKER_INSTALL_CMD="${SUDO_CMD} yum -y install dnf"
                DOCKER_INSTALL_RUN_CMD="${SUDO_CMD} dnf -y install docker"
                ;;
            *)
                echo -e "  ${RED}$lowercase_osid is not supported.${PLAIN}"
@@ -775,13 +860,13 @@
        if [ ! -z "$DOCKER_INSTALL_RUN_CMD" ]; then
            $DOCKER_INSTALL_RUN_CMD
        fi
        sudo systemctl start docker
        $SUDO_CMD systemctl start docker
        DOCKERINFO=$(sudo docker info | wc -l)
        DOCKERINFO=$(${SUDO_CMD} docker info | wc -l)
        DOCKERINFOLEN=`expr ${DOCKERINFO} + 0`
        if [ $DOCKERINFOLEN -gt 30 ]; then
            echo -e "  ${GREEN}Docker install success, start docker server.${PLAIN}"
            sudo systemctl start docker
            $SUDO_CMD systemctl start docker
        else
            echo -e "  ${RED}Docker install failed!${PLAIN}"
            exit 1
@@ -794,7 +879,7 @@
    # Download docker image
    echo -e "  ${YELLOW}Pull docker image(${PARAMS_DOCKER_IMAGE})...${PLAIN}"
    sudo docker pull $PARAMS_DOCKER_IMAGE
    ${SUDO_CMD} docker pull $PARAMS_DOCKER_IMAGE
    echo
    sleep 1
@@ -804,74 +889,41 @@
    echo -e "${UNDERLINE}${BOLD}[5/5]${PLAIN}"
    echo -e "  ${YELLOW}Construct command and run docker ...${PLAIN}"
    run_cmd="sudo docker run"
    port_map=" -p ${PARAMS_HOST_PORT}:${PARAMS_DOCKER_PORT}"
    dir_params=" --privileged=true"
    dir_map_params=""
    if [ ! -z "$PARAMS_LOCAL_ASR_DIR" ]; then
        if [ -z "$dir_map_params" ]; then
            dir_map_params="${dir_params} -v ${PARAMS_LOCAL_ASR_DIR}:${PARAMS_DOCKER_ASR_DIR}"
        else
            dir_map_params="${dir_map_params} -v ${PARAMS_LOCAL_ASR_DIR}:${PARAMS_DOCKER_ASR_DIR}"
    start_flag=$1
    if [ "$start_flag" = "install" ]; then
        run_cmd="${SUDO_CMD} docker run"
        port_map=" -p ${PARAMS_HOST_PORT}:${PARAMS_DOCKER_PORT}"
        env_params=" --privileged=true"
        dir_map_params=" -v ${DEFAULT_FUNASR_CONFIG_DIR}:/workspace/.config -v ${PARAMS_FUNASR_LOCAL_MODELS_DIR}:${PARAMS_DOWNLOAD_MODEL_DIR}"
        serverConfigGeneration
        env_params=" ${env_params} --env DAEMON_SERVER_CONFIG=${daemon_server_config}"
        run_cmd="${run_cmd}${port_map}${dir_map_params}${env_params}"
        run_cmd="${run_cmd} -it -d ${PARAMS_DOCKER_IMAGE}"
    else
        #  0: DOCKER is not running
        # 60: DOCKER_ID is empty
        # 61: DOCKER_IMAGE is empty
        # 62: DOCKER is running
        checkDockerIdExist $start_flag
        result=$?
        result=`expr ${result} + 0`
        if [ $result -eq 60 ]; then
            echo
            echo -e "  ${RED}Please run (${PLAIN}${GREEN}${SUDO_CMD} bash funasr-runtime-deploy-offline-cpu-zh.sh install${PLAIN}${RED}) to install Docker first.${PLAIN}"
            return $result
        elif [ $result -eq 61 ]; then
            echo
            echo -e "  ${RED}Please run (${PLAIN}${GREEN}${SUDO_CMD} bash funasr-runtime-deploy-offline-cpu-zh.sh install${PLAIN}${RED}) to install Docker first.${PLAIN}"
            return $result
        elif [ $result -eq 62 ]; then
            echo
            echo -e "  ${RED}Docker: ${PARAMS_DOCKER_IMAGE} ${PARAMS_FUNASR_DOCKER_ID} has been launched, please run (${PLAIN}${GREEN}${SUDO_CMD} bash funasr-runtime-deploy-offline-cpu-zh.sh stop${PLAIN}${RED}) to stop Docker first.${PLAIN}"
            return $result
        fi
    fi
    if [ ! -z "$PARAMS_LOCAL_VAD_DIR" ]; then
        if [ -z "$dir_map_params" ]; then
            dir_map_params="${dir_params} -v ${PARAMS_LOCAL_VAD_DIR}:${PARAMS_DOCKER_VAD_DIR}"
        else
            dir_map_params="${dir_map_params} -v ${PARAMS_LOCAL_VAD_DIR}:${PARAMS_DOCKER_VAD_DIR}"
        fi
    fi
    if [ ! -z "$PARAMS_LOCAL_PUNC_DIR" ]; then
        if [ -z "$dir_map_params" ]; then
            dir_map_params="${dir_params} -v ${PARAMS_LOCAL_PUNC_DIR}:${PARAMS_DOCKER_PUNC_DIR}"
        else
            dir_map_params="${dir_map_params} -v ${PARAMS_LOCAL_VAD_DIR}:${PARAMS_DOCKER_VAD_DIR}"
        fi
    fi
    exec_params="\"exec\":\"${PARAMS_DOCKER_EXEC_PATH}\""
    if [ ! -z "$PARAMS_ASR_ID" ]; then
        asr_params="\"--model-dir\":\"${PARAMS_ASR_ID}\""
    else
        asr_params="\"--model-dir\":\"${PARAMS_DOCKER_ASR_PATH}\""
    fi
    if [ ! -z "$PARAMS_VAD_ID" ]; then
        vad_params="\"--vad-dir\":\"${PARAMS_VAD_ID}\""
    else
        vad_params="\"--vad-dir\":\"${PARAMS_DOCKER_VAD_PATH}\""
    fi
    if [ ! -z "$PARAMS_PUNC_ID" ]; then
        punc_params="\"--punc-dir\":\"${PARAMS_PUNC_ID}\""
    else
        punc_params="\"--punc-dir\":\"${PARAMS_DOCKER_PUNC_PATH}\""
    fi
    download_params="\"--download-model-dir\":\"${PARAMS_DOWNLOAD_MODEL_DIR}\""
    if [ -z "$PARAMS_DOWNLOAD_MODEL_DIR" ]; then
        model_params="${asr_params},${vad_params},${punc_params}"
    else
        model_params="${asr_params},${vad_params},${punc_params},${download_params}"
    fi
    decoder_params="\"--decoder-thread-num\":\"${PARAMS_DECODER_THREAD_NUM}\""
    io_params="\"--io-thread-num\":\"${PARAMS_IO_THREAD_NUM}\""
    thread_params=${decoder_params},${io_params}
    port_params="\"--port\":\"${PARAMS_DOCKER_PORT}\""
    crt_path="\"--certfile\":\"/workspace/FunASR/funasr/runtime/ssl_key/server.crt\""
    key_path="\"--keyfile\":\"/workspace/FunASR/funasr/runtime/ssl_key/server.key\""
    env_params=" -v ${DEFAULT_FUNASR_CONFIG_DIR}:/workspace/.config"
    env_params=" ${env_params} --env DAEMON_SERVER_CONFIG={\"server\":[{${exec_params},${model_params},${thread_params},${port_params},${crt_path},${key_path}}]}"
    run_cmd="${run_cmd}${port_map}${dir_map_params}${env_params}"
    run_cmd="${run_cmd} -it -d ${PARAMS_DOCKER_IMAGE}"
    # check Docker
    checkDockerExist
    result=$?
    result=`expr ${result} + 0`
    if [ ${result} -eq 50 ]; then
        return 50
        run_cmd="${SUDO_CMD} docker restart ${PARAMS_FUNASR_DOCKER_ID}"
    fi
    rm -f ${DEFAULT_FUNASR_PROGRESS_TXT}
@@ -881,6 +933,9 @@
    echo
    echo -e "  ${YELLOW}Loading models:${PLAIN}"
    getDockerId
    saveParams
    # Hide the cursor, start draw progress.
    printf "\e[?25l"
@@ -915,8 +970,58 @@
    deploySamples
    echo -e "  ${BOLD}The sample code is already stored in the ${PLAIN}(${GREEN}${PARAMS_FUNASR_SAMPLES_LOCAL_DIR}${PLAIN}) ."
    echo -e "  ${BOLD}If you want to see an example of how to use the client, you can run ${PLAIN}${GREEN}sudo bash funasr-runtime-deploy-offline-cpu-zh.sh client${PLAIN} ."
    echo -e "  ${BOLD}If you want to see an example of how to use the client, you can run ${PLAIN}${GREEN}${SUDO_CMD} bash funasr-runtime-deploy-offline-cpu-zh.sh client${PLAIN} ."
    echo
}
daemon_server_config=""
serverConfigGeneration(){
    # params about models
    if [ ! -z "$PARAMS_ASR_ID" ]; then
        asr_params="\"--model-dir\":\"${PARAMS_ASR_ID}\""
    else
        if [ ! -z "$PARAMS_LOCAL_ASR_PATH" ]; then
            dir_map_params="${dir_map_params} -v ${PARAMS_LOCAL_ASR_PATH}:${PARAMS_DOCKER_ASR_PATH}"
        fi
        asr_params="\"--model-dir\":\"${PARAMS_DOCKER_ASR_PATH}\""
    fi
    if [ ! -z "$PARAMS_VAD_ID" ]; then
        vad_params="\"--vad-dir\":\"${PARAMS_VAD_ID}\""
    else
        if [ ! -z "$PARAMS_LOCAL_VAD_PATH" ]; then
            dir_map_params="${dir_map_params} -v ${PARAMS_LOCAL_VAD_PATH}:${PARAMS_DOCKER_VAD_PATH}"
        fi
        vad_params="\"--vad-dir\":\"${PARAMS_DOCKER_VAD_PATH}\""
    fi
    if [ ! -z "$PARAMS_PUNC_ID" ]; then
        punc_params="\"--punc-dir\":\"${PARAMS_PUNC_ID}\""
    else
        if [ ! -z "$PARAMS_LOCAL_PUNC_PATH" ]; then
            dir_map_params="${dir_map_params} -v ${PARAMS_LOCAL_PUNC_PATH}:${PARAMS_DOCKER_PUNC_PATH}"
        fi
        punc_params="\"--punc-dir\":\"${PARAMS_DOCKER_PUNC_PATH}\""
    fi
    download_params="\"--download-model-dir\":\"${PARAMS_DOWNLOAD_MODEL_DIR}\""
    model_params="${asr_params},${vad_params},${punc_params},${download_params}"
    # params about thread_num
    decoder_params="\"--decoder-thread-num\":\"${PARAMS_DECODER_THREAD_NUM}\""
    io_params="\"--io-thread-num\":\"${PARAMS_IO_THREAD_NUM}\""
    thread_params=${decoder_params},${io_params}
    # params about port and ssl
    port_params="\"--port\":\"${PARAMS_DOCKER_PORT}\""
    if [ $PARAMS_SSL_FLAG -eq 0 ]; then
        crt_path="\"--certfile\":\"\""
        key_path="\"--keyfile\":\"\""
    else
        crt_path="\"--certfile\":\"/workspace/FunASR/funasr/runtime/ssl_key/server.crt\""
        key_path="\"--keyfile\":\"/workspace/FunASR/funasr/runtime/ssl_key/server.key\""
    fi
    exec_params="\"exec\":\"${PARAMS_DOCKER_EXEC_PATH}\""
    daemon_server_config="{\"server\":[{${exec_params},${model_params},${thread_params},${port_params},${crt_path},${key_path}}]}"
}
installPythonDependencyForPython(){
@@ -940,19 +1045,19 @@
    lowercase_osid=$(echo ${OSID} | tr '[A-Z]' '[a-z]')
    case "$lowercase_osid" in
        ubuntu)
            pre_cmd="sudo apt-get install -y ffmpeg"
            pre_cmd="${SUDO_CMD} apt-get install -y ffmpeg"
            ;;
        centos)
            pre_cmd="sudo yum install -y ffmpeg"
            pre_cmd="${SUDO_CMD} yum install -y ffmpeg"
            ;;
        debian)
            pre_cmd="sudo apt-get install -y ffmpeg"
            pre_cmd="${SUDO_CMD} apt-get install -y ffmpeg"
            ;;
        \"alios\")
            pre_cmd="sudo yum install -y ffmpeg"
            pre_cmd="${SUDO_CMD} yum install -y ffmpeg"
            ;;
        \"alinux\")
            pre_cmd="sudo yum install -y ffmpeg"
            pre_cmd="${SUDO_CMD} yum install -y ffmpeg"
            ;;
        *)
            echo -e "  ${RED}$lowercase_osid is not supported.${PLAIN}"
@@ -985,21 +1090,83 @@
    fi
}
checkDockerExist(){
    result=$(sudo docker ps | grep ${PARAMS_DOCKER_IMAGE} | wc -l)
    result=`expr ${result} + 0`
    if [ ${result} -ne 0 ]; then
        echo
        echo -e "  ${RED}Docker: ${PARAMS_DOCKER_IMAGE} has been launched, please run (${PLAIN}${GREEN}sudo bash funasr-runtime-deploy-offline-cpu-zh.sh stop${PLAIN}${RED}) to stop Docker first.${PLAIN}"
        return 50
getDockerId(){
    id=""
    array=($(${SUDO_CMD} docker ps -a | grep ${PARAMS_DOCKER_IMAGE} | awk '{print $1}'))
    len=${#array[@]}
    if [ $len -ge 1 ]; then
        # get the first id
        id=${array[0]}
        if [ ! -z "$id" ]; then
            PARAMS_FUNASR_DOCKER_ID=$id
        fi
    fi
}
dockerExit(){
    echo -e "  ${YELLOW}Stop docker(${PLAIN}${GREEN}${PARAMS_DOCKER_IMAGE}${PLAIN}${YELLOW}) server ...${PLAIN}"
    sudo docker stop `sudo docker ps -a| grep ${PARAMS_DOCKER_IMAGE} | awk '{print $1}' `
checkDockerImageExist(){
    result=1
    if [ -z "$PARAMS_DOCKER_IMAGE" ]; then
        return 50
    else
        result=$(${SUDO_CMD} docker ps | grep ${PARAMS_DOCKER_IMAGE} | wc -l)
    fi
    result=`expr ${result} + 0`
    echo "checkDockerImageExist result0: " $result
    if [ $result -ne 0 ]; then
        # found docker
        return 51
    else
        return 0
    fi
}
checkDockerIdExist(){
    result=0
    if [ -z "$PARAMS_FUNASR_DOCKER_ID" ]; then
        if [ -z "$PARAMS_DOCKER_IMAGE" ]; then
            return 63
        else
            return 60
        fi
    else
        if [ -z "$PARAMS_DOCKER_IMAGE" ]; then
            return 61
        else
            if [ "$1" = "install" ]; then
                result=$(${SUDO_CMD} docker ps -a | grep ${PARAMS_DOCKER_IMAGE} | grep ${PARAMS_FUNASR_DOCKER_ID} | wc -l)
            else
                result=$(${SUDO_CMD} docker ps | grep ${PARAMS_DOCKER_IMAGE} | grep ${PARAMS_FUNASR_DOCKER_ID} | wc -l)
            fi
        fi
    fi
    result=`expr ${result} + 0`
    if [ $result -eq 1 ]; then
        # found docker
        return 62
    else
        return 0
    fi
}
dockerStop(){
    if [ -z "$PARAMS_FUNASR_DOCKER_ID" ]; then
        echo -e "  ${RED}DOCKER_ID is empty, cannot stop docker.${PLAIN}"
    else
        echo -e "  ${YELLOW}Stop docker(${PLAIN}${GREEN}${PARAMS_DOCKER_IMAGE} ${PARAMS_FUNASR_DOCKER_ID}${PLAIN}${YELLOW}) server ...${PLAIN}"
        ${SUDO_CMD} docker stop ${PARAMS_FUNASR_DOCKER_ID}
    fi
    echo
    sleep 1
}
dockerRemove(){
    if [ -z "$PARAMS_FUNASR_DOCKER_ID" ]; then
        echo -e "  ${RED}DOCKER_ID is empty, cannot remove docker.${PLAIN}"
    else
        echo -e "  ${YELLOW}Remove docker(${PLAIN}${GREEN}${PARAMS_DOCKER_IMAGE} ${PARAMS_FUNASR_DOCKER_ID}${PLAIN}${YELLOW}) ...${PLAIN}"
        ${SUDO_CMD} docker rm ${PARAMS_FUNASR_DOCKER_ID}
    fi
    echo
}
modelChange(){
@@ -1007,13 +1174,16 @@
    model_id=$2
    local_flag=0
    if [ -d "$model_id" ]; then
    relativePathToFullPath $model_id
    if [ -d "$full_path" ]; then
        local_flag=1
        model_id=$full_path
    else
        local_flag=0
    fi
    full_path=""
    result=$(echo $model_type | grep "--asr_model")
    result=$(echo ${model_type} | grep "\-\-asr_model")
    if [ "$result" != "" ]; then
        if [ $local_flag -eq 0 ]; then
            PARAMS_ASR_ID=$model_id
@@ -1029,13 +1199,12 @@
            else
                model_name=$(basename "${PARAMS_LOCAL_ASR_PATH}")
                PARAMS_LOCAL_ASR_DIR=$(dirname "${PARAMS_LOCAL_ASR_PATH}")
                middle=${PARAMS_LOCAL_ASR_DIR#*"${PARAMS_FUNASR_LOCAL_MODELS_DIR}"}
                PARAMS_DOCKER_ASR_DIR=$PARAMS_DOWNLOAD_MODEL_DIR
                PARAMS_DOCKER_ASR_PATH=${PARAMS_DOCKER_ASR_DIR}/${middle}/${model_name}
                PARAMS_DOCKER_ASR_PATH=${PARAMS_DOCKER_ASR_DIR}/${model_name}
            fi
        fi
    fi
    result=$(echo ${model_type} | grep "--vad_model")
    result=$(echo ${model_type} | grep "\-\-vad_model")
    if [ "$result" != "" ]; then
        if [ $local_flag -eq 0 ]; then
            PARAMS_VAD_ID=$model_id
@@ -1051,13 +1220,12 @@
            else
                model_name=$(basename "${PARAMS_LOCAL_VAD_PATH}")
                PARAMS_LOCAL_VAD_DIR=$(dirname "${PARAMS_LOCAL_VAD_PATH}")
                middle=${PARAMS_LOCAL_VAD_DIR#*"${PARAMS_FUNASR_LOCAL_MODELS_DIR}"}
                PARAMS_DOCKER_VAD_DIR=$PARAMS_DOWNLOAD_MODEL_DIR
                PARAMS_DOCKER_VAD_PATH=${PARAMS_DOCKER_VAD_DIR}/${middle}/${model_name}
                PARAMS_DOCKER_VAD_PATH=${PARAMS_DOCKER_VAD_DIR}/${model_name}
            fi
        fi
    fi
    result=$(echo $model_type | grep "--punc_model")
    result=$(echo ${model_type} | grep "\-\-punc_model")
    if [ "$result" != "" ]; then
        if [ $local_flag -eq 0 ]; then
            PARAMS_PUNC_ID=$model_id
@@ -1068,9 +1236,8 @@
        else
            model_name=$(basename "${PARAMS_LOCAL_PUNC_PATH}")
            PARAMS_LOCAL_PUNC_DIR=$(dirname "${PARAMS_LOCAL_PUNC_PATH}")
            middle=${PARAMS_LOCAL_PUNC_DIR#*"${PARAMS_FUNASR_LOCAL_MODELS_DIR}"}
            PARAMS_DOCKER_PUNC_DIR=$PARAMS_DOWNLOAD_MODEL_DIR
            PARAMS_DOCKER_PUNC_PATH=${PARAMS_DOCKER_PUNC_DIR}/${middle}/${model_name}
            PARAMS_DOCKER_PUNC_PATH=${PARAMS_DOCKER_PUNC_DIR}/${model_name}
        fi
    fi
}
@@ -1082,11 +1249,11 @@
    if [ ! -z "$val" ]; then
        num=`expr ${val} + 0`
        if [ $num -ge 1 ] && [ $num -le 1024 ]; then
            result=$(echo ${type} | grep "--decode_thread_num")
            result=$(echo ${type} | grep "\-\-decode_thread_num")
            if [ "$result" != "" ]; then
                PARAMS_DECODER_THREAD_NUM=$num
            fi
            result=$(echo ${type} | grep "--io_thread_num")
            result=$(echo ${type} | grep "\-\-io_thread_num")
            if [ "$result" != "" ]; then
                PARAMS_IO_THREAD_NUM=$num
            fi
@@ -1164,12 +1331,8 @@
        pre_cmd=""
        case "$lang" in
            Linux_Cpp)
                pre_cmd="export LD_LIBRARY_PATH=${PARAMS_FUNASR_SAMPLES_LOCAL_DIR}/cpp/libs:\$LD_LIBRARY_PATH"
                client_exec="${PARAMS_FUNASR_SAMPLES_LOCAL_DIR}/cpp/funasr-wss-client"
                run_cmd="${client_exec} --server-ip ${server_ip} --port ${host_port} --wav-path ${wav_path}"
                echo -e "  Run ${BLUE}${pre_cmd}${PLAIN}"
                $pre_cmd
                echo
                ;;
            Python)
                client_exec="${PARAMS_FUNASR_SAMPLES_LOCAL_DIR}/python/wss_client_asr.py"
@@ -1202,12 +1365,13 @@
    selectDockerImages
    result=$?
    result=`expr ${result} + 0`
    if [ ${result} -eq 50 ]; then
        return 50
    if [ $result -ne 0 ]; then
        return $result
    fi
    setupHostPort
    complementParameters
    return 0
}
# Display Help info
@@ -1220,14 +1384,17 @@
    echo -e "${UNDERLINE}Options${PLAIN}:"
    echo -e "   ${BOLD}-i, install, --install${PLAIN}    Install and run FunASR docker."
    echo -e "                install [--workspace] <workspace in local>"
    echo -e "                install [--ssl] <0: close SSL; 1: open SSL, default:1>"
    echo -e "   ${BOLD}-s, start  , --start${PLAIN}      Run FunASR docker with the configuration that has already been set."
    echo -e "   ${BOLD}-p, stop   , --stop${PLAIN}       Stop FunASR docker."
    echo -e "   ${BOLD}-m, remove , --remove${PLAIN}     Remove the installed FunASR docker."
    echo -e "   ${BOLD}-r, restart, --restart${PLAIN}    Restart FunASR docker."
    echo -e "   ${BOLD}-u, update , --update${PLAIN}     Update parameters that have already been set."
    echo -e "                update [--workspace] <workspace in local>"
    echo -e "                update [--asr_model | --vad_model | --punc_model] <model_id or local model path>"
    echo -e "                update [--host_port | --docker_port] <port number>"
    echo -e "                update [--decode_thread_num | --io_thread_num] <the number of threads>"
    echo -e "                update [--ssl] <0: close SSL; 1: open SSL, default:1>"
    echo -e "   ${BOLD}-c, client , --client${PLAIN}     Get a client example to show how to initiate speech recognition."
    echo -e "   ${BOLD}-o, show   , --show${PLAIN}       Displays all parameters that have been set."
    echo -e "   ${BOLD}-v, version, --version${PLAIN}    Display current script version."
@@ -1253,9 +1420,12 @@
                if [ "$stage" = "--workspace" ]; then
                    relativePathToFullPath $val
                    PARAMS_FUNASR_LOCAL_WORKSPACE=$full_path
                    full_path=""
                    if [ ! -z "$PARAMS_FUNASR_LOCAL_WORKSPACE" ]; then
                        mkdir -p $PARAMS_FUNASR_LOCAL_WORKSPACE
                    fi
                elif [ "$stage" = "--ssl" ]; then
                    PARAMS_SSL_FLAG=`expr ${val} + 0`
                fi
            fi
        done
@@ -1266,8 +1436,8 @@
OSID=$(grep ^ID= /etc/os-release | cut -d= -f2)
OSVER=$(lsb_release -cs)
OSNUM=$(grep -oE  "[0-9.]+" /etc/issue)
CPUNUM=$(cat /proc/cpuinfo |grep "processor"|wc -l)
DOCKERINFO=$(sudo docker info | wc -l)
CPUNUM=$(cat /proc/cpuinfo | grep "processor"|wc -l)
DOCKERINFO=$(${SUDO_CMD} docker info | wc -l)
DOCKERINFOLEN=`expr ${DOCKERINFO} + 0`
# PARAMS
@@ -1279,8 +1449,8 @@
PARAMS_FUNASR_SAMPLES_LOCAL_PATH=${PARAMS_FUNASR_LOCAL_WORKSPACE}/${DEFAULT_SAMPLES_NAME}.tar.gz
#  The dir stored models in local
PARAMS_FUNASR_LOCAL_MODELS_DIR="${PARAMS_FUNASR_LOCAL_WORKSPACE}/models"
#  The path of configuration in local
PARAMS_FUNASR_CONFIG_PATH="${PARAMS_FUNASR_LOCAL_WORKSPACE}/config"
#  The id of started docker
PARAMS_FUNASR_DOCKER_ID=""
#  The server excutor in local
PARAMS_DOCKER_EXEC_PATH=$DEFAULT_DOCKER_EXEC_PATH
@@ -1329,6 +1499,7 @@
PARAMS_DOCKER_PORT="10095"
PARAMS_DECODER_THREAD_NUM="32"
PARAMS_IO_THREAD_NUM="8"
PARAMS_SSL_FLAG=1
echo -e "#############################################################"
@@ -1352,47 +1523,97 @@
        paramsConfigure
        result=$?
        result=`expr ${result} + 0`
        if [ ${result} -ne 50 ]; then
            showAllParams
        if [ $result -eq 0 ]; then
            showAllParams "install"
            installFunasrDocker
            dockerRun
            dockerRun "install"
            result=$?
            stage=`expr ${result} + 0`
            if [ $stage -eq 98 ]; then
                dockerExit
                dockerRun
            fi
            try_count=1
            while true
            do
                stage=`expr ${result} + 0`
                if [ $try_count -ge 10 ]; then
                    break
                else
                    # 98: cannot find process in Docker
                    if [ $stage -eq 98 ]; then
                        dockerStop
                        dockerRun "start"
                        result=$?
                        let try_count=try_count+1
                    else
                        break
                    fi
                fi
            done
        fi
        ;;
    start|-s|--start)
        rootNess
        paramsFromDefault
        showAllParams
        dockerRun
        showAllParams "only_show"
        dockerRun "start"
        result=$?
        stage=`expr ${result} + 0`
        if [ $stage -eq 98 ]; then
            dockerExit
            dockerRun
        fi
        try_count=1
        while true
        do
            stage=`expr ${result} + 0`
            if [ $try_count -ge 10 ]; then
                break
            else
                # 98: cannot find process in Docker
                if [ $stage -eq 98 ]; then
                    dockerStop
                    dockerRun "start"
                    result=$?
                    let try_count=try_count+1
                else
                    break
                fi
            fi
        done
        ;;
    restart|-r|--restart)
        rootNess
        paramsFromDefault
        showAllParams
        dockerExit
        dockerRun
        showAllParams "only_show"
        dockerStop
        dockerRun "start"
        result=$?
        stage=`expr ${result} + 0`
        if [ $stage -eq 98 ]; then
            dockerExit
            dockerRun
        fi
        try_count=1
        while true
        do
            stage=`expr ${result} + 0`
            if [ $try_count -ge 10 ]; then
                break
            else
                # 98: cannot find process in Docker
                if [ $stage -eq 98 ]; then
                    dockerStop
                    dockerRun "start"
                    result=$?
                    let try_count=try_count+1
                else
                    break
                fi
            fi
        done
        ;;
    stop|-p|--stop)
        rootNess
        paramsFromDefault
        dockerExit
        dockerStop
        ;;
    remove|-m|--remove)
        rootNess
        paramsFromDefault
        dockerStop
        dockerRemove
        rm -f ${DEFAULT_FUNASR_CONFIG_FILE}
        rm -f ${DEFAULT_FUNASR_SERVER_CONFIG_FILE}
        ;;
    update|-u|--update)
        rootNess
@@ -1413,6 +1634,13 @@
                if [ ! -z "$PARAMS_FUNASR_LOCAL_WORKSPACE" ]; then
                    mkdir -p $PARAMS_FUNASR_LOCAL_WORKSPACE
                fi
            elif [ "$type" = "--ssl" ]; then
                switch=`expr ${val} + 0`
                if [ $switch -eq 0 ]; then
                    PARAMS_SSL_FLAG=0
                else
                    PARAMS_SSL_FLAG=1
                fi
            else
                displayHelp
            fi
@@ -1422,15 +1650,30 @@
        initParameters
        complementParameters
        showAllParams
        dockerExit
        dockerRun
        showAllParams "install"
        dockerStop
        dockerRun "start"
        result=$?
        stage=`expr ${result} + 0`
        if [ $stage -eq 98 ]; then
            dockerExit
            dockerRun
        fi
        try_count=1
        while true
        do
            stage=`expr ${result} + 0`
            if [ $try_count -ge 10 ]; then
                break
            else
                # 98: cannot find process in Docker
                # 60: DOCKER_ID is empty
                if [ $stage -eq 98 ] || [ $stage -eq 60 ]; then
                    dockerStop
                    dockerRun "start"
                    result=$?
                    let try_count=try_count+1
                else
                    break
                fi
            fi
        done
        ;;
    client|-c|--client)
        rootNess
@@ -1441,7 +1684,7 @@
    show|-o|--show)
        rootNess
        paramsFromDefault
        showAllParams
        showAllParams "only_show"
        ;;
    *)
        displayHelp
funasr/runtime/docs/SDK_advanced_guide_offline.md
@@ -169,7 +169,7 @@
### python-client
```shell
python wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
```
Introduction to command parameters:
funasr/runtime/docs/SDK_advanced_guide_offline_zh.md
@@ -2,11 +2,47 @@
FunASR provides a Chinese offline file transcription service that can be deployed with one click on a local machine or a cloud server, built on the already open-sourced FunASR runtime-SDK. FunASR-runtime combines the voice activity detection (VAD), Paraformer-large speech recognition (ASR) and punctuation restoration (PUNC) capabilities open-sourced by the DAMO Academy Speech Lab on the ModelScope community, enabling accurate and efficient high-concurrency transcription of audio.
This document is the development guide for the FunASR offline file transcription service. If you want to quickly try the offline file transcription service, please refer to the one-click deployment example of the FunASR offline file transcription service ([click here](./SDK_tutorial_cn.md)).
This document is the development guide for the FunASR offline file transcription service. If you want to quickly try the offline file transcription service, see [Quick Start](#quick-start).
## Quick Start
### Launch the image
Pull and launch the docker image of the FunASR runtime-SDK with the following commands:
```shell
sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.1.0
sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.1.0
```
If docker is not installed yet, see [Install Docker](#install-docker)
### Start the server
After the docker container is up, start the funasr-wss-server program:
```shell
cd FunASR/funasr/runtime
./run_server.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --punc-dir damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
```
For detailed server-side parameters, see [Server parameters](#server-parameters)
### Test and use the client
Download the client test tool directory samples
```shell
wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
```
Taking the Python client as an example: it supports multiple audio input formats (.wav, .pcm, .mp3, etc.), video input (.mp4, etc.), and multi-file wav.scp lists. For clients in other languages, refer to the documentation ([click here](#client-usage-in-detail)); for customized service deployment, see [How to customize service deployment](#how-to-customize-service-deployment)
```shell
python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
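`--audio_in` also accepts a Kaldi-style wav.scp list (one `wav_id<TAB>wav_path` entry per line); a minimal sketch of creating one, with placeholder ids and paths:

```shell
# Build a two-entry wav.scp; the ids and paths below are placeholders.
printf 'utt_1\t/data/audio/asr_example_1.wav\n' >  wav.scp
printf 'utt_2\t/data/audio/asr_example_2.wav\n' >> wav.scp

# Pass it to the client instead of a single wav:
#   python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "./wav.scp"
cat wav.scp
```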
------------------
## Install Docker
The following steps install docker and the docker image manually; if your docker image is already running, you can skip this step:
The following steps install the docker environment manually:
### Install the docker environment
```shell
@@ -30,35 +66,63 @@
sudo systemctl start docker
```
### Pull and launch the image
Pull and launch the docker image of the FunASR runtime-SDK with the following commands:
## Client usage in detail
After the FunASR service has been deployed on the server, you can test and use the offline file transcription service with the following steps.
Clients in the following programming languages are currently supported:
- [Python](#python-client)
- [CPP](#cpp-client)
- [HTML web version](#html-web-version)
- [Java](#Java-client)
### python-client
To run the client directly for testing, refer to the brief instructions below, taking the Python version as an example:
```shell
sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest
sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest
python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav" --output_dir "./results"
```
Introduction to command parameters:
Command parameter description:
```text
-p <host port>:<docker port to map to>
As in the example, host (ECS) port 10095 is mapped to docker port 10095, provided that port 10095 is opened in the ECS security rules.
-v <host path>:<path mounted into docker>
As in the example, the host path /root is mounted to the docker path /workspace/models
--host the IP of the machine where the FunASR runtime-SDK service is deployed, defaulting to the local IP (127.0.0.1); if the client and the service are not on the same machine, change it to the IP of the deployment machine
--port 10095 the deployment port
--mode offline means offline file transcription
--audio_in the audio file to transcribe; supports a file path or a wav.scp file list
--output_dir the path where recognition results are saved
```
## Start the server
After the docker container is up, start the funasr-wss-server program:
### cpp-client
After entering the samples/cpp directory, you can run the cpp test as follows:
```shell
./run_server.sh --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --punc-dir damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path ../audio/asr_example.wav
```
Detailed introduction to command parameters:
Command parameter description:
```text
--server-ip the IP of the machine where the FunASR runtime-SDK service is deployed, defaulting to the local IP (127.0.0.1); if the client and the service are not on the same machine, change it to the IP of the deployment machine
--port 10095 the deployment port
--wav-path the audio file to transcribe; supports a file path
```
### HTML web version
Open html/static/index.html in a browser to get the page shown below, which supports microphone input and file upload for a direct trial
<img src="images/html.png"  width="900"/>
### Java-client
```shell
FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline
```
For details, refer to the documentation ([click here](../java/readme.md))
## Server parameters:
funasr-wss-server supports downloading models from ModelScope: set the model download directory (--download-model-dir, default /workspace/models) and the model IDs (--model-dir, --vad-dir, --punc-dir), for example:
```shell
@@ -76,21 +140,21 @@
 ```
Introduction to command parameters:
```text
--download-model-dir # the model download directory; models are downloaded from ModelScope by setting the model ID
--model-dir # modelscope model ID
--quantize  # True for the quantized ASR model, False for the non-quantized one; default True
--vad-dir # modelscope model ID
--vad-quant  # True for the quantized VAD model, False for the non-quantized one; default True
--punc-dir # modelscope model ID
--punc-quant  # True for the quantized PUNC model, False for the non-quantized one; default True
--port # the port the server listens on; default 10095
--decoder-thread-num # the number of inference threads started by the server; default 8
--io-thread-num # the number of IO threads started by the server; default 1
--certfile <string> # the SSL certificate file; default: ../../../ssl_key/server.crt
--keyfile <string> # the SSL key file; default: ../../../ssl_key/server.key
--download-model-dir the model download directory; models are downloaded from ModelScope by setting the model ID
--model-dir  modelscope model ID
--quantize  True for the quantized ASR model, False for the non-quantized one; default True
--vad-dir  modelscope model ID
--vad-quant   True for the quantized VAD model, False for the non-quantized one; default True
--punc-dir  modelscope model ID
--punc-quant   True for the quantized PUNC model, False for the non-quantized one; default True
--port  the port the server listens on; default 10095
--decoder-thread-num  the number of inference threads started by the server; default 8
--io-thread-num  the number of IO threads started by the server; default 1
--certfile  the SSL certificate file; default: ../../../ssl_key/server.crt
--keyfile   the SSL key file; default: ../../../ssl_key/server.key
```
funasr-wss-server also supports loading models from a local path (see [Prepare model resources](#anchor-1) for preparing local model resources), for example:
funasr-wss-server also supports loading models from a local path (see [Prepare model resources](#prepare-model-resources) for preparing local model resources), for example:
```shell
cd /workspace/FunASR/funasr/runtime/websocket/build/bin
./funasr-wss-server  \
@@ -105,32 +169,32 @@
 ```
Introduction to command parameters:
```text
--model-dir # the ASR model path; default: /workspace/models/asr
--quantize  # True for the quantized ASR model, False for the non-quantized one; default True
--vad-dir # the VAD model path; default: /workspace/models/vad
--vad-quant  # True for the quantized VAD model, False for the non-quantized one; default True
--punc-dir # the PUNC model path; default: /workspace/models/punc
--punc-quant  # True for the quantized PUNC model, False for the non-quantized one; default True
--port # the port the server listens on; default 10095
--decoder-thread-num # the number of inference threads started by the server; default 8
--io-thread-num # the number of IO threads started by the server; default 1
--certfile <string> # the SSL certificate file; default: ../../../ssl_key/server.crt
--keyfile <string> # the SSL key file; default: ../../../ssl_key/server.key
--model-dir  the ASR model path; default: /workspace/models/asr
--quantize   True for the quantized ASR model, False for the non-quantized one; default True
--vad-dir  the VAD model path; default: /workspace/models/vad
--vad-quant   True for the quantized VAD model, False for the non-quantized one; default True
--punc-dir  the PUNC model path; default: /workspace/models/punc
--punc-quant   True for the quantized PUNC model, False for the non-quantized one; default True
--port  the port the server listens on; default 10095
--decoder-thread-num  the number of inference threads started by the server; default 8
--io-thread-num  the number of IO threads started by the server; default 1
--certfile the SSL certificate file; default: ../../../ssl_key/server.crt
--keyfile  the SSL key file; default: ../../../ssl_key/server.key
```
## <a id="anchor-1">Prepare model resources</a>
## Prepare model resources
If you choose to have funasr-wss-server download models from ModelScope, you can skip this step.
The VAD, ASR and PUNC model resources used by the FunASR offline file transcription service all come from ModelScope; the model addresses are listed in the table below:
| Model | ModelScope link                                                                                                   |
|------|------------------------------------------------------------------------------------------------------------------|
| VAD  | https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary                           |
| ASR  | https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary |
| PUNC | https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary               |
| Model | ModelScope link                                                                                                |
|------|---------------------------------------------------------------------------------------------------------------|
| VAD  | https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary                           |
| ASR  | https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx/summary |
| PUNC | https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx/summary               |
The offline file transcription service deploys quantized ONNX models. The following describes how to export ONNX models and quantize them. You can export an ONNX model from ModelScope, from a local file, or from finetuned resources:
The offline file transcription service deploys quantized ONNX models. The following describes how to export ONNX models and quantize them. You can export an ONNX model from ModelScope or from finetuned resources:
### Export an ONNX model from ModelScope
@@ -153,22 +217,6 @@
--type the model type; ONNX and torch are currently supported
--quantize  int8 model quantization
```
### Export an ONNX model from a local file
Set the model name to a local model path to export the quantized ONNX model:
```shell
python -m funasr.export.export_model --model-name /workspace/models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --export-dir ./export --type onnx --quantize True
```
Introduction to command parameters:
```text
--model-name  the local model path, e.g. /workspace/models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
--export-dir  the export directory for the ONNX model
--type the model type; ONNX and torch are currently supported
--quantize  int8 model quantization
```
### Export a model from finetuned resources
If you want to deploy a finetuned model, you can follow the steps below:
@@ -179,36 +227,18 @@
python -m funasr.export.export_model --model-name /path/to/finetune/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --export-dir ./export --type onnx --quantize True
```
## Start the client
After the FunASR offline file transcription service has been deployed on the server, you can test and use it with the following steps. FunASR-bin currently supports starting the client in several ways; below are command-line examples for python-client and c++-client, plus the websocket protocol for custom clients:
### python-client
```shell
python wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
```
Introduction to command parameters:
```text
--host # the server IP address; use 127.0.0.1 for local testing
--port # the port the server listens on
--audio_in # the audio input; can be a wav path or a wav.scp path (a Kaldi-style wav list: wav_id \t wav_path)
--output_dir # the output path for recognition results
--ssl # whether to use SSL encryption; enabled by default
--mode # offline mode
```
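The wav.scp format mentioned above (`wav_id \t wav_path`) is plain text, so iterating over its entries in a wrapper script is straightforward; a minimal sketch with placeholder contents:

```shell
# Placeholder wav.scp contents; in practice the paths point at real audio files.
printf 'utt_1\t/data/a.wav\nutt_2\t/data/b.wav\n' > wav.scp

# Read the tab-separated id and path per line and act on each entry.
while IFS=$'\t' read -r wav_id wav_path; do
    echo "would transcribe ${wav_id}: ${wav_path}"
done < wav.scp
```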
## How to customize service deployment
### c++-client:
```shell
./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path test.wav --thread-num 1 --is-ssl 1
```
Introduction to command parameters:
```text
--server-ip # the server IP address; use 127.0.0.1 for local testing
--port # the port the server listens on
--wav-path # the audio input; can be a wav path or a wav.scp path (a Kaldi-style wav list: wav_id \t wav_path)
--thread-num # the number of client threads
--is-ssl # whether to use SSL encryption; enabled by default
```
The FunASR-runtime code is open source; if the server and client do not fully meet your needs, you can develop them further to suit your own requirements:
### c++ client:
https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/websocket
### python client:
https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/websocket
### custom client:
@@ -223,16 +253,6 @@
{"is_speaking": False}
```
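The end-of-audio frame shown above is written as a Python literal; assuming it is serialized as standard JSON on the wire, the boolean is lowercase `false`. A minimal sketch composing and sanity-checking the frame (the field name is taken from the snippet above):

```shell
# End-of-audio control frame sent by a custom client (standard JSON, lowercase false).
msg='{"is_speaking": false}'

# Sanity-check that the frame is valid JSON and read the flag back.
echo "$msg" | python3 -c 'import json,sys; print(json.load(sys.stdin)["is_speaking"])'
```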
## How to customize service deployment
The FunASR-runtime code is open source; if the server and client do not fully meet your needs, you can develop them further to suit your own requirements:
### c++ client:
https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/websocket
### python client:
https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/websocket
### c++ server:
#### VAD
@@ -265,4 +285,4 @@
FUNASR_RESULT result=CTTransformerInfer(punc_hanlde, txt_str.c_str(), RASR_NONE, NULL);
// where punc_hanlde is the value returned by CTTransformerInit, and txt_str is the text
```
For a usage example, see: https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-offline-punc.cpp
funasr/runtime/docs/SDK_tutorial.md
@@ -275,7 +275,7 @@
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Requirement already satisfied: websockets in /usr/local/lib/python3.8/dist-packages (from -r /root/funasr_samples/python/requirements_client.txt (line 1)) (11.0.3)
  Run python3 /root/funasr_samples/python/wss_client_asr.py --host 127.0.0.1 --port 10095 --mode offline --audio_in /root/funasr_samples/audio/asr_example.wav --send_without_sleep --output_dir ./funasr_samples/python
  Run python3 /root/funasr_samples/python/funasr_wss_client.py --host 127.0.0.1 --port 10095 --mode offline --audio_in /root/funasr_samples/audio/asr_example.wav --send_without_sleep --output_dir ./funasr_samples/python
  ...
  ...
@@ -284,7 +284,7 @@
Exception: sent 1000 (OK); then received 1000 (OK)
end
  If failed, you can try (python3 /root/funasr_samples/python/wss_client_asr.py --host 127.0.0.1 --port 10095 --mode offline --audio_in /root/funasr_samples/audio/asr_example.wav --send_without_sleep --output_dir ./funasr_samples/python) in your Shell.
  If failed, you can try (python3 /root/funasr_samples/python/funasr_wss_client.py --host 127.0.0.1 --port 10095 --mode offline --audio_in /root/funasr_samples/audio/asr_example.wav --send_without_sleep --output_dir ./funasr_samples/python) in your Shell.
```
@@ -292,7 +292,7 @@
If you want to directly run the client for testing, you can refer to the following simple instructions, taking the Python version as an example:
```shell
python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav" --send_without_sleep --output_dir "./results"
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav" --send_without_sleep --output_dir "./results"
```
Command parameter instructions: 
funasr/runtime/docs/SDK_tutorial_zh.md
@@ -10,32 +10,36 @@
- Configuration 2: (X86, compute-optimized), 16 vCPU cores, 32 GB memory; a single machine can support roughly 64 concurrent requests
- Configuration 3: (X86, compute-optimized), 64 vCPU cores, 128 GB memory; a single machine can support roughly 200 concurrent requests
Detailed performance benchmark report ([click here](./benchmark_onnx_cpp.md))
Cloud providers offer a 3-month free trial for new users; application tutorial ([click here](./aliyun_server_tutorial.md))
## Quick Start
### Start the server
`Note`: the one-click deployment tool runs through installing docker, downloading the docker image, and starting the service. If you prefer to start directly from the FunASR docker image, refer to the development guide ([click here](./SDK_advanced_guide_offline_zh.md))
Download the deployment tool `funasr-runtime-deploy-offline-cpu-zh.sh`
```shell
curl -O https://raw.githubusercontent.com/alibaba-damo-academy/FunASR/main/funasr/runtime/deploy_tools/funasr-runtime-deploy-offline-cpu-zh.sh;
# If you run into network problems, users in mainland China can use the command below:
# curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-offline-cpu-zh.sh;
```
Run the deployment tool and press Enter at the prompts to complete server installation and deployment. The convenience deployment tool currently supports only Linux; for other environments, refer to the development guide ([click here](./SDK_advanced_guide_zh.md))
Run the deployment tool and press Enter at the prompts to complete server installation and deployment. The convenience deployment tool currently supports only Linux; for other environments, refer to the development guide ([click here](./SDK_advanced_guide_offline_zh.md))
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh install --workspace /root/funasr-runtime-resources
```
### Test and use the client
After running the installation command above, the client test tool directory samples is downloaded into /root/funasr-runtime-resources (the default installation directory),
After running the installation command above, the client test tool directory samples is downloaded into /root/funasr-runtime-resources (the default installation directory) (for manual download, [click here](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz)),
Taking the Python client as an example: it supports multiple audio input formats (.wav, .pcm, .mp3, etc.), video input (.mp4, etc.), and multi-file wav.scp lists. For clients in other languages, refer to the documentation ([click here](#client-usage-in-detail))
```shell
python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav" --output_dir "./results"
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
## Client usage in detail
@@ -54,7 +58,7 @@
To run the client directly for testing, refer to the brief instructions below, taking the Python version as an example:
```shell
python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav" --output_dir "./results"
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
Command parameter description:
@@ -63,7 +67,8 @@
--port 10095 the deployment port
--mode offline means offline file transcription
--audio_in the audio file to transcribe; supports a file path or a wav.scp file list
--output_dir the path where recognition results are saved
--thread_num the number of concurrent sending threads; default 1
--ssl whether SSL certificate verification is enabled; enabled (1) by default, set to 0 to disable
```
### cpp-client
@@ -78,6 +83,8 @@
--server-ip the IP of the machine where the FunASR runtime-SDK service is deployed, defaulting to the local IP (127.0.0.1); if the client and the service are not on the same machine, change it to the IP of the deployment machine
--port 10095 the deployment port
--wav-path the audio file to transcribe; supports a file path
--thread_num the number of concurrent sending threads; default 1
--ssl whether SSL certificate verification is enabled; enabled (1) by default, set to 0 to disable
```
### HTML web version
@@ -108,6 +115,13 @@
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh stop
```
### Release the FunASR service
Release the FunASR service that has been deployed.
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh remove
```
### Restart the FunASR service
Restart the FunASR service with the settings from the previous one-click deployment.
@@ -134,6 +148,7 @@
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update [--host_port | --docker_port] <port number>
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update [--decode_thread_num | --io_thread_num] <the number of threads>
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update [--workspace] <workspace in local>
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update [--ssl] <0: close SSL; 1: open SSL, default:1>
e.g.
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update --decode_thread_num 32
@@ -143,7 +158,7 @@
## Server startup configuration in detail
##### Select the FunASR docker image
### Select the FunASR docker image
Option 1), our latest release image, is recommended; historical versions can also be selected.
```text
[1/5]
@@ -157,7 +172,7 @@
```
##### Set the host port exposed to FunASR
### Set the host port exposed to FunASR
Set the host port provided to docker, default 10095. Make sure this port is available.
```text
[2/5]
@@ -167,6 +182,22 @@
  The port in Docker for FunASR server is 10095
```
### Set SSL
SSL verification is enabled by default; to disable it, set the flag at startup
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh --ssl 0
```
## Contact us
If you run into problems while using the service, feel free to join the user group to report them
|                                    DingTalk group                                     |                                      WeChat               |
|:----------------------------------------------------------------------------:|:-----------------------------------------------------:|
| <div align="left"><img src="../../../docs/images/dingding.jpg" width="250"/> | <img src="../../../docs/images/wechat.png" width="232"/></div> |
## 视频demo
funasr/runtime/html5/readme.md
@@ -41,7 +41,7 @@
`Tips:` asr service and html5 service should be deployed on the same device.
```shell
cd ../python/websocket
python wss_srv_asr.py --port 10095
python funasr_wss_server.py --port 10095
```
funasr/runtime/html5/readme_cn.md
@@ -49,7 +49,7 @@
#### wss方式
```shell
cd ../python/websocket
python wss_srv_asr.py --port 10095
python funasr_wss_server.py --port 10095
```
### 浏览器打开地址
funasr/runtime/python/onnxruntime/setup.py
@@ -13,7 +13,7 @@
MODULE_NAME = 'funasr_onnx'
VERSION_NUM = '0.1.1'
VERSION_NUM = '0.1.2'
setuptools.setup(
    name=MODULE_NAME,
funasr/runtime/python/websocket/README.md
@@ -24,7 +24,7 @@
##### API-reference
```shell
python wss_srv_asr.py \
python funasr_wss_server.py \
--port [port id] \
--asr_model [asr model_name] \
--asr_model_online [asr model_name] \
@@ -36,7 +36,7 @@
```
##### Usage examples
```shell
python wss_srv_asr.py --port 10095 --asr_model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"  --asr_model_online "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online"
python funasr_wss_server.py --port 10095
```
## For the client
@@ -47,11 +47,19 @@
cd funasr/runtime/python/websocket
pip install -r requirements_client.txt
```
If you want to infer from videos, you should install `ffmpeg`
```shell
apt-get install -y ffmpeg #ubuntu
# yum install -y ffmpeg # centos
# brew install ffmpeg # mac
# winget install ffmpeg # windows
pip3 install websockets ffmpeg-python
```
### Start client
#### API-reference
```shell
python wss_client_asr.py \
python funasr_wss_client.py \
--host [ip_address] \
--port [port id] \
--chunk_size ["5,10,5"=600ms, "8,8,4"=480ms] \
@@ -59,9 +67,8 @@
--words_max_print [max number of words to print] \
--audio_in [if set, load from wav.scp, else record from microphone] \
--output_dir [if set, write the results to output_dir] \
--send_without_sleep [only set for offline] \
--ssl [1 for wss connect, 0 for ws, default is 1] \
--mode [`online` for streaming asr, `offline` for non-streaming, `2pass` for unifying streaming and non-streaming asr] \
--thread_num [thread_num for send data]
```
#### Usage examples
@@ -69,36 +76,36 @@
Recording from microphone
```shell
# --chunk_interval, "10": 600/10=60ms, "5": 600/5=120ms, "20": 600/20=30ms
python wss_client_asr.py --host "0.0.0.0" --port 10095 --mode offline --chunk_interval 10 --words_max_print 100
python funasr_wss_client.py --host "0.0.0.0" --port 10095 --mode offline
```
Loading from wav.scp (kaldi style)
```shell
# --chunk_interval, "10": 600/10=60ms, "5": 600/5=120ms, "20": 600/20=30ms
python wss_client_asr.py --host "0.0.0.0" --port 10095 --mode offline --chunk_interval 10 --words_max_print 100 --audio_in "./data/wav.scp" --output_dir "./results"
python funasr_wss_client.py --host "0.0.0.0" --port 10095 --mode offline --audio_in "./data/wav.scp" --output_dir "./results"
```
##### ASR streaming client
Recording from microphone
```shell
# --chunk_size, "5,10,5"=600ms, "8,8,4"=480ms
python wss_client_asr.py --host "0.0.0.0" --port 10095 --mode online --chunk_size "5,10,5" --words_max_print 100
python funasr_wss_client.py --host "0.0.0.0" --port 10095 --mode online --chunk_size "5,10,5"
```
Loading from wav.scp (kaldi style)
```shell
# --chunk_size, "5,10,5"=600ms, "8,8,4"=480ms
python wss_client_asr.py --host "0.0.0.0" --port 10095 --mode online --chunk_size "5,10,5" --audio_in "./data/wav.scp" --output_dir "./results"
python funasr_wss_client.py --host "0.0.0.0" --port 10095 --mode online --chunk_size "5,10,5" --audio_in "./data/wav.scp" --output_dir "./results"
```
##### ASR offline/online 2pass client
Recording from microphone
```shell
# --chunk_size, "5,10,5"=600ms, "8,8,4"=480ms
python wss_client_asr.py --host "0.0.0.0" --port 10095 --mode 2pass --chunk_size "8,8,4"
python funasr_wss_client.py --host "0.0.0.0" --port 10095 --mode 2pass --chunk_size "8,8,4"
```
Loading from wav.scp (kaldi style)
```shell
# --chunk_size, "5,10,5"=600ms, "8,8,4"=480ms
python wss_client_asr.py --host "0.0.0.0" --port 10095 --mode 2pass --chunk_size "8,8,4" --audio_in "./data/wav.scp" --output_dir "./results"
python funasr_wss_client.py --host "0.0.0.0" --port 10095 --mode 2pass --chunk_size "8,8,4" --audio_in "./data/wav.scp" --output_dir "./results"
```
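The `--chunk_size` settings in the examples above map directly to stream latency. A minimal sketch of the arithmetic, assuming each unit corresponds to a 60 ms frame shift (as the "5,10,5"=600ms and "8,8,4"=480ms comments suggest; the helper name is hypothetical):

```python
def chunk_duration_ms(chunk_size, frame_shift_ms=60):
    """Duration of the current decoding chunk: the middle value of
    chunk_size counts frame-shift units of 60 ms each; the first and
    last values are left/right context."""
    left, current, right = chunk_size
    return current * frame_shift_ms

print(chunk_duration_ms([5, 10, 5]))  # 600
print(chunk_duration_ms([8, 8, 4]))   # 480
```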
## Acknowledge
1. This project is maintained by [FunASR community](https://github.com/alibaba-damo-academy/FunASR).
funasr/runtime/python/websocket/funasr_wss_client.py
File was renamed from funasr/runtime/python/websocket/wss_client_asr.py
@@ -8,11 +8,10 @@
import json
import traceback
from multiprocessing import Process
from funasr.fileio.datadir_writer import DatadirWriter
# from funasr.fileio.datadir_writer import DatadirWriter
import logging
SUPPORT_AUDIO_TYPE_SETS = ['.wav', '.pcm']
logging.basicConfig(level=logging.ERROR)
parser = argparse.ArgumentParser()
@@ -42,10 +41,10 @@
                    action="store_true",
                    default=True,
                    help="if audio_in is set, send_without_sleep")
parser.add_argument("--test_thread_num",
parser.add_argument("--thread_num",
                    type=int,
                    default=1,
                    help="test_thread_num")
                    help="thread_num")
parser.add_argument("--words_max_print",
                    type=int,
                    default=10000,
@@ -72,11 +71,13 @@
voices = Queue()
offline_msg_done=False
ibest_writer = None
if args.output_dir is not None:
    writer = DatadirWriter(args.output_dir)
    ibest_writer = writer[f"1best_recog"]
    # if os.path.exists(args.output_dir):
    #     os.remove(args.output_dir)
    if not os.path.exists(args.output_dir):
        os.makedirs(args.output_dir)
async def record_microphone():
@@ -100,11 +101,13 @@
    message = json.dumps({"mode": args.mode, "chunk_size": args.chunk_size, "chunk_interval": args.chunk_interval,
                          "wav_name": "microphone", "is_speaking": True})
    voices.put(message)
    #voices.put(message)
    await websocket.send(message)
    while True:
        data = stream.read(CHUNK)
        message = data
        voices.put(message)
        #voices.put(message)
        await websocket.send(message)
        await asyncio.sleep(0.005)
async def record_from_scp(chunk_begin, chunk_size):
@@ -134,8 +137,17 @@
                frames = wav_file.readframes(wav_file.getnframes())
                audio_bytes = bytes(frames)
        else:
            raise NotImplementedError(
                f'Not supported audio type')
            import ffmpeg
            try:
                # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
                # Requires the ffmpeg CLI and `ffmpeg-python` package to be installed.
                audio_bytes, _ = (
                    ffmpeg.input(wav_path, threads=0)
                    .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=16000)
                    .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
                )
            except ffmpeg.Error as e:
                raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
        # stride = int(args.chunk_size/1000*16000*2)
        stride = int(60 * args.chunk_size[1] / args.chunk_interval / 1000 * 16000 * 2)
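The stride formula above decides how many raw PCM bytes are sent per step. A worked sketch, assuming 16 kHz, 16-bit mono audio and the default chunk_size[1]=10 with chunk_interval=10 (the helper name is hypothetical):

```python
def stride_bytes(chunk_size, chunk_interval, sample_rate=16000, bytes_per_sample=2):
    # 60 ms per chunk unit, divided by chunk_interval sends,
    # converted to a byte count at 16 kHz, 16-bit mono.
    step_ms = 60 * chunk_size[1] / chunk_interval
    return int(step_ms / 1000 * sample_rate * bytes_per_sample)

print(stride_bytes([5, 10, 5], 10))  # 1920 bytes, i.e. 60 ms of audio
```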
@@ -164,10 +176,9 @@
            sleep_duration = 0.001 if args.mode == "offline" else 60 * args.chunk_size[1] / args.chunk_interval / 1000
            
            await asyncio.sleep(sleep_duration)
    # when all data sent, we need to close websocket
    while not voices.empty():
         await asyncio.sleep(1)
    await asyncio.sleep(3)
    if not args.mode=="offline":
        await asyncio.sleep(2)
    # offline mode needs to wait for the message to be received
    
    if args.mode=="offline":
@@ -176,17 +187,18 @@
         await asyncio.sleep(1)
    
    await websocket.close()
async def message(id):
    global websocket,voices,offline_msg_done
    text_print = ""
    text_print_2pass_online = ""
    text_print_2pass_offline = ""
    if args.output_dir is not None:
        ibest_writer = open(os.path.join(args.output_dir, "text.{}".format(id)), "a", encoding="utf-8")
    else:
        ibest_writer = None
    try:
       while True:
        
@@ -194,9 +206,11 @@
            meg = json.loads(meg)
            wav_name = meg.get("wav_name", "demo")
            text = meg["text"]
            if ibest_writer is not None:
                ibest_writer["text"][wav_name] = text
            if ibest_writer is not None:
                text_write_line = "{}\t{}\n".format(wav_name, text)
                ibest_writer.write(text_write_line)
            if meg["mode"] == "online":
                text_print += "{}".format(text)
                text_print = text_print[-args.words_max_print:]
@@ -204,10 +218,10 @@
                print("\rpid" + str(id) + ": " + text_print)
            elif meg["mode"] == "offline":
                text_print += "{}".format(text)
                text_print = text_print[-args.words_max_print:]
                os.system('clear')
                print("\rpid" + str(id) + ": " + text_print)
                offline_msg_done=True
                # text_print = text_print[-args.words_max_print:]
                # os.system('clear')
                print("\rpid" + str(id) + ": " + wav_name + ": " + text_print)
                offline_msg_done = True
            else:
                if meg["mode"] == "2pass-online":
                    text_print_2pass_online += "{}".format(text)
@@ -219,6 +233,7 @@
                text_print = text_print[-args.words_max_print:]
                os.system('clear')
                print("\rpid" + str(id) + ": " + text_print)
                offline_msg_done=True
    except Exception as e:
            print("Exception:", e)
@@ -227,17 +242,6 @@
 
async def print_messge():
    global websocket
    while True:
        try:
            meg = await websocket.recv()
            meg = json.loads(meg)
            print(meg)
        except Exception as e:
            print("Exception:", e)
            #traceback.print_exc()
            exit(0)
async def ws_client(id, chunk_begin, chunk_size):
  if args.audio_in is None:
@@ -262,7 +266,6 @@
            task = asyncio.create_task(record_from_scp(i, 1))
        else:
            task = asyncio.create_task(record_microphone())
        #task2 = asyncio.create_task(ws_send())
        task3 = asyncio.create_task(message(str(id)+"_"+str(i))) #processid+fileid
        await asyncio.gather(task, task3)
  exit(0)
@@ -291,21 +294,19 @@
            wav_name = wav_splits[0] if len(wav_splits) > 1 else "demo"
            wav_path = wav_splits[1] if len(wav_splits) > 1 else wav_splits[0]
            audio_type = os.path.splitext(wav_path)[-1].lower()
            if audio_type not in SUPPORT_AUDIO_TYPE_SETS:
                raise NotImplementedError(
                    f'Not supported audio type: {audio_type}')
        total_len = len(wavs)
        if total_len >= args.test_thread_num:
            chunk_size = int(total_len / args.test_thread_num)
            remain_wavs = total_len - chunk_size * args.test_thread_num
        if total_len >= args.thread_num:
            chunk_size = int(total_len / args.thread_num)
            remain_wavs = total_len - chunk_size * args.thread_num
        else:
            chunk_size = 1
            remain_wavs = 0
        process_list = []
        chunk_begin = 0
        for i in range(args.test_thread_num):
        for i in range(args.thread_num):
            now_chunk_size = chunk_size
            if remain_wavs > 0:
                now_chunk_size = chunk_size + 1
funasr/runtime/python/websocket/funasr_wss_server.py
File was renamed from funasr/runtime/python/websocket/wss_srv_asr.py
@@ -5,17 +5,64 @@
import logging
import tracemalloc
import numpy as np
import argparse
import ssl
from parse_args import args
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope.utils.logger import get_logger
from funasr.runtime.python.onnxruntime.funasr_onnx.utils.frontend import load_bytes
tracemalloc.start()
logger = get_logger(log_level=logging.CRITICAL)
logger.setLevel(logging.CRITICAL)
parser = argparse.ArgumentParser()
parser.add_argument("--host",
                    type=str,
                    default="0.0.0.0",
                    required=False,
                    help="host ip, localhost, 0.0.0.0")
parser.add_argument("--port",
                    type=int,
                    default=10095,
                    required=False,
                    help="grpc server port")
parser.add_argument("--asr_model",
                    type=str,
                    default="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
                    help="model from modelscope")
parser.add_argument("--asr_model_online",
                    type=str,
                    default="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online",
                    help="model from modelscope")
parser.add_argument("--vad_model",
                    type=str,
                    default="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
                    help="model from modelscope")
parser.add_argument("--punc_model",
                    type=str,
                    default="damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727",
                    help="model from modelscope")
parser.add_argument("--ngpu",
                    type=int,
                    default=1,
                    help="0 for cpu, 1 for gpu")
parser.add_argument("--ncpu",
                    type=int,
                    default=4,
                    help="cpu cores")
parser.add_argument("--certfile",
                    type=str,
                    default="./ssl_key/server.crt",
                    required=False,
                    help="certfile for ssl")
parser.add_argument("--keyfile",
                    type=str,
                    default="./ssl_key/server.key",
                    required=False,
                    help="keyfile for ssl")
args = parser.parse_args()
websocket_users = set()
@@ -185,8 +232,6 @@
async def async_asr(websocket, audio_in):
            if len(audio_in) > 0:
                # print(len(audio_in))
                audio_in = load_bytes(audio_in)
                rec_result = inference_pipeline_asr(audio_in=audio_in,
                                                    param_dict=websocket.param_dict_asr)
                # print(rec_result)
@@ -195,13 +240,12 @@
                                                         param_dict=websocket.param_dict_punc)
                    # print("offline", rec_result)
                if 'text' in rec_result:
                    message = json.dumps({"mode": "2pass-offline", "text": rec_result["text"], "wav_name": websocket.wav_name})
                    message = json.dumps({"mode": websocket.mode, "text": rec_result["text"], "wav_name": websocket.wav_name})
                    await websocket.send(message)
async def async_asr_online(websocket, audio_in):
    if len(audio_in) > 0:
        audio_in = load_bytes(audio_in)
        # print(websocket.param_dict_asr_online.get("is_final", False))
        rec_result = inference_pipeline_asr_online(audio_in=audio_in,
                                                   param_dict=websocket.param_dict_asr_online)
@@ -212,7 +256,7 @@
        if "text" in rec_result:
            if rec_result["text"] != "sil" and rec_result["text"] != "waiting_for_more_voice":
                # print("online", rec_result)
                message = json.dumps({"mode": "2pass-online", "text": rec_result["text"], "wav_name": websocket.wav_name})
                message = json.dumps({"mode": websocket.mode, "text": rec_result["text"], "wav_name": websocket.wav_name})
                await websocket.send(message)
if len(args.certfile)>0:
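The `--certfile`/`--keyfile` arguments and the check above typically feed a TLS context for the websocket server. A minimal sketch using the standard library `ssl` module (the helper name is hypothetical, not part of the diff):

```python
import ssl

def create_ssl_context(certfile: str, keyfile: str):
    # Mirror the server's check: only enable TLS when a certfile is given.
    if not certfile:
        return None
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile, keyfile=keyfile)
    return ctx
```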
funasr/runtime/python/websocket/parse_args.py
File was deleted
funasr/runtime/python/websocket/requirements_client.txt
@@ -1,2 +1,2 @@
websockets
pyaudio
pyaudio
funasr/runtime/readme_cn.md
@@ -14,7 +14,7 @@
## Chinese offline file transcription service deployment (CPU version)
Chinese offline speech file transcription service (CPU version), with a complete speech recognition pipeline: it can transcribe dozens of hours of audio and video into punctuated text, and supports hundreds of concurrent transcription requests.
Chinese offline speech file transcription service (CPU version), with a complete speech recognition pipeline: it can transcribe dozens of hours of long audio and video into punctuated text, and supports hundreds of concurrent transcription requests.
To meet the needs of different users and scenarios, we have prepared different illustrated tutorials:
### Quick deployment tutorial
@@ -28,4 +28,4 @@
### Technical deep dive
The document describes the underlying technical principles, recognition accuracy, and computational efficiency, and introduces the core advantages: convenience, high accuracy, high efficiency, and a long-audio pipeline; for details see ([click here](https://mp.weixin.qq.com/s?__biz=MzA3MTQ0NTUyMw==&tempkey=MTIyNF84d05USjMxSEpPdk5GZXBJUFNJNzY0bU1DTkxhV19mcWY4MTNWQTJSYXhUaFgxOWFHZTZKR0JzWC1JRmRCdUxCX2NoQXg0TzFpNmVJX2R1WjdrcC02N2FEcUc3MDhzVVhpNWQ5clU4QUdqNFdkdjFYb18xRjlZMmc5c3RDOTl0U0NiRkJLb05ZZ0RmRlVkVjFCZnpXNWFBVlRhbXVtdWs4bUMwSHZnfn4%3D&chksm=1f2c3254285bbb42bc8f76a82e9c5211518a0bb1ff8c357d085c1b78f675ef2311f3be6e282c#rd))
The document describes the underlying technical principles, recognition accuracy, and computational efficiency, and introduces the core advantages: convenience, high accuracy, high efficiency, and a long-audio pipeline; for details see ([click here](https://mp.weixin.qq.com/s/DHQwbgdBWcda0w_L60iUww))
funasr/runtime/run_server.sh
old mode 100755 new mode 100644
funasr/runtime/websocket/CMakeLists.txt
@@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.10)
cmake_minimum_required(VERSION 3.16)
project(FunASRWebscoket) 
funasr/runtime/websocket/funasr-wss-client.cpp
@@ -20,6 +20,7 @@
#include <websocketpp/config/asio_client.hpp>
#include <fstream>
#include <atomic>
#include <thread>
#include <glog/logging.h>
#include "audio.h"
@@ -106,7 +107,7 @@
        switch (msg->get_opcode()) {
            case websocketpp::frame::opcode::text:
                total_num=total_num+1;
                LOG(INFO)<<total_num<<",on_message = " << payload;
                LOG(INFO)<< "Thread: " << this_thread::get_id() <<",on_message = " << payload;
                if((total_num+1)==wav_index)
                {
                    websocketpp::lib::error_code ec;
@@ -375,4 +376,4 @@
    for (auto& t : client_threads) {
        t.join();
    }
}
}
funasr/utils/misc.py
@@ -12,7 +12,7 @@
    return numel
def int2vec(x, vec_dim=8, dtype=np.int):
def int2vec(x, vec_dim=8, dtype=np.int32):
    b = ('{:0' + str(vec_dim) + 'b}').format(x)
    # little-endian order: lower bit first
    return (np.array(list(b)[::-1]) == '1').astype(dtype)
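The dtype change above matters because `np.int` was removed in NumPy 1.24; the function itself encodes an integer as a little-endian bit vector. A quick usage sketch of the patched function:

```python
import numpy as np

def int2vec(x, vec_dim=8, dtype=np.int32):
    b = ('{:0' + str(vec_dim) + 'b}').format(x)
    # little-endian order: lower bit first
    return (np.array(list(b)[::-1]) == '1').astype(dtype)

print(int2vec(5))  # [1 0 1 0 0 0 0 0]: bits of 0b101, lowest bit first
```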
funasr/version.txt
@@ -1 +1 @@
0.6.7
0.6.9
tests/test_asr_inference_pipeline.py
@@ -119,20 +119,28 @@
    def test_paraformer_large_online_common(self):
        inference_pipeline = pipeline(
            task=Tasks.auto_speech_recognition,
            model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online')
            model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online',
            model_revision='v1.0.6',
            update_model=False,
            mode="paraformer_fake_streaming"
        )
        rec_result = inference_pipeline(
            audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
        logger.info("asr inference result: {0}".format(rec_result))
        assert rec_result["text"] == "欢迎大 家来 体验达 摩院推 出的 语音识 别模 型"
        assert rec_result["text"] == "欢迎大家来体验达摩院推出的语音识别模型"
    def test_paraformer_online_common(self):
        inference_pipeline = pipeline(
            task=Tasks.auto_speech_recognition,
            model='damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online')
            model='damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online',
            model_revision='v1.0.6',
            update_model=False,
            mode="paraformer_fake_streaming"
        )
        rec_result = inference_pipeline(
            audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
        logger.info("asr inference result: {0}".format(rec_result))
        assert rec_result["text"] == "欢迎 大家来 体验达 摩院推 出的 语音识 别模 型"
        assert rec_result["text"] == "欢迎大家来体验达摩院推出的语音识别模型"
    def test_paraformer_tiny_commandword(self):
        inference_pipeline = pipeline(