README.md
@@ -29,6 +29,7 @@
<a name="whats-new"></a>
## What's new:
- 2024/06/27: Offline File Transcription Service GPU 1.0 released, supporting dynamic batching and multi-threaded concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (compared to 330+ on CPU); refer to the ([docs](runtime/readme.md))
- 2024/05/15: Emotion recognition models are newly supported: [emotion2vec+large](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary), [emotion2vec+base](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary), [emotion2vec+seed](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary). The following categories are currently supported: 0: angry, 1: happy, 2: neutral, 3: sad, 4: unknown.
- 2024/05/15: Offline File Transcription Service 4.5, Offline File Transcription Service of English 1.6, and Real-time Transcription Service 1.10 released, adapting to the FunASR 1.0 model structure; ([docs](runtime/readme.md))
- 2024/03/05: Added the Qwen-Audio and Qwen-Audio-Chat large-scale audio-text multimodal models, which have topped multiple audio-domain leaderboards. These models support speech dialogue; [usage](examples/industrial_data_pretraining/qwen_audio).

README_zh.md
@@ -33,6 +33,7 @@
<a name="最新动态"></a>
## What's new
- 2024/06/27: Mandarin Offline File Transcription Service GPU 1.0 released, supporting dynamic batching and multi-way concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (330+ on CPU); for details, see the ([deployment docs](runtime/readme_cn.md))
- 2024/05/15: Emotion recognition models are newly supported: [emotion2vec+large](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary), [emotion2vec+base](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary), [emotion2vec+seed](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary). Output emotion categories: angry, happy, neutral, sad.
- 2024/05/15: Mandarin Offline File Transcription Service 4.5, English Offline File Transcription Service 1.6, and Mandarin Real-time Transcription Service 1.10 released, adapting to the FunASR 1.0 model structure; for details, see the ([deployment docs](runtime/readme_cn.md))
- 2024/03/05: Added the Qwen-Audio and Qwen-Audio-Chat audio-text multimodal large models, which have topped multiple audio-domain leaderboards and support speech dialogue; for detailed usage, see the [examples](examples/industrial_data_pretraining/qwen_audio).

runtime/docs/SDK_advanced_guide_offline_gpu.md
New file @@ -0,0 +1,173 @@

# Advanced Development Guide (File Transcription Service, GPU)

([简体中文](SDK_advanced_guide_offline_gpu_zh.md)|English)

[//]: # (FunASR provides a Chinese offline file transcription service that can be deployed locally or on a cloud server with just one click. The core of the service is the FunASR runtime SDK, which has been open-sourced. FunASR-runtime combines capabilities such as speech endpoint detection (VAD), large-scale speech recognition (ASR) using Paraformer-large, and punctuation restoration (PUNC), all open-sourced by the speech laboratory of DAMO Academy on the ModelScope community. This enables accurate and efficient high-concurrency transcription of audio files.)

The FunASR Offline File Transcription Software Package (GPU) provides a powerful offline speech-to-text service. With a complete speech recognition pipeline that combines models for speech endpoint detection, speech recognition, and punctuation, it can transcribe audio and video files spanning several hours into punctuated text, and it can serve hundreds of concurrent transcription requests. The output is punctuated text with word-level timestamps, with support for ITN (Inverse Text Normalization) and user-defined hotwords. The server side integrates ffmpeg, so various audio and video formats are accepted as input. The package provides client libraries in multiple programming languages, including HTML, Python, C++, Java, and C#, which users can use directly or develop further.

This document is the development guide for the FunASR offline file transcription service (GPU). If you wish to quickly try the service, please refer to the one-click deployment example ([docs](./SDK_tutorial.md)).
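The shape of the pipeline described above (VAD segments the long audio, ASR recognizes each segment, punctuation is restored at the end) can be sketched as a simple function composition. This is a toy illustration only: the function names and stub stages below are invented for the sketch and are not the SDK's actual API.

```python
# Toy sketch of the VAD -> ASR -> PUNC pipeline shape; illustration only.
from typing import Callable, List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms) of one speech region

def run_pipeline(
    audio_ms: int,
    vad: Callable[[int], List[Segment]],
    asr: Callable[[Segment], str],
    punc: Callable[[str], str],
) -> str:
    """Segment long audio with VAD, recognize each segment, then punctuate."""
    segments = vad(audio_ms)
    text = " ".join(asr(seg) for seg in segments)
    return punc(text)

# Stub stages standing in for the real models:
def toy_vad(total: int) -> List[Segment]:
    # pretend every 60 s is one speech segment
    return [(t, min(t + 60_000, total)) for t in range(0, total, 60_000)]

def toy_asr(seg: Segment) -> str:
    return f"<{seg[0] // 1000}s-{seg[1] // 1000}s>"

def toy_punc(text: str) -> str:
    return text + "."

print(run_pipeline(150_000, toy_vad, toy_asr, toy_punc))
# -> <0s-60s> <60s-120s> <120s-150s>.
```

The real service additionally batches segments dynamically across concurrent requests, which is where the GPU speedup comes from.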
<img src="images/offline_structure.jpg" width="900"/>

| TIME | INFO | IMAGE VERSION | IMAGE ID |
|------------|----------------------------------------------------------------|------------------------------|--------------|
| 2024.06.27 | Offline File Transcription Software Package (GPU) 1.0 released | funasr-runtime-sdk-gpu-0.1.0 | aa10f938da3b |

## Quick start

### Docker install

If you have already installed Docker, skip this step! Otherwise, install it on your server with:

```shell
curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh; sudo bash install_docker.sh
```

If the Docker installation fails, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)

### Pulling and launching the image

Use the following commands to pull and launch the Docker image of the FunASR runtime SDK:

```shell
sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.1.0
sudo docker run --gpus=all -p 10098:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.1.0
```

Introduction to command parameters:
```text
-p <host port>:<mapped docker port>: in the example, host machine (ECS) port 10098 is mapped to port 10095 inside the Docker container. Make sure that port 10098 is open in the ECS security-group rules.
-v <host path>:<mounted docker path>: in the example, the host machine path /root is mounted to the Docker path /workspace/models.
```

### Starting the server

Use the following script to start the server:

```shell
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --itn-dir thuduj12/fst_itn_zh \
  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

# If you want to disable SSL, add: --certfile 0
# If you want to deploy the timestamp or NN hotword model, set --model-dir to the corresponding model:
#   damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript (timestamp)
#   damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-torchscript (hotword)
# If you want to load hotwords on the server side, configure them in the host machine file
# ./funasr-runtime-resources/models/hotwords.txt (docker mapping: /workspace/models/hotwords.txt):
#   one hotword per line, format (hotword weight): 阿里巴巴 20
```

### More details about the script run_server.sh

The funasr-wss-server supports downloading models from ModelScope. You can set the model download address (--download-model-dir, default is /workspace/models) and the model IDs (--model-dir, --vad-dir, --punc-dir).
Here is an example:

```shell
cd /workspace/FunASR/runtime
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --itn-dir thuduj12/fst_itn_zh \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --certfile ../../../ssl_key/server.crt \
  --keyfile ../../../ssl_key/server.key \
  --hotword ../../hotwords.txt > log.txt 2>&1 &
```

Introduction to run_server.sh parameters:
```text
--download-model-dir: model download address; models are downloaded from ModelScope by setting the model ID.
--model-dir: ModelScope model ID or local model path.
--vad-dir: ModelScope model ID or local model path.
--punc-dir: ModelScope model ID or local model path.
--lm-dir: ModelScope model ID or local model path.
--itn-dir: ModelScope model ID or local model path.
--port: port number that the server listens on. Default is 10095.
--decoder-thread-num: size of the server-side thread pool that handles concurrent requests.
--io-thread-num: number of IO threads the server starts.
--model-thread-num: number of internal threads per recognition route, controlling the parallelism of the ONNX model. Default is 1. It is recommended that decoder-thread-num * model-thread-num equal the total number of threads.
--certfile <string>: SSL certificate file. Default is ../../../ssl_key/server.crt. Set to 0 to disable SSL.
--keyfile <string>: SSL key file. Default is ../../../ssl_key/server.key.
--hotword: hotword file path, one hotword per line (e.g. 阿里巴巴 20). If the client also provides hotwords, the two sets are merged.
```

### Shutting Down the FunASR Service

```text
# Check the PID of the funasr-wss-server process
ps -x | grep funasr-wss-server
kill -9 PID
```

### Modifying Models and Other Parameters

To replace the model in use or change other parameters, first shut down the FunASR service, modify the parameters you want to change, and then restart the FunASR service. The model must be an ASR/VAD/PUNC model from ModelScope, or a model fine-tuned from one of them.

```text
# For example, to replace the ASR model with damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript, set --model-dir as follows
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript
# Set the port number with --port
--port <port number>
# Set the number of inference threads the server starts with --decoder-thread-num
--decoder-thread-num <decoder thread num>
# Set the number of IO threads the server starts with --io-thread-num
--io-thread-num <io thread num>
# Disable the SSL certificate
--certfile 0
```

After executing the above command, the offline file transcription service will be started.
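The flag changes above can be scripted so that swapping a model or disabling SSL is a one-line edit. A minimal sketch, assuming the documented run_server.sh flags and default values; the helper name and defaults dict are illustrative, not part of FunASR:

```python
# Illustrative helper that assembles a run_server.sh command line from a dict.
import shlex

def build_server_cmd(overrides: dict) -> str:
    # Defaults mirror the documented flags; adjust to your deployment.
    flags = {
        "model-dir": "damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript",
        "port": "10095",
        "decoder-thread-num": "32",
        "io-thread-num": "8",
        "certfile": "../../../ssl_key/server.crt",
    }
    flags.update(overrides)
    parts = ["bash", "run_server.sh"]
    for name, value in flags.items():
        parts += [f"--{name}", value]
    return shlex.join(parts)  # shell-safe quoting

# Swap in the plain ASR model and disable SSL, as in the examples above:
cmd = build_server_cmd({
    "model-dir": "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript",
    "certfile": "0",
})
print(cmd)
```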
If the model is specified as a ModelScope model ID, the following models will be automatically downloaded from ModelScope:
[FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary),
[Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript/summary),
[CT-Transformer](https://www.modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx/summary),
[FST-ITN](https://www.modelscope.cn/models/thuduj12/fst_itn_zh/summary),
[Ngram LM](https://www.modelscope.cn/models/damo/speech_ngram_lm_zh-cn-ai-wesp-fst/summary)

If you wish to deploy a fine-tuned model (e.g. 10epoch.pb), manually rename it to model.pb, replace the original model.pb in the downloaded ModelScope model directory, and specify that path as `model_dir`.

## Starting the client

After deploying the FunASR offline file transcription service on the server, you can test and use the service with the following steps. FunASR currently supports several ways to start the client. Below are command-line examples for the Python client, the C++ client, and custom clients based on the WebSocket communication protocol:

### python-client

```shell
python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
```

Introduction to command parameters:
```text
--host: the IP address of the server; set to 127.0.0.1 for local testing.
--port: the port number the server listens on.
--audio_in: the audio input, either a path to a wav file or a wav.scp file (a Kaldi-style wav list in which each line contains a wav_id followed by a tab and a wav_path).
--output_dir: the output path for recognition results.
--ssl: whether to use SSL encryption; enabled by default.
--mode: offline mode.
--hotword: hotword file path, one hotword per line (e.g. 阿里巴巴 20)
--use_itn: whether to apply ITN; 1 (default) to enable, 0 to disable.
```

### c++-client

```shell
./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path test.wav --thread-num 1 --is-ssl 1
```

Introduction to command parameters:
```text
--server-ip: the IP address of the server; set to 127.0.0.1 for local testing.
--port: the port number the server listens on.
--wav-path: the audio input, either a path to a wav file or a wav.scp file (a Kaldi-style wav list in which each line contains a wav_id followed by a tab and a wav_path).
--is-ssl: whether to use SSL encryption; enabled by default.
--hotword: hotword file path, one hotword per line (e.g. 阿里巴巴 20)
--use-itn: whether to apply ITN; 1 (default) to enable, 0 to disable.
```

### Custom client

If you want to write your own client, see the [WebSocket communication protocol](./websocket_protocol.md)

## How to customize service deployment

The FunASR runtime code is open source. If the server and client do not fully meet your needs, you can develop them further based on your own requirements:

### C++ client

https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/websocket

### Python client

https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/websocket

runtime/docs/SDK_advanced_guide_offline_gpu_zh.md
New file @@ -0,0 +1,209 @@

# FunASR Offline File Transcription Service (GPU) Development Guide

(简体中文|[English](SDK_advanced_guide_offline_gpu.md))

The FunASR offline file transcription software package (GPU) provides a powerful offline speech-to-text service. With a complete speech recognition pipeline that combines speech endpoint detection, speech recognition, and punctuation models, it can transcribe tens of hours of long audio and video into punctuated text, and supports hundreds of concurrent transcription requests. The output is punctuated text with character-level timestamps, with support for ITN and user-defined hotwords. The server side integrates ffmpeg and accepts various audio and video formats as input. The package provides HTML, Python, C++, Java, and C# clients, which users can use directly or develop further.

This document is the development guide for the GPU version of the FunASR offline file transcription service. If you want to quickly try the service, see [Quick start](#quick-start).

<img src="images/offline_structure.jpg" width="900"/>

| TIME | INFO | IMAGE VERSION | IMAGE ID |
|------------|-----------------------------------------------------|------------------------------|--------------|
| 2024.06.27 | Offline File Transcription Service GPU 1.0 released | funasr-runtime-sdk-gpu-0.1.0 | aa10f938da3b |

## Server configuration

Choose a server configuration according to your business needs. Recommended configuration:

- Configuration 1 (GPU): 8 vCPU cores, 32 GB RAM, one V100; a single machine can support roughly 20 concurrent routes

Detailed performance benchmark ([click here](./benchmark_onnx_cpp.md))

Cloud providers offer a 3-month free trial for new users; application tutorial ([click here](https://github.com/alibaba-damo-academy/FunASR/blob/main/runtime/docs/aliyun_server_tutorial.md))

## Quick start

### Docker install

If you have already installed Docker, skip this step!

Install Docker on the server with:

```shell
curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh; sudo bash install_docker.sh
```

If the Docker installation fails, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)

### Launching the image

Pull and launch the Docker image of the FunASR software package with:

```shell
sudo docker pull \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.1.0
mkdir -p ./funasr-runtime-resources/models
sudo docker run --gpus=all -p 10098:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.1.0
```

### Starting the server

After the docker container is up, start the funasr-wss-server service:

```shell
cd FunASR/runtime
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
  --punc-dir \
    damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --itn-dir thuduj12/fst_itn_zh \
  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

# To disable SSL, add: --certfile 0
# The timestamp model is loaded by default; to deploy the NN hotword model instead, set --model-dir to the corresponding model:
#   damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript (timestamp)
#   damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-torchscript (NN hotword)
# To load hotwords on the server side, configure them in the host machine file
# ./funasr-runtime-resources/models/hotwords.txt (docker mapping: /workspace/models/hotwords.txt):
#   one hotword per line, format (hotword weight): 阿里巴巴 20
#   (note: there is no hard limit on hotwords, but to balance performance and accuracy, keep each hotword under 10 characters, use at most ~1k hotwords, and weights between 1 and 100)
```

To customize an ngram LM, see the documentation ([how to train an LM](./lm_train_tutorial.md))

For detailed server-side parameters, see [Server usage in detail](#server-usage-in-detail)

### Testing and using the client

Download the client sample directory samples:

```shell
wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
```

Taking the Python client as an example: it supports multiple audio formats as input (.wav, .pcm, .mp3, etc.), video input (.mp4, etc.), and multi-file wav.scp lists. For other clients, see the documentation ([click here](#client-usage-in-detail)); for customized service deployment, see [How to customize service deployment](#how-to-customize-service-deployment)

```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```

## Client usage in detail

After deploying the FunASR service on the server, you can test and use the offline file transcription service with the following steps.

Clients are currently available in the following languages:

- [Python](#python-client)
- [CPP](#cpp-client)
- [HTML web page](#html-web-client)
- [Java](#java-client)

### python-client

To run the client directly for testing, see the following brief instructions, using the Python version as an example:

```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline \
  --audio_in "../audio/asr_example.wav" --output_dir "./results"
```

Command parameters:
```text
--host        IP of the machine where the FunASR runtime-SDK service is deployed; default is the local IP (127.0.0.1). If the client and service are not on the same machine, change it to the deployment machine's IP
--port 10095  deployment port number
--mode        offline means offline file transcription
--audio_in    the audio file(s) to transcribe; supports a file path or a wav.scp file list
--thread_num  number of concurrent sending threads; default 1
--ssl         whether to verify the SSL certificate; 1 (default) to enable, 0 to disable
--hotword     hotword file, one hotword per line, format (hotword weight): 阿里巴巴 20
--use_itn     whether to apply ITN; 1 (default) to enable, 0 to disable
```

### cpp-client

Enter the samples/cpp directory, then test with the C++ client as follows:

```shell
./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path \
    ../audio/asr_example.wav
```

Command parameters:
```text
--server-ip   IP of the machine where the FunASR runtime-SDK service is deployed; default is the local IP (127.0.0.1). If the client and service are not on the same machine, change it to the deployment machine's IP
--port 10095  deployment port number
--wav-path    the audio file to transcribe; supports a file path
--hotword     hotword file, one hotword per line, format (hotword weight): 阿里巴巴 20
--thread-num  number of client threads
--use-itn     whether to apply ITN; 1 (default) to enable, 0 to disable
```

### HTML web client

Open html/static/index.html in a browser to get the page below; it supports microphone input and file upload, so you can try the service directly

<img src="images/html.png" width="900"/>

### Java-client

```shell
FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline
```

For details, see the documentation ([click here](../java/readme.md))

## Server usage in detail

### Starting the FunASR service

```shell
cd /workspace/FunASR/runtime
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --itn-dir thuduj12/fst_itn_zh \
  --certfile ../../../ssl_key/server.crt \
  --keyfile ../../../ssl_key/server.key \
  --hotword ../../hotwords.txt > log.txt 2>&1 &
```

**run_server.sh parameters**

```text
--download-model-dir   model download address; models are downloaded from ModelScope by setting the model ID
--model-dir            ModelScope model ID or local model path
--vad-dir              ModelScope model ID or local model path
--punc-dir             ModelScope model ID or local model path
--lm-dir               ModelScope model ID or local model path
--itn-dir              ModelScope model ID or local model path
--port                 port the server listens on; default 10095
--decoder-thread-num   size of the server-side thread pool (the maximum number of concurrent routes); it is recommended to allocate about 1 GB of GPU memory per route, i.e. 20 GB of GPU memory supports 20 concurrent routes
--io-thread-num        number of IO threads the server starts
--model-thread-num     number of internal threads per recognition route (controlling the parallelism of the ONNX model); default 1; it is recommended that decoder-thread-num * model-thread-num equal the total number of threads
--certfile             SSL certificate file; default ../../../ssl_key/server.crt; set to 0 to disable SSL
--keyfile              SSL key file; default ../../../ssl_key/server.key
--hotword              hotword file path, one hotword per line, format: hotword weight (e.g. 阿里巴巴 20); if the client also provides hotwords, the two sets are merged; server-side hotwords apply globally, client hotwords only to the corresponding client
```

### Shutting down the FunASR service

```text
# Find the PID of the funasr-wss-server process
ps -x | grep funasr-wss-server
kill -9 PID
```

### Modifying models and other parameters
To replace the model in use or change other parameters, first shut down the FunASR service, modify the parameters you want to change, and then restart the FunASR service. The model must be an ASR/VAD/PUNC model from ModelScope, or a model fine-tuned from one of them.

```text
# For example, to replace the ASR model with damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript, set --model-dir as follows
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript
# Set the port number with --port
--port <port number>
# Set the number of inference threads the server starts with --decoder-thread-num
--decoder-thread-num <decoder thread num>
# Set the number of IO threads the server starts with --io-thread-num
--io-thread-num <io thread num>
# Disable the SSL certificate
--certfile 0
```

After executing the above command, the offline file transcription service will be started. If the model is specified as a ModelScope model ID, the following models will be automatically downloaded from ModelScope:
[FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary),
[Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript/summary),
[CT-Transformer punctuation model](https://www.modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx/summary),
[FST-based Chinese ITN](https://www.modelscope.cn/models/thuduj12/fst_itn_zh/summary),
[Ngram Chinese language model](https://www.modelscope.cn/models/damo/speech_ngram_lm_zh-cn-ai-wesp-fst/summary)

If you wish to deploy a fine-tuned model (e.g. 10epoch.pb), manually rename it to model.pb, replace the original model.pb in the downloaded ModelScope model directory, and specify that path as `model_dir`.

runtime/docs/benchmark_libtorch_cpp.md
New file @@ -0,0 +1,31 @@

# GPU Benchmark (libtorch-cpp)

## Configuration

### Data set

A long-audio test set (not open source) containing 103 audio files, with durations ranging from 2 to 30 minutes.

## [FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary) + [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript/summary) + [CT-Transformer](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx/summary)

```shell
./funasr-onnx-offline-rtf \
  --model-dir ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
  --vad-dir ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --punc-dir ./damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --gpu \
  --thread-num 20 \
  --bladedisc true \
  --batch-size 20 \
  --wav-path ./long_test.scp
```

Note: run inside the docker container; refer to the ([docs](./SDK_advanced_guide_offline_gpu_zh.md))

### Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz, 16 cores / 32 processors with avx512_vnni, GPU: A10

| concurrent-tasks | batch | RTF | Speedup Rate |
|------------------|:------:|:------:|:------------:|
| 1 | 1 | 0.0076 | 130 |
| 1 | 20 | 0.0048 | 208 |
| 20 | 20 | 0.0008 | 1200 |

Note: on CPU, the single-thread RTF is 0.066 and the 32-thread speedup is 330+

runtime/readme.md
@@ -7,9 +7,23 @@
- File transcription service, Mandarin, CPU version, done
- The real-time transcription service, Mandarin (CPU), done
- File transcription service, English, CPU version, done
- File transcription service, Mandarin, GPU version, in progress
- File transcription service, Mandarin, GPU version, done
- and more.

## File Transcription Service, Mandarin (GPU)

Currently, the FunASR runtime SDK supports deploying the Mandarin file transcription service (GPU version), with a complete speech recognition pipeline that can transcribe tens of hours of audio into punctuated text and can serve more than a hundred concurrent streams. To meet the needs of different users, we have prepared illustrated tutorials for both novice and advanced developers.

### What's new

- 2024/06/27: File Transcription Service GPU 1.0 released, supporting dynamic batching and multi-threaded concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (compared to 330+ on CPU); refer to the ([docs](./docs/benchmark_libtorch_cpp.md)). Docker image version funasr-runtime-sdk-gpu-0.1.0 (aa10f938da3b)

### Advanced Development Guide

This documentation mainly targets advanced developers who need to modify and customize the service. It supports downloading model deployments from ModelScope as well as deploying models that users have fine-tuned. For detailed information, refer to the [docs](./docs/SDK_advanced_guide_offline_gpu.md)

## File Transcription Service, English (CPU)

Currently, the FunASR runtime SDK supports deploying the English file transcription service (CPU version), with a complete speech recognition pipeline that can transcribe tens of hours of audio into punctuated text and can serve more than a hundred concurrent streams.

runtime/readme_cn.md
@@ -10,9 +10,22 @@
- File transcription service, Mandarin, CPU version, done
- Real-time transcription service, Mandarin (CPU), done
- File transcription service, English, CPU version, done
- File transcription service, Mandarin, GPU version, in progress
- File transcription service, Mandarin, GPU version, done
- and more.

## File Transcription Service, Mandarin (GPU)

The Mandarin offline file transcription service (GPU version) has a complete speech recognition pipeline that can transcribe tens of hours of long audio and video into punctuated text, and supports multiple concurrent transcription requests. To meet the needs of different users and scenarios, we have prepared different illustrated tutorials:

### What's new

- 2024/06/27: Mandarin Offline File Transcription Service GPU 1.0 released, supporting dynamic batching and multi-way concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (330+ on CPU); for details, see the ([docs](./docs/benchmark_libtorch_cpp.md)). Docker image version funasr-runtime-sdk-gpu-0.1.0 (aa10f938da3b)

### Deployment and development documentation

Deployed models come from ModelScope or from user fine-tuning, and customized services are supported; for detailed documentation see ([click here](./docs/SDK_advanced_guide_offline_gpu_zh.md))

## File Transcription Service, English (CPU)

The English offline file transcription service (CPU version) has a complete speech recognition pipeline that can transcribe tens of hours of long audio and video into punctuated text, and supports hundreds of concurrent transcription requests.
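As a sanity check on the RTF and speedup figures quoted in the release notes above: RTF is processing time divided by audio duration, so the speedup over real time is roughly 1/RTF (the quoted 130 and 1200+ are this inverse, rounded down to a conservative figure).

```python
# Relationship between RTF and real-time speedup; numbers from the benchmark above.
def speedup_from_rtf(rtf: float) -> float:
    """Seconds of audio processed per wall-clock second."""
    return 1.0 / rtf

for rtf in (0.0076, 0.0048, 0.0008):
    print(f"RTF {rtf}: ~{speedup_from_rtf(rtf):.0f}x real time")
```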