python/FunASR-XL.git

			@@ -1,8 +1,80 @@
			# Advanced Development Guide (File transcription service)

			FunASR provides a Chinese offline file transcription service that can be deployed locally or on a cloud server with just one click. The core of the service is the FunASR runtime SDK, which has been open-sourced. FunASR-runtime combines various capabilities such as speech endpoint detection (VAD), large-scale speech recognition (ASR) using Paraformer-large, and punctuation detection (PUNC), which have all been open-sourced by the speech laboratory of DAMO Academy on the Modelscope community. This enables accurate and efficient high-concurrency transcription of audio files.
			FunASR provides a Chinese online transcription service that can be deployed locally or on a cloud server with just one click. The core of the service is the FunASR runtime SDK, which has been open-sourced. FunASR-runtime combines various capabilities such as speech endpoint detection (VAD), offline large-scale speech recognition (ASR) using Paraformer-large, online large-scale speech recognition (ASR) using Paraformer-large, and punctuation detection (PUNC), which have all been open-sourced by the speech laboratory of DAMO Academy on the Modelscope community.
			This document serves as a development guide for the FunASR online transcription service. To try the service quickly, see the one-click deployment example in [Quick Start](#quick-start).

			This document serves as a development guide for the FunASR offline file transcription service. If you wish to quickly experience the offline file transcription service, please refer to the one-click deployment example for the FunASR offline file transcription service ([docs](./SDK_tutorial.md)).
			### Starting the Docker image

			Pull and start the Docker image of the FunASR package with the following commands:

			```shell
			sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
			mkdir -p ./funasr-runtime-resources/models
			sudo docker run -p 10095:10095 -it --privileged=true -v ./funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
			```
			If Docker is not installed, see [Docker installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker_zh.html).

			### Starting the server

			After the Docker container starts, launch the funasr-wss-server-2pass service program:
			```shell
			cd FunASR/funasr/runtime
			./run_server_2pass.sh \
			--download-model-dir /workspace/models \
			--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
			--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
			--online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
			--punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
			```
			For details on the server parameters, see [Server parameters](#服务端参数介绍).
			### Client testing and usage

			Download the client test tool directory `samples`:
			```shell
			wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
			```
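			We use the Python client as an example. It supports audio input formats (.wav, .pcm) as well as multi-file wav.scp list input. For clients in other languages, see the documentation ([click here](#客户端用法详解)); for customized service deployment, see [How to customize service deployment](#如何定制服务部署).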
			```shell
			python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode 2pass
			```


			## Quick Start

			### Server Startup

			Pull and run the Docker image:

			```shell
			sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
			mkdir -p ./funasr-runtime-resources/models
			sudo docker run -p 10095:10095 -it --privileged=true -v ./funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
			```

			Start funasr-wss-server-2pass:
			```shell
			cd FunASR/funasr/runtime
			./run_server_2pass.sh \
			--download-model-dir /workspace/models \
			--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
			--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
			--online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
			--punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
			```



			### Client Testing and Usage

			After you run the installation steps above, the client test tool directory `samples` is downloaded to the default installation directory /root/funasr-runtime-resources ([download link](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz)).
			Taking the Python client as an example: it supports multiple audio input formats (.wav, .pcm, .mp3, etc.), video input (.mp4, etc.), and multi-file wav.scp list input. For clients in other languages, see the [documentation](#Detailed-Description-of-Client-Usage).

			```shell
			python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --audio_in "../audio/asr_example.wav"
			```
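Before running the client, it can help to confirm that something is listening on the mapped port. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and not part of FunASR, and a successful TCP connection only shows that the port is reachable, not that the recognition service is fully initialized.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1:10095 matches the -p 10095:10095 mapping used in the docker run command.
    print(is_port_open("127.0.0.1", 10095))
```

If the check returns `False`, the container may still be downloading models, or the port mapping may differ from the one shown above.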




			## Installation of Docker

			@@ -36,9 +108,9 @@
			Use the following command to pull and launch the Docker image for the FunASR runtime-SDK:

			```shell
			sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest
			sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0

			sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest
			sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
			```

			Introduction to command parameters:
			@@ -54,27 +126,26 @@

			Use the following script to start the server:
			```shell
			./run_server.sh --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
			cd FunASR/funasr/runtime
			./run_server_2pass.sh \
			--download-model-dir /workspace/models \
			--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
			--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
			--punc-dir damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
			--online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
			--punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
			```

			More details about the script run_server.sh:
			More details about the script run_server_2pass.sh:

			The FunASR-wss-server supports downloading models from Modelscope. You can set the model download address (--download-model-dir, default is /workspace/models) and the model ID (--model-dir, --vad-dir, --punc-dir). Here is an example:
			The FunASR-wss-server-2pass supports downloading models from Modelscope. You can set the model download address (--download-model-dir, default is /workspace/models) and the model ID (--model-dir, --vad-dir, --punc-dir). Here is an example:

			```shell
			cd /workspace/FunASR/funasr/runtime/websocket/build/bin
			./funasr-wss-server \
			./funasr-wss-server-2pass \
			--download-model-dir /workspace/models \
			--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
			--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
			--punc-dir damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
			--decoder-thread-num 32 \
			--io-thread-num 8 \
			--port 10095 \
			--certfile ../../../ssl_key/server.crt \
			--keyfile ../../../ssl_key/server.key
			```

			Introduction to command parameters:
			@@ -94,82 +165,25 @@
			--keyfile <string>: SSL key file. Default is ../../../ssl_key/server.key.
			```

			The FunASR-wss-server also supports loading models from a local path (see Preparing Model Resources for detailed instructions on preparing local model resources). Here is an example:

			```shell
			cd /workspace/FunASR/funasr/runtime/websocket/build/bin
			./funasr-wss-server \
			--model-dir /workspace/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
			--vad-dir /workspace/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
			--punc-dir /workspace/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
			--decoder-thread-num 32 \
			--io-thread-num 8 \
			--port 10095 \
			--certfile ../../../ssl_key/server.crt \
			--keyfile ../../../ssl_key/server.key
			```


			## Preparing Model Resources

			If you choose to download models from Modelscope through the FunASR-wss-server, you can skip this step. The vad, asr, and punc model resources in the offline file transcription service of FunASR are all from Modelscope. The model addresses are shown in the table below:

			| Model | Modelscope url |
			|-------|----------------|
			| VAD | https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary |
			| ASR | https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary |
			| PUNC | https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary |

			The offline file transcription service deploys quantized ONNX models. Below are instructions on how to export ONNX models and their quantization. You can choose to export ONNX models from Modelscope, local files, or finetuned resources:

			### Exporting ONNX models from Modelscope

			Download the corresponding model with the given model name from the Modelscope website, and then export the quantized ONNX model

			```shell
			python -m funasr.export.export_model \
			--export-dir ./export \
			--type onnx \
			--quantize True \
			--model-name damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
			--model-name damo/speech_fsmn_vad_zh-cn-16k-common-pytorch \
			--model-name damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch
			```

			Introduction to command parameters:

			```text
			--model-name: The name of the model on Modelscope, for example: damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
			--export-dir: The export directory of ONNX model.
			--type: Model type, currently supports ONNX and torch.
			--quantize: Whether to also export an int8-quantized model.
			```
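The parameters above can be assembled programmatically when several models need exporting. The sketch below only builds the argument lists and does not run them; the helper name `build_export_command` is hypothetical, and issuing one export command per model is an assumption made here for clarity rather than something the guide prescribes.

```python
import shlex
from typing import List


def build_export_command(model_name: str,
                         export_dir: str = "./export",
                         quantize: bool = True) -> List[str]:
    """Build the argv for `python -m funasr.export.export_model` using the flags listed above."""
    return [
        "python", "-m", "funasr.export.export_model",
        "--model-name", model_name,
        "--export-dir", export_dir,
        "--type", "onnx",
        "--quantize", str(quantize),  # rendered as "True" or "False", as in the shell example
    ]


if __name__ == "__main__":
    models = [
        "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
        "damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
        "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    ]
    for name in models:
        # Print a shell-quoted command; pass the list to subprocess.run to execute it.
        print(shlex.join(build_export_command(name)))
```

Keeping the command as a list avoids shell-quoting problems if a local model path contains spaces.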

			### Exporting ONNX models from local files

			Set the model name to the local path of the model, and export the quantized ONNX model:

			```shell
			python -m funasr.export.export_model --model-name /workspace/models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --export-dir ./export --type onnx --quantize True
			```

			After executing the above command, the real-time speech recognition service starts. If a model is specified by its ModelScope model ID, the following models are downloaded automatically from ModelScope:
			[FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary),
			[Paraformer-large-online](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx/summary),
			[Paraformer-large-offline](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx/summary),
			[CT-Transformer-online](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx/summary)

			### Exporting models from finetuned resources

			If you want to deploy a finetuned model, you can follow these steps:
			Rename the model you want to deploy after finetuning (for example, 10epoch.pb) to model.pb, and replace the original model.pb in Modelscope with this one. If the path of the replaced model is /path/to/finetune/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch, use the following command to convert the finetuned model to an ONNX model:
			Rename the model you want to deploy after finetuning (for example, 10epoch.pb) to model.pb, and replace the original model.pb in Modelscope with this one. If the path of the replaced model is /path/to/finetune/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch, set the path to model-dir.

			```shell
			python -m funasr.export.export_model --model-name /path/to/finetune/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --export-dir ./export --type onnx --quantize True
			```
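The rename-and-replace step described above can be sketched in Python. This is an illustration under stated assumptions: the helper name `install_finetuned_checkpoint` and the backup file name `model.pb.orig` are hypothetical, and the sketch copies the checkpoint rather than moving it so the finetuning output stays intact.

```python
import shutil
from pathlib import Path


def install_finetuned_checkpoint(checkpoint: Path, model_dir: Path,
                                 backup: bool = True) -> Path:
    """Copy a finetuned checkpoint (e.g. 10epoch.pb) over model_dir/model.pb."""
    target = model_dir / "model.pb"
    if backup and target.exists():
        # Keep the original weights under a hypothetical backup name.
        shutil.copy2(target, model_dir / "model.pb.orig")
    shutil.copy2(checkpoint, target)
    return target
```

After the copy, the model directory can be passed to the export command shown above.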

			## Starting the client

			After completing the deployment of FunASR offline file transcription service on the server, you can test and use the service by following these steps. Currently, FunASR-bin supports multiple ways to start the client. The following are command-line examples based on python-client, c++-client, and custom client Websocket communication protocol:
			After completing the deployment of FunASR online transcription service on the server, you can test and use the service by following these steps. Currently, FunASR-bin supports multiple ways to start the client. The following are command-line examples based on python-client, c++-client, and custom client Websocket communication protocol:

			### python-client
			```shell
			python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
			python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
			```

			Introduction to command parameters:
			@@ -180,80 +194,22 @@
			--audio_in: the audio input. Input can be a path to a wav file or a wav.scp file (a Kaldi-formatted wav list in which each line includes a wav_id followed by a tab and a wav_path).
			--output_dir: the path to the recognition result output.
			--ssl: whether to use SSL encryption. The default is to use SSL.
			--mode: offline mode.
			--mode: offline, online, 2pass
			```
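The wav.scp format accepted by `--audio_in` (one `wav_id` and one `wav_path` per line) can be read with a few lines of Python. The sketch below is not the client's own parser; the function name `read_wav_scp` is hypothetical, and splitting on any run of whitespace rather than strictly on a tab is a leniency assumed here.

```python
from pathlib import Path
from typing import Dict


def read_wav_scp(scp_path: str) -> Dict[str, str]:
    """Parse a Kaldi-style wav.scp file into {wav_id: wav_path}."""
    entries: Dict[str, str] = {}
    lines = Path(scp_path).read_text(encoding="utf-8").splitlines()
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split only once so that paths containing spaces stay whole.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"{scp_path}:{line_no}: expected '<wav_id> <wav_path>'")
        wav_id, wav_path = parts
        entries[wav_id] = wav_path
    return entries
```

A malformed line raises `ValueError` with its line number, which makes a broken list easier to locate before any audio is sent to the server.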

			### c++-client
			```shell
			./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path test.wav --thread-num 1 --is-ssl 1
			./funasr-wss-client-2pass --server-ip 127.0.0.1 --port 10095 --wav-path test.wav --thread-num 1 --is-ssl 1
			```

			Introduction to command parameters:

			```text
			--host: the IP address of the server. It can be set to 127.0.0.1 for local testing.
			--server-ip: the IP address of the server. It can be set to 127.0.0.1 for local testing.
			--port: the port number of the server listener.
			--audio_in: the audio input. Input can be a path to a wav file or a wav.scp file (a Kaldi-formatted wav list in which each line includes a wav_id followed by a tab and a wav_path).
			--output_dir: the path to the recognition result output.
			--ssl: whether to use SSL encryption. The default is to use SSL.
			--mode: offline mode.
			--wav-path: the audio input. Input can be a path to a wav file or a wav.scp file (a Kaldi-formatted wav list in which each line includes a wav_id followed by a tab and a wav_path).
			--is-ssl: whether to use SSL encryption. The default is to use SSL.
			--mode: offline, online, 2pass
			--thread-num: the number of client threads (1 in the example above).

			```

			### Custom client

			If you want to define your own client, the Websocket communication protocol is as follows:

			```text
			# First communication
			{"mode": "offline", "wav_name": wav_name, "is_speaking": True}
			# Send wav data
			Bytes data
			# Send end flag
			{"is_speaking": False}
			```
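The three-step message sequence above can be sketched as a Python generator that yields the frames in order. This is an illustration, not the official client: the name `session_frames` and the `chunk_size` default are hypothetical, and serializing the control messages with `json.dumps` (so Python's `True` becomes JSON `true`) is an assumption based on the message shapes shown. Sending the frames over an actual WebSocket connection is left out so that the block stays self-contained.

```python
import json
from typing import Iterator, Union


def session_frames(wav_name: str, audio: bytes, chunk_size: int = 3200,
                   mode: str = "offline") -> Iterator[Union[str, bytes]]:
    """Yield the frames of one session: start message, audio chunks, end flag."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # First communication: text frame describing the session.
    yield json.dumps({"mode": mode, "wav_name": wav_name, "is_speaking": True})
    # Audio payload: raw bytes, split into fixed-size chunks.
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
    # End flag: tells the server that no more audio follows.
    yield json.dumps({"is_speaking": False})
```

A caller would send each `str` frame as a text message and each `bytes` frame as a binary message, in the order yielded.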

			## How to customize service deployment

			The code for FunASR-runtime is open source. If the server and client cannot fully meet your needs, you can further develop them based on your own requirements:

			### C++ client

			https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/websocket

			### Python client

			https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/websocket

			### C++ server

			#### VAD
			```c++
			// The use of the VAD model consists of two steps: FsmnVadInit and FsmnVadInfer:
			FUNASR_HANDLE vad_handle=FsmnVadInit(model_path, thread_num);
			// Where: model_path contains "model-dir" and "quantize", thread_num is the ONNX thread count;
			FUNASR_RESULT result=FsmnVadInfer(vad_handle, wav_file.c_str(), NULL, 16000);
			// Where: vad_handle is the return value of FsmnVadInit, wav_file is the path to the audio file, and sampling_rate is the sampling rate (default 16k).
			```

			See the usage example for details [docs](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-offline-vad.cpp)

			#### ASR
			```c++
			// The use of the ASR model consists of two steps: FunOfflineInit and FunOfflineInfer:
			FUNASR_HANDLE asr_handle=FunOfflineInit(model_path, thread_num);
			// Where: model_path contains "model-dir" and "quantize", thread_num is the ONNX thread count;
			FUNASR_RESULT result=FunOfflineInfer(asr_handle, wav_file.c_str(), RASR_NONE, NULL, 16000);
			// Where: asr_handle is the return value of FunOfflineInit, wav_file is the path to the audio file, and sampling_rate is the sampling rate (default 16k).
			```

			See the usage example for details, [docs](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-offline.cpp)

			#### PUNC
			```c++
			// The use of the PUNC model consists of two steps: CTTransformerInit and CTTransformerInfer:
			FUNASR_HANDLE punc_handle=CTTransformerInit(model_path, thread_num);
			// Where: model_path contains "model-dir" and "quantize", thread_num is the ONNX thread count;
			FUNASR_RESULT result=CTTransformerInfer(punc_handle, txt_str.c_str(), RASR_NONE, NULL);
			// Where: punc_handle is the return value of CTTransformerInit, txt_str is the text to punctuate.
			```
			See the usage example for details, [docs](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-offline-punc.cpp)