From 913342bc6f90b74417ea992ffacf46294ff96d11 Mon Sep 17 00:00:00 2001
From: 游雁 <zhifu.gzf@alibaba-inc.com>
Date: Wed, 09 Aug 2023 14:39:24 +0800
Subject: [PATCH] docs

---
 funasr/runtime/docs/SDK_advanced_guide_online.md    | 217 +++++++++++------------------------------------
 funasr/runtime/docs/SDK_advanced_guide_online_zh.md |   2
 2 files changed, 47 insertions(+), 172 deletions(-)

diff --git a/funasr/runtime/docs/SDK_advanced_guide_online.md b/funasr/runtime/docs/SDK_advanced_guide_online.md
index 7e478cb..0c46899 100644
--- a/funasr/runtime/docs/SDK_advanced_guide_online.md
+++ b/funasr/runtime/docs/SDK_advanced_guide_online.md
@@ -1,22 +1,22 @@
-# Advanced Development Guide (File transcription service)
-
-FunASR provides a Chinese online transcription service that can be deployed locally or on a cloud server with just one click. The core of the service is the FunASR runtime SDK, which has been open-sourced. FunASR-runtime combines various capabilities such as speech endpoint detection (VAD), offline large-scale speech recognition (ASR) using Paraformer-large, online large-scale speech recognition (ASR) using Paraformer-large, and punctuation detection (PUNC), which have all been open-sourced by the speech laboratory of DAMO Academy on the Modelscope community.
-This document serves as a development guide for the FunASR online transcription service. If you wish to quickly experience the online transcription service, please refer to the one-click deployment example for the FunASR online transcription service [Quick Start](#Quick Start)。
+# Real-time Speech Transcription Service Development Guide

-### 镜像启动
+FunASR provides a real-time speech transcription service that can be easily deployed on local or cloud servers, with the FunASR runtime-SDK as the core.
It integrates the speech endpoint detection (VAD), Paraformer-large non-streaming speech recognition (ASR), Paraformer-large streaming speech recognition (ASR), punctuation (PUNC), and other related capabilities open-sourced by the speech laboratory of DAMO Academy on the Modelscope community. The software package can perform real-time speech-to-text transcription, and can also accurately transcribe text at the end of sentences for high-precision output. The output text contains punctuation and supports high-concurrency multi-channel requests.

-通过下述命令拉取并启动FunASR软件包的docker镜像：
+## Quick Start
+### Pull Docker Image
+
+Use the following command to pull and start the FunASR software package docker image:
 ```shell
 sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
 mkdir -p ./funasr-runtime-resources/models
 sudo docker run -p 10095:10095 -it --privileged=true -v ./funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
 ```
-如果您没有安装docker，可参考[Docker安装](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker_zh.html)
+If you do not have Docker installed, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)

-### 服务端启动
+### Launching the Server

-docker启动之后，启动 funasr-wss-server-2pass服务程序：
+After Docker is launched, start the funasr-wss-server-2pass service program:
 ```shell
 cd FunASR/funasr/runtime
 ./run_server_2pass.sh \
@@ -26,190 +26,65 @@
   --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
   --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
 ```
-服务端详细参数介绍可参考[服务端参数介绍](#服务端参数介绍)
-### 客户端测试与使用
+For a more detailed description of server parameters, please refer to the [Server Introduction](#server-introduction)
+### Client Testing and Usage

-下载客户端测试工具目录samples
+Download the client
testing tool directory `samples`:
 ```shell
 wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
 ```
-我们以Python语言客户端为例，进行说明，支持音频格式（.wav, .pcm），以及多文件列表wav.scp输入，其他版本客户端请参考文档（[点击此处](#客户端用法详解)），定制服务部署请参考[如何定制服务部署](#如何定制服务部署)
+For illustration, we will use the Python language client, which supports audio formats (.wav, .pcm) and a multi-file list wav.scp input. For other client versions, please refer to the [documentation](#client-usage-details).
+
 ```shell
 python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode 2pass
 ```
+------------------
-## Quick Start
+## Client Usage Details
-### Server Startup
+After completing the FunASR service deployment on the server, you can test and use the real-time speech transcription service by following these steps. Currently, the following programming language client versions are supported:
-pull and run docker image:
+- [Python](./SDK_tutorial_online.md#python-client)
+- [CPP](./SDK_tutorial_online.md#cpp-client)
+- [Html](./SDK_tutorial_online.md#html-client)
+- [Java](./SDK_tutorial_online.md#java-client)
+- [C\#](./SDK_tutorial_online.md#c\#)
-```shell
-sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
-mkdir -p ./funasr-runtime-resources/models
-sudo docker run -p 10095:10095 -it --privileged=true -v ./funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
-```
+For more detailed usage, please click on the links above. For more client version support, please refer to [WebSocket/GRPC Protocol](./websocket_protocol_zh.md).
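As a rough illustration of what these clients do under the hood, the hypothetical Python sketch below frames an initial JSON handshake and splits 16 kHz PCM audio into fixed-duration chunks for streaming. The field names (`mode`, `wav_name`, `is_speaking`) and the 600 ms chunk size are illustrative assumptions only; the authoritative message format is defined in the WebSocket/GRPC protocol document linked above.

```python
# Hypothetical client-side helpers, NOT part of the SDK: the JSON fields and
# the chunking scheme are assumptions for illustration; see the protocol doc.
import json

SAMPLE_RATE = 16000   # the models above are 16 kHz models
SAMPLE_WIDTH = 2      # 16-bit signed PCM

def make_handshake(wav_name, mode="2pass"):
    """Build an initial JSON message announcing the session settings."""
    return json.dumps({"mode": mode, "wav_name": wav_name, "is_speaking": True})

def chunk_pcm(pcm, chunk_ms=600):
    """Split raw PCM bytes into ~chunk_ms-millisecond pieces, the unit a
    streaming client would send per websocket message."""
    step = SAMPLE_RATE * SAMPLE_WIDTH * chunk_ms // 1000
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

if __name__ == "__main__":
    one_second = b"\x00\x00" * SAMPLE_RATE   # 1 s of silence
    print([len(c) for c in chunk_pcm(one_second)])  # [19200, 12800]
```

A real client would then send each chunk as a binary websocket frame and finish with an end-of-speech message; `wss_client_asr.py` in the samples package is the reference implementation.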
-start funasr-wss-server-2pass：
-```shell
-cd FunASR/funasr/runtime
-./run_server_2pass.sh \
-  --download-model-dir /workspace/models \
-  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
-  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
-  --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
-  --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
-```
+## Server Introduction
-
-
-### Client Testing and Usage
-
-After running the above installation instructions, the client testing tool directory samples will be downloaded in the default installation directory /root/funasr-runtime-resources ([download click](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz)).
-We take the Python language client as an example to explain that it supports multiple audio format inputs (such as .wav, .pcm, .mp3, etc.), video inputs (.mp4, etc.), and multiple file list wav.scp inputs. For other client versions, please refer to the [documentation](#Detailed-Description-of-Client-Usage).
-
-```shell
-python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --audio_in "../audio/asr_example.wav"
-```
-
-
-
-## Installation of Docker
-
-The following steps are for manually installing Docker and Docker images. If your Docker image has already been launched, you can ignore this step.
-
-### Installation of Docker environment
-
-```shell
-# Ubuntu：
-curl -fsSL https://test.docker.com -o test-docker.sh
-sudo sh test-docker.sh
-# Debian：
-curl -fsSL https://get.docker.com -o get-docker.sh
-sudo sh get-docker.sh
-# CentOS：
-curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
-# MacOS：
-brew install --cask --appdir=/Applications docker
-```
-
-More details could ref to [docs](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)
-
-### Starting Docker
-
-```shell
-sudo systemctl start docker
-```
-
-### Pulling and launching images
-
-Use the following command to pull and launch the Docker image for the FunASR runtime-SDK:
-
-```shell
-sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
-
-sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
-```
-
-Introduction to command parameters:
-```text
--p <host port>:<mapped docker port>: In the example, host machine (ECS) port 10095 is mapped to port 10095 in the Docker container. Make sure that port 10095 is open in the ECS security rules.
-
--v <host path>:<mounted Docker path>: In the example, the host machine path /root is mounted to the Docker path /workspace/models.
-
-```
-
-
-## Starting the server
-
-Use the following script to start the server：
-```shell
-cd FunASR/funasr/runtime
-./run_server_2pass.sh \
-  --download-model-dir /workspace/models \
-  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
-  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
-  --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
-  --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
-```
-
-More details about the script run_server_2pass.sh:
-
-The FunASR-wss-server-2pass supports downloading models from Modelscope. You can set the model download address (--download-model-dir, default is /workspace/models) and the model ID (--model-dir, --vad-dir, --punc-dir). Here is an example:
-
+funasr-wss-server-2pass supports downloading models from Modelscope or starting from a local directory path, as shown below:
 ```shell
 cd /workspace/FunASR/funasr/runtime/websocket/build/bin
 ./funasr-wss-server-2pass \
-  --download-model-dir /workspace/models \
   --decoder-thread-num 32 \
   --io-thread-num 8 \
-  --port 10095 \
+  --port 10095
 ```
-Introduction to command parameters:
-
+Command parameter introduction:
 ```text
---download-model-dir: Model download address, download models from Modelscope by setting the model ID.
---model-dir: Modelscope model ID.
---quantize: True for quantized ASR model, False for non-quantized ASR model. Default is True.
---vad-dir: Modelscope model ID.
---vad-quant: True for quantized VAD model, False for non-quantized VAD model. Default is True.
---punc-dir: Modelscope model ID.
---punc-quant: True for quantized PUNC model, False for non-quantized PUNC model. Default is True.
---port: Port number that the server listens on. Default is 10095.
---decoder-thread-num: Number of inference threads that the server starts. Default is 8.
---io-thread-num: Number of IO threads that the server starts. Default is 1.
---certfile <string>: SSL certificate file. Default is ../../../ssl_key/server.crt.
---keyfile <string>: SSL key file. Default is ../../../ssl_key/server.key.
+--download-model-dir Model download address, download models from ModelScope by setting the model ID
+--model-dir ModelScope model ID of the non-streaming ASR model
+--online-model-dir ModelScope model ID of the streaming ASR model
+--quantize True for quantized ASR models, False for non-quantized ASR models, default is True
+--vad-dir ModelScope model ID of the VAD model
+--vad-quant True for quantized VAD models, False for non-quantized VAD models, default is True
+--punc-dir ModelScope model ID of the PUNC model
+--punc-quant True for quantized PUNC models, False for non-quantized PUNC models, default is True
+--port Port number that the server listens on, default is 10095
+--decoder-thread-num Number of inference threads the server starts, default is 8
+--io-thread-num Number of IO threads the server starts, default is 1
+--certfile SSL certificate file, default is ../../../ssl_key/server.crt; set to "" to disable SSL
+--keyfile SSL key file, default is ../../../ssl_key/server.key; set to "" to disable SSL
 ```
-After executing the above command, the real-time speech recognition service will be started. If the model is specified as the model ID in ModelScope, the following model will be automatically downloaded from ModelScope:
-[FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary)，
-[Paraformer-lagre-online](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx/summary)
-[Paraformer-lagre-offline](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx/summary)
-[CT-Transformer-online](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx/summary)
+After executing the above command, the real-time speech transcription service will be started.
If the model is specified as a ModelScope model ID, the following models will be automatically downloaded from ModelScope:
+[FSMN-VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary),
+[Paraformer-large online](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx/summary),
+[Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx/summary),
+[CT-Transformer](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx/summary)
-### Exporting models from finetuned resources
-
-If you want to deploy a finetuned model, you can follow these steps:
-Rename the model you want to deploy after finetuning (for example, 10epoch.pb) to model.pb, and replace the original model.pb in Modelscope with this one. If the path of the replaced model is /path/to/finetune/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch, set the path to model-dir.
-
-
-## Starting the client
-
-After completing the deployment of FunASR online transcription service on the server, you can test and use the service by following these steps. Currently, FunASR-bin supports multiple ways to start the client. The following are command-line examples based on python-client, c++-client, and custom client Websocket communication protocol:
-
-### python-client
-```shell
-python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
-```
-
-Introduction to command parameters:
-
-```text
---host: the IP address of the server. It can be set to 127.0.0.1 for local testing.
---port: the port number of the server listener.
---audio_in: the audio input. Input can be a path to a wav file or a wav.scp file (a Kaldi-formatted wav list in which each line includes a wav_id followed by a tab and a wav_path).
---output_dir: the path to the recognition result output.
---ssl: whether to use SSL encryption. The default is to use SSL.
---mode: offline, online, 2pass
-```
-
-### c++-client
-```shell
-. /funasr-wss-client-2pass --server-ip 127.0.0.1 --port 10095 --wav-path test.wav --thread-num 1 --is-ssl 1
-```
-
-Introduction to command parameters:
-
-```text
---server-ip: the IP address of the server. It can be set to 127.0.0.1 for local testing.
---port: the port number of the server listener.
---wav-path: the audio input. Input can be a path to a wav file or a wav.scp file (a Kaldi-formatted wav list in which each line includes a wav_id followed by a tab and a wav_path).
---is-ssl: whether to use SSL encryption. The default is to use SSL.
---mode: offline, online, 2pass
---thread-num 1
-
-```
+If you wish to deploy your fine-tuned model (e.g., 10epoch.pb), you need to manually rename the model to model.pb and replace the original model.pb in ModelScope. Then, set `--model-dir` to the path of the replaced model directory.
diff --git a/funasr/runtime/docs/SDK_advanced_guide_online_zh.md b/funasr/runtime/docs/SDK_advanced_guide_online_zh.md
index a56d334..f3b2e72 100644
--- a/funasr/runtime/docs/SDK_advanced_guide_online_zh.md
+++ b/funasr/runtime/docs/SDK_advanced_guide_online_zh.md
@@ -36,7 +36,7 @@
 ```shell
 wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
 ```
-我们以Python语言客户端为例，进行说明，支持音频格式（.wav, .pcm），以及多文件列表wav.scp输入，其他版本客户端请参考文档（[点击此处](#客户端用法详解)），定制服务部署请参考[如何定制服务部署](#如何定制服务部署)
+我们以Python语言客户端为例，进行说明，支持音频格式（.wav, .pcm），以及多文件列表wav.scp输入，其他版本客户端请参考文档（[点击此处](#客户端用法详解)）。
 ```shell
 python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode 2pass
 ```
-- 
Gitblit v1.9.1