From 5853ebc98f51c79d0ae2955cefe1457cba78efe4 Mon Sep 17 00:00:00 2001
From: Yabin Li <wucong.lyb@alibaba-inc.com>
Date: Thu, 27 Jun 2024 17:38:19 +0800
Subject: [PATCH] Merge Dev blade (#1856)

---
 runtime/docs/SDK_advanced_guide_offline_gpu_zh.md | 209 ++++++++++++++++++++++++++
 runtime/readme_cn.md                              |  15 +
 runtime/docs/SDK_advanced_guide_offline_gpu.md    | 173 +++++++++++++++++++++
 README_zh.md                                      |   1
 runtime/docs/benchmark_libtorch_cpp.md            |  31 +++
 runtime/readme.md                                 |  16 +
 README.md                                         |   1
 7 files changed, 444 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 835eed4..a8b15e5 100644
--- a/README.md
+++ b/README.md
@@ -29,6 +29,7 @@
 <a name="whats-new"></a>
 ## What's new:
+- 2024/06/27: Offline File Transcription Service GPU 1.0 released, supporting dynamic batching and multi-threaded concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (vs. 330+ on CPU); see ([docs](runtime/readme.md))
 - 2024/05/15: emotion recognition models are newly supported: [emotion2vec+large](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary), [emotion2vec+base](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary), [emotion2vec+seed](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary). The following categories are currently supported: 0: angry, 1: happy, 2: neutral, 3: sad, 4: unknown.
 - 2024/05/15: Offline File Transcription Service 4.5, Offline File Transcription Service of English 1.6, and Real-time Transcription Service 1.10 released, adapting to the FunASR 1.0 model structure ([docs](runtime/readme.md))
 - 2024/03/05: Added the Qwen-Audio and Qwen-Audio-Chat large-scale audio-text multimodal models, which have topped multiple audio-domain leaderboards. These models support speech dialogue; see [usage](examples/industrial_data_pretraining/qwen_audio).
diff --git a/README_zh.md b/README_zh.md
index 43db23b..169face 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -33,6 +33,7 @@
 <a name="最新动态"></a>
 ## What's New
+- 2024/06/27: Chinese Offline File Transcription Service GPU 1.0 released, supporting dynamic batching and multi-way concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (vs. 330+ on CPU); for details see ([deployment docs](runtime/readme_cn.md))
 - 2024/05/15: New emotion recognition models added: [emotion2vec+large](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary), [emotion2vec+base](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary), [emotion2vec+seed](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary). Output emotion categories: angry, happy, neutral, sad.
 - 2024/05/15: Chinese Offline File Transcription Service 4.5, English Offline File Transcription Service 1.6, and Chinese Real-time Transcription Service 1.10 released, adapting to the FunASR 1.0 model structure; for details see ([deployment docs](runtime/readme_cn.md))
 - 2024/03/05: Added the Qwen-Audio and Qwen-Audio-Chat audio-text multimodal large models, which have topped multiple audio-domain leaderboards and support speech dialogue; for detailed usage see [examples](examples/industrial_data_pretraining/qwen_audio).

diff --git a/runtime/docs/SDK_advanced_guide_offline_gpu.md b/runtime/docs/SDK_advanced_guide_offline_gpu.md
new file mode 100644
index 0000000..a33715c
--- /dev/null
+++ b/runtime/docs/SDK_advanced_guide_offline_gpu.md
@@ -0,0 +1,173 @@
+# Advanced Development Guide (File Transcription Service, GPU)
+
+([Chinese](SDK_advanced_guide_offline_gpu_zh.md)|English)
+
+[//]: # (FunASR provides a Chinese offline file transcription service that can be deployed locally or on a cloud server with just one click. The core of the service is the FunASR runtime SDK, which has been open-sourced. FunASR-runtime combines various capabilities such as speech endpoint detection (VAD), large-scale speech recognition (ASR) using Paraformer-large, and punctuation detection (PUNC), which have all been open-sourced by the speech laboratory of DAMO Academy on the ModelScope community. This enables accurate and efficient high-concurrency transcription of audio files.)
+FunASR Offline File Transcription Software Package (GPU) provides a powerful speech-to-text offline file transcription service. With a complete speech recognition pipeline combining models for speech endpoint detection, speech recognition, punctuation restoration, etc., it can transcribe long audio and video files, spanning several hours, into punctuated text, and supports hundreds of concurrent transcription requests. The output is punctuated text with word-level timestamps, and ITN (Inverse Text Normalization) and user-defined hotwords are supported. The server side integrates ffmpeg, so various audio and video formats are accepted as input. The software package provides client libraries in multiple programming languages such as HTML, Python, C++, Java, and C#, which users can use directly or develop further.
+
+This document serves as a development guide for the FunASR offline file transcription service. If you wish to quickly experience the offline file transcription service, please refer to the one-click deployment example ([docs](./SDK_tutorial.md)).
+
+<img src="images/offline_structure.jpg" width="900"/>
+
+
+| TIME | INFO | IMAGE VERSION | IMAGE ID |
+|------------|-----------------------------------------------------------------|------------------------------|--------------|
+| 2024.06.27 | Offline File Transcription Software Package (GPU) 1.0 released | funasr-runtime-sdk-gpu-0.1.0 | aa10f938da3b |
+
+
+## Quick start
+### Docker install
+If you have already installed Docker, skip this step!
+```shell
+curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh;
+sudo bash install_docker.sh
+```
+If the installation fails, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)
+
+### Pulling and launching images
+Use the following command to pull and launch the Docker image for the FunASR runtime-SDK:
+```shell
+sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.1.0
+
+sudo docker run --gpus=all -p 10098:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.1.0
+```
+
+Introduction to command parameters:
+```text
+-p <host port>:<mapped docker port>: In the example, host machine (ECS) port 10098 is mapped to port 10095 in the Docker container. Make sure that port 10098 is open in the ECS security rules.
+
+-v <host path>:<mounted Docker path>: In the example, the host machine path /root is mounted to the Docker path /workspace/models.
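+
+--gpus=all: exposes the host GPUs to the container, as used in the docker run command above. Assumption: this requires the NVIDIA Container Toolkit to be installed on the host; a specific device can be selected instead with, e.g., --gpus '"device=0"'.
+
+--privileged=true: runs the container with extended host privileges.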
+```
+
+### Starting the server
+Use the following script to start the server:
+```shell
+nohup bash run_server.sh \
+  --download-model-dir /workspace/models \
+  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
+  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
+  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
+  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
+  --itn-dir thuduj12/fst_itn_zh \
+  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &
+
+# If you want to disable SSL, please add: --certfile 0
+# If you want to deploy the timestamp or NN hotword model, please set --model-dir to the corresponding model:
+# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript (timestamp)
+# damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-torchscript (hotword)
+# If you want to load hotwords on the server side, please configure them in the host machine file ./funasr-runtime-resources/models/hotwords.txt (docker mapping address: /workspace/models/hotwords.txt):
+# One hotword per line, format (hotword weight): 阿里巴巴 20
+```
+
+### More details about the script run_server.sh:
+
+The funasr-wss-server supports downloading models from ModelScope. You can set the model download address (--download-model-dir, default is /workspace/models) and the model ID (--model-dir, --vad-dir, --punc-dir).
Here is an example:
+
+```shell
+cd /workspace/FunASR/runtime
+nohup bash run_server.sh \
+  --download-model-dir /workspace/models \
+  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
+  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
+  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
+  --itn-dir thuduj12/fst_itn_zh \
+  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
+  --certfile ../../../ssl_key/server.crt \
+  --keyfile ../../../ssl_key/server.key \
+  --hotword ../../hotwords.txt > log.txt 2>&1 &
+```
+
+Introduction to run_server.sh parameters:
+```text
+--download-model-dir: Model download address; models are downloaded from ModelScope by setting the model ID.
+--model-dir: ModelScope model ID or local model path.
+--vad-dir: ModelScope model ID or local model path.
+--punc-dir: ModelScope model ID or local model path.
+--lm-dir: ModelScope model ID or local model path.
+--itn-dir: ModelScope model ID or local model path.
+--port: Port number that the server listens on. Default is 10095.
+--decoder-thread-num: The number of threads in the server-side thread pool that handle concurrent requests.
+--io-thread-num: Number of IO threads that the server starts.
+--model-thread-num: The number of internal threads for each recognition route, controlling the parallelism of the ONNX model.
+  The default value is 1. It is recommended that decoder-thread-num * model-thread-num equal the total number of threads.
+--certfile <string>: SSL certificate file. Default is ../../../ssl_key/server.crt. To disable SSL, set it to 0.
+--keyfile <string>: SSL key file. Default is ../../../ssl_key/server.key.
+--hotword: Hotword file path, one hotword per line (e.g. 阿里巴巴 20). If the client also provides hotwords, the two sets are merged.
+```
+
+### Shutting Down the FunASR Service
+```text
+# Check the PID of the funasr-wss-server process
+ps -x | grep funasr-wss-server
+kill -9 PID
+```
+
+### Modifying Models and Other Parameters
+To replace the model in use or other parameters, first shut down the FunASR service, modify the parameters you want to change, and then restart the FunASR service. The model must be an ASR/VAD/PUNC model from ModelScope, or a model fine-tuned from one of them.
+```text
+# For example, to replace the ASR model with damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript, set the parameter --model-dir
+  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript
+# Set the port number using --port
+  --port <port number>
+# Set the number of inference threads the server will start using --decoder-thread-num
+  --decoder-thread-num <decoder thread num>
+# Set the number of IO threads the server will start using --io-thread-num
+  --io-thread-num <io thread num>
+# Disable the SSL certificate
+  --certfile 0
+```
+
+After executing the above command, the offline file transcription service will be started.
If the model is specified as a ModelScope model ID, the following models will be automatically downloaded from ModelScope:
+[FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary),
+[Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript/summary),
+[CT-Transformer](https://www.modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx/summary),
+[FST-ITN](https://www.modelscope.cn/models/thuduj12/fst_itn_zh/summary),
+[Ngram LM](https://www.modelscope.cn/models/damo/speech_ngram_lm_zh-cn-ai-wesp-fst/summary)
+
+If you wish to deploy your fine-tuned model (e.g., 10epoch.pb), you need to manually rename it to model.pb, replace the original model.pb downloaded from ModelScope with it, and then specify its path as `model_dir`.
+
+## Starting the client
+After completing the deployment of the FunASR offline file transcription service on the server, you can test and use the service by following these steps. Currently, FunASR-bin supports multiple ways to start the client. The following are command-line examples based on the python-client, the c++-client, and the custom-client Websocket communication protocol:
+
+### python-client
+```shell
+python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
+```
+Introduction to command parameters:
+```text
+--host: the IP address of the server. It can be set to 127.0.0.1 for local testing.
+--port: the port number the server listens on.
+--audio_in: the audio input. It can be a path to a wav file or a wav.scp file (a Kaldi-formatted wav list in which each line contains a wav_id followed by a tab and a wav_path).
+--output_dir: the output path for recognition results.
+--ssl: whether to use SSL encryption. SSL is used by default.
+--mode: offline mode.
+--hotword: Hotword file path, one hotword per line (e.g. 阿里巴巴 20)
+--use_itn: whether to use ITN; the default value is 1 (enabled), 0 disables it.
+```
+
+### c++-client
+```shell
+./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path test.wav --thread-num 1 --is-ssl 1
+```
+
+Introduction to command parameters:
+```text
+--server-ip: the IP address of the server. It can be set to 127.0.0.1 for local testing.
+--port: the port number the server listens on.
+--wav-path: the audio input. It can be a path to a wav file or a wav.scp file (a Kaldi-formatted wav list in which each line contains a wav_id followed by a tab and a wav_path).
+--is-ssl: whether to use SSL encryption. SSL is used by default.
+--hotword: Hotword file path, one hotword per line (e.g. 阿里巴巴 20)
+--use-itn: whether to use ITN; the default value is 1 (enabled), 0 disables it.
+```
+
+### Custom client
+If you want to define your own client, see the [Websocket communication protocol](./websocket_protocol.md)
+
+## How to customize service deployment
+The code for FunASR-runtime is open source.
If the server and client cannot fully meet your needs, you can further develop them based on your own requirements:
+
+### C++ client
+https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/websocket
+
+### Python client
+https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/websocket
+
diff --git a/runtime/docs/SDK_advanced_guide_offline_gpu_zh.md b/runtime/docs/SDK_advanced_guide_offline_gpu_zh.md
new file mode 100644
index 0000000..3416117
--- /dev/null
+++ b/runtime/docs/SDK_advanced_guide_offline_gpu_zh.md
@@ -0,0 +1,209 @@
+# FunASR Offline File Transcription Service (GPU) Development Guide
+
+(Chinese|[English](SDK_advanced_guide_offline_gpu.md))
+
+The FunASR Offline File Transcription Software Package (GPU) provides a powerful speech-to-text offline file transcription service. With a complete speech recognition pipeline combining speech endpoint detection, speech recognition, and punctuation models, it can transcribe long audio and video files spanning dozens of hours into punctuated text, and supports hundreds of concurrent transcription requests. The output is punctuated text with word-level timestamps, and ITN and user-defined hotwords are supported. The server side integrates ffmpeg and accepts various audio and video formats as input. The package provides clients in multiple programming languages such as HTML, Python, C++, Java, and C#, which users can use directly or develop further.
+
+This document is the development guide for the FunASR offline file transcription service (GPU version). For a quick experience of the offline file transcription service, see [Quick start](#quick-start).
+
+<img src="images/offline_structure.jpg" width="900"/>
+
+| TIME | INFO | IMAGE VERSION | IMAGE ID |
+|------------|--------------------------------------------------------|------------------------------|--------------|
+| 2024.06.27 | Offline File Transcription Service (GPU) 1.0 released | funasr-runtime-sdk-gpu-0.1.0 | aa10f938da3b |
+
+## Server configuration
+
+Users can choose a server configuration that suits their business needs. The recommended configuration is:
+- Configuration 1 (GPU): 8-core vCPU, 32 GB RAM, V100; a single machine can support roughly 20 concurrent requests
+
+Detailed performance benchmark report ([click here](./benchmark_onnx_cpp.md))
+
+Cloud service providers offer a 3-month free trial for new users; application tutorial ([click here](https://github.com/alibaba-damo-academy/FunASR/blob/main/runtime/docs/aliyun_server_tutorial.md))
+
+## Quick start
+
+### Docker installation
+If you have already installed Docker, skip this step!
+Install Docker on the server with the following commands:
+```shell
+curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
+sudo bash install_docker.sh
+```
+If the Docker installation fails, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)
+
+### Launching the image
+
+Pull and launch the Docker image of the FunASR software package with the following commands:
+
+```shell
+sudo docker pull \
+  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.1.0
+mkdir -p ./funasr-runtime-resources/models
+sudo docker run --gpus=all -p 10098:10095 -it --privileged=true \
+  -v $PWD/funasr-runtime-resources/models:/workspace/models \
+  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-gpu-0.1.0
+```
+
+### Starting the server
+
+After Docker is started, launch the funasr-wss-server service:
+```shell
+cd FunASR/runtime
+nohup bash run_server.sh \
+  --download-model-dir /workspace/models \
+  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
+  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
+  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
+  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
+  --itn-dir thuduj12/fst_itn_zh \
+  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &
+
+# To disable SSL, add: --certfile 0
+# The timestamp model is loaded by default; to deploy the NN hotword model, set --model-dir to the corresponding model:
+# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript (timestamp)
+# damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-torchscript (NN hotword)
+# To load hotwords on the server side, configure them in the host machine file ./funasr-runtime-resources/models/hotwords.txt (docker mapping address: /workspace/models/hotwords.txt):
+# One hotword per line, format (hotword weight): 阿里巴巴 20 (note: there is no hard limit on hotwords, but for a good balance of performance and accuracy, keep each hotword within 10 characters, the count within 1k, and weights between 1 and 100)
+```
+To customize the ngram LM, see ([How to train an LM](./lm_train_tutorial.md))
+
+For detailed server parameters, see [Server usage](#server-usage)
+
+### Client testing and usage
+
+Download the client sample directory samples:
+```shell
+wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
+```
+Taking the Python client as an example: it supports multiple audio input formats (.wav, .pcm, .mp3, etc.), video input (.mp4, etc.), and multi-file list input via wav.scp. For other clients, see [Client usage](#client-usage); for customized service deployment, see [How to customize service deployment](#how-to-customize-service-deployment)
+```shell
+python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
+```
+
+## Client usage
+
+After completing the FunASR service deployment on the server, you can test and use the offline file transcription service through the following steps.
+Clients in the following programming languages are currently supported:
+
+- [Python](#python-client)
+- [CPP](#cpp-client)
+- [HTML web page](#html-web-client)
+- [Java](#java-client)
+
+### python-client
+To run the client directly for testing, refer to the following brief instructions, taking the Python version as an example:
+
+```shell
+python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline \
+  --audio_in "../audio/asr_example.wav" --output_dir "./results"
+```
+
+Introduction to command parameters:
+```text
+--host the IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1).
+  If the client and service are not on the same server, change it to the deployment machine's IP
+--port 10095 the deployment port number
+--mode offline means offline file transcription
+--audio_in the audio file to transcribe; supports a file path or a wav.scp file list
+--thread_num the number of concurrent sending threads; default is 1
+--ssl whether to enable SSL certificate verification; default 1 (enabled), set 0 to disable
+--hotword hotword file, one hotword per line, format (hotword weight): 阿里巴巴 20
+--use_itn whether to use ITN; default 1 (enabled), set 0 to disable
+```
+
+### cpp-client
+Enter the samples/cpp directory and test with the cpp client as follows:
+```shell
+./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path ../audio/asr_example.wav
+```
+
+Introduction to command parameters:
+```text
+--server-ip the IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1).
+  If the client and service are not on the same server, change it to the deployment machine's IP
+--port 10095 the deployment port number
+--wav-path the audio file to transcribe; supports a file path
+--hotword hotword file, one hotword per line, format (hotword weight): 阿里巴巴 20
+--thread-num the number of client threads
+--use-itn whether to use ITN; default 1 (enabled), set 0 to disable
+```
+
+### HTML web client
+Open html/static/index.html in a browser to get the page shown below; it supports microphone input and file upload for a direct experience
+
+<img src="images/html.png" width="900"/>
+
+### Java-client
+```shell
+FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline
+```
+For details, see the documentation ([click here](../java/readme.md))
+
+## Server usage
+
+### Starting the FunASR service
+```shell
+cd /workspace/FunASR/runtime
+nohup bash run_server.sh \
+  --download-model-dir /workspace/models \
+  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
+  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
+  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
+  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
+  --itn-dir thuduj12/fst_itn_zh \
+  --certfile ../../../ssl_key/server.crt \
+  --keyfile ../../../ssl_key/server.key \
+  --hotword ../../hotwords.txt > log.txt 2>&1 &
+```
+**run_server.sh command parameters**
+```text
+--download-model-dir model download address; models are downloaded from ModelScope by setting the model ID
+--model-dir ModelScope model ID or local model path
+--vad-dir ModelScope model ID or local model path
+--punc-dir ModelScope model ID or local model path
+--lm-dir ModelScope model ID or local model path
+--itn-dir ModelScope model ID or local model path
+--port port number the server listens on; default is 10095
+--decoder-thread-num server-side thread pool size (the maximum number of concurrent streams supported);
+  **it is recommended to allocate 1 GB of GPU memory per stream, i.e. 20 GB of GPU memory supports 20 concurrent streams**
+--io-thread-num number of IO threads the server starts
+--model-thread-num number of internal threads per recognition stream (controls ONNX model parallelism); default is 1;
+  it is recommended that decoder-thread-num * model-thread-num equal the total number of threads
+--certfile SSL certificate file; default: ../../../ssl_key/server.crt; to disable SSL, set it to 0
+--keyfile SSL key file; default: ../../../ssl_key/server.key
+--hotword hotword file path, one hotword per line, format: hotword weight (e.g. 阿里巴巴 20);
+  if the client also provides hotwords, they are merged with the server's; server-side hotwords take effect globally,
+  while client-side hotwords only take effect for the corresponding client
+```
+
+### Shutting down the FunASR service
+```text
+# Check the PID of the funasr-wss-server process
+ps -x | grep funasr-wss-server
+kill -9 PID
+```
+
+### Modifying models and other parameters
+To replace the model in use or other parameters, first shut down the FunASR service, modify the parameters to be replaced, and then restart the FunASR service. The model must be an ASR/VAD/PUNC model from ModelScope, or a model fine-tuned from one of them.
+```text
+# For example, to replace the ASR model with damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript, set the parameter --model-dir
+  --model-dir
damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript
+# Set the port number using --port
+  --port <port number>
+# Set the number of inference threads the server starts using --decoder-thread-num
+  --decoder-thread-num <decoder thread num>
+# Set the number of IO threads the server starts using --io-thread-num
+  --io-thread-num <io thread num>
+# Disable the SSL certificate
+  --certfile 0
+```
+
+After executing the above command, the offline file transcription service will be started. If the model is specified as a ModelScope model ID, the following models will be automatically downloaded from ModelScope:
+[FSMN-VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary),
+[Paraformer-large model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript/summary),
+[CT-Transformer punctuation model](https://www.modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx/summary),
+[FST-based Chinese ITN](https://www.modelscope.cn/models/thuduj12/fst_itn_zh/summary),
+[Ngram Chinese language model](https://www.modelscope.cn/models/damo/speech_ngram_lm_zh-cn-ai-wesp-fst/summary)
+
+If you wish to deploy your fine-tuned model (e.g., 10epoch.pb), you need to manually rename it to model.pb, replace the original model.pb from ModelScope with it, and then specify its path as `model_dir`.

diff --git a/runtime/docs/benchmark_libtorch_cpp.md b/runtime/docs/benchmark_libtorch_cpp.md
new file mode 100644
index 0000000..b7f99c6
--- /dev/null
+++ b/runtime/docs/benchmark_libtorch_cpp.md
@@ -0,0 +1,31 @@
+# GPU Benchmark (libtorch-cpp)
+
+## Configuration
+### Data set:
+A long audio test set (non-open-source) containing 103 audio files, with durations ranging from 2 to 30 minutes.
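The RTF and speedup figures reported below are wall-clock ratios. As a rough sketch of how such numbers relate (the durations here are hypothetical, chosen only to illustrate the arithmetic, not taken from the benchmark logs):

```shell
# RTF (real-time factor) = decoding wall-clock time / total audio duration.
# Speedup versus real time is then roughly 1 / RTF.
audio_seconds=3600     # assumed total duration of the test-set audio
decode_seconds=27.36   # assumed wall-clock decoding time
rtf=$(awk -v d="$decode_seconds" -v a="$audio_seconds" 'BEGIN { printf "%.4f", d / a }')
speedup=$(awk -v r="$rtf" 'BEGIN { printf "%d", 1 / r }')
echo "RTF=$rtf, speedup=${speedup}x"
```

With these assumed timings the script reports an RTF of 0.0076, the same order of magnitude as the single-thread measurements in the tables below.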
+
+## [FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary) + [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript/summary) + [CT-Transformer](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx/summary)
+
+```shell
+./funasr-onnx-offline-rtf \
+    --model-dir ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
+    --vad-dir ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
+    --punc-dir ./damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
+    --gpu \
+    --thread-num 20 \
+    --bladedisc true \
+    --batch-size 20 \
+    --wav-path ./long_test.scp
+```
+Note: run in Docker; refer to ([docs](./SDK_advanced_guide_offline_gpu_zh.md))
+
+### Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz 16core-32processor with avx512_vnni, GPU @ A10
+
+| concurrent-tasks | batch | RTF | Speedup Rate |
+|------------------|:------:|:------:|:------------:|
+| 1 | 1 | 0.0076 | 130 |
+| 1 | 20 | 0.0048 | 208 |
+| 20 | 20 | 0.0008 | 1200 |
+
+Note: on CPUs, the single-thread RTF is 0.066, and the 32-thread speedup is 330+

diff --git a/runtime/readme.md b/runtime/readme.md
index 28a063d..fb795c9 100644
--- a/runtime/readme.md
+++ b/runtime/readme.md
@@ -7,9 +7,23 @@
 - File transcription service, Mandarin, CPU version, done
 - The real-time transcription service, Mandarin (CPU), done
 - File transcription service, English, CPU version, done
-- File transcription service, Mandarin, GPU version, in progress
+- File transcription service, Mandarin, GPU version, done
 - and more.

+## File Transcription Service, Mandarin (GPU)
+
+Currently, the FunASR runtime-SDK supports the deployment of the file transcription service, Mandarin (GPU version), with a complete speech recognition pipeline that can transcribe tens of hours of audio into punctuated text, and supports recognition for more than a hundred concurrent streams.
+
+To meet the needs of different users, we have prepared separate tutorials with text and images for both novice and advanced developers.
+
+### What's new
+- 2024/06/27: File Transcription Service GPU 1.0 released, supporting dynamic batching and multi-threaded concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (vs. 330+ on CPU); refer to ([docs](./docs/benchmark_libtorch_cpp.md)). Docker image version funasr-runtime-sdk-gpu-0.1.0 (aa10f938da3b)
+
+### Advanced Development Guide
+
+This documentation mainly targets advanced developers who need to modify and customize the service. It supports downloading model deployments from ModelScope and also supports deploying models that users have fine-tuned. For detailed information, please refer to the [documentation](./docs/SDK_advanced_guide_offline_gpu.md)
+
+
 ## File Transcription Service, English (CPU)

Currently, the FunASR runtime-SDK supports the deployment of the file transcription service, English (CPU version), with a complete speech recognition pipeline that can transcribe tens of hours of audio into punctuated text, and supports recognition for more than a hundred concurrent streams.
diff --git a/runtime/readme_cn.md b/runtime/readme_cn.md
index 9cb7b58..0359d6e 100644
--- a/runtime/readme_cn.md
+++ b/runtime/readme_cn.md
@@ -10,9 +10,22 @@
 - Chinese offline file transcription service (CPU version), done
 - Chinese real-time speech recognition service (CPU version), done
 - English offline file transcription service (CPU version), done
-- Chinese offline file transcription service (GPU version), in progress
+- Chinese offline file transcription service (GPU version), done
 - and more to come

+## Chinese Offline File Transcription Service (GPU)
+
+The Chinese offline file transcription service deployment (GPU version) has a complete speech recognition pipeline that can transcribe dozens of hours of long audio and video into punctuated text, and supports multiple concurrent transcription requests.
+To meet the needs of different users and scenarios, we have prepared separate illustrated tutorials:
+
+### What's New
+- 2024/06/27: Chinese Offline File Transcription Service GPU 1.0 released, supporting dynamic batching and multi-way concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (vs. 330+ on CPU); for details see ([docs](./docs/benchmark_libtorch_cpp.md)). Docker image version funasr-runtime-sdk-gpu-0.1.0 (aa10f938da3b)
+
+### Deployment and Development Documentation
+
+Deployed models come from ModelScope or from user fine-tuning, and user-customized services are supported. For detailed documentation, see ([click here](./docs/SDK_advanced_guide_offline_gpu_zh.md))
+
+
 ## English Offline File Transcription Service (CPU)

The English offline file transcription service deployment (CPU version) has a complete speech recognition pipeline that can transcribe dozens of hours of long audio into punctuated text, and supports hundreds of concurrent transcription requests.