From 32905d8cdedd53dad26680b0bd41397aaf0e51ae Mon Sep 17 00:00:00 2001 From: 游雁 <zhifu.gzf@alibaba-inc.com> Date: 星期五, 05 一月 2024 11:52:48 +0800 Subject: [PATCH] funasr1.0 --- runtime/docs/SDK_advanced_guide_online.md | 26 ++++++++++++++------------ 1 files changed, 14 insertions(+), 12 deletions(-) diff --git a/runtime/docs/SDK_advanced_guide_online.md b/runtime/docs/SDK_advanced_guide_online.md index 29b7f13..abcdb0d 100644 --- a/runtime/docs/SDK_advanced_guide_online.md +++ b/runtime/docs/SDK_advanced_guide_online.md @@ -1,6 +1,8 @@ # Real-time Speech Transcription Service Development Guide +([绠�浣撲腑鏂嘳(SDK_advanced_guide_online_zh.md)|English) -FunASR provides a real-time speech transcription service that can be easily deployed on local or cloud servers, with the FunASR runtime-SDK as the core. It integrates the speech endpoint detection (VAD), Paraformer-large non-streaming speech recognition (ASR), Paraformer-large streaming speech recognition (ASR), punctuation (PUNC), and other related capabilities open-sourced by the speech laboratory of DAMO Academy on the Modelscope community. The software package can perform real-time speech-to-text transcription, and can also accurately transcribe text at the end of sentences for high-precision output. The output text contains punctuation and supports high-concurrency multi-channel requests. +[//]: # (FunASR provides a real-time speech transcription service that can be easily deployed on local or cloud servers, with the FunASR runtime-SDK as the core. It integrates the speech endpoint detection (VAD), Paraformer-large non-streaming speech recognition (ASR), Paraformer-large streaming speech recognition (ASR), punctuation (PUNC), and other related capabilities open-sourced by the speech laboratory of DAMO Academy on the Modelscope community. The software package can perform real-time speech-to-text transcription, and can also accurately transcribe text at the end of sentences for high-precision output. The output text contains punctuation and supports high-concurrency multi-channel requests.) +FunASR Real-time Speech Recognition Software Package integrates real-time versions of speech endpoint detection model, speech recognition model, punctuation prediction model, and so on. By using multiple models collaboratively, it can perform real-time speech-to-text conversion, as well as high-precision transcription correction at the end of a sentence, with punctuation included in the output text. It supports multiple concurrent requests. Depending on the user's scenarios, it supports three service modes: real-time speech recognition service (online), non-real-time single-sentence transcription (offline), and real-time and non-real-time integrated collaboration (2pass). The software package provides client libraries in various programming languages such as HTML, Python, C++, Java, and C#, allowing users to use and further develop the software. <img src="images/online_structure.png" width="900"/> @@ -24,9 +26,9 @@ ### Pull Docker Image Use the following command to pull and start the FunASR software package docker image: ```shell -sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4 +sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.5 mkdir -p ./funasr-runtime-resources/models -sudo docker run -p 10096:10095 -it --privileged=true -v $PWD/funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4 +sudo docker run -p 10096:10095 -it --privileged=true -v $PWD/funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.5 ``` ### Launching the Server @@ -83,9 +85,6 @@ --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \ --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \ --itn-dir thuduj12/fst_itn_zh \ - --decoder-thread-num 32 \ - --io-thread-num 8 \ - --port 10095 \ --certfile ../../../ssl_key/server.crt \ --keyfile ../../../ssl_key/server.key \ --hotword ../../hotwords.txt > log.out 2>&1 & @@ -100,17 +99,20 @@ ### More details about the script run_server_2pass.sh: ```text --download-model-dir: Model download address, download models from Modelscope by setting the model ID. ---model-dir: Modelscope model ID. +--model-dir: modelscope model ID or local model path. --online-model-dir modelscope model ID --quantize: True for quantized ASR model, False for non-quantized ASR model. Default is True. ---vad-dir: Modelscope model ID. +--vad-dir: modelscope model ID or local model path. --vad-quant: True for quantized VAD model, False for non-quantized VAD model. Default is True. ---punc-dir: Modelscope model ID. +--punc-dir: modelscope model ID or local model path. --punc-quant: True for quantized PUNC model, False for non-quantized PUNC model. Default is True. ---itn-dir modelscope model ID +--itn-dir modelscope model ID or local model path. --port: Port number that the server listens on. Default is 10095. ---decoder-thread-num: Number of inference threads that the server starts. Default is 8. ---io-thread-num: Number of IO threads that the server starts. Default is 1. +--decoder-thread-num: The number of thread pools on the server side that can handle concurrent requests. + The script will automatically configure parameters decoder-thread-num and io-thread-num based on the server's thread count. +--io-thread-num: Number of IO threads that the server starts. +--model-thread-num: The number of internal threads for each recognition route to control the parallelism of the ONNX model. + The default value is 1. It is recommended that decoder-thread-num * model-thread-num equals the total number of threads. --certfile <string>: SSL certificate file. Default is ../../../ssl_key/server.crt. If you want to close ssl锛宻et 0 --keyfile <string>: SSL key file. Default is ../../../ssl_key/server.key. --hotword: Hotword file path, one line for each hotword(e.g.:闃块噷宸村反 20), if the client provides hot words, then combined with the hot words provided by the client. -- Gitblit v1.9.1