We can stream audio data to the server in real time with the websocket client, for example one chunk every 300 ms, and receive the transcribed text once the speaker stops.
The audio is sent as a stream, but the ASR inference itself runs in offline (non-streaming) mode.
Install modelscope and funasr
pip install "modelscope[audio_asr]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip install --editable ./
Install the requirements for the server
cd funasr/runtime/python/websocket
pip install -r requirements_server.txt
Start the server
python ASR_server.py --host "0.0.0.0" --port 10095 --asr_model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
Install the requirements for the client
git clone https://github.com/alibaba/FunASR.git && cd FunASR
cd funasr/runtime/python/websocket
pip install -r requirements_client.txt
Start the client
python ASR_client.py --host "127.0.0.1" --port 10095 --chunk_size 300
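To make the chunking concrete, here is a minimal sketch of how a 300 ms chunk translates into bytes and how a recording could be split before sending. It assumes that --chunk_size is in milliseconds (as the 300 ms example above suggests) and that the audio is 16 kHz, 16-bit, mono PCM, which matches the 16k model but is not confirmed here. The names chunk_bytes and split_pcm are hypothetical helpers for illustration and are not part of ASR_client.py.

```python
from typing import Iterator


def chunk_bytes(chunk_ms: int, sample_rate: int = 16000,
                sample_width: int = 2, channels: int = 1) -> int:
    """Number of bytes in one chunk of raw PCM audio.

    Assumed defaults: 16 kHz, 16-bit (2 bytes per sample), mono.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    samples = sample_rate * chunk_ms // 1000
    return samples * sample_width * channels


def split_pcm(pcm: bytes, chunk_ms: int = 300) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = chunk_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # one second of silence, 16 kHz 16-bit mono
    print(chunk_bytes(300))                              # 9600
    print([len(c) for c in split_pcm(one_second, 300)])  # [9600, 9600, 9600, 3200]
```

Under these assumptions each 300 ms chunk is 9600 bytes, so a client would send a little over three messages per second of speech.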