Files changed:
Acknowledge
README.md
README_zh.md
docs/images/XVERSE.png
docs/index.rst
funasr/runtime/docs/SDK_tutorial_online.md
funasr/runtime/docs/SDK_tutorial_online_zh.md
funasr/runtime/docs/websocket_protocol.md
funasr/runtime/docs/websocket_protocol_zh.md
funasr/runtime/python/onnxruntime/funasr_onnx/utils/utils.py
funasr/runtime/python/onnxruntime/setup.py
funasr/runtime/readme.md
funasr/runtime/readme_cn.md
Acknowledge
@@ -6,3 +6,4 @@
 4. We acknowledge [ChinaTelecom](https://github.com/zhuzizyf/damo-fsmn-vad-infer-httpserver) for contributing the VAD runtime.
 5. We acknowledge [RapidAI](https://github.com/RapidAI) for contributing the Paraformer and CT_Transformer-punc runtime.
 6. We acknowledge [AiHealthx](http://www.aihealthx.com/) for contributing the websocket service and html5.
+7. We acknowledge [XVERSE](http://www.xverse.cn/index.html) for contributing the grpc service.
README.md
@@ -63,8 +63,8 @@
 ## Contributors
-| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div> | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> |
-|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|
+| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div> | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> | <img src="docs/images/XVERSE.png" width="250"/> </div> |
+|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|
 The contributors can be found in [contributors list]((./Acknowledge))
README_zh.md
@@ -60,8 +60,8 @@
 ## Community Contributors
-| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div> | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> |
-|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|
+| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div> | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> | <img src="docs/images/XVERSE.png" width="250"/> </div> |
+|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|
 For the list of contributors, see the ([acknowledgement list](./Acknowledge))
docs/images/XVERSE.png
docs/index.rst
@@ -71,11 +71,9 @@
:maxdepth: 1
:caption: Runtime and Service
./funasr/runtime/readme.md
./funasr/runtime/docs/SDK_tutorial_online.md
./funasr/runtime/docs/SDK_tutorial.md
./funasr/runtime/python/websocket/README.md
./funasr/runtime/websocket/readme.md
./funasr/runtime/html5/readme.md
funasr/runtime/docs/SDK_tutorial_online.md
@@ -29,7 +29,7 @@
 # curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-en.sh;
 ```
-Execute the deployment tool and press the Enter key at the prompt to complete the installation and deployment of the server. Currently, the convenient deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_offline.md)).
+Execute the deployment tool and press the Enter key at the prompt to complete the installation and deployment of the server. Currently, the convenient deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_online.md)).
 ```shell
 sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
 ```
funasr/runtime/docs/SDK_tutorial_online_zh.md
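@@ -30,7 +30,7 @@
 # curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-zh.sh;
 ```
-Run the deployment tool and press Enter at the prompt to complete the installation and deployment of the server. The quick deployment tool currently supports only Linux; for other environments, see the development guide ([click here](#客户端用法详解))
+Run the deployment tool and press Enter at the prompt to complete the installation and deployment of the server. The quick deployment tool currently supports only Linux; for other environments, see the development guide ([click here](./SDK_advanced_guide_online_zh.md))
 ```shell
 sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
 ```
funasr/runtime/docs/websocket_protocol.md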
New file
@@ -0,0 +1,88 @@
([简体中文](./websocket_protocol_zh.md)|English)
# WebSocket/gRPC Communication Protocol
## Offline File Transcription
### Sending Data from Client to Server
#### Message Format
Configuration parameters and meta information are in JSON format, while audio data is in bytes.
#### Initial Communication
The message (which needs to be serialized in JSON) is:
```text
{"mode": "offline", "wav_name": "wav_name", "is_speaking": True,"wav_format":"pcm"}
```
Parameter explanation:
```text
`mode`: `offline`, indicating the inference mode for offline file transcription
`wav_name`: the name of the audio file to be transcribed
`wav_format`: the audio and video file extension, such as pcm, mp3, mp4, etc.
`is_speaking`: False indicates the end of a sentence, such as a VAD segmentation point or the end of a WAV file
`audio_fs`: when the input audio is in PCM format, the audio sampling rate parameter needs to be added
```
#### Sending Audio Data
For PCM format, directly send the audio data. For other audio formats, send the header information and audio and video bytes data together. Multiple sampling rates and audio and video formats are supported.
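As a reading aid for the client-side messages described above, here is a minimal sketch of how the initial offline message could be built and serialized in Python. The helper names `build_offline_handshake` and `split_audio`, the 3200-byte chunk size, and the sample file name are assumptions made for illustration; they do not come from this change, and the protocol text does not say whether audio has to be split into chunks at all.

```python
import json


def build_offline_handshake(wav_name, wav_format="pcm", audio_fs=None):
    """Return the JSON text for the first message of an offline session.

    Hypothetical helper: only the field names are taken from the protocol text.
    """
    message = {
        "mode": "offline",
        "wav_name": wav_name,
        "is_speaking": True,
        "wav_format": wav_format,
    }
    # The protocol text asks for audio_fs when the input is raw PCM.
    if audio_fs is not None:
        message["audio_fs"] = audio_fs
    # json.dumps writes Python's True as JSON true.
    return json.dumps(message)


def split_audio(audio_bytes, chunk_bytes=3200):
    """Yield consecutive byte slices of at most chunk_bytes each (assumed size)."""
    for start in range(0, len(audio_bytes), chunk_bytes):
        yield audio_bytes[start:start + chunk_bytes]


handshake = build_offline_handshake("demo", audio_fs=16000)
chunks = list(split_audio(b"\x00" * 8000))
```

The JSON text would go out as a text frame and each slice as a binary frame over whichever WebSocket client is in use; that transport step is left out here because it depends on the client library.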
#### Sending End of Audio Flag
After sending the audio data, an end-of-audio flag needs to be sent (which needs to be serialized in JSON):
```text
{"is_speaking": False}
```
### Sending Data from Server to Client
#### Sending Recognition Results
The message (serialized in JSON) is:
```text
{"mode": "offline", "wav_name": "wav_name", "text": "asr outputs", "is_final": True}
```
Parameter explanation:
```text
`mode`: `offline`, indicating the inference mode for offline file transcription
`wav_name`: the name of the audio file to be transcribed
`text`: the text output of speech recognition
`is_final`: indicating the end of recognition
```
## Real-time Speech Recognition
### System Architecture Diagram
<div align="left"><img src="images/2pass.jpg" width="400"/></div>

### Sending Data from Client to Server
#### Message Format
Configuration parameters and meta information are in JSON format, while audio data is in bytes.
#### Initial Communication
The message (which needs to be serialized in JSON) is:
```text
{"mode": "2pass", "wav_name": "wav_name", "is_speaking": True, "wav_format":"pcm", "chunk_size":[5,10,5]}
```
Parameter explanation:
```text
`mode`: `offline` indicates the inference mode for single-sentence recognition; `online` indicates the inference mode for real-time speech recognition; `2pass` indicates real-time speech recognition and offline model correction for sentence endings.
`wav_name`: the name of the audio file to be transcribed
`wav_format`: the audio and video file extension, such as pcm, mp3, mp4, etc. (Note: only PCM audio streams are supported in version 1.0)
`is_speaking`: False indicates the end of a sentence, such as a VAD segmentation point or the end of a WAV file
`chunk_size`: indicates the latency configuration of the streaming model, `[5,10,5]` indicates that the current audio is 600ms long, with a 300ms look-ahead and look-back time.
`audio_fs`: when the input audio is in PCM format, the audio sampling rate parameter needs to be added
```
#### Sending Audio Data
Directly send the audio data, removing the header information and sending only the bytes data. Supported audio sampling rates are 8000 (which needs to be specified as audio_fs in message), and 16000.
#### Sending End of Audio Flag
After sending the audio data, an end-of-audio flag needs to be sent (which needs to be serialized in JSON):
```text
{"is_speaking": False}
```
### Sending Data from Server to Client
#### Sending Recognition Results
The message (serialized in JSON) is:
```text
{"mode": "2pass-online", "wav_name": "wav_name", "text": "asr outputs", "is_final": True}
```
Parameter explanation:
```text
`mode`: indicates the inference mode, divided into `2pass-online` for real-time recognition results and `2pass-offline` for 2-pass corrected recognition results.
`wav_name`: the name of the audio file to be transcribed
`text`: the text output of speech recognition
`is_final`: indicating the end of recognition
```
funasr/runtime/docs/websocket_protocol_zh.md
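@@ -1,3 +1,4 @@
+(Simplified Chinese|[English](./websocket_protocol.md))
 # websocket/grpc communication protocol
 ## Offline file transcription
 ### Sending data from client to server
@@ -64,7 +65,7 @@
 `audio_fs`: when the input audio is PCM data, the audio sampling rate parameter must be added
 ```
 #### Sending audio data
-Send the audio data directly as bytes with the header information removed; supported audio sampling rates are 80000 and 16000
+Send the audio data directly as bytes with the header information removed; supported audio sampling rates are 8000 (`audio_fs` must be set to 8000 in `message`) and 16000
 #### Sending the end flag
 After the audio data has been sent, an end flag must be sent (serialized as JSON):
 ```text
funasr/runtime/python/onnxruntime/funasr_onnx/utils/utils.py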
@@ -9,8 +9,11 @@
 import re
 import numpy as np
 import yaml
-from onnxruntime import (GraphOptimizationLevel, InferenceSession, SessionOptions, get_available_providers, get_device)
+try:
+    from onnxruntime import (GraphOptimizationLevel, InferenceSession, SessionOptions, get_available_providers, get_device)
+except:
+    print("please pip3 install onnxruntime")
 import jieba
 import warnings
funasr/runtime/python/onnxruntime/setup.py
@@ -13,7 +13,7 @@
 MODULE_NAME = 'funasr_onnx'
-VERSION_NUM = '0.1.2'
+VERSION_NUM = '0.2.0'
 setuptools.setup(
     name=MODULE_NAME,
funasr/runtime/readme.md
@@ -1,4 +1,4 @@
-# FunASR runtime-SDK
+# FunASR Runtime Roadmap
 Chinese documentation ([click here](./readme_cn.md))
 FunASR is a speech recognition framework developed by the Speech Lab of DAMO Academy, which integrates industrial-level models in the fields of speech endpoint detection, speech recognition, punctuation segmentation, and more.
funasr/runtime/readme_cn.md
@@ -1,4 +1,4 @@
-# FunASR runtime-SDK
+# FunASR Software Package Roadmap
 English Version([docs](./readme.md))