python/FunASR-XL.git

parent: 453b4f36

Merge branch 'main' of https://github.com/alibaba-damo-academy/FunASR into ...

雾聪

2023-08-09 3aea0e27ec912e2f761d11c75e84ad6e5f1a2b29

Merge branch 'main' of https://github.com/alibaba-damo-academy/FunASR into main

11 files modified

2 files added

	Acknowledge	1
	README.md	4
	README_zh.md	4
	docs/images/XVERSE.png
	docs/index.rst	4
	funasr/runtime/docs/SDK_tutorial_online.md	2
	funasr/runtime/docs/SDK_tutorial_online_zh.md	2
	funasr/runtime/docs/websocket_protocol.md	88
	funasr/runtime/docs/websocket_protocol_zh.md	3
	funasr/runtime/python/onnxruntime/funasr_onnx/utils/utils.py	7
	funasr/runtime/python/onnxruntime/setup.py	2
	funasr/runtime/readme.md	2
	funasr/runtime/readme_cn.md	2

 Acknowledge

@@ -6,3 +6,4 @@
4. We acknowledge [ChinaTelecom](https://github.com/zhuzizyf/damo-fsmn-vad-infer-httpserver) for contributing the VAD runtime.
5. We acknowledge [RapidAI](https://github.com/RapidAI) for contributing the Paraformer and CT_Transformer-punc runtime.
6. We acknowledge [AiHealthx](http://www.aihealthx.com/) for contributing the websocket service and html5.
7. We acknowledge [XVERSE](http://www.xverse.cn/index.html) for contributing the grpc service.

 README.md

@@ -63,8 +63,8 @@

## Contributors

| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div>  | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> |
|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|
| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div>  | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> | <img src="docs/images/XVERSE.png" width="250"/> </div> |
|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|

The contributors can be found in the [contributors list](./Acknowledge)


 README_zh.md

@@ -60,8 +60,8 @@

## Community Contributors

| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div>  | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> |
|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|
| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div>  | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> | <img src="docs/images/XVERSE.png" width="250"/> </div> |
|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|

For the list of contributors, see the [acknowledgment list](./Acknowledge)


 docs/images/XVERSE.png


 docs/index.rst

@@ -71,11 +71,9 @@
   :maxdepth: 1
   :caption: Runtime and Service


   ./funasr/runtime/readme.md
   ./funasr/runtime/docs/SDK_tutorial_online.md
   ./funasr/runtime/docs/SDK_tutorial.md
   ./funasr/runtime/python/websocket/README.md
   ./funasr/runtime/websocket/readme.md
   ./funasr/runtime/html5/readme.md



 funasr/runtime/docs/SDK_tutorial_online.md

@@ -29,7 +29,7 @@
# curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-en.sh;
```

Execute the deployment tool and press the Enter key at the prompt to complete the installation and deployment of the server. Currently, the convenient deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_offline.md)).
Execute the deployment tool and press the Enter key at the prompt to complete the installation and deployment of the server. Currently, the convenient deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_online.md)).
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
```

 funasr/runtime/docs/SDK_tutorial_online_zh.md

@@ -30,7 +30,7 @@
# curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-zh.sh;
```

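Run the deployment tool and press Enter at the prompt to complete the installation and deployment of the server. The convenient deployment tool currently supports only Linux environments; for other environments, refer to the development guide ([click here](#客户端用法详解))
Run the deployment tool and press Enter at the prompt to complete the installation and deployment of the server. The convenient deployment tool currently supports only Linux environments; for other environments, refer to the development guide ([click here](./SDK_advanced_guide_online_zh.md))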
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
```

 funasr/runtime/docs/websocket_protocol.md

New file
@@ -0,0 +1,88 @@
([简体中文](./websocket_protocol_zh.md)|English)

# WebSocket/gRPC Communication Protocol
## Offline File Transcription
### Sending Data from Client to Server
#### Message Format
Configuration parameters and meta information are in JSON format, while audio data is in bytes.
#### Initial Communication
The message (which needs to be serialized in JSON) is:
```text
{"mode": "offline", "wav_name": "wav_name", "is_speaking": True,"wav_format":"pcm"}
```
Parameter explanation:
```text
`mode`: `offline`, indicating the inference mode for offline file transcription
`wav_name`: the name of the audio file to be transcribed
`wav_format`: the audio and video file extension, such as pcm, mp3, mp4, etc.
`is_speaking`: False indicates the end of a sentence, such as a VAD segmentation point or the end of a WAV file
`audio_fs`: when the input audio is in PCM format, the audio sampling rate parameter needs to be added
```
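The initial message above can be sketched in Python as follows. This is a minimal illustration, not the project's client code: the helper name `build_offline_handshake` and the file name `demo.wav` are hypothetical, and only the field names come from the parameter list above. Note that `json.dumps` turns the Python literal `True` into the JSON literal `true`.

```python
import json


def build_offline_handshake(wav_name, wav_format="pcm", audio_fs=None):
    """Build the first JSON message for offline transcription (hypothetical helper)."""
    message = {
        "mode": "offline",
        "wav_name": wav_name,
        "is_speaking": True,
        "wav_format": wav_format,
    }
    # The protocol text says audio_fs is needed when the input is PCM.
    if wav_format == "pcm" and audio_fs is not None:
        message["audio_fs"] = audio_fs
    return json.dumps(message)


handshake = build_offline_handshake("demo.wav", wav_format="pcm", audio_fs=16000)
print(handshake)
```

The returned string is what a client would send as a text frame before any audio bytes.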

#### Sending Audio Data
For PCM format, directly send the audio data. For other audio formats, send the header information and audio and video bytes data together. Multiple sampling rates and audio and video formats are supported.

#### Sending End of Audio Flag
After sending the audio data, an end-of-audio flag needs to be sent (which needs to be serialized in JSON):
```text
{"is_speaking": False}
```
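Taken together, the client side of an offline session is a fixed sequence: one JSON text message, then binary audio, then the JSON end flag. The sketch below only produces that sequence of frames and leaves the transport to whichever WebSocket or gRPC client is in use. The function name `frame_sequence` and the 3200-byte slice size are assumptions for illustration; the protocol text does not prescribe a slice size.

```python
import json


def frame_sequence(handshake, audio_bytes, chunk_bytes=3200):
    """Yield the frames of one offline session in order (hypothetical helper).

    str items are JSON text frames; bytes items are binary audio frames.
    """
    yield json.dumps(handshake)
    # Slice the audio so that no single frame is larger than chunk_bytes.
    for start in range(0, len(audio_bytes), chunk_bytes):
        yield audio_bytes[start:start + chunk_bytes]
    # End-of-audio flag, serialized as JSON.
    yield json.dumps({"is_speaking": False})


handshake = {
    "mode": "offline",
    "wav_name": "demo.wav",
    "is_speaking": True,
    "wav_format": "pcm",
    "audio_fs": 16000,
}
frames = list(frame_sequence(handshake, bytes(7000)))
```

A caller would iterate over `frames` and send each `str` as a text frame and each `bytes` object as a binary frame.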

### Sending Data from Server to Client
#### Sending Recognition Results
The message (serialized in JSON) is:
```text
{"mode": "offline", "wav_name": "wav_name", "text": "asr outputs", "is_final": True}
```
Parameter explanation:
```text
`mode`: `offline`, indicating the inference mode for offline file transcription
`wav_name`: the name of the audio file to be transcribed
`text`: the text output of speech recognition
`is_final`: indicating the end of recognition
```
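On the receiving side, a client can decode the result message into a small typed record. The sketch below assumes the server sends exactly the four fields listed above; the names `RecognitionResult` and `parse_result` are hypothetical and not part of FunASR.

```python
import json
from typing import NamedTuple


class RecognitionResult(NamedTuple):
    mode: str
    wav_name: str
    text: str
    is_final: bool


def parse_result(raw):
    """Decode one JSON result message from the server (hypothetical helper)."""
    data = json.loads(raw)
    return RecognitionResult(
        mode=data["mode"],
        wav_name=data["wav_name"],
        text=data["text"],
        is_final=bool(data["is_final"]),
    )


result = parse_result(
    '{"mode": "offline", "wav_name": "demo.wav", "text": "hello world", "is_final": true}'
)
```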

## Real-time Speech Recognition
### System Architecture Diagram

<div align="left"><img src="images/2pass.jpg" width="400"/></div>

### Sending Data from Client to Server
#### Message Format
Configuration parameters and meta information are in JSON format, while audio data is in bytes.

#### Initial Communication
The message (which needs to be serialized in JSON) is:
```text
{"mode": "2pass", "wav_name": "wav_name", "is_speaking": True, "wav_format":"pcm", "chunk_size":[5,10,5]}
```
Parameter explanation:
```text
`mode`: `offline` indicates the inference mode for single-sentence recognition; `online` indicates the inference mode for real-time speech recognition; `2pass` indicates real-time speech recognition and offline model correction for sentence endings.
`wav_name`: the name of the audio file to be transcribed
`wav_format`: the audio and video file extension, such as pcm, mp3, mp4, etc. (Note: only PCM audio streams are supported in version 1.0)
`is_speaking`: False indicates the end of a sentence, such as a VAD segmentation point or the end of a WAV file
`chunk_size`: the latency configuration of the streaming model; `[5,10,5]` means that the current audio chunk is 600ms long, with 300ms each of look-ahead and look-back.
`audio_fs`: when the input audio is in PCM format, the audio sampling rate parameter needs to be added
```
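The `chunk_size` figures above imply that one unit corresponds to 60 ms (10 units for 600 ms, 5 units for 300 ms). The sketch below works through that arithmetic. The 60 ms unit is inferred from the example rather than stated in the protocol, the ordering `[look-back, current, look-ahead]` is an assumption, and the byte count further assumes 16-bit mono PCM; the function names are hypothetical.

```python
UNIT_MS = 60  # inferred from the example: 10 units -> 600 ms, 5 units -> 300 ms


def chunk_latency_ms(chunk_size):
    """Convert a chunk_size triple into milliseconds (ordering is an assumption)."""
    look_back, current, look_ahead = chunk_size
    return {
        "look_back_ms": look_back * UNIT_MS,
        "current_ms": current * UNIT_MS,
        "look_ahead_ms": look_ahead * UNIT_MS,
    }


def pcm_bytes_per_chunk(chunk_size, audio_fs=16000, bytes_per_sample=2):
    """Bytes of PCM audio covering one current chunk (16-bit mono assumed)."""
    current_ms = chunk_size[1] * UNIT_MS
    return audio_fs * current_ms // 1000 * bytes_per_sample


latency = chunk_latency_ms([5, 10, 5])
bytes_16k = pcm_bytes_per_chunk([5, 10, 5], audio_fs=16000)
bytes_8k = pcm_bytes_per_chunk([5, 10, 5], audio_fs=8000)
```

Under these assumptions, a client streaming 16 kHz audio would send 19200 bytes per 600 ms chunk, and half that at 8 kHz.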
#### Sending Audio Data
Send the raw audio bytes with the header information removed. Supported audio sampling rates are 8000 (which must be specified as `audio_fs` in the message) and 16000.
#### Sending End of Audio Flag
After sending the audio data, an end-of-audio flag needs to be sent (which needs to be serialized in JSON):
```text
{"is_speaking": False}
```
### Sending Data from Server to Client
#### Sending Recognition Results
The message (serialized in JSON) is:

```text
{"mode": "2pass-online", "wav_name": "wav_name", "text": "asr outputs", "is_final": True}
```
Parameter explanation:
```text
`mode`: indicates the inference mode, divided into `2pass-online` for real-time recognition results and `2pass-offline` for 2-pass corrected recognition results.
`wav_name`: the name of the audio file to be transcribed
`text`: the text output of speech recognition
`is_final`: indicating the end of recognition
```
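In `2pass` mode a client receives two kinds of results and has to decide how to combine them. One possible approach is sketched below: `2pass-online` text is shown provisionally and is replaced once the matching `2pass-offline` correction arrives. This is an assumption about client behaviour, not something the protocol mandates. In particular, the sketch assumes online results are incremental fragments to be appended, and the class name `TwoPassTranscript` is hypothetical.

```python
import json


class TwoPassTranscript:
    """Merge 2pass-online and 2pass-offline results into one display string."""

    def __init__(self):
        self.confirmed = ""  # text already corrected by the offline pass
        self.pending = ""    # provisional text from the online pass

    def update(self, raw):
        data = json.loads(raw)
        mode = data.get("mode", "")
        text = data.get("text", "")
        if mode == "2pass-offline":
            # The corrected sentence replaces the provisional text.
            self.confirmed += text
            self.pending = ""
        elif mode == "2pass-online":
            self.pending += text
        return self.confirmed + self.pending


transcript = TwoPassTranscript()
first = transcript.update('{"mode": "2pass-online", "wav_name": "mic", "text": "hello", "is_final": false}')
second = transcript.update('{"mode": "2pass-online", "wav_name": "mic", "text": " wrld", "is_final": false}')
third = transcript.update('{"mode": "2pass-offline", "wav_name": "mic", "text": "hello world.", "is_final": false}')
```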

 funasr/runtime/docs/websocket_protocol_zh.md

@@ -1,3 +1,4 @@
(简体中文|[English](./websocket_protocol.md))
# websocket/grpc Communication Protocol
## Offline File Transcription
### Sending Data from Client to Server
@@ -64,7 +65,7 @@
`audio_fs`: when the input audio is PCM data, the audio sampling rate parameter must be added
```
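#### Sending Audio Data
Send the audio data directly as bytes with the header information removed; supported audio sampling rates are 80000 and 16000
Send the audio data directly as bytes with the header information removed; supported audio sampling rates are 8000 (`audio_fs` must be set to 8000 in the `message`) and 16000
#### Sending End Flag
After the audio data has been sent, an end flag must be sent (serialized as JSON):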
```text

 funasr/runtime/python/onnxruntime/funasr_onnx/utils/utils.py

@@ -9,8 +9,11 @@
import re
import numpy as np
import yaml
from onnxruntime import (GraphOptimizationLevel, InferenceSession,
                         SessionOptions, get_available_providers, get_device)
try:
    from onnxruntime import (GraphOptimizationLevel, InferenceSession,
                             SessionOptions, get_available_providers, get_device)
except:
    print("please pip3 install onnxruntime")
import jieba
import warnings


 funasr/runtime/python/onnxruntime/setup.py

@@ -13,7 +13,7 @@


MODULE_NAME = 'funasr_onnx'
VERSION_NUM = '0.1.2'
VERSION_NUM = '0.2.0'

setuptools.setup(
    name=MODULE_NAME,

 funasr/runtime/readme.md

@@ -1,4 +1,4 @@
# FunASR runtime-SDK
# FunASR Runtime Roadmap
Chinese documentation ([click here](./readme_cn.md))

FunASR is a speech recognition framework developed by the Speech Lab of DAMO Academy, which integrates industrial-level models in the fields of speech endpoint detection, speech recognition, punctuation segmentation, and more. 

 funasr/runtime/readme_cn.md

@@ -1,4 +1,4 @@
# FunASR runtime-SDK
# FunASR Software Package Roadmap

English Version（[docs](./readme.md)）
