From 8e2b7a67b967e65456a56522942d4a7d259eeb94 Mon Sep 17 00:00:00 2001
From: 游雁 <zhifu.gzf@alibaba-inc.com>
Date: Wed, 09 Aug 2023 10:55:34 +0800
Subject: [PATCH] docs

---
 funasr/runtime/docs/SDK_tutorial_online.md    |    2 
 funasr/runtime/docs/websocket_protocol_zh.md  |    1 
 funasr/runtime/docs/SDK_tutorial_online_zh.md |    2 
 funasr/runtime/docs/websocket_protocol.md     |   88 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+), 2 deletions(-)
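
Reviewer note on the client-to-server sequence documented in websocket_protocol.md below: a minimal Python sketch of the three steps (initial JSON message, raw audio bytes, end-of-audio flag). The helper names and the `frame_bytes` slice size are hypothetical and not part of FunASR. The sketch only builds the frames and does not open a connection. It also shows that Python's `True`/`False` in the documented examples become `true`/`false` once serialized to JSON.

```python
import json


def initial_message(mode, wav_name, wav_format="pcm", audio_fs=None, chunk_size=None):
    """Build the first text frame; field names are taken from the protocol doc."""
    msg = {
        "mode": mode,
        "wav_name": wav_name,
        "is_speaking": True,
        "wav_format": wav_format,
    }
    if audio_fs is not None:      # sampling rate, documented for PCM input
        msg["audio_fs"] = audio_fs
    if chunk_size is not None:    # streaming latency configuration, e.g. [5, 10, 5]
        msg["chunk_size"] = list(chunk_size)
    return json.dumps(msg)


def end_message():
    """Build the end-of-audio flag."""
    return json.dumps({"is_speaking": False})


def client_frames(audio, frame_bytes=3200, **config):
    """Yield frames in protocol order: JSON config, audio bytes, JSON end flag.

    frame_bytes is an arbitrary slice size chosen for this sketch.
    """
    yield initial_message(**config)
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]
    yield end_message()


def parse_result(frame):
    """Decode a server reply into (mode, text, is_final)."""
    reply = json.loads(frame)
    return reply["mode"], reply["text"], reply.get("is_final", False)


frames = list(client_frames(b"\x00" * 7000, mode="2pass", wav_name="demo",
                            audio_fs=16000, chunk_size=[5, 10, 5]))
```

In a real client, the `str` frames would go out as WebSocket text messages and the `bytes` frames as binary messages.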

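Reviewer note on `chunk_size` as documented in websocket_protocol.md below: the doc states that `[5,10,5]` corresponds to 600 ms of current audio with 300 ms of look-ahead and look-back. The sketch below assumes each unit is 60 ms, which is inferred only from that single example. It also assumes 16-bit mono PCM when converting a duration to a byte count. Neither assumption is stated in the patch.

```python
FRAME_MS = 60  # assumption: inferred from [5,10,5] mapping to 300/600/300 ms


def chunk_latency_ms(chunk_size, frame_ms=FRAME_MS):
    """Convert each chunk_size entry to milliseconds, keeping the same order."""
    return tuple(n * frame_ms for n in chunk_size)


def pcm_bytes_for_ms(duration_ms, sample_rate, sample_width=2, channels=1):
    """Bytes of raw PCM covering duration_ms (16-bit mono assumed by default)."""
    samples = sample_rate * duration_ms // 1000
    return samples * sample_width * channels


latency = chunk_latency_ms([5, 10, 5])
bytes_16k = pcm_bytes_for_ms(latency[1], 16000)
bytes_8k = pcm_bytes_for_ms(latency[1], 8000)
```

Under these assumptions, one 600 ms chunk is 19200 bytes at 16000 Hz and 9600 bytes at 8000 Hz.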
diff --git a/funasr/runtime/docs/SDK_tutorial_online.md b/funasr/runtime/docs/SDK_tutorial_online.md
index b4800bb..bc02176 100644
--- a/funasr/runtime/docs/SDK_tutorial_online.md
+++ b/funasr/runtime/docs/SDK_tutorial_online.md
@@ -29,7 +29,7 @@
 # curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-en.sh;
 ```
 
-Execute the deployment tool and press the Enter key at the prompt to complete the installation and deployment of the server. Currently, the convenient deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_offline.md)).
+Execute the deployment tool and press the Enter key at the prompt to complete the installation and deployment of the server. Currently, the convenient deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_online.md)).
 ```shell
 sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
 ```
diff --git a/funasr/runtime/docs/SDK_tutorial_online_zh.md b/funasr/runtime/docs/SDK_tutorial_online_zh.md
index 89ba6f3..159b0d7 100644
--- a/funasr/runtime/docs/SDK_tutorial_online_zh.md
+++ b/funasr/runtime/docs/SDK_tutorial_online_zh.md
@@ -30,7 +30,7 @@
 # curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-zh.sh;
 ```
 
-鎵ц閮ㄧ讲宸ュ叿锛屽湪鎻愮ず澶勮緭鍏ュ洖杞﹂敭鍗冲彲瀹屾垚鏈嶅姟绔畨瑁呬笌閮ㄧ讲銆傜洰鍓嶄究鎹烽儴缃插伐鍏锋殏鏃朵粎鏀寔Linux鐜锛屽叾浠栫幆澧冮儴缃插弬鑰冨紑鍙戞寚鍗楋紙[鐐瑰嚮姝ゅ](#瀹㈡埛绔敤娉曡瑙�)锛�
+鎵ц閮ㄧ讲宸ュ叿锛屽湪鎻愮ず澶勮緭鍏ュ洖杞﹂敭鍗冲彲瀹屾垚鏈嶅姟绔畨瑁呬笌閮ㄧ讲銆傜洰鍓嶄究鎹烽儴缃插伐鍏锋殏鏃朵粎鏀寔Linux鐜锛屽叾浠栫幆澧冮儴缃插弬鑰冨紑鍙戞寚鍗楋紙[鐐瑰嚮姝ゅ](./SDK_advanced_guide_online_zh.md)锛�
 ```shell
 sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
 ```
diff --git a/funasr/runtime/docs/websocket_protocol.md b/funasr/runtime/docs/websocket_protocol.md
new file mode 100644
index 0000000..94823a2
--- /dev/null
+++ b/funasr/runtime/docs/websocket_protocol.md
@@ -0,0 +1,88 @@
+([简体中文](./websocket_protocol_zh.md)|English)
+
+# WebSocket/gRPC Communication Protocol
+## Offline File Transcription
+### Sending Data from Client to Server
+#### Message Format
+Configuration parameters and meta information are in JSON format, while audio data is in bytes.
+#### Initial Communication
+The message (which needs to be serialized in JSON) is:
+```text
+{"mode": "offline", "wav_name": "wav_name", "is_speaking": True, "wav_format": "pcm"}
+```
+Parameter explanation:
+```text
+`mode`: `offline`, indicating the inference mode for offline file transcription
+`wav_name`: the name of the audio file to be transcribed
+`wav_format`: the audio and video file extension, such as pcm, mp3, mp4, etc.
+`is_speaking`: False indicates the end of a sentence, such as a VAD segmentation point or the end of a WAV file
+`audio_fs`: the audio sampling rate; this parameter must be included when the input audio is in PCM format
+```
+
+#### Sending Audio Data
+For PCM format, send the audio data directly. For other formats, send the header information together with the audio/video bytes. Multiple sampling rates and audio/video formats are supported.
+
+#### Sending End of Audio Flag
+After sending the audio data, an end-of-audio flag needs to be sent (which needs to be serialized in JSON):
+```text
+{"is_speaking": False}
+```
+
+### Sending Data from Server to Client
+#### Sending Recognition Results
+The message (serialized in JSON) is:
+```text
+{"mode": "offline", "wav_name": "wav_name", "text": "asr outputs", "is_final": True}
+```
+Parameter explanation:
+```text
+`mode`: `offline`, indicating the inference mode for offline file transcription
+`wav_name`: the name of the audio file to be transcribed
+`text`: the text output of speech recognition
+`is_final`: indicates the end of recognition
+```
+
+## Real-time Speech Recognition
+### System Architecture Diagram
+
+<div align="left"><img src="images/2pass.jpg" width="400"/></div>
+
+### Sending Data from Client to Server
+#### Message Format
+Configuration parameters and meta information are in JSON format, while audio data is in bytes.
+
+#### Initial Communication
+The message (which needs to be serialized in JSON) is:
+```text
+{"mode": "2pass", "wav_name": "wav_name", "is_speaking": True, "wav_format": "pcm", "chunk_size": [5,10,5]}
+```
+Parameter explanation:
+```text
+`mode`: `offline` indicates the inference mode for single-sentence recognition; `online` indicates the inference mode for real-time speech recognition; `2pass` indicates real-time speech recognition and offline model correction for sentence endings.
+`wav_name`: the name of the audio file to be transcribed
+`wav_format`: the audio and video file extension, such as pcm, mp3, mp4, etc. (Note: only PCM audio streams are supported in version 1.0)
+`is_speaking`: False indicates the end of a sentence, such as a VAD segmentation point or the end of a WAV file
+`chunk_size`: the latency configuration of the streaming model; `[5,10,5]` means the current audio chunk is 600 ms long, with 300 ms each of look-ahead and look-back
+`audio_fs`: the audio sampling rate; this parameter must be included when the input audio is in PCM format
+```
+#### Sending Audio Data
+Send the raw audio bytes directly, with any header information removed. Supported sampling rates are 8000 Hz (which must be specified as `audio_fs` in the message) and 16000 Hz.
+#### Sending End of Audio Flag
+After sending the audio data, an end-of-audio flag needs to be sent (which needs to be serialized in JSON):
+```text
+{"is_speaking": False}
+```
+### Sending Data from Server to Client
+#### Sending Recognition Results
+The message (serialized in JSON) is:
+
+```text
+{"mode": "2pass-online", "wav_name": "wav_name", "text": "asr outputs", "is_final": True}
+```
+Parameter explanation:
+```text
+`mode`: indicates the inference mode, divided into `2pass-online` for real-time recognition results and `2pass-offline` for 2-pass corrected recognition results.
+`wav_name`: the name of the audio file to be transcribed
+`text`: the text output of speech recognition
+`is_final`: indicates the end of recognition
+```
\ No newline at end of file
diff --git a/funasr/runtime/docs/websocket_protocol_zh.md b/funasr/runtime/docs/websocket_protocol_zh.md
index b8ac7c3..38ab3c4 100644
--- a/funasr/runtime/docs/websocket_protocol_zh.md
+++ b/funasr/runtime/docs/websocket_protocol_zh.md
@@ -1,3 +1,4 @@
+(简体中文|[English](./websocket_protocol.md))
 # websocket/grpc通信协议
 ## 离线文件转写
 ### 从客户端往服务端发送数据

--
Gitblit v1.9.1