From c2e4e3c2e9be855277d9f4fa9cd0544892ff829a Mon Sep 17 00:00:00 2001
From: 游雁 <zhifu.gzf@alibaba-inc.com>
Date: Wed, 30 Aug 2023 09:57:30 +0800
Subject: [PATCH] Merge branch 'main' of github.com:alibaba-damo-academy/FunASR add

---
 funasr/runtime/docs/websocket_protocol.md |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/funasr/runtime/docs/websocket_protocol.md b/funasr/runtime/docs/websocket_protocol.md
index 94823a2..5521935 100644
--- a/funasr/runtime/docs/websocket_protocol.md
+++ b/funasr/runtime/docs/websocket_protocol.md
@@ -1,6 +1,8 @@
 ([简体中文](./websocket_protocol_zh.md)|English)
 
 # WebSocket/gRPC Communication Protocol
+This protocol is the communication protocol for the FunASR software package, which includes offline file transcription ([deployment document](./SDK_tutorial.md)) and real-time speech recognition ([deployment document](./SDK_tutorial_online.md)).
+
 ## Offline File Transcription
 ### Sending Data from Client to Server
 #### Message Format
@@ -8,7 +10,7 @@
 #### Initial Communication
 The message (which needs to be serialized in JSON) is:
 ```text
-{"mode": "offline", "wav_name": "wav_name", "is_speaking": True,"wav_format":"pcm"}
+{"mode": "offline", "wav_name": "wav_name", "wav_format": "pcm", "is_speaking": True, "hotwords": "阿里巴巴 达摩院 阿里云"}
 ```
 Parameter explanation:
 ```text
@@ -17,6 +19,7 @@
 `wav_format`: the audio and video file extension, such as pcm, mp3, mp4, etc.
 `is_speaking`: False indicates the end of a sentence, such as a VAD segmentation point or the end of a WAV file
 `audio_fs`: when the input audio is in PCM format, the audio sampling rate parameter needs to be added
+`hotwords`: if the AM is a hotword model, hotword data needs to be sent to the server as a string, with " " used as the separator between hotwords, e.g. "阿里巴巴 达摩院 阿里云"
 ```
 
 #### Sending Audio Data
@@ -32,7 +35,7 @@
 #### Sending Recognition Results
 The message (serialized in JSON) is:
 ```text
-{"mode": "offline", "wav_name": "wav_name", "text": "asr ouputs", "is_final": True}
+{"mode": "offline", "wav_name": "wav_name", "text": "asr outputs", "is_final": True, "timestamp": "[[100,200], [200,500]]"}
 ```
 Parameter explanation:
 ```text
@@ -40,12 +43,13 @@
 `wav_name`: the name of the audio file to be transcribed
 `text`: the text output of speech recognition
 `is_final`: indicating the end of recognition
+`timestamp`: if the AM is a timestamp model, this field is returned, giving the timestamps in the format "[[100,200], [200,500]]"
 ```
 
 ## Real-time Speech Recognition
 ### System Architecture Diagram
 
-<div align="left"><img src="images/2pass.jpg" width="400"/></div>
+<div align="left"><img src="images/2pass.jpg" width="600"/></div>
 
 ### Sending Data from Client to Server
 #### Message Format
@@ -54,7 +58,7 @@
 #### Initial Communication
 The message (which needs to be serialized in JSON) is:
 ```text
-{"mode": "2pass", "wav_name": "wav_name", "is_speaking": True, "wav_format":"pcm", "chunk_size":[5,10,5]
+{"mode": "2pass", "wav_name": "wav_name", "is_speaking": True, "wav_format":"pcm", "chunk_size":[5,10,5]}
 ```
 Parameter explanation:
 ```text
@@ -85,4 +89,4 @@
 `wav_name`: the name of the audio file to be transcribed
 `text`: the text output of speech recognition
 `is_final`: indicating the end of recognition
-```
\ No newline at end of file
+```

--
Gitblit v1.9.1
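As a sanity check on the offline handshake this patch documents, the client-side messages can be sketched as plain JSON construction (a minimal sketch: the helper names are hypothetical, the WebSocket transport itself is omitted, and `json.dumps` emits the JSON literal `true` rather than the Python-style `True` shown in the doc examples):

```python
import json


def build_offline_init(wav_name, hotwords=None):
    """Build the initial offline-mode message sent by the client.

    `hotwords` is a single string with " " separating the hotwords,
    as the protocol requires, e.g. "阿里巴巴 达摩院 阿里云".
    """
    msg = {
        "mode": "offline",
        "wav_name": wav_name,
        "wav_format": "pcm",
        "is_speaking": True,
    }
    if hotwords:
        msg["hotwords"] = hotwords
    return json.dumps(msg, ensure_ascii=False)


def build_end_of_audio():
    # is_speaking: False tells the server the audio stream has ended.
    return json.dumps({"is_speaking": False})
```

The raw PCM bytes would be sent as binary WebSocket frames between these two JSON messages.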