From 9a0bc00e5fb2f892987216eafca8aeb140e17c6c Mon Sep 17 00:00:00 2001
From: shixian.shi <shixian.shi@alibaba-inc.com>
Date: Fri, 13 Oct 2023 15:14:00 +0800
Subject: [PATCH] update docs and readme

---
 docs/model_zoo/modelscope_models.md      |    3 ++-
 egs_modelscope/asr/TEMPLATE/README.md    |   22 ++++++++++++++++++++++
 egs_modelscope/asr/TEMPLATE/README_zh.md |   23 +++++++++++++++++++++++
 README_zh.md                             |    3 ++-
 docs/model_zoo/modelscope_models_zh.md   |    3 ++-
 egs_modelscope/asr_vad_spk/TEMPLATE      |    1 +
 README.md                                |    1 +
 7 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 0a8ee83..1d083c1 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,7 @@
 
 <a name="whats-new"></a>
 ## What's new: 
+- 2023/10/10: The combined ASR and speaker diarization pipeline [speech_campplus_speaker-diarization_common](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py) has been released. Try the model to get recognition results with speaker information.
 - 2023/10/07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec.
 - 2023/09/01: The offline file transcription service 2.0 (CPU) of Mandarin has been released, with added support for ffmpeg, timestamp, and hotword models. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial.md)).
 - 2023/08/07: The real-time transcription service (CPU) of Mandarin has been released. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_online.md)).
diff --git a/README_zh.md b/README_zh.md
index 29fa307..5ff9fce 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -31,8 +31,9 @@
 
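 <a name="最新动态"></a>
 ## What's new
+- 2023.10.10: The [Paraformer-long-Spk](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py) model has been released; it supports obtaining a speaker label for each sentence on top of long-audio speech recognition.
 - 2023.10.07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): FunCodec provides open-source models and training tools that can be used for discrete audio coding, as well as for tasks built on discrete codes such as speech recognition and speech synthesis.
-- 2023.09.01：The Mandarin offline file transcription service 2.0 (CPU version) has been released, adding support for ffmpeg, timestamps, and hotword models. For details, see ([one-click deployment documentation](funasr/runtime/docs/SDK_tutorial_zh.md))
+- 2023.09.01: The Mandarin offline file transcription service 2.0 (CPU version) has been released, adding support for ffmpeg, timestamps, and hotword models. For details, see ([one-click deployment documentation](funasr/runtime/docs/SDK_tutorial_zh.md))
 - 2023.08.07: The CPU version of the Mandarin real-time speech dictation service with one-click deployment has been released. For details, see ([one-click deployment documentation](funasr/runtime/docs/SDK_tutorial_online_zh.md))
 - 2023.07.17: BAT, an RNN-T model with low latency and low memory consumption, has been released. For details, see ([BAT](egs/aishell/bat))
 - 2023.07.03: The CPU version of the Mandarin offline file transcription service with one-click deployment has been released. For details, see ([one-click deployment documentation](funasr/runtime/docs/SDK_tutorial_zh.md))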
diff --git a/docs/model_zoo/modelscope_models.md b/docs/model_zoo/modelscope_models.md
index 23180ca..1e15381 100644
--- a/docs/model_zoo/modelscope_models.md
+++ b/docs/model_zoo/modelscope_models.md
@@ -17,7 +17,8 @@
 |                                                                     Model Name                                                                     | Language |          Training Data           | Vocab Size | Parameter | Offline/Online | Notes                                                                                                                           |
 |:--------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:--------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
 |        [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)        | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Duration of input wav <= 20s                                                                                                    |
-| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Which would deal with arbitrary length input wav                                                                                 |
+| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Which would deal with arbitrary length input wav |
+| [Paraformer-large-Spk](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Supports speaker diarization for ASR results, based on Paraformer-large-long |
 | [Paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Which supports the hotword customization based on the incentive enhancement, and improves the recall and precision of hotwords. |
 |              [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary)              | CN & EN  | Alibaba Speech Data (50000hours) |    8358    |    68M    |    Offline     | Duration of input wav <= 20s                                                                                                    |
 |           [Paraformer-online](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary)           | CN & EN  | Alibaba Speech Data (50000hours) |    8404    |    68M    |     Online     | Which could deal with streaming input                                                                                           |
diff --git a/docs/model_zoo/modelscope_models_zh.md b/docs/model_zoo/modelscope_models_zh.md
index 6821fee..c21ae97 100644
--- a/docs/model_zoo/modelscope_models_zh.md
+++ b/docs/model_zoo/modelscope_models_zh.md
@@ -17,7 +17,8 @@
 |                                                                     Model Name                                                                     | Language |     Training Data     |    Vocab Size     | Parameters | Offline/Online | Notes                      |
 |:--------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------------------:|:-----------------:|:----:|:-------:|:---------------------------|
 |        [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)        | CN & EN  | Alibaba Speech Data (60000hours) |       8404        | 220M | Offline | Duration of input wav <= 20s |
-| [Paraformer-large long-audio version](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN  | Alibaba Speech Data (60000hours) |       8404        | 220M | Offline || Can handle input wav files of arbitrary length |
+| [Paraformer-large long-audio version](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN  | Alibaba Speech Data (60000hours) |       8404        | 220M | Offline | Can handle input wav files of arbitrary length |
+| [Paraformer-large-Spk](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) | CN & EN  | Alibaba Speech Data (60000hours) |       8404        | 220M | Offline | Adds speaker recognition on top of the long-audio functionality |
 |     [Paraformer-large hotword](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary)      | CN & EN  | Alibaba Speech Data (60000hours) | 8404 |  220M   | Offline | Supports hotword customization based on incentive enhancement, which can improve the recall and precision of hotwords; duration of input wav <= 20s |
 |       [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary)                     | CN & EN  | Alibaba Speech Data (50000hours) |       8358        | 68M  | Offline | Duration of input wav <= 20s |
 |               [Paraformer online](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary)                | CN & EN  | Alibaba Speech Data (50000hours) |       8404        | 68M  | Online  | Can handle streaming input |
diff --git a/egs_modelscope/asr/TEMPLATE/README.md b/egs_modelscope/asr/TEMPLATE/README.md
index e44a09d..ac73950 100644
--- a/egs_modelscope/asr/TEMPLATE/README.md
+++ b/egs_modelscope/asr/TEMPLATE/README.md
@@ -99,6 +99,28 @@
 ```
 The decoding mode of `fast` and `normal` is fake streaming, which could be used for evaluating of recognition accuracy.
 Full code of demo, please ref to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151)
+
+#### [Paraformer-Spk](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)
+This model returns recognition results that include speaker information for each sentence. Refer to [CAM++](https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary) for details about the speaker diarization model.
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == '__main__':
+    audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_speaker_demo.wav'
+    output_dir = "./results"
+    inference_pipeline = pipeline(
+        task=Tasks.auto_speech_recognition,
+        model='damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
+        model_revision='v0.0.2',
+        vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
+        punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
+        output_dir=output_dir,
+    )
+    rec_result = inference_pipeline(audio_in=audio_in, batch_size_token=5000, batch_size_token_threshold_s=40, max_single_segment_time=6000)
+    print(rec_result)
+```
+
 #### [RNN-T-online model]()
 Undo
 
diff --git a/egs_modelscope/asr/TEMPLATE/README_zh.md b/egs_modelscope/asr/TEMPLATE/README_zh.md
index d1fd269..47656b3 100644
--- a/egs_modelscope/asr/TEMPLATE/README_zh.md
+++ b/egs_modelscope/asr/TEMPLATE/README_zh.md
@@ -100,6 +100,29 @@
 The fast and normal decoding modes use fake streaming decoding and can be used to evaluate recognition accuracy.
 For the full demo code, see [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151)
 
+#### [Paraformer-Spk model](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)
+Returns the recognition results together with the speaker classification result for each sub-sentence. For details about the speaker diarization model, see [CAM++](https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary).
+
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == '__main__':
+    audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_speaker_demo.wav'
+    output_dir = "./results"
+    inference_pipeline = pipeline(
+        task=Tasks.auto_speech_recognition,
+        model='damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
+        model_revision='v0.0.2',
+        vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
+        punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
+        output_dir=output_dir,
+    )
+    rec_result = inference_pipeline(audio_in=audio_in, batch_size_token=5000, batch_size_token_threshold_s=40, max_single_segment_time=6000)
+    print(rec_result)
+```
+
+
 #### [RNN-T-online model]()
 Undo
 
diff --git a/egs_modelscope/asr_vad_spk/TEMPLATE b/egs_modelscope/asr_vad_spk/TEMPLATE
new file mode 120000
index 0000000..f969ea0
--- /dev/null
+++ b/egs_modelscope/asr_vad_spk/TEMPLATE
@@ -0,0 +1 @@
+../asr/TEMPLATE
\ No newline at end of file

--
Gitblit v1.9.1