From 9a0bc00e5fb2f892987216eafca8aeb140e17c6c Mon Sep 17 00:00:00 2001
From: shixian.shi <shixian.shi@alibaba-inc.com>
Date: Fri, 13 Oct 2023 15:14:00 +0800
Subject: [PATCH] update docs and readme

---
 docs/model_zoo/modelscope_models.md      |  3 ++-
 egs_modelscope/asr/TEMPLATE/README.md    | 22 ++++++++++++++++++++++
 egs_modelscope/asr/TEMPLATE/README_zh.md | 23 +++++++++++++++++++++++
 README_zh.md                             |  3 ++-
 docs/model_zoo/modelscope_models_zh.md   |  3 ++-
 egs_modelscope/asr_vad_spk/TEMPLATE      |  1 +
 README.md                                |  1 +
 7 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 0a8ee83..1d083c1 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,7 @@
 <a name="whats-new"></a>
 ## What's new:
+- 2023/10/10: The ASR-SpeakerDiarization combined pipeline [speech_campplus_speaker-diarization_common](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py) is now released. Try the model to get recognition results with speaker information.
 - 2023/10/07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec.
 - 2023/09/01: The offline file transcription service 2.0 (CPU) of Mandarin has been released, with added support for ffmpeg, timestamps, and hotword models. For more details, please refer to the ([Deployment documentation](funasr/runtime/docs/SDK_tutorial.md)).
 - 2023/08/07: The real-time transcription service (CPU) of Mandarin has been released. For more details, please refer to the ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_online.md)).
diff --git a/README_zh.md b/README_zh.md
index 29fa307..5ff9fce 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -31,8 +31,9 @@
 <a name="最新动态"></a>
 ## What's new
+- 2023.10.10: The [Paraformer-long-Spk](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py) model is released, providing per-sentence speaker labels on top of long-audio speech recognition.
 - 2023.10.07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): FunCodec provides open-source models and training tools for discrete audio codecs, and for codec-based tasks such as speech recognition and speech synthesis.
-- 2023.09.01：The offline file transcription service 2.0 (CPU) of Mandarin has been released, with added support for ffmpeg, timestamps, and hotword models. For more details, see the ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_zh.md)).
+- 2023.09.01: The offline file transcription service 2.0 (CPU) of Mandarin has been released, with added support for ffmpeg, timestamps, and hotword models. For more details, see the ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_zh.md)).
 - 2023.08.07: The real-time transcription service (CPU) of Mandarin has been released. For more details, see the ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_online_zh.md)).
 - 2023.07.17: BAT, an RNN-T model with low latency and low memory consumption, has been released. For more details, see ([BAT](egs/aishell/bat)).
 - 2023.07.03: The offline file transcription service (CPU) of Mandarin has been released. For more details, see the ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_zh.md)).
diff --git a/docs/model_zoo/modelscope_models.md b/docs/model_zoo/modelscope_models.md
index 23180ca..1e15381 100644
--- a/docs/model_zoo/modelscope_models.md
+++ b/docs/model_zoo/modelscope_models.md
@@ -17,7 +17,8 @@
 | Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---|
 | [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Duration of input wav <= 20s |
-| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
+| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
+| [Paraformer-large-Spk](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Supports speaker diarization for ASR results, based on Paraformer-large-long |
 | [Paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving the recall and precision of hotwords |
 | [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) | CN & EN | Alibaba Speech Data (50000 hours) | 8358 | 68M | Offline | Duration of input wav <= 20s |
 | [Paraformer-online](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) | CN & EN | Alibaba Speech Data (50000 hours) | 8404 | 68M | Online | Handles streaming input |
diff --git a/docs/model_zoo/modelscope_models_zh.md b/docs/model_zoo/modelscope_models_zh.md
index 6821fee..c21ae97 100644
--- a/docs/model_zoo/modelscope_models_zh.md
+++ b/docs/model_zoo/modelscope_models_zh.md
@@ -17,7 +17,8 @@
 | Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---|
 | [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Duration of input wav <= 20s |
-| [Paraformer-large long-audio version](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline || Handles input wav of arbitrary length |
+| [Paraformer-large long-audio version](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
+| [Paraformer-large-Spk](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Adds speaker diarization on top of the long-audio capability |
 | [Paraformer-large hotword](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving the recall and precision of hotwords; duration of input wav <= 20s |
 | [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) | CN & EN | Alibaba Speech Data (50000 hours) | 8358 | 68M | Offline | Duration of input wav <= 20s |
 | [Paraformer-online](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) | CN & EN | Alibaba Speech Data (50000 hours) | 8404 | 68M | Online | Handles streaming input |
diff --git a/egs_modelscope/asr/TEMPLATE/README.md b/egs_modelscope/asr/TEMPLATE/README.md
index e44a09d..ac73950 100644
--- a/egs_modelscope/asr/TEMPLATE/README.md
+++ b/egs_modelscope/asr/TEMPLATE/README.md
@@ -99,6 +99,28 @@
 ```
 The `fast` and `normal` decoding modes are fake streaming, which can be used to evaluate recognition accuracy. For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151)
+
+#### [Paraformer-Spk](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)
+This model allows users to get recognition results that contain the speaker information of each sentence. Refer to [CAM++](https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary) for details about the speaker diarization model.
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == '__main__':
+    audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_speaker_demo.wav'
+    output_dir = "./results"
+    inference_pipeline = pipeline(
+        task=Tasks.auto_speech_recognition,
+        model='damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
+        model_revision='v0.0.2',
+        vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
+        punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
+        output_dir=output_dir,
+    )
+    rec_result = inference_pipeline(audio_in=audio_in, batch_size_token=5000, batch_size_token_threshold_s=40, max_single_segment_time=6000)
+    print(rec_result)
+```
+
 #### [RNN-T-online model]()
 Undo
diff --git a/egs_modelscope/asr/TEMPLATE/README_zh.md b/egs_modelscope/asr/TEMPLATE/README_zh.md
index d1fd269..47656b3 100644
--- a/egs_modelscope/asr/TEMPLATE/README_zh.md
+++ b/egs_modelscope/asr/TEMPLATE/README_zh.md
@@ -100,6 +100,29 @@
 The `fast` and `normal` decoding modes are fake streaming, which can be used to evaluate recognition accuracy.
 For the full demo code, please see [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151)
+#### [Paraformer-Spk model](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)
+Returns the recognition result along with the speaker classification of each sub-sentence. For details on the speaker diarization model, see [CAM++](https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary).
+
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == '__main__':
+    audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_speaker_demo.wav'
+    output_dir = "./results"
+    inference_pipeline = pipeline(
+        task=Tasks.auto_speech_recognition,
+        model='damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
+        model_revision='v0.0.2',
+        vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
+        punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
+        output_dir=output_dir,
+    )
+    rec_result = inference_pipeline(audio_in=audio_in, batch_size_token=5000, batch_size_token_threshold_s=40, max_single_segment_time=6000)
+    print(rec_result)
+```
+
+
 #### [RNN-T-online model]()
 Undo
diff --git a/egs_modelscope/asr_vad_spk/TEMPLATE b/egs_modelscope/asr_vad_spk/TEMPLATE
new file mode 120000
index 0000000..f969ea0
--- /dev/null
+++ b/egs_modelscope/asr_vad_spk/TEMPLATE
@@ -0,0 +1 @@
+../asr/TEMPLATE
\ No newline at end of file
--
Gitblit v1.9.1
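The final hunk of the patch adds `egs_modelscope/asr_vad_spk/TEMPLATE` as a symbolic link (git file mode 120000) whose target is the shared ASR template, `../asr/TEMPLATE`. As a minimal sketch, recreating that link in a working tree (paths taken from the hunk; run from the repository root) might look like:

```python
# Recreate the symlink added by the patch's final hunk:
# egs_modelscope/asr_vad_spk/TEMPLATE -> ../asr/TEMPLATE
import os

link_dir = "egs_modelscope/asr_vad_spk"
link_path = os.path.join(link_dir, "TEMPLATE")

os.makedirs(link_dir, exist_ok=True)
if os.path.lexists(link_path):
    os.remove(link_path)          # replace any stale link or file
os.symlink("../asr/TEMPLATE", link_path)  # relative target, as in the hunk

print(os.readlink(link_path))     # prints ../asr/TEMPLATE
```

The target is deliberately relative, so the link stays valid when the repository is cloned to a different path.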
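The Paraformer-Spk demos above print a recognition result carrying a speaker label per sentence. A hedged sketch of post-processing such a result into per-speaker transcripts follows; the result layout used here (a `sentences` list of dicts with `text` and `spk` keys) is an assumption for illustration, not the pipeline's documented output schema:

```python
# Group sentence-level diarized ASR output by speaker label.
# NOTE: the {'sentences': [{'text', 'spk'}]} layout below is hypothetical,
# chosen only to illustrate the post-processing step.
from collections import defaultdict

def group_by_speaker(rec_result):
    """Return {speaker_label: concatenated text} from a diarized ASR result."""
    grouped = defaultdict(list)
    for sent in rec_result.get("sentences", []):
        grouped[sent["spk"]].append(sent["text"])
    return {spk: " ".join(texts) for spk, texts in grouped.items()}

if __name__ == "__main__":
    demo = {"sentences": [
        {"text": "hello", "spk": 0},
        {"text": "hi there", "spk": 1},
        {"text": "how are you", "spk": 0},
    ]}
    print(group_by_speaker(demo))  # {0: 'hello how are you', 1: 'hi there'}
```

Adapt the key names to whatever structure `inference_pipeline` actually returns before using this on real output.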