From 0170f534b017653d504a32ad4a6da267f4db09ac Mon Sep 17 00:00:00 2001
From: 游雁 <zhifu.gzf@alibaba-inc.com>
Date: Fri, 05 Jul 2024 00:17:06 +0800
Subject: [PATCH] sensevoice

---
 README_zh.md |   79 ++++++++++++++++++++++++++++++---------
 1 files changed, 61 insertions(+), 18 deletions(-)
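
Note on the streaming `chunk_size` description carried as context in this patch: with `chunk_size=[0,10,5]`, the README counts each unit as 60 ms, giving a 600 ms output granularity and 300 ms of lookahead. A minimal sketch of that arithmetic follows, assuming 16 kHz audio; `chunk_timing`, `FRAME_MS`, and `SAMPLE_RATE` are hypothetical names used for illustration and are not part of the FunASR API.

```python
# Assumption: one chunk_size unit lasts 60 ms, as implied by the README's 10*60=600ms.
FRAME_MS = 60
# Assumption: 16 kHz input audio, as in the README's sample-count arithmetic.
SAMPLE_RATE = 16000


def chunk_timing(chunk_size, frame_ms=FRAME_MS, sample_rate=SAMPLE_RATE):
    """Convert a [left, stride, lookahead] chunk_size into milliseconds and samples."""
    _, stride_units, lookahead_units = chunk_size
    stride_ms = stride_units * frame_ms        # audio consumed per inference call
    lookahead_ms = lookahead_units * frame_ms  # future context the model waits for
    samples_per_chunk = sample_rate * stride_ms // 1000
    return {
        "stride_ms": stride_ms,
        "lookahead_ms": lookahead_ms,
        "samples_per_chunk": samples_per_chunk,
    }


print(chunk_timing([0, 10, 5]))
```

The samples-per-chunk value is the slice length a caller would feed to each streaming inference step; only the final slice may be shorter.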

diff --git a/README_zh.md b/README_zh.md
index 65029a1..275a30d 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -2,7 +2,11 @@
 
 (Simplified Chinese|[English](./README.md))
 
-# FunASR: A Fundamental End-to-End Speech Recognition Toolkit
+
+
+[![SVG Banners](https://svg-banners.vercel.app/api?type=origin&text1=FunASR🤠&text2=💖%20A%20Fundamental%20End-to-End%20Speech%20Recognition%20Toolkit&width=800&height=210)](https://github.com/Akshay090/svg-banners)
+
+[//]: # (# FunASR: A Fundamental End-to-End Speech Recognition Toolkit)
 
 [![PyPI](https://img.shields.io/pypi/v/funasr)](https://pypi.org/project/funasr/)
 
@@ -29,10 +33,18 @@
 
 <a name="最新动态"></a>
 ## What's New
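+- 2024/07/04: [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) is a speech foundation model with multiple speech understanding capabilities, covering automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED).
+- 2024/07/01: Chinese offline file transcription service, GPU version 1.1 released, fixing BladeDISC model compatibility issues; for details see the [deployment documentation](runtime/readme_cn.md)
+- 2024/06/27: Chinese offline file transcription service, GPU version 1.0 released, with dynamic batching and multi-stream concurrency; on the long-audio test set the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (330+ on CPU); for details see the [deployment documentation](runtime/readme_cn.md)
+- 2024/05/15: Added the speech emotion recognition models [emotion2vec+large](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary), [emotion2vec+base](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary), and [emotion2vec+seed](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary). The output emotion categories are angry, happy, neutral, and sad.
+- 2024/05/15: Chinese offline file transcription service 4.5, English offline file transcription service 1.6, and Chinese real-time speech dictation service 1.10 released, adapted to the FunASR 1.0 model structure; for details see the [deployment documentation](runtime/readme_cn.md)
 - 2024/03/05: Added the Qwen-Audio and Qwen-Audio-Chat large audio-text multimodal models, which have topped multiple audio benchmark leaderboards and support speech dialogue; for detailed usage see the [examples](examples/industrial_data_pretraining/qwen_audio).
 - 2024/03/05: Added support for the Whisper-large-v3 model, covering multilingual speech recognition, translation, and language identification. The model can be downloaded from the [modelscope](examples/industrial_data_pretraining/whisper/demo.py) repository or from the [openai](examples/industrial_data_pretraining/whisper/demo_from_openai.py) repository.
 - 2024/03/05: Chinese offline file transcription service 4.4, English offline file transcription service 1.5, and Chinese real-time speech dictation service 1.9 released; the docker image now supports the arm64 platform and the modelscope version is upgraded; for details see the [deployment documentation](runtime/readme_cn.md)
 - 2024/01/30: funasr-1.0 released; see the update notes in the [documentation](https://github.com/alibaba-damo-academy/FunASR/discussions/1319)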
+
+<details><summary>Full changelog</summary>
+
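 - 2024/01/30: Added a speech emotion recognition model ([model link](https://www.modelscope.cn/models/iic/emotion2vec_base_finetuned/summary)); original model [repo](https://github.com/ddlBoJack/emotion2vec).
 - 2024/01/25: Chinese offline file transcription service 4.2 and English offline file transcription service 1.3 released, with optimized VAD data handling that greatly reduces peak memory usage, plus memory leak fixes; Chinese real-time speech dictation service 1.7 released, with client-side optimizations; for details see the [deployment documentation](runtime/readme_cn.md)
 - 2024/01/09: funasr community software package for Windows, version 2.0 released, supporting the latest features of Chinese offline file transcription 4.1, English offline file transcription 1.2, and Chinese real-time dictation service 1.6; for details see the [FunASR community software package for Windows](https://www.modelscope.cn/models/damo/funasr-runtime-win-cpu-x64/summary)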
@@ -50,21 +62,33 @@
 - 2023.07.17: BAT, a low-latency, low-memory RNN-T model, released; for details see [BAT](egs/aishell/bat)
 - 2023.06.26: The ASRU2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 has concluded and the results are published; for details see [M2MeT2.0](https://alibaba-damo-academy.github.io/FunASR/m2met2_cn/index.html)
 
+</details>
+
 <a name="安装教程"></a>
 ## Installation
 
+- 瀹夎funasr涔嬪墠锛岀‘淇濆凡缁忓畨瑁呬簡涓嬮潰渚濊禆鐜:
+```text
+python>=3.8
+torch>=1.13
+torchaudio
+```
+
+- Install with pip
 ```shell
 pip3 install -U funasr
 ```
-Or install from source
+
+- Or install from source
 ``` sh
 git clone https://github.com/alibaba/FunASR.git && cd FunASR
 pip3 install -e ./
 ```
-To use the industrial pretrained models, install modelscope (optional)
+
+To use the industrial pretrained models, install modelscope and huggingface_hub (optional)
 
 ```shell
-pip3 install -U modelscope
+pip3 install -U modelscope huggingface huggingface_hub
 ```
 
 ## Model Zoo
@@ -74,19 +98,21 @@
 (Note: ⭐ denotes the ModelScope model zoo, 🤗 denotes the Huggingface model zoo, 🍀 denotes the OpenAI model zoo)
 
 
-| Model Name | Task Details | Training Data | Parameters |
-|:---:|:---:|:---:|:---:|
-| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗](https://huggingface.co/funasr/paraformer-tp) ) | Speech recognition, with timestamps, non-streaming | 60000 hours, Chinese | 220M |
-| paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) ) | Speech recognition, streaming | 60000 hours, Chinese | 220M |
-| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗](https://huggingface.co/funasr/paraformer-en) ) | Speech recognition, non-streaming | 50000 hours, English | 220M |
-| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) ) | Speech recognition, non-streaming | 50000 hours, English | 220M |
-| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | Punctuation restoration | 100M, Chinese and English | 1.1G |
-| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) ) | Voice activity detection, streaming | 5000 hours, Chinese and English | 0.4M |
-| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) ) | Character-level timestamp prediction | 50000 hours, Chinese | 38M |
-| cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | Speaker verification/diarization | 5000 hours | 7.2M |
-| Whisper-large-v3 <br> ([⭐](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary)  [🍀](https://github.com/openai/whisper) ) | Speech recognition, with timestamps, non-streaming | Multilingual | 1G |
-| Qwen-Audio <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo.py)  [🤗](https://huggingface.co/Qwen/Qwen-Audio) ) | Large audio-text multimodal model (pretrained) | Multilingual | 8B |
-| Qwen-Audio-Chat <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo_chat.py)  [🤗](https://huggingface.co/Qwen/Qwen-Audio-Chat) ) | Large audio-text multimodal model (chat version) | Multilingual | 8B |
+| Model Name | Task Details | Training Data | Parameters |
+|:---:|:---:|:---:|:---:|
+| SenseVoiceSmall <br> ([⭐](https://www.modelscope.cn/models/iic/SenseVoiceSmall)  [🤗](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) ) | Multiple speech understanding capabilities, covering automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED) | 400000 hours, Chinese | 330M |
+| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗](https://huggingface.co/funasr/paraformer-zh) ) | Speech recognition, with timestamps, non-streaming | 60000 hours, Chinese | 220M |
+| paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) ) | Speech recognition, streaming | 60000 hours, Chinese | 220M |
+| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗](https://huggingface.co/funasr/paraformer-en) ) | Speech recognition, non-streaming | 50000 hours, English | 220M |
+| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) ) | Speech recognition, non-streaming | 50000 hours, English | 220M |
+| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | Punctuation restoration | 100M, Chinese and English | 290M |
+| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) ) | Voice activity detection, streaming | 5000 hours, Chinese and English | 0.4M |
+| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) ) | Character-level timestamp prediction | 50000 hours, Chinese | 38M |
+| cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | Speaker verification/diarization | 5000 hours | 7.2M |
+| Whisper-large-v3 <br> ([⭐](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary)  [🍀](https://github.com/openai/whisper) ) | Speech recognition, with timestamps, non-streaming | Multilingual | 1550M |
+| Qwen-Audio <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo.py)  [🤗](https://huggingface.co/Qwen/Qwen-Audio) ) | Large audio-text multimodal model (pretrained) | Multilingual | 8B |
+| Qwen-Audio-Chat <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo_chat.py)  [🤗](https://huggingface.co/Qwen/Qwen-Audio-Chat) ) | Large audio-text multimodal model (chat version) | Multilingual | 8B |
+| emotion2vec+large <br> ([⭐](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary)  [🤗](https://huggingface.co/emotion2vec/emotion2vec_plus_large) ) | Speech emotion recognition model | 40000 hours, 4 emotion categories | 300M |
 
 <a name="快速开始"></a>
 ## Quick Start
@@ -145,13 +171,15 @@
 
 娉細`chunk_size`涓烘祦寮忓欢鏃堕厤缃紝`[0,10,5]`琛ㄧず涓婂睆瀹炴椂鍑哄瓧绮掑害涓篳10*60=600ms`锛屾湭鏉ヤ俊鎭负`5*60=300ms`銆傛瘡娆℃帹鐞嗚緭鍏ヤ负`600ms`锛堥噰鏍风偣鏁颁负`16000*0.6=960`锛夛紝杈撳嚭涓哄搴旀枃瀛楋紝鏈�鍚庝竴涓闊崇墖娈佃緭鍏ラ渶瑕佽缃甡is_final=True`鏉ュ己鍒惰緭鍑烘渶鍚庝竴涓瓧銆�
 
+<details><summary>More examples</summary>
+
 ### Voice Activity Detection (Non-streaming)
 ```python
 from funasr import AutoModel
 
 model = AutoModel(model="fsmn-vad")
 
-wav_file = f"{model.model_path}/example/asr_example.wav"
+wav_file = f"{model.model_path}/example/vad_example.wav"
 res = model.generate(input=wav_file)
 print(res)
 ```
@@ -208,9 +236,24 @@
 res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
 print(res)
 ```
+
+### Speech Emotion Recognition
+```python
+from funasr import AutoModel
+
+model = AutoModel(model="emotion2vec_plus_large")
+
+wav_file = f"{model.model_path}/example/test.wav"
+
+res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
+print(res)
+```
+
 For more details, see the [tutorial documentation](docs/tutorial/README_zh.md).
 For more examples, see the [model examples](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining).
 
+</details>
+
 ## Export ONNX
 ### Export from the Command Line
 ```shell

--
Gitblit v1.9.1