From 28ccfbfc51068a663a80764e14074df5edf2b5ba Mon Sep 17 00:00:00 2001
From: kongdeqiang <kongdeqiang960204@163.com>
Date: Fri, 13 Mar 2026 17:41:41 +0800
Subject: [PATCH] Commit
---
README_zh.md | 70 ++++++++++++++++++++++++++++++----
1 files changed, 61 insertions(+), 9 deletions(-)
diff --git a/README_zh.md b/README_zh.md
index d51fa5a..b367008 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -29,10 +29,17 @@
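<a name="核心功能"></a>
## Core Features
- FunASR is a fundamental speech recognition toolkit that offers a variety of features, including speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR. FunASR provides convenient scripts and tutorials and supports inference and fine-tuning of pre-trained models.
-- We have released a large number of models trained on open-source datasets or large-scale industrial data on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition) and [huggingface](https://huggingface.co/FunASR). See our [model zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md) for details on each model. The representative [Paraformer](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) non-autoregressive end-to-end speech recognition model offers high accuracy, high efficiency, and convenient deployment, and supports quickly building speech recognition services; for details, see the [service deployment documentation](runtime/readme_cn.md).
+- We have released a large number of models trained on open-source datasets or large-scale industrial data on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition) and [huggingface](https://huggingface.co/FunASR). See our [model zoo](https://github.com/modelscope/FunASR/blob/main/model_zoo/readme_zh.md) for details on each model. The representative [Paraformer](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) non-autoregressive end-to-end speech recognition model offers high accuracy, high efficiency, and convenient deployment, and supports quickly building speech recognition services; for details, see the [service deployment documentation](runtime/readme_cn.md).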
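<a name="最新动态"></a>
## What's New
+- 2024/10/29: Mandarin real-time transcription service 1.12 released; the 2pass-offline mode now supports the SensevoiceSmall model. See the [deployment documentation](runtime/readme_cn.md) for details.
+- 2024/10/10: Added support for the Whisper-large-v3-turbo model for multilingual speech recognition, translation, and language identification. The model can be downloaded from the [modelscope](examples/industrial_data_pretraining/whisper/demo.py) repository or from the [openai](examples/industrial_data_pretraining/whisper/demo_from_openai.py) repository.
+- 2024/09/26: Mandarin offline file transcription service 4.6, English offline file transcription service 1.7, and Mandarin real-time transcription service 1.11 released, fixing an ONNX memory leak and adding support for the SensevoiceSmall onnx model. Mandarin offline file transcription service GPU 2.0 released, fixing a GPU memory leak. See the [deployment documentation](runtime/readme_cn.md) for details.
+- 2024/09/25: Added keyword spotting (wake-word) models. Fine-tuning and inference are supported for four models: [fsmn_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [fsmn_kws_mt](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [sanm_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-offline), and [sanm_kws_streaming](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online).
+- 2024/07/04: [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) is a speech foundation model with multiple speech understanding capabilities, covering automatic speech recognition (ASR), language identification (LID), speech emotion recognition (SER), and audio event detection (AED).
+- 2024/07/01: Mandarin offline file transcription service, GPU version 1.1 released, improving compatibility with bladedisc models. See the [deployment documentation](runtime/readme_cn.md) for details.
+- 2024/06/27: Mandarin offline file transcription service, GPU version 1.0 released, with dynamic batching and multi-threaded concurrency. On the long-audio test set, the single-thread RTF is 0.0076 and the multi-thread speedup is 1200+ (330+ on CPU). See the [deployment documentation](runtime/readme_cn.md) for details.
- 2024/05/15: Added speech emotion recognition models [emotion2vec+large](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary), [emotion2vec+base](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary), and [emotion2vec+seed](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary). The output emotion categories are: angry, happy, neutral, and sad.
- 2024/05/15: Mandarin offline file transcription service 4.5, English offline file transcription service 1.6, and Mandarin real-time transcription service 1.10 released, adapted to the FunASR 1.0 model structure. See the [deployment documentation](runtime/readme_cn.md) for details.
- 2024/03/05: Added the Qwen-Audio and Qwen-Audio-Chat large audio-text multimodal models, which top several audio benchmarks and support speech dialogue. See the [example](examples/industrial_data_pretraining/qwen_audio) for usage.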
@@ -85,7 +92,7 @@
To use the industrial pre-trained models, install modelscope and huggingface_hub (optional):
```shell
-pip3 install -U modelscope huggingface_hub
+pip3 install -U modelscope huggingface huggingface_hub
```
## Model Zoo
@@ -97,15 +104,18 @@
| Model Name | Task Details | Training Data | Parameters |
|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:--------------:|:------:|
+| SenseVoiceSmall <br> ([⭐](https://www.modelscope.cn/models/iic/SenseVoiceSmall) [🤗](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) ) | Multiple speech understanding capabilities, covering automatic speech recognition (ASR), language identification (LID), speech emotion recognition (SER), and audio event detection (AED) | 400000 hours, Mandarin | 330M |
| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗](https://huggingface.co/funasr/paraformer-zh) ) | Speech recognition, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
| paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) ) | Speech recognition, streaming | 60000 hours, Mandarin | 220M |
| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗](https://huggingface.co/funasr/paraformer-en) ) | Speech recognition, non-streaming | 50000 hours, English | 220M |
| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) ) | Speech recognition, non-streaming | 50000 hours, English | 220M |
-| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | Punctuation restoration | 100M, Mandarin and English | 1.1B |
+| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | Punctuation restoration | 100M, Mandarin and English | 290M |
| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) ) | Voice activity detection, streaming | 5000 hours, Mandarin and English | 0.4M |
+| fsmn-kws <br> ( [⭐](https://modelscope.cn/models/iic/speech_charctc_kws_phone-xiaoyun/summary) ) | Keyword spotting, streaming | 5000 hours, Mandarin | 0.7M |
| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) ) | Character-level timestamp prediction | 50000 hours, Mandarin | 38M |
| cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | Speaker verification/diarization | 5000 hours | 7.2M |
| Whisper-large-v3 <br> ([⭐](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary) [🍀](https://github.com/openai/whisper) ) | Speech recognition, with timestamps, non-streaming | Multilingual | 1550 M |
+| Whisper-large-v3-turbo <br> ([⭐](https://www.modelscope.cn/models/iic/Whisper-large-v3-turbo/summary) [🍀](https://github.com/openai/whisper) ) | Speech recognition, with timestamps, non-streaming | Multilingual | 809 M |
| Qwen-Audio <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio) ) | Large audio-text multimodal model (pretrained) | Multilingual | 8B |
| Qwen-Audio-Chat <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo_chat.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio-Chat) ) | Large audio-text multimodal model (chat version) | Multilingual | 8B |
| emotion2vec+large <br> ([⭐](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary) [🤗](https://huggingface.co/emotion2vec/emotion2vec_plus_large) ) | Speech emotion recognition | 40000 hours, 4 emotion categories | 300M |
@@ -124,6 +134,43 @@
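Note: Both a single audio file and a file list are supported as input. A file list uses the Kaldi-style wav.scp format, one `wav_id wav_path` pair per line.
### Speech Recognition (Non-streaming)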
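
The `wav_id wav_path` layout mentioned in the note can be illustrated with a small standalone reader. This is only a sketch of the file format, not part of FunASR: the helper name `read_wav_scp`, the utterance IDs, and the audio paths are all made up for the example.

```python
from pathlib import Path
import tempfile


def read_wav_scp(scp_path):
    """Parse a Kaldi-style wav.scp file into a list of (wav_id, wav_path) pairs.

    Hypothetical helper for illustration; FunASR does not expose this function.
    """
    entries = []
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the first run of whitespace only, so paths with spaces survive.
            wav_id, wav_path = line.split(maxsplit=1)
            entries.append((wav_id, wav_path))
    return entries


# Write a tiny example list to a temporary file; IDs and paths are invented.
with tempfile.TemporaryDirectory() as tmp:
    scp_file = Path(tmp) / "wav.scp"
    scp_file.write_text(
        "utt1 /data/audio/utt1.wav\nutt2 /data/audio/utt2.wav\n",
        encoding="utf-8",
    )
    wav_list = read_wav_scp(scp_file)

print(wav_list)
```

Each line carries exactly one utterance ID followed by one audio path, so a line containing only an ID would make this sketch raise a `ValueError`.
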
+#### SenseVoice
+```python
+from funasr import AutoModel
+from funasr.utils.postprocess_utils import rich_transcription_postprocess
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = AutoModel(
+    model=model_dir,
+    vad_model="fsmn-vad",
+    vad_kwargs={"max_single_segment_time": 30000},  # milliseconds
+    device="cuda:0",
+)
+
+# Transcribe the English example audio shipped with the model
+res = model.generate(
+    input=f"{model.model_path}/example/en.mp3",
+    cache={},
+    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=True,
+    batch_size_s=60,
+    merge_vad=True,  # merge short VAD segments before recognition
+    merge_length_s=15,
+)
+text = rich_transcription_postprocess(res[0]["text"])
+print(text)
+```
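+Parameter description:
+- `model_dir`: the model name, or the path to a model on local disk.
+- `vad_model`: enables VAD, which splits long audio into short segments. With VAD enabled, the measured inference time is the end-to-end pipeline time, covering both VAD and SenseVoice. To measure the SenseVoice model alone, disable the VAD model.
+- `vad_kwargs`: configuration of the VAD model. `max_single_segment_time` is the maximum duration of a single segment cut by `vad_model`, in milliseconds (ms).
+- `use_itn`: whether the output includes punctuation and inverse text normalization.
+- `batch_size_s`: enables dynamic batching; the value is the total audio duration per batch, in seconds (s).
+- `merge_vad`: whether to merge the short audio fragments cut by the VAD model; the merged length is `merge_length_s`, in seconds (s).
+- `ban_emo_unk`: disables the `emo_unk` tag; when it is disabled, every sentence is assigned an emotion label.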
+
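
To make the interplay between `merge_vad`, `merge_length_s` (seconds), and the VAD segment boundaries (milliseconds) more concrete, here is a rough standalone sketch. It is not FunASR's actual merging code: the function `merge_segments`, the greedy rule it applies, and the sample timestamps are assumptions chosen only to illustrate the idea.

```python
def merge_segments(segments_ms, merge_length_s=15):
    """Greedily merge consecutive (start_ms, end_ms) VAD segments.

    Assumed rule (illustrative, not FunASR's implementation): keep extending
    the current chunk while its total span stays within merge_length_s seconds.
    """
    max_len_ms = merge_length_s * 1000  # convert seconds to milliseconds
    merged = []
    cur_start, cur_end = None, None
    for start, end in segments_ms:
        if cur_start is None:
            cur_start, cur_end = start, end
        elif end - cur_start <= max_len_ms:
            cur_end = end  # still within the limit: extend the current chunk
        else:
            merged.append((cur_start, cur_end))  # close the chunk, start a new one
            cur_start, cur_end = start, end
    if cur_start is not None:
        merged.append((cur_start, cur_end))
    return merged


# Invented VAD output: five segments with boundaries in milliseconds.
vad_segments = [(0, 4000), (4500, 9000), (9500, 14000), (16000, 21000), (22000, 40000)]
chunks = merge_segments(vad_segments, merge_length_s=15)
print(chunks)
```

In this sketch a single VAD segment longer than `merge_length_s` is passed through unchanged; its length would instead be bounded by `max_single_segment_time` on the VAD side.
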
+#### Paraformer
```python
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
@@ -268,11 +315,16 @@
### Test ONNX
```python
# pip3 install -U funasr-onnx
-from funasr_onnx import Paraformer
+from pathlib import Path
+from runtime.python.onnxruntime.funasr_onnx.paraformer_bin import Paraformer
+
+
+home_dir = Path.home()
+
model_dir = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
model = Paraformer(model_dir, batch_size=1, quantize=True)
-wav_path = ['~/.cache/modelscope/hub/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav']
+wav_path = [f"{home_dir}/.cache/modelscope/hub/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav"]
result = model(wav_path)
print(result)
@@ -296,11 +348,11 @@
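<a name="社区交流"></a>
## Contact Us
-If you run into problems while using FunASR, you can open an issue directly on the GitHub page. Speech enthusiasts are welcome to scan the DingTalk group or WeChat group QR code below to join the community group for discussion.
+If you run into problems while using FunASR, you can open an issue directly on the GitHub page. Speech enthusiasts are welcome to scan the DingTalk group QR code below to join the community group for discussion.
-| DingTalk group | WeChat |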
-|:---------------------------------------------------------------------:|:-----------------------------------------------------:|
-| <div align="left"><img src="docs/images/dingding.jpg" width="250"/> | <img src="docs/images/wechat.png" width="215"/></div> |
+| DingTalk group |
+|:-------------------------------------------------------------------:|
+| <div align="left"><img src="docs/images/dingding.png" width="250"/></div> |
## Community Contributors
--
Gitblit v1.9.1