From 31bf3a88a09f6e7c895224f4dc75e8f2c138d5c8 Mon Sep 17 00:00:00 2001
From: 雾聪 <wucong.lyb@alibaba-inc.com>
Date: Mon, 01 Jul 2024 20:43:06 +0800
Subject: [PATCH] update funasr-runtime-sdk-gpu-0.1.1
---
README_zh.md | 75 +++++++++++++++++++++++++++++--------
1 files changed, 58 insertions(+), 17 deletions(-)
diff --git a/README_zh.md b/README_zh.md
index 80c2e7e..fe05f13 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -2,7 +2,11 @@
(Simplified Chinese|[English](./README.md))
-# FunASR: A Fundamental End-to-End Speech Recognition Toolkit
+
+
+[](https://github.com/Akshay090/svg-banners)
+
+[//]: # (# FunASR: A Fundamental End-to-End Speech Recognition Toolkit)
[](https://pypi.org/project/funasr/)
@@ -29,10 +33,17 @@
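<a name="最新动态"></a>
## What's New
+- 2024/07/01: Chinese offline file transcription service GPU 1.1 released, improving BladeDISC model compatibility; for details, see the [deployment docs](runtime/readme_cn.md)
+- 2024/06/27: Chinese offline file transcription service GPU 1.0 released, with dynamic batching and multi-stream concurrency; on the long-audio test set, single-thread RTF is 0.0076 and multi-thread speedup is 1200+ (330+ on CPU); for details, see the [deployment docs](runtime/readme_cn.md)
+- 2024/05/15: Added emotion recognition models [emotion2vec+large](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary), [emotion2vec+base](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary), and [emotion2vec+seed](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary); the output emotion classes are 生气/angry, 开心/happy, 中立/neutral, and 难过/sad.
+- 2024/05/15: Chinese offline file transcription service 4.5, English offline file transcription service 1.6, and Chinese real-time speech dictation service 1.10 released, adapted to the FunASR 1.0 model structure; for details, see the [deployment docs](runtime/readme_cn.md)
- 2024/03/05: Added the Qwen-Audio and Qwen-Audio-Chat audio-text multimodal large models, which top multiple audio benchmark leaderboards and support voice dialogue; for detailed usage, see the [example](examples/industrial_data_pretraining/qwen_audio).
- 2024/03/05: Added support for the Whisper-large-v3 model, covering multilingual speech recognition, translation, and language identification; the model can be downloaded from the [modelscope](examples/industrial_data_pretraining/whisper/demo.py) repository or from the [openai](examples/industrial_data_pretraining/whisper/demo_from_openai.py) repository.
- 2024/03/05: Chinese offline file transcription service 4.4, English offline file transcription service 1.5, and Chinese real-time speech dictation service 1.9 released; the docker image now supports the arm64 platform, and the modelscope version was upgraded; for details, see the [deployment docs](runtime/readme_cn.md)
- 2024/01/30: funasr-1.0 released; see the release notes [document](https://github.com/alibaba-damo-academy/FunASR/discussions/1319)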
+
+<details><summary>Full changelog</summary>
+
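- 2024/01/30: Added an emotion recognition model, see the [model link](https://www.modelscope.cn/models/iic/emotion2vec_base_finetuned/summary); original model [repo](https://github.com/ddlBoJack/emotion2vec).
- 2024/01/25: Chinese offline file transcription service 4.2 and English offline file transcription service 1.3 released, with optimized VAD data processing, greatly reduced peak memory usage, and memory leak fixes; Chinese real-time speech dictation service 1.7 released, with client-side optimizations; for details, see the [deployment docs](runtime/readme_cn.md)
- 2024/01/09: FunASR community software package for Windows 2.0 released, supporting the latest features of Chinese offline file transcription 4.1, English offline file transcription 1.2, and Chinese real-time dictation service 1.6; for details, see ([FunASR community software package for Windows](https://www.modelscope.cn/models/damo/funasr-runtime-win-cpu-x64/summary))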
@@ -50,21 +61,33 @@
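- 2023.07.17: BAT, a low-latency, low-memory RNN-T model, released; for details, see ([BAT](egs/aishell/bat))
- 2023.06.26: The ASRU2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 has concluded and the results are announced; for details, see ([M2MeT2.0](https://alibaba-damo-academy.github.io/FunASR/m2met2_cn/index.html))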
+</details>
+
<a name="安装教程"></a>
## Installation
+- Before installing funasr, make sure the following dependencies are installed:
+```text
+python>=3.8
+torch>=1.13
+torchaudio
+```
+
+- Install with pip
```shell
pip3 install -U funasr
```
-Or install from source
+
+- Or install from source
``` sh
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./
```
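The dependency list above (`python>=3.8`, `torch>=1.13`, `torchaudio`) can be checked before installing. The following is a minimal sketch using only the standard library, not part of FunASR; the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical and chosen for illustration.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.1.0+cu118' into a tuple like (2, 1, 0).

    Local build tags after '+' and non-numeric suffixes are ignored.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version string is at least `minimum` (a tuple)."""
    return parse_version(installed) >= minimum


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("python>=3.8 is required")
    # torchaudio has no minimum version in the list above, so only presence is checked.
    for package, minimum in (("torch", (1, 13)), ("torchaudio", None)):
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if minimum is not None and not meets_minimum(installed, minimum):
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{package}>={wanted} is required, found {installed}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script prints one line per unmet requirement and nothing when the environment satisfies the list.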
-To use the industrial pretrained models, install modelscope (optional)
+
+To use the industrial pretrained models, install modelscope and huggingface_hub (optional)
```shell
-pip3 install -U modelscope
+pip3 install -U modelscope huggingface huggingface_hub
```
## Model Zoo
@@ -74,19 +97,20 @@
(Note: ⭐ denotes the ModelScope model zoo, 🤗 denotes the Huggingface model zoo, and 🍀 denotes the OpenAI model zoo)
-| Model Name | Task Details | Training Data | Parameters |
-|:---:|:---:|:---:|:---:|
-| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗](https://huggingface.co/funasr/paraformer-tp) ) | speech recognition, with timestamps, non-streaming | 60000 hours, Chinese | 220M |
-| paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) ) | speech recognition, streaming | 60000 hours, Chinese | 220M |
-| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗](https://huggingface.co/funasr/paraformer-en) ) | speech recognition, non-streaming | 50000 hours, English | 220M |
-| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) ) | speech recognition, non-streaming | 50000 hours, English | 220M |
-| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | punctuation restoration | 100M, Chinese and English | 1.1G |
-| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) ) | voice activity detection, streaming | 5000 hours, Chinese and English | 0.4M |
-| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) ) | character-level timestamp prediction | 50000 hours, Chinese | 38M |
-| cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | speaker verification/diarization | 5000 hours | 7.2M |
-| Whisper-large-v3 <br> ([⭐](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary) [🍀](https://github.com/openai/whisper) ) | speech recognition, with timestamps, non-streaming | multilingual | 1G |
-| Qwen-Audio <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio) ) | audio-text multimodal large model (pretrained) | multilingual | 8B |
-| Qwen-Audio-Chat <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo_chat.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio-Chat) ) | audio-text multimodal large model (chat version) | multilingual | 8B |
+| Model Name | Task Details | Training Data | Parameters |
+|:---:|:---:|:---:|:---:|
+| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗](https://huggingface.co/funasr/paraformer-zh) ) | speech recognition, with timestamps, non-streaming | 60000 hours, Chinese | 220M |
+| paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) ) | speech recognition, streaming | 60000 hours, Chinese | 220M |
+| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗](https://huggingface.co/funasr/paraformer-en) ) | speech recognition, non-streaming | 50000 hours, English | 220M |
+| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) ) | speech recognition, non-streaming | 50000 hours, English | 220M |
+| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | punctuation restoration | 100M, Chinese and English | 290M |
+| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) ) | voice activity detection, streaming | 5000 hours, Chinese and English | 0.4M |
+| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) ) | character-level timestamp prediction | 50000 hours, Chinese | 38M |
+| cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | speaker verification/diarization | 5000 hours | 7.2M |
+| Whisper-large-v3 <br> ([⭐](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary) [🍀](https://github.com/openai/whisper) ) | speech recognition, with timestamps, non-streaming | multilingual | 1550M |
+| Qwen-Audio <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio) ) | audio-text multimodal large model (pretrained) | multilingual | 8B |
+| Qwen-Audio-Chat <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo_chat.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio-Chat) ) | audio-text multimodal large model (chat version) | multilingual | 8B |
+| emotion2vec+large <br> ([⭐](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary) [🤗](https://huggingface.co/emotion2vec/emotion2vec_plus_large) ) | emotion recognition model | 40000 hours, 4 emotion classes | 300M |
<a name="快速开始"></a>
## Quick Start
@@ -144,6 +168,8 @@
```
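Note: `chunk_size` configures the streaming latency. `[0,10,5]` means text is emitted on screen at a granularity of `10*60=600ms`, with `5*60=300ms` of lookahead (future context). Each inference call takes `600ms` of input (`16000*0.6=9600` samples) and outputs the corresponding text. For the last speech segment, set `is_final=True` to force output of the final word.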
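The arithmetic in this note can be worked through in a few lines. The sketch below is illustrative only and does not call FunASR; it assumes the 16 kHz sample rate and the 60 ms unit stated in the note, and the helper names `chunk_timing` and `iter_chunks` are hypothetical. At 16000 samples per second, 0.6 s of audio is 9600 samples.

```python
SAMPLE_RATE = 16000  # Hz, as used in the note
FRAME_MS = 60        # duration of one chunk_size unit, as used in the note


def chunk_timing(chunk_size):
    """Return (chunk_ms, lookahead_ms, chunk_samples) for a chunk_size triple."""
    _, chunk_frames, lookahead_frames = chunk_size
    chunk_ms = chunk_frames * FRAME_MS
    lookahead_ms = lookahead_frames * FRAME_MS
    chunk_samples = SAMPLE_RATE * chunk_ms // 1000
    return chunk_ms, lookahead_ms, chunk_samples


def iter_chunks(num_samples, chunk_samples):
    """Yield (start, end, is_final) sample ranges; only the last chunk is final."""
    total = max(1, -(-num_samples // chunk_samples))  # ceiling division
    for index in range(total):
        start = index * chunk_samples
        end = min(start + chunk_samples, num_samples)
        yield start, end, index == total - 1


if __name__ == "__main__":
    chunk_ms, lookahead_ms, chunk_samples = chunk_timing([0, 10, 5])
    print(chunk_ms, lookahead_ms, chunk_samples)
    for start, end, is_final in iter_chunks(20000, chunk_samples):
        print(start, end, is_final)
```

Each `(start, end)` range would be the slice of audio passed to one inference call, with `is_final` passed through on the last one.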
+
+<details><summary>More examples</summary>
### Voice Activity Detection (Non-Streaming)
```python
@@ -208,9 +234,24 @@
res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)
```
+
+### Emotion Recognition
+```python
+from funasr import AutoModel
+
+model = AutoModel(model="emotion2vec_plus_large")
+
+wav_file = f"{model.model_path}/example/test.wav"
+
+res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
+print(res)
+```
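To turn the printed result into a single predicted emotion, one option is to take the highest-scoring label. The sketch below assumes each entry of `res` is a dict with parallel `labels` and `scores` lists; that structure is an assumption, not something this patch confirms, so check the actual output of `print(res)` first. The helper `top_emotion` and the sample data are hypothetical.

```python
def top_emotion(result):
    """Return (label, score) for the highest-scoring entry.

    Assumes `result` has parallel "labels" and "scores" lists (an assumption
    about the output format, to be verified against the real result).
    """
    labels = result["labels"]
    scores = result["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


if __name__ == "__main__":
    # Hypothetical sample shaped like the assumed output; not real model output.
    sample = {
        "key": "test",
        "labels": ["angry", "happy", "neutral", "sad"],
        "scores": [0.05, 0.80, 0.10, 0.05],
    }
    print(top_emotion(sample))
```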
+
For more details, see the [tutorial](docs/tutorial/README_zh.md);
for more, see the [model examples](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining)
+</details>
+
## Export ONNX
### Export from the Command Line
```shell
--
Gitblit v1.9.1