From 369382050bf71c249944545f009a29a8632fdda5 Mon Sep 17 00:00:00 2001
From: 游雁 <zhifu.gzf@alibaba-inc.com>
Date: Thu, 25 Jan 2024 15:04:47 +0800
Subject: [PATCH] funasr1.0.2

---
 examples/industrial_data_pretraining/uniasr/demo.py |    6 -
 model_zoo/modelscope_models_zh.md                   |   42 +++++++-------
 funasr/auto/auto_model.py                           |    3 
 README_zh.md                                        |   16 ++--
 funasr/models/uniasr/template.yaml                  |   52 +++++++++++++----
 README.md                                           |   20 +++---
 6 files changed, 82 insertions(+), 57 deletions(-)

diff --git a/README.md b/README.md
index 1e3707c..499bb40 100644
--- a/README.md
+++ b/README.md
@@ -55,16 +55,16 @@
 (Note: 馃 represents the Huggingface model zoo link, 猸� represents the ModelScope model zoo link)
 
 
-|                                                                             Model Name                                                                             |                                Task Details                                 |          Training Data           | Parameters |
-|:------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------:|:--------------------------------:|:----------:|
-|    paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗]() )    |             speech recognition, with timestamps, non-streaming              |      60000 hours, Mandarin       |    220M    |
-|                paraformer-zh-spk <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)  [🤗]() )                | speech recognition with speaker diarization, with timestamps, non-streaming |      60000 hours, Mandarin       |    220M    |
-| <nobr>paraformer-zh-online <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() )</nobr> |                        speech recognition, streaming                        |      60000 hours, Mandarin       |    220M    |
-|         paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() )         |             speech recognition, with timestamps, non-streaming              |       50000 hours, English       |    220M    |
-|                     conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() )                      |                      speech recognition, non-streaming                      |       50000 hours, English       |    220M    |
-|                     ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() )                      |                           punctuation restoration                           |    100M, Mandarin and English    |    1.1G    |
-|                          fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() )                          |                          voice activity detection                           | 5000 hours, Mandarin and English |    0.4M    |
-|                          fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() )                           |                            timestamp prediction                             |       5000 hours, Mandarin       |    38M     |
+|                                                                             Model Name                                                                             |                    Task Details                    |          Training Data           | Parameters |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------:|:--------------------------------:|:----------:|
+|    paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗]() )    | speech recognition, with timestamps, non-streaming |      60000 hours, Mandarin       |    220M    |
+| <nobr>paraformer-zh-online <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() )</nobr> |           speech recognition, streaming            |      60000 hours, Mandarin       |    220M    |
+|         paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() )         | speech recognition, with timestamps, non-streaming |       50000 hours, English       |    220M    |
+|                     conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() )                      |         speech recognition, non-streaming          |       50000 hours, English       |    220M    |
+|                     ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() )                      |              punctuation restoration               |    100M, Mandarin and English    |    1.1G    |
+|                          fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() )                          |              voice activity detection              | 5000 hours, Mandarin and English |    0.4M    |
+|                          fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() )                           |                timestamp prediction                |       5000 hours, Mandarin       |    38M     |
+|                cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗]() )                                             |        speaker verification/diarization            |            5000 hours            |    7.2M    |
 
 
 
diff --git a/README_zh.md b/README_zh.md
index 552a5a0..9d7c151 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -57,16 +57,16 @@
 锛堟敞锛歔馃]()琛ㄧずHuggingface妯″瀷浠撳簱閾炬帴锛孾猸怾()琛ㄧずModelScope妯″瀷浠撳簱閾炬帴锛�
 
 
-|                                                                             Model Name                                                                             |        Task Details        |     Training Data     | Parameters  |
+|                                         Model Name                                                                                                                 |        Task Details        |     Training Data     | Parameters  |
 |:------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:------------:|:----:|
 | paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗]() ) | speech recognition, with timestamps, non-streaming | 60000 hours, Chinese | 220M |
-| paraformer-zh-spk <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)  [🤗]() ) | speech recognition with speaker diarization, with timestamps, non-streaming | 60000 hours, Chinese | 220M |
-| paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | speech recognition, streaming | 60000 hours, Chinese | 220M |
-| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | speech recognition, non-streaming | 50000 hours, English | 220M |
-| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | speech recognition, non-streaming | 50000 hours, English | 220M |
-| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | punctuation restoration | 100M, Chinese and English | 1.1G |
-| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | voice activity detection, streaming | 5000 hours, Chinese and English | 0.4M |
-| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | character-level timestamp prediction | 50000 hours, Chinese | 38M |
+| paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | speech recognition, streaming | 60000 hours, Chinese | 220M |
+| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | speech recognition, non-streaming | 50000 hours, English | 220M |
+| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | speech recognition, non-streaming | 50000 hours, English | 220M |
+| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | punctuation restoration | 100M, Chinese and English | 1.1G |
+| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | voice activity detection, streaming | 5000 hours, Chinese and English | 0.4M |
+| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | character-level timestamp prediction | 50000 hours, Chinese | 38M |
+| cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗]() ) | speaker verification/diarization | 5000 hours | 7.2M |
 
 
 <a name="快速开始"></a>
diff --git a/examples/industrial_data_pretraining/uniasr/demo.py b/examples/industrial_data_pretraining/uniasr/demo.py
index 1259021..6dcd557 100644
--- a/examples/industrial_data_pretraining/uniasr/demo.py
+++ b/examples/industrial_data_pretraining/uniasr/demo.py
@@ -5,11 +5,7 @@
 
 from funasr import AutoModel
 
-model = AutoModel(model="/Users/zhifu/Downloads/modelscope_models/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online", model_revision="v2.0.4",
-                  # vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
-                  # vad_model_revision="v2.0.4",
-                  # punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
-                  # punc_model_revision="v2.0.4",
+model = AutoModel(model="iic/speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline", model_revision="v2.0.4",
                   )
 
 res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav")
diff --git a/funasr/auto/auto_model.py b/funasr/auto/auto_model.py
index 5f89dd6..5bfc080 100644
--- a/funasr/auto/auto_model.py
+++ b/funasr/auto/auto_model.py
@@ -224,7 +224,7 @@
         asr_result_list = []
         num_samples = len(data_list)
         disable_pbar = kwargs.get("disable_pbar", False)
-        pbar = tqdm(colour="blue", total=num_samples+1, dynamic_ncols=True) if not disable_pbar else None
+        pbar = tqdm(colour="blue", total=num_samples, dynamic_ncols=True) if not disable_pbar else None
         time_speech_total = 0.0
         time_escape_total = 0.0
         for beg_idx in range(0, num_samples, batch_size):
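The hunk above changes the progress-bar `total` from `num_samples+1` to `num_samples`: a bar whose total is larger than the number of items it is advanced by can never reach 100%. The sketch below illustrates the off-by-one with a hypothetical stdlib stand-in for `tqdm` (`ProgressCounter`) and assumes, for illustration only, that the bar is advanced by the number of samples in each batch of the `range(0, num_samples, batch_size)` loop; it is not FunASR's code.

```python
from dataclasses import dataclass


@dataclass
class ProgressCounter:
    """Hypothetical stand-in for a tqdm bar: counts progress against a total."""
    total: int
    n: int = 0

    def update(self, k: int = 1) -> None:
        self.n += k

    @property
    def complete(self) -> bool:
        return self.n >= self.total


def run_batches(num_samples: int, batch_size: int, total: int) -> ProgressCounter:
    """Mimic the batching loop, advancing the bar by each batch's sample count."""
    bar = ProgressCounter(total=total)
    for beg_idx in range(0, num_samples, batch_size):
        end_idx = min(num_samples, beg_idx + batch_size)
        # Assumption: the bar advances by the number of samples in this batch.
        bar.update(end_idx - beg_idx)
    return bar


if __name__ == "__main__":
    n = 10
    print(run_batches(n, batch_size=4, total=n + 1).complete)  # old total: False
    print(run_batches(n, batch_size=4, total=n).complete)      # new total: True
```

Because the loop processes exactly `num_samples` items, only `total=num_samples` lets the bar finish.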
@@ -350,6 +350,7 @@
             
             end_asr_total = time.time()
             time_escape_total_per_sample = end_asr_total - beg_asr_total
+            pbar_sample.update(1)
             pbar_sample.set_description(f"rtf_avg_per_sample: {time_escape_total_per_sample / time_speech_total_per_sample:0.3f}, "
                                  f"time_speech_total_per_sample: {time_speech_total_per_sample: 0.3f}, "
                                  f"time_escape_total_per_sample: {time_escape_total_per_sample:0.3f}")
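The description string reports a real-time factor (RTF): wall-clock processing time (`time_escape_total_per_sample`) divided by the duration of the audio (`time_speech_total_per_sample`), so values below 1.0 mean the audio was processed faster than real time. A minimal sketch of that computation, with hypothetical helper names:

```python
def real_time_factor(time_escape_s: float, time_speech_s: float) -> float:
    """Processing (wall-clock) time divided by audio duration, both in seconds."""
    if time_speech_s <= 0:
        raise ValueError("audio duration must be positive")
    return time_escape_s / time_speech_s


def describe(time_escape_s: float, time_speech_s: float) -> str:
    """Format the three numbers in the same order as the description in the hunk."""
    rtf = real_time_factor(time_escape_s, time_speech_s)
    return (f"rtf_avg_per_sample: {rtf:0.3f}, "
            f"time_speech_total_per_sample: {time_speech_s:0.3f}, "
            f"time_escape_total_per_sample: {time_escape_s:0.3f}")


if __name__ == "__main__":
    # 1.5 s of compute for 6 s of audio -> RTF 0.25
    print(describe(1.5, 6.0))
```

One difference from the hunk: its `{time_speech_total_per_sample: 0.3f}` spec contains a space sign flag, so a positive value is printed with a leading space (`f"{6.0: 0.3f}"` gives `' 6.000'`); the sketch uses `0.3f` throughout.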
diff --git a/funasr/models/uniasr/template.yaml b/funasr/models/uniasr/template.yaml
index f4815c1..35c6b2e 100644
--- a/funasr/models/uniasr/template.yaml
+++ b/funasr/models/uniasr/template.yaml
@@ -18,6 +18,7 @@
     decoder_attention_chunk_type2: chunk
     loss_weight_model1: 0.5
 
+
 # encoder
 encoder: SANMEncoderChunkOpt
 encoder_conf:
@@ -34,11 +35,21 @@
     kernel_size: 11
     sanm_shfit: 0
     selfattention_layer_type: sanm
-    chunk_size: [20, 60]
-    stride: [10, 40]
-    pad_left: [5, 10]
-    encoder_att_look_back_factor: [0, 0]
-    decoder_att_look_back_factor: [0, 0]
+    chunk_size:
+    - 20
+    - 60
+    stride:
+    - 10
+    - 40
+    pad_left:
+    - 5
+    - 10
+    encoder_att_look_back_factor:
+    - 0
+    - 0
+    decoder_att_look_back_factor:
+    - 0
+    - 0
 
 # decoder
 decoder: FsmnDecoderSCAMAOpt
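In the `encoder_conf` hunk above, `chunk_size`, `stride`, and `pad_left` are parallel lists, so the i-th entries presumably describe one chunk configuration (20/10/5 and 60/40/10 here); the patch only rewrites them from flow style (`[20, 60]`) to block style, which YAML parses to the same lists. Assuming each chunk covers `pad_left` frames of left context, `stride` new frames, and `pad_right = chunk_size - stride - pad_left` frames of right context, the hypothetical helper below computes the overlapping windows for a sequence. It is a sketch of that reading, not `SANMEncoderChunkOpt`'s implementation.

```python
from typing import List, Tuple


def chunk_windows(num_frames: int, chunk_size: int, stride: int,
                  pad_left: int) -> List[Tuple[int, int, int, int]]:
    """Return (window_start, window_end, core_start, core_end) for each chunk.

    Assumption: a window holds `pad_left` frames of left context, `stride` new
    ("core") frames and `chunk_size - stride - pad_left` frames of right
    context, clipped to the sequence boundaries. End indices are exclusive.
    """
    pad_right = chunk_size - stride - pad_left
    if pad_right < 0:
        raise ValueError("chunk_size must be >= stride + pad_left")
    windows = []
    for core_start in range(0, num_frames, stride):
        core_end = min(core_start + stride, num_frames)
        win_start = max(core_start - pad_left, 0)
        win_end = min(core_end + pad_right, num_frames)
        windows.append((win_start, win_end, core_start, core_end))
    return windows


if __name__ == "__main__":
    # Pair up the parallel lists from encoder_conf.
    for cs, st, pl in zip([20, 60], [10, 40], [5, 10]):
        print(cs, st, pl, "pad_right =", cs - st - pl)
    print(chunk_windows(25, chunk_size=20, stride=10, pad_left=5))
```

Under this reading, both encoders use 5 and 10 frames of right context for their two configurations (`encoder2_conf` pairs 45/35/5 and 70/50/10).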
@@ -55,6 +66,7 @@
     kernel_size: 11
     concat_embeds: true
 
+# predictor
 predictor: CifPredictorV2
 predictor_conf:
     idim: 320
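The hunk above labels the predictor `CifPredictorV2`. CIF (continuous integrate-and-fire) predicts a weight per encoder frame, accumulates the weights, and emits one output (token) each time the running sum reaches a threshold, splitting the crossing frame's weight between the current and the next output. The hypothetical `cif_fire` below sketches only that firing rule on plain lists, assuming a threshold of 1.0 and per-frame weights no larger than the threshold; how FunASR computes the weights (for example with its `l_order`/`r_order` context) and handles leftover weight at the end of the sequence is not modeled.

```python
from typing import List, Tuple


def cif_fire(alphas: List[float], hidden: List[List[float]],
             threshold: float = 1.0) -> Tuple[List[int], List[List[float]]]:
    """Integrate per-frame weights and fire whenever the sum reaches `threshold`.

    Returns the frame indices where firing happened and, for each fired output,
    the alpha-weighted sum of the `hidden` vectors integrated into it.
    Assumes every alpha is <= threshold; leftover weight at the end is dropped.
    """
    dim = len(hidden[0]) if hidden else 0
    fired_at: List[int] = []
    outputs: List[List[float]] = []
    acc = 0.0
    frame_sum = [0.0] * dim
    for t, (alpha, h) in enumerate(zip(alphas, hidden)):
        if acc + alpha < threshold:
            # Not enough weight yet: keep integrating this frame.
            acc += alpha
            frame_sum = [s + alpha * x for s, x in zip(frame_sum, h)]
            continue
        # Fire: spend just enough of this frame's weight to reach the threshold...
        used = threshold - acc
        outputs.append([s + used * x for s, x in zip(frame_sum, h)])
        fired_at.append(t)
        # ...and carry the remainder into the next output.
        acc = alpha - used
        frame_sum = [acc * x for x in h]
    return fired_at, outputs


if __name__ == "__main__":
    fired, outs = cif_fire([0.5, 0.75, 0.25, 0.5, 1.0],
                           [[1.0], [2.0], [4.0], [8.0], [1.0]])
    print(fired, outs)
```

The example weights are exact binary fractions, so the integration is free of floating-point rounding and fires at frames 1, 3 and 4.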
@@ -62,6 +74,8 @@
     l_order: 1
     r_order: 1
 
+
+# encoder2
 encoder2: SANMEncoderChunkOpt
 encoder2_conf:
     output_size: 320
@@ -77,12 +91,23 @@
     kernel_size: 21
     sanm_shfit: 0
     selfattention_layer_type: sanm
-    chunk_size: [45, 70]
-    stride: [35, 50]
-    pad_left: [5, 10]
-    encoder_att_look_back_factor: [0, 0]
-    decoder_att_look_back_factor: [0, 0]
+    chunk_size:
+    - 45
+    - 70
+    stride:
+    - 35
+    - 50
+    pad_left:
+    - 5
+    - 10
+    encoder_att_look_back_factor:
+    - 0
+    - 0
+    decoder_att_look_back_factor:
+    - 0
+    - 0
 
+# decoder2
 decoder2: FsmnDecoderSCAMAOpt
 decoder2_conf:
     attention_dim: 320
@@ -108,10 +133,12 @@
 stride_conv_conf:
     kernel_size: 2
     stride: 2
-    pad: [0, 1]
+    pad:
+    - 0
+    - 1
 
 # frontend related
-frontend: WavFrontendOnline
+frontend: WavFrontend
 frontend_conf:
     fs: 16000
     window: hamming
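The `stride_conv_conf` change in the hunk above only rewrites `pad: [0, 1]` in block style. Reading it as 0 frames of padding on the left and 1 on the right (an assumption), a convolution with `kernel_size: 2` and `stride: 2` halves the sequence length while rounding up, so an odd-length input does not lose its last frame. The standard output-length formula, as a hypothetical helper:

```python
import math


def conv1d_out_len(length: int, kernel_size: int, stride: int,
                   pad_left: int, pad_right: int) -> int:
    """Output length of a 1-D convolution (dilation 1) with explicit left/right padding."""
    return (length + pad_left + pad_right - kernel_size) // stride + 1


if __name__ == "__main__":
    for n in range(1, 8):
        out = conv1d_out_len(n, kernel_size=2, stride=2, pad_left=0, pad_right=1)
        print(n, out, math.ceil(n / 2))
```

Without the right pad, a length-3 input would yield only one output frame instead of two.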
@@ -120,6 +147,7 @@
     frame_shift: 10
     lfr_m: 7
     lfr_n: 6
+    dither: 0.0
 
 specaug: SpecAugLFR
 specaug_conf:
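The hunk above also switches the frontend from `WavFrontendOnline` to `WavFrontend` and adds `dither: 0.0`. In Kaldi-style feature extraction, dithering adds Gaussian noise scaled by the dither value to the waveform before filterbank features are computed; setting it to 0.0 makes the features, and therefore the decoding results, repeatable. The hypothetical `apply_dither` below sketches the idea on a plain list of samples and does not reproduce `WavFrontend`.

```python
import random
from typing import List, Optional


def apply_dither(samples: List[float], dither: float,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Add Gaussian noise scaled by `dither` to each sample (Kaldi-style dithering).

    With dither == 0.0 the samples are returned unchanged, so repeated feature
    extraction on the same audio gives identical input.
    """
    if dither == 0.0:
        return list(samples)
    rng = rng or random.Random()
    return [x + dither * rng.gauss(0.0, 1.0) for x in samples]


if __name__ == "__main__":
    wav = [0.0, 0.25, -0.5, 1.0]
    print(apply_dither(wav, 0.0) == apply_dither(wav, 0.0))  # True: deterministic
    print(apply_dither(wav, 1.0, random.Random(0)))
```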
diff --git a/model_zoo/modelscope_models_zh.md b/model_zoo/modelscope_models_zh.md
index 88fa23e..1f501a3 100644
--- a/model_zoo/modelscope_models_zh.md
+++ b/model_zoo/modelscope_models_zh.md
@@ -33,26 +33,26 @@
 
 #### UniASR Models
 
-| Model Name | Language | Training Data | Vocab Size | Parameter | Non-streaming/Streaming | Notes |
-|:-------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
-| [UniASR](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-实时/summary) | Chinese and English | Alibaba speech data (60000 hours) | 8358 | 100M | Streaming | Unified streaming and offline model |
-| [UniASR-large](https://modelscope.cn/models/damo/speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-非实时/summary) | Chinese and English | Alibaba speech data (60000 hours) | 8358 | 220M | Non-streaming | Unified streaming and offline model |
-| [UniASR English](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-实时/summary) | English | Alibaba speech data (10000 hours) | 1080 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Russian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-实时/summary) | Russian | Alibaba speech data (5000 hours) | 1664 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Japanese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-实时/summary) | Japanese | Alibaba speech data (5000 hours) | 5977 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Korean](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-实时/summary) | Korean | Alibaba speech data (2000 hours) | 6400 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Cantonese (CHS)](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-实时/summary) | Cantonese (Simplified Chinese) | Alibaba speech data (5000 hours) | 1468 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Indonesian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-实时/summary) | Indonesian | Alibaba speech data (1000 hours) | 1067 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Vietnamese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-实时/summary) | Vietnamese | Alibaba speech data (1000 hours) | 1001 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Spanish](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-实时/summary) | Spanish | Alibaba speech data (1000 hours) | 3445 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Portuguese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-实时/summary) | Portuguese | Alibaba speech data (1000 hours) | 1617 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR French](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-实时/summary) | French | Alibaba speech data (1000 hours) | 3472 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR German](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-实时/summary) | German | Alibaba speech data (1000 hours) | 3690 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Persian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-实时/summary) | Persian | Alibaba speech data (1000 hours) | 1257 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Burmese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/summary) | Burmese | Alibaba speech data (1000 hours) | 696 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Hebrew](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/summary) | Hebrew | Alibaba speech data (1000 hours) | 1085 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Urdu](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/summary) | Urdu | Alibaba speech data (1000 hours) | 877 | 95M | Streaming | Unified streaming and offline model |
-| [UniASR Turkish](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-tr-16k-common-vocab1582-pytorch/summary) | Turkish | Alibaba speech data (1000 hours) | 1582 | 95M | Streaming | Unified streaming and offline model |
+| Model Name | Language | Training Data | Vocab Size | Parameter | Non-streaming/Streaming | Notes |
+|:---------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
+| [UniASR](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/summary) | Chinese and English | Alibaba speech data (60000 hours) | 8358 | 100M | Streaming | Unified streaming and offline model |
+| [UniASR-large](https://modelscope.cn/models/damo/speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/summary) | Chinese and English | Alibaba speech data (60000 hours) | 8358 | 220M | Non-streaming | Unified streaming and offline model |
+| [UniASR English](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-online/summary) | English | Alibaba speech data (10000 hours) | 1080 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Russian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-online/summary) | Russian | Alibaba speech data (5000 hours) | 1664 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Japanese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-online/summary) | Japanese | Alibaba speech data (5000 hours) | 5977 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Korean](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-online/summary) | Korean | Alibaba speech data (2000 hours) | 6400 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Cantonese (CHS)](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/summary) | Cantonese (Simplified Chinese) | Alibaba speech data (5000 hours) | 1468 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Indonesian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-online/summary) | Indonesian | Alibaba speech data (1000 hours) | 1067 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Vietnamese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/summary) | Vietnamese | Alibaba speech data (1000 hours) | 1001 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Spanish](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-online/summary) | Spanish | Alibaba speech data (1000 hours) | 3445 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Portuguese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-online/summary) | Portuguese | Alibaba speech data (1000 hours) | 1617 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR French](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/summary) | French | Alibaba speech data (1000 hours) | 3472 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR German](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/summary) | German | Alibaba speech data (1000 hours) | 3690 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Persian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/summary) | Persian | Alibaba speech data (1000 hours) | 1257 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Burmese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/summary) | Burmese | Alibaba speech data (1000 hours) | 696 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Hebrew](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/summary) | Hebrew | Alibaba speech data (1000 hours) | 1085 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Urdu](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/summary) | Urdu | Alibaba speech data (1000 hours) | 877 | 95M | Streaming | Unified streaming and offline model |
+| [UniASR Turkish](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-tr-16k-common-vocab1582-pytorch/summary) | Turkish | Alibaba speech data (1000 hours) | 1582 | 95M | Streaming | Unified streaming and offline model |
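Most of this hunk repairs ModelScope links whose `online`/`offline` model-ID suffix had been replaced by the Chinese words 实时/非实时 ("streaming"/"non-streaming"), which produced broken URLs. A lint check along the lines of the hypothetical `bad_links` below (a regular expression over the Markdown that flags link targets which are not pure ASCII or do not end in `/summary`) would catch that kind of regression; it is a sketch, not part of FunASR.

```python
import re
from typing import List

# Matches Markdown link targets of the form (https://modelscope.cn/models/...).
LINK_RE = re.compile(r"\]\((https://(?:www\.)?modelscope\.cn/models/[^)\s]+)\)")


def bad_links(markdown: str) -> List[str]:
    """Return ModelScope link targets that are not ASCII or do not end in /summary."""
    bad = []
    for url in LINK_RE.findall(markdown):
        if not url.isascii() or not url.endswith("/summary"):
            bad.append(url)
    return bad


if __name__ == "__main__":
    old = "[UniASR](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-实时/summary)"
    new = "[UniASR](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/summary)"
    print(bad_links(old))  # the 实时 link is flagged
    print(bad_links(new))  # []
```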
 
 
 #### Conformer Models
@@ -115,7 +115,7 @@
 
 |                                                    Model Name                                     |  Language  |    Training Data    | Parameters | Notes       |
 |:--------------------------------------------------------------------------------------------------:|:--------------:|:-------------------:|:----------:|:---------|
-| [TP-Aligner](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-非实时/summary) | Chinese | Alibaba speech data (50000 hours) |   37.8M    | Timestamp model, Chinese |
+| [TP-Aligner](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) | Chinese | Alibaba speech data (50000 hours) |   37.8M    | Timestamp model, Chinese |
 
 ### Inverse Text Normalization
 

--
Gitblit v1.9.1