游雁
2024-01-23 9c9e02b2a4c54b8f9d1c198e0708ae1803adbd4c
funasr1.0
27 files modified
2 files added
189 lines changed
README.md | 16
README_zh.md | 14
examples/industrial_data_pretraining/bicif_paraformer/demo.py | 8
examples/industrial_data_pretraining/bicif_paraformer/infer.sh | 4
examples/industrial_data_pretraining/contextual_paraformer/demo.py | 2
examples/industrial_data_pretraining/contextual_paraformer/infer.sh | 2
examples/industrial_data_pretraining/ct_transformer/demo.py | 4
examples/industrial_data_pretraining/ct_transformer/infer.sh | 6
examples/industrial_data_pretraining/ct_transformer_streaming/demo.py | 3
examples/industrial_data_pretraining/ct_transformer_streaming/infer.sh | 2
examples/industrial_data_pretraining/emotion2vec/demo.py | 2
examples/industrial_data_pretraining/emotion2vec/infer.sh | 2
examples/industrial_data_pretraining/fsmn_vad_streaming/demo.py | 2
examples/industrial_data_pretraining/fsmn_vad_streaming/infer.sh | 2
examples/industrial_data_pretraining/monotonic_aligner/demo.py | 2
examples/industrial_data_pretraining/monotonic_aligner/infer.sh | 2
examples/industrial_data_pretraining/paraformer-zh-spk/demo.py | 6
examples/industrial_data_pretraining/paraformer-zh-spk/infer.sh | 6
examples/industrial_data_pretraining/paraformer/demo.py | 8
examples/industrial_data_pretraining/paraformer/finetune.sh | 2
examples/industrial_data_pretraining/paraformer/infer.sh | 2
examples/industrial_data_pretraining/paraformer_streaming/demo.py | 8
examples/industrial_data_pretraining/paraformer_streaming/infer.sh | 2
examples/industrial_data_pretraining/scama/demo.py | 2
examples/industrial_data_pretraining/scama/infer.sh | 2
examples/industrial_data_pretraining/seaco_paraformer/demo.py | 12
examples/industrial_data_pretraining/seaco_paraformer/infer.sh | 6
model_zoo/readme.md | 32
model_zoo/readme_zh.md | 28
README.md
@@ -91,9 +91,9 @@
 from funasr import AutoModel
 # paraformer-zh is a multi-functional asr model
 # use vad, punc, spk or not as you need
-model = AutoModel(model="paraformer-zh", model_revision="v2.0.2",
-                  vad_model="fsmn-vad", vad_model_revision="v2.0.2",
-                  punc_model="ct-punc-c", punc_model_revision="v2.0.3",
+model = AutoModel(model="paraformer-zh", model_revision="v2.0.4",
+                  vad_model="fsmn-vad", vad_model_revision="v2.0.4",
+                  punc_model="ct-punc-c", punc_model_revision="v2.0.4",
                   # spk_model="cam++", spk_model_revision="v2.0.2",
                   )
 res = model.generate(input=f"{model.model_path}/example/asr_example.wav", 
@@ -111,7 +111,7 @@
 encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention
 decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention
-model = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.2")
+model = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.4")
 import soundfile
 import os
@@ -134,7 +134,7 @@
 ```python
 from funasr import AutoModel
-model = AutoModel(model="fsmn-vad", model_revision="v2.0.2")
+model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
 wav_file = f"{model.model_path}/example/asr_example.wav"
 res = model.generate(input=wav_file)
 print(res)
@@ -144,7 +144,7 @@
 from funasr import AutoModel
 chunk_size = 200 # ms
-model = AutoModel(model="fsmn-vad", model_revision="v2.0.2")
+model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
 import soundfile
@@ -165,7 +165,7 @@
 ```python
 from funasr import AutoModel
-model = AutoModel(model="ct-punc", model_revision="v2.0.2")
+model = AutoModel(model="ct-punc", model_revision="v2.0.4")
 res = model.generate(input="那今天的会就到这里吧 happy new year 明年见")
 print(res)
 ```
@@ -173,7 +173,7 @@
 ```python
 from funasr import AutoModel
-model = AutoModel(model="fa-zh", model_revision="v2.0.2")
+model = AutoModel(model="fa-zh", model_revision="v2.0.4")
 wav_file = f"{model.model_path}/example/asr_example.wav"
 text_file = f"{model.model_path}/example/text.txt"
 res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
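The streaming hunks above rely on FunASR's chunk convention: `chunk_size = [0, 10, 5]` means 600 ms chunks (10 units of 60 ms, i.e. a stride of 10 × 960 samples at 16 kHz). A minimal sketch of the slicing loop those demos elide, with a synthetic waveform standing in for `soundfile` input (only the stride arithmetic is taken from the demos; the rest is illustrative):

```python
# Sketch of the chunked feed used by the paraformer-zh-streaming demo.
# Assumption: 16 kHz audio, 60 ms per frame unit -> 960 samples per unit.
chunk_size = [0, 10, 5]          # same convention as the demo: 600 ms chunks
chunk_stride = chunk_size[1] * 960

speech = [0.0] * 32000           # 2 s of silence standing in for asr_example.wav
total_chunk_num = (len(speech) - 1) // chunk_stride + 1

chunks = []
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    chunks.append((len(speech_chunk), is_final))
    # in the real demo this chunk is passed to:
    # model.generate(input=speech_chunk, cache=cache, is_final=is_final,
    #                chunk_size=chunk_size, ...)

print(total_chunk_num)   # 32000 samples / 9600-sample stride -> 4 chunks
print(chunks[-1])        # the trailing chunk is short and flagged final
```

The trailing partial chunk is fed as-is with `is_final=True`, which is why the demos compute `total_chunk_num` from the waveform length rather than dropping the remainder.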
README_zh.md
@@ -87,9 +87,9 @@
 from funasr import AutoModel
 # paraformer-zh is a multi-functional asr model
 # use vad, punc, spk or not as you need
-model = AutoModel(model="paraformer-zh", model_revision="v2.0.2",
-                  vad_model="fsmn-vad", vad_model_revision="v2.0.2",
-                  punc_model="ct-punc-c", punc_model_revision="v2.0.3",
+model = AutoModel(model="paraformer-zh", model_revision="v2.0.4",
+                  vad_model="fsmn-vad", vad_model_revision="v2.0.4",
+                  punc_model="ct-punc-c", punc_model_revision="v2.0.4",
                   # spk_model="cam++", spk_model_revision="v2.0.2",
                   )
 res = model.generate(input=f"{model.model_path}/example/asr_example.wav", 
@@ -108,7 +108,7 @@
 encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention
 decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention
-model = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.2")
+model = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.4")
 import soundfile
 import os
@@ -132,7 +132,7 @@
 ```python
 from funasr import AutoModel
-model = AutoModel(model="fsmn-vad", model_revision="v2.0.2")
+model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
 wav_file = f"{model.model_path}/example/asr_example.wav"
 res = model.generate(input=wav_file)
@@ -144,7 +144,7 @@
 from funasr import AutoModel
 chunk_size = 200 # ms
-model = AutoModel(model="fsmn-vad", model_revision="v2.0.2")
+model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
 import soundfile
@@ -166,7 +166,7 @@
 ```python
 from funasr import AutoModel
-model = AutoModel(model="ct-punc", model_revision="v2.0.2")
+model = AutoModel(model="ct-punc", model_revision="v2.0.4")
 res = model.generate(input="那今天的会就到这里吧 happy new year 明年见")
 print(res)
examples/industrial_data_pretraining/bicif_paraformer/demo.py
@@ -6,13 +6,13 @@
 from funasr import AutoModel
 model = AutoModel(model="damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
-                  model_revision="v2.0.2",
+                  model_revision="v2.0.4",
                   vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
-                  vad_model_revision="v2.0.2",
+                  vad_model_revision="v2.0.4",
                   punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
-                  punc_model_revision="v2.0.3",
+                  punc_model_revision="v2.0.4",
                   spk_model="damo/speech_campplus_sv_zh-cn_16k-common",
-                  spk_model_revision="v2.0.2",
+                  spk_model_revision="v2.0.4",
                   )
 res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_vad_punc_example.wav", batch_size_s=300, batch_size_threshold_s=60)
examples/industrial_data_pretraining/bicif_paraformer/infer.sh
@@ -1,8 +1,8 @@
 model="damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch"
-vad_model_revision="v2.0.2"
+vad_model_revision="v2.0.4"
 punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"
 punc_model_revision="v2.0.3"
 spk_model="damo/speech_campplus_sv_zh-cn_16k-common"
examples/industrial_data_pretraining/contextual_paraformer/demo.py
@@ -5,7 +5,7 @@
 from funasr import AutoModel
-model = AutoModel(model="damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404", model_revision="v2.0.2")
+model = AutoModel(model="damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404", model_revision="v2.0.4")
 res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
             hotword='达摩院 魔搭')
examples/industrial_data_pretraining/contextual_paraformer/infer.sh
@@ -1,6 +1,6 @@
 model="damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
examples/industrial_data_pretraining/ct_transformer/demo.py
@@ -5,7 +5,7 @@
 from funasr import AutoModel
-model = AutoModel(model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch", model_revision="v2.0.2")
+model = AutoModel(model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch", model_revision="v2.0.4")
 res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt")
 print(res)
@@ -13,7 +13,7 @@
 from funasr import AutoModel
-model = AutoModel(model="damo/punc_ct-transformer_cn-en-common-vocab471067-large", model_revision="v2.0.2")
+model = AutoModel(model="damo/punc_ct-transformer_cn-en-common-vocab471067-large", model_revision="v2.0.4")
 res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt")
 print(res)
examples/industrial_data_pretraining/ct_transformer/infer.sh
@@ -1,9 +1,9 @@
-model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"
-model_revision="v2.0.2"
+#model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"
+#model_revision="v2.0.4"
 model="damo/punc_ct-transformer_cn-en-common-vocab471067-large"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
examples/industrial_data_pretraining/ct_transformer_streaming/demo.py
@@ -5,7 +5,7 @@
 from funasr import AutoModel
-model = AutoModel(model="damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727", model_revision="v2.0.1")
+model = AutoModel(model="damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727", model_revision="v2.0.4")
 inputs = "跨境河流是养育沿岸|人民的生命之源长期以来为帮助下游地区防灾减灾中方技术人员|在上游地区极为恶劣的自然条件下克服巨大困难甚至冒着生命危险|向印方提供汛期水文资料处理紧急事件中方重视印方在跨境河流问题上的关切|愿意进一步完善双方联合工作机制|凡是|中方能做的我们|都会去做而且会做得更好我请印度朋友们放心中国在上游的|任何开发利用都会经过科学|规划和论证兼顾上下游的利益"
 vads = inputs.split("|")
@@ -13,7 +13,6 @@
 cache = {}
 for vad in vads:
     rec_result = model.generate(input=vad, cache=cache)
-    print(rec_result)
     rec_result_all += rec_result[0]['text']
 print(rec_result_all)
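The loop above feeds VAD segments to the punctuation model one at a time, threading a `cache` dict through `generate()` so state persists across segments. A self-contained sketch of that accumulation pattern, with a trivial stub (invented for illustration) standing in for the CT-Transformer model:

```python
# Sketch of the incremental-punctuation loop from the demo above.
# generate_stub is NOT the real model: it just appends a comma and
# counts calls in the cache, to show how state is threaded through.
def generate_stub(text, cache):
    cache["n_calls"] = cache.get("n_calls", 0) + 1
    return [{"text": text + ","}]

inputs = "跨境河流是养育沿岸|人民的生命之源"
vads = inputs.split("|")

rec_result_all = ""
cache = {}                       # persists stub/model state across segments
for vad in vads:
    rec_result = generate_stub(vad, cache)
    rec_result_all += rec_result[0]["text"]

print(rec_result_all)
```

With the real model, the punctuation inserted for each segment can depend on what the cache saw earlier, which is why the demo never resets `cache` inside the loop.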
examples/industrial_data_pretraining/ct_transformer_streaming/infer.sh
@@ -1,6 +1,6 @@
 model="damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727"
-model_revision="v2.0.1"
+model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
examples/industrial_data_pretraining/emotion2vec/demo.py
@@ -5,7 +5,7 @@
 from funasr import AutoModel
-model = AutoModel(model="damo/emotion2vec_base", model_revision="v2.0.1")
+model = AutoModel(model="damo/emotion2vec_base", model_revision="v2.0.4")
 wav_file = f"{model.model_path}/example/test.wav"
 res = model.generate(wav_file, output_dir="./outputs", granularity="utterance")
examples/industrial_data_pretraining/emotion2vec/infer.sh
@@ -1,6 +1,6 @@
 model="damo/emotion2vec_base"
-model_revision="v2.0.0"
+model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
examples/industrial_data_pretraining/fsmn_vad_streaming/demo.py
@@ -7,7 +7,7 @@
 wav_file = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav"
 chunk_size = 60000 # ms
-model = AutoModel(model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch", model_revision="v2.0.2")
+model = AutoModel(model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch", model_revision="v2.0.4")
 res = model.generate(input=wav_file, chunk_size=chunk_size, )
 print(res)
examples/industrial_data_pretraining/fsmn_vad_streaming/infer.sh
@@ -1,7 +1,7 @@
 model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
examples/industrial_data_pretraining/monotonic_aligner/demo.py
@@ -5,7 +5,7 @@
 from funasr import AutoModel
-model = AutoModel(model="damo/speech_timestamp_prediction-v1-16k-offline", model_revision="v2.0.2")
+model = AutoModel(model="damo/speech_timestamp_prediction-v1-16k-offline", model_revision="v2.0.4")
 res = model.generate(input=("https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
                    "欢迎大家来到魔搭社区进行体验"),
examples/industrial_data_pretraining/monotonic_aligner/infer.sh
@@ -1,6 +1,6 @@
 model="damo/speech_timestamp_prediction-v1-16k-offline"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
examples/industrial_data_pretraining/paraformer-zh-spk/demo.py
@@ -6,11 +6,11 @@
 from funasr import AutoModel
 model = AutoModel(model="damo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
-                  model_revision="v2.0.2",
+                  model_revision="v2.0.4",
                   vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
-                  vad_model_revision="v2.0.2",
+                  vad_model_revision="v2.0.4",
                   punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
-                  punc_model_revision="v2.0.3",
+                  punc_model_revision="v2.0.4",
                   spk_model="damo/speech_campplus_sv_zh-cn_16k-common",
                   spk_model_revision="v2.0.2"
                   )
examples/industrial_data_pretraining/paraformer-zh-spk/infer.sh
@@ -1,10 +1,10 @@
 model="damo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch"
-vad_model_revision="v2.0.2"
+vad_model_revision="v2.0.4"
 punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"
-punc_model_revision="v2.0.3"
+punc_model_revision="v2.0.4"
 spk_model="damo/speech_campplus_sv_zh-cn_16k-common"
 spk_model_revision="v2.0.2"
examples/industrial_data_pretraining/paraformer/demo.py
@@ -5,11 +5,11 @@
 from funasr import AutoModel
-model = AutoModel(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch", model_revision="v2.0.3",
+model = AutoModel(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch", model_revision="v2.0.4",
                   # vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
-                  # vad_model_revision="v2.0.2",
+                  # vad_model_revision="v2.0.4",
                   # punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
-                  # punc_model_revision="v2.0.3",
+                  # punc_model_revision="v2.0.4",
                   )
 res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav")
@@ -19,7 +19,7 @@
 ''' can not use currently
 from funasr import AutoFrontend
-frontend = AutoFrontend(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch", model_revision="v2.0.2")
+frontend = AutoFrontend(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch", model_revision="v2.0.4")
 fbanks = frontend(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav", batch_size=2)
examples/industrial_data_pretraining/paraformer/finetune.sh
@@ -8,7 +8,7 @@
 python funasr/bin/train.py \
 +model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
-+model_revision="v2.0.2" \
++model_revision="v2.0.4" \
 +train_data_set_list="/Users/zhifu/funasr_github/test_local/aishell2_dev_ios/asr_task_debug_len_10.jsonl" \
 +valid_data_set_list="/Users/zhifu/funasr_github/test_local/aishell2_dev_ios/asr_task_debug_len_10.jsonl" \
 ++dataset_conf.batch_size=64 \
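The finetune script above passes Hydra-style command-line overrides: `+key=value` adds a top-level key, and a dotted key such as `++dataset_conf.batch_size=64` targets a nested config entry. A rough sketch of how such dotted overrides fold into a nested config dict (a simplification for illustration, not FunASR's actual config loader):

```python
# Simplified sketch: fold "+key=value" / "++dotted.key=value" overrides
# into a nested dict. NOT FunASR's real loader; integers are the only
# typed values handled here, everything else stays a string.
def apply_overrides(args):
    conf = {}
    for arg in args:
        key, value = arg.lstrip("+").split("=", 1)
        node = conf
        *parents, leaf = key.split(".")
        for p in parents:                 # walk/create the nested path
            node = node.setdefault(p, {})
        node[leaf] = int(value) if value.isdigit() else value
    return conf

conf = apply_overrides([
    '+model_revision=v2.0.4',
    '++dataset_conf.batch_size=64',
])
print(conf)  # {'model_revision': 'v2.0.4', 'dataset_conf': {'batch_size': 64}}
```

This is why the commit only has to touch the `model_revision="v2.0.4"` lines: each override is an independent key, and the rest of the training config is untouched.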
examples/industrial_data_pretraining/paraformer/infer.sh
@@ -1,6 +1,6 @@
 model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
examples/industrial_data_pretraining/paraformer_streaming/demo.py
@@ -5,11 +5,11 @@
 from funasr import AutoModel
-chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms
-encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention
-decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention
-model = AutoModel(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online", model_revision="v2.0.2")
+chunk_size = [5, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms
+encoder_chunk_look_back = 0 #number of chunks to lookback for encoder self-attention
+decoder_chunk_look_back = 0 #number of encoder chunks to lookback for decoder cross-attention
+model = AutoModel(model="damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online", model_revision="v2.0.4")
 res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
             chunk_size=chunk_size,
             encoder_chunk_look_back=encoder_chunk_look_back,
examples/industrial_data_pretraining/paraformer_streaming/infer.sh
@@ -1,6 +1,6 @@
 model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
examples/industrial_data_pretraining/scama/demo.py
@@ -9,7 +9,7 @@
 encoder_chunk_look_back = 0 #number of chunks to lookback for encoder self-attention
 decoder_chunk_look_back = 0 #number of encoder chunks to lookback for decoder cross-attention
-model = AutoModel(model="/Users/zhifu/Downloads/modelscope_models/speech_SCAMA_asr-zh-cn-16k-common-vocab8358-streaming", model_revision="v2.0.2")
+model = AutoModel(model="/Users/zhifu/Downloads/modelscope_models/speech_SCAMA_asr-zh-cn-16k-common-vocab8358-streaming", model_revision="v2.0.4")
 cache = {}
 res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
             chunk_size=chunk_size,
examples/industrial_data_pretraining/scama/infer.sh
@@ -1,6 +1,6 @@
 model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
examples/industrial_data_pretraining/seaco_paraformer/demo.py
@@ -5,14 +5,14 @@
 from funasr import AutoModel
-model = AutoModel(model="damo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
-                  model_revision="v2.0.2",
+model = AutoModel(model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
+                  model_revision="v2.0.4",
                   vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
-                  vad_model_revision="v2.0.2",
+                  vad_model_revision="v2.0.4",
                   punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
-                  punc_model_revision="v2.0.3",
-                  spk_model="damo/speech_campplus_sv_zh-cn_16k-common",
-                  spk_model_revision="v2.0.2",
+                  punc_model_revision="v2.0.4",
+                  # spk_model="damo/speech_campplus_sv_zh-cn_16k-common",
+                  # spk_model_revision="v2.0.2",
                   )
 res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
examples/industrial_data_pretraining/seaco_paraformer/infer.sh
@@ -1,10 +1,10 @@
 model="damo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-model_revision="v2.0.2"
+model_revision="v2.0.4"
 vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch"
-vad_model_revision="v2.0.2"
+vad_model_revision="v2.0.4"
 punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"
-punc_model_revision="v2.0.3"
+punc_model_revision="v2.0.4"
 python funasr/bin/inference.py \
 +model=${model} \
model_zoo/readme.md
New file
@@ -0,0 +1,32 @@
+([简体中文](./readme_zh.md)|English)
+# Model Zoo
+## Model License
+You are free to use, copy, modify, and share FunASR models under the conditions of this agreement. You should indicate the model source and author information when using, copying, modifying, and sharing FunASR models, and you should keep the relevant model names used in [FunASR software]. The full model license can be found in the [license](https://github.com/alibaba-damo-academy/FunASR/blob/main/MODEL_LICENSE).
+## Model Usage
+Refer to the [docs](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html).
+## Model Zoo
+Here we provide several models pretrained on different datasets. Details of the models and datasets can be found on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition).
+### Speech Recognition
+#### Paraformer
+FunASR has open-sourced a large number of models pretrained on industrial data. You are free to use, copy, modify, and share FunASR models under the [Model License Agreement](./MODEL_LICENSE). Below are some representative models; for more models, please refer to the [Model Zoo]().
+(Note: 🤗 denotes the Huggingface model zoo link, ⭐ denotes the ModelScope model zoo link)
+| Model Name | Task Details | Training Data | Parameters |
+|:---:|:---:|:---:|:---:|
+| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗]() ) | speech recognition, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
+| paraformer-zh-spk <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)  [🤗]() ) | speech recognition with speaker diarization, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
+| <nobr>paraformer-zh-online <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() )</nobr> | speech recognition, streaming | 60000 hours, Mandarin | 220M |
+| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | speech recognition, with timestamps, non-streaming | 50000 hours, English | 220M |
+| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | speech recognition, non-streaming | 50000 hours, English | 220M |
+| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | punctuation restoration | 100M, Mandarin and English | 1.1G |
+| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | voice activity detection | 5000 hours, Mandarin and English | 0.4M |
+| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | timestamp prediction | 5000 hours, Mandarin | 38M |
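All of the hunks in this commit make the same mechanical change: bump `model_revision` to v2.0.4 for each model named in the zoo above. One way to avoid that kind of scattered edit is to keep the pinned revisions in a single table and expand them into `AutoModel` keyword arguments; a hypothetical helper sketch (`REVISIONS` and `build_kwargs` are not part of FunASR):

```python
# Hypothetical helper: keep pinned ModelScope revisions in one place and
# expand them into kwargs for AutoModel(**kwargs). Not part of FunASR.
REVISIONS = {
    "paraformer-zh": "v2.0.4",
    "fsmn-vad": "v2.0.4",
    "ct-punc-c": "v2.0.4",
}

def build_kwargs(model, vad_model=None, punc_model=None):
    kwargs = {"model": model, "model_revision": REVISIONS[model]}
    if vad_model:
        kwargs["vad_model"] = vad_model
        kwargs["vad_model_revision"] = REVISIONS[vad_model]
    if punc_model:
        kwargs["punc_model"] = punc_model
        kwargs["punc_model_revision"] = REVISIONS[punc_model]
    return kwargs

kw = build_kwargs("paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc-c")
# with funasr installed, this would be: model = AutoModel(**kw)
print(kw["model_revision"])
```

A future revision bump then touches only the `REVISIONS` table instead of 29 files.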
model_zoo/readme_zh.md
New file
@@ -0,0 +1,28 @@
+(简体中文|[English](./readme.md))
+# Model Zoo
+## Model License
+You are free to use, copy, modify, and share FunASR models under the conditions of this agreement. You should indicate the model source and author information when using, copying, modifying, and sharing FunASR models, and you should keep the relevant model names used in [FunASR software]. The full model license can be found in the [Model License Agreement](https://github.com/alibaba-damo-academy/FunASR/blob/main/MODEL_LICENSE).
+## Model Usage
+Refer to the [docs](funasr/quick_start_zh.md) for model usage.
+## Model Zoo
+Here we provide models pretrained on different datasets. Details of the models and datasets can be found on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition).
+### Speech Recognition Models
+#### Paraformer Models
+(Note: [🤗]() denotes the Huggingface model zoo link, [⭐]() denotes the ModelScope model zoo link)
+| Model Name | Task Details | Training Data | Parameters |
+|:---:|:---:|:---:|:---:|
+| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗]() ) | speech recognition, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
+| SeACoParaformer-zh <br> ( [⭐](https://www.modelscope.cn/models/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗]() ) | speech recognition with hotword customization, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
+| paraformer-zh-spk <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)  [🤗]() ) | speech recognition with speaker diarization, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
+| paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | speech recognition, streaming | 60000 hours, Mandarin | 220M |
+| paraformer-zh-streaming-small <br> ( [⭐](https://www.modelscope.cn/models/iic/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | speech recognition, streaming | 60000 hours, Mandarin | 220M |
+| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | speech recognition, non-streaming | 50000 hours, English | 220M |