From 32905d8cdedd53dad26680b0bd41397aaf0e51ae Mon Sep 17 00:00:00 2001
From: 游雁 <zhifu.gzf@alibaba-inc.com>
Date: Fri, 05 Jan 2024 11:52:48 +0800
Subject: [PATCH] funasr1.0

---
 README.md | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index 3f6b434..db0fdcb 100644
--- a/README.md
+++ b/README.md
@@ -50,17 +50,17 @@
 (Note: 🤗 represents the Huggingface model zoo link, ⭐ represents the ModelScope model zoo link)
 
-| Model Name | Task Details | Training Data | Parameters |
-|:----------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------:|:--------------------------------:|:----------:|
-| <nobr>paraformer-zh ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗]() )</nobr> | speech recognition, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
-| <nobr>paraformer-zh-spk ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) [🤗]() )</nobr> | speech recognition with speaker diarization, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
-| <nobr>paraformer-zh-online ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() )</nobr> | speech recognition, non-streaming | 60000 hours, Mandarin | 220M |
-| <nobr>paraformer-en ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() )</nobr> | speech recognition, with timestamps, non-streaming | 50000 hours, English | 220M |
-| <nobr>paraformer-en-spk ([🤗]() [⭐]() )</nobr> | speech recognition with speaker diarization, non-streaming | 50000 hours, English | 220M |
-| <nobr>conformer-en ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() )</nobr> | speech recognition, non-streaming | 50000 hours, English | 220M |
-| <nobr>ct-punc ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() )</nobr> | punctuation restoration | 100M, Mandarin and English | 1.1G |
-| <nobr>fsmn-vad ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() )</nobr> | voice activity detection | 5000 hours, Mandarin and English | 0.4M |
-| <nobr>fa-zh ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() )</nobr> | timestamp prediction | 5000 hours, Mandarin | 38M |
+| Model Name | Task Details | Training Data | Parameters |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------:|:--------------------------------:|:----------:|
+| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗]() ) | speech recognition, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
+| paraformer-zh-spk <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) [🤗]() ) | speech recognition with speaker diarization, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
+| <nobr>paraformer-zh-online <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() )</nobr> | speech recognition, streaming | 60000 hours, Mandarin | 220M |
+| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | speech recognition, with timestamps, non-streaming | 50000 hours, English | 220M |
+| paraformer-en-spk <br> ([⭐]() [🤗]() ) | speech recognition with speaker diarization, non-streaming | TBD | TBD |
+| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | speech recognition, non-streaming | 50000 hours, English | 220M |
+| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | punctuation restoration | 100M, Mandarin and English | 1.1G |
+| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | voice activity detection | 5000 hours, Mandarin and English | 0.4M |
+| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | timestamp prediction | 5000 hours, Mandarin | 38M |
@@ -76,6 +76,15 @@
 FunASR supports inference and fine-tuning of models trained on tens of thousands of hours of industrial data. For more details, please refer to [modelscope_egs](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html). It also supports training and fine-tuning of models on academic standard datasets. For more information, please refer to [egs](https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html).
 
 Below is a quick start tutorial. Test audio files ([Mandarin](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav), [English]()).
+
+### Command-line usage
+
+```shell
+funasr --model paraformer-zh asr_example_zh.wav
+```
+
+Note: this supports recognition of a single audio file, as well as a file list in Kaldi-style wav.scp format: `wav_id wav_path`
+
 ### Speech Recognition (Non-streaming)
 ```python
 from funasr import infer
@@ -138,13 +147,13 @@
 ## Contributors
 
-| <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div> | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> | <img src="docs/images/XVERSE.png" width="250"/> </div> |
-|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|
+| <div align="left"><img src="docs/images/alibaba.png" width="260"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div> | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> | <img src="docs/images/XVERSE.png" width="250"/> </div> |
+|:------------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|
 
 The contributors can be found in [contributors list](./Acknowledge.md)
 
 ## License
 
-This project is licensed under the [The MIT License](https://opensource.org/licenses/MIT). FunASR also contains various third-party components and some code modified from other repos under other open source licenses.
+This project is licensed under [The MIT License](https://opensource.org/licenses/MIT). FunASR also contains various third-party components and some code modified from other repos under other open source licenses.
 
 The use of pretrained models is subject to the [model license](./MODEL_LICENSE)
-- 
Gitblit v1.9.1
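Editor's aside on the Kaldi-style `wav.scp` list accepted by the `funasr` command added in this patch: each line holds a `wav_id` followed by a `wav_path`. The sketch below only illustrates that file format; `parse_wav_scp` is a hypothetical helper written for this note, not part of the FunASR API.

```python
# Minimal sketch of the Kaldi-style wav.scp format ("wav_id wav_path" per line).
# parse_wav_scp is a hypothetical illustration, not a FunASR function.
def parse_wav_scp(text: str) -> dict:
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The id is the first whitespace-separated token; the rest is the path.
        wav_id, wav_path = line.split(maxsplit=1)
        entries[wav_id] = wav_path
    return entries

scp = "utt1 /data/asr_example_zh.wav\nutt2 /data/vad_example.wav\n"
print(parse_wav_scp(scp))
# → {'utt1': '/data/asr_example_zh.wav', 'utt2': '/data/vad_example.wav'}
```

Splitting on the first whitespace only means paths containing spaces survive intact, which matches the one-id-then-path layout the CLI note describes.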