From ba3a3bf4e67e861b833092d05d7c3842ea670cbc Mon Sep 17 00:00:00 2001
From: zhifu gao <zhifu.gzf@alibaba-inc.com>
Date: 星期二, 28 五月 2024 14:53:18 +0800
Subject: [PATCH] Add files via upload

---
 README.md |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 95529c6..e02b3e2 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,7 @@
 | [**News**](https://github.com/alibaba-damo-academy/FunASR#whats-new) 
 | [**Installation**](#installation)
 | [**Quick Start**](#quick-start)
+| [**Tutorial**](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/tutorial/README.md)
 | [**Runtime**](./runtime/readme.md)
 | [**Model Zoo**](#model-zoo)
 | [**Contact**](#contact)
@@ -27,6 +28,8 @@
 
 <a name="whats-new"></a>
 ## What's new:
+- 2024/05/15锛歟motion recognition models are new supported. [emotion2vec+large](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary)锛孾emotion2vec+base](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary)锛孾emotion2vec+seed](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary). currently supports the following categories: 0: angry 1: happy 2: neutral 3: sad 4: unknown.
+- 2024/05/15: Offline File Transcription Service 4.5, Offline File Transcription Service of English 1.6锛孯eal-time Transcription Service 1.10 released锛宎dapting to FunASR 1.0 model structure锛�([docs](runtime/readme.md))
 - 2024/03/05锛欰dded the Qwen-Audio and Qwen-Audio-Chat large-scale audio-text multimodal models, which have topped multiple audio domain leaderboards. These models support speech dialogue, [usage](examples/industrial_data_pretraining/qwen_audio).
 - 2024/03/05锛欰dded support for the Whisper-large-v3 model, a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. It can be downloaded from the[modelscope](examples/industrial_data_pretraining/whisper/demo.py), and [openai](examples/industrial_data_pretraining/whisper/demo_from_openai.py).
 - 2024/03/05: Offline File Transcription Service 4.4, Offline File Transcription Service of English 1.5锛孯eal-time Transcription Service 1.9 released锛宒ocker image supports ARM64 platform, update modelscope锛�([docs](runtime/readme.md))
@@ -82,10 +85,11 @@
 |                                   fsmn-vad <br> ( [猸怾(https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [馃](https://huggingface.co/funasr/fsmn-vad) )                                   |               voice activity detection                | 5000 hours, Mandarin and English |    0.4M    | 
 |                                     fa-zh <br> ( [猸怾(https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [馃](https://huggingface.co/funasr/fa-zh) )                                     |                 timestamp prediction                  |       5000 hours, Mandarin       |    38M     | 
 |                                       cam++ <br> ( [猸怾(https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [馃](https://huggingface.co/funasr/campplus) )                                        |           speaker verification/diarization            |            5000 hours            |    7.2M    | 
-|                                                  Whisper-large-v2 <br> ([猸怾(https://www.modelscope.cn/models/iic/speech_whisper-large_asr_multilingual/summary)  [馃崁](https://github.com/openai/whisper) )                                                  |  speech recognition, with timestamps, non-streaming   |          multilingual            |    1.5G    |
-|                                                Whisper-large-v3 <br> ([猸怾(https://www.modelscope.cn/models/iic/Whisper-large-v3/summary)  [馃崁](https://github.com/openai/whisper) )                                                 |  speech recognition, with timestamps, non-streaming   |          multilingual            |    1.5G    |
-|                                         Qwen-Audio <br> ([猸怾(examples/industrial_data_pretraining/qwen_audio/demo.py)  [馃](https://huggingface.co/Qwen/Qwen-Audio) )                                         |      audio-text multimodal models (pretraining)       |     multilingual      |  8B  |
-|                   Qwen-Audio-Chat <br> ([猸怾(examples/industrial_data_pretraining/qwen_audio/demo_chat.py)  [馃](https://huggingface.co/Qwen/Qwen-Audio-Chat) )                                                |          audio-text multimodal models (chat)          |     multilingual      |  8B  |
+|                                                  Whisper-large-v2 <br> ([猸怾(https://www.modelscope.cn/models/iic/speech_whisper-large_asr_multilingual/summary)  [馃崁](https://github.com/openai/whisper) )                                                  |  speech recognition, with timestamps, non-streaming   |           multilingual           |    1550 M    |
+|                                                Whisper-large-v3 <br> ([猸怾(https://www.modelscope.cn/models/iic/Whisper-large-v3/summary)  [馃崁](https://github.com/openai/whisper) )                                                 |  speech recognition, with timestamps, non-streaming   |           multilingual           |    1550 M    |
+|                                         Qwen-Audio <br> ([猸怾(examples/industrial_data_pretraining/qwen_audio/demo.py)  [馃](https://huggingface.co/Qwen/Qwen-Audio) )                                         |      audio-text multimodal models (pretraining)       |           multilingual           |  8B  |
+|                   Qwen-Audio-Chat <br> ([猸怾(examples/industrial_data_pretraining/qwen_audio/demo_chat.py)  [馃](https://huggingface.co/Qwen/Qwen-Audio-Chat) )                                                |          audio-text multimodal models (chat)          |           multilingual           |  8B  |
+|                        emotion2vec+large <br> ([猸怾(https://modelscope.cn/models/iic/emotion2vec_plus_large/summary)  [馃](https://huggingface.co/emotion2vec/emotion2vec_plus_large) )                        |              speech emotion recongintion              |           40000 hours            |  300M  |
 
 
 
@@ -154,7 +158,7 @@
 from funasr import AutoModel
 
 model = AutoModel(model="fsmn-vad")
-wav_file = f"{model.model_path}/example/asr_example.wav"
+wav_file = f"{model.model_path}/example/vad_example.wav"
 res = model.generate(input=wav_file)
 print(res)
 ```
@@ -207,8 +211,8 @@
 res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
 print(res)
 ```
-
-More examples ref to [docs](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining)
+More usages ref to [docs](docs/tutorial/README_zh.md), 
+more examples ref to [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining)
 
 
 ## Export ONNX

--
Gitblit v1.9.1