From 4a7a984a5f3e3f894f86ce82e76ddd13d8a42a20 Mon Sep 17 00:00:00 2001 From: zhifu gao <zhifu.gzf@alibaba-inc.com> Date: Mon, 11 Mar 2024 17:56:30 +0800 Subject: [PATCH] Dev gzf (#1465) --- README.md | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 970c5eb..9841fe1 100644 --- a/README.md +++ b/README.md @@ -27,7 +27,8 @@ <a name="whats-new"></a> ## What's new: -- 2024/03/05: Added support for the Whisper-large-v3 model, a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. It can be downloaded from [modelscope](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary) and [openai](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining/whisper). +- 2024/03/05: Added the Qwen-Audio and Qwen-Audio-Chat large-scale audio-text multimodal models, which have topped multiple audio-domain leaderboards. These models support speech dialogue ([usage](examples/industrial_data_pretraining/qwen_audio)). +- 2024/03/05: Added support for the Whisper-large-v3 model, a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. It can be downloaded from [modelscope](examples/industrial_data_pretraining/whisper/demo.py) and [openai](examples/industrial_data_pretraining/whisper/demo_from_openai.py). - 2024/03/05: Offline File Transcription Service 4.4, Offline File Transcription Service of English 1.5, and Real-time Transcription Service 1.9 released; the docker image now supports the ARM64 platform, and modelscope has been updated ([docs](runtime/readme.md)) - 2024/01/30: funasr-1.0 has been released ([docs](https://github.com/alibaba-damo-academy/FunASR/discussions/1319)) - 2024/01/30: emotion recognition models are newly supported. 
[model link](https://www.modelscope.cn/models/iic/emotion2vec_base_finetuned/summary), modified from [repo](https://github.com/ddlBoJack/emotion2vec). @@ -83,6 +84,8 @@ | cam++ <br> ( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | speaker verification/diarization | 5000 hours | 7.2M | | Whisper-large-v2 <br> ([⭐](https://www.modelscope.cn/models/iic/speech_whisper-large_asr_multilingual/summary) [🍀](https://github.com/openai/whisper) ) | speech recognition, with timestamps, non-streaming | multilingual | 1.5G | | Whisper-large-v3 <br> ([⭐](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary) [🍀](https://github.com/openai/whisper) ) | speech recognition, with timestamps, non-streaming | multilingual | 1.5G | +| Qwen-Audio <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio) ) | audio-text multimodal models (pretraining) | multilingual | 8B | +| Qwen-Audio-Chat <br> ([⭐](examples/industrial_data_pretraining/qwen_audio/demo_chat.py) [🤗](https://huggingface.co/Qwen/Qwen-Audio-Chat) ) | audio-text multimodal models (chat) | multilingual | 8B | @@ -207,7 +210,22 @@ More examples can be found in the [docs](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining) -[//]: # (FunASR supports inference and fine-tuning of models trained on industrial datasets of tens of thousands of hours. For more details, please refer to ([modelscope_egs](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html)). It also supports training and fine-tuning of models on academic standard datasets. For more details, please refer to ([egs](https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html)). 
The models include speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker separation, and multi-party conversation speech recognition. For a detailed list of models, please refer to the [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md).) + +## Export ONNX + +### Command-line usage +```shell +funasr-export ++model=paraformer ++quantize=false +``` + +### Python usage +```python +from funasr import AutoModel + +model = AutoModel(model="paraformer") + +res = model.export(quantize=False) +``` ## Deployment Service FunASR supports deploying pre-trained or further fine-tuned models as a service. Currently, the following types of service deployment are supported: @@ -226,9 +244,9 @@ You can also scan the DingTalk or WeChat group QR code below to join the community for communication and discussion. -| DingTalk group | WeChat group | -|:---:|:-----------------------------------------------------:| -| <div align="left"><img src="docs/images/dingding.jpg" width="250"/> | <img src="docs/images/wechat.png" width="215"/></div> | +| DingTalk group | WeChat group | +|:-------------------------------------------------------------------:|:-----------------------------------------------------:| +| <div align="left"><img src="docs/images/dingding.png" width="250"/> | <img src="docs/images/wechat.png" width="215"/></div> | ## Contributors -- Gitblit v1.9.1
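
The "Export ONNX" section added by this patch shows an unquantized export; a minimal sketch of the quantized variant follows, assuming `funasr` >= 1.0 is installed. The `quantize=True` value is an assumption inferred from the `++quantize` CLI flag shown above; the patch does not document what `export()` returns, so the result is only printed here.

```python
# Sketch of a quantized ONNX export, mirroring the README's Python example.
# Assumes the funasr package (>= 1.0) is installed locally.
from funasr import AutoModel

# Load the same paraformer model used in the README example.
model = AutoModel(model="paraformer")

# quantize=True is assumed to be accepted, by symmetry with the
# ++quantize=false CLI flag; the return value is undocumented in the patch.
res = model.export(quantize=True)
print(res)
```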