From 12a7adfdf3dd4f80b5d3a51cfc4eecc84eaa7c64 Mon Sep 17 00:00:00 2001
From: jmwang66 <wangjiaming.wjm@alibaba-inc.com>
Date: 星期一, 16 一月 2023 18:46:40 +0800
Subject: [PATCH] update version 0.1.6

---
 README.md |   53 +++++++++++++++++++++++++++++++++--------------------
 1 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 6dd38b2..6bf1278 100644
--- a/README.md
+++ b/README.md
@@ -2,38 +2,50 @@
 
 # FunASR: A Fundamental End-to-End Speech Recognition Toolkit
 
-<strong>FunASR</strong> hopes to build a bridge between academic research and industrial applications on speech recognition. By supporting the training & finetuning of the industrial-grade speech recognition model released on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), researchers and developers can conduct research and production of speech recognition models more conveniently, and promote the development of speech recognition ecology. ASR for Fun锛�
+<strong>FunASR</strong> hopes to build a bridge between academic research and industrial applications on speech recognition. By supporting the training & finetuning of the industrial-grade speech recognition model released on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), researchers and developers can conduct research and production of speech recognition models more conveniently, and promote the development of speech recognition ecology. ASR for Fun锛乕Model Zoo](docs/modelscope_models.md)
 
-## Highlights
-- FunASR supports many types of models, such as, Tranformer, Conformer, [Paraformer](https://arxiv.org/abs/2206.08317).
-- A large number of ASR models trained on academic datasets or industrial datasets are open sourced on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), 
-- The pretrained model [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) obtains the first place on many task in [SpeechIO leaderboard](https://github.com/SpeechColab/Leaderboard)
-- FunASR supports large-scale dataset dataloader and multi-GPU training.
+## Release Notes: 
+### 2023.1.16, funasr-0.1.6
+- We release a new version model [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), which integrate the [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) model, [ASR](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary),
+ [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary) model and timestamp together. The model could take in several hours long inputs.
+- We release a new type model, [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which could predict the duration of none-silence speech. It could be freely integrated with any ASR models in [Model Zoo](docs/modelscope_models.md).
+- We release a new type model, [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), which could predict the punctuation of ASR models's results. It could be freely integrated with any ASR models in [Model Zoo](docs/modelscope_models.md).
+- We release a new model, [Data2vec](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary), an unsupervised pretraining model which could be finetuned on ASR and other downstream tasks.
+- We release a new model, [Paraformer-Tiny](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary), a lightweight Paraformer model which supports Mandarin command words recognition.
+- We release a new type model, [SV](https://www.modelscope.cn/models/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/summary), which could extract speaker embeddings and further perform speaker verification on paired utterances. It will be supported for speaker diarization in the future version.
+- We improve the pipeline of modelscope to speedup the inference, by integrating the process of build model into build pipeline.
+- Various new types of audio input types are now supported by modelscope inference pipeline, including wav.scp, wav format, audio bytes, wave samples...
+
+## Key Features
+- Many types of typical models are supported, e.g., [Tranformer](https://arxiv.org/abs/1706.03762), [Conformer](https://arxiv.org/abs/2005.08100), [Paraformer](https://arxiv.org/abs/2206.08317).
+- We have released large number of academic and industrial pretrained models on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition)
+- The pretrained model [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) obtains the best performance on many tasks in [SpeechIO leaderboard](https://github.com/SpeechColab/Leaderboard)
+- FunASR supplies a easy-to-use pipeline to finetune pretrained models from [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition)
+- Compared to [Espnet](https://github.com/espnet/espnet) framework, the training speed of large-scale datasets in FunASR is much faster owning to the optimized dataloader.
 
 ## Installation(Training and Developing)
-
-- Clone the repo:
-``` sh
-git clone https://github.com/alibaba/FunASR.git
-```
 
 - Install Conda:
 ``` sh
 wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
 sh Miniconda3-latest-Linux-x86_64.sh
+source ~/.bashrc
 conda create -n funasr python=3.7
 conda activate funasr
 ```
 
 - Install Pytorch (version >= 1.7.0): 
-
-| cuda  | |
-|:-----:| --- |
-|  9.2  | conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=9.2 -c pytorch |
-| 10.2  | conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch |
-| 11.1  | conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch |
-
+``` sh
+pip3 install torch torchvision torchaudio
+```
 For more versions, please see [https://pytorch.org/get-started/locally](https://pytorch.org/get-started/locally)
+
+
+If you are in the area of China, you could set the source to speed the downloading.
+
+``` sh
+pip config set global.index-url https://mirror.sjtu.edu.cn/pypi/web/simple
+```
 
 - Install ModelScope:
 ``` sh
@@ -45,10 +57,11 @@
 - Install FunASR and other packages: 
 
 ``` sh
+git clone https://github.com/alibaba/FunASR.git && cd FunASR
 pip install --editable ./
 ```
 
-## Pretrained model hub
+## Pretrained Model Zoo
 
 We have trained many academic and industrial models, [model hub](docs/modelscope_models.md)
 
@@ -59,7 +72,7 @@
 - email: [funasr@list.alibaba-inc.com](funasr@list.alibaba-inc.com)
 
 - Dingding group:
-<div align="left"><img src="docs/images/dingding.jpg" width="400"/></div>
+<div align="left"><img src="docs/images/dingding.jpg" width="250"/>!<img src="docs/images/wechat.png" width="222"/></div>
 
 
 ## Acknowledge

--
Gitblit v1.9.1