| | |
| | | <div align="left"><img src="image/funasr_logo.jpg" width="400"/></div> |
| | | <div align="left"><img src="docs/images/funasr_logo.jpg" width="400"/></div> |
| | | |
| | | # FunASR: A Fundamental End-to-End Speech Recognition Toolkit |
| | | |
| | | <strong>FunASR</strong> aims to bridge academic research and industrial applications in speech recognition. By supporting the training and finetuning of the industrial-grade speech recognition models released on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), it lets researchers and developers study and deploy speech recognition models more conveniently and promotes the development of the speech recognition ecosystem. ASR for Fun! |
| | | <strong>FunASR</strong> aims to bridge academic research and industrial applications in speech recognition. By supporting the training and finetuning of the industrial-grade speech recognition models released on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), it lets researchers and developers study and deploy speech recognition models more conveniently and promotes the development of the speech recognition ecosystem. ASR for Fun |
| | | |
| | | ## Installation(Training and Developing) |
| | | ## Release Notes: |
| | | ### 2023.1.16, funasr-0.1.6 |
| | | - We release a new model version, [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), which integrates the [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) model, the [ASR](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) model, |
| | | the [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary) model, and timestamps in a single model. It can take inputs that are several hours long. |
| | | - We release a new type of model, [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which predicts the duration of non-silence speech. It can be freely combined with any ASR model in the [Model Zoo](docs/modelscope_models.md). |
| | | - We release a new type of model, [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), which predicts punctuation for the output of ASR models. It can be freely combined with any ASR model in the [Model Zoo](docs/modelscope_models.md). |
| | | - We release a new model, [Data2vec](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary), an unsupervised pretraining model which could be finetuned on ASR and other downstream tasks. |
| | | - We release a new model, [Paraformer-Tiny](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary), a lightweight Paraformer model that supports Mandarin command-word recognition. |
| | | - We release a new type of model, [SV](https://www.modelscope.cn/models/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/summary), which extracts speaker embeddings and performs speaker verification on paired utterances. Support for speaker diarization is planned for a future version. |
| | | - We improve the ModelScope pipeline to speed up inference by integrating the model-building step into pipeline construction. |
| | | - The ModelScope inference pipeline now supports several new audio input types, including wav.scp, wav format, audio bytes, and wave samples. |
| | | |
| | | - Clone the repo: |
| | | ``` sh |
| | | git clone https://github.com/alibaba/FunASR.git |
| | | ``` |
| | | ## Key Features |
| | | - Many typical model types are supported, e.g., [Transformer](https://arxiv.org/abs/1706.03762), [Conformer](https://arxiv.org/abs/2005.08100), [Paraformer](https://arxiv.org/abs/2206.08317). |
| | | - We have released a large number of academic and industrial pretrained models on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition). |
| | | - The pretrained model [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) obtains the best performance on many tasks in the [SpeechIO leaderboard](https://github.com/SpeechColab/Leaderboard). |
| | | - FunASR supplies an easy-to-use pipeline to finetune pretrained models from [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition). |
| | | - Compared to the [ESPnet](https://github.com/espnet/espnet) framework, training on large-scale datasets is much faster in FunASR owing to the optimized dataloader. |
| | | |
| | | ## Installation |
| | | |
| | | - Install Conda: |
| | | ``` sh |
| | | wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh |
| | | sh Miniconda3-latest-Linux-x86_64.sh |
| | | source ~/.bashrc |
| | | conda create -n funasr python=3.7 |
| | | conda activate funasr |
| | | ``` |
| | | |
| | | - Install PyTorch (version >= 1.7.0): |
| | | |
| | | | cuda | | |
| | | |:-----:| --- | |
| | | | 9.2 | conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=9.2 -c pytorch | |
| | | | 10.2 | conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch | |
| | | | 11.1 | conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch | |
| | | |
| | | For more versions, please see https://pytorch.org/get-started/locally/ |
| | | ``` sh |
| | | pip3 install torch torchvision torchaudio |
| | | ``` |
| | | For more versions, please see [https://pytorch.org/get-started/locally](https://pytorch.org/get-started/locally) |
| | | |
| | | - Install ModelScope: |
| | | |
| | | If you are in China, you can set the package index to a local mirror to speed up downloads. |
| | | |
| | | ``` sh |
| | | pip install "modelscope[audio]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html |
| | | pip config set global.index-url https://mirror.sjtu.edu.cn/pypi/web/simple |
| | | ``` |
| | | |
| | | - Install other packages: |
| | | Install or upgrade modelscope. |
| | | ``` sh |
| | | pip install "modelscope[audio]" --upgrade -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html |
| | | ``` |
| | | |
| | | For more details about modelscope, please see [modelscope installation](https://modelscope.cn/docs/%E7%8E%AF%E5%A2%83%E5%AE%89%E8%A3%85) |
| | | |
| | | - Install FunASR and other packages: |
| | | |
| | | ``` sh |
| | | git clone https://github.com/alibaba/FunASR.git && cd FunASR |
| | | pip install --editable ./ |
| | | ``` |
| | | |
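After the installation steps, a quick check of the environment can be useful. The following is a minimal sketch, not part of FunASR itself: it assumes the packages are importable as `torch`, `modelscope`, and `funasr`, and that each exposes a `__version__` attribute. The helper names (`parse_version`, `installed_version`, `check_environment`) are hypothetical and exist only for this illustration. It reports which packages are present and whether the installed PyTorch meets the 1.7.0 minimum stated above.

```python
import importlib
import re

# Minimum PyTorch version, taken from "Install PyTorch (version >= 1.7.0)".
MIN_TORCH = (1, 7, 0)


def parse_version(text):
    """Return the leading numeric part of a version string as a 3-tuple.

    Local suffixes such as "+cu111" are ignored; a missing patch number
    is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    return tuple(int(part or 0) for part in match.groups())


def installed_version(module_name):
    """Import a module and return its __version__, or None if it is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def check_environment():
    """Collect versions of the assumed package names and a PyTorch verdict."""
    report = {}
    for name in ("torch", "modelscope", "funasr"):  # assumed import names
        report[name] = installed_version(name)
    torch_version = report["torch"]
    report["torch_ok"] = (
        torch_version is not None
        and torch_version != "unknown"
        and parse_version(torch_version) >= MIN_TORCH
    )
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, "->", value)
```

Run the script inside the activated `funasr` conda environment. A value of `None` for a package means it could not be imported and the corresponding installation step should be repeated.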
| | |
| | | |
| | | - email: [funasr@list.alibaba-inc.com](mailto:funasr@list.alibaba-inc.com) |
| | | |
| | | - Dingding group: |
| | | <div align="left"><img src="image/dingding.jpg" width="400"/></div> |
| | | - Dingding group and Wechat group: |
| | | <div align="left"><img src="docs/images/dingding.jpg" width="250"/> <img src="docs/images/wechat.png" width="222"/></div> |
| | | |
| | | |
| | | ## Acknowledge |