## Citations

```bibtex
@article{gao2020universal,
  title={Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model},
  author={Gao, Zhifu and Zhang, Shiliang and Lei, Ming and McLoughlin, Ian},
  journal={arXiv preprint arXiv:2010.14099},
  year={2020}
}

@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}

@inproceedings{Shi2023AchievingTP,
  title={Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model},
  author={Xian Shi and Yanni Chen and Shiliang Zhang and Zhijie Yan},
  year={2023}
}
```

```shell
exit                             # leave the container shell
sudo docker ps                   # list running containers to find the id
sudo docker stop <container-id>  # stop by container id
sudo docker stop funasr          # or stop by container name
```

.. toctree::
   :maxdepth: 1
   :caption: Recipe

   ./recipe/asr_recipe.md
   ./recipe/sv_recipe.md
   ./recipe/punc_recipe.md
   ./recipe/vad_recipe.md
   ./recipe/sd_recipe.md

.. toctree::
   :maxdepth: 1
   :caption: Huggingface pipeline

.. toctree::
   :maxdepth: 1
   :caption: Runtime

   ./runtime/export.md

```python
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
```
The decoding modes `fast` and `normal` are fake streaming, which can be used to evaluate recognition accuracy.
For the full demo code, please refer to the [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151).
#### [RNN-T-online model]()
#### API-reference
##### Define pipeline
- `task`: `Tasks.auto_speech_recognition` for ASR, or `Tasks.voice_activity_detection` for VAD
- `model`: model name in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `ncpu`: `1` (default), sets the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default); if set, the path where results are written
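To illustrate how the parameters above fit together, here is a minimal sketch. The real `pipeline()` call requires `modelscope` to be installed and is only shown in comments; the helper function below is hypothetical and merely documents the listed defaults.

```python
# Sketch only, assuming the ModelScope pipeline API described above.
# The real call would look roughly like:
#
#   from modelscope.pipelines import pipeline
#   from modelscope.utils.constant import Tasks
#   inference_pipeline = pipeline(
#       task=Tasks.auto_speech_recognition,
#       model='<model name or local path>',
#       ngpu=0,            # decode on CPU instead of GPU
#       output_dir='./results',
#   )
#
# Hypothetical helper: collects the pipeline arguments with the
# documented defaults (ngpu=1, ncpu=1, output_dir=None).
def pipeline_kwargs(task, model, ngpu=1, ncpu=1, output_dir=None):
    return {"task": task, "model": model, "ngpu": ngpu,
            "ncpu": ncpu, "output_dir": output_dir}

kwargs = pipeline_kwargs(task="auto_speech_recognition",
                         model="./local_model", ngpu=0)
print(kwargs["ngpu"])  # 0: decoding on CPU
```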
### Inference with multi-threaded CPUs or multiple GPUs
FunASR also offers recipes ([infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/infer.sh) for ASR, [infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/vad/TEMPLATE/infer.sh) for VAD) to decode with multi-threaded CPUs or multiple GPUs.

- Setting parameters in `infer.sh`
    - `model`: model name in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
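Conceptually, such a recipe splits the input wav list into `nj` shards and decodes each shard in a separate job. A minimal sketch of that sharding idea in plain Python (hypothetical helper names, not FunASR code; `decode_shard` stands in for running one inference pipeline):

```python
from concurrent.futures import ProcessPoolExecutor

def split_scp(wav_list, nj):
    """Round-robin split of a wav list into nj shards, one per job."""
    return [wav_list[i::nj] for i in range(nj)]

def decode_shard(shard):
    # Placeholder for running one inference pipeline over a shard of wavs.
    return [f"decoded:{wav}" for wav in shard]

if __name__ == "__main__":
    wavs = [f"utt{i}.wav" for i in range(10)]
    # Decode the 3 shards in parallel processes, then flatten the results.
    with ProcessPoolExecutor(max_workers=3) as ex:
        results = [r for part in ex.map(decode_shard, split_scp(wavs, 3))
                   for r in part]
    print(len(results))  # 10 results, one per input wav
```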