## Citations

```bibtex
@article{gao2020universal,
  title={Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model},
  author={Gao, Zhifu and Zhang, Shiliang and Lei, Ming and McLoughlin, Ian},
  journal={arXiv preprint arXiv:2010.14099},
  year={2020}
}

@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}

@inproceedings{Shi2023AchievingTP,
  title={Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model},
  author={Xian Shi and Yanni Chen and Shiliang Zhang and Zhijie Yan},
  year={2023}
}
```

```shell
exit                             # leave the container shell
sudo docker ps                   # list running containers to find the id
sudo docker stop <container-id>  # stop by container id
sudo docker stop funasr          # or stop by container name
```

.. toctree::
   :maxdepth: 1
   :caption: Recipe

   ./recipe/asr_recipe.md
   ./recipe/sv_recipe.md
   ./recipe/punc_recipe.md
   ./recipe/vad_recipe.md
   ./recipe/sd_recipe.md

.. toctree::
   :maxdepth: 1
   :caption: Huggingface pipeline

.. toctree::
   :maxdepth: 1
   :caption: Runtime

   ./runtime/export.md

```python
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
```
The decoding modes `fast` and `normal` are fake streaming, which can be used to evaluate recognition accuracy.
For the full demo code, please refer to the [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151).
#### [RNN-T-online model]()
#### API-reference
##### Define pipeline
- `task`: `Tasks.auto_speech_recognition` for ASR, or `Tasks.voice_activity_detection` for VAD
- `model`: model name in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `ncpu`: `1` (default), sets the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default); if set, the path where results are written
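To illustrate how the parameters above fit together, here is a minimal sketch. The real `pipeline()` call requires `modelscope` to be installed and is only shown in comments; the helper function below is hypothetical and merely documents the listed defaults.

```python
# Sketch only, assuming the ModelScope pipeline API described above.
# The real call would look roughly like:
#
#   from modelscope.pipelines import pipeline
#   from modelscope.utils.constant import Tasks
#   inference_pipeline = pipeline(
#       task=Tasks.auto_speech_recognition,
#       model='<model name or local path>',
#       ngpu=0,            # decode on CPU instead of GPU
#       output_dir='./results',
#   )
#
# Hypothetical helper: collects the pipeline arguments with the
# documented defaults (ngpu=1, ncpu=1, output_dir=None).
def pipeline_kwargs(task, model, ngpu=1, ncpu=1, output_dir=None):
    return {"task": task, "model": model, "ngpu": ngpu,
            "ncpu": ncpu, "output_dir": output_dir}

kwargs = pipeline_kwargs(task="auto_speech_recognition",
                         model="./local_model", ngpu=0)
print(kwargs["ngpu"])  # 0: decoding on CPU
```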
### Inference with multi-threaded CPUs or multiple GPUs
FunASR also offers recipes ([infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/infer.sh) for ASR, [infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/vad/TEMPLATE/infer.sh) for VAD) to decode with multi-threaded CPUs or multiple GPUs.

- Setting parameters in `infer.sh`
    - `model`: model name in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
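Conceptually, such a recipe splits the input wav list into `nj` shards and decodes each shard in a separate job. A minimal sketch of that sharding idea in plain Python (hypothetical helper names, not FunASR code; `decode_shard` stands in for running one inference pipeline):

```python
from concurrent.futures import ProcessPoolExecutor

def split_scp(wav_list, nj):
    """Round-robin split of a wav list into nj shards, one per job."""
    return [wav_list[i::nj] for i in range(nj)]

def decode_shard(shard):
    # Placeholder for running one inference pipeline over a shard of wavs.
    return [f"decoded:{wav}" for wav in shard]

if __name__ == "__main__":
    wavs = [f"utt{i}.wav" for i in range(10)]
    # Decode the 3 shards in parallel processes, then flatten the results.
    with ProcessPoolExecutor(max_workers=3) as ex:
        results = [r for part in ex.map(decode_shard, split_scp(wavs, 3))
                   for r in part]
    print(len(results))  # 10 results, one per input wav
```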