python/FunASR-XL.git

			@@ -13,182 +13,44 @@

			<div align="center">
			<h4>
<a href="#最新动态"> What's New </a>
｜<a href="#核心功能"> Core Features </a>
｜<a href="#安装教程"> Installation </a>
｜<a href="#快速开始"> Quick Start </a>
｜<a href="https://alibaba-damo-academy.github.io/FunASR/en/index.html"> Tutorial </a>
｜<a href="./docs/model_zoo/modelscope_models.md"> Model Zoo </a>
｜<a href="./funasr/runtime/readme_cn.md"> Service Deployment </a>
｜<a href="#联系我们"> Contact Us </a>
			</h4>
			</div>

			<a name="最新动态"></a>
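## What's New

### Service Deployment SDK

- 2023.07.03: Released the Chinese offline file transcription service (CPU version), with support for one-click deployment and testing ([click here](funasr/runtime/readme_cn.md))

### ASRU 2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0

For details, see the documentation ([click here](https://alibaba-damo-academy.github.io/FunASR/m2met2_cn/index.html)).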


### Speech Recognition

- Academic models:
  - Encoder-decoder models: [Transformer](egs/aishell/transformer), [Conformer](egs/aishell/conformer), [Branchformer](egs/aishell/branchformer)
  - Transducer models: [RNNT (streaming)](egs/aishell/rnnt), [BAT](egs/aishell/bat)
  - Non-autoregressive model: [Paraformer](egs/aishell/paraformer)
  - Multi-speaker ASR model: [MFCCA](egs_modelscope/asr/mfcca)

- Industrial models:
  - Chinese general-purpose models: [Paraformer-large](egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch), [Paraformer-large (long-audio version)](egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch), [Paraformer-large (streaming version)](egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online)
  - Chinese general-purpose hotword model: [Paraformer-large-contextual](egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404)
  - English general-purpose model: [Conformer]()
  - Unified streaming/offline models: [16k UniASR Minnan](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825/summary), [16k UniASR French](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/summary), [16k UniASR German](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/summary), [16k UniASR Vietnamese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/summary), [16k UniASR Persian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/summary), [16k UniASR Burmese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/summary), [16k UniASR Hebrew](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/summary), [16k UniASR Urdu](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/summary), [8k UniASR Chinese (finance domain)](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-finance-vocab3445-online/summary), [16k UniASR Chinese (audio/video domain)](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-audio_and_video-vocab3445-online/summary)

### Speaker Recognition
- Speaker verification model: [xvector](egs_modelscope/speaker_verification)
- Speaker diarization model: [SOND](egs/callhome/diarization/sond)

### Punctuation Restoration
- Chinese punctuation models: [CT-Transformer](egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch), [CT-Transformer (streaming)](egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727)

### Voice Activity Detection
- [FSMN-VAD](egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common)

### Timestamp Prediction
- Character-level model: [TP-Aligner](egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline)

			<a name="核心功能"></a>
## Core Features
- FunASR is a fundamental speech recognition toolkit that offers automatic speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language modeling, speaker verification, speaker diarization, multi-talker ASR, and more. It provides convenient scripts and tutorials for inference and fine-tuning of pretrained models.
- We have released a large number of academic and industrial pretrained models on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), trained on open-source datasets or massive industrial data; see our [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md) for details on each model. The representative [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) non-autoregressive end-to-end ASR model achieves SOTA performance on many speech recognition tasks. It combines high accuracy, high efficiency, and easy deployment, so it can be used to build speech recognition services quickly; for details, see the ([service deployment documentation](funasr/runtime/readme_cn.md)).
- FunASR provides an easy-to-use interface for running inference and fine-tuning directly on models hosted on ModelScope. In addition, its optimized data loader speeds up training on large-scale datasets.

			<a name="最新动态"></a>
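## What's New
- 2023.07.17: Released BAT, a low-latency, low-memory RNN-T model; for details, see ([BAT](egs/aishell/bat))
- 2023.07.03: Released the CPU version of the one-click deployment for the Chinese offline file transcription service; for details, see ([one-click deployment documentation](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/docs/SDK_tutorial_zh.md))
- 2023.06.26: Final results of the ASRU 2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 were announced; for details, see ([M2MeT2.0](https://alibaba-damo-academy.github.io/FunASR/m2met2_cn/index.html))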

			<a name="安装教程"></a>
## Installation
For the FunASR installation guide, see ([Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/installation.html)).

Install the released package directly:

			```shell
			pip3 install -U funasr
# Users in mainland China who run into network issues can use the following command instead:
			# pip3 install -U funasr -i https://mirror.sjtu.edu.cn/pypi/web/simple
			```

You can also install from source:


```shell
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./
# Users in mainland China who run into network issues can use the following command instead:
# pip3 install -e ./ -i https://mirror.sjtu.edu.cn/pypi/web/simple
			```
If you need the pretrained models released on ModelScope, install ModelScope:

			```shell
			pip3 install -U modelscope
# Users in mainland China who run into network issues can use the following command instead:
			# pip3 install -U modelscope -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html -i https://mirror.sjtu.edu.cn/pypi/web/simple
			```

For more detailed installation instructions, see ([click here](https://alibaba-damo-academy.github.io/FunASR/en/installation/installation.html)).
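
After installing, a quick way to confirm which packages are present is to query their installed versions. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is our own, and the distribution names `funasr` and `modelscope` are assumed to match the pip package names used above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names assumed to match the `pip3 install` commands above.
    for name in ("funasr", "modelscope"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

If `modelscope` reports "not installed", only the ModelScope-based examples further down are affected.
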
			<a name="服务部署"></a>
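## Service Deployment
FunASR supports deploying pretrained or further fine-tuned models as services. The CPU version of the one-click deployment for the Chinese offline file transcription service has been released; for details, see ([one-click deployment documentation](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/docs/SDK_tutorial_zh.md)). For more on service deployment, see the ([service deployment documentation](funasr/runtime/readme_cn.md)).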

			<a name="快速开始"></a>
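## Quick Start
FunASR supports inference and fine-tuning of models trained on tens of thousands of hours of industrial data (see [modelscope_egs](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html)), as well as training and fine-tuning of models on standard academic datasets (see [egs](https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html)). The models cover speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language modeling, speaker verification, speaker diarization, multi-talker ASR, and more; for the full list, see the [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md).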

You can use FunASR in the following ways:

- Service deployment SDK
- Industrial model egs
- Academic model egs

### Service Deployment SDK

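#### Python example

Supports real-time streaming speech recognition, with a non-streaming model correcting the streaming results; the output text includes punctuation. Currently only a single client is supported; for multiple concurrent clients, see the C++ service deployment SDK below.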
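
To illustrate how the streaming results and their corrections can be combined on the client side, here is a minimal sketch. It assumes each server message is a JSON object with a `text` field and a `mode` field of either `"2pass-online"` (fast, provisional text) or `"2pass-offline"` (corrected text for a finished segment). These field names follow our reading of `funasr_wss_client.py` and should be checked against the version you run; the class and function names are hypothetical.

```python
import json
from typing import Iterable


class TwoPassTranscript:
    """Accumulates 2pass results: confirmed (offline) text plus pending (online) text."""

    def __init__(self) -> None:
        self.confirmed = ""  # text already corrected by the non-streaming model
        self.pending = ""    # streaming text not yet corrected

    def update(self, message: dict) -> str:
        text = message.get("text", "")
        if message.get("mode") == "2pass-online":
            # Provisional streaming output: append to the pending part.
            self.pending += text
        elif message.get("mode") == "2pass-offline":
            # Corrected output for the finished segment replaces the pending part.
            self.confirmed += text
            self.pending = ""
        return self.display()

    def display(self) -> str:
        return self.confirmed + self.pending


def replay(raw_messages: Iterable[str]) -> str:
    """Feed raw JSON strings (as received over the websocket) and return the final text."""
    transcript = TwoPassTranscript()
    for raw in raw_messages:
        transcript.update(json.loads(raw))
    return transcript.display()
```

The design keeps corrected text and provisional text separate, so a correction only ever overwrites the not-yet-confirmed tail of the transcript.
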

##### Server deployment
			```shell
			cd funasr/runtime/python/websocket
			python funasr_wss_server.py --port 10095
			```

##### Client testing
			```shell
			python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "5,10,5"
			#python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "8,8,4" --audio_in "./data/wav.scp"
			```
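
The `--chunk_size` option takes three comma-separated integers. As far as we understand FunASR's streaming Paraformer, each unit is 60 ms: the middle value sets how much audio is decoded and displayed per step, the last value is the lookahead, and the first value configures left context. Under that assumption, `"5,10,5"` means 600 ms chunks with 300 ms of lookahead. The sketch below converts the string into durations and a per-chunk sample count; the 60 ms unit, the meaning of the first value, and the 16 kHz sample rate are assumptions to verify against your model, and `parse_chunk_size` is a hypothetical helper, not part of FunASR.

```python
from dataclasses import dataclass

FRAME_MS = 60          # assumed duration of one chunk_size unit, in milliseconds
SAMPLE_RATE = 16000    # assumed input sample rate (the models above are 16 kHz)


@dataclass(frozen=True)
class ChunkConfig:
    left_ms: int       # left context (first value)
    chunk_ms: int      # audio decoded and displayed per step (middle value)
    lookahead_ms: int  # right context (last value)

    @property
    def chunk_samples(self) -> int:
        # Number of audio samples covered by one chunk at SAMPLE_RATE.
        return self.chunk_ms * SAMPLE_RATE // 1000


def parse_chunk_size(spec: str) -> ChunkConfig:
    """Turn a string like "5,10,5" into millisecond durations (hypothetical helper)."""
    parts = [int(p) for p in spec.split(",")]
    if len(parts) != 3 or any(p < 0 for p in parts):
        raise ValueError(f"expected three non-negative integers, got {spec!r}")
    left, chunk, right = (p * FRAME_MS for p in parts)
    return ChunkConfig(left, chunk, right)
```

For example, `parse_chunk_size("5,10,5")` yields a 600 ms chunk (9600 samples at 16 kHz) with 300 ms lookahead, while `"8,8,4"` from the commented-out command yields 480 ms chunks with 240 ms lookahead; smaller chunks lower latency at some cost in accuracy.
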
For more examples, see ([click here](https://alibaba-damo-academy.github.io/FunASR/en/runtime/websocket_python.html#id2)).
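
The commented-out client command passes `--audio_in "./data/wav.scp"`, a list of files rather than a single WAV. We assume the Kaldi-style layout: one `<utterance-id> <path-to-wav>` pair per line, separated by whitespace. A minimal sketch for building and reading such a list follows; the helper names are our own and not part of FunASR.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple


def write_wav_scp(entries: Iterable[Tuple[str, str]], scp_path: str) -> None:
    """Write (utt_id, wav_path) pairs as Kaldi-style "utt_id path" lines."""
    with open(scp_path, "w", encoding="utf-8") as f:
        for utt_id, wav_path in entries:
            # The id is the first whitespace-separated field, so it must not contain spaces.
            if any(c.isspace() for c in utt_id):
                raise ValueError(f"utterance id must not contain whitespace: {utt_id!r}")
            f.write(f"{utt_id} {wav_path}\n")


def read_wav_scp(scp_path: str) -> Dict[str, str]:
    """Parse a wav.scp file into {utt_id: wav_path}, skipping blank lines."""
    entries: Dict[str, str] = {}
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split only once so paths containing spaces stay intact.
            utt_id, wav_path = line.split(maxsplit=1)
            entries[utt_id] = wav_path
    return entries


def scp_from_directory(wav_dir: str, scp_path: str) -> int:
    """Build a wav.scp from every *.wav in a directory, using the file stem as the id."""
    wavs = sorted(Path(wav_dir).glob("*.wav"))
    write_wav_scp(((p.stem, str(p.resolve())) for p in wavs), scp_path)
    return len(wavs)
```

Usage: `scp_from_directory("./my_wavs", "./data/wav.scp")`, then pass `./data/wav.scp` to `--audio_in`.
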

			<a name="cpp版本示例"></a>
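#### C++ example

The offline file transcription service (CPU) is currently supported and can handle hundreds of concurrent requests.

##### Server deployment
Use the following commands for a one-click deployment: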
			```shell
			curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-offline-cpu-zh.sh
			sudo bash funasr-runtime-deploy-offline-cpu-zh.sh install --workspace ./funasr-runtime-resources
			```

##### Client testing

			```shell
			python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
			```
For more examples, see ([click here](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/docs/SDK_tutorial_zh.md)).


### Industrial model egs

To run inference or fine-tuning with the pretrained industrial models on ModelScope, refer to the following example:


			```python
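from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build an ASR pipeline from the Paraformer-large model hosted on ModelScope
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
)

# Recognize a 16 kHz example recording given by URL
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
# {'text': '欢迎大家来体验达摩院推出的语音识别模型'}
# ("Welcome, everyone, to try the speech recognition model released by DAMO Academy")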
			```

For more examples, see ([click here](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html)).
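
To transcribe several files with the pipeline above, you can loop over them and collect the `text` field of each result. The sketch takes the pipeline as a parameter so it can be exercised without downloading a model. It assumes, as in the example output above, that each call returns a dict with a `text` key; `transcribe_all` is a hypothetical helper, not a FunASR or ModelScope API.

```python
from typing import Callable, Dict, Mapping

# Any callable with the same calling convention as `inference_pipeline` above.
Recognizer = Callable[..., Mapping]


def transcribe_all(recognize: Recognizer, audio: Mapping[str, str]) -> Dict[str, str]:
    """Run `recognize(audio_in=...)` on each {utt_id: path_or_url} item and keep the text."""
    results: Dict[str, str] = {}
    for utt_id, audio_in in audio.items():
        output = recognize(audio_in=audio_in)
        # Fall back to an empty string if a result carries no text (e.g. silence).
        results[utt_id] = output.get("text", "")
    return results
```

With the real pipeline this would be called as `transcribe_all(inference_pipeline, {"asr_example_zh": "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav"})`.
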


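### Academic model egs

If you want to train a model from scratch (typically an academic model), you can launch training and inference with the following commands: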

			```shell
			cd egs/aishell/paraformer
			. ./run.sh --CUDA_VISIBLE_DEVICES="0,1" --gpu_num=2
			```

For more examples, see ([click here](https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html)).
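
The recipe command above selects GPUs with `--CUDA_VISIBLE_DEVICES` and sets `--gpu_num` separately, so the two normally need to agree. If you launch recipes from Python, a small sketch like the following derives `gpu_num` from the device list so they cannot drift apart. It only builds the argument list, runs the script with `bash` instead of sourcing it as the command above does, and assumes `run.sh` accepts `--name=value` options; `build_recipe_command` is a hypothetical helper.

```python
import shlex
from typing import List


def build_recipe_command(devices: str, script: str = "./run.sh") -> List[str]:
    """Build a run.sh invocation whose --gpu_num matches the --CUDA_VISIBLE_DEVICES list."""
    ids = [d.strip() for d in devices.split(",") if d.strip()]
    if not ids or not all(d.isdigit() for d in ids):
        raise ValueError(f"expected comma-separated GPU indices, got {devices!r}")
    return [
        "bash",
        script,
        f"--CUDA_VISIBLE_DEVICES={','.join(ids)}",
        f"--gpu_num={len(ids)}",
    ]


if __name__ == "__main__":
    cmd = build_recipe_command("0,1")
    # Print a shell-safe form; pass `cmd` to subprocess.run(cmd, cwd="egs/aishell/paraformer") to launch.
    print(shlex.join(cmd))
```
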

			<a name="联系我们"></a>
			<a name="社区交流"></a>
## Contact Us

If you have difficulties using FunASR, you can contact us via:

- Email: [funasr@list.alibaba-inc.com](mailto:funasr@list.alibaba-inc.com)

You can also open an issue directly on the GitHub page. Speech enthusiasts are welcome to scan the DingTalk or WeChat group QR codes below to join the community for discussion.

| DingTalk group | WeChat group |
|:---------------------------------------------------------------------:|:-----------------------------------------------------:|
| <div align="left"><img src="docs/images/dingding.jpg" width="250"/> | <img src="docs/images/wechat.png" width="232"/></div> |
			@@ -202,12 +64,8 @@


## License
This project is licensed under [The MIT License](https://opensource.org/licenses/MIT). For the license of the industrial models, see ([click here](./MODEL_LICENSE)).


			## Stargazers over time

			[![Stargazers over time](https://starchart.cc/alibaba-damo-academy/FunASR.svg)](https://starchart.cc/alibaba-damo-academy/FunASR)

## Citations

			@@ -218,6 +76,12 @@
			year={2023},
			booktitle={INTERSPEECH},
			}
			@inproceedings{An2023bat,
			author={Keyu An and Xian Shi and Shiliang Zhang},
			title={BAT: Boundary aware transducer for memory-efficient and low-latency ASR},
			year={2023},
			booktitle={INTERSPEECH},
			}
			@inproceedings{gao22b_interspeech,
			author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
			title={{Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition}},