python/FunASR-XL.git

			@@ -128,6 +128,44 @@
			Note: recognition works on a single audio file or on a file list. The list uses the Kaldi-style wav.scp format: `wav_id wav_path`
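
As a minimal sketch of the Kaldi-style list format mentioned above, the following writes and re-reads a `wav.scp` file. The utterance IDs, audio paths, and helper names (`write_wav_scp`, `read_wav_scp`) are hypothetical and are not part of FunASR.

```python
import tempfile
from pathlib import Path


def write_wav_scp(entries, scp_path):
    """Write (wav_id, wav_path) pairs, one per line, separated by a space."""
    with open(scp_path, "w", encoding="utf-8") as f:
        for wav_id, wav_path in entries:
            f.write(f"{wav_id} {wav_path}\n")


def read_wav_scp(scp_path):
    """Parse a wav.scp file into a {wav_id: wav_path} dict."""
    table = {}
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the first whitespace run only, so paths may contain spaces.
            wav_id, wav_path = line.split(maxsplit=1)
            table[wav_id] = wav_path
    return table


# Hypothetical utterance IDs and audio paths, for illustration only.
entries = [
    ("utt_001", "/data/audio/utt_001.wav"),
    ("utt_002", "/data/audio/utt_002.wav"),
]
scp_path = Path(tempfile.mkdtemp()) / "wav.scp"
write_wav_scp(entries, scp_path)
print(read_wav_scp(scp_path))
```

Under this sketch's assumptions, the path of the resulting file would be what gets passed as `input` in place of a single audio file.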

			### Speech Recognition (Non-streaming)
			#### SenseVoice
			```python
			from funasr import AutoModel
			from funasr.utils.postprocess_utils import rich_transcription_postprocess

			model_dir = "iic/SenseVoiceSmall"

			model = AutoModel(
			    model=model_dir,
			    vad_model="fsmn-vad",
			    vad_kwargs={"max_single_segment_time": 30000},
			    device="cuda:0",
			)

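			# Transcribe the English example audio in the model directory
			res = model.generate(
			    input=f"{model.model_path}/example/en.mp3",
			    cache={},
			    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
			    use_itn=True,
			    batch_size_s=60,
			    merge_vad=True,  # merge the short segments produced by the VAD model
			    merge_length_s=15,
			)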
			text = rich_transcription_postprocess(res[0]["text"])
			print(text)
			```
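			Parameters:
			- `model_dir`: the model name, or the path to the model on local disk.
			- `trust_remote_code`:
			  - `True`: the model code is loaded from `remote_code`, which specifies where the `model` code is located (for example, `model.py` in the current directory). Absolute paths, relative paths, and network URLs are supported.
			  - `False`: the model code is the version integrated into [FunASR](https://github.com/modelscope/FunASR). In this case, changes to `model.py` in the current directory have no effect, because the internal funasr version is loaded. See the [model code](https://github.com/modelscope/FunASR/tree/main/funasr/models/sense_voice).
			- `max_single_segment_time`: the maximum duration of an audio segment cut by `vad_model`, in milliseconds (ms).
			- `use_itn`: whether the output includes punctuation and inverse text normalization.
			- `batch_size_s`: enables dynamic batching; the value is the total audio duration in a batch, in seconds (s).
			- `merge_vad`: whether to merge the short audio fragments cut by the VAD model; the merged length is `merge_length_s`, in seconds (s).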
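
To make `merge_vad` / `merge_length_s` and `batch_size_s` more concrete, here is a rough sketch of the two ideas: greedily merging consecutive VAD segments up to a length limit, then grouping segments into batches by total duration. This is an illustration under simplifying assumptions, not FunASR's actual implementation; the function names and the segment timestamps are hypothetical.

```python
def merge_segments(segments_ms, merge_length_s=15):
    """Greedily merge consecutive (start_ms, end_ms) segments while the
    merged span stays within merge_length_s seconds."""
    limit_ms = merge_length_s * 1000
    merged = []
    for start, end in segments_ms:
        if merged and end - merged[-1][0] <= limit_ms:
            # Extend the previous merged segment to cover this one.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def batch_by_duration(segments_ms, batch_size_s=60):
    """Group segments so that the total audio duration in each batch
    does not exceed batch_size_s seconds (a single longer segment
    still gets a batch of its own)."""
    limit_ms = batch_size_s * 1000
    batches, current, total = [], [], 0
    for start, end in segments_ms:
        duration = end - start
        if current and total + duration > limit_ms:
            batches.append(current)
            current, total = [], 0
        current.append((start, end))
        total += duration
    if current:
        batches.append(current)
    return batches


# Hypothetical VAD output in milliseconds, for illustration only.
vad_segments = [(0, 4000), (4200, 9000), (9500, 16000), (16500, 20000), (21000, 40000)]
merged = merge_segments(vad_segments, merge_length_s=15)
print(merged)
print(batch_by_duration(merged, batch_size_s=20))
```

In this sketch a segment that is already longer than the limit is left as is; in the example above, `max_single_segment_time` is what bounds the length of an individual VAD segment.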

			#### Paraformer
			```python
			from funasr import AutoModel
			# paraformer-zh is a multi-functional asr model