游雁
2024-05-30 f577bb5e72b0a8ce4b7c947e0661e15deb4078ea
docs
4 files changed, 62 lines changed
README.md 17
README_zh.md 21
funasr/download/name_maps_from_hub.py 22
setup.py 2
README.md
@@ -157,6 +157,8 @@
```
Note: `chunk_size` is the streaming-latency configuration. `[0,10,5]` means the real-time display granularity is `10*60=600ms` and the lookahead information is `5*60=300ms`. Each inference input is `600ms` (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, `is_final=True` must be set to force output of the final word.
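The arithmetic in the note can be checked with a small plain-Python sketch (no FunASR required; `chunk_timings` and the 60 ms frame stride are taken from the note above, not from FunASR's API):

```python
# Derive streaming-latency figures from a chunk_size config of the form
# [pad, current, lookahead], assuming a 60 ms frame stride and 16 kHz audio.
FRAME_MS = 60
SAMPLE_RATE = 16000

def chunk_timings(chunk_size):
    _, current, lookahead = chunk_size
    display_ms = current * FRAME_MS       # real-time display granularity
    lookahead_ms = lookahead * FRAME_MS   # future (lookahead) context
    samples = SAMPLE_RATE * display_ms // 1000  # samples per inference input
    return display_ms, lookahead_ms, samples

print(chunk_timings([0, 10, 5]))  # (600, 300, 9600)
```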
<details><summary>More Examples</summary>
### Voice Activity Detection (Non-Streaming)
```python
from funasr import AutoModel
@@ -215,9 +217,24 @@
res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)
```
### Speech Emotion Recognition
```python
from funasr import AutoModel
model = AutoModel(model="emotion2vec_plus_large")
wav_file = f"{model.model_path}/example/test.wav"
res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(res)
```
More usages are described in the [docs](docs/tutorial/README_zh.md);
more examples can be found in the [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining)
</details>
## Export ONNX
README_zh.md
@@ -68,10 +68,10 @@
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./
```
-To use industrial pretrained models, install modelscope (optional)
+To use industrial pretrained models, install modelscope and huggingface_hub (optional)
```shell
-pip3 install -U modelscope
+pip3 install -U modelscope huggingface_hub
```
## Model Zoo
@@ -153,6 +153,8 @@
Note: `chunk_size` is the streaming-latency configuration. `[0,10,5]` means the real-time display granularity is `10*60=600ms` and the lookahead information is `5*60=300ms`. Each inference input is `600ms` (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, `is_final=True` must be set to force output of the final word.
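The chunking rule in the note can be sketched in plain Python: split a waveform into 600 ms windows (9600 samples at 16 kHz) and flag the last one with `is_final` (a standalone illustration; `iter_chunks` and the dummy waveform are not part of FunASR):

```python
# Iterate over a waveform in 600 ms chunks, setting is_final=True on the
# last chunk, as the streaming note requires.
SAMPLE_RATE = 16000
CHUNK_SAMPLES = int(SAMPLE_RATE * 0.6)  # 9600 samples per chunk

def iter_chunks(speech):
    total = len(speech)
    n_chunks = (total + CHUNK_SAMPLES - 1) // CHUNK_SAMPLES  # ceil division
    for i in range(n_chunks):
        chunk = speech[i * CHUNK_SAMPLES:(i + 1) * CHUNK_SAMPLES]
        yield chunk, i == n_chunks - 1  # (chunk, is_final)

speech = [0.0] * 20000  # ~1.25 s of silence as dummy audio
flags = [is_final for _, is_final in iter_chunks(speech)]
print(flags)  # [False, False, True]
```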
<details><summary>More Examples</summary>
### Voice Activity Detection (Non-Streaming)
```python
from funasr import AutoModel
@@ -216,9 +218,24 @@
res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)
```
### Speech Emotion Recognition
```python
from funasr import AutoModel
model = AutoModel(model="emotion2vec_plus_large")
wav_file = f"{model.model_path}/example/test.wav"
res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(res)
```
For more details, see the [tutorial docs](docs/tutorial/README_zh.md);
for more, see the [model examples](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining)
</details>
## Export ONNX
### Export from the Command Line
```shell
funasr/download/name_maps_from_hub.py
@@ -12,10 +12,30 @@
    "Whisper-large-v2": "iic/speech_whisper-large_asr_multilingual",
    "Whisper-large-v3": "iic/Whisper-large-v3",
    "Qwen-Audio": "Qwen/Qwen-Audio",
    "emotion2vec_plus_large": "iic/emotion2vec_plus_large",
    "emotion2vec_plus_base": "iic/emotion2vec_plus_base",
    "emotion2vec_plus_seed": "iic/emotion2vec_plus_seed",
}
name_maps_hf = {
    "": "",
    "paraformer": "funasr/paraformer-zh",
    "paraformer-zh": "funasr/paraformer-zh",
    "paraformer-en": "funasr/paraformer-zh",
    "paraformer-zh-streaming": "funasr/paraformer-zh-streaming",
    "fsmn-vad": "funasr/fsmn-vad",
    "ct-punc": "funasr/ct-punc",
    "ct-punc-c": "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    "fa-zh": "funasr/fa-zh",
    "cam++": "funasr/campplus",
    "Whisper-large-v2": "iic/speech_whisper-large_asr_multilingual",
    "Whisper-large-v3": "iic/Whisper-large-v3",
    "Qwen-Audio": "Qwen/Qwen-Audio",
    "emotion2vec_plus_large": "emotion2vec/emotion2vec_plus_large",
    "iic/emotion2vec_plus_large": "emotion2vec/emotion2vec_plus_large",
    "emotion2vec_plus_base": "emotion2vec/emotion2vec_plus_base",
    "iic/emotion2vec_plus_base": "emotion2vec/emotion2vec_plus_base",
    "emotion2vec_plus_seed": "emotion2vec/emotion2vec_plus_seed",
    "iic/emotion2vec_plus_seed": "emotion2vec/emotion2vec_plus_seed",
}
name_maps_openai = {
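These tables map short model aliases to hub repository ids. A resolver built on top of them might look like the following sketch (`resolve_model_id` is a hypothetical helper, not FunASR's API; the maps below are abbreviated copies of the ones above):

```python
# Hypothetical resolver: pick the map for the requested hub and return the
# mapped repository id, falling back to the input name if no alias matches.
name_maps_ms = {
    "emotion2vec_plus_large": "iic/emotion2vec_plus_large",
}
name_maps_hf = {
    "emotion2vec_plus_large": "emotion2vec/emotion2vec_plus_large",
    "iic/emotion2vec_plus_large": "emotion2vec/emotion2vec_plus_large",
}

def resolve_model_id(name, hub="ms"):
    table = name_maps_ms if hub == "ms" else name_maps_hf
    return table.get(name, name)

print(resolve_model_id("emotion2vec_plus_large", hub="hf"))
# emotion2vec/emotion2vec_plus_large
```

Note that the Hugging Face map also keys the ModelScope-style id (`iic/...`), so a name resolved for one hub still maps correctly when passed to the other.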
setup.py
@@ -39,7 +39,7 @@
        "jaconv",
        "hydra-core>=1.3.2",
        "tensorboardX",
        "rotary_embedding_torch",
        # "rotary_embedding_torch",
        "openai-whisper",
    ],
    # train: The modules invoked when training only.