游雁
2024-05-30 f577bb5e72b0a8ce4b7c947e0661e15deb4078ea
docs
4 files changed, 62 lines changed
README.md 17
README_zh.md 21
funasr/download/name_maps_from_hub.py 22
setup.py 2
README.md
@@ -157,6 +157,8 @@
```
Note: `chunk_size` is the streaming-latency configuration. `[0,10,5]` means the real-time display granularity is `10*60=600ms` and the lookahead information is `5*60=300ms`. Each inference input is `600ms` (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, `is_final=True` must be set to force output of the final word.
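The arithmetic in the note can be checked with a small plain-Python sketch (no FunASR required; `chunk_timings` and the 60 ms frame stride are taken from the note above, not from FunASR's API):

```python
# Derive streaming-latency figures from a chunk_size config of the form
# [pad, current, lookahead], assuming a 60 ms frame stride and 16 kHz audio.
FRAME_MS = 60
SAMPLE_RATE = 16000

def chunk_timings(chunk_size):
    _, current, lookahead = chunk_size
    display_ms = current * FRAME_MS       # real-time display granularity
    lookahead_ms = lookahead * FRAME_MS   # future (lookahead) context
    samples = SAMPLE_RATE * display_ms // 1000  # samples per inference input
    return display_ms, lookahead_ms, samples

print(chunk_timings([0, 10, 5]))  # (600, 300, 9600)
```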
<details><summary>More Examples</summary>
### Voice Activity Detection (Non-Streaming)
```python
from funasr import AutoModel
@@ -215,9 +217,24 @@
res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)
```
### Speech Emotion Recognition
```python
from funasr import AutoModel
model = AutoModel(model="emotion2vec_plus_large")
wav_file = f"{model.model_path}/example/test.wav"
res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(res)
```
More usages are described in the [docs](docs/tutorial/README_zh.md);
more examples can be found in the [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining)
</details>
## Export ONNX
README_zh.md
@@ -68,10 +68,10 @@
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./
```
-To use industrial pretrained models, install modelscope (optional)
+To use industrial pretrained models, install modelscope and huggingface_hub (optional)
```shell
-pip3 install -U modelscope
+pip3 install -U modelscope huggingface_hub
```
## Model Zoo
@@ -153,6 +153,8 @@
Note: `chunk_size` is the streaming-latency configuration. `[0,10,5]` means the real-time display granularity is `10*60=600ms` and the lookahead information is `5*60=300ms`. Each inference input is `600ms` (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, `is_final=True` must be set to force output of the final word.
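The chunking rule in the note can be sketched in plain Python: split a waveform into 600 ms windows (9600 samples at 16 kHz) and flag the last one with `is_final` (a standalone illustration; `iter_chunks` and the dummy waveform are not part of FunASR):

```python
# Iterate over a waveform in 600 ms chunks, setting is_final=True on the
# last chunk, as the streaming note requires.
SAMPLE_RATE = 16000
CHUNK_SAMPLES = int(SAMPLE_RATE * 0.6)  # 9600 samples per chunk

def iter_chunks(speech):
    total = len(speech)
    n_chunks = (total + CHUNK_SAMPLES - 1) // CHUNK_SAMPLES  # ceil division
    for i in range(n_chunks):
        chunk = speech[i * CHUNK_SAMPLES:(i + 1) * CHUNK_SAMPLES]
        yield chunk, i == n_chunks - 1  # (chunk, is_final)

speech = [0.0] * 20000  # ~1.25 s of silence as dummy audio
flags = [is_final for _, is_final in iter_chunks(speech)]
print(flags)  # [False, False, True]
```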
<details><summary>More Examples</summary>
### Voice Activity Detection (Non-Streaming)
```python
from funasr import AutoModel
@@ -216,9 +218,24 @@
res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)
```
### Speech Emotion Recognition
```python
from funasr import AutoModel
model = AutoModel(model="emotion2vec_plus_large")
wav_file = f"{model.model_path}/example/test.wav"
res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(res)
```
For more details, see the [tutorial docs](docs/tutorial/README_zh.md);
for more, see the [model examples](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining)
</details>
## Export ONNX
### Export from the Command Line
```shell
funasr/download/name_maps_from_hub.py
@@ -12,10 +12,30 @@
    "Whisper-large-v2": "iic/speech_whisper-large_asr_multilingual",
    "Whisper-large-v3": "iic/Whisper-large-v3",
    "Qwen-Audio": "Qwen/Qwen-Audio",
    "emotion2vec_plus_large": "iic/emotion2vec_plus_large",
    "emotion2vec_plus_base": "iic/emotion2vec_plus_base",
    "emotion2vec_plus_seed": "iic/emotion2vec_plus_seed",
}
name_maps_hf = {
    "": "",
    "paraformer": "funasr/paraformer-zh",
    "paraformer-zh": "funasr/paraformer-zh",
    "paraformer-en": "funasr/paraformer-zh",
    "paraformer-zh-streaming": "funasr/paraformer-zh-streaming",
    "fsmn-vad": "funasr/fsmn-vad",
    "ct-punc": "funasr/ct-punc",
    "ct-punc-c": "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    "fa-zh": "funasr/fa-zh",
    "cam++": "funasr/campplus",
    "Whisper-large-v2": "iic/speech_whisper-large_asr_multilingual",
    "Whisper-large-v3": "iic/Whisper-large-v3",
    "Qwen-Audio": "Qwen/Qwen-Audio",
    "emotion2vec_plus_large": "emotion2vec/emotion2vec_plus_large",
    "iic/emotion2vec_plus_large": "emotion2vec/emotion2vec_plus_large",
    "emotion2vec_plus_base": "emotion2vec/emotion2vec_plus_base",
    "iic/emotion2vec_plus_base": "emotion2vec/emotion2vec_plus_base",
    "emotion2vec_plus_seed": "emotion2vec/emotion2vec_plus_seed",
    "iic/emotion2vec_plus_seed": "emotion2vec/emotion2vec_plus_seed",
}
name_maps_openai = {
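These tables map short model aliases to hub repository ids. A resolver built on top of them might look like the following sketch (`resolve_model_id` is a hypothetical helper, not FunASR's API; the maps below are abbreviated copies of the ones above):

```python
# Hypothetical resolver: pick the map for the requested hub and return the
# mapped repository id, falling back to the input name if no alias matches.
name_maps_ms = {
    "emotion2vec_plus_large": "iic/emotion2vec_plus_large",
}
name_maps_hf = {
    "emotion2vec_plus_large": "emotion2vec/emotion2vec_plus_large",
    "iic/emotion2vec_plus_large": "emotion2vec/emotion2vec_plus_large",
}

def resolve_model_id(name, hub="ms"):
    table = name_maps_ms if hub == "ms" else name_maps_hf
    return table.get(name, name)

print(resolve_model_id("emotion2vec_plus_large", hub="hf"))
# emotion2vec/emotion2vec_plus_large
```

Note that the Hugging Face map also keys the ModelScope-style id (`iic/...`), so a name resolved for one hub still maps correctly when passed to the other.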
setup.py
@@ -39,7 +39,7 @@
        "jaconv",
        "hydra-core>=1.3.2",
        "tensorboardX",
        "rotary_embedding_torch",
        # "rotary_embedding_torch",
        "openai-whisper",
    ],
    # train: The modules invoked when training only.