
```
Note: `chunk_size` configures the streaming latency. `[0,10,5]` means the real-time display granularity is `10*60=600ms` and the lookahead is `5*60=300ms`. Each inference input covers `600ms` (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, `is_final=True` must be set to force output of the final word.

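The latency arithmetic in the note above can be checked directly; a minimal sketch in plain Python (no FunASR or model download needed):

```python
# Arithmetic behind the streaming note: each unit in chunk_size
# corresponds to one 60 ms frame at a 16 kHz sample rate.
frame_ms = 60
sample_rate = 16000

chunk_size = [0, 10, 5]
display_ms = chunk_size[1] * frame_ms                 # 10 * 60 = 600 ms display granularity
lookahead_ms = chunk_size[2] * frame_ms               # 5 * 60 = 300 ms lookahead
samples_per_chunk = sample_rate * display_ms // 1000  # 16000 * 0.6 = 9600 samples per input

print(display_ms, lookahead_ms, samples_per_chunk)  # 600 300 9600
```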
<details><summary>More Examples</summary>

### Voice Activity Detection (Non-Streaming)
```python
from funasr import AutoModel

model = AutoModel(model="fsmn-vad")

wav_file = f"{model.model_path}/example/vad_example.wav"
res = model.generate(input=wav_file)
print(res)
```
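The non-streaming VAD model returns speech segments as millisecond timestamps, `[[beg_ms, end_ms], ...]`. A small helper (hypothetical, not part of FunASR) to convert these into sample index ranges for slicing the waveform:

```python
# Hypothetical helper (not part of FunASR): convert VAD segments given in
# milliseconds into sample index ranges at a given sample rate.
def segments_to_samples(segments, sample_rate=16000):
    ms_to_samples = lambda ms: ms * sample_rate // 1000
    return [[ms_to_samples(beg), ms_to_samples(end)] for beg, end in segments]

print(segments_to_samples([[70, 2340], [2620, 6200]]))
# [[1120, 37440], [41920, 99200]]
```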


### Speech Emotion Recognition
```python
from funasr import AutoModel

model = AutoModel(model="emotion2vec_plus_large")

wav_file = f"{model.model_path}/example/test.wav"

res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(res)
```
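The emotion model returns a list of records with parallel `labels` and `scores` lists; a hypothetical post-processing sketch (not part of FunASR) to pick the top-scoring emotion:

```python
# Hypothetical post-processing (not part of FunASR): select the
# highest-scoring label from an emotion2vec-style result, i.e. a list
# of dicts with parallel "labels" and "scores" lists.
def top_emotion(result):
    rec = result[0]
    best = max(range(len(rec["scores"])), key=lambda i: rec["scores"][i])
    return rec["labels"][best], rec["scores"][best]

example = [{"labels": ["angry", "happy", "neutral"], "scores": [0.1, 0.7, 0.2]}]
print(top_emotion(example))  # ('happy', 0.7)
```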

For more usage, see the [docs](docs/tutorial/README_zh.md);
for more examples, see the [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining).

</details>

## Installation

```shell
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./
```
To use industrial pretrained models, install modelscope and huggingface_hub (optional):

```shell
pip3 install -U modelscope huggingface_hub
```

## Model Zoo

## Export ONNX
### Export from command-line
```shell
funasr-export ++model=paraformer ++quantize=false
```