| | |
| | | |
| | | ([简体中文](./README_zh.md)|English) |
| | | |
| | | # FunASR: A Fundamental End-to-End Speech Recognition Toolkit |
| | | |
| | | |
| | |
| | | <a name="Installation"></a> |
| | | ## Installation |
| | | |
| | | - Requirements |
| | | ```text |
| | | python>=3.8 |
| | | torch>=1.13 |
| | | torchaudio |
| | | ``` |
| | | |
| | | - Install from PyPI |
| | | ```shell |
| | | pip3 install -U funasr |
| | | ``` |
| | | - Or install from source code |
| | | ``` sh |
| | | git clone https://github.com/alibaba/FunASR.git && cd FunASR |
| | | pip3 install -e ./ |
| | | ``` |
| | | - Install modelscope or huggingface_hub for the pretrained models (Optional) |
| | | |
| | | ```shell |
| | | pip3 install -U modelscope huggingface_hub |
| | | ``` |
| | | |
| | | ## Model Zoo |
| | |
| | | ``` |
| | | Note: `chunk_size` configures the streaming latency. `[0,10,5]` means that the real-time display granularity is `10*60=600ms` and the lookahead is `5*60=300ms`. Each inference input is `600ms` of audio (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, set `is_final=True` to force output of the last word. |
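
The arithmetic in the note can be checked with a small sketch. It assumes a 60 ms frame and 16 kHz audio, as the note implies; the helper names `chunk_timing` and `iter_chunks` are hypothetical and are not part of the FunASR API.

```python
SAMPLE_RATE = 16000  # Hz, the rate implied by the note
FRAME_MS = 60        # ms per chunk_size unit, the value implied by the note


def chunk_timing(chunk_size, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Hypothetical helper: return (chunk_ms, lookahead_ms, samples_per_chunk)."""
    chunk_ms = chunk_size[1] * frame_ms        # display granularity
    lookahead_ms = chunk_size[2] * frame_ms    # lookahead context
    samples_per_chunk = sample_rate * chunk_ms // 1000
    return chunk_ms, lookahead_ms, samples_per_chunk


def iter_chunks(num_samples, stride):
    """Hypothetical helper: yield (start, end, is_final) for each chunk."""
    total = max(1, -(-num_samples // stride))  # ceiling division
    for i in range(total):
        start = i * stride
        end = min(start + stride, num_samples)
        yield start, end, i == total - 1       # only the last chunk is final


chunk_ms, lookahead_ms, stride = chunk_timing([0, 10, 5])
print(chunk_ms, lookahead_ms, stride)

# Walk a 20000-sample signal; the last chunk is flagged as final.
for start, end, is_final in iter_chunks(20000, stride):
    print(start, end, is_final)
```

Only the last chunk carries `is_final=True`, which matches the note's requirement for flushing the last word.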
| | | |
| | | <details><summary>More Examples</summary> |
| | | |
| | | ### Voice Activity Detection (Non-Streaming) |
| | | ```python |
| | | from funasr import AutoModel |
| | |
| | | res = model.generate(input=(wav_file, text_file), data_type=("sound", "text")) |
| | | print(res) |
| | | ``` |
| | | |
| | | |
| | | ### Speech Emotion Recognition |
| | | ```python |
| | | from funasr import AutoModel |
| | | |
| | | model = AutoModel(model="emotion2vec_plus_large") |
| | | |
| | | wav_file = f"{model.model_path}/example/test.wav" |
| | | |
| | | res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False) |
| | | print(res) |
| | | ``` |
| | | |
| | | For more usage details, see the [docs](docs/tutorial/README_zh.md); |
| | | for more examples, see the [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining). |
| | | |
| | | </details> |
| | | |
| | | ## Export ONNX |
| | | |