| | |
| | | Notes: Recognition is supported for a single audio file as well as for a file list in Kaldi-style wav.scp format: `wav_id wav_path` |
| | | |
| | | ### Speech Recognition (Non-streaming) |
| | | #### SenseVoice |
| | | ```python |
| | | from funasr import AutoModel |
| | | from funasr.utils.postprocess_utils import rich_transcription_postprocess |
| | | |
| | | model_dir = "iic/SenseVoiceSmall" |
| | | |
| | | model = AutoModel( |
| | |     model=model_dir, |
| | |     vad_model="fsmn-vad", |
| | |     vad_kwargs={"max_single_segment_time": 30000}, |
| | |     device="cuda:0", |
| | | ) |
| | | |
| | | # English example |
| | | res = model.generate( |
| | |     input=f"{model.model_path}/example/en.mp3", |
| | |     cache={}, |
| | |     language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech" |
| | |     use_itn=True, |
| | |     batch_size_s=60, |
| | |     merge_vad=True,  # merge short VAD segments, up to merge_length_s seconds |
| | |     merge_length_s=15, |
| | | ) |
| | | text = rich_transcription_postprocess(res[0]["text"]) |
| | | print(text) |
| | | ``` |
| | | #### Paraformer |
| | | ```python |
| | | from funasr import AutoModel |
| | | # paraformer-zh is a multi-functional ASR model |