| | |
| | | ``` |
| | | Note: `chunk_size` configures streaming latency. `[0,10,5]` means the real-time display granularity is `10*60=600ms` and the lookahead is `5*60=300ms`. Each inference input is `600ms` of audio (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, set `is_final=True` to force the output of the last word. |
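The arithmetic in the note above can be checked with a small sketch. It assumes, as the note states, a 60 ms unit per `chunk_size` entry and 16 kHz audio. The helper name `chunk_timing` is hypothetical and is not part of FunASR.

```python
SAMPLE_RATE = 16000  # Hz, as assumed in the note
FRAME_MS = 60        # duration of one chunk_size unit, per the note


def chunk_timing(chunk_size, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Return (chunk_ms, lookahead_ms, samples_per_chunk) for a chunk_size triple.

    Hypothetical helper for illustration only. The second entry is read as the
    display granularity and the third as the lookahead, following the note.
    """
    _, chunk_frames, lookahead_frames = chunk_size
    chunk_ms = chunk_frames * frame_ms
    lookahead_ms = lookahead_frames * frame_ms
    samples_per_chunk = sample_rate * chunk_ms // 1000
    return chunk_ms, lookahead_ms, samples_per_chunk


print(chunk_timing([0, 10, 5]))  # (600, 300, 9600)
```

With `[0,10,5]`, each inference call therefore consumes 9600 samples of 16 kHz audio.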
| | | |
| | | ### Voice Activity Detection (streaming) |
| | | ### Voice Activity Detection (Non-Streaming) |
| | | ```python |
| | | from funasr import AutoModel |
| | | |
| | |
| | | res = model.generate(input=wav_file) |
| | | print(res) |
| | | ``` |
| | | ### Voice Activity Detection (Non-streaming) |
| | | ### Voice Activity Detection (Streaming) |
| | | ```python |
| | | from funasr import AutoModel |
| | | |
| | |
| | | |
| | | res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav", |
| | | hotword='达摩院 魔搭', |
| | | # sentence_timestamp=True, |
| | | # sentence_timestamp=True, # return sentence level information when spk_model is not given |
| | | ) |
| | | print(res) |
| | |
| | | if overlap > max_overlap: |
| | | max_overlap = overlap |
| | | sentence_spk = spk |
| | | d['spk'] = sentence_spk |
| | | d['spk'] = int(sentence_spk) |
| | | sd_sentence_list.append(d) |
| | | return sd_sentence_list |
| | | |