| | |
| | | When you input long audio and encounter Out Of Memory (OOM) issues, since memory usage tends to increase quadratically with audio length, consider the following three scenarios: |
| | | |
| | | a) At the beginning of inference, memory usage primarily depends on `batch_size_s`. Appropriately reducing this value can decrease memory usage. |
| | | |
| | | b) During the middle of inference, when encountering long audio segments cut by VAD and the total token count is less than `batch_size_s`, yet still facing OOM, you can appropriately reduce `batch_size_threshold_s`. If the threshold is exceeded, the batch size is forced to 1. |
| | | |
| | | c) Towards the end of inference, if long audio segments cut by VAD have a total token count less than `batch_size_s` and exceed the `threshold` batch_size_threshold_s, forcing the batch size to 1 and still facing OOM, you may reduce `max_single_segment_time` to shorten the VAD audio segment length. |
| | | |
| | | #### Speech Recognition (Streaming) |
| | |
| | | from funasr import AutoModel |
| | | |
| | | model = AutoModel(model="fsmn-vad") |
| | | wav_file = f"{model.model_path}/example/asr_example.wav" |
| | | wav_file = f"{model.model_path}/example/vad_example.wav" |
| | | res = model.generate(input=wav_file) |
| | | print(res) |
| | | ``` |
| | |
| | | ++train_conf.validate_interval=2000 \ |
| | | ++train_conf.save_checkpoint_interval=2000 \ |
| | | ++train_conf.keep_nbest_models=20 \ |
| | | ++train_conf.avg_nbest_model=5 \ |
| | | ++train_conf.avg_nbest_model=10 \ |
| | | ++optim_conf.lr=0.0002 \ |
| | | ++output_dir="${output_dir}" &> ${log_file} |
| | | ``` |
| | |
| | | print(result) |
| | | ``` |
| | | |
| | | More examples ref to [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/onnxruntime) |
| | | More examples ref to [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/onnxruntime) |