From 2868fe3df4e92a6ae3e327faf6e57ea492e04124 Mon Sep 17 00:00:00 2001
From: 志浩 <neo.dzh@alibaba-inc.com>
Date: Thu, 16 Mar 2023 19:24:21 +0800
Subject: [PATCH] Merge branch 'main' into dev_dzh

---
 egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md | 25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md
new file mode 100644
index 0000000..5488aaa
--- /dev/null
+++ b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md
@@ -0,0 +1,25 @@
+# ModelScope Model
+
+## How to finetune and infer using a pretrained ModelScope Model
+
+### Inference
+
+You can use the pretrained model, or a finetuned one, for inference directly.
+
+- Set the parameters in `infer.py`:
+    - <strong>audio_in:</strong> # supports wav files, URLs, bytes, and parsed audio.
+    - <strong>text_in:</strong> # supports text and text URLs.
+    - <strong>output_dir:</strong> # must be set if the input is a wav.scp file.
+
+- Then run the inference pipeline with:
+```shell
+python infer.py
+```
+
+
+Modify the inference-related parameters in `vad.yaml`:
+
+- max_end_silence_time: the trailing-silence duration used to decide that a sentence has ended; range 500 ms to 6000 ms, default 800 ms.
+- speech_noise_thres: balances the speech and silence scores; range (-1, 1).
+    - The closer the value is to -1, the more likely noise is judged as speech.
+    - The closer the value is to 1, the more likely speech is judged as noise.
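The README's rules for the `infer.py` parameters (`audio_in`, `text_in`, `output_dir`) can be captured in a small pre-flight check. This is a minimal sketch: `check_infer_args` is a hypothetical helper, not part of `infer.py` or the ModelScope API, and it only encodes the stated rule that `output_dir` must be set when `audio_in` points at a wav.scp list (a Kaldi-style file of `<utt_id> <wav_path>` lines).

```python
from typing import Any, Optional


def check_infer_args(audio_in: Any, text_in: str, output_dir: Optional[str]) -> None:
    """Hypothetical pre-flight check for the three infer.py parameters.

    Not part of infer.py or ModelScope; it mirrors the README notes only.
    """
    # audio_in may be a wav path, URL, raw bytes, or already-parsed audio,
    # so only reject a missing value instead of testing truthiness.
    if audio_in is None:
        raise ValueError("audio_in is required: a wav path, URL, bytes, or parsed audio")
    if not text_in:
        raise ValueError("text_in is required: text or a text URL")
    # Only a string path can name a wav.scp list; bytes or arrays are single clips.
    is_scp = isinstance(audio_in, str) and audio_in.endswith(".scp")
    if is_scp and output_dir is None:
        raise ValueError("output_dir must be set when audio_in is a wav.scp file")
```

A single wav file or raw bytes passes with `output_dir=None`, while `check_infer_args("data/wav.scp", "data/text", None)` raises `ValueError`.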
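To make the two `vad.yaml` parameters concrete, the following is a simplified illustration of how they could act on per-frame scores. It is an assumption-laden sketch, not FunASR's VAD implementation: it assumes each frame has a speech score and a noise score in [0, 1], treats a frame as speech when the speech score exceeds the noise score by at least `speech_noise_thres`, and ends a sentence once trailing silence reaches `max_end_silence_time` milliseconds. The helper names and the fixed `frame_ms` are hypothetical.

```python
from typing import List


def is_speech_frame(speech_score: float, noise_score: float,
                    speech_noise_thres: float) -> bool:
    # Threshold toward -1: even frames scored as mostly noise pass as speech.
    # Threshold toward 1: even frames scored as mostly speech are rejected as noise.
    return speech_score >= noise_score + speech_noise_thres


def find_sentence_end(frame_is_speech: List[bool], frame_ms: int,
                      max_end_silence_time: int) -> int:
    """Return the index of the frame at which the sentence is declared over,
    or -1 if trailing silence never reaches max_end_silence_time (in ms)."""
    silence_ms = 0
    seen_speech = False
    for i, speech in enumerate(frame_is_speech):
        if speech:
            # Any speech frame resets the trailing-silence counter.
            seen_speech = True
            silence_ms = 0
        elif seen_speech:
            # Only silence that follows speech counts toward the end point.
            silence_ms += frame_ms
            if silence_ms >= max_end_silence_time:
                return i
    return -1
```

With 10 ms frames and the default `max_end_silence_time` of 800 ms, five speech frames followed by silence end the sentence at frame index 84, the 80th silent frame. Counting silence in milliseconds rather than frames keeps the parameter directly comparable to the documented 500 ms to 6000 ms range.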