From beef97a2fcb30337006b8102e6f0c0ca1d9f19e0 Mon Sep 17 00:00:00 2001
From: 游雁 <zhifu.gzf@alibaba-inc.com>
Date: Wed, 17 Jul 2024 10:38:08 +0800
Subject: [PATCH] update

---
 README_zh.md |    6 ++----
 README.md    |   16 +++++++---------
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 525b563..4374a2f 100644
--- a/README.md
+++ b/README.md
@@ -156,15 +156,13 @@ text = rich_transcription_postprocess(res[0]["text"])
 print(text)
 ```
 
-Parameter Descriptions:
-- `model_dir`: The name of the model, or the model's path on the local disk.
-- `trust_remote_code`:
-  - When set to `True`, this indicates that the model's code implementation should be loaded from the location specified by `remote_code`, which points to the exact code for the model (for example, `model.py` in the current directory). It supports absolute paths, relative paths, and network URLs.
-  - When set to `False`, this signifies that the model's code implementation is the integrated version within [FunASR](https://github.com/modelscope/FunASR). In this case, any modifications to `model.py` in the current directory will not take effect because the version loaded is the internal one from FunASR. For the model code, [click here to view](https://github.com/modelscope/FunASR/tree/main/funasr/models/sense_voice).
-- `max_single_segment_time`: The maximum length of audio segments that the `vad_model` can cut, measured in milliseconds (ms).
-- `use_itn`: Indicates whether the output should include punctuation and inverse text normalization.
-- `batch_size_s`: Represents a dynamic batch size where the total duration of the audio in the batch is measured in seconds (s).
-- `merge_vad`: Whether to concatenate short audio fragments cut by the vad model, with the merged length being `merge_length_s`, measured in seconds (s).
+Parameter Description:
+- `model_dir`: The name of the model, or the path to the model on the local disk.
+- `vad_model`: Enables VAD (Voice Activity Detection), which splits long audio into shorter clips. With VAD enabled, the measured inference time covers both VAD and SenseVoice, so it is the end-to-end latency. To measure the SenseVoice model's inference time on its own, disable the VAD model.
+- `vad_kwargs`: The configuration for the VAD model. `max_single_segment_time`: the maximum duration of an audio segment cut by the `vad_model`, in milliseconds (ms).
+- `use_itn`: Whether the output includes punctuation and inverse text normalization.
+- `batch_size_s`: Enables dynamic batching; the value is the total duration of audio in a batch, in seconds (s).
+- `merge_vad`: Whether to merge short audio fragments segmented by the VAD model; the merged length is `merge_length_s`, in seconds (s).
 
 #### Paraformer
 ```python
diff --git a/README_zh.md b/README_zh.md
index 5b3985f..bd6d1cd 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -157,10 +157,8 @@
 ```
 Parameter description:
 - `model_dir`: The model name, or the path to the model on the local disk.
-- `trust_remote_code`:
-  - `True` means the model code implementation is loaded from `remote_code`. `remote_code` specifies the location of the `model` code (for example, `model.py` in the current directory) and supports absolute paths, relative paths, and network URLs.
-  - `False` means the model code implementation is the version integrated in [FunASR](https://github.com/modelscope/FunASR). In this case, modifying `model.py` in the current directory has no effect, because the version loaded is the internal FunASR one. For the model code, [click to view](https://github.com/modelscope/FunASR/tree/main/funasr/models/sense_voice).
-- `max_single_segment_time`: The maximum audio duration cut by the `vad_model`, in milliseconds (ms).
+- `vad_model`: Enables VAD, which splits long audio into short clips. In this case the inference time includes the total time of VAD and SenseVoice, i.e. the end-to-end latency. To measure the SenseVoice model's time separately, disable the VAD model.
+- `vad_kwargs`: The VAD model configuration. `max_single_segment_time`: the maximum audio duration cut by the `vad_model`, in milliseconds (ms).
 - `use_itn`: Whether the output includes punctuation and inverse text normalization.
 - `batch_size_s`: Enables dynamic batching; the value is the total duration of audio in a batch, in seconds (s).
 - `merge_vad`: Whether to merge the short audio fragments cut by the vad model; the merged length is `merge_length_s`, in seconds (s).
-- 
Gitblit v1.9.1
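
Note on the parameters documented in this patch: the sketch below illustrates, in plain Python, what "merging short VAD fragments up to `merge_length_s`" and "dynamic batching by total duration `batch_size_s`" could look like. It is not FunASR's implementation. The function names `merge_short_segments` and `batch_by_duration`, the `(start_ms, end_ms)` segment representation, and the greedy strategy are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical representation of one VAD segment: (start_ms, end_ms).
Segment = Tuple[int, int]


def merge_short_segments(segments: List[Segment], merge_length_s: float) -> List[Segment]:
    """Greedily merge consecutive segments while the merged span
    (from the first start to the latest end) stays within merge_length_s."""
    limit_ms = int(merge_length_s * 1000)
    merged: List[Segment] = []
    for start, end in segments:
        if merged and end - merged[-1][0] <= limit_ms:
            # Extend the previous merged segment to cover this fragment.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def batch_by_duration(segments: List[Segment], batch_size_s: float) -> List[List[Segment]]:
    """Group segments so that the total audio duration of each batch stays
    within batch_size_s. A single segment longer than the limit still gets
    a batch of its own."""
    limit_ms = int(batch_size_s * 1000)
    batches: List[List[Segment]] = []
    current: List[Segment] = []
    current_ms = 0
    for seg in segments:
        duration = seg[1] - seg[0]
        if current and current_ms + duration > limit_ms:
            # Adding this segment would exceed the budget: close the batch.
            batches.append(current)
            current, current_ms = [], 0
        current.append(seg)
        current_ms += duration
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    vad_output = [(0, 4000), (4200, 9000), (9500, 14000), (14500, 31000)]
    merged = merge_short_segments(vad_output, merge_length_s=15)
    print(merged)
    print(batch_by_duration(merged, batch_size_s=20))
```

In this sketch the batch size is a time budget rather than a fixed number of items, which is the idea the `batch_size_s` description conveys: batches of short clips hold many items, while batches of long clips hold few.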