([简体中文](./README_zh.md)|English)

FunASR has open-sourced a large number of pre-trained models on industrial data. You are free to use, copy, modify, and share FunASR models under the [Model License Agreement](https://github.com/alibaba-damo-academy/FunASR/blob/main/MODEL_LICENSE). Below, we list some representative models. For a comprehensive list, please refer to our [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/tree/main/model_zoo).

<div align="center">
<h4>
<a href="#Inference"> Model Inference </a>
| <a href="#Training"> Model Training and Testing </a>
| <a href="#Export"> Model Export and Testing </a>
| <a href="#new-model-registration-tutorial"> New Model Registration Tutorial </a>
</h4>
</div>

When you input long audio and encounter Out Of Memory (OOM) issues, note that memory usage tends to grow quadratically with audio length. Consider the following three scenarios:

a) At the beginning of inference, memory usage primarily depends on `batch_size_s`. Reducing this value appropriately decreases memory usage.

b) During the middle of inference, if long audio segments cut by VAD keep the total token count in the batch below `batch_size_s` and you still hit OOM, reduce `batch_size_threshold_s`: any segment whose duration exceeds this threshold forces the batch size to 1.

c) Towards the end of inference, if long audio segments cut by VAD keep the total token count below `batch_size_s` and exceed `batch_size_threshold_s` (so the batch size is already forced to 1) and you still hit OOM, reduce `max_single_segment_time` to shorten the maximum VAD segment length.

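As a concrete illustration, the three knobs are set as follows; this is a minimal sketch, and the model names and values are examples rather than recommendations:

```python
from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},  # scenario c): max VAD segment length, in ms
)

res = model.generate(
    input="long_audio.wav",
    batch_size_s=300,           # scenario a): total audio duration per dynamic batch, in seconds
    batch_size_threshold_s=60,  # scenario b): segments longer than this run with batch size 1
)
print(res)
```
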
#### Voice Activity Detection (Non-Streaming)

```python
from funasr import AutoModel

model = AutoModel(model="fsmn-vad")

wav_file = f"{model.model_path}/example/vad_example.wav"
res = model.generate(input=wav_file)
print(res)
```

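The VAD output is a list of segments in the form `[[beg1, end1], [beg2, end2], ...]`, where each `beg`/`end` is the start/end point of a detected speech segment, measured in milliseconds.
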
### Detailed Parameter Description

```shell
funasr/bin/train_ds.py \
++model="${model_name_or_model_dir}" \
++train_data_set_list="${train_data}" \
++valid_data_set_list="${val_data}"
```

#### Multi-GPU Training (Single Machine)

```shell
gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')

torchrun --nnodes 1 --nproc_per_node ${gpu_num} \
../../../funasr/bin/train_ds.py ${train_args}
```

`--nnodes` specifies the total number of participating nodes, and `--nproc_per_node` specifies the number of processes to run on each node.

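The `${train_args}` variable used above simply collects the hydra-style `++` overrides described earlier. A minimal, illustrative sketch (the model name and paths are placeholders):

```shell
# collect training overrides into a single variable for torchrun
train_args="++model=paraformer-zh ++train_data_set_list=../../../data/list/train.jsonl ++valid_data_set_list=../../../data/list/val.jsonl ++output_dir=./outputs"
```
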
#### Multi-GPU Training (Multiple Machines)

On the master node (assuming the IP is 192.168.1.1 and the port is 12345), run:

```shell
gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')

torchrun --nnodes 2 --node_rank 0 --nproc_per_node ${gpu_num} --master_addr=192.168.1.1 --master_port=12345 \
../../../funasr/bin/train_ds.py ${train_args}
```

On the worker node (assuming the IP is 192.168.1.2), make sure `--master_addr` and `--master_port` match the values used on the master node, set `--node_rank` to 1, and run the same command:

```shell
gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')

torchrun --nnodes 2 --node_rank 1 --nproc_per_node ${gpu_num} --master_addr=192.168.1.1 --master_port=12345 \
../../../funasr/bin/train_ds.py ${train_args}
```

`--nnodes` indicates the total number of nodes participating in the training, `--node_rank` is the rank (ID) of the current node, and `--nproc_per_node` specifies the number of processes to run on each node (usually one per GPU).

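Note that `gpu_num` is derived by counting the comma-separated entries in `CUDA_VISIBLE_DEVICES`, so that variable must be set before launching; an illustrative example:

```shell
# expose 4 GPUs on this node; gpu_num will evaluate to 4
export CUDA_VISIBLE_DEVICES="0,1,2,3"
```
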
Training and validation lists are in jsonl format. They can be generated from a wav.scp / text.txt pair with FunASR's `scp2jsonl` tool; a sketch with illustrative paths:

```shell
scp2jsonl \
++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt"]' \
++data_type_list='["source", "target"]' \
++jsonl_file_in="../../../data/list/train.jsonl"
```

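Each line of the resulting jsonl pairs one audio file with its transcript, roughly like this (the field values are illustrative):

```json
{"key": "ID0012W0014", "source": "/path/to/audio.wav", "source_len": 55, "target": "transcript text", "target_len": 12}
```
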
#### Large-Scale Data Training
When dealing with large datasets (e.g., 50,000 hours or more), memory issues may arise, especially in multi-GPU experiments. To address this, split the jsonl file into slices, list the slices in a plain-text file (one per line), and set `data_split_num`. For example:
```shell
train_data="/root/data/list/data.list"

funasr/bin/train_ds.py \
++train_data_set_list="${train_data}" \
++dataset_conf.data_split_num=256
```

**Details:**
- `data.list`: A plain text file listing the split jsonl files. For example, the content of `data.list` might be:
```bash
data/list/train.0.jsonl
data/list/train.1.jsonl
...
```
- `data_split_num`: Specifies the number of slice groups. For instance, if `data.list` contains 512 lines and `data_split_num=256`, the data is divided into 256 groups of 2 jsonl files each, so only 2 jsonl files are loaded at a time, reducing memory usage during training. Note: groups are created sequentially.

**Recommendation:**
If the dataset is extremely large and contains heterogeneous data types, perform **data balancing** during splitting to ensure uniformity across groups.

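A minimal sketch of producing the slices and `data.list` with GNU coreutils (the paths and slice count are illustrative; any method that keeps one JSON object per line works):

```shell
# split one large jsonl into 512 line-aligned slices: train.000.jsonl ... train.511.jsonl
split -n l/512 -d -a 3 --additional-suffix=.jsonl data/list/train.jsonl data/list/train.

# list the slices, one per line
ls data/list/train.*.jsonl > data/list/data.list
```
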
#### Training log

<a name="Export"></a>
## Model Export and Testing

```python
from funasr import AutoModel

# any exportable FunASR model works here; "paraformer" is used as an example
model = AutoModel(model="paraformer")

res = model.export(quantize=False)
```

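If the export succeeds, an ONNX file (e.g. `model.onnx`) is written into the model directory; the next step optionally slims it.
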
### Optimize ONNX
```shell
# pip3 install -U onnxslim
onnxslim model.onnx model.onnx  # slims the graph in place, overwriting model.onnx
```

### Test ONNX
```python
# pip3 install -U funasr-onnx
from funasr_onnx import Paraformer

# model_dir is illustrative: point it at the directory containing the exported model.onnx
model_dir = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
model = Paraformer(model_dir, batch_size=1)

wav_path = ["asr_example.wav"]  # replace with your own wav file paths
result = model(wav_path)
print(result)
```

For more examples, refer to the [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/onnxruntime).

<a name="new-model-registration-tutorial"></a>
## New Model Registration Tutorial

### Viewing the Registry

```python
from funasr.register import tables

tables.print()
```

You can also view the registry of a specific type, e.g. `tables.print("model")`.

### Registering Models

```python
import torch.nn as nn

from funasr.register import tables


@tables.register("model_classes", "SenseVoiceSmall")
class SenseVoiceSmall(nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
        ...

    def forward(
        self,
        **kwargs,
    ):
        ...

    def inference(
        self,
        data_in,
        data_lengths=None,
        key: list = None,
        tokenizer=None,
        frontend=None,
        **kwargs,
    ):
        ...
```

Add `@tables.register("model_classes", "SenseVoiceSmall")` above the definition of the class you want to register. The class must implement the `__init__`, `forward`, and `inference` methods.

Complete code: [https://github.com/modelscope/FunASR/blob/main/funasr/models/sense_voice/model.py#L443](https://github.com/modelscope/FunASR/blob/main/funasr/models/sense_voice/model.py#L443)

After registration, specify the newly registered model in config.yaml to define the model:

```yaml
model: SenseVoiceSmall
model_conf:
  ...
```

[More detailed tutorial documents](https://github.com/modelscope/FunASR/blob/main/docs/tutorial/Tables.md)