# ModelScope Model
## How to finetune and infer using a pretrained Paraformer-large Model
### Finetune
- Modify the finetuning-related parameters in `finetune.py`
- output_dir: # result dir
- data_dir: # the dataset dir, which must contain the files `train/wav.scp`, `train/text`, `validation/wav.scp`, and `validation/text`
- dataset_type: # for dataset larger than 1000 hours, set as `large`, otherwise set as `small`
- batch_bins: # batch size. When `dataset_type` is `small`, `batch_bins` is measured in feature frames; when `dataset_type` is `large`, it is measured in milliseconds of audio
- max_epoch: # number of training epochs
- lr: # learning rate
- Then you can run the pipeline to finetune with:
```shell
python finetune.py
```
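The `train/wav.scp` and `train/text` files above follow the common Kaldi-style format: one utterance per line, with an utterance ID followed by a wav path or a transcript. The helper below is a hypothetical sketch (the utterance IDs and paths are made up) showing how to write such a `data_dir` and check that both files cover the same utterance IDs:

```python
# Hypothetical helper for preparing a Kaldi-style data_dir.
# The file names (train/wav.scp, train/text) come from the README;
# the sample utterance IDs and paths are illustrative only.
from pathlib import Path

def write_data_dir(root, split, utts):
    """utts: list of (utt_id, wav_path, transcript) tuples."""
    d = Path(root) / split
    d.mkdir(parents=True, exist_ok=True)
    with open(d / "wav.scp", "w") as scp, open(d / "text", "w") as txt:
        for utt_id, wav, trans in utts:
            scp.write(f"{utt_id} {wav}\n")    # one "utt_id wav_path" per line
            txt.write(f"{utt_id} {trans}\n")  # one "utt_id transcript" per line

def check_alignment(root, split):
    """Both files must list the same utterance IDs, or the data is inconsistent."""
    d = Path(root) / split
    ids = lambda name: {line.split(maxsplit=1)[0] for line in open(d / name)}
    return ids("wav.scp") == ids("text")

write_data_dir("data", "train", [("utt001", "/audio/utt001.wav", "hello world")])
print(check_alignment("data", "train"))  # True when the IDs match
```

The same layout applies to the `validation` split (and to `test` for inference).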
### Inference
You can also use the finetuned model for inference directly.
- Set the inference-related parameters in `infer.py`
- data_dir: # the dataset dir needs to include `test/wav.scp`. If `test/text` also exists, the CER will be computed
- output_dir: # result dir
- ngpu: # the number of GPUs for decoding
- njob: # the number of jobs for each GPU
- Then you can run the pipeline to infer with:
```shell
python infer.py
```
- Results
The decoding results can be found in `$output_dir/1best_recog/text.cer`, which contains the recognition result for each sample and the CER over the whole test set.
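The CER (character error rate) reported here is conventionally defined as the Levenshtein edit distance between the hypothesis and reference characters, divided by the reference length. The following is a standalone sketch of that metric, not the script's actual implementation:

```python
# CER = character-level Levenshtein edit distance / reference length.
# Standalone sketch of the standard metric definition.
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

print(cer("hello world", "hella world"))  # 1 substitution / 11 chars ≈ 0.0909
```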
### Inference using local finetuned model
- Modify the inference-related parameters in `infer_after_finetune.py`
- output_dir: # result dir
- data_dir: # the dataset dir needs to include `test/wav.scp`. If `test/text` also exists, the CER will be computed
- decoding_model_name: # set the checkpoint name for decoding, e.g., `valid.cer_ctc.ave.pb`
- Then you can run the pipeline to infer with:
```shell
python infer_after_finetune.py
```
- Results
The decoding results can be found in `$output_dir/decoding_results/text.cer`, which contains the recognition result for each sample and the CER over the whole test set.
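The `.ave` in a checkpoint name like `valid.cer_ctc.ave.pb` typically indicates a model obtained by averaging the parameters of several checkpoints, a common trick in speech recognition toolkits. A minimal sketch of parameter averaging, using plain Python lists in place of real tensors (the structure and names here are illustrative assumptions, not the toolkit's actual code):

```python
# Sketch of checkpoint parameter averaging: each "checkpoint" is a dict
# mapping a parameter name to a flat list of floats standing in for a tensor.
def average_checkpoints(checkpoints):
    """Element-wise mean of parameters across checkpoints of identical shape."""
    n = len(checkpoints)
    return {name: [sum(ckpt[name][i] for ckpt in checkpoints) / n
                   for i in range(len(checkpoints[0][name]))]
            for name in checkpoints[0]}

ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
print(average_checkpoints(ckpts))  # {'w': [2.0, 3.0]}
```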