| | |
| | | BAC009S0002W0124 自 六 月 底 呼 和 浩 特 市 率 先 宣 布 取 消 限 购 后 |
| | | ... |
| | | ``` |
| | | Both files have two columns: the first column contains the wav ids and the second column contains the corresponding wav paths or label tokens. |
| | | |
| | | ## Stage 1: Feature Generation |
| | | This stage extracts FBank features from the raw audio listed in `wav.scp` and applies speed perturbation as data augmentation according to `speed_perturb`. Users can set `nj` to control the number of parallel jobs for feature generation. The generated features are saved in `$feats_dir/dump/xxx/ark` and the corresponding `feats.scp` files are saved as `$feats_dir/dump/xxx/feats.scp`. An example of `feats.scp` is shown below: |
| | | * `feats.scp` |
| | | ``` |
| | | ... |
| | | BAC009S0002W0122_sp0.9 /nfs/funasr_data/aishell-1/dump/fbank/train/ark/feats.16.ark:592751055 |
| | | ... |
| | | ``` |
| | | Note that the samples in this file have already been shuffled randomly. The file contains two columns: the first column is the wav id and the second column is the Kaldi ark feature path. In addition, `speech_shape` and `text_shape` are generated in this stage, giving the speech feature shape and the text length of each sample. Examples are shown below: |
| | | * `speech_shape` |
| | | ``` |
| | | ... |
| | |
| | | BAC009S0002W0122_sp0.9 15 |
| | | ... |
| | | ``` |
| | | Each of these two files has two columns. The first column is the wav id; the second column is the speech feature shape (in `speech_shape`) or the text length (in `text_shape`). |
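The two-column layout described above can be read with a few lines of Python. This is a minimal sketch, not FunASR's own loader: the helper name `read_two_column` is hypothetical, and splitting the second column of `feats.scp` at the last `:` assumes the usual Kaldi convention of `<ark path>:<byte offset>`.

```python
from typing import Dict


def read_two_column(text: str) -> Dict[str, str]:
    """Parse lines of the form '<wav id> <value>' into a dict (hypothetical helper)."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "..." placeholders used in the examples.
        if not line or line == "...":
            continue
        # Split only at the first run of whitespace: the value may contain spaces.
        key, value = line.split(maxsplit=1)
        table[key] = value
    return table


# Sample rows copied from the examples in this stage.
feats = read_two_column(
    "...\n"
    "BAC009S0002W0122_sp0.9 /nfs/funasr_data/aishell-1/dump/fbank/train/ark/feats.16.ark:592751055\n"
    "...\n"
)
text_shape = read_two_column("BAC009S0002W0122_sp0.9 15\n")

# Assumed Kaldi convention: the number after the last ':' is an offset into the ark file.
ark_path, offset = feats["BAC009S0002W0122_sp0.9"].rsplit(":", 1)
text_length = int(text_shape["BAC009S0002W0122_sp0.9"])
```

The same helper works for `wav.scp` and `text`, since they share the wav-id-first layout.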
| | | |
| | | ## Stage 2: Dictionary Preparation |
| | | This stage processes the dictionary, which maps label characters to integer indices during ASR training. The processed dictionary file is saved as `$feats_dir/data/$lang_token_list/$token_type/tokens.txt`. An example of `tokens.txt` is shown below: |
| | | * `tokens.txt` |
| | | ``` |
| | | <blank> |
| | |
| | | * `<unk>`: indicates the out-of-vocabulary token |
| | | |
| | | ## Stage 3: Training |
| | | This stage trains the specified model. To start training, users should manually set `exp_dir`, `CUDA_VISIBLE_DEVICES` and `gpu_num`, which have already been explained above. By default, the best `$keep_nbest_models` checkpoints on the validation dataset are averaged to produce a better model, which is then used for decoding. |
| | | |
| | | * DDP Training |
| | | |
| | |
| | | |
| | | * DataLoader |
| | | |
| | | [comment]: <> (We support two types of DataLoaders for small and large datasets, respectively. By default, the small DataLoader is used and you can set `dataset_type=large` to enable large DataLoader. For small DataLoader, ) |
| | | We support an optional iterable-style DataLoader based on [PyTorch Iterable-style DataPipes](https://pytorch.org/data/beta/torchdata.datapipes.iter.html) for large datasets; users can set `dataset_type=large` to enable it. |
| | | |
| | | * Configuration |
| | | |
| | | The training parameters, including model, optimization, dataset, etc., can be set in a YAML file in the `conf` directory. Users can also set parameters directly in the `run.sh` recipe. Please avoid setting the same parameter in both the YAML file and the recipe. |
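One way to catch a parameter that is set in both places is to compare the two sets of names before launching training. The sketch below is illustrative only: `find_conflicts` is a hypothetical helper, the two dictionaries stand in for values already read from the YAML file and from `run.sh`, and the parameter values are made up.

```python
from typing import Dict, List


def find_conflicts(yaml_params: Dict[str, object], recipe_params: Dict[str, object]) -> List[str]:
    """Return the names of parameters that are set in both sources (hypothetical helper)."""
    return sorted(set(yaml_params) & set(recipe_params))


# Hypothetical values; in practice these would be loaded from the YAML file and the recipe.
yaml_params = {"max_epoch": 50, "lr": 0.0005}
recipe_params = {"max_epoch": 60, "gpu_num": 2}

conflicts = find_conflicts(yaml_params, recipe_params)
if conflicts:
    print("Set in both the YAML file and the recipe:", ", ".join(conflicts))
```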
| | | |
| | | * Training Steps |
| | | |
| | | Two parameters control the training length: `max_epoch` and `max_update`. `max_epoch` is the total number of training epochs, while `max_update` is the total number of training steps. If both are specified, training stops as soon as either limit is reached. |
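The stopping rule described above can be sketched as a small predicate. This is an illustration of the rule, not FunASR's trainer code: the function name `should_stop` and the use of `None` for an unset limit are assumptions.

```python
from typing import Optional


def should_stop(
    epoch: int,
    step: int,
    max_epoch: Optional[int] = None,
    max_update: Optional[int] = None,
) -> bool:
    """Return True once either configured limit is reached (None means 'not set')."""
    reached_epoch = max_epoch is not None and epoch >= max_epoch
    reached_update = max_update is not None and step >= max_update
    return reached_epoch or reached_update


# With both limits set, whichever is reached first ends training.
stop_by_update = should_stop(epoch=3, step=10000, max_epoch=50, max_update=10000)
stop_by_epoch = should_stop(epoch=50, step=4000, max_epoch=50, max_update=10000)
keep_going = should_stop(epoch=3, step=4000, max_epoch=50, max_update=10000)
```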
| | | |
| | | * Tensorboard |
| | | |
| | | Users can use TensorBoard to monitor the loss, learning rate, etc. Please run the following command: |
| | | ``` |
| | | tensorboard --logdir ${exp_dir}/exp/${model_dir}/tensorboard/train |
| | | ``` |
| | | |
| | | ## Stage 4: Decoding |
| | | This stage generates the recognition results and calculates the `CER` to evaluate the performance of the trained model. |
| | | |
| | | * Mode Selection |
| | | |
| | | Because the models supported in FunASR (paraformer, uniasr, conformer and others) have different inference interfaces, the `mode` parameter should be set to one of `asr`, `paraformer` or `uniasr` according to the trained model. |
| | | |
| | | * Configuration |
| | | |