# FunASR-1.x.x New Model Registration Tutorial

(简体中文|[English](./Tables.md))

The design goal of funasr-1.x.x is to make model integration simpler. Its core features are the registry and AutoModel:
* The registry lets new models be plugged in like building blocks during development, and is compatible with multiple tasks;

* Inference and training scripts are unified across academic and industrial models;

![image](https://github.com/user-attachments/assets/e34d0e0c-6c4a-4b80-b328-9a1b22b5b9ae)

# Quick start

## AutoModel-based usage

### Paraformer model

Takes speech of arbitrary duration as input and outputs the corresponding transcript, complete with punctuation and sentence breaks, character-level timestamps, and speaker labels.

```python
from funasr import AutoModel

model = AutoModel(model="paraformer-zh",
                  vad_model="fsmn-vad",  # voice activity detection for long audio
                  vad_kwargs={"max_single_segment_time": 60000},
                  punc_model="ct-punc",  # punctuation restoration
                  # spk_model="cam++"    # uncomment to enable speaker labels
                  )
wav_file = f"{model.model_path}/example/asr_example.wav"
res = model.generate(input=wav_file, batch_size_s=300, batch_size_threshold_s=60, hotword='魔搭')
print(res)
```

### SenseVoiceSmall model

```python
res = model.generate(input=[str], output_dir=[str])
```

The `input` parameter accepts:

* a wav file path, e.g., asr\_example.wav

* a pcm file path, e.g., asr\_example.pcm; in this case the audio sampling rate fs must be specified (default: 16000)

* an audio byte stream, e.g., byte data captured from a microphone

* a wav.scp file, a Kaldi-style wav list (`wav_id \t wav_path`), e.g.:


```plaintext
asr_example1 ./audios/asr_example1.wav
```

# Registry details

Taking the SenseVoiceSmall model as an example, this section explains how to register a new model. Model links:

**modelscope:** [https://www.modelscope.cn/models/iic/SenseVoiceSmall/files](https://www.modelscope.cn/models/iic/SenseVoiceSmall/files)

**huggingface:** [https://huggingface.co/FunAudioLLM/SenseVoiceSmall](https://huggingface.co/FunAudioLLM/SenseVoiceSmall)

## Model resource directory

![image](https://github.com/user-attachments/assets/13d7e487-e71f-4a39-b3e3-c46fe05c15d4)

**Configuration file**: config.yaml

```yaml
model: SenseVoiceSmall
model_conf:
  length_normalized_loss: true
  sos: 1
  eos: 2
  ignore_id: -1

encoder: SenseVoiceEncoderSmall
encoder_conf:
  output_size: 512
  attention_heads: 4
  linear_units: 2048
  num_blocks: 50
  tp_blocks: 20
  dropout_rate: 0.1
  positional_dropout_rate: 0.1
  attention_dropout_rate: 0.1
  input_layer: pe
  pos_enc_class: SinusoidalPositionEncoder
  normalize_before: true
  kernel_size: 11
  sanm_shfit: 0
  selfattention_layer_type: sanm

frontend: WavFrontend
frontend_conf:
  fs: 16000
  window: hamming
  n_mels: 80
  frame_length: 25
  frame_shift: 10
  lfr_m: 7
  lfr_n: 6
  cmvn_file: null

tokenizer: SentencepiecesTokenizer
tokenizer_conf:
  bpemodel: null
  unk_symbol: <unk>
  split_with_space: true

dataset: SenseVoiceCTCDataset
dataset_conf:
  index_ds: IndexDSJsonl
  batch_sampler: EspnetStyleBatchSampler
  data_split_num: 32
  batch_type: token
  batch_size: 14000
  sort_size: 64
  max_source_length: 2000
  min_source_length: 60
  max_target_length: 200
  min_target_length: 0
  shuffle: true
  num_workers: 4
  sos: ${model_conf.sos}
  eos: ${model_conf.eos}
  retry: 20

train_conf:
  accum_grad: 1
  grad_clip: 5
  max_epoch: 20
  keep_nbest_models: 10
  avg_nbest_model: 10
  log_interval: 100
  resume: true
  validate_interval: 10000

optim: adamw
optim_conf:
  lr: 0.00002
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 25000
```

**Model weights**: model.pt

**Path resolution**: configuration.json (optional)

```json
{
  "file_path_metas": {
    "init_param": "model.pt",
    "config": "config.yaml",
    "tokenizer_conf": {"bpemodel": "chn_jpn_yue_eng_ko_spectok.bpe.model"},
    "frontend_conf": {"cmvn_file": "am.mvn"}
  }
}
```

The role of configuration.json is to prepend the model root directory to every item in file\_path\_metas so that the paths can be resolved correctly. Continuing the example above, assume the model root directory is /home/zhifu.gzf/init\_model/SenseVoiceSmall; the relevant paths in config.yaml are then replaced with the correct absolute paths (unrelated settings omitted):

```yaml
init_param: /home/zhifu.gzf/init_model/SenseVoiceSmall/model.pt

tokenizer_conf:
  bpemodel: /home/zhifu.gzf/init_model/SenseVoiceSmall/chn_jpn_yue_eng_ko_spectok.bpe.model

frontend_conf:
  cmvn_file: /home/zhifu.gzf/init_model/SenseVoiceSmall/am.mvn
```
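The effect of this path rewriting can be sketched in a few lines of Python. This is a hypothetical illustration of the mechanism (the function name `resolve_paths` is ours, not FunASR's actual implementation):

```python
import os

def resolve_paths(meta, root):
    """Recursively prefix the model root directory onto every path entry."""
    if isinstance(meta, dict):
        return {k: resolve_paths(v, root) for k, v in meta.items()}
    return os.path.join(root, meta)

# file_path_metas as given in configuration.json
file_path_metas = {
    "init_param": "model.pt",
    "config": "config.yaml",
    "tokenizer_conf": {"bpemodel": "chn_jpn_yue_eng_ko_spectok.bpe.model"},
    "frontend_conf": {"cmvn_file": "am.mvn"},
}

root = "/home/zhifu.gzf/init_model/SenseVoiceSmall"
resolved = resolve_paths(file_path_metas, root)
print(resolved["frontend_conf"]["cmvn_file"])
# /home/zhifu.gzf/init_model/SenseVoiceSmall/am.mvn
```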

## Registry

![image](https://github.com/user-attachments/assets/c6d37d3b-1559-4e11-92f8-872a55f6aa7f)

### Inspecting the registry

```python
from funasr.register import tables

tables.print()
```

You can also print a specific registry category: `tables.print("model")`. The models currently registered in funasr are shown in the figure above. The following categories are predefined:

```python
model_classes = {}
frontend_classes = {}
specaug_classes = {}
normalize_classes = {}
encoder_classes = {}
decoder_classes = {}
joint_network_classes = {}
predictor_classes = {}
stride_conv_classes = {}
tokenizer_classes = {}
dataloader_classes = {}
batch_sampler_classes = {}
dataset_classes = {}
index_ds_classes = {}
```

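The registry mechanism itself is small. A self-contained sketch of how such a decorator-based registry works (a simplified stand-in, not funasr's actual code; `Tables` and `DemoModel` are hypothetical names):

```python
class Tables:
    """A minimal decorator-based registry, analogous in spirit to funasr.register.tables."""

    def __init__(self):
        self.tables = {}

    def register(self, category, name):
        # Unknown categories are created on demand, so user-defined
        # categories behave the same way as the predefined ones.
        table = self.tables.setdefault(category, {})

        def decorator(cls):
            table[name] = cls
            return cls

        return decorator

tables = Tables()

@tables.register("model_classes", "DemoModel")
class DemoModel:
    pass

print(tables.tables["model_classes"]["DemoModel"] is DemoModel)  # True
```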
### Registering a model

```python
import torch.nn as nn

from funasr.register import tables

@tables.register("model_classes", "SenseVoiceSmall")
class SenseVoiceSmall(nn.Module):
    def __init__(self, *args, **kwargs):
        ...

    def forward(self, *args, **kwargs):
        ...

    def inference(self, *args, **kwargs):
        ...
```

Adding @tables.register("model\_classes", "SenseVoiceSmall") before the class definition completes the registration. The class needs to implement the \_\_init\_\_, forward, and inference methods.

register usage:

```python
@tables.register("registry_category", "registered_name")
```

Here, "registry_category" can be one of the predefined categories (see the figure above); if you pass a new category of your own, it is automatically added to the registry. "registered_name" is the name you want to register under, which can then be used directly.

Full code: [https://github.com/modelscope/FunASR/blob/main/funasr/models/sense\_voice/model.py#L443](https://github.com/modelscope/FunASR/blob/main/funasr/models/sense_voice/model.py#L443)

Once registration is done, specifying the newly registered model in config.yaml defines the model:

```yaml
model: SenseVoiceSmall
model_conf:
  ...
```
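Conceptually, the `model:` name is then looked up in the registry and the matching class is instantiated with `model_conf`. A simplified sketch of that lookup (hypothetical names such as `SenseVoiceSmallDemo`; not funasr's actual code):

```python
# Stand-in for a class registered via @tables.register("model_classes", "SenseVoiceSmall")
class SenseVoiceSmallDemo:
    def __init__(self, length_normalized_loss=True, **kwargs):
        self.length_normalized_loss = length_normalized_loss

model_classes = {"SenseVoiceSmall": SenseVoiceSmallDemo}

# As parsed from config.yaml
config = {"model": "SenseVoiceSmall", "model_conf": {"length_normalized_loss": True}}

model_class = model_classes.get(config["model"])
assert model_class is not None, f'{config["model"]} is not registered'
model = model_class(**config["model_conf"])
print(type(model).__name__)  # SenseVoiceSmallDemo
```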

### Registration failure

If the registered model or method cannot be found, you will hit: `assert model_class is not None, f'{kwargs["model"]} is not registered'`. Registration works by importing the model file, so you can import it directly to find the actual cause of the failure; for example, the model file above is funasr/models/sense\_voice/model.py:

```python
from funasr.models.sense_voice.model import *
```

## Registration principles

* Model: models are independent of each other. Each model must get its own new directory under funasr/models/. Do not use class inheritance across models! Do not import from other model directories; put everything the model needs into its own directory! Do not modify existing model code!