| | |
| | | # FunASR-1.x.x Model Registration Tutorial |
| | | |
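| | | Version 1.0 was designed to **make model integration simpler**; its core features are the registry and AutoModel: |
| | | funasr-1.x.x was designed to **make model integration simpler**; its core features are the registry and AutoModel: |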
| | | |
| | | * The registry lets developers plug in models like building blocks and supports multiple tasks; |
| | | |
| | |
| | | |
| | | * Inference and training scripts are unified across academic and industrial models; |
| | | |
| | | |
| | |  |
| | | |
| | | # Quick Start |
| | | |
| | |
| | | |
| | | # The Registry in Detail |
| | | |
| | | This section uses the SenseVoiceSmall model as an example to explain how to register a new model. Model links: |
| | | |
| | | **modelscope:**[https://www.modelscope.cn/models/iic/SenseVoiceSmall/files](https://www.modelscope.cn/models/iic/SenseVoiceSmall/files) |
| | | |
| | | **huggingface:**[https://huggingface.co/FunAudioLLM/SenseVoiceSmall](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) |
| | | |
| | | ## Model Resource Directory |
| | | |
| | |  |
| | | |
| | | **Model link:** [https://www.modelscope.cn/models/iic/SenseVoiceSmall/files](https://www.modelscope.cn/models/iic/SenseVoiceSmall/files) |
| | | |
| | | **Configuration file**: config.yaml |
| | | |
| | |
| | | |
| | | **Model parameters**: model.pt |
| | | |
| | | **Path resolution**: configuration.json |
| | | **Path resolution**: configuration.json (optional) |
| | | |
| | | ```json |
| | | { |
| | |
| | | } |
| | | ``` |
| | | |
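| | | The content can be reused by copying it directly. Note that every entry under the `file_path_metas` field is automatically joined with the model resource path, and overrides the path of the same field in `config.yaml`. |
| | | configuration.json prepends the model root directory to each item in file\_path\_metas so that the paths resolve correctly. Taking the example above and assuming the model root directory is /home/zhifu.gzf/init\_model/SenseVoiceSmall, the relevant paths in config.yaml are replaced with the resolved paths (unrelated settings omitted): |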
| | | |
| | | ```yaml |
| | | init_param: /home/zhifu.gzf/init_model/SenseVoiceSmall/model.pt |
| | | |
| | | tokenizer_conf: |
| | | bpemodel: /home/zhifu.gzf/init_model/SenseVoiceSmall/chn_jpn_yue_eng_ko_spectok.bpe.model |
| | | |
| | | frontend_conf: |
| | | cmvn_file: /home/zhifu.gzf/init_model/SenseVoiceSmall/am.mvn |
| | | ``` |
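The path joining described above can be sketched in a few lines of Python. This is an illustrative sketch only, not FunASR's implementation: the helper names `join_model_root` and `deep_update` are hypothetical, the `file_path_metas` dictionary is an assumption inferred from the resolved YAML shown above (not copied from the real configuration.json), and the `fs` field in the sample config is made up for the demonstration.

```python
import posixpath
from typing import Any, Dict


def join_model_root(metas: Dict[str, Any], model_root: str) -> Dict[str, Any]:
    """Recursively prefix every string leaf in `metas` with `model_root`."""
    resolved: Dict[str, Any] = {}
    for key, value in metas.items():
        if isinstance(value, dict):
            resolved[key] = join_model_root(value, model_root)
        elif isinstance(value, str):
            resolved[key] = posixpath.join(model_root, value)
        else:
            resolved[key] = value
    return resolved


def deep_update(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` in which matching fields are replaced by `override`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged


# Assumed contents of file_path_metas (inferred from the YAML above).
file_path_metas = {
    "init_param": "model.pt",
    "tokenizer_conf": {"bpemodel": "chn_jpn_yue_eng_ko_spectok.bpe.model"},
    "frontend_conf": {"cmvn_file": "am.mvn"},
}

# Hypothetical config.yaml content before path resolution.
config_yaml = {
    "init_param": "placeholder.pt",
    "frontend_conf": {"cmvn_file": "old.mvn", "fs": 16000},
}

model_root = "/home/zhifu.gzf/init_model/SenseVoiceSmall"
resolved = join_model_root(file_path_metas, model_root)
final_config = deep_update(config_yaml, resolved)
print(final_config["init_param"])
```

Fields that appear only in the config (such as the made-up `fs` here) are left untouched, while fields that also appear in `file_path_metas` are replaced by the joined paths.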
| | | |
| | | ## Registry |
| | | |
| | |  |
| | | |
| | | ### Viewing the Registry |
| | | |
| | |
| | | tables.print() |
| | | ``` |
| | | |
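| | | You can view a specific type of registry; for example, to see only the registered `model` classes: `tables.print("model")` |
| | | You can view a specific type of registry with `tables.print("model")`; the models currently registered in funasr are shown in the figure above. The following categories are predefined: |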
| | | |
| | | ```python |
| | | model_classes = {} |
| | | frontend_classes = {} |
| | | specaug_classes = {} |
| | | normalize_classes = {} |
| | | encoder_classes = {} |
| | | decoder_classes = {} |
| | | joint_network_classes = {} |
| | | predictor_classes = {} |
| | | stride_conv_classes = {} |
| | | tokenizer_classes = {} |
| | | dataloader_classes = {} |
| | | batch_sampler_classes = {} |
| | | dataset_classes = {} |
| | | index_ds_classes = {} |
| | | ``` |
| | | |
| | | ### Registering a Model |
| | | |
| | |
| | | |
| | | ``` |
| | | |
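| | | Add `@tables.register("model_classes","SenseVoiceSmall")` before the class you want to register to complete the registration. The class must implement the `__init__`, `forward`, and `inference` methods. |
| | | Add @tables.register("model\_classes", "SenseVoiceSmall") before the class you want to register to complete the registration. The class must implement the \_\_init\_\_, forward, and inference methods. |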
| | | |
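| | | Usage of register: |
| | | |
| | | ```python |
| | | @tables.register("registration category", "registration name") |
| | | ``` |
| | | |
| | | Here, "registration category" can be one of the predefined categories (see the figure above); if you define a new category, it is automatically added to the registry's categories. "registration name" is the name to register under, which can then be used directly. |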
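To make the decorator mechanism concrete, here is a minimal, self-contained sketch of a category-based registry. It is not FunASR's `tables` implementation: `RegistryTables`, `demo_tables`, `ToyModel`, and `ToyPostprocessor` are hypothetical names, and the sketch only mirrors the behaviour described above (predefined categories, automatic creation of new ones, lookup by registered name).

```python
from typing import Any, Callable, Dict, Optional


class RegistryTables:
    """Toy registry: maps a category name to a {registered name: class} dict."""

    def __init__(self) -> None:
        # A couple of predefined categories, for illustration only.
        self._tables: Dict[str, Dict[str, type]] = {
            "model_classes": {},
            "encoder_classes": {},
        }

    def register(self, category: str, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the class under category/name."""

        def decorator(cls: type) -> type:
            # setdefault creates the category on first use if it is new.
            self._tables.setdefault(category, {})[name] = cls
            return cls

        return decorator

    def get(self, category: str, name: str) -> Optional[type]:
        """Look up a registered class; returns None if it is not registered."""
        return self._tables.get(category, {}).get(name)


demo_tables = RegistryTables()


@demo_tables.register("model_classes", "ToyModel")
class ToyModel:
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def forward(self, *args: Any, **kwargs: Any) -> None:
        raise NotImplementedError("training forward pass is omitted in this sketch")

    def inference(self, data_in: Any, **kwargs: Any) -> list:
        return [{"text": str(data_in)}]


@demo_tables.register("postprocess_classes", "ToyPostprocessor")  # new category
class ToyPostprocessor:
    pass


model_class = demo_tables.get("model_classes", "ToyModel")
assert model_class is not None, "ToyModel is not registered"
model = model_class(sample_rate=16000)
print(model.inference("hello"))
```

Because registration happens when the decorator runs, a class only appears in the registry after the module that defines it has been imported.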
| | | |
| | | Full code: [https://github.com/modelscope/FunASR/blob/main/funasr/models/sense\_voice/model.py#L443](https://github.com/modelscope/FunASR/blob/main/funasr/models/sense_voice/model.py#L443) |
| | | |
| | |
| | | ... |
| | | ``` |
| | | |
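| | | ## Registration Failure |
| | | ### Registration Failure |
| | | |
| | | If a registered model or function cannot be found, the following assertion fails: `assert model_class is not None, f'{kwargs["model"]} is not registered'`. Registration works by importing the model file, so you can import that file directly to see the specific cause of the failure. For example, the model file above is funasr/models/sense_voice/model.py: |
| | | If a registered model or method cannot be found, the following assertion fails: assert model\_class is not None, f'{kwargs\["model"\]} is not registered'. Registration works by importing the model file, so you can import that file directly to see the specific cause of the failure. For example, the model file above is funasr/models/sense\_voice/model.py: |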
| | | |
| | | ```python |
| | | from funasr.models.sense_voice.model import * |