| | |
| | | ## Using paraformer with ONNXRuntime |
| | | |
| | | <p align="left"> |
| | | <a href=""><img src="https://img.shields.io/badge/Python->=3.7,<=3.10-aff.svg"></a> |
| | | <a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Win%2C%20Mac-pink.svg"></a> |
| | | </p> |
| | | |
| | | ### Introduction |
| | | - The model comes from [speech_paraformer](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary). |
| | | # ONNXRuntime-python |
| | | |
| | | |
| | | ### Steps: |
| | | 1. Download the whole directory |
| | | ## Install `funasr_onnx` |
| | | |
| | | Install from pip: |
| | | ```shell |
| | | pip install -U funasr_onnx |
| | | # Users in China can install from a mirror instead: |
| | | # pip install -U funasr_onnx -i https://mirror.sjtu.edu.cn/pypi/web/simple |
| | | ``` |
| | | |
| | | Or install from source: |
| | | |
| | | ```shell |
| | | git clone https://github.com/alibaba/FunASR.git && cd FunASR |
| | | cd funasr/runtime/python/onnxruntime/paraformer/rapid_paraformer |
| | | cd funasr/runtime/python/onnxruntime |
| | | pip install -e ./ |
| | | # Users in China can install from a mirror instead: |
| | | # pip install -e ./ -i https://mirror.sjtu.edu.cn/pypi/web/simple |
| | | ``` |
| | | 2. Install the related packages. |
| | | ```bash |
| | | pip install -r requirements.txt |
| | | ``` |
| | | 3. Export the model. |
| | | |
| | | `Tips`: torch 1.11.0 is required. |
| | | |
| | | ```shell |
| | | python -m funasr.export.export_model [model_name] [export_dir] [true] |
| | | ``` |
| | | `model_name`: the model to export. |
| | | ## Inference with runtime |
| | | |
| | | `export_dir`: the directory the ONNX model is exported to. |
| | | ### Speech Recognition |
| | | #### Paraformer |
| | | ```python |
| | | from funasr_onnx import Paraformer |
| | | from pathlib import Path |
| | | |
| | | For more details, refer to the [export docs](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export). |
| | | model_dir = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" |
| | | model = Paraformer(model_dir, batch_size=1, quantize=True) |
| | | |
| | | wav_path = ['{}/.cache/modelscope/hub/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav'.format(Path.home())] |
| | | |
| | | result = model(wav_path) |
| | | print(result) |
| | | ``` |
| | | - `model_dir`: a model name on ModelScope, or a local path to a model downloaded from ModelScope. A local path must contain `model.onnx`, `config.yaml`, and `am.mvn`. |
| | | - `batch_size`: `1` (default), the batch size during inference. |
| | | - `device_id`: `-1` (default), run inference on the CPU. To run on a GPU, set it to the GPU ID (make sure `onnxruntime-gpu` is installed). |
| | | - `quantize`: `False` (default), load `model.onnx` from `model_dir`. If `True`, load `model_quant.onnx` from `model_dir` instead. |
| | | - `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU |
| | | |
| | | Input: WAV audio; supported types: `str`, `np.ndarray`, `List[str]` |
| | | |
| | | Output: `List[str]`: recognition result |
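When `model_dir` is a local path, a quick pre-flight check can catch a missing file before the model is loaded. The sketch below relies only on the file names listed in the parameter descriptions above. `missing_model_files` is a hypothetical helper, not part of `funasr_onnx`, and the temporary directory merely stands in for an exported model.

```python
import tempfile
from pathlib import Path


def missing_model_files(model_dir, quantize=False):
    """Return the expected files that are absent from a local model_dir.

    Hypothetical helper (not part of funasr_onnx); the file names follow
    the parameter descriptions in this README.
    """
    model_file = "model_quant.onnx" if quantize else "model.onnx"
    expected = [model_file, "config.yaml", "am.mvn"]
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


# Demonstration on a throwaway directory that stands in for an exported model.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("model.onnx", "config.yaml", "am.mvn"):
        (Path(tmp) / name).touch()
    missing_default = missing_model_files(tmp)                 # nothing missing
    missing_quant = missing_model_files(tmp, quantize=True)    # quantized model absent

print(missing_default, missing_quant)
```

Calling the helper with the same `quantize` value that will be passed to the model constructor keeps the check aligned with the file that would actually be loaded.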
| | | |
| | | #### Paraformer-online |
| | | |
| | | ### Voice Activity Detection |
| | | #### FSMN-VAD |
| | | ```python |
| | | from funasr_onnx import Fsmn_vad |
| | | from pathlib import Path |
| | | |
| | | model_dir = "damo/speech_fsmn_vad_zh-cn-16k-common-pytorch" |
| | | wav_path = '{}/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/example/vad_example.wav'.format(Path.home()) |
| | | |
| | | model = Fsmn_vad(model_dir) |
| | | |
| | | result = model(wav_path) |
| | | print(result) |
| | | ``` |
| | | - `model_dir`: a model name on ModelScope, or a local path to a model downloaded from ModelScope. A local path must contain `model.onnx`, `config.yaml`, and `am.mvn`. |
| | | - `batch_size`: `1` (default), the batch size during inference. |
| | | - `device_id`: `-1` (default), run inference on the CPU. To run on a GPU, set it to the GPU ID (make sure `onnxruntime-gpu` is installed). |
| | | - `quantize`: `False` (default), load `model.onnx` from `model_dir`. If `True`, load `model_quant.onnx` from `model_dir` instead. |
| | | - `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU |
| | | |
| | | Input: WAV audio; supported types: `str`, `np.ndarray`, `List[str]` |
| | | |
| | | Output: `List[str]`: recognition result |
| | | |
| | | |
| | | - Example: export a model from ModelScope |
| | | ```shell |
| | | python -m funasr.export.export_model 'damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch' "./export" true |
| | | ``` |
| | | - Example: export a model from a local path; the model file must be named `model.pb`. |
| | | ```shell |
| | | python -m funasr.export.export_model '/mnt/workspace/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch' "./export" true |
| | | ``` |
| | | #### FSMN-VAD-online |
| | | ```python |
| | | from funasr_onnx import Fsmn_vad_online |
| | | import soundfile |
| | | from pathlib import Path |
| | | |
| | | 5. Run the demo. |
| | | - Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`. |
| | | - Input: WAV audio; supported types: `str`, `np.ndarray`, `List[str]` |
| | | - Output: `List[str]`: recognition result. |
| | | - Example: |
| | | ```python |
| | | from paraformer_onnx import Paraformer |
| | | model_dir = "damo/speech_fsmn_vad_zh-cn-16k-common-pytorch" |
| | | wav_path = '{}/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/example/vad_example.wav'.format(Path.home()) |
| | | |
| | | model_dir = "/nfs/zhifu.gzf/export/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" |
| | | model = Paraformer(model_dir, batch_size=1) |
| | | model = Fsmn_vad_online(model_dir) |
| | | |
| | | wav_path = ['/nfs/zhifu.gzf/export/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav'] |
| | | |
| | | result = model(wav_path) |
| | | print(result) |
| | | ``` |
| | | ##online vad |
| | | speech, sample_rate = soundfile.read(wav_path) |
| | | speech_length = speech.shape[0] |
| | | # |
| | | sample_offset = 0 |
| | | step = 1600 |
| | | param_dict = {'in_cache': []} |
| | | for sample_offset in range(0, speech_length, min(step, speech_length - sample_offset)): |
| | |     if sample_offset + step >= speech_length - 1: |
| | |         step = speech_length - sample_offset |
| | |         is_final = True |
| | |     else: |
| | |         is_final = False |
| | |     param_dict['is_final'] = is_final |
| | |     segments_result = model(audio_in=speech[sample_offset: sample_offset + step], |
| | |                             param_dict=param_dict) |
| | |     if segments_result: |
| | |         print(segments_result) |
| | | ``` |
| | | - `model_dir`: a model name on ModelScope, or a local path to a model downloaded from ModelScope. A local path must contain `model.onnx`, `config.yaml`, and `am.mvn`. |
| | | - `batch_size`: `1` (default), the batch size during inference. |
| | | - `device_id`: `-1` (default), run inference on the CPU. To run on a GPU, set it to the GPU ID (make sure `onnxruntime-gpu` is installed). |
| | | - `quantize`: `False` (default), load `model.onnx` from `model_dir`. If `True`, load `model_quant.onnx` from `model_dir` instead. |
| | | - `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU |
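The streaming loop in the FSMN-VAD-online example feeds the model fixed-size chunks and flags the last one with `is_final`. The sketch below isolates that chunking logic so it can be run without a model or an audio file. `iter_chunks` is a hypothetical helper, and it is a simplified variant of the loop: it clamps the last chunk with `min` rather than changing `step` inside the loop. Assuming the 16 kHz sample rate suggested by the model name, 1600 samples correspond to 100 ms of audio.

```python
def iter_chunks(num_samples, step=1600):
    """Yield (start, end, is_final) index triples covering num_samples."""
    for start in range(0, num_samples, step):
        end = min(start + step, num_samples)
        yield start, end, end == num_samples


chunks = list(iter_chunks(4000))
print(chunks)
# Each triple would drive one call such as:
#   param_dict['is_final'] = is_final
#   model(audio_in=speech[start:end], param_dict=param_dict)
```

Keeping the index arithmetic in a generator makes the end-of-stream condition easy to check on its own before any audio is involved.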
| | | |
| | | ## Speed |
| | | Input: WAV audio; supported types: `str`, `np.ndarray`, `List[str]` |
| | | |
| | | Environment: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz |
| | | Output: `List[str]`: recognition result |
| | | |
| | | Test [wav, 5.53s, 100 times avg.](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav) |
| | | |
| | | | Backend | RTF | |
| | | |:-------:|:-----------------:| |
| | | | Pytorch | 0.110 | |
| | | | Onnx | 0.038 | |
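RTF (real-time factor) is conventionally the processing time divided by the audio duration, so lower is faster. Assuming that definition applies to the table above, the sketch below converts the reported values into an approximate per-file processing time for the 5.53 s test clip and into a relative speedup. The function and variable names are illustrative only.

```python
def processing_seconds(rtf, audio_seconds):
    """Processing time implied by a real-time factor (RTF = processing / audio)."""
    return rtf * audio_seconds


AUDIO_SECONDS = 5.53                       # test clip length from the benchmark
RTF = {"Pytorch": 0.110, "Onnx": 0.038}    # values from the table above

latency = {name: processing_seconds(rtf, AUDIO_SECONDS) for name, rtf in RTF.items()}
speedup = RTF["Pytorch"] / RTF["Onnx"]

for name, seconds in latency.items():
    print(f"{name}: about {seconds:.2f} s per file")
print(f"ONNX speedup over PyTorch: about {speedup:.1f}x")
```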
| | | ### Punctuation Restoration |
| | | #### CT-Transformer |
| | | ```python |
| | | from funasr_onnx import CT_Transformer |
| | | |
| | | model_dir = "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch" |
| | | model = CT_Transformer(model_dir) |
| | | |
| | | text_in="跨境河流是养育沿岸人民的生命之源长期以来为帮助下游地区防灾减灾中方技术人员在上游地区极为恶劣的自然条件下克服巨大困难甚至冒着生命危险向印方提供汛期水文资料处理紧急事件中方重视印方在跨境河流问题上的关切愿意进一步完善双方联合工作机制凡是中方能做的我们都会去做而且会做得更好我请印度朋友们放心中国在上游的任何开发利用都会经过科学规划和论证兼顾上下游的利益" |
| | | result = model(text_in) |
| | | print(result[0]) |
| | | ``` |
| | | - `model_dir`: a model name on ModelScope, or a local path to a model downloaded from ModelScope. A local path must contain `model.onnx`, `config.yaml`, and `am.mvn`. |
| | | - `device_id`: `-1` (default), run inference on the CPU. To run on a GPU, set it to the GPU ID (make sure `onnxruntime-gpu` is installed). |
| | | - `quantize`: `False` (default), load `model.onnx` from `model_dir`. If `True`, load `model_quant.onnx` from `model_dir` instead. |
| | | - `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU |
| | | |
| | | Input: `str`, raw text of asr result |
| | | |
| | | Output: `List[str]`: recognition result |
| | | |
| | | |
| | | #### CT-Transformer-online |
| | | ```python |
| | | from funasr_onnx import CT_Transformer_VadRealtime |
| | | |
| | | model_dir = "damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727" |
| | | model = CT_Transformer_VadRealtime(model_dir) |
| | | |
| | | text_in = "跨境河流是养育沿岸|人民的生命之源长期以来为帮助下游地区防灾减灾中方技术人员|在上游地区极为恶劣的自然条件下克服巨大困难甚至冒着生命危险|向印方提供汛期水文资料处理紧急事件中方重视印方在跨境河流问题上的关切|愿意进一步完善双方联合工作机制|凡是|中方能做的我们|都会去做而且会做得更好我请印度朋友们放心中国在上游的|任何开发利用都会经过科学|规划和论证兼顾上下游的利益" |
| | | |
| | | vads = text_in.split("|") |
| | | rec_result_all="" |
| | | param_dict = {"cache": []} |
| | | for vad in vads: |
| | |     result = model(vad, param_dict=param_dict) |
| | |     rec_result_all += result[0] |
| | | |
| | | print(rec_result_all) |
| | | ``` |
| | | - `model_dir`: a model name on ModelScope, or a local path to a model downloaded from ModelScope. A local path must contain `model.onnx`, `config.yaml`, and `am.mvn`. |
| | | - `device_id`: `-1` (default), run inference on the CPU. To run on a GPU, set it to the GPU ID (make sure `onnxruntime-gpu` is installed). |
| | | - `quantize`: `False` (default), load `model.onnx` from `model_dir`. If `True`, load `model_quant.onnx` from `model_dir` instead. |
| | | - `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU |
| | | |
| | | Input: `str`, raw text of asr result |
| | | |
| | | Output: `List[str]`: recognition result |
| | | |
| | | ## Performance benchmark |
| | | |
| | | Please refer to the [benchmark](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/python/benchmark_onnx.md). |
| | | |
| | | ## Acknowledgements |
| | | 1. We acknowledge [SWHL](https://github.com/RapidAI/RapidASR) for contributing the ONNXRuntime Python API. |
| | | 1. This project is maintained by the [FunASR community](https://github.com/alibaba-damo-academy/FunASR). |
| | | 2. We partially referred to [SWHL](https://github.com/RapidAI/RapidASR) for the ONNXRuntime implementation (Paraformer model only). |