
print(rec_result)
```

### API-reference
#### Define pipeline
- `task`: `Tasks.auto_speech_recognition`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `ncpu`: `1` (default), sets the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default), the output path for results if set
- `batch_size`: `1` (default), batch size when decoding

#### Infer pipeline
- `audio_in`: the input to decode, which could be:
  - wav_path, `e.g.`: asr_example.wav,
  - pcm_path, `e.g.`: asr_example.pcm,

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    punc_model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    output_dir=output_dir)
rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
```

For the full demo code, please refer to the [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/238).

### API-reference
#### Define pipeline
- `task`: `Tasks.punctuation`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `output_dir`: `None` (default), the output path for results if set
- `model_revision`: `None` (default), sets the model version

#### Infer pipeline
- `text_in`: the input to decode, which could be:
  - a text string, `e.g.`: "我们都是木头人不会讲话不会动"
  - a text file, `e.g.`: example/punc_example.txt

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.punctuation,
    model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch')
results = inference_pipeline(text_in=text_in)
print(results)
```

### API-reference
#### Define pipeline
- `task`: `Tasks.speaker_diarization`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU

- vad format: `spk1: [1.0, 3.0], [5.0, 8.0]`
- rttm format: `"SPEAKER test1 0 1.00 2.00 <NA> <NA> spk1 <NA> <NA>"` and `"SPEAKER test1 0 5.00 3.00 <NA> <NA> spk1 <NA> <NA>"`
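The two output formats above carry the same information. As a small illustration (this helper is not part of the FunASR API), an RTTM `SPEAKER` line stores a start time and a *duration*, so the vad-style end time is `start + duration`:

```python
# Illustrative sketch (not part of FunASR): parse an RTTM "SPEAKER" line,
# e.g. "SPEAKER test1 0 1.00 2.00 <NA> <NA> spk1 <NA> <NA>", into
# (speaker, start, end). RTTM field 5 is a duration, not an end time.
def parse_rttm_line(line):
    fields = line.split()
    if fields[0] != "SPEAKER":
        raise ValueError("not a SPEAKER line: %r" % line)
    start = float(fields[3])
    duration = float(fields[4])
    speaker = fields[7]
    return speaker, start, start + duration

seg = parse_rttm_line("SPEAKER test1 0 5.00 3.00 <NA> <NA> spk1 <NA> <NA>")
print(seg)  # ('spk1', 5.0, 8.0) -- matches the vad-format segment [5.0, 8.0]
```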

#### Infer pipeline for speaker embedding extraction
- `audio_in`: the input to process, which could be:
  - list of urls, `e.g.`: waveform files at a website
  - list of local file paths, `e.g.`: path/to/a.wav

For the full demo code, please refer to [infer.py](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/speaker_verification/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/infer.py).

### API-reference
#### Define pipeline
- `task`: `Tasks.speaker_verification`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU

- `sv_threshold`: `0.9465` (default), the similarity threshold for deciding whether two utterances belong to the same speaker; it should be in (0, 1)
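To illustrate how a threshold like `sv_threshold` is used, here is a minimal sketch (an illustration, not the FunASR implementation): a pair of speaker embeddings is scored with cosine similarity, and the pair is accepted as the same speaker when the score reaches the threshold. The toy embeddings below are made up for the example; real x-vectors have many more dimensions.

```python
import math

# Illustrative sketch (not the actual FunASR code): score two speaker
# embeddings with cosine similarity and apply the sv_threshold decision.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_speaker(emb1, emb2, sv_threshold=0.9465):
    return cosine_similarity(emb1, emb2) >= sv_threshold

# Made-up toy embeddings for demonstration
enroll = [0.6, 0.8, 0.0]
same = [0.6, 0.79, 0.01]
different = [0.0, 0.1, 0.99]

print(same_speaker(enroll, same))       # True
print(same_speaker(enroll, different))  # False
```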

#### Infer pipeline for speaker embedding extraction
- `audio_in`: the input to process, which could be:
  - url (str), `e.g.`: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav
  - local_path, `e.g.`: path/to/a.wav

  - fbank1.scp,speech,kaldi_ark: `e.g.`: 80-dimensional fbank features extracted with the Kaldi toolkit

#### Infer pipeline for speaker verification
- `audio_in`: the input to process, which could be:
  - Tuple(url1, url2), `e.g.`: (https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav, https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_different.wav)
  - Tuple(local_path1, local_path2), `e.g.`: (path/to/a.wav, path/to/b.wav)


### API-reference
#### Define pipeline
- `task`: `Tasks.speech_timestamp`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `ncpu`: `1` (default), sets the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default), the output path for results if set
- `batch_size`: `1` (default), batch size when decoding

#### Infer pipeline
- `audio_in`: the input speech to predict, which could be:
  - wav_path, `e.g.`: asr_example.wav (a local wav file or a url),
  - wav.scp, a kaldi-style wav list (`wav_id wav_path`), `e.g.`:
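The kaldi-style `wav.scp` format is plain text with one `wav_id wav_path` entry per line. A small sketch of reading it (illustrative only; the ids and paths below are made up, and FunASR has its own loader):

```python
# Illustrative sketch: parse a kaldi-style wav.scp ("wav_id wav_path" per
# line) into a {wav_id: wav_path} mapping. Entries here are made up.
def parse_wav_scp(text):
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        wav_id, wav_path = line.split(maxsplit=1)
        entries[wav_id] = wav_path
    return entries

scp = """asr_example1 ./audios/asr_example1.wav
asr_example2 ./audios/asr_example2.wav"""
print(parse_wav_scp(scp))
```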


### API-reference
#### Define pipeline
- `task`: `Tasks.voice_activity_detection`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `ncpu`: `1` (default), sets the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default), the output path for results if set
- `batch_size`: `1` (default), batch size when decoding

#### Infer pipeline
- `audio_in`: the input to decode, which could be:
  - wav_path, `e.g.`: asr_example.wav,
  - pcm_path, `e.g.`: asr_example.pcm,
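A note on the `pcm_path` input: a `.pcm` file is headerless raw audio, so the reader must already know the sample format. The sketch below is illustrative only (not FunASR code) and assumes 16-bit little-endian mono samples, the format typically paired with 16 kHz models:

```python
import struct

# Illustrative sketch: read headerless raw PCM, assuming 16-bit
# little-endian mono samples (an assumption; .pcm carries no header).
def read_pcm16(path):
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // 2
    return list(struct.unpack("<%dh" % n, data[: n * 2]))

# Round-trip demo with a tiny synthetic file
samples = [0, 1000, -1000, 32767, -32768]
with open("asr_example.pcm", "wb") as f:
    f.write(struct.pack("<%dh" % len(samples), *samples))
print(read_pcm16("asr_example.pcm"))  # [0, 1000, -1000, 32767, -32768]
```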