
print(rec_result)
```

### API-reference
#### Define pipeline
- `task`: `Tasks.auto_speech_recognition`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `ncpu`: `1` (default), sets the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default), the output path for results if set
- `batch_size`: `1` (default), batch size when decoding

#### Infer pipeline
- `audio_in`: the input to decode, which could be:
  - wav_path, `e.g.`: asr_example.wav,
  - pcm_path, `e.g.`: asr_example.pcm,

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    punc_model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    output_dir=output_dir)
rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
```

For the full demo code, please refer to the [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/238).

### API-reference
#### Define pipeline
- `task`: `Tasks.punctuation`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `output_dir`: `None` (default), the output path for results if set
- `model_revision`: `None` (default), sets the model version

#### Infer pipeline
- `text_in`: the input to decode, which could be:
  - a text string, `e.g.`: "我们都是木头人不会讲话不会动"
  - a text file, `e.g.`: example/punc_example.txt

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.punctuation,
    model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch')
results = inference_pipeline(text_in=text_in)
print(results)
```

### API-reference
#### Define pipeline
- `task`: `Tasks.speaker_diarization`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU

- vad format: `spk1: [1.0, 3.0], [5.0, 8.0]`
- rttm format: `"SPEAKER test1 0 1.00 2.00 <NA> <NA> spk1 <NA> <NA>"` and `"SPEAKER test1 0 5.00 3.00 <NA> <NA> spk1 <NA> <NA>"`
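The two output formats above carry the same information. As a small illustration (this helper is not part of the FunASR API), an RTTM `SPEAKER` line stores a start time and a *duration*, so the vad-style end time is `start + duration`:

```python
# Illustrative sketch (not part of FunASR): parse an RTTM "SPEAKER" line,
# e.g. "SPEAKER test1 0 1.00 2.00 <NA> <NA> spk1 <NA> <NA>", into
# (speaker, start, end). RTTM field 5 is a duration, not an end time.
def parse_rttm_line(line):
    fields = line.split()
    if fields[0] != "SPEAKER":
        raise ValueError("not a SPEAKER line: %r" % line)
    start = float(fields[3])
    duration = float(fields[4])
    speaker = fields[7]
    return speaker, start, start + duration

seg = parse_rttm_line("SPEAKER test1 0 5.00 3.00 <NA> <NA> spk1 <NA> <NA>")
print(seg)  # ('spk1', 5.0, 8.0) -- matches the vad-format segment [5.0, 8.0]
```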

#### Infer pipeline for speaker embedding extraction
- `audio_in`: the input to process, which could be:
  - list of urls, `e.g.`: waveform files at a website
  - list of local file paths, `e.g.`: path/to/a.wav

For the full demo code, please refer to [infer.py](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/speaker_verification/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/infer.py).

### API-reference
#### Define pipeline
- `task`: `Tasks.speaker_verification`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU

- `sv_threshold`: `0.9465` (default), the similarity threshold for deciding whether two utterances belong to the same speaker; it should be in (0, 1)
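To illustrate how a threshold like `sv_threshold` is used, here is a minimal sketch (an illustration, not the FunASR implementation): a pair of speaker embeddings is scored with cosine similarity, and the pair is accepted as the same speaker when the score reaches the threshold. The toy embeddings below are made up for the example; real x-vectors have many more dimensions.

```python
import math

# Illustrative sketch (not the actual FunASR code): score two speaker
# embeddings with cosine similarity and apply the sv_threshold decision.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_speaker(emb1, emb2, sv_threshold=0.9465):
    return cosine_similarity(emb1, emb2) >= sv_threshold

# Made-up toy embeddings for demonstration
enroll = [0.6, 0.8, 0.0]
same = [0.6, 0.79, 0.01]
different = [0.0, 0.1, 0.99]

print(same_speaker(enroll, same))       # True
print(same_speaker(enroll, different))  # False
```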

#### Infer pipeline for speaker embedding extraction
- `audio_in`: the input to process, which could be:
  - url (str), `e.g.`: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav
  - local_path, `e.g.`: path/to/a.wav

  - fbank1.scp,speech,kaldi_ark: `e.g.`: 80-dimensional fbank features extracted with the Kaldi toolkit

#### Infer pipeline for speaker verification
- `audio_in`: the input to process, which could be:
  - Tuple(url1, url2), `e.g.`: (https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav, https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_different.wav)
  - Tuple(local_path1, local_path2), `e.g.`: (path/to/a.wav, path/to/b.wav)


### API-reference
#### Define pipeline
- `task`: `Tasks.speech_timestamp`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `ncpu`: `1` (default), sets the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default), the output path for results if set
- `batch_size`: `1` (default), batch size when decoding

#### Infer pipeline
- `audio_in`: the input speech to predict, which could be:
  - wav_path, `e.g.`: asr_example.wav (a local wav file or a url),
  - wav.scp, a kaldi-style wav list (`wav_id wav_path`), `e.g.`:
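The kaldi-style `wav.scp` format is plain text with one `wav_id wav_path` entry per line. A small sketch of reading it (illustrative only; the ids and paths below are made up, and FunASR has its own loader):

```python
# Illustrative sketch: parse a kaldi-style wav.scp ("wav_id wav_path" per
# line) into a {wav_id: wav_path} mapping. Entries here are made up.
def parse_wav_scp(text):
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        wav_id, wav_path = line.split(maxsplit=1)
        entries[wav_id] = wav_path
    return entries

scp = """asr_example1 ./audios/asr_example1.wav
asr_example2 ./audios/asr_example2.wav"""
print(parse_wav_scp(scp))
```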


### API-reference
#### Define pipeline
- `task`: `Tasks.voice_activity_detection`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `ncpu`: `1` (default), sets the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default), the output path for results if set
- `batch_size`: `1` (default), batch size when decoding

#### Infer pipeline
- `audio_in`: the input to decode, which could be:
  - wav_path, `e.g.`: asr_example.wav,
  - pcm_path, `e.g.`: asr_example.pcm,
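A note on the `pcm_path` input: a `.pcm` file is headerless raw audio, so the reader must already know the sample format. The sketch below is illustrative only (not FunASR code) and assumes 16-bit little-endian mono samples, the format typically paired with 16 kHz models:

```python
import struct

# Illustrative sketch: read headerless raw PCM, assuming 16-bit
# little-endian mono samples (an assumption; .pcm carries no header).
def read_pcm16(path):
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // 2
    return list(struct.unpack("<%dh" % n, data[: n * 2]))

# Round-trip demo with a tiny synthetic file
samples = [0, 1000, -1000, 32767, -32768]
with open("asr_example.pcm", "wb") as f:
    f.write(struct.pack("<%dh" % len(samples), *samples))
print(read_pcm16("asr_example.pcm"))  # [0, 1000, -1000, 32767, -32768]
```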