python/FunASR-XL.git

parent: cf96541c

Merge pull request #460 from alibaba-damo-academy/dev_zc

zhifu gao

2023-05-05 e06317e9d0584086af2ea8c11baf822d71674c49

add docs/modelscope_pipeline/itn_pipeline.md

1 file modified

1 file added

	docs/modelscope_models.md	24
	docs/modelscope_pipeline/itn_pipeline.md	70

 docs/modelscope_models.md

@@ -111,15 +111,15 @@
### Inverse Text Normalization (ITN) Models

|                                                    Model Name                                                    | Language | Parameters | Notes |
-|:----------------------------------------------------------------------------------------------------------------:|:--------:|:----------:|:------|
-| [English](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-en/summary) |    EN    | 1.54M | ITN, ASR post processing |
-| [Russian](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-ru/summary) |    RU    | 1.28M | ITN, ASR post processing |
-| [Japanese](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-ja/summary) |    JA    | 6.8M | ITN, ASR post processing |
-| [Korean](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-ko/summary) |    KO    | 1.28M | InverASR post processing |
-| [Indonesian](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-id/summary) |    ID    | 2.06M | ITN, ASR post processing |
-| [Vietnamese](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-vi/summary) |    VI    | 0.92M | ITN, ASR post processing |
-| [Tagalog](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-tl/summary) |    TL    | 1.28M | ITN, ASR post processing |
-| [Spanish](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-es/summary) |    ES    | 1.28M | ITN, ASR post processing |
-| [Portuguese](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-pt/summary) |    PT    | 1.28M | ITN, ASR post processing |
-| [French](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-fr/summary) |    FR    | 1.28M | InverASR post processing |
-| [German](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-de/summary)|    GE    | 1.28M | ITN, ASR post processing |
+|:----------------------------------------------------------------------------------------------------------------:|:--------:|:----------:|:-------------------------|
+| [English](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-en/summary) |    EN    |   1.54M    | ITN, ASR post-processing |
+| [Russian](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-ru/summary) |    RU    |   17.79M   | ITN, ASR post-processing |
+| [Japanese](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-ja/summary) |    JA    |    6.8M    | ITN, ASR post-processing |
+| [Korean](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-ko/summary) |    KO    |   1.28M    | ITN, ASR post-processing |
+| [Indonesian](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-id/summary) |    ID    |   2.06M    | ITN, ASR post-processing |
+| [Vietnamese](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-vi/summary) |    VI    |   0.92M    | ITN, ASR post-processing |
+| [Tagalog](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-tl/summary) |    TL    |    0.65M     | ITN, ASR post-processing |
+| [Spanish](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-es/summary) |    ES    |   1.32M    | ITN, ASR post-processing |
+| [Portuguese](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-pt/summary) |    PT    |   1.28M    | ITN, ASR post-processing |
+| [French](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-fr/summary) |    FR    |   4.39M    | ITN, ASR post-processing |
+| [German](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-de/summary)|    GE    |   3.95M    | ITN, ASR post-processing |

 docs/modelscope_pipeline/itn_pipeline.md

New file
@@ -0,0 +1,70 @@
# Inverse Text Normalization (ITN)

> **Note**: 
> The ModelScope pipeline supports inference with every model in the [model zoo](https://modelscope.cn/models?page=1&tasks=inverse-text-processing&type=audio). Here we use the Japanese ITN model as an example to demonstrate the usage.

## Inference

### Quick start
#### [Japanese ITN model](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-ja/summary)
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build the ITN pipeline for the Japanese model.
inference_pipeline = pipeline(
    task=Tasks.inverse_text_processing,
    model='damo/speech_inverse_text_processing_fun-text-processing-itn-ja',
    model_revision=None)

# Run ITN on a number written in kanji numerals.
itn_result = inference_pipeline(text_in='百二十三')
print(itn_result)
```
- Read text data directly:
```python
rec_result = inference_pipeline(text_in='一九九九年に誕生した同商品にちなみ、約三十年前、二十四歳の頃の幸四郎の写真を公開。')
```
- Read text stored at a URL, for example: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/ja_itn_example.txt
```python
rec_result = inference_pipeline(text_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/ja_itn_example.txt')
```

For the full demo code, please refer to the [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/fun_text_processing/inverse_text_normalization).
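To make concrete what kind of rewriting ITN performs on an input such as `'百二十三'`, here is a toy, pure-Python sketch that converts runs of kanji numerals to Arabic digits. It is not the FunASR implementation (which is rule-based and built from the grammars in FunTextProcessing); the names `kanji_to_int` and `toy_itn` are hypothetical and exist only for illustration.

```python
import re

# Toy illustration only: NOT the grammar used by the FunASR ITN models.
DIGITS = {'〇': 0, '一': 1, '二': 2, '三': 3, '四': 4,
          '五': 5, '六': 6, '七': 7, '八': 8, '九': 9}
SMALL_UNITS = {'十': 10, '百': 100, '千': 1000}
LARGE_UNITS = {'万': 10**4, '億': 10**8}

# Matches a maximal run of kanji numeral characters.
NUMERAL_RUN = re.compile('[〇一二三四五六七八九十百千万億]+')


def kanji_to_int(numeral: str) -> int:
    """Convert one kanji numeral (e.g. '百二十三' or '一九九九') to an int."""
    total = 0    # sum of completed 万 / 億 groups
    section = 0  # value accumulated inside the current group
    digit = 0    # pending digit(s) not yet multiplied by a unit
    for ch in numeral:
        if ch in DIGITS:
            # Consecutive digits are read positionally, e.g. 一九九九 -> 1999.
            digit = digit * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 百 means 1 * 100.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f'not a kanji numeral character: {ch!r}')
    return total + section + digit


def toy_itn(text: str) -> str:
    """Replace every run of kanji numerals in `text` with Arabic digits."""
    return NUMERAL_RUN.sub(lambda m: str(kanji_to_int(m.group())), text)


print(kanji_to_int('百二十三'))          # 123
print(toy_itn('約三十年前、二十四歳'))   # 約30年前、24歳
```

A naive regex like this has no notion of context: applied to the longer example sentence above, it would also rewrite the 四 inside the name 幸四郎. Handling cases like that is one reason to rely on a full grammar rather than character-level substitution.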

#### Modify Your Own ITN Model
The rule-based ITN code is open-sourced in [FunTextProcessing](https://github.com/alibaba-damo-academy/FunASR/tree/main/fun_text_processing), so users can modify the grammar rules themselves. After modifying the rules, users can export their own ITN models to a local directory.

##### Export ITN Model
Use the code in FunASR to export an ITN model. The following example exports the Japanese ITN model to a local folder:
```shell
cd fun_text_processing/inverse_text_normalization/
python export_models.py --language ja --export_dir ./itn_models/
```

##### Evaluate ITN Model
Users can evaluate their own ITN model from a local directory. Here is an example:
```shell
python fun_text_processing/inverse_text_normalization/inverse_normalize.py --input_file ja_itn_example.txt --cache_dir ./itn_models/ --output_file output.txt --language=ja
```
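Once `output.txt` has been written, one simple way to score it is to compare it line by line against hand-written expected results. The sketch below assumes that the output file holds one normalized sentence per line and that a reference file with the same number of lines exists; both the file layout and the reference file name `ja_itn_reference.txt` are assumptions, not something the FunASR scripts define.

```python
from pathlib import Path


def exact_match_rate(output_file: str, reference_file: str) -> float:
    """Fraction of lines in `output_file` identical to `reference_file`.

    Assumes one sentence per line and the same line order in both files.
    """
    hyp = Path(output_file).read_text(encoding='utf-8').splitlines()
    ref = Path(reference_file).read_text(encoding='utf-8').splitlines()
    if len(hyp) != len(ref):
        raise ValueError(
            f'line count mismatch: {len(hyp)} outputs vs {len(ref)} references')
    if not ref:
        return 0.0
    # Ignore leading/trailing whitespace when comparing lines.
    matches = sum(h.strip() == r.strip() for h, r in zip(hyp, ref))
    return matches / len(ref)


if __name__ == '__main__':
    out_path, ref_path = 'output.txt', 'ja_itn_reference.txt'  # hypothetical
    if Path(out_path).exists() and Path(ref_path).exists():
        print(f'exact match: {exact_match_rate(out_path, ref_path):.2%}')
```

Exact match is a strict metric: a single differing character counts the whole line as wrong, which makes it useful for catching regressions after a grammar change.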

### API-reference
#### Define pipeline
- `task`: `Tasks.inverse_text_processing`
- `model`: model name in [model zoo](https://modelscope.cn/models?page=1&tasks=inverse-text-processing&type=audio), or a model path on local disk
- `output_dir`: `None` (default); if set, the path where results are written
- `model_revision`: `None` (default); the model version to use

#### Infer pipeline
- `text_in`: the input to decode, which could be:
  - text bytes, `e.g.`: "一九九九年に誕生した同商品にちなみ、約三十年前、二十四歳の頃の幸四郎の写真を公開。"
  - text file, `e.g.`: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/ja_itn_example.txt
  For `text file` input, `output_dir` must be set so that the output results can be saved.
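The two input forms and the `output_dir` requirement can be sketched as a small pre-flight check to run before calling the pipeline. The helpers `classify_text_in` and `check_inputs` are hypothetical and not part of ModelScope; treating an `http(s)` URL or an existing local path as file input is an assumption made for this sketch.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def classify_text_in(text_in: str) -> str:
    """Return 'file' for an http(s) URL or an existing local file, else 'text'."""
    if urlparse(text_in).scheme in ('http', 'https'):
        return 'file'
    if os.path.isfile(text_in):
        return 'file'
    return 'text'


def check_inputs(text_in: str, output_dir: Optional[str] = None) -> str:
    """Validate the text_in / output_dir combination described above."""
    kind = classify_text_in(text_in)
    if kind == 'file' and output_dir is None:
        raise ValueError('text file input requires output_dir to be set')
    return kind


print(check_inputs('百二十三'))  # text
print(check_inputs(
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/ja_itn_example.txt',
    output_dir='./itn_results'))  # file
```

Failing early like this gives a clearer error than discovering after inference that there was nowhere to write the results.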


## Finetune with pipeline

### Quick start

### Finetune with your data

## Inference with your finetuned model
