From 944a053a66d67e3964e4903908a8074c0d1bf45b Mon Sep 17 00:00:00 2001
From: 游雁 <zhifu.gzf@alibaba-inc.com>
Date: 星期四, 21 三月 2024 14:20:09 +0800
Subject: [PATCH] tutorial

---
 /dev/null |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/docs/academic_recipe/asr_recipe.md b/docs/academic_recipe/asr_recipe.md
deleted file mode 100644
index 9e19c61..0000000
--- a/docs/academic_recipe/asr_recipe.md
+++ /dev/null
@@ -1,267 +0,0 @@
-# Speech Recognition
-In FunASR, we provide several ASR benchmarks, such as AISHELL, LibriSpeech and WenetSpeech, and support different model architectures, including conformer, paraformer and uniasr.
-
-## Quick Start
-After downloading and installing FunASR, users can use the provided recipes to easily reproduce the corresponding experimental results. Here we take "paraformer on AISHELL-1" as an example.
-
-First, move to the directory of the AISHELL-1 paraformer example.
-```sh
-cd egs/aishell/paraformer
-```
-
-Then you can directly start the recipe as follows:
-```sh
-conda activate funasr
-bash run.sh --CUDA_VISIBLE_DEVICES "0,1" --gpu_num 2
-```
-
-The training log files are saved in `${exp_dir}/exp/${model_dir}/log/train.log.*`, which can be viewed using the following command:
-```sh
-vim exp/*_train_*/log/train.log.0
-```
-
-Users can observe the training loss, prediction accuracy and other training information, as follows:
-```text
-... 1epoch:train:751-800batch:800num_updates: ... loss_ctc=106.703, loss_att=86.877, acc=0.029, loss_pre=1.552 ...
-... 1epoch:train:801-850batch:850num_updates: ... loss_ctc=107.890, loss_att=87.832, acc=0.029, loss_pre=1.702 ...
-```
-
-At the end of each epoch, the evaluation metrics are calculated on the validation set, as follows:
-```text
-... [valid] loss_ctc=99.914, cer_ctc=1.000, loss_att=80.512, acc=0.029, cer=0.971, wer=1.000, loss_pre=1.952, loss=88.285 ...
-```
-
-Users can also use TensorBoard to monitor this training information with the following command:
-```sh
-tensorboard --logdir ${exp_dir}/exp/${model_dir}/tensorboard/train
-```
-Here is an example of loss:
-
-<img src="./academic_recipe/images/loss.png" width="200"/>
-
-The inference results are saved in `${exp_dir}/exp/${model_dir}/decode_asr_*/$dset`. The two main files are `text.cer` and `text.cer.txt`. `text.cer` saves the comparison between the recognized text and the reference text, as follows:
-```text
-...
-BAC009S0764W0213(nwords=11,cor=11,ins=0,del=0,sub=0) corr=100.00%,cer=0.00%
-ref:    构 建 良 好 的 旅 游 市 场 环 境
-res:    构 建 良 好 的 旅 游 市 场 环 境
-...
-```
-`text.cer.txt` saves the final results, as follows:
-```text
-%WER ...
-%SER ...
-Scored ... sentences, ...
-```
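Each `text.cer` line above reports, per utterance, the number of reference tokens and the correct, inserted, deleted and substituted tokens, with CER = (sub + del + ins) / nwords. The following is a minimal sketch of how such counts can be obtained with a Levenshtein alignment. It is not FunASR's scoring script, and its tie-breaking between equal-cost alignments may differ.

```python
def align_counts(ref, hyp):
    """Levenshtein-align two token lists; return (cor, sub, dele, ins)."""
    n, m = len(ref), len(hyp)
    # Each cell: (cost, -cor, sub, dele, ins). Tuple comparison picks the lowest
    # cost first and, on ties, the alignment with more correct tokens.
    dp = [[(0, 0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c, nc, s, d, k = dp[i - 1][0]
        dp[i][0] = (c + 1, nc, s, d + 1, k)          # delete ref[i-1]
    for j in range(1, m + 1):
        c, nc, s, d, k = dp[0][j - 1]
        dp[0][j] = (c + 1, nc, s, d, k + 1)          # insert hyp[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, nc, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, nc - 1, s, d, k)          # correct token
            else:
                diag = (c + 1, nc, s + 1, d, k)      # substitution
            c, nc, s, d, k = dp[i - 1][j]
            up = (c + 1, nc, s, d + 1, k)            # deletion
            c, nc, s, d, k = dp[i][j - 1]
            left = (c + 1, nc, s, d, k + 1)          # insertion
            dp[i][j] = min(diag, up, left)
    _, neg_cor, sub, dele, ins = dp[n][m]
    return -neg_cor, sub, dele, ins


def cer_line(utt_id, ref, hyp):
    """Format one per-utterance line in the style of text.cer."""
    cor, sub, dele, ins = align_counts(ref, hyp)
    n = len(ref)
    cer = 100.0 * (sub + dele + ins) / n if n else 0.0
    corr = 100.0 * cor / n if n else 0.0
    return (f"{utt_id}(nwords={n},cor={cor},ins={ins},del={dele},sub={sub}) "
            f"corr={corr:.2f}%,cer={cer:.2f}%")
```

For example, scoring the reference of `BAC009S0764W0213` against an identical hypothesis reproduces the summary line shown in the `text.cer` excerpt.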
-
-## Introduction
-We provide a recipe `egs/aishell/paraformer/run.sh` for training a paraformer model on the AISHELL-1 dataset. By default, this recipe runs stages 0 to 5, supporting training on multiple GPUs and decoding on CPU or GPU. Before introducing each stage in detail, we first explain several parameters that users should set.
-- `CUDA_VISIBLE_DEVICES`: `0,1` (Default), the list of visible GPUs
-- `gpu_num`: `2` (Default), the number of GPUs used for training
-- `gpu_inference`: `true` (Default), whether to use GPUs for decoding
-- `njob`: `1` (Default), for CPU decoding, the total number of CPU jobs; for GPU decoding, the number of jobs on each GPU
-- `raw_data`: the raw path of the AISHELL-1 dataset
-- `feats_dir`: the path for saving processed data
-- `token_type`: `char` (Default), indicates how to tokenize text
-- `type`: `sound` (Default), the input type
-- `scp`: `wav.scp` (Default), the input file
-- `nj`: `64` (Default), the number of jobs for data preparation
-- `speed_perturb`: `"0.9, 1.0 ,1.1"` (Default), the speed perturbation factors
-- `exp_dir`: the path for saving experimental results
-- `tag`: `exp1` (Default), the suffix of the experimental result directory
-- `stage`: `0` (Default), the stage to start the recipe from
-- `stop_stage`: `5` (Default), the stage to stop the recipe at
-
-### Stage 0: Data preparation
-This stage processes the raw AISHELL-1 dataset `$raw_data` and generates the corresponding `wav.scp` and `text` files in `$feats_dir/data/xxx`, where `xxx` is `train`, `dev` or `test`. Here we assume users have already downloaded the AISHELL-1 dataset. If not, users can download it [here](https://www.openslr.org/33/) and set `$raw_data` accordingly. Examples of `wav.scp` and `text` are as follows:
-* `wav.scp`
-```
-BAC009S0002W0122 /nfs/ASR_DATA/AISHELL-1/data_aishell/wav/train/S0002/BAC009S0002W0122.wav
-BAC009S0002W0123 /nfs/ASR_DATA/AISHELL-1/data_aishell/wav/train/S0002/BAC009S0002W0123.wav
-BAC009S0002W0124 /nfs/ASR_DATA/AISHELL-1/data_aishell/wav/train/S0002/BAC009S0002W0124.wav
-...
-```
-* `text`
-```
-BAC009S0002W0122 而 对 楼 市 成 交 抑 制 作 用 最 大 的 限 购
-BAC009S0002W0123 也 成 为 地 方 政 府 的 眼 中 钉
-BAC009S0002W0124 自 六 月 底 呼 和 浩 特 市 率 先 宣 布 取 消 限 购 后
-...
-```
-Both files have two columns: the first column is the utterance (wav) ID, and the second column is the corresponding wav path or label tokens.
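As a rough illustration of Stage 0, the sketch below writes `wav.scp` and `text` for one split. It assumes one transcript file whose lines are `<utt_id> <word-segmented transcript>` (as in the AISHELL-1 release) and wav files named `<utt_id>.wav`; the function name `prepare_split` is hypothetical, and the recipe's own script may differ in path handling and ordering.

```python
import os
from pathlib import Path


def prepare_split(wav_dir, transcript_path, out_dir):
    """Write Kaldi-style `wav.scp` and `text` for one data split.

    Returns the number of utterances written.
    """
    # utt_id -> space-separated characters (word segmentation is removed)
    transcripts = {}
    with open(transcript_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) == 2:
                utt_id, words = parts
                chars = "".join(words.split())
                transcripts[utt_id] = " ".join(chars)

    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(os.path.join(out_dir, "wav.scp"), "w", encoding="utf-8") as scp, \
         open(os.path.join(out_dir, "text"), "w", encoding="utf-8") as txt:
        # sorted() keeps both files in the same utterance order
        for wav in sorted(Path(wav_dir).rglob("*.wav")):
            utt_id = wav.stem
            if utt_id not in transcripts:  # skip audio without a transcript
                continue
            scp.write(f"{utt_id} {wav.resolve()}\n")
            txt.write(f"{utt_id} {transcripts[utt_id]}\n")
            count += 1
    return count
```

Splitting the transcript into single characters matches the `char` token type used by this recipe; other token types would need a different split.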
-
-### Stage 1: Feature and CMVN Generation
-This stage computes CMVN statistics on the `train` set, which are used in the following stages. Users can set `nj` to control the number of jobs for computing CMVN. The generated CMVN file is saved as `$feats_dir/data/train/cmvn/am.mvn`.
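CMVN (cepstral mean and variance normalization) uses the per-dimension mean and variance of the features over the training set. The sketch below accumulates these statistics in one pass with NumPy. It assumes the fbank features are already available as arrays and does not reproduce the on-disk format of `am.mvn`.

```python
import numpy as np


def compute_cmvn(feature_mats):
    """Global per-dimension mean and inverse std over all frames."""
    total, total_sq, frames = None, None, 0
    for feats in feature_mats:  # each array: (num_frames, feat_dim)
        feats = np.asarray(feats, dtype=np.float64)
        if total is None:
            total = np.zeros(feats.shape[1])
            total_sq = np.zeros(feats.shape[1])
        total += feats.sum(axis=0)
        total_sq += np.square(feats).sum(axis=0)
        frames += feats.shape[0]
    mean = total / frames
    # Var[x] = E[x^2] - E[x]^2, floored to avoid division by zero
    var = np.maximum(total_sq / frames - mean ** 2, 1e-20)
    return mean, 1.0 / np.sqrt(var)


def apply_cmvn(feats, mean, inv_std):
    """Normalize features to zero mean and unit variance per dimension."""
    return (np.asarray(feats, dtype=np.float64) - mean) * inv_std
```

Accumulating sums rather than storing all frames keeps memory constant, which is why statistics like these can be computed in parallel jobs and merged afterwards.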
-
-### Stage 2: Dictionary Preparation
-This stage processes the dictionary, which maps label characters to integer indices during ASR training. The processed dictionary file is saved as `$feats_dir/data/$lang_token_list/$token_type/tokens.txt`. An example of `tokens.txt` is as follows:
-```
-<blank>
-<s>
-</s>
-涓�
-涓�
-...
-榫�
-榫�
-<unk>
-```
-Four special tokens must be specified:
-* `<blank>`: (required), indicates the blank token for CTC, must be in the first line
-* `<s>`: (required), indicates the start-of-sentence token, must be in the second line
-* `</s>`: (required), indicates the end-of-sentence token, must be in the third line
-* `<unk>`: (required), indicates the out-of-vocabulary token, must be in the last line
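Under the ordering rules above, a character-level `tokens.txt` could be assembled from the `text` file roughly as follows. This is a hypothetical helper: sorting the characters by code point is an assumption, and the recipe's own script may order them differently. The line number of each token (starting from 0) serves as its integer index.

```python
def build_token_list(text_lines):
    """Build a character token list from Kaldi-style `text` lines."""
    specials = ("<blank>", "<s>", "</s>", "<unk>")
    chars = set()
    for line in text_lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:          # "<utt_id> <space-separated tokens>"
            chars.update(parts[1].split())
    body = sorted(chars - set(specials))  # code-point order (assumption)
    # <blank>, <s>, </s> on lines 0-2, <unk> on the last line
    return ["<blank>", "<s>", "</s>"] + body + ["<unk>"]


def write_token_list(tokens, path):
    """Write one token per line; the line number is the token's index."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(tokens) + "\n")
```

The index mapping used during training is then simply `{tok: i for i, tok in enumerate(tokens)}`.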
-
-### Stage 3: LM Training
-
-### Stage 4: ASR Training
-This stage trains the specified model. To start training, users should manually set `exp_dir` to specify the path for saving experimental results. By default, the `$keep_nbest_models` best checkpoints on the validation set are averaged to produce the model used for decoding. FunASR implements `train.py` for training different models, and users can configure the following parameters if necessary. The training command is as follows:
-
-```sh
-train.py \
-    --task_name asr \
-    --use_preprocessor true \
-    --token_list $token_list \
-    --data_dir ${feats_dir}/data \
-    --train_set ${train_set} \
-    --valid_set ${valid_set} \
-    --data_file_names "wav.scp,text" \
-    --cmvn_file ${feats_dir}/data/${train_set}/cmvn/am.mvn \
-    --speed_perturb ${speed_perturb} \
-    --resume true \
-    --output_dir ${exp_dir}/exp/${model_dir} \
-    --config $asr_config \
-    --ngpu $gpu_num \
-    ...
-```
-
-* `task_name`: `asr` (Default), specify the task type of the current recipe
-* `ngpu`: `2` (Default), the number of GPUs for training. When `ngpu > 1`, DistributedDataParallel (DDP; see [here](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for details) training is enabled, and `CUDA_VISIBLE_DEVICES` should be set accordingly to specify which GPU IDs are used.
-* `use_preprocessor`: `true` (Default), specify whether to use pre-processing on each sample
-* `token_list`: the path of token list for training
-* `dataset_type`: `small` (Default). FunASR supports the `small` dataset type for training on small datasets. For large datasets, an optional iterable-style DataLoader based on [PyTorch Iterable-style DataPipes](https://pytorch.org/data/beta/torchdata.datapipes.iter.html) is also supported, which users can enable with `dataset_type=large`.
-* `data_dir`: the path of data. Specifically, the training data is saved in `$data_dir/$train_set` and the validation data in `$data_dir/$valid_set`
-* `data_file_names`: `"wav.scp,text"` specify the speech and text file names for ASR
-* `cmvn_file`: the path of cmvn file
-* `resume`: `true`, whether to resume training from the latest checkpoint
-* `output_dir`: the path for saving training results
-* `config`: the path of the configuration file, usually a YAML file in the `conf` directory. In FunASR, training parameters, including the model, optimization and dataset settings, can also be set in this file. Note that if the same parameter is specified in both the recipe and the config file, the value from the recipe takes precedence.
-
-### Stage 5: Decoding
-This stage generates the recognition results and calculates the `CER` to verify the performance of the trained model. 
-
-* Mode Selection
-
-Since FunASR supports paraformer, uniasr, conformer and other models, the `mode` parameter should be set to `asr`, `paraformer` or `uniasr` according to the trained model.
-
-* Configuration
-
-We support CTC decoding, attention decoding and hybrid CTC-attention decoding in FunASR, which can be specified by `ctc_weight` in a YAML file in `conf` directory. Specifically, `ctc_weight=1.0` indicates CTC decoding, `ctc_weight=0.0` indicates attention decoding, `0.0<ctc_weight<1.0` indicates hybrid CTC-attention decoding.
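In hybrid decoding, each hypothesis is typically ranked by a weighted sum of its CTC and attention log-probabilities, `ctc_weight * ctc_score + (1 - ctc_weight) * att_score`. The sketch below shows only this score combination for already-scored hypotheses; the helper names are hypothetical, and a real beam search scores partial hypotheses incrementally rather than complete ones.

```python
def decoding_mode(ctc_weight):
    """Map ctc_weight to the decoding type described above."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    if ctc_weight == 1.0:
        return "ctc"
    if ctc_weight == 0.0:
        return "attention"
    return "hybrid"


def joint_score(ctc_logp, att_logp, ctc_weight):
    """Weighted sum of CTC and attention log-probabilities."""
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp


def best_hypothesis(scored_hyps, ctc_weight):
    """scored_hyps: {hypothesis: (ctc_logp, att_logp)} -> best hypothesis."""
    return max(scored_hyps, key=lambda h: joint_score(*scored_hyps[h], ctc_weight))
```

With two hypotheses scored `(-1.0, -3.0)` and `(-2.0, -1.0)`, pure CTC decoding prefers the first, while `ctc_weight=0.3` prefers the second because the attention score dominates.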
-
-* CPU/GPU Decoding
-
-We support CPU and GPU decoding in FunASR. For CPU decoding, set `gpu_inference=false` and set `njob` to specify the total number of CPU decoding jobs. For GPU decoding, set `gpu_inference=true`, set `gpuid_list` to indicate which GPUs are used for decoding, and set `njob` to indicate the number of decoding jobs on each GPU.
-
-* Performance
-
-We adopt `CER` to verify the performance. The results are in `$exp_dir/exp/$model_dir/$decoding_yaml_name/$average_model_name/$dset`, namely `text.cer` and `text.cer.txt`. `text.cer` saves the comparison between the recognized text and the reference text while `text.cer.txt` saves the final `CER` results. The following is an example of `text.cer`:
-```
-...
-BAC009S0764W0213(nwords=11,cor=11,ins=0,del=0,sub=0) corr=100.00%,cer=0.00%
-ref:    构 建 良 好 的 旅 游 市 场 环 境
-res:    构 建 良 好 的 旅 游 市 场 环 境
-...
-```
-
-## Change settings
-Here we explain how to perform common custom settings, which can help users to modify scripts according to their own needs.
-
-### Training with specified GPUs
-
-For example, if users want to use 2 GPUs with id `2` and `3`, users can run the following command:
-```sh
-. ./run.sh --CUDA_VISIBLE_DEVICES "2,3" --gpu_num 2 
-```
-
-### Start from/Stop at a specified stage
-
-The recipe includes several stages, and users can start from or stop at any stage. For example, the following command starts from stage 3 and stops at stage 5:
-```sh
-. ./run.sh --stage 3 --stop_stage 5
-```
-
-### Specify total training steps
-
-FunASR supports two parameters to specify the training length, namely `max_epoch` and `max_update`. `max_epoch` is the total number of training epochs, while `max_update` is the total number of training steps. If both are specified, training stops as soon as either limit is reached.
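The stopping rule can be sketched with a toy loop. It assumes one parameter update per batch (no gradient accumulation) and that `max_update` is checked after every update; both are assumptions for illustration only.

```python
def run_training(batches_per_epoch, max_epoch, max_update):
    """Toy loop: return (last_epoch, num_updates) when training stops."""
    num_updates = 0
    for epoch in range(1, max_epoch + 1):
        for _ in range(batches_per_epoch):
            num_updates += 1  # one optimizer step per batch (assumed)
            if num_updates >= max_update:
                return epoch, num_updates  # max_update reached first
    return max_epoch, num_updates  # max_epoch reached first
```

For instance, `run_training(100, 10, 250)` stops in epoch 3 after 250 updates, while `run_training(100, 2, 10**6)` stops after 2 epochs (200 updates).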
-
-### Change the configuration of the model
-
-The configuration of the model is set in the config file `conf/train_*.yaml`. Specifically, the default encoder configuration of paraformer is as follows:
-```
-encoder: conformer
-encoder_conf:
-    output_size: 256    # dimension of attention
-    attention_heads: 4  # the number of heads in multi-head attention
-    linear_units: 2048  # the number of units of position-wise feed forward
-    num_blocks: 12      # the number of encoder blocks
-    dropout_rate: 0.1
-    positional_dropout_rate: 0.1
-    attention_dropout_rate: 0.0
-    input_layer: conv2d  # encoder input layer architecture type
-    normalize_before: true
-    pos_enc_layer_type: rel_pos
-    selfattention_layer_type: rel_selfattn
-    activation_type: swish
-    macaron_style: true
-    use_cnn_module: true
-    cnn_module_kernel: 15
-
-```
-Users can change the encoder configuration by modifying these values. For example, to use an encoder with 16 conformer blocks and 8 attention heads per block, users just need to change `num_blocks` from 12 to 16 and `attention_heads` from 4 to 8. The batch size, learning rate and other training hyper-parameters are also set in this config file and can be changed by directly editing the corresponding values. For example, the default learning rate is `0.0005`; to change it to 0.0002, set `lr: 0.0002`.
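When changing `attention_heads`, keep in mind that multi-head attention splits `output_size` evenly across heads, so `output_size` should be divisible by `attention_heads` (256 / 8 = 32 dimensions per head in the example above). The hypothetical helper below applies overrides to a config already loaded into a dict (for example with `yaml.safe_load`) and checks this constraint.

```python
import copy


def update_encoder_conf(conf, **overrides):
    """Return a copy of `conf` with encoder_conf overrides applied and checked."""
    new_conf = copy.deepcopy(conf)  # leave the original dict untouched
    enc = new_conf["encoder_conf"]
    enc.update(overrides)
    if enc["output_size"] % enc["attention_heads"] != 0:
        raise ValueError(
            f"output_size={enc['output_size']} is not divisible by "
            f"attention_heads={enc['attention_heads']}")
    return new_conf
```

Validating before writing the YAML back catches a mismatched head count before a training run fails at model construction time.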
-
-### Change different input data type
-
-FunASR supports different input data types, including `sound`, `kaldi_ark`, `npy`, `text` and `text_int`. Users can specify any number and any type of inputs through `data_names` and `data_types` (in `conf/train_*.yaml`). For example, an ASR task usually takes speech and its transcript as input. By default, FunASR expects speech as raw audio (such as wav format) and transcripts as text. Correspondingly, `data_names` and `data_types` are set as follows (see `conf/train_*.yaml`):
-```text
-dataset_conf:
-    data_names: speech,text
-    data_types: sound,text
-    ...
-```
-When the input type changes to FBank, users only need to set `data_types: kaldi_ark,text` in the config file. Note that `data_file_names` used in `train.py` should also be changed to the new file names.
-
-### How to resume training process
-FunASR supports resuming training as follows:
-```shell
-train.py ... --resume true ...
-```
-
-### How to transfer / fine-tuning from pre-trained models
-
-FunASR supports transferring / fine-tuning from a pre-trained model by specifying the `init_param` parameter. The usage format is as follows:
-```shell
-train.py ... --init_param <file_path>:<src_key>:<dst_key>:<exclude_keys> ...
-```
-For example, the following command loads all pre-trained parameters whose names start with `decoder`, except those under `decoder.embed`, into `model.decoder2`:
-```shell
-train.py ... --init_param model.pb:decoder:decoder2:decoder.embed  ...
-```
-Besides, loading parameters from multiple pre-trained models is supported. For example, the following command achieves loading encoder parameters from the pre-trained model1 and decoder parameters from the pre-trained model2:
-```sh
-train.py ... --init_param model1.pb:encoder  --init_param model2.pb:decoder ...
-```
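To make the `<file_path>:<src_key>:<dst_key>:<exclude_keys>` format concrete, the sketch below parses the spec and remaps the keys of a parameter dict. Several details are assumptions, not confirmed FunASR behavior: `exclude_keys` is taken as comma-separated prefixes, excluded keys are dropped before the `src_key` prefix is stripped, and an omitted `dst_key` is treated as equal to `src_key` (which is what the two-field examples above suggest).

```python
def parse_init_param(spec):
    """Split '<file_path>:<src_key>:<dst_key>:<exclude_keys>' (trailing fields optional)."""
    fields = spec.split(":")
    if len(fields) > 4:
        raise ValueError(f"too many fields in {spec!r}")
    fields += [""] * (4 - len(fields))
    path, src_key, dst_key, excludes = fields
    exclude_keys = [e for e in excludes.split(",") if e]  # comma-separated (assumed)
    return path, src_key or None, dst_key or None, exclude_keys


def remap_state(src_state, src_key=None, dst_key=None, exclude_keys=()):
    """Select and rename pre-trained parameters (sketch)."""
    # 1. drop excluded prefixes, matched on the original parameter names
    state = {k: v for k, v in src_state.items()
             if not any(k.startswith(e) for e in exclude_keys)}
    # 2. keep only parameters under src_key and strip that prefix
    if src_key:
        state = {k[len(src_key) + 1:]: v for k, v in state.items()
                 if k.startswith(src_key + ".")}
    # 3. place them under dst_key (defaults to src_key: an assumption)
    dst_key = dst_key or src_key
    if dst_key:
        state = {f"{dst_key}.{k}": v for k, v in state.items()}
    return state
```

Applied to `model.pb:decoder:decoder2:decoder.embed`, a checkpoint containing `decoder.layer.weight` and `decoder.embed.weight` yields only `decoder2.layer.weight`.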
-
-### How to freeze part of the model parameters
-
-In certain situations, users may want to freeze part of the model parameters and update only the rest. FunASR employs `freeze_param` to achieve this. For example, to freeze all parameters matching `encoder.*`, set `freeze_param` as follows:
-```sh
-train.py ... --freeze_param encoder ...
-```
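Assuming that a parameter is frozen when its name equals a `freeze_param` entry or starts with that entry followed by a dot (so `encoder` matches `encoder.*` but not `encoder2.*`), the selection can be sketched as below. The helper is hypothetical; with PyTorch, the selected parameters would then get `requires_grad = False`, as shown in the comment.

```python
def params_to_freeze(param_names, freeze_param):
    """Names matching any freeze_param entry exactly or as a dotted prefix."""
    return [name for name in param_names
            if any(name == p or name.startswith(p + ".") for p in freeze_param)]


# With a PyTorch model, the selected parameters would be frozen like this:
#   names = [n for n, _ in model.named_parameters()]
#   frozen = set(params_to_freeze(names, ["encoder"]))
#   for name, param in model.named_parameters():
#       if name in frozen:
#           param.requires_grad = False
```

Matching on the dotted prefix avoids accidentally freezing sibling modules whose names merely share leading characters.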
-
-### ModelScope Usage
-
-Users can use ModelScope for inference and fine-tuning based on a trained academic model. To achieve this, users need to run stage 6 of the script. In this stage, the files required by ModelScope are generated automatically. Users can then use the corresponding ModelScope interface by replacing the model name with the path of the locally trained model. For detailed usage of the ModelScope interface, please refer to [ModelScope Usage](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html).
-
-### Decoding by CPU or GPU
-
-We support CPU and GPU decoding. For CPU decoding, set `gpu_inference=false` and set `njob` to specify the total number of CPU jobs. For GPU decoding, first set `gpu_inference=true`, then set `gpuid_list` to specify which GPUs are used for decoding and `njob` to specify the number of decoding jobs on each GPU.
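The sketch below illustrates how these job settings translate into shards of `wav.scp`: `njob` contiguous shards for CPU decoding, and `len(gpuid_list) * njob` shards for GPU decoding, with `njob` shards per GPU. The helpers and the contiguous splitting strategy are assumptions for illustration, not the recipe's actual implementation.

```python
def split_scp(lines, num_shards):
    """Split lines into num_shards contiguous, nearly equal shards."""
    base, extra = divmod(len(lines), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)  # first `extra` shards get one more
        shards.append(lines[start:start + size])
        start += size
    return shards


def plan_decoding_jobs(lines, gpu_inference, njob, gpuid_list=()):
    """Return (device, shard) pairs following the njob semantics above."""
    if gpu_inference:
        gpus = list(gpuid_list)
        shards = split_scp(lines, len(gpus) * njob)  # njob jobs per GPU
        return [(gpus[i // njob], shard) for i, shard in enumerate(shards)]
    return [("cpu", shard) for shard in split_scp(lines, njob)]  # njob jobs in total
```

With 10 utterances, `gpuid_list=["0", "1"]` and `njob=2`, this yields four shards of sizes 3, 3, 2 and 2, two per GPU.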
diff --git a/docs/academic_recipe/images/loss.png b/docs/academic_recipe/images/loss.png
deleted file mode 100644
index f559864..0000000
--- a/docs/academic_recipe/images/loss.png
+++ /dev/null
Binary files differ
diff --git a/docs/academic_recipe/lm_recipe.md b/docs/academic_recipe/lm_recipe.md
deleted file mode 100644
index 730e27c..0000000
--- a/docs/academic_recipe/lm_recipe.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# Speech Recognition
-
-Undo
diff --git a/docs/academic_recipe/punc_recipe.md b/docs/academic_recipe/punc_recipe.md
deleted file mode 100644
index e9f79bb..0000000
--- a/docs/academic_recipe/punc_recipe.md
+++ /dev/null
@@ -1,2 +0,0 @@
-# Punctuation Restoration
-Undo
\ No newline at end of file
diff --git a/docs/academic_recipe/sd_recipe.md b/docs/academic_recipe/sd_recipe.md
deleted file mode 100644
index 8b38d7b..0000000
--- a/docs/academic_recipe/sd_recipe.md
+++ /dev/null
@@ -1,2 +0,0 @@
-# Speaker Diarization
-Undo
diff --git a/docs/academic_recipe/sv_recipe.md b/docs/academic_recipe/sv_recipe.md
deleted file mode 100644
index 7fe493b..0000000
--- a/docs/academic_recipe/sv_recipe.md
+++ /dev/null
@@ -1,2 +0,0 @@
-# Speaker Verification
-Undo
diff --git a/docs/academic_recipe/vad_recipe.md b/docs/academic_recipe/vad_recipe.md
deleted file mode 100644
index 0216bc3..0000000
--- a/docs/academic_recipe/vad_recipe.md
+++ /dev/null
@@ -1,2 +0,0 @@
-# Voice Activity Detection
-Undo
diff --git a/docs/funasr b/docs/funasr
deleted file mode 120000
index 5782c20..0000000
--- a/docs/funasr
+++ /dev/null
@@ -1 +0,0 @@
-../funasr
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/asr_pipeline.md b/docs/modelscope_pipeline/asr_pipeline.md
deleted file mode 120000
index 465d5a2..0000000
--- a/docs/modelscope_pipeline/asr_pipeline.md
+++ /dev/null
@@ -1 +0,0 @@
-../../egs_modelscope/asr/TEMPLATE/README.md
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/itn_pipeline.md b/docs/modelscope_pipeline/itn_pipeline.md
deleted file mode 100644
index 2336842..0000000
--- a/docs/modelscope_pipeline/itn_pipeline.md
+++ /dev/null
@@ -1,63 +0,0 @@
-# Inverse Text Normalization (ITN)
-
-> **Note**: 
-> The modelscope pipeline supports all the models in [model zoo](https://modelscope.cn/models?page=1&tasks=inverse-text-processing&type=audio) for inference. Here we take the Japanese ITN model as an example to demonstrate the usage.
-
-## Inference
-
-### Quick start
-#### [Japanese ITN model](https://modelscope.cn/models/damo/speech_inverse_text_processing_fun-text-processing-itn-ja/summary)
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-itn_inference_pipline = pipeline(
-    task=Tasks.inverse_text_processing,
-    model='damo/speech_inverse_text_processing_fun-text-processing-itn-ja',
-    model_revision=None)
-
-itn_result = itn_inference_pipline(text_in='鐧句簩鍗佷笁')
-print(itn_result)
-# 123
-```
-- read text data directly.
-```python
-rec_result = itn_inference_pipline(text_in='一九九九年に誕生した同商品にちなみ、約三十年前、二十四歳の頃の幸四郎の写真を公開。')
-# 1999年に誕生した同商品にちなみ、約30年前、24歳の頃の幸四郎の写真を公開。
-```
-- text stored via URL, e.g.: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/ja_itn_example.txt
-```python
-rec_result = itn_inference_pipline(text_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/ja_itn_example.txt')
-```
-
-For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/fun_text_processing/inverse_text_normalization)
-
-### API-reference
-#### Define pipeline
-- `task`: `Tasks.inverse_text_processing`
-- `model`: model name in [model zoo](https://modelscope.cn/models?page=1&tasks=inverse-text-processing&type=audio), or model path in local disk
-- `output_dir`: `None` (Default), the output path of results if set
-- `model_revision`: `None` (Default), setting the model version
-
-#### Infer pipeline
-- `text_in`: the input to decode, which could be:
-  - text bytes, `e.g.`: "一九九九年に誕生した同商品にちなみ、約三十年前、二十四歳の頃の幸四郎の写真を公開。"
-  - text file, `e.g.`: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/ja_itn_example.txt
-  In this case of `text file` input, `output_dir` must be set to save the output results
-
-## Modify Your Own ITN Model
-The rule-based ITN code is open-sourced in [FunTextProcessing](https://github.com/alibaba-damo-academy/FunASR/tree/main/fun_text_processing), and users can modify the grammar rules for different languages. Taking Japanese as an example, users can add their own whitelist entries in ```FunASR/fun_text_processing/inverse_text_normalization/ja/data/whitelist.tsv```. After modifying the grammar rules, users can export and evaluate their own ITN models in a local directory.
-
-### Export ITN Model
-Export ITN model via ```FunASR/fun_text_processing/inverse_text_normalization/export_models.py```. An example to export ITN model to local folder is shown as below.
-```shell
-cd FunASR/fun_text_processing/inverse_text_normalization/
-python export_models.py --language ja --export_dir ./itn_models/
-```
-
-### Evaluate ITN Model
-Users can evaluate their own ITN model in local directory via ```FunASR/fun_text_processing/inverse_text_normalization/inverse_normalize.py```. Here is an example:
-```shell
-cd FunASR/fun_text_processing/inverse_text_normalization/
-python inverse_normalize.py --input_file ja_itn_example.txt --cache_dir ./itn_models/ --output_file output.txt --language=ja
-```
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/lm_pipeline.md b/docs/modelscope_pipeline/lm_pipeline.md
deleted file mode 100644
index c4090ec..0000000
--- a/docs/modelscope_pipeline/lm_pipeline.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# Language Models
-
-## Inference with pipeline
-### Quick start
-### Inference with your data
-### Inference with multi-threads on CPU
-### Inference with multi GPU
-
-## Finetune with pipeline
-### Quick start
-### Finetune with your data
-
-## Inference with your finetuned model
-
diff --git a/docs/modelscope_pipeline/modelscope_usages.md b/docs/modelscope_pipeline/modelscope_usages.md
deleted file mode 100644
index 84c8e1d..0000000
--- a/docs/modelscope_pipeline/modelscope_usages.md
+++ /dev/null
@@ -1,53 +0,0 @@
-# ModelScope Usage
-ModelScope is an open-source model-as-service platform supported by Alibaba, which provides flexible and convenient model applications for users in academia and industry. For specific usages and open source models, please refer to [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition). In the domain of speech, we provide autoregressive/non-autoregressive speech recognition, speech pre-training, punctuation prediction and other models, which are convenient for users.
-
-## Overall Introduction
-We provide the usages of different models under `egs_modelscope`, which support both directly employing our provided models for inference and finetuning them as pre-trained initial models. Next, we introduce the example in the `egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch` directory, including `infer.py`, `finetune.py` and `infer_after_finetune.py`. Their functions are as follows:
-- `infer.py`: perform inference on the specified dataset based on our provided model
-- `finetune.py`: employ our provided model as the initial model for finetuning
-- `infer_after_finetune.py`: perform inference on the specified dataset based on the finetuned model
-
-## Inference
-We provide `infer.py` for inference. With this file, users can perform inference on a specified dataset using our provided model and obtain the corresponding recognition results. If the transcript is given, the `CER` is calculated at the same time. Before performing inference, users can set the following parameters to modify the inference configuration:
-* `data_dir`: dataset directory. The directory should contain the wav list file `wav.scp` and the transcript file `text` (optional). For the format of these two files, please refer to the instructions in [Quick Start](./get_started.md). If the `text` file exists, the CER will be calculated accordingly; otherwise it will be skipped.
-* `output_dir`: the directory for saving the inference results
-* `batch_size`: batch size during inference
-* `ctc_weight`: some models contain a CTC module; users can set this parameter to specify the weight of the CTC module during inference
-
-In addition to directly setting parameters in `infer.py`, users can also manually set the parameters in the `decoding.yaml` file in the model download directory to modify the inference configuration.
-
-## Finetuning
-We provide `finetune.py` for finetuning. With this file, users can finetune our provided model, used as the initial model, on a specified dataset to achieve better performance in the target domain. Before finetuning, users can set the following parameters to modify the finetuning configuration:
-* `data_path`: dataset directory. This directory should contain the `train` directory for saving the training set and the `dev` directory for saving the validation set. Each directory needs to contain the wav list file `wav.scp` and the transcript file `text`
-* `output_dir`: the directory for saving the finetuning results
-* `dataset_type`: for small datasets, set as `small`; for datasets larger than 1000 hours, set as `large`
-* `batch_bins`: batch size; if `dataset_type` is set as `small`, the unit of `batch_bins` is the number of fbank feature frames; if `dataset_type` is set as `large`, the unit is milliseconds
-* `max_epoch`: the maximum number of training epochs
-
-The following parameters can also be set. However, if there is no special requirement, users can ignore these parameters and use the default value we provided directly:
-* `accum_grad`: the number of gradient accumulation steps
-* `keep_nbest_models`: select the `keep_nbest_models` models with the best performance and average their parameters
-  to obtain a better model
-* `optim`: the optimizer
-* `lr`: the learning rate
-* `scheduler`: the learning rate scheduling strategy
-* `scheduler_conf`: the parameters of the learning rate scheduler
-* `specaug`: the spectral augmentation setting
-* `specaug_conf`: the parameters of the spectral augmentation
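The `keep_nbest_models` option above amounts to ranking checkpoints by a validation metric and averaging the parameters of the top ones. A minimal sketch with plain dicts follows; the metric (here assumed to be validation accuracy, where higher is better) and the helper names are assumptions. The averaging only uses `+` and `/`, so it also works for dicts of NumPy arrays or PyTorch tensors.

```python
def select_nbest(ckpt_scores, keep_nbest_models, higher_is_better=True):
    """Return names of the keep_nbest_models best checkpoints."""
    ranked = sorted(ckpt_scores, key=ckpt_scores.get, reverse=higher_is_better)
    return ranked[:keep_nbest_models]


def average_checkpoints(state_dicts):
    """Element-wise mean of parameter dicts that share the same keys."""
    keys = state_dicts[0].keys()
    if any(sd.keys() != keys for sd in state_dicts[1:]):
        raise ValueError("checkpoints have different parameter names")
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}
```

For example, with validation accuracies `{"ep1": 0.90, "ep2": 0.95, "ep3": 0.93}` and `keep_nbest_models=2`, checkpoints `ep2` and `ep3` would be averaged.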
-
-In addition to directly setting parameters in `finetune.py`, users can also manually set the parameters in the `finetune.yaml` file in the model download directory to modify the finetuning configuration.
-
-## Inference after Finetuning
-We provide `infer_after_finetune.py` for inference with the model finetuned by users. With this file, users can perform inference on a specified dataset using the finetuned model and obtain the corresponding recognition results. If the transcript is given, the `CER` is calculated at the same time. Before performing inference, users can set the following parameters to modify the inference configuration:
-* `data_dir`: dataset directory. The directory should contain the wav list file `wav.scp` and the transcript file `text` (optional). If the `text` file exists, the CER will be calculated accordingly; otherwise it will be skipped.
-* `output_dir`: the directory for saving the inference results
-* `batch_size`: batch size during inference
-* `ctc_weight`: some models contain a CTC module; users can set this parameter to specify the weight of the CTC module during inference
-* `decoding_model_name`: the name of the model used for inference
-
-The following parameters can also be set. However, if there is no special requirement, users can ignore these parameters and use the default value we provided directly:
-* `modelscope_model_name`: the initial model name used for finetuning
-* `required_files`: files required for inference when using the ModelScope interface
-
-## Announcements
-Some models may have other specific parameters during the finetuning and inference. The usages of these parameters can be found in the `README.md` file in the corresponding directory.
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/punc_pipeline.md b/docs/modelscope_pipeline/punc_pipeline.md
deleted file mode 120000
index 4ef4711..0000000
--- a/docs/modelscope_pipeline/punc_pipeline.md
+++ /dev/null
@@ -1 +0,0 @@
-../../egs_modelscope/punctuation/TEMPLATE/README.md
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/quick_start.md b/docs/modelscope_pipeline/quick_start.md
deleted file mode 100644
index 2b9219b..0000000
--- a/docs/modelscope_pipeline/quick_start.md
+++ /dev/null
@@ -1,226 +0,0 @@
-([简体中文](./quick_start_zh.md)|English)
-
-# Quick Start
-
-> **Note**: 
-> The modelscope pipeline supports all the models in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/model_zoo/modelscope_models.html#pretrained-models-on-modelscope) for inference and finetuning. Here we take typical models as examples to demonstrate the usage.
-
-
-## Inference with pipeline
-
-### Speech Recognition
-#### Paraformer Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_pipeline = pipeline(
-    task=Tasks.auto_speech_recognition,
-    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
-)
-
-rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
-print(rec_result)
-# {'text': '欢迎大家来体验达摩院推出的语音识别模型'}
-```
-
-### Voice Activity Detection
-#### FSMN-VAD Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-from modelscope.utils.logger import get_logger
-import logging
-logger = get_logger(log_level=logging.CRITICAL)
-logger.setLevel(logging.CRITICAL)
-
-inference_pipeline = pipeline(
-    task=Tasks.voice_activity_detection,
-    model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
-    )
-
-segments_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav')
-print(segments_result)
-# {'text': [[70, 2340], [2620, 6200], [6480, 23670], [23950, 26250], [26780, 28990], [29950, 31430], [31750, 37600], [38210, 46900], [47310, 49630], [49910, 56460], [56740, 59540], [59820, 70450]]}
-```
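The VAD output above is a list of `[start, end]` speech segments, which we take to be in milliseconds. The hypothetical helpers below convert them to seconds or to sample indices of the 16 kHz input, for example to cut the audio into segments before recognition.

```python
def segments_to_seconds(segments_ms):
    """[[start_ms, end_ms], ...] -> [(start_s, end_s), ...]"""
    return [(s / 1000.0, e / 1000.0) for s, e in segments_ms]


def segments_to_samples(segments_ms, sample_rate=16000):
    """Convert millisecond boundaries to sample indices at sample_rate."""
    return [(s * sample_rate // 1000, e * sample_rate // 1000) for s, e in segments_ms]


def total_speech_seconds(segments_ms):
    """Total detected speech duration in seconds."""
    return sum(e - s for s, e in segments_ms) / 1000.0
```

For the first two segments, `[[70, 2340], [2620, 6200]]`, the sample ranges at 16 kHz are `(1120, 37440)` and `(41920, 99200)`, covering 5.85 s of speech.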
-
-### Punctuation Restoration
-#### CT_Transformer Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_pipeline = pipeline(
-    task=Tasks.punctuation,
-    model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
-    )
-
-rec_result = inference_pipeline(text_in='我们都是木头人不会讲话不会动')
-print(rec_result)
-# {'text': '我们都是木头人，不会讲话，不会动。'}
-```
-
-### Timestamp Prediction
-#### TP-Aligner Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_pipeline = pipeline(
-    task=Tasks.speech_timestamp,
-    model='damo/speech_timestamp_prediction-v1-16k-offline',)
-
-rec_result = inference_pipeline(
-    audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_timestamps.wav',
-    text_in='一 个 东 太 平 洋 国 家 为 什 么 跑 到 西 太 平 洋 来 了 呢',)
-print(rec_result)
-# {'text': '<sil> 0.000 0.380;一 0.380 0.560;个 0.560 0.800;东 0.800 0.980;太 0.980 1.140;平 1.140 1.260;洋 1.260 1.440;国 1.440 1.680;家 1.680 1.920;<sil> 1.920 2.040;为 2.040 2.200;什 2.200 2.320;么 2.320 2.500;跑 2.500 2.680;到 2.680 2.860;西 2.860 3.040;太 3.040 3.200;平 3.200 3.380;洋 3.380 3.500;来 3.500 3.640;了 3.640 3.800;呢 3.800 4.150;<sil> 4.150 4.440;', 'timestamp': [[380, 560], [560, 800], [800, 980], [980, 1140], [1140, 1260], [1260, 1440], [1440, 1680], [1680, 1920], [2040, 2200], [2200, 2320], [2320, 2500], [2500, 2680], [2680, 2860], [2860, 3040], [3040, 3200], [3200, 3380], [3380, 3500], [3500, 3640], [3640, 3800], [3800, 4150]]}
-```
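-In the result above, the `text` field packs one `token start end` triple per token, separated by `;`, with `<sil>` marking silence and times apparently in seconds. The `timestamp` field holds only the non-silence spans, in milliseconds. If you need each token paired with its span, a small parser like the following may help. It assumes exactly the format printed above, and `parse_timestamp_text` is a hypothetical helper:
-
-```python
-def parse_timestamp_text(text, keep_silence=False):
-    """Split 'tok start end;tok start end;...' into (token, start_s, end_s) tuples."""
-    items = []
-    for entry in text.split(";"):
-        entry = entry.strip()
-        if not entry:
-            continue  # the string ends with ';', which leaves an empty tail
-        token, start, end = entry.split()
-        if token == "<sil>" and not keep_silence:
-            continue  # drop silence markers unless explicitly requested
-        items.append((token, float(start), float(end)))
-    return items
-
-
-# A shortened copy of the example output above.
-sample = "<sil> 0.000 0.380;一 0.380 0.560;个 0.560 0.800;<sil> 1.920 2.040;"
-for token, start, end in parse_timestamp_text(sample):
-    print(f"{token}\t{start:.3f}\t{end:.3f}")
-```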
-
-### Speaker Verification
-#### X-vector Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-import numpy as np
-
-inference_sv_pipline = pipeline(
-    task=Tasks.speaker_verification,
-    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
-)
-
-# embedding extract
-spk_embedding = inference_sv_pipline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav')["spk_embedding"]
-
-# speaker verification
-rec_result = inference_sv_pipline(audio_in=('https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav','https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_same.wav'))
-print(rec_result["scores"][0])
-# 0.8540499500025098
-```
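-The printed score compares the enrollment utterance with the test utterance. Speaker embeddings are commonly compared with cosine similarity, and the sketch below assumes that is what such a score measures; the pipeline's exact scoring is not documented here. Random NumPy vectors stand in for real `spk_embedding` outputs, whose shape is also an assumption. The 0.5 threshold in the hypothetical `same_speaker` helper is a placeholder that would need tuning on real data:
-
-```python
-import numpy as np
-
-
-def cosine_score(emb_a, emb_b):
-    """Cosine similarity between two speaker embeddings (flattened to 1-D)."""
-    a = np.asarray(emb_a, dtype=np.float64).ravel()
-    b = np.asarray(emb_b, dtype=np.float64).ravel()
-    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
-
-
-def same_speaker(emb_a, emb_b, threshold=0.5):
-    """Hypothetical decision rule: accept when the score reaches the threshold."""
-    return cosine_score(emb_a, emb_b) >= threshold
-
-
-rng = np.random.default_rng(0)
-enroll = rng.standard_normal(512)               # stand-in for an enrollment embedding
-same = enroll + 0.1 * rng.standard_normal(512)  # small perturbation: should score close to 1
-other = rng.standard_normal(512)                # independent vector: should score close to 0
-print(cosine_score(enroll, same), same_speaker(enroll, same))
-print(cosine_score(enroll, other), same_speaker(enroll, other))
-```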
-
-### Speaker Diarization
-#### SOND Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_diar_pipline = pipeline(
-    mode="sond_demo",
-    num_workers=0,
-    task=Tasks.speaker_diarization,
-    diar_model_config="sond.yaml",
-    model='damo/speech_diarization_sond-en-us-callhome-8k-n16k4-pytorch',
-    model_revision="v1.0.3",
-    sv_model="damo/speech_xvector_sv-en-us-callhome-8k-spk6135-pytorch",
-    sv_model_revision="v1.0.0",
-)
-
-audio_list=[
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/record.wav",
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/spk_A.wav",
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/spk_B.wav",
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/spk_B1.wav"
-]
-
-results = inference_diar_pipline(audio_in=audio_list)
-print(results)
-# {'text': 'spk1 [(0.8, 1.84), (2.8, 6.16), (7.04, 10.64), (12.08, 12.8), (14.24, 15.6)]\nspk2 [(0.0, 1.12), (1.68, 3.2), (4.48, 7.12), (8.48, 9.04), (10.56, 14.48), (15.44, 16.0)]'}
-```
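-The diarization result encodes one line per speaker: a label followed by a list of `(start, end)` tuples, apparently in seconds. Assuming that layout holds, `ast.literal_eval` can turn each list back into Python tuples, for example to total each speaker's talk time. `parse_diarization` is a hypothetical helper:
-
-```python
-import ast
-
-
-def parse_diarization(text):
-    """Map each speaker label to its list of (start, end) segments."""
-    speakers = {}
-    for line in text.strip().splitlines():
-        label, _, segments = line.partition(" ")
-        speakers[label] = ast.literal_eval(segments)  # safely parses the tuple list
-    return speakers
-
-
-# Copied from the example output above.
-results = {'text': 'spk1 [(0.8, 1.84), (2.8, 6.16), (7.04, 10.64), (12.08, 12.8), (14.24, 15.6)]\nspk2 [(0.0, 1.12), (1.68, 3.2), (4.48, 7.12), (8.48, 9.04), (10.56, 14.48), (15.44, 16.0)]'}
-for label, segs in parse_diarization(results['text']).items():
-    talk_time = sum(end - start for start, end in segs)
-    print(f"{label}: {len(segs)} segments, {talk_time:.2f} s")
-```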
-
-### FAQ
-#### How to switch device from GPU to CPU with pipeline
-
-By default, the pipeline decodes on the GPU (`ngpu=1`) when one is available. To decode on the CPU instead, set `ngpu=0`:
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_pipeline = pipeline(
-    task=Tasks.auto_speech_recognition,
-    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
-    ngpu=0,
-)
-```
-
-#### How to infer from a local model path
-Download the model to a local directory with the ModelScope SDK:
-
-```python
-from modelscope.hub.snapshot_download import snapshot_download
-
-local_dir_root = "./models_from_modelscope"
-model_dir = snapshot_download('damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch', cache_dir=local_dir_root)
-```
-
-Alternatively, download the model to a local directory with Git LFS:
-```shell
-git lfs install
-# git clone https://www.modelscope.cn/<namespace>/<model-name>.git
-git clone https://www.modelscope.cn/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch.git
-```
-
-Then run inference from the local model path:
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-local_dir_root = "./models_from_modelscope/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-inference_pipeline = pipeline(
-    task=Tasks.auto_speech_recognition,
-    model=local_dir_root,
-)
-```
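-For scripts that should run both online and offline, one option is to prefer the local copy when it exists and otherwise fall back to the model ID, since the pipeline examples above accept either form for `model`. `resolve_model` is a hypothetical helper, and the directory layout it checks (`<cache_dir>/damo/<model-name>`) is assumed from the local path used above:
-
-```python
-import os
-
-MODEL_ID = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-
-
-def resolve_model(model_id, local_root="./models_from_modelscope"):
-    """Return the local model directory if it was downloaded, else the hub model ID."""
-    local_dir = os.path.join(local_root, *model_id.split("/"))
-    return local_dir if os.path.isdir(local_dir) else model_id
-
-
-# Pass the result as model=... to pipeline(...).
-print(resolve_model(MODEL_ID))
-```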
-
-## Finetune with pipeline
-### Speech Recognition
-#### Paraformer Model
-
-Save the following script as `finetune.py`:
-```python
-import os
-from modelscope.metainfo import Trainers
-from modelscope.trainers import build_trainer
-from modelscope.msdatasets.audio.asr_dataset import ASRDataset
-
-def modelscope_finetune(params):
-    if not os.path.exists(params.output_dir):
-        os.makedirs(params.output_dir, exist_ok=True)
-    # dataset split ["train", "validation"]
-    ds_dict = ASRDataset.load(params.data_path, namespace='speech_asr')
-    kwargs = dict(
-        model=params.model,
-        data_dir=ds_dict,
-        dataset_type=params.dataset_type,
-        work_dir=params.output_dir,
-        batch_bins=params.batch_bins,
-        max_epoch=params.max_epoch,
-        lr=params.lr)
-    trainer = build_trainer(Trainers.speech_asr_trainer, default_args=kwargs)
-    trainer.train()
-
-
-if __name__ == '__main__':
-    from funasr.utils.modelscope_param import modelscope_args
-    params = modelscope_args(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
-    params.output_dir = "./checkpoint"                      # directory where the model is saved
-    params.data_path = "speech_asr_aishell1_trainsets"      # data path: a dataset already uploaded to ModelScope, or local data
-    params.dataset_type = "small"                           # "small" for small datasets; use "large" if the data exceeds 1000 hours
-    params.batch_bins = 2000                                # batch size: in fbank feature frames if dataset_type="small", in milliseconds if dataset_type="large"
-    params.max_epoch = 50                                   # maximum number of training epochs
-    params.lr = 0.00005                                     # learning rate
-    
-    modelscope_finetune(params)
-```
-
-```shell
-python finetune.py &> log.txt &
-```
-Training runs in the background; follow its progress with `tail log.txt`. The output looks like this:
-```text
-[bach-gpu011024008134] 2023-04-23 18:59:13,976 (e2e_asr_paraformer:467) INFO: enable sampler in paraformer, sampling_ratio: 0.75
-[bach-gpu011024008134] 2023-04-23 18:59:48,924 (trainer:777) INFO: 2epoch:train:1-50batch:50num_updates: iter_time=0.008, forward_time=0.302, loss_att=0.186, acc=0.942, loss_pre=0.005, loss=0.192, backward_time=0.231, optim_step_time=0.117, optim0_lr0=7.484e-06, train_time=0.753
-[bach-gpu011024008134] 2023-04-23 19:00:23,869 (trainer:777) INFO: 2epoch:train:51-100batch:100num_updates: iter_time=1.152e-04, forward_time=0.275, loss_att=0.184, acc=0.945, loss_pre=0.005, loss=0.189, backward_time=0.234, optim_step_time=0.117, optim0_lr0=7.567e-06, train_time=0.699
-[bach-gpu011024008134] 2023-04-23 19:00:58,463 (trainer:777) INFO: 2epoch:train:101-150batch:150num_updates: iter_time=1.123e-04, forward_time=0.271, loss_att=0.204, acc=0.942, loss_pre=0.005, loss=0.210, backward_time=0.231, optim_step_time=0.116, optim0_lr0=7.651e-06, train_time=0.692
-```
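-Each trainer line reports metrics as `name=value` pairs after the epoch and batch range. To track values such as `loss` and `acc` over time, a small regex-based parser is enough. This sketch assumes the log format shown above and is not a FunASR utility; `parse_train_line` and `read_log` are hypothetical helpers:
-
-```python
-import re
-
-# "name=value" pairs, where value may use scientific notation (e.g. 7.484e-06).
-METRIC_RE = re.compile(r"(\w+)=([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")
-# "2epoch:train:51-100batch" -> epoch 2, batches 51..100
-BATCH_RE = re.compile(r"(\d+)epoch:train:(\d+)-(\d+)batch")
-
-
-def parse_train_line(line):
-    """Return (epoch, last_batch, metrics) for a trainer line, or None for other lines."""
-    batch = BATCH_RE.search(line)
-    if batch is None:
-        return None
-    metrics = {name: float(value) for name, value in METRIC_RE.findall(line)}
-    return int(batch.group(1)), int(batch.group(3)), metrics
-
-
-def read_log(path):
-    """Parse every trainer line in a log file such as log.txt."""
-    with open(path, encoding="utf-8") as f:
-        return [r for r in map(parse_train_line, f) if r is not None]
-
-
-line = ("[bach-gpu011024008134] 2023-04-23 19:00:23,869 (trainer:777) INFO: "
-        "2epoch:train:51-100batch:100num_updates: iter_time=1.152e-04, forward_time=0.275, "
-        "loss_att=0.184, acc=0.945, loss_pre=0.005, loss=0.189, optim0_lr0=7.567e-06")
-epoch, last_batch, metrics = parse_train_line(line)
-print(epoch, last_batch, metrics["loss"], metrics["acc"])  # 2 100 0.189 0.945
-```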
-
-### FAQ
-#### Multi-GPU and distributed training
-
-To fine-tune on multiple GPUs (GPUs 1 and 2 in this example), launch the script with `torch.distributed.launch`:
-```shell
-CUDA_VISIBLE_DEVICES=1,2 python -m torch.distributed.launch --nproc_per_node 2 finetune.py > log.txt 2>&1
-```
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/quick_start_zh.md b/docs/modelscope_pipeline/quick_start_zh.md
deleted file mode 100644
index 91ad3c0..0000000
--- a/docs/modelscope_pipeline/quick_start_zh.md
+++ /dev/null
@@ -1,227 +0,0 @@
-(Simplified Chinese|[English](./quick_start.md))
-
-# Quick Start
-
-> **Note**: 
-> The ModelScope pipeline supports inference and fine-tuning for every model in the model zoo. Here we use typical models as examples to demonstrate usage.
-
-
-## Inference with pipeline
-
-### Speech Recognition
-#### Paraformer Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_pipeline = pipeline(
-    task=Tasks.auto_speech_recognition,
-    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
-)
-
-rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
-print(rec_result)
-# {'text': '欢迎大家来体验达摩院推出的语音识别模型'}
-```
-
-### Voice Activity Detection
-#### FSMN-VAD Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-from modelscope.utils.logger import get_logger
-import logging
-logger = get_logger(log_level=logging.CRITICAL)
-logger.setLevel(logging.CRITICAL)
-
-inference_pipeline = pipeline(
-    task=Tasks.voice_activity_detection,
-    model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
-    )
-
-segments_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav')
-print(segments_result)
-# {'text': [[70, 2340], [2620, 6200], [6480, 23670], [23950, 26250], [26780, 28990], [29950, 31430], [31750, 37600], [38210, 46900], [47310, 49630], [49910, 56460], [56740, 59540], [59820, 70450]]}
-```
-
-### Punctuation Restoration
-#### CT_Transformer Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_pipeline = pipeline(
-    task=Tasks.punctuation,
-    model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
-    )
-
-rec_result = inference_pipeline(text_in='我们都是木头人不会讲话不会动')
-print(rec_result)
-# {'text': '我们都是木头人，不会讲话，不会动。'}
-```
-
-### Timestamp Prediction
-#### TP-Aligner Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_pipeline = pipeline(
-    task=Tasks.speech_timestamp,
-    model='damo/speech_timestamp_prediction-v1-16k-offline',)
-
-rec_result = inference_pipeline(
-    audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_timestamps.wav',
-    text_in='一 个 东 太 平 洋 国 家 为 什 么 跑 到 西 太 平 洋 来 了 呢',)
-print(rec_result)
-# {'text': '<sil> 0.000 0.380;一 0.380 0.560;个 0.560 0.800;东 0.800 0.980;太 0.980 1.140;平 1.140 1.260;洋 1.260 1.440;国 1.440 1.680;家 1.680 1.920;<sil> 1.920 2.040;为 2.040 2.200;什 2.200 2.320;么 2.320 2.500;跑 2.500 2.680;到 2.680 2.860;西 2.860 3.040;太 3.040 3.200;平 3.200 3.380;洋 3.380 3.500;来 3.500 3.640;了 3.640 3.800;呢 3.800 4.150;<sil> 4.150 4.440;', 'timestamp': [[380, 560], [560, 800], [800, 980], [980, 1140], [1140, 1260], [1260, 1440], [1440, 1680], [1680, 1920], [2040, 2200], [2200, 2320], [2320, 2500], [2500, 2680], [2680, 2860], [2860, 3040], [3040, 3200], [3200, 3380], [3380, 3500], [3500, 3640], [3640, 3800], [3800, 4150]]}
-```
-
-### Speaker Verification
-#### X-vector Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-import numpy as np
-
-inference_sv_pipline = pipeline(
-    task=Tasks.speaker_verification,
-    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
-)
-
-# embedding extract
-spk_embedding = inference_sv_pipline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav')["spk_embedding"]
-
-# speaker verification
-rec_result = inference_sv_pipline(audio_in=('https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav','https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_same.wav'))
-print(rec_result["scores"][0])
-# 0.8540499500025098
-```
-
-### Speaker Diarization
-#### SOND Model
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_diar_pipline = pipeline(
-    mode="sond_demo",
-    num_workers=0,
-    task=Tasks.speaker_diarization,
-    diar_model_config="sond.yaml",
-    model='damo/speech_diarization_sond-en-us-callhome-8k-n16k4-pytorch',
-    model_revision="v1.0.3",
-    sv_model="damo/speech_xvector_sv-en-us-callhome-8k-spk6135-pytorch",
-    sv_model_revision="v1.0.0",
-)
-
-audio_list=[
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/record.wav",
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/spk_A.wav",
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/spk_B.wav",
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/spk_B1.wav"
-]
-
-results = inference_diar_pipline(audio_in=audio_list)
-print(results)
-# {'text': 'spk1 [(0.8, 1.84), (2.8, 6.16), (7.04, 10.64), (12.08, 12.8), (14.24, 15.6)]\nspk2 [(0.0, 1.12), (1.68, 3.2), (4.48, 7.12), (8.48, 9.04), (10.56, 14.48), (15.44, 16.0)]'}
-```
-
-### FAQ
-#### How to switch between CPU and GPU when inferring with pipeline
-
-By default, the pipeline decodes on the GPU (`ngpu=1`) when one is available. To decode on the CPU instead, set `ngpu=0`:
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_pipeline = pipeline(
-    task=Tasks.auto_speech_recognition,
-    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
-    ngpu=0,
-)
-```
-
-#### How to infer from a local model (offline use)
-Download the model to a local directory with the ModelScope SDK:
-
-```python
-from modelscope.hub.snapshot_download import snapshot_download
-
-local_dir_root = "./models_from_modelscope"
-model_dir = snapshot_download('damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch', cache_dir=local_dir_root)
-```
-
-Alternatively, download the model to a local directory with git:
-```shell
-git lfs install
-# git clone https://www.modelscope.cn/<namespace>/<model-name>.git
-git clone https://www.modelscope.cn/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch.git
-```
-
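-Then run inference from the downloaded local model (no network connection required):
-```python
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-local_dir_root = "./models_from_modelscope/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-inference_pipeline = pipeline(
-    task=Tasks.auto_speech_recognition,
-    model=local_dir_root,
-)
-```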
-
-## Fine-tuning with pipeline
-### Speech Recognition
-#### Paraformer Model
-
-Save the following script as `finetune.py`:
-```python
-import os
-from modelscope.metainfo import Trainers
-from modelscope.trainers import build_trainer
-from modelscope.msdatasets.audio.asr_dataset import ASRDataset
-
-def modelscope_finetune(params):
-    if not os.path.exists(params.output_dir):
-        os.makedirs(params.output_dir, exist_ok=True)
-    # dataset split ["train", "validation"]
-    ds_dict = ASRDataset.load(params.data_path, namespace='speech_asr')
-    kwargs = dict(
-        model=params.model,
-        data_dir=ds_dict,
-        dataset_type=params.dataset_type,
-        work_dir=params.output_dir,
-        batch_bins=params.batch_bins,
-        max_epoch=params.max_epoch,
-        lr=params.lr)
-    trainer = build_trainer(Trainers.speech_asr_trainer, default_args=kwargs)
-    trainer.train()
-
-
-if __name__ == '__main__':
-    from funasr.utils.modelscope_param import modelscope_args
-    params = modelscope_args(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
-    params.output_dir = "./checkpoint"                      # directory where the model is saved
-    params.data_path = "speech_asr_aishell1_trainsets"      # data path: a dataset already uploaded to ModelScope, or local data
-    params.dataset_type = "small"                           # "small" for small datasets; use "large" if the data exceeds 1000 hours
-    params.batch_bins = 2000                                # batch size: in fbank feature frames if dataset_type="small", in milliseconds if dataset_type="large"
-    params.max_epoch = 50                                   # maximum number of training epochs
-    params.lr = 0.00005                                     # learning rate
-    
-    modelscope_finetune(params)
-```
-
-```shell
-python finetune.py &> log.txt &
-```
-Training runs in the background; follow its progress with `tail log.txt`. The output looks like this:
-```text
-[bach-gpu011024008134] 2023-04-23 18:59:13,976 (e2e_asr_paraformer:467) INFO: enable sampler in paraformer, sampling_ratio: 0.75
-[bach-gpu011024008134] 2023-04-23 18:59:48,924 (trainer:777) INFO: 2epoch:train:1-50batch:50num_updates: iter_time=0.008, forward_time=0.302, loss_att=0.186, acc=0.942, loss_pre=0.005, loss=0.192, backward_time=0.231, optim_step_time=0.117, optim0_lr0=7.484e-06, train_time=0.753
-[bach-gpu011024008134] 2023-04-23 19:00:23,869 (trainer:777) INFO: 2epoch:train:51-100batch:100num_updates: iter_time=1.152e-04, forward_time=0.275, loss_att=0.184, acc=0.945, loss_pre=0.005, loss=0.189, backward_time=0.234, optim_step_time=0.117, optim0_lr0=7.567e-06, train_time=0.699
-[bach-gpu011024008134] 2023-04-23 19:00:58,463 (trainer:777) INFO: 2epoch:train:101-150batch:150num_updates: iter_time=1.123e-04, forward_time=0.271, loss_att=0.204, acc=0.942, loss_pre=0.005, loss=0.210, backward_time=0.231, optim_step_time=0.116, optim0_lr0=7.651e-06, train_time=0.692
-```
-
-### FAQ
-#### Multi-GPU training
-
-鍙互浣跨敤涓嬮潰鐨勬寚浠よ繘琛屽GPU璁粌
-```shell
-CUDA_VISIBLE_DEVICES=1,2 python -m torch.distributed.launch --nproc_per_node 2 finetune.py > log.txt 2>&1
-```
-
diff --git a/docs/modelscope_pipeline/sd_pipeline.md b/docs/modelscope_pipeline/sd_pipeline.md
deleted file mode 120000
index 9c3ac98..0000000
--- a/docs/modelscope_pipeline/sd_pipeline.md
+++ /dev/null
@@ -1 +0,0 @@
-../../egs_modelscope/speaker_diarization/TEMPLATE/README.md
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/sv_pipeline.md b/docs/modelscope_pipeline/sv_pipeline.md
deleted file mode 120000
index 3217355..0000000
--- a/docs/modelscope_pipeline/sv_pipeline.md
+++ /dev/null
@@ -1 +0,0 @@
-../../egs_modelscope/speaker_verification/TEMPLATE/README.md
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/tp_pipeline.md b/docs/modelscope_pipeline/tp_pipeline.md
deleted file mode 120000
index 5e7b0f4..0000000
--- a/docs/modelscope_pipeline/tp_pipeline.md
+++ /dev/null
@@ -1 +0,0 @@
-../../egs_modelscope/tp/TEMPLATE/README.md
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/vad_pipeline.md b/docs/modelscope_pipeline/vad_pipeline.md
deleted file mode 120000
index 30ea6fc..0000000
--- a/docs/modelscope_pipeline/vad_pipeline.md
+++ /dev/null
@@ -1 +0,0 @@
-../../egs_modelscope/vad/TEMPLATE/README.md
\ No newline at end of file

--
Gitblit v1.9.1