From f4c6af4528d94735e2666b5233e4b8a6faf40bf2 Mon Sep 17 00:00:00 2001
From: 游雁 <zhifu.gzf@alibaba-inc.com>
Date: Thu, 21 Nov 2024 16:07:06 +0800
Subject: [PATCH] docs

---
 /dev/null             |  473 -----------------------------------------------------------
 examples/README.md    |    1 
 examples/README_zh.md |    1 
 3 files changed, 2 insertions(+), 473 deletions(-)

diff --git a/examples/README.md b/examples/README.md
deleted file mode 100644
index 802b1a4..0000000
--- a/examples/README.md
+++ /dev/null
@@ -1,461 +0,0 @@
-([简体中文](./README_zh.md)|English)
-
-FunASR has open-sourced a large number of pre-trained models on industrial data. You are free to use, copy, modify, and share FunASR models under the [Model License Agreement](https://github.com/alibaba-damo-academy/FunASR/blob/main/MODEL_LICENSE). Below, we list some representative models. For a comprehensive list, please refer to our [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/tree/main/model_zoo).
-
-<div align="center">  
-<h4>
- <a href="#Inference"> Model Inference </a>   
-｜<a href="#Training"> Model Training and Testing </a>
-｜<a href="#Export"> Model Export and Testing </a>
-</h4>
-</div>
-
-<a name="Inference"></a>
-## Model Inference
-
-### Quick Start
-
-For command-line invocation:
-```shell
-funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=asr_example_zh.wav
-```
-
-For python code invocation (recommended): 
-
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="paraformer-zh")
-
-res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav")
-print(res)
-```
-
-### API Description 
-#### AutoModel Definition
-```python
-model = AutoModel(model=[str], device=[str], ncpu=[int], output_dir=[str], batch_size=[int], hub=[str], **kwargs)
-```
-- `model`(str): model name in the [Model Repository](https://github.com/alibaba-damo-academy/FunASR/tree/main/model_zoo), or a model path on local disk.
-- `device`(str): `cuda:0` (default gpu0) for using GPU for inference, specify `cpu` for using CPU.
-- `ncpu`(int): `4` (default), sets the number of threads for CPU internal operations.
-- `output_dir`(str): `None` (default), set this to specify the output path for the results.
-- `batch_size`(int): `1` (default), the number of samples per batch during decoding.
-- `hub`(str): `ms` (default) to download models from ModelScope; use `hf` to download models from Hugging Face.
-- `**kwargs`(dict): Any parameter found in config.yaml can be specified directly here, for instance, the maximum segmentation length of the VAD model, `max_single_segment_time=6000` (milliseconds).
-
-#### AutoModel Inference
-```python
-res = model.generate(input=[str], output_dir=[str])
-```
-- `input`: The input to be decoded, which could be:
-  - A wav file path, e.g., asr_example.wav
-  - A pcm file path, e.g., asr_example.pcm, in this case, specify the audio sampling rate fs (default is 16000)
-  - An audio byte stream, e.g., byte data from a microphone
-  - A wav.scp, a Kaldi-style wav list (wav_id \t wav_path), for example:
-  ```text
-  asr_example1  ./audios/asr_example1.wav
-  asr_example2  ./audios/asr_example2.wav
-  ```
-  When using wav.scp as input, you must set output_dir to save the output results.
-  - Audio samples, e.g. `audio, rate = soundfile.read("asr_example_zh.wav")`; the data type is numpy.ndarray. Batch input is supported as a list:
-  ```[audio_sample1, audio_sample2, ..., audio_sampleN]```
-  - fbank input, supports batch grouping. Shape is [batch, frames, dim], type is torch.Tensor.
-- `output_dir`: None (default), if set, specifies the output path for the results.
-- `**kwargs`(dict): Model-related inference parameters, for example, `beam_size=10`, `decoding_ctc_weight=0.1`.
-
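-For example, a minimal sketch of decoding a wav.scp list (the list path is illustrative):
-
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="paraformer-zh")
-# with a wav.scp input, output_dir must be set so that results are saved
-res = model.generate(input="./data/wav.scp", output_dir="./results")
-```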
-
-### More Usage Introduction
-
-
-#### Speech Recognition (Non-streaming)
-##### SenseVoice
-```python
-from funasr import AutoModel
-from funasr.utils.postprocess_utils import rich_transcription_postprocess
-
-model_dir = "iic/SenseVoiceSmall"
-
-model = AutoModel(
-    model=model_dir,
-    vad_model="fsmn-vad",
-    vad_kwargs={"max_single_segment_time": 30000},
-    device="cuda:0",
-)
-
-# en
-res = model.generate(
-    input=f"{model.model_path}/example/en.mp3",
-    cache={},
-    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
-    use_itn=True,
-    batch_size_s=60,
-    merge_vad=True,  #
-    merge_length_s=15,
-)
-text = rich_transcription_postprocess(res[0]["text"])
-print(text)
-```
-Notes:
-- `model_dir`: The name of the model, or the path to the model on the local disk.
-- `vad_model`: This indicates the activation of VAD (Voice Activity Detection). The purpose of VAD is to split long audio into shorter clips. In this case, the inference time includes both VAD and SenseVoice total consumption, and represents the end-to-end latency. If you wish to test the SenseVoice model's inference time separately, the VAD model can be disabled.
-- `vad_kwargs`: Specifies the configurations for the VAD model. `max_single_segment_time`: denotes the maximum duration for audio segmentation by the `vad_model`, with the unit being milliseconds (ms).
-- `use_itn`: Whether the output result includes punctuation and inverse text normalization.
-- `batch_size_s`: Indicates the use of dynamic batching, where the total duration of audio in the batch is measured in seconds (s).
-- `merge_vad`: Whether to merge short audio fragments segmented by the VAD model, with the merged length being `merge_length_s`, in seconds (s).
-- `ban_emo_unk`: Whether to ban the output of the `emo_unk` token.
-
-##### Paraformer
-```python
-from funasr import AutoModel
-# paraformer-zh is a multi-functional asr model
-# use vad, punc, spk or not as you need
-model = AutoModel(model="paraformer-zh",  
-                  vad_model="fsmn-vad", 
-                  vad_kwargs={"max_single_segment_time": 60000},
-                  punc_model="ct-punc", 
-                  # spk_model="cam++"
-                  )
-wav_file = f"{model.model_path}/example/asr_example.wav"
-res = model.generate(input=wav_file, batch_size_s=300, batch_size_threshold_s=60, hotword='魔搭')
-print(res)
-```
-Notes:
-- Typically, model input duration is limited to under 30 seconds. Combined with `vad_model`, audio input of any length is supported; this is not limited to the Paraformer model, and any audio-input model can be used.
-- Parameters related to `model` can be specified directly in the AutoModel definition; parameters related to `vad_model` can be set via `vad_kwargs`, which is a dict; `punc_kwargs` and `spk_kwargs` work the same way.
-- `max_single_segment_time`: Denotes the maximum audio segmentation length for `vad_model`, measured in milliseconds (ms).
-- `batch_size_s` represents the use of dynamic batching, where the total audio duration within a batch is measured in seconds (s).
-- `batch_size_threshold_s`: Indicates that when the duration of an audio segment post-VAD segmentation exceeds the batch_size_threshold_s threshold, the batch size is set to 1, measured in seconds (s).
-
-Recommendations: 
-
-When you input long audio and encounter Out Of Memory (OOM) issues, since memory usage tends to increase quadratically with audio length, consider the following three scenarios:
-
-a) At the start of inference, memory usage depends mainly on `batch_size_s`; reducing this value decreases memory usage.
-b) In the middle of inference, if long VAD-cut audio segments with a total token count below `batch_size_s` still cause OOM, reduce `batch_size_threshold_s`; once the threshold is exceeded, the batch size is forced to 1.
-c) Towards the end of inference, if long VAD-cut audio segments with a total token count below `batch_size_s` exceed the threshold `batch_size_threshold_s` (forcing the batch size to 1) and still cause OOM, reduce `max_single_segment_time` to shorten the VAD segment length.
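-
-A hedged sketch combining these three knobs for long-audio decoding (the values are illustrative, not tuned):
-
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="paraformer-zh",
-                  vad_model="fsmn-vad",
-                  vad_kwargs={"max_single_segment_time": 30000})  # c) shorter VAD segments
-res = model.generate(input="long_audio.wav",
-                     batch_size_s=150,             # a) smaller dynamic batch
-                     batch_size_threshold_s=30)    # b) lower threshold that forces batch size 1
-```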
-
-#### Speech Recognition (Streaming)
-```python
-from funasr import AutoModel
-
-chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms
-encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention
-decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention
-
-model = AutoModel(model="paraformer-zh-streaming")
-
-import soundfile
-import os
-
-wav_file = os.path.join(model.model_path, "example/asr_example.wav")
-speech, sample_rate = soundfile.read(wav_file)
-chunk_stride = chunk_size[1] * 960 # 600ms
-
-cache = {}
-total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
-for i in range(total_chunk_num):
-    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
-    is_final = i == total_chunk_num - 1
-    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
-    print(res)
-```
-Note: `chunk_size` is the streaming latency configuration. `[0,10,5]` means the real-time display granularity is `10*60=600ms` and the lookahead information is `5*60=300ms`. Each inference input is `600ms` (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, `is_final=True` must be set to force output of the final word.
-
-#### Voice Activity Detection (Non-Streaming)
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="fsmn-vad")
-wav_file = f"{model.model_path}/example/vad_example.wav"
-res = model.generate(input=wav_file)
-print(res)
-```
-Note: The output format of the VAD model is: `[[beg1, end1], [beg2, end2], ..., [begN, endN]]`, where `begN/endN` indicates the starting/ending point of the `N-th` valid audio segment, measured in milliseconds.
-
-#### Voice Activity Detection (Streaming)
-```python
-from funasr import AutoModel
-
-chunk_size = 200 # ms
-model = AutoModel(model="fsmn-vad")
-
-import soundfile
-
-wav_file = f"{model.model_path}/example/vad_example.wav"
-speech, sample_rate = soundfile.read(wav_file)
-chunk_stride = int(chunk_size * sample_rate / 1000)
-
-cache = {}
-total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
-for i in range(total_chunk_num):
-    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
-    is_final = i == total_chunk_num - 1
-    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)
-    if len(res[0]["value"]):
-        print(res)
-```
-Note: The output format for the streaming VAD model can be one of four scenarios:
-- `[[beg1, end1], [beg2, end2], .., [begN, endN]]`: the same as the offline VAD output described above.
-- `[[beg, -1]]`: only a starting point has been detected.
-- `[[-1, end]]`: only an ending point has been detected.
-- `[]`: neither a starting point nor an ending point has been detected.
-
-The output is measured in milliseconds and represents the absolute time from the starting point.
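-
-A sketch of consuming these four cases inside the streaming loop above (the print statements are illustrative):
-
-```python
-# res[0]["value"] holds the segment list for the current chunk
-for beg, end in res[0]["value"]:
-    if beg != -1 and end != -1:
-        print(f"segment: {beg}ms - {end}ms")
-    elif beg != -1:
-        print(f"speech starts at {beg}ms")
-    else:
-        print(f"speech ends at {end}ms")
-```
-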
-#### Punctuation Restoration
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="ct-punc")
-res = model.generate(input="那今天的会就到这里吧 happy new year 明年见")
-print(res)
-```
-#### Timestamp Prediction
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="fa-zh")
-wav_file = f"{model.model_path}/example/asr_example.wav"
-text_file = f"{model.model_path}/example/text.txt"
-res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
-print(res)
-```
-
-More examples can be found in the [docs](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining).
-
-<a name="Training"></a>
-## Model Training and Testing
-
-### Quick Start
-
-Execute via command line (for quick testing, not recommended):
-```shell
-funasr-train ++model=paraformer-zh ++train_data_set_list=data/list/train.jsonl ++valid_data_set_list=data/list/val.jsonl ++output_dir="./outputs" &> log.txt &
-```
-
-Execute with Python code (supports multi-node and multi-GPU, recommended):
-
-```shell
-cd examples/industrial_data_pretraining/paraformer
-bash finetune.sh
-# "log_file: ./outputs/log.txt"
-```
-The full script is available in [finetune.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/examples/industrial_data_pretraining/paraformer/finetune.sh).
-
-### Detailed Parameter Description
-
-```shell
-funasr/bin/train.py \
-++model="${model_name_or_model_dir}" \
-++train_data_set_list="${train_data}" \
-++valid_data_set_list="${val_data}" \
-++dataset_conf.batch_size=20000 \
-++dataset_conf.batch_type="token" \
-++dataset_conf.num_workers=4 \
-++train_conf.max_epoch=50 \
-++train_conf.log_interval=1 \
-++train_conf.resume=false \
-++train_conf.validate_interval=2000 \
-++train_conf.save_checkpoint_interval=2000 \
-++train_conf.keep_nbest_models=20 \
-++train_conf.avg_nbest_model=10 \
-++optim_conf.lr=0.0002 \
-++output_dir="${output_dir}" &> ${log_file}
-```
-
-- `model` (str): The name of the model (its ID in the model repository), in which case the script downloads it automatically; or the path to a model already downloaded locally.
-- `train_data_set_list` (str): The path to the training data, typically in jsonl format; see the [examples](https://github.com/alibaba-damo-academy/FunASR/blob/main/data/list) for details.
-- `valid_data_set_list` (str): The path to the validation data, also typically in jsonl format; see the [examples](https://github.com/alibaba-damo-academy/FunASR/blob/main/data/list) for details.
-- `dataset_conf.batch_type` (str): `example` (default), the batch type. `example` forms batches with a fixed number of batch_size samples; `length` or `token` enables dynamic batching, where the total length or token count of the batch equals batch_size.
-- `dataset_conf.batch_size` (int): Used together with batch_type. When batch_type=example, it is the number of samples; when batch_type=length, it is the sample length, measured in fbank frames (1 frame = 10 ms) or the number of text tokens.
-- `train_conf.max_epoch` (int): The total number of training epochs.
-- `train_conf.log_interval` (int): The number of steps between log entries.
-- `train_conf.resume` (bool): Whether to enable resuming training from a checkpoint.
-- `train_conf.validate_interval` (int): The interval, in steps, at which validation runs during training.
-- `train_conf.save_checkpoint_interval` (int): The interval, in steps, at which checkpoints are saved during training.
-- `train_conf.keep_nbest_models` (int): The maximum number of checkpoints to retain, ranked by validation accuracy from highest to lowest.
-- `train_conf.avg_nbest_model` (int): The number of top-accuracy models to average.
-- `optim_conf.lr` (float): The learning rate.
-- `output_dir` (str): The path for saving the model.
-- `**kwargs` (dict): Any parameter in config.yaml can be specified directly here; for example, to filter out audio longer than 20 s: `dataset_conf.max_token_length=2000`, measured in fbank frames (1 frame = 10 ms) or the number of text tokens.
-
-#### Multi-GPU Training
-##### Single-Machine Multi-GPU Training
-```shell
-export CUDA_VISIBLE_DEVICES="0,1"
-gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
-
-torchrun --nnodes 1 --nproc_per_node ${gpu_num} --master_port 12345 \
-../../../funasr/bin/train.py ${train_args}
-```
---nnodes is the total number of participating nodes, --nproc_per_node is the number of processes running on each node, and --master_port is the communication port (12345 here).
-
-##### Multi-Machine Multi-GPU Training
-
-On the master node, assuming the IP is 192.168.1.1 and the port is 12345, and you're using 2 GPUs, you would run the following command:
-```shell
-export CUDA_VISIBLE_DEVICES="0,1"
-gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
-
-torchrun --nnodes 2 --node_rank 0 --nproc_per_node ${gpu_num} --master_addr 192.168.1.1 --master_port 12345 \
-../../../funasr/bin/train.py ${train_args}
-```
-On the worker node (assuming the IP is 192.168.1.2), you need to ensure that the MASTER_ADDR and MASTER_PORT environment variables are set to match those of the master node, and then run the same command:
-
-```shell
-export CUDA_VISIBLE_DEVICES="0,1"
-gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
-
-torchrun --nnodes 2 --node_rank 1 --nproc_per_node ${gpu_num} --master_addr 192.168.1.1 --master_port 12345 \
-../../../funasr/bin/train.py ${train_args}
-```
-
---nnodes indicates the total number of nodes participating in the training, --node_rank represents the ID of the current node, and --nproc_per_node specifies the number of processes running on each node (usually corresponds to the number of GPUs).
-
-#### Data Preparation
-
-The `jsonl` format is described in the [demo](https://github.com/alibaba-damo-academy/FunASR/blob/main/data/list).
-The `scp2jsonl` command can be used to generate jsonl files from wav.scp and text.txt, which are prepared as follows:
-
-`train_text.txt`
-
-```bash
-ID0012W0013 当客户风险承受能力评估依据发生变化时
-ID0012W0014 所有只要处理 data 不管你是做 machine learning 做 deep learning
-ID0012W0015 he tried to think how it could be
-```
-
-
-`train_wav.scp`
-
-
-```bash
-BAC009S0764W0121 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav
-BAC009S0916W0489 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav
-ID0012W0015 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_cn_en.wav
-```
-
-`Command`
-
-```shell
-# generate train.jsonl and val.jsonl from wav.scp and text.txt
-scp2jsonl \
-++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt"]' \
-++data_type_list='["source", "target"]' \
-++jsonl_file_out="../../../data/list/train.jsonl"
-```
-
-(Optional, not required) If you need to parse from jsonl back to wav.scp and text.txt, you can use the following command:
-
-```shell
-# generate wav.scp and text.txt from train.jsonl and val.jsonl
-jsonl2scp \
-++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt"]' \
-++data_type_list='["source", "target"]' \
-++jsonl_file_in="../../../data/list/train.jsonl"
-```
-
-#### Training log
-
-##### log.txt
-```shell
-tail log.txt
-[2024-03-21 15:55:52,137][root][INFO] - train, rank: 3, epoch: 0/50, step: 6990/1, total step: 6990, (loss_avg_rank: 0.327), (loss_avg_epoch: 0.409), (ppl_avg_epoch: 1.506), (acc_avg_epoch: 0.795), (lr: 1.165e-04), [('loss_att', 0.259), ('acc', 0.825), ('loss_pre', 0.04), ('loss', 0.299), ('batch_size', 40)], {'data_load': '0.000', 'forward_time': '0.315', 'backward_time': '0.555', 'optim_time': '0.076', 'total_time': '0.947'}, GPU, memory: usage: 3.830 GB, peak: 18.357 GB, cache: 20.910 GB, cache_peak: 20.910 GB
-[2024-03-21 15:55:52,139][root][INFO] - train, rank: 1, epoch: 0/50, step: 6990/1, total step: 6990, (loss_avg_rank: 0.334), (loss_avg_epoch: 0.409), (ppl_avg_epoch: 1.506), (acc_avg_epoch: 0.795), (lr: 1.165e-04), [('loss_att', 0.285), ('acc', 0.823), ('loss_pre', 0.046), ('loss', 0.331), ('batch_size', 36)], {'data_load': '0.000', 'forward_time': '0.334', 'backward_time': '0.536', 'optim_time': '0.077', 'total_time': '0.948'}, GPU, memory: usage: 3.943 GB, peak: 18.291 GB, cache: 19.619 GB, cache_peak: 19.619 GB
-```
-
-
-- `rank`: the GPU id.
-- `epoch`,`step`,`total step`: the current epoch, step, and total steps.
-- `loss_avg_rank`: the average loss across all GPUs for the current step.
-- `loss/ppl/acc_avg_epoch`: the running average loss/perplexity/accuracy over the current epoch, up to the current step. The value at the last step of an epoch is that epoch's overall average; the accuracy metric is recommended.
-- `lr`: the learning rate for the current step.
-- `[('loss_att', 0.259), ('acc', 0.825), ('loss_pre', 0.04), ('loss', 0.299), ('batch_size', 40)]`: the data for the current GPU id.
-- `total_time`: the total time taken for a single step.
-- `GPU, memory`: the model-used/peak memory and the model+cache-used/peak memory.
-
-##### tensorboard
-```bash
-tensorboard --logdir /xxxx/FunASR/examples/industrial_data_pretraining/paraformer/outputs/log/tensorboard
-```
-http://localhost:6006/
-
-### 璁粌鍚庢ā鍨嬫祴璇�
-
-
-#### With `configuration.json` file
-
-Assuming the trained model path is ./model_dir: if a configuration.json file has been generated in that directory, you only need to replace the model name with the model path in the inference methods above.
-
-For example, for shell inference:
-```shell
-python -m funasr.bin.inference ++model="./model_dir" ++input="${input}" ++output_dir="${output_dir}"
-```
-
-Python inference
-
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="./model_dir")
-
-wav_file = f"{model.model_path}/example/asr_example.wav"
-res = model.generate(input=wav_file)
-print(res)
-```
-
-#### Without `configuration.json` file
-
-If there is no configuration.json in the model path, you need to manually specify the exact configuration file path and the model path.
-
-```shell
-python -m funasr.bin.inference \
---config-path "${local_path}" \
---config-name "${config}" \
-++init_param="${init_param}" \
-++tokenizer_conf.token_list="${tokens}" \
-++frontend_conf.cmvn_file="${cmvn_file}" \
-++input="${input}" \
-++output_dir="${output_dir}" \
-++device="${device}"
-```
-
-Parameter Introduction
-- `config-path`: the path to the config.yaml saved during the experiment, found in the experiment's output directory.
-- `config-name`: the name of the configuration file, usually config.yaml. Both YAML and JSON formats are supported, e.g. config.json.
-- `init_param`: the model parameters to test, usually model.pt. Choose a specific checkpoint file as needed.
-- `tokenizer_conf.token_list`: the path to the vocabulary file, normally specified in config.yaml. It only needs to be set manually here when the path in config.yaml is incorrect.
-- `frontend_conf.cmvn_file`: the CMVN (cepstral mean and variance normalization) file used when extracting fbank features from WAV files, normally specified in config.yaml. It only needs to be set manually here when the path in config.yaml is incorrect.
-
-Other parameters are the same as mentioned above. A complete [example](https://github.com/alibaba-damo-academy/FunASR/blob/main/examples/industrial_data_pretraining/paraformer/infer_from_local.sh) can be found here.
-
-<a name="Export"></a>
-## Export ONNX
-
-### Command-line usage
-```shell
-funasr-export ++model=paraformer ++quantize=false ++device=cpu
-```
-
-### Python
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="paraformer", device="cpu")
-
-res = model.export(quantize=False)
-```
-
-### Test ONNX
-```python
-# pip3 install -U funasr-onnx
-from funasr_onnx import Paraformer
-model_dir = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-model = Paraformer(model_dir, batch_size=1, quantize=True)
-
-wav_path = ['~/.cache/modelscope/hub/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav']
-
-result = model(wav_path)
-print(result)
-```
-
-More examples can be found in the [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/onnxruntime)
\ No newline at end of file
diff --git a/examples/README.md b/examples/README.md
new file mode 120000
index 0000000..d4ce990
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1 @@
+../docs/tutorial/README.md
\ No newline at end of file
diff --git a/examples/README_zh.md b/examples/README_zh.md
deleted file mode 100644
index db5e276..0000000
--- a/examples/README_zh.md
+++ /dev/null
@@ -1,473 +0,0 @@
-(简体中文|[English](./README.md))
-
-FunASR开源了大量在工业数据上预训练的模型，您可以在[模型许可协议](https://github.com/alibaba-damo-academy/FunASR/blob/main/MODEL_LICENSE)下自由使用、复制、修改和分享FunASR模型。下面列举代表性的模型，更多模型请参考[模型仓库](https://github.com/alibaba-damo-academy/FunASR/tree/main/model_zoo)。
-
-<div align="center">  
-<h4>
- <a href="#模型推理"> 模型推理 </a>
-｜<a href="#模型训练与测试"> 模型训练与测试 </a>
-｜<a href="#模型导出与测试"> 模型导出与测试 </a>
-</h4>
-</div>
-
-<a name="模型推理"></a>
-## 模型推理
-
-### 快速使用
-
-命令行方式调用：
-```shell
-funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=asr_example_zh.wav
-```
-
-python代码调用（推荐）
-
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="paraformer-zh")
-
-res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav")
-print(res)
-```
-
-### 接口说明
-
-#### AutoModel 定义
-```python
-model = AutoModel(model=[str], device=[str], ncpu=[int], output_dir=[str], batch_size=[int], hub=[str], **kwargs)
-```
-- `model`(str): [模型仓库](https://github.com/alibaba-damo-academy/FunASR/tree/main/model_zoo) 中的模型名称，或本地磁盘中的模型路径
-- `device`(str): `cuda:0`（默认gpu0），使用 GPU 进行推理；如果为`cpu`，则使用 CPU 进行推理
-- `ncpu`(int): `4`（默认），设置用于 CPU 内部操作并行性的线程数
-- `output_dir`(str): `None`（默认），如果设置，为输出结果的输出路径
-- `batch_size`(int): `1`（默认），解码时的批处理样本个数
-- `hub`(str)：`ms`（默认），从 ModelScope 下载模型；如果为`hf`，从 Hugging Face 下载模型
-- `**kwargs`(dict): 所有在`config.yaml`中的参数均可以直接在此处指定，例如 vad 模型中最大切割长度 `max_single_segment_time=6000`（毫秒）
-
-#### AutoModel 推理
-```python
-res = model.generate(input=[str], output_dir=[str])
-```
-- `input`: 要解码的输入，可以是：
-  - wav文件路径，例如：asr_example.wav
-  - pcm文件路径，例如：asr_example.pcm，此时需要指定音频采样率fs（默认为16000）
-  - 音频字节流，例如：麦克风的字节数据
-  - wav.scp，kaldi 风格的 wav 列表（`wav_id \t wav_path`），例如：
-  ```text
-  asr_example1  ./audios/asr_example1.wav
-  asr_example2  ./audios/asr_example2.wav
-  ```
-  在这种输入 `wav.scp` 的情况下，必须设置 `output_dir` 以保存输出结果
-  - 音频采样点，例如：`audio, rate = soundfile.read("asr_example_zh.wav")`，数据类型为 numpy.ndarray。支持 batch 输入，类型为 list：
-  ```[audio_sample1, audio_sample2, ..., audio_sampleN]```
-  - fbank输入，支持组batch。shape为[batch, frames, dim]，类型为torch.Tensor
-- `output_dir`: None（默认），如果设置，为输出结果的输出路径
-- `**kwargs`(dict): 与模型相关的推理参数，例如 `beam_size=10`，`decoding_ctc_weight=0.1`。
-
-
-### 更多用法介绍
-
-
-#### 非实时语音识别
-##### SenseVoice
-```python
-from funasr import AutoModel
-from funasr.utils.postprocess_utils import rich_transcription_postprocess
-
-model_dir = "iic/SenseVoiceSmall"
-
-model = AutoModel(
-    model=model_dir,
-    vad_model="fsmn-vad",
-    vad_kwargs={"max_single_segment_time": 30000},
-    device="cuda:0",
-)
-
-# en
-res = model.generate(
-    input=f"{model.model_path}/example/en.mp3",
-    cache={},
-    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
-    use_itn=True,
-    batch_size_s=60,
-    merge_vad=True,  #
-    merge_length_s=15,
-)
-text = rich_transcription_postprocess(res[0]["text"])
-print(text)
-```
-参数说明：
-- `model_dir`：模型名称，或本地磁盘中的模型路径。
-- `vad_model`：表示开启VAD。VAD的作用是将长音频切割成短音频，此时推理耗时包括了VAD与SenseVoice总耗时，为链路耗时；如果需要单独测试SenseVoice模型耗时，可以关闭VAD模型。
-- `vad_kwargs`：表示VAD模型配置。`max_single_segment_time`：表示`vad_model`最大切割音频时长，单位是毫秒（ms）。
-- `use_itn`：输出结果中是否包含标点与逆文本正则化。
-- `batch_size_s`：表示采用动态batch，batch中总音频时长，单位为秒（s）。
-- `merge_vad`：是否将VAD模型切割的短音频碎片合并，合并后长度为`merge_length_s`，单位为秒（s）。
-- `ban_emo_unk`：是否禁用emo_unk标签；禁用后所有的句子都会被赋予情感标签。
-
-##### Paraformer
-```python
-from funasr import AutoModel
-# paraformer-zh is a multi-functional asr model
-# use vad, punc, spk or not as you need
-model = AutoModel(model="paraformer-zh",  
-                  vad_model="fsmn-vad", 
-                  vad_kwargs={"max_single_segment_time": 60000},
-                  punc_model="ct-punc", 
-                  # spk_model="cam++"
-                  )
-wav_file = f"{model.model_path}/example/asr_example.wav"
-res = model.generate(input=wav_file, batch_size_s=300, batch_size_threshold_s=60, hotword='榄旀惌')
-print(res)
-```
-注意：
-- 通常模型输入限制时长在30秒以下；组合`vad_model`后，支持任意时长的音频输入，且不局限于Paraformer模型，所有音频输入模型均可使用。
-- `model`相关的参数可以直接在`AutoModel`定义中指定；`vad_model`相关参数可以通过`vad_kwargs`（dict类型）指定；类似的还有`punc_kwargs`、`spk_kwargs`。
-- `max_single_segment_time`：表示`vad_model`最大切割音频时长，单位是毫秒（ms）。
-- `batch_size_s`：表示采用动态batch，batch中总音频时长，单位为秒（s）。
-- `batch_size_threshold_s`：表示`vad_model`切割后音频片段时长超过`batch_size_threshold_s`阈值时，将batch_size设置为1，单位为秒（s）。
-
-建议：当您输入为长音频，遇到OOM问题时，因为显存占用与音频时长呈平方关系增加，分为3种情况：
-- a) 推理起始阶段，显存主要取决于`batch_size_s`，适当减小该值可以减少显存占用；
-- b) 推理中间阶段，遇到VAD切割的长音频片段，总token数小于`batch_size_s`仍然出现OOM，可以适当减小`batch_size_threshold_s`，超过阈值则强制batch为1；
-- c) 推理快结束阶段，遇到VAD切割的长音频片段，总token数小于`batch_size_s`且超过阈值`batch_size_threshold_s`（强制batch为1）仍然出现OOM，可以适当减小`max_single_segment_time`，使VAD切割的音频时长变短。
-
-#### 实时语音识别
-
-```python
-from funasr import AutoModel
-
-chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms
-encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention
-decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention
-
-model = AutoModel(model="paraformer-zh-streaming")
-
-import soundfile
-import os
-
-wav_file = os.path.join(model.model_path, "example/asr_example.wav")
-speech, sample_rate = soundfile.read(wav_file)
-chunk_stride = chunk_size[1] * 960 # 600ms
-
-cache = {}
-total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
-for i in range(total_chunk_num):
-    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
-    is_final = i == total_chunk_num - 1
-    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
-    print(res)
-```
-
-娉細`chunk_size`涓烘祦寮忓欢鏃堕厤缃紝`[0,10,5]`琛ㄧず涓婂睆瀹炴椂鍑哄瓧绮掑害涓篳10*60=600ms`锛屾湭鏉ヤ俊鎭负`5*60=300ms`銆傛瘡娆℃帹鐞嗚緭鍏ヤ负`600ms`锛堥噰鏍风偣鏁颁负`16000*0.6=960`锛夛紝杈撳嚭涓哄搴旀枃瀛楋紝鏈�鍚庝竴涓闊崇墖娈佃緭鍏ラ渶瑕佽缃甡is_final=True`鏉ュ己鍒惰緭鍑烘渶鍚庝竴涓瓧銆�
-
-#### 璇煶绔偣妫�娴嬶紙闈炲疄鏃讹級
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="fsmn-vad")
-
-wav_file = f"{model.model_path}/example/vad_example.wav"
-res = model.generate(input=wav_file)
-print(res)
-```
-娉細VAD妯″瀷杈撳嚭鏍煎紡涓猴細`[[beg1, end1], [beg2, end2], .., [begN, endN]]`锛屽叾涓璥begN/endN`琛ㄧず绗琡N`涓湁鏁堥煶棰戠墖娈电殑璧峰鐐�/缁撴潫鐐癸紝
-鍗曚綅涓烘绉掋��
-
-#### 语音端点检测（实时）
-```python
-from funasr import AutoModel
-
-chunk_size = 200 # ms
-model = AutoModel(model="fsmn-vad")
-
-import soundfile
-
-wav_file = f"{model.model_path}/example/vad_example.wav"
-speech, sample_rate = soundfile.read(wav_file)
-chunk_stride = int(chunk_size * sample_rate / 1000)
-
-cache = {}
-total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
-for i in range(total_chunk_num):
-    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
-    is_final = i == total_chunk_num - 1
-    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)
-    if len(res[0]["value"]):
-        print(res)
-```
-娉細娴佸紡VAD妯″瀷杈撳嚭鏍煎紡涓�4绉嶆儏鍐碉細
-- `[[beg1, end1], [beg2, end2], .., [begN, endN]]`锛氬悓涓婄绾縑AD杈撳嚭缁撴灉銆�
-- `[[beg, -1]]`锛氳〃绀哄彧妫�娴嬪埌璧峰鐐广��
-- `[[-1, end]]`锛氳〃绀哄彧妫�娴嬪埌缁撴潫鐐广��
-- `[]`锛氳〃绀烘棦娌℃湁妫�娴嬪埌璧峰鐐癸紝涔熸病鏈夋娴嬪埌缁撴潫鐐�
-杈撳嚭缁撴灉鍗曚綅涓烘绉掞紝浠庤捣濮嬬偣寮�濮嬬殑缁濆鏃堕棿銆�
-
-#### 标点恢复
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="ct-punc")
-
-res = model.generate(input="那今天的会就到这里吧 happy new year 明年见")
-print(res)
-```
-
-#### 时间戳预测
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="fa-zh")
-
-wav_file = f"{model.model_path}/example/asr_example.wav"
-text_file = f"{model.model_path}/example/text.txt"
-res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
-print(res)
-```
-鏇村锛圼绀轰緥](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining)锛�
-
-<a name="Training"></a>
-## Model Training and Testing
-
-### 蹇�熷紑濮�
-
-Command-line execution (for quick testing only; not recommended):
-```shell
-funasr-train ++model=paraformer-zh ++train_data_set_list=data/list/train.jsonl ++valid_data_set_list=data/list/val.jsonl ++output_dir="./outputs" &> log.txt &
-```
-
-Python code execution (supports multi-node multi-GPU training; recommended):
-
-```shell
-cd examples/industrial_data_pretraining/paraformer
-bash finetune.sh
-# "log_file: ./outputs/log.txt"
-```
-For the full script, refer to [finetune.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/examples/industrial_data_pretraining/paraformer/finetune.sh)
-
-### Detailed Parameter Description
-
-```shell
-funasr/bin/train.py \
-++model="${model_name_or_model_dir}" \
-++train_data_set_list="${train_data}" \
-++valid_data_set_list="${val_data}" \
-++dataset_conf.batch_size=20000 \
-++dataset_conf.batch_type="token" \
-++dataset_conf.num_workers=4 \
-++train_conf.max_epoch=50 \
-++train_conf.log_interval=1 \
-++train_conf.resume=false \
-++train_conf.validate_interval=2000 \
-++train_conf.save_checkpoint_interval=2000 \
-++train_conf.keep_nbest_models=20 \
-++train_conf.avg_nbest_model=10 \
-++optim_conf.lr=0.0002 \
-++output_dir="${output_dir}" &> ${log_file}
-```
-
-- `model` (str): model name (the model ID in the model hub, in which case the script downloads the model automatically), or the path of a model already downloaded locally.
-- `train_data_set_list` (str): training data path, jsonl format by default; see ([examples](https://github.com/alibaba-damo-academy/FunASR/blob/main/data/list)).
-- `valid_data_set_list` (str): validation data path, jsonl format by default; see ([examples](https://github.com/alibaba-damo-academy/FunASR/blob/main/data/list)).
-- `dataset_conf.batch_type` (str): `example` (default), the batch type. `example` means each batch contains a fixed number (`batch_size`) of samples; `length` or `token` means dynamic batching, where the total length or token count of a batch equals `batch_size`.
-- `dataset_conf.batch_size` (int): used together with `batch_type`. When `batch_type=example`, it is the number of samples per batch; when `batch_type=length`, it is the total sample length per batch, measured in fbank frames (1 frame = 10 ms) or in text tokens.
-- `train_conf.max_epoch` (int): `100` (default), total number of training epochs.
-- `train_conf.log_interval` (int): `50` (default), number of steps between log prints.
-- `train_conf.resume` (bool): `True` (default), whether to enable resuming from a checkpoint.
-- `train_conf.validate_interval` (int): `5000` (default), interval in steps for running validation during training.
-- `train_conf.save_checkpoint_interval` (int): `5000` (default), interval in steps for saving model checkpoints during training.
-- `train_conf.avg_keep_nbest_models_type` (str): `acc` (default), the criterion for keeping the n-best models, where larger acc is better; `loss` means the criterion is loss, where smaller is better.
-- `train_conf.keep_nbest_models` (int): `500` (default), keep at most this many model checkpoints; together with `avg_keep_nbest_models_type`, the best n models by validation acc/loss are kept and the rest are deleted to save storage.
-- `train_conf.avg_nbest_model` (int): `10` (default), average the best n models ranked by validation acc/loss, according to `avg_keep_nbest_models_type`.
-- `train_conf.accum_grad` (int): `1` (default), gradient accumulation.
-- `train_conf.grad_clip` (float): `10.0` (default), gradient clipping.
-- `train_conf.use_fp16` (bool): `False` (default), enable fp16 training to speed up training.
-- `optim_conf.lr` (float): learning rate.
-- `output_dir` (str): model save path.
-- `**kwargs` (dict): any parameter in `config.yaml` can be specified directly here; for example, to filter out audio longer than 20 s: `dataset_conf.max_token_length=2000`, measured in fbank frames (1 frame = 10 ms) or in text tokens.
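The `batch_type=length`/`token` behavior described above can be pictured as packing samples until their combined length would exceed `batch_size`. A simplified sketch of dynamic batching (an illustration only, not the actual FunASR dataloader):

```python
def dynamic_batches(lengths, batch_size_tokens):
    """Group sample indices into batches whose total length stays <= batch_size_tokens."""
    batches, cur, cur_tokens = [], [], 0
    for i, n in enumerate(lengths):
        if cur and cur_tokens + n > batch_size_tokens:
            batches.append(cur)
            cur, cur_tokens = [], 0
        cur.append(i)
        cur_tokens += n
    if cur:
        batches.append(cur)
    return batches

# Six utterances with these fbank-frame lengths, packed to at most 1000 frames per batch:
print(dynamic_batches([400, 500, 300, 700, 200, 100], 1000))
# [[0, 1], [2, 3], [4, 5]]
```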
-
-#### 澶歡pu璁粌
-##### 鍗曟満澶歡pu璁粌
-```shell
-export CUDA_VISIBLE_DEVICES="0,1"
-gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
-
-torchrun --nnodes 1 --nproc_per_node ${gpu_num} --master_port 12345 \
-../../../funasr/bin/train.py ${train_args}
-```
-`--nnodes` is the total number of participating nodes, `--nproc_per_node` is the number of processes per node, and `--master_port` is the port number.
-
-##### 澶氭満澶歡pu璁粌
-
-On the master node, assuming the IP is 192.168.1.1, the port is 12345, and 2 GPUs are used, run the following command:
-```shell
-export CUDA_VISIBLE_DEVICES="0,1"
-gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
-
-torchrun --nnodes 2 --node_rank 0 --nproc_per_node ${gpu_num} --master_addr 192.168.1.1 --master_port 12345 \
-../../../funasr/bin/train.py ${train_args}
-```
-On the worker node (assuming its IP is 192.168.1.2), make sure the MASTER_ADDR and MASTER_PORT environment variables match the master node's settings, and run the same command:
-```shell
-export CUDA_VISIBLE_DEVICES="0,1"
-gpu_num=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
-
-torchrun --nnodes 2 --node_rank 1 --nproc_per_node ${gpu_num} --master_addr 192.168.1.1 --master_port 12345 \
-../../../funasr/bin/train.py ${train_args}
-```
-
-`--nnodes` is the total number of nodes, `--node_rank` is the rank of the current node, `--nproc_per_node` is the number of processes per node (usually the number of GPUs), and `--master_port` is the port number.
-
-#### 鍑嗗鏁版嵁
-
-For the `jsonl` format, refer to ([examples](https://github.com/alibaba-damo-academy/FunASR/blob/main/data/list)).
-It can be generated from wav.scp and text.txt with the `scp2jsonl` command. Prepare wav.scp and text.txt as follows:
-
-`train_text.txt`
-
-宸﹁竟涓烘暟鎹敮涓�ID锛岄渶涓巂train_wav.scp`涓殑`ID`涓�涓�瀵瑰簲
-鍙宠竟涓洪煶棰戞枃浠舵爣娉ㄦ枃鏈紝鏍煎紡濡備笅锛�
-
-```bash
-ID0012W0013 当客户风险承受能力评估依据发生变化时
-ID0012W0014 所有只要处理 data 不管你是做 machine learning 做 deep learning
-ID0012W0015 he tried to think how it could be
-```
-
-
-`train_wav.scp`
-
-宸﹁竟涓烘暟鎹敮涓�ID锛岄渶涓巂train_text.txt`涓殑`ID`涓�涓�瀵瑰簲
-鍙宠竟涓洪煶棰戞枃浠剁殑璺緞锛屾牸寮忓涓�
-
-```bash
-BAC009S0764W0121 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav
-BAC009S0916W0489 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav
-ID0012W0015 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_cn_en.wav
-```
-
-`Generation command`
-
-```shell
-# generate train.jsonl and val.jsonl from wav.scp and text.txt
-scp2jsonl \
-++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt"]' \
-++data_type_list='["source", "target"]' \
-++jsonl_file_out="../../../data/list/train.jsonl"
-```
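Roughly, `scp2jsonl` pairs the two files by ID into one JSON object per line. A minimal illustration of that pairing (the exact field names in FunASR's jsonl may differ; `key`/`source`/`target` here are assumptions):

```python
import json

def scp_to_jsonl(wav_scp_lines, text_lines):
    """Pair wav.scp and text.txt entries by ID into jsonl-style records."""
    wavs = dict(line.split(maxsplit=1) for line in wav_scp_lines)
    texts = dict(line.split(maxsplit=1) for line in text_lines)
    return [json.dumps({"key": k, "source": wavs[k], "target": texts[k]},
                       ensure_ascii=False)
            for k in wavs if k in texts]

wav_scp = ["ID0012W0015 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_cn_en.wav"]
text = ["ID0012W0015 he tried to think how it could be"]
for line in scp_to_jsonl(wav_scp, text):
    print(line)
```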
-
-(Optional, not required) If you need to parse jsonl back into wav.scp and text.txt, use the command:
-
-```shell
-# generate wav.scp and text.txt from train.jsonl and val.jsonl
-jsonl2scp \
-++scp_file_list='["../../../data/list/train_wav.scp", "../../../data/list/train_text.txt"]' \
-++data_type_list='["source", "target"]' \
-++jsonl_file_in="../../../data/list/train.jsonl"
-```
-
-#### 鏌ョ湅璁粌鏃ュ織
-
-##### 鏌ョ湅瀹為獙log
-```shell
-tail log.txt
-[2024-03-21 15:55:52,137][root][INFO] - train, rank: 3, epoch: 0/50, step: 6990/1, total step: 6990, (loss_avg_rank: 0.327), (loss_avg_epoch: 0.409), (ppl_avg_epoch: 1.506), (acc_avg_epoch: 0.795), (lr: 1.165e-04), [('loss_att', 0.259), ('acc', 0.825), ('loss_pre', 0.04), ('loss', 0.299), ('batch_size', 40)], {'data_load': '0.000', 'forward_time': '0.315', 'backward_time': '0.555', 'optim_time': '0.076', 'total_time': '0.947'}, GPU, memory: usage: 3.830 GB, peak: 18.357 GB, cache: 20.910 GB, cache_peak: 20.910 GB
-[2024-03-21 15:55:52,139][root][INFO] - train, rank: 1, epoch: 0/50, step: 6990/1, total step: 6990, (loss_avg_rank: 0.334), (loss_avg_epoch: 0.409), (ppl_avg_epoch: 1.506), (acc_avg_epoch: 0.795), (lr: 1.165e-04), [('loss_att', 0.285), ('acc', 0.823), ('loss_pre', 0.046), ('loss', 0.331), ('batch_size', 36)], {'data_load': '0.000', 'forward_time': '0.334', 'backward_time': '0.536', 'optim_time': '0.077', 'total_time': '0.948'}, GPU, memory: usage: 3.943 GB, peak: 18.291 GB, cache: 19.619 GB, cache_peak: 19.619 GB
-```
-Metric descriptions:
-- `rank`: GPU id.
-- `epoch`, `step`, `total step`: current epoch, current step, and total step count.
-- `loss_avg_rank`: loss averaged over all GPUs at the current step.
-- `loss/ppl/acc_avg_epoch`: average loss/ppl/acc over the current epoch up to the current step; the value at the last step of an epoch is the epoch average. The acc metric is recommended.
-- `lr`: learning rate at the current step.
-- `[('loss_att', 0.259), ('acc', 0.825), ('loss_pre', 0.04), ('loss', 0.299), ('batch_size', 40)]`: detailed metrics for the current GPU.
-- `total_time`: total time for a single step.
-- `GPU, memory`: used/peak memory of the model, and used/peak memory of the model plus cache.
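To track a metric over training, values such as `acc_avg_epoch` can be pulled out of `log.txt` with a simple regex. A sketch matching the log lines shown above:

```python
import re

def extract_acc(log_line):
    """Pull the acc_avg_epoch value out of a training log line, if present."""
    m = re.search(r"acc_avg_epoch: ([0-9.]+)", log_line)
    return float(m.group(1)) if m else None

line = ("[2024-03-21 15:55:52,137][root][INFO] - train, rank: 3, "
        "epoch: 0/50, (acc_avg_epoch: 0.795), (lr: 1.165e-04)")
print(extract_acc(line))  # 0.795
```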
-
-##### Tensorboard visualization
-```bash
-tensorboard --logdir /xxxx/FunASR/examples/industrial_data_pretraining/paraformer/outputs/log/tensorboard
-```
-娴忚鍣ㄤ腑鎵撳紑锛歨ttp://localhost:6006/
-
-### 璁粌鍚庢ā鍨嬫祴璇�
-
-
-#### With configuration.json
-
-Assuming the trained model path is `./model_dir`, if a configuration.json has been generated in that directory, you only need to replace the model name in the [model inference method above](https://github.com/alibaba-damo-academy/FunASR/blob/main/examples/README_zh.md#%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86) with the model path.
-
-渚嬪锛�
-
-Inference from shell:
-```shell
-python -m funasr.bin.inference ++model="./model_dir" ++input="${input}" ++output_dir="${output_dir}"
-```
-Inference from Python:
-
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="./model_dir")
-
-res = model.generate(input=wav_file)
-print(res)
-```
-
-#### Without configuration.json
-
-If there is no configuration.json in the model path, you need to manually specify the config file path and the model path:
-
-```shell
-python -m funasr.bin.inference \
---config-path "${local_path}" \
---config-name "${config}" \
-++init_param="${init_param}" \
-++tokenizer_conf.token_list="${tokens}" \
-++frontend_conf.cmvn_file="${cmvn_file}" \
-++input="${input}" \
-++output_dir="${output_dir}" \
-++device="${device}"
-```
-
-Parameter description:
-- `config-path`: the `config.yaml` saved during the experiment; it can be found in the experiment output directory.
-- `config-name`: the configuration file name, usually `config.yaml`. Both yaml and json formats are supported, e.g. `config.json`.
-- `init_param`: the model parameters to test, usually `model.pt`; you may choose any specific checkpoint file.
-- `tokenizer_conf.token_list`: path to the vocabulary file. It is usually specified in `config.yaml` and need not be set manually; specify it here only when the path in `config.yaml` is incorrect.
-- `frontend_conf.cmvn_file`: the cmvn file used when extracting fbank features from wav. It is usually specified in `config.yaml` and need not be set manually; specify it here only when the path in `config.yaml` is incorrect.
-
-Other parameters are the same as above. For a complete example, see [infer_from_local.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/examples/industrial_data_pretraining/paraformer/infer_from_local.sh)
-
-
-<a name="Export"></a>
-## Model Export and Testing
-### Export from the command line
-```shell
-funasr-export ++model=paraformer ++quantize=false
-```
-
-### Export from Python
-```python
-from funasr import AutoModel
-
-model = AutoModel(model="paraformer")
-
-res = model.export(quantize=False)
-```
-
-### Test the ONNX model
-```python
-# pip3 install -U funasr-onnx
-from funasr_onnx import Paraformer
-model_dir = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-model = Paraformer(model_dir, batch_size=1, quantize=True)
-
-wav_path = ['~/.cache/modelscope/hub/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav']
-
-result = model(wav_path)
-print(result)
-```
-
-鏇村渚嬪瓙璇峰弬鑰� [鏍蜂緥](https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/onnxruntime)
\ No newline at end of file
diff --git a/examples/README_zh.md b/examples/README_zh.md
new file mode 120000
index 0000000..a2059ae
--- /dev/null
+++ b/examples/README_zh.md
@@ -0,0 +1 @@
+../docs/tutorial/README_zh.md
\ No newline at end of file

--
Gitblit v1.9.1