Merge branch 'main' of https://github.com/alibaba-damo-academy/FunASR into main
| | |
| | | |
| | | |DingTalk group | WeChat group | |
| | | |:---:|:-----------------------------------------------------:| |
| | | |<div align="left"><img src="docs/images/dingding.jpg" width="250"/> | <img src="docs/images/wechat.png" width="232"/></div> | |
| | | |<div align="left"><img src="docs/images/dingding.jpg" width="250"/> | <img src="docs/images/wechat.png" width="215"/></div> | |
| | | |
| | | ## Contributors |
| | | |
| | |
| | | ## Contact Us |
| | | |
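| | | If you run into problems while using FunASR, you can open an issue directly on the GitHub page. Speech enthusiasts are welcome to scan the DingTalk or WeChat group QR code below to join the community group for discussion. |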
| | | |
| | | | DingTalk group | WeChat | |
| | | |:---------------------------------------------------------------------:|:-----------------------------------------------------:| |
| | | | <div align="left"><img src="docs/images/dingding.jpg" width="250"/> | <img src="docs/images/wechat.png" width="232"/></div> | |
| | | | <div align="left"><img src="docs/images/dingding.jpg" width="250"/> | <img src="docs/images/wechat.png" width="215"/></div> | |
| | | |
| | | ## Contributors |
| | | |
| | |
| | | The three training datasets mentioned above can be downloaded from [OpenSLR](https://openslr.org/resources.php) via the links below. For the baseline, we also provide convenient data preparation scripts for the AliMeeting corpus. |
| | | - [AliMeeting](https://openslr.org/119/) |
| | | - [AISHELL-4](https://openslr.org/111/) |
| | | - [CN-Celeb](https://openslr.org/82/) |
| | | - [CN-Celeb](https://openslr.org/82/) |
| | | |
| | | Now, the new test set is available [here](https://speech-lab-share-data.oss-cn-shanghai.aliyuncs.com/AliMeeting/openlr/Test_2023_Ali.tar.gz) |
| | |
| | | To advance the current state-of-the-art in multi-talker automatic speech recognition, the M2MeT2.0 challenge proposes a speaker-attributed ASR task, comprising two sub-tracks: fixed and open training conditions. |
| | | To facilitate reproducible research, we provide a comprehensive overview of the dataset, challenge rules, evaluation metrics, and baseline systems. |
| | | |
| | | The new test set, which contains about 10 hours of audio, is now available. You can download it from `here <https://speech-lab-share-data.oss-cn-shanghai.aliyuncs.com/AliMeeting/openlr/Test_2023_Ali.tar.gz>`_ |
| | | |
| | | .. toctree:: |
| | | :maxdepth: 1 |
| | |
| | | inference_pipeline = pipeline( |
| | | task=Tasks.auto_speech_recognition, |
| | | model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online', |
| | | model_revision='v1.0.6', |
| | | model_revision='v1.0.7', |
| | | update_model=False, |
| | | mode='paraformer_streaming' |
| | | ) |
| | | import soundfile |
| | | speech, sample_rate = soundfile.read("example/asr_example.wav") |
| | | |
| | | chunk_size = [5, 10, 5] #[5, 10, 5] 600ms, [8, 8, 4] 480ms |
| | | param_dict = {"cache": dict(), "is_final": False, "chunk_size": chunk_size} |
| | | chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms |
| | | encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention |
| | | decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention |
| | | param_dict = {"cache": dict(), "is_final": False, "chunk_size": chunk_size, |
| | | "encoder_chunk_look_back": encoder_chunk_look_back, "decoder_chunk_look_back": decoder_chunk_look_back} |
| | | chunk_stride = chunk_size[1] * 960 # 600ms or 480ms |
| | | # first chunk, 600ms |
| | | speech_chunk = speech[0:chunk_stride] |
| | |
| | | inference_pipeline = pipeline( |
| | | task=Tasks.auto_speech_recognition, |
| | | model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online', |
| | | model_revision='v1.0.6', |
| | | model_revision='v1.0.7', |
| | | update_model=False, |
| | | mode="paraformer_fake_streaming" |
| | | ) |
| | |
| | | inference_pipeline = pipeline( |
| | | task=Tasks.auto_speech_recognition, |
| | | model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online', |
| | | model_revision='v1.0.6', |
| | | model_revision='v1.0.7', |
| | | update_model=False, |
| | | mode='paraformer_streaming' |
| | | ) |
| | | import soundfile |
| | | speech, sample_rate = soundfile.read("example/asr_example.wav") |
| | | |
| | | chunk_size = [5, 10, 5] #[5, 10, 5] 600ms, [8, 8, 4] 480ms |
| | | param_dict = {"cache": dict(), "is_final": False, "chunk_size": chunk_size} |
| | | chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms |
| | | encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention |
| | | decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention |
| | | param_dict = {"cache": dict(), "is_final": False, "chunk_size": chunk_size, |
| | | "encoder_chunk_look_back": encoder_chunk_look_back, "decoder_chunk_look_back": decoder_chunk_look_back} |
| | | chunk_stride = chunk_size[1] * 960 # 600ms or 480ms |
| | | # first chunk, 600ms |
| | | speech_chunk = speech[0:chunk_stride] |
| | |
| | | inference_pipeline = pipeline( |
| | | task=Tasks.auto_speech_recognition, |
| | | model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online', |
| | | model_revision='v1.0.6', |
| | | model_revision='v1.0.7', |
| | | update_model=False, |
| | | mode="paraformer_fake_streaming" |
| | | ) |
| | |
| | | inference_pipeline = pipeline( |
| | | task=Tasks.auto_speech_recognition, |
| | | model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online', |
| | | model_revision='v1.0.6', |
| | | model_revision='v1.0.7', |
| | | update_model=False, |
| | | mode="paraformer_fake_streaming" |
| | | ) |
| | |
| | | inference_pipeline = pipeline( |
| | | task=Tasks.auto_speech_recognition, |
| | | model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online', |
| | | model_revision='v1.0.6', |
| | | model_revision='v1.0.7', |
| | | update_model=False, |
| | | mode="paraformer_streaming" |
| | | ) |
| New file |
| | |
| | | import os |
| | | import logging |
| | | import torch |
| | | import soundfile |
| | | |
| | | from modelscope.pipelines import pipeline |
| | | from modelscope.utils.constant import Tasks |
| | | from modelscope.utils.logger import get_logger |
| | | |
| | | logger = get_logger(log_level=logging.CRITICAL) |
| | | logger.setLevel(logging.CRITICAL) |
| | | |
| | | os.environ["MODELSCOPE_CACHE"] = "./" |
| | | inference_pipeline = pipeline( |
| | | task=Tasks.auto_speech_recognition, |
| | | model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online', |
| | | model_revision='v1.0.7', |
| | | update_model=False, |
| | | mode="paraformer_streaming" |
| | | ) |
| | | |
| | | model_dir = os.path.join(os.environ["MODELSCOPE_CACHE"], "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online") |
| | | speech, sample_rate = soundfile.read(os.path.join(model_dir, "example/asr_example.wav")) |
| | | speech_length = speech.shape[0] |
| | | |
| | | sample_offset = 0 |
| | | chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms |
| | | encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention |
| | | decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention |
| | | stride_size = chunk_size[1] * 960 |
| | | param_dict = {"cache": dict(), "is_final": False, "chunk_size": chunk_size, |
| | | "encoder_chunk_look_back": encoder_chunk_look_back, "decoder_chunk_look_back": decoder_chunk_look_back} |
| | | final_result = "" |
| | | |
| | | for sample_offset in range(0, speech_length, min(stride_size, speech_length - sample_offset)): |
| | | if sample_offset + stride_size >= speech_length - 1: |
| | | stride_size = speech_length - sample_offset |
| | | param_dict["is_final"] = True |
| | | rec_result = inference_pipeline(audio_in=speech[sample_offset: sample_offset + stride_size], |
| | | param_dict=param_dict) |
| | | if len(rec_result) != 0: |
| | | final_result += rec_result['text'] |
| | | print(rec_result) |
| | | print(final_result) |
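
The chunking arithmetic used by the loop above can be checked in isolation. The following is a minimal sketch, assuming 16 kHz audio, so that 960 samples correspond to 60 ms and `chunk_size[1] = 10` gives the 600 ms stride mentioned in the comments. `chunk_bounds` is a hypothetical helper, not part of FunASR or ModelScope, and unlike the loop above it stops right after the chunk flagged as final.

```python
def chunk_bounds(speech_length, chunk_size, samples_per_frame=960):
    """Yield (start, end, is_final) sample ranges for streaming inference.

    samples_per_frame=960 is an assumption: 60 ms of audio at 16 kHz.
    """
    stride = chunk_size[1] * samples_per_frame
    start = 0
    while start < speech_length:
        end = min(start + stride, speech_length)
        # Same termination test as the demo loop: the chunk that reaches
        # the end of the audio is flagged as final.
        is_final = start + stride >= speech_length - 1
        if is_final:
            end = speech_length
        yield start, end, is_final
        if is_final:
            break
        start = end


if __name__ == "__main__":
    for start, end, is_final in chunk_bounds(25000, [0, 10, 5]):
        print(start, end, is_final)
```

Each yielded range would be passed as `speech[start:end]`, with `param_dict["is_final"]` set from the third element.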
| | |
| | | ds_dict = MsDataset.load(params.data_path) |
| | | kwargs = dict( |
| | | model=params.model, |
| | | model_revision='v1.0.6', |
| | | model_revision='v1.0.7', |
| | | update_model=False, |
| | | data_dir=ds_dict, |
| | | dataset_type=params.dataset_type, |
| | |
| | | model=args.model, |
| | | output_dir=args.output_dir, |
| | | batch_size=args.batch_size, |
| | | model_revision='v1.0.6', |
| | | model_revision='v1.0.7', |
| | | update_model=False, |
| | | mode="paraformer_fake_streaming", |
| | | param_dict={"decoding_model": args.decoding_mode, "hotword": args.hotword_txt} |
| | |
| | | + pynini.union( |
| | | pynutil.add_weight(((DAMO_SIGMA - "one") @ cardinal_graph), -0.7) @ add_leading_zero_to_double_digit |
| | | + delete_space |
| | | + pynutil.delete("cents"), |
| | | + (pynutil.delete("cents") | pynutil.delete("cent")), |
| | | pynini.cross("one", "01") + delete_space + pynutil.delete("cent"), |
| | | ) |
| | | + pynutil.insert("\"") |
| | |
| | | data = yaml.load(f, Loader=yaml.Loader) |
| | | return data |
| | | |
| | | def _prepare_cache(cache: dict = {}, chunk_size=[5, 10, 5], batch_size=1): |
| | | def _prepare_cache(cache: dict = {}, chunk_size=[5, 10, 5], encoder_chunk_look_back=0, |
| | | decoder_chunk_look_back=0, batch_size=1): |
| | | if len(cache) > 0: |
| | | return cache |
| | | config = _read_yaml(asr_train_config) |
| | | enc_output_size = config["encoder_conf"]["output_size"] |
| | | feats_dims = config["frontend_conf"]["n_mels"] * config["frontend_conf"]["lfr_m"] |
| | | cache_en = {"start_idx": 0, "cif_hidden": torch.zeros((batch_size, 1, enc_output_size)), |
| | | "cif_alphas": torch.zeros((batch_size, 1)), "chunk_size": chunk_size, "last_chunk": False, |
| | | "cif_alphas": torch.zeros((batch_size, 1)), "chunk_size": chunk_size, |
| | | "encoder_chunk_look_back": encoder_chunk_look_back, "last_chunk": False, "opt": None, |
| | | "feats": torch.zeros((batch_size, chunk_size[0] + chunk_size[2], feats_dims)), "tail_chunk": False} |
| | | cache["encoder"] = cache_en |
| | | |
| | | cache_de = {"decode_fsmn": None} |
| | | cache_de = {"decode_fsmn": None, "decoder_chunk_look_back": decoder_chunk_look_back, "opt": None, "chunk_size": chunk_size} |
| | | cache["decoder"] = cache_de |
| | | |
| | | return cache |
| | | |
| | | def _cache_reset(cache: dict = {}, chunk_size=[5, 10, 5], batch_size=1): |
| | | def _cache_reset(cache: dict = {}, chunk_size=[5, 10, 5], encoder_chunk_look_back=0, |
| | | decoder_chunk_look_back=0, batch_size=1): |
| | | if len(cache) > 0: |
| | | config = _read_yaml(asr_train_config) |
| | | enc_output_size = config["encoder_conf"]["output_size"] |
| | | feats_dims = config["frontend_conf"]["n_mels"] * config["frontend_conf"]["lfr_m"] |
| | | cache_en = {"start_idx": 0, "cif_hidden": torch.zeros((batch_size, 1, enc_output_size)), |
| | | "cif_alphas": torch.zeros((batch_size, 1)), "chunk_size": chunk_size, "last_chunk": False, |
| | | "feats": torch.zeros((batch_size, chunk_size[0] + chunk_size[2], feats_dims)), |
| | | "tail_chunk": False} |
| | | "cif_alphas": torch.zeros((batch_size, 1)), "chunk_size": chunk_size, |
| | | "encoder_chunk_look_back": encoder_chunk_look_back, "last_chunk": False, "opt": None, |
| | | "feats": torch.zeros((batch_size, chunk_size[0] + chunk_size[2], feats_dims)), "tail_chunk": False} |
| | | cache["encoder"] = cache_en |
| | | |
| | | cache_de = {"decode_fsmn": None} |
| | | cache_de = {"decode_fsmn": None, "decoder_chunk_look_back": decoder_chunk_look_back, "opt": None, "chunk_size": chunk_size} |
| | | cache["decoder"] = cache_de |
| | | |
| | | return cache |
| | | |
| | | #def _prepare_cache(cache: dict = {}, chunk_size=[5, 10, 5], batch_size=1): |
| | | # if len(cache) > 0: |
| | | # return cache |
| | | # config = _read_yaml(asr_train_config) |
| | | # enc_output_size = config["encoder_conf"]["output_size"] |
| | | # feats_dims = config["frontend_conf"]["n_mels"] * config["frontend_conf"]["lfr_m"] |
| | | # cache_en = {"start_idx": 0, "cif_hidden": torch.zeros((batch_size, 1, enc_output_size)), |
| | | # "cif_alphas": torch.zeros((batch_size, 1)), "chunk_size": chunk_size, "last_chunk": False, |
| | | # "feats": torch.zeros((batch_size, chunk_size[0] + chunk_size[2], feats_dims)), "tail_chunk": False} |
| | | # cache["encoder"] = cache_en |
| | | |
| | | # cache_de = {"decode_fsmn": None} |
| | | # cache["decoder"] = cache_de |
| | | |
| | | # return cache |
| | | |
| | | #def _cache_reset(cache: dict = {}, chunk_size=[5, 10, 5], batch_size=1): |
| | | # if len(cache) > 0: |
| | | # config = _read_yaml(asr_train_config) |
| | | # enc_output_size = config["encoder_conf"]["output_size"] |
| | | # feats_dims = config["frontend_conf"]["n_mels"] * config["frontend_conf"]["lfr_m"] |
| | | # cache_en = {"start_idx": 0, "cif_hidden": torch.zeros((batch_size, 1, enc_output_size)), |
| | | # "cif_alphas": torch.zeros((batch_size, 1)), "chunk_size": chunk_size, "last_chunk": False, |
| | | # "feats": torch.zeros((batch_size, chunk_size[0] + chunk_size[2], feats_dims)), |
| | | # "tail_chunk": False} |
| | | # cache["encoder"] = cache_en |
| | | |
| | | # cache_de = {"decode_fsmn": None} |
| | | # cache["decoder"] = cache_de |
| | | |
| | | # return cache |
| | | |
| | | def _forward( |
| | | data_path_and_name_and_type, |
| | |
| | | is_final = False |
| | | cache = {} |
| | | chunk_size = [5, 10, 5] |
| | | encoder_chunk_look_back = 0 |
| | | decoder_chunk_look_back = 0 |
| | | if param_dict is not None and "cache" in param_dict: |
| | | cache = param_dict["cache"] |
| | | if param_dict is not None and "is_final" in param_dict: |
| | | is_final = param_dict["is_final"] |
| | | if param_dict is not None and "chunk_size" in param_dict: |
| | | chunk_size = param_dict["chunk_size"] |
| | | if param_dict is not None and "encoder_chunk_look_back" in param_dict: |
| | | encoder_chunk_look_back = param_dict["encoder_chunk_look_back"] |
| | | if encoder_chunk_look_back > 0: |
| | | chunk_size[0] = 0 |
| | | if param_dict is not None and "decoder_chunk_look_back" in param_dict: |
| | | decoder_chunk_look_back = param_dict["decoder_chunk_look_back"] |
| | | |
| | | # 7. Start for-loop |
| | | # FIXME(kamo): The output format should be discussed about |
| | | raw_inputs = torch.unsqueeze(raw_inputs, axis=0) |
| | | asr_result_list = [] |
| | | cache = _prepare_cache(cache, chunk_size=chunk_size, batch_size=1) |
| | | cache = _prepare_cache(cache, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, |
| | | decoder_chunk_look_back=decoder_chunk_look_back, batch_size=1) |
| | | item = {} |
| | | if data_path_and_name_and_type is not None and data_path_and_name_and_type[2] == "sound": |
| | | sample_offset = 0 |
| | | speech_length = raw_inputs.shape[1] |
| | | stride_size = chunk_size[1] * 960 |
| | | cache = _prepare_cache(cache, chunk_size=chunk_size, batch_size=1) |
| | | cache = _prepare_cache(cache, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, |
| | | decoder_chunk_look_back=decoder_chunk_look_back, batch_size=1) |
| | | final_result = "" |
| | | for sample_offset in range(0, speech_length, min(stride_size, speech_length - sample_offset)): |
| | | if sample_offset + stride_size >= speech_length - 1: |
| | |
| | | |
| | | asr_result_list.append(item) |
| | | if is_final: |
| | | cache = _cache_reset(cache, chunk_size=chunk_size, batch_size=1) |
| | | cache = _cache_reset(cache, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, |
| | | decoder_chunk_look_back=decoder_chunk_look_back, batch_size=1) |
| | | return asr_result_list |
| | | |
| | | return _forward |
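
The option handling in `_forward` above can be summarized as a small function. This is only a sketch under stated assumptions: `parse_streaming_params` is a hypothetical name, and it copies `chunk_size` instead of modifying the caller's list in place as the code above does.

```python
def parse_streaming_params(param_dict=None):
    """Return streaming options, falling back to the defaults used in _forward."""
    params = param_dict or {}
    opts = {
        # The cache object is passed through unchanged so that state
        # survives across successive chunk calls.
        "cache": params.get("cache", {}),
        "is_final": params.get("is_final", False),
        "chunk_size": list(params.get("chunk_size", [5, 10, 5])),
        "encoder_chunk_look_back": params.get("encoder_chunk_look_back", 0),
        "decoder_chunk_look_back": params.get("decoder_chunk_look_back", 0),
    }
    # With encoder look-back enabled, the first chunk_size entry is zeroed,
    # mirroring the `chunk_size[0] = 0` assignment in _forward.
    if opts["encoder_chunk_look_back"] > 0:
        opts["chunk_size"][0] = 0
    return opts


if __name__ == "__main__":
    print(parse_streaming_params({"chunk_size": [5, 10, 5],
                                  "encoder_chunk_look_back": 4}))
```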
| | |
| | | self.batch_mode = batch_mode |
| | | |
| | | def set_epoch(self, epoch): |
| | | self.epoch = epoch |
| | | self.datapipe.set_epoch(epoch) |
| | | |
| | | def __iter__(self): |
| | | buffer = [] |
| | |
| | | self.fn = fn |
| | | |
| | | def set_epoch(self, epoch): |
| | | self.epoch = epoch |
| | | self.datapipe.set_epoch(epoch) |
| | | |
| | | def __iter__(self): |
| | | assert callable(self.fn) |
| | |
| | | if self.fn(data): |
| | | yield data |
| | | else: |
| | | continue |
| | | continue |
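
The `set_epoch` methods added in these hunks forward the epoch to the wrapped datapipe. A minimal sketch of that pattern for a filtering wrapper follows; `ListSource` and `FilterPipe` are hypothetical stand-ins, not the classes touched by this commit.

```python
class ListSource:
    """Innermost datapipe: iterates over an in-memory list."""

    def __init__(self, items):
        self.items = list(items)
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __iter__(self):
        return iter(self.items)


class FilterPipe:
    """Wraps another datapipe and yields only items accepted by fn."""

    def __init__(self, datapipe, fn):
        self.datapipe = datapipe
        self.fn = fn
        self.epoch = 0

    def set_epoch(self, epoch):
        # Record the epoch locally, then pass it down the chain.
        self.epoch = epoch
        self.datapipe.set_epoch(epoch)

    def __iter__(self):
        assert callable(self.fn)
        for data in self.datapipe:
            if self.fn(data):
                yield data


if __name__ == "__main__":
    pipe = FilterPipe(ListSource([1, 2, 3, 4]), lambda x: x % 2 == 0)
    pipe.set_epoch(3)
    print(list(pipe))
```

Because every wrapper forwards the call, setting the epoch on the outermost pipe reaches the innermost source.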
| | |
| | | self.fn = fn |
| | | |
| | | def set_epoch(self, epoch): |
| | | self.epoch = epoch |
| | | self.datapipe.set_epoch(epoch) |
| | | |
| | | def __iter__(self): |
| | | assert callable(self.fn) |
| | |
| | | |
| | | return x, tgt_mask, memory, memory_mask, cache |
| | | |
| | | def forward_chunk(self, tgt, tgt_mask, memory, memory_mask=None, cache=None): |
| | | def forward_one_step(self, tgt, tgt_mask, memory, memory_mask=None, cache=None): |
| | | """Compute decoded features. |
| | | |
| | | Args: |
| | |
| | | |
| | | |
| | | return x, tgt_mask, memory, memory_mask, cache |
| | | |
| | | def forward_chunk(self, tgt, memory, fsmn_cache=None, opt_cache=None, chunk_size=None, look_back=0): |
| | | """Compute decoded features for one streaming chunk. |
| | | |
| | | Args: |
| | | tgt (torch.Tensor): Input tensor (#batch, maxlen_out, size). |
| | | memory (torch.Tensor): Encoded memory, float32 (#batch, maxlen_in, size). |
| | | fsmn_cache: Cached FSMN self-attention state from the previous chunk, or None. |
| | | opt_cache: Cached cross-attention state from previous chunks, or None. |
| | | chunk_size (list): Chunk configuration, e.g. [0, 10, 5]. |
| | | look_back (int): Number of encoder chunks to look back in cross-attention. |
| | | |
| | | Returns: |
| | | torch.Tensor: Output tensor (#batch, maxlen_out, size). |
| | | torch.Tensor: Encoded memory (#batch, maxlen_in, size). |
| | | Updated fsmn_cache. |
| | | Updated opt_cache. |
| | | |
| | | """ |
| | | residual = tgt |
| | | if self.normalize_before: |
| | | tgt = self.norm1(tgt) |
| | | tgt = self.feed_forward(tgt) |
| | | |
| | | x = tgt |
| | | if self.self_attn: |
| | | if self.normalize_before: |
| | | tgt = self.norm2(tgt) |
| | | x, fsmn_cache = self.self_attn(tgt, None, fsmn_cache) |
| | | x = residual + self.dropout(x) |
| | | |
| | | if self.src_attn is not None: |
| | | residual = x |
| | | if self.normalize_before: |
| | | x = self.norm3(x) |
| | | |
| | | x, opt_cache = self.src_attn.forward_chunk(x, memory, opt_cache, chunk_size, look_back) |
| | | x = residual + x |
| | | |
| | | return x, memory, fsmn_cache, opt_cache |
| | | |
| | | |
| | | class FsmnDecoderSCAMAOpt(BaseTransformerDecoder): |
| | | """ |
| | |
| | | for i in range(self.att_layer_num): |
| | | decoder = self.decoders[i] |
| | | c = cache[i] |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_chunk( |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_one_step( |
| | | x, tgt_mask, memory, memory_mask, cache=c |
| | | ) |
| | | new_cache.append(c_ret) |
| | |
| | | j = i + self.att_layer_num |
| | | decoder = self.decoders2[i] |
| | | c = cache[j] |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_chunk( |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_one_step( |
| | | x, tgt_mask, memory, memory_mask, cache=c |
| | | ) |
| | | new_cache.append(c_ret) |
| | | |
| | | for decoder in self.decoders3: |
| | | x, tgt_mask, memory, memory_mask, _ = decoder.forward_chunk( |
| | | x, tgt_mask, memory, memory_mask, _ = decoder.forward_one_step( |
| | | x, tgt_mask, memory, None, cache=None |
| | | ) |
| | | |
| | |
| | | lora_rank: int = 8, |
| | | lora_alpha: int = 16, |
| | | lora_dropout: float = 0.1, |
| | | chunk_multiply_factor: tuple = (1,), |
| | | tf2torch_tensor_name_prefix_torch: str = "decoder", |
| | | tf2torch_tensor_name_prefix_tf: str = "seq2seq/decoder", |
| | | ): |
| | |
| | | ) |
| | | self.tf2torch_tensor_name_prefix_torch = tf2torch_tensor_name_prefix_torch |
| | | self.tf2torch_tensor_name_prefix_tf = tf2torch_tensor_name_prefix_tf |
| | | self.chunk_multiply_factor = chunk_multiply_factor |
| | | |
| | | def forward( |
| | | self, |
| | |
| | | cache_layer_num = len(self.decoders) |
| | | if self.decoders2 is not None: |
| | | cache_layer_num += len(self.decoders2) |
| | | new_cache = [None] * cache_layer_num |
| | | fsmn_cache = [None] * cache_layer_num |
| | | else: |
| | | new_cache = cache["decode_fsmn"] |
| | | fsmn_cache = cache["decode_fsmn"] |
| | | |
| | | if cache["opt"] is None: |
| | | cache_layer_num = len(self.decoders) |
| | | opt_cache = [None] * cache_layer_num |
| | | else: |
| | | opt_cache = cache["opt"] |
| | | |
| | | for i in range(self.att_layer_num): |
| | | decoder = self.decoders[i] |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_chunk( |
| | | x, None, memory, None, cache=new_cache[i] |
| | | x, memory, fsmn_cache[i], opt_cache[i] = decoder.forward_chunk( |
| | | x, memory, fsmn_cache=fsmn_cache[i], opt_cache=opt_cache[i], |
| | | chunk_size=cache["chunk_size"], look_back=cache["decoder_chunk_look_back"] |
| | | ) |
| | | new_cache[i] = c_ret |
| | | |
| | | if self.num_blocks - self.att_layer_num > 1: |
| | | for i in range(self.num_blocks - self.att_layer_num): |
| | | j = i + self.att_layer_num |
| | | decoder = self.decoders2[i] |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_chunk( |
| | | x, None, memory, None, cache=new_cache[j] |
| | | x, memory, fsmn_cache[j], _ = decoder.forward_chunk( |
| | | x, memory, fsmn_cache=fsmn_cache[j] |
| | | ) |
| | | new_cache[j] = c_ret |
| | | |
| | | for decoder in self.decoders3: |
| | | |
| | | x, tgt_mask, memory, memory_mask, _ = decoder.forward_chunk( |
| | | x, None, memory, None, cache=None |
| | | x, memory, _, _ = decoder.forward_chunk( |
| | | x, memory |
| | | ) |
| | | if self.normalize_before: |
| | | x = self.after_norm(x) |
| | | if self.output_layer is not None: |
| | | x = self.output_layer(x) |
| | | cache["decode_fsmn"] = new_cache |
| | | |
| | | cache["decode_fsmn"] = fsmn_cache |
| | | if cache["decoder_chunk_look_back"] > 0 or cache["decoder_chunk_look_back"] == -1: |
| | | cache["opt"] = opt_cache |
| | | return x |
| | | |
| | | def forward_one_step( |
| | |
| | | for i in range(self.att_layer_num): |
| | | decoder = self.decoders[i] |
| | | c = cache[i] |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_chunk( |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_one_step( |
| | | x, tgt_mask, memory, None, cache=c |
| | | ) |
| | | new_cache.append(c_ret) |
| | |
| | | j = i + self.att_layer_num |
| | | decoder = self.decoders2[i] |
| | | c = cache[j] |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_chunk( |
| | | x, tgt_mask, memory, memory_mask, c_ret = decoder.forward_one_step( |
| | | x, tgt_mask, memory, None, cache=c |
| | | ) |
| | | new_cache.append(c_ret) |
| | | |
| | | for decoder in self.decoders3: |
| | | |
| | | x, tgt_mask, memory, memory_mask, _ = decoder.forward_chunk( |
| | | x, tgt_mask, memory, memory_mask, _ = decoder.forward_one_step( |
| | | x, tgt_mask, memory, None, cache=None |
| | | ) |
| | | |
| | |
| | | if not self.normalize_before: |
| | | x = self.norm2(x) |
| | | |
| | | |
| | | return x, mask, cache, mask_shfit_chunk, mask_att_chunk_encoder |
| | | |
| | | def forward_chunk(self, x, cache=None, chunk_size=None, look_back=0): |
| | | """Compute encoded features for one streaming chunk. |
| | | |
| | | Args: |
| | | x (torch.Tensor): Input tensor (#batch, time, size). |
| | | cache: Cached self-attention state from previous chunks, or None. |
| | | chunk_size (list): Chunk configuration, e.g. [0, 10, 5]. |
| | | look_back (int): Number of chunks to look back in self-attention. |
| | | |
| | | Returns: |
| | | torch.Tensor: Output tensor (#batch, time, size). |
| | | Updated cache. |
| | | |
| | | """ |
| | | |
| | | residual = x |
| | | if self.normalize_before: |
| | | x = self.norm1(x) |
| | | |
| | | if self.in_size == self.size: |
| | | attn, cache = self.self_attn.forward_chunk(x, cache, chunk_size, look_back) |
| | | x = residual + attn |
| | | else: |
| | | x, cache = self.self_attn.forward_chunk(x, cache, chunk_size, look_back) |
| | | |
| | | if not self.normalize_before: |
| | | x = self.norm1(x) |
| | | |
| | | residual = x |
| | | if self.normalize_before: |
| | | x = self.norm2(x) |
| | | x = residual + self.feed_forward(x) |
| | | if not self.normalize_before: |
| | | x = self.norm2(x) |
| | | |
| | | return x, cache |
| | | |
| | | |
| | | class SANMEncoder(AbsEncoder): |
| | | """ |
| | |
| | | xs_pad: torch.Tensor, |
| | | ilens: torch.Tensor, |
| | | cache: dict = None, |
| | | ctc: CTC = None, |
| | | ): |
| | | xs_pad *= self.output_size() ** 0.5 |
| | | if self.embed is None: |
| | |
| | | xs_pad = to_device(cache["feats"], device=xs_pad.device) |
| | | else: |
| | | xs_pad = self._add_overlap_chunk(xs_pad, cache) |
| | | encoder_outs = self.encoders0(xs_pad, None, None, None, None) |
| | | xs_pad, masks = encoder_outs[0], encoder_outs[1] |
| | | intermediate_outs = [] |
| | | if len(self.interctc_layer_idx) == 0: |
| | | encoder_outs = self.encoders(xs_pad, None, None, None, None) |
| | | xs_pad, masks = encoder_outs[0], encoder_outs[1] |
| | | if cache["opt"] is None: |
| | | cache_layer_num = len(self.encoders0) + len(self.encoders) |
| | | new_cache = [None] * cache_layer_num |
| | | else: |
| | | for layer_idx, encoder_layer in enumerate(self.encoders): |
| | | encoder_outs = encoder_layer(xs_pad, None, None, None, None) |
| | | xs_pad, masks = encoder_outs[0], encoder_outs[1] |
| | | if layer_idx + 1 in self.interctc_layer_idx: |
| | | encoder_out = xs_pad |
| | | new_cache = cache["opt"] |
| | | |
| | | # intermediate outputs are also normalized |
| | | if self.normalize_before: |
| | | encoder_out = self.after_norm(encoder_out) |
| | | for layer_idx, encoder_layer in enumerate(self.encoders0): |
| | | encoder_outs = encoder_layer.forward_chunk(xs_pad, new_cache[layer_idx], cache["chunk_size"], cache["encoder_chunk_look_back"]) |
| | | xs_pad, new_cache[layer_idx] = encoder_outs[0], encoder_outs[1] |
| | | |
| | | intermediate_outs.append((layer_idx + 1, encoder_out)) |
| | | |
| | | if self.interctc_use_conditioning: |
| | | ctc_out = ctc.softmax(encoder_out) |
| | | xs_pad = xs_pad + self.conditioning_layer(ctc_out) |
| | | for layer_idx, encoder_layer in enumerate(self.encoders): |
| | | encoder_outs = encoder_layer.forward_chunk(xs_pad, new_cache[layer_idx+len(self.encoders0)], cache["chunk_size"], cache["encoder_chunk_look_back"]) |
| | | xs_pad, new_cache[layer_idx+len(self.encoders0)] = encoder_outs[0], encoder_outs[1] |
| | | |
| | | if self.normalize_before: |
| | | xs_pad = self.after_norm(xs_pad) |
| | | if cache["encoder_chunk_look_back"] > 0 or cache["encoder_chunk_look_back"] == -1: |
| | | cache["opt"] = new_cache |
| | | |
| | | if len(intermediate_outs) > 0: |
| | | return (xs_pad, intermediate_outs), None, None |
| | | return xs_pad, ilens, None |
| | | |
| | | def gen_tf2torch_map_dict(self): |
| | |
| | | att_outs = self.forward_attention(v_h, scores, mask, mask_att_chunk_encoder) |
| | | return att_outs + fsmn_memory |
| | | |
| | | def forward_chunk(self, x, cache=None, chunk_size=None, look_back=0): |
| | | """Compute scaled dot product attention for one streaming chunk. |
| | | |
| | | Args: |
| | | x (torch.Tensor): Input tensor (#batch, time, size). |
| | | cache (dict): Cached "k" and "v" tensors from previous chunks, or None. |
| | | chunk_size (list): Chunk configuration, e.g. [0, 10, 5]. |
| | | look_back (int): Number of chunks to keep in the cache; -1 keeps all of them. |
| | | |
| | | Returns: |
| | | torch.Tensor: Output tensor (#batch, time, d_model). |
| | | dict: Updated cache. |
| | | |
| | | """ |
| | | q_h, k_h, v_h, v = self.forward_qkv(x) |
| | | if chunk_size is not None and (look_back > 0 or look_back == -1): |
| | | if cache is not None: |
| | | k_h_stride = k_h[:, :, :-(chunk_size[2]), :] |
| | | v_h_stride = v_h[:, :, :-(chunk_size[2]), :] |
| | | k_h = torch.cat((cache["k"], k_h), dim=2) |
| | | v_h = torch.cat((cache["v"], v_h), dim=2) |
| | | |
| | | cache["k"] = torch.cat((cache["k"], k_h_stride), dim=2) |
| | | cache["v"] = torch.cat((cache["v"], v_h_stride), dim=2) |
| | | if look_back != -1: |
| | | cache["k"] = cache["k"][:, :, -(look_back * chunk_size[1]):, :] |
| | | cache["v"] = cache["v"][:, :, -(look_back * chunk_size[1]):, :] |
| | | else: |
| | | cache_tmp = {"k": k_h[:, :, :-(chunk_size[2]), :], |
| | | "v": v_h[:, :, :-(chunk_size[2]), :]} |
| | | cache = cache_tmp |
| | | fsmn_memory = self.forward_fsmn(v, None) |
| | | q_h = q_h * self.d_k ** (-0.5) |
| | | scores = torch.matmul(q_h, k_h.transpose(-2, -1)) |
| | | att_outs = self.forward_attention(v_h, scores, None) |
| | | return att_outs + fsmn_memory, cache |
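
The look-back cache update in `forward_chunk` above can be illustrated without tensors. This is a simplified sketch: frames are plain list items rather than slices along dimension 2, and `update_lookback_cache` is a hypothetical name. The cache stores each chunk without its trailing look-ahead frames (`chunk_size[2]`), and it is trimmed to `look_back * chunk_size[1]` frames unless `look_back` is -1.

```python
def update_lookback_cache(cache, new_frames, chunk_size, look_back):
    """Return (attended_frames, new_cache) for one streaming chunk."""
    new_frames = list(new_frames)
    lookahead = chunk_size[2]
    # Frames kept for later chunks exclude the trailing look-ahead frames.
    # The guard for lookahead == 0 is an addition of this sketch, since a
    # plain [:-0] slice would be empty.
    stride_frames = new_frames[:-lookahead] if lookahead > 0 else new_frames
    if cache is None:
        # First chunk: nothing to prepend, the cache starts from this chunk.
        return new_frames, stride_frames
    attended = cache + new_frames
    new_cache = cache + stride_frames
    if look_back != -1:
        new_cache = new_cache[-(look_back * chunk_size[1]):]
    return attended, new_cache


if __name__ == "__main__":
    cache = None
    for frames in ([1, 2, 3], [3, 4, 5], [5, 6, 7]):
        attended, cache = update_lookback_cache(cache, frames, [0, 2, 1], 1)
        print(attended, cache)
```

With `chunk_size = [0, 2, 1]` and `look_back = 1`, each chunk attends to at most the two cached frames plus its own three frames.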
| | | |
| | | |
| | | class MultiHeadedAttentionSANMwithMask(MultiHeadedAttentionSANM): |
| | | def __init__(self, *args, **kwargs): |
| | | super().__init__(*args, **kwargs) |
| | |
| | | scores = torch.matmul(q_h, k_h.transpose(-2, -1)) |
| | | return self.forward_attention(v_h, scores, memory_mask) |
| | | |
| | | def forward_chunk(self, x, memory, cache=None, chunk_size=None, look_back=0): |
| | | """Compute scaled dot product cross-attention for one streaming chunk. |
| | | |
| | | Args: |
| | | x (torch.Tensor): Query-side input tensor (#batch, time1, size). |
| | | memory (torch.Tensor): Encoded memory for keys and values (#batch, time2, size). |
| | | cache (dict): Cached "k" and "v" tensors from previous chunks, or None. |
| | | chunk_size (list): Chunk configuration, e.g. [0, 10, 5]. |
| | | look_back (int): Number of encoder chunks to keep in the cache. |
| | | |
| | | Returns: |
| | | torch.Tensor: Output tensor (#batch, time1, d_model). |
| | | dict: Updated cache. |
| | | |
| | | """ |
| | | q_h, k_h, v_h = self.forward_qkv(x, memory) |
| | | if chunk_size is not None and look_back > 0: |
| | | if cache is not None: |
| | | k_h = torch.cat((cache["k"], k_h), dim=2) |
| | | v_h = torch.cat((cache["v"], v_h), dim=2) |
| | | cache["k"] = k_h[:, :, -(look_back * chunk_size[1]):, :] |
| | | cache["v"] = v_h[:, :, -(look_back * chunk_size[1]):, :] |
| | | else: |
| | | cache_tmp = {"k": k_h[:, :, -(look_back * chunk_size[1]):, :], |
| | | "v": v_h[:, :, -(look_back * chunk_size[1]):, :]} |
| | | cache = cache_tmp |
| | | q_h = q_h * self.d_k ** (-0.5) |
| | | scores = torch.matmul(q_h, k_h.transpose(-2, -1)) |
| | | return self.forward_attention(v_h, scores, None), cache |
| | | |
| | | |
| | | class MultiHeadSelfAttention(nn.Module): |
| | | """Multi-Head Attention layer. |
| | |
| | | Use the following command to pull and launch the Docker image for the FunASR runtime-SDK: |
| | | |
| | | ```shell |
| | | sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.2.1 |
| | | sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.2.2 |
| | | |
| | | sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.2.1 |
| | | sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.2.2 |
| | | ``` |
| | | |
| | | Introduction to command parameters: |
| | |
| | | Use the following command to pull and start the FunASR runtime-SDK Docker image: |
| | | |
| | | ```shell |
| | | sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.2.1 |
| | | sudo docker pull \ |
| | | registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.2.2 |
| | | mkdir -p ./funasr-runtime-resources/models |
| | | sudo docker run -p 10095:10095 -it --privileged=true -v ./funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.2.1 |
| | | sudo docker run -p 10095:10095 -it --privileged=true \ |
| | | -v ./funasr-runtime-resources/models:/workspace/models \ |
| | | registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.2.2 |
| | | ``` |
| | | If Docker is not installed, see [Docker Installation](#Docker安装) |
| | | |
| | |
| | | To run the client directly for testing, refer to the brief instructions below, which use the Python version as an example: |
| | | |
| | | ```shell |
| | | python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav" --output_dir "./results" |
| | | python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline \ |
| | | --audio_in "../audio/asr_example.wav" --output_dir "./results" |
| | | ``` |
| | | |
| | | Command parameter description: |
| | | ```text |
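| | | --host IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, set it to the IP of the deployment machine |
| | | --host IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, |
| | | set it to the IP of the deployment machine |
| | | --port 10095, the deployment port |
| | | --mode offline means offline file transcription |
| | | --audio_in the audio file to transcribe; accepts a file path or a wav.scp file list |
| | | --thread_num number of concurrent sending threads, default 1 |
| | | --ssl whether to enable SSL certificate verification; default 1 (enabled), set to 0 to disable |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (could be: 阿里巴巴 达摩院) |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (阿里巴巴 达摩院) |
| | | --use_itn whether to use ITN; default 1 (enabled), set to 0 to disable |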
| | | ``` |
| | | |
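The `--hotword` description above accepts either a `*.txt` file with one hotword per line or a space-separated string. A minimal Python sketch of that distinction follows; `load_hotwords` is a hypothetical helper written for illustration and is not taken from the FunASR client, which may handle the argument differently.

```python
import os


def load_hotwords(hotword_arg):
    """Return a list of hotwords from a *.txt path or a space-separated string.

    Hypothetical helper: it only illustrates the two input forms named in the
    --hotword description, not the actual client implementation.
    """
    # Treat the argument as a file only if it looks like *.txt and exists.
    if hotword_arg.endswith(".txt") and os.path.isfile(hotword_arg):
        with open(hotword_arg, encoding="utf-8") as f:
            # One hotword per line; skip blank lines.
            return [line.strip() for line in f if line.strip()]
    # Otherwise treat it as a whitespace-separated hotword string.
    return hotword_arg.split()


if __name__ == "__main__":
    print(load_hotwords("阿里巴巴 达摩院"))
```

Checking for an existing file before falling back to string splitting keeps a literal hotword that happens to end in `.txt` from raising an error.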
| | |
| | | Command parameter description: |
| | | |
| | | ```text |
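| | | --server-ip IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, set it to the IP of the deployment machine |
| | | --server-ip IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, |
| | | set it to the IP of the deployment machine |
| | | --port 10095, the deployment port |
| | | --wav-path the audio file to transcribe; accepts a file path |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (could be: 阿里巴巴 达摩院) |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (阿里巴巴 达摩院) |
| | | --use-itn whether to use ITN; default 1 (enabled), set to 0 to disable |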
| | | ``` |
| | | |
| | |
| | | Use the following command to pull and start the FunASR software package docker image: |
| | | |
| | | ```shell |
| | | sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.1 |
| | | sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.2 |
| | | mkdir -p ./funasr-runtime-resources/models |
| | | sudo docker run -p 10095:10095 -it --privileged=true -v ./funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.1 |
| | | sudo docker run -p 10095:10095 -it --privileged=true -v ./funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.2 |
| | | ``` |
| | | If you do not have Docker installed, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html) |
| | | |
| | |
| | | Use the following command to pull and start the Docker image of the FunASR software package: |
| | | |
| | | ```shell |
| | | sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.1 |
| | | sudo docker pull \ |
| | | registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.2 |
| | | mkdir -p ./funasr-runtime-resources/models |
| | | sudo docker run -p 10095:10095 -it --privileged=true -v ./funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.1 |
| | | sudo docker run -p 10095:10095 -it --privileged=true \ |
| | | -v ./funasr-runtime-resources/models:/workspace/models \ |
| | | registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.2 |
| | | ``` |
| | | If you do not have Docker installed, see [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker_zh.html) |
| | | |
| | |
| | | |
| | | Command parameter description: |
| | | ```text |
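| | | --host IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, set it to the IP of the deployment machine |
| | | --host IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, |
| | | set it to the IP of the deployment machine |
| | | --port 10095, the deployment port |
| | | --mode: `offline` means single-utterance recognition; `online` means real-time speech recognition; `2pass` means real-time speech recognition, with the offline model correcting the result at the end of each utterance. |
| | | --mode: `offline` means single-utterance recognition; `online` means real-time speech recognition; `2pass` means real-time speech recognition, |
| | | with the offline model correcting the result at the end of each utterance. |
| | | --chunk_size: latency configuration of the streaming model; `[5,10,5]` means the audio chunk currently being decoded is 600ms, with 300ms of look-back and 300ms of look-ahead. |
| | | --audio_in the audio file to transcribe; accepts a file path or a wav.scp file list |
| | | --thread_num number of concurrent sending threads, default 1 |
| | | --ssl whether to enable SSL certificate verification; default 1 (enabled), set to 0 to disable |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (could be: 阿里巴巴 达摩院) |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (阿里巴巴 达摩院) |
| | | --use_itn whether to use ITN; default 1 (enabled), set to 0 to disable |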
| | | ``` |
| | | |
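The `--chunk_size` entry above maps `[5,10,5]` to 300ms of look-back, a 600ms current chunk, and 300ms of look-ahead. The sketch below reproduces that arithmetic in Python. It assumes one `chunk_size` unit equals 60ms, which is inferred from those figures rather than stated in the source; `chunk_latency_ms` and `parse_chunk_size` are illustrative names, not part of the FunASR client.

```python
# Assumption: one chunk_size unit corresponds to 60 ms of audio,
# inferred from [5, 10, 5] -> 300 ms / 600 ms / 300 ms.
FRAME_MS = 60


def parse_chunk_size(text):
    """Parse a comma-separated string such as "5, 10, 5" into a list of ints."""
    return [int(part) for part in text.split(",")]


def chunk_latency_ms(chunk_size):
    """Return (look_back_ms, current_ms, look_ahead_ms) for a chunk_size triple."""
    look_back, current, look_ahead = chunk_size
    return look_back * FRAME_MS, current * FRAME_MS, look_ahead * FRAME_MS


if __name__ == "__main__":
    look_back, current, look_ahead = chunk_latency_ms(parse_chunk_size("5, 10, 5"))
    print(f"look-back {look_back} ms, current {current} ms, look-ahead {look_ahead} ms")
```

Under the same assumption, a smaller middle value shortens the decoded chunk and the per-chunk latency proportionally.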
| | | ### cpp-client |
| | | After entering the samples/cpp directory, you can test with the C++ client using the following command: |
| | | ```shell |
| | | ./funasr-wss-client-2pass --server-ip 127.0.0.1 --port 10095 --mode 2pass --wav-path ../audio/asr_example.wav |
| | | ./funasr-wss-client-2pass --server-ip 127.0.0.1 --port 10095 --mode 2pass \ |
| | | --wav-path ../audio/asr_example.wav |
| | | ``` |
| | | |
| | | Command parameter description: |
| | | |
| | | ```text |
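| | | --server-ip IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, set it to the IP of the deployment machine |
| | | --server-ip IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, |
| | | set it to the IP of the deployment machine |
| | | --port 10095, the deployment port |
| | | --mode: `offline` means single-utterance recognition; `online` means real-time speech recognition; `2pass` means real-time speech recognition, with the offline model correcting the result at the end of each utterance. |
| | | --mode: `offline` means single-utterance recognition; `online` means real-time speech recognition; `2pass` means real-time speech recognition, |
| | | with the offline model correcting the result at the end of each utterance. |
| | | --chunk-size: latency configuration of the streaming model; `[5,10,5]` means the audio chunk currently being decoded is 600ms, with 300ms of look-back and 300ms of look-ahead. |
| | | --wav-path the audio file to transcribe; accepts a file path |
| | | --thread-num number of concurrent sending threads, default 1 |
| | | --is-ssl whether to enable SSL certificate verification; default 1 (enabled), set to 0 to disable |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (could be: 阿里巴巴 达摩院) |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (阿里巴巴 达摩院) |
| | | --use-itn whether to use ITN; default 1 (enabled), set to 0 to disable |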
| | | ``` |
| | | |
| | |
| | | |
| | | Command parameter description: |
| | | ```text |
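| | | --host IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, set it to the IP of the deployment machine |
| | | --host IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, |
| | | set it to the IP of the deployment machine |
| | | --port 10095, the deployment port |
| | | --mode offline means offline file transcription |
| | | --audio_in the audio file to transcribe; accepts a file path or a wav.scp file list |
| | | --thread_num number of concurrent sending threads, default 1 |
| | | --ssl whether to enable SSL certificate verification; default 1 (enabled), set to 0 to disable |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (could be: 阿里巴巴 达摩院) |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (阿里巴巴 达摩院) |
| | | --use_itn whether to use ITN; default 1 (enabled), set to 0 to disable |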
| | | ``` |
| | | |
| | |
| | | Command parameter description: |
| | | |
| | | ```text |
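| | | --server-ip IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, set it to the IP of the deployment machine |
| | | --server-ip IP of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service are on different servers, |
| | | set it to the IP of the deployment machine |
| | | --port 10095, the deployment port |
| | | --wav-path the audio file to transcribe; accepts a file path |
| | | --thread_num number of concurrent sending threads, default 1 |
| | | --ssl whether to enable SSL certificate verification; default 1 (enabled), set to 0 to disable |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (could be: 阿里巴巴 达摩院) |
| | | --hotword if the model is a hotword model, hotwords can be set: a *.txt file (one hotword per line) or a space-separated hotword string (阿里巴巴 达摩院) |
| | | --use-itn whether to use ITN; default 1 (enabled), set to 0 to disable |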
| | | ``` |
| | | |
| | |
| | | 1A7F0DBE2A2F221C00A6EEB7 /* AudioCapture.mm in Sources */ = {isa = PBXBuildFile; fileRef = 1A7F0DBB2A2F221C00A6EEB7 /* AudioCapture.mm */; }; |
| | | 1A7F0DBF2A2F221C00A6EEB7 /* AudioRecorder.m in Sources */ = {isa = PBXBuildFile; fileRef = 1A7F0DBD2A2F221C00A6EEB7 /* AudioRecorder.m */; }; |
| | | 1A7F0DC32A2F312D00A6EEB7 /* model in Resources */ = {isa = PBXBuildFile; fileRef = 1A7F0DC22A2F312D00A6EEB7 /* model */; }; |
| | | 1ACBFB692AB99D55002FC7C7 /* seg_dict.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 1ACBFB672AB99D55002FC7C7 /* seg_dict.cpp */; }; |
| | | 1ACBFB6C2AB9A086002FC7C7 /* encode_converter.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 1ACBFB6B2AB9A086002FC7C7 /* encode_converter.cpp */; }; |
| | | 59C4114F365C8D714BD515FB /* Pods_paraformer_online.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = EA7D0713E60886A787BAA0EA /* Pods_paraformer_online.framework */; }; |
| | | /* End PBXBuildFile section */ |
| | | |
| | |
| | | 1AB8E1EE2AA086F200F4F795 /* model.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = model.h; sourceTree = "<group>"; }; |
| | | 1AB8E1EF2AA086F200F4F795 /* offline-stream.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = "offline-stream.h"; sourceTree = "<group>"; }; |
| | | 1AB8E1F02AA086F200F4F795 /* vad-model.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = "vad-model.h"; sourceTree = "<group>"; }; |
| | | 1ACBFB672AB99D55002FC7C7 /* seg_dict.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = seg_dict.cpp; sourceTree = "<group>"; }; |
| | | 1ACBFB682AB99D55002FC7C7 /* seg_dict.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = seg_dict.h; sourceTree = "<group>"; }; |
| | | 1ACBFB6A2AB9A086002FC7C7 /* encode_converter.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = encode_converter.h; sourceTree = "<group>"; }; |
| | | 1ACBFB6B2AB9A086002FC7C7 /* encode_converter.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = encode_converter.cpp; sourceTree = "<group>"; }; |
| | | B9ED2A36675364C815C03C96 /* Pods-paraformer_online.debug.xcconfig */ = {isa = PBXFileReference; includeInIndex = 1; lastKnownFileType = text.xcconfig; name = "Pods-paraformer_online.debug.xcconfig"; path = "Target Support Files/Pods-paraformer_online/Pods-paraformer_online.debug.xcconfig"; sourceTree = "<group>"; }; |
| | | EA7D0713E60886A787BAA0EA /* Pods_paraformer_online.framework */ = {isa = PBXFileReference; explicitFileType = wrapper.framework; includeInIndex = 0; path = Pods_paraformer_online.framework; sourceTree = BUILT_PRODUCTS_DIR; }; |
| | | /* End PBXFileReference section */ |
| | |
| | | 1A6C92FB2A84D64E007E36DC /* ct-transformer.cpp */, |
| | | 1A6C93032A84D64E007E36DC /* ct-transformer.h */, |
| | | 1A6C92F92A84D64E007E36DC /* e2e-vad.h */, |
| | | 1ACBFB6B2AB9A086002FC7C7 /* encode_converter.cpp */, |
| | | 1ACBFB6A2AB9A086002FC7C7 /* encode_converter.h */, |
| | | 1A6C92F72A84D64E007E36DC /* fsmn-vad-online.cpp */, |
| | | 1A6C92E92A84D64E007E36DC /* fsmn-vad-online.h */, |
| | | 1A6C92E82A84D64E007E36DC /* fsmn-vad.cpp */, |
| | |
| | | 1A6C93022A84D64E007E36DC /* punc-model.cpp */, |
| | | 1A6C92ED2A84D64E007E36DC /* resample.cpp */, |
| | | 1A6C92E32A84D64E007E36DC /* resample.h */, |
| | | 1ACBFB672AB99D55002FC7C7 /* seg_dict.cpp */, |
| | | 1ACBFB682AB99D55002FC7C7 /* seg_dict.h */, |
| | | 1A6C93012A84D64E007E36DC /* tensor.h */, |
| | | 1A6C92F02A84D64E007E36DC /* tokenizer.cpp */, |
| | | 1A6C92EF2A84D64E007E36DC /* tokenizer.h */, |
| | |
| | | 1A6C93F72A84D66E007E36DC /* symbolize.cc in Sources */, |
| | | 1A6C93062A84D64E007E36DC /* util.cpp in Sources */, |
| | | 1A6C94222A84D66E007E36DC /* nodebuilder.cpp in Sources */, |
| | | 1ACBFB692AB99D55002FC7C7 /* seg_dict.cpp in Sources */, |
| | | 1A6C94132A84D66E007E36DC /* exp.cpp in Sources */, |
| | | 1A6C930A2A84D64E007E36DC /* vocab.cpp in Sources */, |
| | | 1A6C94012A84D66E007E36DC /* logging.cc in Sources */, |
| | |
| | | 1A6C940F2A84D66E007E36DC /* emitter.cpp in Sources */, |
| | | 1A6C93DE2A84D66E007E36DC /* fftsg.c in Sources */, |
| | | 1A6C940B2A84D66E007E36DC /* ostream_wrapper.cpp in Sources */, |
| | | 1ACBFB6C2AB9A086002FC7C7 /* encode_converter.cpp in Sources */, |
| | | 1A6C93E12A84D66E007E36DC /* log.cc in Sources */, |
| | | 1A6C94092A84D66E007E36DC /* exceptions.cpp in Sources */, |
| | | 1A6C94152A84D66E007E36DC /* node.cpp in Sources */, |
| | |
| | | "@executable_path/Frameworks", |
| | | ); |
| | | MARKETING_VERSION = 1.0; |
| | | PRODUCT_BUNDLE_IDENTIFIER = "com.qiuwei.paraformer-online"; |
| | | PRODUCT_BUNDLE_IDENTIFIER = "com.qiuwei.paraformer-online1"; |
| | | PRODUCT_NAME = "$(TARGET_NAME)"; |
| | | SWIFT_EMIT_LOC_STRINGS = YES; |
| | | TARGETED_DEVICE_FAMILY = "1,2"; |
| | |
| | | "@executable_path/Frameworks", |
| | | ); |
| | | MARKETING_VERSION = 1.0; |
| | | PRODUCT_BUNDLE_IDENTIFIER = "com.qiuwei.paraformer-online"; |
| | | PRODUCT_BUNDLE_IDENTIFIER = "com.qiuwei.paraformer-online1"; |
| | | PRODUCT_NAME = "$(TARGET_NAME)"; |
| | | SWIFT_EMIT_LOC_STRINGS = YES; |
| | | TARGETED_DEVICE_FAMILY = "1,2"; |
| | |
| | | _FUNASRAPI FUNASR_RESULT FunOfflineInfer(FUNASR_HANDLE handle, const char* sz_filename, FUNASR_MODE mode, |
| | | QM_CALLBACK fn_callback, const std::vector<std::vector<float>> &hw_emb, |
| | | int sampling_rate=16000, bool itn=true); |
| | | #if !defined(__APPLE__) |
| | | _FUNASRAPI const std::vector<std::vector<float>> CompileHotwordEmbedding(FUNASR_HANDLE handle, std::string &hotwords, ASR_TYPE mode=ASR_OFFLINE); |
| | | #endif |
| | | |
| | | _FUNASRAPI void FunOfflineUninit(FUNASR_HANDLE handle); |
| | | |
| | | //2passStream |
| | |
| | | virtual std::string Rescoring() = 0; |
| | | virtual void InitHwCompiler(const std::string &hw_model, int thread_num){}; |
| | | virtual void InitSegDict(const std::string &seg_dict_model){}; |
| | | virtual std::vector<std::vector<float>> CompileHotwordEmbedding(std::string &hotwords){}; |
| | | virtual std::vector<std::vector<float>> CompileHotwordEmbedding(std::string &hotwords){return std::vector<std::vector<float>>();}; |
| | | }; |
| | | |
| | | Model *CreateModel(std::map<std::string, std::string>& model_path, int thread_num=1, ASR_TYPE type=ASR_OFFLINE); |
| | |
| | | #include "model.h" |
| | | #include "punc-model.h" |
| | | #include "vad-model.h" |
| | | #if !defined(__APPLE__) |
| | | #include "itn-model.h" |
| | | #endif |
| | | |
| | | namespace funasr { |
| | | class OfflineStream { |
| | |
| | | std::unique_ptr<VadModel> vad_handle= nullptr; |
| | | std::unique_ptr<Model> asr_handle= nullptr; |
| | | std::unique_ptr<PuncModel> punc_handle= nullptr; |
| | | #if !defined(__APPLE__) |
| | | std::unique_ptr<ITNModel> itn_handle = nullptr; |
| | | #endif |
| | | bool UseVad(){return use_vad;}; |
| | | bool UsePunc(){return use_punc;}; |
| | | bool UseITN(){return use_itn;}; |
| | |
| | | #include "model.h" |
| | | #include "punc-model.h" |
| | | #include "vad-model.h" |
| | | #if !defined(__APPLE__) |
| | | #include "itn-model.h" |
| | | #endif |
| | | |
| | | namespace funasr { |
| | | class TpassStream { |
| | |
| | | std::unique_ptr<VadModel> vad_handle = nullptr; |
| | | std::unique_ptr<Model> asr_handle = nullptr; |
| | | std::unique_ptr<PuncModel> punc_online_handle = nullptr; |
| | | #if !defined(__APPLE__) |
| | | std::unique_ptr<ITNModel> itn_handle = nullptr; |
| | | #endif |
| | | bool UseVad(){return use_vad;}; |
| | | bool UsePunc(){return use_punc;}; |
| | | bool UseITN(){return use_itn;}; |
| | |
| | | string punc_res = (offline_stream->punc_handle)->AddPunc((p_result->msg).c_str()); |
| | | p_result->msg = punc_res; |
| | | } |
| | | #if !defined(__APPLE__) |
| | | if(offline_stream->UseITN() && itn){ |
| | | string msg_itn = offline_stream->itn_handle->Normalize(p_result->msg); |
| | | p_result->msg = msg_itn; |
| | | } |
| | | #endif |
| | | |
| | | return p_result; |
| | | } |
| | |
| | | string punc_res = (offline_stream->punc_handle)->AddPunc((p_result->msg).c_str()); |
| | | p_result->msg = punc_res; |
| | | } |
| | | #if !defined(__APPLE__) |
| | | if(offline_stream->UseITN() && itn){ |
| | | string msg_itn = offline_stream->itn_handle->Normalize(p_result->msg); |
| | | p_result->msg = msg_itn; |
| | | } |
| | | #endif |
| | | return p_result; |
| | | } |
| | | |
| | | #if !defined(__APPLE__) |
| | | _FUNASRAPI const std::vector<std::vector<float>> CompileHotwordEmbedding(FUNASR_HANDLE handle, std::string &hotwords, ASR_TYPE mode) |
| | | { |
| | | if (mode == ASR_OFFLINE){ |
| | |
| | | } |
| | | |
| | | } |
| | | |
| | | #endif |
| | | |
| | | // APIs for 2pass-stream Infer |
| | | _FUNASRAPI FUNASR_RESULT FunTpassInferBuffer(FUNASR_HANDLE handle, FUNASR_HANDLE online_handle, const char* sz_buf, |
| | |
| | | string online_msg = ((funasr::ParaformerOnline*)asr_online_handle)->online_res; |
| | | string msg_punc = punc_online_handle->AddPunc(online_msg.c_str(), punc_cache[0]); |
| | | p_result->tpass_msg = msg_punc; |
| | | |
| | | #if !defined(__APPLE__) |
| | | // ITN |
| | | if(tpass_stream->UseITN() && itn){ |
| | | string msg_itn = tpass_stream->itn_handle->Normalize(msg_punc); |
| | | p_result->tpass_msg = msg_itn; |
| | | } |
| | | |
| | | #endif |
| | | ((funasr::ParaformerOnline*)asr_online_handle)->online_res = ""; |
| | | p_result->msg += msg; |
| | | }else{ |
| | |
| | | msg_punc += "。"; |
| | | } |
| | | p_result->tpass_msg = msg_punc; |
| | | #if !defined(__APPLE__) |
| | | if(tpass_stream->UseITN() && itn){ |
| | | string msg_itn = tpass_stream->itn_handle->Normalize(msg_punc); |
| | | p_result->tpass_msg = msg_itn; |
| | | } |
| | | #endif |
| | | |
| | | if(frame != NULL){ |
| | | delete frame; |
| | |
| | | use_punc = true; |
| | | } |
| | | } |
| | | |
| | | #if !defined(__APPLE__) |
| | | // Optional: ITN, here we just support language_type=MandarinEnglish |
| | | if(model_path.find(ITN_DIR) != model_path.end() && model_path.at(ITN_DIR) != ""){ |
| | | string itn_tagger_path = PathAppend(model_path.at(ITN_DIR), ITN_TAGGER_NAME); |
| | |
| | | use_itn = true; |
| | | } |
| | | } |
| | | #endif |
| | | } |
| | | |
| | | OfflineStream *CreateOfflineStream(std::map<std::string, std::string>& model_path, int thread_num) |
| | |
| | | return ""; |
| | | } |
| | | //PrintMat(hw_emb, "input_clas_emb"); |
| | | const int64_t hotword_shape[3] = {1, hw_emb.size(), hw_emb[0].size()}; |
| | | const int64_t hotword_shape[3] = {1, static_cast<int64_t>(hw_emb.size()), static_cast<int64_t>(hw_emb[0].size())}; |
| | | embedding.reserve(hw_emb.size() * hw_emb[0].size()); |
| | | for (auto item : hw_emb) { |
| | | embedding.insert(embedding.end(), item.begin(), item.end()); |
| | |
| | | #else |
| | | #include "onnxruntime_run_options_config_keys.h" |
| | | #include "onnxruntime_cxx_api.h" |
| | | #include "itn-model.h" |
| | | #include "itn-processor.h" |
| | | #endif |
| | | |
| | | #include "kaldi-native-fbank/csrc/feature-fbank.h" |
| | |
| | | #include "model.h" |
| | | #include "vad-model.h" |
| | | #include "punc-model.h" |
| | | #include "itn-model.h" |
| | | #include "tokenizer.h" |
| | | #include "ct-transformer.h" |
| | | #include "ct-transformer-online.h" |
| | | #include "itn-processor.h" |
| | | #include "e2e-vad.h" |
| | | #include "fsmn-vad.h" |
| | | #include "encode_converter.h" |
| | |
| | | use_punc = true; |
| | | } |
| | | } |
| | | |
| | | #if !defined(__APPLE__) |
| | | // Optional: ITN, here we just support language_type=MandarinEnglish |
| | | if(model_path.find(ITN_DIR) != model_path.end()){ |
| | | string itn_tagger_path = PathAppend(model_path.at(ITN_DIR), ITN_TAGGER_NAME); |
| | |
| | | use_itn = true; |
| | | } |
| | | } |
| | | #endif |
| | | |
| | | } |
| | | |
| | |
| | | mm = new TpassStream(model_path, thread_num); |
| | | return mm; |
| | | } |
| | | } // namespace funasr |
| | | } // namespace funasr |
| | |
| | | stride = int(60 * chunk_size[1]/ chunk_interval / 1000 * 16000 * 2) |
| | | chunk_num = (len(audio_bytes) - 1) // stride + 1 |
| | |
| | | message = json.dumps({"mode": mode, "chunk_size": chunk_size, "chunk_interval": chunk_interval, |
| | | message = json.dumps({"mode": args.mode, "chunk_size": args.chunk_size, "encoder_chunk_look_back": 4, |
| | | "decoder_chunk_look_back": 1, "chunk_interval": args.chunk_interval, |
| | | "wav_name": wav_name, "is_speaking": True}) |
| | |
| | | self.websocket.send(message) |
| | |
| | | print("text",text) |
| | |
| | | |
| | | |
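The client hunk above derives a byte `stride` from `chunk_size` and `chunk_interval`, then a `chunk_num` by ceiling division. The Python sketch below works through that arithmetic. It assumes the constants `16000` and `2` stand for a 16 kHz sample rate and 2 bytes per sample (16-bit PCM), which the hunk does not state; `chunk_stride_bytes` and `split_audio` are illustrative names.

```python
def chunk_stride_bytes(chunk_size, chunk_interval, sample_rate=16000, bytes_per_sample=2):
    """Bytes sent per websocket message, using the same formula as the client hunk.

    Assumption: 16000 is the sample rate in Hz and 2 is the sample width in bytes.
    """
    return int(60 * chunk_size[1] / chunk_interval / 1000 * sample_rate * bytes_per_sample)


def split_audio(audio_bytes, stride):
    """Split raw audio into stride-sized pieces; the last piece may be shorter."""
    chunk_num = (len(audio_bytes) - 1) // stride + 1  # ceiling division
    return [audio_bytes[i * stride:(i + 1) * stride] for i in range(chunk_num)]


if __name__ == "__main__":
    stride = chunk_stride_bytes([5, 10, 5], chunk_interval=10)
    pieces = split_audio(bytes(4000), stride)
    print(stride, [len(p) for p in pieces])
```

With `chunk_size=[5, 10, 5]` and `chunk_interval=10`, each message carries 60ms of audio, so a 600ms chunk is delivered in ten sends.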
| | |
| | | type=str, |
| | | default="5, 10, 5", |
| | | help="chunk") |
| | | parser.add_argument("--encoder_chunk_look_back", |
| | | type=int, |
| | | default=4, |
| | | help="number of chunks to lookback for encoder self-attention") |
| | | parser.add_argument("--decoder_chunk_look_back", |
| | | type=int, |
| | | default=1, |
| | | help="number of encoder chunks to lookback for decoder cross-attention") |
| | | parser.add_argument("--chunk_interval", |
| | | type=int, |
| | | default=10, |
| | |
| | | input=True, |
| | | frames_per_buffer=CHUNK) |
| | | |
| | | message = json.dumps({"mode": args.mode, "chunk_size": args.chunk_size, "chunk_interval": args.chunk_interval, |
| | | message = json.dumps({"mode": args.mode, "chunk_size": args.chunk_size, "encoder_chunk_look_back": args.encoder_chunk_look_back, |
| | | "decoder_chunk_look_back": args.decoder_chunk_look_back, "chunk_interval": args.chunk_interval, |
| | | "wav_name": "microphone", "is_speaking": True}) |
| | | #voices.put(message) |
| | | await websocket.send(message) |
| | |
| | | model=args.asr_model_online, |
| | | ngpu=args.ngpu, |
| | | ncpu=args.ncpu, |
| | | model_revision='v1.0.4', |
| | | update_model='v1.0.4', |
| | | model_revision='v1.0.7', |
| | | update_model='v1.0.7', |
| | | mode='paraformer_streaming') |
| | | |
| | | print("model loaded! only support one client at the same time now!!!!") |
| | |
| | | websocket.wav_name = messagejson.get("wav_name") |
| | | if "chunk_size" in messagejson: |
| | | websocket.param_dict_asr_online["chunk_size"] = messagejson["chunk_size"] |
| | | if "encoder_chunk_look_back" in messagejson: |
| | | websocket.param_dict_asr_online["encoder_chunk_look_back"] = messagejson["encoder_chunk_look_back"] |
| | | if "decoder_chunk_look_back" in messagejson: |
| | | websocket.param_dict_asr_online["decoder_chunk_look_back"] = messagejson["decoder_chunk_look_back"] |
| | | if "mode" in messagejson: |
| | | websocket.mode = messagejson["mode"] |
| | | if len(frames_asr_online) > 0 or len(frames_asr) > 0 or not isinstance(message, str): |
| | |
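The hunks above show the client sending a JSON configuration message and the server copying the optional streaming keys into its parameter dictionary. The Python sketch below mirrors that exchange without a websocket. `build_config_message` and `apply_config` are hypothetical helpers; the field names and the defaults of 4 and 1 are taken from the diff, and everything else is an assumption.

```python
import json

# Optional keys the server hunk copies when they are present in the message.
STREAMING_KEYS = ("chunk_size", "encoder_chunk_look_back", "decoder_chunk_look_back")


def build_config_message(mode, chunk_size, chunk_interval, wav_name,
                         encoder_chunk_look_back=4, decoder_chunk_look_back=1):
    """Serialize the first message a client sends before streaming audio."""
    return json.dumps({
        "mode": mode,
        "chunk_size": chunk_size,
        "encoder_chunk_look_back": encoder_chunk_look_back,
        "decoder_chunk_look_back": decoder_chunk_look_back,
        "chunk_interval": chunk_interval,
        "wav_name": wav_name,
        "is_speaking": True,
    })


def apply_config(message, param_dict):
    """Copy recognised streaming keys into param_dict and return the mode, if any."""
    messagejson = json.loads(message)
    for key in STREAMING_KEYS:
        if key in messagejson:
            param_dict[key] = messagejson[key]
    return messagejson.get("mode")


if __name__ == "__main__":
    params = {}
    mode = apply_config(build_config_message("2pass", [5, 10, 5], 10, "microphone"), params)
    print(mode, params)
```

Guarding each key with `in` keeps older clients working, since a message without the look-back fields leaves the server's existing values untouched.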
| | | |
| | | ### latest version & image ID |
| | | |
| | | | image version | image ID | INFO | |
| | | |------------------------------|-----|------| |
| | | | funasr-runtime-sdk-cpu-0.2.1 | 1ad3d19e0707 | | |
| | | | image version | image ID | INFO | |
| | | |-------------------------------------|-----|------| |
| | | | funasr-runtime-sdk-online-cpu-0.1.2 | 7222c5319bcf | | |
| | | |
| | | ## File Transcription Service, Mandarin (CPU) |
| | | |
| | |
| | | This documentation mainly targets advanced developers who need to modify and customize the service. It supports downloading models from ModelScope for deployment, as well as deploying models that users have fine-tuned. For details, see the [docs](./docs/SDK_advanced_guide_offline.md) |
| | | |
| | | ### latest version & image ID |
| | | | image version | image ID | INFO | |
| | | |-----|-----|------| |
| | | | funasr-runtime-sdk-online-cpu-0.1.1 | bdbdd0b27dee | | |
| | | | image version | image ID | INFO | |
| | | |------------------------------|-----|------| |
| | | | funasr-runtime-sdk-cpu-0.2.2 | 2c5286be13e9 | | |
| | |
| | | |
| | | ### Latest version and image ID |
| | | |
| | | | image version | image ID | INFO | |
| | | |------------------------------|-----|------| |
| | | | funasr-runtime-sdk-cpu-0.2.1 | 1ad3d19e0707 | | |
| | | | image version | image ID | INFO | |
| | | |-------------------------------------|-----|------| |
| | | | funasr-runtime-sdk-online-cpu-0.1.2 | 7222c5319bcf | | |
| | | |
| | | |
| | | ## Mandarin Offline File Transcription Service (CPU version) |
| | |
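| | | The documentation covers the underlying technical principles, recognition accuracy, and computational efficiency, and introduces the core advantages: convenience, high accuracy, high efficiency, and a long-audio pipeline. For details, see ([click here](https://mp.weixin.qq.com/s/DHQwbgdBWcda0w_L60iUww)) |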
| | | |
| | | ### Latest version and image ID |
| | | | image version | image ID | INFO | |
| | | |-----|-----|------| |
| | | | funasr-runtime-sdk-online-cpu-0.1.1 | bdbdd0b27dee | | |
| | | | image version | image ID | INFO | |
| | | |------------------------------|-----|------| |
| | | | funasr-runtime-sdk-cpu-0.2.2 | 2c5286be13e9 | | |