python/FunASR-XL.git - Gitblit

FunASR training

parent: 2139ef69

Fix missing spaces or a missing first word when the recognition result contains English (#2284)

Haitao

2024-12-13 7263fb08e9170e90e67cb9b48884cc6a35cb3b62

* Update ct-transformer-online.cpp

Fix the missing space between the last two words.

* Update ct-transformer-online.cpp

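Fix the case where the audio contains two consecutive English sentences and the offline result drops the first word of the second sentence.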
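
The boundary rule added by this change can be sketched in isolation as follows. This is a minimal illustration, not the FunASR implementation: `IsAsciiByte` and `JoinWithCache` are hypothetical names introduced here, and the sketch assumes UTF-8 input, where every byte of a multi-byte character (such as a Chinese character) has its high bit set.

```cpp
#include <string>

// Hypothetical helper (not part of FunASR): true when the byte is 7-bit ASCII.
// In UTF-8, every byte of a multi-byte sequence has the high bit (0x80) set.
static bool IsAsciiByte(char c) {
    return (static_cast<unsigned char>(c) & 0x80) == 0;
}

// Hypothetical helper: append `input` to `cached`, inserting a single space
// when the cached text ends with an ASCII byte and the input starts with one.
std::string JoinWithCache(const std::string& cached, const std::string& input) {
    std::string full = cached;
    if (!full.empty() && !input.empty() &&
        IsAsciiByte(full.back()) && IsAsciiByte(input.front())) {
        full += ' ';
    }
    full += input;
    return full;
}
```

Note that the high-bit test treats every ASCII byte as "English", including digits, punctuation, and spaces, so a cache that already ends in a space would receive a second one in this sketch.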

1 file changed

runtime/onnxruntime/src/ct-transformer-online.cpp

7 lines changed

@@ -42,6 +42,11 @@
    vector<int> InputData;
    string strText; //full_text
    strText = accumulate(arr_cache.begin(), arr_cache.end(), strText);

+    // If the previous sentence ends with an English letter and this sentence also starts with one, insert a space between them
+    if ((strText.size() > 0 and !(strText[strText.size()-1] & 0x80)) && (strlen(sz_input) > 0 && !(sz_input[0] & 0x80)))
+        strText += " ";

    strText += sz_input;  // full_text = precache + text  
    m_tokenizer.Tokenize(strText.c_str(), strOut, InputData);

@@ -107,7 +112,7 @@
    {
        if (!(sentence_words_list[i][0] & 0x80) && (i + 1) < sentence_words_list.size() && !(sentence_words_list[i + 1][0] & 0x80))
        {
-            sentence_words_list[i] = " " + sentence_words_list[i];
+            sentence_words_list[i] = sentence_words_list[i] + " ";
        }
        if (nSkipNum < arr_cache.size())  //    if skip_num < len(cache):
            nSkipNum++;
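
The word-spacing change in the second hunk can be illustrated with a small standalone sketch. It is not the FunASR code: `StartsWithAscii` and `JoinWords` are hypothetical names, and the sketch only shows the effect of appending a trailing space to a word when both it and the next word begin with an ASCII byte.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of FunASR): true when the first byte is 7-bit ASCII.
static bool StartsWithAscii(const std::string& s) {
    return !s.empty() && (static_cast<unsigned char>(s[0]) & 0x80) == 0;
}

// Hypothetical helper: append a trailing space to each word that is followed
// by another ASCII-initial word, then concatenate all words.
std::string JoinWords(std::vector<std::string> words) {
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        if (StartsWithAscii(words[i]) && StartsWithAscii(words[i + 1])) {
            words[i] += ' ';
        }
    }
    std::string out;
    for (const std::string& w : words) {
        out += w;
    }
    return out;
}
```

With a trailing space, the separator always sits between a word and its ASCII successor, so the final pair of words is separated even though the last word itself is never modified.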