python/FunASR-XL.git


FUNASR training


wechat

游雁

2024-04-23 22d71ff774d409f8b413f9955a7c0efef5b3e288

 funasr/tokenizer/char_tokenizer.py

@@ -93,7 +93,8 @@
    return seg_dict

 def seg_tokenize(txt, seg_dict):
-   pattern = re.compile(r'^[\u4E00-\u9FA50-9]+$')
+   # pattern = re.compile(r'^[\u4E00-\u9FA50-9]+$')
+   pattern = re.compile(r"([\u4E00-\u9FA5A-Za-z0-9])")
    out_txt = ""
    for word in txt:
       word = word.lower()
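
The hunk swaps an anchored whole-string pattern (CJK ideographs and digits only) for a single-character capture group that also accepts ASCII letters. The sketch below only compares the two patterns in isolation. How `seg_tokenize` uses `pattern` further down is not visible in this hunk, so the `match` and `findall` calls and the sample strings are illustrative assumptions, not the function's actual logic.

```python
import re

# Pattern before the commit: the whole string must consist of
# CJK ideographs (U+4E00..U+9FA5) and ASCII digits.
old_pattern = re.compile(r'^[\u4E00-\u9FA50-9]+$')

# Pattern after the commit: captures one character at a time and
# additionally accepts ASCII letters.
new_pattern = re.compile(r"([\u4E00-\u9FA5A-Za-z0-9])")

# Illustrative inputs (assumptions, not taken from the repository).
print(old_pattern.match("你好123") is not None)  # CJK + digits: matches
print(old_pattern.match("hello") is not None)    # Latin letters: no match
print(new_pattern.findall("你好, a1!"))           # punctuation and spaces are skipped
```

Because the new pattern is unanchored and matches a single character, any code that relied on `pattern.match(word)` meaning "the whole word is Chinese or numeric" would now behave differently for mixed or Latin input.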
