Han Zhang
2025-03-18 3c349ac0531b07239f37b81254f8568ab80e3f6a
Author Han Zhang <45134013+holazzer@users.noreply.github.com>
Tuesday, March 18, 2025 11:45 +0800
Committer GitHub <noreply@github.com>
Tuesday, March 18, 2025 11:45 +0800
Commit 3c349ac0531b07239f37b81254f8568ab80e3f6a
Tree a8f3cd125790df9b80e6056f13262d77e4c5a90f
Parent 77db489a8f9d1ff0771bfaea55cbeedfc77aac77
fix: use converted token_ids for alignment for sensevoice model with timestamp output (#2429)

* fix: use converted token_ids for alignment

BPE does not guarantee that converted ids (subwords) are reversible, which means that the `tokens` obtained by converting back are not always the same as `token_int`. An easy fix is to use the converted ids directly for alignment. Since both come from the same text, it should not matter.
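The round-trip mismatch described above can be illustrated with a toy example. This is only a sketch under stated assumptions: the three-entry vocabulary and the `encode`/`decode` helpers are hypothetical stand-ins (a greedy longest-match encoder, not the real SenseVoice tokenizer), chosen to show how two different id sequences can decode to the same text.

```python
# Hypothetical toy vocabulary; not the SenseVoice vocabulary.
VOCAB = ["a", "b", "ab"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def decode(ids):
    """Join the subword strings for a list of ids into text."""
    return "".join(VOCAB[i] for i in ids)


def encode(text):
    """Greedy longest-match encoding, standing in for a real BPE encoder."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate piece first, then shrink it.
        for length in range(len(text) - pos, 0, -1):
            piece = text[pos:pos + length]
            if piece in TOKEN_TO_ID:
                ids.append(TOKEN_TO_ID[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return ids


original_ids = [0, 1]          # e.g. what a model might emit: "a", "b"
text = decode(original_ids)    # both pieces merge into one string
round_trip_ids = encode(text)  # re-encoding picks the merged subword

print(original_ids, round_trip_ids)
```

Here the two id sequences have different lengths even though they describe the same text, so any per-token alignment has to be computed against one sequence consistently rather than mixing the two.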

* fix: handle empty string

Indexing an empty string raises an exception, and there was no empty check here.
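The failure mode behind the second fix can be sketched as follows. The helper name `first_char_or_none` is hypothetical and is not taken from `funasr/utils/timestamp_tools.py`; it only shows the general pattern of checking for an empty string before indexing into it.

```python
def first_char_or_none(token):
    """Return the first character of token, or None if token is empty."""
    if not token:
        # Without this guard, token[0] raises IndexError for "".
        return None
    return token[0]


def unguarded_raises(token):
    """Report whether indexing token[0] directly raises IndexError."""
    try:
        token[0]
    except IndexError:
        return True
    return False
```

A caller can then skip or specially handle tokens for which the helper returns `None`, instead of failing partway through timestamp post-processing.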
2 files changed
12 lines changed
funasr/models/sense_voice/model.py 10
funasr/utils/timestamp_tools.py 2