BienBoy
2025-02-01 c1e365fea09aafda387cac12fdff43d28c598979
fix: resolve unexpected 'out of memory' issue in multi-GPU setup (#2373)

Fixed a bug where calling torch.cuda.empty_cache() without an explicit device context initialized a CUDA context on the default device 'cuda:0', consuming extra memory there and leading to unexpected 'out of memory' errors in multi-GPU environments. Each call is now wrapped in torch.cuda.device(...) so the cache is cleared on the device the tensors actually occupy.

Reference:
- https://github.com/pytorch/pytorch/issues/25752
- https://github.com/pytorch/pytorch/issues/144025
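
The fix follows one pattern in all four files: scope the cache-clearing call to the device that actually holds the tensors. A minimal sketch of the before/after behavior, assuming a two-GPU machine with a hypothetical model placed on cuda:1:

import torch

# Hypothetical setup: the model lives entirely on cuda:1, while the
# current CUDA device is still the default, cuda:0.
model = torch.nn.Linear(1024, 1024).to("cuda:1")

# Before this commit: empty_cache() acts on the current device, so it
# initializes a CUDA context on cuda:0 (a few hundred MB visible in
# nvidia-smi) even though none of our tensors live there -- see
# pytorch/pytorch#25752.
# torch.cuda.empty_cache()

# After this commit: the call is scoped to the device the model
# occupies, so no context is ever created on cuda:0.
with torch.cuda.device(next(model.parameters()).device):
    torch.cuda.empty_cache()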
4 files changed:
- funasr/auto/auto_model.py
- funasr/bin/train.py
- funasr/bin/train_ds.py
- funasr/models/language_model/rnn/decoders.py
funasr/auto/auto_model.py
@@ -366,6 +366,7 @@
        if pbar:
            # pbar.update(1)
            pbar.set_description(f"rtf_avg: {time_escape_total/time_speech_total:0.3f}")
-        torch.cuda.empty_cache()
+        with torch.cuda.device(next(model.parameters()).device):
+            torch.cuda.empty_cache()
        return asr_result_list
funasr/bin/train.py
@@ -221,6 +221,7 @@
            )
            trainer.start_step = 0
-            torch.cuda.empty_cache()
+            with torch.cuda.device(kwargs["device"]):
+                torch.cuda.empty_cache()
            time_escaped = (time.perf_counter() - time_slice_i) / 3600.0
funasr/bin/train_ds.py
@@ -184,6 +184,7 @@
            )
            trainer.start_step = 0
-            torch.cuda.empty_cache()
+            with torch.cuda.device(kwargs["device"]):
+                torch.cuda.empty_cache()
            time_escaped = (time.perf_counter() - time_slice_i) / 3600.0
funasr/models/language_model/rnn/decoders.py
@@ -873,6 +873,7 @@
                        ctc_state[idx], accum_best_ids
                    )
-        torch.cuda.empty_cache()
+        with torch.cuda.device(vscores.device):
+            torch.cuda.empty_cache()
        dummy_hyps = [{"yseq": [self.sos, self.eos], "score": np.array([-float("inf")])}]