python/FunASR-XL.git

FunASR training

parent: c4e70144

fix: resolve unexpected 'out of memory' issue in multi-GPU setup (#2373)

BienBoy

2025-02-01 c1e365fea09aafda387cac12fdff43d28c598979

Fixed a bug where calling torch.cuda.empty_cache() caused extra memory usage on 'cuda:0', leading to unexpected 'out of memory' errors in multi-GPU environments.

Reference:
- https://github.com/pytorch/pytorch/issues/25752
- https://github.com/pytorch/pytorch/issues/144025
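
The pattern applied in each hunk below can be sketched as a small helper. This is an illustrative sketch, not code from the commit: the name `empty_cache_on` is hypothetical, and it assumes the device is given as a string such as "cuda:1" or as a torch.device. The idea, as described in the referenced PyTorch issues, is that torch.cuda.empty_cache() acts on the current device, which defaults to 'cuda:0', so a worker bound to another GPU can end up allocating a CUDA context on 'cuda:0' unless the target device is made current first.

```python
def empty_cache_on(device) -> bool:
    """Release cached CUDA blocks on `device` only (hypothetical helper).

    Returns True if a cache flush was issued, False otherwise.
    """
    # Accept "cuda", "cuda:1", or a torch.device; skip anything non-CUDA.
    if not str(device).startswith("cuda"):
        return False

    # Imported lazily so CPU-only callers do not need torch loaded here.
    import torch

    if not torch.cuda.is_available():
        return False

    # Make `device` the current device for the duration of the call, so the
    # flush does not fall back to the default device ('cuda:0').
    with torch.cuda.device(device):
        torch.cuda.empty_cache()
    return True
```

Under these assumptions, a call such as empty_cache_on(kwargs["device"]) would stand in for the bare torch.cuda.empty_cache() calls that the commit replaces.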

4 files changed

	funasr/auto/auto_model.py	3
	funasr/bin/train.py	3
	funasr/bin/train_ds.py	3
	funasr/models/language_model/rnn/decoders.py	3

 funasr/auto/auto_model.py

@@ -366,7 +366,8 @@
         if pbar:
             # pbar.update(1)
             pbar.set_description(f"rtf_avg: {time_escape_total/time_speech_total:0.3f}")
-        torch.cuda.empty_cache()
+        with torch.cuda.device(next(model.parameters()).device):
+            torch.cuda.empty_cache()
         return asr_result_list

     def inference_with_vad(self, input, input_len=None, **cfg):

 funasr/bin/train.py

@@ -221,7 +221,8 @@
             )
             trainer.start_step = 0

-            torch.cuda.empty_cache()
+            with torch.cuda.device(kwargs["device"]):
+                torch.cuda.empty_cache()

             time_escaped = (time.perf_counter() - time_slice_i) / 3600.0
             logging.info(

 funasr/bin/train_ds.py

@@ -184,7 +184,8 @@
             )
             trainer.start_step = 0

-            torch.cuda.empty_cache()
+            with torch.cuda.device(kwargs["device"]):
+                torch.cuda.empty_cache()

             time_escaped = (time.perf_counter() - time_slice_i) / 3600.0
             logging.info(

 funasr/models/language_model/rnn/decoders.py

@@ -873,7 +873,8 @@
                         ctc_state[idx], accum_best_ids
                     )

-        torch.cuda.empty_cache()
+        with torch.cuda.device(vscores.device):
+            torch.cuda.empty_cache()

         dummy_hyps = [{"yseq": [self.sos, self.eos], "score": np.array([-float("inf")])}]
         ended_hyps = [
