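How to resume RT-DETR training after an interruption
RT-DETR training was interrupted; how do I resume it?
In this case all 250 epochs had already finished training; only the final results step was interrupted.
Problem: after the model trained normally for 250 epochs, the following error appeared: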
250 epochs completed in 48.328 hours.
  File "C:\Users\Administrator\.conda\envs\RTDETR-main\Lib\site-packages\torch\serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
    (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
    (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
    WeightsUnpickler error: Unsupported global: GLOBAL ultralytics.nn.tasks.RTDETRDetectionModel was not an allowed global by default. Please use `torch.serialization.add_safe_globals([ultralytics.nn.tasks.RTDETRDetectionModel])` or the `torch.serialization.safe_globals([ultralytics.nn.tasks.RTDETRDetectionModel])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.

Process finished with exit code 1
Solution: I fixed it by following this blog post:
https://blog.csdn.net/qq_52551375/article/details/149266289?spm=1011.2415.3001.5331
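(1) In tasks.py (ultralytics/nn/tasks.py, the module named in the error message), change this line to: return torch.load(file, map_location='cpu', weights_only=False), file  # load
(2) In torch_utils.py, change this line to: x = torch.load(f, map_location=torch.device('cpu'), weights_only=False)
Both changes pass weights_only=False explicitly, which restores the loading behavior from before PyTorch 2.6. As the error message warns, do this only for checkpoints you trust, such as ones you trained yourself.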
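If you would rather not edit files inside the installed package, the same effect can probably be had by wrapping torch.load once at the top of the training script. The following is only a sketch under assumptions: the helper names with_default_kwargs and patch_torch_load are made up for illustration, and it assumes the library code looks up torch.load at call time and that your PyTorch version accepts the weights_only keyword (as 2.6 does).

```python
import functools


def with_default_kwargs(func, **defaults):
    """Return a wrapper that fills in `defaults` only when the caller omits them."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for key, value in defaults.items():
            # setdefault keeps any value the caller passed explicitly
            kwargs.setdefault(key, value)
        return func(*args, **kwargs)
    return wrapper


def patch_torch_load():
    """Make torch.load default to weights_only=False, for this process only.

    Hypothetical helper: call it once before creating the model. The same
    trust warning applies as for editing the source files by hand.
    """
    import torch  # imported lazily so the wrapper above works without torch
    torch.load = with_default_kwargs(torch.load, weights_only=False)
```

Calling patch_torch_load() before RTDETR(...) is constructed would affect only the current Python process, so the package files stay untouched and the change disappears when the script exits.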
Code to resume training:
Just add the resume='last.pt' argument, set to the actual path of your last.pt weights file.
import warnings
warnings.filterwarnings('ignore')
from ultralytics import RTDETR
if __name__ == '__main__':
    model = RTDETR('ultralytics/cfg/addmodels/rtdetr-r50.yaml')
    model.train(data='dataset/Visdrone.yaml',
                cache=False,
                imgsz=640,
                epochs=250,
                batch=4,
                workers=0,
                device='0',
                resume='runs/train/8.31_Visdrone-rtdetr-r50-exp-4bs-250epo/weights/last.pt',  # last.pt path
                # resume=True,
                project='runs/train',
                name='8.31_Visdrone-rtdetr-r50-exp-4bs-250epo',
                # name='7.10_HIT-UAV-RT-DETR-r18test',
                )
Running this produces the final results of the completed training run.
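A mistyped checkpoint path is easy to make with run names this long, so it may help to build the path from project and name and check that the file exists before starting. This is a minimal sketch: the function name last_checkpoint is hypothetical, and the project/name/weights/last.pt layout is assumed from the path used in the training script.

```python
from pathlib import Path


def last_checkpoint(project: str, name: str) -> Path:
    """Build <project>/<name>/weights/last.pt and fail early if it is missing."""
    path = Path(project) / name / "weights" / "last.pt"
    if not path.is_file():
        raise FileNotFoundError(f"No checkpoint to resume from: {path}")
    return path
```

The result could then be passed as resume=str(last_checkpoint('runs/train', '8.31_Visdrone-rtdetr-r50-exp-4bs-250epo')), so that a wrong run name fails immediately with a clear message.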