Fixing the PyTorch error "RuntimeError: Expected to have finished reduction in the prior iteration ..."

The code runs fine on a single GPU, but the following error appears when training on multiple GPUs:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
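
As the message itself suggests, the quickest workaround is to enable unused-parameter detection when wrapping the model. A minimal sketch (here `model` and `local_rank` are placeholders for your own model and this process's GPU index):

from torch.nn.parallel import DistributedDataParallel as DDP

# Workaround: let DDP tolerate parameters that receive no gradient.
# This adds an extra traversal of the autograd graph every iteration.
model = DDP(model, device_ids=[local_rank], find_unused_parameters=True)

This silences the error but hides the real cause and costs some speed, so I prefer the approach below.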

In my view, the more appropriate solution is to find which part of the model has parameters that receive no gradient during backpropagation, and fix that directly. You can insert the following snippet after the backward pass and before the optimizer step:

# After loss.backward(), any parameter whose .grad is still None
# did not take part in producing the loss.
for name, param in model.named_parameters():
    if param.grad is None:
        print(name)

This prints every parameter that never received a gradient, so you can trace which part of the model is disconnected from the loss. A sketch of where the check fits in a training step follows below.
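
For context, here is a minimal runnable sketch of where the check sits in a training step, together with one typical cause of the error: a submodule registered in __init__ but never called in forward. All names here (Net, used, unused) are illustrative:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        # Typical culprit: parameters registered here but never used in forward().
        self.unused = nn.Linear(8, 8)

    def forward(self, x):
        return self.used(x)  # self.unused never contributes to the loss

model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 8)
loss = model(x).sum()
loss.backward()

# Diagnostic: run after backward(), before optimizer.step().
for name, param in model.named_parameters():
    if param.grad is None:
        print(name)  # prints unused.weight and unused.bias

optimizer.step()
optimizer.zero_grad()

Once the offending parameters are identified, either use them in forward, remove them, or set requires_grad=False on them before wrapping the model in DistributedDataParallel so DDP does not wait for their gradients.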
