Fixing RuntimeError("Distributed package doesn't have NCCL built in") on Windows

The following error appears when training starts:

File "C:\Users\urser\anaconda3\lib\site-packages\torch\distributed\distributed_c10d.py", line 597, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in

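The message says exactly what is wrong: this PyTorch build has no NCCL support.

That is expected on Windows, because the NCCL backend is not supported there.

The official documentation says: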

As of PyTorch v1.8, Windows supports all collective communications backend but NCCL, If the init_method argument of init_process_group() points to a file it must adhere to the following schema:

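The fix is simple: do not use the NCCL backend. Choose a backend that Windows supports, such as gloo.

Changing a single line of code, the one that selects the backend, is enough.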
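As an illustration, here is a minimal sketch of one common way to avoid NCCL: pass `backend="gloo"` to `torch.distributed.init_process_group()`. This is not necessarily the exact line the author had in mind. The helper names `choose_backend` and `init_distributed` are hypothetical, and the `env://` init method assumes that `MASTER_ADDR` and `MASTER_PORT` are already set in the environment.

```python
import sys


def choose_backend(platform: str = sys.platform, nccl_available: bool = False) -> str:
    """Hypothetical helper: use "nccl" only off Windows and only if it is built in."""
    if platform.startswith("win") or not nccl_available:
        return "gloo"
    return "nccl"


def init_distributed(rank: int, world_size: int) -> None:
    """Hypothetical wrapper around init_process_group with a safe backend choice."""
    # Imported lazily so choose_backend() can be used without PyTorch installed.
    import torch.distributed as dist

    has_nccl = dist.is_available() and dist.is_nccl_available()
    dist.init_process_group(
        backend=choose_backend(nccl_available=has_nccl),  # "gloo" on Windows
        init_method="env://",  # assumption: MASTER_ADDR / MASTER_PORT are set
        rank=rank,
        world_size=world_size,
    )
```

If the training script hard-codes `backend="nccl"`, replacing that string with `"gloo"` is the one-line version of the same change.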



Article URL: https://ai.52learn.online/11955