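- Ubuntu 20.04: switching between multiple CUDA versions, (PyTorch 1.6 + CUDA 10.2) and (PyTorch 1.2 + CUDA 9.2)
- neural_renderer_pytorch runs under torch 1.2; it fails to install under 1.6
- Commands to switch the CUDA version (the first pair selects 9.2, the second pair 10.2):
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-9.2 /usr/local/cuda
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda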
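I recently wanted to study the code for unsup3d (CVPR 2020), an unsupervised 3D reconstruction method that needs no prior 3D model and exploits symmetry: https://github.com/elliottwu/unsup3d
It uses a differentiable renderer. My current environment is PyTorch 1.6 with CUDA 10.2. I tried to install the renderer with the commands below; its source is at https://github.com/daniilidis-group/neural_renderer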
pip install neural_renderer_pytorch
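or pip install https://github.com/daniilidis-group/neural_renderer/zipball/master
The two commands are equivalent.
The installation failed with many errors.
Searching for the error showed that PyTorch has to be downgraded to 1.2.0 (see https://zhuanlan.zhihu.com/p/195068015). The unsup3d author also lists the environment as PyTorch 1.2.0 and CUDA 9.2, whereas mine is PyTorch 1.6 with CUDA 10.2.
So I created a new virtual environment for torch 1.2.0 (https://pytorch.org/get-started/previous-versions/). The CUDA versions officially listed for it are 9.2 and 10.0, but mine is 10.2, so I tried 10.2 first:
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.2 -c pytorch
As expected, this did not work: there were many conflicts, so CUDA 9.2 has to be installed after all.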
For how to install several CUDA versions side by side and switch between them at will, see:
https://blog.csdn.net/yinxingtianxia/article/details/80462892
https://www.cnblogs.com/yhjoker/p/10972795.html
https://zhuanlan.zhihu.com/p/127901837
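/usr/local/cuda is actually a symbolic link. When it exists, it normally points to one particular versioned CUDA directory on the system. To switch versions, only the directory this link points to has to change; no other path needs to be modified.
sudo rm -rf /usr/local/cuda    # remove the symlink. Note: /usr/local/cuda, not /usr/local/cuda/. The former removes only the link; the latter can delete everything in the directory the link points to, so be careful
sudo ln -s cuda_path /usr/local/cuda    # create a symlink named /usr/local/cuda that points to the CUDA install directory given by cuda_path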
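The two commands above can be wrapped in a small helper. This is only a sketch: `switch_cuda` is a hypothetical function name, not part of the CUDA toolkit, and it assumes every toolkit lives in a directory named `cuda-<version>` under one prefix (`/usr/local` by default). It uses GNU `ln -sfn` to replace the link in one step, so no `rm -rf` is needed.

```shell
# switch_cuda VERSION [PREFIX]
# Re-point PREFIX/cuda at PREFIX/cuda-VERSION (PREFIX defaults to /usr/local).
# Hypothetical helper for illustration only.
switch_cuda() {
    local version="$1"
    local prefix="${2:-/usr/local}"
    local target="$prefix/cuda-$version"
    local link="$prefix/cuda"

    if [ ! -d "$target" ]; then
        echo "no such CUDA install: $target" >&2
        return 1
    fi
    # Only replace a symlink (or create a new one); never touch a real directory.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link exists and is not a symlink; leaving it alone" >&2
        return 1
    fi
    # -s: symbolic, -f: replace an existing link,
    # -n: treat the existing link itself as the destination instead of following it
    ln -sfn "$target" "$link"
    echo "$link -> $(readlink "$link")"
}
```

Writing under /usr/local needs root, so in practice the function would be run from a root shell or saved as a script and invoked with sudo, for example with the argument 9.2 or 10.2.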
1. Install CUDA 9.2 and cuDNN (10.2 is already installed)
https://developer.nvidia.com/cuda-toolkit-archive (download the runfile; my system is Ubuntu 20.04, but I installed the 16.04 build)
https://developer.nvidia.com/rdp/cudnn-archive
sudo sh cuda_9.2.148_396.37_linux.run
You are attempting to install on an unsupported configuration. Do you wish to continue?
(y)es/(n)o [ default is no ]: y
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37?
(y)es/(n)o/(q)uit: n
Install the CUDA 9.2 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-9.2 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: n
(Note: answering yes here points the cuda symlink at 9.2; answering no leaves it on the existing 10.2.)
Install the CUDA 9.2 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/meiga ]:
Installing the CUDA Toolkit in /usr/local/cuda-9.2 ...
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Installing the CUDA Samples in /home/meiga ...
Copying samples to /home/meiga/NVIDIA_CUDA-9.2_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-9.2
Samples: Installed in /home/meiga, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-9.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.2/lib64, or, add /usr/local/cuda-9.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.2/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.2/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_35795.log
Several "Missing" warnings appear above. Installing the dependencies below did not help; the libraries were still reported as missing:
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
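Without uninstalling what had just been installed, I ran the installer again; Logfile is /tmp/cuda_install_41818.log
Compared with the first install, cuda-9.2/bin now contained fewer files.
I ignored the missing-library warnings for now and moved on.
After extracting the cuDNN archive: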
sudo cp cuda/include/cudnn.h /usr/local/cuda-9.2/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-9.2/lib64/
sudo chmod a+r /usr/local/cuda-9.2/include/cudnn.h
sudo chmod a+r /usr/local/cuda-9.2/lib64/libcudnn*
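To confirm the copy worked, the cuDNN version can be read back from the installed header. A minimal sketch, assuming a cuDNN 7.x release, where `cudnn.h` itself carries the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` defines (cuDNN 8 and later moved them to `cudnn_version.h`). `cudnn_version` is a hypothetical helper name.

```shell
# cudnn_version [HEADER]
# Print MAJOR.MINOR.PATCH parsed from a cuDNN 7.x header.
# HEADER defaults to the location used in the cp commands above.
cudnn_version() {
    local header="${1:-/usr/local/cuda-9.2/include/cudnn.h}"
    if [ ! -r "$header" ]; then
        echo "cannot read $header" >&2
        return 1
    fi
    # Match only the three plain "#define NAME value" lines.
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}
```

Calling `cudnn_version` with no argument checks the CUDA 9.2 tree; passing another header path checks a different install.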
2. Update the environment variables: in ~/.bashrc, replace lines 136 and 137 with the lines that follow them
gedit ~/.bashrc
source ~/.bashrc
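The actual ~/.bashrc lines are not reproduced in this post. As a sketch of what such a block typically looks like, the exports below go through the /usr/local/cuda symlink rather than a versioned directory, so that re-pointing the link is enough to switch versions. This is an assumption about the setup, based on the PATH and LD_LIBRARY_PATH advice printed by the installer, not a copy of the original file.

```shell
# Sketch of a CUDA block for ~/.bashrc (assumed, not the author's exact lines).
# Everything goes through the /usr/local/cuda symlink.
export CUDA_HOME=/usr/local/cuda
# ${VAR:+:$VAR} appends the old value only when it is non-empty,
# which avoids a stray leading or trailing colon.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

If the file instead hard-codes /usr/local/cuda-10.2, changing the symlink alone will not affect which nvcc is found first on PATH.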
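3. Switching between CUDA versions (9.2 and 10.2)
Use the stat command to see which CUDA version the cuda symlink currently points to. Here the cuda directory is a symlink to cuda-10.2, and when torch looks up CUDA_HOME (in cpp_extension) it resolves to this cuda directory.
After recreating the symlink, stat shows the new target, but torch still reports version 10.2. This is probably because the torch in this environment was installed with cudatoolkit=10.2, even though /usr/local/cuda, which it uses as CUDA_HOME, already points to 9.2.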
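The mismatch is easier to see when the three values are printed side by side: where the symlink resolves, what nvcc reports, and what torch reports. A sketch, with `cuda_report` as a hypothetical helper name; it assumes `python` on PATH is the interpreter of the environment being checked. `torch.version.cuda` is the CUDA version the installed torch binary was built with, while `torch.utils.cpp_extension.CUDA_HOME` is the toolkit directory torch would use when compiling extensions.

```shell
# cuda_report [LINK]
# Print what the CUDA symlink resolves to and, when the tools are available,
# what nvcc and torch each report. LINK defaults to /usr/local/cuda.
cuda_report() {
    local link="${1:-/usr/local/cuda}"
    echo "symlink target: $(readlink -f "$link")"

    if command -v nvcc >/dev/null 2>&1; then
        echo "nvcc: $(nvcc --version | grep release)"
    fi

    if command -v python >/dev/null 2>&1; then
        python -c 'import torch
from torch.utils.cpp_extension import CUDA_HOME
print("torch built with CUDA:", torch.version.cuda)
print("cpp_extension CUDA_HOME:", CUDA_HOME)' 2>/dev/null \
            || echo "torch not importable in this environment"
    fi
}
```

If "torch built with CUDA" stays at 10.2 after the link moves to 9.2, that is consistent with the explanation above: the value comes from the installed torch package, not from /usr/local/cuda.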
4. Create a new Python virtual environment and install CUDA 9.2 + torch 1.2 (needed for neural_renderer, which does not work with 1.6, hence the downgrade to 1.2)
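One way this step could look with conda is sketched below. The environment name `torch12` and the Python version 3.7 are assumptions, not taken from the original setup; the package pins follow the earlier conda command with cudatoolkit changed to 9.2. The `run` wrapper prints each command and skips execution when DRY_RUN=1, which makes it possible to preview the commands first.

```shell
# Hypothetical environment name; any name works.
ENV_NAME="${ENV_NAME:-torch12}"

# run CMD...: print the command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Create the environment and install torch 1.2.0 built against CUDA 9.2.
setup_env() {
    run conda create -y -n "$ENV_NAME" python=3.7
    run conda install -y -n "$ENV_NAME" pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=9.2 -c pytorch
}
```

Running `DRY_RUN=1 setup_env` only prints the two conda commands; running `setup_env` executes them, after which the environment can be entered with `conda activate torch12`.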
5. Install neural_renderer_pytorch
pip install neural_renderer_pytorch