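Ubuntu 22.04: from a fresh system to TensorFlow GPU support

Ubuntu 22.04 CUDA: from driver installation to TensorFlow

  • 0 General system setup and software installation
    • 0.1 Mounting a second disk as the default /home
    • 0.2 Software installation
    • 0.3 Installing a specific Python version
    • 0.4 Setting up a Python virtual environment
  • 1 Direct installation
    • 1.1 Hardware configuration
    • 1.2 Driver installation
    • 1.3 Integrated GPU for display, discrete GPU for compute (also useful for debugging)
    • 1.4 Uninstalling the driver (fallback, untested)
  • Daily use
    • Running in the background over SSH (survives disconnects)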

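0 General system setup and software installation

0.1 Mounting a second disk as the default /home

sudo mkdir -p /media/xxx # create the mount point if it does not exist yet
sudo mount /dev/sda1 /media/xxx # mount partition sda1 of the second disk

# Copy the existing home directories to the new disk
cd /home
sudo cp -ax * /media/xxx/
cd / # leave /home before renaming it
sudo mv home home_old # keep the old data as a backup
sudo mkdir home # empty mount point for the new /home

# Edit the filesystem table
sudo blkid /dev/sda1 # look up the partition's UUID
sudo vim /etc/fstab
# Add the following line, replacing the UUID with the one reported by blkid:
UUID=497435b9-08a6-4764-a0cd-0fdc5d278181	/home	ext4	defaults	1	2

sudo reboot # reboot to apply the change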
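
A mistake in /etc/fstab usually only shows up at the next boot, so it can help to generate the entry from the UUID and check the file before rebooting. The following is a minimal sketch, not part of the original procedure: fstab_home_line is a hypothetical helper name, and the commented commands assume the disk is /dev/sda1 as above.

```shell
# Print an fstab entry that mounts the filesystem with the given UUID at /home.
# The field values mirror the line shown above; adjust them if your setup differs.
fstab_home_line() {
    printf 'UUID=%s\t/home\text4\tdefaults\t1\t2\n' "$1"
}

# Typical use on the real machine (commented out: needs root and a real disk):
#   uuid=$(sudo blkid -s UUID -o value /dev/sda1)
#   fstab_home_line "$uuid" | sudo tee -a /etc/fstab
#   sudo findmnt --verify   # sanity check of /etc/fstab
#   sudo mount -a           # try to mount every entry; errors appear now, not at boot
```

If mount -a reports an error, fix the entry before rebooting; the old data is still available in /home_old.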

0.2 Software installation

sudo apt install git # install git
sudo apt install vim # install vim
sudo apt install tree # install tree
sudo apt install net-tools # provides ifconfig


# Install the SSH server
sudo apt-get install openssh-server
ps -e | grep ssh # confirm that sshd is running
sudo service ssh start # start it if it is not running

sudo apt install python3-pip # install pip
sudo ln -s /usr/bin/python3 /usr/bin/python # make the "python" command point to python3

# Use the Tsinghua PyPI mirror
pip install pip -U
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

# Jupyter
pip install jupyter
# jupyter notebook --generate-config
# jupyter notebook password
# vim ~/.jupyter/jupyter_notebook_config.py

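0.3 Installing a specific Python version

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.11

# Register the interpreters as alternatives (a higher number means higher priority)
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 2 # only if python3.12 is installed as well

# Switch between them
sudo update-alternatives --config python3
python -V # confirm the active Python version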

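0.4 Setting up a Python virtual environment

# Create a virtual environment in the .venv directory
# (on Ubuntu this needs the venv package first: sudo apt install python3-venv)
python3 -m venv .venv

source .venv/bin/activate # enter the virtual environment
which python # confirm that python now resolves inside .venv
deactivate # leave the virtual environment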
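
A virtual environment keeps using the interpreter it was created with, so an alternative to switching the system-wide python3 is to create the environment directly from the version you want. The sketch below assumes the interpreters from section 0.3 may or may not be installed; pick_python is a hypothetical helper, not an existing tool.

```shell
# Print the path of the first interpreter from the argument list that is on PATH.
# Returns non-zero if none of them is installed.
pick_python() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    return 1
}

# Typical use (commented out so the snippet has no side effects):
#   py=$(pick_python python3.11 python3) && "$py" -m venv .venv
#   .venv/bin/python -V   # reports the version the environment was created with
```

For a deadsnakes interpreter the matching venv package (for example python3.11-venv) is probably needed as well.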

1 Direct installation

1.1 Hardware configuration

  • GPU: NVIDIA RTX A2000 (12GB)

1.2 Driver installation

# Set up the NVIDIA package repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

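sudo apt-get -y install cuda-drivers-535 # GPU driver
sudo apt-get -y install cuda-toolkit-12-2 # CUDA toolkit
sudo apt-get -y install cudnn9-cuda-12 # cuDNN 9; the suffix should match the CUDA major version installed above (the original note used cudnn9-cuda-11)

sudo apt-get -y install libcudnn8 # cuDNN 8 runtime library
sudo apt-get -y install libcudnn8-dev # cuDNN 8 development files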

nvidia-smi # after rebooting, confirm the driver; the output looks like this
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A2000 12GB          Off | 00000000:01:00.0 Off |                  Off |
| 30%   36C    P8              10W /  70W |    174MiB / 12282MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1715      G   /usr/lib/xorg/Xorg                           84MiB |
|    0   N/A  N/A      1907      G   /usr/bin/gnome-shell                         48MiB |
+---------------------------------------------------------------------------------------+
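
The table above is meant for reading; for scripts, nvidia-smi can also print selected fields as CSV. The following sketch assumes the usual comma-separated layout (name, driver version, total memory) of that query output; describe_gpu is a hypothetical helper name.

```shell
# Split one CSV line produced by
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
# into labelled fields, one per line.
describe_gpu() {
    echo "$1" | awk -F', *' '{ printf "name=%s\ndriver=%s\nmemory=%s\n", $1, $2, $3 }'
}

# Typical use on a machine with the driver loaded (commented out: needs a GPU):
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader |
#       while IFS= read -r line; do describe_gpu "$line"; done
```

This makes it easy to log the driver version next to training results.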

# Environment setup
vim ~/.bashrc
# Append the following three lines to ~/.bashrc:
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64

source ~/.bashrc # reload the environment variables
nvcc -V # confirm the CUDA toolkit
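
Each time ~/.bashrc is sourced, the plain exports above append the same directories again, and an initially empty LD_LIBRARY_PATH ends up with a leading colon. One way to avoid both is a small helper that appends only when the entry is missing. This is a sketch under those assumptions; append_path is a hypothetical name.

```shell
# Append a directory to a colon-separated list unless it is already present.
# Usage: append_path "$PATH" /usr/local/cuda/bin
append_path() {
    list="$1"
    dir="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;       # already present: unchanged
        "::")       printf '%s\n' "$dir" ;;         # empty list: just the new entry
        *)          printf '%s\n' "$list:$dir" ;;   # otherwise append at the end
    esac
}

# In ~/.bashrc this could replace the plain exports (commented out here):
#   export PATH=$(append_path "$PATH" /usr/local/cuda/bin)
#   export LD_LIBRARY_PATH=$(append_path "$LD_LIBRARY_PATH" /usr/local/cuda/lib64)
```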

pip install tensorflow
# pip install cuda-python
pip install --extra-index-url https://pypi.nvidia.com cuda-python==11.8.3
python -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))" # confirm that the GPU is visible

python -c "import tensorflow as tf;print('Num GPUs: ', len(tf.config.experimental.list_physical_devices('GPU')))" # print the number of visible GPUs
TF_CPP_MAX_VLOG_LEVEL=3 python -c "import tensorflow as tf;print(tf.config.list_physical_devices('GPU'))" # same check with verbose logging, for debugging

Final output:

2024-05-06 22:26:38.255804: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-05-06 22:26:38.280854: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-06 22:26:38.640060: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-05-06 22:26:38.868457: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-06 22:26:38.888000: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-06 22:26:38.888122: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
Num GPUs:  1
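
Only the last line of that output matters for a script; the rest is TensorFlow's informational logging. A minimal sketch for pulling the count out of the output, assuming the "Num GPUs:" line format printed by the check above (parse_gpu_count is a hypothetical helper):

```shell
# Read the check's output on stdin and print the number from the "Num GPUs:" line.
# Prints 0 if no such line is present.
parse_gpu_count() {
    awk -F': *' '/^Num GPUs:/ { print $2 + 0; found = 1 } END { if (!found) print 0 }'
}

# Typical use (commented out: needs TensorFlow installed):
#   n=$(python -c "import tensorflow as tf;print('Num GPUs: ', len(tf.config.list_physical_devices('GPU')))" 2>/dev/null | parse_gpu_count)
#   [ "$n" -ge 1 ] || echo "TensorFlow does not see a GPU"
```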

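1.3 Integrated GPU for display, discrete GPU for compute (also useful for debugging)

prime-select query # show the current setting
sudo prime-select intel # make the Intel integrated GPU the primary one
sudo prime-select on-demand # use the NVIDIA GPU only on demand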

1.4 Uninstalling the driver (fallback, untested)

# The patterns are quoted so that the shell passes them to apt unchanged
sudo apt-get --purge remove 'nvidia-*'
sudo apt-get --purge remove 'libnvidia-*'

sudo dpkg --force-all -P nvidia-firmware-535-535.54.03 nvidia-kernel-common-535 nvidia-compute-utils-535 libnvidia-decode-535 nvidia-driver-535
sudo dpkg --force-all -P nvidia-*
sudo dpkg --force-all -P libnvidia-*

sudo apt autoremove
sudo apt autoclean
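
Since this section is untested, it may be worth listing what is actually installed before purging anything. The sketch below filters the output of dpkg -l for installed packages with an NVIDIA prefix; installed_nvidia_packages is a hypothetical helper name, and the purge line is only one possible way to use the list.

```shell
# Read `dpkg -l` output on stdin and print the names of installed ("ii")
# packages whose name starts with nvidia- or libnvidia-.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^(lib)?nvidia-/ { print $2 }'
}

# Dry run on the real machine (commented out here):
#   dpkg -l | installed_nvidia_packages
# Purge exactly what was listed (commented out: destructive, needs root):
#   dpkg -l | installed_nvidia_packages | xargs -r sudo apt-get --purge remove -y
```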

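Daily use

Running in the background over SSH (survives disconnects)

nohup python train.py # stays in the foreground; output goes to nohup.out
nohup python train.py >log.out & # runs in the background; output goes to log.out

ps aux | grep train.py # find the PID of the command, then stop it with kill (kill -9 to force)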
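
Instead of searching the process list afterwards, the PID can be recorded when the job is started, since the shell stores the PID of the last background job in $!. A minimal sketch, assuming a training script named train.py as above; run_detached is a hypothetical helper name.

```shell
# Start a command detached from the terminal, send its output to a log file,
# and print the PID of the background process.
run_detached() {
    logfile="$1"
    shift
    nohup "$@" >"$logfile" 2>&1 &
    echo "$!"
}

# Typical use (commented out here):
#   pid=$(run_detached log.out python train.py)
#   echo "$pid" > train.pid      # remember the PID for later
#   tail -f log.out              # follow the training log
#   kill "$(cat train.pid)"      # stop the job when needed
```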
