GPU virtualization with nvidia-device-plugin

Introduction

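The NVIDIA device plugin is deployed to a k8s cluster as a DaemonSet. Once deployed, it can:

  • Expose the number of GPUs on each node in the cluster
  • Track the health of the GPUs
  • Run GPU-enabled containers in the k8s cluster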
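
As a rough illustration of the first point: once the plugin is running, each node advertises its GPUs under the extended resource name `nvidia.com/gpu`. The Python sketch below reads a node list in the JSON shape produced by `kubectl get nodes -o json` and reports the allocatable GPU count per node. The function name `gpu_counts` and the sample node names are hypothetical and chosen only for illustration.

```python
import json


def gpu_counts(node_list: dict) -> dict:
    """Map each node name to its allocatable nvidia.com/gpu count.

    Nodes that do not advertise the resource are reported as 0.
    """
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports extended resource quantities as strings, e.g. "2".
        counts[name] = int(allocatable.get("nvidia.com/gpu", "0"))
    return counts


# Hypothetical sample data, shaped like `kubectl get nodes -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "gpu-node-1"},
     "status": {"allocatable": {"cpu": "32", "nvidia.com/gpu": "2"}}},
    {"metadata": {"name": "cpu-node-1"},
     "status": {"allocatable": {"cpu": "16"}}}
  ]
}
""")

print(gpu_counts(sample))
```

On a live cluster, the same function could be fed the parsed output of `kubectl get nodes -o json` instead of the embedded sample.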

Prerequisites

  • NVIDIA drivers ~= 384.81
  • nvidia-docker >= 2.0 || nvidia-container-toolkit >= 1.7.0 (>= 1.11.0 to use integrated GPUs on Tegra-based systems)
  • nvidia-container-runtime configured as the default low-level runtime
  • Kubernetes version >= 1.10

Quick start

Preparing the GPU nodes

1. Install nvidia-container-toolkit on every node

2. Set nvidia-container-runtime as the default container runtime

# cat /etc/docker/daemon.json
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "data-root": "/data/docker",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
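
One way to sanity-check the file above is to parse it and confirm that the default runtime is also registered under `runtimes`. This is a minimal Python sketch, not part of the plugin or of Docker. The helper name `check_default_runtime` is hypothetical, and the closing braces in the embedded sample are an assumption based on the usual daemon.json layout.

```python
import json


def check_default_runtime(config: dict, runtime: str = "nvidia") -> list:
    """Return a list of problems found in a parsed Docker daemon.json."""
    problems = []
    if config.get("default-runtime") != runtime:
        problems.append(f"default-runtime is not {runtime!r}")
    entry = config.get("runtimes", {}).get(runtime)
    if entry is None:
        problems.append(f"runtime {runtime!r} is not registered under 'runtimes'")
    elif not entry.get("path"):
        problems.append(f"runtime {runtime!r} has no 'path'")
    return problems


# Sample mirroring the settings shown above (closing braces assumed).
# On a node, load /etc/docker/daemon.json with json.load() instead.
sample = json.loads("""
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "data-root": "/data/docker",
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
  }
}
""")

# An empty list means no problems were found.
print(check_default_runtime(sample))
```

Returning a list of problems rather than a single boolean makes it easier to see which of the two settings is missing when the check fails.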
