Kubernetes Cluster Deployment: Pitfalls and Fixes

Kubernetes version: 1.24.1

Master node OS: CentOS Linux release 7.9.2009 (Core)

node1: CentOS Linux release 7.9.2009 (Core)

node2: CentOS Linux release 7.2.1511 (Core)

Problem 1: kubeadm init on the master node fails because k8s.gcr.io/pause:3.6 cannot be pulled

[root@k8s-master-12 ~]# journalctl -xeu kubelet | grep Failed

"Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-k8s-master-12_kube-system(b886b3e01aef1830d12847a83ba1b808)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-k8s-master-12_kube-system(b886b3e01aef1830d12847a83ba1b808)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull and unpack image \\\"k8s.gcr.io/pause:3.6\\\": failed to resolve reference \\\"k8s.gcr.io/pause:3.6\\\": failed to do request: Head \\\"https://k8s.gcr.io/v2/pause/manifests/3.6\\\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:41965->[::1]:53: read: connection refused\"" pod="kube-system/kube-scheduler-k8s-master-12" podUID=b886b3e01aef1830d12847a83ba1b808

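Running kubeadm config images list shows that kubeadm itself expects k8s.gcr.io/pause:3.7, while the container runtime (see the error above) asks for pause:3.6 as its sandbox image. pause:3.6 therefore has to be pulled manually from a mirror and retagged as k8s.gcr.io/pause:3.6. Use ctr rather than docker for this, because Kubernetes 1.24 runs on containerd.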

[root@k8s-master-12 ~]# kubeadm config images list
W0620 03:20:59.180365     713 version.go:103] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get "https://dl.k8s.io/release/stable-1.txt": dial tcp: lookup dl.k8s.io on [::1]:53: read udp [::1]:45852->[::1]:53: read: connection refused
W0620 03:20:59.180419     713 version.go:104] falling back to the local client version: v1.24.1
k8s.gcr.io/kube-apiserver:v1.24.1
k8s.gcr.io/kube-controller-manager:v1.24.1
k8s.gcr.io/kube-scheduler:v1.24.1
k8s.gcr.io/kube-proxy:v1.24.1
k8s.gcr.io/pause:3.7
k8s.gcr.io/etcd:3.5.3-0
k8s.gcr.io/coredns/coredns:v1.8.6
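
The mismatch between the two pause tags can be checked mechanically. The following is a minimal sketch, not part of the original steps: the function name pause_tag is made up for illustration, and it assumes the image list has the same repo/name:tag layout as the output above.

```shell
# Hypothetical helper: print the tag of the pause image found in
# `kubeadm config images list` output read from stdin.
pause_tag() {
  # Keep only the line whose image name is "pause",
  # then strip everything up to the last ':'.
  grep -E '(^|/)pause:' | sed 's/.*://'
}

# Sample input copied from the listing above (warnings omitted).
sample='k8s.gcr.io/kube-proxy:v1.24.1
k8s.gcr.io/pause:3.7
k8s.gcr.io/etcd:3.5.3-0'

wanted_by_kubeadm=$(printf '%s\n' "$sample" | pause_tag)
wanted_by_runtime=3.6   # taken from the kubelet error message above

if [ "$wanted_by_kubeadm" != "$wanted_by_runtime" ]; then
  echo "pause tag mismatch: kubeadm lists $wanted_by_kubeadm, runtime asks for $wanted_by_runtime"
fi
```

On a real node the sample would be replaced by piping the command output into the helper: kubeadm config images list | pause_tag.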

[root@k8s-master-12 ~]# ctr -n k8s.io images pull XXX(mirror registry address)/google_containers/pause:3.6
[root@k8s-master-12 ~]# ctr -n k8s.io i tag XXX(mirror registry address)/google_containers/pause:3.6 k8s.gcr.io/pause:3.6
[root@k8s-master-12 ~]# crictl img  # check that pause:3.6 is present
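
The pull-and-retag step can be wrapped so that the same fix applies to any image the runtime asks for. This is a sketch under assumptions: MIRROR is a placeholder for the mirror registry address elided as XXX above, mirror_ref and retag_from_mirror are illustrative names, and the mirror is assumed to store images flat under one path, as google_containers/pause:3.6 suggests.

```shell
# Placeholder: replace with the real mirror registry path.
MIRROR="registry.example.com/google_containers"

# Map an upstream reference to its assumed location on the mirror,
# e.g. k8s.gcr.io/pause:3.6 -> $MIRROR/pause:3.6
mirror_ref() {
  echo "$MIRROR/${1##*/}"
}

# Pull from the mirror into the k8s.io namespace (the one the CRI plugin
# uses) and tag the result with the upstream name.
retag_from_mirror() {
  src=$(mirror_ref "$1")
  ctr -n k8s.io images pull "$src" &&
    ctr -n k8s.io images tag "$src" "$1"
}

# Dry run: show the mapping without touching containerd.
mirror_ref k8s.gcr.io/pause:3.6
```

On the node itself, retag_from_mirror k8s.gcr.io/pause:3.6 followed by crictl img would reproduce the three commands above.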

After clearing the leftover configuration, the second initialization succeeded.

[root@k8s-master-12 ~]# kubeadm reset
[root@k8s-master-12 ~]# rm -fr  $HOME/.kube/config
[root@k8s-master-12 ~]# kubeadm init --apiserver-advertise-address=10.113.43.12 --image-repository XXX(mirror registry address)/google_containers --kubernetes-version v1.24.1 --service-cidr=1   0.0.0/16 --pod-network-cidr=192.168.0.0/16 --ignore-preflight-errors=all

...

[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.113.43.12:6443 --token hiiazd.3ubgkzpq7lbtvw9s \
        --discovery-token-ca-cert-hash sha256:a6527a27f392e4153b83b04b4151e19061d2d1c7d7b633a73063397a1f56a5e4

Problem 2: node1 fails to join the cluster with "container runtime is not running"

[root@k8s-node-13 ~]# kubeadm join 10.113.43.12:6443 --token qk4yxy.pe4pl3nos9l62axk         --discovery-token-ca-cert-hash sha256:6c06b0fa2b797d723350e2878cb11537241cedf81d08a14d5dae7440dd815323
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR CRI]: container runtime is not running: output: E0621 17:02:32.927809   32096 remote_runtime.go:925] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2022-06-21T17:02:32+08:00" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

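containerd was in fact running. After a lot of searching online, the problem turned out to be the /etc/containerd/config.toml file. A commonly reported cause is a packaged config.toml containing disabled_plugins = ["cri"], which switches off the CRI plugin that kubeadm talks to. I handled it as follows: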

[root@k8s-node-13 ~]# rm /etc/containerd/config.toml
rm: remove regular file ‘/etc/containerd/config.toml’? y
[root@k8s-node-13 ~]# systemctl restart containerd
[root@k8s-node-13 ~]# kubeadm join 10.113.43.12:6443 --token qk4yxy.pe4pl3nos9l62axk         --discovery-token-ca-cert-hash sha256:6c06b0fa2b797d723350e2878cb11537241cedf81d08a14d5dae7440dd815323
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

The node then joined successfully.
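
Deleting the file works but throws away whatever was in it. A more cautious sketch follows. It assumes the cause is a disabled_plugins entry that lists cri; the original file content is not shown above, so this is a guess. cri_disabled is an illustrative name.

```shell
# Return success if the given containerd config disables the CRI plugin,
# written either as "cri" or as "io.containerd.grpc.v1.cri".
cri_disabled() {
  grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*[".]cri"' "$1"
}

# Self-contained demo on a temporary file instead of the real config.
demo=$(mktemp)
printf 'disabled_plugins = ["cri"]\n' > "$demo"

if cri_disabled "$demo"; then
  echo "CRI plugin is disabled in $demo"
fi
rm -f "$demo"
```

On the node, the check could gate a backup instead of a delete: cri_disabled /etc/containerd/config.toml && mv /etc/containerd/config.toml /etc/containerd/config.toml.bak && systemctl restart containerd.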

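Problem 3: node2 fails to join the cluster with the same error, but the fix from Problem 2 made no difference. I have not found a solution yet. My initial guess is that the CentOS release on this node (7.2.1511) is too old; on a newer release the problem does not occur.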

k8s-node-11:~ #kubeadm join 10.113.43.12:6443 --token qk4yxy.pe4pl3nos9l62axk         --discovery-token-ca-cert-hash sha256:6c06b0fa2b797d723350e2878cb11537241cedf81d08a14d5dae7440dd815323
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR CRI]: container runtime is not running: output: E0621 17:07:51.862976    9539 remote_runtime.go:925] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2022-06-21T17:07:51+08:00" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
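
Since the working nodes run 7.9.2009 and the failing one runs 7.2.1511, a check on the release string before joining can flag such nodes early. This is only a sketch built on the unconfirmed guess above: the minimum of 7.9 is simply the release that worked here, not a documented requirement, and centos_release and release_at_least are illustrative names. It relies on sort -V from GNU coreutils.

```shell
# Extract the version number from a string such as
# "CentOS Linux release 7.2.1511 (Core)".
centos_release() {
  echo "$1" | sed -n 's/.*release \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is greater than or equal to version $2.
release_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

for line in "CentOS Linux release 7.9.2009 (Core)" \
            "CentOS Linux release 7.2.1511 (Core)"; do
  ver=$(centos_release "$line")
  if release_at_least "$ver" 7.9; then
    echo "$ver: same as or newer than the release that worked"
  else
    echo "$ver: older than the release that worked"
  fi
done
```

On a node, the input would come from the release file: centos_release "$(cat /etc/centos-release)".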

Problem 4: Kubernetes deprecated dockershim in 1.20 and removed it in 1.24, so containerd has to be installed and a registry mirror configured.

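The file is /etc/containerd/config.toml. After editing it, restart the containerd service, run containerd config dump to check that the setting took effect, and then pull images with crictl pull. (endpoint is the address of the mirror registry. The key after registry.mirrors is the registry name, which can be specific, such as docker.io, or *, meaning any registry.)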

[root@k8s-master-12 containerd]# pwd
/etc/containerd
[root@k8s-master-12 containerd]# cat config.toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."*"] 
endpoint = ["https://XXXX"]

[root@k8s-master-12 containerd]# crictl pull quay.io/tigera/operator:v1.20.3
quay.io/tigera/operator:v1.20.3
manifest-sha256:9ae2b4f9a6f6a211a2cba2c1673e676360e6a10412937f69b1e6ba18c048010b: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d2e79031f8a619b0146f7991937cda7f81b8eee0fcf2c1b47c52e7413124543e:    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:f4bd4345cfd5f775d6bda10b7feb7f485df47c4d1dcac19a65a4800403553073:   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:b27edfcafb3d7907f7efa69af3d5c65fac02950b8ecabb81a95d94439907b5dc:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:758e6bb30bbed6329af62f79786370bcfa0e42ade20f5090521ac70d85cf4e22:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d93adff3933d85f6d59786719276abe7055f0a6a60db46d68cde81803dbf7043:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:696df4151dbaf7560090636491f1468e19eed083d61f078dd6ba46d4370db553:    done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 3.3 s                                                                    total:  23.0 M (7.0 MiB/s)
unpacking linux/amd64 sha256:9ae2b4f9a6f6a211a2cba2c1673e676360e6a10412937f69b1e6ba18c048010b...
done: 1.007909634s
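
The whole sequence (write a minimal config, restart, verify, pull) can be sketched as a script. Assumptions: https://mirror.example.com stands in for the endpoint elided as XXXX above, and write_mirror_config is an illustrative name. The demo writes to a temporary file, so nothing on the node changes until the commented commands are run.

```shell
# Write a minimal containerd config that routes every registry ("*")
# through a single mirror endpoint. Args: endpoint URL, target file.
write_mirror_config() {
  cat > "$2" <<EOF
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."*"]
endpoint = ["$1"]
EOF
}

out=$(mktemp)
write_mirror_config "https://mirror.example.com" "$out"
cat "$out"

# On a real node (as root), the remaining steps would be:
#   write_mirror_config "https://mirror.example.com" /etc/containerd/config.toml
#   systemctl restart containerd
#   containerd config dump | grep -A 2 'registry.mirrors'
#   crictl pull quay.io/tigera/operator:v1.20.3
```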

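Notes: ① Only the settings you want to change need to go in this file; anything left out falls back to the built-in defaults. I once wrote the full default configuration into config.toml and then changed the registry mirror, and the change did not take effect. I do not know why.

② The registry mirror setting only takes effect for crictl, not for ctr. The setting lives under the CRI plugin, which crictl goes through and ctr bypasses.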
