Kubernetes requires that all nodes in the cluster (including the master nodes) can reach one another over the Pod network. flannel uses VXLAN to build an overlay network connecting the Pods on all nodes; it communicates over UDP port 8472, which must therefore be opened (e.g. in security groups on public clouds such as AWS).
When flanneld starts for the first time, it reads the configured Pod network from etcd, allocates an unused subnet to the local node, and creates the flannel.1
network interface (the name may differ, e.g. flannel1).
flannel writes the subnet allocated to this node into the /run/flannel/docker
file; docker later uses the environment variables in this file to configure the docker0
bridge, so that the IPs of all Pod containers on the node are assigned from this subnet.
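For reference, the /run/flannel/docker file written by the mk-docker-opts.sh helper (used below) looks roughly like this — illustrative values for a hypothetical node that was allocated 172.30.128.0/21; your subnet will differ:
DOCKER_OPT_BIP="--bip=172.30.128.1/21"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=172.30.128.1/21 --ip-masq=false --mtu=1450"
A docker.service unit can then consume this file, typically via EnvironmentFile, and pass $DOCKER_NETWORK_OPTIONS to dockerd.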
Notes:
1. Unless stated otherwise, every operation in this document is performed on the k8s-01 node, which then distributes files to and runs commands on the other nodes remotely.
2. flanneld is not compatible with etcd v3.4.x (flannel v0.11.0 still speaks the etcd v2 API, which etcd v3.4 no longer enables by default), so make sure the installed etcd is v3.3.x.
3. This flannel setup works in conjunction with docker.
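Before proceeding, you can quickly confirm the etcd version on one of the etcd nodes (a sketch; it assumes etcd was installed to /opt/k8s/bin as in the earlier etcd section):
ssh root@192.168.0.71 "/opt/k8s/bin/etcd --version | head -1"
# expected: etcd Version: 3.3.x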
Download and unpack the flannel release:
cd /opt/k8s/work
mkdir flannel
wget https://github.com/coreos/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz
tar -zxf flannel-v0.11.0-linux-amd64.tar.gz -C flannel
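The archive unpacks into the flanneld binary and the mk-docker-opts.sh helper script that are distributed below:
$ ls flannel
flanneld  mk-docker-opts.sh  README.md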
Distribute the binaries to all cluster nodes:
cat > deploy.sh << "EOF"
#!/bin/bash
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  scp flannel/{flanneld,mk-docker-opts.sh} root@${node_ip}:/opt/k8s/bin/
  ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done
EOF
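Every distribution script in this section follows this pattern; after writing it out, make it executable once and run it:
chmod +x deploy.sh
./deploy.sh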
flanneld reads and writes subnet allocation information in the etcd cluster, and the etcd cluster has mutual x509 certificate authentication enabled, so a certificate and private key must be generated for flanneld.
Create a certificate signing request (the hosts field can stay empty because flanneld only uses this certificate as a client certificate):
cd /opt/k8s/work
cat > flanneld-csr.json <<EOF
{
  "CN": "flanneld",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "4Paradigm"
    }
  ]
}
EOF
Generate the certificate and private key:
$ cfssl gencert -ca=/opt/k8s/work/ca.pem \
    -ca-key=/opt/k8s/work/ca-key.pem \
    -config=/opt/k8s/work/ca-config.json \
    -profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld
$ ls flanneld*pem
flanneld-key.pem flanneld.pem
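Optionally inspect the generated certificate to confirm its subject and validity period (plain openssl, nothing specific to this setup):
$ openssl x509 -in flanneld.pem -noout -subject -issuer -dates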
Distribute the generated certificate and private key to all nodes (masters and workers):
cat > deploy.sh << "EOF"
#!/bin/bash
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p /etc/flanneld/cert"
  scp flanneld*.pem root@${node_ip}:/etc/flanneld/cert
done
EOF
Write the cluster Pod network configuration into etcd. Note: this step only needs to be performed once.
$ cd /opt/k8s/work
$ source /opt/k8s/bin/environment.sh
$ etcdctl \
    --endpoints=${ETCD_ENDPOINTS} \
    --ca-file=/opt/k8s/work/ca.pem \
    --cert-file=/opt/k8s/work/flanneld.pem \
    --key-file=/opt/k8s/work/flanneld-key.pem \
    mk ${FLANNEL_ETCD_PREFIX}/config '{"Network":"'${CLUSTER_CIDR}'", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}'
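A quick sanity check of what this configuration implies, assuming the example values used in this document (a /16 CLUSTER_CIDR and SubnetLen 21):
echo $(( 2 ** (21 - 16) ))      # 32 node subnets can be handed out
echo $(( 2 ** (32 - 21) - 2 ))  # 2046 usable Pod IPs per node subnet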
Note: the prefix length of ${CLUSTER_CIDR} (e.g. /16) must be smaller than SubnetLen, so that flanneld has room to carve out per-node subnets, and its value must match the --cluster-cidr parameter of kube-controller-manager.

Create the flanneld systemd unit file:
$ cd /opt/k8s/work
$ source /opt/k8s/bin/environment.sh
$ cat > flanneld.service << EOF
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
Before=docker.service

[Service]
Type=notify
ExecStart=/opt/k8s/bin/flanneld \\
  -etcd-cafile=/etc/kubernetes/cert/ca.pem \\
  -etcd-certfile=/etc/flanneld/cert/flanneld.pem \\
  -etcd-keyfile=/etc/flanneld/cert/flanneld-key.pem \\
  -etcd-endpoints=${ETCD_ENDPOINTS} \\
  -etcd-prefix=${FLANNEL_ETCD_PREFIX} \\
  -iface=${IFACE} \\
  -ip-masq
ExecStartPost=/opt/k8s/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
Restart=always
RestartSec=5
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
EOF
Notes:
1. The mk-docker-opts.sh script writes the Pod subnet allocated to this node's flanneld into the /run/flannel/docker file; when docker starts, it uses the environment variables in this file to configure the docker0 bridge.
2. The -iface parameter specifies the interface flanneld uses for inter-node communication.
3. -ip-masq: flanneld sets up SNAT rules for traffic leaving the Pod network, and sets the --ip-masq variable it passes to Docker (in the /run/flannel/docker file) to false, so that Docker no longer creates SNAT rules of its own. When Docker's --ip-masq is true, the SNAT rule it creates is rather blunt: every request from this node's Pods to anything other than the docker0 interface is SNATed, so requests to Pods on other nodes arrive with the flannel.1 interface IP as their source address, and the destination Pod never sees the real source Pod IP. The SNAT rules flanneld creates are gentler: only traffic destined outside the Pod network is SNATed.

Distribute flanneld.service to all nodes:
cat > deploy.sh << "EOF"
#!/bin/bash
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp flanneld.service root@${node_ip}:/etc/systemd/system/
done
EOF
Start the flanneld service on all nodes:
cat > deploy.sh << "EOF"
#!/bin/bash
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl daemon-reload && systemctl enable flanneld && systemctl restart flanneld"
done
EOF
Check that flanneld is running on all nodes:
cat > deploy.sh << "EOF"
#!/bin/bash
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl status flanneld | grep Active"
done
EOF
The output should look like this:
$ ./deploy.sh
>>> 192.168.0.71
Active: active (running) since Thu 2020-04-23 14:10:55 CST; 5min ago
>>> 192.168.0.72
Active: active (running) since Thu 2020-04-23 14:10:57 CST; 5min ago
>>> 192.168.0.73
Active: active (running) since Thu 2020-04-23 14:10:58 CST; 5min ago
A status of active (running)
means the service started successfully. If it failed, inspect the logs with:
journalctl -u flanneld
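Since flanneld runs with -ip-masq, you can also check on any node that its SNAT rules were installed (a sketch; the CIDRs shown depend on your ${CLUSTER_CIDR}):
iptables -t nat -S POSTROUTING | grep 172.30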
Check the cluster Pod network (/16):
$ source /opt/k8s/bin/environment.sh
$ etcdctl \
    --endpoints=${ETCD_ENDPOINTS} \
    --ca-file=/etc/kubernetes/cert/ca.pem \
    --cert-file=/etc/flanneld/cert/flanneld.pem \
    --key-file=/etc/flanneld/cert/flanneld-key.pem \
    get ${FLANNEL_ETCD_PREFIX}/config
Output:
{"Network":"172.30.0.0/16", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}
List the Pod subnets that have already been allocated (/21):
$ source /opt/k8s/bin/environment.sh
$ etcdctl \
    --endpoints=${ETCD_ENDPOINTS} \
    --ca-file=/etc/kubernetes/cert/ca.pem \
    --cert-file=/etc/flanneld/cert/flanneld.pem \
    --key-file=/etc/flanneld/cert/flanneld-key.pem \
    ls ${FLANNEL_ETCD_PREFIX}/subnets
Output (varies by deployment):
/subnets/172.30.128.0-21
/subnets/172.30.88.0-21
/subnets/172.30.104.0-21
Check the node IP and flannel interface address corresponding to a given Pod subnet:
$ source /opt/k8s/bin/environment.sh
$ etcdctl \
    --endpoints=${ETCD_ENDPOINTS} \
    --ca-file=/etc/kubernetes/cert/ca.pem \
    --cert-file=/etc/flanneld/cert/flanneld.pem \
    --key-file=/etc/flanneld/cert/flanneld-key.pem \
    get ${FLANNEL_ETCD_PREFIX}/subnets/172.30.128.0-21
Output (varies by deployment):
{"PublicIP":"192.168.0.71","BackendType":"vxlan","BackendData":{"VtepMAC":"e2:5f:51:da:76:54"}}
Here PublicIP (192.168.0.71) is the node's interconnect IP, and VtepMAC is the MAC address of that node's flannel.1 interface, as the ip addr output on that node confirms:
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:e7:5d:81 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.71/24 brd 192.168.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether e2:5f:51:da:76:54 brd ff:ff:ff:ff:ff:ff
inet 172.30.128.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
$ ip route show | grep flannel.1
172.30.88.0/21 via 172.30.88.0 dev flannel.1 onlink
172.30.104.0/21 via 172.30.104.0 dev flannel.1 onlink
The routes above show that traffic to Pod subnets on other nodes is sent via the flannel.1 interface; flanneld consults the subnet entries in etcd (such as ${FLANNEL_ETCD_PREFIX}/subnets/172.30.128.0-21) to decide which node's interconnect IP the encapsulated packet should be sent to.

After deploying flannel on each node, check that the flannel interface was created (it may be named flannel0, flannel.0, flannel.1, etc.):
cat > deploy.sh << "EOF"
#!/bin/bash
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "/usr/sbin/ip addr show flannel.1|grep -w inet"
done
EOF
Output:
>>> 192.168.0.71
inet 172.30.128.0/32 scope global flannel.1
>>> 192.168.0.72
inet 172.30.88.0/32 scope global flannel.1
>>> 192.168.0.73
inet 172.30.104.0/32 scope global flannel.1
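Each node's VXLAN forwarding database should also contain one entry per peer node, mapping that peer's VtepMAC to its node IP (a sketch; MACs and addresses depend on your environment):
bridge fdb show dev flannel.1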
From each node, ping all of the flannel interface IPs to make sure they are reachable.
Note: replace the IP addresses below with the ones from your own environment.
cat > deploy.sh << "EOF"
#!/bin/bash
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "ping -c 1 172.30.128.0"
ssh ${node_ip} "ping -c 1 172.30.88.0"
ssh ${node_ip} "ping -c 1 172.30.104.0"
done
EOF
Output:
$ ./deploy.sh
>>> 192.168.0.71
PING 172.30.128.0 (172.30.128.0) 56(84) bytes of data.
64 bytes from 172.30.128.0: icmp_seq=1 ttl=64 time=0.073 ms
--- 172.30.128.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.073/0.073/0.073/0.000 ms
PING 172.30.88.0 (172.30.88.0) 56(84) bytes of data.
64 bytes from 172.30.88.0: icmp_seq=1 ttl=64 time=0.427 ms
--- 172.30.88.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.427/0.427/0.427/0.000 ms
PING 172.30.104.0 (172.30.104.0) 56(84) bytes of data.
64 bytes from 172.30.104.0: icmp_seq=1 ttl=64 time=0.507 ms
--- 172.30.104.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.507/0.507/0.507/0.000 ms
>>> 192.168.0.72
PING 172.30.128.0 (172.30.128.0) 56(84) bytes of data.
64 bytes from 172.30.128.0: icmp_seq=1 ttl=64 time=0.527 ms
--- 172.30.128.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.527/0.527/0.527/0.000 ms
PING 172.30.88.0 (172.30.88.0) 56(84) bytes of data.
64 bytes from 172.30.88.0: icmp_seq=1 ttl=64 time=0.077 ms
--- 172.30.88.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.077/0.077/0.077/0.000 ms
PING 172.30.104.0 (172.30.104.0) 56(84) bytes of data.
64 bytes from 172.30.104.0: icmp_seq=1 ttl=64 time=0.468 ms
--- 172.30.104.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.468/0.468/0.468/0.000 ms
>>> 192.168.0.73
PING 172.30.128.0 (172.30.128.0) 56(84) bytes of data.
64 bytes from 172.30.128.0: icmp_seq=1 ttl=64 time=0.455 ms
--- 172.30.128.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.455/0.455/0.455/0.000 ms
PING 172.30.88.0 (172.30.88.0) 56(84) bytes of data.
64 bytes from 172.30.88.0: icmp_seq=1 ttl=64 time=0.144 ms
--- 172.30.88.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.144/0.144/0.144/0.000 ms
PING 172.30.104.0 (172.30.104.0) 56(84) bytes of data.
64 bytes from 172.30.104.0: icmp_seq=1 ttl=64 time=0.047 ms
--- 172.30.104.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.047/0.047/0.047/0.000 ms
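As a final check, you can confirm that every node has routes to all the other nodes' Pod subnets (same pattern as the scripts above; the subnets shown will vary by environment):
cat > deploy.sh << "EOF"
#!/bin/bash
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "/usr/sbin/ip route show | grep flannel.1"
done
EOF
./deploy.sh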