Troubleshooting a Clock Synchronization Failure on a Ceph Node: A Summary

Symptoms

A Ceph mon node reports clock skew:

$ sudo /var/lib/ceph/bin/ceph -s
  cluster:
    id:     3fe6c651-2a0c-4f15-851b-7215536897eb
    health: HEALTH_WARN
            clock skew detected on mon.c

$ sudo /var/lib/ceph/bin/ceph health detail
HEALTH_WARN clock skew detected on mon.c
MON_CLOCK_SKEW clock skew detected on mon.c
    mon.c clock skew 0.274578s > max 0.15s (latency 0.000160198s)

The cluster is configured with a maximum allowed clock drift of 0.15 s:

$ cat /var/lib/ceph/etc/ceph/ceph.conf | grep mon_clock_drift_allowed
mon_clock_drift_allowed = 0.15
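
The comparison in the health output can also be extracted mechanically. The following is a minimal sketch, assuming the `clock skew ... > max ...` line keeps the layout shown above; the function name `skewed_mons` is made up for illustration and is not part of Ceph.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph health detail` text on stdin and print
# "<mon> <skew> <max>" for every monitor whose skew exceeds the limit.
skewed_mons() {
    awk '/clock skew [0-9.]+s > max [0-9.]+s/ {
        skew = $4; max = $7              # e.g. "0.274578s" and "0.15s"
        sub(/s$/, "", skew); sub(/s$/, "", max)
        if (skew + 0 > max + 0) print $1, skew, max
    }'
}

# Possible usage on a mon node (binary path taken from the transcript above):
#   sudo /var/lib/ceph/bin/ceph health detail | skewed_mons
```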

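Running date +%s.%N on the nodes shows the precise system time. Comparing the values confirmed that this was not a false alarm: the clock really was out of sync.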
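
One way to make that comparison is sketched below. The helper name `clock_offset` and the host name `mon-a` are placeholders, and fetching the remote timestamp over ssh adds the connection latency to the measured value, so the result is only a rough indication.

```shell
#!/bin/sh
# Hypothetical helper: print (remote - local) in seconds for two
# timestamps in `date +%s.%N` format. awk computes in double precision,
# so the result is rounded to milliseconds.
clock_offset() {
    awk -v l="$1" -v r="$2" 'BEGIN { printf "%.3f\n", r - l }'
}

# Possible usage (assumes ssh access to a host called mon-a):
#   clock_offset "$(date +%s.%N)" "$(ssh mon-a date +%s.%N)"
```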

The cluster nodes use chronyd for time synchronization, and every node is configured to use the first mon node as its time source:

$ cat /etc/chrony.conf
driftfile /var/lib/chrony/drift
rtcsync
local stratum 10
#default
server 10.127.15.182 minpoll 0 maxpoll 0
logdir /var/log/chrony
log measurements statistics tracking

Diagnosis

1. Check the synchronization status

$ sudo chronyc sources -v
210 Number of sources = 1

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^? 10.127.15.182                 0   0     0     -     +0ns[   +0ns] +/-    0ns

The source is marked ^?, i.e. unreachable, and the Reach register is 0: the time source cannot be reached.
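
When checking many nodes, the same test can be scripted. This is a rough sketch that assumes the column layout of the `chronyc sources` output shown above; `unreachable_sources` is an illustrative name, not a chrony command.

```shell
#!/bin/sh
# Hypothetical helper: read `chronyc sources` output on stdin and print
# every source whose state character (second character of the "MS"
# column) is "?", together with its Reach register.
unreachable_sources() {
    awk '
        length($1) == 2 &&
        index("^=#", substr($1, 1, 1)) > 0 &&
        substr($1, 2, 1) == "?" {
            print $2, "reach=" $5
        }
    '
}

# Possible usage:
#   sudo chronyc sources | unreachable_sources
```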

2. Try to synchronize the time manually

$ sudo chronyc -a makestep
200 OK
$ sudo chronyc sources -v
210 Number of sources = 1

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^? 10.127.15.182                 0   0     0     -     +0ns[   +0ns] +/-    0ns

This has no effect; the clock still cannot synchronize.

3. Check network connectivity

$ ping 10.127.15.182
PING 10.127.15.182 (10.127.15.182) 56(84) bytes of data.
64 bytes from 10.127.15.182: icmp_seq=1 ttl=64 time=0.011 ms
64 bytes from 10.127.15.182: icmp_seq=2 ttl=64 time=0.013 ms
64 bytes from 10.127.15.182: icmp_seq=3 ttl=64 time=0.017 ms
^C
--- 10.127.15.182 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.011/0.013/0.017/0.002 ms

ping succeeds, so for now network connectivity appears to be normal (10.127.15.182 is the floating address of the time source).

4. Check the firewall configuration

The nodes are on the same subnet, so only the operating system firewall needs to be checked:

$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination      

Nothing abnormal in the configuration.

5. Capture packets to find out why the source is unreachable

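On the time source, try to capture NTP packets coming from the faulty node. Nothing is captured, so the NTP packets are not reaching the time source: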

$ sudo tcpdump -nn -i bond0.3530 udp and host 10.127.15.156 and port 123
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0.3530, link-type EN10MB (Ethernet), capture size 262144 bytes

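The same capture on the faulty node also shows nothing. Combined with the successful ping in step 3, this is odd: the NTP packets never even reach the node's own network interface. Next, run a continuous ping from the faulty node to the time source address 10.127.15.182 while capturing on the time source: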

$ sudo tcpdump -nn -i bond0.3530 icmp and host 10.127.15.182
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0.3530, link-type EN10MB (Ethernet), capture size 262144 bytes

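Again nothing is captured. The ICMP packets do not reach the time source, yet ping receives replies, so the packets must be going to the wrong host.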

6. Find out where the faulty node actually sends packets addressed to the time source

The ARP table has no MAC address entry for the time source:

$ ip neigh show | grep 10.127.15.182
# or
$ arp -n | grep 10.127.15.182

The routing table shows that the route goes to the loopback interface lo:

$ ip route get 10.127.15.182
local 10.127.15.182 dev lo src 10.127.15.182 uid 1003 
    cache <local> 
    
$ ip route show table local
xxxxxx
local 10.127.15.156 dev bond0.3530 proto kernel scope host src 10.127.15.156 
local 10.127.15.182 dev bond0.3530 proto kernel scope host src 10.127.15.182 
xxxxxx

$ ip a
xxxxxx
15: bond0.3530@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8c:2a:8e:57:5c:d5 brd ff:ff:ff:ff:ff:ff
    inet 10.127.15.156/26 brd 10.127.15.191 scope global noprefixroute bond0.3530
       valid_lft forever preferred_lft forever
    inet6 2409:8c00:7821:4000::a7f:f9c/122 scope global noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::b9ed:8ee:b44d:286e/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

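10.127.15.182 is not an address of this host (ip a lists only 10.127.15.156 on bond0.3530), yet the local routing table holds a local entry for it, so traffic to that address is delivered locally through lo and never leaves the node. That is the root cause.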
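
The cross-check between the local routing table and the assigned addresses can be sketched as follows. This is a simplified illustration, not a general tool: `phantom_local_addrs` is a made-up name, it handles IPv4 only, and it ignores prefix entries such as 127.0.0.0/8.

```shell
#!/bin/sh
# Hypothetical helper: list addresses that have a host "local" route but
# are not assigned to any interface.
#   $1: file containing the output of `ip -o -4 addr show`
#   $2: file containing the output of `ip -4 route show table local`
phantom_local_addrs() {
    awk '
        FILENAME == ARGV[1] {            # first file: assigned addresses
            if ($3 == "inet") { split($4, a, "/"); assigned[a[1]] = 1 }
            next
        }
        $1 == "local" && index($2, "/") == 0 {   # second file: host routes
            if (!($2 in assigned)) print $2
        }
    ' "$1" "$2"
}

# Possible usage:
#   ip -o -4 addr show > /tmp/addrs
#   ip -4 route show table local > /tmp/routes
#   phantom_local_addrs /tmp/addrs /tmp/routes
```

On the faulty node described here, such a check would be expected to print 10.127.15.182.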

Resolution

Delete the bogus route entry:

$ sudo ip route delete table local 10.127.15.182 dev bond0.3530 src 10.127.15.182

Or restart the network service:

$ sudo systemctl restart NetworkManager
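
To confirm that the fix took effect, one option is to repeat the route lookup from step 6 and check that the address is no longer delivered locally. The sketch below assumes the `ip route get` output format shown earlier, where a locally delivered address produces a first line starting with `local`; `delivered_locally` is an illustrative name.

```shell
#!/bin/sh
# Hypothetical helper: read `ip route get <addr>` output on stdin and
# succeed (exit status 0) only if the first line starts with "local",
# meaning the kernel would deliver the packet to this host itself.
delivered_locally() {
    awk 'NR == 1 { found = ($1 == "local") } END { exit found ? 0 : 1 }'
}

# Possible usage:
#   if ip route get 10.127.15.182 | delivered_locally; then
#       echo "still routed to this host"
#   fi
```

Once the route is correct, `sudo chronyc sources` should eventually show a non-zero Reach value for the source.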
