After my Ceph cluster had been running for a while, it reported the following warning:

[root@gnop029-ct-zhejiang_wenzhou-16-12 ~]# ceph -s
    cluster c6e7e7d9-2b91-4550-80b0-6fa46d0644f6
     health HEALTH_WARN
            clock skew detected on mon.c
            896 pgs stuck inactive
            896 pgs stuck unclean
            noscrub flag(s) set
            Monitor clock skew detected
     monmap e1: 5 mons at {a=101.71.4.11:6789/0,b=101.71.4.12:6789/0,c=101.71.4.13:6789/0,d=101.71.4.14:6789/0,e=101.71.4.15:6789/0}
            election epoch 28, quorum 0,1,2,3,4 a,b,c,d,e
     osdmap e1616: 240 osds: 216 up, 216 in
            flags noscrub
      pgmap v16891: 4992 pgs, 18 pools, 1093 GB data, 38340 objects
            5446 GB used, 361 TB / 386 TB avail
                4096 active+clean
                 896 creating

The key line is "clock skew detected on mon.c": the clock on monitor c has drifted away from the other monitors.

I fixed it as follows:
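1. On every MON host, stop the ntpd service:

service ntpd stop

2. Run ntpdate to synchronize the clock (ntpd is stopped first because ntpdate cannot use the NTP port while ntpd is running):

[root@gnop029-ct-zhejiang_wenzhou-16-14 ~]# ntpdate us.pool.ntp.org
 5 Dec 16:27:20 ntpdate[30359]: adjust time server 209.118.204.201 offset 0.000712 sec

3. Start the ntpd service again:

service ntpd start

4. Run ceph -s again; the cluster no longer reports the clock skew warning: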

[root@gnop029-ct-zhejiang_wenzhou-16-14 ~]# ceph -s
    cluster c6e7e7d9-2b91-4550-80b0-6fa46d0644f6
     health HEALTH_WARN
            896 pgs stuck inactive
            896 pgs stuck unclean
            noscrub flag(s) set
     monmap e1: 5 mons at {a=101.71.4.11:6789/0,b=101.71.4.12:6789/0,c=101.71.4.13:6789/0,d=101.71.4.14:6789/0,e=101.71.4.15:6789/0}
            election epoch 28, quorum 0,1,2,3,4 a,b,c,d,e
     osdmap e1616: 240 osds: 216 up, 216 in
            flags noscrub
      pgmap v16891: 4992 pgs, 18 pools, 1093 GB data, 38340 objects
            5446 GB used, 361 TB / 386 TB avail
                4096 active+clean
                 896 creating
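
The manual steps above could be wrapped in a small script. This is only a rough sketch, not a tested tool: the function names skewed_mons and resync_clock are made up for illustration. It assumes a SysV-style "service" command and the ntpdate binary, as used in this post, and it defaults to the same NTP pool (us.pool.ntp.org). skewed_mons reads ceph -s output from stdin and prints the monitors named in the "clock skew detected on ..." line. resync_clock repeats steps 1 to 3 on the local host.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative, not part of Ceph.

# Read `ceph -s` output on stdin and print each monitor named in a
# "clock skew detected on ..." line, one per line, without duplicates.
skewed_mons() {
    awk '/clock skew detected on/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^mon\./) {
                sub(/,$/, "", $i)   # drop a trailing comma if several mons are listed
                print $i
            }
    }' | sort -u
}

# Repeat steps 1-3 on the local host. Must run as root on a MON host.
# $1 is an optional NTP server; the default is the pool used in this post.
resync_clock() {
    ntp_server="${1:-us.pool.ntp.org}"
    service ntpd stop &&
    ntpdate "$ntp_server" &&
    service ntpd start
}

# Example usage (commented out so that sourcing this file changes nothing):
#   ceph -s | skewed_mons      # e.g. prints mon.c for the warning shown above
#   resync_clock               # run on each MON host
```

Keeping the parsing apart from the resync means the check can be run from any node with a Ceph client, while the resync still has to be executed on each MON host itself.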