When node 2 started, crsd never came up successfully; crsctl stat res -t -init showed crsd in the OFFLINE state:
ora.asm
1 ONLINE ONLINE rac2 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE OFFLINE rac2 STABLE
ora.crf
1 ONLINE ONLINE rac2 STABLE
ora.crsd
1 ONLINE OFFLINE STABLE
ora.cssd
1 ONLINE ONLINE rac2 STABLE
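On a busy node, the output above is easier to read when reduced to the resources that should be up but are not. The following is a minimal sketch, not part of the original procedure: offline_resources is a hypothetical helper name, and it assumes the layout shown above (a resource name line starting with "ora.", followed by a line whose second and third fields are TARGET and STATE).

```shell
# Hypothetical helper: list resources whose TARGET is ONLINE but STATE is OFFLINE.
# Reads `crsctl stat res -t -init` style output on stdin.
offline_resources() {
  awk '
    /^ora\./ { name = $1; next }                      # remember the resource name line
    $2 == "ONLINE" && $3 == "OFFLINE" { print name }  # TARGET=ONLINE, STATE=OFFLINE
  '
}

# Usage on the affected node (assumes crsctl is on PATH):
#   crsctl stat res -t -init | offline_resources
```

Against the output shown above this would print ora.cluster_interconnect.haip and ora.crsd.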
Checking the cluster alert log showed that after the ACFS drivers finished loading, there were no further messages about CRSD starting:
[ohasd(44185)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
[client(45793)]CRS-10001:31-Jul-23 15:10 ACFS-9391: Checking for existing ADVM/ACFS installation.
[client(45798)]CRS-10001:31-Jul-23 15:10 ACFS-9392: Validating ADVM/ACFS installation files for operating system.
[client(45800)]CRS-10001:31-Jul-23 15:10 ACFS-9393: Verifying ASM Administrator setup.
[client(45803)]CRS-10001:31-Jul-23 15:10 ACFS-9308: Loading installed ADVM/ACFS drivers.
[client(45806)]CRS-10001:31-Jul-23 15:10 ACFS-9154: Loading 'oracleoks.ko' driver.
[client(45839)]CRS-10001:31-Jul-23 15:10 ACFS-9154: Loading 'oracleadvm.ko' driver.
[client(45877)]CRS-10001:31-Jul-23 15:10 ACFS-9154: Loading 'oracleacfs.ko' driver.
[client(45986)]CRS-10001:31-Jul-23 15:10 ACFS-9327: Verifying ADVM/ACFS devices.
[client(45994)]CRS-10001:31-Jul-23 15:10 ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
[client(45998)]CRS-10001:31-Jul-23 15:10 ACFS-9156: Detecting control device '/dev/ofsctl'.
[client(46003)]CRS-10001:31-Jul-23 15:10 ACFS-9322: completed
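Judging from the log, the cluster startup on node 2 hung at the start of the crsd process. The OS log showed that, right after the ACFS drivers loaded, the multipathd process was operating on the ASM paths: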
Jul 29 12:40:35 rac2 kernel: OKSK-00004: Module load succeeded. Build information: (LOW DEBUG) USM_11.2.0.4.0ACFSPSU_LINUX.X64_160211 2016/02/11 10:45:33
Jul 29 12:40:36 rac2 kernel: ADVMK-00001: Module load succeeded. Build information: (LOW DEBUG) - USM_11.2.0.4.0ACFSPSU_LINUX.X64_160211 built on 2016/02/11 11:04:07.
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_spec: add path (uevent)
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_spec: failed to get path uid
Jul 29 12:40:36 rac2 multipathd: uevent trigger error
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vmb: add path (uevent)
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vmb: failed to get path uid
Jul 29 12:40:36 rac2 multipathd: uevent trigger error
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vdbg: add path (uevent)
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vdbg: failed to get path uid
Jul 29 12:40:36 rac2 multipathd: uevent trigger error
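To see quickly which ASM control devices multipathd is tripping over, the log lines above can be reduced to a unique device list. This is a rough sketch under stated assumptions: asm_path_uid_failures is a hypothetical helper name, and the pattern assumes the exact syslog format shown above ("multipathd: asm!<name>: failed to get path uid"); other syslog layouts, for example with a PID after the process name, would need a different pattern.

```shell
# Hypothetical helper: print each ASM device name that multipathd failed on, once.
# Reads syslog lines on stdin.
asm_path_uid_failures() {
  sed -n 's/.*multipathd: \(asm![^:]*\): failed to get path uid.*/\1/p' | sort -u
}

# Usage (log path assumed; adjust for your distribution):
#   asm_path_uid_failures < /var/log/messages
```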
This error is covered in a MOS note:
Error 'Multipathd: Asm!.Asm_ctl_spec: Failed To Store Path Info' found In /var/log/messages (Doc ID 1268895.1)
It can be resolved by changing the multipath configuration. According to the note, the following should be added to multipath.conf:
blacklist {
devnode "^asm/*"
devnode "ofsctl"
}
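Before planning a change, it can help to confirm whether a node's configuration already carries these two entries. The check below is only a sketch: has_acfs_blacklist is a hypothetical helper name, /etc/multipath.conf is assumed as the default location, and the check looks for the two devnode lines anywhere in the file without verifying that they sit inside the blacklist section.

```shell
# Hypothetical helper: succeed if both devnode entries from the MOS note are present.
# Simplification: does not verify the lines are inside the blacklist { } section.
has_acfs_blacklist() {
  conf="${1:-/etc/multipath.conf}"   # assumed default path
  grep -Eq '^[[:space:]]*devnode[[:space:]]+"\^asm/\*"' "$conf" &&
  grep -Eq '^[[:space:]]*devnode[[:space:]]+"ofsctl"' "$conf"
}

# Usage:
#   if has_acfs_blacklist; then echo "entries present"; else echo "entries missing"; fi
```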
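Checking the multipath configuration confirmed that these entries were indeed missing. Which devices exactly are these ASM paths?
They are /dev/ofsctl and everything under /dev/asm/, as listed below; these are the devices to exclude: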
[root@rac01 ~]# ls -la /dev/asm*
total 0
drwxrwx--- 2 root asmadmin 280 Aug 23 14:38 .
drwxr-xr-x 20 root root 7280 Aug 23 14:38 ..
brwxrwx--- 1 root asmadmin 252, 0 Aug 23 14:38 .asm_ctl_spec
brwxrwx--- 1 root asmadmin 252, 10 Aug 23 14:38 .asm_ctl_vbg0
brwxrwx--- 1 root asmadmin 252, 11 Aug 23 14:38 .asm_ctl_vbg1
brwxrwx--- 1 root asmadmin 252, 12 Aug 23 14:38 .asm_ctl_vbg2
brwxrwx--- 1 root asmadmin 252, 13 Aug 23 14:38 .asm_ctl_vbg3
brwxrwx--- 1 root asmadmin 252, 14 Aug 23 14:38 .asm_ctl_vbg4
brwxrwx--- 1 root asmadmin 252, 15 Aug 23 14:38 .asm_ctl_vbg5
brwxrwx--- 1 root asmadmin 252, 16 Aug 23 14:38 .asm_ctl_vbg6
brwxrwx--- 1 root asmadmin 252, 17 Aug 23 14:38 .asm_ctl_vbg7
brwxrwx--- 1 root asmadmin 252, 18 Aug 23 14:38 .asm_ctl_vbg8
brwxrwx--- 1 root asmadmin 252, 1 Aug 23 14:38 .asm_ctl_vdbg
brwxrwx--- 1 root asmadmin 252, 2 Aug 23 14:38 .asm_ctl_vmb
Also check whether multipath has /dev/ofsctl open:
lsof /dev/ofsctl
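Because the other node in production could not be stopped, the multipath configuration could not be changed for the time being.
So, based on the cluster alert log, I tried temporarily disabling ACFS instead, following Doc ID 1417294.1: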
"crsctl stop crs" Fails to Stop ora.drivers.acfs With CRS-2675 (Doc ID 1417294.1)
From $GRID_HOME/bin, disable ACFS with:
./acfsroot disable
With ACFS disabled, the cluster and database on node 2 started normally.