RAC node 2 fails to start because of a multipath configuration problem and ACFS being enabled

While node 2 was starting, crsd never came up. crsctl stat res -t -init showed crsd as OFFLINE:

ora.asm
      1        ONLINE  ONLINE       rac2                     Started,STABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE      rac2                     STABLE
ora.crf
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       rac2                     STABLE
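
As a rough sketch (not part of the original troubleshooting), the OFFLINE entries can be pulled out of this output with a small shell helper. The function name offline_init_resources is made up for illustration, and it assumes the two-line layout shown above: a resource name line starting with "ora.", followed by a line whose third field is the current state.

```shell
# Hypothetical helper: read `crsctl stat res -t -init` output on stdin and
# print the names of resources whose current state is OFFLINE.
offline_init_resources() {
  awk '
    # A line starting with "ora." carries the resource name.
    /^ora\./ { name = $1; next }
    # The next line starts with the instance number; field 3 is the state.
    name != "" && $1 ~ /^[0-9]+$/ {
      if ($3 == "OFFLINE") print name
      name = ""
    }
  '
}

# Usage on a cluster node (as a user allowed to run crsctl):
#   crsctl stat res -t -init | offline_init_resources
```

For the output above this would list ora.cluster_interconnect.haip as well as ora.crsd, since both are OFFLINE.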

The cluster alert log shows no CRSD startup messages after the ACFS drivers finish loading:

[ohasd(44185)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
[client(45793)]CRS-10001:31-Jul-23 15:10 ACFS-9391: Checking for existing ADVM/ACFS installation.
[client(45798)]CRS-10001:31-Jul-23 15:10 ACFS-9392: Validating ADVM/ACFS installation files for operating system.
[client(45800)]CRS-10001:31-Jul-23 15:10 ACFS-9393: Verifying ASM Administrator setup.
[client(45803)]CRS-10001:31-Jul-23 15:10 ACFS-9308: Loading installed ADVM/ACFS drivers.
[client(45806)]CRS-10001:31-Jul-23 15:10 ACFS-9154: Loading 'oracleoks.ko' driver.
[client(45839)]CRS-10001:31-Jul-23 15:10 ACFS-9154: Loading 'oracleadvm.ko' driver.
[client(45877)]CRS-10001:31-Jul-23 15:10 ACFS-9154: Loading 'oracleacfs.ko' driver.
[client(45986)]CRS-10001:31-Jul-23 15:10 ACFS-9327: Verifying ADVM/ACFS devices.
[client(45994)]CRS-10001:31-Jul-23 15:10 ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
[client(45998)]CRS-10001:31-Jul-23 15:10 ACFS-9156: Detecting control device '/dev/ofsctl'.
[client(46003)]CRS-10001:31-Jul-23 15:10 ACFS-9322: completed

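Judging from the log, the clusterware startup on node 2 hung at the point where crsd should start. The OS log shows that right after the ACFS drivers were loaded, the multipath daemon was trying to handle the ASM control devices as paths: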

Jul 29 12:40:35 rac2 kernel: OKSK-00004: Module load succeeded. Build information: (LOW DEBUG) USM_11.2.0.4.0ACFSPSU_LINUX.X64_160211 2016/02/11 10:45:33
Jul 29 12:40:36 rac2 kernel: ADVMK-00001: Module load succeeded. Build information: (LOW DEBUG) - USM_11.2.0.4.0ACFSPSU_LINUX.X64_160211 built on 2016/02/11 11:04:07.
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_spec: add path (uevent)
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_spec: failed to get path uid
Jul 29 12:40:36 rac2 multipathd: uevent trigger error
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vmb: add path (uevent)
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vmb: failed to get path uid
Jul 29 12:40:36 rac2 multipathd: uevent trigger error
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vdbg: add path (uevent)
Jul 29 12:40:36 rac2 multipathd: asm!.asm_ctl_vdbg: failed to get path uid
Jul 29 12:40:36 rac2 multipathd: uevent trigger error
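
To see quickly which ASM control devices multipathd is tripping over, something like the following could be used. This is only a sketch: the function name asm_multipath_failures is hypothetical, and it assumes the syslog line format shown above ("multipathd: asm!<device>: failed to get path uid").

```shell
# Hypothetical helper: list the distinct asm!... device names for which
# multipathd logged "failed to get path uid" in the given syslog file.
asm_multipath_failures() {
  # $1: path to a syslog file, for example /var/log/messages
  grep 'multipathd: asm!' "$1" \
    | grep 'failed to get path uid' \
    | sed 's/.*multipathd: \(asm![^:]*\):.*/\1/' \
    | sort -u
}

# Usage:
#   asm_multipath_failures /var/log/messages
```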

This error is described in a MOS note:

Error 'Multipathd: Asm!.Asm_ctl_spec: Failed To Store Path Info' found In /var/log/messages (Doc ID 1268895.1)    

It can be resolved by changing the multipath configuration. According to the note, the following should be added to multipath.conf:

blacklist {
    devnode "^asm/*"
    devnode "ofsctl"
}
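
A quick way to test whether a given multipath.conf already carries these two entries might look like this. It is a minimal sketch: the function name has_acfs_blacklist is invented here, and the check only looks for the two devnode lines exactly as quoted above. It does not verify that they sit inside a blacklist section, nor does it recognize equivalent patterns written differently.

```shell
# Hypothetical helper: succeed only if the file contains both devnode
# entries quoted above (simplification: section membership is not checked).
has_acfs_blacklist() {
  # $1: path to the multipath configuration, for example /etc/multipath.conf
  grep -Eq '^[[:space:]]*devnode[[:space:]]+"\^asm/\*"' "$1" &&
    grep -Eq '^[[:space:]]*devnode[[:space:]]+"ofsctl"' "$1"
}

# Usage:
#   if has_acfs_blacklist /etc/multipath.conf; then
#     echo "ACFS control devices are blacklisted"
#   else
#     echo "blacklist entries are missing"
#   fi
```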

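Checking the multipath configuration confirmed that these entries were missing. Which devices do these ASM paths actually refer to?

They are /dev/ofsctl and everything under /dev/asm/, that is, the devices listed below, which need to be excluded: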

[root@rac01 ~]# ls -la /dev/asm*
total 0
drwxrwx---  2 root asmadmin     280 Aug 23 14:38 .
drwxr-xr-x 20 root root        7280 Aug 23 14:38 ..
brwxrwx---  1 root asmadmin 252,  0 Aug 23 14:38 .asm_ctl_spec
brwxrwx---  1 root asmadmin 252, 10 Aug 23 14:38 .asm_ctl_vbg0
brwxrwx---  1 root asmadmin 252, 11 Aug 23 14:38 .asm_ctl_vbg1
brwxrwx---  1 root asmadmin 252, 12 Aug 23 14:38 .asm_ctl_vbg2
brwxrwx---  1 root asmadmin 252, 13 Aug 23 14:38 .asm_ctl_vbg3
brwxrwx---  1 root asmadmin 252, 14 Aug 23 14:38 .asm_ctl_vbg4
brwxrwx---  1 root asmadmin 252, 15 Aug 23 14:38 .asm_ctl_vbg5
brwxrwx---  1 root asmadmin 252, 16 Aug 23 14:38 .asm_ctl_vbg6
brwxrwx---  1 root asmadmin 252, 17 Aug 23 14:38 .asm_ctl_vbg7
brwxrwx---  1 root asmadmin 252, 18 Aug 23 14:38 .asm_ctl_vbg8
brwxrwx---  1 root asmadmin 252,  1 Aug 23 14:38 .asm_ctl_vdbg
brwxrwx---  1 root asmadmin 252,  2 Aug 23 14:38 .asm_ctl_vmb

Also check whether multipath has /dev/ofsctl open:

lsof /dev/ofsctl
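
lsof prints one line per process that has the device open, so the output can be filtered for the multipath daemon. The helper below is a sketch under assumptions: the name multipath_holds_device is made up, and it relies on lsof's default layout, where the first column is the command name (typically truncated to nine characters, so multipathd would appear as "multipath").

```shell
# Hypothetical helper: read lsof output on stdin and succeed if any process
# whose command name starts with "multipath" is listed.
multipath_holds_device() {
  # NR > 1 skips the lsof header line.
  awk 'NR > 1 && $1 ~ /^multipath/ { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage (as root):
#   lsof /dev/ofsctl | multipath_holds_device && echo "multipathd has /dev/ofsctl open"
```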

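Because the other node in this production environment could not be stopped, the multipath configuration could not be changed for the time being.

So, based on the cluster alert log, I tried temporarily disabling ACFS instead, following Doc ID 1417294.1: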

"crsctl stop crs" Fails to Stop ora.drivers.acfs With CRS-2675 (Doc ID 1417294.1)

Go to $GRID_HOME/bin and disable ACFS with:

./acfsroot disable

After ACFS was disabled, the clusterware and database on node 2 started normally.
