Oracle RAC failover tolerance (CLSN00107)

Dear all,

I have a RAC system with 2 DB nodes (db1 and db2), both running as virtual machines on VMware 6.5. They run on 2 ESXi servers, ESX1 and ESX2, respectively. Two storage systems (storage1 and storage2) provide the storage for this system. The picture below describes the DB system more clearly.

[Image 1: diagram of the DB system]

I have several diskgroups with 2 ASM disks each, all with normal redundancy. To store the OCR and voting disks, I have created 2 diskgroups, OCR and VOTE, with 3 disks each: 2 of the disks are on storage1 and storage2, and the third is on an NFS server. The NFS server is one of my VMs; I used the dd command to create a virtual disk on it and shared that disk between db1 and db2. This NFS VM is configured to use local disk resources.

Diskgroup   ASM Disk 1 resource   ASM Disk 2 resource   ASM Disk 3 resource
OCR         storage 1             storage 2             NFS disk
VOTE        storage 1             storage 1             NFS disk
ARCHIVE     storage 1             storage 1
DATA        storage 1             storage 1
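To make the layout easier to reason about, here is a minimal Python sketch that takes the table above literally and counts which disks each diskgroup can still reach when one storage source is cut off. The names LAYOUT and surviving_disks are hypothetical, chosen only for this illustration, and the sketch does not model ASM failure groups or mirroring, only disk reachability.

```python
# Disk placement copied from the table above, taken literally.
LAYOUT = {
    "OCR": ["storage 1", "storage 2", "NFS disk"],
    "VOTE": ["storage 1", "storage 1", "NFS disk"],
    "ARCHIVE": ["storage 1", "storage 1"],
    "DATA": ["storage 1", "storage 1"],
}


def surviving_disks(layout, failed_source):
    """Return, per diskgroup, the disks that do not live on failed_source."""
    return {
        diskgroup: [source for source in disks if source != failed_source]
        for diskgroup, disks in layout.items()
    }


# Simulate the node losing its path to storage 1.
survivors = surviving_disks(LAYOUT, "storage 1")
for diskgroup, left in survivors.items():
    print(f"{diskgroup}: {len(left)} of {len(LAYOUT[diskgroup])} disks still reachable")
```

If the table is accurate as written, ARCHIVE and DATA would have no reachable disk once storage 1 is cut off. If those rows should instead read storage 1 and storage 2, each diskgroup would keep one disk, so it is worth double-checking the real placement.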

In my test case, I powered off node db1 and broke the connection between storage1 and the physical machine ESX2 (on which db2 is running). As I understand it, OCR and VOTE should stay online in this case, because only 1 of their 3 disks is lost, and the DB should stay online too. But that is not what happened.
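The expectation above rests on the voting-file majority rule: a node has to access more than half of the configured voting files. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and exist only for this illustration.

```python
def min_voting_files_required(configured: int) -> int:
    """Strict majority: more than half of the configured voting files."""
    return configured // 2 + 1


def has_voting_majority(configured: int, accessible: int) -> bool:
    """True if the node can still reach a majority of the voting files."""
    return accessible >= min_voting_files_required(configured)


# Three voting files configured, as in the setup described here.
print(min_voting_files_required(3))
print(has_voting_majority(3, 2))
print(has_voting_majority(3, 1))
```

With 3 voting files the minimum is 2, which agrees with the CRS-1672 message in the CRS alert log, where 2 files remain available against a required minimum of 2. By this rule alone, losing one voting file should leave the node at the limit but not below it.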

After the connection was blocked, the CRS service on db2 went down:

[[email protected] ~]$ crsctl check cluster -all

**************************************************************

p-db-02:

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

These entries were written to the CRS and ASM alert logs:

CRS alert log:

2017-12-26 21:32:34.467 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvWorkerThread_0 not scheduled for 72230 msecs.

2017-12-26 21:32:42.468 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 80020 msecs.

2017-12-26 21:32:42.468 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvWorkerThread_0 not scheduled for 80230 msecs.

2017-12-26 21:32:50.469 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 88020 msecs.

2017-12-26 21:32:50.469 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvWorkerThread_0 not scheduled for 88230 msecs.

2017-12-26 21:32:58.470 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 96020 msecs.

2017-12-26 21:32:58.470 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvWorkerThread_0 not scheduled for 96230 msecs.

2017-12-26 21:33:01.711 [OCSSD(15404)]CRS-1615: No I/O has completed after 50% of the maximum interval. Voting file /dev/oracleasm/disks/OCR1_2 will be considered not functional in 99740 milliseconds

2017-12-26 21:34:31.477 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 8020 msecs.

2017-12-26 21:34:39.477 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 16020 msecs.

2017-12-26 21:34:41.477 [OCSSD(15404)]CRS-1604: CSSD voting file is offline: /dev/oracleasm/disks/OCR1_2; details at (:CSSNM00058:) in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/ocssd.trc.

2017-12-26 21:34:41.477 [OCSSD(15404)]CRS-1672: The number of voting files currently available 2 has fallen to the minimum number of voting files required 2.

2017-12-26 21:39:28.099 [ORAAGENT(5555)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/ohasd_oraagent_oracle.trc"

2017-12-26 21:39:28.160 [ORAAGENT(17427)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"

2017-12-26 21:39:28.162 [ORAAGENT(5555)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/ohasd_oraagent_oracle.trc"

2017-12-26 21:39:28.190 [ORAAGENT(17427)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"

2017-12-26 21:39:28.197 [ORAAGENT(17427)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"

2017-12-26 21:39:28.213 [ORAAGENT(17427)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"

2017-12-26 21:39:28.231 [ORAAGENT(17427)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"

2017-12-26 21:39:28.416 [ORAAGENT(17427)]CRS-5017: The resource action "ora.cdb.db start" encountered the following error:

2017-12-26 21:39:28.416+ORA-01078: failure in processing system parameters

ORA-01565: error in identifying file '+NPCDBDG/CDB/spfileCDB.ora'

ORA-17503: ksfdopn:2 Failed to open file +NPCDBDG/CDB/spfileCDB.ora

ORA-01092: ORACLE instance terminated. Disconnection forced

. For details refer to "(:CLSN00107:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc".

2017-12-26 21:39:28.494 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 120040 msecs.

2017-12-26 21:39:29.449 [CRSD(17321)]CRS-2878: Failed to restart resource 'ora.cdb.db'

2017-12-26 21:39:29.449 [CRSD(17321)]CRS-2769: Unable to failover resource 'ora.cdb.db'.

2017-12-26 21:39:36.495 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 128040 msecs.

2017-12-26 21:39:44.495 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 136040 msecs.

2017-12-26 21:39:52.496 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 144040 msecs.

2017-12-26 21:42:26.402 [CRSD(17321)]CRS-1024: The Cluster Ready Service on this node terminated because the ASM instance was not active on this node. Details at (:PROCR00009:) in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd.trc.

2017-12-26 21:42:26.410 [ORAROOTAGENT(17431)]CRS-5822: Agent '/vendor/app/12.1.0.2/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:7} in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_orarootagent_root.trc.

2017-12-26 21:42:26.411 [ORAAGENT(17427)]CRS-5822: Agent '/vendor/app/12.1.0.2/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:1:41} in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc.

2017-12-26 21:42:26.412 [SCRIPTAGENT(3165)]CRS-5822: Agent '/vendor/app/12.1.0.2/grid/bin/scriptagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:5:18} in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_scriptagent_oracle.trc.

2017-12-26 21:42:26.461 [CRSD(1773)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 1773

2017-12-26 21:42:33.505 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 120050 msecs.

2017-12-26 21:42:35.822 [CRSD(1773)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd.trc.

2017-12-26 21:42:41.506 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 128050 msecs.

2017-12-26 21:42:44.098 [CRSD(1773)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd.trc.

2017-12-26 21:42:44.101 [CRSD(1773)]CRS-0804: Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage

ORA-15077: could not locate ASM instance serving a required diskgroup

ASM alert log:

See the attached file.

I don't understand why the DB still goes down even though all diskgroups are configured with normal redundancy.

Thank you,
