测试环境启动oracle rac时,发现数据库无法正常打开。
1.现象如下:
SQL> startup ORACLE instance started. Total System Global Area 1068937216 bytes Fixed Size 2235208 bytes Variable Size 822084792 bytes Database Buffers 239075328 bytes Redo Buffers 5541888 bytes Database mounted. ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], [] SQL>
Mon Jan 04 16:18:48 2016 Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Private Interface 'eth1:1' configured from GPnP for use as a private interconnect. [name='eth1:1', type=1, ip=169.254.12.15, mac=08-00-27-ec-9f-91, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62] Public Interface 'eth0' configured from GPnP for use as a public interface. [name='eth0', type=1, ip=192.168.77.185, mac=08-00-27-d9-2f-ac, net=192.168.77.0/24, mask=255.255.255.0, use=public/1] Public Interface 'eth0:1' configured from GPnP for use as a public interface. [name='eth0:1', type=1, ip=192.168.77.187, mac=08-00-27-d9-2f-ac, net=192.168.77.0/24, mask=255.255.255.0, use=public/1] Picked latch-free SCN scheme 3 Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options. ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1 System name: Linux Node name: racnode1.oracle.com Release: 2.6.18-308.el5 Version: #1 SMP Fri Jan 27 17:17:51 EST 2012 Machine: x86_64 Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_1/dbs/initRACDB_1.ora System parameters with non-default values: processes = 150 spfile = "+DATADG/racdb/spfileracdb.ora" memory_target = 1G control_files = "+DATADG/racdb/control01.ctl" control_files = "+DATADG/racdb/control02.ctl" db_block_size = 8192 compatible = "11.2.0.0.0" cluster_database = TRUE remote_login_passwordfile= "EXCLUSIVE" db_domain = "" dispatchers = "(PROTOCOL=TCP) (SERVICE=RACDBXDB)" remote_listener = "racnode-cluster-scan:1521" audit_file_dest = "/u01/app/oracle/admin/RACDB/adump" audit_trail = "DB" db_name = "RACDB" open_cursors = 300 diagnostic_dest = "/u01/app/oracle" Cluster communication is configured to use the following interface(s) for this instance 169.254.12.15 cluster interconnect IPC version:Oracle UDP/IP (generic) IPC Vendor 1 proto 2 .......................................................
....................................................... Mon Jan 04 16:18:56 2016 LMD0 started with pid=12, OS id=13703 * Load Monitor used for high load check * New Low - High Load Threshold Range = [1920 - 2560] Mon Jan 04 16:18:56 2016 LMS0 started with pid=13, OS id=13707 at elevated priority .......................................................
....................................................... Mon Jan 04 16:18:56 2016 MMON started with pid=24, OS id=13753 starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'... Mon Jan 04 16:18:56 2016 MMNL started with pid=25, OS id=13757 starting up 1 shared server(s) ... NOTE: initiating MARK startup Starting background process MARK Mon Jan 04 16:18:58 2016 MARK started with pid=28, OS id=13773 NOTE: MARK has subscribed lmon registered with NM - instance number 1 (internal mem no 0) Reconfiguration started (old inc 0, new inc 2) List of instances: 1 (myinst: 1) Global Resource Directory frozen ---------这个是什么意思? * allocate domain 0, invalid = TRUE Communication channels reestablished Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived Set master node info Submitted all remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious All grantable enqueues granted Post SMON to start 1st pass IR Submitted all GCS remote-cache requests Post SMON to start 1st pass IR Fix write in gcs resources Reconfiguration complete Mon Jan 04 16:18:59 2016 LCK0 started with pid=30, OS id=13785 Mon Jan 04 16:18:59 2016 Starting background process RSMN Mon Jan 04 16:18:59 2016 RSMN started with pid=31, OS id=13789 ORACLE_BASE from environment = /u01/app/oracle Mon Jan 04 16:19:02 2016 ALTER DATABASE MOUNT This instance was first to mount Mon Jan 04 16:19:02 2016 ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.77.187)(PORT=1521))))' SCOPE=MEMORY SID='RACDB_1'; NOTE: Loaded library: System SUCCESS: diskgroup DATADG was mounted NOTE: dependency between database RACDB and diskgroup resource ora.DATADG.dg is established Mon Jan 04 16:19:07 2016 Successful mount of redo thread 1, with mount id 897839478 Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE) Lost write protection disabled Completed: ALTER DATABASE MOUNT Mon Jan 04 16:19:08 2016 ALTER DATABASE OPEN This instance was first to open Beginning crash recovery of 1 threads parallel recovery started with 2 processes Started redo scan Completed redo scan read 4 KB redo, 9 data blocks need recovery Errors in file /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc (incident=11059): ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/racdb/RACDB_1/incident/incdir_11059/RACDB_1_ora_13919_i11059.trc Mon Jan 04 16:19:11 2016 Dumping diagnostic data in directory=[cdmp_20160104161911], requested by (instance=1, osid=13919), summary=[incident=11059]. Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Abort recovery for domain 0 Aborting crash recovery due to error 600 -----------------实例恢复出错导致ora-600错误 Errors in file /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc: ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], [] Abort recovery for domain 0 Errors in file /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc: ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], [] ORA-600 signalled during: ALTER DATABASE OPEN... Mon Jan 04 16:20:01 2016 Sweep [inc][11059]: completed Sweep [inc2][11059]: completed [oracle@racnode1 trace]$
3.查看trace文件
[oracle@racnode1 trace]$ cat /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc Trace file /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc ....................................................................................... ....................................................................................... *** 2016-01-04 16:19:09.223 *** SESSION ID:(17.3) 2016-01-04 16:19:09.223 *** CLIENT ID:() 2016-01-04 16:19:09.223 *** SERVICE NAME:() 2016-01-04 16:19:09.223 *** MODULE NAME:([email protected] (TNS V1-V3)) 2016-01-04 16:19:09.223 *** ACTION NAME:() 2016-01-04 16:19:09.223 2016-01-04 16:19:09.202395 : Start recovery for domain=0, valid=0, flags=0x4 Successfully allocated 2 recovery slaves Using 67 overflow buffers per recovery slave Thread 1 checkpoint: logseq 315, block 2, scn 8074391 cache-low rba: logseq 315, block 70496 on-disk rba: logseq 315, block 70506, scn 8093866 ------------ on-disk rba 70506 start recovery at logseq 315, block 70496, scn 0 Reading mirrors of log member '+DATADG/racdb/redo01.log' (thread 1 seq 315 block 70505) ksfdrfms:Mirror Read file=+DATADG/racdb/redo01.log fob=0x9c438b58 bufp=0x2b13e9de0000 blkno=70505 nbytes=601600 ksfdrfms: Read success from mirror side=1 logical extent number=0 disk=DATADG_0000 path=/dev/asm-diske Mirror I/O done from ASM disk /dev/asm-diske ksfdrnms:Mirror Read file=+DATADG/racdb/redo01.log fob=0x9c438b58 bufp=0x2b13e9de0000 nbytes=601600 ksfdrnms: Read success from mirror side=2 logical extent number=1 disk=DATADG_0001 path=/dev/asm-diskf Mirror I/O done from ASM disk /dev/asm-diskf ksfdrnms:Mirror Read file=+DATADG/racdb/redo01.log fob=0x9c438b58 bufp=0x2b13e9de0000 nbytes=601600 *** 2016-01-04 16:19:09.515 Started writing zeroblks thread 1 seq 315 blocks 70505-70512 *** 2016-01-04 16:19:09.516 Completed writing zeroblks thread 1 seq 315 ==== Redo read statistics for thread 1 ==== Total physical reads (from disk and memory): 4096Kb -- Redo read_disk statistics -- Read rate (ASYNC): 4Kb in 0.04s => 0.11 Mb/sec Longest record: 0Kb, moves: 0/16 (0%) Change moves: 0/5 (0%), moved: 0Mb Longest LWN: 1Kb, moves: 0/5 (0%), moved: 0Mb Last redo scn: 0x0000.007b8095 (8093845) ---------------------------------------------- ----- Recovery Hash Table Statistics --------- Hash table buckets = 262144 Longest hash chain = 1 Average hash chain = 9/9 = 1.0 Max compares per lookup = 1 Avg compares per lookup = 10/24 = 0.4 ---------------------------------------------- WARNING! Crash recovery of thread 1 seq 315 is ending at redo block 70505 but should not have ended before redo block 70506 *** 2016-01-04 16:19:10.329 Incident 11059 created, dump file: /u01/app/oracle/diag/rdbms/racdb/RACDB_1/incident/incdir_11059/RACDB_1_ora_13919_i11059.trc ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], [] 2016-01-04 16:19:11.323857 : Abort recovery for domain 0, flags = 0x4 2016-01-04 16:19:11.323886 : kjb_abort_recovery: domain flags=0x0, valid=0 ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], [] 2016-01-04 16:19:11.324075 : Abort recovery for domain 0, flags = 0x4 2016-01-04 16:19:11.324083 : aborting recovery of 0 (0) with cluster inc 2 (0) recovery 1 2016-01-04 16:19:11.324090 : kjb_abort_recovery: domain flags=0x0, valid=0 ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], [] [oracle@racnode1 trace]$
WARNING! Crash recovery of thread 1 seq 315 is ending at redo block 70505 but should not have ended before redo block 70506
根据报错信息和trace文件可以推断:
该问题应该是由于服务器异常锻炼,导致LGWR写online redo log时失败,下次重新启动数据库时需要做实例恢复,实例恢复需要的所有信息无法在redo log中得到,因为上次断电导致redo log写失败了。
ora-600里几个参数[kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], []意思应该如下:
实例1恢复需要用到日志序列号log sequence为315的日志,需要用到的日志快为70506,但是只能读到70505,70505以后的日志块写失败了,无法读到,所以数据库就起不来了。
查看实例1当前日志,确实是315。
SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- 1 1 315 52428800 512 1 NO CURRENT 8074391 26-DEC-15 2.8147E+14 2 1 314 52428800 512 1 NO INACTIVE 8051838 26-DEC-15 8074391 26-DEC-15 3 2 43 52428800 512 1 NO CURRENT 4947005 16-SEP-15 2.8147E+14 4 2 42 52428800 512 1 NO INACTIVE 4251425 12-AUG-15 4947005 16-SEP-15
4. 参照MetaLink文档ID 1299564.1的方案来恢复数据库:
recover database until cancel using backup controlfile;
alter database open resetlogs;
参考文章:
eygle的博客 http://www.eygle.com/archives/2010/05/kcratr_nab_less_than_odr.html