1.27.2010

Book Review: Oracle 11g Streams Implementer's Guide

Very soon I will be receiving the 'Oracle 11g Streams Implementer's Guide' ebook from Packt Publishing, and I will spend some time reading it so that I can write a review on my blog. In the meantime, if you want to learn more about the book, you can visit here.

Happy reading,

Jaffar

1.26.2010

Advt: Oracle 11g R1/R2 Real Application Clusters Handbook


Well, it's my turn to do some shameless marketing for our upcoming 'Oracle 11g R1/R2 Real Application Clusters' handbook. The book will be hitting the market sometime in April 2010, and you can get more details on it here.
Almost a year back, the idea of writing a book on RAC came to my mind. I started looking for fellow co-authors and a publishing house, and finally ended up writing this book for Packt Publishing along with my co-authors, Ben Prusinski and Guenadi Jilevski.
One thing I must admit is that writing a book requires a lot of research, patience, time and knowledge, and if you have a full-time job while writing, it is an uphill task to balance the work and keep up with the chapter deadlines. Working for an organization here in Saudi Arabia with one of the largest RAC setups helped me tremendously to achieve this feat.
We tried our level best to cover the most useful topics for the DBA who wants to administer and maintain a complex RAC environment. Since Oracle 11gR2 was released in the meantime, we had to revise a lot of the content to cover the most useful 11gR2 features in the book.

I will keep you updated about the book on my blog and will share its table of contents as well.

Happy reading,

Jaffar

1.24.2010

Oracle bugs - Part 3 (GETTING "CONNECTION NOT ACTIVE" IN EVMD.LOG)

I am back again with Part 3 of the Oracle bugs series. Although this bug won't bug you much, the frequent appearance of the following messages in the evmd.log (under the cluster home) may raise doubts in your mind about your network stability:

2009-11-29 15:34:04.821: [ COMMCRS][1]clscsendx: (6000000000397610) Physical connection (6000000000b09c40) not active
2009-11-29 18:17:41.748: [ COMMCRS][1]clscreceive: (6000000000397910) Connection not active
2009-11-29 18:17:41.748: [ COMMCRS][1]clscreceive: (6000000000397910) Connection not active
2010-01-23 06:51:22.669: [ COMMCRS][1]clscsendx: (6000000000398150) Physical connection (60000000009ff860) not active
2010-01-23 09:43:06.210: [ COMMCRS][1]clscsendx: (6000000000398150) Physical connection (60000000009ff860) not active
2010-01-23 12:52:23.196: [ COMMCRS][1]clscsendx: (60000000003982d0) Physical connection (60000000009ff860) not active

These warnings are flagged CLSC_TRACE_ALWAYS, as indicated in Bug 3251584.
They concern connections between the evmd daemon and other processes that are disconnected from evmd's point of view: evmd detects that the remote connections are no longer active, but the remote side has not closed the link in a proper way, hence the warning message. As Bug 3251584 also indicates, these messages may not be removed.

These evmd warning messages can be safely ignored, as they are informative only.
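
If you want to convince yourself that these are the only COMMCRS complaints in the log (and not a symptom of a real interconnect problem), a quick check along the following lines can help. This is just a sketch, and it assumes ORA_CRS_HOME points to your clusterware home; the exact location of evmd.log varies between releases, so locate it first:

cd $ORA_CRS_HOME                                  # clusterware (CRS) home
EVMLOG=$(find . -name "evmd.log" | head -1)       # locate the evmd daemon log
grep -c "not active" "$EVMLOG"                    # count of the harmless COMMCRS warnings
grep -v "not active" "$EVMLOG" | tail -20         # review anything else evmd has logged recently

If the last command shows only routine daemon chatter, the log is just carrying the noise described above.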

Happy reading,

Jaffar

1.11.2010

Oracle Bugs - Part 2 (ASM instance on RAC crashed due to ORA-600 [kfclUpdateBuffers20])

Here comes Part 2 of my encounters with various Oracle bugs in the recent past.

The ASM instance on one of the nodes of an 8-node RAC environment crashed with an 'ORA-00600: internal error code, arguments: [kfclUpdateBuffers20]' error, and all the active database instances on that node crashed too with a 'communication failure with ASM instance' error.

The cause is an intermittent buffer conversion problem in ASM internal structures: the LMS background process terminated the ASM instance with the ORA-600 [kfclUpdateBuffers20] error.


The following bugs have been registered against this behavior:
Bug 6489250 INTERMITTENT ORA-600 [KFCLUPDATEBUFFERS20] WITH ASM
Bug 6605106 AN ASM NULL CONVERTED BUFFER MAY BE REFERENCED
 

Details:
ORA-600 [kfclUpdateBuffers20] can occur in ASM as a null converted
buffer may be accessed. Other errors could also occur.


To fix this bug, either apply the one-off patch that suits your environment and version, apply the 10.2.0.5 patch set, or upgrade to 11g.
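
If you are not sure whether your home already carries a fix, the OPatch inventory can tell you. A rough check, assuming OPatch sits in its usual place under the Oracle/ASM home:

cd $ORACLE_HOME/OPatch                           # OPatch shipped with the home
./opatch lsinventory -detail > /tmp/lsinv.txt    # full inventory, including bugs fixed by interim patches
grep -E "6489250|6605106" /tmp/lsinv.txt         # a hit means a one-off fix is already in place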

Happy Oracle bugs reading,

Jaffar

1.04.2010

Oracle bugs - Part 1 (an Instance hung on 'WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK')

For the last couple of months, I have been dealing with various Oracle bugs on our 8-node RAC running on HP-UX with Oracle 10.2.0.4. That has kept my life really busy and given me enough nightmares. I thought of sharing my experience of dealing with these bugs in a three-part series. In the first part, I am going to discuss a scenario where a RAC database with two instances hung, and how we handled the situation.

A RAC database with two instances was hung, with the 'WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!' error message in the alert.log file. Initially, I was neither able to connect to the database via sqlplus nor able to shut it down using the SRVCTL STOP DATABASE command. I had to connect to one of the instances using the 'sqlplus -prelim' option and, before aborting the instance, I collected hanganalyze (level 3) and systemstate (level 266) dumps using the oradebug utility (a sketch of that sequence follows the trace excerpt below). Investigating the hanganalyze trace file showed the following contention:

63CA629C:BE596F7A 300 215 10005 2 KSL WAIT END [enq: SQ - contention] 1397817350/0x53510006 144/0x90 0/0x0 time=2929720
63CA62C9:BE596F7B 300 215 10005 1 KSL WAIT BEG [enq: SQ - contention] 1397817350/0x53510006 144/0x90 0/0x0
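
As mentioned above, the dumps were collected with the standard 'sqlplus -prelim' plus oradebug sequence; the lines below are a sketch of that sequence, not a copy of my exact session:

# run as the Oracle software owner on the node with the hung instance,
# with ORACLE_SID pointing to that instance
sqlplus -prelim "/ as sysdba" <<'EOF'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
oradebug dump systemstate 266
oradebug tracefile_name
exit
EOF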

 
Further analysis shows a deadlock between two processes which were, directly or indirectly, blocking all other sessions in the database. Digging further into the trace file reveals the real problem, as you can see in the following lines:

Deadlock between process 33 (J000) running DBMS_STATS_SCH and process 45.
Process 33 is waiting for 'cursor: pin S wait on X' mutex idn=57bc44f9 in shared mode while holding library cache lock handle=c00000010e9123c0 in exclusive mode

Mutex c0000001074df688(490, 0) idn 57bc44f9 oper GET_SHRD --> waiting
LIBRARY OBJECT LOCK: lock=c000000101141390 handle=c00000010e9123c0 mode=X

Process 45 is waiting for 'library cache lock' on SYS.C_COBJ# handle address=c00000010e9123c0 in shared mode while holding mutex idn 57bc44f9 in mode EXCL.

Mutex c0000001074df688(490, 0) idn 57bc44f9 oper EXCL
LIBRARY OBJECT LOCK: lock=c000000108b0b208 handle=c00000010e9123c0 request=S

As usual, Oracle Support informed me that it is bug 6011045 and that the behavior is due to the following reasons:

Abstract:            DBMS_STATS causes deadlock between 'cursor: pin S wait on X' and 'library cache lock'
Fixed-Releases:     B1
Tags:               DBMSPKG DEADLOCK OPS
Details:          
Gathering statistics (with DBMS_STATS) on bootstrap objects (like ind$) can cause a deadlock between sessions waiting on 'cursor: pin S wait on X' and 'library cache lock' in RAC. In some instances, sessions waiting on 'DFS lock handle' and 'row cache lock' are also part of the deadlock path.

Yes, we had a job scheduled daily at midnight to collect the database statistics (dbms_stats.gather_database_stats). Though it had been running smoothly for more than a year, on an unlucky day you are going to face this sort of trouble. My initial impression was that this bug was fixed in 10.2.0.4. However, Oracle Support informed me that the bug is not fixed in 10.2.0.4, and that I either need to upgrade to 11g or request a one-off patch for my platform.
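
For context, the job itself was nothing exotic; a stripped-down sketch of that kind of nightly statistics job is shown below (the job name and schedule are illustrative, not our actual definition):

sqlplus -s "/ as sysdba" <<'EOF'
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'NIGHTLY_DB_STATS',                              -- illustrative name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.GATHER_DATABASE_STATS; END;',  -- the call that hit bug 6011045
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=0',                           -- run daily at midnight
    enabled         => TRUE);
END;
/
EOF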

In Part 2, I am going to discuss a situation where one of the RAC database instances crashed due to an ORA-07445 error.

Happy reading,

Jaffar