Jaffar's (Mr RAC) Oracle blog: September 2010

9.28.2010

Beware of unethical LinkedIn connect or view pics request

Beware of unethical LinkedIn connect or view-photos requests. They can be a trap that installs spam and malware on your PC, which then nags you with pesky messages. In fact, I fell into this trap myself a few days ago when I received such a request from a friend to connect on the LinkedIn social networking site. After I clicked the connect link, malware was installed and started showing some very bothersome messages on my PC. It really irritated me and took almost 5 hours of my time to get rid of the problem.
Make sure it's a genuine connect request from the LinkedIn site before you go ahead and click the link.

Jaffar

9.26.2010

kghfrunp: latch: nowait paralyzes RAC database instances

When I arrived at the office this morning, I received a complaint from a business user that he was unable to log in to one of our business-critical RAC databases (v10gR2, configured with 2 instances) running on HP Superdome. After the complaint, I was able to establish a connection to the first instance, but not to the second: the connection just hung. Suspecting a blocking scenario, I then tried to run some queries on the first instance to understand the ongoing situation of the database. Alas, the query hung too, and control never returned to the SQL prompt.
The alert.log of the second instance shows only a 'WARNING: inbound connection timed out (ORA-3136)' message. However, the first instance's alert.log file shows the following information:

ORA-12012: error on auto execute of job 42781

ORA-27468: "EXFSYS.RLM$SCHDNEGACTION" is locked by another process

Errors in file /u00/app/oracle/admin/XXXX/bdump/xxxx1_j000_23982.trc:

ORA-12012: error on auto execute of job 42781

ORA-27468: "EXFSYS.RLM$SCHDNEGACTION" is locked by another process

I then had a closer look at the details given in the DIAG trace file on the second node and found some valuable information, as listed below:

waiter count=24

          gotten 8128165 times wait, failed first 518198 sleeps 267045

          gotten 83569 times nowait, failed: 232295

        possible holder pid = 24 ospid=14113

      on wait list for c0000001b8cdd6f0

    Process Group: DEFAULT, pseudo proc: c0000001bc2f4d90

    O/S info: user: oracle, term: UNKNOWN, ospid: 14391

    OSD pid info: Unix process pid: 14391, image: oracle@usogp08 (LCK0)

  waiting for c0000001b8cdd6f0 Child row cache objects level=4 child#=16

        Location from where latch is held: kghfrunp: clatch: nowait:

        Context saved from call: 0

        state=busy, wlstate=free

          waiters [orapid (seconds since: put on list, posted, alive check)]: 

    ----------------------------------------

    SO: c0000001b5387868, type: 16, owner: c0000001bf2a61c0, flag: INIT/-/-/0x00

    (osp req holder)

Enqueue blocker waiting on 'latch: row cache objects'
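
The holder and latch details in an excerpt like the one above can also be pulled out with a small script rather than by eye. The following is only a minimal sketch: the function name summarize_latch_wait and the regular expressions are my own illustration, written against the few trace lines quoted in this post, and the exact trace layout may differ between Oracle versions.

```python
import re

# A few lines copied from the DIAG trace excerpt quoted in this post.
TRACE = """\
waiter count=24
        possible holder pid = 24 ospid=14113
    OSD pid info: Unix process pid: 14391, image: oracle@usogp08 (LCK0)
  waiting for c0000001b8cdd6f0 Child row cache objects level=4 child#=16
        Location from where latch is held: kghfrunp: clatch: nowait:
"""

# Hypothetical patterns, based only on the line shapes seen above.
HOLDER_RE = re.compile(r"possible holder pid\s*=\s*(\d+)\s+ospid\s*=\s*(\d+)")
WAITING_RE = re.compile(r"waiting for (\w+) (.+?) level=(\d+) child#=(\d+)")
LOCATION_RE = re.compile(
    r"Location from where latch is held:\s*(.+?):?\s*$", re.MULTILINE
)


def summarize_latch_wait(trace_text):
    """Return the holder pid/ospid and latch details found in the trace text."""
    summary = {}
    holder = HOLDER_RE.search(trace_text)
    if holder:
        summary["holder_pid"] = int(holder.group(1))
        summary["holder_ospid"] = int(holder.group(2))
    waiting = WAITING_RE.search(trace_text)
    if waiting:
        summary["latch_address"] = waiting.group(1)
        summary["latch_name"] = waiting.group(2)
        summary["child"] = int(waiting.group(4))
    location = LOCATION_RE.search(trace_text)
    if location:
        summary["held_from"] = location.group(1)
    return summary


if __name__ == "__main__":
    for key, value in summarize_latch_wait(TRACE).items():
        print(f"{key}: {value}")
```

Before killing anything, it is worth checking at the OS level (for example with ps -fp followed by the ospid) that the reported ospid really belongs to the process you expect, since the trace only names a possible holder.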

After analyzing the text closely, it was loud and clear that the database was indeed suffering from a blocking problem. Instead of aborting the second instance to release the lock held on the database (which could have been done through the sqlplus -prelim / as sysdba command), I thought it would be better to find the lock holder session and kill its process. I got the details of the holder session from the diag file: 'possible holder pid = 24 ospid=14113'. Using the ospid, I found that the possible holder was a job queue Oracle background process. Without a second thought, I killed the process from the OS using the 'kill -9 processid' command. Once the process had been killed, the database was back to normal functionality. I then took an AWR report of the second instance, and it reveals the following facts about the situation:

Top 5 Timed Events
Event                         Waits  Time (s)  Avg Wait (ms)  % Total Call Time  Wait Class
latch: row cache objects    420,942    64,707            154               40.3  Concurrency
rdbms ipc reply              32,832    62,721          1,910               39.1  Other
cursor: pin S wait on X   1,898,149    25,440             13               15.8  Concurrency
row cache lock              321,355    10,961             34                6.8  Concurrency
CPU time                                3,953                               2.5

SGA breakdown difference
Pool    Name      Begin MB  End MB  % Diff
shared  KQR L PO    214.04   56.62  -73.55

Latch Miss Sources
Latch Name     Where                            NoWait Misses   Sleeps  Waiter Sleeps
library cache  kqlmbfre: child: in loop                     0       66            175
library cache  kqlmbfre: child: no obj to free              0       18            258
shared pool    kghfrunp: clatch: wait                       0  308,002        417,684
shared pool    kghfrunp: alloc: wait                        0  129,691        128,527
shared pool    kghfrunp: clatch: nowait                     0  125,485              0
shared pool    kghfre                                       0    3,043          3,386
shared pool    kghalo                                       0      258            777
shared pool    kgh_next_free                                0       79            322
shared pool    kghalp                                       0       18             48
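
The AWR figures above can be cross-checked with a little arithmetic. This is only a sketch under an assumption: I am reading the Top 5 columns as waits, total wait time in seconds and average wait in milliseconds, and the KQR L PO line as begin MB, end MB and percentage difference, which is the usual 10gR2 AWR layout. The helper names avg_wait_ms and percent_change are my own.

```python
# Figures copied from the AWR excerpt in this post.
# Assumed column order: (waits, total wait time in seconds, average wait in ms).
TOP_EVENTS = {
    "latch: row cache objects": (420_942, 64_707, 154),
    "rdbms ipc reply": (32_832, 62_721, 1_910),
    "cursor: pin S wait on X": (1_898_149, 25_440, 13),
    "row cache lock": (321_355, 10_961, 34),
}

# Assumed meaning: shared pool "KQR L PO" size at begin and end snapshot, in MB.
KQR_L_PO_BEGIN_MB = 214.04
KQR_L_PO_END_MB = 56.62


def avg_wait_ms(waits, time_s):
    """Average wait per event in milliseconds, rounded to a whole number."""
    return round(time_s * 1000 / waits)


def percent_change(begin, end):
    """Relative change from begin to end, in percent, to two decimals."""
    return round((end - begin) / begin * 100, 2)


if __name__ == "__main__":
    for event, (waits, time_s, reported_ms) in TOP_EVENTS.items():
        print(f"{event}: computed {avg_wait_ms(waits, time_s)} ms, "
              f"reported {reported_ms} ms")
    print("KQR L PO change:",
          percent_change(KQR_L_PO_BEGIN_MB, KQR_L_PO_END_MB), "%")
```

The computed averages agree with the reported ones, and the dictionary cache component (KQR L PO) works out to a drop of roughly three quarters between the two snapshots, which fits the picture of the row cache being shrunk while the shared pool was under pressure.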

The MetaLink (ML) note "LCK temporarily stuck waiting for latch 'Child row cache objects'" [ID 843638.1] points towards shared pool stress, and below is its abstract on the cause of the problem:

The shared pool is stressed and memory need to be freed for the new cursors. As a consequence, the dictionary cache is reduced in size by the LCK process causing a temporal hang of the instance since the LCK can't do other activity during that time. Since the dictionary cache is a memory area protected clusterwide in RAC, the LCK is responsible to free it in collaboration with the dictionary cache users (the sessions using cursors referenced in the dictionary cache). This process can be time consuming when the dictionary cache is big.

After going through the facts and figures shown in the AWR report, it was clear that there was indeed pressure on the shared pool's dictionary cache memory component. As suggested in the ML note, I need to pay attention to SQL statements with large text and tune them, or apply a patch, to avoid this embarrassing situation in the future.
Stay tuned, I will post more details in the coming days.

References
BUG:8266531 - LCK STUCK WAITING FOR LATCH 'CHILD ROW CACHE OBJECTS'
BUG:8666117 - LCK0 PROCESS STUCK AT WAITING FOR "LATCH: ROW CACHE OBJECTS"

9.09.2010

Wishing you a happy and prosperous EID

Yet another pleasant holy month of RAMADAN ended today by the grace of the Almighty, and a great day of EID celebration lies ahead! I would like to take this opportunity to wish everyone out there a very happy, hale and prosperous EID. Enjoy your EID with your parents, family, friends, colleagues and relatives.

Jaffar
