Jaffar's (Mr RAC) Oracle blog: May 2013

5.31.2013

Introducing Java EE 7 - Live Webcast

Wednesday, June 12, 2013 / Thursday, June 13, 2013

Two opportunities to come together with the Java community, chat with experts, and explore Java EE 7:

9 a.m. PT / 12 p.m. ET / 5 p.m. London or

9 p.m. PT / 12 a.m. ET (Thursday) / 2 p.m. Sydney (Thursday)

The introduction of Java EE 7 is a free online event where you can connect with Java users from all over the world as you learn about the power and capabilities of Java EE 7. Join us for presentations from Oracle technical leaders and Java users from both large and small enterprises, deep dives into the new JSRs, and scheduled chats with Java experts.

Java EE 7 updates (session recording and PDF)
https://java.net/projects/jugs/downloads/download/JavaEE_Update_ArunGupta_May30.mp3

https://glassfish.java.net/javaee7/techkit/JavaEE7-1hour.pdf

5.28.2013

Expert Oracle RAC 12c - upcoming book

Here is the TOC for the upcoming Expert Oracle RAC 12c book, published by Apress, which is slated for release sometime in August 2013 (subject, of course, to the Oracle 12c announcement).

Overview of Oracle RAC
Clusterware Management and Troubleshooting
RAC Operational Practices
RAC New Features
Storage and ASM Practices
Application Design Issues
Managing and Optimizing a Complex RAC Environment
Backup and Recovery in RAC
Network Practices in RAC
RAC Database Optimization
Locks and Deadlocks
Parallel Query in RAC
Clusterware and Database Upgrades
Oracle RAC One Node
Virtualized RAC - Setup DB Clouds - Part 1
Virtualized RAC - Setup DB Clouds - Part 2

You might get about 29% off on a pre-order copy at Amazon.

http://www.amazon.com/Expert-Oracle-Syed-Jaffar-Hussain/dp/1430250445

5.18.2013

A tricky standby database situation

A very tricky and interesting situation came up this morning while configuring a standby database of more than 1.5 TB. While the database was being cloned to the DR site with the DUPLICATE..ACTIVE DATABASE command, which took more than a day and a half, a couple of new datafiles were added to the primary database. When the cloning finished, the newly built DR database was almost 2 days behind the primary. I knew I could bring the standby in sync with the primary using the standby roll-forward method, but I already had daily cumulative incremental backups on tape. Taking a fresh incremental backup for the roll-forward would have taken a long time, so I decided to use the existing backups. When the roll-forward method was followed, the following error came up:

RMAN> SWITCH DATABASE TO COPY;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of switch to copy command at 05/18/2013 10:25:29
RMAN-06571: datafile 58 does not have recoverable copy

This was expected, because the datafile in question was added to the primary after the standby database creation had started.

Workaround:
I had to try an out-of-the-box variation of the roll-forward method:

Re-create and restore the standby controlfile
Restore missing datafiles on the standby
Catalog standby database datafiles (diskgroup was different from primary)
Recover the database
Complete the rest of the standby configuration to bring it in sync
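
The steps above might be scripted roughly as follows. This is only a sketch under stated assumptions, not the exact procedure used here: the function name gen_rollforward_rman, the controlfile backup path and the +DATA_DR disk group are hypothetical placeholders, and datafile 58 is simply the one named in the RMAN-06571 message. The function only prints an RMAN command file so that it can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch: print an RMAN command file for the roll-forward workaround.
# Nothing is executed against a database; review the output before use.
# Usage: gen_rollforward_rman <ctl_backup_piece> <standby_disk_group> [file#...]
gen_rollforward_rman() {
    ctl_backup="$1"   # hypothetical: backup piece holding a standby controlfile
    stby_dg="$2"      # hypothetical: disk group used on the standby site
    shift 2           # remaining arguments: datafiles missing on the standby

    # 1. Restore the standby controlfile created on the primary.
    echo "SHUTDOWN IMMEDIATE;"
    echo "STARTUP NOMOUNT;"
    echo "RESTORE STANDBY CONTROLFILE FROM '${ctl_backup}';"
    echo "ALTER DATABASE MOUNT;"

    # 2. Restore each missing datafile into the standby disk group.
    if [ "$#" -gt 0 ]; then
        echo "RUN {"
        for f in "$@"; do
            echo "  SET NEWNAME FOR DATAFILE ${f} TO '${stby_dg}';"
            echo "  RESTORE DATAFILE ${f};"
        done
        echo "}"
    fi

    # 3. Catalog the standby datafiles and point the controlfile at them.
    echo "CATALOG START WITH '${stby_dg}' NOPROMPT;"
    echo "SWITCH DATABASE TO COPY;"

    # 4. Apply the existing incremental backups; no redo is applied here.
    echo "RECOVER DATABASE NOREDO;"
}
```

For example, gen_rollforward_rman /backup/stby_ctl.bkp +DATA_DR 58 > rollforward.rman writes the commands to a file that can be checked and then passed to RMAN with cmdfile=rollforward.rman. Backups on tape would also need suitable SBT channels, which this sketch leaves out.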

Will be writing a detailed article on this. Stay tuned for more.

Happy reading

Jaffar

5.14.2013

New Page - Data Guard

A quick update about the new page.

I have created a new page (tab), 'Data Guard', on my blog to share and discuss all the Data Guard issues that we confronted during our extensive DR setup and testing. The objective is to record all the errors and issues of the Data Guard setup and how we resolved them. I will also be sharing the DR configuration procedure and the best practices that we used in our environment.

I would appreciate your input. If you are interested in sharing or writing something on the subject, do write to me and I will put it on the page under your name.

Have a nice day,

Jaffar

5.11.2013

It's Data Guard time for the team yet again

Just a very quick update about my upcoming tasks and what I will be doing for the next 3 weeks.

It is indeed going to be a super busy rest of the month for the entire team, as Data Guard needs to be configured for over 41 RAC databases. We will be fully occupied for the next 3 weeks creating standby databases and configuring the DG setup in order to have a fully functional DR environment.

We did a similar exercise a few months ago to test the DR capabilities of the databases and applications, and now it's time to have a permanent DR configuration. Therefore, expect a lot of blogging about DR on my blog in the coming days.

Wish me luck, people.

Jaffar

5.03.2013

Things to be considered before/after the OS patch deployment

The objective of this write-up is to emphasize the importance of considering things like verifying the patch compatibility and relinking the Oracle home after patching the underlying Operating System (OS) in any Oracle environment. I would like to share an incident (a little story) that we encountered a few days ago in one of our non-production RAC environments where the Clusterware stack didn't start after the OS patch deployment.

As part of the organization's patching policy, our HP-UX admin scheduled the latest quarterly HP-UX v11.3x OS patch deployment on all servers, and both a non-RAC and an Oracle RAC environment were patched as part of it. Although the patching went smoothly on both environments, we faced issues starting the cluster stack in the RAC environment. When we checked the cluster stack status, we noticed that the Cluster Synchronization Services daemon (cssd) was stuck in the 'STARTING' state, as shown below:

$ ./crsctl stat res -init -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE      rac1
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE      rac1
ora.crsd
      1        ONLINE  OFFLINE      rac1
ora.cssd
      1        ONLINE  OFFLINE      rac1                     STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1

The Oracle High Availability Services daemon (ohasd) started without any issues; however, crsd could not be started on any of the nodes after the patch deployment. Upon examining the ocssd.log, we found that the cssd process was unable to discover the voting disks, hence crsd could not start. The following messages appeared in the ocssd.log:

CRS-1714:Unable to discover any voting files
2013-04-23 18:47:16.553: [ SKGFD][6]Discovery with str:/dev/rdsk/c0t5d5,/dev/rdsk/c0t5d4:
2013-04-23 18:47:16.553: [ SKGFD][6]UFS discovery with :/dev/rdsk/c0t5d5:
2013-04-23 18:47:16.559: [ SKGFD][6]Fetching UFS disk :/dev/rdsk/c0t5d5:
2013-04-23 18:47:16.559: [ SKGFD][6]OSS discovery with :/dev/rdsk/c0t5d5:
2013-04-23 18:47:16.559: [ SKGFD][6]Discovery advancing to nxt string :/dev/rdsk/c0t5d4:
2013-04-23 18:47:16.559: [ SKGFD][6]UFS discovery with :/dev/rdsk/c0t5d4:
2013-04-23 18:47:16.564: [ SKGFD][6]Fetching UFS disk :/dev/rdsk/c0t5d4:
2013-04-23 18:47:16.564: [ SKGFD][6]OSS discovery with :/dev/rdsk/c0t5d4:
2013-04-23 18:47:16.564: [ CSSD][6]clssnmvDiskVerify: Successful discovery of 0 disks
2013-04-23 18:47:16.564: [ CSSD][6]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2013-04-23 18:47:16.564: [ CSSD][6]clssnmvFindInitialConfigs: No voting files found
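
A check for these messages could be automated along the following lines. This is a minimal sketch: the function names voting_discovery_failed and discovery_devices are hypothetical, and the patterns are simply the messages quoted above, so other releases may word them differently.

```shell
#!/bin/sh
# Sketch: look for the voting-file discovery failure in an ocssd.log.

# Succeeds (exit 0) if the log contains any of the failure messages quoted above.
voting_discovery_failed() {
    grep -Eq 'CRS-1714|Successful discovery of 0 disks|No voting files found' "$1"
}

# Prints each device named on the "Discovery with str:" lines, one per line.
discovery_devices() {
    sed -n 's/.*Discovery with str:\(.*\):$/\1/p' "$1" | tr ',' '\n' | sort -u
}
```

Assuming the log sits in the usual 11gR2 location, $GRID_HOME/log/<hostname>/cssd/ocssd.log, running voting_discovery_failed on it and then passing the output of discovery_devices to ls -lL gives a quick view of the ownership and permissions on the devices that CSSD tried to scan.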

From the messages it was pretty clear that, for some reason, the voting disks (placed on shared storage) were inaccessible to the nodes. When we searched the internet and My Oracle Support (MOS) with the combination of error codes, all the links pointed to verifying the ownership and permissions on the voting disks. We found no issues with the ownership or permissions on the voting disks; we even dumped the disks with the dd command and found no corruption. After an hour of hard struggle, there was a little hope when we came across a MOS note (ID 1508899.1) that described an incident close to ours.

According to the note, the issue is due to bug 14810756, and the workaround is either to apply patch 14810756 or to roll back the OS patch PHCO_43004. Applying the patch was not an option for us, as we were not able to start the cluster. Hence, we asked the OS admin whether PHCO_43004 was part of the patch bundle that had just been deployed on the HP-UX 11.3x platform. The OS admin confirmed that it was. We then requested the OS admin to roll back the patch to try our luck. After the patch was rolled back on one node, the cluster stack started successfully on that node. We did the same on the rest of the nodes and everything came back successfully.
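
The question we put to the OS admin can also be answered from a patch listing. The sketch below is an assumption-laden illustration: the function name find_flagged_patches is hypothetical, and it only filters text on standard input, so it relies on the listing containing each patch ID as a separate word. On HP-UX the listing would typically come from swlist -l patch.

```shell
#!/bin/sh
# Sketch: report which known-problematic patches appear in a patch listing.
# Reads the listing on stdin; the patch IDs to look for are the arguments.
# Prints each ID found; exit status is 0 if at least one is present, else 1.
find_flagged_patches() {
    listing=$(cat)
    status=1
    for id in "$@"; do
        # -w matches the ID only as a whole word, so PHCO_430045 is not a hit.
        if printf '%s\n' "$listing" | grep -qw "$id"; then
            echo "$id"
            status=0
        fi
    done
    return "$status"
}
```

For example, swlist -l patch | find_flagged_patches PHCO_43004 prints PHCO_43004 if the patch is installed. The same check could be run against the contents of a patch bundle before it is deployed.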

The MOS note states that the issue is likely to happen during the execution of the rootupgrade.sh script as part of a cluster upgrade from 11.2.0.2 to 11.2.0.3 on the HP-UX 11.3x platform, when the voting disks are placed on disk/raw devices.

We fail to understand why HP didn't mention this behavior, even though similar issues had been recorded and addressed on the HP forums.

Conclusion:

The motive of this blog entry is to emphasize the importance of verifying the compatibility of a patch before deploying it in any environment.

Also, it is highly advisable to relink the binaries manually right after the OS patch deployment. The following demonstrates how to relink the binaries in an 11gR2 GI RAC environment:

as the root user:

Unlock the CRS (ensure cluster stack is not running on the server)

$GRID_HOME/crs/install/rootcrs.pl -unlock

cd $ORACLE_HOME/rdbms/lib

make -f ins_rdbms.mk rac_on ioracle

References:

How to Check Whether Oracle Binary/Instance is RAC Enabled and Relink Oracle Binary in RAC [ID 284785.1]
hp-ux: 11gR2 GI Fails to Start or rootupgrade.sh Fails with "clsfmt: Received unexpected error 4 from skgfifi for file" if PHCO_43004 is Applied [ID 1508899.1]
