Applying a PSU patch in a large, complex cluster environment? Think of these basics

After successfully applying the latest PSU (7) patch in two non-production and one production 11gR2 cluster environments, it's now time for a tougher challenge: applying the patch in our most business-critical and most complex cluster, a 20-node environment.

From our past patching experience in different cluster environments, we noted that patching an individual home (GI or RDBMS) took about 45 minutes on average, which came to close to 2 hours per node overall. By that measure, patching a 20-node cluster is going to take roughly 40 hours of non-stop work.
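The arithmetic behind that estimate fits in a few lines of Python. The constants are the averages quoted above. Splitting each node's ~2 hours into 90 minutes of actual patching plus about 30 minutes of overhead is an assumption for illustration, not a separately measured figure.

```python
# Back-of-the-envelope estimate for a serial, rolling PSU apply.
# Inputs are the averages quoted in the post. Treating the difference
# between 2 x 45 min and the observed ~2 h per node as per-node overhead
# (stack stop/start, checks) is an assumption.

MINUTES_PER_HOME = 45        # average time to patch one home
HOMES_PER_NODE = 2           # one GI home + one RDBMS home
OBSERVED_NODE_MINUTES = 120  # "close to 2 hours" per node, end to end
NODES = 20


def node_patch_minutes(per_home: int = MINUTES_PER_HOME, homes: int = HOMES_PER_NODE) -> int:
    """Time spent applying the patch itself on one node."""
    return per_home * homes


def cluster_hours(nodes: int = NODES, minutes_per_node: int = OBSERVED_NODE_MINUTES) -> float:
    """Rolling patch done one node after another, with no gaps."""
    return nodes * minutes_per_node / 60


if __name__ == "__main__":
    patch = node_patch_minutes()
    print(f"patching per node: {patch} min")
    print(f"overhead per node: {OBSERVED_NODE_MINUTES - patch} min")
    print(f"whole cluster:     {cluster_hours():.0f} h")
```

Changing `NODES` or `OBSERVED_NODE_MINUTES` gives a quick estimate for other clusters. The model assumes strictly serial patching with no breaks, so real elapsed time will be longer.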

That said, this time around we have different business requirements: patching the nodes in random order and splitting the work across two consecutive weekends. Although the patch is going to be applied in a rolling fashion, which lets us avoid service interruption, I raised the following concerns with Oracle Support:

  • Is it mandatory to patch the nodes of a cluster in sequence, or can we patch them in random order? (We once had a terrible time applying a patch in random order and did not want to take any risk this time around.)
  • Is there any way to shorten the patch deployment, for example by applying the patch in parallel, i.e., patching multiple nodes at the same time?
  • Can we split the activity across two weeks?
Here is what Oracle Support has to say:
  •  The sequence does not matter technically. Any node can be patched as per the workload/availability conditions.
  • ***BUT*** it is not supported to keep some of the nodes unpatched, since this can bring instability, either as direct incompatibility between some components or as differing performance due to different versions of the components. So, in short, please **do not** keep the patching incomplete for more than 24 hours.
  • Please do not attempt to patch the nodes in parallel: since the OCR is updated separately for each individual node, you would be introducing a huge risk of "soft corruption" in the OCR through unintended overlapping of the OCR updates.
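Support's answers boil down to two constraints a patch plan has to satisfy: no two nodes are patched at the same time, and the cluster is not left partially patched for more than 24 hours. The sketch below checks a schedule against both. `NodeSlot` and `check_plan` are hypothetical names for illustration, not part of any Oracle tool, and measuring the 24-hour window from the start of the first node to the end of the last is an assumed reading of the guidance.

```python
from dataclasses import dataclass

# Oracle Support's guidance: do not leave the cluster partially patched
# for more than 24 hours.
MAX_INCOMPLETE_HOURS = 24


@dataclass
class NodeSlot:
    """Hypothetical planning record: one node's patch window."""
    node: str
    start_h: float  # hours from the start of the change window
    end_h: float


def check_plan(plan: list[NodeSlot]) -> list[str]:
    """Return rule violations; an empty list means the plan passes."""
    if not plan:
        return []
    problems = []
    ordered = sorted(plan, key=lambda s: s.start_h)

    # Rule 1: serial only. Track the slot that ends latest so far, so an
    # overlap with any earlier slot (not just the adjacent one) is caught.
    latest = ordered[0]
    for cur in ordered[1:]:
        if cur.start_h < latest.end_h:
            problems.append(f"{cur.node} starts while {latest.node} is still being patched")
        if cur.end_h > latest.end_h:
            latest = cur

    # Rule 2: first start to last end must fit within the limit
    # (assumed reading of "incomplete for more than 24 hours").
    span = max(s.end_h for s in plan) - ordered[0].start_h
    if span > MAX_INCOMPLETE_HOURS:
        problems.append(
            f"cluster partially patched for {span:.0f} h (limit {MAX_INCOMPLETE_HOURS} h)"
        )
    return problems


if __name__ == "__main__":
    # 20 nodes at ~2 h each, back to back (figures from the post).
    serial = [NodeSlot(f"node{i + 1}", 2 * i, 2 * i + 2) for i in range(20)]
    for problem in check_plan(serial):
        print(problem)
```

Fed with the post's own figures (20 nodes at about 2 hours each, back to back), the serial plan passes the no-parallel rule but fails the 24-hour one: under these assumptions only about 12 nodes fit in that window, and a two-weekend split would stretch the partially patched period to roughly a week.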
I am not quite convinced by the last point. With 11gR2, Oracle Clusterware maintains a local OCR copy on every individual node in the cluster. Why not make use of that and let us patch in parallel to speed up the process in a very large and complex environment?

I will be eagerly looking forward to parallel patching improvements in future Oracle releases.

Happy reading