A brand new Exadata X5-2L eighth rack (I know the latest is X7 now, but this was for a POC, so no worries) was recently deployed at a customer site for an Oracle EBS-to-Exadata migration proof of concept. It wasn't the easy walk in the park I initially presumed: a few network and configuration challenges were thrown at us during the migration, but they were happily overcome, and the deployment and the EBS database migration were completed.
So, here is yet another Exadata bare metal deployment story, explaining the challenges I faced and how they were fixed.
Issue 1) DB network cable issues:
After successful execution of elasticConfig, all the Exadata factory IP addresses were set to the client IPs. Although the management network was reachable from outside, the client network was not. When we checked with the network team about enabling the ports on the corporate switch, they confirmed the ports were enabled but said the links were showing as not active, and asked us to investigate the network cables connected to the DB nodes. When we inspected the cable ports, no link lights were flashing, and after an extensive investigation (switch ports, SFPs on the Exadata and the corporate switch, cable status) we found that the cables had not been seated with the correct pin orientation. We also found that the network bonding slave interfaces (eth4 and eth5) were down, as confirmed by the ethtool eth4 output below. After re-seating the cables and bringing the interfaces up (ifup eth4 and ifup eth5, sketched after the output), the cables were connected properly and the port lights came on.
$ ethtool eth4    (shows the interface was not connected)
Settings for eth4:
        Supported ports: [ TP ]
        Supported link modes:   100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: no
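For reference, this is roughly the sequence used to bring the bond slave interfaces back up and re-verify the link after the cables were re-seated. It is a sketch, not a full procedure, and the interface and bond names (eth4, eth5, bondeth0) follow this particular configuration and may differ on other racks:

# ip link show eth4 eth5                   -- check the current state of the bond slaves
# ifup eth4
# ifup eth5
# ethtool eth4 | grep "Link detected"      -- should now report "Link detected: yes"
# cat /proc/net/bonding/bondeth0           -- both slaves should show "MII Status: up"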
Issue 2) Wrong Netmask selection for client network:
After fixing the cable issues, we continued with the onecommand execution. The validation step failed because of a mismatched netmask for the client network (under the cluster information section). The customer had unfortunately made a mistake in the client network netmask selection for the cluster settings, so the netmask values for the client network and the cluster did not match. This was fixed by correcting the netmask value in the ifcfg-bondeth0 file (under /etc/sysconfig/network-scripts) and restarting the network service, as sketched below.
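As an illustration only (the address values are placeholders, not the customer's actual subnet), the fix boiled down to correcting the NETMASK line in the bond configuration file and bouncing the network service:

# vi /etc/sysconfig/network-scripts/ifcfg-bondeth0
      NETMASK=255.255.255.0                <-- corrected to match the netmask defined for the cluster in OEDA
# service network restart
# ifconfig bondeth0 | grep -i mask         -- confirm the interface picked up the corrected netmask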
Issue 3) Failed eighth rack configuration (rack type and disk size):
Since the system had been delivered around the end of August 2015, no one knew exactly which disk size and rack model had shipped; the BOQ (bill of quantities) for the order only showed X5-2 HC storage. As a result, the wrong rack type and disk size were selected in OEDA: 8TB disks instead of 4TB, and a fixed eighth rack instead of an elastic configuration. This was fixed by rerunning OEDA with the correct options (a quick way to confirm what actually shipped is sketched below).
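If you are ever unsure what hardware actually shipped, the server model and disk size can be confirmed directly on the nodes before filling in OEDA. A hedged sketch, run as root on a DB node and on a storage cell respectively:

# dmidecode -s system-product-name                                      -- reports the server model
# cellcli -e list cell attributes name,makeModel                        -- on a cell: the model string indicates HC/EF and capacity
# cellcli -e list physicaldisk attributes name,diskType,physicalSize    -- shows disk type and raw size per disk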
Issue 4) Cell server IP issues:
There was another obstacle while running the cell connectivity step (part of onecommand): the cell server IPs had not been modified by elasticConfig. Fortunately, I found my friend's blog post on exactly this topic and quickly fixed the issue (an outline of the check follows the link below). This is why I like to blog about technical issues; who knows, it could ease someone else's pain.
http://blog.umairmansoob.com/exadata-deployment-error-cell-02559/
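The linked post covers the actual fix in detail; in outline (a sketch, not a substitute for the post), the mismatch can be spotted and corrected on each affected cell roughly as follows:

# cellcli -e list cell detail | grep -i ipaddress    -- shows the IP the cell is actually configured with
# /opt/oracle.cellos/ipconf                          -- interactive utility to update the cell network settings
                                                        (a cell reboot is typically required afterwards)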
Issue 5) SCAN Listener configuration:
Cluster validation failed due to inconsistent values for the SCAN name. While investigating the earlier issues, the private, public and SCAN IPs had all been put into the /etc/hosts file, and this caused the failure while configuring LISTENER_SCAN2 and LISTENER_SCAN3. In hindsight it was fairly understandable: with three SCAN entries sitting in /etc/hosts instead of resolving through DNS, the SCAN name could not be validated consistently. A quick Google search led me to the following blog post, which helped fix the issue (a sketch of the verification steps follows the link):
https://learnwithme11g.wordpress.com/2010/09/03/how-to-add-scan-listener-in-11gr2-2/
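To illustrate the checks (the SCAN name exa-scan.example.com is hypothetical), the SCAN should resolve to its three addresses through DNS rather than /etc/hosts, and Clusterware can then be told to refresh its SCAN definition, roughly along the lines of the post above:

$ nslookup exa-scan.example.com                 -- should return all three SCAN IPs, round-robin
$ srvctl config scan                            -- SCAN name and SCAN VIPs known to the cluster
$ srvctl config scan_listener                   -- should list LISTENER_SCAN1, _SCAN2 and _SCAN3
# srvctl modify scan -n exa-scan.example.com    -- as root: refresh the SCAN VIPs once DNS resolution is correct
$ srvctl modify scan_listener -u                -- as the grid owner: update the SCAN listeners to match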
Finally, I managed to deploy the Exadata successfully and complete the Oracle EBS database migration. No doubt this experience really strengthened my networking and other skills; every challenge comes with an opportunity to learn.
I thank those individuals who write blogs and share their experience to help the Oracle community.
There is still one open issue yet to be resolved: slow sqlplus and database startup. I presume this is due to heavy resource utilization on the server, but the mystery is yet to be solved. Stay tuned for more updates.
Update: 06-Nov-2017
The sqlplus slowness was resolved by increasing HugePages from 26GB to 96GB.
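For completeness, a hedged sketch of what the HugePages change looks like. With the default 2MB huge page size, 96GB corresponds to 96 * 1024 / 2 = 49152 pages; the correct value for any given system should be derived from the actual SGA sizes (Oracle's hugepages_settings.sh script from MOS note 401749.1 does this), so treat the number below as illustrative:

# grep -i hugepages /proc/meminfo          -- current HugePages_Total / HugePages_Free
# vi /etc/sysctl.conf
      vm.nr_hugepages = 49152              <-- illustrative value for 96GB of 2MB pages
# sysctl -p                                -- apply; a reboot may be needed if contiguous memory is fragmented
# grep -i hugepages /proc/meminfo          -- confirm HugePages_Total reflects the new setting

The database instances then need to be restarted so the SGA is allocated from the huge page pool.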