Since last month, the RMAN full database backup of our data warehouse database (1.7 TB) has suddenly started failing with the following error:
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-03009: failure of backup command on ch09 channel at 06/01/2007 13:15:15
ORA-19502: write error on file "OFDMP_Dly_BACKUP_DBFFri010607krij6c1h_1_1", blockno 10731521 (blocksize=1024)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
Server Status: Communication with the server has not been iniatated or the server status has not been retrieved from the server.
From the error message it is clear that the problem relates to the media manager layer: the channels on the client side were losing communication with the NetBackup server after some time.
We reported the problem to Veritas (the vendor), and their support staff came on site, ran the backup in trace mode, and took the trace files away for further investigation.
Meanwhile, I also opened a P1 TAR with Oracle Support. They simply said that the problem was more likely related to Veritas and that we should contact Veritas to resolve it.
As a result, we had no backups for more than a month (luckily, we have a DRC in sync with the primary).
Surprisingly, backing up a single datafile, a set of datafiles, or a tablespace worked fine; the problem occurred only when backing up the entire database. After about 1 hour of running, the backup failed with the error message above.
On Veritas's recommendation, we looked at the NET_BUFFER_SZ value on the NetBackup server and on the client side. (The database server is configured as a NetBackup client; since there is no server edition available for HP Superdome, we were using the NetBackup client.) We found that the client and the server had different values in the NET_BUFFER_SZ file.
We changed the value in the client-side file to match the server-side file, and the backup started working fine.
As a general recommendation, the value in the NET_BUFFER_SZ file on the client should be less than or equal to the value on the server side.
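A check like this can be scripted so that a mismatch is noticed before the next backup fails. The following is only a minimal sketch in Python, assuming the NET_BUFFER_SZ file holds nothing but a single integer (the buffer size in bytes) and that a copy of the server-side file has been placed somewhere readable from the client; the function names and the file locations passed to them are illustrative assumptions, not part of NetBackup.

```python
from pathlib import Path


def read_net_buffer_sz(path):
    """Read a NET_BUFFER_SZ-style file and return its value as an int.

    Assumption: the file contains only one integer, optionally followed
    by whitespace or a newline.
    """
    return int(Path(path).read_text().strip())


def client_buffer_ok(client_value, server_value):
    """Return True when the client value is <= the server value,
    which is the general recommendation quoted above."""
    return client_value <= server_value


def check_files(client_path, server_path):
    """Compare two NET_BUFFER_SZ files and return a one-line report."""
    client = read_net_buffer_sz(client_path)
    server = read_net_buffer_sz(server_path)
    status = "OK" if client_buffer_ok(client, server) else "MISMATCH"
    return f"{status}: client={client} server={server}"
```

Calling check_files() with the path of the client-side file and the path of a copy of the server-side file returns a line starting with OK or MISMATCH, which could be logged or mailed from a scheduled job.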
Now the backup is running without any issues. No one has a clue how the value in this file got changed.