After upgrading our 8-node Oracle 10g cluster, with nearly 60 databases on HP-UX, to 11g R2, we observed a significant increase in server resource consumption, in particular CPU and memory. Due to the heavy resource consumption (or an occasional lack of sufficient resources), a couple of nodes, each running nearly 10 databases, started getting evicted more frequently. A couple of days back, one of the nodes went down for the same reason, and on the subsequent node startup the databases on that node refused to start automatically, while memory and virtual memory consumption shot up in no time. We identified the symptoms of the behavior and repaired the issue after spending nearly half a day (it was a very silly mistake; I will blog about it later on). However, one particular RAC database could not start because its diskgroup was not mounted. When we tried to mount the diskgroup in question manually, we came across the following ORA errors:
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15202: cannot create additional ASM internal change segment
ORA-15041: diskgroup "DG_XXXX" space exhausted
ORA-15032: not all alterations performed
ORA-15202: cannot create additional ASM internal change segment
ORA-15041: diskgroup "DG_XXXX" space exhausted
It was a bit of a surprise to receive the ORA-15041 error while mounting the ASM diskgroup, as I was under the impression that the error would only appear when the diskgroup is already mounted and runs out of space to cope with the database's space requirements. After all our R&D and after trying every option within our capacity, we logged an SR with Oracle Support, and the issue remained open for 4 days... yes, the database was down for 4 days; luckily it was a development database, and it was a weekend too.
The following was recorded in the ASM alert.log:
Thu May 19 16:04:47 2011
GMON dismounting group 57 at 143 for pid 32, osid 23046
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup DG_XXXX was not mounted
ORA-15032: not all alterations performed
ORA-15202: cannot create additional ASM internal change segment
ORA-15041: diskgroup "DG_XXXX" space exhausted
ERROR: alter diskgroup DG_XXXX mount
Thu May 19 16:05:10 2011
SQL> alter diskgroup DG_XXXX mount
NOTE: cache registered group DG_XXXX number=57 incarn=0xdc516e6a
NOTE: cache began mount (not first) of group DG_XXXX number=57 incarn=0xdc516e6a
NOTE: Assigning number (57,0) to disk (/dev/rdsk/oracle/data/ln1/xxxxxxx)
Thu May 19 16:05:14 2011
GMON querying group 57 at 145 for pid 31, osid 23739
NOTE: cache opening disk 0 of grp 57: DG_XXXX_0000 path:/dev/rdsk/oracle/data/ln1/xxxxxx
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (not first) external redundancy group 57/0xDC516E6A (DG_XXXX)
Thu May 19 16:05:14 2011
kjbdomatt send to inst 1
kjbdomatt send to inst 3
kjbdomatt send to inst 4
kjbdomatt send to inst 5
kjbdomatt send to inst 6
Thu May 19 16:05:14 2011
NOTE: attached to recovery domain 57
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu May 19 16:05:14 2011
NOTE: LGWR attempting to mount thread 4 for diskgroup 57 (DG_XXXX)
NOTE: ACD expansion required for disk group 57
Thu May 19 16:05:33 2011
WARNING: unable to grow ACD, probably out of space
ERROR: ORA-15041 signalled during mount of diskgroup DG_XXXX
NOTE: cache dismounting (clean) group 57/0xDC516E6A (DG_XXXX)
NOTE: lgwr not being msg'd to dismount
The above messages indicate that ASM is trying to update or recover some metadata and that no space is left in the diskgroup.
Nearly four different Oracle engineers handled the issue, and none of them offered a workaround or a solution. Upon escalating the issue further, the support engineer asked us to use the AMDU utility and to execute the following command:
./amdu -diskstring 'ASM disks location' -dump 'diskgroup_name'
When the above command was executed, the following appeared on the screen:
AMDU-00204: Disk N0109 is in currently mounted diskgroup DG_XXXXX
AMDU-00201: Disk N0109: 'ASM disks location'
The first message raised my eyebrows, and I started to wonder where on earth the diskgroup was mounted, as I had tried mounting it from multiple instances and ended up with the error every time. I then went through the rest of the 7 ASM instances to check where the diskgroup was mounted, and found it mounted on the first node. I was a bit disappointed and sort of cursed myself for not checking all the ASM instances earlier. After adding a new ASM disk to the diskgroup from the node 1 ASM instance, I was able to mount the diskgroup successfully on the other nodes.
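In hindsight, a quick query against the GV$ views would have revealed where the diskgroup was mounted without checking each node one by one. A minimal sketch of both steps (the new disk path below is a placeholder, not the actual device we added):

```sql
-- Run from any ASM instance: shows the mount state of the
-- diskgroup on every ASM instance in the cluster.
SELECT inst_id, name, state
FROM   gv$asm_diskgroup
WHERE  name = 'DG_XXXX';

-- From the instance where STATE shows MOUNTED, add a disk so
-- ASM has the space it needs (e.g. to grow the ACD on mount).
ALTER DISKGROUP DG_XXXX ADD DISK '/dev/rdsk/oracle/data/ln1/new_disk';
```

Once the diskgroup has free space again, the mount succeeds on the remaining instances.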
The good thing about this investigation was coming across the AMDU utility, which ships with the Oracle 11g Grid Infrastructure software. I have also learned from MOS note 553639.1 that the utility can be configured on Oracle 10g as well (read the note for further instructions).
A directory with a couple of files will be created in the current location after executing the command. You can read the report.txt file; however, you can't directly interpret the other informative files, i.e. the .map files, as map files are ASCII files that describe the data in the image files for a particular disk group.
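For reference, a typical AMDU session might look like the following (the disk string is a placeholder matching the device path from the alert.log above; amdu names its dump directory with a timestamp):

```shell
# Dump all available metadata for the diskgroup; amdu creates a
# directory named amdu_<timestamp> under the current directory.
cd /tmp
$ORACLE_HOME/bin/amdu -diskstring '/dev/rdsk/oracle/data/ln1/*' -dump 'DG_XXXX'

# report.txt is the human-readable summary; the .map files
# describe the data laid out in the accompanying image files.
ls amdu_*
cat amdu_*/report.txt
```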
Here is the extract of the note:
On Oracle 10g the content of a diskgroup can be reviewed if the diskgroup is mounted, as there are internal views that display the specific allocation of the files (ASM files and database files) within the disks.
When the diskgroup is not mounted, this information is not available, which makes it difficult to diagnose the errors preventing the diskgroup from being mounted. This problem has been resolved with AMDU.
AMDU is a tool introduced in 11g that makes it possible to extract all the available metadata from one or more ASM disks, generate formatted block printouts from the dump output, and extract one or more files from a diskgroup (mounted or unmounted) and write them to the OS file system.
This tool is very important when dealing with internal errors related to the ASM metadata. Although this tool was released with 11g, it can be used with ASM 10g.
You can get help on the utility with amdu -help.
References:
https://twiki.cern.ch/twiki/bin/view/PDBService/ASM_utilities
Placeholder for AMDU binaries and using with ASM 10g [ID 553639.1]
Happy reading,
Jaffar