7.09.2007

Change behavior of GROUP BY clause in Oracle 10g.

After successful upgrade of one of the busiest 9i database to 10g, we have received a very first
change of behavior from the users.

Users complained that after upgrading the database, some of the queries which were returning results in an order are now not returning the data in any order now.

The queries have GROUP BY clause, and after some research, we found out that
starting with 10g, Oracle uses the new HASH GROUP BY mechanism which will not guarantee the result in any order unless ORDER BY clause is specified.

HASH GROUP BY algorithm is explained in the following URL (looks old stuff)
http://www.ianywhere.com/developer/product_manuals/sqlanywhere/0901/en/html/dbugen9/00000428.htm

In a nutshell, in Oracle 9i, GROUP BY clause gives result in sort order, this behavior has
changed from 10g onwards.

The general and Oracle recommendation is, to add the ORDER BY clause in order to have 100% guaranteed order by result.

Following is the test which I have carried out on 9i and 10g database (UNIX PLATFORMS):

>>on 9i database

select owner,count(1) from dba_segments group by owner;

OWNER COUNT(1)
------------------------------ ----------
BARABC 42
BASABC 10
BIPABC 82
CPABC 23
DMIABC 40
DMI_TRGTABC 247


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 VIEW OF 'DBA_OBJECTS'
3 2 UNION-ALL
4 3 FILTER
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'OBJ$'



>>on 10g database

OWNER COUNT(1)
------------------------------ ----------
SYSTEM 353
CISABCD 46
OEABCD 35
SCABCD 71

-----------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU)
-----------------------------------------------------------------------------------
0 SELECT STATEMENT 18020 299K 81 (7)
1 HASH GROUP BY 18020 299K 81 (7)
2 VIEW DBA_OBJECTS 18020 299K 79 (4)


As you can see, 9i query returns the result in the ascending order, where as 10g, didn't.

If the ORDER BY clause is added on 10g database, the explain plan looks like below, an extra sorting is being performaned:

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (ORDER BY)
2 1 SORT (GROUP BY)
3 2 VIEW OF 'DBA_OBJECTS'

The default behavior of GROUP BY clause in 9i, provides an internal sorting when the GROUP BY clause is used, avoiding an extra sorting, which in my opinion could improve the performance of the query.

I don't know the internal behavior of HASH GROUP BY algorithm, but, this will be useful when
you don't want data in any order after the GROUP BY clause.

After further investigations, I have found out that HASH GROUP BY has few bugs which
leads to an inaccurate result. As per Oracle, the bug it is resolved in 11g version.

Metalink : Bug 4604970 WRONG RESULTS WITH 'HASH GROUP BY 'AGGREGATION ENABLED .

Alex had discussed about this behavior at his blog:
http://www.oracloid.com/2006/05/hash-group-by-can-give-wrong-result-in-oracle-102/

Disable HASH GROUP SORT in 10g:

Oracle recommend setting the _gby_hash_aggregation_enabled parameter = false OR
optimizer_features_enabled=9.2.0 to get rid of HASH GROUP BY mechanism.

alter session set "_gby_hash_aggregation_enabled" = FALSE;

Id Operation Name Rows Bytes Cost (%CPU)
---------------------------------------------------------------------------------
0 SELECT STATEMENT 2181 37077 997 (2)
1 SORT GROUP BY 2181 37077 997 (2)
2 VIEW SYS_DBA_SEGS 2181 37077 996 (1)

Julian Dyke has a brief explanation note and a example at his site:

http://julian.dyke.users.btopenworld.com/com/Optimisation/Operations/HashGroupBy.html

My colleague did a small test on 9.2.0.7 database where he found out an interesting point that the ordering of the result is different when ORDER BY clause is not specified with the GROUP BY.

drop table test;
create table test(a varchar2(30));

insert into test values ('ahmed');
insert into test values ('zubair');
insert into test values ('11');
insert into test values ('-11');
insert into test values ('Zubair');
insert into test values ('test');

commit;


SQL> set autotrace on exp
SQL> select a, count(*) from test group by a;

A COUNT(*)
------------------------------ ----------
-11 1
11 1
Zubair 1
ahmed 1
test 1
zubair 1

6 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 TABLE ACCESS (FULL) OF 'TEST'



SQL> select a, count(*) from test group by a order by a;

A COUNT(*)
------------------------------ ----------
-11 1
11 1
ahmed 1
test 1
Zubair 1
zubair 1

Please note that the Schema names are modified to show the test case.

References:

Refer Metalink Note : 345048.1 - 'Group By' Doesn't not Sort If you don't use order by in 10g.
http://searchoracle.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid41_gci1251893,00.html
Update : (14 July)

I was reading the Oracle University Introduction to SQL course PPTs and in the 'Aggregate Data using The Group Functions' it did mention that the rows are sorted by ascending order of the columns included in the GROUP BY list.

I personally think that was a great idea, it indeed avoid an extra sorting (comes using by ORDER BY clause.


Happy Reading,

Jaffar

4 comments:

Vit said...

Well, Oracle always said that the only way to gaurantee result set ordering is to use the ORDER BY clause - finally, the advice proved true.

And with the example on 9.2.0.7 - thre reasons seems to be the nls sort used - the plain group by looks like a plain binary sort, the order by uses some NLS sort.

The Human Fly said...

>>
And with the example on 9.2.0.7 - thre reasons seems to be the nls sort used - the plain group by looks like a plain binary sort, the order by uses some NLS sort.
>>
I really need to see that difference.

Vit said...

I was able to reproduce this - try:

select * from test order by nlssort(a,'nls_sort = binary');
- this gives the first result set


select * from test order by nlssort(a,'nls_sort = arabic');
- this gives the second result set

Coskan Gundogar said...

I think the parameter only affects the sort algoritm because in Oracle XE I got the same results with different plans


sys@XE> alter session set "_gby_hash_aggregation_enabled" =true;

Session altered.

sys@XE> select owner,count(1) from dba_segments group by owner;

OWNER COUNT(1)
------------------------------ ----------
MDSYS 96
PR 2
TSMSYS 4
FLOWS_020100 640
OUTLN 8
CTXSYS 74
HR 41
FLOWS_FILES 7
SYSTEM 357
SCOTT 20
DBSNMP 25
XDB 753
SYS 1851

13 rows selected.

sys@XE> alter session set "_gby_hash_aggregation_enabled" = FALSE;

Session altered.

sys@XE> select owner,count(1) from dba_segments group by owner;

OWNER COUNT(1)
------------------------------ ----------
MDSYS 96
PR 2
TSMSYS 4
FLOWS_020100 640
OUTLN 8
CTXSYS 74
HR 41
FLOWS_FILES 7
SYSTEM 357
SCOTT 20
DBSNMP 25
XDB 753
SYS 1851

13 rows selected.

sys@XE>