1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.

Tuesday, February 19, 2008

Histograms On Columns Containing Unique Data ?

I recently came across an oddity in 10g (10.2.0.3.0) whereby a problem with an execution plan was resolved by creating a histogram on a column containing unique data. This was the query:-





To trouble shoot the issue I ran the query with the gather_plan_statistics hint as per the SQL trouble shooting guide mentioned by Greg Rahn on his blog (www.structureddata.org), this is a technique for finding the root problem of an execution plan based on the predicated cardinality of an operation in the plan being different from the actual cardinality once the SQL has run. Usually the difference between the predicted and actual cardinality will be out be an order of magnitude. I'm not sure who first came across this but the first reference I have found to such a technique was on Wolfgang Breitling's web site (www.centrexcc.com). Here Wolfgang refers to this technique as "Tuning by cardinality feedback".


Once you have obtained your 'enhanced' plan you look for the inner most operation where predicted and actual cardinality is different by a significant margin. In my plan this is line 12:-





The index AD_SERVICE_POINT_IX30 is a function based index and the column SYS_NC00055$ is the pseudo column associated with it. Note that the predicate used with line 12 is a LIKE and that the statistics were gathered on this table by the built in GATHER_STATS_JOB, i.e. Oracle was left to it's own devices to determine what histograms to create and the number of buckets to create the histograms with. After working through the issue with Oracle Support it transpired that the resolution to this was to create a histogram "by hand" on the function based index with the maximum number of buckets:-

exec dbms_stats.gather_table_stats(null,
'AD_SERVICE_POINT',
cascade=>false,
method_opt=>'for columns SYS_NC00055$ size 254');
This is the plan now that the histogram has been manually created:-




In fact Oracle didn't create a histogram at all, there was a histogram on SYS_NC00055$, but it only two rows in user_tab_histograms view for the pseudo column, i.e. all the values were in one bucket = no data distribution information. The number of end points in a histogram is one per bucket + 1. The apology Jonathan Lewis uses to describe this in his Cost Based Fundamentals book is that of the number of fence posts you have per fence panels. After a bit more digging by Oracle support the explanation for method_opt=> FOR . . . COLUMNS SIZE AUTO getting this wrong was because the LIKE predicate is irregular as such there was insufficient information in col_usage$ from which DBMS_STATS could determine what sort of histogram to create. col_usage$ is a dictionary table the RDBMS engine uses to record the usage of a column every time a query is parsed. I assumed that "LIKE being irregular" referred to to the predicate being a special case in terms of it accepting wild card characters as part of it's input strings.

I have come across a couple of references in material alluding to the GATHER_STATS_JOB not being infallible, Greg Rahn refers to the defaults as " . . . a good start but not perfect in specific situations . . ." in "When To Use The Appropriate Database Technology" presentation made at OOW 2007. Richard Foote (http://richardfoote.wordpress.com/) is also not a big fan of method_opt=>' . . . SIZE AUTO' as he claims that this can lead to histograms being created when they shouldn't be and histogram creation omittion when histograms are genuinely required, this Richard goes on to say can lead to excessive latching around the shared pool, there is also the inference that this can lead to sub optimal plans.

Futher reading:-

No comments:

 
1. 2.