
Tuesday, February 19, 2008

Why Does The Optimizer Get It Wrong?

    Here are some of the factors that I know of that can lead to the optimizer developing sub-optimal execution plans.

  • Stale Statistics
    When gathering statistics, knowing the rate at which the base data changes and when it changes is paramount. Some applications, for example, load temporary 'scratch' tables with data, use these tables in subsequent SQL statements and then blow the data away. Dynamic sampling can help here (an example is sketched at the end of this section); however, the time it adds to the parsing of a statement has to be balanced against the amount of time the query takes to execute. For example, dynamic sampling may be more suitable for data warehouse and reporting applications, where parse times are a small fraction of the total SQL execution time, than for OLTP type applications. Indeed, when, how often and how to gather statistics has provoked a lot of debate within the DBA community; here is the take of one expert on this very subject.

    Tables with columns generated from sequences are particularly prone to throwing execution plans out when their statistics become stale. There is a graph in Jonathan Lewis's "Cost-Based Oracle Fundamentals" book illustrating how the accuracy of predicted cardinalities degrades as values used in equality predicates get increasingly out of range of the column's recorded low and high values.

    Dave Ensor (stalwart of the UKOUG DBMS SIG presentation circuit) once made an observation dubbed the statistics gathering paradox: the only time it is safe to gather statistics is when doing so makes no change to execution plans (assuming they are optimal in the first place).
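
    As a minimal sketch of the dynamic sampling approach (the scratch table, alias and sampling level here are purely illustrative), a hint can ask the optimizer to sample the table at parse time rather than trust stale or missing statistics:

    -- Hypothetical example: sample the scratch table at hard parse time (level 4)
    -- instead of relying on whatever statistics happen to be stored for it.
    SELECT /*+ dynamic_sampling(s 4) */
           s.batch_id, COUNT(*)
    FROM   scratch_load s
    WHERE  s.processed_flag = 'N'
    GROUP  BY s.batch_id;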

  • Even Column Data Distribution Assumption
    By default the optimizer assumes that column data is distributed evenly and uniformly. This can be corrected by the creation of histograms; however, you should be aware that if method_opt=>'. . . SIZE AUTO' is used, you may find cases where histograms are created when they are not required, histograms are not created when they are needed, or histograms are created with the wrong number of buckets. I found this to be the case with a function based index used in conjunction with a LIKE predicate.
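
    As a hedged illustration (the table and column names are made up), a histogram can be requested explicitly rather than leaving the decision to SIZE AUTO:

    BEGIN
       -- Hypothetical example: explicitly request a histogram on a column known
       -- to be skewed instead of relying on METHOD_OPT => '... SIZE AUTO'.
       DBMS_STATS.GATHER_TABLE_STATS(
          ownname    => USER,
          tabname    => 'ORDERS',
          method_opt => 'FOR COLUMNS STATUS SIZE 254');
    END;
    /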


  • Getting Statistics Wrong
    Using estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE in 10g can be less than 100% accurate when determining the number of distinct values for highly skewed columns. Oracle 11g uses a new algorithm to gather more accurate statistics when auto sampling is used; refer to Greg Rahn's blog entry on 11g auto sampling.
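
    For reference, this is roughly what gathering with auto sampling looks like (the table name is hypothetical); on 10g the distinct value counts it produces for heavily skewed columns may be off, whereas estimate_percent => 100 avoids the inaccuracy at the cost of a longer gather:

    BEGIN
       -- Hypothetical example: let Oracle choose the sample size.
       DBMS_STATS.GATHER_TABLE_STATS(
          ownname          => USER,
          tabname          => 'SALES',
          estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
    END;
    /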


  • Maximum Number Of 254 Buckets Per Histogram
    This issue bites when you require skew information on more than 254 distinct values. Some material refers to "popular values" in a histogram; these are values which are bucket end points. In his book "Cost-Based Oracle Fundamentals", Jonathan Lewis provides an example of constructing a histogram in favour of supplied popular values. A more extreme solution is to partition tables with more than 254 distinct values in the skewed column.
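
    A quick way to see whether a histogram can describe all of a column's distinct values is to compare NUM_DISTINCT with NUM_BUCKETS in the dictionary; a sketch (the table name is hypothetical):

    -- Where NUM_DISTINCT exceeds 254 the bucket count is capped, so the
    -- histogram can only record skew for the most popular values.
    SELECT column_name, num_distinct, num_buckets, histogram
    FROM   user_tab_col_statistics
    WHERE  table_name = 'SALES';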


  • Oracle Bugs
    All software contains bugs; even world class software producers who adhere to exacting software engineering practices and standards, such as level 5 of the Carnegie Mellon Capability Maturity Model, produce software with bugs in it. This is not a slight on Oracle. Optimizer-related bugs include:


    • Bugs in the optimizer cardinality engine


    • Bugs in DBMS_STATS.


    • Costing bugs, plans incorrectly selected.


    • Transformations not being costed. Some transformations that were never costed in 9i are now costed in 10g. With every new release of Oracle, new parameters prefixed by underscores (hidden parameters) appear; these usually relate to new optimizer features which are not always costed until the next release appears. All the parameters relating to the CBO can be found by running a 10053 trace (see the sketch below). I'm sure that as I write this, the likes of Jonathan Lewis are already on the case working out which new and hidden parameters affect the CBO in 11g and what they do.
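
      For reference, a 10053 trace can be switched on for the session before hard parsing the statement of interest; the resulting trace file lists every optimizer parameter, including the hidden ones, along with the costing calculations:

      -- Enable the CBO trace, hard parse the statement of interest, then turn it off.
      ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
      -- ... run the statement here ...
      ALTER SESSION SET EVENTS '10053 trace name context off';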



  • Selectivity On Columns Used In Expressions
    Until you start using extended statistics with Oracle 11g, the optimizer has no way of determining the correct selectivity for expressions on columns, e.g. TRUNC(col1) = . . . , and this can sometimes lead to sub-optimal plans. 11g allows this to be rectified using:-

    SELECT DBMS_STATS.CREATE_EXTENDED_STATS(ownname   => USER,
                                            tabname   => 'MYTAB',
                                            extension => '(TRUNC(col1))')
    FROM   DUAL;
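
    The extension only helps once it has statistics of its own, so the table's statistics need to be regathered after creating it; a minimal sketch against the same hypothetical MYTAB table:

    BEGIN
       -- Regather so the new TRUNC(col1) extension gets column statistics
       -- (and, if SIZE AUTO decides one is warranted, a histogram).
       DBMS_STATS.GATHER_TABLE_STATS(
          ownname    => USER,
          tabname    => 'MYTAB',
          method_opt => 'FOR ALL COLUMNS SIZE AUTO');
    END;
    /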



  • Incorrect Configuration Of The Cost Based Optimizer Environment
    The Oracle Real World Performance Group recommends that certain parameters affecting the cost based optimizer are left at their defaults; refer to Greg Rahn's blog:-

    "I think it’s important to understand what variables influence the Optimizer in order to focus the debugging effort. There are quite a number of variables, but frequently the cause of the problem ones are: (1) non-default optimizer parameters and (2) non-representative object/system statistics. Based on my observations I would say that the most abused Optimizer parameters are:

    OPTIMIZER_INDEX_CACHING

    OPTIMIZER_INDEX_COST_ADJ

    DB_FILE_MULTIBLOCK_READ_COUNT

    Many see setting these as a solution to get the Optimizer to choose an index plan over a table scan plan, but this is problematic in several ways:

    1. This is a global change to a local problem
    2. Although it appears to solve one problem, it is unknown how many bad execution plans resulted from this change
    3. The root cause of why the index plan was not chosen is unknown, just that tweaking parameters gave the desired result
    4. Using non-default parameters makes it almost impossible to correctly and effectively troubleshoot the root cause"
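
    One way to spot non-default optimizer settings before any deeper debugging (a sketch; it assumes access to the V$ views) is to query V$SYS_OPTIMIZER_ENV:

    -- Parameters reported with ISDEFAULT = 'NO' have been changed from their
    -- defaults and are worth questioning first.
    SELECT name, value, isdefault
    FROM   v$sys_optimizer_env
    WHERE  isdefault = 'NO'
    ORDER  BY name;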


  • Data Correlation / The Predicate Dependence Assumption
    The optimizer assumes that predicates are independent when calculating selectivities and cardinalities. If you have a table with two related columns, e.g. salary and tax bracket, and you write a query with predicates that refer to both of these columns, there is a dependence between the columns which the optimizer will assume does not exist; it will therefore work out their selectivities independently when patently they are not independent. At a very high level the selectivity of predicates ANDed together is calculated as:-

    selectivity(col1) * selectivity(col2)

    and for columns ORed together:-

    selectivity(col1) + selectivity(col2) - selectivity(col1 AND col2)
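
    As a purely illustrative example, if a predicate on salary matches 1% of the rows and a predicate on tax bracket matches 10% of the rows, the optimizer estimates the ANDed selectivity as 0.01 * 0.10 = 0.001. If in the real data every row in that salary range also falls in that tax bracket, the true selectivity is 0.01, so the cardinality estimate comes out a factor of ten too low.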

    • In an example in "SQL Tuning By Statistics" Wolfgang Breitling fixes this by setting the column statistic density by hand.


    • Dynamic sampling, first introduced in Oracle 9i, can help with this.


    • Extended statistics in Oracle 11g can capture the correlation; Greg Rahn provides an example of this:-
      SELECT DBMS_STATS.CREATE_EXTENDED_STATS(USER, 'CALENDAR', '(MONTH, ZODIAC)') FROM DUAL;

  • Violation Of The Join Uniformity Assumption
    The CBO assumes that there is a one to one correspondence between rows joined from one row source and rows in the other row source. I came across this on Wolfgang Breitling's web site; his paper "Fallacies Of The Cost Based Optimizer" refers to this.



