
Friday, December 26, 2008

Nuances Of JDBC Connection Pools

There was a particular piece of code I was tuning a while back which highlighted some of the quirks of JDBC connection pooling. A piece of PL/SQL created and executed an INSERT SELECT statement based upon the parameters it was called with from a WebSphere application server. The statement was bind friendly, but due to bind peeking a lot of on-the-fly re-optimizations were taking place, which hurt performance.

A common misconception with Oracle is that execution plans are only re-generated when a hard parse takes place or when an execution plan is aged out of the shared pool. This is not the case: v$sql_shared_cursor gives numerous reasons why re-optimizations, and the generation of child cursors, take place. However, as per a previous posting I made, this view does not capture bind peeking activity.

As a quick fix, and yes I know this solution was only really a crutch, I placed an execute immediate statement at the top of the code to set cursor_sharing to force. I could also have fixed this using the hint opt_param('_optim_peek_user_binds', 'false'); with the benefit of hindsight, and the rule of thumb that the scope of the fix should match the scope of the problem, this may have been a better way to go.

I did not realise at the time that this alter session would affect other SQL statements executed by the application server when they reused this particular connection from the pool, with disastrous results for performance. In particular, a batch process that executed the same statement with different binds multiple times was not being re-optimized at execution time, when in fact Oracle had just cause to do so. Fortunately this scenario was constrained to one test environment I was working on. The resolution was to put another execute immediate statement at the bottom of the stored procedure to set cursor_sharing back to what it was set to before.
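To illustrate the trap, here is a minimal sketch (assuming a generic DataSource backed pool; the statements are placeholders): session level settings survive the logical close() of a pooled connection, so the next borrower can inherit them unless they are explicitly reset:-

    import java.sql.Connection;
    import java.sql.Statement;
    import javax.sql.DataSource;

    public class PooledSessionState {

        // First borrower: flips a session level parameter, then "closes" the
        // connection, which merely returns it to the pool with the setting intact.
        static void borrowerOne(DataSource pool) throws Exception {
            try (Connection con = pool.getConnection();
                 Statement stmt = con.createStatement()) {
                stmt.execute("ALTER SESSION SET cursor_sharing = FORCE");
                // ... run the bind-unfriendly INSERT SELECT here ...
            } // con.close() returns the physical session, altered state and all
        }

        // Second borrower: may receive the very same physical session, so it
        // resets the state defensively, mirroring the fix described above.
        static void borrowerTwo(DataSource pool) throws Exception {
            try (Connection con = pool.getConnection();
                 Statement stmt = con.createStatement()) {
                stmt.execute("ALTER SESSION SET cursor_sharing = EXACT");
                // ... batch statements that rely on per-execution optimization ...
            }
        }
    }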

A popular technique to avoid database access and solve mutating trigger problems is to use a PL/SQL array declared in a PL/SQL package specification. Indeed, the other nuance of JDBC connection pooling is that if an SQL call from the Java code initialises such a variable and the connection is then released in the Java code, subsequent calls from the same Java thread may not necessarily reuse the same connection from the JDBC connection pool, and thus will not see the data cached in the PL/SQL package.
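A hedged sketch of this second nuance (pkg_cache and its subprograms are hypothetical stand-ins for a package that caches data in a package specification array); because package state is private to a database session, a second logical connection from the pool may map to a different physical session and see none of the cached data:-

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.Types;
    import javax.sql.DataSource;

    public class PackageStateDemo {

        static void demonstrate(DataSource pool) throws Exception {
            // Seed the PL/SQL package array on whichever session the pool hands out.
            try (Connection con = pool.getConnection();
                 CallableStatement cs = con.prepareCall("{call pkg_cache.seed(?)}")) {
                cs.setInt(1, 42);
                cs.execute();
            } // the connection goes back to the pool

            // Same Java thread, new logical connection: possibly a different
            // physical session whose copy of pkg_cache holds nothing at all.
            try (Connection con = pool.getConnection();
                 CallableStatement cs = con.prepareCall("{? = call pkg_cache.peek()}")) {
                cs.registerOutParameter(1, Types.INTEGER);
                cs.execute();
                System.out.println("cached value: " + cs.getInt(1)); // may not be 42
            }
        }
    }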

Good Java programming practice dictates that you should lazily grab resources and release them as soon as they are no longer required, increasing the likelihood that threads will use many different connections from the JDBC connection pool. Also note that when obtaining a JDBC connection in Java code, what you are in fact doing is obtaining a handle to a connection from the pool. New connections to the database should only be made:-

1. When the application server starts up and the pool is initially created.

2. If the minimum and maximum connection settings for the pool are not the same and the application server needs to grow the size of the pool up to the maximum setting.

A Presentation On Trouble Shooting Bad Execution Plans In Oracle

Oracle Sql Tuning

Tuesday, December 23, 2008

Performance Bottlenecks In The J2EE App Server That Can Throttle The Database

I mentioned in my post about layered architectures that I would discuss factors that can throttle throughput from the J2EE application server to the database (or persistence layer). To start off with, I will focus on the ethos of designing distributed object architectures, in the words of Martin Fowler:-
". . . Hence, we get to my First Law of Distributed Object Design: Don't distribute your objects!"

Distributing components between J2EE application servers in the J2EE world involves the use of remote method invocation (RMI) calls to remote interfaces, assuming we are talking about synchronous communication here. Remote method calls to beans deployed on remote servers incur the following performance overheads:-
  • Copy by value

  • Network latency

  • Every incoming RMI call needs to be serviced by its own Java thread

  • Servicing incoming connections from the TCP/IP stack causes context switches

The original text behind this design ethos appears in Martin's book "Patterns of Enterprise Application Architecture"; there is also a very similar article that Martin published in Dr. Dobb's Journal, which can be found here.

If all the beans are deployed to the same application server, this eliminates all of the bottlenecks except copy by value. However, as you will see, the overhead of pass by value is still significant: it will both increase heap utilisation and extend the part of a major garbage collection that looks for unreachable objects on the heap. On the last project I worked on at the time of writing, we found that:-

  • The 10g database time (not to be confused with wall clock time) was low.

  • Average number of sessions from ADDM reports was low.

  • The CPU on the WebSphere server was maxed out.
Profiling the code with a Java profiler revealed that 50% of the CPU time was being expended in a method with a name like com.ibm.CORBA.util.copyObject. This was because the business logic or domain beans were calling the database access beans via remote interfaces, which use pass by value.

For the benefit of anyone reading this who is not that familiar with the J2EE world, I will elaborate on pass by reference, pass by value and remote and local homes. One of the original design goals behind the J2EE standard was to present a framework in which distributed applications could easily be produced; this was based on CORBA, and the synchronous communication method was RMI. Prior to the EJB 2.0 standard, it was assumed that when one enterprise bean called another bean, the 'other' bean was always remote, hence a remote interface and copy by value were always used. However, as time went by people realised that, despite this, a lot of applications used beans that were deployed to the same J2EE app server and hence suffered this performance penalty for no added benefit. Thus, in the EJB 2.0 standard, 'local' interfaces came about. Beans talking to each other using local interfaces use the much more efficient pass by reference method of passing object references rather than whole objects by value.
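The semantic difference is easy to demonstrate outside an EJB container. This is a minimal sketch, not container code: the 'remote' path imitates RMI marshalling by serialising the argument into a deep copy, while the 'local' path passes the object reference, so mutations made by the callee are visible to the caller:-

    import java.io.*;

    public class CopySemantics {

        static class Order implements Serializable {
            String status = "NEW";
        }

        // Imitates a remote EJB call: RMI marshals arguments by value,
        // so the callee works on a serialised deep copy.
        static Order remoteStyleCall(Order order) throws Exception {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new ObjectOutputStream(buf).writeObject(order);          // marshal
            Order copy = (Order) new ObjectInputStream(
                    new ByteArrayInputStream(buf.toByteArray())).readObject();
            copy.status = "PROCESSED";   // invisible to the caller
            return copy;
        }

        // Imitates a local interface call: the object reference is passed,
        // so the callee's mutation is visible to the caller.
        static void localStyleCall(Order order) {
            order.status = "PROCESSED";  // visible to the caller
        }

        public static void main(String[] args) throws Exception {
            Order a = new Order();
            remoteStyleCall(a);
            System.out.println("after remote-style call: " + a.status); // NEW

            Order b = new Order();
            localStyleCall(b);
            System.out.println("after local-style call:  " + b.status); // PROCESSED
        }
    }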

Fortunately with WebSphere there is an option available on the Object Request Broker (ORB) to turn pass by reference 'On'; this forces all the beans deployed to the same EJB container using the same class loader to use pass by reference, irrespective of whether they have been coded to use remote interfaces. In the case of the application I was looking at, this had the rather large impact of:-

  • Reducing batch process run times by 50%.
  • Reducing CPU consumption on the server hosting the WebSphere application server by 50%.
  • Doubling the 10g db time that a batch process spent in the database.

  • Increasing the average number of sessions active whilst the batch process was running; again this is an Oracle 10g time model metric, the database time divided by the actual elapsed "wall clock" time.

Setting the class loader policy for the application server to 'single' also helps. In tuning WebSphere, I also looked at the:-

  • server JVM

  • client JVM

  • ORB with regard to the setting up a JNI reader thread pool and the fragment size

  • JDBC pool: the size of the pool, pool utilisation and the cached statement discard rate.

  • EJB container, specifically the EJB cache size

  • Log4j logging level

  • number of batch process threads

  • turning off the diagnostic trace
To the best of my knowledge, the only thing I didn't look at was turning PMI off completely. For both the client and server JVM, I found that using the JVM options used by IBM in a Solaris SPECjAppServer2004 submission yielded the best results. Refer to this link for a blog to which some of the IBMers who carry out this work contribute, in particular Andrew Spyker, WebSphere performance architect.

However, out of all the things I looked at, I got the biggest bang for my buck, by a long chalk, from turning pass by reference 'On'. This comes with a word of caution though: if your software is designed such that beans that receive RMI calls modify objects and the calling bean is not expecting this, wild and erratic application behaviour can result. It is for this very reason, or so I believe, that pass by reference is now turned 'Off' by default in WebSphere 6.1.

This brings me on to J2EE clustering and shared nothing architectures. WebSphere Network Deployment, the clustered version of WebSphere, uses a shared nothing architecture, similar to that discussed in this article here, which again re-iterates the point about distributed architectures and performance:-

"The most scalable architectures are sharing-nothing clusters. In such clusters, each node has the same functionalities, and knows nothing about the existence of other nodes. The load balancer will decide how to dispatch the requests to the back end server instances. It is rare that a load balancer will become a bottleneck, because a load balancer only need to do some very simple jobs, such as distributing requests, health checking, and session sticking. By adding more nodes, the computing ability of the cluster is increasing almost lineally, if the back end database or other information systems are powerful enough."

With WebSphere Network Deployment, the same EAR file is deployed to each node in the cluster, or cell, to use IBM terminology; workload distribution is managed in a round robin fashion via the workload manager. WebSphere eXtreme Scale adds some extra intelligence to this by distributing the workload based on the current available capacity of nodes in the cluster.

What exacerbated my original problem of the massive copy by value overhead was the batch processing design, which is fundamentally item by item. The problem is that because Oracle is called for every individual item in the workload, there is heavy traffic between the business logic layer and the persistence layer. Accessing a database item by item is really inefficient (see the sketch after the list below) in terms of:-

1. Hard and soft parsing; in particular I found that setting the batch process thread count above 32 stressed the library cache pin mutex. Tanel Poder has written a good article on the library cache pin mutex and the extended use of mutexes in 11g here.

2. Latching

3. Network round trips and network latency

4. Physical and logical reads

5. JDBC calls
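To make those per-item costs concrete, here is a hedged sketch (the widgets table and statuses are invented): the first method pays a network round trip and a JDBC call per widget, the second pays them once for the whole workload:-

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class WidgetProcessing {

        // Item-by-item: one round trip, one execute, one set of latches per widget.
        static void perItem(Connection con, int[] widgetIds) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "UPDATE widgets SET status = 'DONE' WHERE widget_id = ?")) {
                for (int id : widgetIds) {
                    ps.setInt(1, id);
                    ps.executeUpdate();        // network round trip per widget
                }
            }
        }

        // Set-based: the database does the iteration, one round trip in total.
        static void setBased(Connection con) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "UPDATE widgets SET status = 'DONE' WHERE status = 'PENDING'")) {
                ps.executeUpdate();
            }
        }
    }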

Some people get very agitated about batch process designs that process the workload item by item, and in principle this results in certain inefficiencies when accessing the database. However, the performance targets for the batch processes were the same as those for the system this new one was replacing, but with 200% of the workload and two years' worth of "database aging", i.e. the database had been artificially aged to give the appearance of the application having been in use for two years. In summary, all batch processes beat their counterparts in the original application by at least 100%.

The morals of this story are:-

1. Distributed object architectures are bad for performance

2. Your application needs to be profiled in order to work out where the time is going; in the example I have highlighted, the database was never the bottleneck.

3. A holistic approach to looking at the architecture needs to be taken with regard to achieving good performance and scalability.

I will get around to discussing approaches on alleviating the stresses that item by item oriented batch processing imposes on the Oracle database in a future post.

Monday, December 01, 2008

J2EE Layered Application Architectures and Performance

I have spent most of my career as a DBA focusing on performance. With most applications there is an intimate relationship between the design and architecture of the application and its performance and scalability. Generally speaking, if the design and architecture are wrong, getting an application to perform and scale can be akin to applying elastoplasts over the biggest holes.

As an industry, IT has a patchy record for delivering projects on time and to budget, as such there are many projects out there that barely manage to cross the finishing line having met all of their functional requirements let alone performance requirements. Due to this, tuning can be reactive and end up becoming a battle against the architecture and design.

There was a time when most applications consisted of a client and business logic written directly in the database using stored procedures. Indeed the database software formed the largest part of the application stack. However, more recent architectural trends have been heavily influenced by:-

  • The internet, Web 2.0 has firmly established itself and Web 3.0 is around the corner

  • Service Oriented Architectures (SOA)

  • Software as a service (SAAS)

  • Event driven architectures
  • Extreme transaction processing
  • Grid architectures
  • Mash ups

Indeed, there was a time when the database was the focal point of integration efforts; now the industry is moving towards integration via XML based middleware. Here is one such article on this very subject from design and architecture luminary Martin Fowler. In the current climate "integration architects" are highly sought after people.

Cloud computing is on its way, and just as during the old Web 1.0 days, to quote a marketing tag line from Sun, "We were into intranets when most people thought this was a typo", there will be internal clouds as well. Therefore, if you have an interest in performance and tuning, it is highly likely that you will have some involvement in areas outside the database, and find that the database is being under-utilised because of what is happening in the layers above it. Cary Millsap made a presentation at Oracle Open World 2007 highlighting the importance of establishing application profiles when tuning.

One of the most prevalent software architectures today uses a combination of layering and tiering, a tier being a layer that is physically deployed on a separate server, whereas a layer is purely a logical separation. What is paramount about the tiering / layering design approach is that it provides a "clean separation of concerns". Layering can also be vertical, i.e. swathes of development for particular functional areas which cut through the horizontal layers. Classically, the 'horizontal' layers are:-

  • Presentation

  • Business logic (or domain)

  • Integration

Architects in the finance world prefer tiers, in that each tier can reside in its own DMZ (de-militarised zone) and hence provide maximum security. My interest in application architecture has developed from the bottom up, i.e. from the database upwards. However, the integration layer is not always an RDBMS, it can be:-

1. A web service

2. A message queue

3. Distributed object cache (which may be transactional)

4. An enterprise service bus adapter or message broker

5. A J2EE Connector Architecture (JCA) connection

6. etc . . .

In fact with XA distributed transactions, a transaction can span multiple end points below the integration layer.

Some patterns are almost de facto in certain layers and tiers, such as model view controller in the presentation tier, session facade in the business logic or domain tier, to name but two. Also, there are frameworks and third party libraries that take a lot of the effort out of having to crank out boiler plate code and the code behind the application infrastructure, e.g.:-


1. Apache struts and Java Server Faces for the presentation layer

2. Spring for the whole application or specific layers

3. Log 4j for logging

4. Apache commons, which is a common utilities library

However, whatever you do, you should never code business logic into the presentation layer or explicitly code details of the integration layer into the business logic or domain layer.

When looking at the performance of such architectures, one needs to establish a profile of where particular activities spend their time within the various layers or tiers and the networks connecting any tiers. If a J2EE application server is involved in the application stack, something known as the "carrot model" should be understood. All J2EE application servers use the following set of components, which form a pipeline from the 'front' of the application server to the 'back'; these are:-

1. Object request broker used for J2EE clients.

2. Web container

3. Enterprise Java Bean Container

4. JDBC Pool or JCA connectors.

The WebSphere application server has an http server built into the web container; however, this is only really recommended for testing purposes, so you will probably have an http server outside of the J2EE application server. What the "carrot model" represents is that component utilisation is usually quite high at the 'front' of the application server and gradually dwindles as you get towards the JDBC connection pool or JCA connectors, hence the analogy of a carrot: thick at one end and thin at the other. Note that the ORB is only used for J2EE clients; web clients will utilise the web container http server or the external http server directly.

A lot of texts refer to the effort and careful consideration that needs to go into designing the layers of the application's architecture. I would add that the same level of consideration needs to be given to where processing should be carried out. In the simplest of architectures, with a J2EE application server on one machine and a database on another, there is still scope for the network to become the bottleneck through excessive calls being made to the database to carry out business logic processing and standing data lookup. In one example, a stateless session bean used to access fixed domain data was the most utilised bean in the application until a standing data cache in the application server was implemented. In another example, a stored procedure was called to validate individual items within individual records, using fixed domain data cached in the application server. By carrying out the validation entirely within the application server, the process concerned ran faster and required fewer threads than before, with lower CPU consumption on both the server hosting the application server and the server hosting the database.
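A minimal sketch of such a standing data cache, assuming the lookup data is small and effectively read-only (the country_codes table is invented): one query at start-up, then every subsequent validation is a hash lookup rather than a database call:-

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.sql.DataSource;

    public class CountryCodeCache {

        private final Map<String, String> codes = new ConcurrentHashMap<>();

        // One round trip at start-up instead of one per validation.
        public CountryCodeCache(DataSource ds) throws Exception {
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT code, description FROM country_codes");
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    codes.put(rs.getString("code"), rs.getString("description"));
                }
            }
        }

        // Validation now costs a hash lookup, not a database call.
        public boolean isValid(String code) {
            return codes.containsKey(code);
        }
    }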

There is equal scope for poor performance and scalability to be present by design with vertical layering also. One particular design trend is to develop specific functional areas with an API implemented using a session facade. Tools such as Apache Axis, and the J2EE application server itself, make exposing stateless session beans as web services straightforward. Consider two separate beans used to implement the APIs for two different functional areas: if these areas are tightly coupled and are to be used in a service oriented architecture, there is going to be a significant amount of XML to Java serialisation between the two respective web services that might be avoided if the API were provided by a single bean.

I strongly advise people to read the IBM red book "Designing and Coding Applications for Performance and Scalability in WebSphere Application Server". This is one of the best texts I have come across in the area of designing and writing scalable and well performing J2EE applications. If I had my way, this would be mandatory reading material for all developers, designers and architects on J2EE development projects.
In the not too distant future I will post an article on my experiences of the specific things that can stifle utilisation of the database server when a J2EE application is used.

Tuesday, November 04, 2008

The Rise Of MySql From The Low End

At the time of writing this, the penultimate blog entry from Jonathan Schwartz on the Sun web site refers to innovation loving a crisis, the crisis being the turmoil in the global economy. From this, it is not too difficult to make the mental leap and arrive at the inference that software with no licence costs is going to be very attractive at present. Now, this comes from Jonathan's latest blog entry (again, latest at the time of writing):-

"The third is the need for change - one executive to whom I spoke recently said her entire discretionary budget was consumed by one proprietary vendor's price increase. So she's out looking for an alternative, and MySQL fits the bill. Which is to say, necessity's the mother of invention - and there's a lot of necessity going around right now."

Makes you wonder if MySql is going to start biting at Oracle's ankles around the lower end of the market more than ever before.

Monday, October 13, 2008

"The Performance Hacker"

I've just read the Oracle Real World Performance Group's Open World 2008 presentation entitled "Current Trends in Real World Database Performance". The bit about the performance hacker was interesting. Tuning via initialisation parameters is a bad practice with little or no mileage to be gained, unless parameters have been grossly mis-configured in the first place. At the extreme end of the performance hacker spectrum are those that dabble with initialisation parameters that most DBAs will never have any cause to change in their entire careers, such as spin_count (_spin_count from 8i); worse still are those that dabble with the '_' parameters without direction from Oracle support. In most sessions Oracle's Graham Wood gives at the UKOUG conferences, he mentions that the scope of a fix should match the scope of the problem. You need to get to the root cause of the problem; although the use of hints is nowhere near as bad as hacking initialisation parameters, it is fixing the symptoms of the problem rather than directly addressing the problem itself.

Of the other characteristics that define the "performance hacker", they like to indulge in tuning via the internet. There are, however, some excellent sources of information on the internet and in the blogosphere regarding tuning:
  • The Real World Performance Group
  • The Optimizer Group Blog
  • Jonathan Lewis's blog
  • Wolfgang Breitling's web site, etc.
The trick is to prefer sources of information from organisations and individuals that provide clearly worked through test cases in order to demonstrate their points. Sites that make grandiose claims without backing them up with worked examples should be avoided like the plague.

If you are completely new to Oracle you may have a tuning advisor view of the performance world. Read the 11g documentation and you would be forgiven for thinking that 11g is a self tuning database and that ADDM and the bundled advisors are the only tools you need. Also note that 11g comes with an SQL test case builder for creating test cases to pass on to Oracle support for poorly performing statements, the SQL API of which is available through the DBMS_SQLDIAG package; read into this what you will.

The approach recommended by the Real World Performance Group is based around what I first found in a paper entitled "Performance Tuning By Cardinality Feedback" by Wolfgang Breitling, a reviewer of Jonathan Lewis's cost based fundamentals book. The basic premise is that you look at where the actual cardinality versus the predicted cardinality in a plan is out, usually by an order of magnitude and usually somewhere near the start of the plan. To the best of my knowledge, this approach is not mentioned in the standard Oracle performance tuning material.

Knowing about this technique and how to apply it is made easier in Oracle 10g with the gather_plan_statistics hint (or statistics_level=all) and the DBMS_XPLAN package; see the sketch below. If you use the "tuning by cardinality feedback" approach and find that your plan does not improve after addressing issues including:-

1. Inaccurate stats, particularly the number of distinct values for highly skewed data.
2. Data correlation or predicate independence assumption issues.
3. Predicate values being out of bounds when compared to the min / max values for a column.
4. Missing histograms or histograms with inappropriate numbers of buckets.
5. Pseudo columns relating to function based indexes without stats or histograms.
6. Selectivity assumptions made when functions are used.
7. Skewed data for which there are more than 254 distinct values and the values you are interested in are not 'popular' values, i.e. do not fall on a bucket boundary end point.

then, by providing full evidence that this approach has been applied and followed, you will get a much more expedient resolution to your issue out of Oracle support than someone reporting a poorly performing query or a query with a large full table scan in its plan.
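Here is a hedged sketch of capturing predicted versus actual cardinalities from JDBC (any SQL front end works equally well; the orders query is invented, but the gather_plan_statistics hint and DBMS_XPLAN.DISPLAY_CURSOR are the standard 10g mechanism):-

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CardinalityFeedback {

        static void showActualVsEstimated(Connection con) throws Exception {
            try (Statement stmt = con.createStatement()) {
                // Execute the statement with runtime row source statistics enabled.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT /*+ gather_plan_statistics */ * FROM orders WHERE status = 'OPEN'")) {
                    while (rs.next()) { /* consume the rows */ }
                }
                // Fetch the plan for the last statement executed by this session;
                // the E-Rows and A-Rows columns are what cardinality feedback compares.
                try (ResultSet plan = stmt.executeQuery(
                        "SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'))")) {
                    while (plan.next()) {
                        System.out.println(plan.getString(1));
                    }
                }
            }
        }
    }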

Thursday, September 18, 2008

A Ticking Performance And Scalability Time Bomb and Java Object Caching Grids

During last year's UK Oracle user group conference I attended a session by Graham Wood from Oracle on the 10g performance infrastructure, covering ASH, ADDM, the time model and all that good stuff. At the beginning of his session, Graham mentioned that he often encountered projects where Java programmers who needed to persist something simply created a table without a second thought for performance. When I attended a UML course earlier in the year, this was also echoed by the course instructor, a designer and analyst of considerable experience. Specifically, that due care had not been put into entity relationship modelling, culminating in data models that were a nightmare to maintain, which was becoming a greater and greater issue across a lot of projects.

In the Java world the term used to describe the layer that interfaces with the database is "object relational mapping", or ORM to use its abbreviation. The focus tends to be on mapping tools that provide neat frameworks for managing the SQL; XML configuration files are a common component of such approaches. However, there is little focus on leveraging features such as Oracle's array interface to make database access as efficient as possible. Indeed, as Tom Kyte often mentions in his books, some people take the black box approach to using the database. In one of his books, Expert One on One Oracle I think, one project team elected to develop something akin to the query optimizer in the application server. Talk about re-inventing the wheel.

There are inherent bottlenecks with using a relational database, mainly the data type impedance mismatch. Numerous JVM based object caching solutions, which can front the database and even do away with it, have made their way onto the market to address this. Examples include Oracle Coherence, GigaSpaces Application Fabric, IBM Object Grid, Appistry, Terracotta etc. These caching solutions and their associated processing grids have given rise to the paradigm of "Extreme transaction processing", which I believe is a term coined by the Gartner group. Someone I spoke to in Oracle pre-sales earlier in the year mentioned a customer who had managed to achieve with £40,000 of Intel based blades and Coherence what they had budgeted £200,000 of Unix hardware for. Indeed, the founder of Tangosol, the company that first developed Coherence before being acquired by Oracle, came up with the idea whilst he was a Java consultant visiting numerous customer sites where accessing the database was the main bottleneck in the application architecture. As an aside, Coherence can cache a wide variety of data sources other than databases, including web services. It is also equally at home caching for .Net as it is for Java.

The fact of the matter is that most people still need to use a relational database somewhere. In most people's infrastructures and architectures some form of reporting needs to be performed; who has ever heard of a reporting solution based on Java objects? There may well be one out there somewhere, but in my experience most people use SQL to meet their reporting requirements. SQL is a rich and mature language for reporting, with its analytic extensions and constructs such as materialised views, so the relational database might still be around for some time. As Tom Kyte once said, platforms change once every ten years, from mainframes, to Unix, to Windows and now Linux; however, Oracle has always been most people's database of choice and the software of choice for storing, processing and reporting on an organisation's data.
In reality a best of both worlds approach might be what is required to achieve scalable architectures, and this is possibly why Oracle has introduced a business logic tier caching solution for 11g based on the TimesTen in memory database. Note that when looking at caching options, there are two ways to go:-
1. Object caching
2. Relational data caching; refer to TimesTen and various offerings from Progress
The road you should go down depends on whether your application is expecting objects or relational data. A relational data caching solution will not eliminate the data type impedance mismatch problem; however, what it will do is vastly reduce the latency in your architecture when accessing data.

Wednesday, March 26, 2008

Missed Opportunities and Getting It Right The First Time

Whenever Oracle brings out new features you are sometimes left thinking: that's nice, but I could really do with the new feature allowing me to do x, y and z as well. For example:-

1. Index Organized tables in 8i not supporting secondary indexes.

2. Global indexes on partitioned tables becoming unusable after partition maintenance operations, drops, splits etc . . .

3. Not being able to pass compressed blocks across the cluster interconnect, still an issue in 11g release 1 (I think).

This brings me on to cursor sharing and histograms. Histograms have been around since Oracle 7 and are used to provide the optimizer with information on skewed data, basic stuff really. Cursor sharing came along in 8i and is a way of getting the RDBMS to use bind variables when literal values are used in predicates. Up until 9i, histograms only got used when literal values were specified in a predicate on a column with a histogram. Where these two features come together is that if you are using bind variables, Oracle will perform bind variable peeking, usually when a histogram exists, but also in some circumstances when histograms do not exist on the column(s) your predicates are used against.

Historically, most Oracle developers use literals against columns for which the data is skewed, and binds against columns in predicates for which there is no value in forcing a re-parse of the SQL statement, typically values which are highly selective. With cursor_sharing = similar, plans should only be re-used if the parsed cursor can 'safely' re-use an existing plan. However, Metalink technical note 377847.1, "Unsafe Peeked Bind Variables and Histograms", states that the RDBMS can mark a bind variable as unsafe and thus generate many child cursors for statements; I have seen at first hand the RDBMS doing this even when there is no histogram on a column (method_opt => '. . . SIZE 1'). The solutions to this are:-

1. Use cursor_sharing = exact; this somewhat defeats the purpose of using cursor_sharing = similar.

2. 'Drop' histograms on columns that cause unsafe bind peeking, there is a good post on the Pythian blog about the overuse of histograms.

3. Check whether dynamic_sampling is causing this as per the Metalink technical note.

4. Set _optim_peek_user_binds = false to disable bind peeking full stop, if you still get this without a histogram.

v$sql_shared_cursor will not give any insight into why this is happening. There is no 'fix' to this issue as such; however, 11g introduces intelligence into the issue with adaptive cursor sharing. The online Oracle documentation, the Optimizer blog and Technet all provide good references on this.
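In the meantime, a statement suffering from unsafe peeked binds usually betrays itself through an excessive child cursor count; a hedged diagnostic sketch (the threshold of 20 is an arbitrary assumption):-

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ChildCursorCheck {

        // Lists statements that have accumulated a suspicious number of child
        // cursors; candidates for unsafe peeked binds or other sharing problems.
        static void report(Connection con) throws Exception {
            try (Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT sql_id, version_count, executions " +
                         "FROM v$sqlarea WHERE version_count > 20 " +
                         "ORDER BY version_count DESC")) {
                while (rs.next()) {
                    System.out.printf("%s children=%d executions=%d%n",
                            rs.getString("sql_id"),
                            rs.getInt("version_count"),
                            rs.getInt("executions"));
                }
            }
        }
    }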

The real question is why it has taken Oracle this long to introduce intelligence into cursor sharing. On the same theme of new features taking their time (and then some) to mature, I'd like to use the results cache in 11g in anger, but how long will it take Oracle to shave the rough edges off this feature, i.e.:-

1. Results being invalidated when unrelated rows in the underlying base tables change.

2. No explicit information in the autotrace statistics to show that the result cache has been used, etc . . .

I've heard whispers of new functionality appearing in 11g release 2 relating to non-core RDBMS technology; perhaps Oracle should focus on the maturity of the core RDBMS engine features first.

Tuesday, March 11, 2008

A New Type Of Synchronization Mechanism In 10g Release 2

Up until 10g release 2, the only two locking and synchronization mechanisms available to the RDBMS engine were latches and enqueues. Oracle 10g release 2 introduces mutexes, which replace the library cache pin for shared cursors on platforms supporting atomic compare and swap (CAS) operations. There is scant information on the library cache pin mutex on MetaLink, or on the internet at large for that matter; however, Tanel Poder from Miracle A/S and the Oak Table provides some useful insights into this:-


Re: cursor: pin S wait on X
From: Tanel Põder
To: , "Mladen Gogala"
Date: Sat, 29 Apr 2006 20:10:50 +0900

I studied the mutex stuff a bit when 10.2 came out, I'd say mutexes are less intelligent than latches. But thanks to that, they are more lightweight, and thanks to quite large caches on modern CPUs we can afford having tons of them - allowing very fine grained locking.

On my 32bit linux installation a mutex was 28 bytes in size, while a regular latch structure was 110 bytes. In theory every library cache child should have it's own mutex, but in practice I saw one case where two different child objects were protected by the same mutex - that was what library cache dump showed me anyway.

Mutex also acts as a pin structure, so there's no need to do another memory write for pinning a cursor. Mutex allows immediate shared access to resources it protects; the old approach would mean getting a library cache latch in exclusive mode and then modifying the corresponding library cache child pin structure. Mutexes also don't maintain statistics on successful gets as latches do, only sleeps are registered. Thanks to all that, an immediately succeeding mutex get operation takes several times less CPU instructions (and hits possibly less memory stalls); from a 10gR2 perf tuning course I remember it was about 250 vs 50 CPU instructions on the IA32 platform.

Note that mutexes don't provide much benefit for poorly written applications; for most benefit your apps should avoid any kind of reparsing - keeping cursors open and cancelling them manually when interest in query results has ended.

Tanel.

Mutexes are indeed lightweight, and Oracle states in some information that Oracle support provided me with (the document included 'external' in the title, leading me to presume it is for public consumption) that the rationales for their introduction into the RDBMS engine are:-

1. They are smaller and much faster to get than latches.

2. It is fast and easy to create mutexes for individual structures leading to less contention on the mechanisms that protect the actual structures.

3. The mutex implementation is highly flexible in that components can define wait policy for individual mutexes.

4. Mutexes exhibit a dual nature ideal for pin type operations: many sessions can reference a mutex in shared mode, whereas only one session can hold a mutex in exclusive mode.

Activity on the library cache pin mutex can be observed through waits appearing on events prefixed by 'cursor: pin'; this is usually followed by something denoting the type of operation, e.g. cursor: pin S wait on X, a session wanting to pin a cursor in shared mode for an execute whilst another session holds the mutex exclusively. The v$ views expose mutex activity through v$mutex_sleep and v$mutex_sleep_history. When monitoring this mutex, the most important thing to keep an eye on is the amount of time waited on cursor: pin events, as opposed to the number of events. Large wait times can be due to numerous child cursors being built beneath a parent cursor through cursor invalidation, bind variable variance, statements being aged out of the shared pool etc. In fact, the v$sql_shared_cursor view gives the comprehensive range of scenarios that can cause cursors to be re-parsed; its columns are:-


SQL_ID, ADDRESS, CHILD_ADDRESS, CHILD_NUMBER, UNBOUND_CURSOR, SQL_TYPE_MISMATCH, OPTIMIZER_MISMATCH, OUTLINE_MISMATCH, STATS_ROW_MISMATCH, LITERAL_MISMATCH, SEC_DEPTH_MISMATCH, EXPLAIN_PLAN_CURSOR, BUFFERED_DML_MISMATCH, PDML_ENV_MISMATCH, INST_DRTLD_MISMATCH, SLAVE_QC_MISMATCH, TYPECHECK_MISMATCH, AUTH_CHECK_MISMATCH, BIND_MISMATCH, DESCRIBE_MISMATCH, LANGUAGE_MISMATCH, TRANSLATION_MISMATCH, ROW_LEVEL_SEC_MISMATCH, INSUFF_PRIVS, INSUFF_PRIVS_REM, REMOTE_TRANS_MISMATCH, LOGMINER_SESSION_MISMATCH, INCOMP_LTRL_MISMATCH, OVERLAP_TIME_MISMATCH, SQL_REDIRECT_MISMATCH, MV_QUERY_GEN_MISMATCH, USER_BIND_PEEK_MISMATCH, TYPCHK_DEP_MISMATCH, NO_TRIGGER_MISMATCH, FLASHBACK_CURSOR, ANYDATA_TRANSFORMATION, INCOMPLETE_CURSOR, TOP_LEVEL_RPI_CURSOR, DIFFERENT_LONG_LENGTH, LOGICAL_STANDBY_APPLY, DIFF_CALL_DURN, BIND_UACS_DIFF, PLSQL_CMP_SWITCHS_DIFF, CURSOR_PARTS_MISMATCH, STB_OBJECT_MISMATCH, ROW_SHIP_MISMATCH, PQ_SLAVE_MISMATCH, TOP_LEVEL_DDL_MISMATCH, MULTI_PX_MISMATCH, BIND_PEEKED_PQ_MISMATCH, MV_REWRITE_MISMATCH, ROLL_INVALID_MISMATCH, OPTIMIZER_MODE_MISMATCH, PX_MISMATCH, MV_STALEOBJ_MISMATCH, FLASHBACK_TABLE_MISMATCH, LITREP_COMP_MISMATCH
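As a rough illustration, and not a substitute for the formatted report in the note below, the 'Y' flags for a given statement can be pulled straight from the view (the sql_id is supplied by the caller):-

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;

    public class SharedCursorFlags {

        // Prints every mismatch column that is set to 'Y' for each child cursor
        // of the given statement - i.e. why the children could not be shared.
        static void whyNotShared(Connection con, String sqlId) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT * FROM v$sql_shared_cursor WHERE sql_id = ?")) {
                ps.setString(1, sqlId);
                try (ResultSet rs = ps.executeQuery()) {
                    ResultSetMetaData md = rs.getMetaData();
                    while (rs.next()) {
                        System.out.print("child " + rs.getString("CHILD_NUMBER") + ":");
                        for (int i = 1; i <= md.getColumnCount(); i++) {
                            if ("Y".equals(rs.getString(i))) {
                                System.out.print(" " + md.getColumnName(i));
                            }
                        }
                        System.out.println();
                    }
                }
            }
        }
    }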

Metalink technical note 438755.1, 'Formatted V$SQL_SHARED_CURSOR Report by SQLID or Hash Value', contains information on how to obtain useful reports from this view; it also provides links to other articles to assist in investigating shared cursor issues further. It's a shame that the treasure trove of Oracle internals hints and tips, http://www.ixora.com.au/, is no longer kept up to date, as I'm sure that Steve Adams would have provided some excellent technical details on this mutex much more eloquently than I have in this posting.


Tuesday, March 04, 2008

The Great Oracle Diagnostic Tools Debate

There has been a lot of blogging recently about the limitations of ADDM and the time model introduced in 10g: Dan Fink has blogged on this, and Richard Foote has blogged in response to Dan's post. Guy Harrison made a presentation at OOW 2007 mentioning the limitations of ADDM and the 10g time model. Cary Millsap made a presentation at OOW 2007 on 'skew', also citing the limitations of ADDM and the time based model, specifically with regard to developing application performance profiles. I agree that:-
  • The "self managing database" is more a thing of marketing than technology, to quote Dan Fink.

  • Extended tracing allows you to see where time is being expended on individual statements to a much finer degree of granularity than ADDM and the time model.

However, there are certain practicalities that need to be considered here:-

  • Extended trace is of limited use for "one off" performance issues; you wouldn't really want it enabled all the time on a production environment.

  • The use of J2EE application servers is commonplace today; unless your application Java code is well instrumented, using DBMS_SUPPORT or end to end tracing, a logon trigger to trace what the JDBC client is doing may have to be used. Trace files can be tagged using EXECUTE IMMEDIATE 'ALTER SESSION SET TRACEFILE_IDENTIFIER . . .'; see the sketch after this list. Unfortunately, good application instrumentation is the exception rather than the rule, and only Oracle savvy Java developers or teams with access to a good DBA will know about end to end tracing and trcsess. Otherwise, something such as a logon trigger may lead to a forest of trace files to wade through.

  • Some IT shops (and I know this is bad practise) don't always have a test / development environment for every production environment, or don't have test environments that are identical to their production environments; this is often the case with data warehouses. Therefore, if you have a very intermittent problem in production, or a problem which despite being reproducible occurs very infrequently, reproducing it in another environment might be a problem, and again you wouldn't really want SQL tracing enabled on a production environment 24/7 until the problem re-occurs. This may not be such an issue if, via good instrumentation, the application provides the facility for tracing to be generated on specific parts of the application under specific circumstances; however, my experience is that applications that do this are few and far between.

  • In his post Richard Foote uses a very good example of enabling extended tracing being analogous to putting a camera on someone whose journey gets held up, his wife in the example. What do you do if you don't know what part of the software to "put the camera" on?

  • The issue you are experiencing might be time sensitive, meaning that in order to reproduce it you need to replay data which was loaded / processed by the application at specific dates and times. With Unix this can be done by changing the system clock, but again, this is not something you would really want to do in a production environment.
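As an aside, tagging and tracing a JDBC session, as mentioned in the list above, only takes a few lines. This is a hedged sketch assuming a 10g database; the module, action and identifier values are invented:-

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.Statement;

    public class TraceTagging {

        // Tags the session so trcsess can later collate its trace files, then
        // switches extended SQL trace on for this session only.
        static void enableTracing(Connection con) throws Exception {
            try (CallableStatement cs = con.prepareCall(
                    "{call DBMS_APPLICATION_INFO.SET_MODULE(?, ?)}")) {
                cs.setString(1, "nightly_batch");   // module
                cs.setString(2, "load_widgets");    // action
                cs.execute();
            }
            try (Statement stmt = con.createStatement()) {
                stmt.execute("ALTER SESSION SET TRACEFILE_IDENTIFIER = 'batch_run_1'");
                stmt.execute("ALTER SESSION SET EVENTS '10046 trace name context forever, level 8'");
            }
        }
    }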

I would love to know what Graham Wood, architect and 'father' of ADDM and the time model, makes of this debate and whether he has plans to modify them in light of these criticisms. Furthermore, I would also be interested to know what percentage of performance issues are down to bad configuration, poor coding practises, poor design and poor architecting, as it occurs to me that some people are concerned about ADDM and the time model taking some of the art out of tuning. However, I believe that an element of tuning is identifying poor design and architectural decisions, a skill that no tool will be able to replace for a long, long time.

Thursday, February 21, 2008

Good Rules Of Thumb For Benchmarking And Tuning

Benchmarking Goals

  • Start out with clearly defined performance goals. Woolly, ill conceived or even non-existent targets defeat the purpose of any benchmarking exercise.

  • Performance, throughput and scalability goals should be linked to the business aims of the software.

Testing Environment and Load Simulation
  • Always use data which is representative of what the application will look like in production.

  • Scale your data in order to understand how performance will be affected as the database grows, if database growth is applicable.

  • Use a realistic time distribution for when SQL statements will hit the database, i.e. ask yourself what sort of frequency the SQL statements are going to hit the database at, and how this frequency is distributed over the time window when the application will be in use.

  • Use a realistic distribution of the SQL statements that are used against the database.

  • Ensure that when benchmarking and tuning your application, your application is the only thing using your server, network and I/O infrastructure. Activity outside of your application will skew your results. SANs are brilliant at skewing results through caching: some SANs queue all I/O until the write part of the cache is flushed to disk once it becomes full, read aheads can kick in, etc . . .

  • When using load simulation tools be aware of 'think' time; if this is too low you can in fact be simulating activity which is an order of magnitude greater than what you are trying to achieve.

  • Leverage Oracle features that allow your environment to be easily reset and tests to be easily reproduced, such as flashback database.


Metrics Capture

  • Use meaningful metrics for capturing performance. The ultimate goal of your exercise will be fulfilling processing under certain conditions in order to meet certain business goals; therefore you should be able to express some of your findings in terms of figures that are meaningful to the business users of the application.

  • Hit ratios are flawed indicators of performance; there is some PL/SQL on Connor McDonald's http://www.oracledba.co.uk/ web site which illustrates how to set the buffer cache hit ratio to any value you like, supporting this point. Jonathan Lewis has made a presentation on the subject of hit ratios, in which he explains the flaws of traditional hit ratios and puts forward the "Fan Hit Ratio" as a superior performance metric; refer to http://www.jlcomp.demon.co.uk/hit_ratios.html. The crux of hit ratios being flawed is that they lose meaning: latency, to name but one thing, is not captured by a hit ratio.

  • Techniques for capturing metrics should not incur overheads so large that they skew your results.


Tuning

  • Tune what you know; this gets reiterated at every UKOUG conference during various performance and tuning related presentations, and justly so. Only tune after you have hard statistics showing where your time is going; do not blindly apply optimisation techniques without knowing where the time is being expended.

  • Make one change to your environment at a time when tuning.

  • When tuning, make changes which correspond to the scope of the problem, e.g. when tuning a query, look for changes which are within the scope of the query: object statistics, indexes etc, rather than initialisation parameter changes. There are obviously parameters that affect optimisation; however, when trouble shooting, your first port of call should always be the places which are local to the problem.

  • When gathering performance metrics and using Oracle 10g, ensure that database time is one of the metrics you collect.

  • If testing turnaround times for parts of your application are considerable, consider (if possible) breaking out certain parts of the application in order to test the effect of certain changes independently before performing the full blown test using the real software.

  • Try to understand which parts of the architecture / infrastructure will throttle the performance of your software.

  • Optimise "single stream" performance as much as possible before scaling out.

  • Once performance goals have been met, document baselines of key performance indicators (document a flight envelope) to provide a tool for production performance trouble shooting.

Wednesday, February 20, 2008

Preventing Database Abuse By J2EE Application Servers

Some time back in around 2002, Microsoft took the Sun Petstore sample application and rewrote it in .Net. The driver behind this was to demonstrate that .Net was superior to J2EE technology in terms of performance, and this "they did". This created a real furore in the J2EE architect community at the time. Further investigation into what Microsoft had actually done revealed that the .Net layer of the rewritten application was only a very thin wrapper for calls to Transact-SQL stored procedures. Here is a reference to this benchmark:-
The morals of this story are:-

1. You shouldn't take all benchmarks at face value.
2. The fastest way to manipulate data in a database is to do it as close to the data as possible, i.e. use Transact-SQL or PL/SQL stored procedures.
3. All software vendors will put spin on their benchmarks in order to make their architectures and software appear to be the fastest and most scalable on the market.

It is also worth noting that benchmarks not using your data lose some meaning when you try to apply them to your specific circumstances.
Tom Kyte once said (perhaps on his blog somewhere) that he believed all the best applications in the world used stored procedures or BEGIN ENDs (anonymous PL/SQL blocks). But what if you have an application that does not do this, which tries to do as much in the application server as possible, and you cannot leverage PL/SQL because those skills do not exist in the development team? Even if you did have PL/SQL expertise at hand, you would have to effectively re-write your application to leverage it, which you do not have the time and money for. There is still hope; I will cover some of the things you can do in this article.

The main objective is to eliminate as much latency as possible when accessing data in the database. The fastest way to do something is to not do it at all; when using Oracle this means:-
  1. Keep hard parsing to a minimum.
  2. Keep physical reads to a minimum.
  3. Keep latching to a minimum.
  4. Keep logical reads to a minimum.
  5. Accessing SQL from any language, even from PL/SQL, carries certain overheads; minimise database access from the application server in order to reduce data access latency and network round trips, assuming that the application server and database server are two separate machines.

This posting assumes the use of bean managed persistence, i.e. the JDBC API is used explicitly in the Java code, and that an Oracle JDBC driver compatible with the JDBC 2.0 standard is being used. Here are my tips for optimising a J2EE application which uses Oracle, assuming that it uses widget by widget logic (row by row processing) and you do not want to re-write the application. My points should be reasonably comprehensive, but please forgive me if I have missed anything out.

Good Coding Practises

  • Make sure your code is bind variable friendly: use prepared statements for the same statements that are executed multiple times (see the sketch after this list). Refer to this tutorial from Sun for more information on prepared statements:-

    http://java.sun.com/docs/books/tutorial/jdbc/basics/prepared.html
  • Avoid using the database if you can by caching standing (lookup) data in the middle tier.
  • If there is any application logic that purely uses standing data, after you have cached this perform the processing entirely in the middle tier so as to avoid round trips on the network.
  • If your application is based on processing numbers of discrete items, let's say 'widgets', and you have a batch process that processes 10 widgets for example, ensure that statements whose execution counts should relate to the number of widgets being processed are not executed more times than the number of widgets.
  • Look for SQL the application is generating that returns no rows or affects no rows: is this redundant SQL? Can the application logic be changed so that it does not need to be executed?
  • JDBC resources, namely result set, statement and connection handles, should always be released when they are no longer required; use v$open_cursor in order to track down cursor 'leaks'.
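A minimal sketch of both points, prepared statements with bind variables and prompt resource release, assuming an invented widgets table (try-with-resources is shown for brevity; pre-Java 7 code would use finally blocks):-

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class BindFriendlyLookup {

        // One soft-parse friendly statement instead of a fresh piece of literal
        // SQL per widget; every handle is closed as soon as it is finished with.
        static String lookupStatus(Connection con, int widgetId) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT status FROM widgets WHERE widget_id = ?")) {
                ps.setInt(1, widgetId);   // bind variable, no hard parse per value
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }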

Intelligent Use Of SQL, PL/SQL and JDBC

  • Use the JDBC batching API to execute multiple statements in one batch; this effectively puts a 'BEGIN' and 'END' around multiple statements (see the sketch after this list). The Oracle JDBC driver supports two flavours of this, the standard JDBC 2.0 interface and an Oracle proprietary interface; refer to:-

http://download-west.oracle.com/docs/cd/B19306_01/java.102/b14355/oraperf.htm

  • Look at using things such as inline functions, subquery factoring and the MERGE statement to achieve in one piece of SQL what you might otherwise require multiple statements for.
  • If you frequently create the same result set and perform application logic against it multiple times, look at aggregating this into a working table.
  • From 8i onwards Oracle has had the facility to support Java stored procedures; can some of the data intensive processing be moved closer to the database using these, i.e. leverage the existing Java skills in the team?
  • If your application is designed around widget by widget processing logic, it's probably performing a fair amount of index range scanning. Look at making these scans more efficient by rebuilding larger, more frequently accessed indexes with a larger database block size; this significantly reduces latching around the buffer cache, i.e. larger blocks = fewer logical reads, and larger leaf blocks = fewer branch blocks in the index B-tree. Also consider index compression.
  • Leverage the JDBC features for traversing result sets, if for example you use the queries:-

    SELECT empno FROM emp WHERE deptno = 10

    and

    SELECT MAX(empno) FROM emp WHERE deptno = 10

    within close proximity in your application, you might be able to do away with the second query by using the JDBC API to navigate around the result set returned by the first query.
  • Set the JDBC fetch size in order to minimise network round trips when retrieving multiple rows from a query.
  • The chances are that any batch processes in your widget by widget processing software achieve their throughput via threading; if this is the case you may experience significant log file sync waits (waits on the contents of the log buffer being written to disk). This particular bottleneck can be eliminated by using the 10g asynchronous write to redo log files feature via the COMMIT_WRITE initialisation parameter. However, be advised that there are certain resiliency ramifications to using asynchronous writes to the redo logs.
  • Use connection pooling; all J2EE application servers support JDBC connection pooling, and reusing existing connections is an order of magnitude faster than creating a connection every time a statement needs to be executed. Also set the minimum and maximum number of connections to the same value in order to prevent connection storms, i.e. bursts of requests from the application server to the database for the creation of connections.
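A hedged sketch of the standard JDBC 2.0 batching interface and of setting the fetch size, both mentioned in the list above (the widgets table and the sizes are invented; the Oracle proprietary flavour sets an execution batch value on the statement instead):-

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class BatchingAndFetchSize {

        // Standard JDBC 2.0 batching: one round trip per batch, not per row.
        static void insertWidgets(Connection con, int[] ids) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO widgets (widget_id, status) VALUES (?, 'PENDING')")) {
                for (int i = 0; i < ids.length; i++) {
                    ps.setInt(1, ids[i]);
                    ps.addBatch();
                    if ((i + 1) % 100 == 0) {
                        ps.executeBatch();   // ship 100 inserts in one go
                    }
                }
                ps.executeBatch();           // flush the remainder
            }
        }

        // Fetch size: pull 100 rows per round trip rather than the
        // Oracle driver's default of 10.
        static void scanWidgets(Connection con) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT widget_id, status FROM widgets")) {
                ps.setFetchSize(100);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) { /* process each widget */ }
                }
            }
        }
    }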

Move The Data Closer To The Application Server => Reduce Data Access Latency

  • Leverage the Oracle 11g client side result cache in order to perform SELECTs without incurring the cost of jumping out of the application server into the database. However, be aware that in 11g release 1, ANY change to the data in the base tables upon which a result set is dependent will invalidate the results in the result cache; will this restriction be lifted in 11g release 2?
  • A better option than using the result cache in 11g may be to build your own cache off the back of the database change notification mechanism available via the 11g JDBC driver for both 11g and 10g databases; see the sketch after this list.
  • If you have more exacting caching requirements, look at caching software designed specifically for this purpose, such as Oracle Coherence, IBM WebSphere DataGrid, GigaSpaces etc . . .
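A rough sketch of the change notification route, assuming the 11g JDBC driver and its oracle.jdbc.dcn package; the cache and the widgets table are invented, and error handling plus unregistration are omitted:-

    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;
    import java.util.concurrent.ConcurrentHashMap;
    import oracle.jdbc.OracleConnection;
    import oracle.jdbc.OracleStatement;
    import oracle.jdbc.dcn.DatabaseChangeEvent;
    import oracle.jdbc.dcn.DatabaseChangeListener;
    import oracle.jdbc.dcn.DatabaseChangeRegistration;

    public class NotifiedCache {

        private final ConcurrentHashMap<Integer, String> cache = new ConcurrentHashMap<>();

        void register(OracleConnection con) throws Exception {
            Properties props = new Properties();
            props.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");
            DatabaseChangeRegistration dcr = con.registerDatabaseChangeNotification(props);

            // Blow the cache away whenever the database reports a change.
            dcr.addListener(new DatabaseChangeListener() {
                public void onDatabaseChangeNotification(DatabaseChangeEvent event) {
                    cache.clear();
                }
            });

            // Associate the query with the registration, then execute it so the
            // server knows which objects to watch.
            Statement stmt = con.createStatement();
            ((OracleStatement) stmt).setDatabaseChangeRegistration(dcr);
            try (ResultSet rs = stmt.executeQuery("SELECT widget_id, status FROM widgets")) {
                while (rs.next()) {
                    cache.put(rs.getInt(1), rs.getString(2));
                }
            }
            stmt.close();
        }
    }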

Further reading:-

JDBC 2.0 Fundamentals from Sun http://java.sun.com/developer/onlineTraining/Database/JDBC20Intro/

Oracle 10.2 JDBC Developers Guide And Reference
http://download.oracle.com/docs/cd/B19306_01/java.102/b14355/toc.htm

Tuesday, February 19, 2008

Why Does The Optimizer Get It Wrong ?

    Here are some of the factors that I know of that can lead to the optimizer developing sub-optimal execution plans.

  • Stale Statistics
    When gathering statistics, knowing the rate at which the base data changes and when it changes is paramount. Some applications, for example, load temporary 'scratch' tables with data, use these tables in subsequent SQL statements and then blow the data away. Dynamic sampling can help here; however, the cost in terms of time added to the parsing of a statement due to dynamic sampling has to be balanced against the amount of time it takes to execute the query. For example, dynamic sampling may be more suitable for data warehouse and reporting applications, where parse times are a small fraction of the total SQL execution time, as opposed to OLTP type applications. Indeed, when, how often and how to gather statistics has provoked a lot of debate within the DBA community; here is the take of one expert on this very subject.

    Tables with columns generated from sequences whose statistics become stale are particularly prone to throwing execution plans out. There is a graph in Jonathan Lewis's "Cost-Based Oracle Fundamentals" book illustrating how the accuracy of predicted cardinalities degrades as values used in equality predicates get increasingly out of range of the column's high and low values.

    Dave Ensor (a stalwart of the UKOUG DBMS SIG presentation circuit) once made a comment, dubbed the statistics gathering paradox, which stated that the only time it is safe to gather statistics is when doing so makes no changes to execution plans (assuming they are optimal in the first place).

  • Even Column Data Distribution Assumption
    By default the optimizer assumes that column data is distributed evenly and uniformly; this can be corrected by the creation of histograms. However, you should be aware that if method_opt => '. . . SIZE AUTO' is used, you may find cases where histograms are created when they are not required, histograms are not created when they are needed, or histograms are created with the wrong number of buckets. I found this to be the case with a function based index used in conjunction with a LIKE predicate.


  • Getting Statistics Wrong
    Using estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE in 10g can be less than 100% accurate when determining the number of distinct values for highly skewed columns (a sketch of the brute force workaround follows below). Oracle 11g uses a new algorithm to gather more accurate statistics when auto sampling is used; refer to Greg Rahn's blog entry on 11g auto sampling.
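
    Where AUTO_SAMPLE_SIZE gets the distinct value counts wrong on a 10g system, one workaround is to pay the price of a full compute for the offending table (the table name is illustrative):

    exec dbms_stats.gather_table_stats(null,
    'MYTAB',
    estimate_percent=>100,
    cascade=>true);

    This trades a longer statistics gathering window for exact num_distinct figures, so it is best reserved for tables where sampling has demonstrably misfired.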


  • Maximum Number Of 254 Buckets Per Histogram
    This issue bites when you require skew information on more than 254 distinct values. Some material refers to "popular values" in a histogram; these are values which appear as bucket end points. In his book "Cost-Based Oracle Fundamentals", Jonathan Lewis provides an example of creating a histogram in favour of supplied popular values. A more extreme solution is to partition tables with more than 254 distinct values. A query for inspecting histogram end points follows below.
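
    To see which values the optimizer regards as popular you can inspect the end points directly; in a height balanced histogram a value that appears as the end point of two or more buckets (visible as a gap in endpoint_number) is popular. A sketch against an illustrative table and column:

    select endpoint_number, endpoint_value
    from   user_tab_histograms
    where  table_name  = 'MYTAB'   -- hypothetical table
    and    column_name = 'MYCOL'   -- hypothetical column
    order  by endpoint_number;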


  • Oracle Bugs
    All software contains bugs; even software produced by world class software producers who adhere to exacting software engineering practices and standards, such as level 5 of the Carnegie Mellon Capability Maturity Model, contains bugs, so this is not a slight on Oracle. Optimizer related bugs include:-


    • Bugs in the optimizer cardinality engine


    • Bugs in DBMS_STATS.


    • Costing bugs, plans incorrectly selected.


    • Transformations not being costed; some transformations that were never costed in 9i are costed in 10g. With every new release of Oracle, new parameters prefixed by underscores appear (hidden parameters); these usually relate to new optimizer features, which are not always costed until the next release appears. All the parameters relating to the CBO can be found by running a 10053 trace. I'm sure that as I write this, the likes of Jonathan Lewis are already on the case working out which new and hidden parameters affect the CBO in 11g and what they do.



  • Selectivity On Columns Used In Expressions
    Until you start using extended statistics with Oracle 11g, the optimizer has no way of determining the correct selectivity for expressions, e.g. TRUNC(col1) = . . . ; this can sometimes lead to sub-optimal plans. 11g allows this to be rectified using:-

    SELECT DBMS_STATS.CREATE_EXTENDED_STATS(ownname => user,
    tabname => 'MYTAB',
    extension => '(TRUNC(col1))')
    FROM DUAL;
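
    Note that creating the extension only defines a hidden virtual column; statistics still have to be gathered on it before the optimizer can use them. A minimal sketch (the method_opt shown is illustrative):

    exec dbms_stats.gather_table_stats(null,
    'MYTAB',
    method_opt=>'for all columns size auto');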



  • Incorrect Configuration Of The Cost Based Optimizer Environment
    The Oracle Real World Performance Group recommends that certain parameters affecting the cost based optimizer are left at their defaults; refer to Greg Rahn's blog:-

    "I think it’s important to understand what variables influence the Optimizer in order to focus the debugging effort. There are quite a number of variables, but frequently the cause of the problem ones are: (1) non-default optimizer parameters and (2) non-representative object/system statistics. Based on my observations I would say that the most abused Optimizer parameters are:

    OPTIMIZER_INDEX_CACHING

    OPTIMIZER_INDEX_COST_ADJ

    DB_FILE_MULTIBLOCK_READ_COUNT

    Many see setting these as a solution to get the Optimizer to choose an index plan over a table scan plan, but this is problematic in several ways:

    1. This is a global change to a local problem
    2. Although it appears to solve one problem, it is unknown how many bad execution plans resulted from this change
    3. The root cause of why the index plan was not chosen is unknown, just that tweaking parameters gave the desired result
    4. Using non-default parameters makes it almost impossible to correctly and effectively troubleshoot the root cause"
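
    A quick way to audit an instance for this class of problem is to list the optimizer parameters that have been changed from their defaults, for example via the v$sys_optimizer_env view:

    select name, value
    from   v$sys_optimizer_env
    where  isdefault = 'NO';

    Each row returned is a candidate for the question "what local problem was this global change trying to solve ?".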


  • Data Correlation / The Predicate Dependence Assumption
    Oracle always assumes that predicates are independent when calculating selectivities and cardinalities. If you have a table with two correlated columns, e.g. salary and tax bracket, and write a query with predicates referring to both, the optimizer will assume that no dependence exists between them and work out their selectivities independently, when patently they are not independent. At a very high level the selectivity of predicates ANDed together is calculated as:-

    selectivity(col1) * selectivity(col2)

    and for predicates ORed together:-

    selectivity(col1) + selectivity(col2) - selectivity(col1 AND col2)

    For example, if 10% of rows fall in the top tax bracket and 10% of rows have a salary above the threshold for that bracket, the optimizer will estimate 0.1 * 0.1 = 1% of the rows for the ANDed predicates, when the true figure is close to 10%, because the two conditions largely identify the same rows. Remedies include:-

    • In an example in "SQL Tuning By Statistics" Wolfgang Breitling fixes this by setting the column statistic density by hand.


    • Dynamic sampling, first introduced in Oracle 9i, can help with this.


    • Extended statistics in Oracle 11g, Greg Rahn provides an example of this:-
      SELECT DBMS_STATS.CREATE_EXTENDED_STATS(USER, 'CALENDAR', '(MONTH, ZODIAC)') FROM DUAL;

  • Violation Of The Join Uniformity Assumption
    The CBO assumes that there is a one to one correspondence between rows joined from one row source and rows in other row sources. I came across this on Wolfgang Breitling's web site; his paper "Fallacies Of The Cost Based Optimizer" refers to this.




Excellent Sources For Information On Tuning And The Optimizer

Histograms On Columns Containing Unique Data ?

I recently came across an oddity in 10g (10.2.0.3.0) whereby a problem with an execution plan was resolved by creating a histogram on a column containing unique data. This was the query:-





To troubleshoot the issue I ran the query with the gather_plan_statistics hint as per the SQL troubleshooting guide mentioned by Greg Rahn on his blog (www.structureddata.org). This is a technique for finding the root cause of a bad execution plan by comparing the predicted cardinality of each operation in the plan with the actual cardinality once the SQL has run; usually the difference between the predicted and actual cardinality will be out by an order of magnitude or more. I'm not sure who first came across this, but the first reference I have found to such a technique was on Wolfgang Breitling's web site (www.centrexcc.com), where Wolfgang refers to it as "Tuning by cardinality feedback".
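
For reference, the mechanics of the technique are as follows (the ". . ." stands for the statement under investigation, which I have not reproduced here):

select /*+ gather_plan_statistics */ . . .

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

The E-Rows and A-Rows columns in the resulting plan give the predicted and actual cardinalities respectively.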


Once you have obtained your 'enhanced' plan you look for the innermost operation where the predicted and actual cardinalities differ by a significant margin. In my plan this is line 12:-





The index AD_SERVICE_POINT_IX30 is a function based index and the column SYS_NC00055$ is the pseudo column associated with it. Note that the predicate used with line 12 is a LIKE, and that the statistics on this table were gathered by the built in GATHER_STATS_JOB, i.e. Oracle was left to its own devices to determine which histograms to create and the number of buckets to create them with. After working through the issue with Oracle Support, it transpired that the resolution was to create a histogram "by hand" on the function based index with the maximum number of buckets:-

exec dbms_stats.gather_table_stats(null,
'AD_SERVICE_POINT',
cascade=>false,
method_opt=>'for columns SYS_NC00055$ size 254');
This is the plan now that the histogram has been manually created:-




In fact Oracle hadn't created a usable histogram at all; there was a histogram on SYS_NC00055$, but it had only two rows in the user_tab_histograms view for the pseudo column, i.e. all the values were in one bucket, which conveys no data distribution information. The number of end points in a histogram is the number of buckets + 1; the analogy Jonathan Lewis uses to describe this in his "Cost-Based Oracle Fundamentals" book is the number of fence posts you need for a given number of fence panels. After a bit more digging by Oracle Support, the explanation for method_opt=> FOR . . . COLUMNS SIZE AUTO getting this wrong was that the LIKE predicate is irregular, and as such there was insufficient information in col_usage$ from which DBMS_STATS could determine what sort of histogram to create. col_usage$ is a dictionary table the RDBMS engine uses to record the usage of a column every time a query is parsed (a sketch of how to inspect it follows below). I assumed that "LIKE being irregular" referred to the predicate being a special case in terms of accepting wild card characters as part of its input strings.
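
For the curious, col_usage$ can be inspected directly; it lives in the SYS schema, so this needs suitable privileges. A sketch using the AD_SERVICE_POINT table from this example:

select o.name as table_name,
       c.name as column_name,
       u.equality_preds, u.range_preds, u.like_preds
from   sys.col_usage$ u,
       sys.obj$ o,
       sys.col$ c
where  u.obj# = o.obj#
and    u.obj# = c.obj#
and    u.intcol# = c.intcol#
and    o.name = 'AD_SERVICE_POINT';

The counter columns record how often each type of predicate has been parsed against the column, and it is from these that SIZE AUTO decides whether a histogram is warranted.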

I have come across a couple of references alluding to the GATHER_STATS_JOB not being infallible. Greg Rahn refers to the defaults as " . . . a good start but not perfect in specific situations . . ." in the "When To Use The Appropriate Database Technology" presentation made at OOW 2007. Richard Foote (http://richardfoote.wordpress.com/) is also not a big fan of method_opt=>' . . . SIZE AUTO', as he claims that it can lead to histograms being created when they shouldn't be and to the omission of histograms when they are genuinely required; this, Richard goes on to say, can lead to excessive latching around the shared pool, and there is also the inference that it can lead to sub-optimal plans.
