
Friday, December 26, 2008

Nuances Of JDBC Connection Pools

There was a particular piece of code I was tuning a while back which highlighted some of the quirks of JDBC connection pooling. A piece of PL/SQL created and executed an INSERT ... SELECT statement based upon the parameters it was called with from a WebSphere application server. The statement was bind friendly, but due to bind peeking a lot of on-the-fly re-optimizations were taking place, which hurt performance.

A common misconception with Oracle is that execution plans are only re-generated when a hard parse takes place or when an execution plan is aged out of the shared pool. This is not the case: v$sql_shared_cursor gives numerous reasons why re-optimizations, and the generation of child cursors, take place. However, as per a previous posting I made, this view does not capture bind peeking activity.

As a quick fix, and yes I know this solution was only really a crutch, I placed an execute immediate statement at the top of the code to set cursor_sharing to force. I could also have fixed this using the hint opt_param('_optim_peek_user_binds', 'false'). With the benefit of hindsight, and the rule of thumb that the scope of the fix should match the scope of the problem, that may have been the better way to go.

I did not realise at the time that this alter session would affect other SQL statements executed by the application server that reused this particular connection from the pool, with disastrous results for performance. In particular, a batch process which executed the same statement with different binds multiple times was not being re-optimized at execution time, when in fact Oracle had just cause to do this. Fortunately this particular scenario was constrained to one test environment I was working on. The resolution to the problem was to put another execute immediate statement at the bottom of the stored procedure to set cursor_sharing back to what it was set to before.
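
To make the nuance concrete, here is a minimal JDBC sketch of how the setting escapes the code that made it; the package, procedure and DataSource names are hypothetical stand-ins for the real application code:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import javax.sql.DataSource;

public class CursorSharingLeakDemo {

    // ds is assumed to be the application server's connection pool, looked up via JNDI elsewhere
    static void runReport(DataSource ds) throws Exception {
        Connection con = ds.getConnection();   // borrow a pooled connection
        try {
            // The (hypothetical) stored procedure issues
            // EXECUTE IMMEDIATE 'ALTER SESSION SET cursor_sharing = FORCE'
            // before building its INSERT ... SELECT statement.
            CallableStatement cs = con.prepareCall("{call report_pkg.load_report(?)}");
            cs.setInt(1, 42);
            cs.execute();
            cs.close();
        } finally {
            // close() only returns the handle to the pool; the ALTER SESSION
            // setting stays with the underlying physical session
            con.close();
        }
    }

    static void runBatch(DataSource ds) throws Exception {
        // A later borrower of the same physical connection now runs with
        // cursor_sharing = FORCE, even though it never asked for it.
        Connection con = ds.getConnection();
        try {
            // ... statements executed here are no longer re-optimized as expected
        } finally {
            con.close();
        }
    }
}
```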

A popular technique to avoid database access and to solve mutating trigger problems is to use a PL/SQL array declared in a PL/SQL package specification. Indeed, the other nuance of JDBC connection pooling is that if an SQL call from the Java code initialises such a variable, and the connection is then released in the Java code, subsequent calls from the same Java thread may not necessarily reuse the same connection from the JDBC connection pool and thus may not see the data cached in the PL/SQL package.
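
Here is a minimal sketch of that second nuance; again the package, procedure and DataSource names are hypothetical:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.Types;
import javax.sql.DataSource;

public class PackageStateDemo {

    static void demonstrate(DataSource ds) throws Exception {
        // First call: populates a PL/SQL array declared in a package
        // specification, i.e. state held in *this* database session only.
        Connection first = ds.getConnection();
        try {
            CallableStatement load = first.prepareCall("{call lookup_pkg.load_cache}");
            load.execute();
            load.close();
        } finally {
            first.close();   // handle released; the physical session goes back to the pool
        }

        // Second call, same Java thread: the pool is free to hand out a
        // different physical connection, whose lookup_pkg state is empty.
        Connection second = ds.getConnection();
        try {
            CallableStatement read = second.prepareCall("{? = call lookup_pkg.get_value(?)}");
            read.registerOutParameter(1, Types.VARCHAR);
            read.setString(2, "SOME_KEY");
            read.execute();
            System.out.println(read.getString(1));   // may be null: new session, empty cache
            read.close();
        } finally {
            second.close();
        }
    }
}
```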

Good Java programming practice dictates that you should always lazily grab resources and then release them as soon as they are no longer required, increasing the likelihood that threads may use many different connections from the JDBC connection pool. Also note that when obtaining a JDBC connection in Java code, what you are in fact doing is obtaining a handle to a connection from the pool. New connections to the database should only be made:-

1. After the application server starts up

2. When the minimum and maximum connection settings for the pool are not the same and the application server needs to grow the pool up to the maximum setting.

A Presentation On Troubleshooting Bad Execution Plans In Oracle

Oracle Sql Tuning

Tuesday, December 23, 2008

Performance Bottlenecks In The J2EE App Server That Can Throttle The Database

I mentioned in my post about layered architectures that I would discuss factors that can throttle throughput from the J2EE application server to the database (or persistence layer). To start with, I will focus on the ethos of designing distributed object architectures, in the words of Martin Fowler:-
". . . Hence, we get to my First Law of Distributed Object Design: Don't distribute your objects!"

Distributing components between J2EE application servers in the J2EE world involves the use of remote method invocation (RMI) calls to remote interfaces, assuming we are talking about synchronous communication here. Remote method calls to beans deployed to remote servers incur the following performance overheads:-
  • Copy by value

  • Network latency

  • Every incoming RMI call needs to be serviced by its own Java thread

  • Servicing incoming connections from the TCP/IP stack causes context switches

The original text behind this design ethos appears in Martin's book "Patterns Of Enterprise Application Architecture"; there is also a very similar article that Martin has published in Dr Dobb's Journal which can be found here.

If all the beans are deployed to the same application server, this will eliminate all the bottlenecks except for that of copy by value. However, as you will see, the overhead of pass by value is still significant: it will both increase heap utilisation and extend the part of major garbage collection that looks for unreachable objects on the heap. On the last project I worked on at the time of writing this, we found that:-

  • The 10g database time (not to be confused with wall clock time) was low.

  • Average number of sessions from ADDM reports was low.

  • The CPU on the WebSphere server was maxed out.
Profiling the code with a Java profiler revealed that 50% of the CPU time was being expended on a method with a name like com.ibm.CORBA.util.copyObject. This was because the business logic or domain beans were calling the database access beans via remote interfaces, which use pass by value.

For the benefit of anyone reading this who is not that familiar with the J2EE world, I will elaborate on pass by reference, pass by value, and remote and local homes. One of the original design goals behind the J2EE standard was to present a framework in which distributed applications could easily be produced; this was based on CORBA and the synchronous communications method was RMI. Prior to the EJB 2.0 standard, it was assumed that when one enterprise bean called another bean, the 'other' bean was always remote, hence a remote interface and copy by value were always used. However, as time went by people realised that, despite this, a lot of applications used beans that were deployed to the same J2EE app server and hence suffered this performance penalty for no added benefit. Thus, in the EJB 2.0 standard 'local' interfaces came about. Beans talking to each other using local interfaces use the much more efficient pass by reference method of passing object references rather than whole objects by value.
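
To illustrate the difference, here is a minimal EJB 2.x style sketch; the bean and method names are hypothetical. A call through the remote component interface marshals its arguments and result by copy, whereas a call through the local interface passes references within the same JVM:

```java
import java.rmi.RemoteException;
import javax.ejb.EJBLocalObject;
import javax.ejb.EJBObject;

// Remote component interface: arguments and return values are copied
// (serialised) on every call, even when caller and bean happen to share a JVM,
// unless the ORB is configured otherwise.
interface PricingService extends EJBObject {
    double priceFor(String productCode, int quantity) throws RemoteException;
}

// Local component interface (EJB 2.0 onwards): in-VM calls only, object
// references are passed rather than copies, so there is no serialisation cost.
interface PricingServiceLocal extends EJBLocalObject {
    double priceFor(String productCode, int quantity);
}
```

Co-located callers that are moved onto the local interface avoid the copyObject cost without any ORB level changes.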

Fortunately, with WebSphere there is an option available on the Object Request Broker (ORB) to turn pass by reference 'On'. This forces all the beans deployed to the same EJB container using the same class loader to use pass by reference, irrespective of whether they have been coded to use remote interfaces. In the case of the application that I was looking at, this had the rather large impact of:-

  • Reducing batch process run times by 50%.
  • Reducing CPU consumption on the server hosting the WebSphere application server by 50%.
  • Doubling the 10g db time that a batch process spent in the database.

  • Increasing the average number of sessions active whilst the batch process was running; again this is an Oracle 10g time metric, which is the database time divided by the actual elapsed "wall clock" time.

Setting the class loader policy for the application server to 'single' also helps. In tuning WebSphere, I also looked at the:-

  • server JVM

  • client JVM

  • ORB, with regard to setting up a JNI reader thread pool and the fragment size

  • JDBC pool: the size of the pool, pool utilization and the cached statement discard rate

  • EJB container, specifically the EJB cache size

  • Log4j logging level

  • number of batch process threads

  • turning off the diagnostic trace
To the best of my knowledge, the only thing I didn't look at was turning the PMI off completely. For both the client and server JVM, I found that using the JVM options as used by IBM in a Solaris SPECjAppserver2004 submission yielded the best results. Refer to this link for a blog on which some of the IBMers who carry out this work write, in particular Andrew Spyker, WebSphere performance architect.

However, out of all the things I looked at, I got the biggest bang for my buck from turning pass by reference to 'On', by a long chalk. This comes with a word of caution though: if your software is designed such that beans that receive RMI calls modify objects and the calling bean is not expecting this, wild and erratic application behaviour can result. It is for this very reason, or so I believe, that pass by reference is now turned 'Off' by default on WebSphere 6.1.

This brings me on to J2EE clustering and shared nothing architectures. WebSphere Network Deployment, the clustered version of WebSphere, uses a shared nothing architecture, similar to that discussed in this article here, which again re-iterates the point about distributed architectures and performance:-

"The most scalable architectures are sharing-nothing clusters. In such clusters, each node has the same functionalities, and knows nothing about the existence of other nodes. The load balancer will decide how to dispatch the requests to the back end server instances. It is rare that a load balancer will become a bottleneck, because a load balancer only need to do some very simple jobs, such as distributing requests, health checking, and session sticking. By adding more nodes, the computing ability of the cluster is increasing almost lineally, if the back end database or other information systems are powerful enough."

With WebSphere Network Deployment, the same ear file is deployed to each node in the cluster, or cell, to use IBM terminology, and workload distribution is managed in a round robin fashion via the workload manager. WebSphere eXtreme Scale adds some extra intelligence to this by distributing the workload based on the current available capacity of the nodes in the cluster.

What exacerbated my original problem of the massive copy by value overhead was the batch processing design, which fundamentally processed the workload item by item. Because Oracle is called for every individual item in the workload, there is heavy traffic between the business logic layer and the persistence layer. Accessing a database item by item is really inefficient (see the sketch after this list) in terms of:-

1. Hard and soft parsing; in particular I found that setting the batch process thread count above 32 stressed the library cache pin mutex. Tanel Poder has written a good article on the library cache pin mutex and the extended use of mutexes in 11g here.

2. Latching

3. Network round trips and network latency

4. Physical and logical reads

5. JDBC calls
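
As an illustration of points 1, 3 and 5, here is a minimal JDBC sketch contrasting item-by-item execution with statement batching; the table and column names are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class ItemWriter {

    // Item-by-item: the statement is prepared and executed once per element of
    // the workload, i.e. a parse call and a network round trip for every item.
    static void writeOneByOne(Connection con, List<Integer> itemIds) throws Exception {
        for (Integer id : itemIds) {
            PreparedStatement ps =
                con.prepareStatement("INSERT INTO work_item_results (item_id) VALUES (?)");
            ps.setInt(1, id);
            ps.executeUpdate();   // one round trip per item
            ps.close();
        }
    }

    // Batched: the statement is prepared once and the binds are shipped in bulk.
    static void writeBatched(Connection con, List<Integer> itemIds) throws Exception {
        PreparedStatement ps =
            con.prepareStatement("INSERT INTO work_item_results (item_id) VALUES (?)");
        try {
            for (Integer id : itemIds) {
                ps.setInt(1, id);
                ps.addBatch();
            }
            ps.executeBatch();    // one round trip for the whole batch
        } finally {
            ps.close();
        }
    }
}
```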

Some people get very agitated about batch process designs that process the workload item by item, and in principle this results in certain inefficiencies when accessing the database. However, the performance targets for the batch processes were the same as those for the system this new one was replacing, but with 200% of the workload and two years' worth of "database aging", i.e. the database had been artificially aged to give the appearance of the application having been in use for two years. In summary, all batch processes beat their counterparts in the original application by at least 100%.

The morals of this story are:-

1. Distributed object architectures are bad for performance

2. Your application needs to be profiled in order to work out where the time is going, in the example I have highlighted, the database was never the bottleneck.

3. A holistic approach to looking at the architecture needs to be taken with regard to achieving good performance and scalability.

I will get around to discussing approaches on alleviating the stresses that item by item oriented batch processing imposes on the Oracle database in a future post.










Monday, December 01, 2008

J2EE Layered Application Architectures and Performance

I have spent most of my career as a DBA focusing on performance. With most applications there is an intimate relationship between the design and architecture of the application and its performance and scalability. Generally speaking, if the design and architecture are wrong, getting an application to perform and scale can be akin to applying Elastoplast over the biggest holes.

As an industry, IT has a patchy record for delivering projects on time and to budget; as such there are many projects out there that barely manage to cross the finishing line having met all of their functional requirements, let alone their performance requirements. Due to this, tuning can be reactive and end up becoming a battle against the architecture and design.

There was a time when most applications consisted of a client and business logic written directly in the database using stored procedures. Indeed the database software formed the largest part of the application stack. However, more recent architectural trends have been heavily influenced by:-

  • The internet, Web 2.0 has firmly established itself and Web 3.0 is around the corner

  • Service Oriented Architectures (SOA)

  • Software as a service (SAAS)

  • Event driven architectures
  • Extreme transaction processing
  • Grid architectures
  • Mash ups

Indeed, there was a time when the database was a focal point of integration efforts; now the industry is moving towards integration via XML based middleware. Here is one such article on this very subject from design and architecture luminary Martin Fowler. In the current climate "integration architects" are highly sought after people.

Cloud computing is on its way, and just as during the old Web 1.0 days, to quote a marketing tag line from Sun, "We were into intranets when most people thought this was a typo", there will be internal clouds as well. Therefore, if you have an interest in performance and tuning, it is highly likely that you will have some involvement in areas outside the database and find that the database is being under-utilised because of what is happening in the layers above it. Cary Millsap made a presentation at Oracle Open World 2007 highlighting the importance of establishing application profiles when tuning.

One of the most prevalent software architectures today uses a combination of layering and tiering, a tier being a layer that is physically deployed on a separate server. What is paramount about the tiering / layering design approach is that it provides a "clean separation of concerns". Layering can also be vertical, i.e. swathes of development for particular functional areas which cut through the horizontal layers. Classically, the 'horizontal' layers are:-

  • Presentation

  • Business logic (or domain)

  • Integration

Architects in the finance world prefer tiers, in that each tier can reside in its own DMZ (de-militarised zone) and hence provide maximum security. My interest in application architecture has developed from the bottom up, i.e. from the database upwards. However, the integration layer is not always an RDBMS, it can be:-

1. A web service

2. A message queue

3. A distributed object cache (which may be transactional)

4. An enterprise service bus adapter or message broker

5. A J2EE Connector Architecture (JCA) connection

6. etc . . .

In fact with XA distributed transactions, a transaction can span multiple end points below the integration layer.

Some patterns are almost de facto in certain layers and tiers, such as model view controller in the presentation tier and session facade in the business logic or domain tier, to name but two. Also, there are frameworks and third party libraries that take a lot of the effort out of having to crank out boilerplate code and the code behind the application infrastructure, e.g.:-


1. Apache Struts and JavaServer Faces for the presentation layer

2. Spring for the whole application or specific layers

3. Log4j for logging

4. Apache Commons, which is a common utilities library

However, whatever you do, you should never code business logic into the presentation layer or explicitly code details of the integration layer into the business logic or domain layer.

When looking at the performance of such architectures one needs to establish a profile of where particular activities spend their time within the various layers or tiers, and in the networks connecting any tiers. If a J2EE application server is involved in the application stack, something known as the "carrot model" should be understood. All J2EE application servers use the following set of components, which form a pipeline from the 'front' of the application server to the 'back':-

1. Object request broker used for J2EE clients.

2. Web container

3. Enterprise Java Bean Container

4. JDBC Pool or JCA connectors.

The WebSphere application server has an HTTP server built into the web container; however, this is only really recommended for testing purposes, so you will probably have an HTTP server outside of the J2EE application server. What the "carrot model" represents is that component utilisation is usually quite high at the 'front' of the application server and gradually dwindles off as you get towards the JDBC connection pool or JCA connectors, hence the analogy of a carrot, thick at one end and thin at the other. Note that the ORB is only used for J2EE clients; web clients will utilise the web container HTTP server or the external HTTP server directly.

A lot of texts refer to the effort and careful consideration that needs to go into designing the layers of the application's architecture. I would add that the same level of consideration needs to be given to where processing should be carried out. In the simplest of architectures, with a J2EE application server on one machine and a database on another, there is still scope for the network to become the bottleneck through excessive calls being made to the database to carry out the processing of business logic and also standing data lookup. In one example, a stateless bean used to access fixed domain data was the most utilised bean in the application until a standing data cache in the application server was implemented. In another example, a stored procedure was being called to validate individual items within individual records; this was replaced by validation using fixed domain data cached in the application server. By carrying out the validation entirely within the application server, the process concerned ran faster and required fewer threads than before, with lower CPU consumption on both the server hosting the application server and the server hosting the database.
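
As an illustration of the standing data cache idea, here is a minimal sketch, with hypothetical class and data names, of a read-only reference data cache held in the application server, loaded once and consulted in memory thereafter:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public final class CountryCodeCache {

    // Published once via load(); reads never touch the database afterwards.
    private static volatile Map<String, String> codes = Collections.emptyMap();

    // Called at start-up (or on a refresh event) with rows read from the
    // standing data tables.
    public static void load(Map<String, String> fromDatabase) {
        codes = Collections.unmodifiableMap(new HashMap<String, String>(fromDatabase));
    }

    // In-memory lookup used by validation logic instead of a database round trip.
    public static boolean isValidCountry(String code) {
        return codes.containsKey(code);
    }
}
```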

There is equal scope for poor performance and scalability to be present by design with vertical layering also. One particular design trend is to develop specific functional areas with an API implemented using a session facade. Tools such as Apache Axis, and also the J2EE application server itself, make exposing stateless session beans as a web service straightforward. Consider two separate beans used to implement the APIs for two different functional areas: if these areas are tightly coupled and they are to be used in a service oriented architecture, there is going to be a significant amount of XML to Java serialisation taking place involving the two respective web services, which might be avoided if the API were provided by a single bean.

I strongly advise people to read the IBM Redbook "Designing and Coding Applications for Performance and Scalability in WebSphere Application Server". This is one of the best texts I have come across in the area of designing and writing scalable and well performing J2EE applications. If I had my way, it would be mandatory reading material for all developers, designers and architects on J2EE development projects.
In the not too distant future I will post an article on my experiences of the specific things that can stifle utilisation of the database server when a J2EE application is used.
 