
Wednesday, July 15, 2009

SQL Server Clustered Tables versus Oracle Index Only Tables

A SQL Server clustered table and an Oracle index organised (index only) table are essentially the same thing: a table that, instead of being heap organised, is stored in sorted order. Oracle uses a B*Tree; SQL Server uses a B-tree whose leaf pages are chained together in a doubly linked list. But here is the interesting bit: in Oracle, due to the expense of secondary index lookups, the use of index organised tables is the exception rather than the rule; however, if you refer to this presentation from Kimberly Tripp, in SQL Server the use of clustered indexes is encouraged a lot more than their counterparts are in Oracle. I can only assume that this is because SQL Server does not have an equivalent to Oracle's table lookup by rowid. In short, this took me a bit by surprise.
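For anyone who has not come across index organised tables, a minimal sketch (table, column and index names are illustrative); the second statement shows the kind of secondary index whose lookup expense is mentioned above:

CREATE TABLE orders_iot (
  order_id    NUMBER,
  customer_id NUMBER,
  order_date  DATE,
  CONSTRAINT orders_iot_pk PRIMARY KEY (order_id)
) ORGANIZATION INDEX;

-- Secondary indexes on an IOT store logical rowids ('guesses') rather
-- than physical rowids, hence the extra cost of secondary index lookups.
CREATE INDEX orders_iot_cust_ix ON orders_iot (customer_id);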

Tuesday, July 14, 2009

Sane San . . . Sane Virtualization ?

James Morle published an article some time ago entitled "Sane SAN"; it discussed the practicalities of allocating database storage from a SAN. One of the points it touched on was the danger of being given an amount of storage without knowing how many physical spindles it was spread across. This was before virtualization took off; the problem now is that there may be several layers of virtualization between you and the physical disks. For example, Hewlett Packard produce something called the Enterprise Virtual Array, which aggregates trays of storage into virtual arrays. With another product called SVM (which, off the top of my head, stands for something like Storage Virtual Manager), these virtual arrays can be pooled into virtual SANs.

On the subject of virtualization, the "Oracle Storage Guy" compares the price performance of Oracle RAC against Oracle Standard Edition on clustered VMware; with some caveats, clustered VMware comes out the winner. This provides serious food for thought as to where in the technology stack it is best to carry out the clustering.

Tuesday, March 03, 2009

Oracle 11.1.0.7.0 and cell physical IO interconnect bytes

I've been playing around with 11.1.0.7.0 and noticed the system statistic "cell physical IO interconnect byt" (presumably a truncation of "cell physical IO interconnect bytes") in a statspack report. There are no references to this statistic on Metalink whatsoever; however, there is a reference on Kevin Closson's blog stating that it is an Exadata related statistic. Not that I dispute Kevin's information, but I'm not using Exadata, so this is something I'll try to bottom out with Oracle support.

Monday, February 16, 2009

SQL Server Cross Training For Oracle Professionals

This is a presentation I prepared for some of my colleagues at NSB Retail Systems before we became part of British Telecom. The aim was to provide cross training into SQL Server for Oracle professionals. The information was correct as of SQL Server 2000; since then SQL Server has evolved to incorporate the following changes (to name but four), however the core concepts are still very much the same:-
  • there have been numerous Transact-SQL enhancements
  • the internals have been exposed somewhat via dynamic management views (see the sketch below)
  • Query Analyzer, Enterprise Manager and OLAP Analysis Manager have been consolidated into Management Studio
  • there is a new database tuning advisor
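As a small taster of the second point, a hedged T-SQL sketch (SQL Server 2005 onwards) listing the top statements by cumulative CPU time:

SELECT TOP 10
       total_worker_time,  -- cumulative CPU time, in microseconds
       execution_count,
       sql_handle
FROM   sys.dm_exec_query_stats
ORDER  BY total_worker_time DESC;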
SQL Server Training (presentation on SlideShare, posted by chris1adkin)

Oracle, DB Time and Performance Baselines

I've been tasked at work with investigating performance baselines. Fortunately, as we use 10g and have paid for the diagnostics and tuning packs, baselines (or preserved snapshot sets) can easily be created with dbms_workload_repository.create_baseline. However, there are a couple of things that would be really nice to have that are not available, namely:-

1. The ability for Enterprise Manager to generate notifications when the db time of a database deviates from that of a specified baseline by a specified threshold. This can be worked around via the custom metrics that the Enterprise Manager framework allows you to create. I'm a tad surprised that this functionality does not exist, as db time is the cornerstone of the Oracle performance infrastructure from 10g onwards.

2. Consider the scenario where you have an existing production database and you want to test the impact of adding new functionality and / or adding an increased load to the production database. Obviously you don't want to do this without some prior testing; fortunately you have a test environment with hardware that is identical to production. You test the changes on test and, after some tuning, deem them satisfactory to go into production; you have also used dbms_workload_repository to create a baseline on test. At this stage it would be really nice to have the ability to export the baseline from the test environment and import it into production. This ability does exist but is not documented. Here is the solution as proposed by Oracle support; the speed at which they came back suggests this is something other people have asked for before. Here it is:-


Though AWR data can be exported and imported for transporting it into other databases, this is not supported by Oracle and is to be used only for troubleshooting purposes.

This is implemented with an internal package which is created with the script awrextr.sql
which is under $ORACLE_HOME/rdbms/admin

The steps to follow are

To download the data into a data pump dumpfile, you would use the procedure AWR_EXTRACT:

begin
  DBMS_SWRF_INTERNAL.AWR_EXTRACT (
    dmpfile => 'awr_data.dmp',
    dmpdir  => 'TMP_DIR',
    bid     => 302,
    eid     => 305);
end;
/

dmpfile => The name of the target file for the data. This is a Data Pump export
file. If no filename is given, the default value awrdat.dmp is used.

dmpdir => The directory object where the dumpfile is written. In this case, you may have defined
a directory TMP_DIR as /tmp.

bid => The snapshot ID of the beginning snapshot of the period.

eid => The end snapshot ID.

Once you have dumped the AWR data to the dumpfile awr_data.dmp, move it to the new location and load it
using another procedure in the same package.

First we need to load the AWR data into a temporary staging area before loading it into the SYS schema. In this case, the data
will be loaded into the AWR_TEST schema initially.

begin
  DBMS_SWRF_INTERNAL.AWR_LOAD (
    SCHNAME => 'AWR_TEST',
    dmpfile => 'awr_data',
    dmpdir  => 'TMP_DIR');
end;
/

After staging, the data is moved into the SYS schema:

begin
  DBMS_SWRF_INTERNAL.MOVE_TO_AWR (SCHNAME => 'AWR_TEST');
end;
/

Once this is completed, the AWR data can be used for any performance analysis, or you can build a central repository of
AWR data collected from multiple databases.
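Going back to dbms_workload_repository, a minimal sketch of creating the preserved snapshot set (baseline) on the test system in the first place, using the same illustrative snapshot IDs as the export example above:

begin
  DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE (
    start_snap_id => 302,
    end_snap_id   => 305,
    baseline_name => 'pre_go_live_baseline');  -- baseline name is illustrative
end;
/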

Monday, February 02, 2009

The Death Of ESBs

I've just watched "Does My Bus Look Big In This" on InfoQ; it suggests that by adopting Enterprise Service Buses we are effectively brushing our spaghetti under the carpet by putting it all in "one box". It also makes a compelling case for "Guerilla SOA", i.e. picking the small winnable integration battles rather than a monolithic campaign of putting in an ESB and all of its associated middleware. I wonder if many of the big ESB and middleware players are cognisant of the fact that there might be a major sea change coming in the world of SOA.

Saturday, January 24, 2009

Efficient Oracle Database Usage By Java and J2EE Applications

In case you haven't guessed, I like producing presentations. In the next two weeks or so, I intend to produce a new presentation on the "Efficient Use of Oracle By J2EE and Java Applications". To whet the appetite, this will include:-

1. Connection management best practices:-
  • Connection pool usage
  • Setting the connection pool min and max settings to be the same to avoid connection storms
  • Oracle 11g server side connection pooling
  • Using 11g features to throttle connection rates

2. Efficient cursor management
  • The Statement vs PreparedStatement API
  • Avoiding hard parsing
  • Avoiding soft parsing
  • Parameters that affect parsing: session_cached_cursors, cursor_space_for_time and cursor_sharing

3. Minimizing network round trips and leveraging bulk operations
  • The JDBC fetch size
  • Statement batching in JDBC
  • The Oracle array interface
  • Leveraging Oracle bulk operations
  • Intelligent Oracle statement feature usage, specifically usage of: merge statements, in line views, in line PL/SQL functions, sub query factoring
  • The client side result cache
  • Programming and design styles for minimizing application 'Chatter'
  • Caching

4. Designing and coding according to the principle of putting the processing close to the resource that requires it, specifically:-
  • Do not be precious about Java or the database; it is not necessarily a good thing to do everything in the database.
  • Do not re-invent database functionality in Java

5. Best practices for redo log buffer usage
  • JDBC auto commit.
  • Batch and asynchronous commits as introduced in Oracle 10g.

Watch this space . . .

Friday, January 23, 2009

J2EE Batch Processing Design and Tuning presentation here !!!

As promised, here is my presentation on J2EE batch processing. There may be material on the Internet and in the blogosphere that covers some of the individual areas I touch on in greater detail. However, as a piece of material that aggregates this in one place, I don't think it's bad.


Wednesday, January 21, 2009

Presentation On J2EE Batch Processing and Tuning

I have spent much of my time recently on a project focusing on the volume and performance testing of J2EE (WebSphere application server) batch processes using an Oracle 10g database for persistence. I will shortly be posting, via slideshare.net, a presentation covering some salient points that emerged from this work.

I wrote the presentation for several reasons:-

1. To capture key lessons learned on the project from a technical point of view.

2. To capture information on batch process design considerations as there is not much information in the public domain.

3. To produce some "joined up" information on tuning J2EE software using Oracle, mainly because most J2EE material stops short of the database at the JDBC level, and most database tuning material, coming from the other direction, stops short of the application server at the same point. There is material "out there", but not in one place and presented in a manner that covers the whole software stack.

In this day and age most databases are 'fronted' by an application server, and we have long since moved on from all the business logic residing within the database and the database being the main focal point of integration. Very few applications are islands, and the 'glue' used to join applications up is almost as important as the applications themselves; therefore a more all encompassing approach needs to be taken when tuning and performance monitoring the software stack.

These are the main points I will cover:-

1. Batch process architecture and design considerations.

2. The batch process design used on my project

3. Available monitoring and tuning tools for tuning J2EE architectures using Oracle.

4. Tuning findings

5. Conclusions

As everyone always likes to skip to the end of a book before reading the rest, I will touch on some of the conclusions from this work, namely:-

1. Why Martin Fowler's first law on distributed Object architectures is well founded.

2. Dangers of swamping a J2EE application server with too many threads.

3. Avoiding 'Chatty' designs.

4. Although row by row processing is a concern, it is not always the end of the world in terms of performance; it is only a genuine concern if it prevents performance and scalability targets from being reached.

Thursday, January 15, 2009

The Basics That Undermine RAC Scalability And Availability

I'm writing this post as one of the things I hope to get into in the near future is Oracle RAC. There are two reasons for adopting Oracle RAC:-

1. Scalability

Cluster interconnect traffic and performance is a major factor here; as such, InfiniBand is being touted as the up and coming interconnect fabric. Also, if you google "Joel Goodman Oracle", you will most probably find references to presentations and papers on how uncached sequences, and sequences without NOORDER specified, can degrade performance via the pinging of blocks across the interconnect (see the sketch after the list below). Other than this, the same factors that influence performance and scalability for a single instance application also apply to RAC. Specifically, these are the "usual suspects" that the Oracle Real World Performance group include in most of the material they present at Oracle Open World, namely:-

1. Good schema design
2. Efficient connection management
3. Efficient cursor management
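To illustrate the sequence point flagged above, a hedged sketch; the sequence names are illustrative:

-- RAC friendly: each instance caches its own range of values, so nextval
-- does not force cross instance coordination (values will interleave
-- rather than arrive in strict order across instances).
CREATE SEQUENCE order_id_seq CACHE 1000 NOORDER;

-- RAC unfriendly: every nextval forces cross instance coordination, with
-- the attendant interconnect traffic.
CREATE SEQUENCE audit_id_seq NOCACHE ORDER;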

Therefore, my first question is this: how many RAC sites test their applications in single instance environments for scalability around these three factors? Also, how many sites test the scalability of their clusters as they add nodes? Interestingly, according to Julian Dyke's book on 10g RAC and Linux, Oracle actually turn cache fusion off in their RAC TPC benchmarks.

A further recommendation is that your application has some intelligence when connecting to database services, such that the workload directed at specific nodes leads to a minimal amount of cluster interconnect traffic. Without any evidence to back this up, there are probably applications out there which use tables to implement what are best described as application level semaphores, or tables which occupy very few blocks with flag like columns; placed on a RAC cluster, these will slay the interconnect.

2. High availability

It is recommended that there is adequate capacity in the cluster to soak up the workload created when one or more nodes fail. Technical issues aside, a cluster is not a panacea for high availability, and it can be undermined by several factors:-
  • Poor testing of changes applied to a cluster.
  • Lack of a realistic test cluster.
  • A lack of tight change management procedures and processes.
In short, all of this begs the question: how many RAC sites carry out all of this work when implementing RAC? There is no way of knowing; suffice it to say that the answer is probably less than 100%.

Friday, December 26, 2008

Nuances Of JDBC Connection Pools

There was a particular piece of code I was tuning a while back which highlighted some of the quirks of JDBC connection pooling. A piece of PL/SQL created and executed an INSERT SELECT statement based upon the parameters it was called with from a WebSphere application server. The statement was bind friendly, but due to bind peeking a lot of on the fly re-optimizations were taking place which hit performance.

A common misconception with Oracle is that execution plans are only re-generated when a hard parse takes place or when an execution plan is aged out of the shared pool. This is not the case: v$sql_shared_cursor gives numerous reasons why re-optimizations, and the generation of child cursors, take place. However, as per a previous posting I made, this view does not capture bind peeking activity.
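As a hedged sketch, listing the recorded non-share reasons for one parent cursor's children (the sql_id is an illustrative value):

SELECT *
FROM   v$sql_shared_cursor
WHERE  sql_id = 'abc123xyz0001';  -- illustrative sql_id
-- Each 'Y' column against a child cursor flags a reason why an existing
-- child could not be shared and a new one had to be built.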

As a quick fix, and yes I know this solution was only really a crutch, I placed an execute immediate statement at the top of the code to set cursor_sharing to force. I could also have fixed this using the hint opt_param('_optim_peek_user_binds', 'false'); with the benefit of hindsight, and the rule of thumb that the scope of a fix should match the scope of the problem, this may have been a better way to go.

I did not realise at the time that this alter session would affect other SQL statements executed by the application server when it reused this particular connection from the pool, with disastrous results for performance. In particular, a batch process which executed the same statement with different binds multiple times was not being re-optimized at execution time, when in fact Oracle had just cause to do this. Fortunately, this particular scenario was constrained to one test environment I was working on. The resolution was to put another execute immediate statement at the bottom of the stored procedure to set cursor_sharing back to what it was set to before, along the lines of the sketch below.
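A minimal sketch of the final shape of the fix, assuming the session had been running with the default cursor_sharing = exact; the procedure, table and column names are hypothetical:

CREATE OR REPLACE PROCEDURE load_batch (p_batch_id IN NUMBER) AS
BEGIN
  -- Quick fix: stop bind peeking induced re-optimization for this statement.
  EXECUTE IMMEDIATE 'ALTER SESSION SET cursor_sharing = FORCE';

  INSERT INTO order_summary
  SELECT customer_id, SUM(order_value)
  FROM   orders
  WHERE  batch_id = p_batch_id
  GROUP  BY customer_id;

  -- Crucial: restore the setting before the pooled connection is reused.
  EXECUTE IMMEDIATE 'ALTER SESSION SET cursor_sharing = EXACT';
END;
/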

A popular technique for avoiding database access, and for solving mutating trigger problems, is to use a PL/SQL array declared in a PL/SQL package specification. Indeed, the other nuance of JDBC connection pooling is that if an SQL call from the Java code initialises such a variable, and the connection is then released in the Java code, subsequent calls from the same Java thread may not necessarily reuse the same connection from the JDBC connection pool, and thus will not see the data cached in the PL/SQL package (see the sketch below).
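A minimal sketch of the pitfall, with illustrative names; package state is private to each database session, i.e. to each pooled connection:

CREATE OR REPLACE PACKAGE region_cache AS
  g_region_name VARCHAR2(30);  -- per session package state
END region_cache;
/
-- Java call 1 sets region_cache.g_region_name via connection A, then
-- returns the connection to the pool. Java call 2, even from the same
-- Java thread, may be handed connection B, where g_region_name is NULL.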

Good Java programming practice dictates that you should always lazily grab resources and then release them as soon as they are no longer required, increasing the likelihood that threads will use many different connections from the JDBC connection pool. Also note that when obtaining a JDBC connection in Java code, what you are in fact doing is obtaining a handle to a connection from the pool. New connections to the database should only be made:-

1. After the application server starts up

2. If the minimum and maximum connection settings for the pool are not the same and the application server needs to grow the size of the pool up to the maximum setting.

A Presentation On Trouble Shooting Bad Execution Plans In Oracle

Oracle SQL Tuning (presentation on SlideShare)

Tuesday, December 23, 2008

Performance Bottlenecks In The J2EE App Server That Can Throttle The Database

I mentioned in my post about layered architectures that I would discuss factors that can throttle throughput from the J2EE application server to the database (or persistence layer). To start off with, I will focus on the ethos of designing distributed object architectures, in the words of Martin Fowler:-
". . . Hence, we get to my First Law of Distributed Object Design: Don't distribute your objects!"

Distributing components between J2EE application servers involves the use of remote method invocation (RMI) calls to remote interfaces, assuming we are talking about synchronous communication here. Remote method calls to beans deployed on remote servers incur the following performance overheads:-
  • Copy by value

  • Network latency

  • Every incoming RMI call needs to be serviced by its own Java thread

  • Servicing incoming connections from the TCP-IP stack causes context switches

The original text behind this design ethos appears in Martin's book "Patterns of Enterprise Application Architecture"; there is also a very similar article that Martin published in Dr. Dobb's Journal which can be found here.

If all the beans are deployed to the same application server, this will eliminate all the bottlenecks except for that of copy by value. However, as you will see, the overhead of pass by copy is still significant and will both increase heap utilisation and extend the part of major garbage collection that looks for dangling objects on the heap. On the last project I worked on at the time of writing this, we found that:-

  • The 10g database time (not to be confused with wall clock time) was low.

  • Average number of sessions from ADDM reports was low.

  • The CPU on the WebSphere server was maxed out.
Profiling the code with a Java profiler revealed that 50% of the CPU time was being expended in a method with a name like com.ibm.CORBA.util.copyObject. This was because the business logic (or domain) beans were calling the database access beans via remote interfaces, which use pass by value.

For the benefit of anyone reading this who is not that familiar with the J2EE world, I will elaborate on pass by reference, pass by value, and remote and local homes. One of the original design goals behind the J2EE standard was to present a framework in which distributed applications could easily be produced; this was based on CORBA, and the synchronous communications method was RMI. Prior to the EJB 2.0 standard, it was assumed that when one enterprise bean called another bean, the 'other' bean was always remote, hence a remote interface and copy by value were always used. However, as time went by, people realised that despite this, a lot of applications used beans that were deployed to the same J2EE app server, and hence suffered this performance penalty for no added benefit. Thus, in the EJB 2.0 standard, 'local' interfaces came about. Beans talking to each other through local interfaces use the much more efficient pass by reference method, passing object references rather than copying whole objects by value.

Fortunately, with WebSphere there is an option available on the Object Request Broker (ORB) to turn pass by reference 'On'; this forces all the beans deployed to the same EJB container using the same class loader to use pass by reference, irrespective of whether they have been coded to use remote interfaces. In the case of the application that I was looking at, this had the rather large impact of:-

  • Reducing batch process run times by 50%.
  • Reducing CPU consumption on the server hosting the WebSphere application server by 50%.
  • Doubling the 10g db time that a batch process spent in the database.

  • Increasing the average number of active sessions whilst the batch process was running; again this is an Oracle 10g time metric, the database time divided by the actual elapsed "wall clock" time.

Setting the application server's class loader policy to single also helps. In tuning WebSphere, I also looked at the:-

  • server JVM

  • client JVM

  • ORB, with regard to setting up a JNI reader thread pool and the fragment size

  • JDBC pool, both the size of the pool, pool utilization and cached statement discard rate.

  • EJB container, specifically the EJB cache size

  • Log4j logging level

  • number of batch process threads

  • turning off the diagnostic trace
To the best of my knowledge, the only thing I didn't look at was turning PMI off completely. For both the client and server JVM, I found that using the JVM options IBM used in a Solaris SPECjAppServer2004 submission yielded the best results. Refer to this link for a blog to which some of the IBMers who carry out this work contribute, in particular Andrew Spyker, WebSphere performance architect.

However, out of all the things I looked at, I got the biggest bang for my buck, by a long chalk, from turning pass by reference 'On'. This comes with a word of caution though: if your software is designed such that beans receiving RMI calls modify objects and the calling bean is not expecting this, wild and erratic application behaviour can result. It is for this very reason, or so I believe, that pass by reference is now turned 'Off' by default in WebSphere 6.1.

This brings me on to J2EE clustering and shared nothing architectures. WebSphere Network Deployment, the clustered version of WebSphere, uses a shared nothing architecture, similar to that discussed in this article here, which again re-iterates the point about distributed architectures and performance:-

"The most scalable architectures are sharing-nothing clusters. In such clusters, each node has the same functionalities, and knows nothing about the existence of other nodes. The load balancer will decide how to dispatch the requests to the back end server instances. It is rare that a load balancer will become a bottleneck, because a load balancer only need to do some very simple jobs, such as distributing requests, health checking, and session sticking. By adding more nodes, the computing ability of the cluster is increasing almost lineally, if the back end database or other information systems are powerful enough."

With WebSphere Network Deployment, the same ear file is deployed to each node in the cluster (or cell, to use IBM terminology); workload distribution is managed in a round robin fashion via the workload manager. WebSphere eXtreme Scale adds some extra intelligence to this by distributing the workload based on the current available capacity of the nodes in the cluster.

What exacerbated my original problem of the massive copy by object overhead was the batch processing design, which is fundamentally this. The problem is that because Oracle is called for every individual item in the workload, there is heavy traffic between the business logic layer and the persistence layer. Accessing a database item by item is really inefficient in terms of:-

1. Hard and soft parsing; in particular, I found that setting the batch process thread count above 32 stressed the library cache pin mutex. Tanel Poder has written a good article on the library cache pin mutex and the extended use of mutexes in 11g here.

2. Latching

3. Network round trips and network latency

4. Physical and logical reads

5. JDBC calls

Some people get very agitated about batch process designs that process the workload item by item, and in principle this results in certain inefficiencies when accessing the database. However, the performance targets for the batch processes were the same as those for the system this new one was replacing, but with 200% of the workload and two years' worth of "database ageing", i.e. the database had been artificially aged to give the appearance of the application having been in use for two years. In summary, all batch processes beat their counterparts in the original application by at least 100%.

The morals of this story are:-

1. Distributed object architectures are bad for performance

2. Your application needs to be profiled in order to work out where the time is going, in the example I have highlighted, the database was never the bottleneck.

3. A holistic approach to looking at the architecture needs to be taken with regard to achieving good performance and scalability.

I will get around to discussing approaches on alleviating the stresses that item by item oriented batch processing imposes on the Oracle database in a future post.










Monday, December 01, 2008

J2EE Layered Application Architectures and Performance

I have spent most of my career as a DBA focusing on performance. With most applications there is an intimate relationship between the design and architecture of the application and its performance and scalability. Generally speaking, if the design and architecture are wrong, getting an application to perform and scale can be akin to applying elastoplasts over the biggest holes.

As an industry, IT has a patchy record for delivering projects on time and to budget, as such there are many projects out there that barely manage to cross the finishing line having met all of their functional requirements let alone performance requirements. Due to this, tuning can be reactive and end up becoming a battle against the architecture and design.

There was a time when most applications consisted of a client and business logic written directly in the database using stored procedures. Indeed the database software formed the largest part of the application stack. However, more recent architectural trends have been heavily influenced by:-

  • The internet, Web 2.0 has firmly established itself and Web 3.0 is around the corner

  • Service Oriented Architectures (SOA)

  • Software as a service (SAAS)

  • Event driven architectures
  • Extreme transaction processing
  • Grid architectures
  • Mash ups

Indeed, there was a time when the database was a focal point of integration efforts; now the industry is moving towards integration via XML based middleware. Here is one such article on this very subject from design and architecture luminary Martin Fowler. In the current climate, "integration architects" are highly sought after people.

Cloud computing is on its way, and just as during the old Web 1.0 days, to quote a marketing tag line from Sun, "We were into intranets when most people thought this was a typo", there will be internal clouds as well. Therefore, if you have an interest in performance and tuning, it is highly likely that you will have some involvement in areas outside the database, and you may find that the database is being under utilised because of what is happening in the layers above it. Cary Millsap made a presentation at Oracle Open World 2007 highlighting the importance of establishing application profiles when tuning.

One of the most prevalent software architectures today uses a combination of layering and tiering, a tier being a layer that is physically deployed on a separate server. What is paramount about the tiering / layering design approach is that it provides a "clean separation of concerns". Layering can also be vertical, i.e. swathes of development for particular functional areas which cut through the horizontal layers. Classically, the 'horizontal' layers are:-

  • Presentation

  • Business logic (or domain)

  • Integration

Architects in the finance world prefer tiers, in that each tier can reside in its own DMZ (de-militarised zone) and hence provide maximum security. My interest in application architecture has developed from the bottom up, i.e. from the database upwards. However, the integration layer is not always an RDBMS; it can be:-

1. A web service

2. A message queue

3. Distributed object cache (which may be transactional)

4. An enterprise service bus adapter or message broker

5. A Java Connector Architecture (JCA) connection

6. etc . . .

In fact with XA distributed transactions, a transaction can span multiple end points below the integration layer.

Some patterns are almost de facto in certain layers and tiers, such as model view controller in the presentation tier, session facade in the business logic or domain tier, to name but two. Also, there are frameworks and third party libraries that take a lot of the effort out of having to crank out boiler plate code and the code behind the application infrastructure, e.g.:-


1. Apache struts and Java Server Faces for the presentation layer

2. Spring for the whole application or specific layers

3. Log4j for logging

4. Apache commons, which is a common utilities library

However, whatever you do, you should never code business logic into the presentation layer, or explicitly code details of the integration layer into the business logic or domain layer.

When looking at the performance of such architectures, one needs to establish a profile of where particular activities spend their time within the various layers or tiers, and in the networks connecting any tiers. If a J2EE application server is involved in the application stack, something known as the "carrot model" should be understood. All J2EE application servers use the following set of components, which form a pipeline from the 'front' of the application server to the 'back':-

1. Object request broker used for J2EE clients.

2. Web container

3. Enterprise Java Bean Container

4. JDBC Pool or JCA connectors.

The WebSphere application server has an HTTP server built into the web container; however, this is only really recommended for testing purposes, so you will probably have an HTTP server outside of the J2EE application server. What the "carrot model" represents is that component utilisation is usually quite high at the 'front' of the application server and gradually dwindles as you get towards the JDBC connection pool or JCA connectors, hence the analogy of a carrot: thick at one end and thin at the other. Note that the ORB is only used for J2EE clients; web clients will utilise the web container HTTP server or external HTTP server directly.

A lot of texts refer to the effort and careful consideration that needs to go into designing the layers of the application's architecture. I would add that the same level of consideration needs to be given to where processing should be carried out. In the simplest of architectures, with a J2EE application server on one machine and a database on another, there is still scope for the network to become the bottleneck through excessive calls being made to the database for the processing of business logic and for standing data lookup. In one example, a stateless bean used to access fixed domain data was the most utilised bean in the application until a standing data cache was implemented in the application server. In another example, a stored procedure was called to validate individual items within individual records, using fixed domain data cached in the application server. By carrying out the validation entirely within the application server, the process concerned ran faster and required fewer threads than before, with lower CPU consumption on both the server hosting the application server and the server hosting the database.

There is equal scope for poor performance and scalability to be present by design with vertical layering also. One particular design trend is to develop specific functional areas with an API implemented using a session facade. Tools such as Apache Axis, and the J2EE application server itself, make exposing stateless session beans as web services straightforward. Consider two separate beans used to implement the APIs for two different functional areas: if these areas are tightly coupled and they are to be used in a service oriented architecture, there is going to be a significant amount of XML to Java serialisation taking place between the two respective web services that might be avoided if the API was provided by a single bean.

I strongly advise people to read the IBM red book "Designing and Coding Applications for Performance and Scalability in WebSphere Application Server". This is one of the best texts I have come across in the area of designing and writing scalable and well performing J2EE applications. If I had my way, this would be mandatory reading material for all developers, designers and architects on J2EE development projects.
In the not too distant future I will post an article on my experiences of the specific things that can stifle utilisation of the database server when a J2EE application is used.

Tuesday, November 04, 2008

The Rise Of MySql From The Low End

At the time of writing, the penultimate blog entry from Jonathan Schwartz on the Sun web site refers to innovation loving a crisis, the crisis being the turmoil in the global economy. From this it is not too difficult to make the mental leap and arrive at the inference that software with no license costs is going to be very attractive at present. Now, this comes from Jonathan's latest blog entry (again, latest at the time of writing):-

"The third is the need for change - one executive to whom I spoke recently said her entire discretionary budget was consumed by one proprietary vendor's price increase. So she's out looking for an alternative, and MySQL fits the bill. Which is to say, necessity's the mother of invention - and there's a lot of necessity going around right now."

Makes you wonder if MySQL is going to start biting at Oracle's ankles around the lower end of the market more than ever before.

Monday, October 13, 2008

"The Performance Hacker"

I've just read the Oracle Real World Performance Group's Open World 2008 presentation entitled "Current Trends in Real World Database Performance". The bit about the performance hacker was interesting. Tuning via initialisation parameters is a bad practice with little or no mileage to be gained, unless parameters have been grossly mis-configured in the first place. At the extreme end of the performance hacker spectrum are those that dabble with initialisation parameters that most DBAs will never have any cause to change in their entire careers, such as spin_count (_spin_count from 8i onwards); worse still are those that dabble with the '_' parameters without direction from Oracle support. In most sessions Oracle's Graham Wood gives at the UKOUG conferences, he mentions that the scope of a fix should match the scope of the problem. You need to get to the root cause of the problem; although the use of hints is nowhere near as bad as hacking initialisation parameters, it fixes the symptoms of the problem rather than directly addressing the problem itself.

Of the other characteristics that define the "performance hacker", they like to indulge in tuning via the internet. There are, however, some excellent sources of information on the internet and in the blogosphere regarding tuning:
  • The Real World Performance Group
  • The Optimizer Group Blog
  • Jonathan Lewis's blog
  • Wolfgang Breitling's web site, etc.
The trick is to prefer sources of information from organisations and individuals that provide clearly worked through test cases in order to demonstrate their points. Sites that make grandiose claims without backing them up with worked examples should be avoided like the plague.

If you are completely new to Oracle, you may have a tuning advisor view of the performance world: read the 11g documentation and you would be forgiven for thinking that 11g is a self tuning database and that ADDM and the bundled advisors are the only tools you need. Also note that 11g comes with an SQL test case builder for creating test cases to pass on to Oracle support for poorly performing statements, the SQL API of which is available through the DBMS_SQLDIAG package; read into this what you will.

The approach recommended by the Real World Performance Group is based around what I first found in a paper entitled "Performance Tuning By Cardinality Feedback" by Wolfgang Breitling, a reviewer of Jonathan Lewis's Cost-Based Oracle Fundamentals book. The basic premise is that you look at where the actual cardinality in a plan diverges from the predicted cardinality, usually by an order of magnitude and usually somewhere near the start of the plan. To the best of my knowledge, this approach is not mentioned in the standard Oracle performance tuning material.

Knowing about this technique and how to apply it is made easier in Oracle 10g with the gather_plan_statistics hint (or statistics_level=all) and the DBMS_XPLAN package (see the sketch after this list). You may use the "tuning by cardinality feedback" approach and find that your plan does not improve even after addressing issues including:-

1. Inaccurate stats, particularly the number of distinct values for highly skewed data.
2. Data correlation or predicate independence assumption issues.
3. Predicate values being out of bounds when compared to the min and max values for a column.
4. Missing histograms or histograms with inappropriate numbers of buckets.
5. Pseudo columns relating to function based indexes without stats or histograms.
6. Selectivity assumptions made when functions are used.
7. Skewed data for which there are more than 254 distinct values, where the values you are interested in are not 'popular' values, i.e. do not fall on a bucket boundary end point.
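A hedged sketch of putting the cardinality comparison into practice, as flagged before the list; the table and predicate are illustrative:

SELECT /*+ gather_plan_statistics */ *
FROM   orders
WHERE  order_status = 'SHIPPED';

-- Fetch the last executed plan with estimated (E-Rows) and actual
-- (A-Rows) row counts side by side; hunt for order of magnitude
-- mismatches, particularly near the start of the plan.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));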

If it does not, then by providing full evidence that this approach has been applied and followed, you will get a much more expedient resolution to your issue out of Oracle support than someone simply reporting a poorly performing query, or a query with a large full table scan in its plan.

Thursday, September 18, 2008

A Ticking Performance And Scalability Time Bomb and Java Object Caching Grids

During last year's Oracle UK user group conference I attended a session by Graham Wood from Oracle on the 10g performance infrastructure, covering ASH, ADDM, the time model and all that good stuff. At the beginning of his session, Graham mentioned that he often encountered projects where Java programmers who needed to persist something simply created a table without a second thought for performance. When I attended a UML course earlier in the year, this was also echoed by the course instructor, a designer and analyst of considerable experience: specifically, that due care had not been put into entity relationship modelling, culminating in data models that were a nightmare to maintain, which was becoming a greater and greater issue across a lot of projects.

In the Java world, the term used to describe the layer that interfaces with the database is "object relational mapping", or ORM to use its abbreviation. The focus tends to be on mapping tools that provide neat frameworks for managing the SQL; XML configuration files are a common component of such approaches. However, there is little focus on leveraging features such as Oracle's array interface to make database access as efficient as possible. Indeed, as Tom Kyte often mentions in his books, some people take the black box approach to using the database. In the same book, Expert One on One Oracle I think, one project team elected to develop something akin to the query optimizer in the application server. Talk about re-inventing the wheel.

There are inherent bottlenecks in using a relational database, mainly the data type impedance mismatch. Numerous JVM based object caching solutions, which can front the database and even do away with it, have made their way onto the market to address this. Examples include Oracle Coherence, GigaSpaces Application Fabric, IBM ObjectGrid, Appistry, Terracotta, etc. These caching solutions and their associated processing grids have given rise to the paradigm of "extreme transaction processing", which I believe is a term coined by the Gartner group. Someone I spoke to in Oracle pre-sales earlier in the year mentioned a customer who had managed to achieve with £40,000 of Intel based blades and Coherence what they had budgeted £200,000 of Unix hardware for. Indeed, the founder of Tangosol, the company that first developed Coherence before being acquired by Oracle, came up with the idea whilst working as a Java consultant, visiting numerous customer sites where accessing the database was the main bottleneck in the application architecture. As an aside, Coherence can cache a wide variety of data sources other than databases, including web services; it is also equally at home caching for .Net as it is for Java.

The fact of the matter is that most people still need to use a relational database somewhere. In most people's infrastructures and architectures some form of reporting needs to be performed, and who has ever heard of a reporting solution based on Java objects? There may well be one out there somewhere, but in my experience most people use SQL to meet their reporting requirements. SQL is a rich and mature language for reporting, with its analytic extensions and constructs such as materialised views, so the relational database might still be around for some time. As Tom Kyte once said, platforms change once every ten years, from mainframes, to Unix, to Windows and now Linux; however, Oracle has always been most people's database of choice and the software of choice for storing, processing and reporting on an organisation's data.
In reality a best of both worlds approach might be what is required to achieve scalable architectures, and this is possibly why Oracle has introduced a business logic tier caching solution for 11g based on the TimesTen in memory database. Note that when looking at caching options, there are two ways to go:-
1. Object caching
2. Relational data caching, refer to TimesTen and various offerings from Progress
The road you should go down depends on whether your application is expecting objects or relational data. A relational data caching solution will not eliminate the data type impedance mismatch problem; however, what it will do is vastly reduce the latency in your architecture when accessing data.

Wednesday, March 26, 2008

Missed Opportunities and Getting It Right The First Time

Whenever Oracle brings out new features, you are sometimes left thinking: that's nice, but I could really do with the new feature allowing me to do x, y and z as well. For example:-

1. Index Organized tables in 8i not supporting secondary indexes.

2. Global indexes on partitioned tables becoming unusable after partition maintenance operations, drops, splits etc . . .

3. Not being able to pass compressed blocks across the cluster interconnect, still an issue in 11g release 1 (I think).

This brings me on to cursor sharing and histograms. Histograms have been around since Oracle 7 and are used to provide the optimizer with information on skewed data, basic stuff really. Cursor sharing came along in 8i and is a way of getting the RDBMS to use bind variables when literal values are used in predicates. Up until 9i, histograms only got used when literal values were specified in a predicate on a column with a histogram. Where these two features come together is that if you are using bind variables, Oracle will perform bind variable peeking, usually when a histogram exists, but also in some circumstances when histograms do not exist on the column(s) your predicates are used against.

Historically, most Oracle developers use literals against columns for which the data is skewed, and binds against columns in predicates for which there is no value in forcing a re-parse of the SQL statement, typically values which are highly selective. With cursor_sharing = similar, plans should only be re-used if the parsed cursor can 'safely' re-use an existing plan. However, Metalink technical note 377847.1, "Unsafe Peeked Bind Variables and Histograms", states that the RDBMS can mark a bind variable as unsafe and thus generate many child cursors for statements; I have seen at first hand the RDBMS doing this even when there is no histogram on a column (method_opt => ". . . size 1"). The solutions to this are:-

1. Use cursor_sharing = exact; this somewhat defeats the purpose of using cursor_sharing = similar.

2. 'Drop' histograms on columns that cause unsafe bind peeking (see the sketch after this list); there is a good post on the Pythian blog about the overuse of histograms.

3. Check whether dynamic_sampling is causing this as per the Metalink technical note.

4. Set _optim_peek_user_binds = false to disable bind peeking full stop, if you still get this without a histogram.
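A minimal sketch of option 2, as flagged in the list: re-gather the column statistics without a histogram (table and column names are illustrative):

begin
  DBMS_STATS.GATHER_TABLE_STATS (
    ownname    => user,
    tabname    => 'ORDERS',
    method_opt => 'FOR COLUMNS order_status SIZE 1');  -- SIZE 1 = no histogram
end;
/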

v$sql_shared_cursor will not give any insight into why this is happening. There is no 'fix' to this issue as such; however, 11g introduces intelligence into the area with adaptive cursor sharing. The online Oracle documentation, the Optimizer blog and Technet all provide good references on this.

The real question is why it has taken Oracle this long to introduce intelligence into cursor sharing. On the same theme of new features taking their time (and then some) to mature, I'd like to use the result cache in 11g in anger, but how long will it take Oracle to shave the rough edges off this feature, i.e.:-

1. Results being invalidated when unrelated rows in the underlying base tables change.

2. No explicit information in autotrace stats for knowing that result caching has been used, etc . . .

I've heard whispers of new functionality appearing in 11g release 2 relating to non-core RDBMS technology; perhaps Oracle should focus on the maturity of the core RDBMS engine features first.

Tuesday, March 11, 2008

A New Type Of Synchronization Mechanism In 10g Release 2

Up until 10g release 2, the only locking and synchronization mechanisms available to the RDBMS engine were latches and enqueues. Oracle 10g release 2 introduces mutexes, which replace the library cache pin for shared cursors on platforms supporting atomic compare and swap (CAS) operations. There is scant information on the library cache pin mutex on MetaLink, or the internet at large for that matter; however, Tanel Poder from Miracle A/S and the Oak Table provides some useful insights:-


Re: cursor: pin S wait on X
From: Tanel Põder
To: , "Mladen Gogala"
Date: Sat, 29 Apr 2006 20:10:50 +0900

I studied the mutex stuff a bit when 10.2 came out, I'd say mutexes are less intelligent than latches. But thanks to that, they are more lightweight and thanks to quite large caches on modern CPUs we can afford having tons of them - allowing very fine grained locking. On my 32bit linux installation a mutex was 28 bytes in size, while a regular latch structure was 110 bytes. In theory every library cache child should have its own mutex, but in practice I saw one case where two different child objects were protected by the same mutex - that was what library cache dump showed me anyway.

Mutex also acts as a pin structure, so there's no need to do another memory write for pinning a cursor. Mutex allows immediate shared access to resources it protects; the old approach would mean getting a library cache latch in exclusive mode and then modifying the corresponding library cache child pin structure. Mutexes also don't maintain statistics on successful gets as latches do, only sleeps are registered. Thanks to all that, an immediately succeeding mutex get operation takes several times less CPU instructions (and hits possibly less memory stalls); from a 10gR2 perf tuning course I remember it was about 250 vs 50 CPU instructions on the IA32 platform.

Note that mutexes don't provide much benefit for poorly written applications; for most benefit your apps should avoid any kind of reparsing - keeping cursors open and cancelling them manually when interest in query results has ended.

Tanel.

Mutexes are indeed lightweight, and Oracle states in some information that Oracle support provided me with (the document included 'external' in the title, leading me to presume it is for public consumption) that the rationales for their introduction into the RDBMS engine are:-

1. They are smaller and much faster to get than latches.

2. It is fast and easy to create mutexes for individual structures leading to less contention on the mechanisms that protect the actual structures.

3. The mutex implementation is highly flexible in that components can define wait policy for individual mutexes.

4. Mutexes exhibit a dual nature ideal for pin type operations: many sessions can reference a mutex in shared mode, whereas only one session can hold a mutex in exclusive mode.

Activity on the library cache pin mutex can be observed through waits on events prefixed by cursor: pin; this is usually followed by something denoting the type of operation, e.g. cursor: pin S wait on X, pinning a cursor for an execute. The v$ views expose mutex activity through v$mutex_sleep and v$mutex_sleep_history. When monitoring this mutex, the most important thing to keep an eye on is the amount of time waited on cursor: pin events, as opposed to the number of events (see the sketch after the column list below). Large wait times can be due to numerous child cursors being built beneath a parent cursor through cursor invalidation, bind variable variance, statements being aged out of the shared pool, etc. In fact, the v$sql_shared_cursor view gives the comprehensive range of scenarios that can cause cursors to be re-parsed; its columns are:-


SQL_ID, ADDRESS, CHILD_ADDRESS, CHILD_NUMBER, UNBOUND_CURSOR, SQL_TYPE_MISMATCH, OPTIMIZER_MISMATCH, OUTLINE_MISMATCH, STATS_ROW_MISMATCH, LITERAL_MISMATCH, SEC_DEPTH_MISMATCH, EXPLAIN_PLAN_CURSOR, BUFFERED_DML_MISMATCH, PDML_ENV_MISMATCH, INST_DRTLD_MISMATCH, SLAVE_QC_MISMATCH, TYPECHECK_MISMATCH, AUTH_CHECK_MISMATCH, BIND_MISMATCH, DESCRIBE_MISMATCH, LANGUAGE_MISMATCH, TRANSLATION_MISMATCH, ROW_LEVEL_SEC_MISMATCH, INSUFF_PRIVS, INSUFF_PRIVS_REM, REMOTE_TRANS_MISMATCH, LOGMINER_SESSION_MISMATCH, INCOMP_LTRL_MISMATCH, OVERLAP_TIME_MISMATCH, SQL_REDIRECT_MISMATCH, MV_QUERY_GEN_MISMATCH, USER_BIND_PEEK_MISMATCH, TYPCHK_DEP_MISMATCH, NO_TRIGGER_MISMATCH, FLASHBACK_CURSOR, ANYDATA_TRANSFORMATION, INCOMPLETE_CURSOR, TOP_LEVEL_RPI_CURSOR, DIFFERENT_LONG_LENGTH, LOGICAL_STANDBY_APPLY, DIFF_CALL_DURN, BIND_UACS_DIFF, PLSQL_CMP_SWITCHS_DIFF, CURSOR_PARTS_MISMATCH, STB_OBJECT_MISMATCH, ROW_SHIP_MISMATCH, PQ_SLAVE_MISMATCH, TOP_LEVEL_DDL_MISMATCH, MULTI_PX_MISMATCH, BIND_PEEKED_PQ_MISMATCH, MV_REWRITE_MISMATCH, ROLL_INVALID_MISMATCH, OPTIMIZER_MODE_MISMATCH, PX_MISMATCH, MV_STALEOBJ_MISMATCH, FLASHBACK_TABLE_MISMATCH, LITREP_COMP_MISMATCH
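To put the monitoring point above into practice, a hedged sketch; the child cursor threshold is an arbitrary illustrative value:

-- Where mutex sleeps are occurring and how much time they are costing.
SELECT mutex_type, location, sleeps, wait_time
FROM   v$mutex_sleep
ORDER  BY wait_time DESC;

-- Parent cursors that have accumulated suspiciously many children; feed
-- the sql_id values into v$sql_shared_cursor for the reasons why.
SELECT sql_id, COUNT(*) AS child_cursors
FROM   v$sql
GROUP  BY sql_id
HAVING COUNT(*) > 10
ORDER  BY child_cursors DESC;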

Metalink technical note 438755.1, 'Formated V$SQL_SHARED_CURSOR Report by SQLID or Hash Value', contains information on how to obtain useful reports from this view; it also provides links to articles to assist in investigating shared cursor issues further. It's a shame that the treasure trove of Oracle internals hints and tips, http://www.ixora.com.au/, is no longer kept up to date, as I'm sure Steve Adams would have provided some excellent technical details on this mutex, much more eloquently than I have in this posting.


Tuesday, March 04, 2008

The Great Oracle Diagnostic Tools Debate

There has been a lot of blogging recently about the limitations of ADDM and the time model introduced in 10g: Dan Fink has blogged on this, and Richard Foote has blogged in response to Dan. Guy Harrison made a presentation at OOW 2007 mentioning the limitations of ADDM and the 10g time model. Cary Millsap made a presentation at OOW 2007 on 'skew', also citing the limitations of ADDM and the time based model, specifically with regard to developing application performance profiles. I agree that:-
  • The "self managing database" is more a thing of marketing than technology, to quote Dan Fink.

  • Extended tracing allows you to see where time is being expended on individual statements to a much finer degree of granularity than ADDM and the time model.

However, there are certain practicalities that need to be considered here:-

  • Extended trace is of limited use for "one off" performance issues; you wouldn't really want this to be enabled all the time on a production environment.

  • The use of J2EE application servers is commonplace today; unless your application Java code is well instrumented, using DBMS_SUPPORT and end to end tracing, a logon trigger to trace what the JDBC client is doing may have to be used. Trace files can be tagged using EXECUTE IMMEDIATE 'ALTER SESSION SET TRACEFILE_IDENTIFIER . . .'. Unfortunately, good application instrumentation is the exception rather than the rule, and only Oracle savvy Java developers, or teams with access to a good DBA, will know about end to end tracing and trcsess (see the sketch after this list). Otherwise, something such as a logon trigger may lead to a forest of trace files to wade through.

  • Some IT shops (and I know this is bad practice) don't always have a test / development environment for every production environment, or don't have test environments that are identical to their production environments; this is often the case with data warehouses. Therefore, if you have a very intermittent problem in production, or a problem which, despite being reproducible, occurs very infrequently, reproducing it in another environment might be a problem, and again you wouldn't really want SQL tracing enabled on a production environment 24/7 until the problem re-occurs. This may not be such an issue if, via good instrumentation, the application provides the facility for tracing to be generated on specific parts of the application under specific circumstances; however, my experience is that applications that do this are few and far between.

  • In his post, Richard Foote uses a very good example in which enabling extended tracing is analogous to putting a camera on someone whose journey gets held up, his wife in the example. What do you do if you don't know which part of the software to "put the camera" on?

  • The issue you are experiencing might be time sensitive, meaning that in order to reproduce the issue you need to replay data which was loaded / processed by the application at specific dates and times. With Unix this can be done by changing the system clock, but again, this is not something you would really want to do in a production environment.
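Coming back to the instrumentation point flagged in the list above, a hedged sketch of 10g end to end tracing with DBMS_MONITOR; the client identifier and file names are illustrative:

begin
  -- Tag the sessions of interest; typically set by the application or a
  -- logon trigger rather than interactively.
  DBMS_SESSION.SET_IDENTIFIER('nightly_batch');

  -- Trace every session carrying that tag, whichever pooled connection
  -- it happens to be using.
  DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE (
    client_id => 'nightly_batch',
    waits     => TRUE,
    binds     => FALSE);
end;
/

-- Then stitch the scattered trace files back together at the OS level:
--   trcsess output=nightly_batch.trc clientid=nightly_batch *.trc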

I would love to know what Graham Wood, architect and 'father' of ADDM and the time model, makes of this debate, and whether he has plans to modify ADDM and the time model in light of these criticisms. Furthermore, I would also be interested to know what percentage of performance issues are down to bad configuration, poor coding practices, poor design and poor architecture, as it occurs to me that some people are concerned about ADDM and the time model taking some of the art out of tuning. However, I believe that an element of tuning is identifying poor design and architectural decisions, a skill that no tool will be able to replace for a long, long time.

 