
Thursday, September 18, 2008

A Ticking Performance And Scalability Time Bomb and Java Object Caching Grids

During last year's Oracle UK User Group conference I attended a session by Graham Wood from Oracle on the 10g performance infrastructure, covering ASH, ADDM, the time model and all that good stuff. At the beginning of his session, Graham mentioned that he often encountered projects where Java programmers who needed to persist something simply created a table without a second thought for performance. The same point was echoed earlier this year by the instructor on a UML course I attended, a designer and analyst of considerable experience: insufficient care was being put into entity relationship modeling, resulting in data models that were a nightmare to maintain, and this was becoming a bigger and bigger issue across many projects.

In the Java world, the layer that interfaces with the database is known as "object relational mapping", or ORM for short. The focus tends to be on mapping tools that provide neat frameworks for managing the SQL, with XML configuration files a common component of such approaches. However, there is little focus on leveraging features such as Oracle's array interface to make database access as efficient as possible. Indeed, as Tom Kyte often mentions in his books, some people take a black box approach to using the database. In one of those books, Expert One-on-One Oracle if I remember correctly, he describes a project team that elected to develop something akin to the query optimizer in the application server. Talk about reinventing the wheel.
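To make the point about round trips more concrete, here is a minimal sketch using standard JDBC batch updates (`addBatch`/`executeBatch`), which, as I understand it, the Oracle JDBC driver sends to the server as arrays of bind values rather than as one execution per row. The `ORDERS` table, its columns and the `BatchInsert.insertOrders` method are hypothetical, made up purely for illustration; whether a given ORM batches like this depends on the tool and its configuration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {
    // Hypothetical table ORDERS(ID NUMBER, AMOUNT NUMBER), for illustration only
    static final String SQL = "INSERT INTO orders (id, amount) VALUES (?, ?)";

    /**
     * Inserts rows using JDBC batching, flushing every batchSize rows.
     * Returns the number of executeBatch calls, i.e. batched round trips.
     */
    public static int insertOrders(Connection conn, List<long[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int roundTrips = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (long[] row : rows) {
                ps.setLong(1, row[0]);  // id
                ps.setLong(2, row[1]);  // amount
                ps.addBatch();          // queue the bind values instead of executing now
                if (++pending == batchSize) {
                    ps.executeBatch();  // send the whole batch in one call
                    roundTrips++;
                    pending = 0;
                }
            }
            if (pending > 0) {          // flush any remaining partial batch
                ps.executeBatch();
                roundTrips++;
            }
        }
        return roundTrips;
    }
}
```

With 250 rows and a batch size of 100, this makes three `executeBatch` calls instead of 250 separate statement executions.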

There are inherent bottlenecks in using a relational database, chiefly the data type impedance mismatch between objects and relational tables. Numerous JVM-based object caching solutions, which can front the database or even do away with it altogether, have come onto the market to address this. Examples include Oracle Coherence, Gigaspace Application Fabric, IBM ObjectGrid, Appistry and Terracotta. These caching solutions and their associated processing grids have given rise to the paradigm of "Extreme Transaction Processing", a term I believe was coined by Gartner. Someone I spoke to in Oracle pre-sales earlier this year mentioned a customer who had achieved with £40,000 of Intel-based blades and Coherence what they had budgeted £200,000 of Unix hardware for. Indeed, the founder of Tangosol, the company that first developed Coherence and was later acquired by Oracle, came up with the idea whilst working as a Java consultant, visiting numerous customer sites where database access was the main bottleneck in the application architecture. As an aside, Coherence can cache a wide variety of data sources other than databases, including web services, and it is as much at home caching for .NET as it is for Java.
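The basic idea behind a cache that fronts the database can be sketched in a few lines of plain Java. This is a minimal, single-JVM read-through cache and not the API of Coherence or any other product mentioned above: the `ReadThroughCache` class and its loader function are hypothetical, and a real caching grid adds distribution across JVMs, eviction, expiry, write-behind and a great deal more.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache: serves values from memory, calling a loader only on a miss. */
public class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    // Stand-in for the expensive data source, e.g. a DAO that queries the database
    private final Function<K, V> loader;

    public ReadThroughCache(Function<K, V> loader) {
        this.loader = Objects.requireNonNull(loader);
    }

    /** Returns the cached value, invoking the loader only if the key is not already cached. */
    public V get(K key) {
        // computeIfAbsent calls the loader only when the key is absent; null results are not cached
        return entries.computeIfAbsent(key, loader);
    }

    /** Drops an entry, e.g. after the underlying row has been updated, so the next get reloads it. */
    public void invalidate(K key) {
        entries.remove(key);
    }
}
```

Repeated `get` calls for the same key reach the underlying data source once, until that entry is invalidated; avoiding those repeated trips to the database is the whole point of fronting it with a cache.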

The fact of the matter is that most people still need to use a relational database somewhere. Most people's infrastructure and architectures require some form of reporting, and who has ever heard of a reporting solution based on Java objects? There may well be one out there somewhere, but in my experience most people use SQL to meet their reporting requirements. SQL is a rich and mature language for reporting, with its analytic extensions and constructs such as materialised views, so the relational database is likely to be around for some time yet. As Tom Kyte once said, platforms change roughly once every ten years, from mainframes to Unix to Windows and now Linux, yet Oracle has remained most people's database of choice and the software of choice for storing, processing and reporting on an organisation's data.
In reality, a best-of-both-worlds approach might be what is required to achieve scalable architectures, and this is possibly why Oracle has introduced a business logic tier caching solution for 11g based on the TimesTen in-memory database. Note that when looking at caching options, there are two ways to go:
1. Object caching
2. Relational data caching, e.g. TimesTen and various offerings from Progress
The road you should go down depends on whether your application expects objects or relational data. A relational data caching solution will not eliminate the data type impedance mismatch, but it will vastly reduce the latency of data access in your architecture.
 