Tuesday, March 03, 2009
Oracle 11.1.0.7.0 and cell physical IO interconnect bytes
Monday, February 16, 2009
SQL Server Cross Training For Oracle Professionals
- There have been numerous Transact-SQL enhancements
- The internals have been exposed somewhat via dynamic management views
- Query Analyzer, Enterprise Manager and OLAP Analysis Manager have been consolidated into "Management Studio".
- There is a new database tuning advisor
Oracle, DB Time and Performance Baselines
Though AWR data can be exported and imported for transporting into other databases, this is not supported by Oracle and is to be used only for troubleshooting purposes. It is implemented with an internal package which is created by the script awrextr.sql, found under $ORACLE_HOME/rdbms/admin.
The steps are as follows. To download the data into a Data Pump dumpfile, you would use the procedure AWR_EXTRACT:
begin
   DBMS_SWRF_INTERNAL.AWR_EXTRACT (
      dmpfile => 'awr_data.dmp',
      dmpdir  => 'TMP_DIR',
      bid     => 302,
      eid     => 305);
end;
/
dmpfile => The name of the target file for the data. This is a Data Pump export file. If no filename is given, the default value awrdat.dmp is used.
dmpdir => The directory object where the dumpfile is written. In this case, you may have defined
a directory TMP_DIR as /tmp.
bid => The snapshot ID of the beginning snapshot of the period.
eid => The end snapshot ID.
Once you have dumped the AWR data to the dumpfile awr_data.dmp, you can move it to the new location and load it using another procedure in the same package. The AWR data must first be loaded into a temporary staging area before being loaded into the SYS schema. In this case, the data will be loaded into the AWR_TEST schema initially.
begin
   DBMS_SWRF_INTERNAL.AWR_LOAD (
      schname => 'AWR_TEST',
      dmpfile => 'awr_data',
      dmpdir  => 'TMP_DIR');
end;
/
After staging, the data is moved into the SYS schema:
begin
   DBMS_SWRF_INTERNAL.MOVE_TO_AWR (schname => 'AWR_TEST');
end;
/
Once this is completed, the AWR data can be used for any performance analysis or you can build a central repository of
AWR data collected from multiple databases.
Monday, February 02, 2009
The Death Of ESBs
Saturday, January 24, 2009
Efficient Oracle Database Usage By Java and J2EE Applications
1. Connection management best practices:-
- Connection pool usage
- Setting the connection pool min and max settings to be the same to avoid connection storms
- Oracle 11g server side connection pooling
- Using 11g to throttle connection rates
2. Efficient cursor management
- The Statement vs PreparedStatement API (see the sketch after this list).
- Avoiding hard parsing
- Avoiding soft parsing
- Parameters that affect parsing: session_cached_cursors, cursor_space_for_time and cursor_sharing
3. Minimizing network round trips and leveraging bulk operations
- The JDBC fetch size
- Statement batching in JDBC
- The Oracle array interface
- Leveraging Oracle bulk operations
- Intelligent Oracle statement feature usage, specifically usage of: merge statements, in-line views, in-line PL/SQL functions and subquery factoring.
- The client side result cache
- Programming and design styles for minimizing application 'Chatter'
- Caching
4. Design and code according to the principle of putting the processing close to the resource that requires it, specifically:-
- Do not be precious about Java or the database; it is not necessarily a good thing to do everything in the database.
- Do not re-invent database functionality in Java
5. Best practices for log buffer usage
- JDBC auto commit.
- Batch and asynchronous commits as introduced in Oracle 10g.
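As a taster for the cursor management points above, here is a minimal JDBC sketch of the Statement vs PreparedStatement point; the orders table, its columns and the DataSource wiring are hypothetical stand-ins:-

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class CursorFriendlyLookup {

    private final DataSource pool; // pool configured with min size = max size

    public CursorFriendlyLookup(DataSource pool) {
        this.pool = pool;
    }

    // Bad: "SELECT status FROM orders WHERE order_id = " + orderId embeds a
    // literal, forcing a hard parse for every distinct order id.

    // Good: one shared cursor, re-executed with different bind values.
    public String lookupStatus(int orderId) throws SQLException {
        Connection con = pool.getConnection(); // borrow a pooled connection
        try {
            PreparedStatement ps =
                con.prepareStatement("SELECT status FROM orders WHERE order_id = ?");
            try {
                ps.setInt(1, orderId); // the bind keeps the SQL text constant
                ResultSet rs = ps.executeQuery();
                try {
                    return rs.next() ? rs.getString(1) : null;
                } finally {
                    rs.close();
                }
            } finally {
                ps.close(); // release the cursor handle promptly
            }
        } finally {
            con.close(); // hands the connection back to the pool
        }
    }
}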
Watch this space . . .
Friday, January 23, 2009
J2EE Batch Processing Design and Tuning presentation here !!!
Wednesday, January 21, 2009
Presentation On J2EE Batch Processing and Tuning
I wrote the presentation for several reasons:-
1. To capture key lessons learned on the project from a technical point of view.
2. To capture information on batch process design considerations as there is not much information in the public domain.
3. To produce some "joined up" information on tuning J2EE software using Oracle, mainly because most J2EE material stops short of the database at the JDBC level, and most database tuning material stops short of the application server at the same point but from the other direction. There is material "out there", but not in one place and presented in a manner that covers the whole software stack.
These are the main points I will cover:-
1. Batch process architecture and design considerations.
2. The batch process design used on my project
3. Available monitoring and tuning tools for tuning J2EE architectures using Oracle.
4. Tuning findings
5. Conclusions
As everyone always likes to skip to the end of a book before reading the rest, I will touch on some of the conclusions from this work, namely:-
1. Why Martin Fowler's first law of distributed object design is well founded.
2. Dangers of swamping a J2EE application server with too many threads.
3. Avoiding 'Chatty' designs.
4. Although row by row processing is a concern, it is not always the end of the world in terms of performance, and is only a genuine concern if it prevents performance and scalability targets from being reached.
Thursday, January 15, 2009
The Basics That Undermine RAC Scalability And Availability
1. Scalability
Cluster inter-connect traffic and performance is a major factor here. As such, InfiniBand is being touted as the up and coming inter-connect fabric. Also, if you google Joel Goodman Oracle, you will most probably find references to presentations and papers on how uncached sequences without noorder specified can degrade performance via the pinging of blocks across the inter-connect. Other than this, the same factors that influence performance and scalability for a single instance application also apply to RAC. Specifically, these are the "usual suspects" that the Oracle Real World Performance group includes in most of the material they present at Oracle Open World, namely:-
1. Good schema design
2. Efficient connection management
3. Efficient cursor management
Therefore, my first question is this: how many RAC sites test their applications in single instance environments for scalability around these three factors? Also, how many sites test the scalability of their clusters as they add nodes? Interestingly, according to Julian Dyke's book on 10g RAC and Linux, Oracle actually turn cache fusion off in their RAC TPC benchmarks.
A further recommendation is that your application has some intelligence when connecting to database services, such that the workload directed at specific nodes leads to a minimal amount of cluster inter-connect traffic. Without any evidence to back it up, there are probably applications out there which use tables to implement things best described as application level semaphores, or tables which occupy very few blocks with flag-like columns, which, if placed on a RAC cluster, will slay the inter-connect.
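On the sequence point above, the RAC friendly pattern is a cached NOORDER sequence. A minimal sketch follows; the sequence name is hypothetical and the DDL would normally live in an installation script rather than application code:-

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class RacFriendlySequence {

    // A cached NOORDER sequence lets each instance hand out numbers from its
    // own cache; uncached sequences without NOORDER force co-ordination
    // between instances and ping blocks across the inter-connect.
    public static void create(Connection con) throws SQLException {
        Statement stmt = con.createStatement();
        try {
            stmt.execute("CREATE SEQUENCE order_id_seq CACHE 1000 NOORDER");
        } finally {
            stmt.close();
        }
    }
}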
2. High availability
It is recommended that there is adequate capacity in the cluster to soak up the workload created when one or more nodes fail. Technical issues aside, a cluster is not a panacea for high availability, and it can be undermined by several factors:-
- Poor testing of changes applied to a cluster.
- Lack of a realistic test cluster.
- A lack of tight change management procedures and processes.
Friday, December 26, 2008
Nuances Of JDBC Connection Pools
A common misconception with Oracle is that execution plans are only re-generated when a hard parse takes place or when an execution plan is aged out of the shared pool. This is not the case: v$sql_shared_cursor gives numerous reasons as to why re-optimizations, and also the generation of child cursors, take place. However, as per a previous posting I made, this view does not capture bind peeking activity.
As a quick fix, and yes I know this solution was only really a crutch, I placed an execute immediate statement at the top of the code to set cursor_sharing to force. I could have also fixed this using the hint opt_param('_optim_peek_user_binds', 'false'); with the benefit of hindsight, and the rule of thumb that the scope of the fix should match the scope of the problem, this may have been a better way to have gone.
I did not realise at the time that this alter session would affect other SQL statements executed by the application server when it reused this particular connection from the pool, with disastrous results on performance. In particular, a batch process which executed the same statement with different binds multiple times was not being re-optimized at execution time, when in fact Oracle had just cause to do this. Fortunately this particular scenario was constrained to one test environment I was working on. The resolution to the problem was to put another execute immediate statement at the bottom of the stored procedure to set cursor_sharing back to what it was set to before.
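For illustration, here is a minimal JDBC sketch of that set-and-restore pattern. It assumes the instance default is cursor_sharing = EXACT; a more careful version would capture the prior value from v$parameter before changing it:-

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public final class SessionSettingGuard {

    // Session level settings survive on the physical connection when it goes
    // back to the pool, so always restore them before releasing the connection.
    public static void runWithForcedCursorSharing(Connection con, Runnable work)
            throws SQLException {
        Statement stmt = con.createStatement();
        try {
            stmt.execute("ALTER SESSION SET cursor_sharing = FORCE");
            work.run(); // the statements that needed the workaround
        } finally {
            try {
                // The JDBC equivalent of the execute immediate at the bottom of
                // the stored procedure; assumes the previous setting was EXACT.
                stmt.execute("ALTER SESSION SET cursor_sharing = EXACT");
            } finally {
                stmt.close();
            }
        }
    }
}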
A popular technique to avoid database access and solve mutating trigger problems is to use a PL/SQL array declared in a PL/SQL package specification. Indeed, the other nuance of JDBC connection pooling is that if an SQL call from the Java code initialises such a variable, and the connection is then released in the Java code, subsequent calls from the same Java thread may not necessarily reuse the same connection from the JDBC connection pool and thus may not see the data cached in the PL/SQL package.
Good Java programming practice dictates that you should always lazily grab resources and then release them as soon as they are no longer required, increasing the likelihood that threads may use many different connections from the JDBC connection pool. Also note that when obtaining a JDBC connection in Java code, what you are in fact doing is obtaining a handle to a connection from the pool. New connections to the database should only be made:-
1. After the application server starts up.
2. If the minimum and maximum connection settings for the pool are not the same and the application server needs to grow the size of the pool up to the maximum setting.
A Presentation On Trouble Shooting Bad Execution Plans In Oracle
Tuesday, December 23, 2008
Performance Bottlenecks In The J2EE App Server That Can Throttle The Database
- Copy by value
- Network latency
- Every incoming RMI call needs to be serviced by its own Java thread
- Servicing incoming connections from the TCP/IP stack causes context switches
The original text behind this design ethos appears in Martin's book "Patterns Of Enterprise Application Architecture"; there is also a very similar article that Martin has published in Dr Dobb's Journal which can be found here.
If all the beans are deployed to the same application server, this will eliminate all of the bottlenecks except for that of copy by value. However, as you will see, the overhead of pass by copy is still significant and will both increase heap utilisation and extend the part of a major garbage collection that looks for dangling objects on the heap. On the last project I worked on at the time of writing this, we found that:-
- The 10g database time (not to be confused with wall clock time) was low.
- Average number of sessions from ADDM reports was low.
- The CPU on the WebSphere server was maxed out.
For the benefit of anyone reading this who is not that familiar with the J2EE world, I will elaborate on pass by reference, pass by value, and remote and local homes. One of the original design goals behind the J2EE standard was to present a framework in which distributed applications could easily be produced; this was based on CORBA, and the synchronous communications method was RMI. Prior to the EJB 2.0 standard, it was assumed that when one enterprise bean called another bean, the 'other' bean was always remote, hence a remote interface and copy by value was always used. However, as time went by people realised that despite this, a lot of applications used beans that were deployed to the same J2EE app server and hence suffered this performance penalty for no added benefit. Thus, in the EJB 2.0 standard, 'local' interfaces came about. Beans talking to each other using local interfaces use the much more efficient pass by reference method of passing object references rather than whole objects by value.
Fortunately, with WebSphere there is an option available on the Object Request Broker (ORB) to turn pass by reference 'On'; this forces all the beans deployed to the same EJB container using the same class loader to use pass by reference, irrespective of whether they have been coded to use remote interfaces. In the case of the application that I was looking at, this had the rather large impact of:-
- Reducing batch process run times by 50%.
- Reducing CPU consumption on the server hosting the WebSphere application server by 50%.
- Doubling the 10g db time that a batch process spent in the database.
- Increasing the average number of sessions active whilst the batch process was running; again, this is an Oracle 10g time model metric which is the database time divided by the actual elapsed "wall clock" time.
Setting the class loader policy for the application server to 'single' also helps. In tuning WebSphere, I also looked at the:-
- server JVM
- client JVM
- ORB with regard to the setting up a JNI reader thread pool and the fragment size
- JDBC pool: the size of the pool, pool utilization and the cached statement discard rate.
- EJB container, specifically the EJB cache size
- Log4j logging level
- number of batch process threads
- turning off the diagnostic trace
However, out of all the things I looked at, I got the biggest bang for my buck from turning pass by reference to 'On', by a long chalk. This comes with a word of caution though: if your software is designed such that beans that receive RMI calls modify objects and the calling bean is not expecting this, wild and erratic application behaviour can result. It is for this very reason, or so I believe, that pass by reference is now turned 'Off' by default in WebSphere 6.1.
This brings me on to J2EE clustering and shared nothing architectures. WebSphere Network Deployment, the clustered version of WebSphere, uses a shared nothing architecture, similar to that discussed in this article here, which again re-iterates the point about distributed architectures and performance:-
"The most scalable architectures are sharing-nothing clusters. In such clusters, each node has the same functionalities, and knows nothing about the existence of other nodes. The load balancer will decide how to dispatch the requests to the back end server instances. It is rare that a load balancer will become a bottleneck, because a load balancer only need to do some very simple jobs, such as distributing requests, health checking, and session sticking. By adding more nodes, the computing ability of the cluster is increasing almost lineally, if the back end database or other information systems are powerful enough."
With WebSphere Network Deployment, the same ear file is deployed to each node in the cluster, or cell, to use IBM terminology; workload distribution is managed in a round robin fashion via the workload manager. WebSphere eXtreme Scale adds some extra intelligence to this by distributing the workload based on the current available capacity of nodes in the cluster.
What exacerbated my original problem of the massive copy by value overhead was the batch processing design, which is fundamentally this: because Oracle is called for every individual item in the workload, there is heavy traffic between the business logic layer and the persistence layer. Accessing a database item by item is really inefficient in terms of:-
1. Hard and soft parsing; in particular, I found that setting the batch process thread count above 32 stressed the library cache pin mutex. Tanel Poder has written a good article on the library cache pin mutex and the extended use of mutexes in 11g here.
2. Latching
3. Network round trips and network latency
4. Physical and logical reads
5. JDBC calls
Some people get very agitated about batch process designs that process the workload item by item, and in principle this results in certain inefficiencies when accessing the database. However, the performance targets for the batch processes were the same as those for the system this new one was replacing, but with 200% of the workload and two years' worth of "database aging", i.e. the database had been artificially aged to give the appearance of the application having been in use for two years. In summary, all batch processes beat their counterparts in the original application by at least a factor of 100%.
The morals of this story are:-
1. Distributed object architectures are bad for performance
2. Your application needs to be profiled in order to work out where the time is going; in the example I have highlighted, the database was never the bottleneck.
3. A holistic approach to looking at the architecture needs to be taken with regard to achieving good performance and scalability.
I will get around to discussing approaches on alleviating the stresses that item by item oriented batch processing imposes on the Oracle database in a future post.
Monday, December 01, 2008
J2EE Layered Application Architectures and Performance
I have spent most of my career as a DBA focusing on performance. With most applications there is an intimate relationship between the design and architecture of the application and its performance and scalability. Generally speaking, if the design and architecture is wrong, getting an application to perform and scale can be akin to applying elastoplasts over the biggest holes.
As an industry, IT has a patchy record for delivering projects on time and to budget, as such there are many projects out there that barely manage to cross the finishing line having met all of their functional requirements let alone performance requirements. Due to this, tuning can be reactive and end up becoming a battle against the architecture and design.
There was a time when most applications consisted of a client and business logic written directly in the database using stored procedures. Indeed the database software formed the largest part of the application stack. However, more recent architectural trends have been heavily influenced by:-
- The internet, Web 2.0 has firmly established itself and Web 3.0 is around the corner
- Service Oriented Architectures (SOA)
- Software as a service (SaaS)
- Event driven architectures
- Extreme transaction processing
- Grid architectures
- Mash ups
Indeed, there was a time when the database was a focal point of integration efforts; now the industry is moving towards integration via XML based middleware. Here is one such article on this very subject from design and architecture luminary Martin Fowler. In the current climate, "integration architects" are highly sought after people.
Cloud computing is on its way, and just as during the old Web 1.0 days (to quote a marketing tag line from Sun, "We were into intranets when most people thought this was a typo"), there will be internal clouds as well. Therefore, if you have an interest in performance and tuning, it is highly likely that you will have some involvement in areas outside the database, and will find that the database is being under utilised because of what is happening in the layers above it. Cary Millsap made a presentation at Oracle Open World 2007 highlighting the importance of establishing application profiles when tuning.
One of the most prevalent software architectures today uses a combination of layering and tiering, a tier being a layer that is physically deployed on a separate server. What is paramount about the tiering / layering design approach is that it provides a "clean separation of concerns". Layering can also be vertical, i.e. swathes of development for particular functional areas which cut through the horizontal layers. Classically, the 'horizontal' layers are:-
- Presentation
- Business logic (or domain)
- Integration
Architects in the finance world prefer tiers, in that each tier can reside in its own DMZ (de-militarised zone) and hence provide maximum security. My interest in application architecture has developed from the bottom up, i.e. from the database upwards. However, the integration layer is not always an RDBMS, it can be:-
1. A web service
2. A message queue
3. Distributed object cache (which may be transactional)
4. An enterprise service bus adapter or message broker
5. A J2EE Connector Architecture (JCA) connection
6. etc . . .
In fact, with XA distributed transactions, a transaction can span multiple end points below the integration layer. Some patterns are almost de facto in certain layers and tiers, such as model view controller in the presentation tier and session facade in the business logic or domain tier, to name but two. Also, there are frameworks and third party libraries that take a lot of the effort out of having to crank out boiler plate code and the code behind the application infrastructure, e.g.:-
1. Apache struts and Java Server Faces for the presentation layer
2. Spring for the whole application or specific layers
3. Log4j for logging
4. Apache commons, which is a common utilities library
However, whatever you do, you should never code business logic into the presentation layer or explicitly code details of the integration layer into the business logic or domain layer.
When looking at the performance of such architectures, one needs to establish a profile of where particular activities spend their time within the various layers or tiers, and in the networks connecting any tiers. If a J2EE application server is involved in the application stack, something known as the "carrot model" should be understood. All J2EE application servers use the following set of components, which form a pipe line from the 'front' of the application server to the 'back'; these are:-
1. Object request broker used for J2EE clients.
2. Web container
3. Enterprise Java Bean Container
4. JDBC Pool or JCA connectors.
The WebSphere application server has an http server built into the web container; however, this is only really recommended for testing purposes, so you will probably have an http server outside of the J2EE application server. What the "carrot model" represents is that component utilisation is usually quite high at the 'front' of the application server and gradually dwindles off as you get towards the JDBC connection pool or JCA connectors, hence the analogy of a carrot: thick at one end and thin at the other. Note that the ORB is only used for J2EE clients; web clients will utilise the web container http server or the external http server directly.
A lot of texts refer to the effort and careful consideration that needs to go into designing the layers of the application's architecture. I would add that the same level of consideration needs to be given to where processing should be carried out. In the simplest of architectures, with a J2EE application server on one machine and a database on another, there is still scope for the network to become the bottleneck through excessive calls being made to the database to carry out the processing of business logic and also standing data lookup. In one example, a stateless bean used to access fixed domain data was the most utilised bean in the application until a standing data cache in the application server was implemented. In another example, a stored procedure was called to validate individual items within individual records, using fixed domain data cached in the application server. By carrying out the validation entirely within the application server, the process concerned ran faster and required fewer threads than before, with lower CPU consumption on both the server hosting the application server and the one hosting the database.
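As a sketch of the standing data cache idea, assuming a hypothetical domain_codes lookup table; since fixed domain data rarely changes, no invalidation logic is shown:-

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;

// Loads the fixed domain (standing) data once, then serves every subsequent
// lookup from application server memory, avoiding a round trip per lookup.
public class StandingDataCache {

    private final Map<String, String> codeDescriptions =
        new ConcurrentHashMap<String, String>();

    public StandingDataCache(DataSource ds) throws SQLException {
        Connection con = ds.getConnection();
        try {
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery(
                "SELECT code, description FROM domain_codes");
            while (rs.next()) {
                codeDescriptions.put(rs.getString(1), rs.getString(2));
            }
            rs.close();
            stmt.close();
        } finally {
            con.close();
        }
    }

    // No database access, no network round trip.
    public String describe(String code) {
        return codeDescriptions.get(code);
    }
}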
There is equal scope for poor performance and scalability to be present by design with vertical layering also. One particular design trend is to develop specific functional areas with an API implemented using a session facade. Tools such as Apache Axis, and the J2EE application server itself, make exposing stateless session beans as a web service straightforward. Consider two separate beans used to implement the APIs for two different functional areas: if these areas are tightly coupled and they are to be used in a service oriented architecture, there is going to be a significant amount of XML to Java serialisation taking place involving the two respective web services that might be avoided if the API was provided by a single bean.
Tuesday, November 04, 2008
The Rise Of MySql From The Low End
"The third is the need for change - one executive to whom I spoke recently said her entire discretionary budget was consumed by one proprietary vendor's price increase. So she's out looking for an alternative, and MySQL fits the bill. Which is to say, necessity's the mother of invention - and there's a lot of necessity going around right now."
Makes you wonder if MySQL is going to start biting Oracle at its ankles around the lower end of the market more than ever before.
Monday, October 13, 2008
"The Performance Hacker"
Of the other characteristics that define the "performance hacker", they like to indulge in tuning via the internet. There are, however, some excellent sources of information on the internet and in the blogosphere regarding tuning:
- The Real World Performance Group
- The Optimizer Group Blog
- Jonathan Lewis's blog
- Wolfgang Breitling's web site, etc
The approach recommended by the Real World Performance Group is based around what I first found in a paper entitled "Tuning by Cardinality Feedback" by Wolfgang Breitling, a reviewer of Jonathan Lewis's Cost-Based Oracle Fundamentals book. The basic premise is that you look at where the actual cardinality in a plan diverges from the predicted cardinality, usually by an order of magnitude and usually somewhere near the start of the plan. To the best of my knowledge, this approach is not mentioned in the standard Oracle performance tuning material.
Knowing about this technique and how to apply it is made easier in Oracle 10g with the gather_plan_statistics hint (or statistics_level=all) and the DBMS_XPLAN package. If you use the "tuning by cardinality feedback" approach and find that your plan does not improve even after addressing issues such as the following, it may be time to involve Oracle support:-
1. Inaccurate stats, particularly the number of distinct values against highly skewed data.
2. Data correlation or predicate independence assumption issues.
3. Predicate values being out of bounds when compared to min max values for a column.
4. Missing histograms or histograms with inappropriate numbers of buckets.
5. Pseudo columns relating to function based indexes without stats or histograms.
6. Selectivity assumptions made when functions are used.
7. Skewed data for which there are more than 254 distinct values and the values you are interested in are not 'popular' values, i.e. do not fall on a bucket boundary end point.
By providing full evidence that this approach has been applied and followed, you will get a much more expedient resolution to your issue out of Oracle support than someone reporting a poorly performing query or a query with a large full table scan in its plan.
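To make the technique concrete, here is a minimal sketch of gathering that evidence from a JDBC session; the emp query is just a stand-in, and it is the E-Rows (estimated) versus A-Rows (actual) columns in the DBMS_XPLAN output where the order of magnitude discrepancies show up:-

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CardinalityFeedback {

    public static void showActualVsEstimated(Connection con) throws SQLException {
        Statement stmt = con.createStatement();
        try {
            // The hint collects actual row counts per plan step (10g onwards).
            ResultSet rs = stmt.executeQuery(
                "SELECT /*+ gather_plan_statistics */ * FROM emp WHERE deptno = 10");
            while (rs.next()) { /* consume the rows */ }
            rs.close();

            // NULL sql_id means the last statement executed by this session.
            ResultSet plan = stmt.executeQuery(
                "SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'))");
            while (plan.next()) {
                System.out.println(plan.getString(1));
            }
            plan.close();
        } finally {
            stmt.close();
        }
    }
}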
Thursday, September 18, 2008
A Ticking Performance And Scalability Time Bomb and Java Object Caching Grids
In the Java world, the term used to describe the layer that interfaces with the database is "object relational mapping", or ORM to use its abbreviation. The focus tends to be on mapping tools that provide neat frameworks for managing the SQL; XML configuration files are a common component used in such approaches. However, there is little focus on leveraging features such as Oracle's array interface to make database access as efficient as possible. Indeed, as Tom Kyte often mentions in his books, some people take the black box approach to using the database. In the same book, Expert One on One Oracle, I think, one project team elected to develop something akin to the query optimizer in the application server. Talk about re-inventing the wheel.
There are inherent bottlenecks in using a relational database, mainly the object relational impedance mismatch. Numerous JVM based object caching solutions, which can front the database and even do away with it, have made their way onto the market to address this. Examples of such solutions include Oracle Coherence, GigaSpaces Application Fabric, IBM Object Grid, Appistry, Terracotta etc. These caching solutions and their associated processing grids have given rise to the paradigm of "extreme transaction processing", which I believe is a term coined by the Gartner group. Someone I spoke to in Oracle pre-sales earlier on in the year mentioned a customer who had managed to achieve with £40,000 of Intel based blades and Coherence what they had budgeted £200,000 of Unix hardware for. Indeed, the founder of Tangosol, the company that first developed Coherence before being acquired by Oracle, came up with the idea whilst he was a Java consultant visiting numerous customer sites where accessing the database was the main bottleneck in the application architecture. As an aside, Coherence can cache a wide variety of data sources other than databases, including web services. It is also equally at home caching with .Net as it is with Java.
The fact of the matter is that most people still need to use a relational database somewhere. In most people's infrastructures and architectures, some form of reporting needs to be performed; who has ever heard of a reporting solution based on Java objects? There may well be one out there somewhere, but in my experience most people use SQL to meet their reporting requirements. SQL is a rich and mature language for reporting, with its analytic extensions and constructs such as materialised views, so the relational database might still be around for some time. As Tom Kyte once said, platforms change once every ten years, from mainframes, to Unix, to Windows and now Linux; however, Oracle has always been most people's database of choice and the software of choice for storing, processing and reporting on an organisation's data.
Wednesday, March 26, 2008
Missed Opportunities and Getting It Right The First Time
1. Index Organized tables in 8i not supporting secondary indexes.
2. Global indexes on partitioned tables becoming unusable after partition maintenance operations, drops, splits etc . . .
3. Not being able to pass compressed blocks across the cluster interconnect, still an issue in 11g release 1 (I think).
Historically, most Oracle developers use literals against columns for which the data is skewed, and binds against columns in predicates for which there is no value in forcing a re-parse of the SQL statement, typically values which are highly selective. With cursor_sharing = similar, plans should only be re-used if the parsed cursor can 'safely' re-use an existing plan. However, Metalink technical note 377847.1, "Unsafe Peeked Bind Variables and Histograms", states that the RDBMS can mark a bind variable as being unsafe and thus generate many child cursors for statements. I have seen at first hand the RDBMS doing this even when there is no histogram on a column (method_opt => " . . . size 1"). The solutions to this are:-
1. Use cursor_sharing = exact, though this defeats the purpose of using cursor_sharing = similar somewhat.
2. 'Drop' histograms on columns that cause unsafe bind peeking; there is a good post on the Pythian blog about the overuse of histograms.
3. Check whether dynamic_sampling is causing this as per the Metalink technical note.
4. Set _optim_peek_user_binds = false to disable bind peeking full stop, if you still get this without a histogram.
v$sql_shared_cursor will not give any insight into why this is happening. There is no 'fix' to this issue as such; however, 11g introduces intelligence into the issue with adaptive cursor sharing. The online Oracle documentation, the Optimizer blog and Technet all provide good references on this.
The real question is why it has taken Oracle this long to introduce intelligence into cursor sharing. On the same theme of new features taking their time (and then some) to mature, I'd like to use the results cache in 11g in anger, but how long will it take Oracle to shave the rough edges off this feature, i.e.:-
1. Results being invalidated when non-related rows in the related base tables change.
2. No explicit information in autotrace stats for knowing that results caching has been used
I've heard whispers of new functionality appearing in 11g release 2 being related to non-core RDBMS technology; perhaps Oracle should focus on the maturity of the core RDBMS engine features first.
Tuesday, March 11, 2008
A New Type Of Synchronization Mechanism In 10g Release 2
Re: cursor: pin S wait on X
From: Tanel Põder
To:
Date: Sat, 29 Apr 2006 20:10:50 +0900

I studied the mutex stuff a bit when 10.2 came out, I'd say mutexes are less intelligent than latches. But thanks to that, they are more lightweight and thanks to quite large caches on modern CPUs we can afford having tons of them - allowing very fine grained locking. On my 32bit linux installation a mutex was 28 bytes in size, while a regular latch structure was 110 bytes. In theory every library cache child should have it's own mutex, but in practice I saw one case where two different child objects were protected by the same mutex - that was what library cache dump showed me anyway.

Mutex also acts as a pin structure, so there's no need to do another memory write for pinning a cursor. Mutex allows immediate shared access to resources it protects, the old approach would mean getting a library cache latch in exclusive mode and then modifying the corresponding library cache child pin structure. Mutexes also don't maintain statistics on successful gets as latches do, only sleeps are registered. Thanks to all that, an immediately succeeding mutex get operation takes several times less CPU instructions (and hits possibly less memory stalls), from a 10gR2 perf tuning course I remember it was about 250 vs 50 CPU instructions on IA32 platform.

Note that mutexes don't provide much benefit for poorly written applications, for most benefit your apps should avoid any kind of reparsing - keeping cursors open and cancelling them manually when interest in query results has ended.

Tanel.
Tuesday, March 04, 2008
The Great Oracle Diagnostic Tools Debate
- The "self managing database" is more a thing of marketing than technology, to quote Dan Fink.
- Extended tracing allows you to see where time is being expended on individual statements to a much finer degree of granularity than ADDM and the time model.
However, there are certain practicalities that need to be considered here:-
- Extended trace is of limited use for "one off" performance issues; you wouldn't really want it to be enabled all the time in a production environment.
- The use of J2EE application servers is commonplace today; unless your application Java code is well instrumented, using DBMS_SUPPORT to do end to end tracing, a logon trigger to trace what the JDBC client is doing may have to be used. Trace files can be tagged using EXECUTE IMMEDIATE 'ALTER SESSION SET TRACEFILE_IDENTIFIER . . .' (see the sketch after this list). Unfortunately, good application instrumentation is the exception rather than the rule, and only Oracle savvy Java developers or teams with access to a good DBA will know about end to end tracing and trcsess. Otherwise, something such as a logon trigger may lead to a forest of trace files to wade through.
- Some IT shops (and I know this is bad practice) don't always have a test / development environment for every production environment, or don't have test environments that are identical to their production environments; this is often the case with data warehouses. Therefore, if you have a very intermittent problem in production, or a problem which despite being reproducible occurs very infrequently, reproducing it in another environment might be a problem, and again you wouldn't really want SQL tracing enabled in a production environment 24/7 until the problem re-occurs. This may not be such an issue if, via good instrumentation, the application provides the facility for tracing to be generated on specific parts of the application and under specific circumstances; however, my experience is that applications that do this are few and far between.
- In his post, Richard Foote uses a very good example of enabling extended tracing being analogous to putting a camera on someone whose journey gets held up, his wife in the example. What do you do if you don't know what part of the software to "put the camera" on?
- The issue you are experiencing might be time sensitive, meaning that in order to reproduce the issue you need to replay data which was loaded / processed by the application at specific dates and times. With Unix this can be done by changing the system clock, but again, this is not something you would really want to do in a production environment.
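As referenced in the list above, here is a minimal sketch of tagging and scoping extended tracing from JDBC. The tag is a hypothetical label assumed to come from trusted code, and the classic 10046 event at level 12 gives wait events plus bind values:-

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class ExtendedTraceScope {

    public static void traceSection(Connection con, String tag, Runnable suspectCode)
            throws SQLException {
        Statement stmt = con.createStatement();
        try {
            // Tag the trace file so it can be found amongst the others.
            stmt.execute("ALTER SESSION SET TRACEFILE_IDENTIFIER = '" + tag + "'");
            stmt.execute("ALTER SESSION SET EVENTS '10046 trace name context forever, level 12'");
            suspectCode.run(); // the part of the software the camera is pointed at
        } finally {
            try {
                stmt.execute("ALTER SESSION SET EVENTS '10046 trace name context off'");
            } finally {
                stmt.close();
            }
        }
    }
}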
I would love to know what Graham Wood, architect and 'father' of ADDM and the time model, makes of this debate and whether he has plans to modify ADDM and the time model in light of these criticisms. Furthermore, I would also be interested to know what percentage of performance issues are down to bad configuration, poor coding practices, poor design and poor architecting, as it occurs to me that some people are concerned about ADDM and the time model taking some of the art out of tuning. However, I believe that an element of tuning is identifying poor design and architectural decisions, a skill that no tool will be able to replace for a long, long time.
Thursday, February 21, 2008
Good Rules Of Thumb For Benchmarking And Tuning
- Start out with clearly defined performance goals. Woolly, ill conceived or even non-existent targets defeat the purpose of any benchmarking exercise.
- Performance, throughput and scalability goals should be linked to the business aims of the software.
Testing Environment and Load Simulation
- Always use data which is representative of what the application will look like in production.
- Scale your data in order to understand how performance will be affected as the database grows, if database growth is applicable.
- Use a realistic time distribution for when SQL statements will hit the database, i.e. ask yourself what sort of frequency the SQL statements are going to hit the database at, and how this frequency is distributed over the time window when the application will be in use.
- Use a realistic distribution of the SQL statements that are used against the database.
- Ensure that when benchmarking and tuning your application, your application is the only thing using your server, network and I/O infrastructure. Activity outside of your application will skew your results. SANs are brilliant at skewing your results through caching; some SANs queue all I/O until the write part of the cache is written to disk once it becomes full, read aheads can kick in, etc . . .
- When using load simulation tools, be aware of 'think' time; if this is too low you can in fact be simulating activity which is an order of magnitude greater than what you are trying to achieve.
- Leverage Oracle features that allow your environment to be easily reset and tests to be easily reproduced, such as flashback database.
Metrics Capture
- Use meaningful metrics for capturing performance. The ultimate goal of your exercise will be to fulfil certain processing under certain conditions in order to meet certain business goals; therefore you should be able to express some of your findings in terms of figures that are meaningful to the business users of the application.
- Hit ratios are flawed indicators of performance; there is some PL/SQL on Connor McDonald's http://www.oracledba.co.uk/ web site which illustrates how to set the buffer cache hit ratio to whatever you like, to support this point. Jonathan Lewis has made a presentation on the subject of hit ratios, in which he explains the flaws of traditional hit ratios and puts forward the "Fan Hit Ratio" as a superior performance metric; refer to http://www.jlcomp.demon.co.uk/hit_ratios.html. The crux of hit ratios being flawed is that they lose meaning; latency is but one of the things that hit ratios do not capture.
- Techniques for capturing metrics should not incur overheads so great that they skew your results.
Tuning
- Tune what you know; this gets reiterated at every UKOUG conference during various performance and tuning related presentations, and justly so. Only tune after you have hard statistics showing where your time is going; do not blindly apply optimisation techniques without knowing where the time is being expended.
- Make one change to your environment at a time when tuning.
- When tuning, make changes which correspond to the scope of the problem, e.g. when tuning a query, look for changes which are within the scope of the query (object statistics, indexes etc) rather than initialisation parameter changes. There are obviously parameters that affect optimisation; however, when trouble shooting, your first port of call should always be the places which are local to the problem.
- When gathering performance metrics and using Oracle 10g, ensure that database time is one of the metrics you collect.
- If testing turnaround times for parts of your application are considerable, consider (if possible) breaking out certain parts of the application in order to test the effect of certain changes independently, before performing the full blown test using the real software.
- Try to understand what parts of the architecture / infrastructure will throttle the performance of your software.
- Optimise "single stream" performance as much as possible before scaling out.
- Once performance goals have been met, document baselines of key performance indicators (document a flight envelope) to provide a tool for production performance trouble shooting.
Wednesday, February 20, 2008
Preventing Database Abuse By J2EE Application Servers
1. You shouldn't take all benchmarks at face value.
The main objective is to eliminate latency when accessing data in the database as much as possible. The fastest way to do something is to not do it at all; when using Oracle this means:-
- Keep hard parsing to a minimum.
- Keep physical reads to a minimum.
- Keep latching to a minimum.
- Keep logical reads to a minimum.
- Accessing SQL from any language, even from PL/SQL, carries certain overheads; minimise database access from an application server in order to reduce data access latency and network round trips, assuming that the application server and database server are two separate machines.
This posting assumes the use of bean managed persistence, i.e. the JDBC API is used explicitly in the Java code, and that an Oracle JDBC driver compatible with the JDBC 2.0 standard is being used. Here are my tips for optimising a J2EE application which uses Oracle, assuming that it uses widget by widget logic (row by row processing) and you do not want to re-write the application. My points should be reasonably comprehensive, but please forgive me if I have missed anything out.
Good Coding Practises
- Make sure your code is bind variable friendly: use prepared statements for the same statements that are executed multiple times. Refer to this tutorial from Sun for more information on prepared statements: http://java.sun.com/docs/books/tutorial/jdbc/basics/prepared.html
- Avoid using the database if you can by caching standing (lookup) data in the middle tier.
- If there is any application logic that purely uses standing data, then after you have cached this data, perform the processing entirely in the middle tier so as to avoid round trips on the network.
- If your application is based on processing numbers of discrete items, let's say 'widgets', and you have a batch process that processes 10 widgets for example, ensure that statements with execution counts that should relate to the number of widgets being processed are not executed more times than the number of widgets.
- Look for SQL the application is generating that returns no rows or affects no rows. Is this redundant SQL? Can the application logic be changed so that it does not need to be executed?
- JDBC resources, namely result set, statement and connection handles, should always be released when they are no longer required; use v$open_cursor in order to track down cursor 'leaks' (see the sketch below).
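On the last point, a minimal sketch of hunting for cursor 'leaks'; the heuristic is simply that statements staying open in large numbers for a schema are candidates for missing close() calls:-

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CursorLeakCheck {

    public static void report(Connection con, String schema) throws SQLException {
        PreparedStatement ps = con.prepareStatement(
            "SELECT sql_text, COUNT(*) " +
            "FROM v$open_cursor " +
            "WHERE user_name = ? " +
            "GROUP BY sql_text " +
            "ORDER BY COUNT(*) DESC");
        try {
            ps.setString(1, schema);
            ResultSet rs = ps.executeQuery();
            try {
                while (rs.next()) {
                    System.out.println(rs.getInt(2) + " open: " + rs.getString(1));
                }
            } finally {
                rs.close();
            }
        } finally {
            ps.close();
        }
    }
}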
Intelligent Use Of SQL, PL/SQL and JDBC
- Use the JDBC batching API to execute multiple statements in one batch; this effectively puts a 'BEGIN' and 'END' around multiple statements. The Oracle JDBC driver supports two flavours of this, the standard JDBC 2.0 interface and an Oracle proprietary interface (see the sketch after this list); refer to:
http://download-west.oracle.com/docs/cd/B19306_01/java.102/b14355/oraperf.htm
- Look at using things such as in-line functions, subquery factoring and the merge statement to achieve in one piece of SQL what you might otherwise require multiple statements for.
- If you frequently create the same result set and perform application logic against it multiple times, look at aggregating this into a working table.
- From 8i onwards, Oracle has the facility to support Java stored procedures; can some of the data intensive processing be moved closer to the database using these, i.e. leverage the existing Java skills in the team?
- If your application is designed around widget by widget processing logic, it's probably performing a fair amount of index range scanning. Look at making these scans more efficient by rebuilding larger, more frequently accessed indexes with a larger database block size; this significantly reduces latching around the buffer cache, i.e. larger blocks = fewer logical reads, and larger leaf blocks = fewer branch blocks in the index B-tree. Also consider index compression.
- Leverage the JDBC features for traversing result sets, if for example you use the queries:-
SELECT empno FROM emp WHERE deptno = 10
and
SELECT MAX(empno) FROM emp WHERE deptno = 10
within close proximity in your application, you might be able to do away with the second query by using the JDBC API to navigate around the result set returned by the first query.
- Set the fetch size in JDBC in order to minimise network round trips when retrieving multiple rows from a query.
- The chances are that any batch processes in your widget by widget processing software achieve their throughput via threading; if this is the case you may experience significant log file sync waits (waits on the contents of the log buffer being written to disk). This particular bottleneck can be eliminated by using the 10g asynchronous write to redo log files feature via the COMMIT_WRITE initialisation parameter. However, be advised that there are certain resiliency ramifications to using asynchronous writes to the redo logs.
- Use connection pooling; all J2EE application servers support JDBC connection pooling, and reusing existing connections is an order of magnitude faster than having to create a connection every time a statement needs to be executed. Also set the minimum and maximum number of connections to the same value in order to prevent connection storms, i.e. bursts of requests from the application server to the database for the creation of connections.
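Pulling the batching and fetch size points together, a minimal sketch using the standard JDBC 2.0 interfaces; the widgets table and its columns are hypothetical:-

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class RoundTripSavers {

    // One round trip for the whole batch instead of one per widget.
    public static void markProcessed(Connection con, int[] widgetIds) throws SQLException {
        PreparedStatement ps = con.prepareStatement(
            "UPDATE widgets SET processed = 'Y' WHERE widget_id = ?");
        try {
            for (int id : widgetIds) {
                ps.setInt(1, id);
                ps.addBatch();
            }
            ps.executeBatch(); // sends all of the updates in a single call
        } finally {
            ps.close();
        }
    }

    // Fetch 100 rows per round trip instead of the driver default.
    public static void scanUnprocessed(Connection con) throws SQLException {
        PreparedStatement ps = con.prepareStatement(
            "SELECT widget_id FROM widgets WHERE processed = 'N'");
        try {
            ps.setFetchSize(100);
            ResultSet rs = ps.executeQuery();
            try {
                while (rs.next()) { /* process each widget id */ }
            } finally {
                rs.close();
            }
        } finally {
            ps.close();
        }
    }
}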
Move The Data Closer To The Application Server => Reduce Data Access Latency
- Leverage the Oracle 11g client side result cache in order to perform SELECTs without incurring the cost of jumping out of the application server into the database; however, be aware that in 11g release 1, ANY changes to the data in the base tables upon which a result set is dependent will invalidate the results in the result cache. Will this restriction be lifted in 11g release 2?
- A better option than using the result cache in 11g may be to build your own cache off the back of the database change notification mechanism available via the 11g JDBC driver for both 11g and 10g databases.
- If you have more exacting caching requirements, look at caching software designed specifically for this purpose, such as Oracle Coherence, IBM WebSphere DataGrid, GigaSpaces etc . . .
Further reading:-
JDBC 2.0 Fundamentals from Sun http://java.sun.com/developer/onlineTraining/Database/JDBC20Intro/
Oracle 10.2 JDBC Developers Guide And Reference
http://download.oracle.com/docs/cd/B19306_01/java.102/b14355/toc.htm