
Thursday, January 15, 2009

The Basics That Undermine RAC Scalability And Availability

I'm writing this post because Oracle RAC is one of the things I hope to get into in the near future. There are two reasons for adopting Oracle RAC:-

1. Scalability

Cluster inter-connect traffic and performance are a major factor here. As such, InfiniBand is being touted as the up-and-coming inter-connect fabric. Also, if you google "Joel Goodman Oracle", you will most probably find references to presentations and papers on how uncached sequences that do not use NOORDER can degrade performance via the pinging of blocks across the inter-connect. Other than this, the same factors that influence performance and scalability for a single instance application also apply to RAC. Specifically, these are the "usual suspects" that the Oracle Real World Performance group include in most of the material they present at Oracle OpenWorld, namely:-

1. Good schema design
2. Efficient connection management
3. Efficient cursor management

Therefore, my first question is this: how many RAC sites test their applications in single instance environments for scalability around these three factors? Also, how many sites test the scalability of their clusters as they add nodes? Interestingly, according to Julian Dyke's book on 10g RAC and Linux, Oracle actually turn cache fusion off in their RAC TPC benchmarks.
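As a rough illustration of what testing scalability as nodes are added might produce, the sketch below turns throughput measurements into speedup and efficiency figures. The function name and the numbers are hypothetical and are not taken from any real benchmark; the assumption is simply that the same workload is run at each cluster size and its throughput recorded.

```python
def scaling_efficiency(throughput_by_nodes):
    """Return {node_count: (speedup, efficiency)} relative to the
    smallest cluster size that was measured."""
    base_nodes = min(throughput_by_nodes)
    base_tps = throughput_by_nodes[base_nodes]
    result = {}
    for nodes, tps in sorted(throughput_by_nodes.items()):
        speedup = tps / base_tps          # how much faster than the baseline
        ideal = nodes / base_nodes        # perfect linear scaling
        result[nodes] = (speedup, speedup / ideal)
    return result


# Hypothetical transactions-per-second figures, for illustration only.
measured = {1: 1000.0, 2: 1800.0, 4: 3000.0}

for nodes, (speedup, efficiency) in scaling_efficiency(measured).items():
    print(f"{nodes} node(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency figure that drops sharply as nodes are added would be a hint that something other than raw capacity, the inter-connect for example, is limiting the cluster.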

A further recommendation is that your application has some intelligence when connecting to database services, such that the workload directed at specific nodes leads to a minimal amount of cluster inter-connect traffic. I have no evidence to back this up, but there are probably applications out there which use tables to implement things best described as application level semaphores, or tables which occupy very few blocks and have flag-like columns, which, if placed on a RAC cluster, will slay the inter-connect.
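A minimal sketch of the kind of connection "intelligence" described above: each application module connects through its own database service, so that work touching the same data tends to land on the same node. The module names, service names, host and port are all hypothetical, and the sketch assumes each service has been configured to run on a preferred subset of instances. It only builds an Easy Connect style string and does not open a connection.

```python
# Hypothetical mapping of application modules to database services.
SERVICE_FOR_MODULE = {
    "orders": "orders_svc",
    "billing": "billing_svc",
    "reporting": "reporting_svc",
}

# Hypothetical catch-all service for modules with no dedicated service.
DEFAULT_SERVICE = "general_svc"


def connect_string(module, host="rac-cluster.example.com", port=1521):
    """Build an Easy Connect style string (//host:port/service_name)
    for the service assigned to the given application module."""
    service = SERVICE_FOR_MODULE.get(module, DEFAULT_SERVICE)
    return f"//{host}:{port}/{service}"


print(connect_string("orders"))
print(connect_string("audit"))   # no dedicated service, uses the default
```

The design choice here is that the application never names a node directly; it names a service, and the mapping of services to instances stays a database-side decision.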

2. High availability

It is recommended that there is adequate capacity in the cluster to soak up the workload created when one or more nodes fail. Technical issues aside, a cluster is not a panacea for high availability, and availability can be undermined by several factors:-
  • Poor testing of changes applied to a cluster.
  • Lack of a realistic test cluster.
  • A lack of tight change management procedures and processes.
In short, all of this raises the question: how many RAC sites carry out all of this work when implementing RAC? There is no way of knowing; suffice it to say that the answer is probably less than 100%.
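The capacity recommendation at the start of this section comes down to simple arithmetic, sketched below. It assumes, purely for illustration, that the workload is spread evenly across the nodes and is redistributed evenly across the survivors after a failure; real clusters will rarely be this tidy. The function names and figures are hypothetical.

```python
def utilisation_after_failures(nodes, utilisation, failed):
    """Per-node utilisation on the surviving nodes, assuming the total
    workload is spread evenly before and after the failure."""
    if not 0 <= failed < nodes:
        raise ValueError("failed must be between 0 and nodes - 1")
    return nodes * utilisation / (nodes - failed)


def max_safe_utilisation(nodes, failed, ceiling=1.0):
    """Highest per-node utilisation before a failure that keeps the
    survivors at or below the given ceiling afterwards."""
    if not 0 <= failed < nodes:
        raise ValueError("failed must be between 0 and nodes - 1")
    return ceiling * (nodes - failed) / nodes


# Hypothetical example: a four node cluster running at 60% per node.
after = utilisation_after_failures(nodes=4, utilisation=0.60, failed=1)
print(f"Survivors after one failure: {after:.0%} utilised")

limit = max_safe_utilisation(nodes=4, failed=1, ceiling=0.90)
print(f"Stay below {limit:.1%} per node to survive one failure at a 90% ceiling")
```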
