Monday, February 21, 2011

Exadata Series - Performance Comparison

So how does Exadata compare?

A lot of people have asked this question, and in truth, although your mileage will vary, Exadata is a definite screamer and should solve most performance challenges. But there is more to compare than just performance! A lot needs to be taken into consideration, and I highly recommend weighing all the variables before making any decision, since Exadata is very expensive and the ROI needs to be proven. Some criteria to include:
  • Solution Maturity
  • Pricing (CAPEX and OPEX)
  • Performance (OLTP, Warehousing, Analytics)
  • Scalability
  • Availability
  • Backup/Recovery Options
  • Migration Options
  • Management Options
  • Integration Requirements
  • Virtualization Options
  • Tiering Options
  • Provisioning Options
  • Training Requirements
  • Professional Services Available
  • Refresh/Replication Options
  • Cooling
  • Floor space
  • Networking
Now going back to the 'U' platform: it is very impressive, and when paired with the appropriate storage solution it is a strong value proposition. Storage features such as snap and clone can provide instant alternate environments, or backup/recovery options. Such options are not available for Exadata. Some will tell you that Data Guard combined with Flashback does the same thing, but there are substantial differences (for example, an instant snap or clone vs. building and setting up a Data Guard environment). You must determine for yourself what solution will best address your business needs now and in the future.
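For context, the closest in-database analogue we looked at is a guaranteed restore point combined with Flashback Database. A minimal sketch of the idea (the restore point name is illustrative, and this assumes flashback logging is configured and the database can be briefly taken to MOUNT):

    -- before handing the environment over for testing
    CREATE RESTORE POINT before_poc GUARANTEE FLASHBACK DATABASE;

    -- ... run the tests ...

    -- rewind the whole database to the restore point
    SHUTDOWN IMMEDIATE
    STARTUP MOUNT
    FLASHBACK DATABASE TO RESTORE POINT before_poc;
    ALTER DATABASE OPEN RESETLOGS;

Even then, that rewinds a single database in place; it is not the same as instantly presenting a writable copy the way a storage snap or clone does.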


What were the platforms?

Again, without getting into too much detail (for reasons previously mentioned), let's use the following labels:

  • Platform P5 w/CX type storage (baseline)
  • Platform P7 w/DS type storage
  • Platform 'U' blades w/CX type storage - Note that the storage was suboptimal as it was our sandbox storage environment with a known bottleneck in the number of connections; the hardware/software stack was also not optimized for Oracle
  • Platform HDS w/AMS type storage
  • Oracle Exadata v2 (1/4 rack w/High Performance disks)

Knowing we were looking at Exadata, the other vendors took the approach of matching Exadata on price/performance, taking into consideration the cores and license costs. This was especially relevant to platform P7 since its core factor is 1.0 whereas Intel (used by Exadata and platform 'U') is 0.5. As expected, in pure CPU processing, platform P7 was the most efficient. This of course allowed the vendor (as they knew going in) to use fewer processors to match, or more accurately beat, the Intel processors, making the core factor a non-issue. For example, 6 x P7 cores bested 12 x Intel Nehalem cores. It will of course be argued that Exadata has 96 cores for the DB nodes + 168 cores for the Storage nodes (in a full rack) since processing will also be done by the Storage Servers. That is a valid argument except where the storage servers are not involved, which depends a lot on your workload. It must be noted that platform 'U' did quite well given its degraded setup, even besting Exadata in a few individual tests (a real testament to the platform).

For testing we devised 5 test cases consisting of the same 12 unit tests (i.e. loads, stats collection, and queries):

T1: "As is", i.e. just run without any changes
T2: Hints dropped
T3: Hints & indexes dropped
T4: Same as T3 but using compression (HCC for Exadata & AC for the other platforms; see the sketch after this list)
T5: Same as T3 but with artificially induced load on the CPUs (at 100%)
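
As a rough illustration of the T4 setups (table names are illustrative, not the actual POC schema), Exadata used Hybrid Columnar Compression while the other platforms used Advanced (OLTP) Compression:

    -- Exadata: Hybrid Columnar Compression (warehouse-style)
    CREATE TABLE sales_hcc COMPRESS FOR QUERY HIGH
      AS SELECT * FROM sales;

    -- Other platforms: Advanced Compression
    CREATE TABLE sales_oltp COMPRESS FOR OLTP
      AS SELECT * FROM sales;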


Testing was done off-site by the respective vendors, except for platform 'U', which was done on-site by myself. Oracle apparently has a policy against making performance data available, so I'd recommend this be discussed upfront if you want access to the AWR and other such information for review. We were unaware of this policy going into the tests and were told the AWR was not captured. As we persisted, the explanation changed to the data being "company confidential", and more recently to such information simply not being made available.
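
If you do negotiate access, the snapshots and reports themselves are simple to produce; a minimal sketch using the standard DBMS_WORKLOAD_REPOSITORY package (dbid, instance number, and snapshot IDs below are placeholders; the @?/rdbms/admin/awrrpt.sql script is the interactive equivalent):

    -- take snapshots around each unit test
    EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;

    -- later, generate a text report between two snapshot IDs
    SELECT output
      FROM TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT(
                   1234567890, 1, 100, 101));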

I also recommend ensuring the appropriate Oracle resources are made available. We were less than impressed with the Oracle team running the POC: given the collective resources of Oracle at their disposal, it took them until the next day to realize the Exadata machine was improperly configured and had a failed Flash card, and to figure out how to use Data Pump (we had to help them here). Just getting our data (less than 4TB) inside the machine was taking over 4 hours (the operation was killed) until each of these issues was addressed. The load time was still unimpressive, though our contention was that the machine was still less than ideally configured:

Exadata: less than 3 hours
Platform 'U': 1 hour 45 minutes
Platform P7: 16 minutes
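
For reference, the loads were straightforward Data Pump imports; a minimal sketch of one way to drive such an import via the DBMS_DATAPUMP PL/SQL API (directory, file pattern, and parallelism are illustrative assumptions, and the equivalent impdp command line works just as well):

    DECLARE
      h       NUMBER;
      l_state VARCHAR2(30);
    BEGIN
      h := DBMS_DATAPUMP.OPEN(operation => 'IMPORT', job_mode => 'FULL');
      DBMS_DATAPUMP.ADD_FILE(h, 'poc_data_%U.dmp', 'DPUMP_DIR');  -- dump file set
      DBMS_DATAPUMP.SET_PARALLEL(h, 16);                          -- spread across workers
      DBMS_DATAPUMP.START_JOB(h);
      DBMS_DATAPUMP.WAIT_FOR_JOB(h, l_state);
    END;
    /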

To share some other performance numbers (ordered by overall best time improvement), see below. Note that I've combined the platform 'U' and 'HDS' results since HDS was the storage piece and U was the compute piece.

T1: P7 (9x), Exadata (7x), U/HDS (4x)
T2: Exadata (13x), P7 (10x), U/HDS (9x)
T3: P7 (21x), Exadata (11x), U/HDS (8x)
T4: P7 (16x), Exadata (12x), U/HDS (8x)
T5: U/HDS (17x), P7 (9x), Exadata (6x)

Of note is that we had a particularly nasty load test that gave all the platforms trouble, so much so that in the case of T1 none of the systems managed to complete it in a time that bested the baseline (unit tests were stopped at our discretion once they passed the baseline time).

Curiously, the Exadata DB nodes were driven to 100% CPU utilization by just two concurrent IAS streams in T5, while the other platforms required a more artificial stress using a CPU-heavy PL/SQL function (one session per CPU thread). We found this quite strange given the similar processing power of the Exadata DB nodes and the other Intel platforms, but as we were not given access to any data we were unable to get answers.
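
For the other platforms, the artificial stress was along the lines of the following (a simplified sketch, not the exact function we used; one session was started per CPU thread):

    -- spin on pure PL/SQL arithmetic for a fixed number of minutes, no I/O
    CREATE OR REPLACE PROCEDURE burn_cpu (p_minutes IN NUMBER DEFAULT 10) AS
      l_stop CONSTANT DATE := SYSDATE + p_minutes / 1440;
      l_x    NUMBER := 0;
    BEGIN
      WHILE SYSDATE < l_stop LOOP
        l_x := MOD(l_x + SQRT(9973) * LN(7919), 99991);
      END LOOP;
    END burn_cpu;
    /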

Monday, February 14, 2011

Exadata Series - The beginning

As with any story there is always a beginning, and so it is with this Exadata project. I would say it all started around Q1 2010, when... well, let's just say we decided to look at what our next-generation platform would be for our enterprise databases.


The Current Platform

Our existing platform is IBM Power5 with micro-partitions. These servers are very reliable, hold tons of memory, and provide lots of processing power. Although the performance of the Power5 has been eclipsed by Intel's Nehalem, and even further by IBM's Power7 (the current generation of Power), they are even now more than capable of supporting any workload. Without disclosing any major details which may get me into trouble with my current employer, let's just say there were a few such machines, well populated with CPU, RAM, and various I/O and HBA cards, with Capacity on Demand (CoD) also utilized when required. That is one of the main benefits of the Power platform IMHO: being able to pay as you grow as well as pay as you need/use.

These machines hosted multiple databases (a few sharing the same partition) for all the various environments (i.e. development up to production). Our storage platform was mid-tier (like CLARiiON CX3/CX4), of which ~80TB was for the 20+ production Oracle databases to be migrated to Exadata. Of course the storage platform was general purpose, so there were petabytes in use for various purposes (including other Oracle databases).

Database workload was no different from others, being a mixture of OLTP, warehousing (reporting), and analytics (the usual 60/40, or 70/30).
The database sizes ranged from ~100GB up to 11+TB, with the bulk of them in the mid-GB range. The larger ones were the most business critical (aren't they always?).

Step 1: Let's test out RAC first

Since we had a fair number of databases, and the direction was to consolidate, our thinking was to leverage Real Application Clusters (RAC) with Services to provide some separation between workloads and/or applications, provide scalability, and improve availability. So as a first step we decided to test out RAC using 11.2.0.1 on commodity hardware and see how our workload functions in such an environment. My understanding was that previous tests with 10.1 RAC did not go so well, as our applications were not RAC-friendly.
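
To illustrate the idea (names are made up, and on RAC the services would normally be defined and placed with srvctl rather than directly via the package):

    -- one service per workload, started on different instances
    EXEC DBMS_SERVICE.CREATE_SERVICE(service_name => 'rpt_svc',  network_name => 'rpt_svc');
    EXEC DBMS_SERVICE.CREATE_SERVICE(service_name => 'oltp_svc', network_name => 'oltp_svc');
    EXEC DBMS_SERVICE.START_SERVICE('rpt_svc',  'RAC2');   -- reporting on node 2
    EXEC DBMS_SERVICE.START_SERVICE('oltp_svc', 'RAC1');   -- OLTP on node 1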

At the time we were finishing up testing on platform 'U' (a relatively new blade server platform) for some other virtualization stuff, so we decided to extend that hardware loan and re-use it for our RAC testing. The virtualization tests went well and we had already decided to use it going forward, so it was advantageous to minimize platforms (save costs and all that) should we decide to go with a commodity solution. The RAC tests went well, especially the actual platform 'U' testing. There are real advantages to stateless computing, i.e. using server profiles with a SAN-booted OS image and being able to move from one blade server to another with minimal downtime.

Much later on we ran into some problems, similar to what others have reported, with 11.2.0.2 due to the multicast requirements for HAIP. Unfortunately I did not have time to resolve this problem with the server and networking guys, as I was only given an hour. In any case this was a non-issue as we simply used 11.2.0.1. In upcoming entries I'll share some interesting performance comparisons for our workload.


Update note:
Seems there are a few workarounds for the multicast issues I faced on platform 'U':

  • an updated firmware
  • set the virtual IO card to promiscuous mode
  • turn off filtering in the Linux OS

New postings (and welcome Exadata)

Once again, it is time for some blog posts. I find it quite hard to blog even though I really should do more (for various reasons). There is just so much already out there on the internet for what I want to say, and then there is the time it takes to do a reasonably good blog posting. I really respect all who do it regularly while having a full time job and family.

Anyways, I'll be putting together a few series on Oracle Enterprise Manager Grid Control 11g, Oracle 11gR2 DB (including RAC), and Oracle Exadata.

We've just gotten our first Exadata X2-2 full rack HC machine. This is one of four to be delivered, and it has been quite a ride. It has been over a year of decision making involving a POC, an RFP, finalizing the decision, and ordering. I'll try to document as best as I can, keeping the NDA in mind. Many thanks must go out to all the various bloggers and pioneers (some of whom I've had the good fortune to meet) of Exadata v1, v2, and X2-2. Without your postings and information, this would have been a much larger ordeal.

The Importance of Security
A colleague of mine was so excited he forgot all about our campus security policy and took a picture of the delivery truck. I found this a bit much since he had attended OpenWorld 2010, where he saw the X2-2, X2-8, and Exalogic machines up close and took pictures, and since it was just the truck with no view of the actual machine, but I can't really blame him. In any case, his phone was quickly confiscated and the picture removed due to our strict no-pictures-on-site policy! He was written up in the guard's report, though I'm sure no harm will come of it (or so we hope), and although I had not taken any pictures I was guilty by proximity, so my phone was also confiscated and checked as a precaution. He is planning to request access to the data center for a second attempt, this time authorized (and with no pictures). I wish him luck!