Monday, February 14, 2011

Exadata Series - The beginning

As with any story there is always a beginning, and so it is with this Exadata project. I would say it all started around Q1 2010, when... well, let's just say we decided to look at what our next-generation platform would be for our enterprise databases.


The Current Platform

Our existing platform is IBM Power5 with micro-partitions. These servers are very reliable, hold tons of memory, and provide lots of processing power. Although the performance of the Power5 has been eclipsed by Intel's Nehalem, and even further by IBM's Power7 (the current-generation Power), they are even now more than capable of supporting any workload. Without disclosing any major details that may get me into trouble with my current employer, let's just say there were a few such machines, well populated with CPU, RAM, and various I/O and HBA cards, with Capacity on Demand (CoD) utilized when required. That is one of the main benefits of the Power platform IMHO: being able to pay as you grow as well as pay as you need/use.

These machines hosted multiple databases (a few sharing the same partition) for all the various environments (i.e. development through production). Our storage platform was mid-tier (CLARiiON CX3/CX4 class), of which ~80TB served the 20+ production Oracle databases to be migrated to Exadata. Of course, the storage platform was general purpose, so there were petabytes in use for various purposes (including other Oracle databases).

The database workload was no different from anyone else's: a mixture of OLTP, warehousing (reporting), and analytics (the usual 60/40 or 70/30 split).
The database sizes ranged from ~100GB up to 11+TB, with the bulk of them in the mid-GB range. The larger ones were the most business-critical (aren't they always?).

Step 1: Let's test out RAC first

Since we had a fair number of databases and the direction was to consolidate, our thinking was to leverage Real Application Clusters (RAC) with Services to provide some separation between workloads and/or applications, provide scalability, and improve availability. So we decided to first test out RAC using 11.2.0.1 on commodity hardware and see how our workload functions in such an environment. My understanding was that previous tests with 10.1 RAC did not go so well, as our applications were not RAC-friendly.
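To make the consolidation idea concrete, here is a minimal sketch of how services could carve up a consolidated RAC database; the database name (CONSDB), instance names, and service names are hypothetical placeholders for illustration, not our actual configuration:

  # Define one service per application, preferred on one instance and able to
  # fail over to the other, so workloads stay separated under normal operation.
  srvctl add service -d CONSDB -s oltp_app -r CONSDB1 -a CONSDB2
  srvctl add service -d CONSDB -s rpt_app  -r CONSDB2 -a CONSDB1

  # Start the services and confirm where they are running.
  srvctl start service -d CONSDB -s oltp_app
  srvctl start service -d CONSDB -s rpt_app
  srvctl status service -d CONSDB

Applications then connect via the service name rather than an instance-specific SID, which is what makes it easy to move a workload between nodes later on.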

At the time we were finishing up testing on platform 'U' (a relatively new blade server platform) for some other virtualization work, so we decided to extend that hardware loan and re-use it for our RAC testing. The virtualization tests went well, and we had already decided to use the platform going forward, so it was advantageous to minimize platforms (save costs and all that) should we decide to go with a commodity solution. The RAC tests went well, especially the actual platform 'U' testing. There are real advantages to stateless computing, i.e. using server profiles with a SAN-booted OS image and being able to move from one blade server to another with minimal downtime.

Much later on, we ran into some problems, similar to what others have reported, when using 11.2.0.2, due to the multicast requirements for HAIP. I unfortunately did not have time to resolve the problem with the server and networking guys, as I was only given an hour. In any case, this was a non-issue since we simply used 11.2.0.1. In upcoming entries I'll share some interesting performance comparisons for our workload.


Update note:
It seems there are a few workarounds for the multicast issues I faced on platform 'U':

  • apply updated firmware
  • set the virtual I/O card to promiscuous mode
  • turn off filtering in the Linux OS
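
Regardless of which workaround applies, it is worth verifying multicast on the private interconnect before attempting an 11.2.0.2 install. A rough sketch using Oracle's mcasttest.pl script (from the My Oracle Support note on the 11.2.0.2 multicast requirement, 1212703.1 if I recall correctly); the node and interface names below are placeholders for illustration:

  # Check that multicast traffic flows between the cluster nodes over the
  # private interconnect interface(s); the script exercises both the
  # 230.0.1.0 and 224.0.0.251 multicast groups used by HAIP.
  perl mcasttest.pl -n racnode1,racnode2 -i eth1

If either group fails, that points back at the firmware/filtering items above rather than at the Grid Infrastructure software itself.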
