
Thursday, February 16, 2012

Using udev to configure disks for ASM in Linux

I've never been the biggest fan of Oracle ASMLib. While it is far easier to configure disks for ASM using ASMLib than with udev or multipath, it also adds an additional rpm requirement, an additional layer in the stack, and a dependency on the kernel version. An internet search will reveal numerous discussions on this very topic, with many preferring not to use ASMLib. However, due to the ease of configuration, strong recommendations from Oracle, and believed performance improvements, I had always used ASMLib. For what it's worth, there does seem to be some truth behind ASMLib having a performance benefit: though I've not personally tested it, I have read blog postings where its removal resulted in greater CPU usage. If you have enough resources this is not a major concern, though over time it may end up being one.

My stance, and I would think that of a lot of others, has since changed with the recent developments in Oracle's competition with Red Hat. To be more specific, MOS note "Oracle ASMLib Software Update Policy for Red Hat Enterprise Linux Supported by Red Hat [ID 1089399.1]", the quote of interest being:

"For RHEL6, Oracle will only provide ASMLib software and updates when configured with a kernel distributed by Oracle. Oracle will not provide ASMLib packages for kernels distributed by Red Hat as part of RHEL6. ASMLib updates will be delivered via Unbreakable Linux Network(ULN) which is available to customers with Oracle Linux support. ULN works with both Oracle Linux or Red Hat Linux installations, but ASMlib usage will require replacing any Red Hat kernel with a kernel provided by Oracle."

With this note, I've since been devoting time to updating my Oracle database standards for Linux, i.e. SLES and RHEL, since we are not an Oracle Linux shop and have no plans to pursue such a course (nothing personal Oracle, it's just business). So I began work on what I thought was simply making a few modifications to use udev instead of ASMLib, following which I'd do due diligence to test it out on our standard versions of SLES and RHEL. I've used udev before on RHEL, so it should prove simple enough, right?

The Case of the Missing SCSI ID
I was so wrong. When I tried to obtain the SCSI ID for the disks during my testing, nothing was returned. I spent some time researching but came up empty, until I realized I was not searching correctly. You see, we now run most of our Linux environments on VMware vSphere, and I was not using this in my search terms. Correct input and bingo! It seems there is a known issue where, by default, VMware does not expose SCSI IDs for the disks. To correct this you will need to:
  1. Shutdown your VM guest
  2. In vCenter, right click your VM guest in the LHS pane and select 'Edit Settings...' (can also be obtained directly via 'Summary' tab)
  3. Click the 'Options' tab
  4. Select the 'Advanced -> General' navigation item on the left side and click the 'Configuration Parameters...' button displayed on the right
  5. Click the 'Add Row' button
  6. Add the name 'disk.EnabledUUID' with a value of 'TRUE' (no quotes of any kind) and click 'OK' all the way through the screens to save
  7. Restart your VM guest and enjoy receiving SCSI IDs
An alternate method is to use a text editor and add the entry disk.EnabledUUID="TRUE" to your VM guest's VMX file and restart your VM guest.

UDEV Steps
Now onwards with the steps for udev configuration!

1. Add the "options=-g" line to the /etc/scsi_id.config file as the root user.

2. Obtain the SCSI ID for your disks as the root user:

/sbin/scsi_id -g -s /block/{sd_device}

3. Create a udev rules file in /etc/udev/rules.d directory as the root user with entries similar to the below:

vi /etc/udev/rules.d/99-udev-oracle.rules
# ####################################################
# FILE: 99-udev-oracle.rules
# DESC: UDEV rules file for Oracle ASM functionality.
#       Should be placed under /etc/udev/rules.d
# ####################################################
# DATA disks
KERNEL=="sd*",BUS=="scsi",ENV{ID_SERIAL}=="{scsi_id}", NAME="asmdisk1", OWNER="oracle", GROUP="oinstall", MODE="660"
# FRA disks
KERNEL=="sd*",BUS=="scsi",ENV{ID_SERIAL}=="{scsi_id}", NAME="asmdisk2", OWNER="oracle", GROUP="oinstall", MODE="660"
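If you have several disks, you can generate these rule lines rather than hand-typing them. A sketch (the SCSI IDs shown are made-up placeholders; substitute the values returned in step 2):

```shell
# Sketch: build the udev rule lines from your own disk -> SCSI ID pairs.
# The SCSI IDs below are invented placeholders; use the real values from
# '/sbin/scsi_id -g -s /block/{sd_device}'.
emit_rule() {
  # $1 = SCSI ID (ID_SERIAL), $2 = ASM device name to create
  printf 'KERNEL=="sd*",BUS=="scsi",ENV{ID_SERIAL}=="%s", NAME="%s", OWNER="oracle", GROUP="oinstall", MODE="660"\n' "$1" "$2"
}
emit_rule "36000c29aaaaaaaaaaaaaaaaaaaaaaaaa" "asmdisk1"   # DATA disk
emit_rule "36000c29bbbbbbbbbbbbbbbbbbbbbbbbb" "asmdisk2"   # FRA disk
```

Redirect the output into the rules file, then review it before reloading udev.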

4. Stop and start udev as the root user (the ‘boot.udev’ script below is the SLES location; on RHEL use ‘start_udev’):

/etc/init.d/boot.udev stop
/etc/init.d/boot.udev start

The SCSI devices can now be accessed by ASM, and you can set your ASM_DISKSTRING parameter to be "/dev/asmdisk*".

Monday, February 14, 2011

New postings (and welcome Exadata)

Once again, it is time for some blog posts. I find it quite hard to blog even though I really should do more (for various reasons). There is just so much already out there on the internet for what I want to say, and then there is the time it takes to do a reasonably good blog posting. I really respect all who do it regularly while having a full time job and family.

Anyways, I'll be putting together a few series on Oracle Enterprise Manager Grid Control 11g, Oracle 11gR2 DB including RAC, and Oracle Exadata.

We've just gotten our first Exadata X2-2 full rack HC machine. This is one of four to be delivered, and it has been quite a ride: over a year of decision making involving a POC, an RFP, finalizing the choice, and ordering. I'll try to document as best as I can, keeping the NDA in mind. Many thanks must go out to all the various bloggers and pioneers (some of whom I've had the good fortune to meet) of Exadata v1, v2, and X2-2. Without your postings and information, this would have been a much larger ordeal.

The Importance of Security
A colleague of mine was so excited he forgot all about our campus security policy and took a picture of the delivery truck. I found this a bit much, since he attended OpenWorld 2010 and saw the X2-2, X2-8, and Exalogic machines up close (and took pictures), and since it was just the truck with no view of the actual machine, but I can't really blame him. In any case, his phone was quickly confiscated and the picture removed due to our strict no-pictures-on-site policy! He was written up in the guard's report, though I'm sure no harm will come from it (or so we hope). Although I had not taken any pictures, I was guilty by proximity, and my phone was also confiscated and checked as a precaution. He is planning to request access to the data center for a second attempt, though this time authorized (and with no pictures). I wish him luck!

Wednesday, December 08, 2010

Oracle11gR2 RAC: Removing Database Nodes from an Existing Cluster

Phase I - Remove the node from RAC database

Policy-Managed

1. Remove DB Console from the node by running the following from another node in the cluster (as ‘oracle’ user):

$> $RDBMS_HOME/bin/emca -deleteNode db


2. For a Policy-Managed database, a possible method is to decrease the size of the server pool and relocate the node to the Free pool (assuming all other pool constraints are met). For the relocation to work there should be no active sessions on the associated instance. You can either use the ‘-f’ flag to force relocation, or shut down the service on the instance prior to running the command:

$> $RDBMS_HOME/bin/srvctl stop instance -d <db_unique_name> -n <node_name>
$> $RDBMS_HOME/bin/srvctl relocate server -n <node_name> -g Free



Admin-Managed

1. For Admin-Managed databases, ensure the instance to be removed is not a PREFERRED or AVAILABLE instance for any Services (i.e. modify Services to exclude the instance to be removed).


2. Remove the instance using ‘$RDBMS_HOME/bin/dbca’, running the command from a node not being removed (as ‘oracle’ user):

$> $RDBMS_HOME/bin/dbca -silent -deleteInstance -nodeList <node_name> -gdbName <global_db_name> -instanceName <instance_name> -sysDBAUserName sys -sysDBAPassword <sys_password>


3. Disable and stop any listeners running on the node (as ‘oracle’ user on any node):

$> $RDBMS_HOME/bin/srvctl disable listener -l <listener_name> -n <node_name>
$> $RDBMS_HOME/bin/srvctl stop listener -l <listener_name> -n <node_name>


4. Update the inventory on the node to be removed (run from the node being removed as ‘oracle’ user):

$> $RDBMS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$RDBMS_HOME "CLUSTER_NODES={<node_being_removed>}" -local

5. Deinstall the Oracle home, from the node being removed (as ‘oracle’ user):

$> $RDBMS_HOME/deinstall/deinstall -local


6. From any of the existing nodes run the following to update the inventory with the list of the remaining nodes (as ‘oracle’ user):

$> $RDBMS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$RDBMS_HOME "CLUSTER_NODES={<node1,...,nodeX>}"


Phase II - Remove node from Clusterware

7. Check if the node is active and unpinned (as ‘root’ user):

#> $GI_HOME/bin/olsnodes -s -t

Note: The node will only be pinned if using CTSS, or if running pre-11.2 database versions.

8. Disable the Clusterware applications and daemons on the node to be removed. Use the ‘-lastnode’ option when running on the last node in the cluster to be removed (as ‘root’ user):

#> $GI_HOME/crs/install/rootcrs.pl -deconfig -force [-lastnode]


9. From any node not being removed, delete the node from Clusterware (as ‘root’ user):

#> $GI_HOME/bin/crsctl delete node -n <node_name>


10. As ‘grid’ user from node being removed:

$> $GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$GI_HOME "CLUSTER_NODES={<node_being_removed>}" CRS=TRUE -local

11. Deinstall Clusterware software from the node:

$> $GI_HOME/deinstall/deinstall -local

ISSUE:
Running ‘$GI_HOME/deinstall/deinstall -local’ results in questions about other nodes, and prompts for running scripts against the other nodes. DO NOT RUN THEM!! This process deconfigures CRS on all nodes!

ROOT CAUSE:
Incorrect documentation: the ‘-updateNodeList’ command is missing before the deinstall step.

RESOLUTION:
The correct order (as already given in this document) should be as follows:

$GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/oracle/app/11.2.0/grid "CLUSTER_NODES={<node_being_removed>}" CRS=TRUE -local
$GI_HOME/deinstall/deinstall -local


12. From any existing node to remain, update the Clusterware with the remaining nodes (as ‘grid’ user):

$> $GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$GI_HOME "CLUSTER_NODES={<node1,...,nodeX>}" CRS=TRUE


13. Verify the node has been removed and the remaining nodes are valid:

$GI_HOME/bin/cluvfy stage -post nodedel -n <node_list> -verbose


14. Remove OCM host/configuration from the MOS portal.

Oracle11gR2 RAC: FAQ

Can I change a node’s hostname?

Yes, however, the node must be removed and added back to the cluster with the new name.

How do I define a service for a Policy-Managed Database?

When you define services for a policy-managed database, you define the service to a server pool where the database is running. You can define the service as either UNIFORM (running on all instances in the server pool) or SINGLETON (running on only one instance in the server pool). For SINGLETON services, Oracle RAC chooses on which instance in the server pool the service is active. If that instance fails, then the service fails over to another instance in the server pool. A service can only run in one server pool.

Services for administrator-managed databases continue to be defined by the PREFERRED and AVAILABLE definitions.

How do I convert from a Policy-Managed Database to Administrator-Managed Database?

You cannot directly convert a policy-managed database to an administrator-managed database. Instead, you can remove the policy-managed configuration using the 'srvctl remove database' and 'srvctl remove service' commands, and then create a new administrator-managed database with the 'srvctl add database' command.

What is Grid Plug and Play (GPnP)?

Grid Plug and Play (GPnP) eliminates per-node configuration data and the need for explicit add and delete node steps. This allows a system administrator to take a template system image and run it on a new node with no further configuration. This removes many manual operations, reduces the opportunity for errors, and encourages configurations that can be changed easily. Removal of the per-node configuration makes the nodes easier to replace, because they do not need to contain individually-managed state.

Grid Plug and Play reduces the cost of installing, configuring, and managing database nodes by making their per-node state disposable. It allows nodes to be easily replaced with regenerated state.

What is a Server Pool?

Server pools enable the cluster administrator to create a policy which defines how Oracle Clusterware allocates resources. An Oracle RAC policy-managed database runs in a server pool. Oracle Clusterware attempts to keep the required number of servers in the server pool and, therefore, the required number of instances of the Oracle RAC database. A server can be in only one server pool at any time. However, a database can run in multiple server pools. Cluster-managed services run in a server pool where they are defined as either UNIFORM (active on all instances in the server pool) or SINGLETON (active on only one instance in the server pool).

How are redo and undo handled when instances are added?

You should create redo log groups only if you are using administrator-managed databases. For policy-managed databases, increase the cardinality; when the instance starts, if you are using Oracle Managed Files and Oracle ASM, then Oracle automatically allocates the thread, redo, and undo.

If you remove an instance from your Oracle RAC database, then you should disable the instance’s thread of redo so that Oracle does not have to check the thread during database recovery.

For policy-managed databases, Oracle automatically allocates the undo tablespace when the instance starts if you have OMF enabled.

What is Run-Time Connection Load Balancing?

The run-time connection load balancing feature enables routing of work requests to an instance that offers the best performance, minimizing the need to relocate work. To enable and use run-time connection load balancing, the connection goal must be set to SHORT and either of the following service-level goals must be set:

· SERVICE_TIME—The Load Balancing Advisory attempts to direct work requests to instances according to their response time. Load Balancing Advisory data is based on the elapsed time for work done by connections using the service, as well as available bandwidth to the service. This goal is best suited for workloads that require varying lengths of time to complete, for example, an internet shopping system.

· THROUGHPUT—The Load Balancing Advisory measures the percentage of the total response time that the CPU consumes for the service. This measures the efficiency of an instance, rather than the response time. This goal is best suited for workloads where each work request completes in a similar amount of time, for example, a trading system.

Client-side load balancing balances the connection requests across the listeners by setting the ‘LOAD_BALANCE=ON’ directive. When you set this parameter to ON, Oracle Database randomly selects an address in the address list and connects to that node's listener. This balances client connections across the available SCAN listeners in the cluster. When clients connect using SCAN, Oracle Net automatically load balances client connection requests across the three IP addresses you defined for the SCAN, unless you are using EZConnect.
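As an illustrative sketch (the alias, host, and service names are made up), a client-side tnsnames.ora entry using the LOAD_BALANCE directive might look like:

```
MYDB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = racnode1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = racnode2-vip)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = myservice))
  )
```

With LOAD_BALANCE=ON, Oracle Net picks one of the addresses at random for each new connection.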

What are the different types of Server-Side Connection Load Balancing?

With server-side load balancing, the SCAN listener directs a connection request to the best instance currently providing the service by using the load balancing advisory. The two types of connection load balancing are:

· SHORT—Connections are distributed across instances based on the amount of time that the service is used. Use the SHORT connection load balancing goal for applications that have connections of brief duration. When using connection pools that are integrated with FAN, set the connection load balancing goal to SHORT. SHORT tells the listener to use CPU-based statistics.

· LONG—Connections are distributed across instances based on the number of sessions in each instance, for each instance that supports the service. Use the LONG connection load balancing goal for applications that have connections of long duration. This is typical for connection pools and SQL*Forms sessions. LONG is the default connection load balancing goal, and tells the listener to use session-based statistics.
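The LONG goal's session-count balancing can be illustrated with a toy sketch (this is the idea only, not Oracle's implementation): each new connection goes to the instance currently holding the fewest sessions.

```shell
# Toy illustration of LONG-style balancing: pick the instance with the
# fewest sessions. Arguments are session counts for instances 1..N;
# prints the 1-based index of the minimum.
pick_instance() {
  best=1; bestval=$1; i=1
  for v in "$@"; do
    if [ "$v" -lt "$bestval" ]; then best=$i; bestval=$v; fi
    i=$((i+1))
  done
  echo $best
}
pick_instance 120 85 97   # prints 2: instance 2 has the fewest sessions
```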

How do I enable the Load Balancing Advisory (LBA)?

To enable the load balancing advisory, use the ‘-B’ option when creating or modifying the service using the ‘srvctl’ command.

How does the database register with the Listener?

When a listener starts after the Oracle instance starts, and the listener is listed for service registration, registration does not occur until the next time the Oracle Database process monitor (PMON) discovery routine starts. By default, PMON discovery occurs every 60 seconds.

To override the 60-second delay, use the SQL ‘ALTER SYSTEM REGISTER’ statement. This statement forces the PMON process to register the service immediately.

If you run this statement while the listener is up and the instance is already registered, or while the listener is down, then the statement has no effect.

Can I configure both connection failure notification and Fast Connection Failover (FCF) with the Universal Connection Pool (UCP)?

Connection failure notification is redundant with Fast Connection Failover (FCF) as implemented by the UCP. You should not configure both within the same application.

Should I configure Transparent Application Failover (TAF) in my service definition if using Fast Connection Failover (FCF)?

Do not configure Transparent Application Failover (TAF) with Fast Connection Failover (FCF) for JDBC clients as TAF processing will interfere with FAN ONS processing.

Can I use Fast Connection Failover (FCF) and Transparent Application Failover (TAF) together?

No. Only one of them should be used at a time.

What is the status of Fast Connection Failover (FCF) with Universal Connection Pool (UCP)?

FCF as delivered with the Implicit Connection Cache is deprecated, along with the Implicit Connection Cache itself, in favor of the Universal Connection Pool (UCP) for JDBC, which provides its own FCF implementation.

Does Fast Connection Failover (FCF) support planned outages?

FCF does not support planned outages like service relocation (reference Doc ID: 1076130.1). It is designed to work for unplanned outages, where a RAC service is preferred on all the nodes in the cluster and one of the nodes goes down unexpectedly. When a planned outage like a service relocation is done from one node to the other, FCF does not work as expected and the result is unpredictable. There is no solution at present for this. Enhancement request 9495973 has been raised to address this limitation.

Should I use the JDBC Thin driver or the JDBC OCI driver?

The Oracle Thin JDBC driver is usually preferred by application developers because it is cross-platform and has no external dependencies. However, some applications require the high-performance, native C-language-based Oracle Call Interface (OCI) driver. This driver is compatible with FCF and can alternatively use Transparent Application Failover (TAF), which operates at a lower level than FCF and can automatically resubmit SELECT queries in the event of a node failure. However, for most applications, the ease of deployment of the Thin driver with full FCF support will outweigh any benefits offered by the OCI driver.

How do I subscribe to HA Events?

If you are using a client that uses Oracle Streams Advanced Queuing, such as OCI and ODP.NET clients, to receive FAN events, you must enable the service used by that client to access the alert notification queue by using the ‘-q’ option via the ‘srvctl’ command.

FAN events are published using ONS and Oracle Streams Advanced Queuing. The service metrics received from the Oracle RAC load balancing advisory through FAN events for the service are automatically placed in the Oracle Streams AQ queue table, ALERT_QUEUE.

Use the following query against the internal queue table for load balancing advisory FAN events to monitor load balancing advisory events generated for an instance:

SET PAGES 60 COLSEP '|' LINES 132 NUM 8 VERIFY OFF FEEDBACK OFF
COLUMN user_data HEADING "AQ Service Metrics" FORMAT A60 WRAP
BREAK ON service_name SKIP 1
SELECT TO_CHAR(enq_time, 'HH:MI:SS') enq_time, user_data
FROM sys.sys$service_metrics_tab
ORDER BY 1;

What is Connection Affinity?

Connection affinity is a performance feature that allows a connection pool to select connections that are directed at a specific Oracle RAC instance. The pool uses run-time connection load balancing (if configured) to select an Oracle RAC instance to create the first connection and then subsequent connections are created with an affinity to the same instance.

What types of affinity does Universal Connection Pool (UCP) support?

UCP JDBC connection pools support two types of connection affinity: transaction-based affinity and Web session affinity.

What is Transaction-Based Affinity?

Transaction-based affinity is an affinity to an Oracle RAC instance that can be released by either the client application or a failure event. Applications typically use this type of affinity when long-lived affinity to an Oracle RAC instance is desired or when the cost (in terms of performance) of being redirected to a new Oracle RAC instance is high. Distributed transactions are a good example of transaction-based affinity. XA connections that are enlisted in a distributed transaction keep an affinity to the Oracle RAC instance for the duration of the transaction. In this case, an application would incur a significant performance cost if a connection is redirected to a different Oracle RAC instance during the distributed transaction.

Transaction-based affinity is strictly scoped between the application/middle-tier and UCP for JDBC; therefore, transaction-based affinity only requires that the setFastConnectionFailoverEnabled property be set to true and does not require complete FCF configuration. In addition, transaction-based affinity does not technically require run-time connection load balancing. However, it can help with performance and is usually enabled regardless. If run-time connection load balancing is not enabled, the connection pool randomly picks connections.

What is Web Session Affinity?

Web session affinity is an affinity to an Oracle RAC instance that can be released by either the instance, a client application, or a failure event. The Oracle RAC instance uses a hint to communicate to a connection pool whether affinity has been enabled or disabled on the instance. An Oracle RAC instance may disable affinity based on many factors, such as performance or load. If an Oracle RAC instance can no longer support affinity, the connections in the pool are refreshed to use a new instance and affinity is established once again.

Applications typically use this type of affinity when short-lived affinity to an Oracle RAC instance is expected or if the cost (in terms of performance) of being redirected to a new Oracle RAC instance is minimal. For example, a mail client session might use Web session affinity to an Oracle RAC instance to increase performance and is relatively unaffected if a connection is redirected to a different instance.

What is recommended for WebLogic Server?

Oracle recommends using WebLogic JDBC multi data sources to handle failover. While connect-time failover does not provide the ability to pre-create connections to alternate Oracle RAC nodes, multi data sources have multiple connections available at all times to handle failover.

Transparent Application Failover (TAF) is not supported for any WLS data source. TAF, as delivered via JDBC is currently not transparent. It is documented to affect some ongoing query results and PreparedStatements in unpredictable and unrecoverable ways. TAF JDBC requires specific recovery code at the application level and affects the integrity of statements that WebLogic might be caching.

Do I still need to backup my Oracle Cluster Registry (OCR) and Voting Disks?

You no longer have to back up the voting disk. The voting disk data is automatically backed up in OCR as part of any configuration change and is automatically restored to any voting disk added. If all voting disks are corrupted, however, you can restore them.

Oracle Clusterware automatically creates OCR backups every four hours. At any one time, Oracle Database always retains the last three backup copies of OCR. The CRSD process that creates the backups also creates and retains an OCR backup for each full day and at the end of each week. You cannot customize the backup frequencies or the number of files that Oracle Database retains.

How is DBMS_JOB functionality affected by RAC?

DBMS jobs can be set to run either on the database (i.e. any active instance) or on a specific instance.

What is PARALLEL_FORCE_LOCAL?

By default, the parallel server processes selected to execute a SQL statement can operate on any or all Oracle RAC nodes in the cluster. By setting PARALLEL_FORCE_LOCAL to TRUE, the parallel server processes are restricted to just one node, the node where the query coordinator resides (the node on which the SQL statement was executed). However, in 11.2.0.1, when this parameter is set to TRUE the parallel degree calculations are not adjusted correctly to consider only the CPU_COUNT of a single node; the parallel degree will be calculated based on the RAC-wide CPU_COUNT and not the single-node CPU_COUNT. Due to bug 9671271, it is not recommended that you set PARALLEL_FORCE_LOCAL to TRUE in 11.2.0.1; instead you should set up a RAC service to limit where parallel statements can execute.

What is the Service Management Policy?

The service management policy controls whether a service starts automatically when the database starts. When you use automatic services in an administrator-managed database, during planned database startup services may start on the first instances to start rather than on their preferred instances. Prior to Oracle RAC 11g release 2 (11.2), all services worked as though they were defined with a manual management policy.

Why does my user appear across all nodes when querying GV$SESSION when my service does not span all nodes?

The problem is that you are querying GV$SESSION as the ABC user, and this causes the "strange" behaviour. If you query gv$session, 2 parallel servers are spawned to query v$session on each node, and this happens as the same user. Hence, when you query gv$session as ABC, you are seeing 3 sessions (one real, plus 2 parallel slaves querying v$session on each instance). The reason you are seeing 1 on one node and 3 on the other is the order in which the parallel processes query v$session. Use the sys (or any other) user to query the sessions of ABC and you will not see this behaviour.

How does Clusterware start up with the OCR and Voting Disks in ASM?

The startup sequence has been replaced with a 2-phased, optimized approach:

Phase I

· OHASD will startup "local" resources first.

· CSSD uses the GPnP profile, which stores the location of the voting disks, so there is no need to access ASM (voting disks are stored differently within ASM than other files, so their location is known).

Simultaneously,

· ORAAGENT starts up and the ASM instance is started (a subset of the OCR information is stored in the OLR, enough to start up local resources), and ORAROOTAGENT starts CRSD.

So the 1st phase of Clusterware startup is to essentially start up local resources.

Phase II

· At this point ASM and full OCR information is available and the node is "joined" to cluster.

What is the Oracle Database Quality of Service Management?

Oracle Database QoS Management is an automated, policy-based product that monitors the workload requests for an entire system. Oracle Database QoS Management manages the resources that are shared across applications and adjusts the system configuration to keep the applications running at the performance levels needed by your business. Oracle Database QoS Management responds gracefully to changes in system configuration and demand, thus avoiding additional oscillations in the performance levels of your applications. If you use Oracle Database Quality of Service Management (Oracle Database QoS Management), then you cannot have SINGLETON services in a server pool, unless the maximum size of that server pool is one.

Is a re-link required for the Clusterware home after an OS upgrade?

In 11.2, there are some executables in the Grid home that can and should be re-linked after an OS upgrade. The procedure is as follows. As the ‘root’ user:


#> cd $GI_HOME/crs/install
#> perl rootcrs.pl -unlock

As the grid infrastructure for a cluster owner:

$> export ORACLE_HOME=<GI_HOME>
$> $ORACLE_HOME/bin/relink

As ‘root’ again, re-lock the Grid home:

#> cd $GI_HOME/crs/install
#> perl rootcrs.pl -patch

How do I determine the “Master” node?

For the cluster synchronization service (CSS), the master can be found by searching $GI_HOME/log/<hostname>/cssd/ocssd.log. For the master of an enqueue resource with Oracle RAC, you can select from v$ges_resource, which has a MASTER_NODE column.

What are the different types of failover mechanisms available?

· JDBC-THIN driver supports Fast Connection Failover (FCF)

· JDBC-OCI driver supports Transparent Application Failover (TAF)

· JDBC-THIN 11gR2 supports Single Client Access Name (SCAN)

What is recommendation on type of tablespaces?

You should use locally managed, auto-allocate tablespaces. With auto-allocate, Oracle automatically grows the size of the extent depending on segment size, available free space in the tablespace, and other factors. The extent size of a segment starts at 64 KB, grows to 1 MB when the segment grows past 1 MB, and to 8 MB once the segment size exceeds 64 MB. So for a large table the extent size will automatically grow to be large. The use of uniform extents is strongly discouraged for two reasons: space wastage and the impact that wasted space has on scan performance.
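The growth thresholds above can be sketched as a tiny shell function (a simplified model for illustration, not Oracle's exact algorithm):

```shell
# Simplified model of auto-allocate extent sizing: 64 KB extents for small
# segments, 1 MB extents past 1 MB, 8 MB extents once the segment exceeds
# 64 MB. Boundary handling here is approximate.
extent_kb() {
  seg_mb=$1                          # current segment size in MB
  if   [ "$seg_mb" -lt 1 ];  then echo 64
  elif [ "$seg_mb" -le 64 ]; then echo 1024
  else                            echo 8192
  fi
}
extent_kb 0      # prints 64:   small segment, 64 KB extents
extent_kb 32     # prints 1024: mid-size segment, 1 MB extents
extent_kb 100    # prints 8192: large segment, 8 MB extents
```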

For large partitioned objects you should use multiple big file tablespaces to avoid file header block contention during parallel load operations. File header block contention appears as the ‘gc buffer busy’ enqueue wait event in an AWR report. Checking the buffer wait statistic will indicate if it is the file header block that is being contended for.

To evenly distribute a partitioned table among multiple big file tablespaces use the STORE IN clause.

What is the recommendation on column statistics?

Prior to loading any data it is advisable to run all queries against the empty tables to populate or seed the column usage statistics. Column usage statistics are used during optimizer statistics gathering to automatically determine which columns require histograms and the number of buckets that will be used. A column is a candidate for a histogram if it has been seen in a where clause predicate e.g. an equality, range, LIKE, etc. and if there is data skew in that column.

How do I size hash partitions?

Oracle uses a linear hashing algorithm to create (sub)partitions. To ensure that the data gets evenly distributed among the hash partitions, the number of hash partitions should be a power of 2 (e.g. 2 * the number of CPUs, rounded to a power of 2). In addition, each hash partition should be at least 16 MB in size; any smaller and they will not have efficient scan rates with parallel query. If the partitions come out too small, consider using a smaller number of partitions (still a power of 2).
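The power-of-2 recommendation can be seen with a toy linear-hashing simulation (an illustration of linear hashing in general, not Oracle's actual hash function): with 6 partitions, two of them receive twice as many rows as the others.

```shell
# Toy linear hashing: keys map to (k mod 8); residues >= N fall back to
# (k mod 4). With N=6 partitions and 6000 uniform keys, buckets 2 and 3
# each get 1500 rows while the rest get 750 - a 2x skew.
dist=$(awk 'BEGIN {
  N = 6; hi = 8; lo = 4              # hi/lo = surrounding powers of 2
  for (k = 0; k < 6000; k++) {       # 6000 uniformly distributed keys
    b = k % hi; if (b >= N) b = k % lo
    c[b]++
  }
  for (i = 0; i < N; i++) printf "bucket %d: %d\n", i, c[i]
}')
echo "$dist"
```

With a power-of-2 count (say N=8), every bucket would receive the same share.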

What should be my block size?

8 KB is the default block size and is the block size used during all of Oracle's testing. Typically this is good enough for a data warehouse and transactional systems (good compromise or sweet spot). By doubling the default block size you can increase the chances of getting a good compression rate as Oracle applies data compression at the block level. The more rows in the block the greater the chance Oracle will find duplicate values within a block. (Reference: Oracle Sun Database Machine Application Best Practices for Data Warehousing, Doc ID 1094934.1)

What is the guideline on how to auto-extend data files?

When configuring a file to auto extend, the size of the extension should cover all disks in the ASM disk group to optimize balance. For example, with a 4 MB AU size and 128 disks, the size of the extension should be a multiple of 512MB (4*128).
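Using the numbers from the example above (4 MB AU, 128 disks), the increment works out as:

```shell
# Autoextend increment guideline: a multiple of (ASM AU size * number of disks).
au_mb=4        # ASM allocation unit size in MB
disks=128      # number of disks in the ASM disk group
increment_mb=$((au_mb * disks))
echo "autoextend NEXT size should be a multiple of ${increment_mb} MB"
```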

Monday, September 13, 2010

Sessions per Instance using GV$SESSION

I've been doing some testing around Oracle 11.2 RAC and decided to share a few findings, so I'll be trying to document them in my blog, which I've been too busy to update. I have to give my sincerest thanks and appreciation to all the bloggers who update regularly; I've no idea how they find the time, but I appreciate their efforts as I've found gems of information which have really helped me.

One of the first things I've seen with 11.2 RAC is that the information, blogs included, is a bit wanting... there simply is not a lot of good, useful, or factual information; even the Oracle documentation is light or inaccurate. Yes, I have given feedback on the inaccuracies and they should be corrected, so hopefully you won't even notice or see any evidence. This contrasts with 11.1 and previous versions, which have lots of blogs and good, useful information. To me this indicates adoption is still a work in progress for this version, which makes sense since it does bring a lot to the table which needs to be thoroughly tested.

A little trinket of, hopefully, useful information which I found on the OTN forums (credit to Sebastian), and which I had not previously realized but had puzzled me during my 11.2 RAC testing:

Say you have a query to get the number of sessions of a particular user across all the nodes (for example to see how load balancing is working):

SELECT inst_id, username, COUNT(*)
FROM gv$session
WHERE username = 'ABC'
GROUP BY inst_id, username;


This results in "strange" behavior if you run the query as that same user ABC: two parallel slaves are spawned to query the V$SESSION view on each node, and they run as the same user, i.e. ABC. Hence you will see 3 sessions (1 real and 2 parallel slaves) querying V$SESSION. The reason you will see 1 on one node and 3 on the other is the order in which the parallel processes query V$SESSION.

If you use another user to query the sessions for ABC you will not see this problem and will get an accurate per-instance session count.

Hope that explains it.

Thursday, July 08, 2010

Oracle Cluster Health Monitor (formerly IPD/OS)

Oracle Cluster Health Monitor (formerly known as Instantaneous Problem Detector for Clusters or IPD/OS) is designed to detect and analyze operating system (OS) and cluster resource related degradation and failures in order to bring more explanatory power to many issues that occur in clusters where Oracle Clusterware and Oracle RAC are running, such as node eviction.

It continuously tracks OS resource consumption at the node, process, and device level, and collects and analyzes the cluster-wide data. In real-time mode, an alert is shown when thresholds are hit (i.e. you need to be looking at the screen in GUI mode). For root cause analysis, historical data can be replayed to understand what was happening at the time of failure.

I had a bit of trouble during the initial installation since it was not made clear in the README that you MUST unzip and run the subsequent installation process from the home directory of the user doing the installation. I forgot this during re-installation as well, which led me to spend quite a lot of time re-learning this important fact; hence I'm documenting it here for reference.

Oracle Enterprise Manager (OEM) Grid Control, and other tools, provide similar functionality, but I've found that Cluster Health Monitor (CHM) is better in terms of real-time analysis, monitoring and its playback functionality. It is also free, as in no charge, and very easy to setup and maintain.

My environment:

  • 4-node Oracle 11.2.0.1 RAC
  • Hostnames: ucstst11, ucstst12, ucstst13, ucstst14
  • OS: Oracle Enterprise Linux 5u4 x64

Installation Steps
1. Create 'crfuser' on all nodes in cluster - This can be any user except root; I do recommend, however, using a user separate from the regular 'oracle' user for separation of duties, perhaps a standard user used for monitoring tools. The user's group does not have to be oinstall; I just use it here to show that this user also owns 'Oracle' software and will be an administrator.

#> useradd -d /opt/crfuser -m -s /bin/ksh -g oinstall crfuser


2. Setup password-less SSH user equivalence among all the users across the nodes - I won't go into this here since there are numerous sites describing how to set this up (including one of my own, I believe).

Note: If you have a previous installation of CHM you will need to uninstall it before proceeding with the new installation. On each node where there is a previous installation:

a. Disable CHM

#> /etc/init.d/init.crfd disable

b. Uninstall CHM

#> /usr/lib/oracrf/install/crfinst.pl -d

c. Remove the Berkeley database (BDB), wherever it was installed, and the previous CHM installation home if it still exists.


3. Login as the CHM user ('crfuser') and unzip the install file, which can be obtained from Oracle Technology Network (OTN).

#> su - crfuser
#> unzip crfpack-linux.zip

Ensure you are in the 'crfuser' home directory when running the unzip command, or that the unzip is done to that directory.


4. Run the installer to setup the nodes - This step does not actually do any installation so if anything fails simply remove the files, fix the error, and try again. What this step does is to:

a. Run some checks for SSH access to all nodes
b. Assign a replica node
c. Generate a cluster-wide configuration file
d. Send the installation files across to the other nodes.

To start the process, from any node:

$> cd ~/install
$> ./crfinst.pl -i ucstst11,ucstst12,ucstst13,ucstst14 -b /opt/oracrfdb -m ucstst11 -N RACDB

Replace the node list with those of your own. You will also need to designate one node as the master, i.e. '-m ucstst11'; if you leave out this argument you will be prompted to designate one. By default the cluster will be called 'MyCluster'; I've used '-N RACDB' to give my cluster a name more meaningful to me, i.e. 'RACDB'. I've also specified (using '-b /opt/oracrfdb') that the BDB database which stores all the captured data should be located in '/opt/oracrfdb'. This must be a separate file system or mount point, i.e. not directly under '/', otherwise the installation will complain and exit. I'm unsure where the default location would be, or if there is one, since I never tested that setup; but if you are short on mounts or space you can work around placing it somewhere under '/' by modifying the appropriate line in the 'crfinst.pl' Perl script, or by using a loopback filesystem.


5. Following this, you will then need to run on each node as the root user:

#> /opt/crfuser/install/crfinst.pl -f -b /opt/oracrfdb -N RACDB
#> /etc/init.d/init.crfd enable

Note: The above step will remove the files which were unzipped previously in the 'crfuser' home directory from the 'crfpack-linux.zip' file. So don't be surprised when this happens.


6. Install the GUI - It is not recommended to install the GUI on any of the RAC nodes due to performance (the GUI consumes some resources) and availability (if the node running the GUI goes down) concerns, but in a simple setup this is okay. Ideally you would install only the GUI portion on a remote client machine. To install only the GUI:

a. Unzip the 'crfpack-linux.zip' file again, to the home directory of the 'crfuser' (or whatever user this time around).

$> cd /opt/crfuser
$> unzip crfpack-linux.zip

b. Run the GUI installation

$> mkdir oracrfgui
$> cd install
$> ./crfinst.pl -g /opt/crfuser/oracrfgui


7. Run the GUI. The below command starts the GUI by connecting to the master node, with a one second refresh time.

$> cd ~/oracrf/bin
$> ./crfgui -m ucstst11


Some Usage Tips
Refresh Rate
To modify the GUI refresh rate (default is 1 second) specify the '-r <seconds>' option. The below uses a 5 second refresh rate to update the GUI:

$> ./crfgui -r 5 -m ucstst11


Historical Mode
To do historical analysis specify the '-d <hh>:<mm>:<ss>' option, which will display data from the database from the specified number of hours, minutes, and seconds in the past up to the current time.


Creating a Loopback Filesystem for the BDB location

1. Create the local file on each node, which must be at least 5 GB (5000 MB) per node, i.e. 5000 * N MB; in my case 20000 MB (5000 * 4).

#> dd if=/dev/zero of=/mnt/oracrfdb.fs count=20000 bs=1M

2. Create a file system on the file, using a label of 'ipdosbdb' or whatever is your choice:

#> mke2fs -F -j -L "ipdosbdb" /mnt/oracrfdb.fs

3. Edit the '/etc/fstab' file to ensure the filesystem will be re-mounted following a reboot by adding the following line:

/mnt/oracrfdb.fs /opt/oracrfdb ext3 rw,loop=/dev/loop0 0 0

Substitute an available loopback device for '/dev/loop0' as appropriate.

4. Mount the file system:

#> mount /opt/oracrfdb


Hope this helps. Feedback and comments welcome!

Tuesday, June 13, 2006

To establish user equivalence (SSH) between two servers

To establish user equivalence:

1. Create the directories for SSH:

mkdir ~/.ssh
chmod 700 ~/.ssh


2. Generate the user's public and private keys as the oracle user on each node in turn. Do not enter a passphrase when prompted; just press 'ENTER' three times to accept all defaults.

ssh-keygen -t dsa
ssh-keygen -t rsa


3. Create the authorized_keys file using the public keys from all nodes, copy the file to all nodes, and change the permissions (do this from one node only).

ssh salmon1 'cat ~/.ssh/*.pub' >> ~/.ssh/authorized_keys
ssh salmon2 'cat ~/.ssh/*.pub' >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys salmon1:.ssh/authorized_keys
scp ~/.ssh/authorized_keys salmon2:.ssh/authorized_keys
ssh salmon1 'chmod 600 ~/.ssh/authorized_keys'
ssh salmon2 'chmod 600 ~/.ssh/authorized_keys'


4. Visit each node in turn from every node (including itself). This ensures all nodes are added to each node's known_hosts file, and that cluster operations can be run from any node in the cluster. This also serves to test that the setup is working.

[oracle@salmon1]$ ssh salmon1 'hostname ; date'
[oracle@salmon1]$ ssh salmon2 'hostname ; date'
[oracle@salmon2]$ ssh salmon1 'hostname ; date'
[oracle@salmon2]$ ssh salmon2 'hostname ; date'

Each of the above should return the date without prompting for a password.