The Lazy Tech's Blog: 2010

Wednesday, December 08, 2010

Oracle11gR2 RAC: Removing Database Nodes from an Existing Cluster

Phase I - Remove the node from RAC database

Policy-Managed

1. Remove DB Console from the node by running the following from another node in the cluster (as ‘oracle’ user):

$> $RDBMS_HOME/bin/emca -deleteNode db

2. For Policy-Managed database a possible method is to decrease the size of the server pool, and relocate the node to the free pool (assuming all other maximums are met). For relocation to work there should be no active sessions on the associated instance. You can either use the ‘-f’ flag to force relocation, or shutdown the service on the instance prior to running the command:

$> $RDBMS_HOME/bin/srvctl stop instance -d -n
$> $RDBMS_HOME/bin/srvctl relocate server -n -g Free

Admin-Managed

1. For Admin-Managed databases, ensure the instance to be removed is not a PREFERRED or AVAILABLE instance for any Services (i.e. modify Services to exclude the instance to be removed).

2. Remove the instance using ‘$RDBMS_HOME/bin/dbca’, running the command from a node not being removed (as ‘oracle’ user):

$> $RDBMS_HOME/bin/dbca -silent -deleteInstance -nodeList -gdbName -instanceName -sysDBAUserName sys -sysDBAPassword

3. Disable and stop any listeners running on the node (as ‘oracle’ user on any node):

$> $RDBMS_HOME/bin/srvctl disable listener -l -n
$> $RDBMS_HOME/bin/srvctl stop listener -l -n

4. Update the inventory on the node to be removed (run from the node being removed as ‘oracle’ user):

$> $RDBMS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME= “CLUSTER_NODES={[oldnode]}” -local

5. Deinstall the Oracle home, from the node being removed (as ‘oracle’ user):

$> $RDBMS_HOME/deinstall/deinstall -local

6. From any of the existing nodes run the following to update the inventory with the list of the remaining nodes (as ‘oracle’ user):

$> $RDBMS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME= “CLUSTER_NODES={[node1,...nodeX]}"

Phase II - Remove node from Clusterware

7. Check if the node is active and unpinned (as ‘root’ user):

#> $GI_HOME/bin/olsnodes -s -t

Note: The node will only be pinned if using CTSS, or using with database version < style="text-align: justify;">8. Disable Clusterware applications and daemons on the node to be removed. Use the ‘-lastnode’ option when running on the last node in the cluster to be removed (as ‘root’ user):

#> $GI_HOME/crs/install/rootcrs.pl -deconfig -force [-lastnode]

9. From any node not being removed delete Clusterware from the node (as ‘root’ user):

#> $GI_HOME/bin/crsctl delete node -n

10. As ‘grid’ user from node being removed:

$GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME= "CLUSTER_NODES={[oldnode1,...oldnodeX]}" CRS=TRUE -local

11. Deinstall Clusterware software from the node:

$> $GI_HOME/deinstall/deinstall -local

ISSUE:

‘$GI_HOME/deinstall/deinstall -local’ results in questions about other nodes, and prompts for running scripts against other nodes. DO NOT RUN!! This process deconifigures CRS on all nodes!

ROOT/CAUSE:
Incorrect documentation as a command is missing.

RESOLUTION:
The correct order (as already given in this document) should be as follows:

$GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/oracle/app/11.2.0/grid “CLUSTER_NODES={[oldnode]}” CRS=TRUE -local
$GI_HOME/deinstall/deinstall -local

12. From any existing node to remain, update the Clusterware with the remaining nodes (as ‘grid’ user):

$> $GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME= “CLUSTER_NODES={[oldnode1,...oldnodeX]}” CRS=TRUE

13. Verify the node has been removed and the remaining nodes are valid:

$GI_HOME/bin/cluvfy stage -post nodedel -n -verbose

14. Remove OCM host/configuration from the MOS portal.

Oracle11gR2 RAC: Adding Database Nodes to an Existing Cluster

Environment:

Oracle RAC Database 11.2.0.1
Oracle Grid Infrastructure 11.2.0.1
Non-GNS
OEL 5.5 or SLES 11.1 (both x86_64)

Adding Nodes to a RAC Administrator-Managed Database

Cloning to Extend an Oracle RAC Database (~20 minutes or less depending)

Phase I - Extending Oracle Clusterware to a new cluster node

1. Make physical connections, and install the OS.

Warning: Follow article “11GR2 GRID INFRASTRUCTURE INSTALLATION FAILS WHEN RUNNING ROOT.SH ON NODE 2 OF RAC [ID 1059847.1]” to ensure successful completion of root.sh!! This is also pointed out in the internal Installation Guide.

2. Create Oracle accounts, and setup SSH among the new node and the existing cluster nodes.

Warning: Follow article “How To Configure SSH for a RAC Installation [ID 300548.1]" for the correct procedure for SSH setup! EACH NODE MUST BE VISITED TO ENSURE THEY ARE ADDED TO KNOWN_HOSTS FILE!! This is also mentioned in the internal Installation Guide.

3. Verify the requirements for cluster node addition using the Cluster Verification Utility (CVU). From an existing cluster node (as ‘grid’ user):

$> $GI_HOME/bin/cluvfy stage -post hwos -n -verbose

4. Compare an existing node (reference node) with the new node(s) to be added (as ‘grid’ user):

$> $GI_HOME/bin/cluvfy comp peer -refnode -n -orainv oinstall -osdba dba -verbose

5. Verify the integrity of the cluster and new node by running from an existing cluster node (as ‘grid’ user):

$GI_HOME/bin/cluvfy stage -pre nodeadd -n -fixup -verbose

6. Add the new node by running the following from an existing cluster node (as ‘grid’ user):

a. Not using GNS

$GI_HOME/oui/bin/addNode.sh -silent “CLUSTER_NEW_NODES={}” “CLUSTER_NEW_VIRTUAL_HOSTNAMES={}}”

b. Using GNS

$GI_HOME/oui/bin/addNode.sh -silent “CLUSTER_NEW_NODES={}}”

Run the root scripts when prompted.

POSSIBLE ERROR:
/oracle/app/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /oracle/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2010-08-11 16:12:19: Parsing the host name
2010-08-11 16:12:19: Checking for super user privileges
2010-08-11 16:12:19: User has super user privileges
Using configuration parameter file: /oracle/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
-ksh: line 1: /bin/env: not found
/oracle/app/11.2.0/grid/bin/cluutil -sourcefile /etc/oracle/ocr.loc -sourcenode ucstst12 -destfile /oracle/app/11.2.0/grid/srvm/admin/ocrloc.tmp -nodelist ucstst12 ... failed
Unable to copy OCR locations
validateOCR failed for +OCR_VOTE at /oracle/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 7979.

CAUSE:
SSH User Equivalency is not properly setup.

SOLUTION:
[1] Correctly setup SSH user equivalency.

[2] Deconfigure cluster on new node:

#> /oracle/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force

[3] Rerun root.sh

POSSIBLE ERROR:
PRCR-1013 : Failed to start resource ora.LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.LISTENER.lsnr on node ucstst13
CRS-2662: Resource 'ora.LISTENER.lsnr' is disabled on server 'ucstst13'

start listener on node=ucstst13 ... failed
Configure Oracle Grid Infrastructure for a Cluster ... failed

CAUSE:

A node is being added that was previously a member of the cluster and Clusterware is aware that the node was a previous member and also that the last listener status was ‘disabled’.

SOLUTION:

Referencing “Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment” (Doc ID 1084360.1), run the following from the node being added as the ‘root’ user to enable and start the local listener (this will complete the operation):

ucstst13:/oracle/app/11.2.0/grid/bin# /oracle/app/11.2.0/grid/bin/srvctl enable listener -l -n
ucstst13:/oracle/app/11.2.0/grid/bin # /oracle/app/11.2.0/grid/bin/srvctl start listener -l -n

7. Verify that the new node has been added to the cluster (as ‘grid’ user):

$GI_HOME/bin/cluvfy stage -post nodeadd -n -verbose

Phase II - Extending Oracle Database RAC to new cluster node

8. Using the ‘addNode.sh’ script, from an existing node in the cluster as the ‘oracle’ user:

$> $ORACLE_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={newnode1,…newnodeX}"

9. On the new node run the ‘root.sh’ script as the ‘root’ user as prompted.

10. Set ORACLE_HOME and ensure you are using the ‘oracle’ account user. Ensure permissions for Oracle executable are 6751, if not, then as root user:

cd $ORACLE_HOME/bin
chgrp asmadmin oracle
chmod 6751 oracle
ls -l oracle

OR

as 'grid' user: $GI_HOME/bin/setasmgidwrap o=$RDBMS_HOME/bin/oracle

11. On any existing node, run DBCA ($ORACLE_HOME/bin/dbca) to add/create the new instance (as ‘oracle’ user):

$ORACLE_HOME/bin/dbca -silent -addInstance -nodeList -gdbName -instanceName -sysDBAUserName sys -sysDBAPassword

NOTE: Ensure the command is run from an existing node with the same or less memory than the new nodes otherwise the command will fail due to insufficient memory to support the instance. Also ensure that the log file is checked for actual success since it can differ from what is displayed at the screen.

POSSIBLE ERROR:
DBCA logs with error below (screen indicates success):
Adding instance
DBCA_PROGRESS : 1%
DBCA_PROGRESS : 2%
DBCA_PROGRESS : 6%
DBCA_PROGRESS : 13%
DBCA_PROGRESS : 20%
DBCA_PROGRESS : 26%
DBCA_PROGRESS : 33%
DBCA_PROGRESS : 40%
DBCA_PROGRESS : 46%
DBCA_PROGRESS : 53%
DBCA_PROGRESS : 66%
Completing instance management.
DBCA_PROGRESS : 76%
PRCR-1013 : Failed to start resource ora.racdb.db
PRCR-1064 : Failed to start resource ora.racdb.db on node ucstst13
ORA-01031: insufficient privileges
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0

ORA-01031: insufficient privileges
ORA-01031: insufficient privileges
CRS-2674: Start of 'ora.racdb.db' on 'ucstst13' failed
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0

ORA-01031: insufficient privileges
ORA-01031: insufficient privileges

DBCA_PROGRESS : 100%

CAUSE:

The permissions and/or ownership on the ‘$RDBMS_HOME/bin/oracle’ binary are incorrect. References: “ORA-15183 Unable to Create Database on Server using 11.2 ASM and Grid Infrastructure [ID 1054033.1]”, “Incorrect Ownership and Permission after Relinking or Patching 11gR2 Grid Infrastructure [ID 1083982.1]”.

SOLUTION:
As root user:
cd $ORACLE_HOME/bin
chgrp asmadmin oracle
chmod 6751 oracle
ls -l oracle

Ensure the ownership and permission are now like:
-rwsr-s--x 1 oratest asmadmin

Warning: Whenever a patch is applied to the database ORACLE_HOME, please ensure the above ownership and permission are corrected after the patch.

12. Following the above, the EM DB Console may not be configured correctly for the new nodes. The GUI does not show the nodes, however the EM agent does report them to the OMS agent via the command line:

**************** Current Configuration ****************
INSTANCE NODE DBCONTROL_UPLOAD_HOST
---------- ---------- ---------------------

racdb racnode10 racnode10.mydomain.com
racdb racnode11 racnode10.mydomain.com
racdb racnode12 racnode10.mydomain.com
racdb racnode13 racnode10.mydomain.com
racdb racnode14 racnode10.mydomain.com

Also, the listener is not properly configured as starting from the GI_HOME so it will appear incorrectly as down in DB Console. To correctly configure the instance in DB Console:

a. Delete the instance from EM DB Console:

$RDBMS_HOME/bin/emca -deleteInst db

b. Create the EM Agent directories on the new node for all the nodes in the cluster (including the new node) as follows:

mkdir -p $RDBMS_HOME/racnode10_racdb/sysman/config
mkdir -p $RDBMS_HOME/racnode11_racdb/sysman/config
mkdir -p $RDBMS_HOME/racnode12_racdb/sysman/config
mkdir -p $RDBMS_HOME/racnode13_racdb/sysman/config
mkdir -p $RDBMS_HOME/racnode10_racdb/sysman/emd
mkdir -p $RDBMS_HOME/racnode11_racdb/sysman/emd
mkdir -p $RDBMS_HOME/racnode12_racdb/sysman/emd
mkdir -p $RDBMS_HOME/racnode13_racdb/sysman/emd

c. Re-add the instance:

$RDBMS_HOME/emca -addInst db

To address the listener incorrectly showing as down in DB Console:

a. On each node as the oracle user, edit the file ‘$RDBMS_HOME/_/sysman/config/targets.xml’.

b. Change the "ListenerOraDir" entry for the node to match the ‘$GI_HOME/network/admin’ location.

c. Save the file and restart the DB Console.

13. Verify the administrator privileges on the new node by running on existing node:

$ORACLE_HOME/bin/cluvfy comp admprv -o db_config -d -n -verbose

14. For an Admin-Managed Cluster, add the new instances to Services, or create additional Services. For a Policy-Managed Cluster, verify the instance has been added to an existing server pool.

15. Setup OCM in the cloned homes (both GI and RDBMS):

a. Delete all subdirectories to remove previously configured host:

$> rm -rf $ORACLE_HOME/ccr/hosts/*

b. Move (do not copy) from the OH the file as shown below:

$> mv $ORACLE_HOME/ccr/inventory/core.jar $ORACLE_HOME/ccr/inventory/pending/core.jar

c. Configure OCM for the cloned home on the new node:

$> $ORACLE_HOME/ccr/bin/configCCR -a

Adding Nodes to a RAC Policy-Managed Database

When adding a node in a cluster running as a Policy-Managed Database, Oracle Clusterware tries to start the new instance before the cloning procedure completes. The following steps should be used to add the node:

1. Run the ‘addNode.sh’ script as the ‘grid’ user for the Oracle Grid Infrastructure for a cluster to add the new node (similar to step 6 above). DO NOT run the root scripts when prompted; you will run them later.

2. Install the Oracle RAC database software using a software-only installation, clone or ‘addNode.sh’ script (same as step 8 above) method. Ensure Oracle is linked with the Oracle RAC option if using the software-only installation.

3. Complete the root script actions for the Database home, similar to steps 9 - 10 above.

4. Complete the root scripts action for the Oracle Clusterware home and then finish the installation. Similar to as mentioned in step 6 above.

5. Verify that EM DB Console is fully operational, similar to step 12 above.

6. Complete the configuration for OCM, similar to step 15 above.

Oracle11gR2 RAC: FAQ

Can I change a node’s hostname?

Yes, however, the node must be removed and added back to the cluster with the new name.

How do I define a service for a Policy-Managed Database?

When you define services for a policy-managed database, you define the service to a server pool where the database is running. You can define the service as either UNIFORM (running on all instances in the server pool) or SINGLETON (running on only one instance in the server pool). For SINGLETON services, Oracle RAC chooses on which instance in the server pool the service is active. If that instance fails, then the service fails over to another instance in the server pool. A service can only run in one server pool.

Services for administrator-managed databases continue to be defined by the PREFERRED and AVAILABLE definitions.

How do I convert from a Policy-Managed Database to Administrator-Managed Database?

You cannot directly convert a policy-managed database to an administrator-managed database. Instead, you can remove the policy-managed configuration using the 'srvctl remove database' and 'srvctl remove service' commands, and then create a new administrator-managed database with the 'srvctl add database' command.

What is Grid Plug and Play (GPnP)?

Grid Plug and Play (GPnP) eliminates per-node configuration data and the need for explicit add and delete node steps. This allows a system administrator to take a template system image and run it on a new node with no further configuration. This removes many manual operations, reduces the opportunity for errors, and encourages configurations that can be changed easily. Removal of the per-node configuration makes the nodes easier to replace, because they do not need to contain individually-managed state.

Grid Plug and Play reduces the cost of installing, configuring, and managing database nodes by making their per-node state disposable. It allows nodes to be easily replaced with regenerated state.

What is a Server Pool?

Server pools enable the cluster administrator to create a policy which defines how Oracle Clusterware allocates resources. An Oracle RAC policy-managed database runs in a server pool. Oracle Clusterware attempts to keep the required number of servers in the server pool and, therefore, the required number of instances of the Oracle RAC database. A server can be in only one server pool at any time. However, a database can run in multiple server pools. Cluster-managed services run in a server pool where they are defined as either UNIFORM (active on all instances in the server pool) or SINGLETON (active on only one instance in the server pool).

You should create redo log groups only if you are using administrator-managed databases. For policy-managed databases, increase the cardinality and when the instance starts, if you are using Oracle Managed Files and Oracle ASM, then Oracle automatically allocates the thread, redo, and undo.

If you remove an instance from your Oracle RAC database, then you should disable the instance’s thread of redo so that Oracle does not have to check the thread during database recovery.

For policy-managed databases, Oracle automatically allocates the undo tablespace when the instance starts if you have OMF enabled.

What is Run-Time Connection Load Balancing?

The run-time connection load balancing feature enables routing of work requests to an instance that offers the best performance, minimizing the need to relocate work. To enable and use run-time connection load balancing, the connection goal must be set to SHORT and either of the following service-level goals must be set:

· SERVICE_TIME—The Load Balancing Advisory attempts to direct work requests to instances according to their response time. Load Balancing Advisory data is based on the elapsed time for work done by connections using the service, as well as available bandwidth to the service. This goal is best suited for workloads that require varying lengths of time to complete, for example, an internet shopping system.

· THROUGHPUT—The Load Balancing Advisory measures the percentage of the total response time that the CPU consumes for the service. This measures the efficiency of an instance, rather than the response time. This goal is best suited for workloads where each work request completes in a similar amount of time, for example, a trading system.

Client-side load balancing balances the connection requests across the listeners by setting the parameter ‘LOAD_BALANCE=ON’ directive. When you set this parameter to ON, Oracle Database randomly selects an address in the address list, and connects to that node's listener. This balances client connections across the available SCAN listeners in the cluster. When clients connect using SCAN, Oracle Net automatically load balances client connection requests across the three IP addresses you defined for the SCAN, unless you are using EZConnect.

What are the different types of Server-Side Connection Load Balancing?

With server-side load balancing, the SCAN listener directs a connection request to the best instance currently providing the service by using the load balancing advisory. The two types of connection load balancing are:

· SHORT—Connections are distributed across instances based on the amount of time that the service is used. Use the SHORT connection load balancing goal for applications that have connections of brief duration. When using connection pools that are integrated with FAN, set the connection load balancing goal to SHORT. SHORT tells the listener to use CPU-based statistics.

· LONG—Connections are distributed across instances based on the number of sessions in each instance, for each instance that supports the service. Use the LONG connection load balancing goal for applications that have connections of long duration. This is typical for connection pools and SQL*Forms sessions. LONG is the default connection load balancing goal, and tells the listener to use session-based statistics.

How do I enable the Load Balancing Advisory (LBA)?

To enable the load balancing advisory, use the ‘-B’ option when creating or modifying the service using the ‘srvctl’ command.

How does the database register with the Listener?

When a listener starts after the Oracle instance starts, and the listener is listed for service registration, registration does not occur until the next time the Oracle Database process monitor (PMON) discovery routine starts. By default, PMON discovery occurs every 60 seconds.

To override the 60-second delay, use the SQL ‘ALTER SYSTEM REGISTER’ statement. This statement forces the PMON process to register the service immediately.

If you run this statement while the listener is up and the instance is already registered, or while the listener is down, then the statement has no effect.

Can I configure both failure notifications with Universal Connection Pool (UCP)?

Connection failure notification is redundant with Fast Connection Failover (FCF) as implemented by the UCP. You should not configure both within the same application.

Should I configure Transparent Application Failure (TAF) in my service definition if using Fast Connection Failure (FCF)?

Do not configure Transparent Application Failover (TAF) with Fast Connection Failover (FCF) for JDBC clients as TAF processing will interfere with FAN ONS processing.

Can I use Fast Connection Failover (FCF) and Transparent Application Failover (TAF) together?

No. Only one of them should be used at a time.

What is the status of Fast Connection Failover (FCF) with Universal Connection Pool (UCP)?

FCF is now deprecated along with the Implicit Connection Caching in favor of using the Universal Connection Pool (UCP) for JDBC.

Does Fast Connection Failover (FCF) support planned outages?

FCF does not support planned outages like service relocation (reference Doc ID: 1076130.1). It is designed to work for unplanned outages, where a RAC service is preferred on all the nodes in the cluster and one of the nodes goes down unexpectedly. When a planned outage like a service relocation is done from one node to the other, FCF does not work as expected and the result is unpredictable. There is no solution at present for this. Enhancement request 9495973 has been raised to address this limitation.

Should I user JDBC Thin driver or JDBC OCI driver?

Oracle thin JDBC driver is usually preferred by application developers because it is cross platform and has no external dependencies. However some applications require the high-performance, native C-language based Oracle Call Interface (OCI) driver. This driver is compatible with FCF and can alternatively use Transparent Application Failover (TAF) which operates at a lower level than FCF and can automatically resubmit SELECT queries in the event of a node failure. However for most applications, the ease of deployment of the thin driver with full FCF support will outweigh any benefits offered by the OCI driver.

How do I subscribe to HA Events?

If you are using a client that uses Oracle Streams Advanced Queuing, such as OCI and ODP.NET clients, to receive FAN events, you must enable the service used by that client to access the alert notification queue by using the ‘-q’ option via the ‘srvctl’ command.

FAN events are published using ONS and Oracle Streams Advanced Queuing. The service metrics received from the Oracle RAC load balancing advisory through FAN events for the service are automatically placed in the Oracle Streams AQ queue table, ALERT_QUEUE.

Use the following query against the internal queue table for load balancing advisory FAN events to monitor load balancing advisory events generated for an instance:

SET PAGES 60 COLSEP '|' LINES 132 NUM 8 VERIFY OFF FEEDBACK OFF

COLUMN user_data HEADING "AQ Service Metrics" FORMAT A60 WRAP

BREAK ON service_name SKIP 1

SELECT

TO_CHAR(enq_time, 'HH:MI:SS') Enq_time, user_data

FROM sys.sys$service_metrics_tab

ORDER BY 1 ;

What is Connection Affinity?

Connection affinity is a performance feature that allows a connection pool to select connections that are directed at a specific Oracle RAC instance. The pool uses run-time connection load balancing (if configured) to select an Oracle RAC instance to create the first connection and then subsequent connections are created with an affinity to the same instance.

What types of affinity does Universal Connection Pool (UCP) support?

UCP JDBC connection pools support two types of connection affinity: transaction-based affinity and Web session affinity.

What is Transaction-Based Affinity?

Transaction-based affinity is an affinity to an Oracle RAC instance that can be released by either the client application or a failure event. Applications typically use this type of affinity when long-lived affinity to an Oracle RAC instance is desired or when the cost (in terms of performance) of being redirected to a new Oracle RAC instance is high. Distributed transactions are a good example of transaction-based affinity. XA connections that are enlisted in a distributed transaction keep an affinity to the Oracle RAC instance for the duration of the transaction. In this case, an application would incur a significant performance cost if a connection is redirect to a different Oracle RAC instance during the distributed transaction.

Transaction-based affinity is strictly scoped between the application/middle-tier and UCP for JDBC; therefore, transaction-based affinity only requires that the setFastConnectionFailoverEnabled property be set to true and does not require complete FCF configuration. In addition, transaction-based affinity does not technically require run-time connection load balancing. However, it can help with performance and is usually enabled regardless. If run-time connection load balancing is not enabled, the connection pool randomly picks connections.

What is Web Session Affinity?

Web session affinity is an affinity to an Oracle RAC instance that can be released by either the instance, a client application, or a failure event. The Oracle RAC instance uses a hint to communicate to a connection pool whether affinity has been enabled or disabled on the instance. An Oracle RAC instance may disable affinity based on many factors, such as performance or load. If an Oracle RAC instance can no longer support affinity, the connections in the pool are refreshed to use a new instance and affinity is established once again.

Applications typically use this type of affinity when short-lived affinity to an Oracle RAC instance is expected or if the cost (in terms of performance) of being redirected to a new Oracle RAC instance is minimal. For example, a mail client session might use Web session affinity to an Oracle RAC instance to increase performance and is relatively unaffected if a connection is redirected to a different instance.

What is recommended for WebLogic Server?

Oracle recommends using WebLogic JDBC multi data sources to handle failover instead. While connect-time failover does not provide the ability to pre-create connections to alternate Oracle RAC nodes, multi data sources have multiple connections available at all times to handle failover.

Transparent Application Failover (TAF) is not supported for any WLS data source. TAF, as delivered via JDBC is currently not transparent. It is documented to affect some ongoing query results and PreparedStatements in unpredictable and unrecoverable ways. TAF JDBC requires specific recovery code at the application level and affects the integrity of statements that WebLogic might be caching.

Do I still need to backup my Oracle Cluster Registry (OCR) and Voting Disks?

You no longer have to back up the voting disk. The voting disk data is automatically backed up in OCR as part of any configuration change and is automatically restored to any voting disk added. If all voting disks are corrupted, however, you can restore.

Oracle Clusterware automatically creates OCR backups every four hours. At any one time, Oracle Database always retains the last three backup copies of OCR. The CRSD process that creates the backups also creates and retains an OCR backup for each full day and at the end of each week. You cannot customize the backup frequencies or the number of files that Oracle Database retains.

How is DBMS_JOB functionality affected by RAC?

DBMS jobs can be set to run either on database (i.e. any active instance), or a specific instance.

What is PARELLEL_FORCE_LOCAL?

By default, the parallel server processes selected to execute a SQL statement can operate on any or all Oracle RAC nodes in the cluster. By setting PARALLEL_FORCE_LOCAL to TRUE, the parallel server processes are restricted to just one node, the node where the query coordinator resides (the node on which the SQL statement was executed). However, in 11.2.0.1 when this parameter is set to TRUE the parallel degree calculations are not being adjusted correctly to only consider the CPU_COUNT for a single node. The parallel degree will be calculated based on the RAC-wide CPU_COUNT and not the single node CPU_COUNT. Due to this bug 9671271 it is not recommended that you set PARALLEL_FORCE_LOCAL to TRUE in 11.2.0.1, instead you should setup a RAC service to limit where parallel statements can execute.

What is the Service Management Policy?

When you use automatic services in an administrator-managed database, during planned database startup, services may start on the first instances to start rather than their preferred instances. Prior to Oracle RAC 11 g release 2 (11.2), all services worked as though they were defined with a manual management policy.

Why does my user appear across all nodes when querying GV$SESSION when my service does not span all nodes?

The problem is you are querying GV$SESSION as the ABC user and this results in the "strange" behaviour. If you select gv$session, 2 parallel servers are spawned to query the v$session on each node. This happens as the same user. Hence when you query gv$session as ABC you are seeing 3 (one real and 2 parallel slaves querying v$session on each instance). The reason you are seeing 1 on one node and 3 on the other is the order in which the parallel processes query the v$session. Take the sys (or any other) user to query the session of ABC and you will not see this problem.

How does Clustereare startup with OCR and Voting Disk in ASM?

The startup sequence has been changed/replaced, now being 2-phased, optimized approach:

Phase I

· OHASD will startup "local" resources first.

· CSSD uses GPnP profile which stores location of voting disk so no need to access ASM (voting disk is stored different within ASM than other files so location is known).

Simultaneously,

· ORAAGENT starts up and ASM instance is started (subset of information in OCR is stored in OLR, enough to startup local resources), and ORAROOTAGENT starts CRSD.

So the 1st phase of Clusterware startup is to essentially start up local resources.

Phase II

· At this point ASM and full OCR information is available and the node is "joined" to cluster.

What is the Oracle Database Quality of Service Management?

Oracle Database QoS Management is an automated, policy-based product that monitors the workload requests for an entire system. Oracle Database QoS Management manages the resources that are shared across applications and adjusts the system configuration to keep the applications running at the performance levels needed by your business. Oracle Database QoS Management responds gracefully to changes in system configuration and demand, thus avoiding additional oscillations in the performance levels of your applications. If you use Oracle Database Quality of Service Management (Oracle Database QoS Management), then you cannot have SINGLETON services in a server pool, unless the maximum size of that server pool is one.

Is a re-link required for the Clusterware home after an OS upgrade?

In 11.2, there are some executables in the GRID home that can and should be re-linked after an OS upgrade. The procedure to do this is:

#> cd GI_HOME/crs/install
#> perl rootcrs.pl -unlock

As the grid infrastructure for a cluster owner:

$> export ORACLE_HOME=Grid_home
$> $GI_HOME/bin/relink

As root again:

#> cd GI_HOME/crs/insta

How do I determine the “Master” node?

For the cluster synchronization service (CSS), the master can be found by searching $GI_HOME/log/cssd/ocssd.log. For master of an enqueue resource with Oracle RAC, you can select from v$ges_resource. There should be a master_node column.

What are the different types of failover mechanisms available?

· JDBC-THIN driver supports Fast Connection Failover (FCF)

· JDBC-OCI driver supports Transparent Application Failover (TAF)

· JDBC-THIN 11gR2 supports Single Client Access Name (SCAN)

What is recommendation on type of tablespaces?

You should use locally managed, auto-allocate tablespaces. With auto-allocate Oracle automatically grows the size of the extent depending on segment size, available free space in the tablespace and other factors. The extent size of a segment starts at 64 KB and grows to 1 MB when the segment grows past 1 MB, and 8 MB once the segment size exceeds 64 MB. So for a large table, the extent size will automatically grow to be large. The use of uniform extents is strongly discouraged for two reasons; space wastage and the impact that wasted space has on scan performance.

For large partitioned objects you should use multiple big file tablespaces to avoid file header block contention during parallel load operations. File header block contention appears as the ‘gc buffer busy’ enqueue wait event in an AWR report. Checking the buffer wait statistic will indicate if it is the file header block that is being contended for.

To evenly distribute a partitioned table among multiple big file tablespaces use the STORE IN clause.

What is the recommendation on column statistics?

Prior to loading any data it is advisable to run all queries against the empty tables to populate or seed the column usage statistics. Column usage statistics are used during optimizer statistics gathering to automatically determine which columns require histograms and the number of buckets that will be used. A column is a candidate for a histogram if it has been seen in a where clause predicate e.g. an equality, range, LIKE, etc. and if there is data skew in that column.

How do I size hash partitions?

Oracle uses a linear hashing algorithm to create sub-partitions. In order to ensure that the data gets evenly distributed among the hash partitions the number of hash partitions should be a power of 2 (i.e. 2 * # of CPU). However, each hash partition should be at least 16MB in size. Any smaller and they will not have efficient scan rates with parallel query. If the subpartitions are too small (from the 2 * # of CPU) considering using a smaller number of partitions (still an even number of partitions).

è What should be my block size?

8 KB is the default block size and is the block size used during all of Oracle's testing. Typically this is good enough for a data warehouse and transactional systems (good compromise or sweet spot). By doubling the default block size you can increase the chances of getting a good compression rate as Oracle applies data compression at the block level. The more rows in the block the greater the chance Oracle will find duplicate values within a block. (Reference: Oracle Sun Database Machine Application Best Practices for Data Warehousing, Doc ID 1094934.1)

What is the guideline on how to auto-extend data files?

When configuring a file to auto extend, the size of the extension should cover all disks in the ASM disk group to optimize balance. For example, with a 4 MB AU size and 128 disks, the size of the extension should be a multiple of 512MB (4*128).