Oracle DBA – Tips and Techniques

Knowledge is Power – so share it ….

Archive for the ‘Real Application Clusters’ Category

11g ASM rolling upgrade from 11.1.0.6 to 11.1.0.7

Posted by Gavin Soorma on December 15, 2009

In Oracle 11g, we can now do rolling upgrades of the ASM instanaces in a RAC cluster configuration. In this example shown below, one of the ASM instances is being upgraded to 11.1.0.7 while the other instance is still running on version 11.1.0.6. While the ASM software is being upgraded, client connections to the RAC database will continue as the service will get relocated to the other node in the cluster which is not being currently upgraded. So in other words this is an online patch application.

The database name here is racdba and the two instances are racdba1 and racdba2 running on nodes middba1 and middba2. The platform used in this example is AIX 5.3 64 bit on a P series LPAR.

On one of the nodes in the cluster, connect to the ASM instance (+ASM1) and issue the command to start the rolling migration.

SQL> alter system start rolling migration to '11.1.0.7';

System altered.

Stop the database instance racdba1, shutdown the ASM instance +ASM1 and then stop the listener which is running from the ASM home which will now be upgraded to 11.1.0.7.

middba1 $ srvctl stop instance -d racdba -i racdba1

middba1 $ sqlplus sys as sysasm

SQL*Plus: Release 11.1.0.6.0 - Production on Fri Dec 11 09:57:13 2009

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Enter password:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown
SQL> quit

middba1 $ lsnrctl stop LISTENER_MIDDBA1

LSNRCTL for IBM/AIX RISC System/6000: Version 11.1.0.6.0 - Production on 11-DEC-2009 10:00:31

Copyright (c) 1991, 2007, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=middba1_vip)(PORT=1521)(IP=FIRST)))
The command completed successfully

Initially the service ‘rac1_service’ is connected to instance racdba1 hosted on node middba1. When we shutdown the instance racdba1, the service will be relocated to the other surviving instance running on node middba2.

middba2 $ sqlplus sh/sh@rac1

SQL*Plus: Release 11.1.0.6.0 - Production on Fri Dec 11 09:11:44 2009

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> select count(*) from sales;

  COUNT(*)
----------
    918843

SQL> select host_name from v$instance;

HOST_NAME
----------------------------------------------------------------
middba1

********ASM and Database Instance is shutdown here ****************

SQL> /

HOST_NAME
----------------------------------------------------------------
middba2

SQL>  select count(*) from sales;

  COUNT(*)
----------
    918843

While installing the 11.1.0.7 patch, it should be noted that we are only selecting one node in the cluster where the patch will be applied – the other node in the cluster is where the 11.1.0.6 ASM and Database instance is running and where all the client connections are being hosted.

After the 11.1.0.7 patch application on the ASM Oracle Home is completed, we will now start the listener, the upgraded ASM instance and the database instance.

middba1 $ lsnrctl start LISTENER_MIDDBA1

middba1 $ sqlplus sys as sysasm

SQL*Plus: Release 11.1.0.7.0 - Production on Fri Dec 11 10:48:46 2009

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

Enter password:
Connected to an idle instance.

SQL> startup
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2153056 bytes
Variable Size             256611744 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

middba1 $ srvctl start instance -d racdba -i racdba1

We will now repeat all the same steps on the other node middba2. After the 11.1.0.7 patch has been applied to all the ASM Oracle Home’s we will see a message indicating that we now can turn off the ASM rolling upgrade.

Note: while the ASM in in ‘rolling upgrade’ mode, a limited number of ASM operations can be performed and certain ASM operations like rebalancing cannot be carried out.

SQL> alter system stop rolling migration;

System altered.


We can finally relocate the service back to its original node - middba1

middba1 $  srvctl relocate service -d racdba -s rac1_service  -i racdba2 -t racdba1 -f

middba1 $ srvctl status service -d racdba -s rac1_service
Service rac1_service is running on instance(s) racdba1

Posted in Oracle 11g, Real Application Clusters | Tagged: , | Leave a Comment »

11.1.0.6 Clusterware upgrade to 11.1.0.7

Posted by Gavin Soorma on December 10, 2009

The following screenshots are taken from a RAC Clusterware upgrade from 11.1.0.6 to 11.1.0.7 on AIX 64 bit platform.

We encountered an issue related to Bug 8925669: ROOT111.SH 0509-150 DEPENDENT MODULE LIBTTSH11.SO COULD NOT BE LOADED. I have provided the workaround we used in this case and which did work for us.

The last screen will indicate that we need to run the root111.sh script which will stop and start the clusterware processes in the upgraded CRS Home

The installer has detected that your Cluster Ready Services (CRS) installation is distributed across the following nodes:

middba1
middba2

Because the software consists of local identical copies distributed across each of the nodes in the cluster, it is possible to patchyour CRS installation in a rolling manner, one node at a time.

To complete the installation of this patchset, you must perform the following tasks on each node:

1. Log in as the root user.
2. As the root user, perform the following tasks:

a. Shutdown the CRS daemons by issuing the following command:
/u01/crs/11.1.0/bin/crsctl stop crs
b. Run the shell script located at:
/u01/crs/11.1.0/install/root111.sh
This script will automatically start the CRS daemons on the
patched node upon completion.

3. After completing this procedure, proceed to the next node and repeat.

While running the root111.sh on the first node we encountered this error

“Could not load program /u01/crs/11.1.0/bin/ctsctl.bin:
Dependent module /u01/crs/11.1.0/lib/libttsh11.so could not be loaded.”

For this we went and changed the permissions for the file libttsh11.so which was owned by root to the software owner ‘crs’ and the group to dba

middba1 $ ls -l libttsh11.so
-rwxr-x— 1 root system 58680930 Dec 9 15:57 libttsh11.so

changed to

middba1 $ ls -lrt libtt*
-rwxr-x— 1 crs dba 58680930 Dec 9 15:57 libttsh11.so

The same was repeated on the second node in the cluster as well.

To confirm the upgrade we can run the following crsctl commands on each node:

middba2 $ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.1.0.7.0]

middba2 $ crsctl query crs softwareversion
Oracle Clusterware version on node [middba2] is [11.1.0.7.0]

Posted in Oracle 11g, Real Application Clusters | Tagged: , , , | Leave a Comment »

ORA-27504 error creating ASM instance on 11g RAC

Posted by Gavin Soorma on August 19, 2009

While creating the ASM instance using DBCA in an 11g RAC environment on 64 bit AIX 5L, we got the ORA-27504 error along with others as shown below.

asmdb7

The problem was caused by incorrect UDP and TCP packet settings and is documented in Metalink note 300956.1

As root, the following changes need to be made

# no -o tcp_sendspace=262144
# no -o tcp_recvspace=262144
# no -o udp_sendspace=65536
# no -o udp_recvspace=262144
# no -o rfc1323=1

Adding these entries to the /etc/rc.net will ensure that these parameters take effect on every machine reboot.

Posted in Oracle 11g, Real Application Clusters | Tagged: , , , | Leave a Comment »

11g RAC Software Relink Error on AIX

Posted by Gavin Soorma on August 18, 2009

While relinking the Oracle 11g RAC software on 64 bit AIX platform, we got the error as shown below:

ora11

Looking at the installation log we found the following lines:

ld: 0706-006 Cannot find or open library file: -l ha_gs_r
ld:open(): A file or directory in the path name does not exist.
ld: 0706-006 Cannot find or open library file: -l ha_em_r
ld:open(): A file or directory in the path name does not exist.
make: 1254-004 The error code from the last command is 255

The cause of this error is that a particluar fileset rsct.basic.rte which is required specifically for a 11g RAC installation has not been installed at the OS level. After the fileset was installed we could relink the software without the same ld errors as mentioned above.

We can check if the fileset has been installed by running the command ‘lslpp’ as shown below.

testdb:/u02/oradata/testdb> lslpp -l rsct.basic.rte
lslpp: Fileset rsct.basic.rte not installed.

After the fileset is installed we can confirm the same…

middba1:/u01/oracle/dump> lslpp -l rsct.basic.rte
 Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  rsct.basic.rte            2.4.10.0  COMMITTED  RSCT Basic Function

This package is part of the OS Patch IY98287 which is one of the mandatory required patches to be installed on AIX if we are configuring a 11g RAC environment.

This is also documented in Metalink Note 471701.1

Posted in Oracle 11g, Real Application Clusters | Tagged: , , , | Leave a Comment »

11g RAC Transparent Application Failover (TAF)

Posted by Gavin Soorma on August 14, 2009

This example illustrates Transparent Application Failover (TAF) in a two node AIX 11g RAC configuration.

The database name is racdb and the two instances are racdb1 and racdb2.

Node 1 – middba1 which hosts instance racdb1.
Node2 – middba2 which hosts instance racdb2

We have defined two services racdb1 and racdb2. For service racdb1 the preferred instance is racdb1 and
for service racdb2 the preferred instance is racdb2.

The following srvctl commands illustrates the same.

middba1:/u01/oracle/dump> srvctl status service -d racdb -s racdb1
Service racdb1 is running on instance(s) racdb1

middba1:/u01/oracle/dump> srvctl config service -d racdb -s racdb1
racdb1 PREF: racdb1 AVAIL: racdb2

middba1:/u01/oracle/dump> srvctl status service -d racdb -s racdb2
Service racdb2 is running on instance(s) racdb2

middba1:/u01/oracle/dump> srvctl config service -d racdb -s racdb2
racdb2 PREF: racdb2 AVAIL: racdb1

From a SQL*PLUS client, we establish a session as user SYSTEM using the service racdb1. Note the machine name where this service is running from.

testdb:/u01/oracle> sqlplus system/xxx@racdb1

SQL*Plus: Release 11.1.0.6.0 - Production on Fri Aug 14 10:37:03 2009

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, Real Application Clusters and Real Application Testing options

SQL> select host_name from v$instance;

HOST_NAME
----------------------------------------------------------------
middba1

We then execute a long running SELECT statement

SQL> select * from sh.sales;

While this statement is running we reboot the node middba1 where this service is currently connected to.

We will note that the screen will momentarily freeze while the service gets relocated to the other functional node racdb2 and then the query continues from where it got interrupted. The session will not get disconnected – so for the end user it is transparent from which instance the query is being serviced.

If we check the status of the cluster resources, we will find that the VIP from node middba1 has now relocated to middba2. Other resources that were originally running from node middba1 are now offline.

middba2:/u01/oracle> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    OFFLINE
ora....A1.lsnr application    ONLINE    OFFLINE
ora....ba1.gsd application    ONLINE    OFFLINE
ora....ba1.ons application    ONLINE    OFFLINE
ora....ba1.vip application    ONLINE    ONLINE    middba2
ora....SM2.asm application    ONLINE    ONLINE    middba2
ora....A2.lsnr application    ONLINE    ONLINE    middba2
ora....ba2.gsd application    ONLINE    ONLINE    middba2
ora....ba2.ons application    ONLINE    ONLINE    middba2
ora....ba2.vip application    ONLINE    ONLINE    middba2
ora.racdb.db   application    ONLINE    ONLINE    middba2
ora....cdb1.cs application    ONLINE    ONLINE    middba2
ora....b1.inst application    ONLINE    OFFLINE
ora....db1.srv application    ONLINE    ONLINE    middba2
ora....cdb2.cs application    ONLINE    ONLINE    middba2
ora....b2.inst application    ONLINE    ONLINE    middba2
ora....db2.srv application    ONLINE    ONLINE    middba2

If we check the status of the service, we will find that the service racdb1 is now running on node middba2 connecting to the instance racdb2.

middba2:/u01/oracle> srvctl status service -d racdb -s racdb1
Service racdb1 is running on instance(s) racdb2

Once the node middba1 which was rebooted finally does come online, we will find that the VIP will relocate
to its original node middba1 once the cluster services are brought online after the reboot.

middba2:/u01/oracle> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    middba1
ora....A1.lsnr application    ONLINE    ONLINE    middba1
ora....ba1.gsd application    ONLINE    ONLINE    middba1
ora....ba1.ons application    ONLINE    ONLINE    middba1
ora....ba1.vip application    ONLINE    ONLINE    middba1
ora....SM2.asm application    ONLINE    ONLINE    middba2
ora....A2.lsnr application    ONLINE    ONLINE    middba2
ora....ba2.gsd application    ONLINE    ONLINE    middba2
ora....ba2.ons application    ONLINE    ONLINE    middba2
ora....ba2.vip application    ONLINE    ONLINE    middba2
ora.racdb.db   application    ONLINE    ONLINE    middba2
ora....cdb1.cs application    ONLINE    ONLINE    middba2
ora....b1.inst application    ONLINE    ONLINE    middba1
ora....db1.srv application    ONLINE    ONLINE    middba2
ora....cdb2.cs application    ONLINE    ONLINE    middba2
ora....b2.inst application    ONLINE    ONLINE    middba2
ora....db2.srv application    ONLINE    ONLINE    middba2

Note however, that the service racdb1 will continue running on node middba2 (instance racdb2)even though the original node which it was running on middba1 has been brought online.

We will have to relocate the service back to its original node using the srvctl command

middba1:/u01/oracle> srvctl relocate service -d racdb -s racdb1 -i racdb2 -t racdb1

Posted in High Availability, Oracle 11g, Real Application Clusters | Tagged: , , , | Leave a Comment »

CRS-0184 error and its resolution

Posted by Gavin Soorma on August 12, 2009

While installing the 11g Clusterware on 64 bit AIX 5L, we  encountered the CRS-0184 error while running the root.sh on the second node.

The error was …

Cluster Synchronization Services is active on all the nodes.

Waiting for the Oracle CRSD and EVMD to start

Oracle CRS stack installed and running under init(1M)

Running vipca(silent) for configuring nodeapps

Creating VIP application resource on (2) nodes0:CRS-0184: Cannot communicate with the CRS daemon

 
The crsd.log file showed the following lines ….

2009-08-10 14:18:36.059: [ CRSMAIN][10286] Failed to spawn a thread for UI connection. error=-11
2009-08-10 14:19:12.143: [ CRSMAIN][10286] Failed to spawn a thread for UI connection. error=-11
2009-08-10 14:20:15.037: [ CRSMAIN][10286] Failed to spawn a thread for UI connection. error=-11
2009-08-10 14:21:11.379: [ CRSMAIN][10286] Failed to spawn a thread for UI connection. error=-11

The fix for this error was to increase the Shell Limits for the user ‘root’ as the Clusterware processes are started as root.

The documentation does mention that we need to so the same for the ‘oracle’ user account and fails to mention that we need to do it for the root account as well.

After changing the shell limits for root (on BOTH nodes) the CRS-0184 problem was resolved.

middba1 # ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 4194304
memory(kbytes) unlimited
coredump(blocks) 2097151
nofiles(descriptors) unlimited

Note that since the root.sh on the second node will internally launch VIPCA and this part did not run to successful completion, we had to manually run the vipca command as root to get the vip,ons amd gsd resources online.
 

Posted in Administration, Oracle 11g, Real Application Clusters | Tagged: , , , | Leave a Comment »