Saturday, 20 February 2016

Implementing One Node RAC

In the past few years, technology has become a very important aspect of our lives. There is hardly a moment when we are not connected to or using it, and not just for core geeky stuff but for tasks that have become part of our daily routine. Can you imagine a day when you are unable to connect to your email or to check the latest updates from your friends on Facebook or Twitter? I don't think the answer would be a yes, correct? Several parts of our lives are inextricably connected with technology today.
When was the last time you went to a bank to withdraw money to transfer it to your mom's account, or stood in a queue to pay the electricity or phone bill? Again, the answer would be a no, I believe! Technology, today, is not a word limited to those who keep staring at black terminal windows filled with weird-looking text; it's meant for "normal" people as well, who are using it to make their lives easier and more comfortable.
But there is a small issue!
Though technology is meant to make our lives simpler, faster and easier, at the end of the day it is a human invention and is therefore fallible. Machines may fail, may become overloaded, may become obsolete, and whenever any such thing happens, the technology running on that machine stops as well! Can you see what's coming next? If the technology stops, you stop too! As a result, for the past few years the expression "high availability" (HA) has been gaining ground, and is heard almost all the time by those who maintain anything related to a service that needs to be up and running as much as possible, not just to make operations simpler but also to keep the cash registers ringing.
Think about it: a closed branch or ATM is no good at all for a bank, no matter how small that branch may be! Hence the numerous solutions for HA! A variety of solutions are provided by a large number of vendors, and Oracle Corp is no exception. Oracle Corp provides a range of solutions under its Maximum Availability Architecture (MAA) stack. Notable among them is Oracle Real Application Clusters (RAC), which takes care of node and instance crashes.

But the issue is that Oracle RAC is a solution which is not affordable for quite a few shops. If we were to consider the cost of RAC in the light of its simplicity and other benefits compared to other technologies, it would be a worthwhile investment. Still, it is always good to have more options which can provide similar benefits, if not completely then as closely as possible, and even better if at a reduced cost! This is what Oracle Corp brought out with the 11.2 release (11.2.0.1) and called Oracle RAC One Node!

But is it “really” a new feature?

Well, yes it is, sort of! Though Oracle RAC One Node is a new offering from 11.2 onwards, similar functionality was available in previous releases of the Oracle database as well. This was known as Cold Cluster Failover, a topology in which the crash of a single instance database could be made to fail over to another node using the cluster services. The technique was applicable to databases of the 10.2 and 11.1 releases: the availability of the instance and the node was constantly monitored, and if the node or the instance running on it crashed, the connections could be failed over to another node. Needless to say, the nodes were supposed to use shared storage. Also, it was mandatory to use an Application VIP to make sure the client stayed unaware of the existence of such a configuration. This provided an efficient method to ensure that any kind of crash would result in an automatic failover to the second node, and any recovery needed for the failed resources would also be done by Oracle's Clusterware software. In addition, for a planned downtime, for any kind of housekeeping or for workloads which couldn't be catered for on the source node, the database could be manually relocated using the command CRS_RELOCATE (deprecated in 11.2).
Though the solution appears almost fool-proof, its implementation was the real challenge. An intensive, manually driven process was required to achieve this HA of a single instance. Various steps and commands were needed to get the job done, and that's probably why, though this technique is a published feature of Oracle's Clusterware, it was seldom seen implemented at any site. RAC One Node solved many of these challenges and made the whole process far simpler and more transparent.
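For reference, a minimal sketch of how such a manual relocation looked with the pre-11.2 CRS commands (the resource name ora.mydb.db is hypothetical, following the usual naming convention):
[oracle@host01 ~]$ crs_stat -t
[oracle@host01 ~]$ crs_relocate ora.mydb.db -c host02
Here -c names the cluster member to which the resource should be relocated, and this was just the relocation part; all the monitoring and VIP plumbing still had to be scripted around it.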

So, are there any significant features of RAC One Node?

As discussed, the functionality was available in previous versions as well, but it was quite challenging to implement. RAC One Node, introduced for the first time in 11.2 (and constantly enhanced and made more feature-rich since then), offers the same functionality but more effectively and simply. Some of the features (not a complete list) which make it very effective are:
  1. Online migration of the sessions from active node to the other
  2. Easy conversion from RAC One Node to complete RAC and vice-versa
  3. Integration with features like Instance Caging to provide better resource management (see the sketch after this list)
  4. Supported over Exadata
  5. Supported over Oracle VM (OVM)
  6. Support for rolling patching, with the same interface as full RAC
  7. Easy creation of a RAC One Node database using DBCA (from 11.2.0.2)
  8. Supported on all those platforms where Oracle RAC is supported
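To illustrate item 3 above, here is a minimal sketch of enabling Instance Caging on the active instance (the CPU count and the plan name are illustrative; Instance Caging takes effect only while a Resource Manager plan is active):
[oracle@host01 ~]$ sqlplus / as sysdba
SQL> alter system set cpu_count = 2;        -- cap this instance at 2 CPUs
SQL> alter system set resource_manager_plan = 'default_plan';
This way, if two instances ever share a node for a short while (as happens during an online relocation), neither can starve the other of CPU.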
Oracle RAC One Node is based on the same model as RAC, but with one major difference. Full RAC works as an Active-Active solution: all the nodes in the cluster are active, can accept connections and workloads, and work together as one single unit. RAC One Node, as the name suggests, works as an Active-Passive solution: only one node is active at any time, with the other nodes available and ready to accept the workload in the case of a planned or an unplanned downtime on the first node.

How does RAC One Node handle Planned Downtime?

For a planned migration, RAC One Node works perfectly, because it lets the users continue with their work without impacting the business. When the feature was first introduced in 11.2.0.1, a utility called Omotion was needed to move the instance from the source node to the target node. From 11.2.0.2 onwards this is no longer required: the clusterware software itself takes care of everything, including migrating the instance from one node to another and moving the sessions along with it.
Screenshot: three nodes belonging to one cluster, each connected to  a common, shared storage
In the above diagram, we have three nodes belonging to one cluster, each connected to a common, shared storage. We also have a live user session connected to the first node.
When a planned relocation is initiated, the clusterware starts shifting the session(s) connected to the primary node over to the target node once their current work on it is finished. In the meantime, another instance is brought up on the second node, so for a small time window there is an Oracle instance running on both nodes.
Screenshot: If a planned failover occurs, the clusterware would detect it and would start shifting the sessions
Finally, only one node (in this example, node 2) is active: all the sessions connected to the first node have been migrated to node 2, and the instance on node 1 has been shut down.
Screenshot: Only one node would be active and all the sessions connected over the first node would have been migrated to the node 2
Because RAC One Node offers an online migration of the instance from the source node to the target node, the mechanism provides a transparent solution for circumstances where the source node does not have enough resources to cater for the incoming workload, or where it needs to be taken down for maintenance. Since Oracle 11.2 RAC uses the concept of SCAN (Single Client Access Name), if the client is configured to use the SCAN name to discover the cluster, such a migration is completely transparent to it. The only thing that may take a long time is the actual migration from the source node to the target node, and even that is customizable.
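For example, a client-side tnsnames.ora entry using the SCAN could look like the sketch below (the SCAN name rac-scan.example.com is hypothetical; srvc1 is the service we shall see configured for our demo database later):
ARISTONE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = srvc1)
    )
  )
Since the client resolves only the SCAN name, it neither knows nor cares which physical node the service is currently running on.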
Though RAC One Node supports online migration with minimal impact on the business, it's important to make clear that this possibility of running two instances on two different nodes is only temporary, and only for planned downtime. The first instance is kept alive alongside the second instance only until the migration of the sessions is completed. Once the migration of the instance to the other node is complete (along with the sessions connected to the first node), the first instance is shut down and the mode is once again Active-Passive, with only one instance up. This also makes complete sense: if the option to have two instances running on two nodes were always there, RAC One Node wouldn't be any different from normal RAC, would it?
Let's try to do an online relocation ourselves and see what happens to the existing instance and on the target. First, the configuration of our demo database:
[oracle@host01 ~]$ srvctl config database -d aristone
Database unique name: aristone
Database name: aristone
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +FRA/aristone/spfilearistone.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: aristone
Database instances:
Disk Groups: FRA
Mount point paths:
Services: srvc1
Type: RAC One Node
Online relocation timeout: 30
Instance name prefix: aristone
Candidate servers: host01,host02,host03
Database is administrator managed
Before we start the migration, let's check the status of the database and its instance right now.
[oracle@host01 ~]$ srvctl status database -d aristone
 Instance aristone_1 is running on node host01
 Online relocation: INACTIVE
So what we have here is a RAC One Node database named ARISTONE, running on the node HOST01 with the instance aristone_1. We shall now try to relocate the instance from this node to the target node HOST02. The output also shows that no online relocation is active at the moment.
It's worth repeating that in version 11.2.0.1 this task was done by the Omotion utility, but from 11.2.0.2 onwards the utility is not required. The release used for this demo was 11.2.0.3, so obviously the utility wasn't needed.
The relocation is done using the command SRVCTL RELOCATE DATABASE, to which we pass the name of the target node, the relocation timeout in minutes (-w) and the verbose option (-v). Below is the output of this command:
[oracle@host01 ~]$ srvctl relocate database -d aristone -n host02 -w 30 -v
Configuration updated to two instances
Instance aristone_2 started
Services relocated
Waiting for up to 30 minutes for instance aristone_1 to stop ...
Instance aristone_1 stopped
Configuration updated to one instance
And from another session, we can see the migration in progress.
[oracle@host01 ~]$ srvctl status database -d aristone
Instance aristone_2 is running on node host02
Online relocation: ACTIVE
Source instance: aristone_1 on host01
Destination instance: aristone_2 on host02
We can see that the second instance has come up, and the relocation status is shown as ACTIVE, which means the relocation is still in progress. We may need to run the status command a couple of times, as it can take a while for the source instance to stop.
[oracle@host01 ~]$ srvctl status database -d aristone
Instance aristone_2 is running on node host02
Online relocation: ACTIVE
Source instance: aristone_1 on host01
Destination instance: aristone_2 on host02

[oracle@host01 ~]$ srvctl status database -d aristone
Instance aristone_2 is running on node host02
Online relocation: ACTIVE
Source instance: aristone_1 on host01
Destination instance: aristone_2 on host02

Finally, when the relocation is over, this is the output:
[oracle@host01 ~]$ srvctl status database -d aristone
Instance aristone_2 is running on node host02
Online relocation: INACTIVE
As we can see, once the relocation is complete, only the instance aristone_2 is running and the online relocation status goes back to INACTIVE.
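Relocating back is just another run of the same command; for example, to move the database back to HOST01:
[oracle@host01 ~]$ srvctl relocate database -d aristone -n host01 -w 30 -v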

What about Unplanned Disasters?

If it's an unplanned shutdown, the instance is still brought up on the second node, but in this case the online migration of the sessions does not take place.
When the instance on the initial node comes down for an unplanned reason, there is a real downtime, so the smooth migration of the user sessions from the source node to the target, as in the case of planned downtime, isn't possible. But since RAC One Node essentially runs over Grid Infrastructure (or, to use slightly older terminology, over the clusterware software underneath), the failure is detected without a DBA's intervention. As with the normal working of the clusterware, the first attempt is to restart the failed instance on the very same node where it was running initially. If for some reason the instance can't be started on the same node or, worse, the node itself has crashed, the instance is automatically migrated to the target node.
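After such an unplanned failover, the same SRVCTL STATUS command used earlier tells us where the instance has been restarted. Alternatively, the clusterware resource can be queried directly; a sketch (the resource name follows the usual ora.<dbname>.db convention):
[oracle@host01 ~]$ srvctl status database -d aristone
[oracle@host01 ~]$ crsctl status resource ora.aristone.db -t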

So How Do I Create a RAC One Node Database?

With 11.2.0.2, the option to create a RAC One Node database was added to DBCA itself. In the prior release (11.2.0.1) this wasn't possible unless you applied patch #9004119 on your installation. From 11.2.0.2 onwards, the option is built into DBCA whenever it finds itself running over a clustered environment.
Screenshot: creating a RAC One Node database
As shown, the second option creates a RAC One Node database. The process is not much different from creating a normal RAC database, as we shall see in the screenshots below.
After selecting the option to create a database, the next step is to choose the right template for the database creation.
Screenshot: choose the right template for the database creation
Screenshot: Oracle RAC create database template
The next step is to enter the database name and SID, and to define the service that will be used by this database. You can also choose whether you want the database to be Admin-Managed or Policy-Managed.
Screenshot: Oracle RAC one node identification
For our example, we have chosen both the database name and SID to be ARISTONE, and it's an Admin-Managed database. Being one, a list of nodes is shown, from which we select the candidate nodes to which the instance can be failed over in the event of a crash; with an Admin-Managed database, failover cannot happen to a node that is not selected in this step of the wizard. If you create a Policy-Managed database instead, the node list is not shown; you pick a server pool, and you must ensure that there is a node available in that pool so that the failover can happen onto it.
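For the Policy-Managed case, the server pool is created beforehand with SRVCTL; a minimal sketch (the pool name onepool and the node list are illustrative):
[oracle@host01 ~]$ srvctl add srvpool -g onepool -l 1 -u 2 -n host01,host02
Here -l and -u are the minimum and maximum number of servers the pool can hold; with more than one candidate server in the pool, a node is available for the failover to happen onto.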
The next step is to choose whether to configure Database Control, and to set the passwords for the system accounts.
Screenshot: Oracle RAC One Node management options
Screenshot: Oracle  RAC One Node database credentials
The next option is the storage location, which can be either ASM or a file system. If you choose ASM, a pop-up asks you for the password of the ASMSNMP user as well. We have chosen ASM for the storage.
Screenshot: Oracle RAC One Node database file locations
Next is whether we want to use a Fast Recovery Area (FRA) or not.
Screenshot: Oracle RAC One Node database recovery
Next comes the option to run any custom scripts, and to choose the memory parameters and other database options.
Screenshot: Oracle RAC One Node database content
Finally, on the Create Database and Summary screen, the database creation is started.
Screenshot: Oracle RAC One Node database creation
We can see the progress of the database creation in the progress window.
Screenshot: Oracle RAC One Node database creation progress
Finally, we have the database created, with information about it shown on the last page.
Screenshot: Oracle RAC One Node database creation finished
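By the way, from 11.2.0.2 the same creation can also be scripted through DBCA in silent mode. Below is a minimal sketch, assuming the silent-mode options -RACOneNode and -RACOneNodeServiceName are available in your release (the template, passwords, disk group names and node list here are all illustrative):
[oracle@host01 ~]$ dbca -silent -createDatabase -templateName General_Purpose.dbc \
      -gdbName aristone -sid aristone \
      -sysPassword oracle_123 -systemPassword oracle_123 \
      -storageType ASM -diskGroupName DATA -recoveryGroupName FRA \
      -nodelist host01,host02,host03 \
      -RACOneNode -RACOneNodeServiceName srvc1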
So finally, we have our database ARISTONE created as a RAC One Node database. Since it is essentially a RAC database, though working with one node only, we can confirm its type from the properties shown on the CLI using SRVCTL CONFIG:
[oracle@host01 ~]$ srvctl config database -d aristone
Database unique name: aristone
Database name: aristone
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +FRA/aristone/spfilearistone.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: aristone
Database instances:
Disk Groups: FRA
Mount point paths:
Services: srvc1
Type: RAC One Node
Online relocation timeout: 30
Instance name prefix: aristone
Candidate servers: host01,host02,host03
Database is administrator managed
It is quite evident that most of the properties of a RAC One Node database are similar to those of a normal RAC database, but there are a few differences as well, specific to RAC One Node. These are as follows:
  1. Database Type: This shows which type of database it is. For RAC One Node, the output is RAC One Node.
  2. Online Relocation Timeout: This is the time given to the sessions to complete their transactions and switch over to the target node without any issue. If a transaction fails to complete within this period, it is aborted and the session is switched over to the target node. The unit of this parameter is minutes, the default value is 30, and the maximum allowed value is 12 hours (720 minutes). It can be changed later, as shown in the sketch after this list.
  3. Candidate Servers: This is the list of those nodes where the relocation can happen.
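As mentioned in item 2 above, the online relocation timeout can be changed with SRVCTL MODIFY DATABASE; for example, to lower it to 15 minutes for our demo database:
[oracle@host01 ~]$ srvctl modify database -d aristone -w 15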

I have a single instance, non-RAC db. Can I convert it to a RAC One Node database?

To convert a single instance database into a RAC One Node database, one can take the aid of DBCA. All the tasks required for a RAC One Node database to function, like creating the redo threads, undo tablespaces etc., are done automatically by DBCA, which makes the transition much easier. The only prerequisite is that the underlying setup must support all the mandatory properties and conditions required for a clustered database to run: shared storage must be available on which the files are placed, and all the software needed to run the RAC services must be installed.
The conversion is done with the help of a template created from the existing single instance database. Using this template, a new RAC One Node database can be created.
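A minimal sketch of creating such a template from an existing single instance database through DBCA in silent mode (the connect string, template name and credentials are illustrative):
[oracle@host01 ~]$ dbca -silent -createTemplateFromDB \
      -sourceDB host01:1521:orcl \
      -templateName orcl_one_node.dbt \
      -sysDBAUserName sys -sysDBAPassword oracle_123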

And what if I have a RAC One Node database and I want to convert it to a complete RAC database?

Yes, you can do it and very easily!
If you have already got a RAC One Node database and you want to convert it to a complete RAC database, you can do it using the command SRVCTL CONVERT. Since our database ARISTONE is a RAC One Node database, we shall convert it to a complete RAC database running over three hosts. So let's shut it down and issue the CONVERT command.
[oracle@host01 ~]$ srvctl stop database -d aristone
[oracle@host01 ~]$ srvctl convert database -d aristone -c RAC
Now, since we already have instance 2 registered on one host, we shall add instances for the remaining two hosts.
[oracle@host01 ~]$ srvctl add instance -d aristone -i aristone_3 -n host03
[oracle@host01 ~]$ srvctl add instance -d aristone -i aristone_1 -n host01
Now, let's start the newly converted database and check its status.
[oracle@host01 ~]$ srvctl start database -d aristone
[oracle@host01 ~]$ srvctl status database -d aristone
Instance aristone_1 is running on node host01
Instance aristone_2 is running on node host02
Instance aristone_3 is running on node host03
So, as expected, the database is up and running with 3 instances on all 3 nodes and is successfully converted. Let's confirm that it is now a full RAC database by looking at its properties with the SRVCTL CONFIG command.
[oracle@host01 ~]$ srvctl config database -d aristone
Database unique name: aristone
Database name: aristone
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +FRA/aristone/spfilearistone.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: aristone
Database instances: aristone_2,aristone_3
Disk Groups: FRA
Mount point paths:
Services:
Type: RAC
Database is administrator managed
Since it's a full RAC database now, the TYPE is shown as RAC, and parameters like the online relocation timeout, which were previously visible for the RAC One Node database, are no longer shown.
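And to close the loop: as listed among the features earlier, the conversion works the other way around too. A minimal sketch of converting this database back to RAC One Node (the instance name prefix and the timeout are illustrative; as far as I know, the database should be running only a single instance at the time of this conversion):
[oracle@host01 ~]$ srvctl convert database -d aristone -c RACONENODE -i aristone -w 30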
