Oracle VM 3.1 Disaster Recovery (HA)

In this scenario we will look at how to implement HA (High Availability) for OVM 3.1 between a Primary Site (Production) and a Secondary Site (DR) located 800 km away. This post only covers the storage replication part and how to make it work when a failover is required. I will not cover the database replication side (using Data Guard).

Please note that the Oracle VM Manager should be installed on both sites with the same UUID, using the "./runInstaller.sh --uuid <uuid>" command. The UUID of the Oracle VM Manager can be found in the manager's configuration file.

$ grep -i uuid /u01/app/oracle/ovm-manager-3/.config
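
To confirm that both managers really share the UUID, the value can be pulled out of each site's .config file and compared. This is only a minimal sketch: it assumes the file stores the value on a line of the form UUID=<value>, and the function names ovm_uuid and same_ovm_uuid are hypothetical helpers, not part of Oracle VM.

```shell
#!/bin/sh
# Hypothetical helpers; assumes .config holds a line like UUID=<value>.

# Print the UUID stored in the given OVM Manager .config file.
ovm_uuid() {
    sed -n 's/^UUID=//p' "$1"
}

# Succeed only if both config files carry the same, non-empty UUID.
same_ovm_uuid() {
    a=$(ovm_uuid "$1")
    b=$(ovm_uuid "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, after copying the DR manager's .config to /tmp/dr.config on the production manager, "same_ovm_uuid /u01/app/oracle/ovm-manager-3/.config /tmp/dr.config && echo match" would print "match" only when the two values agree.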

The Preparation

1) In the production OVM Manager, identify the source LUN that will be replicated to the DR site with HUR (Hitachi Universal Replicator). In this example it is 360060e8005449c000000449c00004606.

2) Present the LUN to the DR site.

Specify both the HUR source LUN (360060e8005449c000000449c00004606) and the HUR target LUN (the SAN engineers will give you the target LUN ID when they present the LUN to the DR site).

Then ask the SAN engineers to activate the HUR process.
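
Once the SAN engineers confirm the presentation, it is worth checking from a DR OVM server that the target LUN is really visible. The sketch below assumes that multipath devices show up under /dev/mapper named by their WWID (no user-friendly names). lun_present is a hypothetical helper, and its second argument exists only so that the device directory can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a device named after the WWID exists.
# $1 = LUN WWID, $2 = device directory (assumed default: /dev/mapper)
lun_present() {
    wwid=$1
    dir=${2:-/dev/mapper}
    [ -n "$wwid" ] && [ -e "$dir/$wwid" ]
}
```

Called as "lun_present <target LUN ID> || echo 'LUN not visible yet'", it gives a quick yes/no before moving on. If the LUN does not show up, a storage rescan or another word with the SAN team is probably needed.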

The Failover

In the event of a primary site failure, DR procedures will be kicked off as shown below.

1) With the production site down, the storage replication between the production site and the DR site has to be stopped.

2) Since these are block devices, there will be an OCFS2 filesystem on them. You will not be able to present the replicated repositories to the OVM servers at the DR site as they are, because of the cluster ID stamped into the repositories at the production site. This cluster ID causes the mount of the OCFS2 filesystems at the DR site to fail.
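
To see which cluster a replicated device is stamped with, the output of "mounted.ocfs2 -d" can be filtered for that device. This is a sketch under assumptions: it relies on the ocfs2-tools 1.8 column order (Device, Stack, Cluster, F, UUID, Label), which may differ on other versions, and ocfs2_cluster_of is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reads `mounted.ocfs2 -d` output on stdin and prints
# the cluster name stamped on the given device.
# Assumes the ocfs2-tools 1.8 column order: Device Stack Cluster F UUID Label.
ocfs2_cluster_of() {
    awk -v dev="$1" '$1 == dev { print $3 }'
}
```

Running "mounted.ocfs2 -d | ocfs2_cluster_of /dev/mapper/<LUN ID>" on a DR server and comparing the result with the cluster name that server uses (on o2cb-based setups it is usually recorded in /etc/ocfs2/cluster.conf) shows the mismatch that blocks the mount.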