Saturday 20 February 2016

Rolling Forward a Physical Standby Database Using the RECOVER Command . 12C New Features

A standby database is a transactionally-consistent copy of the production database. It enables production Oracle database to survive disasters and data corruption. If the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch a standby database to the production role, minimizing the downtime associated with the outage. Moreover, performance of production database can be improved by offloading resource-intensive backup and reporting operations to standby systems. As you can see, it’s always desirable to have standby database synchronized with the primary database.
A standby database might lag behind the primary for various reasons like:
  • Unavailability of or insufficient  network bandwidth between primary and standby database
  • Unavailability of Standby database
  • Corruption / Accidental deletion of  Archive Redo Data on primary
If standby database lags behind the primary database:
  • Switchover will take more time.
  • Failover will result in data loss.
  • An attempt to issue Real Time Query against the standby database will result in error
    ORA-03172 : STANDBY_MAX_DATA_DELAY of n seconds exceeded.
Synchronizing the standby and primary databases can be done by copying and applying the archived logs from the primary database but this process is quite time consuming as it will first apply both the COMMITED and the NON COMMITED transactions followed by rolling back uncommitted transactions. Employing incremental backups of the primary database containing changes since the standby database was last refreshed is a faster alternative which will recover the standby database much faster as it will apply only the COMMITED transactions on the standby database. Moreover, incremental backups are also useful in cases when there are missing archived logs on Primary which have not been applied to the standby database
Prior to 12c, in order to roll forward the standby database using incremental backups you would need to:
  • Create a control file for the standby database on the primary database.
  • Take an incremental backup on the primary starting from the SCN# of the standby database.
  • Copy the incremental backup to the standby host and catalog it with RMAN.
  • Mount the standby database with newly created standby control file.
  • Cancel managed recovery of the standby database and apply incremental backup to the standby database.
  • Start managed recovery of standby database.
In 12c, this procedure has been dramatically simplified. Now you can use the RECOVER … FROM SERVICE command to synchronize the physical standby database with the primary database.  This command does the following:
  • Creates an incremental backup containing the changes to the primary database. All changes to data files on the primary database, beginning with the SCN in the standby data file header, are included in the incremental backup.
  • Transfers the incremental backup over the network to the physical standby database.
  • Applies the incremental backup to the physical standby database.
This results in rolling forward the standby data files to the same point-in-time as the primary. However, the standby control file still contains old SCN values which are lower than the SCN values in the standby data files. Therefore, to complete the synchronization of the physical standby database, the standby control file needs to be refreshed to update the SCN#.
Now I will demonstrate the steps to refresh the physical standby database with changes made to the primary database using the RECOVER…FROM SERVICE command.
Overview:
  • View current configuration and verify that primary database, far sync for primary and standby database are in sync
  • Simulate loss of archived logs on the primary database
  • Refresh the physical standby using the RECOVER…FROM SERVICE command

Setting up the example

View Current configuration

DGMGRL> show configuration;
Configuration - drsolution

Protection Mode: MaxPerformance
Databases:
   boston   - Primary database
   bostonFS - Far Sync
   london   - Physical standby database
   london2  - Logical standby database (disabled)
   londonfs - Far Sync (inactive)

Fast-Start Failover: DISABLED

Configuration Status:

SUCCESS
It can be verified that primary database (boston), far sync for primary (bostonFS) and physical standby (london) are in sync.
-- Primary (boston) --

BOSTON>select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)

--------------

        190

-- Far Sync for primary (bostonFS) --

BOSTONFS>select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)

--------------

        190

-- Physical standby (london) --

LONDON>select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)

--------------

         190

Simulate loss of archived logs on primary database (boston)

Let’s stop redo transport from primary (boston) and switch logs on primary so that far sync (bostonFS ) and physical standby (london) lag behind primary (boston).
DGMGRL> show database boston
Database - boston
   Role:              PRIMARY
   Intended State:    TRANSPORT-ON
   Instance(s):
      boston
Database Status:
SUCCESS

DGMGRL> edit database boston set state='TRANSPORT-OFF';

Succeeded.

BOSTON>alter system switch logfile;
         alter system switch logfile;
         alter system switch logfile;

System altered.
We can verify that logs are not being transported to far sync bostonFS and standby london. Although sequence# of the latest archived log is 194 on primary, archived logs up to sequence# 191 only have reached far sync and physical standby.
BOSTON>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
           194

BOSTONFS>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
           191

LONDON>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
           191
Next we’ll find out the names of archived logs generated on primary which have not been transported to standby:
BOSTON>select sequence#, name from v$archived_log where sequence# >191;

SEQUENCE# NAME

---------- ----------------------------------------------------------------

192 /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_192_bfjgx50j_.arc
193 /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_193_bfjgx6tb_.arc
194 /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_194_bfjgx6yh_.arc
In a real-time environment, archived logs on primary could be lost due to:
  • their deletion if archivelog deletion policy has not configured  properly
  • their corruption
In our experimental setup, I’ll  simulate loss of archived logs on primary by renaming them:
BOSTON>ho mv /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_192_bfjgx50j_.arc
/home/oracle/arch_192.log
BOSTON>ho mv /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_193_bfjgx6tb_.arc
/home/oracle/arch_193.log
BOSTON>ho mv /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_194_bfjgx6yh_.arc
/home/oracle/arch_194.log
Now even if we restart redo transport from primary, gap in redo logs on far_sync / standby cannot be resolved as some logs are missing on primary.
DGMGRL>  edit database boston set state='Transport-on';
DGMGRL> show configuration;
Configuration - drsolution
   Protection Mode: MaxPerformance
   Databases:
   boston   - Primary database
      Error: ORA-16724: cannot resolve gap for one or more standby databases
      bostonfs - Far Sync
        london   - Physical standby database
        london2  - Logical standby database (disabled)
      londonfs - Far Sync (inactive
Fast-Start Failover: DISABLED
Configuration Status:
ERROR

DGMGRL> show far_sync bostonFS
Far Sync - bostonfs
   Transport Lag:     3 minutes 43 seconds (computed 1 second ago)
   Instance(s):
      bostonFS
Far Sync Status:
SUCCESS

DGMGRL> show database london
Database - london
   Role:              PHYSICAL STANDBY
   Intended State:    APPLY-ON
   Transport Lag:     3 minutes 53 seconds (computed 0 seconds ago)
   Apply Lag:         3 minutes 53 seconds (computed 0 seconds ago)
   Apply Rate:        759.00 KByte/s
   Real Time Query:   OFF
   Instance(s):
      london
Database Status:
SUCCESS
It can be verified that SCN# (3717618) of standby (london) is lagging behind that (3718999) of the primary (boston)
BOSTON>select current_scn from v$database;
CURRENT_SCN
          -----------
            3718999

LONDON>select current_scn from v$database;
CURRENT_SCN
          ----------
           3717618

Refresh the physical standby using the RECOVER…FROM SERVICE command

First of all let us identify the datafiles on standby database which are out of sync with respect to primary.
On  checking  checkpoint_change# in datafile headers on primary (boston) and standby (london), we note that whereas checkpoint_change# of datafiles 5,7,8,9,10,11 match on primary and standby, for rest of the  datafiles (1,3,4,6) standby is lagging behind  primary.
BOSTON>select file#, checkpoint_change# from v$datafile;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1            3717658
         3            3717658
         4            3717658
         5            1910129
         6            3717658
         7            1910129
         8            3690127
         9            3690127
         10           3690127
         11           3690127

10 rows selected.

LONDON>select file#, checkpoint_change# from v$datafile;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1            3717619
         3            3717619
         4            3717619
         5            1910129
         6            3717619
         7            1910129
         8            3690127
         9            3690127
        10            3690127
        11            3690127

10 rows selected.
In order to synchronize the standby we will stop the managed recovery processes on the physical standby database and place the physical standby database in MOUNT mode.
LONDON>recover managed standby database cancel;

            shutdown immediate;

            startup mount;
Start RMAN and connect as target to the physical standby database. Refresh the data files on the physical standby database by using an incremental backup of the data files on the primary database.
The following command creates a compressed multi-section incremental backup on the primary database that is then used to refresh the standby data files. boston is the net service name of the primary database that is used to refresh the standby database. The NOREDO clause specifies that the archived redo log files must not be applied during recovery.
[oracle@host03 ~]$ . oraenv

ORACLE_SID = [london] ?

[oracle@host03 ~]$ rman target /

RMAN> recover database from service boston noredo using compressed backupset section size 100m;

Starting recover at 09-FEB-15

Starting implicit crosscheck backup at 09-FEB-15

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=41 device type=DISK
Crosschecked 9 objects
Finished implicit crosscheck backup at 09-FEB-15

Starting implicit crosscheck copy at 09-FEB-15
using channel ORA_DISK_1
Crosschecked 2 objects
Finished implicit crosscheck copy at 09-FEB-15
— Catalogues all the existing backups and archivelogs available on standby —
searching for all files in the recovery area

cataloging files...

cataloging done

List of Cataloged Files

=======================
— Backups —
File Name: /u01/app/oracle/fast_recovery_area/LONDON/autobackup/2015_01_23/o1_mf_s_869760430_bd492tm3_.bkp
File Name: /u01/app/oracle/fast_recovery_area/LONDON/autobackup/2015_01_23/o1_mf_s_869760892_bd4b06ws_.bkp
...
File Name: /u01/app/oracle/fast_recovery_area/LONDON/autobackup/2015_01_22/o1_mf_s_869676552_bd1qongl_.bkp
— Archived logs —
File Name: /u01/app/oracle/fast_recovery_area/LONDON/archivelog/2015_02_06/o1_mf_1_175_bf919zvx_.arc
File Name: /u01/app/oracle/fast_recovery_area/LONDON/archivelog/2015_02_06/o1_mf_1_172_bf8pvrhn_.arc
...
...
File Name: /u01/app/oracle/fast_recovery_area/LONDON/archivelog/2015_01_21/o1_mf_1_54_bcyn59o8_.arc
File Name: /u01/app/oracle/fast_recovery_area/LONDON/archivelog/2015_01_21/o1_mf_1_60_bczt9p7o_.arc
— Skips all the datafiles (5, 7, 8, 9, 10, 11) whose checkpoint_change# on standby match that of primary are skipped for restore operation —
using channel ORA_DISK_1
skipping datafile 5; already restored to SCN 1910129
skipping datafile 7; already restored to SCN 1910129
skipping datafile 8; already restored to SCN 3690127
skipping datafile 9; already restored to SCN 3690127
skipping datafile 10; already restored to SCN 3690127
skipping datafile 11; already restored to SCN 3690127
— Restores multi-section compressed backupsets    with section size of 100m of all the data files (1, 3, 4, 6) whose checkpoint_change# on standby is behind  primary  over network. —
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service boston
destination for restore of datafile 00001: /u01/app/oracle/oradata/london/system01.dbf
channel ORA_DISK_1: restoring section 1 of 9
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service boston
destination for restore of datafile 00001: /u01/app/oracle/oradata/london/system01.dbf
channel ORA_DISK_1: restoring section 2 of 9
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service boston
destination for restore of datafile 00001: /u01/app/oracle/oradata/london/system01.dbf
...
...
channel ORA_DISK_1: restoring section 9 of 9
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service boston

destination for restore of datafile 00003: /u01/app/oracle/oradata/london/sysaux01.dbf
channel ORA_DISK_1: restoring section 1 of 10
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service boston
destination for restore of datafile 00003: /u01/app/oracle/oradata/london/sysaux01.dbf
...
...
channel ORA_DISK_1: restoring section 10 of 10
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service boston

destination for restore of datafile 00004: /u01/app/oracle/oradata/london/undotbs01.dbf
channel ORA_DISK_1: restoring section 1 of 3
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service boston

destination for restore of datafile 00004: /u01/app/oracle/oradata/london/undotbs01.dbf
...
...
channel ORA_DISK_1: restoring section 3 of 3
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using compressed network backup set from service boston

destination for restore of datafile 00006: /u01/app/oracle/oradata/london/users01.dbf
channel ORA_DISK_1: restoring section 1 of 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01

Finished recover at 09-FEB-15
As soon as standby is recovered, redo transport to far sync (bostonFS) and standby (london) is resumed.
BOSTON>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
          197

BOSTONFS>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
          197

LONDON>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
          197
The RECOVER … FROM SERVICE command refreshes the standby data files and rolls them forward to the same point-in-time as the primary. However, the standby control file still contains old SCN values which are lower than the SCN values in the standby data files. Therefore, to complete the synchronization of the physical standby database, we must refresh the standby control file to update the SCN#.
Hence place the physical standby database in NOMOUNT mode and restore controlfile to standby by using the control file on the primary database.
LONDON>SHUTDOWN IMMEDIATE;
STARTUP NOMOUNT;

RMAN> RESTORE STANDBY CONTROLFILE FROM SERVICE boston;

Starting restore at 09-FEB-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=22 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service boston
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
output file name=/u01/app/oracle/oradata/london/control01.ctl
output file name=/u01/app/oracle/fast_recovery_area/london/control02.ctl
Finished restore at 09-FEB-15
Mount the standby database.
LONDON>ALTER DATABASE MOUNT;
Now the standby database has been rolled forward successfully as can be verified using DGMGRL.
DGMGRL> show configuration;
Configuration - drsolution

   Protection Mode: MaxPerformance
   Databases:
   boston   - Primary database
     bostonfs - Far Sync
       london   - Physical standby database
       london2  - Logical standby database (disabled)
     londonfs - Far Sync (inactive)

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS
Note: If network resources are a constraint, then you can still use the Pre-12c BACKUP INCREMENTAL command to create incremental backups on the primary database, and then use the incremental backups to roll forward the physical standby database.

Conclusion:

In case standby database lags behind the primary, it can be rolled forward using incremental backups from primary database. It is a faster alternative to copying archived logs from primary and applying them on standby database. When an archive log gap is unrecoverable  this procedure is a much faster alternative to  rebuilding the standby database. This  used to be a long manual process till 11g. In 12c, datafiles from primary database can be restored over the network as backupset and they can be optionally compressed and encrypted as well.

No comments:

Post a Comment