Saturday, 20 February 2016

Building A Single Instance Physical Standby For A RAC Primary… Or… Active Data Guard

Today we are going to embark on the adventure of building an Active Data Guard database (the “standby”) with a two-node RAC as the source (the “primary”).



First a situation sketch to show what we are ending up with:
[Schematic: ADG_RAC_Schematics]
We assume:
* There is an existing RAC database (RACDB) on two nodes: racnode1 and racnode2.
* We are using ASM/OMF, and the Oracle software and ASM instance on the standby host are already installed.
* The standby server has no existing database.
* The standby is standalone (no RAC).
* The Oracle software version is 11gR2.
* The Primary database is running in ARCHIVELOG mode.
* RAC database name: PrimRac
* RAC SCAN address: primrac-scan
* RAC node1 instance name: PrimRac01
* RAC node2 instance name: PrimRac02
* RAC Unique Name: RACScan
* Standby database name: dg_stby
[Primary node1, node2, standby] Step 1: Configure Oracle Net Services
dg_stby =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = stby_server)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = dg_stby)
      (UR=A)
    )
  )

PrimRac =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = primrac-scan)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = PrimRac)
    )
  )

primrac01 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = primrac01-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = primrac)
      (instance_name=primrac01)
    )
  )

primrac02 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = primrac02-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = primrac)
      (instance_name=primrac02)
    )
  )
Add these entries to the tnsnames.ora on all the hosts involved so they are “known” to each other. Note the separate entries for the individual RAC nodes (via their VIPs)!
Alter the listener.ora on the STANDBY node and add a static listener entry:
SID_LIST_LISTENER=
   (SID_LIST=
        (SID_DESC=
          (GLOBAL_DBNAME=dg_stby)
          (SID_NAME=dg_stby)
          (ORACLE_HOME=/oracle/base/db/dbhome1/)
        )
      )
Restart the listener to have the new changes take effect.
$ srvctl stop listener
$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is not running

$ srvctl start listener
$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 10-MAR-2014 12:21:13

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                10-MAR-2014 12:21:10
Uptime                    0 days 0 hr. 0 min. 2 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/grid/network/admin/listener.ora
Listener Log File         /oracle/base/diag/tnslsnr/XXXX/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xx.xx.xx.xx)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xx.xx.xx.xx)(PORT=1521)))
Services Summary...
Service "dg_stby" has 1 instance(s).
  Instance "dg_stby", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
The status “UNKNOWN” is exactly what we are looking for; this is the static entry doing its business.
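As a quick sanity check (optional; the aliases are the ones defined in the tnsnames.ora above), verify that the new entries resolve from every host involved:
$ tnsping dg_stby
$ tnsping primrac
$ tnsping primrac01
$ tnsping primrac02
Each of these should end with an OK line; a name-resolution error means the alias is missing from that host's tnsnames.ora.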
[Primary node1, node2, standby] Step 2: OraPWD
This is a VERY important step: in 11g the rules regarding the password file have changed, and simply recreating it on every node no longer works as it used to! It is VERY important to COPY one of the files over to ALL other nodes (renaming is allowed, even necessary!) to make this work.
We choose the primary node1 orapwd file as the source. Make sure you know the sys password.
$ cd $ORACLE_HOME/dbs
$ orapwd file=orapwprimrac01 password=oracle
In my case these files were already created, and I know the sys password, so I just copied them over to the other instances with scp.
Once again: copy the file from one node to the other node and to the standby, renaming it on each host to reflect the local instance name.
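A minimal sketch of that copy, assuming identical ORACLE_HOME paths on all hosts and the host names used in this post (adjust the target file names to your actual instance names):
$ cd $ORACLE_HOME/dbs
$ scp orapwprimrac01 racnode2:$ORACLE_HOME/dbs/orapwprimrac02
$ scp orapwprimrac01 stby_server:$ORACLE_HOME/dbs/orapwdg_stby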
[Primary] Step 3: Create standby control file
Next we need to create a standby control file. Do not create it in a location that gets emptied automatically (like /tmp), as it will be included in the daily RMAN backup.
As user SYS:
SQL> ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/oracle/base/db/dbhome1/dbs/standbyctlprd.ctl';

Database altered.
SQL>
Copy this file to a known location (like $ORACLE_HOME/dbs) on the standby server.
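For example (host name and paths as used elsewhere in this post):
$ scp /oracle/base/db/dbhome1/dbs/standbyctlprd.ctl stby_server:/oracle/base/db/dbHome1/dbs/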
[Standby] Step 4: Create standby instance pfile
Copy the pfile from the primary database and alter it to create a similar pfile on the standby node:
*.db_cache_size=469762048
*.java_pool_size=33554432
*.large_pool_size=33554432
*.pga_aggregate_target=939524096
*.sga_target=7449083904
*.shared_pool_size=6710886400
*.streams_pool_size=100663296
*.audit_file_dest='/oracle/base/admin/dg_stby/adump'
*.audit_trail='db'
*.cluster_database=false
*.compatible='11.2.0.0.0'
*.control_files='/oracle/base/db/dbHome1/dbs/standbyctlfile.ctl'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_name='PrimRac'
*.db_unique_name='dg_stby'
*.db_recovery_file_dest='+FRA'
*.db_recovery_file_dest_size=858412548096
*.diagnostic_dest='/oracle/base'
*.fal_server='primrac01','primrac02'
*.log_archive_config='DG_CONFIG=(primrac,dg_stby)'
*.log_archive_dest_2='SERVICE=primrac01 NOAFFIRM ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=primrac'
*.log_archive_dest_state_2='ENABLE'
*.log_archive_max_processes=30
*.memory_max_target=17179869184
*.memory_target=8388608000
*.open_cursors=3500
*.optimizer_capture_sql_plan_baselines=FALSE
*.optimizer_use_sql_plan_baselines=FALSE
*.parallel_force_local=TRUE
*.plsql_code_type='NATIVE'
*.processes=680
*.recyclebin='OFF'
*.remote_login_passwordfile='exclusive'
*.standby_file_management='AUTO'
*.statistics_level='ALL'
*.streams_pool_size=83886080
*.undo_retention=1800
*.undo_tablespace='UNDOTBS1'
*.thread=1
*.fal_client='dg_stby'
*.db_file_name_convert='+DATAPRD/primrac/','+DATA/DG_stby/','+FRAPRD/primrac/','+FRA/dg_stby/'
*.log_file_name_convert='+DATAPRD/primrac/','+DATA/DG_stby/','+FRAPRD/primrac/','+FRA/dg_stby/'

Make note of the following DG entries:
*.log_archive_dest_2='SERVICE=primrac01 NOAFFIRM ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=primrac'
Here we need to be careful: the SERVICE points to NODE1, and the DB_UNIQUE_NAME is that of the RAC database.
*.fal_server='primrac01','primrac02'
In a non-RAC primary, the database/instance itself would be listed here; in a RAC, however, we list ALL the RAC node aliases. Not all sites are consistent/conclusive on this. This needs to be done ONLY on the STANDBY node! Also be careful with the quotes: each name needs its own set of quotes, exactly as defined in the standby's tnsnames.ora file.
Since the ASM disk group names also differ between the two environments, we use file name conversion:
*.db_file_name_convert='+DATAPRD/primrac/','+DATA/DG_stby/','+FRAPRD/primrac/','+FRA/dg_stby/'
*.log_file_name_convert='+DATAPRD/primrac/','+DATA/DG_stby/','+FRAPRD/primrac/','+FRA/dg_stby/'
Also keep in mind NOT to remove (or include) the following entry:
*.thread=1
Of course Data Guard has to know who the client is:
*.fal_client='dg_stby'
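One housekeeping step that is easy to forget: the audit file destination referenced in the pfile has to exist on the standby host, otherwise the instance may refuse to start with an audit trail error. Assuming the audit_file_dest above:
$ mkdir -p /oracle/base/admin/dg_stby/adump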
Save the file and start the standby instance:
$ export ORACLE_SID=dg_stby
$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Mar 10 14:39:02 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.

SQL> create spfile from pfile='/oracle/base/db/dbHome1/dbs/initdg-stby.ora';

File created.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 1.7103E+10 bytes
Fixed Size                  2245480 bytes
Variable Size            1.6609E+10 bytes
Database Buffers          469762048 bytes
Redo Buffers               21708800 bytes
SQL>
[PRIMARY, node 1] Step 5: Duplicate the database.
Now we need to get the data from the primary onto the standby. We use a duplicate over the network, although this can also be done with a backup to disk on the primary that is copied to the remote site and restored there. Since I don’t have the disk space locally, I opt for the “over the wire” method.
On the primary:
$rlwrap rman target / auxiliary sys/secret@dg_stby

Recovery Manager: Release 11.2.0.3.0 - Production on Mon Mar 10 20:42:34 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: primrac (DBID=1727048999)
connected to auxiliary database: primrac (not mounted)

RMAN>  run {
2> allocate channel prmy1 type disk;
3> allocate channel prmy2 type disk;
4> allocate channel prmy3 type disk;
5> allocate channel prmy4 type disk;
6> allocate auxiliary channel stby type disk;
7> duplicate target database for standby from active database nofilenamecheck dorecover;
8> }
using target database control file instead of recovery catalog
allocated channel: prmy1
channel prmy1: SID=985 instance=primrac01 device type=DISK

allocated channel: prmy2
channel prmy2: SID=1022 instance=primrac01 device type=DISK

allocated channel: prmy3
channel prmy3: SID=179 instance=primrac01 device type=DISK

allocated channel: prmy4
channel prmy4: SID=402 instance=primrac01 device type=DISK

allocated channel: stby
channel stby: SID=794 device type=DISK

Starting Duplicate Db at 10-MAR-14

contents of Memory Script:
{
backup as copy reuse
targetfile  '/oracle/base/db/dbhome1/dbs/orapwprimrac01' auxiliary format
'/oracle/base/db/dbHome1/dbs/orapwdg_stby'   ;
}
executing Memory Script

Starting backup at 10-MAR-14
Finished backup at 10-MAR-14

contents of Memory Script:
{
backup as copy current controlfile for standby auxiliary format  '/oracle/base/db/dbHome1/dbs/standbyctlprd.ctl';
}
executing Memory Script

Starting backup at 10-MAR-14
channel prmy1: starting datafile copy
copying standby control file
output file name=+FRAPRDG/primrac/controlfile/snapcf_primrac.f tag=TAG20140310T220054 RECID=1059 STAMP=841874455
channel prmy1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 10-MAR-14

contents of Memory Script:
{
sql clone 'alter database mount standby database';
}
executing Memory Script

sql statement: alter database mount standby database
RMAN-05529: WARNING: DB_FILE_NAME_CONVERT resulted in invalid ASM names; names changed to disk group only.

contents of Memory Script:
{
set newname for tempfile  1 to
"+dataprdg";
set newname for tempfile  2 to
"+dataprdg";
set newname for tempfile  3 to
"+dataprdg";
switch clone tempfile all;
set newname for datafile  1 to
"+dataprdg";
set newname for datafile  2 to
"+dataprdg";
<SNIP>
set newname for datafile  22 to
"+dataprdg";
backup as copy reuse
datafile  1 auxiliary format
"+dataprdg"   datafile
2 auxiliary format
"+dataprdg"   datafile
3 auxiliary format
"+dataprdg"   datafile
4 auxiliary format

<SNIP>
22 auxiliary format
"+dataprdg"   ;
sql 'alter system archive log current';
}
executing Memory Script
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME

renamed tempfile 1 to +dataprdg in control file
renamed tempfile 2 to +dataprdg in control file
renamed tempfile 3 to +dataprdg in control file

executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
<SNIP>
executing command: SET NEWNAME

Starting backup at 10-MAR-14
channel prmy1: starting datafile copy
input datafile file number=00011 name=+DATAPRDG/primrac/datafile/car_indx.273.811722355
channel prmy2: starting datafile copy
input datafile file number=00002 name=+DATAPRDG/primrac/datafile/sysaux.260.804442149
channel prmy3: starting datafile copy
input datafile file number=00010 name=+DATAPRDG/primrac/datafile/car_data.272.811719989
channel prmy4: starting datafile copy
input datafile file number=00003 name=+DATAPRDG/primrac/datafile/undotbs1.261.804442151
<SNIP>
executing Memory Script
executing command: SET until clause
Starting recover at 10-MAR-14
starting media recovery
archived log for thread 1 with sequence 6698 is already on disk as file +FRAPRDG/dg_stby/archivelog/2014_03_10/thread_1_seq_6698.283.841877139
archived log for thread 1 with sequence 6699 is already on disk as file +FRAPRDG/dg_stby/archivelog/2014_03_10/thread_1_seq_6699.280.841877141
archived log for thread 1 with sequence 6700 is already on disk as file +FRAPRDG/dg_stby/archivelog/2014_03_10/thread_1_seq_6700.276.841877147
archived log for thread 2 with sequence 6001 is already on disk as file +FRAPRDG/dg_stby/archivelog/2014_03_10/thread_2_seq_6001.284.841877139
archived log for thread 2 with sequence 6002 is already on disk as file +FRAPRDG/dg_stby/archivelog/2014_03_10/thread_2_seq_6002.277.841877143
archived log file name=+FRAPRDG/dg_stby/archivelog/2014_03_10/thread_1_seq_6698.283.841877139 thread=1 sequence=6698
archived log file name=+FRAPRDG/dg_stby/archivelog/2014_03_10/thread_2_seq_6001.284.841877139 thread=2 sequence=6001
archived log file name=+FRAPRDG/dg_stby/archivelog/2014_03_10/thread_1_seq_6699.280.841877141 thread=1 sequence=6699
archived log file name=+FRAPRDG/dg_stby/archivelog/2014_03_10/thread_2_seq_6002.277.841877143 thread=2 sequence=6002
archived log file name=+FRAPRDG/dg_stby/archivelog/2014_03_10/thread_1_seq_6700.276.841877147 thread=1 sequence=6700
media recovery complete, elapsed time: 00:00:31
Finished recover at 10-MAR-14
Finished Duplicate Db at 10-MAR-14
released channel: prmy1
released channel: prmy2
released channel: prmy3
released channel: prmy4
released channel: stby
RMAN>exit
Recovery Manager complete.
$
This concludes the longest step (depending on the size and speed of network).
**** Special notice **** Special notice ****
Due to the long runtime (yes, this is a large database, and the network link to the standby server is quite loaded and stretches a long way), the “normal” backup of the primary kicked in. In itself not an issue, but this RMAN backup script deletes the archive logs once the backup completes. I kind of did this on purpose: I think it makes for a more real-world example and, most importantly, I didn’t want to change any settings on the primary side. It does, however, have serious impact on the recovery phase of the standby duplication (as in: it fails), and it gives us the chance to show how to resolve so-called “gaps” in the synchronization.
I’ll get back to this later when we run into the issue, but if at all possible, refrain from taking a backup on the primary (and deleting the archive logs) until log shipping to the standby is enabled.
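One way to avoid the problem altogether (a suggestion, not what was done here) is to tell RMAN on the primary not to delete archive logs until they have been applied on the standby:
RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
Note that this only protects archive logs deleted through RMAN; backup scripts that remove them by other means are not covered.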
[standby] Step 6: Configure Data Guard settings
Log on to the standby database and shut it down:
SQL> shutdown
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.

SQL> startup nomount;

ORACLE instance started.

Total System Global Area 1.7103E+10 bytes
Fixed Size                  2245480 bytes
Variable Size            1.6609E+10 bytes
Database Buffers          469762048 bytes
Redo Buffers               21708800 bytes
SQL>exit

Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

[oracle@bedr-odb01 ~]$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Tue Mar 11 09:29:26 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PRIMRAC (not mounted)

Now restore the standby control file to the location defined by the CONTROL_FILES parameter:
RMAN> restore controlfile from '/oracle/base/db/dbHome1/dbs/standbyctlprd.ctl';

Starting restore at 11-MAR-14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=728 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=/oracle/base/db/dbHome1/dbs/standbyctlprd.ctl
Finished restore at 11-MAR-14

RMAN> exit;
Now we restart the standby instance and mount the standby database:
$ rlwrap  sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Tue Mar 11 09:33:07 2014

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> shutdown immediate;
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 1.7103E+10 bytes
Fixed Size                  2245480 bytes
Variable Size            1.6609E+10 bytes
Database Buffers          469762048 bytes
Redo Buffers               21708800 bytes
SQL>  alter database mount;
Database altered.
SQL>
NOTE: I’m taking small steps here, just to be sure nothing is throwing errors.
[Primary, node 1] Step 7: Configure primary database
Set the log_archive_config to the following settings:
SQL> show parameter LOG_ARCHIVE_CONFIG
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_config                   string

SQL> ALTER SYSTEM SET LOG_ARCHIVE_CONFIG='DG_CONFIG=(primrac,dg_stby)';

System altered.

SQL> show parameter LOG_ARCHIVE_CONFIG

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_config                   string      DG_CONFIG=(primrac,dg_stby)
Check if the (default) log_archive_format is correct:
SQL> show parameters LOG_ARCHIVE_FORMAT
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_format                   string      %t_%s_%r.dbf
If it differs, change it to match the format above.
Make sure we have enough log archive processes:
SQL> show parameters LOG_ARCHIVE_MAX_PROCESSES
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes            integer     4

SQL> ALTER SYSTEM SET LOG_ARCHIVE_MAX_PROCESSES=30;
System altered.
SQL> show parameters LOG_ARCHIVE_MAX_PROCESSES
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes            integer     30
SQL>
Double check that we are using a remote login password file.
THIS IS IMPORTANT. EXCLUSIVE is normally the default, but let’s not take anything for granted; it saves a lot of issues and troubleshooting later on.
SQL>show parameters REMOTE_LOGIN_PASSWORDFILE;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
remote_login_passwordfile            string      EXCLUSIVE
SQL>
Ok. We’re good.
Turn on FORCE LOGGING:
SQL >SELECT force_logging FROM v$database;

FORCE_LOGGING
------------------------------
NO

SYSTEM@primary SQL > ALTER DATABASE FORCE LOGGING;
Database altered.
SQL >

SQL >SELECT force_logging FROM v$database;
FORCE_LOGGING
------------------------------
YES
SQL >
Configure the LOG_ARCHIVE_DEST_2 parameter:
SQL> show parameter LOG_ARCHIVE_DEST_2;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string

SQL>ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=dg_stby NOAFFIRM ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=dg_stby';

System altered.

SQL> show parameter LOG_ARCHIVE_DEST_2;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string      SERVICE=dg_stby NOAFFIRM ASYNC
VALID_FOR=(ONLINE_LOGFILES,PR
IMARY_ROLE) DB_UNIQUE_NAME=dg_stby
SQL>
Enable LOG_ARCHIVE_DEST_STATE_2 (basically this “turns on” the transport of the redo to the standby):
SQL> show parameter LOG_ARCHIVE_DEST_STATE_2

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2             string      enable

SQL>ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE;
System altered.
SQL> show parameter LOG_ARCHIVE_DEST_STATE_2
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2             string      ENABLE
SQL>
Alter the parameter FAL_SERVER:
SQL> show parameters FAL_SERVER

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fal_server                           string
SQL> ALTER SYSTEM SET FAL_SERVER=dg_stby;
System altered.
SQL> show parameters FAL_SERVER

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fal_server                           string      DG_STBY
SQL>
Alter parameter STANDBY_FILE_MANAGEMENT:
SQL> show parameters STANDBY_FILE_MANAGEMENT
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      MANUAL
SQL>
This needs to be auto:
SQL> alter system set STANDBY_FILE_MANAGEMENT=auto;

System altered.

SQL> show parameters STANDBY_FILE_MANAGEMENT
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      AUTO
SQL>
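Before moving on, it is worth a quick check (on the primary) that the destination actually works; a minimal sketch, assuming the dest_id 2 configured above:
SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;
SQL> SELECT dest_id, status, error FROM v$archive_dest WHERE dest_id = 2;
A status of VALID with an empty ERROR column means redo is being shipped; anything else usually points to a tnsnames or password file problem.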
[Standby] Step 8: Create standby logfiles.
Here we are going to create the standby redo log files on the standby. We use the following recommendation from Oracle:
(maximum # of logfiles per thread + 1) * maximum # of threads
These values will be the same on the standby as well as on the primary, since we duplicated this database.
These standby redo log files will not be created on the primary in this post.
Determine the maximum # of logfiles/threads by looking at the online redo logs:
SQL> select * from v$log;
GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS
---------- ---------- ---------- ---------- ---------- ---------- --- ----------------
FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME
------------- --------- ------------ ---------
1          1       6713  209715200        512          2 NO  CURRENT
1118893072 11-MAR-14   2.8147E+14

2          1       6712  209715200        512          2 YES INACTIVE
1118640974 11-MAR-14   1118893072 11-MAR-14

3          2       6015  209715200        512          2 NO  CURRENT
1118892968 11-MAR-14   2.8147E+14

4          2       6014  209715200        512          2 YES INACTIVE
1118782650 11-MAR-14   1118892968 11-MAR-14

5          1       6711  209715200        512          2 YES INACTIVE
1118556150 11-MAR-14   1118640974 11-MAR-14

6          2       6013  209715200        512          2 YES INACTIVE
1118640499 11-MAR-14   1118782650 11-MAR-14
With a bit of squinting we can determine: we have 2 threads, each with 3 groups
Thread 1: groups 1, 2, 5
Thread 2: groups 3, 4, 6
The redo logs are all the same size: 209715200 bytes
So we need on the standby:
(3 + 1) * 2 = 8 standby redo log files.
That is one more standby redo log group per thread than the number of online redo log groups, matching the eight groups created below.
SQL> select * from v$standby_log;
no rows selected

SQL> alter database add standby logfile thread 1 group 7 '+DATAPRDG' size 209715200;
Database altered.
SQL>
And the rest for clarity:
alter database add standby logfile thread 1 group 8 '+DATAPRDG' size 209715200;
alter database add standby logfile thread 1 group 9 '+DATAPRDG' size 209715200;
alter database add standby logfile thread 1 group 10 '+DATAPRDG' size 209715200;

alter database add standby logfile thread 2 group 11 '+DATAPRDG' size 209715200;
alter database add standby logfile thread 2 group 12 '+DATAPRDG' size 209715200;
alter database add standby logfile thread 2 group 13 '+DATAPRDG' size 209715200;
alter database add standby logfile thread 2 group 14 '+DATAPRDG' size 209715200;
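A quick verification (on the standby) that all eight groups are there and sized like the online logs:
SQL> select group#, thread#, bytes from v$standby_log order by thread#, group#;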
And now… the moment suprême: place the standby database in Data Guard mode.
To start managed recovery (the MRP process), issue the following command:
SQL> alter database recover managed standby database disconnect from session;
Database altered.

SQL> exit;
BUT….
We will run into another issue. When bouncing the database, or opening it in read-only mode, one can run into errors stating that datafiles need to be recovered.
This is because the file names in ASM on the standby differ from the names in the primary ASM. The cause of this is OMF: it renamed the files on the standby, since that is how OMF works. To solve this, follow the steps that are perfectly documented in the Helios Blog.
Be aware, however, that the standby log files have to be recreated, since they will be lost in this exercise. Once the datafile names are correct, the recovery issues should be gone.
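To see which file names the standby control file currently records (and to spot the OMF-renamed ones), a simple query on the standby suffices; the rename procedure itself is documented in the blog referenced above:
SQL> select file#, name from v$datafile order by file#;
SQL> select member from v$logfile;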
BUT….
The basics are now in place, but in my situation we have a gap: between the duplicate and enabling the log transport, a backup took place (mentioned in the special notice earlier). This is made clear by this query:
[standby] SQL> select process,status,sequence#,thread# from v$managed_standby;

PROCESS   STATUS        SEQUENCE#    THREAD#
--------- ------------ ---------- ----------
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CLOSING            6714          1
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CLOSING            6715          1
ARCH      CONNECTED             0          0
ARCH      CLOSING            6716          1

PROCESS   STATUS        SEQUENCE#    THREAD#
--------- ------------ ---------- ----------
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0

PROCESS   STATUS        SEQUENCE#    THREAD#
--------- ------------ ---------- ----------
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
ARCH      CONNECTED             0          0
RFS       IDLE                  0          0
RFS       IDLE               6717          1
RFS       IDLE                  0          0

PROCESS   STATUS        SEQUENCE#    THREAD#
--------- ------------ ---------- ----------
MRP0      WAIT_FOR_GAP       6701          1
To fix this we restore the missing archive logs on the primary database after we determine the range.
Determine the “gap range” by using this query on the standby database:

 SELECT high.thread#, "LowGap#", "HighGap#"
 FROM
      (SELECT thread#, MIN(sequence#)-1 "HighGap#"
      FROM
      (SELECT a.thread#, a.sequence#
          FROM
          (SELECT * FROM v$archived_log) a,
          (SELECT thread#, MAX(next_change#)gap1
              FROM v$log_history
              GROUP BY thread#
          ) b
          WHERE a.thread# = b.thread#
          AND a.next_change# > gap1
      )
      GROUP BY thread#
  ) high,
  (SELECT thread#, MIN(sequence#) "LowGap#"
      FROM
      (SELECT thread#, sequence#
          FROM v$log_history, v$datafile
          WHERE checkpoint_change# <= next_change#
          AND checkpoint_change# >= first_change#
      )
      GROUP BY thread#
  ) low
  WHERE low.thread# = high.thread#;

THREAD#    LowGap#   HighGap#
---------- ---------- ----------
         1       6700       6704
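As a cross-check, the standby also exposes the gap the MRP is currently waiting on through the V$ARCHIVE_GAP view:
SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;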
Close the gap:
RMAN> restore archivelog from sequence 6700 until sequence 6704;

Starting restore at 11-MAR-14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=804 instance=primrac01 device type=DISK

channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=6700
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=6701
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=6702
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=6703
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=6704
channel ORA_DISK_1: reading from backup piece +FRAPRDG/primrac/backupset/2014_03_11/annnf0_incr_backup_0.608.841889217
channel ORA_DISK_1: piece handle=+FRAPRDG/primrac/backupset/2014_03_11/annnf0_incr_backup_0.608.841889217 tag=INCR_BACKUP
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:35
Finished restore at 11-MAR-14
Issuing the same “gap-query” as before gives us:
   THREAD#    LowGap#   HighGap#
---------- ---------- ----------
         1       6701       6701
The next issue we had was this:
The second node (primrac02) jumped ship (ORA-12514: TNS:listener does not currently know of service requested in connect descriptor) and only the log files from node primrac01 were being applied. Luckily this is a common issue, and the procedure to recover from it is fairly straightforward; it has been documented VERY well on the Arup Nanda Blog.
Remember: all of this can be prevented by keeping the archive logs available until Data Guard has caught up.
[Standby] Step 9: Validating Data Guard process.
Determine the lag in seconds:
[primary or standby]
select abs(nvl(max(ceil( (extract(day from replication_lag)*24*60*60)+
(extract(hour from replication_lag)*60*60)+
(extract(minute from replication_lag)*60)+
(extract(second from replication_lag))
)),0)) lag_in_seconds from
(select sysdate-scn_to_timestamp(current_scn) replication_lag from v$database);
LAG_IN_SECONDS
--------------
             2
Check if all logs are applied:
Verify from v$archived_log whether there is a gap in the sequence numbers. All the logs up to that gap should have APPLIED=YES and all the sequence#s after the missing log are APPLIED=NO. This means that, due to the missing log, MRP is not applying the logs on the standby, but the logs are still being transmitted to the standby and are available there.
[primary or standby]
 SQL> SELECT THREAD#, SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG ORDER BY THREAD#, SEQUENCE#;
So for example, if the missing log sequence# is 400, then the above query should show that up to sequence# 399 everything has APPLIED=YES, and from 401 onwards APPLIED=NO. Use the recovery method described in the Arup Nanda Blog.
Update: in day-to-day work with ADG, I found I was missing some queries to validate the operation of the standby database. To get an idea of what is going on, one can use these queries on the standby:
SQL> select process,status,sequence#,thread# from v$managed_standby;

PROCESS   STATUS        SEQUENCE#    THREAD#
--------- ------------ ---------- ----------
ARCH      CLOSING            6936          1
ARCH      CLOSING            6942          1
ARCH      CONNECTED             0          0
ARCH      CLOSING            6947          1
ARCH      CLOSING            6217          2
<snip>
ARCH      CLOSING            6220          2
ARCH      CLOSING            6225          2
RFS       IDLE                  0          0
RFS       IDLE               6228          2
RFS       IDLE                  0          0

PROCESS   STATUS        SEQUENCE#    THREAD#
--------- ------------ ---------- ----------
RFS       IDLE               6950          1
RFS       IDLE                  0          0
RFS       IDLE                  0          0
MRP0      APPLYING_LOG       6950          1

SQL> select status,instance_name,database_role from v$instance,v$database;

STATUS       INSTANCE_NAME    DATABASE_ROLE
------------ ---------------- ----------------
OPEN         dg_stby01          PHYSICAL STANDBY
SQL>
Note: This database is open in read-only mode, hence the “open” status.
To determine the last received and applied log file from the primary, run this query on  the STANDBY:
SELECT 'Last Applied : ' Logs,
TO_CHAR(next_time,'DD-MON-YY:HH24:MI:SS') TIME,thread#,sequence#
FROM v$archived_log
WHERE sequence# =
(SELECT MAX(sequence#) FROM v$archived_log WHERE applied='YES'
)
UNION
SELECT 'Last Received : ' Logs,
TO_CHAR(next_time,'DD-MON-YY:HH24:MI:SS') TIME,thread#,sequence#
FROM v$archived_log
WHERE sequence# =
(SELECT MAX(sequence#) FROM v$archived_log );

LOGS             TIME                  THREAD#  SEQUENCE#
---------------- ------------------ ---------- ----------
Last Applied :   20-NOV-14:13:14:54          2      23321
Last Received :  20-NOV-14:13:37:14          2      23322


To determine the “queue” on the PRIMARY database:
SELECT
 (SELECT name FROM V$DATABASE
 ) name,
 (SELECT MAX (sequence#) FROM v$archived_log WHERE dest_id = 1
 ) Current_primary_seq,
 (SELECT MAX (sequence#)
 FROM v$archived_log
 WHERE TRUNC(next_time) > SYSDATE - 1
 AND dest_id = 2
 ) max_stby,
 (SELECT NVL (
 (SELECT MAX (sequence#) - MIN (sequence#)
 FROM v$archived_log
 WHERE TRUNC(next_time) > SYSDATE - 1
 AND dest_id = 2
 AND applied = 'NO'
 ), 0)
 FROM DUAL
 ) "To be applied",
 (
 (SELECT MAX (sequence#) FROM v$archived_log WHERE dest_id = 1
 ) -
 (SELECT MAX (sequence#) FROM v$archived_log WHERE dest_id = 2
 )) "To be Shipped"
FROM DUAL;


NAME      CURRENT_PRIMARY_SEQ   MAX_STBY To be applied To be Shipped
--------- ------------------- ---------- ------------- -------------
PRIMRAC0                23322      23322             0             0

And that’s it! It occurs to me that I could have written a post where everything is smooth sailing from end to end, but in my opinion that is not how the real world works, and it is probably why this post is being read. There are many blogs out there where it works for the author, but when you try to reproduce the steps it bails out, and no solution is offered because they never ran into the issue. Or they are just smarter than me ;-).
[Standby] Step 10: Optionally place standby DB in read only mode.
To enable Active Data Guard, you open the physical standby in read-only mode and then restart redo apply.
The standby should be in one of two states prior to enabling Active Data Guard:
  • The standby is mounted and redo apply is running
  • The standby database has been shut down cleanly and redo apply was stopped
First stop redo apply and open the database read-only:
SQL> alter database recover managed standby database cancel;
SQL> alter database open read only;
Then restart redo apply:
SQL> alter database recover managed standby database using current logfile disconnect from session;
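To confirm that Active Data Guard is really active, check the open mode on the standby; with redo apply running against an open standby, 11.2 should report READ ONLY WITH APPLY:
SQL> select open_mode, database_role from v$database;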
Conclusion:
Implementing ADG is no mean feat, and there is a lot more to say on the subject, but I hope the information provided here offers some guidance and relief.
