What is redo?
Let’s look briefly at the redo process. When Oracle blocks (the smallest unit of storage in a database) are changed, including undo blocks, Oracle records the changes as change vectors, which are grouped into redo records (also referred to as redo entries). The server process writes these changes to the redo log buffer in the System Global Area (SGA). The log writer (LGWR) process then flushes the redo log buffer to the online redo logs in near real time (if the redo log buffer is too small, you will start seeing log buffer space waits during bursts of redo generation).
LGWR writes the redo entries to disk when any of the following occurs first:
- A user issues a commit
- The log buffer is one-third full
- The log buffer contains 1 MB of unwritten redo entries
- A database checkpoint takes place
- Otherwise, every three seconds
In the event of a checkpoint, the redo entries are written before the checkpoint itself to ensure recoverability.
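As a rough way to watch this mechanism in action, you can sample the redo statistics in V$SYSSTAT. This is only a monitoring sketch; it assumes a session with SELECT privileges on the view:

```sql
-- Observe redo activity on the instance.
-- A steadily growing "redo log space requests" value suggests the
-- log buffer or the online redo logs are undersized for the workload.
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('redo size', 'redo writes', 'redo log space requests');
```

Sampling these values twice, a few minutes apart, gives you the redo generation rate over that interval.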
Redo log files record changes to the database due to transactions and internal Oracle server actions. Redo log files protect the database from the loss of integrity due to system failures caused by power outages, disk failures, etc. Redo log files must be multiplexed using different disks (fast disks are preferred) to ensure that the information stored in them is not lost during a disk failure.
The redo log consists of groups of redo log files. A group consists of a redo log file and its multiplexed copies. Each identical copy is said to be a member of that group, and each group is identified by a number. The LGWR process writes redo records from the redo log buffer to all members of a redo log group until the file is filled or a log switch operation is requested. Then, it switches and writes to the files in the next group. Redo log groups are used in a circular fashion, as shown in the following figure:
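You can see the groups, their members, and which group LGWR is currently writing to by joining V$LOG and V$LOGFILE:

```sql
-- List each redo log group with its status (CURRENT, ACTIVE, INACTIVE)
-- and the file names of all its members.
SELECT l.group#, l.members, l.status, f.member
  FROM v$log l
  JOIN v$logfile f ON f.group# = l.group#
 ORDER BY l.group#, f.member;
```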
Tip: Redo log groups need at least two files per group, with the files distributed on separate disks or controllers so that no single disk failure destroys an entire log group. Also, never rely exclusively on your ASM disk group or the file system if they have mirrored disks. Remember that mirroring will not protect your database if your online redo log file is deleted or corrupted.
The loss of an entire log group is one of the most severe possible media failures you can encounter because it can result in data loss. The loss of a single member within a multiple-member log group is trivial. It does not affect database operation, other than causing an alert to be published in the alert log.
Remember that redo logs heavily influence database performance because a commit cannot be completed until the transaction information has been written to the logs. You must place your redo log files on your fastest disks served by your fastest controllers. If possible, do not place any other database files on the same disks as your redo log files.
Tip: It’s not advisable to place members of different groups on the same disk. That’s because the archiving process reads the online redo log files and will end up competing with the LGWR process.
To summarize, when working with redo log files and redo log groups, always:
- Have a minimum of three redo log groups. If your database switches logs too often and you do not have enough redo log groups, the LGWR process will have to wait until the next group becomes available before it can overwrite it.
- Keep all online redo logs and standby redo logs the same size.
- Size your redo log files so that log switches happen no more often than every 20 minutes at peak times.
- Place the redo log files on high-performance disks.
- Have a minimum of two redo log members per group to reduce risk, and place them on different disks, away from the datafiles.
- Do not multiplex standby redo logs, to avoid additional writes in the redo transport.
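As a sketch of the first and fifth points above, the commands below add a new multiplexed group and an extra member to an existing group. The group numbers, file paths, and size are hypothetical and must be adapted to your environment:

```sql
-- Add a fourth redo log group with two members on separate disks
-- (paths and 1G size are example values only).
ALTER DATABASE ADD LOGFILE GROUP 4
  ('/u01/oradata/orcl/redo04a.rdo',
   '/u02/oradata/orcl/redo04b.rdo') SIZE 1G;

-- Add a second member to an existing single-member group:
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/orcl/redo03b.rdo' TO GROUP 3;
```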
Also remember, as mentioned earlier, that not all Oracle databases will have the archive process enabled.
Redo generation and recoverability
The purpose of redo generation is to ensure recoverability. This is why Oracle does not give the DBA much control over redo generation. If the instance crashes, all the SGA changes will be lost. Oracle will then use the redo entries in the online redo log files to bring the database to a consistent state. The cost of maintaining the redo log records is an expensive operation involving latch management operations (CPU) and frequent write access to the redo log files (I/O).
The NOARCHIVELOG mode
A newly created database runs in the NOARCHIVELOG mode by default. This mode permits normal database operations, but does not provide the capability to perform point-in-time recovery or online backups.
In this mode, no hot backup is possible (a hot backup is any backup taken with the database open, causing no interruption to users). You will only be able to take backups with the database shut down (also known as an offline backup or a cold backup), and you will only be able to recover up to the point in time that backup was taken. You can see in the following example what will happen if you try to make a hot backup of your database when in the NOARCHIVELOG mode:
SQL> SELECT log_mode FROM v$database;
LOG_MODE
------------
NOARCHIVELOG
SQL> ALTER DATABASE BEGIN BACKUP;
ALTER DATABASE BEGIN BACKUP
*
ERROR at line 1:
ORA-01123: cannot start online backup; media recovery not enabled
The error shown in the preceding code is the result you will receive after trying to place your database in backup mode to make a hot backup of your database files. The following example shows the result you will receive when trying to make a backup of your open database when in the NOARCHIVELOG mode using RMAN. As you can see, neither approach is possible:
RMAN> BACKUP DATABASE;
Starting backup at 04-DEC-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=36 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
RMAN-03009: failure of backup command on ORA_DISK_1 channel at
12/04/2021 15:32:42
ORA-19602: cannot backup or copy active file in NOARCHIVELOG mode
continuing other job steps, job failed will not be re-run
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 04-DEC-21
channel ORA_DISK_1: finished piece 1 at 04-DEC-21
piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2021_12_04/o1_mf_ncsnf_TAG20211204T153241_8cx20wfz_.bkp
tag=TAG20211204T153241 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
RMAN-00571: ======================================================
RMAN-00569: ========== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ======================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at
12/04/2021 15:32:42
ORA-19602: cannot backup or copy active file in NOARCHIVELOG mode
To make a full backup of your database when in the NOARCHIVELOG mode, you will need to:
- First, shut down the database cleanly, so that all files are in a consistent state.
- Back up all your datafiles, parameter files, control files, and redo logs manually to tape or to a different location.
- Restart your database.
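A convenient way to build the list of files to copy for such a cold backup is to query the data dictionary before shutting the database down (the SPFILE and password file still have to be added to the list manually):

```sql
-- Build the copy list for a cold (offline) backup:
-- all datafiles, control files, and online redo log members.
SELECT name   FROM v$datafile
UNION ALL
SELECT name   FROM v$controlfile
UNION ALL
SELECT member FROM v$logfile;
```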
If a recovery is required, you will need to restore all files from your last backup and start the database; understand that all transactions made in the database after that backup will be lost.
The ARCHIVELOG mode
Oracle lets you save filled redo log files to one or more offline destinations, improving the recoverability of your data by preserving all transactions in case of a crash and reducing the possibility of data loss. The copy of a filled redo log file written to a different location is an archived redo log file, and the process of creating it is called archiving.
Understanding the ARCHIVELOG mode
An archived redo log file is a physical copy of one of the filled members of a redo log group. Remember that redo log files are cyclical files that are overwritten by the Oracle database; they are only archived (a backup copy made before the file is overwritten) when the database is in the ARCHIVELOG mode. Each archived redo log file includes all redo entries and the unique log sequence number of the redo log group it was copied from. Because the members of a group are identical, the archive process (ARCn) archives only one member, not all of them. To clarify, if you are multiplexing your redo log files (a minimum of two members per group is recommended) and redo log group 1 contains two identical member files, redolog_1a.rdo and redolog_1b.rdo, then ARCn will archive only one of these files. If redolog_1a.rdo becomes corrupted, ARCn can still archive the identical surviving member, redolog_1b.rdo. The archived redo logs generated by ARCn will together contain a copy of every redo log group filled since you enabled archiving in your database.
When the database runs in ARCHIVELOG mode, the LGWR process cannot reuse and overwrite a given redo log group until it has been archived. This is to ensure the recoverability of your data. The background process ARCn will automate the archiving operation. The database will start multiple archive processes as necessary (the default number of processes is four) to ensure that the archiving of filled redo log files does not fall behind.
You can use archived redo logs to:
- Recover a database
- Update and keep a standby database in sync with a primary database
- Get information about the history of a database using the LogMiner utility
In the ARCHIVELOG mode, the Oracle Database engine will make copies of all online redo log files via an internal process called ARCn. This process will generate archive copies of your redo log files to one or more archive log destination directories. The number and location of destination directories will depend on your database initialization parameters.
To use the ARCHIVELOG mode, you must first set up some configuration parameters. Once your database is in the ARCHIVELOG mode, all transaction activity will be archived to allow data recoverability. You must ensure that your archival destination always has enough space available; if space runs out, the database will suspend all activity until the filled redo log files can be archived once again.
Tip: Never use the extension .log for redo log files. As mentioned earlier, use a different extension, such as .rdo. This is because anyone, including you, can delete .log files by mistake when running out of space.
Preparing for the ARCHIVELOG mode
When setting your database to work in the ARCHIVELOG mode, please never forget to:
- Configure your database in a proper way. Some examples of what to do when configuring a database are:
- Read the Oracle documentation: It’s always essential to follow Oracle’s recommendations in the documentation.
- Multiplex your control files: Having a minimum of three control files will reduce the risk of losing them all.
- Set the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter to an acceptable value: This sets the minimum number of days that a reusable record in the control file is kept before it can be overwritten. It also controls how long your backup information is retained in the control file.
- Configure the size and number of redo log files and groups appropriately: If not configured properly, the Oracle Database engine will generate constant checkpoints, creating a high load on the buffer cache and I/O system and affecting the performance of your database. Also, having too few redo log groups will force the LGWR process to wait for the ARCn process to finish before overwriting a redo log file.
- Multiplex online redo log files: Do this to reduce the risk of losing an online redo log file.
- Enable block checksums: This will allow the Oracle Database engine to detect block corruptions.
- Enable database block checking: This allows Oracle to perform block checking for corruption, but be aware that it can cause overhead in most applications depending on workload and the parameter value.
- Log checkpoints to the alert log: This helps determine whether checkpoints occur at the desired frequency.
- Use the fast-start fault recovery feature: This reduces the time required for cache recovery. The parameter FAST_START_MTTR_TARGET is the one to look over here.
- Use Oracle restart: This is used to enhance the availability of a single instance (non-RAC) and its components.
- Never use the extension .log for redo log files: Anyone, including you, can delete .log files by mistake when running out of space.
- Use block change tracking: This allows incremental backups to run to completion more quickly than otherwise.
- Always be sure to have enough available space in the archival destination.
- Always ensure everything is working as it is supposed to be. Never forget to implement a proactive monitoring strategy using scripts or Oracle Enterprise Manager (OEM). Some important areas to check are:
- Database structure integrity
- Data block integrity
- Redo integrity
- Undo segment integrity
- Transaction integrity
- Dictionary integrity
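As a sketch of several of the configuration points above, the following commands set illustrative, not authoritative, values; tune each one for your own workload and Oracle version:

```sql
-- Keep reusable control file records (and backup metadata) for 14 days.
ALTER SYSTEM SET control_file_record_keep_time = 14 SCOPE=BOTH;

-- Enable block checksums and block checking (MEDIUM adds overhead;
-- test against your workload before using it in production).
ALTER SYSTEM SET db_block_checksum = TYPICAL SCOPE=BOTH;
ALTER SYSTEM SET db_block_checking = MEDIUM SCOPE=BOTH;

-- Record checkpoints in the alert log to verify their frequency.
ALTER SYSTEM SET log_checkpoints_to_alert = TRUE SCOPE=BOTH;

-- Target a 300-second cache recovery time (fast-start fault recovery).
ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE=BOTH;
```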
Checking the status of the ARCHIVELOG mode
You can determine whether archiving is enabled in your instance by querying the LOG_MODE column of the V$DATABASE view (ARCHIVELOG indicates that archiving is enabled; NOARCHIVELOG indicates that it is not) or by issuing the SQL*Plus archive log list command:
SQL> SELECT log_mode FROM v$database;
LOG_MODE
-------------------
ARCHIVELOG
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 8
Next log sequence to archive 10
Current log sequence 10
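Once archiving is enabled, you can also confirm that filled logs are actually being archived by looking at the most recent entries in V$ARCHIVED_LOG. This is an illustrative check, limited here to the last 24 hours:

```sql
-- Archived logs produced in the last day, with their destinations.
SELECT sequence#, name, completion_time
  FROM v$archived_log
 WHERE completion_time > SYSDATE - 1
 ORDER BY sequence#;
```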
Specifying parameters
In ARCHIVELOG mode, you can choose between generating archive redo logs to a single location or multiplexing them. The most important parameters you need to be familiar with when setting your database to work in this mode are:
- LOG_ARCHIVE_DEST_n: Use this parameter to specify from one to ten different archival locations (n can be a number between 1 and 10).
- LOG_ARCHIVE_FORMAT: This parameter will specify the default filename format when archiving the redo log files. The following variables can be used to format the file:
- %s — log sequence number
- %S — log sequence number, zero-filled
- %t — thread number
- %T — thread number, zero-filled
- %a — activation ID
- %d — database ID
- %r — resetlogs ID
One example of how to use these parameters could be something like this: alter system set log_archive_format='orcl_%s_%t_%r.arc' scope=spfile. This command will create archive log files whose names contain the literal prefix orcl (the database name in this example), followed by the log sequence number, the thread number, and the resetlogs ID. Because this parameter is static, scope=spfile is required and the change takes effect after the next restart.
- LOG_ARCHIVE_MIN_SUCCEED_DEST: This defines the minimum number of archival destinations that must succeed before a redo log file can be overwritten.
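A minimal sketch combining these parameters might look like the following; the directory paths are hypothetical:

```sql
-- Two local archival destinations on different disks;
-- at least one must succeed before a log can be reused.
ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u02/archivelog' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_2 = 'LOCATION=/u03/archivelog' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_min_succeed_dest = 1 SCOPE=BOTH;
```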
Viewing the status of archival destinations
You can also check the status of your archival destinations by querying the V$ARCHIVE_DEST view, in which the following variable characteristics will determine the status:
- Valid/Invalid: This indicates whether the disk location or service name specified is valid or not
- Enabled/Disabled: This indicates the availability state of the location and if the database can use it
- Active/Inactive: This indicates whether there was a problem accessing the destination
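For example, a quick health check of the configured destinations could look like this, using the columns documented for V$ARCHIVE_DEST:

```sql
-- Show configured archival destinations and their current state,
-- skipping unconfigured (INACTIVE) slots.
SELECT dest_id, dest_name, status, destination
  FROM v$archive_dest
 WHERE status <> 'INACTIVE';
```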
The FRA (called the Flash Recovery Area before Oracle 11g R2, and now called the Fast Recovery Area) is a disk location in which the database can store and manage all files related to backup and recovery operations. Flashback Database, which also stores its logs in the FRA, provides a very efficient mechanism to roll back any unwanted database change.
Placing a database into the ARCHIVELOG mode
Let's now place the database in the ARCHIVELOG mode, using the FRA as a secondary location for the archive log files. To achieve all this, you will need to:
- Set up the size of your FRA to be used by your database. You can do this by using the command:
SQL> ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE=<M/G> SCOPE=both;
- Specify the location of the FRA using the command:
SQL> ALTER SYSTEM SET DB_RECOVERY_FILE_DEST= '/u01/app/oracle/fast_recovery_area' scope=both;
- Define your archive log destination area using the command:
SQL> ALTER SYSTEM SET log_archive_dest_1= 'LOCATION=/DB/u02/backups/archivelog' scope=both;
- Define your secondary archive log area to use the FRA with the command:
SQL> ALTER SYSTEM SET log_archive_dest_10= 'LOCATION=USE_DB_RECOVERY_FILE_DEST';
- Shut down your database using the command:
SQL> SHUTDOWN IMMEDIATE
- Start your database in mount mode using the command:
SQL> STARTUP MOUNT
- Switch your database to use the ARCHIVELOG mode using the command:
SQL> ALTER DATABASE ARCHIVELOG;
- Then finally, open your database using the command:
SQL> ALTER DATABASE OPEN;
When in the ARCHIVELOG mode, you can make hot backups using RMAN. You can also perform user-managed backups using the alter database begin backup command (which places all datafiles in backup mode so you can copy them with operating system tools), or use the alter tablespace <Tablespace_Name> begin backup command to back up only the datafiles associated with a single tablespace.
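A minimal user-managed hot backup sequence might look like this; the USERS tablespace and the file paths are example names only:

```sql
-- Place the tablespace in backup mode (ARCHIVELOG mode required).
ALTER TABLESPACE users BEGIN BACKUP;

-- Copy its datafiles with OS tools while in backup mode, e.g.:
-- HOST cp /u01/oradata/orcl/users01.dbf /backup/users01.dbf

-- Take it out of backup mode as soon as the copy finishes.
ALTER TABLESPACE users END BACKUP;

-- Verify that no datafile was accidentally left in backup mode:
SELECT file#, status FROM v$backup WHERE status = 'ACTIVE';
```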
Now that you know what you need to know about the ARCHIVELOG mode, let’s take a deeper look at redo and why it is so important to the recoverability of our database.
Differences between redo and undo
Another common question relates to the difference between redo log entries and the undo information saved as part of transaction management. While redo and undo data sound almost as if they could be used for the same purpose, such is not the case. The following spells out the differences:
- Record of: redo records how to reproduce a change; undo records how to reverse a change.
- Used for: redo is used to roll changes forward during recovery; undo is used for rollback and read consistency.
- Stored in: redo is stored in the redo log files (and archived redo logs); undo is stored in undo segments inside the database.
- Protects against: redo protects against the loss of committed changes; undo protects readers from seeing uncommitted changes and allows a transaction to be rolled back.
In the end, an undo segment is just a segment like any other (such as a table, an index, a hash cluster, or a materialized view). The critical point here is in the name, and the main rule you need to understand is that if you modify part of a segment (any segment, regardless of its type), you must generate redo so that the change can be recovered in the event of a media or instance failure. Therefore, if you modify the table EMPLOYEE, the changes made to the EMPLOYEE blocks are recorded in the redo log buffer, and consequently in the redo log files (and archive log files if running in the ARCHIVELOG mode). The changes made to EMPLOYEE must also be recorded in undo, because you might change your mind and want to roll back the transaction before issuing a commit to confirm the changes. The modification to the table EMPLOYEE therefore causes entries to be made in an undo segment, which is itself a modification to a segment. Consequently, the changes made to the undo segment must also be recorded in the redo log buffer to protect your data integrity in a disaster.
Suppose your database crashes, and you need to restore a set of datafiles from five days ago, including those for the UNDO tablespace. In that case, Oracle will start reading from your archived redo, rolling the five-day-old files forward until they are four days old, then three, then two, then one, until the recovery process reaches the point where the only record of changes to segments (any segment) is in the current online redo log file. The redo log entries are applied until every change to every segment ever recorded in the redo has been reapplied. At this point, your undo segments have been repopulated, and the database can roll back the transactions that were recorded in the redo log but were not committed at the time of the failure.
I can’t emphasize enough that undo segments are just slightly special tables. They’re fundamentally not very different from any other table in the database, such as EMPLOYEE or DEPARTMENT, except that new inserts into an undo segment can overwrite a previous record, which never happens in a table like EMPLOYEE, of course. If you generate undo when updating EMPLOYEE, you will consequently generate redo: every time undo is generated, redo is also generated (this is the key point to understand here).
Oracle Database stores the before and after images in redo, but redo is written and generated sequentially and isn’t cached for an extended period of time in memory (as mentioned in the What is redo? section of this chapter). Hence, using redo to roll back a mere mistake, or even a change of mind, while theoretically possible, would involve wading sequentially through vast amounts of redo, looking for the before image in a sea of changes made by different transactions, reading the data from disk into memory just as a normal recovery process does. Undo, on the other hand, is stored in the buffer cache (just as the table EMPLOYEE is), so there is a good chance that reading the information needed will require only logical I/O, not physical I/O. Your transaction is also dynamically pointed to where its undo is written, so it can jump straight to its undo without having to navigate through the undo generated by all other transactions.
In summary, you need redo for recovery operations, and undo for consistency in multiuser environments and to roll back any change of mind. In my opinion, this is one of the key points that makes Oracle superior to any other database on the market: other databases merely have transaction logs that serve both purposes, and they suffer accordingly in terms of performance and flexibility.
Facing excessive redo generation during an online backup?
One of the most common questions I see on the Oracle Technology Network (OTN) forums is why so much redo is generated during an online backup operation. When a tablespace is put in the backup mode, the redo generation behavior changes, but no excessive redo is generated: additional information is logged into the online redo log file only the first time a block is modified while its tablespace is in the hot backup mode. In other words, while the tablespace is in the backup mode, the first change to each block logs the entire block image to redo; subsequent changes to the same block generate normal redo. This is done because Oracle cannot guarantee that a block was not being updated at the exact moment it was copied as part of the backup.
In the hot backup mode, only two things are different:
- The first time a block is changed in a datafile in the hot backup mode, the entire block is written to the redo log file, not just the changed bytes. This is because you can get into a situation where the process of copying the datafile and the database writer (DBWR) are simultaneously working on the same block. Hence, the entire block image is logged so that during recovery, the block is totally rewritten from redo and is consistent.
- The datafile headers, which contain the System Change Number (SCN) of the last completed checkpoint, are not updated while a file is in the hot backup mode, even though the DBWR process keeps writing to the datafiles during the backup. The SCN recorded in the header tells us how far back in the redo stream one needs to go to recover the file.
Tip: To limit the effect of this additional logging, you should place only one tablespace at a time in the backup mode and bring the tablespace out of the backup mode as soon as you have finished backing it up. This will keep the number of blocks that have to be fully logged to a minimum.
Well, this was a huge amount of information for one blog post. In part III of this series, we will discuss High Availability and much more. Stay tuned!
Regards,
Francisco Munoz Alvarez
Want to learn more?
Check out this amazing content about Oracle MAA and MSA, available for you:
- Oracle Maximum Availability Architecture (MAA) Main Page – https://bit.ly/3ysinDY
- Database Security Main Page – https://bit.ly/3NSJWw3
- Cyber Security Blog – https://bit.ly/3yPVZpG
- Cyber Security Technical Paper – https://bit.ly/3akX1Ao
- Securing the Oracle Database – a technical primer (free fourth edition) – https://bit.ly/3bUYQVa
- Security Free Labs – https://bit.ly/3NLjCDW
- Disaster and Recovery Free Labs – https://bit.ly/3yms6vK