By Francisco Munoz Alvarez
Redo Logging I/O-Related Wait Events
There are a number of Wait Events that happen during Redo Logging activities and most of them are I/O-related.
The two most important ones are 'log file parallel write' and 'log file sync'. Oracle foreground processes wait for 'log file sync' whereas the LGWR process waits for 'log file parallel write'.
Although we usually find 'log file sync' in the "Top 5 Wait/Timed Events" section of the Statspack report, in order to understand it we will first look at 'log file parallel write':
'log file parallel write'
The LGWR background process waits for this event while it is copying redo records from the memory Log Buffer cache to the current redo group's member log files on disk.
Asynchronous I/O will be used if available to make the write parallel, otherwise these writes will be done sequentially one member after the other. However, LGWR has to wait until the I/Os to all member log files are complete before the wait is completed. Hence, the factor that determines the length of this wait is the speed with which the I/O subsystem can perform the writes to the log file members.
To reduce the time waited for this event, one approach is to reduce the amount of redo generated by the database:
-
Make use of UNRECOVERABLE/NOLOGGING options.
-
Reduce the number of redo group members to the minimum necessary to ensure not all members can be lost at the same time.
-
Do not leave tablespaces in BACKUP mode for longer than necessary.
-
Only use the minimal level of Supplemental Logging required to achieve the required functionality e.g. in LogMiner, Logical Standby or Streams.
Another approach is to tune the I/O itself:
-
Place redo group members on storage locations so that parallel writes do not contend with each other.
-
Do not uses RAID-5 for redo log files.
-
Use Raw Devices for redo log files.
-
Use faster disks for redo log files.
-
If archiving is being used setup redo storage so that writes for the current redo group members do not contend with reads for the group(s) currently being archived.
'log file sync'
This Wait Event occurs in Oracle foreground processes when they have issued a COMMIT or ROLLBACK operation and are waiting for it to complete.
Part (but not all) of this wait includes waiting for LGWR to copy the redo records for the session's transaction from Log Buffer memory to disk.
So, in the time that a foreground process is waiting for 'log file sync', LGWR will also wait for a portion of this time on 'log file parallel write'.
The key to understanding what is delaying 'log file sync' is to compare average times waited for 'log file sync' and 'log file parallel write':
-
If they are almost similar, then redo logfile I/O is causing the delay and the guidelines for tuning it should be followed.
-
If 'log file parallel write' is significantly different i.e smaller, then the delay is caused by the other parts of the Redo Logging mechanism that occur during a COMMIT/ROLLBACK (and are not I/O-related). Sometimes there will be latch contention on redo latches, evidenced by 'latch free' or 'LGWR wait for redo copy' wait events.
Handling Block Corruptions due to NOLOGGING
If a NOLOGGING (or UNRECOVERABLE) operation is performed on an object and the datafile containing that object is subsequently recovered then the data blocks affected by the NOLOGGING operation are marked as corrupt and will signal an ORA-1578 error when accessed. In Oracle8i an ORA-26040 is also signaled ("ORA-26040: Data block was loaded using the NOLOGGING option") which makes the cause fairly obvious, but earlier releases have no additional error message. If a block is corrupt due to recovery through a NOLOGGING operation then you can use the Metalink Note 28814.1 onwards but note that:
-
Recovery cannot retrieve the NOLOGGING data
-
No data is salvageable from inside the block
At this point basically you can do:
-
the indexes with corrupt blocks can be dropped and re-created,
-
the corrupt tables can be dropped and built from an alternative data source.
-
the data file(s) impacted by NOLOGGING operations can be refreshed from the Primary or backup which completed after NOLOGGING operation.
-
Or a combination of the above.
Currently in Oracle 9i and Oracle 10gR1, only the primary’s database V$DATAFILE view reflects NOLOGGING operations.. In 10gR2, the V$DATAFILE view will be enhanced to include information regarding when an invalidation redo is applied and the aforementioned corrupted blocks are written to the corresponding data file on a Redo Apply (or media recovery or standby) instance.
Repair NOLOGGING Changes on Physical and Logical Standby Databases
After a NOLOGGING operation on the primary is detected, it is recommended to create a backup immediately if you want to recover from this operation in the future. However there are additional steps required if you have an existing physical or logical standby database. This is crucial if you want to preserve the data integrity of your standby databases.
For a physical standby database, Redo Apply will process the invalidation redo and mark the corresponding data blocks corrupt, follow these steps² to reinstate the relevant data files:
- Stop Redo Apply (recover managed standby database cancel)
- Offline corresponding datafile(s) (alter database datafile <NAME> offline drop ; )
- Start Redo Apply (recover managed standby database disconnect)
- Copy the appropriate backup datafiles over from the primary database (e.g. use RMAN to backup datafiles and copy them)
- Stop Redo Apply (recover managed standby database cancel)
- Online corresponding data files (alter database datafile <NAME> online ; )
- Start Redo Apply (recover managed standby database disconnect)
For a logical standby database, SQL Apply skips over the invalidation redo completely; so, the subsequent corresponding table or index will not be updated. However, future reference to missing data will result in ORA-1403 (no data found). In order to resynchronize the table with the primary table, you need to re-create it from the primary database. Follow the steps described in Oracle Data Guard Concepts and Administration
Note. Basically, you will be using the DBMS_LOGSTDBY.INSTANTIATE_TABLE procedure.
¹Invalidation redo containing information about the NOLOGGING operation and the range of blocks it affects.
²Please also refer to the Data Guard documentation:
http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10823/scenarios.htm#1015741
Wait for next part, I will talk about how to detect sessions generating a lot of Redo.