Let start this New Year with a new mentality or a new resolution if you prefer call this way, let stop being reactive and start being proactive. Being proactive will reduce your database administration costs and increase your level of efficiency, and best of all, save your hours of sleep when you are on call.
Why check the problems only when they are critical, or when is too late and the database is down, or the users are screaming?
Being proactive is the best approach to keep your DB healthy and to show your company, or your clients that you really care about them.
Many DBA’s expend most of their time being firefighters only, fixing problems and working on user’s requests all the time. They don’t do any proactive work; this mentality only will cause an overload of work to them, thousands of dollars of overtime, several hours without access to the data to the users, poor performance to the applications, and what is worse of all, several unhappy users thinking that you doesn’t have the knowledge needed to take care of their data.
Let’s mention a small example, you have the archive log area alert set to fire when it is 95% full, and this happens in the middle of the night, some DBA’s will take seriously the alert and solve the problem quickly, others will wait until the next day to take care of it because they are tired, or sleeping, or they are in a place without internet access at the moment the alert arrived. Will be a lot easier if they set a proactive alert to be fire when 75% or 85%, or even better ,take a look in the general health status of the DB before leave their work shift, to try to detect and solve any possible problem before be a real problem and be awake in the middle of the night or during the weekend (Remember how important is your personal and family time). I’ll always recommend to DBA’s to run 2 checklists daily, one in the start of their shift and other before they leave their shift.
I know several DBA’s that complain all the time that they got so many calls when they are on call, but they don’t do anything to solve the root problem, they only expend their time to solve the symptoms.
Here you can find an Oracle checklist script that will help to make your life a little easier (This is not my complete script, but will be a good start for you). This script is a compilation of several normal checklists and you can setup them with your own requirement and thresholds and always remember to have a baseline to compare. This script will not only help you to detect future or current problems, but also will help you to detect possible tuning requirement.
Here is an example of the script first phase outcome:
— ———————————————————————– —
— Oracle Instance Information
— ———————————————————————– —
Cpu_Count 4 | Host_Name OLIVER
Instance_Name prod | Database_Status ACTIVE
Status OPEN | Startup_Time 10-01-2009 19:50
Version 11.1.0.7.0 | Instance_Role PRIMARY_INSTANCE
Database Space (Mb) 36604 | SGA (Mb) 511
Nb. Datafiles 43 | Nb. Tempfiles 1Archive destination LOCATION=E:\oracle\oradata\prod\archive
Database log mode ARCHIVELOG
Background Dump Dest d:\oracle\diag\rdbms\prod\prod\trace
Spfile D:\ORACLE\PRODUCT\11.1\PROD\DATABASE\SPFILEPROD.ORA
Redo size (Kb) 102400— ———————————————————————– —
— Instance CheckList —
— ———————————————————————– —
Instance Status OK | Listener Status OK
— ———————————————————————– —
— Performance Memory CheckList —
— ———————————————————————– —
Total Sessions < 700 OK – 19
Active sessions number <15 OK – 9
Data Buffer Hit Ratio > 80 OK – 97
L.Buffer Reload Pin Ratio > 99 OK – 99
Row Cache Miss Ratio < 0.015 NO – 1.351
Dict.Buffer Hit Ratio > 80 OK – 99
Log Buffer Waits = 0 
60; NO – 110
Log Buffer Retries < 0.0010 OK – 0
Switch number (Daily Avg) < 5 OK – 1
Jobs Broken = 0 OK 0
Shared_Pool Failure = 0 OK – 0
— ———————————————————————– —
— Storage CheckList —
— ———————————————————————– —
Dba_Tablespaces Status OK | V$Log Status OK
V$Datafile Status OK | V$Tempfile Status OK
V$Recover_File OK | V$Recovery_Log OK
Tablespace in Backup Mode = 0 OK – 0
Tablespace < 95% OK- 0
Objects Invalid = 0 NO – 147
Indexes unusable = 0 OK – 0
Trigger Disabled = 0 NO- 5
Constraint Disabled = 0 NO – 2
Objects close max extents = 0 OK – 0
Objects can not extent = 0 NO – 552
User Objects on Systems = 0 NO – 26
FK Without Index = 0 NO – 138
— ———————————————————————– —
— Datagard CheckList —
— ———————————————————————– —
Datagard Errors = 0 OK- 0
Datagard Gap = 0 OK – 0
Archives not Aplied < 5 OK – 2
— ———————————————————————- —
— Installed options :
— ———————————————————————- —
– Objects option
– Connection multiplexing option
– Connection pooling option
– Database queuing option
– Incremental backup and recovery option
– Instead-of triggers option
– Parallel load option
– Proxy authentication/authorization option
– Plan Stability option
– Coalesce Index option
– Transparent Application Failover option
– Sample Scan option
– Java option
– OLAP Window Functions option
You also have several tools available in the market that can help you to monitor and setup your DB alerts, and help you with the proactive monitoring like: Grid Control, Enterprise Manager, Insider (FourthElephant), Spotlight (Quest) or if you prefer, your own scripts. The idea is to use them always on a proactive way, never reactive.
Let’s change our mentality, let stop being a firefighter and start to be a real hero!
Cheers,
Francisco Munoz Alvarez