Tip of the Month: How to have super powers using Data Pump!

A lot of people don’t know several powerful functionalities that we have available when using Data Pump (expdp/impdp), most of the people only use these tools to export and import data (in other words, only to move data), and never notice that it can be used for example to help us to do:

Data Masking
Build a Metadata Repository
Create a version control
Clone Users (Create a new user using and existent user as a template)
Create smaller copies of production
Create your database in a different file structure
Move all objects from one tablespace to another
Move a object to a different schema (A simple example, change a table owner)

Now let’s see how each functionality I mentioned above can be used at real life.

1) Data Masking

In many organizations (I hope so) the DBA’s had the obligation for a security and compliance purpose to mask all sensible information that leaves the production environment to as an example to refresh or create a QA/Test or Dev environment. To help us to address this kind of requirements we could easily use the Enterprise Manager Data Masking Pack (remember it is an extra pack, and consequently you need to pay extra to use it), or as a different option, use the “remap_data” parameter available in Data Pump to help you with this requirement(**this is a new functionality at 11g)!

Let’s use the classic SSN (Social Security Number) example to illustrate how it works:

a) First let’s create the table for the test and load some data on it.
SQL> CREATE TABLE HR.EMPLOYEE
  2  ( EMP_ID   NUMBER(10) NOT NULL,
  3    EMP_NAME VARCHAR2(30),
  4    EMP_SSN  VARCHAR2(9),
  5    EMP_DOB  DATE
  6* )
SQL> /
insert into hr.employee values (101,'Francisco Munoz',123456789,'30-DEC-73');
insert into hr.employee values (102,'Horacio Miranda',234567890,'17-JUL-76');
insert into hr.employee values (103,'Evelyn Aghemio',659812831,'02-OCT-79');
b) The second step will be to create the remap function:
SQL> create or replace package pkg_masking
   2 as
   3 function mask_ssn (p_in varchar2) return varchar2;
   4 end;
   5 /
SQL> create or replace package body pkg_masking
   2 as
   3 function mask_ssn (p_in varchar2)
   4 return varchar2
   5 is
   6 begin
   7 return lpad (
   8 round(dbms_random.value (001000000,999999999)),9,0);
   9 end;
  10 end;
  11 /
This function will take a varchar argument and returns a 9 char. We will use this function to mask all SSN information inside our employee table.
SQL> desc employee
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
EMP_ID                                    NOT NULL NUMBER(10)
EMP_NAME                                           VARCHAR2(30)
EMP_SSN                                            VARCHAR2(9)
EMP_DOB                                            DATE
SQL> select * from employee;
EMP_ID     EMP_NAME                       EMP_SSN   EMP_DOB
---------- ------------------------------ --------- ---------
101        Francisco Munoz                123456789 30-DEC-73
102        Horacio Miranda                234567890 17-JUL-76
103        Evelyn Aghemio                 345678901 02-OCT-79
For this example, all you want to mask is the column EMP_SSN, which contains the SSN of each employee.

b) Now we are going to export the table employees using the expdp tool, and while exporting, we will use the parameter “remap_data” to mask the data for us in the dump file using the function we previously created.
$ expdp hr/hr tables=hr.employee dumpfile=mask_ssn.dmp directory=datapump remap_data=hr.employee.emp_ssn:pkg_masking.mask_ssn
Note: By defect the “remap_data” parameter will use the user doing the export as the owner of the remap function, if the schema owner of the function is different you will need to use the following commad:
$ expdp hr/hr tables=hr.employee dumpfile=mask_ssn.dmp directory=datapump remap_data=hr.employee.emp_ssn:owner.pkg_masking.mask_ssn
c) Now all we need to do is to import the mask_ssn.dmp in our QA/Test or Dev Database and it will magically have the new values there.
SQL> select * from employee;
EMP_ID     EMP_NAME                       EMP_SSN   EMP_DOB
---------- ------------------------------ --------- ---------
101        Francisco Munoz                108035616 30-DEC-73
102        Horacio Miranda                324184688 17-JUL-76
103        Evelyn Aghemio                 638127075 02-OCT-79
Note: you can use the “remap_data” option in the impdp tool if you have a normal export done before 😉 , also remember that you can use it to mask almost everything, but please take in consideration your application requirements and data integrity requirements when using it!

For more information regarding this option and to see another examples, please refer to this paper: http://www.oracle.com/technology/products/database/utilities/pdf/datapump11g2009_transform.pdf

2) Metadata Repository and Version Control

As a DBA, I’m always looking for proactive ways to allow me to be prepared in case of a disaster strike or if an emergency release rollback is required (I love to use always the “What if” methodology), and due to these reasons, have a metadata repository and version control of it is always useful .

But how can I easily create it? Easy, first do a full backup of your database using Datapump.
$ expdp user/password content=metadata_only full=y directory=datapump dumpfile=metadata_24112010.dmp
Note: If you want to create a repository only for objects like procedures, packages, triggers, … , all you need to do is add the parameter “include=<procedures,packages,triggers,…> to your expdp command, I usually include in the dump file name the date of the dump for reference purpose and best practice.

Then use the impdp tool to create the SQL file that will allow you to create all objects in your Database. It will be something like this:
$ impdp user/password directory=datapump dumpfile= metadata_24112010.dmp sqlfile=metadata_24112010.sql
This simple technique will allow you to create your metadata repository easily and also keep a versioning of your database objects as an extra, also if you create your repository (DB) and you want to refresh an object definition (as example let use the table emp from schema “scott”), all you will need to do is an export of the new table definition from your source database and then import it on your target database (your repository) as show bellow:
$ expdp user/password content=metadata_only tables=scott.emp directory=datapump dumpfile= refresh_of_table_emp_24112010.dmp
$ impdp user/password table_exists_action=replace directory=datapump dumpfile= refresh_of_table_name_24112010.dmp

3) Clone a User

In the past when a DBA had the need to create a new user with the same structure (All objects, tablespaces quota, synonyms, grants, system privileges, etc) was a very painful experience, now all can be done very easily using Data Pump, let use as an example that you want to create the user ”Z” exactly like the user “A”, to achieve this goal all you will need to do is first export the schema “A” definition and then import it again saying to the Data Pump to change the schema “A” for the new schema named “Z” using the “remap_schema” parameter available with impdp.
$ expdp user/password schemas=A content=metadata_only directory=datapump dumpfile= A_24112010.dmp
$ impdp user/password remap_schema=A:Z directory=datapump dumpfile= A_24112010.dmp
And your new user Z is now created like your existing user A , that easy!

4) Create smaller copies of production

That is a very common task for a DBA, you are always having a task to create a copy of your Database (for development or test purpose) but your destination server don’t have enough space to create a full copy of it! This can be easily solved with Data Pump, for this example, let say that you only have space for 70% of your production database, now to know how to proceed, we need to decide if the copy will contain metadata only (no data/rows) or if it will include the data also. Let’s see how to do each way:

a) Metadata Only

First do a full export of your source database.
$ expdp user/password content=metadata_only full=y directory=datapump dumpfile=metadata_24112010.dmp
Then, let’s import the metadata and tell the Data Pump to reduce the size of extents to 70%, you can do it using the parameter “transform” available with “impdp”, it represent the percentage multiplier that will be used to alter extent allocations and datafiles size.
$ impdp user/password transform=pctspace:70 directory=datapump dumpfile=metadata_24112010.dmp
Let’s do a test and see if this is really true, first let export any table of my test database (metadata only) and generate the “sql” script to see the normal size of it.
$expdp user/password content=metadata_only tables=user.x_integration_log_det directory=datapump dumpfile=example_24112010.dmp
$impdp user/password content=metadata_only directory=datapump dumpfile=example_24112010.dmp sqlfile=x_24112010.sql
CREATE TABLE "USER"."X_INTEGRATION_LOG_DET"
      ( "BATCH_NO" NUMBER(9,0),
        "SEQUENCE#" NUMBER(9,0),
        "FILENAME" VARCHAR2(200 BYTE),
        "ERROR_MESSAGE" VARCHAR2(2000 BYTE),
        "NO_OF_RECORDS" NUMBER,
        "STATUS" VARCHAR2(2000 BYTE)
      ) PCTFREE 10 PCTUSED 0 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "CUSTSERV_LOG_DET_DATA" ;
Above is the SQL code generated by Data Pump, you can see that the table is going to be created using 65536 for the initial extent and 1048576 for the next extent, now let’s generate it again but using the transform parameter to reduce the size of it to 70% of original size.
$impdp user/password transform=pctspace:70 content=metadata_only directory=datapump dumpfile=example_24112010.dmp sqlfile=x_24112010.sql
CREATE TABLE "USER"."X_INTEGRATION_LOG_DET"
    ( "BATCH_NO" NUMBER(9,0),
      "SEQUENCE#" NUMBER(9,0),
      "FILENAME" VARCHAR2(200 BYTE),
      "ERROR_MESSAGE" VARCHAR2(2000 BYTE),
      "NO_OF_RECORDS" NUMBER,
      "STATUS" VARCHAR2(2000 BYTE)
    ) PCTFREE 10 PCTUSED 0 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 45875 NEXT 734003 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "CUSTSERV_LOG_DET_DATA" ;
Above is the SQL code generated by Data Pump, and you can see that the table is now going to be created using 45875 for the initial extent and 734003 for the next extent, clearly reduced 30% of the original size, in other words, it works.

Please refer to Oracle documentation for more ways to use the transform parameter, you will not regret 😉

b) Metadata and data

First does a full export of your source database using the export parameter “sample”, this parameter specify a percentage of the data rows to be sampled and unload from your source database, in this case let’s use 70%.
$ expdp user/password sample=70 full=y directory=datapump dumpfile=expdp_70_24112010.dmp
Then, all you need to do as the example before is to import it telling the Data Pump to reduce the size of extents to 70%, and that’s it!
$ impdp user/password transform=pctspace:70 directory=datapump dumpfile=expdp_70_24112010.dmp

5) Create your database in a different file structure

This is very easy to archive, all you need to do is use the parameter “remap_datafile” on your import command as the example bellow:
$ impdp user/password directory=datapump dumpfile=example_24112010.dmp remap_datafile=’/u01/app/oracle/oradata/datafile_01.dbf’:’/u01/datafile_01.dbf’

6) Move all objects from one tablespace to another

This is very easy to do it, as the previous example, all you will need to do is use the parameter “remap_tablespace” on your import command as the example bellow:
$ impdp user/password directory=datapump dumpfile=example_24112010.dmp remap_tablespace=OLD_TBS:NEW_TBS

7) Move a object to a different schema

All you will need to do is use the parameter remap_schema as the example bellow when importing it.
$ expdp user/password tables=user.table_name directory=datapump dumpfile=table_name_24112010.dmp
$ impdp user/password directory=datapump dumpfile=table_name_24112010.dmp remap_schema=old_schema:new_schema

Always remember, Data Pump is your good friend, but RMAN still your best friend ;). You can easily guess my next month topic will be… :p

4 thoughts on “Tip of the Month: How to have super powers using Data Pump!”

Kate Johnson says:

April 30, 2010 at 7:10 pm

What version does expdp “sample=” and impdp “transform=” become valid?
expdp user/password sample=70 full=y directory=datapump dumpfile=expdp_70_24112010.dmp
impdp user/password transform=pctspace:70 directory=datapump dumpfile=expdp_70_24112010.dmp

admin says:

April 30, 2010 at 9:20 pm

Dear Kate,

Thank you for your question, they become valid (in other words, were introduced) in the version 10.2.0.1, in the version 10.1.0.1 only the transform parameter was available but only the SEGMENT_ATTRIBUTE and STORAGE options, the PCTSPACE option was only introduced with 10.2.0.1.

For more information please refer to the following 10.2.0.1 Utilities official Documentation: http://download.oracle.com/docs/cd/B19306_01/server.102/b14215/part_dp.htm

Kind Regards,

Francisco Munoz Alvarez

Julio Cesar Correa says:

May 9, 2010 at 5:38 am

Great!
I liked the option 4(create smaller copies),2(version control) and 1) Data Masking .Really,I never had used it.
I had use remap_schema,remap_tablespace and etc.

Really is a useful post!

Thanks Francisco!

JC

Pingback: Data pump | Oracle-dba

Tip of the Month: How to have super powers using Data Pump!

Related

4 thoughts on “Tip of the Month: How to have super powers using Data Pump!”

Leave a Reply Cancel reply