Oracle Data Profiling is a data investigation and quality monitoring tool. It allows business users to assess the quality of their data through metrics, to discover or infer rules based on this data and to monitor the evolution of data quality over time.
In this article I will explain which steps we have to follow to be able to investigate and profile data.
Step 1: Configure the Metabase and the Connections
A Metabase stores data objects. It also stores any information related to the stored data, called Metadata. To configure it please:
1. Select Start > All Programs > Oracle > Oracle Data Profiling and Quality > Metabase Manager to Log in to the Metabase Manager as the Metabase Administrator (madmin)
2. Select Tools > Add Metabase from the menu
3. Add a metabase named clearpeaksdq, with the default pattern and a Public Cache Size of 16 Mb, and then click OK
4. Select Tools > Add User from the menu
5. Add a User named CPUser with the password clearpeaks, as shown below, then click OK
6. Select Tools > Add Metabase User to add the CPUser user to the clearpeaksdq metabase, as shown below, and then click OK
7. Select Tools > Add Loader Connection
8. Create a loader connection for Oracle Database. Use valid TNS Name stored in tnsnames.ora file for Oracle Database client
Step 2: Import Data and Create Entities
This step describes how to import data into a Metabase and create a Metabase object, called an Entity. You can create one of two types of Entities:
Entity: Referred to as a real Entity, this type contains metabase data imported from an external data source
Dynamic Entity: This Entity type links directly to an external data source
To import the Entity please:
1. Login to Oracle Data Quality client (Start > All Programs > Oracle > Oracle Data Profiling and Quality > Oracle Data Profiling and Quality) using the following information:
- Repository: primary
- Metabase: clearpeaksdq
- Username: CPUser
- Password: clearpeaks
Main Elements of Oracle Data Profiling and Quality Control Center
2. Select Analysis > Create Entity from the menu
3. Select the Delimited Loader Connection, and then click Next
If you want to reduce the number of connections displayed, under Connection list currently filtered on, type a new filter expression and click Change filter. For example, if you only want to list connections that start with “cust”, your filter would look like the graphic example displayed.
4. Select one or multiple data source file names in the list. You can use Add SQL WHERE clause when it’s necessary to retrieve data selectively
5. Click Next
6. Select the load parameters for the Entity creation process:
7. Click Next to continue.
8. In Confirm Settings, review the list of settings and click Finish to schedule the Entity creation job. You can schedule a job to:
- Run Now - run the job immediately.
- Run Later - schedule to run at another date and time.
Cancel - do not create the Project
Step 3: Monitor the Entity Creation Process
If you are importing a large volume of data, the Entity creation (data load) process may take some time to complete, depending on the job parameters and Load Connection settings selected. To monitor the Entity creation process:
1. From the main menu, select Window > New Window
Select Analysis > Background Tasks. All background tasks display in the new window
Step 4: Set Up Project
Oracle Data Profiling Projects contain references to one or more Entities (Attributes and Permanent Joins) stored in a Metabase and allow you to logically group objects for the purpose of data investigation. To create an Oracle Data Profiling Project from the Projects tab:
1. Click the Projects tab
2. Right-click Profiling and select Create Project....
3. Select one or multiple data source file names in the list. You can use Add SQL WHERE clause when it’s necessary to retrieve data selectively
4. Click Next
See the load parameters for the Entity creation process:
The Create Profiling Project wizard displays:
- In Name, type a name for the Project
- In Description, type a brief description of the Project
- Place a checkmark next to the Entities to include
After 4 quick steps you are able to profile you data. By using Oracle Data Profiling, you can increase your data profiling efficiency by 90% or more over manual methods, and eliminate the need to design data samples or build queries and run analyses on production systems. In next article I will try to present ways of effective data profiling.