OBIEE Data Lineage Solution

Tracking OBIEE reporting data

 
BI implementations may use hundreds of data warehouse tables and thousands of reports, and one of the most common requirements in a BI project is the ability to track back the data used in reports. The possibility to quickly identify the set of columns and tables used in a given report or dashboard, or to check which reports may be affected by a change of physical column in the data warehouse, is also crucial for development. This calls for a data lineage solution which is accessible to users and developers,  and allows checking such data on an ad hoc basis.
 
Various vendors offer data lineage solutions, but these can be expensive and vendor-specific. With our simple solution, we combine Catalog Manager and Administration Tool export sources to create an easily accessible solution for tracking data lineage in OBIEE.
 
By implementing the OBIEE data lineage solution, we can check the following:
 

  1. Which physical tables and columns are used in a given report or dashboard
  2. Which reports use given physical columns or tables; this is especially important when modifying an existing table, as any change in the table’s structure must take existing reports into consideration
  3. Which are the most commonly used columns in reports in a given subject area; identifying the most commonly used columns in a report can hint at creating indexes to improve the overall performance of the OBIEE implementation even further.

 

Solution

 
The ClearPeaks OBIEE Data Lineage Solution gathers all the required data lineage information in one place and uses it as a source for OBIEE reports within Data Lineage Subject Area. Two sources are combined to achieve this:
 

  1. List of OBIEE reports and presentation columns used

 
Catalog Manager provides an option to export the list of reports and columns used. The export can be done either with the Catalog Manager UI tool or through the command line utility. We use the latter option, as it allows automation of the whole process later.
 

  1. Mappings of presentation columns back to the physical columns

 
Such mappings can be obtained manually by creating Repository Documentation from Administration Tool’s Repository Documentation utility. The output will be a repository documentation in the form of comma separated values file (.csv). This will contain column mapping from presentation through logical to physical layers, including column formula expressions. Another way to obtain column mappings is by extracting the OBIEE repository in .xml file format through the command line utility. Our solution uses the second option, as the repository file will be used in our automated script.
 
OBIEE Data Lineage Solution
 
Once we have obtained both files, we need to populate the data lineage tables.
 
The data can be transformed and manually inserted into the tables, but in our solution we use a script (which can run on the OBIEE server) that parses the data and inserts it into the tables.
 

Data Lineage Subject Area

 
Once we have populated the data lineage tables and their data model has been mapped in Administration Tool, we can create and run reports in OBIEE using Data Lineage Subject Area and filter the results according to our requirements.
 
OBIEE Data Lineage Solution
 
Let us look at a few of the use cases for the Data Lineage Subject Area:
 

Use case 1. Which data warehouse tables and columns are used in a given report?

 
We would like to know which data warehouse tables and columns are used in a particular report or dashboard. We can create a report with a list of the columns used by a given OBIEE report and their data lineage:
 
OBIEE Data Lineage Solution
 

Use case 2. Which reports use given physical tables?

 
We want to know how many and which reports or dashboards are using given physical tables or particular columns; this could be very useful when assessing the potential impact of column formula or table structure changes on reporting. Using Data Lineage Subject Area we can fetch up the list of OBIEE reports used by a given physical table:
 
OBIEE Data Lineage Solution
 

Use case 3. Which reports use given subject areas?

 
We need to know which reports and dashboards are accessing data from given subject areas. This may be particularly useful when revising users’ access permissions.
 
OBIEE Data Lineage Solution
 

Future Improvements

 
OBIEE Data Lineage Subject Area can also serve as a backbone for further developments, providing complex and complete solutions for managing existing OBIEE project implementations. Here are some examples and potential benefits of merging additional information into the solution:
 

Usage Tracking data – allows analysis of the physical tables used in the most accessed reports, the tables and columns in reports not used by anyone, removing non-used reports or underlying tables.

 

Data warehouse database metadata – such as table size, indexes on columns. This allows for performance analysis of the most heavily used tables and columns by report usage.

 

ETL Data Lineage – additional layer of data lineage tracking – allowing the tracking of data back to the very source – can be achieved by adding ETL transformation data obtained from ETL systems. For example, it is possible to track all the ETL transformations on a given presentation column down to the source system.

 

Catalog metadata – it is possible to extract additional information regarding catalog objects such as user permissions, report owners, last modified date etc., to further enhance the solution usability.

 
Adding all the above components creates a strong core for the OBIEE management dashboard, allowing continuous improvement through:
 

* tracking data from the report to the source database column
* constant analysis of the most used tables in order to improve performance
* checking which users have permissions for given OBIEE reports and dashboards

 
All of the above is accessible from the OBIEE front-end, providing a convenient and quick way to facilitate many daily business-as-usual tasks in BI deployments.
 

Conclusion

 
The ClearPeaks OBIEE Data Lineage Solution can be easily deployed in any project using Oracle Business Intelligence Enterprise Edition. The solution can be run from the command line tools, which makes it possible to create automated jobs to extract and update data on a regular basis.
 
If you think your OBIEE project could benefit from our data lineage solution, please contact us for more information via our web contact form or by leaving your comments below!

Rafal O
rafal.ostaszewski@clearpeaks.com