How to normalize and load a source file | In Informatica Powercenter

.

There are existing information systems such as CRM’s (Siebel, etc) and ERP’s (EBS, SAP, etc) where the information is stored in a way in which it is easy to extract information and feed our data warehouses, however a large number of customers still manage their information in so-called flat files (i.e.: .csv, .xls, etc) which are not as user-friendly.

Such situations still commonly exist; therefore, the purpose of this article is to explain the process to resolve this.

Continue reading this post >

Pivot Data in Standard Query Language (SQL)

.
Pivoting data is one of the most common techniques in BI to present and manipulate data. While doing this using an ETL or BI tool is usually very simple, doing it in SQL is a more tedious task.
In this article we will see two different ways to pivot data with SQL. The first one is the classical way, while the second one uses the PIVOT clause available in Oracle 11g.
Scenario
The goal is to get the data in the following table as a data source…

Pivoting data is one of the most common techniques in BI used to present and manipulate data. Using an ETL or BI tool is usually a very simple solution; accomplishing it in SQL is more tedious.

In this article we will see two different ways to pivot data with SQL. The first one is the classical way, while the second one uses the PIVOT clause available in Oracle 11g.

Continue reading this post >

Modeling Related Dimensions

.
Properly designing the data model of a data warehouse is well known to be key for the success of a Business Intelligence project. Today we will discuss a typical scenario that I’ve come across many times when discussing designs with other people: Modeling Related Dimensions.
Scenario
Two dimensions may have a strong relationship. For instance, we could sell personalised products for each customer. So each product is related to one customer only.

Properly designing the data model of a data warehouse is well known to be key for the success of a Business Intelligence project. Here we will discuss a typical scenario that I’ve come across many times when discussing designs with other people: Modeling Related Dimensions.

Scenario

Two dimensions may have a strong relationship. For instance, we could sell personalised products for each customer. So each product is related to one customer only. Continue reading this post >

Benefits of Data Aggregation in KPI Reporting

.
In today’s business world, decisions are made based on vast amounts of data Therefore, quick access to data is vital to make the right decisions at the right time.
The conflicting objectives when trying to achieve individual goals in the two asserts above include:
· Querying huge amounts of data
· A low response time
This article, based on a real case, highlights the importance of aggregating data when showing key performance indicators (KPI) in dashboards.
Scenario
The organisation is the financial department of a telecommunications company in this specific scenario. Their goal is to maintain a set of dashboards presenting a group of previously defined KPIs.
The data source is the financial module of the corporate ERP. An ETL loads the required source data into a data warehouse. For the sake of this scenario, four stars are being loaded:
· Employee expenses: Contains the individual expense transactions for each employee (e.g. Salary payment, dismissal payment, etc).
· Customer revenues: Contains the incoming revenue transactions derived from the services and products contracted by the customers.
· Other revenues: Contains the incoming revenue transactions derived from other concepts.
· Supplier expenses: Contains the supplier payment transactions.
The financial dashboard requirements are:
· Margin report (Revenues vs. Expenses)
· Revenue & Expense trend in the last one/three/five years
· Current vs. Past year revenue & expense comparison
· Current vs. Past month revenue & expense comparison
Solely creating the required stars within the data warehouse and having the BI tool querying them and building the dashboards will result in poor performance due to:
· High volumes stored in the data source (ERP)
· KPIs are built based on multiple stars (multiple stars need to be queried, then data needs to be calculated to obtain the required KPI)
The solution: Aggregation
Aggregation is going to reduce the number of rows to be queried to obtain the KPI values. By doing this, the time required to refresh the dashboards will reduce dramatically, thus reducing resource consumption and end user wait time.
Aggregation consists on rolling up the vast amounts of detail data into higher levels of the dimension hierarchies. For instance, we do not need to store each individual transaction. Instead, we can group big sets of rows, based on the month these transactions occurred and the transaction type, into just a few rows.
In our scenario, the number of rows can be reduced from millions of rows in the bigger fact tables down to a few thousand. The impact on refresh time for the Margin report is dramatic, as you can see in the statistics figure below:
Taking the solution one step beyond: The KPI repository
The aggregation solution has proven to reduce the amount of time spent in refreshing reports by reducing the number of rows to be queried. However, it does not solve the overall issue as KPIs are typically based on calculations involving multiple stars.
If performance still needs to be improved, there is an option to eliminate the calculations at report refresh time. In this case, the ETL should take care of the calculations, and the results should be pre-stored into a new star (the KPI repository).
An example of KPI repository table is shown in the following figure:
The benefits of using the KPI repository are:
· Performance
The execution and refresh time of the dashboard is shorter as:
· The data is pre-calculated.
· The query retrieves fewer rows than when querying the detail transaction stars.
· The number of fact tables to be queried is reduced from four to one. Therefore, in order to get a KPI value, one only fact table needs to be queried and no calculations are needed at runtime.
· Customised calculations
The ETL process performs a number of specific calculations and stores the results into the new fact table. The calculations that can be done during the ETL process are typically much more powerful and complex than those that can be done at runtime by the BI tool.
In our scenario the new fact table includes two customised calculations:
· Variance Previous Month
· Variance Previous Year
Conclusion
Despite the extra cost to the data warehouse project, the benefits of using aggregations in a data warehouse overcome any difficulties of dealing with them.
Using the KPI repository can take performance one step beyond, pre-calculating the results, thus minimising the time needed to refresh the reports.
It is strongly recommended to use aggregations to speed up the queries and thus decrease the report refresh time and increase the end user satisfaction.

In today’s business world, decisions are made based on vast amounts of data; therefore, quick access to data is vital to make the right decisions at the right time.

The conflicting objectives when trying to achieve individual goals in the two asserts above include:

· Querying huge amounts of data

· A low response time

This article, based on a real case, highlights the importance of aggregating data when showing key performance indicators (KPI) in dashboards.

Continue reading this post >

SQL Override: Mapping Reusability in Informatica

.

In a not so remote past, one of the biggest frustrations for ETL designers was having to replicate a job many times in order to accommodate slight differences depending on the data source. Whether it’s a change in the source table definition or a different clause in the extract SQL, today’s ETL tools have embed capabilities that allow for the reuse of metadata objects to save time and improve maintainability of the workflows.

In this post, I will describe with simple examples how to best reuse ETL mappings in Informatica PowerCenter using the SQL override capability.

Continue reading this post >

privacy policy - Copyright © 2000-2010 ClearPeaks

topnav