Sometimes because of the nature of the data in our system we need to have different types of aggregations (sum, max, last…) for one single measure. Therefore the aggregation type needs to be different depending on the dimensions (Date, Product…).One clear example of this is in dealing with stocks. Stocks have different kinds of aggregations depending on how you look at it.The aim of this article is to know how to deal with these different aggregation level measures and implementing a solution in the OBI repository to solve the issues that those measures bring.
In today’s business world, decisions are made based on vast amounts of data Therefore, quick access to data is vital to make the right decisions at the right time.
The conflicting objectives when trying to achieve individual goals in the two asserts above include:
· Querying huge amounts of data
· A low response time
This article, based on a real case, highlights the importance of aggregating data when showing key performance indicators (KPI) in dashboards.
The organisation is the financial department of a telecommunications company in this specific scenario. Their goal is to maintain a set of dashboards presenting a group of previously defined KPIs.
The data source is the financial module of the corporate ERP. An ETL loads the required source data into a data warehouse. For the sake of this scenario, four stars are being loaded:
· Employee expenses: Contains the individual expense transactions for each employee (e.g. Salary payment, dismissal payment, etc).
· Customer revenues: Contains the incoming revenue transactions derived from the services and products contracted by the customers.
· Other revenues: Contains the incoming revenue transactions derived from other concepts.
· Supplier expenses: Contains the supplier payment transactions.
The financial dashboard requirements are:
· Margin report (Revenues vs. Expenses)
· Revenue & Expense trend in the last one/three/five years
· Current vs. Past year revenue & expense comparison
· Current vs. Past month revenue & expense comparison
Solely creating the required stars within the data warehouse and having the BI tool querying them and building the dashboards will result in poor performance due to:
· High volumes stored in the data source (ERP)
· KPIs are built based on multiple stars (multiple stars need to be queried, then data needs to be calculated to obtain the required KPI)
The solution: Aggregation
Aggregation is going to reduce the number of rows to be queried to obtain the KPI values. By doing this, the time required to refresh the dashboards will reduce dramatically, thus reducing resource consumption and end user wait time.
Aggregation consists on rolling up the vast amounts of detail data into higher levels of the dimension hierarchies. For instance, we do not need to store each individual transaction. Instead, we can group big sets of rows, based on the month these transactions occurred and the transaction type, into just a few rows.
In our scenario, the number of rows can be reduced from millions of rows in the bigger fact tables down to a few thousand. The impact on refresh time for the Margin report is dramatic, as you can see in the statistics figure below:
Taking the solution one step beyond: The KPI repository
The aggregation solution has proven to reduce the amount of time spent in refreshing reports by reducing the number of rows to be queried. However, it does not solve the overall issue as KPIs are typically based on calculations involving multiple stars.
If performance still needs to be improved, there is an option to eliminate the calculations at report refresh time. In this case, the ETL should take care of the calculations, and the results should be pre-stored into a new star (the KPI repository).
An example of KPI repository table is shown in the following figure:
The benefits of using the KPI repository are:
The execution and refresh time of the dashboard is shorter as:
· The data is pre-calculated.
· The query retrieves fewer rows than when querying the detail transaction stars.
· The number of fact tables to be queried is reduced from four to one. Therefore, in order to get a KPI value, one only fact table needs to be queried and no calculations are needed at runtime.
· Customised calculations
The ETL process performs a number of specific calculations and stores the results into the new fact table. The calculations that can be done during the ETL process are typically much more powerful and complex than those that can be done at runtime by the BI tool.
In our scenario the new fact table includes two customised calculations:
· Variance Previous Month
· Variance Previous Year
Despite the extra cost to the data warehouse project, the benefits of using aggregations in a data warehouse overcome any difficulties of dealing with them.
Using the KPI repository can take performance one step beyond, pre-calculating the results, thus minimising the time needed to refresh the reports.
It is strongly recommended to use aggregations to speed up the queries and thus decrease the report refresh time and increase the end user satisfaction.
In today’s business world, decisions are made based on vast amounts of data; therefore, quick access to data is vital to make the right decisions at the right time.The conflicting objectives when trying to achieve individual goals in the two asserts above include:· Querying huge amounts of data· A low response timeThis article, based on a real case, highlights the importance of aggregating data when showing key performance indicators (KPI) in dashboards.
25 October, 2010