Unleash the power of R to forecast data from your OLAP Cubes with Tableau


R is a programming language and software environment for statistical computing, and the leading tool amongst statisticians, data miners and data scientists for performing sophisticated statistical analysis, data discovery and predictive analytics on any dataset: linear and non-linear modelling, classification, clustering, time-series analysis and so on.

R offers a wealth of functions and, as an open source project, its capabilities are constantly enhanced and extended through user-created packages; this is why it is so popular and just keeps growing.

In this article, we will see how R can be connected to Tableau to perform highly customizable forecasting of your data from OLAP Cubes. With just a few neat tricks, your users will benefit from it without needing an advanced coding background, and will be able to visualize the results in the Tableau charts they like the most.

 

1. Forecasting with Tableau

Tableau already comes with a forecasting feature; it uses a technique known as exponential smoothing, which tries to find a regular pattern in the measures that can be continued into the future.

To access this feature, simply drag a date and a measure to your sheet and right-click on the canvas. In this example, we are using Tableau’s Superstore sample dataset.

Figure 1: Tableau’s Superstore sample dataset.


Select Forecast ➜ Show Forecast, and the estimated line will appear. As you can see in the picture below, it's very simple: Tableau essentially fitted a straight line, and it also shows the confidence interval. Right-click on the canvas again to access the Forecast Options, and an options pane will pop up.

Figure 2: Forecast Options


In the options pane (1, below), you can set a few parameters, such as the forecast length and the aggregation type; you can also change or remove the confidence intervals (called prediction intervals) and adjust the forecast model: it can be automatic, automatic without seasonality, or custom. If you choose custom, you will be able to select the type of Trend and Season: None, Additive or Multiplicative.

Figure 3: Forecast Options, Summary and Models


By right-clicking again and selecting Describe Forecast, you will be shown a pane with additional forecast information, as you can see in 2 and 3 above.

 

2. Why use R instead?

What we have described so far is basically everything that Tableau can offer when it comes to forecasting. It works pretty well, but it’s not really customizable and, above all, it doesn’t work with OLAP Cubes!

OLAP Cubes have many advantages, but unfortunately Tableau doesn't allow us to do everything we can do with a “traditional” data source, and forecasting is one of them (you can find a list of other OLAP-related limitations here).

Luckily for us, Tableau is such a great tool that, despite not being able to perform forecasting on cubes, it offers us an even more powerful alternative: a smooth integration with R.

 

3. Integrating R in Tableau

There are four functions that allow Tableau to communicate with R: SCRIPT_REAL, SCRIPT_STR, SCRIPT_INT and SCRIPT_BOOL. They differ only in their return type, as their names suggest.

They work very simply and take at least two parameters. The first one is an actual R script: pure R code that is passed to the R engine for execution, and that returns the last element evaluated. From the second argument onwards, you specify the values to be passed on; in the script, you refer to them as .arg1, .arg2, and so on, following the order in which they are declared.

R will take these values, interpret them as vectors (R basically only works with vectors and their derivatives), and return a vector of data that Tableau can use just like any other calculated field.
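As a minimal illustrative sketch (assuming a numeric field called [Sales] in your workbook; any aggregated measure would do), a calculated field that simply doubles each value in R and returns it to Tableau could look like this:

SCRIPT_REAL("
# .arg1 arrives as a vector of the aggregated values
2 * .arg1
",
SUM([Sales])
)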

 

4. Connecting R to Tableau

In order to communicate, R and Tableau must be connected, and this is really easy to do:

1. Download R from here or R Studio from here, and install it

2. From the R console, download the Rserve package and run it:

install.packages("Rserve");
library(Rserve);
Rserve();

3. In Tableau, select Help ➜ Settings and Performance ➜ Manage External Service Connections

4. Enter a server name or an IP address (on the same machine, use localhost or 127.0.0.1) and select port 6311

5. Test the connection by clicking on the “Test Connection” button, and press OK if you get a positive message

You can also automate the start of Rserve by creating an R script with the code mentioned in point 2 (without repeating the installation), saving it as a .R file, and calling it from a .bat file with "start Rscript myRserveScript.R". You'll find this pretty useful, especially in a server environment, because you can schedule its execution when the machine starts via the Task Scheduler, or by copying a shortcut to %AppData%\Microsoft\Windows\Start Menu\Programs\Startup (in Windows).
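A minimal sketch of the two files (the names myRserveScript.R and startRserve.bat are only examples):

myRserveScript.R:

# Load Rserve and start the server, listening on the default port 6311
library(Rserve);
Rserve();

startRserve.bat:

start Rscript myRserveScript.R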

After Rserve has started and Tableau is successfully connected, you can start using the four SCRIPT_* functions to exploit R's power.

 

5. Forecasting in R

To perform forecasting in R, you have to download a user-made package called “forecast”.

To do so, simply write install.packages("forecast") in your R console, and load it by writing library(forecast) whenever you need it.

It offers various types of fully customizable forecast methods, such as ARIMA, naïve, drift, exponential smoothing (ETS) and Seasonal-Trend decomposition by Loess (STL), seasonal or not, with a number of different parameters.

For more details about the package, check the author's slides, the package's help PDF, or each function's help page by typing ?func_name in the console (e.g. ?stlf).
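As a quick self-contained sketch in the R console (using R's built-in monthly AirPassengers dataset instead of our cube data), a 12-period forecast looks like this:

# Install once, then load the package
install.packages("forecast");
library(forecast);

# AirPassengers is a built-in monthly time series (frequency = 12)
fcast <- stlf(AirPassengers, h = 12);   # forecast the next 12 months
fcast$mean                              # the point forecasts
plot(fcast)                             # series, forecast and prediction intervals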

 

6. Use case

In our use case, we will take a series of monthly measures from the last 3 years and forecast them with different frequency/period combinations. We will use the function called stlf, which, without getting bogged down in details, gives us more freedom when playing with periods and frequencies whilst still providing a solid result.

In the picture below, you can see our data, from January 2014 to October 2016.

Figure 4: Data, from January 2014 to October 2016.


We dragged our date dimension (Year Number Month Number) to the Columns shelf, and put our desired measure on the Rows shelf. The trend is not regular, but we can see that the line has a few peaks and generally tends to grow.

 

7. Writing the R scripts

Let’s see how to write the R scripts that perform the forecast.

Firstly, we need to know what values we will be passing: our measure (for R to accept it, it must be aggregated; coming from the Cube it already is, and if not, the ATTR function can help) and two integer parameters (create them in Tableau): the number of periods to forecast and the frequency of the season.

Let's use them in this order, calling them Forecast Measure, ForecastPeriods and Freq. The calculated field should initially look something like this:

SCRIPT_REAL("
data <- .arg1;
nulls <- length(data[is.na(data)]);
data <- data[!is.na(data)];
freq <- .arg3[1];
periods <- .arg2[1];
",
[Forecast Measure],[ForecastPeriods],[Freq]
)

In the first argument, we have already set up the R variables that collect the values passed to the script (this is not mandatory, it just keeps the code cleaner). Remember that R works with vectors: notice how the first one (data) takes the entire .arg1, because we want the whole array of measures, while the parameters (simple integers) need to be taken from the first element of the vector that Tableau sends. We then filter the data vector to remove any null values, and we save their number in a variable called nulls (we will need it later).

We now perform the forecasting. Add the following lines to the script:

library(forecast);
time <- ts(data,frequency=freq);
fcast <- stlf(time, h=periods);

The first line simply loads the forecast library; we then create a time series object by passing the measure and the frequency parameter (for further customization options, check the resources already mentioned). Finally, we produce our forecast by feeding the stlf function with the time series object and the periods parameter.

Our forecast is ready. If we use the same data and execute the script in R, the console output looks like this:

The interesting columns are Forecast, Lo 80, Hi 80, Lo 95 and Hi 95. The first one contains the actual forecasted values; there are 12 of them because we set periods = 12 (and freq = 12, too). The others are the bounds of the prediction intervals (the confidence levels can be changed when calling the function).
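If you want to inspect the same table yourself, you can print the forecast object or pull out its standard components:

print(fcast)     # one row per forecasted period: Point Forecast, Lo 80, Hi 80, Lo 95, Hi 95
fcast$mean       # the point forecasts (the Forecast column above)
fcast$lower      # matrix with the lower 80% and 95% bounds
fcast$upper      # matrix with the upper 80% and 95% bounds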

Now let's complete our script. There is one thing we need to remember: Tableau expects to receive the same number of values that it sent, so we append our forecasted values to the original series, properly shifted, together with rep(NaN, nulls), a vector of nulls (we know how many there were because we saved the count). Add these lines to the script:

n <- length(data);
result <- append(data[(periods+1):n],fcast$mean);
result <- append(result,rep(NaN,nulls));
result

The result that we want is fcast$mean, which corresponds to the column called "Forecast" in the R output that we saw. We append it, together with any null values, to the segment of original data that starts at position periods + 1, in order to get a vector of the original size (note: by doing this, we won't visualize the first periods elements).
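As a tiny worked example of this bookkeeping (hypothetical sizes, not the Superstore numbers): if Tableau sent 36 values of which 2 were null, and we forecast 12 periods, the lengths add up like this:

n <- 34                  # non-null values left after filtering (36 - 2)
nulls <- 2               # null values we removed and counted
periods <- 12            # months to forecast
kept <- n - periods      # 22 original values kept (positions 13 to 34)
kept + periods + nulls   # 22 + 12 + 2 = 36, exactly the length Tableau expects back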

That’s it! Your new calculated field should look like the one below. Drag it to the row field, and you get your forecast.

SCRIPT_REAL("
data <- .arg1;
nulls <- length(data[is.na(data)]);
data <- data[!is.na(data)];
freq <- .arg3[1];
periods <- .arg2[1];

library(forecast);
time <- ts(data,frequency=freq);
fcast <- stlf(time, h=periods);
n <- length(data);
result <- append(data[(periods+1):n],fcast$mean);
result <- append(result,rep(NaN,nulls));
result",
[Forecast Measure],[ForecastPeriods],[Freq]
)

Figure 5: Complete R script and result


 

8. Shifting time measures

Now the first issues appear. Tableau received a vector from R of the same size as the one it passed, so each date-measure value is replaced by the new one coming from the calculated field. The problem is that the date hasn't changed!

Normally, this could be fixed by creating a new calculated date with the help of the DATEADD function, but since we can't add dimensions to cubes, we have to work around the issue by getting the dimension value as a measure with an MDX formula, and then applying DATEADD to it.

Create a new calculated member, and set it as:

[Cube_name].[Dimension_name].MemberValue

Then create a calculated field like this:

DATE(DATEADD('month',[ForecastPeriods],DATE([YearNumberMonthNumber])))

Where [YearNumberMonthNumber] is (for example) the name of your newly created calculated member. This will give you a measure with the date shifted according to the forecast period you selected.

Figure 6: Tableau's Error


Drag this new field on top of the original date column and… your chart will crash!

Why? You've just created a new shifted date for every original date value, so your data is now "split" into "columns" of one value each, one for each shifted month, and R will receive them one by one instead of getting an entire vector to process. This explains the error "series is not periodic or has less than two periods": what R actually sees is a series of independent single-value vectors.

To fix this, there's a simple but neat trick: add your original date field to "Detail", so that it is available in the sheet, and edit your calculated forecast field's table calculation as shown below.

Figure 7: Table Calculation - Forecast (Months)

This makes Tableau process the calculated field at the correct level of detail, so that R receives the data as a whole vector and produces the desired result.

That’s it! You’ve just forecasted your OLAP cube!

Figure 8: OLAP cube forecasted


 

9. Adding confidence intervals

You can also add confidence intervals to improve your visualization: just create two more measures, copying the one you created for the forecast, and change fcast$mean to fcast$lower[,1], fcast$lower[,2], fcast$upper[,1] or fcast$upper[,2], depending on the bound that you want to show (remember that the defaults are 80% and 95%, but these can be changed in the R function call).
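For example, here is a sketch of the calculated field for the upper 95% bound; it is identical to the forecast field except that it returns fcast$upper[,2] instead of fcast$mean:

SCRIPT_REAL("
data <- .arg1;
nulls <- length(data[is.na(data)]);
data <- data[!is.na(data)];
freq <- .arg3[1];
periods <- .arg2[1];

library(forecast);
time <- ts(data,frequency=freq);
fcast <- stlf(time, h=periods);
n <- length(data);
result <- append(data[(periods+1):n],fcast$upper[,2]);
result <- append(result,rep(NaN,nulls));
result",
[Forecast Measure],[ForecastPeriods],[Freq]
)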

In the picture below you can see the forecast with the 95% bounds (fcast$upper[,2] and fcast$lower[,2]), and the trend line (calculated by Tableau).

Figure 9: Forecast final visualization


 

10. Conclusions

In this guide we have shown how R can help you build great forecast charts for your OLAP Cube data, but this isn't the only benefit: we gained a much higher degree of customization, and overcame Tableau's limitations when it comes to defining parameters, methods or entire algorithms.

Once Tableau is connected to R, its potential is virtually unlimited, and it can enhance your charts in countless ways, from any source (not necessarily OLAP Cubes) and with any goal: just send the data to R and let it do the job, whatever it is! Forecasting, clustering, classification, or anything else with an R package written for it – you could even write one yourself! Then visualize the result in Tableau, and adjust the look and feel of your dashboard with the powerful features that you love.

Click here if you would like to receive more information about the topic!

Informatica Performance – Optimization techniques


 

Informatica provides the market's leading data integration platform. ETL mappings are designed to load data into the data warehouse environment to achieve better reporting, which in turn helps you to understand business trends better. The major problem faced by anyone working with Informatica ETL is designing mappings that don't compromise performance: often we end up creating a mapping that achieves the functionality but suffers in terms of performance.

This article explains how to identify performance bottlenecks and the advanced optimization techniques available in Informatica to overcome them.

The following topics are discussed in detail:

* Optimizing sources, targets and buffer blocks:
Performance bottlenecks can occur when the Integration Service reads from a source database or writes to a target database. Depending on the source and target data, you may need to increase or decrease the buffer block size. In this article we list the steps you have to take to ensure that sources and targets are utilized to an optimum level, and how to calculate the buffer block size.

* Pushdown Optimization (PDO):
The Pushdown Optimization option enables data transformation processing to be pushed down into any relational database, to make the best use of database processing power.

* Session Partitioning & Dynamic Session Partitioning:
The Session Partitioning option increases the performance of PowerCenter through parallel data processing. With dynamic partitioning, the Integration Service determines the number of partitions to create at run time, based on factors such as source database partitions or the number of nodes in a grid.

* Concurrent Workflow Execution:
A workflow configured for Concurrent Execution can run as multiple instances concurrently. In the article we illustrate this functionality by using a workflow that is configured with multiple instances to process data belonging to multiple sites simultaneously.

* Load Balancing:
Load balancing is a mechanism that distributes workloads across the nodes in the grid. In the article we see how assigning service levels to workflows helps to achieve load balancing.

Click to read the full article: Informatica performance - Optimization techniques!

 

Data Mining & Business Intelligence


 

The term data mining refers to one of the processes involved in the task of extracting knowledge from a database, also known as KDD (Knowledge Discovery in Databases). By extension, however, it is often used to refer to the whole KDD process because of its commercial appeal. Understanding data mining as a KDD sub-process, we could define the term as the process of extracting underlying knowledge from a large volume of data.

It is a recent development directly linked to the scientific fields of mathematics (mainly statistics), computer science and artificial intelligence. It can be supported by different Business Intelligence systems, from which we can obtain several advantages!

In this article we're highlighting how a Business Intelligence system is a great starting point for the data mining process, and how data mining can be used for process optimization.

The topics covered are:

  1. Definition
  2. Objectives and challenges
  3. KDD Process Phases
  4. Applications:

➀ Data mining & Business Intelligence
➁ Extracting knowledge from unstructured data
➂ Optimization of processes

Click this link to read the full article on data mining: Data Mining & Business Intelligence

OBIEE Data Lineage Solution


Tracking OBIEE reporting data

BI implementations may use hundreds of data warehouse tables and thousands of reports, and one of the most common requirements in a BI project is the ability to track the data used in reports back to its source. The possibility to quickly identify the set of columns and tables used in a given report or dashboard, or to check which reports may be affected by a change to a physical column in the data warehouse, is also crucial for development. This calls for a data lineage solution that is accessible to users and developers, and allows checking such data on an ad hoc basis.

Various vendors offer data lineage solutions, but these can be expensive and vendor-specific. With our simple solution, we combine Catalog Manager and Administration Tool export sources to create an easily accessible solution for tracking data lineage in OBIEE.

By implementing the OBIEE data lineage solution, we can check the following:

  1. Which physical tables and columns are used in a given report or dashboard
  2. Which reports use given physical columns or tables; this is especially important when modifying an existing table, as any change in the table’s structure must take existing reports into consideration
  3. Which are the most commonly used columns in reports in a given subject area; identifying these columns can suggest creating indexes to improve the overall performance of the OBIEE implementation even further.

Solution

The ClearPeaks OBIEE Data Lineage Solution gathers all the required data lineage information in one place and uses it as a source for OBIEE reports within a Data Lineage Subject Area. Two sources are combined to achieve this:

  1. List of OBIEE reports and presentation columns used

Catalog Manager provides an option to export the list of reports and columns used. The export can be done either with the Catalog Manager UI tool or through the command line utility. We use the latter option, as it allows automation of the whole process later.

  2. Mappings of presentation columns back to the physical columns

Such mappings can be obtained manually by creating repository documentation with the Administration Tool's Repository Documentation utility. The output is a comma-separated values (.csv) file containing the column mappings from the presentation through the logical to the physical layer, including column formula expressions. Another way to obtain the column mappings is to extract the OBIEE repository in .xml format through the command line utility. Our solution uses the second option, as the repository file will be used in our automated script.


Once we have obtained both files, we need to populate the data lineage tables.

The data can be transformed and manually inserted into the tables, but in our solution we use a script (which can run on the OBIEE server) that parses the data and inserts it into the tables.

Data Lineage Subject Area

Once we have populated the data lineage tables and their data model has been mapped in Administration Tool, we can create and run reports in OBIEE using Data Lineage Subject Area and filter the results according to our requirements.


Let us look at a few of the use cases for the Data Lineage Subject Area:

Use case 1. Which data warehouse tables and columns are used in a given report?

We would like to know which data warehouse tables and columns are used in a particular report or dashboard. We can create a report with the list of columns used by a given OBIEE report and their data lineage.



Use case 2. Which reports use given physical tables?

We want to know how many and which reports or dashboards use given physical tables or particular columns; this could be very useful when assessing the potential impact of column formula or table structure changes on reporting. Using the Data Lineage Subject Area we can fetch the list of OBIEE reports that use a given physical table.



Use case 3. Which reports use given subject areas?

We need to know which reports and dashboards access data from given subject areas. This may be particularly useful when revising users' access permissions.



Future Improvements

OBIEE Data Lineage Subject Area can also serve as a backbone for further developments, providing complex and complete solutions for managing existing OBIEE project implementations. Here are some examples and potential benefits of merging additional information into the solution:

Usage Tracking data – allows analysis of the physical tables used in the most accessed reports, identification of tables and columns not used by any report, and removal of unused reports or underlying tables.

Data warehouse database metadata – such as table sizes and column indexes; this allows performance analysis of the tables and columns most heavily used by reports.

ETL data lineage – an additional layer of lineage tracking that follows the data back to its very source; it can be achieved by adding ETL transformation data obtained from the ETL systems. For example, it is possible to track all the ETL transformations on a given presentation column down to the source system.

Catalog metadata – it is possible to extract additional information about catalog objects, such as user permissions, report owners, last modified date etc., to further enhance the solution's usability.

Adding all the above components creates a strong core for the OBIEE management dashboard, allowing continuous improvement through:

* tracking data from the report to the source database column
* constant analysis of the most used tables in order to improve performance
* checking which users have permissions for given OBIEE reports and dashboards

All of the above is accessible from the OBIEE front-end, providing a convenient and quick way to facilitate many daily business-as-usual tasks in BI deployments.

Conclusion

The ClearPeaks OBIEE Data Lineage Solution can be easily deployed in any project using Oracle Business Intelligence Enterprise Edition. The solution can be run from the command line tools, which makes it possible to create automated jobs to extract and update data on a regular basis.

If you think your OBIEE project could benefit from our data lineage solution, please contact us for more information via our web contact form or by leaving your comments below!

Blog Article Author: Rafal Ostaszewski

Oracle BI Cloud Service, Part V: Managing the Service


How Oracle BI Cloud Service is managed

Parts II, III and IV of our BICS series were dedicated to the basic functionalities of the service. In this post, the last one in the series, we will explain how the BI Cloud Service is managed.

As in the on-premise version of OBI, security of the BI Cloud Service is managed with users, roles and application roles. All the cloud services you buy are assigned a unique identity domain that allows you to manage which users and roles can access which services associated with that identity domain. Application roles allow you to set permissions for users and roles within each cloud service.

You can manage users and roles from the My Services page of Oracle Cloud with the Identity Domain Administrator credentials.

 


Figure 1: Oracle Cloud My Services webpage

Application roles are managed from the BI Cloud Service Console with BI Service Administrator credentials.


 Figure 2: Managing users and roles

From the Service Console you can easily add users, roles and application roles as members to one or multiple application roles, manage pre-defined application roles, and create your own application roles.

 


Figure 3: Oracle BI Cloud Service - Service Console

A useful feature included in the BI Cloud Service is Snapshots: with one click you can capture the state of the service at a point in time, including the data model, catalog and application roles (but not database elements, which must be backed up separately). You can then save the snapshot in the cloud (a maximum of 10 snapshots) or download it to a file system, and upload a saved snapshot to restore the service to a previous state. Each snapshot import is total, in that it overwrites everything that was in the instance beforehand. Apart from backups, snapshots are useful for moving data from the pre-production to the production environment.


Figure 4: Taking system snapshots

Other common OBIEE administration tasks are also available in the BI Cloud Service, such as monitoring the users currently signed in, analyzing or testing SQL queries, and monitoring usage and other metrics.

 

Conclusion

The Oracle BI Cloud Service is a good option for small and medium companies looking for an easy-to-use BI framework with a fast deployment cycle and minimal capital and operating expenditure. The agility and availability of the system allow companies to start reporting against their business data in a couple of hours. It still lacks some features that would make it an option for large projects, but it remains useful for any company requiring their BI content to be available in the cloud and on any mobile device.

So what are your thoughts on the Oracle BI Cloud Service? Would you consider deploying your BI fully in the cloud using the service? Contact us at info@clearpeaks.com or using our web form if you would like more information about the Oracle BI Cloud Service, or leave your comments below!
