OBIEE Data API


 

Introduction

OBIEE Data API is a REST API built with Node.js that consumes OBIEE 11g web services. The API identifies calls from external applications using OAuth, and it enables applications to reach the data source through non-direct REST calls rather than a direct JDBC/ODBC connection.

We came across a situation where we needed to expose data from OBIEE seamlessly to our mobile application in an agile way, without tightly coupling the data layers with the mobile presentation layer. Using the OBIEE Data API ensures the application server only connects to the data source via REST calls and then passes the results on to the users, as shown in the diagram below.

Figure 1: Data flow architecture

 

1. Motivation

Below are the key factors that drove the development of the OBIEE Data API:

How to get data from multiple silos for the enterprise mobile application. 

The enterprise mobile application required data from multiple data sources, including OBIEE reports and the Oracle and SQL Server databases of internal applications. We wanted new data connections to be configured in a single place to simplify the data flow to the mobile application.

All the complexity surrounding data source definitions, database drivers, and multi-silo network access should be the sole responsibility of the Data API, and should not be tied to the data-consuming application.

How to conceal the direct connection details to data sources.  

As per the security audit recommendations we received, we needed to minimize the exposure of the direct connection to the databases from multiple applications and to conceal the actual connection details.

 

2. How does the Data API work?

When a predefined data endpoint is called from the external (mobile) application, the API checks if the called endpoint exists in the metadata dictionary of defined Data API entities.

If it finds a suitable match, the API establishes a connection to the respective data source and executes the logical SQL statement defined in the entity. The response received from the OBIEE server is reshaped into a predefined JSON format and sent back to the external requestor.

Let’s see how the Data entity is defined and configured in the API.

Defining Entities

Data API entities are defined SQL statements to be executed against a known data source connection. They also specify which parameters can be used and how the flat JSON response is reshaped to meet user requirements.

We can create data source connections to connect the OBIEE server as shown below.

Figure 2: Define data source connection

 

The screen below shows how we define a new entity, or data endpoint.

Every data entity has a unique name, a data source connection, a query to be executed, a list of parameters it can receive and process, and the configuration necessary to transform the raw data to the required JSON output.

Figure 3: Define data entity
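For illustration only, the metadata behind such an entity could be represented as a JSON document along the following lines (field names and values are hypothetical and do not reflect the actual Data API schema):

{
  "name": "sales_by_region",
  "connection": "OBIEE_PROD",
  "logicalSql": "SELECT \"Region\".\"Region Name\", \"Base Facts\".\"Revenue\" FROM \"Sales\" WHERE \"Time\".\"Year\" = @year",
  "parameters": [
    { "name": "year", "type": "integer", "required": true }
  ],
  "output": {
    "groupBy": "Region Name",
    "measures": ["Revenue"]
  }
}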

 

How to connect to the OBIEE server using the API to retrieve data.

Data API operations first attempt to establish a connection to the source system and then execute a query to retrieve the raw data.

From the OBIEE web services, we use the SAWSessionService logon method to establish the connection. The response received from the logon method contains the session ID, or token, required to call further methods to fetch data from the OBIEE instance. Once the connection is established, the executeSQLQuery method of XMLViewService is called, passing the logical query saved in the entity along with the session ID, to retrieve the raw data. The data formatter module of the API then reformats the received XML data into the required JSON format. A sample output of the data is shown below.

Figure 4: Sample Output
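To make the flow described above more concrete, here is a minimal sketch of the same two web service calls using the Python zeep SOAP client (the Data API itself is written in Node.js; the host, credentials, logical SQL and WSDL version below are placeholders, and the exact service and port names should be checked against your own OBIEE WSDL):

# Minimal sketch, not the actual Data API code (which is Node.js).
# Placeholders: host, credentials, logical SQL, WSDL version.
from zeep import Client

wsdl_url = "http://obiee-host:9704/analytics/saw.dll/wsdl/v7"   # placeholder
client = Client(wsdl_url)

# 1. SAWSessionService.logon returns a session ID (token)
session_service = client.bind("SAWSessionService")
session_id = session_service.logon("weblogic", "password")      # placeholders

# 2. XmlViewService.executeSQLQuery returns the raw rowset as XML
xml_view_service = client.bind("XmlViewService")
options = {"async": False, "maxRowsPerPage": 1000,
           "refresh": True, "presentationInfo": False, "type": ""}
result = xml_view_service.executeSQLQuery(
    'SELECT "Region"."Region Name", "Base Facts"."Revenue" FROM "Sales"',
    "SAWRowsetSchemaAndData",   # output format
    options,
    session_id)

# The data formatter module would then parse the returned XML rowset
# and reshape it into the JSON structure defined in the entity.
print(result)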

 

3. Is the Data endpoint secure?

Yes. A security mechanism within the API ensures that each API call comes from a trustworthy source. This is achieved through token authentication: prior to the data endpoint call, the authenticate method has to be called to obtain a token, and this token is then sent as a header along with the data endpoint call to the REST API server.
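As an illustration, a client call could look like the sketch below; the host, endpoint paths, field names and header are hypothetical, and only the general token-then-data pattern reflects the description above:

# Hypothetical client-side usage of the Data API.
# URLs, field names and the header are illustrative placeholders.
import requests

BASE_URL = "https://api.example.com"            # placeholder host

# 1. Authenticate to obtain a token
auth = requests.post(BASE_URL + "/authenticate",
                     json={"client_id": "mobile-app", "client_secret": "***"})
token = auth.json()["token"]

# 2. Call a data endpoint, passing the token as a header
resp = requests.get(BASE_URL + "/data/sales_by_region",
                    params={"year": 2017},
                    headers={"Authorization": "Bearer " + token})
print(resp.json())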

 

Conclusion: Applicability (Mobile Application, Web Applications...)

Commonly, enterprise data is hosted on a large number of systems, and it is often hard to get access to them and quickly build new solutions that need data from multiple silos. If you implement a RESTful API solution with access to multiple silos, you can expose the data in those silos to a large number of applications and their servers in a more agile way. Implementing a REST API to expose enterprise data from multiple silos is a great way of setting your data free for any type of application, including mobile, web, embedded components, etc.

The OBIEE Data API initiative has now been enhanced to retrieve data from Oracle and SQL Server databases using the oracledb and mssql npm packages, respectively. The REST API application services can be further extended to connect to MongoDB, cloud web services, Essbase cubes, etc.

Figure 5: Solution road map

 

Click here if you would like to know more about how to use the OBIEE Data API.

ClearPeaks' Visualisation Plugin


 

Introduction

If you’ve worked with any kind of data, you know how harrowing reading rows and rows of numbers can be. It isn’t easy to wade through all of those numbers and figure out what they mean. That’s where visualisation comes to the rescue: it demystifies the data and helps decision-makers derive actionable insights from it. The whole idea of this blog is to delve into the blossoming world of data visualisation. It is hard to ignore the weight that more advanced and more interactive visualisations carry in today’s data-centric world. It is common to get requests to add visualisations or other improvements to a dashboard in order to provide a flashier look or design. Typically, these kinds of requests are handled by embedding HTML and JavaScript to provide data-driven, custom visualisations. At ClearPeaks, we have developed a set of custom, configurable visualisations that work natively in modern browser technologies.
 

1. Motivation

The following are the key factors that drive the need for custom visualisation plugins:

Add new capabilities to existing chart types in modern reporting tools.
Add brand-new visualisation types.

Let us discuss each one of these points in detail.

Add new capabilities to existing chart types.
How many times have you received a request that you just could not meet with native chart types because there was not enough ability to customise the chart? For example, imagine that you want to build a line/bar combo chart with a third distinct Y-axis. Sure, you can map three metrics, but if the metrics do not share the same unit of measurement or the same scale, you are stuck. To handle these kinds of scenarios, we use pre-built custom plugins.

Add brand-new visualisation types.
There is huge scope for improvement in interactive visualisations compared with traditional chart libraries, and there are many new visualisation patterns we can build using JavaScript, such as donuts, meters, pokers or animated donuts, which are hard to achieve with traditional reporting tools.

 

2. How does it work?

ClearPeaks’ plugin is based solely on native browser technologies and doesn't require client-side plugins like Flash or Java. The plugin is built with jQuery and SVG. With the ClearPeaks plugin referenced in your webpage, you are ready to use a wide range of highly customisable charts. In the case of Oracle BI implementations where the servers are isolated from the Internet and protected in internal LAN segments, referencing these resources from Oracle BI means first deploying them on the WebLogic Server as static resources.
ClearPeaks’ plugin allows us to easily customize the design of a chart, like its size, colour and fonts. Also, it is possible to include or exclude parts of the charts.
 

3. Data Visualization Graphs

In this section, we present some of the visualization graphs developed by our team:

Gauge Chart: The gauge provides a rich set of configurable items, which can be set through the options passed in the plugin invocation.
Figure 1 - Gauge Chart
Donut Chart: A Donut Chart is a circle chart that shows the percentage of an activity.
Figure 2 - Donut Chart
Pyramid: A Pyramid chart displays a single series of data in progressively decreasing or increasing proportions, organized in segments, where each segment represents the value for the particular item from the series.
Figure 3 - Pyramid
TimeLine: The timeline allows you to visualise the start and end of an activity.
Figure 4 - Timeline

The visualisation set also includes the charts below.

Horizontal Bar Chart:
Figure 5 - Horizontal Bar Chart
Vertical Bar Chart:
Figure 6 - Vertical Bar Chart
Bubble Chart:
Figure 7 - Bubble Chart
Percentage Bar:
Figure 8 - Percentage Bar

 

Conclusion

ClearPeaks’ custom visualisation plugin provides some of the best and most unique visualisations available. The charts are highly customizable in terms of size, colour and design. Better yet, the visualisations can be implemented in OBIEE or any custom html reports or dashboards.

Click here if you would like to know more about this innovative solution!

Displaying Geographic Data in OBIEE


 

Introduction - What is geographic data?

The main goal of Business intelligence is to transform raw data into meaningful and useful information for the purpose of enabling more effective operational insights, as well as more tactical decision-making. The nature and character of this raw data can be very heterogeneous, ranging from structured data of orders originating from transactional databases to unstructured data coming from clients´ Twitter feeds. Today, I want to focus on one specific data type: geographic data.

The term ‘geographic’ neither refers to the way data is stored, nor its source. Instead, it denotes a functional characteristic, meaning data can be somehow positioned on the Earth. More precisely, geographic data can be defined as data with an implicit or explicit association with a location relative to the Earth, either a point, a line or a polygon.

In the following images you can see a clear example of how important it is to show geographic data properly. While it is very difficult to see a clear pattern in the bar chart, the map displays a much clearer picture. Indeed, it turns out we are visualising the latitude of each region of Spain. The conclusion is that, as I like to say, showing geographic data without a map means losing information.

Figure 1: Comparison bar chart - map

As you might know, most organisations have some geographic data among their data sets. It can be points with a client’s location or an event’s situation, lines representing streets or railways, or polygons with the shape of countries or some other customised regions. Geographic data is usually present, and hence, it has to be properly displayed in order to get useful insights and information from it.
 

1. Geographic Visualisations Components

When creating geographic visualisations, that is, maps with data on the top, three pieces or components are needed:

Background map: which is the map displayed underneath. It can be either an online map (Google Maps, Bing, TomTom, etc.) or an on-premise map (e.g. HERE or OpenStreetMap) which was previously designed and stored. As you can see in the images below, the visual experience of an online map is hardly achievable with an on-premise one.
Figure 2: Example background maps

Data layers: which are composed of a shape (a point, a line or a polygon) and data objects (measures and attributes). The critical issue is which shapes are identified by the visualisation tool and which are not. Geocodes consisting of latitude/longitude coordinates are usually identifiable as points, but literal addresses are not. For polygons, only the main administrative areas are usually recognised, while custom areas have to be introduced manually.
High-end geographic information system (high-end GIS): which is the software that matches the background map with the data layers, renders the map and includes some extra spatial functionalities. Some examples of high-end GIS are Oracle MapViewer (for OBIEE), Tableau (already built-in) or libraries such as Google Maps JavaScript API, Leaflet or Kendo.

 

2. Creating Geographic Visualisations in OBIEE12c

When working with OBIEE12c, we have three main options to implement some nice geographic visualisations:

Oracle MapViewer with online or on-premise maps: Developing OBIEE built-in maps using the Map View analysis type, which is specially designed to display several map visualisations such as Color Fill Map, Bubble Map or Pie Graph Map. The background map can be either an online map like Oracle eLocation, if you have Internet connectivity, or a customised offline one. The MapViewer toolkit for OBIEE and the Oracle Spatial option for Oracle Database are necessary. The pros are that no third party is involved in the solution, nor is any code needed. On the other hand, the number of available visualisations is limited, and the solution requires the configuration of layers and background maps.
On-premise library with online or on-premise maps: Another solution is developing maps using a library on-premise, such as Leaflet or Kendo, together with a customised on-premise map, for a 100% offline solution, or with an online map. Remember, OBIEE allows you to run JavaScript code using Narrative View analyses. In this case, the pros are no extra Oracle tools needed, higher visualisation features and options, and no Internet required. However, the development and maintenance cost increases significantly.
Google Maps JavaScript API: The last main solution is developing maps using the Google Maps JS API (or some other geographic API). Again, we use Narrative View analyses to run the JavaScript code on OBIEE, but in this case you also need to enable cross-site scripting from the API to OBIEE server. The pros are increased user experience and better visualisation features, while some drawbacks are dependency on third parties and higher development cost.

In short, each solution has its pros and cons. For this reason, an analysis of the users' requirements, the system limitations and the maintenance and development costs is a must before starting the development.
 

3. Developing Geographic Visualisations with Google Maps JS API

In this section, we explore the possibilities of developing geographic visualisations in OBIEE12c using Google Maps JavaScript API. Specifically, we show three different dashboards, each with one map and several extra functionalities developed with HTML, CSS and JavaScript, such as the title and some navigation buttons.
The first one is a heat map colouring the area of towns by some measure. It also includes several background map styles, a legend with the minimum and maximum value of the measure, and a tooltip with some specific attributes and measures. Moreover, the traditional Google buttons can be configured. In this case the Street View button is hidden and only the zooming buttons are shown.

Figure 3: Developing Geographic Visualisations with Google Maps JS API - Dashboard 1

The next dashboard shows a heat map with a similar look and feel to the previous one. This one also incorporates a label with the number of points requested, which is very important to control due to performance issues. It is a perfect way to identify geographical patterns on the localisation of events.

Figure 4: Developing Geographic Visualisations with Google Maps JS API - Dashboard 2

The last one shows a markers map which uses specific icons to represent different values of a category. Also, a legend with the descriptions of the icons used can be shown or hidden by clicking on the information button. This is a great manner of introducing another dimension on a geographical analysis.

Figure 5: Developing Geographic Visualisations with Google Maps JS API - Dashboard 3

Obviously, the complexity of the code depends on the characteristics of the map itself, such as the type of map, the use of custom geometries or markers, the number of auxiliary elements such as legends and tooltips, or the level of customisation of the background map. However, when working with OBIEE, there is another complexity element to take into account: the insertion of this code into the narrative view structure.
 

Conclusions

Although this article gives a rather general overview of several topics related to the big world of geographic data, we can still draw some conclusions from what has been said.
First and most importantly, if you have geographic data, remember to draw a map! Moreover, we have talked about the main components you need to create a geographical analysis, and the technical options to implement it in OBIEE. Finally, we have shown some nice dashboards using the Google Maps API.
Click here if you would like to know more about displaying geographic data in OBIEE and the services we offer!

Real Time Business Intelligence with Oracle Technologies


 

Introduction

In our previous blog article we discussed the necessity of real-time solutions for Business Intelligence (BI) platforms and presented a use case with Microsoft technologies (namely Azure and Power BI). In this article, we analyse the same scenario, but instead propose a design using Oracle Cloud and Oracle on-premise technologies.

We recommend going through the previous blog article to understand completely the scenario under analysis and its requirements.

 

1. Oracle Technologies for Real Time BI

Oracle offers both cloud and on-premises solutions focused on Big Data. These solutions can be used for a variety of Big Data applications, including real-time streaming analytics.

a. Oracle Cloud Services

From all the services offered on Oracle Cloud, the following suit the needs of a real-time BI application:

Oracle Event Hub Service. This service provides a managed Apache Kafka cluster, where we can easily assign resources and create topics using the web interface.
Oracle Big Data Compute Edition Service (OBDCE). This service provides a managed Big Data cluster with most of the Apache Hadoop/Spark stack elements, where the resource management and execution of tasks is also managed from the web interface.

Both the Event Hub and the OBDCE services are part of the PaaS offerings of Oracle Cloud, which means that the whole hardware and software infrastructure behind them is managed by Oracle. The key benefit is that, even though we are using standard open source technologies, we don’t have to worry about resource provisioning, software updates, connectivity, etc. With this, the developers can focus on building the solutions, without losing time on administrative tasks.

Another important point is that the connectivity between services on the cloud can be configured very easily using the web console, which ensures a reliable and safe path for the data.

b. On Premise Technologies

In addition to the cloud solution, a similar environment can be built on premises. For this we are using the Oracle Big Data Appliance. The Big Data Appliance consists, generally speaking, of the following components:

Hardware: Several server nodes and other accessory elements for networking, power, etc. The configuration can range from a starter rack with 6 nodes to a full rack with 18 nodes. Multi-rack configurations allow for an even larger number of nodes.
Software: All the components of the Cloudera Distribution for Hadoop (CDH) and additional components from Oracle like Oracle Database, Oracle NoSQL Database, Oracle Spatial and Graph, Oracle R Enterprise, Oracle Connectors, etc.

For the purpose of our Real Time project, the required components are Kafka and Spark, as we will see later. Both are part of CDH and key elements of the standard open source real-time analytics scenario. In this case, all the components will be available in the same cluster.

It is also important to know that, for demo projects like this, Oracle offers a Big Data Lite virtual machine, which contains most of the components of the Big Data Appliance.

c. Real-Time Visualization

At the time of writing this article, there is no BI tool from Oracle (OBIEE, DV) that allows visualization of real-time data. To tackle this, we decided to build a custom front-end that could be used both as a standalone web application and integrated into OBIEE.
The key technologies we are using for this purpose are:

Flask, which is a lightweight web framework for Python.
SocketIO, which is a framework based on WebSockets for real-time web applications using asynchronous communication.
ChartJS, which is a JavaScript library for building charts.

 

2. Solution Design and Development

The architectural design for the solutions is shown in the figure below and explained throughout the rest of this section:

Figure 1: Realtime BI solution design diagram, using both cloud and on premise Oracle technologies

a. On Premise Source System
The on-premises source system part of the solution simulates a real-time operational system using a public data feed provided by Network Rail. The data from the feed is processed and inserted into a PostgreSQL database. An event monitor script listens for notifications sent from the database and forwards them to the real-time processing queue (Kafka), either located on Oracle Cloud or on the Oracle Big Data Appliance.

For more details on this part of the solution, please, refer to the previous article using Microsoft Technologies. The design is exactly the same in this case, except that, in this solution, the Event Monitor sends the events to a Kafka queue (cloud or on premises), instead of sending them to an Azure Event Hub.

b. Oracle Cloud and Oracle Big Data Appliance

As explained in previous sections and shown in the diagram above, the Oracle solution for real-time stream processing can be developed both in the Oracle Cloud and using the Oracle Big Data Appliance. Both solutions are similar, as the underlying technologies are Kafka for event queueing and Spark Streaming for event processing.

Apache Kafka is a message queueing system where Producers and Consumers can send and receive messages to and from the different queues, respectively. Each queue is called a Topic, and there can be many of them per Kafka Broker (or node). Topics can optionally be split into Partitions, which means that the messages will be distributed among them. Kafka uses ZooKeeper for configuration and management tasks.

In our scenario, we are using a simple configuration with just a single Kafka Broker and three Topics, each with a single partition. The process works as follows:

The Event Monitor sends the events received from the database to Topic 1.
Spark Streaming consumes the messages from Topic 1, processes them and sends the results to Topics 2 and 3.
In the Flask Web Server a couple of Kafka Consumers are listening to Topics 2 and 3, and forwarding them to the web application.
Figure 2: Interaction between main components of the solution and the different Kafka topics

Spark Streaming is one of the multiple tools of the Apache Spark stack, built on top of the Spark Core. Basically, it converts a continuous stream of messages (called a DStream) into a batched stream using a specific time window. Each batch is treated as a normal Spark RDD (Resilient Distributed Dataset), which is the basic unit of data in Spark. The Spark Core can apply most of the available operations to process these RDDs of the batched stream.

Figure 3: Spark Streaming processing workflow

In our scenario, Spark Streaming is being used to aggregate the input data and to calculate averages of the PPM metric by timestamp and by operator. The process to calculate these averages requires few operations, as shown in the Python code snippet below:

# Imports: Spark Streaming Kafka connector and kafka-python producer
import json
from pyspark.streaming.kafka import KafkaUtils
from kafka import KafkaProducer

# Create Kafka consumer (using Spark Streaming API)
consumer = KafkaUtils.createDirectStream(streamingContext,
                                         [topicIn],
                                         {"metadata.broker.list": kafkaBroker})

# Create Kafka Producer (using Kafka for Python API)
producer = KafkaProducer(bootstrap_servers=brokers,
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

# Consume the input topic
# Calculate average by timestamp
# Produce to the output topic
ppmAvg = consumer.map(lambda x: json.loads(x[1])) \
                 .map(lambda x: (x[u'timestamp'], float(x[u'ppm']))) \
                 .reduceByKey(lambda a, b: (a+b)/2) \
                 .transform(lambda rdd: rdd.sortByKey()) \
                 .foreachRDD(lambda rdd: sendkafka(producer, topicOut, rdd))

From this sample code, we can see that the RDD processing is similar to Spark Core, with operations such as map, reduceByKey and transform. The main difference in Spark Streaming is that we can use the foreachRDD operation, which executes the specified function for each of the processed batches.

It is also important to know that, at the time of writing this article, the Spark Streaming API in Python does not offer the option to create a Kafka Producer. Therefore, we are using the Kafka for Python library to create it and send the processed messages.
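For completeness, the sendkafka helper referenced in the snippet above is not shown in the original excerpt; a minimal sketch, assuming the kafka-python producer created earlier, could look like this:

# Sketch of the sendkafka helper used in the foreachRDD call above.
def sendkafka(producer, topic, rdd):
    # Collect the (timestamp, average ppm) pairs of this micro-batch
    # on the driver and publish each one to the output Kafka topic.
    for record in rdd.collect():
        producer.send(topic, record)   # the value_serializer JSON-encodes it
    producer.flush()                   # make sure the batch is delivered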

Together with the average by timestamp shown above, we are also generating an average by operator. As we have two sets of processed data, Spark needs to send the data to two separate Kafka Topics (2 and 3).

One key problem with Spark Streaming is that it does not allow data processing in Event Time. This basically means that we can’t synchronize the windowing applied by Spark to create the batches to the timestamps of the source events. Therefore, as shown in the diagram below, this can lead to the events of different source timestamps being mixed in with the aggregates created by Spark.

Figure 4: Misalignment between Kafka and Spark Streaming windows causes events to be processed inside the incorrect time window

In fact, in our scenario, we have events that are timestamped in the source, but we were not able to align the Spark Streaming batching process to this time.

There are some possible solutions for this issue, namely:

Spark Streaming with Updates. This is a workaround where we can tell Spark Streaming to update the results of a batch with data coming in a “future” processing window. We tested this approach but, unfortunately, it led to compatibility errors with Kafka and we couldn’t go ahead with it.
Spark Structured Streaming. This is a separate tool of the stack built on top of Spark SQL, which is meant to solve the problem with Event Time processing. Unfortunately, at the time of writing this article, it is still only available in an “alpha” version as part of Spark 2.X. Again, we were able to test this experimental feature but couldn’t get it to work properly with Kafka.
Other stream processing tools. There are other existing tools in the Big Data ecosystem that can work with Event Time. Kafka itself has a Streams API that can be used for simple data processing. There are also Storm and others.

c. On Premise BI System

As introduced earlier, Oracle standard BI solutions don’t offer the possibility to connect to real-time sources. Therefore, we decided to build our own custom platform and integrate it with OBIEE.

The key component of this part of the solution is the real-time messaging between the web server and the browser provided by SocketIO. In this library, the client requests the server to start a session. If the server accepts, a continuous stream of bidirectional messages is opened (in our case the messages are unidirectional, from the server to the client). Both the client and the server react to the received messages. Finally, either the client or the server can close the connection (in our system the client closes the connection when the browser is closed).

Figure 5: Continuous bidirectional communication channel using SocketIO

Although the message channel is bidirectional, only the server sends messages to the clients. It consumes the events coming from the Kafka Topics populated by Spark Streaming and forwards them, with a slight manipulation, to a SocketIO namespace using two different named channels.

Figure 6: The web server sends the messages received from the Kafka topics to the SocketIO channels

# Imports (Flask-SocketIO and PyKafka); the Flask app object and the
# configuration variables (kafka_host, kafka_topic, channel, etc.) are
# defined elsewhere in the web server code.
import json
from flask_socketio import SocketIO
from pykafka import KafkaClient
from pykafka.common import OffsetType

# Create a SocketIO object using the Flask-SocketIO add-in for Flask
socketio = SocketIO(app, async_mode=async_mode)

# Create a Kafka Consumer using PyKafka
client = KafkaClient(hosts=kafka_host, zookeeper_hosts=zookeeper_host)
topic = client.topics[kafka_topic]
consumer = topic.get_simple_consumer(consumer_group=kafka_consumer_group,
                                     auto_offset_reset=OffsetType.LATEST,
                                     auto_commit_enable=True,
                                     auto_commit_interval_ms=1000,
                                     auto_start=False)

# Start the Consumer, monitor the input
# and send the received data through the SocketIO channel
consumer.start()
while True:
    socketio.sleep(0.1)
    message = consumer.consume(block=False)
    if message is not None:
        data = json.loads(message.value.decode('utf-8'))
        socketio.emit(channel,
                      {'key': data[0], 'value': data[1]}, namespace=namespace)

On the client side we have two possibilities to display the data: a standalone website served by the same web server, or a set of custom analyses developed in OBIEE. In both cases, the key elements are the SocketIO and ChartJS JavaScript libraries. The first one establishes the connection with the server, and the second one is used to create the charts. The SocketIO object is configured so that any time it receives a message from any of the channels, it updates the data of the chart and asks ChartJS to refresh it accordingly. The required JavaScript code is shown in the following snippet:

// Create the socket using the JavaScript SocketIO client library
socket = io(location.protocol + '//'
            + document.domain + ':'
            + location.port
            + namespace);

// Create an event listener for the "realtime_data01" channel
// and update the corresponding charts
socket.on('realtime_data01', function(msg) {
    if (lineChart.data.labels.length > 20) {
        lineChart.data.labels.shift();
        lineChart.data.datasets[0].data.shift();
    }
    lineChart.data.labels.push(msg.key);
    lineChart.data.datasets[0].data.push(msg.value);
    lineChart.update();
});

Here, we are asking the socket object to update the values of the lineChart object, shift the values if required and update the chart when a new message is received in the realtime_data01 channel. The result will be a set of charts updating automatically as soon as the new data is sent through the socket:

Figure 7: Standalone web application with real-time visualizations

Moreover, using dummy analyses with static text visualizations, we can embed the HTML and JavaScript into a normal OBIEE dashboard. In the example below, we can see exactly the same visualisations as in the standalone web application. Furthermore, we will always be able to combine these real-time visualisations with normal RPD-based OBIEE analyses.

Figure 8: Real-time visualizations embedded into an OBIEE dashboard

 

3. Scenario Analysis and Conclusions

Based on our experience developing this solution with Oracle technologies for a Real Time BI scenario, we have identified the following benefits, as well as areas for future improvement:

Advantageous features:

Oracle solutions for Big Data are available both in the Cloud and On Premise, suiting different companies and scenarios.
The Oracle Big Data stack is based on standard open source software, which makes it really easy to develop solutions and to find guidance.
Kafka topics are really easy to create and configure, in a matter of minutes.
Spark Streaming brings all the processing power of Spark to real-time streams, allowing for very sophisticated analytics in a few lines of code.
SocketIO allows the creation of bidirectional channels between web servers and applications, and suits the needs of a real-time application very well.
The web-based front end is very flexible, as it can be served standalone or integrated into other tools such as OBIEE.

Areas for improvement:

The Oracle Big Data stack comes with a plethora of components pre-installed, which can be unnecessary in many applications.
Spark Streaming windowing is not yet ready for event time processing, which makes it flawed for some real-time applications. Other solutions from the Spark stack are not yet completely ready for production environments either.
Oracle standard BI solutions do not offer real-time visualisation solutions.

Authors: Iñigo Hernáez, Oscar Martinez

Click here if you would like to receive more information about the Real Time BI Services we offer!

Real Time Business Intelligence with Azure and Power BI


 

Introduction

The purpose of this blog article is to analyse the increasing need for real-time solutions applied to BI platforms. We will first understand the context and then present a use case and its corresponding solution. The technical development takes advantage of the Event Hub, Stream Analytics and SQL Database services of Microsoft Azure, together with visualisations in Power BI.
 

1. Traditional and Real Time BI

Traditionally, a BI solution consists of a centralized DWH that is loaded incrementally in batches. The most common approach is to run the ETL process once a day, outside of working hours, to avoid undesired workloads and performance issues on the source transactional systems. The main drawback of this approach is that the information analysed by business users is always slightly outdated.

In most cases, business processes can be successfully analysed even if the available information is from previous days. However, in recent years, the number of business processes that require real-time monitoring has increased dramatically. Fraud detection, sensor networks, call centre overloads, mobile traffic monitoring, and transport fleet geolocation are just a few examples of where real-time analysis has transformed from a luxury to a necessity.

To understand the difference between traditional and real-time BI necessities, let’s use the Value-Time Curve, as proposed by Richard Hackathorn. This curve represents the value of an action initiated by a business event over time.

Figure 1: Value-Time curve for business processes, as proposed by Richard Hackathorn

In the figure above, we can see different types of business processes. In all cases, there is an event happening at time 0 and an action triggered at time 8. We can see that:

For business process 1, the decay is quadratic, so we are losing value from the very beginning.
For business process 2, the decay is exponential, so we don’t lose much value at the beginning, but there is a sudden drop in value at one point in time.
For business process 3, the increase is exponential, so, in contrast to the previous cases, the value of the action increases with time.

In these examples, a real-time action is especially critical for Business Process 1, while traditional BI with a batch incremental load can be enough for Business Process 2. The special case of Business Process 3 is clearly not suitable for real-time analytics.

Considering these special business processes, BI is starting to adopt solutions that provide real-time analytic capabilities to business users. The key objective of these solutions is to reduce the latency between the event generated by the business process and the corresponding actions taken by business users.
 

2. Case Study Overview

For this case study, we are using an open data feed provided by Network Rail, the owner of a large part of the rail network of England, Scotland and Wales. From all their feeds, we selected the Real Time PPM (Public Performance Measure), with the following characteristics:

The feed is provided in JSON format
The data is provided on a 60-second basis for around 120 rail services
PPM is measuring the ratio of delayed to total trains
Along with PPM, the total number of on-time, delayed and very late trains are provided

In this case study, we consider the data for each service and minute as a business event. This means that the feed provides around 120 events per minute. This is around 2 events per second. As this event rate is a bit low for analysing the limitations of a real-time solution, we built an event interpolator, which basically takes 2 consecutive feeds and generates interpolated events at a higher rate.
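As a rough illustration of the idea (not the actual script), linearly interpolating between two consecutive feed snapshots could be sketched like this, where the field names and the interpolation factor are just assumptions:

# Sketch of the event interpolator idea: given two consecutive readings
# for the same service one minute apart, emit n linearly interpolated events.
def interpolate_events(prev, curr, n=10):
    # prev/curr: dicts like {'service': 'X', 'timestamp': t, 'ppm': value}
    events = []
    for i in range(1, n + 1):
        frac = i / float(n)
        events.append({
            'service': curr['service'],
            'timestamp': prev['timestamp']
                         + frac * (curr['timestamp'] - prev['timestamp']),
            'ppm': prev['ppm'] + frac * (curr['ppm'] - prev['ppm']),
        })
    return events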
 

3. Solution Design and Development

The overall architecture of the solution is as displayed in the below figure and is explained in the following sections:

Figure 2: Realtime BI solution design diagram

 

4. On Premise

Using the data feed provided by Network Rail, we simulated an Operational System with continuous data events. This module does the following operations:

A feed receiver script downloads the feed every minute using the Network Rail API and stores a JSON file on the server
A feed splitter script reads the JSON files and, if required, interpolates the data to create events at a higher frequency. Finally, it inserts the data into a PostgreSQL database.
In PostgreSQL, a trigger calls a custom function every time a row is inserted into the operational table. This function launches a notification including the data of the new row in JSON format.

Once the Operational System is ready, we set up an Event Monitor, which does the following:

An event monitor script listens to the notifications created by the PostgreSQL database and sends these events to an Azure Event Hub. If required, the script can encapsulate multiple events in one single message sent to the Event Hub. We will explain in detail the necessity of this feature later on.
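A minimal sketch of such a listener, using the psycopg2 LISTEN/NOTIFY mechanism, is shown below; the connection string, channel name and the forwarding call are placeholders, and the real script sends the events to the Azure Event Hub as described:

# Sketch of the event monitor loop: listen for PostgreSQL NOTIFY messages
# (sent by the insert trigger) and forward each JSON payload onwards.
# The DSN, channel name and forward_to_event_hub() are placeholders.
import json
import select
import psycopg2

conn = psycopg2.connect("dbname=rail user=monitor")   # placeholder DSN
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute("LISTEN ppm_events;")                      # channel used by the trigger

while True:
    # Wait until the connection has something to read (5-second timeout)
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        event = json.loads(notify.payload)             # row data as JSON
        forward_to_event_hub(event)                    # placeholder sender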

Apart from the elements explained above, which compose the main information flow in the scenario, we also used other elements in our development:

We used Power BI Desktop to connect to the PostgreSQL database locally and analyse the events being inserted.
We also installed the Power BI Data Gateway and configured it so that we were able to connect to our on-premises PostgreSQL database from Power BI Online

 

5. Azure

Microsoft’s cloud ecosystem, Azure, offers a plethora of services, many of them oriented to the BI/data market. Among them, some services are specially designed for real-time applications. In this solution, we have used the following Azure services:

Event Hub, which is a cloud end-point capable of receiving, queueing and (temporarily) storing millions of events per second.
Stream Analytics, which allows querying and analysing real-time events coming from an Event Hub and sending the results to a variety of services like Power BI, SQL Databases and even other Event Hubs.
SQL Database, which is a transparent and scalable database that works like a classic SQL Server.

In this scenario, we are using an Event Hub as the single entry point of events. That is, our on-premises event monitor script is using the Azure SDK for Python to connect to the Azure Service Bus and to send the events to this Event Hub. It is very straightforward to send events using the API. In our case, as the PostgreSQL notifications are sending a JSON object, we have to pass it to the Event Hub with very limited manipulation (just adding a few timestamps for monitoring).

As we introduced before, the event monitor can encapsulate multiple events to reduce the number of messages to be sent to the Event Hub. The reason to do this is that, despite the ingest throughput of the Event Hub being high (and scalable if required), the network latency impacts the number of events we can send. Basically, each event sent is a call to the REST API of the Event Hub, which means that until a response is received, the process is blocked. Depending on the network latency, this can take a long time and limit the overall application throughput.

Luckily, the Event Hub API allows encapsulating multiple events in one message. To do this we create a JSON Array and push multiple event objects into it. The Event Hub is able to extract each individual event and process it independently.

Figure 3: Schematic view of multiple events encapsulated into one packet to overcome the limitation imposed by network latency
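For illustration, this encapsulation could look like the sketch below, which sends one REST call containing a JSON array of events; the namespace, event hub name and SAS token are placeholders, and the downstream processing then treats the array elements as individual events, as described above:

# Sketch: send a batch of events as one JSON array in a single REST call.
# The namespace, event hub name and SAS token are placeholders.
import json
import requests

EVENT_HUB_URL = ("https://mynamespace.servicebus.windows.net/"
                 "myeventhub/messages")                            # placeholder
SAS_TOKEN = "SharedAccessSignature sr=...&sig=...&se=...&skn=..."  # placeholder

def send_batch(events):
    # events: list of dicts, each one a single PPM event
    body = json.dumps(events)                   # JSON array of events
    headers = {"Authorization": SAS_TOKEN,
               "Content-Type": "application/json"}
    resp = requests.post(EVENT_HUB_URL, data=body, headers=headers)
    resp.raise_for_status()                     # fail loudly on errors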

By increasing the number of events per message, we can overcome the network latency limitation, as shown in the figure below. It is also important to select correctly the Azure Region where we deploy our services, so that latency is minimised. There are online services that can estimate the latency to different regions. However, it is always better to test it for each specific case and environment.

 

Figure 4: As the number of events per packet increases, the system can deal with higher network latencies.

Once the stream is queued by the Event Hub, we can use it as the input of a Stream Analytics service. With this service we can apply SQL-like queries to our stream of events and create aggregates and other calculations. Two of the most important points of the queries are the Timestamp and Time Window functions. See the example below:

SELECT
operatorName,
AVG(ppm) avgPpm
INTO powerBIOutput1
FROM eventHubInput1
TIMESTAMP BY eventDate
GROUP BY operatorName, TUMBLINGWINDOW(second, 5)

In this query we are doing the following:

Reading data from the eventHubInput1 input
Aggregating the ppm metric by applying an average
Applying a Timestamp to the input stream using the eventDate column
Grouping the values for each operatorName using a tumblingwindow of 5 seconds
Sending the result to the powerBIOutput1 output

There are 3 types of Time Windows that can be applied in Stream Analytics, but the most common one is the Tumbling Window, which uses non-overlapping time windows to aggregate the events.

The data processed in Stream Analytics is then forwarded to the following outputs:

Other Event Hubs and Stream Analytics so that more complex aggregations can be created. In our case, we used this pipe to create a Bottom 10 aggregation.
An Azure SQL Database service to store the values. We are using it to store the last 1 hour of data.
Power BI Cloud as a Streaming Dataset for real-time analysis.

 

6. Power BI Cloud

As explained before, one of the outputs from Azure Stream Analytics is Power BI Cloud. When we create this type of output, one Power BI account is authorized to access the stream. Automatically, when the data starts to flow, the new stream will be added to the Streaming Datasets section in Power BI Cloud:

Figure 5: Real time datasets of Power BI generated in Stream Analytics

Once our Streaming Dataset is ready, we can create real-time visualizations, using the Real-Time Data Tiles that can be added to any of our dashboards, and select the appropriate dataset and visualisation:

Figure 6: Process of adding a streaming tile to a Power BI dashboard

At the time of writing this article, the following visualisations are available:

Card
Gauge
Line
Bar
Column

The Real-Time Data Tiles are automatically refreshed depending on the Windowing applied to the incoming dataset in Stream Analytics. In the query shown above, we were applying a Tumbling Window of 5 seconds, so the data in the tiles will be refreshed accordingly:

Figure 7: Streaming tiles showing real time data coming from Stream Analytics

The main drawback that we have experienced with real-time visualisations is that they are currently limited both in terms of quantity and customisation, as compared to the Power BI reports that we can create with standard datasets. We believe this is due to the relatively early stage of development of these visualisations.

To overcome this issue, we also tested the Direct Query feature, which allows direct connection to the data source without prefetching the data. We can use this feature only with some technologies, including on-premises and cloud databases. In our case, we tested the following scenarios:

On-premises SQL Server through Power BI Data Gateway
Azure SQL Database

For this type of dataset, we can configure a cache refresh interval, which, as of today, is limited to 15 minutes.

Figure 8: Setting up the refresh schedule of a Direct Query dataset

The reports that we create using these data sources will be automatically refreshed at the specified frequency:

Figure 9: Power BI report created using the Azure SQL Database data with Direct Query

The combination of Real-Time Data Tiles using Streaming Datasets and Reports using Datasets with Direct Query, both on-premises and on Azure, provides the best results for a real-time BI dashboard in Power BI.

Figure 10: Final Power BI dashboard that combines Streaming and Direct Query tiles


7. Scenario Analysis and Conclusions

Considering this experience developing a solution with Azure and Power BI for a Real Time BI platform, we have identified the following benefits, as well as areas for future improvement:

Advantageous features:

Setting up the Azure services for real-time data queueing and processing is very simple.
Scaling out the Azure services when we need more resources is just a matter of a few clicks and is completely transparent for the developer.
The integration between Azure and Power BI is very powerful.
The Real Time Data Tiles of Power BI Cloud, and especially their automatic refreshing, are something that can differentiate the product from its competitors.
Using Direct Query data sources, we can complement the real-time dashboard with near real-time data.

Areas for improvement:

In a BI environment, events will mainly be generated in transactional systems, so continuous monitoring is required, which involves custom developments for each scenario.
The network latency of the Azure service imposes a limit on the number of events that can be sent per second. We can packetize multiple events to increase the throughput, but this can still become a bottleneck.
Real Time Data Tiles in Power BI seem to be at an early stage of development and they are almost non-customizable.
Cache refreshing for Direct Query sources in Power BI is limited to 15 minutes, which might not be enough in some scenarios.
Even though starting the Event Hub and Stream Analytics services is rather fast and easy, it can take up to one hour until we start seeing the real-time tiles showing data and being refreshed. However, once the real-time tiles start showing data they are stable.
The SQL Database service of Azure sometimes becomes unresponsive and requires manual intervention to fix it (restarting, truncating, etc.). The current information provided by Azure is not clear enough to help find out why the service becomes unresponsive.

Authors: Iñigo Hernáez, Oscar Martinez

Click here if you would like to receive more information about the Real Time BI Services we offer!

