Displaying Geographic Data in OBIEE


Introduction - What is geographic data?

The main goal of Business Intelligence is to transform raw data into meaningful and useful information that enables more effective operational insight and tactical decision-making. The nature and character of this raw data can be very heterogeneous, ranging from structured order data originating from transactional databases to unstructured data coming from clients' Twitter feeds. Today, I want to focus on one specific data type: geographic data.

The term ‘geographic’ neither refers to the way data is stored, nor its source. Instead, it denotes a functional characteristic, meaning data can be somehow positioned on the Earth. More precisely, geographic data can be defined as data with an implicit or explicit association with a location relative to the Earth, either a point, a line or a polygon.

In the following images you can see a clear example of how important it is to display geographic data properly. While it is very difficult to see a clear pattern in the bar chart, the map paints a much clearer picture. Indeed, it turns out we are visualising the latitude of each region of Spain. The conclusion is that, as I like to say, showing geographic data without a map means losing information.


Figure 1: Comparison bar chart - map

As you might know, most organisations have some geographic data among their data sets. It can be points with a client's location or the site of an event, lines representing streets or railways, or polygons with the shape of countries or other customised regions. Geographic data is usually present and, hence, it has to be properly displayed in order to extract useful insights and information from it.
 

1. Geographic Visualisations Components

When creating geographic visualisations, that is, maps with data on top, three components are needed:

Background map: the map displayed underneath the data. It can be either an online map (Google Maps, Bing, TomTom, etc.) or an on-premise map (e.g. HERE or OpenStreetMap) which was previously designed and stored. As you can see in the images below, the visual experience of an online map is hardly achievable with an on-premise one.

Figure 2: Examples of background maps

Data layers: composed of a shape (a point, a line or a polygon) and data objects (measures and attributes). The critical issue is which shapes are identified by the visualisation tool and which are not. Geocodes given as latitude/longitude coordinates are usually recognised as points, whereas literal addresses are not (see the geocoding sketch after this list). For polygons, only the main administrative areas are usually recognised, while custom areas have to be introduced manually.
High-end geographic information system (high-end GIS): the software that matches the background map with the data layers, renders the map and includes some extra spatial functionalities. Some examples of high-end GIS are Oracle MapViewer (for OBIEE), Tableau (built in) or libraries such as the Google Maps JavaScript API, Leaflet or Kendo.
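Since literal addresses are usually not recognised automatically, a common workaround is to geocode them into latitude/longitude pairs before loading the data into the BI tool. As a hedged illustration (not part of the original setup), the following Python sketch uses the geopy library and the Nominatim service; the address and user agent are just example values:

# Convert a literal address into a latitude/longitude point before loading it as a data layer
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="geocoding-example")
location = geolocator.geocode("Av. Diagonal 567, Barcelona")

if location is not None:
    print(location.latitude, location.longitude)   # coordinates usable as a point shape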

 

2. Creating Geographic Visualisations in OBIEE12c

When working with OBIEE12c, we have three main options to implement some nice geographic visualisations:

Oracle MapViewer with online or on-premise maps: Developing OBIEE built-in maps using the Map View analysis type, which is specially designed to display several map visualisations such as Color Fill Map, Bubble Map or Pie Graph Map. The background map can be either an online map like Oracle eLocation, if you have Internet connectivity, or a customised offline one. The MapViewer toolkit for OBIEE and the Oracle Spatial option for Oracle Database are necessary. The pros are that no third party is involved in the solution, nor is any code needed. On the other hand, the range of visualisations is limited, and layers and background maps have to be configured.
On-premise library with online or on-premise maps: Another solution is developing maps using an on-premise library, such as Leaflet or Kendo, together with either a customised on-premise map, for a 100% offline solution, or an online map. Remember, OBIEE allows you to run JavaScript code using Narrative View analyses. In this case, the pros are that no extra Oracle tools are needed, richer visualisation features and options, and no Internet connection required. However, the development and maintenance cost increases significantly.
Google Maps JavaScript API: The last main solution is developing maps using the Google Maps JS API (or some other geographic API). Again, we use Narrative View analyses to run the JavaScript code in OBIEE, but in this case you also need to enable cross-site scripting from the API to the OBIEE server. The pros are an improved user experience and better visualisation features, while some drawbacks are the dependency on third parties and a higher development cost.

In short, each solution has its pros and cons. For this reason, analysing the users' requirements, the system limitations and the maintenance and development costs is a must before starting development.
 

3. Developing Geographic Visualisations with Google Maps JS API

In this section, we explore the possibilities of developing geographic visualisations in OBIEE12c using Google Maps JavaScript API. Specifically, we show three different dashboards, each with one map and several extra functionalities developed with HTML, CSS and JavaScript, such as the title and some navigation buttons.
The first one is a heat map colouring the area of each town by some measure. It also includes several background map styles, a legend with the minimum and maximum values of the measure, and a tooltip with some specific attributes and measures. Moreover, the standard Google buttons can be configured; in this case the Street View button is hidden and only the zoom buttons are shown.


Figure 3: Developing Geographic Visualisations with Google Maps JS API - Dashboard 1

The next dashboard shows a heat map with a similar look and feel to the previous one. This one also incorporates a label with the number of points requested, which is very important to control for performance reasons. It is a perfect way to identify geographical patterns in the location of events.


Figure 4: Developing Geographic Visualisations with Google Maps JS API - Dashboard 2

The last one shows a markers map, which uses specific icons to represent different values of a category. Also, a legend with the descriptions of the icons used can be shown or hidden by clicking on the information button. This is a great way of introducing another dimension into a geographical analysis.


Figure 5: Developing Geographic Visualisations with Google Maps JS API - Dashboard 3

Obviously, the complexity of the code depends on the characteristics of the map itself, such as the type of map, the use of custom geometries or markers, the number of auxiliary elements such as legends and tooltips, or the level of customisation of the background map. However, when working with OBIEE, there is another complexity element to take into account: the insertion of this code into the Narrative View structure.
 

Conclusions

Although this article gives a rather general overview of several topics related to the big world of geographic data, we can still draw some conclusions from what has been said.
First and most importantly, if you have geographic data, remember to draw a map! Moreover, we have talked about the main components you need to create a geographical analysis, and the technical options to implement it in OBIEE. Finally, we have shown some nice dashboards using the Google Maps API.
Click here if you would like to know more about displaying geographic data in OBIEE and the services we offer!

Real Time Business Intelligence with Oracle Technologies


Introduction

In our previous blog article we discussed the need for real-time solutions for Business Intelligence (BI) platforms and presented a use case with Microsoft technologies (namely Azure and Power BI). Here, we analyse the same scenario, but instead propose a design using Oracle Cloud and Oracle on-premise technologies.

We recommend going through the previous blog article to fully understand the scenario under analysis and its requirements.

 

1. Oracle Technologies for Real Time BI

Oracle offers both cloud and on-premises solutions focused on Big Data. These solutions can be used for a variety of Big Data applications, including real-time streaming analytics.

a. Oracle Cloud Services

Of all the services offered on Oracle Cloud, the following suit the needs of a real-time BI application:

Oracle Event Hub Service. This service provides a managed Apache Kafka cluster, where we can easily assign resources and create topics using the web interface.
Oracle Big Data Compute Edition Service (OBDCE). This service provides a managed Big Data cluster with most of the Apache Hadoop/Spark stack elements, where resource management and the execution of tasks are also handled from the web interface.

Both the Event Hub and the OBDCE services are part of the PaaS offerings of Oracle Cloud, which means that the whole hardware and software infrastructure behind them is managed by Oracle. The key benefit is that, even though we are using standard open source technologies, we don't have to worry about resource provisioning, software updates, connectivity, etc. With this, developers can focus on building the solution, without wasting time on administrative tasks.

Another important point is that the connectivity between services on the cloud can be configured very easily using the web console, which ensures a reliable and safe path for the data.

b. On Premise Technologies

In addition to the cloud solution, a similar environment can be built on premises. For this we are using the Oracle Big Data Appliance. The Big Data Appliance consists, generally speaking, of the following components:

Hardware: Several server nodes and other accessory elements for networking, power, etc. The configuration can range from a starter rack with 6 nodes to a full rack with 18 nodes. Multi-rack configurations allow for an even larger number of nodes.
Software: All the components of the Cloudera Distribution for Hadoop (CDH) and additional components from Oracle like Oracle Database, Oracle NoSQL Database, Oracle Spatial and Graph, Oracle R Enterprise, Oracle Connectors, etc.

For the purpose of our Real Time project, the required components are Kafka and Spark, as we will see later. Both are part of CDH and key elements of the standard open source real-time analytics scenario. In this case, all the components will be available in the same cluster.

It is also important to know that, for demo projects like this one, Oracle offers the Big Data Lite virtual machine, which contains most of the components of the Big Data Appliance.

c. Real-Time Visualization

At the time of writing this article, there is no BI tool from Oracle (OBIEE, DV) that allows visualisation of real-time data. To tackle this, we decided to build a custom front end that can be used either as a standalone web application or integrated into OBIEE.
The key technologies we are using for this purpose are:

Flask, which is a lightweight web framework for Python.
SocketIO, which is a framework based on WebSockets for real-time web applications using asynchronous communication.
ChartJS, which is a JavaScript library for building charts.

 

2. Solution Design and Development

The architectural design of the solution is shown in the figure below and explained throughout the rest of this section:


Figure 1: Realtime BI solution design diagram, using both cloud and on premise Oracle technologies

a. On Premise Source System

The on-premises source system part of the solution simulates a real-time operational system using a public data feed provided by Network Rail. The data from the feed is processed and inserted into a PostgreSQL database. An event monitor script listens for notifications sent from the database and forwards them to the real-time processing queue (Kafka), located either on Oracle Cloud or on the Oracle Big Data Appliance.

For more details on this part of the solution, please refer to the previous article using Microsoft technologies. The design is exactly the same, except that in this solution the Event Monitor sends the events to a Kafka queue (cloud or on premises) instead of sending them to an Azure Event Hub.

b. Oracle Cloud and Oracle Big Data Appliance

As explained in previous sections and shown in the diagram above, the Oracle solution for real-time stream processing can be developed both in the Oracle Cloud and using the Oracle Big Data Appliance. Both solutions are similar, as the underlying technologies are Kafka for event queueing and Spark Streaming for event processing.

Apache Kafka is a message queueing system where Producers and Consumers can send and receive messages to and from the different queues, respectively. Each queue is called a Topic, and there can be many of them per Kafka Broker (or node). Topics can optionally be split into Partitions, which means that the messages will be distributed among them. Kafka uses ZooKeeper for configuration and management tasks.

In our scenario, we are using a simple configuration with a single Kafka Broker and three Topics, each with a single partition. The process works as follows:

The Event Monitor sends the events received from the database to Topic 1.
Spark Streaming consumes the messages from Topic 1, processes them and sends the results to Topics 2 and 3.
In the Flask web server, a couple of Kafka Consumers listen to Topics 2 and 3 and forward the messages to the web application.

Figure 2: Interaction between main components of the solution and the different Kafka topics
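To make step 1 more concrete, a minimal sketch of how the Event Monitor could publish an event to Topic 1 with the kafka-python library is shown below; the broker address, topic name and event fields are assumptions, not taken from the original code:

# Publish one event (a Python dict) to Topic 1 as a JSON-encoded Kafka message
import json
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='broker-host:9092',
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

event = {'timestamp': '2017-06-01T10:15:00', 'operator': 'Example Trains', 'ppm': 92.5}
producer.send('topic1', event)   # Topic 1 is the one consumed by Spark Streaming
producer.flush()                 # make sure the event is delivered before continuing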

Spark Streaming is one of the multiple tools of the Apache Spark stack, built on top of the Spark Core. Basically, it converts a continuous stream of messages (called a DStream) into a batched stream using a specific time window. Each batch is treated as a normal Spark RDD (Resilient Distributed Dataset), which is the basic unit of data in Spark. The Spark Core can apply most of the available operations to process these RDDs of the batched stream.


Figure 3: Spark Streaming processing workflow

In our scenario, Spark Streaming is used to aggregate the input data and to calculate averages of the PPM metric by timestamp and by operator. The process to calculate these averages requires just a few operations, as shown in the Python code snippet below:

# Required imports (Spark Streaming Kafka integration and the kafka-python producer)
import json
from kafka import KafkaProducer
from pyspark.streaming.kafka import KafkaUtils

# Create Kafka consumer (using the Spark Streaming API)
consumer = KafkaUtils.createDirectStream(streamingContext,
                                         [topicIn],
                                         {"metadata.broker.list": kafkaBroker})

# Create Kafka producer (using the Kafka for Python API)
producer = KafkaProducer(bootstrap_servers=brokers,
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

# Consume the input topic, calculate the average by timestamp
# and produce the results to the output topic
ppmAvg = consumer.map(lambda x: json.loads(x[1])) \
                 .map(lambda x: (x[u'timestamp'], float(x[u'ppm']))) \
                 .reduceByKey(lambda a, b: (a+b)/2) \
                 .transform(lambda rdd: rdd.sortByKey()) \
                 .foreachRDD(lambda rdd: sendkafka(producer, topicOut, rdd))

From this sample code, we can see that the RDD processing is similar to Spark Core, with operations such as map, reduceByKey and transform. The main difference in Spark Streaming is that we can use the foreachRDD operation, which executes the specified function for each of the processed batches.

It is also important to know that, at the time of writing this article, the Spark Streaming API in Python does not offer the option to create a Kafka Producer. Therefore, we are using the Kafka for Python library to create it and send the processed messages.
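The sendkafka helper referenced in foreachRDD above is not shown in the original snippet; a minimal sketch of what it could look like, using the kafka-python producer created earlier, might be:

# Hypothetical helper: push each aggregated (key, value) pair of the batch to the output topic
def sendkafka(producer, topic, rdd):
    for key, value in rdd.collect():         # aggregated batches are small, so collecting is safe here
        producer.send(topic, [key, value])   # consumed later as data[0] (key) and data[1] (value)
    producer.flush()                         # ensure delivery before the next micro-batch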

Together with the average by timestamp shown above, we are also generating an average by operator. As we have two sets of processed data, Spark needs to send the data to two separate Kafka Topics (2 and 3).

One key problem with Spark Streaming is that it does not allow data processing in event time. This basically means that we can't synchronise the windowing applied by Spark to create the batches with the timestamps of the source events. Therefore, as shown in the diagram below, events with different source timestamps can end up mixed within the aggregates created by Spark.


Figure 4: Misalignment between Kafka and Spark Streaming windows causes events to be processed inside the incorrect time window

In fact, in our scenario, the events are timestamped at the source, but we were not able to align the Spark Streaming batching process with this time.

There are some possible solutions for this issue, namely:

Spark Streaming with updates. This is a workaround where we can tell Spark Streaming to update the results of a batch with data arriving in a “future” processing window. We tested this approach but, unfortunately, it led to compatibility errors with Kafka and we couldn't go ahead with it.
Spark Structured Streaming. This is a separate tool of the stack, built on top of Spark SQL, which is meant to solve the problem of event-time processing (a sketch follows this list). Unfortunately, at the time of writing this article, it is still only available in “alpha” version as part of Spark 2.X. Again, we were able to test this experimental feature but couldn't get it to work properly with Kafka.
Other stream processing tools. There are other tools in the Big Data ecosystem that can work with event time. Kafka itself has a Streams API that can be used for simple data processing, and there are also Apache Storm and others.
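For reference, this is roughly how the same aggregation could be expressed with event-time windows in Spark Structured Streaming. This is a hedged sketch based on the current API, not the code we ran at the time; the schema, column names, broker address, topic and watermark value are all assumptions:

# Event-time aggregation sketch with Spark Structured Streaming (requires the Kafka SQL connector)
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json, window
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("ppm-structured-streaming").getOrCreate()

schema = StructType([StructField("timestamp", TimestampType()),
                     StructField("operator", StringType()),
                     StructField("ppm", DoubleType())])

# Read the raw JSON events from the input topic
events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker-host:9092")
               .option("subscribe", "topic1")
               .load()
               .select(from_json(col("value").cast("string"), schema).alias("event"))
               .select("event.*"))

# Aggregate on the event timestamp itself, tolerating events that arrive up to 30 seconds late
ppmByOperator = (events
                 .withWatermark("timestamp", "30 seconds")
                 .groupBy(window(col("timestamp"), "5 seconds"), col("operator"))
                 .agg(avg("ppm").alias("avgPpm")))

# Write the running aggregates out; in the real solution this sink would be Kafka again
query = (ppmByOperator.writeStream
                      .outputMode("update")
                      .format("console")
                      .start())
query.awaitTermination()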

c. On Premise BI System

As introduced earlier, Oracle's standard BI solutions don't offer the possibility of connecting to real-time sources. Therefore, we decided to build our own custom platform and integrate it with OBIEE.

The key component of this part of the solution is the real-time messaging between the web server and the browser provided by SocketIO. With this library, the client asks the server to start a session. If the server accepts, a continuous stream of bidirectional messages is opened (in our case the messages are unidirectional, from the server to the client). Both the client and the server react to the received messages. Finally, either the client or the server can close the connection (in our system the client closes the connection when the browser is closed).


Figure 5: Continuous bidirectional communication channel using SocketIO

Although the message channel is bidirectional, only the server sends messages to the clients. It consumes the events coming from the Kafka Topics populated by Spark Streaming and forwards them, with slight manipulation, to a SocketIO namespace using two different named channels.


Figure 6: The web server sends the messages received from the Kafka topics to the SocketIO channels

# Required imports (Flask-SocketIO, PyKafka and the standard json module)
import json
from flask_socketio import SocketIO
from pykafka import KafkaClient
from pykafka.common import OffsetType

# Create a SocketIO object using the Flask-SocketIO add-on for Flask
# (the Flask app object and the configuration variables are defined elsewhere in the server)
socketio = SocketIO(app, async_mode=async_mode)

# Create a Kafka Consumer using PyKafka
client = KafkaClient(hosts=kafka_host, zookeeper_hosts=zookeeper_host)
topic = client.topics[kafka_topic]
consumer = topic.get_simple_consumer(consumer_group=kafka_consumer_group,
                                     auto_offset_reset=OffsetType.LATEST,
                                     auto_commit_enable=True,
                                     auto_commit_interval_ms=1000,
                                     auto_start=False)

# Start the Consumer, monitor the input topic
# and send the received data through the SocketIO channel
consumer.start()
while True:
    socketio.sleep(0.1)
    message = consumer.consume(block=False)
    if message is not None:
        data = json.loads(message.value.decode('utf-8'))
        socketio.emit(channel,
                      {'key': data[0], 'value': data[1]},
                      namespace=namespace)

On the client side we have two possibilities for displaying the data: a standalone website served by the same web server, or a set of custom analyses developed in OBIEE. In both cases, the key elements are the SocketIO and ChartJS JavaScript libraries. The first one establishes the connection with the server and the second one is used to create the charts. The SocketIO object is configured so that, any time it receives a message on one of the channels, it updates the data of the chart and asks ChartJS to refresh it accordingly. The required JavaScript code is shown in the following snippet:

// Create the socket using the JavaScript SocketIO client library
var socket = io(location.protocol + '//'
                + document.domain + ':'
                + location.port
                + namespace);

// Create an event listener for the 'realtime_data01' channel
// and update the corresponding charts
socket.on('realtime_data01', function(msg) {
    if (lineChart.data.labels.length > 20) {
        lineChart.data.labels.shift();
        lineChart.data.datasets[0].data.shift();
    }
    lineChart.data.labels.push(msg.key);
    lineChart.data.datasets[0].data.push(msg.value);
    lineChart.update();
});

Here, we are asking the socket object to update the values of the lineChart object, shift the values if required and refresh the chart whenever a new message is received on the realtime_data01 channel. The result is a set of charts that update automatically as soon as new data is sent through the socket:

Figure 7: Standalone web application with real-time visualizations

Moreover, using a dummy analysis with a static text visualisation, we can embed the HTML and JavaScript into a normal OBIEE dashboard. In the example below, we can see exactly the same visualisations as in the standalone web application. In addition, we can always combine these real-time visualisations with normal RPD-based OBIEE analyses.


Figure 8: Real-time visualizations embedded into an OBIEE dashboard

 

3. Scenario Analysis and Conclusions

Based on our experience developing this solution with Oracle technologies for a Real Time BI scenario, we have identified the following benefits, as well as areas for future improvement:

Advantageous features:

Oracle solutions for Big Data are available both in the Cloud and On Premise, making them suitable for different companies and scenarios.
The Oracle Big Data stack is based on standard open source software, which makes it really easy to develop solutions and to find guidance.
Kafka topics are really easy to create and configure in a matter of minutes.
Spark Streaming brings all the processing power of Spark to real-time streams, thus allowing for very sophisticated analytics in a few lines of code.
SocketIO allows the creation of bidirectional channels between web servers and applications, and suits the needs of a real-time application very well.
The web-based front end is very flexible, as it can be served standalone or integrated into other tools such as OBIEE.

Areas for improvement:

The Oracle Big Data stack comes with a plethora of pre-installed components, which can be unnecessary in many applications.
Spark Streaming windowing is not yet ready for event-time processing, which makes it flawed for some real-time applications. Other solutions from the Spark stack are not yet completely ready for production environments.
Oracle standard BI solutions do not offer real-time visualisation solutions.

Authors: Iñigo Hernáez, Oscar Martinez

Click here if you would like to receive more information about the Real Time BI Services we offer!

Real Time Business Intelligence with Azure and Power BI


Introduction

The purpose of this blog article is to analyse the increasing need for real-time solutions applied to BI platforms. We will first look at the context and then present a use case and its corresponding solution. The technical development takes advantage of the Event Hub, Stream Analytics and SQL Database services of Microsoft Azure, together with visualisations in Power BI.

1. Traditional and Real Time BI

Traditionally, a BI solution consists of a centralized DWH that is loaded incrementally in batches. The most common approach is to run the ETL process once a day, outside of working hours, to avoid undesired workloads and performance issues on the source transactional systems. The main drawback of this approach is that the information analysed by business users is always slightly outdated.

In most cases, business processes can be successfully analysed even if the available information is from previous days. However, in recent years, the number of business processes that require real-time monitoring has increased dramatically. Fraud detection, sensor networks, call centre overloads, mobile traffic monitoring, and transport fleet geolocation are just a few examples of where real-time analysis has transformed from a luxury to a necessity.

To understand the difference between traditional and real-time BI necessities, let's use the Value-Time Curve proposed by Richard Hackathorn. This curve represents the value of an action initiated by a business event over time.


Figure 1: Value-Time curve for business processes, as proposed by Richard Hackathorn

In the figure above, we can see different types of business processes. In all cases, there is an event happening at time 0 and an action triggered at time 8. We can see that:

For business process 1, the decay is quadratic, so we are losing value from the very beginning.
For business process 2, the decay is exponential, so we don’t lose much value at the beginning, but there is a sudden drop in value at one point in time.
For business process 3, the increase is exponential, so, in contrast to the previous cases, the value of the action increases with time.

In these examples, a real-time action is especially critical for Business Process 1, while traditional BI with a batch incremental load can be enough for Business Process 2. The special case of Business Process 3 is clearly not suitable for real-time analytics.

Considering these special business processes, BI is starting to adopt solutions that provide real-time analytic capabilities to business users. The key objective of these solutions is to reduce the latency between the event generated by the business process and the corresponding actions taken by business users.

2. Case Study Overview

For this case study, we are using an open data feed provided by Network Rail, the owner of a large part of the rail network of England, Scotland and Wales. From all their feeds, we selected the Real Time PPM (Public Performance Measure), with the following characteristics:

The feed is provided in JSON format
The data is provided on a 60-second basis for around 120 rail services
PPM measures the ratio of delayed trains to total trains
Along with PPM, the total number of on-time, delayed and very late trains is provided

In this case study, we consider the data for each service and minute as a business event. This means that the feed provides around 120 events per minute, i.e. around 2 events per second. As this event rate is a bit low for analysing the limitations of a real-time solution, we built an event interpolator, which basically takes two consecutive feeds and generates interpolated events at a higher rate.
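As an illustration of the idea (not the actual interpolator code; the field names and step count are assumptions), a minimal Python sketch could look like this:

# Generate `steps` synthetic events between two consecutive readings for the same service
def interpolate_events(prev_event, next_event, steps=10):
    for i in range(1, steps + 1):
        fraction = i / float(steps)
        yield {
            'serviceCode': prev_event['serviceCode'],
            # timestamps assumed to be datetime objects, so the difference is a timedelta
            'timestamp': prev_event['timestamp']
                         + (next_event['timestamp'] - prev_event['timestamp']) * fraction,
            'ppm': prev_event['ppm']
                   + (next_event['ppm'] - prev_event['ppm']) * fraction,
        }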

3. Solution Design and Development

The overall architecture of the solution is displayed in the figure below and explained in the following sections:


Figure 2: Realtime BI solution design diagram

 

4. On Premise

Using the data feed provided by Network Rail, we simulated an Operational System with continuous data events. This module performs the following operations:

A feed receiver script downloads the feed every minute using the Network Rail API and stores a JSON file on the server
A feed splitter script reads the JSON files and, if required, interpolates the data to create events at a higher frequency. Finally, it inserts the data into a PostgreSQL database.
In the PostgreSQL database, a trigger calls a custom function every time a row is inserted into the operational table. This function launches a notification containing the data of the new row in JSON format.

Once the Operational System is ready, we set up an Event Monitor, which does the following:

An event monitor script listens to the notifications created by the PostgreSQL database and sends these events to an Azure Event Hub (a minimal sketch of this listener is shown below). If required, the script can encapsulate multiple events in one single message sent to the Event Hub; we will explain the need for this feature in detail later on.
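The following Python sketch shows how such a listener could be built with psycopg2; the connection details, the notification channel name and the send_to_event_hub function are assumptions used only for illustration:

# Listen for PostgreSQL notifications and forward each payload to the cloud queue
import json
import select

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=rail user=monitor")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN new_event;")   # channel raised by the insert trigger (assumed name)

while True:
    # Wait up to 5 seconds for the database connection to become readable
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notification = conn.notifies.pop(0)
        event = json.loads(notification.payload)
        send_to_event_hub(event)   # hypothetical wrapper around the Event Hub send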

Apart from the elements explained above, which compose the main information flow in the scenario, we also used other elements in our development:

We used Power BI Desktop to connect to the PostgreSQL database locally and analyse the events being inserted.
We also installed the Power BI Data Gateway and configured it so that we were able to connect to our on-premises PostgreSQL database from Power BI Online.

 

5. Azure

Microsoft's cloud ecosystem Azure offers a plethora of services, many of them oriented to the BI/data market. Among them, some services are specially designed for real-time applications. In this solution, we have used the following Azure services:

Event Hub, which is a cloud endpoint capable of receiving, queueing and temporarily storing millions of events per second.
Stream Analytics, which allows querying and analysing real-time events coming from an Event Hub and sending the results to a variety of services such as Power BI, SQL Databases and even other Event Hubs.
SQL Database, which is a transparent and scalable database that works like a classic SQL Server.

In this scenario, we are using an Event Hub as the single entry point for events. That is, our on-premises event monitor script uses the Azure SDK for Python to connect to the Azure Service Bus and send the events to this Event Hub. It is very straightforward to send events using the API. In our case, as the PostgreSQL notifications already carry a JSON object, we can pass it to the Event Hub with very limited manipulation (just adding a few timestamps for monitoring).

As we introduced before, the event monitor can encapsulate multiple events to reduce the number of messages to be sent to the Event Hub. The reason to do this is that, despite the ingest throughput of the Event Hub being high (and scalable if required), the network latency impacts the number of events we can send. Basically, each event sent is a call to the REST API of the Event Hub, which means that until a response is received, the process is blocked. Depending on the network latency, this can take a long time and limit the overall application throughput.

Luckily, the Event Hub API allows encapsulating multiple events in one message. To do this, we create a JSON array and push multiple event objects into it. The Event Hub is able to extract each individual event and process it independently.
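As a hedged illustration of this packing step: the original solution used the Azure Python SDK available at the time, while the sketch below uses the current azure-eventhub package, and the connection string and hub name are placeholders:

# Send a list of event dicts as a single Event Hub message containing a JSON array
import json
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key-name>;SharedAccessKey=<key>",
    eventhub_name="ppm-events")

def send_packet(event_buffer):
    packet = EventData(json.dumps(event_buffer))   # one message carrying many logical events
    batch = producer.create_batch()
    batch.add(packet)
    producer.send_batch(batch)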


Figure 3: Schematic view of multiple events encapsulated into one packet to overcome the limitation imposed by network latency

By increasing the number of events per message, we can overcome the network latency limitation, as shown in the figure below. It is also important to correctly select the Azure region where we deploy our services, so that latency is minimised. There are online services that can estimate the latency to different regions; however, it is always better to test it for each specific case and environment.

 


Figure 4: As the number of events per packet increases, the system can deal with higher network latencies.

Once the stream is queued by the Event Hub, we can use it as the input of a Stream Analytics service. With this service we can apply SQL-like queries to our stream of events to create aggregates and other calculations. Two of the most important elements of these queries are the Timestamp and Time Window functions. See the example below:

SELECT
operatorName,
AVG(ppm) avgPpm
INTO powerBIOutput1
FROM eventHubInput1
TIMESTAMP BY eventDate
GROUP BY operatorName, TUMBLINGWINDOW(second, 5)

In this query we are doing the following:

Reading data from the eventHubInput1 input
Aggregating the ppm metric by applying an average
Applying a Timestamp to the input stream using the eventDate column
Grouping the values for each operatorName using a Tumbling Window of 5 seconds
Sending the result to the powerBIOutput1 output

There are 3 types of Time Windows that can be applied in Stream Analytics, but the most common one is the Tumbling Window, which uses non-overlapping time windows to aggregate the events.

The data processed in Stream Analytics is then forwarded to the following outputs:

Other Event Hubs and Stream Analytics services, so that more complex aggregations can be created. In our case, we used this pipe to create a Bottom 10 aggregation.
An Azure SQL Database service to store the values. We are using it to store the last hour of data.
Power BI Cloud as a Streaming Dataset for real-time analysis.

 

6. Power BI Cloud

As explained before, one of the outputs from Azure Stream Analytics is Power BI Cloud. When we create this type of output, a Power BI account is authorised to access the stream. When the data starts to flow, the new stream is automatically added to the Streaming Datasets section in Power BI Cloud:


Figure 5: Real time datasets of Power BI generated in Stream Analytics

Once our Streaming Dataset is ready, we can create real-time visualisations using the Real-Time Data Tiles, which can be added to any of our dashboards by selecting the appropriate dataset and visualisation:


Figure 6: Process of adding a streaming tile to a Power BI dashboard

At the time of writing this article, the following visualisations are available:

Card
Gauge
Line
Bar
Column

The Real-Time Data Tiles are refreshed automatically depending on the windowing applied to the incoming dataset in Stream Analytics. In the query shown above, we applied a Tumbling Window of 5 seconds, so the data in the tiles will be refreshed accordingly:


Figure 7: Streaming tiles showing real time data coming from Stream Analytics

The main drawback that we have experienced with real-time visualisations is that they are currently limited both in terms of quantity and customisation, as compared to the Power BI reports that we can create with standard datasets. We believe this is due to the relatively early stage of development of these visualisations.

To overcome this issue, we also tested the Direct Query feature, which allows direct connection to the data source without prefetching the data. We can use this feature only with some technologies, including on-premises and cloud databases. In our case, we tested the following scenarios:

On-premises SQL Server through Power BI Data Gateway
Azure SQL Database

For this type of dataset, we can configure a cache refresh interval which, as of today, is limited to 15 minutes.


Figure 8: Setting up the refresh schedule of a Direct Query dataset

The reports that we create using these data sources will be automatically refreshed at the specified frequency:


Figure 9: Power BI report created using the Azure SQL Database data with Direct Query

The combination of Real-Time Data Tiles using Streaming Datasets and Reports using Datasets with Direct Query, both on-premises and on Azure, provides the best results for a real-time BI dashboard in Power BI.


Figure 10: Final Power BI dashboard that combines Streaming and Direct Query tiles


7. Scenario Analysis and Conclusions

Considering this experience developing a solution with Azure and Power BI for a Real Time BI platform, we have identified the following benefits, as well as areas for future improvement:

Advantageous features:

Setting up the Azure services for real-time data queueing and processing is very simple.
Scaling out the Azure services when we need more resources is just a matter of a few clicks and is completely transparent for the developer.
The integration between Azure and Power BI is very powerful.
The Real-Time Data Tiles of Power BI Cloud, and especially their automatic refreshing, are something that can differentiate the product from its competitors.
Using Direct Query data sources, we can complement the real-time dashboard with near real-time data.

Areas for improvement:

In a BI environment, events will mainly be generated in transactional systems, so continuous monitoring is required, which involves custom developments for each scenario.
The network latency of the Azure service imposes a limit on the number of events that can be sent per second. We can packetize multiple events to increase the throughput, but this can still become a bottleneck.
Real-Time Data Tiles in Power BI seem to be at an early stage of development and are almost non-customisable.
Cache refreshing for Direct Query sources in Power BI is limited to 15 minutes, which might not be enough in some scenarios.
Even though starting the Event Hub and Stream Analytics services is rather fast and easy, it can take up to one hour until we start seeing the real-time tiles showing data and being refreshed. However, once the real-time tiles start showing data, they are stable.
The SQL Database service of Azure sometimes becomes unresponsive and requires manual intervention (restarting, truncating, etc.) to fix it. The current information provided by Azure is not clear enough to help find out why the service becomes unresponsive.

Authors: Iñigo Hernáez, Oscar Martinez

Click here if you would like to receive more information about the Real Time BI Services we offer!

Extending Oracle DVD Map functionality


1. The challenge

Although Oracle Data Visualization Desktop (DVD) has now caught up with the leaders in the data visualization quadrant, there are, as in any product, still specific functionalities that can be enhanced. As such, we are excited to share with you this product enhancement related to the map functionality.

In the example below, you can see an out-of-the-box DVD map visualisation showing traffic violations in the US.


Figure 1: US traffic violations map chart

As you can see, there is only a limited amount of insight that can be drawn from this visualization. Let's try to build on this by adding some sizing based on a measure - in this case, the number of accidents:


Figure 2: US traffic violations map chart - sizing added

Even after colouring it by the fine amount (Fine Amt), it still looks like something is missing.


Figure 3: US traffic violations - formatted version

It's possible that we could improve our visualisation by colouring by a dimension instead of by a measure; however, DVD does not allow colouring by dimension values at this time.

 

2. DVD 12.2.3 Plugin

Let's try downloading the Custom Points Map Plugin from the Oracle Store instead.


Figure 4: US traffic violations - plugin used

OK, that looks nicer. The ability to control the size of the dots really helps chart comprehension, but the plugin still lacks helpful functionalities, such as being able to colour values by dimension. Further inspection of the plugin leaves more to be desired: what happens if you decide to maximize or minimize the visualisation (⤢)?


Figure 5: Maximize map chart

Well, not the best vantage point. Of course, you can resolve this by simply zooming in several times… until the moment you have to maximize it again.

 

3. The ClearPeaks solution

So, are you ready for a bit of magic? Let me present the solution that ClearPeaks developed to enhance the user experience. By making some adjustments to the code, we were able to address all of these issues while using the same visualisation tool (Oracle DVD 12.2.3) and plugin (Custom Points Map Plugin) as before. Alright, now we are ready to explore our data!


Figure 6: US traffic violations - using updated plugin version 1

Our first order of business was to resolve the dimension colour problem, enabling this feature for our customer. What an improvement! Now it's much easier to find recognisable patterns or dependencies. We also thought that it would be wise to implement a colour gradient for measures in order to promote further and more accurate analysis of our data, so we did that too!


Figure 7: US traffic violations - using updated plugin version 2

Oh, and the problem with zooming in and out? Yeah, we fixed that as well. This is the final result of our tweaks:


Figure 8: US traffic violations - using updated plugin - final version

Looks much nicer, right?

Unfortunately, as nothing in life is ever truly perfect, there are still some inconveniences with our touch-up; namely, legends are not available at this time. The good news is that there is a workaround for this problem, called the “list chart”. To take advantage of it, add a second visualisation onto your canvas and drag the dimension (SubAgency) into it. Select “list chart” as the visualisation and you'll have your legend.

Do not forget that DVD keeps the colour context within a canvas, so you should not worry about losing your legend.

Curious about how to make this solution work for you too? Luckily, you just have to follow the architecture of the plugin. Define which boxes (colour dimensions) you want to see and which values are allowed in them (i.e. colour boxes for dimension values, and a size box and gradient colours for measure values). Then, make sure to include them in the plugin.xml file. After that, link them with the map visualisation in the pointsMapVizdatamodelhandler.js file. Further details about the colours, buckets, etc. can be defined in pointsMapViz.js. Of course, all changes should be synchronized with the main map library used by Oracle - oraclemapsv2.js.

 

4. Demo

Check out a demo of this solution below:

Conclusion

Even though there are still some restrictions when using DVD as a reporting tool, Oracle is really heading in the right direction with this new version - 12.2.3 - especially with the custom plugins idea. By giving users the opportunity to download and create new plugins and to adjust the plugin code to their own needs, Oracle DVD offers a flexibility that today's BI market demands. If we combine all this with ClearPeaks' know-how, it really is a recipe for turning every Oracle DVD project into a success.

Contact us if you’d like to receive more information about this solution.

Think Big, Think Data!


Big Data & Data Science
A recent publication in The Economist asserts that data has now surpassed oil as the world's most valuable resource. In the last two years alone, the world has collected and stored more data than in all previous years of recorded history combined. With over 60 billion connected devices worldwide, this trend will only continue. By 2020, it is estimated that over 30% of all data will be stored in cloud services. Data mania is truly upon us!

However, you may still be pondering whether Big Data (BD) could provide true value to your organisation and how you could effectively kick off such an ambitious initiative. Perhaps you're in a quandary about how you could leverage BD for competitive advantage, or perhaps you're actively considering a digital transformation programme. Rest assured that your competitors certainly are. On average, only 0.5% of corporate data is actually analysed, leading to untold numbers of missed opportunities for those that discount it, and substantial benefits for those that do not. Retailers leveraging BD alone have increased their margins by up to 60%!

Digitalization is rewriting the rules of competition, with incumbent companies most at risk of being left behind by new disrupters in the industry. It is no secret that BD has a paramount role to play in any digitalization initiative. This year alone, 73% of all Fortune 1000 companies have reported that they are investing in BD, and that number is set to grow in coming years.

But BD, as The Economist rightly suggests, is a resource. If data is not captured, analysed and exploited, its value becomes inconsequential. This is where Data Science - specifically Data Discovery and Prediction - comes in. Not only can you report on past actuals, as in classic Business Intelligence (BI), but, by calling on algorithms and machine learning techniques, you can also predict the future, leveraging your BD resources to feed highly accurate predictive models. Imagine the business opportunities this presents in all functional areas of your organisation!

Hopefully we now have you thinking about BD and the possibilities it provides! But how does a BD initiative impact the past years of investment in your corporate Data Warehouse (DWH)? Simply stated, any BD initiative absolutely co-exists with the DWH; with BD, the type of data, the velocity and the increased volume serve to enrich the DWH platform, providing a much more holistic business picture. BD technology platforms, either on premise or, more often, in the Cloud, allow for this high-volume data capture, which more classical relational database technologies cannot.

 

So let's get started!

At ClearPeaks, we offer our customers a pragmatic proof-of-concept (POC) service in which we will work with you to:

Define the right POC business case, the expected ROI, the problem and the desired BD solution.
Deploy a scaled-down POC environment, capturing data from various diverse sources beyond what you are capturing in your DWH. This could be high volume data, real-time streaming data, social media data, etc.
Use Cloud BD platforms to provide a full POC experience, after which an in-house hosting / cloud decision can be made. We start with the Cloud as it's quick to deploy, elastic and cost-efficient.
Demonstrate how the combination of DWH, BD, predictive modeling and powerful visualisations can bring tangible benefits to your organisation, all in an acceptable timeframe and with minimal costs.

 

Some useful Big Data definitions:

Volume, Variety & Velocity: Dealing with large, fast and varied data is now a possibility for any business. The key point now is defining what knowledge can be extracted from the data, not implementing complex software solutions.
Cloud: On-premise hosting of BD platforms is not always possible, and in some cases is not really recommended. The perfect partner for BD is the Cloud. The Cloud enables your BD solution to grow with you in a flexible and cost-effective manner, without the headaches of maintaining complex hardware systems.
Real-time: Provide up-to-the-second information about an enterprise's customers and present it in a way that allows for better and quicker business decision-making — perhaps even within the time span of a customer interaction. Real-time analytics can support instant refreshes to corporate dashboards to reflect business changes throughout the day.
Predictive analytics and machine learning: Predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Apply them to cross-sell products to your customers, improve customer retention and credit risk assessments, optimize marketing campaigns, increase your revenues and more.

 

Authors:
Gordon, Oscar, Marc

Click here if you would like to know more about our Big Data services!
