Think Big, Think Data!


Big Data & Data Science
A recent publication in The Economist asserts that data has now surpassed oil as the world’s most valuable resource. In the last two years alone, the world has collected and stored more data than all previous years in recorded history — combined. With over 60 billion connected devices worldwide, this trend will only continue. By 2020, it is estimated that over 30% of all data will be stored in cloud services. Data Mania is truly upon us!

However, you may still be pondering whether Big Data (BD) could provide true value to your organisation and how you could effectively kick off such an ambitious initiative. Perhaps you’re in a quandary about how to leverage BD for competitive advantage, or perhaps you’re actively considering a digital transformation program. Rest assured that your competitors certainly are. On average, only 0.5% of corporate data is actually analysed, meaning untold missed opportunities for those who discount it, and substantial benefits for those who do not. Retailers leveraging BD alone have increased their margins by up to 60%!

Digitalization is rewriting the rules of competition, with incumbent companies most at risk of being left behind by new disrupters in the industry. It is no secret that BD has a paramount role to play in any digitalization initiative. This year alone, 73% of all Fortune 1000 companies have reported that they are investing in BD, and that number is set to grow in coming years.

But BD, as The Economist rightly suggests, is a resource: if data is not captured, analysed and exploited, its value is inconsequential. This is where Data Science - specifically Data Discovery and Prediction - comes in. Not only can you report on past actuals as in classic Business Intelligence (BI), but by calling on algorithms and machine learning techniques you can now predict future outcomes, feeding your predictive models with your BD resources to achieve a high degree of accuracy. Imagine the business opportunities this presents in all functional areas of your organisation!

Hopefully we now have you thinking about BD and the possibilities it provides! But how does this BD initiative impact the past years of investment in your Corporate Data Warehouse (DWH)? Simply stated, any BD initiative absolutely co-exists with the DWH — only with BD, the type of data, the velocity and the increased volume serve to enrich the DWH platform, providing a much more holistic business picture. BD technology platforms, either on premise or more often in the Cloud, allow for this high-volume data capture, which more classical relational database technologies cannot.


So let's get started!

At ClearPeaks, we offer our customers a pragmatic proof-of-concept (POC) service in which we will work with you to:

Define the right POC business case, the expected ROI, the problem and the desired BD solution.
Deploy a scaled-down POC environment, capturing data from diverse sources beyond what you are capturing in your DWH. This could be high-volume data, real-time streaming data, social media data, etc.
Use Cloud BD platforms to provide a full POC experience, after which an in-house hosting / cloud decision can be made. We start with the Cloud as it's quick to deploy, elastic and cost-efficient.
Demonstrate how the combination of DWH, BD, predictive modeling and powerful visualisations can bring tangible benefits to your organisation, all in an acceptable timeframe and with minimal costs.


Some useful Big Data definitions:

Volume, Variety & Velocity: Dealing with large, fast and varied data is now a possibility for any business. The key point now is defining what knowledge can be extracted from the data, not implementing complex software solutions.
Cloud: On-premise hosting of BD platforms is not always possible, and in some cases is not really recommended. The perfect partner for BD is the Cloud. The Cloud enables your BD solution to grow with you in a flexible and cost-effective manner, without the headaches of maintaining complex hardware systems.
Real-time: Provide up-to-the-second information about an enterprise's customers and present it in a way that allows for better and quicker business decision-making — perhaps even within the time span of a customer interaction. Real-time analytics can support instant refreshes to corporate dashboards to reflect business changes throughout the day.
Predictive analytics and machine learning: Predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Apply it to cross-sell products to your customers, improve customer retention and credit risk assessments, optimize marketing campaigns, increase your revenues and more.


Gordon, Oscar, Marc

Click here if you would like to know more about our Big Data services!

Oracle BI Cloud Service, Part V: Managing the Service


How Oracle BI Cloud Service is managed

Parts II, III and IV of our BICS series were dedicated to the basic functionalities of the service. In this post, the last one in the series, we will explain how the BI Cloud Service is managed.

As in the on-premise version of OBI, security of the BI Cloud Service is managed with users, roles and application roles. All the cloud services you buy are assigned a unique identity domain that allows you to manage which users and roles can access which services associated with that identity domain. Application roles allow you to set permissions for users and roles within each cloud service.

You can manage users and roles from the My Services page of Oracle Cloud with the Identity Domain Administrator credentials.



Figure 1: Oracle Cloud My Services webpage

Application roles are managed from the BI Cloud Service Console with BI Service Administrator credentials.


 Figure 2: Managing users and roles

From the Service Console you can easily add users, roles and application roles as members to one or multiple application roles, manage pre-defined application roles, and create your own application roles.



Figure 3: Oracle BI Cloud Service - Service Console

A useful feature included in the BI Cloud Service is Snapshot: with one click you can capture the state of the service at a point in time, including the data model, catalog and application roles (but not database elements, which should be backed up separately). You can then save the snapshot in the cloud (up to 10 snapshots) or download it to a file system, and upload a saved snapshot to restore the service to a previous state. Each snapshot import is total, in that it overwrites everything that was in the instance beforehand. Apart from backups, snapshots are useful for moving data from the pre-production to the production environment.


Figure 4: Taking system snapshots

Other common OBIEE administration tasks are also available in the BI Cloud Service, such as monitoring users currently signed in, analysing or testing SQL queries, and monitoring usage and other metrics.



The Oracle BI Cloud Service is a good option for small and medium companies looking for an easy-to-use BI framework with a fast deployment cycle and minimal capital and operating expenditure. The agility and availability of the system allow companies to start reporting against their business data in a couple of hours. It still lacks some features we would need before considering it for large projects, but it is useful for any company requiring their BI content to be available in the cloud and on any mobile device.

So what are your thoughts on the Oracle BI Cloud Service? Would you consider deploying your BI fully in the cloud using the service? Contact us through our web form if you would like more information about the Oracle BI Cloud Service, or leave your comments below!

GASCO & ClearPeaks presented with Finalist Award at Oracle OpenWorld 2015



GASCO & ClearPeaks are proud to have been presented a Finalist Award for “Big Data & Business Analytics” at the Oracle Excellence Awards for their successful deployment of a fully integrated, innovative and highly advanced Enterprise BI Platform.

Arguably the most advanced Oil & Gas Analytics environment in the Gulf region, the GASCO BI platform, delivered by ClearPeaks, extends the capabilities of Oracle BI ensuring optimal customer experience and user adoption, whilst leveraging the Oracle platform’s versatility and robustness.

The GASCO & ClearPeaks BI collaboration is proud to announce user adoption of over 1,000 active users across all departments and production sites, from C-level to site management down to operational analysts, consuming more than 300 corporate reports and over 15 corporate BI dashboards daily. The entire enterprise is now enabled on one integrated platform.

Back in 2010 the picture was very different. GASCO was facing the challenges common to many enterprises: management insight and reporting were impossible due to the lack of integration and consolidation of key business data and metrics. GASCO was working with a plethora of disconnected departmental data sources, catering for very basic operational reporting but making it impossible to drive decision-making, enforce data governance or deploy commonly agreed business reporting rules. A further complication was that different technologies were in use across departments, adding to user frustration at the lack of common and consistent standards and functionalities.

At this time, GASCO engaged ClearPeaks to embark on the journey of materialising their vision: a fully integrated, scalable, enterprise-wide BI platform offering a single, reliable one-stop shop for all GASCO reporting needs, deploying powerful, state-of-the-art visualizations to enhance user adoption. The key objectives were:

* Enabling business insight across all divisions on a common platform
* Driving key management decision making
* Driving user adoption whilst reducing IT dependencies
* Addressing data quality, integration, governance & consolidation challenges
* Unifying diverse sources of information to optimize reporting performance

Core to materialising this vision, ClearPeaks set out to deploy a centralised data warehouse, integrating the complex array of data sources present at GASCO, harmonising master data and enabling cross-functional, enterprise-wide and strategic reporting.

Big Data Ecosystem – Spark and Tableau


In this article we give you the big picture of how Big Data fits into your current BI architecture, and show how to connect Tableau to Spark to enrich your BI reports and dashboards with data that you were not able to analyse before. Give your reports and dashboards a 360º view, and understand what, when, why, who, where and how.

After reading this article you will understand what Big Data can offer you, and you will be able to load your own data into HDFS and analyse it in Tableau powered by Apache Spark.

The Big Data ecosystem

When considering a Big Data solution, it is important to keep in mind the architecture of a traditional BI system and how Big Data comes into play.

Until now we have basically been working with structured data, coming mainly from RDBMSs, loaded into a DWH, ready to be analysed and shown to the end user. Before considering how this structure may change when Big Data comes into the field, one could wonder how exactly Big Data technology benefits the current solution. It allows the system to process much higher volumes of more diverse data, much faster, and to efficiently and safely extract information from data that a traditional solution cannot handle, thanks to its high fault tolerance.

In addition, using Big Data permits the hardware structure to grow horizontally, which is more economical and flexible.

So, how does Big Data enter this ecosystem? The main architectural concepts remain much the same, but there are big changes. The main differences are a whole new set of data sources, notably unstructured ones, and a completely new environment to store and process data.

Figure: Traditional BI and Big Data architectures (Spark and Tableau)

In the picture above, at the top we have our traditional BI architecture. Below it we can see how the new Big Data architecture still preserves the same concepts: Data Acquisition, Data Storage, etc. We show a few Big Data tools from those available in the Apache Hadoop project.

What is important to point out is that reporting & visualization must combine data from traditional and Big Data storage to provide the 360º view where the true value resides.

There are different ways to combine it. We could run aggregation calculations over HDFS or Cassandra data to feed the data warehouse with information we were unable to compute before, or we could use a reporting & visualization tool capable of combining a traditional data warehouse with Big Data storage or engines, as Tableau does.

A Big Data implementation: Apache Spark + Tableau

When approaching a Big Data implementation, there are quite a lot of different options and possibilities available, from new data sources and connectors to the final visualization layer, passing through the cluster and its components for storing and processing data.

A good approach to a Big Data solution is the combination of Apache Spark for processing in Hadoop clusters, consuming data from storage systems such as HDFS, Cassandra, HBase or S3, with Tableau as the visualization software that makes the information available to the end users.

Spark has demonstrated a great improvement in performance over Hadoop's original MapReduce model. It also stands out as a one-component solution for Big Data processing, with support for ETL, interactive queries, advanced analytics and streaming.

The result is a unified engine for Big Data that excels in low-latency applications where fast performance is required: iterative processing, interactive querying, large-scale batch computations, streaming and graph computations.

Tableau is growing really quickly, and has already proven to be one of the most powerful data discovery and visualisation tools. It has connectors to nearly any data source, such as Excel, a corporate Data Warehouse or SparkSQL. But where Tableau really stands out is in transforming data into compelling, interactive dashboards and visualizations through its intuitive user interface.

The combination of Apache Spark with Tableau stands out as a complete end-to-end Big Data solution, relying on Spark's capabilities for processing the data and Tableau's strengths in visualisation. Integrating Tableau with Apache Spark makes it possible to analyse Big Data visually, in an easy and business-friendly way; no Spark SQL code is needed here.

Connecting Tableau with Apache Spark

Here at ClearPeaks, we are convinced that connecting Apache Spark to Tableau is one of the best approaches to processing and visualising Big Data. So, how does this solution work? We are already working with this technology, and are proud to show a demonstration of Tableau connected to Apache Spark. You will need:

  • Tableau Desktop, any version that supports the SparkSQL connector.
  • Apache Spark installed either on your machine or on an accessible cluster.


Tableau uses a specific SparkSQL connector, which communicates with the Spark Thrift Server to finally use the Apache Spark engine.

Figure: Tableau - SparkSQL connector - Spark Thrift Server - Apache Spark

Software components

Tableau Desktop

Apache Spark Driver for ODBC with SQL Connector

Apache Spark (includes Spark Thrift Server)

Set up the environment

Installing Tableau Desktop and Apache Spark is out of the scope of this article. We assume that you have already installed Tableau Desktop and Apache Spark.

Apache Spark needs to be built with Hive support, i.e. adding the -Phive and -Phive-thriftserver profiles to your build options. More details can be found in the Spark build documentation.
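As an illustration, a Maven build of a Spark source tree with these profiles might look like the following sketch (the -Pyarn and -Phadoop-2.4 profiles are assumptions; pick the ones matching your cluster):

```shell
# Sketch: build Spark from source with Hive and Thrift Server support
# (run from the root of the Spark source tree)
mvn -Pyarn -Phadoop-2.4 -Phive -Phive-thriftserver -DskipTests clean package
```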

Install Apache Spark Driver for ODBC with SQL Connector

Install the Apache Spark ODBC driver from the Simba webpage. They offer a free trial period, which can be used to follow this article.

It has an installation wizard which makes installation a straightforward process.

Configure and start Apache Spark Thrift Server

Configuration files

Spark Thrift Server uses the Hive Metastore by default, unless another database is specified. We need to copy the hive-site.xml config file from Hive to the Spark conf folder.

cp /etc/hive/hive-site.xml /usr/lib/spark/conf/

Spark needs access to the Hive libraries in order to connect to the Hive Metastore. If those libraries are not already in Spark's CLASSPATH variable, they need to be added.

Add the following line to /usr/lib/spark/bin/
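As a hypothetical illustration (the exact script and the Hive library path depend on your installation), such a line typically looks like:

```shell
# Hypothetical: expose Hive's jars to Spark via the CLASSPATH
# (adjust /usr/lib/hive/lib to wherever your Hive libraries live)
export CLASSPATH="$CLASSPATH:/usr/lib/hive/lib/*"
```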


Start Apache Spark Thrift Server

We can start Spark Thrift Server with the following command:

./sbin/start-thriftserver.sh --master <master-uri>

<master-uri> might be yarn-cluster if you are running YARN, or spark://host:7077 if you are running Spark in standalone mode.

Additionally, you can specify the host and port using the following properties:

./sbin/start-thriftserver.sh \

  --hiveconf hive.server2.thrift.port=<listening-port> \

  --hiveconf hive.server2.thrift.bind.host=<listening-host> \

  --master <master-uri>

To check whether the Spark Thrift Server has started successfully, look at the Thrift Server log. <thriftserver-log-file> is shown in the console output after starting the Spark Thrift Server.

tail -f <thriftserver-log-file>

Spark Thrift Server is ready to serve requests as soon as the log file shows the following lines:

INFO AbstractService: Service:ThriftBinaryCLIService is started.

INFO AbstractService: Service:HiveServer2 is started.

INFO HiveThriftServer2: HiveThriftServer2 started

INFO ThriftCLIService: ThriftBinaryCLIService listening on
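Before connecting Tableau, you can optionally sanity-check the Thrift Server from the command line with Beeline (localhost:10000 below assumes the default host and port; adjust to your setup):

```shell
# Connect to the Spark Thrift Server over JDBC and list the available databases
beeline -u jdbc:hive2://localhost:10000 -e 'SHOW DATABASES;'
```

If this returns the database list, the SparkSQL connector in Tableau should be able to reach the server too.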

Connect Tableau using SparkSQL connector

Start Tableau and select the option to connect to Spark SQL.

Select the appropriate Type, depending on your Spark version, and the appropriate Authentication, depending on your security setup.

Figure: Spark SQL connection dialog in Tableau

The next steps are selecting the schema, tables and desired relations, just as with any other Tableau connector.

Now you are able to run your own analysis on Big Data powered by Spark!

Figure: Sample dashboard created in Tableau powered by Spark

The dashboard above has been created in Tableau 9.0 after following the instructions provided. Apache Spark is used by Tableau to transparently retrieve and perform calculations over our data stored in HDFS.

Show us a capture of your Spark-powered dashboards and reports, and share your impressions of the Apache Spark and Tableau tandem in the comment section at the bottom.

Happy analytics!


Eduard Gil & Pol Oliva

Bonus: Add data to Hive Metastore to consume it in Tableau

If you are not familiar with the process of loading data to Hive Metastore you will find this section very useful.

This section describes how to load a CSV from your file system into the Hive Metastore. Afterwards you will be able to use it from Tableau following the process described in this article.

For this example we are going to use the following file that contains the well-known employee example:


123234877,Michael,Rogers,IT
546523478,John,Doe,Human Resources
654873219,Zacary,Efron,Human Resources
745685214,Eric,Goldsmith,Human Resources

As we can see it follows the schema: Employee Id, Name, Last Name, and Department.
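If you want to verify the file before loading it, a quick shell check along these lines confirms that every row has the four expected fields (my_employees.csv is the file name used later in this section):

```shell
# Recreate the sample file and check that each row has exactly 4 comma-separated fields
cat > my_employees.csv <<'EOF'
123234877,Michael,Rogers,IT
546523478,John,Doe,Human Resources
654873219,Zacary,Efron,Human Resources
745685214,Eric,Goldsmith,Human Resources
EOF
awk -F',' 'NF != 4 { bad=1 } END { print (bad ? "schema mismatch" : "schema OK") }' my_employees.csv
```

A mismatch here usually means a stray comma or space in the data, which would shift columns in the Hive table.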

We are going to use beeline to connect to Thrift JDBC Server. Beeline is shipped with Spark and Hive.

Start Beeline from the command line:

./bin/beeline

Connect to Thrift JDBC Server

beeline> !connect jdbc:hive2://localhost:10000

Create the table and specify the schema of it

beeline> CREATE TABLE employees (employee_id INT, name STRING, last_name STRING, department STRING)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Now you are ready to load your my_employees.csv file into the previously created table

beeline> LOAD DATA LOCAL INPATH '/home/user/my_employees.csv' INTO TABLE employees;

We can even perform operations over employees table using beeline

beeline> SELECT COUNT(*) FROM employees;


OBIEE 11g Installation in Silent Mode


At ClearPeaks we recently received a request to perform an OBIEE installation on an Oracle Enterprise Linux (OEL) server without Graphical User Interface (GUI).

The Repository Creation Utility (RCU) and the Oracle Universal Installer (OUI) can both be executed without a graphical assistant by running them in silent mode.

Since a database was already installed, only the RCU and OBIEE silent installation process is described in this post.

1. Schema Creation with RCU

1.1 Prerequisites

Make sure that the database and the listener are running.

1.2  Schema creation

1.2.1  Passwords file creation

As this is a silent installation, the RCU installer requires a text file containing the following passwords (in this order):

  • Database password
  • Component 1 schema password (BIPLATFORM)
  • Component 2 schema password (MDS)

vi rcu_passwords.txt
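For reference, the file can also be created as follows; the values below are placeholders only, in the order listed above (database, BIPLATFORM, MDS):

```shell
# Hypothetical rcu_passwords.txt: one password per line, placeholder values
cat > rcu_passwords.txt <<'EOF'
MySysPassword1
MyBiplatformPassword1
MyMdsPassword1
EOF
wc -l < rcu_passwords.txt   # prints 3
```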


Ensure that the file belongs to the oracle user before running the rcu command.

1.2.2 Execution in silent mode

As in every schema creation through the RCU, it is necessary to obtain the software from the Oracle site and extract it. The executable is located at rcuHome/bin/rcu.

Execute the following command to start the installer in silent mode:

./rcu -silent -createRepository -connectString localhost:1521:orcl -dbUser SYS -dbRole SYSDBA -schemaPrefix DEV -component BIPLATFORM -component MDS -f < ../../rcu_passwords.txt

After a while, the RCU should report that the repository was created successfully.

2. OBIEE Installation

2.1  Prerequisites

2.1.1   Database and listener

As in the RCU execution, the database and the listener need to be started and working before starting the OUI.

2.1.2  Schemas created through RCU

The BIPLATFORM and MDS schemas must have been created during the RCU step.

2.1.3  Unset the ORACLE_HOME variable

If you have already installed an Oracle database on the same server where you are going to install the OBIEE server, the ORACLE_HOME environment variable must be unset. Bear in mind that the variable remains unset only for the current terminal session.

Execute the following command (as root):

unset ORACLE_HOME

2.1.4  Set Kernel Parameters

The last step is to modify the Kernel Parameters (as root):

The following lines must be added to the limits.conf file:

  • oracle hard nofile 4096
  • oracle soft nofile 4096

vi /etc/security/limits.conf


2.2  Silent validation

2.2.1  Response file creation

If you don’t have a GUI on your server, you can edit the response file I used for this installation:


It will be necessary to replace the <SECURE_VALUE> entries with your actual passwords.

2.2.2  Silent validation execution

Before installing OBIEE, a silent validation is required. The OUI needs the response file in order to be executed in silent mode.

Ensure that the response file belongs to the oracle user before running the installer.

Execute the following command as the oracle user (the full path of the response file is required):

./runInstaller -silentvalidate -response /home/oracle/Desktop/bi_binaries/obiee_binaries/bishiphome/Disk1/response_file.rsp

The validation may end with an error message that can be safely ignored.

2.3  Silent installation

2.3.1  Location file

If you already have an oraInst.loc file in your system, you can use it:

vi /home/oracle/app/oracle/product/11.2.0/dbhome_1/oraInst.loc


If this file does not exist on the system, the installation program creates it automatically.
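For reference, a typical oraInst.loc contains just two entries; the inventory path and group below are assumptions for a default installation, so adjust them to your system:

```shell
# Hypothetical contents of oraInst.loc (values are assumptions):
inventory_loc=/home/oracle/app/oraInventory
inst_group=oinstall
```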

2.3.2  Silent installation execution

This is the last and most critical step of the installation. Please make sure that all the previous steps have been performed successfully.

Execute the OUI in silent mode (as an Oracle user):

./runInstaller -silent -response /home/oracle/Desktop/bi_binaries/obiee_binaries/bishiphome/Disk1/response_file.rsp -invPtrLoc /home/oracle/app/oracle/product/11.2.0/dbhome_1/oraInst.loc

This step will take several minutes. If the previous steps have been performed correctly, the installation should end successfully.


This post has outlined how to fully install OBIEE 11g on a Linux server without a GUI.

Advantages of the silent mode installation include:

  • No need to consume extra resources with the graphical user interface.
  • The whole installation could be automatically executed by a script.
  • Possibility to perform identical installations if the response files don’t change.
  • No need to spend more time executing the graphical wizard manually.

For more information, consult the official Oracle documentation.
