Data Quality Series – Part 3: Data Standardization with EDQ


Data Quality Series – Data Standardization with EDQ


In the first part of the Data Quality Series we introduced the main concepts of data quality. In the second article we explained how Oracle Enterprise Data Quality (EDQ) can help us profile our data.

The focus in this post is to review some of EDQ’s basic cleansing and standardization capabilities.



EDQ provides a number of built-in transformations for data cleansing, and some of the most commonly used transformations are:

Character Replace: replaces characters according to a map
Denoise: removes noise characters from text attributes
Lower Case: converts string values to lower case
Upper Case: converts string values to upper case
Proper Case: converts string values to proper case
Normalize No Data: normalizes attributes with no data to nulls
Normalize Whitespace: removes leading and trailing whitespace, and normalizes inter-word whitespace to a single space
RegEx Replace: replaces a string value matching a regular expression with a given value, or part of the matching expression
Replace: replaces values using a map

EDQ also comes with a series of built-in processors for data type conversion that make the normalization of types in our data really easy. There are also transformations that can be used to enhance our data, for example by adding dates, concatenating or parsing fields, or adding initials from text values.

Maps can be very useful for data standardization; by keeping all the possible values representing an entity together with the corresponding standard value for that entity in map tables, we can standardize the data using EDQ in just a couple of simple steps. In the following example, we will see how we can improve the quality of two common data fields containing people information: Country and Gender.


1. Data Standardization with EDQ

Let’s suppose that we have a dataset with a Country field. The table should contain data mostly for the United States; however, a quick Frequency profile shows the following results:

Data Quality Series – Data Standardization with EDQ

Figure 1: Frequency profile of the Country field. We can see that some countries are represented by more than one value in the database.

We can see that we have very bad data quality for this field: the values ‘USA’, ‘US’, ‘U.S.A’, ‘United States’ and ‘U.S’ all belong to the single entity that we would like to call ‘United States of America’. Moreover, there are two different values for Canada (‘Canada’ and ‘CAN’), and also some null values. With EDQ, we can easily fix issues like this using mapping tables.

The first transformation that we need is the Normalize Whitespace processor, which removes leading and trailing whitespace, and normalizes inter-word spaces to a single space. Simply create a new process, add the transformation, connect the Reader output to the normalizer input, and select the fields that need whitespace normalization:

Data Quality Series – Data Standardization with EDQ

Figure 2: Normalize Whitespace processor in EDQ. Before processing text fields for standardization, it is recommended to remove leading and trailing spaces and to normalize inter-word spaces to a single space.

We will generate the values for the null ones from another field containing the city, using an external mapping called City to Country Mappings. This table contains a mapping of cities to their corresponding countries:

Data Quality Series – Data Standardization with EDQ

Figure 3: City to Country Mappings

To use it in our project, we just need to add it to the Reference Data section:

Data Quality Series – Data Standardization with EDQ

Figure 4: Adding Reference Data

We will use the mapping to generate the name of the country in a new column called “Derived Country” for each city in our dataset. To do this, we add the processor Enhance from Map, connected to the output of the Normalize Whitespace processor, using City as the field to match, and the added mapping:

Data Quality Series – Data Standardization with EDQ

Figure 5: Process for normalizing whitespace and populating the Country field in EDQ

Data Quality Series – Data Standardization with EDQ

Figure 6: Enhance from Map processor in EDQ. We will populate the “Country” field using the “City to Country” mapping, matching by the field “City”.

After running this process, we can see in the Enhance from Map processor results that we were able to enhance 3338 fields:

Data Quality Series – Data Standardization with EDQ

Figure 7: Results of the country enhancement using the City to Country Mapping in EDQ

However, 2100 records remain with an unenhanced country, so we still have work to do on this field to make it fit for use. The next step is the normalization of the name, using another mapping called Country Variants. This mapping contains the most common variants for ‘United States of America’, plus variants for Canada:

Data Quality Series – Data Standardization with EDQ

Figure 8: Country Variants mapping. The Standardized Country column contains the standard that we want to use for each real world entity, in this case countries.

To normalize these different variants, we add a Replace processor to the process, calling it Standardize Country:

Data Quality Series – Data Standardization with EDQ

Figure 9: Country Standardization process in EDQ.

In the properties of the transformation, we need to define Country as the input field (this is the field that we are standardizing), and we have to specify the replacements in the Options tab. We will use the mentioned Country Variants mapping, containing the replacements.

Data Quality Series – Data Standardization with EDQ

Figure 10: Replace processor in EDQ. The processor will replace the original value of the Country field with the standard from the Country Variants mapping.

Finally, we simply add a Merge Attributes processor to merge the country derived from the city and the standardized country into a new column, ‘Best Country’, and a Frequency Profile processor to check the results:

Data Quality Series – Data Standardization with EDQ

Figure 11: Complete Country Standardization process in EDQ. The Merge Attributes processor, called “Create Best Country Attribute”, merges the values of the country derived from the city and the country obtained after standardization using the Country Variants mapping.

The transformation will merge the derived country with the standardized country into a new column called “Best Country”:

Data Quality Series – Data Standardization with EDQ

Figure 12: Merging attributes in EDQ.

With a Frequency profiler we can analyse the results of the new field:

Data Quality Series – Data Standardization with EDQ

Figure 13: Frequency profiler to verify the results of the standardization of the Country field in EDQ.

We can see how the quality of the field has improved thanks to the transformations applied:

Data Quality Series – Data Standardization with EDQ

Figure 14: Country Frequency profiler before standardization in EDQ, showing null values and many values for the same countries.


Data Quality Series – Data Standardization with EDQ

Figure 15: Country Frequency profiler after standardization in EDQ.

We were able to eliminate all the variants of United States of America, and we also identified other countries that were not appearing correctly before the standardization process.

Now let’s suppose we have a dataset with people including a field for Gender, which should contain either ‘M’ or ‘F’ (Male or Female) for each record. However, a quick profiling of the field shows the following results:

Data Quality Series – Data Standardization with EDQ

Figure 16: Gender frequency profiling in EDQ.

We can see that we have a lot of null values, and some records with ‘U’ in the gender field. We’re going to improve the quality of the field with EDQ, by trying to generate the gender using the title and the first name fields, and merge this generated gender with the Gender field available in the data to get the best result.

We’ll begin by adding an Unstructured Name Parser. We will use it to parse the name field, so that will be the input of the transformation, specified in the Input sub-processor of the Parser processor:

Data Quality Series – Data Standardization with EDQ

Figure 17: Parsing text fields with EDQ.

Data Quality Series – Data Standardization with EDQ

Figure 18: Input sub-processor of the Unstructed Name Parser processor in EDQ. The input is specified here.

The name will be mapped to the only attribute of the processor called “FullName”:

Data Quality Series – Data Standardization with EDQ

Figure 19: Map sub-processor of the Unstructed Name Parser processor in EDQ. The FullName attribute is mapped to the input field Name.

After running the process, we can see all the steps taken and all the rules created in the results of the parser in order to parse the name field into three new fields: P1_Prefix with the title, P1_First with the first name, and P1_Last with the last name, for each record:

Data Quality Series – Data Standardization with EDQ

Figure 20: Results of the Unstructured Name Parser, with the fuzzy rules created to parse the string containing the name into three separate fields for the title, first name, and last name.

Now we can use these fields to derive the gender of each person in the dataset. To do this, we add the built-in transformation Add Gender to the process, and join it to the pass output of the parser transformation:

Data Quality Series – Data Standardization with EDQ

Figure 21: Enhancing a Gender field in EDQ. The process contains a parser transformation and EDQ's built-in transformation for adding the gender.

To derive the gender, we need to specify the fields that contain the title and the first name in the processor configuration, as well as the stated gender, which the processor will use in combination with the derived gender to create the output “Best Gender”.

Data Quality Series – Data Standardization with EDQ

Figure 22: Enhancing a Gender field in EDQ. The Add Gender processor requires three attributes: title, first name, and stated gender.

If we run the process now, we can see how we have improved the quality of this field in the results of the Add Gender processor:

Data Quality Series – Data Standardization with EDQ

Figure 23: Results of the Gender standardization in EDQ.

Now there are only 6 records with missing gender, and we have reduced the percentage of missing gender records from 19.5% to 0.1%.



EDQ provides some solid features that make data cleansing and standardization tasks much easier: we can apply built-in transformations, use external maps, and apply business rules to standardize, correct wrong values, normalize fields, and enhance our data.

You can try these and other usage examples of EDQ for standardization downloading the latest EDQ virtual machine and hands-on lab from the following links:

Latest EDQ virtual machine from Oracle
Getting Started Hands-On Lab

Click here
if you would like to know more about the Data Quality Services we offer!

Data Quality Series – Part 2: Data Profiling with EDQ


Data Quality Series



This is the second article in our blog series about Data Quality. In the first part of the series we introduced the main concepts of data quality.

The focus of this article is to introduce Data Profiling with EDQ.



Enterprise Data Quality (EDQ) is Oracle’s total solution for data governance and data quality management. It is part of the Oracle Fusion Middleware product range and the current version,, can be downloaded here.

There is a virtual machine for testing EDQ with a sample dataset, available for download too at the following link.

You can also find extensive documentation on the Oracle webpage - go to the following links for documentation and more resources.

Oracle EDQ can be used to profile, audit, standardize, normalize, and de-duplicate our data. The list below summarizes the key features of EDQ:

Integrated data profiling, auditing, cleansing and matching
Browser-based access to the different client applications used to operate and configure the product
Ability to handle all types of data
Connection to files (text, XML, Office) and any JDBC and ODBC compliant data sources and targets
Multi-user project support
SOA support to create processes that may be exposed to external applications as a service
A single repository to hold data, gathered statistics, and project tracking information
Fully extensible architecture allowing the insertion of any required custom processing

EDQ is fully extensible, offering extensions for Address Verification and Watchlist Screening, for example, and can be integrated as batch processing or as real-time processing with other integration tools.

In this article we will review the basic profiling capabilities of EDQ and explain how we can profile our data using EDQ in just a few steps.


1. Data Profiling

The first step in a data quality project is profiling the data. Data profiling is the analysis of data to clarify its structure, content, relationships and derivation rules. Profiling helps not only to understand anomalies and to assess data quality, but also to discover, register, and assess enterprise metadata, and so the purpose of data profiling is both to validate metadata when it is available and to discover metadata when it is not.

EDQ comes with several built-in transformations to profile our data quickly and to start discovering patterns and quality issues. Some of the most important transformations for profiling are:

Quickstats profiler: analyses high-level completeness, duplication, and value frequency across many attributes, and highlights possible issues. For example, we could use this transformation to identify duplicate values in columns that should contain unique values, or columns with more unique values than expected.

Figure 1: Data Quality Series – Data Profiling with EDQ

Figure 1: Example of the results of the Quickstats profiler in EDQ. We can start discovering issues in the data using this transformation, for example, fields with null values, or with more unique values than expected. 

Data types profiler: analyses attribute values for their data types, and assesses data type consistency. In the example below, we can see that some columns have inconsistent data types, and we can clearly identify wrong values (the Street column has 1 numeric value and the rest text, the Cell column has 6 Date/time values, the Active columns have 15 numeric values, etc.).

Figure 2: Data Quality Series – Data Profiling with EDQ

Figure 2: Example of the results of the Data types profiler in EDQ. This transformation is very useful to find data type consistency issues.

Max/Min profiler: finds minimum and maximum values – longest, shortest, lowest, and highest.

Figure 3: Data Quality Series – Data Profiling with EDQ

Figure 3: Example of the results of the Max/Min profiler in EDQ. One of the basic profiles that we can do on our data is an analysis of the maximum and minimum values of each field to find wrong values.

Frequency profiler: analyses value frequency across many attributes. In the image below, we can see how this transformation can be used to rapidly identify consistency problems in the values of the data.

Figure 4: Data Quality Series – Data Profiling with EDQ

Figure 4: Example of the results of the Frequency profiler in EDQ. We can already see data quality.

Patterns profiler: analyses character patterns and pattern frequency across many attributes.

Figure 5: Data Quality Series – Data Profiling with EDQ

Figure 5: Example of the results of the Patterns profiler in EDQ, very useful to find data quality issues in fields such as zip codes, telephone numbers, or email addresses.

Record completeness profiler: analyses records for their completeness across many attributes.

Figure 6: Data Quality Series – Data Profiling with EDQ

Figure 6: Example of the results of the Record completeness profiler in EDQ.

Other profiling transformations include, but are not limited to, number profiler, character profiler, date profiler, and RegEx patterns profiler.


2. Data Profiling with Oracle EDQ

EDQ developments are called projects, which contain processes that run processors (transformations). There is also the possibility to include processors in jobs, to make them accessible to external applications. All these developments are created using the Director application, accessible from the EDQ Launchpad on the EDQ server:

Figure 7: Data Quality Series – Data Profiling with EDQ

Figure 7: The EDQ Launchpad. All the client applications are accesible from this page.

The Director is a Java application, and is the main application for the operation of EDQ.

It is divided into five different panes:

Figure 8: Data Quality Series – Data Profiling with EDQ

Figure 8: The Director client application, the main application for developing in EDQ.

The projects and processors, together with the data stores, staged data, reference datasets, and other files, will be shown in the Project Browser pane (1). The main section of the application is the Project Canvas (2), where the different processes and jobs are built. The Tool Palette (3) will show the different tools available for use in different data quality processes and jobs. The Results Browser (4) will show the results of the processor selected on the canvas. Finally, the Tasks window in the bottom left corner (5) shows the progress of the execution of the processes as they are running.

To start profiling our data, the first thing we need to do is create a new project in the Director:

Figure 9: Data Quality Series – Data Profiling with EDQ

Figure 9: Creating a project in EDQ.

Then we need to add a data store to connect to:

Figure 10: Data Quality Series – Data Profiling with EDQ

Figure 10: Creating a new data store in EDQ.

Once we have a valid data store we can connect to, we have to create a snapshot of the desired tables in the data store. All the processing done in EDQ will be done on this snapshot, so the original data source will remain unchanged.

Figure 11: Data Quality Series – Data Profiling with EDQ

Figure 11: Creating a data snapshot in EDQ.

Finally, we need to create the process that will contain all the profiling transformations or processors:

Figure 12: Data Quality Series – Data Profiling with EDQ

Figure 12: Creating a process in EDQ.

After selecting our created snapshot, we can select the processors that we need on the next screen of the New Process window, checking the box ‘Add Profiling’, or we can leave it un-checked and manually drag and drop the processors one by one to the process canvas from the tool palette.

Figure 13: Data Quality Series – Data Profiling with EDQ

Figure 13: A simple data profiling process, containing the Reader processor and six profiling processors.

If we add the processors to the process manually, we will have to connect the output of the reader transformation to the input of the profiling transformations.

To run the process, we simply need to save it then click on the run icon:

Figure 14: Data Quality Series – Data Profiling with EDQ

Figure 14: Executing a process in EDQ.

The execution progress will be shown in the Tasks window, and once all the processors have finished the results of each individual transformation will be shown in the Results Browser. All these results can be exported to Excel using the buttons above the results in the Results Browser.


3. Creating Issues

A lot of common problems that could jeopardize the quality of BI or Analytics solutions can be identified with data profiling. EDQ includes features to create a collaborative environment between the different users involved in the data governance processes. For example, we can create issues and assign them to DBAs, application developers, ETL developers, BI developers, and business owners as they are found. To do this, simply right-click on any value that might indicate a data quality issue in the Results Browser, and select Create Issue.

Figure 15: Data Quality Series – Data Profiling with EDQ

Figure 15: Creating issues in EDQ.

The issues can be reviewed via the Issue Management application, accessible from the EDQ Launchpad or from the Director:

Figure 16: Data Quality Series – Data Profiling with EDQ

Figure 16: Link to the Issue Manager application accessible from the Director in EDQ. The Issue Manager application can also be accessed from the EDQ Launchpad.


4. Creating Reference Datasets

The data profiling step is useful for creating datasets that can be used later on as reference for data auditing. To do this, simply right-click on the valid values for a given field and select Create Reference Data.

Figure 17: Data Quality Series – Data Profiling with EDQ

Figure 17: Creating reference data in EDQ.

Figure 18: Data Quality Series – Data Profiling with EDQ

Figure 18: Reviewing reference data in EDQ.

We can add extra rows if there are missing values in our data.

After saving, this dataset will be the single source of truth for the field, and it can be used in other processes for auditing the data.

We can also create reference data from patterns instead of from individual values. EDQ also comes with many different types of reference data out-of-the-box, which can speed up the creation of data auditing processes.


5. Data Auditing

There are a number of processors available to audit data, called data checks. The most important ones are:

Data Type Check: checks that a string attribute contains data of the expected data type
Duplicate Check: checks for records that are duplicated across selected attributes
Email Check: checks email addresses are in a valid syntactic format
Invalid Character Check: checks a string attribute for invalid characters
Pattern Check: checks values for an attribute against reference data of valid and invalid character formats
No Data Check: checks whether or not values in an attribute contain any meaningful data
RegEx Check: checks values for a string attribute against reference data of valid and invalid regular expressions
Business Rules Check: checks data against a set of business rules

These processors allow the output streams to be divided for valid and invalid records to be handled separately or together as they are audited:

Figure 19: Data Quality Series – Data Profiling with EDQ

Figure 19: Different streams of data from a data check transformation in EDQ. The records with populated email can be treated separately from the records without email using the different outputs.

Data can be audited using reference data created by users. For example, the List Check and Pattern Check processors allow specifying reference datasets to determine valid and invalid records:

Figure 20: Data Quality Series – Data Profiling with EDQ

Figure 20: Using reference data for auditing in EDQ.

This way the processors will check if the selected field of each record in the database matches the reference data. We can drill down in the results of these processors to see the matching and un-matching records, and export the records to Excel to share them with DBAs, business users, or other users.


6. Data Profiling with ODI

Oracle Data Integrator comes with some data profiling capabilities, which although not as complete as EDQ, are enough for small projects with basic profiling needs. The table below shows the comparison between the data profiling features included in ODI and the more advanced features of Oracle EDQ:

Figure 21: Data Quality Series – Data Profiling with EDQ

Figure 21: Data Profiling Feature.


EDQ offers some great features to profile and audit data in a quick and simple way, and in a collaborative environment to raise issues with other users involved in the data quality assurance process. Tasks that we would normally do manually on the databases can be done with EDQ much faster, speeding up the process of finding quality issues in our data. A good profiling and auditing of our data is the first requisite for a successful quality assurance process, which in turn is a requisite for the success of any Analytics / BI project.

Stay tuned and read the next part of the Oracle EDQ series, in which we explain data standardization and cleansing using EDQ.

Click here if you would like to know more about the Data Quality Services we offer!

Data Quality Series – Part 1: Introduction


Data Quality

This article is the first in a series of blog posts about the topic "Data Quality". In the next couple of weeks we will go through the following subject matters:

Part 1: Introduction
Part 2: Data Profiling with EDQ
Part 3: Data Standardization with EDQ
Part 4: Data De-Duplication with EDQ

The focus of this first article is to introduce "Data Quality".


1. Introduction

Data quality is the perception or assessment of the fitness of data to serve its purpose in a given context [1]. Good quality data is crucial for analysts to create reports that contain accurate and consistent information, and in some businesses, bad quality or out-of-date data may increase costs. For example, a mailing campaign that sends letters to the wrong customer addresses is a waste of money.

Moreover, reports containing poor quality data may be regarded by customers as erroneous and thus reduce their confidence in the delivered dashboards. In fact, according to Gartner, a loss of confidence by users in their reports/dashboards is the number 1 cause of Data Warehouse / Data Mart / Data Governance project failures.

Many of the data quality tasks are very close to the database level and can be performed by a DBA, for instance by adding checks in the columns to allow just valid values, or by setting default date formats. But in some scenarios we may find that these validations are not performed, and certain data quality tools can help us to analyse and fix these issues.

The term “Data Quality” involves many different aspects:


The data conforms to the syntax (format, type, range) of its definition. Database, metadata or documentation rules as to the allowable types (string, integer, floating point etc.), the format (length, number of digits etc.) and range (minimum, maximum or contained within a set of allowable values).


The data correctly describes the "real world" object or event being described. Does it agree with an identified reference of correct information?


Does the data align with the operational context? For example, a birthdate of 01/01/01 is valid, but is it reasonable in context?


The proportion of non-blank values against blank values. Business rules define what "100% complete" represents: optional data might be missing, but if the data meets the expectations it can be considered complete.


Values across all systems in an organization reflect the same information and are in sync with each other.


The data is current and "fresh"; data lifecycle is a key factor.


The level of detail of the data element, e.g. the number of significant digits in a number. Rounding, for example, can introduce errors.


The need for access control and usage monitoring.

Referential Integrity:

Constraints against duplication are in place (e.g. foreign keys in a RDMBS)


The data is available for use when expected and needed.


No value occurs more than once in the data set.

Data quality is affected by the way data is entered, stored and managed, and data quality assurance (DQA) is the process of verifying the reliability and effectiveness of data. In the section below we will see the basic approach to a successful DQA project.


2. Data Quality Projects

Every data quality project needs to start with an understanding of the benefits that we are going to obtain from it, that is, an assessment and understanding of the value of the data as a key for better decision-making and improved business performance, or,  in other words, its utility.

It is possible to analyse how data is being used to achieve business objectives, and how the achievement of these goals is impeded when flawed data is introduced into the environment. To do this, we must consider the following points:

What the business expectations for data quality are
How the business can be impacted by poor data quality
How to correlate such impacts with specific data quality issues

Once a negative impact on the ways the business operates due to poor quality data has been determined, the necessary approach to assemble a data quality management programme and institute the practices that will lead to improvement must be planned. This plan must consider:

The processes that need to be instituted
The participants that will execute those processes and their roles and responsibilities
The tools that will be used to support the processes

Normally, a data quality project lifecycle involves at least three different steps:

1. Data Profiling
2. Data Standardization or Cleansing
3. Data Matching and De-Duplication


3. Data Profiling

Data profiling is the analysis of data in order to clarify its structure, content, relationships and derivation rules. It mainly involves gathering different aggregate statistics or informative summaries about the data, and ensuring that the values match up to expectations.

Profiling helps not only to understand anomalies and to assess data quality, but also to discover, register, and assess enterprise metadata; thus the purpose of data profiling is both to validate metadata when it is available and to discover metadata when it is not.

The typical outputs of the data profiling process are:

Column Profiling:

Record count, unique count, null count, blank count, pattern count
Minimum maximum, mean, mode, median, standard deviation
Completeness & number of non-null records
Data types
Primary key candidates

Frequency Profiling:

Count/ratio of distinct values

Primary/Foreign Key Analysis:

Referential integrity checks (can be achieved by creating a join profile)
Candidate primary and foreign keys

Duplicate Analysis:

Identify potential duplicate records

Business Rules Conformance:

The data meets an initial set of rules

Outlier Analysis:

Identify possible bad records


4. Data Standardization

Once the data profiling process is finished, the next step is to fix the issues that have been identified, by applying rules, replacing the wrong values, and fixing data inconsistencies. Data standardization, or data cleansing, is the process of developing and applying technical standards to the data, to maximize its compatibility, interoperability, safety and quality.


5. Data Matching and De-Duplication

Record matching is the process of finding duplicate records in the data, that is, records which may relate to a single real world entity. This may seem like a trivial task, but in reality it is far from it. The first challenge is to identify the criteria under which two records represent the same entity, considering that the data may come in different formats and using different conventions (free text fields), may be incomplete and/or incorrect (typos, etc.), and the context of the different fields (for example, four different fields may represent a single address). We want to identify duplicates regardless of these differences in the data, so we need to define measures of distance between the records and then apply rules to decide if they classify as duplicates.

Moreover, efficient computation strategies need to be used to find duplicate pairs in large volumes of data. The number of comparisons needed using traditional methods on large datasets quickly scales up to extremely long execution times, and techniques such as a previous clustering of the data become necessary.

Data de-duplication is the process of merging record pairs representing the same real world entity, also known as consolidation. It relies on record matching to find pairs of possible duplicate records.


6. Data Quality Tools

All the tasks involved in a data quality assurance project can be done manually on the database, or in the ETL processes if working on a data source being loaded from another source. However, there are many vendor applications on the market to make these tasks much easier. Most of these applications can be integrated with other ETL tools, and offer batch processing as well as real-time processing to ensure we always have the best quality data.

In the next post in this series of blog articles we are introducing Oracle Enterprise Data Quality (EDQ), a leading data quality tool, so stay tuned!

[1] Margaret Rouse, “data quality”. Web. TechTarget. November 2005. Accessed January 2017.


Javier Giaretta, Nicolas Ribas and Axel Bernaus

Click here if you would like to know more about Data Quality!


OBIEE12c Integration with Oracle EBS Security


Integration of the Oracle Business Intelligence Enterprise Edition and Oracle E-Business Suite provides a seamless controlled flow between the systems, allowing for drill down and drill back from key metrics to underlying detail. Theoretically, this can be done between any OBIEE form and any EBS form with pass-thru to any connected EBS subsystem.

If you are integrating OBIEE with EBS, you are likely to be using OBIA, Oracle Business Intelligence Analytics, although this is certainly not a requirement. OBIA is a pre-built, pre-packaged BI solution that delivers role-based intelligence to the organization. It is a set of OBIEE dashboards and reports that run from a pre-built warehouse previously serviced by Informatica/DAC, while the next generation of the OBIA warehouse utilizes Oracles Data Integrator, ODI, which runs high-volume batch load plans, event-driven load plans, and even SOA data services.

1. OBIEE 12c Configuration

While configuring an initialization block to retrieve data from EBS, make sure that Row-wise initialization is checked, as this allows multiple results to be stored in the variable, regardless of whether the variable is static or dynamic; otherwise you will only be able to retrieve the last item in the result set. Be sure to set the Execution Precedence of the EBS Integration init block that attaches the session through the session cookie, so that it executes before any attempt is made to retrieve security information.

Figure 1: EBS Configuration

Figure 1: EBS Configuration

Two files must be modified in order for WebLogic to find, accept and attach to the EBS session.

• instanceconfig.xml

• authenticationschema.xml

To configure external authentication, you will need to modify instanceconfig.xml as follows:
Path to instanceconfig.xml:


Note: Take a backup of the file before editing.

Add “EBS-ICX” in the EnabledSchemas xml tag.

<!--This Configuration setting is managed by Oracle Enterprise Manager Fusion 
Middleware Control-->

Middleware recognizes internal schema name for interface to EBS "EBS-ICX". The Oracle already worked on that. It is only necessary to let the system know that this is to be utilized. Then, let the system know the name of the session cookie that EBS writes.
To configure authenticationschema.xml it is necessary to know the Oracle EBS instance cookie name.
Follow the steps to get the cookie name.

1. Login to Oracle EBS

2. If you are using Google Chrome or Firefox then open Dev Tools ➜ Web Console and write the following command:




ALERT command will pop up a dialog box as follows, while DOCUMENT.WRITE will display the cookie information in browser window.

Figure 2: Alert command pop up

Figure 2: Alert command pop up

Notice that key value pair of the cookie, ERPU1 is the cookie name of Oracle EBS Test instance and the value dynamically generated for each user after each login. We only required the key from it which is “ERPU1”. Now we will use this value in authenticationschema.xml file.

Path to authenticationschema.xml file:


Note: Take a backup of the file before editing.

Edit the following tags in the file:

<SchemaKeyVariable source="cookie" forceValue="EBS-ICX" nameInSource="ERPU1" />


<RequestVariable source="cookie" type="auth" nameInSource="ERPU1" 

As per Oracle Doc ID 2141505.1
"Access Prohibited" When Logging In To Analytics In EBS Integrated Environment.
Following tag need to be added in the file under “AuthenticationSchema”.

<RequestVariable source="constant" type="auth" nameInSource="ssi" 

That’s it for OBIEE configuration!


2. RPD Changes using Administration Tool

Here comes the part that is familiar to every OBIEE administrator, the RPD modifications. If you are following the document, the sample EBS connection pool can be used or create a new one just for the initialization process and retrieving security.
Create database objects and connection pools for Oracle EBS database.

Figure 3: EBS Connection Pool

Figure 3: EBS Connection Pool

Note: APPS user should have all READ and EXECUTE permissions to run PL/SQL queries. If not, grant the privileges to the APPS user.

Now, create an init block which will use this connection pool to retrieve the EBS context and set that into OBIEE session variables. The init block will use the just defined connection pool and will send a data source query to the EBS database:


Figure 4: EBS Security Context

Figure 4: EBS Security Context

Referring to another Oracle document, 1539742.1, create these static session variables to hold the context:


Note: You have to create another init block named “EBS Security Context – ROLES – Row wise” only for ROLES as a user will have more than one Role in Oracle EBS and the init block will be set for row-wise initialization.

Figure 5: EBS Security Context - Roles

Figure 5: EBS Security Context - Roles

Figure 6: EBS Security Context - Roles - Row Wise

Figure 6: EBS Security Context - Roles - Row Wise

The following query will be used to fetch all the Responsibilities of log-in user and assign it to the variable ROLES.



3. Oracle EBS Configuration

Now we need to introduce a responsibility for OBIEE through which a user can navigate to OBIEE from Oracle EBS.

1. Create a Function, using Application in EBS:

Figure 7: Oracle EBS Configuration - Form Functions - Description

Figure 7: Oracle EBS Configuration - Form Functions - Description

2. In the Properties Tab, add as follows:

Function: OBIEE
Type: SSWA jsp function
Maintenance Mode Support: None
Context Dependence: Responsibility

Figure 8: Oracle EBS Configuration - Form Functions - Properties

Figure 8: Oracle EBS Configuration - Form Functions - Properties

3. In the Web HTML tab, add the following link:

Function: OBIEE
HTML Call: OracleOasis.jsp?mode=OBIEE&function=Dashboard

Figure 9: Oracle EBS Configuration - Form Functions - Web HTML

Figure 9: Oracle EBS Configuration - Form Functions - Web HTML

4. Create a Menu in Oracle EBS named “OBIEE Dashboard” and add the Function created in step 1:

Figure 10: Oracle EBS Configuration - Menus

Figure 10: Oracle EBS Configuration - Menus

Note: Only create Menu for OBIEE Dashboard

5. Assign Menu to the relevant responsibility:

Figure 11: Oracle EBS Configuration - Users

Figure 11: Oracle EBS Configuration - Users

6. Set Profile

You need to enter the URL of the Oracle BI Server as part of a profile. You can set up a profile for a responsibility, a user, or a site. The following procedure shows how to set profile options for a responsibility:

Figure 12: Oracle EBS Configuration - Find System Profile Values

Figure 12: Oracle EBS Configuration - Find System Profile Values

You should use a fully-qualified host server.domain name rather than an IP address or just a host name. The OBIEE domain must be the same as the Oracle EBS domain, so that the EBS-ICX cookie is visible to OBIEE from the user's browser.

OBIEE 12c: Integrating OBIEE 12c with Oracle E-Business Suite (EBS) Security (Doc ID 2174747.1)
Chapter 9: Oracle® Fusion Middleware Integrator's Guide for Oracle Business Intelligence Enterprise Edition.

Click here if you would like to receive more information about the topic or if you need help with your EBS-OBIEE configuration!

Unleash the power of R to forecast data from your OLAP Cubes with Tableau


Unleash the power of R to forecast data from your OLAP Cubes with Tableau

R is a programming language and software environment for statistical computing, the leading tool amongst statisticians, data miners and data scientists for performing sophisticated statistical analysis, data discovery and predictive analytics on any set of data, such as linear/non-linear modelling, classification, clustering, time-series analysis and so on.

R offers a wealth of functions and, as an open source project, its capabilities are constantly enhanced and extended through user-created packages; this is why it is so popular and just keeps growing.

In this article, we will see how R can be connected to Tableau to perform highly-customizable forecasting of your data from OLAP Cubes. Your users will benefit without the need for an advanced coding background, and visualize the results in the Tableau charts that they like the most, with the help of just a few neat tricks.


1. Forecasting with Tableau

Tableau already comes with a forecasting feature; it uses a technique known as exponential smoothing which tries to find a regular pattern in the measures that can be continued into the future.

To access this feature, simply drag a date and a measure to your sheet and right-click on the canvas. In this example, we are using Tableau’s Superstore sample dataset.

Figure 1: Tableau’s Superstore sample dataset.

Figure 1: Tableau’s Superstore sample dataset.

Select Forecast ➜ Show Forecast, and the estimated line will appear. As you can see in the picture below, it’s very simple. What Tableau did was basically a linear regression to a straight line; it also shows the confidence interval. Right-click on the canvas again to access the Forecast Options, and an options pane will pop up.

Figure 2: Forecast Options

Figure 2: Forecast Options

In the options pane (below, 1), you can set a few parameters, such as forecast length, aggregation type, and you can change or remove the confidence intervals (called prediction intervals) and adjust the forecast model: it can be automatic, automatic without seasonality, or custom. If you choose custom, you will be able to select the type of Trend and Season: None, Additive or Multiplicative.

Figure 3: Forecast Options, Summary and Models

Figure 3: Forecast Options, Summary and Models

By right-clicking again and selecting Describe Forecast, you will be shown a pane that gives you additional forecast information. You can see this in 2 and 3.


2. Why use R instead?

What we have described so far is basically everything that Tableau can offer when it comes to forecasting. It works pretty well, but it’s not really customizable and, above all, it doesn’t work with OLAP Cubes!

OLAP Cubes have many advantages, but unfortunately, Tableau doesn’t allow us to do all the things we can do with a “traditional” data source, and forecasting is one of these (you can find a list of other OLAP-related limits here.

Luckily for us, Tableau is such a great tool that, despite not being able to perform forecasting on cubes, it offers us an even more powerful alternative: a smooth integration with R.


3. Integrating R in Tableau

There are 4 functions that allow Tableau to communicate with R: SCRIPT_REAL, SCRIPT_STR, SCRIPT_INT, SCRIPT_BOOL. They only differ in the return type, as suggested by their names.

They work very simply, and take at least 2 parameters. The first one is an actual R script; it is pure R code that is passed on to the R engine to be executed, and that will return the last element that is called. From the second argument, the values that are passed on are specified. In the script, you will refer to them as .arg1, .arg2, and so on, depending on the order in which they are declared.

R will take these values, interpret them as vectors (R basically only works with vectors and their derivatives), and return a vector of data that Tableau can use just like any other calculated field.


4. Connecting R to Tableau

In order to communicate, R and Tableau must be connected, and this is really easy to do:

1. Download R from here or R Studio from here, and install it

2. From the R console, download the Rserve package and run it:


3. In Tableau, select Help ➜ Setting and Performance ➜ Manage External Service Connections

4. Enter a server name or an IP (in the same machine, use localhost or and select port 6311

5. Test the connection by clicking on the “Test Connection” button, and press OK if you get a positive message

You can also automatize the start of Rserve by creating an Rscript with the code mentioned in point 2 (without repeating the installation), save it as a .R file, and call it from a .bat file with “start Rscript myRserveScript.R”. You’ll find it pretty useful, especially in a server environment, because you can schedule its execution when the machine starts via the Task Scheduler, or by copying a shortcut to %AppData%\Microsoft\Windows\Start Menu\Programs\Startup (in Windows).

After Rserve has started and Tableau is successfully connected, you can start using the 4 Script_* functions to exploit R’s power.


5. Forecasting in R

To perform forecasting in R, you have to download a user-made package called “forecast”.

To do so, simply write install.packages(“forecast”) in your R console, and call it by writing library(forecast) any time you need it.

It offers various types of fully-customizable forecast methods, like Arima, naïve, drift, Exponential Smoothing (ETS), Seasonal-Trend decomposition by Loess (STL) and so on, seasonable or not, with a number of different parameters.

For more details about the package, check the author’s slides, the help package pdf , or each function’s help page by writing ?func_name in the console (ex: ?stlf).


6. Use case

In our use case, we will take a series of monthly measures from the last 3 years and forecast them with different frequency/period combinations. We will use the function called stlf, which, without getting bogged down in details, gives us more freedom when playing with periods and frequencies whilst still providing a solid result.

In the picture below, you can see our data, from January 2014 to October 2016.

Figure 4: Data, from January 2014 to October 2016.

Figure 4: Data, from January 2014 to October 2016.

We dragged our date dimension (Year Number Month Number) to the column field, and put our desired measure in the row field. The trend is not regular, but we can see how the line has a few peaks and generally tends to grow.


7. Writing the R scripts

Let’s see how to write the R scripts that perform the forecast.

Firstly, we need to know what values we will be passing: we have our measure (for R to accept it, it must be aggregated, but coming from the Cube it already is, and if not, the ATTR function can be called on to help) and the two integer parameters (create them in Tableau) for the number of periods to be forecasted and for the frequency of the season.

Let’s use them in this order, calling them ForecastMeasure, ForecastPeriods and Freq. The calculated field should initially look something like this:

data <- .arg1;
nulls <- length(data[]);
data <- data[!];
freq <- .arg3[1];
periods <- .arg2[1];
[Forecast Measure],[ForecastPeriods],[Freq]

In the first argument, we have already set up the R variables that collect the values to pass to the script (this is not mandatory, but just to keep the code cleaner). Remember that R works with vectors: notice how the first one (data) takes the entire .arg1, because we actually want the whole array of measures, while the parameters (simple integers) need to taken from the first value of a 1-sized array. We then filter this vector to remove eventual null values, and we save the number of those in a variable called nulls (we will need it later).

We now perform the forecasting. Add the following lines to the script:

time <- ts(data,frequency=freq);
fcast <- stlf(time, h=periods);

The first line simply calls the forecast library; we then create a time series object, by passing the measure and the frequency parameter (for more customizable details, check the resources already mentioned). Finally, we produce our forecasting result, by feeding the stlf function with the time series object and the periods parameter.

Out forecast is ready. If we use the same data and we execute the script in R, the console will look like this:

The interesting columns are Forecast, Lo80, Hi80, Lo95, Hi95. The first one contains the actual forecasted value. There are 12 because we used parameter = 12 (freq = 12, too). The others are the confidence interval (the limits can be changed when calling the function).

Now let’s complete our script. There is one thing we need to remember: Tableau expects to receive the same number of measures that it sent, so we append our forecasted array to the original one, properly shifted, along with rep(NaN, nulls), an array of nulls (we know how many there are because we saved them). Add these lines to the script:

n <- length(data);
result <- append(data[(periods+1):n],fcast$mean);
result <- append(result,rep(NaN,nulls));

The result that we want is fcast$mean, which corresponds to the column called “Forecast” in the R matrix that we saw. We append it along with eventual null values to the segment of original data that starts from the position number period+1, in order to get an array of the original size (note: by doing this, we won’t visualize the first period elements).

That’s it! Your new calculated field should look like the one below. Drag it to the row field, and you get your forecast.

data <- .arg1;
nulls <- length(data[]);
data <- data[!];
freq <- .arg3[1];
periods <- .arg2[1];

time <- ts(data,frequency=freq);
fcast <- stlf(time, h=periods);
n <- length(data);
result <- append(data[(periods+1):n],fcast$mean);
result <- append(result,rep(NaN,nulls));
[Forecast Measure],[ForecastPeriods],[Freq]

Figure 5: Complete R script and result

Figure 5: Complete R script and result


8. Shifting time measures

Now, we start getting the first issues. Tableau received a vector from R the same size as the one that we passed it, so each date-measure value is substitued for the new one, coming from the calculated field. The problem is that the date hasn’t changed!

Normally, this could be fixed by creating a new calculated date with the help of the DATEADD function, but as we can’t add a dimension in cubes, we have to overcome this issue by getting dimension value as a measure with an MDX formula, and then applying DATEADD to it.

Create a new calculated member, and set it as:


Then create a calculate field like this:


Where [YearNumberMonthNumber] is (for example) the name of your newly calculated member. This will get you a measure with the date shifted accordingly to the forecasted period that you selected.

Figure 6: Tableau's Error

Figure 6: Tableau's Error

Drag this new field on top of the original date column and… your chart will crash!

Why? You’ve just created a new shifted date for every original date value, and basically your data is now “split” into “columns” of 1 value, 1 for each shifted month, and R will receive them 1 by 1 instead of getting an entire vector to process. This explains the error that says “series is not periodic or has less than two periods”, because what R is actually seeing is a series of indipendent 1-sized vectors.

To fix this, there’s a simple but neat trick: add your original data field to “Detail”, so that it is available in the sheet, and edit your calculated forecast field like this.

Figure 7: Table Calculation - Forecast (Months)Figure 7: Table Calculation - Forecast (Months)

This will make Tableau process the calculated field at the correct level, and help R see the received data in the correct way, thus producing the desired result.

That’s it! You’ve just forecasted your OLAP cube!

Figure 8: OLAP cube forecasted

Figure 8: OLAP cube forecasted


9. Adding confidence intervals

You can also add confidence intervals to improve your visualization: just create two more metrics, copying the one you created for the forecast, and change fcast$mean to fcast$lower[,1], fcast$lower[,2], fcast$upper[,1], or fcast$upper[,2], depending on the limit that you want to show (remember that the defaults are 80 and 95, but these are editable in the R function).

In the picture below you can see the forecast with the 95% bounds (fcast$upper[,2] and fcast$lower[,2]) , and the trend line (calculated by Tableau).

Figure 9: Forecast final visualization

Figure 9: Forecast final visualization


10. Conclusions

In this guide we have shown how R can help you build great forecast charts for your OLAP Cube data, but this isn’t the only benefit: we got a much higher degree of customization, and overcame the limitations of Tableau when it comes to defining parameters, methods or entire algorithms.

Once Tableau can connect to R, its potential is virtually unlimited, and it can enhance your charts in an infinity of ways: from any source (not necessarily OLAP Cubes), or with any goal, just send the data to R and let it do the job, whatever it is! Forecasting, clustering, classification, or anything else with an R package written for it – you could even write one yourself! Then visualize the result in Tableau, and adjust the look and feel of your dashboard with the powerful features that you love.

Click here if you would like to receive more information about the topic!

privacy policy - Copyright © 2000-2010 ClearPeaks