Data Mining & Business Intelligence

.

Data mining & Business Intelligence

 

The term data mining refers to one of the processes involved in the task of extracting knowledge from a database, also known as KDD (Knowledge Discovery in Data bases). However, by extension, it is referred to as the KDD global process because of its commercial appeal. Understanding data mining as a KDD sub-process, we could define the term as the process of extracting underlying knowledge from a large volume of data.

It is a recent development directly linked to the scientific fields of mathematics (mainly statistics), computer science and artificial intelligence. It can be supported by different Business Intelligence systems, from which we can obtain several advantages!

In this article we´re highlighting how a Business Intelligence system is a great starting point for the data mining process and how data mining can be used for process optimization.

The topics covered are:

  1. Definition
  2. Objectives and challenges
  3. KDD Process Phases
  4. Applications:

➀ Data mining & Business Intelligence
➁ Extracting knowledge from unstructured data
➂ Optimization of processes

Click this link to read the full article on data mining: Data Mining & Business Intelligence

ClearPeaks featured in Pipeline Magazine!

.

ClearPeaks featured in Pipeline Magazine
We are excited to be featured in the Suppliers Focus section of this month´s issue of Pipeline Magazine!

Pipeline Magazine delivers the latest Oil, Gas and Energy news from the Middle East and across the globe,  and is the world's leading specialist magazine for OSS, BSS, ICT, and ICE technology news and information.

We have been active in the Middle East for eight years now,  with offices in Dubai and Abu Dhabi, allowing us to provide our award-winning services to meet growing demand in the region, where the shifting economic landscape makes BI is an increasingly valuable asset. We are proud to say that our operations there are constantly expanding, and this year alone we will be delivering several major projects, with more “in the pipeline” for the coming years.

Read the full article here:
Pipeline Magazine June 2016 - Suppliers Focus!

 

Customizing ODI Knowledge Modules

.

Lately we have seen a growing interest in ODI (Oracle Data Integrator). We have been working with this tool here in ClearPeaks for a while now, on several projects for different verticals, but it is also true that with the appearance of the newer versions of Oracle Business Intelligence Applications (OBI Apps) 11.1.1.X, it’s a pretty hot topic, and that’s why we’ve decided it was time to publish a quick “tips and tricks” guide for ODI, hoping to save our customers a lot of effort, time, and money.

The first thing to bear in mind when you start working with ODI is that you really must follow best practices in order to take full advantage of two of the most powerful weapons in the ODI arsenal: the declarative design and the knowledge modules. For those of you that don’t have too much experience with ODI, let me run through these two outstanding features.

The declarative design basically allows you to create an abstraction layer between the developer and the actual code that is going to be performed by the DBMS. This means that in ODI you define “what you want to get”, and the way to do so is automatically implemented by the Knowledge Module.

However, you might ask yourself “Is this possible? Can we really rely on the default ODI KMs?” Well, the answer is very simple: for standard needs, where performance is not a problem, yes! But in most of our BI projects, remember that we have had to tune the KMs to adapt them to our customers’ needs and to obtain the maximum benefit from the tool.

But don’t think that this undermines what is a fantastic feature. ODI comes with a great set of KMs that give you the perfect starting point to create your own customized KM. And moreover, all the developers don’t need to go into the details of the KM implementation; in a typical ODI project, the architect will be responsible for setting up the ODI environment and will provide the whole team with the appropriate KMs that will satisfy the particular project needs.

So in principle, the developers don’t need to know all the implementation details (it is up to each user/developer to go beyond and analyze the code ODI is generating, if required). This abstraction significantly speeds up the process of developing an ETL, since once the logic is established, there is no need to redo it over and over again.

A typical example to consider is the logic necessary to load a Slowly Changing Dimension (SCD Type II). With other tools, each developer would need to fully understand the logic of the SCDs and the way the integration process is performed, since it has to be replicated for each table to be loaded.

With the ODI declarative design, you just establish what you want, and the KM will take care of the logic. So you simply need to indicate:

➜ I want to treat “Table X” as a Slowly Changing Dimension (we will mark it like this).

Oracle Data Integrator

 

➜ I want to use “Column X” and “Column Y” to store the starting and ending date of the row, respectively.

Oracle Data Integrator

 

➜ I want to use “Column Z” as the current row flag.

Oracle Data Integrator

 

➜ I want “Column A” to be the Primary Key (Surrogate Key) of “Table X”.

Oracle Data Integrator

 

➜ And I want this column / set of columns (e.g. B and C) to be the Integration Key (Natural Key) of Table X (the column, or columns, that will be used to determine if the row is a new row, or if the row previously existed and has to be updated).

Oracle Data Integrator

 

➜ Finally, we can indicate for each column if we want the ETL to add a new row when the value changes in the source system, or if we prefer to update the whole table.

Oracle Data Integrator

 

✓ And that’s it! By then selecting the appropriate KM, tuned with the necessary logic by our architect, we can develop as many mappings as we want for Slowly Changing Dimensions. Just indicate the source for each column and run it. Quick and easy!

We have also mentioned the concept of Knowledge Modules. So, some of you may wonder, what is a “Knowledge Module”? This is simply a generic set of steps that will perform the needed logic for your ETL process. Each step can be written in different languages (SQL, Oracle-SQL, Jython, and many more) depending on the technology underneath, with placeholders for each column, table, and in general, “entity” that will take part in our ETL. At execution time, those placeholders are filled with the details of the mappings that have been developed, and this gives the project team the flexibility to reuse logic and speed up the delivery of the ETL process.

Well, that’s enough of an introduction to these two handy features in ODI. Now let’s see some usage examples of the things that can be done, and which can help our customers to reduce implementation time.


1. Automatizing Loads

▼ Automatizing the Incremental / Full Load Strategy
There are several ways to automatize an Incremental / Full Load Strategy using KMs. If we think back to the previous version of OBI Apps, in Informatica we had to have two separate mappings for the Incremental and the Full version of the mapping. With ODI this can be automatized in several ways:

ⓐ Manually: Adding an option to the mapping
The simplest way is to add an option to the mapping so we manually specify if we want to execute a full load or not. This option will drive the execution of a new step in our KM. For example, this step may consist in truncating the table, and if we are populating a dimension, resetting the associated sequence.Here we see the importance of following best practices and naming conventions. If we follow a good methodology when naming our DWH objects, we can distinguish the table type by the name, and also the sequences can be easily related to the corresponding dimension.

Oracle Data Integrator
Oracle Data Integrator

 

ⓑ Automatically: Adding an option to the mapping and using a control table
The second option is an extension of the previous one. By using a control table containing the table name and one column containing the “FULL_LOAD_FLAG”, we can invoke this process only if the DWH administrator has set the property in the table (FULL_LOAD_FLAG = 1). This way, there is no need to modify any ODI object to change the behavior of the interface.

▼ Automatizing the population of the unspecified rows for dimensions
A typical need in any data warehousing project is to populate an “unspecified” row for each dimension. This will give us the capability of always making inner joins between the facts and dimensions, and thus improve performance in all our OLAP queries.

Once again, we could generate a separate interface for this, for each dimension, but will it be efficient? ODI gives us the possibility of defining a generic step in our KM to be executed for any dimension table to be loaded.

A typical need in any data warehousing project is to populate an “unspecified” row for each dimension. This will give us the capability of always making inner joins between the facts and dimensions, and thus improve performance in all our OLAP queries.

Once again, we could generate a separate interface for this, for each dimension, but will it be efficient? ODI gives us the possibility of defining a generic step in our KM to be executed for any dimension table to be loaded.

At execution time, ODI is aware of the data type of each column to be populated (remember that every object is stored in the model, and so we have an internal representation stored in the ODI repository). We can prepare a simple process that will basically assign a default value for each data type. For example:

Oracle Data Integrator

Oracle Data Integrator

NOTE: See the ODI Substitution API Reference to learn more about the functions you can use to retrieve useful information from the ODI model, or contact us for further details!

▼ Automatizing the population of Fact Modules by using a control table
Another interesting functionality we usually include in our projects is control of the periods to be loaded by a control table. As we mentioned before, with this table we can change the behaviour of our ETL process just by changing one row in one table. This speeds the process of programming and administering the loads up a lot.

By using this table, the ETL administrator is capable of specifying the starting date that we need to populate, and the number of periods to be loaded. More options can be useful in this table, like a flag (PREV_EXEC_CORRECT_FLG) indicating if the automated previous execution of the ETL was correct (if not the ETL should not continue in order to avoid inconsistencies, for example, in Slowly Changing Dimensions). Another flag might be used to indicate if the population of this table should be included in the automated process (IS_PART_OF_DAILY_LOAD), or the FULL_LOAD_FLAG that we already mentioned.

In the example below, the source system needed to be queried daily due to performance restrictions; by specifying a starting date and the number of periods to be loaded, the ETL automatically ran a loop for the number of days to be loaded by leveraging the ODI variables.

Oracle Data Integrator
Oracle Data Integrator

In this case, it is also important to highlight that the interfaces have to be filtered by a variable containing the value of the day to be populated:

Oracle Data Integrator

Another interesting modification done to the KMs involved in this case study is that of the corresponding CKM to add an option that will avoid deleting the previous errors, since the same session is going to be executed several times in a loop.

Oracle Data Integrator

In some cases, it might even be interesting to remove the period from the target table before inserting, if we are performing a full period insertion (we don’t want to merge the data with the previously inserted data, but to directly reload a complete period).

Oracle Data Integrator

 

As you can see above, by using the ODI functions we can generate generic steps that will work for any object in our ODI model. For example, <%=odiRef.getTable(“L”,”TARG_NAME”,”A”)%> will retrieve the target table name.

See: http://docs.oracle.com/cd/E21764_01/integrate.1111/e12645/odiref_reference.htm#ODIKD1295


2. Boosting performance

▼ Boosting performance for populating staging tables
Since ODI 11g does not have a multi-target capability, we sometimes need to reuse the same staging table to populate several final targets; we have to load these staging tables as fast as possible to speed up the performance of the whole ETL process. We can use the default KMs to load these staging tables, but for one staging table you may not need to apply integration processes, validations checks and other time-consuming logic; to get round this, a good practice is to generate your own staging KM, including only the necessary steps:
Oracle Data Integrator

Then, in the “Insert” step, you can populate the target table directly, instead of the integration table:

<%=ODIREF.GETTABLE("L","TARG_NAME","A")%>
VS
<%=ODIREF.GETTABLE("L","INT_NAME","W")%>

Oracle Data Integrator

 

▼ Boosting performance by using partition exchange
In order to boost performance of load processes and reporting as well, it is sometimes useful to apply partitioning to your DWH tables. This will give you two main advantages:

➀ At ETL time, the population of a big table (GBs of data) is much faster if the table is partitioned, since you can populate a temporary empty table, and then execute a partition exchange to swap the partitions between both tables. This way, all the population in your temporary table, the consistency checks, constraints and all the processes you may need to do can be done much faster.
It is important to emphasize that this is useful for snapshot tables, and in general for tables that do not need to cross data between partitions.

➁ At reporting time, if you just need to analyze one partition (one period) at a time, or maybe two to compare them, the response time in a partitioned table will be faster.

To accomplish the partition exchange in ODI, you can also use a knowledge module that can encapsulate the logic needed. The command below will execute the partition exchange between the temporary table and the target table, leveraging ODI functions.


ALTER TABLE

<%=ODIREF.GETTARGETTABLE( "SCHEMA" )%>.<%=ODIREF.GETTARGETTABLE("RES_NAME")%>
EXCHANGE PARTITION ' || V_PARTITION_NAME || ' WITH TABLE
<%=ODIREF.GETTABLE("L", "INT_NAME","W")%>
<$="<%=ODIREF.GETUSEREXIT("PARTITION EXCHANGE OPTIONS")%>"$>

 

Note that the variable V_PARTITION_NAME will need to contain the name of the partition to be exchanged. This can be retrieved by a simple query on the temporary table that your mapping has populated.

➀ Loading historical information into a Slowly Changing Dimension
The last interesting modification that we are going to show you is to the Oracle SCD KM. Imagine that you launch your system with a starting date (e.g. 30 Sept., 2014) and after some months running your ETL system in production, the source system is updated with historical information so that the business can analyze trends and customer behavior over several years. Obviously, we can’t load the latest information in the source system to our DWH because it will override the current information with this stale but still relevant information. We need to look for a way to include this information in our system, so if a user queries information about old periods, they can analyze and explore the attributes the customer had at that point in time.

Well, we can do the following:

➀ Duplicate the mapping you are using to load your SCD.

➁ Duplicate the Slowly Changing Dimension IKM you are using to populate your SCDs.

➂ Modify the copy of your IKM to set the "ACTIVE_TO_DATE" to the date when you launched your system. This way you can run a separate load for older periods without impacting the daily load you are running on a daily basis with current data. Any row loaded by this second thread (older periods), will always finish by the date when you launched your system, so there won’t be duplicates in your SCD and you can explore both current and old information about the entities in your dimension (e.g. your customer profile, segment, location, etc.).

Oracle Data Integrator

 

Just by modifying small pieces of the given KM code, you can achieve your particular needs. Just be careful that you modify the appropriate properties (in this case, SCD_END and also the SCD_FLAG have to have different values from those in your daily thread).

We hope that you’ve found these techniques that can be applied to ODI KMs to speed up the process of generating and manipulating your ETL interesting, and if you have further questions, do not hesitate to contact us.
Stay tuned for more interesting features to apply to your ETL processes with #ODI!!

Get in touch with us and see what we can do for you!  ✉

Blog Article: Emilio Soriano

Oracle Data Visualization Desktop – April 2016 Release

.

1. Introduction - Oracle Data Visualization Desktop

With the release of Oracle Business Intelligence Enterprise Edition 12c (OBIEE 12c), Oracle announced a new Data Visualization tool, aimed at ad hoc personal data discovery tasks. Oracle is putting a great deal of effort into developing this new tool, that is available as:

* component of Oracle BI Cloud Service (BICS)
* standalone cloud service named Data Visualization Cloud Services (DVCS)
* component of OBIEE 12c on premise
* a standalone desktop tool named Data Visualization Desktop (DVD)

Oracle Data Visualization Desktop

At the end of April 2016 Oracle released the first publicly available version of Oracle Data Visualization Desktop (DVD), under version 12.2.1.0.0 (and timestamp 20160422085526). In this blog post we will present the main characteristics of this tool (DVD); but most aspects are common to the other three above modalities.


2. Data Sources and Data Management

There are 3 types of data sources that can be used in DVD:

  1. Excel sheets, which allow complete offline analysis
  2. Databases like Oracle, SQL Server, MySQL, Teradata, Redshift, MongoDB, Spark, etc.
  3. OBIEE, where the user can connect to an existing Analysis or use a Logical QueryOracle Data Visualization Desktop

Once a source has been added to a project, DVD offers different options to manage the data:

* Modify data types (string, number, date, etc.)
* Alternate between Attributes and Measures
* Select the Aggregation rule of Measures
* Create Calculations using a wide variety of functions

Oracle Data Visualization Desktop

Multiple sources of different types can be added to a DVD project, joined automatically using fields with matching names; the joins can always be edited using the Source Diagram.

Oracle Data Visualization Desktop


3. Visualizations

One of the most important characteristics of DVD is the high number of visualizations available out-of-the-box. There are 22 data visualizations in total, plus the possibility of including Text Boxes and Images. All the available visualizations are shown in the image below:

Oracle Data Visualization Desktop

The visualizations are very easily created by dragging and dropping Data Elements (data columns in DVD) to the different Drop Targets (that is, the corresponding visual dimensions of the visualization).

Oracle Data Visualization Desktop

The visualizations can be highly customized in DVD. The user can edit the titles and axis, modify the colour schemes, sort the data, change the data format, etc.

In addition, the Map visualization allows you to create custom maps using the GeoJSON format. The underlying map engine is able to render the new maps and join them to the corresponding data element.

Oracle Data Visualization Desktop

Multiple visualizations can be combined in the Canvas, thus allowing the creation of complete dashboards to analyse the data. In addition, through the Data Brushing feature, the data selected in any visualization is highlighted in the others.

Oracle Data Visualization Desktop


4. Data Discovery and Advanced Analytics

As a Data Discovery tool, DVD includes multiple features to facilitate the data analysis process. One simple tool used for data discovery is the filters: the user can decide to filter Attributes based on values or Measures based on ranges.

Oracle Data Visualization Desktop

Together with the filters, Reference Lines and Trend Lines are available in DVD straight out-of-the-box. As well as these features, more Advanced Analytics tools are available in combination with R. For this reason, DVD includes an Oracle R Distribution (version 3.1.1) installer executable after the installation of DVD. When R and the required libraries are installed, we will be able to use Clustering, Outlier Detection and Forecasting, as well as custom R scripts.

In the example below we use Clusters to identify how the number of apartments by neighbourhood affects the price. In addition, we have a Reference Line to analyse the average apartment price for different room types. Finally, using Trend Lines, we can see that the relationship between minimum number of nights and price has been increasing over the last few years.

Oracle Data Visualization Desktop

Thanks to the data discovery and advanced analytics capabilities of DVD, we can easily identify hidden information and patterns in our data. In order to keep track of the data discovery flow, we can use the Story Navigator, which allows different insights to be saved. These insights are very useful when we want to share a project, letting other users understand our findings fast.

Oracle Data Visualization Desktop


5. Managing Projects

It is very easy to share Projects in DVD. The first thing to do is to save them locally; the different Projects are shown in the Home page. From the Home page we can select the option to export the Project, which will create a DVA (.dva) file. It is possible to store the source data in this file and protect it with a password.

Oracle Data Visualization Desktop

At the other end, we can similarly use the Import option to add the Project to our main folder in the Home page.


6. Oracle Data Visualization Training

We provide a wide range of training services designed to propel course delegates to a project-ready state, ensuring they have both the necessary BI theory as well as the hands-on practical skills needed to engage in a BI project with confidence.
Here in ClearPeaks we are experts on Oracle Data Visualization and we can offer you this expertise in our specialized training sessions.

Get in touch with us and see what we can do for you!

Blog Article Author: Iñigo Hernáez

OBIEE Data Lineage Solution

.

Tracking OBIEE reporting data

BI implementations may use hundreds of data warehouse tables and thousands of reports, and one of the most common requirements in a BI project is the ability to track back the data used in reports. The possibility to quickly identify the set of columns and tables used in a given report or dashboard, or to check which reports may be affected by a change of physical column in the data warehouse, is also crucial for development. This calls for a data lineage solution which is accessible to users and developers,  and allows checking such data on an ad hoc basis.

Various vendors offer data lineage solutions, but these can be expensive and vendor-specific. With our simple solution, we combine Catalog Manager and Administration Tool export sources to create an easily accessible solution for tracking data lineage in OBIEE.

By implementing the OBIEE data lineage solution, we can check the following:

  1. Which physical tables and columns are used in a given report or dashboard
  2. Which reports use given physical columns or tables; this is especially important when modifying an existing table, as any change in the table’s structure must take existing reports into consideration
  3. Which are the most commonly used columns in reports in a given subject area; identifying the most commonly used columns in a report can hint at creating indexes to improve the overall performance of the OBIEE implementation even further.

Solution

The ClearPeaks OBIEE Data Lineage Solution gathers all the required data lineage information in one place and uses it as a source for OBIEE reports within Data Lineage Subject Area. Two sources are combined to achieve this:

  1. List of OBIEE reports and presentation columns used

Catalog Manager provides an option to export the list of reports and columns used. The export can be done either with the Catalog Manager UI tool or through the command line utility. We use the latter option, as it allows automation of the whole process later.

  1. Mappings of presentation columns back to the physical columns

Such mappings can be obtained manually by creating Repository Documentation from Administration Tool’s Repository Documentation utility. The output will be a repository documentation in the form of comma separated values file (.csv). This will contain column mapping from presentation through logical to physical layers, including column formula expressions. Another way to obtain column mappings is by extracting the OBIEE repository in .xml file format through the command line utility. Our solution uses the second option, as the repository file will be used in our automated script.

OBIEE Data Lineage Solution

Once we have obtained both files, we need to populate the data lineage tables.

The data can be transformed and manually inserted into the tables, but in our solution we use a script (which can run on the OBIEE server) that parses the data and inserts it into the tables.

Data Lineage Subject Area

Once we have populated the data lineage tables and their data model has been mapped in Administration Tool, we can create and run reports in OBIEE using Data Lineage Subject Area and filter the results according to our requirements.

OBIEE Data Lineage Solution

Let us look at a few of the use cases for the Data Lineage Subject Area:

Use case 1. Which data warehouse tables and columns are used in a given report?

We would like to know which data warehouse tables and columns are used in a particular report or dashboard. We can create a report with a list of the columns used by a given OBIEE report and their data lineage:

OBIEE Data Lineage Solution


Use case 2.
Which reports use given physical tables?

We want to know how many and which reports or dashboards are using given physical tables or particular columns; this could be very useful when assessing the potential impact of column formula or table structure changes on reporting. Using Data Lineage Subject Area we can fetch up the list of OBIEE reports used by a given physical table:

OBIEE Data Lineage Solution


Use case 3.
Which reports use given subject areas?

We need to know which reports and dashboards are accessing data from given subject areas. This may be particularly useful when revising users’ access permissions.

OBIEE Data Lineage Solution


Future Improvements

OBIEE Data Lineage Subject Area can also serve as a backbone for further developments, providing complex and complete solutions for managing existing OBIEE project implementations. Here are some examples and potential benefits of merging additional information into the solution:

Usage Tracking data – allows analysis of the physical tables used in the most accessed reports, the tables and columns in reports not used by anyone, removing non-used reports or underlying tables.

Data warehouse database metadata – such as table size, indexes on columns. This allows for performance analysis of the most heavily used tables and columns by report usage.

ETL Data Lineage – additional layer of data lineage tracking – allowing the tracking of data back to the very source – can be achieved by adding ETL transformation data obtained from ETL systems. For example, it is possible to track all the ETL transformations on a given presentation column down to the source system.

Catalog metadata – it is possible to extract additional information regarding catalog objects such as user permissions, report owners, last modified date etc., to further enhance the solution usability.

Adding all the above components creates a strong core for the OBIEE management dashboard, allowing continuous improvement through:

* tracking data from the report to the source database column
* constant analysis of the most used tables in order to improve performance
* checking which users have permissions for given OBIEE reports and dashboards

All of the above is accessible from the OBIEE front-end, providing a convenient and quick way to facilitate many daily business-as-usual tasks in BI deployments.

Conclusion

The ClearPeaks OBIEE Data Lineage Solution can be easily deployed in any project using Oracle Business Intelligence Enterprise Edition. The solution can be run from the command line tools, which makes it possible to create automated jobs to extract and update data on a regular basis.

If you think your OBIEE project could benefit from our data lineage solution, please contact us for more information via our web contact form or by leaving your comments below!

Blog Article Author: Rafal Ostaszewski

privacy policy - Copyright © 2000-2010 ClearPeaks

topnav