Informatica Performance – Optimization techniques

.

Informatica performance - Optimization techniques

 

Informatica provides the market´s leading data integration platform. ETL Mappings are designed for data loading into the data warehouse environment to achieve better reporting which in turn helps you to understand business trends better. The major problem faced by anyone working with Informatica ETL is to design a mapping(s) that doesn’t compromise its performance. Often we end up creating a mapping that achieves only the functionality but suffers in terms of performance.

The article explains the steps to identify the performance bottlenecks and the advanced optimization techniques that are available in Informatica to overcome them.

Following topics are discussed in detail:

* Optimizing sources, Targets and Buffer Blocks:
Performance bottlenecks can occur when the Integration Service reads from a source database or writes to a target database. Depending on the source, target data, you may need to increase or decrease the buffer block size. In this article we list the steps you have to take to ensure that the sources and the Targets are fully utilized to optimum level and how to calculate Buffer block size.

* Push down Optimization (PDO):
Push Down Optimization Option enables data transformation processing to be pushed down into any relational database to make the best use of database processing power.

* Session Partitioning & Dynamic Session Partitioning:
The Session Partitioning Option increases the performance of Power Center through parallel data processing. In dynamic partitioning, Integration Service determines the number of partitions to create at run time based on factors such as source database partitions or the number of nodes in a grid.

* Concurrent Workflow Execution:
A workflow configured for Concurrent Execution can run as multiple instances concurrently. In the article we illustrate this functionality by using a workflow that is configured with multiple instances to process data belonging to multiple sites simultaneously.

* Load Balancing:
Load balancing is a mechanism which distributes the workloads across the nodes in the gird. In the article we see how to assigning Service Levels to Workflows to achieve Load Balancing.

Click to read the full article: Informatica performance - Optimization techniques!

 

Business Intelligence in the Oil & Gas sector

.

Business Intelligence in Oil & Gas: Decoding the dataWe are excited to be featured once again in Pipeline Magazine with the article "Decoding your data" - about the Business Intelligence Platform we deployed at Abu Dhabi Gas Industries Ltd (GASCO)!

The digitisation and automation of assets offers producers in the global oil and gas sector the opportunity to dramatically increase their efficiency and productivity levels by collecting performance data from their assets. The amount of data being collected in this sector is staggering. Across the Middle East, tens of thousands of remote sensors are collecting millions of gigabytes of data from oil and gas operations.

However, collecting the data is only the first part of the challenge. If operators are to make the most of this wealth of digital information, they must find ways to quickly and efficiently analyse it. If this is done properly, companies should be able to:

* review their past performance
* refine their current production processes in near real time
* and even predict trends as they emerge

All of this should allow them to stay ahead of the profitability curve.

As many of you already know ClearPeaks has provided Business Intelligence (BI) solutions for operators in the oil and gas sector. We have delivered a ground breaking project with UAE gas processing giant, Abu Dhabi Gas Industries Ltd (GASCO). We began working with Gasco in 2010, when the company decided that it wanted to streamline its operations through better use of analytical data.

With the GASCO BI Platform, we have delivered arguably one of the most advanced Oil & Gas Analytics environment in the Gulf region. It extends the capabilities of Oracle BI ensuring optimal customer experience and user adoption, whilst leveraging the Oracle platform’s versatility and robustness. We are proud to announce:

* a user adoption of over 1000 active users across all departments and production sites
* from Finance, HR & Procurement to Gas Plant Production & Maintenance
* addressing users’ needs from C-level to site management and down to operational analysts
* consuming more than 300 corporate reports and over 15 corporate BI dashboards daily

The entire enterprise is now enabled on one integrated platform.

Read the full article here!

Pipeline Magazine delivers the latest Oil, Gas and Energy news from the Middle East and across the globe,  and is the world's leading specialist magazine for OSS, BSS, ICT, and ICE technology news and information.

If you would like to receive more information about the GASCO BI project and how this experience can benefit your organisation, please contact us to receive a copy of the Success Story e-book or to hear about it directly from our team!

Data Mining & Business Intelligence

.

Data mining & Business Intelligence

 

The term data mining refers to one of the processes involved in the task of extracting knowledge from a database, also known as KDD (Knowledge Discovery in Data bases). However, by extension, it is referred to as the KDD global process because of its commercial appeal. Understanding data mining as a KDD sub-process, we could define the term as the process of extracting underlying knowledge from a large volume of data.

It is a recent development directly linked to the scientific fields of mathematics (mainly statistics), computer science and artificial intelligence. It can be supported by different Business Intelligence systems, from which we can obtain several advantages!

In this article we´re highlighting how a Business Intelligence system is a great starting point for the data mining process and how data mining can be used for process optimization.

The topics covered are:

  1. Definition
  2. Objectives and challenges
  3. KDD Process Phases
  4. Applications:

➀ Data mining & Business Intelligence
➁ Extracting knowledge from unstructured data
➂ Optimization of processes

Click this link to read the full article on data mining: Data Mining & Business Intelligence

ClearPeaks featured in Pipeline Magazine!

.

ClearPeaks featured in Pipeline Magazine
We are excited to be featured in the Suppliers Focus section of this month´s issue of Pipeline Magazine!

Pipeline Magazine delivers the latest Oil, Gas and Energy news from the Middle East and across the globe,  and is the world's leading specialist magazine for OSS, BSS, ICT, and ICE technology news and information.

We have been active in the Middle East for eight years now,  with offices in Dubai and Abu Dhabi, allowing us to provide our award-winning services to meet growing demand in the region, where the shifting economic landscape makes BI is an increasingly valuable asset. We are proud to say that our operations there are constantly expanding, and this year alone we will be delivering several major projects, with more “in the pipeline” for the coming years.

Read the full article here: 
Pipeline Magazine June 2016 - Suppliers Focus!

 

Customizing ODI Knowledge Modules

.

Lately we have seen a growing interest in ODI (Oracle Data Integrator). We have been working with this tool here in ClearPeaks for a while now, on several projects for different verticals, but it is also true that with the appearance of the newer versions of Oracle Business Intelligence Applications (OBI Apps) 11.1.1.X, it’s a pretty hot topic, and that’s why we’ve decided it was time to publish a quick “tips and tricks” guide for ODI, hoping to save our customers a lot of effort, time, and money.

The first thing to bear in mind when you start working with ODI is that you really must follow best practices in order to take full advantage of two of the most powerful weapons in the ODI arsenal: the declarative design and the knowledge modules. For those of you that don’t have too much experience with ODI, let me run through these two outstanding features.

The declarative design basically allows you to create an abstraction layer between the developer and the actual code that is going to be performed by the DBMS. This means that in ODI you define “what you want to get”, and the way to do so is automatically implemented by the Knowledge Module.

However, you might ask yourself “Is this possible? Can we really rely on the default ODI KMs?” Well, the answer is very simple: for standard needs, where performance is not a problem, yes! But in most of our BI projects, remember that we have had to tune the KMs to adapt them to our customers’ needs and to obtain the maximum benefit from the tool.

But don’t think that this undermines what is a fantastic feature. ODI comes with a great set of KMs that give you the perfect starting point to create your own customized KM. And moreover, all the developers don’t need to go into the details of the KM implementation; in a typical ODI project, the architect will be responsible for setting up the ODI environment and will provide the whole team with the appropriate KMs that will satisfy the particular project needs.

So in principle, the developers don’t need to know all the implementation details (it is up to each user/developer to go beyond and analyze the code ODI is generating, if required). This abstraction significantly speeds up the process of developing an ETL, since once the logic is established, there is no need to redo it over and over again.

A typical example to consider is the logic necessary to load a Slowly Changing Dimension (SCD Type II). With other tools, each developer would need to fully understand the logic of the SCDs and the way the integration process is performed, since it has to be replicated for each table to be loaded.

With the ODI declarative design, you just establish what you want, and the KM will take care of the logic. So you simply need to indicate:

➜ I want to treat “Table X” as a Slowly Changing Dimension (we will mark it like this).

Oracle Data Integrator

 

➜ I want to use “Column X” and “Column Y” to store the starting and ending date of the row, respectively.

Oracle Data Integrator

 

➜ I want to use “Column Z” as the current row flag.

Oracle Data Integrator

 

➜ I want “Column A” to be the Primary Key (Surrogate Key) of “Table X”.

Oracle Data Integrator

 

➜ And I want this column / set of columns (e.g. B and C) to be the Integration Key (Natural Key) of Table X (the column, or columns, that will be used to determine if the row is a new row, or if the row previously existed and has to be updated).

Oracle Data Integrator

 

➜ Finally, we can indicate for each column if we want the ETL to add a new row when the value changes in the source system, or if we prefer to update the whole table.

Oracle Data Integrator

 

✓ And that’s it! By then selecting the appropriate KM, tuned with the necessary logic by our architect, we can develop as many mappings as we want for Slowly Changing Dimensions. Just indicate the source for each column and run it. Quick and easy!

We have also mentioned the concept of Knowledge Modules. So, some of you may wonder, what is a “Knowledge Module”? This is simply a generic set of steps that will perform the needed logic for your ETL process. Each step can be written in different languages (SQL, Oracle-SQL, Jython, and many more) depending on the technology underneath, with placeholders for each column, table, and in general, “entity” that will take part in our ETL. At execution time, those placeholders are filled with the details of the mappings that have been developed, and this gives the project team the flexibility to reuse logic and speed up the delivery of the ETL process.

Well, that’s enough of an introduction to these two handy features in ODI. Now let’s see some usage examples of the things that can be done, and which can help our customers to reduce implementation time.


1. Automatizing Loads

▼ Automatizing the Incremental / Full Load Strategy
There are several ways to automatize an Incremental / Full Load Strategy using KMs. If we think back to the previous version of OBI Apps, in Informatica we had to have two separate mappings for the Incremental and the Full version of the mapping. With ODI this can be automatized in several ways:

ⓐ Manually: Adding an option to the mapping
The simplest way is to add an option to the mapping so we manually specify if we want to execute a full load or not. This option will drive the execution of a new step in our KM. For example, this step may consist in truncating the table, and if we are populating a dimension, resetting the associated sequence.Here we see the importance of following best practices and naming conventions. If we follow a good methodology when naming our DWH objects, we can distinguish the table type by the name, and also the sequences can be easily related to the corresponding dimension.

Oracle Data Integrator
Oracle Data Integrator

 

ⓑ Automatically: Adding an option to the mapping and using a control table
The second option is an extension of the previous one. By using a control table containing the table name and one column containing the “FULL_LOAD_FLAG”, we can invoke this process only if the DWH administrator has set the property in the table (FULL_LOAD_FLAG = 1). This way, there is no need to modify any ODI object to change the behavior of the interface.

▼ Automatizing the population of the unspecified rows for dimensions
A typical need in any data warehousing project is to populate an “unspecified” row for each dimension. This will give us the capability of always making inner joins between the facts and dimensions, and thus improve performance in all our OLAP queries.

Once again, we could generate a separate interface for this, for each dimension, but will it be efficient? ODI gives us the possibility of defining a generic step in our KM to be executed for any dimension table to be loaded.

A typical need in any data warehousing project is to populate an “unspecified” row for each dimension. This will give us the capability of always making inner joins between the facts and dimensions, and thus improve performance in all our OLAP queries.

Once again, we could generate a separate interface for this, for each dimension, but will it be efficient? ODI gives us the possibility of defining a generic step in our KM to be executed for any dimension table to be loaded.

At execution time, ODI is aware of the data type of each column to be populated (remember that every object is stored in the model, and so we have an internal representation stored in the ODI repository). We can prepare a simple process that will basically assign a default value for each data type. For example:

Oracle Data Integrator

Oracle Data Integrator

NOTE: See the ODI Substitution API Reference to learn more about the functions you can use to retrieve useful information from the ODI model, or contact us for further details!

▼ Automatizing the population of Fact Modules by using a control table
Another interesting functionality we usually include in our projects is control of the periods to be loaded by a control table. As we mentioned before, with this table we can change the behaviour of our ETL process just by changing one row in one table. This speeds the process of programming and administering the loads up a lot.

By using this table, the ETL administrator is capable of specifying the starting date that we need to populate, and the number of periods to be loaded. More options can be useful in this table, like a flag (PREV_EXEC_CORRECT_FLG) indicating if the automated previous execution of the ETL was correct (if not the ETL should not continue in order to avoid inconsistencies, for example, in Slowly Changing Dimensions). Another flag might be used to indicate if the population of this table should be included in the automated process (IS_PART_OF_DAILY_LOAD), or the FULL_LOAD_FLAG that we already mentioned.

In the example below, the source system needed to be queried daily due to performance restrictions; by specifying a starting date and the number of periods to be loaded, the ETL automatically ran a loop for the number of days to be loaded by leveraging the ODI variables.

Oracle Data Integrator
Oracle Data Integrator

In this case, it is also important to highlight that the interfaces have to be filtered by a variable containing the value of the day to be populated:

Oracle Data Integrator

Another interesting modification done to the KMs involved in this case study is that of the corresponding CKM to add an option that will avoid deleting the previous errors, since the same session is going to be executed several times in a loop.

Oracle Data Integrator

In some cases, it might even be interesting to remove the period from the target table before inserting, if we are performing a full period insertion (we don’t want to merge the data with the previously inserted data, but to directly reload a complete period).

Oracle Data Integrator

 

As you can see above, by using the ODI functions we can generate generic steps that will work for any object in our ODI model. For example, <%=odiRef.getTable(“L”,”TARG_NAME”,”A”)%> will retrieve the target table name.

See: http://docs.oracle.com/cd/E21764_01/integrate.1111/e12645/odiref_reference.htm#ODIKD1295


2. Boosting performance

▼ Boosting performance for populating staging tables
Since ODI 11g does not have a multi-target capability, we sometimes need to reuse the same staging table to populate several final targets; we have to load these staging tables as fast as possible to speed up the performance of the whole ETL process. We can use the default KMs to load these staging tables, but for one staging table you may not need to apply integration processes, validations checks and other time-consuming logic; to get round this, a good practice is to generate your own staging KM, including only the necessary steps:
Oracle Data Integrator

Then, in the “Insert” step, you can populate the target table directly, instead of the integration table:

<%=ODIREF.GETTABLE("L","TARG_NAME","A")%>
VS
<%=ODIREF.GETTABLE("L","INT_NAME","W")%>

Oracle Data Integrator

 

▼ Boosting performance by using partition exchange
In order to boost performance of load processes and reporting as well, it is sometimes useful to apply partitioning to your DWH tables. This will give you two main advantages:

➀ At ETL time, the population of a big table (GBs of data) is much faster if the table is partitioned, since you can populate a temporary empty table, and then execute a partition exchange to swap the partitions between both tables. This way, all the population in your temporary table, the consistency checks, constraints and all the processes you may need to do can be done much faster.
It is important to emphasize that this is useful for snapshot tables, and in general for tables that do not need to cross data between partitions.

➁ At reporting time, if you just need to analyze one partition (one period) at a time, or maybe two to compare them, the response time in a partitioned table will be faster.

To accomplish the partition exchange in ODI, you can also use a knowledge module that can encapsulate the logic needed. The command below will execute the partition exchange between the temporary table and the target table, leveraging ODI functions.


ALTER TABLE

<%=ODIREF.GETTARGETTABLE( "SCHEMA" )%>.<%=ODIREF.GETTARGETTABLE("RES_NAME")%>
EXCHANGE PARTITION ' || V_PARTITION_NAME || ' WITH TABLE
<%=ODIREF.GETTABLE("L", "INT_NAME","W")%>
<$="<%=ODIREF.GETUSEREXIT("PARTITION EXCHANGE OPTIONS")%>"$>

 

Note that the variable V_PARTITION_NAME will need to contain the name of the partition to be exchanged. This can be retrieved by a simple query on the temporary table that your mapping has populated.

➀ Loading historical information into a Slowly Changing Dimension
The last interesting modification that we are going to show you is to the Oracle SCD KM. Imagine that you launch your system with a starting date (e.g. 30 Sept., 2014) and after some months running your ETL system in production, the source system is updated with historical information so that the business can analyze trends and customer behavior over several years. Obviously, we can’t load the latest information in the source system to our DWH because it will override the current information with this stale but still relevant information. We need to look for a way to include this information in our system, so if a user queries information about old periods, they can analyze and explore the attributes the customer had at that point in time.

Well, we can do the following:

➀ Duplicate the mapping you are using to load your SCD.

➁ Duplicate the Slowly Changing Dimension IKM you are using to populate your SCDs.

➂ Modify the copy of your IKM to set the "ACTIVE_TO_DATE" to the date when you launched your system. This way you can run a separate load for older periods without impacting the daily load you are running on a daily basis with current data. Any row loaded by this second thread (older periods), will always finish by the date when you launched your system, so there won’t be duplicates in your SCD and you can explore both current and old information about the entities in your dimension (e.g. your customer profile, segment, location, etc.).

Oracle Data Integrator

 

Just by modifying small pieces of the given KM code, you can achieve your particular needs. Just be careful that you modify the appropriate properties (in this case, SCD_END and also the SCD_FLAG have to have different values from those in your daily thread).

We hope that you’ve found these techniques that can be applied to ODI KMs to speed up the process of generating and manipulating your ETL interesting, and if you have further questions, do not hesitate to contact us.
Stay tuned for more interesting features to apply to your ETL processes with #ODI!!

Get in touch with us and see what we can do for you!  ✉

Blog Article: Emilio Soriano

privacy policy - Copyright © 2000-2010 ClearPeaks

topnav