ClearPeaks Blog

SQL Override: Mapping Reusability in Informatica

.

In a not so remote past, one of the biggest frustrations for ETL designers was having to replicate a job many times in order to accommodate slight differences depending on the data source. Whether it’s a change in the source table definition or a different clause in the extract SQL, today’s ETL tools have embed capabilities that allow for the reuse of metadata objects to save time and improve maintainability of the workflows.

In this post, I will describe with simple examples how to best reuse ETL mappings in Informatica PowerCenter using the SQL override capability.

A typical BI scenario, where reusing objects can be useful, is the consolidation of information in the data warehouse that has been retrieved from many instances of a specific OLTP system. Let’s assume we have a source dependent mapping which loads information about Contacts in a staging table (see screen shot below). In our scenario, the Contact information is stored in multiple data sources (for example, 3 transactional systems) and the source tables have the same structure.

As you can see, the SQL code is generic and does not contain any source specific condition.

SQL query in an Informatica mapping

SQL query in an Informatica mapping

More often than not, single OLTP instances have been slightly customised to fit a specific business or the technical needs of the country/business area that the transactional system is located in.

For example, let’s assume the conditions we need to apply to retrieve consistent data from the 3 sources are as follows:

  1. We need all Contacts aged 18+ from Source 1.
  2. We need Contacts of any age but having a ROW_ID beginning with “EU_” from Source 2.
  3. The BIRTH_DATE format in Source 3 is different from the respective dates in the other sources.

A way to retrieve this data and consolidate it in the staging area would be to have 3 different ETL jobs, one for each source, having basically the same metadata structure and with a different SQL in the Source Qualifier transformation. Each mapping would then be instantiated in its own session. Therefore, 3 mappings + 3 sessions = 6 objects to maintain.

By using the SQL override capability in the ETL session of Informatica Workflow Manager, it is possible to use a single mapping, containing the code described in the screen shot above, and change the SQL in three sessions, one for each source, embedding the appropriate conditions to retrieve the proper information.

SQL override in the session

SQL override in the session

SQL override in the session for source 1

SQL override in the session for source 1

Using the SQL override property, the number of objects to take into account when supporting and maintaining the ETL process is now 1 mapping + 3 sessions = 4 objects.

A possible drawback of using this technique is that now there are 2 different places to check any free-hand SQL when reverse-engineering an ETL job, instead of using only the original mapping in Informatica Designer. This can be particularly painful during the ETL support handover phase, especially when the available documentation is poor or, as so often happens, nonexistent.

Another condition necessary for the use of the SQL override is the consistency of the three source tables. In case there are data type differences, the designer has to choose whether to have different mappings or to create another source-target stream in the same mapping, depending on how much complexity can be afforded. The latter option is strongly discouraged: as a best practice, mappings should have one and only one target.

Other techniques to optimise maintainability and tidiness of ETL jobs involve the use of mapplets and worklets, and defining the appropriate number of sessions to be used in a single workflow. But this should be the subject of another post. Stay tuned!

One Response to “SQL Override: Mapping Reusability in Informatica”

  1. Very useful information esp with screen shots. Though the blog information be about very basic info on informatica, it is still best compared to non-availability of such info on WWW

Leave a Comment