January 18th, 2012.
Isabel
Informatica PowerCenter can read from many kinds of data sources. The most common are tables, views and synonyms, but for small data sets companies often use flat files, MS Excel files or MS Access databases. Informatica treats Excel and Access sources as relational databases, not as flat files.
Which steps do we have to follow to load the data from these files into our data warehouse? How are the connections created? Where do the files have to be located? How should we configure Informatica?
In this article I will explain how to load data from Excel files using Informatica PowerCenter running on Windows.
November 15th, 2011.
Antonio
GoldenGate, which Oracle acquired in 2009, is a tool that lets us capture, route, transform and deliver data between heterogeneous systems in real time, with very low impact on the source systems.
So basically Oracle GoldenGate (OGG from now on) moves data from point A to point B, but it does it in a brilliant way. Let’s take a detailed look at its main features:
Very low impact on data sources – OGG does not disturb the source systems when fetching new data. To achieve that, it doesn’t really access the database layer; instead it constantly watches the redo files (or the equivalent technology) for changes, so it knows when a transaction has been completed.
Heterogeneous systems – Thanks to ODBC and the VAM (Vendor Access Modules), OGG can interconnect databases from a good number of vendors, although only those that offer referential integrity can be chosen as source systems.
Real time – Maybe you’ve heard that OGG can transfer data with sub-second latency. That is true if your integration scenario meets some conditions and the physical distance between source and destination is not huge. Even in the worst circumstances, we are still talking about really low latency.
October 20th, 2011.
Gianluca
In any Business Intelligence environment, changing the technology of the data sources is a big challenge.
This is particularly true of mature, long-running BI platforms, where the overall ETL processing is likely to exceed three or four hundred individual jobs.
A change of the source database technology – for example, from SQL Server to Oracle – and the related data migration often means a painstaking exercise: manually updating every single ETL step, unit and regression testing, QA, and moving to Production.
To minimise the effort required, avoid database-specific SQL wherever possible and use whatever automation your ETL tool offers to make the code portable across platforms.
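As a small illustration of what avoiding database-specific SQL can look like, the sketch below replaces a vendor-specific null-handling function with the standard `COALESCE`. The `orders` table and its columns are hypothetical, and SQLite (through Python's standard `sqlite3` module) is used only so the query can be run anywhere; it stands in for the real source database.

```python
import sqlite3

# Hypothetical source table, created in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, discount REAL);
INSERT INTO orders VALUES (1, 0.1), (2, NULL);
""")

# Vendor-specific forms that would need rewriting after a migration:
#   SQL Server: ISNULL(discount, 0)
#   Oracle:     NVL(discount, 0)
# COALESCE is standard SQL and is accepted by both, so this query
# does not have to change when the source database does.
PORTABLE_QUERY = """
SELECT order_id, COALESCE(discount, 0) AS discount
FROM orders
ORDER BY order_id
"""

portable_rows = conn.execute(PORTABLE_QUERY).fetchall()
for row in portable_rows:
    print(row)
```

The same idea applies to other constructs with both a vendor-specific and a standard spelling; the standard one is usually the safer default in ETL code that may outlive its source system.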
July 11th, 2011.
Gustavo
Some information systems, such as CRMs (Siebel, etc.) and ERPs (EBS, SAP, etc.), store their information in a way that makes it easy to extract and feed into our data warehouses. However, a large number of customers still manage their information in so-called flat files (e.g. .csv, .xls), which are not as easy to work with.
Such situations are still common, so the purpose of this article is to explain a process for handling them.
July 5th, 2011.
Jordi S.
When implementing an ETL process, one of the most common tasks is the deduplication of rows. If you need to implement this in PL/SQL, one method is to use correlated subqueries. In this article, however, we will take an approach that developers sometimes forget: analytic functions.
The main objective of this article is to compare these two ways of performing the deduplication.
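To make the two approaches concrete, here is a minimal sketch of both, not the article's own code. It assumes a hypothetical staging table `stg_customer` in which the row with the latest `load_ts` per `customer_id` should survive. The queries are run in SQLite through Python's standard `sqlite3` module purely so the example is self-contained (window functions need SQLite 3.25 or later); both are written in syntax that Oracle also accepts.

```python
import sqlite3

# Hypothetical staging table with duplicate customer_id values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_customer (customer_id INTEGER, name TEXT, load_ts TEXT);
INSERT INTO stg_customer VALUES
  (1, 'Ann',  '2011-07-01'),
  (1, 'Anne', '2011-07-03'),
  (2, 'Bob',  '2011-07-02'),
  (3, 'Cy',   '2011-07-01'),
  (3, 'Cyd',  '2011-07-04');
""")

# Approach 1: correlated subquery. For each row, look up the latest
# load_ts of the same customer and keep the row only if it matches.
CORRELATED = """
SELECT customer_id, name, load_ts
FROM stg_customer s
WHERE load_ts = (SELECT MAX(d.load_ts)
                 FROM stg_customer d
                 WHERE d.customer_id = s.customer_id)
ORDER BY customer_id
"""

# Approach 2: analytic function. Number the rows of each customer from
# newest to oldest and keep only the first one.
ANALYTIC = """
SELECT customer_id, name, load_ts
FROM (
  SELECT customer_id, name, load_ts,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY load_ts DESC) AS rn
  FROM stg_customer
)
WHERE rn = 1
ORDER BY customer_id
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
analytic_rows = conn.execute(ANALYTIC).fetchall()
print(correlated_rows)
print(analytic_rows)
```

On this sample data both queries return the same three rows. They differ when two rows of the same customer share the latest `load_ts`: the correlated subquery keeps both, while `ROW_NUMBER()` keeps exactly one, so the analytic version needs a further tie-breaking column in its `ORDER BY` if the choice must be deterministic.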