Extracting data from Taleo


Over the past few years we have seen companies focusing more and more on Human Resources / Human Capital activities. This is no surprise: nowadays a large number of businesses depend more on people's skills and creativity than on machinery or capital, so hiring the right people has become a critical process. As a consequence, more and more emphasis is put on having the right software to support HR/HC activities, which in turn leads to the need to build a BI solution on top of those systems to evaluate processes and resources correctly. One of the most commonly used HR systems is Taleo, an Oracle product that resides in the cloud, so there is no direct access to its underlying data. Nevertheless, most BI systems are still on-premise, so if we want to use Taleo data, we need to extract it from the cloud first.

 

1. Taleo data extraction methods

As mentioned before, there is no direct access to Taleo data; nevertheless, there are several ways to extract it, and once extracted we can use it in our BI solution:

Querying the Taleo API
Using Oracle Cloud Connector
Using Taleo Connect Client

The API is the most robust but also the most complex of these methods, since it requires a separate application to be written. In practice, depending on the configuration of the BI system, either Oracle Cloud Connector, Taleo Connect Client, or a combination of both is used.


Figure 1: Cloud connector in ODI objects tree

Oracle Cloud Connector is a component of OBI Apps; essentially it's Java code that replicates Taleo entities / tables. It's also easy to use: just by creating a Load Plan in BIACM with Taleo as the source system, a series of calls to Cloud Connector is generated that effectively replicates the Taleo tables to a local schema. Although it works well, it has two significant disadvantages:

It’s only available as a component of BI Apps
It doesn't extract Taleo UDFs (User Defined Fields)

So even if we have BI Apps installed and we use Cloud Connector, there will be some columns (UDFs) that will not get extracted. This is why the use of Taleo Connect Client is often a must.

 

2. Taleo Connect Client

Taleo Connect Client is a tool used to export data from or import data to Taleo. In this article we're going to focus on extraction. It can extract any field, including UDFs, so it can be used in combination with the BI Apps Cloud Connector or, if that's not available, as the sole extraction tool. There are versions for both Windows and Linux operating systems. Let's look at the Windows version first.

Part 1 - Installation & running: Taleo Connect Client can be downloaded from the Oracle e-delivery website; just type Taleo Connect Client into the search box and you will see it in the list. Choose the necessary version, selecting both the Application Installer and the Application Data Model (required!), and remember that it must match the version of the Taleo application you will be working with; then just download and install. Important: the Data Model must be installed before the application.


Figure 2: Downloading TCC for Windows

After TCC is installed, we run it, providing the necessary credentials in the initial screen:


Figure 3: Taleo Connect Client welcome screen

And then, after clicking on 'Ping', we connect to Taleo. The window that we see is initially empty, but we can create or execute new extracts from it. Before going on to this step, let's find out how to see the UDFs: in the right panel, go to the 'Product integration pack' tab and select the correct product and model. In the Entities tab we can see the list of entities / tables, and in Fields / Relations we can see the columns and the relations with other entities / tables (through foreign keys). After the first run, you will probably find that some UDFs are missing from the list of available fields / relations. Why is this? Because what we initially see in the field list are only the Taleo out-of-the-box fields installed with the Data Model installer. But don't worry, this is easily fixed: click on the 'Synchronize custom fields' icon (highlighted in the screenshot). You will be taken to a log-on screen where you'll have to provide your credentials again, and after clicking on the 'Synchronize' button, the UDFs will be retrieved.


Figure 4: Synchronizing out-of-the-box model with User Defined Fields


Figure 5: Synchronized list of fields, including some UDFs (marked with 'person' icon)

Part 2 - Preparing the extract: Once we have all the required fields available, preparing the extract is pretty straightforward. Go to File -> New -> New Export Wizard, choose the desired entity, and click on Finish. Now, in the General window, set Export Mode to 'CSV-Entity', and in the Projections tab select the columns that you want to extract by dragging and dropping them from the Entity -> Structure window on the right. You can also add filters or sort the result set. Finally, save the export file. The other component needed to actually extract the data is the so-called configuration. To create it, we select File -> New -> New Configuration Wizard, point to the export file created in the previous step and then, in the subsequent step, to our endpoint (the Taleo instance that we will extract the data from). The following screen contains further extract parameters, such as request format and encoding, extract file name format and encoding, and much more. In most cases the default values will let us extract the data successfully, so unless a change is clearly required there is no need to touch anything. Now the configuration file can be saved and the extraction process can be started, just by clicking on the 'Execute the configuration' button (on the toolbar just below the main menu). If the extraction is successful, all the indicators in the Monitoring window on the right will turn green, as in the screenshot below.


Figure 6: Successful TCC extraction

By using a bat file created during the installation, you can schedule TCC jobs with Windows Scheduler so that they run on a regular basis (see the sketch below). However, it's much more common to have your OBI / BI Apps (or almost any other DBMS that your organization uses as a data warehouse) installed on a Linux / Unix server, which is why we're also going to look at how to install and set up TCC in a Linux environment.
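A minimal sketch of such a scheduled task, assuming TCC is installed under C:\TCC and invoked through the TaleoConnectClient.bat wrapper with the configuration file as an argument (the task name, schedule and paths are assumptions, so adjust them to your installation):

:: Hypothetical example - run the extract every day at 02:00
schtasks /Create /SC DAILY /ST 02:00 /TN "TCC_BlogArticle_Extract" /TR "C:\TCC\TaleoConnectClient.bat C:\TCC\blog_article_cfg.xml"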
 
Part 3 - TCC in a Linux / Unix environment: TCC setup in a Linux / Unix environment is a bit more complex. To simplify it, we will reuse some of the components already created when we worked with the Windows TCC; and although the frontend is totally different (to be precise, there is no frontend at all in the Linux version, as it's strictly command-line), the way the data is extracted from Taleo is exactly the same (using extract definitions stored as XML files and the Taleo APIs). So, after downloading the application installer and data model from edelivery.oracle.com, we install both components. Installation is actually just extracting the files, first from zip to tgz, and then from tgz to the uncompressed content. This time, unlike on Windows, we recommend installing (extracting) the application first, and then extracting the data model files into an application subfolder named 'featurepacks' (this must be created, as it doesn't exist by default). It's also necessary to create a subfolder 'system' in the application directory. Once this is done, you can move some components of your Windows TCC instance to the Linux one (of course, if you have no Windows machine available, you can create any of these components manually):

Copy the file default.configuration_brd.xml from the Windows TCC/system folder to the Linux TCC/system folder
Copy the extract XML and configuration XML files, from wherever you created them, to the main Linux TCC directory
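On the command line, the preparation steps above might look roughly like this (a sketch only; archive names, transfer locations and paths are assumptions, so adjust them to your downloads and environment):

# extract the application, then create the folders that don't exist by default
cd /u01/tcc_linux
unzip tcc-15A.2.0.20-linux.zip && tar -xzf tcc-15A.2.0.20.tar.gz
mkdir tcc-15A.2.0.20/featurepacks tcc-15A.2.0.20/system

# extract the data model into the 'featurepacks' subfolder
unzip tcc-datamodel-15A.2.0.20.zip && tar -xzf tcc-datamodel-15A.2.0.20.tar.gz -C tcc-15A.2.0.20/featurepacks

# reuse components from the Windows instance (copied here from a hypothetical transfer location)
cp /tmp/windows_tcc/system/default.configuration_brd.xml tcc-15A.2.0.20/system/
cp /tmp/windows_tcc/blog_article_sq.xml /tmp/windows_tcc/blog_article_cfg.xml tcc-15A.2.0.20/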

There are also some changes that need to be made in the TaleoConnectClient.sh file:

Set the JAVA_HOME variable at the top of the file (just below the shebang line), pointing it to your Java SDK installation (for some reason, in our installation the system variable JAVA_HOME wasn't picked up correctly by the script)
In the line below # Execute the client, after the ${TCC_PARAMETERS} variable, add:

✓ the proxy server parameters, if a proxy is to be used:

-Dhttp.proxyHost=ipNumber -Dhttp.proxyPort=portNumber

✓ the path of the Data Model:

-Dcom.taleo.integration.client.productpacks.dir=/u01/oracle/tcc-15A.2.0.20/featurepacks

So, in the end, the TaleoConnectClient.sh file in our environment has the following content (IP addresses were masked):

#!/bin/sh
JAVA_HOME=/u01/middleware/Oracle_BI1/jdk

# Make sure that the JAVA_HOME variable is defined
if [ ! "${JAVA_HOME}" ]
then
    echo +-----------------------------------------+
    echo "+ The JAVA_HOME variable is not defined.  +"
    echo +-----------------------------------------+
    exit 1
fi

# Make sure the IC_HOME variable is defined
if [ ! "${IC_HOME}" ]
then
    IC_HOME=.
fi

# Check if IC_HOME points to a valid Taleo Connect Client folder
if [ -e "${IC_HOME}/lib/taleo-integrationclient.jar" ]
then
    # Define the class path for the client execution
    IC_CLASSPATH="${IC_HOME}/lib/taleo-integrationclient.jar":"${IC_HOME}/log"

    # Execute the client
    ${JAVA_HOME}/bin/java ${JAVA_OPTS} -Xmx256m ${TCC_PARAMETERS} \
        -Dhttp.proxyHost=10.10.10.10 -Dhttp.proxyPort=8080 \
        -Dcom.taleo.integration.client.productpacks.dir=/u01/tcc_linux/tcc-15A.2.0.20/featurepacks \
        -Dcom.taleo.integration.client.install.dir="${IC_HOME}" \
        -Djava.endorsed.dirs="${IC_HOME}/lib/endorsed" \
        -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
        -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl \
        -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Log4JLogger \
        -Djavax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom=net.sf.saxon.xpath.XPathFactoryImpl \
        -classpath ${IC_CLASSPATH} com.taleo.integration.client.Client ${@}
else
    echo +-----------------------------------------------------------------------------------------------
    echo "+ The IC_HOME variable is defined as (${IC_HOME}) but does not contain the Taleo Connect Client"
    echo "+ The library ${IC_HOME}/lib/taleo-integrationclient.jar cannot be found."
    echo +-----------------------------------------------------------------------------------------------
    exit 2
fi

Once this is ready, we can also adapt the extract and configuration files. There is actually no need to change anything in the extract definition (file blog_article_sq.xml); let's have a quick look at the content of this file:

<?xml version="1.0" encoding="UTF-8"?>
<quer:query productCode="RC1501" model="http://www.taleo.com/ws/tee800/2009/01" projectedClass="JobInformation"
            locale="en" mode="CSV-ENTITY" largegraph="true" preventDuplicates="false"
            xmlns:quer="http://www.taleo.com/ws/integration/query">
  <quer:subQueries/>
  <quer:projections>
    <quer:projection><quer:field path="BillRateMedian"/></quer:projection>
    <quer:projection><quer:field path="JobGrade"/></quer:projection>
    <quer:projection><quer:field path="NumberToHire"/></quer:projection>
    <quer:projection><quer:field path="JobInformationGroup,Description"/></quer:projection>
  </quer:projections>
  <quer:projectionFilterings/>
  <quer:filterings/>
  <quer:sortings/>
  <quer:sortingFilterings/>
  <quer:groupings/>
  <quer:joinings/>
</quer:query>

Just by looking at this file we can figure out how to add more columns manually: we just need to add more quer:projection tags, like:

<quer:projection><quer:field path="DesiredFieldPath"/></quer:projection>

With regard to the configuration file, we need to make some small changes: the cli:SpecificFile and cli:Folder tags contain absolute Windows paths, so once we move the files to Linux, we need to replace them with Linux filesystem paths, absolute or relative.
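For illustration only, the change might look something like this (a hedged sketch: the surrounding structure and attributes of your configuration file will differ, and the path values shown here are hypothetical; only the paths inside the cli:SpecificFile and cli:Folder tags matter):

<!-- Windows version (hypothetical paths) -->
<cli:SpecificFile>C:\TCC\exports\blog_article_sq.xml</cli:SpecificFile>
<cli:Folder>C:\TCC\results</cli:Folder>

<!-- Linux version: absolute or relative filesystem paths -->
<cli:SpecificFile>/u01/tcc_linux/tcc-15A.2.0.20/blog_article_sq.xml</cli:SpecificFile>
<cli:Folder>/u01/tcc_linux/tcc-15A.2.0.20/results</cli:Folder>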

Once the files are ready, the only remaining task is to run the extract, which is done by running:

./TaleoConnectClient.sh blog_article_cfg.xml 

See the execution log:

[KKanicki@BIApps tcc-15A.2.0.20]$ ./TaleoConnectClient.sh blog_article_cfg.xml
2017-03-16 20:18:26,876 [INFO] Client - Using the following log file: /biapps/tcc_linux/tcc-15A.2.0.20/log/taleoconnectclient.log
2017-03-16 20:18:27,854 [INFO] Client - Taleo Connect Client invoked with configuration=blog_article_cfg.xml, request message=null, response message=null
2017-03-16 20:18:31,010 [INFO] WorkflowManager - Starting workflow execution
2017-03-16 20:18:31,076 [INFO] WorkflowManager - Starting workflow step: Prepare Export
2017-03-16 20:18:31,168 [INFO] WorkflowManager - Completed workflow step: Prepare Export
2017-03-16 20:18:31,238 [INFO] WorkflowManager - Starting workflow step: Wrap SOAP
2017-03-16 20:18:31,249 [INFO] WorkflowManager - Completed workflow step: Wrap SOAP
2017-03-16 20:18:31,307 [INFO] WorkflowManager - Starting workflow step: Send
2017-03-16 20:18:33,486 [INFO] WorkflowManager - Completed workflow step: Send
2017-03-16 20:18:33,546 [INFO] WorkflowManager - Starting workflow step: Poll
2017-03-16 20:18:34,861 [INFO] Poller - Poll results: Request Message ID=Export-JobInformation-20170316T201829;Response Message Number=123952695;State=Completed;Record Count=1;Record Index=1;
2017-03-16 20:18:34,862 [INFO] WorkflowManager - Completed workflow step: Poll
2017-03-16 20:18:34,920 [INFO] WorkflowManager - Starting workflow step: Retrieve
2017-03-16 20:18:36,153 [INFO] WorkflowManager - Completed workflow step: Retrieve
2017-03-16 20:18:36,206 [INFO] WorkflowManager - Starting workflow step: Strip SOAP
2017-03-16 20:18:36,273 [INFO] WorkflowManager - Completed workflow step: Strip SOAP
2017-03-16 20:18:36,331 [INFO] WorkflowManager - Completed workflow execution
2017-03-16 20:18:36,393 [INFO] Client - The workflow execution succeeded.

And that's it! Assuming our files were correctly prepared, the extract will be ready in the folder declared in the cli:Folder tag of the configuration file.

As for scheduling, different approaches are available, the most basic being to use the Linux crontab as the scheduler; but you can also easily use whatever ETL tool is already in place in your project. See the screenshot below for an ODI example:


Figure 7: TCC extracts placed into 1 ODI package

The file extract_candidate.sh contains a simple call to the TCC extraction:

[KKanicki@BIApps tcc-15A.2.0.20]$ cat extract_candidate.sh
#!/bin/bash
cd /u01/tcc_linux/tcc-15A.2.0.20/
./TaleoConnectClient.sh extracts_definitions/candidate_cfg.xml
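If you prefer the crontab route mentioned above, a single entry is enough; for example, to run the extract every day at 02:00 (the paths and log location are assumptions):

# Hypothetical crontab entry
0 2 * * * /u01/tcc_linux/tcc-15A.2.0.20/extract_candidate.sh >> /u01/tcc_linux/tcc-15A.2.0.20/log/extract_candidate.log 2>&1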

If your extracts fail or you have any other issues with configuring Taleo Connect Client, feel free to ask us in the comments section below!

In the last couple of years we have delivered several highly successful BI projects in the Human Resources / Human Capital space! Don't hesitate to contact us if you would like to receive specific information about these solutions!

Use JavaScript to embed Tableau dashboards into SharePoint


How to use JavaScript to embed Tableau dashboards into SharePoint with automatic sizing and current date quick filter selection?

As we all know, Tableau comes with its own web-based repository (Tableau Server), but sometimes we have to consume the dashboards we create in other platforms. One of them is SharePoint, Microsoft's web-based application, often used by enterprises to share content with their employees and departments.


Figure 1: Schema of the integration between SharePoint, Tableau Server and the enterprise accounts

Tableau allows us to easily share our dashboards by embedding them in JavaScript code or iFrames, providing ready-to-use code snippets that we just need to copy and paste in our solution.

These are very useful, but what if we want to go even further? For example, it’s currently not possible to set a quick filter to automatically select the current date. What if we want our dashboard to show data for the current year or month by default?

In this article, we’re going to explain how to embed dashboards into SharePoint, how to dynamically resize them and how to apply the default date quick filter with the help of some JavaScript power.

 

1. Embedding the dashboard into SharePoint

To embed the dashboard into SharePoint, we must first create a page with an “Embed Code” container. Once your SharePoint page is ready, click on Edit, then Insert, and select Embed Code. This particular type of container, as its name suggests, allows us to embed code inside SharePoint, and it will be our door to Tableau.


Figure 2: SharePoint’s Insert toolbar with highlighted “Embed code” button

Now that your container is ready, you will see a canvas where you can write your code. Let’s start by embedding the dashboard as it is, with an iFrame.

First of all, we need the dashboard link. We can find it in Tableau Server: just navigate to the dashboard that you want to embed into SharePoint and click on Share, in the upper-right corner. You will see two links: “Embed Code” and “Link”. The first is a JavaScript snippet, ready to be copied and pasted; the second is a pure URL to the dashboard. Click on the second and copy it.


Figure 3: Tableau Server’s “share” option and corresponding links

We are using this link because it is slightly less verbose; it’s also interesting to see how the two technologies can easily integrate.

Now that the link is ready, return to the Embed Code container canvas and write a simple HTML page that contains an iFrame, and use the link as the source URL, like this:

<!DOCTYPE html>
<html>
  <body>
    <center>
      <iframe src="(Tableau Server URL)" id="myIframe"></iframe>
    </center>
  </body>
</html>

Click on Insert, and the dashboard will appear on your page!
 

2. Dynamic Sizing

Now that your dashboard is in SharePoint, you might want to change its size. Sometimes, to make our dashboards look as good as they can on every screen, we set the resolution to Automatic in Tableau Desktop; this is usually a great idea, but in SharePoint we might have big headers, multiple contents on the same browser page, etc., which stop our dashboard looking as good as it could. We can solve this issue by changing the size of the Embed Code container automatically, with just a few lines of JavaScript.

Edit the container that you just set up, and add the following script AFTER the iFrame (but before the closing body tag):

<script type="text/javascript">
var iframe = document.getElementById('myIframe');
	
//Automatic size: the following lines of code adapt the dashboard size to the browser window
var height = window.innerHeight;
var width = window.innerWidth;					   
iframe.height = height*0.95;
iframe.width = width*0.95;

document.getElementById('contentBox').style.marginLeft="0px";
document.getElementById('contentBox').style.marginRight="0px";
document.getElementById('contentRow').style.paddingTop="0px";
	
var containers = document.getElementsByClassName('container-fluid');
var container = containers[1];
container.style.marginLeft="0px";
container.style.marginRight="0px";
container.style.paddingLeft="0px";
container.style.paddingRight="0px";
		
containers = document.getElementsByClassName('welcome blank-wp');
container = containers[0];
container.style.paddingLeft="0px";
container.style.paddingRight="0px";
container.style.paddingTop="0px";
container.style.paddingBottom="0px";

containers = document.getElementsByClassName('welcome-content');
container = containers[0];
container.style.marginTop="0px";	
</script>

Again, click on Insert, and your dashboard will automatically resize when it's loaded! In this script, we take the actual size of the browser window and use it as a reference to resize the iframe: in the example we basically use 95% of the available space in both dimensions, but you can change it as you wish. Notice how we also remove all possible padding and margins from the surrounding SharePoint containers.
 

3. Setting default current date filter

Our dashboard is now in SharePoint and it resizes automatically. The next step is to add automatic filters.

Tableau allows us to manually set parameters in the URL, with the classic name=value combination. We will exploit this, changing the parameter list when loading the page.
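For instance, a view URL with a parameter or quick filter value appended might look like this (the server, workbook and view names are hypothetical):

https://mytableauserver/views/SalesWorkbook/SalesDashboard?Year-Month=2017-03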

Let's say, for example, that we have a date dimension called Year-Month, in the format YYYY-MM, and we want our dashboard to automatically filter to the current month every time we open it. This is something that Tableau currently can't do (you can vote for it in the 'Ideas' section of the Tableau Community). Luckily, JavaScript comes to the rescue.

Add the following lines of code after your iFrame (the additional comments help understand the various steps):

//get the parameters from the URL
var loc = window.location.href;
var params = "";
if(loc.indexOf('?') >= 0) params = loc.split('?')[1];	
	
//get the actual iframe URL
var iframe = document.getElementById('myIframe');
var iframe_url = iframe.src;

//if the date parameter doesn't exist, set the current Year-Month
var n = params.indexOf("Year-Month");
if(n<0){ //if the parameter is not in the URL yet...
    var today = new Date();
    var mm = today.getMonth()+1; //JS months go from 0 to 11
    var yyyy = today.getFullYear();
    if(mm<10) mm="0"+mm;

    //append the new parameter to the other ones (if any)
    var ym = "Year-Month="+yyyy+"-"+mm;
    if (params.length > 0) params=params+'&'+ym;
    else params = ym;
}

//set the new iframe URL, appending the parameters after a '?' (or restore the original one if there are none)
if (params.length > 0) iframe.src = iframe_url+'?'+params;
else iframe.src = iframe_url;

That's it! After clicking on Insert, your dashboard will automatically filter to the current month. Pretty cool!

 

4. Next steps

This is a pretty basic but highly useful implementation of Tableau embedding. We used a date in our example, but you can easily set whatever parameter you want. You can also set up a URL action in your dashboard that points to a SharePoint page, and then write JavaScript code that processes the various parameters, in order to apply the filters automatically when the dashboard is loaded on that page.

For an even better use of the tool, don’t forget to check out the Tableau APIs:

JavaScript API
REST API

Contact us if you would like to hear more about the topic or need any help embedding your Tableau dashboards into SharePoint.

 

Advanced analytics: Tableau + Python


Tableau has released TabPy, the Tableau Python Server, an API that enables Python code evaluation within Tableau. Thanks to TabPy, you can create calculated fields using Python code in Tableau 10.2. As significant as the R connection introduced with Tableau 8.1, this means Python lovers can now leverage all the power of advanced analytics and visualize the results. Python is a widely used general-purpose programming language, and a large number of Python libraries are available for statistical analysis, predictive modelling and machine learning. Tableau and Python are a perfect match: a dream team for any data scientist.

Here at ClearPeaks we are convinced that connecting TabPy with Tableau is one of the best approaches for predictive analytics. So how does this new feature work? We are already working with this technology, and are proud to offer a demonstration of Python code running within a Tableau environment.

In this article you will learn how to connect Tableau with TabPy; just follow the example to see how to use this new feature and you´ll be ready to start your own journey in this new advanced data analytics environment.

Prerequisites:

Tableau Desktop 10.2
TabPy (Tableau Python Server)
Anaconda Navigator

 

1. Installing and setting up Anaconda Navigator

Anaconda is the leading open data science platform powered by Python; its open source version is a high-performance distribution of Python and R, and includes over 100 of the most popular Python, R and Scala packages for data science.

Download Anaconda Navigator, Python 2.7 and 64-bit version here.

TabPy is only supported in a Python 2.7 environment.

Install Anaconda Navigator with the default options in the installation set-up.


Figure 1: Anaconda Navigator installation

Once Anaconda Navigator has been installed, we need to create a new environment:

Click on “Create” and type “Tableau-Python-Server” in the environment name, then click “Create”.


Figure 2: Creating a new environment in Anaconda

This new environment will be used when connecting Tableau with TabPy. We also need some packages in order to perform mathematical calculations, apply machine learning algorithms, plot, define dataframes, create predictive models and edit Python code. We're going to install the required packages: "numpy", "scikit-learn", "matplotlib", "pandas", "statsmodels" and "ipython"; search for them in the "Not Installed" window and install them all (or use the command line, as sketched after Figure 3).


Figure 3: Installing Python Packages
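If you prefer the command line to the Navigator GUI, the same environment and packages can be created from the Anaconda Prompt; a sketch, using the names above:

conda create -n Tableau-Python-Server python=2.7
conda install -n Tableau-Python-Server numpy scikit-learn matplotlib pandas statsmodels ipython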

You can also download “Spyder”, a powerful Python IDE with advanced editing, interactive testing, debugging and introspection features, to try your Python code before using it in Tableau calculated fields.

Go to the Home tab of Anaconda Navigator, and select the created environment “Tableau-Python-Server”; you can install “Spyder” from there.


Figure 4: Anaconda Navigator

 

2. Installing TabPy

Running Python code within a Tableau workbook requires a Python server to execute it, and the TabPy framework gets the job done. Download TabPy from GitHub at the following link and decompress the file.


Figure 5: Download Tabpy

At the Windows command prompt, activate the environment:

activate Tableau-Python-Server

Then, in the TabPy folder, run the following commands to install the TabPy client and server, as well as the package dependencies:

pip install -r ./tabpy-server/requirements.txt

pip install ./tabpy-client

pip install ./tabpy-server

As the packages are installed, you will see the install locations listed in the command line. These might look like /Users/username/anaconda/envs/Tableau-Python-Server/lib/python2.7/site-packages or \Users\username\anaconda\envs\Tableau-Python-Server\lib\site-packages depending on your environment.

Navigate to the tabpy_server folder under site-packages and run startup.bat 9001; 9001 is the port used to connect Tableau with the TabPy server.
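For example, assuming the install location listed above (the exact path is an assumption, so adjust it to your environment):

:: Hypothetical path - use the site-packages location reported by pip
cd \Users\username\anaconda\envs\Tableau-Python-Server\lib\site-packages\tabpy_server
startup.bat 9001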

 

3. Connecting Tableau with TabPy

The next step is to connect Tableau 10.2 with TabPy. This can be done in Help > Settings and Performance > Manage External Service Connection:


Figure 6: Connecting Tableau + Python Server

Test the connection and you should get a "successfully connected" message. Now you can check out an example of predictive analysis.

 

4. Using Tableau + Python for predictive analytics

In this section, you can follow through an example of how to visualize a Time Series prediction with Tableau.


We're going to use a dataset called AirPassengers, containing historic data on the number of passengers per month for an airline. First, we open a new Tableau workbook and connect to the AirPassengers.csv dataset; then we create all the calculated fields and parameters:


Figure 7: Calculated fields and Dimensions

Define the parameters:

We can create parameters to choose the number of autoregressive terms, the number of moving average terms, and the seasonal differencing order for our ARIMA time series model.


Figure 8: Autoregressive terms


Figure 9: Moving average terms


Figure 10: Seasonal differencing order


Figure 11: Months to forecast

Time dimensions need to be formatted.


Figures 12 and 13: Date parsing and Date shift

Finally, we have to create the calculated fields. The measure "#Passengers" is defined as an integer; we should format this field as float (decimal number) so that we can combine the axis of this measure with the predicted outcomes.


Figures 14 and 15: Synchronize the number of passengers with the corresponding month, and differentiate forecast values from predicted past values

Now we're going to embed Python code in the new calculated fields. The code embedded in the calculated field "Time Series" returns the predicted values of an ARIMA(p, d, q) model, where p = [AR (Time lag)], d = [I Seasonal Difference] and q = [MA (Moving Average)], using the parameters we created above. Find the Python code below:

SCRIPT_REAL(
'
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
dates = _arg1
passeng = _arg2
order_arima = min(_arg3)
seasonal_diff = min (_arg4)
ma_param = min (_arg5)
months_forecast = min(_arg6)
ts = pd.DataFrame({"dates": dates,"Passengers": passeng})
ts["Passengers"] = ts["Passengers"].astype("float64")
ts = ts.set_index(["dates"])
ts_log = np.log(ts)
ts_log.index = pd.to_datetime(ts_log.index)
ts_log_diff = ts_log - ts_log.shift()
ts_log_diff["Passengers"][0] = 0
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(ts_log_diff, order=(order_arima, seasonal_diff, ma_param))
results_ARIMA = model.fit(disp=-1)
predictions_value = results_ARIMA.forecast(months_forecast)[0]
from dateutil.relativedelta import relativedelta
add_month = relativedelta(months=1)
predictions_dates = list()
for i in range(months_forecast):
    predictions_dates.append(results_ARIMA.fittedvalues.index[-1] + ((i+1)*add_month))
forecast_log_diff = pd.Series(predictions_value, index=predictions_dates)
predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()
predictions_ARIMA_log = pd.Series(np.asscalar(ts_log.ix[0]), index=ts_log.index)
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum,fill_value=0)
predictions_ARIMA = np.exp(predictions_ARIMA_log)
forecast_log_diff_ARIMA = pd.Series(forecast_log_diff, copy=True)
forecast_ARIMA_log_diff_cumsum = forecast_log_diff_ARIMA.cumsum()
forecast_ARIMA_log = pd.Series(np.asscalar(ts_log.ix[-1]), index=forecast_log_diff_ARIMA.index)
forecast_ARIMA_log = forecast_ARIMA_log.add(forecast_ARIMA_log_diff_cumsum,fill_value=0)
forecast_ARIMA = np.exp(forecast_ARIMA_log)
return list(predictions_ARIMA)
',
ATTR([Month]), ATTR([#Passengers]), MIN([AR (Time lag)]), MIN([I Seasonal Difference]), 
MIN([MA (Moving average)]), MIN([Months Forecast]))

For the forecasting part, create the calculated field “Forecast”:

SCRIPT_REAL(
'
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
dates = _arg1
passeng = _arg2
order_arima = min(_arg3)
seasonal_diff = min (_arg4)
ma_param = min (_arg5)
months_forecast = min(_arg6)
ts = pd.DataFrame({"dates": dates,"Passengers": passeng})
ts["Passengers"] = ts["Passengers"].astype("float64")
ts = ts.set_index(["dates"])
ts_log = np.log(ts)
ts_log.index = pd.to_datetime(ts_log.index)
ts_log_diff = ts_log - ts_log.shift()
ts_log_diff["Passengers"][0] = 0
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(ts_log_diff, order=(order_arima, seasonal_diff, ma_param))
results_ARIMA = model.fit(disp=-1)
predictions_value = results_ARIMA.forecast(months_forecast)[0]
from dateutil.relativedelta import relativedelta
add_month = relativedelta(months=1)
predictions_dates = list()
for i in range(months_forecast):
    predictions_dates.append(results_ARIMA.fittedvalues.index[-1] + ((i+1)*add_month))
forecast_log_diff = pd.Series(predictions_value, index=predictions_dates)
predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()
predictions_ARIMA_log = pd.Series(np.asscalar(ts_log.ix[0]), index=ts_log.index)
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum,fill_value=0)
predictions_ARIMA = np.exp(predictions_ARIMA_log)
forecast_log_diff_ARIMA = pd.Series(forecast_log_diff, copy=True)
forecast_ARIMA_log_diff_cumsum = forecast_log_diff_ARIMA.cumsum()
forecast_ARIMA_log = pd.Series(np.asscalar(ts_log.ix[-1]), index=forecast_log_diff_ARIMA.index)
forecast_ARIMA_log = forecast_ARIMA_log.add(forecast_ARIMA_log_diff_cumsum,fill_value=0)
forecast_ARIMA = np.exp(forecast_ARIMA_log)
forecast_ARIMA_2 = predictions_ARIMA.append(forecast_ARIMA)
forecast_ARIMA_2 = forecast_ARIMA_2[len(forecast_ARIMA):]
return list(forecast_ARIMA_2)
',
ATTR([Month]), ATTR([#Passengers]), MIN([AR (Time lag)]), MIN([I Seasonal Difference]),
MIN([MA (Moving average)]), MIN([Months Forecast]))

We have also included the sum of squared errors calculated field so that it is possible to compare different models.

Here’s the code:

SCRIPT_REAL(
'
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
dates = _arg1
passeng = _arg2
order_arima = min(_arg3)
seasonal_diff = min (_arg4)
ma_param = min (_arg5)
months_forecast = min(_arg6)
ts = pd.DataFrame({"dates": dates,"Passengers": passeng})
ts["Passengers"] = ts["Passengers"].astype("float64")
ts = ts.set_index(["dates"])
ts_log = np.log(ts)
ts_log.index = pd.to_datetime(ts_log.index)
ts_log_diff = ts_log - ts_log.shift()
ts_log_diff["Passengers"][0] = 0
from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(ts_log_diff, order=(order_arima, seasonal_diff, ma_param))
results_ARIMA = model.fit(disp=-1)
predictions_value = results_ARIMA.forecast(months_forecast)[0]
from dateutil.relativedelta import relativedelta
add_month = relativedelta(months=1)
predictions_dates = list()
for i in range(months_forecast):
    predictions_dates.append(results_ARIMA.fittedvalues.index[-1] + ((i+1)*add_month))
forecast_log_diff = pd.Series(predictions_value, index=predictions_dates)
predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()
predictions_ARIMA_log = pd.Series(np.asscalar(ts_log.ix[0]), index=ts_log.index)
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum,fill_value=0)
predictions_ARIMA = np.exp(predictions_ARIMA_log)
forecast_log_diff_ARIMA = pd.Series(forecast_log_diff, copy=True)
forecast_ARIMA_log_diff_cumsum = forecast_log_diff_ARIMA.cumsum()
forecast_ARIMA_log = pd.Series(np.asscalar(ts_log.ix[-1]), index=forecast_log_diff_ARIMA.index)
forecast_ARIMA_log = forecast_ARIMA_log.add(forecast_ARIMA_log_diff_cumsum,fill_value=0)
forecast_ARIMA = np.exp(forecast_ARIMA_log)
predictions_ARIMA_dataframe = pd.DataFrame(predictions_ARIMA)
SSE = sum((ts.ix[:,0]-predictions_ARIMA_dataframe.ix[:,0])**2)
return SSE
',
ATTR([Month]), ATTR([#Passengers]), MIN([AR (Time lag)]), MIN([I Seasonal Difference]),
MIN([MA (Moving average)]), MIN([Months Forecast]))

Finally, we're going to design the view by dragging "Forecast date" (at month level) to Columns, and "Number of Passengers" and "Forecast" to Rows. Convert the plot to a dual axis and then synchronize the axes, obtaining this view:


Figure 16: Tableau + Python Predictive analysis view

Here you can see the real values as bars, the predicted values as a blue line and the forecast values as an orange line; you can play with the different parameters to forecast more months or to create a new ARIMA(p,d,q) model. Notice that the title is also dynamic; you can achieve this with:


Figure 17: Dynamic plot title

As you can see, it’s not the best model ever, so you can take it as a challenge to improve it!

 

Conclusion

To summarize, we have seen how to use Python code within Tableau, a new feature which gives us the opportunity to implement advanced analytics on our data; an example of Time Series prediction is included in this article. We can see a clear advantage of this approach compared to the Tableau built-in tool set: the adjustability of the model parameters.

We hope you enjoyed this article and please leave a comment if you have any doubts, suggestions or opinions.

Stay tuned for future posts!

Click here if you would like to know more about the topic.

Data Quality with EDQ – Part 3: Data Deduplication


 

In our previous article about Data Quality we reviewed some of EDQ's basic cleansing and standardization capabilities.

In this post, which is the last in the series, we are reviewing the basic data deduplication capabilities of EDQ.

Introduction

The first step in a data deduplication or consolidation process flow is record matching: the process of finding duplicate records in the data, that is, records which may relate to a single real-world entity. This is not a trivial task, since we need to identify duplicate records despite different formats and conventions, typos, missing data, etc.

EDQ comes with some built-in transformations to match records; their names are self-explanatory in most cases:

Advanced Match: gives control over all the configuration options
Match Entities
Match Households
Match Individuals (Name, Address, DoB)
Match Individuals (Name, Address)

There are also processors for a variety of different matching scenarios:

Consolidate: matches and merges multiple datasets
Deduplicate: matches a single dataset
Enhance: adds information from one or more reference datasets
Group & Merge: gives a simple merge based on exact match only
Link: finds matches between datasets without merging the data

The matching processors execute a series of ordered sub-processors where we can configure the logic to be used in the matching and the decision rules for candidate duplicate records, among other things. The sub-processors are:

Input: select the attributes from the data stream included in the matching process
Identify: create identifiers to use in matching, i.e., what data will be used to identify records, and map the identifiers to attributes
Cluster:  divide the data streams into clusters; this can be used to reduce the number of comparisons needed and thus the time necessary to run the process
Match: choose which comparisons to perform, and how to interpret them with match rules
Merge: use rules to merge matching records, to create a ‘best’ set of output records (optional)

In the following section we are going to run through an example of data deduplication using EDQ.

 

1. Deduplicating Data using EDQ

Let’s imagine that we have a dataset with a list of people, with many fields containing personal information, including full name, date of birth, and address. We want to see if we have duplicate persons in the table, and to do this, we are going to use EDQ to find matching records based on the full name, address, and date of birth.

For this specific case, the best processor to use is Match Individuals (Name, Address, DoB). We’re going to add this processor to a new process, connecting all the fields available in the dataset as identifiers:


Figure 1: Process for data standardization and deduplication in EDQ. The steps Create GivenName, Create FullName, and Create WholeAddress are concatenations of strings to form the full name in a single string and the whole address in another string.


Figure 2: Matching identifiers of the Match Individuals processor in EDQ.

We can run the process with this configuration to review the results:


Figure 3: Running the Match Individuals processor in EDQ.

In the new window that opens we can review the matching records, set decisions for actions for each individual case, and even add comments:


Figure 4: Results of the Matching Individuals processor in EDQ. We can review and set a decision for each pair of duplicate candidates individually.

These results can also be exported to Excel for further investigation.

Now we’re going to merge the duplicated records using the Merge sub-processor of the Match Individuals processor; we simply double-click on the Merge sub-processor and check the option ‘Output unrelated records’.

We can write the deduplicated records into a new dataset; to do so, we have to add a new Write processor connected to the Merged output of the Match Individuals processor:


Figure 5:  The data standardization and deduplication process in EDQ, with a Writer processor to export the results to a new dataset.

The records will be written into a new staged dataset called Deduplicated Customers:


Figure 6: Writing the results of the process into a new dataset in EDQ.


Figure 7: New staged data created from the results of the process of deduplication in EDQ.

After running the process, we can see that the new dataset with the deduplicated customers has been created in Staged Data:


Figure 8: Dataset created from the standardization and deduplication process in EDQ.

And if we right-click on the Writer processor and select ‘Show results in new window’, we can see that some of the records have a MatchGroupSize field equal to 2, which means that these records come from the merge of two records.

 

Conclusion

Data deduplication is a complex task, one of the most complex manual tasks in the quality assurance process, where tools like EDQ are extremely handy. With EDQ we were able to find matching records using complex rules and to merge the records belonging to a single entity, all in a few clicks. The ability to review each pair of duplicate candidates individually and to decide case-by-case proved to be really helpful in reducing the risk of merging records corresponding to different real-world entities in the deduplication process.

You can try this and other examples of EDQ usage by downloading the latest virtual machine from Oracle, available here.

This concludes our series of blog articles about Oracle EDQ.
You can find the previous posts in the links below:

Part 1: Introduction
Part 2: Data Profiling with EDQ
Part 3: Data Standardization with EDQ

Have you tried EDQ in any of your projects? What was your experience using EDQ like? If you have anything you’d like to share, or if there are any questions you’d like to raise, please leave your comments below!

Click here if you would like to know more about the Data Quality Services we offer!

 

Data Quality with EDQ – Part 2: Data Standardization


 

In the first post of the Data Quality Series we introduced the main concepts of data quality. In the second article we explained how Oracle Enterprise Data Quality (EDQ) can help us profile our data.

The focus of this post is to review some of EDQ's basic cleansing and standardization capabilities.

Introduction

EDQ provides a number of built-in transformations for data cleansing, and some of the most commonly used transformations are:

Character Replace: replaces characters according to a map
Denoise: removes noise characters from text attributes
Lower Case: converts string values to lower case
Upper Case: converts string values to upper case
Proper Case: converts string values to proper case
Normalize No Data: normalizes attributes with no data to nulls
Normalize Whitespace: removes leading and trailing whitespace, and normalizes inter-word whitespace to a single space
RegEx Replace: replaces a string value matching a regular expression with a given value, or part of the matching expression
Replace: replaces values using a map

EDQ also comes with a series of built-in processors for data type conversion that make the normalization of types in our data really easy. There are also transformations that can be used to enhance our data, for example by adding dates, concatenating or parsing fields, or adding initials from text values.

Maps can be very useful for data standardization; by keeping all the possible values representing an entity together with the corresponding standard value for that entity in map tables, we can standardize the data using EDQ in just a couple of simple steps. In the following example, we will see how we can improve the quality of two common data fields containing people information: Country and Gender.

 

1. Data Standardization with EDQ

Let’s suppose that we have a dataset with a Country field. The table should contain data mostly for the United States; however, a quick Frequency profile shows the following results:


Figure 1: Frequency profile of the Country field. We can see that some countries are represented by more than one value in the database.

We can see that we have very bad data quality for this field: the values ‘USA’, ‘US’, ‘U.S.A’, ‘United States’ and ‘U.S’ all belong to the single entity that we would like to call ‘United States of America’. Moreover, there are two different values for Canada (‘Canada’ and ‘CAN’), and also some null values. With EDQ, we can easily fix issues like this using mapping tables.

The first transformation that we need is the Normalize Whitespace processor, which removes leading and trailing whitespace, and normalizes inter-word spaces to a single space. Simply create a new process, add the transformation, connect the Reader output to the normalizer input, and select the fields that need whitespace normalization:


Figure 2: Normalize Whitespace processor in EDQ. Before processing text fields for standardization, it is recommended to remove leading and trailing spaces and to normalize inter-word spaces to a single space.

We will generate country values for the null records from another field containing the city, using an external mapping called City to Country Mappings. This table maps cities to their corresponding countries:


Figure 3: City to Country Mappings

To use it in our project, we just need to add it to the Reference Data section:


Figure 4: Adding Reference Data

We will use the mapping to generate the name of the country in a new column called “Derived Country” for each city in our dataset. To do this, we add the processor Enhance from Map, connected to the output of the Normalize Whitespace processor, using City as the field to match, and the added mapping:


Figure 5: Process for normalizing whitespace and populating the Country field in EDQ


Figure 6: Enhance from Map processor in EDQ. We will populate the “Country” field using the “City to Country” mapping, matching by the field “City”.

After running this process, we can see in the Enhance from Map processor results that we were able to enhance 3338 records:


Figure 7: Results of the country enhancement using the City to Country Mapping in EDQ

However, 2100 records remain with an unenhanced country, so we still have work to do on this field to make it fit for use. The next step is to normalize the country name, using another mapping called Country Variants. This mapping contains the most common variants of 'United States of America', plus variants for Canada:


Figure 8: Country Variants mapping. The Standardized Country column contains the standard that we want to use for each real world entity, in this case countries.

To normalize these different variants, we add a Replace processor to the process, calling it Standardize Country:


Figure 9: Country Standardization process in EDQ.

In the properties of the transformation, we need to define Country as the input field (this is the field that we are standardizing), and we have to specify the replacements in the Options tab. We will use the mentioned Country Variants mapping, containing the replacements.


Figure 10: Replace processor in EDQ. The processor will replace the original value of the Country field with the standard from the Country Variants mapping.

Finally, we simply add a Merge Attributes processor to merge the country derived from the city and the standardized country into a new column, ‘Best Country’, and a Frequency Profile processor to check the results:


Figure 11: Complete Country Standardization process in EDQ. The Merge Attributes processor, called “Create Best Country Attribute”, merges the values of the country derived from the city and the country obtained after standardization using the Country Variants mapping.

The transformation will merge the derived country with the standardized country into a new column called “Best Country”:


Figure 12: Merging attributes in EDQ.

With a Frequency profiler we can analyse the results of the new field:


Figure 13: Frequency profiler to verify the results of the standardization of the Country field in EDQ.

We can see how the quality of the field has improved thanks to the transformations applied:


Figure 14: Country Frequency profiler before standardization in EDQ, showing null values and many values for the same countries.

 


Figure 15: Country Frequency profiler after standardization in EDQ.

We were able to eliminate all the variants of United States of America, and we also identified other countries that were not appearing correctly before the standardization process.

Now let's suppose we have a dataset of people that includes a Gender field, which should contain either 'M' or 'F' (Male or Female) for each record. However, a quick profile of the field shows the following results:


Figure 16: Gender frequency profiling in EDQ.

We can see that we have a lot of null values, and some records with 'U' in the gender field. We're going to improve the quality of the field with EDQ by trying to generate the gender from the title and first name fields, and then merging this generated gender with the Gender field available in the data to get the best result.

We’ll begin by adding an Unstructured Name Parser. We will use it to parse the name field, so that will be the input of the transformation, specified in the Input sub-processor of the Parser processor:


Figure 17: Parsing text fields with EDQ.


Figure 18: Input sub-processor of the Unstructured Name Parser processor in EDQ. The input is specified here.

The name will be mapped to the only attribute of the processor called “FullName”:


Figure 19: Map sub-processor of the Unstructured Name Parser processor in EDQ. The FullName attribute is mapped to the input field Name.

After running the process, we can see in the parser results all the steps taken and all the rules created in order to parse the name field into three new fields for each record: P1_Prefix with the title, P1_First with the first name, and P1_Last with the last name:


Figure 20: Results of the Unstructured Name Parser, with the fuzzy rules created to parse the string containing the name into three separate fields for the title, first name, and last name.

Now we can use these fields to derive the gender of each person in the dataset. To do this, we add the built-in transformation Add Gender to the process, and join it to the pass output of the parser transformation:


Figure 21: Enhancing a Gender field in EDQ. The process contains a parser transformation and EDQ's built-in transformation for adding the gender.

To derive the gender, we need to specify the fields that contain the title and the first name in the processor configuration, as well as the stated gender, which the processor will use in combination with the derived gender to create the output “Best Gender”.


Figure 22: Enhancing a Gender field in EDQ. The Add Gender processor requires three attributes: title, first name, and stated gender.

If we run the process now, we can see how we have improved the quality of this field in the results of the Add Gender processor:


Figure 23: Results of the Gender standardization in EDQ.

Now there are only 6 records with missing gender, and we have reduced the percentage of missing gender records from 19.5% to 0.1%.

 

Conclusion

EDQ provides some solid features that make data cleansing and standardization tasks much easier: we can apply built-in transformations, use external maps, and apply business rules to standardize, correct wrong values, normalize fields, and enhance our data.

You can try these and other examples of using EDQ for standardization by downloading the latest EDQ virtual machine and hands-on lab from the following links:

Latest EDQ virtual machine from Oracle
Getting Started Hands-On Lab

The last article in the series reviews the basic data deduplication capabilities of EDQ.


Click here if you would like to know more about the Data Quality Services we offer!
