BI system on Amazon Cloud | Amazon Web Services


Introduction

The purpose of this blog article is to explain how to configure a BI system in the cloud using Amazon Web Services (AWS). Our system will include an ETL server (Pentaho Data Integration, also known as Kettle), a reporting server (Tableau) and a data warehouse (Redshift). Each of these components will be based on one AWS service; these services are detailed below.

Amazon provides a set of web services completely hosted in the cloud under a single account, and these services are easy to manage through the AWS console. The services are paid for on demand, which helps us scale the resources as needed and create a budget plan that can be managed and modified easily. It also gives us the flexibility to add or remove on-demand services.

For billing, AWS also provides a set of dashboards where we can review the detailed costs broken down by service.

Of the wide variety of AWS services, a few are enough to build the infrastructure we need for a BI system completely hosted in the cloud.

In this blog article I will explain the three AWS services we use to create a complete BI system:

  • EC2 (used to host the servers, ETL and reporting)
  • S3 (used to store and send files to Redshift)
  • Redshift (data warehouse)

From the console we can manage all of the web services we have signed up for. In our case we will focus on the following ones:

Picture1

Amazon Web Services:

1. EC2

EC2 is the AWS compute service, used to create the machine instances needed to support our infrastructure. For our BI system we will use 2 instances, one for the ETL server and another for the reporting server.

EC2 is completely dynamic. It allows us to maintain the infrastructure through a simple and intuitive front end, from which we can operate on our instances. Its main features include resizing the resources of an instance on demand: adding more memory, increasing the number of CPUs and attaching new HDD volumes to the instance.


For our BI system we have created 2 Windows machines. Each instance can be selected from a set of preconfigured machines and, once created, some of its properties can be modified as explained above.

Picture2

Figure 1 Creating a new instance

There are different prices and payment methods for the instances. The pricing and the licenses for the different instance types can be reviewed at the links below:

https://aws.amazon.com/ec2/instance-types/

https://aws.amazon.com/ec2/pricing/

 

One of the great features of EC2 is that, with only a little IT knowledge, we can manage the infrastructure ourselves. We can set up our network, connect to the machines using remote desktop, and share files between the instances and our local machines. We can also take snapshots of the volumes and create images of the instances that can be downloaded and deployed on premises.

Regarding network and security configuration, we can assign a static IP to an instance and restrict access so that it is only reachable from certain IPs, which lets us secure the instances.
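The effect of restricting access to certain IPs can be illustrated with a small sketch. This is not how AWS implements its rules internally; it only mimics the idea of an allow-list of source ranges for remote desktop (TCP port 3389). The CIDR ranges and the function name are hypothetical values chosen for the example.

```python
import ipaddress

# Hypothetical allow-list: source ranges permitted to reach the instance
# over remote desktop (TCP 3389). These are documentation-only addresses.
ALLOWED_RDP_SOURCES = ["203.0.113.0/24", "198.51.100.17/32"]


def is_source_allowed(source_ip, allowed_cidrs=ALLOWED_RDP_SOURCES):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

An office network would typically be listed as a range (for example a /24), while a single administrator machine would be listed as a /32.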

Picture3

Figure 2 EC2 Landing page

 

In conclusion, we can use this service to create any kind of instance that fits our needs, and we pay only for the resources we use. It is flexible and can be secured.

For the BI system we want to configure, EC2 will host 2 instances:

  • ETL server running on Windows: this server will be responsible for extracting and transforming the data and sending the generated files to S3. We will use an open source ETL tool, Pentaho Data Integration. The features of this ETL tool can be reviewed at the following link:

http://community.pentaho.com/projects/data-integration/

 

  • Reporting server running on Windows: this server will contain the dashboards and visualizations of the information hosted on Redshift. We will use Tableau as the reporting server. The features of Tableau can be reviewed at the following link:

http://www.tableau.com/products/server

 

2. S3

S3 is one of the AWS storage services. Basically, it is used to store data in a file directory inside a bucket. We will use this service for optimization reasons.

image 7 blur

Figure 3 S3 Buckets

One of the bottlenecks that can appear in a BI system is loading data into the database tables of the data warehouse. As these tables tend to be very large, we usually want to bulk load them, and with the Redshift and S3 tandem this can be done very efficiently.

Once we have configured our bucket and assigned a user to it, we can send files to the S3 bucket, given its URL, using the AWS command line interface (AWS CLI). This improves the performance of the table loads, as the files on S3 can be bulk loaded into tables very efficiently.

The service also allows us to secure the files, add encryption and use some other interesting features.
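As a minimal sketch of the upload step, the helper below builds the argument list for the AWS CLI `aws s3 cp` command. The bucket name, key prefix and local path are hypothetical, and the sketch assumes the AWS CLI is installed and already configured with the credentials of the user assigned to the bucket.

```python
from pathlib import PurePosixPath


def build_s3_upload_command(local_path, bucket, key_prefix):
    """Build the argv for `aws s3 cp <local_path> s3://<bucket>/<prefix>/<file>`.

    bucket and key_prefix are hypothetical names chosen by the caller.
    """
    # Normalise Windows separators so the file name can be extracted portably.
    file_name = PurePosixPath(local_path.replace("\\", "/")).name
    destination = f"s3://{bucket}/{key_prefix.strip('/')}/{file_name}"
    return ["aws", "s3", "cp", local_path, destination]
```

On the ETL server the resulting list could be passed to `subprocess.run(command, check=True)` at the end of a job, once the extract file has been written.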

3. Redshift

Redshift completes our BI system. It is a scalable, columnar database service based on PostgreSQL.

The latest visualization tools, such as Tableau, have built-in connectors to access the information. It is easy to connect a database client to Redshift by specifying the URL. Redshift does not support table partitioning or indexing; however, we can set sort and distribution keys on our tables to improve query performance. It also allows table compression by setting the encoding on the columns.
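To make the sort key, distribution key and column encoding options more concrete, here is a small sketch that assembles a Redshift `CREATE TABLE` statement. The table, the columns and the chosen encodings are hypothetical; which keys and encodings are appropriate depends on the actual data and queries.

```python
def build_create_table(table, columns, dist_key, sort_keys):
    """Assemble a Redshift CREATE TABLE statement.

    columns is a list of (name, sql_type, encoding) tuples; all names used
    by callers here are illustrative, not taken from a real schema.
    """
    names = {name for name, _, _ in columns}
    missing = [key for key in [dist_key, *sort_keys] if key not in names]
    if missing:
        raise ValueError(f"key columns not defined in table: {missing}")
    column_lines = [
        f"    {name} {sql_type} ENCODE {encoding}"
        for name, sql_type, encoding in columns
    ]
    return (
        f"CREATE TABLE {table} (\n"
        + ",\n".join(column_lines)
        + "\n)\n"
        + f"DISTKEY ({dist_key})\n"
        + f"SORTKEY ({', '.join(sort_keys)});"
    )
```

For a hypothetical fact table one might distribute on the column most often used in joins and sort on the date column used in range filters.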

As explained above, we will use S3 to load the tables in order to improve performance. To do this, we create a set of files on our ETL server and send them to S3 with the AWS CLI cp command. Once the files are in the bucket, we launch the Redshift COPY command to load the table. The reference for the CLI cp command can be reviewed at the following link:

http://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
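The load step itself is a Redshift `COPY` statement executed from a SQL client. As a hedged sketch, the helper below assembles such a statement for a delimited, optionally gzip-compressed file. The table name, the S3 URI and the IAM role ARN are hypothetical placeholders, and authorizing through an IAM role is an assumption of this example rather than something stated in the article.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, delimiter="|", gzip=True):
    """Assemble a Redshift COPY statement that bulk loads a table from S3.

    All argument values are supplied by the caller; the ones used in the
    examples are placeholders, not real resources.
    """
    parts = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role_arn}'",
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        # Tell Redshift that the input file is gzip-compressed.
        parts.append("GZIP")
    return "\n".join(parts) + ";"
```

The generated text can be pasted into any SQL client connected to the cluster, or executed by the ETL tool as the last step of the job.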

The relationship between S3 and Redshift is tight: we can also issue commands from our SQL client to store extracts from the tables directly into files in an S3 bucket.

Redshift is configured as a cluster of nodes. There are different kinds of nodes, and we choose between them (compute-oriented or storage-oriented) depending on our needs. Once the cluster has been created it can be resized, snapshots of the data can be taken, and storage can scale to petabytes. We can also apply security settings and configure alerts that are delivered to an email inbox.
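In Redshift the command that writes query results to S3 is `UNLOAD`. The sketch below assembles one, doubling any single quotes inside the query because the query is passed to `UNLOAD` as a quoted string. The query, the S3 prefix and the IAM role ARN are hypothetical, and the use of an IAM role for authorization is an assumption of the example.

```python
def build_unload_statement(query, s3_prefix, iam_role_arn, delimiter="|"):
    """Assemble a Redshift UNLOAD statement that exports a query to S3.

    Single quotes inside the query are doubled so the query can be embedded
    in the quoted string that UNLOAD expects.
    """
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"DELIMITER '{delimiter}';"
    )
```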

picture 1 blur

Figure 4 Redshift Cluster configuration and properties

 

Another good feature of Redshift in the management console is the ability to check query status and monitor the resources used by the database, such as disk and CPU usage, query time, etc., as seen in the following figure:

Picture6

Figure 5 Redshift reports

Conclusion

AWS provides a set of on demand services that can be used to create any kind of IT system.

Regarding the benefits of using it to configure a BI system, it provides scalable, high-performance services to create a data warehouse on Redshift, host BI tools on EC2 instances with easy maintenance and security configuration, and transfer data quickly using S3. Working together, these services are a great option to consider for saving time and money on our BI system infrastructure and configuration.

 

Write Back Functionality in OBIEE


About Write Back Functionality:

One of the interesting features that OBIEE provides is the ability for users to add or update data back to the database. The user can have a column for which values can be entered in the user interface (UI) and then written back to the database. This can have multiple benefits: end users may want to rank their customers or rate their regional business based on performance, and be able to use this data from time to time. This turns OBIEE into both a useful reporting tool and a mini application for modifying business data.

Requirements for implementing the functionality:

Implementing write back requires the configuration of multiple objects within the architecture i.e. Database, Connection Pool, Presentation, BMM and Physical Layers, UI privileges, Column/Table properties etc.

Example on implementing the Write back functionality:

Here I am going to demonstrate how to make the Attribute2 column in the Product table (Sample apps) a writeable column.

  • Edit instanceconfig.xml

This is the initial step in enabling Write Back in OBIEE. Open the instance config file from the location: <Middleware>/instances/instance1/config/OracleBIPresentationServicesComponent/coreapplication_obipsn

Under <DSN>, add <LightWriteBack>true</LightWriteBack>

  • Enable Write Back in the Repository tables

Open the RPD in Offline mode. Then expand the Logical table Product in the BMM layer. Double click on the column Attribute2 and in the general tab enable ‘Writeable’.

image 1

In the presentation layer expand the table Product, double click on the column Attribute2, and in Permissions set this column to Read/Write for BI Author.

image 2

  • Setting direct database request permission

In the RPD, go to Manage > Identity > Application Roles > BI Author > Permissions > select Execute Direct Database Requests > select Allow

image 3

  • Disable cache for physical tables

Select the SAMP_PRODUCTS_D table in the physical layer and disable the Cacheable option.

Double click on D2 customer > unselect override source table and cacheable.

image 4

Deploy the modified RPD and restart the BI Presentation services.

  • Grant write back privilege to users

Log on to OBIEE Presentation Services > Administration > Manage Privileges > Write Back, and change the privilege from Denied: Authenticated User to Granted: Authenticated User.

  • Create Analysis for Write Back

Create a new analysis with columns P1 Product and P6 Attribute2. Open the column property of Attribute2, select the Write Back tab and enable it. Save the analysis.

image 5

  • Create write back XML template

Go to <Middleware>/instances/instance1/bifoundation/OracleBIPresentationServicesComponent/coreapplication_obips1/analyticsRes/customMessages

Append the following tags to the Write Back template.xml file (attached Write Back template.xml for reference)

<!-- The name of this web message is the reference for this block in the presentation -->

<WebMessage name="wb_prod_attribute">

<XML>

<!-- Set the connection pool name as in the RPD file -->

<writeBack connectionPool="Sample Relational Connection">

<insert></insert>

<!-- Define the update query and refer to the columns by their position in the answers -->

<update>

UPDATE SAMP_PRODUCTS_D SET ATTRIBUTE_2='@2' WHERE PROD_DSC='@1'

</update>

</writeBack>

</XML>

</WebMessage>
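The positional references @1 and @2 in the update statement stand for the first and second columns of the analysis. The sketch below only illustrates that idea and is not OBIEE's actual implementation: it substitutes the values of one row into the template and runs the result against an in-memory SQLite table that stands in for the real database. The function names are hypothetical.

```python
import re
import sqlite3

# Update statement taken from the write back template above.
UPDATE_TEMPLATE = "UPDATE SAMP_PRODUCTS_D SET ATTRIBUTE_2='@2' WHERE PROD_DSC='@1'"


def render_writeback_sql(template, row_values):
    """Replace each @N with the N-th (1-based) value of the edited row."""
    def substitute(match):
        position = int(match.group(1))
        value = str(row_values[position - 1])
        # Double single quotes so the value stays inside its SQL literal.
        return value.replace("'", "''")
    return re.sub(r"@(\d+)", substitute, template)


def apply_writeback(connection, template, row_values):
    """Run the rendered statement; sqlite3 is a stand-in for the real database."""
    connection.execute(render_writeback_sql(template, row_values))
    connection.commit()
```

With the analysis columns P1 Product and P6 Attribute2, a row edited in the table view would be passed as a two-element list: the product description first and the new attribute value second.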

image 7 image 6

  • Enable Write Back in table view

Open the saved analysis > table view > edit view > Table view property > Write Back tab > Select enable Write Back and provide the name as wb_prod_attribute (Saved WebMessage name in the xml). Save the Analysis.

image 8

With this step, we have completed the configuration of Write Back in OBIEE. Now this should be tested in order to validate the Write Back configuration.

  • Testing the Write Back Option

Open the saved report > Click on Update.

This makes the Attribute2 column writeable. Change the value and click Apply.

image 9

Edit the column to the desired value.

image 10

Click Apply and Done

Now open SQL Developer and check the edited row in the Product table.

SELECT PROD_DSC, ATTRIBUTE_2 FROM SAMP_PRODUCTS_D WHERE PROD_DSC = '7 Megapixel Digital Camera'

image 11

Now we can see that the changes made in the answers are reflected in the DB.
By using this simple technique, OBIEE can act as a front end form for updating data in the database.

Copyright © 2000-2010 ClearPeaks
