Cloud and Big Data

What can Cloud do for BI and Big Data?

This is the second article in our series about what Big Data, Cloud and Advanced Analytics can do for Business Intelligence (BI). In the previous article, “What can Big Data do for BI?”, we explained how Big Data technologies can overcome the limitations of traditional BI: massive datasets, real-time data and diverse data types are no longer problems but opportunities to gain deeper and more accurate business insights.

 

In this article we explain how Cloud computing opens up a whole new range of options for deploying BI and Big Data technologies.

 

1. Cloud overview

 

Cloud has been a buzzword for the last 10 years, and although some initially thought it would never get past the “hype” phase, Cloud technologies have now matured and are widely used both personally and professionally.

 

First, let’s recap the essential characteristics of Cloud computing as defined by NIST (the US National Institute of Standards and Technology): on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service.

 

Cloud computing is offered in three different service models:

 

  • IaaS – Infrastructure as a Service: In this model the Cloud vendor provides access to computing, network and storage resources. Computing resources are offered as virtual machines or dedicated servers; storage is mainly offered as block, file or object storage; and network resources securely interconnect the other resources and provide Internet access. The Cloud provider manages all the underlying infrastructure, including data centre management, energy, connectivity, hardware and virtualization, while the customer controls the operating systems (OSs) and the applications running on the computing resources. Note that the underlying infrastructure is generally shared among customers.
  • PaaS – Platform as a Service: In this model the Cloud vendor manages an extra layer on top of IaaS to provide a platform for the customer to consume. The customer no longer manages the OS and focuses solely on developing, managing and delivering applications.
  • SaaS – Software as a Service: In this model the Cloud vendor manages an extra layer on top of PaaS to deliver software to customers. The entire application stack runs on Cloud resources, and the client can use the software without installing anything on local devices.

 

The following matrix compares the three Cloud service models with an on-premise (bare-metal) deployment. Green bars are layers managed by the customer and grey bars are layers managed by the Cloud provider. As explained above, the closer we move to a SaaS model, the fewer layers the customer has to manage.
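The same comparison can be sketched in code. This is a minimal illustration, assuming the nine-layer stack commonly used in such diagrams (applications down to networking); the layer names and the exact split per model are our assumption rather than a copy of the matrix. In practice SaaS customers still own their data and user access, which this simplification ignores.

```python
# Layers from top (application) to bottom (physical infrastructure).
# This nine-layer breakdown is a common convention, assumed here for illustration.
LAYERS = [
    "Applications", "Data", "Runtime", "Middleware", "OS",
    "Virtualization", "Servers", "Storage", "Networking",
]

# How many of the top layers the customer still manages under each model.
CUSTOMER_MANAGED_TOP_LAYERS = {
    "On-premise": 9,  # customer manages the whole stack
    "IaaS": 5,        # provider handles virtualization and hardware
    "PaaS": 2,        # customer manages only applications and data
    "SaaS": 0,        # provider runs the entire application stack
}


def responsibilities(model: str) -> dict[str, list[str]]:
    """Split the stack into customer-managed and provider-managed layers."""
    n = CUSTOMER_MANAGED_TOP_LAYERS[model]
    return {"customer": LAYERS[:n], "provider": LAYERS[n:]}


if __name__ == "__main__":
    for model in CUSTOMER_MANAGED_TOP_LAYERS:
        split = responsibilities(model)
        print(f"{model:<11} customer: {len(split['customer'])}  "
              f"provider: {len(split['provider'])}")
```

Moving from on-premise towards SaaS, the customer-managed slice shrinks while the provider-managed slice grows, which is the trend the matrix illustrates.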

 

 

It is important to bear in mind that as we move towards a SaaS model, we get less management and less complexity, but also less room for customization. Finding the right balance between management effort and customization is key to a successful Cloud transition.

 

2. Cloud benefits

 

Next, let’s look at the main benefits of Cloud technologies, bearing in mind that they may vary depending on the customer and the platform. Before moving to the Cloud, it is advisable to carry out a tailored analysis to identify the cost and functionality benefits you can expect:

  • Agility: The platform can easily be scaled up or down according to requirements.
  • Reliability: Cloud platforms are generally more reliable and offer higher Service-Level Agreements (SLAs) than most in-house platforms, and some Cloud services include built-in High Availability (HA).
  • Cost: With pay-as-you-go pricing you only pay for the time resources are actually in use, which can reduce the overall cost of the platform.
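To make the cost argument concrete, here is a minimal sketch comparing an always-on platform with one that only runs when needed. The hourly rate and usage pattern are hypothetical, and the sketch assumes the same hourly rate in both cases; real providers also offer discounted reserved or committed capacity, which can change the balance for platforms that genuinely run 24/7.

```python
# Average hours in a month (24 h x 365 days / 12 months = 730 h).
HOURS_PER_MONTH = 24 * 365 / 12


def monthly_cost(hourly_rate: float, hours: float) -> float:
    """Pay-as-you-go cost: you are billed only for the hours resources run."""
    return hourly_rate * hours


def compare_usage(hourly_rate: float, hours_per_day: float, days_per_month: int):
    """Compare an always-on platform with one started only when needed.

    Returns (always_on_cost, on_demand_cost, fraction_saved).
    """
    always_on = monthly_cost(hourly_rate, HOURS_PER_MONTH)
    on_demand = monthly_cost(hourly_rate, hours_per_day * days_per_month)
    return always_on, on_demand, 1 - on_demand / always_on


if __name__ == "__main__":
    # Hypothetical figures: a cluster at 2.0 currency units per hour,
    # used 6 hours per working day, 22 working days per month.
    always_on, on_demand, saving = compare_usage(2.0, 6, 22)
    print(f"always-on: {always_on:.2f}  on-demand: {on_demand:.2f}  saving: {saving:.0%}")
```

The saving depends only on the ratio of hours used to hours in the month, which is why workloads that are not 24/7 benefit most from this pricing model.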

 

3. Cloud, Business Intelligence and Big Data

 

After this brief introduction to Cloud technologies, let’s analyse how they can impact BI and Big Data. In our previous blog article, “What can Big Data do for BI?”, we reviewed the traditional BI architecture and discussed how to upgrade it with Big Data technologies: building a Big Data platform based on Hadoop and upgrading the Data Warehouse (DWH) with the latest database developments. Figures 1 and 2 depict the traditional BI architecture and the BI with Big Data architecture respectively.

Figure 1: Traditional BI architecture

 

Figure 2: BI with Big Data architecture

 

The main impact of Cloud technologies on BI and Big Data is that there are now many more deployment options for the components of either architecture, and different components can be deployed using different approaches.

 

When deploying a BI architecture, with or without Big Data, we need to decide on several deployment aspects: which software stack to use, and whether to go for an on-premise, Cloud or hybrid deployment. For an on-premise deployment we need to choose the hardware vendor; likewise, for a Cloud deployment we need to choose the Cloud provider and decide whether to use IaaS or PaaS services for each component. On top of all that, we also need to size the components properly.

 

There are so many possible deployment combinations that we cannot cover them all in a single article, so here we will use a fictitious case study.
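To see how quickly the combinations multiply, the sketch below enumerates deployment options for each component independently, so hybrid setups appear naturally. The vendor lists are hypothetical placeholders, and software stack choice and sizing are deliberately left out; adding them would multiply the count further.

```python
from itertools import product

# Hypothetical option lists, for illustration only.
HARDWARE_VENDORS = ["vendor_a", "vendor_b"]
CLOUD_PROVIDERS = ["AWS", "Azure", "Google Cloud"]
SERVICE_MODELS = ["IaaS", "PaaS"]


def component_options() -> list[tuple[str, str, str]]:
    """All ways to deploy a single component: (location, vendor, service model)."""
    on_premise = [("on-premise", vendor, "-") for vendor in HARDWARE_VENDORS]
    cloud = [("cloud", provider, model)
             for provider, model in product(CLOUD_PROVIDERS, SERVICE_MODELS)]
    return on_premise + cloud


def deployment_combinations(components: list[str]) -> list[dict]:
    """Every way to deploy each component independently (hybrid included)."""
    options = component_options()
    return [dict(zip(components, choice))
            for choice in product(options, repeat=len(components))]


if __name__ == "__main__":
    combos = deployment_combinations(["Big Data platform", "Data Warehouse"])
    print(len(combos))  # 8 options per component -> 8 ** 2 = 64
```

With n components and k options each, there are k to the power of n combinations, so even this reduced example grows exponentially as components are added.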

 

4. Fictitious case study

 

4.1. Introduction 

 

Let’s suppose a fictitious company has a BI platform based on the traditional BI architecture (Figure 1), fully deployed on-premise on dedicated hardware. After reviewing the long-term BI strategy with ClearPeaks, the company’s executives decide to upgrade the BI platform with Big Data technologies (Figure 2). The company is also keen on leveraging Cloud technologies and, of course, ClearPeaks is guiding it through both the upgrade process and its Cloud journey.

After reviewing the BI strategy, gathering requirements and holding the relevant discussions with the IT department, ClearPeaks has recommended the following deployment approach:

  • The Big Data platform will be deployed on a fresh installation in the Cloud.
  • The existing Data Warehouse, which is currently running on on-premise hardware, will be migrated and upgraded to the Cloud.

 

This example covers two typical situations that most companies face today: (i) deploying a component or an entire platform in the Cloud from scratch, and (ii) migrating and upgrading a component or an entire platform from an on-premise deployment to the Cloud.

 

4.2. Big Data platform Cloud deployment

 

Let’s review what our options are when deploying a fresh installation of a Big Data platform in the Cloud.

 

  • IaaS approach: The Cloud vendor provides the virtual machines (VMs), storage and connectivity, so we do not need to deal with physical hardware installation, cabling, etc. However, we are in charge of setting up the OS, installing the applications, configuring HA, etc. This approach gives us great flexibility and control over the platform at the cost of considerable administration effort. We can choose our preferred Hadoop distribution and add any Hadoop ecosystem tool we require, including real-time and NoSQL tools, and we can customize the number of master and worker nodes and the resources allocated to each node (CPUs, RAM, storage).
  • PaaS approach: The Cloud vendor deploys and configures the services and manages the underlying resources transparently for the user. We only have to specify the amount of resources required, and we no longer need to deal with software installation. As mentioned at the beginning of the article, the more managed the model, the less customization we have. A Big Data PaaS service usually consists of a pre-configured Hadoop-based cluster with a set of preloaded tools, and in most cases we cannot add tools beyond those offered. Moreover, some capabilities are provided by separate services: with Azure HDInsight and Amazon EMR, for example, the storage layer is usually handled by a separate service such as Azure Data Lake Store or Amazon S3, and real-time pipelines or NoSQL workloads likewise require additional services. It is also possible to build platforms that combine IaaS and PaaS services: for example, we could deploy a Hadoop cluster following an IaaS approach and add storage based on PaaS services such as the aforementioned Azure Data Lake Store or Amazon S3.

 

Which approach to recommend depends on many factors: can we cover the functionality we need with the available PaaS services? Do we prioritize reducing (or even eliminating) administration effort over greater flexibility and full control of the platform? Are our workloads non-recurrent, or our computing needs not 24/7?

 

If the answer to all these questions is yes, then we will most likely recommend a PaaS approach. In other cases, an IaaS approach may be more suitable.
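The rule of thumb above can be written down as a small decision helper. This is a minimal sketch: the class and function names are hypothetical, and the output is a starting point for the discussion rather than a verdict, since a real assessment weighs many more factors.

```python
from dataclasses import dataclass


@dataclass
class BigDataRequirements:
    paas_covers_functionality: bool      # can available PaaS services do what we need?
    prioritize_low_administration: bool  # less admin effort over flexibility/control?
    intermittent_workloads: bool         # non-recurrent or not-24/7 computing needs?


def recommend_deployment(req: BigDataRequirements) -> str:
    """PaaS only if every answer is yes; otherwise IaaS may be more suitable."""
    answers = (req.paas_covers_functionality,
               req.prioritize_low_administration,
               req.intermittent_workloads)
    return "PaaS" if all(answers) else "IaaS"


if __name__ == "__main__":
    print(recommend_deployment(BigDataRequirements(True, True, True)))
    print(recommend_deployment(BigDataRequirements(True, False, True)))
```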

 

4.3. Data Warehouse Cloud migration

 

In the previous section we reviewed our options for a fresh deployment in the Cloud. In this section we focus on migrating an existing platform, in this case the DWH, from on-premise to the Cloud. As described in a Forbes article, there are different strategies we can follow when migrating an existing platform to the Cloud.

 

  • Re-host or Lift and Shift: We use IaaS services to deploy the same platform and applications we have on-premise, but run them in the Cloud. Existing developments only require minor fine-tuning, making this the most straightforward strategy. In the context of the Data Warehouse, we would create a set of IaaS VMs, install the same software as on-premise, and then migrate the database just as we would between two on-premise systems.
  • Re-factor (or Re-architect) or Lift and Reengineer: This approach is a full transformation that converts our current DWH into a Cloud-native platform leveraging PaaS services. In the context of the Data Warehouse, we would replace our existing on-premise DB with a native Data Warehouse PaaS service in our Cloud of choice, such as Amazon Redshift, Azure SQL Data Warehouse or Google BigQuery.
  • Re-platform or Lift and Reshape: This approach is usually taken by companies that cannot transform their environment into a fully Cloud-native one, either because of management constraints or because the Cloud does not provide all the required functionality. Nevertheless, re-platforming the environment still provides some optimizations, using a mix of IaaS and PaaS services. In the context of the Data Warehouse, we would move our existing on-premise DB to a managed DB offering, possibly also adding autoscaling.
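For the Re-host strategy, the final database migration is the same exercise as between two on-premise systems. The sketch below shows the core of such a copy: reading rows in batches from one connection, writing them to another and checking row counts afterwards. It uses Python’s built-in `sqlite3` purely as a stand-in for the source and target databases; the function name is hypothetical, and a real DWH migration would typically rely on the database vendor’s own backup/restore or export/import tools instead.

```python
import sqlite3


def copy_table(source, target, table: str, columns: list[str],
               batch_size: int = 1000) -> int:
    """Copy all rows of `table` from one connection to another in batches.

    Assumes the table already exists and is empty on the target, with the
    same columns. Uses the qmark ('?') placeholders that sqlite3 expects.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    reader = source.execute(f"SELECT {col_list} FROM {table}")
    copied = 0
    while True:
        rows = reader.fetchmany(batch_size)
        if not rows:
            break
        target.executemany(insert_sql, rows)
        copied += len(rows)
    target.commit()
    # Basic validation: row counts must match after the copy.
    src_count = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    dst_count = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if src_count != dst_count:
        raise RuntimeError(f"{table}: {src_count} source rows, {dst_count} target rows")
    return copied


if __name__ == "__main__":
    on_prem = sqlite3.connect(":memory:")   # stands in for the on-premise DWH
    cloud_vm = sqlite3.connect(":memory:")  # stands in for the DB on an IaaS VM
    ddl = "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    on_prem.execute(ddl)
    cloud_vm.execute(ddl)
    on_prem.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                        [("north", 10.0), ("south", 20.5), ("east", 7.25)])
    on_prem.commit()
    print(copy_table(on_prem, cloud_vm, "sales", ["id", "region", "amount"], batch_size=2))
```

Batching keeps memory use bounded for large tables, and the row-count check is the simplest form of post-migration validation; checksums or sampled comparisons would be stronger.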

 

Which Cloud migration strategy to recommend depends on several aspects: are our time constraints soft, or do we require minimal downtime? Do we prioritize lower administration effort over flexibility and control? Are the available PaaS services compatible with the rest of our software stack? If the answer to all these questions is yes, we will most likely recommend a Re-factor strategy; otherwise, re-hosting or re-platforming may be more suitable.
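The three strategies and this rule of thumb can be summarized in a small lookup and decision helper. This is a minimal sketch with hypothetical names; the parameters mirror the three questions above, and, as in the text, it does not try to choose between re-hosting and re-platforming when Re-factor is ruled out.

```python
# DWH target for each strategy, as described in the list of strategies.
DWH_TARGETS = {
    "Re-host": "same DB software installed on IaaS VMs",
    "Re-platform": "managed DB offering (mix of IaaS and PaaS)",
    "Re-factor": "native DWH PaaS, e.g. Amazon Redshift, Azure SQL Data Warehouse, Google BigQuery",
}


def recommend_migration(soft_time_constraints: bool,
                        prioritize_low_administration: bool,
                        paas_compatible_with_stack: bool) -> list[str]:
    """Return the candidate strategies suggested by the rule of thumb.

    Re-factor only if every answer is yes; otherwise the choice between
    re-hosting and re-platforming is left to further analysis.
    """
    if soft_time_constraints and prioritize_low_administration and paas_compatible_with_stack:
        return ["Re-factor"]
    return ["Re-host", "Re-platform"]


if __name__ == "__main__":
    for strategy in recommend_migration(True, False, True):
        print(f"{strategy}: {DWH_TARGETS[strategy]}")
```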

 

Conclusion

 

This article is a high-level introduction to Cloud technology and the range of scenarios for leveraging it in BI and Big Data deployments. Whether you are considering a fresh installation or a migration to the Cloud, think of ClearPeaks as your guide: our team of experts can help you in your transition to the Cloud. We can:

 

  • Help you define the right migration strategy to the Cloud (“Lift and Shift”, “Lift and Reengineer” or “Lift and Reshape”), depending on your situation.
  • Perform a Cloud pilot, migrating a representative slice of your BI platform to the Cloud in a few weeks. This will help you understand the time and consultancy cost of a full Cloud migration project, and will also provide a cost estimate for a complete BI platform hosted in the Cloud. This is especially recommended when considering PaaS services, where cost estimation is not trivial because users do not have direct control over the size of the allocated resources.
  • Analyze, define, plan and execute Cloud migrations. This includes functional and business requirements gathering, the design and definition of the infrastructure, the planning of the Cloud migration process with minimal downtime, interaction with the Cloud providers, the creation of the Cloud services required, the execution of the actual migration process and the configuration and tuning of the final platform.

 

For more information, please contact us and we will be happy to help you.

Esteve B
esteve.bosch@clearpeaks.com