Media Query Source: Part 26
The responses I provided to a media outlet on July 7, 2020:
Media: Is the pace of data and database movement to the cloud accelerating?
Gfesser: Firms are increasingly leveraging the cloud to both host and process data, using a combination of managed and containerized services.
Decreased costs are driving much of this adoption. However, increased security and flexibility are also big draws.
Firms that had been reluctant to use the cloud for data due to security concerns, for example, have increasingly determined that public cloud providers offer the security they need.
AWS states, for example, that it is responsible for security “of the cloud”, while customers remain responsible for security “in the cloud”.
This responsibility is shared between AWS and its customers: AWS is responsible for protecting the infrastructure that runs its service offerings, and customers are responsible for any configuration needed by services they choose to adopt.
A simple example of security “in the cloud” is the management of users, roles, and permissions: it would be unreasonable to expect AWS to handle this type of security, since the individuals making use of services vary from customer to customer, as does the data to be accessed.
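To make this concrete, the following is a minimal sketch of this customer-side responsibility using the AWS SDK for Python (boto3): it creates an IAM policy granting read-only access to a single S3 bucket and attaches it to a role. The policy name, role name, and bucket are hypothetical placeholders, not part of any specific setup.

```python
import json

import boto3

iam = boto3.client("iam")

# Customer-defined permissions: AWS secures the underlying infrastructure,
# but deciding who may read which data is the customer's responsibility.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-analytics-bucket",  # hypothetical bucket
                "arn:aws:s3:::example-analytics-bucket/*",
            ],
        }
    ],
}

policy = iam.create_policy(
    PolicyName="AnalyticsReadOnly",  # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)

iam.attach_role_policy(
    RoleName="analytics-role",  # hypothetical role, assumed to already exist
    PolicyArn=policy["Policy"]["Arn"],
)
```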
Media: What issues do data managers face when deploying across multi-cloud environments? How difficult is it to integrate data from (or going to) multiple clouds?
Gfesser: Compatibility of chosen cloud services is one of the main challenges of deploying across different cloud environments.
It goes without saying that data integration in general has been an ongoing challenge for many firms due to the disparate systems that typically need to be integrated.
While services provided by a particular public cloud, called “cloud native” services, can provide many conveniences, firms should expect these to work differently across providers because each service is unique to its respective cloud.
In contrast, containerized services can provide consistency in how data is processed, because the same code can be used across clouds, but upfront costs are often higher because teams are responsible for additional legwork.
One way to strike a balance between these two extremes is to adopt common open source products that are either containerized or used as part of a cloud native ecosystem.
For example, Apache Spark is a data processing framework used under the covers in many commercial products, such as AWS Glue, the Databricks Unified Analytics Platform, and managed Hadoop platforms such as Amazon EMR.
Because each of these cloud native services makes use of what is essentially the same open source code (keeping in mind the specific versions used), code written for one can often be moved between them with few or no changes, limiting the needed heavy lifting to other aspects of a deployment.
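As a minimal sketch of what such portable code can look like, the following PySpark job uses only core open source Spark APIs, so the same logic could run largely unchanged on Glue, Databricks, or EMR; the bucket paths and column names are hypothetical, and each platform provides or configures the SparkSession slightly differently.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Glue, Databricks, and EMR a SparkSession is typically provided or
# easily obtained; getOrCreate() reuses one when it already exists.
spark = SparkSession.builder.appName("portable-aggregation").getOrCreate()

# Read, aggregate, and write using only core Spark APIs; because each
# platform runs essentially the same open source engine, this logic can
# move between them largely unchanged. What differs is the surrounding
# plumbing: cluster configuration, job scheduling, and storage paths.
orders = spark.read.parquet("s3://example-bucket/orders/")

daily_totals = orders.groupBy("order_date").agg(
    F.sum("amount").alias("total_amount")
)

daily_totals.write.mode("overwrite").parquet("s3://example-bucket/daily_totals/")
```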
Over time, I expect more multi-cloud products to be offered by vendors independent of the cloud providers, alleviating some of the challenges associated with multi-cloud deployments.
See all of my responses to media queries here.