Community Comment: Part 14

The comments I provided in reaction to a community discussion thread:
https://www.linkedin.com/feed/update/urn:li:activity:6871126959674400769?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6871126959674400769%2C6873206176721506304%29

Data Warehouse Architect & Developer at Healthcare Data Consultancy: I keep seeing this again and again: databases like Snowflake, Redshift, or BigQuery being called 'Data Warehouses' when discussed in the context of acquiring them. They are not.

What they are is 'Analytical Databases': tools / technology fit for storing data structures to be used in an analytical context: write once, read many, good at joins, aggregations, etc…

A 'Data warehouse' is WHAT you can decide to build using these tools: a 'subject oriented, time variant, non volatile collection of data / information for analytical decision making'. Thus, Data warehousing is about implementing data architectural patterns for analytics. When building a good 'Data Warehouse' you need many more tools than only an 'Analytical Database'.

Why is this important?

We can also use an 'Analytical Database' to make a complete mess of everything, to build something that definitely is NOT a "data warehouse".

I really believe we should stop naming technology as something we can BUILD by USING that technology.

After all, we also don't call a HAMMER a CHAIR, right? Using this as an example, it's immediately clear how weird this would be…

We could say: 'it is only language', who cares?

But IMHO, when working in this field of data / analytics, if there is one thing that we should try to do, it is to use our language in a consistent way.
If not, we're really making it more complex than it needs to be…

Gfesser: Good post. Another way to put this would be to simply differentiate between infrastructure and solution. In working with some talented infrastructure folks over the years, I've often commented that with respect to data-intensive solutions, infrastructure isn't synonymous with data architecture. Data architecture is the implemented solution (models, code etc) making use of this infrastructure. Data models and data distribution strategies for Amazon Redshift, for example, aren't things that come out of the box with a Redshift deployment. Why? Because the data and use cases need to be taken into account.

Co-Founder at Data Engineering Firm: Well said. What is your take about 'data mesh'? Can I buy a solution for that?

Data Warehouse Architect & Developer at Healthcare Data Consultancy: Hi [Co-Founder at Data Engineering Firm], I would say the answer to this is a hard 'NO'. Data Mesh is such an emerging concept that it is still somewhat unclear WHAT it exactly is and, even more, HOW to practically implement it. What it describes is a vision of a fully decentralized data / analytics architectural pattern. A pattern where Data 'sets' are treated 'as a product'. Where ownership of data products is pushed down to 'the business'. The main work we need to do to get there has to do with culture change…

So, this example of a 'data mesh product' can, de facto, NOT be fully true: https://www.starburst.io/

Senior Cloud Architect Data & AI at Microsoft: [Data Warehouse Architect & Developer at Healthcare Data Consultancy] let's agree to disagree.

Data Warehouse Architect & Developer at Healthcare Data Consultancy: OK, let me rephrase my answer: you CAN 'buy' different tools and solution components, and write software, that can be used to build a data mesh. A data mesh itself, just like a data warehouse, is not something that can be bought. Furthermore, there is no SINGLE technology / tool in existence that can be used to build a data mesh.

Senior Cloud Architect Data & AI at Microsoft: [Data Warehouse Architect & Developer at Healthcare Data Consultancy] agree!

Co-Founder at Data Engineering Firm: [Data Warehouse Architect & Developer at Healthcare Data Consultancy] I think the same way. Data Mesh is not something that comes out of the box. Neither is the data warehouse, data lake, or lakehouse. But some companies are taking advantage of the new buzzwords in the space. I like the data mesh concept. I like Starburst. But they are not the same.

Gfesser: My team and I built a hybrid of a third and fourth generation data platform in 2019-2020, with data mesh being the fourth generation. As some other commenters have said, it takes multiple products/services and custom code to create a data mesh. A shrink-wrapped data mesh cannot be purchased. And if and when such a product could actually be purchased down the road, by that time we will already be well on our way toward a fifth generation. For anyone interested, I wrote a blog post on this topic:
Building a Data Platform on AWS from Scratch: Part 2
https://www.linkedin.com/pulse/building-data-platform-aws-from-scratch-part-2-erik-gfesser
