Community Comment: Part 23 - Architecture diagrams can often mislead

  • Architecture diagrams can often mislead
  • Databricks isn't a storage solution
  • Databricks is a Hadoop successor
  • Both ETL and ELT involve input & output

These are the comments I provided in reaction to a community discussion thread.

Product Manager at Data Modeling Product Firm:

It's interesting how data storage technologies drive the whole data industry. Basically, companies like Databricks or Snowflake, which provide data storage solutions, create the world of data, and all other companies have to adapt their product to remain in the stream.

Gfesser:


Databricks isn't a "data storage solution", and arguably, the "source" and "storage" areas of the diagram in this post are misleading. For example, why is Hadoop listed only as a potential source? Usage of Hadoop has certainly decreased significantly over time, but it's not really a source: it's technically an alternative to Databricks (after all, Hadoop is a predecessor). Perhaps most confusing are the arrows between source and storage for "ELT" and "reverse ETL". Part of me chuckles when hearing all the recent vendor talk about reverse ETL, wondering when "reverse ELT" will be introduced. Seriously, though, the source and storage areas of this diagram really need to be broken down further to help prevent misinterpretation, and a box needs to be drawn around everything to the right of the source area to delineate what is actually part of the platform. Oh, wait, perhaps I'm oversimplifying: doesn't reverse ETL involve output? 🙂
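To make the point about input and output concrete, here is a minimal Python sketch, not a depiction of any vendor's implementation. The names `transform`, `etl`, `elt`, `reverse_etl`, the `raw` and `modeled` table names, and the `amount` field are all hypothetical, and plain in-memory lists stand in for real sources, platforms, and operational systems. Under those assumptions, each pattern reads from one place and writes to another; they differ only in where the transformation happens and which direction the data flows.

```python
from typing import Dict, List

Row = Dict[str, int]
Platform = Dict[str, List[Row]]  # hypothetical platform: table name -> rows


def transform(rows: List[Row]) -> List[Row]:
    """Hypothetical transformation: add a derived 'amount_cents' column."""
    return [{**row, "amount_cents": row["amount"] * 100} for row in rows]


def etl(source: List[Row], platform: Platform) -> None:
    """Extract from the source, transform in flight, load only the result."""
    extracted = list(source)                    # input: read from the source
    platform["modeled"] = transform(extracted)  # output: write to the platform


def elt(source: List[Row], platform: Platform) -> None:
    """Extract from the source, load raw rows, then transform in the platform."""
    platform["raw"] = list(source)                    # input, loaded untransformed
    platform["modeled"] = transform(platform["raw"])  # transform after loading


def reverse_etl(platform: Platform, operational_system: List[Row]) -> None:
    """Read modeled rows from the platform and write them to another system."""
    operational_system.extend(platform["modeled"])  # output leaves the platform
```

In this sketch, `etl` and `elt` produce the same `modeled` rows from the same source, and only `elt` keeps a `raw` copy inside the platform. `reverse_etl` is simply another read followed by another write, this time with the platform as the input.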
