Community Comment: Part 27 - Assessment is needed when deciding between transactional and analytical modeling

  • Transactional versus analytical modeling
  • Discussion with purported staff data engineer
  • Making broad generalizations doesn't help
  • Quoting $ savings in vacuum not advised

The comments I provided in reaction to a community discussion thread.

Owner of Data Engineering Training Firm and former Staff Data Engineer at Airbnb:

If you model your data for analytics and you're doing transactions, you're going to have a bad time.

If you model your data for transactions and you're doing analytics you're going to have a bad time.

Why is this the case?

Transactional data modeling (OLTP) is optimized for access to data of a single user or entity.

Analytical data modeling (OLAP) is optimized for accessing the entire data set.

If you have an OLTP use case and you model OLAP, you'll get excess IO and unnecessary data for your transaction.

If you have an OLAP use case and you model it OLTP, you'll get excess compute and JOINs and shuffle your data all over the place.

Most online systems are OLTP. Most data engineers' jobs are taking that OLTP model and converting it into a robust OLAP system for data scientists.

#dataengineering

Developer Advocate at Cloud Data Warehouse Software Firm:

That used to be the case, but isn't anymore.

Highly normalized structures work perfectly fine in modern data warehouse platforms. Joins are of little cost anymore.

This reduces the necessity to transform due to physical limitations and can decrease time to market for any given project by up to 40%

Owner of Data Engineering Training Firm and former Staff Data Engineer at Airbnb:

I don't believe this is true and sounds more like marketing than reality.

For example, although you'll call this an edge case probably, it took over 20 database tables joined together to determine the price and availability of a given listing on a given night at Airbnb.

You're saying there's no value in making an analytical table that people can query since they can just do the joins themselves right?

Those joins took 15 minutes to run.

The materialized tables I gave them gave an answer in <2 seconds.

Gfesser:


Both of you are making very broad generalizations without mentioning all the factors that come into play. And while both of you are correct for some use cases, quoting time-to-market savings is always ill advised in a vacuum: this should never be done until after an assessment has been performed.

Subscribe to Erik on Software

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.
jamie@example.com
Subscribe