Community Comment: Part 11
The comments I provided in reaction to a community discussion thread:
https://www.linkedin.com/feed/update/urn:li:activity:6852656686390358016?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6852656686390358016%2C6852961081292328960%29
Principal at Data Analytics Firm:
Two months ago I saved a startup client $15,000 a year on their Snowflake bill simply by sorting two of their BIG, multi-TB log data tables, reworking ETL jobs so that all incoming data is now sorted, and letting Snowflake's micro-partitioning break down the presorted tables. I removed the cluster keys; all they were doing was costing money.
Dashboards went from taking 10-15 minutes to run to 10-15 seconds for the same queries: a 60x improvement in performance from sorting instead of using cluster keys. It took 24 hours (3 days of time total) to scope, build and test, implement, and deliver.
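A toy model may help show why presorting can do the work that cluster keys were being paid to do. The Python sketch below assumes that each micro-partition records only the minimum and maximum of the filtered column, that partitions hold a fixed number of rows in load order, and that the dashboard query filters on a narrow range. Snowflake's actual partition metadata and pruning logic are more involved; the class and function names here are hypothetical and illustrate min/max pruning in general, not Snowflake's implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class PartitionStats:
    # Hypothetical per-partition metadata: only min/max of the filter column
    min_value: int
    max_value: int


def build_partitions(values: list[int], rows_per_partition: int) -> list[PartitionStats]:
    """Split rows, in load order, into fixed-size partitions and record min/max."""
    stats = []
    for start in range(0, len(values), rows_per_partition):
        chunk = values[start:start + rows_per_partition]
        stats.append(PartitionStats(min(chunk), max(chunk)))
    return stats


def partitions_to_scan(stats: list[PartitionStats], low: int, high: int) -> int:
    """Count partitions whose [min, max] range overlaps the query range [low, high].

    Partitions that cannot contain matching rows are pruned (skipped unread).
    """
    return sum(1 for s in stats if s.max_value >= low and s.min_value <= high)


def pruning_demo() -> tuple[int, int, int]:
    rng = random.Random(42)
    # Hypothetical log table: 100,000 rows keyed by an event timestamp 0..99,999
    timestamps = list(range(100_000))
    rng.shuffle(timestamps)  # unsorted load: rows arrive in random order

    unsorted_stats = build_partitions(timestamps, 1_000)
    sorted_stats = build_partitions(sorted(timestamps), 1_000)

    # A dashboard-style query filtering on a narrow window of 1,000 timestamps
    low, high = 50_000, 50_999
    return (
        partitions_to_scan(unsorted_stats, low, high),
        partitions_to_scan(sorted_stats, low, high),
        len(sorted_stats),
    )


if __name__ == "__main__":
    unsorted_scanned, sorted_scanned, total = pruning_demo()
    print(f"unsorted: scan {unsorted_scanned} of {total} partitions")
    print(f"sorted:   scan {sorted_scanned} of {total} partitions")
```

With sorted input, only the single partition covering the window is scanned out of 100. With shuffled input, nearly every partition's min/max range spans the window, so pruning skips almost nothing and the query reads essentially the whole table.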
Last week, I did something similar for a different customer.
What was in place was a "one-layer" data warehouse: the warehouse was all passthrough, with no transformed data hitting Looker, just a giant multi-TB table that took 20-30 minutes to run queries on.
I created a transformation job that simply dedupes and sorts, sorted all data in the final BI-facing table, and re-pointed the Looker model to the new table. Dashboards now run in 15-20 seconds. All cluster keys were removed. Total savings is $5,000 a year, plus end users no longer wait half an hour for data.
Snowflake is awesome. It's my favorite data warehouse, it's the best solution for small data teams of 3-10 people in terms of TCO and speed to delivery, and it's the core of the modern data stack. I also like using stages and running COPY commands from a bucket into Snowflake, which keeps data quality and ownership with a small data team instead of with an engineering function.
But…
Snowflake's ease of use leads to bad practices: poor clustering, little transformation, and clicking up to a larger T-shirt-size warehouse to pay to make tech debt go away.
Snowflake claims they killed the DBA. I think that's true. But the skillset of structuring objects and tables, actively managing resources, and basic data architecture of how tables are sorted and partitioned is lacking in today's data market.
And it's a problem of supply not meeting demand – there just are not that many practitioners out there who can do solutions architecture on cloud data stacks well – in my opinion and in what I see in the market, it's maybe ~5k people globally who can build a quality cloud data stack today.
If you have Snowflake in place and your BI dashboards are taking minutes to run, something is wrong and we should talk.
Gfesser: Absolutely true, thanks for saying this: "The skillset of structuring objects and tables, actively managing resources, and basic data architecture of how tables are sorted and partitioned is lacking in today's data market. And it's a problem of supply not meeting demand – there are just not that many practitioners out there who can do solutions architecture on cloud data stacks well – in my opinion and in what I see in the market, it's maybe ~5k people globally who can build a quality data stack today." I've lost count of how often I've seen stakeholders at firms trivialize this type of work without understanding the effort involved. Interestingly, in my experience, many stakeholders trivialize it simply *because* they don't understand it.