Community Comment: Part 8

The comments I provided in reaction to a community discussion thread:
https://www.linkedin.com/feed/update/urn:li:activity:6764517635876040704?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6764517635876040704%2C6766303415871041537%29

Head of Experimentation at Social Media Platform:

We should re-title “data cleaning” as “understanding the data”. Why?

1. It’s not a bad thing to spend 80 percent or more of our time deeply understanding the data.

2. Cleaning the data well requires understanding its nuances. Cleaning is just a small part.

3. Data cleaning brings up the wrong image. We aren’t trying to make it perfect; we’re trying to actively prep it for analysis.

4. Understanding the data is hard. It’s why it takes so much time. Cleaning feels too easy.

5. It’s a science in and of itself. We should treat it like that 🙂

What do you think? Does this represent how you think?

Gfesser: Data cleansing might be a misinterpreted activity, but #3 seems to introduce circular reasoning. According to this statement, the purpose of data cleansing is to prepare data for "analysis", suggesting that analysis is the next step, but analysis wouldn't be needed if the data were already "deeply understood". The definition of "analysis" could probably use some fleshing out as well.
