New Book Review: "Going Pro in Data Science"

New book review for Going Pro in Data Science: What it Takes to Succeed as a Professional Data Scientist, by Jerry Overton, O'Reilly, 2016, reposted here:


Rating: 5 out of 5 stars

This is probably the most practical text on how to succeed as a data science professional. I have personally read dozens of blog posts on this topic, but most of them concentrate heavily on data modeling. Although modeling is important, those posts do not explain the other areas the data scientist needs to focus on, nor do they explain the big picture. Overton, however, demonstrates his ability to look under the covers, so to speak, and explains his thinking on this subject first-hand, from personal experience.

As a reader, I knew immediately from the author's refreshing introduction that he takes a more pragmatic approach than others in the data science field: "The common ask from a data scientist is the combination of subject matter expertise, mathematics, and computer science. However, I've found that the skills that tend to be most effective in practice are agile experimentation, hypothesis testing, and professional data science programming."

"This more pragmatic view of data science skills shifts the focus from searching for a unicorn to relying on real flesh-and-blood humans. After you have data science skills that work, what remains to consistently finding actionable insights is a practical method of induction. Induction is the go-to method of reasoning when you don't have all the information…In practice, finding useful evidence and interpreting its significance is the key skill of the practicing data scientist – even more so than mastering details of a machine learning algorithm."

Overton builds on these thoughts by stating that the goal of the book is to communicate what he has learned so far about a data science process that works: (1) start with a question, (2) guess at a pattern, (3) gather observations and use them to generate a hypothesis, (4) use real-world evidence to judge the hypothesis, and (5) collaborate early and often with customers and subject matter experts along the way. This contrasts sharply with most of what is available in the marketplace, which typically focuses on the fourth step alone.

The author follows up by noting that "at any point in time, a hypothesis and our confidence in it is simply the best that we can know so far. Real-world data science results are abstractions – simple heuristic representations of the reality they come from. Going pro in data science is a matter of making a small upgrade to basic human judgment and common sense. This book is built from the kinds of thinking we've always relied on to make smart decisions in a complicated world."

Following the introduction, the rest of the book is organized around the author's pragmatic approach to data science: (1) how to gain a competitive advantage using data science, (2) what to look for in a data scientist, (3) how to think like a data scientist, (4) how to write code, (5) how to be agile, (6) how to survive in your organization, and (7) the road ahead, covering data science today and data science tomorrow.

Although this is a rarity among technically focused texts, not to mention free O'Reilly reports such as this one, the book is strong throughout, and I do not recommend skipping any of the chapters, even though some readers will likely be tempted to skip the last three. For example, as an agile software application architect and developer, I might have been expected to skim the "Lessons Learned from a Minimal Viable Experiment" section in the sixth chapter, but the author's follow-up section, "Don't Worry, Be Crappy," made me take notice.

What readers get out of a text like this will vary considerably, but keep in mind that the same terminology can mean different things in different domains. With regard to minimal viable experiments in particular, whose counterpart in the domains where I have traditionally worked is the minimum viable product, Overton makes some great points. As you make your way through the book, also keep in mind that he is a data scientist and distinguished engineer at CSC.

In my opinion, one of the best statements that the author makes is the following, at the conclusion of the third chapter: "We tend to judge data scientists by how much they've stored in their heads. We look for detailed knowledge of machine learning algorithms, a history of experiences in a particular domain, and an all-around understanding of computers. I believe it's better, however, to judge the skill of a data scientist based on their track record of shepherding ideas through funnels of evidence and arriving at insights that are useful in the real world."
