New Book Review: "The Last Mile of Analytics"
New book review for The Last Mile of Analytics: Making the Leap from Platforms to Tools, by Mike Barlow, O'Reilly, 2015, reposted here:
This is probably the best book I have read on the road ahead for the analytics space, despite its small size. In reference to the title, Barlow quotes the co-author (Flomenberg) of a 2013 white paper entitled "The Last Mile in Big Data: How Data Driven Software (DDS) Will Empower the Intelligent Enterprise". Generally speaking, the last mile of analytics is software that lets you make use of the scalable database management platforms that are becoming more and more democratized. And this software comes in two flavors. "The first flavor is data tools for technically savvy users who know the questions they want to ask. The second flavor is for people who don't necessarily know the questions they want to ask, but who just want to do their jobs or complete a task more effectively." The former includes software for ETL, machine learning, data visualization, and other types of processes that are likely to require trained data analysts, and the latter includes software that some people are now calling "the last mile of analytics": software that is more user-friendly and business-oriented.
The author later quotes the CEO of Wise.io (Erhardt), who furthers this thought: "The last mile is about time-to-value. It's about lowering barriers and reducing friction for companies that need to use advanced analytics but don't have millions of dollars to spend or years to invest in development. There are still people at some machine learning companies who think their customers are other people with doctoral degrees. There's nothing wrong with that, but it's a very limited market. We're aiming to help people who don't necessarily have advanced degrees or millions of dollars to get started and begin using advanced analytics to help their business." Additionally, as the co-founder and CEO of Dato (Guestrin) comments, "There are only a small number of people in the world with deep experience in machine learning algorithms. But there is a much wider range of people who want to use machine learning and accomplish super-creative things with it."
As Barlow explains in the introductory pages to this freely available book from O'Reilly, the first generation of big data analytics vendors was interested in development and focused on creating platforms for use by modelers and developers, but the new generation of vendors is interested in deployment and focused on delivering analytics directly to business users. And advanced analytics is not simply BI (business intelligence) on steroids. As Erhardt explains, "BI typically relies on human judgments. It almost always looks backward. Decisions based on BI analysis are made by humans or by systems following rigid business rules. Advanced analytics introduces mathematical modeling into the process of identifying patterns and making decisions. It is forward-looking and predictive of the future. It's important to distinguish between classical statistics and machine learning. At the highest level, classical statistics relies on a trained expert to formulate and test an ex-ante hypothesis about the relationship between data and outcomes. Machine learning, on the other hand, derives those signals from the data itself."
If you are interested in this topic, but do not have time to read the entire text of this book, I recommend at least checking out the two diagrams provided by the author. The first depicts how the big data stack has split into two main components: "above-the-line" technologies, and "below-the-line" technologies. The former component consists of data-as-a-product, data tooling, and data-driven software, and the latter component consists of data platforms, data infrastructure, and management/security. Flomenberg explains that there is room for a few winners in data tooling and data management, but the data-driven software market (with hundreds of billions of dollars at stake) is essentially up for grabs. And the co-founder of The Hive (Ravi), a venture capital and private equity firm that backs big data startups, sees a Maslow-type pyramid, depicted by the second diagram, with BI at the bottom. The next levels up are human correlation, data mining, and predictive analytics, with closed-loop systems at the top. Moving up the pyramid, data management techniques become increasingly action-oriented and more fully automated.