New Book Review: "Big Data Now"
New book review for Big Data Now: Current Perspectives from O'Reilly Radar, O'Reilly Radar Team, O'Reilly, 2012, reposted here:
Thorough introduction to the current Big Data landscape. The first half of the text discusses data science, the tooling used in this space, and issues likely to be encountered with data; the editors then shift to the application of data science findings and the business of data. The only dissenter regarding the effectiveness of what the authors share actually touches on a benefit rather than a drawback in writing that this book was apparently not written by data scientists: because the authors write with management in mind, the discussion does not get lost in the many technical details that make up Big Data, but instead provides a compact, highly readable summary of a range of related subject matter.
The authors tackle Big Data terminology well, beginning with the term "Big Data" itself. "We've all heard a lot about 'big data' but 'big' is really a red herring. Oil companies, telecommunication companies, and other data-centric industries have had huge datasets for a long time. And as storage capacity continues to expand, today's 'big' is certainly tomorrow's 'medium' and next week's 'small'. The most meaningful definition I've heard: 'big data' is when the size of the data itself becomes part of the problem. We're discussing data problems ranging from gigabytes to petabytes of data. At some point, traditional techniques for working with data run out of steam." What makes current efforts different is that information platforms are now being built to explore and understand data, going beyond traditional business intelligence.
As a consultant, I especially appreciate the links that the editors provide throughout for tooling and other technical subjects that would otherwise significantly increase the length of this white-paper-sized book. The interviews with individuals from companies in this space, such as Infochimps and Gnip, are also appreciated. And although it ends rather abruptly, the last chapter on the business of data, which complements the first chapter, is especially well done, including a discussion of the emerging Big Data stack composed of both open source and commercial products that furthers the presentation of the SMAQ stack (Storage, MapReduce, and Query) discussed earlier in the book. Recommended to anyone within or looking to enter the Big Data space.