New Book Review: "Seven Databases in Seven Weeks"
New book review for Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement, by Eric Redmond and Jim Wilson, Pragmatic Bookshelf, 2012, reposted here:
Over the past couple of years, I have read a considerable amount about non-traditional database products, whether categorized as NoSQL or NewSQL, and especially about the Hadoop ecosystem (see my reviews of "HBase: The Definitive Guide" and "Hadoop in Action"), but I only recently finished reading this book after pre-ordering it over a year ago. I share the sentiment of other reviewers to some extent: even though the authors call the content a "crash course workshop" written for "experienced developers", the discussions of the individual database products vary in detail and pace. That said, this book does offer a look at the modern database landscape from a developer's perspective, and it presents each database product in just enough depth to prompt the reader to look to other resources for additional detail. As a consultant, I have grown accustomed to that practice, so this aspect of the book is not a negative in itself, just something readers should be aware of if they are not used to this style of presentation.
By this point, enough reviewers have discussed the fact that Redmond and Wilson explore seven open source database products in this book: Redis, Neo4J, CouchDB, MongoDB, HBase, PostgreSQL, and Riak. Since there are literally hundreds of open source database products, it helps to understand that one of the reasons these seven were chosen is that they span several genres of database designed to solve problems presented by real use cases. PostgreSQL is the one relational database discussed. Riak and Redis are key-value stores, HBase is a column-oriented database, MongoDB and CouchDB are document-oriented databases, and Neo4J is a graph database. Since I have already gained considerable exposure to the Hadoop ecosystem, I concentrated on the six chapters not covering HBase. Having read the HBase material as well, I can tell you that it really just scratches the surface of the product, which served as a personal reminder of what the authors state multiple times throughout the book: this is introductory material.
Coverage of each database product follows a similar pattern over a hypothetical three-day period. If you are interested in reading this book, do not be intimidated by how the material is laid out according to the calendar. Each set of three days is really just three steps of progression, diving deeper with each step. The only exception to this pattern is the coverage of Redis, the last database product covered. While the third step of the Redis discussion technically involves Redis, what it really provides is introductory material on polyglot persistence, which involves database products working together. In the example that the authors present, CouchDB is the system of record, Neo4J handles data relationships, and Redis helps with data population and caching. The authors even present a good sidebar on why nonblocking code is so important when dealing with databases. The wrap-up that follows each three-day period outlines the strengths and weaknesses of that database product, and a wrap-up chapter at the end of the book outlines the strengths and weaknesses of each database genre, followed by an appendix with informational tables that compare the database products from several different angles.
While HBase is still the database product that most attracts the data architecture side of my consulting career, of the seven database products covered PostgreSQL is the one that I have actually started using, for practical reasons. I am also increasingly interested in Neo4J, a graph database product providing fully ACID-compliant transactions, to which I was first exposed at SpringOne a couple of years ago. It is interesting that although this book is reasonably targeted at an audience of "experienced developers", it is not a stretch to say that many developers naively consider data availability a given on their project assignments, sometimes because they do not want to deal with the data, and sometimes because they think the data is the easy part. Books like this, which help bridge the gulf that often exists between those who seem to think that Java or some other language is all that matters in the enterprise and those who consider the viability of only one database product, serve a great need, and do not fall into the set of O'Reilly texts that have strayed off course in recent years.