New Book Review: "In Search of Database Nirvana"
New book review for In Search of Database Nirvana: The Challenges of Delivering Hybrid Transaction/Analytical Processing, by Rohit Jain, O'Reilly, 2016, reposted here:
Copy provided by O'Reilly.
Many potential readers will likely dismiss this book as too theoretical for practicing architects, because much of it, as the closing pages acknowledge, is a detailed treatment of the challenges a query engine faces in supporting workloads across a spectrum with OLTP at one end, analytics at the other, and operational and BI (business intelligence) workloads in the middle. However, Jain rightly points out that the book can also serve as a guide for assessing a database engine, or a combination of query and storage engines, against one's own workload requirements, whether those are transactional, analytical, or a mix of the two.
If you do not have time to read the entire book, which is only about 50 pages long but somewhat dense, and you are looking to evaluate database products for these purposes, I recommend at minimum reading the five sections that open and close the discussion, which together comprise only 18 pages: (1) "The Swinging Database Pendulum", (2) "HTAP Workloads: Operational versus Analytical", (3) "Query versus Storage Engine", (4) "Assessing HTAP Options", and (5) "Conclusion". The remaining sections, which explain in technical detail the four challenges the author discusses, can potentially be skipped: (1) "A Single Query Engine for All Workloads", (2) "Supporting Multiple Storage Engines", (3) "Same Data Model for All Workloads", and (4) "Enterprise-Caliber Capabilities".
However, to best answer the concluding assessment questions, readers will likely find these latter sections helpful, especially if they haven't given these considerations much thought recently. In my view, the author might have done better to place some of this content in the opening sections. Either way, the question breakdowns in the conclusion can serve as checklists, and they also give some insight into where the book's content is weighted most heavily: 49 questions for "What are the capabilities of the query engine that would meet your workload needs?"; 54 questions for "What are the capabilities of the storage engines that would meet your workload needs?" and "How well does the query engine integrate with those storage engines?"; 5 questions for "What data models are important for your applications?", "Which storage engines support those models?", and "Does a single query engine support those storage engines?"; and 19 questions for "What are the enterprise caliber capabilities that are important to you?" and "How do the query and storage engines meet those requirements?".
Readers who have gone through this product-selection process themselves, who have been keeping up with the ongoing evolution of database products, or who are simply familiar with shifts in product positions in the DB-Engines Ranking, will likely have noticed the swinging database pendulum that Jain describes at the outset. Because a number of products from a decade or more ago shared several unfavorable characteristics, the pendulum swung toward polyglot programming and persistence, that is, using the best tool for each task. However, as some firms took on heavier operational workloads after adopting NoSQL database products, RDBMS capabilities became relevant again, not least because users wanted SQL, and so products began to offer a combination of these capabilities. As the author comments, the term "HTAP" (hybrid transaction/analytical processing) probably comes closest to describing where things are heading in addressing these varied needs: the ability to run transactional/operational, BI, and analytic workloads against the same data without having to move it, transform it, duplicate it, or deal with latency.
Although it is the smallest section of the book, I think many readers will find the challenge chapter entitled "Same Data Model for All Workloads" of interest, and at less than 2 pages it might be the one exception to my suggested reading plan for the time-constrained reader. The author comments that "there are key-value, ordered key value, Bigtable, document, full-text search, and graph data models. And to achieve the nirvana of a single storage engine, people keep stretching these data models to their limits by trying to have them do unnatural things for which they were not designed. Thus, there are entire parent-child relations implemented in documents, claiming that none of the RDBMS capabilities are needed anymore, and that document stores can meet all workload requirements." Jain then goes on to suggest other hypothetical examples, but you get the point.
A good read overall, everything considered. Readers might benefit from following this book with "Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement", which is still just as relevant now as it was back in 2012 and 2013 when I worked through it myself, although were it written in 2016 the chosen database products might be slightly different. At the time, it covered only representatives of relational, key-value, column-oriented, document-oriented, and graph database products. HTAP products, for example, were not covered, nor were in-memory databases, although admittedly these overlap with the categories that were covered, since they are classified by different criteria.