New Book Review: "Time Series Databases"
New book review for Time Series Databases: New Ways to Store and Access Data, by Ted Dunning and Ellen Friedman, O'Reilly, 2015, reposted here:
This book centers on time series databases (TSDBs). Although collecting and analyzing time series data is not new, as the authors explain, building scalable time series databases is a huge challenge that calls for new approaches and new tools in light of the immense volume, velocity, and variety of data, especially machine data.
While time series databases are not a distinct category of databases apart from relational, key-value, column-oriented, document-oriented, and graph databases discussed in such books as "Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement" (see my review), use of databases for time series data serves a distinct purpose. For example, Apache HBase is a column-oriented database that can be used for time series data.
Dunning and Friedman first explain the value of using time series data, and present an overview of modern use cases followed by a comparison of relational databases versus non-relational databases in the context of time series data. The latter half of the text provides an explanation of the concepts involved in building a high performance time series database, followed by some brief discussion on related topics.
The authors correctly state in their introductory chapter that time series data indicates when something actually took place. Because data can be recorded long after it is measured, time series data does not indicate when it was recorded. Tracking both the time of measurement and the time of recording requires a bitemporal database, which is beyond the scope of this book. Readers might be interested in knowing that an excellent (although very long-winded) book exists on this subject entitled "Managing Time in Relational Databases: How to Design, Update, and Query Temporal Data" (see my review).
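To make the distinction between measurement time and recording time concrete, here is a minimal sketch of my own, not taken from the book. The `Reading` class, its field names, and the sample values are all hypothetical; the point is only that a time series is ordered by when each measurement happened, while the recording time is a separate fact that a plain time series store does not track.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor reading that carries both timestamps."""
    sensor_id: str
    event_time: datetime     # when the measurement actually took place
    recorded_time: datetime  # when the value reached the database
    value: float


def recording_lag(reading: Reading) -> timedelta:
    """Delay between the measurement and its arrival in the database."""
    return reading.recorded_time - reading.event_time


# Made-up sample data: the second reading arrives hours after it was measured.
readings = [
    Reading("pump-1", datetime(2015, 1, 1, 12, 0), datetime(2015, 1, 1, 12, 0, 5), 20.5),
    Reading("pump-1", datetime(2015, 1, 1, 12, 1), datetime(2015, 1, 1, 18, 30), 21.0),
]

# A time series view orders points by when they happened, not when they arrived.
by_event_time = sorted(readings, key=lambda r: r.event_time)

# Finding late arrivals needs the second timestamp, which is the bitemporal part.
late = [r for r in readings if recording_lag(r) > timedelta(hours=1)]
```

A store that keeps only `event_time` can answer "what was the value at noon?" but not "what did we know at noon?", which is the kind of question a bitemporal design is meant to answer.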
Readers should be aware that the Hadoop-based databases Apache HBase and MapR-DB are the focus of this book because they can rapidly ingest time series data and also support rapid, efficient queries of that data. Usage of OpenTSDB and Grafana alongside either of these databases is also discussed. Just realize that MapR-DB extensions that enable direct BLOB loading are presented as the solution for accelerated performance.
Of the 8 chapters in this book, Chapter 3 ("Storing and Processing Time Series Data") and Chapter 4 ("Practical Time Series Tools") are the core. Conversely, Chapters 5 through 8, while providing brief glimpses into some interesting related topics, do not provide much value and can be safely skipped by most readers. Overall, this is a decent 70-page read, available free from O'Reilly, that can be consumed in an afternoon.