New Book Review: "The Practitioner's Guide to Data Quality Improvement"
New book review for The Practitioner's Guide to Data Quality Improvement, by David Loshin, Morgan Kaufmann, 2010, reposted here:
Several of the other reviews posted here offer interesting observations from perspectives that are not always centered on data architecture or enterprise architecture in general, and this reviewer hopes to offer feedback on this text based on his consulting experience in these areas. In his preface, David Loshin comments that "this book is intended to provide the fundamentals for developing the enterprise data quality program, and is intended to guide both the manager and the practitioner in establishing operational data quality control throughout an organization, with particular focus on the ability to build a business case for instituting a data quality program", "the assessment of levels of data quality maturity", "the guidelines and techniques for evaluating data quality and identifying metrics related to the achievement of business objectives", "the techniques for measuring, reporting, and taking action based on these metrics", and "the policies and processes used in exploiting data quality tools and technologies for data quality improvement".
With these goals in mind, this reviewer thinks Loshin succeeded in this effort. Given that data quality is an enormous practice area in which success requires an understanding of both the data and the business, this introductory text walks the reader step by step through a considerable number of topics over which many authors would likely stumble. Some of the explanations that Loshin provides, such as the one in the chapter entitled "Developing a Business Case and a Data Quality Road Map" on how data flaws can incur business impacts, are extremely well done, especially when married with effective diagrams. And in his chapter entitled "Metrics and Performance Improvement", the author provides an explanation of drilling through key performance indicators that this reviewer had not seen elsewhere before this effort, and the presentation is exceedingly well done. Other areas of this text that this reviewer especially appreciates are the chapters entitled "Data Requirements Analysis", "Metadata and Data Standards", and "Inspection, Monitoring, Auditing, and Tracking".
This reviewer, however, would like to make potential readers of this book aware that most of what Loshin provides here consists of high-level walkthroughs and examples of pertinent elements of data quality, rather than practical advice on how to approach much of the lower-level work that should be expected to take place on a day-to-day basis. For example, in the chapter entitled "Entity Identity Resolution", the author provides a section on matching algorithms that briefly discusses parsing and standardization, abbreviation expansion, edit distance, phonetic comparison, and n-gramming, all in just a few short paragraphs. The author does not explain that many more matching algorithms are currently in use in industry, that most matching exercises need to combine multiple algorithms rather than rely on a single algorithm in isolation, that internationalization plays an ever more important role in matching, and that a wide variety of commercial tooling is available and needs to be assessed against the needs of the organization.
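To make the point about combining algorithms concrete, the following is a minimal sketch in Python that blends standardization with abbreviation expansion, edit distance, and n-gramming into a single score. It is not taken from the book: the function names, the abbreviation table, and the equal weighting are this reviewer's own assumptions for illustration, and phonetic comparison and internationalization are left out entirely.

```python
from typing import Mapping, Set

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"corp": "corporation", "inc": "incorporated", "st": "street"}


def standardize(name: str, abbreviations: Mapping[str, str]) -> str:
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(abbreviations.get(tok, tok) for tok in cleaned.split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def ngrams(text: str, n: int = 2) -> Set[str]:
    """Set of character n-grams; empty if the text is shorter than n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two n-gram sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(ga | gb)


def match_score(a: str, b: str,
                abbreviations: Mapping[str, str] = ABBREVIATIONS,
                weight: float = 0.5) -> float:
    """Weighted blend of edit and n-gram similarity after standardization.

    The 50/50 weighting is an arbitrary assumption, not a recommendation.
    """
    sa, sb = standardize(a, abbreviations), standardize(b, abbreviations)
    return weight * edit_similarity(sa, sb) + (1 - weight) * ngram_similarity(sa, sb)
```

With this sketch, `match_score("Acme Corp.", "ACME Corporation")` treats the two names as identical once both are standardized, which neither edit distance nor n-gramming would do on the raw strings. A production matcher would also need tuned weights and thresholds, which is where the tooling assessment mentioned above comes in.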
However, armed with this knowledge, the reader is sure to benefit from this work when planning and strategizing data quality, and as introductory material for understanding what it might take to pursue efforts that require a higher level of data quality maturity, such as master data management (MDM), for which this reviewer recommends "Enterprise Master Data Management: An SOA Approach to Managing Core Information" by Allen Dreibelbis, Eberhard Hechler, Ivan Milman, Martin Oberhofer, Paul van Run, and Dan Wolfson (see my review). In the opinion of this reviewer, what Loshin provides here is best suited for managers looking to piece together all of the steps associated with data quality pursuits and to get a better handle on how the steps are interrelated and whether each is a requirement or just an option, possibly with a view to solving some aspects of data quality in an evolutionary, piecemeal fashion rather than as a revolutionary endeavor.