By Erik Gfesser — Aug 3, 2018

New Book Review: "High Performance In-Memory Computing with Apache Ignite"

New book review for High Performance In-Memory Computing with Apache Ignite, by Shamim Bhuiyan, Michael Zheludkov, and Timur Isachenko, Lulu.com, 2017, reposted here:

Rating: 4 out of 5 stars

If you're looking for a book that covers Apache Ignite, as of mid-2018 this is still the only one. While working through this book, I came across a note that seemed to indicate that an updated version was in the works, but I hadn't seen any activity suggesting that this would happen, and the GitHub repo has been void of commits since mid-2017. By coincidence, when I revisited the Leanpub web page for this book while writing this review, I noticed a new comment that didn't exist when I first acquired the book in April 2018. It indicates that this book will not be updated, and that two of the original authors are instead working on a new book entitled "The Apache Ignite Book: The Next Phase of the Distributed System".

Following an introduction, this book is broken down into seven chapters: (1) "Installation and the First Ignite Application", (2) "Architecture Overview", (3) "In-Memory Caching", (4) "Persistence", (5) "Accelerating Big Data Computing", (6) "Streaming and Complex Event Processing", and (7) "Distributed Computing". Chapters 4 and 7 are the most heavily weighted in terms of content, and chapter 5 follows close behind. Every chapter except chapter 2 includes example projects, and the code for all of these projects is available on GitHub. Plan to spend considerable time working through the examples, as there is much to learn. The authors do a decent job intermingling theory with these example projects, although in a number of instances the explanations are light. As with all books of this nature, plan to spend some time with the Apache Ignite documentation and the community that works with this product.

The example projects are what make this book, but be aware that you will run across quite a few code issues if you choose to make use of a recent release of Apache Ignite. The book covers version 1.6.0, which was the latest version available at the time it was written, but I chose to use the most recent version available when I started working through it: version 2.4.0. In my opinion, it doesn't make sense to work through the examples using a version more than a year old, because it's not like I'm going to work with a client to implement an old version. I've used the same philosophy with other technology texts, which has forced me to dig into the documentation and community. In this case, I also made use of the most recent versions of other products used by the examples, such as Apache Hadoop and Apache Flume.

Many of the examples make use of the H2 in-memory database, and the first code issues I needed to resolve involved fixing Maven pom.xml files that lacked this dependency. While this was annoying the first time I ran into it, it is a relatively minor issue, and fixing these became routine throughout the rest of the book. Beyond minor library dependency issues, I also came across quite a few differences between the examples, which used 1.6.0, and my code, which used 2.4.0. These discrepancies started on the second page of chapter 1 and continued throughout the book. The first one was relatively minor, just like the missing H2 dependency: the Apache Ignite REST interface returned different output after I enabled it.

I recently mentioned to a colleague that if I didn't have experience with Java, Spring, and Maven, working through the examples would have been even more tedious. For example (and I'm still writing about chapter 1), the book incorrectly states that "mvn archetype:create" needs to be used rather than "mvn archetype:generate". A couple of pages later, the code uses the wrong Apache Ignite cache. And while the authors introduced me to DBeaver, a database tool that supports quite a few NoSQL database products, it took me a long time to get it to successfully query Apache Ignite, and it wasn't very straightforward to work with, in contrast to DbVisualizer, another tool that I typically use. By the time I finished chapter 1, I had already gained experience upgrading Spring libraries as a result of my Apache Ignite upgrade.

Chapter 2 provides a good architecture overview that covers Apache Ignite cluster topology, caching topology, caching strategy, data model, relationship to the CAP theorem, clustering, how SQL queries work, multi-datacenter replication, asynchronous support, resilience, security (only provided with the GridGain commercial version), and key APIs. However, I came to realize that the architecture changed between versions 1.6.0 and 2.4.0, so the big memory (off-heap memory) examples in chapter 3 do not work: according to the Apache Ignite 2.0 migration guide, the "memoryMode property has been removed due to new Ignite page memory architecture".
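To make the partitioned caching topology from chapter 2 a bit more concrete, here is a simplified sketch in plain Java of how a key could be mapped first to a partition and then to a primary node using rendezvous (highest-random-weight) hashing. Apache Ignite's default affinity function is also rendezvous-based, but this is not its actual code: the class name, the partition count of 1024, and the hash mixing below are my own assumptions for illustration.

```java
import java.util.List;
import java.util.Objects;

/**
 * Simplified sketch: key -> partition (hash modulo partition count), then
 * partition -> primary node via rendezvous hashing. NOT Ignite's actual
 * RendezvousAffinityFunction; names and hashing details are illustrative.
 */
public class AffinitySketch {
    // Partition count chosen for illustration only.
    static final int PARTITIONS = 1024;

    static int partition(Object key) {
        // floorMod keeps the result non-negative for negative hash codes
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    static String primaryNode(int partition, List<String> nodes) {
        String best = null;
        long bestWeight = Long.MIN_VALUE;
        for (String node : nodes) {
            // Each (node, partition) pair gets a pseudo-random weight; highest weight wins
            long weight = mix(Objects.hash(node, partition));
            if (weight > bestWeight) {
                bestWeight = weight;
                best = node;
            }
        }
        return best;
    }

    // SplitMix64 finalizer, used here just to spread the combined hash bits.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-A", "node-B", "node-C");
        for (String key : List.of("alice", "bob", "carol")) {
            int p = partition(key);
            System.out.println(key + " -> partition " + p + " -> " + primaryNode(p, nodes));
        }
    }
}
```

The property worth noticing is that removing a node only reassigns the partitions that node owned; every other partition keeps its primary, which limits how much data has to move when the cluster topology changes.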

The rest of the examples in chapter 3 worked with some relatively minor changes, although I skipped the sections on Java method caching and web session clustering, because I don't have any interest in these areas right now. Chapter 4 on persistence is especially well done, providing a great mix of theory, diagrams, and code as the authors discuss Apache Ignite persistence using PostgreSQL and MongoDB, cache queries (scan queries and text queries), SQL queries (projection and indexing with annotations, the Query API, collocated distributed joins, non-collocated distributed joins, and performance tuning), JPA, expiration and eviction of cache entries, and transactions (commit protocols, optimistic transactions, pessimistic transactions, and performance impact). Since I was upgrading libraries whenever possible, there were quite a few to upgrade in chapter 4, including Hibernate OGM, which the book describes as still in development and not to be used in production; I discovered that 5.3.1.Final was released shortly after the book's publication.
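Eviction is easier to reason about with a small example. The sketch below is a generic least-recently-used (LRU) cache built on the JDK's LinkedHashMap access-order mode; it is not how Apache Ignite implements eviction (Ignite configures eviction through its own policies), and the capacity used here is arbitrary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Generic LRU eviction sketch using LinkedHashMap's access-order mode.
 * Illustrates the idea behind LRU eviction of cache entries; it is not
 * Ignite's eviction implementation.
 */
public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, least recently used first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each put: evict the least recently accessed entry when over capacity
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // "a" becomes the most recently used entry
        cache.put("c", 3);   // over capacity, so "b" (least recently used) is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```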

The Hadoop accelerator examples in chapter 5 are by far the most problematic, and I was not able to get these to work completely using Apache Hadoop 2.8.4, after first attempting to make use of a Hortonworks Data Platform (HDP) 2.6.4 sandbox. Instead of working through the rest of this chapter, I decided to move on to the streaming and complex event processing (CEP) examples of chapter 6, skipping the examples in chapter 7 for the time being. While the examples in chapter 6 had some issues, I was motivated to get them to work, as I want to continue getting familiar with streaming products. This chapter covers IgniteDataStreamer, Camel Streamer, Apache Flume, and Apache Storm. While I would have liked more theory in this chapter, I appreciated getting hands-on with these tools and trying to get them to work with Apache Ignite. In addition to the four products covered in depth, the chapter also mentions JMS Streamer, MQTT Streamer, Kafka Streamer, and Flink Streamer, and the latest version of Apache Ignite additionally provides Twitter Streamer, RocketMQ Streamer, and ZeroMQ Streamer.
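The core idea behind a data streamer such as IgniteDataStreamer is batching: rather than issuing one cache update per event, entries are buffered and pushed in groups. The following is a hypothetical, single-process sketch of that idea; BatchingStreamer, its batch size, and the in-memory Map standing in for a cache are all illustrative, and none of this is Ignite's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical batching sketch: entries are buffered and handed to a sink
 * in batches instead of one call per entry. Not Ignite's IgniteDataStreamer API.
 */
public class BatchingStreamer<K, V> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<Map<K, V>> sink;
    private Map<K, V> buffer = new HashMap<>();

    public BatchingStreamer(int batchSize, Consumer<Map<K, V>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    public void add(K key, V value) {
        buffer.put(key, value);
        if (buffer.size() >= batchSize) flush(); // send a full batch
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        sink.accept(buffer);      // hand off the whole batch at once
        buffer = new HashMap<>(); // start a fresh batch
    }

    @Override
    public void close() {
        flush(); // push any partial batch that is still buffered
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<>();
        List<Integer> batchSizes = new ArrayList<>();
        try (BatchingStreamer<Integer, String> s = new BatchingStreamer<>(4, batch -> {
            batchSizes.add(batch.size());
            cache.putAll(batch);
        })) {
            for (int i = 0; i < 10; i++) s.add(i, "event-" + i);
        }
        System.out.println(batchSizes + " " + cache.size()); // [4, 4, 2] 10
    }
}
```

Flushing on close matters: without it, the last partial batch (two entries in this run) would never reach the sink.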

The issues in chapter 6 revolve around the integration between Apache Ignite and these other products. I also find it odd that one of the JAR files that needs to be executed throughout this chapter can only be found in the chapter's Camel project. The chapter 5 examples had caused me to hunt quite a bit for needed JAR files, so by the time I worked through chapter 6 this had become a habit. It turns out that comments mentioning this JAR were removed from the README for the Camel project on GitHub at some point. Be aware that there is considerable hard-coding of configuration in this chapter, and you will need to spend quite a bit of time tweaking the code throughout. I initially wasn't able to connect the Apache Ignite Visor Command Line Interface (first introduced in chapter 1) to the cluster, but after tweaking the code I finally got this to work later in the chapter, and I ended up revisiting earlier examples to fix those as well.

Chapter 7 presents the compute grid (distributed closures, MapReduce and fork-join, per-node shared state, distributed task session, fault tolerance and checkpointing, collocation of computation and data, and job scheduling), the service grid (developing services, cluster singleton, and service management and configuration), and developing microservices. At this point, I don't see myself using Apache Ignite to implement microservices as discussed by the authors, although I do see the possibility of including it within a microservice ecosystem, and I give the authors credit for admitting that this approach is not aligned with microservice architecture. If other books on Apache Ignite were available in the marketplace, I would likely have assigned a lower rating, but when all is said and done I learned quite a bit and plan to revisit this book in the future, potentially with the release of the forthcoming title.
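The split/compute/reduce pattern behind the compute grid's MapReduce and fork-join support can be tried out locally with the JDK's fork-join framework. This sketch counts the characters in a sentence by splitting its words into sub-tasks and summing the partial results. It runs in a single JVM using java.util.concurrent only, and CharCountTask is a hypothetical name, not part of Ignite's ComputeTask API; a compute grid would instead ship the sub-tasks to cluster nodes.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/**
 * Single-JVM fork-join sketch of split/compute/reduce. Uses only
 * java.util.concurrent; not Ignite's distributed ComputeTask API.
 */
public class CharCountTask extends RecursiveTask<Integer> {
    private static final int THRESHOLD = 2; // lists this small are counted directly
    private final List<String> words;

    public CharCountTask(List<String> words) {
        this.words = words;
    }

    @Override
    protected Integer compute() {
        if (words.size() <= THRESHOLD) {
            // "Job" step: count characters locally
            int sum = 0;
            for (String w : words) sum += w.length();
            return sum;
        }
        // "Split" step: fork the left half, compute the right half in this thread
        int mid = words.size() / 2;
        CharCountTask left = new CharCountTask(words.subList(0, mid));
        CharCountTask right = new CharCountTask(words.subList(mid, words.size()));
        left.fork();
        // "Reduce" step: combine the partial results
        return right.compute() + left.join();
    }

    public static int countChars(String sentence) {
        List<String> words = List.of(sentence.split(" "));
        return ForkJoinPool.commonPool().invoke(new CharCountTask(words));
    }

    public static void main(String[] args) {
        System.out.println(countChars("Count characters using fork join")); // 28
    }
}
```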
