From the Trenches: Technology Radar Volume #17
Introduction
The title of this post is influenced by a 2012 post of mine that discussed in detail a suite of commercial products I had used on a project at the time, with the tagline "from the trenches" emphasizing that my writing came from personal development experience. Regardless of the degree to which a software product is commercial or open source, products tend to claim at least some functionality that either does not work, or does not work as intended or evangelized. And while I have been an open source advocate for quite some time, some community advice can be misleading or simply incorrect.
ThoughtWorks claims that its writers share their opinions from experience, but the "Technology Radar" reports presented by the firm offer very little discussion. This post is an attempt to address one entry from each of the quadrants in its recent November 2017 radar: "languages & frameworks", "platforms", "techniques", and "tools". The purpose of this post is not to delve into detailed definitions of each entry, but to consider each entry from the perspective of my experiences and how ThoughtWorks chose to categorize it.
Background
It is not uncommon to hear developers bemoan use of the term "architect" to describe themselves and what they do for a living, to the point that "senior developer" now often refers to individuals who are actually architects but do not wish to be recognized as such, because the title can imply that they are no longer hands-on from a development perspective.
However unfortunate this situation may be, one of the problems with using the term "developer" to the exclusion of "architect" is that it tends to imply that what one does is limited to programming or some interpretation of DevOps. In other words, it implies relying on others to make the decisions that feed these work activities, whether those decisions concern what users want or which technologies to use.
At some point earlier in my career, some of my colleagues and I started referring to ourselves as "practical architects", a term which is intended to imply that the ivory tower of architecture has no place in our work efforts: we are practical because we can not only relate our decisions to the practicality of how these are to be implemented, but can also perform the implementation itself.
The question of whether programming is necessary to make decisions, or to show others how they might make use of chosen languages, frameworks, tooling, and software development techniques, is not an empty one. One of the big questions here is the degree to which one can be effective without actually touching and feeling that which is being advocated or discouraged.
Over the past few years, I have been following the Technology Radar reports sporadically published by ThoughtWorks, which are intended to categorize the aforementioned areas of technology in a manner that advocates the degree to which something should be used, including whether something should not be adopted at all. This take is a bit different from other technology reports with which technologists might be familiar.
For example, Gartner uses what are called "Magic Quadrants" to categorize, but what is being categorized by these reports are the vendors of products, not the products themselves, although it can be argued that many vendors are so closely tied to their flagship offerings that how they are categorized has a great effect on their product lines.
One of Gartner's competitors is Forrester Research, which publishes reports called "The Forrester Wave". Forrester also indicates that what it evaluates are product vendors, but it immediately qualifies this by stating that it compares the products and services of these vendors within specific markets, and it does not take long to see that it explicitly lists the products representing each vendor evaluation.
Gartner also publishes another report called a "Hype Cycle", which is intended to depict the maturity of a technology through phases that the firm refers to as "Technology Trigger", "Peak of Inflated Expectations", "Trough of Disillusionment", "Slope of Enlightenment", and "Plateau of Productivity". But be aware that each report is a snapshot in time, and a given technology should not be expected to traverse every phase.
In this sense, Technology Radar reports are probably most similar to the Hype Cycle reports, because ThoughtWorks refers to occurrences of technologies and techniques as "blips" which come and go. However, while ThoughtWorks rightly cautions against using its reports in isolation, just like the other firms, it also mentions that its reports are not based on deep market analyses.
Placement in the "adopt" ring indicates that a blip is proven and mature, and ready to be used in the appropriate context. On the opposite end, the "hold" ring indicates that a blip is either not mature enough to be placed in another category, or should be avoided. Between these two extremes are the "assess" and "trial" rings, which are intended to categorize blips as being less proven or more risky than if placed in "adopt".
In my view, the categories that ThoughtWorks provides in its reports can be a bit esoteric, but if one views them simply as suggested levels of adoption, it brings some simplicity to what is presented. Just keep in mind that the included blips are limited to the technologies and techniques encountered or used by ThoughtWorks development teams. While this is a limitation, it is also what brings practicality relative to other reports out there.
Just to reiterate: no report should be used in isolation. To be practical means to determine the level of adoption within a specific context, which means a good fit with respect to a development team and the larger organization within which that team exists. A development team should not adopt something by considering only its own needs: the needs of other stakeholders also need to be taken into account.
Let's take a look at the blips I've chosen from each quadrant.
Spring Cloud
Radar: Spring Cloud continues to evolve and add interesting new features. Support for binding to Kafka Streams, for example, in the spring-cloud-streams project makes it relatively easy to build message driven applications with connectors for Kafka and RabbitMQ. The teams we have using it appreciate the simplicity it brings to using sometimes complex infrastructure, such as ZooKeeper, and support for common problems that we need to address when building distributed systems, tracing with the spring-cloud-sleuth for example. The usual caveats apply but we're successfully using it on multiple projects.
As someone who has been using Spring frameworks since 2007, and who was a relatively early adopter of Spring Boot in early 2015 for a new product I built for one of my clients, I became interested in the Spring Cloud family of frameworks as new projects were released, and started using it by early 2017 as I worked through the tasks to rearchitect this product into microservices. The first such Spring Cloud frameworks that I used were Spring Cloud Config, Spring Cloud Netflix (Eureka, Zuul, Ribbon), Spring Cloud Bus, Spring Cloud Security, and Spring Cloud Sleuth. I recently started using Spring Cloud Stream (and related Spring Cloud Stream App Starters) for a subsequent client project, with my team additionally taking on Spring Integration, one of the mainstays of Spring Cloud Stream.
My experiences with Spring projects over the years have demonstrated that from a technological perspective there is no reason not to adopt Spring frameworks in general for one's projects. As I've commented over the years, by the time I started adopting Spring frameworks, a schism had already formed with the traditional J2EE crowd. While I had started to hear rumblings from the Spring faction, it wasn't until I left the confines of my development work on a healthcare product for a global consultancy to join a new firm that I needed to adopt Spring in short order, resulting in my being known for quite some time amongst my colleagues for having read (over a single weekend) the 700-page first edition of "Spring in Action" by Craig Walls and Ryan Breidenbach.
The Spring ecosystem grew considerably over the ensuing years, extending far from its inversion of control (IoC) beginnings and invalidating any notion that use of "Spring" was limited to this development practice. Someone recently commented to me that "Spring is easy", which tells me that they probably don't understand what Spring is all about. However, I do tend to agree with someone who remarked to me a few years ago that the Spring ecosystem is all about how one chooses to use it, as it is fairly vast, with some frameworks (such as Spring Security and Spring Batch) amongst the most challenging that I have used, and others (such as Spring Cloud Stream) amongst the least well documented (which leads to its own set of challenges).
Making use of a single blip can sometimes hide detailed placement of underlying projects across multiple rings of the radar.
For example, since Spring Cloud is an umbrella project consisting of dozens of other projects, the single entry provided in the Technology Radar is likely to be misleading for anyone not familiar with the project.
While one other Spring Cloud project was included in the most recent radar, the only other Spring project that has ever been included as a blip is Spring Boot. In this example, I created a radar using a sampling of Spring projects with which I am familiar, to provide some relative positioning.
(Note that while ThoughtWorks comments that the angular positioning of each blip is meaningless, the radial positioning is not, although from what I can tell the web version of the technology radar app does not appear to offer control over radial positioning. So don't pay attention to these factors in these screenshots.)
I've chosen to position Spring Boot, as well as a number of other Spring projects, as mature and ready for adoption. But others, such as Spring Batch and Spring Integration, while mature, also require a significant amount of time to understand, and some nuances of the tooling can be challenging to discover.
Spring Cloud is more of a mixed bag at this point in time. Spring Cloud Config, for example, provides needed conveniences and is relatively straightforward. But most of the available Spring Cloud Stream example projects are simple, and can be misleading to the unwary.
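To illustrate just how straightforward the Config side is, here is a minimal sketch of a Spring Cloud Config server; the backing Git repository and naming are assumptions standing in for any real setup, and the spring-cloud-config-server dependency is assumed to be on the classpath.

```java
// A minimal sketch of a Spring Cloud Config server. The backing Git
// repository would be declared via spring.cloud.config.server.git.uri
// in application properties (repository location is an assumption here).
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.config.server.EnableConfigServer;

@SpringBootApplication
@EnableConfigServer // serves versioned configuration to client applications over HTTP
public class ConfigServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(ConfigServerApplication.class, args);
    }
}
```

Client services then need little more than a spring.cloud.config.uri entry in their bootstrap properties. Spring Cloud Stream, on the other hand, is where the surprises tend to appear.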
For example, I've tested quite a few of the Spring Cloud Stream App Starters, and some are significantly more robust than others. After playing with the MongoDB sources and sinks provided, I soon discovered that MongoDB sources don't provide the ability to keep track of what documents have been produced, even though at first glance the provided configuration seems very similar to JDBC sources.
So my team used more granular Spring Integration wiring for this functionality instead, with the understanding that Spring Cloud Stream itself builds on other frameworks such as Spring Integration, Spring Rabbit, and Spring AMQP when using a RabbitMQ binder. These underlying frameworks, while not trivial, are much more mature relative to Spring Cloud Stream (especially with respect to the documentation and example code provided by the community), but developers should also take notice of the many conveniences provided by this new abstraction for asynchronous messaging.
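For a sense of what "more granular Spring Integration wiring" can look like, here is a hypothetical sketch of a polled MongoDB source using the Spring Integration Java DSL; the collection name, query, and "published" tracking flag are illustrative assumptions, not my client's actual implementation.

```java
// A hypothetical sketch of granular Spring Integration wiring for a MongoDB
// source, assuming spring-integration-mongodb is on the classpath. The
// collection name, query, and polling interval are illustrative assumptions.
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.expression.common.LiteralExpression;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.mongodb.inbound.MongoDbMessageSource;

@Configuration
public class MongoSourceConfig {

    @Bean
    public MongoDbMessageSource mongoMessageSource(MongoTemplate mongoTemplate) {
        // Query only documents not yet published; the "published" flag is a
        // hypothetical tracking mechanism that the out-of-the-box source lacks.
        MongoDbMessageSource source =
                new MongoDbMessageSource(mongoTemplate, new LiteralExpression("{'published': false}"));
        source.setCollectionNameExpression(new LiteralExpression("events"));
        return source;
    }

    @Bean
    public IntegrationFlow mongoSourceFlow(MongoDbMessageSource mongoMessageSource) {
        return IntegrationFlows
                .from(mongoMessageSource, e -> e.poller(Pollers.fixedDelay(5000)))
                .handle(message -> {
                    // Hand off to downstream channels or binders; here we just log.
                    System.out.println("Fetched: " + message.getPayload());
                })
                .get();
    }
}
```

The point is not this particular flow, but that dropping down to Spring Integration exposes the tracking and polling knobs that the out-of-the-box starter hides.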
Build your own technology radar on the web or via the GitHub project.
While the Spring ecosystem provides a significant number of frameworks, the Spring Cloud family continues to grow and now provides dozens of frameworks within this broader ecosystem. Anyone entering this space should understand that the Spring Boot product on which these newer frameworks have been built was also in turn built on earlier frameworks. And effective development teams should expect to dive into these frameworks over time, either as they seek to understand how Spring Cloud frameworks work, or because they are forced to do so after going beyond out-of-the-box usage.
The relatively terse statement that ThoughtWorks provides about Spring Cloud happens to mention three different Spring Cloud frameworks: Spring Cloud Stream, Spring Cloud Zookeeper, and Spring Cloud Sleuth. At least, these are the frameworks that would have been mentioned had they been correctly identified by name (e.g. a framework called "spring-cloud-streams" does not actually exist). While I have used Spring Cloud Sleuth, it is Spring Cloud Stream with which I've recently spent the most time, so I will address it a bit here.
While Spring Cloud Stream does in fact offer connectors for Apache Kafka and RabbitMQ, it also offers connectors for dozens of other products. The focus here should really be the binders that Spring Cloud Stream offers for these two products out of the box. Traditionally, the term "connector" implies that some type of adapter is being made available to either transmit data from or to another product, while at the same time minimizing the custom code that needs to be written to do so. Spring Cloud Stream does in fact offer such adapters, but Spring Cloud Stream offers an abstraction on top of these middleware products (if this term can be used without offense) when used between such source and sink endpoints.
And a Spring Boot project that makes use of Spring Cloud Stream needs to be bound to one of these two products, because these are the products that can be used to transmit data between endpoints. If you are familiar with Apache Kafka and RabbitMQ, however, you will also understand that while Spring Cloud Stream offers a common abstraction here, not all options are available for both since these products have some stark differences (see the discussion on "event streaming as the source of truth" below). Yes, both of these products are used for messaging, but Apache Kafka provides additional functionality.
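To make the abstraction concrete, here is a minimal sketch of a Spring Cloud Stream processor using the annotation model current as of this writing; the uppercasing logic is a stand-in, and destinations would be mapped in application properties.

```java
// A minimal sketch of a Spring Cloud Stream processor, assuming a Kafka or
// RabbitMQ binder starter is on the classpath. Binding destinations would be
// configured via spring.cloud.stream.bindings.* properties.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

@SpringBootApplication
@EnableBinding(Processor.class) // binds "input" and "output" channels to the configured middleware
public class UppercaseProcessorApplication {

    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public String transform(String payload) {
        // Stand-in business logic: transform each inbound payload.
        return payload.toUpperCase();
    }

    public static void main(String[] args) {
        SpringApplication.run(UppercaseProcessorApplication.class, args);
    }
}
```

Swapping between the Kafka and RabbitMQ binders requires no changes to this code, only a different starter dependency and binder-specific properties, which is precisely where the "not all options are available for both" caveat comes into play.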
Overambitious API gateways
Radar: We remain concerned about business logic and process orchestration implemented in middleware, especially where it requires expert skills and tooling while creating single points of scaling and control. Vendors in the highly competitive API gateway market are continuing this trend by adding features through which they attempt to differentiate their products. This results in overambitious API gateway products whose functionality — on top of what is essentially a reverse proxy — encourages designs that continue to be difficult to test and deploy. API gateways do provide utility in dealing with some specific concerns — such as authentication and rate limiting — but any domain smarts should live in applications or services.
Mentioned under the aforementioned entry for "Spring Cloud" is Zuul, part of Spring Cloud Netflix. Zuul is a very lightweight API gateway that can be placed in front of other microservices as an "edge service application", providing filters associated with dynamic routing, monitoring, resiliency, security, and other functionality. Details about Zuul can be found in its GitHub project. Notice that the GitHub explanation lists Spring Cloud as one of the projects that make use of Zuul; this is because Spring Cloud provides an abstraction over Zuul, just as other Spring projects do for numerous other frameworks.
Using Spring Cloud, Zuul functionality can be enabled by annotating a Spring Boot microservice, which provides familiarity to developers who are already comfortable with Spring Boot. Implementing an API gateway in this manner is very lightweight. The Spring Cloud Netflix project summarizes Zuul as "intelligent routing" because this is really the focus. Such a microservice can be deployed in front of other microservices, exposing the same API as some subset of them, to serve as a placeholder in case changes are needed down the road; but the intention is to use this layer to perform functionality on the edge before the underlying microservices are reached.
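As a sketch of how lightweight this can be, the following is essentially all the code such an edge service requires, assuming the Spring Cloud Netflix Zuul starter is on the classpath; the route named in the comment is a hypothetical example.

```java
// A minimal sketch of a Zuul-based edge service via Spring Cloud Netflix,
// assuming spring-cloud-starter-zuul is on the classpath. Routes would be
// declared in application properties (e.g. zuul.routes.orders.path=/orders/**,
// where "orders" is a hypothetical route name).
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;

@SpringBootApplication
@EnableZuulProxy // turns this Spring Boot application into a proxying edge service
public class EdgeServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(EdgeServiceApplication.class, args);
    }
}
```

Everything else, from route mappings to edge behavior, is layered on through configuration and optional filter beans.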
A distinction needs to be made between such a lightweight API gateway and other products available in the marketplace. Posted nearby is an "API landscape" image that I came across at an early point of a recent client project of mine, containing none of the just discussed open source projects. The commercial products that my team and I evaluated for my recent client are positioned in the lower left of this image, within the "API management" subcategory of "API lifecycle platform", and these products additionally include functionality that overlaps with some of the other subcategories included here, such as "API documentation platform", "API testing", "API analytics/monitoring", "Developer portal", and "Access level and identity management".
But these areas of functionality do not include any domain smarts. In my opinion, even the ability to serve static reference data directly from the Google product that my team and I evaluated is going into overambitious territory. Just as a circa-2005 enterprise service bus (ESB) should not be used for the messaging functionality discussed later in this post, neither should such a product be used as an API gateway. Among many other reasons, not only is it ill advised to place business logic and/or rules outside of the domains intended to host this information, it is also ill advised to rely on heavyweight, specialized tooling here.
Colleagues and I have frequently joked about one of the risks here: spending more time figuring out how to use tooling than actually implementing the needed functionality. The unfortunate reality is that all API gateways run the risk of either providing overly ambitious options out of the box, or providing the means to go in this direction through configuration or plugins. Interestingly, this ThoughtWorks Radar blip is somewhat similar to the "recreating ESB antipatterns with Kafka" blip in the techniques quadrant, because in both cases best practices are being ignored and products are being stretched well beyond their intended purposes. Of course, one of the differences is that the platform in question pertains to multiple competing products, whereas the identified technique is limited to use of Kafka to fulfill a certain philosophy rather than just open-ended bloat.
The history provided for the overambitious API gateways blip shows that ThoughtWorks has consistently relegated it to "hold" status since its first appearance in the November 2015 radar, which makes me wonder why ThoughtWorks routinely removes entries from its radars to make way for new entries, but chooses to keep some intact for a much longer period. Perhaps because in this case there is some staying power. When choosing open source technologies to adopt, I typically look at project history as one factor to consider, and during our whiteboard sessions it was helpful to understand why Netflix, a Google customer, chose to continue to make use of Zuul rather than the "microgateways" that Google now offers as an alternative. If there is anything to be gained here, be sure to do your due diligence.
Event streaming as the source of truth
Radar: As event streaming platforms, such as Apache Kafka, rise in popularity, many consider them as an advanced form of message queuing, used solely to transmit events. Even when used in this way, event streaming has its benefits over traditional message queuing. However, we're more interested in how people use event streaming as the source of truth with platforms (Kafka in particular) as the primary store for data as immutable events. A service with an Event Sourcing design, for example, can use Kafka as its event store; those events are then available for other services to consume. This technique has the potential to reduce duplicating efforts between local persistence and integration.
You might have caught an earlier blog post of mine in which I walk through the concepts of "Single Source of Truth", "Single Version of the Truth", and "Single Source of Data" alongside the increasingly used "Source of Truth" and "Shared Source of Truth" phrases I have come across within the context of data streaming in the open source community. For a recent client digital transformation effort, I was initially tasked with addressing data ingestion on the edge of the enterprise, but was later asked to additionally address internal messaging as we worked through the product procurement process for the initial work.
As I evaluated the current state tooling used at this client alongside Apache Kafka, RabbitMQ, and Java frameworks for microservices integration, I was reminded of my earlier statement in an oft-visited data scientist post that "many software developers do not value data because they take it as a given as being available, think of it as 'the easy part', do not understand it, or do not wish to work on it" (which I elaborated upon in a follow-up post a couple months later). These maxims hold true for data at rest as well as for data in motion. But when it comes to data in motion, what does this data represent?
When I initially introduced Apache Kafka to my client as the likely product for use on the edge, I was thinking in terms of real-time analytics, as data scientists would likely find beneficial use of raw data for analysis before cleansing, standardization, and the presumably lengthy current state batch processes associated with these tasks take place. One of the comments I heard early on from developers was that Kafka was seen as a messaging product, and since "RabbitMQ can handle anything thrown at it", Kafka was likely not necessary and perhaps overkill for what the client needed.
But as I explained in the earlier blog post mentioned above, Kafka is a database. It stores data. While it has been disappointing to see that db-engines.com (a website that ranks databases by popularity, which I have been referencing for the last few years) does not list data streaming products such as Kafka, it was refreshing to see that a recent O'Reilly report included it in its discussion, even though InformationWeek chose not to reference Kafka when the magazine recently quoted me regarding this report. But should one use Kafka or RabbitMQ? What use cases are being contemplated for use with these products?
With respect to data in motion representing events, both Kafka and RabbitMQ can be used. In the case of Kafka, events are stored for a configurable time period, with consumers of these events each keeping track of which events they have consumed. Because these events are stored, consumers can replay events by simply pointing back to events that have already been consumed. In the case of RabbitMQ, producers essentially keep track of events that have been propagated to messaging exchanges and subsequent queues. RabbitMQ keeps track of which events have been consumed for the convenience of consumers, but consumption is also performed in the literal sense of the word, as messages are removed after being consumed.
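The replay capability is easiest to see in code. Below is a sketch using the plain Kafka consumer API; the broker address, topic name, and group id are illustrative assumptions.

```java
// A sketch of Kafka's consumer-side offset tracking and replay; the broker
// address, topic name, and group id are illustrative assumptions.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "replay-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("events", 0);
            consumer.assign(Collections.singletonList(partition));

            // Because the broker retains events for the configured period, the
            // consumer can rewind its own offset and replay events at will.
            consumer.seekToBeginning(Collections.singletonList(partition));

            ConsumerRecords<String, String> records = consumer.poll(1000L);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```

There is no equivalent rewind with RabbitMQ: once a message has been consumed and acknowledged, it is gone from the queue.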
So what this means is that separate data stores need to be used in the case of RabbitMQ. The event logs, so to speak, are not stored in the same product as is the case with Kafka, but need to be stored elsewhere. The recent client that I mention here had preemptively decided to make use of MongoDB instances to store events, so I tackled the problem space by considering use of this database product alongside RabbitMQ as the messaging mechanism. My advocating the adoption of RabbitMQ was the result of a number of different considerations, including its good fit within the ecosystem already adopted, as well as its relatively lesser complexity when it comes to long-term maintenance, as a fair number of traditional IT shops have communicated challenges with the alternative.
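In code, the shape of this pairing might look something like the following sketch using Spring AMQP and Spring Data MongoDB; the collection and exchange names are hypothetical, not the client's actual design.

```java
// A hypothetical sketch of the RabbitMQ-plus-MongoDB shape described above,
// using Spring AMQP and Spring Data MongoDB. The "events" collection and
// "events.exchange" exchange are illustrative assumptions.
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.stereotype.Service;

@Service
public class EventRecorder {

    private final MongoTemplate mongoTemplate;
    private final RabbitTemplate rabbitTemplate;

    public EventRecorder(MongoTemplate mongoTemplate, RabbitTemplate rabbitTemplate) {
        this.mongoTemplate = mongoTemplate;
        this.rabbitTemplate = rabbitTemplate;
    }

    public void record(Object event) {
        // Persist the immutable event to the separate event store...
        mongoTemplate.save(event, "events");
        // ...then publish it for downstream consumers via a fanout exchange.
        rabbitTemplate.convertAndSend("events.exchange", "", event);
    }
}
```

Keeping these two writes consistent is its own design problem, and this duplication between local persistence and integration is exactly what the radar entry suggests event streaming as the source of truth can reduce.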
Jupyter
Radar: Over the last couple of years, we've noticed a steady rise in the popularity of analytics notebooks. These are Mathematica-inspired applications that combine text, visualization and code in a living, computational document. In a previous edition, we mentioned GorillaREPL, a Clojure variant of these. But increased interest in machine learning — along with the emergence of Python as the programming language of choice for practitioners in this field — has focused particular attention on Python notebooks, of which Jupyter seems to be gaining the most traction among ThoughtWorks teams.
In early 2017, I gave an internal technology talk that walked fellow developers and architects through my progressive use of the R language for exploratory data analysis, and of Python specifically for machine learning. Shortly after I gave this talk, I led a project for a client during which I developed software within the context of shadow IT for the first time. I ended up using Python, making extensive use of the "openpyxl" library to automate comparisons of data output from a commercial HRMS (human resources management system) that would otherwise be a mind-numbing, error-prone process.
My interest in the R language initially came about during my 2012 attendance of BigDataCamp Chicago, when I attended the "Intro to Hadoop & R & Big Data" talk. Later that year, I adopted the R language for pro bono analytics work that was to last about three years. My initial focus was the ggplot2 package, with which I created numerous data visualizations. But as I wanted to go beyond visualizations and do more exploratory data work, I took several data science courses which made use of R, and the coursework required use of the "knitr" package for dynamic reports.
While I thought knitr served its purposes well, use of it is bound within a proprietary development environment. After my decision to focus specifically on machine learning, I made a move to Python since R doesn't really lend itself to operationalizing code. And it was at this time that I discovered Jupyter Notebook (formerly IPython Notebook). Making use of computational notebooks from a browser outside a proprietary development environment is phenomenal. Unlike knitr, which required re-"knitting" every time code is changed so that the document can be updated, Jupyter enables updating code piecemeal throughout a document and running it apart from the rest of the document.
I'm not clear on the reasoning that ThoughtWorks used to place Jupyter in the "assess" ring of the radar. Jupyter appears as a newly added tool in the quadrant, and given the premise that a progression exists from "assess" to "trial" to "adopt", it arguably should have appeared sooner. But it makes sense to look at the definition that ThoughtWorks currently provides for "assess": "Worth exploring with the goal of understanding how it will affect your enterprise". From my experience, all tooling should be assessed within the context of how one's enterprise is affected, and part of the assessment process should involve trial usage (i.e. hands-on usage), not just reading about it.
But the definition that ThoughtWorks provides for "trial" is the following: "Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk." See what I mean about "trial" and "assess" seemingly overlapping? One of my questions here is why risk is being addressed when the whole point of trying out a particular technology is to mitigate risk. The differentiator between "assess" and "trial" must be that "trial" involves greater commitment: after determining that enterprise fit has been met, moving forward to work with the technology more extensively. I'm not fond of this breakdown, because if assessing enterprise fit has its own dedicated ring, how can ThoughtWorks actually pick technologies for the "trial" ring on a reader's behalf? Some unexplained filtering is going on here.
All this said, Jupyter Notebook has support for over 100 programming languages, and it is worth commenting that some kernels are more mature than others. For use with Python specifically, there is probably very little reason not to adopt Jupyter Notebook, but for other languages, the decision may not be so clear cut. For the sake of simplicity, ThoughtWorks probably included this tool in one of its outer rings for this very reason. ThoughtWorks makes the comment that "when Radar items could appear in multiple quadrants, we chose the one that seemed most appropriate". For the sake of a broader audience, it should probably state that Radar blips could appear in multiple rings as well.