Community Comment: Part 4
The comments I provided in reaction to a community discussion thread:
https://www.linkedin.com/feed/update/urn:li:activity:6727853891293089793?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6727853891293089793%2C6733693477084835840%29
Head of Artificial Intelligence at German consulting firm: This is how we were building streaming data platforms five years ago. I'd like to have a discussion: how would you build such a thing in 2021? How would you do it in the cloud? How on-prem? If you feel like it, post your favorite setup as a comment.
24/7 Spark Streaming on YARN in Production
https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/
Gfesser: Spark Streaming has since been deprecated in favor of Structured Streaming. That said, I noticed that some of the responses here reference specific commercial products. It's good practice to look under the covers to determine which open source libraries and frameworks, if any, these products use, both to aid product comparisons and to gauge architectural flexibility. Some might be interested to know that Andreessen Horowitz very recently put together a reference architecture called "unified architecture for data infrastructure", based on input from practitioners at leading data organizations. It discusses "(a) what their internal technology stacks looked like, and (b) whether it would differ if they were to build a new one from scratch", and it presents three common blueprints.
Emerging Architectures for Modern Data Infrastructure
https://www.linkedin.com/posts/erikgfesser_the-emerging-architectures-for-modern-data-activity-6725395085095174144-VVod