Media Query Source: Part 40 - CIO (US digital magazine); 10 key roles for AI success
- CIO (US digital magazine)
- 10 key roles every AI team needs
- Data engineer role is foundational
- Suitable data leads to trustworthiness
My responses were included in an article at CIO (June 7, 2022). In the image above, taken from the cited article, verbatim quotes are highlighted in orange and paraphrased quotes in gray.
The query responses I provided to a media outlet on May 6, 2022:
Media: I'm contributing to an article on 10 key roles every AI team needs. What are the key titles, what does each function do? (For example, data scientists, ML engineers, product managers, data engineers.) I'd love to hear from individuals holding those jobs who can briefly talk about what their roles entail, or enterprise AI managers who manage these teams.
Gfesser: I'll speak to "data engineer", a role in which I have served in the past and for which I currently hire and manage, although I have served in other related roles as well.
While the roles of data engineer and ML engineer are closely related, I consider the data engineer foundational for both ML and non-ML initiatives.
In a nutshell, data engineers build data pipelines to collect and assemble data for downstream usage, and in a DevOps world, additionally build pipelines to implement the infrastructure on which these data pipelines run.
For example, when implementing data pipelines in one of the public clouds, a data engineer needs to first write the scripts to spin up the necessary cloud services which provide the compute necessary to process ingested data.
This ingested data is then transformed and subsequently stored in increasing states of consumption readiness, which can then be consumed for a variety of uses, including ML.
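As a rough illustration of "increasing states of consumption readiness", the following minimal sketch moves a small dataset through three stages. The stage names (raw, cleansed, curated), the sample records, and the function names are all hypothetical and chosen only for illustration; a real pipeline would typically read from and write to cloud storage rather than in-memory structures.

```python
import csv
import io

# Hypothetical raw input, standing in for data ingested from a source system.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,alice,4.50
4,,7.00
"""


def ingest(text):
    """Raw stage: parse the source as-is, with no cleanup."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleansed stage: normalize names, cast amounts, drop unusable rows."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().lower()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if not customer:
            continue  # skip rows with no customer
        cleaned.append(
            {"order_id": int(row["order_id"]), "customer": customer, "amount": amount}
        )
    return cleaned


def curate(rows):
    """Curated stage: aggregate per customer, ready for analytics or ML features."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


raw = ingest(RAW_CSV)        # stored exactly as received
cleansed = cleanse(raw)      # validated and typed
curated = curate(cleansed)   # shaped for a specific consumption use case
print(len(raw), len(cleansed), curated)
```

Keeping each stage as a separate step means a downstream consumer can choose the level of readiness it needs, and a bad transformation can be rerun from the raw data without ingesting it again.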
Data engineers are typically required to be skilled in languages such as Python, SQL, and Scala; in libraries used for cleansing and transforming data; and, when large volumes of data are involved, in Apache Spark, the de facto standard for distributed data processing.
Data engineers are considered foundational because data needs to be collected and made suitable for consumption before anything trustworthy can be done with it.
Note that the role of data engineer has evolved over the last several years to the point that it is not only considered critical to all serious data initiatives but is also converging with other roles needed by ML teams.
For example, rather than waiting for a data engineer to build what a data consumption use case requires, someone in another role, such as a data scientist, can do this work themselves, or vice versa.
That said, it is typically easier for a skilled data engineer to pick up data science skills than the reverse, both because of their presumed software development background and because the industry has paid less attention to building data engineering skills than data science skills.