This week, we spoke with Niall Wharton, Team Lead in Data Science, Machine Learning and Big Data at Xcede, to learn more about data engineers and the significant impact they’re making.
Hi Niall, thank you for volunteering to speak with us again. First off, what does a data engineer do?
The core role of a data engineer is to store data and make it available for data scientists and data analysts to use. It’s a vital relationship for operations - you help them do their job by pulling all of this different data into one digestible place.
But, as Callum Ryan (Senior Data Engineer at Concirrus) says, the responsibilities of a data engineer can vary, depending on the sector, scale and need of the business in question.
“It's a bit difficult to describe a data engineering role generically, as the role can mean different things in a number of businesses, but I think fundamentally it's all about ingesting data and keeping it clean.”
What data does a data engineer work with?
In a perfect world, data engineers would work with generic, quantitative and structured data. However, data engineers frequently work with highly unstructured data, such as computer vision images, sound recordings and text. Think about how you would handle vision data from CCTV cameras and accurately highlight who was moving at what time. The field has become progressively more complicated - which many engineers love, as the landscape is always changing and there are regularly new challenges to solve!
Thinking about 2020 so far - how have data engineers had to adapt in light of the pandemic?
The recent logistical changes surrounding communication (due to remote working) have had a considerable impact on the data field. The very best data engineers frequently interact with their stakeholders; recently they’ve had to adapt to maintain a clear line of dialogue when working from home.
Remote working has also made task management more difficult. If data isn’t stored, or isn’t accessible, the data scientist can’t continue with their work. Data engineers have to be constantly aware of the deadlines for each of their projects and of how their work impacts others.
What programs or platforms do data engineers commonly use?
It varies a lot between different companies, but the most common thing we’re seeing is everyone going cloud-led. Accordingly, experience with AWS is a popular requirement; however, GCP and Azure are quickly becoming platforms of choice. On-premises platforms such as Hadoop are fast becoming outdated.
What does 2021 look like for a data engineer?
The importance of data engineers is increasing rapidly, so the profession looks set to grow next year and see steady salary increases. It used to be that senior data scientist roles were more popular, because they were the ones producing the exciting insights at the end - but things are changing, and data engineers are becoming recognised in their own right.
2021 will also see an increased demand for data engineers who can productionise algorithms to work at scale. Data scientists often lack the engineering capabilities to do this with a company’s full product, making data engineers increasingly integral.
Looking at the specific requirements for a data engineer role, what qualifications and skills are businesses seeking?
From a technical side, businesses want engineers adept at a couple of programming languages, such as Python, Java or Scala. Data engineers can also obtain certification in cloud technologies and a Databricks certification for Spark.
A solid grasp of the data landscape and the tasks at hand is instrumental, as Karim Essawi (Senior Data Engineer at Joyn GmbH) says:
“For me, a capable data engineer would demonstrate a sound understanding of the challenges of designing and implementing scalable and resilient self-service data pipelines and data products that enable stakeholders to make data-informed decisions.”
From a soft skills side, there are two essential qualities employers want:
1. Problem-solving. In a perfect world, data comes from one place to be stored in another. This isn’t reality. For example, Netflix would have multiple data sources, such as TV, mobile and web data. Bringing these together is a lengthy process, so you must be good at working out problems with code and logistics.
2. Patience. You must be patient when handling stakeholders who might not appreciate the complexity of the task you’re doing, and be able to explain why something is or isn’t possible.
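To make the multi-source problem in point 1 a little more concrete, here is a minimal Python sketch of pulling three differently shaped feeds into one common format. Everything in it is an assumption for illustration: the sample records, the field names and the helper functions are hypothetical and do not reflect how Netflix or any other company actually stores its data.

```python
from datetime import datetime, timezone

# Hypothetical raw events: each source uses its own field names and time format.
TV_EVENTS = [{"user": "u1", "title": "Show A", "ts": 1600000000}]
MOBILE_EVENTS = [{"userId": "u2", "content": "Show B",
                  "time": "2020-09-13T12:30:00+00:00"}]
WEB_EVENTS = [{"uid": "u1", "name": "Show B", "epoch_ms": 1600000100000}]


def from_tv(rec):
    # TV events carry a Unix timestamp in seconds.
    return {
        "user_id": rec["user"],
        "title": rec["title"],
        "watched_at": datetime.fromtimestamp(rec["ts"], tz=timezone.utc),
        "source": "tv",
    }


def from_mobile(rec):
    # Mobile events carry an ISO 8601 string with a UTC offset.
    return {
        "user_id": rec["userId"],
        "title": rec["content"],
        "watched_at": datetime.fromisoformat(rec["time"]),
        "source": "mobile",
    }


def from_web(rec):
    # Web events carry a Unix timestamp in milliseconds.
    return {
        "user_id": rec["uid"],
        "title": rec["name"],
        "watched_at": datetime.fromtimestamp(rec["epoch_ms"] / 1000, tz=timezone.utc),
        "source": "web",
    }


def consolidate(tv, mobile, web):
    """Map every record onto one schema and order the result by viewing time."""
    rows = (
        [from_tv(r) for r in tv]
        + [from_mobile(r) for r in mobile]
        + [from_web(r) for r in web]
    )
    return sorted(rows, key=lambda r: r["watched_at"])


if __name__ == "__main__":
    for row in consolidate(TV_EVENTS, MOBILE_EVENTS, WEB_EVENTS):
        print(row["source"], row["user_id"], row["title"], row["watched_at"].isoformat())
```

Even in this toy version, most of the effort goes into reconciling field names and time formats rather than into the storage itself, which is roughly the kind of problem-solving described above.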
Callum Ryan from Concirrus summarises the balance between technical and soft skills nicely:
“There are some more obvious important parts of the job, like being able to write code in different languages (Python/Scala), bash scripting, an understanding of cloud technologies, and recently a bit more DevOps with IaC (Terraform).
But I also believe there is a real requirement to have softer skills, like clear communication and being able to explain complex technical concepts to a less technical audience.
It's also very important to be flexible. One minute I could be debugging a data issue in a pipeline, and the next helping a data scientist with their code.
And, a good engineer should always be learning. There's always something new around the corner, and there's always something to learn, be it about how the wider part of your business operates, or technology-wise.”
Thank you, Niall - anything else you want to add?