
How to become a data engineer?

Ok, now that we've spent some time understanding the current state of affairs when it comes to data engineering as a function or career choice, let's take a brief look at how to actually get there.

First off, none of today's data engineers grew up dreaming of becoming a data professional. Period. There is always a weird list of circumstances that led to the point where it became necessary to update their LinkedIn title to "Data Engineer" or something similar. For the last 10-15 years, the more prevalent the systematic extraction of value from data has become on a macroeconomic level, the more software engineers and data analysts have been moving (or have been pushed) towards setting up and maintaining the infrastructure that enables data products on the micro level. The undisputed (yet more and more often criticised) hype around data science over the last 8 years has accelerated the demand for a broader understanding of how to deal with data within organisations. It has also shown that shareholders welcome a progressive approach, as it signals the intention and commitment to achieve market dominance through innovation (just think how many times you've heard about products with "AI" inside). When it comes to talent, tooling and organisational structure, the last few years were all about the rapid professionalisation of the data ecosystem and, as in any healthy market that is starting to take shape, many different competing offerings have emerged.

For the last five years, there has been an ongoing discourse between industry stakeholders and the job market trying to codify the role and responsibilities of a data engineer. With expectations converging about what this position should include on various levels, the educational market has started to take note and to try to address this need. However, the focus of data teams and the corresponding data tools change quickly, so the involved disciplines and their know-how have to adapt just as fast. The adjustment of the supply side (available people on the job market with the right skillset) takes time, though, as it is influenced by many other general economic and individual factors, and this seemingly endless yet rapid shapeshifting on both sides is what makes the supply-demand gap for data engineering as a function so difficult to define and grasp. As a benchmark, data science seems to be 5-6 years ahead. At Pipeline Data Engineering Academy we have a distinct point of view about the approach, general attitude and skillset a data engineer has to show in order to succeed, but for the sake of this post, allow me to address this in detail in a separate one.

Below you can find a non-comprehensive overview of the educational sector's sometimes outstanding and sometimes half-assed attempts at hitting a moving target. Be warned: you'll have to navigate between well-written, didactically thoughtful learning tools and more or less empty, pointless yet overpriced courses that take advantage of the overhyped expectations around a career in data.

So you've made the decision to become a data engineer

As mentioned above, there are various formal options promising to help you get there, so let's start with some self-assessment in order to understand which path would be the right one for you.

  • Motivation: is data engineering a hobby you'd like to explore, a prerequisite for your side gig, or the focus of your future career; is your motivation intrinsic or extrinsic; would you like to improve your knowledge or to receive a certificate you can show off etc.

  • Experience: how new are you to the world of data and software?

  • Time: think about the long-term commitment and the (quality) time available on a daily basis you can dedicate to learning

  • Costs: don't just consider the tuition or the subscription fees for courses, but the additional costs for housing, opportunity costs etc.

  • Learning methodology: do you like to learn at your own pace by yourself, are you looking for a classroom experience with a lot of social interaction, would you like to focus on industry best practices or on a more academic approach etc.

Keep in mind: there is no one-size-fits-all solution, but having an overview helps you to exclude the options that most likely won't work for you. If you have the chance to become an intern or an apprentice next to an experienced data engineer, you should probably take it - learning on the job is the fastest way to learn of all the options below. Think strategically: invest your time and your money into the path that projects the highest ROI in your particular situation.
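To make the point about costs a bit more tangible, here is a minimal Python sketch that adds up tuition, extra living costs and opportunity costs for a learning path. Everything in it is an assumption for illustration: the LearningPath class, its field names, the hourly value you put on your own time and the example figures are hypothetical and not taken from any real course.

```python
from dataclasses import dataclass

WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


@dataclass
class LearningPath:
    """A hypothetical description of one way to learn data engineering."""
    name: str
    tuition: float                    # course, subscription or tuition fees
    months: float                     # total duration of the path
    hours_per_week: float             # time you dedicate to learning
    extra_monthly_costs: float = 0.0  # e.g. additional housing costs

    def total_cost(self, hourly_value: float) -> float:
        """Tuition + extra living costs + opportunity cost of your time."""
        hours = self.months * WEEKS_PER_MONTH * self.hours_per_week
        opportunity_cost = hours * hourly_value
        living_costs = self.months * self.extra_monthly_costs
        return self.tuition + living_costs + opportunity_cost


if __name__ == "__main__":
    # Made-up example figures, only to show how the comparison works.
    paths = [
        LearningPath("online subscription", tuition=400, months=12, hours_per_week=5),
        LearningPath("full-time bootcamp", tuition=10_000, months=3,
                     hours_per_week=40, extra_monthly_costs=500),
    ]
    for path in sorted(paths, key=lambda p: p.total_cost(hourly_value=20)):
        print(f"{path.name}: {path.total_cost(hourly_value=20):,.0f}")
```

The number this produces is only the cost side of the ROI: the expected return (a new job, a higher salary, a finished side project) depends on your motivation and situation and has to be weighed against it separately.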

Online courses

Let's start with the online courses that offer an easy and affordable entry and self-paced learning paths for data engineers. Popular MOOC platforms like Coursera and Udemy offer learning tracks pieced together mostly from an abundance of data science, data analysis, cloud engineering, Python and SQL programming courses, while Udacity offers a nanodegree you can earn by investing about 4-5 months of your time. This is a result of the above-mentioned vague definition/understanding of what data engineering is and what it is supposed to be (at least that's what haters will say), and shows that thinking about data infrastructure and the non-fancy plumbing work one is confronted with later on is more of an afterthought than a real focus. With that being said, one has to show love for the quality of the courses, teachers and the materials: they can be engaging and rewarding, but by definition they will be far from what the industry considers state of the art. Easy-to-use websites and mobile apps with simple exercises (especially if you are looking into learning Python) all help you immerse yourself in the world of data-related roles, but every time you hit the enroll button just ask yourself: are you in for the know-how or the certificate?

Datacamp and Dataquest are both specialised platforms focusing on the three career paths we consider traditional nowadays: data analyst, data scientist and data engineer. Heavily discounted subscription fees and extremely well-made missions help you get started and build up the confidence to solve the coding challenges you are faced with during a regular job interview process. Whether you prefer text-based or video-based learning, Python or R, whether you like to engage in community discussions or rather work by yourself should determine which one to pick. As a result of being much more domain-specific than generic MOOCs, they offer more depth and cater for the future data professional significantly better (e.g. integration of Jupyter). For $200-$500 per annum you can get access to all their career paths, so you can switch from the data engineer path to the data analyst path in case you deem the first one too challenging along the way.

We can't ignore branded trainings organised by the key players in the data ecosystem: hardware manufacturers and SaaS providers offer extensive trainings so their clients can actually deal with their products (Google Professional Data Engineer, Microsoft Azure Data Engineer, Oracle Cloud, Cloudera Certified Professional Data Engineer etc.). These courses are usually meant for engineers with a decent level of experience, and offer very specific know-how as opposed to a more general approach towards making things work. Pluralsight offers a wide range of courses mainly aimed at B2B clients and their workforce, but it also provides a self-assessment tool to support you in picking the right courses matching your skill level and your goals.

All fully online trainings lack the social component that defines traditional education: a multi-layered experience. The lack of direct in-person interaction with teachers, instructors, mentors and your peers turns the learning experience into a very efficient transfer of value, but it won't create a community in the traditional sense and it won't make you experience the social-emotional landscape of a school. Online trainings most likely won't enable you to become part of a (professional) network or an alumni group you can leverage later in life, and it's difficult for these platforms to provide proper career coaching as a result of the cultural and economic differences between regions their students come from (think job market in Silicon Valley vs. job market in Brazil).

Recommended: for newcomers trying to get a feel for what data professionals do and how disciplines compare, for software professionals without exposure to data products or statistics yet, BI managers and analysts broadening their competences and leveling up etc.

Higher education

Oh, the never-ending discussion about the countercyclical relationship between economic cycles and the offering and enrollment at higher ed institutions... Instead of going into detail about the reasons for disparities between geographic regions and socioeconomic structures in the sector of higher education, let's agree on a couple of fairly provocative statements: most universities are slow to adapt and tend to focus more on certification than the actual transfer of know-how. Historically, anticipating significant changes on the demand side of the job market and providing an appropriate response in the form of an adjusted, market-relevant curriculum has not been their strong suit. Hence we can observe that on the European continent - taking it as an example as it is the most relevant for Pipeline Academy - there are only a handful of universities offering Data Engineering in the form of a course or a master's degree (MSc): Technische Universität Berlin (GER), Technische Universität München (GER), Hasso Plattner Institute (GER), Jacobs University (GER), Data ScienceTech Institute (FRA), Universität Potsdam (GER) and Charles University (CZ), to name a few that have already integrated data engineering into their program.

The pros and cons of enrolling at a university (time commitment, financial investment, degree as a signal for the job market etc.) are something I won't cover here, yet I encourage you to take a close look at the curriculum, the teachers and the alumni to get a sense of the quality of their education. Is a MOOC an alternative to the university experience? Well, it is not. But they can complement each other very well. Also, keep in mind that the unique landscape of colleges and universities (especially the top tier) in the US will sooner rather than later become a playground of Silicon Valley, so expect a significant change in what we consider today a "university experience".

Recommended: for computer science and software engineering graduates with a BSc, business and economics students looking for a specialisation that enables them to start their career at a tech company etc.

Coding bootcamps and live trainings

Software engineering is a trade without any strict professional standards (as opposed to medical doctors, architects, lawyers etc.). Up until now, the demand for software/data professionals on the job market has been constantly higher than the supply coming from formal education, and this trend does not seem likely to change anytime soon. These two circumstances led to the emergence of coding bootcamps taking on the role of making certain popular professions in tech (web developers, UX/UI designers etc.) accessible to anybody. First and foremost, they promise focus and speed for a fraction of the costs of a traditional education (depending on the duration, classroom size, reputation etc. this will fall somewhere between $5,000-20,000 in the US, in Europe it's around €6,000-16,000). They also offer a classroom experience with approachable instructors, a project portfolio that can be leveraged when looking for a new job, and of course career coaching that will make you nail job interviews.

Data science and its little sister, the more approachable data analysis, have been on the menu of bootcamps for a couple of years now, and their popularity is indisputable. Data engineering, however, has remained untouched for several reasons:

  1. The handful of experienced data engineers are currently hotter than ever on the job market, scoring the highest salaries among data professionals.

  2. Not all data engineers have the willingness and/or the capability to teach students.

  3. Setting up a proper data engineering curriculum and figuring out the right admission process, approaches, tools, methodologies is an investment most bootcamps don't have the capacity to make.

This is where Pipeline Data Engineering Academy comes in to fill the gap. Our mission is to teach you how to build and maintain data products, machine learning systems and business intelligence tools through a 12-week intensive course. It is our belief that this knowledge is going to set you up for a sustainable and rewarding long-term career.

But I digress...

Before you sign up to our next training, I urge you to read Quincy Larson's The Coding Bootcamp Handbook: it will help you prioritise your selection criteria and become more informed before you make a decision about joining any coding bootcamp. Approaching this journey with a certain level of scepticism is the healthy way forward. You probably want to take a look at student feedback on CourseReport and SwitchUp, the two leading coding bootcamp review sites: just make sure you consider survivorship bias while reading through the comments. For the folks in the US, Thinkful has created a smart and simple bootcamp comparison tool.

A note on COVID-19: most coding bootcamps have been forced to move their in-person courses online. There is a good reason why these institutions had classrooms in the first place, and this is something everyone expects to come back one way or the other after the corona restrictions are lifted. However, in case you choose to have a 100% remote coding bootcamp experience, you should ask yourself if the value for money ratio still holds up when compared to a recorded online training.

Recommended: for data analysts and jr. data scientists looking for expertise in building and maintaining data products, for engineers from different areas (frontend, devops etc.) transitioning into a new field etc.

Other resources

Whatever path you choose, there are a handful of resources that you should know about in order to complement your learning experience and to help you navigate your own ship. In case you find something you think should be on this list, please reach out to me and let me know!

Notable articles and blog posts:

I encourage you to explore your options and start your transition into the world of data engineering.