
The Data Engineering Portfolio Project

Being the co-founder of the pioneering data engineering bootcamp means having no real blueprint for how to do things: no franchise books, tutorials or blog posts to lean on. With a decade of experience in data, plus half of that again in the e-learning industry, I had to sit down and think again with the head of a hiring manager who has been interviewing data engineers in Berlin for 7 years, condensing what I have seen in about a thousand interview sessions.

If you join Pipeline Academy, it's very likely that you come to the bootcamp with the goal of leveling up in your career, meaning that you are going to have to go through applications and job interviews after the course. In this process, you'll have to prove that you can execute the tasks listed in the job description, and that you are the right Data Engineer/ML engineer/Data Product Owner/... for supporting the goals of your future employer.

The products and projects you've built constitute a massive part of making a positive impression as an engineer, so let's take a look at what we consider fundamental expectations.

What should a professional data engineering portfolio project (aka capstone project) look like?

Peter and I start every month with a virtual open house session where we introduce Pipeline Academy to all the interested folks who join in. We keep repeating like a mantra that our aim for our graduates is that they should leave the course with three things in their pockets:

  • Context: i.e. understanding the ecosystem and the driving forces of data engineering,

  • Confidence: for having the right attitude for solving unstructured problems,

  • Code they own: everything they've produced during the course.

Don't forget, and I am stating the obvious here: our expectations are, broadly speaking, no different from what startups and tech companies are looking for.

Functionality

Your portfolio project should have the functionality of a typical real-world data stack. It does not have to be complex or complicated; straightforward is better. Matching your tooling decisions with the business circumstances shows that you listen, think holistically and care about the why.

Functionality that you should cover is:

  • acquisition of data from a database, an API, a queue/tracker, a scraper,

  • an automated ETL process; a Prefect Cloud or dbt Cloud setup is a good choice, and a cron job with a Makefile is always a big plus for me,

  • loading data into a data warehouse (hints: SQLite and DuckDB can do wonders, and having data quality measures in place would not hurt either),

  • if you do plain vanilla machine learning, you're already considered for a machine learning engineer role nowadays,

  • it is easy to deploy (Makefile, Dockerfile, something executable), maybe even in the cloud,

  • your codebase has some CI/CD on it (GitHub Actions will do), so you are doing DataOps now,

  • it serves data to humans (web interface) or machines (API).
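To make the list above a bit more tangible, here is a minimal sketch of the acquire, transform, load and quality-check steps using only Python's standard library and SQLite. Everything in it is an assumption made for illustration: the sample records, the `readings` table, the function names and the plausibility thresholds are hypothetical, and a real project would pull from an actual API, database or queue instead of a hard-coded list.

```python
import sqlite3

# Hypothetical sample records standing in for an API response or scraper output.
RAW_EVENTS = [
    {"id": 1, "city": " Berlin ", "temp_c": "21.5"},
    {"id": 2, "city": "Hamburg", "temp_c": "18.0"},
    {"id": 3, "city": "Munich", "temp_c": None},  # incomplete record
]


def extract():
    """Acquire raw data; a real project would call an API, database or queue here."""
    return list(RAW_EVENTS)


def transform(rows):
    """Clean the records: drop rows without a reading, trim names, cast types."""
    cleaned = []
    for row in rows:
        if row["temp_c"] is None:
            continue
        cleaned.append((row["id"], row["city"].strip(), float(row["temp_c"])))
    return cleaned


def load(conn, rows):
    """Load into SQLite; the primary key plus INSERT OR REPLACE makes re-runs idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "id INTEGER PRIMARY KEY, city TEXT NOT NULL, temp_c REAL NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


def check_quality(conn):
    """Basic data quality gate: the table is non-empty and values are plausible."""
    count, low, high = conn.execute(
        "SELECT COUNT(*), MIN(temp_c), MAX(temp_c) FROM readings"
    ).fetchone()
    if count == 0:
        raise ValueError("no rows loaded")
    if low < -90 or high > 60:  # assumed plausibility bounds for this toy dataset
        raise ValueError("temperature out of plausible range")
    return count


def run_pipeline(conn):
    """Run extract, transform, load and the quality check; return the row count."""
    load(conn, transform(extract()))
    return check_quality(conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection), "rows loaded")
```

A script like this can be triggered from a Makefile target or a cron job, which is often enough automation for a portfolio project; swapping the in-memory database for a file path keeps the data between runs.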

Subtle hint: you can see that our curriculum is pretty much aimed at making sure you can do all of this by the end of the bootcamp course. If you have a data engineering portfolio project that fits this description, I would be happy to review it and have a chat about it with you.

Skills

These are the skills I'd like to see demonstrated in a portfolio project:

It does not have to be complicated (it's better if it's not), and it does not have to solve all the problems of mankind. It should just deliver what you say it should.

It's about showing how you were thinking about a problem and how you've delivered an imperfect solution that is suitable for your constraints. It will have tradeoffs and that's totally fine, and I'm happy to read about why and how you've ended up making certain decisions.

Interest and Attitude

If you can deliver on the above, that's already a decent start in my book. However, you should not forget that the closer you get to the final stages of a job interview process, the more your future coworkers are going to scrutinise your soft skills and the so-called "fit". In general, this is what a lot of people are looking for in a future coworker in engineering when it comes to general attitude:

Do not underestimate the power of showing that you are somebody who others would like to sit next to. It makes a world of difference.

Examples

Here are a couple of examples delivered by the students of Pipeline Academy. It should be noted that these products and capstone projects are being created under special circumstances (think time pressure, certain individual goals ingrained, focusing on solving one specific problem etc.).

EDIT: click here to see even more portfolio projects that our graduates have built during 2021.

Don't forget

  1. Tailor your capstone project according to your career goals: you can put an emphasis on code, architecture, communication, processes, quality etc. depending on what kind of role you are going for.

  2. Make sure that your project is consistently presented and explained the right way, even for people who are not familiar with its context.

  3. Practice explaining the why and the how: expect questions that uncover your train of thought.

  4. Ask for feedback: wherever you have a chance, ask the hiring manager or whoever is reviewing your portfolio for feedback.