Being the co-founder of the pioneering data engineering bootcamp means having no real blueprint on how to do things. No franchise books, tutorials or blogposts to lean on. With a decade of experience in data, and half of that again in the e-learning industry, I had to sit down and think again like the hiring manager who has been interviewing data engineers in Berlin for 7 years now, condensing what I have seen in about a thousand interview sessions.
If you join Pipeline Academy, it's very likely that you come to the bootcamp with the goal of leveling up in your career, meaning that you are going to have to go through applications and job interviews after the course. In this process, you'll have to prove that you can execute the tasks listed in the job description, and that you are the right Data Engineer/ML engineer/Data Product Owner/... for supporting the goals of your future employer.
The products and projects you've built constitute a massive part of making a positive impression as an engineer, so let's take a look at what we consider fundamental expectations.
Peter and I start every month with a virtual open house session where we introduce Pipeline Academy to all the interested folks who join in. We keep repeating like a mantra that our aim for our graduates is that they should leave the course with three things in their pockets:
Context: i.e. understanding the ecosystem and the driving forces of data engineering,
Confidence: for having the right attitude for solving unstructured problems,
Code they own: everything they've produced during the course.
Don't forget - and I am stating the obvious here - our expectations are broadly speaking not different from what startups and tech companies are looking for.
Your portfolio project should have the functionality of a typical real-world data stack. It does not have to be complex or complicated; straightforward is better. Matching your tooling decisions with the business circumstances shows that you listen, think holistically and care about the why.
Functionality that you should cover is:
acquisition of data from a database, an API, a queue/tracker, a scraper,
an automated ETL process: a Prefect Cloud or dbt Cloud setup is a good choice, and a cronjob and a Makefile are always a big plus for me,
loading data to a data warehouse (hints: SQLite and DuckDB can do wonders, and having data quality measures in place would not hurt either),
if you do plain vanilla machine learning, you're already considered for a machine learning engineer role nowadays,
it is deployed easily (Makefile, Dockerfile, something executable), maybe in the cloud even,
your codebase has some CI/CD on it (GitHub Actions will do), so you are DataOps now,
it serves data to humans (web interface) or machines (API).
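To make the list above a bit more tangible, here is a minimal sketch of what the smallest version of such a stack could look like, using only the Python standard library. Everything in it is an assumption for illustration: the hard-coded `RAW_ROWS` stand in for an API or scraper response, the function names and the `readings` table are hypothetical, and an in-memory SQLite database plays the role of the warehouse. It is not a template you have to follow, just one way to show acquisition, transformation, a data quality check and loading in a few lines.

```python
import sqlite3

# Hypothetical raw records standing in for an API or scraper response.
RAW_ROWS = [
    {"city": "Berlin", "temp_c": "21.5"},
    {"city": "Hamburg", "temp_c": "18.0"},
    {"city": "Munich", "temp_c": None},  # a record with a missing reading
]


def extract():
    """Acquire the data; a real project would call an API or query a database."""
    return list(RAW_ROWS)


def transform(rows):
    """Drop records without a reading and cast the temperature to float."""
    return [
        (row["city"], float(row["temp_c"]))
        for row in rows
        if row["temp_c"] is not None
    ]


def check_quality(rows):
    """A very small data quality gate: fail loudly instead of loading junk."""
    if not rows:
        raise ValueError("no rows to load")
    for city, temp_c in rows:
        if not -90.0 <= temp_c <= 60.0:
            raise ValueError(f"implausible temperature for {city}: {temp_c}")


def load(rows, conn):
    """Load into the warehouse; INSERT OR REPLACE keeps reruns idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (city TEXT PRIMARY KEY, temp_c REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract, transform, quality check and load; return the row count."""
    rows = transform(extract())
    check_quality(rows)
    load(rows, conn)
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(f"loaded {run_pipeline(connection)} rows")
```

A script like this can be scheduled with a cronjob and wrapped in a Makefile target, which already covers the automation point. Swapping the in-memory database for a file, or for DuckDB, would be the natural next step.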
Subtle hint: you can see that our curriculum is pretty much aimed at making sure you can do all of this by the end of the bootcamp course. If you have a data engineering portfolio project that fits this description, I would be happy to review it and have a chat about it with you.
These are the skills I'd like to see demonstrated in a portfolio project:
you can explain your/the project's goal in simple terms (you have a one-pager README.md or something similar — Readme Driven Development can help here: describe in plain English what you're trying to achieve, describe what you don't know yet and still have to research),
you have an architectural doodle/blueprint so one can grasp the components and their relationships,
your code is okay enough to be actually read,
you demonstrate structured thinking, communication skills, understanding of the fundamental concepts, and collaboration capabilities (e.g. "I've learned this from that blogpost and I opened an issue on the repo of this tool because I hit a wall").
It does not have to be complicated (it's better if it's not), and it does not have to solve all the problems of mankind. It should just deliver what you say it should.
It's about showing how you were thinking about a problem and how you've delivered an imperfect solution that is suitable for your constraints. It will have tradeoffs and that's totally fine, and I'm happy to read about why and how you've ended up making certain decisions.
If you can deliver on the above, that's already a decent start in my book. However, you should not forget that the closer you get to the final stages of a job interview process, the more your future coworkers are going to scrutinise your soft skills and the so-called "fit". In general, this is what a lot of people are looking for in a future coworker in engineering when it comes to general attitude:
you are confident in approaching and dealing with problems,
you understand K.I.S.S. as a principle,
you are determined to build a resilient system so you are not on call on the weekends for fixing bugs,
you don't build things just because you can on the job — that is called a hobby,
you solve a problem without introducing a new one,
you treat tools as tools and not as ends in themselves,
you frequently ask questions, you listen to the answers and you think about them,
you don't start a Spark cluster to load a CSV.
Do not underestimate the power of showing that you are somebody who others would like to sit next to. It makes a world of difference.
Here are a couple of examples delivered by the students of Pipeline Academy. It should be noted that these products and capstone projects are created under special circumstances (think time pressure, individual goals baked in, a focus on solving one specific problem, etc.).
Tomek Florek: Feature enrichment API for data science and analytics
Amy Raygada: Menu generator based on nutritional values
Michele Tassoni: Data aggregation and data analysis app to determine the value of football players
Michail Koskinas: A hip-hop playlist recommender based on the trackID of a classical music piece from Spotify
Sujit Badle: Crypto data ETL backtest
EDIT: click here to see even more portfolio projects that our graduates have built during 2021.
Tailor your capstone project according to your career goals: you can put an emphasis on code, architecture, communication, processes, quality etc. depending on what kind of role you are going for.
Make sure that your project is consistently presented and explained the right way, even for people who are not familiar with its context.
Practice explaining the why and the how: expect questions that uncover your train of thought.
Ask for feedback: wherever you have a chance, ask the hiring manager or whoever is reviewing your portfolio for feedback.