
This is not a test, this is a summer camp - part #2

This is the second part of the story about setting up a live remote coding workshop in the midst of an economic downturn. Part one covers the circumstances that led to all this; you can read it here. Part two is about how we ran the workshop itself and about the feedback we received afterwards.

When I started reading the eight participants' responses to the feedback questionnaire I had shared with them right after the Data Engineering Summer Camp, the feeling the whole situation sparked in me reminded me of the effect named after Rashomon, the 1950 jidaigeki crime movie by Akira Kurosawa. Karl G. Heider used the term "The Rashomon Principle" to refer to the effect of the subjectivity of perception on recollection, by which observers of an event are able to produce substantially different but equally plausible accounts of it. Even though the ten of us (including Daniel and myself) were not external observers of the Summer Camp but active participants in it, putting these ten individual recollections next to each other turns the data into a physical manifestation of this particular gestalt.

My point: whoever is reading this should keep in mind that this summary is inherently biased and there is no one correct interpretation of the event in question. What I am looking for are shared patterns in the multitude of witness perspectives involved that indicate a direction for the path to our own improvement.

But then you see a tweet like this from one of our happy students and you instantly forget about the semi-scientific approach you were about to force upon yourself:

Source: Flávio Clésio's Twitter (saved version). @soobrosa is Daniel's Twitter handle.

First, let's go back to where we left off the last time.

The Data Engineering Summer Camp

Quick recap: from 18 to 22 May (Monday to Friday) we held the first Summer Camp by Pipeline Academy with eight participants in the virtual classroom. The purpose was to support people and businesses by teaching them a valuable competence through a hands-on project, and to learn about our own areas for improvement in the process.

The agenda was designed to cover the most important building blocks of setting up a simple data product: an introduction to data engineering, ETL basics, some SQL and deployment. Since we prefer actual building to just talking about building, we quickly decided to focus on real implementation. The schedule below reflects this attempt, with 2.5 days of classroom experience aka 'campsite' (frontal teaching, live coding, group discussions, presentations and demos, Q&A) and 2.5 days of individual or team adventures (in duos) for implementing the data pipeline.

The week before the workshop the participants received homework. At first glance, the task was fairly simple and straightforward, but it included some challenges that could be solved in various ways, depending on what tools and methods a student prefers. Imagine you're asked to travel from point A to point B within a city: there are plenty of ways to complete the task by picking different routes and means of transportation, and the chosen way serves as a testament to the preferences and skillset of the traveler. We used this dry run to get a better understanding of what our cohort was comfortable with vs. which topics we needed to emphasize more to achieve the desired outcome for the week.

The first morning was about getting familiar with each other, with some chit-chat about our daily routines during corona and our different experiences of working with data. We then moved on to discussing what data engineering is and how the skills that make a data engineer relate to the participants' individual skillsets. Here's an example: a data scientist usually has more advanced mathematical know-how than a frontend engineer, but the latter is likely to have seen or written more maintainable code (especially collaboratively in larger teams). The afternoon was spent on an overview of ETL procedures (extract, transform, load).
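To make the three ETL steps concrete, here is a minimal sketch in Python. It is not the code used at the Summer Camp: the sample data, the table name `readings` and the function names are all made up for illustration, and an in-memory SQLite database stands in for a real target.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; it stands in for a downloaded file or an API response.
RAW_CSV = """city,temperature_c
Berlin,21.5
Lisbon,
Budapest,27.0
"""


def extract(raw):
    """Extract: parse the raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """Transform: drop rows without a reading and convert the rest to floats."""
    return [
        (row["city"], float(row["temperature_c"]))
        for row in rows
        if row["temperature_c"]
    ]


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (city TEXT, temperature_c REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(raw=RAW_CSV):
    """Run extract, transform and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw)), conn)
    return conn


conn = run_pipeline()
rows = conn.execute(
    "SELECT city, temperature_c FROM readings ORDER BY city"
).fetchall()
print(rows)
```

Keeping each step in its own function is one possible design choice: it lets you test the transformation on its own and swap the source or the target later without touching the rest.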

The first day was very exciting but exhausting as well. I was relieved that we managed to capture the attention of the participants: they kept coming back on the following days, without any measurable churn. Some of the students had to skip a couple of hours of classes during the week due to bureaucratic obligations and work emergencies, and sometimes people could not join the class right away due to issues with their internet provider at home (Berlin, du bist so wunderbar). Only one person pretty much gave up on delivering their solution (albeit staying on board until the end) as a result of unforeseen duties that hijacked their time and attention for the week.

This is not an illustration.

Part of the plan was letting go of the students' hands so they could explore the newly learned methodologies and tools by themselves and combine them with their existing knowledge. Our aim was to enable independent work and push for applied creative problem-solving, while staying available to everyone for questions and support via chat. The student feedback confirmed our assumption: this was a highly productive segment of the week that rapidly accelerated the pace of learning.

It was remarkable how much interest there was in seemingly niche data engineering topics and obscure tools: some people were enthusiastic and some were a bit more skeptical about building a data pipeline with puzzle pieces they were not accustomed to (e.g. SQLite), but most students were open to exploration and experimentation. Five days passed by in an instant.

As a closing event, the participants who had worked in teams presented their data products on Friday. The solutions showed two examples of highly successful and productive collaborations, while two teams had hit roadblocks that could be clearly identified and discussed to pave the way for a late delivery. In addition to the course certificate, Daniel and I decided to gift Udemy courses to the students to make sure that they continue pushing themselves forward on the endless path of data engineering.

Feedback time

We asked the students to fill out a survey so we could get a better understanding of how they saw the Summer Camp; you can read some of their sentiments quoted below. The learnings were many:

More fluff

"The course delivered a good blend between technical hands-on and practical aspects about the Data Engineering profession and roles. The course has a modern approach to combine the pragmatism of the data activities (e.g. reliability, well thought changes) with modern approaches (e.g. online deployment, orchestration, monitoring, etc)."

Curricula and roadmaps for getting started with data science are plentiful. Data engineering, however, requires a different type of didactics and a learning environment that mimics the engineering and collaboration processes at companies leveraging technology.

Adam, a long-time data science teacher, has written up his experiences at the Summer Camp on his blog, focusing on how this week's learnings enabled him to move forward with his own projects. Make sure to check out his post for a more technical POV (full disclosure: I've had the pleasure of working with Adam for a while and consider him a friend):

"As someone who has taught data science for a while, the most impressive thing was the simplicity of the stack. It is very easy when teaching to complicate things for students, or to teach complex tools that confuse more than help.

It can’t be understated the power of leaving after five days with a working product. The value of being able to see and interact with your data is huge, for spotting problems with your data pipeline to showing off to customers (or employers!)."

The Summer Camp was my major highlight of the lockdown: it turned out to be a productive validation for Pipeline Academy, and furthermore we've managed to support people and share valuable knowledge while staying true to our principles - transparency, collaboration, pragmatism and common sense.

I hope to see you at our campus in the fall.