
Introducing: Sustainable Data Engineering

Starting in 2021, we're enhancing the educational focus of our data engineering bootcamp by integrating sustainability as a core value, in addition to transparency, pragmatism and a collaborative way of doing things.

Pipeline Academy graduates will become the first data professionals equipped with practical data engineering concepts and best practices that support the environmental, economic and social sustainability of the products they build and the companies they join.

We do this to empower our participants to use their engineering skills consciously, taking all stakeholders of the ecosystem into consideration, but also to push data engineering culture in general to the next level. We're more than certain that this expertise will translate into a significant competitive advantage on the job market for our graduates.

But what does sustainability mean in the world of data engineering?

1) Environmental sustainability in data engineering

The architectures designed by data engineers leave a footprint on our planet: the most obvious but often ignored factor when setting up a data stack is energy consumption. Understanding the metrics and core drivers of this ecological impact is essential for every environmentally conscious engineer. Just imagine: what if architects ignored the quality and quantity of the construction materials and processes they use for their buildings? What about managing data waste (hardware and software)? Have you ever compared how green the major cloud providers are relative to each other? Applying this know-how responsibly results in more efficient and sustainable data systems.

2) Economic sustainability in data engineering

Building and scaling infrastructure can quickly become a serious cost driver for organisations that actively collect and leverage data. The opaque and deliberately incomparable pricing structures of competing data management tools leave decision-makers who lack a clear guiding framework in a tricky situation with high financial risk. Applying the right KPIs, anticipating business needs and considering technical constraints can lead to meaningful cost reductions for tooling, and the contribution of the responsible experts will support the long-term success of their businesses. Our advisor, Dr. Martin Loetzsch, is the Chief Data Officer at Project A Ventures, and their Sustainability Playbooks are among the useful guiding lights for us.

3) Social sustainability in data engineering

For the last decade, data has been treated as the new oil for economic growth. But just like with fossil fuels, managing and using data for business purposes has social consequences. Even in the current regulatory environment (which is expected to become stricter in the near future), data engineers play a central role in informing stakeholders and executing engineering work according to regulations like GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act). Addressing privacy and security concerns proactively, understanding the potential for abuse when working with user data, and having data management and governance in place are just a few of the concerns data professionals have to deal with on a daily basis. But it does not stop there: creating a healthy working environment for data teams by managing roles and responsibilities, and increasing the transparency of their work to manage expectations, are just as important for sustainable careers and successful work. To name an example, Maciej Cegłowski of Pinboard talked about this back in 2015.

Why are we doing this?

Your opportunity

The primary purpose of Pipeline Academy is to help individuals learn data engineering fundamentals so they can improve their career opportunities. Daniel and I focus a lot on ideas that increase the long-term effectiveness of the content and methodologies we use. When observing the tectonic shift in consumer behaviour towards a more environmentally and socially conscious economy, we can see that it presents a great opportunity for every professional today. Learning how to create products or manage a business so that they do little to no harm to others is becoming a huge competitive advantage on the job market, and will eventually be essential for tomorrow's purpose-driven leaders. In simple terms: we give our bootcamp participants not only the technical know-how for building data infrastructure, but also context about their responsibilities and options for managing its impact.

Our shared responsibility

Data engineers set up infrastructure that impacts our social and ecological environment, and it is their duty to design systems that consider the ecosystem of stakeholders they interact with. Designing software architectures and optimising operations with this in mind will become much more important in the medium term: employers selling, for example, eco-friendly products are going to hire data professionals who can execute on this specific value proposition. As educators, we have the power to equip the future architects and builders of data platforms with the knowledge that enables them to create a better working and living environment. They will build the systems that power the technological reforms in mobility, healthcare and education in the upcoming years, and they will educate future generations of data experts - and this multiplier effect makes us even more aware of our influence. Check how your company, or a company you're planning to join, does in these terms.

Rooted in our resourcefulness

In the last 15 years, hyper-growth has become the standard expectation of tech companies, which has resulted in an unsustainable business and working culture that does not support the long-term coexistence of social and environmental stakeholders. With Pipeline Academy we're building a bootstrapped educational institution that we'd like to establish for the long term, and we won't sacrifice our lean DIY attitude and minimalistic approach for growth. Conscious environmental management of our own organisation, symbiotic coexistence with our partners, and actively fostering a sustainable data and learning culture with our educational concept are proof of that. Compared to some other programs, we put the emphasis on long-lasting value in handmade quality with a focus on the individual, and avoid ephemeral and trend-driven promises addressing the mass market.

Sustainable data engineering in practice

The sustainability concept is integrated into our curriculum in the form of weekly lectures, discussions, case studies and reading materials, as well as very hands-on exercises. Some of the topics and concepts we discuss in the bootcamp:

  • Designing lean data architectures - explain it to your grandma,

  • Unix philosophy and the Zen of Python,

  • KISS, figure out what to worry about, learn what not to do,

  • 'Take the long view' (Prof. Galloway), invest in durable knowledge,

  • Strive for fast and good enough (Pareto and YAGNI),

  • Code longevity, reuse and recycle code,

  • System optimisation: remove, retire and delete,

  • Measuring tooling efficiency: speed, data storage and energy consumption,

  • Just because you can does not mean you should - you are not Google, you don't need their stack,

  • Don't feed the FAANG beast with your data.
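
To make "measuring tooling efficiency" from the list above a bit more tangible, here is a minimal Python sketch of how one might compare two ways of doing the same job by speed and memory, plus a very rough energy estimate. It is an illustration only, not material from our curriculum: the helper `measure`, the two sample tasks and the `assumed_watts` value of 50 W are all hypothetical placeholders, and the energy figure is simple arithmetic (power times time), not a real measurement.

```python
import time
import tracemalloc


def measure(task, assumed_watts=50.0):
    """Run task() once; report wall time, peak Python memory and a rough energy estimate.

    assumed_watts is a hypothetical average power draw of the machine.
    Replace it with a measured value if you have one.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = task()
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "result": result,
        "seconds": seconds,
        "peak_bytes": peak_bytes,
        # energy (Wh) = power (W) * time (h); a crude estimate under the assumption above
        "estimated_wh": assumed_watts * seconds / 3600.0,
    }


def with_list():
    # Materialises all 200,000 squares in memory before summing them.
    return sum([n * n for n in range(200_000)])


def with_generator():
    # Streams the squares one by one, so almost nothing is held in memory.
    return sum(n * n for n in range(200_000))


report_list = measure(with_list)
report_gen = measure(with_generator)

for name, report in (("list", report_list), ("generator", report_gen)):
    print(
        f"{name}: {report['seconds']:.4f} s, "
        f"{report['peak_bytes']} bytes peak, "
        f"~{report['estimated_wh']:.8f} Wh (assumed power draw)"
    )
```

Both tasks return the same number, but the list version holds every intermediate value in memory at once. The same habit of measuring before choosing applies to far bigger decisions, such as picking a storage format or a processing engine.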

This is your chance to get ahead of the curve and learn data engineering enhanced with principles that contribute to a more sustainable software development and data management culture.

PS: We too are learning and exploring along the way, so if you have suggestions for us on how to improve on the subject matter, or would like to cooperate - do let us know! We're always open to meaningful collaborations.