Pipeline Data Engineering Academy

Data Engineering Advent Calendar 2020

Throughout December 2020 we shared a daily dose of semi-esoteric data engineering wisdom on our social media channels (Twitter, Instagram and LinkedIn). This post shall serve as a commemorative monolith you can always turn to when the data engineering gods are not picking up your call.

#1: Don’t write code, solve the problem.

#2: Python is always at hand to pretty-print JSON:

$ python3 -m json.tool some.json
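The same thing works from inside a script. A minimal sketch using only the standard library; the helper name `pretty` is made up for illustration:

```python
import json

def pretty(raw: str) -> str:
    """Parse a JSON string and re-serialize it with 4-space indentation."""
    return json.dumps(json.loads(raw), indent=4)

print(pretty('{"id": 1, "tags": ["a", "b"]}'))
```

Invalid input raises `json.JSONDecodeError`, so this doubles as a quick validity check.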

#3: EXPLAIN is your friend.
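To see why, here is a small sketch against an in-memory SQLite database, where the statement is spelled `EXPLAIN QUERY PLAN` (other databases use plain `EXPLAIN` with different output). The `events` table, the index name and the `query_plan` helper are assumptions made up for the example:

```python
import sqlite3

def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")

sql = "SELECT * FROM events WHERE user_id = 42"
before = query_plan(conn, sql)  # no usable index yet, so expect a table scan
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = query_plan(conn, sql)   # the planner should now pick the new index

print(before)
print(after)
```

The exact wording of the plan differs between SQLite versions, but the switch from a scan to an index search should be visible in both.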

#4: "Choose boring technology." Dan McKinley @mcfunley

#5: Complicated is better than complex.

#6: What do you do on your CLI?

$ sort ~/.bash_history | uniq -c | sort -n
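A similar tally can be sketched in Python. Unlike the one-liner, this version counts only the first word of each line, so `git status` and `git pull` both count towards `git`. The function name `top_commands` and the sample lines are made up for illustration:

```python
from collections import Counter

def top_commands(lines, n=10):
    """Count the first word of each non-empty history line; return the n most common."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return counts.most_common(n)

sample = ["git status", "ls -la", "git pull", "cd src", "git status"]
print(top_commands(sample, 3))
```

To run it on real history, pass an open file instead of the sample list, for example `open(os.path.expanduser("~/.bash_history"), errors="ignore")`.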

#7: SQLite (2000) has an estimated one trillion (1e12) databases in active use. It's a file with a SQL API and window functions.
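A quick taste of those window functions through Python's built-in `sqlite3` module. This is a sketch that assumes the bundled SQLite is version 3.25 or newer (the release that added window functions); the `sales` table and its rows are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2020-12-01", 10), ("2020-12-02", 20), ("2020-12-03", 5)],
)

# Running total per day, computed with a window function.
running = conn.execute(
    "SELECT day, SUM(amount) OVER (ORDER BY day) AS running_total "
    "FROM sales ORDER BY day"
).fetchall()
print(running)
```

Swap `":memory:"` for a file path and the whole database is a single file you can copy around.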

#8: "Premature optimization is the root of all evil." Donald Knuth (TeX, The Art of Computer Programming), often attributed to Tony Hoare (Quicksort)

#9: Keep It Simple (&) Stupid and remember separation of concerns.

#10: Log in to a recently launched container (via @basmatitree)

$ docker exec -it $(docker ps -q | head -1) /bin/bash

#11: Get details of last failed Redshift load.

SELECT * FROM stl_load_errors ORDER BY starttime DESC LIMIT 1;

#12: "When in doubt, use brute force." Ken Thompson (Go, UTF-8, Unix)

#13: Maintainable is debuggable and testable, and has version control.

#14: Remove all local git branches other than master and the currently used one (via @advincze)

$ git branch --no-color | grep -v 'master' | grep -v '*' | xargs git branch -D

#15: Order of execution in SQL:

FROM -> WHERE -> GROUP BY -> HAVING -> SELECT [DISTINCT] -> UNION -> ORDER BY
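The order matters most for the difference between WHERE and HAVING: WHERE filters rows before grouping, HAVING filters groups afterwards. A small sketch with SQLite, using a made-up `orders` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 5), ("ann", 50), ("bob", 40), ("bob", 40), ("cy", 200)],
)

rows = conn.execute("""
    SELECT customer, SUM(amount) AS total  -- 5. aggregate per surviving group
    FROM orders                            -- 1. source rows
    WHERE amount >= 10                     -- 2. drops ann's 5 before grouping
    GROUP BY customer                      -- 3. one group per customer
    HAVING SUM(amount) < 100               -- 4. drops cy (200) after grouping
    ORDER BY total DESC                    -- 6. the SELECT alias is visible here
""").fetchall()
print(rows)  # [('bob', 80), ('ann', 50)]
```

Note that ann's total is 50, not 55: the small order was already gone by the time the groups were summed.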

#16: "Don't reinvent the flat tire." Alan Kay (Squeak, Smalltalk, OOP, GUI)

#17: Code is dependency. Others' code is dependency squared. Delete, remove, retire.

#18: You can run SQL directly on CSV or TSV files from your CLI with q: http://harelba.github.io/q/
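If the tool is not installed, the same idea can be approximated with Python's standard library by loading the CSV into an in-memory SQLite table. This is only a sketch of the concept, not how q itself is implemented; `query_csv`, the table name `data` and the sample rows are made up for illustration:

```python
import csv
import io
import sqlite3

def query_csv(csv_text, sql, table="data"):
    """Load CSV text (with a header row) into an in-memory table and run sql on it."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', body)
    return conn.execute(sql).fetchall()

csv_text = "city,visits\nBerlin,3\nParis,5\nBerlin,4\n"
result = query_csv(
    csv_text,
    "SELECT city, SUM(CAST(visits AS INTEGER)) FROM data GROUP BY city ORDER BY city",
)
print(result)
```

Every value is loaded as text, hence the explicit `CAST` before summing.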

#19: Queries on MPPs? Use WITH/CTEs, filter with WHERE, SELECT explicitly, avoid JOIN, SORTKEYS are your friends. It's all about scanning less.

#20: “Bad programmers worry about the code. Good programmers worry about data structures and their relationships.” Linus Torvalds (Git, Linux)

#21: The longer a technology lives, the longer it can be expected to live (Lindy effect)

#22: Delete files recursively:

$ find . -name "*.pdf" -print0 | xargs -0 rm
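For scripts that need to know what was removed, a rough Python equivalent using `pathlib` might look like this. The helper name `delete_matching` is made up, and the demo runs in a throwaway temporary directory rather than on real data:

```python
import tempfile
from pathlib import Path

def delete_matching(root, pattern="*.pdf"):
    """Delete every regular file under root whose name matches pattern."""
    removed = []
    # Collect the matches first so files are not deleted while still walking the tree.
    for path in list(Path(root).rglob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.pdf").touch()
    (Path(tmp) / "sub" / "b.pdf").touch()
    (Path(tmp) / "keep.txt").touch()
    removed = delete_matching(tmp)
    remaining = sorted(p.name for p in Path(tmp).rglob("*") if p.is_file())

print(len(removed), remaining)
```

As with the shell version, there is no undo, so it pays to print the matches once before letting it delete anything.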

#23: Why PostgreSQL (1996)? It is the open source RDBMS with an ecosystem that covers columnar storage (cstore_fdw), geo (PostGIS), time series (TimescaleDB) and a REST API (PostgREST).

#24: "Use simple algorithms as well as simple data structures." Rob Pike (Go, UTF-8, Unix)