Pipeline Data Engineering Academy

The Data Janitor Letters - April 2020

Data engineering salon. News and interesting reads about the world of data.

Why you should stop using Google Analytics on your website
Marko Saric, Plausible Analytics

Good reasons, and that is without even mentioning the really important ones.

Data Science: Reality Doesn't Meet Expectations
Dan Friedman

Data & infrastructure have serious quality problems.

Filip Piekniewski, Scientist, Accel Robotics

The realization that deep learning is not going to cut it with respect to self driving cars and many other applications is now an open secret.

Python Web Scraping with Virtual Private Networks
Mark Litwintschik, #BigData Consultant

I'll explore two solutions, the first using WireGuard and the second, using an OpenSSH SOCKS5 proxy.

10 Things I Hate About PostgreSQL
Rick Branson, Engineering Leader, Segment

In general I’d recommend starting with PostgreSQL and then trying to figure out why it won’t work for your use case.

Things I Wished More Developers Knew About Databases
Jaana Dogan, Engineer, Google

You are lucky if 99.999% of the time network is not a problem.

Server-side tracking: surprisingly easy
Martin Loetzsch, Chief Data Officer, Project A Ventures

Pixel-based tracking is dead.

Optimize PostgreSQL Server Performance Through Configuration
Tom Swartz, Software Engineer, Crunchy Data

If a query performs heavy joins or other expensive aggregate operations, or if a query is performing a full table scan where an index could be used, it will nearly always perform poorly, no matter how well the database settings are tuned.