Data engineering salon. News and interesting reads about the world of data.
When I finally deleted the old Spark code, it was a net deletion of almost 1,700 lines; the two resulting SQL queries are 155 and 81 lines long, respectively, and the new tests amount to about 1,231 lines of Python.
Percona runs open source databases as managed services, which makes the company popular with customers but less so with competitors.
For small batch loads, traditional ETL tools are less complicated and simpler to implement. But if the ETL pipeline needs to handle large volumes of data and scale, Kafka wins hands down.
A Python library that scrapes metadata and formats it as Markdown, and a Rust CLI to search over that data.
We might once have lived in a world where QUEL and SQL continued to duke it out, and where the ‘best’ language might have found its own niche.
With a basic binary search, the initial query went from timing out in the web service to returning results in a fraction of a second.
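The linked post does not show its code here, so the following is only a minimal sketch of the general idea, assuming the data is a list of integer keys (for example, timestamps) already sorted in ascending order. Instead of scanning every key, `lower_bound` halves the search interval on each step, so a range lookup costs O(log n) comparisons rather than O(n). The function names are hypothetical.

```python
from typing import List


def lower_bound(keys: List[int], target: int) -> int:
    """Return the index of the first key >= target by repeatedly halving the interval."""
    lo, hi = 0, len(keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if keys[mid] < target:
            lo = mid + 1  # everything up to mid is too small
        else:
            hi = mid  # keys[mid] could be the answer; keep it in range
    return lo


def keys_in_range(keys: List[int], start: int, end: int) -> List[int]:
    """Return the keys in the half-open interval [start, end); keys must be sorted."""
    return keys[lower_bound(keys, start):lower_bound(keys, end)]


if __name__ == "__main__":
    timestamps = list(range(0, 1_000_000, 7))  # hypothetical sorted keys
    print(keys_in_range(timestamps, 100, 130))
```

Python's standard library offers the same search as `bisect.bisect_left`; it is written out by hand here only to make the halving step visible.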
Google Analytics (GA) now offers free BigQuery export to all customers.
We show you how you can easily set up a multi-touch attribution model to track website conversions with Google Analytics, Google Tag Manager and a Jupyter notebook.
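The article builds its model with Google Analytics, Google Tag Manager, and a Jupyter notebook; the sketch below does not reproduce that setup. It only illustrates what a multi-touch model does once conversion paths are available, assuming each path is an ordered list of hypothetical channel names. Two common rules are shown: linear attribution (equal credit to every touchpoint) and a position-based or "U-shaped" rule (40% to the first touch, 40% to the last, the remainder split across the middle). Splitting two-touch paths 50/50 is a design choice made here, not a fixed standard.

```python
from collections import defaultdict
from typing import Dict, List

Path = List[str]  # ordered channel names for one conversion (hypothetical)


def linear_attribution(paths: List[Path]) -> Dict[str, float]:
    """Give every touchpoint on a path an equal share of that conversion."""
    credit: Dict[str, float] = defaultdict(float)
    for path in paths:
        if not path:
            continue  # skip conversions with no recorded touchpoints
        share = 1.0 / len(path)
        for channel in path:
            credit[channel] += share
    return dict(credit)


def position_based_attribution(
    paths: List[Path], first: float = 0.4, last: float = 0.4
) -> Dict[str, float]:
    """U-shaped model: fixed weights for first and last touch, rest split evenly."""
    credit: Dict[str, float] = defaultdict(float)
    for path in paths:
        n = len(path)
        if n == 0:
            continue
        if n == 1:
            credit[path[0]] += 1.0  # a single touch gets all the credit
        elif n == 2:
            credit[path[0]] += 0.5  # assumption: split two-touch paths evenly
            credit[path[1]] += 0.5
        else:
            credit[path[0]] += first
            credit[path[-1]] += last
            middle_share = (1.0 - first - last) / (n - 2)
            for channel in path[1:-1]:
                credit[channel] += middle_share
    return dict(credit)


if __name__ == "__main__":
    conversions = [["organic", "email", "paid"], ["paid"], ["email", "paid"]]
    print(linear_attribution(conversions))
    print(position_based_attribution(conversions))
```

In both models the credits for each conversion sum to 1, so the channel totals add up to the number of conversions, which is a quick sanity check when comparing models side by side.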