Data engineering salon. News and interesting reads about the world of data.
Google and the entire tracking industry rely on IAB Europe’s consent system, which has now been found to be illegal.
The curse of complexity around the Modern Data Stack.
Spreadsheets are the interface that allows anyone to quickly and easily bring data into the data warehouse.
A serverless solution for running dbt in a self-hosted, collaborative setup that follows a GitOps style.
How to manage dependencies between data pipelines.
By following this pattern to create dimension models, it is easy to incorporate both Type 1 and Type 2 changes into the same dimension. Using dbt macros, we have modularised the dimension’s functionality for reusability and left room to add more functionality to the dimension.
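To make the Type 1 / Type 2 distinction concrete, here is a minimal Python sketch. It is not the article's dbt macros, and every name in it (`CustomerRow`, `apply_change`, and the choice of `email` as a Type 1 attribute and `city` as a Type 2 attribute) is hypothetical. It only illustrates the general idea: Type 1 changes overwrite a value in place, while Type 2 changes close the current row and open a new version.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int
    email: str                       # assumed Type 1: overwritten, no history kept
    city: str                        # assumed Type 2: a change opens a new version
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(rows: List[CustomerRow], customer_id: int,
                 email: str, city: str, as_of: date) -> List[CustomerRow]:
    """Return a new list of rows with one incoming source record applied."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is None:
        # First time we see this customer: insert the initial version.
        return rows + [CustomerRow(customer_id, email, city, as_of)]

    # Type 1: overwrite the email on every version of this customer.
    rows = [replace(r, email=email) if r.customer_id == customer_id else r
            for r in rows]

    if city != current.city:
        # Type 2: close the current version, then append a new current one.
        rows = [
            replace(r, valid_to=as_of)
            if r.customer_id == customer_id and r.valid_to is None else r
            for r in rows
        ]
        rows.append(CustomerRow(customer_id, email, city, as_of))
    return rows
```

Applying an email-only change to an existing customer leaves the row count unchanged, while a city change adds a second row and sets `valid_to` on the first.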
At the time of writing, we have over 4700 data models in our production dbt project, and over 800 views defined in Looker 🤯.
Most of us are not aware of all the features in the tools we use on a daily basis, especially when a tool is as big and extensive as PostgreSQL.
DuckDB-Wasm is powered by WebAssembly, speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js.
Put down the K8s cluster, your pipelines can run without it.
No, these tools accomplish different goals; however, they can be used in combination to provide the best of both worlds: reproducible builds and containerized deployments.
Cloud vendors will increasingly focus on the lowest layers in the stack: basically leasing capacity in their data centers through an API. Other pure-software providers will build all the stuff on top of it. Databases, running code, you name it.