Data engineering salon. News and interesting reads about the world of data.
36% of people in the UK use an ad blocker.
Accidental removal of data on S3 is something that no Data Engineer on AWS wants to be involved in.
In this post I show you how to synthesize billions of rows of true time-series data with an autoregressive component, and then explore it with ClickHouse, a big-data-scale OLAP RDBMS, all on AWS.
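For a sense of what an autoregressive component looks like, here is a minimal AR(1) generator, where each value depends on the previous one plus noise. This is a generic illustration, not the post's actual synthesis code; the function name and parameters are hypothetical.

```python
import numpy as np

def ar1_series(n, phi=0.8, sigma=1.0, seed=0):
    """Generate an AR(1) series: x[t] = phi * x[t-1] + noise[t]."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, n)
    x = np.empty(n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

series = ar1_series(1_000)
```

With `phi` close to 1 the series drifts slowly, giving it the "memory" that distinguishes true time series from independent random draws.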
This post outlines how to use common Python libraries to read and write the Parquet format while taking advantage of columnar storage, columnar compression, and data partitioning.
Our current insertion rate is about 90M rows per second.
Currently, we have a warehouse consisting of over 100 models, and this validation step takes about two minutes.
Is an open-source (OSS) approach more relevant than a commercial software approach in addressing the data integration problem?