~/unskewdata

Building Data Systems
That Actually Work

Principal Data Engineer. Open-source contributor. Writing about software engineering, distributed systems and databases.

Kafka, Spark, Snowflake, BigQuery, Rust, Python, Kubernetes, Terraform, Airflow, Hudi, gRPC, Flink


JUST LAUNCHED · March 2026

Introducing unskewdata Learn

Interactive learning for software and data engineering. Explore data structures, algorithms, system design, and more through hands-on visualizations built to make every concept click.


$ cat build.log

Build Log

Real systems I've designed and shipped in production.

Building a Full Data Platform from Zero, 3 Times, Across AWS and GCP

AWS, GCP, Redshift, BigQuery, DBT, Airflow, Terraform

Platforms Built

Cutting Time-to-Insight from Days to Minutes with Real-Time Snowflake Ingestion

Snowpipe, Snowflake, AWS SQS, S3, Streamlit, Real-Time

Days → Min

Time to Insight

Real-Time Customer Data Enrichment via Kafka Streams at Scale

Kafka Streams, KSQL DB, Apache Hudi, Kubernetes, RocksDB

Real-Time

Enrichment

Bridging GCP and AWS: Cross-Cloud Event Pipeline for Unified Analytics

GCP, AWS, Pub/Sub, Kinesis Firehose, Redshift, Cross-Cloud

2 Clouds

Unified Pipeline

Zero-Downtime Event Tracking at Scale with the Outbox Pattern

FastAPI, Kafka, Outbox Pattern, Protobuf, HelloFresh

Events Lost

GDPR-Compliant Data Lake with ACID Transactions on S3

Apache Hudi, S3, Spark, GDPR, AWS Glue, Athena

ACID

On S3

Scaling Spark Pipelines Across Three Companies and Two Clouds

Apache Spark, PySpark, Scala, AWS EMR, Glue, Airflow

Companies

High-Performance Vector Search API in Rust for ML Recommendations

Rust, Axum, gRPC, Qdrant, AWS EKS, Redis

Sub-ms

Latency
