~/unskewdata

Building Data Systems That Actually Work

Principal Data Engineer. Open-source contributor. Writing about software engineering, distributed systems and databases.

Kafka, Spark, Snowflake, BigQuery, Rust, Python, Kubernetes, Terraform, Airflow, Hudi, gRPC, Flink
JUST LAUNCHED: March 2026

Introducing unskewdata Learn

Interactive learning for software and data engineering. Explore data structures, algorithms, system design, and more through hands-on visualizations built to make every concept click.

Build Log

Real systems I've designed and shipped in production.

Building a Full Data Platform from Zero, 3 Times, Across AWS and GCP

AWS, GCP, Redshift, BigQuery, DBT, Airflow, Terraform
3x
Platforms Built

Cutting Time-to-Insight from Days to Minutes with Real-Time Snowflake Ingestion

Snowpipe, Snowflake, AWS SQS, S3, Streamlit, Real-Time
Days → Min
Time to Insight

Real-Time Customer Data Enrichment via Kafka Streams at Scale

Kafka Streams, KSQL DB, Apache Hudi, Kubernetes, RocksDB
Real-Time
Enrichment

Bridging GCP and AWS: Cross-Cloud Event Pipeline for Unified Analytics

GCP, AWS, Pub/Sub, Kinesis Firehose, Redshift, Cross-Cloud
2 Clouds
Unified Pipeline

Zero-Downtime Event Tracking at Scale with the Outbox Pattern

FastAPI, Kafka, Outbox Pattern, Protobuf, HelloFresh
0
Events Lost

GDPR-Compliant Data Lake with ACID Transactions on S3

Apache Hudi, S3, Spark, GDPR, AWS Glue, Athena
ACID
On S3

Scaling Spark Pipelines Across Three Companies and Two Clouds

Apache Spark, PySpark, Scala, AWS EMR, Glue, Airflow
3
Companies

High-Performance Vector Search API in Rust for ML Recommendations

Rust, Axum, gRPC, Qdrant, AWS EKS, Redis
Sub-ms
Latency