Designing distributed systems, deployment pipelines, and production infrastructure. Every system I build runs in production with real users and full observability.
2yr Experience
2 Prod Systems
3 Active clusters
00 Why Kafka
Apache Kafka
The architecture I respect the most.
01
Sequential Disk I/O
Kafka writes to disk sequentially, not randomly. Sequential disk throughput can rival, and in some benchmarks beat, random access to memory, because appends play to the OS page cache and to the way both SSDs and HDDs are physically built. Most people assume the disk is always the bottleneck; with sequential writes it can keep pace with the network. Kafka's storage design is built on that.
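A minimal sketch of the append-only idea, not Kafka's actual storage code. The class name `AppendOnlyLog` and the 4-byte length prefix are my own assumptions for illustration; Kafka's real segment format is more involved.

```python
import struct


class AppendOnlyLog:
    """Toy length-prefixed log: records are only appended, never rewritten."""

    def __init__(self, path):
        self.path = path
        # "ab" opens in append mode, so every write lands at the end of the file.
        self._file = open(path, "ab")

    def append(self, payload: bytes) -> int:
        """Append one record and return the byte position where it starts."""
        position = self._file.tell()
        self._file.write(struct.pack(">I", len(payload)) + payload)
        self._file.flush()
        return position

    def read_all(self):
        """Scan the file front to back, yielding each payload in write order."""
        with open(self.path, "rb") as f:
            while header := f.read(4):
                (length,) = struct.unpack(">I", header)
                yield f.read(length)

    def close(self):
        self._file.close()
```

Every write goes to the tail and every read is a forward scan, which is the access pattern the page cache and the drive both handle best.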
02
Persistent Recovery
When a broker goes down, acknowledged messages are not lost: they are already persisted to disk volumes. When brokers come back, consumers resume from their last committed offsets. Tasks that arrived before or around an outage are still there, waiting. No silent drops, no lost work.
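Here is a small in-memory model of committed offsets, to make the recovery behaviour concrete. `ToyPartition` and `consume` are hypothetical names and nothing here talks to a real broker; it only shows why processing first and committing second means a crash replays work instead of dropping it.

```python
class ToyPartition:
    """In-memory stand-in for one partition plus one group's committed offset."""

    def __init__(self):
        self.log = []        # messages stay here regardless of consumer state
        self.committed = 0   # next offset the consumer group should read

    def produce(self, message):
        self.log.append(message)

    def poll(self):
        """Return (offset, message) pairs from the committed offset onward."""
        return list(enumerate(self.log))[self.committed:]

    def commit(self, offset):
        """Record that everything up to and including `offset` is done."""
        self.committed = offset + 1


def consume(partition, handler, crash_after=None):
    """Process, then commit. A crash leaves uncommitted messages in the log."""
    handled = 0
    for offset, message in partition.poll():
        if crash_after is not None and handled == crash_after:
            return  # simulated crash: nothing from here on was committed
        handler(message)
        partition.commit(offset)
        handled += 1
```

If the crash landed between `handler(message)` and `commit(offset)`, that message would be handled twice after restart. That is the at-least-once trade-off: duplicates are possible, losses are not.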
03
One Tool, Not Two
Without Kafka you need Redis as a message broker AND Celery as the async task processor: two systems to configure, monitor, and debug. Kafka IS the broker. It handles message queuing, persistence, ordering, and consumer group coordination in a single system. Fewer moving parts, smaller failure surface.
04
Distributed by Design
Kafka isn't distributed as an afterthought — it was built for it from day one. Partitioning, replication, ISR sets, leader election via KRaft (bye ZooKeeper). Every design decision assumes multiple nodes, network failures, and split-brain scenarios. It's a distributed systems textbook in production code.
05
The Logo
Look at it. The connected nodes, the central broker, the producer-consumer topology visible in a single glyph. It's not just a logo — it's a system diagram. You can literally explain Kafka's architecture by pointing at different parts of its own logo. No other tool does that.
06
The Name
Named after Franz Kafka — the author who wrote about incomprehensible, labyrinthine systems that trap people inside them. The LinkedIn engineer who created it said he "just liked the name." That's the most engineer thing ever said about naming a distributed messaging system after an existential horror author.
Real-time cryptocurrency trading simulator with live Binance WebSocket price feeds, 10-second OHLC candlestick charts with EMA 12/26, atomic buy/sell with price quote verification, portfolio dashboard with allocation breakdown, and JWT cookie-based authentication.
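A rough sketch of the two calculations behind the chart: bucketing ticks into 10-second OHLC candles and running an EMA over the closes. The function names are hypothetical, and seeding the EMA with the first value is one common convention that I am assuming here (seeding with a simple average is another).

```python
def ohlc_candles(ticks, bucket_seconds=10):
    """Group (timestamp_seconds, price) ticks into fixed-width OHLC candles."""
    candles = {}
    # Sort by timestamp only; the sort is stable, so same-second ticks keep order.
    for ts, price in sorted(ticks, key=lambda tick: tick[0]):
        start = int(ts // bucket_seconds) * bucket_seconds
        candle = candles.get(start)
        if candle is None:
            candles[start] = {"open": price, "high": price, "low": price, "close": price}
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price
    return candles


def ema(values, period):
    """Exponential moving average with alpha = 2 / (period + 1)."""
    alpha = 2 / (period + 1)
    result = []
    for value in values:
        previous = result[-1] if result else value  # seed with the first value
        result.append(alpha * value + (1 - alpha) * previous)
    return result
```

For the EMA 12/26 pair, this would be called as `ema(closes, 12)` and `ema(closes, 26)` over the same list of candle closes.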
Custom deployment orchestrator. Rust CLI (sol) communicates with Python service on Mac Mini via Cloudflare Tunnel. Push to pre-prod → validate → soak test → auto merge to production.
Custom reverse proxy in Rust. Rate limiting with real client IP extraction via CF-Connecting-IP, cryptographic secret middleware, and request forwarding to Django.
Operated independent k3s clusters communicating cross-network — application cluster, Kafka streaming cluster, and dedicated monitoring cluster. StatefulSets for stateful workloads, Traefik ingress routing, secret management, NodePort service exposure for cross-cluster metric scraping.
◉
Streaming
Kafka Cross-Cluster Pipeline
Built producer-consumer pipeline spanning two independent clusters. KRaft mode broker (no ZooKeeper), murmur2 partition hashing, consumer group offset tracking, at-least-once delivery semantics. Configured advertised listeners for cross-network broker discovery. Auto-topic creation, persistent message recovery on consumer restart.
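For keyed messages, partition choice comes down to a hash and a modulo. The sketch below is my Python transcription of the murmur2 variant used by Kafka's Java client (seed `0x9747b28c`), written from memory and not verified against a broker, so treat it as illustrative. `partition_for` is a hypothetical helper name.

```python
def murmur2(data: bytes) -> int:
    """32-bit MurmurHash2 in the Kafka client style, returned as unsigned."""
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    length = len(data)
    h = (0x9747B28C ^ length) & mask
    # Mix the input four bytes at a time, little-endian.
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> 24
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k
    # Fold in the one to three trailing bytes, if any.
    tail = data[length - length % 4:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & mask
    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for(key: bytes, num_partitions: int) -> int:
    """Clear the sign bit, then take the hash modulo the partition count."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions
```

The practical consequence is that the same key always lands on the same partition while the partition count stays fixed, which is what gives per-key ordering.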
◈
Observability
Full-Stack Monitoring
Centralized monitoring cluster scraping metrics from application and streaming clusters via cross-network Prometheus. Grafana dashboards with cluster/service/pod drill-down. Tempo for distributed tracing via OpenTelemetry. Alertmanager pipeline routing alerts through SendGrid SMTP. UptimeRobot external health checks.
⟁
CI/CD
Automated Deployment Pipeline
GitHub Actions workflow: validate dependencies → run test suite → build containers → deploy with rolling restarts. ~30s deployment window. Dependency verification before merge. Branch protection rules. Automated rollback on health check failure.
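The rollback step reduces to a small control loop. This is a sketch of the idea only, not the actual workflow: `deploy`, `health_check`, and `rollback` are hypothetical callables standing in for whatever the pipeline really invokes.

```python
import time


def rolling_deploy(deploy, health_check, rollback, attempts=5, delay_seconds=0.0):
    """Deploy, poll health, and roll back if no check passes.

    Returns True when the new version is kept, False when it was rolled back.
    """
    deploy()
    for _ in range(attempts):
        if health_check():
            return True
        time.sleep(delay_seconds)
    rollback()
    return False
```

Injecting the three steps as callables keeps the decision logic testable without touching a real cluster.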
⊘
Custom Tooling
Wardent — Production Reverse Proxy
Custom Rust reverse proxy running as systemd service. Rate limiting with real client IP extraction via CF-Connecting-IP. Bot detection and AI crawler filtering — confirmed kills on ChatGPT and Gemini crawlers. Cryptographic secret middleware (X-Wardent-Secret). Raw status code responses for nginx error page interception. Ban management with configurable duration and violation thresholds.
⬡
Async Processing
Celery + Redis Task Infrastructure
Celery workers with Redis broker for asynchronous task processing. Separate database numbers for broker and cache (Celery /1, Django /0). Task serialization, retry logic, and queue management. Redis-backed Django cache layer for view caching and session storage.
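In settings terms, the database split looks roughly like the fragment below. The host, port, and the `redis_url` helper are placeholders of mine, authentication is omitted, and `CELERY_BROKER_URL` assumes Celery is configured to read Django settings under the `CELERY` namespace.

```python
REDIS_HOST = "localhost"  # assumption: replace with the real Redis host
REDIS_PORT = 6379


def redis_url(db: int) -> str:
    """Build a redis:// URL for one logical database number."""
    return f"redis://{REDIS_HOST}:{REDIS_PORT}/{db}"


# Django cache and cache-backed sessions on database 0.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": redis_url(0),
    }
}
SESSION_ENGINE = "django.contrib.sessions.backends.cache"

# Celery broker on database 1, kept apart from cache keys.
CELERY_BROKER_URL = redis_url(1)
```

Keeping the two in separate database numbers means cache maintenance and queue contents never share a keyspace.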
Full webhook chain: Stripe → Cloudflare → nginx → Wardent (signature bypass) → Django. invoice.paid as single source of truth for tier assignment. customer.subscription.created only links Stripe customer ID. invoice.payment_failed returns early on free tier. Stripe Connect for marketplace payments on Homiverse.
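The event routing can be sketched as a single dispatcher. The event type strings are Stripe's; everything else is hypothetical: `accounts` is a plain dict keyed by customer ID, `tier_for_invoice` is a caller-supplied function, and flagging a failed payment on a paid tier is my assumption, since the behaviour in that case is not described above.

```python
def handle_event(event, accounts, tier_for_invoice):
    """Apply one Stripe-style event to `accounts`, a dict keyed by customer ID."""
    kind = event["type"]
    obj = event["data"]["object"]
    customer_id = obj["customer"]

    if kind == "customer.subscription.created":
        # Only link the customer; the tier is decided by invoice.paid.
        accounts.setdefault(customer_id, {"tier": "free"})
    elif kind == "invoice.paid":
        # Single source of truth for tier assignment.
        account = accounts.setdefault(customer_id, {"tier": "free"})
        account["tier"] = tier_for_invoice(obj)
    elif kind == "invoice.payment_failed":
        account = accounts.get(customer_id)
        if account is None or account["tier"] == "free":
            return "ignored"  # early return: nothing to act on for free tier
        account["payment_failed"] = True  # assumption: flag for follow-up
    else:
        return "ignored"
    return "handled"
```

Because only `invoice.paid` writes the tier, events arriving out of order cannot grant access that was never paid for.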
◇
Database
PostgreSQL & Data Layer
NeonDB managed Postgres with connection pooling. Transaction isolation with READ COMMITTED and targeted select_for_update. Geospatial partitioning concepts (h3_id). Working understanding of MVCC and serializable snapshot isolation (SSI) from Designing Data-Intensive Applications (DDIA). Django ORM optimization for N+1 query elimination.
⊞
Networking
Request Chain & Error Handling
Full request chain: Cloudflare → nginx → Wardent → Django. Static error pages (403, 413, 500, 502, 503, 504) served without Django dependency using Cloudflare =200 bypass trick. SSL/TLS via Let's Encrypt with certbot auto-renewal. Google OAuth redirect URI debugging across www/non-www domains. CSRF configuration across multiple environments.
05 Incidents
P0
Concurrent Deadlock — Silent Production Hang
Production reverse proxy (Wardent) silently stopped forwarding requests. No error logs, no crash, process alive, all containers healthy, Alertmanager didn't fire. Root cause: DashMap deadlock in the rate limiter. The request handler thread held the banned shard lock while waiting for the violations shard; the cleanup task simultaneously held the violations shard via retain() and attempted to read the banned map. Classic lock-ordering deadlock with no observable symptoms. Fix: made the request path read-only on shared maps; the cleanup task became the sole owner of all mutations.
Rust · Concurrency · DashMap · Zero-Log Failure
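The shape of the fix, transposed to Python for illustration. Wardent itself is Rust with DashMap, so this is an analogy and not the real code; `RateLimitState`, the event queue, and the thresholds are all assumptions of mine. The point it shows is structural: request threads never hold one map while reaching for another, and a single task owns every mutation.

```python
import queue


class RateLimitState:
    """Request threads read `banned` and enqueue events; one cleanup task mutates."""

    def __init__(self, threshold=3, ban_seconds=60.0):
        self.threshold = threshold
        self.ban_seconds = ban_seconds
        self.banned = {}      # ip -> ban expiry; written only by cleanup_tick
        self.violations = {}  # ip -> count; touched only by cleanup_tick
        self._events = queue.SimpleQueue()

    def is_banned(self, ip, now):
        """Request path: a single read, nothing held across two maps."""
        expiry = self.banned.get(ip)
        return expiry is not None and now < expiry

    def report_violation(self, ip):
        """Request path: hand the mutation to the cleanup task."""
        self._events.put(ip)

    def cleanup_tick(self, now):
        """Cleanup task: the only code that mutates either map."""
        while True:
            try:
                ip = self._events.get_nowait()
            except queue.Empty:
                break
            self.violations[ip] = self.violations.get(ip, 0) + 1
            if self.violations[ip] >= self.threshold:
                self.banned[ip] = now + self.ban_seconds
                del self.violations[ip]
        # Swap in a fresh dict so readers never observe a half-edited map.
        self.banned = {ip: exp for ip, exp in self.banned.items() if exp > now}
```

With one writer there is no second lock to wait on, so the lock-ordering cycle cannot form. The cost is that a ban takes effect on the next cleanup tick instead of immediately.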
P1
Double Base64 — JWT Secret Encoding Trap
Kubernetes pods crashing with UnicodeDecodeError on JWT private key. 45 minutes of debugging across stringData vs data in secrets, --from-env-file, --from-literal, and Python secret creation scripts. Actual root cause: web deployment only mounted individual env vars (REDIS_URL, REDIS_PASSWORD) — JWT_PRIVATE_KEY was never mounted. os.getenv returned empty string, base64.b64decode('') produced garbage bytes. One-line fix: envFrom secretRef instead of individual env entries.
Kubernetes · Secrets · Base64 · JWT
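A guard like the one below would likely have turned those 45 minutes into one log line. It is a sketch, not the code that shipped: `load_b64_secret` is a hypothetical helper, and the error wording is mine.

```python
import base64
import binascii
import os


def load_b64_secret(name, environ=os.environ):
    """Return the decoded secret, or raise an error that names the real problem."""
    raw = environ.get(name)
    if not raw:
        raise RuntimeError(f"{name} is unset or empty; check the pod's env/envFrom wiring")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(raw, validate=True)
    except binascii.Error as exc:
        raise RuntimeError(f"{name} is not valid base64") from exc
```

Failing at startup with the variable's name in the message separates "never mounted" from "mounted but badly encoded", which is the distinction that was hard to see here.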
P2
OAuth Redirect Mismatch — Weeks Undetected
Google OAuth returning "Access blocked: redirect_uri_mismatch" for all social login attempts. App was sending www.solaradocs.net but Google Cloud Console only had non-www registered. Had been broken for weeks — nobody noticed because all active users were already authenticated. Fixed by adding www callback URI to Google Cloud Console.
OAuth · DNS · Silent Failure
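Because the provider compares redirect URIs exactly, a startup check against a copy of the registered list can surface this class of bug before users do. A minimal sketch, assuming the registered URIs are mirrored somewhere in config; `check_redirect_uri` and the callback path in the usage note are hypothetical.

```python
from urllib.parse import urlsplit


def check_redirect_uri(redirect_uri, registered_uris):
    """Raise if the URI the app will send is not registered exactly."""
    if redirect_uri in registered_uris:
        return
    host = urlsplit(redirect_uri).hostname or ""
    # The www/non-www counterpart of the host being sent.
    twin = host[4:] if host.startswith("www.") else "www." + host
    hint = ""
    if any(urlsplit(uri).hostname == twin for uri in registered_uris):
        hint = f" (only the {twin} variant is registered)"
    raise ValueError(f"redirect URI {redirect_uri} is not registered{hint}")
```

Calling it with `https://www.solaradocs.net/callback` against a list containing only `https://solaradocs.net/callback` raises with the www hint, which is the mismatch that went unnoticed for weeks.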
P2
nginx 413 Bypass — Body Size Interception Failure
nginx's built-in 413 response from client_max_body_size completely bypasses error_page directives — custom error pages never served. Fix required disabling nginx body limit entirely (client_max_body_size 0), delegating body size enforcement to Wardent, which returns raw 413 that nginx then intercepts via proxy_intercept_errors and serves the static page.
nginx · Error Handling · Wardent
P3
Wardent Rate Limiter — Wrong IP Source
Rate limiter was using socket address (Cloudflare edge IP) instead of real client IP. Behind Cloudflare, every request appeared to come from the same IP — one user triggering rate limit would ban all users. Fixed by extracting CF-Connecting-IP header for real client identification.
Cloudflare · Rate Limiting · IP Extraction
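The extraction logic, sketched in Python instead of Wardent's Rust. `client_ip` is a hypothetical helper, `headers` is assumed to be a plain dict with the exact header name, and `trusted_proxies` is a list of networks the caller supplies. Trusting the header only from known proxies is my addition; without it, any direct client could spoof its own IP.

```python
import ipaddress


def client_ip(headers, socket_ip, trusted_proxies):
    """Prefer CF-Connecting-IP, but only when the peer is a trusted proxy."""
    peer = ipaddress.ip_address(socket_ip)
    if any(peer in network for network in trusted_proxies):
        claimed = headers.get("CF-Connecting-IP", "").strip()
        try:
            return str(ipaddress.ip_address(claimed))
        except ValueError:
            pass  # missing or malformed header: fall back to the socket address
    return str(peer)
```

The rate limiter then keys on the returned value, so one noisy client no longer exhausts the budget for everyone behind the same edge address.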
Let's connect.
Available for freelance architecture and system design work.