System Architect — Warsaw, PL

Architect.
Ship.
Observe.

Designing distributed systems, deployment pipelines, and production infrastructure. Every system I build runs in production with real users and full observability.

2yr
Experience
2
Prod Systems
3
Active clusters
00 Why Kafka
Apache Kafka
The architecture I respect the most.
01
Sequential Disk I/O
Kafka writes to disk sequentially — not randomly. On modern hardware, sequential disk writes can outpace random writes, and in some benchmarks even random in-memory access, because they exploit the OS page cache and the way SSDs and HDDs are physically built. Most people assume disk is always the bottleneck; Kafka's designers realized sequential disk I/O can rival memory speeds and built the whole log abstraction around it.
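The claim is easy to sanity-check with a quick benchmark sketch (Python; block size and file size are arbitrary choices, and on small files the OS page cache can mask the gap, so treat any timings as indicative only):

```python
import os
import random
import tempfile
import time

BLOCK = b"x" * 4096      # one 4 KiB block
N_BLOCKS = 2048          # 8 MiB total

def sequential_write(path: str) -> float:
    """Append blocks in order, the way a Kafka log segment is written."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(N_BLOCKS):
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start

def random_write(path: str) -> float:
    """Write the same blocks at shuffled offsets, forcing seeks."""
    offsets = list(range(N_BLOCKS))
    random.shuffle(offsets)
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.truncate(N_BLOCKS * len(BLOCK))  # pre-size the file
        for i in offsets:
            f.seek(i * len(BLOCK))
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    seq_path = os.path.join(d, "seq.log")
    rnd_path = os.path.join(d, "rnd.log")
    t_seq = sequential_write(seq_path)
    t_rnd = random_write(rnd_path)
    seq_size = os.path.getsize(seq_path)
    rnd_size = os.path.getsize(rnd_path)
```

Both paths write identical bytes; only the access pattern differs, which is exactly the variable Kafka's design optimizes.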
02
Persistent Recovery
When a broker goes down, acknowledged messages aren't lost: they're persisted to disk volumes. When brokers come back, consumers pick up exactly where they left off using committed offsets. Messages that arrived during an outage are still there, waiting. No silent drops, no lost work.
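The recovery model is easy to demonstrate with a toy append-only log and a file-backed committed offset (a pure-Python sketch of the idea; real Kafka layers partitions, replication, and consumer groups on top of it):

```python
import os
import tempfile

class ToyLog:
    """Append-only log: messages survive any consumer restart."""
    def __init__(self, path: str):
        self.path = path
        open(self.path, "a").close()  # ensure the file exists

    def append(self, msg: str) -> None:
        with open(self.path, "a") as f:
            f.write(msg + "\n")

    def read_from(self, offset: int) -> list[str]:
        with open(self.path) as f:
            return f.read().splitlines()[offset:]

class ToyConsumer:
    """Tracks a committed offset on disk, like a Kafka consumer group."""
    def __init__(self, log: ToyLog, offset_path: str):
        self.log, self.offset_path = log, offset_path

    def committed(self) -> int:
        try:
            with open(self.offset_path) as f:
                return int(f.read())
        except FileNotFoundError:
            return 0  # fresh consumer starts at the beginning

    def poll(self) -> list[str]:
        start = self.committed()
        msgs = self.log.read_from(start)
        with open(self.offset_path, "w") as f:  # commit after processing
            f.write(str(start + len(msgs)))
        return msgs

# Demo: produce, consume, "crash", produce more, resume from committed offset
workdir = tempfile.mkdtemp()
log = ToyLog(os.path.join(workdir, "events.log"))
for m in ("a", "b", "c"):
    log.append(m)
consumer = ToyConsumer(log, os.path.join(workdir, "offset"))
first_batch = consumer.poll()
log.append("d")
log.append("e")
restarted = ToyConsumer(log, os.path.join(workdir, "offset"))  # simulated restart
second_batch = restarted.poll()
```

The restarted consumer sees only `d` and `e`: nothing replayed, nothing dropped, because the offset, not the consumer process, is the source of truth.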
03
One Tool, Not Two
Without Kafka you need Redis as a message broker AND Celery as the async task processor — two systems to configure, monitor, and debug. Kafka IS the broker. It handles message queuing, persistence, ordering, and consumer group coordination in a single system. Fewer moving parts, smaller failure surface.
04
Distributed by Design
Kafka isn't distributed as an afterthought — it was built for it from day one. Partitioning, replication, ISR sets, leader election via KRaft (bye ZooKeeper). Every design decision assumes multiple nodes, network failures, and split-brain scenarios. It's a distributed systems textbook in production code.
05
The Logo
Look at it. The connected nodes, the central broker, the producer-consumer topology visible in a single glyph. It's not just a logo — it's a system diagram. You can literally explain Kafka's architecture by pointing at different parts of its own logo. No other tool does that.
06
The Name
Named after Franz Kafka — the author who wrote about incomprehensible, labyrinthine systems that trap people inside them. The LinkedIn engineer who created it said he "just liked the name." That's the most engineer thing ever said about naming a distributed messaging system after an existential horror author.
01 Cluster
Python
Language
Rust
Language
SQL
Language
HTML/CSS
Language
Django
Framework
Docker
Infra
Docker Compose
Infra
nginx
Infra
systemd
Infra
Linux
Infra
AWS EC2/RDS
Cloud
PostgreSQL
Database
Redis
Database
NeonDB
Database
Kafka
Streaming
PgBouncer
Pooling
Grafana
Observe
Prometheus
Observe
Tempo
Observe
OpenTelemetry
Observe
Alertmanager
Observe
Cloudflare
Network
Cloudflare Tunnels
Network
Stripe
Payments
Stripe Connect
Payments
GitHub Actions
CI/CD
Wardent
Custom Proxy
SendGrid
Email
WebSockets
Real-time
ZooKeeper
Distributed
SSH
Access
Let's Encrypt
SSL
k3s
Orchestration
Helm
Orchestration
Git
Version Control
AWS RDS
Cloud
AWS Security Groups
Cloud
Cloudflare R2
Storage
Visual Stack
Kafka
k3s
Rust
Helm
PostgreSQL
Terraform
Grafana
Prometheus
OpenTelemetry
SendGrid
Docker
Python
Django
Redis
nginx
Git
Linux
GitHub Actions
AWS
Cloudflare
Stripe
pytest
GitHub
Gunicorn
JWT
Celery
Java
Google OAuth
02 Pipeline
Production
Document collaboration platform. Full infra chain: Cloudflare → nginx → Wardent (custom Rust proxy) → Django. Complete observability with Prometheus, Grafana, Tempo, OTel, and automated alerting via Alertmanager → SendGrid.
Django Rust Docker AWS Stripe OTel NeonDB
Production
Real-time cryptocurrency trading simulator with live Binance WebSocket price feeds, 10-second OHLC candlestick charts with EMA 12/26, atomic buy/sell with price quote verification, portfolio dashboard with allocation breakdown, and JWT cookie-based authentication.
Django Channels PostgreSQL Redis WebSockets Vanilla JS
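The EMA 12/26 overlay on those candlesticks reduces to a few lines. A minimal sketch (seeded on the first close for simplicity; charting tools often seed with an SMA of the first `period` closes instead):

```python
def ema(closes: list[float], period: int) -> list[float]:
    """Exponential moving average with smoothing factor k = 2 / (period + 1)."""
    k = 2 / (period + 1)
    out = []
    prev = closes[0]  # seed on the first close (simplification)
    for price in closes:
        prev = price * k + prev * (1 - k)
        out.append(prev)
    return out

# A step up in price: the 12-period EMA reacts faster than the 26-period one,
# which is what the 12/26 crossover on the chart visualizes.
prices = [100.0] * 30 + [110.0] * 30
ema12 = ema(prices, 12)
ema26 = ema(prices, 26)
```

When the fast EMA crosses above the slow one, the chart flags upward momentum; the crossover is just these two series diverging after a price move.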
Production
Real estate platform with Stripe Connect payments, 3D property mapping, real-time chat via WebSockets, and full JWT authentication system.
Django Stripe Connect WebSockets 3D Mapping
Building
Custom deployment orchestrator. Rust CLI (sol) communicates with Python service on Mac Mini via Cloudflare Tunnel. Push to pre-prod → validate → soak test → auto merge to production.
Rust CLI Python Cloudflare Tunnel CI/CD
Custom Tool
Custom reverse proxy in Rust. Rate limiting with real client IP extraction via CF-Connecting-IP, cryptographic secret middleware, and request forwarding to Django.
Rust systemd Security Networking
03 Topics
Apache Kafka
Streaming
CRDTs & Conflict Resolution
Distributed
Database Clustering & Sharding
Databases
Merge Conflicts & Write Ordering
Consistency
WebSocket Architecture
Real-time
Concurrent Writes & Race Conditions
Concurrency
Leaderless vs Leader-Based Replication
Replication
Quorum Reads & Writes
Consensus
Byzantine Fault Tolerance
Faults
Clock Skew & Sequence Ordering
Ordering
ZooKeeper & Split Brain
Coordination
Redis Pub/Sub & Cross-Node Delivery
Messaging
SSI, MVCC & Transaction Isolation
Transactions
Cascading OOM & Pod Failures
Failure Modes
Cassandra — Wide-Column Modeling & Partition Design
Databases
CockroachDB — Distributed SQL & Serializable Isolation
Databases
04 Operations
Orchestration
Multi-Cluster Kubernetes
Operated independent k3s clusters communicating cross-network — application cluster, Kafka streaming cluster, and dedicated monitoring cluster. StatefulSets for stateful workloads, Traefik ingress routing, secret management, NodePort service exposure for cross-cluster metric scraping.
Streaming
Kafka Cross-Cluster Pipeline
Built producer-consumer pipeline spanning two independent clusters. KRaft mode broker (no ZooKeeper), murmur2 partition hashing, consumer group offset tracking, at-least-once delivery semantics. Configured advertised listeners for cross-network broker discovery. Auto-topic creation, persistent message recovery on consumer restart.
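The murmur2 partition hashing mentioned above is what keeps a given key on a stable partition across producers and restarts. A pure-Python port of the Java client's default partitioner (a sketch; I haven't verified byte-for-byte parity against official test vectors, so treat exact hash values as indicative):

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 as used by the Kafka Java client (seed 0x9747b28c)."""
    length = len(data)
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    mask = 0xFFFFFFFF
    h = (seed ^ length) & mask
    i = 0
    while length - i >= 4:
        # Little-endian 4-byte chunk
        k = data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24)
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k
        i += 4
    # Tail bytes (mirrors the Java switch fall-through)
    left = length - i
    if left == 3:
        h ^= data[i + 2] << 16
    if left >= 2:
        h ^= data[i + 1] << 8
    if left >= 1:
        h ^= data[i]
        h = (h * m) & mask
    # Final avalanche
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h

def partition_for(key: bytes, num_partitions: int) -> int:
    """Kafka's default keyed partitioner: mask the sign bit, then modulo."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions
```

Because the mapping is deterministic, every message with the same key lands on the same partition, which is what gives Kafka per-key ordering.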
Observability
Full-Stack Monitoring
Centralized monitoring cluster scraping metrics from application and streaming clusters via cross-network Prometheus. Grafana dashboards with cluster/service/pod drill-down. Tempo for distributed tracing via OpenTelemetry. Alertmanager pipeline routing alerts through SendGrid SMTP. UptimeRobot external health checks.
CI/CD
Automated Deployment Pipeline
GitHub Actions workflow: validate dependencies → run test suite → build containers → deploy with rolling restarts. ~30s deployment window. Dependency verification before merge. Branch protection rules. Automated rollback on health check failure.
Custom Tooling
Wardent — Production Reverse Proxy
Custom Rust reverse proxy running as systemd service. Rate limiting with real client IP extraction via CF-Connecting-IP. Bot detection and AI crawler filtering — confirmed kills on ChatGPT and Gemini crawlers. Cryptographic secret middleware (X-Wardent-Secret). Raw status code responses for nginx error page interception. Ban management with configurable duration and violation thresholds.
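The IP-extraction step is the load-bearing part of any rate limiter behind a CDN. A sketch of the idea in Python (Wardent itself is Rust; the helper names here are illustrative, and the Cloudflare-range allowlist check is elided):

```python
def client_ip(headers: dict[str, str], peer_addr: str) -> str:
    """Prefer Cloudflare's CF-Connecting-IP header over the socket peer.

    Behind Cloudflare the socket peer is an edge node, so rate limiting on it
    would throttle every user at once. The header should only be trusted when
    the peer is a known Cloudflare range (that check is elided here).
    """
    normalized = {k.lower(): v for k, v in headers.items()}
    real_ip = normalized.get("cf-connecting-ip", "").strip()
    return real_ip or peer_addr

class RateLimiter:
    """Per-IP hit counter with a fixed threshold (time windows elided for brevity)."""
    def __init__(self, limit: int):
        self.limit = limit
        self.hits: dict[str, int] = {}

    def allow(self, ip: str) -> bool:
        self.hits[ip] = self.hits.get(ip, 0) + 1
        return self.hits[ip] <= self.limit
```

Keying the limiter on `client_ip(...)` instead of the raw peer address is the whole fix from incident P3 below in miniature.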
Async Processing
Celery + Redis Task Infrastructure
Celery workers with Redis broker for asynchronous task processing. Separate database numbers for broker and cache (Celery /1, Django /0). Task serialization, retry logic, and queue management. Redis-backed Django cache layer for view caching and session storage.
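The DB-number split looks like this in settings (a sketch; the hostname and backend path are illustrative, not the production values):

```python
# Same Redis instance, different logical databases: cache traffic never
# shares a keyspace with the broker, and flushing one can't wipe the other.
REDIS = "redis://redis:6379"

CELERY_BROKER_URL = f"{REDIS}/1"  # Celery broker on DB 1

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",  # illustrative backend path
        "LOCATION": f"{REDIS}/0",                    # Django cache on DB 0
    }
}
```

The separation matters operationally: you can inspect or clear the cache keyspace without touching queued tasks, and vice versa.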
Containers
10-Container Docker Orchestration
Production compose stack: Django, Celery, Redis, Prometheus, Grafana, Tempo, OTel Collector, Alertmanager, nginx, Wardent. Log rotation (max-size 10m, max-file 3), restart policies, ghost container cleanup, volume management. Docker image builds with dependency caching and multi-layer optimization.
Payments
Stripe Webhook Architecture
Full webhook chain: Stripe → Cloudflare → nginx → Wardent (signature bypass) → Django. invoice.paid as single source of truth for tier assignment. customer.subscription.created only links Stripe customer ID. invoice.payment_failed returns early on free tier. Stripe Connect for marketplace payments on Homiverse.
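A condensed sketch of that dispatch policy (event shapes follow Stripe's webhook payloads, but the `user` dict and the tier lookup are hypothetical stand-ins for the real models):

```python
def handle_stripe_event(event: dict, user: dict) -> None:
    """invoice.paid is the single source of truth for tier assignment."""
    etype = event["type"]
    obj = event["data"]["object"]

    if etype == "customer.subscription.created":
        # Link the Stripe customer ID only; never assign a tier here.
        user["stripe_customer_id"] = obj["customer"]

    elif etype == "invoice.paid":
        # The one place a paid tier is granted (tier lookup is hypothetical).
        user["tier"] = obj.get("metadata", {}).get("tier", "pro")

    elif etype == "invoice.payment_failed":
        if user.get("tier", "free") == "free":
            return  # early return: nothing to downgrade on the free tier
        user["pending_downgrade"] = True  # hypothetical follow-up handling
```

Funneling tier changes through one event type avoids the classic webhook bug where `subscription.created` and `invoice.paid` race each other and disagree about the user's state.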
Database
PostgreSQL & Data Layer
NeonDB managed Postgres with connection pooling. Transaction isolation with READ COMMITTED and targeted select_for_update. Geospatial partitioning concepts (h3_id). MVCC and SSI understanding from DDIA. Django ORM optimization for N+1 query elimination.
Networking
Request Chain & Error Handling
Full request chain: Cloudflare → nginx → Wardent → Django. Static error pages (403, 413, 500, 502, 503, 504) served without Django dependency using Cloudflare =200 bypass trick. SSL/TLS via Let's Encrypt with certbot auto-renewal. Google OAuth redirect URI debugging across www/non-www domains. CSRF configuration across multiple environments.
05 Incidents
P0
Concurrent Deadlock — Silent Production Hang
Production reverse proxy (Wardent) silently stopped forwarding requests. No error logs, no crash, process alive, all containers healthy, alertmanager didn't fire. Root cause: DashMap deadlock in rate limiter — request handler thread held banned shard lock while waiting for violations shard, cleanup task simultaneously held violations shard via retain() and attempted to read banned map. Classic concurrent deadlock with no observable symptoms. Fix: made request path read-only on shared maps, cleanup task became sole owner of all mutations.
Rust Concurrency DashMap Zero-Log Failure
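The shape of that fix translates to any language: request threads never mutate the shared maps, and a single cleanup task owns every write. An analogous sketch in Python (the real fix is Rust + DashMap; the names and threshold here are illustrative):

```python
import queue
import threading

BAN_THRESHOLD = 3

banned_lock = threading.Lock()
violations_lock = threading.Lock()
banned: dict[str, bool] = {}
violations: dict[str, int] = {}
hits = queue.Queue()  # request threads only enqueue; they never touch the maps

def handle_request(ip: str) -> int:
    # Request path: read-only on shared state, one lock at a time, never nested.
    with banned_lock:
        if ip in banned:
            return 429
    hits.put(ip)  # record the hit for the cleanup task to process
    return 200

def cleanup_once() -> None:
    # Cleanup task is the sole writer; only it is ever allowed to hold both
    # locks, so the circular wait that caused the deadlock cannot form.
    while True:
        try:
            ip = hits.get_nowait()
        except queue.Empty:
            return
        with violations_lock:
            violations[ip] = violations.get(ip, 0) + 1
            if violations[ip] > BAN_THRESHOLD:
                with banned_lock:
                    banned[ip] = True
```

The invariant is the point: a deadlock needs two threads acquiring the same locks in opposite orders, and with a single designated writer that ordering conflict is impossible by construction.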
P1
Double Base64 — JWT Secret Encoding Trap
Kubernetes pods crashing with UnicodeDecodeError on JWT private key. 45 minutes of debugging across stringData vs data in secrets, --from-env-file, --from-literal, and Python secret creation scripts. Actual root cause: the web deployment only mounted individual env vars (REDIS_URL, REDIS_PASSWORD) — JWT_PRIVATE_KEY was never mounted. os.getenv returned an empty string, and base64.b64decode('') silently produced empty bytes, so the key only blew up downstream when parsed. One-line fix: envFrom secretRef instead of individual env entries.
Kubernetes Secrets Base64 JWT
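The nastiest part of this failure mode is that Python never raises at the decode site; an empty string decodes to empty bytes cleanly. A minimal reproduction plus the fail-fast guard (`require_env` is a hypothetical helper, not the production code):

```python
import base64
import os

# Reproduction: os.getenv on the unmounted secret returned "", and base64
# happily decodes an empty string to empty bytes with no exception at all.
decoded = base64.b64decode("")  # b"" — the error surfaces much later

def require_env(name: str) -> str:
    """Fail fast at startup instead of letting empty config limp into runtime."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required env var: {name}")
    return value
```

Calling `require_env("JWT_PRIVATE_KEY")` at process start turns a 45-minute debugging session into a one-line crash message at deploy time.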
P2
OAuth Redirect Mismatch — Weeks Undetected
Google OAuth returning "Access blocked: redirect_uri_mismatch" for all social login attempts. App was sending www.solaradocs.net but Google Cloud Console only had non-www registered. Had been broken for weeks — nobody noticed because all active users were already authenticated. Fixed by adding www callback URI to Google Cloud Console.
OAuth DNS Silent Failure
P2
nginx 413 Bypass — Body Size Interception Failure
nginx's built-in 413 response from client_max_body_size completely bypasses error_page directives — custom error pages never served. Fix required disabling nginx body limit entirely (client_max_body_size 0), delegating body size enforcement to Wardent, which returns raw 413 that nginx then intercepts via proxy_intercept_errors and serves the static page.
nginx Error Handling Wardent
P3
Wardent Rate Limiter — Wrong IP Source
Rate limiter was using socket address (Cloudflare edge IP) instead of real client IP. Behind Cloudflare, every request appeared to come from the same IP — one user triggering rate limit would ban all users. Fixed by extracting CF-Connecting-IP header for real client identification.
Cloudflare Rate Limiting IP Extraction

Let's connect.

Available for freelance architecture and system design work.

GitHub Email Discord SolaraDocs