02 — System Design Interview Guide
Priority: HIGH — With 5+ years of experience, this is your strongest differentiator. You’ve built real systems at scale — now learn to articulate it in interview format.
Table of Contents
- The System Design Interview Framework
- Core Concepts You Must Know
- Back-of-the-Envelope Estimation
- Top 20 System Design Problems
- Deep Dives: 5 Detailed Designs
- Your Experience as System Design Ammo
- Common Trade-offs to Discuss
- Resources
The System Design Interview Framework
Use this RESHADED framework for every problem (45-60 min):
R — Requirements (5 min)
Functional: What should the system do?
Non-functional: Scale, latency, availability, consistency
E — Estimation (5 min)
Users, QPS, storage, bandwidth
Back-of-envelope math to size the system
S — Storage Schema (5 min)
Data model: tables, relationships, indexes
SQL vs NoSQL decision with justification
H — High-Level Design (10 min)
Draw the architecture: clients → LB → services → DB/cache
Identify major components and data flow
A — API Design (5 min)
Key endpoints: method, path, params, response
Authentication, rate limiting, pagination
D — Detailed Design (10 min)
Dive deep into 1-2 critical components
This is where you show depth and trade-off reasoning
E — Evaluate (5 min)
How does the design handle failures?
What are the bottlenecks? How would you scale further?
What would you change for 10x more traffic?
D — Distinctive Features (optional, 5 min)
Monitoring, alerting, deployment strategy
Nice-to-have features that show you think beyond the basics
Pro Tips for the Interview
- Always start by asking questions. Never jump into design.
- Think out loud. The interviewer wants to see your thought process.
- Draw diagrams. Use boxes, arrows, labels. Keep it clean.
- Mention trade-offs. “I chose X over Y because…”
- Use numbers. “With 10M DAU making 10 requests each per day, that’s 100M requests/day, roughly 1,200 QPS…”
- Don’t go too deep too early. Get the high-level right first.
Core Concepts You Must Know
1. Scalability
Vertical Scaling (Scale Up):
- Bigger machine: more CPU, RAM, storage
- Simpler but has hard limits
- Good for databases (initially)
Horizontal Scaling (Scale Out):
- More machines
- Need: load balancing, data partitioning, stateless services
- Better for web/app tier
2. Load Balancing
What: Distributes incoming traffic across multiple servers.
Algorithms:
- Round Robin: simple, equal distribution
- Weighted Round Robin: heavier instances get more traffic
- Least Connections: sends to least busy server
- IP Hash: consistent routing for same client
- Consistent Hashing: minimizes data movement on scale events
Layers:
- L4 (Transport): TCP/UDP level, faster, no content inspection
- L7 (Application): HTTP level, content-based routing, path/header routing
Tools: AWS ALB/NLB, Nginx, HAProxy, Envoy
3. Caching
Why: Reduce latency, reduce DB load, improve throughput.
Cache Strategies:
- Cache-aside (Lazy Loading):
App checks cache → miss → read from DB → write to cache
Pros: Only caches what's needed
Cons: Cache miss penalty, potential stale data
- Write-through:
App writes to cache and DB simultaneously
Pros: Cache always consistent
Cons: Write latency, caches unused data
- Write-behind (Write-back):
App writes to cache → cache asynchronously writes to DB
Pros: Fast writes
Cons: Data loss risk if cache crashes
- Read-through:
Cache sits between app and DB, loads on miss automatically
Cache Eviction Policies:
- LRU (Least Recently Used) — most common
- LFU (Least Frequently Used)
- TTL (Time To Live) — expiry-based
Where to cache:
- Client-side (browser cache)
- CDN (static assets, edge caching)
- Application cache (Redis, Memcached)
- Database cache (query cache, buffer pool)
Tools: Redis, Memcached, Varnish, CDN (CloudFront, Cloudflare)
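The cache-aside flow above is small enough to sketch from memory. A minimal illustration, with plain dicts standing in for Redis and the database (both hypothetical stand-ins); a real version would also set a TTL and handle serialization:

```python
class CacheAside:
    """Cache-aside sketch: check cache first, fall back to the DB on a miss,
    then populate the cache so the next read is fast."""

    def __init__(self, cache, db):
        self.cache = cache   # dict standing in for Redis
        self.db = db         # dict standing in for the database
        self.hits = 0
        self.misses = 0

    def get_user(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:            # cache hit
            self.hits += 1
            return self.cache[key]
        self.misses += 1                 # cache miss: read DB, then populate
        value = self.db.get(user_id)
        if value is not None:
            self.cache[key] = value      # real code would set a TTL here
        return value
```

Worth saying out loud in an interview: this is where the stale-data trade-off lives, since the DB can change after the key is cached.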
4. Database Design
SQL (Relational):
Strengths: ACID, joins, structured data, mature tooling
Use when: Strong consistency needed, complex queries, relationships matter
Examples: PostgreSQL, MySQL
NoSQL Types:
- Key-Value: Redis, DynamoDB (sessions, caching)
- Document: MongoDB (flexible schema, nested data)
- Column-family: Cassandra (wide columns, time-series)
- Graph: Neo4j (relationships, social networks)
Decision Framework:
Need ACID + complex queries? → SQL
Need horizontal scale + simple lookups? → Key-Value / Document
Need time-series at massive scale? → Column-family
Need relationship traversal? → Graph
5. Database Scaling
Read Replicas:
- Write to primary, read from replicas
- Replication lag = eventual consistency for reads
- Good for read-heavy workloads (80%+ reads)
Sharding (Horizontal Partitioning):
- Split data across multiple databases
- Shard key: determines which shard holds data
- Range-based: user_id 0-1M → shard 1 (risk: hot spots)
- Hash-based: hash(user_id) % N → shard index (better distribution)
- Challenges: cross-shard queries, rebalancing, shard key choice
Vertical Partitioning:
- Split tables by columns (e.g., user profile vs user activity)
- Different tables on different machines
Denormalization:
- Duplicate data to avoid joins
- Trade storage for read performance
- Keep in sync with triggers or application logic
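The hash-based routing rule above fits in a few lines. One caveat worth mentioning live: Python's built-in hash() is salted per process, so a stable hash is needed for routing to agree across app servers. Also note that plain % N is exactly what breaks when N changes, which is what consistent hashing (below) fixes:

```python
import hashlib

def shard_for(user_id: int, num_shards: int) -> int:
    """Hash-based shard routing: stable hash of the key, mod shard count.
    MD5 is used here only for its stable, uniform output, not for security."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards
```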
6. Message Queues & Async Processing
Why: Decouple services, handle spikes, enable async processing.
Components:
Producer → Queue → Consumer
Patterns:
- Point-to-point: one consumer per message
- Pub/Sub: multiple consumers (topics/subscriptions)
- Work queue: multiple workers process from same queue
Guarantees:
- At-most-once: no retry (fast, may lose messages)
- At-least-once: retry on failure (duplicates possible)
- Exactly-once: hardest (deduplication + transactions)
Tools: RabbitMQ, Kafka, AWS SQS/SNS, Redis Streams
Your experience: "At Intensel, I designed async workflows with queues,
workers, retries, and scheduling for high-volume background jobs."
→ You literally built this. Talk about RabbitMQ + Dask setup.
7. CAP Theorem
In a distributed system, you can only guarantee 2 of 3:
C — Consistency: every read gets the latest write
A — Availability: every request gets a response
P — Partition tolerance: system works despite network failures
Reality: P is non-negotiable (networks fail), so you choose between:
CP: Consistent + Partition-tolerant (sacrifice availability)
→ HBase, MongoDB (strong consistency mode), ZooKeeper
AP: Available + Partition-tolerant (sacrifice consistency)
→ Cassandra, DynamoDB, DNS
In practice:
- Most systems are "mostly consistent, mostly available"
- Different data can have different guarantees:
Financial transactions → strong consistency
Social media likes → eventual consistency
8. Consistent Hashing
Problem: Simple hash(key) % N breaks when N changes (add/remove server).
Solution: Consistent hashing maps both keys and servers to a ring.
- Only K/N keys need to move when a server is added/removed
- Virtual nodes improve distribution (each server → multiple points on ring)
Use cases:
- Distributed caches (Redis cluster)
- Load balancing (sticky sessions)
- Database sharding
- CDN routing
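A consistent-hash ring with virtual nodes fits in about 30 lines, and walking through one is a strong interview move. A sketch (MD5 for ring positions and bisect for lookup are illustrative choices, not the only ones):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring sketch with virtual nodes. Servers and keys are
    hashed onto the same ring; a key belongs to the first server clockwise.
    Each server gets `vnodes` points on the ring for smoother distribution."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self.ring = []            # sorted list of (position, server)
        for s in servers:
            self.add(s)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server: str):
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    def remove(self, server: str):
        self.ring = [(pos, s) for pos, s in self.ring if s != server]

    def get(self, key: str) -> str:
        pos = self._hash(key)
        idx = bisect.bisect(self.ring, (pos,)) % len(self.ring)  # wrap around
        return self.ring[idx][1]
```

The property to point out: when a server is added, the only keys that move are the ones that now map to the new server; everything else stays put.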
9. API Gateway
What: Single entry point for all client requests.
Responsibilities:
- Routing: forward to correct service
- Authentication/Authorization
- Rate limiting
- Request/response transformation
- SSL termination
- Logging and monitoring
Tools: Kong, AWS API Gateway, Nginx, Envoy
10. CDN (Content Delivery Network)
What: Distributed network of edge servers that cache content close to users.
Types:
- Pull: CDN fetches from origin on first request, caches it
- Push: You upload content to CDN proactively
Use for: Static assets (images, CSS, JS), video streaming, API responses
Tools: CloudFront, Cloudflare, Akamai, Fastly
11. Rate Limiting
Why: Protect against abuse, ensure fair usage, prevent overload.
Algorithms:
- Token Bucket: tokens added at fixed rate, consumed per request
- Leaky Bucket: requests processed at fixed rate
- Fixed Window: count requests in fixed time windows
- Sliding Window Log: timestamps of recent requests
- Sliding Window Counter: hybrid of fixed window + sliding
Implementation:
- At API gateway level (global)
- Per-service (service-level)
- Using Redis (distributed rate limiting)
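Of the algorithms above, the token bucket is the one most worth being able to write on demand. A single-process sketch; a distributed version would keep the token state in Redis, typically updated atomically via a Lua script:

```python
import time

class TokenBucket:
    """Token-bucket sketch: `rate` tokens refill per second, bursting up to
    `capacity`. A request is allowed if at least one token is available."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity parameter is what distinguishes it from a leaky bucket: it permits short bursts while still enforcing the long-run rate.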
12. Replication & Consensus
Replication:
- Single-leader: one primary, multiple replicas (PostgreSQL, MySQL)
- Multi-leader: multiple primaries (conflict resolution needed)
- Leaderless: any node accepts writes (Cassandra, DynamoDB)
Consensus Protocols:
- Raft: leader election, log replication (easier to understand)
- Paxos: classic but complex
- ZAB: ZooKeeper's protocol
When asked about this:
- Explain in terms of your PostgreSQL replication experience
- Relate to your work with distributed Dask workers
Back-of-the-Envelope Estimation
Key Numbers to Memorize
Storage:
1 KB = 1,000 bytes (a short text record)
1 MB = 1,000 KB (a photo)
1 GB = 1,000 MB (a movie)
1 TB = 1,000 GB (large database)
1 PB = 1,000 TB (enterprise data warehouse)
Time:
1 ns = L1 cache reference
10 ns = L2 cache reference
100 ns = RAM access
0.1 ms = SSD random read
0.5 ms = Round trip within same datacenter
10 ms = HDD seek
150 ms = Round trip across the world (e.g., CA → Europe → CA)
Traffic:
1 day = 86,400 seconds ≈ 100,000 seconds (for estimation)
1 million requests/day ≈ 12 requests/second
1 billion requests/day ≈ 12,000 requests/second
Quick math:
100M users × 10% daily active = 10M DAU
10M DAU × 5 requests/day = 50M requests/day
50M / 100K seconds ≈ 500 QPS
Peak = 2-3x average = 1000-1500 QPS
Estimation Template
1. Users: How many total? Daily active?
2. Read/Write ratio: Read-heavy (100:1)? Write-heavy?
3. QPS: DAU × actions per day / 86400
4. Peak QPS: 2-3x average
5. Storage: Records × size per record × retention period
6. Bandwidth: QPS × average response size
7. Memory (cache): If we cache 20% of hot data...
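The template above mechanizes nicely into a throwaway helper for sanity-checking mental math (the function and its parameters are my own convenience names, not a standard tool):

```python
def estimate(dau, actions_per_day, bytes_per_record, retention_days, peak_factor=3):
    """Back-of-envelope sketch following the template above: QPS from DAU and
    actions/day, peak as a multiple of average, storage from record volume."""
    qps = dau * actions_per_day / 86_400
    return {
        "avg_qps": round(qps),
        "peak_qps": round(qps * peak_factor),
        "storage_bytes": dau * actions_per_day * bytes_per_record * retention_days,
    }

# 10M DAU at 5 requests/day → ~579 avg QPS (the "≈500 QPS" above), ~1,736 at 3x peak
```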
Top 20 System Design Problems
Tier 1: Must Know (very commonly asked)
| # | Problem | Key Concepts |
|---|---|---|
| 1 | Design a URL Shortener | Hashing, base62 encoding, redirection, analytics |
| 2 | Design Twitter/News Feed | Fan-out, timeline generation, caching, pub/sub |
| 3 | Design a Chat System (WhatsApp/Slack) | WebSockets, message queues, presence, delivery guarantees |
| 4 | Design a Rate Limiter | Token bucket, sliding window, distributed rate limiting |
| 5 | Design a Notification System | Push/pull, prioritization, deduplication, multiple channels |
Tier 2: Frequently Asked
| # | Problem | Key Concepts |
|---|---|---|
| 6 | Design an API Rate Limiter | Token bucket, Redis, distributed counting |
| 7 | Design a Key-Value Store | Consistent hashing, replication, conflict resolution |
| 8 | Design a Web Crawler | BFS, politeness, deduplication, distributed crawling |
| 9 | Design YouTube/Netflix | Video encoding, CDN, adaptive bitrate, recommendations |
| 10 | Design Uber/Lyft | Geospatial indexing, matching, real-time tracking |
Tier 3: Good to Know
| # | Problem | Key Concepts |
|---|---|---|
| 11 | Design Google Search | Inverted index, ranking, crawling, caching |
| 12 | Design Dropbox/Google Drive | File chunking, sync, conflict resolution, dedup |
| 13 | Design Instagram | Image storage, news feed, CDN, caching |
| 14 | Design a Task Queue | Worker pools, retries, dead letter queue, priorities |
| 15 | Design an E-commerce System | Inventory management, cart, checkout, payments |
Tier 4: Specialized (great for your profile)
| # | Problem | Key Concepts |
|---|---|---|
| 16 | Design a Geospatial Service | PostGIS, spatial indexing, tile serving, caching |
| 17 | Design a Data Pipeline | ETL, batch vs stream, exactly-once, schema evolution |
| 18 | Design a Metrics/Monitoring System | Time-series DB, aggregation, alerting, dashboards |
| 19 | Design a Map Tile Server | Tile pyramids, caching layers, CDN, auth |
| 20 | Design a Distributed Task Scheduler | Queues, workers, retries, scheduling, backpressure |
Deep Dives: 5 Detailed Designs
Design 1: URL Shortener (TinyURL)
Requirements:
- Shorten long URLs → short code (e.g., tiny.url/abc123)
- Redirect short → long URL
- Analytics (click count, geo, timestamp)
- Custom short URLs (optional)
- Expiry (optional)
Estimation:
- 100M new URLs/month, 10:1 read/write
- Write: 100M / (30 * 86400) ≈ 40 URLs/sec
- Read: 400 redirects/sec, peak: 1200/sec
- Storage: 100M * 500 bytes * 12 months ≈ 600 GB/year
High-Level Design:
Client → API Gateway → URL Service → Database
→ Cache (Redis)
API Design:
POST /api/urls {long_url, custom_alias?, expiry?} → {short_url}
GET /{short_code} → 301/302 Redirect to long_url
Database Schema:
urls:
id (PK, bigint)
short_code (unique index, varchar(7))
long_url (text)
created_at (timestamp)
expires_at (timestamp, nullable)
user_id (FK, nullable)
Key Decisions:
1. Short code generation:
- Base62 encoding of auto-increment ID (predictable but simple)
- MD5/SHA256 hash → take first 7 chars (collision possible)
- Pre-generated IDs with a counter service (distributed-safe)
2. 301 vs 302 redirect:
- 301 (permanent): browser caches the redirect, less server load, but cached clicks never reach your analytics
- 302 (temporary): every request hits server, better for analytics
3. Read optimization:
- Cache hot URLs in Redis (LRU eviction)
- Cache hit ratio likely 80%+ (Zipf distribution)
4. Scaling:
- Read replicas for DB
- Shard by short_code hash
- Redis cluster for cache
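Key decision 1 (base62-encoding an auto-increment ID) is small enough to code live, and interviewers often ask for it:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Base62-encode an auto-increment ID into a short code.
    7 characters of base62 cover 62**7 ≈ 3.5 trillion URLs."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode_base62(code: str) -> int:
    """Inverse mapping, useful for looking the row back up by primary key."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

The predictability drawback from the notes above is visible here: sequential IDs produce sequential codes, which is why production systems often add a counter offset or a pre-generated ID pool.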
Design 2: Chat System (like Slack/WhatsApp)
Requirements:
- 1:1 and group messaging
- Online/offline status
- Message history
- Read receipts
- Push notifications for offline users
Estimation:
- 50M DAU, 40 messages/day per user
- Messages: 2 billion/day ≈ 23K/sec, peak 70K/sec
- Storage: 2B * 200 bytes = 400 GB/day
High-Level Design:
Client ↔ WebSocket Gateway ↔ Chat Service → Message Queue
→ Message DB
→ Presence Service
→ Notification Service
Key Components:
1. WebSocket Gateway:
- Maintains persistent connections
- Routes messages to correct recipient connection
- Handles reconnection, heartbeats
2. Chat Service:
- Message validation, storage, fan-out
- Group message distribution
3. Presence Service:
- Track online/offline status
- Heartbeat-based detection
- Pub/sub for status updates
4. Message Storage:
- Recent messages: Redis (fast access, bounded size)
- Historical: Cassandra or PostgreSQL partitioned by conversation + time
- Append-only, sequential writes
5. Delivery:
- Online → push via WebSocket
- Offline → store + push notification (APNs/FCM)
- Retry with exponential backoff
Design 3: Task Queue / Background Job System
Directly relevant to your Intensel experience!
Requirements:
- Submit async jobs with parameters
- Priority-based execution
- Retries with backoff
- Job status tracking
- Scheduling (cron-like)
- Scalable workers
High-Level Design:
API → Job Service → Message Queue (RabbitMQ/SQS)
→ Job DB (status tracking)
→ Worker Pool → Result Store
Scheduler (cron) → Job Service
Key Decisions:
1. Queue choice:
- RabbitMQ: mature, routing, priority queues, ACK-based
- SQS: managed, at-least-once, no priority (use multiple queues)
- Kafka: if you need replay-ability and high throughput
- Redis: simple, fast, but less durable
2. Retry strategy:
- Exponential backoff: 1s, 2s, 4s, 8s, ...
- Max retries: 3-5 depending on job type
- Dead Letter Queue for permanently failed jobs
- Idempotent job handlers (retries should be safe)
3. Worker scaling:
- Horizontal: add more worker instances
- Auto-scale based on queue depth
- Heartbeat/health checks for stuck workers
- Graceful shutdown (finish current job)
4. Job status lifecycle:
PENDING → QUEUED → RUNNING → SUCCESS/FAILED/RETRY
5. Scheduling:
- Cron-like scheduler checks periodically
- Inserts due jobs into the queue
- Use DB for schedule persistence, not in-memory
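The retry strategy in decision 2 reduces to a small wrapper. A sketch: the sleep function is injectable purely so tests don't actually wait, and jitter is added because synchronized retries from many workers cause thundering-herd spikes:

```python
import random
import time

def retry_with_backoff(fn, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Exponential backoff sketch: delays of 1s, 2s, 4s, ... plus jitter.
    Assumes `fn` is idempotent, since a failed attempt may have partially run.
    In a real job system, the final raise would route the job to a DLQ."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise                                  # exhausted → dead-letter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))  # jitter
```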
Design 4: Geospatial Service (Map Tile Server)
This is YOUR project — you’ve built this!
Requirements:
- Serve raster map tiles at various zoom levels
- Authenticated access per customer
- Handle 5.3 TB of geospatial data
- Low latency, high throughput
- Caching for repeated tile requests
High-Level Design:
Client (Mapbox GL) → CDN → API Gateway (auth) → Tile Service (FastAPI)
→ Tile Cache (Redis/disk)
→ MapServer → Data Store (SQLite/PostGIS)
Key Decisions:
1. Tile pyramid structure:
- Pre-render common zoom levels (0-14)
- On-demand render for deep zooms
- Z/X/Y addressing standard
2. Caching strategy:
- Multi-layer: CDN → Redis → Disk → Re-render
- Cache key: {layer}/{z}/{x}/{y}/{style_hash}
- TTL based on data update frequency
- Cache invalidation on data updates
3. Authentication:
- JWT tokens verified at API gateway
- Per-customer access control to specific layers
- Token refresh for long map sessions
4. Scaling:
- Stateless tile service → horizontal scaling
- CDN handles 90%+ of repeated tile requests
- Background pre-rendering for hot areas
- Auto-scale workers during peak hours
How to talk about this:
"I designed and implemented a tile delivery service using FastAPI and
MapServer, serving 5.3 TB of authenticated geospatial data. I added
multi-layer caching that reduced origin hits by 90%, bringing p99
latency under 100ms for cached tiles."
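The multi-layer cache in decision 2 boils down to a lookup chain with back-fill. A sketch with dict-like layers standing in for Redis and disk, where `render` is the expensive MapServer call (all names here are hypothetical stand-ins):

```python
def get_tile(z, x, y, layers, render):
    """Multi-layer cache lookup sketch: try each cache layer in order
    (fastest first, e.g., in-memory → Redis → disk). On a hit in a slower
    layer, promote the tile into the faster layers. On a total miss,
    render at the origin and back-fill every layer."""
    key = f"{z}/{x}/{y}"
    for i, layer in enumerate(layers):
        if key in layer:
            tile = layer[key]
            for faster in layers[:i]:   # promote into faster layers
                faster[key] = tile
            return tile
    tile = render(z, x, y)              # total miss: hit the origin
    for layer in layers:
        layer[key] = tile
    return tile
```

The interview point: each layer absorbs a fraction of traffic, so the origin only sees the product of the miss rates, which is how a 90%+ CDN hit rate keeps the renderer cheap.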
Design 5: Data Pipeline (Climate Risk Platform)
Another one of YOUR systems!
Requirements:
- Ingest data from multiple external sources
- Process multi-terabyte datasets
- Near real-time insights for customers
- Handle failures gracefully
High-Level Design:
External Sources → Ingestion Service → Queue → Processing Workers (Dask)
→ PostgreSQL/PostGIS
→ API → Customer Dashboard
Key Decisions:
1. Ingestion:
- API scrapers, file downloaders, webhook receivers
- Idempotent ingestion (same data processed twice = same result)
- Raw data stored in S3 (archival + reprocessing)
2. Processing:
- Dask for distributed computation
- Chunk large datasets into manageable pieces
- Retry failed chunks independently
3. Data quality:
- Validation at ingestion
- Schema checks before processing
- Monitoring for data freshness
4. Query optimization:
- Spatial indexing (GiST indexes in PostGIS)
- Partitioning by geography or time
- Materialized views for frequent queries
- Connection pooling (PgBouncer)
Your Experience as System Design Ammo
Map your resume to system design vocabulary:
| Your Experience | System Design Concept |
|---|---|
| “Reduced latency from minutes to milliseconds” | Query optimization, indexing, caching |
| “1.8 TB, 2.3 trillion records in PostGIS” | Database scaling, spatial indexing, sharding |
| “Async workflows with queues, workers, retries” | Message queues, distributed task processing |
| “Multi-terabyte datasets, hundreds of users” | Horizontal scaling, caching, CDN |
| “5.3 TB tile service with caching” | CDN, multi-layer caching, read optimization |
| “Caching, monitoring, alerting” | Observability, reliability engineering |
| “AWS infrastructure” | Cloud architecture, auto-scaling, managed services |
| “Mentored junior engineers” | Leadership, technical communication |
Common Trade-offs to Discuss
Always frame decisions as trade-offs in interviews:
| Decision | Trade-off |
|---|---|
| SQL vs NoSQL | Consistency + queries vs Scale + flexibility |
| Cache vs No cache | Latency vs Complexity + stale data |
| Sync vs Async | Simplicity vs Throughput + resilience |
| Monolith vs Microservices | Simplicity vs Independent scaling + deployment |
| Strong vs Eventual consistency | Correctness vs Availability + latency |
| Push vs Pull | Real-time vs Resource efficiency |
| Pre-compute vs On-demand | Latency vs Storage + freshness |
| Normalize vs Denormalize | Write efficiency vs Read efficiency |
| Vertical vs Horizontal scaling | Simplicity vs Unlimited scale |
| REST vs gRPC | Simplicity + tooling vs Performance + typing |
| Batch vs Stream processing | Throughput vs Latency |
Resources
Free
- System Design Primer: https://github.com/donnemartin/system-design-primer
- System Design Handbook: https://www.systemdesignhandbook.com/guides/system-design/
- ByteByteGo Newsletter: https://blog.bytebytego.com/ (free articles)
- High Scalability Blog: http://highscalability.com/
- Martin Kleppmann’s talks: YouTube (DDIA author)
Books
- Designing Data-Intensive Applications (DDIA) — you’re reading this!
- System Design Interview Vol 1 & 2 by Alex Xu
- Understanding Distributed Systems by Roberto Vitillo
Courses
- Grokking System Design (Educative): https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers
- DesignGurus.io: https://www.designgurus.io/
Practice
- Practice with a friend — take turns being interviewer
- Record yourself explaining a design (watch for clarity)
- Time yourself — aim for about 45 minutes for the complete design, matching the framework above
My Notes
Systems I can explain deeply:
- Climate risk platform (data pipeline, processing, API)
- Global building footprints (1.8TB PostGIS, spatial indexing)
- Map tile service (5.3TB, caching, MapServer)
- Async job processing (RabbitMQ, Dask, retries)
Concepts I need to review:
-
Trade-offs I always forget:
-
Next: 03-python-deep-dive.md