02 — System Design Interview Guide

Priority: HIGH — With 5+ years of experience, this is your strongest differentiator. You’ve built real systems at scale — now learn to articulate it in interview format.


Table of Contents

  1. The System Design Interview Framework
  2. Core Concepts You Must Know
  3. Back-of-the-Envelope Estimation
  4. Top 20 System Design Problems
  5. Deep Dives: 5 Detailed Designs
  6. Your Experience as System Design Ammo
  7. Common Trade-offs to Discuss
  8. Resources

The System Design Interview Framework

Use this RESHADED framework for every problem (45-60 min):

R — Requirements (5 min)
    Functional: What should the system do?
    Non-functional: Scale, latency, availability, consistency

E — Estimation (5 min)
    Users, QPS, storage, bandwidth
    Back-of-envelope math to size the system

S — Storage Schema (5 min)
    Data model: tables, relationships, indexes
    SQL vs NoSQL decision with justification

H — High-Level Design (10 min)
    Draw the architecture: clients → LB → services → DB/cache
    Identify major components and data flow

A — API Design (5 min)
    Key endpoints: method, path, params, response
    Authentication, rate limiting, pagination

D — Detailed Design (10 min)
    Dive deep into 1-2 critical components
    This is where you show depth and trade-off reasoning

E — Evaluate (5 min)
    How does the design handle failures?
    What are the bottlenecks? How would you scale further?
    What would you change for 10x more traffic?

D — Distinctive Features (optional, 5 min)
    Monitoring, alerting, deployment strategy
    Nice-to-have features that show you think beyond the basics

Pro Tips for the Interview


Core Concepts You Must Know

1. Scalability

Vertical Scaling (Scale Up):
  - Bigger machine: more CPU, RAM, storage
  - Simpler but has hard limits
  - Good for databases (initially)

Horizontal Scaling (Scale Out):
  - More machines
  - Need: load balancing, data partitioning, stateless services
  - Better for web/app tier

2. Load Balancing

What: Distributes incoming traffic across multiple servers.

Algorithms:
  - Round Robin: simple, equal distribution
  - Weighted Round Robin: heavier instances get more traffic
  - Least Connections: sends to least busy server
  - IP Hash: consistent routing for same client
  - Consistent Hashing: minimizes data movement on scale events

Layers:
  - L4 (Transport): TCP/UDP level, faster, no content inspection
  - L7 (Application): HTTP level, content-based routing, path/header routing

Tools: AWS ALB/NLB, Nginx, HAProxy, Envoy

3. Caching

Why: Reduce latency, reduce DB load, improve throughput.

Cache Strategies:
  - Cache-aside (Lazy Loading):
      App checks cache → miss → read from DB → write to cache
      Pros: Only caches what's needed
      Cons: Cache miss penalty, potential stale data

  - Write-through:
      App writes to cache and DB simultaneously
      Pros: Cache always consistent
      Cons: Write latency, caches unused data

  - Write-behind (Write-back):
      App writes to cache → cache asynchronously writes to DB
      Pros: Fast writes
      Cons: Data loss risk if cache crashes

  - Read-through:
      Cache sits between app and DB, loads on miss automatically

Cache Eviction Policies:
  - LRU (Least Recently Used) — most common
  - LFU (Least Frequently Used)
  - TTL (Time To Live) — expiry-based

Where to cache:
  - Client-side (browser cache)
  - CDN (static assets, edge caching)
  - Application cache (Redis, Memcached)
  - Database cache (query cache, buffer pool)

Tools: Redis, Memcached, Varnish, CDN (CloudFront, Cloudflare)
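The cache-aside flow above can be sketched in a few lines of Python. This is an illustrative sketch, not production code: a plain dict stands in for Redis, and `db_query` is a hypothetical database read.

```python
import time

CACHE: dict = {}   # stands in for Redis
CACHE_TTL = 60     # seconds before an entry is considered stale

def db_query(user_id: int) -> dict:
    """Stand-in for a real database read (hypothetical helper)."""
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id: int) -> dict:
    """Cache-aside: check cache, fall back to DB on miss, populate cache."""
    entry = CACHE.get(user_id)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value          # cache hit
        del CACHE[user_id]        # expired: treat as a miss
    value = db_query(user_id)     # the cache-miss penalty is paid here
    CACHE[user_id] = (value, time.time() + CACHE_TTL)
    return value
```

Note how the stale-data risk shows up directly: between a DB write and TTL expiry, readers see the old cached value unless the writer also invalidates the key.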

4. Database Design

SQL (Relational):
  Strengths: ACID, joins, structured data, mature tooling
  Use when: Strong consistency needed, complex queries, relationships matter
  Examples: PostgreSQL, MySQL

NoSQL Types:
  - Key-Value: Redis, DynamoDB (sessions, caching)
  - Document: MongoDB (flexible schema, nested data)
  - Column-family: Cassandra (wide columns, time-series)
  - Graph: Neo4j (relationships, social networks)

Decision Framework:
  Need ACID + complex queries? → SQL
  Need horizontal scale + simple lookups? → Key-Value / Document
  Need time-series at massive scale? → Column-family
  Need relationship traversal? → Graph

5. Database Scaling

Read Replicas:
  - Write to primary, read from replicas
  - Replication lag = eventual consistency for reads
  - Good for read-heavy workloads (80%+ reads)

Sharding (Horizontal Partitioning):
  - Split data across multiple databases
  - Shard key: determines which shard holds data
  - Range-based: user_id 0-1M → shard 1 (risk: hot spots)
  - Hash-based: hash(user_id) % N selects the shard (better distribution)
  - Challenges: cross-shard queries, rebalancing, shard key choice

Vertical Partitioning:
  - Split tables by columns (e.g., user profile vs user activity)
  - Different tables on different machines

Denormalization:
  - Duplicate data to avoid joins
  - Trade storage for read performance
  - Keep in sync with triggers or application logic
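A minimal sketch of hash-based shard routing (the `shard_for` helper is illustrative, not a real library API). It also shows why the approach is brittle: change `num_shards` and most keys remap.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based sharding: stable hash of the key, modulo shard count.

    Caveat: changing num_shards remaps most keys, which is why real
    systems reach for consistent hashing or directory-based sharding.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards
```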

6. Message Queues & Async Processing

Why: Decouple services, handle spikes, enable async processing.

Components:
  Producer → Queue → Consumer

Patterns:
  - Point-to-point: one consumer per message
  - Pub/Sub: multiple consumers (topics/subscriptions)
  - Work queue: multiple workers process from same queue

Guarantees:
  - At-most-once: no retry (fast, may lose messages)
  - At-least-once: retry on failure (duplicates possible)
  - Exactly-once: hardest (deduplication + transactions)

Tools: RabbitMQ, Kafka, AWS SQS/SNS, Redis Streams

Your experience: "At Intensel, I designed async workflows with queues,
workers, retries, and scheduling for high-volume background jobs."
→ You literally built this. Talk through your RabbitMQ + Dask setup.
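The work-queue pattern (multiple workers competing for jobs on one queue) can be sketched with Python's stdlib `queue` and `threading`. This stands in for RabbitMQ + worker processes; `job * 2` is placeholder processing.

```python
import queue
import threading

def run_work_queue(jobs, num_workers=3):
    """Work-queue pattern: N workers pull from one shared queue."""
    q: queue.Queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:               # sentinel: shut this worker down
                q.task_done()
                return
            with lock:
                results.append(job * 2)   # stand-in for real processing
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)                        # producer side
    for _ in threads:
        q.put(None)                       # one sentinel per worker
    q.join()                              # wait until every job is ACKed
    for t in threads:
        t.join()
    return results
```

The `task_done()`/`join()` pair mirrors broker-style ACKs: the producer knows all messages were consumed, not just enqueued.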

7. CAP Theorem

In a distributed system, you can only guarantee 2 of 3:
  C — Consistency: every read gets the latest write
  A — Availability: every request gets a response
  P — Partition tolerance: system works despite network failures

Reality: P is non-negotiable (networks fail), so you choose between:
  CP: Consistent + Partition-tolerant (sacrifice availability)
      → HBase, MongoDB (strong consistency mode), ZooKeeper
  AP: Available + Partition-tolerant (sacrifice consistency)
      → Cassandra, DynamoDB, DNS

In practice:
  - Most systems are "mostly consistent, mostly available"
  - Different data can have different guarantees:
      Financial transactions → strong consistency
      Social media likes → eventual consistency

8. Consistent Hashing

Problem: Simple hash(key) % N breaks when N changes (add/remove server).

Solution: Consistent hashing maps both keys and servers to a ring.
  - Only K/N keys need to move when a server is added/removed
  - Virtual nodes improve distribution (each server → multiple points on ring)

Use cases:
  - Distributed caches (Redis cluster)
  - Load balancing (sticky sessions)
  - Database sharding
  - CDN routing
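A minimal consistent-hash ring with virtual nodes, sketched in Python. This is illustrative only (Redis Cluster, for instance, actually uses fixed hash slots rather than a ring).

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []    # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each server becomes `vnodes` points on the ring for even spread.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        """Walk clockwise to the first virtual node at or after hash(key)."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]
```

Removing a node only remaps the keys that landed on its virtual nodes; everything else stays put, which is the whole point versus `hash % N`.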

9. API Gateway

What: Single entry point for all client requests.

Responsibilities:
  - Routing: forward to correct service
  - Authentication/Authorization
  - Rate limiting
  - Request/response transformation
  - SSL termination
  - Logging and monitoring

Tools: Kong, AWS API Gateway, Nginx, Envoy

10. CDN (Content Delivery Network)

What: Distributed network of edge servers that cache content close to users.

Types:
  - Pull: CDN fetches from origin on first request, caches it
  - Push: You upload content to CDN proactively

Use for: Static assets (images, CSS, JS), video streaming, API responses

Tools: CloudFront, Cloudflare, Akamai, Fastly

11. Rate Limiting

Why: Protect against abuse, ensure fair usage, prevent overload.

Algorithms:
  - Token Bucket: tokens added at fixed rate, consumed per request
  - Leaky Bucket: requests processed at fixed rate
  - Fixed Window: count requests in fixed time windows
  - Sliding Window Log: timestamps of recent requests
  - Sliding Window Counter: hybrid of fixed window + sliding

Implementation:
  - At API gateway level (global)
  - Per-service (service-level)
  - Using Redis (distributed rate limiting)
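The token bucket algorithm above can be sketched in a few lines. Single-process only; a distributed version would keep the token count in Redis (typically updated atomically via a Lua script).

```python
import time

class TokenBucket:
    """Token-bucket limiter: `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)    # start full: allows an initial burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The lazy refill (compute tokens on each call instead of a background timer) is the standard trick that makes this cheap at high QPS.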

12. Replication & Consensus

Replication:
  - Single-leader: one primary, multiple replicas (PostgreSQL, MySQL)
  - Multi-leader: multiple primaries (conflict resolution needed)
  - Leaderless: any node accepts writes (Cassandra, DynamoDB)

Consensus Protocols:
  - Raft: leader election, log replication (easier to understand)
  - Paxos: classic but complex
  - ZAB: ZooKeeper's protocol

When asked about this:
  - Explain in terms of your PostgreSQL replication experience
  - Relate to your work with distributed Dask workers

Back-of-the-Envelope Estimation

Key Numbers to Memorize

Storage:
  1 KB  = 1,000 bytes        (a short text record)
  1 MB  = 1,000 KB           (a photo)
  1 GB  = 1,000 MB           (a movie)
  1 TB  = 1,000 GB           (large database)
  1 PB  = 1,000 TB           (enterprise data warehouse)

Time:
  1 ns   = L1 cache reference
  10 ns  = L2 cache reference
  100 ns = RAM access
  100 µs = SSD random read
  0.5 ms = Round trip within same datacenter
  10 ms  = HDD seek
  150 ms = Round trip across the world (e.g., US ↔ Europe ↔ US)

Traffic:
  1 day = 86,400 seconds ≈ 100,000 seconds (for estimation)
  1 million requests/day ≈ 12 requests/second
  1 billion requests/day ≈ 12,000 requests/second

Quick math:
  100M users × 10% daily active = 10M DAU
  10M DAU × 5 requests/day = 50M requests/day
  50M / 100K seconds ≈ 500 QPS
  Peak = 2-3x average = 1000-1500 QPS

Estimation Template

1. Users: How many total? Daily active?
2. Read/Write ratio: Read-heavy (100:1)? Write-heavy?
3. QPS: DAU × actions per day / 86400
4. Peak QPS: 2-3x average
5. Storage: Records × size per record × retention period
6. Bandwidth: QPS × average response size
7. Memory (cache): If we cache 20% of hot data...
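The template above can be turned into a tiny calculator (illustrative helper, not a standard tool); the numbers below follow the quick-math example.

```python
SECONDS_PER_DAY = 86_400

def estimate(total_users, dau_pct, actions_per_day,
             bytes_per_record, retention_days, peak_factor=3):
    """Back-of-the-envelope sizing following the estimation template."""
    dau = total_users * dau_pct
    requests_per_day = dau * actions_per_day
    avg_qps = requests_per_day / SECONDS_PER_DAY
    return {
        "dau": dau,
        "avg_qps": round(avg_qps),
        "peak_qps": round(avg_qps * peak_factor),
        "storage_gb": requests_per_day * bytes_per_record
                      * retention_days / 1e9,
    }

# 100M users, 10% daily active, 5 actions/day, 500-byte records, 1 year
sizing = estimate(100_000_000, 0.10, 5, 500, 365)
```

With the exact 86,400-second day this gives ~579 average QPS; the in-interview "100K seconds" shortcut rounds that to ~500, which is close enough for sizing.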

Top 20 System Design Problems

Tier 1: Must Know (very commonly asked)

  1. Design a URL Shortener: Hashing, base62 encoding, redirection, analytics
  2. Design Twitter/News Feed: Fan-out, timeline generation, caching, pub/sub
  3. Design a Chat System (WhatsApp/Slack): WebSockets, message queues, presence, delivery guarantees
  4. Design a Rate Limiter: Token bucket, sliding window, distributed rate limiting
  5. Design a Notification System: Push/pull, prioritization, deduplication, multiple channels

Tier 2: Frequently Asked

  6. Design an API Rate Limiter: Token bucket, Redis, distributed counting
  7. Design a Key-Value Store: Consistent hashing, replication, conflict resolution
  8. Design a Web Crawler: BFS, politeness, deduplication, distributed crawling
  9. Design YouTube/Netflix: Video encoding, CDN, adaptive bitrate, recommendations
  10. Design Uber/Lyft: Geospatial indexing, matching, real-time tracking

Tier 3: Good to Know

  11. Design Google Search: Inverted index, ranking, crawling, caching
  12. Design Dropbox/Google Drive: File chunking, sync, conflict resolution, dedup
  13. Design Instagram: Image storage, news feed, CDN, caching
  14. Design a Task Queue: Worker pools, retries, dead letter queue, priorities
  15. Design an E-commerce System: Inventory management, cart, checkout, payments

Tier 4: Specialized (great for your profile)

  16. Design a Geospatial Service: PostGIS, spatial indexing, tile serving, caching
  17. Design a Data Pipeline: ETL, batch vs stream, exactly-once, schema evolution
  18. Design a Metrics/Monitoring System: Time-series DB, aggregation, alerting, dashboards
  19. Design a Map Tile Server: Tile pyramids, caching layers, CDN, auth
  20. Design a Distributed Task Scheduler: Queues, workers, retries, scheduling, backpressure

Deep Dives: 5 Detailed Designs

Design 1: URL Shortener (TinyURL)

Requirements:
  - Shorten long URLs → short code (e.g., tiny.url/abc123)
  - Redirect short → long URL
  - Analytics (click count, geo, timestamp)
  - Custom short URLs (optional)
  - Expiry (optional)

Estimation:
  - 100M new URLs/month, 10:1 read/write
  - Write: 100M / (30 * 86400) ≈ 40 URLs/sec
  - Read: 400 redirects/sec, peak: 1200/sec
  - Storage: 100M * 500 bytes * 12 months ≈ 600 GB/year

High-Level Design:
  Client → API Gateway → URL Service → Database
                                      → Cache (Redis)

API Design:
  POST /api/urls  {long_url, custom_alias?, expiry?}  → {short_url}
  GET  /{shortCode}  → 301/302 Redirect to long_url

Database Schema:
  urls:
    id (PK, bigint)
    short_code (unique index, varchar(7))
    long_url (text)
    created_at (timestamp)
    expires_at (timestamp, nullable)
    user_id (FK, nullable)

Key Decisions:
  1. Short code generation:
     - Base62 encoding of auto-increment ID (predictable but simple)
     - MD5/SHA256 hash → take first 7 chars (collision possible)
     - Pre-generated IDs with a counter service (distributed-safe)

  2. 301 vs 302 redirect:
     - 301 (permanent): browsers cache the redirect → less server load, but cached hits never reach your analytics
     - 302 (temporary): every request hits the server → more load, but full click analytics

  3. Read optimization:
     - Cache hot URLs in Redis (LRU eviction)
     - Cache hit ratio likely 80%+ (Zipf distribution)

  4. Scaling:
     - Read replicas for DB
     - Shard by short_code hash
     - Redis cluster for cache
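Base62 encoding of an auto-increment ID (the first generation option above) is a few lines of Python; 7 characters cover 62^7 ≈ 3.5 trillion IDs.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe characters
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n: int) -> str:
    """Encode a non-negative auto-increment ID as a base62 short code."""
    if n == 0:
        return BASE62[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(BASE62[rem])
    return "".join(reversed(out))

def decode_base62(code: str) -> int:
    """Inverse mapping: short code back to the numeric ID."""
    n = 0
    for ch in code:
        n = n * 62 + BASE62.index(ch)
    return n
```

The predictability caveat from the list applies: sequential IDs produce guessable codes, so production systems often add an offset or shuffle step.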

Design 2: Chat System (like Slack/WhatsApp)

Requirements:
  - 1:1 and group messaging
  - Online/offline status
  - Message history
  - Read receipts
  - Push notifications for offline users

Estimation:
  - 50M DAU, 40 messages/day per user
  - Messages: 2 billion/day ≈ 23K/sec, peak 70K/sec
  - Storage: 2B * 200 bytes = 400 GB/day

High-Level Design:
  Client ↔ WebSocket Gateway ↔ Chat Service → Message Queue
                                              → Message DB
                                              → Presence Service
                                              → Notification Service

Key Components:
  1. WebSocket Gateway:
     - Maintains persistent connections
     - Routes messages to correct recipient connection
     - Handles reconnection, heartbeats

  2. Chat Service:
     - Message validation, storage, fan-out
     - Group message distribution

  3. Presence Service:
     - Track online/offline status
     - Heartbeat-based detection
     - Pub/sub for status updates

  4. Message Storage:
     - Recent messages: Redis (fast access, bounded size)
     - Historical: Cassandra or PostgreSQL partitioned by conversation + time
     - Append-only, sequential writes

  5. Delivery:
     - Online → push via WebSocket
     - Offline → store + push notification (APNs/FCM)
     - Retry with exponential backoff

Design 3: Task Queue / Background Job System

Directly relevant to your Intensel experience!

Requirements:
  - Submit async jobs with parameters
  - Priority-based execution
  - Retries with backoff
  - Job status tracking
  - Scheduling (cron-like)
  - Scalable workers

High-Level Design:
  API → Job Service → Message Queue (RabbitMQ/SQS)
                    → Job DB (status tracking)
                    → Worker Pool → Result Store

  Scheduler (cron) → Job Service

Key Decisions:
  1. Queue choice:
     - RabbitMQ: mature, routing, priority queues, ACK-based
     - SQS: managed, at-least-once, no priority (use multiple queues)
     - Kafka: if you need replay-ability and high throughput
     - Redis: simple, fast, but less durable

  2. Retry strategy:
     - Exponential backoff: 1s, 2s, 4s, 8s, ...
     - Max retries: 3-5 depending on job type
     - Dead Letter Queue for permanently failed jobs
     - Idempotent job handlers (retries should be safe)

  3. Worker scaling:
     - Horizontal: add more worker instances
     - Auto-scale based on queue depth
     - Heartbeat/health checks for stuck workers
     - Graceful shutdown (finish current job)

  4. Job status lifecycle:
     PENDING → QUEUED → RUNNING → SUCCESS/FAILED/RETRY

  5. Scheduling:
     - Cron-like scheduler checks periodically
     - Inserts due jobs into the queue
     - Use DB for schedule persistence, not in-memory
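The retry strategy above (bounded retries, exponential backoff, dead letter queue) can be sketched as follows. This is an illustrative helper, not a RabbitMQ or SQS API; `dead_letter` is just a list standing in for a real DLQ.

```python
import time

def run_with_retries(job, max_retries=4, base_delay=1.0, dead_letter=None):
    """Run a job, retrying with exponential backoff on failure.

    Exhausted jobs are routed to `dead_letter` (a stand-in DLQ) and the
    final exception is re-raised so the caller can record FAILED status.
    """
    for attempt in range(max_retries + 1):
        try:
            return job()
        except Exception:
            if attempt == max_retries:
                if dead_letter is not None:
                    dead_letter.append(job)          # permanently failed
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, 8s ...
```

This only stays safe if `job` is idempotent, which is exactly why the retry-strategy list insists on idempotent handlers.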

Design 4: Geospatial Service (Map Tile Server)

This is YOUR project — you’ve built this!

Requirements:
  - Serve raster map tiles at various zoom levels
  - Authenticated access per customer
  - Handle 5.3 TB of geospatial data
  - Low latency, high throughput
  - Caching for repeated tile requests

High-Level Design:
  Client (Mapbox GL) → CDN → API Gateway (auth) → Tile Service (FastAPI)
                                                  → Tile Cache (Redis/disk)
                                                  → MapServer → Data Store (SQLite/PostGIS)

Key Decisions:
  1. Tile pyramid structure:
     - Pre-render common zoom levels (0-14)
     - On-demand render for deep zooms
     - Z/X/Y addressing standard

  2. Caching strategy:
     - Multi-layer: CDN → Redis → Disk → Re-render
     - Cache key: {layer}/{z}/{x}/{y}/{style_hash}
     - TTL based on data update frequency
     - Cache invalidation on data updates

  3. Authentication:
     - JWT tokens verified at API gateway
     - Per-customer access control to specific layers
     - Token refresh for long map sessions

  4. Scaling:
     - Stateless tile service → horizontal scaling
     - CDN handles 90%+ of repeated tile requests
     - Background pre-rendering for hot areas
     - Auto-scale workers during peak hours

How to talk about this:
  "I designed and implemented a tile delivery service using FastAPI and
  MapServer, serving 5.3 TB of authenticated geospatial data. I added
  multi-layer caching that reduced origin hits by 90%, bringing p99
  latency under 100ms for cached tiles."

Design 5: Data Pipeline (Climate Risk Platform)

Another one of YOUR systems!

Requirements:
  - Ingest data from multiple external sources
  - Process multi-terabyte datasets
  - Near real-time insights for customers
  - Handle failures gracefully

High-Level Design:
  External Sources → Ingestion Service → Queue → Processing Workers (Dask)
                                                → PostgreSQL/PostGIS
                                                → API → Customer Dashboard

Key Decisions:
  1. Ingestion:
     - API scrapers, file downloaders, webhook receivers
     - Idempotent ingestion (same data processed twice = same result)
     - Raw data stored in S3 (archival + reprocessing)

  2. Processing:
     - Dask for distributed computation
     - Chunk large datasets into manageable pieces
     - Retry failed chunks independently

  3. Data quality:
     - Validation at ingestion
     - Schema checks before processing
     - Monitoring for data freshness

  4. Query optimization:
     - Spatial indexing (GiST indexes in PostGIS)
     - Partitioning by geography or time
     - Materialized views for frequent queries
     - Connection pooling (PgBouncer)

Your Experience as System Design Ammo

Map your resume to system design vocabulary:

  "Reduced latency from minutes to milliseconds" → Query optimization, indexing, caching
  "1.8 TB, 2.3 trillion records in PostGIS" → Database scaling, spatial indexing, sharding
  "Async workflows with queues, workers, retries" → Message queues, distributed task processing
  "Multi-terabyte datasets, hundreds of users" → Horizontal scaling, caching, CDN
  "5.3 TB tile service with caching" → CDN, multi-layer caching, read optimization
  "Caching, monitoring, alerting" → Observability, reliability engineering
  "AWS infrastructure" → Cloud architecture, auto-scaling, managed services
  "Mentored junior engineers" → Leadership, technical communication

Common Trade-offs to Discuss

Always frame decisions as trade-offs in interviews:

  SQL vs NoSQL: Consistency + queries vs Scale + flexibility
  Cache vs No cache: Latency vs Complexity + stale data
  Sync vs Async: Simplicity vs Throughput + resilience
  Monolith vs Microservices: Simplicity vs Independent scaling + deployment
  Strong vs Eventual consistency: Correctness vs Availability + latency
  Push vs Pull: Real-time vs Resource efficiency
  Pre-compute vs On-demand: Latency vs Storage + freshness
  Normalize vs Denormalize: Write efficiency vs Read efficiency
  Vertical vs Horizontal scaling: Simplicity vs Unlimited scale
  REST vs gRPC: Simplicity + tooling vs Performance + typing
  Batch vs Stream processing: Throughput vs Latency

Resources

Free

Books

Courses

Practice


My Notes

Systems I can explain deeply:
- Climate risk platform (data pipeline, processing, API)
- Global building footprints (1.8TB PostGIS, spatial indexing)
- Map tile service (5.3TB, caching, MapServer)
- Async job processing (RabbitMQ, Dask, retries)

Concepts I need to review:
-

Trade-offs I always forget:
-

Next: 03-python-deep-dive.md