Fundamentals, Scalability, Databases, Caching, Load Balancing, Message Queues, Security, Case Studies — system design interview mastery.
🏗️
Fundamentals
CORE
concepts/cap-theorem.md
# ── CAP Theorem ──
A distributed system can guarantee at most 2 of 3:
C - Consistency: Every read returns the most recent write
A - Availability: Every request receives a response (non-error)
P - Partition Tolerance: System operates despite network splits
In practice, you MUST choose P (networks fail), so choose:
┌──────────────────┬───────────────┬───────────────────┐
│ CP System        │ AP System     │ CA System         │
├──────────────────┼───────────────┼───────────────────┤
│ HBase            │ DynamoDB      │ Single-node RDBMS │
│ MongoDB (config) │ Cassandra     │ (not distributed) │
│ Redis Cluster    │ CouchDB       │                   │
│ ZooKeeper        │ Eureka        │                   │
└──────────────────┴───────────────┴───────────────────┘
# ── BASE vs ACID ──
ACID (relational DBs):
- Atomicity: Transaction all-or-nothing
- Consistency: Valid state to valid state
- Isolation: Concurrent txns don't interfere
- Durability: Committed data survives crashes
BASE (NoSQL / distributed):
- Basically Available: The system responds to every request, though possibly with stale data
- Soft State: Data may change without input (replication)
- Eventually Consistent: All replicas converge over time
Always start with requirements and estimations. Ask clarifying questions: read-heavy or write-heavy? Expected scale? Consistency requirements? These determine your architecture choices.
📈
Scalability
GROWTH
architecture/scaling.md
# ── Vertical vs Horizontal Scaling ──
VERTICAL (Scale Up):
+ Simpler (no code changes)
- Hardware limits (max RAM, CPU)
- Single point of failure
- Expensive at high end
HORIZONTAL (Scale Out):
+ Nearly unlimited capacity
+ Fault tolerance (redundancy)
- More complex (consistency, coordination)
- Requires stateless services
# ── Load Balancing ──
Client → Load Balancer → [Server 1]
                       → [Server 2]
                       → [Server 3]
Algorithms:
- Round Robin: Sequential rotation
- Weighted Round Robin: By server capacity
- Least Connections: Route to least busy
- IP Hash: Sticky sessions by client IP
- Least Latency: Route to fastest response
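The two most common algorithms above can be sketched in a few lines of plain JavaScript. This is an illustrative in-memory model, not how a real balancer is built — NGINX and HAProxy implement these in C:

```javascript
// Round robin: rotate through servers in order
function makeRoundRobin(servers) {
  let i = 0;
  return () => servers[i++ % servers.length];
}

// Least connections: pick the server with the fewest active connections
// (server objects and the activeConnections field are illustrative)
function leastConnections(servers) {
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best);
}

const next = makeRoundRobin(['10.0.1.1', '10.0.1.2', '10.0.1.3']);
next(); // '10.0.1.1'
next(); // '10.0.1.2'

leastConnections([
  { host: '10.0.1.1', activeConnections: 12 },
  { host: '10.0.1.2', activeConnections: 3 },
  { host: '10.0.1.3', activeConnections: 7 },
]); // → the 10.0.1.2 entry
```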
# ── Scaling the Database ──
READ REPLICAS:
Master (write) → Replica 1 (read)
               → Replica 2 (read)
               → Replica 3 (read)
SHARDING (Horizontal Partitioning):
Users 1-1M  → Shard 1 (DB)
Users 1M-2M → Shard 2 (DB)
Users 2M-3M → Shard 3 (DB)
Sharding Strategies:
- Hash-based: hash(user_id) % num_shards
- Range-based: A-M → shard 1, N-Z → shard 2
- Directory-based: Lookup service maps keys to shards
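Hash-based routing can be sketched in a few lines. The djb2 string hash here is purely illustrative — production systems use hashes like MurmurHash or xxHash, and often consistent hashing, because plain modulo remaps almost every key when the shard count changes:

```javascript
// Simple string hash (djb2 variant) — illustrative stand-in only
function hashKey(key) {
  let h = 5381;
  for (const ch of String(key)) h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0;
  return h;
}

// Hash-based sharding: hash(key) % numShards
function shardFor(key, numShards) {
  return hashKey(key) % numShards;
}

shardFor('user:42', 4); // same key → same shard, every time
// Resharding problem: shardFor('user:42', 5) likely lands elsewhere —
// changing numShards remaps most keys (motivation for consistent hashing)
```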
Monolith vs Microservices

| Aspect | Monolith | Microservices |
|---|---|---|
| Deployment | One unit | Independent services |
| Scaling | Scale whole app | Scale per service |
| Complexity | Low (initially) | High (ops) |
| Tech Stack | Single | Polyglot |
| Team Size | Small | Large (per service) |
| Failure Impact | Full outage | Partial |
| Communication | In-process | Network (API/msg) |
| Best For | MVP, small teams | Large orgs, scale |
API Gateway

| Feature | Description |
|---|---|
| Routing | Route requests to services |
| Auth | Centralize authentication |
| Rate Limiting | Per-client/per-service limits |
| SSL Termination | Handle TLS at gateway |
| Load Balancing | Distribute across instances |
| Caching | Cache frequent responses |
| Logging | Central request logging |
| Circuit Breaker | Fail fast on downstream errors |
| Examples | Kong, AWS API GW, NGINX, Envoy |
architecture/microservices.yaml
# ── Microservices Architecture (High-level) ──
# Client → CDN → API Gateway → Auth Service
#                            → User Service
#                            → Order Service
#                            → Payment Service
#                            → Notification Service
#
# Communication:
# - Sync: REST/gRPC between services
# - Async: Message Queue (Kafka, RabbitMQ) for events
#
# Data: Database per service (no shared DB)
# Discovery: Service registry (Consul, Eureka) or DNS
# Config: Centralized (Consul, Spring Cloud Config)
# Monitoring: Metrics + logs + traces (Prometheus, ELK, Jaeger)
⚠️
Start monolith, extract when needed. Only move to microservices when you have clear team boundaries, scaling needs per service, or independent deployment requirements. Premature microservices add massive operational overhead.
🗄️
Databases
STORAGE
SQL vs NoSQL

| Aspect | SQL | NoSQL |
|---|---|---|
| Schema | Fixed (structured) | Flexible (dynamic) |
| Query | SQL language | API-specific |
| Scaling | Vertical (sharding hard) | Horizontal (built-in) |
| Joins | Native joins | Usually manual |
| ACID | Full ACID | Eventual consistency |
| Best For | Complex queries, relations | Simple access, huge scale |
| Examples | PostgreSQL, MySQL | MongoDB, Cassandra, DynamoDB |
NoSQL Types

| Type | Examples | Use Case |
|---|---|---|
| Document | MongoDB, CouchDB | Content management, user profiles |
| Key-Value | Redis, DynamoDB | Sessions, caching, shopping cart |
| Column | Cassandra, HBase | Time-series, analytics, IoT |
| Graph | Neo4j, Amazon Neptune | Social networks, recommendations |
| Search | Elasticsearch | Full-text search, logging |
database/schema-design.sql
-- ── SQL: Denormalization for Read Performance ──
-- Normalized (write-optimized):
-- users(id, name, email)
-- posts(id, user_id, title, body)
-- comments(id, post_id, user_id, text)
-- Denormalized (read-optimized, for feed):
CREATE TABLE feed (
  id              BIGINT PRIMARY KEY,
  user_id         BIGINT,
  username        VARCHAR(50),    -- denormalized from users
  user_avatar_url TEXT,           -- denormalized from users
  title           VARCHAR(255),
  body            TEXT,
  like_count      INT DEFAULT 0,  -- pre-computed counter
  comment_count   INT DEFAULT 0,
  created_at      TIMESTAMP,
  INDEX idx_user_id (user_id),
  INDEX idx_created_at (created_at)
);
-- ── Indexing Strategy ──
-- Single column
CREATE INDEX idx_email ON users(email);
-- Composite (order matters!)
CREATE INDEX idx_user_created ON posts(user_id, created_at DESC);
-- Covering index (all columns in index)
CREATE INDEX idx_covering ON orders(user_id, status, total_amount);
-- Partial index
CREATE INDEX idx_active ON users(email) WHERE is_active = true;
-- Unique index
CREATE UNIQUE INDEX idx_unique_email ON users(email);
Choose your database based on the query pattern, not just the data. If you need complex joins and ACID, use SQL. If you need horizontal scale and flexible schema, use NoSQL. Many systems use both: SQL for transactions, NoSQL for caching and large-scale reads.
⚡
Caching
PERFORMANCE
caching/strategies.md
# ── Cache Strategies ──
CACHE-ASIDE (Lazy Loading):
1. App checks cache
2. Cache miss → fetch from DB → store in cache → return
3. Cache hit → return from cache
+ Simple; data is only as stale as the TTL allows
- First request is slow (cold start)
READ-THROUGH:
1. App asks cache
2. Cache miss → cache fetches from DB → stores → returns
+ App doesn't know about the DB
- Data might be stale
WRITE-THROUGH:
1. App writes to cache
2. Cache writes to DB (synchronously)
+ Data always consistent
- Slower writes
WRITE-BEHIND (Write-Back):
1. App writes to cache (fast return)
2. Cache writes to DB asynchronously
+ Very fast writes
- Data loss on cache failure
WRITE-AROUND:
1. App writes to DB directly (cache is not updated)
2. Next read causes cache miss → refresh
+ Keeps rarely-read data out of the cache
- Reads are slower until cache refreshes
caching/redis-examples.js
// ── Redis Cache Implementation (node-redis v4 API) ──
const { createClient } = require('redis');
const client = createClient({ url: 'redis://localhost:6379' });
// await client.connect(); // connect once at app startup

async function getUser(userId) {
  // 1. Check cache
  const cached = await client.get(`user:${userId}`);
  if (cached) return JSON.parse(cached);
  // 2. Cache miss → fetch from DB
  const user = await db.users.findById(userId);
  // 3. Store in cache with TTL (5 min)
  await client.setEx(`user:${userId}`, 300, JSON.stringify(user));
  return user;
}

// ── Cache Invalidation Strategies ──
// Time-based (TTL)
await client.setEx('key', 3600, value); // expire in 1 hour

// Event-based (write invalidation)
async function updateUser(userId, data) {
  await db.users.update(userId, data);
  await client.del(`user:${userId}`); // invalidate cache
}

// Version-based (bump the key version instead of deleting)
await client.set(`user:${userId}:v2`, JSON.stringify(user));

// ── Common Cache Patterns ──
// Rate limiter (fixed window)
async function rateLimit(userId, limit = 100, window = 60) {
  const key = `rate:${userId}`;
  const current = await client.incr(key);
  if (current === 1) await client.expire(key, window);
  return current <= limit;
}

// Distributed lock (SET with NX + expiry)
async function acquireLock(key, ttl = 10) {
  const result = await client.set(key, 'locked', { PX: ttl * 1000, NX: true });
  return result === 'OK';
}
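The rate limiter above uses a fixed window, which allows a burst at window boundaries. A sliding window avoids that by tracking individual request timestamps. The core algorithm is shown here in memory; with Redis you would typically keep the timestamps in a sorted set per user (ZADD, then ZREMRANGEBYSCORE to evict old entries, then ZCARD to count). Names and parameters are illustrative:

```javascript
// Sliding-window rate limiter (in-memory sketch of the algorithm)
const hits = new Map(); // userId → array of request timestamps (ms)

function allowRequest(userId, limit, windowMs, now = Date.now()) {
  // Drop timestamps that have slid out of the window
  const times = (hits.get(userId) || []).filter(t => t > now - windowMs);
  if (times.length >= limit) {
    hits.set(userId, times);
    return false; // over the limit for this window
  }
  times.push(now);
  hits.set(userId, times);
  return true;
}
```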
Cache Placement

| Level | Speed | Example |
|---|---|---|
| Browser cache | Fastest | Cache-Control headers |
| CDN cache | Very fast | CloudFront, Cloudflare |
| Load balancer | Fast | Nginx proxy_cache |
| Application cache | Fast | Redis, Memcached |
| Database cache | Moderate | InnoDB buffer pool (MySQL query cache was removed in 8.0) |
| Disk cache | Slow | Page cache, OS buffer |
Redis vs Memcached

| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, lists, sets, hashes, zsets | Strings only |
| Persistence | RDB + AOF | No (memory only) |
| Replication | Master-replica | No |
| Clustering | Redis Cluster | Client-side sharding |
| Pub/Sub | Yes | No |
| Lua scripting | Yes | No |
| Max memory | TB+ | GB range |
| Use case | General purpose caching | Simple caching |
🚫
Cache invalidation is one of the hardest problems in CS. Always set a TTL as a safety net. Use event-based invalidation for write-heavy data. Cache only what's frequently accessed and expensive to compute.
⚖️
Load Balancing & CDN
DISTRIBUTION
architecture/load-balancer.conf
# ── Nginx Load Balancer Configuration ──
upstream backend_servers {
    least_conn;                       # least connections algorithm
    server 10.0.1.1:8080 weight=3;    # higher weight = more traffic
    server 10.0.1.2:8080 weight=2;
    server 10.0.1.3:8080 backup;      # only used when others fail
    # Health check (NGINX Plus, commercial version)
    # health_check interval=10s fails=3 passes=2;
}

# Rate limiting (limit_req_zone must be declared in the http {} context)
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    listen 80;
    server_name example.com;

    location /api/ {
        limit_req zone=api burst=20 nodelay;
        proxy_pass http://backend_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Timeout settings
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
    }

    # Static files (served directly, no proxy)
    location /static/ {
        alias /var/www/static/;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}
Load Balancer Types

| Type | Layer | Examples | Use Case |
|---|---|---|---|
| L4 (Transport) | TCP/UDP | LVS, HAProxy | Raw performance |
| L7 (Application) | HTTP | Nginx, ALB, Envoy | URL routing, headers |
| DNS | DNS resolution | Route 53, Cloudflare | Geographic routing |
| Global | Any | Cloudflare, Akamai | Multi-region |
CDN (Content Delivery Network)

| Feature | Description |
|---|---|
| Edge caching | Cache content at POPs worldwide |
| Static assets | JS, CSS, images, fonts |
| DDoS protection | Absorb and filter attacks |
| SSL termination | TLS at edge, reduce server load |
| Origin shield | Reduce origin requests |
| Providers | Cloudflare, AWS CloudFront, Fastly |
💡
Use CDN for ALL static assets (images, JS, CSS, fonts). This reduces origin server load by 60-80% and dramatically improves latency for global users. Fingerprint asset filenames with content hashes and serve them with long-lived, immutable Cache-Control headers.
📨
Message Queues
MESSAGING
# ── Messaging Patterns ──
PUB/SUB:
Producer → [Topic] → Consumer A (notifications)
                   → Consumer B (analytics)
                   → Consumer C (audit log)
# Each subscriber receives EVERY message

COMPETING CONSUMERS:
Producer → [Queue] → Consumer A (worker 1) → DB
                   → Consumer B (worker 2) → DB
                   → Consumer C (worker 3) → DB
# Each message goes to ONE consumer

REQUEST-REPLY:
Client → [Request Queue] → Worker → [Reply Queue] → Client
# Use correlation_id to match request/response

DEAD LETTER QUEUE:
Main Queue → Consumer (fails repeatedly) → DLQ → Alert + manual processing
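The request-reply correlation step can be sketched with in-memory callbacks standing in for the two queues (with RabbitMQ you would use the `reply_to` and `correlation_id` message properties instead; the worker and helper names here are illustrative):

```javascript
// Request-reply matched by correlation id (in-memory sketch)
const pending = new Map(); // correlationId → resolve callback

function sendRequest(worker, payload) {
  const correlationId = Math.random().toString(36).slice(2);
  return new Promise(resolve => {
    pending.set(correlationId, resolve);
    // "publish" to the request queue; second arg is the reply-queue consumer
    worker({ correlationId, payload }, reply => {
      // match the reply back to its original request
      const resolveFn = pending.get(reply.correlationId);
      pending.delete(reply.correlationId);
      resolveFn(reply.result);
    });
  });
}

// Example worker: squares numbers and echoes the correlation id
const squareWorker = (msg, publishReply) =>
  publishReply({ correlationId: msg.correlationId, result: msg.payload ** 2 });

// sendRequest(squareWorker, 7) resolves to 49
```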
💡
Use Kafka for event streaming (order events, user activity logs, real-time analytics). Use RabbitMQ/SQS for task queues (email, PDF generation, image processing). Kafka preserves ordering within a partition and supports replay.
🔒
Security
SECURITY
security/auth-flow.md
# ── Authentication Flow (JWT) ──
1. Client → POST /login (username + password)
2. Server validates credentials
3. Server returns JWT (access + refresh tokens)
4. Client stores tokens (httpOnly cookie or secure storage)
5. Client sends JWT in Authorization: Bearer <token>
6. Server validates JWT signature
7. Return protected resource
JWT Structure:
Header: {"alg": "HS256", "typ": "JWT"}
Payload: {"sub": "user_123", "role": "admin", "exp": 1700000000}
Signature: HMAC-SHA256(base64(header) + "." + base64(payload), secret)
# ── OAuth 2.0 Flow (Authorization Code) ──
1. Client redirects to: Auth Server /authorize?response_type=code
2. User authenticates & authorizes
3. Auth Server redirects with: ?code=AUTH_CODE
4. Client exchanges code: POST /token (code → access_token + refresh_token)
5. Client uses access_token to call APIs
6. Refresh token used when access_token expires
Security Best Practices

| Practice | Description |
|---|---|
| HTTPS everywhere | TLS 1.3, HSTS headers |
| Password hashing | bcrypt/argon2 (never MD5/SHA1) |
| JWT | Short expiry, httpOnly cookies, RS256 |
| Rate limiting | Prevent brute force attacks |
| Input validation | Whitelist, parameterized queries |
| CORS | Restrict origins, not * wildcard |
| Secrets | Env vars, vault, never in code |
| Dependency scanning | npm audit, Snyk |
| Least privilege | Minimal permissions per service |
| Logging | Audit logs, no sensitive data in logs |
Security Headers

| Header | Purpose |
|---|---|
| Content-Security-Policy | Prevent XSS, restrict sources |
| X-Frame-Options | Prevent clickjacking |
| X-Content-Type-Options | Prevent MIME sniffing |
| Strict-Transport-Security | Force HTTPS |
| X-XSS-Protection | XSS filter (legacy) |
| Access-Control-Allow-Origin | CORS policy |
| Referrer-Policy | Control referrer info |
📖
Case Studies
DESIGN
case-studies/twitter-design.md
# ── Design Twitter / News Feed ──
Requirements:
- Post tweets (500M/day, 5,800/sec)
- Home feed (fan-out to followers)
- View timeline (chronological)
- Search tweets
Estimation:
- 300M users, 100M DAU
- Average 200 followers per user
- Tweet: 200 bytes avg
- Storage: 500M * 200B * 365 ≈ 36 TB/year
Architecture:
┌─────────┐    ┌──────────┐    ┌─────────────┐
│ Client  │───→│  API GW  │───→│  Tweet Svc  │
└─────────┘    └──────────┘    └──────┬──────┘
                                      │
                      ┌───────────────┘
                      ↓
        ┌──────────────┐    ┌──────────────┐
        │  Write Path  │    │  Read Path   │
        │  Tweet → DB  │    │  Feed Cache  │
        │ Fan-out Queue│    │   (Redis)    │
        └──────────────┘    └──────────────┘
Fan-out Strategies:
1. Fan-out on WRITE: Push tweet to all followers' feeds
   + Read is fast (O(1))
   - Write is slow for celebrities (millions of followers)
2. Fan-out on READ: Pull tweets from followed users
   + Write is fast
   - Read is slow (need to merge many feeds)
3. Hybrid: Fan-out on write for normal users,
   fan-out on read for celebrities (>1M followers)
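The hybrid decision reduces to a threshold check at publish time (a sketch; the 1M cutoff follows the strategy above, and the author object shape is illustrative):

```javascript
// Hybrid fan-out: choose a strategy per author at tweet-publish time
const CELEBRITY_THRESHOLD = 1_000_000;

function fanoutStrategy(author) {
  // Normal users: push the tweet into each follower's cached feed on write.
  // Celebrities: skip the push; followers pull and merge their tweets on read.
  return author.followerCount > CELEBRITY_THRESHOLD ? 'on-read' : 'on-write';
}

// On read, a follower's home feed is their precomputed feed cache
// merged with recent tweets pulled from any followed celebrities.
```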
Components:
- Tweet Service: Create/store tweets (PostgreSQL + Redis cache)
- Fan-out Service: Distribute tweets to follower feeds (Kafka)
- Feed Service: Compose home feed (Redis sorted sets)
- User Service: Profile, followers list
- Search Service: Elasticsearch for full-text search
- Notification Service: Email/push on events
case-studies/url-shortener.md
# ── Design URL Shortener (tinyurl.com) ──
Requirements:
- Shorten long URLs
- Redirect short → long
- Custom aliases (optional)
- Analytics (click tracking)
- High availability
Estimation:
- 100M URLs stored
- 100:1 read:write ratio
- 500K new URLs/day
- 50M redirects/day
Key Generation:
1. Counter + Base62 encoding (0-9, a-z, A-Z)
   ID 1234567 → "5BAN" (base62, digit order 0-9A-Za-z)
   + Short, sortable
   - Single point of failure (counter service)
2. Hash (MD5/SHA-256) → first 7 chars
   + Distributed generation
   - Collisions possible (needs check-and-retry)
3. Pre-generated random strings
   + No coordination needed at request time
   - Wastes some IDs
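Strategy 1 is a straight base conversion. A minimal sketch, using the 0-9A-Za-z digit ordering (other orderings of the 62 characters are equally valid and simply produce different strings):

```javascript
const ALPHABET =
  '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Convert an auto-increment ID into a short base62 code
function toBase62(id) {
  if (id === 0) return '0';
  let code = '';
  while (id > 0) {
    code = ALPHABET[id % 62] + code; // least-significant digit first
    id = Math.floor(id / 62);
  }
  return code;
}

toBase62(1234567); // '5BAN'
```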
Architecture:
┌────────┐    ┌──────────┐    ┌──────────┐
│ Client │──→│  API GW  │──→│ URL Svc  │
└────────┘    └──────────┘    └────┬─────┘
                                   │
                     ┌─────────────┤
                     ↓             ↓
               ┌──────────┐  ┌──────────┐
               │  Cache   │  │ Database │
               │ (Redis)  │  │  (SQL)   │
               └──────────┘  └──────────┘
Database Schema:
urls:
  id         BIGINT PRIMARY KEY AUTO_INCREMENT,
  short_code VARCHAR(7) UNIQUE,
  long_url   TEXT NOT NULL,
  user_id    BIGINT,
  created_at TIMESTAMP,
  expires_at TIMESTAMP NULL
clicks:
  id         BIGINT PRIMARY KEY AUTO_INCREMENT,
  short_code VARCHAR(7),
  ip_address VARCHAR(45),
  user_agent TEXT,
  referrer   TEXT,
  clicked_at TIMESTAMP
Cache: short_code → long_url (Redis, TTL: 24h)
Analytics: Click events → Kafka → Analytics DB
💡
System design interviews typically cover: URL shortener, pastebin, Twitter/news feed, chat system, Google Maps, Netflix, Uber, rate limiter, key-value store, web crawler, notification system. Practice 5-6 of these thoroughly.
🎯
Interview Q&A
PREP
Q: Explain the CAP theorem.
A: In a distributed system, you can guarantee only 2 of 3: Consistency, Availability, Partition Tolerance. Since network partitions are inevitable, you choose between CP (consistent but unavailable during partitions — HBase, ZooKeeper) or AP (available but may return stale data — DynamoDB, Cassandra).

Q: SQL vs NoSQL — when to use which?
A: SQL for structured data with relationships, complex queries, and ACID requirements (financial, inventory). NoSQL for massive scale, flexible schema, simple access patterns, and high write throughput (social feeds, IoT, session storage). Many systems use both: SQL for transactions, NoSQL for caching and analytics.

Q: How does a load balancer work?
A: A load balancer distributes incoming traffic across multiple servers using algorithms (round-robin, least connections, IP hash). L4 operates at TCP/UDP level (fast). L7 operates at HTTP level (can route by URL, headers). Health checks remove unhealthy servers. It also handles SSL termination and rate limiting.

Q: What is eventual consistency?
A: In an AP system, replicas may temporarily diverge but will eventually converge. Example: you post a tweet, your followers might not see it immediately but will within seconds. Achieved via read repair, hinted handoff, anti-entropy. Contrast with strong consistency (every read returns the latest write).

Q: Design a rate limiter.
A: Token Bucket: fixed-rate token generation, allows bursts. Leaky Bucket: fixed-rate processing, smooth output. Fixed Window: count requests per time window (edge problem at boundaries). Sliding Window: weighted count across windows (Redis + sorted sets). Implement at the API gateway with Redis as the counter store.

Q: What is database sharding?
A: Horizontal partitioning — splitting data across multiple database instances by a shard key. Hash-based (hash(key) % N), range-based (A-M shard 1, N-Z shard 2). Challenges: cross-shard queries, resharding, hot spots. Solve with consistent hashing (virtual nodes) and join-free denormalized data.

Q: How to handle server failures?
A: Redundancy: run multiple instances behind a load balancer. Circuit breaker: stop calling failing services. Retry with exponential backoff. Health checks: auto-remove unhealthy servers. Graceful degradation: return cached/default data. Failover: hot standby or auto-promote replica. Multi-region: active-active or active-passive.

Q: Microservices vs Monolith?
A: Monolith: simpler to develop, deploy, debug. Better for small teams and MVPs. Microservices: independent deployment, scaling per service, polyglot. Better for large teams and mature products. Migrate when team size > 10 or services have different scaling needs. Extract services iteratively using the Strangler Fig pattern.
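The token-bucket variant from the rate-limiter answer above can be sketched in a few lines (capacity and refill rate are illustrative parameters; timestamps are injected so the sketch is deterministic and testable):

```javascript
// Token bucket: tokens refill at a fixed rate; each request spends one.
// Allows short bursts up to `capacity` while enforcing an average rate.
function makeTokenBucket(capacity, refillPerSec) {
  let tokens = capacity;
  let last = 0; // seconds
  return function allow(nowSec) {
    tokens = Math.min(capacity, tokens + (nowSec - last) * refillPerSec);
    last = nowSec;
    if (tokens >= 1) { tokens -= 1; return true; }
    return false;
  };
}

const allow = makeTokenBucket(2, 1); // burst of 2, refills 1 token/sec
allow(0); // true
allow(0); // true  (burst consumed)
allow(0); // false (bucket empty)
allow(1); // true  (1 token refilled after 1s)
```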
💡
Key topics to master: CAP theorem, consistent hashing, load balancing, database sharding, caching strategies, message queues, CDN, API gateway, microservices, system design process, and common case studies (Twitter, URL shortener, chat, rate limiter).