Topics covered: collaborative filtering, content-based filtering, hybrid approaches, matrix factorization, deep learning models, and evaluation.
Recommendation systems predict user preferences and suggest relevant items. They are core to Netflix, Amazon, Spotify, YouTube, and most modern platforms.
| Approach | How It Works | Data Needed | Example |
|---|---|---|---|
| Content-Based | Recommend items similar to what user liked | Item features, user preferences | Spotify: similar songs based on audio features |
| Collaborative Filtering | Recommend items similar users liked | User-item interaction matrix | Netflix: users who watched X also watched Y |
| Hybrid | Combine content + collaborative signals | Both features and interactions | YouTube: video features + watch history |
| Knowledge-Based | Recommend based on domain rules | Item attributes, user requirements | Real estate: filter by budget, location, beds |
| Demographic | Recommend based on user demographics | Age, gender, location | News: trending in your city |
| Deep Learning | Neural networks learn complex patterns | Any available data | Google: Wide & Deep, DeepFM |
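The knowledge-based row is the easiest to make concrete: explicit user requirements act as hard filters over item attributes. A minimal sketch, assuming a toy in-memory listing catalog (the field names and data below are illustrative, not a real API):

```python
# Knowledge-based filtering: apply explicit user requirements as hard rules.
# Listings and requirement fields are toy examples.
listings = [
    {"id": 1, "price": 250_000, "city": "Austin", "beds": 2},
    {"id": 2, "price": 480_000, "city": "Austin", "beds": 4},
    {"id": 3, "price": 320_000, "city": "Denver", "beds": 3},
]

def knowledge_based_filter(items, budget, city, min_beds):
    """Return items satisfying every domain rule (budget, location, beds)."""
    return [it for it in items
            if it["price"] <= budget
            and it["city"] == city
            and it["beds"] >= min_beds]

matches = knowledge_based_filter(listings, budget=400_000, city="Austin", min_beds=2)
print([it["id"] for it in matches])  # → [1]
```

No interaction history is needed, which is why knowledge-based rules are also a common cold-start fallback.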
A quick demonstration of the two similarity views (user-user from rows, item-item from columns) on a toy rating matrix:

```python
import numpy as np
from scipy.spatial.distance import cosine

# ── User-Item Interaction Matrix (0 = unrated) ──
#        Item1  Item2  Item3  Item4
# U1   [  5.0    3.0    0.0    1.0 ]
# U2   [  4.0    0.0    4.0    2.0 ]
# U3   [  1.0    1.0    5.0    4.0 ]
# U4   [  0.0    5.0    4.0    5.0 ]
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 2],
    [1, 1, 5, 4],
    [0, 5, 4, 5],
])

# ── Cosine Similarity ──
def cosine_sim(a, b):
    return 1 - cosine(a, b)  # scipy's cosine() is a distance, so invert it

# User similarity: rows of R
user_sim = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        user_sim[i, j] = cosine_sim(R[i], R[j])

# Item similarity: columns of R
item_sim = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        item_sim[i, j] = cosine_sim(R[:, i], R[:, j])
```

Collaborative filtering leverages the collective behavior of all users to make recommendations. It assumes users who agreed in the past will agree in the future.
```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# ── User-Based CF: find similar users, recommend their items ──
def user_based_cf(R, user_idx, k=2, top_n=5):
    # 1. Compute user similarities (Pearson correlation on co-rated items)
    n_users = R.shape[0]
    user_sim = np.zeros((n_users, n_users))
    for i in range(n_users):
        for j in range(n_users):
            if i != j:
                mask = (R[i] > 0) & (R[j] > 0)
                if mask.sum() > 1:
                    corr = np.corrcoef(R[i][mask], R[j][mask])[0, 1]
                    user_sim[i, j] = 0.0 if np.isnan(corr) else corr
    # 2. Find the k most similar users
    similar_users = np.argsort(user_sim[user_idx])[-k:][::-1]
    # 3. Score items those users liked that the target hasn't seen
    unseen = R[user_idx] == 0
    scores = np.zeros(R.shape[1])
    for su in similar_users:
        scores += user_sim[user_idx, su] * R[su]
    scores[~unseen] = -np.inf  # exclude already-seen items
    return np.argsort(scores)[-top_n:][::-1]

# ── Matrix Factorization (SVD) ──
# R ≈ U @ S @ V^T (latent factor model)
# User i's rating for item j ≈ dot(u_i, v_j)
R_sparse = csr_matrix(R, dtype=float)  # svds requires a floating-point matrix
U, sigma, Vt = svds(R_sparse, k=2)     # k latent factors
sigma = np.diag(sigma)
predicted = U @ sigma @ Vt
# Predicted rating for user 0, item 2 (previously unrated):
print(f"Predicted: {predicted[0, 2]:.2f}")
```

| Challenge | Description | Solution |
|---|---|---|
| Cold Start (New User) | No interaction history for new users | Use demographics, popularity, or ask for preferences |
| Cold Start (New Item) | No ratings for new items | Use content-based features or initial seeding |
| Data Sparsity | Most users rate few items (<1% of catalog) | Matrix factorization, regularization, implicit feedback |
| Scalability | User/item count in millions | Approximate nearest neighbors (ANN), SVD, ALS |
| Popularity Bias | Popular items get more exposure, creating feedback loop | Explore/exploit, diversity, coverage metrics |
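The cold-start rows can be made concrete with a popularity fallback: a brand-new user has no interaction row, so recommend items that many users rated, breaking ties by mean rating. A minimal sketch (the toy matrix is restated so the snippet stands alone):

```python
import numpy as np

# Toy rating matrix; 0 = unrated.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 2],
    [1, 1, 5, 4],
    [0, 5, 4, 5],
])

def recommend_for_new_user(R, top_n=2):
    """Cold-start fallback: rank items by rating count, then mean rating."""
    rated = R > 0
    counts = rated.sum(axis=0)  # how many users rated each item
    means = np.where(counts > 0, R.sum(axis=0) / np.maximum(counts, 1), 0.0)
    # lexsort sorts by the LAST key first, so -counts is primary, -means secondary
    order = np.lexsort((-means, -counts))
    return order[:top_n]

print(recommend_for_new_user(R))  # item 3 (4 ratings), then item 2 (highest mean)
```

Once the user accumulates a few interactions, the system can hand over to collaborative filtering.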
Content-based filtering recommends items similar to those a user has liked in the past, using item features (text, images, metadata) rather than user behavior.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# ── TF-IDF for Text-Based Items ──
documents = [
    "Action movie with car chases and explosions",
    "Romantic comedy about love in Paris",
    "Sci-fi thriller set in space with aliens",
    "Documentary about ocean marine life",
    "Action thriller with martial arts fighting",
]
tfidf = TfidfVectorizer(stop_words='english', max_features=500)
tfidf_matrix = tfidf.fit_transform(documents)

# ── Compute Item Similarity Matrix ──
item_similarity = cosine_similarity(tfidf_matrix)

# ── Recommend similar items ──
def recommend_similar(item_idx, top_n=3):
    scores = item_similarity[item_idx].copy()  # copy so we don't mutate the matrix
    scores[item_idx] = -1  # exclude the item itself
    return np.argsort(scores)[-top_n:][::-1]

# ── User Profile Construction ──
# Build the user profile as the mean of liked-item TF-IDF vectors
liked_items = [0, 4]  # user liked items 0 and 4 (the two action movies)
user_profile = np.zeros(tfidf_matrix.shape[1])
for idx in liked_items:
    user_profile += tfidf_matrix[idx].toarray().flatten()
user_profile /= len(liked_items)

# ── Recommend items matching the user profile ──
scores = cosine_similarity([user_profile], tfidf_matrix).flatten()
scores[liked_items] = -1  # exclude already-liked items
top_recs = np.argsort(scores)[-3:][::-1]
print(f"Recommended items: {top_recs}")
```

Matrix factorization decomposes the user-item interaction matrix into lower-dimensional latent factor matrices, capturing hidden patterns in user preferences.
```python
import numpy as np

# ── SGD-based Matrix Factorization ──
class MatrixFactorization:
    def __init__(self, n_users, n_items, n_factors=20, lr=0.01,
                 reg=0.02, n_epochs=100):
        self.n_factors = n_factors
        self.lr = lr
        self.reg = reg
        self.n_epochs = n_epochs
        self.U = np.random.normal(0, 0.1, (n_users, n_factors))
        self.V = np.random.normal(0, 0.1, (n_items, n_factors))
        self.bias_u = np.zeros(n_users)
        self.bias_i = np.zeros(n_items)
        self.global_mean = 0.0

    def fit(self, R, mask):
        n_ratings = mask.sum()
        self.global_mean = R[mask > 0].mean()  # baseline: mean observed rating
        for epoch in range(self.n_epochs):
            total_loss = 0.0
            for u in range(R.shape[0]):
                for i in range(R.shape[1]):
                    if mask[u, i] == 0:
                        continue
                    pred = (self.global_mean + self.bias_u[u] +
                            self.bias_i[i] + self.U[u] @ self.V[i])
                    error = R[u, i] - pred
                    total_loss += error ** 2
                    # Update biases
                    self.bias_u[u] += self.lr * (error - self.reg * self.bias_u[u])
                    self.bias_i[i] += self.lr * (error - self.reg * self.bias_i[i])
                    # Update latent factors (use the pre-update user vector for V)
                    u_old = self.U[u].copy()
                    self.U[u] += self.lr * (error * self.V[i] - self.reg * self.U[u])
                    self.V[i] += self.lr * (error * u_old - self.reg * self.V[i])
            if (epoch + 1) % 20 == 0:
                rmse = np.sqrt(total_loss / n_ratings)
                print(f"Epoch {epoch+1}: RMSE = {rmse:.4f}")

    def predict(self, u, i):
        return (self.global_mean + self.bias_u[u] +
                self.bias_i[i] + self.U[u] @ self.V[i])
```

| Method | Key Idea | Pros | Cons |
|---|---|---|---|
| Truncated SVD | Decompose R into U, Sigma, Vt | Fast, deterministic | Cannot handle missing values (treats them as zeros) |
| SGD MF | Gradient descent on latent factors | Handles missing values well | Slower, sensitive to hyperparams |
| ALS (Alternating Least Squares) | Fix U, solve V, then fix V, solve U | Parallelizable (distributed) | Slower convergence |
| NMF (Non-negative MF) | Force all factors to be non-negative | Interpretable latent factors | Constrained solution space |
| SVD++ | SVD + implicit feedback signals | Better with implicit data | More complex |
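The ALS row can be sketched in a few lines: with items fixed, each user vector is a small ridge-regression solve over that user's observed ratings, and vice versa. A toy, unweighted version over observed entries only (hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1],
              [4, 0, 4, 2],
              [1, 1, 5, 4],
              [0, 5, 4, 5]], dtype=float)
mask = R > 0                  # observed entries only
k, reg = 2, 0.1               # latent factors, L2 regularization
U = rng.normal(0, 0.1, (R.shape[0], k))
V = rng.normal(0, 0.1, (R.shape[1], k))

for _ in range(20):
    # Fix V: solve each user's regularized least squares on their observed items
    for u in range(R.shape[0]):
        Vu = V[mask[u]]
        U[u] = np.linalg.solve(Vu.T @ Vu + reg * np.eye(k), Vu.T @ R[u, mask[u]])
    # Fix U: solve each item's regularized least squares on its observed users
    for i in range(R.shape[1]):
        Ui = U[mask[:, i]]
        V[i] = np.linalg.solve(Ui.T @ Ui + reg * np.eye(k), Ui.T @ R[mask[:, i], i])

rmse = np.sqrt(np.mean((R[mask] - (U @ V.T)[mask]) ** 2))
print(f"RMSE on observed ratings: {rmse:.3f}")
```

Because each per-user (and per-item) solve is independent, the two loops parallelize trivially, which is why ALS is the standard choice in distributed settings like Spark MLlib.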
| Metric | Measures | Formula/Description | Good Range |
|---|---|---|---|
| RMSE | Prediction accuracy | Sqrt(Mean((predicted - actual)^2)) | < 0.9 for 1-5 scale |
| MAE | Prediction accuracy | Mean(abs(predicted - actual)) | < 0.7 for 1-5 scale |
| Precision@K | Relevance of top-K | Relevant in top K / K | > 0.1 is good for large catalogs |
| Recall@K | Coverage of relevant items | Relevant in top K / All relevant | > 0.3 is decent |
| NDCG@K | Ranking quality | Discounted cumulative gain, normalized | > 0.5 is good |
| MAP@K | Mean Average Precision | Mean of precision at each relevant rank | > 0.2 is good |
| Coverage | Catalog diversity | Items recommended / Total items | > 5% of catalog |
| Diversity | Item dissimilarity | Average pairwise distance of recommendations | Higher = more diverse |
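The ranking metrics above can be computed directly from a ranked recommendation list and a ground-truth relevance set. A minimal sketch, assuming binary relevance (the item ids are toy values):

```python
import numpy as np

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return len(set(recommended[:k]) & set(relevant)) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """DCG with log2 position discount, normalized by the ideal ranking."""
    dcg = sum(1.0 / np.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

recommended = [10, 4, 7, 1, 9]   # ranked model output (toy ids)
relevant = {4, 9, 3}             # ground-truth positives
print(precision_at_k(recommended, relevant, 5))  # 2 hits / 5 = 0.4
print(recall_at_k(recommended, relevant, 5))     # 2 / 3 ≈ 0.667
print(round(ndcg_at_k(recommended, relevant, 5), 3))
```

Note that NDCG rewards placing hits early: the hit at rank 2 contributes far more than the hit at rank 5.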
| System | Approach | Used By | Key Innovation |
|---|---|---|---|
| Wide & Deep | Linear (memorization) + DNN (generalization) | Google Play | Combine memorization and generalization |
| DeepFM | FM + Deep Neural Network | Huawei | Factorization machines with deep layers |
| DIN (Deep Interest) | Attention over user behavior history | Alibaba | Captures diverse interests with attention |
| Transformer4Rec | Transformer over sequential interactions | Various | Self-attention for session-based rec |
| SASRec | Self-Attentive Sequential Rec | Various | Transformer for next-item prediction |
| Graph Neural Network | GNN over user-item bipartite graph | Pinterest, Uber | Collaborative signals via message passing |
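The Wide & Deep row's core idea is simple to state: sum a linear "wide" logit (memorization over raw and crossed features) with an MLP "deep" logit (generalization over dense embeddings), then apply a sigmoid. A NumPy sketch of one forward pass; the shapes and random weights are placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(42)

def wide_and_deep_forward(x_wide, x_deep, w_wide, W1, W2, b):
    """One forward pass: linear memorization term + 2-layer MLP generalization term."""
    wide_logit = x_wide @ w_wide                 # wide part: raw/cross features
    hidden = np.maximum(0, x_deep @ W1)          # ReLU hidden layer
    deep_logit = hidden @ W2                     # deep part: embedding-style features
    return 1 / (1 + np.exp(-(wide_logit + deep_logit + b)))  # sigmoid score

x_wide = rng.random(8)   # toy sparse/cross features (dense here for simplicity)
x_deep = rng.random(4)   # toy dense embedding features
p = wide_and_deep_forward(x_wide, x_deep,
                          w_wide=rng.normal(size=8),
                          W1=rng.normal(size=(4, 6)),
                          W2=rng.normal(size=6),
                          b=0.0)
print(f"score: {p:.3f}")  # probability-like score in (0, 1)
```

In the real system both parts are trained jointly, so the wide part handles exceptions the deep part over-generalizes on, and vice versa.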