Covers the Keras API, eager execution, tf.data, SavedModel, TF Lite, TF Serving, and distributed training.
TensorFlow is Google's open-source deep learning framework. Keras is its high-level API for building and training models quickly. TensorFlow 2.x integrates Keras as the default API.
import tensorflow as tf
from tensorflow import keras
print(tf.__version__) # e.g., 2.17.0
# ── Tensors ──
scalar = tf.constant(42)
vector = tf.constant([1.0, 2.0, 3.0])
matrix = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
zeros = tf.zeros((3, 3))
ones = tf.ones((2, 4))
random = tf.random.normal((3, 3), mean=0, stddev=1)
# Tensor operations
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([[5, 6], [7, 8]])
print(a + b) # Element-wise addition
print(tf.matmul(a, b)) # Matrix multiplication
print(tf.reduce_mean(a)) # Mean: 2 (integer dtype truncates; cast to float32 for 2.5)
print(tf.reshape(a, (4, 1))) # Reshape
# GPU check
print("GPUs Available:", tf.config.list_physical_devices('GPU'))
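When a GPU is present, it often helps to enable memory growth so TensorFlow allocates GPU memory incrementally instead of reserving the whole device up front; a minimal sketch (a no-op on CPU-only machines):

```python
import tensorflow as tf

# Must run before any GPU has been initialized by an op.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    # Allocate GPU memory as needed rather than grabbing it all at once.
    tf.config.experimental.set_memory_growth(gpu, True)
print("Memory growth enabled for", len(gpus), "GPU(s)")
```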
# ── Eager vs Graph Execution ──
# Eager: default in TF2, executes immediately
# Graph: use @tf.function for performance
@tf.function
def compute(a, b):
    return tf.matmul(a, b)

result = compute(tf.ones((1000, 1000)), tf.ones((1000, 1000)))

| Feature | TensorFlow 2.x | PyTorch |
|---|---|---|
| Creator | Google Brain | Meta AI (FAIR) |
| API Style | Keras (high-level) + tf API | Pythonic, torch.nn modules |
| Graph Mode | @tf.function decorator | torch.compile (optional) |
| Deployment | TF Serving, TF Lite, TF.js, SavedModel | TorchServe, ONNX, TensorRT |
| Mobile/Edge | TF Lite (mature), TF Micro | PyTorch Mobile, ExecuTorch |
| Primary Adoption | Industry/production | Research community (majority) |
| Visualization | TensorBoard (built-in) | TensorBoard (via torch.utils.tensorboard) |
| Learning Curve | Easier for beginners (Keras) | More Pythonic, intuitive for researchers |
| TPU Support | Native, first-class | Via PyTorch/XLA (less mature) |
Keras provides a modular approach to building neural networks. Layers are the building blocks; models define how layers connect.
from tensorflow import keras
from tensorflow.keras import layers
# ── Sequential API ──
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])
model.summary()
# ── Functional API ──
inputs = keras.Input(shape=(784,))
x = layers.Dense(256, activation='relu')(inputs)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(128, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="mlp_classifier")
# ── Custom Layers ──
class ResidualBlock(layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.dense1 = layers.Dense(units, activation='relu')
        self.dense2 = layers.Dense(units)
        self.bn = layers.BatchNormalization()

    def call(self, inputs, training=False):
        x = self.dense1(inputs)
        x = self.dense2(x)
        x = self.bn(x, training=training)
        # Skip connection; use tf.nn.relu rather than creating a new
        # Activation layer on every call
        return tf.nn.relu(x + inputs)
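A custom layer like this plugs straight into the Functional API. A self-contained usage sketch (the block definition is repeated here so the snippet runs on its own; note the skip connection requires `units` to match the incoming feature size):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class ResidualBlock(layers.Layer):
    """Two Dense layers plus a skip connection (input dims must equal units)."""
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.dense1 = layers.Dense(units, activation='relu')
        self.dense2 = layers.Dense(units)
        self.bn = layers.BatchNormalization()

    def call(self, inputs, training=False):
        x = self.dense1(inputs)
        x = self.dense2(x)
        x = self.bn(x, training=training)
        return tf.nn.relu(x + inputs)  # skip connection

inputs = keras.Input(shape=(64,))
x = ResidualBlock(64)(inputs)   # units == input feature size
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 10)
```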
# ── Common Layer Types ──
layers.Dense(64, activation='relu') # Fully connected
layers.Conv2D(32, 3, padding='same') # 2D convolution
layers.Conv1D(64, 5, activation='relu') # 1D convolution (text, time series)
layers.LSTM(128, return_sequences=True) # RNN layer
layers.GRU(64) # Gated recurrent unit
layers.Bidirectional(layers.LSTM(64)) # Bidirectional wrapper
layers.Embedding(10000, 128) # Word embeddings
layers.MultiHeadAttention(num_heads=8, key_dim=64) # Transformer attention
layers.GlobalAveragePooling2D() # Global average pooling
layers.BatchNormalization() # Batch normalization
layers.LayerNormalization() # Layer normalization

| Layer | Use Case | Key Args | Output Shape |
|---|---|---|---|
| Dense | Fully connected layers | units, activation, use_bias | (batch, units) |
| Conv2D | Image feature extraction | filters, kernel_size, strides, padding | (batch, H, W, filters) |
| Conv1D | Sequence/text features | filters, kernel_size, strides | (batch, seq_len, filters) |
| MaxPooling2D | Spatial downsampling | pool_size, strides | (batch, H/ps, W/ps, ch) |
| LSTM | Sequential data processing | units, return_sequences, return_state | (batch, seq, units) |
| GRU | Lighter sequential processing | units, return_sequences | (batch, seq, units) |
| Embedding | Word/token to vector | input_dim, output_dim | (batch, seq, embed_dim) |
| BatchNormalization | Stabilize training | momentum, epsilon | Same as input |
| Dropout | Regularization | rate | Same as input |
| Flatten | Reshape to 1D | - | (batch, H*W*C) |
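The output shapes in the table can be checked directly by calling a layer on a dummy tensor (shapes below use an illustrative batch of 8):

```python
import tensorflow as tf
from tensorflow.keras import layers

dense_out = layers.Dense(32)(tf.zeros((8, 16)))                          # (batch, units)
lstm_seq  = layers.LSTM(64, return_sequences=True)(tf.zeros((8, 20, 16)))  # (batch, seq, units)
lstm_last = layers.LSTM(64)(tf.zeros((8, 20, 16)))                        # (batch, units)
embed     = layers.Embedding(1000, 128)(tf.zeros((8, 20), dtype=tf.int32))  # (batch, seq, embed_dim)

print(dense_out.shape)  # (8, 32)
print(lstm_seq.shape)   # (8, 20, 64)
print(lstm_last.shape)  # (8, 64)
print(embed.shape)      # (8, 20, 128)
```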
Training a neural network in Keras involves compiling (loss, optimizer, metrics), fitting (epochs, batches), and evaluating on test data.
from tensorflow import keras
from tensorflow.keras import layers, callbacks
# ── Build Model ──
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])
# ── Compile ──
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
# ── Callbacks ──
cb_list = [
    callbacks.EarlyStopping(patience=5, restore_best_weights=True),
    callbacks.ModelCheckpoint('best_model.keras', save_best_only=True),
    callbacks.ReduceLROnPlateau(factor=0.5, patience=3, min_lr=1e-6),
    callbacks.TensorBoard(log_dir='./logs'),
]
# ── Train ──
history = model.fit(
    x_train, y_train,
    epochs=50,
    batch_size=64,
    validation_split=0.2,
    callbacks=cb_list,
    verbose=1,
)
# ── Evaluate ──
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
# ── Predict ──
predictions = model.predict(x_test[:5])
predicted_classes = tf.argmax(predictions, axis=1)
# ── Custom Training Loop ──
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy()
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for epoch in range(10):
    for step, (x_batch, y_batch) in enumerate(train_dataset):
        loss = train_step(x_batch, y_batch)

| Optimizer | Learning Rate | Key Feature | Best For |
|---|---|---|---|
| SGD | 0.01-0.1 | Simple, well-understood, with momentum | Large datasets, fine-tuning with low LR |
| Adam | 0.001 (default) | Adaptive LR per parameter, momentum | General-purpose, most common default |
| AdamW | 0.001 | Adam with decoupled weight decay | Transformer training, better generalization |
| RMSprop | 0.001 | Adaptive LR based on recent gradients | RNNs, non-stationary objectives |
| Adagrad | 0.01 | Accumulated squared gradients, decaying LR | Sparse gradients (NLP, embeddings) |
| Nadam | 0.001 | Adam + Nesterov momentum | Image classification, NLP |
| Lion | 0.001 | Memory-efficient, sign-based updates | Large models, memory-constrained training |
| Loss | Task | Output Activation | Use When |
|---|---|---|---|
| binary_crossentropy | Binary classification | sigmoid | 2-class problems (spam/not spam) |
| categorical_crossentropy | Multi-class (one-hot) | softmax | Multi-class with one-hot labels |
| sparse_categorical_crossentropy | Multi-class (integers) | softmax | Multi-class with integer labels |
| mse (Mean Squared Error) | Regression | linear (none) | Predicting continuous values |
| mae (Mean Absolute Error) | Regression (robust) | linear | Regression with outliers |
| huber | Regression (smooth) | linear | Combines MSE/MAE benefits |
| cosine_similarity | Embeddings/similarity | - | Face recognition, embeddings |
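The only difference between the two multi-class losses is the label format: integer class IDs versus one-hot vectors. A quick check that they agree on the same prediction (values here are illustrative):

```python
import tensorflow as tf
from tensorflow import keras

probs = tf.constant([[0.1, 0.7, 0.2]])  # softmax output for one sample
sparse = keras.losses.SparseCategoricalCrossentropy()
onehot = keras.losses.CategoricalCrossentropy()

# Integer label vs its one-hot encoding yield the same loss value.
l1 = sparse(tf.constant([1]), probs)
l2 = onehot(tf.constant([[0.0, 1.0, 0.0]]), probs)
print(float(l1), float(l2))  # both ≈ -log(0.7) ≈ 0.357
```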
tf.data is TensorFlow's API for building efficient input pipelines. It handles reading, preprocessing, batching, and prefetching data for training.
import tensorflow as tf
# ── From NumPy Arrays ──
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(10000).batch(32).prefetch(tf.data.AUTOTUNE)
# ── From Image Directory ──
train_ds = tf.keras.utils.image_dataset_from_directory(
    './data/images/',
    validation_split=0.2,
    subset='training',
    seed=42,
    image_size=(224, 224),
    batch_size=32,
)
# ── From CSV ──
dataset = tf.data.experimental.make_csv_dataset(
    './data.csv',
    batch_size=32,
    label_name='target',
    num_epochs=1,
)
# ── Advanced Pipeline with Augmentation ──
AUTOTUNE = tf.data.AUTOTUNE
def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image, label

train_ds = (train_ds
    .map(augment, num_parallel_calls=AUTOTUNE)
    .shuffle(1000)
    .batch(32)
    .prefetch(AUTOTUNE)
)
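For data that fits in memory (or on local disk), adding `cache()` early in the pipeline avoids re-reading and re-decoding every epoch. A runnable sketch of the commonly recommended ordering, using a synthetic dataset:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
ds = tf.data.Dataset.range(1000)  # stand-in for a decoded dataset

ds = (ds
    .cache()             # cache after expensive I/O / decoding steps
    .shuffle(1000)       # shuffle before batching
    .batch(32)
    .prefetch(AUTOTUNE)  # overlap training with data preparation
)
print(sum(1 for _ in ds))  # 32 batches (ceil(1000 / 32))
```

The first epoch fills the cache; subsequent epochs read from it directly, which is often the single biggest pipeline speedup.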
# ── TFRecord for Large Datasets ──
# Write TFRecord
feature = {
    'image': tf.train.Feature(bytes_list=tf.train.BytesList(
        value=[image_bytes])),
    'label': tf.train.Feature(int64_list=tf.train.Int64List(
        value=[label_id])),
}
example = tf.train.Example(features=tf.train.Features(feature=feature))
with tf.io.TFRecordWriter('data.tfrecord') as writer:
    writer.write(example.SerializeToString())
# Read TFRecord
def parse_fn(example_proto):
    feature_desc = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(example_proto, feature_desc)
    image = tf.io.decode_jpeg(parsed['image'], channels=3)
    return image, parsed['label']

dataset = tf.data.TFRecordDataset('data.tfrecord').map(parse_fn)

Transfer learning reuses knowledge from a model pre-trained on large datasets (ImageNet, etc.) for your specific task. It dramatically reduces training time and data requirements.
from tensorflow import keras
from tensorflow.keras import layers, applications
# ── Feature Extraction (freeze base) ──
base_model = applications.MobileNetV2(
    weights='imagenet',
    input_shape=(224, 224, 3),
    include_top=False,
)
base_model.trainable = False # Freeze all layers
model = keras.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# ── Fine-Tuning (unfreeze top layers) ──
base_model.trainable = True
# Freeze bottom layers, fine-tune top N layers
for layer in base_model.layers[:-20]:
    layer.trainable = False
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),  # Very low LR!
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
# ── Popular Pre-trained Models ──
# applications.ResNet50(weights='imagenet', include_top=False)
# applications.EfficientNetB0(weights='imagenet')
# applications.VGG16(weights='imagenet')
# applications.InceptionV3(weights='imagenet')
# applications.ConvNeXtBase(weights='imagenet')

| Model | Top-1 Acc | Params | Size | Best For |
|---|---|---|---|---|
| MobileNetV2 | 71.8% | 3.4M | 14 MB | Mobile, edge, real-time |
| MobileNetV3-Small | 67.4% | 2.5M | 11 MB | Ultra-light mobile apps |
| EfficientNet-B0 | 77.1% | 5.3M | 21 MB | Best accuracy-to-size ratio |
| EfficientNet-B7 | 84.3% | 66M | 256 MB | Max accuracy from efficient family |
| ResNet50 | 76.0% | 25.6M | 98 MB | General-purpose baseline |
| ResNet152 | 78.3% | 60.2M | 232 MB | High accuracy, research |
| InceptionV3 | 77.9% | 23.8M | 92 MB | Multi-scale features |
| ConvNeXt-Base | 85.3% | 88.6M | 350 MB | Modern CNN, matches ViT |
| VGG16 | 71.3% | 138M | 528 MB | Feature extraction, simple |
TensorFlow provides multiple deployment paths: SavedModel for serving, TF Lite for mobile/embedded, and TF.js for web browsers.
# ── SavedModel (for TF Serving) ──
model.save('my_model') # SavedModel directory (TF <= 2.15; Keras 3 uses model.export('my_model'))
model.save('my_model.keras') # Single-file Keras format (TF 2.13+)
# Load
loaded_model = keras.models.load_model('my_model.keras')
# ── TF Lite Conversion ──
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT] # Quantization
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
# ── TF Lite Inference ──
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
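For full integer quantization (needed by many microcontrollers and Edge TPUs), the converter also needs a representative dataset to calibrate activation ranges. A self-contained sketch using a tiny stand-in model and random calibration data (both are assumptions for illustration; substitute your own model and real training samples):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Stand-in model and calibration inputs for demonstration only.
model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(2, activation='softmax'),
])
calib = np.random.rand(100, 4).astype(np.float32)

def representative_data_gen():
    # Yield typical inputs so the converter can calibrate activation ranges.
    for sample in calib:
        yield [sample[None, :]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# To force int8-only ops for int8-only targets (uncomment as needed):
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# converter.inference_input_type = tf.int8
# converter.inference_output_type = tf.int8
tflite_int8 = converter.convert()
print(len(tflite_int8), "bytes")
```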
# ── ONNX Export (via tf2onnx) ──
import tf2onnx
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
model_onnx, _ = tf2onnx.convert.from_keras(model, input_signature=spec)
with open("model.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())

| Platform | Format | Size Reduction | Use Case |
|---|---|---|---|
| TF Serving (REST/gRPC) | SavedModel | None | Cloud production APIs |
| TF Lite | .tflite | 3-4x (with quantization) | Android, iOS, Raspberry Pi, ESP32 |
| TF Lite Micro | .tflite | 4-10x | Microcontrollers (STM32, Arduino) |
| TF.js | JSON + weights | None | Browser, Node.js |
| ONNX Runtime | .onnx | Varies | Cross-platform, C#/C++ apps |
| TensorRT | .plan | 2-3x (FP16/INT8) | NVIDIA GPUs, max inference speed |
Convolutional Neural Networks (CNNs) are the backbone of computer vision. They learn spatial hierarchies of features through convolutional and pooling layers.
from tensorflow import keras
from tensorflow.keras import layers
# ── Simple CNN for CIFAR-10 ──
model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    # Block 1
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.BatchNormalization(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(2),
    layers.Dropout(0.25),
    # Block 2
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.BatchNormalization(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(2),
    layers.Dropout(0.25),
    # Classifier
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# ── Data Augmentation Layer (Keras 3) ──
data_augmentation = keras.Sequential([
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.1),
])
model = keras.Sequential([
    data_augmentation,
    layers.Rescaling(1./255),
    # ... rest of model
])

| Architecture | Year | Innovation | Params |
|---|---|---|---|
| LeNet-5 | 1998 | First successful CNN (handwriting) | 60K |
| AlexNet | 2012 | Deep CNN + ReLU + dropout | 60M |
| VGG16 | 2014 | Uniform 3x3 convolutions, depth | 138M |
| GoogLeNet | 2014 | Inception modules (multi-scale) | 6.8M |
| ResNet | 2015 | Skip connections (residual learning) | 25M (ResNet50) |
| DenseNet | 2017 | Dense connections between layers | 8M (DenseNet121) |
| EfficientNet | 2019 | Compound scaling (depth, width, res) | 5.3M (B0) |
| ConvNeXt | 2022 | Modernized CNN, matches ViT | 88M (Base) |
Regularization techniques prevent overfitting and improve model generalization. Debugging ML models requires understanding common failure modes and how to diagnose them.
from tensorflow import keras
from tensorflow.keras import layers, regularizers
# ── Weight Regularization ──
layers.Dense(64, activation='relu',
             kernel_regularizer=regularizers.l2(0.01))
layers.Conv2D(32, 3,
              kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4))
# ── Dropout ──
layers.Dropout(0.5) # Drop 50% of neurons during training
# ── Early Stopping ──
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True,
    min_delta=0.001,
)
# ── Learning Rate Scheduler ──
lr_schedule = keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,
    decay_steps=10000,
)
optimizer = keras.optimizers.Adam(learning_rate=lr_schedule)
# ── Gradient Clipping ──
optimizer = keras.optimizers.Adam(
    learning_rate=1e-3,
    clipnorm=1.0,  # Clip by norm
    # clipvalue=0.5,  # Clip by value
)
# ── Debugging: Check for NaN ──
# Use gradient clipping, lower LR, check data normalization
# Check for exploding/vanishing gradients
# Use TensorBoard to monitor weights and gradients

| Symptom | Likely Cause | Solution |
|---|---|---|
| Training loss not decreasing | LR too high/low, data issue | Try LR=0.001, check labels, normalize data |
| Train loss low, val loss high | Overfitting | More dropout, L2 reg, data augmentation, reduce model |
| Both losses high | Underfitting | Increase model capacity, train longer, check data |
| NaN loss | Exploding gradients | Lower LR, gradient clipping, check for zeros in data |
| Loss oscillating | LR too high or batch too small | Reduce LR, increase batch size |
| Very slow training | Data pipeline bottleneck | Use tf.data prefetch, increase num_parallel_calls |
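A NaN loss can also be caught automatically rather than discovered in the logs: Keras ships a `TerminateOnNaN` callback, and `tf.debugging.check_numerics` raises as soon as a tensor contains NaN or Inf. A minimal sketch:

```python
import tensorflow as tf
from tensorflow import keras

# Stop training immediately if the loss becomes NaN or Inf.
nan_guard = keras.callbacks.TerminateOnNaN()
# model.fit(x, y, callbacks=[nan_guard, ...])

# Assert a tensor is finite; raises InvalidArgumentError otherwise.
t = tf.constant([1.0, 2.0])
checked = tf.debugging.check_numerics(t, message="activations")
print(checked.numpy())  # [1. 2.]
```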
Keras Tuner automates hyperparameter search using strategies like Random Search, Bayesian Optimization, and Hyperband.
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers
def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Input(shape=(784,)))
    # Tune number of layers
    for i in range(hp.Int('num_layers', 1, 4)):
        model.add(layers.Dense(
            units=hp.Int(f'units_{i}', min_value=32, max_value=512, step=32),
            activation=hp.Choice('activation', ['relu', 'tanh']),
            kernel_regularizer=keras.regularizers.l2(
                hp.Float('l2', 1e-5, 1e-2, sampling='log')
            ),
        ))
    model.add(layers.Dropout(hp.Float('dropout', 0.1, 0.5, step=0.1)))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            learning_rate=hp.Float('lr', 1e-4, 1e-2, sampling='log')
        ),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )
    return model
tuner = kt.Hyperband(
    build_model,
    objective='val_accuracy',
    max_epochs=30,
    factor=3,
    directory='tuner_results',
    project_name='mnist_tuning',
)
tuner.search(x_train, y_train, epochs=30,
             validation_split=0.2)
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
best_model = tuner.hypermodel.build(best_hps)

Common TensorFlow interview questions with detailed answers covering core concepts, training, deployment, and troubleshooting.