-- ── Keyspaces ──

-- Multi-datacenter keyspace: NetworkTopologyStrategy takes a replica count per DC.
CREATE KEYSPACE IF NOT EXISTS ecommerce
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 2
}
AND durable_writes = true;

-- SimpleStrategy: single-datacenter / development use only.
CREATE KEYSPACE IF NOT EXISTS dev_keyspace
WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': 3
};

-- Set the active keyspace for the statements that follow.
USE ecommerce;

-- Inspect the full keyspace definition (cqlsh command).
DESCRIBE KEYSPACE ecommerce;

-- Change replication settings; run a repair afterwards so new replicas get data.
ALTER KEYSPACE ecommerce
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3
};

-- Remove a keyspace and every table in it.
DROP KEYSPACE IF EXISTS dev_keyspace;
-- ── Data Model Design (query-driven!) ──
-- Rule #1: Model around queries, NOT entities
-- Rule #2: One table per query pattern
-- Rule #3: Partition key distributes data
-- Rule #4: Clustering key sorts within partition

-- Users table (accessed by user_id).
-- IF NOT EXISTS keeps this script idempotent, matching the keyspace DDL above.
CREATE TABLE IF NOT EXISTS users (
    user_id UUID PRIMARY KEY,
    email TEXT,
    name TEXT,
    created_at TIMESTAMP,
    active BOOLEAN
);

-- Orders by user (partition = user_id, cluster = order_date).
-- Newest orders come back first within each user's partition.
CREATE TABLE IF NOT EXISTS orders_by_user (
    user_id UUID,
    order_date TIMESTAMP,
    order_id UUID,
    total DECIMAL,
    status TEXT,
    PRIMARY KEY ((user_id), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);

-- Orders by status (partition = status, cluster = order_date).
-- NOTE(review): a low-cardinality partition key like status can create hot,
-- oversized partitions; consider bucketing, e.g. ((status, day)), at scale.
CREATE TABLE IF NOT EXISTS orders_by_status (
    status TEXT,
    order_date TIMESTAMP,
    order_id UUID,
    user_id UUID,
    total DECIMAL,
    PRIMARY KEY ((status), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC);
| Type | Description | Example |
|------|-------------|---------|
| UUID | Version 4 UUID | a3f... |
| TIMEUUID | Type 1 UUID (sortable) | now() |
| TEXT | UTF-8 string | 'hello' |
| VARCHAR | Same as TEXT | 'hello' |
| INT | 32-bit integer | 42 |
| BIGINT | 64-bit integer | 9223372036854775807 |
| FLOAT / DOUBLE | Floating point | 3.14 |
| DECIMAL | High precision | 19.99 |
| BOOLEAN | True / false | true |
| TIMESTAMP | Date and time | '2024-01-15 10:30:00' |
| DATE | Date only | '2024-01-15' |
| TIME | Time only | '10:30:00' |
| BLOB | Binary data | 0xCAFE |
| LIST<T> | Ordered, duplicates | ['a','b','a'] |
| SET<T> | Unordered, unique | {'a','b'} |
| MAP<K,V> | Key-value pairs | {'a': 1, 'b': 2} |
| TUPLE<T,...> | Fixed-length typed | (1, text, true) |
| UDT | User-defined type | address_type |
| FROZEN<T> | Immutable collection | frozen<list<text>> |
| Concept | Description |
|---------|-------------|
| Partition Key | Determines data distribution (hash) |
| Composite PK | (a, b) - a is partition, b is cluster |
| Clustering Key | Sorts data within a partition |
| Wide Row | Many clustering values per partition |
| Partition Limit | ~100 MB per partition recommended |
⚠️ Always model around queries. In Cassandra, denormalization is normal. Create one table per query pattern. The partition key determines which node stores the data; the clustering key controls sort order within a partition.
-- ── INSERT ──
-- Inserts are upserts: re-inserting an existing primary key overwrites the row.
INSERT INTO users (user_id, email, name, created_at, active)
VALUES (uuid(), 'alice@example.com', 'Alice', toTimestamp(now()), true);
-- UUID literals abbreviated (123e4567-e89b-...) as placeholders throughout.
INSERT INTO orders_by_user (user_id, order_date, order_id, total, status)
VALUES (123e4567-e89b-..., toTimestamp(now()), uuid(), 99.99, 'pending');
-- Insert with TTL (auto-expire in 86400 seconds = 1 day)
INSERT INTO sessions (session_id, user_id, data)
VALUES (uuid(), 123e4567-e89b-..., '...') USING TTL 86400;
-- ── SELECT ──
-- MUST include full partition key
SELECT * FROM users WHERE user_id = 123e4567-e89b-...;
-- With clustering key range (half-open range avoids boundary double-counting)
SELECT * FROM orders_by_user
WHERE user_id = 123e4567-e89b-...
AND order_date >= '2024-01-01' AND order_date < '2024-02-01';
-- LIMIT
SELECT * FROM orders_by_status
WHERE status = 'shipped' LIMIT 100;
-- ALLOW FILTERING (warning: with no partition key in WHERE this is a full
-- TABLE scan across the cluster, not just one partition — very slow)
SELECT * FROM users WHERE email = 'alice@example.com' ALLOW FILTERING;
-- IN clause on partition key (queries multiple partitions)
SELECT * FROM users WHERE user_id IN (uuid1, uuid2, uuid3);
-- ── UPDATE ──
-- Updates are also upserts: the row is created if it does not already exist.
UPDATE users SET name = 'Alice Smith', active = false
WHERE user_id = 123e4567-e89b-...;
UPDATE orders_by_user SET status = 'shipped'
WHERE user_id = 123e4567-e89b-...
AND order_date = '2024-01-15 10:30:00'
AND order_id = 456e7890-...;
-- Update with TTL (the TTL applies only to the columns written here)
UPDATE users USING TTL 3600 SET active = true
WHERE user_id = 123e4567-e89b-...;
-- Update collections
UPDATE users SET tags = tags + ['vip'] WHERE user_id = ...; -- append list
UPDATE users SET tags = tags - ['old'] WHERE user_id = ...; -- remove from list
UPDATE users SET scores = scores + {100} WHERE user_id = ...; -- add to set
UPDATE users SET attrs = attrs + {'key': 'val'} WHERE user_id = ...; -- merge map entries
-- ── DELETE ──
-- Deletes write tombstones; data is physically removed later, at compaction.
DELETE FROM users WHERE user_id = 123e4567-e89b-...;
-- Delete specific row (with clustering key)
DELETE FROM orders_by_user
WHERE user_id = 123e4567-e89b-...
AND order_date = '2024-01-15'
AND order_id = 456e7890-...;
-- Delete a column
DELETE email FROM users WHERE user_id = ...;
-- NOTE(review): DELETE does not accept USING TTL. A delete writes a tombstone
-- that is purged after gc_grace_seconds; to auto-expire rows, set the TTL at
-- INSERT/UPDATE time instead (see the INSERT ... USING TTL example above).
DELETE FROM sessions WHERE session_id = ...;
| Rule | Detail |
|------|--------|
| Must have partition key | WHERE must include full partition key |
| No inequality on partition | Cannot use <, > on partition key |
| ALLOW FILTERING | Enables non-PK filters (slow!) |
| IN on partition key | Queries multiple nodes (use sparingly) |
| LIMIT required | For unbounded clustering key ranges |
| No ad-hoc joins | Denormalize instead of joining |
| No OR in WHERE | Use separate queries or IN clause |
| Operation | Syntax |
|-----------|--------|
| Append to list | tags = tags + ['new'] |
| Prepend to list | tags = ['first'] + tags |
| Remove from list | tags = tags - ['old'] |
| Replace at index | tags[2] = 'replaced' |
| Add to set | emails = emails + {'new@x.com'} |
| Remove from set | emails = emails - {'old@x.com'} |
| Put in map | attrs['key'] = 'value' |
| Remove from map | attrs = attrs - {'key'} |
🚫 ALLOW FILTERING is a code smell. Without a partition key in the WHERE clause it forces a full table scan across the cluster, which is slow. Instead, create a separate table with the filter column as the partition key. Always query by partition key.
-- ── Secondary Index (built-in) ──
-- Works best for low/medium-cardinality columns queried together with a partition key.
-- NOTE(review): email is HIGH-cardinality — this index is shown for syntax only;
-- in production prefer a dedicated users_by_email lookup table (see warning below).
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_orders_status ON orders_by_user (status);
-- Drop index
DROP INDEX IF EXISTS idx_users_email;
-- ── SASI (SSTable-Attached Secondary Index) ──
-- Supports LIKE / range / tokenized queries that built-in indexes cannot.
-- NOTE(review): SASI is experimental and disabled by default in Cassandra 4.x;
-- confirm it is enabled in cassandra.yaml before relying on it.
-- CONTAINS mode + StandardAnalyzer enables case-insensitive substring matching.
CREATE CUSTOM INDEX idx_users_name ON users (name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'case_sensitive': 'false'
};
-- SPARSE mode: for columns where each indexed value maps to few rows (e.g. numerics).
CREATE CUSTOM INDEX idx_orders_total ON orders_by_user (total)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
-- ── User-Defined Types ──
-- IF NOT EXISTS keeps the DDL idempotent, matching the keyspace statements above.
CREATE TYPE IF NOT EXISTS address (
    street TEXT,
    city TEXT,
    state TEXT,
    zip TEXT,
    country TEXT
);
CREATE TYPE IF NOT EXISTS phone_number (
    type TEXT, -- 'home', 'mobile', 'work'
    number TEXT
);
-- UDTs nested inside collections must be FROZEN (stored as a single immutable blob).
CREATE TABLE IF NOT EXISTS customers (
    customer_id UUID PRIMARY KEY,
    name TEXT,
    address FROZEN<address>,
    phones LIST<FROZEN<phone_number>>,
    tags SET<TEXT>
);
-- UDT literals use unquoted field names: {field: value, ...}.
INSERT INTO customers (customer_id, name, address, phones)
VALUES (
    uuid(), 'Alice',
    {street: '123 Main', city: 'NYC', state: 'NY', zip: '10001', country: 'US'},
    [{type: 'mobile', number: '555-1234'}, {type: 'work', number: '555-5678'}]
);
-- ── Counter Table ──
-- A counter table may contain ONLY counter columns besides its primary key.
CREATE TABLE IF NOT EXISTS page_views (
    page_id TEXT,
    view_count COUNTER,
    PRIMARY KEY (page_id)
);
-- Counters change only via UPDATE increments/decrements; a missing row is
-- implicitly treated as 0 on the first increment.
UPDATE page_views SET view_count = view_count + 1 WHERE page_id = '/home';
UPDATE page_views SET view_count = view_count + 1 WHERE page_id = '/home';
SELECT * FROM page_views WHERE page_id = '/home';
-- Note: counter columns cannot be INSERTed or set to an absolute value, and
-- counters do not support TTL or lightweight transactions (IF conditions).
⚠️ Secondary indexes in Cassandra are anti-patterns for high-cardinality columns. An index on email (high cardinality) scatters index entries so that queries must fan out to many nodes, hurting read performance. Instead, create a separate table with email as the partition key.
-- ── Logged Batch (all-or-nothing via the batchlog) ──
-- Guarantees every statement is EVENTUALLY applied, even across partitions,
-- but provides no isolation and no performance benefit. Classic use case:
-- keeping denormalized tables (orders_by_user / orders_by_status) in sync.
BEGIN LOGGED BATCH
    INSERT INTO orders_by_user (user_id, order_date, order_id, total, status)
    VALUES (uuid(), toTimestamp(now()), uuid(), 50.00, 'pending');
    INSERT INTO orders_by_status (status, order_date, order_id, user_id, total)
    VALUES ('pending', toTimestamp(now()), uuid(), uuid(), 50.00);
APPLY BATCH;
-- Counter updates cannot be mixed with regular writes in any batch; they need
-- their own COUNTER BATCH, and the target must be a counter table.
BEGIN COUNTER BATCH
    UPDATE user_order_counts SET order_count = order_count + 1
    WHERE user_id = 123e4567-e89b-...;
APPLY BATCH;
-- ── Unlogged Batch (faster, no atomicity) ──
-- Best for many writes to the SAME partition; multi-partition unlogged batches
-- just add coordinator load with no guarantee.
BEGIN UNLOGGED BATCH
    INSERT INTO metrics_day (day, metric, value) VALUES ('2024-01-15', 'views', 100);
    INSERT INTO metrics_day (day, metric, value) VALUES ('2024-01-15', 'clicks', 25);
    INSERT INTO metrics_day (day, metric, value) VALUES ('2024-01-15', 'conversions', 3);
APPLY BATCH;
-- ── Lightweight Transaction (Paxos / Compare-and-Set) ──
-- Only works on a single partition
INSERT INTO users (user_id, email, name)
VALUES (uuid(), 'new@example.com', 'NewUser')
IF NOT EXISTS;
UPDATE users SET active = true
WHERE user_id = 123e4567-...
IF active = false;
DELETE FROM users
WHERE user_id = 123e4567-...
IF email = 'alice@example.com';
-- ── Conditional Update with IF ──
-- Comparison operators (>=, <, !=, IN) are allowed in IF conditions.
UPDATE accounts SET balance = balance - 100
WHERE account_id = 456e7890-...
IF balance >= 100;
-- Check if applied
-- Result includes [applied] column (true/false)
| Type | Atomicity | Performance | Use Case |
|------|-----------|-------------|----------|
| LOGGED | All-or-nothing via batchlog (no isolation) | Slow | Multi-table consistency |
| UNLOGGED | No | Fast | Bulk inserts, same DC |
| LWT (IF) | Yes (Paxos) | Very Slow | Uniqueness, CAS |
| Do | Do Not |
|----|--------|
| Use for uniqueness checks | Use LWT for counters |
| Use for account balance updates | Use LWT on hot partitions |
| Keep LWT operations minimal | Use LWT across partitions |
| Expect ~4x latency overhead | Use LWT in batch loops |
🚫 Logged batches are NOT distributed transactions. The batchlog guarantees all statements are eventually applied (all-or-nothing, even across partitions), but there is no isolation and no rollback. Lightweight transactions (LWT) provide compare-and-set semantics within a SINGLE partition only, with significant performance overhead (~4-5x slower).
Q: What is the difference between a partition key and a clustering key?The partition key determines which node(s) store the data (hash-based distribution). The clustering key sorts data within a partition on disk. Primary key = partition key + optional clustering columns. Composite partition key: ((a, b), c) where (a,b) is partition and c is clustering.
Q: Why is denormalization normal in Cassandra?Because Cassandra does not support joins. Each query should be served by a single table. This means data is duplicated across tables, each optimized for a specific access pattern. Write overhead is acceptable because Cassandra is optimized for writes.
Q: How does consistency level work? Consistency level determines how many replica nodes must acknowledge a read or write. QUORUM = floor(RF/2) + 1. For RF=3, QUORUM=2. Write QUORUM + Read QUORUM guarantees strong consistency (R + W > RF). LOCAL_QUORUM provides DC-local consistency.
Q: What are tombstones and why are they a problem?A tombstone is a marker for deleted data. Cassandra does not immediately delete data; it writes a tombstone. During reads, tombstones must be merged. Too many tombstones slow reads. Solutions: use TTL, run repairs, keep gc_grace_seconds reasonable, avoid mass deletes.
Q: Explain the difference between compaction strategies.STCS merges similarly-sized SSTables (good for write-heavy). LCS creates sorted levels (good for read-heavy). TWCS groups data by time windows (best for time-series). DTCS is deprecated in favor of TWCS for time-series.
Q: When should you NOT use Cassandra?Avoid Cassandra when: you need ACID transactions across rows, complex joins/aggregations, strong consistency with low latency, or small datasets. Use PostgreSQL, MySQL, or MongoDB for these cases. Cassandra excels at high write throughput, time-series, and large-scale distributed data.
Q: What is hinted handoff?When a replica node is down, the coordinator stores a hint (the write) locally. When the node comes back up, the hint is replayed to that node. This improves write availability but does not guarantee consistency during the outage.
Q: How do you handle large partitions in Cassandra?Keep partitions under 100 MB. Solutions: (1) bucket by time (e.g., daily partitions), (2) increase clustering key cardinality, (3) use TWCS compaction, (4) monitor with nodetool cfstats. Large partitions cause slow reads, repair issues, and compaction pressure.
💡Top Cassandra interview topics: data modeling (query-driven), partition key vs clustering key, consistency levels and CAP theorem, compaction strategies, tombstones and gc_grace, lightweight transactions, batch types, and anti-patterns.