Key distinction: A finite language is always regular. An infinite language may or may not be regular. For example, L = {aⁿbⁿ | n ≥ 0} is infinite but NOT regular (needs counting, finite automata cannot count). Every finite automaton has finite memory (states), so it can only recognize languages with a repeating pattern.
Example: DFA for strings with even number of 0s over Σ = {0, 1}
States: Q = {q_even, q_odd} Start: q_even Final: F = {q_even}
┌────────────┬─────────┬─────────┐
│ δ          │    0    │    1    │
├────────────┼─────────┼─────────┤
│ →* q_even  │  q_odd  │ q_even  │
│    q_odd   │ q_even  │  q_odd  │
└────────────┴─────────┴─────────┘
(→ marks the start state, * marks a final state)
Trace "110": q_even--1-->q_even--1-->q_even--0-->q_odd ✗
Trace "1010": q_even--1-->q_even--0-->q_odd--1-->q_odd--0-->q_even ✓
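The even-0s DFA above can be replayed with a transition dictionary; a minimal sketch (function and key names are my own):

```python
def run_dfa(delta, start, finals, s):
    """Run a DFA given as a dict (state, symbol) -> state."""
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in finals

# DFA for "even number of 0s" over {0, 1}
DELTA = {
    ("q_even", "0"): "q_odd",  ("q_even", "1"): "q_even",
    ("q_odd",  "0"): "q_even", ("q_odd",  "1"): "q_odd",
}
```

`run_dfa(DELTA, "q_even", {"q_even"}, "1010")` reproduces the accepting trace above.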
━━━ NFA to DFA: Subset Construction Algorithm ━━━
Given NFA N = (Q, Σ, δ, q₀, F), construct DFA D = (Q', Σ, δ', q₀', F')
Step 1: q₀' = ε-closure(q₀) {start state of DFA}
Step 2: For each unmarked DFA state T ⊆ Q:
Step 2a: Mark T
Step 2b: For each symbol a ∈ Σ:
U = ε-closure( ∪ δ(t, a) for all t ∈ T )
If U ∉ Q', add U to Q'
Set δ'(T, a) = U
Step 3: F' = { T ∈ Q' | T ∩ F ≠ ∅ } {DFA final if any NFA final in T}
━━━ Worked Example ━━━
NFA: Σ = {a, b}, q₀ = {0}, F = {2}
Transitions:
δ(0,a) = {0,1} δ(0,b) = {0}
δ(1,b) = {2}
δ(2,a) = ∅ δ(2,b) = ∅
Step 1: A = ε-closure(0) = {0} [mark A]
From A={0}, a: ε-closure(δ(0,a)) = ε-closure({0,1}) = {0,1} = B [new]
From A={0}, b: ε-closure(δ(0,b)) = ε-closure({0}) = {0} = A [exists]
From B={0,1}, a: ε-closure(δ(0,a)∪δ(1,a)) = ε-closure({0,1}∪∅) = {0,1} = B [exists]
From B={0,1}, b: ε-closure(δ(0,b)∪δ(1,b)) = ε-closure({0}∪{2}) = {0,2} = C [new]
From C={0,2}, a: ε-closure(δ(0,a)∪δ(2,a)) = ε-closure({0,1}∪∅) = {0,1} = B [exists]
From C={0,2}, b: ε-closure(δ(0,b)∪δ(2,b)) = ε-closure({0}∪∅) = {0} = A [exists]
DFA Result (3 states):
┌──────────┬────┬────┬────────┐
│ State │ a │ b │ Final? │
├──────────┼────┼────┼────────┤
│ → A{0}    │ B  │ A  │ No     │
│ B{0,1} │ B │ C │ No │
│ * C{0,2} │ B │ A │ Yes* │ (contains NFA final state 2)
└──────────┴────┴────┴────────┘
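The worked example can be replayed in code; this sketch skips ε-closure because the example NFA has no ε-moves (function and variable names are mine):

```python
def subset_construction(nfa_delta, start, alphabet):
    """Subset construction for an NFA without ε-moves.
    nfa_delta: dict (state, symbol) -> set of states."""
    start_set = frozenset([start])
    dfa = {}                 # frozenset of NFA states -> {symbol: frozenset}
    work = [start_set]
    while work:
        T = work.pop()
        if T in dfa:
            continue         # already marked
        dfa[T] = {}
        for a in alphabet:
            U = frozenset(s for t in T for s in nfa_delta.get((t, a), ()))
            dfa[T][a] = U
            if U not in dfa:
                work.append(U)
    return dfa

# NFA from the worked example above
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
DFA = subset_construction(NFA, 0, "ab")
```

The resulting DFA states are exactly A = {0}, B = {0,1}, C = {0,2}, and the final states are those containing NFA state 2.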
━━━ DFA Minimization: Table-Filling Algorithm ━━━
Goal: Find minimum-state DFA equivalent to given DFA.
Step 1: Partition states into two groups:
- Group 1: Final states (F)
- Group 2: Non-final states (Q - F)
Step 2: For each group, check if all states transition to the SAME group
for every input symbol. If not, split the group.
Step 3: Repeat Step 2 until no more splits possible.
Step 4: Each group becomes one state in the minimized DFA.
Example: Minimize DFA with states {A, B, C, D, E, F}
Final states: {C, D, E} Non-final: {A, B, F}
Check non-final {A, B, F}:
A --a--> C (final), B --a--> A (non-final) → SPLIT!
New groups: {A, F}, {B}
Check {A, F}:
A --a--> C, F --a--> C → same group ✓
A --b--> D, F --b--> D → same group ✓ → {A, F} stays
Check final {C, D, E}:
C --a--> A, D --a--> B → different groups → SPLIT!
Continue until stable...
Result: Minimized to fewer states by merging equivalent states.
Tip: Two states are equivalent if for ALL input strings,
starting from either state leads to same acceptance result.
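The refinement loop can be sketched on a small hypothetical DFA (my own example: E1 and E2 are duplicate "even" states that should merge):

```python
def minimize_partition(states, alphabet, delta, finals):
    """Partition refinement: returns the blocks of equivalent states."""
    parts = [p for p in (set(finals), set(states) - set(finals)) if p]
    while True:
        def block_of(s):
            return next(i for i, blk in enumerate(parts) if s in blk)
        new_parts = []
        for blk in parts:
            buckets = {}
            # group states by which block each input symbol leads to
            for s in blk:
                sig = tuple(block_of(delta[(s, a)]) for a in alphabet)
                buckets.setdefault(sig, set()).add(s)
            new_parts.extend(buckets.values())
        if len(new_parts) == len(parts):
            return new_parts        # stable: no block was split
        parts = new_parts

# Toy DFA with a redundant state: E1 and E2 both mean "even number of 0s"
STATES = ["E1", "E2", "O"]
DELTA = {("E1", "0"): "O",  ("E1", "1"): "E2",
         ("E2", "0"): "O",  ("E2", "1"): "E1",
         ("O",  "0"): "E1", ("O",  "1"): "O"}
BLOCKS = minimize_partition(STATES, "01", DELTA, {"E1", "E2"})
```

The loop stops with two blocks: {E1, E2} merge into one state, {O} stays separate.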
DFA vs NFA trade-off: an NFA can be exponentially more compact (an NFA with n states may blow up to 2ⁿ states under subset construction). However, a DFA processes a string in O(|w|) steps, while simulating an NFA directly means tracking whole sets of states. For implementation, convert to a DFA and minimize it first.
ε-closure(q) = set of all states reachable from q using ε-transitions only.
🔤
Regular Expressions & Properties
REGULAR
┌───────────────┬────────┬──────────────────┬─────────────────────────┐
│ Operator      │ Symbol │ Meaning          │ Example                 │
├───────────────┼────────┼──────────────────┼─────────────────────────┤
│ Union (OR)    │ a + b  │ Either a or b    │ 0 + 1 = either 0 or 1   │
│ Concatenation │ ab     │ a followed by b  │ 01 = string "01"        │
│ Kleene Star   │ a*     │ Zero or more a's │ a* = ε, a, aa, aaa, ... │
│ Kleene Plus   │ a+     │ One or more a's  │ a+ = a, aa, aaa, ...    │
│ Zero or one   │ a?     │ Optional a       │ a? = ε or a             │
│ Parentheses   │ (r)    │ Grouping         │ (a+b)c = ac + bc        │
└───────────────┴────────┴──────────────────┴─────────────────────────┘
┌────────────────────────────────────────┬──────────────────────┐
│ Language                               │ Regular Expression   │
├────────────────────────────────────────┼──────────────────────┤
│ Strings ending with ab                 │ (a+b)*ab             │
│ Strings containing 00 as substring     │ (0+1)*00(0+1)*       │
│ Strings starting with a, ending with b │ a(a+b)*b             │
│ Strings of even length                 │ ((a+b)(a+b))*        │
│ Strings of odd length                  │ (a+b)((a+b)(a+b))*   │
│ All strings except empty string        │ (a+b)+               │
│ Strings with exactly one b             │ a*ba*                │
│ Strings with at least two 0s           │ (0+1)*0(0+1)*0(0+1)* │
│ Every 0 immediately followed by 1      │ (1+01)*              │
│ Even number of 0s                      │ 1*(01*01*)*          │
└────────────────────────────────────────┴──────────────────────┘
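Several of these rows can be machine-checked with Python's `re` module by brute-force comparison against a predicate (+ is written as |; the patterns below are ones I verified, so they may differ slightly from the table's exact form):

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` up to length max_len."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

# (pattern in re syntax, membership predicate, alphabet)
CHECKS = [
    (r"(a|b)*ab",             lambda s: s.endswith("ab"),      "ab"),
    (r"(0|1)*00(0|1)*",       lambda s: "00" in s,             "01"),
    (r"((a|b)(a|b))*",        lambda s: len(s) % 2 == 0,       "ab"),
    (r"a*ba*",                lambda s: s.count("b") == 1,     "ab"),
    (r"(0|1)*0(0|1)*0(0|1)*", lambda s: s.count("0") >= 2,     "01"),
    (r"1*(01*01*)*",          lambda s: s.count("0") % 2 == 0, "01"),
]

def verify(pattern, pred, alphabet, max_len=6):
    """True iff the pattern and predicate agree on every short string."""
    rx = re.compile(pattern)
    return all((rx.fullmatch(s) is not None) == pred(s)
               for s in all_strings(alphabet, max_len))
```

Exhaustive checking up to a small length is a handy sanity test, though of course it is not a proof of equivalence.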
━━━ Regular Expression Identity Rules ━━━
Basic:
∅ + r = r ∅ · r = ∅ r + ∅ = r
ε · r = r r · ε = r ε* = ε
∅* = ε r + r = r r⁰ = ε
Annihilator:
∅ · r = ∅ r · ∅ = ∅
Distributive:
r(s + t) = rs + rt (r + s)t = rt + st
r + st ≠ (r+s)(r+t) (+ does NOT distribute over concatenation)
Common pitfall (not a De Morgan law):
(r₁ + r₂)* ≠ r₁* · r₂* (these are NOT equal in general)
Kleene Laws:
(r*)* = r* (r+)* = r* r** = r*
rr* = r*r = r+ ε + rr* = r*
Idempotent:
r* + r* = r* r* · r* = r*
Precedence (highest to lowest): () > * > concatenation > +
Example: ab + cd* = (ab) + (c(d*))
━━━ Arden's Theorem (FA to RE Conversion) ━━━
For a state equation: R = Q + RP
The solution is: R = QP*
Steps to convert FA to RE:
1. Write equations for each state:
qᵢ = Σ { qⱼ·a : δ(qⱼ, a) = qᵢ, a ∈ Σ } + (ε if qᵢ = q₀, else ∅)
(one term per INCOMING edge of qᵢ)
2. Apply Arden's theorem iteratively to eliminate states
(eliminate non-final states first, keep final state last)
3. The final state equation gives the regular expression.
━━━ Worked Example ━━━
DFA for strings ending in "ab":
q0 --a--> q1, q0 --b--> q0 (q0 is start)
q1 --a--> q1, q1 --b--> q2 (q2 is final)
q2 --a--> q1, q2 --b--> q0
State equations:
q0 = ε + q0·b + q2·b
q1 = q0·a + q1·a + q2·a
q2 = q1·b
Substitute q2 into q0:
q0 = ε + q0·b + q1·b·b = ε + q0·b + q1·bb
By Arden: q0 = ε·b* + q1·bb·b* = b* + q1·bbb*
Substitute q0 and q2 = q1·b into q1:
q1 = (b* + q1·bbb*)·a + q1·a + (q1·b)·a
q1 = b*·a + q1·(bbb*a + a + ba)
By Arden: q1 = b*a·(bbb*a + a + ba)*
Since q2 = q1·b, the language is:
L = q2 = q1·b = b*a·(bbb*a + a + ba)*·b
Simplified: (a+b)*ab ✓
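The result can be cross-checked by brute force. The q1 self-loop paths in this DFA are a (stay at q1), ba (via q2), and b·b·b*·a (via q2 and q0), which gives the pattern below in Python `re` syntax (| for +); comparing it against "ends in ab" over all short strings is a quick sanity test:

```python
import re
from itertools import product

ARDEN = re.compile(r"b*a(bbb*a|a|ba)*b")   # b*a(bbb*a + a + ba)*b

def matches_language(max_len=8):
    """Compare the pattern against 'ends in ab' for every string
    over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for tup in product("ab", repeat=n):
            s = "".join(tup)
            if (ARDEN.fullmatch(s) is not None) != s.endswith("ab"):
                return False
    return True
```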
Closure Properties of Regular Languages
┌──────────────────────────┬─────────┐
│ Property                 │ Closed? │
├──────────────────────────┼─────────┤
│ Union (L₁ ∪ L₂)          │ Yes     │
│ Intersection (L₁ ∩ L₂)   │ Yes     │
│ Concatenation (L₁ · L₂)  │ Yes     │
│ Kleene Star (L*)         │ Yes     │
│ Complement (L̄)           │ Yes     │
│ Reversal (Lᴿ)            │ Yes     │
│ Difference (L₁ - L₂)     │ Yes     │
│ Homomorphism             │ Yes     │
│ Inverse Homomorphism     │ Yes     │
└──────────────────────────┴─────────┘
NOT Closed Under (Regular)
┌───────────────────────┬─────────────────────────────────────────────────────────┐
│ Property              │ Counterexample                                          │
├───────────────────────┼─────────────────────────────────────────────────────────┤
│ Infinite Union        │ each Lₙ = {aⁿbⁿ} is finite (hence regular), but          │
│                       │ ∪ₙ Lₙ = {aⁿbⁿ | n ≥ 0} is not regular                    │
│ Infinite Intersection │ a similar counterexample (complement the one above)     │
│ Subset                │ {aⁿbⁿ} ⊂ (a+b)*: a subset of a regular language         │
│                       │ need not be regular                                     │
└───────────────────────┴─────────────────────────────────────────────────────────┘
RE to FA conversion: Thompson's construction builds an NFA from any RE with O(|r|) states. The resulting NFA has exactly one start state, one final state, and at most 2 outgoing transitions per state.
FA to RE: use the state-elimination method or Arden's theorem. State elimination: add a new start and a new final state, then eliminate internal states one by one, updating edge labels as REs.
💧
Pumping Lemma & Myhill-Nerode
PUMPING LEMMA
Pumping Lemma Statement
If L is a regular language, then there exists an integer p ≥ 1 (the pumping length) such that any string w ∈ L with |w| ≥ p can be divided into three parts w = xyz satisfying:
┌──────────────┬────────────────────────────────────────────────────────────────────┐
│ Condition    │ Description                                                        │
├──────────────┼────────────────────────────────────────────────────────────────────┤
│ 1. |xy| ≤ p  │ The prefix xy has length at most p (y falls within first p symbols)│
│ 2. |y| > 0   │ y is non-empty (at least one symbol must be pumped)                │
│ 3. xyⁱz ∈ L  │ For ALL i ≥ 0, the pumped string is in L (i=0 removes y)           │
└──────────────┴────────────────────────────────────────────────────────────────────┘
Important: Pumping lemma is used to prove a language is NOT regular (proof by contradiction). It CANNOT prove a language IS regular.
━━━ Proof Template: Language L is NOT Regular ━━━
Step 1: Assume L is regular.
Step 2: Let p be the pumping length given by the pumping lemma.
Step 3: Choose a string w ∈ L with |w| ≥ p.
(Choose wisely! Often w = aᵖb or aᵖbᵖ or aᵖ...)
Step 4: Show that for ALL possible divisions w = xyz with |xy| ≤ p and |y| > 0,
there exists some i ≥ 0 such that xyⁱz ∉ L.
Step 5: This contradicts the pumping lemma. Therefore L is NOT regular. □
Key Strategy:
- Choose w so that y falls in a region that controls the "counting"
- Since |xy| ≤ p, y consists only of the first p symbols
- Pumping y breaks the balance/constraint in the language
━━━ Example 1: Prove L = {0ⁿ1ⁿ | n ≥ 1} is NOT regular ━━━
Assume L is regular with pumping length p.
Choose w = 0ᵖ1ᵖ (clearly |w| = 2p ≥ p, and w ∈ L)
Since |xy| ≤ p, y consists entirely of 0s (first p symbols are all 0s).
So y = 0ᵏ for some k ≥ 1 (since |y| > 0).
Pump down (i = 0): xz = 0ᵖ⁻ᵏ1ᵖ
Since k ≥ 1, we have (p-k) < p, so |0ᵖ⁻ᵏ| ≠ |1ᵖ| = p
Therefore xz = 0ᵖ⁻ᵏ1ᵖ has unequal 0s and 1s → xz ∉ L.
Contradiction! L is NOT regular. □
━━━ Example 2: Prove L = {wwᴿ | w ∈ {0,1}*} is NOT regular ━━━
Assume L is regular with pumping length p.
Choose s = 0ᵖ110ᵖ (this is wwᴿ with w = 0ᵖ1, so wᴿ = 10ᵖ)
|s| = 2p+2 ≥ p, and s ∈ L.
Since |xy| ≤ p, y falls within the leading block of 0s.
So y = 0ᵏ for some k ≥ 1.
Pump up (i = 2): xy²z = 0ᵖ⁺ᵏ110ᵖ
Every string in L is an even-length palindrome, but the pumped string
begins with p+k zeros and ends with only p zeros (k ≥ 1), so it is
not a palindrome → xy²z ∉ L.
Contradiction! L is NOT regular. □
━━━ Example 3: Prove L = {aⁿbⁿcⁿ | n ≥ 0} is NOT regular ━━━
Assume L is regular with pumping length p.
Choose w = aᵖbᵖcᵖ (|w| = 3p ≥ p, w ∈ L)
Since |xy| ≤ p, y consists entirely of a's.
So y = aᵏ for some k ≥ 1.
Pump up (i = 2): xy²z = aᵖ⁺ᵏbᵖcᵖ
Number of a's = p+k, b's = p, c's = p
Since k ≥ 1: p+k ≠ p → not all equal → xy²z ∉ L.
Contradiction! L is NOT regular. □
Myhill-Nerode Theorem
A language L is regular if and only if the Myhill-Nerode equivalence relation has a finite number of equivalence classes.
┌───────────────────────┬────────────────────────────────────────────────────────┐
│ Concept               │ Definition                                             │
├───────────────────────┼────────────────────────────────────────────────────────┤
│ Equivalence relation  │ x ≡_L y iff for all z ∈ Σ*: xz ∈ L ↔ yz ∈ L            │
│ Distinguishing string │ z distinguishes x, y if exactly one of xz, yz is in L  │
│ Equivalence class     │ [x] = {y | y ≡_L x} — all strings equivalent to x      │
│ Index of L            │ Number of distinct classes = minimum DFA state count   │
└───────────────────────┴────────────────────────────────────────────────────────┘
Usage: To prove L is NOT regular, find infinitely many pairwise distinguishable strings. Example: For L = {0ⁿ1ⁿ}, the strings 0, 00, 000, ... are pairwise distinguishable (0ⁱ · 1ⁱ ∈ L but 0ʲ · 1ⁱ ∉ L for i ≠ j). Since there are infinitely many equivalence classes, L is not regular.
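The distinguishability argument can be checked mechanically (a small brute-force sketch; names are mine):

```python
def in_L(s):
    """Membership test for L = {0^n 1^n | n >= 0}."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * n + "1" * n

def find_distinguisher(x, y, suffixes):
    """Return a suffix z with exactly one of xz, yz in L (or None)."""
    for z in suffixes:
        if in_L(x + z) != in_L(y + z):
            return z
    return None

N = 6
PREFIXES = ["0" * i for i in range(1, N + 1)]   # 0, 00, 000, ...
SUFFIXES = ["1" * i for i in range(0, N + 1)]   # candidate extensions
```

Every pair 0ⁱ, 0ʲ (i ≠ j) is separated by the extension 1ⁱ, so all the prefixes land in different equivalence classes.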
⚠️
Common mistake: The pumping lemma says "there exists a division." When proving non-regularity, you must show that ALL possible divisions (not just one) lead to a contradiction. Since |xy| ≤ p, y is forced to fall within the first p characters; analyze every case of what y could be. Often, all cases within the first p symbols lead to the same kind of contradiction.
🌳
Context-Free Grammar & Parsing
CFG
CFG Definition: G = (V, T, P, S) — variables (non-terminals), terminals, productions, and start symbol. Every production has a single variable on its left-hand side.
Types of Derivations
┌────────────┬────────────────────────────────────────────────────────────────┐
│ Type       │ Description                                                    │
├────────────┼────────────────────────────────────────────────────────────────┤
│ Leftmost   │ Always expand the leftmost variable first                      │
│ Rightmost  │ Always expand the rightmost variable first                     │
│ Parse Tree │ Tree representation of a derivation (shows structure)          │
│ Ambiguous  │ String has ≥ 2 different parse trees (or leftmost derivations) │
└────────────┴────────────────────────────────────────────────────────────────┘
━━━ CFG Examples ━━━
Example 1: L = {aⁿbⁿ | n ≥ 1}
S → aSb | ab
Derivation of "aabb" (n=2):
S → aSb → aaSbb → aabb ✓
Example 2: L = {aⁿbⁿ | n ≥ 0} (includes ε)
S → aSb | ε
S → aSb → aaSbb → aaaSbbb → aaabbb (n=3) ✓
S → ε (n=0) ✓
Example 3: Balanced Parentheses
S → (S) | SS | ε
Derivation of "(())":
S → (S) → (SS) → (()S) → (()) ✓
Example 4: L = {w wᴿ | w ∈ {a,b}*}
S → aSa | bSb | ε
S → aSa → abSba → abba ✓ (w = ab)
Example 5: L = {aⁿbᵐcⁿ⁺ᵐ | n,m ≥ 1}
S → aSc | aBc
B → bBc | bc
Derivation of "abbccc" (n=1, m=2):
S → aBc → a(bBc)c = abBcc → ab(bc)cc = abbccc ✓
(one a, two b's, three c's = 1+2 ✓)
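Small grammars like these can be checked mechanically by enumerating sentential forms (uppercase letters are variables, ε is the empty string; the enumerator is my own sketch):

```python
def generate(prods, start, max_len):
    """Enumerate all terminal strings of length <= max_len derivable
    from `start`. prods: dict variable -> list of RHS strings;
    variables are uppercase letters, ε is the empty string."""
    results, seen = set(), set()
    stack = [start]
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:                          # no variables: terminal string
            if len(form) <= max_len:
                results.add(form)
            continue
        if sum(not c.isupper() for c in form) > max_len:
            continue                           # too many terminals: prune
        for rhs in prods[form[i]]:
            stack.append(form[:i] + rhs + form[i + 1:])
    return results

ANBN = {"S": ["aSb", ""]}                      # Example 2: S → aSb | ε
PALIN = {"S": ["aSa", "bSb", ""]}              # Example 4: even palindromes
```

Enumerating up to a small length and comparing against the intended language catches most grammar bugs early.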
━━━ Ambiguity ━━━
A grammar is AMBIGUOUS if some string has two or more distinct parse trees
(or equivalently, two or more leftmost derivations).
Example: "Dangling Else" (if-else)
S → if E then S else S
S → if E then S
S → a
String: "if E then if E then a else a"
Parse tree 1: else binds to INNER if (standard in most languages)
Parse tree 2: else binds to OUTER if
Solution: Rewrite grammar to be unambiguous:
S → M | U
M → if E then M else M | a (matched if)
U → if E then S | if E then M else U (unmatched if)
Example: Expression Grammar (ambiguous)
E → E + E | E * E | (E) | id
"id + id * id" has two parse trees:
Tree 1: (id + id) * id Tree 2: id + (id * id)
Unambiguous version (enforces precedence):
E → E + T | T
T → T * F | F
F → (E) | id
━━━ FIRST and FOLLOW Sets ━━━
FIRST(α):
Set of terminals that can begin strings derived from α.
If α ⇒* ε, then ε ∈ FIRST(α).
Rules to compute FIRST:
1. If X is terminal: FIRST(X) = {X}
2. If X → ε is production: ε ∈ FIRST(X)
3. If X → Y₁Y₂...Yₖ:
Add FIRST(Y₁) - {ε} to FIRST(X)
If ε ∈ FIRST(Y₁), add FIRST(Y₂) - {ε} to FIRST(X)
Continue until Yᵢ doesn't derive ε
If ALL Yᵢ derive ε, add ε to FIRST(X)
━━━ Example: FIRST sets ━━━
Grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(F) = { (, id }
FIRST(T') = { *, ε }
FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(E) = FIRST(T) = { (, id }
FOLLOW(A):
Set of terminals that can appear immediately after A in some sentential form.
Rules to compute FOLLOW:
1. $ ∈ FOLLOW(S) ($ = end of input marker)
2. If A → αBβ: add FIRST(β) - {ε} to FOLLOW(B)
3. If A → αBβ and ε ∈ FIRST(β): add FOLLOW(A) to FOLLOW(B)
4. If A → αB: add FOLLOW(A) to FOLLOW(B)
━━━ Example: FOLLOW sets ━━━
FOLLOW(E) = { ), $ } (from F → (E), and rule 4)
FOLLOW(E') = { ), $ } (from E → TE', FOLLOW(E')=FOLLOW(E))
FOLLOW(T) = { +, ), $ } (from E' → +TE', FIRST(+), and FOLLOW(E'))
FOLLOW(T') = { +, ), $ } (from T → FT', same as FOLLOW(T))
FOLLOW(F) = { *, +, ), $ } (from T' → *FT', and FOLLOW(T))
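The FIRST/FOLLOW rules can be implemented as a fixed-point iteration; the assertions reproduce the worked sets above (the representation — lists of symbol lists, `[]` for the ε-production — is my own choice):

```python
EPS = "ε"
GRAMMAR = {                        # the expression grammar above
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],  # [] is the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_seq(first, seq):
    """FIRST of a sequence of grammar symbols."""
    out = set()
    for X in seq:
        fx = first[X] if X in first else {X}   # terminal: FIRST(X) = {X}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)                               # every symbol was nullable
    return out

def first_sets(g):
    first = {A: set() for A in g}
    changed = True
    while changed:
        changed = False
        for A, prods in g.items():
            for rhs in prods:
                new = first_of_seq(first, rhs) - first[A]
                if new:
                    first[A] |= new
                    changed = True
    return first

def follow_sets(g, first, start):
    follow = {A: set() for A in g}
    follow[start].add("$")                     # rule 1
    changed = True
    while changed:
        changed = False
        for A, prods in g.items():
            for rhs in prods:
                for i, X in enumerate(rhs):
                    if X not in g:
                        continue               # FOLLOW only for variables
                    fb = first_of_seq(first, rhs[i + 1:])
                    new = (fb - {EPS}) | (follow[A] if EPS in fb else set())
                    if not new <= follow[X]:
                        follow[X] |= new
                        changed = True
    return follow

FIRST = first_sets(GRAMMAR)
FOLLOW = follow_sets(GRAMMAR, FIRST, "E")
```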
━━━ LL(1) Parsing Table Construction ━━━
For each production A → α in grammar:
For each terminal a ∈ FIRST(α):
Add A → α to M[A, a]
If ε ∈ FIRST(α):
For each terminal b ∈ FOLLOW(A):
Add A → α to M[A, b]
━━━ LL(1) Parsing Table for Expression Grammar ━━━
│ id │ + │ * │ ( │ ) │ $
──────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────
E │ → TE' │ │ │ → TE' │ │
E' │ │ → +TE' │ │ │ → ε │ → ε
T │ → FT' │ │ │ → FT' │ │
T' │ │ → ε │ → *FT' │ │ → ε │ → ε
F │ → id │ │ │ → (E) │ │
LL(1) Condition: Grammar is LL(1) iff for every non-terminal A
with productions A → α₁ | α₂ | ... | αₙ:
1. FIRST(αᵢ) ∩ FIRST(αⱼ) = ∅ for all i ≠ j
2. If ε ∈ FIRST(αᵢ), then FIRST(αᵢ) ∩ FOLLOW(A) = ∅
This grammar satisfies both conditions → LL(1) ✓
━━━ LL(1) Parse Trace: "id + id * id" ━━━
Stack | Input | Action
───────────────┼─────────────────┼──────────────────
$ E | id + id * id $ | E → TE'
$ E' T | id + id * id $ | T → FT'
$ E' T' F | id + id * id $ | F → id
$ E' T' id | id + id * id $ | match id
$ E' T' | + id * id $ | T' → ε
$ E' | + id * id $ | E' → +TE'
$ E' T + | + id * id $ | match +
$ E' T | id * id $ | T → FT'
$ E' T' F | id * id $ | F → id
$ E' T' id | id * id $ | match id
$ E' T' | * id $ | T' → *FT'
$ E' T' F * | * id $ | match *
$ E' T' F | id $ | F → id
$ E' T' id | id $ | match id
$ E' T' | $ | T' → ε
$ E' | $ | E' → ε
$ | $ | ACCEPTED ✓
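The trace above can be reproduced with a table-driven parser; the `TABLE` dictionary transcribes the LL(1) parsing table from this section:

```python
# M[(nonterminal, lookahead)] = production right-hand side
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Table-driven predictive parse; returns True iff accepted."""
    stack = ["$", "E"]             # top of stack = end of the list
    toks = list(tokens) + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        a = toks[i]
        if top == "$":
            return a == "$"        # accept only if input is exhausted
        if top in NONTERMINALS:
            rhs = TABLE.get((top, a))
            if rhs is None:
                return False       # empty table cell: parse error
            stack.extend(reversed(rhs))
        elif top == a:
            i += 1                 # match terminal
        else:
            return False
    return False
```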
💡
LL(1) vs LR parsers: LL(1) = Left-to-right scan, Leftmost derivation, 1 lookahead. Top-down parser (predictive parsing, recursive descent). LR(1) = Left-to-right, Rightmost derivation, 1 lookahead. Bottom-up parser (shift-reduce). LR parsers handle a larger class of grammars than LL parsers. If a grammar is not LL(1), it might still be LR(1). Most parser generators (Yacc, Bison) use LALR(1) or CLR(1).
📦
Pushdown Automata (PDA)
PDA
PDA Formal Definition: P = (Q, Σ, Γ, δ, q₀, Z₀, F) — states, input alphabet, stack alphabet, transition function, start state, initial stack symbol, and final states.
Acceptance Modes
┌─────────────┬──────────────────────────────────────────────┬───────────────────────────────┐
│ Mode        │ Condition                                    │ Equivalence                   │
├─────────────┼──────────────────────────────────────────────┼───────────────────────────────┤
│ Final State │ String accepted if PDA ends in a state q ∈ F │ DPDA: modes NOT equivalent    │
│ Empty Stack │ String accepted if stack is empty at end     │ NPDA: both modes accept CFLs  │
└─────────────┴──────────────────────────────────────────────┴───────────────────────────────┘
┌───────────────┬─────────────────────────────────────────┬─────────────────────────────────────────┐
│ Feature       │ DPDA                                    │ NPDA                                    │
├───────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┤
│ Determinism   │ At most 1 move per configuration        │ Multiple moves possible                 │
│ Transition    │ δ: Q × (Σ∪{ε}) × Γ → Q × Γ*             │ δ: Q × (Σ∪{ε}) × Γ → finite subsets     │
│               │                                         │ of Q × Γ*                               │
│ Power         │ Strictly weaker than NPDA               │ Equivalent to CFG (accepts all CFLs)    │
│ Languages     │ Deterministic CFLs (proper subset)      │ All context-free languages              │
│ ε-transitions │ Allowed (but no choice with input)      │ Allowed                                 │
│ Example gap   │ L = {wwᴿ | w ∈ {a,b}*} accepted by      │ —                                       │
│               │ NPDA but NOT by any DPDA                │                                         │
└───────────────┴─────────────────────────────────────────┴─────────────────────────────────────────┘
━━━ PDA Example 1: L = {aⁿbⁿ | n ≥ 1} ━━━
PDA by Final State: P = ({q₀, q₁, q₂}, {a,b}, {Z,A}, δ, q₀, Z, {q₂})
Transitions (state, input, pop → push, next_state):
δ(q₀, a, Z) = (q₀, AZ) Push A for each a (on top of Z)
δ(q₀, a, A) = (q₀, AA) Keep pushing A for a's
δ(q₀, b, A) = (q₁, ε) Pop A for each b, move to q₁
δ(q₁, b, A) = (q₁, ε) Keep popping A for b's
δ(q₁, ε, Z) = (q₂, Z) Stack has only Z → accept (go to final state)
Trace: "aabb"
Config: (q₀, aabb, Z)
→ (q₀, abb, AZ) [read a, push A]
→ (q₀, bb, AAZ) [read a, push A]
→ (q₁, b, AZ) [read b, pop A]
→ (q₁, ε, Z) [read b, pop A]
→ (q₂, ε, Z) [ε-move, go to final state]
ACCEPTED ✓ (q₂ ∈ F)
Trace: "aab"
(q₀, aab, Z) → (q₀, ab, AZ) → (q₀, b, AAZ) → (q₁, ε, AZ)
→ stuck! Cannot read ε with A on stack. REJECTED ✓
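Example 1 is deterministic and can be simulated directly (a sketch; the ε-move to the final state q₂ is folded into the end-of-input check):

```python
def pda_anbn(s):
    """Deterministic simulation of the PDA above (accept by final state)."""
    state, stack = "q0", ["Z"]
    for ch in s:
        top = stack[-1]
        if state == "q0" and ch == "a":
            stack.append("A")                  # δ(q0, a, Z/A): push A
        elif state == "q0" and ch == "b" and top == "A":
            stack.pop(); state = "q1"          # δ(q0, b, A): pop, go to q1
        elif state == "q1" and ch == "b" and top == "A":
            stack.pop()                        # δ(q1, b, A): keep popping
        else:
            return False                       # no applicable transition
    # ε-move δ(q1, ε, Z) to final state q2 fires only with bare Z left
    return state == "q1" and stack == ["Z"]
```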
━━━ PDA Example 2: L = {wwᴿ | w ∈ {a,b}*} (Empty Stack Acceptance) ━━━
PDA: P = ({q₁, q₂}, {a,b}, {Z,A,B}, δ, q₁, Z, ∅)
Phase 1 (q₁): Push entire string onto stack
δ(q₁, a, Z) = (q₁, AZ) δ(q₁, a, A) = (q₁, AA)
δ(q₁, a, B) = (q₁, AB) δ(q₁, b, Z) = (q₁, BZ)
δ(q₁, b, A) = (q₁, BA) δ(q₁, b, B) = (q₁, BB)
δ(q₁, ε, Z) = (q₂, Z) δ(q₁, ε, A) = (q₂, A) δ(q₁, ε, B) = (q₂, B)
[nondeterministic: guess the middle with ANY top symbol, switch to q₂]
Phase 2 (q₂): Match and pop (compare to reverse)
δ(q₂, a, A) = (q₂, ε) [match a, pop A]
δ(q₂, b, B) = (q₂, ε) [match b, pop B]
δ(q₂, ε, Z) = (q₂, ε) [stack empty → accept by empty stack]
Trace: "abba"
q₁: push first half "ab" → stack (top to bottom): B, A, Z
q₁ → q₂ (ε-transition: guess that the middle has been reached)
q₂: read b, pop B → ok read a, pop A → ok
q₂: ε, Z → pop Z, stack empty → ACCEPTED ✓
Note: This PDA is NON-DETERMINISTIC (NPDA) — must guess the midpoint.
This language is NOT accepted by any DPDA!
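The nondeterminism can be simulated by exploring every reachable configuration; in this sketch the guess-the-middle ε-move to q₂ is allowed with any symbol on top of the stack:

```python
def npda_wwr(s):
    """Nondeterministic PDA for {w·wᴿ | w in {a,b}*}, simulated by
    exploring all configurations (state, input position, stack).
    Accept by empty stack with input consumed."""
    frontier = [("q1", 0, ("Z",))]
    seen = set()
    while frontier:
        cfg = frontier.pop()
        if cfg in seen:
            continue
        seen.add(cfg)
        st, i, stk = cfg
        if i == len(s) and stk == ():
            return True                        # input consumed, stack empty
        top = stk[-1] if stk else None
        if st == "q1":                         # push phase
            frontier.append(("q2", i, stk))    # ε: guess we are at the middle
            if i < len(s):
                sym = "A" if s[i] == "a" else "B"
                frontier.append(("q1", i + 1, stk + (sym,)))
        else:                                  # st == "q2": match phase
            if top == "Z":
                frontier.append(("q2", i, stk[:-1]))  # ε: pop bottom marker
            elif i < len(s) and (s[i], top) in {("a", "A"), ("b", "B")}:
                frontier.append(("q2", i + 1, stk[:-1]))
    return False
```

One accepting branch suffices, mirroring NPDA acceptance; every wrong middle guess simply dies out.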
━━━ CFG to PDA Conversion (General Algorithm) ━━━
Given CFG G = (V, T, P, S), construct PDA that accepts by empty stack:
PDA: P = ({q}, T, V ∪ T, δ, q, S, ∅) [single state; initial stack symbol is S]
Transitions:
1. For each A → α in P:
δ(q, ε, A) = (q, α) [replace top variable with production RHS]
2. For each terminal a ∈ T:
δ(q, a, a) = (q, ε) [match terminal against input, pop it]
The PDA simulates a leftmost derivation:
- The stack initially holds just the start symbol S
- If the top of stack is a variable, replace it (nondeterministically choose a production)
- If the top of stack is a terminal, it must match the current input symbol (pop both)
- The string is accepted when the input is consumed and the stack is empty
━━━ Example: CFG S → aSb | ε → PDA ━━━
δ(q, ε, S) = (q, aSb) [from S → aSb]
δ(q, ε, S) = (q, ε) [from S → ε]
δ(q, a, a) = (q, ε) [match terminal a]
δ(q, b, b) = (q, ε) [match terminal b]
━━━ PDA to CFG Conversion ━━━
Given PDA P, construct equivalent CFG G:
Standard construction (triple variables):
Variables: [p, X, q] for states p, q ∈ Q and stack symbol X ∈ Γ, meaning
"starting in state p with X on top, the PDA can consume some input and
end in state q with X popped."
Start symbol: S, with productions S → [q₀, Z₀, p] for every p ∈ Q.
Productions:
- If δ(p, a, X) contains (r, ε): add [p, X, r] → a (a may be ε)
- If δ(p, a, X) contains (r, Y₁Y₂...Yₖ): add
[p, X, q] → a [r, Y₁, s₁][s₁, Y₂, s₂]...[sₖ₋₁, Yₖ, q]
for every choice of intermediate states s₁, ..., sₖ₋₁ and q in Q
The resulting grammar is generally NOT in any normal form, but it can be
converted to CNF afterwards if needed.
━━━ Key Insight ━━━
PDA ↔ CFG are equivalent: every CFL has a PDA and vice versa.
The conversion preserves the language exactly.
Key insight: A PDA is essentially an NFA + stack. The stack gives it unbounded memory (unlike finite automata), but it can only access the top element (LIFO). This is exactly enough power to recognize context-free languages — matching nested/recursive structures like parentheses, aⁿbⁿ, etc.
DPDA ≠ NPDA in power: deterministic CFLs are a proper subset of all CFLs.
🤖
Turing Machines & Decidability
TM
Turing Machine Definition: M = (Q, Σ, Γ, δ, q₀, B, F) — states, input alphabet, tape alphabet (Σ ⊂ Γ), transition function δ: Q × Γ → Q × Γ × {L, R}, start state, blank symbol B, and final states.
TM Capabilities
┌─────────────────┬───────────────────────────────────────────────────────┐
│ Feature         │ Description                                           │
├─────────────────┼───────────────────────────────────────────────────────┤
│ Infinite tape   │ Unbounded in both directions (or one-way)             │
│ Read/Write head │ Can read and write symbols, move L or R               │
│ No restrictions │ Can move left, overwrite, revisit cells               │
│ Power           │ Most powerful model — equivalent to any real computer │
│ Halting         │ May loop forever (undecidable to predict!)            │
└─────────────────┴───────────────────────────────────────────────────────┘
━━━ TM Example 1: String Reversal ━━━
Input: w on tape, |w| = n, tape = w B B B ...
Output: wᴿ on tape (reversed string)
Algorithm:
1. Move right to find end of string (first B)
2. Move left one position (last char of w)
3. Remember this char, replace with marker (e.g., X)
4. Move left to start, write remembered char, move right past it
5. Repeat until all chars processed
(A full transition table for reversal is lengthy; it uses the same state
pattern as the palindrome checker in Example 2 below — remember one
symbol in the state, mark processed cells with X, and shuttle between
the two ends of the tape — except that it writes the remembered symbol
at the opposite end instead of comparing against it.)
━━━ TM Example 2: Palindrome Checker {w = wᴿ} ━━━
Input: w (string over {a, b}), Output: accept if palindrome, reject if not
Algorithm:
1. Read first char (left end), remember it, replace with X
2. Move right to find last char (first B, then move left)
3. Compare: if last char matches remembered first char, replace with X
4. Move back to first X, then right to next unprocessed char
5. Repeat until all chars processed (all X's) → accept
6. If mismatch → reject immediately
Transition table:
┌──────────┬──────┬───────┬──────┬────────────┐
│ State    │ Read │ Write │ Move │ Next State │
├──────────┼──────┼───────┼──────┼────────────┤
│ q0(start)│ a    │ X     │ R    │ qa         │ [remember a]
│ q0       │ b    │ X     │ R    │ qb         │ [remember b]
│ q0       │ X    │ X     │ R    │ q_check    │ [all X's?]
│ q0       │ B    │ B     │ —    │ q_accept   │ [empty/done]
│ qa       │ a    │ a     │ R    │ qa         │ [move right]
│ qa       │ b    │ b     │ R    │ qa         │ [move right]
│ qa       │ X    │ X     │ L    │ qa_check   │ [hit marked block, turn back]
│ qa       │ B    │ B     │ L    │ qa_check   │ [at end, go back]
│ qa_check │ a    │ X     │ L    │ q_return   │ [match a!]
│ qa_check │ b    │ b     │ —    │ q_reject   │ [mismatch!]
│ qa_check │ X    │ X     │ L    │ q_return   │ [center: matches itself]
│ qb       │ a    │ a     │ R    │ qb         │ [move right]
│ qb       │ b    │ b     │ R    │ qb         │ [move right]
│ qb       │ X    │ X     │ L    │ qb_check   │ [hit marked block, turn back]
│ qb       │ B    │ B     │ L    │ qb_check   │ [at end, go back]
│ qb_check │ b    │ X     │ L    │ q_return   │ [match b!]
│ qb_check │ a    │ a     │ —    │ q_reject   │ [mismatch!]
│ qb_check │ X    │ X     │ L    │ q_return   │ [center: matches itself]
│ q_return │ a/b  │ same  │ L    │ q_return   │ [walk left over unmarked]
│ q_return │ X    │ X     │ R    │ q0         │ [left X block: next round]
│ q_return │ B    │ B     │ R    │ q_accept   │ [all matched]
│ q_check  │ X    │ X     │ R    │ q_check    │
│ q_check  │ B    │ B     │ —    │ q_accept   │ [palindrome!]
│ q_accept │ -    │ -     │ -    │ HALT       │
│ q_reject │ -    │ -     │ -    │ HALT       │
└──────────┴──────┴───────┴──────┴────────────┘
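The marker-based palindrome machine can be run on a generic single-tape simulator. The encoding below follows the scheme above (remember a symbol, mark with X, turn back at the right-hand marked block); the dict representation and helper names are my own:

```python
def palindrome_tm():
    """Transitions for a marker-based palindrome checker over {a, b}.
    Encoding: (state, read) -> (write, move, next); B = blank, X = mark."""
    R, L, S = 1, -1, 0
    d = {("q0", "a"): ("X", R, "qa"),        # remember 'a', mark it
         ("q0", "b"): ("X", R, "qb"),        # remember 'b', mark it
         ("q0", "X"): ("X", R, "q_check"),   # nothing unmarked remains
         ("q0", "B"): ("B", S, "q_accept")}  # empty input
    for st in ("qa", "qb"):                  # run right over unmarked chars
        d[(st, "a")] = ("a", R, st)
        d[(st, "b")] = ("b", R, st)
        d[(st, "X")] = ("X", L, st + "_check")  # hit right marker block
        d[(st, "B")] = ("B", L, st + "_check")  # hit end of tape
    d[("qa_check", "a")] = ("X", L, "q_return") # rightmost char matches
    d[("qa_check", "b")] = ("b", S, "q_reject") # mismatch
    d[("qa_check", "X")] = ("X", L, "q_return") # center char matches itself
    d[("qb_check", "b")] = ("X", L, "q_return")
    d[("qb_check", "a")] = ("a", S, "q_reject")
    d[("qb_check", "X")] = ("X", L, "q_return")
    d[("q_return", "a")] = ("a", L, "q_return") # walk back left
    d[("q_return", "b")] = ("b", L, "q_return")
    d[("q_return", "X")] = ("X", R, "q0")       # left marker: next round
    d[("q_return", "B")] = ("B", R, "q_accept") # nothing left of the mark
    d[("q_check", "X")] = ("X", R, "q_check")
    d[("q_check", "B")] = ("B", S, "q_accept")
    return d

def run_tm(delta, w, max_steps=100_000):
    """Simulate a single-tape TM; True iff it halts in q_accept."""
    tape, head, state = dict(enumerate(w)), 0, "q0"
    for _ in range(max_steps):
        if state in ("q_accept", "q_reject"):
            return state == "q_accept"
        write, move, state = delta[(state, tape.get(head, "B"))]
        tape[head] = write
        head += move
    raise RuntimeError("step limit exceeded")

TM = palindrome_tm()
```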
━━━ Turing Machine Variants ━━━
1. Multi-Tape TM:
- k tapes, each with its own read/write head
- Can simulate with single-tape TM (proves equivalent power)
- The single-tape simulation costs at most a quadratic slowdown (O(t²) for t steps)
2. Non-Deterministic TM (NTM):
- Multiple possible transitions per configuration
- Can simulate with deterministic TM (exponential slowdown)
- NTM = DTM in terms of language recognition power
3. Semi-Infinite Tape TM:
- Tape is infinite only to the right (not left)
- Equivalent to standard TM
4. Enumerators:
- TM that prints (enumerates) strings of a language
- Prints exactly the strings of a recursively enumerable language
5. Multi-Stack Machines:
- 2 stacks = TM (one stack simulates tape going left, other going right)
- 1 stack = PDA (weaker than TM)
All variants recognize the SAME class of languages (REC and RE)!
━━━ Decidability & Undecidability ━━━
Language Classification:
┌────────────────────────────────────────────────────┐
│ ALL Languages │
│ ┌──────────────────────────────────────────────┐ │
│ │ Recursively Enumerable (RE / R.E.) │ │
│ │ TM-recognizable (may loop forever) │ │
│ │ ┌────────────────────────────────────────┐ │ │
│ │ │ Recursive (R / Decidable) │ │ │
│ │ │ TM always halts (accept or reject) │ │ │
│ │ │ ┌──────────────────────────────────┐ │ │ │
│ │ │ │ Context-Free Languages │ │ │ │
│ │ │ │ ┌────────────────────────────┐ │ │ │ │
│ │ │ │ │ Regular Languages │ │ │ │ │
│ │ │ │ └────────────────────────────┘ │ │ │ │
│ │ │ └──────────────────────────────────┘ │ │ │
│ │ └────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────┘
Church-Turing Thesis:
"Any algorithm can be implemented by a Turing Machine."
This is a THESIS (not a theorem) — cannot be proven, but widely accepted.
Equivalent models: TM, λ-calculus, μ-recursive functions, Post systems,
RAM model, modern programming languages (with infinite memory).
━━━ The Halting Problem ━━━
H = { <M, w> | TM M halts on input w }
Theorem: H is UNDECIDABLE (but it IS recursively enumerable)
Proof by contradiction (diagonalization):
Assume H is decidable → there exists TM H that decides H.
Build a TM D (the "diagonalizer"):
D(<M>):
Run H(<M, <M>) [does M halt on its own description?]
If H says "halts": run M(<M>) forever (loop)
If H says "loops": halt and reject
Now ask: What does D(<D>) do?
If D halts on <D>:
H(<D, <D>) says "halts" → D loops. CONTRADICTION!
If D loops on <D>:
H(<D, <D>) says "loops" → D halts. CONTRADICTION!
Therefore H is UNDECIDABLE. □
━━━ Key Undecidable Problems ━━━
• Halting problem (does M halt on w?)
• Does TM M accept the empty string?
• Is L(M) empty? finite? regular? context-free?
• Does L(M₁) = L(M₂)?
• Is L(M) = Σ*? (completeness problem)
• Post Correspondence Problem (PCP)
• Hilbert's 10th problem (Diophantine equations)
• Does a given CFG generate all strings? (L(G) = Σ*)
━━━ Common DECIDABLE Problems ━━━
Regular Languages:
✓ DFA acceptance (does DFA accept w?) — just simulate DFA
✓ DFA emptiness (is L(DFA) = ∅?) — BFS from start, reach final?
✓ DFA equivalence (L(DFA₁) = L(DFA₂)?) — construct symmetric difference
✓ DFA minimization — Hopcroft's algorithm
✓ Regular expression equivalence
Context-Free Languages:
✓ CFG emptiness — check whether the start symbol is generating (can derive some terminal string)
✓ CFG finiteness — check for cycles in derivation graph
✓ CFL membership — CYK algorithm or Earley parser (O(n³))
✗ Ambiguity — UNDECIDABLE in general (decidable only for subclasses)
Recursive (Decidable) Languages:
✓ Every regular language is decidable
✓ Every context-free language is decidable
✓ L(DFA) = Σ*?
✓ L(DFA₁) ∩ L(DFA₂) = ∅?
✗ Is L(CFG) regular? — UNDECIDABLE (unlike the DFA questions above)
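The CYK membership test mentioned above can be sketched for a grammar in Chomsky Normal Form; the CNF grammar for {aⁿbⁿ | n ≥ 1} below uses variable names of my own choosing:

```python
def cyk(grammar, start, w):
    """CYK membership test, O(n^3). grammar: dict variable -> list of
    right-hand sides, each a terminal string or a (B, C) variable pair."""
    n = len(w)
    if n == 0:
        return False                      # ε needs special handling in CNF
    T = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):            # length-1 spans
        for A, rhss in grammar.items():
            if ch in rhss:
                T[i][i].add(A)
    for span in range(2, n + 1):          # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):         # split point
                for A, rhss in grammar.items():
                    for rhs in rhss:
                        if (isinstance(rhs, tuple)
                                and rhs[0] in T[i][k]
                                and rhs[1] in T[k + 1][j]):
                            T[i][j].add(A)
    return start in T[0][n - 1]

# CNF grammar for {a^n b^n | n >= 1}: S → AB | AC, C → SB, A → a, B → b
CNF = {"S": [("A", "B"), ("A", "C")],
       "C": [("S", "B")],
       "A": ["a"], "B": ["b"]}
```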
Key theorem for proving undecidability: use reduction. To show problem A is undecidable, reduce a known undecidable problem B to A. If you could decide A, then you could decide B (contradiction). Common base case: the Halting Problem H. Reducing H to other problems proves they're undecidable.
Rice's Theorem: any non-trivial property of the language recognized by a TM is undecidable.
🎓
Interview Q&A
PLACEMENT QUESTIONS
Q1: Explain the Chomsky Hierarchy and give examples for each type.
The Chomsky Hierarchy classifies formal grammars into 4 types based on production-rule restrictions. Type 3 (Regular): productions are A → aB or A → a (right-linear). Recognized by FA. Example: L = {aⁿbᵐ} = a*b*. Type 2 (Context-Free): productions are A → γ, where the left side is a single non-terminal. Recognized by PDA. Example: L = {aⁿbⁿ} with grammar S → aSb | ε. Type 1 (Context-Sensitive): productions satisfy |α| ≤ |β| (non-contracting). Recognized by a Linear Bounded Automaton (LBA). Example: L = {aⁿbⁿcⁿ}. Type 0 (Recursively Enumerable): no restrictions. Recognized by TM. The hierarchy is strict: Regular ⊂ CFL ⊂ CSL ⊂ RE.
Q2: Design a DFA that accepts all strings over {0,1} containing the substring "010".
We need 4 states to track the longest suffix seen so far that matches a prefix of "010". States: q₀ (no progress), q₁ (last char was '0'), q₂ (last 2 chars were '01'), q₃ (matched "010" — accepting trap state).
Start: q₀, Final: {q₃}. State q₃ is a trap — once we see "010", we stay in q₃ regardless of further input. Trace for "1010": q₀→q₀→q₁→q₂→q₃ ✓. For "0110": q₀→q₁→q₂→q₀→q₁ ✗.
Q3: How do you prove a language is NOT regular using the Pumping Lemma?
Step 1: Assume L is regular. Let p be the pumping length. Step 2: Choose a "smart" string w ∈ L with |w| ≥ p. The key is choosing w such that y (the pumpable part, which falls in the first p chars) controls the critical property. Step 3: Show that for ALL valid divisions w = xyz (|xy| ≤ p, |y| > 0), pumping breaks the property. Example: For L = {aⁿbⁿ}, choose w = aᵖbᵖ. Since |xy| ≤ p, y = aᵏ (k ≥ 1). Pumping down (i=0) gives aᵖ⁻ᵏbᵖ, which has unequal counts → not in L. Step 4: Contradiction → L is not regular. Also mention Myhill-Nerode as an alternative: find infinitely many pairwise distinguishable strings.
Q4: What is the difference between LL(1) and LR parsers?
LL(1): Scans Left-to-right, uses Leftmost derivation, 1 lookahead symbol. It's a top-down parser (predictive parsing, recursive descent). Constructs parse tree from root to leaves. Requires the grammar to be free of left recursion and common prefixes (must compute FIRST/FOLLOW sets). Smaller class of grammars.
LR(1): Scans Left-to-right, uses Rightmost derivation (in reverse), 1 lookahead. It's a bottom-up parser (shift-reduce). Constructs the parse tree from leaves to root. Handles a much larger class of grammars. Variants include SLR(1), LALR(1) (used by Yacc/Bison), and CLR(1) (most powerful).
Key takeaway: All LL(1) grammars are LR(1), but not vice versa. LR parsers are more powerful but harder to implement manually. Most practical compilers use LALR(1) or a variant.
Q5: Explain the Halting Problem and give other examples of undecidable problems.
The Halting Problem: Given a TM M and input w, does M halt on w? This is undecidable (proven by diagonalization — assume a decider H exists, build a TM D that does the opposite of H on its own description, contradiction). It IS semi-decidable (recursively enumerable) — just simulate M on w and accept if it halts; the simulation loops if M doesn't.
Other undecidable problems: (1) Does TM M accept the empty string? (2) Is L(M) empty/finite/regular/CFL? (3) Do two TMs accept the same language? (4) Post Correspondence Problem (PCP). (5) Rice's Theorem: every non-trivial semantic property of TM-recognized languages is undecidable.
Technique to prove undecidability: Reduction — reduce the known undecidable halting problem to your target problem. If you could decide the target, you could decide halting (contradiction).
TOC Interview Strategy: (1) Know the Chomsky hierarchy inside out — be able to place any language and identify the right automaton. (2) Practice DFA/NFA design — draw state diagrams for common patterns (substring search, divisibility by n, mod counting). (3) Memorize the pumping lemma proof structure — it's formulaic once practiced. (4) Understand FIRST/FOLLOW computation and LL(1) table construction step-by-step. (5) Know at least 3 undecidable problems and how to prove undecidability via reduction.