Compiler phases — lexical analysis, parsing, semantic checks, optimization, and code generation — with interview notes.
| Phase | Input | Output | Description |
|---|---|---|---|
| Lexical Analysis | Source code (chars) | Token stream | Groups characters into meaningful tokens (keywords, identifiers, literals, operators) |
| Syntax Analysis | Token stream | Parse tree / AST | Checks grammatical structure using CFG; builds parse tree verifying language syntax rules |
| Semantic Analysis | Parse tree / AST | Annotated AST | Checks type compatibility, scope rules, declarations; enforces language semantics |
| Intermediate Code Gen | Annotated AST | IR (3-address code) | Translates to platform-independent intermediate representation (TAC, bytecode, SSA) |
| Code Optimization | IR | Optimized IR | Improves IR for faster/smaller code: constant folding, dead code elimination, loop optimization |
| Code Generation | Optimized IR | Target machine code | Maps IR to machine instructions: register allocation, instruction selection, peephole opts |
| Aspect | Compiler | Interpreter |
|---|---|---|
| Process | Translates entire program at once | Executes line-by-line / statement-by-statement |
| Output | Machine code / executable file | No separate output; runs directly |
| Speed | Fast execution (pre-translated) | Slower execution (real-time translation) |
| Errors | Reports errors after analyzing the whole program | Stops at first error encountered |
| Memory | Needs more memory (full program) | Less memory (line-by-line) |
| Debugging | Harder (generated code) | Easier (source-level execution) |
| Examples | GCC, Clang, javac | Python, Ruby, JavaScript (V8 JIT) |
| Hybrid | Java (compile to bytecode, then JIT) | Mixed approach combines benefits |
| Concept | Description |
|---|---|
| Phase | A logical stage of compilation (lexical, syntax, semantic, etc.) |
| Pass | One complete traversal of the source or intermediate representation; a pass may implement one or more phases |
| Single Pass | All phases executed in one traversal (e.g., Pascal compiler) |
| Multi Pass | Phases executed in multiple traversals (modern compilers) |
| Trade-off | Single pass = faster but less optimization; Multi pass = slower but better code |
A cross-compiler runs on platform A but generates code for platform B. Example: GCC on x86 Linux producing ARM binary. A source-to-source compiler (transpiler) translates one HLL to another (e.g., TypeScript to JavaScript). A decompiler reverses compilation (machine code to high-level code).
Bootstrapping is writing a compiler in the language it compiles. Three stages: (1) Write a simple compiler in language X for a subset of language Y. (2) Use it to compile a compiler for Y written in Y. (3) The resulting compiler can now compile itself. This is how C, Go, Rust, and many other languages bootstrapped. It proves the language is self-hosting and powerful enough to build its own tools.
┌─────────────┐    ┌─────────────┐    ┌───────────────┐
│   Source    │───▶│   Lexical   │───▶│    Syntax     │
│    Code     │    │  Analyzer   │    │   Analyzer    │
└─────────────┘    │  (Scanner)  │    │   (Parser)    │
                   └─────────────┘    └───────┬───────┘
                                              │
                                      ┌───────▼───────┐
                                      │   Semantic    │
                                      │   Analyzer    │
                                      └───────┬───────┘
                                              │
                                      ┌───────▼───────┐
                                      │ Intermediate  │
                                      │Code Generator │
                                      └───────┬───────┘
                                              │
                                      ┌───────▼───────┐
                                      │     Code      │
                                      │   Optimizer   │
                                      └───────┬───────┘
                                              │
┌─────────────┐                       ┌───────▼───────┐
│   Target    │◀──────────────────────│     Code      │
│Machine Code │                       │   Generator   │
└─────────────┘                       └───────────────┘
| Term | Definition |
|---|---|
| Token | A pair (token-name, attribute-value); represents a meaningful lexical unit |
| Lexeme | The actual character sequence in source code that matches a pattern (e.g., "count") |
| Pattern | A rule (usually a regex) describing the set of strings for a token (e.g., [a-zA-Z_][a-zA-Z0-9_]* for identifiers) |
| Lexical Error | Characters that don't match any token pattern (e.g., a stray @ in C outside a string or character literal) |
| Longest Match Rule | When two tokens match, prefer the longer lexeme (e.g., "!=" over "!" then "=") |
| Rule Priority | When same-length matches, prefer keyword over identifier (e.g., "if" as keyword) |
| Token Class | Pattern / Example | Lexeme Examples |
|---|---|---|
| Keyword | if, else, while, return, int, float | "if", "while", "return" |
| Identifier | [a-zA-Z_][a-zA-Z0-9_]* | "count", "_temp", "x1" |
| Integer Literal | [0-9]+ | "42", "0", "12345" |
| Float Literal | [0-9]+\.[0-9]+([eE][+-]?[0-9]+)? | "3.14", "1.5e-3" |
| String Literal | "[^"\n]*" | "hello world" |
| Operator | +, -, *, /, =, ==, !=, <, <=, >, >= | "+", "==", "<=" |
| Delimiter | (, ), {, }, [, ], ;, , | "(", ";", "," |
| Comment | //.*$ or /\*[\s\S]*?\*/ | "// comment", "/* block */" |
─── Regular Expression → NFA → DFA Pipeline ───
Step 1: Write Regular Expression
Example: (a|b)*abb
Step 2: RE → ε-NFA (Thompson Construction)
For each RE operator, build small NFA fragments:
• Union (a|b): new start → ε → NFA(a) and NFA(b) → ε → new accept
• Concatenation (ab): NFA(a).accept → ε → NFA(b).start
• Kleene Star (a*): new start → ε → NFA(a), NFA(a).accept → ε → NFA(a).start, → ε → new accept
Step 3: ε-NFA → NFA (ε-Closure)
Compute ε-closure of each state (all reachable states via ε only)
Step 4: NFA → DFA (Subset Construction)
• Start state = ε-closure(NFA start state)
• For each DFA state and input symbol: compute move + ε-closure → new DFA state
• Repeat until no new states are created
Step 5: DFA Minimization (Hopcroft / Table-filling)
• Remove unreachable states
• Partition states into accepting / non-accepting
• Repeatedly split partitions if transitions go to different partitions
• Result: Minimum-state DFA
─── Subset Construction Example ───
NFA for (a|b)*abb (Thompson construction, states 0-10):
NFA States: {0,1,2,3,4,5,6,7,8,9,10}
0 --ε--> 1, 0 --ε--> 7
1 --ε--> 2, 1 --ε--> 4
2 --a--> 3, 4 --b--> 5 (the a|b alternatives)
3 --ε--> 6, 5 --ε--> 6
6 --ε--> 1, 6 --ε--> 7 (the Kleene-star loop)
7 --a--> 8, 8 --b--> 9, 9 --b--> 10 (path for "abb")
10 = accepting state
DFA Construction:
A = ε-closure(0) = {0,1,2,4,7} START
From A:
  move(A, a) = {3,8} → ε-closure = {1,2,3,4,6,7,8} = B
  move(A, b) = {5}   → ε-closure = {1,2,4,5,6,7}   = C
From B:
  move(B, a) = {3,8} → B (same)
  move(B, b) = {5,9} → ε-closure = {1,2,4,5,6,7,9} = D
From C:
  move(C, a) = {3,8} → B
  move(C, b) = {5}   → C (same)
From D:
  move(D, a) = {3,8}  → B
  move(D, b) = {5,10} → ε-closure = {1,2,4,5,6,7,10} = E (ACCEPTING)
From E:
  move(E, a) = {3,8} → B
  move(E, b) = {5}   → C
DFA: A --a--> B, A --b--> C, B --a--> B, B --b--> D
     C --a--> B, C --b--> C, D --a--> B, D --b--> E
     E --a--> B, E --b--> C
Only E is accepting → recognizes strings ending in "abb".
Minimization merges A and C (identical transitions, both non-accepting),
giving the 4-state minimum DFA.
| Feature | NFA | DFA |
|---|---|---|
| Transitions | Multiple for same symbol + ε moves | Exactly one per symbol per state |
| State Count | Fewer states | Up to 2^n states (subset construction) |
| Speed | Slower (backtracking possible) | Faster (single deterministic path) |
| Memory | Less memory | More memory (more states) |
| Construction | Easier from RE | Built from NFA via subset construction |
| Use | Theory, understanding patterns | Lexical analyzers (Lex, Flex) |
| Tool | Language | Notes |
|---|---|---|
| Lex | C | Original lexer generator; produces C code |
| Flex | C | "Fast lexical analyzer generator"; free, faster successor to Lex |
| JFlex | Java | Lexer generator for Java applications |
| ANTLR | Java/Python/etc. | Combined lexer + parser; LL(*) parsing |
| Ragel | C/C++/Ruby/etc. | State machine based; very high performance |
/* ── Simple Lex (Flex) Example ── */
%{
#include "y.tab.h" /* token definitions from Bison/Yacc */
int line_num = 1;
%}
digit [0-9]
letter [a-zA-Z_]
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?([eE][+-]?{digit}+)?
%%
"if" { return IF; }
"else" { return ELSE; }
"while" { return WHILE; }
"return" { return RETURN; }
"int" { return TYPE_INT; }
"float" { return TYPE_FLOAT; }
{id} { yylval.sval = strdup(yytext); return IDENTIFIER; }
{number} { yylval.ival = atoi(yytext); return NUMBER; } /* atoi keeps only the integer part */
"==" { return EQ; }
"!=" { return NEQ; }
"<=" { return LE; }
">=" { return GE; }
"//".* { /* skip single-line comment */ }
[ \t] { /* skip whitespace */ }
"\n" { line_num++; }
. { printf("Lex error: %s at line %d\n", yytext, line_num); }
%%
Given the input "intvalue", the lexer must produce the identifier "intvalue", not the keyword "int" followed by "value". Always match the longest possible lexeme. Only when two matches have the same length does rule priority (keyword over identifier) apply.
| Term | Definition |
|---|---|
| CFG | Context-Free Grammar: G = (V, T, P, S) where V=variables, T=terminals, P=productions, S=start symbol |
| Parse Tree | Tree showing how a string is derived from the start symbol using grammar productions |
| Ambiguity | A grammar is ambiguous if a string has two or more distinct parse trees (or leftmost derivations) |
| Left Recursion | A → Aα (direct) or A →* Aα (indirect); causes infinite loop in top-down parsers |
| Left Factoring | Transforming A → αβ₁ | αβ₂ to A → αA', A' → β₁ | β₂; removes common prefix ambiguity |
| LL(1) | Top-down parser; scans Left-to-right, uses Leftmost derivation, 1 token lookahead |
| LR(1) | Bottom-up parser; scans Left-to-right, uses Rightmost derivation (in reverse), 1 token lookahead |
─── Left Recursion Elimination ───
Direct Left Recursion:
A → Aα₁ | Aα₂ | ... | Aαₘ | β₁ | β₂ | ... | βₙ
After elimination:
A → β₁A' | β₂A' | ... | βₙA'
A' → α₁A' | α₂A' | ... | αₘA' | ε
─── Worked Example ───
Original Grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
After Elimination:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
─── Indirect Left Recursion Removal ───
If A → Bα and B → Aβ (mutual recursion):
1. Arrange non-terminals in some order: A₁, A₂, ..., Aₙ
2. For i = 1 to n:
For j = 1 to i-1:
Replace productions of Aᵢ where Aᵢ → Aⱼγ with
all productions of Aⱼ substituted: Aᵢ → δ₁γ | δ₂γ | ...
Eliminate direct left recursion from Aᵢ
─── Left Factoring ───
Problem: Common prefixes cause predictive parser to not know
which production to choose with just 1 lookahead.
Original Grammar (dangling else; not LL(1)):
S → i E t S | i E t S e S | a
E → b
Here, on seeing "i", the parser cannot decide which S production to use:
both start with "i E t S". Left factor the common prefix:
S → i E t S S' | a
S' → e S | ε
General rule:
A → αβ₁ | αβ₂
becomes:
A → αA'
A' → β₁ | β₂
Example:
stmt → if ( expr ) stmt else stmt
| if ( expr ) stmt
| while ( expr ) stmt
| ...
Left factored:
stmt → if ( expr ) stmt opt_else
| while ( expr ) stmt
| ...
opt_else → else stmt | ε
─── FIRST and FOLLOW Sets (Worked Example) ───
Grammar (after left recursion elimination):
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
─── FIRST Set Computation ───
Rule: FIRST(α) = set of terminals that begin strings derivable from α
- If α is terminal a, then FIRST(α) = {a}
- If α is ε, then FIRST(α) = {ε}
- If α → Y₁Y₂...Yₖ:
• Add FIRST(Y₁) - {ε} to FIRST(α)
• If ε ∈ FIRST(Y₁), add FIRST(Y₂) - {ε}
• Continue until Yᵢ where ε ∉ FIRST(Yᵢ) or end
• If all Yᵢ contain ε, add ε to FIRST(α)
Step-by-step:
FIRST(F):
F → ( E ): FIRST = {(}
F → id: FIRST = {id}
∴ FIRST(F) = {(, id}
FIRST(T'):
T' → * F T': FIRST = {*}
T' → ε: FIRST = {ε}
∴ FIRST(T') = {*, ε}
FIRST(T):
T → F T':
FIRST(F) - {ε} = {(, id} → add to FIRST(T)
ε ∉ FIRST(F), so stop
∴ FIRST(T) = {(, id}
FIRST(E'):
E' → + T E': FIRST = {+}
E' → ε: FIRST = {ε}
∴ FIRST(E') = {+, ε}
FIRST(E):
E → T E':
FIRST(T) - {ε} = {(, id} → add to FIRST(E)
ε ∉ FIRST(T), so stop
∴ FIRST(E) = {(, id}
─── FOLLOW Set Computation ───
Rule: FOLLOW(A) = set of terminals that can appear immediately after A
1. Add $ to FOLLOW(S) where S is start symbol
2. For production B → αAβ: add FIRST(β) - {ε} to FOLLOW(A)
3. For production B → αA (or αAβ where ε ∈ FIRST(β)):
add FOLLOW(B) to FOLLOW(A)
4. Repeat until no changes
Step-by-step:
FOLLOW(E):
Start symbol → add {$}
From F → ( E ): add {)} (since ) is terminal after E)
∴ FOLLOW(E) = {$, )}
FOLLOW(E'):
From E → T E': E' is at end, add FOLLOW(E) = {$, )}
∴ FOLLOW(E') = {$, )}
FOLLOW(T):
From E' → + T E': add FIRST(E') - {ε} = {+}
Also E' → ε case: add FOLLOW(E') = {$, )} to FOLLOW(T)
∴ FOLLOW(T) = {+, $, )}
FOLLOW(T'):
From T → F T': T' is at end, add FOLLOW(T) = {+, $, )}
∴ FOLLOW(T') = {+, $, )}
FOLLOW(F):
From T' → * F T': add FIRST(T') - {ε} = {*}
Also add FOLLOW(T') = {+, $, )} to FOLLOW(F)
∴ FOLLOW(F) = {*, +, $, )}
─── Summary ───
FIRST(E) = {(, id} FOLLOW(E) = {$, )}
FIRST(E') = {+, ε} FOLLOW(E') = {$, )}
FIRST(T) = {(, id} FOLLOW(T) = {+, $, )}
FIRST(T') = {*, ε} FOLLOW(T') = {+, $, )}
FIRST(F) = {(, id} FOLLOW(F) = {*, +, $, )}
─── LL(1) Parsing Table Construction ───
Rule: For each production A → α:
For each terminal a in FIRST(α): add A → α to M[A, a]
If ε ∈ FIRST(α):
For each terminal b in FOLLOW(A): add A → α to M[A, b]
If $ ∈ FOLLOW(A): add A → α to M[A, $]
If M[A, a] has multiple entries → NOT LL(1)!
─── Parsing Table for Our Grammar ───
          id       +        *        (        )        $
┌────┬────────┬────────┬────────┬────────┬────────┬────────┐
│ E  │ T E'   │        │        │ T E'   │        │        │
├────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ E' │        │ + T E' │        │        │ ε      │ ε      │
├────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ T  │ F T'   │        │        │ F T'   │        │        │
├────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ T' │        │ ε      │ * F T' │        │ ε      │ ε      │
├────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ F  │ id     │        │        │ ( E )  │        │        │
└────┴────────┴────────┴────────┴────────┴────────┴────────┘
No cell has multiple entries → Grammar IS LL(1) ✓
─── LL(1) Parsing of "id + id * id" ───
Stack: $ E
Input: id + id * id $
Step 1: E → T E' Stack: $ E' T Input: id + id * id $
Step 2: T → F T' Stack: $ E' T' F Input: id + id * id $
Step 3: F → id Stack: $ E' T' id Input: id + id * id $
Step 4: match id Stack: $ E' T' Input: + id * id $
Step 5: T' → ε Stack: $ E' Input: + id * id $
Step 6: E' → + T E' Stack: $ E' T + Input: + id * id $
Step 7: match + Stack: $ E' T Input: id * id $
Step 8: T → F T' Stack: $ E' T' F Input: id * id $
Step 9: F → id Stack: $ E' T' id Input: id * id $
Step 10: match id Stack: $ E' T' Input: * id $
Step 11: T' → * F T' Stack: $ E' T' F * Input: * id $
Step 12: match * Stack: $ E' T' F Input: id $
Step 13: F → id Stack: $ E' T' id Input: id $
Step 14: match id Stack: $ E' T' Input: $
Step 15: T' → ε Stack: $ E' Input: $
Step 16: E' → ε Stack: $ Input: $
Step 17: match $ Stack: (empty) Input: $
ACCEPT ✓
| Feature | LL (Top-Down) | LR (Bottom-Up) |
|---|---|---|
| Direction | Left-to-right, Leftmost derivation | Left-to-right, Rightmost derivation (reverse) |
| Strategy | Predictive / recursive descent | Shift-reduce parsing |
| Grammar | Must be left-factored, no left recursion | Handles more grammars naturally |
| Power | Less powerful (subset of LR) | More powerful (superset of LL) |
| Error Detection | Immediate, reports at first token | Delayed, may detect later in parse |
| Implementation | Easier to implement manually | More complex; usually tool-generated |
| Tools | ANTLR, JavaCC, hand-written | Yacc, Bison, CUP |
| Table Size | Smaller | Larger (states based) |
| Examples | LL(1), LL(k), strong LL | SLR, CLR(1), LALR(1) |
| Type | Power | Description |
|---|---|---|
| SLR (Simple LR) | Least powerful | Uses FOLLOW set for reduce; may have shift-reduce conflicts |
| CLR(1) (Canonical LR) | Most powerful | Full LR(1) items with lookahead; largest table |
| LALR(1) | Between SLR & CLR | Merges LR(1) states with same core; same size as SLR, near CLR power |
| Yacc/Bison | LALR(1) | Practical standard; uses LALR(1) parsing tables |
| SLR vs LALR | LALR has fewer conflicts | LALR(1) is the practical choice for most production compilers |
/* ── Recursive Descent Parser for: E → T E', E' → + T E' | ε, etc. ── */
Token lookahead;
void match(TokenType expected) {
if (lookahead.type == expected)
lookahead = next_token();
else
error("Syntax error: expected %s, got %s",
token_name(expected), token_name(lookahead.type));
}
/* E → T E' */
void parse_E() {
parse_T();
parse_Eprime();
}
/* E' → + T E' | ε */
void parse_Eprime() {
if (lookahead.type == PLUS) {
match(PLUS);
parse_T();
parse_Eprime();
}
/* else: ε production, do nothing */
}
/* T → F T' */
void parse_T() {
parse_F();
parse_Tprime();
}
/* T' → * F T' | ε */
void parse_Tprime() {
if (lookahead.type == STAR) {
match(STAR);
parse_F();
parse_Tprime();
}
/* else: ε production */
}
/* F → ( E ) | id */
void parse_F() {
if (lookahead.type == LPAREN) {
match(LPAREN);
parse_E();
match(RPAREN);
} else if (lookahead.type == ID) {
match(ID);
} else {
error("Syntax error: expected ( or id");
}
}
/* Entry point */
void parse() {
lookahead = next_token();
parse_E();
match(END_OF_FILE);
}
| Task | Description | Example Error |
|---|---|---|
| Type Checking | Verify operations are valid for operand types | Adding int + string |
| Scope Resolution | Ensure variables are declared before use; resolve names | Using undeclared variable |
| Type Coercion | Implicit conversion between compatible types | int to float in 3 + 2.5 |
| Array Bounds | Check subscript types and dimensions | int arr[3]; arr[5] = 1 |
| Function Arguments | Check number, types, and order of arguments | foo(int, str) called with (str) |
| L-Value Check | Ensure left side of assignment is assignable | 5 = x; (error) |
| Declaration Check | Verify each identifier is declared exactly once per scope | Duplicate variable name |
| Entry Field | Description |
|---|---|
| Name | Lexeme / identifier name |
| Type | Data type (int, float, array, function, etc.) |
| Kind | Variable, constant, function, parameter, type name |
| Scope Level | Nesting depth (0 = global, 1 = first function, etc.) |
| Offset | Relative address in activation record / stack frame |
| Size | Number of bytes for the entity |
| Parameters | For functions: parameter types and order |
| Return Type | For functions: return type |
| Link | Pointer to enclosing scope symbol table |
| Operation | Description |
|---|---|
| insert(name, attributes) | Add a new entry to the table |
| lookup(name) | Search for an entry by name; returns attributes or null |
| delete(name) | Remove an entry (when exiting scope) |
| begin_scope() | Create a new nested scope; push onto scope stack |
| end_scope() | Destroy current scope; pop from scope stack |
| enter_proc(name) | Create symbol table for a new procedure/function |
─── Symbol Table: Scope Nesting Example ───
int x; // x: int, scope=0, offset=0
float y; // y: float, scope=0, offset=4
int foo(int a) { // a: int, scope=1, offset=-4 (param)
int x; // x: int, scope=1, offset=-8 (shadows outer x)
float b; // b: float, scope=1, offset=-12
{ // nested block, scope=2
int c; // c: int, scope=2, offset=-16
x = c; // refers to scope=1 x (nearest enclosing)
}
return a + x; // x from scope=1, a from scope=1
}
// Scope Stack during foo():
// ┌──────────────────────────────────┐
// │ Scope 2: { c: int, offset=-16 } │ ← innermost (search first)
// ├──────────────────────────────────┤
// │ Scope 1: { a: int, x: int, │
// │ b: float } │
// ├──────────────────────────────────┤
// │ Scope 0: { x: int, y: float } │ ← global
// └──────────────────────────────────┘
// Lookup rule: search from innermost to outermost.
// If found, use that entry. If not found → "undeclared identifier" error.
─── Attribute Grammars ───
S-attributed SDT (Synthesized Attributes Only):
- Values flow BOTTOM-UP (from leaves to root)
- Evaluated during bottom-up parsing (LR parser compatible)
- All attributes are synthesized: computed from children
Example: expression evaluation (S-attributed) — each node's val
is synthesized from its children's values:
Production: Semantic Rule:
E → E₁ + T E.val = E₁.val + T.val
E → T E.val = T.val
T → T₁ * F T.val = T₁.val * F.val
T → F T.val = F.val
F → ( E ) F.val = E.val
F → num F.val = num.lexval
L-attributed SDT (Synthesized + Inherited Attributes):
- Inherited attributes flow TOP-DOWN (from parent to children)
- Inherited attr of a child depends only on:
(a) Parent's attributes, and
(b) Left siblings' attributes (NOT right siblings)
- Compatible with LL parsing (top-down)
Example: Declaring and type checking variables
Production: Semantic Rule:
D → T L L.inh_type = T.type
T → int T.type = integer
T → float T.type = float
L → L₁ , id L₁.inh_type = L.inh_type
add_type(id.entry, L.inh_type)
L → id add_type(id.entry, L.inh_type)
For "int a, b, c" (parsed as ((a , b) , c)):
D → T L: T.type = integer, so L.inh_type = integer
L → L₁ , c: L₁.inh_type = integer; add c as int
L₁ → L₂ , b: L₂.inh_type = integer; add b as int
L₂ → a: add a as int
─── Type Coercion Rules ───
Implicit (automatic) coercion hierarchy (widening):
char → int → long → float → double
Example expressions:
3 + 2.5 → 3.0 + 2.5 = 5.5 (int promoted to float)
'a' + 1 → 97 + 1 = 98 (char promoted to int)
1 + 2 + 3.0 → 1+2 = 3 (int), 3+3.0 = 6.0 (promoted)
Type checking rules:
int op int → int (e.g., +, -, *, /)
int op float → float (promote int)
float op float → float
bool op bool → bool (&&, ||, !)
Semantic error examples:
"hello" + 5 → ERROR: cannot add string and int
3.14 % 2 → ERROR: modulo requires integer operands
int arr[3]; arr = 5; → ERROR: cannot assign to array name
x = y; (x declared as int*, y as int) → ERROR: incompatible pointer types
| IR Form | Description | Pros | Cons |
|---|---|---|---|
| Three-Address Code | x = y op z (max 1 op per instruction) | Easy to optimize, close to machine code | Many temporaries, verbose |
| Syntax Tree | Tree representing the hierarchical structure | Natural representation, easy to traverse | Harder to optimize directly |
| Postfix Notation | Operand stack-based (e.g., a b + c *) | Simple, no parentheses needed | Not easy for control flow |
| DAG | Directed Acyclic Graph; common subexprs shared | Detects common subexpressions | More complex to construct |
| SSA (Static Single Assignment) | Each variable assigned exactly once | Powerful for optimization | Requires phi nodes at merges |
| Bytecode | Stack-based (JVM) or register-based (Dalvik) | Portable, compact | Abstraction overhead |
| Form | Representation | Description |
|---|---|---|
| Quadruple | (op, arg1, arg2, result) | Most common; easy to optimize and rearrange |
| Triple | (op, arg1, arg2) | Result referenced by position; no temporaries needed |
| Indirect Triple | Array of pointers to triples | Allows easy reordering of instructions |
| Category | Syntax | Example |
|---|---|---|
| Assignment | x = y op z | t1 = a + b |
| Unary op | x = op y | t1 = -a |
| Copy | x = y | t1 = x |
| Unconditional Jump | goto L | goto L3 |
| Conditional Jump | if x relop y goto L | if t1 < t2 goto L1 |
| Procedure Call | param x; call p, n; y = return | param t1; call foo, 1; t2 = return |
| Indexed Assignment | x = y[i]; x[i] = y | t1 = a[t2]; a[t2] = t1 |
| Address/Pointer | x = &y; x = *y; *x = y | t1 = &a; t2 = *t1 |
─── Three-Address Code Generation (Worked Example) ───
Source: x = (a + b) * (c - d) + e
Step-by-step TAC generation:
t1 = a + b // evaluate (a + b)
t2 = c - d // evaluate (c - d)
t3 = t1 * t2 // multiply results
t4 = t3 + e // add e
x = t4 // store in x
Quadruple representation:
┌────┬──────┬──────┬──────┬──────┐
│ #  │ op   │ arg1 │ arg2 │result│
├────┼──────┼──────┼──────┼──────┤
│ 0  │ +    │ a    │ b    │ t1   │
│ 1  │ -    │ c    │ d    │ t2   │
│ 2  │ *    │ t1   │ t2   │ t3   │
│ 3  │ +    │ t3   │ e    │ t4   │
│ 4  │ =    │ t4   │      │ x    │
└────┴──────┴──────┴──────┴──────┘
Triple representation:
┌────┬──────┬──────┬──────┐
│ #  │ op   │ arg1 │ arg2 │
├────┼──────┼──────┼──────┤
│ 0  │ +    │ a    │ b    │  // (0) = a + b
│ 1  │ -    │ c    │ d    │  // (1) = c - d
│ 2  │ *    │ (0)  │ (1)  │  // (2) = (0) * (1)
│ 3  │ +    │ (2)  │ e    │  // (3) = (2) + e
│ 4  │ =    │ x    │ (3)  │  // x = (3)
└────┴──────┴──────┴──────┘
─── Boolean Expression Translation ───
Two methods: Short-circuit (lazy) and Arithmetic evaluation.
Source: if (a < b || c > d && e == f) goto L1
Method 1: Short-circuit evaluation (preferred in practice)
// a < b || (c > d && e == f)
// For OR: if left is true, skip right (short-circuit)
// For AND: if left is false, skip right (short-circuit)
if a < b goto L1 // short-circuit OR: if true, done
goto L2
L2:
if c > d goto L3 // AND: check first condition
goto L4 // if false, whole AND is false
L3:
if e == f goto L1 // AND: check second condition
goto L4
L4:
// fall through (condition is false)
Method 2: Boolean to integer (arithmetic approach)
// true = 1, false = 0
// || → max(a, b)
// && → min(a, b)
// ! → 1 - x
t1 = a < b
if t1 != 0 goto L1
t2 = c > d
t3 = e == f
t4 = t2 && t3 // or: t4 = AND(t2, t3)
if t4 != 0 goto L1
─── DAG (Directed Acyclic Graph) for Common Subexpression Elimination ───
Source: t1 = a + b
t2 = a + b // same as t1 → redundant!
t3 = t1 + c
t4 = d + e
t5 = d + e // same as t4 → redundant!
DAG Construction:
        [+] ──────────── node for t3 (t1 + c)
       /   \
     [+]    [c]
    /   \
  [a]   [b] ──────────── shared node for t1 and t2 (a + b)

     [+] ──────────────── shared node for t4 and t5 (d + e)
    /   \
  [d]   [e]
After DAG optimization:
t1 = a + b
t2 = t1 // reuse! (not computed again)
t3 = t1 + c
t4 = d + e
t5 = t4 // reuse! (not computed again)
Saved: 2 redundant additions eliminated.
─── Declaration Translation ───
Source: int a, b, c;
float x = 3.14;
int arr[10];
TAC:
// Variable declarations (allocate space)
// Width calculation:
// int = 4 bytes, float = 4 bytes, pointer = 8 bytes
// int a, b, c
// offset_a = 0, offset_b = 4, offset_c = 8 (relative)
// float x = 3.14
// offset_x = 12
t1 = 3.14 // float constant
x = t1
// int arr[10]
// offset_arr = 16, width = 40 bytes
─── Assignment Translation ───
Simple: x = y + z * w
t1 = z * w
t2 = y + t1
x = t2
Array: x = a[i] + b[j+1]
t1 = i * 4 // element size for int (4 bytes)
t2 = a[t1] // a[i]
t3 = j + 1
t4 = t3 * 4
t5 = b[t4] // b[j+1]
t6 = t2 + t5
x = t6
Structure: s.x = s.y + 5
// Assume: s at offset 0, x at offset 0, y at offset 4
t1 = *(s + 4) // s.y
t2 = t1 + 5
*(s + 0) = t2 // s.x = result
Pointer: *p = *q + r
t1 = *q // dereference q
t2 = t1 + r
*p = t2 // dereference and store
| Allocation | Lifetime | Location | Example |
|---|---|---|---|
| Static | Entire program execution | Fixed memory area | Global variables, static locals, string literals |
| Stack (Automatic) | Function call to return | Runtime stack | Local variables, parameters, return address |
| Heap (Dynamic) | Until explicitly freed | Heap memory pool | malloc/new objects, linked list nodes |
─── Activation Record (Stack Frame) Structure ───
┌──────────────────────────────────┐ ← High Address
│ │
│ Caller's Frame │
│ │
├──────────────────────────────────┤
│ Actual Parameters (args) │ ← pushed by caller
│ arg₁, arg₂, ..., argₙ │
├──────────────────────────────────┤
│ Return Address │ ← where to return after call
├──────────────────────────────────┤
│ Dynamic Link (Old FP) │ ← caller's frame pointer
├──────────────────────────────────┤
│ Saved Registers │ ← callee-saved registers
├──────────────────────────────────┤
│ Return Value │ ← space for function result
├──────────────────────────────────┤
│ Local Variables │ ← auto vars, temporaries
│ temp₁, temp₂, ... │
├──────────────────────────────────┤
│ Local Arrays/Structs │ ← large local data
├──────────────────────────────────┤
│ Frame Pointer (FP) │ ← current function's FP
└──────────────────────────────────┘ ← Low Address / SP
↑
Stack Pointer (SP)
Function Call Sequence:
1. Caller pushes actual parameters (args) onto stack
2. Caller executes CALL instruction (pushes return address)
3. Callee pushes old frame pointer (dynamic link)
4. Callee saves callee-saved registers
5. Callee allocates space for locals (SP = SP - local_size)
6. Function body executes
7. Callee stores return value in designated location
8. Callee restores saved registers and frame pointer
9. Callee executes RETURN (jumps to return address)
10. Caller retrieves return value and pops arguments
| Method | Mechanism | Effect in Callee | Example |
|---|---|---|---|
| Call by Value | Copy of argument passed | Changes to param don't affect caller | swap(a,b) has no effect on caller |
| Call by Reference | Address (pointer) passed | Changes to param affect caller directly | swap(&a,&b) actually swaps |
| Call by Name | Textual substitution (lazy) | Param re-evaluated on each use | Jensen's Device: swap-like with side effects |
| Call by Copy-Restore | Copy in at call, copy out at return | Effect visible only if no aliasing issues | Fortran-style; hybrid of value and ref |
| Call by Result | No value copied in; result copied out | Only output parameter | Similar to OUT param in Ada |
| Method | How It Works | Pros | Cons |
|---|---|---|---|
| Mark-Sweep | Mark reachable from roots, sweep unmarked | Simple, handles cycles | Fragmentation, pauses execution |
| Mark-Compact | Mark then compact live objects together | No fragmentation | More complex, slower compaction |
| Copying | Copy live to new space, abandon old | No fragmentation, fast alloc | Uses 2x memory, copies overhead |
| Reference Counting | Count references; free at zero | Immediate reclamation, no pause | Cannot detect cycles, per-object overhead |
| Generational | Partition by object age; collect young often | Efficient for typical programs | Complex implementation |
─── Parameter Passing: Detailed Comparison ───
void foo(int x, int y) { x = 10; y = 20; }
int a = 5, b = 7;
foo(a, b);
─── Call by Value ───
x = copy of a (5), y = copy of b (7)
Inside foo: x becomes 10, y becomes 20
After foo: a = 5, b = 7 (UNCHANGED)
↳ C (scalar types), Java (primitives), Python (immutable types)
─── Call by Reference ───
x is alias for a, y is alias for b
Inside foo: x (= a) becomes 10, y (= b) becomes 20
After foo: a = 10, b = 20 (CHANGED)
↳ C++ (&), Pascal (var); C simulates it with pointers (note: Java passes object references by value)
─── Call by Name ───
x and y are replaced by their actual argument text
foo(a, b): x=10 makes a=10, y=20 makes b=20
With expression foo(a+b, a*b):
x=10 sets a+b = 10 (side effect on a!)
y=20 sets a*b = 20 (different result!)
↳ Algol 60, macro expansion in C
─── Call by Copy-Restore (Value-Result) ───
On CALL: copy a→x (5), copy b→y (7)
Inside foo: x=10, y=20
On RETURN: copy x→a (10), copy y→b (20)
Result: a=10, b=20 (same as reference here)
BUT with aliasing: foo(a, a) → x=y=a=5
inside: x=10, y=20
return: a = x (10), then a = y (20)
final: a = 20 (last copy wins!)
↳ Fortran, Ada (IN OUT)
─── Mark-and-Sweep Garbage Collection ───
Phase 1: MARK
- Start from ROOT SET (stack, global vars, registers)
- DFS/BFS through all reachable objects
- Set "marked" flag on each reachable object
- Objects not marked = garbage
Phase 2: SWEEP
- Scan entire heap linearly
- For each object:
• If marked: clear the mark bit (keep alive)
• If NOT marked: add to free list (reclaim memory)
Phase 3: (Optional) COMPACT
- Move all live objects to one end of heap
- Update all pointers to moved objects
- Eliminates fragmentation
─── Mark-Sweep Visual Example ───
Heap: [A→B, B→null, C→D, D→C*, E→null]
(* = cycle)
Root set: A, E
MARK phase:
Visit A (mark) → visit B (mark) → null, backtrack
Visit E (mark) → null, backtrack
Result: A=marked, B=marked, E=marked
C=unmarked, D=unmarked
SWEEP phase:
A: marked → clear mark (keep)
B: marked → clear mark (keep)
C: unmarked → FREE (reclaim)
D: unmarked → FREE (reclaim) ← cycle detected!
E: marked → clear mark (keep)
↳ Mark-sweep correctly handles cycles (C↔D)
| Optimization Level | Scope | Description |
|---|---|---|
| Local | Within a single basic block | Simple transforms on straight-line code; no control flow analysis needed |
| Global | Within a single function | Uses control flow graph (CFG); can move code across blocks |
| Interprocedural | Across function boundaries | Inline expansion, interprocedural constant propagation |
| Peephole | Small window of instructions | Pattern matching on 1-4 instructions; replaces with better sequence |
| Technique | Description | Example |
|---|---|---|
| Constant Folding | Evaluate constant expressions at compile time | t1 = 3 + 5 → t1 = 8 |
| Constant Propagation | Replace variable with its known constant value | x = 5; y = x + 1 → y = 6 |
| Copy Propagation | Replace copies of variables with original | t1 = x; t2 = t1 + y → t2 = x + y |
| Dead Code Elimination | Remove code whose results are never used | t1 = a + b (t1 never used) → remove |
| Common Subexpr Elimination | Reuse already-computed identical expressions | t1=a+b; t2=a+b → t2=t1 |
| Strength Reduction | Replace expensive ops with cheaper ones | x*2 → x+x or x<<1; x**2 → x*x |
| Algebraic Simplification | Apply algebraic identities | x+0→x; x*1→x; x*0→0; x||true→true |
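Constant folding can be demonstrated on a real AST with Python's `ast` module. A minimal sketch that folds only arithmetic `BinOp` nodes whose operands are literal constants, bottom-up:

```python
import ast
import operator

# Map AST operator node types to the arithmetic they denote
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (bottom-up)
        if (isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant)
                and type(node.op) in OPS):
            value = OPS[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value), node)
        return node

folded = ConstantFolder().visit(ast.parse("t1 = 3 + 5 * 2"))
print(ast.unparse(folded))  # prints: t1 = 13
```

The inner `5 * 2` folds to `10`, which then enables folding the outer `3 + 10`, illustrating why folding is run bottom-up and why it "often enables other opts".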
─── Loop Optimizations ───
| Technique | Description | Example |
|---|---|---|
| Loop Unrolling | Replicate body to reduce branch overhead | for(i=0;i<4;i++) → 4 copies of body |
| Code Motion | Move invariant code out of loop | Compute t=a+b once before loop if a,b unchanged |
| Induction Variable | Simplify/eliminate loop induction vars | Replace i*4 with j+=4 (strength reduction) |
| Loop Fusion | Combine two loops with same bounds | Two identical for loops → one merged loop |
| Loop Fission | Split loop body for parallelism | One loop → two independent loops for multicore |
| Loop Peeling | Execute first iteration separately | Remove condition checks from main loop body |
─── Optimization Examples (Before → After) ───
1. CONSTANT FOLDING
Before: t1 = 3 + 5 * 2
After: t1 = 13 // evaluated at compile time
2. CONSTANT PROPAGATION + FOLDING
Before: x = 10
y = x + 20
z = y * 2
After: x = 10
y = 30
z = 60
3. COPY PROPAGATION
Before: t1 = x
t2 = t1 + y
t3 = t2 * z
After: t2 = x + y
t3 = (x + y) * z
4. DEAD CODE ELIMINATION
Before: t1 = a + b
t2 = c * d
x = t1 + 5 // t2 never used!
After: t1 = a + b
x = t1 + 5
5. COMMON SUBEXPRESSION ELIMINATION
Before: t1 = a + b
t2 = c * d
t3 = a + b // same as t1
t4 = t3 + e
After: t1 = a + b
t2 = c * d
t4 = t1 + e // reused t1
6. STRENGTH REDUCTION
Before: t1 = i * 4 // multiplication is expensive
After: t1 = i << 2 // shift left by 2 (same as * 4)
Before: t1 = x ** 2 // exponentiation
After: t1 = x * x // single multiplication
7. CODE MOTION (Loop Invariant)
Before: while (i < n) {
t1 = a + b; // a and b don't change!
... = arr[i] + t1;
i++;
}
After: t1 = a + b; // moved outside loop
while (i < n) {
... = arr[i] + t1;
i++;
}
Benefit: Saves n-1 additions
─── Peephole Optimization Examples ───
Peephole optimization looks at a small window (1-4 instructions)
and replaces patterns with more efficient equivalents.
Pattern → Replacement:
─────────────────────────────────────────
MOV R1, R1 → (delete — redundant)
MOV R1, a
MOV R1, b → MOV R1, b (first MOV is dead)
ADD R1, 0 → (delete — no effect)
MUL R1, 1 → (delete — no effect)
SUB R1, 0 → (delete — no effect)
MUL R1, 2 → SHL R1, 1 (shift left)
MUL R1, 4 → SHL R1, 2
JMP L1
L1: ... → (delete jump — fall through)
JZ L1
JNZ L2
L1: ... → JNZ L2
L1: ...
PUSH R1
POP R1 → (delete — no net effect)
CMP R1, R1 → (replace with TEST or delete)
JZ L → JMP L (always true)
─── Register Allocation via Graph Coloring ───
Step 1: Build Interference Graph
- Nodes = temporaries (virtual registers)
- Edge between tᵢ and tⱼ if they are "live" at the same point
(i.e., both hold values needed at the same time)
Step 2: Graph Coloring
- Assign a physical register (color) to each node
- Adjacent nodes must get different colors
- Use k colors = k physical registers available
Step 3: If k-colorable → allocate registers
If NOT k-colorable → spill some variables to memory (stack)
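The simplify/select loop can be sketched in Python. This is a toy pass over an adjacency-set dict; a real allocator would spill to memory when no node of degree < k remains, which this sketch does not handle:

```python
def color(graph, k):
    graph = {n: set(adj) for n, adj in graph.items()}
    stack, removed = [], set()
    while len(stack) < len(graph):
        # SIMPLIFY: pick any node with fewer than k neighbors still in the graph
        node = next(n for n in graph if n not in removed
                    and len(graph[n] - removed) < k)
        removed.add(node)
        stack.append(node)
    colors = {}
    while stack:
        # SELECT: pop in reverse order, assign the lowest color unused by neighbors
        node = stack.pop()
        used = {colors[m] for m in graph[node] if m in colors}
        colors[node] = next(c for c in range(k) if c not in used)
    return colors

edges = [("a","b"), ("a","c"), ("b","c"), ("b","d"), ("c","d"), ("c","e")]
graph = {n: set() for n in "abcde"}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)
print(color(graph, 3))  # a valid 3-coloring: adjacent nodes get different colors
```

The pop order may differ from a hand trace (any node below degree k can be simplified first), but any assignment the select phase produces is a valid coloring.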
─── Example ───
Variables: a, b, c, d, e
Interference edges: a-b, a-c, b-c, b-d, c-d, c-e
Available registers: 3 (R1, R2, R3)
Algorithm (simplify/select):
1. Remove node e (degree 1 < k=3) → push e
2. Remove node a (degree 2 < 3) → push a
3. Remove node d (degree 2 < 3) → push d
4. Remove node b (degree 1 < 3) → push b
5. Remove node c (degree 0 < 3) → push c
Select (reverse order):
c → R1
b → R2 (conflicts with c=R1)
d → R3 (conflicts with b=R2, c=R1)
a → R3 (conflicts with b=R2 and c=R1; no edge to d, so sharing R3 with d is fine)
e → R2 (conflicts only with c=R1)
Result: a=R3, b=R2, c=R1, d=R3, e=R2 — a valid 3-coloring
(If the graph is not k-colorable, spill a variable to the stack and reload it from memory on each use)
─── Optimization Cost/Benefit Summary ───
| Optimization | Time Complexity | Safety | Impact |
|---|---|---|---|
| Constant Folding | O(1) per expression | Always safe | Moderate; often enables other opts |
| Dead Code Elim. | O(n) per function | Always safe | Can be significant (large dead blocks) |
| CSE | O(n) per block | Always safe | Moderate; saves repeated computations |
| Copy Propagation | O(n) per function | Always safe | Enables dead code elim. downstream |
| Code Motion | O(n) per loop | Safe for loop invariants | High for tight loops with invariants |
| Loop Unrolling | O(1) transform | Safe (may increase code size) | Reduces branch overhead; helps ILP |
| Register Allocation | NP-hard (heuristic) | Safe (spill when needed) | Very high; fewer loads/stores |
↳ Rule of thumb: -O2 for production builds, -O0 for debugging.
A compiler has six logical phases: (1) Lexical Analysis (Scanner) breaks source code characters into tokens — keywords, identifiers, literals, operators. It uses regex patterns and produces a token stream while building the symbol table. (2) Syntax Analysis (Parser) takes the token stream and verifies grammatical structure using a Context-Free Grammar (CFG). It produces a parse tree or abstract syntax tree (AST). Top-down parsers (LL) use predictive parsing with FIRST/FOLLOW; bottom-up parsers (LR) use shift-reduce parsing. (3) Semantic Analysis checks type compatibility, scope rules, and declarations. It performs type checking, type coercion, and annotates the AST. The symbol table is heavily used here. (4) Intermediate Code Generation translates the annotated AST into platform-independent IR, typically three-address code (quadruples). This bridges the source and target languages. (5) Code Optimization transforms the IR to improve execution speed or reduce code size — constant folding, dead code elimination, loop optimization, register allocation via graph coloring. (6) Code Generation maps optimized IR to target machine instructions — instruction selection, register allocation, peephole optimization. The symbol table and error handler work alongside all phases.
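Phase (1) can be sketched as a regex-driven scanner. The token classes below are a toy set chosen for illustration, not any particular language's lexical grammar:

```python
import re

# One named group per token class; order matters (longest/most specific first)
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),            # whitespace is discarded, not tokenized
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    for m in MASTER.finditer(src):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = (a + b) * 2")))  # the token stream for the statement
```

Real scanners are usually generated from exactly such regex specifications (e.g. by lex/flex), compiled into a single DFA rather than tried pattern-by-pattern.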
Left recursion occurs when a non-terminal's production starts with itself (A -> Aα). This causes top-down parsers (recursive descent, LL) to enter an infinite loop. Elimination algorithm: for A -> Aα₁ | Aα₂ | ... | Aαₘ | β₁ | β₂ | ... | βₙ, replace with A -> β₁A' | β₂A' | ... | βₙA' and A' -> α₁A' | α₂A' | ... | αₘA' | ε. Example: the grammar E -> E + T | T has direct left recursion. After elimination: E -> TE' and E' -> +TE' | ε. Similarly, T -> T*F | F becomes T -> FT' and T' -> *FT' | ε. For indirect left recursion (A -> Bα, B -> Aβ), order the non-terminals and substitute: for each Aᵢ, replace any production Aᵢ -> Aⱼγ (with j < i) by substituting each of Aⱼ's alternatives for Aⱼ (yielding Aᵢ -> δ₁γ | δ₂γ | ...), then apply direct elimination.
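The eliminated grammar maps directly onto a recursive-descent parser. A minimal sketch, assuming tokens are pre-split and only + and * over integer literals; the tail-recursive E' and T' productions become loops:

```python
# Grammar after left-recursion elimination:
#   E  -> T E'      E' -> + T E' | ε
#   T  -> F T'      T' -> * F T' | ε
#   F  -> num
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok
    def E():
        v = T()
        while peek() == "+":   # E' -> + T E' | ε, written iteratively
            eat()
            v += T()
        return v
    def T():
        v = F()
        while peek() == "*":   # T' -> * F T' | ε
            eat()
            v *= F()
        return v
    def F():
        return int(eat())      # F -> num
    return E()

print(parse(["2", "+", "3", "*", "4"]))  # prints: 14 (* binds tighter than +)
```

Because each function consumes at least one token before recursing, the infinite loop that direct left recursion would cause cannot occur.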
LL (Top-Down) parsing scans Left-to-right and uses Leftmost derivation. The parser predicts which production to apply using 1 (or k) lookahead tokens. It builds the parse tree from root to leaves. LL parsers require the grammar to be left-factored and free of left recursion. They use a parsing table built from FIRST/FOLLOW sets. LR (Bottom-Up) parsing scans Left-to-right but uses Rightmost derivation in reverse. It builds the tree from leaves to root using shift-reduce actions. LR parsers are more powerful — they can handle a larger class of grammars (including some left-recursive grammars). LR is strictly more powerful than LL: every LL(1) grammar is also LR(1), but not vice versa. The three types of LR parsers — SLR (simplest, least powerful), CLR(1) (most powerful, largest tables), and LALR(1) (practical middle ground used by Yacc/Bison) — differ in how they handle reduce actions. LALR(1) has the same state count as SLR but near CLR(1) parsing power, making it the standard for production compilers.
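The FIRST sets used to build an LL(1) parsing table can be computed with a small fixed-point iteration. A sketch for the standard expression grammar (terminals are plain strings, ε marks the empty string):

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                            # iterate until no set grows
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for sym in prod:
                    if sym not in grammar:    # terminal (or ε)
                        add = {sym}
                    else:
                        add = first[sym] - {EPS}
                    if not add <= first[nt]:
                        first[nt] |= add
                        changed = True
                    # continue past this symbol only if it can derive ε
                    if sym in grammar and EPS in first[sym]:
                        continue
                    break
                else:
                    # every symbol in the production was nullable
                    if EPS not in first[nt]:
                        first[nt].add(EPS)
                        changed = True
    return first

print(first_sets(GRAMMAR))
```

FOLLOW sets are computed by a similar fixed-point pass, and together the two drive the "predict which production" decision that makes LL(1) parsing table-driven.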
Three-address code (TAC) has at most one operator per instruction, with at most three operands. For the expression x = (a + b) * (c - d) + e, create temporaries: t1 = a + b; t2 = c - d; t3 = t1 * t2; t4 = t3 + e; x = t4. Each sub-expression gets its own temporary. The grammar-directed translation uses synthesized attributes: E.place holds the result name, and new temporaries are generated with a counter. For Boolean expressions like a < b || c > d, short-circuit evaluation produces jumps: if a < b goto L_true; if c > d goto L_true; goto L_false. The DAG representation can detect common subexpressions: if t1 = a+b and t2 = a+b, the DAG shares one node, so t2 = t1 (eliminating the redundant computation). Quadruples (op, arg1, arg2, result) are preferred over triples because they allow instruction reordering during optimization.
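The translation scheme can be sketched as a post-order AST walk with a temporary counter (AST nodes as nested tuples; leaves as variable-name strings — a toy encoding chosen for illustration):

```python
def gen_tac(ast_root):
    code, n = [], 0
    def walk(node):
        nonlocal n
        if isinstance(node, str):
            return node                      # leaf: a variable name is its own "place"
        op, left, right = node
        l, r = walk(left), walk(right)       # post-order: emit operands first
        n += 1
        code.append(f"t{n} = {l} {op} {r}")  # one operator per instruction
        return f"t{n}"
    return code, walk(ast_root)

# x = (a + b) * (c - d) + e
code, result = gen_tac(("+", ("*", ("+", "a", "b"), ("-", "c", "d")), "e"))
code.append(f"x = {result}")
print("\n".join(code))
# t1 = a + b
# t2 = c - d
# t3 = t1 * t2
# t4 = t3 + e
# x = t4
```

The return value of each recursive call plays the role of E.place, and the `n` counter is the temporary generator from the syntax-directed definition.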
An activation record (stack frame) is created for each function call and stores everything needed for that invocation. Structure (from high to low address): (1) Actual Parameters — pushed by the caller before the call. (2) Return Address — pushed by the CALL instruction; where to resume after return. (3) Dynamic Link (Old FP) — saves the caller's frame pointer; used to restore the caller's frame on return. (4) Saved Registers — callee-saved registers that must be preserved across the call. (5) Return Value — space for the function's return value (often in a register). (6) Local Variables — automatic variables declared in the function. (7) Temporaries — intermediate values computed during expression evaluation. The Frame Pointer (FP) points to a fixed location in the frame, allowing access to locals and parameters via fixed offsets. The Stack Pointer (SP) tracks the current top of stack. For languages with nested functions and static scoping, an access link (static link) points to the lexically enclosing function's frame, enabling access to non-local variables.
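A toy model of the call stack makes the push/pop discipline concrete (frames as Python dicts; a real frame is a raw block of stack memory addressed via fixed FP offsets, not named fields):

```python
stack = []

def call(func_name, params, return_addr):
    frame = {
        "func": func_name,
        "params": params,                  # actual parameters (pushed by caller)
        "return_addr": return_addr,        # where to resume after return
        "dynamic_link": len(stack) - 1,    # index of caller's frame (-1 = no caller)
        "saved_regs": {},
        "locals": {},
        "temps": [],
    }
    stack.append(frame)
    return frame

main = call("main", {}, None)
f = call("f", {"x": 10}, "main+1")
f["locals"]["y"] = f["params"]["x"] * 2    # locals live inside the callee's frame
stack.pop()                                # return from f: its frame is discarded
print(len(stack), stack[-1] is main)       # prints: 1 True — caller's frame on top
```

The dynamic link here is an index instead of a saved FP value, but the invariant is the same: popping a frame restores the caller's view of the stack.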