编译原理

本笔记基于上海交通大学 胡易坤老师 2024-2025 学年春季学期教学内容进行整理，部分图片来自胡老师的PPT，若有侵权请联系删除。

Quick Links

▶

考点跳转链接

Regex to NFA
NFA to DFA
DFA Simplification

Top-Down Parsing
- Recursive-Descent Parsing
- Predictive Parsing
  - LL(1) Parsing
Bottom-Up Parsing

FIRST and FOLLOW
LEADING and TRAILING
CLOSURE and GOTO
LR(1)’s CLOSURE and GOTO

Quick Check

▶

各类文法与解析方法

LL(1) Grammar
- L: 从左到右扫描输入；L：最左派生；1：提前看一个输入符号
- 无左递归和左因子
- $\mathbf{A} \rightarrow \mathbf{\alpha} \ | \ \mathbf{\beta}$表示两个不同的产生式，则$\mathbf{FIRST(\alpha)}$ 和 $\mathbf{FIRST(\beta)}$ 是互不相交的集合
  - 二者不会同时派生以$\mathbf{a}$开头的字符串
  - 至多一个$\mathbf{\alpha}$和$\mathbf{\beta}$可以派生空字符串
- 如果$\mathbf{\epsilon} \in \mathbf{FIRST(\beta)}$，则$\mathbf{FIRST(\alpha)}$和$\mathbf{FOLLOW(A)}$是互不相交的集合
LL(1) Parsing
- 提取左因子
- 消除直接左递归
- 计算$\mathbf{FIRST}$和$\mathbf{FOLLOW}$
- 构造预测分析表（列为终结符，行为非终结符）
- 从Start Symbol开始预测/或使用栈和输入缓冲区进行预测分析
  - 查表（最左侧非终结符，最左侧未匹配输入符号）作为派生的产生式进行派生
  - 若使用栈和输入缓冲区，则
    - 若栈顶符号为终结符且与输入符号匹配，则出栈并将输入指针后移
    - 若栈顶符号为非终结符，则查表（栈顶符号，输入指针对应符号）得到派生的产生式，将栈顶符号出栈并将产生式右侧符号逆序入栈
    - 直到栈为空，输入指针指向$\$$，则接受输入
OG Grammar（算符文法）
- 任意生成式不含两个相邻的非终结符
- 不含空生成式
OPP Grammar（算符优先文法）
- 首先满足OG Grammar的要求
- 任意两个终结符号对（有序）之间一定满足唯一的优先级关系
OPP Parsing
- 计算$\mathbf{LEADING}$和$\mathbf{TRAILING}$
- 构造优先级关系表
- 使用栈和输入缓冲区进行运算符优先解析/或使用优先级爬升法
  - 若栈顶终结符号优先级 <· / =· 输入符号优先级，则移进
  - 若栈顶终结符号优先级 >· 输入符号优先级，则归约
LR(0) Grammar
- L: 从左到右扫描输入；R：最右派生，最左归约；0：提前看一个输入符号
- 任一项集的状态转移不含归约-归约或移进-归约冲突
LR(0) Parsing
- 扩展文法
- 通过$\mathbf{GOTO}$和$\mathbf{CLOSURE}$构造LR(0)项集族
- 构造LR(0)分析表（列为输入符号，行为状态）
- 使用栈和输入缓冲区进行LR(0)解析
  - 查表，（栈顶，输入符号）为s n，则移进并push状态n到栈顶，
  - 若为r m，则用第m条产生式进行归约，pop栈顶的符号数目等于产生式右侧符号数目，查（栈顶符号，产生式左侧符号）为n，则push状态n到栈顶
  - 若为ACCEPT，则接受输入
SLR(1) Grammar
- 任一项集的状态转移不含归约-归约或移进-归约冲突
  - LR(0)项目集中存在归约-归约冲突时，两个$\mathbf{FOLLOW}$集的交集为空
  - LR(0)项目集中存在移进-归约冲突时，移进的终结符号不在归约的$\mathbf{FOLLOW}$集中
SLR(1) Parsing
- 扩展文法、构造集族同LR(0)
- 构造SLR(1)分析表时：
  - 对$\mathbf{A} \rightarrow \mathbf{\beta} \cdot$ 归约时, 只对$\mathbf{FOLLOW(A)}$中的终结符进行归约
  - 对$\mathbf{S'} \rightarrow \mathbf{S'}$ 赋值ACCEPT时, 只对$\$$接受
- 使用栈和输入缓冲区进行SLR(1)解析
LR(1) Grammar
- L: 从左到右扫描输入；R：最右派生，最左归约；1：提前看一个输入符号
- LR(1)的项由两部分组成：LR(0)的项和一个lookahead符号
- 任一项集的状态转移不含归约-归约或移进-归约冲突
  - 无归约-归约冲突：同一状态下如果有多个归约，则前瞻符号不相交
  - 无移进-归约冲突：同一状态下如果同时有移进和归约，则移进的终结符号不在归约的前瞻符号中
LR(1) Parsing
- 扩展文法
- 通过$\mathbf{GOTO}$和$\mathbf{CLOSURE}$构造LR(1)项集族
- 构造LR(1)分析表时：
  - 对$[\mathbf{A} \rightarrow \mathbf{\beta} \cdot, a]$ 归约时, 只对$\mathbf{a}$进行归约
  - 对$[\mathbf{S'} \rightarrow \mathbf{S} \cdot, \$]$ 赋值ACCEPT时, 只对$\$$接受
- 使用栈和输入缓冲区进行LR(1)解析
LALR(1) Grammar
- LA: LookAhead；L: 从左到右扫描输入；R：最右派生，最左归约；1：提前看一个输入符号
- LALR(1)的项集族是LR(1)的项集族的同心集合并
- 任一项集的状态转移不含归约-归约或移进-归约冲突
LALR(1) Parsing
- 扩展文法、构造集族同LR(0)作为心
- 对每个状态的初始项的心，用占位符#作为前瞻符号，通过闭包计算自发生成的前瞻符号和传递关系
- 对于传递关系，保留每个状态的内核项，构造传播表
- 根据传播表和自发生成的前瞻符号，通过若干轮传播得到每个心最终对应的前瞻符号，得到归约内核项的[心，前瞻符号(若干个)]对
- 构造LALR(1)分析表，与LR(1)相同
- 使用栈和输入缓冲区进行LALR(1)解析

Ch1 Intro

编译流程

解释器 vs. 编译器 (Interpreter vs. Compiler)
- 解释器方便错误诊断 (Error Diagnosis)
- 编译器得到的代码更加高效

编译器

前端：词法分析(Lexical Analysis)、语法分析(Syntax Analysis)、语义分析(Semantic Analysis)、中间代码生成(Intermediate Code Generation)
后端：代码优化(Optimization)、目标代码生成(Code Generation)

词法分析：Lexical Analysis

Lexical Analysis: Scanning （词法分析：扫描）
- Lexical Analyzer: Scanner （词法分析器：扫描器）
- Recognize Words (Lexemes) -> Tokens & Symbol Table （识别单词（词素）-> 生成标记和符号表）
- Token: <token-name, attribute-value (opt.)>
  - token-name: id, number, keywords, operators, etc. （标记名：标识符、数字、关键字、运算符等）
  - attribute-value: a pointer to the symbol table （属性值：指向符号表的指针）

语法分析：Syntax Analysis

Syntax Analysis: Parsing （语法分析：解析）
- Syntax Analyzer: Parser （语法分析器：解析器）
- Produce the Grammatical Structure （生成语法结构）
  - The Relationships among Tokens （标记之间的关系）
  - Tree-like Intermediate Representation （树状中间表示）
- Parse Tree （解析树）
  - Parser generates the Parse Tree, from which produces the Syntax Tree （解析器生成解析树，从中生成语法树）
- Syntax Tree: Simplified Parse Tree （语法树：简化的解析树）
  - Interior Node: Operation （内部节点：操作）
  - Children: Arguments of the Operation （子节点：操作的参数）

词法分析 vs. 语法分析

做着相似的事情：处理字符串
词法分析 (Scanning)：拆分字符串成Lexemes，并抽象成Tokens
语法分析 (Parsing)：整理Tokens的逻辑关系

语义分析：Semantic Analysis

Semantic Analysis （语义分析）
- computing additional info needed for compilation （计算编译所需的附加信息）
  - which is not regarded as syntax （这些信息不被视为语法）
- checking source code semantic consistency with the language definition （检查源代码与语言定义的语义一致性）
  - Type Checking （类型检查）

中间代码生成：Intermediate Code Generation

Intermediate Code Generation （中间代码生成）
- Intermediate Representations (IR), e.g., Syntax Tree, etc. （中间表示，例如语法树等）
- Low-level or Machine-like IR （低级或类似机器的中间表示）
  - LLVM-IR (LLVM, Clang), Gimple (GCC), etc. （LLVM-IR（LLVM，Clang），Gimple（GCC）等）
  - Three-address Code, with at Most Three Operands per Instruction （三地址码，每条指令最多三个操作数）
  - Static Single Assignment (SSA) （静态单赋值）
    - Every variable is only assigned (defined) once and defined before used. （每个变量只能被赋值（定义）一次，并且在使用前必须定义。）

优化：Optimization

Optimization （优化）
- Improve the IR for Better Target Code （改进中间表示以生成更好的目标代码）
- Better: Faster, Smaller, Greener （更好：更快、更小、更环保）

目标代码生成：Target Code Generation

Target Code Generation （目标代码生成）
- Instruction Selection （指令选择）
  - RISC vs. CISC
    - RISC: Reduced Instruction Set Computer （精简指令集计算机）
    - CISC: Complex Instruction Set Computer （复杂指令集计算机）
  - Intel Manual: > 6000 Pages （英特尔手册：超过6000页）
- Register Allocation （寄存器分配）
  - Graph Coloring Problem （图着色问题）
- Evaluation Order （计算顺序）
  - Arrange the Computation Order for Less Register Occupation （安排计算顺序以减少寄存器占用）
  - NPC (Non-Polynomial Complete) Problem （非多项式完全问题）

Ch2 Syntax Definition

文法的定义：Definition of Grammars

定义

Grammar: a Set of Rules to Describe a Language.
Language: a Sorted Set of Strings over some fixed Alphabet.
关系：Grammar Abstracts the Language to Cover All Its Strings. （文法是对语言的抽象，覆盖了语言的所有字符串）
∅: Empty Language (Set of Strings)
𝜺: Empty String (Set of Symbols)

组成

Grammar G[S] = (VN, VT, P, S)
- VT: A set of Terminal Symbols (终结符)
  - Atomic: 基本符号，不可再分
- VN : A set of Non-terminals (非终结符)
  - A Non-terminal: a set of strings of Terminals
- P: A set of Productions (生成式、规则)
  - A Non-terminal, an Arrow, a sequence of Terminals and/or Non-terminals
- S: A Start Symbol (开始符)
  - A Non-terminal

推导：Derivations

定义

A Grammar derives strings by beginning with the Start Symbol and repeatedly replacing a Non-terminal with Terminals via its Productions. （文法通过从开始符开始，反复用生成式将非终结符替换为终结符来派生字符串）
- 推导（Derivation）：反复根据生成规则用终结符替换非终结符
- 归约（Reduction）：推导的反过程

语法分析（Syntax Analysis, Parsing）

given a sequence of Terminals, figure out Whether it can be Derived from the Grammar and How if possible. （给定一个终结符序列，判断它是否可以由文法推导而来，如果可以，推导过程是什么）
一个语言可能有多个文法描述，而一个文法只会派生一个唯一语言

文法的二义性：Ambiguity

定义

语法树（Parse Tree）: A Graphical Representation of a Derivation without the Order of Applying Productions. （语法树是一个图形表示，表示了一个推导过程，但不考虑生成式应用的顺序）
二义性（Ambiguity）: When Parsing, given a sequence of Terminals, a Grammar is Ambiguous if there are more than one Parse Tree for the Derivation.（二义性是指一个文法可以产生多棵语法树） ### Fix the Grammar
Example:
- stmt -> if expr then stmt | if expr then stmt else stmt | other
- if E1 then if E2 then S1 else S2
Match each else with the closest unmatched then
- the statement appearing between a then and an else must be “matched”
- the interior statement must not end with an unmatched (open) then
Listing All Cases then Tidying Them Up
- if expr then matched_stmt
- if expr then open_stmt
- if expr then matched_stmt else matched_stmt
- if expr then matched_stmt else open_stmt
- so that:
  - matched_stmt -> if expr then matched_stmt else matched_stmt | other
  - open_stmt -> if expr then matched_stmt | if expr then open_stmt | if expr then matched_stmt else open_stm
- finally:
  - stmt -> matched_stmt | open_stmt
  - matched_stmt -> if expr then matched_stmt else matched_stmt | other
  - open_stmt -> if expr then stmt | if expr then matched_stmt else open_stmt

文法和语言的分类：Classes of Languages

范围由 Type 0 到 Type 3 逐渐缩小

Ch3 Scanning

词法分析：Lexical Analysis

Token, Pattern, and Lexemes

The Analysis Partitions Input String into Substrings. （分析将输入字符串划分为子字符串。）
Token: <token-name, attribute-value(opt.)>
- token-name: the role of lexical unit （词法单元的角色）
  - often refer to Token by its token-name
- attribute-value: any info associated to the Token （与Token相关的任何信息）
  - Generally, it has only ONE value: a pointer to the Symbol Table （通常只有一个值：指向符号表的指针）
    - In practice, the value of a constant can be stored as the attribute. （在实践中，常量的值可以作为属性存储。）
      - constant: strings, numbers （常量：字符串、数字）
Pattern: description of the form lexemes of a token may （描述词法单元的形式）
- regular expression: , ?, *, +, …
Lexeme: a sequence of characters matches a token’s pattern （与模式匹配的字符序列）
- Token vs. Lexeme: Class vs. Instance in C++

Specification of Tokens

Strings and Languages

String: a finite sequence of Symbols from an Alphabet （字符串：来自字母表的有限符号序列）
Language: any countable set of Strings （语言：任何可数的字符串集合）
Terms for Parts of a String s （字符串s的部分术语）:
- prefix：前缀
  - any string obtained by removing zero or more symbols from the end of s
- suffix：后缀
  - any string obtained by removing zero or more symbols from the beginning of s
- substring：子串
  - any string obtained by removing any prefix or any suffix from s （去掉前缀或后缀）
- subsequence：子序列
  - any string obtained by removing zero or more not necessarily consecutive position of s （不一定连续的位置）

Operations on Languages

Union: （并集）
- $L_1 ∪ L_2 = \{x | x ∈ L_1 \quad or \quad x ∈ L_2\}$
Concatenation: （连接）
- $L_1L_2 = \{xy | x ∈ L_1 \quad and \quad y ∈ L_2\}$
Kleene closure: （星闭包）
- $L^* = \bigcup_{1=0}^\infty L^i \\= {𝜖} ∪ L ∪ L^2 ∪ L^3 ∪ ... \\= \{x | x = x_1x_2...x_n, n ≥ 0, xi ∈ L\}$
- 𝜖 is the empty string
Positive closure: （正闭包）
- $L^+ = \bigcup_{1=1}^\infty L^i \\= LL^* \\= \{x | x = x_1x_2...x_n, n ≥ 1, xi ∈ L\}$

Regular Expressions

定义

Regular Expression (Regex): a way to describe Patterns of Tokens of a programming language. （正则表达式：描述编程语言的词法单元模式的一种方式）
- Each Regular Expression r denotes a Language L$(r)$. （每个正则表达式r表示一个语言L$(r)$）
  - Regular Language, Type-3 Language
- The Regular Expressions are built recursively out of smaller ones, using the rules. （正则表达式是用规则递归构建的）

语法

BASIS （基础）
- 𝜀 is a regular expression, and L(𝜀) = {𝜀}, the empty set. （空字符）
- a is a symbol in a set Σ, then a is a regular expression, and L(a) = {a}. （集合中的字符）
INDUCTION （归纳）：Suppose $r$ and $s$ are expressions denoting $L(r)$ and $L(s)$
- $r | s$ : a regular expression denoting $L(r) ∪ L(s)$
- $rs$ : a regular expression denoting $L(r)L(s)$
- $r^*$ : a regular expression denoting $(L(r))^*$
- $(r)$ : a regular expression denoting $L(r)$
  - We can add additional brackets around expressions. （我们可以在表达式周围添加额外的括号。）
定律：

Regular Definitions

Regular Definition: a set of productions with non-terminals derived by regular expressions. （正则定义：一组通过正则表达式派生的非终结符的产生式）

Extensions of Regex

$+$ : one or more instances
- $r^*$ = $r^+$ | 𝜖
- $r^+$ = $rr^*$ = $r^*r$
$?$ : zero or one instance
- $r?$ = $r$ | 𝜖
$[\quad]$ : character classes
- $[abc]$ = $a$ | $b$ | $c$
- $[a-z]$ = $a$ | $b$ | … | $z$

Regular Language / Grammar, and Regex

Regular Expression $r$ denotes a Language $L(r)$.
- Regular Language is Type-3 Language （正则语言是类型3语言）
- Regular Language is that denoted by Regular Expressions. （正则语言是由正则表达式表示的）
Regular Grammar is the grammar describes a Regular Language.
- with the production form of $\mathbf{A}→\alpha$ or $\mathbf{A}→\alpha\mathbf{B}$

Recognition of Tokens

Input Buffer

Transition Diagrams

Transition Diagram = Nodes + Edges, a Flowchart （状态流程图 = 节点 + 边）
- Nodes: states, conditions that could occur when looking for a lexeme that matches one pattern. （节点：状态）
  - States: Circles （状态：圆圈）
  - Start State: Arrowhead, Beginning of a Pattern （起始状态：箭头，开始）
  - End State(s): Double Circles, End of a Pattern （终止状态：双圆圈，结束）
- Edge: actions, taken to transit from one State to Another. （边：动作）
  - labeled by a Symbol or a set of Symbols for matching （标记为符号或符号集以进行匹配）
- Deterministic: at most ONE edge out of a given state with a given label. （确定性：在给定状态下，最多有一条边）
Example:
- *(Retract): A Token has been accepted while another char has been read which must be unread. （回退：一个Token已经被接受，而另一个不应读取字符已经被读取，必须回退）

Reserved Words （保留字）

Keywords look like Identifiers.
- if, then, …
Add reserved words into symbol table initially. （在符号表中添加保留字）
Create separate transition diagrams for each keyword. （为每个关键字创建单独的转换图）
- thenextone

有穷自动机：Finite Automata

Finite Automata

What: an Abstract Machine that can be in exactly one of a Finite number of States at any given time. （有限自动机：在任何给定时间只能处于有限数量的状态之一的抽象机器）
- Finite Automation = Finite-state Automation (FSA, plural: automata) （有限自动机 = 有限状态自动机）
  - Finite-state Machine (FSM), or simply State Machine （有限状态机，或简单地称为状态机）
- changes from one state to another according to Inputs, called Transition （根据输入，从一个状态变化到另一个状态，称为转换）
Why: used as the Recognizer for Scanning, identifying Tokens （用于扫描的识别器，识别token）
How: answers “YES” or “NO” about each input String （如何：对每个输入字符串回答“是”或“否”）
- determines whether the String is valid for the given Grammar （确定字符串是否符合给定的语法）

DFA vs. NFA

FA: Deterministic (DFA) or Non-deterministic (NFA)
DFA: have exactly/at most one action for each input symbol （每个输入符号有一个动作）
- can be represented with a Transition Diagram
- Recognition with DFA: Faster, may take More Space （识别DFA：更快，可能占用更多空间）
- complex to represent Regex, but more Precise, widely used （复杂表示正则表达式，但更精确，广泛使用）
NFA: can have multiple actions for the same input symbol （同一输入符号可以有多个动作）
- can be represented with a Transition Graph
- Recognition with NFA: Slower, may take Less Space （识别NFA：较慢，可能占用更少的空间）
- simply represents Regex, but less Precise （简单表示正则表达式，但不够精确）
Example:
Lexical Analysis Workflow with FA:
- Regex -> NFA -> DFA
- Regex -> DFA

Nondeterministic Finite Automata (NFA)

An NFA M = $\mathbf{(S, 𝜮, move, 𝒔_0, F)}$ consists of:
- $\mathbf{S}$: a finite set of States （有限状态集）
- $\mathbf{𝜮}$: the Input Alphabet, excluding 𝜖 （不包含𝜖的输入符号集合）
- $\mathbf{move}$, a Transition Function （转换函数）
  - move(State, Symbol) = set of Next States
  - move: $𝑆×(Σ ∪ \{𝜖\}) ⟶ ℙ(𝑆)$
- $\mathbf{s_0} ∈ \mathbf{S}$, the Start State (or Initial State) （起始状态）
- $\mathbf{F} ⊆ \mathbf{S}$, a set of Accepting States (or Final States) （终止状态）
An NFA accepts Input String $s$ iff
- there exists some path in the Transition Graph from the Start State to one Accepting State, （存在一条路径从起始状态到一个接受状态）
- such that symbols along the path spell out $s$ （路径上的符号拼写出$s$）
Transition Tables：rows for States, columns for Input Symbols and 𝝐
- Example:

Deterministic Finite Automata (DFA)

What: a Special Case of an NFA, where
- there are no moves on symbol 𝜖, and
- for each state s and input symbol a, there is Exactly ONE edge out of s labeled by a. （每个状态s和输入符号a，恰好有一条边出s标记为a）
  - COMPLETE: It defines from each state a transition for each input symbol. （完整：它定义了从每个状态到每个输入符号的转换）
    - Transition function is a total function.
  - Local Automation: DFA not necessarily complete (… At Most ONE edge …) （局部自动机：DFA不一定是完全图）
    - Transition function is a partial function.

Algorithm for Simulation

Conversion: NFA –> DFA

Subset Construction （子集构造）
- removing 𝜖-transitions
- combining multiple NFA’s states into ONE constructed DFA’s state （将多个NFA的状态组合成一个构造的DFA的状态，即：等势点合并）
Definitions:
- 𝜖-closure(s):
  - s: some State
  - = Set of NFA States reached by state s via 𝜖-transitions, including s itself. （NFA中可以通过若干个空变换到达的状态的集合）
- 𝜖-closure(T):
  - T: set of States
  - = $∪_{s∈T}$ 𝜖-closure(s)
- move(T, a):
  - T: set of States
  - a: Input Symbol
  - = NFA’s States reached by 𝑠 ∈ 𝑇 on a.

Algorithm Subset Construction

Input: the start State s0 and the Transition Diagram of NFA N.

Output: Transition Graph of DFA Dtran

add 𝝐-closure(s0) into Dstates //将初始状态s0的𝝐闭包加入Dstates
while (Dstates has unsearched state S) { //当Dstates有未搜索的状态S时
    tag S as searched //将S标记为已搜索
    foreach input symbol a { //对每个输入符号a
        U = 𝝐-closure(move(S, a)) //设U为S进行a动作后状态S'的𝝐闭包
        if (U is new to Dstates) { //如果U是Dstates中的新状态
            add U into Dstates as unsearched //将U加入Dstates并标记为未搜索
        }
        Dtran(S, a) = U //将Dtran(S, a)设为U
    }
}

最后得到的Dtran是一个DFA的转换表，Dstates是DFA的状态集合。

Algorithm 𝜖-closure(T) Computation:

上一步中U = 𝝐-closure(move(S, a))的实现逻辑：
Input: the State Set T

Output: 𝜖-closure(T)

push all states in T onto Stack //将T中的所有状态压入栈中
while (Stack is not empty) { //当栈不为空时
    s = Stack.pop() //弹出栈顶元素s
    foreach (state u reached by s via 𝜖) { //对于每个s通过𝜖能达到的状态u
        if (u is not in 𝜖-closure(T)) { //如果u不在T的𝜖闭包中
            add u into 𝜖-closure(T) // 将其加入T的𝜖闭包
            Stack.push(u) //将u压入栈中
        }
    }
}

Example:

Conversion: Regex –> NFA

McNaughton-Yamada-Thompson Algorithm
Regex’s Definition:
- BASIS:
- INDUCTION:
Example:

Workflow

The Workflow of Lexical Analyzer
- Regex –> NFA Construction
- NFA –> DFA Construction
- Simulating DFA to Recognize Tokens
Convert Regex Directly into DFA: PASS
DFA Simplification: Minimizing the Number of States

DFA Simplification

What and Why:
- no REDUNDANT states （无冗余状态）
  - REDUNDANCE: the states that NO accepted input string’s path passes through （没有路径到达终止状态的状态）
  - (in the transition graph)
- no EQUIVALENT states （无等效状态）
  - EQUIVALENCE: states with the SAME side effects （具有相同副作用的状态）
  - (making the states indistinguishable)
- Distinguish States via Input String （通过输入字符串区分状态）
  - State: s, t
  - String: x
  - x distinguishes s from t,
    - if one state can reach an accepting state via x, while the other cannot. （如果一个状态可以通过x到达终止状态，而另一个状态不能）
  - s is distinguishable from t,
    - if there is some string distinguishes them. （存在一些字符串可以区分它们）
  - Unify Indistinguishable States into One. （将不可区分的状态合并为一个）
How：
1. Start with the initial partition $\mathbf{Π}$ with two groups, the accepting and non-accepting states of the DFA. （将DFA的接受状态和非接受状态分为两个组）
2. Let $\mathbf{Π_{new} := Π}$. Then, for each group $\mathbf{G}$ of $\mathbf{Π}$: （初始时令$\mathbf{Π_{new} = Π}$，然后对于每个组G）
  - For each input symbol $\mathbf{a}$, states $\mathbf{s},\mathbf{t}$ in $\mathbf{G}$ are partitioned if they transit to different groups of $\mathbf{Π}$ via $\mathbf{a}$; （对于每个输入符号a，如果状态s和t通过a转移到不同的组，则他们被划分为不同的组）
  - Replace $\mathbf{G}$ in $\mathbf{Π_{new}}$ by the new subgroups. （用新的子组替换$\mathbf{G}$）
3. If $\mathbf{Π_{new}} ≠ \mathbf{Π}$, $\mathbf{Π: = Π_{new}}$ and repeat Step 2, Step 4 otherwise. （如果$\mathbf{Π_{new}} ≠ \mathbf{Π}$，则令$\mathbf{Π: = Π_{new}}$并重复步骤2，否则跳到步骤4）
4. Aggregate the transitions among groups. （将组之间的转换聚合）
5. The resulting DFA is the minimized DFA. （得到的DFA是最小化的DFA）
Example:

Ch4 Parsing

词法分析：Lexical Analysis

Parser

What: Given Tokens, Parsing Verifies whether the Token Names Can Be Generated by the Grammar for the Source Language. （给定tokens，解析会验证token名称是否可以由源语言的语法生成）
Why: We expect the Parser
- to report Syntax Errors （报告语法错误）
- to recover from Errors to continue following processes. （从错误中恢复以继续后续过程）
How: Derivation or Reduction
- Top-down and Bottom-up Parsing

Compiler Errors

Lexical Errors
- The string does not match the pattern of any token. （字符串与任何token的模式不匹配）
Syntactic Errors
- The string does not meet the requirements of the grammar. （字符串不符合语法要求）
Semantic Errors: Type Mismatching （语义错误：类型不匹配）

Error-Recovery Strategies

Panic-Mode Recovery
- Discarding input symbols one at a time until meeting synchronizing tokens （丢弃输入符号，直到遇到同步tokens）
  - synchronizing tokens, e.g. “;”, “}”, etc., decided by designers
- simple, but may cause more errors
Phrase-Level Recovery
- Local Correction on Input, allowing the parser to continue （允许解析器继续）
  - e.g., “,” → “;”, delete/insert “;”
  - designers’ responsibility
- helpless if error occurs before the point of detection
Error Productions
- augmenting grammar with productions generating erroneous constructs （用产生错误构造的产生式扩充语法）
- relying on designers
Global Correction
- choosing a minimal sequence of changes for a globally least-cost correction （选择一系列最小的变化，以实现全局最低成本的修正）
- costly, yardstick? should defined by designers

Context-Free Grammar (CFG)

Definition

A Context-Free Grammar consists of:
- Terminals
- Non-terminals
- Start Symbol
- Productions: $\mathbf{A} \rightarrow \mathbf{\alpha}$
  - Header / Left Side → Body / Right Side
    - Header: A Non-terminal
    - Body: zero or more Terminals or Non-terminals
  - $\mathbf{V_N} \rightarrow (\mathbf{V_T}|\mathbf{V_N})^*$

Derivation

What: Beginning with the Start Symbol, replace a Non-terminal by the body of one of its Production. （从起始符号开始，用其产生式的右侧替换非终结符）
Why:
- corresponding to Top-down Construction of a Parse Tree （对应于自顶向下构造解析树）
- helpful for Bottom-up Parsing
How:
- $$ : derive in one step
- $ $ : derive in zero or more steps
  - $\alpha \overset{*}{\Rightarrow} \alpha$
  - if $\alpha \overset{*}{\Rightarrow} \beta$ and $\beta \overset{*}{\Rightarrow} \gamma$, then $\alpha \overset{*}{\Rightarrow} \gamma$
- $ $ : derive in one or more steps
$\mathbf{S} \overset{*}{\Rightarrow} \alpha$
- $\alpha$ is a Sentential Form of S. （$\alpha$是S的一个句子形式）
- $\alpha$ may contain Terminals, Non-terminals, or may be Empty.
- Sentence: a Sentential Form without Non-terminals. （没有非终结符的句子形式）
What is a Language?
- $L(G)$: set of Sentences generated by the Grammar $G$
- A string of terminals $ L(G) $ if and only if $ $ is a Sentence of $ G $ (or $ S $).
Example:
Leftmost Derivation: the Leftmost Non-terminal is always replaced at first （最左边的非终结符总是第一个被替换）
- $ $
Rightmost Derivation: $ $
Parse Trees and Derivations
- What: A Graphical Representation of a Derivation （派生的图形表示）
  - filtering out the order in which productions applied to replace non-terminals （过滤出应用于替换非终结符的产生式的顺序）
- Example：

Top-Down Parsing

What: Create the Parse Tree from Top to Bottom （从上到下创建解析树）
- from root to leaves （从根到叶）
Why: for Parsing
How: Derive an Input String in the Leftmost Manner （以最左边的方式派生输入字符串）
- consistent with string scanning （与字符串扫描一致）
- Key: determine the production to be applied for a non-terminal （确定要应用于非终结符的产生式）
- Recursive-Descent Parsing
  - require backtracking to find right production （需要回溯以找到正确的产生式）
  - general, but inefficient
- Predictive Parsing
  - a special case of Recursive-Descent Parsing
  - no backtracking, choosing by looking ahead at input symbols （无需回溯，通过提前查看输入符号进行选择）

Recursive-Descent Parsing

void A() {
    // Choose an A-production, A --> X1 X2 ...Xk;
    for (i = 1 to k) {
        if (Xi is a non-terminal)
            call Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else ... // an error has occurred
    }
}

Example:

Left Recursion （左递归）

What: The Grammar has a non-terminal A such that there exists a derivation $\mathbf{A} \overset{+}{\Rightarrow} \mathbf{A} \alpha$ （文法有一个非终结符A，使得存在一个派生$\mathbf{A} \overset{+}{\Rightarrow} \mathbf{A} \alpha$）
Why: Recursive-Descent Parsing cannot handle Left Recursion. （递归下降解析无法处理左递归）
How: Transform the grammar to eliminate Left Recursion. （转换文法以消除左递归）
- ==Immediate Elimination: $\mathbf{A} \rightarrow \mathbf{A} \alpha | \beta \quad \Rightarrow \mathbf{A} \rightarrow \beta \mathbf{A'} , \mathbf{A'} \rightarrow \alpha \mathbf{A'} | 𝝐$==
- $ _1 | | _m | _1 | | _n (_1 | | _n) , (_1 | | _m) | $
Example:

Left Recursion Elimination （消除左递归）

INPUT: Grammar G without Cycles or 𝝐-productions （无循环或𝝐产生式的文法）
- Cycle: $\mathbf{A} \overset{+}{\Rightarrow} \mathbf{A}$
- 𝝐-production: $\mathbf{A} \rightarrow \epsilon$
OUTPUT: Equivalent Grammar without Left Recursions （没有左递归的等效文法）

Steps:

supposing there are the Non-terminals with Order A1, A2..., An
for (i from 1 to n) {
    for (j from 1 to i-1) {
        replace each Ai → Aj𝛾 by Ai → 𝛿1𝛾 | 𝛿2𝛾 | ... | 𝛿𝑘𝛾, where:
            Aj  → 𝛿1 | 𝛿2 | ... | 𝛿𝑘 are all current Aj-productions
    }
    eliminate the Immediate Left Recursion among the Ai-productions
}

Example:

Predictive Parsing

What: recursive-descent parsers needing no backtracking （不需要回溯的递归下降解析器）
- can be constructed for LL(k) grammar （可以为LL(k)文法构造）
  - L: scanning input from Left to right
  - L: producing a Leftmost derivation
  - k: using k input symbols of lookahead at each step to make decision
Why: a unique production to apply, or none to use (error)
Example: LL(1)
- stmt → if (expr) stmt else stmt | while (expr) stmt | { stmt_list}
How: FIRST and FOLLOW
- assist in choosing which production to apply, based on the next input symbol （根据下一个输入符号选择应用哪个产生式）

FIRST and FOLLOW

FIRST: the set of terminals that begin strings derived from a non-terminal or a string of grammar symbols. （从非终结符或语法符号字符串派生的字符串开始的终结符集合）
- FIRST(𝜶): what the first symbol would be for 𝜶
- 𝜶: string of grammar symbols
- return: set of terminals that begin strings derived from 𝜶
  - first symbols of strings derived from 𝜶
- HOW：
  - if $\mathbf{X}$ is a terminal, then $\mathbf{FIRST(X)} = \{\mathbf{X}\}$
  - if $\mathbf{X}$ is a non-terminal, and $\mathbf{X} \rightarrow \mathbf{Y_1} \mathbf{Y_2} \dots \mathbf{Y_k}$
    - add all non-𝝐 symbols of $\mathbf{Y_1}$ to $\mathbf{FIRST(X)}$
    - add all non-𝝐 symbols of $\mathbf{Y_2}$ to $\mathbf{FIRST(X)}$, if $\epsilon \in \mathbf{FIRST(Y_1)}$
    - ···
    - add all non-𝝐 symbols of $\mathbf{Y_k}$ to $\mathbf{FIRST(X)}$, if $\epsilon \in \mathbf{FIRST(Y_1)}$ and $\epsilon \in \mathbf{FIRST(Y_2)}$ and ··· and $\epsilon \in \mathbf{FIRST(Y_{k-1})}$
    - add 𝝐 to $\mathbf{FIRST(X)}$, if $\epsilon \in \mathbf{FIRST(Y_1)}$ and $\epsilon \in \mathbf{FIRST(Y_2)}$ and ··· and $\epsilon \in \mathbf{FIRST(Y_k)}$
- Example:
FOLLOW: the set of terminals that can appear immediately to the right of a non-terminal in some sentential form. （在某些句子形式中，可以出现在非终结符右侧的终结符集合）
- FOLLOW(N): what is the next symbol of N
- N: a non-terminal
- return: set of terminals can appear immediately after N in a sentential form
- HOW:
  - place $ in $\mathbf{FOLLOW(S)}$
    - S: start symbol
    - $: input right end-marker
  - for each production $\mathbf{M} \rightarrow \alpha \mathbf{N} \beta$
    - add all symbols in $\mathbf{FIRST(\beta)}$ to $\mathbf{FOLLOW(N)}$, except 𝝐
    - if $\epsilon \in \mathbf{FIRST(\beta)}$, add all symbols in $\mathbf{FOLLOW(M)}$ to $\mathbf{FOLLOW(N)}$
  - for each production $\mathbf{M} \rightarrow \alpha \mathbf{N}$
    - add all symbols in $\mathbf{FOLLOW(M)}$ to $\mathbf{FOLLOW(N)}$
- Example:

LL(1) Grammar

What: Any $\mathbf{A} \rightarrow \mathbf{\alpha} \ | \ \mathbf{\beta}$ Represents two Distinct Productions （$\mathbf{A} \rightarrow \mathbf{\alpha} \ | \ \mathbf{\beta}$表示两个不同的产生式）
- $\mathbf{FIRST(\alpha)}$ and $\mathbf{FIRST(\beta)}$ are disjoint sets.（二者是互不相交的集合）
  - For no terminal $\mathbf{a}$, do both $\mathbf{\alpha}$ and $\mathbf{\beta}$ derive strings beginning with $\mathbf{a}$. （二者不会同时派生以$\mathbf{a}$开头的字符串）
  - At most one of $\mathbf{\alpha}$ and $\mathbf{\beta}$ can derive the empty string. （至多一个$\mathbf{\alpha}$和$\mathbf{\beta}$可以派生空字符串）
- If $\mathbf{\epsilon} \in \mathbf{FIRST(\beta)}$, then $\mathbf{FIRST(\alpha)}$ and $\mathbf{FOLLOW(A)}$ are disjoint sets. （如果$\mathbf{\epsilon} \in \mathbf{FIRST(\beta)}$，则$\mathbf{FIRST(\alpha)}$和$\mathbf{FOLLOW(A)}$是互不相交的集合）
Why: Proper Production is Selected by Looking ONLY at the Next Input Symbol. （通过仅查看下一个输入符号来选择适当的产生式）
How: By Parsing Table, a Two-Dimensional Array
- $\mathbf{M[A, a]} = \mathbf{\alpha}$, when deriving $\mathbf{A}$, apply $\mathbf{A} \rightarrow \mathbf{\alpha}$ if coming up with $\mathbf{a}$. （当派生$\mathbf{A}$时，如果出现$\mathbf{a}$，则应用$\mathbf{A} \rightarrow \mathbf{\alpha}$）

LL(1) Parsing

Predictive Parsing Table Construction
- INPUT: Grammar G.
- OUTPUT: Parsing Table M.
- STEPS: For each production $\mathbf{A} \rightarrow \mathbf{\alpha}$:
  1. For each terminal $\mathbf{a} \in \mathbf{FIRST(\alpha)}$, add $\mathbf{A} \rightarrow \mathbf{\alpha}$ to $\mathbf{M[A, a]}$.
  2. If $\mathbf{\epsilon} \in \mathbf{FIRST(\alpha)}$, then for each terminal $\mathbf{b} \in \mathbf{FOLLOW(A)}$, add $\mathbf{A} \rightarrow \mathbf{\alpha}$ to $\mathbf{M[A, b]}$.
  3. If $\mathbf{\epsilon} \in \mathbf{FIRST(\alpha)}$ and $\mathbf{\$} \in \mathbf{FOLLOW(A)}$, add $\mathbf{A} \rightarrow \mathbf{\alpha}$ to $\mathbf{M[A, \$]}$.
- Example:
Implementation
- Stack-based Method, mimicking a leftmost derivation （基于栈的方法，模仿最左派生）

Left Factoring

When the decision is not clear, defer it until seeing enough symbols. （当决策不明确时，推迟到看到足够的符号为止）
Left-Factored:
- $\mathbf{A} \rightarrow \mathbf{𝜶} \mathbf{𝜷_1} | \mathbf{𝜶} \mathbf{𝜷_2}\quad by \quad \mathbf{A} \rightarrow \mathbf{𝜶} \mathbf{A'}, \mathbf{A'} \rightarrow \mathbf{𝜷_1} | \mathbf{𝜷_2}$
How:
- For each non-terminal $\mathbf{A}$, find the longest common prefix $\mathbf{𝜶}$ of its alternatives.
- If $\mathbf{𝜶}$ is not empty, replace all of the $\mathbf{A}$-productions
Example:
- $\mathbf{A} \rightarrow \mathbf{𝜶} \mathbf{𝜷_1} | \mathbf{𝜶} \mathbf{𝜷_2} | ... | \mathbf{𝜶} \mathbf{𝜷_n} | \mathbf{𝜸}\quad \quad by \quad \quad \mathbf{A} \rightarrow \mathbf{𝜶} \mathbf{A'} | 𝜸, \mathbf{A'} \rightarrow \mathbf{𝜷_1} | \mathbf{𝜷_2} | ... | \mathbf{𝜷_n}$

Non-LL(1) Grammar

Non-LL(1) Grammars:
- grammars with Left Recursion （左递归的文法）
- grammars not Left Factored （未左因子化的文法）
- grammars with Ambiguity （歧义文法）
Thus, before Predictive Parsing,
- perform Left Factoring （左因子化）
- eliminate Left Recursion (prerequisite of the elimination?) （消除左递归）
- remove Ambiguity （消除歧义）

Bottom-Up Parsing

What: the construction of a parse tree beginning at the leaves and working up to the root （从叶子开始构建解析树，直到根）
Why: not all grammars can be made LL(1)
How: construct rightmost derivation in the reverse order
- Reduction: “Reversed Derivation”（归约：反向派生）
  - from the string to the start symbol （从字符串到起始符号）
  - The body of a production is replaced by the non-terminal at its header. （用产生式的头部替换产生式的主体）
- $S \overset{rm}{\Rightarrow} \gamma_0 \overset{rm}{\Rightarrow} \gamma_1 \overset{rm}{\Rightarrow} \dots \overset{rm}{\Rightarrow} \gamma_n \overset{rm}{\Rightarrow} \omega$
  - find the rightmost derivation in the reverse order: “leftmost reduction” （找到右侧派生的反向顺序：最左侧归约）
- “Left-to-Right, Rightmost Derivation in Reverse”: LR Parsing
Example：
- $E \Rightarrow T \Rightarrow T * F \Rightarrow T * id \Rightarrow F * id \Rightarrow id * id$

Handles

What: if $\mathbf{S} \overset{*}{\Rightarrow} \mathbf{\alpha A \omega} \overset{}{\Rightarrow} \mathbf{\alpha \beta \omega}$, then production $\mathbf{A \rightarrow \beta}$ in the position following $\mathbf{\alpha}$ is a handle of $\mathbf{\alpha \beta \omega}$.
- a substring that matches the body of a production, representing a step of reduction （与产生式的主体匹配的子字符串，表示归约的一步）
- a pair of values: (production, position)
Why: handle pruning for bottom-up parsing （处理底向上解析的剪枝）
- identify handles and reduce them to the appropriate leftmost non-terminals （识别句柄并将其归约到适当的最左非终结符）
How: by using a stack to keep track of the current position in the input string and the corresponding production rules （使用栈来跟踪输入字符串中的当前位置和相应的产生式规则）
Example：

Shift-Reduce Parsing （移进归约解析）

How: Stack + Input Buffer
- Stack: reduced Grammar Symbols
- Input Buffer: rest of the String to be parsed
Actions
- Shift: move the next symbol onto stack
- Reduce: replace the handle on the top of stack
- Accept: announce the success of parsing
- Error: discover syntax errors, and call for recovery
Example:

Operator-Precedence Parsing （运算符优先解析）

What: a shift-reduce parser handling operator-precedence grammar （处理运算符优先文法的移位归约解析器）
- operator-precedence grammar: a subset of LR(1) grammar （运算符优先文法：LR(1)文法的一个子集）
- for each Production:
  - no 𝝐 in the body （主体中没有空字符）
  - no two consecutive non-terminals in the body （在主体中没有两个连续的非终结符）
Why: to handle expressions with operator precedence and associativity （处理具有运算符优先级和结合性的表达式）
How: find handles according precedence
- Precedence （优先级）
  - $\mathbf{a} <· \mathbf{b}$: a’s precedence is lower than b’s
  - $\mathbf{a} =· \mathbf{b}$: … is equal to …
  - $\mathbf{a} >· \mathbf{b}$: … is higher than …
- Precedence Climbing Method （优先级爬升法）
  - scan the input from Left to Right until >· is encountered
  - then, scan backward until <· is encountered
  - that between <· and >· is the handle
- Implementation with STACK (栈实现)
  - let $\mathbf{a}$ be the top Terminal on the STACK （栈顶终结符）
  - let $\mathbf{b}$ be the INPUT Symbol under processing （正在处理的输入符号）
  - if $\mathbf{a} <· or =· \mathbf{b}$, Shift
  - else if $\mathbf{a} >· \mathbf{b}$, Reduce
Example:

OPP: Precedence Relation Construction （优先级关系构造）

Precedence Relation

If operator $\mathbf{a}$ has higher precedence than $\mathbf{b}$
- $\mathbf{a} >· \mathbf{b}$
- $\mathbf{b} <· \mathbf{a}$
If $\mathbf{a}$ and $\mathbf{b}$ has equal precedence
- if left-associative, then $\mathbf{a} >· \mathbf{b}$ and $\mathbf{b} >· \mathbf{a}$
- if right-associative, then $\mathbf{a} <· \mathbf{b}$ and $\mathbf{b} <· \mathbf{a}$
For all operator $\mathbf{a}$
- $\mathbf{a} <· id, \quad id >· \mathbf{a}$
- $\$ <· \mathbf{a}, \quad \mathbf{a} >· \$$
- $\mathbf{a} <· (,\quad ( <· \mathbf{a}, \quad \mathbf{a} >· ), \quad ) >· \mathbf{a}$
- $(=· )$

LEADING and TRAILING

LEADING: the set of symbols that can appear at the beginning of a string derived from a non-terminal （可以出现在从非终结符派生的字符串开头的符号集合）
- $\mathbf{LEADING(Q)} = \{\mathbf{Y}, \mathbf{N} \ | \ \mathbf{Q} \overset{+}{\Rightarrow} \mathbf{Y\delta} \ or\ \mathbf{Q} \overset{+}{\Rightarrow} \mathbf{N Y \delta}, \mathbf{N} \in \mathbf{V_n}, \mathbf{Y} \in \mathbf{V_t}\}$
- for $\mathbf{Q} \rightarrow \mathbf{Y \delta}$ or $\mathbf{Q} \rightarrow \mathbf{N Y \delta}$, we have:
  - $\mathbf{Y}, \mathbf{N} \in \mathbf{LEADING(Q)}$
  - $\mathbf{LEADING(N)} \subseteq \mathbf{LEADING(Q)}$
TRAILING: the set of symbols that can appear at the end of a string derived from a non-terminal （可以出现在从非终结符派生的字符串末尾的符号集合）
- $\mathbf{TRAILING(P)} = \{\mathbf{X}, \mathbf{N} \ | \ \mathbf{P} \overset{+}{\Rightarrow} \mu \mathbf{X} \ or \ \mathbf{P} \overset{+}{\Rightarrow} \mu \mathbf{X} \mathbf{N}, \mathbf{N} \in \mathbf{V_n}, \mathbf{X} \in \mathbf{V_t}\}$
- for $\mathbf{P} \rightarrow \mu \mathbf{X}$ or $\mathbf{P} \rightarrow \mu \mathbf{X} \mathbf{N}$, we have:
  - $\mathbf{X}, \mathbf{N} \in \mathbf{TRAILING(P)}$
  - $\mathbf{TRAILING(N)}\subseteq\mathbf{TRAILING(P)}$
Example:

Constructing Precedence Relations

\[ \mathbf{X} =· \mathbf{Y} \]
- if there is $\mathbf{A} \rightarrow \mathbf{\alpha X Y \beta}$, where $\mathbf{X}, \mathbf{Y} \in \mathbf{V}$, $\mathbf{\alpha}, \mathbf{\beta} \in \mathbf{V^*}$
- or, there is $\mathbf{A} \rightarrow \mathbf{\alpha X N Y \beta}$, where $\mathbf{N} \in \mathbf{V_n} \cup \{\mathbf{\epsilon}\}$, $\mathbf{X}, \mathbf{Y} \in \mathbf{V_t}$
  - adjacent symbols have equal precedence （相邻符号具有相等的优先级）
    - at least one of them is a terminal （至少有一个是终结符）
  - e.g., $\mathbf{E} + \mathbf{T}$, $\mathbf{T} * \mathbf{F}$, $\mathbf{E} += \mathbf{T}$
\[ \mathbf{X} <· \mathbf{Y} \]
- if there is $\mathbf{A} \rightarrow \mathbf{\alpha X Q \beta}$, and $\mathbf{Q} \overset{+}{\Rightarrow} \mathbf{Y \delta}$, where $\mathbf{X}, \mathbf{Y} \in \mathbf{V}$, $\mathbf{Q} \in \mathbf{V_n}$, $\mathbf{\alpha}, \mathbf{\beta}, \mathbf{\delta} \in \mathbf{V^*}$
- or, there is $\mathbf{A} \rightarrow \mathbf{\alpha X Q \beta}$, and $\mathbf{Q} \overset{+}{\Rightarrow} \mathbf{N Y \delta}$, where $\mathbf{N} \in \mathbf{V_n} \cup \{\mathbf{\epsilon}\}$, $\mathbf{X}, \mathbf{Y} \in \mathbf{V_t}$
  - $\mathbf{LEADING(Q)} = \{\mathbf{Y}, \mathbf{N} \ | \ \mathbf{Q} \overset{+}{\Rightarrow} \mathbf{Y\delta} \ or\ \mathbf{Q} \overset{+}{\Rightarrow} \mathbf{N Y \delta}\}$
    - for $\mathbf{A} \rightarrow \mathbf{\alpha X Q \beta}$, we have $\mathbf{X}$ <· Symbols in $\mathbf{LEADING(Q)}$
\[ \mathbf{X} >· \mathbf{Y} \]
- if there is $\mathbf{A} \rightarrow \alpha \mathbf{P} \mathbf{Y} \beta$, and $\mathbf{P} \overset{+}{\Rightarrow} \mu \mathbf{X}$, where $\mathbf{X}, \mathbf{Y} \in \mathbf{V}$, $\mathbf{P} \in \mathbf{V_n}$, $\alpha, \beta, \mu \in \mathbf{V}^*$
- or, there is $\mathbf{A} \rightarrow \alpha \mathbf{P} \mathbf{Y} \beta$, and $\mathbf{P} \overset{+}{\Rightarrow} \mu \mathbf{X} \mathbf{N}$, where $\mathbf{N} \in \mathbf{V_n} \cup \{\epsilon\}$, $\mathbf{P} \in \mathbf{V_n}$, $\mathbf{X}, \mathbf{Y} \in \mathbf{V_T}$
- $\mathbf{TRAILING(P)} = \{\mathbf{X}, \mathbf{N} \ | \ \mathbf{P} \overset{+}{\Rightarrow} \mu \mathbf{X} \ \text{or} \ \mathbf{P} \overset{+}{\Rightarrow} \mu \mathbf{X} \mathbf{N}\}$
  - for $\mathbf{A} \rightarrow \alpha \mathbf{P} \mathbf{Y} \beta$, we have Symbols in $\mathbf{TRAILING(P)} >· \mathbf{Y}$

Precedence Table

Steps：
- for each production $\mathbf{A} \rightarrow \mathbf{X_1 X_2 \dots X_k}$:
  - for each $\mathbf{X}$
    - if $\mathbf{X_i} \in \mathbf{V_t}$ and $\mathbf{X_{i+1}} \in \mathbf{V_t}$, then $\mathbf{X_i} =· \mathbf{X_{i+1}}$
    - if $\mathbf{X_i} \in \mathbf{V_t}$ and $\mathbf{X_{i+2}} \in \mathbf{V_t}$ and $\mathbf{X_{i+1}} \in \mathbf{V_n}$, then $\mathbf{X_i} =· \mathbf{X_{i+2}}$
    - if $\mathbf{X_i} \in \mathbf{V_t}$ and $\mathbf{X_{i+1}} \in \mathbf{V_n}$, then $\mathbf{X_i} <· \mathbf{LEADING(X_{i+1})}$
    - if $\mathbf{X_i} \in \mathbf{V_n}$ and $\mathbf{X_{i+1}} \in \mathbf{V_t}$, then $\mathbf{X_i} =· \mathbf{X_{i+1}}$ and $\mathbf{TRAILING(X_i)} >· \mathbf{X_{i+1}}$
- for the Start Symbol $\mathbf{S}$:
  - $\$ <· \mathbf{LEADING(S)}$
  - $\mathbf{TRAILING(S)} >· \$$
Example:

OPP: Some More

Unary Minus vs. Binary Minus （一元负号与二元负号）
Leave It to Scanners （留给扫描器）
- return two different tokens for the two （返回两个不同的tokens）
- lookahead is required （需要向前看）
OPPs are not used often in practice （运算符优先级在实践中不常用）
- limited scenarios and applications （有限的场景和应用）
- but, simple -> part of a complex parsing system （但简单，是复杂解析系统的一部分）

LR Parsing

What:
- left-to-right scanning
- rightmost derivation in reverse
Why
- can recognize virtually all programming languages of context-free grammars （几乎可以识别所有上下文无关文法的编程语言）
- is the most general non-backtracking shift-reduce parsing method known （已知的最通用的非回溯移位归约解析方法）
  - yet is still efficient （仍然高效）
- can detect a syntax error as soon as possible （尽快检测语法错误）
- is a proper superset of the predictive parsing （是预测解析的适当超集）

LR(0) Parsing

Items and the LR(0) Automaton

Problem: when to shift and when to reduce?
- how to decide whether that on the top of the stack is a handle? （如何判断栈顶的句柄？）
LR(0) Items: a production with a dot at some position in its body （LR(0) 项目：在其主体的某个位置有一个点的产生式）
- prefixes of a valid production, indicating how much we have seen at the point （有效产生式的前缀，指示我们在该点上已经看到多少）
- e.g. $\mathbf{A} \rightarrow \mathbf{XYZ}$ yields four items（例如$\mathbf{A} \rightarrow \mathbf{XYZ}$产生四个项）：
  - $\mathbf{A} \rightarrow \cdot \mathbf{XYZ}$: hope to see $\mathbf{XYZ}$ next on the input
  - $\mathbf{A} \rightarrow \mathbf{X} \cdot \mathbf{YZ}$: hope to see $\mathbf{YZ}$ next on the input
  - $\mathbf{A} \rightarrow \mathbf{XY} \cdot \mathbf{Z}$: hope to see $\mathbf{Z}$ next on the input
  - $\mathbf{A} \rightarrow \mathbf{XYZ} \cdot$: hope to see nothing next on the input
- kind of state + transition → automaton
Kernel and Non-Kernel Items
- Kernel Items: the Initial Item + those whose dots are not at the left
- Non-Kernel Items: Otherwise
LR(0) Automaton: CLOSURE + GOTO
- CLOSURE: set of Items （项集的闭包）
- GOTO: the Transition Function （转换函数）

CLOSURE and GOTO

$\mathbf{CLOSURE(I)}$:
- $\mathbf{I}$: a set of Items for a grammar $\mathbf{G}$（文法$\mathbf{G}$的项集）
- construct by two rules:
  1. every Item in $\mathbf{I}$ is added in to $\mathbf{CLOSURE(I)}$ （$\mathbf{I}$中的每个项目都添加到$\mathbf{CLOSURE(I)}$中）
  2. if $\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{B} \mathbf{\beta}$ is in $\mathbf{CLOSURE(I)}$ and $\mathbf{B} \rightarrow \mathbf{\gamma}$ is a production of $\mathbf{G}$, then add $\mathbf{B} \rightarrow \cdot \mathbf{γ}$ to $\mathbf{CLOSURE(I)}$ （如果$\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{B} \mathbf{\beta}$在$\mathbf{CLOSURE(I)}$中，并且$\mathbf{B} \rightarrow \mathbf{\gamma}$是$\mathbf{G}$的一个产生式，则将$\mathbf{B} \rightarrow \cdot \mathbf{γ}$添加到$\mathbf{CLOSURE(I)}$中）
- Example:
$\mathbf{GOTO(I, X)}$:
- $\mathbf{I}$: a set of Items for a grammar $\mathbf{G}$ （文法$\mathbf{G}$的项集）
- $\mathbf{X}$: a grammar symbol （文法符号）
- the closure of the set of items [$\mathbf{A} \rightarrow \mathbf{\alpha X} \cdot \mathbf{\beta}$] such that [$\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{X} \mathbf{\beta}$] is in $\mathbf{I}$ （$\mathbf{GOTO(I, X)}$是$\mathbf{I}$中所有形如[$\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{X \beta}$]的项所对应的项[$\mathbf{A} \rightarrow \mathbf{\alpha X} \cdot \mathbf{\beta}$]的集合的闭包）
  - the transition from the state for $\mathbf{I}$ under input $\mathbf{X}$ （在输入$\mathbf{X}$下，$\mathbf{I}$的状态转换）
- construct by
  1. for each Item $\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{X} \mathbf{\beta}$ in $\mathbf{I}$
  2. then every Item in $\mathbf{CLOSURE(A \rightarrow \alpha X \cdot \beta)}$ is added to $\mathbf{GOTO(I, X)}$
- Example:

Automaton Construction

INPUT: a grammar $\mathbf{G}$
OUTPUT: a LR(0) automaton
Construction:
1. augment $\mathbf{G}$ to $\mathbf{G'}$ by adding a new start symbol $\mathbf{S'}$ and production $\mathbf{S'} \rightarrow \mathbf{S}$ （通过添加新的起始符号$\mathbf{S'}$和产生式$\mathbf{S'} \rightarrow \mathbf{S}$扩展文法$\mathbf{G}$至$\mathbf{G'}$）
2. $\mathbf{C}$:= {$\mathbf{CLOSURE(S' \rightarrow \cdot S)}$} （先求$\mathbf{CLOSURE(S' \rightarrow \cdot S)}$，作为闭包的集合$\mathbf{C}$中的第一个闭包）
3. repeat:
  - for each Item $\mathbf{I}$ in $\mathbf{C}$ and each grammar symbol $\mathbf{X}$ in $\mathbf{G'}$ （对$\mathbf{C}$中每个项目$\mathbf{I}$和$\mathbf{G'}$中的每个文法符号$\mathbf{X}$）
    - if $\mathbf{GOTO(I, X)}$ is not empty and not in $\mathbf{C}$ （如果$\mathbf{GOTO(I, X)}$不为空且不在$\mathbf{C}$中，注意$\mathbf{GOTO(I, X)}$是项集$\mathbf{I}$在输入符号$\mathbf{X}$下的转换的闭包）
      - add $\mathbf{GOTO(I, X)}$ to $\mathbf{C}$
4. until no new Items are added to $\mathbf{C}$
Example:

Parsing Table Construction

LR(0) Parsing Table $\mathbb{T}$:
- Rows: states
- Columns: grammar symbols
  - terminals for $\mathbf{SHIFT}$ and $\mathbf{REDUCE}$ actions
  - non-terminals for $\mathbf{GOTO}$ actions
- Construction: each edge $\mathbf{X}$: $(s_m, s_n)$ （对于每条边$\mathbf{X}$，$(s_m, s_n)$为其起始和终止状态，X为文法符号）
  - if $\mathbf{X}$ is a terminal $\mathbf{a}$, then $\mathbb{T}[s_m, a] = \mathbf{SHIFT}\ n$ （记作s n）
  - if $\mathbf{X}$ is a non-terminal $\mathbf{A}$, then $\mathbb{T}[s_m, A] = \mathbf{GOTO}\ n$ （记作n）
  - if $\mathbf{A} \rightarrow \mathbf{\beta} \cdot$ is in $s_m$, then for each terminal $\mathbf{a}$, $\mathbb{T}[s_m, a] = \mathbf{REDUCE}\ \mathbf{A} \rightarrow \mathbf{\beta}$ （记作r n，其中n是产生式的编号，用罗马数字表示）
  - if $\mathbf{S'} \rightarrow \mathbf{S} \cdot$ is in $s_m$, then for each terminal $\mathbf{a}$, $\mathbb{T}[s_m, a] = \mathbf{ACCEPT}$ （记作a/acc）
Example: > 注：在上面的例子中，状态1、2、9均存在移进-归约冲突，因此不属于LR(0)文法。

Alogrithm and Implementation （算法与实现）

INPUT: an input string $\mathbf{\omega}$ and an LR-parsing table $\mathbb{T}$ for a grammar $\mathbf{G}$
OUTPUT: if $\mathbf{\omega} \in \mathbf{L(G)}$, the reduction steps of a bottom-up parse for $\mathbf{\omega}$

STEPS:

let a be the first symbol of 𝜔$ //设a是输入字符串𝜔$的第一个符号
push state 0 onto the stack //将状态0推入栈中
while (1) {
    let s be the state on top of the stack //设s是栈顶的状态
    if (ACTION[s, a] = shift t) { //如果ACTION[s, a] = shift t
        push t onto the stack //将t推入栈中
        let a be the next input symbol //设a是下一个输入符号
    }
    else if (ACTION[s, a] = reduce A→𝛽) { //如果ACTION[s, a] = reduce A→𝛽
        pop |𝛽| symbols off the stack //从栈中弹出|𝛽|（𝛽包含的符号数）个状态
        let t be the top of the stack now //设t是栈顶的状态
        push GOTO[t, A] onto the stack //将GOTO[t, A]推入栈中
    }
    else if (ACTION[s, a] = accept) break //如果ACTION[s, a] = accept，则结束
    else call error-handling routine
}

Example:

LR(0) Conflicts

Reduce-Reduce Conflicts:
- state has two reduce items
- e.g. $\mathbf{A} \rightarrow \mathbf{\alpha} \cdot$ and $\mathbf{B} \rightarrow \mathbf{\beta} \cdot$
Shift-Reduce Conflicts:
- state has a reduce item and a shift item
- e.g. $\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{k \gamma}$ and $\mathbf{B} \rightarrow \mathbf{\beta} \cdot$
To avoid conflicts: SLR(1) Parsing

SLR(1) Parsing

Parsing Table Construction

LR(0) Parsing Table $\mathbb{T}$ $\Rightarrow$ SLR(1) Parsing Table $\mathbb{T}$
- Rows: states
- Columns: grammar symbols
  - terminals for $\mathbf{SHIFT}$ and $\mathbf{REDUCE}$ actions
  - non-terminals for $\mathbf{GOTO}$ actions
- Construction: each edge $\mathbf{X}$: $(s_m, s_n)$
  - if $\mathbf{X}$ is a terminal $\mathbf{a}$, then $\mathbb{T}[s_m, a] = \mathbf{SHIFT}\ n$
  - if $\mathbf{X}$ is a non-terminal $\mathbf{A}$, then $\mathbb{T}[s_m, A] = \mathbf{GOTO}\ n$
  - if $\mathbf{A} \rightarrow \mathbf{\beta} \cdot$ is in $s_m$, then for each terminal $\mathbf{a} \in \mathbf{FOLLOW(A)}$, $\mathbb{T}[s_m, a] = \mathbf{REDUCE}\ \mathbf{A} \rightarrow \mathbf{\beta}$ （只对$\mathbf{FOLLOW(A)}$中的终结符进行归约）
  - if $\mathbf{S'} \rightarrow \mathbf{S} \cdot$ is in $s_m$, then $\mathbb{T}[s_m, \$] = \mathbf{ACCEPT}$ （只对$进行归约）
Example:

SLR(1) Conflicts

Reduce-Reduce Conflicts:
- state has two reduce items
- e.g. $\mathbf{A} \rightarrow \mathbf{\alpha} \cdot$ and $\mathbf{B} \rightarrow \mathbf{\beta} \cdot$
  - if $\mathbf{FOLLOW(A)} \cap \mathbf{FOLLOW(B)} = \emptyset$, safe for SLR(1) （如果$\mathbf{FOLLOW(A)} \cap \mathbf{FOLLOW(B)} = \emptyset$，则SLR(1)安全）
Shift-Reduce Conflicts:
- state has a reduce item and a shift item
- e.g. $\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{k \gamma}$ and $\mathbf{B} \rightarrow \mathbf{\beta} \cdot$
  - if $\mathbf{k} \notin \mathbf{FOLLOW(B)}$, safe for SLR(1) （如果$\mathbf{k} \notin \mathbf{FOLLOW(B)}$，则SLR(1)安全）

LR(1) Parsing

How: check the immediate following symbols of non-terminals for reduction （检查非终结符的直接后续符号以进行归约）
LR(1) Item = [LR(0) Item, Following Symbol]
- e.g. $[\mathbf{A} \rightarrow \mathbf{XY} \cdot \mathbf{Z}, \mathbf{a}]$
- where $\mathbf{a}$ is the following terminal symbol of $\mathbf{A}$ （$\mathbf{a}$是$\mathbf{A}$的后续终结符）
- when $\mathbf{Z}$ is not empty, the same as LR(0) Item （当$\mathbf{Z}$不为空时，与LR(0)项相同）
- otherwise, $\mathbf{A} \rightarrow \mathbf{XY} \cdot \mathbf{Z}$ is applied only when the next input symbol is $\mathbf{a}$ （否则，只有在下一个输入符号为$\mathbf{a}$时才应用$\mathbf{A} \rightarrow \mathbf{XY} \cdot \mathbf{Z}$）
  - instead of SLR(1)’s ALL $\mathbf{A}$’s following symbols, let along LR(0)’s ALL terminal symbols （而不是SLR(1)的所有$\mathbf{A}$的后续符号，甚至LR(0)的所有终结符）

LR(1)’s CLOSURE and GOTO

LR(1)’s $\mathbf{CLOSURE(I)}$
- every Item in $\mathbf{I}$ is added in to $\mathbf{CLOSURE(I)}$
- repeat:
  - if $[\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{B} \mathbf{\beta}, \mathbf{a}]$ is in $\mathbf{CLOSURE(I)}$ and $\mathbf{B} \rightarrow \mathbf{\gamma}$ is a production of $\mathbf{G}$, then for each terminal $\mathbf{b}$ in $\mathbf{FIRST(\beta a)}$, add $[\mathbf{B} \rightarrow \cdot \mathbf{\gamma}, \mathbf{b}]$ into $\mathbf{CLOSURE(I)}$
  - until no more new Items can be added into $\mathbf{CLOSURE(I)}$
- only reduce $\mathbf{B}$ when it is followed by that in $\mathbf{FIRST(\beta a)}$ （仅在后面跟着$\mathbf{FIRST(\beta a)}$中的项时才归约$\mathbf{B}$）
LR(1)’s $\mathbf{GOTO(I, B)}$
- if $[\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{B} \mathbf{\beta}, \mathbf{a}]$ in $\mathbf{I}$
- then every Item in $\mathbf{CLOSURE(\{[\mathbf{A} \rightarrow \mathbf{\alpha} \mathbf{B} \cdot \mathbf{\beta}, \mathbf{a}]\})}$ is in $\mathbf{GOTO(I, B)}$
  - $\mathbf{GOTO(I, B)} \supseteq \mathbf{CLOSURE(\{[\mathbf{A} \rightarrow \mathbf{\alpha} \mathbf{B} \cdot \mathbf{\beta}, \mathbf{a}]\})}$
  - Those of Kernel Items inherited from the previous state （从前一个状态继承的内核项）
Example:

Automaton Construction

INPUT: a grammar $\mathbf{G}$
OUTPUT: a LR(1) automaton
Construction:
1. augment $\mathbf{G}$ to $\mathbf{G'}$ by adding a new start symbol $\mathbf{S'}$ and production $\mathbf{S'} \rightarrow \mathbf{S}$ （扩展文法）
2. $\mathbf{C}$:= {$\mathbf{CLOSURE({[S' \rightarrow \cdot S, \$]})}$} （初始化项集C）
3. repeat:
  - for each Item $\mathbf{I}$ in $\mathbf{C}$ and each grammar symbol $\mathbf{X}$ in $\mathbf{G'}$
    - if $\mathbf{GOTO(I, X)}$ is not empty and not in $\mathbf{C}$
      - add $\mathbf{GOTO(I, X)}$ to $\mathbf{C}$
4. until no new Items are added to $\mathbf{C}$
Example:
1. Example:
2. Exercise:

Parsing Table Construction

LR(1) Parsing Table $\mathbb{T}$:
- Rows: states
- Columns: grammar symbols
  - terminals for $\mathbf{SHIFT}$ and $\mathbf{REDUCE}$ actions
  - non-terminals for $\mathbf{GOTO}$ actions
- Construction: each edge $\mathbf{X}$: $(s_m, s_n)$
  - if $\mathbf{X}$ is a terminal $\mathbf{a}$, then $\mathbb{T}[s_m, a] = \mathbf{SHIFT}\ n$
  - if $\mathbf{X}$ is a non-terminal $\mathbf{A}$, then $\mathbb{T}[s_m, A] = \mathbf{GOTO}\ n$
  - if $[\mathbf{A} \rightarrow \mathbf{\beta} \cdot, \mathbf{a}]$ (the kernel) is in $s_m$, then $\mathbb{T}[s_m, a] = \mathbf{REDUCE}\ \mathbf{A} \rightarrow \mathbf{\beta}$
  - if $[\mathbf{S'} \rightarrow \mathbf{S} \cdot, \$]$ is in $s_m$, then $\mathbb{T}[s_m, \$] = \mathbf{ACCEPT}$
Example: > 注：上表中$[11, id]$应为$s12$，即$\mathbb{T}[11, id] = s12$。

LALR(1) Parsing

What: Look Ahead LR(1)
Why: smaller parsing table than LR(1) for practice（比LR(1)小的解析表）
- equal to SLR(1) in state number （与SLR(1)状态数相等）
  - e.g. In C, serveral hundred for SLR(1), serverals of thousands for LR(1)
- more powerful than SLR(1) in processing more grammars （比SLR(1)更强大，能处理更多文法）
How: combine Items with the same Production Set in LR(1) （将LR(1)中具有相同产生式集的项（同心集）组合在一起）
- e.g. $[\mathbf{A} \rightarrow \mathbf{\alpha} \cdot, \mathbf{a}]$ and $[\mathbf{A} \rightarrow \mathbf{\alpha} \cdot, \mathbf{b}]$ are combined into one state
Compared to LR(1):

Automaton Construction

INPUT: a grammar $\mathbf{G}$
OUTPUT: a LALR(1) automaton
Construction:
1. Construct LR(0) items as LALR(1) items’ cores for the grammar $\mathbf{G'}$，then remove the non-kernel items （为文法$\mathbf{G'}$构造LR(0)项作为LALR(1)项的核心，然后删除非核心项）
2. For each kernal items $\mathbf{K}$ in $\mathbf{I_i}$ and each grammar symbol $\mathbf{X}$ in $\mathbf{G'}$, calculate the lookahead symbols’ INIT and PROPAGATION. （对于每个项集$\mathbf{I_i}$中的核心项$\mathbf{K}$和文法符号$\mathbf{X}$，计算前瞻符号的初始值和传播）
  - For each item $(\mathbf{A} \rightarrow \mathbf{\alpha} \cdot \mathbf{\beta})$ in $\mathbf{K}$
    - $\mathbf{J} := \{\mathbf{CLOSURE({[A \rightarrow \alpha \cdot \beta, \#]})}\}$ （初始化$\mathbf{J}$）
    - if $[\mathbf{B \rightarrow \gamma \cdot X \delta, a}]$ in $\mathbf{J}$ and $\mathbf{a \neq} \#$, then $\mathbf{B \rightarrow \gamma X \cdot \delta}$ in $\mathbf{GOTO(I_i, X)}$ has a SELF-generated lookahead symbol $\mathbf{a}$ （$\mathbf{J}$中的项若出现不为#的前瞻符号，则它是自发生成的）
    - if $[\mathbf{B \rightarrow \gamma \cdot X \delta,} \#]$ in $\mathbf{J}$, then lookahead symbols are propagated from $\mathbf{A \rightarrow \alpha \cdot \beta}$ to $\mathbf{B \rightarrow \gamma X \cdot \delta}$ in $\mathbf{GOTO(I_i, X)}$. （$\mathbf{J}$中的项若出现#的前瞻符号，则该项会传播前瞻符号）
  - Specifically, $\mathbf{\$}$ in $\mathbf{[S' \rightarrow S \cdot, \$]}$ is SELF-generated.
3. Construct the $\mathbf{FROM-TO}$ table base on $\mathbf{GOTO}$ for the kernel items to show the propagation of lookahead symbols （为核心项构建$\mathbf{FROM-TO}$表，以显示前瞻符号的传播）
  - in step 2, we calculate propagated relationships.
4. Propagate the lookahead symbols according to the $\mathbf{FROM-TO}$ relationships until the fixed point achieved （根据$\mathbf{FROM-TO}$关系传播前瞻符号，直到达到不变点）
  - in step 2, we calculate self-generated lookahead symbols.
Example: > 注：上右表中，前瞻符号在传播时，每一行的前瞻符号可以向右填满右侧列，最右一列是最终结果

Parsing Table Construction

the same as LR(1) method
LALR(1) Parsing Table $\mathbb{T}$:
- Rows: states
- Columns: grammar symbols
  - terminals for $\mathbf{SHIFT}$ and $\mathbf{REDUCE}$ actions
  - non-terminals for $\mathbf{GOTO}$ actions
- Construction: each edge $\mathbf{X}$: $(s_m, s_n)$
  - if $\mathbf{X}$ is a terminal $\mathbf{a}$, then $\mathbb{T}[s_m, a] = \mathbf{SHIFT}\ n$
  - if $\mathbf{X}$ is a non-terminal $\mathbf{A}$, then $\mathbb{T}[s_m, A] = \mathbf{GOTO}\ n$
  - if $[\mathbf{A} \rightarrow \mathbf{\beta} \cdot, \mathbf{a}]$ (the kernel) is in $s_m$, then $\mathbb{T}[s_m, a] = \mathbf{REDUCE}\ \mathbf{A} \rightarrow \mathbf{\beta}$
  - if $[\mathbf{S'} \rightarrow \mathbf{S} \cdot, \$]$ is in $s_m$, then $\mathbb{T}[s_m, \$] = \mathbf{ACCEPT}$
Example:

Capabilities vs. Conflicts

LALR never introduces new SHIFT-REDUCE conflicts:
- SHIFT does not depend on lookaheads
Example:

Summary

LR(0), SLR(1), LR(1), LALR(1)
- all working in SHIFT-REDUCE mode
- only different in Parsing Tables
Parsing Tables - Capabilities
- LR(0) < SLR(1) < LALR(1) < LR(1)
- LR(0): Items
- SLR(1): Items with FOLLOW
- LALR(1): Items Combined from SLR(1) and LR(1)
- LR(1): Items with Subset of FOLLOW

Using Ambiguous Grammars

In Theory: grammar for LR parsing tables should be unambiguous （在理论上：LR解析表的文法应该是无歧义的）
For ambiguous grammars:
- there will be conflicts
- add new information/restrictions to resolve ambiguity （添加新信息/限制以解决歧义）
  - precedence, associativity, etc.
  - get LR tables without conflicts
Why embracing ambiguous grammars?
- some are much natural, the unambiguous one can be very complex （有些是非常自然的，无歧义的可能非常复杂）
- isolate common syntactic constructs for special-case optimizations （隔离常见的语法结构以进行特殊情况优化）
Example:

Summary

SJTU Notes > 计科

#SJTU #编译原理

编译原理

https://youyeyejie.github.io/_posts/编译原理/

作者

youyeyejie

发布于

2025年6月9日

更新于

2025年6月20日

马克思主义基本原理上一篇

量子力学下一篇