Original PDF Flash format microsoft-powerpoint---cs471-24-summary.ppt  


Microsoft Powerpoint Cs471 24 Summary.ppt

What You’ve Learned This Semester
What happens when you compile your code
High-Level
Compilers:
Machine
Programming
Compiler
Code
Languages
Summary and Wrapup
Error Messages
How to implement a compiler
– What is challenging, time consuming
CS 471
– Available tools and their other applications
December 5, 2007
How to apply theory to practice (with less direction)
How to break a large problem down into subcomponents

1
CS 471 – Fall 2007
Goals of a Compiler
Simplified Compiler Structure
A compiler’s job is to
Source code
• Lower the abstraction level
(character stream)
if (b==0) a = b;
• Eliminate overhead from language abstractions
Lexical Analysis
• Map source program onto hardware efficiently
– Hide hardware weaknesses, utilize hardware
Token stream
Front End
strengths
Parsing
Machine independent
• Equal the efficiency of a good assembly programmer
Abstract syntax tree
Intermediate Code Generation
Optimizing compilers should improve the code
• Performance*
Intermediate code
• Code size
Code Generation
Back End
• Security
Assembly code
Machine dependent
• Reliability
(character stream)
CMP CX, 0
• Power consumption
CMOVZ CX, DX
2
CS 471 – Fall 2007
3
CS 471 – Fall 2007
Phases of a Compiler
Regular expressions
Language – set of strings
Source program
Lexical Analyzer
String – finite sequence of symbols
Symbols – taken from a finite alphabet

Lexical analyzer
•Group sequence of characters into
lexemes (keywords, identifiers,
Syntax analyzer
constants)
Specify languages using regular expressions
•Makes use of the theory of regular
Semantic analyzer
languages and finite state machines
Symbol
a
one instance of a
Intermediate
•Lex and Flex are tools that construct
code generator
Epsilon
ε
empty string
lexical analyzers from regular
expression specifications
Alternation
R | S
string from either L(R) or L(S)
Code optimizer
Concatenation
R · S
string from L(R) followed by L(S)
Code generator
Repetition
R*
0 or more strings from L(R)
Target program
4
CS 471 – Fall 2007
5
CS 471 – Fall 2007
1

Finite Automata
Language
Automaton (DFA) can be represented as:
Each string is accepted or rejected
1. Starting in the start state
• A transition table

non-”
2. Automaton follows one edge for every character (edge
0
1
error
must match character)
[^”]*
1
2
1
3. After n-transitions for an n-character string, if final
state then accept
2
error
error
Language: set of strings that the FSA accepts
• A graph
Non-”
i
f
[a-z0-9]
0
1

2
0
1
2
3
[a-z0-9]
ID
IF
ID
[a-hj-z]
6
CS 471 – Fall 2007
7
CS 471 – Fall 2007
Lex: A Lexical Analyzer Generator
Phases of a Compiler
Lex produces a C program from a lexical specification
http://www.epaperpress.com/lexandyacc/index.html
Source program
Parser
• Convert a linear structure – sequence
%%
Lexical analyzer
of tokens – to a hierarchical tree-like
DIGITS [0-9]+
structure – an AST
Syntax analyzer
ALPHA [A-Za-z]
• The parser imposes the syntax rules of
CHARACTER {ALPHA}|_
Semantic analyzer
the language
IDENTIFIER {ALPHA}({CHARACTER}|{DIGITS})*
• Work should be linear in the size of the
%%
Intermediate
input (else unusable) → type consistency
code generator
if
{return IF; }
cannot be checked in this phase
{IDENTIFIER}
{return ID; }
Code optimizer
•Deterministic context free languages
{DIGITS}
{return NUM; }
and pushdown automata for the basis
([0-9]+”.”[0-9]*)|([0-9]*”.”[0-9]+) {return REAL; }
Code generator
• Bison and yacc allow a user to
.
{error(); }
construct parsers from CFG specifications
Target program
8
CS 471 – Fall 2007
9
CS 471 – Fall 2007
Context-Free Grammar Terminology
Shift-Reduce Parsing
Terminals
S (S) S
Bottom-up parsing uses two kinds of actions:
– Token or ε
Shift and Reduce
Non-terminals
S ε
– Syntactic variables
Shift: Move I one place to the right
Start symbol
• Shifts a terminal to the left string
– A special nonterminal is designated (S)
E + (I int ) E + (int I )
Productions
– Specify how non-terminals may be expanded to
form strings
Reduce: Apply an inverse production at the
right end of the left string
– LHS: single non-terminal, RHS: string of
• If E → E + ( E ) is a production, then
terminals or non-terminals
• Vertical bar is shorthand for multiple productions
E + (E + ( E ) I ) E +(E I )
10
CS 471 – Fall 2007
11
CS 471 – Fall 2007
2

Shift-Reduce Parsing Table
Parsing Table
terminal symbols non-terminal symbols
(
)
id
,
$
S
L
state
next
next state
1
s3
s2
g4
on
Action table
actions
2
S→id
S→id
S→id
S→id
S→id
reduction
3
s3
s2
g7
g5
1. shift and goto state n
4
accept
2. reduce using X → γ
– pop symbols γ off stack
5
s6
s8
– using state label of top (end) of stack, look up X
6 S→(L)
S→(L)
S→(L)
S→(L)
S→(L)
in goto table and goto that state
7
L→S
L→S
L→S
L→S
L→S
8
s3
s2
g9
• DFA + stack = push-down automaton (PDA)
9 L→L,S
L→L,S
L→L,S
L→L,S
L→L,S
12
CS 471 – Fall 2007
13
CS 471 – Fall 2007
Yacc / Bison
Phases of a Compiler
Yet Another Compiler Compiler
Source program
Semantic Analysis
Automatically constructs an LALR(1) parsing
table from a set of grammar rules
• Calculates the program’s
Lexical analyzer
Yacc/Bison specification:
“meaning”
Syntax analyzer
• Rules of the language are
file.y
Semantic analyzer
checked (variable declaration,
parser declarations
bison –vd file.y
type checking)
%%
y.tab.c
Intermediate
-or-
• Type checking also needed for
grammar rules
y.tab.h
code generator
%%
yacc –vd file.y
y.output
code generation (code gen for a +
auxiliary code
Code optimizer
b depends on the type of a and b)
Code generator
Target program
14
CS 471 – Fall 2007
15
CS 471 – Fall 2007
Typical Semantic Errors
Tiger Symbol Tables
Traverse the AST created by the parser to find:
Two namespaces … two symbol tables
Namespace for types
Multiple declarations: a variable should be
Namespace for vars and functions
declared (in the same scope) at most once
Called tenv and venv, respectively
Undeclared variable: a variable should not be
used before being declared
a \
Type mismatch: type of the left-hand side of an
a
assignment should match the type of the right-hand
y \
x
side
Scope
Wrong arguments: methods should be called with
X
x \
change
the right number and types of arguments
y
y
x
x
x
16
CS 471 – Fall 2007
17
CS 471 – Fall 2007
3

Phases of a Compiler
Intermediate Code
• Abstract machine code – (Intermediate
Source program
Intermediate Code Generation
Representation)
• Makes it easy to port compiler
Lexical analyzer
• Allows machine-independent code generation,
to other architectures (e.g.
optimization
Syntax analyzer
Pentium to MIPS)
optimize
x86
Semantic analyzer
• Can also be the basis for
interpreters (such as in Java)
AST
IR
PowerPC
Intermediate
code generator
• Enables optimizations that are
not machine specific
Alpha
Code optimizer
Code generator
Target program
18
CS 471 – Fall 2007
19
CS 471 – Fall 2007
Phases of a Compiler
Analysis and Transformation
Most optimizations require some global
Source program
Intermediate Code Optimization
understanding of program flow
• Constant propagation, dead
• Moving, removing, rearranging instructions
Lexical analyzer
code elimination, common sub-
Achieve understanding by discovering the
Syntax analyzer
expression elimination, strength
control flow of the procedure
reduction, etc.
• What blocks follow/are reachable from other blocks
Semantic analyzer
• Based on dataflow analysis –
• Where loops exist (focus optimization efforts)
Intermediate
properties that are independent of
• We call this Control-Flow Analysis
code generator
execution paths
Also, find and connect definitions and uses of
variables
Code optimizer
• We call this Data-Flow Analysis
Code generator
All approaches to control-flow analysis require
• Identification of basic blocks
Target program
• Construction of a flow graph of the procedure
20
CS 471 – Fall 2007
21
CS 471 – Fall 2007
When to Apply Optimization
Scope of Optimization
Local
(or single block)
AST
Inlining
• Confined to straight-line code
HIR
Specialization
Constant folding
• Simplest to analyze
IR
Constant propagation
Value numbering
Intraprocedural
(or global)
Canonical
Dead code elimination
• Consider the whole procedure
MIR
IR
Loop-invariant code motion
Common sub-expression elimination
Interprocedural
(or whole program)
Abstract
Strength reduction
• Consider the whole program
Assembly
Branch prediction/optimization
Register allocation
LIR
Loop unrolling
Assembly
Cache optimization
22
CS 471 – Fall 2007
23
CS 471 – Fall 2007
4

Flag
Description
Phases of a Compiler
-O0
Barely any transformations, just code generation. Target code
Source program
Native Code Generation
can be debugged with no information loss.
• Intermediate code is translated into
-O1
Some transformations that preserve execution ordering.
Lexical analyzer
native code
Debuggability of the generated code is hardly affected. User
variables should not disappear and function inlining is not done.
Syntax analyzer
• Register allocation, instruction
selection
-O2
More aggressive transformations. May affect execution ordering
and usually provide faster code. Debuggability may be somewhat
Semantic analyzer
compromised by disappearing user variables and function bodies.
-O3
Very aggressive transformations that may or may not provide
Intermediate
Native Code Optimization
better code. Some semantics may be modified. Execution order
code generator
completely distorted. Debuggability is seriously compromised.
• Peephole optimizations – small window
is optimized at a time
-Os
Optimize for size. Enables transformations that reduce generated
Code optimizer
code size. May sometimes improve application performance
because there is less code to execute. May lead to reduced
Code generator
memory footprints which may produce fewer page faults.
-O4
There is no -O4. Anything above -O3 is treated as -O3.
Target program
24
CS 471 – Fall 2007
25
CS 471 – Fall 2007
Instruction Tiling
Register Allocation
Live Range
Interference
s1 2
s1 s2 s3 s4 s5 s6
x = x + 1;
s2 4
S1
S4
S5
s3 s1 + s2
s4 s1 + 1
MOVE t2
mov t1, [bp+x]
s5 s1 * s2
S3
S2
S6
MEM
+
mov t2, t1
s6 s4 * 2
t1
add t2, 1
+
MEM
1
Result
mov [bp+x], t2
FP
x
+
Graph Coloring
r1 2
r2 4
FP
x
S1
S4
S5
r1 ← green
r3 r1 + r2
r2 ← blue
r3 r1 + 1
r3 ← pink
r1 r1 * r2
S3
S2
S6
r2 r3 * 2
26
CS 471 – Fall 2007
27
CS 471 – Fall 2007
Alternatives to the Traditional Model
Just-in-Time Compilation
Static Compilation
Compiler
All work is done “ahead-of-time”
High-Level
Machine
Programming
Front
Back
End
End
Code
Just-in-Time Compilation
Languages
Postpone some compilation tasks
Error Messages
Multiversioning and Dynamic Feedback
Include multiple options in binary
Ship bytecodes (think IR) rather than binaries
Dynamic Binary Optimization
• Binaries execute on machines
Traditional compilation model
• Bytecodes execute on virtual machines
Executables can adapt
28
CS 471 – Fall 2007
29
CS 471 – Fall 2007
5

The Big Picture
We now know:
All of the components of a compiler
What needs to be done statically vs.
dynamically
Compiler Wars:
The potential impact of language or
Deliverables and Rules
architecture changes
Why Java moved the “back-end” to run time
CS 471
December 10, 2007
30
CS 471 – Fall 2007
1. Robustifying your Compiler
2. Generate Test Cases
(For real this time)
a) A test case that may break others’ lexical analyzers
kh7fe.lex.tig
b) A test case that may break others’ parsers
Sample outputs are online
kh7fe.parse.tig
c) A test case that may break others’ typecheckers
www.cs.virginia.edu/kim/courses/cs471/project/
kh7fe.type.tig
d) A test case that may break others’ irGen
Contains:
kh7fe.irgen.tig
– test[0-9]+.tig.out
e) A README file – describe expected output for each
– merge.tig.out
test file; what you expect to break in each case; why it’s
– queens.tig.out
unique
kh7fe.README.tests
32
CS 471 – Fall 2007
33
CS 471 – Fall 2007
3. Submit your Compilers
Then What?
a) Your tiger.lex file (We will build your lexical analyzer)
Submit everything
kh7fe.lex
BEFORE SUNDAY DEC 9 @ 9PM FIRM
b) Your grammar (with semantic actions)
(enforced by toolkit)
kh7fe.grm
c) Your typechecker (Input: file; Output: type errors)
Show up here on Monday Dec 10 @ 2PM for …
kh7fe.typecheck
d) Your IR generator (Input: file; Output: type errors and
(if applicable) IR tree)
COMPILER WARS!
kh7fe.irgen
e) README file describing all of the issues you corrected
between PA7 and compiler wars
kh7fe.README.compiler
f) A tar.gz file containing everything in your PA7
directory (assuming it has been updated since PA7)
kh7fe.finalCompiler.tar.gz
34
CS 471 – Fall 2007
35
CS 471 – Fall 2007
6

How Will It Work?
Prize structure
Bracketed progression (like the ACC tournament)
Prizes for:
– Round 1: Lexical Analysis
Best lexical analyzer
– Round 2: Parsing & AST Generation
Best parser
– Round 3: Type Checking
Best typechecker
– Round 4: IR Generation
Best ir generator
Each round: Full bracket progression to determine a
Best overall
winner (single elimination)
Most improved compiler
Each game: Compare test cases passed with your
opponent
Best test case
51 given test cases
2 custom cases
Prizes:
Corresponding PA late day reprieve (1 day)
Tie breaker: Cleanest code (as determined by judges)
Grab bag
36
CS 471 – Fall 2007
37
CS 471 – Fall 2007
PA8 Grade
Rest of Class Time Today…
50%
1) Please fill out the official evaluations
submitting all PA8 materials on time and
showing up for compiler wars
2) Please send me anonymous feedback if you have
detailed suggestions for doing things differently the
next time I teach this course
50%
a) Topic emphasis, e.g. less time on X more time on Y
b) Homework vs. projects vs. tests
completeness of the final product (PA2-PA7)
c) Lab timeslots?
d) Group vs. individual projects?
Extra Office Hours
Sunday 6-7 PM Olsson 209
Thanks for a Great Semester!
38
CS 471 – Fall 2007
39
CS 471 – Fall 2007
7