PHASES OF COMPILER
IN
COMPILER CONSTRUCTION
ANALYSIS
PART:-
• Analysis part breaks the
source program into constituent pieces and imposes a grammatical structure on
them which further uses this structure to create an intermediate representation
of the source program.
• It is also termed as
front end of compiler.
• Information about the
source program is collected and stored in a data structure called symbol table.
SYNTHESIS
PART:-
• Synthesis part takes the
intermediate representation as input and transforms it to the target program.
• It is also termed as
back end of compiler.
The design of compiler can
be decomposed into several phases, each of which converts one form of source
program into another.
The different phases of compiler are as
follows:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code
generation
5. Code optimization
6. Code generation
All of the aforementioned
phases involve the following tasks:
• Symbol table management.
• Error handling.
LEXICAL
ANALYSIS:-
• Lexical analysis is the
first phase of compiler which is also termed as scanning.
• Source program is
scanned to read the stream of characters and those characters are grouped to
form a sequence called lexemes which produces token as output.
Token: Token is a sequence of characters that represent lexical
unit, which matches with the pattern, such as keywords, operators, identifiers
etc.
• Lexeme: Lexeme is instance of a token i.e., group of characters
forming a token. ,
• Pattern: Pattern describes the rule that the lexemes of a token
takes. It is the structure that must be matched by strings.
• Once a token is generated the corresponding entry is made in the
symbol table.
Input: stream of characters
Output: Token
Token Template: <token-name, attribute-value>
(eg.) c=a+b*5;
LEXEMES AND TOKENS:-
Lexemes
|
Tokens
|
c
|
identifier
|
=
|
assignment
symbol
|
a
|
identifier
|
+
|
+ (addition
symbol)
|
b
|
identifier
|
*
|
*
(multiplication symbol)
|
5
|
5 (number)
|
Hence, <id,
1><=>< id, 2>< +><id, 3 >< * >< 5>
SYNTAX ANALYSIS:-
• Syntax analysis is the second phase of compiler which is also
called as parsing.
• Parser converts the tokens produced by lexical analyzer into a
tree like representation called parse tree.
• A parse tree describes the syntactic structure of the input.
• Syntax tree is a compressed representation of the parse tree in
which the operators appear as interior nodes and the operands of the operator
are the children of the node for that operator.
Input: Tokens
Output: Syntax tree
SEMANTIC ANALYSIS:-
• Semantic analysis is the third phase of compiler.
• It checks for the semantic consistency.
• Type information is gathered and stored in symbol table or in syntax tree.
• Performs type checking.
INTERMEDIATE CODE GENERATION:-
• Intermediate code generation produces intermediate
representations for the source program which are of the following forms:
o Postfix notation
o Three address code
o Syntax tree
Most commonly used form is the three address code.
t1 = inttofloat (5)
t2 = id3* tl
t3 = id2 + t2
id1 = t3
PROPERTIES OF INTERMEDIATE CODE:-
• It should be easy to produce.
• It should be easy to translate into target program.
CODE OPTIMIZATION:-
• Code optimization phase gets the intermediate code as input and
produces optimized intermediate code as output.
• It results in faster running machine code.
• It can be done by reducing the number of lines of code for a
program.
• This phase reduces the redundant code and attempts to improve
the intermediate code so that faster-running machine code will result.
• During the code optimization, the result of the program is not
affected.
• To improve the code generation, the optimization involves
o Deduction and removal of
dead code (unreachable code).
o Calculation of constants in
expressions and terms.
o Collapsing of repeated
expression into temporary string.
o Loop unrolling.
o Moving code outside the
loop.
o Removal of unwanted
temporary variables.
t1 = id3* 5.0
id1 = id2 + t1
CODE GENERATION:-
• Code generation is the final phase of a compiler.
• It gets input from code optimization phase and produces the
target code or object code as result.
• Intermediate instructions are translated into a sequence of
machine instructions that perform the same task.
• The code generation involves
o Allocation of register and memory.
o Generation of correct references.
o Generation of correct data types.
o Generation of missing code.
LDF R2, id3
MULF R2, # 5.0
LDF R1, id2
ADDF R1, R2
STF id1, R1
SYMBOL TABLE MANAGEMENT:-
• Symbol table is used to store all the information about
identifiers used in the program.
• It is a data structure containing a record for each identifier,
with fields for the attributes of the identifier.
• It allows finding the record for each identifier quickly and to
store or retrieve data from that record.
• Whenever an identifier is detected in any of the phases, it is
stored in the symbol table.
Example
int a, b; float c; char z;
Symbol name
|
Type
|
Address
|
a
|
Int
|
1000
|
b
|
Int
|
1002
|
c
|
Float
|
1004
|
z
|
char
|
1008
|
Example
extern double test (double x);
double sample (int count)
{
double sum= 0.0;
for (int i = 1; i < = count; i++)
sum+= test((double) i);
return sum;
}
Symbol name
|
Type
|
Scope
|
test
|
function, double
|
extern
|
x
|
double
|
function parameter
|
sample
|
function, double
|
global
|
count
|
int
|
function parameter
|
sum
|
double
|
block local
|
i
|
int
|
for-loop statement
|
ERROR HANDLING:-
• Each phase can encounter errors. After detecting an error, a
phase must handle the error so that compilation can proceed.
• In lexical analysis, errors occur in separation of tokens.
• In syntax analysis, errors occur during construction of syntax
tree.
• In semantic analysis, errors may occur at the following cases:
(i) When the compiler detects constructs that have right syntactic
structure but no meaning
(ii) During type conversion.
• In code optimization, errors occur when the result is affected
by the optimization. In code generation, it shows error when code is missing
etc.
Figure illustrates the translation of source code through each
phase, considering the statement
c =a+ b *
5.
ERROR ENCOUNTERED IN DIFFERENT PHASES
Each phase can encounter errors. After detecting an error, a phase
must some how deal with the error, so that compilation can proceed.
A program may have the following kinds of errors at various stages:
A program may have the following kinds of errors at various stages:
LEXICAL ERRORS
It includes incorrect or misspelled name of some identifier i.e.,
identifiers typed incorrectly.
SYNTACTICAL ERRORS
It includes missing semicolon or unbalanced parenthesis. Syntactic
errors are handled by syntax analyzer (parser).
When an error is detected, it must be handled by parser to enable the parsing of the rest of the input. In general, errors may be expected at various stages of compilation but most of the errors are syntactic errors and hence the parser should be able to detect and report those errors in the program.
When an error is detected, it must be handled by parser to enable the parsing of the rest of the input. In general, errors may be expected at various stages of compilation but most of the errors are syntactic errors and hence the parser should be able to detect and report those errors in the program.
THE GOALS OF ERROR HANDLER IN PARSER ARE:
• Report the presence of errors clearly and accurately.
• Recover from each error quickly enough to detect subsequent errors.
• Add minimal overhead to the processing of correcting programs.
• Recover from each error quickly enough to detect subsequent errors.
• Add minimal overhead to the processing of correcting programs.
There are four common error-recovery strategies that can be
implemented in the parser to deal with errors in the code.
o Panic mode.
o Statement level.
o Error productions.
o Global correction.
o Statement level.
o Error productions.
o Global correction.
SEMANTICALLY ERRORS
These errors are a result of incompatible value assignment. The
semantic errors that the semantic analyzer is expected to recognize are:
• Type mismatch.
• Undeclared variable.
• Reserved identifier misuse.
• Multiple declaration of variable in a scope.
• Accessing an out of scope variable.
• Actual and formal parameter mismatch.
• Undeclared variable.
• Reserved identifier misuse.
• Multiple declaration of variable in a scope.
• Accessing an out of scope variable.
• Actual and formal parameter mismatch.
LOGICAL ERRORS
These errors occur due to not reachable code-infinite loop.
Good work :)
ReplyDelete