### **PUNE VIDYARTHI GRIHA's COLLEGE OF ENGINEERING, NASHIK.**

# "CODE GENERATION"

**PREPARED BY :** 

**PROF. ANAND N. GHARU** 

**ASSISTANT PROFESSOR** 

**COMPUTER DEPARTMENT** 

3/18/2019 SUBJECT – COMPILER (BE COMPUTER SPPU-2019)

# **CONTENTS**:

O Code Generation - Issues in code generation, basic blocks, flow graphs, DAG representation of basic blocks, Target machine description, peephole optimization, Register allocation and Assignment, Simple code generator, Code generation from labeled tree, Concept of code generator.

### Outline

- Code Generation Issues
- Target Machine Description
- Basic Blocks and Flow Graphs
- Optimizations of Basic Blocks
- A Simple Code Generator
- Peephole optimization
- Register allocation and assignment
- Instruction selection by tree rewriting

3/18/2019

### INTRODUCTION

- The final phase of a compiler is code generator
- It receives an intermediate representation (IR) with supplementary information in symbol table
- Produces a semantically equivalent target program
- Code generator main tasks:
  - Instruction selection
  - Register allocation and assignment
  - Instruction ordering



### **Code Generation**

- Code produced by compiler must be correct
  - Source-to-target program transformation should be semantics preserving
- Code produced by compiler should be of high quality
  - Effective use of target machine resources
  - Heuristic techniques should be used to generate good but suboptimal code, because generating optimal code is undecidable
- code generator should run efficiently

#### Issues in the design of codegenerator

- **Input:** Intermediate representation with symbol table assume that input has been validated by the front end
- Target programs :

The back-end code generator of a compiler may generate different forms of code, depending on the requirements:

- Absolute machine code (executable code)
- Relocatable machine code (object files for linker)
- Assembly language (facilitates debugging)
- Byte code forms for interpreters (e.g. JVM)

3/18/2019

#### Issues in the design of codegenerator

Target Machine:

Implementing code generation requires thorough understanding of the target machine architecture and its instruction set

#### Issues in the design of codegenerator

#### **Instruction Selection:**

Instruction selection is important to obtain efficient code



### Issues in the design of code generator Instruction Selection

- Suppose we translate a:=b+c into
  - MOV b,R0 ADD c,R0 MOV R0,a
- Assuming addresses of a, b, and c are stored in к0, к1, and к2

MOV \*R1,\*R0

ADD \*R2,\*R0

 Assuming R1 and R2 contain values of ь and с ADD R2,R1 MOV R1,a

3/18/2019

### Issues in the design of code generator Instruction Selection

- Suppose we translate three-address code x:=y+z
   tO: MOV y,R0 ADD z,R0
- MOV R0, x
  Then, we translate

  a:=b+c
  d:=a+e
  - tO: MOV a,R0 ADD b,R0 MOV R0,a MOV a,R0 ADD e,R0 MOV R0,d

3/18/2019

### Issues in the design of code generator Register Allocation and Assignment

- Efficient utilization of the limited set of registers is important to generate good code
- Registers are assigned by
  - *Register allocation* to select the set of variables that will reside in registers at a point in the code
  - Register assignment to pick the specific register that a variable will reside in
- Finding an optimal register assignment in general is NP-complete

### Issues in the design of code generator : Register Allocation and Assignment : Example

| t:=a*b   | t:=a*b             |  |  |
|----------|--------------------|--|--|
| t:=t+a   | t:=t+a             |  |  |
| t:=t/d   | t:=t/d             |  |  |
|          |                    |  |  |
|          |                    |  |  |
| { R1=t } | $\{ R0=a, R1=t \}$ |  |  |
|          |                    |  |  |
|          |                    |  |  |
| MOV a,R1 | MOV a,R0           |  |  |
| MUL b,R1 | MOV R0,R1          |  |  |
| ADD a,R1 | MUL b,R1           |  |  |
| DIV d,R1 | ADD R0,R1          |  |  |
| MOV R1,t | DIV d,R1           |  |  |
|          | MOV R1,t           |  |  |
|          |                    |  |  |

3/18/2019

### Issues in the design of code generator Choice of Evaluation Order

• When instructions are independent, their evaluation order can be changed



### A simple target machine model

- Load operations: LDr,x and LDr1, r2
- Store operations: STx,r
- Computation operations: OPdst, src1, src2
- Unconditional jumps: BRL
- Conditional jumps: Bcond r, Llike BLTZr, L

# **Addressing Modes**

- variable name: x
- indexed address: a(r) like LDR1, a(R2) means R1=contents(a+contents(R2))
- integer indexed by a register : like LDR1, 100(R2)
- Indirect addressing mode: \*r and \*100(r)
- immediate constant addressing mode: like LDR1, #100

# b=a[i]

LD R1, i //R1 =i

MUL R1, R1, 8 //R1 = Rl \* 8

LD R2, a(R1)

//R2=contents(a+contents(R1))

ST b, R2 //b =R2

3/18/2019

# a[j] =**c**

LD R1, c //R1 =c LD R2, j // R2 =j MUL R2, R2, 8 //R2 = R2 \* 8 ST a(R2), R1

//contents(a+contents(R2))=R1

3/18/2019

LD R1, p //R1 =p LD R2, O(R1) // R2=

contents(0+contents(R1))

ST x, R2 // x=R2

#### conditional-jump three-address instruction

If x<y goto L LDR1, x // R1=x LDR2,y // R2=y SUBR1, R1, R2 // R1=R1-R2 BLTZR1, M // if R1<0 jump t o M

#### costs associated with the addressing modes

• LDR0, R1

cost = 1

- LDR0, M cost = 2
- LDR1,\*100(R2) cost = 3

### Addresses in the TargetCode

- Astatically determined area Code
- Astatically determined data area Static
- Adynamically managed area Heap
- Adynamically managed area Stack

Three-address statements for procedure calls and returns

- call callee
- Return
- Halt
- action

#### Target program for a sample call and return

|                               |               | <b>E</b>             |                                   |                                                                                                   |
|-------------------------------|---------------|----------------------|-----------------------------------|---------------------------------------------------------------------------------------------------|
| action <sub>1</sub><br>call p | // code for c | 100:<br>120:<br>132: | ACTION1<br>ST 364, #140<br>BR 200 | <pre>// code for c // code for action1 // save return address 140 in location 364 // call p</pre> |
| -                             |               | 140:                 | ACTION <sub>2</sub>               |                                                                                                   |
| $action_2$                    |               | 160:                 | HALT                              | // return to operating system                                                                     |
| halt                          |               |                      | •••                               |                                                                                                   |
|                               | // code for p |                      |                                   | // code for $p$                                                                                   |
| $action_3$                    |               | 200:                 | ACTION <sub>3</sub>               |                                                                                                   |
| return                        |               | 220:                 | BR *364                           | // return to address saved in location 364                                                        |
|                               |               |                      | •••                               |                                                                                                   |
|                               |               |                      |                                   | // 300-363 hold activation record for $c$                                                         |
|                               |               | 300:                 |                                   | // return address                                                                                 |
|                               |               | 304:                 |                                   | // local data for $c$                                                                             |
|                               |               |                      |                                   |                                                                                                   |
|                               |               |                      |                                   | // 364-451 hold activation record for $p$                                                         |
|                               |               | 364:                 |                                   | // return address                                                                                 |
|                               |               | 368:                 |                                   | // local data for $p$                                                                             |
| 2/10/2010                     |               |                      |                                   | · ·                                                                                               |
| 3/18/2019                     |               | PR                   | OF. ANAND GHA                     | KU                                                                                                |
|                               |               |                      |                                   |                                                                                                   |

# **Stack Allocation**

| LD SP, #stackStart<br>code for the first procedure                                           | // initialize the stack                                                                                     |  |
|----------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|--|
| HALT                                                                                         | // terminate execution                                                                                      |  |
| ADD SP, SP, #caller.recordSize<br>ST *SP, #here + 16<br>BR callee.codeArea                   | <pre>// increment stack pointer // save return address // return to caller Branch to called procedure</pre> |  |
| Return to caller<br>in Callee: BR *0(SP)<br><sup>3/18/2019</sup> in caller: SUB SP, SP, #cal | ler.recordsize                                                                                              |  |

### Target code for stack allocation

|                                                      | // code for m | 108:<br>128:<br>136:<br>144:<br>152: | LD SP, #600<br>ACTION <sub>1</sub><br>ADD SP, SP, #msize<br>ST *SP, #152<br>BR 300<br>SUB SP, SP, #msize | <pre>// code for m // initialize the stack // code for action1 // call sequence begins // push return address // call q // restore SP</pre> |
|------------------------------------------------------|---------------|--------------------------------------|----------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| action <sub>1</sub><br>call q                        |               | 160:<br>180:                         | ACTION <sub>1</sub> 2<br>HALT                                                                            |                                                                                                                                             |
| $action_2$                                           |               | 200.                                 | ACTION                                                                                                   | // code for p                                                                                                                               |
| halt                                                 | // code for p | 200:<br>220:                         | ACTION <sub>3</sub><br>BR *0(SP)                                                                         | // return                                                                                                                                   |
| $action_3$<br>return                                 | // code for q | 300:<br>320:<br>328:                 | <br>ACTION4<br>ADD SP, SP, #qsize<br>ST *SP, #344                                                        | <pre>// code for q // contains a conditional jump to 456 // push return address</pre>                                                       |
| action <sub>4</sub><br>call p<br>action <sub>5</sub> |               | 336:<br>344:<br>352:                 | BR 200<br>SUB SP, SP, #qsize<br>ACTION <sub>5</sub><br>ADD SP, SP, #qsize                                | // call p                                                                                                                                   |
| call q<br>action <sub>6</sub><br>call q<br>return    |               | 380:<br>388:<br>396:<br>404:         | BR *SP, #396<br>BR 300<br>SUB SP, SP, #qsize<br>ACTION <sub>6</sub>                                      | // push return address<br>// call q                                                                                                         |
|                                                      |               | 432:<br>440:                         | ADD SP, SP, #qsize<br>ST *SP, #440<br>BR 300<br>SUB SP, SP, #qsize                                       | // push return address<br>// call q                                                                                                         |
| 3/18/2019                                            |               |                                      | NO CAPARU                                                                                                | // return                                                                                                                                   |
|                                                      |               | 600:                                 |                                                                                                          | // stack starts here                                                                                                                        |

# Flow Graphs

- A *flow graph* is a graphical depiction of a sequence of instructions with control flow edges
- A flow graph can be defined at the intermediate code level or target code level



### **Basic Blocks**

 A basic block is a sequence of consecutive instructions with exactly one entry point and one exit point (with natural flow or a branch instruction)



#### **Basic Blocks and Control Flow Graphs**

• A control flow graph (CFG) is a directed graph with basic blocks  $B_i$  as vertices and with edges  $B_i \rightarrow B_j$  iff  $B_j$  can be executed immediately after  $B_i$ 



### Successor and Predecessor Blocks

- Suppose the CFG has an edge  $B_1 \rightarrow B_2$ 
  - Basic block  $B_1$  is a predecessor of  $B_2$
  - Basic block  $B_2$  is a successor of  $B_1$



3/18/2019

# Partition Algorithm for Basic Blocks

Input: A sequence of three-address statements
Output: A list of basic blocks with each three-address statement in exactly one block

- 1. Determine the set of *leaders*, the first statements of basic blocks
  - a) The **first statement** is the leader
  - b) Any statement that is **the target of a goto** is a leader
  - c) Any statement that immediately follows a goto is a leader
- 2. For each leader, its basic block consist of the leader and all statements up to but not including the next leader or the end of the program

3/18/2019

# Intermediate code to set a 10\*10 matrix to an identity matrix

for *i* from 1 to 10 do for *j* from 1 to 10 do a[i, j] = 0.0;for *i* from 1 to 10 do a[i, i] = 1.0; 1) i = 12) j=1 3) t1 = i \* 10 4)  $t_2 = t_1 + j_1$ 5)  $t_3 = t_2 * 8$ 6)  $a[t_3] = 0.0$ 7) j = j + 18) if j<=10 goto (3) 9) i = i + 110) if i<=10 goto (2) 11) i=1 12) t4 = i \* 1013)t5 =t4 +i 14) t6 = t5 \* 815) a[t6] =1.0 16) i =i +1 17) if i<=10 goto (12)

3/18/2019

### Flow graph based on Basic



# Intermediate code to set a 10\*10 matrix to an identity matrix

| Generate TAC and Partit<br>into basic blocks<br>for i=1 to n<br>for j=1 to n<br>c[i,j] = a[i,j] +b[i,j] | ion below code   | 1) $i = 1$<br>2) $j = 1$<br>3) $t1 = i * 10$<br>4) $t2 = t1 + j$<br>5) $t3 = t2 * 8$<br>6) $t4 = i * 10$<br>7) $t5 = t4 + j$<br>8) $t6 = t5 * 8$<br>9) $t7 = i * 10$<br>10) $t8 = t7 + j$<br>11) $t9 = t8 * 8$<br>12) $c[t9] = a[t3] + b[t6]$<br>13) $j = j + 1$<br>14) $if j \le n \text{ goto } (3)$ |
|---------------------------------------------------------------------------------------------------------|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                         |                  | (3)<br>15)i =i +1<br>16) if i<=n goto (2)                                                                                                                                                                                                                                                              |
| 3/18/2019                                                                                               | PROF ANAND GHARU |                                                                                                                                                                                                                                                                                                        |

3/18/2019

### Flow graph based on Basic



## Loops

- A loop is a collection of basic blocks, such that
  - All blocks in the collection are strongly connected
  - The collection has a unique *entry*, and the only way to reach a block in the loop is through the entry

# Loops (Example)



Strongly connected components:

SCC={ {B2,B3}, {B4} }

Entries: B3, B4

## Equivalence of Basic Blocks

• Two basic blocks are (semantically) *equivalent* if they compute the same set of expressions



Blocks are equivalent, assuming **t1** and **t2** are *dead*: no longer used (no longer *live*)

3/18/2019

## **Transformations on Basic Blocks**

- A code-improving transformation is a code optimization to improve speed or reduce code size
- Global transformations are performed across basic blocks
- Local transformations are only performed on single basic blocks
- Transformations must be safe and preserve the meaning of the code
  - A local transformation is safe if the transformed basic block is guaranteed to be equivalent to its original form

3/18/2019

## **Transformations on Basic Blocks**

- Common Subexpression Elimination
- Dead Code Elimination
- Renaming Temporary Variables
- Interchange of Statements
- Algebraic Transformations
- Next-Use Computation

#### **Common-Subexpression Elimination**

Remove redundant computations



## **Dead Code Elimination**

Remove unused statements



Assuming a is dead (not used)



## **Renaming Temporary Variables**

 Temporary variables that are dead at the end of a block can be safely renamed



Normal-form block

## Interchange of Statements

Independent statements can be reordered



Note that normal-form blocks permit all statement interchanges that are possible

PROF. ANAND GHARU Reference : Aho Ullman, Sethi

3/18/2019

## **Algebraic Transformations**

 Change arithmetic operations to transform blocks to algebraic equivalent forms



## Next-Use

- Next-use information is needed for dead-code elimination and register assignment
- Next-use is computed by a backward scan of a basic block and performing the following actions on statement

*i*: x := y op z

- Add liveness/next-use info on x, y, and z to statement i
- Set x to "not live" and "no next use"
- Set y and z to "live" and the next uses of y and z to i

# Example

1: 
$$t_1 = a * a$$
  
2:  $t_2 = a * b$   
3:  $t_3 = 2 * t_2$   
4:  $t_4 = t_1 + t_3$   
5:  $t_5 = b * b$   
6:  $t_6 = t_4 + t_5$   
7:  $X = t_6$ 

PROF. ANAND GHARU Reference : Aho Ullman, Sethi

3/18/2019

# Example

| 1:t <sub>1</sub> =a*a                    | STATEMENT                                                                                                                                            | Symbol Table          |      |          |
|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------|------|----------|
| 2: $t_2 = a * b$                         | 7: no temporary is live                                                                                                                              | t <sub>1</sub>        | dead | Use in 4 |
| 3: $t_3 = 2 * t_2$                       | 6: $t_6$ :use(7), $t_4 t_5$ not live<br>5: $t_5$ :use(6)                                                                                             | t <sub>2</sub>        | dead | Use in 3 |
| 4: $t_4 = t_1 + t_3$<br>5: $t_5 = b * b$ | 5: t <sub>5</sub> :use(6)<br>4: t <sub>4</sub> :use(6), t <sub>1</sub> t <sub>3</sub> not live<br>3: t <sub>3</sub> :use(4), t <sub>2</sub> not live | <b>t</b> <sub>3</sub> | dead | Use in 4 |
| 6: $t_6 = t_4 + t_5$                     | 2: t <sub>2</sub> :use(3)                                                                                                                            | <b>t</b> <sub>4</sub> | dead | Use in 6 |
| 7: X = $t_6$                             | 1: t <sub>1</sub> :use(4)                                                                                                                            | <b>t</b> <sub>5</sub> | dead | Use in 6 |
|                                          |                                                                                                                                                      | <b>t</b> <sub>6</sub> | dead | Use in 7 |

3/18/2019

## Example ...



Reference : Aho Ullman, Sethi

## **ACode Generator**

- Generates target code for a sequence of threeaddress statements using next-use information
- Uses new function *getreg* to assign registers to variables
- Computed results are kept in registers as long as possible, which means:
  - Result is needed in another computation
  - Register is kept up to a procedure call or end of block
- Checks if operands to three-address code are available in registers

## Code Generator : Register and Address Descriptors

• A *register descriptor* keeps track of what is currently stored in a register at a particular point in the code, e.g. a local variable, argument, global variable, etc.

MOV a, R0 "R0 contains a"

• An *address descriptor* keeps track of the location where the current value of the name can be found at run time, e.g. a register, stack location, memory address, etc.

#### MOV a, R0 MOV R0, R1 "a in R0 and R1"

3/18/2019

## The Code Generation Algorithm

- For each statement x := y op z
  - 1. Set location L = getreg(y, z)
  - 2. If  $y \notin L$  then generate

MOV y',L

where y' denotes one of the locations where the value of y is available (choose register if possible)

3. Generate

#### **OP** *z*',*L*

where z' is one of the locations of z;

Update register/address descriptor of x to include L

4. If y and/or z has no next use and is stored in register, update register descriptors to remove y and/or z

# The getreg Algorithm

- To compute getreg(y,z)
  - If y is stored in a register R and R only holds the value y, and y has no next use, then return R; Update address descriptor: value y no longer in R
  - 2. Else, return a new empty register if available
  - 3. Else, find an occupied register R; Store contents (register spill) by generating MOV R, M

for every *M* in address descriptor of *y*; Return register *R* 

4. Return a memory location

## Example

| Stmt                | code                               | reg desc                         | addr desc                        |
|---------------------|------------------------------------|----------------------------------|----------------------------------|
| t <sub>1</sub> =a-b | mov a,R <sub>o</sub>               | $R_0$ contains $t_1$             | t <sub>1</sub> in R <sub>o</sub> |
|                     | sub b,R <sub>o</sub>               |                                  |                                  |
| t <sub>2</sub> =a-c | mov a, R <sub>1</sub>              | $R_0$ contains $t_1$             | t <sub>1</sub> in R <sub>0</sub> |
|                     | sub c,R <sub>1</sub>               | $R_1$ contains $t_2$             | $t_2 in R_1$                     |
| $t_3 = t_1 + t_2$   | add R <sub>1</sub> ,R <sub>0</sub> | $R_0$ contains $t_3$             | t <sub>3</sub> in R <sub>o</sub> |
| 0                   |                                    | $R_1$ contains $t_2$             | $t_2 in R_1$                     |
|                     |                                    |                                  |                                  |
| $d=t_3+t_2$         | add R <sub>1</sub> ,R <sub>0</sub> | <b>R</b> <sub>o</sub> contains d | d in R <sub>o</sub>              |
|                     | mov R <sub>o</sub> ,d              |                                  | d in R <sub>o</sub> and          |
|                     |                                    |                                  | memory                           |

3/18/2019

#### **Register Allocation and Assignment**

- Global Register Allocation
- Usage Counts
- Register Assignment for Outer Loops
- Register Allocation by Graph Coloring

## **Global register allocation**

- Previously explained algorithm does local (block based) register allocation
- This resulted that all live variables be stored at the end of block
- To save some of these stores and their corresponding loads, we might arrange to assign registers to frequently used variables and keep these registers consistent across block boundaries (globally)
- Some options are:
  - Keep values of variables used in loops inside registers
  - Use graph coloring approach for more globally allocation

- Usage Count means to determine that after the definition of x , how many subsequent uses of x exists in that block
- For the loops we can approximate the saving by register allocation as: Sum over all blocks (B) in a loop (L)
  - For each uses of x before any definition in the block we add one unit of saving
  - If x is live on exit from B and is assigned a value in B, then we add 2 units of saving

i.e

 $\Sigma$  (use(x,B) +2 \* live(x,B)

For all Blocks B in L

•3/1This is used for register assignment

Reference : Aho Ullman, Sethi

#### Example: Flow graph of an inner loop



- Now, compute Usage Count for each variable considering each block.
- Use of a in B1 prior to its definition in B1 is 0
   ∴ use(a,B1) =0
- 'a' is live on exit from B1 and is assigned a value in B1
   ∴ live(a,B1) = 1

 $\therefore usageCount(a) = (0+2^*1) + (1+2^*0) + (1+2^*0) + (0+2^*0)$ 

2+1+1+0=4

3/18/2019 $\therefore$  usageCount(a) = 4

• Now, computing Usage Count for each variable, we get

|         | Bloc<br>k Variable | B1      | B2                               | B3                               | B4               | Total |
|---------|--------------------|---------|----------------------------------|----------------------------------|------------------|-------|
|         | а                  | 0+2*1=2 | 1+2*0=1                          | 1+2*0=1                          | 0+2*0=0          | 4     |
|         | b                  | 2+2*0=2 | 0+2*0=0                          | 0+2*1=2                          | 0+2*1=2          | 6     |
|         | С                  | 1+2*0=1 | 0+2*0=0                          | 1+2*0=1                          | 1+2*0=1          | 3     |
|         | d                  | 1+2*1=3 | 1+2*0=1                          | 1+2*0=1                          | 1+2*0=1          | 6     |
|         | е                  | 0+2*1=2 | 0+2*0=0                          | 0+2*1=2                          | 0+2*0=0          | 4     |
| 3/18/20 | <sup>19</sup> f    | 1+2*0=1 | 0772017-2017<br>Reference :<br>A | IANE2COLAR<br>ho Ullman,<br>Seth | ∪ <b>0+2*0=0</b> | 4     |

- Thus, we may select 'a','b','d' for registers , since their usage count is largest.
- We can use 'e' or 'f' too instead of 'a'

# Code sequence using global register assignment



## Global Register Allocation with Graph Coloring

- When a register is needed but all available registers are in use, the content of one of the used registers must be stored (spilled) into a memory location to free a register
- Graph coloring allocates registers and attempts to minimize the cost of spills
- Build a conflict graph (interference graph)
- Find a k-coloring for the graph, with k the number of registers

#### Register allocation by Graph coloring

- Two passes are used
  - Target-machine instructions are selected as though there are an infinite number of symbolic registers
  - Assign physical registers to symbolic ones
    - Create a register-interference graph
    - Nodes are symbolic registers and edges connects two nodes if one is live at a point where the other is defined.
    - Use a graph coloring algorithm to assign registers.

3/18/2019



#### Register allocation by Graph coloring

- Construct Interference Graph
- Connect nodes with edges that interfere with other
- Then, Color the interference graph such that nodes connected to each other have different colors
- Then Allocate variables having the same color with same registers

#### Register allocation by Graph coloring



Now, consider three available registers R1, R2 and R3

t1 & t5 reside in R1 say red t2 & t6 reside in R2 say green <sup>3</sup>t3/& 4 reside in R3 say blue AND GHARU

## **Peephole Optimization**

- Target code often contains redundant instructions and suboptimal constructs
- Examine a short sequence of target instruction (peephole) and replace by a shorter or faster sequence
- Peephole is a small moving window on the target systems

3/18/2019

## Peephole optimization examples...

- 1. Redundant loads and stores
- Consider the code sequence

Move  $R_0$ , a Move a,  $R_0$ 

Instruction 2 can always be removed if it does not have a label.

3/18/2019

## Peephole optimization examples...

#### 2. Unreachable code

 Consider following code sequence #define debug 0 if (debug) { print debugging info }

```
this may be translated as
if debug = 1 goto L1
goto L2
L1: print debugging info
L2:
```

```
Eliminate jump over jumps
if debug <> 1 goto L2
print debugging information
PROF. ANAND GHARU
```

## Unreachable code example ...

#### constant propagation

if 0 ⇔ 1 goto L2 print debugging information

L2:

Evaluate boolean expression. Since if condition is always true the code becomes

goto L2 print debugging information

The print statement is now unreachable. Therefore, the code becomes



#### Peephole optimization examples...

3. Flow of control: replace jump sequences

| goto L1          |    | goto L2         |
|------------------|----|-----------------|
|                  | by |                 |
| <br>L1 : goto L2 |    | <br>L1: goto L2 |

4. Simplify algebraic expressions

remove x := x+0 or x:=x\*1

3/18/2019

## Peephole optimization examples...

#### 5. Strength reduction

- Replace X^2 by X\*X
- Replace multiplication by left shift
- Replace division by right shift

## 6. Use faster machine instructions by Inc R

3/18/2019

## Example

#### STATEMENT

- t1 =2 t2 =4
- t3 =t1 +t2
- t4 =t1 +1

- t5 =t1 \* t2
- t6 =t4 \* 2 3/18/2019

- Finally Code generated is
  - MOV #2,R1 MOV#4, R2
  - ADD R1, R2 MOV R2, R3
  - ADD #1,R1
  - MOV R1, R3
  - MUL R2,R1



### DAGrepresentation of basic blocks

- Useful data structures for implementing transformations on basic blocks
- Gives a picture of how value computed by a statement is used in subsequent statements
- Good way of determining common sub-expressions
- A dag for a basic block has following labels on the nodes
  - leaves are labeled by unique identifiers, either variable names or constants
  - interior nodes are labeled by an operator symbol
  - nodes are also optionally given a sequence of identifiers for labels

### DAGrepresentation: example

1. t<sub>1</sub> := 4 \* i 2.  $t_2 := a[t_1]$ 3. t<sub>3</sub> := 4 \* i 4.  $t_{4} := b[t_{3}]$ 5.  $t_5 := t_2 * t_4$  $6.t_6 := prod + t_5$ 7.prod :=  $t_6$ 8. t<sub>7</sub> := i + 1 9. i :=  $t_7$ 10. if i <= 20 goto (1)



3/18/2019

PROF. ANAND GHARU

#### DAGrepresentation in form of Intermediate code **Final I/c Code** $S_1 = 4 * i$ Generated $S_{y} = c(A)$ $S_1 = 4 * i$ $S_3 = S_2[S_1]$ $S_2 = c(A)$ S₄=4 \* i $S_3 = S_2[S_1]$ $S_5 = c(B)$ $S_6 = S_5 [S_4]$ $S_5 = c(B)$ $S_7 = S_3 * S_6$ $S_6 = S_5[S_4]$ $S_8 = prod + S_7$ $S_7 = S_3 * S_6$ $prod = S_8$ $S_0 = i+1$ $prod = prod + S_7$ $i = S_{9}$ $lf_{3/18/2019} \approx 20 \text{ goto}(1)$ PROF. ANAND GHARU+ 1 If i <= 20 goto (1)

### Rearranging order of the code

 Consider Х and its DAG following basic block  $t_3$ t<sub>1</sub>  $t_1 = a + b$  $t_2 = c + d$  $t_2$ b e a  $t_3 = e - t_2$  $X=t_1-t_3$ C

## Rearranging order ...

 $t_1 = a + b$  $t_2 = c + d$  $\bar{t_3} = e - t_2$  $X = t_1 - t_3$ MOV  $a, R_0$ ADD b,  $R_0$ MOV c, R<sub>1</sub> ADD d,  $R_1$ MOV  $R_0, t_1$ MOV e,  $R_0$ SUB  $R_1$ ,  $R_0$ MOV  $t_1$ ,  $R_1$ SUB  $R_0$ ,  $R_1$ MOV R1. X

|                           | Rearranging the code as             |
|---------------------------|-------------------------------------|
|                           | $t_2 = c + d$                       |
| So. We need               | $t_3 = e - t_2$<br>$t_1 = a + b$    |
| o decide the              | $X=t_1-t_3$                         |
| order                     | gives                               |
|                           | $\sim$ MOV c, $R_o$                 |
| Register spilling         | ADD d, R <sub>o</sub>               |
|                           | MOV e, $R_1$                        |
|                           | SUB $R_0$ , $R_1$                   |
|                           | MOV a, R <sub>o</sub>               |
| <b>Register reloading</b> | ADD b, R <sub>o</sub>               |
|                           | SUB R <sub>1</sub> , R <sub>0</sub> |
| PROF. ANA                 | ND MOVR <sub>1</sub> , X            |
|                           |                                     |

### Ordering of Trees

Ordering can be decided using

- A) Heuristic Ordering
- B) Optimal Ordering( Labelling)

3/18/2019

PROF. ANAND GHARU

 Heuristics attempts to order the nodes of a DAG so that, if possible, a node immediately follows the evaluation of its left-mostoperand.

• The algorithm for heuristic ordering is given below. It lists the nodes of a DAG such that the node's reverse listing results in the computation order.

While there exists an unlisted interior node do

Select an unlisted node *m* whose parents have been listed list *n* 

while there exists a **left-most child** *m* of *n* that has no unlisted parents and *m* is not a **leaf do** 

list*m* 

}

*n* =m

order20 reverse of the order of listing of modes



#### Order: 7654321 3/18/2019 PROF. ANAND GHARU

### The previous order : $t_1 = a + b$ t2 = t1 - ct3= d + e t4 = t2 \* t3 $t_5 = t_1 + t_4$ t6 = t4 - t3t7 = t5 \* t6

**Final Order** decided is : t7 = d+ e t6 = a + bt5= t6- c t4 =t5\* t8 t7 = t4 - t8t2 = t6 + t4t1 = t2 \* t3

### Optimal Ordering of Trees : Labelling Algorithm

•Optimal ordering means yielding the order that has shortest instruction sequence

### Optimal Ordering of Trees : Labelling Algorithm

- Optimal ordering means yielding the order that has shortest instruction sequence
- Label each node of the tree bottom up with an integer denoting fewest number of registers required to evaluate the tree with no stores of immediate results.
- Generate code during tree traversal by first evaluating the operand requiring more registers

Optimal Ordering of Trees : Labeling Algorithm

## The Labeling Algorithm

#### if *n* is a leaf then

if n is the leftmost child of its parent then
 label(n) := 1

else

label(n) := 0

else begin

let  $n_1, n_2, ..., n_k$  be the children of n ordered by label so that  $label(n_1) \ge label(n_2) \ge ... \ge label(n_k)$ ;  $3/18/2babel(n) := \max_{1 \le \frac{n}{2} \ge k} (label(n_i) \ge i - 1)$ end

## **Optimal Ordering of Trees: Labeling** Algorithm

### An Example

For binary interior nodes:

$$label(n) = \begin{cases} \max(l1, l2), & \text{if } l1 \neq l2\\ l1 + 1, & \text{if } l1 = l2 \end{cases}$$



3/18/2019

Reference : Aho Ullman, Sethi

## **Code Generation From a Labeled Tree**

- Use a stack *rstack* to allocate *registers* R0, R1, ..., R(*r*-1)
- The value of a tree is always computed in the top register on *rstack*
- The function *swap*(*rstack*) interchanges the top two registers on *rstack*

• Use a stack *tstack* to allocate *temporary* PROF. ANAND GHARU *memory locations* T0, T1, ...



## The Function gencode

#### procedure gencode(n);

#### begin

if *n* is a left leaf representing operand *name* and *n* is the leftmost child of its parent then **print** 'MOV' || name || ',' || top(rstack) else if *n* is an interior node with operator *op*, left child  $n_1$ , and right child  $n_2$  then **if**  $label(n_2) = 0$  **then** /\* case 1 \*/ else if  $1 \le label(n_1) \le label(n_2)$  and  $label(n_1) \le r$  then /\* case 2 \*/ else if  $1 \le label(n_2) \le label(n_1)$  and  $label(n_2) \le r$  then /\* case 3 \*/ else /\* case 4, both labels  $\geq r * /$ 

end<sub>8/2019</sub>

## The Function gencode

/\* case 1 \*/

#### begin

let name be the operand represented by n<sub>2</sub>;
gencode(n<sub>1</sub>);
print op || name || ',' || top(rstack)
end

/\* case 2 \*/

#### begin

end

swap(rstack); gencode( $n_2$ ); R := pop(rstack); gencode( $n_1$ ); **print** op || R || ', ' || top(rstack); push(rstack, R); swap(rstack);

3/18/2019

### The Function gencode

/\* case 3 \*/
begin
 gencode(n\_1);
 R := pop(rstack); gencode(n\_2);
 print op || R || ',' || top(rstack);
 push(rstack, R);
end

/\* case 4 \*/

#### begin

3/18/2019

gencode $(n_2)$ ; T := pop(tstack); **print** 'MOV'  $\parallel$  top(rstack)  $\parallel$  ','  $\parallel$  T; gencode $(n_1)$ ; push(tstack, T); **print** op  $\parallel$  T  $\parallel$  ','  $\parallel$  top(rstack);

### An Example



gencode(t4) [R1, R0] /\* 2 \*/ gencode(t3) [R0, R1] gencode(e) [R0, R1] print MOV e, R1 gencode(t2) [R0] gencode(c) [R0] print MOV c, R0 print ADD d, R0 print SUB R0, R1 gencode(t1) [R0] gencode(a) [R0] print MOV a, R0

/\* 3 \*/ /\* 0 \*/ /\* 1 \*/ /\* 0 \*/

/\* 1 \*/ /\* 0 \*/

3/18/2019

## **Multiregister Operations**

- Some operations like multiplication, division, or a function call normally require more than one register
- The labeling algorithm needs to ensure that *label(n)* is always at least the number of registers required by the operation

$$label(n) = \begin{cases} \max(2, l1, l2), & \text{if } l1 \neq l2\\ l1 + 1, & \text{if } l1 = l2 \end{cases}$$

3/18/2019

## **Code Generator Generators**

- A tool to automatically construct the instruction selection phrase of a code generator
- Such tools may use *tree grammars* or *context free grammars* to describe the *target machines*
- *Register allocation* will be implemented as a separate mechanism
- *Graph coloring* is one of the approaches for register allocation



## Code Generator Generators Tree Rewriting

- The code is generated by *reducing* the input tree into a single node using a sequence of *tree-rewriting rules*
- Each tree rewriting rule is of the form *replacement* ← *template* { *action* }
  - *replacement* is a single node
  - *template* is a tree
  - *action* is a code fragment

## • A set of tree-rewriting rules is called a *tree*-

translation scheme<sup>PROF.</sup> ANAND GHARU ence : Aho Ullman, Sethi



Each tree template represents a computation performed by the sequence of machines instructions emitted by the associated action

3/18/2019

## **Tree Rewriting Rules**













3/18/2019



3/18/2019

PROF. ANAND GHARU Reference : Aho Ullman, Sethi

105



3/18/2019

### CODEGENERATORGENERATOR: FINALCODE GENERATED

MOV#a,R0 ADD Sp, R0 ADD i(SP), R0 MOV b, R1 INC R1 MOV R1, \*R0

3/18/2019

### Not insyllabus...but may ask Dynamic Programming Code Generation

- The *dynamic programming* algorithm applies to a *broad* class of *register* machines with *complex* instruction sets
- Machines has *r* interchangeable registers
- Machines has instructions of the form  $R_i = E$

where E is any expression containing operators, registers, and memory locations. If E involves registers, then Ri must be one of them

## **Dynamic Programming**

• The *dynamic programming* algorithm partitions the problem of generating optimal code for an expression into sub-problems of generating optimal code for the subexpressions of the given expression



3/18/2019

## **Contiguous Evaluation**

- We say a program *P* evaluates a tree *T* contiguously if
- it first evaluates those subtrees of *T* that need to be computed into *memory*
- it then evaluates the subtrees of the root in *either* order
- it finally evaluates the root

## **Optimally Contiguous Program**

- For the machines defined above, given any program *P* to evaluate an expression tree *T*, we can find an *equivalent* program *P*' such that
  - P' is of no higher cost than P
  - P' uses no more registers than P
  - P' evaluates the tree in a contiguous fashion
- This implies that every expression tree can be evaluated *optimally* by a *contiguous* program

3/18/2019

• *Phase 1*: compute bottom-up for each node *n* of the expression tree *T* an array *C* of costs, in which the *i*th component C[i] is the optimal cost of computing the subtree Srooted at *n* into *a register*, assuming *i* registers are available for the computation. C[0] is the optimal cost of computing the subtree *S* into *memory* 

- To compute C[i] at node n, consider each machine instruction R := E whose expression E matches the subexpression rooted at node n
- Determine the costs of evaluating the operands of *E* by examining the cost vectors at the corresponding descendants of *n*

- For those operands of *E* that are registers, consider all possible orders in which the corresponding subtrees of *T* can be evaluated into registers
- In each ordering, the first subtree corresponding to a register operand can be evaluated using *i* available registers, the second using *i*-1 registers, and so on

- For node *n*, add in the cost of the instruction R := E that was used to match node *n*
- The value *C*[*i*] is then the minimum cost over all possible orders
- At each node, store the instruction used to achieve the best cost for C[i] for each *i*
- The smallest cost in the vector gives the minimum cost of evaluating *T*

- *Phase 2*: traverse *T* and use the cost vectors to determine which subtrees of *T* must be computed into memory
- *Phase 3*: traverse *T* and use the cost vectors and associated instructions to generate the final target code

### An Example

Consider a machine with two registers R0 and R1 and instructions

 $Ri := Mj \qquad Mi := Ri \qquad Ri := Rj$  $Ri := Ri \ op \ Rj \qquad Ri := Ri \ op \ Mj$ 



3/18/2019

### An Example



# **OTHANK YOUMYBLOG:** anandgharu.wordpress.comg:

anandgharu.wordpress.com

PROF. ANAND GHARU

3/18/2019