COMP26020 - Part III (compiler) experimental exercise
Register allocation using graphic coloring
background
No matter which programming language is used, computer programs usually use more variables than all CPU registers can hold. When a program is compiled and executed on a given processor, the compiler needs to consider which variables will be stored in registers and how long they will be stored. If we think it takes several cycles to move data from memory, there will be a performance benefit if the compiler can minimize this transfer. How? By doing some "smart" register allocation, for example, by ensuring that the most commonly used variables are placed in registers.
To understand this problem, consider the following code:
1.r1 = x
2.r2 = y
3.r3 r2 = r1 *
4.r4 z =
5.r5 = r4 + r2
6.r6 = w
7.r7 = r5 + r6
8.r8 = r7 * r3
9.r9 model = r8 + r1
In this code, the programmer uses nine variables. However, does this mean that nine registers are required? To find the answer, let's define the concept of a living range. For any given variable, there is an active range from the point assigned to the variable to the last use of the specific value. Note that if a new value is assigned to the same variable, a new activity range begins. For example, instruction 2 defines a value of r2. It was last used in instruction 5, so the valid values range from 2 to 5. However, if instruction 4 is r2=z, the live range will be from 2 to 3, and the other live range will start from instruction 4 to the end of instruction 5.
In order to practice, you may need to find all the activity scopes of the above code. The answers given are: R1: [1,9], R2: [2,5], R3: [3,8], R4: [4,5], R5: [5,7],r6:[6,7],r7: [7,8], r8:(8,9),r9 model: [9,9].
Activity ranges are important because they indicate how many activity values are required in any given instruction. For example, the above live range tells us that four values in instruction 6 need to be live. Obviously, the maximum number of live values required in any instruction indicates how many registers we need so that all values (live ranges) can be placed in registers. Most importantly, however, live ranges can guide register allocation: two live ranges that do not overlap or interfere can use the same register. For example, for the above live range, r2 and r6 can share the same register because different parts of the code need corresponding values (or "live"). Different algorithms have been developed to find how to allocate different live ranges to registers. This problem is called register allocation. This is a np complete problem, which means that most of the different solutions proposed over the years are based on heuristics. For more information, you can find more in Chapter 13 of the recommended textbook "compilation Engineering":
https://www.sciencedirect.com/science/article/pii/B978012088478000013X
Among different methods, register allocation using graph coloring is a common method. In register allocation using graph coloring, live range is used to create interference graph. In this diagram, each active range corresponds to a node. If the active ranges overlap, there is an edge between the two nodes. Then, the register allocation problem is equivalent to the graph coloring problem. This is a famous graph theory problem. Its purpose is to paint all nodes of the graph with color, so that two adjacent nodes will not have the same color. The usual goal is to find a minimum number of colors. Each color corresponds to a range and a range
The color of the node corresponds to the register applied to a specific active range. There are various algorithms to color graphs. Here, we will focus on a simple algorithm called top-down coloring. The working principle of the algorithm is as follows:
1. Suppose there is an ordered color list (such as red, black, blue, etc.)
2. Set the interferogram, and the node number is: 1,2,3
3. The nodes (i.e. active range) in the interference diagram are arranged in descending order according to the number of neighbors. In the case of tie (i.e. nodes with the same number of neighbors), the node with the lowest id takes precedence.
4. Assign colors according to the order in the color list. For each node, select the first color from the list that is not used by the node's neighbors.
5. Continue to follow the ranking and repeat step 4 until all nodes are shaded.
Your mission
Use the programming language of your choice to implement a program:
read the file listing the interferogram (input).
write a file that lists the colors of each node of the graph (output).
The color list is given by the capital letters in the alphabet: A, B, C,..., Z. there are 26 colors (or registers).
Input file specification:
The number of lines equal to the number of interferogram nodes. Each row contains the number of nodes (in ascending order) and the numbers of all nodes that interfere with it. example:
1234
214
31
412
This means that node 1 interferes with nodes 2, 3 and 4. Node 2 interferes with node 1 (we already know) and node 4. Node 3 interferes with node 1, and node 4 interferes with node 1 and node 2. You can assume no more than 50 nodes.
Output file specification:
For each node (in ascending order), write the node number and color. For example:
1
2 b
3 b
4 ° C
Make sure you follow the specifications, as most of the marking will be done automatically. Your executable should accept two parameters that indicate the name of the input file and the name of the output file. For example:
myprogram.exe input. Txt output. txt
You should be able to complete this task in two experiments. The registration deadline is 2nd 3 A lab session before the course begins.
You should submit through gitlab. Your submission should include source files and readme files, and explain how to compile and run your code.
(out of 10 points) will be scored according to the following scheme: 2 points for code readability and reasonable comments / explanations. 2 correctly implement the marking of the above example. correctly find 6 points for additional automatic test output.
#include <bits/stdc++.h> #include<iostream> #include<cstring> #include<fstream> using namespace std; const int MAXN = 100; int n, m = 26; //n represents the number of nodes and m represents the number of colors int a[MAXN][MAXN]; struct node{ int idx; int cnt; }Node[MAXN]; void NextValue(int k,int m,int* x) { //X [node [k] in [1,m] IDX] determines the color with the smallest value and does not conflict with its adjacent pins //x[Node[k].idx]=0 indicates that there is no available color, and the color is numbered from 1 int j; do{ x[Node[k].idx]=(x[Node[k].idx]+1) % (m+1); //Try the next color if(!x[Node[k].idx]) return; //No colors available for(j = 1; j < k; j++) { if(a[Node[k].idx][Node[j].idx] && x[Node[k].idx] == x[Node[j].idx]) //If (i,j) is the edge of the graph, and the adjacent nodes k and j have the same color break; //In case of conflict, select the next color } if(j == k) return; //Successful selection of a color returns }while(1); } void mColoring(int k,int m,int *x) { if(k == n+1) { //An m-coloring scheme of graph G is obtained return; } NextValue(k, m, x); //Is x [node [k] IDX] assign color if(!x[Node[k].idx]) return; //x[Node[k].idx]=0 indicates that there is no suitable color at present else mColoring(k+1,m,x); //Color has been assigned to the first k nodes. Try the other nodes } bool cmp(const node &A, const node &B) { if(A.cnt > B.cnt) { return 1; } return 0; } int main(int argc, char *argv[]) //Main function { ifstream infile(argv[1],ios::in); if(!infile){ // Determine whether the file exists cerr<<"open error."<<endl; exit(1); // Exit program } char str[255]; // Defines the character array used to accept and read a row of data n = -1; while(infile) { infile.getline(str,255); // The getline function can read the entire line and save it in the str array int idx = str[0]-'0'; Node[idx].idx = idx; Node[idx].cnt = strlen(str)/2; for(int i = 2; i < strlen(str); i+=2) { int xx = str[i]-'0'; a[idx][xx] = 1; } n++; } sort(Node+1, Node+n+1, cmp); int *x = new int[MAXN]; for(int i = 1; i <= n; i++) x[i] = 0; mColoring(1, m,x); freopen(argv[2], "w", stdout); for(int i = 1; i <= n; i++) { printf("%d%c\n", i,'A'+x[i]-1); } return 0; }