26. (small experiment) understanding the compilation principle: an interpreter for four operations

In the previous course, I mentioned the knowledge of compilation principles many times in the JavaScript and CSS parts. If we learn this part of knowledge from formal materials such as "dragon book" of compilation principle, it will take a lot of time. Therefore, I have designed a small experiment here to help you quickly understand the knowledge related to compilation principle.

Today's content is special. Let's do a detailed code experiment. I put the detailed code in the article. If you are listening to the audio, you can click the article to view the details.

analysis

According to the relevant knowledge of compilation principle, let's design the work. Here we are divided into several steps.

Definition of four operations: the lexical and grammatical definitions of the four operations are produced.
Lexical analysis: turn the input string stream into token.
Syntax analysis: turn token into abstract syntax tree AST.
Interpretation and execution: the AST is traversed in sequence and the results are obtained.

Define four operations

The four operations are addition, subtraction, multiplication and division, for example:

1 + 2 * 3

First, let's define the morphology. There are only numbers and operators in the four operations, so the definition is very simple, but we also need to pay attention to spaces and line breaks, so the lexical definition is probably as follows.

Token
- Number: 1 2 3 4 5 6 7 8 9 0 combination
- Operator: one of +, -, *, /
Whitespace: <sp>
LineTerminator: <LF> <CR>

Here we do not deal with blanks and newlines, so the lexical analysis stage will directly discard them.

Next, let's define the syntax. Most of the syntax definitions use BNF, but in fact, everyone writes indiscriminately. For example, the JavaScript standard is a self created syntax similar to BNF.

However, the core idea of grammatical definition will not change. It is the combination of several structures to produce a new structure, so grammatical definition is also called grammatical production.

Because addition, subtraction, multiplication and division have priority, we can think that addition is composed of several multiplications and connected by plus or minus signs:

<Expression> ::= 
    <AdditiveExpression><EOF>

This BNF is written in a similar way to the principle of recursion. You can understand that it represents a list. For convenience, we have to take ordinary numbers as a special case of multiplication.

<MultiplicativeExpression> ::=

<Number>

|<MultiplicativeExpression><*><Number>

|<MultiplicativeExpression></><Number>

Well, that's the definition of the four operations.

Lexical analysis: state machine

In the lexical analysis part, we turn the character stream into token stream. There are two schemes for lexical analysis, one is state machine and the other is regular expression. They are equivalent. Just choose what you like. I will introduce state machine to you here.

According to the analysis, we may generate four input elements, of which there are only two token s. The first state of our state machine is to judge which state is entered according to the first input character:

var token = [];
const start = char =&gt; {
    if(char === '1' 
        || char === '2'
        || char === '3'
        || char === '4'
        || char === '5'
        || char === '6'
        || char === '7'
        || char === '8'
        || char === '9'
        || char === '0'
    ) {
        token.push(char);
        return inNumber;   
    }
    if(char === '+' 
        || char === '-'
        || char === '*'
        || char === '/'
    ) {
        emmitToken(char, char);
        return start
    }
    if(char === ' ') {
        return start;
    }
    if(char === '\r' 
        || char === '\n'
    ) {
        return start;
    }
}
const inNumber = char =&gt; {
    if(char === '1' 
        || char === '2'
        || char === '3'
        || char === '4'
        || char === '5'
        || char === '6'
        || char === '7'
        || char === '8'
        || char === '9'
        || char === '0'

    ) {
        token.push(char);
        return inNumber;
    } else {
        emmitToken("Number", token.join(""));
        token = [];
        return start(char); // put back char
    }
}

This state machine is very simple. It has only two states, because we only have Number, not a single character token.

Here, my state machine implementation is a very classic way: use a function to represent the state, use if to represent the migration relationship of the state, and use the return value to represent the next state.

Let's run this state machine to try:

function emmitToken(type, value) {
    console.log(value);
}

var input = "1024 + 2 * 256"

var state = start;

for(var c of input.split(''))
    state = state(c);

state(Symbol('EOF'))

After running, we found that the output is as follows:

This is the answer we want.

Syntax analysis: LL

After finishing the lexical analysis, we begin the syntax analysis. LL syntax analysis writes a function according to each production formula. First, we write the function name:

function AdditiveExpression( ){


}
function MultiplicativeExpression(){
    

}

For ease of understanding, we will not do streaming processing. In fact, generally compiled code should support streaming processing.

So let's assume that all the token s have been obtained:

var tokens = [{

type:"Number",

value: "1024"

}, {

type:"+"

value: "+"

}, {

type:"Number",

value: "2"

}, {

type:""

value: ""

}, {

type:"Number",

value: "256"

}, {

type:"EOF"

}];

Each production corresponds to a function. For example, according to the production, our addresseexpression needs to deal with three situations:

<AdditiveExpression> ::=

<MultiplicativeExpression>

|<AdditiveExpression><+><MultiplicativeExpression>

|<AdditiveExpression><-><MultiplicativeExpression>

Then three if branches should be written in address veexpression to handle three situations.

Addresseexpression is written by using production to synthesize new nodes based on the incoming nodes

function AdditiveExpression(source){

if(source[0].type === "MultiplicativeExpression") {

let node = {

type:"AdditiveExpression",

children:[source[0]]

}

source[0] = node;

return node;

}

if(source[0].type === "AdditiveExpression" && source[1].type === "+") {

let node = {

type:"AdditiveExpression",

operator:"+",

children:[source.shift(), source.shift(), MultiplicativeExpression(source)]

}

source.unshift(node);

}

if(source[0].type === "AdditiveExpression" && source[1].type === "-") {

let node = {

type:"AdditiveExpression",

operator:"-",

children:[source.shift(), source.shift(), MultiplicativeExpression(source)]

}

source.unshift(node);

}

}

Next, we will pass the parsed token to our top-level processing function Expression.

Expression(tokens);

Next, let's see what Expression should do with it.

The first token received by our Expression is a Number. At this time, Expression is stupid because the production only tells us what to do when we receive addresseexpression.

At this time, we need to expand the first item of the production layer by layer and call the corresponding processing function according to all possibilities. This process is called "closure" in the compilation principle.

function Expression(source){
    if(source[0].type === "AdditiveExpression" &amp;&amp; source[1] &amp;&amp; source[1].type === "EOF" ) {
        let node = {
            type:"Expression",
            children:[source.shift(), source.shift()]
        }
        source.unshift(node);
        return node;
    }
    AdditiveExpression(source);
    return Expression(source);
}
function AdditiveExpression(source){
    if(source[0].type === "MultiplicativeExpression") {
        let node = {
            type:"AdditiveExpression",
            children:[source[0]]
        }
        source[0] = node;
        return AdditiveExpression(source);
    } 
    if(source[0].type === "AdditiveExpression" &amp;&amp; source[1] &amp;&amp; source[1].type === "+") {
        let node = {
            type:"AdditiveExpression",
            operator:"+",
            children:[]
        }
        node.children.push(source.shift());
        node.children.push(source.shift());
        MultiplicativeExpression(source);
        node.children.push(source.shift());
        source.unshift(node);
        return AdditiveExpression(source);
    }
    if(source[0].type === "AdditiveExpression" &amp;&amp; source[1] &amp;&amp; source[1].type === "-") {
        let node = {
            type:"AdditiveExpression",
            operator:"-",
            children:[]
        }
        node.children.push(source.shift());
        node.children.push(source.shift());
        MultiplicativeExpression(source);
        node.children.push(source.shift());
        source.unshift(node);
        return AdditiveExpression(source);
    }
    if(source[0].type === "AdditiveExpression")
        return source[0];
    MultiplicativeExpression(source);
    return AdditiveExpression(source);
}
function MultiplicativeExpression(source){
    if(source[0].type === "Number") {
        let node = {
            type:"MultiplicativeExpression",
            children:[source[0]]
        }
        source[0] = node;
        return MultiplicativeExpression(source);
    } 
    if(source[0].type === "MultiplicativeExpression" &amp;&amp; source[1] &amp;&amp; source[1].type === "*") {
        let node = {
            type:"MultiplicativeExpression",
            operator:"*",
            children:[]
        }
        node.children.push(source.shift());
        node.children.push(source.shift());
        node.children.push(source.shift());
        source.unshift(node);
        return MultiplicativeExpression(source);
    }
    if(source[0].type === "MultiplicativeExpression"&amp;&amp; source[1] &amp;&amp; source[1].type === "/") {
        let node = {
            type:"MultiplicativeExpression",
            operator:"/",
            children:[]
        }
        node.children.push(source.shift());
        node.children.push(source.shift());
        node.children.push(source.shift());
        source.unshift(node);
        return MultiplicativeExpression(source);
    }
    if(source[0].type === "MultiplicativeExpression")
        return source[0];

    return MultiplicativeExpression(source);
};

var source = [{
    type:"Number",
    value: "3"
}, {
    type:"*",
    value: "*"
}, {
    type:"Number",
    value: "300"
}, {
    type:"+",
    value: "+"
}, {
    type:"Number",
    value: "2"
}, {
    type:"*",
    value: "*"
}, {
    type:"Number",
    value: "256"
}, {
    type:"EOF"
}];
var ast = Expression(source);

console.log(ast);

Interpretation execution

After getting AST, we have solved the most difficult step. Here, we will not optimize or simplify the tree. Next, we will directly enter the execution phase. We just need to traverse the tree.

We can write if separately according to different node types and other information:



function evaluate(node) {

if(node.type === "Expression") {

return evaluate(node.children[0])

}

if(node.type === "AdditiveExpression") {

if(node.operator === '-') {

return evaluate(node.children[0]) - evaluate(node.children[2]);

}

if(node.operator === '+') {

return evaluate(node.children[0]) + evaluate(node.children[2]);

}

return evaluate(node.children[0])

}

if(node.type === "MultiplicativeExpression") {

if(node.operator === '*') {

return evaluate(node.children[0]) * evaluate(node.children[2]);

}

if(node.operator === '/') {

return evaluate(node.children[0]) / evaluate(node.children[2]);

}

return evaluate(node.children[0])

}

if(node.type === "Number") {

return Number(node.value);

}

}

summary

In this small experiment, we learned the basic knowledge of compilation principle through a small experiment. The purpose of the small experiment is to help you understand the basic concepts of compilation principle involved in JavaScript course. It is still far from the real study of compilation principle.

Through experiments, we understand the process of production, lexical analysis, grammatical analysis and interpretation execution.

Finally, I leave you with some challenges. You can choose according to your level:

Complete the emmitToken so that our code can work completely.
Add decimals to four operations.
Introduce negative numbers.
Add bracket function.

Keywords: Javascript

Added by AnsonM on Sat, 11 Dec 2021 04:03:00 +0200

Programming VIP