Implement a keyword in TiDB -- Parser

preface

In fact, we always want to do something cool and hacker based on TiDB. For example, our team sent an article about TiDB for Pg compatible with Gitlab. For specific articles, please refer to the link:

TiDB4PG compatible Gitlab - Zhihu (zhihu.com)

In this article, I'll briefly talk about the hard process of achieving compatibility with Gitlab.

We adopted a relatively stupid way to connect Gitlab's source code to the original tidb for PG through compilation and startup. This must be an error. After all, many things are not compatible. In order to quickly realize compatibility, we decided to take the way of capturing packets, find out the SQL statements executed by Gitlab connected to tidb for PG, sort them out roughly, and see what SQL statements there are to customize compatible development. One of the difficulties in the compatible implementation of these SQL statements is the Returning keyword in DML statements.

principle

In tidb server, the process of executing an SQL statement can be roughly divided into several stages: parsing, compiling, optimizing, executing and Returning results. To implement a keyword, you also need to do some articles in these stages. For the Returning keyword, we can start with the relatively simple DELETE statement in the DML statement, so the final result of the next transformation process is to realize the DELETE Returning sentence pattern.

Transformation realization process

Parser

From the process of SQL flow in TiDB, the first step to follow-up code is to parse the SQL statement from the client into a structure that can be recognized by the follow-up code, that is, AST tree. This process is mainly implemented in the Parser module.

In TiDB V5 Before 5.0, the Parser package had a special code warehouse and was imported into TiDB through go mod. After 5.0, TiDB moved the Parser package to the TiDB source code. The source code of TiDB for PG is also based on TiDB v4 0.14 transformation. This time, I want to try to implement the RETURNING keyword in the latest source code of TiDB. One is to prepare for the hackathon competition, and the other is to test the water for TiDB for PG to move closer to the new version of TiDB.

The Paser module is mainly composed of lexer & yacc modules. In the process of parsing, the lexer component will convert the SQL text into one token after another and pass it to the parser, and the most important parser in the parser Go file is the goyacc tool according to parser Y file is generated. According to the definition of syntax in the file, it determines what syntax rules the token passed from lexer can match. Finally, it outputs the ast structure tree, that is, all kinds of stmts defined in parser/ast, and what we want to achieve is DML DeleteStmt in go.

// DeleteStmt is a statement to delete rows from table.
// See https://dev.mysql.com/doc/refman/5.7/en/delete.html
type DeleteStmt struct {
    dmlNode

    // TableRefs is used in both single table and multiple table delete statement.
    TableRefs *TableRefsClause
    // Tables is only used in multiple table delete statement.
    Tables       *DeleteTableList
    Where        ExprNode
    Order        *OrderByClause
    Limit        *Limit
    Priority     mysql.PriorityEnum
    IgnoreErr    bool
    Quick        bool
    IsMultiTable bool
    BeforeFrom   bool
    // TableHints represents the table level Optimizer Hint for join type.
    TableHints []*TableOptimizerHint
    With       *WithClause
    // Our topic today is the keyword Returning
    Returning  *ReturningClause
}

type ReturningClause struct {
    node
    Fields *FieldList
}

func (n *ReturningClause) Restore(ctx *format.RestoreCtx) error {
    ctx.WriteKeyWord("Returning ")
    for i, item := range n.Fields.Fields {
        if i != 0 {
            ctx.WritePlain(",")
        }
        if err := item.Restore(ctx); err != nil {
            return errors.Annotatef(err, "An error occurred while restore ReturningClause.Fields[%d]", i)
        }
    }
    return nil
}

func (n *ReturningClause) Accept(v Visitor) (Node, bool) {
    newNode, skipChildren := v.Enter(n)
    if skipChildren {
        return v.Leave(newNode)
    }
    n = newNode.(*ReturningClause)

    if n.Fields != nil {
        node, ok := n.Fields.Accept(v)
        if !ok {
            return n, false
        }
        n.Fields = node.(*FieldList)
    }

    return v.Leave(n)
}

Sorry, the space is limited. I can't post all the code. What is worth mentioning here is the Accept() method. In the ast package, almost all stmt structures implement ast Node interface, the Accept() method in this interface, is mainly used to process ast, traverse all nodes through the Visitor mode, and make a conversion to the ast structure. In order to normally convert the RETURNING keyword into DeleteStmt, we also need to register the RETURNING keyword as token in the parser.

In parser The definitions area in Y defines the token of the return related sentence pattern, such as the return keyword, as well as the ReturningClause and ReturningOption sentence patterns.

For some basic knowledge about parser, please refer to the article:

Simple interpretation and transformation method of TiDB Parser module - Zhihu (zhihu.com)

TiDB source code reading series (V) implementation of TiDB SQL Parser | PingCAP

After doing this, we will be able to In the rule part of Y, find the DELETE sentence pattern, add the returning sentence pattern, that is, returning option, and then write simple logic in it.

/*******************************************************************
 *
 *  Delete Statement
 *
 *******************************************************************/
DeleteWithoutUsingStmt:
    "DELETE" TableOptimizerHintsOpt PriorityOpt QuickOptional IgnoreOptional "FROM" TableName PartitionNameListOpt TableAsNameOpt IndexHintListOpt WhereClauseOptional OrderByOptional LimitClause ReturningOptional
    {
        ... Omitted here ...
        if $14 != nil {
            x.Returning = $14.(*ast.ReturningClause)
        }

        $$ = x
    }
|    "DELETE" TableOptimizerHintsOpt PriorityOpt QuickOptional IgnoreOptional TableAliasRefList "FROM" TableRefs WhereClauseOptional ReturningOptional
    {
        ... Omitted here ...
        if $10 != nil {
            x.Returning = $10.(*ast.ReturningClause)
        }
        $$ = x
    }

DeleteWithUsingStmt:
    "DELETE" TableOptimizerHintsOpt PriorityOpt QuickOptional IgnoreOptional "FROM" TableAliasRefList "USING" TableRefs WhereClauseOptional ReturningOptional
    {
        ... Omitted here ...
        if $11 != nil {
            x.Returning = $11.(*ast.ReturningClause)
        }
        $$ = x
    }

ReturningClause:
    "RETURNING" SelectStmtFieldList
    {
        $$ = &ast.ReturningClause{Fields: $2.(*ast.FieldList)}
    }

ReturningOptional:
    {
        $$ = nil
    }
|    ReturningClause
    {
        $$ = $1
    }

Then you can use the parser/bin/goyacc tool according to the latest Paser Y generates the final parser Go, enter the parser package and run make all.

It should be noted that for keywords, the latest parser is generated After go, we also need to create a file in parser / misc Go, which is because lexer uses dictionary tree technology for token recognition, and its implementation code is in it. Otherwise, lexer will not recognize this so-called keyword.

The verification after modification is actually very simple. Find the parser in the parser package_ test. Go test file, write a sentence pattern of delete returning, run the test again, and after that, it is OK.

You can also start the tidb source code, connect it with the mysql client, execute a delete returning sentence, and return successfully. Then it shows that this keyword is also compatible and successful.

Simple summary

At this stage, the preliminary keyword compatibility has been realized. Note that it is only preliminary compatibility now. To make it effective, you need to enter the next Plan formulation and actuator execution. This part is in tidb V5 The transformation of 0 is still in the process of research. After all, compared with tidb v4 There are some changes in the Plan formulation and optimization of 0.14, which have not been studied in time. I will elaborate in subsequent articles. Finally, let's take a look at the return compatibility results of TiDB for PG.

Keywords: tidb

Added by B-truE on Sat, 29 Jan 2022 14:29:06 +0200