Nebula Graph source code interpretation series Vol.04 implementation of Optimizer based on RBO

In the previous article, we described how an execution plan is generated. This time, let's see that the generated execution plan is optimized by Optimizer.

summary

Optimizer, as its name suggests, is a component used to optimize execution plans. Database optimizers are usually divided into two types: rule-based optimizer (RBO) and Cost-based optimizer (CBO). The former optimizes completely based on preset optimization rules, and the matching conditions and optimization results are relatively fixed; The latter will calculate the execution costs of different execution plans according to the collected data statistics, and try to select the execution plan with the lowest cost.

At present, the main implementation of Nebula Graph is RBO, so this paper also focuses on the RBO implementation in Nebula Graph.

Source location

The source code implementation of the optimizer is under the src/optimizer directory, and the file structure is as follows:

.
├── CMakeLists.txt
├── OptContext.cpp
├── OptContext.h
├── OptGroup.cpp
├── OptGroup.h
├── Optimizer.cpp
├── Optimizer.h
├── OptimizerUtils.cpp
├── OptimizerUtils.h
├── OptRule.cpp
├── OptRule.h
├── rule
│   ├── CombineFilterRule.cpp
│   ├── CombineFilterRule.h
│   ├── EdgeIndexFullScanRule.cpp
│   ├── EdgeIndexFullScanRule.h
|    ....
|
└── test
    ├── CMakeLists.txt
    ├── IndexBoundValueTest.cpp
    └── IndexScanRuleTest.cpp

The test directory is the test directory, the rule directory is the preset rule set, and other source files are the specific implementation of the optimizer.

The entry of the optimizer optimized query is in the src/service/QueryInstance.cpp file, as shown below:

Status QueryInstance::validateAndOptimize() {
    auto *rctx = qctx()->rctx();
    VLOG(1) << "Parsing query: " << rctx->query();
    auto result = GQLParser(qctx()).parse(rctx->query());
    NG_RETURN_IF_ERROR(result);
    sentence_ = std::move(result).value();

    NG_RETURN_IF_ERROR(Validator::validate(sentence_.get(), qctx()));
    NG_RETURN_IF_ERROR(findBestPlan());

    return Status::OK();
}

The findBestPlan function calls the optimizer to optimize and return a new optimized execution plan.

Brief description of optimization process

From the perspective of topology, the execution plan currently designed by Nebula Graph is a directed acyclic graph, which is organized by each node pointing to its dependent nodes. Theoretically, each node can specify the results of any node as input, but it is meaningless to use the results of a node that has not been executed, Therefore, when generating an execution plan, it is restricted to use only the nodes that have been executed as input. At the same time, the execution plan also executes special nodes such as loops and conditional branches. As shown in Figure 1, the execution plan is generated by the query statement GO 2 STEPS FROM 'Tim Duncan' OVER like.

Figure 1

At present, the main function of the optimizer is to match in the execution plan according to the preset mode. If the matching is successful, call the corresponding conversion function to convert the matched part of the execution plan according to the preset rules. For example, the execution plan in the form of GetNeighbor → limit is converted into the GetNeighbor operator pushed down by limit to realize the operator push down optimization.

Concrete implementation

First, the optimizer does not directly operate on the execution plan, but first converts the execution plan into OptGroup and OptGroupNode. OptGroup represents a single optimization group (usually one or more equivalent operators). OptGroupNode represents an independent operator and pointers to dependencies and branches. That is, OptGroupNode retains the topology of the execution plan. The reason for such a structure transformation is to abstract the topology of the execution plan, shield some unnecessary execution details (such as loops and conditional branches), and easily save some rule matching contexts in the new structure.

The conversion process is basically a simple preorder traversal, and in the traversal process, the operator is converted into the corresponding OptGroup and OptGroupNode. For ease of description, the structure composed of OptGroup and OptGroupNode is called optimization plan, which is distinguished from execution plan.

After the conversion, it will start to match the rules and make the corresponding optimization plan conversion. All predefined rules will be traversed here, and each rule will do a bottom up traversal matching on the optimization plan. Specifically, start from the most leaf layer OptGroup to the root node OptGroup, and do a top down traversal on the OptGroup node in each OptGroup node to match the rule pattern.

As shown in Figure 2, the pattern to be matched here is Limit - > project → GetNeighbors. In the order of bottom up, first match in the order of top down at the start node. Start does not equal the failure of Limit matching. Then start from GetNeighbors, the same top down matching fails until the start of Limit. After the matching is successful, the matching part of the optimization plan will be converted according to the transform function defined by the rule. For example, figure 2 will merge Limit and GetNeighbors.

Figure 2

Finally, the optimizer will convert the optimized plan that has been optimized into an execution plan again. This is the opposite of the first step, but it is also a recursive process of traversing the transformation.

How to add a new rule

In the previous article, we learned about the implementation of the entire optimizer component, but we don't need to know much about the implementation details of the optimizer for adding optimization rules, just how to define new rules. Here, we take Limit push down as an example to explain the implementation of a typical optimization rule. See src/optimizer/rule/LimitPushDownRule.cpp for the source code of the Limit push down rule:

std::unique_ptr<OptRule> LimitPushDownRule::kInstance =
    std::unique_ptr<LimitPushDownRule>(new LimitPushDownRule());

LimitPushDownRule::LimitPushDownRule() {
    RuleSet::QueryRules().addRule(this);
}

const Pattern &LimitPushDownRule::pattern() const {
    static Pattern pattern =
        Pattern::create(graph::PlanNode::Kind::kLimit,
                        {Pattern::create(graph::PlanNode::Kind::kProject,
                                         {Pattern::create(graph::PlanNode::Kind::kGetNeighbors)})});
    return pattern;
}
StatusOr<OptRule::TransformResult> LimitPushDownRule::transform(
    OptContext *octx,
    const MatchedResult &matched) const {
    auto limitGroupNode = matched.node;
    auto projGroupNode = matched.dependencies.front().node;
    auto gnGroupNode = matched.dependencies.front().dependencies.front().node;

    const auto limit = static_cast<const Limit *>(limitGroupNode->node());
    const auto proj = static_cast<const Project *>(projGroupNode->node());
    const auto gn = static_cast<const GetNeighbors *>(gnGroupNode->node());

    int64_t limitRows = limit->offset() + limit->count();
    if (gn->limit() >= 0 && limitRows >= gn->limit()) {
        return TransformResult::noTransform();
    }

    auto newLimit = static_cast<Limit *>(limit->clone());
    auto newLimitGroupNode = OptGroupNode::create(octx, newLimit, limitGroupNode->group());

    auto newProj = static_cast<Project *>(proj->clone());
    auto newProjGroup = OptGroup::create(octx);
    auto newProjGroupNode = newProjGroup->makeGroupNode(newProj);

    auto newGn = static_cast<GetNeighbors *>(gn->clone());
    newGn->setLimit(limitRows);
    auto newGnGroup = OptGroup::create(octx);
    auto newGnGroupNode = newGnGroup->makeGroupNode(newGn);

    newLimitGroupNode->dependsOn(newProjGroup);
    newProjGroupNode->dependsOn(newGnGroup);
    for (auto dep : gnGroupNode->dependencies()) {
        newGnGroupNode->dependsOn(dep);
    }

    TransformResult result;
    result.eraseAll = true;
    result.newGroupNodes.emplace_back(newLimitGroupNode);
    return result;
}

std::string LimitPushDownRule::toString() const {
    return "LimitPushDownRule";
}

To define a rule, first inherit the OptRule class. Then, implement the pattern interface, where the pattern to be matched is required to be returned. The pattern is composed of operators and operator dependencies, such as Limit - > Project - > GetNeighbors. Then we need to implement the transform interface. The transform interface will pass in a matching optimization plan. We will analyze the matching optimization plan according to the predefined pattern, and make corresponding optimization transformation for the optimization plan, such as merging the Limit operator into the GetNeighbors operator, and then returning the optimized optimization plan.

As long as the two interfaces are implemented correctly, our new optimization rules can work normally.

The above is an introduction to Nebula Graph Optimizer.

AC diagram database technology? Please join the Nebula communication group first Fill out your Nebula business card , Nebula's little assistant will pull you into the group~~

[activity] Nebula Hackathon 2021 is in progress. Let's explore the unknown and get a ¥ 150000 bonus → https://nebula-graph.com.cn/hackathon/

Keywords: C++

Added by DJ Judas on Tue, 16 Nov 2021 10:17:32 +0200