What are the possible ways to refactor "arrow" code?

The article comes from Chen Hao

So back to business,

The so-called arrow code is basically the kind of code shown in the following picture. Countless if's look big.

So, what's wrong with this "arrow" code? It also looks good, symmetrical and beautiful. But

There are several questions about arrow Codes:

1) If my monitor is not wide enough and the arrow code is indented too hard, I need to pull the horizontal scroll bar back and forth, which makes me quite uncomfortable when reading the code.

2) In addition to the width, there is also the length. There are too many if else codes in some codes. When you read the middle, you don't know what kind of layer by layer inspection the middle code came here.

In short, if the "arrow code" is nested too much and the code is too long, it will be quite easy for the code maintainers (including themselves) to get lost in the code, because when they see the innermost code, the code readers may not know what the condition judgment of the previous layer is and how the code runs here, so, Arrow code is very difficult to maintain and Debug.

catalogue

l case and Guard Clauses

l extract into function

l code outside nested if

l state check nesting

l extended thinking

n check error

n check status

l summary

Case and Guard Clauses

Let's take a look at an example. If the amount of code is a little larger and nested a little more, you can easily get lost in the condition (the following example may be just a small arrow under the "big arrow")

FOREACH(Ptr<WfExpression>, argument, node->arguments) {
    int index = manager->expressionResolvings.Keys().IndexOf(argument.Obj());
    if (index != -1) {
        auto type = manager->expressionResolvings.Values()[index].type;
        if (! types.Contains(type.Obj())) {
            types.Add(type.Obj());
            if (auto group = type->GetTypeDescriptor()->GetMethodGroupByName(L"CastResult", true)) {
                int count = group->GetMethodCount();
                for (int i = 0; i < count; i++) { auto method = group->GetMethod(i);
                    if (method->IsStatic()) {
                        if (method->GetParameterCount() == 1 &&
                            method->GetParameter(0)->GetType()->GetTypeDescriptor() == description::GetTypeDescriptor<DescriptableObject>() &&
                            method->GetReturn()->GetTypeDescriptor() != description::GetTypeDescriptor<void>() ) {
                            symbol->typeInfo = CopyTypeInfo(method->GetReturn());
                            break;
                        }
                    }
                }
            }
        }
    }
}

In the above code, you can try to write the conditions in reverse, and then you can solve the arrow code. The reconstructed code is as follows:

FOREACH(Ptr<WfExpression>, argument, node->arguments) {
    int index = manager->expressionResolvings.Keys().IndexOf(argument.Obj());
    if (index == -1)  continue;
    
    auto type = manager->expressionResolvings.Values()[index].type;
    if ( types.Contains(type.Obj()))  continue;
    
    types.Add(type.Obj());

    auto group = type->GetTypeDescriptor()->GetMethodGroupByName(L"CastResult", true);
    if  ( ! group ) continue;
 
    int count = group->GetMethodCount();
    for (int i = 0; i < count; i++) { auto method = group->GetMethod(i);
        if (! method->IsStatic()) continue;
       
        if ( method->GetParameterCount() == 1 &&
               method->GetParameter(0)->GetType()->GetTypeDescriptor() == description::GetTypeDescriptor<DescriptableObject>() &&
               method->GetReturn()->GetTypeDescriptor() != description::GetTypeDescriptor<void>() ) {
            symbol->typeInfo = CopyTypeInfo(method->GetReturn());
            break;
        }
    }
}

This code reconstruction method is called Guard Clauses. The idea here is to let the wrong code return first, judge all the wrong judgments, and then the rest is the normal code.

Extract into function

Some people say that the continue statement destroys the smoothness of reading the code. In fact, we can see that all if statements are judging whether there is an error. Therefore, when maintaining the code, you can completely ignore these if statements, because they are error handling, and the rest of the code is normal function code, which is easier to read.

For example, for dealing with this situation in the above code, can we refactor without continue?

Of course, it can be drawn into a function:

bool CopyMethodTypeInfo(auto &method, auto &group, auto &symbol) 
{
    if (! method->IsStatic()) {
        return true;
    }
    if ( method->GetParameterCount() == 1 &&
           method->GetParameter(0)->GetType()->GetTypeDescriptor() == description::GetTypeDescriptor<DescriptableObject>() &&
           method->GetReturn()->GetTypeDescriptor() != description::GetTypeDescriptor<void>() ) {
        symbol->typeInfo = CopyTypeInfo(method->GetReturn());
        return false;
    }
    return true;
}

void ExpressionResolvings(auto &manager, auto &argument, auto &symbol) 
{
    int index = manager->expressionResolvings.Keys().IndexOf(argument.Obj());
    if (index == -1) return;
    
    auto type = manager->expressionResolvings.Values()[index].type;
    if ( types.Contains(type.Obj())) return;

    types.Add(type.Obj());
    auto group = type->GetTypeDescriptor()->GetMethodGroupByName(L"CastResult", true);
    if  ( ! group ) return;

    int count = group->GetMethodCount();
    for (int i = 0; i < count; i++) { auto method = group->GetMethod(i);
        if ( ! CopyMethodTypeInfo(method, group, symbol) ) break;
    }
}

...
...
FOREACH(Ptr<WfExpression>, argument, node->arguments) {
    ExpressionResolvings(manager, arguments, symbol)
}
...
...

After being drawn into a function, do you think the code is easier to read and maintain than before.

Although some people may think: "if the code is not shared, don't extract it into a function!" However, functions are the encapsulation or abstraction of code and are not necessarily used for code sharing. Functions are used to shield details and allow other codes to be coupled to the interface rather than the implementation of details, which will make our code simpler. Simple things can be easy to read and maintain. That's what functions do.

Code outside nested if

If the original code has code to be executed after each if statement, how should it be reconstructed. For example, the following code.

//original edition
for(....) {
    do_before_cond1()
    if (cond1) {
        do_before_cond2();
        if (cond2) {
            do_before_cond3();
            if (cond3) {
                do_something();
            }
            do_after_cond3();
        }
        do_after_cond2();
    }
    do_after_cond1();
}

Those in the above code

do_after_condX() is executed whether the condition is successful or not. Therefore, our flattened code is as follows:

//Refactoring First Edition
for(....) {
    do_before_cond1();
    if ( !cond1 ) {
        do_after_cond1();
        continue
    } 
    do_after_cond1();

    do_before_cond2();
    if ( !cond2 ) { 
        do_after_cond2();
        continue;
    }
    do_after_cond2();

    do_before_cond3();
    if ( !cond3 ) {
        do_after_cond3();
        continue;
    }
    do_after_cond3();

    do_something();  
}

You will find that the do above_ after_ There are two copies of condx. If the code in the if statement block changes some do_after_condX depends on the state, then this is the final version.

However, if they have no dependencies before, we can only keep one copy according to the DRY principle, so it's good to fall directly in front of the if condition, as shown below:

//Refactoring Second Edition
for(....) {
    do_before_cond1();
    do_after_cond1();
    if ( !cond1 ) continue;
 
    do_before_cond2();
    do_after_cond2();
    if ( !cond2 ) continue;

    do_before_cond3();
    do_after_cond3();
    if ( !cond3 ) continue;

    do_something();  
}

At this point, we change the order of execution and put the condition into do_ after_ After condx(). Is there a problem?

In fact, if you analyze the previous code, you will find that, originally, cond1 is to judge do_ before_ Whether cond1() is wrong. If it is successful, it will be executed next. And do_after_cond1() is executed anyway. Logically, do_after_cond1() is actually the same as do_before_cond1() has nothing to do with the execution result, but cond1 has nothing to do with whether to execute do or not_ before_ Cond2 () is related. If I change the line break into the following, the code logic will be clearer.

//Refactoring Third Edition
for(....) {
    do_before_cond1();
    do_after_cond1();

    if ( !cond1 ) continue;  // < -- cond1 becomes the condition for whether to make the second statement block
    do_before_cond2();
    do_after_cond2();

    if ( !cond2 ) continue; // < -- cond2 becomes the condition for whether to make the third statement block
    do_before_cond3();
    do_after_cond3();

    if ( !cond3 ) continue; //< -- cond3 becomes the condition for whether to make the fourth statement block
    do_something(); 
 
}

Therefore, when maintaining the code in the future, the maintainer can see at a glance when and where the code will be executed. At this time, you will find that if you extract these statement blocks into functions, the code will be cleaner, and then reconstruct the version:

//Refactoring Fourth Edition
bool do_func3() {
   do_before_cond2();
   do_after_cond2();
   return cond3;
}

bool do_func2() {
   do_before_cond2();
   do_after_cond2();
   return cond2;
}

bool do_func1() {
   do_before_cond1();
   do_after_cond1();
   return cond1;
}

// For loop you can reconstruct this
for (...) {
    bool cond = do_func1();
    if (cond) cond = do_func2();
    if (cond) cond = do_func3();
    if (cond) do_something();
}

// For loop can also be reconstituted like this
for (...) {
    if ( ! do_func1() ) continue;
    if ( ! do_func2() ) continue;
    if ( ! do_func3() ) continue;
    do_something();
}

Above, I have given two versions of for loop. It's up to me which implementation I like. At this time, because the code in for loop is very simple, even if you don't like continue, the cost of reading such code is very low.

State check nesting

Next, let's look at another example. The following code forges a scene - pulling two people into a one-to-one chat room. Because we want to check the status of both sides, the code may be written as an "arrow".

int ConnectPeer2Peer(Conn *pA, Conn* pB, Manager *manager)
{
    if ( pA->isConnected() ) {
        manager->Prepare(pA);
        if ( pB->isConnected() ) {
            manager->Prepare(pB);
            if ( manager->ConnectTogther(pA, pB) ) {
                pA->Write("connected");
                pB->Write("connected");
                return S_OK;
            }else{
                return S_ERROR;
            }

        }else {
            pA->Write("Peer is not Ready, waiting...");
            return S_RETRY;
        }
    }else{
        if ( pB->isConnected() ) {
            manager->Prepare();
            pB->Write("Peer is not Ready, waiting...");
            return S_RETRY;
        }else{
            pA->Close();
            pB->Close();
            return S_ERROR;
        }
    }
    //Shouldn't be here!
    return S_ERROR;
}

Refactoring the above code, we can first analyze the above code and explain that the above code is a combined "state" of "connected" and "not connected" of PeerA and PeerB (Note: the actual state should be more complex than this, and there may be "disconnected", "error"... And so on). Therefore, We can write the code as follows, merge the nesting conditions above, and judge each combination. In this way, the logic will be very clean and clear.

int ConnectPeer2Peer(Conn *pA, Conn* pB, Manager *manager)
{
    if ( pA->isConnected() ) {
        manager->Prepare(pA);
    }

    if ( pB->isConnected() ) {
        manager->Prepare(pB);
    }

    // pA = YES && pB = NO
    if (pA->isConnected() && ! pB->isConnected()  ) {
        pA->Write("Peer is not Ready, waiting");
        return S_RETRY;
    // pA = NO && pB = YES
    }else if ( !pA->isConnected() && pB->isConnected() ) {
        pB->Write("Peer is not Ready, waiting");
        return S_RETRY;
    // pA = YES && pB = YES
    }else if (pA->isConnected() && pB->isConnected()  ) {
        if ( ! manager->ConnectTogther(pA, pB) ) {
            return S_ERROR;
        }
        pA->Write("connected");
        pB->Write("connected");
        return S_OK;
    }

    // pA = NO, pB = NO
    pA->Close();
    pB->Close();
    return S_ERROR;
}

Extended thinking

For if else statements, generally speaking, it is to check two things: errors and status.

Check for errors

For checking errors, using Guard Clauses will be a standard solution, but we also need to pay attention to the following things:

1) Of course, when there is an error, there will be a need to release resources. You can use goto fail; This way, but the most elegant way should be C + + object-oriented RAII.

2) Returning by error code is a relatively simple method. This method has some problems. For example, if there are too many error codes, it will be very complex to judge the wrong code. In addition, normal code and wrong code will be mixed together, affecting readability. Therefore, in higher group languages, the use of try catch exception capture will make the code easier to read.

Check status

For the inspection status, there must be more complicated situations in practice, such as the following situations:

1) Like the state change of both ends in TCP protocol.

2) Various combinations of command options like shell commands.

3) Like a state change in a game (a very complex state tree).

4) State changes like parsing.

For these complex state changes, basically, you need to define a state machine, or a query table of combined states of sub States, or a state query analysis tree.

When writing code, the control state or business state in the operation of the code is an important reason for the confusion of your code process. A very important work of reconstructing "arrow" code is to re sort out and describe the transition relationship of these states.

summary

Well, let's summarize the following methods to refactor the "arrow" Code:

1) Use Guard Clauses. Try to let the error return first, so that you will get clean code later.

2) Extract the statement block in the condition into a function. Function is used to shield details and make other codes coupled to the interface rather than the implementation of details, which will make our code simpler. Simple things can be easy to read and maintain. The original intention of refactoring code is to write code that is easy to read and maintain!

3) For error handling, try catch exception handling and RAII mechanism are used. There are many problems in the error handling of return codes, such as: A) the return code can be ignored, B) the error handling code is mixed with the normally handled code, and C) the function interface is polluted, such as the bad function shared by error codes and return values such as atoi().

4) For the judgment and combination of multiple states, if it is complex, you can use the "combined state table" or the design mode of state machine plus Observer state subscription. Such code is decoupled, clean and simple, and also has strong scalability.

5) Refactoring "arrow" code is actually helping you re sort out all the code and logic. This process is very worth paying for. The process of rethinking and doing everything possible to simplify the code itself can make people grow.

I can only say that I still need to continue to study hard, come on!

The article comes from CoolShell

Keywords: C++ Back-end

Added by Sykoi on Sat, 26 Feb 2022 07:39:26 +0200