Consistency challenge: data consistency solution under microservice architecture

CAP principle and BASE theorem

CAP principle

CAP is the acronym combination of Consistency, Availability and Partition tolerance. It is a priority for designers of all distributed systems before design.

CAP principle
  • Consistency C means that after the update operation is successful, the data of all nodes are completely consistent at the same time.

  • Availability A represents whether the system can return the expected results in normal response time when users access data.

  • Partition fault tolerance P means that the distributed system can still provide services that meet consistency or availability when it encounters a node or network partition failure.

It is illustrated by an actual case. Take order system and inventory system as an example

Order and inventory system

In the actual business, the creation of orders will be accompanied by the reduction of inventory. In the original single architecture, these are two modules in one application. However, with the development of business, the pressure on the inventory module increases, the architect splits it into a new inventory system, and the inventory data is also stored in a separate database. The two systems communicate through the network, Naturally, the architecture pattern has also changed to distributed. But at the same time, the system failure probability is also increasing after adding network communication. Architects must consider the fault tolerance after applying "partition", which is the meaning of P partition fault tolerance in CAP. Partition fault tolerance is a factor that must be included in any distributed system.

So how to deal with it? There are two options:

  • First, give up C consistency to ensure A availability, referred to as AP. Specifically, after A communication failure occurs, the application will complete the operation of adding an order and abandoning the reduction of inventory. The application will immediately return A response and explain the specific processing details in the response. This scheme will give users A better experience, but the data level is incomplete and needs subsequent compensation.

  • The second is to give up A availability to ensure consistency C, referred to as CP. Specifically, after A communication failure occurs, the application will enter A blocking state and try to resume communication with the inventory system until all data processing is completed. This scheme gives priority to ensuring data integrity, but the user experience of this scheme is very poor, because the user will be waiting until all operations are completed.

CAP itself is mutually exclusive, and only two of them can be selected. Ca, AP and CP have their own application scenarios, which should be selected in combination with the actual situation. For example, because CA does not consider partition tolerance, So all its operations need to be completed in the same process (that is, what we often call single application); because AP gives up data consistency, it is suitable for projects that do not require high data but emphasize user experience, such as blogs, news information, etc.; on the contrary, CP gives up availability and is suitable for trading systems with high data requirements, such as bank transactions and e-commerce order transactions. Even if users wait for a long time, they should ensure the integrity and reliability of data .

The above is the application of CAP principle in practical projects. For Internet applications, it is not advisable to completely give up data consistency for user experience. After all, data is fundamental. How to solve it? This leads to a new theory: BASE theorem.

BASE theorem

The BASE theorem is basically available, Soft State and Eventually Consistent (final consistency) is the abbreviation of three phrases. BASE is the choice after weighing the consistency and availability in CAP. It comes from the conclusion of distributed practice of large-scale Internet system. It is gradually evolved based on CAP theorem. Its core idea is that even if consistency cannot be achieved, each application can adopt appropriate methods according to its own business characteristics The system achieves final consistency.

In the case just now, when the communication between the order and the inventory system fails, the architect selects the AP scheme to ensure the user's response speed. At this time, the order is created but the inventory is not reduced, which is inconsistent. In order to solve this problem, many projects will write a task thread to regularly check the communication status. When the communication is restored, the task will notify the inventory system to complete the inventory reduction process to keep the data consistent.

BASE allows the existence of soft state. The so-called soft state is the intermediate state when the order increases but the inventory does not decrease in the above case. In the subsequent remedial measures, the task thread compensates the inventory. At this time, the data is finally consistent, and the state at this time is the final consistency state. BASE theorem only requires to ensure that the data is ultimately consistent, so there is an intermediate soft state.

There are many measures to ensure the final consistency, such as TCC, task timing check, MQ message queue buffer, ETL day-end proofreading and even manual supplementary recording. Among these strategies, TCC is a consistency implementation scheme commonly used by architecture teachers. Let's first understand what TCC is.

TCC consistency scheme

TCC is the acronym of three letters Try, Confirm and Cancel. TCC does not refer to a certain technology, but a data consistency scheme. It divides the distributed processing process into two stages:

  • Try is the first stage for trying and locking resources;

  • If the resource is locked successfully, Confirm submission is performed in the second stage to complete the data operation;

  • If resource locking fails, Cancel in the second stage and roll back the data;

Assuming that Zhang San purchased 10 bottles of coke with a total price of 30 yuan, he needs to add an order with an amount of 30 yuan to the order form and reduce 10 bottles of coke in the inventory form. In the case of TCC scheme, two additional fields of advance amount and status need to be added in the order table, and the frozen inventory field needs to be added in the inventory table. Because the Try stage is used to lock resources, the 30 yuan of the new order is in the pre increase amount instead of directly updating the order amount field. At this time, the order status is "initial". Similarly, the inventory table should freeze 10 bottles of coke instead of directly updating 100 to 90.

Try stage

When all resources of the two systems are locked, enter the second stage to execute the confirm operation. Confirm is used to submit data. The process of submitting data is very simple. Move the pre increased amount in the order table to the order amount, the order status changes to completed, and the commodity inventory decreases accordingly.

Two stage Confirm operation

If an exception occurs to the locked resources in the Try phase, such as "insufficient inventory", enter the second phase and Cancel to Cancel the locked resources. The pre added amount in the order table is reset to 0, the order status changes to Cancel, and the frozen inventory in the inventory table is also reset to 0.

Phase II Cancel operation

The above is the implementation process of TCC in the actual project. You can find that TCC works on the data source and ensures the data consistency by adding additional locked resource fields on the control table. What are the precautions in the TCC scheme?

First, we should complete most of the business logic in the Try phase and do as many things as possible in the Try phase. At the beginning of TCC design, it is considered that Confirm or Cancel must be successful. Therefore, do not include any business code or remote communication in phase II, and release frozen resources only through the simplest code. Like this case, Confirm or Cancel only executes the update statement on the table to release the frozen resources. The success rate of this operation is 99.9999%. You can think that the second stage is reliable.

Secondly, if an error occurs during the execution of Confirm or Cancel, the specific TCC framework will continue to retry the execution operation to try to ensure successful execution. In this process, the update statement may be executed many times, so pay attention to the idempotency of the code.

TIPS: idempotency means that the results of one operation and multiple operations are consistent, such as the following example.

#SQL without idempotency, because num will increase by + 10 every time it is executed in the retry process

update tab set num = num + 10 where id = 1; 

#For idempotent SQL, no matter how many times you retry, the update result num is 1760

update tab set num = 1760 where id = 1;

Finally, under the minimum probability, if Confim or Cancel fails after multiple retries, the final data inconsistency will occur, which requires developing additional data integrity verification programs to remedy or manually supplement.

Speaking of this, you must have an intuitive understanding of the TCC scheme, but the TCC scheme is a theoretical design after all, which needs the support of the manufacturer's corresponding framework. Famous TCC frameworks in the open source field of Java include ByteTCC, Hmily, TCC transaction and Seata. Yes, the distributed transaction middleware Seata we learned earlier also supports the TCC mode. Let's introduce the TCC mode of Seata.

TCC mode of Seata

Execution process of Seata TCC

For the TCC mode of Seata, it is highly similar to the AT mode. It can be understood that TCC is the "manual version" of AT mode. All operations during commit and rollback should be handled by writing code, rather than automatically executing reverse SQL to complete commit and rollback like AT.

Seata TCC mode execution process

The execution steps and pseudo code are combined to explain the execution of Seata TCC mode.

  • The first step is to execute mallservice Sale method, which is used to implement complete business logic and define the boundary of global transactions. Note to add the @ GlobalTransactional annotation on the sale method. After it is written, the TC will be automatically notified to start the global transaction before entering the sale method.

@Service
public class MallService {

    @Resource
    private OrderAction orderAction;

    @Resource
    private StorageAction storageAction;

    @GlobalTransactional
    public void sale(){ //Methods of selling goods in mall business
       orderAction.prepare("Zhang San",30);
       storageAction.prepare("cola",10);
    }
}
  • In the second step, the first sentence of the sale method calls the prepare method of OrderAction. Action is a special class in Seata TCC mode. OrderAction on the TM side is just an interface that defines the methods corresponding to Try, Commit and Cancel in TCC. The prepare method defined in this example corresponds to the Try phase of TCC and is used to lock resources. We need to add the @ TwoPhaseBusinessAction annotation outside the prepare method to declare which method will Commit or roll back after successful or failed execution of the prepare method. This annotation has three parameters:

  1. Name represents the registered name of the branch transaction;

  2. commitMethod represents the method to be executed when the phase II TC sends a submission message;

  3. The rollback method represents the method to be executed when the rollback message is sent by the phase 2 TC.

public interface OrderAction {
    @TwoPhaseBusinessAction(name="TccOrderAction",commitMethod = "commit" , rollbackMethod = "rollback")
    //prepare corresponds to TCC phase I and is used to lock resources.
    public boolean prepare(BusinessActionContext actionContext,
                           @BusinessActionContextParameter(paramName = "customer") String customer,
                           @BusinessActionContextParameter(paramName = "amount") float amount
    );
    //Submission method definition
    public boolean commit(BusinessActionContext actionContext);

    //Rollback method definition
    public boolean rollback(BusinessActionContext actionContext);
}

The specific implementation class of OrderAction needs to be developed in the RM side order service. The following is the pseudo code of the OrderActionImpl implementation class.

@Service("orderActionImpl")
public class OrderActionImpl implements OrderAction{

    @Override
    @Transactional(propagation = Propagation.REQUIRES_NEW) //Local transaction control
    public boolean prepare(BusinessActionContext actionContext, String customer, float amount) {
        Order order = new Order();
        order.setCustomer("Zhang San");//customer
        order.setAmount(0f); //Order amount
        order.setFrozenAmount(amount); //prepare is used to lock resources, so the amount should be written to the "advance amount" field.
        order.setStatus(0); //0 represents the initial state
        orderRepository.save(order);//Create a new Order record on the RM side
        return true; //true indicates successful execution
    }

    @Override
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public boolean commit(BusinessActionContext actionContext) {
        //Idempotency check
        String orderCode = (String)actionContext.getActionContext("orderCode");
        Order condition = new Order();
        condition.setOrderCode(orderCode);
        Example<Order> sample = Example.of(condition);
        Order order = orderRepository.findOne(sample).get();
        if(order.getStatus() > 0){ //If the order status is not 0, it means that this order has been processed. Skip it directly
            return true; 
        }
        //Set the amount, release the locked amount, and change the order status to completed
        order.setAmount(order.getFrozenAmount());
        order.setFrozenAmount(0f);
        order.setStatus(1);
        orderRepository.save(order);
        return true;
    }

    @Override
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public boolean rollback(BusinessActionContext actionContext) {
        //Idempotency check
        String orderCode = (String)actionContext.getActionContext("orderCode");
        Order condition = new Order();
        condition.setOrderCode(orderCode);
        Example<Order> sample = Example.of(condition);
        Order order = orderRepository.findOne(sample).get();
        if(order.getStatus() > 0){//If the order status is not 0, it means that this order has been processed. Skip it directly
            return true; 
        }
        //Execution failed, the order amount rollback is set to 0, and the order status is changed to cancel
        order.setAmount(0f);
        order.setFrozenAmount(0f);
        order.setStatus(2);
        orderRepository.save(order);
        return true;
    }
}

Because TM mall application and RM order service are called remotely. When the TM side calls the prepare method, it is actually executed in the order service. At this time, the order service will also notify the TC to start the branch transaction.

  • Step 3: the TM side sale method executes to the second sentence and calls the prepare method of StorageAction. As before, the TM side holds the StorageAction interface, which is basically the same as the OrderAction interface. The definition is as follows:
public interface StorageAction {

    @TwoPhaseBusinessAction(name="TccStorageAction" ,commitMethod = "commit" , rollbackMethod = "rollback")
    public boolean prepare(BusinessActionContext context,
                           @BusinessActionContextParameter(paramName = "goodsCode") String goodsCode,
                           @BusinessActionContextParameter(paramName = "quantity") int quantity);

    public boolean commit(BusinessActionContext context);

    public boolean rollback(BusinessActionContext context);

}

The implementation class of the interface is in the RM side inventory service, named StorageActionImpl. The prepare method is used to lock the inventory, and commit and rollback are used to commit and rollback. The RM side StorageActionImpl pseudo code is as follows.

@Service("storageActionImpl")
@Transactional(propagation = Propagation.REQUIRES_NEW)
public class StorageActionImpl implements StorageAction {

    Logger logger = LoggerFactory.getLogger(this.getClass());

    @Resource
    private StorageRepository storageRepository;

    @Override
    public boolean prepare(BusinessActionContext context, String goodsCode, int quantity) {
        //Implement locked inventory logic
        //Check whether the inventory is sufficient, and throw an exception if it is not enough;
        //Update the "locked inventory" field, and the original commodity inventory remains unchanged.
    }

    @Override
    public boolean commit(BusinessActionContext context) {
        //Implement submission logic
        //Idempotency check;
        //Update commodity inventory = original inventory - locked inventory, and the locked inventory is reset to 0; 
    }

    @Override
    public boolean rollback(BusinessActionContext context) {
        //Implement rollback logic
        //Idempotency check;
        //Lock inventory reset to 0; 
    }
}
  • Step 4: the TM side sale method is executed. If the prepare methods of all actions are executed normally, sale will automatically send a "global transaction commit" message to TC.
  • Step 5: after receiving the global transaction commit message, TC sends it to each RM for branch transaction commit. In this process, orderactionimpl Commit method and storageactionimpl The commit method is automatically executed to complete the data submission.

Accordingly, if the sale method of TM fails to execute, the global transaction will be rolled back, and the rollback methods of OrderActionImpl and StorageActionImpl will be executed automatically to realize the rollback operation.

During the execution process, if the commit or rollback fails, Seata will try again and again to ensure the success of the operation as much as possible. Therefore, idempotency check should be done well.

How to choose between Seata AT and TCC

Since Seata already has AT mode, why introduce TCC to ensure transaction consistency? In the distributed transaction {Seata AT mode, have you ever considered that in complex enterprise applications, it is impossible to completely require the underlying database to use MySQL uniformly, or even ensure that all data sources support transactions.

Similar problems have been encountered before. After the user uploads the file, the system is abnormal and needs to be rolled back globally. When rolling back, the file needs to be deleted. At this time, reverse SQL based on Seata AT mode is powerless. TCC can solve such problems well, because all logic in the TCC process is controlled by programmers through code, which can well solve such non transactional data processing scenarios.

How should Seata AT and TCC choose? This depends on the specific business scenario. If transactional relational databases such as MySQL and Oracle are used AT the bottom of all services involved, and the business is direct operation on the database, the AT mode can achieve distributed data consistency as soon as possible. However, if the operation of non transactional resources is involved, the AT mode can do nothing, TCC must be used to implement the details of preparation, submission and rollback, but this undoubtedly puts forward higher requirements for the ability of programmers. Therefore, try to arrange these tasks to the core engineers with good technology in the team.

Keywords: Spring Boot Spring Cloud Microservices seata

Added by JMair on Mon, 20 Dec 2021 17:44:30 +0200