Dubbo Cluster Fault Tolerance: Cluster

1. Preface

Online services are rarely deployed on a single machine, because a single machine cannot meet the "three highs" expected of Internet architectures: high availability, high concurrency, and high performance. Once that machine goes down, high availability is out of the question. In addition, a single Dubbo machine handles at most 200 concurrent requests by default, which falls short of high concurrency and high performance. Therefore, as a distributed service framework, Dubbo supports cluster fault tolerance.

The entire cluster fault tolerance layer of Dubbo is implemented in the dubbo-cluster module, which contains many components, such as Cluster, ClusterInvoker, Directory, and LoadBalance. This article mainly analyzes Cluster and ClusterInvoker; the other components will be discussed in later articles.

2. Cluster

Cluster is Dubbo's cluster fault tolerance interface. Its definition is very simple:

public interface Cluster {
    String DEFAULT = FailoverCluster.NAME;

    // Aggregate a group of Invokers into a single Invoker
    <T> Invoker<T> join(Directory<T> directory) throws RpcException;

    // Get the cluster fault tolerance extension point implementation by name
    static Cluster getCluster(String name) {
        return getCluster(name, true);
    }

    static Cluster getCluster(String name, boolean wrap) {
        // Fall back to the default (failover) when no name is given
        if (StringUtils.isEmpty(name)) {
            name = Cluster.DEFAULT;
        }
        return ExtensionLoader.getExtensionLoader(Cluster.class).getExtension(name, wrap);
    }
}

As you can see, Cluster has only one job: to aggregate a group of Invokers into a single ClusterInvoker that provides cluster fault tolerance. Cluster itself does not perform any fault tolerance; it is only responsible for creating the ClusterInvoker that does. Cluster relies on the Directory interface, which provides the list of services that can be called, that is, a group of Invokers. The ClusterInvoker selects one Invoker from this group and initiates the call. If the call fails, it carries out follow-up handling according to the corresponding cluster fault tolerance strategy, such as retrying the service.
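To make this division of labor concrete, here is a minimal, self-contained sketch. It does not use Dubbo's real types: SimpleInvoker, SimpleDirectory and SimpleCluster are hypothetical stand-ins for Invoker, Directory and Cluster, and the "pick the first invoker" policy is only a placeholder for real load balancing and fault tolerance.

```java
import java.util.List;

// Hypothetical stand-ins for Dubbo's Invoker, Directory and Cluster; not the real API.
final class ClusterJoinSketch {

    interface SimpleInvoker {
        String invoke(String request);
    }

    // Supplies the invokers that can currently be called.
    interface SimpleDirectory {
        List<SimpleInvoker> list();
    }

    // Aggregates the directory's invokers into a single invoker.
    interface SimpleCluster {
        SimpleInvoker join(SimpleDirectory directory);
    }

    // The cluster only creates the aggregate invoker. That invoker asks the
    // directory for candidates on every call, because the list may change.
    static final SimpleCluster FIRST_AVAILABLE = directory -> request -> {
        List<SimpleInvoker> candidates = directory.list();
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        return candidates.get(0).invoke(request);
    };

    public static void main(String[] args) {
        SimpleDirectory directory = () -> List.<SimpleInvoker>of(
                request -> "provider-A handled " + request,
                request -> "provider-B handled " + request);
        SimpleInvoker clusterInvoker = FIRST_AVAILABLE.join(directory);
        System.out.println(clusterInvoker.invoke("sayHello"));
    }
}
```

The caller only ever sees one SimpleInvoker, which is the point of join(): the group of providers is hidden behind the same interface that a single provider exposes.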

As of version 2.7.8, Dubbo supports the following ten cluster fault tolerance strategies:

Strategy     Description
Failover     When a call fails, retry on other servers. This is the default strategy.
Failfast     Fail fast. When the request fails, return the error immediately without any retry. Suitable for non-idempotent interfaces.
Failsafe     Fail safe. When an exception occurs, ignore it. Suitable when you don't care about the call result.
Failback     After the request fails, record it in a failure queue and let a scheduled thread pool retry it.
Forking      Call several identical services at the same time and return as soon as any one of them returns.
Broadcast    Call all available services. If any node reports an error, the call reports an error.
Mock         Return a mock response when the call fails.
Available    No load balancing. Traverse the list of services, find the first available node, and call it directly.
Mergeable    Automatically merge the results returned by multiple nodes.
ZoneAware    Region aware. Give priority to calling services in the same region.
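The difference between a few of these strategies is easiest to see side by side. The following is a rough sketch rather than Dubbo code: the method names failfast, failsafe and failover are hypothetical, providers are modeled as plain Supplier objects, and a null return stands in for the empty result that a fail-safe call produces.

```java
import java.util.List;
import java.util.function.Supplier;

// Simplified models of three strategies; these are assumptions, not Dubbo's implementation.
final class StrategySketch {

    // Failfast: a single attempt; any exception propagates to the caller.
    static String failfast(List<Supplier<String>> providers) {
        return providers.get(0).get();
    }

    // Failsafe: a single attempt; the exception is swallowed.
    static String failsafe(List<Supplier<String>> providers) {
        try {
            return providers.get(0).get();
        } catch (RuntimeException e) {
            return null; // placeholder for an empty result
        }
    }

    // Failover: try the providers in turn until one succeeds.
    static String failover(List<Supplier<String>> providers) {
        RuntimeException last = null;
        for (Supplier<String> provider : providers) {
            try {
                return provider.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next provider
            }
        }
        throw last != null ? last : new IllegalStateException("no provider available");
    }

    public static void main(String[] args) {
        Supplier<String> broken = () -> { throw new IllegalStateException("connection refused"); };
        Supplier<String> healthy = () -> "ok";
        List<Supplier<String>> providers = List.of(broken, healthy);

        System.out.println(failover(providers)); // the second provider answers
        System.out.println(failsafe(providers)); // the failure is ignored
        try {
            failfast(providers);
        } catch (IllegalStateException e) {
            System.out.println("failfast threw: " + e.getMessage());
        }
    }
}
```

With the same broken first provider, only the failover variant reaches the healthy one, which is why it suits idempotent reads while failfast suits writes that must not be repeated.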

Among them, Failover is the default strategy, and its corresponding class is FailoverCluster. Let's take it as an example.

public class FailoverCluster extends AbstractCluster {

    public final static String NAME = "failover";

    // Create the ClusterInvoker that implements the failover strategy
    @Override
    public <T> AbstractClusterInvoker<T> doJoin(Directory<T> directory) throws RpcException {
        return new FailoverClusterInvoker<>(directory);
    }
}

The implementation of Cluster is very simple: it just creates a ClusterInvoker that provides cluster fault tolerance. All of the retry logic lives in FailoverClusterInvoker.

3. ClusterInvoker

The ClusterInvoker interface extends Invoker and adds cluster fault tolerance on top of it.

public interface ClusterInvoker<T> extends Invoker<T> {

    // Registry URL
    URL getRegistryUrl();

    // Get the service directory
    Directory<T> getDirectory();
}

ClusterInvoker also uses the decorator pattern. It cannot invoke remote services by itself; it wraps the basic Invokers and adds cluster fault tolerance on top of them. The basic Invokers are provided by the Directory. Taking RegistryDirectory as an example, it subscribes to the required services in the registry and then converts the provider URLs into a group of Invokers. ClusterInvoker then applies routing, load balancing, fault tolerance, and other operations to this group of Invokers.

ClusterInvoker also uses the template method pattern. The invoke() method of the base class AbstractClusterInvoker implements the skeleton of the algorithm: list the callable services through the Directory, initialize the LoadBalance, and then call doInvoke().
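As a rough illustration of the decorator idea, the sketch below wraps a list of basic invokers behind the same interface that a single invoker exposes. Every name in it (BasicInvoker, ClusterDecorator, the tag-based filter, the round-robin pick) is hypothetical and far simpler than Dubbo's real routing and load balancing.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical model of a cluster invoker as a decorator; not Dubbo's real classes.
final class DecoratorSketch {

    interface BasicInvoker {
        String tag();                  // hypothetical routing attribute
        String invoke(String request);
    }

    static BasicInvoker provider(String name, String tag) {
        return new BasicInvoker() {
            public String tag() { return tag; }
            public String invoke(String request) { return name + ":" + request; }
        };
    }

    // Looks like one invoker to the caller, but delegates to one of many.
    static final class ClusterDecorator implements BasicInvoker {
        private final Supplier<List<BasicInvoker>> directory;
        private final String wantedTag;
        private final AtomicInteger counter = new AtomicInteger();

        ClusterDecorator(Supplier<List<BasicInvoker>> directory, String wantedTag) {
            this.directory = directory;
            this.wantedTag = wantedTag;
        }

        public String tag() { return wantedTag; }

        public String invoke(String request) {
            // Routing: keep only the invokers that match the wanted tag.
            List<BasicInvoker> routed = directory.get().stream()
                    .filter(invoker -> invoker.tag().equals(wantedTag))
                    .collect(Collectors.toList());
            if (routed.isEmpty()) {
                throw new IllegalStateException("no provider for tag " + wantedTag);
            }
            // Load balancing: simple round-robin over the routed invokers.
            int index = Math.floorMod(counter.getAndIncrement(), routed.size());
            // The call itself is still made by the wrapped invoker.
            return routed.get(index).invoke(request);
        }
    }

    public static void main(String[] args) {
        List<BasicInvoker> all = List.of(
                provider("a", "gray"), provider("b", "prod"), provider("c", "prod"));
        BasicInvoker cluster = new ClusterDecorator(() -> all, "prod");
        System.out.println(cluster.invoke("ping"));
        System.out.println(cluster.invoke("ping"));
    }
}
```

Because ClusterDecorator implements the same interface as the invokers it wraps, the calling code does not need to know whether it is talking to one provider or to a cluster.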

public Result invoke(final Invocation invocation) throws RpcException {
    // Make sure the invoker has not been destroyed
    checkWhetherDestroyed();
    // Bind the attachments of the current RpcContext to the invocation
    Map<String, Object> contextAttachments = RpcContext.getContext().getObjectAttachments();
    if (contextAttachments != null && contextAttachments.size() != 0) {
        ((RpcInvocation) invocation).addObjectAttachments(contextAttachments);
    }
    // List the callable services through the Directory
    List<Invoker<T>> invokers = list(invocation);
    // Initialize load balancing
    LoadBalance loadbalance = initLoadBalance(invokers, invocation);
    RpcUtils.attachInvocationIdIfAsync(getUrl(), invocation);
    // Hand over to the subclass, which implements the fault tolerance strategy
    return doInvoke(invocation, invokers, loadbalance);
}

The specific fault tolerance strategy is implemented in the doInvoke() method of the subclass. We again take FailoverClusterInvoker as an example. The process is as follows:

  1. Get the number of retries configured for the method.
  2. Create a List to record the Invokers that have already been called, so that retries can avoid them.
  3. Create a Set to record the Providers that have already been called, for logging.
  4. Initiate the service call: select an Invoker through load balancing and call it. Return if the call succeeds, and retry if it fails.

public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    List<Invoker<T>> copyInvokers = invokers;
    // Ensure that at least one invoker is available
    checkInvokers(copyInvokers, invocation);
    String methodName = RpcUtils.getMethodName(invocation);
    // Total number of attempts = configured retries + the first call
    int len = getUrl().getMethodParameter(methodName, RETRIES_KEY, DEFAULT_RETRIES) + 1;
    if (len <= 0) {
        len = 1;
    }
    // Record the last exception
    RpcException le = null;
    // Record the Invokers that have been called, so that retries can avoid them
    List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyInvokers.size());
    // Record the addresses of the Providers that have been called, for logging
    Set<String> providers = new HashSet<String>(len);
    for (int i = 0; i < len; i++) {
        if (i > 0) {
            // Re-list before a retry, because the available invokers may have changed
            copyInvokers = list(invocation);
            checkInvokers(copyInvokers, invocation);
        }
        // Load balancing
        Invoker<T> invoker = select(loadbalance, invocation, copyInvokers, invoked);
        invoked.add(invoker);
        RpcContext.getContext().setInvokers((List) invoked);
        try {
            // Service call
            Result result = invoker.invoke(invocation);
            return result;
        } catch (RpcException e) {
            // Business exceptions are not retried
            if (e.isBiz()) {
                throw e;
            }
            le = e;
        } catch (Throwable e) {
            le = new RpcException(e.getMessage(), e);
        } finally {
            providers.add(invoker.getUrl().getAddress());
        }
    }
    // All attempts failed (the real source builds a detailed message here; abbreviated)
    throw new RpcException();
}

Note that the subclass does not call the LoadBalance directly. It calls the select() method of the parent class, because the parent class first handles concerns such as sticky connections before delegating to doSelect().

The other cluster fault tolerance strategies are not analyzed one by one here. Interested readers can study their source code.
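The template-method split and the retry loop described above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not Dubbo's implementation: Node, AbstractClusterSketch and FailoverSketchInvoker are hypothetical names, exceptions are plain RuntimeExceptions, and select() simply prefers a node that has not been tried yet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the template method plus failover retry; not Dubbo's real classes.
final class FailoverSketch {

    interface Node {
        String call(String request);
    }

    // Template method: invoke() fixes the skeleton, doInvoke() is the variable step.
    static abstract class AbstractClusterSketch {
        private final List<Node> nodes;

        AbstractClusterSketch(List<Node> nodes) {
            this.nodes = nodes;
        }

        final String invoke(String request) {
            List<Node> candidates = new ArrayList<>(nodes); // the "list" step
            if (candidates.isEmpty()) {
                throw new IllegalStateException("no provider available");
            }
            return doInvoke(request, candidates);
        }

        // Prefer a node that has not been tried yet; otherwise fall back to the first one.
        Node select(List<Node> candidates, List<Node> invoked) {
            for (Node node : candidates) {
                if (!invoked.contains(node)) {
                    return node;
                }
            }
            return candidates.get(0);
        }

        abstract String doInvoke(String request, List<Node> candidates);
    }

    static final class FailoverSketchInvoker extends AbstractClusterSketch {
        private final int retries;

        FailoverSketchInvoker(List<Node> nodes, int retries) {
            super(nodes);
            this.retries = retries;
        }

        @Override
        String doInvoke(String request, List<Node> candidates) {
            int attempts = Math.max(retries + 1, 1); // retries plus the first call
            List<Node> invoked = new ArrayList<>();
            RuntimeException last = null;
            for (int i = 0; i < attempts; i++) {
                Node node = select(candidates, invoked);
                invoked.add(node);
                try {
                    return node.call(request);
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try another node
                }
            }
            throw new IllegalStateException("all " + attempts + " attempts failed", last);
        }
    }

    public static void main(String[] args) {
        Node down = request -> { throw new IllegalStateException("timeout"); };
        Node up = request -> "up:" + request;
        AbstractClusterSketch invoker = new FailoverSketchInvoker(List.of(down, up), 2);
        System.out.println(invoker.invoke("ping"));
    }
}
```

Keeping select() in the base class mirrors the design choice in the text: every strategy shares the same selection rules, and a subclass only decides what to do when a call fails.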

4. Summary

Cluster is the cluster fault tolerance interface. Its only job is to aggregate a group of Invokers into a ClusterInvoker that provides cluster fault tolerance. In version 2.7.8, Dubbo has ten built-in cluster fault tolerance strategies. The default strategy is Failover, and its corresponding class is FailoverCluster, which creates a FailoverClusterInvoker.
ClusterInvoker combines the decorator pattern with the template method pattern. It cannot make remote calls by itself; it relies on the basic Invokers provided by the Directory and adds cluster fault tolerance on top of them. The invoke() method of the base class implements the skeleton of the algorithm: it lists the callable services through the Directory, initializes the LoadBalance, and finally hands over to the subclass for cluster fault tolerance.

Keywords: Java Zookeeper rpc

Added by spyke01 on Tue, 28 Dec 2021 03:54:22 +0200