An incident that shows whether zookeeper or nacos makes the better registry

Preface

In a distributed system, the registry plays an important role: it is indispensable to service discovery and client-side load balancing. Beyond its basic functions, its stability, availability and robustness strongly affect how smoothly the whole system runs. dubbo, a mainstream distributed service framework in China, supports third-party middleware such as zookeeper, nacos and redis as its registry, as well as Simple and Multicast. zookeeper (zk) and nacos are probably the most commonly used. Which is better? The incident reproduced below gives an answer.

In a distributed system, a service is usually defined by its provider, which publishes an sdk package containing the service definition. The consumer discovers the service by depending on the provider's sdk package. However, when a subsystem relies on the services of thousands of other subsystems, depending on thousands of sdk packages is clearly impractical. To avoid that dependency, dubbo provides a generalized (generic) invocation method. The generalized call solves the dependency problem, but improper use can cause serious problems of its own. The following demo, in which the generalized service reference is not cached, exposes one of them.

Reproducing the case

The pom.xml file introduces dubbo version 2.5.7 and the zkclient dependency version 0.11, and zk is used as the registry.

		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>dubbo</artifactId>
			<version>2.5.7</version>
		</dependency>
 
		<dependency>
			<groupId>com.101tec</groupId>
			<artifactId>zkclient</artifactId>
			<version>0.11</version>
		</dependency>

Use the following code to simulate the generalized call. The helloService() method builds the generalized service reference and returns it, and the sayHello() method then makes the generalized call. sayHello() keeps obtaining the service in an endless loop until an exception occurs.

import com.alibaba.dubbo.config.ApplicationConfig;
import com.alibaba.dubbo.config.ReferenceConfig;
import com.alibaba.dubbo.config.RegistryConfig;
import com.alibaba.dubbo.rpc.service.GenericService;
import org.springframework.stereotype.Service;
 
@Service
public class HelloGenericService {
 
    private GenericService helloService() {
        ReferenceConfig<GenericService> config = new ReferenceConfig<>();
        config.setInterface("com.qiao.hao.ting.service.HelloService");
        config.setGeneric(true);
        config.setProtocol("dubbo");
        config.setCheck(false);
        // Use zk as the registry
        config.setRegistry(new RegistryConfig("zookeeper://127.0.0.1:2181"));
        //config.setRegistry(new RegistryConfig("nacos://127.0.0.1:8848"));
        config.setTimeout(1000);
        config.setApplication(new ApplicationConfig("general"));
        GenericService service = config.get();
        return service;
    }
 
    public Object sayHello() {
        while (true) {
            try {
                GenericService genericService = helloService();
                // RPC call (left commented out: obtaining the reference is enough to reproduce the problem)
                //genericService.$invoke("sayHello", new String[]{}, new Object[]{});
            } catch (Exception e) {
                e.printStackTrace();
                break;
            }
        }
        return "success";
    }
}

After triggering the sayHello call, look at the zk node information. Inspect the nodes registered by dubbo through the zkCli client window: every time helloService() is called, a consumer node is written under /dubbo/<interface name>/consumers in zk.

If the program keeps running, the number of consumer nodes eventually exceeds what the ls command can return.

Meanwhile, the files in zk's data directory keep growing. The most obvious consequence is that the disk fills up over time.
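The growth described above can also be watched from outside the application. The following is a minimal sketch, assuming the zk server answers the four-letter command mntr on its client port (newer zk releases only do so when the command is whitelisted in the server configuration). The host and port come from the demo; the class name ZkMntrProbe is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class ZkMntrProbe {

    /** Sends the four-letter command "mntr" and returns the raw reply. */
    static String queryMntr(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write("mntr".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // The server closes the connection after replying, so read until EOF.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                reply.write(buffer, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Parses "key<TAB>value" lines into a map, skipping lines without a tab. */
    static Map<String, String> parse(String reply) {
        Map<String, String> metrics = new LinkedHashMap<>();
        for (String line : reply.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab > 0) {
                metrics.put(line.substring(0, tab).trim(), line.substring(tab + 1).trim());
            }
        }
        return metrics;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local zk server as in the demo.
        Map<String, String> metrics = parse(queryMntr("127.0.0.1", 2181));
        System.out.println("znodes: " + metrics.get("zk_znode_count"));
        System.out.println("ephemerals: " + metrics.get("zk_ephemerals_count"));
        System.out.println("approx data size: " + metrics.get("zk_approximate_data_size"));
    }
}
```

Running the probe before and during the endless loop should show the node counters climbing if the demo behaves as described. Counting the children of the consumers path directly would be more precise, but needs a zk client library.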

At the same time, check the service registration information through dubbo-admin. You can see that the com.qiao.hao.ting.service.HelloService service has more than one registered node, and the number keeps increasing as helloService() keeps running.

Now change the registry to nacos. The registry client is dubbo-registry-nacos version 0.0.1.

		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>dubbo</artifactId>
			<version>2.5.7</version>
		</dependency>
 
		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>dubbo-registry-nacos</artifactId>
			<version>0.0.1</version>
		</dependency>

Set the registry of the generalized service definition to nacos.

    private GenericService helloService() {
        ReferenceConfig<GenericService> config = new ReferenceConfig<>();
        config.setInterface("com.qiao.hao.ting.service.HelloService");
        config.setGeneric(true);
        config.setProtocol("dubbo");
        config.setCheck(false);
        //config.setRegistry(new RegistryConfig("zookeeper://127.0.0.1:2181"));
        //Using nacos as the registration center
        config.setRegistry(new RegistryConfig("nacos://127.0.0.1:8848"));
        config.setTimeout(1000);
        config.setApplication(new ApplicationConfig("general"));
        GenericService service = config.get();
        return service;
    }

Trigger the sayHello method. The nacos management console shows that, however long the program runs, there is only one consumer registration for com.qiao.hao.ting.service.HelloService.

To keep the comparison consistent, also check through dubbo-admin. dubbo-admin uses zk as its registry by default, so it needs a small modification. Switch dubbo-admin to nacos in the following two steps. First, download the source code of the matching dubbo-admin version (2.5.7 in this case) and add the dubbo-registry-nacos version 0.0.1 dependency.

Second, change the value of dubbo.registry.address to nacos://127.0.0.1:8848 .

Then rebuild dubbo-admin and run it. Finally, check the service list. With nacos as the registry, there is only one registration entry for the com.qiao.hao.ting.service.HelloService service.

Problem analysis

Because the generalized service reference is not cached, a service registration is performed every time helloService() is called, at the line shown below. When the registration request reaches zk, zk writes a node. nacos does not maintain consistency through node data the way zk does, so the service is not registered over and over without limit (the underlying principles of the two are not covered here and are left for a later article).

GenericService service = config.get();
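The contrast can be pictured with a deliberately simplified model. This is only a sketch under stated assumptions, not a description of how zk or nacos is implemented. It assumes that every new reference yields a consumer URL that differs in some parameter (a timestamp here), that the zk-style registry keeps one entry per distinct URL, and that the nacos-style registry keeps one entry per service and host. The class, the method names and the host address are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RegistrationModel {

    // Assumption: each new reference produces a URL that differs only in its timestamp.
    static String consumerUrl(String service, String host, long timestamp) {
        return "consumer://" + host + "/" + service + "?timestamp=" + timestamp;
    }

    // Path-style model: one entry per distinct consumer URL.
    static int pathStyleEntries(String service, String host, int registrations) {
        Set<String> nodes = new HashSet<>();
        for (int i = 0; i < registrations; i++) {
            nodes.add("/dubbo/" + service + "/consumers/" + consumerUrl(service, host, i));
        }
        return nodes.size();
    }

    // Instance-style model: one entry per service and host, later writes overwrite earlier ones.
    static int instanceStyleEntries(String service, String host, int registrations) {
        Map<String, String> instances = new HashMap<>();
        for (int i = 0; i < registrations; i++) {
            instances.put(service + "@" + host, consumerUrl(service, host, i));
        }
        return instances.size();
    }

    public static void main(String[] args) {
        String service = "com.qiao.hao.ting.service.HelloService";
        String host = "192.168.0.10"; // hypothetical consumer address
        System.out.println(pathStyleEntries(service, host, 1000));     // prints 1000
        System.out.println(instanceStyleEntries(service, host, 1000)); // prints 1
    }
}
```

Under these assumptions, the first model grows with every registration while the second stays at one entry, which matches what the two consoles showed in the demo.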

Of course, real code is very unlikely to call and register in an endless loop. But under high concurrency, or under a steady volume of requests sustained over a long time, the same pattern can still exhaust zk's disk, cause io read-write exceptions and make zk unavailable, which takes down service registration and discovery for the whole cluster.
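A back-of-the-envelope calculation shows why a modest but sustained load is enough. The numbers below are hypothetical inputs chosen for illustration, not measurements from this incident: 20 requests per second, each creating one consumer node of about 400 bytes. The estimate also ignores zk's own per-transaction overhead and snapshots, so real disk usage would differ.

```java
class ZkGrowthEstimate {

    /** Nodes created over a period if every request creates one new consumer node. */
    static long nodesCreated(long requestsPerSecond, long seconds) {
        return requestsPerSecond * seconds;
    }

    /** Rough payload volume written for those nodes. */
    static long bytesWritten(long nodes, long bytesPerNode) {
        return nodes * bytesPerNode;
    }

    public static void main(String[] args) {
        long requestsPerSecond = 20;        // hypothetical steady load
        long bytesPerNode = 400;            // hypothetical size of one consumer node
        long secondsPerDay = 24L * 60 * 60; // 86400

        long nodes = nodesCreated(requestsPerSecond, secondsPerDay);
        long bytes = bytesWritten(nodes, bytesPerNode);

        System.out.println("nodes per day: " + nodes); // prints 1728000
        System.out.println("bytes per day: " + bytes); // prints 691200000
    }
}
```

With these assumed inputs, one day of traffic already produces about 1.7 million nodes and roughly 690 MB of node payload.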

Can this problem be found in the test phase? Strong testers might also look at service registration, but that is generally unlikely, because service registration is usually outside the scope of testing. Functional testing, even when it includes unit, smoke, overall and regression tests, does not make zk unavailable. Stress tests are generally short, and the machine can usually absorb the disk writes of a short run. The problem would only show up if the test environment were monitored as well, which is rarely the case.

Solution

If zk is used as the registry, how can such problems be prevented and solved?

1. Cache the service reference. For example, change the code as follows.

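import com.alibaba.dubbo.config.ApplicationConfig;
import com.alibaba.dubbo.config.ReferenceConfig;
import com.alibaba.dubbo.config.RegistryConfig;
import com.alibaba.dubbo.rpc.service.GenericService;
import org.springframework.stereotype.Service;

@Service
public class HelloGenericService {

    // volatile is needed so that double-checked locking publishes the reference safely
    private volatile GenericService genericService;

    private final Object lockObject = new Object();

    private GenericService helloService() {
        // Fast path: reuse the cached reference, so no new registration is triggered
        if (genericService != null) {
            return genericService;
        }
        synchronized (lockObject) {
            // Check again inside the lock: another thread may have created it already
            if (genericService != null) {
                return genericService;
            }
            ReferenceConfig<GenericService> config = new ReferenceConfig<>();
            config.setInterface("com.qiao.hao.ting.service.HelloService");
            config.setGeneric(true);
            config.setProtocol("dubbo");
            config.setCheck(false);
            // This solution targets zk as the registry
            config.setRegistry(new RegistryConfig("zookeeper://127.0.0.1:2181"));
            //config.setRegistry(new RegistryConfig("nacos://127.0.0.1:8848"));
            config.setTimeout(1000);
            config.setApplication(new ApplicationConfig("general"));
            // config.get() runs only once, so the consumer is registered only once
            genericService = config.get();
        }
        return genericService;
    }

}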

2. Strengthen code review

3. Monitor the zk nodes, including physical monitoring of disk, cpu and io, and network monitoring of service registration requests.
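When many interfaces are called in the generalized way, the caching in item 1 needs one cached reference per interface rather than a single field. The following is a minimal sketch of that idea, assuming Java 8 or later. ReferenceCache is a hypothetical helper, not a dubbo class, and the factory passed to it stands in for whatever code builds the ReferenceConfig and calls get().

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class ReferenceCache<T> {

    private final ConcurrentMap<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> factory;

    ReferenceCache(Function<String, T> factory) {
        this.factory = factory;
    }

    /** Returns the cached reference, creating it at most once per interface name. */
    T get(String interfaceName) {
        return cache.computeIfAbsent(interfaceName, factory);
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        // The lambda is a stand-in for the expensive reference creation.
        ReferenceCache<Object> references = new ReferenceCache<>(name -> {
            created.incrementAndGet();
            return new Object();
        });
        for (int i = 0; i < 1000; i++) {
            references.get("com.qiao.hao.ting.service.HelloService");
        }
        System.out.println("references created: " + created.get()); // prints 1
    }
}
```

In the demo class, the factory would be a method that takes the interface name, fills in the ReferenceConfig as shown earlier and returns config.get(). computeIfAbsent runs the factory at most once per key, so concurrent callers asking for the same interface do not each trigger a registration.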

Conclusion

It is recommended to select nacos as the registry.

Keywords: Java Dubbo Zookeeper rpc

Added by binto on Tue, 18 Jan 2022 20:50:12 +0200