06 Nacos Client local cache and failover

Learning doesn't have to be so utilitarian. Second Senior Brother will help you read the source code with ease, from a higher vantage point ~

In this article, we will analyze the local cache and failover functions of Nacos through the source code, and the core classes involved are ServiceInfoHolder and FailoverReactor.

ServiceInfoHolder function overview

As the name suggests, ServiceInfoHolder is the holder of service information. Earlier articles in this series touched on it repeatedly: every time the client obtains new service information from the registry, it calls this class's processServiceInfo method to process it locally, which includes updating the in-memory cache, publishing events, and updating the local files.

Beyond that, instantiating the class also initializes the local cache directory and the failover mechanism. Let's go through them one by one.

Local memory cache for ServiceInfo

ServiceInfo represents a registered service: its service name, group name, cluster information, instance list, last update time, and so on. In other words, whatever the client obtains from the registry is carried locally by a ServiceInfo object.

ServiceInfoHolder holds the ServiceInfo objects and stores them in a ConcurrentMap:

public class ServiceInfoHolder implements Closeable {
    private final ConcurrentMap<String, ServiceInfo> serviceInfoMap;
    // ....
}

This is the Nacos client's first-level cache of service registration information. When we analyzed processServiceInfo earlier, we saw that the entry in serviceInfoMap is updated as soon as fresh service information arrives.

public ServiceInfo processServiceInfo(ServiceInfo serviceInfo) {
    // ....
    // Cache the service information
    serviceInfoMap.put(serviceInfo.getKey(), serviceInfo);
    // Check whether the registered instance information has changed
    boolean changed = isChangedServiceInfo(oldService, serviceInfo);
    if (StringUtils.isBlank(serviceInfo.getJsonFromServer())) {
        // ....
    }
    // ....
}

Using serviceInfoMap is that simple: when instances change, the latest data is put into the map; when an instance is needed, it is fetched with get by key.
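
To make the put/get pattern concrete, here is a minimal sketch of such a first-level cache. It is not the Nacos implementation: the class name InMemoryServiceCache is hypothetical, and the ServiceInfo object is reduced to a plain list of instance addresses so that the example runs on its own. The change check is likewise a simplification of what isChangedServiceInfo does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical, simplified stand-in for the in-memory cache inside ServiceInfoHolder. */
final class InMemoryServiceCache {

    // key -> instance addresses; the real map stores ServiceInfo objects instead
    private final ConcurrentMap<String, List<String>> cache =
            new ConcurrentHashMap<String, List<String>>();

    /** Stores the latest instance list and reports whether it differs from the previous one. */
    boolean process(String key, List<String> hosts) {
        List<String> latest = Collections.unmodifiableList(new ArrayList<String>(hosts));
        List<String> previous = cache.put(key, latest);
        // A first-time entry counts as a change, as does any difference in the list
        return previous == null || !previous.equals(latest);
    }

    /** Returns the cached instance list, or null when nothing is cached for the key. */
    List<String> get(String key) {
        return cache.get(key);
    }
}
```

In this sketch the caller would use the boolean returned by process to decide whether to publish a change event and refresh the disk snapshot, which mirrors the role of the changed flag in processServiceInfo.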

serviceInfoMap is initialized in the ServiceInfoHolder constructor. By default an empty ConcurrentHashMap is created; if the client is configured to read the cache files at startup, the map is populated from the local disk cache instead.

// Whether to read the cache directory at startup. Defaults to false; set to true to load the cache files
if (isLoadCacheAtStart(properties)) {
    this.serviceInfoMap = new ConcurrentHashMap<String, ServiceInfo>(DiskCache.read(this.cacheDir));
} else {
    this.serviceInfoMap = new ConcurrentHashMap<String, ServiceInfo>(16);
}

This is where the local cache directory comes in. In processServiceInfo, when the service instances have changed, the ServiceInfo is written to that directory through DiskCache#write.

// The service instances have changed
if (changed) {
    NAMING_LOGGER.info("current ips:(" + serviceInfo.ipCount() + ") service: " + serviceInfo.getKey() + " -> "
            + JacksonUtils.toJson(serviceInfo.getHosts()));
    // Publish an instance change event, which is pushed to the subscribers for handling
    NotifyCenter.publishEvent(new InstancesChangeEvent(serviceInfo.getName(), serviceInfo.getGroupName(),
            serviceInfo.getClusters(), serviceInfo.getHosts()));
    // Write the ServiceInfo to the local file
    DiskCache.write(serviceInfo, cacheDir);
}

Let's talk about the local cache directory.

Local cache directory

The local cache directory is held as a field of ServiceInfoHolder. It specifies the root directory of the local cache and also the root directory for failover.

private String cacheDir;

The first call in the ServiceInfoHolder constructor generates the cache directory:

public ServiceInfoHolder(String namespace, Properties properties) {
    // Generate the cache directory: the default is ${user.home}/nacos/naming/public.
    // The root directory can be customized with System.setProperty("JM.SNAPSHOT.PATH", ...)
    initCacheDir(namespace, properties);
    // ....
}

We won't walk through the directory-generation code here. The default cache directory is ${user.home}/nacos/naming/public, and the root directory can be customized through System.setProperty("JM.SNAPSHOT.PATH", ...).
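
For readers who want to see the rule spelled out, the following is a minimal sketch of how such a directory could be resolved. It is an illustration under assumptions, not the body of initCacheDir: the class CacheDirResolver and its method names are hypothetical, and only the two inputs mentioned above (the JM.SNAPSHOT.PATH system property and user.home) are considered.

```java
import java.io.File;

/** Hypothetical sketch of the cache directory rule; not the actual initCacheDir code. */
final class CacheDirResolver {

    /**
     * Builds root/nacos/naming/namespace, where root is the custom snapshot path
     * when it is set and non-blank, and the user's home directory otherwise.
     */
    static String resolve(String snapshotPath, String userHome, String namespace) {
        boolean custom = snapshotPath != null && !snapshotPath.trim().isEmpty();
        String root = custom ? snapshotPath : userHome;
        return root + File.separator + "nacos" + File.separator + "naming"
                + File.separator + namespace;
    }

    /** Same rule, reading both roots from the JVM system properties. */
    static String resolveFromSystem(String namespace) {
        return resolve(System.getProperty("JM.SNAPSHOT.PATH"),
                System.getProperty("user.home"), namespace);
    }

    /** The failover directory sits directly under the cache directory. */
    static String failoverDir(String cacheDir) {
        return cacheDir + "/failover";
    }
}
```

Passing the two roots as parameters keeps the rule easy to test without touching the real system properties; resolveFromSystem is only a thin convenience wrapper.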

Once the directory has been initialized, the failover data is stored under it as well.

Failover

Also in the ServiceInfoHolder constructor, a FailoverReactor is created and kept as a member variable of ServiceInfoHolder. FailoverReactor is the class that handles failover.

this.failoverReactor = new FailoverReactor(this, cacheDir);

Here, this is the current ServiceInfoHolder instance, so the two objects hold references to each other.

Let's look at the FailoverReactor constructor:

public FailoverReactor(ServiceInfoHolder serviceInfoHolder, String cacheDir) {
    // Hold the ServiceInfoHolder reference
    this.serviceInfoHolder = serviceInfoHolder;
    // Build the failover root directory: ${user.home}/nacos/naming/public/failover
    this.failoverDir = cacheDir + FAILOVER_DIR;
    // Initialize executorService
    this.executorService = new ScheduledThreadPoolExecutor(1, new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            Thread thread = new Thread(r);
            // Run as a daemon thread
            thread.setDaemon(true);
            return thread;
        }
    });
    // Other initialization: start several scheduled tasks through executorService
    this.init();
}

The constructor essentially lays out what FailoverReactor does:

  • Holds a reference to ServiceInfoHolder;
  • Builds the failover root directory: ${user.home}/nacos/naming/public/failover, where public may be replaced by another user-defined namespace;
  • Initializes executorService;
  • Calls init, which starts several scheduled tasks through executorService.

init method execution

The init method schedules three tasks:

  • SwitchRefresher: starts immediately and then runs every 5 seconds;
  • DiskFileWriter: starts after a 30-minute delay and then runs every 24 hours;
  • A startup backup: runs once, 10 seconds after initialization, and its core operation is DiskFileWriter.
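
The scheduling pattern behind these tasks can be sketched with the JDK classes the constructor already uses. This is a minimal sketch, not the init method itself: the class FailoverSchedulerSketch and the three Runnable parameters are hypothetical placeholders, and the delays and periods follow my reading of the 2.0.x client source (in particular, the third task is modeled as a single run delayed by 10 seconds), so treat the exact values as assumptions.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a single-threaded daemon scheduler running three tasks. */
final class FailoverSchedulerSketch {

    /** Creates a one-thread scheduler whose worker does not keep the JVM alive. */
    static ScheduledThreadPoolExecutor newDaemonScheduler(final String threadName) {
        return new ScheduledThreadPoolExecutor(1, new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread thread = new Thread(r);
                thread.setDaemon(true);
                thread.setName(threadName);
                return thread;
            }
        });
    }

    /** Registers the three tasks; the Runnables are placeholders supplied by the caller. */
    static void scheduleAll(ScheduledExecutorService executor, Runnable switchRefresher,
            Runnable diskFileWriter, Runnable backupIfEmpty) {
        // Poll the switch file: first run immediately, then 5 seconds after each run ends
        executor.scheduleWithFixedDelay(switchRefresher, 0L, 5000L, TimeUnit.MILLISECONDS);
        // Periodic backup: first run after 30 minutes, then once every 24 hours
        executor.scheduleWithFixedDelay(diskFileWriter, 30L, 24L * 60L, TimeUnit.MINUTES);
        // Startup backup (assumed one-shot): a single run after 10 seconds
        executor.schedule(backupIfEmpty, 10000L, TimeUnit.MILLISECONDS);
    }
}
```

Because the pool has a single thread, the three tasks never run concurrently with each other, and a slow task simply delays the next one.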

SwitchRefresher and DiskFileWriter are inner classes of FailoverReactor. Let's first look at DiskFileWriter, which the latter two tasks both rely on:

class DiskFileWriter extends TimerTask {

    @Override
    public void run() {
        Map<String, ServiceInfo> map = serviceInfoHolder.getServiceInfoMap();
        for (Map.Entry<String, ServiceInfo> entry : map.entrySet()) {
            ServiceInfo serviceInfo = entry.getValue();
            // Skip the reserved entries; they are not backed up
            if (StringUtils.equals(serviceInfo.getKey(), UtilAndComs.ALL_IPS) || StringUtils
                    .equals(serviceInfo.getName(), UtilAndComs.ENV_LIST_KEY) || StringUtils
                    .equals(serviceInfo.getName(), UtilAndComs.ENV_CONFIGS) || StringUtils
                    .equals(serviceInfo.getName(), UtilAndComs.VIP_CLIENT_FILE) || StringUtils
                    .equals(serviceInfo.getName(), UtilAndComs.ALL_HOSTS)) {
                continue;
            }
            // Write the cached content to a disk file
            DiskCache.write(serviceInfo, failoverDir);
        }
    }
}

The logic is simple: it takes the ServiceInfo objects cached in ServiceInfoHolder, skips the entries whose key or name matches one of the reserved constants, and writes the rest to the failover directory built earlier: ${user.home}/nacos/naming/public/failover. The second and third scheduled tasks differ in one respect: the third performs a pre-check and runs DiskFileWriter only when the failover directory contains no files yet.

Finally, let's look at the core implementation of SwitchRefresher:

File switchFile = new File(failoverDir + UtilAndComs.FAILOVER_SWITCH);
// Exit if the switch file does not exist
if (!switchFile.exists()) {
    switchParams.put("failover-mode", "false");
    NAMING_LOGGER.debug("failover switch is not found, " + switchFile.getName());
    return;
}

long modified = switchFile.lastModified();

if (lastModifiedMillis < modified) {
    lastModifiedMillis = modified;
    // Read the contents of the failover switch file
    String failover = ConcurrentDiskUtil.getFileContent(failoverDir + UtilAndComs.FAILOVER_SWITCH,
            Charset.defaultCharset().toString());
    if (!StringUtils.isEmpty(failover)) {
        String[] lines = failover.split(DiskCache.getLineSeparator());

        for (String line : lines) {
            String line1 = line.trim();
            // 1 means failover mode is on
            if (IS_FAILOVER_MODE.equals(line1)) {
                switchParams.put(FAILOVER_MODE_PARAM, Boolean.TRUE.toString());
                NAMING_LOGGER.info("failover-mode is on");
                new FailoverFileReader().run();
            } else if (NO_FAILOVER_MODE.equals(line1)) {
                // 0 means failover mode is off
                switchParams.put(FAILOVER_MODE_PARAM, Boolean.FALSE.toString());
                NAMING_LOGGER.info("failover-mode is off");
            }
        }
    } else {
        switchParams.put(FAILOVER_MODE_PARAM, Boolean.FALSE.toString());
    }
}

The logic of the code above is:

  • If the failover switch file does not exist, failover mode is set to off and the task returns. The switch file is named "00-00---000-VIPSRV_FAILOVER_SWITCH-000---00-00".
  • Compare the file's modification time; if the file has been modified since the last check, read its contents.
  • The file stores a flag of 0 or 1: 0 means off and 1 means on.
  • When the flag is on, run FailoverFileReader (its run method is called directly, on the same thread).
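
The steps above can be sketched as a small, self-contained poller. This is only an illustration: the class SwitchFileSketch and its method names are hypothetical, the file is read as UTF-8 rather than with the platform default charset, and the step that reloads the failover files when the switch turns on is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch of a file-based on/off switch polled by a scheduled task. */
final class SwitchFileSketch {

    private long lastModifiedMillis = 0L;
    private volatile boolean failoverOn = false;

    /** Applies the flag lines to the current state: "1" turns failover on, "0" turns it off. */
    static boolean parse(List<String> lines, boolean current) {
        if (lines.isEmpty()) {
            return false; // an empty switch file means off
        }
        boolean state = current;
        for (String line : lines) {
            String flag = line.trim();
            if ("1".equals(flag)) {
                state = true;
            } else if ("0".equals(flag)) {
                state = false;
            }
            // any other line leaves the state untouched
        }
        return state;
    }

    /** One polling round; meant to be called periodically from a single scheduler thread. */
    void refresh(Path switchFile) {
        try {
            if (!Files.exists(switchFile)) {
                failoverOn = false; // no switch file: failover stays off
                return;
            }
            long modified = Files.getLastModifiedTime(switchFile).toMillis();
            if (lastModifiedMillis >= modified) {
                return; // not modified since the last round
            }
            lastModifiedMillis = modified;
            failoverOn = parse(Files.readAllLines(switchFile, StandardCharsets.UTF_8), failoverOn);
        } catch (IOException e) {
            // Unreadable file: this sketch simply keeps the previous state
        }
    }

    boolean isFailoverOn() {
        return failoverOn;
    }
}
```

Checking the modification time before reading keeps each polling round cheap, which matters for a task that runs every few seconds. The trade-off is that two edits within the file system's timestamp granularity may be seen as one.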

FailoverFileReader, as its name suggests, reads the failover files. It reads the ServiceInfo files stored in the failover directory, converts their contents into ServiceInfo objects, and stores them all in the serviceMap field of FailoverReactor.

An example of the contents of the failover directory file is as follows:

(base) appledeMacBook-Pro-2:failover apple$ ls

The file format is as follows:

{
    "hosts": [
        {
            "ip": "",
            "port": 800,
            "valid": true,
            "healthy": true,
            "marked": false,
            "instanceId": "",
            "metadata": {
                "netType": "external",
                "version": "2.0"
            },
            "enabled": true,
            "weight": 2,
            "clusterName": "DEFAULT",
            "serviceName": "DEFAULT_GROUP@@nacos.test.1",
            "ephemeral": true
        }
    ],
    "dom": "DEFAULT_GROUP@@nacos.test.1",
    "name": "DEFAULT_GROUP@@nacos.test.1",
    "cacheMillis": 10000,
    "lastRefTime": 1617001291656,
    "checksum": "969c531798aedb72f87ac686dfea2569",
    "useSpecifiedURL": false,
    "clusters": "",
    "env": "",
    "metadata": {}
}

Let's take a look at the core business implementation:

for (File file : files) {
    // Skip anything that is not a regular file
    if (!file.isFile()) {
        continue;
    }

    // Skip the failover switch file
    if (file.getName().equals(UtilAndComs.FAILOVER_SWITCH)) {
        continue;
    }

    ServiceInfo dom = new ServiceInfo(file.getName());

    try {
        String dataString = ConcurrentDiskUtil
                .getFileContent(file, Charset.defaultCharset().toString());
        reader = new BufferedReader(new StringReader(dataString));

        String json;
        if ((json = reader.readLine()) != null) {
            try {
                dom = JacksonUtils.toObj(json, ServiceInfo.class);
            } catch (Exception e) {
                NAMING_LOGGER.error("[NA] error while parsing cached dom : " + json, e);
            }
        }

    } catch (Exception e) {
        NAMING_LOGGER.error("[NA] failed to read cache for dom: " + file.getName(), e);
    } finally {
        try {
            if (reader != null) {
                reader.close();
            }
        } catch (Exception e) {
            // ignore
        }
    }

    // ...  Put into the cache only when the host list is not empty
    if (!CollectionUtils.isEmpty(dom.getHosts())) {
        domMap.put(dom.getKey(), dom);
    }
}

The basic flow of the code is:

  • List all files in the failover directory and iterate over them;
  • Skip any entry that is not a regular file;
  • Skip the failover switch file;
  • Read the JSON content of the file and convert it into a ServiceInfo object;
  • Put the ServiceInfo object into domMap if its host list is not empty.
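
The same flow can be tried out in isolation with the sketch below. It makes several simplifying assumptions that differ from FailoverFileReader: the class FailoverDirReaderSketch is hypothetical, the JSON is not parsed (the raw first line of each file is kept as the value), the file name is used as the map key, and the switch file name is passed in as a parameter. The final step, replacing the published map only when something was loaded, follows the domMap handling in the original code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: load one snapshot per file from a failover directory. */
final class FailoverDirReaderSketch {

    private volatile Map<String, String> serviceMap = new HashMap<String, String>();

    void reload(File failoverDir, String switchFileName) {
        Map<String, String> loaded = new HashMap<String, String>();
        File[] files = failoverDir.listFiles();
        if (files == null) {
            return; // not a directory, or it could not be listed
        }
        for (File file : files) {
            if (!file.isFile() || file.getName().equals(switchFileName)) {
                continue; // skip sub-directories and the switch file
            }
            try (BufferedReader reader = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
                String firstLine = reader.readLine();
                if (firstLine != null && !firstLine.trim().isEmpty()) {
                    loaded.put(file.getName(), firstLine);
                }
            } catch (IOException e) {
                // Unreadable snapshot: skip it and continue with the other files
            }
        }
        if (!loaded.isEmpty()) {
            serviceMap = loaded; // swap in the new map in one assignment
        }
    }

    Map<String, String> snapshot() {
        return serviceMap;
    }

    /** Small demo: builds a temporary directory and returns what reload picked up. */
    static Map<String, String> demo() {
        try {
            Path dir = Files.createTempDirectory("failover-demo");
            Files.write(dir.resolve("demo-service"), "{\"hosts\":[]}".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("empty-service"), new byte[0]);
            Files.write(dir.resolve("switch"), "1".getBytes(StandardCharsets.UTF_8));
            Files.createDirectory(dir.resolve("nested"));
            FailoverDirReaderSketch reader = new FailoverDirReaderSketch();
            reader.reload(dir.toFile(), "switch");
            return reader.snapshot();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Building a fresh map and publishing it with a single assignment means readers never observe a half-loaded snapshot, and a failed or empty reload leaves the previous data in place.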

When the for loop is completed, if domMap is not empty, assign it to serviceMap:

if (domMap.size() > 0) {
    serviceMap = domMap;
}

You may ask where this serviceMap is used. When fetching instances, we usually call a method named getServiceInfo:

public ServiceInfo getServiceInfo(final String serviceName, final String groupName, final String clusters) {
    NAMING_LOGGER.debug("failover-mode: " + failoverReactor.isFailoverSwitch());
    String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
    String key = ServiceInfo.getKey(groupedServiceName, clusters);
    if (failoverReactor.isFailoverSwitch()) {
        return failoverReactor.getService(key);
    }
    return serviceInfoMap.get(key);
}

In other words, if failover is enabled, failoverReactor#getService is called instead of reading serviceInfoMap, and that method fetches the ServiceInfo from serviceMap.

public ServiceInfo getService(String key) {
    ServiceInfo serviceInfo = serviceMap.get(key);

    if (serviceInfo == null) {
        serviceInfo = new ServiceInfo();
    }

    return serviceInfo;
}

So far, the analysis of the failover process of the Nacos client has been completed.
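
To round off the walkthrough, the lookup path can be condensed into a small sketch. It is a simplified model rather than the Nacos classes: FailoverLookupSketch and its method names are hypothetical, instance lists stand in for ServiceInfo, and the key format (group@@service, with @@clusters appended when clusters is not empty) is an assumption based on the sample data shown earlier.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a lookup that switches between live cache and failover snapshot. */
final class FailoverLookupSketch {

    private final Map<String, List<String>> liveCache = new ConcurrentHashMap<String, List<String>>();
    private volatile Map<String, List<String>> failoverSnapshot = Collections.emptyMap();
    private volatile boolean failoverOn = false;

    /** Assumed key format: group@@service, plus @@clusters when clusters is not empty. */
    static String key(String groupName, String serviceName, String clusters) {
        String grouped = groupName + "@@" + serviceName;
        return (clusters == null || clusters.isEmpty()) ? grouped : grouped + "@@" + clusters;
    }

    void putLive(String key, List<String> hosts) {
        liveCache.put(key, hosts);
    }

    void switchFailover(boolean on, Map<String, List<String>> snapshot) {
        failoverSnapshot = snapshot;
        failoverOn = on;
    }

    List<String> lookup(String key) {
        if (failoverOn) {
            // Failover mode: serve from the snapshot and never return null
            List<String> hosts = failoverSnapshot.get(key);
            return hosts != null ? hosts : Collections.<String>emptyList();
        }
        // Normal mode: serve from the live cache; null means nothing is cached yet
        return liveCache.get(key);
    }
}
```

Note the asymmetry, which mirrors the two methods above: in failover mode an unknown key yields an empty result, while in normal mode it yields null.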


Summary

This article covered how the Nacos client implements its local cache and failover. The local cache has two layers. First, the instance information obtained from the registry is cached in memory in a Map, which makes lookups convenient. Second, it is also persisted regularly to disk files for emergencies.

Failover likewise has two aspects. First, the failover switch is controlled by a marker file. Second, once failover is switched on, service instance information is served from the files that the failover mechanism backs up periodically.

About the author: the author of the technical book "Inside of SpringBoot Technology", who loves digging into technology and writing practical technical articles.

Keywords: Spring Cloud Nacos

Added by anthony-needs-you on Tue, 21 Dec 2021 20:13:51 +0200