Redis 32: Hands-on: Redis Cluster mode

In the previous article, we covered building a Redis cluster and dynamically adding and removing nodes. As a brief review: nodes 30001 to 30006 form the cluster we built initially, and 30007 and 30008 are a master and slave pair added dynamically later. We can use the --cluster info command to view the master nodes and their slot allocation:

$ redis-cli --cluster info 127.0.0.1:30001
127.0.0.1:30001 (887397e6...) -> 0 keys | 5461 slots | 1 slaves.
127.0.0.1:30007 (df019085...) -> 0 keys | 0 slots | 1 slaves.
127.0.0.1:30003 (f5958382...) -> 0 keys | 5461 slots | 1 slaves.
127.0.0.1:30002 (3da35c40...) -> 0 keys | 5462 slots | 1 slaves.
[OK] 0 keys in 4 masters.
0.00 keys per slot on average.

As the output shows, the dynamically added master node 30007 has one slave node but owns no slots, which clearly does not meet our needs: the node has joined the cluster, but it handles no data. We therefore need to reshard, redistributing the slots (and the data in them) across all master nodes so that the cluster is fully used.
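To spot such empty masters without reading the output by eye, the info lines can be parsed. The following is a minimal sketch, assuming the line format shown above; the function name masters_without_slots and the regular expression are our own illustration, not part of redis-cli.

```python
import re

# Matches master lines such as:
# 127.0.0.1:30007 (df019085...) -> 0 keys | 0 slots | 1 slaves.
INFO_LINE = re.compile(
    r"^(?P<addr>\S+) \((?P<node_id>[0-9a-f]+)\.\.\.\) -> "
    r"(?P<keys>\d+) keys \| (?P<slots>\d+) slots \| (?P<replicas>\d+) slaves\.$"
)


def masters_without_slots(info_output):
    """Return the addresses of master nodes that own no slots."""
    empty = []
    for line in info_output.splitlines():
        match = INFO_LINE.match(line.strip())
        if match and int(match.group("slots")) == 0:
            empty.append(match.group("addr"))
    return empty


# Sample text copied from the --cluster info output above.
SAMPLE = """\
127.0.0.1:30001 (887397e6...) -> 0 keys | 5461 slots | 1 slaves.
127.0.0.1:30007 (df019085...) -> 0 keys | 0 slots | 1 slaves.
127.0.0.1:30003 (f5958382...) -> 0 keys | 5461 slots | 1 slaves.
127.0.0.1:30002 (3da35c40...) -> 0 keys | 5462 slots | 1 slaves.
[OK] 0 keys in 4 masters.
0.00 keys per slot on average.
"""

print(masters_without_slots(SAMPLE))
```

Summary lines such as "[OK] 0 keys in 4 masters." do not match the pattern and are simply skipped.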

Resharding

We can use the reshard command to reallocate slots. The command is as follows:

$ redis-cli --cluster reshard 127.0.0.1:30007
>>> Performing Cluster Check (using node 127.0.0.1:30007)
M: df0190853a53d8e078205d0e2fa56046f20362a7 127.0.0.1:30007
   slots:[0-1332],[5461-6794],[10923-12255] (4000 slots) master
   1 additional replica(s)
S: dc0702625743c48c75ea935c87813c4060547cef 127.0.0.1:30006
   slots: (0 slots) slave
   replicates 3da35c40c43b457a113b539259f17e7ed616d13d
M: 3da35c40c43b457a113b539259f17e7ed616d13d 127.0.0.1:30002
   slots:[6795-10922] (4128 slots) master
   1 additional replica(s)
S: 1a324d828430f61be6eaca7eb2a90728dd5049de 127.0.0.1:30004
   slots: (0 slots) slave
   replicates f5958382af41d4e1f5b0217c1413fe19f390b55f
S: 1d09d26fd755298709efe60278457eaa09cefc26 127.0.0.1:30008
   slots: (0 slots) slave
   replicates df0190853a53d8e078205d0e2fa56046f20362a7
S: abec9f98f9c01208ba77346959bc35e8e274b6a3 127.0.0.1:30005
   slots: (0 slots) slave
   replicates 887397e6fefe8ad19ea7569e99f5eb8a803e3785
M: f5958382af41d4e1f5b0217c1413fe19f390b55f 127.0.0.1:30003
   slots:[12256-16383] (4128 slots) master
   1 additional replica(s)
M: 887397e6fefe8ad19ea7569e99f5eb8a803e3785 127.0.0.1:30001
   slots:[1333-5460] (4128 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)?

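During execution, it asks how many slots you want to move. The value range is 1 to 16384. We enter 4000 here, which means moving 4000 slots to one master node. (The check output above already lists 30007 with 4000 slots, so it appears to have been captured after the reshard had been run; before resharding, 30007 owns no slots.) After entering the value, the output is as follows: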

How many slots do you want to move (from 1 to 16384)? 4000
What is the receiving node ID?

It then asks which node should receive these slots; enter that node's ID. After entering the ID of the node on port 30007 shown above, press Enter. The output is as follows:

How many slots do you want to move (from 1 to 16384)? 4000
What is the receiving node ID? df0190853a53d8e078205d0e2fa56046f20362a7
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:

Next, it asks which source nodes the slots should be taken from. Enter all to use all the other master nodes as sources, taking a share of the slots from each of them. The output is as follows:

# ...... Ignore other
Moving slot 2656 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Moving slot 2657 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Moving slot 2658 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Moving slot 2659 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Moving slot 2660 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Moving slot 2661 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Moving slot 2662 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Moving slot 2663 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Moving slot 2664 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Moving slot 2665 from 887397e6fefe8ad19ea7569e99f5eb8a803e3785
Do you want to proceed with the proposed reshard plan (yes/no)?

At this point, it lists every slot that will be moved and asks you to confirm. Enter "yes" to start the transfer.
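The same answers can also be passed on the command line so that no prompts appear. The sketch below only builds the argument list; it assumes the redis-cli options --cluster-from, --cluster-to, --cluster-slots and --cluster-yes, and the helper name build_reshard_command is hypothetical.

```python
def build_reshard_command(entry_node, target_id, slots, source_ids="all"):
    """Build a non-interactive `redis-cli --cluster reshard` argument list.

    entry_node: host:port of any node in the cluster
    target_id:  ID of the node that receives the slots
    slots:      number of slots to move
    source_ids: "all" or a comma-separated list of source node IDs
    """
    return [
        "redis-cli", "--cluster", "reshard", entry_node,
        "--cluster-from", source_ids,
        "--cluster-to", target_id,
        "--cluster-slots", str(slots),
        "--cluster-yes",  # answer "yes" to the confirmation prompt
    ]


cmd = build_reshard_command(
    "127.0.0.1:30007",
    "df0190853a53d8e078205d0e2fa56046f20362a7",
    4000,
)
print(" ".join(cmd))

# To run it against a live cluster (not executed in this sketch):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to subprocess.run.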

After the transfer completes, we use the cluster slots command to view the slot information. The results are as follows:

$ redis-cli -c -p 30001
127.0.0.1:30001> cluster slots # View cluster slot information
1) 1) (integer) 0
   2) (integer) 1332
   3) 1) "127.0.0.1"
      2) (integer) 30007
      3) "df0190853a53d8e078205d0e2fa56046f20362a7"
   4) 1) "127.0.0.1"
      2) (integer) 30008
      3) "1d09d26fd755298709efe60278457eaa09cefc26"
2) 1) (integer) 5461
   2) (integer) 6794
   3) 1) "127.0.0.1"
      2) (integer) 30007
      3) "df0190853a53d8e078205d0e2fa56046f20362a7"
   4) 1) "127.0.0.1"
      2) (integer) 30008
      3) "1d09d26fd755298709efe60278457eaa09cefc26"
3) 1) (integer) 10923
   2) (integer) 12255
   3) 1) "127.0.0.1"
      2) (integer) 30007
      3) "df0190853a53d8e078205d0e2fa56046f20362a7"
   4) 1) "127.0.0.1"
      2) (integer) 30008
      3) "1d09d26fd755298709efe60278457eaa09cefc26"
4) 1) (integer) 12256
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 30003
      3) "f5958382af41d4e1f5b0217c1413fe19f390b55f"
   4) 1) "127.0.0.1"
      2) (integer) 30004
      3) "1a324d828430f61be6eaca7eb2a90728dd5049de"
5) 1) (integer) 6795
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 30002
      3) "3da35c40c43b457a113b539259f17e7ed616d13d"
   4) 1) "127.0.0.1"
      2) (integer) 30006
      3) "dc0702625743c48c75ea935c87813c4060547cef"
6) 1) (integer) 1333
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 30001
      3) "887397e6fefe8ad19ea7569e99f5eb8a803e3785"
   4) 1) "127.0.0.1"
      2) (integer) 30005
      3) "abec9f98f9c01208ba77346959bc35e8e274b6a3"

The results show that 30007 has taken a range of slots from each of the other three master nodes as its own.
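The sizes of the three ranges can be checked with a little arithmetic. The sketch below assumes the split is proportional to each source node's slot count, with the largest source rounded up and the others rounded down; that assumption reproduces the ranges shown above, but it is our reading of the tool's behavior rather than something stated in this article. The function name plan_reshard is hypothetical.

```python
import math


def plan_reshard(source_slots, num_slots):
    """Split num_slots across the source nodes in proportion to what they own.

    Assumption: the source with the most slots is rounded up and the
    remaining sources are rounded down.
    """
    total = sum(source_slots.values())
    ordered = sorted(source_slots.items(), key=lambda item: item[1], reverse=True)
    plan = {}
    for index, (node, owned) in enumerate(ordered):
        share = num_slots / total * owned
        plan[node] = math.ceil(share) if index == 0 else math.floor(share)
    return plan


# Slot counts before resharding, from the --cluster info output.
before = {"30001": 5461, "30002": 5462, "30003": 5461}
plan = plan_reshard(before, 4000)
print(plan)

# Ranges now owned by 30007, from the cluster slots output.
moved_ranges = [(0, 1332), (5461, 6794), (10923, 12255)]
moved_sizes = [end - start + 1 for start, end in moved_ranges]
print(moved_sizes, sum(moved_sizes))
```

Under this assumption, 30002 (the node with 5462 slots) gives up one more slot than the other two, which matches the 5461-6794 range being the largest of the three.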

Note that if the error /usr/bin/env: ruby: No such file or directory occurs during this process, the tool you are running depends on a Ruby environment. This applies to the older redis-trib.rb script used before Redis 5; redis-cli --cluster itself does not need Ruby. You can install Ruby with the command yum install ruby.

Slot location algorithm

A Redis cluster has 16384 slots in total. Each master node is responsible for a subset of the slots and for the key-value data mapped to them. By default, the Redis cluster hashes the key to be stored with the CRC16 algorithm to obtain an integer, then takes this integer modulo 16384 to obtain the slot number. The formula is:

slot = CRC16(key) % 16384
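The mapping described above (CRC16, then modulo 16384) can be tried out locally. This is a minimal sketch, assuming the XMODEM variant of CRC16, which Python's standard binascii.crc_hqx produces when started from 0. It deliberately ignores hash tags ({...}), which a real cluster treats specially; the function names are our own.

```python
import binascii

SLOT_COUNT = 16384


def crc16(data: bytes) -> int:
    """CRC16 (polynomial 0x1021, initial value 0) of the given bytes."""
    return binascii.crc_hqx(data, 0)


def key_slot(key: str) -> int:
    """Slot number for a key, ignoring hash tags."""
    return crc16(key.encode("utf-8")) % SLOT_COUNT


for name in ("foo", "user:1001", "order:42"):
    print(name, "->", key_slot(name))
```

On a running cluster, the result for a key without braces can be compared against the CLUSTER KEYSLOT command.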

Load balancing

When the load on a Redis cluster is unbalanced, we can use the rebalance command to redistribute the number of slots each node is responsible for. This evens out the load on each node and improves the overall efficiency of the cluster.

The rebalance command is as follows:

$ redis-cli --cluster rebalance 127.0.0.1:30007

Note that entering the rebalance command does not necessarily move anything. If it decides that no reallocation is needed because all nodes are within the threshold (2% in the output below), it exits directly:

$ redis-cli --cluster rebalance 127.0.0.1:30007
>>> Performing Cluster Check (using node 127.0.0.1:30007)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
*** No rebalancing needed! All nodes are within the 2.00% threshold.

Code practice

So far we have covered building and managing a Redis cluster. Next, we use Python code to operate the cluster. The core code is as follows:

import redis
# StrictRedisCluster is the class name in redis-py-cluster 1.x;
# redis-py-cluster 2.x renamed it to RedisCluster.
from rediscluster import StrictRedisCluster


class RedisDao(object):
    """Singleton wrapper that creates a Redis client from a URI."""

    __instance = None

    def __new__(cls, *args, **kwargs):
        # Reuse a single instance; note that __init__ still runs on every call.
        if cls.__instance is None:
            cls.__instance = object.__new__(cls)
        return cls.__instance

    def __init__(self, redis_uri):
        self.conn = self.init_redis_conn(redis_uri)

    @staticmethod
    def init_redis_conn(redis_uri):
        """
        Example URIs:
        REDIS_CLUSTER = "redis://jarvis@192.168.48.153:6381,192.168.48.153:6382,192.168.48.153:6383"
        REDIS_URI = "redis://passwd@192.168.17.128:6379/0"
        :param redis_uri: Redis connection URI
        :return: redis client
        """
        redis_params = redis_uri[len("redis://"):]
        # Everything before '@' is treated as the password.
        auth = None
        if '@' in redis_params:
            auth, uri_params = redis_params.split("@", 1)
        else:
            uri_params = redis_params

        uri_array = uri_params.split(",")
        # Standalone mode: a single host:port, optionally followed by /db
        if len(uri_array) == 1:
            params = uri_array[0].split("/")
            host, port = params[0].split(":")
            # Use the database index after '/' if present, otherwise 0.
            db = int(params[1]) if len(params) > 1 else 0
            pool = redis.ConnectionPool(host=host, port=int(port), password=auth, db=db)
            return redis.Redis(connection_pool=pool)
        # Cluster mode: several comma-separated host:port pairs
        else:
            redis_nodes = []
            for uri in uri_array:
                host, port = uri.split(":")
                redis_nodes.append({'host': host, 'port': int(port)})
            return StrictRedisCluster(startup_nodes=redis_nodes, password=auth, max_connections=5)


if __name__ == "__main__":
    # Example usage; assumes the local cluster built in this article is
    # running without a password. The key name is only an illustration.
    dao = RedisDao("redis://127.0.0.1:30001,127.0.0.1:30002,127.0.0.1:30003")
    dao.conn.set("mykey", "redis")
    print(dao.conn.get("mykey"))





If reads and writes through this client succeed, the Redis cluster is operating normally. Apart from the client object being different, the method names are the same as in standalone mode, which is convenient for programmers. You can write the corresponding code for your own business scenario.

Faults

In the last part of the article, let's look at how Redis cluster handles faults, so that we are less flustered when a fault occurs and have something to go on when dealing with it.

Fault discovery

There are two important concepts in fault discovery: PFAIL (possible failure, that is, suspected offline) and FAIL (confirmed offline).

Health monitoring in the cluster works by each node regularly sending PING messages to the other nodes. If the node that sent a PING does not receive the PONG reply within the specified time (the cluster-node-timeout setting), it marks the other node as suspected offline.

When a node finds that another node is suspected to be offline, it spreads this information through the cluster, and the other nodes check through their own PINGs whether that node is really offline. If a node learns that more than half of the master nodes in the cluster consider the node suspected offline, it marks the node as confirmed offline and broadcasts this to the whole cluster, forcing the other nodes to accept that the node is offline and immediately triggering a master-slave switch for the lost node.

These are the concepts of suspected offline and confirmed offline. They are similar to subjective offline and objective offline in sentinel mode.
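The majority rule can be expressed in a few lines. The following is a simplified sketch of the decision only, assuming we already know which master nodes currently report the node as suspected offline; it does not model timeouts or message exchange, and the function name node_state is hypothetical.

```python
def node_state(pfail_reporters, master_ids):
    """Classify a node from the set of masters that suspect it.

    Returns "FAIL" when more than half of the masters report it,
    "PFAIL" when at least one does, and "OK" otherwise.
    """
    masters = set(master_ids)
    reporters = set(pfail_reporters) & masters  # only masters count
    if len(reporters) > len(masters) // 2:
        return "FAIL"
    return "PFAIL" if reporters else "OK"


# The four master nodes of the cluster in this article.
masters = ["30001", "30002", "30003", "30007"]

print(node_state([], masters))
print(node_state(["30001"], masters))
print(node_state(["30001", "30002"], masters))
print(node_state(["30001", "30002", "30007"], masters))
```

With four masters, two reports are exactly half and therefore not enough; the node is only confirmed offline once three masters agree.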

Failover

Once a master node is confirmed offline by the cluster, failover can begin. The failover process is as follows:

  1. One slave node is selected from all the slave nodes of the offline master node (for the selection method, see "New master node election principle" below);
  2. The selected slave node stops replicating (the equivalent of SLAVEOF NO ONE) and changes from a slave node to a master node. The data set it obtained through earlier synchronization is not discarded;
  3. The new master node revokes all slot assignments of the offline master node and assigns these slots to itself;
  4. The new master node broadcasts a PONG message to the cluster, which tells the other nodes that it has changed from a slave node to a master node and has taken over the slots originally handled by the offline node;
  5. The new master node begins to process command requests for those slots, and the failover process is complete.
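The effect of steps 2 and 3 on the cluster's bookkeeping can be sketched with two dictionaries. This is a toy model, assuming that slot ownership is a mapping from slot ranges to node names; it is not how Redis stores this internally, and promote_replica is a hypothetical helper.

```python
def promote_replica(slot_owner, roles, failed_master, replica):
    """Return updated copies of the slot map and role map after failover.

    The replica becomes a master and takes over every slot range
    that was assigned to the failed master.
    """
    new_roles = dict(roles)
    new_roles[failed_master] = "offline"
    new_roles[replica] = "master"
    new_owner = {
        slot_range: (replica if owner == failed_master else owner)
        for slot_range, owner in slot_owner.items()
    }
    return new_owner, new_roles


# Part of the layout from this article: 30004 replicates 30003.
slot_owner = {(6795, 10922): "30002", (12256, 16383): "30003"}
roles = {"30002": "master", "30003": "master", "30004": "slave"}

new_owner, new_roles = promote_replica(slot_owner, roles, "30003", "30004")
print(new_owner)
print(new_roles)
```

The inputs are left unchanged and new dictionaries are returned, which makes the before and after states easy to compare.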

New master node election principle

The new master node is elected as follows:

  1. The cluster maintains an epoch, an auto-incrementing counter with an initial value of 0;
  2. Each master node has one vote per epoch, and it votes for the first slave node that asks for its vote;
  3. When a slave node finds that the master node it is replicating has been confirmed offline, it broadcasts a message to the cluster asking all master nodes with voting rights to vote for it;
  4. If a master node with voting rights has not yet voted for another node, it replies to the first slave node that asked, indicating that it votes for that slave node;
  5. When a slave node receives votes from more than half of the master nodes in the cluster, it is elected as the new master node.

This completes the election of the new master node.
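The voting rules above can be simulated in a few lines. This is a simplified sketch of a single epoch, assuming we know the order in which vote requests reach each master; the replica names and the function run_election are hypothetical, and real elections involve timing details that are not modeled here.

```python
from collections import Counter


def run_election(requests_seen_by_master):
    """Simulate one voting round.

    requests_seen_by_master maps each master to the list of slave
    nodes that asked it for a vote, in arrival order. A master with
    an empty list (for example the offline one) casts no vote.
    Returns the winning slave node, or None if nobody has a majority.
    """
    votes = Counter()
    for requests in requests_seen_by_master.values():
        if requests:
            votes[requests[0]] += 1  # vote for the first request only
    needed = len(requests_seen_by_master) // 2 + 1
    for replica, count in votes.items():
        if count >= needed:
            return replica
    return None


# Four masters; 30003 is offline and two of its slaves compete.
clear_win = {
    "30001": ["replica-a", "replica-b"],
    "30002": ["replica-a", "replica-b"],
    "30007": ["replica-a"],
    "30003": [],
}
split_vote = {
    "30001": ["replica-a"],
    "30002": ["replica-b"],
    "30007": ["replica-a"],
    "30003": [],
}

print(run_election(clear_win))
print(run_election(split_vote))
```

In the second case no slave node reaches three of four votes, so this round produces no winner.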

Summary

This article started from a dynamically added master node and used the reshard command to reallocate slots to it. It then described the slot location algorithm and how to balance load with rebalance, demonstrated how to operate a Redis cluster from program code, and finally walked through the whole process of fault discovery, failover and new master node election. I hope it helps you understand Redis cluster.

Keywords: Database Redis Cache

Added by snowman2344 on Thu, 20 Jan 2022 00:30:42 +0200