Redis source code reading: master-slave replication and the Sentinel mechanism

A single Redis node is a single point of failure. To address this, slave nodes are usually configured for Redis, and Sentinel is used to monitor whether the master node is alive. If the master node goes down, a slave node can take over and continue to provide the caching service.

1, Master-slave replication

Master-slave replication is generally used to separate reads from writes: the master node handles write operations and the slave nodes handle read operations. It suits workloads with many reads and few writes.

Field resolution:

runId: each Redis node generates a unique 40-character random ID when it starts. The runId changes every time Redis is restarted.
offset: both the master node and the slave node maintain their own replication offset. When the master node processes a write command, offset = offset + the byte length of the command. After receiving a command sent by the master node, the slave node also increases its own offset and reports it to the master node. The master node therefore holds both its own offset and each slave node's offset, and judges whether master and slave data are consistent by comparing them.
repl_backlog_size: the size of the replication backlog buffer, a fixed-length first-in-first-out queue kept on the master node. The default size is 1MB.
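The relationship between the offset and the fixed-length backlog can be sketched as follows. This is a minimal illustration, not the Redis implementation: the `backlog` struct, `BACKLOG_SIZE`, and the function names are hypothetical, and the buffer is kept tiny so that the wrap-around is easy to see.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BACKLOG_SIZE 16 /* tiny on purpose; the real default is 1MB */

/* Hypothetical, simplified stand-in for the master's backlog state. */
typedef struct {
    char buf[BACKLOG_SIZE]; /* circular buffer holding the newest bytes */
    size_t idx;             /* next write position inside buf */
    size_t histlen;         /* number of valid bytes currently stored */
    long long offset;       /* total bytes ever written (replication offset) */
} backlog;

void backlog_init(backlog *b) {
    memset(b, 0, sizeof(*b));
}

/* Append len bytes: the offset always grows, the oldest bytes are overwritten. */
void backlog_feed(backlog *b, const char *data, size_t len) {
    assert(b != NULL && data != NULL);
    for (size_t i = 0; i < len; i++) {
        b->buf[b->idx] = data[i];
        b->idx = (b->idx + 1) % BACKLOG_SIZE;
        if (b->histlen < BACKLOG_SIZE) b->histlen++;
    }
    b->offset += (long long)len;
}

/* Offset of the oldest byte that is still held in the backlog. */
long long backlog_first_offset(const backlog *b) {
    return b->offset - (long long)b->histlen;
}
```

After 20 bytes have been fed into this 16-byte buffer, the offset is 20 but only the bytes from offset 4 onward are still available, which is why a slave that falls too far behind can no longer be served from the backlog.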

A. Replication process

(1) The slave node executes slaveof [masterIP] [masterPort], and actively connects to the master server to request data synchronization.

The slaveof command is handled by replicaofCommand:

void replicaofCommand(client *c) {
    // "slaveof no one" cancels replication
    if (!strcasecmp(c->argv[1]->ptr,"no") &&
        !strcasecmp(c->argv[2]->ptr,"one")) {
        // (excerpt: the code that stops replication is omitted)
    } else {
        // Record the IP address and port of the master server
        // (excerpt: ip and port are parsed from c->argv, omitted here)
        server.masterhost = sdsnew(ip);
        server.masterport = port;
        server.repl_state = REPL_STATE_CONNECT;
    }
    addReply(c,shared.ok);
}

(2) A scheduled task in the slave node discovers the master node information and establishes a socket connection with the master node.

The replicaofCommand function only records the IP address and port of the master server; it does not initiate a connection to the master server.

Note that connection establishment is asynchronous: the master-slave replication work is carried out in the time-event handler serverCron:

// Perform master-slave replication related operations in a cycle of one second
run_with_period(1000) replicationCron();

In replicationCron, the slave server initiates the connection request to the master server:

if (server.repl_state == REPL_STATE_CONNECT) {
    // connectWithMaster starts the connection and creates the corresponding file event
    if (connectWithMaster() == C_OK) {
        serverLog(LL_NOTICE,"MASTER <-> REPLICA sync started");
        server.repl_state = REPL_STATE_CONNECTING;
    }
}
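The split between "record the target in the command handler" and "connect later from the periodic task" can be sketched as a small state machine. This is only an illustration under stated assumptions: `replica`, `replica_set_master`, `replica_cron`, and `try_connect` are hypothetical names, and `try_connect` is a stub that always succeeds instead of opening a real non-blocking socket.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { STATE_NONE, STATE_CONNECT, STATE_CONNECTING } repl_state;

/* Hypothetical, simplified replica-side state. */
typedef struct {
    char masterhost[64];
    int masterport;
    repl_state state;
} replica;

/* Command handler: only records the target, performs no network I/O. */
void replica_set_master(replica *r, const char *host, int port) {
    assert(r != NULL && host != NULL);
    snprintf(r->masterhost, sizeof(r->masterhost), "%s", host);
    r->masterport = port;
    r->state = STATE_CONNECT;
}

/* Stub for a non-blocking connect; returns 0 on success (assumption). */
int try_connect(const replica *r) {
    (void)r;
    return 0;
}

/* Periodic task: notices the pending request and starts the connection. */
void replica_cron(replica *r) {
    if (r->state == STATE_CONNECT && try_connect(r) == 0)
        r->state = STATE_CONNECTING;
}
```

Keeping the handler free of I/O means the command can reply OK immediately, while the potentially slow connection attempt is retried by the once-per-second cron if it fails.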

(3) The slave node sends a PING and the master node replies with PONG, confirming that both sides can communicate. In master-slave replication this is the ordinary PING command. It should not be confused with the similarly named heartbeat packets of Redis Cluster, described next.

Ping packet: in Redis Cluster, the ping packet consists of a packet header (the type field is CLUSTERMSG_TYPE_PING (0)) and multiple gossip sections (recorded status information about other nodes, including node name, IP address, status, listening address, etc.). Through these heartbeat packets, each node in a Redis Cluster learns the current status of the other nodes and saves it in its own state.

Pong packet: the format of the pong packet is the same as that of the ping packet, except that the type field in the header is CLUSTERMSG_TYPE_PONG (1).

Note that in a cluster the pong packet is sent as the reply to both ping and meet packets. After a master-slave switchover, the new master node also sends a pong packet directly to all nodes in the cluster to announce the role change.

(4) After the connection is established, the slave node sends the psync command to the master node to request data synchronization, and the master node sends its data to the slave node (data synchronization).

The entry function of the master server for processing psync commands is syncCommand.

[Since Redis 2.8, the psync [runId] [offset] command is used; it supports both full and partial replication.

Redis 4.0 further optimized master-slave replication and introduced the psync2 protocol.]

The advantage of psync2 is that after a Redis master-slave switchover a full resynchronization is no longer needed. A partial resynchronization is enough, somewhat like replaying a binlog.

(5) Once the master node has synchronized its current data to the slave node, the initial replication is complete. From then on, the master node keeps sending write commands to the slave node to keep master and slave data consistent.

Every time the master server receives a write command, it broadcasts the command to all slave servers and records it in the replication backlog buffer. The function that broadcasts commands to the slave servers is replicationFeedSlaves. The logic is as follows:

void replicationFeedSlaves(list *slaves, int dictid, robj **argv, int argc) {
    // If the target database differs from the last selected one, propagate a SELECT command first
    if (server.slaveseldb != dictid) {
        // Add the SELECT command to the replication backlog
        if (server.repl_backlog)
            feedReplicationBacklogWithObject(selectcmd);
        // Send the SELECT command to all slave servers
        while((ln = listNext(&li))) {
            addReply(slave,selectcmd);
        }
    }
    server.slaveseldb = dictid;
    if (server.repl_backlog) {
        // Append the current command to the replication backlog
    }

    while((ln = listNext(&li))) {
        // Propagate the command to all slave servers
    }
}

B. Full replication

① The slave node sends the PSYNC ? -1 command (on the first replication the master node's runId is unknown, so it is sent as ?, and because nothing has been replicated yet, offset = -1).

② The master node sees that this is the slave node's first replication and replies with FULLRESYNC {runId} {offset}, where runId is the master node's runId and offset is the master node's current offset.

③ The slave node receives the master node information and saves it in its master info.

④ After sending FULLRESYNC, the master node runs bgsave to generate an RDB file (data persistence).

⑤ The master node sends the RDB file to the slave node. Write commands that the master node executes from the start of bgsave until the slave node has finished loading the data are kept in a buffer.

⑥ The slave node clears its own database.

⑦ The slave node loads the RDB file and saves the data into its own database.

⑧ The master node sends the buffered write commands to the slave node.
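The first two steps of the exchange, seen from the slave node, can be sketched like this. It is a simplified illustration rather than the Redis code: `master_info`, `build_first_psync`, and `parse_fullresync` are hypothetical names, and the request is written as a plain inline command for readability.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RUN_ID_LEN 40

/* Hypothetical record of what the slave remembers about its master. */
typedef struct {
    char runid[RUN_ID_LEN + 1];
    long long offset;
} master_info;

/* First replication: the run ID is unknown ("?") and the offset is -1. */
int build_first_psync(char *out, size_t outlen) {
    return snprintf(out, outlen, "PSYNC ? -1\r\n");
}

/* Parse "+FULLRESYNC <runid> <offset>". Returns 1 on success, 0 otherwise. */
int parse_fullresync(const char *reply, master_info *info) {
    char runid[RUN_ID_LEN + 1];
    long long offset;

    assert(reply != NULL && info != NULL);
    if (sscanf(reply, "+FULLRESYNC %40s %lld", runid, &offset) != 2)
        return 0;
    strcpy(info->runid, runid);
    info->offset = offset;
    return 1;
}
```

The saved run ID and offset are exactly the two values the slave later sends back as psync parameters when it asks for a partial resynchronization.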

C. Incremental (partial) replication

① Partial replication is an optimization Redis introduced because full replication is so expensive. It is implemented with the psync [runId] [offset] command.

If something goes wrong while the slave node is replicating from the master node, such as a brief network interruption or lost commands, the slave node asks the master node to resend the missing command data, and the master node sends that part of the data directly from its replication backlog buffer.

In this way master-slave consistency is maintained, and the resent data is generally far smaller than the full data set.

② While the master-slave connection is interrupted, the master node still serves commands, but they cannot be sent to the slave node because the replication connection is down. The replication backlog buffer in the master node nevertheless keeps the write command data of the most recent period.

③ After the master-slave connection is restored, the slave node still has its previously saved replication offset and the master node's run ID, so it sends them to the master node as the psync parameters and requests partial replication.

④ After receiving the psync command, the master node first checks whether the runId parameter matches its own. If it matches, the slave node was previously replicating from this master node.

It then looks in the replication backlog buffer using the offset parameter. If the data after offset is still there, it sends the +CONTINUE reply to the slave node, indicating that partial replication can proceed. Because the buffer size is fixed, if the requested data has already been overwritten, full replication is performed instead.

⑤ The master node sends the data in the replication backlog buffer to the slave node starting from the offset, so that master-slave replication returns to its normal state.

int masterTryPartialResynchronization(client *c) {
    // (excerpt: parsing of master_replid and psync_offset from c->argv is omitted)
    // Check whether the replication ID matches and whether the requested offset is valid
    if (strcasecmp(master_replid, server.replid) &&
       (strcasecmp(master_replid, server.replid2) ||
        psync_offset > server.second_replid_offset))
    {
        goto need_full_resync;
    }

    // Check whether the requested offset is still covered by the replication backlog
    if (!server.repl_backlog ||
        psync_offset < server.repl_backlog_off ||
        psync_offset > (server.repl_backlog_off +
               server.repl_backlog_histlen))
    {
        goto need_full_resync;
    }
    // Partial resynchronization: mark the client as a slave server
    c->flags |= CLIENT_SLAVE;
    c->replstate = SLAVE_STATE_ONLINE;
    c->repl_ack_time = server.unixtime;
    // Add the client to the list of slaves
    listAddNodeTail(server.slaves,c);

    // Reply with +CONTINUE, in the form the slave server's capabilities allow
    if (c->slave_capa & SLAVE_CAPA_PSYNC2) {
        buflen = snprintf(buf,sizeof(buf),"+CONTINUE %s\r\n", server.replid);
    } else {
        buflen = snprintf(buf,sizeof(buf),"+CONTINUE\r\n");
    }
    if (write(c->fd,buf,buflen) != buflen) {
        // (excerpt: handling of a failed write is omitted)
    }
    // Send the commands stored in the replication backlog to the client
    psync_len = addReplyReplicationBacklog(c,psync_offset);
    // Update the number of good slave servers
    refreshGoodSlavesCount();
    return C_OK; /* The caller can return, no full resync needed. */

need_full_resync:
    return C_ERR;
}
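The two checks in the excerpt above can be isolated into a small standalone function for experimentation. This is a simplified sketch, not the Redis code: `master_state` and `can_partial_resync` are hypothetical names, and the second replication ID (`replid2`) that psync2 also accepts is deliberately left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified view of the master's replication state. */
typedef struct {
    const char *replid;        /* this master's replication ID */
    long long backlog_off;     /* offset of the first byte in the backlog */
    long long backlog_histlen; /* number of bytes held in the backlog */
    int has_backlog;           /* non-zero if a backlog exists */
} master_state;

/* Returns 1 if a partial resync can be served, 0 if a full resync is needed. */
int can_partial_resync(const master_state *m, const char *req_replid,
                       long long psync_offset) {
    assert(m != NULL && req_replid != NULL);
    /* The slave must have been replicating from this very master. */
    if (strcmp(req_replid, m->replid) != 0) return 0;
    /* The requested offset must still be covered by the backlog. */
    if (!m->has_backlog) return 0;
    if (psync_offset < m->backlog_off) return 0;
    if (psync_offset > m->backlog_off + m->backlog_histlen) return 0;
    return 1;
}
```

With a backlog starting at offset 100 and holding 50 bytes, requests for offsets 100 through 150 succeed, while 99 (already overwritten) and 151 (beyond what the master has produced) fall back to a full resync.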

2, Sentinel mechanism

Once the master node goes down, a slave node must be promoted to master, the master address used by the application must be updated, and all remaining slave nodes must be told to replicate from the new master node. Done by hand, the whole process requires manual intervention, so the Sentinel mechanism is used to automate it.

Sentinel is the high-availability solution for Redis. When the Redis master fails, it automatically selects a Redis slave, switches it to master, and service continues.

The minimum configuration of Redis Sentinel is one master and one slave. The system can perform the following four tasks:

Monitoring: constantly check whether the master server and slave server are running normally.
Notification: when a monitored Redis server has a problem, Sentinel notifies the administrator or other applications through an API or a script.
Automatic failover: when the master node cannot work normally, Sentinel starts an automatic failover. It promotes one of the failed master node's slave nodes to be the new master node and points the other slave nodes at the new master node.
Configuration provider: in Redis Sentinel mode, the client application connects to the Sentinel node set during initialization to obtain the information of the master node.

A. Typical configuration file

# Monitor a Redis master named mymaster at 127.0.0.1:6379, with a quorum of 2
sentinel monitor mymaster 127.0.0.1 6379 2

# If Sentinel receives no valid PING reply from mymaster within 60s, mymaster is considered down
sentinel down-after-milliseconds mymaster 60000
# The failover timeout is 180s
sentinel failover-timeout mymaster 180000

# After the switchover, at most 1 Redis slave at a time resynchronizes with the new Redis master
# That is, the slaves synchronize in turn: the next slave starts only after the previous one has finished
sentinel parallel-syncs mymaster 1

# Monitor a Redis master named resque at 192.168.1.3:6380, with a quorum of 4
# quorum: ① the number of Sentinels that must agree before the master is marked objectively offline; ② a lower bound on the votes a Sentinel needs to be elected to perform the switchover (a majority of Sentinels is also required)
sentinel monitor resque 192.168.1.3 6380 4
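The second role of quorum in the comments above can be made concrete with a little arithmetic. The sketch below assumes, based on my reading of the Sentinel election logic, that the elected Sentinel needs both a majority of all Sentinels and at least quorum votes. The function name `failover_votes_needed` is hypothetical.

```c
#include <assert.h>

/* Votes a Sentinel needs to lead a failover, under the assumption that
 * it must reach both a majority of all Sentinels and the quorum. */
int failover_votes_needed(int total_sentinels, int quorum) {
    int majority;

    assert(total_sentinels > 0 && quorum > 0);
    majority = total_sentinels / 2 + 1;
    return quorum > majority ? quorum : majority;
}
```

With 5 Sentinels and the quorum of 2 used for mymaster, two Sentinels are enough to declare the master objectively offline, but three votes are still needed before a switchover can start. With the quorum of 4 used for resque, four votes are needed.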

B. Working principle

① Each Sentinel node performs the following tasks periodically: once per second it sends a PING command to every master server, slave server, and other Sentinel instance it knows of.

② If the time since an instance's last valid reply to the PING command exceeds the value of down-after-milliseconds, Sentinel marks that instance as subjectively offline (SDOWN).

③ If a master is marked as subjectively offline, all Sentinel nodes monitoring that master confirm, once per second, whether the master has indeed entered the subjectively offline state.

④ If a master is marked as subjectively offline and a sufficient number of Sentinels (at least the number specified in the configuration file) agree with this judgment within the specified time range, the master is marked as objectively offline (ODOWN).

⑤ Normally, each Sentinel sends an INFO command to every master and slave it knows of once every 10 seconds.

When a master is marked as objectively offline, the frequency at which Sentinel sends INFO commands to the slaves of that master changes from once every 10 seconds to once per second.

⑥ The Sentinels negotiate the status of the offline master. If it is objectively offline, they vote to select a new master node automatically and point the remaining slave nodes at the new master node for data replication.

⑦ When there are no longer enough Sentinels agreeing that the master is offline, the objectively offline status of the master is removed.

When the master again returns a valid reply to Sentinel's PING command, the subjectively offline status of the master server is removed.

C. Code flow

(1) Startup: redis-server

redis-server /path/to/sentinel.conf --sentinel

After executing the command, the main code logic executed is as follows:

main(){
    ...
    // Check whether the server is started in Sentinel mode
    server.sentinel_mode = checkForSentinelMode(argc,argv);
    ...
    if (server.sentinel_mode) {
        initSentinelConfig();        // Set the listening port to 26379
        // Replace the command table: Sentinel can only execute a limited set of
        // commands, such as ping, sentinel, subscribe, publish and info.
        // This function also initializes the Sentinel state
        initSentinel();
    }
    ...
    sentinelHandleConfiguration();        // Parse the configuration file for initialization
    ...
    sentinelIsRunning();                // Generate a random 40-character Sentinel ID and print the startup log
    ...
}

The main function only performs initialization. The actual work of establishing the command connection and the message connection is done in the scheduled task serverCron:

serverCron(){
    // ...

    // In Sentinel mode: establish connections, send heartbeat packets regularly and collect information
    if (server.sentinel_mode) sentinelTimer();

    // ...
}

The sentinelTimer function mainly does the following:

Establish the command connection and the message connection. Once the message connection is established, Sentinel subscribes to the __sentinel__:hello channel of the Redis service.
On the command connection, send an INFO command every 10s to collect information, send a PING command every 1s to detect liveness, and publish a message every 2s. The message format is: sentinel_ip,sentinel_port,sentinel_runid,current_epoch,master_name,master_ip,master_port,master_config_epoch. The fields are, in order, the IP of the Sentinel, the port of the Sentinel, the ID of the Sentinel (the 40-character random string mentioned above), the current epoch (used for election and master-slave switching), the name of the Redis master, the IP of the Redis master, the port of the Redis master, and the configuration epoch of the Redis master (used for election and master-slave switching).
Check whether the service is in the subjectively offline state.
Check whether the service is in the objectively offline state and whether a master-slave switchover is required.
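The comma-separated hello message described above can be built and parsed with a few lines of C. This is a sketch for illustration, not the Sentinel code: the `hello_msg` struct and the `hello_format` and `hello_parse` functions are hypothetical, and the field sizes are arbitrary assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the eight fields of a hello message. */
typedef struct {
    char sentinel_ip[64];
    int sentinel_port;
    char sentinel_runid[41];
    unsigned long long current_epoch;
    char master_name[64];
    char master_ip[64];
    int master_port;
    unsigned long long master_config_epoch;
} hello_msg;

/* Serialize the fields in the order given by the message format. */
int hello_format(const hello_msg *h, char *out, size_t outlen) {
    assert(h != NULL && out != NULL);
    return snprintf(out, outlen, "%s,%d,%s,%llu,%s,%s,%d,%llu",
                    h->sentinel_ip, h->sentinel_port, h->sentinel_runid,
                    h->current_epoch, h->master_name, h->master_ip,
                    h->master_port, h->master_config_epoch);
}

/* Parse a payload back into the struct. Returns 1 if all 8 fields were read. */
int hello_parse(const char *payload, hello_msg *h) {
    int n;

    assert(payload != NULL && h != NULL);
    n = sscanf(payload, "%63[^,],%d,%40[^,],%llu,%63[^,],%63[^,],%d,%llu",
               h->sentinel_ip, &h->sentinel_port, h->sentinel_runid,
               &h->current_epoch, h->master_name, h->master_ip,
               &h->master_port, &h->master_config_epoch);
    return n == 8;
}
```

Every Sentinel subscribed to the channel can decode such a payload to discover the other Sentinels and to compare epochs.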

(2) If a Redis master is judged to be objectively offline, a master-slave switchover must be started.

The rules for selecting the new master are as follows:

If the slave is in the subjectively offline state, it cannot be selected.
If the slave has not given a valid reply to the PING command within 5s, or has been disconnected from the master server for a long time, it cannot be selected.
If the slave priority is 0, it cannot be selected (the slave priority can be specified in the configuration file as a positive integer; the smaller the value, the higher the priority, and 0 means the slave can never be selected as the master server).
Among the remaining slaves, the one with the highest priority is selected. If priorities are equal, the one with the larger replication offset is selected. Otherwise, the slave whose run ID sorts first in lexicographic order is selected.
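The selection rules above amount to a filter followed by a three-level comparison, which can be sketched with qsort. This is an illustration rather than the Sentinel implementation: `candidate`, `eligible`, `compare_candidates`, and `select_new_master` are hypothetical names, and the rule about long disconnection from the master is left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified view of one slave as seen by Sentinel. */
typedef struct {
    const char *runid;
    int priority;                  /* 0 means "never promote" */
    long long repl_offset;
    int sdown;                     /* subjectively offline flag */
    long long ms_since_last_reply; /* time since the last valid PING reply */
} candidate;

/* Filter: drop slaves that are down, silent for over 5s, or priority 0. */
int eligible(const candidate *c) {
    return !c->sdown && c->priority != 0 && c->ms_since_last_reply <= 5000;
}

/* Order: lower priority value first, then larger offset, then smaller run ID. */
int compare_candidates(const void *a, const void *b) {
    const candidate *x = a, *y = b;
    if (x->priority != y->priority)
        return x->priority < y->priority ? -1 : 1;
    if (x->repl_offset != y->repl_offset)
        return x->repl_offset > y->repl_offset ? -1 : 1;
    return strcmp(x->runid, y->runid);
}

/* Returns the best candidate, or NULL if none is eligible.
 * Note: the array is compacted and reordered in place. */
const candidate *select_new_master(candidate *list, size_t n) {
    size_t kept = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (eligible(&list[i])) list[kept++] = list[i];
    if (kept == 0) return NULL;
    qsort(list, kept, sizeof(candidate), compare_candidates);
    return &list[0];
}
```

Filtering before sorting keeps the comparison function free of special cases, so each ordering rule maps to exactly one branch.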

At this point the selected Redis slave must be switched to Redis master, so Sentinel sends it the following commands:

MULTI              //Start a transaction
SLAVEOF NO ONE     //Turn off the replication function of the slave server and convert it into a master server
CONFIG REWRITE     //Rewrite the Redis configuration file (the original configuration is rewritten according to the current running configuration)
CLIENT KILL TYPE normal //Close the clients connected to the service (after being closed, a client reconnects and obtains the address of the Redis master again)
EXEC               //Execute the transaction

Sentinel then sends the commands that switch the master server to each of the other slave servers in turn:

MULTI              //Start a transaction
SLAVEOF IP PORT    //Make the server replicate from the new master server
CONFIG REWRITE     //Rewrite the Redis configuration file (the original configuration is rewritten according to the current running configuration)
CLIENT KILL TYPE normal //Close the clients connected to the service (after being closed, a client reconnects and obtains the address of the Redis master again)
EXEC               //Execute the transaction

Throughout this process, Sentinel moves through the corresponding failover states.

Keywords: Redis

Added by joukar on Fri, 28 Jan 2022 22:05:20 +0200
