Redis distributed lock redison source code analysis

Distributed lock

When designing distributed locks, we should consider at least some conditions that distributed locks should meet, and how to design distributed locks efficiently

1. Mutually exclusive

Under the condition of distributed high concurrency, we most need to ensure that only one thread can obtain a lock at the same time, which is the most basic point.

2. Deadlock prevention

Under the condition of distributed high concurrency, for example, when a thread obtains a lock, it has not had time to release the lock. Because of system failure or other reasons, it cannot execute the command to release the lock, so that other threads cannot obtain the lock, resulting in deadlock.

Therefore, it is very necessary to set the effective time of the lock to ensure that after the system fails, it can actively release the lock within a certain time to avoid deadlock.

3. Performance

For shared resources with a large number of accesses, we need to consider reducing the lock waiting time to avoid causing a large number of thread blocking.

Therefore, two points need to be considered in the design of lock.

1. The granularity of the lock should be as small as possible. For example, if you want to reduce inventory through a lock, you can set the name of the lock to the ID of the commodity instead of any name. In this way, the lock is only valid for the current commodity, and the granularity of the lock is small.

2. The scope of the lock should be as small as possible. For example, if you can solve the problem by locking 2 lines of code, don't lock 10 lines of code.

4. Reentry

We know that ReentrantLock is a reentrant lock. Its feature is that the same thread can repeatedly get the lock of the same resource. Reentry lock is very conducive to the efficient utilization of resources. There will be a demonstration about this later.

For the above Redisson can be well satisfied, let's analyze it.

Redisson

On the nety framework based on NIO, redison makes full use of a series of advantages provided by Redis key database. Based on the common interfaces in the Java Utility Kit, redison provides users with a series of common tool classes with distributed characteristics, and supports various deployment architectures such as Redis single instance, Redis sentry, Redis Cluster, Redis master slave and so on. Redisson uses Lua script to encapsulate multiple non atomic commands and send them to the server to ensure the atomicity of the operation

Redisson is a relatively perfect implementation of redis distributed lock at present. More details can be learned through redisson

Redisson easy to use

RLock lock = redisson.getLock("lockName");
try{
    //You can set the timeout
    lock.lock();
    //Business logic
} finally {
    lock.unlock();
}

Analysis of Redisson plus unlocking process

Detailed flow chart

Take a look at the internal lock operation process of Redisson through a figure. Its internal implementation mainly uses three technology stacks (Lua script + Semaphore + asynchronous thread). Note: the Redisson version used by the author is 3.12.1

Locking operation

The thread acquires the lock successfully: execute the lua script and save the data to the redis database.

The thread goes to acquire the lock, but the acquisition fails: it keeps trying to acquire the lock through the while loop. After successful acquisition, execute the lua script and save the data to the redis database.

Suppose multiple clients compete for lock resources on lockName with key at the same time

Client 1

RLock lock1 = redisson.getLock("lockName");
lock1.lock();
System.out.println("Client 1 is locked successfully!");
lock1.lock();
System.out.println("Client 1 has repeatedly been locked successfully!");
lock1.unlock();
lock1.unlock();

Client 2

RLock lock2 = redisson.getLock("lockName");
lock2.lock();

Lock source code

private void lock(long leaseTime, TimeUnit unit, boolean interruptibly) throws InterruptedException {
        long threadId = Thread.currentThread().getId();
        Long ttl = tryAcquire(leaseTime, unit, threadId);
        // If ttl is empty, the lock is successful
        if (ttl == null) {
            return;
        }
        // Use redis - > subscribe to subscribe to the channel to listen for callback processing
        RFuture<RedissonLockEntry> future = subscribe(threadId);
        ...
        // If the lock acquisition fails, use the while loop to continuously obtain the remaining expiration time ttl of the lock, and then specify the time of Park as ttl to judge continuously
        try {
            while (true) {
                // If the lock still exists, the remaining expiration time ttl of the current lock is returned
                ttl = tryAcquire(leaseTime, unit, threadId);
                // If other clients release the lock, the current client competes for the lock successfully
                if (ttl == null) {
                    break;
                }
                // The current client thread is blocked for the specified ttl
                if (ttl >= 0) {
                    //Use semaphore - > locksupport Park() method
                    future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                    }
                } 
            }
        } finally {
            unsubscribe(future, threadId);
        }
    }

Lock lua script

Explain a few parameters in Lua script

KEYS[1] the name of the locked key, such as RLOCK lock = reisson getLock(“lockName”); Then KEYS[1] is lockname

ARGV[1] indicates the expiration time of the lock. If it is not set, it defaults to 30 seconds

ARGV[2] indicates the ID of the locked client. The format is uuid + ":" + threadid, for example: 11bb52bc-a764-4649-8b46-a61513d7fe44:1

<T> RFuture<T> tryLockInnerAsync(long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
    internalLockLeaseTime = unit.toMillis(leaseTime);
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, command,
                 //If the lock does not exist, lock it and save the lock resource in the hash
              "if (redis.call('exists', KEYS[1]) == 0) then " +
                  "redis.call('hset', KEYS[1], ARGV[2], 1); " +
                  "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                  "return nil; " +
              "end; " +
              //If the lock exists and locks the same thread repeatedly, increase the number of lock reentry of the thread by 1
              "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                  "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                  "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                  "return nil; " +
              "end; " +
              //Other clients have competed for the lock. Return the remaining expiration time of the current lock
              "return redis.call('pttl', KEYS[1]);",
                Collections.<Object>singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
    }

Lock acquisition process

Branch 1: lock does not exist

Use the "exists lockName" command to determine whether the lock exists. If it does not exist, use the "hset KEYS[1] ARGV[2] 1" command to lock. Assuming that the lock of client 1 is successful, use the hash data structure to store the lock resources of client 1. The structure is:

"lockName":{
    //Client ID: number of reentries
    "11bb52bc-a764-4649-8b46-a61513d7fe44:1":1
}

Then use "pexpire KEYS[1] ARGV[1]", i.e. "pexpire lockName 30000" to set the expiration time of lockName lock, and then return null to indicate that locking is successful!

Branch 2: the lock exists and locks the same client repeatedly

The client can obtain the lock repeatedly in the same thread operation. Use the command "hincrby KEYS[1] ARGV[2] 1" to increase the reentry times of the same client by 1, and reset the expiration time. Returning null indicates that the lock is successful!

Client 1 repeatedly locks successfully. At this time, the hash structure is as follows:

"lockName":{
    //Client ID: the number of reentries is increased by 1
    "11bb52bc-a764-4649-8b46-a61513d7fe44:1":2
}

Branch 3: client lock competition

After the client fails to obtain the lock, the current client will subscribe to the channel with the name "reisson_lock_channel: {lockname}" to listen to the callback processing. When the client releases the lock, it will be in reisson_ lock__ Channel: publish unlock on channel of {lockname}_ Unlock message for message

If another client 2 also tries to lock the lockName at this time, exists judges that the lockName already exists and the lockName key in the hash already exists. The lock "11bb52bc-a764-4649-8b46-a61513d7fe44:1" of client 1, so client 2 cannot lock. What should I do?

The client thread will use the "pttl KEYS[1]" command to return the remaining expiration time ttl of the current lock, and then use Semaphore in the J.U.C framework to call locksupport according to the returned ttl time Parknanos (ttl) to block itself. After the specified waiting time, continue to try to lock and cycle until it succeeds

The code snippet of lock() method in RedissonLock class is as follows:

while (true) {
    //Try locking
    ttl = tryAcquire(leaseTime, unit, threadId);
    //Get success
    if (ttl == null) {
        break;
    }
    
    if (ttl >= 0) {
        // Call Semaphore's tryAcquire() method
        future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
    }
}

The code snippet of tryACquire in Semaphore class is as follows:

//nanosTimeout time is the waiting time ttl of the client
LockSupport.parkNanos(this, nanosTimeout);

Scene example: A is going to the toilet, B finds that a is already in the toilet, so B asks when a can come out. A says it's OK to wait 10 minutes. At this time, B uses his watch to start semaphore. After 10 minutes, he finds that a is still inside, so B continues to ask a how long it will take. A is embarrassed to wait for me for another 5 minutes, and B starts timing and waiting again...

Reentrant locking mechanism

The reason why Redisson can implement the reentrant locking mechanism is related to two points:

1,Redis The data type of the storage lock is Hash type
2,Hash Data type key The value contains the current thread information.

The following is the data stored in redis

Here, the surface data type is Hash type, which is equivalent to our java < key, < key1, value > > type. Here, key refers to 'reisson'

Its validity period is 9 seconds. Let's look again. The key1 value of limen is 078e44a3-5f95-4e24-b6aa-80684655a15a:45. Its composition is:

guid + ID of the current thread. The latter value is related to reentrant locking.

Illustration

The above figure means the mechanism of reentrant lock. Its biggest advantage is that the same thread does not need to wait for the lock, but can directly operate accordingly.

WatchDog delay mechanism

Why use WatchDog?

There is a leaseTime option in the lock acquisition api provided by Redisson. When the value is - 1, it indicates that the successful client can always hold the lock. Other client threads will wait until the lock is released. We know that when setting a key in Redis, we often need to specify expireTime to prevent it from occupying memory space for a long time. In this scenario, the lock will eventually expire, so before the key expires, a mechanism (WatchDog) must be provided to ensure that the key remains valid

WatchDog implementation mechanism in Redisson distributed lock

  1. You can customize the expiration time. Automatic extension will be started only when the expiration time is not set (the default value is 0).
  2. No expiration time is set. When directly applying for a lock, an extended expiration time of 30s will be set by default. The expiration time of 30s will be reset every third of the extended expiration time of 10s (the period time value is the extended expiration time).
  3. In order to prevent a business from being abnormal and the task lasts for a long time, so that the lock is occupied for a long time, the parameter limit of maximum delay times is added. For example, if the delay exceeds three times, it will not be postponed.

After the client locks successfully, a watch dog background thread will be enabled. The netty time wheel HashedWheelTimer algorithm will be used to check every 10 seconds. If the client still holds the lock, the expiration time of the lock will be reset to lockWatchdogTimeout=30 seconds (default), in which delay = lockWatchdogTimeout/3

private <T> RFuture<Long> tryAcquireAsync(long leaseTime, TimeUnit unit, long threadId) {
    //Enable watchdog when leaseTime is - 1
    if (leaseTime != -1) {
        return tryLockInnerAsync(leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    }
    RFuture<Long> ttlRemainingFuture = tryLockInnerAsync(commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout(), TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    ttlRemainingFuture.onComplete((ttlRemaining, e) -> {
        if (e != null) {
            return;
        }
        // The client is locked successfully, and the operation is postponed
        if (ttlRemaining == null) {
            scheduleExpirationRenewal(threadId);
        }
    });
    return ttlRemainingFuture;
}

How does Redisson ensure that the delay operation of WatchDog is executed only once when the same client is locked repeatedly and successfully? The answer is: local cache

private void scheduleExpirationRenewal(long threadId) {
    ExpirationEntry entry = new ExpirationEntry();
    ExpirationEntry oldEntry = EXPIRATION_RENEWAL_MAP.putIfAbsent(getEntryName(), entry);
    //After the second time, you can obtain the lock without using the time wheel algorithm
    if (oldEntry != null) {
        oldEntry.addThreadId(threadId);
    } else {
        //When the first time is successful, postpone the operation
        entry.addThreadId(threadId);
        renewExpiration();
    }
}

The code snippet of renewExpiration() method in RedissonLock class is as follows:

//Deferred operation
private void renewExpiration() {
    Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) throws Exception {
            //Expiration time of Lua script deferred lock
            RFuture<Boolean> future = renewExpirationAsync(threadId);
            future.onComplete((res, e) -> {
                //Successful extension
                if (res) {
                    // Continue cycle delay operation
                    renewExpiration();
                }
            });
        }
    //Check every 10 seconds
    }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);
}

Call the renewExpirationAsync() method to set the expiration time of the lock. The Lua script is as follows:

protected RFuture<Boolean> renewExpirationAsync(long threadId) {
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                "return 1; " +
            "end; " +
            "return 0;",
        Collections.<Object>singletonList(getName()), 
        internalLockLeaseTime, getLockName(threadId));
}

Which locking methods use WatchDog?

It should be noted that not all successful lock operations will enable the WatchDog function. The WatchDog function will not be enabled until the condition of leaseTime - 1 is established.

Some locking operations provided by Redisson are listed below: Define RLOCK RLOCK = client getLock(“lockName”);

Lock acquisition method

Enable WatchDog

rlock.lock()

Enable

rlock.tryLock()

Enable

tryLock(long waitTime, TimeUnit unit)

Enable

tryLock(long waitTime, long leaseTime != -1, TImeUnit unit)

close

...

...

Note: if leaseTime is - 1, the watchDao function will not be enabled

Release lock operation

The operation of releasing the lock is relatively simple and easy to understand. There are about four steps: Delete key - > set expiration time - > delete local cache - > publish unlocking message

Unlock operation lua script

protected RFuture<Boolean> unlockInnerAsync(long threadId) {
    return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            //If it does not exist, it will directly return null
            "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +
                "return nil;" +
            "end; " +
            //Lock reentry times of current client - 1
            "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +
            //If the current client still holds the lock, reset the expiration time
            "if (counter > 0) then " +
                "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                "return 0; " +
             //The client has released the lock, deleted the key and released the unlocking message
            "else " +
                "redis.call('del', KEYS[1]); " +
                "redis.call('publish', KEYS[2], ARGV[1]); " +
                "return 1; "+
            "end; " +
            "return nil;",
            Arrays.<Object>asList(getName(), getChannelName()), LockPubSub.UNLOCK_MESSAGE, internalLockLeaseTime, getLockName(threadId));
    }

Delete local cache code

void cancelExpirationRenewal(Long threadId) {
    ExpirationEntry task = EXPIRATION_RENEWAL_MAP.get(getEntryName());
    if (task == null) {
        return;
    }
    if (threadId != null) {
        task.removeThreadId(threadId);
    }
    if (threadId == null || task.hasNoThreads()) {
        Timeout timeout = task.getTimeout();
        if (timeout != null) {
            timeout.cancel();
        }
        EXPIRATION_RENEWAL_MAP.remove(getEntryName());
    }
}

The deletion of the local cache map is executed in the asynchronous thread. After the WatchDog suspends the lock of the client, the thread information of the client is saved in the local cache map to ensure that when the same client succeeds in repeatedly obtaining the lock, the lock postponement operation is executed only once

Disadvantages of Redis distributed lock

Redis distributed lock has a defect, that is, in redis sentinel mode:

Client 1 writes a reisson lock to a master node, which will be asynchronously copied to the corresponding slave node. However, in this process, once the master node goes down, the active and standby switches, and the slave node changes from master node to slave node.

At this time, when client 2 tries to lock, it can also lock on the new master node. At this time, multiple clients will complete locking the same distributed lock.

At this time, the system will have problems in business semantics, resulting in the generation of all kinds of dirty data.

The defect is in sentinel mode or master-slave mode. If the master instance goes down, multiple clients may complete locking at the same time.

summary

So far, the detailed process analysis of Redisson plus unlocking is completed! Back to the beginning, we said that Redisson still has some minor defects. For example, under the mast slave architecture, the master-slave synchronization is usually asynchronous

There is obvious competition in this scenario (master-slave structure): 1. Client A obtains the lock from the master 2. Before the master synchronizes the lock to the slave, the Master goes down 3. slave node is promoted to master node 4. Client B has obtained the same resource and client A has obtained another one. The lock security is invalid!

The official solution is to use the redlock algorithm. If readers want to learn more about redlock, please refer to the redlock algorithm of Redis on the official website

reference resources:

Redisson implements distributed locks (1) -- principle: https://www.cnblogs.com/qdhxhz/p/11046905.html

Redisson source code analysis: https://blog.csdn.net/flyfhj/article/details/104715607

How does redisson's WatchDog take care of the house https://blog.csdn.net/ice24for/article/details/86177152

redisson's hundred lock deconstruction (Part 1): https://blog.csdn.net/ice24for/article/details/86515316

The deconstruction of redisson's hundred locks (Part 2): https://blog.csdn.net/ice24for/article/details/86527808

redis distributed locks automatically extend the expiration time: https://segmentfault.com/a/1190000037526623

Redisson solves the problem that the redis distributed lock expires and the business is not completed: https://blog.csdn.net/belongtocode/article/details/102511520

Added by kcorless on Wed, 09 Mar 2022 07:44:19 +0200