On caching and distributed locks

Large websites receive enormous traffic every day, especially at specific moments such as an e-commerce platform's shopping festival or an education platform's enrollment season. When a surge of concurrent requests hits at one point in time, it can overwhelm the servers and bring the site down. Handling high concurrency is therefore critical for a website, and caching plays an important role in it: data that rarely changes, or that is read very frequently, can be kept in a cache. Users then read it directly from the cache instead of querying the database, which greatly improves the site's throughput.

Use of cache

First, build a simple test environment, create a SpringBoot application, and write a controller:

@RestController
public class TestController {

    @Autowired
    private UserService userService;

    @GetMapping("/test")
    public List<User> test(){
        return userService.getUsers();
    }
}

Visiting http://localhost:8080/test returns all user information:

![image.png](https://img-blog.csdnimg.cn/img_convert/f25bec94add09df72e3d1b9cadd92bff.png)

We use JMeter to stress-test the application. Download it from the official website: http://jmeter.apache.org/download_jmeter.cgi

![image.png](https://img-blog.csdnimg.cn/img_convert/531baa1cd001ba4681fb13d45f7699b5.png)

Download the zip package, extract it, and double-click jmeter.bat in the bin directory to start JMeter:

![image.png](https://img-blog.csdnimg.cn/img_convert/8c7ecfd946cf42c3fda5c5de53bb8d3d.png)

Here we simulate 2000 concurrent requests within one second and observe the application's throughput:

![image.png](https://img-blog.csdnimg.cn/img_convert/41b217208ca4509714974b8f0d3cee72.png)

The measured throughput is about 421 requests per second. It is easy to see that once the data table grows large, having every request query the database will drag efficiency down sharply. We can therefore add a cache to optimize:

@RestController
public class TestController {

    // cache
    Map<String, Object> cache = new HashMap<>();

    @Autowired
    private UserService userService;

    @GetMapping("/test")
    public List<User> test() {
        // Get data from cache
        List<User> users = (List<User>) cache.get("users");
        if (users == null) {
            // Cache miss: query the database
            users = userService.getUsers();
            // Store the queried data in the cache
            cache.put("users", users);
        }
        // Cache hit: return directly
        return users;
    }
}

Here a cache is simply simulated with a HashMap, and the interface executes as follows:

![image.png](https://img-blog.csdnimg.cn/img_convert/e7bfcfe196a6104d239679dcc3fe24bf.png)

When a request arrives, the data is first read from the cache. On a hit it is returned directly; on a miss the database is queried and the result is stored in the cache, so the next request can read it from the cache. Now test the application's throughput again:

![image.png](https://img-blog.csdnimg.cn/img_convert/c16afbceab46cbec27d4e50e451510eb.png)

The throughput has clearly improved.

Local cache and distributed cache

We just used a cache to improve the application's overall performance, but the cache was defined inside the application itself. This kind of cache is called a local cache. A local cache works fine for a standalone application, but a distributed application is usually deployed as multiple replicas for high availability:

![image.png](https://img-blog.csdnimg.cn/img_convert/59b15487200efe679ac3831b758defd9.png)

Now each replica keeps its own cache. When data is modified, the corresponding cache entry must be modified too, but only that replica's cache is updated; the other replicas' caches stay stale, and the data becomes inconsistent. We therefore extract the cache into a piece of middleware that is independent of all the applications yet shared by all of them:

![image.png](https://img-blog.csdnimg.cn/img_convert/60959120e2b68232a097815d76cb9783.png)

Redis is currently the most popular caching middleware.

Integrating Redis with Spring Boot

Next, modify the application to use a Redis cache. First, pull the Redis image:

docker pull redis

Create directory structure:

mkdir -p /mydata/redis/conf
touch /mydata/redis/conf/redis.conf

Go to the /mydata/redis/conf directory and modify the redis.conf file:

appendonly yes # Persistent configuration

Create an instance of redis and start:

docker run -p 6379:6379 --name redis \
    -v /mydata/redis/data:/data \
    -v /mydata/redis/conf/redis.conf:/etc/redis/redis.conf \
    -d redis redis-server /etc/redis/redis.conf

Run the following so that Redis starts automatically whenever Docker starts:

docker update redis --restart=always

Redis is now ready. Next, introduce the Redis dependency into the project:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

Configure Redis in application.yml:

spring:
  redis:
    host: 192.168.66.10

Modify controller code:

@RestController
public class TestController {

    @Autowired
    private UserService userService;

    @Autowired
    private StringRedisTemplate redisTemplate;

    @GetMapping("/test")
    public String test() {
        // Get data from Redis
        String usersJson = redisTemplate.opsForValue().get("users");
        if (StringUtils.isEmpty(usersJson)) {
            // Cache miss: query the database
            List<User> users = userService.getUsers();
            // Convert the query result to a JSON string
            usersJson = JSON.toJSONString(users);
            // Put it into the cache
            redisTemplate.opsForValue().set("users", usersJson);
        }
        // Return the result
        return usersJson;
    }
}

Some problems in cache

Using a Redis cache does not mean you can rest easy; many problems remain to be solved. The following three are the ones caching middleware faces most often:

  1. Cache penetration
  2. Cache avalanche
  3. Cache breakdown

Cache penetration

Cache penetration means querying data that does not exist. On a cache miss the database is queried, and since the data cannot be found, nothing is written back to the cache, so every request for this nonexistent data goes straight to the database and puts pressure on it. The cache is supposed to shield the database, but if someone knows which keys can never exist in the system, they can flood it with requests for those keys and attack the database through the cache. The solution is to store the result whether the data exists or not: if the data a request asks for does not exist, still store its key (with an empty value) so that the next request is served from the cache. However, if every request uses a different key, Redis would accumulate a large number of useless keys, so these keys should be given a short, fixed expiration time and be deleted automatically when they expire.
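The idea above can be sketched with a plain in-memory map standing in for Redis (the class and method names are illustrative, not part of the article's project; a real Redis implementation would also put a short expiration time on the sentinel entry):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Minimal sketch of null-value caching: a miss for a nonexistent key stores a
// sentinel, so repeated requests for that key never reach the database again.
public class NullCachingSketch {
    private static final Object NULL_SENTINEL = new Object();
    private final Map<String, Object> cache = new HashMap<>();
    private int dbQueries = 0; // counts simulated database hits

    // Simulated database: only the key "users" exists.
    private Object queryDb(String key) {
        dbQueries++;
        return "users".equals(key) ? "[user list]" : null;
    }

    public Optional<Object> get(String key) {
        Object cached = cache.get(key);
        if (cached == null) { // cache miss: fall through to the database
            Object fromDb = queryDb(key);
            // store the sentinel even when the data does not exist;
            // in Redis this entry would get a short expiration time
            cache.put(key, fromDb == null ? NULL_SENTINEL : fromDb);
            cached = cache.get(key);
        }
        return cached == NULL_SENTINEL ? Optional.empty() : Optional.of(cached);
    }

    public int dbQueries() { return dbQueries; }

    public static void main(String[] args) {
        NullCachingSketch cache = new NullCachingSketch();
        cache.get("no-such-user");
        cache.get("no-such-user");
        // only the first lookup hit the simulated database
        System.out.println(cache.dbQueries());
    }
}
```

The second lookup for the nonexistent key is answered by the sentinel, so the simulated database is queried only once.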

Cache avalanche

Cache avalanche means that a large amount of cached data expires at the same moment while the query volume is huge, so the requests all fall on the database and overload it or even bring it down. The solution is to add a random value to each entry's expiration time, so that the expiration times differ and large batches of data never expire simultaneously.
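That jitter can be sketched as a base TTL plus a bounded random offset (the class name and numbers here are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

// Spread expiration times by adding random jitter to a base TTL, so entries
// cached together do not all expire at the same instant.
public class TtlJitter {
    public static long ttlSeconds(long baseSeconds, long maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }

    public static void main(String[] args) {
        // e.g. a nominal 300-second TTL smeared over the range 300..360 seconds
        System.out.println(ttlSeconds(300, 60));
    }
}
```

With Spring Data Redis, the resulting value would be passed as the expiration argument of opsForValue().set(key, value, ttl, TimeUnit.SECONDS).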

Cache breakdown

Cache breakdown means that a hot key expires at the very moment a large number of concurrent requests arrive for it, so all of those requests reach the database. The solution is locking: when the hot key expires, the many requests compete for a lock, only one of them wins and executes the query, and the others wait. The winning request then puts the data into the cache, so the waiting requests can read it from the cache directly.

Solving the cache breakdown problem

Cache penetration and cache avalanche are easy to handle; cache breakdown, however, requires locking. Let's explore how to solve it with a lock.

@GetMapping("/test")
public String test() {
    String usersJson = redisTemplate.opsForValue().get("users");
    if (StringUtils.isEmpty(usersJson)) {
        synchronized (this) {
            // Confirm again whether the cache already holds the data
            String json = redisTemplate.opsForValue().get("users");
            if (StringUtils.isEmpty(json)) {
                List<User> users = userService.getUsers();
                System.out.println("The database was queried......");
                usersJson = JSON.toJSONString(users);
            } else {
                usersJson = json;
            }
            redisTemplate.opsForValue().set("users", usersJson);
        }
    }
    return usersJson;
}

First we still try to read from the cache. On a miss we enter the synchronized block and check the cache again inside it. This second check matters: when many requests enter the outer if statement at the same time, one of them acquires the lock and queries the database, but after it has written the data to Redis, the other requests would still query the database if they did not check again. The execution flow is as follows:

![image.png](https://img-blog.csdnimg.cn/img_convert/46506e1773d05d3d58eec817f5f3ea26.png)

Using JMeter to simulate 2000 requests in one second again, the result is:

The database was queried......

The console prints the query message only once, which shows that only one of the 2000 requests queried the database. The throughput, however, drops sharply:

![image.png](https://img-blog.csdnimg.cn/img_convert/4392f16a8c40abadb3a5c4f429a50d2c.png)

This is fine for a standalone application: Spring Boot beans are singletons by default, so locking a code block on this works. But a distributed application is deployed as multiple replicas, and this cannot lock requests across applications. That is where distributed locks come in.

Distributed lock

Just as with caching middleware, we can extract the lock outside the services, independent of all of them but shared by each, as shown below:

![image.png](https://img-blog.csdnimg.cn/img_convert/6754bb9382b69484ce7f9171b0d75bd2.png)

If a service wants to lock, it must go to a shared place to claim the lock, which guarantees that even in a distributed environment every service sees the same lock. There are many choices for this shared place; Redis is one of them, and one Redis command is particularly well suited to implementing a distributed lock: setnx. Here is how the official website describes it:

![image.png](https://img-blog.csdnimg.cn/img_convert/716c16c26695d36a8e565b7685c0a084.png)

setnx sets the value only when the key does not exist; otherwise it does nothing. We can have every service execute setnx lock 1. Because the operation is atomic, even with millions of concurrent requests only one can set the key successfully; the rest fail because the key already exists. A successful set means the lock was acquired; a failed set means it was not.

![image.png](https://img-blog.csdnimg.cn/img_convert/82d7b716e974fb5f37d21764c16a43ba.png)

The code is as follows:

@RestController
public class TestController {

    @Autowired
    private UserService userService;

    @Autowired
    private StringRedisTemplate redisTemplate;

    @GetMapping("/test")
    public String test() throws InterruptedException {
        String usersJson = redisTemplate.opsForValue().get("users");
        if (StringUtils.isEmpty(usersJson)) {
            usersJson = getUsersJson();
        }
        return usersJson;
    }

    public String getUsersJson() throws InterruptedException {
        String usersJson = "";
        // Try to acquire the distributed lock
        Boolean lock = redisTemplate.opsForValue().setIfAbsent("lock", "1");
        if (lock) {
            // Lock acquired; confirm again whether the cache holds the data
            String json = redisTemplate.opsForValue().get("users");
            if (StringUtils.isEmpty(json)) {
                List<User> users = userService.getUsers();
                System.out.println("The database was queried......");
                usersJson = JSON.toJSONString(users);
            } else {
                usersJson = json;
            }
            redisTemplate.opsForValue().set("users", usersJson);
            // Release the lock
            redisTemplate.delete("lock");
        } else {
            // Failed to acquire the lock; retry after a short wait
            Thread.sleep(200);
            return getUsersJson();
        }
        return usersJson;
    }
}

Of course, there is still a big problem here: if the program throws an exception before releasing the lock, execution stops and the lock is never released, causing a deadlock. The solution is to set an expiration time on the lock when acquiring it; then even if the program fails to release the lock, Redis deletes it automatically once it expires.

Even with an expiration time, a new problem appears: if the business takes longer to execute than the lock's expiration time, Redis deletes the lock before the business finishes, and other requests can acquire it and run the business method concurrently. The solution is to make each request's lock value unique, so that a request cannot delete another request's lock. The code is as follows:

public String getUsersJson() throws InterruptedException {
    String usersJson = "";
    // Try to acquire the distributed lock, with a unique value and an expiration time
    String uuid = UUID.randomUUID().toString();
    Boolean lock = redisTemplate.opsForValue().setIfAbsent("lock", uuid, 300, TimeUnit.SECONDS);
    if (lock) {
        // Lock acquired; confirm again whether the cache holds the data
        String json = redisTemplate.opsForValue().get("users");
        if (StringUtils.isEmpty(json)) {
            List<User> users = userService.getUsers();
            System.out.println("The database was queried......");
            usersJson = JSON.toJSONString(users);
        } else {
            usersJson = json;
        }
        redisTemplate.opsForValue().set("users", usersJson);
        // Check whether the lock still belongs to this request
        String lockVal = redisTemplate.opsForValue().get("lock");
        if (uuid.equals(lockVal)) {
            // Only release the lock if it is still our own
            redisTemplate.delete("lock");
        }
    } else {
        // Failed to acquire the lock; retry after a short wait
        Thread.sleep(200);
        return getUsersJson();
    }
    return usersJson;
}

Think about it carefully, and there is still a problem. Releasing the lock means the Java program sends a command to Redis and Redis returns the result, and this round trip takes time on the network. Suppose the program reads the lock value from Redis and Redis returns it successfully, but the lock expires while the reply is in transit; another request can now take the lock. The program then receives the old value, sees that the lock was its own, and deletes it, but by then the key in Redis belongs to the other request, so one request is again deleting another request's lock. The Redis website provides a solution for exactly this:

![image.png](https://img-blog.csdnimg.cn/img_convert/5602cb97d79cd8b0fb01dfae0a43b07c.png)

Executing such a Lua script makes the check-and-delete atomic. The code is as follows:

public String getUsersJson() throws InterruptedException {
    String usersJson = "";
    // Try to acquire the distributed lock, with a unique value and an expiration time
    String uuid = UUID.randomUUID().toString();
    Boolean lock = redisTemplate.opsForValue().setIfAbsent("lock", uuid, 300, TimeUnit.SECONDS);
    if (lock) {
        // Lock acquired; confirm again whether the cache holds the data
        String json = redisTemplate.opsForValue().get("users");
        if (StringUtils.isEmpty(json)) {
            List<User> users = userService.getUsers();
            System.out.println("The database was queried......");
            usersJson = JSON.toJSONString(users);
        } else {
            usersJson = json;
        }
        redisTemplate.opsForValue().set("users", usersJson);
        String luaScript = "if redis.call(\"get\",KEYS[1]) == ARGV[1]\n" +
                "then\n" +
                "    return redis.call(\"del\",KEYS[1])\n" +
                "else\n" +
                "    return 0\n" +
                "end";
        // Execute the script so that compare-and-delete is atomic
        DefaultRedisScript<Long> redisScript = new DefaultRedisScript<>(luaScript, Long.class);
        List<String> keyList = Arrays.asList("lock");
        redisTemplate.execute(redisScript, keyList, uuid);
    } else {
        // Failed to acquire the lock; retry after a short wait
        Thread.sleep(200);
        return getUsersJson();
    }
    return usersJson;
}

Redisson

Redisson is a Java in-memory data grid built on top of Redis, and it makes distributed locks easy to implement. First, introduce the Redisson dependency:

<dependency>
  <groupId>org.redisson</groupId>
  <artifactId>redisson</artifactId>
  <version>3.16.0</version>
</dependency>

Write configuration class:

@Configuration
public class MyRedissonConfig {

    @Bean
    public RedissonClient redissonClient() {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://192.168.66.10:6379");
        return Redisson.create(config);
    }
}

Write a controller to experience Redisson:

@RestController
public class TestController {

    @Autowired
    private RedissonClient redissonClient;

    @GetMapping("/test")
    public String test() {
        // Obtain the lock
        RLock lock = redissonClient.getLock("my_lock");
        // Acquire the lock
        lock.lock();
        try {
            // Simulate business processing
            Thread.sleep(1000 * 10);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Release the lock
            lock.unlock();
        }
        return "test";
    }
}

Simply declaring the lock and unlock operations solves all of the earlier problems. Redisson sets an expiration time on the lock automatically and provides a watchdog that monitors it: as long as the Redisson instance is alive and the thread holding the lock has not finished its business, the watchdog keeps extending the lock's expiration time (30 seconds by default). You can also specify the lock's expiration time yourself:

lock.lock(15, TimeUnit.SECONDS);

Here the expiration time is specified at the moment the lock is acquired.

When the lock's expiration time is set to 15 seconds and the business takes longer than that, will Redis delete the lock and let other requests grab it? Yes: when you pass a lease time explicitly, the watchdog does not renew the lock, so this situation really can occur. Avoid setting the expiration time too small, and make sure it is longer than the business's execution time.

Redisson can also easily implement read-write locks, such as:

@RestController
public class TestController {

    @Autowired
    private StringRedisTemplate redisTemplate;

    @Autowired
    private RedissonClient redissonClient;

    @GetMapping("/write")
    public String write() {
        RReadWriteLock wrLock = redissonClient.getReadWriteLock("wr_lock");
        // Obtain the write lock
        RLock wLock = wrLock.writeLock();
        // Acquire the lock
        wLock.lock();
        String uuid = "";
        try {
            uuid = UUID.randomUUID().toString();
            Thread.sleep(20 * 1000);
            // Store the value in Redis
            redisTemplate.opsForValue().set("uuid", uuid);
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            // Release the lock
            wLock.unlock();
        }
        return uuid;
    }

    @GetMapping("/read")
    public String read() {
        RReadWriteLock wrLock = redissonClient.getReadWriteLock("wr_lock");
        // Obtain the read lock
        RLock rLock = wrLock.readLock();
        // Acquire the lock
        rLock.lock();
        String uuid = "";
        try {
            // Read the value
            uuid = redisTemplate.opsForValue().get("uuid");
        } finally {
            // Release the lock
            rLock.unlock();
        }
        return uuid;
    }
}

As long as reads and writes use the same lock, reads must wait while a write is in progress. The write lock is exclusive: while one thread is writing, other threads queue up. The read lock is shared: all threads can read concurrently, and the locking guarantees that the latest data is read every time.

Cache consistency

Caching improves system throughput, but it also brings a problem: when data is in the cache, it is served from the cache, so if the data in the database is modified, users still read the old cached value and the data becomes inconsistent. There are two common solutions:

  1. Double write mode: update the cache at the same time as the database
  2. Failure mode: delete the cache entry right after updating the database
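The two modes can be sketched over plain in-memory maps standing in for the database and the cache (the names are illustrative, not a real API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two cache-update strategies. `db` and `cache` are in-memory
// stand-ins for the database and Redis.
public class CacheUpdateModes {
    static Map<String, String> db = new HashMap<>();
    static Map<String, String> cache = new HashMap<>();

    // Double write mode: update the database and the cache together.
    static void doubleWrite(String key, String value) {
        db.put(key, value);
        cache.put(key, value);
    }

    // Failure mode: update the database, then delete the cache entry;
    // the next read repopulates the cache from the database.
    static void invalidate(String key, String value) {
        db.put(key, value);
        cache.remove(key);
    }

    static String read(String key) {
        return cache.computeIfAbsent(key, db::get);
    }

    public static void main(String[] args) {
        doubleWrite("price", "1000");
        invalidate("price", "2000");
        // the stale entry is gone; the read falls through to the database
        System.out.println(read("price"));
    }
}
```

In the failure mode the cache holds no stale value after a write; the cost is that the first read after each update must query the database again.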

The double write mode can produce dirty data, as shown below:

![image.png](https://img-blog.csdnimg.cn/img_convert/bd08521806761daba6894326241bc061.png)

Administrators A and B both modify a product's price; A commits first and B commits later, so B's cache write should be the one that takes effect. But due to network fluctuations or other unknown conditions, A's cache write lands after B's and overwrites it, leaving 2000 in the cache while the database holds B's value. This produces dirty data, though only temporarily: the database is correct, so once the cache expires the database is queried again and the cache returns to normal. To actually guarantee consistency in double write mode, the database update and the cache update must be locked together as one atomic operation.

The failure mode can also produce dirty data. For data that is modified frequently, it is better to query the database directly instead of caching it.

To sum up, the usual approach is: give every cache entry an expiration time, so that an expired entry triggers a database query that refreshes the cache; and when reading and writing data, use Redisson's read-write lock to guarantee the atomicity of write operations.

Added by Stray_Bullet on Mon, 10 Jan 2022 10:03:43 +0200