Hard core resources! Redis five data structures and three advanced data structures analysis (detailed explanation)

The last one shared "understanding JVM in depth", and this one shared "Redis five data structures and three advanced data structure analysis".

preface

The most important and fundamental part of Redis is its rich data structure. The reason why Redis stands out is that it has rich data structure and can support a variety of scenarios. Moreover, Redis's data structure implementation and application scenarios are quite common in the interview. Next, let's talk about Redis's data structure.

Redis has five data structures: string, list, hash, set and sorted set. However, redis also has more advanced data structures, such as HyperLogLog, Geo and BloomFilter. Let's talk about these data structures of redis.

String

Basic concept: String is the simplest and most commonly used data structure in Redis and the only data structure in Memcached. In normal development, String can be said to be used most frequently.
Underlying implementation:

  • If a string object stores an integer value and the integer value can be represented by long, the string object will save the integer value in the ptr attribute of the string object structure (convert void * to long) and set the encoding of the string object to int.
  • If the string object saves a string value and the length of the string value is greater than 39 bytes, the string object will use a simple dynamic string (SDS) to save the string value and set the encoding of the object to raw.
  • If the string object saves a string value and the length of the string value is less than or equal to 39 bytes, the string object will use embstr encoding to save the string value.

use:

> redis_cli            # Start redis cli client
> set hello world      # Set the value of the key hello to world   
OK                     # OK will be returned after the set command succeeds
> get hello            # Get the value with the key hello through the get command
"world"                # Obtained value
> del hello            # Delete the value whose key is hello
(integer) 1            # The number of deleted items is returned
> mset a 10 b 20 c 30  # Set value of batch
OK
> mget a b c           # Return value of batch
1)"10"
2)"20"
3)"30"
> exists hello         # Does the key exist
(integer) 1            # 1 indicates existence, 0 indicates non existence
> expire hello 10      # Set expiration time for hello in seconds
(integer) 1            # Returning 1 means success, and 0 means that the key does not exist or the expiration time cannot be set
> pexpire hello 10     # Set expiration time for hello in milliseconds
(integer) 1            # Returning 1 means success, and 0 means that the key does not exist or the expiration time cannot be set

Next, I will focus on a series of commands set key value [EX seconds] [PX milliseconds] [NX|XX]. This is still very important and easy to be confused.
reids clears the TLL value every time it overwrites the previous value. (TTL is the expiration time)

  • EX second: set the expiration time of the key to second seconds. The effect of SET key value EX second is equivalent to that of set key second value.
  • PX millisecond: set the expiration time of the key to millisecond. SET key value PX millisecond effect is equivalent to PSETEX key millisecond value.
  • Nx: set the key only when the key does not exist. SET key value NX has the same effect as SETNX key value.
  • 20: Set the key only when it already exists.
# Use EX option
> set key1 hello EX 1000   # Set expiration time 1000s
OK
> ttl hello                # Get the expiration time of hello
(integer) 1000
# Use PX option
> set key1 hello PX 1000   # Set expiration time 1000ms
OK
> ttl hello                # Get the expiration time of hello
(integer) 1000
# Use NX option
> set hello world NX
OK                         # Key does not exist, set successfully
> get hello
"value"
> set hello world NX
(nil)                      # Key already exists, setting failed
> get hello
"world"                    # Keep the original value unchanged
# Use XX option
> exists hello             # Make sure hello doesn't exist
(integer) 0
> set hello world XX
(nil)                      # Setting failed because the key does not exist
> set hello wolrd          # Set a value for hello first
OK
> set hello newWolrd XX
OK                         # This time, the setting is successful
> get hello
"newWorld"
# NX or XX can be used in combination with EX or PX
> set hello world EX 1000 NX
OK
> get hello
"world"
> ttl hello
(integer)1000
> set hello wolrd PX 30000 NX
OK
> pttl hello
(integer)30000          # In practice, this value must be less than 30000. This time it is written for the effect
# EX and PX can appear at the same time, but the options given later will override the options given earlier
> set hello wolrd EX 1000 PX 30000
OK
> ttl hello
(integer)30            # This is the parameter set by PX,
> pttl hello
(integer)30000
> set number 1
OK
> incr number          # Auto increment number
(integer) 2

In the development process, it is a very common operation to implement locks with redis. Combined with NX and EX.

> set hello world NX EX 10     # Lock successfully. The expiration time is 10s
OK
> set hello wolrd NX EX 10     # Executing this command within 10s returns an error because the last lock has not been released
(nil)
> del hello                    # The lock was released
OK
> set hello world NX EX 10     # Lock successfully. The expiration time is 10s
OK
> setnx hello world            # You can also write that
> setex hello 10 wolrd

The lock can be released by setting the expiration time and manual del deletion.
string commands are commonly used, so I'll introduce them more. I'll focus on the following commands.

Application scenario:

  • Caching function: string is the most commonly used caching function. It will cache some data that are not updated frequently but queried frequently, so as to reduce the pressure of DB.
  • Counter: it can be used to count through incr operations, such as statistics of website visits, article visits, etc.
    -

List

Basic concept: List is an ordered and repeatable List, which is quite similar to Java List. It has fast query speed and can be queried through index; Insertion and deletion are slow.

Underlying implementation:

  • The encoding of the list object can be ziplist or linkedlist.
  • The length of all string elements saved by the list object is less than 64 bytes, and the number of saved elements is less than 512. Zip list encoding is used; Otherwise, use linkedlist;

use:

> lpush mylist a     # Insert data from left
(ineteger)1
> lpush mylist b
(integer)1
> rpush mylist c     # Insert data from the right
(integer)1
> lrange mylist 0 -1 # To retrieve data, lrange needs two indexes, closed on the left and closed on the right; 0 is from 0, - 1 is the penultimate, and - 2 is the penultimate and so on
1)"b"
2)"a"
3)"c"
> lrange mylist 0 -2 # 0 to penultimate 
1)"b"
2)"a"
> lpush mylist a b c # Batch insert
(integer)3
> lpop mylist        # Pop up element from left
"b"
> rpop mylist        # Pop up element from right
"c"
> rpop mylist        # Returns null when there are no elements in the list
(nil)
> brpoop mylist 5    # Pop up elements from the right. If there are no elements in the list, it will be blocked. If there are still no elements after 5 s, it will return
1)"mylist"   # List name        
2)"b"        # Pop up element
> del mylist         # Delete list 
(integer)1

Usage scenario:

  • Message queue: Redis's list is an ordered list structure, which can block the queue and use the method of left in and right out. Lpush is used to produce data inserted from the left, Brpop is used for consumption, and lpush is used to insert data from the right.
  • Paging display of data: the lrange command requires two indexes to obtain data, which can be used to realize paging. You can calculate two index values in the code, and then retrieve data from redis.
  • It can be used to realize functions such as fan list and ranking of the latest news.
    -

Hash

Introduction: Redis hash can store the mapping between multiple key value pairs. Like a string, the value stored in the hash can be either a string or a numeric value, and the user can also perform self increment or self decrement on the numeric value stored in the hash. This is very similar to Java's HashMap. Each HashMap has its own name and can store multiple k/v pairs at the same time.
Underlying implementation:

  • The encoding of hash object can be ziplist or hashtable.
  • The string length of keys and values of all key value pairs saved by the hash object is less than 64 bytes, and the number of key value pairs saved is less than 512, which is encoded by ziplist; Otherwise, use hashtable;

use:

> hset student name Zhang San    # It can be understood as forgetting to add kv key value pairs to the map named student
(integer)1            # Returning 1 means that the key does not exist and is added successfully
> hset student sex male
(integer)1
> hset student name Zhang San
(integer)0            # Return 0 because the key already exists
> hgetall student
1)"name"
2)"Zhang San"
3)"sex"
4)"male"
> hdel student name       #Delete this key
(integer)1           # Returning 1 also means that the entire key exists and is deleted successfully
> hdel student name
(integer)0           # 0 is returned because the key no longer exists

Application scenario:

  • Hash is more suitable for storing structured data, such as objects in Java; In fact, objects in Java can also be stored with strings. You only need to serialize the objects into json strings. However, if a property of the object is updated frequently, you need to serialize and store the whole object every time, which is expensive. If you use hash to store each attribute of an object, you only need to update the attribute to be updated each time.
  • Shopping cart scenario: you can take the user's id as the key, the commodity id as the stored field, and the commodity quantity as the value of the key value pair, which constitutes the three elements of the shopping cart.
    -

Set

Basic concept: both set and list of Redis can store multiple strings. The difference between them is that list is orderly and repeatable, while set is unordered and non repeatable.
Underlying implementation:

  • The code of the collection object can be intset or hashtable.
  • All elements saved by the collection object are integer values, and the number of saved elements does not exceed 512. intset coding is used; Otherwise, use hashtable;

use:

> sadd family mother          # Try adding mother to the family collection
(integer)1       # Returning 1 indicates that the addition is successful, and 0 indicates that the element already exists in the collection
> sadd family father
(integer)1
> sadd family father
(intger)0
> smembers family             # Gets all the elements in the collection
1)"mother"
2)"father"
> sismember family father     # Judge whether the father is in the family set 
(integer)1      # 1 exist; 0 does not exist
> sismber family son
(integer)0
> srem family son             # Remove the element son from the family collection
(integer)1     # 1 indicates existence and successful removal; 0 indicates that the element exists
> srem family som
(integer)0
> sadd family1 mother
(integer)1
> smembers family 
1)"mother"
2)"father"
> smember family1
1)"mother"
> sinter family family1     # Get the intersection of family and family1
1)"mother"
> sadd family1 son
(integer)1
> sunion family family1     # Gets the union of family and family1
1)"mother"
2)"father"
> sdiff family family1      # Get the difference set between family and family1 (that is, the elements that family has but family1 does not)
1)"father"

Application scenario:

  • Tags: you can store the tags of each person in the blog site with a set collection, and then merge users according to each tag.
  • Store friends / fans: set has the function of de duplication; You can also use the set union function to get functions such as common friends.
    -

Sorted Set

Basic concept: like hash, ordered set is used to store key value pairs: each key of ordered set is called member, which is unique, while each value of ordered set is called score, which must be a floating point number. It can be sorted according to scores. The ordered set is the only one in Redis that can access elements according to members (which is the same as hashing), and access the structure of elements according to scores and the order in which they are arranged. Like other Redis structures, users can add, remove and obtain ordered collections.
Underlying implementation:

  • The code of the ordered set can be ziplist or skiplist
  • The number of elements saved in the ordered collection is less than 128, and the length of all element members saved is less than 64 bytes. Use zip list encoding; Otherwise, use skiplist;

use:

> zadd class 100 member1 # Add the member1 element and its score value of 100 to the ordered collection class
(integer)1
> zadd class 90 member2 80 member3 # Batch add
(integer)2
> zrange class 0 -1 withscores  # Get the values and scores in the ordered collection and sort by score
1)"member3" 
2)"80"
3)"member2"
4)"90"
5)"member1"
6)"100"
> zrem class member1   # Delete member1 in class
(integer)1

Application scenario:

  • Ranking list: the most commonly used scenes of ordered collection. For example, news websites sort hot news, such as according to the number of hits, likes, etc.
  • Message queue with weight: important messages have a larger score and ordinary messages have a smaller score. Tasks with high priority can be executed first. Basic concepts of hyperlog:
    Redis added the hyperlog structure in version 2.8.9.

Redis HyperLogLog is an algorithm used for cardinality statistics. The advantage of HyperLogLog is that when the number or volume of input elements is very, very large, the space required to calculate the cardinality is always fixed and very small.

In Redis, each hyperlog key only needs 12 KB of memory to calculate the cardinality of nearly 2 ^ 64 different elements. This is in sharp contrast to a collection where the more elements consume more memory when calculating the cardinality.

However, because HyperLogLog only calculates the cardinality based on the input elements and does not store the input elements themselves, HyperLogLog cannot return the input elements like a collection.
use:
Here is an example of how many users logged in on a statistical website on May 23, 2021

> pfadd user_login_20210523 tom #  user_login_20210523 is the key; tom is the logged in user
(integer)1
> pfadd user_login_20210523 tom jack lilei User
(integer)1
> pfcount user_login_20210523  # Get the number of key corresponding values. The same user can log in multiple times only once
(integer) 3  
> pfadd user_login_20210522 sira 
(integer)1
> pfcount user_login_20210523 user_login_20210522 # Count the number of login users on the 22nd and 23rd
(integer)4
>pfmerge user_login_20210522_23 user_login_20210522 user_login_20210523 # Merge multiple key contents
"OK"
> pfcount user_login_20210522_23
(integer)4

Application scenario:

  • It can be used to count the number of visitors to the website and other indicators

GEO

Basic concepts:
In redis version 3.2, a new data structure called geo is added, which is mainly used to store geographic location information and operate the stored information.
use:
geoadd is used to store the specified geospatial location. You can add one or more longitudes, latitudes and location names to the specified key.

> GEOADD beijing 116.405285 39.912835 "Mushrooms can't sleep"
(integer)2

geopos is used to return the positions (longitude and latitude) of all specified members from a given key, and nil if it does not exist.

> GEOPOS beijing "Mushrooms can't sleep" "the Imperial Palace"
1) 1)116.405285
   2)39.912835
2)(nil)   

geodist is used to return the distance between two given positions.
Unit parameters:
m: Meter, default unit.
km: kilometers.
mi: miles.
ft: feet.

> GEOADD beijing 116.403681 39.921156 "the Imperial Palace"
(integer)1
> GEODIST beijing "Mushrooms don't sleep" "the Imperial Palace" km
"0.936"

Application scenario:
A scene used to store and manipulate geographic information.
**Popular science is a little knowledge of Geography:
Longitude range: - 180 - 180. Calculated from the 0 ° longitude, it is divided into 180 ° to the East and 180 ° to the West. The 180 ° to the East belongs to the east longitude, and it is customary to use "E" as the code. The 180 ° to the West belongs to the west longitude, and it is customary to use "W" as the code. 0 ° position: the meridian at the center of the meridian at Greenwich Observatory is the primary meridian.
Latitude range: - 90 - 90. The latitude of the point north of the equator is called north latitude, which is N; The latitude of the point south of the equator is called south latitude, which is S. For the convenience of research, people divide the latitude into low, medium and high latitudes. 0 ° ~ 30 ° is low latitude, 30 ° ~ 60 ° is middle latitude and 60 ~ 90 ° is high latitude**

BloomFilter

Basic concepts:
A data structure consisting of a long string of binary vectors that can be regarded as a binary number
Group. Since it is binary, it stores either 0 or 1, but the initial default value is 0. Its main function is to judge whether an element is in a set. For example, I want to judge whether there is a number among the 2 billion numbers. If I plug in DB directly, the amount of data will be too large and the time will be very slow; If you put 2 billion data into the cache, the cache will not fit. At this time, bloom filter is the most suitable. The principle of Bloom filter is:

  • Add element
    When we want to add an element key to the bloom filter, we calculate a value through multiple hash functions, and then set the square of this value to 1.

 

  • Determine whether the element exists:
    To judge whether an element exists, first calculate the element through multiple hash functions to multiple subscript values, and then judge whether the element values corresponding to these subscripts are all 1. If there are elements that are not 1, then the element must not be in the set; If they are all 1, the element probability is in the set, and it is not 100% certain that the element exists in the set. Because the results of multiple different data calculated through the hash function will be repeated, there will be a position where other data is set to 1 through the hash function.
    In general: bloom filter can judge that a certain data must not exist, but it cannot judge that it must exist.
  • Advantages and disadvantages of Bloom filter:
  • Advantages: the advantages are obvious. The binary array occupies very little memory, and the insertion and query speed are fast enough.
  • Disadvantages: with the increase of data, the misjudgment rate will increase; Also, it is impossible to judge that the data must exist; Another important disadvantage is that data cannot be deleted.

use:
RedisBloom, a plug-in of Bloom filter, can be used after redis 4.0. The command is as follows:

bf.add Add element to bloom filter
bf.exists Determine whether the element is in the bloom filter
bf.madd Add multiple elements to the bloom filter, bf.add Only one can be added
bf.mexists Determine whether multiple elements are in bloom filter

> bf.add boomFilter tc01
(integer) 1            # 1: Existence; 0: does not exist
> bf.add boomFilter tc02
(integer) 1
> bf.add boomFilter tc03
(integer) 1
> bf.exists boomFilter tc01
(integer) 1
> bf.exists boomFilter tc02
(integer) 1
> bf.exists boomFilter tc03
(integer) 1
> bf.exists boomFilter tc04
(integer) 0
> bf.madd boomFilter tc05 tc06 tc07
1) (integer) 1
2) (integer) 1
3) (integer) 1
> bf.mexists boomFilter tc05 tc06 tc07 tc08
1) (integer) 1
2) (integer) 1
3) (integer) 1
4) (integer) 0

 

  1. Redisson uses Bloom filters:
public static void main(String[] args) {

        Config config = new Config();
        config.useSingleServer().setAddress("redis://192.168.15.105:6379");
        config.useSingleServer().setPassword("password123");
        
        //Construct Redisson
        RedissonClient redisson = Redisson.create(config);

        RBloomFilter<String> bloomFilter = redisson.getBloomFilter("userPhones");
        
        //Initialize bloom filter: the estimated element is 500000000L, and the error rate is 3%
        bloomFilter.tryInit(500000000L,0.03);
        
        //Insert the number 10086 into the bloom filter
        bloomFilter.add("18846014678");

         //Determine whether the following numbers are in the bloom filter
        System.out.println(bloomFilter.contains("18846014678")); //true
        System.out.println(bloomFilter.contains("1111111222")); //false
 }

 

  1. Guava uses a bloom filter:
    Guava is a Java toolkit provided by Google, which is very powerful
public static void main(String[] args) {
        BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), 500000, 0.01);

        bloomFilter.put("18846047789");

        System.out.println(bloomFilter.mightContain("18846047789")); // true
        System.out.println(bloomFilter.mightContain("1122222"));     //false
    }
}

Application scenario:

  • Solve the cache penetration problem: in general query scenarios, query the cache first. If there is no cache, query the DB. If it is found, it is stored in the cache first, and then returned to the caller. If it cannot be found, it returns null. In this case, if someone frequently requests no data in the cache, such as data with id = -1, it will cause great pressure on the DB. In this case, the redis bloom filter can be used. You can store all possible IDS in the bloom filter first. When the query comes, go to the Bloom filter first. If you can't find it, you won't request the cache and DB, If it exists in the bloom filter, the data is fetched from the cache.
  • Blacklist verification: you can put the ip in the blacklist into the bloom filter, so you don't have to query in db every time you come.

summary

Redis's rich data structure is one of the important cornerstones supporting redis. It enables redis to adapt to many complex scenarios. This part of the content can be said to be a required test in the interview, so we should make more efforts in this regard.

  • The above is the sharing of Redis five data structures and three advanced data structure analysis.
  • You are also welcome to exchange and discuss. If there are any inaccuracies in this article, I hope you will forgive me.
  • It's not easy to create. Your support is my biggest motivation. If I can help you, give me a compliment~~~

 

Keywords: Java Database Redis data structure Back-end

Added by kaedus on Tue, 01 Feb 2022 06:14:41 +0200