Preface
This article is the learning notes of redis source code on string processing, welcome to correct. The version of redis is 5.0.5. I won't elaborate on the functions, uses and performance of redis.
text
In the main topic, redis provides its own string storage and related operations. The source files are in sds.h and sds.c. In the process of learning the code, we find that redis uses a more ingenious design. Instead of simply using char * in C language to store the string in redis, we use C language pointer to encapsulate the string structure. So that you can automatically expand the capacity in common string processing functions; moreover, this design ensures that you can use all the string functions in libc as well as redis string storage functions. Let's talk about this design. First, let's look at the next macro definition:
typedef char *sds;
In redis, the alias sds is given to char *, so the commonly used functions related to string operation also start with sds, such as:
void sdssetlen(sds s, size_t newlen); ​size_t sdslen(const sds s);
Next, look at the structure of the storage string:
/* Note: sdshdr5 is never used, we just access the flags byte directly. * However is here to document the layout of type 5 SDS strings. */ struct __attribute__ ((__packed__)) sdshdr5 { unsigned char flags; /* 3 lsb of type, and 5 msb of string length */ char buf[]; }; struct __attribute__ ((__packed__)) sdshdr8 { uint8_t len; /* used */ uint8_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; }; struct __attribute__ ((__packed__)) sdshdr16 { uint16_t len; /* used */ uint16_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; }; struct __attribute__ ((__packed__)) sdshdr32 { uint32_t len; /* used */ uint32_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; }; struct __attribute__ ((__packed__)) sdshdr64 { uint64_t len; /* used */ uint64_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; };
We can see that five different structures are defined. We know that redis data is cached in memory, so we define different structures to store strings of different lengths to reduce the memory consumption as much as possible. The sdshdr5 structure is basically not used, so I will not introduce it first. The other four structures contain four fields:
- len: string length. Different structures support different maximum length
- alloc: allocated memory size
- flags: low 3-bit storage type, i.e. structure type
- buf: the memory starting address used to store strings. buf is dynamically allocated
Each struct tells the compiler not to do byte alignment through the attribute. Without byte alignment, memory can be saved. Another purpose is to get flags by subtraction of pointer. After getting flags, everything is easy to do. The lower three bits in flags are used to store structure type. The definition in sds.h is as follows:
#define SDS_TYPE_5 0 #define SDS_TYPE_8 1 #define SDS_TYPE_16 2 #define SDS_TYPE_32 3 #define SDS_TYPE_64 4 #Define SDS? Type? Mask 7 / / get mask of type #Define SDS? Type? Bits 3 / / type bits
When a string needs to be stored, first select a structure based on the length of the string, and then apply for a piece of space. The size includes the size of the structure and the size of the string to be stored. Some codes are as follows:
sds sdsnewlen(const void *init, size_t initlen) { void *sh; sds s; char type = sdsReqType(initlen); // Select appropriate structure storage according to string length /* Empty strings are usually created in order to append. Use type 8 * since type 5 is not good at this. */ if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8; int hdrlen = sdsHdrSize(type); // Calculate the size of the structure to apply for memory space unsigned char *fp; /* flags pointer. */ sh = s_malloc(hdrlen+initlen+1); // Requested space contains \ 0 Terminator if (init==SDS_NOINIT) init = NULL; else if (!init) memset(sh, 0, hdrlen+initlen+1); if (sh == NULL) return NULL; s = (char*)sh+hdrlen; // s is the starting position of the storage string /* * Other codes * ...... * * / return s; }
Redis will apply for an extra byte to store the \ 0 terminator, and add the terminator automatically, so that all string processing functions in libc can be applied. The space requested in the function includes the structure part, but the return value is the address where the string is actually stored. So how does redis associate s with structure? First look at the two macros in sds.h:
// Define a structure pointer of sdshdr * * according to the starting position of the string #define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T))); // Convert the pointer to the structure pointer of sdshdr * * at the beginning of the string #define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))
Because the address is continuous at the time of application, these two macros can directly subtract the length of the structure from the address to get the starting address of the structure. To use these two macros, you need to know the type of structure, but only the starting address of a string. How do you know which structure is stored. In this case, the structure does not use memory alignment. Take the function to get the string length as an example:
static inline size_t sdslen(const sds s) { unsigned char flags = s[-1]; switch(flags&SDS_TYPE_MASK) { case SDS_TYPE_5: return SDS_TYPE_5_LEN(flags); case SDS_TYPE_8: return SDS_HDR(8,s)->len; case SDS_TYPE_16: return SDS_HDR(16,s)->len; case SDS_TYPE_32: return SDS_HDR(32,s)->len; case SDS_TYPE_64: return SDS_HDR(64,s)->len; } return 0; }
Because the structure has no memory alignment and the flag s field is of char type, you can get the flag directly from s[-1]. As long as you get the flag, it is easy to do anything else. Other operations such as obtaining capacity are handled in this way, which will not be introduced here.
Ending
This paper mainly introduces the technique of storing string structure. Other codes are relatively simple. The common storage operations in sds.h and sds.c, such as string concatenation and format, are automatic expansion, and the expansion size is exponential growth. Sometimes, it is possible to shrink. Whether it is to expand or shrink, it will cause the original pointer to fail. So when using, you should be careful about pointer failure. Most functions have to use this way: S = sdstrip (s, "AA.:"); Come here first. Welcome to correct~