Article catalog
meet
Before reading the source code, take a few questions. I think it's quite interesting.
Q1: how to implement a string that is easy to expand and binary safe (not interrupted by \ 0)? Q2: how is SDS compatible with C language functions? Q3: what operations does SDS show to save memory? Q4: how is SDS expanded? Q5: how does SDS reduce the number of copies?
Aside: I like this model very much. I've also written some blogs on source code analysis, but I feel it's gone after reading it, with little effect. When looking at nginx, I didn't learn much programming techniques except marveling at its uncanny architecture design and some hot problems (my main programming techniques were learned in the STL source code, of course, I didn't understand it). However, it may be better to use this way of initiating learning, which can also make people more willing to look at the source code.
turn
sds structure analysis
typedef char *sds;
/* Note: sdshdr5 is never used, we just access the flags byte directly. * However is here to document the layout of type 5 SDS strings. */ struct __attribute__ ((__packed__)) sdshdr5 { unsigned char flags; /* 3 lsb of type, and 5 msb of string length */ char buf[]; }; struct __attribute__ ((__packed__)) sdshdr8 { //The length of the string recorded in this field can be obtained at constant time //Due to the length record variable len, it does not depend on '\ 0' when reading and writing strings, thus ensuring binary security. uint8_t len; /* used */ //The maximum string length that can be stored is 2 ^ 8 = 256 uint8_t alloc; /* excluding the header and null terminator */ //The lower three bits represent the type of string, and the upper five bits are only used in sd5, but looking at the posture, sd5 is in the cold. unsigned char flags; /* 3 lsb of type, 5 unused bits */ //Flexible array, which saves an empty character as the end of buf and does not count in len and alloc, so it is compatible with C language functions such as strcmp and strcpy. char buf[]; }; struct __attribute__ ((__packed__)) sdshdr16 { uint16_t len; /* used */ uint16_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; }; struct __attribute__ ((__packed__)) sdshdr32 { uint32_t len; /* used */ uint32_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; }; struct __attribute__ ((__packed__)) sdshdr64 { uint64_t len; /* used */ uint64_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; };
A String value can be at max 512 Megabytes in length. If you take that 64 as an example, will that be the only point? But it's no use if a string is too large.
In addition to saving memory, the use of flexible array has another advantage. The memory and structure of flexible array are continuous. It is very convenient to get the first address of the structure through the first address offset of flexible array.
Next, let's look at how to save memory:

This is sdshdr5. Unsigned 8 in it corresponds to a byte. The back of the self brain tonic.
basic operation
Create string
sds sdsnewlen(const void *init, size_t initlen) { void *sh; sds s; char type = sdsReqType(initlen); /* Empty strings are usually created in order to append. Use type 8 * since type 5 is not good at this. */ if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8; //SDS_TYPE_5 forced conversion to SDS_ TYPE_ eight int hdrlen = sdsHdrSize(type); //Calculate the required unsigned length for different headers unsigned char *fp; /* flags pointer. */ sh = s_malloc(hdrlen+initlen+1); //Left one for '\ 0' if (sh == NULL) return NULL; if (init==SDS_NOINIT) init = NULL; else if (!init) memset(sh, 0, hdrlen+initlen+1); s = (char*)sh+hdrlen; //Direct to buf fp = ((unsigned char*)s)-1; //buf first address offset switch(type) { case SDS_TYPE_5: { //This step is a little redundant. Some code is still there, but it's dead *fp = type | (initlen << SDS_TYPE_BITS); break; } case SDS_TYPE_8: { SDS_HDR_VAR(8,s); sh->len = initlen; sh->alloc = initlen; *fp = type; break; } case SDS_TYPE_16: { SDS_HDR_VAR(16,s); sh->len = initlen; sh->alloc = initlen; *fp = type; break; } case SDS_TYPE_32: { SDS_HDR_VAR(32,s); sh->len = initlen; sh->alloc = initlen; *fp = type; break; } case SDS_TYPE_64: { SDS_HDR_VAR(64,s); sh->len = initlen; sh->alloc = initlen; *fp = type; break; } } if (initlen && init) memcpy(s, init, initlen); s[initlen] = '\0'; return s; }
As for why put sd5 in the cold? Maybe it's because it's too short, and the ability to bear the risk of growing is not enough
Release string
/* Free an sds string. No operation is performed if 's' is NULL. */ void sdsfree(sds s) { if (s == NULL) return; s_free((char*)s-sdsHdrSize(s[-1])); //Here is the direct release of memory }
However, how can these excellent projects have no memory pool? There will be both overt and covert.
/* Modify an sds string in-place to make it empty (zero length). * However all the existing buffer is not discarded but set as free space * so that next append operations will not require allocations up to the * number of bytes previously available. */ void sdsclear(sds s) { sdssetlen(s, 0); s[0] = '\0'; }
There will be everything that should be.
sdsMakeRoomFor capacity expansion
Capacity expansion flow chart:

/* Enlarge the free space at the end of the sds string so that the caller * is sure that after calling this function can overwrite up to addlen * bytes after the end of the string, plus one more byte for nul term. * * Note: this does not change the *length* of the sds string as returned * by sdslen(), but only the free buffer space we have. */ sds sdsMakeRoomFor(sds s, size_t addlen) { void *sh, *newsh; size_t avail = sdsavail(s); size_t len, newlen; char type, oldtype = s[-1] & SDS_TYPE_MASK; int hdrlen; /* Return ASAP if there is enough space left. */ if (avail >= addlen) return s; len = sdslen(s); sh = (char*)s-sdsHdrSize(oldtype); newlen = (len+addlen); if (newlen < SDS_MAX_PREALLOC) newlen *= 2; else newlen += SDS_MAX_PREALLOC; type = sdsReqType(newlen); /* Don't use type 5: the user is appending to the string and type 5 is * not able to remember empty space, so sdsMakeRoomFor() must be called * at every appending operation. */ if (type == SDS_TYPE_5) type = SDS_TYPE_8; //After this battle, the pointer to buf was updated hdrlen = sdsHdrSize(type); if (oldtype==type) { newsh = s_realloc(sh, hdrlen+newlen+1); if (newsh == NULL) return NULL; s = (char*)newsh+hdrlen; } else { /* Since the header size changes, need to move the string forward, * and can't use realloc */ newsh = s_malloc(hdrlen+newlen+1); if (newsh == NULL) return NULL; memcpy((char*)newsh+hdrlen, s, len+1); s_free(sh); s = (char*)newsh+hdrlen; s[-1] = type; sdssetlen(s, len); } sdssetalloc(s, newlen); return s; }
Note that the upper part of the branch is realloc and the lower part is malloc. Note the difference between the two.
Little tip:__ attribute__ ((__packed__))
Adding the attribute ((packed)) keyword to the structure declaration can make our structure occupy memory in a compact arrangement. Simply put, it is to cancel memory alignment.
Where did this tip come from? Turn to the beginning and look again. This programming technique needs special attention. If you don't pay attention, you'll miss it.
In general, the structure will be aligned with memory. Take sd32 as an example. Before alignment, it is aligned according to 4 bytes, with a size of 12 bytes. After de alignment, the size is 9 bytes (buf no face). Moreover, after alignment, you can directly find flags by shifting one bit forward from the first address of buf. If not, you can think about how to find flags, which will almost become a "chicken / egg" knot (do not know the type, how to find the offset? Do not know the track offset, how to find the type?).
hair
Let's answer the first question by ourselves. Slip away.