redis source code reading notes - sds

The source code of this article comes from redis 6.2
link: https://github.com/redis/redis/tree/6.2
https://github.com/redis/redis/blob/6.2/src/sds.h
https://github.com/redis/redis/blob/6.2/src/sds.c

Introduction to sds

sds (simple dynamic string) its full name is simple dynamic string. It uses C language to encapsulate a binary secure string function library. Different from the character array of C language, it uses' \ 0 'as the terminator. SDS maintains a header to save the use length of the current character array. Therefore, SDS can ensure the integrity of the data stored therein without truncation. redis enables Use SDS as the string representation

+--------+-------------------------------+-----------+
| Header | Binary safe C alike string... | Null term |
+--------+-------------------------------+-----------+
         |
         `-> Pointer returned to the user.

sds memory structure

Before redis 4.0

struct sdshdr {
    unsigned int len; 
    unsigned int free;
    char buf[];
};

static inline size_t sdslen(const sds s) {
    struct sdshdr *sh = (void*)(s-(sizeof(struct sdshdr)));
    return sh->len;
}

static inline size_t sdsavail(const sds s) {
    struct sdshdr *sh = (void*)(s-(sizeof(struct sdshdr)));
    return sh->free;
}

redis 4.0 and later

/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

#define SDS_TYPE_5  0
#define SDS_TYPE_8  1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_MASK 7
#define SDS_TYPE_BITS 3
#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))
#define SDS_TYPE_5_LEN(f) ((f)>>SDS_TYPE_BITS)

static inline size_t sdslen(const sds s) {
    unsigned char flags = s[-1];
    switch(flags&SDS_TYPE_MASK) {
        case SDS_TYPE_5:
            return SDS_TYPE_5_LEN(flags);
        case SDS_TYPE_8:
            return SDS_HDR(8,s)->len;
        case SDS_TYPE_16:
            return SDS_HDR(16,s)->len;
        case SDS_TYPE_32:
            return SDS_HDR(32,s)->len;
        case SDS_TYPE_64:
            return SDS_HDR(64,s)->len;
    }
    return 0;
}

static inline void sdsinclen(sds s, size_t inc) {
    unsigned char flags = s[-1];
    switch(flags&SDS_TYPE_MASK) {
        case SDS_TYPE_5:
            {
                unsigned char *fp = ((unsigned char*)s)-1;
                unsigned char newlen = SDS_TYPE_5_LEN(flags)+inc;
                *fp = SDS_TYPE_5 | (newlen << SDS_TYPE_BITS);
            }
            break;
        case SDS_TYPE_8:
            SDS_HDR(8,s)->len += inc;
            break;
        case SDS_TYPE_16:
            SDS_HDR(16,s)->len += inc;
            break;
        case SDS_TYPE_32:
            SDS_HDR(32,s)->len += inc;
            break;
        case SDS_TYPE_64:
            SDS_HDR(64,s)->len += inc;
            break;
    }
}

     +-------------+-------+------------------------+-----------+-------------\
     | Len | Alloc | Flags | H E L L O W O R L D \n | Null term |  Free space \
     +---------------------+------------------------+-----------+-------------\
     |                     |                        |
     `->  sds hdr          `-> sds                  `-> '\0'
  1. C language struct is aligned by bytes by default, and struct is used__ attribute__ ((__packed__)) The sdshdr8 structure declared by sdshdr8 cancels byte alignment and is compact
  2. In order to save space, redis uses different structures for sds of different sizes. When expanding the capacity of sds, the header structure will be dynamically adjusted and additional space of min(1M, sds.len) size will be allocated to reduce the number of memory allocations
  3. The memory space used by the SDS structure of redis is hdrlen + initlen + 1, where hdrlen is the head size of SDS, initlen is the character array size of SDS, and 1 is the size of the \ 0 flag bit after the SDS character array. The reason why saving ends with the null character '\ 0' is that SDS that saves text data can reuse part of < string h> Library defined functions.
  4. The storage of sds includes two parts. The first part is the sds header, including len, alloc, flags and buf, where len is the length used by the buf of sds, alloc is the total length of the buf of sds, flags is the flag bit, which is used to record the type of current sds, and char buf [] is a character array with length of 0, which is called flexible array in C language, mainly to meet the variable length character array. The second part is the actual storage of buf array
  5. In the code in line 39, sdshdr##T's macro definition will be expanded during precompiling to splice the values of T on sdshdr. void * will be implicitly converted to struct sdshdr##T *, SDS_HDR_VAR(5, s) will get the variable sh, pointing to the header address of S

sds source code

_sdsnewlen

Create sds with init character array and initlen of specified length

/* Create a new sds string with the content specified by the 'init' pointer
 * and 'initlen'.
 * If NULL is used for 'init' the string is initialized with zero bytes.
 * If SDS_NOINIT is used, the buffer is left uninitialized;
 *
 * The string is always null-termined (all the sds strings are, always) so
 * even if you create an sds string with:
 *
 * mystring = sdsnewlen("abc",3);
 *
 * You can print the string with printf() as there is an implicit \0 at the
 * end of the string. However the string is binary safe and can contain
 * \0 characters in the middle, as the length is stored in the sds header. */
sds _sdsnewlen(const void *init, size_t initlen, int trymalloc) {
    void *sh;
    sds s;
    char type = sdsReqType(initlen);
    /* Empty strings are usually created in order to append. Use type 8
     * since type 5 is not good at this. */
    if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
    int hdrlen = sdsHdrSize(type);
    unsigned char *fp; /* flags pointer. */
    size_t usable;

    assert(initlen + hdrlen + 1 > initlen); /* Catch size_t overflow */
    sh = trymalloc?
        s_trymalloc_usable(hdrlen+initlen+1, &usable) :
        s_malloc_usable(hdrlen+initlen+1, &usable);
    if (sh == NULL) return NULL;
    if (init==SDS_NOINIT)
        init = NULL;
    else if (!init)
        memset(sh, 0, hdrlen+initlen+1);
    s = (char*)sh+hdrlen;
    fp = ((unsigned char*)s)-1;
    usable = usable-hdrlen-1;
    if (usable > sdsTypeMaxSize(type))
        usable = sdsTypeMaxSize(type);
    switch(type) {
        case SDS_TYPE_5: {
            *fp = type | (initlen << SDS_TYPE_BITS);
            break;
        }
        case SDS_TYPE_8: {
            SDS_HDR_VAR(8,s);
            sh->len = initlen;
            sh->alloc = usable;
            *fp = type;
            break;
        }
        case SDS_TYPE_16: {
            SDS_HDR_VAR(16,s);
            sh->len = initlen;
            sh->alloc = usable;
            *fp = type;
            break;
        }
        case SDS_TYPE_32: {
            SDS_HDR_VAR(32,s);
            sh->len = initlen;
            sh->alloc = usable;
            *fp = type;
            break;
        }
        case SDS_TYPE_64: {
            SDS_HDR_VAR(64,s);
            sh->len = initlen;
            sh->alloc = usable;
            *fp = type;
            break;
        }
    }
    if (initlen && init)
        memcpy(s, init, initlen);
    s[initlen] = '\0';
    return s;
}

sdsMakeRoomFor

Expand the space of sds on the original basis addlen

/* Enlarge the free space at the end of the sds string so that the caller
 * is sure that after calling this function can overwrite up to addlen
 * bytes after the end of the string, plus one more byte for nul term.
 *
 * Note: this does not change the *length* of the sds string as returned
 * by sdslen(), but only the free buffer space we have. */
sds sdsMakeRoomFor(sds s, size_t addlen) {
    void *sh, *newsh;
    size_t avail = sdsavail(s);
    size_t len, newlen, reqlen;
    char type, oldtype = s[-1] & SDS_TYPE_MASK;
    int hdrlen;
    size_t usable;

    /* Return ASAP if there is enough space left. */
    if (avail >= addlen) return s;

    len = sdslen(s);
    sh = (char*)s-sdsHdrSize(oldtype);
    reqlen = newlen = (len+addlen);
    assert(newlen > len);   /* Catch size_t overflow */
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;

    type = sdsReqType(newlen);

    /* Don't use type 5: the user is appending to the string and type 5 is
     * not able to remember empty space, so sdsMakeRoomFor() must be called
     * at every appending operation. */
    if (type == SDS_TYPE_5) type = SDS_TYPE_8;

    hdrlen = sdsHdrSize(type);
    assert(hdrlen + newlen + 1 > reqlen);  /* Catch size_t overflow */
    if (oldtype==type) {
        newsh = s_realloc_usable(sh, hdrlen+newlen+1, &usable);
        if (newsh == NULL) return NULL;
        s = (char*)newsh+hdrlen;
    } else {
        /* Since the header size changes, need to move the string forward,
         * and can't use realloc */
        newsh = s_malloc_usable(hdrlen+newlen+1, &usable);
        if (newsh == NULL) return NULL;
        memcpy((char*)newsh+hdrlen, s, len+1);
        s_free(sh);
        s = (char*)newsh+hdrlen;
        s[-1] = type;
        sdssetlen(s, len);
    }
    usable = usable-hdrlen-1;
    if (usable > sdsTypeMaxSize(type))
        usable = sdsTypeMaxSize(type);
    sdssetalloc(s, usable);
    return s;
}

sdscatlen

Add the first len characters of character array t to the end of sds and expand the capacity dynamically

/* Append the specified binary-safe string pointed by 't' of 'len' bytes to the
 * end of the specified sds string 's'.
 *
 * After the call, the passed sds string is no longer valid and all the
 * references must be substituted with the new pointer returned by the call. */
sds sdscatlen(sds s, const void *t, size_t len) {
    size_t curlen = sdslen(s);

    s = sdsMakeRoomFor(s,len);
    if (s == NULL) return NULL;
    memcpy(s+curlen, t, len);
    sdssetlen(s, curlen+len);
    s[curlen+len] = '\0';
    return s;
}

sdsfree

To release the sds space, you need to go back to the sds header before releasing it

/* Free an sds string. No operation is performed if 's' is NULL. */
void sdsfree(sds s) {
    if (s == NULL) return;
    s_free((char*)s-sdsHdrSize(s[-1]));
}

sdsdup

Deep copy, create a new sds using the current sds

/* Duplicate an sds string. */
sds sdsdup(const sds s) {
    return sdsnewlen(s, sdslen(s));
}

Debug sdsTest

gcc -std=c99 sds.c -g -nostartfiles -esdsTest -o sds zmalloc.c -DREDIS_TEST

  1. You need to compile using gcc instead of g + +, because C will implicitly convert void * to other types, while C + + cannot
  2. -The g parameter adds debugging information to the target file
  3. -The nostratfiles -esdsTest parameter changes the function entry to the sdsTest function
  4. -o sds specifies the name of the generated file
  5. zmalloc.c is a sds specific memory management file, which needs to be specified in the compilation parameters
  6. -DRESID_TEST is used to define the parameter REDIS_TEST
    gdb sds
#ifdef REDIS_TEST
#include <stdio.h>
#include <limits.h>
#include "testhelp.h"

#define UNUSED(x) (void)(x) / / the compiler will issue a warning for unused variables. This function is to shut up the compiler:), similar to _in go language= x

int sdsTest(int argc, char **argv, int accurate) {
    UNUSED(argc);
    UNUSED(argv);
    UNUSED(accurate);

    // Selected some representative test s
    {
        sds x = sdsnew("foo");

        test_cond("Create a string and obtain the length",
            sdslen(x) == 3 && memcmp(x,"foo\0",4) == 0);

        sdsfree(x);
        x = sdsnewlen("\0\n\r",3);
        test_cond("Create a string with specified length",
            sdslen(x) == 3 && memcmp(x,"\0\n\r\0",4) == 0);
            
        sdsfree(x);
        x = sdsnewlen("foo",2);
        test_cond("Create a string with specified length",
            sdslen(x) == 2 && memcmp(x,"fo\0",3) == 0);

        x = sdscat(x,"bar");
        test_cond("Strings concatenation",
            sdslen(x) == 5 && memcmp(x,"fobar\0",6) == 0);

        x = sdscpy(x,"xyzxxxxxxxxxxyyyyyyyyyykkkkkkkkkk");
        test_cond("sdscpy() against an originally shorter string",
            sdslen(x) == 33 &&
            memcmp(x,"xyzxxxxxxxxxxyyyyyyyyyykkkkkkkkkk\0",33) == 0);

        {
            sdsfree(x);
            char etalon[1024*1024];
            for (size_t i = 0; i < sizeof(etalon); i++) {
                etalon[i] = '0';
            }
            x = sdscatprintf(sdsempty(),"%0*d",(int)sizeof(etalon),0);
            test_cond("sdscatprintf() can print 1MB",
                sdslen(x) == sizeof(etalon) && memcmp(x,etalon,sizeof(etalon)) == 0);
        }
    }
    test_report();
    return 0;
}
#endif

Debugging environment: company development machine






Some problems

  1. The comment "do not use sdshdr5" is found in sdsMakeRoomFor. Sdshdr5 will still be used during actual debugging?
    In order to save memory, sdshdr5 is allowed to be used during initialization. Sdshdr5 will be extended to sdshdr8 when sds is expanded
  2. _ sdsnewlen will use zmalloc Zmalloc under c_ Usable function, zmalloc What does C do when allocating memory?
    During compilation, ptmalloc of glibc, tcmalloc of google and jemalloc of facebook can be used as memory management tools to execute compilation parameters. zmalloc encapsulates these memory management tools into a whole to provide a unified interface for the upper layer
  3. malloc needs to pass in the requested memory size, but free only needs to pass in a pointer. How does free know its size?
    glibc's malloc related source code is not well understood. Refer to CSAPP. When malloc, it will apply for more memory space to store the current memory length and other information, similar to sds

reference

  1. redis6.2 README
  2. SDS README
  3. GDB Getting Started Guide

I have little talent and knowledge, and my understanding level is limited. Please criticize and correct me if there are mistakes and improper places.

Keywords: Redis Cache nosql

Added by (RL)Ian on Mon, 27 Dec 2021 14:43:17 +0200