Data type analysis of Redis source code - DICT
This analysis is based on Redis 6.2.
DICT, the dictionary data structure, is essentially a hash table. The relevant files are dict.h and dict.c.
Infrastructure
dictEntry is a hash node, a key-value pair in which the value uses the union v. The *next pointer in the structure shows that nodes form a singly linked list; each bucket of the hash table stores the head pointer of its list.
```c
typedef struct dictEntry {
    void *key;                // key
    union {
        void *val;
        uint64_t u64;         // unsigned integer
        int64_t s64;          // signed integer
        double d;
    } v;                      // value
    struct dictEntry *next;   // next node in the chain
} dictEntry;
```
Next is dictType, the dictionary type, which distinguishes dictionaries: different types of dictionaries can use different handler functions.
```c
typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);          // generates the hash value for a key
    void *(*keyDup)(void *privdata, const void *key);   // key copy function
    void *(*valDup)(void *privdata, const void *obj);   // value copy function
    int (*keyCompare)(void *privdata, const void *key1, const void *key2); // key comparison function
    void (*keyDestructor)(void *privdata, void *key);   // key destructor
    void (*valDestructor)(void *privdata, void *obj);   // value destructor
    int (*expandAllowed)(size_t moreMem, double usedRatio); // checks whether expansion is allowed
} dictType;
```
dictht hash table structure.
```c
typedef struct dictht {
    dictEntry **table;      // array of bucket head pointers; chaining resolves hash collisions
    unsigned long size;     // hash table size
    unsigned long sizemask; // size mask, always size-1; index idx = hash & sizemask
    unsigned long used;     // number of hash nodes currently stored
} dictht; // hash table
```
dict dictionary.
```c
typedef struct dict {
    dictType *type;      // dictionary type
    void *privdata;      // private data
    // Two hash tables. Normally only ht[0] is used;
    // ht[1] is used only while rehashing ht[0].
    dictht ht[2];
    long rehashidx;      // rehash progress index; -1 when no rehash is in progress
    int16_t pauserehash; // rehash pause flag: >0 paused, <0 indicates a coding error
} dict;
```
dictIterator dictionary iterator.
```c
typedef struct dictIterator {
    dict *d;     // dictionary being iterated
    long index;  // hash table index the iterator currently points to
    // table: which hash table (0 or 1) is being iterated
    // safe: whether the current iterator is safe
    //   safe == 1: dictAdd, dictFind and other functions may be called to
    //              modify the dictionary during iteration
    //   otherwise: only dictNext may be called; modification is restricted
    int table, safe;
    // entry: node of the current iteration step
    // nextEntry: the next node; with a safe iterator the current entry may be
    //            modified, so an extra pointer must save the next node to
    //            avoid losing it
    dictEntry *entry, *nextEntry;
    long long fingerprint; // unsafe iterator fingerprint for misuse detection
} dictIterator;
```
Macro definition function
DICT_NOTUSED
Suppresses compiler warnings about unused parameters.
dictFreeVal
Frees the value of the specified hash node in the specified dictionary: calls the value destructor if one is defined, otherwise does nothing.
```c
#define dictFreeVal(d, entry) \
    if ((d)->type->valDestructor) \
        (d)->type->valDestructor((d)->privdata, (entry)->v.val)
```
dictSetVal
Sets the value of the specified hash node in the specified dictionary. The do/while(0) structure ensures the macro body behaves as a single statement executed exactly once. When valDup is defined its return value is used; otherwise the value is assigned directly.
```c
#define dictSetVal(d, entry, _val_) do { \
    if ((d)->type->valDup) \
        (entry)->v.val = (d)->type->valDup((d)->privdata, _val_); \
    else \
        (entry)->v.val = (_val_); \
} while(0)
```
dictSetSignedIntegerVal
Sets the hash node value as a signed integer: directly assigns entry->v.s64.
dictSetUnsignedIntegerVal
Sets the hash node value as an unsigned integer: directly assigns entry->v.u64.
dictSetDoubleVal
Sets the hash node value as a double: directly assigns entry->v.d.
dictFreeKey
Releases the specified hash node key of the specified dictionary, similar to dictFreeVal.
dictSetKey
Set the specified hash node key of the specified dictionary, similar to dictSetVal.
dictCompareKeys
Compares two keys of the specified dictionary. If d->type->keyCompare exists it is called; otherwise the pointers are compared directly with key1 == key2.
dictHashKey
Generates a hash value for the specified key of the specified dictionary.
dictGetKey
Gets the specified hash node key.
dictGetVal
Gets the value of the specified hash node.
dictGetSignedIntegerVal
Gets the signed integer value of the specified hash node.
dictGetUnsignedIntegerVal
Gets the unsigned integer value of the specified hash node.
dictGetDoubleVal
Gets the double value of the specified hash node.
dictSlots
Gets the total number of slots ht[0].size + ht[1].size for the specified dictionary.
dictSize
Gets the total number of nodes used by the hash table of the specified dictionary.
dictIsRehashing
Determines whether the specified dictionary is currently rehashing.
dictPauseRehashing
Pauses the rehash process for the specified dictionary.
dictResumeRehashing
Resumes the rehash process of the specified dictionary.
Constants and variables
DICT_OK
Macro definition constants. Indicates that the dictionary was processed successfully.
DICT_ERR
Macro definition constants. Indicates that dictionary processing failed.
DICT_HT_INITIAL_SIZE
Macro definition constants. Hash table initialization size.
dictTypeHeapStringCopyKey
External variables. Dictionary type, heap string copy key?
dictTypeHeapStrings
External variables. Dictionary type, heap string?
dictTypeHeapStringCopyKeyValue
External variables. Dictionary type, heap string copy key value?
dict_can_resize
Static variable. Controls whether the hash table may be resized. Note that even when it is 0, not every resize is blocked: expansion is still forced once the load factor exceeds dict_force_resize_ratio.
dict_force_resize_ratio
Static variable. The forced-resize ratio: if used/size exceeds dict_force_resize_ratio, expansion is forced.
dict_hash_function_seed[16]
Static variables. Hash function seed.
hash function
dictSetHashFunctionSeed
Sets the hash seed, implemented with memcpy.
dictGetHashFunctionSeed
Gets the hash seed. Simply returns dict_hash_function_seed.
dictGenHashFunction
The hash function. A thin wrapper around siphash(), implemented in siphash.c.
dictGenCaseHashFunction
Also a hash function; a wrapper around siphash_nocase(), implemented in siphash.c.
Private function
_dictExpandIfNeeded
Expands the hash table if necessary, by calling the dictExpand interface function.
```c
static int _dictExpandIfNeeded(dict *d)
{
    // A rehash is already in progress; nothing to do
    if (dictIsRehashing(d)) return DICT_OK;

    // If the hash table is empty, expand it to the initial size DICT_HT_INITIAL_SIZE
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    // Condition 1: used >= size, the table is saturated or overflowing
    // Condition 2: dict_can_resize allows resizing, or the forced-resize
    //              ratio (load factor) dict_force_resize_ratio has been reached
    // Condition 3: the dictionary type allows expansion
    // Expands to used+1. This may trigger the function more often, since it
    // only asks for one slot beyond the current node count.
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio) &&
        dictTypeExpandAllowed(d))
    {
        return dictExpand(d, d->ht[0].used + 1);
    }
    return DICT_OK;
}
```
_dictNextPower
Doubles the initial hash table size repeatedly until it is greater than or equal to the given size.
```c
static unsigned long _dictNextPower(unsigned long size)
{
    unsigned long i = DICT_HT_INITIAL_SIZE; // initial size 4

    if (size >= LONG_MAX) return LONG_MAX + 1LU; // avoid overflow
    while(1) {
        if (i >= size) // i = 2^n
            return i;
        i *= 2;
    }
}
```
dictTypeExpandAllowed
Determines whether the specified dictionary allows hash table extension. It is mainly based on the expandAllowed function in the dictionary type.
```c
static int dictTypeExpandAllowed(dict *d) {
    if (d->type->expandAllowed == NULL) return 1; // not set: allowed by default
    return d->type->expandAllowed(
                    _dictNextPower(d->ht[0].used + 1) * sizeof(dictEntry*),
                    (double)d->ht[0].used / d->ht[0].size);
}
```
_dictKeyIndex
Returns the index of an available bucket for the key. If a rehash is in progress, the returned index is always in ht[1].
```c
static long _dictKeyIndex(dict *d, const void *key, uint64_t hash, dictEntry **existing)
{
    // hash is generated from key; the caller passes it in rather than having
    // this function call the hash function itself
    unsigned long idx, table;
    dictEntry *he; // hash node
    if (existing) *existing = NULL; // reset the output parameter

    // Expand the hash table if needed
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;
    // Search both ht[0] and ht[1]; a rehash may be in progress, so loop
    for (table = 0; table <= 1; table++) {
        idx = hash & d->ht[table].sizemask; // AND the hash with the size mask
        // Get the bucket head node and walk the chain looking for the key
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                // Key already exists: report it and return -1
                if (existing) *existing = he;
                return -1;
            }
            he = he->next;
        }
        // Not rehashing: ht[1] is empty, stop here
        if (!dictIsRehashing(d)) break;
    }
    return idx; // the key does not exist: index of a usable bucket
}
```
_dictInit
Dictionary initialization.
```c
int _dictInit(dict *d, dictType *type, void *privDataPtr)
{
    _dictReset(&d->ht[0]); // reset both hash tables
    _dictReset(&d->ht[1]);
    d->type = type;
    d->privdata = privDataPtr;
    d->rehashidx = -1;   // no rehash in progress by default
    d->pauserehash = 0;
    return DICT_OK;
}
// Note: there is a mismatch here. The definition does not match the
// declaration, which is static:
// dict.c@66: static int _dictInit(dict *ht, dictType *type, void *privDataPtr);
```
Interface functions
_dictReset
Resets a hash table previously initialized with ht_init() (that function's implementation is nowhere to be found). According to the source comment it should only be called by ht_destroy(), but we can see it is also called from _dictInit. Resets the hash table's attributes: table to NULL, size/sizemask/used to 0.
dictCreate
Interface. Create a dictionary, mainly a hash table.
```c
dict *dictCreate(dictType *type, void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d)); // allocate space
    _dictInit(d,type,privDataPtr);
    return d;
}
```
dictResize
Resizes the hash table to the smallest size that contains all of its nodes.
```c
int dictResize(dict *d)
{
    unsigned long minimal;

    // Resizing is disabled or a rehash is in progress: return immediately
    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d->ht[0].used; // number of nodes in use
    if (minimal < DICT_HT_INITIAL_SIZE) // never shrink below the initial size
        minimal = DICT_HT_INITIAL_SIZE;
    return dictExpand(d, minimal);
}
```
_dictExpand
Expands or creates a hash table. When malloc_failed is non-NULL, the program is not aborted if memory allocation fails; instead *malloc_failed is set to 1.
```c
int _dictExpand(dict *d, unsigned long size, int* malloc_failed)
{
    if (malloc_failed) *malloc_failed = 0; // reset the allocation-failure flag

    // Invalid if a rehash is in progress, or the number of used slots is
    // already larger than the requested size
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    dictht n; // the new hash table
    unsigned long realsize = _dictNextPower(size); // round up to a power of two

    // Expanding to the current size is pointless
    if (realsize == d->ht[0].size) return DICT_ERR;

    // Allocate the new hash table and initialize all pointers to NULL
    n.size = realsize;
    n.sizemask = realsize-1;
    if (malloc_failed) {
        // Note: this tests the pointer, not the value reset above
        n.table = ztrycalloc(realsize*sizeof(dictEntry*)); // may fail
        *malloc_failed = n.table == NULL;
        if (*malloc_failed)
            return DICT_ERR;
    } else
        n.table = zcalloc(realsize*sizeof(dictEntry*)); // allocate, zero-filled

    n.used = 0;

    // If ht[0] is NULL this is the first initialization, not a rehash:
    // assign directly and return
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    // Otherwise prepare ht[1] for incremental rehashing
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}
```
dictExpand
Creates or expands a hash table. Directly calls _dictExpand with malloc_failed=NULL.
dictTryExpand
Attempt to create or extend a hash table. If allocation fails, return DICT_ERR.
```c
int dictTryExpand(dict *d, unsigned long size) {
    int malloc_failed; // set by _dictExpand
    _dictExpand(d, size, &malloc_failed);
    return malloc_failed? DICT_ERR : DICT_OK;
}
```
dictRehash
Performs n steps of incremental rehashing. Returns 1 if there are still keys to move from the old hash table to the new one, otherwise 0.
```c
int dictRehash(dict *d, int n) {
    // Maximum number of empty buckets to visit: 10 per step
    int empty_visits = n*10;
    if (!dictIsRehashing(d)) return 0; // no rehash in progress

    while(n-- && d->ht[0].used != 0) { // steps left, and ht[0] still has nodes
        dictEntry *de, *nextde;

        // rehashidx cannot overflow: ht[0].used != 0 guarantees there are
        // still nodes in the first hash table
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        // Skip empty buckets; give up after empty_visits of them
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx]; // head node at rehashidx
        // Move all keys in this bucket from the old table to the new one,
        // i.e. ht[0] => ht[1]
        while(de) {
            uint64_t h;

            nextde = de->next; // save the next node, the loop variable
            // Compute the new index in ht[1]
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            // Insert at the head of the destination bucket
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde; // advance
        }
        // Clear the migrated bucket and advance rehashidx
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    // Check whether the whole table has been migrated
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);   // free the old table's space
        d->ht[0] = d->ht[1];     // move ht[1] into ht[0] wholesale
        _dictReset(&d->ht[1]);   // reset ht[1]
        d->rehashidx = -1;       // mark the rehash as finished
        return 0;                // nothing left to do
    }

    // More rehashing is still required
    return 1;
}
```
timeInMilliseconds
Gets the time in milliseconds.
```c
long long timeInMilliseconds(void) {
    struct timeval tv;

    gettimeofday(&tv,NULL);
    return (((long long)tv.tv_sec)*1000)+(tv.tv_usec/1000);
}
```
dictRehashMilliseconds
Rehashes for ms+"delta" milliseconds, in increments of 100 steps. "delta" is greater than 0 and in most cases less than 1; the exact upper bound depends on the running time of dictRehash(d,100).
```c
int dictRehashMilliseconds(dict *d, int ms) {
    if (d->pauserehash > 0) return 0;

    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}
```
_dictRehashStep
Performs a single-bucket rehash step, but only if pauserehash == 0.
```c
static void _dictRehashStep(dict *d) {
    if (d->pauserehash == 0) dictRehash(d,1);
}
```
dictAdd
Adds a hash element to the specified dictionary.
```c
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key,NULL);

    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}
```
dictAddRaw
The low-level add: inserts a hash node, or reports the existing one through the existing parameter.
```c
dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d); // one incremental rehash step

    // Get the index for the new node; -1 means the key already exists
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

    // Pick the hash table: ht[1] while rehashing, otherwise ht[0]
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry)); // allocate the node
    entry->next = ht->table[index];  // insert at the head of the chain
    ht->table[index] = entry;        // fill in the bucket
    ht->used++;                      // update the used-slot count

    dictSetKey(d, entry, key);       // set the hash key
    return entry;
}
```
dictReplace
Add or replace hash nodes.
```c
int dictReplace(dict *d, void *key, void *val)
{
    dictEntry *entry, *existing, auxentry;

    // A non-NULL entry means a new node was added
    entry = dictAddRaw(d,key,&existing);
    if (entry) {
        dictSetVal(d, entry, val);
        return 1;
    }

    // Set the new value, then free the old one. The order matters,
    // because val may be exactly the same as the existing value.
    auxentry = *existing;
    dictSetVal(d, existing, val);
    dictFreeVal(d, &auxentry);
    return 0;
}
```
dictAddOrFind
Adds or finds a hash node. Directly calls dictAddRaw: a non-NULL return value means a node was added; otherwise the existing node is returned.
dictGenericDelete
Finds and removes a hash node. A helper function.
```c
static dictEntry *dictGenericDelete(dict *d, const void *key, int nofree) {
    uint64_t h, idx;
    dictEntry *he, *prevHe;
    int table;

    // Empty hash tables
    if (d->ht[0].used == 0 && d->ht[1].used == 0) return NULL;

    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key); // generate the hash value for the key

    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask; // bucket index
        he = d->ht[table].table[idx];    // bucket head node
        prevHe = NULL;
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key)) { // key match
                // Unlink the element from the list
                if (prevHe)
                    prevHe->next = he->next; // bypass via the previous node
                else
                    d->ht[table].table[idx] = he->next; // next node becomes head
                if (!nofree) { // nofree == 0: free key, value and node
                    dictFreeKey(d, he);
                    dictFreeVal(d, he);
                    zfree(he);
                }
                d->ht[table].used--; // update the used-slot count
                return he;           // return the removed node
            }
            prevHe = he;    // remember the current node as the previous one
            he = he->next;  // advance
        }
        if (!dictIsRehashing(d)) break; // not rehashing: skip ht[1]
    }
    return NULL; // matching node not found
}
```
dictDelete
Delete hash node. Call dictGenericDelete directly, where nofree=0.
dictUnlink
Unlinks the hash node from the table without actually freeing it. Also a direct call to dictGenericDelete, with nofree=1 meaning the node's memory is not released.
dictFreeUnlinkedEntry
Release the hash node that calls dictUnlink to unbind.
```c
void dictFreeUnlinkedEntry(dict *d, dictEntry *he) {
    if (he == NULL) return;
    dictFreeKey(d, he);
    dictFreeVal(d, he);
    zfree(he);
}
```
_dictClear
Destroys an entire hash table of the dictionary.
```c
int _dictClear(dict *d, dictht *ht, void(callback)(void *)) {
    unsigned long i;

    // Free all hash nodes
    for (i = 0; i < ht->size && ht->used > 0; i++) {
        dictEntry *he, *nextHe;

        // Invoke the callback on the private data every 65536 buckets
        // (whenever the low 16 bits of i are zero, including i == 0)
        if (callback && (i & 65535) == 0) callback(d->privdata);

        // The current bucket head is empty: skip it
        if ((he = ht->table[i]) == NULL) continue;
        while(he) {
            nextHe = he->next;
            // Free key, value and node
            dictFreeKey(d, he);
            dictFreeVal(d, he);
            zfree(he);
            // Update the used count, and the loop variable he
            ht->used--;
            he = nextHe;
        }
    }
    // Free the hash table itself
    zfree(ht->table);
    // Re-initialize the hash table
    _dictReset(ht);
    return DICT_OK;
}
```
dictRelease
Destroys both hash tables and releases the dictionary.
```c
void dictRelease(dict *d)
{
    _dictClear(d,&d->ht[0],NULL);
    _dictClear(d,&d->ht[1],NULL);
    zfree(d); // free the dictionary itself
}
```
dictFind
Finds a hash node by key.
```c
dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    uint64_t h, idx, table;

    if (dictSize(d) == 0) return NULL; // empty dictionary
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}
```
dictFetchValue
Gets the value of the corresponding hash node.
```c
void *dictFetchValue(dict *d, const void *key) {
    dictEntry *he;

    he = dictFind(d,key);
    return he ? dictGetVal(he) : NULL;
}
```
dictFingerprint
Dictionary signature.
```c
long long dictFingerprint(dict *d) {
    long long integers[6], hash = 0;
    int j;

    integers[0] = (long) d->ht[0].table;
    integers[1] = d->ht[0].size;
    integers[2] = d->ht[0].used;
    integers[3] = (long) d->ht[1].table;
    integers[4] = d->ht[1].size;
    integers[5] = d->ht[1].used;

    for (j = 0; j < 6; j++) {
        hash += integers[j];
        /* For the hashing step we use Tomas Wang's 64 bit integer hash. */
        hash = (~hash) + (hash << 21); // hash = (hash << 21) - hash - 1;
        hash = hash ^ (hash >> 24);
        hash = (hash + (hash << 3)) + (hash << 8); // hash * 265
        hash = hash ^ (hash >> 14);
        hash = (hash + (hash << 2)) + (hash << 4); // hash * 21
        hash = hash ^ (hash >> 28);
        hash = hash + (hash << 31);
    }
    return hash;
}
```
dictGetIterator
Gets the dictionary iterator.
```c
dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));

    iter->d = d;            // dictionary to iterate
    iter->table = 0;        // hash table index
    iter->index = -1;       // iteration index
    iter->safe = 0;
    iter->entry = NULL;     // current hash node
    iter->nextEntry = NULL; // next hash node
    return iter;
}
```
dictGetSafeIterator
Gets a safe dictionary iterator: calls dictGetIterator, then sets iter->safe to 1.
dictNext
Advances the iterator and returns the next node.
```c
dictEntry *dictNext(dictIterator *iter)
{
    while (1) {
        if (iter->entry == NULL) {
            // The current node is NULL: either the very first iteration, or
            // the end of a chain was reached and we must move to another bucket
            dictht *ht = &iter->d->ht[iter->table]; // the current hash table
            if (iter->index == -1 && iter->table == 0) {
                // This is provably the first iteration step
                if (iter->safe) // safe iterator: pause the rehash process
                    dictPauseRehashing(iter->d);
                else            // otherwise record the fingerprint
                    iter->fingerprint = dictFingerprint(iter->d);
            }
            iter->index++; // advance the bucket index
            if (iter->index >= (long) ht->size) {
                // Past the end of this table: if a rehash is in progress and
                // we are still on ht[0], switch tables and reset the index
                if (dictIsRehashing(iter->d) && iter->table == 0) {
                    iter->table++;
                    iter->index = 0;
                    ht = &iter->d->ht[1]; // now iterate ht[1]
                } else {
                    break; // otherwise the iteration is finished
                }
            }
            iter->entry = ht->table[iter->index]; // bucket head node
        } else {
            iter->entry = iter->nextEntry; // advance along the chain
        }
        if (iter->entry) {
            // Save the next node: the caller may delete the returned one
            iter->nextEntry = iter->entry->next;
            return iter->entry; // return the current node
        }
    }
    return NULL; // end of iteration
}
```
dictReleaseIterator
Releases the specified iterator.
```c
void dictReleaseIterator(dictIterator *iter)
{
    if (!(iter->index == -1 && iter->table == 0)) { // iteration has started
        if (iter->safe) // a safe iterator must actively resume rehashing,
            dictResumeRehashing(iter->d); // i.e. d->pauserehash--
        else // otherwise verify the fingerprint
            assert(iter->fingerprint == dictFingerprint(iter->d));
    }
    zfree(iter); // free the iterator's space
}
```
dictGetRandomKey
Randomly get a hash node.
```c
dictEntry *dictGetRandomKey(dict *d)
{
    dictEntry *he, *orighe; // target node, and the original head node
    unsigned long h;
    int listlen, listele;

    if (dictSize(d) == 0) return NULL; // empty dictionary
    if (dictIsRehashing(d)) _dictRehashStep(d); // one incremental rehash step
    if (dictIsRehashing(d)) { // a rehash is in progress
        do {
            // Buckets 0 .. rehashidx-1 of ht[0] have no elements
            h = d->rehashidx + (randomULong() % (dictSlots(d) - d->rehashidx));
            he = (h >= d->ht[0].size) ? d->ht[1].table[h - d->ht[0].size] :
                                        d->ht[0].table[h];
        } while(he == NULL);
    } else {
        do {
            h = randomULong() & d->ht[0].sizemask;
            he = d->ht[0].table[h];
        } while(he == NULL);
    }

    // A non-empty bucket was found, but it is a linked list:
    // a random element must be picked from the chain
    listlen = 0;      // chain length
    orighe = he;      // remember the head node
    while(he) {       // walk next pointers to count the length
        he = he->next;
        listlen++;
    }
    listele = random() % listlen;   // random position in the chain
    he = orighe;                    // back to the head
    while(listele--) he = he->next; // walk to the target node
    return he;
}
```
dictGetSomeKeys
Sample the dictionary and return several keys at random.
```c
unsigned int dictGetSomeKeys(dict *d, dictEntry **des, unsigned int count) {
    // des is the output array of random nodes
    unsigned long j;       // internal hash table id, 0 or 1
    unsigned long tables;  // number of tables to sample, 1 or 2
    unsigned long stored = 0, maxsizemask;
    unsigned long maxsteps;

    if (dictSize(d) < count) count = dictSize(d); // clamp count if needed
    maxsteps = count*10; // at most 10 steps per requested key

    // Run a number of rehash steps proportional to count
    for (j = 0; j < count; j++) {
        if (dictIsRehashing(d)) // rehash state must still be checked each time
            _dictRehashStep(d);
        else
            break;
    }

    tables = dictIsRehashing(d) ? 2 : 1; // sample both tables while rehashing
    maxsizemask = d->ht[0].sizemask;     // indeed, the larger size mask
    if (tables > 1 && maxsizemask < d->ht[1].sizemask)
        maxsizemask = d->ht[1].sizemask;

    // Pick a random starting index within the larger hash table
    unsigned long i = randomULong() & maxsizemask;
    unsigned long emptylen = 0; // length of the run of empty buckets so far
    while(stored < count && maxsteps--) { // collect the requested number of nodes
        for (j = 0; j < tables; j++) { // 0|1
            // dictRehash invariant: buckets 0 .. rehashidx-1 of ht[0] have
            // already been migrated to ht[1] and are empty, so those
            // buckets of ht[0] can be skipped outright
            if (tables == 2 && j == 0 && i < (unsigned long) d->rehashidx) {
                // If the index is also out of range for the second table,
                // neither table has elements there: jump to rehashidx
                // (moving from the large table back into the small one)
                if (i >= d->ht[1].size)
                    i = d->rehashidx;
                else
                    continue;
            }
            if (i >= d->ht[j].size) continue; // out of range: try the other table
            dictEntry *he = d->ht[j].table[i];

            // Count contiguous empty buckets; after at least 5 of them
            // (and more than count), jump to another random location
            if (he == NULL) {
                emptylen++;
                if (emptylen >= 5 && emptylen > count) {
                    i = randomULong() & maxsizemask; // pick a new index
                    emptylen = 0;
                }
            } else {
                emptylen = 0;
                while (he) { // collect every node of this non-empty bucket
                    *des = he;
                    des++;       // advance the output pointer
                    he = he->next;
                    stored++;    // update the stored-node count
                    if (stored == count) return stored; // enough collected
                }
            }
        }
        i = (i+1) & maxsizemask; // advance the index
    }
    return stored;
}
```
dictGetFairRandomKey
Obtains a more evenly distributed random node; fairer than dictGetRandomKey because it samples across several buckets.
```c
dictEntry *dictGetFairRandomKey(dict *d) {
    dictEntry *entries[GETFAIR_NUM_ENTRIES]; // GETFAIR_NUM_ENTRIES is 15
    unsigned int count = dictGetSomeKeys(d,entries,GETFAIR_NUM_ENTRIES);
    // dictGetSomeKeys may return 0 elements even when the hash table is not
    // empty, so fall back to dictGetRandomKey to return at least one node.
    if (count == 0) return dictGetRandomKey(d);
    unsigned int idx = rand() % count; // pick one of the sampled nodes
    return entries[idx];
}
```
rev
Bit reversal, a utility function.
```c
static unsigned long rev(unsigned long v) {
    unsigned long s = CHAR_BIT * sizeof(v); // bit size; must be a power of 2
    unsigned long mask = ~0UL;              // start with an all-ones mask
    while ((s >>= 1) > 0) {                 // halve the swap width each round
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}
```
dictScan
Iterates over the elements of the dictionary.
```c
unsigned long dictScan(dict *d,
                       unsigned long v,
                       dictScanFunction *fn,
                       dictScanBucketFunction* bucketfn,
                       void *privdata)
{
    dictht *t0, *t1;
    const dictEntry *de, *next;
    unsigned long m0, m1;

    if (dictSize(d) == 0) return 0;

    // Pause rehashing unconditionally, without checking whether a rehash
    // is actually in progress
    dictPauseRehashing(d);

    if (!dictIsRehashing(d)) {
        t0 = &(d->ht[0]);
        m0 = t0->sizemask;

        // Invoke the bucket callback at the current cursor position
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);
        de = t0->table[v & m0];
        while (de) {
            next = de->next;
            fn(privdata, de); // node callback
            de = next;        // continue along the chain
        }

        /* Set unmasked bits so incrementing the reversed cursor
         * operates on the masked bits */
        v |= ~m0;

        /* Increment the reverse cursor */
        v = rev(v);
        v++;
        v = rev(v);

    } else {
        t0 = &d->ht[0];
        t1 = &d->ht[1];

        /* Make sure t0 is the smaller and t1 is the bigger table */
        if (t0->size > t1->size) {
            t0 = &d->ht[1];
            t1 = &d->ht[0];
        }

        m0 = t0->sizemask;
        m1 = t1->sizemask;

        /* Emit entries at cursor */
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);
        de = t0->table[v & m0];
        while (de) {
            next = de->next;
            fn(privdata, de);
            de = next;
        }

        /* Iterate over indices in larger table that are the expansion
         * of the index pointed to by the cursor in the smaller table */
        do {
            /* Emit entries at cursor */
            if (bucketfn) bucketfn(privdata, &t1->table[v & m1]);
            de = t1->table[v & m1];
            while (de) {
                next = de->next;
                fn(privdata, de);
                de = next;
            }

            /* Increment the reverse cursor not covered by the smaller mask.*/
            v |= ~m1;
            v = rev(v);
            v++;
            v = rev(v);

            /* Continue while bits covered by mask difference is non-zero */
        } while (v & (m0 ^ m1));
    }

    dictResumeRehashing(d); // resume rehashing
    return v;
}
```
dictEmpty
Reset dictionary.
```c
void dictEmpty(dict *d, void(callback)(void*)) {
    _dictClear(d,&d->ht[0],callback);
    _dictClear(d,&d->ht[1],callback);
    d->rehashidx = -1;
    d->pauserehash = 0;
}
```
dictEnableResize
Enable dictionary sizing parameter, dict_can_resize = 1.
dictDisableResize
Disable dictionary sizing parameter, dict_can_resize = 0.
dictGetHash
Get the hash value of the key and directly call dictHashKey.
dictFindEntryRefByPtrAndHash
Find references to nodes based on pointers and hash values?
```c
dictEntry **dictFindEntryRefByPtrAndHash(dict *d, const void *oldptr, uint64_t hash) {
    dictEntry *he, **heref;
    unsigned long idx, table;

    if (dictSize(d) == 0) return NULL;
    for (table = 0; table <= 1; table++) {
        idx = hash & d->ht[table].sizemask;
        heref = &d->ht[table].table[idx];
        he = *heref;
        while(he) {
            if (oldptr==he->key) // compare key pointers directly
                return heref;
            heref = &he->next;
            he = *heref;
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}
```
Summary of this chapter
From this partial reading of the source we can see that the dictionary is implemented on top of a hash table, which C does not provide natively; Redis implements its own, using the structure dict > ht > table > head pointer. Hash collisions (two or more keys assigned to the same index of the hash table array) are an important problem, and they are resolved here by separate chaining: every hash node carries a next pointer, so nodes assigned to the same index form a singly linked list, and lookups inevitably walk that list. This is also why new nodes are always inserted at the head of the list: walking to the tail just to append would be wasteful.
When hash nodes are added or removed, a rehash may be triggered to resize the hash table accordingly, and ht[1] is used for it. When expanding in 6.2, ht[1] is sized to the first 2^n greater than or equal to ht[0].used+1 (see _dictExpandIfNeeded); when shrinking via dictResize, to the first 2^n greater than or equal to ht[0].used, with DICT_HT_INITIAL_SIZE as the floor. Rounding to a power of two avoids frequent readjustment of the table and the waste that would come with it. The rehash itself recomputes the hash and index of every key, places each key-value pair in the corresponding position of ht[1], migrates everything from ht[0] to ht[1], frees ht[0], moves ht[1] into ht[0], and resets ht[1] as a blank table for next time. The migration is not done in one shot but incrementally: with too many key-value pairs, a one-shot rehash would impact the machine's performance too heavily.