The following contents are from the data structure (2nd Edition) by grandma Chen Yue. The notes are for your reference only.
Open addressing
The so-called open addressing method is to find another empty hash address once there is a conflict, that is, when the address has stored other elements. In an unfilled hash table, an empty hash address can always be found.
Lazy delete: add a "delete tag" instead of actually deleting it. Because during the lookup operation, finding an empty address means that the lookup fails, but in fact, maybe the data objects hashed here have bypassed here and exist elsewhere.
Generally speaking, when the i-th conflict occurs, the next address we test will increase di. That is, h1(key)=(h(key) + di) mod TableSize. According to the different selection methods of Di, the following four conflict resolution methods can be obtained.
- Linear detection method
di=i in the above formula becomes the linear detection method, that is, the linear detection method circularly tests the next storage address with the increment sequence 1, 2,... (TableSize-1).
When inserting, you need to find an empty position or know that the hash table is full; During the search operation, detect a keyword and compare it once until a specific data object is found or an empty position is detected, indicating that the search fails.
Primary Clustering: the phenomenon that many elements "pile up" on adjacent hash addresses, which greatly reduces the search efficiency.
- Square detection method
The square detection method is to test the next storage address with the increment sequence 12, - 12, 22, - 22,..., q2, - q2, Q < = [TableSize / 2]. If the length of the hash table, TableSize, is a prime number in the form of 4k+3, the square detection method can detect the entire hash table space.
Secondary clustering: data objects hashed to the same address will detect the same candidate unit.
Type declaration of open addressing method:
#define MAXTABLESIZE 100000 / / maximum hash table length allowed typedef int ElementType;//int for keyword type typedef int Index;//int for hash address type typedef Index Position;//The data location is in int typedef enum{Legitimate,Empty,Deleted}EntryType; //Hash cell status type, corresponding to legal element, empty cell and deleted element respectively typedef struct HashEntry Cell;//Type of hash table cell struct HashEntry { ElementType Data;//Store elements EntryType Info;//Unit status }; typedef struct TblNode *HashTable;//Hash table type struct TblNode {//Hash table node definition int TableSize;//Maximum length of table Cell *Cells;//An array that holds hash cell data };
Initialization function of open addressing method:
int NextPrime(int N) {//Returns the minimum prime number greater than N and not exceeding MAXTABLESIZE int i; int p=(N%2)?N+2:N+1;//Start with the next odd number greater than N while(p<=MAXTABLESIZE) { for(i=(int)sqrt(p);i>2;i--) if(!(p%i)) break;//p is not prime if(i==2) break;//The for loop ends normally, indicating that p is a prime number else p+=2; } return p; } HashTable CreateTable(int TableSize) { HashTable H; int i; H=(HashTable)malloc(sizeof(struct TblNode)); H->TableSize=NextPrime(TableSize); //The Table Size passed in by the user is not necessarily a prime number. The maximum length of the hash table is guaranteed to be a prime number H->Cells=(Cell*)malloc(H->TableSize*sizeof(Cell)); for(int i=0;i<H->TableSize;i++) H->Cells[i].Info=Empty; return H; }
Search function of square detection method:
Position Find(HashTable H,ElementType Key) { Position CurrentPos,NewPos; int CNum=0;//CNum records the number of conflicts NewPos=CurrentPos=Hash(Key,H->TableSize);//Initial hash position //A conflict occurs when the cell at this location is not empty and is not the element to be found //Suppose the keyword key here is of type int while(H->Cells[NewPos].Info!=Empty&&H->Cells[NewPos].Data!=Key) { if(++CNum%2) {//If it's an odd number of conflicts NewPos=CurrentPos+(CNum+1)*(CNum+1)/4; if(NewPos >= H->TableSize)//Adjust to legal address NewPos=NewPos % H->TableSize; } else {//If it is an even number of conflicts NewPos=CurrentPos-CNum*CNum/4; while(NewPos < 0)//Adjust to legal address NewPos+=H->TableSize; } } return NewPos; //At this time, the NewPos is either the location of the key or the location of an empty cell }
Insertion function of square detection method:
bool Insert(HashTable H,ElementType Key) { Position Pos=Find(H,key); if(H->Cells[Pos].Info != Legitimate) { H->Cells[Pos].Info=Legitimate; H->Cells[Pos],Data=key; return true; } else { printf("Key value already exists"); return false; } }
- Double hash detection
hi(key) = (h(key)+i*h2(key)) mod TableSize;
If the second hash function h2(key) is not selected well, the result will be disastrous. It is required that h2(key) cannot be 0 for any key. In addition, the detection increment sequence should ensure that all hash storage units should be detected. p - (the prime p of the key is less than H2, MOD) = good effect.
- Re hashing method
The loading factor of open addressing method will seriously affect the search efficiency. When the loading factor is too large, you can double expand the hash table, so that the loading factor can be reduced by half, that is, Rehashing.
Separate link method
The practice of Separate Chaining is to store all data objects with keywords as synonyms in the same single linked list through node links.
Structure declaration of separation link method:
#define KEYLENGTH 15 typedef char ElementType[KEYLENGTH+1]; typedef int Index; //Definition of single linked list typedef struct LNode *PtrToLNode; struct LNode { ElementType Data; PtrToLNode Next; }; typedef PtrToLNode Position; typedef PtrToLNode List; typedef struct TblNode *HashTable;//Hash table type struct TblNode {//Hash table node definition int TableSize;//Maximum length of table List Heads;//Array pointing to the node of the chain header };
Initialization function of split link method:
HashTable CreateTable(int TableSize) { HashTable H; int i; H=(HashTable)malloc(sizeof(struct TblNode)); H->TableSize=NextPrime(TableSize); H->Heads=(List)malloc(sizeof(struct LNode)); for(i=0;i<H->TableSize;i++) { H->Heads[i].Data[0]='\0'; H->Heads[i].Next=NULL; } return H; }
Lookup function of split link method:
Position Find(HashTable H,ElementType Key) {//Suppose the keyword Key here is character type Position P; Index Pos; Pos=Hash(Key,H->TableSize);//Initial hash position P=H->Heads[Pos].Next;//Start from the first node of the linked list while(P&&strcmp(P->Data,Key)) //When the end of the table is not reached and the Key is not found P=P->Next; return P; }
Insert function of separate link method:
bool Insert(HashTable H,ElementType Key) { P=Find(H,Key); if(!P) { NewCell=(Position)malloc(sizeof(struct LNode)); strcpy(NewCell->Data,Key); Pos=Hash(Key,H->TableSize); //Insert NewCell as the first node of the H - > heads [POS] linked list NewCell->Next=H->Heads[Pos].Next; H->Heads[Pos].Next=NewCell; return true; } else { printf("Key value already exists"); return false; } }
Release hash function of split link method:
void DestroyTable(HashTable H) { int i; Position P,Tmp; //Release the node of each linked list for(i=0;i<H->TableSize;i++) { P=H->Heads[i].Next; while(P) { Tmp=P->Next; free(P); P=Tmp; } } free(H->Heads);//Release header node array free(H);//Release hash node }