Hash lookup - a way to handle conflicts

The following contents are from the data structure (2nd Edition) by grandma Chen Yue. The notes are for your reference only.

Open addressing

The so-called open addressing method is to find another empty hash address once there is a conflict, that is, when the address has stored other elements. In an unfilled hash table, an empty hash address can always be found.

Lazy delete: add a "delete tag" instead of actually deleting it. Because during the lookup operation, finding an empty address means that the lookup fails, but in fact, maybe the data objects hashed here have bypassed here and exist elsewhere.

Generally speaking, when the i-th conflict occurs, the next address we test will increase di. That is, h1(key)=(h(key) + di) mod TableSize. According to the different selection methods of Di, the following four conflict resolution methods can be obtained.

Linear detection method

di=i in the above formula becomes the linear detection method, that is, the linear detection method circularly tests the next storage address with the increment sequence 1, 2,... (TableSize-1).

When inserting, you need to find an empty position or know that the hash table is full; During the search operation, detect a keyword and compare it once until a specific data object is found or an empty position is detected, indicating that the search fails.

Primary Clustering: the phenomenon that many elements "pile up" on adjacent hash addresses, which greatly reduces the search efficiency.

Square detection method

The square detection method is to test the next storage address with the increment sequence 12, - 12, 22, - 22,..., q2, - q2, Q < = [TableSize / 2]. If the length of the hash table, TableSize, is a prime number in the form of 4k+3, the square detection method can detect the entire hash table space.

Secondary clustering: data objects hashed to the same address will detect the same candidate unit.

Type declaration of open addressing method:

#define MAXTABLESIZE 100000 / / maximum hash table length allowed
typedef int ElementType;//int for keyword type
typedef int Index;//int for hash address type
typedef Index Position;//The data location is in int
typedef enum{Legitimate,Empty,Deleted}EntryType;
//Hash cell status type, corresponding to legal element, empty cell and deleted element respectively

typedef struct HashEntry Cell;//Type of hash table cell
struct HashEntry {
	ElementType Data;//Store elements
	EntryType Info;//Unit status
};

typedef struct TblNode *HashTable;//Hash table type
struct TblNode {//Hash table node definition
	int TableSize;//Maximum length of table
	Cell *Cells;//An array that holds hash cell data
};

Initialization function of open addressing method:

int NextPrime(int N)
{//Returns the minimum prime number greater than N and not exceeding MAXTABLESIZE
	int i;
	int p=(N%2)?N+2:N+1;//Start with the next odd number greater than N
	while(p<=MAXTABLESIZE) {
		for(i=(int)sqrt(p);i>2;i--) 
			if(!(p%i)) break;//p is not prime
		if(i==2) break;//The for loop ends normally, indicating that p is a prime number
		else p+=2;
	}
	return p;
}

HashTable CreateTable(int TableSize)
{
	HashTable H;
	int i;
	H=(HashTable)malloc(sizeof(struct TblNode));
	H->TableSize=NextPrime(TableSize);
	//The Table Size passed in by the user is not necessarily a prime number. The maximum length of the hash table is guaranteed to be a prime number
	H->Cells=(Cell*)malloc(H->TableSize*sizeof(Cell));
	for(int i=0;i<H->TableSize;i++)
		H->Cells[i].Info=Empty;
	return H;
}

Search function of square detection method:

Position Find(HashTable H,ElementType Key)
{
	Position CurrentPos,NewPos;
	int CNum=0;//CNum records the number of conflicts
	NewPos=CurrentPos=Hash(Key,H->TableSize);//Initial hash position
	//A conflict occurs when the cell at this location is not empty and is not the element to be found
	//Suppose the keyword key here is of type int
	while(H->Cells[NewPos].Info!=Empty&&H->Cells[NewPos].Data!=Key) {
		if(++CNum%2) {//If it's an odd number of conflicts
			NewPos=CurrentPos+(CNum+1)*(CNum+1)/4;
			if(NewPos >= H->TableSize)//Adjust to legal address
				NewPos=NewPos % H->TableSize;
		}
		else {//If it is an even number of conflicts
		NewPos=CurrentPos-CNum*CNum/4;
		while(NewPos < 0)//Adjust to legal address
			NewPos+=H->TableSize;
			}
	}
	return NewPos;
	//At this time, the NewPos is either the location of the key or the location of an empty cell
}

Insertion function of square detection method:

bool Insert(HashTable H,ElementType Key) {
	Position Pos=Find(H,key);

	if(H->Cells[Pos].Info != Legitimate) {
		H->Cells[Pos].Info=Legitimate;
		H->Cells[Pos],Data=key;
		return true;
	}
	else {
		printf("Key value already exists");
		return false;
	}
}

Double hash detection

hi(key) = (h(key)+i*h2(key)) mod TableSize;

If the second hash function h2(key) is not selected well, the result will be disastrous. It is required that h2(key) cannot be 0 for any key. In addition, the detection increment sequence should ensure that all hash storage units should be detected. p - (the prime p of the key is less than H2, MOD) = good effect.

Re hashing method

The loading factor of open addressing method will seriously affect the search efficiency. When the loading factor is too large, you can double expand the hash table, so that the loading factor can be reduced by half, that is, Rehashing.

Separate link method

The practice of Separate Chaining is to store all data objects with keywords as synonyms in the same single linked list through node links.

Structure declaration of separation link method:

#define KEYLENGTH 15
typedef char ElementType[KEYLENGTH+1];
typedef int Index;

//Definition of single linked list
typedef struct LNode *PtrToLNode;
struct LNode {
	ElementType Data;
	PtrToLNode Next;
};
typedef PtrToLNode Position;
typedef PtrToLNode List;

typedef struct TblNode *HashTable;//Hash table type
struct TblNode {//Hash table node definition
	int TableSize;//Maximum length of table
	List Heads;//Array pointing to the node of the chain header
};

Initialization function of split link method:

HashTable CreateTable(int TableSize)
{
	HashTable H;
	int i;

	H=(HashTable)malloc(sizeof(struct TblNode));
	H->TableSize=NextPrime(TableSize);
	H->Heads=(List)malloc(sizeof(struct LNode));
	for(i=0;i<H->TableSize;i++) {
		H->Heads[i].Data[0]='\0';
		H->Heads[i].Next=NULL;
	}
	return H;
}

Lookup function of split link method:

Position Find(HashTable H,ElementType Key) 
{//Suppose the keyword Key here is character type
	Position P;
	Index Pos;
	
	Pos=Hash(Key,H->TableSize);//Initial hash position
	P=H->Heads[Pos].Next;//Start from the first node of the linked list
	while(P&&strcmp(P->Data,Key)) //When the end of the table is not reached and the Key is not found
		P=P->Next;

	return P;
}

Insert function of separate link method:

bool Insert(HashTable H,ElementType Key)
{
	
	P=Find(H,Key);
	if(!P) {
		NewCell=(Position)malloc(sizeof(struct LNode));
		strcpy(NewCell->Data,Key);
		Pos=Hash(Key,H->TableSize);
		//Insert NewCell as the first node of the H - > heads [POS] linked list
		NewCell->Next=H->Heads[Pos].Next;
		H->Heads[Pos].Next=NewCell;
		return true;
	}
	else {
	printf("Key value already exists");
	return false;
	}
}

Release hash function of split link method:

void DestroyTable(HashTable H)
{
	int i;
	Position P,Tmp;

	//Release the node of each linked list
	for(i=0;i<H->TableSize;i++) {
		P=H->Heads[i].Next;
		while(P) {
			Tmp=P->Next;
			free(P);
			P=Tmp;
		}
	}
	free(H->Heads);//Release header node array
	free(H);//Release hash node
 }

Keywords: data structure

Added by SoundreameR on Tue, 01 Feb 2022 08:19:37 +0200

Programming VIP

Hash lookup - a way to handle conflicts

Open addressing

Separate link method

Popular Keywords