Design hash set of leetcode daily question
Title Link: https://leetcode-cn.com/problems/design-hashset/
Topic Description: design a hash set without using any built-in hash table library.
Implement the MyHashSet class:
void add(key) inserts the value key into the hash set.
bool contains(key) Returns whether the value key exists in the hash set.
void remove(key) deletes the given value key from the hash set. If there is no such value in the hash set, nothing is done.
analysis:
In order to realize the data structure of hash set, the following key problems need to be solved:
-
Hash function: it can map any possible element in the collection to a fixed range of integer value, and store the element in the address corresponding to the integer value.
-
Conflict handling: since different elements may be mapped to the same integer value, conflict handling is required when there is a "conflict" between integer values. In general, there are several strategies to resolve conflicts:
-
Chain address method: maintain a linked list for each hash value, and put elements with the same hash value into this linked list.
-
Open address method: when a conflict is found at the hash value hh, find the next non conflicting location from hh according to a certain strategy. For example, one of the simplest strategies is to constantly check the positions of integers such as h+1,h+2,h+3,\ldotsh+1,h+2,h+3.
-
Hashing method: when a hash conflict is found, another hash function is used to generate a new address.
-
-
Capacity expansion: when there are too many hash table elements, the probability of conflict will be greater and greater, and the efficiency of querying an element in the hash table will be lower and lower. Therefore, it is necessary to open up a larger space to alleviate the conflicts in the hash table.
package com.tao.hashtable; import java.util.Iterator; import java.util.LinkedList; /** * @Classname MyHashSet * @Description Using double linked list to design hash set * Java The HashMap of the standard library is basically realized by the zipper method. * The implementation of zipper method is relatively simple, combining linked list and array. In other words, create a linked list array, and each cell in the array is a linked list. * If hash conflicts are encountered, the conflicting values can be added to the linked list. * <p> * Implementation steps * <p> * *Get a key * *Calculate hashValue of key * *Locate data[hashValue] according to the hashValue value. (data[hashValue] is a linked list) * *If data[hashValue] is empty, insert it directly * *Otherwise, it will be added to the end of the linked list * It should be noted here that the hash function must ensure the uniform distribution of hash values. If all hash values are concentrated in one linked list, the time complexity is the same as that of the sequential linked list. * <p> * Another point is the size of the array. If you can estimate the size of the data, you can specify it directly. Otherwise, you need to expand the array dynamically. * int[] newArray = new int [array.length*2]; * //Array array from position 0 to array Length position, copy to newArray array 0 Length position. * System.arraycopy(array,0,newArray,0,array.length); * * @Date 2021/8/4 21:36 * @Author Anonymous */ @SuppressWarnings("all") public class MyHashSet2 { public static void main(String[] args) { } private static final int BASE = 769;//Reference value as hash modulo operation private LinkedList<Integer>[] data;//Each cell in the array is a linked list. /** * Initialize your data structure here. */ public MyHashSet2() { data = new LinkedList[BASE]; for (int i = 0; i < data.length; i++) { data[i] = new LinkedList<>(); } } /* * @Author Anonymous * @Description //Add elements to the collection * @Date 10:04 2021/8/9 * @Param [key] Represents the value to insert * @return void **/ public void add(int key) { int hash = hashValue(key);//Calculate the hash value first Iterator<Integer> iterator = data[hash].iterator();//Get the iterator of the linked list where the hash is located while (iterator.hasNext()) {//Traverse the linked list where the hash value is the hash Integer next = iterator.next();//Get the next value if (next == key) {//If there are duplicate elements in it, it will be returned directly return; } } data[hash].offerLast(key);//If there are no duplicate elements, they are added to the end of the linked list } /* * @Author Anonymous * @Description //Remove element from collection * @Date 10:05 2021/8/9 * @Param [key] Represents the element to delete * @return void **/ public void remove(int key) { int hash = hashValue(key);//Get hash value Iterator<Integer> iterator = data[hash].iterator(); while (iterator.hasNext()) { Integer next = iterator.next();//Get the next element if (next == key) { data[hash].remove(next); return; } } } /** * Returns true if this set contains the specified element */ public boolean contains(int key) { int hash = hashValue(key); Iterator<Integer> iterator = data[hash].iterator(); while (iterator.hasNext()) { Integer next = iterator.next(); if (next == key) { return true; } } return false; } /* * @Author Anonymous * @Description // * If the size of the hash table is base, a simple hash function can be designed: hash(x)=x mod base. * We open up an array with the size of base, and each position of the array is a linked list. After calculating the hash value, it is inserted into the linked list of the corresponding position. * Since we use integer division as the hash function, in order to avoid conflicts as much as possible, we should take base as a prime number. Here, we take base=769. * @Date 10:06 2021/8/9 * @Param [value] Hash of the inserted number to be calculated * @return int Hash value **/ public int hashValue(int value) { return value % BASE; } }
Checking the modulus of the lower prime number actually uses the concept of congruence: when the element is a regular arithmetic sequence and the maximum common divisor of the cardinality (array size) is not 1, the conflict during hash mapping will become higher (some positions of the array will never have values). For example, the number sequence 0,6,12,18,24,30,
- base is 10. After modulo (0,6,2,8,4,0...) is taken, the position in the hash table can only be in the array positions of 0,2,4,6,8;
- However, if we take the base as 7 (the array size is even smaller than 10), after taking the modulus of the same sequence (0,6,5,4,3,2,1,0,...), it can be distributed in all positions of 0,1,2,3,4,5,6 in the hash table;
Follow up: if the maximum convention of x and y is z, the minimum common multiple of x and y is (xy)/z. obviously, if z is 1, that is, when the maximum common divisor of two numbers is 1, the minimum common multiple of two numbers is xy.
Then, when a number is a prime number, except its own multiple, its remainder and its maximum common divisor will be 1. At this time, any number (except multiple) selected by the step length can meet the uniform distribution of the bucket.
Therefore, calculating the position of hash value in the bucket by modulus is that each position in the hash table can be "useful" when a prime number is used as the base.
Hash has many uses. When we use Ngnix for load balancing, we also use hash. In general, if the data is evenly distributed, you can consider using hash to process the data at this time.
Of course, prime numbers do not have to be 769: you can refer to: https://planetmath.org/goodhashtableprimes