Preface
This article does not dig deeply into the principles of HashSet. Instead it offers some observations on HashSet from the perspective of its underlying implementation, together with a concrete alternative. If you spot any mistakes, please point them out. Thank you.
HashSet features
Element uniqueness: HashSet is implemented on top of HashMap, so the elements of a HashSet are stored as the keys of a HashMap, while every value is the same shared object: `private static final Object PRESENT = new Object();`
Unordered: an array + hash implementation, with no extra data structure to maintain ordering, cannot guarantee the order of elements.
Allows at most one null element.
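These three properties can be demonstrated quickly with the standard `java.util.HashSet` (the class name below is just for this demo):

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetFeaturesDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("a"));  // true: newly added
        System.out.println(set.add("a"));  // false: duplicates are rejected
        System.out.println(set.add(null)); // true: one null is allowed
        System.out.println(set.size());    // 2
    }
}
```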
HashSet underlying implementation
The HashSet constructor simply creates an instance of HashMap, as its source shows:
```java
public HashSet() {
    map = new HashMap<>();
}
```
The everyday CRUD operations delegate directly to HashMap methods and are omitted here; understanding them means reading the HashMap source, which rewards patient, repeated study.
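As a sketch of that delegation (a minimal reimplementation for illustration, not the JDK source verbatim), the set methods simply forward to an internal map, storing the shared `PRESENT` object as every value:

```java
import java.util.HashMap;

// Minimal sketch: a set that forwards everything to a backing HashMap,
// mirroring how java.util.HashSet delegates (field names follow the JDK)
class MiniHashSet<E> {
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    public boolean add(E e)      { return map.put(e, PRESENT) == null; }
    public boolean remove(E e)   { return map.remove(e) == PRESENT; }
    public boolean contains(E e) { return map.containsKey(e); }
    public int size()            { return map.size(); }
}
```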
Own views on HashSet
The underlying behavior of HashSet depends entirely on HashMap's design, even though the two collections serve quite different purposes. HashSet operates only on the map's keys, with every value replaced by the single shared Object; this carries overhead that HashSet itself never needs, and every CRUD operation pays it. It looks as though the designers reused HashMap directly out of convenience.
Personal view
We can design independently around HashSet's own characteristics while still borrowing HashMap's ideas: drop the value reference entirely and implement the set directly as an array + hash. Overall this is only a light optimization, discarding the machinery that HashSet operations never use, in order to improve efficiency.
A brief efficiency comparison of array + hash
CustomArray, built on the array + hash idea, outperformed HashSet in a simple test:
```
CustomArray add 1,000,000 elements:     32 ms
HashSet     add 1,000,000 elements:     123 ms
CustomArray iterate 1,000,000 elements: 11 ms
HashSet     iterate 1,000,000 elements: 22 ms
```
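For context, numbers like these come from a hand-rolled timing loop; a rough sketch of such a harness for the HashSet side follows. The class and variable names are my own, and timings vary by machine and JVM warm-up, so a tool like JMH would be more rigorous:

```java
import java.util.HashSet;
import java.util.Set;

public class RoughBenchmark {
    public static void main(String[] args) {
        final int n = 1_000_000;

        long start = System.nanoTime();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(i);
        }
        long addMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        long sum = 0; // checksum keeps the loop from being optimized away
        for (int v : set) {
            sum += v;
        }
        long iterMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("add: " + addMs + " ms, iterate: " + iterMs
                + " ms (checksum " + sum + ")");
    }
}
```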
Implementation of array + Hash
The overall idea is consistent with that of HashMap.
Key fields of the structure
```java
// Default initial capacity: 16
private static final int DEFAULT_CAPACITY = 1 << 4;
private static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Maximum allowed array length
private static final int MAXIMUM_CAPACITY = 1 << 30;
/**
 * Resize threshold: threshold = initialCapacity * loadFactor
 */
private int threshold;
/**
 * Load factor
 */
private float loadFactor;
/**
 * Array length
 */
private int initialCapacity;
/**
 * Number of valid elements in the array
 */
private int size = 0;
Object[] elements;
```
Constructors
```java
public CustomArray(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0) {
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    }
    if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    }
    if (initialCapacity > MAXIMUM_CAPACITY) {
        initialCapacity = MAXIMUM_CAPACITY;
    }
    this.loadFactor = loadFactor;
    this.threshold = (int) (initialCapacity * loadFactor);
    this.initialCapacity = initialCapacity;
    elements = new Object[initialCapacity];
}

public CustomArray() {
    this(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR);
}

public CustomArray(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
```
hash
```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
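To see what the XOR spread achieves: with a power-of-two capacity, `capacity - 1` acts as a bit mask that keeps only the low bits of the hash, so folding the high 16 bits into the low bits lets them influence the slot choice. The demo below (names are my own) prints the spread hash and the resulting slot:

```java
public class HashSpreadDemo {
    // Same spreading function as in the article's structure
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16; // power of two, so (capacity - 1) = 0b1111 is a valid mask
        for (String s : new String[]{"a", "b", "hello"}) {
            int h = hash(s);
            System.out.println(s + ": hash=" + h + ", index=" + (h & (capacity - 1)));
        }
    }
}
```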
Adding elements
```java
public void put(E key) {
    putVal(hash(key), key);
}

/**
 * Insert: first check whether the target slot is occupied.
 * If the slot is empty, size + 1 is checked against the threshold to decide
 * whether to resize; if a different element already occupies the slot (a hash
 * collision), it is simply overwritten, which is the lossy behavior this
 * design accepts.
 *
 * Note: the raw hash must be reduced to a valid array index. Assuming a
 * power-of-two capacity, a bit mask does this (indexing with the raw hash
 * would overflow the array).
 *
 * @param hash the key's spread hash value
 * @param key  the element to add
 */
private void putVal(int hash, E key) {
    int index = (elements.length - 1) & hash;
    Object curTempKey = elements[index];
    if (curTempKey != null) {
        // Slot occupied by a different element: overwrite on collision
        if (!curTempKey.equals(key)) {
            elements[index] = key;
        }
    } else {
        if (++size > threshold) {
            resize();
        }
        // Recompute the index: the array may have grown during resize()
        elements[(elements.length - 1) & hash] = key;
    }
}
```
Get element
```java
public E get(int index) {
    // index is a slot position in the backing array; empty slots return null.
    // The bound is the array length, not size, because elements are scattered
    // across the array rather than packed into the first `size` slots.
    if (index < 0 || index >= elements.length) {
        throw new ArrayIndexOutOfBoundsException("Index: " + index + ", Length: " + elements.length);
    }
    return (E) elements[index];
}
```
Removing Elements
```java
public boolean remove(E key) {
    // Reduce the hash to a slot index (assumes a power-of-two capacity)
    int index = (elements.length - 1) & hash(key);
    if (elements[index] == null) {
        return false;
    }
    elements[index] = null;
    size--;
    return true;
}
```
Capacity expansion mechanism
```java
/**
 * Doubles the array once the threshold is exceeded.
 */
private void resize() {
    Object[] oldArray = elements;
    int oldCap = (oldArray == null) ? 0 : oldArray.length;
    if (oldCap >= MAXIMUM_CAPACITY) {
        // Already at the maximum capacity; cannot grow any further
        System.err.println("Maximum array capacity reached, cannot resize again: " + threshold);
        threshold = Integer.MAX_VALUE;
        return;
    }
    if (oldCap > 0) {
        int newCap = oldCap << 1;
        if (newCap > MAXIMUM_CAPACITY) {
            newCap = MAXIMUM_CAPACITY;
        }
        threshold = (int) (newCap * loadFactor);
        // Slot indices depend on the capacity mask, so every element must be
        // rehashed into the new array; a plain Arrays.copyOf would leave
        // elements at indices the new mask no longer maps to
        Object[] newArray = new Object[newCap];
        for (Object e : oldArray) {
            if (e != null) {
                newArray[(newCap - 1) & hash(e)] = e;
            }
        }
        elements = newArray;
    }
}
```
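Putting the fragments together, here is one runnable assembly of the idea, with two assumptions of mine made explicit: the capacity stays a power of two so the raw hash can be masked into a slot index, and collisions overwrite the occupant, as the design above accepts:

```java
// Assembled sketch of the array + hash set described above.
// Assumptions: power-of-two capacity (mask indexing), overwrite on collision.
public class CustomArray<E> {
    private static final int DEFAULT_CAPACITY = 1 << 4;
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;
    private static final int MAXIMUM_CAPACITY = 1 << 30;

    private int threshold;
    private final float loadFactor;
    private int size = 0;
    private Object[] elements;

    public CustomArray() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        this.threshold = (int) (DEFAULT_CAPACITY * loadFactor);
        this.elements = new Object[DEFAULT_CAPACITY];
    }

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public void put(E key) {
        if (size + 1 > threshold) {
            resize();
        }
        int index = (elements.length - 1) & hash(key);
        if (elements[index] == null) {
            size++; // only a fresh slot grows the set
        }
        elements[index] = key; // overwrite on collision, per the design above
    }

    public boolean contains(E key) {
        Object cur = elements[(elements.length - 1) & hash(key)];
        return cur != null && cur.equals(key);
    }

    public boolean remove(E key) {
        int index = (elements.length - 1) & hash(key);
        if (elements[index] == null || !elements[index].equals(key)) {
            return false;
        }
        elements[index] = null;
        size--;
        return true;
    }

    public int size() {
        return size;
    }

    private void resize() {
        if (elements.length >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        int newCap = elements.length << 1;
        threshold = (int) (newCap * loadFactor);
        Object[] newArr = new Object[newCap];
        for (Object e : elements) {
            if (e != null) {
                newArr[(newCap - 1) & hash(e)] = e; // re-index for the new mask
            }
        }
        elements = newArr;
    }

    public static void main(String[] args) {
        CustomArray<String> set = new CustomArray<>();
        set.put("a");
        set.put("a"); // duplicate: size stays 1
        set.put("b");
        System.out.println(set.size() + " " + set.contains("a") + " " + set.remove("b"));
    }
}
```

Note that unlike HashMap, this sketch has no collision chaining: two distinct elements whose hashes map to the same slot cannot coexist, which is the price of the simplification the article proposes.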