Some views on HashSet based on its underlying implementation + characteristics

preface

This article will not elaborate much on the principles of HashSet; it only offers some views on HashSet based on its underlying implementation and its characteristics. Please point out any mistakes. Thank you.

HashSet features

Element uniqueness: because HashSet is implemented on top of HashMap, the elements of a HashSet are stored as the keys of the HashMap, while every value is the same fixed object, private static final Object PRESENT = new Object(); (a sketch of this follows the list)
Disorder: without additional data to maintain ordering, an array + hash implementation cannot guarantee the order of the elements.
A null value is allowed (at most one, since keys are unique).
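
For reference, this is how add() looks in the JDK source (slightly abridged): every element becomes a key of the internal map, and the shared PRESENT object is reused as the value for all of them.

    //Abridged from java.util.HashSet
    private transient HashMap<E,Object> map;
    //Dummy value shared by every key in the backing map
    private static final Object PRESENT = new Object();

    public boolean add(E e) {
        //put() returns null only when the key was not present yet, so add() returns true only for new elements
        return map.put(e, PRESENT) == null;
    }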

HashSet underlying implementation

When a HashSet is created, the underlying layer directly creates an instance of HashMap. You can see this by looking at the constructor, as shown below:

    public HashSet() {
        map = new HashMap<>();
    }

The daily CRUD operations simply delegate to the corresponding HashMap methods, which are omitted here (a few of them are sketched below); you need to read the source code of HashMap itself, so be patient and go over it several times.
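
For a quick impression of that delegation, here are a few HashSet methods as they appear in the JDK source (abridged):

    //Abridged from java.util.HashSet: every operation is forwarded to the backing HashMap
    public boolean contains(Object o) {
        return map.containsKey(o);
    }
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }
    public int size() {
        return map.size();
    }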

My own views on HashSet

The underlying behavior of HashSet depends entirely on the design of HashMap. Generally speaking, the two differ greatly: HashSet only operates on the keys of the Map, and every value is replaced by the single fixed Object instance, which quietly adds overhead beyond what HashSet's functionality needs, and the CRUD operations also carry this extra weight. It looks as though the designers reused the HashMap implementation directly simply to save themselves effort.

Personal view
We can design independently according to the characteristics of a HashSet while still borrowing the ideas of HashMap: drop the value reference and implement the HashSet functionality directly in the form of array + hash. Overall only a brief optimization is carried out here; the steps that a HashSet operation does not need are discarded, so as to improve the efficiency of the HashSet.

The actual efficiency of array + hash is also briefly compared below.

CustomArray is implemented based on the idea of array + hash, and in these operations it is more efficient than HashSet:

    test CustomArray Time to add millions of data: 32 ms
    test HashSet Time to add millions of data: 123 ms
    test CustomArray Time to iterate over millions of data: 11 ms
    test HashSet Time to iterate over millions of data: 22 ms
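
The original test code is not shown, so here is only a minimal sketch of how the add comparison might be run. The class name CompareTest, the use of sequential Integer keys and the System.currentTimeMillis() timing are all assumptions, and CustomArray is assumed to be declared generic (CustomArray<E>):

    import java.util.HashSet;

    public class CompareTest {
        public static void main(String[] args) {
            int count = 1_000_000;

            //Time adding one million sequential Integer keys to CustomArray
            CustomArray<Integer> customArray = new CustomArray<>();
            long start = System.currentTimeMillis();
            for (int i = 0; i < count; i++) {
                customArray.put(i);
            }
            System.out.println("test CustomArray Time to add millions of data: "
                    + (System.currentTimeMillis() - start) + " ms");

            //Time the same insertions into a plain HashSet for comparison
            HashSet<Integer> hashSet = new HashSet<>();
            start = System.currentTimeMillis();
            for (int i = 0; i < count; i++) {
                hashSet.add(i);
            }
            System.out.println("test HashSet Time to add millions of data: "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }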

Implementation of array + Hash

The overall idea is consistent with that of HashMap.

Important properties of this structure

    //Default initial capacity: 16
    private static final int DEFAULT_CAPACITY = 1 << 4;
    //Default load factor
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;
    //Maximum capacity allowed for the array
    private static final int MAXIMUM_CAPACITY = 1 << 30;
    /**
     *   Resize threshold = initialCapacity * loadFactor
     */
    private int threshold;
    /**
     *   Load factor
     */
    private float loadFactor;
    /**
     *   Array capacity (length)
     */
    private int initialCapacity;
    /**
     *  Number of valid elements in the array
     */
    private int size = 0;
    //The backing array; the keys themselves are the stored elements
    Object[] elements;

constructor

	 public CustomArray(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        }
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new IllegalArgumentException("Illegal load factor: " +
                    loadFactor);
        }
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY;
        }
        this.loadFactor = loadFactor;
        this.threshold = (int) (initialCapacity * loadFactor);
        this.initialCapacity = initialCapacity;
        elements = new Object[initialCapacity];
    }
    public CustomArray() {
        this(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR);
    }
    public CustomArray(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
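
A brief usage sketch of the three constructors (again assuming the class is declared as a generic CustomArray<E>):

    CustomArray<String> defaults = new CustomArray<>();              //capacity 16, load factor 0.75, threshold 12
    CustomArray<String> sized    = new CustomArray<>(1 << 10);       //capacity 1024, default load factor
    CustomArray<String> tuned    = new CustomArray<>(1 << 10, 0.5f); //threshold 512, so it resizes earlier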

hash

    /**
     *   Spread the high 16 bits of hashCode() into the low bits, the same perturbation HashMap.hash() uses
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
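
The hash is later bounded to an array index with (capacity - 1) & hash, the same trick HashMap uses, so spreading the high bits matters: keys whose hash codes differ only in their high bits would otherwise all land in the same slot of a small table. A tiny illustration with two made-up hash codes:

    //Two hypothetical hash codes that differ only in their high 16 bits
    int h1 = 0x10000, h2 = 0x20000;
    //Without spreading, both collide at slot 0 of a 16-slot table
    System.out.println((h1 & 15) + ", " + (h2 & 15));                                  //0, 0
    //After spreading the high bits down, they land in different slots
    System.out.println(((h1 ^ (h1 >>> 16)) & 15) + ", " + ((h2 ^ (h2 >>> 16)) & 15));  //1, 2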

Adding elements

    public void put(E key) {
        putVal(hash(key), key);
    }

    /**
     *   Insertion: first bound the hash to a valid index, then check whether a hash collision occurs
     *      If the slot is empty, size+1 and verify whether capacity expansion is required
     *      If the slot already holds a different key (collision), it is overwritten directly and size is unchanged
     *      If the slot already holds an equal key, nothing is done (this is what keeps elements unique)
     * @param hash  hash value of the key
     * @param key   new element
     */
    private void putVal(int hash, E key) {
        //Bound the hash to the table, the same way HashMap computes (n - 1) & hash
        int index = (elements.length - 1) & hash;
        Object curTempKey = elements[index];
        if (null != curTempKey) {
            //The slot is occupied by a different key, so it is overwritten (the collision policy of this structure)
            if (!curTempKey.equals(key)) {
                elements[index] = key;
            }
        } else {
            if (++size > threshold) {
                resize();
                //The capacity changed, so recompute the index against the new table
                index = (elements.length - 1) & hash;
            }
            elements[index] = key;
        }
    }
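
A small usage sketch of the uniqueness and overwrite-on-collision behaviour described above (it assumes CustomArray is declared generic; the keys 1 and 17 are chosen only because they collide in the default 16-slot table):

    CustomArray<Integer> set = new CustomArray<>();  //default capacity 16
    set.put(1);    //hash(1)  = 1,  slot (16 - 1) & 1  = 1
    set.put(17);   //hash(17) = 17, slot (16 - 1) & 17 = 1: same slot, so 17 overwrites 1
    set.put(17);   //an equal key is already in the slot: nothing happens, no duplicate is stored

Unlike HashMap, which chains colliding entries, a colliding key here silently replaces the previous element; that is the trade-off this simplified structure makes for speed.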

Get element

    @SuppressWarnings("unchecked")
    public E get(int index) {
        //Elements are scattered by hash, so the bound is the array length rather than size
        if (index < 0 || index >= elements.length) {
            throw new ArrayIndexOutOfBoundsException("Index: " + index + ", Length: " + elements.length);
        }
        return (E) elements[index];
    }

Removing Elements

    public boolean remove(E key) {
        int index = (elements.length - 1) & hash(key);
        //Only remove when the slot actually holds the given key
        if (elements[index] == null || !elements[index].equals(key)) {
            return false;
        }
        elements[index] = null;
        size--;
        return true;
    }

Capacity expansion mechanism

    /**
     *   Capacity expansion
     */
    private void resize() {
        Object[] oldArray = elements;
        int oldCap = (oldArray == null) ? 0 : oldArray.length;
        //TODO: once the maximum capacity has been reached, an additional policy is required
        if (oldCap >= MAXIMUM_CAPACITY) {
            System.err.println("The maximum capacity of the array has been reached and it cannot be expanded again: " + threshold);
            threshold = MAXIMUM_CAPACITY;
            return;
        }
        int newCap = oldCap << 1;
        if (newCap > MAXIMUM_CAPACITY) {
            newCap = MAXIMUM_CAPACITY;
        }
        threshold = (int) (newCap * loadFactor);
        //Indices depend on the capacity ((capacity - 1) & hash), so existing elements must be rehashed into the new table
        Object[] newArray = new Object[newCap];
        for (Object e : oldArray) {
            if (e != null) {
                newArray[(newCap - 1) & hash(e)] = e;
            }
        }
        elements = newArray;
    }
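
To make the expansion trigger concrete, a small walk-through with the default settings (capacity 16, load factor 0.75, threshold 12), again assuming a generic CustomArray<E>:

    CustomArray<Integer> set = new CustomArray<>();  //capacity 16, threshold 12
    for (int i = 0; i < 13; i++) {
        //Adding the 13th distinct element pushes size to 13 > 12, so resize() runs:
        //the array doubles to 32 and the threshold becomes (int) (32 * 0.75) = 24
        set.put(i);
    }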
