ArrayList of Java collection source code and related problems

Introduction to ArrayList

ArrayList can be said to be the most commonly used collection. Its essence is an array, a dynamic array that can be automatically expanded. The thread is unsafe and the element is allowed to be null.

Because the memory of the array is continuous, the elements can be read and written in O(1) time according to the subscript, so the time efficiency is very high.

Internal properties of ArrayList

//Serialize UID. Because ArrayList implements the Serializable interface, a UID convenient for serialization and deserialization is added.
private static final long serialVersionUID = 8683452581122892189L;
//Default capacity 10
private static final int DEFAULT_CAPACITY = 10;
//An empty array
private static final Object[] EMPTY_ELEMENTDATA = {};
//Default empty array.
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
//For arrays that really store elements, variables modified by transient cannot be serialized and deserialized.
transient Object[] elementData;
//Number of current storage elements
private int size;

Three construction methods of ArrayList

//Empty parameter construction, just copy the empty array created in the attribute
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

//The passed in parameter is the collection length
public ArrayList(int initialCapacity) {
    //If the length is greater than 0, an array of corresponding size is created and assigned to the real array
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
    //If the length is equal to 0, copy the empty array created in the property    
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    }
}
    
//The passed in parameter is a collection
public ArrayList(Collection<? extends E> c) {
    	//First convert the collection to an array
        elementData = c.toArray();
    	//If the array length is not equal to 0, enter if
        if ((size = elementData.length) != 0) {
            //If c.toArray() makes an error and does not return Object [], use arrays The copy method passes the elements in the set C to the array
            if (elementData.getClass() != Object[].class)
                elementData = Arrays.copyOf(elementData, size, Object[].class);
        } else {
            //If the number is 0, copy the empty array in the attribute
            this.elementData = EMPTY_ELEMENTDATA;
        }
    }

Common API s for ArrayList

increase

1. add(E e)

First, let's talk about the add method with only one parameter (elements added to the collection). Let's go step by step:

Step 1: call the add (E) method

public boolean add(E e) {
    ensureCapacityInternal(size + 1);
    elementData[size++] = e;
    return true;
}

This method is divided into two steps. The first step is to call the ensureCapacityInternal(size + 1) method, and the second step is assignment.

Step 2: call the ensureCapacityInternal(size + 1) method to determine the required capacity of the current array

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    ensureExplicitCapacity(minCapacity);
}

In this step, judge whether the current array is the array created during null parameter construction, that is, DefaultAttribute_ EMPTY_ ELEMENTDATA, then compare the default length of the array and the length after the current addition element, extract the maximum value, and then call ensureExplicitCapacity(minCapacity) to determine whether the expansion is required.

Step 3: call the ensureCapacityInternal(size + 1) method to determine whether capacity expansion is needed

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

In this method, modCount + + is executed first. This variable literally means the number of modifications. Where modCount is used, it is thread unsafe. As long as we modify the structure and length of the set... We will modify the modCount value, and then when traversing the set again, we will judge whether the current modCount is equal, If it is not equal, it proves that other threads have modified the collection during traversal. Then, judge whether the current required capacity is greater than the length of the array. If it is greater than, execute the grow th method to expand the capacity.

Step 4: call the grow (minCapacity) method to expand the capacity

private void grow(int minCapacity) {
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    elementData = Arrays.copyOf(elementData, newCapacity);
}

When describing this method, you should first know that minCapacity is the required capacity of the current array; oldCapacity is the capacity of the current array; newCapacity is the capacity after 1.5x capacity expansion. Then look at the two if statements in the method. First: if the current capacity required by minCapacity is not reached after capacity expansion, the required length and size will be used directly, and the capacity expansion will not be continued for 1.5 times; Second: if the length after capacity expansion is greater than the maximum length of the array, call the hugeCapacity(minCapacity) method to determine the length of the array. If everything is ready, call arrays The copyof (elementdata, newCapacity) method copies the original array and the carrying length to the new array. The expansion is over.

Let's talk about the hugeCapacity(minCapacity) method:

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
}

If minCapacity is greater than the maximum capacity, the new capacity is integer MAX_ Value, otherwise, the new capacity size is MAX_ARRAY_SIZE is integer MAX_ VALUE - 8.

add(int index, E element)

The second is the add method with two parameters. The first parameter is the inserted index and the second is the inserted data.

public void add(int index, E element) {
    if (index > size || index < 0)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
    ensureCapacityInternal(size + 1); 
    System.arraycopy(elementData, index, elementData, index + 1,size - index);
    elementData[index] = element;
    size++;
}

There are three steps in this method,

  • The first step is to call the ensureCapacityInternal(size + 1) method to judge and operate the capacity expansion.
  • The second step is to call system The arraycopy (elementdata, index, elementdata, index + 1, size - index) method separates the array from the subscript index to facilitate insertion.
  • The third step is to assign a value at the corresponding subscript index.

addAll(Collection<? extends E> c)

The third is to add a set of data.

public boolean addAll(Collection<? extends E> c) {
    Object[] a = c.toArray();
    int numNew = a.length;
    ensureCapacityInternal(size + numNew); 
    System.arraycopy(a, 0, elementData, size, numNew);
    size += numNew;
    return numNew != 0;
}

This method is divided into four steps:

  • Step 1: convert the incoming set into an array and find the length of the converted array.
  • Step 2: call the ensureCapacityInternal(size + numNew) method, and judge whether to expand the capacity according to the calculated length and the length of the existing array.
  • Step 3: add all the contents of the new array from size to the elementData array.
  • Step 4: update the array length size.

addAll(int index, Collection<? extends E> c)

The last way to increase is to add an entire set at the specified subscript.

public boolean addAll(int index, Collection<? extends E> c) {
    if (index > size || index < 0)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
    Object[] a = c.toArray();
    int numNew = a.length;
    ensureCapacityInternal(size + numNew);  // Increments modCount
    int numMoved = size - index;
    if (numMoved > 0)
        System.arraycopy(elementData, index, elementData, index + numNew,numMoved);
    System.arraycopy(a, 0, elementData, index, numNew);
    size += numNew;
    return numNew != 0;
}

There are six steps in this method:

  • Step 1: judge the length relationship between index and the original array, and throw an exception if it is not logical.
  • Step 2: convert the incoming set into an array and find the length of the converted array.
  • Step 3: call the ensureCapacityInternal(size + numNew) method, and judge whether to expand the capacity according to the calculated length and the length of the existing array.
  • Step 4: move the original array backward from index.
  • Step 5: add all the contents of the new array to the elementData array from index.
  • Step 6: update the array length size.

Summary

  • Whether it is add or addAll, first judge whether it is out of bounds. If it is out of bounds, expand the capacity, and then move the array
  • If capacity expansion is required, the original general size will be expanded by default; If it's not enough, take the size of the target as the size after capacity expansion

Delete

remove(int index)

The first deletion method is the remove(int index) method, which deletes the set element of the specified subscript

public E remove(int index) {
    if (index >= size)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
	//Step 1: to modify the structure and length of the set, modify modCount first
    modCount++;
    //Step 2: get the element of the subscript
    E oldValue = (E) elementData[index];
	//Step 3: use numMoved to determine whether the deleted element is the last element. If not, move the element after index forward
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,numMoved);
    //Step 4: set the last element to null
    elementData[--size] = null;
    return oldValue;
}

remove(Object o)

The second is the remove(Object o) method, which deletes the specified element in the collection (only one is deleted)

public boolean remove(Object o) {
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}

The two main for loops of this method are to find whether there is data in the collection that is the same as the incoming parameters. The main deletion operation is to call the fastRemove(index) method:

private void fastRemove(int index) {
    modCount++;
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,numMoved);
    elementData[--size] = null; // clear to let GC do its work
}

This method is almost the same as the remove (int index) method, and the steps are almost the same.

  • Step 1: to modify the structure and length of the set, modify modCount first
  • Step 2: use numMoved to determine whether the deleted element is the last element. If not, move the element after index forward
  • Step 3: set the last element to null

removeAll(Collection<?> c)

Just now, that is to delete the same element in the collection. As the name suggests, this removeAll is to delete all data in the collection that is the same as the parameter.

public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    return batchRemove(c, false);
}

We can see that this method actually calls the batchRemove(c, false) method for operation.

private boolean batchRemove(Collection<?> c, boolean complement) {
    final Object[] elementData = this.elementData;
    int r = 0, w = 0;
    boolean modified = false;
    try {
        //Step 1: put the non duplicate data in the two sets in front
        for (; r < size; r++)
            if (c.contains(elementData[r]) == complement)
                elementData[w++] = elementData[r];
    } finally {
        //Step 2: empty all the following elements
        if (r != size) {
            System.arraycopy(elementData, r,elementData, w,size - r);
            w += size - r;
        }
        if (w != size) {
            // clear to let GC do its work
            for (int i = w; i < size; i++)
                elementData[i] = null;
            modCount += size - w;
            size = w;
            modified = true;
        }
    }
    return modified;
}

retainAll(Collection<?> c)

This delete operation means that only the intersection of two sets is preserved. This method is very similar to the removeAll (collection <? > C) method, except that a parameter complex is changed. The previous one is to put all different elements in front, while this one is to put all the same elements in front, and then empty the following data.

public boolean retainAll(Collection<?> c) {
    Objects.requireNonNull(c);
    return batchRemove(c, true);
}

private boolean batchRemove(Collection<?> c, boolean complement) {
    final Object[] elementData = this.elementData;
    int r = 0, w = 0;
    boolean modified = false;
    try {
        //Step 1: put the duplicate data in the two sets in front
        for (; r < size; r++)
            if (c.contains(elementData[r]) == complement)
                elementData[w++] = elementData[r];
    } finally {
        //Step 2: empty all the following elements
        if (r != size) {
            System.arraycopy(elementData, r,elementData, w,size - r);
            w += size - r;
        }
        if (w != size) {
            // clear to let GC do its work
            for (int i = w; i < size; i++)
                elementData[i] = null;
            modCount += size - w;
            size = w;
            modified = true;
        }
    }
    return modified;
}

clear()

This delete operation is to clear all the elements in the collection

public void clear() {
    modCount++;
    // Directly traverse each bit, then empty each bit and let the GC clean it up
    for (int i = 0; i < size; i++)
        elementData[i] = null;
    size = 0;
}

Summary

The direct normal array deletion operation is to judge whether it is the end. If so, it will be directly assigned as null. If not, all the contents behind the array will be moved forward by one bit, and then the last will be assigned as null. All deletion operations will modify modCount.

change

set(int index, E element)

This is the only way to change the operation, that is, to modify the data of the specified subscript.

public E set(int index, E element) {
    //Step 1: judge whether the size relationship between index and size is reasonable
    if (index >= size)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
    //Step 2: take out the original element
    E oldValue = (E) elementData[index];
    //Step 3: put in the elements to be changed
    elementData[index] = element;
    return oldValue;
}

modCount does not need to be modified

check

get(int index)

public E get(int index) {
    if (index >= size)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));

    return (E) elementData[index];
}

It's very simple. You don't need to change modCount.

contains(Object o)

This method determines whether a collection contains an element.

public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

This method mainly calls indexOf(o). Let's take a look:

public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

This method is to traverse the collection to find whether there are the same elements. If there are, it will return the subscript. If there is no, it will return - 1 to the contents method for comparison with 0.

Empty judgment method isEmpty()

public boolean isEmpty() {
    return size == 0;
}

This method can directly judge whether the size is 0

Volume reduction trimToSize()

This method is used to change the length of a set to the number and size of elements in the set

public void trimToSize() {
    //Step 1: modify modCount
    modCount++;
    //Step 2: if the size is 0, you can set empty directly_ Elementdata is copied to the past, otherwise, arrays. Is called Copyof method copy
    if (size < elementData.length) {
        elementData = (size == 0)? EMPTY_ELEMENTDATA : Arrays.copyOf(elementData, size);
    }
}

Iterator iterator

What is an Iterator?

Iterator is an interface that provides a unified access mechanism for various data structures. As long as the iterator interface is deployed for any data structure, the traversal operation can be completed (that is, all members of the data structure can be processed in turn).

Iterator is a special object:
1. It has the next() method. Calling this method will return a result object
2. The result object has two attribute values: value and done.
3. value represents the specific return value; done is a boolean type, indicating whether the collection has completed traversal. If not, it returns true; otherwise, it returns false
4. There is a pointer inside, pointing to the starting position of the data structure. Each time the next() method is called, the pointer moves back one position until it points to the last position

The function of iterator:

  1. Provide a unified and simple access interface for various data structures;
  2. Enable the members of the data structure to be arranged in a certain order;
  3. Create a new traversal command for... of loop, and the Iterator interface is mainly used for for... of consumption.

Simple use of iterators

Collection<String> coll = new ArrayList<String>();	//polymorphic
coll.add("abc1");
coll.add("abc2");
coll.add("abc3");
coll.add("abc4");
// Iterator to fetch the elements in the collection ArrayList
// Call the iterator() method of the collection to get the object of the implementation class of the Iterator interface
Iterator<String> it = coll.iterator();
// The interface implements the class object and calls the method hasNext() to judge whether there are elements in the collection
// boolean b = it.hasNext();
// System.out.println(b);
// The implementation class object of the interface, and call the method next() to get the elements in the collection
// String s = it.next();
// System.out.println(s);
// Iteration is the repeated content, which is implemented by loop. The termination condition of loop: there is no element in the set, and hasNext() returns false
while (it.hasNext()) {
	String s = it.next();
	System.out.println(s);
}

Output is:

abc1
abc2
abc3
abc4

Iterator source code

Construction method (create iterator)

public Iterator<E> iterator() {
    // Construct Itr object and return
    return new Itr();
}

This method creates an Itr object and returns it. Let's look at the source code of Itr ().

Properties in Itr

//Number of elements in the array
protected int limit = ArrayList.this.size; 
//Subscript of the next element
int cursor;
//The subscript of the last returned element
int lastRet = -1; 
//Save modCount to determine whether the collection has been modified
int expectedModCount = modCount;

hasNext() method

public boolean hasNext() {
    return cursor < limit;
}

This is to judge whether the subscript of the next element is less than the number of current array elements of limit. If less than, it returns true to prove that there is the next element.

next() method

Return data

public E next() {
    // Judge whether the structure of the List has been modified. If so, throw an exception
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    int i = cursor;
    // If you cross the line, throw an exception
    if (i >= limit)
        throw new NoSuchElementException();
    Object[] elementData = ArrayList.this.elementData;
    // Again, judge whether it is out of bounds. It is possible that an asynchronous thread modified the List during our operation here
    if (i >= elementData.length)
        throw new ConcurrentModificationException();
    // Mark plus 1
    cursor = i + 1;
    // Returns data and sets the last subscript
    return (E) elementData[lastRet = i];
}

remove() method

public void remove() {
    if (lastRet < 0)
        throw new IllegalStateException();
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();

    try {
        // Call the remove method of ArrayList to remove the data
        ArrayList.this.remove(lastRet);
        // Update a series of data
        cursor = lastRet;
        lastRet = -1;
        expectedModCount = modCount;
        limit--;
    } catch (IndexOutOfBoundsException ex) {
        throw new ConcurrentModificationException();
    }
}

summary

Why is the capacity expansion factor of ArrayList 1.5?

When k=1.5, we can make full use of the space already released in front. If k > = 2, the new capacity is just right, which is always greater than the capacity of all abandoned arrays in the past.

Why not expand the fixed capacity?

The purpose of capacity expansion needs to comprehensively consider these two situations:

The expansion capacity should not be too small to prevent frequent expansion, frequent application of memory space + frequent replication of arrays

The expansion capacity should not be too large. It is necessary to make full use of space to avoid wasting too much space;

For the expansion of fixed capacity, it is difficult to determine how much value is appropriate, and it is not appropriate to take any specific value, because the amount of data required is often determined by the client of the array in the specific application scenario. It depends on the quantity * coefficient currently used, which is more in line with the actual application scenario.

For example, now that I have used the capacity of an array 100, it is likely that this order of magnitude of data needs to be inserted.

Why 1.5 instead of 1.2, 1.25, 1.8 or 1.75?

Because 1.5 can make full use of shift operation to reduce floating-point number or operation time and operation times.

Why is the maximum array length MAX_ARRAY_size is integer MAX_ VALUE - 8

As an object, array needs a certain amount of memory to store object header information. The maximum memory occupied by object header information cannot exceed 8 bytes.

What the hell is modCount

When looking at the source code of ArrayList and HashMap, it is found that in many cases, as long as the addition and deletion of elements are involved, it will be accompanied by the + + of modCount. So why do you have this modCount?

modCount literally means the number of modifications. Where modCount is used, such as ArrayList, hashMap, LinkedList, etc., they are ready-made and unsafe. Therefore, as long as the structure is modified, modCount + + will be added, and then the traversal will judge whether the current modCount is equal, If it is not equal, it proves that other threads have modified the collection during traversal.

Fial fast mechanism

We all know that these collections are thread unsafe. If other threads modify the collections during the use of iterators, they will throw a ConcurrentModificationException. This is the fail fast strategy. At this time, the source code operates through modCount. When the iterator is created, it will create a variable equal to the modCount at that time. If the collection changes during the iteration, the modCount is + +. At this time, the value of the variable in the iterator is not equal to modCount, so throw an exception.

Therefore, when traversing thread unsafe collections, try to use iterators

How to implement thread safe ArrayList

  1. All the places that involve changing modCount are worth adding synchronized
  2. Use collections directly synchronizedList
  3. Using Vector
  4. Replace ArrayList with CopyOnWriteArrayList

The difference between ArrayList and Vector

  • ArrayList thread is unsafe, Vector thread is safe. null values are allowed.
  • The default size is 10
  • During capacity expansion, ArrayList defaults to 1.5 times the original size, and Vector defaults to 2 times the original size (customizable).

ArrayList and LinkedList

  • Null values are allowed.
  • ArrayList is implemented by array and has capacity expansion operation. LinkedList is a linked list implementation, a two-way linked list.
  • ArrayList has good get/set performance and LinkedList has good insertion and deletion performance. However, this is not the case. If you insert from the head, the performance of LinkedList is better than ArrayList; Inserting from the middle, ArrayList is much better than LinkedList; Inserting from the tail, ArrayList is better than LinkedList.
  • LinkedList also supports the API of stack and queue, so it can also be used as stack and queue.
  • The traversal of ArrayList is simply from 0, while LinkedList will judge whether the current value is the first half or the second half, and the corresponding traversal will start from the beginning or the end.

Keywords: Java

Added by marmite on Sat, 29 Jan 2022 02:04:02 +0200