List of Java Collection Source Analysis: ArrayList_A Little Classroom (Multi-shore College)

With so much preparation, we finally got to ArrayList, which is our most frequently used collection class. Let's first look at how the document describes it:

Resizable-array implementation of the List interface. Implements all optional list operations, and permits all elements, including null. In addition to implementing the List interface, this class provides methods to manipulate the size of the array that is used internally to store the list. (This class is roughly equivalent to Vector, except that it is unsynchronized.)

As you can see, ArrayList is a copy of Vector, except for thread safety. Vector is not recommended for various reasons, so we will not analyze it here. ArrayList is a dynamically resizable List implementation whose data sequence is consistent with the insertion order and the rest of the features are consistent with those defined in List.

ArrayList Inheritance Structure

As you can see, ArrayList is a subclass of AbstractList and implements the List interface. In addition, it also implements three identifying interfaces, none of which has any method, and the implementation class only has some function as an identifying representation class. Random Access means that implementation classes support fast random access, Cloneable means that implementation classes support cloning, specifically by rewriting clone methods, and java.io.Serializable means that serialization is supported. If you need to customize this process, you can rewrite writeObject and readObject methods.

When you ask questions about ArrayList in a general interview, you may ask what the initial size of ArrayList is. Many people may call parametric constructors directly when initializing ArrayList and never pay attention to this problem. For example, get an object like this:

ArrayList<String> strings = new ArrayList<>();

As we all know, ArrayList is based on arrays, and arrays are fixed in length. So why can ArrayList insert either one or ten thousand data without specifying the length? Recall the first sentence of the document just now:

Resizable-array implementation of the List interface.

ArrayList can be dynamically resized, so we can insert multiple data without perception, which also means that it must have a default size. If you want to expand the size of an array, you can only replicate it. In this way, the default size and how to dynamically adjust the size will have a great impact on performance. Let's give an example to illustrate this situation.

For example, the default size is 5, and we insert five pieces of data into the ArrayList, which does not involve expansion. If you want to insert 100 pieces of data, you need to adjust the size of ArrayList to 100 before inserting, which involves a copy of the array. What if you want to insert another 50 pieces of data at this time? Then you have to adjust the size to 150, copy the original 100 pieces of data, and insert the new 50 pieces of data. Since then, every time we insert a piece of data into it, we have to involve a data copy, and the larger the amount of data, the more data we need to copy, the faster the performance will decline.

In fact, ArrayList is just an encapsulation of array operations, which takes some measures to avoid the above problems. If we do not use these measures, it is not much different from using arrays directly. Let's see what measures ArrayList uses and how to use them. Let's start with initialization.

Construction Method and Initialization

ArrayList has three constructions, using two member variables.

//This is an array used to mark storage capacity and also to store actual data.
//When ArrayList is expanded, its capacity is the length of the array.
//By default, it is empty, and when the first element is added, it extends directly to DEFAULT_CAPACITY, or 10.
//The difference between this and size is that the expansion of ArrayList is not as much as it needs.
transient Object[] elementData;

//Here is the actual number of data stored.
private int size;

In addition to the above two member variables, we also need to master a variable, which is

protected transient int modCount = 0;

The main function of this variable is to prevent changing the size of the ArrayList during some operations, which will make the results unpredictable.

Let's look at the constructor:

//Default constructor. The document states that its default size is 10, but as the elementData definition says,
//Only after inserting a piece of data will it expand to 10, which is empty by default.
 public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

//With an initial size constructor, once the size is specified, elementData is no longer the original mechanism.
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
    }
}

//Construct an ArrayList with initialization data from another Collection.
//Here you can see that size represents the amount of data stored.
//It also demonstrates the abstract charm of Collection, which can be transformed between different structures.
public ArrayList(Collection<? extends E> c) {
    //The most important transformation is toArray(), which is defined in Collection.
    elementData = c.toArray();
    if ((size = elementData.length) != 0) {
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        // replace with empty array.
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

Important methods

ArrayList is already a concrete implementation class, so all the methods defined in the List interface are implemented here. Some of the methods implemented in AbstractList are rewritten here again, and we will see the difference later.

First look at some simple ways:

//Remember the implementation in AbstractList? That's based on Iterator.
//There's absolutely no need to switch to Iterator before you operate here.
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

//It's the same as indexOf.
 public int lastIndexOf(Object o) {
    //...
}

//In the same way, we already have all the elements, so we don't need to use Iterator to get them anymore.
//Notice that elementData is truncated to size when returned here
public Object[] toArray() {
    return Arrays.copyOf(elementData, size);
}

//With type conversion, see here a[size] = null; this is really not very useful unless you are sure that all elements are not empty.
//Only by null can we judge how much useful data we get.
public <T> T[] toArray(T[] a) {
    if (a.length < size)
        // Given insufficient data length, copy a new one and return it
        return (T[]) Arrays.copyOf(elementData, size, a.getClass());
    System.arraycopy(elementData, 0, a, 0, size);
    if (a.length > size)
        a[size] = null;
    return a;
}

The most important thing in data manipulation is to add, delete, modify and check, which does not involve the change of length, while adding and deleting involves the dynamic adjustment of size. Let's first see how the modification and check can be realized:

private void rangeCheck(int index) {
    if (index >= size)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}

//Just get the data between 0 and size.
public E get(int index) {
    rangeCheck(index);

    return elementData(index);
}

//Change the value of the corresponding position
public E set(int index, E element) {
    rangeCheck(index);

    E oldValue = elementData(index);
    elementData[index] = element;
    return oldValue;
}

Addendum and delete are the most important parts of ArrayList. This part of the code needs our careful study. Let's see how it handles the problems in our example:

//Add an element at the end
public boolean add(E e) {
    //Make sure the elementData array is long enough first
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

public void add(int index, E element) {
    rangeCheckForAdd(index);

    //Make sure the elementData array is long enough first
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    //Move the data one bit backwards and insert it after emptying the position.
    System.arraycopy(elementData, index, elementData, index + 1,
                         size - index);
    elementData[index] = element;
    size++;
}

Both of the above ways of adding data call the ensureCapacityInternal method. Let's see how it works:

//As mentioned in defining elementData, inserting the first data expands it directly to 10
private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    
    //Here I hand in my work again.
    ensureExplicitCapacity(minCapacity);
}

//If the length of elementData does not meet the requirement, it needs to be extended
private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

//expansion
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    //You can see that it's 1.5 times larger here.
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    
    //After the expansion, it is still not satisfied. At this time, it will be expanded directly to minCapacity.
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    //Preventing spillovers
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

So far, we have a thorough understanding of the expansion mechanism of ArrayList. First, create an empty array elementData, which expands directly to 10 when inserting data for the first time. Then if the length of elementData is insufficient, it expands 1.5 times. If the expansion is not enough, the length needed is used as the length of elementData.

This approach is obviously better than our example, but it copies data frequently when encountering large amounts of data. So how can we alleviate this problem? ArrayList offers us two possible solutions:

  • Using the parametric structure of ArrayList (int initial capacity) to declare a larger size at the time of creation solves the problem of frequent copying, but requires us to predict the order of magnitude of data ahead of time, which will also occupy a large amount of memory.
  • In addition to the automatic expansion when adding data, we can also do one expansion before insertion. As long as the order of magnitude of data is predicted in advance, it can be expanded in place directly once needed. Compared with ArrayList (int initial capacity), the advantage is that it does not need to occupy a large amount of memory all the time, and the number of data copies is greatly reduced. This method is ensureCapacity(int minCapacity), which calls ensure Capacity Internal (int minCapacity) internally.

There are other important functions, and the principles of their implementation are quite similar. Here we do not analyze them one by one, but we will list them for use.

//Set elementData to the same size as size, freeing up all unused memory
public void trimToSize() {
    //...
}

//Delete elements at specified locations
public E remove(int index) {
    //...
}

//Delete by element itself
public boolean remove(Object o) {
    //...
}

//Add some elements at the end
public boolean addAll(Collection<? extends E> c) {
    //...
}

//From the specified location, add some elements
public boolean addAll(int index, Collection<? extends E> c){
    //...
}

//Delete elements within a specified range
protected void removeRange(int fromIndex, int toIndex){
    //...
}

//Delete all elements contained in c
public boolean removeAll(Collection<?> c) {
    //...
}

//Retain only all elements contained in c
public boolean retainAll(Collection<?> c) {
    //...
}

ArrayList also optimizes the ListIterator and SubList implementations of the parent level, mainly using location access elements, so we won't look at them any more.

Other implementations

ArrayList not only implements all the functions defined in List, but also implements equals, hashCode, clone, writeObject and readObject. These methods need to be compatible with the stored data, otherwise the result will be wrong or the cloned data is only a shallow copy, or the data itself does not support serialization and so on, which we should pay attention to when defining the data. Let's look at what it customizes when serializing.

//Here we can unravel our confusion that elementData is transient ly modified, that is, it does not participate in serialization.
//Here we see that the data is written one by one, and the size is also written in.
private void writeObject(java.io.ObjectOutputStream s)
    throws java.io.IOException{
    // Write out element count, and any hidden stuff
    int expectedModCount = modCount;
    s.defaultWriteObject();

    // Write out size as capacity for behavioural compatibility with clone()
    s.writeInt(size);

        // Write out all elements in the proper order.
    for (int i=0; i<size; i++) {
        s.writeObject(elementData[i]);
    }

    //The function of modCount is shown here. If a modification is made during serialization, an exception will be thrown.
    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}

readObject is an opposite process, that is to restore the data correctly and set up elementData. Interested in it, you can read the source code by yourself.

summary

Overall, ArrayList, like arrays, is more suitable for random data access than for large numbers of insertions and deletions. If insertions must be performed, the following three ways should be used:

  • Using the parametric construction of ArrayList (int initial capacity), a larger size is declared at creation time.
  • Ensure Capacity (int minCapacity) is used to expand before insertion.
  • With LinkedList, there's nothing wrong with this. We'll soon introduce this collection class suitable for addition and deletion.

Thank you for seeing it. If you can help me, please give me a compliment.

More experience and technology are welcome to come and learn together. A Little Classroom - Online Learning Platform for Dreams http://www.yidiankt.com/

[Pay attention to the public number and reply to "1" for free - [java Core Knowledge Points]

QQ discussion group: 616683098

QQ: 3184402434

Students who want to study in depth can study and discuss with QQ ~and have a full set of resources to share, experience to explore, wait for you!

Keywords: Programming Java REST

Added by KashKurcura on Tue, 10 Sep 2019 07:58:03 +0300