This time, thoroughly understand the implementation principle of SparseArray

Recently, when sorting out the knowledge of sparsearay, I found that most articles on sparsearay principle analysis on the Internet have many problems (it can be said that many authors do not understand the source code of sparsearay). Therefore, this article came into being. We know that SparseArray and ArrayMap are efficient K-V data structures in Android, and they are also frequent visitors in Android interviews. It is necessary to understand their implementation principle. This article takes SparseArray's source code as an example for in-depth analysis.

1, Class structure of SparseArray

SparseArray can be translated into a sparse Array, which can be literally understood as a loose discontinuous Array. Although it is called Array, it is a data structure for storing K-V. Where Key can only be of type int, while Value is of type Object. Let's look at its class structure:

public class SparseArray<E> implements Cloneable {
    // The value used to mark here has been deleted
    private static final Object DELETED = new Object();
    // Used to mark whether an element has been removed
    private boolean mGarbage = false;
    // A collection used to store key s
    private int[] mKeys;
    // The collection used to store value
    private Object[] mValues;
    // Number of elements stored
    private int mSize;
    
    // The default initial capacity is 10
    public SparseArray() {
        this(10);
    }

    public SparseArray(int initialCapacity) {
        if (initialCapacity == 0) {
            mKeys = EmptyArray.INT;
            mValues = EmptyArray.OBJECT;
        } else {
            mValues = ArrayUtils.newUnpaddedObjectArray(initialCapacity);
            mKeys = new int[mValues.length];
        }
        mSize = 0;
    }
    
    // ... Omit other codes

}

You can see that SparseArray only implements the clonable interface, not the Map interface, and SparseArray internally maintains an int array and an Object array. The parametric structure is called in the no parameter construction method, and its initial capacity is set to 10.

2, remove() method of SparseArray

Is it strange? As a container class, how can I remove it first without first talking about the put method? This is because some operations of the remove method will affect the put operation. The put method is easier to understand only if you understand remove first. Let's look at the code of remove:

// SparseArray
public void remove(int key) {
    delete(key);
}

public void delete(int key) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    if (i >= 0) {
        if (mValues[i] != DELETED) {
            mValues[i] = DELETED;
            mGarbage = true;
        }
    }
}

You can see that the remove method directly calls the delete method. In the delete method, you will first find the location of the key through binary search (analysis after binary search code), and then set the value of this location to delete. Note that mGarbage is also set to true to mark the presence of deleted elements in the collection. Imagine that after deleting multiple elements, there may be discontinuities in this collection? Maybe this is the origin of SparseArray's name.

3, put() method of SparseArray

As a data structure storing K-V types, the put method is the entry of key and value. It is also the most important method in SparseArray. Let's first look at the code of the put method:

// SparseArray
public void put(int key, E value) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    if (i >= 0) { // This means that the corresponding key already exists in mKeys, and the i position corresponds to the key.
        mValues[i] = value; // Update value directly
    } else { // The returned negative number indicates that no key was found in mKeys
      
        // Invert to get the position of the key to be inserted
        i = ~i;
      
    // If the insertion position is smaller than size and the value of this position is just deleted, directly insert the key and value into the ith position of mKeys and mValues respectively
        if (i < mSize && mValues[i] == DELETED) {
            mKeys[i] = key;
            mValues[i] = value;
            return;
        }
    // If mGarbage is true, it means that some elements have been removed. At this time, mKeys is full, but there are elements marked DELETE inside mKeys
        if (mGarbage && mSize >= mKeys.length) {
            // Call the gc method to move the elements in mKeys and mValues, which can be analyzed later
            gc();
                    
            // Because the gc method moves the array, the insertion position may change, so the insertion position needs to be recalculated
            i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
        } 
    // The insert method of GrowingArrayUtils will move all the data after the insertion position backward by one bit, and then insert the key and value into the ith position corresponding to mKeys and mValue respectively. If the array space is insufficient, it will start capacity expansion. Analyze this insert method later
        mKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key);
        mValues = GrowingArrayUtils.insert(mValues, mSize, i, value);
        mSize++;
    }
}

Although this method has only a few lines, it is not easy to fully understand it, even if you write detailed notes. We might as well analyze it in detail. The first line of code obtains an index through binary search. Look at the code of binary search:

// ContainerHelpers
static int binarySearch(int[] array, int size, int value) {
    int lo = 0;
    int hi = size - 1;

    while (lo <= hi) {
        final int mid = (lo + hi) >>> 1;
        final int midVal = array[mid];

        if (midVal < value) {
            lo = mid + 1;
        } else if (midVal > value) {
            hi = mid - 1;
        } else {
            return mid;  // value found
        }
    }
    return ~lo;  // value not present
}

We are familiar with binary search. This algorithm is used to find the location of an element in a set of ordered arrays. If this element exists in the array, the corresponding position of this element is returned. If it does not exist, then lo at this time is the best storage location for this element. In the above code, lo is inversed as the return value. Because lo must be a number greater than or equal to 0, the return value after negation must be less than or equal to 0 With this in mind, let's look at the if in the put method Is else easy to understand?

// SparseArray
public void put(int key, E value) {
 
    if (i >= 0) { 
        mValues[i] = value; // Update value directly
    } else { 
        i = ~i;
        // ...  Omit other codes
    }
}

If i > = 0, it means that the current key already exists in mKeys. At this time, put only needs to update the latest value to mValues. If i < = 0, it means that there is no corresponding key in mKeys before. Therefore, you need to insert key and value into mKeys and mValues respectively. The best place to insert is to invert i.

After obtaining the insertion position, if this position is an element marked for deletion, it can be directly overwritten for a long time. Therefore, the following code is provided:

public void put(int key, E value) {
    // ...
    if (i >= 0) {
        // ...
    } else {    
        // If the corresponding position of i is deleted, it can be directly overwritten
        if (i < mSize && mValues[i] == DELETED) {
            mKeys[i] = key;
            mValues[i] = value;
            return;
        }
        // ...
    }
    
}

If the above conditions are not met, continue to look down:

public void put(int key, E value) {
    // ...
    if (i >= 0) {
        // ...
    } else {    
    // If mGarbage is true, it means that some elements have been removed. At this time, mKeys is full, but there are elements marked DELETE inside mKeys
        if (mGarbage && mSize >= mKeys.length) {
            // Call the gc method to move the elements in mKeys and mValues, which can be analyzed later
            gc();
                    
            // Because the gc method moves the array, the insertion position may change, so the insertion position needs to be recalculated
            i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
        } 
        // ...
    }
    
}

As we know above, mGarbage will be set to true when removing elements. This code means that there are removed elements, and the removed position is not the position to be inserted. If mKeys is full, call the gc method to move the elements to fill the removed position. Because the element position in mKeys has changed, the key insertion position may also change. Therefore, you need to call the dichotomy again to find the key insertion position.

The above code will finally determine the location where the key is inserted. Next, call the insert method of GrowingArrayUtils to insert the key:

// SparseArray
public void put(int key, E value) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    if (i >= 0) { 
        // ...
    } else { 
         // ...

    // The insert method of GrowingArrayUtils will move all the data after the insertion position backward by one bit, and then insert the key and value into the ith position corresponding to mKeys and mValue respectively. If the array space is insufficient, it will start capacity expansion. Analyze this insert method later
        mKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key);
        mValues = GrowingArrayUtils.insert(mValues, mSize, i, value);
        mSize++;
    }
}

The insert method code of GrowingArrayUtils is as follows:

// GrowingArrayUtils
public static <T> T[] insert(T[] array, int currentSize, int index, T element) {
    assert currentSize <= array.length;
    // If the array size after insertion is less than the array length, the insertion operation can be performed
    if (currentSize + 1 <= array.length) {
        // Move all elements after index back one bit
        System.arraycopy(array, index, array, index + 1, currentSize - index);
        // Insert the key into the index position
        array[index] = element;
        return array;
    }

    // Coming here indicates that the array is full and needs to be expanded. newArray is the expanded array
    T[] newArray = ArrayUtils.newUnpaddedArray((Class<T>)array.getClass().getComponentType(),
            growSize(currentSize));
    System.arraycopy(array, 0, newArray, 0, index);
    newArray[index] = element;
    System.arraycopy(array, index, newArray, index + 1, array.length - index);
    return newArray;
}

// Returns the size after capacity expansion
public static int growSize(int currentSize) {
    return currentSize <= 4 ? 8 : currentSize * 2;
}

The code of the insert method is easy to understand. If the array capacity is sufficient, move the element after the index one bit backward, and then insert the key into the position of the index. If the array capacity is insufficient, it needs to be expanded before inserting.

4, gc() method of SparseArray

This method is actually easy to understand. We know that the Java virtual machine will perform GC operation when there is insufficient memory. After recycling garbage objects, the mark removal method will move the living objects to one end of memory in order to avoid memory fragmentation. The GC method in SparseArray actually refers to the idea of garbage collection and defragmentation.

As mentioned above about the mGarbage parameter, this variable will be set to true when deleting elements. As follows:

// mGarbage is set to true in all methods that remove elements in SparseArray

public E removeReturnOld(int key) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    if (i >= 0) {
        if (mValues[i] != DELETED) {
            final E old = (E) mValues[i];
            mValues[i] = DELETED;
            mGarbage = true;
            return old;
        }
    }
    return null;
}

public void delete(int key) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    if (i >= 0) {
        if (mValues[i] != DELETED) {
            mValues[i] = DELETED;
            mGarbage = true;
        }
    }
}

public void removeAt(int index) {
    if (index >= mSize && UtilConfig.sThrowExceptionForUpperArrayOutOfBounds) {
        throw new ArrayIndexOutOfBoundsException(index);
    }
    if (mValues[index] != DELETED) {
        mValues[index] = DELETED;
        mGarbage = true;
    }
}

All methods of inserting and finding elements in SparseArray will judge if mGarbage is true and Msize > = mkeys Call gc when length. Take the append method as an example. The code is as follows:

public void append(int key, E value) {

    if (mGarbage && mSize >= mKeys.length) {
        gc();
    }
  
   // ...  Omit irrelevant code
}

There are as many as 8 places to call the gc method in source code, all of which are related to adding and finding elements. For example, in methods such as put(), keyAt(), setValueAt(). In fact, the implementation of gc is relatively simple, that is, move all the data after deleting the location forward. The code is as follows:

private void gc() {
    // Log.e("SparseArray", "gc start with " + mSize);

    int n = mSize;
    int o = 0;
    int[] keys = mKeys;
    Object[] values = mValues;

    for (int i = 0; i < n; i++) {
        Object val = values[i];

        if (val != DELETED) {
            if (i != o) {
                keys[o] = keys[i];
                values[o] = val;
                values[i] = null;
            }

            o++;
        }
    }

    mGarbage = false;
    mSize = o;

    // Log.e("SparseArray", "gc end with " + mSize);
}

5, get() method of SparseArray

This method is relatively simple, because an ordered array is maintained when put ting, so the position of the key in the array can be directly determined through binary search.

public E get(int key, E valueIfKeyNotFound) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    if (i < 0 || mValues[i] == DELETED) {
        return valueIfKeyNotFound;
    } else {
        return (E) mValues[i];
    }
}

6, Summary

It can be seen that SparseArray is a very simple data structure, but its principle seems not so easy to understand. This is also the reason why the analysis of SparseArray in most online articles is ambiguous. I believe that through the study of this article, I must have a new understanding of the implementation of SparseArray!

This article is transferred from https://juejin.cn/post/6972985532397649933 , in case of infringement, please contact to delete.

Added by jovanross on Tue, 04 Jan 2022 21:54:20 +0200

Programming VIP