Application and principle analysis of ThreadLocal

Function and principle of ThreadLocal

We know that multithreaded Java code can suffer from thread-safety problems. The main cause is that multiple threads access the same shared data at the same time. There are two main approaches to solving this:

1. Lock the shared data

2. Avoid having multiple threads operate on the same shared data

Approach 1 is the commonly used one, but because it relies on locking it has a performance cost, such as threads waiting for each other.

So today we'll talk about approach 2. It cannot solve every thread-safety problem, because in many business scenarios multiple threads really must access the same data. Approach 2 applies where shared data can be turned into thread-private variables, as in the implementation of Handler in Android.

Handler in Android relies on a one-to-one relationship between a thread and its Looper, and that relationship is implemented with ThreadLocal.
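To make the one-looper-per-thread idea concrete, here is a minimal sketch of such a registry. MiniLooper is a hypothetical class written for illustration; it is not Android's Looper and only mirrors the general pattern of keeping the per-thread instance in a static ThreadLocal.

```java
// Hypothetical, simplified model of a one-looper-per-thread registry.
// MiniLooper is an illustrative name, not Android's actual Looper class.
final class MiniLooper {
    // One slot per thread: each thread sees only the MiniLooper it created.
    private static final ThreadLocal<MiniLooper> CURRENT = new ThreadLocal<>();

    // The thread that created this instance.
    final Thread owner = Thread.currentThread();

    private MiniLooper() {}

    // Binds a new MiniLooper to the calling thread; a second call on the same thread fails.
    static void prepare() {
        if (CURRENT.get() != null) {
            throw new IllegalStateException("Only one MiniLooper may be created per thread");
        }
        CURRENT.set(new MiniLooper());
    }

    // Returns the calling thread's MiniLooper, or null if prepare() was never called on it.
    static MiniLooper myLooper() {
        return CURRENT.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            System.out.println("before prepare: " + myLooper()); // null on a fresh thread
            prepare();
            System.out.println("owned by this thread: " + (myLooper().owner == Thread.currentThread()));
        });
        worker.start();
        worker.join();
    }
}
```

Because the registry is a ThreadLocal, no lock is needed: a thread can only ever read or replace its own slot.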

First, let's talk about the principle of ThreadLocal. ThreadLocal addresses multithreaded concurrency by trading space for time: it gives each thread its own copy of a variable, so the threads are isolated from one another instead of sharing one value. Because no lock is taken, it avoids the waiting that synchronized introduces, but it only applies when threads do not need to see each other's values.

Let's start from an example and use it to look at how ThreadLocal works internally:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalTest {

    private static ExecutorService service = Executors.newCachedThreadPool();

    public static void main(String[] args) {
        ThreadLocal<Boolean> threadLocal = new ThreadLocal<>();
        // Set the value for the main thread only
        threadLocal.set(Boolean.TRUE);

        for (int i = 0; i < 100; i++) {
            service.execute(() -> {
                // Each pool thread writes to its own copy
                threadLocal.set(Boolean.FALSE);
                System.out.println("The child thread is set to: " + threadLocal.get());
            });
        }
        try {
            // Give the child tasks time to finish
            Thread.sleep(2000L);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println("The main thread is set to: " + threadLocal.get());
        service.shutdown();
    }
}

We can see that the output is:

The child thread is set to: false
    ...
The child thread is set to: false
The main thread is set to: true

We set the value of threadLocal to true in the main thread, then submitted 100 child tasks to a thread pool, each of which sets it to false. To give the child threads time to finish, the main thread sleeps for 2 seconds. The value finally printed by the main thread is still true, which shows that setting the value in a child thread does not affect the value seen by the main thread. Each thread operates only on the threadLocal value that belongs to itself.

If we change the type in the code above from ThreadLocal<Boolean> to AtomicBoolean, the final output becomes:

The child thread is set to: false
    ...
The child thread is set to: false
The main thread is set to: false

This is how ThreadLocal behaves: in effect, each thread gets its own Boolean value.
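The contrast can be reproduced side by side. The following is a small sketch, not the article's original program: the class name SharedVsThreadLocal and the helper run() are made up for illustration, and a single worker thread with join() replaces the pool and the sleep.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SharedVsThreadLocal {
    // Runs one worker that writes false to both variables, then returns
    // {sharedValue, threadLocalValue} as seen by the calling thread.
    static boolean[] run() throws InterruptedException {
        AtomicBoolean shared = new AtomicBoolean(true);       // one value visible to all threads
        ThreadLocal<Boolean> perThread = new ThreadLocal<>(); // one slot per thread
        perThread.set(Boolean.TRUE);

        Thread worker = new Thread(() -> {
            shared.set(false);            // overwrites the single shared value
            perThread.set(Boolean.FALSE); // only touches the worker's own slot
        });
        worker.start();
        worker.join(); // wait until the worker has finished writing

        return new boolean[] { shared.get(), perThread.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = run();
        System.out.println("AtomicBoolean seen by caller: " + result[0]); // false
        System.out.println("ThreadLocal seen by caller: " + result[1]);   // true
    }
}
```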

Now suppose each thread does its own work and, at the end, writes the result of that work to a file named after the task id held by that thread.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalTest {
    private static ExecutorService service = Executors.newCachedThreadPool();

    public static void main(String[] args) {
        ThreadLocal<Integer> taskId = new ThreadLocal<>();
        // Value for the main thread only
        taskId.set(1);

        service.execute(() -> {
            taskId.set(2);
            System.out.println("This thread does a lot of A work, then generates a file named " + taskId.get() + "...");
        });
        service.execute(() -> {
            taskId.set(3);
            System.out.println("This thread does a lot of B work, then generates a file named " + taskId.get() + "...");
        });
        service.execute(() -> {
            taskId.set(4);
            System.out.println("This thread does a lot of C work, then generates a file named " + taskId.get() + "...");
        });

        System.out.println("taskId value in the main thread: " + taskId.get());
        service.shutdown();
    }
}

Written this way, the code meets the requirement. One possible output (the order of the lines can vary between runs) is:

This thread does a lot of A work, then generates a file named 2...
This thread does a lot of B work, then generates a file named 3...
taskId value in the main thread: 1
This thread does a lot of C work, then generates a file named 4...

Of course, this can also be done without ThreadLocal. We can simply declare an int task id inside each task:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalTest {
    private static ExecutorService service = Executors.newCachedThreadPool();

    public static void main(String[] args) {
        int taskId = 1;

        service.execute(() -> {
            int taskId2 = 2;
            System.out.println("This thread does a lot of A work, then generates a file named " + taskId2 + "...");
        });
        service.execute(() -> {
            int taskId3 = 3;
            System.out.println("This thread does a lot of B work, then generates a file named " + taskId3 + "...");
        });
        service.execute(() -> {
            int taskId4 = 4;
            System.out.println("This thread does a lot of C work, then generates a file named " + taskId4 + "...");
        });
        System.out.println("taskId value in the main thread: " + taskId);
        service.shutdown();
    }
}

The two versions produce the same result, but which one do you think is more elegant?

With the difference in mind, the benefit is clear: when ThreadLocal is used, each thread gets its own copy of the variable, and every later operation in that thread reads that copy. This avoids a lot of trouble during development, such as having to give every id a different name that ends up long and clumsy.

More specifically, when each thread needs its own data and threads must not interfere with each other, ThreadLocal is a good choice.
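A typical case of "each thread needs its own data" is a helper object that is not thread safe, such as java.text.SimpleDateFormat. The sketch below is only an illustration: PerThreadFormatter, format() and current() are hypothetical names, and the pattern, time zone and locale are arbitrary choices made so the output is predictable.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PerThreadFormatter {
    // SimpleDateFormat is not thread safe, so each thread lazily gets its own instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    // Formats with the calling thread's private formatter; no locking is needed.
    static String format(long epochMillis) {
        return FORMAT.get().format(new Date(epochMillis));
    }

    // Exposed only so a demo can show that instances differ between threads.
    static SimpleDateFormat current() {
        return FORMAT.get();
    }

    public static void main(String[] args) {
        System.out.println(format(0L)); // 1970-01-01 00:00:00
    }
}
```

Every caller uses the same static method, yet no two threads ever touch the same SimpleDateFormat instance.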

ThreadLocal principle

Let's look at the ThreadLocal source code to see why, when multiple threads operate on the same ThreadLocal variable, each thread ends up with its own copy, which is what makes it thread safe.

The main methods of ThreadLocal are:

protected T initialValue()

public void set(T value)

public T get()

public void remove()

Let's start with the simplest one:

protected T initialValue() {
    return null;
}

It is simple enough, and ThreadLocal itself gives it no real logic. It exists so that developers can override it when creating a ThreadLocal. The value it returns is what get() returns by default when set() has not been called on the current thread.

ThreadLocal<Integer> taskId = new ThreadLocal<Integer>() {
    @Override
    protected Integer initialValue() {
        return 0;
    }
};
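Since Java 8 the same default can also be supplied through the static factory ThreadLocal.withInitial, which takes a Supplier instead of requiring a subclass. A short sketch follows; the class name WithInitialDemo and the field TASK_ID are illustrative.

```java
class WithInitialDemo {
    // Equivalent in effect to overriding initialValue(): the supplier runs
    // on a thread's first get() when set() has not been called.
    static final ThreadLocal<Integer> TASK_ID = ThreadLocal.withInitial(() -> 0);

    public static void main(String[] args) {
        System.out.println(TASK_ID.get()); // 0, the default
        TASK_ID.set(7);
        System.out.println(TASK_ID.get()); // 7
        TASK_ID.remove();
        System.out.println(TASK_ID.get()); // 0 again: after remove(), get() re-runs the supplier
    }
}
```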

Take a look at the get() method:

public T get() {
    //Get current thread
    Thread t = Thread.currentThread();
    //Get the ThreadLocalMap corresponding to the current Thread. Each Thread saves a ThreadLocalMap, so as to realize the one-to-one correspondence between Thread and ThreadLocalMap
    ThreadLocalMap map = getMap(t);
    if (map != null) {
        //Get the value stored in this thread corresponding to this ThreadLocal
        ThreadLocalMap.Entry e = map.getEntry(this);
        if (e != null) {
            @SuppressWarnings("unchecked")
            //Value
            T result = (T)e.value;
            //Return correct results
            return result;
        }
    }
    //Description: if the Thread has not called the set() method to set the value, perform the initialization operation to obtain the value to be saved and generate a ThreadLocalMap corresponding to the Thread
    return setInitialValue();
}

ThreadLocalMap getMap(Thread t) {
    return t.threadLocals;
}

private T setInitialValue() {
    //Call initialValue() to obtain the initial value; it is null if initialValue() has not been overridden
    T value = initialValue();
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null)
        //Set value
        map.set(this, value);
    else
        //If the ThreadLocalMap corresponding to the Thread is still empty, create the ThreadLocalMap corresponding to the Thread and write the value
        createMap(t, value);
    return value;
}

Each Thread object holds a field named threadLocals of type ThreadLocalMap, which is the map that belongs to that thread. ThreadLocalMap contains an array of Entry objects, and each Entry stores a ThreadLocal as its key and the value to be saved as its value.

So each Thread corresponds to exactly one ThreadLocalMap.

Calling get() does several things:

1. Get the current thread and call getMap() to get the ThreadLocalMap that belongs to it

2. If that ThreadLocalMap is not null, look up the entry whose key is this ThreadLocal and return its value

3. If the map or the entry is missing, call initialValue(), create the ThreadLocalMap for the Thread if necessary, and store the value
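Step 3 can be observed from the outside by counting how often initialValue() runs. The sketch below uses hypothetical names (LazyInitDemo, INIT_CALLS, initCalls); it starts a fresh thread, optionally calls set() first, and then calls get() twice.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyInitDemo {
    static final AtomicInteger INIT_CALLS = new AtomicInteger();

    static final ThreadLocal<String> NAME = new ThreadLocal<String>() {
        @Override
        protected String initialValue() {
            INIT_CALLS.incrementAndGet(); // count each fall-through to the initialization path
            return "default";
        }
    };

    // Runs get() twice on a fresh thread and reports how many times initialValue() ran there.
    static int initCalls(boolean setFirst) throws InterruptedException {
        int before = INIT_CALLS.get();
        Thread t = new Thread(() -> {
            if (setFirst) {
                NAME.set("explicit"); // the entry now exists, so get() never needs the default
            }
            NAME.get();
            NAME.get();
        });
        t.start();
        t.join();
        return INIT_CALLS.get() - before;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(initCalls(false)); // 1: only the first get() initializes
        System.out.println(initCalls(true));  // 0: set() was called before get()
    }
}
```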

Next, set():

public void set(T value) {
    Thread t = Thread.currentThread();
    //Gets the ThreadLocalMap object corresponding to the current operation thread
    ThreadLocalMap map = getMap(t);
    if (map != null)
        //When you get it, store the value you need to save in the map
        map.set(this, value);
    else
        //If not, create a map for the current thread and save the value to be saved with ThreadLocal as the key
        createMap(t, value);
}

The set(ThreadLocal<?> key, Object value) method in ThreadLocalMap:

private void set(ThreadLocal<?> key, Object value) {

    
    //In fact, ThreadLocalMap is an array structure of Entry type
    Entry[] tab = table;
    int len = tab.length;
    //Use the hashcode of key and array length - 1 to do the & operation to address and find out the location i that should be stored. In fact, doing the & operation with array length - 1 is to truncate the hashcode to ensure that this i must be within the length range of the array.
    int i = key.threadLocalHashCode & (len-1);

    for (Entry e = tab[i];
         e != null;
         e = tab[i = nextIndex(i, len)]) {
        ThreadLocal<?> k = e.get();

        if (k == key) {
            e.value = value;
            return;
        }

        if (k == null) {
            replaceStaleEntry(key, value, i);
            return;
        }
    }
    //Store ThreadLocal as key and value as value Entry in tab[i]
    tab[i] = new Entry(key, value);
    int sz = ++size;
    if (!cleanSomeSlots(i, sz) && sz >= threshold)
        //After storing the data, clear the useless items of the key that have been recycled, organize the data and judge whether the array length reaches the threshold. If it reaches the threshold, expand the capacity
        rehash();
}

private void rehash() {
    expungeStaleEntries();

    // Use lower threshold for doubling to avoid hysteresis
    //threshold is 2/3 of the table length, so after purging stale entries the table is resized once size >= 3/4 of threshold (about half the table length)
    if (size >= threshold - threshold / 4)
        resize();
}

private void resize() {
    Entry[] oldTab = table;
    int oldLen = oldTab.length;
    //Each expansion is the original length * 2
    int newLen = oldLen * 2;
    Entry[] newTab = new Entry[newLen];
    int count = 0;

    for (int j = 0; j < oldLen; ++j) {
        Entry e = oldTab[j];
        if (e != null) {
            ThreadLocal<?> k = e.get();
            if (k == null) {
                e.value = null; // Help the GC
            } else {
                int h = k.threadLocalHashCode & (newLen - 1);
                while (newTab[h] != null)
                    h = nextIndex(h, newLen);
                newTab[h] = e;
                count++;
            }
        }
    }
    //setThreshold() stores 2/3 of the new length as the threshold used to decide when to rehash next time
    setThreshold(newLen);
    size = count;
    table = newTab;
}

In summary, saving a value works as follows:

1. If the ThreadLocalMap for the Thread does not exist, create it; otherwise store the value in the existing map

2. If the key's slot is already taken by another key (a hash collision), use open addressing: probe forward from that slot until a suitable position is found

3. After storing the value, clear stale entries whose keys have been garbage collected and decide whether to expand. In the JDK source, setThreshold(len) stores len * 2 / 3 as the threshold; rehash() first purges stale entries and then resizes if the remaining size is at least 3/4 of that threshold, which is about half the table length. Each expansion doubles the table length. Let's see the initial length:

private static final int INITIAL_CAPACITY = 16;

ThreadLocalMap(ThreadLocal<?> firstKey, Object firstValue) {
    table = new Entry[INITIAL_CAPACITY];
    int i = firstKey.threadLocalHashCode & (INITIAL_CAPACITY - 1);
    table[i] = new Entry(firstKey, firstValue);
    size = 1;
    setThreshold(INITIAL_CAPACITY);
}

So the initial table length of a ThreadLocalMap is 16.
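The index calculation, the linear probing and the threshold arithmetic can be tried in isolation. The following is a sketch under stated assumptions, not the JDK code: ProbeSketch and findSlot are hypothetical, a boolean array stands in for "slot is occupied", and threshold() mirrors the JDK's setThreshold (len * 2 / 3). The sample hash 0x61c88647 is the increment the JDK uses to space out threadLocalHashCode values.

```java
class ProbeSketch {
    // Same wrap-around step as nextIndex in ThreadLocalMap: move right, wrap to 0 at the end.
    static int nextIndex(int i, int len) {
        return (i + 1 < len) ? i + 1 : 0;
    }

    // Finds a free slot for a hash in a power-of-two table using linear probing.
    // 'occupied' is a stand-in for "tab[i] != null"; it must contain at least one free slot.
    static int findSlot(int hash, boolean[] occupied) {
        int len = occupied.length;
        int i = hash & (len - 1); // keep only the low bits, so 0 <= i < len
        while (occupied[i]) {
            i = nextIndex(i, len); // collision: try the next slot
        }
        return i;
    }

    // Mirrors setThreshold(len): the threshold is two thirds of the table length.
    static int threshold(int len) {
        return len * 2 / 3;
    }

    // Mirrors the check in rehash(): resize when size >= threshold - threshold / 4.
    static int resizeTrigger(int len) {
        int t = threshold(len);
        return t - t / 4;
    }

    public static void main(String[] args) {
        boolean[] occupied = new boolean[16];
        System.out.println(findSlot(0x61c88647, occupied)); // 7
        occupied[7] = true;
        System.out.println(findSlot(0x61c88647, occupied)); // 8, the next free slot
        System.out.println(threshold(16));                  // 10
        System.out.println(resizeTrigger(16));              // 8
    }
}
```

For the initial table of 16 slots this gives a threshold of 10, and a resize to 32 slots once at least 8 live entries remain after stale ones have been purged.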

Memory leak problem

The Entry type above is:

static class Entry extends WeakReference<ThreadLocal<?>> {
    /** The value associated with this ThreadLocal. */
    Object value;

    Entry(ThreadLocal<?> k, Object v) {
        super(k);
        value = v;
    }
}

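Entry extends WeakReference<ThreadLocal<?>>, so the key (the ThreadLocal) is held only by a weak reference, while the value is held by a strong reference. Once the ThreadLocal is no longer strongly referenced anywhere else and is garbage collected, the key of the entry becomes null, but the value is still strongly reachable through the thread's ThreadLocalMap and cannot be collected while the thread is alive. This is how a memory leak can occur.

ThreadLocalMap does clean up some stale entries (those whose key has been collected) during set() and remove(), and during get() when a lookup misses, so later calls on the same thread can release leaked values. This cleanup is opportunistic, however, and never runs if the thread stops touching its thread-locals, which is common for long-lived pool threads. The best solution is to call remove() explicitly once the value is no longer needed.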
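The usual way to guarantee the remove() call is a try/finally block around the work, which matters most on pooled threads because they outlive individual tasks. Here is a minimal sketch; RemoveInFinally, CONTEXT and secondTaskSees are hypothetical names, and a single-thread executor is used so that both tasks are certain to run on the same thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RemoveInFinally {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Runs two tasks on the same pooled thread and returns what the second task sees.
    static String secondTaskSees(boolean cleanUp) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(); // one worker, reused for both tasks
        try {
            Runnable first = () -> {
                CONTEXT.set("request-1");
                try {
                    // ... work that reads CONTEXT would go here ...
                } finally {
                    if (cleanUp) {
                        CONTEXT.remove(); // drop the entry before the thread is reused
                    }
                }
            };
            pool.submit(first).get(); // wait for the first task to finish

            Callable<String> second = CONTEXT::get;
            return pool.submit(second).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(secondTaskSees(false)); // request-1: the old value is still attached to the thread
        System.out.println(secondTaskSees(true));  // null: remove() cleared it
    }
}
```

Without remove(), the value set by the first task stays attached to the worker thread, which both keeps the object alive and exposes it to an unrelated later task.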

A question I didn't understand at first

When I first read the source code, I didn't understand how multiple threads could operate on the same ThreadLocal object without any locking. How is it guaranteed that each thread creates or reads the ThreadLocalMap that belongs to it?
For example:

public T get() {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    //If this thread loses the CPU to another thread right here, is map still this thread's own map when it resumes?
    if (map != null) {
        ThreadLocalMap.Entry e = map.getEntry(this);
        if (e != null) {
            @SuppressWarnings("unchecked")
            T result = (T)e.value;
            return result;
        }
    }
    return setInitialValue();
}

Later, after studying the Java virtual machine more closely, I found that this is not a problem:

Although multiple threads use the same ThreadLocal variable at the same time, they do not read or write state stored in the ThreadLocal instance itself; they only call its methods.

map is a local variable of get(), and local variables live in the local variable table of the get() stack frame, which is private to the thread and not shared with other threads. Moreover, Thread.currentThread() always returns the thread that is executing the call, so the map returned by getMap(t) is that thread's own threadLocals. When another thread gets the CPU and reaches the same line, it works with its own t and its own map and cannot affect the map of the first thread.

If map were a member variable instead of a local variable of get(), concurrent calls would overwrite each other's reference and the code would no longer be thread safe.

Refer to the following articles:
https://blog.csdn.net/weixin_43314519/article/details/108188298
https://www.jianshu.com/p/1a5d288bdaee

https://www.freesion.com/article/4836870638/

Storage location of JVM variables: this article also outlines where some kinds of variables are stored, for reference, but it is best to read the book "going deep into Java virtual machine":
https://blog.csdn.net/shanchahua123456/article/details/79605433

Keywords: Java Android Multithreading

Added by Dave96 on Sun, 12 Sep 2021 23:36:27 +0300