Thread safety: everyone talks about it, but not everyone explains it clearly

To write thread-safe code, the key is to control access to shared, mutable state:

  • Shared means that the variable may be accessed by multiple threads at the same time;
  • Mutable means that the value of the variable may change during its lifetime.

If multiple threads access shared, mutable state at the same time without effective access control, the program may produce unexpected errors. There are three ways to address this problem:

  1. Do not share the state variable between threads (unfortunately, sharing cannot always be avoided, but unnecessary sharing can be minimized).
  2. Make the shared state immutable (simple and effective; immutable objects such as String and BigInteger are inherently thread-safe; see the sketch after this list).
  3. Use synchronization whenever the state variable is accessed (this is the focus of concurrent programming).
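
As a minimal sketch of approach 2 (a hypothetical Point class, not taken from the text above), an immutable object can be shared freely between threads because its state can never change after construction:

// A hypothetical immutable value class: all fields are final and "modification"
// returns a new object instead of mutating shared state.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    // Returns a new Point rather than changing this one.
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}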

In most cases, discussions of thread safety are about access to shared, mutable state, which inevitably involves data synchronization. Before discussing synchronization-based access control, however, we need to answer a question: what is thread safety?

1. Thread safety

There are different opinions on how to define thread safety, but its core is correctness: the behavior and results of the program match expectations.

A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code. (Java Concurrency in Practice)

One additional point: in general, the necessary synchronization should be encapsulated inside the class itself, so that clients of the class do not need to worry about thread safety on their own.
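
For instance, here is a minimal sketch (a hypothetical Counter class) of what such encapsulation looks like: the synchronization lives inside the class, so callers can share an instance across threads without any external locking:

// A hypothetical thread-safe counter: all access to the mutable field
// goes through synchronized methods, so callers need no extra locking.
public class Counter {
    private long value = 0;

    public synchronized void increment() {
        ++value;
    }

    public synchronized long get() {
        return value;
    }
}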

Of course, synchronization is not always required for thread safety. For example, consider a stateless class:

import java.math.BigInteger;

import javax.servlet.GenericServlet;
import javax.servlet.Servlet;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class StatelessFactorizer extends GenericServlet implements Servlet {

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = factor(i);
        encodeIntoResponse(resp, factors);
    }

    void encodeIntoResponse(ServletResponse resp, BigInteger[] factors) {
        // placeholder: writes the factors to the response
    }

    BigInteger extractFromRequest(ServletRequest req) {
        // placeholder: parses the number from the request
        return new BigInteger("7");
    }

    BigInteger[] factor(BigInteger i) {
        // placeholder: doesn't really factor
        return new BigInteger[] { i };
    }
}

This is the simplest possible web service for factoring a number. It is stateless: it holds no fields and references no fields from other classes, so even when many requests are processed at the same time, they cannot affect each other.

2. Atomicity

How do we ensure thread safety? In short, operations that access shared, mutable state must be atomic, that is, indivisible.

import java.math.BigInteger;

import javax.servlet.GenericServlet;
import javax.servlet.Servlet;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class UnsafeCountingFactorizer extends GenericServlet implements Servlet {
    private long count = 0;

    public long getCount() {
        return count;
    }

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = factor(i);
        // Not thread-safe: ++count is a read-modify-write sequence
        ++count;
        encodeIntoResponse(resp, factors);
    }

    // extractFromRequest, factor and encodeIntoResponse as in StatelessFactorizer
}

Anyone with a little experience in multithreaded programming can see that the code above is not thread-safe. Once the count field is used to record how many times the service has been called, the class becomes stateful. The increment operation ++count is not atomic: it decomposes into reading the value, adding one, and writing the value back, and a thread can be interrupted or suspended between any of these steps. When multiple threads access count concurrently, updates can be lost. This kind of error, caused by unlucky timing of interleaved operations, is called a race condition.
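
The lost updates are easy to observe with a small demo (a hypothetical class, assuming ten threads each incrementing a plain long field): the final value usually ends up below the expected total because concurrent read-modify-write sequences overwrite each other:

import java.util.concurrent.CountDownLatch;

// Hypothetical demo: many threads incrementing an unsynchronized counter.
public class LostUpdateDemo {
    private static long count = 0;

    public static void main(String[] args) throws InterruptedException {
        int threads = 10, perThread = 100_000;
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    ++count;   // read, add one, write back: not atomic
                }
                done.countDown();
            }).start();
        }
        done.await();
        // Usually prints a value smaller than 1,000,000.
        System.out.println("expected " + (threads * perThread) + ", got " + count);
    }
}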

Most race conditions take the form of check-then-act: first observe the state of a value, then take an action based on that observation. In a multithreaded program, however, the value may be modified by another thread after it has been read, so the observation becomes stale and the subsequent action is based on invalid data.
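
A classic check-then-act race is lazy initialization; the sketch below is adapted from the LazyInitRace example in Java Concurrency in Practice. Two threads can both observe instance == null and each create its own object, so callers may end up with different instances:

public class LazyInitRace {
    private ExpensiveObject instance = null;

    public ExpensiveObject getInstance() {
        if (instance == null) {                 // check
            instance = new ExpensiveObject();   // act: may run in two threads at once
        }
        return instance;
    }

    static class ExpensiveObject { }
}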

To solve these problems, the group of related operations must be combined into a single atomic compound action that cannot be interleaved with other threads until it completes.

import java.math.BigInteger;
import java.util.concurrent.atomic.AtomicLong;

import javax.servlet.GenericServlet;
import javax.servlet.Servlet;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CountingFactorizer extends GenericServlet implements Servlet {
    private final AtomicLong count = new AtomicLong(0);

    public long getCount() { return count.get(); }

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = factor(i);
        // Thread-safe: the read-modify-write happens as one atomic operation
        count.incrementAndGet();
        encodeIntoResponse(resp, factors);
    }

    // extractFromRequest, factor and encodeIntoResponse as in StatelessFactorizer
}

The code above uses the incrementAndGet method of the atomic class AtomicLong, which performs the increment as a single atomic operation.
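
Conceptually (this is a sketch of the idea, not the actual JDK source), incrementAndGet behaves like a compare-and-set retry loop, which is why the read-modify-write cannot be torn apart by another thread:

import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of an atomic increment built from compareAndSet.
public class CasIncrementSketch {
    static long incrementAndGetSketch(AtomicLong counter) {
        while (true) {
            long current = counter.get();   // read the current value
            long next = current + 1;        // compute the new value
            // Succeeds only if no other thread changed the value in between;
            // otherwise another thread won the race and we retry.
            if (counter.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}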

3. Locking mechanism

What if there are multiple shared state variables? It is not enough to make each variable individually atomic; the operations that span all of the related state variables must themselves be atomic.

Thread safety requires that all variables involved in the same invariant be updated in a single atomic operation.
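
To see why per-variable atomicity is not enough, consider a sketch adapted from the UnsafeCachingFactorizer example in Java Concurrency in Practice: both fields are atomic on their own, but the invariant that lastFactors holds the factors of lastNumber can still be broken:

import java.math.BigInteger;
import java.util.concurrent.atomic.AtomicReference;

// Each field is individually atomic, but the pair is not updated atomically.
public class UnsafeCachingFactorizer {
    private final AtomicReference<BigInteger> lastNumber = new AtomicReference<>();
    private final AtomicReference<BigInteger[]> lastFactors = new AtomicReference<>();

    public BigInteger[] factorize(BigInteger i) {
        if (i.equals(lastNumber.get())) {
            // Another thread may update lastNumber/lastFactors between these two reads,
            // so the cached factors can belong to a different number.
            return lastFactors.get();
        }
        BigInteger[] factors = factor(i);
        // These two writes are not one atomic action, so the invariant
        // "lastFactors are the factors of lastNumber" can be observed broken.
        lastNumber.set(i);
        lastFactors.set(factors);
        return factors;
    }

    private BigInteger[] factor(BigInteger i) {
        return new BigInteger[] { i };  // placeholder, doesn't really factor
    }
}

The CachedFactorizer shown in section 4 below repairs this by guarding both fields with the same intrinsic lock.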

In Java, this requirement can be met with intrinsic (built-in) locks and synchronized blocks.

synchronized (lock) {
    // do something while holding lock's intrinsic lock
}

Every object has an intrinsic lock. A thread automatically acquires the object's intrinsic lock when it enters a synchronized block and automatically releases it when it exits the block (including when an exception is thrown). Code guarded by the same intrinsic lock executes atomically with respect to other threads, because an intrinsic lock is a mutual-exclusion lock: only one thread can hold it at a time, so threads guarded by it cannot interfere with each other.

Note that intrinsic locks are reentrant: if the current thread already holds an object's intrinsic lock, a further request for the same lock by that thread succeeds immediately. In other words, intrinsic locks are acquired per thread, not per invocation.

Reentrancy is a deliberate design choice that matters especially when a subclass overrides a synchronized method of its parent class and then calls the parent's version, for example:

public class A {
    public synchronized void function() {
        // ...
    }
}

public class B extends A {
    public synchronized void function() {
        super.function();   // re-acquires the lock the current thread already holds
        // ...
    }
}

Without reentrancy, the overridden method in class B would acquire the intrinsic lock and then block when calling the synchronized parent method, because the lock would appear to be unavailable. The subclass method could never release the lock, since it is stuck waiting on the parent call, and the result would be a deadlock.

4. Using intrinsic locks to guard state

Locks introduce the necessary serialization into code paths that would otherwise execute in parallel. Note, however, that if a lock is used to control access to a variable, that lock must be held at every location where the variable is accessed.

Each shared, mutable variable should be guarded by exactly one lock, and if multiple variables cooperate in the same operation or invariant, they should all be guarded by the same lock.

When writing synchronized blocks, avoid over-synchronizing. In the most extreme case, all of the code is placed inside one synchronized block: this is thread-safe, but it forces every thread to execute serially and defeats the purpose of concurrency.

To improve performance, keep synchronized blocks as narrow as possible. For example:

import java.math.BigInteger;

import javax.servlet.GenericServlet;
import javax.servlet.Servlet;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

import net.jcip.annotations.GuardedBy;

public class CachedFactorizer extends GenericServlet implements Servlet {
    @GuardedBy("this") private BigInteger lastNumber;
    @GuardedBy("this") private BigInteger[] lastFactors;
    @GuardedBy("this") private long hits;
    @GuardedBy("this") private long cacheHits;

    public synchronized long getHits() {
        return hits;
    }

    public synchronized double getCacheHitRatio() {
        return (double) cacheHits / (double) hits;
    }

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = null;
        synchronized (this) {
            ++hits;
            if (i.equals(lastNumber)) {
                ++cacheHits;
                factors = lastFactors.clone();
            }
        }
        if (factors == null) {
            factors = factor(i);   // the slow part runs outside the lock
            synchronized (this) {
                lastNumber = i;
                lastFactors = factors.clone();
            }
        }
        encodeIntoResponse(resp, factors);
    }

    void encodeIntoResponse(ServletResponse resp, BigInteger[] factors) {
        // placeholder: writes the factors to the response
    }

    // extractFromRequest and factor as in StatelessFactorizer
}

Choosing the right scope for synchronized blocks is a trade-off between simplicity and concurrency, and it has to be made based on actual requirements.

It is best to use only one synchronization mechanism within the same block of code, which keeps it easy to reason about and maintain. In addition, do not perform time-consuming operations (such as I/O) inside synchronized blocks; holding a lock for a long time is very costly to performance and responsiveness.
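
As a sketch of that advice (a hypothetical ReportService with a placeholder sendReport method), the slow work is moved outside the synchronized block by taking a quick snapshot of the shared state under the lock and then doing the expensive part without holding it:

import java.util.ArrayList;
import java.util.List;

// Hypothetical example: hold the lock only for the fast state manipulation.
public class ReportService {
    private final Object lock = new Object();
    private final List<String> pending = new ArrayList<>();

    public void add(String entry) {
        synchronized (lock) {
            pending.add(entry);
        }
    }

    public void flush() {
        List<String> snapshot;
        synchronized (lock) {
            // hold the lock only long enough to take a consistent snapshot
            snapshot = new ArrayList<>(pending);
            pending.clear();
        }
        // the time-consuming operation (e.g. network or disk I/O) runs outside the lock
        sendReport(snapshot);
    }

    private void sendReport(List<String> entries) {
        // placeholder for a slow I/O operation
    }
}

This is the same pattern CachedFactorizer uses above: factor(i) runs outside the synchronized blocks, and only the quick state updates happen while holding the lock.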
