Part I Basic Knowledge
Chapter II Thread Safety
-
The core of thread safety is managing access to state, especially shared and mutable state.
Object state: informally, the data stored in state variables (instance or static fields). An object's state may also include fields of other objects it depends on. For example, the state of a HashMap is stored not only in the HashMap itself but also in its many Map.Entry objects. An object's state includes any data that can affect its externally visible behavior.
Shared: the variable can be accessed by multiple threads at the same time.
Mutable: the value of the variable can change during its lifetime.
-
Whether an object needs to be thread-safe depends on how it is accessed (by one thread or by many), not on what it does.
-
When multiple threads access the same state variable and at least one of them writes to it, a synchronization mechanism must be used to coordinate that access and keep the program thread-safe.
-
The main synchronization mechanisms in Java:
- The synchronized keyword (built-in lock; JDK 6 introduced biased and lightweight locks, which optimize synchronized through lock upgrading)
- The volatile keyword
- Explicit locks
- The lock classes in the JUC package java.util.concurrent.locks
- Atomic variables
- The atomic classes in the JUC package java.util.concurrent.atomic
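As a rough illustration of the list above, the following sketch puts the mechanisms side by side on a simple counter. The class `SyncMechanisms` and all of its member names are hypothetical and exist only for this example; it is one possible arrangement, not a recommended design.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class for illustration; none of these names come from the article.
class SyncMechanisms {
    private long syncCount;                                  // guarded by the built-in lock on "this"
    private long lockCount;                                  // guarded by "lock"
    private final ReentrantLock lock = new ReentrantLock();  // explicit lock
    private final AtomicLong atomicCount = new AtomicLong(); // atomic variable
    private volatile boolean done;                           // volatile: makes a single write visible

    synchronized void incrementWithSynchronized() { syncCount++; }

    void incrementWithLock() {
        lock.lock();
        try {
            lockCount++;
        } finally {
            lock.unlock();   // always release in finally
        }
    }

    void incrementWithAtomic() { atomicCount.incrementAndGet(); }

    synchronized long syncCount() { return syncCount; }

    long lockCount() {
        lock.lock();
        try {
            return lockCount;
        } finally {
            lock.unlock();
        }
    }

    long atomicCount() { return atomicCount.get(); }

    boolean isDone() { return done; }

    // Starts the given number of threads; each performs perThread increments per mechanism.
    static SyncMechanisms runDemo(int threads, int perThread) {
        SyncMechanisms m = new SyncMechanisms();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    m.incrementWithSynchronized();
                    m.incrementWithLock();
                    m.incrementWithAtomic();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        m.done = true;
        return m;
    }
}
```

Each of the three counters should end up at exactly threads × perThread. The volatile flag is only ever written once, because volatile on its own does not make a compound action such as ++ atomic.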
-
In a multithreaded access model, there are three ways to achieve thread safety:
- Make the state variable immutable, for example by declaring it final
- Do not share the state variable between threads
- Use synchronization whenever the state variable is accessed
-
Java does not force all state to be encapsulated in classes. We can store state in public fields or publish a reference to an internal object, but doing so makes thread safety harder to achieve. In short, the better the encapsulation, the easier the thread safety. When designing thread-safe classes, good object-oriented techniques (encapsulation, immutability, and a clear specification of invariants) help.
2.1 What is thread safety
-
Correctness: a class behaves exactly as its specification says.
A good specification defines:
- Invariants, which constrain the object's state
- Postconditions, which describe the effects of the object's operations
-
Thread safety: a class is thread-safe if it always behaves correctly when accessed from multiple threads.
-
More precisely, a class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of how the runtime schedules or interleaves those threads, and with no additional synchronization or other coordination in the calling code.
-
Stateless objects are always thread-safe.
Below is an example of a stateless Servlet that provides a simple prime factorization service:
// @ThreadSafe is the marker annotation used in the JCIP book; using it in real code requires adding the JCIP annotations dependency
@ThreadSafe
public class StatelessFactorizer implements Servlet {
    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);   // parse the number out of the ServletRequest (helper not shown)
        BigInteger[] factors = factor(i);         // compute the prime factors
        encodeIntoResponse(resp, factors);        // write the result into the ServletResponse
    }
}
Such a Servlet is stateless: each request it handles and each response it produces is independent of every other, so nothing is shared between the threads that call it concurrently, and it is therefore thread-safe (the second point in the list above).
Most Servlets are stateless, which greatly reduces the complexity of making them thread-safe. Thread safety becomes an issue only when a Servlet needs to remember something from one request to the next.
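The Servlet listing depends on the Servlet API and on helpers that are not shown, so here is a minimal standalone sketch of the same idea. The class `StatelessFactorizerSketch` and its trial-division `factor` method are assumptions made for illustration; the article does not say how `factor` is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the stateless Servlet: the class has no fields at all.
class StatelessFactorizerSketch {
    // Every piece of working data (n, d, factors) is a local variable,
    // so it lives on the calling thread's stack and is never shared.
    List<Long> factor(long n) {
        List<Long> factors = new ArrayList<>();
        for (long d = 2; d * d <= n; d++) {
            while (n % d == 0) {
                factors.add(d);
                n /= d;
            }
        }
        if (n > 1) {
            factors.add(n);   // whatever remains is itself prime
        }
        return factors;
    }
}
```

Because one call never reads or writes anything another call can see, a single instance can be shared by any number of threads without synchronization.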
2.2 Atomicity
When we add state to the stateless Servlet, for example a counter that records how many requests it has processed, an unsynchronized implementation is no longer thread-safe:
@NotThreadSafe   // not thread-safe
public class UnsafeCountingFactorizer implements Servlet {
    private long count = 0;

    public long getCount() {
        return count;
    }

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = factor(i);
        ++count;   // read-modify-write on shared state, not atomic
        encodeIntoResponse(resp, factors);
    }
}
This Servlet's count can be inaccurate when it is accessed by multiple threads; it is incorrect and therefore not thread-safe. The cause is the ++ operator, which is not atomic.
At the instruction level, ++ is really a sequence of three steps:
- Read the value of count
- Add one to that value
- Write the new value back to count
As a result, two threads can both read count as 9 at the same time and both write back 10, although after two requests the result should be 11. Every such collision makes the count drift by one more. An inaccurate hit counter may not be fatal in this scenario, but when the counter is used to generate a numeric sequence or unique object identifiers, returning the same value twice is not acceptable.
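The interleaving just described can be replayed step by step. The sketch below is a deterministic, single-threaded replay of that schedule, not a real concurrent test; the class and variable names are hypothetical.

```java
// Single-threaded replay of the "both threads read 9" interleaving.
class LostUpdateReplay {
    static long replay(long start) {
        long count = start;      // the shared counter

        long readByA = count;    // thread A: read
        long readByB = count;    // thread B: read, before A has written

        count = readByA + 1;     // thread A: modify and write
        count = readByB + 1;     // thread B: modify and write, overwriting A's update

        return count;            // one increment has been lost
    }
}
```

Calling `LostUpdateReplay.replay(9)` returns 10 rather than 11, which is exactly the lost update from the paragraph above.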
In concurrent programming, getting incorrect results because of unlucky timing like this is an important situation with a formal name: a race condition. The code in which the race condition occurs is called a critical section.
2.2.1 Race Conditions
- Race condition versus data race
A race condition is a situation in which the result cannot be predicted because it depends on the order in which operations execute.
A data race is a situation in which reads and writes of shared data go wrong because accesses to it are not synchronized between threads.
In short, a race condition is mainly about atomicity: the code is not thread-safe because a sequence of operations is not guaranteed to be atomic. A data race is mainly about visibility: the code is not thread-safe because one thread's writes are not guaranteed to be visible to other threads.
A typical data race:
public class DataRace {
    private long count;

    public void set(long newCount) {
        count = newCount;
    }

    public long get() {
        return count;
    }
}
When multiple threads call set and get concurrently, a thread may still read the old value from get after another thread's set has completed. The fix is simple:
private volatile long count;
Declaring the shared state variable count as volatile guarantees visibility: once one thread has written it, other threads that read it afterwards see the updated value.
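A small sketch of that visibility guarantee follows. The class `VisibleCount` and the helper `readerSeesWrite` are hypothetical names chosen for this example. A reader thread spins until it observes the value written by the main thread, and because the field is volatile the reader is guaranteed to see the write eventually.

```java
// Hypothetical variant of DataRace with a volatile field.
class VisibleCount {
    private volatile long count;

    void set(long newCount) { count = newCount; }

    long get() { return count; }

    // Returns true if a reader thread observed the written value within the timeout.
    static boolean readerSeesWrite(long value, long timeoutMillis) {
        VisibleCount shared = new VisibleCount();
        Thread reader = new Thread(() -> {
            while (shared.get() != value) {
                Thread.yield();   // keep polling until the write becomes visible
            }
        });
        reader.setDaemon(true);   // do not keep the JVM alive if something goes wrong
        reader.start();

        shared.set(value);        // the volatile write
        try {
            reader.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !reader.isAlive();
    }
}
```

Without volatile, the Java memory model would allow the reader to keep seeing the old value indefinitely, although in practice that behavior depends on the JVM and is not easy to reproduce on demand.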
A typical race condition:
public class RaceCondition {
    private long count;   // primitive long: an uninitialized Long would throw NullPointerException on count++

    private void increase() {
        count++;
    }
}
This is the same ++ example from the previous section. It has both a race condition and a data race, and the fix is to use an atomic class:
// requires: import java.util.concurrent.atomic.AtomicLong;
private AtomicLong count = new AtomicLong(0);

private void increase() {
    count.addAndGet(1);   // atomic read-modify-write
}
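To connect this with the earlier remark about unique identifiers, here is a sketch of an identifier generator built on AtomicLong. The class `IdGenerator` and the helper `distinctIds` are hypothetical. The helper lets several threads draw identifiers concurrently and reports how many distinct values were handed out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical identifier generator; names are illustrative only.
class IdGenerator {
    private final AtomicLong next = new AtomicLong(0);

    long nextId() {
        return next.getAndIncrement();   // atomic: no two callers receive the same value
    }

    // Draws threads * perThread identifiers concurrently and counts the distinct ones.
    static int distinctIds(int threads, int perThread) {
        IdGenerator generator = new IdGenerator();
        Set<Long> seen = ConcurrentHashMap.newKeySet();   // thread-safe set
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(generator.nextId());
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return seen.size();
    }
}
```

With the atomic counter, the number of distinct identifiers always equals the number of calls. With a plain `long` and `++`, duplicates could appear.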
To summarize, the counter above is a read-modify-write race. The most common type of race condition, however, is check-then-act: a thread takes its next step based on an observation that may already be stale or invalid.
2.2.2 Race Conditions in Lazy Initialization
A common use of check-then-act is lazy initialization, which is the lazy variant of the singleton pattern.
@NotThreadSafe
public class LazyInitRace {
    private ExpensiveObject instance = null;

    public ExpensiveObject getInstance() {
        if (instance == null) {                // check
            instance = new ExpensiveObject();  // act
        }
        return instance;
    }
}
This form of the singleton pattern is not safe in a multithreaded environment because it cannot guarantee a single instance. The problem is again the check: when several threads check instance at the same time and all find it null, each creates a new instance, which breaks the singleton.
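One straightforward way to remove this race is to make the check and the creation a single atomic step by synchronizing the accessor. The sketch below assumes this is acceptable for the use case. The class `SafeLazyInit`, the placeholder `ExpensiveObject`, and the helper `distinctInstancesSeen` are hypothetical, and other idioms exist that avoid locking on every call.

```java
// Hypothetical thread-safe variant of LazyInitRace.
class SafeLazyInit {
    static final class ExpensiveObject { }   // placeholder for something costly to build

    private ExpensiveObject instance;        // guarded by "this"

    // synchronized makes check-then-act atomic: only one thread can ever see null.
    synchronized ExpensiveObject getInstance() {
        if (instance == null) {
            instance = new ExpensiveObject();
        }
        return instance;
    }

    // Lets several threads call getInstance() and counts how many different objects they got.
    static int distinctInstancesSeen(int threads) {
        SafeLazyInit holder = new SafeLazyInit();
        ExpensiveObject[] results = new ExpensiveObject[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            workers[t] = new Thread(() -> results[slot] = holder.getInstance());
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();   // join also makes each worker's array write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        int distinct = 1;
        for (int i = 1; i < threads; i++) {
            if (results[i] != results[0]) {
                distinct++;   // counts any result that differs from the first one
            }
        }
        return distinct;
    }
}
```

The cost is that every call to `getInstance()` acquires the lock, which is the kind of trade-off discussed later under performance.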
2.2.3 Compound Actions
The problems above, whether check-then-act or read-modify-write, are all solved the same way: by making the operation atomic. If other threads cannot check or read until the current thread has finished acting or writing, the race condition disappears and thread safety is guaranteed.
Atomic operations: take two operations A and B. If, from the point of view of a thread executing A, another thread executing B has either executed all of B or none of it, then A and B are atomic with respect to each other.
Java offers several ways to guarantee atomicity: the locking mechanism described in Section 2.3, or the AtomicXxx types mentioned earlier, which make the increment in UnsafeCountingFactorizer atomic.
The usage of AtomicLong was demonstrated in 2.2.1, so the modified code is not repeated here. The important point is that in real development you should, whenever possible, use existing thread-safe objects to manage a class's state.
2.3 Locking mechanism
With a single state variable in the Servlet, we can manage its state with a thread-safe atomic object. But if more state is added, is it enough to simply add more thread-safe variables?
Requirement: to improve Servlet performance, we want to cache the most recent result and reuse it when two consecutive requests ask to factor the same number. (This is not an effective caching strategy; Section 5.6 gives a better one.)
To implement this cache, we need to save two pieces of state:
- The number most recently factored
- The factors of that number
@NotThreadSafe
public class UnsafeCachingFactorizer implements Servlet {
    // The most recently factored number. AtomicReference is the atomic class for
    // object references; the referenced type is given by the generic parameter.
    private final AtomicReference<BigInteger> lastNumber = new AtomicReference<>();
    // The factors of that number
    private final AtomicReference<BigInteger[]> lastFactors = new AtomicReference<>();

    public void service(ServletRequest servletRequest, ServletResponse servletResponse) {
        BigInteger i = extractFromRequest(servletRequest);
        if (i.equals(lastNumber.get())) {
            encodeIntoResponse(servletResponse, lastFactors.get());
        } else {
            BigInteger[] factors = factor(i);
            lastNumber.set(i);          // these two writes are individually atomic,
            lastFactors.set(factors);   // but not atomic as a pair
            encodeIntoResponse(servletResponse, factors);
        }
    }
}
Although each atomic class guarantees the atomicity of operations on its own variable, the business logic as a whole still has race conditions, because the overall operation is not atomic.
One invariant of UnsafeCachingFactorizer is that the product of the factors cached in lastFactors equals the value cached in lastNumber. When multiple threads access the Servlet at the same time, one thread may update lastNumber while another updates lastFactors, so the two values end up coming from different requests and the invariant is broken. It is also possible that other threads modify lastNumber and lastFactors between thread A reading lastNumber and reading lastFactors, so A returns factors that do not belong to the number it checked.
When an invariant involves multiple variables, the variables are not independent: they must all be updated in a single atomic operation.
We need a mechanism that makes this whole group of statements execute atomically.
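To make the multi-variable invariant concrete, the sketch below keeps a number and its factors together and guards both with one lock, so no thread can observe one updated without the other. The class `ConsistentPair` and its methods are hypothetical and use `long` values instead of BigInteger to stay short.

```java
// Hypothetical holder for two related variables guarded by one lock.
class ConsistentPair {
    private long lastNumber = 1;       // guarded by "this"
    private long[] lastFactors = {};   // guarded by "this"; invariant: product(lastFactors) == lastNumber

    // Both fields change inside one synchronized method, so the update is atomic as a pair.
    synchronized void update(long number, long[] factors) {
        lastNumber = number;
        lastFactors = factors.clone();
    }

    // Reads both fields under the same lock, so it never sees a half-finished update.
    synchronized boolean invariantHolds() {
        long product = 1;
        for (long f : lastFactors) {
            product *= f;
        }
        return product == lastNumber;
    }

    // Two writers keep replacing the pair while the caller keeps checking the invariant.
    static boolean stress(int iterations) {
        ConsistentPair pair = new ConsistentPair();
        Thread w1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) pair.update(6, new long[] {2, 3});
        });
        Thread w2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) pair.update(35, new long[] {5, 7});
        });
        w1.start();
        w2.start();
        boolean ok = true;
        while (w1.isAlive() || w2.isAlive()) {
            ok &= pair.invariantHolds();
        }
        ok &= pair.invariantHolds();   // final check after both writers have finished
        return ok;
    }
}
```

If `update` wrote the two fields through separate, individually atomic variables instead, a checker could see the number 35 paired with the factors 2 and 3.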
2.3.1 Built-in lock - synchronized
A synchronized block has two parts: a reference to the object that serves as the lock, and the block of code guarded by that lock.
synchronized (lock) {
    // code guarded by lock
}
Every Java object can be used as a lock; these are called built-in (intrinsic) locks. synchronized can also be applied to a method: for an instance method the lock is the object the method is invoked on, and for a static method the lock is the Class object of that class.
Built-in locks in Java are mutually exclusive (mutexes): at most one thread can hold a given lock at a time.
When thread A tries to acquire a lock held by thread B, A must wait until B releases it. If B never releases the lock, A waits forever; if B is itself waiting for a lock that A holds, the two threads are deadlocked.
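Which object a given synchronized construct locks can be checked at runtime with `Thread.holdsLock`. The sketch below does this for the three forms mentioned above; the class `LockOwnerDemo` and its method names are hypothetical.

```java
// Hypothetical class showing which monitor each synchronized form acquires.
class LockOwnerDemo {
    private final Object lock = new Object();

    // Instance method: the lock is the object the method is invoked on.
    synchronized boolean instanceMethodHoldsThis() {
        return Thread.holdsLock(this);
    }

    // Static method: the lock is the Class object.
    static synchronized boolean staticMethodHoldsClass() {
        return Thread.holdsLock(LockOwnerDemo.class);
    }

    // Block: the lock is whatever object is named in the parentheses.
    boolean blockHoldsLockObject() {
        synchronized (lock) {
            return Thread.holdsLock(lock);
        }
    }

    // No synchronization at all: the calling thread does not hold the lock on "this".
    boolean holdsThisOutsideSync() {
        return Thread.holdsLock(this);
    }
}
```

The first three methods return true and the last returns false, as long as the caller is not already holding the lock on the instance.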
With this mechanism, making the caching Servlet thread-safe is easy:
@ThreadSafe
public class SynchronizedFactorizer implements Servlet {
    // @GuardedBy("this") documents that the field is protected by the built-in lock of this object
    @GuardedBy("this") private BigInteger lastNumber;
    @GuardedBy("this") private BigInteger[] lastFactors;

    public synchronized void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        if (i.equals(lastNumber)) {
            encodeIntoResponse(resp, lastFactors);
        } else {
            BigInteger[] factors = factor(i);
            lastNumber = i;
            lastFactors = factors;
            encodeIntoResponse(resp, factors);
        }
    }
}
We have synchronized the whole service method. It now caches the latest result correctly, but its concurrency is very poor; we will improve it in Section 2.4.
2.3.2 Reentrancy
When a thread requests a lock held by another thread, the requesting thread blocks until the other thread releases the lock. Built-in locks are reentrant, however, so a thread's request for a lock it already holds succeeds.
Reentrancy means that locks are acquired per thread, not per invocation.
One way to implement a reentrant lock is to associate each lock with an acquisition count and an owning thread (the Mark Word in the Java object header). A count of 0 means the lock is not held by any thread. When a thread acquires an unheld lock, the owner is recorded and the count is set to 1. If the same thread acquires the lock again, the count is incremented. Each time the thread exits a synchronized block, the count is decremented; when it reaches 0 the lock is fully released. In other words, a thread must release a lock as many times as it acquired it.
public class Widget {
    public synchronized void doSomething() {
        // ...
    }
}

public class LoggingWidget extends Widget {
    public synchronized void doSomething() {
        System.out.println(toString() + ": calling doSomething");
        super.doSomething();   // acquires the same lock again
    }
}
In the code above, LoggingWidget.doSomething already holds the lock on the object when it calls super.doSomething, which must acquire the same lock again. If built-in locks were not reentrant, this call would deadlock.
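Built-in locks do not expose their acquisition count, so the sketch below uses `java.util.concurrent.locks.ReentrantLock` as an observable stand-in for the counter scheme described earlier. That substitution is an assumption made for illustration, and the class `HoldCountDemo` is a hypothetical name.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: records the hold count of a ReentrantLock around nested acquisitions.
class HoldCountDemo {
    static int[] nestedHoldCounts() {
        ReentrantLock lock = new ReentrantLock();
        int[] counts = new int[5];

        counts[0] = lock.getHoldCount();         // not held yet
        lock.lock();                             // first acquisition
        try {
            counts[1] = lock.getHoldCount();
            lock.lock();                         // same thread acquires again: succeeds immediately
            try {
                counts[2] = lock.getHoldCount();
            } finally {
                lock.unlock();                   // inner release
            }
            counts[3] = lock.getHoldCount();     // still held by the outer acquisition
        } finally {
            lock.unlock();                       // outer release
        }
        counts[4] = lock.getHoldCount();         // fully released
        return counts;
    }
}
```

The recorded counts are 0, 1, 2, 1, 0: the lock only becomes free again after as many releases as there were acquisitions.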
2.4 Liveness and Performance
In SynchronizedFactorizer (Section 2.3.1) we used the built-in lock to make the Servlet thread-safe, but this defeats the original purpose of introducing the cache, which was to improve performance. A Servlet is supposed to serve requests concurrently, but because we synchronized the whole service method, only one request can be processed at a time while all others wait. On a multi-core machine, even under high load the other cores may sit idle, which wastes resources and hurts performance.
Therefore, we should make synchronized blocks as small as possible while still keeping the class thread-safe. For the service method, we do not need to synchronize the whole method:
@ThreadSafe
public class CachedFactorizer extends GenericServlet implements Servlet {
    @GuardedBy("this") private BigInteger lastNumber;
    @GuardedBy("this") private BigInteger[] lastFactors;
    @GuardedBy("this") private long hits;       // number of requests served
    @GuardedBy("this") private long cacheHits;  // number of requests answered from the cache

    // Reads of the counters are also guarded by the lock
    public synchronized long getHits() {
        return hits;
    }

    // Cache hit ratio
    public synchronized double getCacheHitRatio() {
        return (double) cacheHits / hits;
    }

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = null;
        synchronized (this) {   // the current object's built-in lock
            ++hits;
            if (i.equals(lastNumber)) {
                ++cacheHits;    // count the hit, otherwise the ratio would always be 0
                factors = lastFactors.clone();
            }
        }
        // The factorization below runs without holding the lock
        if (factors == null) {
            // Factoring is assumed to be slow, so the lock is released before this (or any I/O-bound) work starts
            factors = factor(i);
            synchronized (this) {
                lastNumber = i;
                lastFactors = factors.clone();
            }
        }
        encodeIntoResponse(resp, factors);
    }
}
In the code above, we added cache hit ratio statistics and kept the synchronized blocks as small as possible. There are two of them: one decides whether the cached result can be returned directly, and the other updates the cache atomically. Local variables such as factors live in each thread's own stack frame and are not shared, so they raise no thread safety concerns.
One important point to keep in mind: never hold a lock during an operation that may take a long time or may not complete quickly, such as I/O.
Also, since we already use synchronized blocks to build the atomic operations, we can drop the AtomicXxx classes: using two different synchronization mechanisms would only cause confusion and would bring no performance or safety benefit.
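Since the Servlet version cannot be run on its own, the following standalone sketch reproduces the same locking structure without the Servlet API. It is an assumption-laden illustration: the class `CachedFactorizerSketch`, the use of `long` instead of BigInteger, and the trial-division `factor` helper are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical standalone version of the narrow-lock caching structure.
class CachedFactorizerSketch {
    private long lastNumber;        // guarded by "this"
    private long[] lastFactors;     // guarded by "this"; null until the first result is cached
    private long hits;              // guarded by "this"
    private long cacheHits;         // guarded by "this"

    synchronized long getHits() { return hits; }

    synchronized double getCacheHitRatio() {
        return hits == 0 ? 0.0 : (double) cacheHits / hits;
    }

    long[] service(long n) {
        long[] factors = null;
        synchronized (this) {                    // short critical section: counters and cache lookup
            ++hits;
            if (lastFactors != null && n == lastNumber) {
                ++cacheHits;
                factors = lastFactors.clone();
            }
        }
        if (factors == null) {
            factors = factor(n);                 // slow work, done without holding the lock
            synchronized (this) {                // short critical section: update both fields together
                lastNumber = n;
                lastFactors = factors.clone();
            }
        }
        return factors;
    }

    // Simple trial division; uses only local variables, so it needs no lock.
    private static long[] factor(long n) {
        long[] buf = new long[64];               // a long has fewer than 64 prime factors
        int size = 0;
        for (long d = 2; d * d <= n; d++) {
            while (n % d == 0) {
                buf[size++] = d;
                n /= d;
            }
        }
        if (n > 1) {
            buf[size++] = n;
        }
        return Arrays.copyOf(buf, size);
    }
}
```

For example, calling `service(12)` twice and then `service(35)` gives three hits, one of them served from the cache, so the hit ratio is one third.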