linux Kernel-level Synchronization Mechanism--futex

In the interview about multi-threaded synchronization, you have to think about the question, we know that glibc pthread_cond_timedwait underlying is achieved by linux futex mechanism.

The ideal synchronization mechanism should be to use atomic instructions to solve the problem in user mode without lock conflicts, and to sleep and wake up with system calls provided by the kernel when the system needs to hang up and wait. In other words, when the user-mode spin fails, can you suspend the process and wake it up when the thread holding the lock releases it?
If you haven't thought about it in depth, you'll probably take it for granted that it's like this (pseudocode):

void lock(int lockval) {
	//trylock is a user-level spinlock
	while(!trylock(lockval)) {
		wait();//Releasing the cpu and adding the current thread to the waiting queue is a system call
	}
}

boolean trylock(int lockval){
	int i=0; 
	//localval=1 represents successful locking
	while(!compareAndSet(lockval,0,1)){
		if(++i>10){
			return false;
		}
	}
	return true;
}

void unlock(int lockval) {
	 compareAndSet(lockval,1,0);
	 notify();
}

The problem with the above code is that there is a window between trylock and wait calls:
If a thread trylock fails and the thread holding the lock releases the lock when it calls wait, the current thread still calls wait to wait, but no one wakes up the thread after that.

In order to solve these problems, the linux kernel introduces the futex mechanism. Futex mainly includes two methods: waiting and waking: futex_wait and futex_wake, which are defined as follows

//Uaddr points to an address, and val represents the expected value of the address. When * uaddr==val, wait occurs.
int futex_wait(int *uaddr, int val);
//Wake up n pending processes on lock variables pointed to by uaddr
int futex_wake(int *uaddr, int n);

futex checks whether the value of address pointed by addr is equal to val before it actually hangs the process. If it is not equal, it returns immediately and continues trylock in user mode. Otherwise, insert the current thread into a queue and suspend it.

stay Consideration on Synchronization In this paper, the background and basic principles of futex are introduced. People who are not familiar with futex can look at it first.

This article will deeply analyze the implementation of futex, so that readers can have an intuitive understanding of the bottom implementation of the lock, and then combine the previous two articles.( Consideration on Synchronization and Consideration on Synchronization ) Can have a comprehensive understanding of the synchronization mechanism of the operating system.

The term process in the following section includes regular processes and threads.

futex_wait

Before looking at the following source code analysis, consider a question: How do you ensure that the value of val is not modified by other processes when a process is suspended?

The code is in kernel/futex.c.

static int futex_wait(u32 __user *uaddr, int fshared,
		      u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
{
	struct hrtimer_sleeper timeout, *to = NULL;
	struct restart_block *restart;
	struct futex_hash_bucket *hb;
	struct futex_q q;
	int ret;

	...

	//Setting hrtimer timer tasks: After a certain period of time (abs_time), wake up the wait ing process if the process has not been awakened
	if (abs_time) {
	    ...
		hrtimer_init_sleeper(to, current);
		...
	}

retry:
	//The function determines whether the value pointed by uaddr is equal to val, and some initialization operations
	ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
	//If val changes, return directly
	if (ret)
		goto out;

	//Change the current process state to TASK_INTERRUPTIBLE, insert it into the futex waiting queue, and reschedule it.
	futex_wait_queue_me(hb, &q, to);

	/* If we were woken (and unqueued), we succeeded, whatever. */
	ret = 0;
	//If unqueue_me succeeds, it means a timeout trigger (because when futex_wake wakes up, it moves the process out of the waiting queue, so it fails here)
	if (!unqueue_me(&q))
		goto out_put_key;
	ret = -ETIMEDOUT;
	if (to && !to->task)
		goto out_put_key;

	/*
	 * We expect signal_pending(current), but we might be the
	 * victim of a spurious wakeup as well.
	 */
	if (!signal_pending(current)) {
		put_futex_key(fshared, &q.key);
		goto retry;
	}

	ret = -ERESTARTSYS;
	if (!abs_time)
		goto out_put_key;

	...

out_put_key:
	put_futex_key(fshared, &q.key);
out:
	if (to) {
		//Cancellation of scheduled tasks
		hrtimer_cancel(&to->timer);
		destroy_hrtimer_on_stack(&to->timer);
	}
	return ret;
}

Before blocking the process, the current process will be inserted into a waiting queue. It should be noted that the waiting queue here is actually a Java HashMap-like structure, which is globally unique.

struct futex_hash_bucket {
	spinlock_t lock;
	//Bidirectional linked list
	struct plist_head chain;
};

static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];

Focus on futex_wait_setup and two functions futex_wait_queue_me

static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
			   struct futex_q *q, struct futex_hash_bucket **hb)
{
	u32 uval;
	int ret;
retry:
	q->key = FUTEX_KEY_INIT;
	//Initialize futex_q
	ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
	if (unlikely(ret != 0))
		return ret;

retry_private:
	//Obtain spin lock
	*hb = queue_lock(q);
	//Atoms set the value of uaddr to uval
	ret = get_futex_value_locked(&uval, uaddr);

   ... 
	//If the current uaddr points to a value that is not equal to val, it means that other processes have changed
	//The value pointed by uaddr, the waiting condition no longer holds, and returns directly without blocking.
	if (uval != val) {
		//Release lock
		queue_unlock(q, *hb);
		ret = -EWOULDBLOCK;
	}

   ...
	return ret;
}

The futex_wait_setup function does two main things, one is to get the spinlock, and the other is to determine whether * uaddr is the expected value.

static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
				struct hrtimer_sleeper *timeout)
{
	//Set the process state to TASK_INTERRUPTIBLE and only select when the cpu schedules
	//Process with TASK_RUNNING status
	set_current_state(TASK_INTERRUPTIBLE);
	//Insert the current process (q encapsulation) into the waiting queue and release the spinlock
	queue_me(q, hb);

	//Start a Timing Task
	if (timeout) {
		hrtimer_start_expires(&timeout->timer, HRTIMER_MODE_ABS);
		if (!hrtimer_active(&timeout->timer))
			timeout->task = NULL;
	}

	/*
	 * If we have been removed from the hash list, then another task
	 * has tried to wake us, and we can skip the call to schedule().
	 */
	if (likely(!plist_node_empty(&q->list))) {
		 
		 //If no expiration time is set | | The expiration time is set and has not expired yet.
		if (!timeout || timeout->task)
			//The system reschedules the process, at which time the cpu will execute other processes, which will block here.
			schedule();
	}
	//Come here to show that the cpu has selected it to run again.
	__set_current_state(TASK_RUNNING);
}

There are several main things to do in futex_wait_queue_me:

  1. Insert the current process into the waiting queue
  2. Start a Timing Task
  3. Rescheduling process

How to Guarantee Atomicity between Conditions and Waiting

Spinlock is added to the futex_wait_setup method; in the futex_wait_queue_me, the state is set to TASK_INTERRUPTIBLE, and the queue_me is called to insert the current thread into the waiting queue before releasing the spinlock. That is to say, the process of checking the value of uaddr is placed in the same critical area as the process of hanging up. When the spinlock is released, it is no longer relevant to change the value of addr address, because the current process has joined the waiting queue and can be waked up without the problem of no wake-up mentioned at the beginning of this article.

Summary of futex_wait

Summarize the futex_wait process:

  1. Adding spin lock
  2. Check if * uaddr equals val, and if not, it returns immediately
  3. Set the process state to TASK_INTERRUPTIBLE
  4. Insert the current process into the waiting queue
  5. Release spin lock
  6. Create a Timed Task: Wake up the process when it has not been awakened for more than a certain period of time
  7. Suspend the current process

futex_wake

futex_wake

static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
{
	struct futex_hash_bucket *hb;
	struct futex_q *this, *next;
	struct plist_head *head;
	union futex_key key = FUTEX_KEY_INIT;
	int ret;

	...
	//Fill in the content of & key according to the value of uaddr
	ret = get_futex_key(uaddr, fshared, &key, VERIFY_READ);
	if (unlikely(ret != 0))
		goto out;
	//Get the futex_hash_bucket corresponding to uaddr according to & key
	hb = hash_futex(&key);
	//Spin lock on the hb
	spin_lock(&hb->lock);
	head = &hb->chain;
	//Traversing through the list of the hb, note that the nodes stored in the list are plist_node type, while this is futex_q type. This type conversion is achieved through the container_of mechanism in c.
	plist_for_each_entry_safe(this, next, head, list) {
		if (match_futex (&this->key, &key)) {
			...
			//Wake up the corresponding process
			wake_futex(this);
			if (++ret >= nr_wake)
				break;
		}
	}
	//Release spin lock
	spin_unlock(&hb->lock);
	put_futex_key(fshared, &key);
out:
	return ret;
}

The futex_wake process is as follows:

  1. Find the futex_hash_bucket corresponding to uaddr, which is the hb in the code
  2. Spin lock on hb
  3. Traverse the linked list of fb and find the corresponding node of uaddr
  4. Calling wake_futex to wake up the waiting process
  5. Release spin lock

In wake_futex, the formulated process state is set to TASK_RUNNING and added to the system scheduling list. At the same time, the process is removed from the waiting queue of futex. The specific code is not analyzed, and interested parties can study it by themselves.

End

ReentrantLock,Object.wait and Thread.sleep in Java all use futex to synchronize threads. Understanding the implementation of futex can help you better understand and use the synchronization mechanism of these upper layers. In addition, due to the limited space and energy, there is no specific analysis of the content related to process scheduling, but it does not hinder the understanding of the content of the article.

Original: Java Architecture Notes

Free Java advanced materials need to be collected by oneself, covering Java, Redis, MongoDB, MySQL, Zookeeper, Spring Cloud, Dubbo, high concurrent and distributed tutorials, a total of 30G. _
Portal: https://mp.weixin.qq.com/s/JzddfH-7yNudmkjT0IRL8Q

Keywords: Programming Java Linux glibc Redis

Added by brownca on Mon, 29 Jul 2019 13:52:32 +0300