Analysis of Linux RCU principle

background

  • Read the fueling source code! - by Lu Xun
  • A picture is worth a thousand words. --By Golgi

Explain:

  1. Kernel version: 4.14
  2. ARM64 processor, Contex-A53, dual core
  3. Using tool: Source Insight 3.5, Visio

1. overview

RCU, read copy update, is a synchronization mechanism in Linux kernel.
RCU is often described as an alternative to read-write lock, which is characterized by that readers do not need to synchronize with writers directly, and readers and writers can execute concurrently. The goal of RCU is to reduce the cost of reader side to the greatest extent, so it is also commonly used in scenarios with high performance requirements for readers.

  • Advantage:

    1. The reader side has little overhead, does not need to acquire any locks, does not need to execute atomic instructions or memory barriers;
    2. No deadlock;
    3. There is no priority reversal problem;
    4. There is no dangerous problem of memory leakage;
    5. Good real-time delay;
  • Disadvantages:

    1. The synchronization overhead of writers is large, and writers need to be mutually exclusive;
    2. It is more complex than other synchronization mechanisms;

Take a picture to describe the general operation:

  • Multiple readers can access critical resources simultaneously and use RCU read lock / RCU read unlock to calibrate critical area;
  • When the updater updates critical resources, it copies a copy as the basis for modification. When all readers leave the critical area, the pointer to the old critical resources points to the updated copies, and the old resources are recycled;
  • Only one writer is shown in the figure. When there are multiple writers, they need to be mutually exclusive;

The above description is relatively simple, and the implementation of RCU is very complex. This paper first gives an initial impression of RCU, and analyzes the interface with examples, and then goes deep into the implementation principle behind it layer by layer. Let's get started.

2. RCU Foundation

2.1 basic elements of RCU

The basic idea of RCU is to divide the Update operation into two parts: 1) Removal; 2) Reclamation.
The direct white point understanding is that critical resources are read by multiple readers. When the writer updates the copied copy after modification, the first step is to remove the old critical resource data (the modification pointer points to), and the second step is to recycle the old data (such as kfree).

Therefore, it is divided into the following three basic elements in terms of function: Reader/Updater/Reclaimer. The interaction between the three elements is as follows:

  1. Reader

    • RCU read lock and RCU read unlock are used to define the critical area of the reader. When accessing the data protected by RCU, you need to always access the critical area;
    • Before accessing the protected data, RCU? Reference is needed to obtain the RCU protected pointer;
    • When using non preemptive RCU, code that can sleep cannot be used between RCU ﹣ read ﹣ lock / RCU ﹣ read ﹣ unlock;
  2. Updater

    • When multiple updaters update data, they need to use mutual exclusion mechanism for protection;
    • Updater uses RCU assign pointer to remove the old pointer points and point to the updated critical resources;
    • The Updater uses synchronize RCU or call RCU to start the Reclaimer to reclaim the old critical resources, where synchronize RCU represents synchronous waiting for reclaiming, and call RCU represents asynchronous reclaiming;
  3. Reclaimer

    • Reclaimer recycles the old critical resources;
    • In order to ensure that no readers are accessing the critical resources to be recycled, the Reclaimer needs to wait for all readers to exit the critical area. The waiting time is called Grace Period;

2.2 three basic mechanisms of RCU

To provide the functions described above, RCU is implemented based on three mechanisms.

2.2.1 Publish-Subscribe Mechanism

What is the concept of subscription mechanism

  • Updater and Reader are similar to Publisher and Subsriber;
  • Updater updates the content, calls the interface to publish, and Reader calls the interface to read the publishing content.

So what needs to be done to ensure this subscription mechanism? Let's look at a pseudocode:

 /* Definiton of global structure */
 1 struct foo {
  2   int a;
  3   int b;
  4   int c;
  5 };
  6 struct foo *gp = NULL;
  7 
  8 /* . . . */
  9 /* =========Updater======== */ 
 10 p = kmalloc(sizeof(*p), GFP_KERNEL);
 11 p->a = 1;
 12 p->b = 2;
 13 p->c = 3;
 14 gp = p;
 15 
 16 /* =========Reader======== */
 17 p = gp;
 18 if (p != NULL) {
 19   do_something_with(p->a, p->b, p->c);
 20 }

At first glance, it seems that the problem is not big. Updater updates the assignment, Reader reads and other processes. However, due to the problems of compilation disorder and execution disorder, the execution order of the above code is not necessarily the order of the code. For example, in some architectures (DEC Alpha), the operation part of the Reader may operate do'something'with() before p assignment.

In order to solve this problem, Linux provides RCU assign point / RCU reference macro to ensure the execution order. The Linux kernel also encapsulates a higher level based on RCU assign point / RCU reference macro, such as list, hlist. Therefore, there are three scenarios protected by RCU in the kernel: 1) pointer; 2) list list list list; 3) hlist Hash list.

For these three scenarios, the publish subscribe interface is shown in the following table:

2.2.2 Wait For Pre-Existing RCU Readers to Complete

Reclaimer needs to recycle the old critical resources, so the question is, when? Therefore, RCU needs to provide a mechanism to ensure that all previous RCU readers have completed, that is, after exiting the critical area calibrated by RCU read lock / RCU read unlock, it can be recycled.

  • In the figure, Readers and Updater execute simultaneously;
  • When the Updater performs the Removal operation, it calls synchronize ﹣ RCU, which marks the end of the update and the beginning of the recovery phase;
  • After the synchronize ﹣ RCU call, there may be new readers to read critical resources (updated content), but Grace Period only waits for pre existing readers, that is, reader-4 and reader-5 in the figure. As long as these existing RCU readers exit the critical area, it means the end of the Grace Period, so the recycling work is carried out;
  • Synchronization ABCD RCU does not return immediately after the last pre existing RCU reader leaves the critical area, it may have a scheduling delay;

2.2.3 Maintain Multiple Versions of Recently Updated Objects

As can be seen from section 2.2.2, after the Updater is updated and before the Reclaimer is recycled, there will be critical resources of the new and old versions. Only after the synchronize RCU is returned, the Reclaimer will recycle the old critical resources, and the last version is left. Obviously, when there are multiple updaters, there will be more versions of critical resources.

Let's take a picture. Take pointers and linked lists as examples:

  • Call synchronize RCU to start as the critical point to maintain different versions of critical resources;
  • After Reclaimer reclaims the resources of the old version, it will be unified finally;

3. RCU example analysis

It's time for a wave of fueling sample code.

  • Overall code logic:
    1. Construct four kernel threads, two kernel threads test the RCU protection operation of pointer, and two kernel threads test the RCU protection operation of linked list;
    2. In the process of recovery, two mechanisms are used: synchronous RCU recovery and call RCU asynchronous recovery;
    3. In order to simplify the code, the basic fault-tolerant judgment has been omitted;
    4. The mechanism of multiple updaters is not considered, so the mutually exclusive operation between updaters is omitted;
#include <linux/module.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/kthread.h>
#include <linux/rcupdate.h>
#include <linux/delay.h>

struct foo {
	int a;
	int b;
	int c;
	struct rcu_head rcu;
	struct list_head list;
};

static struct foo *g_pfoo = NULL;

LIST_HEAD(g_rcu_list);

struct task_struct *rcu_reader_t;
struct task_struct *rcu_updater_t;
struct task_struct *rcu_reader_list_t;
struct task_struct *rcu_updater_list_t;

/* Reader operation of pointer */
static int rcu_reader(void *data)
{
	struct foo *p = NULL;
	int cnt = 100;

	while (cnt--) {
		msleep(100);
		rcu_read_lock();
		p = rcu_dereference(g_pfoo);
		pr_info("%s: a = %d, b = %d, c = %d\n",
				__func__, p->a, p->b, p->c);
		rcu_read_unlock();
	}

	return 0;
}

/*  Recycling operations */
static void rcu_reclaimer(struct rcu_head *rh)
{
	struct foo *p = container_of(rh, struct foo, rcu);
	pr_info("%s: a = %d, b = %d, c = %d\n",
			__func__, p->a, p->b, p->c);
	kfree(p);
}

/* Updater operation of pointer */
static int rcu_updater(void *data)
{
	int value = 1;
	int cnt = 100;

	while (cnt--) {
		struct foo *old;
		struct foo *new = (struct foo *)kzalloc(sizeof(struct foo), GFP_KERNEL);

		msleep(200);

		old = g_pfoo;

		*new = *g_pfoo;
		new->a = value;
		new->b = value + 1;
		new->c = value + 2;
		rcu_assign_pointer(g_pfoo, new);

		pr_info("%s: a = %d, b = %d, c = %d\n",
				__func__, new->a, new->b, new->c);

		call_rcu(&old->rcu, rcu_reclaimer);

		value++;
	}

	return 0;
}

/* Reader operation of linked list */
static int rcu_reader_list(void *data)
{
	struct foo *p = NULL;
	int cnt = 100;

	while (cnt--) {
		msleep(100);
		rcu_read_lock();
		list_for_each_entry_rcu(p, &g_rcu_list, list) {
			pr_info("%s: a = %d, b = %d, c = %d\n",
					__func__, p->a, p->b, p->c);
		}
		rcu_read_unlock();
	}

	return 0;
}

/* Updater operation of linked list */
static int rcu_updater_list(void *data)
{
	int cnt = 100;
	int value = 1000;

	while (cnt--) {
		msleep(100);
		struct foo *p = list_first_or_null_rcu(&g_rcu_list, struct foo, list);
		struct foo *q = (struct foo *)kzalloc(sizeof(struct foo), GFP_KERNEL);

		*q = *p;
		q->a = value;
		q->b = value + 1;
		q->c = value + 2;

		list_replace_rcu(&p->list, &q->list);

		pr_info("%s: a = %d, b = %d, c = %d\n",
				__func__, q->a, q->b, q->c);

		synchronize_rcu();
		kfree(p);

		value++; 
	}

	return 0;
}

/* module Initialization */
static int rcu_test_init(void)
{
	struct foo *p;

	rcu_reader_t = kthread_run(rcu_reader, NULL, "rcu_reader");
	rcu_updater_t = kthread_run(rcu_updater, NULL, "rcu_updater");
	rcu_reader_list_t = kthread_run(rcu_reader_list, NULL, "rcu_reader_list");
	rcu_updater_list_t = kthread_run(rcu_updater_list, NULL, "rcu_updater_list");

	g_pfoo = (struct foo *)kzalloc(sizeof(struct foo), GFP_KERNEL);

	p = (struct foo *)kzalloc(sizeof(struct foo), GFP_KERNEL);
	list_add_rcu(&p->list, &g_rcu_list);

	return 0;
}

/* module Clean-up work */
static void rcu_test_exit(void)
{
	kfree(g_pfoo);
	kfree(list_first_or_null_rcu(&g_rcu_list, struct foo, list));

	kthread_stop(rcu_reader_t);
	kthread_stop(rcu_updater_t);
	kthread_stop(rcu_reader_list_t);
	kthread_stop(rcu_updater_list_t);
}

module_init(rcu_test_init);
module_exit(rcu_test_exit);

MODULE_AUTHOR("Loyen");
MODULE_LICENSE("GPL");

To prove that there is no fraud, post the output log running on the development board, as shown in the following figure:

4. API introduction

4.1 core API

These interfaces below cannot be more core.

a.      rcu_read_lock()  //Mark the beginning of critical area for readers
b.      rcu_read_unlock()  //Mark the end of critical area for readers
c.      synchronize_rcu() / call_rcu() //Wait for the Grace period to finish before recycling
d.      rcu_assign_pointer()  //Updater uses this macro to assign values to RCU protected pointers
e.      rcu_dereference()  //Reader uses this macro to get the RCU protected pointer

4.2 other relevant API

Based on the core API, other related APIs are extended as follows, which will not be described in detail:

RCU list traversal::

        list_entry_rcu
        list_entry_lockless
        list_first_entry_rcu
        list_next_rcu
        list_for_each_entry_rcu
        list_for_each_entry_continue_rcu
        list_for_each_entry_from_rcu
        list_first_or_null_rcu
        list_next_or_null_rcu
        hlist_first_rcu
        hlist_next_rcu
        hlist_pprev_rcu
        hlist_for_each_entry_rcu
        hlist_for_each_entry_rcu_bh
        hlist_for_each_entry_from_rcu
        hlist_for_each_entry_continue_rcu
        hlist_for_each_entry_continue_rcu_bh
        hlist_nulls_first_rcu
        hlist_nulls_for_each_entry_rcu
        hlist_bl_first_rcu
        hlist_bl_for_each_entry_rcu

RCU pointer/list update::

        rcu_assign_pointer
        list_add_rcu
        list_add_tail_rcu
        list_del_rcu
        list_replace_rcu
        hlist_add_behind_rcu
        hlist_add_before_rcu
        hlist_add_head_rcu
        hlist_add_tail_rcu
        hlist_del_rcu
        hlist_del_init_rcu
        hlist_replace_rcu
        list_splice_init_rcu
        list_splice_tail_init_rcu
        hlist_nulls_del_init_rcu
        hlist_nulls_del_rcu
        hlist_nulls_add_head_rcu
        hlist_bl_add_head_rcu
        hlist_bl_del_init_rcu
        hlist_bl_del_rcu
        hlist_bl_set_first_rcu

RCU::

        Critical sections       Grace period            Barrier

        rcu_read_lock           synchronize_net         rcu_barrier
        rcu_read_unlock         synchronize_rcu
        rcu_dereference         synchronize_rcu_expedited
        rcu_read_lock_held      call_rcu
        rcu_dereference_check   kfree_rcu
        rcu_dereference_protected

bh::

        Critical sections       Grace period            Barrier

        rcu_read_lock_bh        call_rcu                rcu_barrier
        rcu_read_unlock_bh      synchronize_rcu
        [local_bh_disable]      synchronize_rcu_expedited
        [and friends]
        rcu_dereference_bh
        rcu_dereference_bh_check
        rcu_dereference_bh_protected
        rcu_read_lock_bh_held

sched::

        Critical sections       Grace period            Barrier

        rcu_read_lock_sched     call_rcu                rcu_barrier
        rcu_read_unlock_sched   synchronize_rcu
        [preempt_disable]       synchronize_rcu_expedited
        [and friends]
        rcu_read_lock_sched_notrace
        rcu_read_unlock_sched_notrace
        rcu_dereference_sched
        rcu_dereference_sched_check
        rcu_dereference_sched_protected
        rcu_read_lock_sched_held


SRCU::

        Critical sections       Grace period            Barrier

        srcu_read_lock          call_srcu               srcu_barrier
        srcu_read_unlock        synchronize_srcu
        srcu_dereference        synchronize_srcu_expedited
        srcu_dereference_check
        srcu_read_lock_held

SRCU: Initialization/cleanup::

        DEFINE_SRCU
        DEFINE_STATIC_SRCU
        init_srcu_struct
        cleanup_srcu_struct

All: lockdep-checked RCU-protected pointer access::

        rcu_access_pointer
        rcu_dereference_raw
        RCU_LOCKDEP_WARN
        rcu_sleep_check
        RCU_NONIDLE

Well, listing these API s is a bit of a parallel.

The mysterious veil of RCU has been uncovered initially. It will be a little difficult to pick clothes inside. After all, the realization mechanism behind RCU is really difficult. So, the question is, do you want to be a gentleman? Please pay attention.

Reference resources

Documentation/RCU
What is RCU, Fundamentally?
What is RCU? Part 2: Usage
RCU part 3: the RCU API
Introduction to RCU

Welcome to the official account, continue to share the kernel mechanism articles in the form of pictures and texts.

Tags: Linux

Posted on Sat, 11 Apr 2020 05:00:45 -0700 by stephfox