RMAP for Linux Memory Management

Background

  • Read the fucking source code!--By Luxun
  • A picture is worth a thousand words. --By Golgi

Notes:

  1. Kernel version: 4.14
  2. ARM64 processor, Cortex-A53, dual core
  3. Tools used: Source Insight 3.5, Visio

1. Overview

RMAP (reverse mapping) is the mechanism for going from a physical page back to the virtual addresses that map it.

  • Mapping
    Page tables map virtual addresses to physical addresses. Each PTE (page table entry) records one such mapping, and the _mapcount field in struct page records how many PTEs map the physical page.

  • Reverse mapping
    When a physical page is being reclaimed or migrated, the kernel needs to find every virtual address mapped to that page and tear down those mappings. Without a reverse-mapping mechanism it would have to traverse the page tables of every process, which is obviously inefficient. Reverse mapping instead finds the VMAs (virtual memory areas) that map the page and removes the mapping only from the user page tables those VMAs belong to, which solves the problem quickly.
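The mapcount bookkeeping described above can be sketched in user space. This is a minimal model, not kernel code: `toy_page`, `toy_map` and `toy_unmap` are hypothetical names. The one detail taken from the kernel is the convention that the counter starts at -1, so the number of mappings is the stored value plus one.

```c
#include <assert.h>

/* Hypothetical user-space model of a page frame; not the kernel's struct page. */
struct toy_page {
    int _mapcount;   /* starts at -1, meaning "no PTE maps this page" */
};

void toy_page_init(struct toy_page *page)
{
    page->_mapcount = -1;
}

/* Number of PTEs mapping the page, using the "+1" convention. */
int toy_page_mapcount(const struct toy_page *page)
{
    return page->_mapcount + 1;
}

/* Called when a PTE is installed for the page. */
void toy_map(struct toy_page *page)
{
    page->_mapcount++;
}

/* Called when a PTE is torn down; returns 1 once the last mapping is gone. */
int toy_unmap(struct toy_page *page)
{
    assert(toy_page_mapcount(page) > 0);
    page->_mapcount--;
    return toy_page_mapcount(page) == 0;
}
```

A page can only be reclaimed or migrated once this count has dropped to zero, which is why every mapping PTE has to be located first.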

Typical application scenarios for reverse mapping:

  1. When kswapd reclaims pages, it must tear down every PTE that maps the anonymous page.
  2. When a page is migrated, every PTE that maps the anonymous page must be torn down.

2. Data structure

Reverse mapping has three key structures:

  1. struct vm_area_struct, VMA for short;
    The VMA, covered in a previous article, describes one region of the process address space. The fields related to reverse mapping are as follows:
struct vm_area_struct {
...
    /*
     * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
     * list, after a COW of one of the file pages.  A MAP_SHARED vma
     * can only be in the i_mmap tree.  An anonymous MAP_PRIVATE, stack
     * or brk vma (with NULL file) can only be in an anon_vma list.
     */
    struct list_head anon_vma_chain; /* Serialized by mmap_sem &
                      * page_table_lock */
    struct anon_vma *anon_vma;  /* Serialized by page_table_lock */
...
};
  2. struct anon_vma, AV for short;
    The AV manages VMAs of anonymous type. When an anonymous page needs to be unmapped, the AV is found first, and the VMAs are then found through the AV. The structure is as follows:
/*
 * The anon_vma heads a list of private "related" vmas, to scan if
 * an anonymous page pointing to this anon_vma needs to be unmapped:
 * the vmas on the list will be related by forking, or by splitting.
 *
 * Since vmas come and go as they are split and merged (particularly
 * in mprotect), the mapping field of an anonymous page cannot point
 * directly to a vma: instead it points to an anon_vma, on whose list
 * the related vmas can be easily linked or unlinked.
 *
 * After unlinking the last vma on the list, we must garbage collect
 * the anon_vma object itself: we're guaranteed no page can be
 * pointing to this anon_vma once its vma list is empty.
 */
struct anon_vma {
    struct anon_vma *root;      /* Root of this anon_vma tree */
    struct rw_semaphore rwsem;  /* W: modification, R: walking the list */
    /*
     * The refcount is taken on an anon_vma when there is no
     * guarantee that the vma of page tables will exist for
     * the duration of the operation. A caller that takes
     * the reference is responsible for clearing up the
     * anon_vma if they are the last user on release
     */
    atomic_t refcount;

    /*
     * Count of child anon_vmas and VMAs which points to this anon_vma.
     *
     * This counter is used for making decision about reusing anon_vma
     * instead of forking new one. See comments in function anon_vma_clone.
     */
    unsigned degree;

    struct anon_vma *parent;    /* Parent of this anon_vma */

    /*
     * NOTE: the LSB of the rb_root.rb_node is set by
     * mm_take_all_locks() _after_ taking the above lock. So the
     * rb_root must only be read/written after taking the above lock
     * to be sure to see a valid next pointer. The LSB bit itself
     * is serialized by a system wide lock only visible to
     * mm_take_all_locks() (mm_all_locks_mutex).
     */

    /* Interval tree of private "related" vmas */
    struct rb_root_cached rb_root;
};
  3. struct anon_vma_chain, AVC for short;
    The AVC is the bridge between VMA and AV.
/*
 * The copy-on-write semantics of fork mean that an anon_vma
 * can become associated with multiple processes. Furthermore,
 * each child process will have its own anon_vma, where new
 * pages for that process are instantiated.
 *
 * This structure allows us to find the anon_vmas associated
 * with a VMA, or the VMAs associated with an anon_vma.
 * The "same_vma" list contains the anon_vma_chains linking
 * all the anon_vmas associated with this VMA.
 * The "rb" field indexes on an interval tree the anon_vma_chains
 * which link all the VMAs associated with this anon_vma.
 */
struct anon_vma_chain {
    struct vm_area_struct *vma;
    struct anon_vma *anon_vma;
    struct list_head same_vma;   /* locked by mmap_sem & page_table_lock */
    struct rb_node rb;          /* locked by anon_vma->rwsem */
    unsigned long rb_subtree_last;
#ifdef CONFIG_DEBUG_VM_RB
    unsigned long cached_vma_start, cached_vma_last;
#endif
};

One picture makes the relationships clear:

  • An anon_vma_chain is added to the vma->anon_vma_chain list through its same_vma list node;
  • An anon_vma_chain is added to the anon_vma->rb_root red-black tree through its rb tree node.
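The double linkage in the two bullets above can be modelled in a few lines of user-space C. This is a sketch under simplifying assumptions: all `toy_` names are hypothetical, and the interval tree behind rb_root is flattened into a plain singly linked list, so only the "one AVC sits in two containers" idea is shown.

```c
#include <assert.h>
#include <stddef.h>

struct toy_avc;

/* Hypothetical stand-ins for the kernel structures. */
struct toy_vma {
    struct toy_avc *avc_list;       /* models vma->anon_vma_chain */
};

struct toy_av {
    struct toy_avc *avc_tree;       /* models anon_vma->rb_root, flattened to a list */
};

struct toy_avc {
    struct toy_vma *vma;
    struct toy_av *av;
    struct toy_avc *next_same_vma;  /* models the same_vma list node */
    struct toy_avc *next_same_av;   /* models the rb tree node */
};

/* Hook one AVC into both containers at once. */
void toy_avc_link(struct toy_vma *vma, struct toy_avc *avc, struct toy_av *av)
{
    assert(vma != NULL && avc != NULL && av != NULL);
    avc->vma = vma;
    avc->av = av;
    avc->next_same_vma = vma->avc_list;
    vma->avc_list = avc;
    avc->next_same_av = av->avc_tree;
    av->avc_tree = avc;
}

/* AV -> VMAs: the direction a reverse-mapping walk follows. */
int toy_av_count_vmas(const struct toy_av *av)
{
    int n = 0;
    for (const struct toy_avc *avc = av->avc_tree; avc; avc = avc->next_same_av)
        n++;
    return n;
}

/* VMA -> AVs: the direction followed when a VMA is copied or torn down. */
int toy_vma_count_avs(const struct toy_vma *vma)
{
    int n = 0;
    for (const struct toy_avc *avc = vma->avc_list; avc; avc = avc->next_same_vma)
        n++;
    return n;
}
```

Because each AVC belongs to exactly one (VMA, AV) pair, a many-to-many relationship between VMAs and AVs can be walked from either side without searching.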

2. Process analysis

First, the big picture:

  • A VMA in the address space maps virtual addresses to physical addresses through the page table.
  • A page frame is described by a struct page, whose mapping field points to the anon_vma, so the VMAs associated with the page can be found through the RMAP mechanism.
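Finding the VMA is only half of the job: the walk also needs the virtual address of the page inside that VMA. The sketch below shows the arithmetic, assuming 4 KiB pages and a page that stores its page offset in an index field. The `toy_` names are hypothetical; the formulas follow the usual "start plus page offset" relationship between a VMA and the pages it maps.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Hypothetical, trimmed-down models of the kernel structures. */
struct toy_vma {
    unsigned long vm_start;  /* first virtual address of the region */
    unsigned long vm_end;    /* first address past the region */
    unsigned long vm_pgoff;  /* page offset that vm_start corresponds to */
};

struct toy_page {
    unsigned long index;     /* page offset recorded when the page was mapped */
};

/* Forward direction: compute the offset to record for a page mapped at addr. */
unsigned long toy_linear_page_index(const struct toy_vma *vma, unsigned long addr)
{
    return ((addr - vma->vm_start) >> TOY_PAGE_SHIFT) + vma->vm_pgoff;
}

/* Reverse direction: recover the virtual address of the page inside vma. */
unsigned long toy_vma_address(const struct toy_page *page, const struct toy_vma *vma)
{
    return vma->vm_start + ((page->index - vma->vm_pgoff) << TOY_PAGE_SHIFT);
}
```

The same stored offset yields the right address in every VMA that maps the page, even when those VMAs start at different virtual addresses.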

2.1 anon_vma_prepare

The previous article on page faults mentioned the anon_vma_prepare function, which prepares the struct anon_vma structure for a VMA in the process address space.

The call path and function flow are shown in the following figure:

The relationship among VMA, AV and AVC is already described in the figure above.

Once the AV associated with the VMA has been created, one critical step remains before the RMAP path is truly connected: associating pages with the AV. Only then can the AV be found from the page, and the VMA from the AV, so that the corresponding PTE unmap operation can be completed.
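The page-to-AV association can be illustrated with a small user-space model. In the kernel the mapping field of an anonymous page holds the anon_vma pointer with its lowest bit set (PAGE_MAPPING_ANON) to distinguish it from a file mapping; the sketch below mimics only that tagging trick, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low bit used as the "this is an anon_vma" tag, mirroring PAGE_MAPPING_ANON. */
#define TOY_PAGE_MAPPING_ANON ((uintptr_t)1)

struct toy_anon_vma { int id; };          /* hypothetical stand-in for the AV */
struct toy_page { uintptr_t mapping; };   /* 0 means "not associated yet" */

/* Associate a page with an AV by storing the tagged pointer. */
void toy_page_set_anon_rmap(struct toy_page *page, struct toy_anon_vma *av)
{
    /* Aligned structures have a zero low bit, so the bit is free for the tag. */
    assert(((uintptr_t)av & TOY_PAGE_MAPPING_ANON) == 0);
    page->mapping = (uintptr_t)av | TOY_PAGE_MAPPING_ANON;
}

int toy_page_is_anon(const struct toy_page *page)
{
    return (page->mapping & TOY_PAGE_MAPPING_ANON) != 0;
}

/* Page -> AV: strip the tag to get the pointer back, or NULL if not anonymous. */
struct toy_anon_vma *toy_page_anon_vma(const struct toy_page *page)
{
    if (!toy_page_is_anon(page))
        return NULL;
    return (struct toy_anon_vma *)(page->mapping & ~TOY_PAGE_MAPPING_ANON);
}
```

With this link in place, the chain page -> AV -> AVC -> VMA is complete and the unmap path has everything it needs.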

2.2 Creating anon_vma for a child process

The parent process creates a child process with fork(), which copies the parent's entire address space and page tables. The child copies the contents of the parent's VMA data structures, and creates its own anon_vma structures through the anon_vma_fork() function.

The effect of anon_vma_fork() is as follows:

Taking two successive fork() calls as an example, the links among the three processes after COW occurs are as follows:
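The link relationships after two forks can be reproduced with a rough user-space model. It assumes the simplest behaviour: the child VMA is attached to every AV its parent VMA is attached to, then receives a new AV of its own. The kernel's AV reuse optimisation (the degree counter) and all locking are left out, the interval tree is replaced by a list, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_av;
struct toy_vma;

struct toy_avc {
    struct toy_vma *vma;
    struct toy_av *av;
    struct toy_avc *next_same_vma;
    struct toy_avc *next_same_av;
};

struct toy_av {
    struct toy_av *root, *parent;
    struct toy_avc *avcs;        /* stands in for the interval tree */
};

struct toy_vma {
    struct toy_av *av;           /* the VMA's own AV, where its new pages go */
    struct toy_avc *avcs;        /* stands in for the anon_vma_chain list */
};

/* Allocate an AVC and hook it into both the VMA and the AV.
 * Nothing is ever freed in this sketch, to keep it short. */
static void toy_link(struct toy_vma *vma, struct toy_av *av)
{
    struct toy_avc *avc = malloc(sizeof(*avc));
    assert(avc != NULL);
    avc->vma = vma;
    avc->av = av;
    avc->next_same_vma = vma->avcs;
    vma->avcs = avc;
    avc->next_same_av = av->avcs;
    av->avcs = avc;
}

/* First anonymous fault in a fresh VMA: give it an AV of its own. */
void toy_prepare(struct toy_vma *vma)
{
    struct toy_av *av = calloc(1, sizeof(*av));
    assert(av != NULL);
    av->root = av;
    vma->av = av;
    vma->avcs = NULL;
    toy_link(vma, av);
}

/* fork(): attach the child VMA to every AV of the parent, then add its own. */
void toy_fork(struct toy_vma *child, const struct toy_vma *parent)
{
    struct toy_av *av;

    child->av = NULL;
    child->avcs = NULL;
    for (const struct toy_avc *avc = parent->avcs; avc; avc = avc->next_same_vma)
        toy_link(child, avc->av);

    av = calloc(1, sizeof(*av));
    assert(av != NULL);
    av->root = parent->av->root;
    av->parent = parent->av;
    child->av = av;
    toy_link(child, av);
}

/* How many VMAs a reverse-mapping walk starting at this AV would visit. */
int toy_av_vma_count(const struct toy_av *av)
{
    int n = 0;
    for (const struct toy_avc *avc = av->avcs; avc; avc = avc->next_same_av)
        n++;
    return n;
}
```

In this model a page that still belongs to the first process's AV can be reached from all three VMAs, while a page created in the grandchild after COW is reachable only from the grandchild's VMA.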

2.3 TTU (try to unmap) and Rmap Walk

If a page is mapped at multiple virtual addresses, the Rmap Walk mechanism can iterate over all the VMAs that map it and call a callback function on each one to unmap the page.

The related structure is struct rmap_walk_control, as follows:

/*
 * rmap_walk_control: To control rmap traversing for specific needs
 *
 * arg: passed to rmap_one() and invalid_vma()
 * rmap_one: executed on each vma where page is mapped
 * done: for checking traversing termination condition
 * anon_lock: for getting anon_lock by optimized way rather than default
 * invalid_vma: for skipping uninterested vma
 */
struct rmap_walk_control {
    void *arg;
    /*
     * Return false if page table scanning in rmap_walk should be stopped.
     * Otherwise, return true.
     */
    bool (*rmap_one)(struct page *page, struct vm_area_struct *vma,
                    unsigned long addr, void *arg);
    int (*done)(struct page *page);
    struct anon_vma *(*anon_lock)(struct page *page);
    bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
};

The entry point for unmapping is try_to_unmap, as shown in the following diagram:

The basic pattern is to fill in the callback functions of a struct rmap_walk_control so that they are called at the appropriate points of the walk.

The details of try_to_unmap_one are not covered here; a good grasp of the general framework is enough.
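The callback pattern can be sketched as a small user-space walk. The control structure below keeps only the arg, rmap_one, done and invalid_vma members, the VMAs are a plain array instead of an interval tree, and all `toy_` names are hypothetical. It shows the order in which the callbacks fire, not what the kernel's unmap routine actually does to a PTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma { int mapped; };     /* 1 if a PTE in this VMA maps the page */
struct toy_page { int mapcount; };  /* number of PTEs still mapping the page */

/* Trimmed-down model of struct rmap_walk_control. */
struct toy_rmap_walk_control {
    void *arg;
    bool (*rmap_one)(struct toy_page *page, struct toy_vma *vma, void *arg);
    int (*done)(struct toy_page *page);
    bool (*invalid_vma)(struct toy_vma *vma, void *arg);
};

/* Visit every candidate VMA, letting the callbacks skip VMAs or stop early. */
void toy_rmap_walk(struct toy_page *page, struct toy_vma *vmas, size_t n,
                   struct toy_rmap_walk_control *rwc)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_vma *vma = &vmas[i];

        if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
            continue;               /* caller is not interested in this VMA */
        if (!rwc->rmap_one(page, vma, rwc->arg))
            break;                  /* callback asked to stop the walk */
        if (rwc->done && rwc->done(page))
            break;                  /* termination condition reached */
    }
}

/* rmap_one callback: drop the mapping in this VMA, if there is one. */
bool toy_unmap_one(struct toy_page *page, struct toy_vma *vma, void *arg)
{
    int *unmapped = arg;

    if (vma->mapped) {
        vma->mapped = 0;
        page->mapcount--;
        (*unmapped)++;
    }
    return true;                    /* keep walking */
}

/* done callback: stop as soon as nothing maps the page any more. */
int toy_page_not_mapped(struct toy_page *page)
{
    return page->mapcount == 0;
}

/* Entry point: fill in the control structure, then run the walk. */
bool toy_try_to_unmap(struct toy_page *page, struct toy_vma *vmas, size_t n)
{
    int unmapped = 0;
    struct toy_rmap_walk_control rwc = {
        .arg = &unmapped,
        .rmap_one = toy_unmap_one,
        .done = toy_page_not_mapped,
        .invalid_vma = NULL,        /* optional: visit every VMA */
    };

    toy_rmap_walk(page, vmas, n, &rwc);
    return page->mapcount == 0;
}
```

Other users of the walk, such as page migration, can reuse the same loop and simply supply different callbacks.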


Posted on Tue, 07 Jan 2020 16:07:33 -0800 by ericburnard