[xv6 source code snooping] mmap

preface

This article is about MIT 6 Implementation of s081-2020-lab10 (MMAP);
If you find any problems in the content, please don't save your keyboard.

preparation

MMAP () and munmap() are two system calls that are only available in UNIX like systems. This experiment is only modeled on the function (toy) of realizing some of its file memory mapping. See [MMAP mapping in Linux] [1] for specific functions of mmap()( https://zhuanlan.zhihu.com/p/... ), understanding the role of mmap() in real Linux can be a good reference for completing this experiment.

Before starting to do the experiment, I'll briefly sort out my understanding of mmap() from the perspective of this experiment. It's the first thing to think clearly before doing it.

mmap() can copy the contents of files in the disk to a physical memory area, and establish a mapping between the specified virtual address area and the physical memory area in the user process space. However, in the implementation, mmap() does not care about the contents of the file. The copying of actual data is realized through Lazy Allocation, which is also what we did in Experiment 5. So most of the work of mmap() will be left to kernel / trap The user trap handler(void usertrap(void)) under C handles page fault, allocates new memory pages, and copies file contents, while mmap() itself only does little related initialization work.

What are the initialization tasks? The function signature of mmap() is void* mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset). Refer to sys in Lazy Allocation_ For the implementation of sbrk (), MMAP () will find a large enough area in the user process space and record the parameter information as the input for page fault processing. As mentioned in the experimental guide, a structure needs to be associated with process. Considering that the user process may have multiple MMAP () files, we use a structure array with a size of 16 to store the relevant information of each MMAP (). The implementation here is very similar to the open file table of each process (myproc() - > ofile).

When (Flags & map_shared)= 0, any modification of the file data in memory by the user finally needs to call munmap() to drop the disk when the process exit s; And when (Flags & map_private)= 0, the file data in the memory can be completely free from the corresponding file data in the disk, which is transparent to each other. The question is, do you need to consider synchronization when different processes drop the same file at the same time? It must be necessary. Just take the inode mutex in advance and release it after it is used up. The inode lock is obtained to protect the inode reference number, and the inode layer will effectively shield the synchronous operation of buffer cache layer block falling disk. We don't need to care about the following problems of inode layer.

You may need to wake up when (Flags & map_shared)= 0, the modification of memory file data in mmap() of real Linux needs to be visible to all processes mapping this memory. The experiment is not needed here. Even if necessary, considering the problem of address access synchronization, we can't think of solving it from the software level. Therefore, the experimental guidance says:

It's OK if processes that map the same MAP_SHARED file do not share physical pages.

Finally, of course, when fork(), the contents of parent-child processes should be consistent, but don't forget to increase the number of references to VMA's struct file.

The above basic steps can pass the part of mmaptest, and the optimized space mentioned in the following instructions:

MAP_ It is better to judge whether the dirty of PTE is set to selectively operate when the shared page falls. [√]
Use cow_fork() optimizes the original fork(). [ ×]

Experimental part

mmap

Struct vma is the core structure. Each process will have a fixed number of vma. We combine a 16 long struct vma array into struct proc. When the value of the valid field is 1, it means that the vma is idle; When 0 is taken, it means that the vma is being occupied. Each process will only belong to its own vma array, so it does not need locks for access control.

/* kernel/proc.h */

struct vma {
  int valid;
  struct file *f;
  uint64 addr;
  uint64 length;
  int prot;
  int flags;
};

#define MAXVMASSZIE  16

// Per-process state
struct proc {
  ...
  struct vma vmas[MAXVMASSZIE];
};

When the mmap allocates vma, you need to find a free block from myproc () - > Vmas to be occupied by the mmap. Because the user process does not give the starting logical address of the mapping when calling, we need to find a suitable location in the user process address space and put it in.

Recalling the previous virtual address space layout of user processes, we can only map the file contents to the heap area. However, if the allocation starts from myproc() - > SZ like sbrk(int n), it will cause a lot of internal memory fragments. It will be very troublesome to release the process at that time. When executing void uvmfree(pagetable_t pagetable, uint64 sz) function, panic will occur accidentally. Therefore, I choose to allocate our vma at the bottom of the trapframe. It will be easier to allocate and release, regardless of the problem of myproc () - > Sz. Therefore, when calling the allocation mmap() to allocate vma, the first fit algorithm (first adaptation algorithm) will be used. It is intended to scan the low address from the trafframe (maxva-2 * PageSize). When the first available address is found, the vma will be allocated here.

Every time I mmap(), I will put the vma - > valid = 1 vma in the vma array at the right end of the array, so that directly taking the vma at the right end of the array is idle, and then sort the remaining vma according to the address from high to bottom. The sorting algorithm uses simple bubble sorting.

/* kernel/sysfile.c */


void
swap_vma(struct vma* vma1, struct vma *vma2)
{
  if (vma1 == vma2)
    return;
  struct vma temp = *vma1;
  *vma1 = *vma2;
  *vma2 = temp;
}

void
sort_vmas(struct vma* vmas)
{  
  struct vma *l = vmas, *r = vmas+MAXVMASSZIE-1, *bound;

  if (l >= r)
    return;

  while (1) {
    while (l < vmas+MAXVMASSZIE && !l->valid)
      ++l;
    while (r >= vmas && r->valid)
      --r;
    if (l < r)
      swap_vma(l, r);
    else 
      break;
  }

  if (l == vmas)
    return;

  bound = r;
  struct vma *i, *max;
  
  while (vmas < bound) {
    max = vmas;
    for (i = vmas+1; i <= bound; ++i)
      max = max->addr < i->addr ? i : max;
    swap_vma(max, bound);
    --bound;
  }
}

As mentioned in the previous section, mmap() will not do much work. It will only allocate and initialize vma and increase the number of references to the file by one. In addition, it should be noted that if the mmap file is SHARED, we also need to check the consistency of flags and file access permissions; PRIVATE is not needed, because the memory content will not be written back to the source file at this time. It doesn't matter how you modify the memory content.

/* kernel/sysfile.c */

uint64
sys_mmap(void)
{

  int prot, flags, fd, length;
  uint64 top = TRAPFRAME;

  if (argint(1, &length) < 0 || argint(2, &prot) < 0 || argint(3, &flags) < 0 || argint(4, &fd) < 0)
    return -1;

  struct proc *p = myproc();
  struct file *f = p->ofile[fd];


  if (flags & MAP_SHARED) {
    if (f->readable && !f->writable) 
      if (prot ^ PROT_READ) 
        return -1;
    if (!f->readable && f->writable)
      if (prot ^ PROT_WRITE)
        return -1;
  }

  sort_vmas(p->vmas);

  struct vma *vma, *res = p->vmas+MAXVMASSZIE-1;

  if (!res->valid)
    return -1;

  for (vma = p->vmas; vma < p->vmas + MAXVMASSZIE; ++vma) {
    if (!vma->valid) {
      if (top <= vma->addr && vma->addr + vma->length <= top - length) {
        break;
      } else {
        top = vma->addr;
      }
    } else {
      break;
    }
  }

  res->addr = top-length;
  res->length = length;
  res->flags = flags;
  res->prot = prot;
  res->valid = 0;
  res->f = filedup(p->ofile[fd]);

  return res->addr;
}

In usertrap(), the processing of page missing interrupts is very similar to Lazy Allocation. They are all kinds of exceptions and boundary checks. The new part is to read the content from the file. We use an auxiliary function to encapsulate the read-write logic of this part.

/* kernel/trap.c */

void
usertrap(void)
{
  ...
  
  if(r_scause() == 8){
      
  ...
      
  } else if (r_scause() == 13 || r_scause() == 15) {
    uint64 va = r_stval();
    uint64 sp = PGROUNDUP(p->trapframe->sp);

    if (va >= MAXVA || va < sp) {

      goto kl;
    } else {
      va = PGROUNDDOWN(va);
      pte_t *pte;

      if ((pte = walk(p->pagetable, va, 0)) != 0 && (*pte & PTE_V) != 0)
        goto kl;

      struct vma *vma;
      for (vma = p->vmas; vma < p->vmas + MAXVMASSZIE; ++vma)
        if (!vma->valid)
          if(va >= vma->addr && va+PGSIZE <= vma->addr+vma->length)
            break;
    
      if (vma == &p->vmas[MAXVMASSZIE])
        goto kl;

      char *pa = kalloc();
      if(pa != 0){
        if(mappages(p->pagetable, va, PGSIZE, (uint64)pa, (vma->prot << 1) | PTE_U) != 0){
          kfree(pa);
          goto kl;
        }
        rw_vma(vma, 1, (uint64)pa, va-vma->addr, PGSIZE);
      } else {
        goto kl;
      }
    }
  } else if((which_dev = devintr()) != 0){
    // ok
  } else {
    kl:
    printf("usertrap(): unexpected scause %p pid=%d\n", r_scause(), p->pid);
    printf("            sepc=%p stval=%p\n", r_sepc(), r_stval());
    p->killed = 1;
  }
    
  ...
      
}

/* kernel/sysfile.c */

int
rw_vma(struct vma *vma, int r, uint64 addr, uint off, uint n)
{
  struct file *f = vma->f;
  int res = 0;

  if (r) {
    ilock(f->ip);
    readi(f->ip, 0, addr, off, n);
    iunlock(f->ip);

    int overflow = f->ip->size - off;
    if (overflow <= PGSIZE && overflow > 0) 
      memset((char*)(addr+n-overflow), 0, overflow);
  } else {
    ilock(f->ip);
    begin_op();
    res = writei(f->ip, 0, addr, off, n);
    end_op();
    iunlock(f->ip);
  }

  return res;
}

This is the first big pit I stepped on. Note MMAP_ In test (), first open the file named "mmap.dur" and call mmap() to map it to memory. After_ v1() will read the contents of 2 PGSIZE bytes from the address allocated by mmap(), so page fault interrupt will be triggered twice in total. Although usertrap() will allocate two pages here, in fact, the file size of "mmap.dur" is only 6144 bytes, that is, PGSIZE + (PGSZIE/2) bytes. When we read the second page, readi only reads PGSIZE/2 bytes and returns. Therefore, we need to fill the remaining PGSIZE/2 space in memory with 0 bytes, which is also RW_ What is done inside the first branch of VMA (). Otherwise_ v1() will read some strange data after PGSZIE + (PGSIZE/2), resulting in failure to pass the first test.

At munmap(), vma will also be released when the process exits, so this part of logic should be put into another auxiliary function:

/* kernel/sysfile.c */

int
_munmap(uint64 addr, int length, struct vma *vma)
{
  struct proc *p = myproc();

  if (!vma) {
    for (vma = p->vmas; vma < p->vmas + MAXVMASSZIE; ++vma)
      if (!vma->valid)
        if (addr >= vma->addr && addr+length <= vma->addr+vma->length)
          break;

    if (vma == p->vmas+MAXVMASSZIE)
      return -1;
  } else {
    addr = vma->addr;
    length = vma->length;
  }

  if (vma->flags & MAP_SHARED)
    _uvmunmap(p->pagetable, addr, PGROUNDUP(length)/PGSIZE, vma);
  else
    _uvmunmap(p->pagetable, addr, PGROUNDUP(length)/PGSIZE, 0);

  vma->addr = vma->addr == addr ? addr+length : vma->addr;
  vma->length -= length;

  if (vma->length == 0) {
    fileclose(vma->f);
    vma->valid = 1;
  }

  return 0;
}

The release logic is very simple. First, search for a vma so that its range can cover the range from addr to addr + length. However, when the process is released, we don't need to search like this. We can let the caller directly pass in the vma to be released. Then, when freeing the memory, we need to discuss the situation: if the SHARED content needs to be written back to the disk; And PRIVATE just release page. Since the two processing logic are very similar, they are both controlled by_ uvmunmap() function.

/* kernel/sysfile.c */


void
_uvmunmap(pagetable_t pagetable, uint64 va, uint64 npages, struct vma* vma)
{
  uint64 a;
  pte_t *pte;

  if((va % PGSIZE) != 0)
    panic("uvmunmap: not aligned");

  for(a = va; a < va + npages*PGSIZE; a += PGSIZE){
    if((pte = walk(pagetable, a, 0)) == 0)
      continue;
    if((*pte & PTE_V) == 0)
      continue;
    if(PTE_FLAGS(*pte) == PTE_V)
      panic("uvmunmap: not a leaf");

    uint64 pa = PTE2PA(*pte);
    if (vma && *pte & PTE_D)
      rw_vma(vma, 0, PTE2PA(*pte), a-vma->addr, PGSIZE);

    kfree((void*)pa);
    *pte = 0;
  }
}

_ The logic of munmap() is so simple because the test does not punch a hole between the memory areas allocated by vma, that is, the test is only released from one end of vma area every time, which greatly reduces the complexity of the algorithm.

Finally, there are some initialization and recycling logic when allocating and releasing processes:

/* kernel/pro.c */

static struct proc*
allocproc(void)
{
  struct proc *p;
  ...

found:
  ...

  for (struct vma *vma = p->vmas; vma < p->vmas + MAXVMASSZIE; ++vma) {
    memset(vma, 0, sizeof(struct vma));
    vma->valid = 1;
  }

  return p;
}

void
exit(int status)
{
  ...
      
  // Close all open files.
  for(int fd = 0; fd < NOFILE; fd++){
    if(p->ofile[fd]){
      struct file *f = p->ofile[fd];
      fileclose(f);
      p->ofile[fd] = 0;
    }
  }

  for (struct vma* vma = p->vmas; vma < p->vmas + MAXVMASSZIE; ++vma)
    if (!vma->valid)
      _munmap(0, 0, vma);

  ...
}

It would be better to follow the prompts bit by bit, but when I saw the vma release processing during the release process, I thought it would be better to put this part of logic in freeproc(struct proc *p), because after all, it is dealing with page table related logic. After writing, I found that forktest has been panic: free leaf. I can't find the problem. Later, when printing the pid of each process, it is found that the process executing exit() is different from the process executing freeproc(). At this time, I realized that the resource recovery after the process exits is the work of the parent process, and I obtained the struct proc pointer through myproc(), resulting in the release of the page table of the parent process. This is the biggest pit I feel after the whole experiment.

Finally, post a screenshot of make grade:

Postscript

This experiment is a comprehensive application of the contents of the previous experiments. You don't need to look at many things in advance, but it's hard after all. It's still difficult to pass all the tests after compiling several times. I should have compiled it more than 100 times... (shame)

The next article should be 6 S081 2020 is the last experimental implementation of network card driver. At that time, the learning of operating system can be stopped for a while.

Keywords: Operating System

Added by kungfu71186 on Sun, 23 Jan 2022 15:05:45 +0200

Programming VIP