Lab cow
The problem
The fork system call in xv6 copies the user space memory of all the parent processes to the child processes. If the user space memory of the parent process is large, the replication will be time-consuming. Sometimes this copy is unnecessary. If the child process calls exec after fork, the copied memory will be released, and most of the copied memory may not be used. If the parent-child processes share a page and one of them wants to write this page, it really needs to be copied.
The Solution
The goal of COW fork is to delay the allocation of physical pages for child processes until they need to be copied.
COW fork only creates a page table for the child process, and its corresponding PTEs in user memory points to the physical page of the parent process. COW fork() marks that all user PTEs of parent-child processes are not writable. When one of the processes wants to write COW pages, the CPU generates a page fault. The page fault handler of the kernel handles this situation:
- Assign a physical page to the wrong process.
- Copy the contents of the original page to the new page.
- Modify the PTE to point to the physical address of the new page and mark it writable.
- When the page fault handler returns, the user space can write the copied page.
Note when releasing a physical page: a physical page may have multiple process page table references, which can be released only when the number of references is \ (1 \).
Implement copy-on write
vm.c:
- Mark the PTE of the parent-child process as COW page and non writable.
You need to judge whether this page is a PTE_U. Because the trafframe page will be processed separately after fork ing the mapping of user pages. And the parameter \ (sz \) of uvmcopy is the size of the process user space memory, excluding trapframe. - The number of physical page references of the parent process plus \ (1 \).
- The PTE of the child process points to the physical page of the parent process (no physical page is allocated to the child process).
// Given a parent process's page table, copy // its memory into a child's page table. // Copies only the page table and // clear their PTE_W. // returns 0 on success, -1 on failure. // frees any allocated pages on failure. int uvmcopy(pagetable_t old, pagetable_t new, uint64 sz) { pte_t *pte; uint64 pa, i; uint flags; for(i = 0; i < sz; i += PGSIZE){ if((pte = walk(old, i, 0)) == 0) panic("uvmcopy: pte should exist"); if((*pte & PTE_V) == 0) panic("uvmcopy: page not present"); pa = PTE2PA(*pte); flags = PTE_FLAGS(*pte); // trapframe(PTE_U=0) copied separately after call uvmcopy in fork() // uvmcopy only pay attention to user memory(PTE_U set) if ((flags & PTE_W) && (PTE_U | flags)) { flags = (flags & (~PTE_W)) | PTE_RSW_COW; *pte = ((*pte) & (~PTE_W)) | PTE_RSW_COW; } krefinc(pa); if(mappages(new, i, PGSIZE, pa, flags) != 0){ goto err; } } return 0; err: uvmunmap(new, 0, i / PGSIZE, 1); return -1; }
trap.c:
- cowpagefault() handles writing non writable cow pages.
- Allocate a new page through kalloc().
- Important optimization: you can judge the number of references of this page before copying. If only this process is in use, you can directly set the page writable.
- Copy the contents of the old page to the new page.
- Set PTE: writable, cancel the COW page mark, and use the physical address of the new page.
- Release the old page (the reference is \ (1 \) and recycle the physical page, otherwise the number of references is reduced by \ (1 \)).
// only handler cow fault. // write the COW page(PTE_RSW_COW) of PTE_W on 0. int cowpagefault(pagetable_t pagetable, uint64 va) { pte_t *pte; uint64 oldpa, newpa; // va >= MAXVA cause walk() panic if (va >= MAXVA) return -1; if ((pte = walk(pagetable, va, 0)) == 0) return -1; if ((*pte & PTE_RSW_COW) == 0) return -1; if (*pte & PTE_W) return -1; // guard page, trampoline, trapframe if ((*pte & PTE_U) == 0) { printf("PTE_U=0\n"); return -1; } oldpa = PTE2PA(*pte); // Important optimization: if a store page fault and // the ref which pointed this phy-page is one, no copy. if (krefnum(oldpa) == 1) { *pte = (*pte | PTE_W) & (~PTE_RSW_COW); return 0; } if ((newpa = (uint64)kalloc()) == 0) { printf("cowpagefault: kalloc fault.\n"); return -1; } // copy the old page to the new page memmove((void*)newpa, (void*)oldpa, PGSIZE); *pte = (PA2PTE(newpa) | PTE_FLAGS(*pte) | PTE_W) & (~PTE_RSW_COW); kfree((void*)oldpa); return 0; }
trap.c : usertrap():
Improve the trap path.
else if (r_scause() == 15) { if (cowpagefault(p->pagetable, r_stval()) < 0) { p->killed = 1; } }
kalloc.c:
- Judge the reference count when releasing the physical page. Only when the number of references is \ (1 \) can the physical page be released.
- Based on the design of kfree(), the reference count must be set to \ (1 \) in the initialization freerange of the free physical page linked list, otherwise kfree() will panic.
// Free the page of physical memory pointed at by v, // which normally should have been returned by a // call to kalloc(). (The exception is when // initializing the allocator; see kinit above.) void kfree(void *pa) { struct run *r; if(((uint64)pa % PGSIZE) != 0 || (char*)pa < end || (uint64)pa >= PHYSTOP) panic("kfree"); int phypageindex = PHYPAGEINDEX((uint64)pa); uint16 refcnt; acquire(&kmem.lock); refcnt = kmem.ppagerefcnt[phypageindex]; if (refcnt < 1) panic("kfree ref = 0."); kmem.ppagerefcnt[phypageindex]--; release(&kmem.lock); if (refcnt > 1) return; // Fill with junk to catch dangling refs. memset(pa, 1, PGSIZE); r = (struct run*)pa; acquire(&kmem.lock); r->next = kmem.freelist; kmem.freelist = r; release(&kmem.lock); } void krefinc(uint64 pa) { int phypageindex = PHYPAGEINDEX(pa); acquire(&kmem.lock); if (pa >= PHYSTOP || kmem.ppagerefcnt[phypageindex] < 1) panic("krefinc pa invalid."); kmem.ppagerefcnt[phypageindex]++; release(&kmem.lock); } uint16 krefnum(uint64 pa) { int phypageindex = PHYPAGEINDEX(pa); uint16 sz; acquire(&kmem.lock); sz = kmem.ppagerefcnt[phypageindex]; release(&kmem.lock); return sz; }
vm.c:
When copyyout () copies the data in the kernel to the user space and obtains the physical address corresponding to the virtual address dstva, it is not the MMU converting the address, but the kernel reads the page table content through walkaddr simulation hardware and converts the virtual address into the physical address. The kernel operates directly on the physical address, so it needs to copy the COW page when copyyout. Otherwise, the non writable COW page will be directly modified without page fault and difficult to troubleshoot.
// Copy from kernel to user. // Copy len bytes from src to virtual address dstva in a given page table. // Return 0 on success, -1 on error. int copyout(pagetable_t pagetable, uint64 dstva, char *src, uint64 len) { uint64 n, va0, pa0; while(len > 0){ va0 = PGROUNDDOWN(dstva); pa0 = walkaddr(pagetable, va0); if(pa0 == 0) return -1; pte_t *pte = walk(pagetable,va0, 0); if (!(*pte & PTE_W) && (*pte & PTE_RSW_COW)) { if (cowpagefault(pagetable, va0) < 0) return -1; pa0 = PTE2PA(*pte); } n = PGSIZE - (dstva - va0); if(n > len) n = len; memmove((void *)(pa0 + (dstva - va0)), src, n); len -= n; src += n; dstva = va0 + PGSIZE; } return 0; }