The Execution Procedure of the do_fork Function in Linux

Introduction

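Linux provides several system calls to create and terminate processes and to execute new programs: fork, vfork, clone, exec and exit. fork creates a child that is a copy of the parent. vfork creates a child that shares the parent's address space and suspends the parent until the child calls exec or exits. clone creates lightweight processes, and its flags select which resources the child shares with the parent. The exec system call executes a new program, and the exit system call terminates the calling process. Whether the request comes in through fork, vfork or clone, the kernel ultimately calls do_fork to create the process. The system call entry points below (an architecture-specific excerpt; the register names r4 to r7 and regs[15] suggest the SuperH port) differ only in the flags they pass to do_fork: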

asmlinkage int sys_fork(unsigned long r4, unsigned long r5,
            unsigned long r6, unsigned long r7,
            struct pt_regs __regs)
{
#ifdef CONFIG_MMU
    struct pt_regs *regs = RELOC_HIDE(&__regs, 0);
    return do_fork(SIGCHLD, regs->regs[15], regs, 0, NULL, NULL);
#else
    /* fork almost works, enough to trick you into looking elsewhere :-( */
    return -EINVAL;
#endif
}
asmlinkage int sys_clone(unsigned long clone_flags, unsigned long newsp,
             unsigned long parent_tidptr,
             unsigned long child_tidptr,
             struct pt_regs __regs)
{
    struct pt_regs *regs = RELOC_HIDE(&__regs, 0);
    if (!newsp)
        newsp = regs->regs[15];
    return do_fork(clone_flags, newsp, regs, 0,
            (int __user *)parent_tidptr,
            (int __user *)child_tidptr);
}
asmlinkage int sys_vfork(unsigned long r4, unsigned long r5,
             unsigned long r6, unsigned long r7,
             struct pt_regs __regs)
{
    struct pt_regs *regs = RELOC_HIDE(&__regs, 0);
    return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->regs[15], regs,
               0, NULL, NULL);
}
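
Seen from user space, the effect of these entry points is the familiar fork() contract: the parent gets the child's PID back and the child gets 0. The following is a minimal sketch of that contract, not kernel code; the helper name fork_and_reap is ours and is only used for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately with `code`, then reap it.
 * Returns the child's exit status as seen by the parent, or -1 on error.
 * (fork_and_reap is a hypothetical helper name, not a library function.)
 */
int fork_and_reap(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;          /* fork failed: no child was created */
    if (pid == 0)
        _exit(code);        /* child: fork() returned 0 */

    /* parent: fork() returned the child's PID */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_and_reap(7) in a test program should hand back 7 in the parent, because the child passes that value to _exit() and the parent reads it with waitpid().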

Execution of the do_fork function

The do_fork() function creates a new process in roughly three steps.

  1. Allocate a process control structure (task_struct) and fill in its initial values so that it describes the new process image.
  2. Set up the kernel data structures that track the new process. In older kernels (the 2.2 series, for example) these were the task[] array, the free-slot list tarray_freelist and the pidhash[] array; in 2.6.32 the corresponding structures are the task list and the PID hash table.
  3. Wake the new process and hand it to the scheduler so that the child gets a chance to run.
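
The three steps above can be pictured with a toy model in user space. This is only a sketch under heavy simplification: every toy_ name is invented for illustration, the "task list" is a plain singly linked list, and in the real kernel the first step and most of the second happen inside copy_process().

```c
#include <stddef.h>

enum toy_state { TOY_NEW, TOY_RUNNABLE };

/* A drastically reduced stand-in for task_struct (hypothetical). */
struct toy_task {
    int pid;
    enum toy_state state;
    char comm[16];
    struct toy_task *next;      /* link in the toy task list */
};

static struct toy_task *toy_task_list;  /* head of the toy task list */
static int toy_last_pid;                /* last PID handed out */

/* Step 1: build the child's control block as a copy of the parent's. */
static void toy_copy_process(struct toy_task *child,
                             const struct toy_task *parent)
{
    *child = *parent;               /* inherit everything ... */
    child->pid = ++toy_last_pid;    /* ... except the identity */
    child->state = TOY_NEW;
    child->next = NULL;
}

/* Step 2: make the child visible in the bookkeeping structures. */
static void toy_register(struct toy_task *child)
{
    child->next = toy_task_list;
    toy_task_list = child;
}

/* Step 3: mark the child runnable so a scheduler could pick it. */
static void toy_wake_up_new_task(struct toy_task *child)
{
    child->state = TOY_RUNNABLE;
}

/* Returns the new PID to the caller, as do_fork() does for the parent. */
int toy_do_fork(struct toy_task *child, const struct toy_task *parent)
{
    toy_copy_process(child, parent);
    toy_register(child);
    toy_wake_up_new_task(child);
    return child->pid;
}
```

The model leaves out everything that makes the real function hard (locking, resource limits, namespaces, error unwinding); it is only meant to fix the order of the three steps.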

Next, we walk through the kernel source code to see what each step does.

  1. The first step
    do_fork() first checks clone_flags to decide what has to be done, then calls copy_process(). copy_process() allocates a task_struct for the new process, copies the parent's process control block (PCB) into it and adjusts the fields that must differ in the child. Along the way the new process is assigned a unique process ID (PID) and is charged to its owner's user_struct.
    struct task_struct *p;
    int trace = 0;
    long nr;

    /*
     * Do some preliminary argument and permissions checking before we
     * actually start allocating stuff
     */
    if (clone_flags & CLONE_NEWUSER) {          
        if (clone_flags & CLONE_THREAD)
            return -EINVAL;
        /* hopefully this check will go away when userns support is
         * complete
         */
        if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
                !capable(CAP_SETGID))
            return -EPERM;
    }

    /*
     * We hope to recycle these flags after 2.6.26
     */
    if (unlikely(clone_flags & CLONE_STOPPED)) {
        static int __read_mostly count = 100;

        if (count > 0 && printk_ratelimit()) {
            char comm[TASK_COMM_LEN];

            count--;
            printk(KERN_INFO "fork(): process `%s' used deprecated "
                    "clone flags 0x%lx\n",
                get_task_comm(comm, current),
                clone_flags & CLONE_STOPPED);
        }
    }

    /*
     * When called from kernel_thread, don't do user tracing stuff.
     */
    if (likely(user_mode(regs)))
        trace = tracehook_prepare_clone(clone_flags);

    p = copy_process(clone_flags, stack_start, regs, stack_size,
             child_tidptr, NULL, trace);
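
The permission check at the top of this excerpt can be restated as a small stand-alone function. This is a sketch for illustration only: the TOY_ constants carry the flag values from the 2.6.32 headers under names of our own, and the has_caps parameter is a hypothetical stand-in for the three capable() calls.

```c
#include <errno.h>

/* Flag values as in the 2.6.32 <linux/sched.h>; the TOY_ names are ours. */
#define TOY_CLONE_THREAD   0x00010000UL
#define TOY_CLONE_NEWUSER  0x10000000UL

/*
 * Mirror of the CLONE_NEWUSER check in do_fork().
 * has_caps stands in for capable(CAP_SYS_ADMIN) && capable(CAP_SETUID)
 * && capable(CAP_SETGID); it is a hypothetical simplification.
 * Returns 0 if the flags pass, or a negative errno value.
 */
int toy_check_clone_flags(unsigned long clone_flags, int has_caps)
{
    if (clone_flags & TOY_CLONE_NEWUSER) {
        /* A new user namespace cannot be combined with a new thread. */
        if (clone_flags & TOY_CLONE_THREAD)
            return -EINVAL;
        /* Creating a user namespace needs all three capabilities. */
        if (!has_caps)
            return -EPERM;
    }
    return 0;
}
```

Note the order of the tests: an invalid flag combination is reported as -EINVAL before privileges are looked at, so even a privileged caller cannot combine the two flags.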
  2. The second step

The new process is linked into the process list and the PID hash table, and the task count is incremented. The hardware context (on x86, the TSS, LDT and GDT state) is initialized from the parent's context. In 2.6.32 most of this work is done inside copy_process(); the part of do_fork() shown below checks the result, records the child's PID and prepares the vfork completion.

    if (!IS_ERR(p)) {//On failure copy_process() returns an errno encoded as a pointer; IS_ERR() tests for that case.
        struct completion vfork;//Completion that the parent waits on when CLONE_VFORK is set.

        trace_sched_process_fork(current, p);

        nr = task_pid_vnr(p);

        if (clone_flags & CLONE_PARENT_SETTID)
            put_user(nr, parent_tidptr);

        if (clone_flags & CLONE_VFORK) {//Is CLONE_VFORK set in clone_flags?
            p->vfork_done = &vfork;
            init_completion(&vfork);/*This only initializes the completion (its done counter and its wait queue). The parent does not sleep here;
 it blocks later, in wait_for_completion(), in an uninterruptible sleep until the child signals the completion by calling exec or exiting.*/
        }

        audit_finish_fork(p);
        tracehook_report_clone(regs, clone_flags, nr, p);

        /*
         * We set PF_STARTING at creation in case tracing wants to
         * use this to distinguish a fully live task from one that
         * hasn't gotten to tracehook_report_clone() yet.  Now we
         * clear it and set the child going.
         */
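
The IS_ERR() test that opens this excerpt relies on the kernel convention of encoding a negative errno in the pointer value itself, with the top 4095 addresses reserved for error codes. Below is a user-space sketch of that convention. All toy_ names are ours, and toy_alloc_task is a hypothetical stand-in for copy_process().

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

struct toy_task { long pid; };

/* Encode a negative errno as a pointer (the idea behind ERR_PTR). */
static inline void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno from such a pointer (the idea behind PTR_ERR). */
static inline long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top TOY_MAX_ERRNO addresses (IS_ERR). */
static inline int toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-TOY_MAX_ERRNO);
}

/* Hypothetical stand-in for copy_process(): a task or an error pointer. */
static struct toy_task *toy_alloc_task(int fail)
{
    static struct toy_task task = { .pid = 1234 };

    if (fail)
        return toy_err_ptr(-ENOMEM);
    return &task;
}

/* Same shape as the tail of do_fork(): a PID on success, else an errno. */
long toy_fork_result(int fail)
{
    struct toy_task *p = toy_alloc_task(fail);
    long nr;

    if (!toy_is_err(p))
        nr = p->pid;
    else
        nr = toy_ptr_err(p);
    return nr;
}
```

This is why do_fork() can return either a PID or a negative error code through the single variable nr without a separate status flag.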
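  3. The third step
    Clear PF_STARTING, then let wake_up_new_task() set the new process to TASK_RUNNING and place it on a run queue so that the scheduler can pick it. If CLONE_VFORK is set, the parent then waits until the child signals the vfork completion. Finally do_fork() returns the child's PID to the parent. The child itself sees a return value of 0 from the system call, because copy_process() set up its saved registers that way.
        p->flags &= ~PF_STARTING;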

        if (unlikely(clone_flags & CLONE_STOPPED)) {
            /*
             * We'll start up with an immediate SIGSTOP.
             */
            sigaddset(&p->pending.signal, SIGSTOP);
            set_tsk_thread_flag(p, TIF_SIGPENDING);
            __set_task_state(p, TASK_STOPPED);
        } else {
            wake_up_new_task(p, clone_flags);
        }

        tracehook_report_clone_complete(trace, regs,
                        clone_flags, nr, p);

        if (clone_flags & CLONE_VFORK) {
            freezer_do_not_count();
            wait_for_completion(&vfork);
            freezer_count();
            tracehook_report_vfork_done(p, nr);
        }
    } else {
        nr = PTR_ERR(p);
    }
    return nr;
}
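
The CLONE_VFORK branch above is what a user-space program observes as vfork() not returning in the parent until the child has called exec or _exit. A minimal sketch of that behaviour follows; the helper name vfork_and_reap is ours, and the child deliberately does nothing but _exit(), since it runs in the parent's address space.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a child with vfork(), let it exit with `code`, then reap it.
 * Returns the child's exit status, or -1 on error.
 * (vfork_and_reap is a hypothetical helper name.)
 */
int vfork_and_reap(int code)
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: only _exit() or an exec call is safe here */

    /*
     * The parent only gets here after the child has released the shared
     * address space; in the kernel it was sleeping in
     * wait_for_completion(&vfork) until then.
     */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The blocking waitpid() is still needed: the completion tells the parent that the address space is free again, not that the child has already become reapable.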

Complete do_fork function source code

(In the kernel/fork.c file, version number: linux-2.6.32.65)

long do_fork(unsigned long clone_flags,
          unsigned long stack_start,
          struct pt_regs *regs,
          unsigned long stack_size,
          int __user *parent_tidptr,
          int __user *child_tidptr)
{
    struct task_struct *p;//Will point to the task_struct of the new process; the allocation itself happens inside copy_process().
    int trace = 0;
    long nr;

    /*
     * Do some preliminary argument and permissions checking before we
     * actually start allocating stuff
     */
    if (clone_flags & CLONE_NEWUSER) {          //A new user namespace was requested
        if (clone_flags & CLONE_THREAD)
            return -EINVAL;
        /* hopefully this check will go away when userns support is
         * complete
         */
        if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
                !capable(CAP_SETGID))
            return -EPERM;
    }

    /*
     * We hope to recycle these flags after 2.6.26
     */
    if (unlikely(clone_flags & CLONE_STOPPED)) {
        static int __read_mostly count = 100;

        if (count > 0 && printk_ratelimit()) {
            char comm[TASK_COMM_LEN];

            count--;
            printk(KERN_INFO "fork(): process `%s' used deprecated "
                    "clone flags 0x%lx\n",
                get_task_comm(comm, current),
                clone_flags & CLONE_STOPPED);
        }
    }

    /*
     * When called from kernel_thread, don't do user tracing stuff.
     */
    if (likely(user_mode(regs)))
        trace = tracehook_prepare_clone(clone_flags);

    p = copy_process(clone_flags, stack_start, regs, stack_size,
             child_tidptr, NULL, trace);//copy_process() does the actual creation: it copies the parent's PCB into the PCB of the new process and returns a struct task_struct pointer, or an error pointer on failure.
    /*
     * Do this prior waking up the new thread - the thread pointer
     * might get invalid after that point, if the thread exits quickly.
     */
    if (!IS_ERR(p)) {//IS_ERR() tests whether copy_process() returned an error pointer. If it did not, the body of this if block runs.
        struct completion vfork;//Completion that the parent waits on when CLONE_VFORK is set.

        trace_sched_process_fork(current, p);

        nr = task_pid_vnr(p);

        if (clone_flags & CLONE_PARENT_SETTID)
            put_user(nr, parent_tidptr);

        if (clone_flags & CLONE_VFORK) {//Is CLONE_VFORK set in clone_flags?
            p->vfork_done = &vfork;
            init_completion(&vfork);/*This only initializes the completion (its done counter and its wait queue). The parent does not sleep here;
 it blocks later, in wait_for_completion(), in an uninterruptible sleep until the child signals the completion by calling exec or exiting.*/
        }

        audit_finish_fork(p);
        tracehook_report_clone(regs, clone_flags, nr, p);

        /*
         * We set PF_STARTING at creation in case tracing wants to
         * use this to distinguish a fully live task from one that
         * hasn't gotten to tracehook_report_clone() yet.  Now we
         * clear it and set the child going.
         */
        p->flags &= ~PF_STARTING;

        if (unlikely(clone_flags & CLONE_STOPPED)) {
            /*
             * We'll start up with an immediate SIGSTOP.
             */
            sigaddset(&p->pending.signal, SIGSTOP);
            set_tsk_thread_flag(p, TIF_SIGPENDING);
            __set_task_state(p, TASK_STOPPED);
        } else {
            wake_up_new_task(p, clone_flags);
        }

        tracehook_report_clone_complete(trace, regs,
                        clone_flags, nr, p);

        if (clone_flags & CLONE_VFORK) {
            freezer_do_not_count();
            wait_for_completion(&vfork);
            freezer_count();
            tracehook_report_vfork_done(p, nr);
        }
    } else {
        nr = PTR_ERR(p);
    }
    return nr;
}


Keywords: Linux

Added by BrianM on Sun, 14 Jul 2019 20:48:54 +0300