Learn computer again (VIII. Processes and process creation)

The article on how a program runs already touched on processes, but did not cover them in detail for reasons of space and focus. This time we introduce the process properly. Many concepts are attached to the process, and it is a core abstraction of the operating system.

8.1 process

8.1.1 what is a process

What is a process? The concept is hard to pin down in one sentence.

Starting from the first article, we wrote our first C program, compiled and linked it, and ended up with an executable file. That file is called a program. What is inside the executable? See Learn computer again (III. elf file layout and symbol table): it contains the various sections that prepare the program for loading.

When we run the executable, the running instance is a process.

How a program runs, and the difference between a process and a program, are covered in Learn computer again (VI. how the program works).

More formally: the process is the basic unit to which the operating system allocates resources.

Each process has its own execution environment: environment variables, working directory, process group, and so on.

Each process has its own system resources: CPU time, memory, I/O devices, data, and program code.

8.1.2 parallelism and concurrency

Early operating systems used uniprogramming.

In uniprogramming, all processes are queued and run one after another. If A blocks, B can only wait.

In contrast, modern computer systems load multiple programs into memory and execute them concurrently. Concurrent execution means that the CPU switches quickly from one process to another, so that each process runs for a short period of time.

As the figure shows, at any single instant the CPU runs only one program, but over a period of time it runs multiple programs.

Switching has to be triggered by something: the timer interrupt provides the hardware guarantee for process switching. This will be discussed in the next section.

So what is parallelism?

Parallelism is true hardware concurrency: two or more CPUs that share the same physical memory execute at the same time.

As the figure shows, the processes run in parallel without interfering with each other.

8.1.3 process creation

fork() was also mentioned in Learn computer again (VI. how the program works). On Linux, creating a process means calling fork(), which is a Linux system call. All resources are managed by the operating system, and that includes processes. (Having said this, do you want to know how a system call is actually invoked? We will come back to that later.)

#include <unistd.h>

pid_t fork(void);
/* Function:
	It is used to create a new process from an existing process. The new process is called a child process and the original process is called a parent process.
Parameters:
	nothing
 Return value:
	Success: 0 is returned in the child process, and the child's process ID is returned in the parent process. pid_t is an integer type.
	Failure: -1 is returned in the parent and no child process is created.
	Two main reasons for failure:
	1) The number of processes has reached the limit imposed by the system. errno is set to EAGAIN.
	2) The system is out of memory. errno is set to ENOMEM.
*/

Next, we use this function to create our first child process:

#include <unistd.h>
#include <stdio.h>
#include <errno.h>

int main(int argc, char **argv)
{
    printf("hello fork\n");

    pid_t pid = fork();
    if(pid < 0)
    {
        printf("fork fail %d\n", errno);
    } 
    else if(pid == 0)       // This is a child process
    {
        printf("I am son\n");
    } 
    else    // pid > 0: this is the parent process, and pid is the child's process ID
    {
        printf("parent %d\n", pid);
    } 

    return 0;
}

Output:

root@ubuntu:~/c_test/08# ./fork
hello fork
parent 1524
I am son

After fork, which gets the CPU first, the parent or the child?

In kernel 2.6.32, the parent process is by default scheduled first after fork. The reason for this strategy: right after forking, the parent is already running on the CPU and its memory management information is already cached in the hardware translation lookaside buffer (TLB), so scheduling the parent first can improve performance.

However, neither the POSIX standard nor Linux guarantees that the parent is scheduled first. An application therefore cannot assume that the parent runs first. If a particular order is required, the processes must be synchronized explicitly.
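One common way to force an order is to let one process block on a pipe until the other one signals it. The following is only a sketch of that idea, not the only way to synchronize; the function name run_parent_first is made up for this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that is guaranteed to print
 * after the parent, regardless of which one is scheduled first.
 * Returns 0 on success, -1 on error. */
int run_parent_first(void)
{
    int fds[2];

    if (pipe(fds) < 0) {
        perror("pipe");
        return -1;
    }

    fflush(stdout);             /* do not duplicate buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        char token;

        close(fds[1]);          /* the child only reads */
        /* read() blocks until the parent writes one byte */
        if (read(fds[0], &token, 1) != 1)
            _exit(1);
        printf("child runs second\n");
        fflush(stdout);
        _exit(0);
    }

    close(fds[0]);              /* the parent only writes */
    printf("parent runs first\n");
    fflush(stdout);
    if (write(fds[1], "x", 1) != 1)
        perror("write");
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because the child cannot get past read() before the parent has written to the pipe, the parent's line is always printed first, whichever process the scheduler picks after fork.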

Note:

The return value of fork must always be checked. If fork fails and the unchecked -1 is later used as a process ID in a call to kill, then kill(-1, 9) sends SIGKILL to every process the caller has permission to signal, except init.
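A minimal sketch of the safe pattern: check the return value of fork first, and only call kill with a process ID that is known to be a real child. The function name spawn_and_kill is made up for this example.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts a child, kills it with SIGKILL, and
 * reaps it. Returns 0 if the child was terminated by SIGKILL,
 * -1 on any error. */
int spawn_and_kill(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;              /* never pass this -1 on to kill() */
    }

    if (pid == 0) {
        for (;;)
            pause();            /* the child just waits to be killed */
    }

    /* Here pid is guaranteed to be the ID of our own child. */
    if (kill(pid, SIGKILL) < 0) {
        perror("kill");
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : -1;
}
```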

8.1.4 memory relationship between parent and child processes

After fork, the child process has a complete copy of the parent's address space, including the stack, heap, code segment, and so on.

Write a program to see the effect:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>


int g_a = 10;       // global variable

int main(int argc, char **argv)
{
    int local_b = 20;   // local variable
    int *malloc_c = malloc(sizeof(int));

    *malloc_c = 30;     // Heap variable

    pid_t pid = fork();
    if(pid < 0) 
    {
        perror("fork");
        return -1;
    }

    if(pid == 0) {
        // Child process
        printf("son  g_a:%d p:%p local_b:%d p:%p malloc_c:%d p:%p\n", g_a, &g_a, local_b, &local_b, *malloc_c, malloc_c);
    } else if(pid > 0) {
        // Parent process
        printf("parent  g_a:%d p:%p local_b:%d p:%p malloc_c:%d p:%p\n", g_a, &g_a, local_b, &local_b, *malloc_c, malloc_c);
    }

    if(pid == 0) {
        // Child process
        g_a = 11;
        local_b = 21;
        *malloc_c = 31;
        printf("son  g_a:%d p:%p local_b:%d p:%p malloc_c:%d p:%p\n", g_a, &g_a, local_b, &local_b, *malloc_c, malloc_c);
    } else if(pid > 0) {
        // Parent process
        sleep(1);
        printf("parent  g_a:%d p:%p local_b:%d p:%p malloc_c:%d p:%p\n", g_a, &g_a, local_b, &local_b, *malloc_c, malloc_c);
    }

    while(1);

    return 0;
}

Three variables are deliberately defined here: a global variable in the data segment, a local variable on the stack, and a dynamically allocated variable on the heap.

These are essentially the three kinds of variables we use when writing code. Let's compile and run:

root@ubuntu:~/c_test/08# ./test_mem
parent  g_a:10 p:0x601060 local_b:20 p:0x7ffe57755668 malloc_c:30 p:0x1aae010
son  g_a:10 p:0x601060 local_b:20 p:0x7ffe57755668 malloc_c:30 p:0x1aae010
son  g_a:11 p:0x601060 local_b:21 p:0x7ffe57755668 malloc_c:31 p:0x1aae010
parent  g_a:10 p:0x601060 local_b:20 p:0x7ffe57755668 malloc_c:30 p:0x1aae010

In the first two lines the printed values are the same, and so are the virtual addresses. Virtual addresses will be covered later. For now it is enough to know that every access to a value in memory goes through a virtual address, which is mapped to a physical memory page. Which physical page each address points to will be analyzed later. (It feels like I am leaving yet another hole to fill in later.)

Then the child process modifies the three values and execution continues. The result: the parent and the child now see different values, yet the virtual addresses are still the same. Keep this in mind for the later analysis.

Next, let's look at the maps file of each process under /proc:

root@ubuntu:/proc# cat 1522/maps 
00400000-00401000 r-xp 00000000 08:01 11672602                           /root/c_test/08/test_mem
00600000-00601000 r--p 00000000 08:01 11672602                           /root/c_test/08/test_mem
00601000-00602000 rw-p 00001000 08:01 11672602                           /root/c_test/08/test_mem
01aae000-01acf000 rw-p 00000000 00:00 0                                  [heap]
7f6344bba000-7f6344d7a000 r-xp 00000000 08:01 791097                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6344d7a000-7f6344f7a000 ---p 001c0000 08:01 791097                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6344f7a000-7f6344f7e000 r--p 001c0000 08:01 791097                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6344f7e000-7f6344f80000 rw-p 001c4000 08:01 791097                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6344f80000-7f6344f84000 rw-p 00000000 00:00 0 
7f6344f84000-7f6344faa000 r-xp 00000000 08:01 791108                     /lib/x86_64-linux-gnu/ld-2.23.so
7f634519c000-7f634519f000 rw-p 00000000 00:00 0 
7f63451a9000-7f63451aa000 r--p 00025000 08:01 791108                     /lib/x86_64-linux-gnu/ld-2.23.so
7f63451aa000-7f63451ab000 rw-p 00026000 08:01 791108                     /lib/x86_64-linux-gnu/ld-2.23.so
7f63451ab000-7f63451ac000 rw-p 00000000 00:00 0 
7ffe57737000-7ffe57758000 rw-p 00000000 00:00 0                          [stack]
7ffe57794000-7ffe57797000 r--p 00000000 00:00 0                          [vvar]
7ffe57797000-7ffe57799000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
root@ubuntu:/proc# cat 1523/maps 
00400000-00401000 r-xp 00000000 08:01 11672602                           /root/c_test/08/test_mem
00600000-00601000 r--p 00000000 08:01 11672602                           /root/c_test/08/test_mem
00601000-00602000 rw-p 00001000 08:01 11672602                           /root/c_test/08/test_mem
01aae000-01acf000 rw-p 00000000 00:00 0                                  [heap]
7f6344bba000-7f6344d7a000 r-xp 00000000 08:01 791097                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6344d7a000-7f6344f7a000 ---p 001c0000 08:01 791097                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6344f7a000-7f6344f7e000 r--p 001c0000 08:01 791097                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6344f7e000-7f6344f80000 rw-p 001c4000 08:01 791097                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6344f80000-7f6344f84000 rw-p 00000000 00:00 0 
7f6344f84000-7f6344faa000 r-xp 00000000 08:01 791108                     /lib/x86_64-linux-gnu/ld-2.23.so
7f634519c000-7f634519f000 rw-p 00000000 00:00 0 
7f63451a9000-7f63451aa000 r--p 00025000 08:01 791108                     /lib/x86_64-linux-gnu/ld-2.23.so
7f63451aa000-7f63451ab000 rw-p 00026000 08:01 791108                     /lib/x86_64-linux-gnu/ld-2.23.so
7f63451ab000-7f63451ac000 rw-p 00000000 00:00 0 
7ffe57737000-7ffe57758000 rw-p 00000000 00:00 0                          [stack]
7ffe57794000-7ffe57797000 r--p 00000000 00:00 0                          [vvar]
7ffe57797000-7ffe57799000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Compare the address ranges of each segment carefully: they are identical in the two processes.

So, can we infer from this what fork does with memory?

In traditional Unix systems, the child process copies all the resources of the parent process, including the address space, the process context (the static description of the process's whole execution activity), the process stack, and so on.

The word "traditional" already hints that this approach has drawbacks:

  1. It uses a lot of memory.
  2. Copying takes a lot of time, which makes fork slow.
  3. Usually the child calls an exec function right after fork to run a different program, so most of the copied data is never used.

So Linux now uses copy-on-write (COW). The idea is easy to understand. During fork, the child does not copy the parent's address space completely. Instead, parent and child share the same physical pages, and these pages are marked read-only. When either process tries to write to such a page, a page fault is raised; the kernel then allocates a new physical page, copies the original content into it, and lets the write go to the new page. This is copy-on-write, and it is a well-proven technique.

Does it feel like we are done here? Look at the code again: we allocated a variable with malloc and never released it.

This raises a question: how should it be released, and what changes once a child process is involved?

From the analysis above, the child gets its own copy of the heap, so there is also a malloc_c in the child's heap. In this case a single malloc call has to be released twice, once in the parent and once in the child.

You can try.
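Here is one way to try it, as a sketch. The child modifies and frees its own copy of the heap block, then the parent checks that its copy is untouched and frees it as well. The function name parent_value_after_child_write is made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the value the parent sees in its heap
 * block after the child has modified and freed its own copy.
 * Returns -1 on error. */
int parent_value_after_child_write(void)
{
    int *malloc_c = malloc(sizeof(*malloc_c));

    if (malloc_c == NULL)
        return -1;
    *malloc_c = 30;

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        free(malloc_c);
        return -1;
    }

    if (pid == 0) {
        *malloc_c = 31;         /* only the child's copy changes */
        free(malloc_c);         /* release the child's copy */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        free(malloc_c);
        return -1;
    }

    int value = *malloc_c;      /* the parent's copy is still valid */
    free(malloc_c);             /* release the parent's copy */
    return value;
}
```

The free in the child does not affect the parent's block, which is why the parent can still read the value and must call free itself.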

8.1.5 file relationship between parent and child processes

When fork executes, the kernel copies all file descriptors of the parent process, so the child can also operate on files the parent has opened.

Let's write some code to test how parent and child share files.

#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>


#define     INFILE      "./in.txt"
#define     OUTFILE     "./out.txt"


int main(int argc, char**argv)
{
    // Open file first
    int r_fd = open(INFILE, O_RDONLY);
    if(r_fd < 0)
    {
        printf("open %s\n", INFILE);
        return 0;
    }

    int w_fd = open(OUTFILE, O_WRONLY | O_CREAT | O_TRUNC, 0644);   // O_CREAT requires a mode argument
    if(w_fd < 0) 
    {
        printf("open %s\n", OUTFILE);
        return 0;
    }


    // Create child process
    pid_t pid = fork();
    if(pid < 0)
    {
        printf("fork error\n");
        return 0;
    } 
    
    char buf[100];
    memset(buf, 0, 100);

    // Parent and child do the same thing: read from the input file, then write to the output file
    while(read(r_fd, buf, 2) > 0)
    {
        printf("pid:%d buf:%s\n", getpid(), buf);
        sprintf(buf, "pid:%d \n", getpid());
        write(w_fd, buf, strlen(buf));                      // Multiple processes operate on one w_fd
        sleep(1);
        memset(buf, 0, 100);
    }


    while(1);
    close(r_fd);
    close(w_fd);

    return 0;
}

Let's take a look at the effect of code execution:

root@ubuntu:~/c_test/08# ./test_file
pid:1501 buf:1

pid:1502 buf:2

pid:1501 buf:3

pid:1502 buf:4

pid:1502 buf:5
pid:1501 buf:6

The output shows that parent and child share a single file offset for the file being read, so together they read it sequentially. If the offset were not shared, each process would read 1 through 6 on its own.

root@ubuntu:~/c_test/08# cat out.txt 
pid:1501 
pid:1502 
pid:1501 
pid:1502 
pid:1501 
pid:1502

Writes share one file offset as well, so the two processes write alternately.
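The shared offset can also be checked directly. In the sketch below, only the child reads from the file, yet the parent's offset moves as well. The function name shared_offset_demo and the temporary file name are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child reads 2 bytes from a descriptor
 * inherited from the parent. Returns the file offset the parent
 * observes afterwards, or -1 on error. */
long shared_offset_demo(void)
{
    char path[] = "/tmp/fork_offset_XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0) {
        perror("mkstemp");
        return -1;
    }
    unlink(path);               /* the file disappears once fd is closed */

    if (write(fd, "123456", 6) != 6 || lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        close(fd);
        return -1;
    }

    if (pid == 0) {
        char buf[2];

        /* reading in the child advances the shared offset */
        _exit(read(fd, buf, 2) == 2 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }

    off_t offset = lseek(fd, 0, SEEK_CUR);  /* query the offset without moving it */
    close(fd);
    return (long)offset;
}
```

The parent never calls read, but lseek reports an offset of 2, because both descriptors refer to the same open file description in the kernel.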

If this sharing is not safe for your use case, how can the files opened by the parent be kept away from the child?

The open function has a flag for this: O_CLOEXEC.

The name says it: close on exec. When the child calls an exec function, the descriptor is closed automatically, so the newly executed program cannot access the file opened by the parent. Note that between fork and exec the child still holds the descriptor.
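As a small sketch, the flag can be passed to open and then inspected with fcntl. This only checks that the close-on-exec flag is set on the descriptor; it does not perform an exec. The function name opens_with_cloexec is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L     /* makes O_CLOEXEC visible under strict compiler modes */

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-only with the given extra
 * flags and reports whether the close-on-exec flag is set on the
 * new descriptor. Returns 1 if set, 0 if not, -1 on error. */
int opens_with_cloexec(const char *path, int extra_flags)
{
    int fd = open(path, O_RDONLY | extra_flags);

    if (fd < 0) {
        perror("open");
        return -1;
    }

    int fd_flags = fcntl(fd, F_GETFD);  /* read the descriptor flags */
    close(fd);
    if (fd_flags < 0)
        return -1;

    return (fd_flags & FD_CLOEXEC) ? 1 : 0;
}
```

With O_CLOEXEC the function reports that the flag is set, and without it the flag is clear, which is the default behavior seen in the test program above.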

8.1.6 vfork()

In the early days, before fork used copy-on-write, creating a process with fork was really slow, so the experts added a new function, vfork(), for creating processes.

How vfork() works: the parent's memory is not copied; the child shares it directly.

Sharing would of course cause problems, so vfork() guarantees that the child runs first and the parent is suspended until the child calls _exit, exit, or an exec function. Only then does the parent run again.

However, once fork gained copy-on-write, vfork became largely obsolete. There is no need for an example here, since there is little reason to use an obsolete function.

8.1.7 process tree

Since every process is forked from a parent process, there must be a common ancestor, and that is the init process started by the system:

This picture comes from Mr. Liu Chao's interesting talk about the operating system.

Let's talk about this picture in the next section, ha ha ha.
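Until then, the chain of ancestors can be explored by hand. The sketch below assumes a Linux system with /proc mounted and follows the parent process ID field in /proc/<pid>/stat upwards from the current process. The function names parent_of and print_ancestors are made up for this example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the parent process ID of pid from
 * /proc/<pid>/stat. Returns -1 on error. */
static pid_t parent_of(pid_t pid)
{
    char path[64];
    char line[512];

    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char *ok = fgets(line, sizeof(line), fp);
    fclose(fp);
    if (ok == NULL)
        return -1;

    /* Format: pid (comm) state ppid ...
     * comm may contain spaces, so continue after the last ')'. */
    char *p = strrchr(line, ')');
    char state;
    int ppid;
    if (p == NULL || sscanf(p + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}

/* Hypothetical helper: prints the current process and all of its
 * ancestors, one per line, indented by depth. Returns the number
 * of processes printed, or -1 on error. */
int print_ancestors(void)
{
    int depth = 0;
    pid_t pid = getpid();

    while (pid > 0) {
        printf("%*spid %d\n", depth * 2, "", (int)pid);
        pid_t parent = parent_of(pid);
        if (parent < 0)
            return -1;
        pid = parent;
        depth++;
    }
    return depth;
}
```

On a typical system the last line printed is pid 1, the init process. Inside a container the chain may end earlier, because processes outside the container's PID namespace are not visible.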

Appendix:

Properties the child process inherits from the parent:

  1. The entire memory image (copy-on-write)
  2. File offsets of open files
  3. Real user ID, real group ID, effective user ID, effective group ID
  4. Supplementary group IDs, process group ID, session ID
  5. Controlling terminal
  6. Set-user-ID flag and set-group-ID flag
  7. Current working directory
  8. Root directory
  9. File mode creation mask (umask)
  10. Signal mask and signal dispositions
  11. Close-on-exec flag of every open file descriptor
  12. Environment
  13. Attached shared memory segments
  14. Memory mappings
  15. Resource limits

Differences between parent and child processes:

  1. Return value of fork
  2. Different process ID
  3. The two processes have different parent process IDs
  4. The child's tms_utime, tms_stime, tms_cutime, and tms_cstime are set to 0
  5. File locks set by the parent are not inherited by the child
  6. Pending alarms are cleared for the child
  7. The child's set of pending signals is initialized to the empty set

That is a lot of attributes, and many of them are still unclear at this point. Take your time; we will get to them step by step.

Keywords: Process fork

Added by rUmX on Mon, 27 Dec 2021 09:10:46 +0200