C processes and threads

preface

  • How processes are created: fork(), system(), the exec() family, and so on
  • Communication and synchronization between Linux processes: pipes, named pipes (FIFO), semaphores (sem), shared memory (shm), message queues (msg), and signals
  • Thread programming under Linux: mutexes, condition variables, thread signals, and so on

1, Linux Process

Working with processes in Linux mainly involves creating processes, terminating processes, and communication and synchronization between processes.

1.1 process generation

When a process is created, the system roughly performs the following steps:

  • Copy the environment variables of the parent process
  • Create the process structure in the kernel
  • Insert the structure into the process list so that it can be managed
  • Allocate resources to the new process
  • Copy the memory mapping information of the parent process
  • Set up file descriptors and link points
  • Notify the parent process

1.2 termination mode of process

  • Return from main
  • Call exit
  • Call _exit
  • Call abort
  • Terminated by a signal

When a process terminates, the system will release the resources owned by the process, such as memory, files and structures.
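
The termination modes listed above can be observed from the parent with waitpid(). The following is only a minimal sketch: the function name child_end_status and its mode numbering are invented for this illustration. It forks a child that ends in the chosen way and reports how the parent saw it end.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that terminates in the given way.
 * mode 0: exit(3), mode 1: _exit(4), any other mode: abort().
 * Returns the exit code for a normal exit, the negated signal number
 * if the child was killed by a signal, or -1000 on error. */
int child_end_status(int mode)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1000;

    if (pid == 0) {             /* child */
        if (mode == 0)
            exit(3);            /* runs atexit handlers and flushes stdio */
        if (mode == 1)
            _exit(4);           /* terminates immediately */
        abort();                /* raises SIGABRT */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}
```

Calling child_end_status(2) should report -SIGABRT, because abort() ends the child with a signal rather than with an exit code.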

1.3 communication between processes

The common ways for processes to communicate are pipes, shared memory, and message queues.

  • Pipe: used much like a file, except that one end is read-only and the other end is write-only. Data is transferred between processes by writing to one end and reading from the other
  • Shared memory: a region of memory is shared among multiple processes, and each process operates on it through the shared memory address it obtained
  • Message queue: a linked list of messages created in the kernel, to which processes add messages and from which they retrieve them
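
As a small illustration of the shared memory item, the sketch below lets a child store a value in a System V shared memory segment and the parent read it back. The function name share_int_with_child is hypothetical, and the choice of IPC_PRIVATE with permissions 0600 is an assumption made to keep the example short.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child stores `value` in a shared memory
 * segment and the parent reads it back.
 * Returns the value seen by the parent, or -1 on error. */
int share_int_with_child(int value)
{
    /* IPC_PRIVATE creates a new segment known only to this process
     * and to the children that inherit the attachment. */
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    int *shared = shmat(shmid, NULL, 0);
    if (shared == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    *shared = 0;

    int seen = -1;
    pid_t pid = fork();
    if (pid == 0) {
        *shared = value;        /* write through the shared mapping */
        _exit(0);
    }
    if (pid > 0) {
        waitpid(pid, NULL, 0);  /* the child has finished writing */
        seen = *shared;
    }

    shmdt(shared);
    shmctl(shmid, IPC_RMID, NULL);  /* remove the segment */
    return seen;
}
```

Unlike an ordinary variable, which each process gets its own copy of after fork(), the value written by the child here is visible to the parent.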

1.4 synchronization between processes

The main mechanisms are message queues, semaphores, and so on.
A semaphore is a shared value representing a quantity. It is used to protect operations or shared resources that several processes access.
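
A rough sketch of how a semaphore can protect a shared resource: a parent and a child both increment a counter in shared memory, and a System V semaphore with initial value 1 acts as the lock. The names sem_change and locked_counter_demo are invented for this example, and error handling is reduced to a minimum.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the caller has to define this union for semctl(). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Add delta to semaphore 0: -1 is "P" (acquire), +1 is "V" (release). */
static int sem_change(int semid, int delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = (short)delta, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Hypothetical demo: parent and child each add 1 to a shared counter
 * `rounds` times, holding the semaphore around every update.
 * Returns the final counter value, or -1 on error. */
long locked_counter_demo(int rounds)
{
    long result = -1;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    int shmid = shmget(IPC_PRIVATE, sizeof(long), IPC_CREAT | 0600);
    long *counter = (shmid == -1) ? (void *)-1 : shmat(shmid, NULL, 0);
    union semun init = { .val = 1 };    /* 1 means "unlocked" */

    if (semid != -1 && counter != (void *)-1 &&
        semctl(semid, 0, SETVAL, init) != -1) {
        *counter = 0;
        pid_t pid = fork();
        if (pid != -1) {
            /* Both the parent and the child run this loop. */
            for (int i = 0; i < rounds; i++) {
                sem_change(semid, -1);  /* P: wait until free, then lock */
                *counter += 1;          /* protected update */
                sem_change(semid, +1);  /* V: unlock */
            }
            if (pid == 0)
                _exit(0);               /* the child is done */
            waitpid(pid, NULL, 0);
            result = *counter;
        }
    }

    /* Remove the IPC objects; otherwise they outlive the process. */
    if (counter != (void *)-1)
        shmdt(counter);
    if (shmid != -1)
        shmctl(shmid, IPC_RMID, NULL);
    if (semid != -1)
        semctl(semid, 0, IPC_RMID);
    return result;
}
```

With the semaphore in place the result should be exactly twice the number of rounds; without it, updates from the two processes could overwrite each other.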

The difference and relation between process and thread

  • A process is the basic unit of resource allocation by the operating system, and it has a complete virtual address space of its own. Apart from CPU time, the system does not allocate independent resources to threads; a thread shares the resources of the process it belongs to.
  • A thread is part of a process. A process that does not explicitly create threads can be regarded as single-threaded; once it creates a thread, it can be regarded as multi-threaded.
  • Multithreading and multiprocessing are different concepts. Both execute work in parallel, but threads can share resources such as memory and variables in a simple way, while processes cannot: the ways in which processes can share data are limited.

2, Process generation mode

2.1 process number

When a process is created, the system assigns it an ID number that identifies it. In Linux this number is unique among the processes that currently exist, so the system can use it to refer to a process. This number is usually called the PID.
The type of a PID is pid_t.

2.1.1 getpid() and getppid() functions

getpid() returns the ID of the current process, and getppid() returns the ID of the parent of the current process. Both return a value of type pid_t.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int  main(int argc, char **argv)
{
    pid_t pid,ppid;

    pid = getpid();
    ppid = getppid();

    printf("pid = %d ppid = %d\n", pid, ppid);

    return 0;
}

2.2 process replication fork()

The fork() function creates a new process as a copy of the parent process; its ID number is different from that of the parent. On Linux, fork() is implemented with copy-on-write: the child initially shares the parent's memory pages, and a page is duplicated only when the parent or the child modifies it. fork() returns 0 in the child, the PID of the child in the parent, and -1 on failure. The following is an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

int  main(int argc, char **argv)
{
    pid_t pid;

    pid = fork();

    if(-1 == pid)
    {
        printf("Error:fork\n");
        return -1;
    }else if(0 == pid){
        printf("child process PID = %d parent PID = %d\n", getpid(),getppid());
    }else{
        printf("parent process PID = %d parent PID = %d\n", getpid(),getppid());
    }

    return 0;
}
wsj@realarm:~/work/net_stu$ ./a.out 
parent process PID = 13851 parent PID = 6537
child process PID = 13852 parent PID = 13851
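
To see that parent and child really work on separate copies of memory after fork(), a variable can be modified in the child and checked in the parent. This is only a sketch; the function name fork_keeps_memory_separate and the values 10 and 11 are chosen for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 10;    /* ordinary global, copied on write after fork() */

/* Hypothetical demo: the child increments `counter` and reports its own
 * value through its exit status; the parent's copy must stay unchanged.
 * Returns 1 if the child saw 11 while the parent still sees 10,
 * 0 otherwise, and -1 on error. */
int fork_keeps_memory_separate(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        counter++;          /* the child gets a private copy of the page */
        _exit(counter);     /* exit status 11 */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 11 && counter == 10;
}
```

The waitpid() call also keeps the parent from exiting before the child, which the example above does not guarantee.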

2.3 system mode

The system() function runs an external shell command, starting another process from the current process.
system() executes the command by calling "/bin/sh -c command", and blocks the current process until the command has finished.

system() function prototype:

#include <stdlib.h>
int system(const char *command);

While it executes, system() calls functions such as fork(), execve() and waitpid(). If any of these calls fails, the system() call fails.
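
The value returned by system() is a wait status of the kind produced by waitpid(), so it has to be decoded before use. A minimal sketch, with run_shell_command as a made-up wrapper name:

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical wrapper: run a shell command with system() and return its
 * exit code, or -1 if the command could not be started or was killed
 * by a signal. */
int run_shell_command(const char *command)
{
    int status = system(command);   /* blocks until the command finishes */
    if (status == -1)
        return -1;                  /* no child could be created */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

For example, run_shell_command("exit 3") should return 3, because the shell started by system() exits with that code.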

2.4 exec() function family

When fork() or system() is used, the system creates a new process to carry out the caller's operation, and the original process continues to exist until it exits. The exec() family of functions instead replaces the program running in the current process with a new program, and execution continues from the start of the new program. The PID of the process stays the same as before the call. The exec() family is declared as follows (excerpt from the glibc header <unistd.h>):

extern int execve (const char *__path, char *const __argv[],
		   char *const __envp[]) __THROW __nonnull ((1, 2));

#ifdef __USE_XOPEN2K8
/* Execute the file FD refers to, overlaying the running program image.
   ARGV and ENVP are passed to the new program, as for `execve'.  */
extern int fexecve (int __fd, char *const __argv[], char *const __envp[])
     __THROW __nonnull ((2));
#endif


/* Execute PATH with arguments ARGV and environment from `environ'.  */
extern int execv (const char *__path, char *const __argv[])
     __THROW __nonnull ((1, 2));

/* Execute PATH with all arguments after PATH until a NULL pointer,
   and the argument after that for environment.  */
extern int execle (const char *__path, const char *__arg, ...)
     __THROW __nonnull ((1, 2));

/* Execute PATH with all arguments after PATH until
   a NULL pointer and environment from `environ'.  */
extern int execl (const char *__path, const char *__arg, ...)
     __THROW __nonnull ((1, 2));

/* Execute FILE, searching in the `PATH' environment variable if it contains
   no slashes, with arguments ARGV and environment from `environ'.  */
extern int execvp (const char *__file, char *const __argv[])
     __THROW __nonnull ((1, 2));

/* Execute FILE, searching in the `PATH' environment variable if
   it contains no slashes, with all arguments after FILE until a
   NULL pointer and environment from `environ'.  */
extern int execlp (const char *__file, const char *__arg, ...)
     __THROW __nonnull ((1, 2));

Of the functions above, only execve() is a real system call; the others are library functions that wrap it. The execlp() and execvp() variants look up the given file name in the directories listed in PATH, while the others take a path. Each of them replaces the contents of the calling process with the executable it finds. Unlike fork(), the exec() functions do not return after successful execution, because the new program has taken over the address space and resources of the current process: the code segment, data segment and stack are all replaced with new content. The process ID and other identifying information remain unchanged, so in effect the exec() family runs a new program inside the shell of the original process. These functions return, with the value -1, only if the call fails.
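
Because a successful exec() never returns, it is usually combined with fork() so that the original program keeps running. The sketch below shows that pattern; the function name exec_and_wait and the exit code 127 for a failed exec are conventions assumed for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `file` (searched for in PATH) in a child
 * process and return its exit code, or -1 on error. */
int exec_and_wait(const char *file)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Replace the child with the new program; the PID stays the same. */
        execlp(file, file, (char *)NULL);
        /* This line is reached only if execlp() failed and returned -1. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exec_and_wait("true") should give 0, while a name that cannot be found in PATH gives 127 because the code after execlp() runs.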

Interprocess communication and synchronization

In Linux, the mechanisms for communication between processes are collectively called IPC (inter-process communication). Linux has long provided several of them: half-duplex pipes, FIFOs (named pipes), message queues, semaphores, shared memory, and so on.

Half duplex pipe

A pipe is a mechanism that connects the standard output of one process to the standard input of another. Data flows through it in one direction only, which is why it is called half duplex. In the shell a pipe is written as "|", for example:

ls -l | grep '\.c$'

In code, a pipe is created with the pipe() function, whose prototype is:

#include <unistd.h>
int pipe(int filedes[2]);
/* filedes is an array that receives the two file descriptors of the pipe.
   filedes[0] is opened for reading and filedes[1] is opened for writing,
   that is, whatever is written to filedes[1] can be read from filedes[0].
   The function returns 0 on success and -1 on failure. */

The following code creates a pipe and uses it to pass data from the child process to the parent process:

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>

int main(void)
{
	int result = -1;
	int fd[2], nbytes;
	pid_t pid;
	char string[] = "Hello, pipe";
	char readbuffer[80];
	/* fd[1] is the write end and fd[0] is the read end */
	int *write_fd = &fd[1];
	int *read_fd = &fd[0];

	result = pipe(fd);

	if(-1 == result){
		printf("pipe error\n");
		return -1;
	}
	pid = fork();
	if(-1 == pid){
		printf("fork error\n");
		return -1;
	}else if(0 == pid){
		close(*read_fd);

		result = write(*write_fd,string,strlen(string));
		return 0;
	}else{
		close(*write_fd);

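		/* Leave room for the terminating NUL: read() does not add one */
		nbytes = read(*read_fd,readbuffer,sizeof(readbuffer) - 1);
		if(nbytes > 0){
			readbuffer[nbytes] = '\0';
			printf("read buf :%s\n",readbuffer);
		}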
	}
	return 0;
}
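
The shell's "|" is built from the same pieces plus dup2(), which makes a program's standard output refer to the pipe. The following sketch captures the output of the echo command in the parent; capture_echo is a hypothetical name and the use of echo is just an example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `echo text` with its standard output connected
 * to a pipe and collect what it prints in `out` (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t capture_echo(const char *text, char *out, size_t outlen)
{
    int fd[2];
    if (outlen == 0 || pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        close(fd[0]);                   /* the child only writes */
        dup2(fd[1], STDOUT_FILENO);     /* stdout now refers to the pipe */
        close(fd[1]);
        execlp("echo", "echo", text, (char *)NULL);
        _exit(127);                     /* reached only if exec failed */
    }

    close(fd[1]);   /* otherwise read() would never see end of file */
    size_t total = 0;
    ssize_t n;
    while (total < outlen - 1 &&
           (n = read(fd[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';

    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

A second child whose standard input is redirected to fd[0] in the same way would complete a two-command pipeline.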

Pipe blocking and atomicity of pipe operations
While the write end of a pipe is still open, a read returns at most the number of bytes requested. If the pipe holds less data than was requested, the read returns the number of bytes currently in the pipe; otherwise it returns the requested number of bytes. If the pipe is empty, the read blocks until data arrives.
A write to a pipe is guaranteed to be atomic only when it is no larger than the threshold PIPE_BUF (4096 bytes on Linux). A larger write, such as the 128K write in the example below, is not atomic: the data in the buffer is written to the pipe piece by piece as space becomes available, and the call does not return until all of the data has been written. If no process reads the data, the write blocks once the pipe is full.
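
The PIPE_BUF threshold can also be queried at run time with fpathconf(). The sketch below is illustrative only; pipe_atomic_limit and write_is_atomic are invented helper names.

```c
#include <unistd.h>

/* Hypothetical helper: return the atomic-write limit (PIPE_BUF) of a
 * freshly created pipe, or -1 on error. */
long pipe_atomic_limit(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    long limit = fpathconf(fd[1], _PC_PIPE_BUF);

    close(fd[0]);
    close(fd[1]);
    return limit;
}

/* A single write() of `len` bytes is atomic only if it fits in the limit. */
int write_is_atomic(size_t len)
{
    long limit = pipe_atomic_limit();
    return limit > 0 && len <= (size_t)limit;
}
```

POSIX requires the limit to be at least 512 bytes, so a program that needs atomic messages can rely on small writes and check the actual limit for larger ones.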

Code example for the atomicity of pipe operations

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>

#define K 1024
#define WRITELEN (128*K)

int main(void)
{
	int result = -1;
	int fd[2], nbytes;
	pid_t pid;
	static char string[WRITELEN];	/* 128K of data to write; static storage is zero-filled */
	char readbuffer[10*K];

	/* fd[1] is the write end and fd[0] is the read end */
	int *write_fd = &fd[1];
	int *read_fd = &fd[0];

	result = pipe(fd);

	if(-1 == result){
		printf("pipe error\n");
		return -1;
	}
	pid = fork();
	if(-1 == pid){
		printf("fork error\n");
		return -1;
	}else if(0 == pid){
		close(*read_fd);

		int write_size = WRITELEN;
		result = 0;
		while(write_size >= 0){
			result = write(*write_fd,string,write_size);

			if(result > 0){
				write_size -= result;
				printf("wrote %d bytes, %d bytes remaining\n",result,write_size);
			}else{
				printf("wait 10s\n");
				sleep(10);
			}
			if(write_size <= 0)
				break;
			
		}

		return 0;
	}else{
		close(*write_fd);

		while(1){
			nbytes = read(*read_fd,readbuffer,sizeof(readbuffer));
			if(nbytes <= 0){
				printf("No data written\n");
				break;
			}
			/* The data is not NUL-terminated, so report its size instead of printing it with %s */
			printf("read %d bytes\n",nbytes);
		}
	}
	return 0;
}

Named pipe

A named pipe differs from an ordinary pipe in some obvious ways:
1. A named pipe exists in the file system as a special device file
2. Unrelated processes can share data through a named pipe

  • Create FIFO
    There are several ways to create a named pipe. It can be done directly from the shell, for example:
[root:/tmp]# mkfifo namedfifo
[root:/tmp]# ls -l namedfifo 
prw-r--r-- 1 root root 0 Aug  4 09:03 namedfifo
  • FIFO operation
    I/O on a named pipe (FIFO) is basically the same as I/O on an ordinary pipe, with one major difference: a FIFO must be opened explicitly with open() to establish the connection to the pipe. By default a FIFO blocks. A process that opens the FIFO for reading blocks until another process opens it for writing and writes data to it. Likewise, a process that opens the FIFO for writing blocks while no process is reading from it. If blocking is not wanted, pass the O_NONBLOCK flag to open() to turn off the default blocking behavior.
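
A named pipe can also be created from C with mkfifo(). The following is a minimal sketch of the blocking open() behavior described above: a child opens the FIFO for writing while the parent opens it for reading. The function name fifo_roundtrip is hypothetical, and the caller is assumed to pass a path that does not exist yet.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a FIFO at `path`, let a child write `msg`
 * into it and read it back in the parent. `out` receives the text,
 * NUL-terminated. Returns the number of bytes read, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *out, size_t outlen)
{
    if (outlen == 0 || mkfifo(path, 0600) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        int wfd = open(path, O_WRONLY);     /* blocks until a reader opens */
        if (wfd == -1)
            _exit(1);
        ssize_t w = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(w == (ssize_t)strlen(msg) ? 0 : 1);
    }

    ssize_t n = -1;
    int rfd = open(path, O_RDONLY);         /* blocks until a writer opens */
    if (rfd != -1) {
        n = read(rfd, out, outlen - 1);
        if (n >= 0)
            out[n] = '\0';
        close(rfd);
    }

    waitpid(pid, NULL, 0);
    unlink(path);                           /* remove the FIFO again */
    return n;
}
```

Here the two processes are related only for brevity; any process that knows the path could open the FIFO in the same way.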

Message queue

A message queue is a linked list kept inside the kernel's address space, and its contents are passed between processes through the Linux kernel. Messages are added to the queue in order and can be retrieved in several different ways. Each message queue is uniquely identified by an IPC identifier, which is how the kernel tells queues apart. Different message queues are independent of each other, and the messages in each queue form a separate linked list.
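
A short sketch of the System V message queue calls may make this more concrete. A child sends one message and exits, and the parent receives it afterwards, which also shows that the message is held by the kernel rather than by the sender. The names text_msg, TEXT_MAX and msgq_roundtrip are invented for this example.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TEXT_MAX 64

/* A message starts with a long type field, followed by the payload. */
struct text_msg {
    long mtype;                 /* message type, must be greater than 0 */
    char mtext[TEXT_MAX];
};

/* Hypothetical demo: a child sends `text` through a new message queue
 * and the parent receives it into `out`.
 * Returns the length of the received text, or -1 on error. */
int msgq_roundtrip(const char *text, char *out, size_t outlen)
{
    int result = -1;
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        struct text_msg m = { .mtype = 1 };     /* mtext is zero-filled */
        strncpy(m.mtext, text, TEXT_MAX - 1);
        /* The size argument counts the payload only, not mtype. */
        _exit(msgsnd(qid, &m, strlen(m.mtext) + 1, 0) == -1);
    }
    if (pid > 0) {
        struct text_msg m;
        waitpid(pid, NULL, 0);  /* the sender is gone, the message stays */
        /* Type 1 selects the first queued message of type 1. */
        ssize_t n = msgrcv(qid, &m, sizeof m.mtext, 1, IPC_NOWAIT);
        if (n > 0 && outlen > 0) {
            strncpy(out, m.mtext, outlen - 1);
            out[outlen - 1] = '\0';
            result = (int)strlen(out);
        }
    }

    msgctl(qid, IPC_RMID, NULL);    /* remove the queue */
    return result;
}
```

Passing a different type value to msgrcv() is one of the "several different ways" of retrieving messages: 0 takes the first message of any type, and a positive value takes the first message of that type.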

Keywords: C Linux Ubuntu

Added by dksmarte on Fri, 19 Nov 2021 16:07:35 +0200