[C language] Linux Socket select multiplexing

Preface

With the multi-process or multi-thread model, a server handling concurrent connection requests has to create one process or thread per connection, and most of those processes or threads spend their time blocked. If 1000 clients connect to the server at once, 1000 processes or threads must be created. Creating them alone consumes considerable resources, before even counting the CPU time spent switching between them.

With I/O multiplexing, a single thread can monitor the ready state of many connections and handle them, which greatly reduces the number of threads and the memory and context-switching overhead that comes with them.

Linux offers three multiplexing mechanisms: select, poll and epoll. This article covers select.

1, Synchronous / asynchronous, blocking / non-blocking

Before looking at select multiplexing, let's go over a few concepts.

In network programming under Linux, we often meet four terms that describe how a call behaves: synchronous / asynchronous and blocking / non-blocking:

  • Synchronous and asynchronous describe how the user thread interacts with the kernel. Synchronous means that after the user thread issues an IO request, it has to wait for, or poll, the kernel until the IO operation completes before it can continue. Asynchronous means that the user thread continues to execute after issuing the IO request; when the kernel finishes the IO operation, it notifies the user thread or calls a callback function registered by the user thread.
  • Blocking and non-blocking describe how the user thread calls the kernel IO operation. Blocking means the call does not return to user space until the IO operation has completed. Non-blocking means the call returns a status value immediately, without waiting for the IO operation to complete.

Simply put, for an IO request:
Synchronous: the caller waits for, or keeps checking on, the result itself
Asynchronous: the caller moves on and is notified when the result is ready
And for the call itself:
Blocking: the call does not return until the data is available or the operation is done
Non-blocking: the call returns immediately; if nothing is ready yet, the caller has to try again later
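
The difference between a blocking and a non-blocking call can be sketched in a few lines. The helper names set_nonblocking() and try_read() are made up for this illustration and are not part of the server shown later; the sketch only assumes the standard fcntl() and read() calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: attempt a single read on a non-blocking fd.
 * Returns the number of bytes read (0 means end of file),
 * -1 if no data is available yet, -2 on any other error. */
int try_read(int fd, char *buf, size_t len)
{
	ssize_t n = read(fd, buf, len);

	if (n >= 0)
		return (int)n;
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return -1;	/* nothing to read right now: try again later */
	return -2;
}
```

On a blocking fd the same read() would simply not return until data arrived. With O_NONBLOCK set, the caller gets an immediate "not yet" answer and has to come back later, which is exactly the polling that select() is meant to replace.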

Server-side network programs on Linux often need a high-performance IO model. There are five common IO models:

  • Blocking IO
  • Synchronous non-blocking IO
  • IO multiplexing
  • Signal-driven IO
  • Asynchronous IO

2, select multiplexing

The select() function lets a process ask the kernel to wait for any of several events (on file descriptors) and to wake the process only when one or more of them occur or a specified period of time has elapsed. The process then checks which file descriptors are ready and handles them accordingly.
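
As a minimal sketch of this "wait for an event or a timeout" behaviour, the helper below waits on a single descriptor. The name wait_readable() and the millisecond parameter are assumptions made for the example; only select() and the FD_* macros come from the system headers.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
	fd_set			rdset;
	struct timeval	tv;
	int				rv;

	/* select() can only handle descriptors below FD_SETSIZE */
	if (fd < 0 || fd >= FD_SETSIZE)
		return -1;

	FD_ZERO(&rdset);
	FD_SET(fd, &rdset);

	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	rv = select(fd + 1, &rdset, NULL, NULL, &tv);
	if (rv < 0)
		return -1;

	return (rv > 0 && FD_ISSET(fd, &rdset)) ? 1 : 0;
}
```

Calling wait_readable(fd, 50) on a descriptor with no pending data should come back with 0 after roughly 50 ms, and with 1 as soon as data is available.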

1. select() function

select() works on file descriptor sets (type fd_set):

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

The select() function monitors three independent sets of file descriptors. On return it modifies each set in place so that only the fds on which the corresponding event actually occurred remain, and it returns the number of such fds. The details are as follows:

  • Parameter 1 (int nfds): the highest-numbered file descriptor to be tested, plus 1. The kernel scans file descriptors starting from 0. If the file descriptors to be monitored are 3, 6 and 9, the highest is 9, so the kernel scans descriptors 0 to 9, 10 in total. That is, nfds = max(3,6,9)+1 = 10
  • Parameter 2 (fd_set *readfds): the fd set that the kernel tests for read conditions. If you are not interested in such events, it can be set to NULL
  • Parameter 3 (fd_set *writefds): the fd set that the kernel tests for write conditions. If you are not interested in such events, it can be set to NULL
  • Parameter 4 (fd_set *exceptfds): the fd set that the kernel tests for exception conditions. If you are not interested in such events, it can be set to NULL
  • Parameter 5 (struct timeval *timeout): the timeout for select(). If it is NULL, select() never times out and blocks until an event occurs
  • Return value: an int giving the number of ready fds left in the sets. For example, if only fds 6 and 9 have data to read, select() returns 2. It returns 0 if the timeout expires before any fd becomes ready, and -1 on error
  • Note: select() modifies the contents of the sets passed to it. Fds that currently have no data to read are removed from the read set, and only the fds that do have data to read are kept
  • For example, to monitor read events on file descriptors 5, 6, 7, 8 and 9, first add them to the rdset set with FD_SET(), so that rdset contains five fds (5,6,7,8,9). If select() finds that only two of them (6 and 9) have read events, it removes the fds without read events (5,7,8) from rdset and keeps the two fds (6,9). Afterwards, FD_ISSET() can be used on each fd to check whether it has a read event to process
  • [If this is not entirely clear yet, don't worry. Read on and then put the pieces together; it becomes much easier to understand.]
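
The example with fds 5 to 9 can be tried out with a small helper that reports which descriptors are readable at this moment. This is only a sketch: poll_readable() and its parameters are names invented here, and a zero timeout is used so that select() returns immediately instead of waiting.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: check which of the n descriptors in fds[] are
 * readable right now. ready[i] is set to 1 if fds[i] is readable, else 0.
 * Returns select()'s return value (number of ready fds), or -1 on error. */
int poll_readable(const int *fds, int n, int *ready)
{
	fd_set			rdset;
	struct timeval	tv = {0, 0};	/* zero timeout: do not wait */
	int				max_fd = -1;
	int				i, rv;

	FD_ZERO(&rdset);
	for (i = 0; i < n; i++)
	{
		if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
			return -1;
		FD_SET(fds[i], &rdset);
		if (fds[i] > max_fd)
			max_fd = fds[i];
	}

	/* nfds is the highest descriptor plus 1 */
	rv = select(max_fd + 1, &rdset, NULL, NULL, &tv);
	if (rv < 0)
		return -1;

	/* select() has removed the fds without read events from rdset */
	for (i = 0; i < n; i++)
		ready[i] = FD_ISSET(fds[i], &rdset) ? 1 : 0;

	return rv;
}
```

If three pipes are created and data is written to only two of them, passing the three read ends to poll_readable() should return 2 and flag exactly those two entries in ready[].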

2,FD_ZERO() macro

void FD_ZERO(fd_set *set);

The FD_ZERO() macro empties the specified file descriptor set.

  • A file descriptor set must be initialized before any fd is added to it. Memory is usually not zeroed when it is allocated, so if the set is a local variable it initially holds arbitrary values, and the result of using it is undefined.
  • It is also best to call FD_ZERO() before rebuilding the set for the next select() call: it clears whatever the previous select() left in the set, so the contents can be filled in afresh

3,FD_SET() macro

void FD_SET(int fd, fd_set *set);

The FD_SET() macro adds the specified file descriptor fd to the set.

4,FD_CLR() macro

void FD_CLR(int fd, fd_set *set);

The FD_CLR() macro removes the specified file descriptor fd from the set.

5,FD_ISSET() macro

int FD_ISSET(int fd, fd_set *set);

The FD_ISSET() macro checks whether the specified file descriptor fd is in the set.

  • FD_ISSET() is generally used after select(). After select() returns, only the fds on which events actually occurred remain in the set; the rest have been removed. So calling FD_ISSET() after select() to check whether an fd is still in the set tells you whether an event occurred on that fd.
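
The four macros can be exercised on their own, without calling select() at all. The two functions below are hypothetical helpers written for this illustration: fdset_count() counts the members of a set, and macro_walkthrough() applies the macros one after another.

```c
#include <sys/select.h>

/* Hypothetical helper: count how many descriptors in [0, max_fd]
 * are members of the set. */
int fdset_count(fd_set *set, int max_fd)
{
	int fd;
	int count = 0;

	for (fd = 0; fd <= max_fd && fd < FD_SETSIZE; fd++)
	{
		if (FD_ISSET(fd, set))
			count++;
	}
	return count;
}

/* Walk through the four macros; returns the number of fds left in the set */
int macro_walkthrough(void)
{
	fd_set set;

	FD_ZERO(&set);		/* set is now empty */
	FD_SET(3, &set);	/* set contains 3 */
	FD_SET(6, &set);	/* set contains 3, 6 */
	FD_SET(9, &set);	/* set contains 3, 6, 9 */
	FD_CLR(6, &set);	/* set contains 3, 9 */

	return fdset_count(&set, 9);
}
```

After the FD_CLR() call only fds 3 and 9 remain, so macro_walkthrough() returns 2.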

3, Specific code and program analysis

1. Specific code

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <sys/select.h>
#include <ctype.h>

/* Maximum number of pending connections that listen() may queue */
#define BACKLOG			32
/* This macro is used to calculate the number of elements of the array */
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))

int socket_server_init(char *servip, int port);

int main(int argc, char **argv)
{
	int		listen_fd = -1;
	int		client_fd = -1;
	int		rv = -1;
	int		port;
	int		max_fd = -1;
	fd_set	rdset;
	int		fds_array[1024];
	int		i;
	int		found;
	char	buf[1024];
	
	/* Check the command-line arguments; if the port is missing, print the usage and exit */
	if (argc < 2)
	{
		printf("Program Usage: %s [Port]\n", argv[0]);
		return -1;
	}
	
	//Command-line arguments are strings, so convert the port to an integer with atoi()
	port = atoi(argv[1]);
		
	/* Create listen_fd, a function is encapsulated here */
	if ((listen_fd = socket_server_init(NULL, port)) < 0)
	{
		printf("socket_server_init failure\n");
		return -1;
	}
	printf("listen listen_fd[%d] on port[%d]\n", listen_fd, port);
	
	/* Set all elements in the array to -1, which means empty */
	for (i=0; i<ARRAY_SIZE(fds_array); i++)
	{
		fds_array[i] = -1;
	}
	fds_array[0] = listen_fd;

	while (1)
	{	
		/* Empty the rdset collection and recompute max_fd, since fds may have been closed */
		FD_ZERO(&rdset);
		max_fd = -1;
		
		/* Put the fd in the array into the collection */
		for (i=0; i<ARRAY_SIZE(fds_array); i++)
		{
			if (fds_array[i] < 0)
				continue;
			FD_SET(fds_array[i], &rdset);
			max_fd = fds_array[i]>max_fd ? fds_array[i] : max_fd;
		}
		
		/* Start select */
		if ((rv = select(max_fd+1, &rdset, NULL, NULL, NULL)) < 0)
		{
			printf("select() failure;%s\n", strerror(errno));
			goto cleanup; 
		}
		else if (rv == 0)
		{
			printf("select() timeout\n");
			continue;
		}
		
		/* At least one fd is ready */
		/* Check whether the event is on listen_fd */
		if (FD_ISSET(fds_array[0], &rdset))
		{	
			/*
			 * accept()
			 * Accept the connection request from the client
			 * Returns client_fd, used to communicate with that client
			 */
			if ((client_fd = accept(listen_fd, (struct sockaddr *)NULL, NULL)) < 0)
			{
				printf("accept new client failure: %s\n", strerror(errno));
				continue;
			}
			
			/*
			 * Store client_fd in a free slot of the array
			 * (a slot whose value is -1)
			 */
			found = 0;
			for (i=0; i<ARRAY_SIZE(fds_array); i++)
			{
				if (fds_array[i] < 0)
				{
					fds_array[i] = client_fd;
					found = 1;
					break;
				}
			}
			
			/*
			 * If no free slot is found, the array is full:
			 * refuse this new client and close client_fd
			 */
			if (!found)
			{
				printf("accept new client[%d], but full, so refuse\n", client_fd);
				close(client_fd);
				continue;
			}
			printf("accept new client[%d]\n", client_fd);


		}		/* end of server message */
		else	/* Messages from connected clients */
		{
			for (i=0; i<ARRAY_SIZE(fds_array); i++)
			{	
				/* Skip empty slots and fds that are not in rdset (no read event) */
				if (fds_array[i] < 0 || !FD_ISSET(fds_array[i], &rdset))
					continue;
				
				/* Clear buf, then read; keep one byte for the terminating '\0' so buf can be printed with %s */
				memset(buf, 0, sizeof(buf));
				if ((rv = read(fds_array[i], buf, sizeof(buf)-1)) <= 0)
				{
					printf("read data from client[%d] failure or get disconnected, so close it\n", fds_array[i]);
					close(fds_array[i]);
					fds_array[i] = -1;
					continue;
				}
				printf("read %d Bytes data from client[%d]: %s\n", rv, fds_array[i], buf);
				
				/* Convert lowercase letters to uppercase */
				for (int j=0; j<rv; j++)
				{
					if (buf[j] >= 'a' && buf[j] <= 'z')
						buf[j] = toupper(buf[j]);
				}
				
				/* Send data to client */
				if ((rv = write(fds_array[i], buf, rv)) < 0)
				{
					printf("write data to client[%d] failure: %s\n", fds_array[i], strerror(errno));
					close(fds_array[i]);
					fds_array[i] = -1;
					continue;
				}
				printf("write %d Bytes data to client[%d]: %s\n", rv, fds_array[i], buf);

			} /* end of for(i=0; i<ARRAY_SIZE(fds_array); i++) */

		} /* end of client message */

	} /* end of while(1) */

cleanup:
	close(listen_fd);

	return 0;

} /* end of main function */

/*
 * Socket Server Init Function
 * Create listen_fd, bind it to the ip and port, and listen
 */
int socket_server_init(char *servip, int port)
{
	int					listen_fd = -1;
	int					rv = 0;
	int					on = 1;
	struct sockaddr_in	servaddr;
	
	/*
	 * socket(),Create a new sockfd
	 * Specifies that the protocol family is IPv4
	 * socket Type is SOCK_STREAM(TCP)
	 */
	if ((listen_fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
	{
		printf("create listen_fd failure: %s\n", strerror(errno));
		return -1;
	}
	
	//Allow the address to be reused, which avoids the "address already in use" error when the server restarts
	setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
	
	/*
	 * bind(),Bind the protocol address of the server to listen_fd
	 */
	memset(&servaddr, 0, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_port = htons(port);
	/* If the ip address is empty */
	if (!servip)
	{
		/* Listen on all local ip addresses */
		servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	}
	else
	{
		/* Converts the dotted decimal ip address to a 32-bit integer and passes it into the structure */
		if (inet_pton(AF_INET, servip, &servaddr.sin_addr) <= 0)
		{
			printf("inet_pton() failure: %s\n", strerror(errno));
			rv = -2;
			goto cleanup;
		}

	}

	if (bind(listen_fd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0)
	{
		printf("bind listenfd[%d] on port[%d] failure: %s\n", listen_fd, port, strerror(errno));
		rv = -3;
		goto cleanup;
	}
	
	/*
	 * listen()
	 * Start listening on listen_fd and set the maximum number of queued connections
	 */
	if (listen(listen_fd, BACKLOG) < 0)
	{
		printf("listen listen_fd[%d] on port[%d] failure: %s\n", listen_fd, port, strerror(errno));
		rv = -4;
		goto cleanup;
	}

cleanup:
	if (rv < 0)
		close(listen_fd);
	else
		rv = listen_fd;
	
	return rv;
}

2. Program analysis

The flow of this code is as follows:

  • 1. The encapsulated function int socket_server_init(char *servip, int port) creates listen_fd with socket(), binds the IP and port with bind(), and finally calls listen() to listen on the port. If every step succeeds, listen_fd is returned; if a step fails, listen_fd is closed and a negative error code is returned
  • 2. All elements of fds_array, the array used to store fds, are set to -1, meaning the slot is empty and can store an fd. listen_fd is then stored in fds_array[0]
  • 3. Enter the while loop
  • 4. Empty the set, add every valid fd in the array to it, and work out the maximum fd, which select() needs for its nfds argument
  • 5. Call select()
    • Return value < 0: an error occurred, shut down the server
    • Return value = 0: timeout, end the current iteration and start the next one [this cannot happen here because the timeout in the code is NULL]
    • Return value > 0: events occurred and can be examined
  • 6. An event is either on the listening socket or on a client socket. FD_ISSET(fds_array[0], &rdset) checks whether listen_fd is readable
    • If listen_fd is readable, a new client is connecting to the server
      • At this point, call accept() to accept the new client
      • After accept(), check whether the array has a free slot [an element with value -1] to store the client_fd used to communicate with the new client
        • If there is a free slot, client_fd is stored in the array
        • If there is no free slot, client_fd is closed, which disconnects the client
    • If a client_fd is readable, a client has sent a message
      • Although select() has kept only the fds with actual events in the set, we cannot tell directly which ones they are. We have to traverse the array and check each fd against the set
      • The condition (fds_array[i] < 0 || !FD_ISSET(fds_array[i], &rdset)) filters the array: slots with value -1 and fds that are not in the set are skipped with continue, so only fds that are in the set are processed
      • Read the data. If the read fails or the connection has been closed, close the connection with the client and remove the fd from the array [set the corresponding element to -1]
      • Write the data. If the write fails, close the connection with the client and remove the fd from the array [set the corresponding element to -1]
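
Steps 2 and 4, together with the slot search done after accept(), could also be factored into two small helpers. The sketch below is an assumption about how that might look; slot_add() and build_rdset() are not part of the program above, which does the same work inline.

```c
#include <sys/select.h>

/* Hypothetical helper: store fd in the first free slot (value -1).
 * Returns the slot index, or -1 if the array is full. */
int slot_add(int *slots, int n, int fd)
{
	for (int i = 0; i < n; i++)
	{
		if (slots[i] < 0)
		{
			slots[i] = fd;
			return i;
		}
	}
	return -1;
}

/* Hypothetical helper: rebuild rdset from the slot array.
 * Returns the highest fd added, or -1 if the array holds no valid fd. */
int build_rdset(const int *slots, int n, fd_set *rdset)
{
	int max_fd = -1;

	FD_ZERO(rdset);
	for (int i = 0; i < n; i++)
	{
		/* Skip empty slots and fds that select() cannot handle */
		if (slots[i] < 0 || slots[i] >= FD_SETSIZE)
			continue;
		FD_SET(slots[i], rdset);
		if (slots[i] > max_fd)
			max_fd = slots[i];
	}
	return max_fd;
}
```

With helpers like these, each pass of the loop would call build_rdset() and hand its return value plus 1 to select() as nfds. Because the maximum is recomputed from scratch every time, it shrinks again after a high-numbered client fd has been closed.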

4, Operation effect

Summary

The above is my understanding of Linux select multiplexing. If anything is wrong, corrections are welcome.

Keywords: C Linux socket

Added by Eal on Mon, 10 Jan 2022 21:46:54 +0200