# select_server.py
# Server
from socket import *
import select

server = socket(AF_INET, SOCK_STREAM)
server.bind(('127.0.0.1', 8093))
server.listen(5)
# Set to non-blocking
server.setblocking(False)

# The listen list starts with the server socket; conn connection objects are added to it
# dynamically. The server socket fires on accept, the conn objects fire on recv.
rlist = [server, ]
rdata = {}   # store messages received from clients
wlist = []   # objects waiting to be written to
wdata = {}   # store the messages to be sent back to clients
print('Prepare! Listen!!!')

count = 0    # loop counter, only for the experiment
while True:
    # select blocks (here for at most 0.5 s) until a socket in rlist becomes readable,
    # e.g. the server socket receives a handshake from a client; the triggered sockets
    # are returned in rl.
    rl, wl, xl = select.select(rlist, wlist, [], 0.5)
    print('%s round>>' % count, wl)
    count += 1

    # Loop over rl: select fires when a client connects or when a connection has data
    for sock in rl:
        # If the triggered object is the server socket, a new client is connecting
        if sock == server:
            # Accept the client connection to get the connection object and client address
            conn, addr = sock.accept()
            # Add the new connection to the listen list; when this client sends a message,
            # select will fire again and return the connection in rl.
            rlist.append(conn)
        else:
            # Otherwise the trigger is a client connection that has sent a message
            try:
                data = sock.recv(1024)
                # No data means the client closed: close the connection and stop listening on it
                if not data:
                    sock.close()
                    rlist.remove(sock)
                    continue
                print("received {0} from client {1}".format(data.decode(), sock))
                # Save the received client message
                rdata[sock] = data.decode()
                # Build the reply for this connection and store it in wdata
                wdata[sock] = data.upper()
                # We need to reply to this client, so add the connection to the write listen list
                wlist.append(sock)
            # If the connection errors out (e.g. the client disconnected abruptly before or
            # during recv), close it and stop listening on it
            except Exception:
                sock.close()
                rlist.remove(sock)

    # Process the write list and send any pending replies
    for sock in wl:
        sock.send(wdata[sock])
        wlist.remove(sock)
        wdata.pop(sock)

    # # Print the messages received so far
    # for k, v in rdata.items():
    #     print(k, 'the message is:', v)
    # # Clear the received messages
    # rdata.clear()
# select_client.py
# Client
from socket import *

client = socket(AF_INET, SOCK_STREAM)
client.connect(('127.0.0.1', 8093))

while True:
    msg = input('>>: ').strip()
    if not msg:
        continue
    client.send(msg.encode('utf-8'))
    data = client.recv(1024)
    print(data.decode('utf-8'))

client.close()
# selector_server.py
# Server
from socket import *
import selectors

sel = selectors.DefaultSelector()

def accept(server_fileobj, mask):
    conn, addr = server_fileobj.accept()
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn, mask):
    try:
        data = conn.recv(1024)
        if not data:
            print('closing', conn)
            sel.unregister(conn)
            conn.close()
            return
        conn.send(data.upper() + b'_SB')
    except Exception:
        print('closing', conn)
        sel.unregister(conn)
        conn.close()

server_fileobj = socket(AF_INET, SOCK_STREAM)
server_fileobj.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
server_fileobj.bind(('127.0.0.1', 8088))
server_fileobj.listen(5)
server_fileobj.setblocking(False)  # set the socket interface to non-blocking
# Register server_fileobj for read events (like adding it to select's read list)
# and bind the accept callback to it
sel.register(server_fileobj, selectors.EVENT_READ, accept)

while True:
    events = sel.select()  # check all registered fileobjs to see whose data is ready
    for sel_obj, mask in events:
        callback = sel_obj.data            # callback = accept or read
        callback(sel_obj.fileobj, mask)    # e.g. accept(server_fileobj, 1)
# selector_client.py
from socket import *

c = socket(AF_INET, SOCK_STREAM)
c.connect(('127.0.0.1', 8088))

while True:
    msg = input('>>: ')
    if not msg:
        continue
    c.send(msg.encode('utf-8'))
    data = c.recv(1024)
    print(data.decode('utf-8'))
# epoll_demo.py
#!/usr/bin/env python
import select
import socket

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind(('0.0.0.0', 8080))
serversocket.listen(1)
# Sockets block by default, so switch to non-blocking mode
serversocket.setblocking(0)

# Create an epoll object
epoll = select.epoll()
# Register interest in read events on the server socket; a read event fires whenever
# the server socket has a new incoming connection to accept
epoll.register(serversocket.fileno(), select.EPOLLIN)

try:
    # connections maps file descriptors (integers) to their corresponding socket objects
    connections = {}
    requests = {}
    while True:
        # Query the epoll object for triggered events; the argument 1 means wait up to one second.
        # If any registered event occurred before the query, those events are returned immediately.
        events = epoll.poll(1)
        # Events are returned as (fileno, event mask) tuples; fileno is the file descriptor,
        # always an integer.
        for fileno, event in events:
            # An event on the server socket means a new connection has arrived
            if fileno == serversocket.fileno():
                connection, address = serversocket.accept()
                print('client connected:', address)
                # Set the new socket to non-blocking mode
                connection.setblocking(0)
                # Register interest in read (EPOLLIN) events on the new socket
                epoll.register(connection.fileno(), select.EPOLLIN)
                connections[connection.fileno()] = connection
                # Initialize the receive buffer for this connection
                requests[connection.fileno()] = b''
            # A read event means the client sent new data
            elif event & select.EPOLLIN:
                print("------recv data---------")
                data = connections[fileno].recv(1024)
                if not data:
                    # The client closed the connection: stop listening on it and clean up
                    epoll.unregister(fileno)
                    connections[fileno].close()
                    del connections[fileno]
                    del requests[fileno]
                    print(connections, requests)
                else:
                    requests[fileno] += data
                    # The request has been received: stop watching reads and watch for write
                    # (EPOLLOUT) events instead; the reply is sent when a write event fires
                    epoll.modify(fileno, select.EPOLLOUT)
                    # Print the complete request: although communication with the client is
                    # interleaved, the data can still be assembled and processed as a whole
                    print('-' * 40 + '\n' + requests[fileno].decode())
            # A write event means the client socket is ready to receive the reply
            elif event & select.EPOLLOUT:
                print("-------send data---------")
                # Send part of the reply each time until all of it has been handed to the
                # operating system for delivery to the client
                byteswritten = connections[fileno].send(requests[fileno])
                requests[fileno] = requests[fileno][byteswritten:]
                if len(requests[fileno]) == 0:
                    # The whole reply has been sent, so go back to watching read events
                    epoll.modify(fileno, select.EPOLLIN)
            # A HUP event means the client socket hung up (was shut down), so close our side too.
            # HUP events never need to be registered explicitly; epoll always reports them.
            elif event & select.EPOLLHUP:
                print("end hup------")
                # Stop listening on this connection and close it
                epoll.unregister(fileno)
                connections[fileno].close()
                del connections[fileno]
finally:
    # Python would close these at program exit anyway, but closing explicitly is good practice
    epoll.unregister(serversocket.fileno())
    epoll.close()
    serversocket.close()
-
Supplementary select
-
When the user process calls select, the whole process is blocked. At the same time the kernel "monitors" all the sockets that select is responsible for, and select returns as soon as the data in any of those sockets is ready. The user process then issues a read operation to copy the data from the kernel to the user process.
-
This flow is not really different from blocking IO; in fact it is worse, because it not only blocks but also needs two system calls (select and recvfrom), whereas blocking IO needs only one (recvfrom). When there is only one connection, this model is therefore less efficient than blocking IO. The advantage of select, however, is that it can handle many connections at the same time, which blocking IO cannot: regardless of whether a given connection would block, select monitors all of your connections, including their recv operations (in what form? don't worry about that yet; it is described below), and as soon as any of them changes (a new link, or data arriving) it tells the user process, which can then go and fetch the data. That is where its strength lies. This IO-multiplexing mechanism is provided by the operating system (select is available on Windows as well); if we want to drive this mechanism from our own code, we can use Python's select module to do the whole series of proxy work described above. Under Unix, where everything is a file, the objects or connections that can receive data are called file descriptors (fd).
-
Emphasize:
1. If the number of connections being handled is not very high, a web server using select/epoll will not necessarily perform better than one using multi-threading + blocking IO, and may even have higher latency. The advantage of select/epoll is not that it handles a single connection faster, but that it can handle more connections.
2. In the multiplexing model, each socket is usually set to non-blocking, but as noted above the whole user process is in fact blocked the whole time. It is only blocked by the select function call, though, not by socket IO (see the sketch after the module reference below).
The select module in Python:

import select
fd_r_list, fd_w_list, fd_e_list = select.select(rlist, wlist, xlist, [timeout])

Parameters: select accepts four parameters (the first three are required).
rlist: wait until ready for reading — the list of objects to monitor for incoming data.
wlist: wait until ready for writing — objects waiting to be written to; select cycles through them to see whether anything needs to be sent, and if so the message is taken from the object and sent out. It is usually not needed, so we normally pass [] here.
xlist: wait for an "exceptional condition" — objects to monitor for exceptional conditions; rarely needed, but the argument must be passed, so we normally give it [] as well.
timeout: when timeout = n (a positive number) and none of the monitored handles change, select blocks for n seconds and then returns three empty lists; if a monitored handle does change, execution continues immediately.

Return value: three lists corresponding to the first three parameter lists. select monitors file descriptors (and blocks while no descriptor satisfies its condition); when a descriptor's state changes it returns the three lists:
1. When an fd in the first list becomes "readable", that changed fd is added to fd_r_list.
2. When the second list contains fds, all of those fds are added to fd_w_list.
3. When an fd in the third list has an error, that fd is added to fd_e_list.
4. When timeout is omitted, select blocks until a monitored handle changes.
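To make the parameters and return values above concrete, and to illustrate point 2 of the emphasis (the socket is non-blocking, select is what blocks), here is a minimal sketch; the port 9001 and the 2-second timeout are arbitrary choices for the example.

# select_timeout_demo.py  (illustrative sketch; port and timeout chosen arbitrarily)
import select
from socket import socket, AF_INET, SOCK_STREAM

s = socket(AF_INET, SOCK_STREAM)
s.bind(('127.0.0.1', 9001))
s.listen(5)
s.setblocking(False)                 # the socket itself never blocks

try:
    s.accept()                       # no pending client, so the non-blocking socket raises at once
except BlockingIOError:
    print('accept() would block; the socket is non-blocking')

# It is select that blocks the process, here for at most 2 seconds
rl, wl, xl = select.select([s], [], [], 2)
print(rl, wl, xl)                    # three empty lists if no client connected within the timeout
s.close()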
-
Process analysis of how select monitors fd changes:
1. The user process creates socket objects and copies the monitored fds into kernel space; each fd corresponds to an entry in the system file table. When an fd in kernel space receives data, the kernel signals the user process that data has arrived.
2. The user process then issues another system call (e.g. accept or recv) to copy the data from kernel space into user space and clear it from the kernel-side receive buffer, so that the fd can signal again the next time it is monitored (because of the TCP protocol, the sender needs to receive an acknowledgement).
Advantages of this model:
1. Compared with other models, the event-driven model using select() runs in a single thread (process), consumes few resources, does not use much CPU, and can still serve multiple clients. This model has some reference value if you are trying to build a simple event-driven server program.
- Disadvantages of this model:
1. First, the select() interface is not the best option for implementing event-driven programs, because when the number of handles to probe is large, the select() interface itself spends a lot of time polling them. Many operating systems provide more efficient interfaces: Linux provides epoll, BSD provides kqueue, Solaris provides /dev/poll, and so on. If you need to implement a more efficient server program, an interface like epoll is recommended. Unfortunately, the epoll-style interfaces of different operating systems differ greatly, so it is difficult to use them to implement a server with good cross-platform behaviour.
2. Second, this model mixes event detection with event response; once the handling of an event response becomes time-consuming, it is catastrophic for the whole model.
3. What select does has nothing to do with the blocking of phase 2, which is copying data from kernel space to user space; select only does the monitoring work for you, saving you the blocking of phase 1.
- Mechanism of IO multiplexing:
1. select mechanism: Windows, Linux
2. poll mechanism: Linux — its listening mechanism is the same as select's, but there is no limit on the number of fds in the listening list (select's default limit is 1024). In both cases the operating system polls every monitored file descriptor to see whether it has anything readable, which is not very efficient when the numbers are large.
3. epoll mechanism: Linux — its listening mechanism differs from the above two. It binds a callback function to each monitored object; when a message arrives on that object, the callback fires and notifies the user process, which then makes a system call to copy the data, instead of the kernel polling every monitored object. This is far more efficient.
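The examples above show full select and epoll servers but no poll code, so here is a minimal sketch of the poll mechanism using Python's select.poll (Linux/Unix only). The port, buffer size and timeout are arbitrary and error handling is omitted; it illustrates the mechanism rather than being a production server.

# poll_server.py  (illustrative sketch of the poll mechanism; Linux/Unix only, port 8094 arbitrary)
import select
from socket import socket, AF_INET, SOCK_STREAM

server = socket(AF_INET, SOCK_STREAM)
server.bind(('127.0.0.1', 8094))
server.listen(5)
server.setblocking(False)

poller = select.poll()
poller.register(server, select.POLLIN)    # interested in "readable" events, like rlist in select
fd_to_socket = {server.fileno(): server}  # poll reports plain integer fds, so keep a lookup table

while True:
    # poll() blocks until a registered fd has an event, or the timeout (in ms) expires
    for fd, event in poller.poll(500):
        sock = fd_to_socket[fd]
        if sock is server:                        # new client connecting
            conn, addr = server.accept()
            conn.setblocking(False)
            fd_to_socket[conn.fileno()] = conn
            poller.register(conn, select.POLLIN)
        elif event & select.POLLIN:               # client sent data (or closed)
            data = sock.recv(1024)
            if data:
                sock.send(data.upper())           # echo back, like the select server above
            else:
                poller.unregister(fd)
                del fd_to_socket[fd]
                sock.close()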
Supplementary selector
-
IO multiplexing: to explain this term, first understand multiplexing, which basically means sharing. That is still a bit abstract, so consider how multiplexing is used in the communications field: to make full use of the physical medium of a network link, time-division or frequency-division multiplexing is often used to carry several signals over the same link. So multiplexing roughly means using one "medium" to do as much of the same kind of work as possible. What, then, is the "medium" of IO multiplexing? Look at the server programming model: each client request spawns a process to serve it, but processes cannot be spawned without limit. To cope with a large number of clients, IO multiplexing is introduced, i.e. a single process serves multiple client requests at the same time. In other words, the "medium" of IO multiplexing is a process (more precisely, select and poll are what is multiplexed, since the process works by calling select/poll): one process, via select/poll, serves multiple IOs. Although the client IOs arrive concurrently, the data each IO needs to read or write is usually not ready, so a function (select or poll) is used to monitor the state of the data each IO needs; as soon as an IO has data to read or write, the process goes and services that IO.
-
Having understood IO multiplexing, let's look at the differences and connections among the three APIs used to implement it: select, poll and epoll.
select, poll and epoll are all IO-multiplexing mechanisms. I/O multiplexing lets you monitor several descriptors at once; as soon as one of them is ready (typically readable or writable), the application can be told to perform the corresponding read or write. select, poll and epoll are nevertheless all synchronous I/O in essence, because the application itself must do the read or write once the event is ready, i.e. the read/write step still blocks; with asynchronous I/O the application does not do the read or write itself, the asynchronous I/O implementation takes care of copying the data from the kernel into user space. Their prototypes are as follows:
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
1. The first parameter of select, nfds, is the largest descriptor value in the fd_set plus 1. An fd_set is a bit array whose size is limited by __FD_SETSIZE (1024); each bit indicates whether the corresponding descriptor should be checked. The second, third and fourth parameters are the descriptor bit arrays for read, write and error events; they are input/output parameters that the kernel may modify to indicate which descriptors had events, so the fd_sets must be reinitialized before every call to select. The timeout parameter is the timeout; the kernel modifies it too, and on return it holds the time remaining.
- The calling steps for select are as follows:
(1) Copy fdset from user space to kernel space using copy_from_user
(2) Register the callback function __pollwait
(3) Traverse all fds and call each one's poll method (for a socket this poll method is sock_poll, which calls tcp_poll, udp_poll or datagram_poll as appropriate)
(4) Taking tcp_poll as an example, its core implementation is __pollwait, the callback function registered above.
(5) __pollwait's main job is to hang current (the current process) on the device's wait queue. Different devices have different wait queues; for tcp_poll the wait queue is sk->sk_sleep (note that hanging the process on the wait queue does not put it to sleep). When the device receives a message (network device) or finishes filling in file data (disk device), it wakes up the processes sleeping on its wait queue, and current is woken up.
(6) When the poll method returns, a mask describing whether the read and write operations are ready is returned, and fd_set is assigned a value based on this mask.
(7) If, after traversing all fds, no ready read/write mask has been returned, schedule_timeout is called to put the process that called select (i.e. current) to sleep. When a device driver finds that its own resource has become readable or writable, it wakes up the processes sleeping on its wait queue. If nothing wakes it within the timeout given to schedule_timeout, the process that called select is woken anyway, regains the CPU, and traverses the fds again to check whether any fd is now ready.
(8) Copy fd_set from kernel space to user space.
- Summarize the major shortcomings of select:
(1) Every call to select requires copying the fd set from user space into kernel space, which is expensive when there are many fds
(2) Likewise, every call to select requires the kernel to traverse all the fds passed in, which is also expensive when there are many fds
(3) The number of file descriptors supported by select is too small, defaulting to 1024
2. poll differs from select in that it passes the events of interest to the kernel through a pollfd array, so there is no limit on the number of descriptors. The events and revents fields of pollfd mark, respectively, the events of interest and the events that occurred, so the pollfd array only needs to be initialized once.
The implementation mechanism of poll is similar to select's; it corresponds to sys_poll in the kernel. The difference is that poll hands the kernel an array of pollfd and then polls each descriptor in that array, which is more efficient than handling an fd_set. After poll returns, the revents value of each element in pollfd has to be checked to see whether its event occurred.
3. It was not until Linux 2.6 that the kernel directly supported the implementation, epoll, which is recognized as the best multiplexed I/O readiness-notification method on Linux 2.6. epoll supports both level triggering and edge triggering (Edge Triggered: it only tells the process which file descriptors have just become ready, it says so only once, and if we take no action it will not tell us again). In theory edge triggering performs better, but the code is considerably more complex. epoll also reports only the ready file descriptors, and when we call epoll_wait() to get them, what is returned is not the actual descriptors but a value representing the number of ready descriptors; you then fetch that many descriptors in turn from an array specified by epoll. Memory mapping (mmap) is also used here. In select/poll the kernel scans all monitored file descriptors only after the method is called, whereas epoll registers a file descriptor in advance through epoll_ctl(); once a descriptor becomes ready, the kernel uses a callback-like mechanism to activate it quickly, and the process is notified when it calls epoll_wait().
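As a small, hedged illustration of the edge-triggering behaviour just described, Python exposes it through the select.EPOLLET flag. The sketch below uses a local socketpair so it is self-contained (Linux only; the names are made up for the example):

# edge_trigger_demo.py  (illustrative sketch of edge triggering; Linux only)
import select
import socket

a, b = socket.socketpair()       # two connected local sockets, just for the demo
b.setblocking(False)

epoll = select.epoll()
epoll.register(b.fileno(), select.EPOLLIN | select.EPOLLET)   # edge-triggered read events

a.send(b'hello')
print(epoll.poll(0.1))           # reported once, when b *becomes* readable
print(epoll.poll(0.1))           # [] -- not reported again, although 'hello' is still unread in b
b.recv(1024)                     # with EPOLLET the handler must drain the buffer itself
epoll.unregister(b.fileno())
epoll.close()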
- Since epoll is an improvement on select and poll, it ought to avoid these three shortcomings. How does epoll solve them?
- Before answering that, let's look at how the calling interfaces of epoll differ from those of select and poll. select and poll each provide only a single function, the select or poll function. epoll provides three functions: epoll_create, epoll_ctl and epoll_wait. epoll_create creates an epoll handle; epoll_ctl registers the event types to listen for; epoll_wait waits for events to occur.
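In Python these three calls correspond roughly to the methods of a select.epoll object; a tiny sketch for orientation (the port is arbitrary, and since no client ever connects, the event list comes back empty):

# epoll_api_sketch.py  (how the three epoll functions look from Python)
import select
from socket import socket, AF_INET, SOCK_STREAM

server = socket(AF_INET, SOCK_STREAM)
server.bind(('127.0.0.1', 8095))
server.listen(5)

epoll = select.epoll()                              # epoll_create: create the epoll handle
epoll.register(server.fileno(), select.EPOLLIN)     # epoll_ctl(EPOLL_CTL_ADD): register an event
print(epoll.poll(1))                                # epoll_wait: wait up to 1 s; [] here
epoll.unregister(server.fileno())                   # epoll_ctl(EPOLL_CTL_DEL)
epoll.close()
server.close()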
-
For the first shortcoming, epoll's solution lies in the epoll_ctl function. Each time a new event is registered in the epoll handle (EPOLL_CTL_ADD in epoll_ctl), the fd is copied into the kernel at that point, rather than the whole set being copied again on every epoll_wait. epoll thus guarantees that each fd is copied only once over its whole lifetime.
-
For the second shortcoming, epoll does not, like select or poll, add current to every fd's device wait queue on every call. It hangs current there only once, during epoll_ctl (this one time is unavoidable), and specifies a callback function for each fd. When the device becomes ready and wakes the waiters on its wait queue, this callback is invoked, and the callback adds the ready fd to a ready list. The job of epoll_wait is then simply to check whether there is any ready fd in that ready list (it uses schedule_timeout() for a short sleep and then checks, much like step 7 of the select implementation).
-
For the third shortcoming, epoll has no such limit. The fd cap it supports is the maximum number of files the system can open, which is generally far larger than 2048; for example it is about 100,000 on a machine with 1 GB of memory, and can be viewed with cat /proc/sys/fs/file-max. In general this number depends heavily on how much memory the system has.
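Both numbers can be checked from Python; a small sketch (the /proc path is Linux-specific, and the resource module is Unix-only):

# fd_limits.py  (sketch; Linux/Unix only)
import resource

# System-wide cap on open files, the figure referred to above
with open('/proc/sys/fs/file-max') as f:
    print('system-wide file-max:', f.read().strip())

# Per-process limit on open file descriptors for the current process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('per-process soft/hard limit:', soft, hard)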
Summary:
(1) The select and poll implementations have to keep polling the whole fd set until the device is ready, possibly alternating between sleeping and waking several times. epoll likewise calls epoll_wait and polls the ready list, and may also alternate between sleeping and waking several times, but when a device becomes ready it invokes the callback, which puts the ready fd on the ready list and wakes the process sleeping in epoll_wait. So both have to sleep and wake alternately, but while awake select and poll traverse the entire fd set, whereas epoll only has to check whether the ready list is empty. That saves a great deal of CPU time, and is the performance gain that the callback mechanism brings.
(2) select and poll copy the fd set from user space to kernel space once on every call, and hang current on every device wait queue once on every call; epoll copies each fd only once and hangs current on the wait queue only once (at the start of epoll_wait; note that this wait queue is not a device wait queue but a queue defined internally by epoll). This also saves a lot of overhead.
(3) These three IO-multiplexing models are supported differently on different platforms; in particular epoll is not available on Windows. Fortunately we have the selectors module, which by default chooses the best mechanism available on the current platform. We only need to say which objects to listen on and how to send and receive messages; how the listening is done, and whether select, poll or epoll is used, is chosen for us automatically by selectors.
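You can see which backend selectors picked on your platform with a two-line check:

# which backend did DefaultSelector choose?
import selectors

sel = selectors.DefaultSelector()
print(type(sel).__name__)   # e.g. 'EpollSelector' on Linux, 'SelectSelector' on Windows,
                            # 'KqueueSelector' on BSD/macOS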