BIO and NIO summary

BIO and NIO

Note: the IO models discussed in this article all concern network communication over sockets.

BIO -- blocking IO model

In network communication, the client first establishes a connection with the server. Because the server cannot know when the client will send data, it has to dedicate a thread to each connection and wait for messages on it, and that thread blocks for as long as no data arrives. In Java's traditional BIO model, once a connection is established the server keeps waiting on the socket for data. The code and comments below illustrate this.

package BIO;

import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * @Description:
 * @ClassName: BIOServer
 * @Author: yokna
 * @Date: 2021/7/22 10:47
 * @Version: 1.0
 */
public class BIOServer {
    public static void main(String[] args) throws IOException {
        //Cached thread pool: it creates a new thread whenever no idle one is available, so the thread count is unbounded
        ExecutorService executorService = Executors.newCachedThreadPool();

        ServerSocket serverSocket = new ServerSocket(6666);
        System.out.println("Service startup");
        while (true){
            //accept() blocks here until a client connects
            Socket accept = serverSocket.accept();
            System.out.println("Connect to a client");

            executorService.execute(new Runnable() {
                @Override
                public void run() {
                    //The handler blocks a second time, on read(). See the comments in handler() for details
                    handler(accept);
                }
            });
        }
    }


    public static void handler(Socket socket)  {

        try {
            System.out.println(Thread.currentThread().getId() + Thread.currentThread().getName());
            byte[] bytes = new byte[1024];
            //Gets the input stream to the socket
            InputStream inputStream = socket.getInputStream();

            while (true){
                //read() is the root cause of blocking. The server cannot predict when the client will send data, so the call waits until bytes arrive, the client closes the connection (read returns -1), or an error occurs
                int read = inputStream.read(bytes);
                if (read != -1){
                    System.out.println(new String(bytes,0,read));
                }else {
                    break;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }finally {
            try {
                //Close the socket so the connection's resources are released (shutdownOutput() alone would leave it open)
                socket.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

In practice, every incoming connection gets its own thread. A connection spends most of its time idle rather than transmitting data, yet the server still pays for the thread that serves it. With many clients, many threads are needed, which leads to heavy context switching between threads. Moreover, many connections send nothing for long periods after they are established, so dedicating a thread that loops on read for each of them is wasteful.
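The cost of an idle connection can be made visible with a read timeout. The following is a minimal sketch, not part of the original server: the class name IdleConnectionDemo and the method readBlocksOnIdleClient are hypothetical, and setSoTimeout is used only so that the demo terminates. A client connects over loopback and never sends anything, and the server-side read stays blocked until the timeout fires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of the original article's code
class IdleConnectionDemo {

    // Returns true if read() was still blocked when the timeout expired
    static boolean readBlocksOnIdleClient(int timeoutMillis) throws IOException {
        // Port 0 lets the operating system pick a free port
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Without a timeout the read below would wait indefinitely
            accepted.setSoTimeout(timeoutMillis);
            InputStream in = accepted.getInputStream();
            try {
                in.read(new byte[1024]);
                return false; // data or end of stream arrived, which this demo does not expect
            } catch (SocketTimeoutException e) {
                return true; // the thread spent the whole timeout blocked in read()
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read blocked on idle client: " + readBlocksOnIdleClient(200));
    }
}
```

In the BIO server each such blocked read occupies one pooled thread, which is why the thread count grows with the number of idle clients.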

NIO -- non-blocking IO model

Before looking at NIO, you need to understand mmap and zero copy.

mmap, which Java exposes through MappedByteBuffer, maps memory outside the Java heap directly into the address space of the Java virtual machine process.

When the traditional IO model reads a file with read and then sends it with write, the data travels from disk to a kernel buffer, from the kernel buffer to a user-space buffer, from there back into the kernel's socket buffer, and finally to the network card.

This round trip involves four context switches between user mode and kernel mode and two CPU copies across the user/kernel boundary (in addition to the two DMA transfers). This overhead hurts badly in high-concurrency scenarios. Often we only need to send a file to another user through a socket without processing it, in which case the context switches and data copies are pure overhead. Is there a way to send the file directly? There is, and it leads us to zero copy.

zerocopy

In the zero-copy model (sendfile on Linux, typically reached from Java through FileChannel.transferTo), the kernel sends the data directly without copying it into user space and back. A single system call is enough, and the main remaining cost is reading the data into kernel memory through DMA. However, this only works when the Java program does not need to process the data. If the program must process the data before sending it to the user, zero copy is not suitable. Fortunately, there is another mechanism, mmap, which keeps much of the performance benefit while still letting the program operate on the data.
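In Java, this kind of transfer is requested with FileChannel.transferTo. The sketch below is only an illustration under assumptions: the class name TransferToDemo and the method transfer are hypothetical, and it copies between two temporary files so that it runs without a network peer. The same call accepts a SocketChannel as its target. Whether the JVM actually uses sendfile depends on the operating system and the channel types involved.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the original article's code
class TransferToDemo {

    // Moves the whole source file to the target channel without reading it into a Java buffer
    static long transfer(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("zerocopy-src", ".txt");
        Path target = Files.createTempFile("zerocopy-dst", ".txt");
        Files.write(source, "hello zero copy".getBytes(StandardCharsets.UTF_8));
        System.out.println("bytes transferred: " + transfer(source, target));
        Files.delete(source);
        Files.delete(target);
    }
}
```

Note that the program never sees the bytes themselves, which is exactly the limitation described above.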

mmap and sendfile

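mmap maps file data held in kernel memory directly into the process's address space, outside the Java heap, instead of copying it. This saves the copy between the kernel buffer and a user-space buffer, but the context switches of the system calls that send the data are unavoidable.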

ByteBuffer

ByteBuffer comes in three kinds, which live either in heap space or outside it:

HeapByteBuffer

Created by ByteBuffer.allocate(). The buffer lives in heap space, so it is managed by the GC (it can be garbage collected) and is cheap to allocate. However, the GC may move it in memory, which means that when it is handed to native IO code, the JVM first copies its contents into a buffer outside the heap.

DirectByteBuffer

Created by ByteBuffer.allocateDirect(). The JVM allocates the memory outside the heap space (with malloc()). The advantage is that the memory has a fixed address that native code can use directly; the disadvantage is that it is not part of the garbage-collected heap, which means you need to watch out for off-heap memory leaks.

MappedByteBuffer

Created by FileChannel.map(), which also allocates address space outside the heap. In essence it is a wrapper around the mmap() system call, so that Java code can directly manipulate the mapped file data.

import java.io.File;
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

/**
 * @Description:
 * @ClassName: RandomIO
 * @Author: yokna
 * @Date: 2021/7/21 9:39
 * @Version: 1.0
 */
public class RandomIO {
    //RandomAccessFile supports both reading and writing at arbitrary positions: seek() moves the file pointer, and the next read or write starts at that position
    public static void main(String[] args) throws Exception {
        File file = new File("src/main/resources/test");
        RandomAccessFile raf = new RandomAccessFile(file,"rw");

        //Get the channel
        FileChannel c1 = raf.getChannel();

//        There are three ways to create a ByteBuffer

        //1. FileChannel.map() wraps the mmap system call: the file's pages are mapped into off-heap address space shared with the kernel, which avoids one copy between an application buffer and the kernel buffer
        MappedByteBuffer map = c1.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
//        ByteBuffer buffer = ByteBuffer.allocate(1024);// 2. Allocated on the heap
//        ByteBuffer buffer1 = ByteBuffer.allocateDirect(1024);// 3. Allocated in off-heap space
//        c1.write(buffer);
//        Difference: to write a heap or direct buffer to disk, the write system call has to copy the data from user-space memory to kernel-space memory
//        Data put into the mapped buffer lands directly in the mapped pages, which the kernel writes back to the file

        map.put("hello world \n hello nnn \n good idea".getBytes(StandardCharsets.UTF_8));
        map.put("hh".getBytes(StandardCharsets.UTF_8));

        //seek() moves the file pointer to the given position (byte offset 2 here)
        raf.seek(2);
        //write() overwrites the existing bytes one after another, starting at the file pointer
        raf.write("123455".getBytes(StandardCharsets.UTF_8));
        //Closing the file also closes its channel
        raf.close();
    }


}
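The three kinds of buffer described above can be told apart at runtime. The following is a small sketch with hypothetical names (BufferKinds, describe, mapTempFile) that uses only the public ByteBuffer methods isDirect() and hasArray(). It maps a temporary file instead of the src/main/resources/test file used above so that it runs anywhere.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the original article's code
class BufferKinds {

    // isDirect() is true for off-heap buffers; hasArray() is true when an accessible on-heap byte[] backs the buffer
    static String describe(ByteBuffer buffer) {
        return "direct=" + buffer.isDirect() + " hasArray=" + buffer.hasArray();
    }

    // Maps the first `size` bytes of a fresh temporary file in read-write mode
    static MappedByteBuffer mapTempFile(int size) throws IOException {
        Path file = Files.createTempFile("mapped", ".bin");
        file.toFile().deleteOnExit();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("heap:   " + describe(ByteBuffer.allocate(1024)));
        System.out.println("direct: " + describe(ByteBuffer.allocateDirect(1024)));
        System.out.println("mapped: " + describe(mapTempFile(1024)));
    }
}
```

A mapped buffer reports itself as direct as well, since its memory also lives outside the heap.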

NIO model

With mmap and Java's ByteBuffer covered, let's look at NIO.

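To solve BIO's problem of needing one thread per connection, the NIO model was introduced. In non-blocking mode, a read call returns immediately when no data has arrived. At the system-call level, read returns -1 and errno is set to EAGAIN. In Java, SocketChannel.read returns 0 in that case, and a return value of -1 means the peer has closed the connection.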
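The immediate return can be observed from Java with a loopback connection. This is a minimal sketch with hypothetical names (NonBlockingReadDemo, readWithoutData): the client connects but sends nothing, and the accepted channel is switched to non-blocking mode before reading. At the Java level the "no data yet" case shows up as a return value of 0.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not part of the original article's code
class NonBlockingReadDemo {

    // Returns what read() reports on a non-blocking channel when the client has sent nothing
    static int readWithoutData() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the operating system pick a free port
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                // No bytes are available, so the call returns at once instead of waiting
                return accepted.read(ByteBuffer.allocate(1024));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read returned: " + readWithoutData());
    }
}
```

Calling read in a loop like this would burn CPU, which is why non-blocking channels are normally combined with a selector.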

NIO is generally used together with IO multiplexing. With IO multiplexing, the program registers a set of socket file descriptors with the operating system, which amounts to saying: "monitor these fds for IO events, and tell the program when one occurs so that it can handle it".

1. The Linux call epoll_create creates a data table in the kernel and returns an epoll file descriptor that points to this table. This is equivalent to creating a selector.
2. The Linux call epoll_ctl registers the events to listen for. It is the system call underlying the registration of a channel with a selector in Java.
3. The Linux call epoll_wait waits for events to occur. It corresponds to the selector.select() method.

package JavaSocket;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Set;

/**
 * @Description: Synchronous non blocking
 * @ClassName: NIO
 * @Author: yokna
 * @Date: 2021/7/21 11:02
 * @Version: 1.0
 */
public class NIO {

    public void  nio() throws IOException {
        //Equivalent to serversocket
        ServerSocketChannel ss = ServerSocketChannel.open();

        //Enable nio, non blocking model
        ss.configureBlocking(false);
        ss.bind(new InetSocketAddress(8089));

        //On Linux this corresponds to the epoll_create system call, which returns the epoll file descriptor (ep_fd) used to register socket file descriptors
        Selector selector = Selector.open();

        //Corresponds to epoll_ctl(ep_fd, fd), which places the file descriptor in a red-black tree
        ss.register(selector, SelectionKey.OP_ACCEPT);

        while (true){
            //Corresponds to the epoll_wait system call: when the kernel finds an event on a socket, it puts that file descriptor from the red-black tree into a linked list of ready descriptors
            int num = selector.select();
            if (num == 0) continue;

            //Get the keys whose channels are ready. selector.keys() would return every registered key, and that set is unmodifiable: any attempt to change it throws an exception
            Set<SelectionKey> keys = selector.selectedKeys();
            Iterator<SelectionKey> iter = keys.iterator();

            while (iter.hasNext()){
                SelectionKey key = iter.next();
                //Remove each key as it is taken, otherwise it stays in the selected set and is processed again after the next select()
                iter.remove();

                if (key.isAcceptable()){
                    ServerSocketChannel cc = (ServerSocketChannel) key.channel();
                    SocketChannel sc = cc.accept();
                    if (sc == null) continue;//No connection is pending after all
                    sc.configureBlocking(false);
                    sc.register(selector,SelectionKey.OP_READ);
                }else if (key.isReadable()){
                    SocketChannel c1 = (SocketChannel) key.channel();
                    ByteBuffer bf = ByteBuffer.allocate(1024);
                    int n = c1.read(bf);
                    if (n == -1){
                        //The client closed the connection. Closing the channel also cancels its key
                        c1.close();
                        continue;
                    }
                    //Switch the buffer from filling to draining: limit = bytes read, position = 0
                    bf.flip();
                    byte[] inData = new byte[bf.remaining()];
                    bf.get(inData);
                    System.out.println(new String(inData));
                }
            }
        }

    }
}
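The read branch of a selector loop depends on how a ByteBuffer tracks its position and limit: after the buffer has been filled, it has to be flipped before its contents can be read back. The sketch below uses hypothetical names (FlipDemo, fillThenDrain), and put() stands in for channel.read() so that the example needs no network connection.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article's code
class FlipDemo {

    // Fills a buffer the way channel.read() would, then drains exactly the bytes that were written
    static String fillThenDrain(String text) {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        // After filling: position = number of bytes written, limit = capacity (1024)
        buffer.put(text.getBytes(StandardCharsets.UTF_8));
        // flip() sets limit = old position and position = 0, so only the written bytes are readable
        buffer.flip();
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(fillThenDrain("hello nio"));
    }
}
```

Without the flip, a bulk get for limit() bytes would ask for more bytes than remain between position and limit and fail with a BufferUnderflowException.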

summary

BIO is a blocking network IO model. The blocking shows up after the connection is established: the server cannot know when the client will send data, so a thread has to wait on each connection. In addition, sending data once with traditional IO requires four context switches and two CPU copies, which is costly.

Zero copy is an optimization for sending data over the network. If you do not need to modify the data, the kernel can send it directly to the other party, eliminating two copies (from kernel memory into the Java program, and from the Java program back into kernel memory).

mmap is a memory-mapping technique that maps file data in kernel memory directly into the process's off-heap address space. It addresses the case where the data must be modified, while still avoiding repeated copying. Its costs are that the JVM cannot manage the off-heap address space and that the context switches of system calls remain. In Java it is exposed as MappedByteBuffer.

NIO is generally used in combination with IO multiplexing. Its three core components are ByteBuffer, Channel and Selector. ByteBuffer exists to reduce the number of copies and context switches. A channel is the connection between the server and a client, and the selector is the mechanism the server uses to manage its channels. When data arrives on a channel, the selector reports it, so the server can handle network IO requests without dedicating a thread to each connection. The whole process is event driven: the server repeatedly calls select to find the ready events and then takes the channel of each event from the selector. If the server needs to send data to the client, it does so through a ByteBuffer. Together, these pieces improve both the speed of Java network communication and the number of connections a Java program can serve concurrently.


Keywords: Java Linux Netty socket Multithreading

Added by kumaran on Thu, 23 Dec 2021 11:51:58 +0200