Netty's strategy for solving the Selector empty-polling bug
The Selector empty-polling bug
If a Selector's select call returns with no ready keys even though it was not woken up, did not time out, and has no new events to process, an empty poll has occurred. When this happens repeatedly, the event loop spins and CPU utilization reaches 100%.
Note: pinning the CPU at 100% makes this a very serious bug.
This notorious epoll bug is a bug in JDK NIO. It was officially reported as fixed in JDK 1.6 update 18, but it still occurs in JDK 1.7 and JDK 1.8: the fix reduced the probability of the bug without eliminating it. For the bug report and the list of related issues, see the following links:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=2147719
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6403933
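To make the definition of an empty poll concrete, here is a minimal sketch that classifies why a timed select call returned. The class name SelectReturnClassifier, the Outcome enum, and the classify method are hypothetical names used only for illustration; they are not part of the JDK or Netty.

```java
import java.util.concurrent.TimeUnit;

public class SelectReturnClassifier {

    /** Possible reasons for a timed select(timeoutMillis) call to return. */
    enum Outcome { IO_READY, WOKEN_UP, TIMED_OUT, EMPTY_POLL }

    /**
     * Classifies one return of select(timeoutMillis).
     *
     * @param selectedKeys  value returned by select
     * @param wokenUp       whether wakeup() was called on the selector
     * @param elapsedNanos  how long the select call actually blocked
     * @param timeoutMillis the timeout that was passed to select
     */
    static Outcome classify(int selectedKeys, boolean wokenUp,
                            long elapsedNanos, long timeoutMillis) {
        if (selectedKeys != 0) {
            return Outcome.IO_READY;   // there are IO events to process
        }
        if (wokenUp) {
            return Outcome.WOKEN_UP;   // someone asked the selector to return
        }
        if (elapsedNanos >= TimeUnit.MILLISECONDS.toNanos(timeoutMillis)) {
            return Outcome.TIMED_OUT;  // the call blocked for the full timeout
        }
        // No events, no wakeup, no timeout: the call returned early for no reason.
        return Outcome.EMPTY_POLL;
    }

    public static void main(String[] args) {
        // select(1000) came back after 20 microseconds with nothing to do.
        System.out.println(classify(0, false, 20_000L, 1000L)); // prints EMPTY_POLL
    }
}
```

A single EMPTY_POLL is harmless. The bug shows itself when such returns happen back to back, so that the loop around select never blocks.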
Netty's solution to empty polling
Overview of Netty's solution:
- 1. Time each select operation of the Selector and count every select that completes empty. If N empty polls occur consecutively, Netty concludes that the epoll spin bug has been triggered.
- 2. Rebuild the Selector. Netty first checks whether the rebuild request comes from another thread; if so, the rebuild is submitted as a task to the event loop, otherwise it runs directly. The rebuild cancels each SocketChannel's registration on the old Selector, re-registers the channel on the new Selector, and closes the old Selector.
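The counting idea in point 1 can be sketched as a tiny state machine. This is only an illustration under assumptions: EmptyPollCounter and onEmptyReturn are hypothetical names, and the threshold of 4 in main is chosen to keep the demo short (Netty's default is much larger).

```java
import java.util.concurrent.TimeUnit;

public class EmptyPollCounter {
    private final int rebuildThreshold; // 0 disables the rebuild mechanism
    private int selectCnt;              // consecutive select calls so far

    EmptyPollCounter(int rebuildThreshold) {
        this.rebuildThreshold = rebuildThreshold;
    }

    /**
     * Records one select(timeoutMillis) call that returned zero ready keys.
     *
     * @return true if the selector should now be rebuilt
     */
    boolean onEmptyReturn(long elapsedNanos, long timeoutMillis) {
        selectCnt++;
        if (elapsedNanos >= TimeUnit.MILLISECONDS.toNanos(timeoutMillis)) {
            // The call blocked for the full timeout: a normal return.
            selectCnt = 1;
            return false;
        }
        if (rebuildThreshold > 0 && selectCnt >= rebuildThreshold) {
            // Too many premature returns in a row: ask for a rebuild.
            selectCnt = 1;
            return true;
        }
        return false;
    }

    int count() {
        return selectCnt;
    }

    public static void main(String[] args) {
        EmptyPollCounter counter = new EmptyPollCounter(4);
        for (int i = 1; i <= 4; i++) {
            // Each call pretends select(1000) returned after only 10 microseconds.
            boolean rebuild = counter.onEmptyReturn(10_000L, 1000L);
            System.out.println("empty return " + i + " -> rebuild=" + rebuild);
        }
    }
}
```

In this sketch a genuine timeout resets the count, so only an uninterrupted run of premature returns can reach the threshold.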
4 steps for Netty to resolve empty polling
Netty resolves empty polling in four steps, as follows:
Part I: timed blocking select(timeoutMillis)
- First record the current time, currentTimeNanos.
- Then calculate timeoutMillis, the time remaining until the nearest scheduled task is due, which is the longest the select is allowed to block.
- Call the timed blocking select(timeoutMillis).
- Increment selectCnt after every select call.
Part II: valid IO event processing logic
- If select returned ready keys, the selector was woken up, or there are tasks to run, leave the loop and process them.
Part III: timeout processing logic
- If the select timed out, selectCnt is reset to 1.
Part IV: solve the empty polling bug
- Once selectCnt reaches SELECTOR_AUTO_REBUILD_THRESHOLD, the selector must be rebuilt to solve the problem.
- This threshold is 512 by default.
- Rebuild the selector and re-register the channels.
Core code of Netty's four-step solution
```java
// Excerpt from the loop body of NioEventLoop.select();
// currentTimeNanos and timeoutMillis are computed earlier in the method.

// Call select; it blocks at most until the nearest scheduled task is due, as calculated above
int selectedKeys = selector.select(timeoutMillis);
// Increment the counter
++selectCnt;

if (selectedKeys != 0 || oldWakenUp || this.wakenUp.get() || this.hasTasks() || this.hasScheduledTasks()) {
    // Entering this branch indicates a normal scenario:
    // selectedKeys != 0: there are IO events
    // oldWakenUp: the selector had already been woken up before this call
    // wakenUp.get(): the selector has been woken up
    // hasTasks() || hasScheduledTasks(): there are tasks or scheduled tasks to run
    // In any of these cases, leave the loop
    break;
}

// Take the time after select returns, so the elapsed time can be measured
long time = System.nanoTime();

// current time - loop start time >= timeoutMillis means that a full
// blocking select() has been executed, i.e. the select was a valid one
if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
    // Entering this branch indicates a timeout, which is a normal scenario:
    // a blocking poll took place and timed out
    selectCnt = 1;
} else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 && selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
    // Entering this branch means there was no timeout and selectedKeys == 0,
    // which is the abnormal scenario.
    // The select bug repair mechanism is enabled, i.e. the configured
    // io.netty.selectorAutoRebuildThreshold is at least 3, and the number of
    // early returns of select has reached the configured threshold,
    // so the selector is rebuilt.
    // After the rebuild, call the non-blocking selectNow once and leave the loop
    selector = this.selectRebuildSelector(selectCnt);
    selectCnt = 1;
    break;
}

currentTimeNanos = time;
```
Netty's detection and processing logic for the early return of Selector.select is mainly in the NioEventLoop.select method. The complete code is as follows:
```java
public final class NioEventLoop extends SingleThreadEventLoop {

    private void select(boolean oldWakenUp) throws IOException {
        Selector selector = this.selector;

        try {
            // The counter starts at 0
            int selectCnt = 0;
            long currentTimeNanos = System.nanoTime();
            // Derive the deadline of this selection from the registered scheduled tasks
            long selectDeadLineNanos = currentTimeNanos + this.delayNanos(currentTimeNanos);

            while (true) {
                // Each iteration recalculates how long select may block (rounded to the nearest millisecond)
                long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
                // A blocking time of 0 means a scheduled task is about to become due.
                // If this is the first iteration (selectCnt == 0), call selector.selectNow() once,
                // then leave the loop and return.
                // selectNow is called mainly to pick up network events that are already ready
                if (timeoutMillis <= 0L) {
                    if (selectCnt == 0) {
                        selector.selectNow();
                        selectCnt = 1;
                    }
                    break;
                }

                // No scheduled task is due, but tasks have been submitted (not only scheduled ones).
                // If wakenUp is successfully set to true, call selectNow and return
                if (this.hasTasks() && this.wakenUp.compareAndSet(false, true)) {
                    selector.selectNow();
                    selectCnt = 1;
                    break;
                }

                // Call select; it blocks at most until the nearest scheduled task is due, as calculated above
                int selectedKeys = selector.select(timeoutMillis);
                // Increment the counter
                ++selectCnt;
                if (selectedKeys != 0 || oldWakenUp || this.wakenUp.get() || this.hasTasks() || this.hasScheduledTasks()) {
                    // Entering this branch indicates a normal scenario:
                    // selectedKeys != 0: there are IO events
                    // oldWakenUp: the selector had already been woken up before this call
                    // wakenUp.get(): the selector has been woken up
                    // hasTasks() || hasScheduledTasks(): there are tasks or scheduled tasks to run
                    // In any of these cases, leave the loop
                    break;
                }

                // If the thread was interrupted, reset the counter to 1 and return
                if (Thread.interrupted()) {
                    if (logger.isDebugEnabled()) {
                        logger.debug("Selector.select() returned prematurely because Thread.currentThread().interrupt() was called. Use NioEventLoop.shutdownGracefully() to shutdown the NioEventLoop.");
                    }
                    selectCnt = 1;
                    break;
                }

                // Check whether select returned because the calculated timeout expired.
                // That is also a normal return: reset the counter to 1 and start the next iteration
                long time = System.nanoTime();
                if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
                    // Entering this branch indicates a timeout, which is a normal scenario:
                    // a blocking poll took place and timed out
                    selectCnt = 1;
                } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 && selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
                    // Entering this branch means there was no timeout and selectedKeys == 0,
                    // which is the abnormal scenario.
                    // The select bug repair mechanism is enabled, i.e. the configured
                    // io.netty.selectorAutoRebuildThreshold is at least 3, and the number of
                    // early returns of select has reached the configured threshold,
                    // so the selector is rebuilt.
                    // After the rebuild, call the non-blocking selectNow once and leave the loop
                    selector = this.selectRebuildSelector(selectCnt);
                    selectCnt = 1;
                    break;
                }

                currentTimeNanos = time;
            }

            // Reached when the loop exits with several early returns that did not trigger a rebuild,
            // for example when the select bug repair mechanism is turned off.
            // Simply log them to make troubleshooting easier
            if (selectCnt > 3 && logger.isDebugEnabled()) {
                logger.debug("Selector.select() returned prematurely {} times in a row for Selector {}.", selectCnt - 1, selector);
            }
        } catch (CancelledKeyException var13) {
            if (logger.isDebugEnabled()) {
                logger.debug(CancelledKeyException.class.getSimpleName() + " raised by a Selector {} - JDK bug?", selector, var13);
            }
        }
    }

    private Selector selectRebuildSelector(int selectCnt) throws IOException {
        logger.warn("Selector.select() returned prematurely {} times in a row; rebuilding Selector {}.", selectCnt, this.selector);
        // Rebuild the selector
        this.rebuildSelector();
        Selector selector = this.selector;
        // After the rebuild, call the non-blocking selectNow once and return
        selector.selectNow();
        return selector;
    }
}
```
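The timeoutMillis expression in the loop above adds 500000 nanoseconds (half a millisecond) before dividing, which rounds the remaining time to the nearest millisecond instead of truncating it. The following sketch isolates that arithmetic; the class and method names are hypothetical, and only the formula itself is taken from the code above.

```java
public class SelectTimeoutRounding {

    /** Same arithmetic as the timeoutMillis line in NioEventLoop.select(). */
    static long timeoutMillis(long selectDeadLineNanos, long currentTimeNanos) {
        return (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
    }

    public static void main(String[] args) {
        long now = 0L; // treat "now" as time zero so the deadline equals the remaining nanoseconds

        System.out.println(timeoutMillis(400_000L, now));   // 0.4 ms left  -> 0
        System.out.println(timeoutMillis(500_000L, now));   // 0.5 ms left  -> 1
        System.out.println(timeoutMillis(1_499_999L, now)); // ~1.5 ms left -> 1
        System.out.println(timeoutMillis(1_500_000L, now)); // 1.5 ms left  -> 2
        System.out.println(timeoutMillis(-600_000L, now));  // already overdue -> 0
    }
}
```

Any result of 0 or less sends the loop into the selectNow branch, so a task that is due within less than half a millisecond is never delayed by a blocking select.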
The source code of this.rebuildSelector() called above is as follows:
```java
public final class NioEventLoop extends SingleThreadEventLoop {

    public void rebuildSelector() {
        // If the caller is not the event loop thread, submit the rebuild to the task queue
        if (!this.inEventLoop()) {
            this.execute(new Runnable() {
                public void run() {
                    NioEventLoop.this.rebuildSelector0();
                }
            });
        } else {
            // Otherwise we are already on the event loop thread: call the actual rebuild method directly
            this.rebuildSelector0();
        }
    }

    private void rebuildSelector0() {
        Selector oldSelector = this.selector;
        // If there is no old selector, there is nothing to rebuild
        if (oldSelector != null) {
            NioEventLoop.SelectorTuple newSelectorTuple;
            try {
                // Create a new selector
                newSelectorTuple = this.openSelector();
            } catch (Exception var9) {
                logger.warn("Failed to create a new Selector.", var9);
                return;
            }

            int nChannels = 0;
            Iterator var4 = oldSelector.keys().iterator();

            // Re-register every key of the old selector on the new selector, one by one
            while (var4.hasNext()) {
                SelectionKey key = (SelectionKey) var4.next();
                Object a = key.attachment();

                try {
                    if (key.isValid() && key.channel().keyFor(newSelectorTuple.unwrappedSelector) == null) {
                        int interestOps = key.interestOps();
                        // Cancel the registration on the old selector
                        key.cancel();
                        // Register the channel on the new selector with the same interest ops and attachment
                        SelectionKey newKey = key.channel().register(newSelectorTuple.unwrappedSelector, interestOps, a);
                        if (a instanceof AbstractNioChannel) {
                            ((AbstractNioChannel) a).selectionKey = newKey;
                        }
                        ++nChannels;
                    }
                } catch (Exception var11) {
                    logger.warn("Failed to re-register a Channel to the new Selector.", var11);
                    if (a instanceof AbstractNioChannel) {
                        AbstractNioChannel ch = (AbstractNioChannel) a;
                        ch.unsafe().close(ch.unsafe().voidPromise());
                    } else {
                        NioTask<SelectableChannel> task = (NioTask) a;
                        invokeChannelUnregistered(task, key, var11);
                    }
                }
            }

            // Make the new selector the one associated with this NioEventLoop
            this.selector = newSelectorTuple.selector;
            this.unwrappedSelector = newSelectorTuple.unwrappedSelector;

            try {
                // Close the old selector
                oldSelector.close();
            } catch (Throwable var10) {
                if (logger.isWarnEnabled()) {
                    logger.warn("Failed to close the old Selector.", var10);
                }
            }

            if (logger.isInfoEnabled()) {
                logger.info("Migrated " + nChannels + " channel(s) to the new Selector.");
            }
        }
    }
}
```
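The same migration can be tried outside Netty with plain JDK NIO. The sketch below is a simplified illustration, not Netty's implementation: SelectorRebuildSketch, rebuild, and demo are hypothetical names, error handling for individual channels is omitted, and a Pipe stands in for a real socket so the example needs no network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

public class SelectorRebuildSketch {

    /** Moves every valid registration to a fresh Selector and closes the old one. */
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        // Work on a snapshot of the key set while keys are being cancelled.
        List<SelectionKey> oldKeys = new ArrayList<>(oldSelector.keys());
        for (SelectionKey key : oldKeys) {
            SelectableChannel channel = key.channel();
            if (!key.isValid() || channel.keyFor(newSelector) != null) {
                continue; // already cancelled or already migrated
            }
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();                                          // leave the old selector
            channel.register(newSelector, interestOps, attachment); // join the new one
        }
        oldSelector.close();
        return newSelector;
    }

    /** Registers one pipe channel, rebuilds the selector, and reports the result. */
    static String demo() {
        try {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            Selector oldSelector = Selector.open();
            pipe.source().register(oldSelector, SelectionKey.OP_READ, "demo");

            Selector newSelector = rebuild(oldSelector);
            SelectionKey newKey = pipe.source().keyFor(newSelector);
            String summary = "keys=" + newSelector.keys().size()
                    + " oldOpen=" + oldSelector.isOpen()
                    + " attachment=" + newKey.attachment();

            newSelector.close();
            pipe.source().close();
            pipe.sink().close();
            return summary;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Carrying over the interest ops and the attachment is what lets the channel keep working unchanged after the move, which is also why Netty's version updates the selectionKey field of each AbstractNioChannel.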
Threshold configuration for Netty empty polling
Netty handles this problem in NioEventLoop: when the select method returns abnormally (the Netty source code comments call this "prematurely", i.e. returning early) more than a certain number of times in a row, Netty works around the bug by creating a new Selector.
Netty provides the configuration parameter io.netty.selectorAutoRebuildThreshold, which lets the user define how many consecutive early returns of select are tolerated. Reaching this number triggers the automatic rebuild of the Selector. The default is 512.
However, if the specified io.netty.selectorAutoRebuildThreshold is less than 3, Netty treats the feature as turned off.
```java
public final class NioEventLoop extends SingleThreadEventLoop {

    private static final int SELECTOR_AUTO_REBUILD_THRESHOLD;

    static {
        //... omit some codes
        int selectorAutoRebuildThreshold = SystemPropertyUtil.getInt("io.netty.selectorAutoRebuildThreshold", 512);
        if (selectorAutoRebuildThreshold < 3) {
            selectorAutoRebuildThreshold = 0;
        }

        SELECTOR_AUTO_REBUILD_THRESHOLD = selectorAutoRebuildThreshold;
        if (logger.isDebugEnabled()) {
            logger.debug("-Dio.netty.noKeySetOptimization: {}", DISABLE_KEY_SET_OPTIMIZATION);
            logger.debug("-Dio.netty.selectorAutoRebuildThreshold: {}", SELECTOR_AUTO_REBUILD_THRESHOLD);
        }
    }
}
```
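Because the value is read in a static initializer, the property has to be set before NioEventLoop is loaded, for example with -Dio.netty.selectorAutoRebuildThreshold=1024 on the JVM command line. The sketch below mirrors the normalization rule using only the JDK; RebuildThresholdSketch and its methods are hypothetical names, and Integer.getInteger stands in for Netty's SystemPropertyUtil.getInt.

```java
public class RebuildThresholdSketch {
    static final String PROPERTY = "io.netty.selectorAutoRebuildThreshold";
    static final int DEFAULT_THRESHOLD = 512;
    static final int MIN_THRESHOLD = 3;

    /** Values below 3 turn the rebuild mechanism off, which is represented as 0. */
    static int normalize(int configured) {
        return configured < MIN_THRESHOLD ? 0 : configured;
    }

    /** Reads the system property (default 512) and applies the rule above. */
    static int effectiveThreshold() {
        return normalize(Integer.getInteger(PROPERTY, DEFAULT_THRESHOLD));
    }

    public static void main(String[] args) {
        System.clearProperty(PROPERTY);
        System.out.println(effectiveThreshold()); // 512: the default

        System.setProperty(PROPERTY, "1024");
        System.out.println(effectiveThreshold()); // 1024: a custom threshold

        System.setProperty(PROPERTY, "2");
        System.out.println(effectiveThreshold()); // 0: the mechanism is disabled
    }
}
```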