Let's start with the official documentation.
Circuit breaking strategies
- SLOW_REQUEST_RATIO: uses the ratio of slow calls as the threshold. You must also set the maximum allowed response time (RT); a request whose response time exceeds this value is counted as a slow call. When the number of requests within the statistical interval (statIntervalMs) is greater than the configured minimum number of requests and the ratio of slow calls is greater than the threshold, requests are automatically rejected for the configured circuit-breaking duration. After that duration elapses, the breaker enters the probing recovery state (HALF-OPEN). If the response time of the next request is below the configured slow-call RT, circuit breaking ends; if it is above, the breaker opens again.
- ERROR_RATIO: when the number of requests within the statistical interval (statIntervalMs) is greater than the configured minimum number of requests and the ratio of exceptions is greater than the threshold, requests are automatically rejected for the configured circuit-breaking duration. After that duration elapses, the breaker enters the probing recovery state (HALF-OPEN). If the next request completes successfully (no error), circuit breaking ends; otherwise the breaker opens again. The exception ratio threshold ranges over [0.0, 1.0], representing 0% to 100%.
- ERROR_COUNT: when the number of exceptions within the statistical interval exceeds the threshold, the breaker opens automatically. After the circuit-breaking duration elapses, the breaker enters the probing recovery state (HALF-OPEN). If the next request completes successfully (no error), circuit breaking ends; otherwise the breaker opens again.
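The three trip conditions described above can be sketched as plain functions. This is only an illustration under stated assumptions, not Sentinel's implementation: the class name TripConditions and all method and parameter names are hypothetical, and the minimum-request check follows the `totalCount < minRequestAmount` test in the ExceptionCircuitBreaker listing further down.

```java
import java.util.List;

/** Hypothetical sketch of the three trip conditions; not Sentinel's actual code. */
final class TripConditions {

    /** SLOW_REQUEST_RATIO: enough requests were seen and too many of them were slow. */
    static boolean slowRatioExceeded(List<Long> rtsMs, long maxAllowedRtMs,
                                     int minRequestAmount, double ratioThreshold) {
        long total = rtsMs.size();
        if (total < minRequestAmount) {
            return false; // too few samples in the statistical interval
        }
        long slow = rtsMs.stream().filter(rt -> rt > maxAllowedRtMs).count();
        return (double) slow / total > ratioThreshold;
    }

    /** ERROR_RATIO: enough requests were seen and the share of errors is too high. */
    static boolean errorRatioExceeded(long errors, long total,
                                      int minRequestAmount, double ratioThreshold) {
        if (total < minRequestAmount) {
            return false;
        }
        return (double) errors / total > ratioThreshold;
    }

    /** ERROR_COUNT: enough requests were seen and the number of errors is too high. */
    static boolean errorCountExceeded(long errors, long total,
                                      int minRequestAmount, long countThreshold) {
        if (total < minRequestAmount) {
            return false;
        }
        return errors > countThreshold;
    }

    public static void main(String[] args) {
        List<Long> rts = List.of(20L, 35L, 900L, 1200L, 40L);
        // 2 of 5 calls exceed 500 ms, so the slow ratio is 0.4
        System.out.println(slowRatioExceeded(rts, 500, 5, 0.3));
        System.out.println(errorRatioExceeded(1, 10, 5, 0.5));
        System.out.println(errorCountExceeded(3, 10, 5, 2));
    }
}
```

Keeping the conditions as pure functions makes the thresholds easy to reason about separately from the state transitions.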
Circuit breaker states
- There are three states: OPEN, HALF_OPEN, and CLOSED.
- OPEN: the breaker is open and all requests are rejected.
- HALF_OPEN: the probing recovery state. If the next request succeeds, circuit breaking ends; otherwise the breaker opens again.
- CLOSED: the breaker is closed and requests pass through normally.
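The transitions between the three states can be sketched as a small state machine. This is a minimal, single-threaded illustration and not Sentinel's code: the class SimpleBreaker, its methods, and the explicit clock parameter nowMs are all hypothetical names chosen for the example.

```java
/** Hypothetical three-state breaker driven by an explicit clock; not Sentinel's code. */
final class SimpleBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long recoveryTimeoutMs; // how long the breaker stays OPEN
    private State state = State.CLOSED;
    private long nextRetryAtMs;

    SimpleBreaker(long recoveryTimeoutMs) {
        this.recoveryTimeoutMs = recoveryTimeoutMs;
    }

    State state() {
        return state;
    }

    /** Opens the breaker, e.g. when a threshold was exceeded while CLOSED. */
    void trip(long nowMs) {
        state = State.OPEN;
        nextRetryAtMs = nowMs + recoveryTimeoutMs;
    }

    /** Returns true if the request may pass. */
    boolean tryPass(long nowMs) {
        switch (state) {
            case CLOSED:
                return true;
            case OPEN:
                if (nowMs >= nextRetryAtMs) {
                    state = State.HALF_OPEN; // let exactly one probe request through
                    return true;
                }
                return false;
            default: // HALF_OPEN: a probe is already in flight
                return false;
        }
    }

    /** Reports the outcome of the probe request sent in HALF_OPEN. */
    void onProbeComplete(boolean success, long nowMs) {
        if (state != State.HALF_OPEN) {
            return;
        }
        if (success) {
            state = State.CLOSED;
        } else {
            trip(nowMs); // back to OPEN for another full timeout
        }
    }

    public static void main(String[] args) {
        SimpleBreaker breaker = new SimpleBreaker(1000);
        breaker.trip(0);
        System.out.println(breaker.tryPass(500));  // still OPEN
        System.out.println(breaker.tryPass(1000)); // probe allowed
        breaker.onProbeComplete(true, 1100);
        System.out.println(breaker.state());
    }
}
```

A production breaker would need atomic state changes (for example a compare-and-set on the state) so that only one thread wins the OPEN to HALF_OPEN transition.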

Rule construction
com.alibaba.csp.sentinel.slots.block.degrade.DegradeRuleManager#newCircuitBreakerFrom
    public final class DegradeRuleManager {

        // Static state holding per-resource information
        // resource -> circuit breakers
        private static volatile Map<String, List<CircuitBreaker>> circuitBreakers = new HashMap<>();
        // resource -> degrade rules
        private static volatile Map<String, Set<DegradeRule>> ruleMap = new HashMap<>();

        private static CircuitBreaker newCircuitBreakerFrom(/*@Valid*/ DegradeRule rule) {
            switch (rule.getGrade()) {
                case RuleConstant.DEGRADE_GRADE_RT:
                    return new ResponseTimeCircuitBreaker(rule);
                case RuleConstant.DEGRADE_GRADE_EXCEPTION_RATIO:
                case RuleConstant.DEGRADE_GRADE_EXCEPTION_COUNT:
                    return new ExceptionCircuitBreaker(rule);
                default:
                    return null;
            }
        }

        // ... (remaining members omitted)
    }
The circuit breaker implementations are mainly ResponseTimeCircuitBreaker and ExceptionCircuitBreaker.
    public class ExceptionCircuitBreaker extends AbstractCircuitBreaker {

        private final int strategy;
        private final int minRequestAmount;
        private final double threshold;
        private final LeapArray<SimpleErrorCounter> stat;

        @Override
        public void onRequestComplete(Context context) {
            Entry entry = context.getCurEntry();
            if (entry == null) {
                return;
            }
            Throwable error = entry.getError();
            SimpleErrorCounter counter = stat.currentWindow().value();
            if (error != null) {
                counter.getErrorCount().add(1);
            }
            counter.getTotalCount().add(1);

            handleStateChangeWhenThresholdExceeded(error);
        }

        private void handleStateChangeWhenThresholdExceeded(Throwable error) {
            if (currentState.get() == State.OPEN) {
                return;
            }
            if (currentState.get() == State.HALF_OPEN) {
                // In detecting request
                if (error == null) {
                    fromHalfOpenToClose();
                } else {
                    fromHalfOpenToOpen(1.0d);
                }
                return;
            }
            List<SimpleErrorCounter> counters = stat.values();
            long errCount = 0;
            long totalCount = 0;
            for (SimpleErrorCounter counter : counters) {
                errCount += counter.errorCount.sum();
                totalCount += counter.totalCount.sum();
            }
            if (totalCount < minRequestAmount) {
                return;
            }
            double curCount = errCount;
            if (strategy == DEGRADE_GRADE_EXCEPTION_RATIO) {
                // Use errorRatio
                curCount = errCount * 1.0d / totalCount;
            }
            if (curCount > threshold) {
                transformToOpen(curCount);
            }
        }
    }
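The listing above relies on a sliding window (`stat.currentWindow()` and `stat.values()`) to restrict the counts to the statistical interval. The following is a much simplified, single-threaded sketch of a bucketed sliding-window counter and not Sentinel's LeapArray: the class WindowedErrorCounter and its methods are hypothetical, timestamps are assumed to be non-negative, and the interval is assumed to be divisible by the bucket count.

```java
import java.util.Arrays;

/** Hypothetical bucketed sliding-window counter; a simplification, not Sentinel's LeapArray. */
final class WindowedErrorCounter {
    private final long bucketLengthMs;
    private final long[] bucketStart; // start time of the data held in each slot, -1 if unused
    private final long[] errors;
    private final long[] totals;

    WindowedErrorCounter(int bucketCount, long intervalMs) {
        this.bucketLengthMs = intervalMs / bucketCount;
        this.bucketStart = new long[bucketCount];
        this.errors = new long[bucketCount];
        this.totals = new long[bucketCount];
        Arrays.fill(bucketStart, -1L);
    }

    /** Finds the slot for nowMs, resetting it first if it still holds data from an older lap. */
    private int currentBucket(long nowMs) {
        long start = nowMs - nowMs % bucketLengthMs;
        int idx = (int) ((nowMs / bucketLengthMs) % bucketStart.length);
        if (bucketStart[idx] != start) {
            bucketStart[idx] = start;
            errors[idx] = 0;
            totals[idx] = 0;
        }
        return idx;
    }

    void record(long nowMs, boolean isError) {
        int idx = currentBucket(nowMs);
        if (isError) {
            errors[idx]++;
        }
        totals[idx]++;
    }

    /** A slot counts only if its data started within the last full interval. */
    private boolean isLive(int idx, long nowMs) {
        return bucketStart[idx] >= 0
            && nowMs - bucketStart[idx] < bucketLengthMs * bucketStart.length;
    }

    long errorCount(long nowMs) {
        long sum = 0;
        for (int i = 0; i < bucketStart.length; i++) {
            if (isLive(i, nowMs)) {
                sum += errors[i];
            }
        }
        return sum;
    }

    long totalCount(long nowMs) {
        long sum = 0;
        for (int i = 0; i < bucketStart.length; i++) {
            if (isLive(i, nowMs)) {
                sum += totals[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        WindowedErrorCounter counter = new WindowedErrorCounter(2, 1000);
        counter.record(0, true);
        counter.record(600, false);
        System.out.println(counter.errorCount(900) + " errors of " + counter.totalCount(900));
    }
}
```

Old buckets are never cleared eagerly; they are either skipped when reading or overwritten when their slot is reused, which keeps recording cheap.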
References: https://www.jianshu.com/p/500d461d2391
SystemSlot
SystemSlot is the entry point for system adaptive flow control. DegradeSlot sits after FlowSlot in the ProcessorSlotChain and acts as a fallback for flow control, whereas SystemSlot sits before FlowSlot. It checks first whether the system, in its current condition, can handle the request at all, so that the system runs at maximum throughput while remaining stable.
System adaptive flow control rules apply to all resources whose traffic type is IN, so no resource name needs to be configured on the rule. SystemRule defines the following fields:
    public class SystemRule extends AbstractRule {
        private double highestSystemLoad = -1;
        private double highestCpuUsage = -1;
        private double qps = -1;
        private long avgRt = -1;
        private long maxThread = -1;
    }
- qps: the flow control threshold based on QPS. The default is -1; it takes effect only when it is greater than 0.
- avgRt: the flow control threshold based on average response time. The default is -1; it takes effect only when it is greater than 0.
- maxThread: the threshold for the maximum number of concurrently occupied threads. The default is -1; it takes effect only when it is greater than 0.
- highestCpuUsage: the flow control threshold based on CPU usage. The value lies in [0, 1]. The default is -1; it takes effect only when it is greater than or equal to 0.0.
- highestSystemLoad: the flow control threshold based on system load. The default is -1; it takes effect only when it is greater than 0.0.
If multiple SystemRule instances are configured, only the minimum value of each configuration item is used. For example, if all three SystemRule instances configure qps, the smallest of the three qps values becomes the flow control threshold. This merging happens when SystemRuleManager#loadRules is called to load the rules.
    public static void loadSystemConf(SystemRule rule) {
        // Whether the system adaptive flow control check is enabled
        boolean checkStatus = false;

        // highestSystemLoad
        if (rule.getHighestSystemLoad() >= 0) {
            // If multiple rules are configured, the minimum value is taken
            highestSystemLoad = Math.min(highestSystemLoad, rule.getHighestSystemLoad());
            highestSystemLoadIsSet = true;
            // Enable the system adaptive flow control check
            checkStatus = true;
        }

        // highestCpuUsage
        if (rule.getHighestCpuUsage() >= 0) {
            if (rule.getHighestCpuUsage() > 1) {
                // Values above 1 are invalid and are ignored
            } else {
                // If multiple rules are configured, the minimum value is taken
                highestCpuUsage = Math.min(highestCpuUsage, rule.getHighestCpuUsage());
                highestCpuUsageIsSet = true;
                checkStatus = true;
            }
        }

        // avgRt
        if (rule.getAvgRt() >= 0) {
            // If multiple rules are configured, the minimum value is taken
            maxRt = Math.min(maxRt, rule.getAvgRt());
            maxRtIsSet = true;
            checkStatus = true;
        }

        // maxThread
        if (rule.getMaxThread() >= 0) {
            // If multiple rules are configured, the minimum value is taken
            maxThread = Math.min(maxThread, rule.getMaxThread());
            maxThreadIsSet = true;
            checkStatus = true;
        }

        // qps
        if (rule.getQps() >= 0) {
            // If multiple rules are configured, the minimum value is taken
            qps = Math.min(qps, rule.getQps());
            qpsIsSet = true;
            checkStatus = true;
        }

        checkSystemStatus.set(checkStatus);
    }
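Taking the minimum across rules only works if the running value starts out as "no limit". The sketch below illustrates the idea for qps alone. It is not Sentinel's code: the class MinMergeExample is hypothetical, and using Double.MAX_VALUE as the starting value is an assumption made for this example.

```java
import java.util.List;

/** Hypothetical illustration of "the smallest configured value wins"; not Sentinel's code. */
final class MinMergeExample {

    /** Returns the effective threshold, or Double.MAX_VALUE when no rule configures one. */
    static double effectiveQps(List<Double> configuredQps) {
        double effective = Double.MAX_VALUE; // assumed "no limit" starting point
        for (double qps : configuredQps) {
            if (qps >= 0) { // a negative value means "not configured"
                effective = Math.min(effective, qps);
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        // Three rules: qps = 100, qps not set (-1), qps = 50
        System.out.println(effectiveQps(List.of(100.0, -1.0, 50.0))); // prints 50.0
    }
}
```

With this starting value, a rule that leaves a field at -1 never loosens or tightens the limit set by the other rules.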
The SystemRuleManager#checkSystem method reads the metrics of the current time window from Constants.ENTRY_NODE, the global statistics node for inbound traffic, and checks whether the total QPS, the average response time, or the total number of occupied threads has reached its threshold. If a threshold is reached, it throws a BlockException (SystemBlockException). In addition, checkSystem also applies flow control based on the current system load and CPU usage.
The source code of SystemRuleManager#checkSystem method is as follows:
    public static void checkSystem(ResourceWrapper resourceWrapper) throws BlockException {
        if (resourceWrapper == null) {
            return;
        }
        // checkSystemStatus is true only if a SystemRule has been configured
        if (!checkSystemStatus.get()) {
            return;
        }
        // Only inbound traffic (EntryType.IN) is checked
        if (resourceWrapper.getEntryType() != EntryType.IN) {
            return;
        }

        // QPS limit
        double currentQps = Constants.ENTRY_NODE == null ? 0.0 : Constants.ENTRY_NODE.successQps();
        if (currentQps > qps) {
            throw new SystemBlockException(resourceWrapper.getName(), "qps");
        }

        // Thread count limit
        int currentThread = Constants.ENTRY_NODE == null ? 0 : Constants.ENTRY_NODE.curThreadNum();
        if (currentThread > maxThread) {
            throw new SystemBlockException(resourceWrapper.getName(), "thread");
        }

        // Average response time limit
        double rt = Constants.ENTRY_NODE == null ? 0 : Constants.ENTRY_NODE.avgRt();
        if (rt > maxRt) {
            throw new SystemBlockException(resourceWrapper.getName(), "rt");
        }

        // System load average limit
        if (highestSystemLoadIsSet && getCurrentSystemAvgLoad() > highestSystemLoad) {
            if (!checkBbr(currentThread)) {
                throw new SystemBlockException(resourceWrapper.getName(), "load");
            }
        }

        // CPU usage limit
        if (highestCpuUsageIsSet && getCurrentCpuUsage() > highestCpuUsage) {
            throw new SystemBlockException(resourceWrapper.getName(), "cpu");
        }
    }
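The listing calls checkBbr without showing its body. One plausible reading, given here as an assumption and not as Sentinel's code, is a capacity check in the style of BBR: the number of requests the system can hold in flight is estimated as the highest observed throughput multiplied by the lowest observed response time, and a request is rejected only when the current thread count exceeds that estimate. The class BbrCheckSketch, the method mayPass, and its parameters are hypothetical.

```java
/** Hypothetical capacity check in the spirit of checkBbr; names and formula are assumptions. */
final class BbrCheckSketch {

    /**
     * @param currentThreads number of threads currently processing inbound requests
     * @param maxQps         highest throughput observed recently, in requests per second
     * @param minRtMs        lowest response time observed recently, in milliseconds
     * @return true if the request may pass even though the system load is high
     */
    static boolean mayPass(int currentThreads, double maxQps, double minRtMs) {
        // Estimated sustainable concurrency: throughput (req/s) multiplied by latency (s)
        double estimatedCapacity = maxQps * minRtMs / 1000.0;
        return !(currentThreads > 1 && currentThreads > estimatedCapacity);
    }

    public static void main(String[] args) {
        // 1000 req/s at 20 ms each gives an estimated capacity of 20 concurrent requests
        System.out.println(mayPass(15, 1000, 20));
        System.out.println(mayPass(50, 1000, 20));
    }
}
```

Under this reading, a high load average alone does not block traffic; it blocks only when the system is also holding more concurrent requests than it has shown it can drain.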
Getting system load and CPU usage
Sentinel uses the OperatingSystemMXBean API in a scheduled task to sample these two metrics once per second. The code is as follows:
    @Override
    public void run() {
        try {
            OperatingSystemMXBean osBean = ManagementFactory
                .getPlatformMXBean(OperatingSystemMXBean.class);
            // getSystemLoadAverage
            currentLoad = osBean.getSystemLoadAverage();
            // getSystemCpuLoad
            currentCpuUsage = osBean.getSystemCpuLoad();

            if (currentLoad > SystemRuleManager.getSystemLoadThreshold()) {
                writeSystemStatusLog();
            }
        } catch (Throwable e) {
            RecordLog.warn("[SystemStatusListener] Failed to get system metrics from JMX", e);
        }
    }
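The same two metrics can be read in a standalone program. This is a minimal sketch that takes a single sample; the class name SystemMetricsSample is hypothetical, the periodic scheduling is left out, and it assumes a JDK that provides the com.sun.management extension of OperatingSystemMXBean (the interface that declares getSystemCpuLoad).

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

/** Minimal sketch that samples system load and CPU usage once; the class name is hypothetical. */
final class SystemMetricsSample {

    /** Returns {loadAverage, cpuUsage}; a value is negative when the platform cannot report it. */
    static double[] sample() {
        OperatingSystemMXBean osBean =
            ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
        // Load average over the last minute
        double load = osBean.getSystemLoadAverage();
        // Recent CPU usage of the whole system, as a fraction between 0 and 1
        double cpu = osBean.getSystemCpuLoad();
        return new double[] { load, cpu };
    }

    public static void main(String[] args) {
        double[] metrics = sample();
        System.out.println("load average: " + metrics[0]);
        System.out.println("cpu usage:    " + metrics[1]);
    }
}
```

On recent JDKs getSystemCpuLoad is deprecated in favor of getCpuLoad, so compiling this sketch may produce a deprecation warning.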
Detailed reference: "In-depth Understanding of Sentinel", chapter 13, "Circuit Breaking, Degradation, and System Adaptive Flow Control", at http://learn.lianglianglee.com/