Sentinel notes: circuit breaking (degradation) and system adaptive rate limiting

 

Let's start with the official documentation.

 

 

Circuit breaking strategies

  • SLOW_REQUEST_RATIO (slow call ratio): the threshold is the proportion of slow calls. You also set the maximum allowed response time (slow call RT); any request whose response time exceeds it is counted as a slow call. When the number of requests within the statistical interval (statIntervalMs) reaches the configured minimum number of requests and the proportion of slow calls exceeds the threshold, requests are rejected automatically for the configured circuit-breaking duration. Once that duration has elapsed, the breaker enters the probing state (HALF_OPEN). If the next request completes within the slow call RT, the breaker closes; if it takes longer, the breaker opens again.

  • ERROR_RATIO (exception ratio): when the number of requests within the statistical interval (statIntervalMs) reaches the configured minimum number of requests and the proportion of exceptions exceeds the threshold, requests are rejected automatically for the configured circuit-breaking duration. Once that duration has elapsed, the breaker enters the probing state (HALF_OPEN). If the next request completes successfully (no error), the breaker closes; otherwise it opens again. The ratio threshold ranges over [0.0, 1.0], representing 0% to 100%.

  • ERROR_COUNT (exception count): when the number of exceptions within the statistical interval exceeds the threshold, the breaker opens automatically. Once the circuit-breaking duration has elapsed, the breaker enters the probing state (HALF_OPEN). If the next request completes successfully (no error), the breaker closes; otherwise it opens again.
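The three strategies above share one decision shape: enough traffic first, then a comparison against the threshold. The following is a minimal sketch of that decision in plain Java. `TripDecisionSketch`, `Strategy`, and `shouldTrip` are hypothetical names invented for illustration and are not part of Sentinel; the sketch also assumes the minimum-request check applies to all three strategies, as it does in the ExceptionCircuitBreaker excerpt later in these notes.

```java
/** Hypothetical helper, not part of Sentinel: decides whether a breaker should trip. */
final class TripDecisionSketch {

    enum Strategy { SLOW_REQUEST_RATIO, ERROR_RATIO, ERROR_COUNT }

    /**
     * @param total            requests seen in the statistical interval
     * @param slow             requests whose response time exceeded the slow call RT
     * @param errors           requests that ended with an exception
     * @param minRequestAmount minimum traffic required before any decision is made
     * @param threshold        a ratio in [0.0, 1.0] for the ratio strategies, a count for ERROR_COUNT
     */
    static boolean shouldTrip(Strategy strategy, long total, long slow, long errors,
                              int minRequestAmount, double threshold) {
        // Too little traffic: never trip (also avoids dividing by zero).
        if (total == 0 || total < minRequestAmount) {
            return false;
        }
        switch (strategy) {
            case SLOW_REQUEST_RATIO:
                return (double) slow / total > threshold;
            case ERROR_RATIO:
                return (double) errors / total > threshold;
            case ERROR_COUNT:
                return errors > threshold;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // 6 errors out of 10 requests, ratio threshold 0.5, minimum 5 requests
        System.out.println(shouldTrip(Strategy.ERROR_RATIO, 10, 0, 6, 5, 0.5)); // true
        // Only 4 requests seen, below the minimum of 5
        System.out.println(shouldTrip(Strategy.ERROR_RATIO, 4, 0, 4, 5, 0.5));  // false
    }
}
```

The real breakers keep these counts in a sliding window rather than receiving them as arguments; passing them in here just keeps the comparison logic visible.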

Circuit breaker states

  • There are three circuit breaker states: OPEN, HALF_OPEN, and CLOSED.
  • OPEN: the breaker has tripped and all requests are rejected.
  • HALF_OPEN: the probing (recovery) state. If the next request succeeds, the breaker closes; otherwise it opens again.
  • CLOSED: the breaker is closed and requests pass through normally.
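The state transitions can be sketched as a small state machine. This is an illustrative model only: `BreakerStateSketch` and its methods are hypothetical names, timestamps are passed in explicitly so the behaviour is easy to follow, and every method is synchronized for simplicity, which is an assumption of this sketch rather than how Sentinel implements it.

```java
/** Hypothetical model of the CLOSED -> OPEN -> HALF_OPEN cycle; not Sentinel code. */
final class BreakerStateSketch {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long recoveryTimeoutMs;   // the circuit-breaking duration
    private State state = State.CLOSED;
    private long nextRetryTimestampMs;

    BreakerStateSketch(long recoveryTimeoutMs) {
        this.recoveryTimeoutMs = recoveryTimeoutMs;
    }

    synchronized State state() {
        return state;
    }

    /** Called when the configured threshold is exceeded while CLOSED. */
    synchronized void trip(long nowMs) {
        if (state == State.CLOSED) {
            state = State.OPEN;
            nextRetryTimestampMs = nowMs + recoveryTimeoutMs;
        }
    }

    /** Returns true if the request may proceed. */
    synchronized boolean tryPass(long nowMs) {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && nowMs >= nextRetryTimestampMs) {
            // The duration has elapsed: let exactly one probe request through.
            state = State.HALF_OPEN;
            return true;
        }
        // OPEN before the timeout, or HALF_OPEN with a probe already in flight.
        return false;
    }

    /** Called when the probe request admitted in HALF_OPEN completes. */
    synchronized void onProbeComplete(boolean success, long nowMs) {
        if (state != State.HALF_OPEN) {
            return;
        }
        if (success) {
            state = State.CLOSED;
        } else {
            state = State.OPEN;
            nextRetryTimestampMs = nowMs + recoveryTimeoutMs;
        }
    }
}
```

With a 1000 ms duration, a breaker tripped at time 0 rejects a request at 500 ms, admits a single probe at 1000 ms, and then either closes or reopens for another 1000 ms depending on how the probe ends.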

Circuit breaker rule parameters

 

 

Rule construction

com.alibaba.csp.sentinel.slots.block.degrade.DegradeRuleManager#newCircuitBreakerFrom

public final class DegradeRuleManager {

    // Static state, keyed by resource name.

    // resource -> circuit breakers
    private static volatile Map<String, List<CircuitBreaker>> circuitBreakers = new HashMap<>();

    // resource -> degrade rules
    private static volatile Map<String, Set<DegradeRule>> ruleMap = new HashMap<>();

    private static CircuitBreaker newCircuitBreakerFrom(/*@Valid*/ DegradeRule rule) {
        switch (rule.getGrade()) {
            // Slow call ratio strategy
            case RuleConstant.DEGRADE_GRADE_RT:
                return new ResponseTimeCircuitBreaker(rule);
            // Exception ratio and exception count share one implementation
            case RuleConstant.DEGRADE_GRADE_EXCEPTION_RATIO:
            case RuleConstant.DEGRADE_GRADE_EXCEPTION_COUNT:
                return new ExceptionCircuitBreaker(rule);
            default:
                return null;
        }
    }
    // ... other members omitted
}

 

 

 

 

The circuit breaker implementations are mainly ResponseTimeCircuitBreaker and ExceptionCircuitBreaker.

public class ExceptionCircuitBreaker extends AbstractCircuitBreaker {

    private final int strategy;
    private final int minRequestAmount;
    private final double threshold;
    private final LeapArray<SimpleErrorCounter> stat;

    @Override
    public void onRequestComplete(Context context) {
        Entry entry = context.getCurEntry();
        if (entry == null) {
            return;
        }
        Throwable error = entry.getError();
        SimpleErrorCounter counter = stat.currentWindow().value();
        if (error != null) {
            counter.getErrorCount().add(1);
        }
        counter.getTotalCount().add(1);

        handleStateChangeWhenThresholdExceeded(error);
    }

    private void handleStateChangeWhenThresholdExceeded(Throwable error) {
        if (currentState.get() == State.OPEN) {
            return;
        }
        
        if (currentState.get() == State.HALF_OPEN) {
            // This was the probe request: its outcome decides the next state
            if (error == null) {
                fromHalfOpenToClose();
            } else {
                fromHalfOpenToOpen(1.0d);
            }
            return;
        }
        
        // CLOSED state: sum error and total counts over all sliding-window buckets
        List<SimpleErrorCounter> counters = stat.values();
        long errCount = 0;
        long totalCount = 0;
        for (SimpleErrorCounter counter : counters) {
            errCount += counter.errorCount.sum();
            totalCount += counter.totalCount.sum();
        }
        if (totalCount < minRequestAmount) {
            return;
        }
        double curCount = errCount;
        if (strategy == DEGRADE_GRADE_EXCEPTION_RATIO) {
            // Ratio strategy: compare the error ratio instead of the raw error count
            curCount = errCount * 1.0d / totalCount;
        }
        if (curCount > threshold) {
            transformToOpen(curCount);
        }
    }
}

 

 

References:   https://www.jianshu.com/p/500d461d2391

SystemSlot

SystemSlot is the entry point for system adaptive rate limiting. DegradeSlot sits after FlowSlot in the ProcessorSlotChain and acts as a fallback once rate limiting has been applied, whereas SystemSlot sits before FlowSlot. It first checks whether the system is currently able to handle the request, so that the system can run at maximum throughput while remaining stable.

System adaptive rules apply to all resources whose traffic type is IN, so a rule does not need a resource name. SystemRule defines the following fields:

public class SystemRule extends AbstractRule {
    private double highestSystemLoad = -1;
    private double highestCpuUsage = -1;
    private double qps = -1;
    private long avgRt = -1;
    private long maxThread = -1;
}

 

  • qps: rate-limiting threshold based on total QPS. The default is -1; it takes effect only when set to a non-negative value (loadSystemConf checks >= 0).
  • avgRt: rate-limiting threshold based on average response time. The default is -1; it takes effect only when set to a non-negative value.
  • maxThread: threshold on the maximum number of threads occupied concurrently. The default is -1; it takes effect only when set to a non-negative value.
  • highestCpuUsage: rate-limiting threshold based on CPU usage. The value must be in [0, 1]. The default is -1; it takes effect only when it is greater than or equal to 0.0.
  • highestSystemLoad: rate-limiting threshold based on system load. The default is -1; it takes effect only when set to a non-negative value.

If multiple SystemRule instances are configured, the minimum value is taken for each configuration item. For example, if all three rules set qps, the smallest of the three qps values becomes the rate-limiting threshold. This merge happens when SystemRuleManager#loadRules is called to load the rules.
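The "minimum wins" merge can be illustrated without Sentinel on the classpath. In the sketch below, `SystemRuleMergeSketch` and its nested `Rule` class are hypothetical stand-ins for SystemRule that carry only two of the fields, and an unset threshold is modelled as the largest representable value, which is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of merging several rules by taking the minimum per field. */
final class SystemRuleMergeSketch {

    /** Stand-in for SystemRule with only two fields; -1 means "not configured". */
    static final class Rule {
        final double qps;
        final long maxThread;

        Rule(double qps, long maxThread) {
            this.qps = qps;
            this.maxThread = maxThread;
        }
    }

    static double effectiveQps(List<Rule> rules) {
        double qps = Double.MAX_VALUE; // assumed "no limit" starting value
        for (Rule rule : rules) {
            if (rule.qps >= 0) {
                qps = Math.min(qps, rule.qps);
            }
        }
        return qps;
    }

    static long effectiveMaxThread(List<Rule> rules) {
        long maxThread = Long.MAX_VALUE; // assumed "no limit" starting value
        for (Rule rule : rules) {
            if (rule.maxThread >= 0) {
                maxThread = Math.min(maxThread, rule.maxThread);
            }
        }
        return maxThread;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
            new Rule(300, -1),   // only qps configured
            new Rule(100, 50),   // both configured
            new Rule(-1, 20));   // only maxThread configured
        System.out.println(effectiveQps(rules));       // 100.0
        System.out.println(effectiveMaxThread(rules)); // 20
    }
}
```

Rules that leave a field at -1 simply do not take part in the minimum for that field.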

public static void loadSystemConf(SystemRule rule) {
        // Whether to turn on the system adaptive current limiting judgment function
        boolean checkStatus = false;
        // highestSystemLoad
        if (rule.getHighestSystemLoad() >= 0) {
            // If multiple rules are configured, the minimum value is taken
            highestSystemLoad = Math.min(highestSystemLoad, rule.getHighestSystemLoad());
            highestSystemLoadIsSet = true;
            // Turn on the system adaptive current limit check function
            checkStatus = true;
        }
        // highestCpuUsage
        if (rule.getHighestCpuUsage() >= 0) {
            if (rule.getHighestCpuUsage() > 1) {
                // Values above 1 are invalid and are ignored
            }
            // Valid range: [0, 1]
            else {
                // If multiple rules are configured, the minimum value is taken
                highestCpuUsage = Math.min(highestCpuUsage, rule.getHighestCpuUsage());
                highestCpuUsageIsSet = true;
                checkStatus = true;
            }
        }
        // avgRt
        if (rule.getAvgRt() >= 0) {
            // If multiple rules are configured, the minimum value is taken
            maxRt = Math.min(maxRt, rule.getAvgRt());
            maxRtIsSet = true;
            checkStatus = true;
        }
        // maxThread
        if (rule.getMaxThread() >= 0) {
            // If multiple rules are configured, the minimum value is taken
            maxThread = Math.min(maxThread, rule.getMaxThread());
            maxThreadIsSet = true;
            checkStatus = true;
        }
        // qps
        if (rule.getQps() >= 0) {
            // If multiple rules are configured, the minimum value is taken
            qps = Math.min(qps, rule.getQps());
            qpsIsSet = true;
            checkStatus = true;
        }
        checkSystemStatus.set(checkStatus);
    }

 

The SystemRuleManager#checkSystem method reads the current time window's metrics from the global statistics node Constants.ENTRY_NODE and checks whether total QPS, average response time, or the total number of occupied threads has exceeded its threshold. If a threshold is exceeded, it throws a block exception (SystemBlockException). In addition, checkSystem limits traffic according to the system's current load and CPU usage.

The source code of SystemRuleManager#checkSystem method is as follows:

public static void checkSystem(ResourceWrapper resourceWrapper) throws BlockException {
        if (resourceWrapper == null) {
            return;
        }
        // checkSystemStatus is true only when a SystemRule has been configured
        if (!checkSystemStatus.get()) {
            return;
        }
        // Only inbound traffic (EntryType.IN) is checked
        if (resourceWrapper.getEntryType() != EntryType.IN) {
            return;
        }
        // Limit by total QPS
        double currentQps = Constants.ENTRY_NODE == null ? 0.0 : Constants.ENTRY_NODE.successQps();
        if (currentQps > qps) {
            throw new SystemBlockException(resourceWrapper.getName(), "qps");
        }
        // Limit by number of concurrent threads
        int currentThread = Constants.ENTRY_NODE == null ? 0 : Constants.ENTRY_NODE.curThreadNum();
        if (currentThread > maxThread) {
            throw new SystemBlockException(resourceWrapper.getName(), "thread");
        }
        // Limit by average response time
        double rt = Constants.ENTRY_NODE == null ? 0 : Constants.ENTRY_NODE.avgRt();
        if (rt > maxRt) {
            throw new SystemBlockException(resourceWrapper.getName(), "rt");
        }
        // Limit by system load average
        if (highestSystemLoadIsSet && getCurrentSystemAvgLoad() > highestSystemLoad) {
            if (!checkBbr(currentThread)) {
                throw new SystemBlockException(resourceWrapper.getName(), "load");
            }
        }
        // Limit by CPU usage
        if (highestCpuUsageIsSet && getCurrentCpuUsage() > highestCpuUsage) {
            throw new SystemBlockException(resourceWrapper.getName(), "cpu");
        }
}

 

Get system load and CPU usage

Sentinel uses the OperatingSystemMXBean API and a scheduled task to sample both metrics once per second. The code is as follows:

@Override
    public void run() {
        try {
            OperatingSystemMXBean osBean = ManagementFactory
                                       .getPlatformMXBean(OperatingSystemMXBean.class);
            // System load average over the last minute
            currentLoad = osBean.getSystemLoadAverage();
            // Recent CPU usage of the whole system, in [0, 1]
            currentCpuUsage = osBean.getSystemCpuLoad();
            if (currentLoad > SystemRuleManager.getSystemLoadThreshold()) {
                writeSystemStatusLog();
            }
        } catch (Throwable e) {
            RecordLog.warn("[SystemStatusListener] Failed to get system metrics from JMX", e);
        }
    }
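
To try the same sampling outside Sentinel, the following standalone sketch polls both metrics once per second using only the JDK. `SystemMetricsSketch` and `sample` are hypothetical names. The sketch assumes a JVM that exposes com.sun.management.OperatingSystemMXBean (as HotSpot-based JDKs do); getSystemCpuLoad is deprecated on newer JDKs but still present, and either value may be negative when the platform cannot provide it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical standalone sampler; mirrors the idea of the listener above, not its code. */
final class SystemMetricsSketch {

    /** Returns {loadAverage, cpuUsage}; a negative value means "not available". */
    @SuppressWarnings("deprecation")
    static double[] sample() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();
        double cpu = -1;
        // CPU usage is only exposed by the com.sun.management extension interface.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            cpu = ((com.sun.management.OperatingSystemMXBean) os).getSystemCpuLoad();
        }
        return new double[] {load, cpu};
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Sample once per second, as the text describes.
        scheduler.scheduleAtFixedRate(() -> {
            double[] metrics = sample();
            System.out.printf("load=%.2f cpu=%.2f%n", metrics[0], metrics[1]);
        }, 0, 1, TimeUnit.SECONDS);
        Thread.sleep(3000);
        scheduler.shutdownNow();
    }
}
```

On platforms where the load average is unavailable, getSystemLoadAverage returns a negative number, so a threshold comparison against it never fires.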

 

 

Detailed reference:   http://learn.lianglianglee.com/ Column / in-depth understanding of% 20Sentinel (end) / 13% 20 fuse degradation and system adaptive current limiting.md

 

Added by Mike-2003 on Sat, 06 Nov 2021 03:58:28 +0200