Microservice architecture | 5.1 Using the Netflix Hystrix circuit breaker

preface

reference material:
<Spring Microservices in Action>
Principle and practice of Spring Cloud Alibaba microservice
"Spring Cloud framework development tutorial" by Zhou Yang (Atguigu, on Bilibili)

Hystrix is a latency and fault-tolerance library designed to isolate points of access to remote systems, services, and third-party libraries, stop cascading failures, and enable resilience in complex distributed systems where failure is inevitable;

1. Basic knowledge of Hystrix

1.1 the Hystrix circuit breaker focuses on calls

  • Hystrix does not distinguish between providers and consumers. It sits between a service and the resources it calls, such as a service's database requests and its calls to other services;
  • Therefore, in both service consumers and service providers, a Hystrix circuit breaker can wrap every database access and every inter-service call;
  • Implementing the circuit breaker, fallback, and bulkhead patterns by hand requires a deep understanding of threads and thread management;
  • Netflix's Hystrix library encapsulates the thread handling, so developers can focus on business logic and Spring Cloud;

1.2 two categories of Hystrix usage

  • Wrap every database call in every service with a Hystrix circuit breaker;
  • Wrap every call between services with a Hystrix circuit breaker;

1.3 bulkhead strategy

1.4 decision making process of Hystrix in case of remote resource call failure

  • Snapshot time window: Hystrix looks at the number of calls within a 10 s window:
    • Request volume threshold: if the number of calls in the window is below the configured minimum, Hystrix takes no action even if several of those calls fail;
    • Otherwise, it proceeds to the next step;
  • Overall failure percentage:
    • Error percentage threshold: if the overall percentage of failures exceeds the threshold, Hystrix trips the circuit breaker and almost all subsequent calls fail immediately;
  • After the circuit breaker trips, Hystrix starts a new activity window:
    • Every 5 s (configurable), Hystrix lets one remote call through. If the call succeeds, Hystrix resets the circuit breaker and lets calls flow again. If the call fails, Hystrix keeps the circuit breaker open;
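
The two checks above can be sketched in plain Java. This is a simplified illustration and not Hystrix's own implementation: the class `TripDecision` and its parameter names are hypothetical, and the window counts are passed in directly instead of being collected from rolling buckets. The sketch trips when the failure percentage is at or above the threshold.

```java
// Hypothetical helper illustrating the two checks; not part of the Hystrix API.
public final class TripDecision {

    private TripDecision() {
    }

    /**
     * Returns true when the breaker should trip for one rolling window.
     *
     * @param totalCalls             calls seen in the window
     * @param failedCalls            calls in the window that failed or timed out
     * @param requestVolumeThreshold minimum calls before the error rate is examined
     * @param errorThresholdPercent  failure percentage at which the breaker trips
     */
    public static boolean shouldTrip(int totalCalls, int failedCalls,
                                     int requestVolumeThreshold,
                                     int errorThresholdPercent) {
        // Step 1: too few calls in the window, so failures are ignored.
        if (totalCalls == 0 || totalCalls < requestVolumeThreshold) {
            return false;
        }
        // Step 2: compare the overall failure percentage with the threshold.
        int errorPercent = failedCalls * 100 / totalCalls;
        return errorPercent >= errorThresholdPercent;
    }
}
```

With a volume threshold of 20 and an error threshold of 50, `shouldTrip(19, 19, 20, 50)` returns false because the window is too quiet, while `shouldTrip(20, 10, 20, 50)` returns true.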

1.5 when the Hystrix circuit breaker is opened

  • When another request arrives, the main logic is not called; the degraded fallback is called directly. In this way the circuit breaker detects errors automatically and switches from the main logic to the fallback logic, which reduces response latency;
  • Hystrix starts a sleep time window during which the fallback logic temporarily acts as the main logic;
  • When the sleep time window expires, the circuit breaker enters the half-open state and lets one request through to the original main logic;
    • If the request returns normally, the circuit breaker closes and the main logic is restored;
    • Otherwise, the circuit breaker returns to the open state and the sleep time window restarts;
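
The open, half-open, and closed transitions described above can be sketched as a small state machine. This is a single-threaded illustration under stated assumptions, not Hystrix code: `SimpleBreaker`, `trip()`, and the injectable clock are hypothetical names, and the decision to trip is left to the caller.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified, single-threaded sketch; all names here are hypothetical.
public final class SimpleBreaker {

    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final long sleepWindowMillis;
    private final LongSupplier clock;   // injectable clock so the example is testable
    private State state = State.CLOSED;
    private long openedAt;

    public SimpleBreaker(long sleepWindowMillis, LongSupplier clock) {
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    public State state() {
        return state;
    }

    // Called by whatever logic decides that the failure threshold was crossed.
    public void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
    }

    public <T> T call(Supplier<T> mainLogic, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < sleepWindowMillis) {
                return fallback.get();      // still sleeping: skip the main logic
            }
            state = State.HALF_OPEN;        // sleep window expired: allow one trial
        }
        try {
            T result = mainLogic.get();
            state = State.CLOSED;           // trial (or normal call) succeeded
            return result;
        } catch (RuntimeException e) {
            if (state == State.HALF_OPEN) {
                trip();                     // trial failed: reopen and restart the timer
            }
            return fallback.get();
        }
    }
}
```

Passing the clock in as a `LongSupplier` is only a convenience for experimenting with the sleep window without real waiting; production code would typically read the system clock.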

1.6 all configurations of hystrix


@HystrixCommand(fallbackMethod = "str_fallbackMethod",
        groupKey = "strGroupCommand",
        commandKey = "strCommand",
        threadPoolKey = "strThreadPool",

        commandProperties = {
                // Isolation strategy: THREAD uses thread-pool isolation, SEMAPHORE uses semaphore isolation
                @HystrixProperty(name = "execution.isolation.strategy", value = "THREAD"),
                // With SEMAPHORE isolation, the size of the semaphore (maximum number of concurrent requests)
                @HystrixProperty(name = "execution.isolation.semaphore.maxConcurrentRequests", value = "10"),
                // Timeout for command execution, in milliseconds (Hystrix default: 1000)
                @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "10"),
                // Enable timeout
                @HystrixProperty(name = "execution.timeout.enabled", value = "true"),
                // Is the execution interrupted when it times out
                @HystrixProperty(name = "execution.isolation.thread.interruptOnTimeout", value = "true"),
                // Is the execution interrupted when it is cancelled
                @HystrixProperty(name = "execution.isolation.thread.interruptOnCancel", value = "true"),
                // Maximum concurrent number of callback method executions allowed
                @HystrixProperty(name = "fallback.isolation.semaphore.maxConcurrentRequests", value = "10"),
                // Whether the service degradation is enabled and whether the callback function is executed
                @HystrixProperty(name = "fallback.enabled", value = "true"),
                // Is the circuit breaker enabled
                @HystrixProperty(name = "circuitBreaker.enabled", value = "true"),
                // Minimum number of requests in the rolling window before the circuit breaker can trip.
                // With the default of 20, if only 19 requests arrive within the rolling window (10 seconds by default),
                // the circuit breaker will not open even if all 19 requests fail.
                @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "20"),
                // Once the number of requests in the rolling window reaches
                // circuitBreaker.requestVolumeThreshold, the circuit breaker is set to "open" if the
                // percentage of failed requests reaches 50; otherwise it stays "closed".
                @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50"),
                // Sleep window after the circuit breaker opens. When it ends, the breaker moves to
                // "half-open" and lets a trial request through. If that request fails, the breaker
                // goes back to "open"; if it succeeds, the breaker is set to "closed".
                @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "5000"),
                // Forced opening of circuit breaker
                @HystrixProperty(name = "circuitBreaker.forceOpen", value = "false"),
                // Forced closing of circuit breaker
                @HystrixProperty(name = "circuitBreaker.forceClosed", value = "false"),
                // Length of the rolling window over which statistics are collected to judge the health of the circuit breaker
                @HystrixProperty(name = "metrics.rollingStats.timeInMilliseconds", value = "10000"),
                // Number of "buckets" the rolling statistics window is divided into. The window is split
                // into buckets that each accumulate the metrics for one slice of time, for example
                // 10 buckets of 1 second in a 10-second window. timeInMilliseconds must be evenly
                // divisible by numBuckets, otherwise an exception is thrown.
                @HystrixProperty(name = "metrics.rollingStats.numBuckets", value = "10"),
                // Whether command execution latency is tracked and computed as percentiles. If false, all summary statistics return -1.
                @HystrixProperty(name = "metrics.rollingPercentile.enabled", value = "false"),
                // Duration of the rolling window for percentile statistics, in milliseconds.
                @HystrixProperty(name = "metrics.rollingPercentile.timeInMilliseconds", value = "60000"),
                // Number of buckets in the rolling percentile window (Hystrix default: 6).
                @HystrixProperty(name = "metrics.rollingPercentile.numBuckets", value = "6"),
                // Maximum number of execution times kept in each bucket. If more executions occur in a bucket,
                // the oldest entries are overwritten. For example, with a value of 100 and 10-second buckets,
                // if 500 executions occur in one bucket, only the last 100 are kept. Larger values increase
                // memory use and the time needed to sort the percentiles.
                @HystrixProperty(name = "metrics.rollingPercentile.bucketSize", value = "100"),
                // Interval, in milliseconds, between health snapshots (request success and error percentages) that drive the circuit breaker state.
                @HystrixProperty(name = "metrics.healthSnapshot.intervalInMilliseconds", value = "500"),
                // Enable request cache
                @HystrixProperty(name = "requestCache.enabled", value = "true"),
                // Whether the execution and events of the HystrixCommand are printed into the HystrixRequestLog
                @HystrixProperty(name = "requestLog.enabled", value = "true"),
        },
        threadPoolProperties = {
                // Number of core threads in the command execution thread pool, i.e. the maximum concurrency of command execution
                @HystrixProperty(name = "coreSize", value = "10"),
                // Maximum queue size of the thread pool. When set to -1 the pool uses a SynchronousQueue;
                // otherwise it uses a LinkedBlockingQueue.
                @HystrixProperty(name = "maxQueueSize", value = "-1"),
                // Rejection threshold for the queue. Requests can be rejected before the queue reaches its maximum size.
                // This supplements LinkedBlockingQueue, whose capacity cannot be changed dynamically,
                // whereas the size at which requests are rejected can be adjusted through this property.
                @HystrixProperty(name = "queueSizeRejectionThreshold", value = "5"),
        }
)

2. Use Hystrix circuit breaker for service

2.1 add the pom.xml dependency

<!--Pull in the Spring Cloud Hystrix dependency-->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
  • The following dependency belongs to the core Netflix Hystrix library and generally does not need to be declared explicitly;
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-javanica</artifactId>
    <version>1.5.9</version>
</dependency>

2.2 modify the bootstrap.yml configuration file

  • This is only needed if you make calls through Feign;
feign: 
  hystrix:
    #Enable the hystrix support of feign. The default is false 
    enabled: true

2.3 annotate the main application class

  • @EnableCircuitBreaker: activates the circuit breaker and the related service degradation policies;

  • @EnableHystrix: is itself annotated with @EnableCircuitBreaker and wraps it;

  • If you forget to add this annotation to the main program class, the Hystrix circuit breaker will not be active. When the service starts, you will not receive any warning or error messages;

2.4 use the @HystrixCommand annotation on business classes (circuit breaker pattern)

Use it in the business classes under the service package; it is a method-level annotation;

@HystrixCommand
private Xxx getXxx(String xxxId) {
    return xxxService.getXxx(xxxId);
}
  • The @HystrixCommand annotation dynamically generates a proxy that wraps the method and manages all calls to it through a thread pool dedicated to handling remote calls;
  • When a call takes longer than 1000 ms, the circuit breaker interrupts the call to getXxx() and throws the following exception:
    • com.netflix.hystrix.exception.HystrixRuntimeException;
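
The behavior described above (run the call on a dedicated pool, give up after a timeout) can be approximated with plain `java.util.concurrent`. This is only a sketch of the idea and not how hystrix-javanica is implemented: `TimeoutGuard` is a hypothetical class, the pool size of 2 is arbitrary, and `IllegalStateException` stands in for `HystrixRuntimeException`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the proxy that wraps a protected method.
public final class TimeoutGuard implements AutoCloseable {

    // Dedicated pool, so slow remote calls never occupy the caller's own threads.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    public <T> T run(Callable<T> remoteCall, long timeoutMillis) {
        Future<T> future = pool.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread that is still running
            throw new IllegalStateException(
                    "call timed out after " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("call failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("caller interrupted", e);
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

A caller would wrap the remote call, for example `guard.run(() -> xxxService.getXxx(xxxId), 1000)`, and treat the exception as the signal to fall back.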

2.5 customized circuit breaker (fallback strategy, bulkhead strategy)

  • By default, @HystrixCommand annotations without any attributes place all remote service calls in the same thread pool, which may cause problems in the application;
@HystrixCommand(
        fallbackMethod="getYyy",  //[optional] fallback strategy: the method called when the getXxx call fails. Its parameters must match those of getXxx and its return type must be compatible
        threadPoolKey="xxxThreadPool",  //[optional] bulkhead strategy: the unique name of the thread pool
        threadPoolProperties={
                @HystrixProperty(name="coreSize",value="30"),  //Maximum number of threads in the thread pool
                @HystrixProperty(name="maxQueueSize", value="10")  //A queue in front of the thread pool that holds incoming requests
        },
        //Customize the behavior of the circuit breaker through the commandProperties property
        commandProperties={
                @HystrixProperty(name="circuitBreaker.requestVolumeThreshold", value="10"),  //Minimum number of calls within the rolling window before the circuit breaker can trip
                @HystrixProperty(name="circuitBreaker.errorThresholdPercentage", value="75"),  //Percentage of failed calls that must be reached before the circuit breaker trips
                @HystrixProperty(name="circuitBreaker.sleepWindowInMilliseconds", value="7000"),  //How long Hystrix sleeps after the circuit breaker trips before letting one call through to check whether the service is healthy again

                @HystrixProperty(name="metrics.rollingStats.timeInMilliseconds", value="15000"),  //The window Hystrix uses to monitor service call problems. The default is 10,000 ms
                @HystrixProperty(name="metrics.rollingStats.numBuckets", value="5"),  //Number of buckets statistics are collected into within the rolling window

                //@HystrixProperty(name="execution.isolation.thread.timeoutInMilliseconds",value="5000"),  //Timeout of the circuit breaker
                @HystrixProperty(name="execution.isolation.strategy", value="SEMAPHORE"),  //Isolation strategy; note that the thread pool settings above take effect only with THREAD
        }
)
private Xxx getXxx(String xxxId) {
    return xxxService.getXxx(xxxId);
}

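//Fallback method: same parameter list as getXxx and a compatible return type.
//yyyService is a placeholder for an alternative data source.
private Xxx getYyy(String xxxId) {
    return yyyService.getXxx(xxxId);
}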

Detailed explanation of properties:

  • threadPoolProperties:

    • maxQueueSize: if it is set to -1, a Java SynchronousQueue holds all incoming requests. In essence, a synchronous queue enforces that the number of requests being processed never exceeds the number of threads available in the thread pool;
    • maxQueueSize: if it is set to a value greater than 0, a Java LinkedBlockingQueue is used, which lets requests queue up even when all threads are busy;
    • The maxQueueSize property can only be set when the thread pool is first initialized (for example, when the application starts). Hystrix lets you change the effective queue size dynamically through the queueSizeRejectionThreshold property, but that property only takes effect when maxQueueSize is greater than 0;
  • commandProperties:

    • execution.isolation.thread.timeoutInMilliseconds: sets the timeout of the circuit breaker. In real development, fix the underlying performance problem rather than raising the default timeout. If some service calls genuinely take longer than others, isolate them in a separate thread pool;
    • metrics.rollingStats.numBuckets: metrics.rollingStats.timeInMilliseconds must be evenly divisible by this value. In the example above, Hystrix uses a 15 s window and collects statistics into five buckets of 3 s each. The smaller the statistics window and the more buckets kept in it, the higher the CPU and memory usage on high-volume services;
    • execution.isolation.strategy: the circuit breaker can run commands under two different isolation strategies;
      • THREAD: the default; each protected Hystrix command runs in an isolated thread pool, separate from the calling thread;
      • SEMAPHORE: lightweight isolation, suited to high-volume services that use an asynchronous I/O programming model (for example on an asynchronous I/O container such as Netty);
  • For the other configuration properties, see section 1.6 "all configurations of hystrix" in this chapter;
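
The effect of maxQueueSize on a saturated pool can be reproduced with a plain `ThreadPoolExecutor`. The sketch below is an illustration under assumptions: `QueueDemo` and `accepted` are hypothetical names, the queue type is chosen the way the text above describes, and queueSizeRejectionThreshold is not modeled.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it mimics how maxQueueSize selects the queue type.
public final class QueueDemo {

    private QueueDemo() {
    }

    /** Returns how many of the submitted tasks the pool accepted. */
    public static int accepted(int coreSize, int maxQueueSize, int submitted) {
        BlockingQueue<Runnable> queue = maxQueueSize <= 0
                ? new SynchronousQueue<>()                  // no buffering at all
                : new LinkedBlockingQueue<>(maxQueueSize);  // bounded waiting room
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreSize, coreSize, 1, TimeUnit.MINUTES, queue);
        CountDownLatch release = new CountDownLatch(1);
        int count = 0;
        try {
            for (int i = 0; i < submitted; i++) {
                try {
                    // Every task blocks until released, so all threads stay busy.
                    pool.execute(() -> awaitQuietly(release));
                    count++;
                } catch (RejectedExecutionException e) {
                    // Threads and queue are full; a breaker would fall back here.
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return count;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With two threads and maxQueueSize of -1, only two of five simultaneous tasks are accepted; with a queue of 3, five of ten are accepted (two running plus three waiting).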

2.6 unified Hystrix configuration using class level annotations

  • @DefaultProperties: equivalent to modifying the default value of Hystrix configuration under this class;
@DefaultProperties(
    commandProperties = {
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "10000")
    }
)
class MyService {
    ...
}

3. Use HystrixConcurrencyStrategy to propagate the thread context

3.1 context isolation of hystrix

  • By default, Hystrix runs with the THREAD isolation strategy;
  • Each Hystrix command therefore runs in an isolated thread pool that does not share its context with the parent thread;
  • As a result, values the parent thread stores in a ThreadLocal are by default not available to methods called by the parent thread and protected by @HystrixCommand;
  • Solution: define a concurrency strategy that injects the parent thread's context into the threads managed by the Hystrix command;
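
The problem and the wrapping idea can be reproduced without Hystrix. In the sketch below, which uses only the JDK, `ContextDemo`, `CONTEXT`, and `withContext` are hypothetical names; `CONTEXT` stands in for a holder such as a user context, and the single-thread executor stands in for a Hystrix-managed pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo; CONTEXT stands in for something like a user context holder.
public final class ContextDemo {

    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    private ContextDemo() {
    }

    // Captures the caller's value now and restores it on the worker thread later.
    static <T> Callable<T> withContext(Callable<T> task) {
        String captured = CONTEXT.get();        // runs on the parent thread
        return () -> {
            CONTEXT.set(captured);              // runs on the pool thread
            try {
                return task.call();
            } finally {
                CONTEXT.remove();               // do not leak into the next task
            }
        };
    }

    /** Returns {value seen without wrapping, value seen with wrapping}. */
    public static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CONTEXT.set("user-42");
            Callable<String> read = CONTEXT::get;
            String plain = pool.submit(read).get();
            String wrapped = pool.submit(withContext(read)).get();
            return new String[] {plain, wrapped};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            CONTEXT.remove();
            pool.shutdown();
        }
    }
}
```

The unwrapped task sees no value on the pool thread, while the wrapped task sees the value captured on the parent thread, which is the effect a custom concurrency strategy aims for.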

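3.2 customize the Hystrix concurrency strategy class

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import com.netflix.hystrix.HystrixThreadPoolKey;
import com.netflix.hystrix.strategy.concurrency.HystrixConcurrencyStrategy;
import com.netflix.hystrix.strategy.concurrency.HystrixRequestVariable;
import com.netflix.hystrix.strategy.concurrency.HystrixRequestVariableLifecycle;
import com.netflix.hystrix.strategy.properties.HystrixProperty;

//Extend the base Hystrix concurrency strategy class
public class ThreadLocalAwareStrategy extends HystrixConcurrencyStrategy{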
    private HystrixConcurrencyStrategy existingConcurrencyStrategy;

    //Pass the existing concurrency policy into the constructor
    public ThreadLocalAwareStrategy(HystrixConcurrencyStrategy existingConcurrencyStrategy) {
        this.existingConcurrencyStrategy = existingConcurrencyStrategy;
    }

    @Override
    public BlockingQueue<Runnable> getBlockingQueue(int maxQueueSize) {
        return existingConcurrencyStrategy != null
                ? existingConcurrencyStrategy.getBlockingQueue(maxQueueSize)
                : super.getBlockingQueue(maxQueueSize);
    }

    @Override
    public <T> HystrixRequestVariable<T> getRequestVariable(
            HystrixRequestVariableLifecycle<T> rv) {
        return existingConcurrencyStrategy != null
                ? existingConcurrencyStrategy.getRequestVariable(rv)
                : super.getRequestVariable(rv);
    }

    @Override
    public ThreadPoolExecutor getThreadPool(HystrixThreadPoolKey threadPoolKey,
                                            HystrixProperty<Integer> corePoolSize,
                                            HystrixProperty<Integer> maximumPoolSize,
                                            HystrixProperty<Integer> keepAliveTime, TimeUnit unit,
                                            BlockingQueue<Runnable> workQueue) {
        return existingConcurrencyStrategy != null
                ? existingConcurrencyStrategy.getThreadPool(threadPoolKey, corePoolSize,
                maximumPoolSize, keepAliveTime, unit, workQueue)
                : super.getThreadPool(threadPoolKey, corePoolSize, maximumPoolSize,
                keepAliveTime, unit, workQueue);
    }

    @Override
    public <T> Callable<T> wrapCallable(Callable<T> callable) {
         //Inject Callable implementation to set UserContext
        return existingConcurrencyStrategy != null
                ? existingConcurrencyStrategy
                .wrapCallable(new DelegatingUserContextCallable<T>(callable, UserContextHolder.getContext()))
                : super.wrapCallable(new DelegatingUserContextCallable<T>(callable, UserContextHolder.getContext()));
    }
}

3.3 define a Java Callable class and inject UserContext into the Hystrix command

public final class DelegatingUserContextCallable<V> implements Callable<V> {
    private final Callable<V> delegate;
    private UserContext originalUserContext;
    //Takes the original Callable, which invokes the Hystrix-protected code, and the UserContext coming from the parent thread
    public DelegatingUserContextCallable(Callable<V> delegate,
                                             UserContext userContext) {
        this.delegate = delegate;
        this.originalUserContext = userContext;
    }
    //call() is invoked before the method protected by @HystrixCommand runs
    public V call() throws Exception {
        //Set the UserContext: the ThreadLocal that stores it is now bound to the thread running the Hystrix-protected method
        UserContextHolder.setContext( originalUserContext );

        try {
            //Once the UserContext is set, invoke call() on the Hystrix-protected method
            return delegate.call();
        }
        finally {
            this.originalUserContext = null;
        }
    }

    public static <V> Callable<V> create(Callable<V> delegate,
                                         UserContext userContext) {
        return new DelegatingUserContextCallable<V>(delegate, userContext);
    }
}

3.4 configure Spring Cloud to use the custom Hystrix concurrency strategy

@Configuration
public class ThreadLocalConfiguration {
        //When the configuration object is constructed, any existing HystrixConcurrencyStrategy is autowired in
        @Autowired(required = false)
        private HystrixConcurrencyStrategy existingConcurrencyStrategy;

        @PostConstruct
        public void init() {
            //Keep references to the existing Hystrix plugins
            //Registering a new concurrency strategy requires grabbing all the other Hystrix components and then resetting the Hystrix plugins
            HystrixEventNotifier eventNotifier = HystrixPlugins.getInstance()
                    .getEventNotifier();
            HystrixMetricsPublisher metricsPublisher = HystrixPlugins.getInstance()
                    .getMetricsPublisher();
            HystrixPropertiesStrategy propertiesStrategy = HystrixPlugins.getInstance()
                    .getPropertiesStrategy();
            HystrixCommandExecutionHook commandExecutionHook = HystrixPlugins.getInstance()
                    .getCommandExecutionHook();

            HystrixPlugins.reset();

            //Use the Hystrix plug-in to register a custom Hystrix concurrency strategy
            HystrixPlugins.getInstance().registerConcurrencyStrategy(new ThreadLocalAwareStrategy(existingConcurrencyStrategy));
            //Re register all components used by the Hystrix plug-in
            HystrixPlugins.getInstance().registerEventNotifier(eventNotifier);
            HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
            HystrixPlugins.getInstance().registerPropertiesStrategy(propertiesStrategy);
            HystrixPlugins.getInstance().registerCommandExecutionHook(commandExecutionHook);
        }
}

4. Use Hystrix Dashboard to realize service monitoring

  • Hystrix provides near-real-time call monitoring (Hystrix Dashboard). It continuously records the execution information of all requests issued through Hystrix and presents it as statistical reports and graphs, including how many requests were executed, how many succeeded, how many failed, and so on;
  • Netflix exposes these metrics through the hystrix-metrics-event-stream project;
  • Spring Cloud also integrates Hystrix Dashboard, which turns the monitoring data into a visual interface;

4.1 add the pom.xml dependency

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
</dependency>

4.2 modify the application.yml configuration file

Mainly change the port number;

server:
  port: 9001

4.3 annotate the main application class

@EnableHystrixDashboard: enables the Hystrix Dashboard for call monitoring;

4.4 configure monitored services

1. Add the pom.xml dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

2. Specify the monitoring path in the main startup class

@SpringBootApplication
@EnableEurekaClient //After the service is started, it will be automatically registered into eureka service
@EnableCircuitBreaker //Enable support for the Hystrix circuit breaker mechanism
public class Application{
    public static void main(String[] args){
        SpringApplication.run(Application.class,args);
    }

    /**
     * This configuration is for service monitoring and has nothing to do with fault tolerance itself.
     * It became necessary after a Spring Cloud upgrade: Spring Boot's default path for the
     * ServletRegistrationBean is no longer "/hystrix.stream",
     * so register the following servlet in your own project.
     */
    @Bean
    public ServletRegistrationBean getServlet() {
        HystrixMetricsStreamServlet streamServlet = new HystrixMetricsStreamServlet();
        ServletRegistrationBean registrationBean = new ServletRegistrationBean(streamServlet);
        registrationBean.setLoadOnStartup(1);
        registrationBean.addUrlMappings("/hystrix.stream");
        registrationBean.setName("HystrixMetricsStreamServlet");
        return registrationBean;
    }
}

4.5 access to graphical interface

4.6 viewing the monitoring diagram

  • Example of monitoring chart:

  • Legend explanation of monitoring diagram:


  • Solid circle: has two meanings:
    • Its color represents the health of the instance, which decreases in the order green, yellow, orange, red;
    • Its size changes with the request traffic of the instance: the higher the traffic, the larger the circle;
    • Together, these let you quickly spot failing instances and heavily loaded instances among a large number of instances;
  • Curve:
    • It records the relative change in traffic over the last 2 minutes, so you can observe rising and falling trends;

Finally

This is a beginner's write-up; if you find mistakes, please point them out. Thank you! If you reprint this article, please credit the source.

Keywords: Distribution Spring Cloud Hystrix microservice

Added by dzelenika on Wed, 02 Feb 2022 13:03:08 +0200