Principles and practice of three circuit breaker frameworks for distributed systems
E-commerce spring4lj project( https://gitee.com/gz-yami/mall4j )
With the popularity of microservices, circuit breaking has become a widely known and important technique. When the quality of a running microservice falls below a certain threshold, the circuit breaker trips and calls to that microservice are suspended for a period of time, so that the back-end service is not brought down by sustained overload. This article introduces Hystrix, its next-generation replacement Resilience4j, and Sentinel, which is open-sourced by Alibaba. If you find any mistakes, please point them out.
1. Why do we need a circuit breaker?
The circuit breaker pattern takes its name from Martin Fowler's article "Circuit Breaker". An electrical circuit breaker is a switching device that protects a circuit from overload: when a short circuit occurs, it cuts off the faulty circuit in time to prevent serious consequences such as overload, overheating or even fire.
In a distributed architecture the circuit breaker pattern plays a similar role. When a service unit fails (like a short circuit in an appliance), the circuit breaker's fault monitoring (like a fuse blowing) returns an error response to the caller immediately instead of letting it wait for a long time. Threads are therefore not tied up for long periods by calls to the faulty service, which prevents the fault from spreading through the distributed system.
To address these problems, circuit breaker frameworks implement a range of service-protection functions such as circuit breaking, thread isolation and flow control. They guard the points at which an application accesses other systems, services and third-party libraries, and so provide stronger tolerance of latency and failures.
2. Hystrix
2.1 What is Hystrix
Hystrix is an open-source framework from Netflix. Its two most important uses are dependency isolation and fault tolerance with degradation (fallback); it also supports request merging.
2.2 A simple Hystrix example
2.2.1 Create a new Hystrix project and add the dependency
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
    </dependency>
2.2.2 Add the @EnableCircuitBreaker annotation to the startup class to enable the circuit breaker
    @SpringBootApplication
    @EnableCircuitBreaker // enable the circuit breaker
    public class TestApplication extends SpringBootServletInitializer {

        public static void main(String[] args) {
            // Start the class that carries the annotations above
            SpringApplication.run(TestApplication.class, args);
        }
    }
2.2.3 Add circuit breaker logic to TestProductController
    @RequestMapping("/get/{id}")
    @HystrixCommand(fallbackMethod = "errorCallBack")
    public Object get(@PathVariable("id") long id) {
        Product p = productService.findById(id);
        if (p == null) {
            // For testing: when the product does not exist, throw so that the call is degraded
            throw new RuntimeException("No such product found");
        }
        return p;
    }

    // Fallback (degradation) method; it takes the same parameters as get()
    public Object errorCallBack(@PathVariable("id") long id) {
        return id + " non-existent, error";
    }
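To make the role of fallbackMethod more concrete, the following is a minimal plain-Java sketch of the "try the primary call, fall back on failure" idea. It is not how Hystrix is implemented (Hystrix also adds isolation, timeouts and circuit breaking around the call); the class FallbackDecorator and the method withFallback are hypothetical names used only for illustration.

```java
import java.util.function.Function;

/** Hypothetical helper illustrating the fallback idea; not part of Hystrix. */
class FallbackDecorator {

    /** Returns a function that calls primary and, if it throws, calls fallback with the same argument. */
    static <A, R> Function<A, R> withFallback(Function<A, R> primary, Function<A, R> fallback) {
        return arg -> {
            try {
                return primary.apply(arg);
            } catch (RuntimeException e) {
                // Degrade gracefully instead of propagating the error to the caller
                return fallback.apply(arg);
            }
        };
    }
}
```

As in the controller above, the fallback receives the same argument as the primary call, so it can build a degraded response for that specific id.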
2.3 Summary
This section briefly introduced what Hystrix does and showed a simple example. Because official development of Hystrix has stopped, it is not covered in more depth here.
3. Resilience4j
3.1 Introduction
Since official development of Hystrix stopped, the Hystrix team has recommended Resilience4j as its next-generation replacement. Resilience4j is a lightweight, easy-to-use fault tolerance library inspired by Netflix Hystrix, but designed for Java 8 and functional programming. The library depends only on Vavr (formerly known as Javaslang) and has no other external dependencies. In contrast, Netflix Hystrix has a compile dependency on Archaius, which in turn has more external library dependencies such as Guava and Apache Commons Configuration. With Resilience4j you do not need to pull in everything: just select the functional modules you need.
3.2 Module composition
Resilience4j provides several core modules:
- resilience4j-circuitbreaker: circuit breaking
- resilience4j-ratelimiter: rate limiting
- resilience4j-bulkhead: bulkheading (limiting concurrent calls)
- resilience4j-retry: automatic retry (synchronous and asynchronous)
- resilience4j-timelimiter: timeout handling
- resilience4j-cache: result caching
3.3 Maven setup
Add the dependency:
    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-circuitbreaker</artifactId>
        <version>0.13.2</version>
    </dependency>
3.4 Circuit breaker
Note that to use this feature we need the resilience4j-circuitbreaker dependency shown above.
The circuit breaker pattern helps us prevent cascading failures when a remote service is down.
After a number of failed requests, we consider the service unavailable or overloaded and short-circuit all subsequent requests, which saves system resources. Let's see how this can be achieved with Resilience4j.
First, we need to define the settings to use. The easiest way is to use the default settings:
CircuitBreakerRegistry circuitBreakerRegistry = CircuitBreakerRegistry.ofDefaults();
You can also use custom parameters:
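    CircuitBreakerConfig config = CircuitBreakerConfig.custom()
        .failureRateThreshold(20)
        .ringBufferSizeInClosedState(5)
        .build();
Here we set the failure rate threshold to 20% and the ring buffer size in the closed state to 5, so the failure rate is evaluated only after at least 5 calls have been recorded.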
Then, we create a CircuitBreaker object and call the remote service through it:
    interface RemoteService {
        int process(int i);
    }

    // service is a RemoteService instance (a Mockito mock in the test below)
    CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
    CircuitBreaker circuitBreaker = registry.circuitBreaker("my");
    Function<Integer, Integer> decorated = CircuitBreaker
        .decorateFunction(circuitBreaker, service::process);
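Finally, let's verify this behaviour with a JUnit test.
We attempt to call the service 10 times, and every call fails. Once the ring buffer of 5 calls is full, the failure rate (100%) is above the 20% threshold, so the circuit breaker opens and the remaining calls are rejected without reaching the service. We can therefore verify that the service was called exactly 5 times.
    when(service.process(any(Integer.class))).thenThrow(new RuntimeException());

    for (int i = 0; i < 10; i++) {
        try {
            decorated.apply(i);
        } catch (Exception ignore) {
            // expected: the mock throws, and later calls are rejected by the open circuit breaker
        }
    }

    verify(service, times(5)).process(any(Integer.class));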
A circuit breaker has three states:
- Closed - the service is working normally and no calls are short-circuited
- Open - the remote service is down and all requests are short-circuited
- Half open - after a configured amount of time in the open state, the circuit breaker lets requests through to check whether the remote service has recovered
We can configure the following settings:
- The failure rate threshold above which the CircuitBreaker opens
- The wait duration, which defines how long the CircuitBreaker stays open before it switches to half open
- The size of the ring buffer when the CircuitBreaker is half open or closed
- A custom CircuitBreakerEventListener that handles CircuitBreaker events
- A custom predicate that evaluates whether an exception counts as a failure and therefore increases the failure rate
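The three states and the main settings above can be sketched in plain Java as follows. This is only an illustration under simplifying assumptions, not the Resilience4j implementation: it counts outcomes in fixed batches instead of a ring buffer, uses the same batch size in the closed and half-open states, and treats every RuntimeException as a failure. The class SimpleCircuitBreaker, its constructor parameters and the injectable clock are hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative three-state circuit breaker; a sketch, not the Resilience4j implementation. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;              // calls recorded before the failure rate is evaluated
    private final double failureRateThreshold; // percentage, e.g. 20.0 for 20%
    private final long waitMillisInOpenState;  // how long to stay OPEN before probing again
    private final LongSupplier clockMillis;    // injectable clock (an assumption, for easy testing)

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold,
                         long waitMillisInOpenState, LongSupplier clockMillis) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.waitMillisInOpenState = waitMillisInOpenState;
        this.clockMillis = clockMillis;
    }

    /** Current state; moves from OPEN to HALF_OPEN once the wait time has elapsed. */
    synchronized State getState() {
        if (state == State.OPEN && clockMillis.getAsLong() - openedAt >= waitMillisInOpenState) {
            state = State.HALF_OPEN;
            calls = 0;
            failures = 0;
        }
        return state;
    }

    /** Runs the supplier unless the breaker is OPEN, in which case it fails fast. */
    <T> T call(Supplier<T> supplier) {
        if (getState() == State.OPEN) {
            throw new IllegalStateException("circuit breaker is open");
        }
        try {
            T result = supplier.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            throw e;
        }
    }

    /** Records one outcome and re-evaluates the state once the batch is full. */
    private synchronized void record(boolean failed) {
        calls++;
        if (failed) {
            failures++;
        }
        if (calls < windowSize) {
            return;
        }
        double failureRate = 100.0 * failures / calls;
        if (failureRate >= failureRateThreshold) {
            state = State.OPEN;
            openedAt = clockMillis.getAsLong();
        } else {
            state = State.CLOSED;
        }
        calls = 0;
        failures = 0;
    }
}
```

With a batch size of 5 and a threshold of 20%, five failing calls open the breaker, later calls are rejected without reaching the service, and after the wait time a batch of successful trial calls closes it again.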
3.5 Rate limiter
This feature requires the resilience4j-ratelimiter dependency.
A simple example:
    RateLimiterConfig config = RateLimiterConfig.custom().limitForPeriod(2).build();
    RateLimiterRegistry registry = RateLimiterRegistry.of(config);
    RateLimiter rateLimiter = registry.rateLimiter("my");
    Function<Integer, Integer> decorated
        = RateLimiter.decorateFunction(rateLimiter, service::process);
Now all calls to the decorated function are subject to the rate limiter configuration.
We can configure the following parameters:
- The period of the limit refresh
- The permissions limit for each refresh period
- The default wait duration for a permission
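How a refresh period and a per-period permission limit interact can be sketched in plain Java as below. This is an assumption-laden illustration rather than Resilience4j's algorithm: it rejects immediately instead of waiting for a permission, and the class SimpleRateLimiter, its parameters and the injectable clock are hypothetical.

```java
import java.util.function.LongSupplier;

/** Illustrative fixed-period rate limiter; a sketch, not the Resilience4j implementation. */
class SimpleRateLimiter {
    private final int limitForPeriod;        // permissions available in each period
    private final long refreshPeriodMillis;  // length of one period
    private final LongSupplier clockMillis;  // injectable clock (an assumption, for easy testing)

    private long periodStart;
    private int used;

    SimpleRateLimiter(int limitForPeriod, long refreshPeriodMillis, LongSupplier clockMillis) {
        this.limitForPeriod = limitForPeriod;
        this.refreshPeriodMillis = refreshPeriodMillis;
        this.clockMillis = clockMillis;
        this.periodStart = clockMillis.getAsLong();
    }

    /** Takes one permission if any is left in the current period; never blocks. */
    synchronized boolean tryAcquire() {
        long now = clockMillis.getAsLong();
        long elapsed = now - periodStart;
        if (elapsed >= refreshPeriodMillis) {
            // Move to the start of the period that contains "now" and refresh the permissions
            periodStart += (elapsed / refreshPeriodMillis) * refreshPeriodMillis;
            used = 0;
        }
        if (used < limitForPeriod) {
            used++;
            return true;
        }
        return false;
    }
}
```

With limitForPeriod set to 2, the third call within the same period is rejected, and the next period starts with two fresh permissions.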
3.6 Bulkhead isolation
Here we need the resilience4j-bulkhead dependency, which lets us limit the number of concurrent calls to a particular service.
Let's look at an example that uses the Bulkhead API to configure the maximum number of concurrent calls:
    BulkheadConfig config = BulkheadConfig.custom().maxConcurrentCalls(1).build();
    BulkheadRegistry registry = BulkheadRegistry.of(config);
    Bulkhead bulkhead = registry.bulkhead("my");
    Function<Integer, Integer> decorated
        = Bulkhead.decorateFunction(bulkhead, service::process);
To test this, we can call a method of a mock service that never returns. While that call is in progress, we verify that the Bulkhead does not permit any other call:
    CountDownLatch latch = new CountDownLatch(1);
    when(service.process(anyInt())).thenAnswer(invocation -> {
        latch.countDown();
        Thread.currentThread().join(); // block so that the call never completes
        return null;
    });

    ForkJoinTask<?> task = ForkJoinPool.commonPool().submit(() -> {
        try {
            decorated.apply(1);
        } finally {
            bulkhead.onComplete();
        }
    });
    latch.await();
    assertThat(bulkhead.isCallPermitted()).isFalse();
We can configure the following settings:
- The maximum number of parallel executions allowed
- The maximum amount of time a thread will wait when trying to enter a saturated bulkhead
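The two settings above map naturally onto a counting semaphore. The following plain-Java sketch shows the idea; it is not the Resilience4j implementation, and the class SimpleBulkhead and its parameter names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative semaphore-based bulkhead; a sketch, not the Resilience4j implementation. */
class SimpleBulkhead {
    private final Semaphore permits;   // one permit per allowed concurrent call
    private final long maxWaitMillis;  // how long a caller waits when the bulkhead is saturated

    SimpleBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs the supplier if a permit is obtained in time, otherwise rejects the call. */
    <T> T call(Supplier<T> supplier) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        if (!acquired) {
            throw new IllegalStateException("bulkhead is full");
        }
        try {
            return supplier.get();
        } finally {
            // Always give the permit back, even if the call throws
            permits.release();
        }
    }

    /** Number of calls that could start right now. */
    int availableCalls() {
        return permits.availablePermits();
    }
}
```

With maxConcurrentCalls set to 1 and no wait time, a second call that starts while the first is still running is rejected immediately.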
3.7 Retry
Here we need the resilience4j-retry dependency. With Retry, a failed call can be retried automatically:
    RetryConfig config = RetryConfig.custom().maxAttempts(2).build();
    RetryRegistry registry = RetryRegistry.of(config);
    Retry retry = registry.retry("my");
    Function<Integer, Void> decorated
        = Retry.decorateFunction(retry, (Integer s) -> {
            service.process(s);
            return null;
        });
Now, let's simulate an exception thrown during a remote service call and make sure that the library automatically retries the failed call:
    when(service.process(anyInt())).thenThrow(new RuntimeException());
    try {
        decorated.apply(1);
        fail("Expected an exception to be thrown if all retries failed");
    } catch (Exception e) {
        verify(service, times(2)).process(any(Integer.class));
    }
We can also configure:
- The maximum number of attempts
- The wait duration before retrying
- A custom function that modifies the wait interval after a failure
- A custom predicate that evaluates whether an exception should cause the call to be retried
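As a rough illustration of how these four settings fit together, here is a plain-Java retry loop. It is a sketch under stated assumptions, not the Resilience4j implementation: the class SimpleRetry and its parameters are hypothetical, and only RuntimeException is handled.

```java
import java.util.function.IntToLongFunction;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Illustrative retry loop; a sketch, not the Resilience4j implementation. */
class SimpleRetry {
    private final int maxAttempts;                        // total attempts, including the first call
    private final IntToLongFunction waitMillisForAttempt; // failed attempt number (1-based) -> wait in ms
    private final Predicate<Throwable> retryOn;           // decides whether an exception is retried

    SimpleRetry(int maxAttempts, IntToLongFunction waitMillisForAttempt, Predicate<Throwable> retryOn) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
        this.waitMillisForAttempt = waitMillisForAttempt;
        this.retryOn = retryOn;
    }

    /** Calls the supplier until it succeeds, the attempts run out, or the predicate declines. */
    <T> T call(Supplier<T> supplier) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return supplier.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts || !retryOn.test(e)) {
                    break;
                }
                pause(waitMillisForAttempt.applyAsLong(attempt));
            }
        }
        // Rethrow the last failure once retrying has stopped
        throw last;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

Passing a function such as attempt -> 100L * attempt for the wait would give a linearly growing interval between attempts.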
3.8 Caching
The cache module requires the resilience4j-cache dependency. The initialization code is as follows:
    javax.cache.Cache cache = ...; // Use appropriate cache here
    Cache<Integer, Integer> cacheContext = Cache.of(cache);
    Function<Integer, Integer> decorated
        = Cache.decorateSupplier(cacheContext, () -> service.process(1));
Caching here is backed by a JSR-107 Cache implementation, and Resilience4j provides a way to apply it.
Note that there is no API for decorating functions (such as Cache.decorateFunction(Function)); the API only supports the Supplier and Callable types.
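The shape of a decorated supplier, a function from cache key to value that only invokes the supplier on a miss, can be sketched in plain Java with a Map standing in for the JSR-107 cache. The class SimpleCache is hypothetical and this is not the Resilience4j implementation.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

/** Illustrative cache decorator backed by a Map; a sketch, not the Resilience4j implementation. */
class SimpleCache {

    /** Returns a function from cache key to value; the supplier runs only when the key is missing. */
    static <K, V> Function<K, V> decorateSupplier(Map<K, V> cache, Supplier<V> supplier) {
        return key -> cache.computeIfAbsent(key, k -> supplier.get());
    }
}
```

Calling the returned function twice with the same key invokes the supplier only once; a thread-safe map such as ConcurrentHashMap would be a reasonable backing store.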
3.9 Time limiter
For this module we need the resilience4j-timelimiter dependency. TimeLimiter lets us limit the amount of time spent calling a remote service.
To make testing easy, we set up a TimeLimiter with a configured timeout of 1 ms:
    long ttl = 1;
    TimeLimiterConfig config
        = TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(ttl)).build();
    TimeLimiter timeLimiter = TimeLimiter.of(config);
Next, let's verify that Resilience4j calls Future.get() with the expected timeout:
    Future futureMock = mock(Future.class);
    Callable restrictedCall
        = TimeLimiter.decorateFutureSupplier(timeLimiter, () -> futureMock);
    restrictedCall.call();

    verify(futureMock).get(ttl, TimeUnit.MILLISECONDS);
We can also combine it with a circuit breaker:
    Callable chainedCallable
        = CircuitBreaker.decorateCallable(circuitBreaker, restrictedCall);
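The test above checks that Future.get() is called with a timeout. A plain-Java sketch of that mechanism is shown below; it illustrates the idea only and is not the Resilience4j implementation. The class SimpleTimeLimiter and the method callWithTimeout are hypothetical, and cancelling the future on timeout is a choice made for this sketch.

```java
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Illustrative time limiter built on Future.get with a timeout; not the Resilience4j implementation. */
class SimpleTimeLimiter {

    /** Waits at most timeoutMillis for the future's result; on timeout, cancels it and rethrows. */
    static <T> T callWithTimeout(Supplier<Future<T>> futureSupplier, long timeoutMillis) throws Exception {
        Future<T> future = futureSupplier.get();
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Stop waiting for the slow call and ask it to stop
            future.cancel(true);
            throw e;
        }
    }
}
```

A future that never completes results in a TimeoutException after the configured time, while an already completed future returns its value immediately.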
3.10 Additional modules
Resilience4j also provides a number of add-on modules that simplify its integration with popular frameworks and libraries.
Some common integrations are:
- Spring Boot – resilience4j-spring-boot module
- Ratpack – resilience4j-ratpack module
- Retrofit – resilience4j-retrofit module
- Vertx – resilience4j-vertx module
- Dropwizard – resilience4j-metrics module
- Prometheus – resilience4j-prometheus module
3.11 Summary
In this section we looked at the basic use of the different parts of the Resilience4j library and at how it can solve various fault tolerance problems in communication between servers. The source code of Resilience4j can be found on GitHub.
4. Sentinel
4.1 What is Sentinel?
Sentinel is a lightweight flow control component for distributed service architectures, open-sourced by Alibaba. It takes traffic as its entry point and protects the stability of microservices along several dimensions, including rate limiting, traffic shaping, circuit breaking with degradation, and system load protection.
4.2 Sentinel has the following characteristics:
- Rich application scenarios: Sentinel has supported the core scenarios of Alibaba's Double 11 traffic promotion for nearly 10 years, such as flash sales (keeping burst traffic within the range the system can handle), message peak shaving and valley filling, cluster flow control, and real-time circuit breaking of unavailable downstream applications.
- Complete real-time monitoring: Sentinel also provides real-time monitoring. In the console you can see second-level data for a single machine of a connected application, and even aggregated data for clusters of fewer than 500 machines.
- Extensive open-source ecosystem: Sentinel provides out-of-the-box integration modules for other open-source frameworks and libraries, such as Spring Cloud, Dubbo and gRPC. You only need to add the corresponding dependency and a little configuration to start using Sentinel.
- Comprehensive SPI extension points: Sentinel provides simple, easy-to-use and comprehensive SPI extension interfaces. By implementing them you can quickly customize its logic, for example custom rule management or adapters for dynamic data sources.
4.3 Working mechanism:
- It provides adapters for mainstream frameworks, or explicit APIs, for defining the resources to be protected, and provides facilities for real-time statistics and call-chain analysis of those resources.
- It controls traffic according to preset rules combined with the real-time statistics of each resource. Sentinel also provides open interfaces so that you can define and change rules.
- It provides a real-time monitoring system so that you can quickly understand the current state of the system.
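The interplay of resources, rules and real-time statistics described above can be sketched in plain Java as follows. This does not use or reproduce the Sentinel API: the class MiniFlowControl, its methods, the resource name in the usage note and the one-second counting window are all hypothetical simplifications.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch of rule-based flow control per resource; not the Sentinel API. */
class MiniFlowControl {
    // Rules: resource name -> maximum number of passes per second
    private final Map<String, Integer> qpsRules = new ConcurrentHashMap<>();
    // Statistics: resource name -> {current second, passes counted in that second}
    private final Map<String, long[]> stats = new ConcurrentHashMap<>();
    private final LongSupplier clockMillis; // injectable clock (an assumption, for easy testing)

    MiniFlowControl(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Rules can be loaded or replaced at any time. */
    void loadRule(String resource, int maxQps) {
        qpsRules.put(resource, maxQps);
    }

    /** Returns true if the call to the resource may proceed, false if its rule blocks it. */
    boolean tryEnter(String resource) {
        long second = clockMillis.getAsLong() / 1000;
        long[] window = stats.computeIfAbsent(resource, r -> new long[] {second, 0});
        synchronized (window) {
            if (window[0] != second) {
                // A new second has started: reset the counter
                window[0] = second;
                window[1] = 0;
            }
            Integer limit = qpsRules.get(resource);
            if (limit != null && window[1] >= limit) {
                return false;
            }
            window[1]++;
            return true;
        }
    }
}
```

A caller would wrap the protected code in a check such as if (flow.tryEnter("getProduct")) { ... } else { ... }, and changing the rule takes effect on the next call.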
4.4 Sentinel summary:
Sentinel is a high-availability traffic protection component for distributed service architectures. As Alibaba's circuit breaker middleware, Sentinel has supported the core scenarios of Alibaba's Double 11 traffic promotion for nearly 10 years, and it stands out for the high availability and stability of its traffic protection.
5. Summary
The performance comparison of the three mainstream circuit breaker frameworks is shown in the table: