Recently, the business has grown rapidly, which is also a big challenge for our background. The performance bottleneck of the core business interface encountered this time is not caused by a single problem, but a combination of several problems: after we solve one, we launch it, and then we find another performance bottleneck. This is also my lack of experience, which leads to my failure to locate and solve it all at once; And I have stubborn self-esteem for our whole backstage team. I don't want to survive the pressure peak through a large number of horizontal expansion, resulting in different degrees of problems online for several consecutive nights, which must have an impact on our business growth. This is also where I am immature and want to reflect. This series of articles mainly record the general technical optimization of our background micro service system for this business growth, and the optimization of business process and cache. Since it is only applicable to our business, it will not be repeated here. This series will be divided into the following articles:
- Improved client load balancing algorithm
- Develop a filter plug-in for the log output exception stack
- Improving asynchronous log waiting strategy for x86 cloud environment
- To increase the monitoring of HTTP request waiting queue of synchronous microservices and cloud deployment, we need to be careful that the request response is slow due to reaching the upper limit of instance network traffic
- Add necessary intrusive monitoring for key business of the system
Develop a filter plug-in for the log output exception stack
We usually print the log when an exception occurs, and there is an exception stack in the log.
When a large number of timeout exceptions occur online due to a basic component or a slow storage, if the complete exception stack is printed, a large number of logs will be output at once, resulting in a bottleneck in writing logs (although we use the asynchronous log of Log4j2, if the RingBuffer is full, the output logs will still lead to business thread blocking). As a result, other normal businesses in the same microservice become abnormal.
And we found that
Why add - XX:-OmitStackTraceInFastThrow in JVM parameters
To avoid this problem, the first thing you might think of is to remove - XX:-OmitStackTraceInFastThrow from the JVM parameter. This OmitStackTraceInFastThrow is enabled by default. Its function is to automatically omit the exception stack when a JDK built-in exception is thrown multiple times through a method (usually NullPointerException). In fact, it is throwable Getourstacktrace() returns an empty array. The underlying implementation is that when these exceptions are thrown, they will be recorded in the method of the method_data. When these exceptions are thrown, check the method of the corresponding method_data whether these exceptions are thrown too many times. If so, use the exception object without stack to replace the original exception object, so that the exception stack is ignored. Corresponding source code:
graphKit.cpp
bool treat_throw_as_hot = false; ciMethodData* md = method()->method_data(); if (ProfileTraps) { //If there are too many times, treat_throw_as_hot is true if (too_many_traps(reason)) { treat_throw_as_hot = true; } if (C->trap_count(reason) != 0 && method()->method_data()->trap_count(reason) != 0 && has_ex_handler()) { treat_throw_as_hot = true; } } if (treat_throw_as_hot && (!StackTraceInThrowable || OmitStackTraceInFastThrow)) { ciInstance* ex_obj = NULL; switch (reason) { case Deoptimization::Reason_null_check: //For NullPointerException, return the built-in NullPointerException object of the corresponding empty stack ex_obj = env()->NullPointerException_instance(); break; case Deoptimization::Reason_div0_check: //For ArithmeticException, return the built-in ArithmeticException object of the corresponding empty stack ex_obj = env()->ArithmeticException_instance(); break; case Deoptimization::Reason_range_check: //For ArrayIndexOutOfBounds, return the built-in nullpoarrayindexoutofboundsinerexception object of the corresponding empty stack ex_obj = env()->ArrayIndexOutOfBoundsException_instance(); break; case Deoptimization::Reason_class_check: if (java_bc() == Bytecodes::_aastore) { //For ArrayStoreException, return the built-in ArrayStoreException object of the corresponding empty stack ex_obj = env()->ArrayStoreException_instance(); } else { //For ClassCastException, return the built-in ClassCastException object of the corresponding empty stack ex_obj = env()->ClassCastException_instance(); } break; default: break; }
However, we usually add - XX:-OmitStackTraceInFastThrow to the JVM startup parameters to turn it off. The main reasons are:
- The parameter OmitStackTraceInFastThrow is only for some Java built-in exceptions (listed in the above source code), and is not useful for exceptions customized by us or the framework.
- Whether there is too much analysis is only for the methods that throw exceptions, but whether it is the same call path is not considered.
- Microservice threads may run for a long time. Our business logs are very large, and a new file is generated every hour. Suppose a method throws a certain amount of NullPointerException every day, but it is not many and is not found in time. Over time, one day suddenly there is no stack. Later, if this method throws a lot of nullpointerexceptions, but we can't see the stack, we have to turn over the previous logs one by one, which is too inefficient.
Our requirements for exception logs
Since various frameworks are used in our project, and some use asynchronous frameworks, the exception stack will be very, very long (some even have 1000 lines), so the most suitable ones are:
- The exception stack is output for each exception
- However, the exception stack only includes the packages we care about, and other packages are omitted to prevent a large number of logs from being output due to the long exception stack.
Log4j2 official configuration of exceptions
Log4j2 official only provides the configuration of blacklist packages, that is, the exception stack of which packages is omitted; Also about streamlining the exception stack is the first few lines of the output log, but we can't guarantee that the log we care about must be in the first few lines of the log, especially for the exception stack of asynchronous responsive code calls.
However, these configurations of Log4j2 are implemented through the Pattern plug-in, or we can develop a custom Pattern plug-in.
Develop custom Pattern plug-ins
First of all, Log4j2 official only provides the configuration of blacklist packages. We can reverse the judgment logic of the configuration of blacklist packages. In fact, it becomes a white list. Other configuration formats are analyzed as usual. So we copied the code ThrowableProxyRenderer of Log4j2, named it our customized ThrowableProxyRenderer, and changed the package private method to public, so that we can call it anywhere. The blacklist judgment is reversed and changed to white list:
Complete source code reference: GitHub com/JoJoTec/spr…
public class CustomizedThrowableProxyRenderer { //Omit methods that have not been modified //The method is changed to public public static void formatExtendedStackTraceTo(final ThrowableProxy src, final StringBuilder sb, final List<String> whiteListedPackages, final TextRenderer textRenderer, final String suffix, final String lineSeparator) { textRenderer.render(src.getName(), sb, "Name"); textRenderer.render(": ", sb, "NameMessageSeparator"); textRenderer.render(src.getMessage(), sb, "Message"); renderSuffix(suffix, sb, textRenderer); textRenderer.render(lineSeparator, sb, "Text"); final StackTraceElement[] causedTrace = src.getThrowable() != null ? src.getThrowable().getStackTrace() : null; formatElements(sb, Strings.EMPTY, 0, causedTrace, src.getExtendedStackTrace(), whiteListedPackages, textRenderer, suffix, lineSeparator); formatSuppressed(sb, TAB, src.getSuppressedProxies(), whiteListedPackages, textRenderer, suffix, lineSeparator); formatCause(sb, Strings.EMPTY, src.getCauseProxy(), whiteListedPackages, textRenderer, suffix, lineSeparator); } //The original blackListElement method is changed to whiteListedElement //The result is reversed and changed to white list private static boolean whiteListedElement(final StackTraceElement element, final List<String> whiteListedPackages) { if (whiteListedPackages != null) { final String className = element.getClassName(); for (final String pkg : whiteListedPackages) { if (className.startsWith(pkg)) { return true; } } } return false; } }
Then, develop our customized Log4j2 Pattern plug-in:
CustomizedThrowablePatternConverter.java
@Plugin(name = "CustomizedThrowablePatternConverter", category = PatternConverter.CATEGORY) @ConverterKeys({ "cusEx", "cusThrowable", "cusException" }) public class CustomizedThrowablePatternConverter extends ThrowablePatternConverter { public static CustomizedThrowablePatternConverter newInstance(final Configuration config, final String[] options) { return new CustomizedThrowablePatternConverter(config, options); } private CustomizedThrowablePatternConverter(final Configuration config, final String[] options) { super("CustomizedThrowable", "throwable", options, config); } @Override public void format(final LogEvent event, final StringBuilder toAppendTo) { final ThrowableProxy proxy = event.getThrownProxy(); final Throwable throwable = event.getThrown(); if ((throwable != null || proxy != null) && options.anyLines()) { if (proxy == null) { super.format(event, toAppendTo); return; } final int len = toAppendTo.length(); if (len > 0 && !Character.isWhitespace(toAppendTo.charAt(len - 1))) { toAppendTo.append(' '); } //Call the above tool class and format exception CustomizedThrowableProxyRenderer.formatExtendedStackTraceTo(proxy, toAppendTo, options.getIgnorePackages(), options.getTextRenderer(), getSuffix(event), options.getSeparator()); } } }
In this way, we can add this exception definition PatternLayout to our Log4j2 configuration, for example:
<RollingFile name="file" append="true" filePattern="${LOG_ROOT}/app.log-%d{yyyy.MM.dd.HH}-${sys:LOCAL_IP}"> <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %5p [%X{traceId},%X{spanId}] [${sys:PID}] [%t]: %m%n%cusEx{filters(java, com.mycompany)}"/> <Policies> <TimeBasedTriggeringPolicy interval="1" modulate="true"/> </Policies> <DirectWriteRolloverStrategy maxFiles="72"/> </RollingFile>
% cusEx{filters(java, com.mycompany)} represents the exception stack, and only the exception stack at the beginning of these packages will be output. Here are java and com Starting with mycompany. As a result, log error("something error!", e);, The output log will be similar to:
2021-10-30 16:00:19.254 ERROR [65f6eda3bf6a48ee,d25cc4c9eb1deed6] [30] [subscribe-pool-9]: something error! ... suppressed 27 lines at com.mycompany.spring.cloud.parent.web.common.undertow.jfr.JFRTracingFilter.doFilter(JFRTracingFilter.java:40) ~[spring-cloud-parent-web-common-2020.0.3-SNAPSHOT.jar!/:2020.0.3-SNAPSHOT] at com.mycompany.spring.cloud.parent.web.common.undertow.jfr.LazyJFRTracingFilter.doFilter(LazyJFRTracingFilter.java:23) ~[spring-cloud-parent-web-common-2020.0.3-SNAPSHOT.jar!/:2020.0.3-SNAPSHOT] ... suppressed 46 lines Caused by: com.alibaba.fastjson.JSONException: write javaBean error, fastjson version 1.2.75, class com.hopegaming.factsCenter.query.revo.controller.frontend.share.matches.MatchController$EventVO, fieldName : data ... suppressed 8 lines ... 74 more Caused by: java.lang.NullPointerException at com.mycompany.OrderController.postOrder(OrderController.java:103) ~[classes!/:2020.0.3-SNAPSHOT] ... suppressed 13 lines ... 74 more
Original link:
Author: full of dry goods
Link: https://juejin.cn/post/7025778908515532837