Mobile Performance Monitoring Solution: Hertz
Performance issues are one of the main causes of App user churn. An App's performance problems include crashes, network request errors or timeouts, slow response, janky list scrolling, heavy traffic consumption, high power consumption, and so on. There are many reasons for poor App performance; apart from external hardware and software factors, most are caused by developers' misuse of threads, locks, system functions, programming paradigms, data structures, and so on. Even the most experienced programmers can hardly avoid all the "pits" that lead to poor performance during development, so the key to solving performance problems is to find and locate these "pits" as early as possible.
By summarizing common performance problems and studying the performance monitoring techniques used in the industry, such as those of WeChat and 360, Meituan Takeaway developed a mobile performance monitoring solution named Hertz. Hertz aims to provide three capabilities:
- During development, detect performance anomalies and notify the developer.
- During testing, generate performance test reports in conjunction with existing testing tools.
- Online, report performance data to the monitoring platform so problems can be located and traced.
To achieve these three capabilities, we must first collect measurable and valuable performance data, so data collection is one of the core issues we focus on.
Data Collection
Although users perceive performance problems in many forms, we can abstract them into specific monitoring metrics. In Hertz, these metrics include FPS, CPU usage, memory usage, jank (UI stutter), page load time, network request traffic, and so on. Some of these metrics are relatively easy to obtain, such as FPS, CPU usage, and memory usage; others are not, such as jank, page load time, and network request traffic.
For example, in iOS, we can obtain FPS as follows:
- (void)tick:(CADisplayLink *)link {
    NSTimeInterval deltaTime = link.timestamp - self.lastTime;
    self.currentFPS = 1 / deltaTime;
    self.lastTime = link.timestamp;
}
In Android, we can get the memory footprint as follows:
public long useSize() {
    Runtime runtime = Runtime.getRuntime();
    long totalSize = runtime.maxMemory() >> 10;
    this.memoryUsage = (runtime.totalMemory() - runtime.freeMemory()) >> 10;
    this.memoryUsageRate = this.memoryUsage * 100 / totalSize;
    return this.memoryUsage; // used size in KB
}
The examples above simply show that obtaining FPS, memory, and CPU figures is easy, but these metrics only become meaningful when combined with other data, such as the current page, the App's running time, or the execution stack and run logs at the moment a jank occurs. For example, combining CPU usage with the current page lets us evaluate each page's computational cost; combining memory usage with the App's running time lets us observe how memory grows over time and analyze whether leaks occur; and combining FPS with jank information lets us evaluate how the App performs when jank happens.
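As a minimal illustration of this pairing (the class and field names below are ours, not Hertz's actual data model), a collected sample might carry its context alongside the raw value:

```java
// A minimal sketch of attaching context to a raw metric sample.
// All names here are illustrative, not Hertz's actual data model.
public class MetricSample {
    public final String page;       // page on screen when the sample was taken
    public final long appUptimeMs;  // how long the App has been running
    public final double fps;        // the raw metric value
    public final long timestamp;    // wall-clock time of the sample

    public MetricSample(String page, long appUptimeMs, double fps) {
        this.page = page;
        this.appUptimeMs = appUptimeMs;
        this.fps = fps;
        this.timestamp = System.currentTimeMillis();
    }
}
```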
Traffic Consumption
Mobile users are very sensitive to data traffic, and Meituan Takeaway occasionally receives complaints from users who consumed a lot of traffic in a short period, so we considered counting users' traffic consumption locally in the App and reporting it to the backend. These statistics do not need to be accurate down to each API; a rough categorization of total consumption is enough. Our dimensions for traffic statistics are: calendar day + request source + network type. Why monitor traffic locally on the client when the server side already has traffic monitoring (such as CAT)? Because local counting covers all network requests the client sends, which is very difficult for the server to do: not every network request is reported to a monitored server, and network failures can make a user consume upstream traffic on requests that never reach the server at all.
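As a rough sketch of these dimensions (the types and names below are illustrative, not Hertz's real code), the aggregation amounts to summing bytes under a composite key:

```java
// Sketch: sum traffic per calendar day + request source + network type.
// Illustrative only; Hertz's actual aggregation code is not shown here.
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TrafficCounter {
    enum Source { API, H5, CDN }
    enum NetType { WIFI, MOBILE_4G, MOBILE_3G }

    private final Map<String, Long> bytesByKey = new ConcurrentHashMap<>();

    public void record(Source source, NetType net, long bytes) {
        String day = new SimpleDateFormat("yyyy-MM-dd", Locale.US).format(new Date());
        bytesByKey.merge(day + "|" + source + "|" + net, bytes, Long::sum);
    }
}
```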
In iOS, we implement traffic statistics by registering an NSURLProtocol subclass:
- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
    [self.client URLProtocolDidFinishLoading:self];
    self.data = nil;
    if (connection.originalRequest) {
        WMNetworkUsageDataInfo *info = [[WMNetworkUsageDataInfo alloc] init];
        self.connectionEndTime = [[NSDate date] timeIntervalSince1970];
        info.responseSize = self.responseDataLength;
        info.requestSize = connection.originalRequest.HTTPBody.length;
        info.contentType = [WMNetworkUsageURLProtocol getContentTypeByURL:connection.originalRequest.URL
                                                               andMIMEType:self.MIMEType];
        [[WMNetworkMeter sharedInstance] setLastDataInfo:info];
        [[WMNetworkUsageManager sharedManager] recordNetworkUsageDataInfo:info];
    }
}
In Android, we implement traffic statistics by intercepting the network request APIs with AspectJ:
@Pointcut("target(java.net.URLConnection) && " + "!within(retrofit.appengine.UrlFetchClient) " + "&& !within(okio.Okio) && !within(butterknife.internal.ButterKnifeProcessor) " + "&& !within(com.flurry.sdk.hb)" + "&& !within(rx.internal.util.unsafe.*) " + "&& !within(net.sf.cglib..*)" + "&& !within(com.huawei.android..*)" + "&& !within(com.sankuai.android.nettraffic..*)" + "&& !within(roboguice..*)" + "&& !within(com.alipay.sdk..*)") protected void baseCondition() { } @Pointcut("call (org.apache.http.HttpResponse org.apache.http.client.HttpClient.execute(org.apache.http.client.methods.HttpUriRequest))" + "&& target(org.apache.http.client.HttpClient)" + "&& args(request)" + "&& !within(com.sankuai.android.nettraffic.factory..*)" + "&& baseClientCondition()" ) void httpClientExecute(HttpUriRequest request) { }
After counting the total traffic consumption, we also want a rough classification to help locate problems. We care about two factors: first, the request source, i.e. whether the traffic comes from API requests, H5 pages, or the CDN; second, the network type, i.e. whether the traffic is consumed over Wi-Fi, 4G, or 3G. For request sources, we first make a simple classification by domain name. Taking iOS as an example, the sample code is as follows:
- (NSString *)regApiHost {
    return _regApiHost ? _regApiHost : @"^(.*\\.)?(meituan\\.com|maoyan\\.com|dianping\\.com|kuxun\\.cn)$";
}

- (NSString *)regResHost {
    return _regResHost ? _regResHost : @"^(.*\\.)?(meituan\\.net|dpfile\\.com)$";
}

- (NSString *)regWebHost {
    return _regWebHost ? _regWebHost : @"^(.*\\.)?(meituan\\.com|maoyan\\.com|dianping\\.com|kuxun\\.cn|meituan\\.net|dpfile\\.com)$";
}
However, some domains host both API services and Web services. For such domains, we further distinguish by checking the MIME type of the response. Taking iOS as an example, the sample code is as follows:
+ (BOOL)isPermissiveWebURL:(NSURL *)URL andMIMEType:(NSString *)MIMEType {
    NSRegularExpression *permissiveHost =
        [NSRegularExpression regularExpressionWithPattern:[[WMNetworkMeter sharedInstance] regWebHost]
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:nil];
    NSString *host = URL.host;
    return ([MIMEType isEqualToString:@"text/css"] ||
            [MIMEType isEqualToString:@"text/html"] ||
            [MIMEType isEqualToString:@"application/x-javascript"] ||
            [MIMEType isEqualToString:@"application/javascript"]) &&
           (host && [permissiveHost numberOfMatchesInString:host
                                                    options:0
                                                      range:NSMakeRange(0, [host length])]);
}
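For reference, the same host-plus-MIME-type check could be expressed on the Android side roughly as follows (a sketch under the same regex assumptions as the iOS code above; not Hertz's actual Android implementation):

```java
// Sketch: classify a response as Web traffic by host pattern + MIME type,
// mirroring the iOS logic above. Illustrative only.
import java.util.regex.Pattern;

public class TrafficClassifier {
    private static final Pattern WEB_HOST = Pattern.compile(
            "^(.*\\.)?(meituan\\.com|maoyan\\.com|dianping\\.com|kuxun\\.cn|meituan\\.net|dpfile\\.com)$",
            Pattern.CASE_INSENSITIVE);

    public static boolean isPermissiveWebUrl(String host, String mimeType) {
        boolean webMime = "text/css".equals(mimeType)
                || "text/html".equals(mimeType)
                || "application/x-javascript".equals(mimeType)
                || "application/javascript".equals(mimeType);
        return webMime && host != null && WEB_HOST.matcher(host).matches();
    }
}
```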
Page Load Time
To measure page load time, we need to solve two problems: first, how to measure the loading time of a page; second, how to measure it while writing little or no instrumentation code. Take Android as an example. Creating and loading an Activity involves many operations, such as setting the page theme, initializing the layout, loading images, fetching network data, and reading or writing the database. A performance problem in any of these operations can keep the screen from displaying in time and hurt the user experience. Hertz abstracts these operations into the following speed measurement model:
T1 is the time from page initialization to the display of the first UI element, which is usually the loading animation shown while data loads. T2 is the network request time; it may start before T1 ends. T3 is the time from data arrival to the UI being filled with data and re-rendered. T is the total time from page initialization to the completion of the final UI drawing.
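Expressed as code, the model reduces to a handful of timestamps (a sketch; the field names are ours):

```java
// Sketch of the speed model: T1 = init -> first draw, T2 = request -> response
// (may overlap T1), T3 = response -> re-render, T = init -> final draw.
// Field names are illustrative.
public class PageSpeedRecord {
    long initTime;          // page initialization
    long firstDrawTime;     // first UI element displayed (end of T1)
    long requestStartTime;  // network request sent
    long requestEndTime;    // response received (end of T2)
    long renderEndTime;     // UI filled with data and redrawn (end of T3 and T)

    long t1() { return firstDrawTime - initTime; }
    long t2() { return requestEndTime - requestStartTime; }
    long t3() { return renderEndTime - requestEndTime; }
    long total() { return renderEndTime - initTime; }
}
```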
For the second problem, manually writing instrumentation at every time point is inefficient and error-prone. Hertz therefore maps each page to its corresponding APIs in a configuration file and unifies the instrumentation in the base classes of API requests. There is still room for optimization here, such as hooking key nodes to inject the instrumentation into API calls.
[{ "page": "MainActivity", "api": [ "/poi/filter", "/home/head", "/home/rcmdboard" ] }, { "page": "RestaurantActivity", "api": [ "/poi/food" ] }]
In addition, there is the question of how to determine whether UI rendering is complete. In Android, Hertz's approach is to insert a FrameLayout into the Activity's rootView and monitor whether the FrameLayout's dispatchDraw method is called. The drawback of this scheme is that inserting an extra level of View deepens the view hierarchy.
@Override
protected void dispatchDraw(Canvas canvas) {
    super.dispatchDraw(canvas);
    if (!mIsComplete) {
        mIsComplete = mCallback.onDrawEnd(this, mKey);
    }
}
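Wiring this FrameLayout into a page might look roughly as follows (a sketch: DrawDetectFrameLayout stands for the subclass above, and the reparenting details are our assumption, not Hertz's exact code):

```java
// Sketch: reparent the Activity's content under the draw-detecting
// FrameLayout so its dispatchDraw fires whenever the page content is drawn.
// Illustrative only.
import android.app.Activity;
import android.view.View;
import android.view.ViewGroup;

public class DrawDetectInstaller {
    public static void install(Activity activity) {
        ViewGroup root = (ViewGroup) activity.findViewById(android.R.id.content);
        View content = root.getChildAt(0);
        root.removeView(content);
        DrawDetectFrameLayout monitor = new DrawDetectFrameLayout(activity);
        monitor.addView(content);
        root.addView(monitor); // adds one extra level to the view hierarchy
    }
}
```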
In iOS, we take a different approach. Hertz specifies, in the configuration file, a tag for an element on the final rendered page, and starts a CADisplayLink after the network request succeeds to check whether that element has appeared under the root node.
- (void)tick:(CADisplayLink *)link {
    [_currentTrackRecordArray enumerateObjectsUsingBlock:^(WMHertzPageTrackRecord * _Nonnull record, NSUInteger idx, BOOL * _Nonnull stop) {
        if ([self findTag:record.configItem.tag inViewHierarchy:record.rootView]) {
            [self endPageRenderEvent:record];
        }
    }];
}
Jank
At present, mainstream mobile devices use double buffering plus vertical synchronization for display. Roughly, the display system has two buffers: the GPU pre-renders a frame into one buffer for the video controller to read, and when the next frame is rendered, the GPU switches the video controller's pointer to the second buffer. The GPU waits for the display's VSync (vertical synchronization) signal before rendering a new frame and updating the buffer.
Most phone screens refresh at 60 Hz. If a frame's work is not finished within 1000 ms / 60 = 16.67 ms, a frame is dropped, which is what the user perceives as jank. Rendering a frame involves both the CPU and the GPU: the CPU computes the display content (view creation, layout calculation, image decoding, text rendering, and so on) and then submits it to the GPU for transformation, composition, and rendering.
Besides UI rendering, system events, input events, program callbacks, and other code we add are also executed on the main thread. Once complex code runs on the main thread, it can block the main thread from responding to taps and swipes and from rendering the UI, which is the most common cause of jank.
Having understood how screen rendering works and how jank forms, it is natural to think that by monitoring FPS we can tell whether the App is janking, and that we can measure the rendering quality of the current page by computing the frame drop rate over a run of consecutive frames. In practice, however, FPS refreshes very quickly and jitters easily, so detecting jank by comparing FPS directly is difficult. It is much easier to measure the execution time of each message-loop iteration on the main thread, which is also a common industry method for jank detection. Hertz therefore measures the time the main thread spends on each message-loop iteration, and when that time exceeds a threshold, records a jank occurrence.
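On Android, one common way to observe each message-loop iteration (used by open-source tools such as BlockCanary; shown here as a sketch, not Hertz's exact implementation) is the main Looper's message logging hook, which prints a line before and after every dispatch:

```java
// Sketch: time each main-thread message via Looper's logging hook.
// Threshold and callback are illustrative.
import android.os.Looper;
import android.util.Printer;

public class MainLoopMonitor {
    private static final long THRESHOLD_MS = 300;
    private long dispatchStartMs;

    public void start() {
        Looper.getMainLooper().setMessageLogging(new Printer() {
            @Override
            public void println(String x) {
                if (x.startsWith(">>>>> Dispatching")) {
                    dispatchStartMs = System.currentTimeMillis();
                } else if (x.startsWith("<<<<< Finished")) {
                    long costMs = System.currentTimeMillis() - dispatchStartMs;
                    if (costMs > THRESHOLD_MS) {
                        // record one jank occurrence; feed it into the
                        // "N times over threshold T" strategy described below
                    }
                }
            }
        });
    }
}
```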
In practice, we found that some janks last a long time, such as when opening a new page, while others are relatively short but occur frequently, such as when a list scrolls. We therefore adopt the strategy of "N janks exceeding threshold T": when jank occurs more than N times within a period, collection and reporting are triggered. For example, with threshold T = 2000 ms and count N = 1 we catch a single long jank, while with threshold T = 300 ms and count N = 5 we catch high-frequency jank.
Runnable loopRunnable = new Runnable() {
    @Override
    public void run() {
        if (mStartedDetecting && !isCatched) {
            nowLaggyCount++;
            if (nowLaggyCount >= N) {
                blockHandler.onBlockEvent();
                isCatched = true;
                ...
            }
        }
    }
};

public void onMainLoopFinish() {
    if (isCatched) {
        blockHandler.onBlockFinishEvent(loopStartTime, loopEndTime);
    }
    resetStatus();
    ...
}
When jank is detected, how do we locate the problem that caused it? Wouldn't it be great to grab the program's call stack and run logs at the moment the jank occurred? Indeed, by grabbing the stack we can very effectively locate the "problem code" behind a jank.
In practice, we found two issues that deserve attention when grabbing stacks.
The first is timing. The stack must be grabbed while the jank is happening, not afterwards; otherwise the code that caused it cannot be captured accurately. So we grab the main thread's stack from a sub-thread while the jank is still in progress.
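A sketch of grabbing the main thread's stack from a watchdog sub-thread (the helper name is ours):

```java
// Sketch: capture the main thread's current stack while the jank is still
// in progress. Illustrative only.
import android.os.Looper;

public class StackSampler {
    public static String captureMainThreadStack() {
        StackTraceElement[] frames = Looper.getMainLooper().getThread().getStackTrace();
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : frames) {
            sb.append(frame).append('\n');
        }
        return sb.toString();
    }
}
```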
The second is how to classify stacks. Classifying jank stacks differs from classifying crash stacks. Classifying by the innermost frame alone is clearly inappropriate, because different business-logic code in the outer layers may share the same innermost call stack; classifying by the outermost frame is also inappropriate, because the outermost frame may be either business-logic code or a system call.
Hertz's current approach is to classify by the innermost frame while also configuring some simple rules matched against class names, and to classify by the frame whose class name hits a rule.
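One plausible reading of this rule as code (a sketch; the prefixes and fallback are our assumptions):

```java
// Sketch: walk from the innermost frame outward and classify by the first
// frame whose class name hits a configured prefix rule. Illustrative only.
public class StackClassifier {
    private static final String[] RULE_PREFIXES = {
            "com.sankuai.", "com.meituan."
    };

    public static String classify(StackTraceElement[] frames) {
        for (StackTraceElement frame : frames) { // innermost first
            for (String prefix : RULE_PREFIXES) {
                if (frame.getClassName().startsWith(prefix)) {
                    return frame.getClassName() + "." + frame.getMethodName();
                }
            }
        }
        // fall back to the innermost frame when no rule matches
        return frames.length > 0 ? frames[0].toString() : "unknown";
    }
}
```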
Extensibility and Ease of Use
Hertz attaches great importance to the SDK's extensibility and ease of use, which we considered carefully from the start of the design. The SDK's framework, shown in the figure below, is divided into three layers. The top layer is the interface layer, which exposes a very small number of methods plus environment and configuration parameters. The second layer is the business layer, which contains all the core logic such as page speed measurement, jank detection, and metric collection. The third layer is the data adaptation layer, which packages the data produced by the business layer into a unified structure and adapts it to different output channels through adapters.
Our first design consideration was ease of use of the interface. Hertz has three built-in working modes: development, test, and online. The developer only needs to specify a mode for Hertz to start working. Each mode presets the parameters the SDK needs, such as sampling frequency, jank threshold, and reporting channel switches, while metric collection, jank detection, and page speed measurement all run automatically inside the SDK. Taking Android as an example, the sample code is as follows:
final HertzConfiguration configuration = new HertzConfiguration.Builder(this)
        .mode(HertzMode.HERTZ_MODE_DEBUG)
        .appId(APP_ID)
        .unionId(UNION_ID)
        .build();
Hertz.getInstance().init(configuration);
Our second design consideration was the SDK's extensibility. Taking the data adaptation layer as an example, five adaptation channels are currently built in, routing the collected monitoring data to different destinations. Depending on the chosen working mode, data is adapted to the server monitoring channel, used to generate test reports, or only output as logs and prompts locally in the App. One advantage of this design is that adding a new output channel requires only adding an interceptor in the upper layer or adding an adapter, changing very little SDK code. The performance collection module and the page speed measurement module follow the same idea.
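The adapter idea can be sketched as a small interface (all names below are illustrative, not Hertz's real API):

```java
// Sketch: business modules emit one unified record type; each output channel
// implements one adapter over it. Illustrative only.
class HertzRecord {
    String type;    // e.g. "fps", "jank", "pageSpeed"
    String payload; // serialized metric data
}

interface DataAdapter {
    void output(HertzRecord record);
}

class LogcatAdapter implements DataAdapter {
    @Override
    public void output(HertzRecord record) {
        android.util.Log.d("Hertz", record.type + ": " + record.payload);
    }
}
```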
Practical Application
After integrating Hertz, Meituan Takeaway gained the ability to find and locate performance problems, which has been verified in the development, testing, and online stages.
Application During Development
Integrating Hertz during development is equivalent to having an offline performance detection tool. When anomalies are detected, Hertz feeds the data back directly to the developer, as shown in the following figure:
The data collected at runtime is written to the log, and a floating overlay on the App's pages shows basic information such as the current FPS, CPU, and memory usage. If jank is detected, a prompt page pops up listing the execution stack at that moment.
Here is an example of jank caused by initializing a complex UI:
android.content.res.StringBlock.nativeGetString(Native Method)
android.content.res.StringBlock.get(StringBlock.java:82)
android.content.res.XmlBlock$Parser.getName(XmlBlock.java:175)
android.view.LayoutInflater.inflate(LayoutInflater.java:470)
android.view.LayoutInflater.inflate(LayoutInflater.java:420)
android.view.LayoutInflater.inflate(LayoutInflater.java:371)
com.sankuai.meituan.takeoutnew.controller.ui.PoiListAdapterController.getView(PoiListAdapterController.java:77)
com.sankuai.meituan.takeoutnew.adapter.PoiListAdapter.getView(PoiListAdapter.java:26)
android.widget.HeaderViewListAdapter.getView(HeaderViewListAdapter.java:220)
Here is an example of jank caused by serializing data with Gson:
com.google.gson.Gson.toJson(Gson.java:519)
com.meituan.android.common.locate.util.GoogleJsonWrapper$MyGson.toJson(GoogleJsonWrapper.java:236)
com.sankuai.meituan.location.collector.CollectorJson$MyGson.toJson(CollectorJson.java:216)
com.sankuai.meituan.location.collector.CollectorFilter.saveCurrentData(CollectorFilter.java:67)
com.sankuai.meituan.location.collector.CollectorFilter.init(CollectorFilter.java:33)
com.sankuai.meituan.location.collector.CollectorFilter.<init>(CollectorFilter.java:27)
com.sankuai.meituan.location.collector.CollectorMsgHandler.recordGps(CollectorMsgHandler.java:134)
com.sankuai.meituan.location.collector.CollectorMsgHandler.getNewLocation(CollectorMsgHandler.java:81)
com.meituan.android.common.locate.LocatorMsgHandler$1.handleMessage(LocatorMsgHandler.java:29)
Here is an example of jank caused by reading and writing a database on the main thread:
android.database.sqlite.SQLiteConnection.nativeExecuteForLastInsertedRowId(Native Method)
android.database.sqlite.SQLiteConnection.executeForLastInsertedRowId(SQLiteConnection.java:782)
android.database.sqlite.SQLiteSession.executeForLastInsertedRowId(SQLiteSession.java:788)
android.database.sqlite.SQLiteStatement.executeInsert(SQLiteStatement.java:86)
de.greenrobot.dao.AbstractDao.executeInsert(AbstractDao.java:306)
de.greenrobot.dao.AbstractDao.insert(AbstractDao.java:276)
com.sankuai.meituan.takeoutnew.db.dao.BaseAbstractDao.insert(BaseAbstractDao.java:25)
com.sankuai.meituan.takeoutnew.log.LogDataUtil.insertIntoDb(LogDataUtil.java:243)
com.sankuai.meituan.takeoutnew.log.LogDataUtil.saveLogInfo(LogDataUtil.java:221)
com.sankuai.meituan.takeoutnew.log.LogDataUtil.saveLog(LogDataUtil.java:116)
com.sankuai.meituan.takeoutnew.log.LogDataUtil.saveLogInfo(LogDataUtil.java:112)
com.sankuai.meituan.takeoutnew.ui.page.main.order.OrderListFragment.onPageShown(OrderListFragment.java:306)
com.sankuai.meituan.takeoutnew.ui.page.main.order.OrderListFragment.init(OrderListFragment.java:151)
com.sankuai.meituan.takeoutnew.ui.page.main.order.OrderListFragment.onCreateView(OrderListFragment.java:81)
As these examples show, most of the reported logs point clearly to the problem code; with a quick look at the code and a little analysis of the cause, such problems can be easily optimized.
Application During Testing
Traditional performance testing mostly relies on third-party tools, and the data they produce can diverge significantly from real-world data. Moreover, such tests often only give indicator values without helping developers locate the problem. We use Hertz to collect performance data during the testing phase; the tests can be manual, automated, or monkey tests. After the performance data is collected, a simple test report is produced by a processing script.
Of course, this form of test report still requires manually exporting logs and running scripts; we plan to build an automated testing tool on top of it in the future.
Application Online
For jank detection, besides feeding problems back to developers immediately during development and testing, Hertz also uploads data to the server when the App runs in gray release or fully online. At present the reporting channel is the company's internal CAT (for details, see the article "Deep Analysis of the Open Source Distributed Monitoring System CAT"). The classification and display of stacks are very similar to the crash monitoring we are all familiar with: following the classification principle described above, jank stacks are ranked by number of occurrences and can be filtered by version, operating system, and device, which matches developers' habits.
For traffic statistics, the server aggregates the traffic consumption of all users every day and outputs a report listing the Top 100 consumers. If an anomaly is found, we can investigate further with the backend logs and the client diagnostic logs to find out which network request caused the abnormal traffic.
For page speed data and basic metrics such as FPS, CPU, and memory, Hertz also reports to CAT to evaluate the App's overall performance.
Summary
Performance optimization is a topic every mature App must take seriously, and its pain often lies in failing to find problems in time, or finding them but being unable to locate the cause. To monitor this data and guide performance optimization, we developed and refined the Hertz performance monitoring solution in practice, and made some explorations and validations in collecting and applying performance data.
Currently, Hertz's monitoring metrics include FPS, CPU usage, memory usage, jank, page load time, network request traffic, and so on. Monitoring of power consumption, App cold start, and exceptions will gradually be added to Hertz's goals. In the future, the monitoring metrics may reuse several existing tools and improve gradually on that basis.
Hertz's jank detection and stack grabbing effectively help developers locate performance problems, but the current jank detection strategy still has much room for optimization: for example, whether different thresholds can be set for different devices, and different strategies for different stages of the App's runtime. As for stack classification, the current rule simply matches class-name prefixes; how to classify stacks more accurately and reasonably is also something we need to consider further. Of course, these optimizations require more data samples to support them.
Building visual, friendly performance-testing tools is also very important, such as a Web page for real-time viewing or browsing historical reports. Hertz can also be designed to work with automated testing, or to generate test reports automatically during integration, but we have only made preliminary attempts in this regard. Once we can collect performance data accurately, how to apply it better across the whole development process, including testing, still needs long-term exploration and practice.
This article introduced some of the ideas and implementations summarized from Hertz's practice at Meituan Takeaway. There are many interesting and deeper topics in App performance monitoring, such as how to balance the performance cost introduced by the monitoring tool itself, the specific techniques and means of performance optimization, and further analysis of performance data to build a monitoring system for abnormal devices. We will continue to explore, practice, and share on these issues in the future.