When an ANR occurs in an app, or the system watchdog fires, the system generates a trace file that records the call stack of every thread along with some process-level and thread-level status information. The file is usually stored under /data/anr, which app developers cannot access. Starting with Android R (11), however, an app can read the contents of the file through the getHistoricalProcessExitReasons interface of AMS. The following is a typical trace file.
----- pid 8331 at 2021-11-26 09:10:03 -----
Cmd line: com.hangl.test
Build fingerprint: xxx
ABI: 'arm64'
Build type: optimized
Zygote loaded classes=9118 post zygote classes=475
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
... (some process-wide state, such as GC statistics)
suspend all histogram: Sum: 161us 99% C.I. 2us-60us Avg: 16.100us Max: 60us
DALVIK THREADS (14):
"Signal Catcher" daemon prio=5 tid=7 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x14dc0298 self=0x7c4c962c00
  ...
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x7263ee78 self=0x7c4c7dcc00
  | sysTid=8331 nice=-10 cgrp=default sched=0/0 handle=0x7c4dd45ed0
  | state=S schedstat=( 387029514 32429484 166 ) utm=28 stm=10 core=6 HZ=100
  | stack=0x7feacb5000-0x7feacb7000 stackSize=8192KB
  | held mutexes=
  native: #00 pc 00000000000d0f48 /apex/com.android.runtime/lib64/bionic/libc.so (__epoll_pwait+8)
  native: #01 pc 00000000000180bc /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
  native: #02 pc 0000000000017f8c /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+56)
  native: #03 pc 000000000013b920 /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:336)
  at android.os.Looper.loop(Looper.java:174)
  at android.app.ActivityThread.main(ActivityThread.java:7397)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:492)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:935)
"Jit thread pool worker thread 0" daemon prio=5 tid=2 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x14dc0220 self=0x7bb9a05000
  ...
This article does not discuss what triggers an ANR, nor does it walk through the order in which each part of the trace is produced. Many articles already cover those topics, and some of them are excellent. Instead, it focuses on how the call stacks are generated, which helps us understand the trace information better.
Preface
Whether the cause is an ANR or the watchdog, the trace is generated inside the target process. Taking ANR as an example, the decision that an ANR has occurred is made in system_server (AMS), while the trace itself is generated in the app. So how is the app told to start? The answer is to send it SIGQUIT (signal 3). Collection happens in-process because cross-process collection usually relies on ptrace, which requires the collector either to hold special permissions or to be the parent of the target process, so it is less convenient than in-process collection.
Therefore, the first step of the analysis is to look at how the process handles signal 3.
1. Signal Catcher thread
A "Signal Catcher" thread exists in every Java process. In normal operation it blocks, waiting for signal 3 (SIGQUIT) or signal 10 (SIGUSR1) to arrive. When the process receives signal 3, the "Signal Catcher" thread handles it in the function HandleSigQuit.
void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  os << "\n"
     << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";
  DumpCmdLine(os);
  // Note: The strings "Build fingerprint:" and "ABI:" are chosen to match the format used by
  // debuggerd. This allows, for example, the stack tool to work.
  std::string fingerprint = runtime->GetFingerprint();
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";
  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";
  runtime->DumpForSigQuit(os);
  if ((false)) {
    std::string maps;
    if (android::base::ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  os << "----- end " << getpid() << " -----\n";
  Output(os.str());
}
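The idea of a dedicated thread that sleeps until signal 3 arrives can be tried outside ART with plain POSIX calls. The following is only a minimal sketch under stated assumptions: the function name CatchOneSigQuit and the one-line report are hypothetical, the process is assumed to have no other thread with SIGQUIT unblocked, and none of this is ART's actual implementation. It blocks SIGQUIT, starts a catcher thread that waits with sigwait(), and then sends the signal to its own process, standing in for system_server.

```cpp
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

#include <cassert>
#include <string>
#include <thread>

// Hypothetical demo, not ART code. Returns the header line written by the
// catcher thread once it has received SIGQUIT.
std::string CatchOneSigQuit() {
  // Block SIGQUIT in the calling thread. Threads created afterwards inherit
  // this mask, so the signal stays pending until someone calls sigwait().
  sigset_t mask;
  sigset_t old_mask;
  sigemptyset(&mask);
  sigaddset(&mask, SIGQUIT);
  pthread_sigmask(SIG_BLOCK, &mask, &old_mask);

  std::string report;
  std::thread catcher([&mask, &report] {
    int signo = 0;
    // The catcher sleeps here until SIGQUIT is pending for the process.
    if (sigwait(&mask, &signo) == 0 && signo == SIGQUIT) {
      report = "----- pid " + std::to_string(getpid()) + " -----";
    }
  });

  // Stands in for system_server sending signal 3 to the app process.
  kill(getpid(), SIGQUIT);
  catcher.join();

  // The signal has been consumed, so restoring the old mask is safe.
  pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
  return report;
}
```

Calling CatchOneSigQuit() from main() returns the header line. Because the signal is consumed synchronously by sigwait() rather than by an asynchronous handler, the catcher thread is free to do work that would not be safe inside a signal handler, such as building strings.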
The intermediate calls are skipped here so that we can go straight to the part we care about: the collection of call stacks. The ThreadList::Dump function collects the call stack information of all threads.
void ThreadList::Dump(std::ostream& os, bool dump_native_stack) {
  Thread* self = Thread::Current();
  {
    MutexLock mu(self, *Locks::thread_list_lock_);
    os << "DALVIK THREADS (" << list_.size() << "):\n";
  }
  if (self != nullptr) {
    DumpCheckpoint checkpoint(&os, dump_native_stack);
    size_t threads_running_checkpoint;
    {
      // Use SOA to prevent deadlocks if multiple threads are calling Dump() at the same time.
      ScopedObjectAccess soa(self);
      threads_running_checkpoint = RunCheckpoint(&checkpoint);
    }
    if (threads_running_checkpoint != 0) {
      checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint);
    }
  } else {
    DumpUnattachedThreads(os, dump_native_stack);
  }
}
The key step is the call to RunCheckpoint. It splits the collection into one task per thread: if a thread is in the Runnable state (running Java code), the task is handed to that thread to execute; if the thread is in any other state, the "Signal Catcher" thread does the work on its behalf. Keep this in mind, because sections 2 and 3 below analyze these two cases in turn.
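The split described above can be modeled without any real threads. The sketch below is an illustration only: FakeThread, State, and RunCheckpointSketch are hypothetical stand-ins rather than ART types, and it leaves out the suspension and locking that the real RunCheckpoint has to perform. It shows only the decision itself and why a count is returned to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for art::Thread and its states.
enum class State { kRunnable, kNative, kWaiting };

struct FakeThread {
  std::string name;
  State state;
  std::function<void(FakeThread&)> pending;  // set when a task is handed over
  std::string dumped_by;                     // who ended up running the task
};

// Returns how many threads must run the task themselves, so that the caller
// knows how many completions to wait for afterwards.
std::size_t RunCheckpointSketch(std::vector<FakeThread>& threads,
                                const std::string& requester) {
  std::size_t must_run_themselves = 0;
  for (FakeThread& t : threads) {
    if (t.state == State::kRunnable) {
      // Running managed code: hand the task over. The thread executes it
      // the next time it reaches a checkpoint.
      t.pending = [](FakeThread& self) { self.dumped_by = self.name; };
      ++must_run_themselves;
    } else {
      // Not running managed code: the requester does the work on its behalf.
      t.dumped_by = requester;
    }
  }
  return must_run_themselves;
}
```

With a "main" thread in the Native state and a "worker" thread in the Runnable state, the function returns 1: "main" has already been handled by the requester, and "worker" stays pending until it invokes the stored task. That return value plays the same role as threads_running_checkpoint in ThreadList::Dump.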
2. Checkpoint mechanism
Handing the task to a thread in the Runnable state relies on the checkpoint mechanism, which has two parts:
- The "Signal Catcher" thread calls RequestCheckpoint to modify the internal data of the target thread's art::Thread object. Specifically, it changes the following two fields:
  tls32_.state_and_flags.as_struct.flags |= kCheckpointRequest;
  tlsPtr_.checkpoint_function = function;
  (tls32_ and tlsPtr_ are both internal data of the art::Thread object.)
- In the ART virtual machine, the target thread checks the state_and_flags field at the entry of every method and at the back-edge of every loop. If the checkpoint bit is set, it executes the corresponding checkpoint function. Placing checkpoints this way ensures that the thread handles the checkpoint task promptly: any code that only moves forward (straight-line code and conditional branches alike) finishes in bounded time, so code that can run for a long time must be either a loop or a method call. Inserting checkpoints at these two places is therefore enough to guarantee timeliness. (See R Da's answer on Zhihu.)
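The two parts above, a requester that sets a flag and a function pointer and a target that polls the flag at loop back-edges, can be imitated with an atomic flag word. This is a rough sketch rather than ART's code: MiniThread, kCheckpointRequestBit, PollCheckpoint, and CheckpointRunsOnTargetThread are all hypothetical names, and only a single outstanding request is supported.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical miniature of the two fields named above, not ART's layout.
constexpr std::uint32_t kCheckpointRequestBit = 1u;

struct MiniThread {
  std::atomic<std::uint32_t> flags{0};
  std::function<void()> checkpoint_function;
};

// Requester side: publish the function first, then set the flag bit.
void RequestCheckpoint(MiniThread& t, std::function<void()> fn) {
  t.checkpoint_function = std::move(fn);
  t.flags.fetch_or(kCheckpointRequestBit, std::memory_order_release);
}

// Target side: the check that would sit at method entry and loop back-edges.
inline void PollCheckpoint(MiniThread& t) {
  if (t.flags.load(std::memory_order_acquire) & kCheckpointRequestBit) {
    t.checkpoint_function();
    t.flags.fetch_and(~kCheckpointRequestBit, std::memory_order_release);
  }
}

// Returns true if the requested function ran on the worker (target) thread.
bool CheckpointRunsOnTargetThread() {
  MiniThread target;
  std::atomic<bool> stop{false};
  std::atomic<bool> ran{false};
  std::thread::id ran_on;

  std::thread worker([&] {
    while (!stop.load(std::memory_order_acquire)) {
      PollCheckpoint(target);  // loop back-edge check
    }
  });

  RequestCheckpoint(target, [&] {
    ran_on = std::this_thread::get_id();
    ran.store(true, std::memory_order_release);
  });
  // The requester waits until the target has passed through the checkpoint.
  while (!ran.load(std::memory_order_acquire)) std::this_thread::yield();

  const bool on_worker = (ran_on == worker.get_id());
  stop.store(true, std::memory_order_release);
  worker.join();
  return on_worker;
}
```

The requester never interrupts the worker. It only flips a bit and waits, and the worker notices the bit the next time it goes around its loop. That is why the time until the task runs is bounded by the longest stretch of code between two polls.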
To make the target thread's checkpoint concrete, here is an example.
Bytecode can be interpreted by the ART virtual machine or compiled to machine code and executed directly. Once a method has been compiled to machine code (as shown below), the check of state_and_flags is visible at the function entry. If any flag bit is set, the pTestSuspend action is executed.