Preface
Previous article: ELF file - stack backtracking. That article only described stack backtracking through exidx; this one describes the principle and process of stack backtracking based on eh_frame.
Principle description
The core of DWARF unwinding is a table derived from how a function pushes registers onto the stack. Take the assembly code of one function as an example:
    0000000000023c80 <_dl_start>:
    _dl_start():
    /usr/src/debug/glibc/2.31+gitAUTOINC+f84949f1c4-r0/git/csu/init-first.c:96
       23c80: a9bf7bfd  stp x29, x30, [sp, #-16]!
       23c84: 910003fd  mov x29, sp
    /usr/src/debug/glibc/2.31+gitAUTOINC+f84949f1c4-r0/git/csu/init-first.c:97
       23c88: 94000024  bl 23d18 <abort>
The corresponding table is:
    00000050 0000000000000014 00000054 FDE cie=00000000 pc=0000000000023c80..0000000000023c8c
       LOC           CFA      x29   ra
    0000000000023c80 sp+0     u     u
    0000000000023c84 sp+16    c-16  c-8
The raw call frame instructions that this table is built from are:
    00000050 0000000000000014 00000054 FDE cie=00000000 pc=0000000000023c80..0000000000023c8c
      DW_CFA_advance_loc: 4 to 0000000000023c84   //ip address
      DW_CFA_def_cfa_offset: 16                   //sp offset 16
      DW_CFA_offset: r29 (x29) at cfa-16
      DW_CFA_offset: r30 (x30) at cfa-8
Comparing the assembly code with the table gives the following interpretation:
- When the ip is at 23c80, nothing has been pushed yet: the CFA is still sp+0, and x29 and the return address are still in their registers.
- When the ip is at 23c84, the push has happened. The CFA (canonical frame address), which identifies the caller's frame, is sp+16; x29 is saved at cfa - 16, and the return address is saved at cfa - 8.
So, given a target ip and the table, the CFA of the caller and the values of the relevant registers can be recovered. Repeating the same step for each caller yields the complete call stack.
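The row selection described above can be sketched in C. This is only an illustration built on the two rows of the `_dl_start` example; `struct cfa_row`, `find_row` and `saved_ra_addr` are hypothetical names invented for this sketch and are not part of libunwind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the unwind table; it is valid from `loc` up to the next row. */
struct cfa_row {
    uint64_t loc;      /* first pc this row applies to */
    int64_t  cfa_off;  /* CFA = sp + cfa_off */
    int      saved;    /* 0: x29/x30 not saved yet ("u"), 1: saved */
    int64_t  x29_off;  /* x29 saved at CFA + x29_off */
    int64_t  ra_off;   /* x30 (return address) saved at CFA + ra_off */
};

/* The two rows of the _dl_start example. */
static const struct cfa_row dl_start_rows[] = {
    { 0x23c80,  0, 0,   0,  0 },
    { 0x23c84, 16, 1, -16, -8 },
};

/* Return the last row whose loc is <= pc, or NULL if pc is before the table. */
static const struct cfa_row *find_row(const struct cfa_row *rows, size_t n,
                                      uint64_t pc)
{
    const struct cfa_row *hit = NULL;
    for (size_t i = 0; i < n && rows[i].loc <= pc; i++)
        hit = &rows[i];
    return hit;
}

/* Address where the return address was saved for a frame with stack
   pointer `sp`, or 0 if it is still held in x30. */
static uint64_t saved_ra_addr(const struct cfa_row *row, uint64_t sp)
{
    if (!row->saved)
        return 0;
    uint64_t cfa = sp + (uint64_t)row->cfa_off;
    return cfa + (uint64_t)row->ra_off;
}
```

For a pc of 0x23c88 the second row is selected, so with sp = 0x1000 the CFA is 0x1010 and the return address is read from 0x1008.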
The table above is described by CIEs (common information entries) and FDEs (frame description entries), both of which are stored in the eh_frame section. To speed up the search for FDEs and CIEs, there is also an eh_frame_hdr section. These structures are described below:
- eh_frame_hdr structure
The eh_frame_hdr structure is organized in the file as follows:
Encoding | Field |
---|---|
unsigned char | version |
unsigned char | eh_frame_ptr_enc |
unsigned char | fde_count_enc |
unsigned char | table_enc |
encoded | eh_frame_ptr |
encoded | fde_count |
binary search table |
Members are described as follows:
- eh_frame_ptr_enc: the encoding of the eh_frame_ptr member. eh_frame_ptr is a variable-length value that must be decoded according to the rules selected by eh_frame_ptr_enc.
- fde_count_enc: the encoding of the fde_count member. fde_count is also a variable-length value that must be decoded according to the rules selected by fde_count_enc.
- table_enc: the encoding of the entries in the fde table; in practice it also acts as a flag that indicates whether a search table is present.
- eh_frame_ptr: points to the start of the eh_frame section.
- fde_count: the number of FDE entries in the search table.
- binary search table: an fde table containing fde_count entries. Each entry has two members: the start address covered by an FDE and the address of that FDE. The table is sorted so that the target FDE can be located quickly.
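To make the "encoding" members more concrete, here is a minimal sketch of decoding one encoded value. The `DW_EH_PE_*` numeric values are the standard ones, but `read_encoded` is a hypothetical helper for illustration, not libunwind's `dwarf_read_encoded_pointer`; it handles only 4-byte formats in host byte order and rejects everything else.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PE_FORMAT_MASK 0x0f  /* low nibble: how the value is stored */
#define PE_APPL_MASK   0x70  /* bits 4-6: what the value is relative to */
#define PE_INDIRECT    0x80  /* value is the address of the real value */
#define PE_udata4      0x03
#define PE_sdata4      0x0b
#define PE_pcrel       0x10
#define PE_datarel     0x30
#define PE_omit        0xff

/* Decode one value at *pp and advance *pp past it.
   field_addr: run-time address of the field itself (used by pcrel).
   data_base:  base for datarel (for eh_frame_hdr, the start of the header).
   Returns 0 on success, -1 for encodings this sketch does not handle. */
static int read_encoded(const uint8_t **pp, uint8_t enc, uint64_t field_addr,
                        uint64_t data_base, uint64_t *out)
{
    uint64_t val;

    if (enc == PE_omit) {          /* member is absent: nothing to read */
        *out = 0;
        return 0;
    }
    if (enc & PE_INDIRECT)
        return -1;

    switch (enc & PE_FORMAT_MASK) {
    case PE_udata4: {
        uint32_t v;
        memcpy(&v, *pp, 4);
        val = v;
        *pp += 4;
        break;
    }
    case PE_sdata4: {
        int32_t v;
        memcpy(&v, *pp, 4);
        val = (uint64_t)(int64_t)v;  /* sign-extend */
        *pp += 4;
        break;
    }
    default:
        return -1;
    }

    switch (enc & PE_APPL_MASK) {
    case 0:          break;                    /* absolute */
    case PE_pcrel:   val += field_addr; break;
    case PE_datarel: val += data_base;  break;
    default:         return -1;
    }
    *out = val;
    return 0;
}
```

A stored value of -8 with encoding `PE_pcrel | PE_sdata4`, read from a field located at 0x2000, decodes to 0x1ff8.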
eh_frame_hdr is declared in libunwind library as follows:
    struct __attribute__((packed)) dwarf_eh_frame_hdr
    {
      unsigned char version;
      unsigned char eh_frame_ptr_enc;  // encoding format of the eh_frame_ptr member
      unsigned char fde_count_enc;     // encoding format of fde_count
      unsigned char table_enc;         // encoding format of the table entries
      Elf_W (Addr) eh_frame;           // points to the eh_frame section
      /* The rest of the header is variable-length and consists of the
         following members, which record the contents of the table:

        encoded_t fde_count;
        struct
          {
            encoded_t start_ip; // first address covered by this FDE; stored as an
                                // offset from the eh_frame_hdr address, not as a
                                // mapped address
            encoded_t fde_addr; // address of the FDE
          }
        binary_search_table[fde_count];  */
    };
- FDE structure
The FDE structure is organized in the file as follows:
Field | Description |
---|---|
Length | Required |
Extended Length | Optional |
CIE Pointer | Required |
PC begin | Required |
PC Range | Required |
Augmentation Data Length | Optional |
Augmentation Data | Optional |
Call Frame Instructions | Required |
Padding |
The length of each member in FDE is uncertain, so it cannot be described in a fixed format structure. Instead, the corresponding value of each member needs to be read according to the coding rules / flags. The meaning of each member is described as follows:
- Length: the length occupied by the FDE, fixed to 4 bytes. When the value is 0xFFFFFFFF, it means that the FDE is in 64 bit DWARF format, and the FDE length is recorded in the following 64 bit data. Note that length itself is not included in the occupied length.
- Extended Length: this member is included only when FDE is in 64 bit DWARF format, indicating the length occupied by FDE.
- CIE Pointer: pointer to CIE. This value is actually an offset. The address of CIE can be obtained by subtracting the offset value from the address of the current CIE Pointer. The number of bytes occupied by this value is 4 or 8 bytes according to the format of DWARF.
- PC Begin: the start address of the range covered by the FDE. This value is encoded according to the FDE encoding member in the CIE, so the number of bytes it occupies depends on that encoding.
- PC range: the range covered by the FDE. This value is also encoded by the FDE encoding member in the CIE. The number of bytes occupied by it needs to be determined according to the encoding rules.
- Augmentation Data Length: the size of the FDE augmentation data, which depends on the augmentation string in the CIE. This member is present only when the augmentation string starts with 'z'.
- Augmentation Data: FDE extension data, including pointer to LSDA, encoded by LSDA encoding member in CIE.
- Call Frame Instructions: a sequence of instructions that indicates how to deduce the call stack.
Because the members of an FDE are encoded according to fields of its CIE, the CIE must be parsed first, and only then can the FDE members be read.
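The Length, Extended Length and CIE Pointer rules can be sketched as follows. This assumes host byte order and the eh_frame convention described above (the CIE lies "CIE Pointer" bytes before the CIE Pointer field); `struct entry_hdr`, `parse_entry_hdr` and `cie_offset_from_fde` are hypothetical names, and only the 32-bit CIE Pointer is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct entry_hdr {
    uint64_t body_len;     /* bytes following the length field(s) */
    uint64_t cie_ptr_off;  /* offset of the CIE Pointer field in the buffer */
    uint64_t end_off;      /* offset one past the end of the entry */
    int      is_64bit;     /* 1 if the entry uses the 64-bit DWARF format */
};

/* Parse Length / Extended Length of the entry that starts at buf[off].
   Returns 0 on success, -1 on a zero-length terminator or an overrun. */
static int parse_entry_hdr(const uint8_t *buf, size_t size, size_t off,
                           struct entry_hdr *h)
{
    uint32_t len32;

    if (off + 4 > size)
        return -1;
    memcpy(&len32, buf + off, 4);
    off += 4;
    if (len32 == 0)
        return -1;
    if (len32 == 0xffffffffu) {       /* 64-bit format: real length follows */
        uint64_t len64;
        if (off + 8 > size)
            return -1;
        memcpy(&len64, buf + off, 8);
        off += 8;
        h->body_len = len64;
        h->is_64bit = 1;
    } else {
        h->body_len = len32;
        h->is_64bit = 0;
    }
    h->cie_ptr_off = off;
    h->end_off = off + h->body_len;   /* Length does not count itself */
    return h->end_off <= size ? 0 : -1;
}

/* Offset of the CIE that a 32-bit-format FDE refers to. */
static uint64_t cie_offset_from_fde(const uint8_t *buf,
                                    const struct entry_hdr *h)
{
    int32_t cie_ptr;
    memcpy(&cie_ptr, buf + h->cie_ptr_off, 4);
    return h->cie_ptr_off - (uint64_t)(int64_t)cie_ptr;
}
```

With an FDE starting at offset 12 whose CIE Pointer field (at offset 16) holds 16, the CIE is found at offset 0.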
- CIE structure
The CIE structure is organized in the file as follows:
Field | Description |
---|---|
Length | Required |
Extended Length | Optional |
CIE ID | Required |
version | Required |
Augmentation String | Required |
Code Alignment Factor | Required |
Data Alignment Factor | Required |
Return Address Register | Required |
Augmentation Data Length | Optional |
Augmentation Data | Optional |
Initial Instructions | Optional |
Padding |
Members are described as follows:
- Length: the length occupied by the CIE, fixed to 4 bytes. When the value is 0xFFFFFFFF, it means that the CIE is in 64 bit DWARF format, and the CIE length is recorded in the following 64 bit data.
- Extended Length: this member is included only when the CIE is in 64 bit DWARF format, indicating the length occupied by the CIE.
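- CIE ID: 0. This is what distinguishes a CIE from an FDE in eh_frame, since an FDE holds a nonzero CIE Pointer at the same position.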
- version: fixed to 1.
- Augmentation String: extended parameter string, ending with NULL character.
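- Code Alignment Factor: a constant factored out of all advance-location instructions; the operand of such an instruction is multiplied by this factor to get the real address delta.
- Data Alignment Factor: a constant factored out of the offset instructions; the operand of such an instruction is multiplied by this factor to get the real offset from the CFA.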
- Return Address Register: Return Address Register, indicating which register holds the return address.
- Augmentation Data: depends on the character sequence in the Augmentation String, and the "R" character corresponds to fde encoding.
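- Initial Instructions: the call frame instructions that set up the initial row of the table, before the instructions of any FDE are applied.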
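How the Augmentation String controls which optional members are present can be sketched as below. The letters 'z', 'R', 'L', 'P' and 'S' are the ones commonly emitted by GCC-style toolchains; `struct aug_info` and `parse_augmentation` are hypothetical names for this illustration, and unknown letters are simply rejected.

```c
#include <assert.h>

struct aug_info {
    int sized;            /* 'z': Augmentation Data Length is present */
    int has_fde_enc;      /* 'R': FDE pointer encoding byte is present */
    int has_lsda_enc;     /* 'L': LSDA pointer encoding byte is present */
    int has_personality;  /* 'P': personality encoding + pointer present */
    int signal_frame;     /* 'S': the frame is a signal frame */
};

/* Parse a NUL-terminated augmentation string.
   Returns 0 on success, -1 if the string is not understood. */
static int parse_augmentation(const char *s, struct aug_info *out)
{
    struct aug_info a = {0, 0, 0, 0, 0};

    if (*s == 'z') {
        a.sized = 1;
        s++;
    } else if (*s != '\0') {
        return -1;  /* the other letters are only defined after 'z' */
    }
    for (; *s != '\0'; s++) {
        switch (*s) {
        case 'R': a.has_fde_enc = 1;     break;
        case 'L': a.has_lsda_enc = 1;    break;
        case 'P': a.has_personality = 1; break;
        case 'S': a.signal_frame = 1;    break;
        default:  return -1;
        }
    }
    *out = a;
    return 0;
}
```

For example, "zR" means the augmentation data holds only the FDE encoding byte, while "zPLR" adds a personality routine and an LSDA encoding.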
Note:
For ELF files without an fde table, the qualifying FDE can be searched for in debug_frame instead, but in that case the FDEs must be parsed one by one from the start of debug_frame until the target FDE is found. For ELF files with an fde table, the address of the target FDE can be located directly through the search table, and that FDE can then be parsed directly.
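The entry-by-entry walk needed when no search table exists can be sketched as follows. This is a simplified illustration, not libunwind's `linear_search`: it assumes 32-bit entries in host byte order and, as a loudly stated simplification, treats PC Begin and PC Range as plain absolute 4-byte values, whereas real files use whatever encoding the CIE declares. `linear_find_fde` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, 4);
    return v;
}

/* Walk every entry of the section and return the offset of the FDE that
   covers pc, or -1 if there is none (or the data is not understood). */
static long linear_find_fde(const uint8_t *sec, size_t size, uint32_t pc)
{
    size_t off = 0;

    while (off + 4 <= size) {
        uint32_t len = rd32(sec + off);
        size_t body = off + 4;

        if (len == 0)
            break;                 /* zero length terminates the section */
        if (len == 0xffffffffu)
            return -1;             /* 64-bit format: not handled here */
        if (len < 4 || body + len > size)
            return -1;

        uint32_t id = rd32(sec + body);
        if (id != 0 && len >= 12) {            /* nonzero: this is an FDE */
            uint32_t begin = rd32(sec + body + 4);
            uint32_t range = rd32(sec + body + 8);
            if (pc >= begin && pc - begin < range)
                return (long)off;
        }
        off = body + len;          /* Length does not include itself */
    }
    return -1;
}
```

Every lookup costs time proportional to the number of entries, which is the reason the sorted table in eh_frame_hdr exists.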
Source code analysis
    /*src/dwarf/Gparser.c*/
    /* The function finds the saved locations and applies the register
       state as well. */
    HIDDEN int
    dwarf_step (struct dwarf_cursor *c)
    {
      int ret;
      dwarf_state_record_t sr;

      if ((ret = find_reg_state (c, &sr)) < 0)      //[1.1]
        return ret;
      return apply_reg_state (c, &sr.rs_current);   //[1.2]
    }
[1.1] Find the register save state for the current ip by locating its FDE and evaluating it
    /* Find the saved locations. */
    static int
    find_reg_state (struct dwarf_cursor *c, dwarf_state_record_t *sr)
    {
      dwarf_reg_state_t *rs;
      struct dwarf_rs_cache *cache;
      int ret = 0;
      intrmask_t saved_mask;

      if ((cache = get_rs_cache(c->as, &saved_mask)) &&
          (rs = rs_lookup(cache, c)))
        {
          /* update hint; no locking needed: single-word writes are atomic */
          unsigned short index = rs - cache->buckets;
          c->use_prev_instr = ! cache->links[index].signal_frame;
          memcpy (&sr->rs_current, rs, sizeof (*rs));
        }
      else
        {
          ret = fetch_proc_info (c, c->ip);   //[1.1.1]
          int next_use_prev_instr = c->use_prev_instr;
          if (ret >= 0)
            {
              /* Update use_prev_instr for the next frame. */
              assert(c->pi.unwind_info);
              struct dwarf_cie_info *dci = c->pi.unwind_info;
              next_use_prev_instr = ! dci->signal_frame;
              ret = create_state_record_for (c, sr, c->ip);   //[1.1.2]
            }
          put_unwind_info (c, &c->pi);
          c->use_prev_instr = next_use_prev_instr;

          if (cache && ret >= 0)
            {
              rs = rs_new (cache, c);
              cache->links[rs - cache->buckets].hint = 0;
              memcpy(rs, &sr->rs_current, sizeof(*rs));
            }
        }

      unsigned short index = -1;
      if (cache)
        {
          put_rs_cache (c->as, cache, &saved_mask);
          if (rs)
            {
              index = rs - cache->buckets;
              c->hint = cache->links[index].hint;
              cache->links[c->prev_rs].hint = index + 1;
              c->prev_rs = index;
            }
        }
      if (ret < 0)
        return ret;
      if (cache)
        tdep_reuse_frame (c, cache->links[index].signal_frame);
      return 0;
    }
[1.1.1] get FDE
    static int
    fetch_proc_info (struct dwarf_cursor *c, unw_word_t ip)
    {
      int ret, dynamic = 1;

      /* The 'ip' can point either to the previous or next instruction
         depending on what type of frame we have: normal call or a place
         to resume execution (e.g. after signal frame).

         For a normal call frame we need to back up so we point within the
         call itself; this is important because a) the call might be the
         very last instruction of the function and the edge of the FDE,
         and b) so that run_cfi_program() runs locations up to the call
         but not more.

         For signal frame, we need to do the exact opposite and look
         up using the current 'ip' value.  That is where execution will
         continue, and it's important we get this right, as 'ip' could be
         right at the function entry and hence FDE edge, or at instruction
         that manipulates CFA (push/pop). */
      if (c->use_prev_instr)
        --ip;

      memset (&c->pi, 0, sizeof (c->pi));

      /* First search the list of dynamically registered unwind info. If
         nothing is found there, fall back to searching the ELF objects
         loaded by the process. */
      /* check dynamic info first --- it overrides everything else */
      ret = unwi_find_dynamic_proc_info (c->as, ip, &c->pi, 1, c->as_arg);
      if (ret == -UNW_ENOINFO)
        {
          dynamic = 0;
          if ((ret = tdep_find_proc_info (c, ip, 1)) < 0)
            return ret;
        }

      if (c->pi.format != UNW_INFO_FORMAT_DYNAMIC
          && c->pi.format != UNW_INFO_FORMAT_TABLE
          && c->pi.format != UNW_INFO_FORMAT_REMOTE_TABLE)
        return -UNW_ENOINFO;

      c->pi_valid = 1;
      c->pi_is_dynamic = dynamic;

      /* Let system/machine-dependent code determine frame-specific attributes. */
      if (ret >= 0)
        tdep_fetch_frame (c, ip, 1);

      return ret;
    }
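The reason for the `--ip` adjustment in fetch_proc_info can be illustrated with the `_dl_start` example, where the `bl` at 23c88 is the last instruction and the return address 23c8c is already outside the FDE range 23c80..23c8c. The sketch below uses hypothetical names (`struct fde_range`, `lookup_for_frame`), and the second table entry, `next_function`, is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Address range [start, end) covered by one FDE. */
struct fde_range {
    uint64_t start, end;
    const char *name;
};

static const struct fde_range ranges[] = {
    { 0x23c80, 0x23c8c, "_dl_start" },
    { 0x23c8c, 0x23ca0, "next_function" },  /* made-up neighbour */
};

static const struct fde_range *find_range(const struct fde_range *t, size_t n,
                                          uint64_t ip)
{
    for (size_t i = 0; i < n; i++)
        if (ip >= t[i].start && ip < t[i].end)
            return &t[i];
    return NULL;
}

/* For a normal call frame, ip is a return address, so step back one byte
   to land inside the call instruction; for a signal frame, use ip as-is. */
static const struct fde_range *lookup_for_frame(const struct fde_range *t,
                                                size_t n, uint64_t ip,
                                                int use_prev_instr)
{
    return find_range(t, n, use_prev_instr ? ip - 1 : ip);
}
```

Looking up the return address 0x23c8c directly would select the neighbouring function; looking up 0x23c8b selects `_dl_start`, which is the frame that actually made the call.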
    HIDDEN int
    unwi_find_dynamic_proc_info (unw_addr_space_t as, unw_word_t ip,
                                 unw_proc_info_t *pi, int need_unwind_info,
                                 void *arg)
    {
      /* Check whether this is the local address space, i.e. whether we are
         looking up the proc_info of this process or of another (attached)
         process. */
      if (as == unw_local_addr_space)
        return local_find_proc_info (as, ip, pi, need_unwind_info, arg);
      else
        return remote_find_proc_info (as, ip, pi, need_unwind_info, arg);
    }
    static inline int
    local_find_proc_info (unw_addr_space_t as, unw_word_t ip, unw_proc_info_t *pi,
                          int need_unwind_info, void *arg)
    {
      unw_dyn_info_list_t *list;
      unw_dyn_info_t *di;

      /* Check whether a list of dynamically registered unwind info exists.
         If not, return directly; if it does, look for the entry in the
         list whose range contains ip. */
    #ifndef UNW_LOCAL_ONLY
    # pragma weak _U_dyn_info_list_addr
      if (!_U_dyn_info_list_addr)
        return -UNW_ENOINFO;
    #endif

      list = (unw_dyn_info_list_t *) (uintptr_t) _U_dyn_info_list_addr ();
      for (di = list->first; di; di = di->next)
        if (ip >= di->start_ip && ip < di->end_ip)
          return unwi_extract_dynamic_proc_info (as, ip, pi, di, need_unwind_info,
                                                 arg);
      return -UNW_ENOINFO;
    }
    /* The following analysis takes the arm architecture and local unwinding
       as the example: include/tdep-arm/libunwind_i.h */
    #ifdef UNW_LOCAL_ONLY
    # define tdep_find_proc_info(c,ip,n)                            \
            arm_find_proc_info((c)->as, (ip), &(c)->pi, (n),        \
                               (c)->as_arg)
    # define tdep_put_unwind_info(as,pi,arg)                        \
            arm_put_unwind_info((as), (pi), (arg))
    #else
    # define tdep_find_proc_info(c,ip,n)                                    \
            (*(c)->as->acc.find_proc_info)((c)->as, (ip), &(c)->pi, (n),    \
                                           (c)->as_arg)
    # define tdep_put_unwind_info(as,pi,arg)                        \
            (*(as)->acc.put_unwind_info)((as), (pi), (arg))
    #endif
    /*/src/arm/Gex_tables.c*/
    HIDDEN int
    arm_find_proc_info (unw_addr_space_t as, unw_word_t ip,
                        unw_proc_info_t *pi, int need_unwind_info, void *arg)
    {
      int ret = -1;
      intrmask_t saved_mask;

      Debug (14, "looking for IP=0x%lx\n", (long) ip);

      /* If DWARF mode is used */
      if (UNW_TRY_METHOD(UNW_ARM_METHOD_DWARF))
        ret = dwarf_find_proc_info (as, ip, pi, need_unwind_info, arg);

      /* If EXIDX mode is used */
      if (ret < 0 && UNW_TRY_METHOD (UNW_ARM_METHOD_EXIDX))
        {
          struct arm_cb_data cb_data;

          memset (&cb_data, 0, sizeof (cb_data));
          cb_data.ip = ip;
          cb_data.pi = pi;
          cb_data.di.format = -1;

          SIGPROCMASK (SIG_SETMASK, &unwi_full_mask, &saved_mask);
          ret = dl_iterate_phdr (arm_phdr_cb, &cb_data);
          SIGPROCMASK (SIG_SETMASK, &saved_mask, NULL);

          if (cb_data.di.format != -1)
            ret = arm_search_unwind_table (as, ip, &cb_data.di, pi,
                                           need_unwind_info, arg);
          else
            ret = -UNW_ENOINFO;
        }

      return ret;
    }
    /*/src/dwarf*/
    HIDDEN int
    dwarf_find_proc_info (unw_addr_space_t as, unw_word_t ip,
                          unw_proc_info_t *pi, int need_unwind_info, void *arg)
    {
      struct dwarf_callback_data cb_data;
      intrmask_t saved_mask;
      int ret;

      Debug (14, "looking for IP=0x%lx\n", (long) ip);

      memset (&cb_data, 0, sizeof (cb_data));
      cb_data.ip = ip;
      cb_data.pi = pi;
      cb_data.need_unwind_info = need_unwind_info;
      cb_data.di.format = -1;
      cb_data.di_debug.format = -1;

      /* dl_iterate_phdr walks the ELF objects loaded by the process (the
         executable and all dynamic libraries) and calls dwarf_callback for
         each of them; the callback looks for the object that contains ip. */
      SIGPROCMASK (SIG_SETMASK, &unwi_full_mask, &saved_mask);
      ret = dl_iterate_phdr (dwarf_callback, &cb_data);
      SIGPROCMASK (SIG_SETMASK, &saved_mask, NULL);

      if (ret > 0)
        {
          if (cb_data.single_fde)
            /* already got the result in *pi */
            return 0;

          /* If an fde search table exists, search in the search table */
          /* search the table: */
          if (cb_data.di.format != -1)
            ret = dwarf_search_unwind_table_int (as, ip, &cb_data.di,
                                                 pi, need_unwind_info, arg);
          else
            ret = -UNW_ENOINFO;

          /* If nothing was found in the search table and a debug frame
             exists, try to find it in the debug frame */
          if (ret == -UNW_ENOINFO && cb_data.di_debug.format != -1)
            ret = dwarf_search_unwind_table_int (as, ip, &cb_data.di_debug, pi,
                                                 need_unwind_info, arg);
        }
      else
        ret = -UNW_ENOINFO;

      return ret;
    }
    /*/src/dwarf/Gfind_proc_info-lsb.c*/
    /* ptr is a pointer to a dwarf_callback_data structure and, on entry,
       member ip contains the instruction-pointer we're looking
       for.  */
    HIDDEN int
    dwarf_callback (struct dl_phdr_info *info, size_t size, void *ptr)
    {
      struct dwarf_callback_data *cb_data = ptr;
      unw_dyn_info_t *di = &cb_data->di;
      const Elf_W(Phdr) *phdr, *p_eh_hdr, *p_dynamic, *p_text;
      unw_word_t addr, eh_frame_start, eh_frame_end, fde_count, ip;
      Elf_W(Addr) load_base, max_load_addr = 0;
      int ret, need_unwind_info = cb_data->need_unwind_info;
      unw_proc_info_t *pi = cb_data->pi;
      struct dwarf_eh_frame_hdr *hdr = NULL;
      unw_accessors_t *a;
      long n;
      int found = 0;
      struct dwarf_eh_frame_hdr synth_eh_frame_hdr;
    #ifdef CONFIG_DEBUG_FRAME
      unw_word_t start, end;
    #endif /* CONFIG_DEBUG_FRAME*/

      /* Target ip value to be located */
      ip = cb_data->ip;

      /* Make sure struct dl_phdr_info is at least as big as we need.  */
      if (size < offsetof (struct dl_phdr_info, dlpi_phnum)
                 + sizeof (info->dlpi_phnum))
        return -1;

      Debug (15, "checking %s, base=0x%lx)\n",
             info->dlpi_name, (long) info->dlpi_addr);

      phdr = info->dlpi_phdr;       //Program header pointer of the object
      load_base = info->dlpi_addr;  //Load address of the object
      p_text = NULL;                //Program header of the code segment
      p_eh_hdr = NULL;              //Program header of eh_frame_hdr
      p_dynamic = NULL;             //Program header of the dynamic segment

      /* See if PC falls into one of the loaded segments.  Find the
         eh-header segment at the same time.  */
      for (n = info->dlpi_phnum; --n >= 0; phdr++)  //Traverse the segments of the object
        {
          if (phdr->p_type == PT_LOAD)  //If it is a loadable segment
            {
              Elf_W(Addr) vaddr = phdr->p_vaddr + load_base;

              if (ip >= vaddr && ip < vaddr + phdr->p_memsz)  //If the target address lies in this segment, it is the code segment
                p_text = phdr;  //Record the program header of the code segment

              if (vaddr + phdr->p_filesz > max_load_addr)  //Record the highest virtual address loaded into memory
                max_load_addr = vaddr + phdr->p_filesz;
            }
          else if (phdr->p_type == PT_GNU_EH_FRAME)
            p_eh_hdr = phdr;    //Record the eh_frame_hdr program header
          else if (phdr->p_type == PT_DYNAMIC)
            p_dynamic = phdr;   //Record the dynamic segment program header
        }

      if (!p_text)  //If p_text is NULL, the target ip is not in this object, so return directly
        return 0;

      if (p_eh_hdr)  //If eh_frame_hdr exists, record its address
        {
          hdr = (struct dwarf_eh_frame_hdr *) (p_eh_hdr->p_vaddr + load_base);
        }
      else
        {
          //If there is no eh_frame_hdr, one has to be synthesized
          Elf_W (Addr) eh_frame;
          Debug (1, "no .eh_frame_hdr section found\n");
          eh_frame = dwarf_find_eh_frame_section (info);
          if (eh_frame)
            {
              Debug (1, "using synthetic .eh_frame_hdr section for %s\n",
                     info->dlpi_name);
              synth_eh_frame_hdr.version = DW_EH_VERSION;
              synth_eh_frame_hdr.eh_frame_ptr_enc = DW_EH_PE_absptr |
                ((sizeof(Elf_W (Addr)) == 4) ? DW_EH_PE_udata4 : DW_EH_PE_udata8);
              synth_eh_frame_hdr.fde_count_enc = DW_EH_PE_omit;
              synth_eh_frame_hdr.table_enc = DW_EH_PE_omit;
              synth_eh_frame_hdr.eh_frame = eh_frame;
              hdr = &synth_eh_frame_hdr;
            }
        }

      if (hdr)  //If the address of eh_frame_hdr was found
        {
          if (p_dynamic)  //If a dynamic segment exists
            {
              /* For dynamicly linked executables and shared libraries,
                 DT_PLTGOT is the value that data-relative addresses are
                 relative to for that object.  We call this the "gp".  */
              Elf_W(Dyn) *dyn = (Elf_W(Dyn) *)(p_dynamic->p_vaddr + load_base);
              for (; dyn->d_tag != DT_NULL; ++dyn)
                if (dyn->d_tag == DT_PLTGOT)  //Find the PLTGOT address through the dynamic segment
                  {
                    /* Assume that _DYNAMIC is writable and GLIBC has
                       relocated it (true for x86 at least).  */
                    di->gp = dyn->d_un.d_ptr;
                    break;
                  }
            }
          else
            /* Otherwise this is a static executable with no _DYNAMIC.  Assume
               that data-relative addresses are relative to 0, i.e.,
               absolute.  */
            di->gp = 0;
          pi->gp = di->gp;  //Record di->gp in the unwind info

          if (hdr->version != DW_EH_VERSION)
            {
              Debug (1, "table `%s' has unexpected version %d\n",
                     info->dlpi_name, hdr->version);
              return 0;
            }

          a = unw_get_accessors_int (unw_local_addr_space);  //Get the accessors of the local address space
          addr = (unw_word_t) (uintptr_t) (&hdr->eh_frame);  //Address of the eh_frame_ptr member, which is variable-length encoded

          /* (Optionally) read eh_frame_ptr: */
          //Read the eh_frame start address. After reading, addr has advanced to the next member
          if ((ret = dwarf_read_encoded_pointer (unw_local_addr_space, a,
                                                 &addr, hdr->eh_frame_ptr_enc, pi,
                                                 &eh_frame_start, NULL)) < 0)
            return ret;

          /* (Optionally) read fde_count: */
          //Read fde_count
          if ((ret = dwarf_read_encoded_pointer (unw_local_addr_space, a,
                                                 &addr, hdr->fde_count_enc, pi,
                                                 &fde_count, NULL)) < 0)
            return ret;

          /* If there is no fde table, eh_frame has to be traversed to find the FDE */
          if (hdr->table_enc != (DW_EH_PE_datarel | DW_EH_PE_sdata4))
            {
              /* If there is no search table or it has an unsupported
                 encoding, fall back on linear search.  */
              if (hdr->table_enc == DW_EH_PE_omit)
                Debug (4, "table `%s' lacks search table; doing linear search\n",
                       info->dlpi_name);
              else
                Debug (4, "table `%s' has encoding 0x%x; doing linear search\n",
                       info->dlpi_name, hdr->table_enc);

              eh_frame_end = max_load_addr; /* XXX can we do better? */

              if (hdr->fde_count_enc == DW_EH_PE_omit)
                fde_count = ~0UL;
              if (hdr->eh_frame_ptr_enc == DW_EH_PE_omit)
                abort ();

              Debug (1, "eh_frame_start = %lx eh_frame_end = %lx\n",
                     eh_frame_start, eh_frame_end);

              /* XXX we know how to build a local binary search table for
                 .debug_frame, so we could do that here too.  */
              found = linear_search (unw_local_addr_space, ip,
                                     eh_frame_start, eh_frame_end, fde_count,
                                     pi, need_unwind_info, NULL);
              if (found != 1)
                found = 0;
              else
                cb_data->single_fde = 1;
            }
          else  //If there is an fde table, only record where the table is; it is searched later
            {
              di->format = UNW_INFO_FORMAT_REMOTE_TABLE;
              di->start_ip = p_text->p_vaddr + load_base;  //Code segment start address
              di->end_ip = p_text->p_vaddr + load_base + p_text->p_memsz;  //Code segment end address
              di->u.rti.name_ptr = (unw_word_t) (uintptr_t) info->dlpi_name;  //Object name
              di->u.rti.table_data = addr;  //Table address
              assert (sizeof (struct table_entry) % sizeof (unw_word_t) == 0);
              di->u.rti.table_len = (fde_count * sizeof (struct table_entry)
                                     / sizeof (unw_word_t));
              /* For the binary-search table in the eh_frame_hdr, data-relative
                 means relative to the start of that section... */
              di->u.rti.segbase = (unw_word_t) (uintptr_t) hdr;  //Relative addresses are based on the start of eh_frame_hdr,
                                                                 //i.e. the addresses stored in the table entries are relative to segbase

              found = 1;
              Debug (15, "found table `%s': segbase=0x%lx, len=%lu, gp=0x%lx, "
                     "table_data=0x%lx\n", (char *) (uintptr_t) di->u.rti.name_ptr,
                     (long) di->u.rti.segbase, (long) di->u.rti.table_len,
                     (long) di->gp, (long) di->u.rti.table_data);
            }
        }

    #ifdef CONFIG_DEBUG_FRAME
      /* Find the start/end of the described region by parsing the phdr_info
         structure.  */
      start = (unw_word_t) -1;
      end = 0;

      /* Find the start and end addresses loaded into memory */
      for (n = 0; n < info->dlpi_phnum; n++)
        {
          if (info->dlpi_phdr[n].p_type == PT_LOAD)
            {
              unw_word_t seg_start = info->dlpi_addr + info->dlpi_phdr[n].p_vaddr;
              unw_word_t seg_end = seg_start + info->dlpi_phdr[n].p_memsz;

              if (seg_start < start)
                start = seg_start;

              if (seg_end > end)
                end = seg_end;
            }
        }

      //Read the debug_frame section from the file and fill in the table
      found = dwarf_find_debug_frame (found, &cb_data->di_debug, ip,
                                      info->dlpi_addr, info->dlpi_name, start,
                                      end);
    #endif  /* CONFIG_DEBUG_FRAME */

      return found;
    }
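The segment walk performed by dwarf_callback can be tried in isolation with a much smaller callback. The sketch below assumes Linux with glibc, where `dl_iterate_phdr` and `PT_GNU_EH_FRAME` are available through `<link.h>`; `struct find_ctx`, `find_cb` and `locate_module` are hypothetical names, and no unwind data is parsed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct find_ctx {
    uintptr_t   ip;            /* address to look for */
    int         found;         /* 1 once a PT_LOAD segment contains ip */
    uintptr_t   eh_frame_hdr;  /* run-time address of eh_frame_hdr, or 0 */
    const char *name;          /* object name as reported by the loader */
};

/* Called once per loaded ELF object; a nonzero return stops the walk. */
static int find_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct find_ctx *ctx = (struct find_ctx *)data;
    uintptr_t eh = 0;
    int hit = 0;

    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        uintptr_t vaddr = (uintptr_t)(info->dlpi_addr + ph->p_vaddr);

        if (ph->p_type == PT_LOAD) {
            if (ctx->ip >= vaddr && ctx->ip < vaddr + ph->p_memsz)
                hit = 1;
        } else if (ph->p_type == PT_GNU_EH_FRAME) {
            eh = vaddr;
        }
    }
    if (!hit)
        return 0;              /* ip is not in this object: keep going */

    ctx->found = 1;
    ctx->eh_frame_hdr = eh;
    ctx->name = info->dlpi_name;
    return 1;
}

/* Returns 1 if some loaded object contains ip, 0 otherwise. */
static int locate_module(uintptr_t ip, struct find_ctx *ctx)
{
    memset(ctx, 0, sizeof *ctx);
    ctx->ip = ip;
    return dl_iterate_phdr(find_cb, ctx);
}
```

Calling `locate_module` with the address of any function in the program should report the object that contains it; whether `eh_frame_hdr` is nonzero depends on how that object was linked.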
    /* dwarf_search_unwind_table_int is an alias of dwarf_search_unwind_table */
    #ifndef __clang__
    static ALIAS(dwarf_search_unwind_table) int
    dwarf_search_unwind_table_int (unw_addr_space_t as,
                                   unw_word_t ip,
                                   unw_dyn_info_t *di,
                                   unw_proc_info_t *pi,
                                   int need_unwind_info, void *arg);
    #else
    #define dwarf_search_unwind_table_int dwarf_search_unwind_table
    #endif
    int
    dwarf_search_unwind_table (unw_addr_space_t as, unw_word_t ip,
                               unw_dyn_info_t *di, unw_proc_info_t *pi,
                               int need_unwind_info, void *arg)
    {
      const struct table_entry *e = NULL, *table;
      unw_word_t ip_base = 0, segbase = 0, last_ip, fde_addr;
      unw_accessors_t *a;
    #ifndef UNW_LOCAL_ONLY
      struct table_entry ent;
    #endif
      int ret;
      unw_word_t debug_frame_base;
      size_t table_len;

    #ifdef UNW_REMOTE_ONLY
      assert (is_remote_table(di->format));
    #else
      assert (is_remote_table(di->format)
              || di->format == UNW_INFO_FORMAT_TABLE);
    #endif
      assert (ip >= di->start_ip && ip < di->end_ip);

      /* Search the table that comes from eh_frame_hdr */
      if (is_remote_table(di->format))
        {
          table = (const struct table_entry *) (uintptr_t) di->u.rti.table_data;
          table_len = di->u.rti.table_len * sizeof (unw_word_t);
          debug_frame_base = 0;
        }
      else  /* Search the table built from debug_frame */
        {
          assert(di->format == UNW_INFO_FORMAT_TABLE);
    #ifndef UNW_REMOTE_ONLY
          struct unw_debug_frame_list *fdesc = (void *) di->u.ti.table_data;

          /* UNW_INFO_FORMAT_TABLE (i.e. .debug_frame) is read from local address
             space.  Both the index and the unwind tables live in local memory, but
             the address space to check for properties like the address size and
             endianness is the target one.  */
          as = unw_local_addr_space;
          table = fdesc->index;
          table_len = fdesc->index_size * sizeof (struct table_entry);
          debug_frame_base = (uintptr_t) fdesc->debug_frame;
    #endif
        }

      a = unw_get_accessors_int (as);

      segbase = di->u.rti.segbase;
      if (di->format == UNW_INFO_FORMAT_IP_OFFSET)
        {
          ip_base = di->start_ip;
        }
      else
        {
          ip_base = segbase;
        }

    #ifndef UNW_REMOTE_ONLY
      if (as == unw_local_addr_space)
        {
          e = lookup (table, table_len, ip - ip_base);  //The relative address is ip - ip_base
          if (e && &e[1] < &table[table_len])
            last_ip = e[1].start_ip_offset + ip_base;   //Start address of the next entry, which bounds the range of this FDE
          else
            last_ip = di->end_ip;
        }
      else
    #endif
        {
    #ifndef UNW_LOCAL_ONLY
          int32_t last_ip_offset = di->end_ip - ip_base;
          segbase = di->u.rti.segbase;
          if ((ret = remote_lookup (as, (uintptr_t) table, table_len,
                                    ip - ip_base, &ent, &last_ip_offset, arg)) < 0)
            return ret;
          if (ret)
            {
              e = &ent;
              last_ip = last_ip_offset + ip_base;
            }
          else
            e = NULL;   /* no info found */
    #endif
        }
      if (!e)
        {
          Debug (1, "IP %lx inside range %lx-%lx, but no explicit unwind info found\n",
                 (long) ip, (long) di->start_ip, (long) di->end_ip);
          /* IP is inside this table's range, but there is no explicit
             unwind info.  */
          return -UNW_ENOINFO;
        }
      Debug (15, "ip=0x%lx, start_ip=0x%lx\n",
             (long) ip, (long) (e->start_ip_offset));
      if (debug_frame_base)
        fde_addr = e->fde_offset + debug_frame_base;
      else
        fde_addr = e->fde_offset + segbase;  //Address of the fde
      Debug (1, "e->fde_offset = %lx, segbase = %lx, debug_frame_base = %lx, "
                "fde_addr = %lx\n", (long) e->fde_offset, (long) segbase,
                (long) debug_frame_base, (long) fde_addr);

      //Parse the proc info from the content of the fde
      if ((ret = dwarf_extract_proc_info_from_fde (as, a, &fde_addr, pi,
                                                   debug_frame_base ?
                                                   debug_frame_base : segbase,
                                                   need_unwind_info,
                                                   debug_frame_base != 0, arg)) < 0)
        return ret;

      /* .debug_frame uses an absolute encoding that does not know about any
         shared library relocation.  */
      if (di->format == UNW_INFO_FORMAT_TABLE)
        {
          pi->start_ip += segbase;
          pi->end_ip += segbase;
          pi->flags = UNW_PI_FLAG_DEBUG_FRAME;
        }

    #if defined(NEED_LAST_IP)
      pi->last_ip = last_ip;
    #else
      (void)last_ip;
    #endif
      if (ip < pi->start_ip || ip >= pi->end_ip)
        return -UNW_ENOINFO;

      return 0;
    }
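The table lookup used by dwarf_search_unwind_table amounts to a binary search for the last entry whose start address is not greater than the target. The sketch below is not libunwind's `lookup`; it assumes a `struct table_entry` with two signed 32-bit members, matching the `start_ip_offset` and `fde_offset` fields used above, and `lookup_entry` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One search-table entry; both values are relative to segbase. */
struct table_entry {
    int32_t start_ip_offset;
    int32_t fde_offset;
};

/* Return the last entry with start_ip_offset <= rel_ip, or NULL if rel_ip
   lies before the first entry. Entries must be sorted in ascending order. */
static const struct table_entry *lookup_entry(const struct table_entry *t,
                                              size_t n, int32_t rel_ip)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t[mid].start_ip_offset <= rel_ip)
            lo = mid + 1;   /* candidate: keep searching to the right */
        else
            hi = mid;
    }
    return lo ? &t[lo - 1] : NULL;
}
```

The caller converts the absolute ip to a relative one with `ip - segbase` before the lookup, and converts the result back with `segbase + fde_offset`, which mirrors the `fde_addr` computation above.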
    /* Extract proc-info from the FDE starting at adress ADDR.

       Pass BASE as zero for eh_frame behaviour, or a pointer to
       debug_frame base for debug_frame behaviour.  */
    HIDDEN int
    dwarf_extract_proc_info_from_fde (unw_addr_space_t as, unw_accessors_t *a,
                                      unw_word_t *addrp, unw_proc_info_t *pi,
                                      unw_word_t base,
                                      int need_unwind_info, int is_debug_frame,
                                      void *arg)
    {
      unw_word_t fde_end_addr, cie_addr, cie_offset_addr, aug_end_addr = 0;
      unw_word_t start_ip, ip_range, aug_size, addr = *addrp;
      int ret, ip_range_encoding;
      struct dwarf_cie_info dci;
      uint64_t u64val;
      uint32_t u32val;

      Debug (12, "FDE @ 0x%lx\n", (long) addr);

      memset (&dci, 0, sizeof (dci));

      //Read the first member of the FDE, its length. If the value is not
      //0xffffffff, it is the length of the FDE. If it is 0xffffffff, the FDE
      //is in 64-bit format and the next 64 bits hold the length.
      if ((ret = dwarf_readu32 (as, a, &addr, &u32val, arg)) < 0)
        return ret;

      if (u32val != 0xffffffff)
        {
          int32_t cie_offset = 0;

          /* In some configurations, an FDE with a 0 length indicates the
             end of the FDE-table.  */
          if (u32val == 0)
            return -UNW_ENOINFO;

          /* the FDE is in the 32-bit DWARF format */
          //32-bit DWARF format
          *addrp = fde_end_addr = addr + u32val;  //End address of the FDE entry
          cie_offset_addr = addr;                 //Address of the CIE offset field

          if ((ret = dwarf_reads32 (as, a, &addr, &cie_offset, arg)) < 0)
            return ret;

          if (is_cie_id (cie_offset, is_debug_frame))
            /* ignore CIEs (happens during linear searches) */
            return 0;

          if (is_debug_frame)  //The CIE address is computed differently for debug_frame
            cie_addr = base + cie_offset;
          else
            /* DWARF says that the CIE_pointer in the FDE is a
               .debug_frame-relative offset, but the GCC-generated .eh_frame
               sections instead store a "pcrelative" offset, which is just
               as fine as it's self-contained.  */
            cie_addr = cie_offset_addr - cie_offset;
        }
      else  //If it is a 64-bit FDE entry
        {
          int64_t cie_offset = 0;

          /* the FDE is in the 64-bit DWARF format */
          if ((ret = dwarf_readu64 (as, a, &addr, &u64val, arg)) < 0)
            return ret;

          *addrp = fde_end_addr = addr + u64val;
          cie_offset_addr = addr;

          if ((ret = dwarf_reads64 (as, a, &addr, &cie_offset, arg)) < 0)
            return ret;

          if (is_cie_id (cie_offset, is_debug_frame))
            /* ignore CIEs (happens during linear searches) */
            return 0;

          if (is_debug_frame)
            cie_addr = base + cie_offset;
          else
            /* DWARF says that the CIE_pointer in the FDE is a
               .debug_frame-relative offset, but the GCC-generated .eh_frame
               sections instead store a "pcrelative" offset, which is just
               as fine as it's self-contained.  */
            cie_addr = (unw_word_t) ((uint64_t) cie_offset_addr - cie_offset);
        }

      Debug (15, "looking for CIE at address %lx\n", (long) cie_addr);

      //Read the CIE at cie_addr and parse its contents into the dci structure
      if ((ret = parse_cie (as, a, cie_addr, pi, &dci, is_debug_frame, arg)) < 0)
        return ret;

      /* IP-range has same encoding as FDE pointers, except that it's
         always an absolute value: */
      ip_range_encoding = dci.fde_encoding & DW_EH_PE_FORMAT_MASK;

      //The members of the FDE are read and parsed according to the encoding
      //given by the CIE. The instruction stream of the FDE is not parsed yet.
      if ((ret = dwarf_read_encoded_pointer (as, a, &addr, dci.fde_encoding,
                                             pi, &start_ip, arg)) < 0
          || (ret = dwarf_read_encoded_pointer (as, a, &addr, ip_range_encoding,
                                                pi, &ip_range, arg)) < 0)
        return ret;
      pi->start_ip = start_ip;
      pi->end_ip = start_ip + ip_range;
      pi->handler = dci.handler;

      if (dci.sized_augmentation)
        {
          if ((ret = dwarf_read_uleb128 (as, a, &addr, &aug_size, arg)) < 0)
            return ret;
          aug_end_addr = addr + aug_size;
        }

      if ((ret = dwarf_read_encoded_pointer (as, a, &addr, dci.lsda_encoding,
                                             pi, &pi->lsda, arg)) < 0)
        return ret;

      Debug (15, "FDE covers IP 0x%lx-0x%lx, LSDA=0x%lx\n",
             (long) pi->start_ip, (long) pi->end_ip, (long) pi->lsda);

      //Decide whether to keep the parsed CIE/FDE information
      if (need_unwind_info)
        {
          pi->format = UNW_INFO_FORMAT_TABLE;
          pi->unwind_info_size = sizeof (dci);
          pi->unwind_info = mempool_alloc (&dwarf_cie_info_pool);
          if (!pi->unwind_info)
            return -UNW_ENOMEM;

          if (dci.have_abi_marker)
            {
              if ((ret = dwarf_readu16 (as, a, &addr, &dci.abi, arg)) < 0
                  || (ret = dwarf_readu16 (as, a, &addr, &dci.tag, arg)) < 0)
                return ret;
              Debug (13, "Found ABI marker = (abi=%u, tag=%u)\n",
                     dci.abi, dci.tag);
            }

          if (dci.sized_augmentation)
            dci.fde_instr_start = aug_end_addr;
          else
            dci.fde_instr_start = addr;
          dci.fde_instr_end = fde_end_addr;

          memcpy (pi->unwind_info, &dci, sizeof (dci));
        }
      return 0;
    }
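The augmentation size above is read as a ULEB128 value, and the alignment factors in the CIE use the same family of encodings. A minimal sketch of both decoders follows; `read_uleb128` and `read_sleb128` here are standalone illustrations working on a byte pointer, not libunwind's accessor-based `dwarf_read_uleb128`, and they do no bounds checking.

```c
#include <assert.h>
#include <stdint.h>

/* Decode an unsigned LEB128 value: 7 payload bits per byte, least
   significant group first, high bit set on every byte except the last. */
static uint64_t read_uleb128(const uint8_t **pp)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;

    do {
        byte = *(*pp)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}

/* Decode a signed LEB128 value: same layout, but bit 6 of the last byte
   is the sign bit and is extended to the full width. */
static int64_t read_sleb128(const uint8_t **pp)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;

    do {
        byte = *(*pp)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;
    return (int64_t)result;
}
```

The single byte 0x78 decodes to -8 as SLEB128. With a data alignment factor of -8, an offset operand of 2 gives cfa-16 and an operand of 1 gives cfa-8, which matches the x29 and x30 rules in the example at the start.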
[1.1.2] Run the CIE and FDE instructions and fill in the register state
    static int
    create_state_record_for (struct dwarf_cursor *c, dwarf_state_record_t *sr,
                             unw_word_t ip)
    {
      int ret;
      switch (c->pi.format)
        {
        case UNW_INFO_FORMAT_TABLE:
        case UNW_INFO_FORMAT_REMOTE_TABLE:
          if ((ret = setup_fde(c, sr)) < 0)  //Run the CIE instructions
            return ret;
          ret = parse_fde (c, ip, sr);       //Run the FDE instructions
          break;

        case UNW_INFO_FORMAT_DYNAMIC:
          ret = parse_dynamic (c, ip, sr);
          break;

        default:
          Debug (1, "Unexpected unwind-info format %d\n", c->pi.format);
          ret = -UNW_EINVAL;
        }
      return ret;
    }
    static inline int
    setup_fde (struct dwarf_cursor *c, dwarf_state_record_t *sr)
    {
      int i, ret;

      assert (c->pi_valid);

      memset (sr, 0, sizeof (*sr));
      for (i = 0; i < DWARF_NUM_PRESERVED_REGS + 2; ++i)
        set_reg (sr, i, DWARF_WHERE_SAME, 0);

      struct dwarf_cie_info *dci = c->pi.unwind_info;
      sr->rs_current.ret_addr_column = dci->ret_addr_column;
      unw_word_t addr = dci->cie_instr_start;  //Start address of the CIE instructions
      unw_word_t curr_ip = 0;                  //ip = 0
      dwarf_stackable_reg_state_t *rs_stack = NULL;
      ret = run_cfi_program (c, sr, &curr_ip, ~(unw_word_t) 0, &addr,
                             dci->cie_instr_end,
                             &rs_stack, dci);
      empty_rstate_stack(&rs_stack);
      if (ret < 0)
        return ret;

      memcpy (&sr->rs_initial, &sr->rs_current, sizeof (sr->rs_initial));
      return 0;
    }

    static inline int
    parse_fde (struct dwarf_cursor *c, unw_word_t ip, dwarf_state_record_t *sr)
    {
      int ret;
      struct dwarf_cie_info *dci = c->pi.unwind_info;
      unw_word_t addr = dci->fde_instr_start;  //Start address of the FDE instructions
      unw_word_t curr_ip = c->pi.start_ip;     //First address covered by the FDE
      dwarf_stackable_reg_state_t *rs_stack = NULL;
      /* Process up to current `ip` for signal frame and `ip - 1` for normal call frame
         See `c->use_prev_instr` use in `fetch_proc_info` for details. */
      // c->use_prev_instr is 0 or 1: unwind using the current ip or ip - 1
      ret = run_cfi_program (c, sr, &curr_ip, ip - c->use_prev_instr, &addr,
                             dci->fde_instr_end,
                             &rs_stack, dci);
      empty_rstate_stack(&rs_stack);
      if (ret < 0)
        return ret;

      return 0;
    }
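What "running the FDE instructions up to ip" means can be shown with a tiny interpreter for the three instruction kinds that appear in the `_dl_start` example. This is a sketch, not libunwind's `run_cfi_program`: `struct row` and `run_cfi` are hypothetical names, operands are assumed to fit in one byte, and the byte string and the alignment factors (4 and -8) used in the usage note are my own encoding of the listing at the start, not data taken from a real file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register state being built: CFA = sp + cfa_off, plus per-register rules. */
struct row {
    uint64_t cfa_off;
    int      saved[64];  /* 1 if the register is saved at CFA + off[reg] */
    int64_t  off[64];
};

/* Execute instructions in [p, end) while the current location is <= end_ip.
   Handles DW_CFA_advance_loc (0x40|delta), DW_CFA_offset (0x80|reg, operand),
   DW_CFA_def_cfa_offset (0x0e, operand) and DW_CFA_nop (0x00). */
static void run_cfi(const uint8_t *p, const uint8_t *end, uint64_t start_ip,
                    uint64_t end_ip, unsigned code_align, int data_align,
                    struct row *r)
{
    uint64_t curr_ip = start_ip;

    while (curr_ip <= end_ip && p < end) {
        uint8_t op = *p++;

        if ((op & 0xc0) == 0x40) {            /* DW_CFA_advance_loc */
            curr_ip += (uint64_t)(op & 0x3f) * code_align;
        } else if ((op & 0xc0) == 0x80) {     /* DW_CFA_offset */
            unsigned reg = op & 0x3f;
            if (p >= end)
                break;
            r->saved[reg] = 1;
            r->off[reg] = (int64_t)*p++ * data_align;
        } else if (op == 0x0e) {              /* DW_CFA_def_cfa_offset */
            if (p >= end)
                break;
            r->cfa_off = *p++;
        } else if (op != 0x00) {              /* anything but DW_CFA_nop */
            break;                            /* not handled in this sketch */
        }
    }
}
```

With the program `41 0e 10 9d 02 9e 01`, a start ip of 0x23c80 and a target of 0x23c8b, the result is CFA = sp+16, x29 at cfa-16 and x30 at cfa-8. With a target of 0x23c80 the loop stops right after the advance, so the row stays at sp+0 with nothing saved.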
[1.2] Calculate the register values of the caller's frame from the register state obtained by parsing the FDE. This step applies the table row produced by the instruction code to the current frame.
static int
apply_reg_state (struct dwarf_cursor *c, struct dwarf_reg_state *rs)
{
  unw_word_t regnum, addr, cfa, ip;
  unw_word_t prev_ip, prev_cfa;
  unw_addr_space_t as;
  dwarf_loc_t cfa_loc;
  unw_accessors_t *a;
  int i, ret;
  void *arg;

  prev_ip = c->ip;
  prev_cfa = c->cfa;

  as = c->as;
  arg = c->as_arg;
  a = unw_get_accessors_int (as);

  /* Evaluate the CFA first, because it may be referred to by other
     expressions.  */
  if (rs->reg.where[DWARF_CFA_REG_COLUMN] == DWARF_WHERE_REG)
    {
      /* CFA is equal to [reg] + offset: */

      /* As a special-case, if the stack-pointer is the CFA and the
         stack-pointer wasn't saved, popping the CFA implicitly pops
         the stack-pointer as well.  */
      if ((rs->reg.val[DWARF_CFA_REG_COLUMN] == UNW_TDEP_SP)
          && (UNW_TDEP_SP < ARRAY_SIZE(rs->reg.val))
          && (rs->reg.where[UNW_TDEP_SP] == DWARF_WHERE_SAME))
        cfa = c->cfa;
      else
        {
          regnum = dwarf_to_unw_regnum (rs->reg.val[DWARF_CFA_REG_COLUMN]);
          if ((ret = unw_get_reg ((unw_cursor_t *) c, regnum, &cfa)) < 0)
            return ret;
        }
      cfa += rs->reg.val[DWARF_CFA_OFF_COLUMN];
    }
  else
    {
      /* CFA is equal to EXPR: */
      assert (rs->reg.where[DWARF_CFA_REG_COLUMN] == DWARF_WHERE_EXPR);

      addr = rs->reg.val[DWARF_CFA_REG_COLUMN];
      if ((ret = eval_location_expr (c, as, a, addr, &cfa_loc, arg)) < 0)
        return ret;
      /* the returned location better be a memory location... */
      if (DWARF_IS_REG_LOC (cfa_loc))
        return -UNW_EBADFRAME;
      cfa = DWARF_GET_LOC (cfa_loc);
    }

  dwarf_loc_t new_loc[DWARF_NUM_PRESERVED_REGS];
  memcpy(new_loc, c->loc, sizeof(new_loc));

  for (i = 0; i < DWARF_NUM_PRESERVED_REGS; ++i)
    {
      switch ((dwarf_where_t) rs->reg.where[i])
        {
        case DWARF_WHERE_UNDEF:
          new_loc[i] = DWARF_NULL_LOC;
          break;

        case DWARF_WHERE_SAME:
          break;

        case DWARF_WHERE_CFAREL:
          new_loc[i] = DWARF_MEM_LOC (c, cfa + rs->reg.val[i]);
          break;

        case DWARF_WHERE_REG:
          new_loc[i] = DWARF_REG_LOC (c, dwarf_to_unw_regnum (rs->reg.val[i]));
          break;

        case DWARF_WHERE_EXPR:
          addr = rs->reg.val[i];
          if ((ret = eval_location_expr (c, as, a, addr, new_loc + i, arg)) < 0)
            return ret;
          break;

        case DWARF_WHERE_VAL_EXPR:
          addr = rs->reg.val[i];
          if ((ret = eval_location_expr (c, as, a, addr, new_loc + i, arg)) < 0)
            return ret;
          new_loc[i] = DWARF_VAL_LOC (c, DWARF_GET_LOC (new_loc[i]));
          break;
        }
    }

  memcpy(c->loc, new_loc, sizeof(new_loc));

  c->cfa = cfa;
  /* DWARF spec says undefined return address location means end of stack. */
  if (DWARF_IS_NULL_LOC (c->loc[rs->ret_addr_column]))
    {
      c->ip = 0;
      ret = 0;
    }
  else
    {
      ret = dwarf_get (c, c->loc[rs->ret_addr_column], &ip);
      if (ret < 0)
        return ret;
      c->ip = ip;
      ret = 1;
    }

  /* XXX: check for ip to be code_aligned */
  if (c->ip == prev_ip && c->cfa == prev_cfa)
    {
      Dprintf ("%s: ip and cfa unchanged; stopping here (ip=0x%lx)\n",
               __FUNCTION__, (long) c->ip);
      return -UNW_EBADFRAME;
    }

  if (c->stash_frames)
    tdep_stash_frame (c, rs);

  return ret;
}
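The core of this function can be sketched for the simplest case only: the CFA is sp plus an offset (DWARF_WHERE_REG), and x29 and x30 are saved relative to the CFA (DWARF_WHERE_CFAREL). The sketch below reads from a byte array that stands in for stack memory. struct frame, read_word and step_sketch are hypothetical names, and treating the caller's sp as equal to the CFA is an assumption that matches the AArch64 example in this article.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified register set (not libunwind's dwarf_cursor). */
struct frame { uint64_t sp, fp, ip; };

/* Read a 64-bit word from a simulated stack whose first byte has address base. */
static int read_word(const uint8_t *mem, uint64_t base, size_t size,
                     uint64_t addr, uint64_t *out)
{
    if (size < sizeof *out || addr < base || addr - base > size - sizeof *out)
        return -1;                       /* outside the simulated memory */
    memcpy(out, mem + (addr - base), sizeof *out);
    return 0;
}

/* Step from a frame to its caller, given the row
   "CFA = sp + cfa_off, x29 at CFA + fp_off, x30 at CFA + ra_off".
   Returns 1 on success, -1 on failure. */
static int step_sketch(struct frame *f, long cfa_off, long fp_off, long ra_off,
                       const uint8_t *mem, uint64_t base, size_t size)
{
    uint64_t cfa = f->sp + (uint64_t)cfa_off;   /* evaluate the CFA first */
    uint64_t fp, ra;

    if (read_word(mem, base, size, cfa + (uint64_t)fp_off, &fp) < 0)
        return -1;
    if (read_word(mem, base, size, cfa + (uint64_t)ra_off, &ra) < 0)
        return -1;
    if (ra == f->ip && cfa == f->sp)
        return -1;                       /* no progress, similar in spirit to the
                                            prev_ip/prev_cfa check above */
    f->fp = fp;
    f->ip = ra;                          /* return address becomes the caller's ip */
    f->sp = cfa;                         /* assumption: caller's sp equals the CFA */
    return 1;
}
```

With the row for ip 0x23c84 (cfa_off 16, fp_off -16, ra_off -8), one call moves the frame from _dl_start to its caller; repeating the lookup and the step for each new ip yields the whole call stack.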
Example description
Dynamic library original file: libc-2.31.so
Use readelf -S libc-2.31.so to get the positions of the .eh_frame_hdr and .eh_frame sections:
[17] .eh_frame_hdr PROGBITS 00000000001293e8 001293e8 0000000000005944 0000000000000000 A 0 0 4
[18] .eh_frame     PROGBITS 000000000012ed30 0012ed30 0000000000022234 0000000000000000 A 0 0 8
View the binary content of .eh_frame_hdr:
According to the eh_frame_hdr structure described earlier, the member variables parsed at address 0x1293e8 have the following values:
version:          01    //byte
eh_frame_ptr_enc: 1b    //byte
fde_count_enc:    03    //byte
table_enc:        3b    //byte
eh_frame_ptr:     1293ec + 5944 = 12ed30    //Consistent with the address reported by readelf
fde_count:        b27
binary search table:
------------------------
start_ip: ffefa898 + 1293e8 (start address of eh_frame_hdr) = 23c80
fde_addr: 5998 + 1293e8 = 12ed80
------------------------
start_ip: ffefa8a4 + 1293e8 (start address of eh_frame_hdr) = 23c8c
fde_addr: 5a94 + 1293e8 = 12ef7c
------------------------
.......
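The lookup that this table enables can be sketched as follows. With table_enc = 0x3b (DW_EH_PE_datarel | DW_EH_PE_sdata4), every field is a signed 32-bit offset from the start of .eh_frame_hdr, so 0xffefa898 is the negative offset -0x105768. The names struct hdr_entry, datarel and find_fde are hypothetical, and the sketch assumes the table is already in memory and sorted by start_ip.

```c
#include <stddef.h>
#include <stdint.h>

/* One search-table entry for table_enc == 0x3b: two signed 32-bit
   offsets relative to the start address of .eh_frame_hdr. */
struct hdr_entry {
    int32_t start_ip_off;
    int32_t fde_off;
};

/* Turn a data-relative offset into an address. */
static uint64_t datarel(uint64_t hdr_base, int32_t off)
{
    return hdr_base + (uint64_t)(int64_t)off;
}

/* Binary search for the last entry whose start_ip is <= ip.
   Returns the address of its FDE, or 0 if ip lies before the first entry. */
static uint64_t find_fde(const struct hdr_entry *tab, size_t n,
                         uint64_t hdr_base, uint64_t ip)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (datarel(hdr_base, tab[mid].start_ip_off) <= ip)
            lo = mid + 1;     /* entry starts at or before ip: look right */
        else
            hi = mid;         /* entry starts after ip: look left */
    }
    return lo == 0 ? 0 : datarel(hdr_base, tab[lo - 1].fde_off);
}
```

The search only finds a candidate: the FDE's own pc range still has to be checked, because the table records start addresses but not lengths.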
The first address covered by the first FDE is 23c80, the FDE position is 0x12ed80, and the binary file contents are as follows:
The values of FDE member variables are as follows:
Length:      0x14
CIE Pointer: 0x54    //So the CIE address is 12ed84 - 54 = 12ed30
Since the remaining member variables of the FDE depend on the pointer encoding specified in the CIE, the CIE has to be parsed first:
The values of CIE member variables are as follows:
Length:                   0x10
CIE ID:                   00000000
version:                  0x01
Augmentation String:      7a 52 00    //Corresponds to the string "zR" in the right-hand column of the dump
Code Alignment Factor:    0x04
Data Alignment Factor:    0x78        //SLEB128, i.e. -8
Return Address Register:  0x1e        //30, i.e. x30
Augmentation Data Length: 0x01
Augmentation Data:        0x1b        //fde encoding
Initial Instructions:     0c 1f 00    //The total length is 0x10
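The data alignment factor is stored as a signed LEB128 number, which is why the single byte 0x78 stands for -8. A minimal decoder could look like this; read_sleb128 is a hypothetical helper name, and the sketch assumes the encoded value fits in 64 bits and that the buffer holds a complete number.

```c
#include <stdint.h>

/* Decode a signed LEB128 value starting at *p and advance *p past it. */
static int64_t read_sleb128(const uint8_t **p)
{
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        b = *(*p)++;
        v |= (uint64_t)(b & 0x7f) << shift;   /* low 7 bits carry the payload */
        shift += 7;
    } while (b & 0x80);                       /* high bit set: more bytes follow */
    if (shift < 64 && (b & 0x40))
        v |= ~(uint64_t)0 << shift;           /* bit 6 of the last byte is the sign */
    return (int64_t)v;
}
```

For 0x78 the payload is 1111000 in binary; bit 6 is set, so the value is sign-extended to -8.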
According to the content of CIE, the member variables of FDE are resolved as follows:
Length:                   0x14
CIE Pointer:              0x54    //So the CIE address is 12ed84 - 54 = 12ed30
PC begin:                 0xffef4ef8 + 0x12ed88 = 0x23c80    //12ed88 is the address of this field
PC Range:                 0x0c    //pc range encoding = fde encoding & 0x0f = 0x0b
Augmentation Data Length: 0x00
Call Frame Instructions:  41 0e 10 9d 02 9e 01    //The total length is 0x14
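The encoding bytes seen so far (0x1b, 0x3b, 0x03, 0x0b) all follow the same scheme: the low four bits select the value format and bits 4 to 6 select what the value is relative to. The sketch below resolves 4-byte fields only, which is all this example needs. decode_ptr4 and the PE_* constants are hypothetical names; the numeric values are those of the corresponding DW_EH_PE_* constants.

```c
#include <stdint.h>

enum {
    PE_FORMAT_MASK = 0x0f,  /* low nibble: value format */
    PE_APPL_MASK   = 0x70,  /* bits 4..6: how the value is applied */
    PE_UDATA4      = 0x03,  /* DW_EH_PE_udata4 */
    PE_SDATA4      = 0x0b,  /* DW_EH_PE_sdata4 */
    PE_ABSPTR      = 0x00,  /* DW_EH_PE_absptr */
    PE_PCREL       = 0x10,  /* DW_EH_PE_pcrel: relative to the field's own address */
    PE_DATAREL     = 0x30   /* DW_EH_PE_datarel: relative to the start of .eh_frame_hdr */
};

/* Resolve a 4-byte encoded field.
   raw:        the 32 bits read from the file
   field_addr: address of the field itself (base for pcrel)
   data_base:  base address for datarel
   Returns 0 on success, -1 for encodings this sketch does not handle. */
static int decode_ptr4(uint8_t enc, uint32_t raw, uint64_t field_addr,
                       uint64_t data_base, uint64_t *out)
{
    int64_t val;
    switch (enc & PE_FORMAT_MASK) {
    case PE_UDATA4:
        val = (int64_t)raw;
        break;
    case PE_SDATA4:   /* interpret the 32 bits as two's complement */
        val = raw >= 0x80000000u ? (int64_t)raw - 0x100000000LL : (int64_t)raw;
        break;
    default:
        return -1;
    }
    switch (enc & PE_APPL_MASK) {
    case PE_ABSPTR:  *out = (uint64_t)val;              break;
    case PE_PCREL:   *out = field_addr + (uint64_t)val; break;
    case PE_DATAREL: *out = data_base + (uint64_t)val;  break;
    default:
        return -1;
    }
    return 0;
}
```

For PC begin the call would pass enc 0x1b, raw 0xffef4ef8 and field_addr 0x12ed88. For PC Range only the format nibble is kept (0x1b & 0x0f = 0x0b), so the value 0x0c is taken as a plain length.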
For how the instruction bytes are decoded, refer to the code above. For comparison, the .eh_frame content read by readelf -wf libc-2.31.so is shown below; its last entry is consistent with the FDE parsed by hand above.
Contents of the .eh_frame section:

00000000 0000000000000010 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 4
  Data alignment factor: -8
  Return address column: 30
  Augmentation data:     1b
  DW_CFA_def_cfa: r31 (sp) ofs 0

00000014 0000000000000010 00000018 FDE cie=00000000 pc=0000000000024040..0000000000024044
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

00000028 0000000000000024 0000002c FDE cie=00000000 pc=0000000000024048..00000000000240e8
  DW_CFA_advance_loc: 4 to 000000000002404c
  DW_CFA_def_cfa_offset: 48
  DW_CFA_offset: r29 (x29) at cfa-48
  DW_CFA_offset: r30 (x30) at cfa-40
  DW_CFA_advance_loc: 16 to 000000000002405c
  DW_CFA_offset: r19 (x19) at cfa-32
  DW_CFA_offset: r20 (x20) at cfa-24
  DW_CFA_advance_loc: 80 to 00000000000240ac
  DW_CFA_remember_state
  DW_CFA_restore: r30 (x30)
  DW_CFA_restore: r29 (x29)
  DW_CFA_restore: r19 (x19)
  DW_CFA_restore: r20 (x20)
  DW_CFA_def_cfa_offset: 0
  DW_CFA_advance_loc: 4 to 00000000000240b0
  DW_CFA_restore_state
  DW_CFA_nop

00000050 0000000000000014 00000054 FDE cie=00000000 pc=0000000000023c80..0000000000023c8c
  DW_CFA_advance_loc: 4 to 0000000000023c84
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r29 (x29) at cfa-16
  DW_CFA_offset: r30 (x30) at cfa-8