Previous posts in this series:
- Operating system series 1 - Operating system overview
- Operating system series 2 - Processes
- Operating system series 3 - Compilation and linking
- Operating system series 4 - The stack and function calls
Topic of this post:
A detailed look at object files
1. Definition and classification of object files
Object file definition:
The file a compiler produces after compiling source code is called an object file.
Structurally, an object file has not yet been linked, so some of its symbols and addresses have not been resolved.
At present, the main executable file formats on PCs are PE (Portable Executable) on Windows and ELF (Executable and Linkable Format) on Linux. An object file is the intermediate file produced when source code has been compiled but not yet linked, such as a .obj file on Windows or a .o file on Linux. Its content and structure are almost the same as those of an executable file. In a broad sense, therefore, object files and executable files can be regarded as one kind of file, and on Linux they are collectively referred to as ELF format files.
ELF files can accordingly be divided into the following categories:
ELF file type | Description | Example |
---|---|---|
Relocatable file | A compiled file that can be linked to produce an executable file | .o files on Linux, .obj files on Windows |
Executable file | A program that can be run directly; the typical ELF file | /bin/bash on Linux, .exe files on Windows |
Shared object file | A dynamically linkable file; it can be linked with other relocatable files and shared object files to produce a new object file | .so files on Linux, DLL files on Windows |
Core dump file | A file that stores process information. When a process terminates unexpectedly, the system saves the process's address space and some state at the time of termination to this file | core dump files on Linux |
2. What does an object file look like?
1. A first look at an object file
Write a simple test program, main.c:

    #include <stdio.h>

    int global_init_var = 100;        // .data section
    int global_uninit_var;            // .bss section

    int main(void)
    {
        static int static_var = 200;  // .data section
        static int static_var2;       // .bss section
        int a = 1;                    // lives on the stack; the code that sets it is in .text
        int b;                        // lives on the stack
        return 0;
    }

Compile it without linking and check the result with file:

    jason@ubuntu:~/WorkSpace/3.OS_study/1.object_file$ gcc -c main.c
    jason@ubuntu:~/WorkSpace/3.OS_study/1.object_file$ file main.o
    main.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
You can see that main.o is a relocatable file. An object file stores its information in sections. It can be roughly pictured as in the following figure.
In it:
- An ELF file begins with a file header, which describes the attributes of the whole file, including the location of the section header table. That table in turn describes each section.
- .text section: the compiled statements are translated into machine code and placed here
- .data section: initialized global variables and local static variables
- .bss section: uninitialized global variables and local static variables
2. Using ELF file analysis tools
Common tools include objdump and readelf.
objdump:

    Usage: objdump <option(s)> <file(s)>
     Display information from object <file(s)>.
     At least one of the following switches must be given:
      -a, --archive-headers    Display archive header information
      -f, --file-headers       Display the contents of the overall file header
      -p, --private-headers    Display object format specific file header contents
      -P, --private=OPT,OPT... Display object format specific contents
      -h, --[section-]headers  Display the contents of the section headers
      -x, --all-headers        Display the contents of all headers
      -d, --disassemble        Display assembler contents of executable sections
      -D, --disassemble-all    Display assembler contents of all sections
          --disassemble=<sym>  Display assembler contents from <sym>
      -S, --source             Intermix source code with disassembly
          --source-comment[=<txt>] Prefix lines of source code with <txt>
      -s, --full-contents      Display the full contents of all sections requested
      -g, --debugging          Display debug information in object file
      -e, --debugging-tags     Display debug information using ctags style
      -G, --stabs              Display (in raw form) any STABS info in the file
      -W[lLiaprmfFsoRtUuTgAckK] or
      --dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,
              =frames-interp,=str,=loc,=Ranges,=pubtypes,
              =gdb_index,=trace_info,=trace_abbrev,=trace_aranges,
              =addr,=cu_index,=links,=follow-links]
                               Display DWARF info in the file
      --ctf=SECTION            Display CTF info from SECTION
      -t, --syms               Display the contents of the symbol table(s)
      -T, --dynamic-syms       Display the contents of the dynamic symbol table
      -r, --reloc              Display the relocation entries in the file
      -R, --dynamic-reloc      Display the dynamic relocation entries in the file
      @<file>                  Read options from <file>
      -v, --version            Display this program's version number
      -i, --info               List object formats and architectures supported
      -H, --help               Display this information
The options used most in this post:
- objdump -x: display all header information of the ELF file
- objdump -h: display the section headers of the ELF file
- objdump -s: display the full contents of the sections in hexadecimal

All headers of main.o:

    jason@ubuntu:~/WorkSpace/3.OS_study/1.object_file$ objdump -x main.o

    main.o:     file format elf64-x86-64
    main.o
    architecture: i386:x86-64, flags 0x00000011:
    HAS_RELOC, HAS_SYMS
    start address 0x0000000000000000

    Sections:
    Idx Name                Size      VMA               LMA               File off  Algn
      0 .text               00000016  0000000000000000  0000000000000000  00000040  2**0
                            CONTENTS, ALLOC, LOAD, READONLY, CODE
      1 .data               00000008  0000000000000000  0000000000000000  00000058  2**2
                            CONTENTS, ALLOC, LOAD, DATA
      2 .bss                00000004  0000000000000000  0000000000000000  00000060  2**2
                            ALLOC
      3 .comment            0000002b  0000000000000000  0000000000000000  00000060  2**0
                            CONTENTS, READONLY
      4 .note.GNU-stack     00000000  0000000000000000  0000000000000000  0000008b  2**0
                            CONTENTS, READONLY
      5 .note.gnu.property  00000020  0000000000000000  0000000000000000  00000090  2**3
                            CONTENTS, ALLOC, LOAD, READONLY, DATA
      6 .eh_frame           00000038  0000000000000000  0000000000000000  000000b0  2**3
                            CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
    SYMBOL TABLE:
    0000000000000000 l    df *ABS*               0000000000000000 main.c
    0000000000000000 l    d  .text               0000000000000000 .text
    0000000000000000 l    d  .data               0000000000000000 .data
    0000000000000000 l    d  .bss                0000000000000000 .bss
    0000000000000000 l     O .bss                0000000000000004 static_var2.2319
    0000000000000004 l     O .data               0000000000000004 static_var.2318
    0000000000000000 l    d  .note.GNU-stack     0000000000000000 .note.GNU-stack
    0000000000000000 l    d  .note.gnu.property  0000000000000000 .note.gnu.property
    0000000000000000 l    d  .eh_frame           0000000000000000 .eh_frame
    0000000000000000 l    d  .comment            0000000000000000 .comment
    0000000000000000 g     O .data               0000000000000004 global_init_var
    0000000000000004       O *COM*               0000000000000004 global_uninit_var
    0000000000000000 g     F .text               0000000000000016 main

    RELOCATION RECORDS FOR [.eh_frame]:
    OFFSET           TYPE              VALUE
    0000000000000020 R_X86_64_PC32     .text

Section headers only:

    jason@ubuntu:~/WorkSpace/3.OS_study/1.object_file$ objdump -h main.o

    main.o:     file format elf64-x86-64

    Sections:
    Idx Name                Size      VMA               LMA               File off  Algn
      0 .text               00000016  0000000000000000  0000000000000000  00000040  2**0
                            CONTENTS, ALLOC, LOAD, READONLY, CODE
      1 .data               00000008  0000000000000000  0000000000000000  00000058  2**2
                            CONTENTS, ALLOC, LOAD, DATA
      2 .bss                00000004  0000000000000000  0000000000000000  00000060  2**2
                            ALLOC
      3 .comment            0000002b  0000000000000000  0000000000000000  00000060  2**0
                            CONTENTS, READONLY
      4 .note.GNU-stack     00000000  0000000000000000  0000000000000000  0000008b  2**0
                            CONTENTS, READONLY
      5 .note.gnu.property  00000020  0000000000000000  0000000000000000  00000090  2**3
                            CONTENTS, ALLOC, LOAD, READONLY, DATA
      6 .eh_frame           00000038  0000000000000000  0000000000000000  000000b0  2**3
                            CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
In the dump above, we mainly look at the "Size" and "File off" columns. From them, the ELF layout of main.o can be roughly pictured as shown in the following figure:
3. Analysis of each section
Let's look at what objdump -s -d actually dumps for main.o.
1. .text section (code)
This section holds the machine code. The hexadecimal bytes in the section contents correspond to the disassembled instructions.
2. .data section
The .data section holds the initialized global variables and local static variables.
For example, global_init_var = 100 corresponds to the hexadecimal value 0x64, which can be found in the hexadecimal contents of the .data section.
3. .bss section
The .bss section holds uninitialized global variables and local static variables.
Question:
The earlier dump shows that the .bss section is only 4 bytes, yet the code has two uninitialized variables, global_uninit_var and static_var2. Shouldn't they take up 8 bytes?
We can answer this with the symbol table.
Look at the symbol table printed by objdump -x:
static_var2 is in the .bss section, but global_uninit_var is listed under *COM* (COMMON). This involves how linking works, which I won't cover in detail here. A preliminary conclusion:
- A static variable, which is visible only inside its compilation unit, is placed in the .bss section if it is not initialized (such as static_var2 in the example above)
- An uninitialized global variable is treated like a weak symbol, because the linker takes part in resolving it, so it is emitted as a COMMON symbol instead of being placed in .bss. (This applies when GCC compiles with -fcommon, the default before GCC 10. GCC 10 and later default to -fno-common, which places such variables directly in .bss.)