Reposted from: https://www.cnblogs.com/jiqingwu/p/elf_format_research_01.html
ELF is short for Executable and Linking Format (also written Executable and Linkable Format), a format for files that can be executed and linked. It is part of the ABI (Application Binary Interface) specification of Unix/Linux systems.
On Unix/Linux, executables, object files, shared libraries and core dump files are all ELF files.
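Because all of these file kinds share one format, a program can tell them apart from the header alone. The sketch below is a minimal Python illustration (the function name elf_kind is hypothetical): it checks the four magic bytes \x7fELF at the start of the file and decodes the 16-bit e_type field that follows the 16-byte e_ident array.

```python
import struct

# e_type values defined by the ELF specification
ET_NAMES = {
    0: "NONE",
    1: "REL (relocatable/object file)",
    2: "EXEC (executable)",
    3: "DYN (shared object)",
    4: "CORE (core dump)",
}

def elf_kind(data: bytes) -> str:
    """Describe the ELF file type of data; raise ValueError if it is not ELF."""
    if len(data) < 18 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = {1: "<", 2: ">"}.get(data[5])
    if endian is None:
        raise ValueError("unknown byte order")
    # e_type is the 16-bit field right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(endian + "H", data, 16)
    return ET_NAMES.get(e_type, f"unknown ({e_type:#x})")
```

Reading the first 64 bytes of an object file such as sleep.o and passing them to elf_kind reports REL. A position-independent executable (PIE) is marked DYN, which is why readelf reports the sleep executable later in this post as DYN (Shared object file).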
The following figure, taken from the document Executable and Linkable Format (ELF), shows the general layout of an ELF file.
On the left is the linking view of ELF, which can be read as the layout of an object file. On the right is the execution view, which can be read as the layout of an executable file.
Note that an object file is made up of sections, while an executable is made up of segments.
Keep the distinction between segments and sections in mind; both terms come up often later.
When writing assembly code, we use directives such as .text, .bss and .data, which refer to sections. For example, .text tells the assembler to place the code that follows it in the .text section.
Each section in an object file corresponds to exactly one entry in the section header table. The linker uses this section information to relocate code.
When a file is loaded into memory for execution, it is organized in segments. Each segment corresponds to one entry in the program header table of the ELF file, and this table is used to build the process image of the executable.
For example, what we usually call the code segment and the data segment are segments; the linker gathers the sections of the object files into the various segments of the executable.
The contents of the .text section end up in the code segment, while the contents of .data, .bss and similar sections end up in the data segment.
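Both header tables are simply arrays of fixed-size records. As a sketch, assuming a little-endian 64-bit ELF file (the usual x86-64 case), the Python code below unpacks one entry of each table with the struct module. The field names follow the Elf64_Shdr and Elf64_Phdr structures in <elf.h>; the helpers read_shdr and read_phdr are hypothetical. The table offset and entry size passed in would normally come from the ELF header (e_shoff/e_shentsize and e_phoff/e_phentsize).

```python
import struct
from typing import NamedTuple

class Shdr(NamedTuple):  # mirrors Elf64_Shdr (64 bytes)
    sh_name: int         # offset of the name in .shstrtab
    sh_type: int
    sh_flags: int
    sh_addr: int
    sh_offset: int       # where the section's bytes start in the file
    sh_size: int
    sh_link: int
    sh_info: int
    sh_addralign: int
    sh_entsize: int

class Phdr(NamedTuple):  # mirrors Elf64_Phdr (56 bytes)
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int        # bytes taken from the file
    p_memsz: int         # bytes occupied in memory (>= p_filesz)
    p_align: int

SHDR_FMT = "<IIQQQQIIQQ"  # little-endian, no padding: 64 bytes
PHDR_FMT = "<IIQQQQQQ"    # little-endian, no padding: 56 bytes

def read_shdr(data: bytes, table_off: int, index: int, entsize: int = 64) -> Shdr:
    """Unpack entry `index` of a section header table starting at table_off."""
    return Shdr(*struct.unpack_from(SHDR_FMT, data, table_off + index * entsize))

def read_phdr(data: bytes, table_off: int, index: int, entsize: int = 56) -> Phdr:
    """Unpack entry `index` of a program header table starting at table_off."""
    return Phdr(*struct.unpack_from(PHDR_FMT, data, table_off + index * entsize))
```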
An object file does not need a program header table, and the object files produced by gcc (for example with gcc -c) do not contain one.
readelf is a handy tool for inspecting ELF files. For the object file sleep.o on my machine, running readelf -S sleep.o gives the following output:
There are 12 section headers, starting at offset 0x270:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       0000000000000015  0000000000000000  AX       0     0     1
  [ 2] .rela.text        RELA             0000000000000000  000001e0
       0000000000000018  0000000000000018   I       9     1     8
  [ 3] .data             PROGBITS         0000000000000000  00000055
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .bss              NOBITS           0000000000000000  00000055
       0000000000000000  0000000000000000  WA       0     0     1
  ... ... ... ...
  [11] .shstrtab         STRTAB           0000000000000000  00000210
       0000000000000059  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
readelf -S displays the section headers of a file. sleep.o has 12 sections in total; some of them are omitted above.
Besides the familiar .text, .data and .bss, there are other sections, which will be covered in detail when we discuss sections in a later post.
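To see how readelf -S gets the section names, here is a minimal sketch, again assuming a little-endian 64-bit file. It reads e_shoff, e_shentsize, e_shnum and e_shstrndx from the ELF header, then looks up each section's sh_name offset in the .shstrtab string table. The function list_sections and its small type table (only a few common section types) are hypothetical, and files with 0xff00 or more sections, which use extended numbering, are not handled.

```python
import struct

# A few sh_type values from <elf.h>; anything else is shown in hex
SHT_NAMES = {0: "NULL", 1: "PROGBITS", 2: "SYMTAB", 3: "STRTAB",
             4: "RELA", 8: "NOBITS", 9: "REL", 11: "DYNSYM"}

def list_sections(data: bytes):
    """Return (index, name, type, offset, size) for each section of a
    little-endian ELF64 file, roughly the columns readelf -S shows."""
    # Elf64_Ehdr: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def shdr(i):
        # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, align, entsize
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)

    # File offset of .shstrtab; each sh_name is an offset into it
    strtab_off = shdr(e_shstrndx)[4]
    result = []
    for i in range(e_shnum):
        name_off, sh_type, _, _, off, size, *_ = shdr(i)
        start = strtab_off + name_off
        name = data[start:data.index(b"\0", start)].decode()
        result.append((i, name, SHT_NAMES.get(sh_type, hex(sh_type)), off, size))
    return result
```

Called on the full contents of sleep.o, it should yield the same section names in the same order as the readelf listing above.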
The Flags column also tells us something about each section. For example, the flags of .text are AX: the section occupies memory at run time (A, alloc) and is executable (X), so it holds code.
The flags of .data and .bss are WA: they are writable and occupy memory at run time, which is characteristic of data.
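The flag letters are just bits of the sh_flags field. The sketch below decodes them in Python; the bit values are those defined in <elf.h>, and listing them lowest bit first reproduces strings such as AX and WA from the output above. The describe function is a rough, hypothetical heuristic that follows the reasoning of this post, not something the linker itself does.

```python
# sh_flags bits from <elf.h>, lowest bit first, with readelf's letters
SECTION_FLAGS = [
    (0x1, "W"),         # SHF_WRITE
    (0x2, "A"),         # SHF_ALLOC
    (0x4, "X"),         # SHF_EXECINSTR
    (0x10, "M"),        # SHF_MERGE
    (0x20, "S"),        # SHF_STRINGS
    (0x40, "I"),        # SHF_INFO_LINK
    (0x80, "L"),        # SHF_LINK_ORDER
    (0x100, "O"),       # SHF_OS_NONCONFORMING
    (0x200, "G"),       # SHF_GROUP
    (0x400, "T"),       # SHF_TLS
    (0x800, "C"),       # SHF_COMPRESSED
    (0x80000000, "E"),  # SHF_EXCLUDE
]

def flag_letters(sh_flags: int) -> str:
    """Turn an sh_flags value into a letter string such as 'AX' or 'WA'."""
    return "".join(letter for bit, letter in SECTION_FLAGS if sh_flags & bit)

def describe(sh_flags: int) -> str:
    """Rough classification: alloc+exec is code, alloc+write is data."""
    if not sh_flags & 0x2:
        return "not loaded at run time (used by the linker or tools)"
    if sh_flags & 0x4:
        return "code"
    if sh_flags & 0x1:
        return "writable data"
    return "read-only data"
```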
readelf -l displays a file's program headers. Running readelf -l sleep.o on our object file prints: There are no program headers in this file.
Each program header describes one segment. Since an object file has no segments, it needs no program header table.
An executable, in contrast, is organized into segments, so it must have a program header table.
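The section header table, on the other hand, is optional in an executable. Linkers still emit it, however, and it normally survives strip, which removes symbol and debug sections but not the section header table itself.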
Running readelf -l on the local executable sleep gives the following output:
Elf file type is DYN (Shared object file)
Entry point 0x1040
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000560 0x0000000000000560  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x00000000000001d5 0x00000000000001d5  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x0000000000000110 0x0000000000000110  R      0x1000
  LOAD           0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000248 0x0000000000000250  RW     0x1000
  DYNAMIC        0x0000000000002df8 0x0000000000003df8 0x0000000000003df8
                 0x00000000000001e0 0x00000000000001e0  RW     0x8
  NOTE           0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x0000000000002004 0x0000000000002004 0x0000000000002004
                 0x0000000000000034 0x0000000000000034  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000218 0x0000000000000218  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .dynamic .got .got.plt .data .bss
   06     .dynamic
   07     .note.ABI-tag .note.gnu.build-id
   08     .eh_frame_hdr
   09
   10     .init_array .fini_array .dynamic .got
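As the output shows, the file has 11 segments. Only the LOAD segments are actually mapped into memory; the other entries either describe regions inside the LOAD segments (such as INTERP or DYNAMIC) or pass extra information to the loader (such as GNU_STACK, which sets the stack permissions).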
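A sketch of picking out the LOAD segments and formatting their permissions the way readelf does. The Segment record is a hypothetical, trimmed-down version of Elf64_Phdr (p_paddr and p_align are omitted); PT_LOAD and the PF_* bit values are those from <elf.h>. It also reports memsz - filesz, the part of a segment that the loader fills with zeros.

```python
from typing import NamedTuple

PT_LOAD = 1                     # p_type of a loadable segment
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # p_flags permission bits

class Segment(NamedTuple):      # hypothetical subset of Elf64_Phdr
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_filesz: int
    p_memsz: int

def perm_string(p_flags: int) -> str:
    """Format p_flags as readelf does: 'R E', 'RW ', 'R  '."""
    return (("R" if p_flags & PF_R else " ")
            + ("W" if p_flags & PF_W else " ")
            + ("E" if p_flags & PF_X else " "))

def loadable(segments):
    """Yield (vaddr, memsz, permissions, zero_fill) for each PT_LOAD segment.
    zero_fill is the memory the loader clears beyond the file data (e.g. .bss)."""
    for s in segments:
        if s.p_type == PT_LOAD:
            yield s.p_vaddr, s.p_memsz, perm_string(s.p_flags), s.p_memsz - s.p_filesz
```

Fed the LOAD rows of the sleep output (flags R, R E, R and RW, i.e. 4, 5, 4 and 6), it returns R E for segment 03 and a zero-fill of 8 bytes (0x250 - 0x248) for the RW segment.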
Besides the segments, readelf also prints which sections each segment contains. For example, the second LOAD segment is marked R (readable) and E (executable), and its index is 03. Its line in the mapping is:
   03     .init .plt .text .fini
The .text section is included, so this segment is the code segment.
Another example is the third LOAD segment, index 04, whose only flag is R (read-only), with no executable attribute. It contains the sections .rodata, .eh_frame_hdr and .eh_frame, where .rodata holds read-only data such as the string constants used by the program.
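The last LOAD segment, index 05, has flags RW (read/write) and contains the sections .init_array .fini_array .dynamic .got .got.plt .data .bss. Since .data and .bss are included, this is clearly the data segment. Its MemSiz (0x250) is also larger than its FileSiz (0x248): .bss takes no space in the file, and the loader fills the extra memory with zeros.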
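The Section to Segment mapping can be approximated by an address-range check: a section belongs to a segment if it is allocated (SHF_ALLOC) and its [sh_addr, sh_addr + sh_size) range lies inside the segment's [p_vaddr, p_vaddr + p_memsz). This is a simplified, hypothetical version of what readelf computes; the real check also looks at file offsets and treats empty and TLS sections specially, so this sketch skips zero-sized sections altogether. The Section and Segment records are trimmed-down, hypothetical types.

```python
from typing import NamedTuple

SHF_ALLOC = 0x2  # section occupies memory at run time

class Section(NamedTuple):
    name: str
    addr: int    # sh_addr
    size: int    # sh_size
    flags: int   # sh_flags

class Segment(NamedTuple):
    index: int   # position in the program header table
    vaddr: int   # p_vaddr
    memsz: int   # p_memsz

def section_to_segment(sections, segments):
    """Map each segment index to the names of the allocated sections whose
    address range lies inside [vaddr, vaddr + memsz). Simplified check."""
    mapping = {}
    for seg in segments:
        end = seg.vaddr + seg.memsz
        mapping[seg.index] = [
            s.name for s in sections
            if s.flags & SHF_ALLOC and s.size > 0
            and seg.vaddr <= s.addr and s.addr + s.size <= end
        ]
    return mapping
```

Because the check uses memory sizes rather than file sizes, .bss lands in segment 05 even though it has no bytes in the file.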
Let's stop here for today. The upcoming posts are organized as follows:
- First, the ELF header. The first few dozen bytes of a file hold the ELF header, a data structure that carries a lot of information, including where the program header table and the section header table are located in the file.
- Next, how to interpret the section header table and how section data is organized.
- Then, the program header table and how segment data is organized, including how sections are grouped into segments.
- Finally, how the loader builds a process image when it loads a program into memory.
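As a preview of the ELF header, here is a minimal sketch, assuming a little-endian 64-bit file, that unpacks the Elf64_Ehdr fields following e_ident. The field names come from <elf.h>, while ElfHeader and parse_elf64_header are hypothetical names. The e_phoff/e_phentsize/e_phnum and e_shoff/e_shentsize/e_shnum fields are what locate the two header tables.

```python
import struct
from typing import NamedTuple

class ElfHeader(NamedTuple):  # Elf64_Ehdr without e_ident
    e_type: int
    e_machine: int
    e_version: int
    e_entry: int
    e_phoff: int      # file offset of the program header table (0 if none)
    e_shoff: int      # file offset of the section header table
    e_flags: int
    e_ehsize: int
    e_phentsize: int  # size of one program header entry
    e_phnum: int      # number of program header entries
    e_shentsize: int
    e_shnum: int
    e_shstrndx: int   # index of the section holding section names

# Fields after the 16-byte e_ident, little-endian, no padding: 48 bytes
EHDR_REST = "<HHIQQQIHHHHHH"

def parse_elf64_header(data: bytes) -> ElfHeader:
    """Parse the 64-byte header of a little-endian ELF64 file."""
    # e_ident[4] == 2 means 64-bit, e_ident[5] == 1 means little-endian
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian 64-bit ELF file")
    return ElfHeader(*struct.unpack_from(EHDR_REST, data, 16))
```

For the sleep executable, readelf reported 11 program headers starting at offset 64. With 56-byte Elf64_Phdr entries the table spans 11 * 56 = 616 = 0x268 bytes, which matches the FileSiz of the PHDR segment, and it ends at 0x2a8, exactly where the INTERP segment begins.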
Stay tuned.