ELF file parsing: Segment and Section

Reposted from: https://www.cnblogs.com/jiqingwu/p/elf_format_research_01.html

ELF stands for Executable and Linking Format, a format for files that can be executed and linked. It is part of the ABI (Application Binary Interface) specification of Unix/Linux systems.

On Unix/Linux, executables, object files, shared libraries, and core dumps are all ELF files.
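All of these file kinds share the same ELF header, which begins with the magic bytes 0x7f 'E' 'L' 'F'; the header's e_type field tells them apart. The following Python sketch reads just those two pieces. The function name elf_kind is a hypothetical helper for illustration; the e_type values (1 = REL, 2 = EXEC, 3 = DYN, 4 = CORE) and the byte-order flag at e_ident[5] come from the ELF specification.

```python
import struct

# e_type values defined by the ELF specification
ET_NAMES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def elf_kind(data: bytes) -> str:
    """Return the file type ("REL", "EXEC", "DYN", "CORE") recorded in an ELF header."""
    if len(data) < 18 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    ei_data = data[5]
    if ei_data not in (1, 2):
        raise ValueError("unknown byte order")
    endian = "<" if ei_data == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(endian + "H", data, 16)
    return ET_NAMES.get(e_type, f"unknown({e_type})")
```

For an object file such as sleep.o this should report REL. A position-independent executable, like the sleep examined later in this post, reports DYN, which matches the "DYN (Shared object file)" line in its readelf -l output.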

The following figure, taken from the document Executable and Linkable Format (ELF), shows the general layout of an ELF file.

On the left is the linking view of ELF, which can be understood as the layout of an object file. On the right is the execution view, which can be understood as the layout of an executable file.
Note that the content of an object file is organized into sections, while the content of an executable file is organized into segments.

Keep the distinction between segments and sections in mind; both terms come up often later.
When writing assembly, directives such as .text, .bss, and .data refer to sections. For example, .text tells the assembler to put the code that follows it into the .text section.
Each section in an object file corresponds one-to-one to an entry in the section header table. The linker uses this section information to relocate the code.
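To make the one-to-one relationship concrete, here is a minimal Python sketch that walks the section header table. The ELF header gives the table's file offset (e_shoff), entry size (e_shentsize) and entry count (e_shnum), and each section's name is an offset into the string table section whose index is e_shstrndx. The sketch assumes a 64-bit little-endian file (as on x86-64) and ignores extended section numbering; read_sections and Section are hypothetical names.

```python
import struct
from typing import List, NamedTuple


class Section(NamedTuple):
    name: str
    sh_type: int
    flags: int
    offset: int
    size: int


# Elf64_Ehdr fields that follow the 16-byte e_ident array (little-endian)
EHDR_FMT = "<HHIQQQIHHHHHH"
# Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
SHDR_FMT = "<IIQQQQIIQQ"


def read_sections(data: bytes) -> List[Section]:
    """List the sections of an ELF64 little-endian file via its section header table."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian files")
    (_, _, _, _, _, e_shoff, _, _, _, _,
     e_shentsize, e_shnum, e_shstrndx) = struct.unpack_from(EHDR_FMT, data, 16)
    if e_shnum == 0:
        return []  # no section header table (extended numbering is not handled here)
    # One header per section, e_shentsize bytes apart, starting at e_shoff
    headers = [struct.unpack_from(SHDR_FMT, data, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]
    # Section names are offsets into the string table section at index e_shstrndx
    strtab_off = headers[e_shstrndx][4]

    def name_at(off: int) -> str:
        start = strtab_off + off
        return data[start:data.index(b"\0", start)].decode()

    return [Section(name_at(h[0]), h[1], h[2], h[4], h[5]) for h in headers]
```

Run over the bytes of sleep.o, it should list the same section names that readelf -S prints, in the same order.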

When the file is loaded into memory for execution, it is organized in segments. Each segment corresponds to an entry in the program header table of the ELF file, which is used to build the process image of the executable.
For example, what we usually call the code segment and the data segment are segments; the linker gathers the sections of the object files into the various segments of the executable.
The contents of the .text section go into the code segment, while the contents of .data, .bss, and similar sections go into the data segment.
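The program header table is located the same way as the section header table, through the e_phoff, e_phentsize and e_phnum fields of the ELF header. A minimal sketch, again assuming a 64-bit little-endian file; read_segments and Segment are hypothetical names, and only a few Elf64_Phdr fields are kept.

```python
import struct
from typing import List, NamedTuple


class Segment(NamedTuple):
    p_type: int
    flags: int
    offset: int
    vaddr: int
    filesz: int
    memsz: int


# Elf64_Ehdr fields that follow the 16-byte e_ident array (little-endian)
EHDR_FMT = "<HHIQQQIHHHHHH"
# Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align
PHDR_FMT = "<IIQQQQQQ"


def read_segments(data: bytes) -> List[Segment]:
    """List the program header entries (segments) of an ELF64 little-endian file."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian files")
    fields = struct.unpack_from(EHDR_FMT, data, 16)
    e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]
    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from(PHDR_FMT, data, e_phoff + i * e_phentsize)
        segments.append(Segment(p_type, p_flags, p_offset, p_vaddr, p_filesz, p_memsz))
    return segments
```

A segment's memsz can be larger than its filesz; the extra bytes are zero-filled when the segment is loaded, which is how .bss occupies memory without occupying space in the file.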

An object file does not need a program header, and the object files generated by gcc do not contain one.
A useful tool for parsing ELF files is readelf. On my local machine, running readelf -S sleep.o on the object file sleep.o produces the following output:

There are 12 section headers, starting at offset 0x270:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       0000000000000015  0000000000000000  AX       0     0     1
  [ 2] .rela.text        RELA             0000000000000000  000001e0
       0000000000000018  0000000000000018   I       9     1     8
  [ 3] .data             PROGBITS         0000000000000000  00000055
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .bss              NOBITS           0000000000000000  00000055
       0000000000000000  0000000000000000  WA       0     0     1
  ... ... ... ...
  [11] .shstrtab         STRTAB           0000000000000000  00000210
       0000000000000059  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

readelf -S displays the section information of a file. sleep.o has 12 sections in total, and some of them are omitted above.
As you can see, besides the familiar .text, .data, and .bss, there are other sections, which we will cover when we discuss sections in detail later.
The Flags column also tells us something about each section. For example, the flags of .text are AX, meaning the section occupies memory at run time (A, alloc) and is executable (X); this section is code.
The flags of .data and .bss are WA, meaning they are writable and occupy memory at run time, which are the characteristics of data segments.
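The Flags letters are simply the bits of the sh_flags field in each section header. A small decoding sketch, with bit values taken from the ELF specification (SHF_WRITE = 0x1, SHF_ALLOC = 0x2, SHF_EXECINSTR = 0x4, and so on); section_flags_str is a hypothetical name, and the OS- and processor-specific letters in readelf's key (o, p, E, l, x) are not handled.

```python
# sh_flags bits and the letters readelf -S prints for them, lowest bit first
SECTION_FLAG_LETTERS = [
    (0x1, "W"),    # SHF_WRITE
    (0x2, "A"),    # SHF_ALLOC
    (0x4, "X"),    # SHF_EXECINSTR
    (0x10, "M"),   # SHF_MERGE
    (0x20, "S"),   # SHF_STRINGS
    (0x40, "I"),   # SHF_INFO_LINK
    (0x80, "L"),   # SHF_LINK_ORDER
    (0x100, "O"),  # SHF_OS_NONCONFORMING
    (0x200, "G"),  # SHF_GROUP
    (0x400, "T"),  # SHF_TLS
    (0x800, "C"),  # SHF_COMPRESSED
]


def section_flags_str(sh_flags: int) -> str:
    """Render the generic sh_flags bits as readelf-style letters, e.g. 0x6 -> "AX"."""
    return "".join(letter for bit, letter in SECTION_FLAG_LETTERS if sh_flags & bit)
```

section_flags_str(0x6) returns "AX" as shown for .text, section_flags_str(0x3) returns "WA" as for .data and .bss, and section_flags_str(0x40) returns "I" as for .rela.text.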

readelf -l displays the program header information of a file. For the object file, running readelf -l sleep.o prints: There are no program headers in this file.
Program headers correspond one-to-one to the segments in a file. Since an object file has no segments, it has no need for a program header table.
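Whether a program header table exists is recorded in the ELF header itself: e_phoff (byte offset 32 in a 64-bit header) gives its location and e_phnum (byte offset 56) the number of entries, and both are 0 in the object files gcc produces. A minimal sketch; has_program_headers is a hypothetical name, and a 64-bit little-endian file is assumed.

```python
import struct


def has_program_headers(header: bytes) -> bool:
    """Report whether an ELF64 little-endian header declares a program header table."""
    if header[:4] != b"\x7fELF" or header[4] != 2 or header[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian files")
    (e_phoff,) = struct.unpack_from("<Q", header, 32)  # file offset of the table
    (e_phnum,) = struct.unpack_from("<H", header, 56)  # number of entries
    return e_phoff != 0 and e_phnum != 0
```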

The contents of an executable file are organized into segments, so a program header table is required.
A section header table is not needed at run time, although executables usually still carry one, even after strip (which removes symbol and debugging information rather than the section header table itself).
Running readelf -l sleep on the local executable sleep gives the following output:

Elf file type is DYN (Shared object file)
Entry point 0x1040
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000560 0x0000000000000560  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x00000000000001d5 0x00000000000001d5  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x0000000000000110 0x0000000000000110  R      0x1000
  LOAD           0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000248 0x0000000000000250  RW     0x1000
  DYNAMIC        0x0000000000002df8 0x0000000000003df8 0x0000000000003df8
                 0x00000000000001e0 0x00000000000001e0  RW     0x8
  NOTE           0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x0000000000002004 0x0000000000002004 0x0000000000002004
                 0x0000000000000034 0x0000000000000034  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000218 0x0000000000000218  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .dynamic .got .got.plt .data .bss
   06     .dynamic
   07     .note.ABI-tag .note.gnu.build-id
   08     .eh_frame_hdr
   09
   10     .init_array .fini_array .dynamic .got

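As shown in the output, the file has 11 segments. Only segments of type LOAD are actually mapped into memory; the other entries mostly describe parts of those loaded regions or give hints to the loader (for example, INTERP names the dynamic linker).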
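Picking out the LOAD entries and rendering their flags the way readelf -l does takes only a few lines. In the sketch below, the constants come from the ELF specification (PT_LOAD = 1, PF_X = 0x1, PF_W = 0x2, PF_R = 0x4), while the function names and the (p_type, p_flags, p_vaddr, p_memsz) tuple layout are hypothetical simplifications.

```python
PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def segment_flags_str(p_flags: int) -> str:
    """Three-column R/W/E rendering of p_flags, e.g. 0x5 -> "R E"."""
    return (("R" if p_flags & PF_R else " ")
            + ("W" if p_flags & PF_W else " ")
            + ("E" if p_flags & PF_X else " "))


def load_segments(phdrs):
    """Keep only PT_LOAD entries.

    phdrs: iterable of (p_type, p_flags, p_vaddr, p_memsz) tuples (a simplified layout).
    Returns (p_vaddr, p_memsz, flags_string) for each loadable segment.
    """
    return [(vaddr, memsz, segment_flags_str(flags))
            for p_type, flags, vaddr, memsz in phdrs
            if p_type == PT_LOAD]
```

Fed the eleven entries of the table above, load_segments keeps the four LOAD segments, with flags "R  ", "R E", "R  " and "RW ".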
In addition to the segment information, readelf also prints which sections each segment contains. For example, the second LOAD segment is marked R (readable) and E (executable) and has index 03; its line in the mapping is:
03 .init .plt .text .fini
It includes .text, so this is the code segment.
Another example is the third LOAD segment, index 04, whose only flag is R: it is readable but not executable. It contains the sections .rodata .eh_frame_hdr .eh_frame, where .rodata holds read-only data such as the string constants used by the program.
The last LOAD segment, index 05, has flags RW (read/write) and contains the sections .init_array .fini_array .dynamic .got .got.plt .data .bss. Since .data and .bss are included, this is clearly the data segment.
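The Section to Segment mapping can be reproduced approximately by comparing address ranges: an allocated section belongs to a segment when [sh_addr, sh_addr + sh_size) lies inside [p_vaddr, p_vaddr + p_memsz). The sketch below implements only this simplified rule; readelf's real check is more involved (it also compares file offsets and special-cases TLS and zero-sized sections). The function name and the (name, sh_flags, sh_addr, sh_size) tuple layout are hypothetical.

```python
SHF_ALLOC = 0x2


def sections_in_segment(seg_vaddr, seg_memsz, sections):
    """Names of allocated sections lying entirely inside [seg_vaddr, seg_vaddr + seg_memsz).

    sections: iterable of (name, sh_flags, sh_addr, sh_size) tuples (a simplified layout).
    Simplified rule only: file offsets, TLS and zero-sized sections are not considered.
    """
    seg_end = seg_vaddr + seg_memsz
    names = []
    for name, sh_flags, addr, size in sections:
        if not sh_flags & SHF_ALLOC:
            continue  # not loaded at run time, so it cannot belong to a segment
        if seg_vaddr <= addr and addr + size <= seg_end:
            names.append(name)
    return names
```

Non-allocated sections such as .symtab or .shstrtab have no run-time address, which is why they never appear in the mapping.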

Let's stop here for today. The upcoming posts are organized as follows:

    • First, the ELF header, because the first few dozen bytes of the file hold it. This data structure contains a lot of information, including where the program header table and the section header table are located in the file.
    • Next, how to interpret the section header table and how section data is organized.
    • Then, the program header table and how segment data is organized, including how sections are grouped into segments.
    • Finally, how the loader builds a process image when it loads the program into memory.
      Stay tuned.

Added by fncuis on Fri, 04 Feb 2022 12:33:00 +0200