PE header parsing (PE header only)

I am still learning this topic. If anything here is wrong, please point it out.

Contents

Preface

Process

Summary

Preface

PE (Portable Executable) is the file format that executable files on Windows, such as EXEs and DLLs, must follow.

Below, I explain the meaning of some of the members of the PE header.

DOS header

The size is fixed at 64 bytes

struct _IMAGE_DOS_HEADER
{
	// e_magic: the "MZ" signature. On disk the bytes are 4D 5A; read as a little-endian WORD the
	// value is 0x5A4D. A PE file must start with it, but plain DOS programs do too, so the "PE\0\0"
	// signature checked later is what actually identifies a PE file.
	WORD  e_magic;       // 0x00
	WORD  e_cblp;        // 0x02
	WORD  e_cp;          // 0x04
	WORD  e_crlc;        // 0x06
	WORD  e_cparhdr;     // 0x08
	WORD  e_minalloc;    // 0x0A
	WORD  e_maxalloc;    // 0x0C
	WORD  e_ss;          // 0x0E
	WORD  e_sp;          // 0x10
	WORD  e_csum;        // 0x12
	WORD  e_ip;          // 0x14
	WORD  e_cs;          // 0x16
	WORD  e_lfarlc;      // 0x18
	WORD  e_ovno;        // 0x1A
	WORD  e_res[4];      // 0x1C
	WORD  e_oemid;       // 0x24
	WORD  e_oeminfo;     // 0x26
	WORD  e_res2[10];    // 0x28
	// e_lfanew: file offset of the NT headers, i.e. of the "PE\0\0" signature. The NT headers do not
	// directly follow the DOS header: the gap holds the DOS stub (a small DOS program that prints
	// "This program cannot be run in DOS mode") and sometimes other linker-written data. Its size
	// varies, so the offset of the NT headers has to be stored here.
	DWORD e_lfanew;      // 0x3C
};
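The two members that matter for parsing are e_magic and e_lfanew. The following is a minimal sketch of my own (not the code used later in this post), assuming the whole file has already been read into a `std::vector<uint8_t>`; the helper names `ReadLe16`, `ReadLe32` and `FindNtHeaders` are hypothetical. Values are assembled byte by byte because PE fields are little-endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: read a little-endian WORD from the buffer.
static uint16_t ReadLe16(const std::vector<uint8_t>& buf, size_t off)
{
	return static_cast<uint16_t>(buf[off] | (buf[off + 1] << 8));
}

// Hypothetical helper: read a little-endian DWORD from the buffer.
static uint32_t ReadLe32(const std::vector<uint8_t>& buf, size_t off)
{
	return static_cast<uint32_t>(buf[off]) |
	       (static_cast<uint32_t>(buf[off + 1]) << 8) |
	       (static_cast<uint32_t>(buf[off + 2]) << 16) |
	       (static_cast<uint32_t>(buf[off + 3]) << 24);
}

// Returns the file offset of the NT headers, or -1 if the DOS header is not usable.
static long FindNtHeaders(const std::vector<uint8_t>& buf)
{
	const size_t kDosHeaderSize = 64;   // fixed size of the DOS header
	const size_t kLfanewOffset = 0x3C;  // offset of e_lfanew inside it
	if (buf.size() < kDosHeaderSize)
		return -1;
	if (ReadLe16(buf, 0) != 0x5A4D)     // "MZ" read as a little-endian WORD
		return -1;
	uint32_t lfanew = ReadLe32(buf, kLfanewOffset);
	if (lfanew > buf.size() - 4)        // must leave room for the 4-byte "PE\0\0" signature
		return -1;
	return static_cast<long>(lfanew);
}
```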

NT headers

struct _IMAGE_NT_HEADERS
{
	DWORD Signature;                        // 0x00  "PE\0\0" (0x00004550 as a little-endian DWORD); e_lfanew points here
	_IMAGE_FILE_HEADER FileHeader;          // 0x04  standard PE header
	_IMAGE_OPTIONAL_HEADER OptionalHeader;  // 0x18  optional PE header
};
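Because the standard PE header is always 20 bytes but the optional header's size comes from SizeOfOptionalHeader, the offsets of the parts of the NT headers can be computed as in this sketch. `NtLayout` and `ComputeNtLayout` are hypothetical names; the arithmetic only restates the layout above, plus the fact that the section table directly follows the optional header.

```cpp
#include <cstdint>

// Hypothetical helper type: file offsets of the parts of the NT headers.
struct NtLayout
{
	uint32_t signature;       // "PE\0\0"
	uint32_t fileHeader;      // standard PE header, always 20 bytes
	uint32_t optionalHeader;  // optional PE header, SizeOfOptionalHeader bytes
	uint32_t sectionTable;    // first section header
};

static NtLayout ComputeNtLayout(uint32_t e_lfanew, uint16_t sizeOfOptionalHeader)
{
	const uint32_t kSignatureSize = 4;
	const uint32_t kFileHeaderSize = 20;
	NtLayout l;
	l.signature = e_lfanew;
	l.fileHeader = e_lfanew + kSignatureSize;                  // e_lfanew + 0x04
	l.optionalHeader = l.fileHeader + kFileHeaderSize;         // e_lfanew + 0x18
	l.sectionTable = l.optionalHeader + sizeOfOptionalHeader;  // depends on SizeOfOptionalHeader
	return l;
}
```

For example, with e_lfanew = 0xF0 and the default 32-bit SizeOfOptionalHeader of 0xE0, the optional header starts at 0x108 and the section table at 0x1E8.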

Standard PE header

Its size is fixed at 20 bytes (0x14).

struct _IMAGE_FILE_HEADER
{
	WORD  Machine;               // 0x00  CPU the program can run on: 0x014C = Intel 386 and later, 0x8664 = x64; 0 means the file applies to any machine type
	WORD  NumberOfSections;      // 0x02  number of sections, i.e. entries in the section table that follows the headers
	// TimeDateStamp: when the linker created the file (seconds since 1970-01-01 00:00 UTC).
	// A MAP file produced by the same build (it records the names and addresses of the
	// program's functions) carries the same timestamp, so tools that take a MAP file, such as
	// some packers, can check that the MAP file really belongs to this EXE.
	DWORD TimeDateStamp;         // 0x04
	DWORD PointerToSymbolTable;  // 0x08
	DWORD NumberOfSymbols;       // 0x0C
	WORD  SizeOfOptionalHeader;  // 0x10  size of the optional PE header: 0xE0 by default for 32-bit images, 0xF0 for 64-bit
	WORD  Characteristics;       // 0x12  bit flags; a typical 32-bit EXE has 0x010F, i.e. bits 0, 1, 2, 3 and 8 set
};
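Since Characteristics is a bit field, a value like 0x010F is easiest to read by testing each bit. This sketch uses a subset of the IMAGE_FILE_* constants as they are named in the Windows SDK header winnt.h; the table `kFileFlags` and the function `DecodeCharacteristics` are my own hypothetical names. For 0x010F it reports bits 0, 1, 2, 3 and 8.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// A subset of the IMAGE_FILE_* flags (values as defined in winnt.h).
struct FlagName
{
	uint16_t bit;
	const char* name;
};

static const FlagName kFileFlags[] = {
	{0x0001, "IMAGE_FILE_RELOCS_STRIPPED"},      // bit 0
	{0x0002, "IMAGE_FILE_EXECUTABLE_IMAGE"},     // bit 1
	{0x0004, "IMAGE_FILE_LINE_NUMS_STRIPPED"},   // bit 2
	{0x0008, "IMAGE_FILE_LOCAL_SYMS_STRIPPED"},  // bit 3
	{0x0020, "IMAGE_FILE_LARGE_ADDRESS_AWARE"},  // bit 5
	{0x0100, "IMAGE_FILE_32BIT_MACHINE"},        // bit 8
	{0x0200, "IMAGE_FILE_DEBUG_STRIPPED"},       // bit 9
	{0x2000, "IMAGE_FILE_DLL"},                  // bit 13
};

// Returns the names of the known flags that are set, in table order.
static std::vector<std::string> DecodeCharacteristics(uint16_t characteristics)
{
	std::vector<std::string> names;
	for (const FlagName& f : kFileFlags)
		if (characteristics & f.bit)
			names.push_back(f.name);
	return names;
}
```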

Optional PE header

Its size is not fixed: the actual size is stored in SizeOfOptionalHeader (0xE0 by default for 32-bit images, 0xF0 for 64-bit). The layout below is the 32-bit one (PE32); the 64-bit layout (PE32+) has no BaseOfData and widens ImageBase and the four stack/heap sizes to 8 bytes.

struct _IMAGE_OPTIONAL_HEADER
{
	WORD  Magic;                        // 0x00  file type: 0x10B = 32-bit PE (PE32), 0x20B = 64-bit PE (PE32+)
	BYTE  MajorLinkerVersion;           // 0x02
	BYTE  MinorLinkerVersion;           // 0x03
	DWORD SizeOfCode;                   // 0x04  total size of all code sections, usually a multiple of FileAlignment
	DWORD SizeOfInitializedData;        // 0x08  total size of all initialized-data sections, usually a multiple of FileAlignment
	DWORD SizeOfUninitializedData;      // 0x0C  total size of all uninitialized-data sections, usually a multiple of FileAlignment
	// AddressOfEntryPoint (called OEP here): the program's entry point. It is an RVA, an offset
	// from the image base, so the entry point in memory is ImageBase + AddressOfEntryPoint.
	// A process contains several PE images (the EXE and its DLLs). If an image cannot be loaded
	// at its preferred ImageBase because that range is already occupied, the loader places it
	// elsewhere, and the offset still locates the entry point relative to the new base.
	DWORD AddressOfEntryPoint;          // 0x10
	DWORD BaseOfCode;                   // 0x14  RVA of the start of the code section
	DWORD BaseOfData;                   // 0x18  RVA of the start of the data section (PE32 only)
	DWORD ImageBase;                    // 0x1C  preferred address at which the program is loaded into memory
	DWORD SectionAlignment;             // 0x20  alignment of sections in memory, typically 0x1000
	DWORD FileAlignment;                // 0x24  alignment of sections in the file on disk, typically 0x200
	WORD  MajorOperatingSystemVersion;  // 0x28
	WORD  MinorOperatingSystemVersion;  // 0x2A
	WORD  MajorImageVersion;            // 0x2C
	WORD  MinorImageVersion;            // 0x2E
	WORD  MajorSubsystemVersion;        // 0x30
	WORD  MinorSubsystemVersion;        // 0x32
	DWORD Win32VersionValue;            // 0x34
	DWORD SizeOfImage;                  // 0x38  size of the whole image once mapped into memory; may be larger than needed, but must be a multiple of SectionAlignment
	DWORD SizeOfHeaders;                // 0x3C  DOS header + DOS stub + PE signature + standard PE header + optional PE header + section table, rounded up to a multiple of FileAlignment
	// CheckSum: not a plain running sum. The file is added up as 16-bit words with the carry
	// folded back in, skipping this field, and the file length is added at the end. The loader
	// only verifies it for drivers, DLLs loaded at boot time, and DLLs loaded into critical
	// system processes.
	DWORD CheckSum;                     // 0x40
	WORD  Subsystem;                    // 0x44  e.g. 2 = Windows GUI, 3 = Windows console
	WORD  DllCharacteristics;           // 0x46
	DWORD SizeOfStackReserve;           // 0x48  stack size reserved at startup
	DWORD SizeOfStackCommit;            // 0x4C  stack size actually committed at startup
	DWORD SizeOfHeapReserve;            // 0x50  heap size reserved at startup
	DWORD SizeOfHeapCommit;             // 0x54  heap size actually committed at startup
	DWORD LoaderFlags;                  // 0x58
	DWORD NumberOfRvaAndSizes;          // 0x5C  number of DataDirectory entries in use
	_IMAGE_DATA_DIRECTORY DataDirectory[16];  // 0x60
};

When you open an EXE in a hex editor, you see it exactly as it is stored on disk. Copying those bytes into memory unchanged is not enough to run it: when FileAlignment and SectionAlignment differ, the loader has to "stretch" the image so that each section starts at its memory-aligned address. For example, a code section with 0x389 bytes of content occupies 0x400 bytes in the file (rounded up to FileAlignment 0x200), but 0x1000 bytes once loaded (rounded up to SectionAlignment 0x1000). If the two alignments are equal, no stretching is needed.
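The arithmetic behind the "stretching" example, the entry-point calculation, and the checksum can be sketched as follows. `AlignUp`, `EntryPointVa` and `PeChecksum` are hypothetical helpers of my own. The checksum follows the commonly documented algorithm (Windows implements it in `CheckSumMappedFile` in imagehlp.dll); treat it as a sketch rather than a verified reimplementation, and note that it assumes the CheckSum field starts at an even file offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round value up to the next multiple of alignment (a power of two, as
// FileAlignment and SectionAlignment are).
static uint32_t AlignUp(uint32_t value, uint32_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (value + alignment - 1) & ~(alignment - 1);
}

// Entry point in memory once the image is mapped at `base` (ImageBase if the
// preferred address was free, otherwise wherever the loader put it).
static uint32_t EntryPointVa(uint32_t base, uint32_t addressOfEntryPoint)
{
	return base + addressOfEntryPoint;  // AddressOfEntryPoint is an RVA
}

// Sketch of the PE checksum: add the file as 16-bit little-endian words with
// end-around carry, treating the 4-byte CheckSum field as zero, then add the
// file length. Assumes checksumOffset is even.
static uint32_t PeChecksum(const uint8_t* data, size_t size, size_t checksumOffset)
{
	uint32_t sum = 0;
	for (size_t i = 0; i < size; i += 2)
	{
		if (i == checksumOffset || i == checksumOffset + 2)
			continue;                           // skip the CheckSum field itself
		uint32_t word = data[i];
		if (i + 1 < size)                       // an odd trailing byte counts as a low byte
			word |= static_cast<uint32_t>(data[i + 1]) << 8;
		sum += word;
		sum = (sum & 0xFFFF) + (sum >> 16);     // fold the carry back in; sum stays <= 0xFFFF
	}
	return sum + static_cast<uint32_t>(size);
}
```

With the numbers from the stretching example, `AlignUp(0x389, 0x200)` gives 0x400 (the size in the file) and `AlignUp(0x389, 0x1000)` gives 0x1000 (the size in memory).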

Process

Below is the code I wrote (C-style C++) to parse the PE header, and only the header. The method is clumsy; if you know a better one, please let me know. Thank you.

My approach is to store the name and size of each PE header member in arrays, then walk through the file and read the members one after another, matching each value with its name.

#Analyze_Of_PE_Header.h

#pragma once
typedef char BYTE;
typedef short WORD;
typedef int DWORD;
//DOS header information
extern const BYTE* _IMAGE_DOS_HEADER[19];
//Size of each member of DOS header
extern DWORD DOS_LENGTH[21];

//Standard PE header information
extern const BYTE* _IMAGE_FILE_HEADER[7];
//Size of each member of standard PE header
extern DWORD FILE_PE_LENGTH[7];

//Optional PE header information
extern const BYTE* _IMAGE_OPTIONAL_HEADER[30];
//Size of each member of Optional PE header
extern DWORD _OPTIONAL_HEADER_LENGTH[30];

char* ReadFile(char*);
bool Analyse_PE_Head(char*);

#Analyze_Of_PE_Header.cpp

#define _CRT_SECURE_NO_WARNINGS
#include "Analyze_Of_PE_Header.h"
#include <stdio.h>
#include <malloc.h>
#include <string.h>

//DOS header information
const BYTE* _IMAGE_DOS_HEADER[19]
{
	"e_magic",
	"e_cblp",
	"e_cp",
	"e_crlc",
	"e_cparhdr",
	 "e_minalloc",
	"e_maxalloc",
	 "e_ss",
	 "e_sp",
	"e_csum",
	"e_ip",
	"e_cs",
	"e_lfarlc",
	"e_ovno",
	"e_res[4]",
	"e_oemid",
	"e_oeminfo",
	"e_res2[10]",
	"e_lfanew"
};

DWORD DOS_LENGTH[21]
{
	2,2,2,2,2,2,2,2,2,2,2,2,2,2,8,2,2,20,4
};


//Standard PE header information
const BYTE* _IMAGE_FILE_HEADER[7]
{
	"Machine",
	"NumberOfSections",
	"TimeDateStamp",
	"PointerToSymbolTable",
	"NumberOfSymbols",
	"SizeOfOptionalHeader",
	"Characteristics"
};

DWORD FILE_PE_LENGTH[7]
{
	2,2,4,4,4,2,2
};

//Optional PE header information
const BYTE* _IMAGE_OPTIONAL_HEADER[30]
{
	"Magic",
	"MajorLinkerVersion",
	"MinorLinkerVersion",
	"SizeOfCode",
	"SizeOfInitializedData",
	"SizeOfUninitializedData",
	"AddressOfEntryPoint",
	"BaseOfCode",
	"BaseOfData",
	"ImageBase",
	"SectionAlignment",
	"FileAlignment",
	"MajorOperatingSystemVersion",
	"MinorOperatingSystemVersion",
	"MajorImageVersion",
	"MinorImageVersion",
	"MajorSubsystemVersion",
	"MinorSubsystemVersion",
	"Win32VersionValue",
	"SizeOfImage",
	"SizeOfHeaders",
	"CheckSum",
	"Subsystem",
	"DllCharacteristics",
	"SizeOfStackReserve",
	"SizeOfStackCommit",
	"SizeOfHeapReserve",
	"SizeOfHeapCommit",
	"LoaderFlags",
	"NumberOfRvaAndSizes"
};
DWORD _OPTIONAL_HEADER_LENGTH[30]
{
	2,1,1,4,4,
	4,4,4,4,4,
	4,4,2,2,2,
	2,2,2,4,4,
	4,4,2,2,4,
	4,4,4,4,4
};
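// Reads the whole file into a malloc'd buffer and stores its length in *size.
// The caller must free() the buffer.
static char* ReadFileWithSize(const char* p, long* size)
{
	FILE* fp = fopen(p, "rb");
	if (fp == NULL)
	{
		printf("File open failed\n");
		return NULL;
	}
	fseek(fp, 0, SEEK_END);
	long len = ftell(fp);
	if (len <= 0)
	{
		printf("File is empty or its size could not be read\n");
		fclose(fp);
		return NULL;
	}
	char* buf = (char*)malloc(len);
	if (buf == NULL)
	{
		printf("memory allocation failed\n");
		fclose(fp);
		return NULL;
	}
	fseek(fp, 0, SEEK_SET);
	if (fread(buf, 1, len, fp) != (size_t)len)
	{
		printf("File read failed\n");
		free(buf);
		fclose(fp);
		return NULL;
	}
	fclose(fp);
	*size = len;
	return buf;
}

char* ReadFile(char* p)
{
	long size = 0;
	return ReadFileWithSize(p, &size);
}

// Reads a 1-, 2- or 4-byte little-endian value. Going byte by byte avoids the sign
// extension of the signed BYTE/WORD typedefs and unaligned pointer casts.
static unsigned int ReadValue(const char* Buf, int size)
{
	unsigned int value = 0;
	for (int k = 0; k < size; k++)
		value |= (unsigned int)(unsigned char)Buf[k] << (8 * k);
	return value;
}

// Prints `count` members described by the name and size tables, starting at Buf.
static void PrintMembers(const char* Buf, const BYTE** names, const DWORD* sizes, int count)
{
	for (int i = 0; i < count; i++)
	{
		printf("%s: ", names[i]);
		if (sizes[i] <= 4)
		{
			printf("%x\n", ReadValue(Buf, sizes[i]));
		}
		else
		{
			// e_res[4] and e_res2[10]: print each WORD of the array
			for (int j = 0; j < (int)sizes[i]; j += 2)
				printf("%x ", ReadValue(Buf + j, 2));
			printf("\n");
		}
		Buf += sizes[i];
	}
}

bool Analyse_PE_Head(char* p)
{
	long FileSize = 0;
	char* Buf = ReadFileWithSize(p, &FileSize);
	if (Buf == NULL)
	{
		return false;
	}
	//DOS header: 64 bytes, starting with "MZ" (0x5A4D as a little-endian WORD)
	if (FileSize < 64 || ReadValue(Buf, 2) != 0x5A4D)
	{
		free(Buf);
		return false;
	}
	printf("DOS head:\n");
	PrintMembers(Buf, _IMAGE_DOS_HEADER, DOS_LENGTH, 19);
	printf("\n-----------------------------------------------\n");

	//PE head jump: e_lfanew (offset 0x3C) is the file offset of the NT headers
	unsigned long Turn = ReadValue(Buf + 0x3C, 4);
	//Signature (4) + standard PE header (20) + the optional header fields printed below (0x60)
	const unsigned long Needed = 4 + 20 + 0x60;
	if (Turn > (unsigned long)FileSize || (unsigned long)FileSize - Turn < Needed)
	{
		printf("e_lfanew points outside the file\n");
		free(Buf);
		return false;
	}
	const char* Nt = Buf + Turn;
	//NT header start: "PE\0\0" is 0x4550 as a little-endian DWORD
	if (ReadValue(Nt, 4) != 0x4550)
	{
		free(Buf);
		return false;
	}
	printf("PE identification:%x\n", ReadValue(Nt, 4));

	//Standard PE head
	printf("standard PE head\n");
	PrintMembers(Nt + 4, _IMAGE_FILE_HEADER, FILE_PE_LENGTH, 7);
	printf("\n-----------------------------------------------\n");

	//Optional PE head: the size table only matches the 32-bit layout (Magic 0x10B)
	const char* Opt = Nt + 4 + 20;
	if (ReadValue(Opt, 2) != 0x10B)
	{
		printf("Magic is %x, not 0x10B: the member table only describes 32-bit (PE32) images\n", ReadValue(Opt, 2));
		free(Buf);
		return false;
	}
	printf("Optional PE head\n");
	PrintMembers(Opt, _IMAGE_OPTIONAL_HEADER, _OPTIONAL_HEADER_LENGTH, 30);
	printf("\n-----------------------------------------------\n");
	free(Buf);
	return true;
}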

#main.cpp

#include <stdio.h>
#include "Analyze_Of_PE_Header.h"

int main(void)
{
	char path[] = "C:\\Program Files (x86)\\NetSarang\\Xshell 7\\Xshell.exe";
	bool result = Analyse_PE_Head(path);
	// result is a bool, so test it directly instead of comparing it with NULL
	if (!result)
	{
		printf("Parsing failed\n");
		return 1;
	}

	return 0;
}

This is the result of the run

Summary

I also had the idea of defining a separate struct for each header, but I ran into trouble finding the members of each struct so that I could print them by name. I could not think of a way to implement it, so I used the array method instead.
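The struct-based idea can work if each struct comes with a table of its members that the compiler fills in. Below is a sketch of that, assuming C++11 and a compiler that supports `#pragma pack` (MSVC, GCC and Clang do). The struct names, the `MEMBER` macro and the helper functions are my own hypothetical names; on Windows, `<windows.h>` (winnt.h) already defines `IMAGE_DOS_HEADER`, `IMAGE_FILE_HEADER` and `IMAGE_OPTIONAL_HEADER32` with these layouts. `offsetof` and `sizeof` replace the hand-maintained length arrays, so names, offsets and sizes cannot drift apart. Only the standard PE header gets a member table here; the DOS header struct just shows the layout being checked by `static_assert`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical structs mirroring the on-disk layout. pack(1) removes padding so
// the struct bytes match the file bytes exactly.
#pragma pack(push, 1)
struct DosHeader
{
	uint16_t e_magic, e_cblp, e_cp, e_crlc, e_cparhdr, e_minalloc, e_maxalloc;
	uint16_t e_ss, e_sp, e_csum, e_ip, e_cs, e_lfarlc, e_ovno;
	uint16_t e_res[4];
	uint16_t e_oemid, e_oeminfo;
	uint16_t e_res2[10];
	uint32_t e_lfanew;
};

struct FileHeader
{
	uint16_t Machine;
	uint16_t NumberOfSections;
	uint32_t TimeDateStamp;
	uint32_t PointerToSymbolTable;
	uint32_t NumberOfSymbols;
	uint16_t SizeOfOptionalHeader;
	uint16_t Characteristics;
};
#pragma pack(pop)

// The compiler checks the sizes quoted in the text.
static_assert(sizeof(DosHeader) == 64, "DOS header must be 64 bytes");
static_assert(sizeof(FileHeader) == 20, "standard PE header must be 20 bytes");

// One entry per struct member: its name, offset and size, all computed by the compiler.
struct Member
{
	const char* name;
	size_t offset;
	size_t size;
};

#define MEMBER(type, field) { #field, offsetof(type, field), sizeof(type::field) }

static const Member kFileHeaderMembers[] = {
	MEMBER(FileHeader, Machine),
	MEMBER(FileHeader, NumberOfSections),
	MEMBER(FileHeader, TimeDateStamp),
	MEMBER(FileHeader, PointerToSymbolTable),
	MEMBER(FileHeader, NumberOfSymbols),
	MEMBER(FileHeader, SizeOfOptionalHeader),
	MEMBER(FileHeader, Characteristics),
};

// Copies the standard PE header out of the file buffer; fails if it would run past the end.
static bool ParseFileHeader(const unsigned char* file, size_t fileSize, size_t offset, FileHeader* out)
{
	if (offset > fileSize || fileSize - offset < sizeof(FileHeader))
		return false;
	memcpy(out, file + offset, sizeof(FileHeader));
	return true;
}

// Reads one scalar member (up to 4 bytes) as a little-endian value, as the file stores it.
static uint32_t ReadMember(const void* object, const Member& m)
{
	const unsigned char* p = static_cast<const unsigned char*>(object) + m.offset;
	uint32_t value = 0;
	for (size_t i = 0; i < m.size && i < 4; i++)
		value |= static_cast<uint32_t>(p[i]) << (8 * i);
	return value;
}

// Prints every member listed in a table, in the same "name: hex" format as the array version.
template <size_t N>
static void PrintMembers(const void* object, const Member (&members)[N])
{
	for (const Member& m : members)
		printf("%s: %x\n", m.name, static_cast<unsigned>(ReadMember(object, m)));
}
```

After `ParseFileHeader(buf, size, e_lfanew + 4, &fh)`, a call such as `PrintMembers(&fh, kFileHeaderMembers)` prints the whole header, and on a little-endian machine such as x86/x64 individual fields can also be read directly (for example `fh.NumberOfSections`).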

Keywords: C++

Added by alecodonnell on Tue, 30 Nov 2021 17:45:01 +0200