Using the objdump disassembly tool and interpreting a disassembly file

What is disassembly

As the name suggests, disassembly is the reverse of assembly: it translates binary machine code back into assembly code. arm-linux-objdump is a tool in the cross-compilation toolchain made specifically for this purpose. It disassembles binary code into assembly code so that we can read it.

Why disassemble

1. Reverse engineering. Disassemble an executable to get its assembly code, then infer the logic of the whole program from that code. This is not something most people can do: a large program written in assembly is very hard to read, let alone use to reconstruct someone else's code logic.
2. Debugging. Disassembly helps us check whether the generated executable is what we expect, which is especially useful when learning about linker scripts and link addresses.
3. Learning. Compiling and linking C source into an executable and then disassembling it shows how C maps to assembly language, which contributes to a deeper understanding of C.

Generation and interpretation of disassembly file

Generation of disassembly file:

led.bin: start.o 
	arm-linux-ld -Ttext 0x0 -o led.elf $^
	arm-linux-objcopy -O binary led.elf led.bin
	arm-linux-objdump -D led.elf > led_elf.dis
	gcc mkv210_image.c -o mkx210
	./mkx210 led.bin 210.bin
	
%.o : %.S
	arm-linux-gcc -o $@ $< -c

%.o : %.c
	arm-linux-gcc -o $@ $< -c 

clean:
	rm *.o *.elf *.bin *.dis mkx210 -f

The above is a simple Makefile. It compiles the .S and .c source files into .o object files, then links the .o files into an ELF executable. The command arm-linux-objdump -D led.elf > led_elf.dis disassembles led.elf and writes the result to led_elf.dis.

Source file: start.S is an assembly file

// .globl makes the following symbol global (visible to the linker), similar to a global symbol in C
.globl _start

_start:
	// Set bit[12:23] of GPJ0CON to configure pins GPJ0_3/4/5 as outputs
	ldr r1, =0xE0200240 					
	ldr r0, =0x00111000
	str r0, [r1]

	mov r2, #0x1000

	// Set GPD0_1 to output mode
	ldr r1, =0xE02000A0 					
	ldr r0, =0x00000010
	str r0, [r1]	
	
led_blink:
	// Clear bit[3:5] of GPJ0DAT so that pins GPJ0_3/4/5 output low level, LEDs on
	ldr r1, =0xE0200244 					
	mov r0, #0
	str r0, [r1]
	
	ldr r1, =0xE02000A4					
	mov r0, #0
	str r0, [r1]

	// Delay
	bl delay							

	// Set bit[3:5] of GPJ0DAT so that pins GPJ0_3/4/5 output high level, LEDs off
	ldr r1, =0xE0200244 					
	mov r0, #0x38
	str r0, [r1]
	
	ldr r1, =0xE02000A4					
	mov r0, #0x2
	str r0, [r1]

	// Delay
	bl delay	

	sub r2, r2, #1
	cmp r2,#0
	bne led_blink


halt:
	b halt


delay:
	mov r0, #0x900000
delay_loop:
	cmp r0, #0
	sub r0, r0, #1
	bne delay_loop
	mov pc, lr

start.S is the assembly program used to blink the LEDs when learning the S5PV210 development board. It consists of four parts: startup (GPIO setup), LED blinking, a delay routine, and an infinite loop. Here we do not care about what it does in detail; the focus is on comparing it with the file produced by disassembly.

Disassembly file obtained: led_elf.dis

led.elf:     file format elf32-littlearm

Disassembly of section .text:
// Column 1: address   Column 2: machine code   Column 3: disassembled instruction
00000000 <_start>:
   0:	e59f1070 	ldr	r1, [pc, #112]	; 78 <delay_loop+0x10>
   4:	e59f0070 	ldr	r0, [pc, #112]	; 7c <delay_loop+0x14>
   8:	e5810000 	str	r0, [r1]
   c:	e3a02a01 	mov	r2, #4096	; 0x1000
  10:	e59f1068 	ldr	r1, [pc, #104]	; 80 <delay_loop+0x18>
  14:	e3a00010 	mov	r0, #16
  18:	e5810000 	str	r0, [r1]

0000001c <led_blink>:
  1c:	e59f1060 	ldr	r1, [pc, #96]	; 84 <delay_loop+0x1c>
  20:	e3a00000 	mov	r0, #0
  24:	e5810000 	str	r0, [r1]
  28:	e59f1058 	ldr	r1, [pc, #88]	; 88 <delay_loop+0x20>
  2c:	e3a00000 	mov	r0, #0
  30:	e5810000 	str	r0, [r1]
  34:	eb00000a 	bl	64 <delay>
  38:	e59f1044 	ldr	r1, [pc, #68]	; 84 <delay_loop+0x1c>
  3c:	e3a00038 	mov	r0, #56	; 0x38
  40:	e5810000 	str	r0, [r1]
  44:	e59f103c 	ldr	r1, [pc, #60]	; 88 <delay_loop+0x20>
  48:	e3a00002 	mov	r0, #2
  4c:	e5810000 	str	r0, [r1]
  50:	eb000003 	bl	64 <delay>
  54:	e2422001 	sub	r2, r2, #1
  58:	e3520000 	cmp	r2, #0
  5c:	1affffee 	bne	1c <led_blink>

00000060 <halt>:
  60:	eafffffe 	b	60 <halt>

00000064 <delay>:
  64:	e3a00609 	mov	r0, #9437184	; 0x900000

00000068 <delay_loop>:
  68:	e3500000 	cmp	r0, #0
  6c:	e2400001 	sub	r0, r0, #1
  70:	1afffffc 	bne	68 <delay_loop>
  74:	e1a0f00e 	mov	pc, lr
  78:	e0200240 	eor	r0, r0, r0, asr #4
  7c:	00111000 	andseq	r1, r1, r0
  80:	e02000a0 	eor	r0, r0, r0, lsr #1
  84:	e0200244 	eor	r0, r0, r4, asr #4
  88:	e02000a4 	eor	r0, r0, r4, lsr #1

Disassembly of section .ARM.attributes:

00000000 <.ARM.attributes>:
   0:	00001a41 	andeq	r1, r0, r1, asr #20
   4:	61656100 	cmnvs	r5, r0, lsl #2
   8:	01006962 	tsteq	r0, r2, ror #18
   c:	00000010 	andeq	r0, r0, r0, lsl r0
  10:	45543505 	ldrbmi	r3, [r4, #-1285]	; 0x505
  14:	08040600 	stmdaeq	r4, {r9, sl}
  18:	Address 0x00000018 is out of bounds.


Analysis:
1. The first line, led.elf: file format elf32-littlearm, indicates that this disassembly was generated from led.elf and that the program is a 32-bit little-endian ARM ELF file.
2. 00000000 <_start>: the leading 00000000 is the address of the label, and <_start> is the label itself, corresponding to the _start label in start.S. A label is much like a function name in C: in C, a function name also stands for the starting address of the function, which we can confirm here. The labels in the disassembly come from the assembly file, which makes it easy to match up corresponding parts of the disassembly file and the assembly file.
3. Each instruction line of the disassembly file has three columns: the instruction's address, its machine code, and the assembly instruction that the machine code disassembles to.
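
The three-column layout can be illustrated with a short Python sketch that splits one line of led_elf.dis into its parts. The function name parse_disassembly_line and the regular expression are assumptions made for this illustration; they are not part of objdump, and the pattern only targets the line format shown in the listing above.

```python
import re

# Matches lines such as "   c:\te3a02a01 \tmov\tr2, #4096\t; 0x1000".
# Hypothetical helper pattern: it assumes one 8-hex-digit word per line,
# as in the ARM listing above.
LINE_RE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]{8})\s+(.*)$")


def parse_disassembly_line(line):
    """Return (address, machine_code, instruction_text), or None if the
    line is not an instruction line (label, section header, blank)."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    address = int(match.group(1), 16)       # column 1: instruction address
    machine_code = int(match.group(2), 16)  # column 2: 32-bit machine code
    # Column 3: the disassembled instruction, with tabs collapsed to spaces.
    instruction = " ".join(match.group(3).split())
    return address, machine_code, instruction


address, code, text = parse_disassembly_line(
    "   c:\te3a02a01 \tmov\tr2, #4096\t; 0x1000"
)
print(hex(address), hex(code), text)  # 0xc 0xe3a02a01 mov r2, #4096 ; 0x1000
```

Label lines such as 00000000 <_start>: do not match the pattern, so the function returns None for them.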

Interpreting the disassembly file led_elf.dis:

// Assembly file
_start:
	// Set bit[12:23] of GPJ0CON to configure pins GPJ0_3/4/5 as outputs
	ldr r1, =0xE0200240 					
	ldr r0, =0x00111000
	str r0, [r1]

	mov r2, #0x1000

// Corresponding part of the disassembly file
00000000 <_start>:
   0:	e59f1070 	ldr	r1, [pc, #112]	; 78 <delay_loop+0x10>
   4:	e59f0070 	ldr	r0, [pc, #112]	; 7c <delay_loop+0x14>
   8:	e5810000 	str	r0, [r1]
   c:	e3a02a01 	mov	r2, #4096	; 0x1000
   ......
  70:	1afffffc 	bne	68 <delay_loop>
  74:	e1a0f00e 	mov	pc, lr
  78:	e0200240 	eor	r0, r0, r0, asr #4
  7c:	00111000 	andseq	r1, r1, r0
  80:	e02000a0 	eor	r0, r0, r0, lsr #1
  84:	e0200244 	eor	r0, r0, r4, asr #4
  88:	e02000a4 	eor	r0, r0, r4, lsr #1

Here we interpret the first few instructions of the disassembly file:
1. ldr r1, [pc, #112]: this corresponds to ldr r1, =0xE0200240 in the assembly file, whose purpose is to load 0xE0200240 into register r1. [pc, #112] refers to the data at address pc + 0x70 (#112 is decimal; 112 = 0x70). When an instruction executes, the PC reads as the address of the current instruction plus 8, that is, two instructions ahead, so here pc = 0x0 + 0x8 = 0x8 and pc + 0x70 = 0x78. The word stored at address 0x78 is e0200240, exactly the value 0xE0200240 that the assembly statement wants to load. (objdump -D disassembles everything, so this data word is also shown decoded as a meaningless eor instruction.) Therefore ldr r1, [pc, #112] and ldr r1, =0xE0200240 do the same thing.
2. ldr r0, [pc, #112] corresponds to ldr r0, =0x00111000 in the assembly file. It is interpreted the same way, but note that this time pc = 0x4 + 0x8 = 0xc, so the data is read from address 0x7c.
3. str r0, [r1]: the assembly statement and the disassembled statement are identical.
4. mov r2, #4096 corresponds to mov r2, #0x1000 in the assembly file. They are the same: decimal 4096 equals hexadecimal 0x1000.
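
The address arithmetic in items 1 and 2 can be checked with a small Python sketch. The function name ldr_literal_address is hypothetical; the sketch assumes the classic ARM-state encoding of ldr with a 12-bit immediate offset and pc as the base register, which is the form that appears in this listing.

```python
def ldr_literal_address(instr_addr, machine_code):
    """Compute the address read by a PC-relative 'ldr rX, [pc, #imm]'.

    instr_addr   -- address of the ldr instruction (column 1 of the listing)
    machine_code -- its 32-bit encoding (column 2 of the listing)
    """
    is_single_transfer = ((machine_code >> 26) & 0x3) == 0b01  # bits 27:26
    immediate_offset = ((machine_code >> 25) & 0x1) == 0       # I bit clear
    base_register = (machine_code >> 16) & 0xF                  # Rn field
    if not (is_single_transfer and immediate_offset and base_register == 15):
        raise ValueError("not a PC-relative ldr/str with immediate offset")

    offset = machine_code & 0xFFF          # imm12, e.g. 0x070 = 112
    add = (machine_code >> 23) & 0x1       # U bit: 1 = add, 0 = subtract
    pc = instr_addr + 8                    # PC reads as current address + 8
    return pc + offset if add else pc - offset


print(hex(ldr_literal_address(0x0, 0xE59F1070)))  # 0x78
print(hex(ldr_literal_address(0x4, 0xE59F0070)))  # 0x7c
```

Both results match the addresses that objdump prints after the semicolon on those two lines (78 and 7c).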
Supplement:
1. The PC reads as the current instruction's address plus 8 (two instructions ahead) because of the pipeline. Different ARM cores have pipelines of different depths, but for consistency the PC value in ARM state is always treated as it would be on a three-stage pipeline, and the disassembly follows that rule.
2. Why is some data loaded into a register directly (mov r2, #4096) and other data through PC-relative addressing (ldr r1, [pc, #112])? This comes down to legal and illegal immediate values. Put simply, an ARM instruction is only 32 bits long, and its immediate field can hold only an 8-bit value rotated right by an even number of bits. A constant that cannot be expressed this way is too "large" to fit in the instruction, so it is stored at a nearby address and loaded from there when needed. This is also why ldr rX, =value in the source file is a pseudo-instruction: the assembler turns it into a mov when the value is a legal immediate and into a PC-relative ldr otherwise.
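
As a rough illustration of the legal-immediate rule, the following Python sketch tries every even rotation and reports the 12-bit operand field (4-bit rotation plus 8-bit value) when one exists. The names rotate_left32 and encode_arm_immediate are hypothetical, and picking the smallest rotation is an assumption that happens to match the encodings in this listing.

```python
def rotate_left32(value, amount):
    """Rotate a 32-bit value left by 'amount' bits."""
    value &= 0xFFFFFFFF
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def encode_arm_immediate(value):
    """Return the 12-bit operand field (rot << 8 | imm8) if 'value' is an
    8-bit number rotated right by 2 * rot bits, or None if it is not a
    legal ARM data-processing immediate."""
    for rot in range(16):
        # Undo a right rotation by 2 * rot with a left rotation.
        imm8 = rotate_left32(value, 2 * rot)
        if imm8 < 0x100:
            return (rot << 8) | imm8
    return None


print(hex(encode_arm_immediate(0x1000)))    # 0xa01, as in e3a02a01
print(hex(encode_arm_immediate(0x900000)))  # 0x609, as in e3a00609
print(encode_arm_immediate(0xE0200240))     # None: needs a PC-relative ldr
```

The values the listing loads with mov (0x1000, 0x900000, 0x38, 0x10) are all legal immediates, while the five values that end up as data words at addresses 78 to 88 are not.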

Added by ShiloVir on Wed, 02 Feb 2022 17:53:20 +0200