What is disassembly
As the name suggests, disassembly is the reverse of assembly: it converts binary machine code back into assembly code. arm-linux-objdump is a tool in the cross-compilation toolchain. Its main use here is disassembly: the binary code is disassembled into assembly code for viewing.
Why disassembly
1. Reverse engineering. Disassemble an executable to obtain its assembly code, then infer the logic of the whole program from that code. This is beyond most people: reading a large program written in assembly is hard enough, let alone deducing the logic of someone else's code.
2. Debugging. The disassembly helps us understand and check whether the generated executable is correct, which is especially useful when learning about linker scripts and link addresses.
3. Learning. Compiling and linking C source code into an executable and then disassembling it shows how C statements correspond to assembly instructions, which leads to a deeper understanding of C.
Generation and interpretation of disassembly file
Generation of disassembly file:
led.bin: start.o
	arm-linux-ld -Ttext 0x0 -o led.elf $^
	arm-linux-objcopy -O binary led.elf led.bin
	arm-linux-objdump -D led.elf > led_elf.dis
	gcc mkv210_image.c -o mkx210
	./mkx210 led.bin 210.bin

%.o : %.S
	arm-linux-gcc -o $@ $< -c

%.o : %.c
	arm-linux-gcc -o $@ $< -c

clean:
	rm *.o *.elf *.bin *.dis mkx210 -f
The above is a simple Makefile. It compiles the .S and .c source files into .o object files, then links the .o files into the ELF executable led.elf. The line arm-linux-objdump -D led.elf > led_elf.dis disassembles led.elf and writes the result to led_elf.dis.
Source file: start.S is an assembly file
// .globl gives the symbol that follows global visibility, similar to a global variable in C
.globl _start
_start:
	// Set bit[12:23] of GPJ0CON to configure the GPJ0_3/4/5 pins as outputs
	ldr r1, =0xE0200240
	ldr r0, =0x00111000
	str r0, [r1]

	mov r2, #0x1000

	// Set GPD0_1 to output mode
	ldr r1, =0xE02000A0
	ldr r0, =0x00000010
	str r0, [r1]

led_blink:
	// Set bit[3:5] of GPJ0DAT to 0 so the GPJ0_3/4/5 pins output low level, LED on
	ldr r1, =0xE0200244
	mov r0, #0
	str r0, [r1]

	ldr r1, =0xE02000A4
	mov r0, #0
	str r0, [r1]

	// Delay
	bl delay

	// Set bit[3:5] of GPJ0DAT to 1 so the GPJ0_3/4/5 pins output high level, LED off
	ldr r1, =0xE0200244
	mov r0, #0x38
	str r0, [r1]

	ldr r1, =0xE02000A4
	mov r0, #0x2
	str r0, [r1]

	// Delay
	bl delay

	sub r2, r2, #1
	cmp r2, #0
	bne led_blink

halt:
	b halt

delay:
	mov r0, #0x900000
delay_loop:
	cmp r0, #0
	sub r0, r0, #1
	bne delay_loop
	mov pc, lr

start.S is an assembly program that blinks the LEDs on an S5PV210 development board. It consists of four parts: startup configuration, LED blinking, a delay routine and an infinite loop. Here we do not care about what the program does in detail; the point is to compare it with the file produced by disassembly.
Disassembly file obtained: led_elf.dis
led.elf:     file format elf32-littlearm

Disassembly of section .text:

//First column   second column   third column
00000000 <_start>:
   0:   e59f1070    ldr     r1, [pc, #112]  ; 78 <delay_loop+0x10>
   4:   e59f0070    ldr     r0, [pc, #112]  ; 7c <delay_loop+0x14>
   8:   e5810000    str     r0, [r1]
   c:   e3a02a01    mov     r2, #4096       ; 0x1000
  10:   e59f1068    ldr     r1, [pc, #104]  ; 80 <delay_loop+0x18>
  14:   e3a00010    mov     r0, #16
  18:   e5810000    str     r0, [r1]

0000001c <led_blink>:
  1c:   e59f1060    ldr     r1, [pc, #96]   ; 84 <delay_loop+0x1c>
  20:   e3a00000    mov     r0, #0
  24:   e5810000    str     r0, [r1]
  28:   e59f1058    ldr     r1, [pc, #88]   ; 88 <delay_loop+0x20>
  2c:   e3a00000    mov     r0, #0
  30:   e5810000    str     r0, [r1]
  34:   eb00000a    bl      64 <delay>
  38:   e59f1044    ldr     r1, [pc, #68]   ; 84 <delay_loop+0x1c>
  3c:   e3a00038    mov     r0, #56         ; 0x38
  40:   e5810000    str     r0, [r1]
  44:   e59f103c    ldr     r1, [pc, #60]   ; 88 <delay_loop+0x20>
  48:   e3a00002    mov     r0, #2
  4c:   e5810000    str     r0, [r1]
  50:   eb000003    bl      64 <delay>
  54:   e2422001    sub     r2, r2, #1
  58:   e3520000    cmp     r2, #0
  5c:   1affffee    bne     1c <led_blink>

00000060 <halt>:
  60:   eafffffe    b       60 <halt>

00000064 <delay>:
  64:   e3a00609    mov     r0, #9437184    ; 0x900000

00000068 <delay_loop>:
  68:   e3500000    cmp     r0, #0
  6c:   e2400001    sub     r0, r0, #1
  70:   1afffffc    bne     68 <delay_loop>
  74:   e1a0f00e    mov     pc, lr
  78:   e0200240    eor     r0, r0, r0, asr #4
  7c:   00111000    andseq  r1, r1, r0
  80:   e02000a0    eor     r0, r0, r0, lsr #1
  84:   e0200244    eor     r0, r0, r4, asr #4
  88:   e02000a4    eor     r0, r0, r4, lsr #1

Disassembly of section .ARM.attributes:

00000000 <.ARM.attributes>:
   0:   00001a41    andeq   r1, r0, r1, asr #20
   4:   61656100    cmnvs   r5, r0, lsl #2
   8:   01006962    tsteq   r0, r2, ror #18
   c:   00000010    andeq   r0, r0, r0, lsl r0
  10:   45543505    ldrbmi  r3, [r4, #-1285]        ; 0x505
  14:   08040600    stmdaeq r4, {r9, sl}
  18:   Address 0x00000018 is out of bounds.
Analysis:
1. The first line, led.elf: file format elf32-littlearm, indicates that this disassembly was generated from led.elf and that the program is a 32-bit, little-endian ARM ELF file.
2. 00000000 <_start>: the leading 00000000 is the address of the label, and <_start> is the label itself, corresponding to the _start label in start.S. A label is roughly equivalent to a function name in C. In C, a function name can also stand for the function's start address, which the listing confirms. The labels in the disassembly come from the assembly file, which makes it easy to match each part of the disassembly file with the corresponding part of the assembly file.
3. The body of the disassembly file is laid out in three columns: the instruction address, the instruction's machine code, and the assembly instruction that the machine code disassembles to.
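As a minimal sketch of the three-column layout described above, the following Python snippet splits one instruction line of the listing into address, machine code and instruction text. The function name parse_dis_line and the regular expression are illustrative assumptions, not part of objdump; the sample line is copied from led_elf.dis. The last line of the demo also shows what "littlearm" means for the bytes of one instruction word.

```python
import re

# Matches instruction lines such as "   c:   e3a02a01    mov r2, #4096".
# Label lines ("00000000 <_start>:") and section headers do not match.
_LINE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]{8})\s+(.*)$")


def parse_dis_line(line):
    """Split one listing line into (address, machine_code, text), or return None."""
    m = _LINE.match(line)
    if m is None:
        return None
    address = int(m.group(1), 16)        # first column
    machine_code = int(m.group(2), 16)   # second column
    return address, machine_code, m.group(3).strip()  # third column


if __name__ == "__main__":
    sample = "   c:   e3a02a01    mov     r2, #4096       ; 0x1000"
    address, machine_code, text = parse_dis_line(sample)
    print(hex(address), hex(machine_code), text)
    # Little-endian: the least significant byte of the word comes first in memory.
    print(machine_code.to_bytes(4, "little").hex())
```

Lines that are not instructions, such as labels or the "out of bounds" message at the end of the listing, return None, so a caller can simply skip them.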
Disassembly file led_elf.dis interpretation:
//Assembly file
_start:
	// Set bit[12:23] of GPJ0CON to configure the GPJ0_3/4/5 pins as outputs
	ldr r1, =0xE0200240
	ldr r0, =0x00111000
	str r0, [r1]

	mov r2, #0x1000

//Corresponding disassembly file part
00000000 <_start>:
   0:   e59f1070    ldr     r1, [pc, #112]  ; 78 <delay_loop+0x10>
   4:   e59f0070    ldr     r0, [pc, #112]  ; 7c <delay_loop+0x14>
   8:   e5810000    str     r0, [r1]
   c:   e3a02a01    mov     r2, #4096       ; 0x1000
......
  70:   1afffffc    bne     68 <delay_loop>
  74:   e1a0f00e    mov     pc, lr
  78:   e0200240    eor     r0, r0, r0, asr #4
  7c:   00111000    andseq  r1, r1, r0
  80:   e02000a0    eor     r0, r0, r0, lsr #1
  84:   e0200244    eor     r0, r0, r4, asr #4
  88:   e02000a4    eor     r0, r0, r4, lsr #1
Here we interpret the first few instructions of the disassembly:
1. ldr r1, [pc, #112] corresponds to ldr r1, =0xE0200240 in the assembly file, whose purpose is to load 0xE0200240 into register r1. [pc, #112] refers to the data at address pc + 0x70 (#112 is decimal; 112 = 0x70). While an instruction executes, pc reads as the address of that instruction plus 8, that is, two instructions ahead, so here pc = 0 + 8 and pc + 0x70 = 0x78. The word stored at address 0x78 is e0200240, exactly the value the assembly statement wants to load. (objdump -D prints that word as an eor instruction only because -D disassembles everything, data included.) Therefore ldr r1, [pc, #112] and ldr r1, =0xE0200240 do the same thing.
2. ldr r0, [pc, #112] corresponds to ldr r0, =0x00111000 in the assembly file. It is interpreted the same way, but note that pc = 4 + 8 this time, so the data is read from address 0x7c.
3. str r0, [r1]: the assembly statement and the disassembled statement are identical.
4. mov r2, #4096 corresponds to mov r2, #0x1000 in the assembly file. They are the same instruction: decimal 4096 equals hexadecimal 0x1000.
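The address arithmetic in items 1 and 2 can be checked mechanically. The sketch below is an illustration under stated assumptions: it handles only the plain offset form ldr rX, [pc, #imm] seen in this listing, and the function name ldr_literal_address and the LITERAL_POOL table (copied from addresses 0x78 to 0x88 of led_elf.dis) are hypothetical helpers, not part of any toolchain.

```python
# Words stored after "mov pc, lr" in led_elf.dis (address -> value).
LITERAL_POOL = {
    0x78: 0xE0200240,
    0x7C: 0x00111000,
    0x80: 0xE02000A0,
    0x84: 0xE0200244,
    0x88: 0xE02000A4,
}


def ldr_literal_address(instr_addr, word):
    """Return the address read by a PC-relative 'ldr rX, [pc, #imm]'.

    instr_addr is the first column of the listing, word the second column.
    """
    base_reg = (word >> 16) & 0xF
    # Check for a word load with immediate offset and no write-back,
    # using pc (r15) as the base register.
    if (word & 0x0F700000) != 0x05100000 or base_reg != 15:
        raise ValueError("not a PC-relative ldr of the form handled here")
    offset = word & 0xFFF            # 12-bit immediate, e.g. 0x70 == 112
    if not (word >> 23) & 1:         # U bit clear means subtract the offset
        offset = -offset
    pc = instr_addr + 8              # pc reads two instructions ahead
    return pc + offset


if __name__ == "__main__":
    for addr, word in [(0x0, 0xE59F1070), (0x4, 0xE59F0070)]:
        target = ldr_literal_address(addr, word)
        print(hex(addr), "loads", hex(LITERAL_POOL[target]), "from", hex(target))
```

Feeding it the other ldr lines of the listing gives the same addresses that objdump prints after the semicolon in the third column.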
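Supplement:
1. The PC reads as the current instruction's address plus two instructions (8 bytes) because of the pipeline: while one instruction executes, the next one is being decoded and the one after that is being fetched. Different ARM chips have pipelines of different depths, but for uniformity PC-relative offsets are always calculated as if the pipeline had three stages.
2. Why is data sometimes loaded into a register directly (mov r2, #4096) and sometimes by PC-relative addressing (ldr r1, [pc, #112])? This comes down to legal and illegal immediate values. Put simply, an ARM instruction is only 32 bits wide, so the immediate field of a mov can hold only certain values: an 8-bit value rotated right by an even number of bits. 0x1000 fits this form; 0xE0200240 does not. When the value does not fit, the assembler stores it at a nearby address and emits a PC-relative ldr to fetch it when needed. This is also why ldr rX, =value is a pseudo-instruction: the assembler decides which real instruction to emit.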
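To make the idea of legal and illegal immediates concrete, here is a small sketch of the ARM rule that a data-processing immediate must be an 8-bit value rotated right by an even amount. The function names are hypothetical, and the sketch assumes plain ARM (not Thumb) instructions and ignores the assembler's other tricks, such as using mvn for inverted constants.

```python
def rol32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def encode_arm_immediate(value):
    """Return the 12-bit field (rotate << 8 | imm8) for value, or None.

    The hardware computes imm8 rotated right by 2 * rotate, so rotating
    the wanted value left by the same amount must leave at most 8 bits.
    """
    value &= 0xFFFFFFFF
    for rotate in range(16):
        imm8 = rol32(value, 2 * rotate)
        if imm8 <= 0xFF:
            return (rotate << 8) | imm8
    return None  # illegal immediate: needs a literal load instead


if __name__ == "__main__":
    for value in (0x1000, 0x900000, 0x10, 0xE0200240, 0x00111000):
        field = encode_arm_immediate(value)
        status = "illegal" if field is None else f"legal, field 0x{field:03x}"
        print(f"0x{value:08x}: {status}")
```

For the legal values, the returned field equals the low 12 bits of the machine code in the listing: 0xa01 in e3a02a01 for mov r2, #4096 and 0x609 in e3a00609 for mov r0, #0x900000. This also appears to explain why ldr r0, =0x00000010 in start.S shows up as mov r0, #16 at address 0x14, since 0x10 is a legal immediate.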