015 Android executable dex

Start with a hello world

Smali code and dex files are closely linked. We start with the simplest possible dex file, a hello world, and use it to learn the structure of a dex file as a whole.

.class public LHelloWorld;
.super Ljava/lang/Object;

.method public static main([Ljava/lang/String;)V
    .registers 2

    sget-object v0, Ljava/lang/System;->out:Ljava/io/PrintStream;

    const-string v1, "Hello World!"

    invoke-virtual {v0, v1}, Ljava/io/PrintStream;->println(Ljava/lang/String;)V

    return-void
.end method

Take the smali code above as an example and assemble it into a dex file with smali.jar:

D:\Android\tools>java -jar smali.jar -o classes.dex HelloWorld.smali

The command assembles the smali code into classes.dex. Package that file into a zip archive named HelloWorld.zip.

Make sure that adb is connected to the emulator.

Then push the zip file to the emulator.

Finally, run the dex file on the emulator. Hello World! is printed, so the file works.
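The packaging step above can also be scripted. The following is a minimal sketch, assuming the assembled file is named classes.dex as in the smali command; the helper name package_dex is hypothetical and only serves the illustration.

```python
import zipfile


def package_dex(dex_path: str, zip_path: str) -> None:
    """Store a dex file in a zip archive under the entry name classes.dex."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The entry name is fixed to classes.dex regardless of the source path.
        archive.write(dex_path, arcname="classes.dex")
```

Calling package_dex("classes.dex", "HelloWorld.zip") in the directory that holds the assembled file would produce the archive described above.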

Dex file structure

A dex file can be divided into three main blocks:

  1. The dex header
  2. Index tables (arrays) for strings, types, prototypes, fields, and methods
  3. Class data

File header

First, let's look at the header.

The header is 0x70 bytes long. Its four most important fields are:

  1. dex_magic: the file identifier, the string "dex\n" followed by a version number and a 0 byte
  2. checksum: an Adler-32 checksum of the rest of the file (from field 3 to the end of the file)
  3. signature: a SHA-1 hash of the rest of the file (from field 4 to the end of the file)
  4. file_size: the size of the whole file in bytes
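The four fields above can be checked programmatically. The sketch below assumes the standard header layout (magic at offset 0, checksum at 8, signature at 12, file_size at 32) and that the whole file has been read into memory; check_dex_integrity is a hypothetical helper name, not part of any Android tool.

```python
import hashlib
import struct
import zlib


def check_dex_integrity(data: bytes) -> dict:
    """Recompute the header checksum and signature of a dex image in memory."""
    stored_checksum, = struct.unpack_from("<I", data, 8)   # bytes 8..11
    stored_signature = data[12:32]                         # bytes 12..31
    file_size, = struct.unpack_from("<I", data, 32)        # bytes 32..35
    return {
        "magic_ok": data[0:4] == b"dex\n" and data[7:8] == b"\x00",
        # Adler-32 covers everything after the checksum field itself.
        "checksum_ok": (zlib.adler32(data[12:]) & 0xFFFFFFFF) == stored_checksum,
        # SHA-1 covers everything after the signature field itself.
        "signature_ok": hashlib.sha1(data[32:]).digest() == stored_signature,
        "size_ok": file_size == len(data),
    }
```

For an unmodified classes.dex all four values should be True. Patching any byte after the header without recomputing both hashes makes checksum_ok and signature_ok turn False.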

In addition to these four fields, the file header contains other fields:

  1. header_size: the size of the dex header (0x70)
  2. endian_tag: the byte-order marker; the value 0x12345678 indicates little-endian data

Size and offset of various tables

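  1. string_ids_size and string_ids_off: the entry count and offset of the string table
  2. type_ids_size and type_ids_off: the entry count and offset of the type table
  3. proto_ids_size and proto_ids_off: the entry count and offset of the prototype table
  4. field_ids_size and field_ids_off: the entry count and offset of the field table
  5. method_ids_size and method_ids_off: the entry count and offset of the method table
  6. class_defs_size and class_defs_off: the entry count and offset of the class data table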
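To read these sizes and offsets, the whole 0x70-byte header can be unpacked in one call. This is a sketch under the assumption that the file is little-endian and follows the standard header layout, which also contains link, map, and data fields not discussed here; parse_header and FIELD_NAMES are illustrative names.

```python
import struct

# magic (8 bytes), checksum (uint32), signature (20 bytes), then 20 uint32 fields.
HEADER_FORMAT = "<8sI20s20I"

FIELD_NAMES = (
    "magic", "checksum", "signature",
    "file_size", "header_size", "endian_tag",
    "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off",
    "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off",
    "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off",
    "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)


def parse_header(data: bytes) -> dict:
    """Return the dex header fields as a name -> value dictionary."""
    values = struct.unpack_from(HEADER_FORMAT, data, 0)
    return dict(zip(FIELD_NAMES, values))
```

Each *_size value is a number of table entries, and each *_off value is a byte offset from the start of the file.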

Array of various data

The second part of the dex file is a set of arrays (index tables): the string, type, prototype, field, and method tables.

String table

Each string table entry is the offset of a string's data; the offset points to a string_data structure. The string_data structure has two fields.

Field 1: the string length. The data type is uleb128, a variable-length integer encoding that the dex format uses widely.

Field 2: the string data itself, terminated by a 0 byte
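The two fields can be decoded as follows. This is a minimal sketch: read_uleb128 and read_string_data are illustrative names, and the data is decoded as plain UTF-8, which is an assumption that holds for ASCII strings such as the one in this file (dex actually stores strings in a modified UTF-8 variant).

```python
def read_uleb128(data: bytes, offset: int):
    """Decode one uleb128 value; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if byte & 0x80 == 0:               # high bit clear: last byte
            return result, offset
        shift += 7


def read_string_data(data: bytes, offset: int):
    """Read a string_data structure; return (declared length, text)."""
    length, pos = read_uleb128(data, offset)
    end = data.index(b"\x00", pos)         # the data ends with a 0 byte
    return length, data[pos:end].decode("utf-8", errors="replace")
```

For the bytes 0C 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 00, read_string_data returns the length 12 and the text Hello World!.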

Type table

Each type table entry stores an index into the string table.

For example, an entry with the value 3 refers to the string at index 3 of the string table, which here is Ljava/lang/Object;.

Prototype table

The prototype table describes each method prototype. Each entry contains shorty_idx, return_type_idx, and parameters_off. shorty_idx is an index into the string table.

Note: return_type_idx, the return type, is an index into the type table, and parameters_off is the file offset of the parameter type list.

Field table

It stores field information, including the class where the field is located (class_idx), the type of the field (type_idx), and the name of the field (name_idx).

class_idx and type_idx are both indexes into the type table, and name_idx is an index into the string table.

Method table

The method table stores method information, including the class where the method is defined (class_idx), the prototype of the method (proto_idx), and the name of the method (name_idx).

class_idx is an index into the type table, proto_idx is an index into the prototype table, and name_idx is an index into the string table.
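The type, prototype, field, and method tables are all arrays of small fixed-size records, so one generic reader covers them. The sketch below assumes the standard record layouts (4, 12, 8, and 8 bytes) and that the strings have already been decoded into a Python list; read_items and describe_field are hypothetical helpers.

```python
import struct

TYPE_ID = "<I"      # descriptor_idx (index into the string table)
PROTO_ID = "<III"   # shorty_idx, return_type_idx, parameters_off
FIELD_ID = "<HHI"   # class_idx, type_idx, name_idx
METHOD_ID = "<HHI"  # class_idx, proto_idx, name_idx


def read_items(data: bytes, off: int, count: int, fmt: str) -> list:
    """Read `count` consecutive records of layout `fmt` starting at `off`."""
    size = struct.calcsize(fmt)
    return [struct.unpack_from(fmt, data, off + i * size) for i in range(count)]


def describe_field(field, type_ids, strings) -> str:
    """Resolve a field record to a smali-style field reference."""
    class_idx, type_idx, name_idx = field
    owner = strings[type_ids[class_idx][0]]   # type table -> string table
    ftype = strings[type_ids[type_idx][0]]
    return f"{owner}->{strings[name_idx]}:{ftype}"
```

With a string list such as ["Ljava/io/PrintStream;", "Ljava/lang/System;", "out"] (indexes chosen for illustration, not taken from the real file), type entries (0,) and (1,), and the field record (1, 0, 2), describe_field yields Ljava/lang/System;->out:Ljava/io/PrintStream;.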

Class data

Class data is also an array, and each element is the relevant information of a class. The file analyzed now has only one class, so there is only one class information.

Each entry in the table stores the class name index, access flags, superclass index, interfaces offset, source file index, annotations offset, and class data offset.

The data of the class itself is in the class_data_item structure that the class data offset points to.
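One table entry can be unpacked as eight 32-bit values. This sketch assumes the standard 32-byte entry layout, which has one more field (static_values_off) after the ones listed above; ClassDef and read_class_def are illustrative names.

```python
import struct
from collections import namedtuple

ClassDef = namedtuple(
    "ClassDef",
    "class_idx access_flags superclass_idx interfaces_off "
    "source_file_idx annotations_off class_data_off static_values_off",
)


def read_class_def(data: bytes, off: int) -> ClassDef:
    """Unpack one 32-byte class definition entry located at `off`."""
    return ClassDef(*struct.unpack_from("<8I", data, off))
```

class_idx and superclass_idx are indexes into the type table, and class_data_off is the file offset of the class_data_item.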

method_list is the list of all methods in the class. Because the current file has only one method, main, the list holds a single structure. The structure contains the basic information about the method: the method index, access flags, code offset, and the code itself.

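code_item holds all the information about the code. Two of its fields are particularly important:

insns_size: the length of the instruction array, counted in 16-bit code units (not to be confused with ins_size, which is the number of registers used for incoming arguments)

ushort insns[8]: the instruction array

This array stores the Dalvik bytecode (opcodes and operands) that the smali code was assembled into.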
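The instruction array can be pulled out of a code_item as shown below. The sketch assumes the standard 16-byte code_item header (registers_size, ins_size, outs_size, tries_size, debug_info_off, insns_size) and ignores any try/catch data that may follow the instructions; read_code_item is a hypothetical helper.

```python
import struct


def read_code_item(data: bytes, off: int) -> dict:
    """Parse the fixed code_item header and the instruction array after it."""
    (registers_size, ins_size, outs_size, tries_size,
     debug_info_off, insns_size) = struct.unpack_from("<4H2I", data, off)
    # insns_size counts 16-bit code units, so the array occupies 2 * insns_size bytes.
    insns = struct.unpack_from(f"<{insns_size}H", data, off + 16)
    return {
        "registers_size": registers_size,
        "ins_size": ins_size,
        "outs_size": outs_size,
        "tries_size": tries_size,
        "debug_info_off": debug_info_off,
        "insns_size": insns_size,
        "insns": insns,
    }
```

For the main method analyzed here, insns_size is 8, which matches the 16 bytes of bytecode parsed by hand in the next section.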

Parsing Smali code manually

62 00 00 00 1A 01 00 00 6E 20 01 00 10 00 0E 00

Next, copy these hexadecimal bytes and manually parse them back into smali code.

For this we also need a reference document, the Dalvik opcode table (in Chinese), which lists every opcode with its meaning and an example.

First, look up 62. Opcode 62 reads a static object reference field, identified by its field ID, into vx. To understand the encoding, look at the example given in the table:

6201 0C00

Parsed as smali, this is

sget-object v1, Test3.os1:Ljava/lang/Object; // field@000c

Read the static object reference field os1 (field table entry 0xC) into v1.

In other words, this instruction is 4 bytes long:

62 is the opcode sget-object
01 is the register number, v1
0C00 is the field table index 0x000C, stored little-endian
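The 4-byte layout described above (opcode byte, register byte, 16-bit little-endian index) can be decoded with a few lines. This is a sketch only; decode_21c is an illustrative name borrowed from the "21c" label that the Dalvik instruction format documentation uses for this layout.

```python
import struct


def decode_21c(code: bytes, pos: int = 0):
    """Decode a 4-byte instruction: opcode, 8-bit register, 16-bit index."""
    opcode = code[pos]
    register = code[pos + 1]
    index, = struct.unpack_from("<H", code, pos + 2)  # little-endian index
    return opcode, register, index


opcode, register, index = decode_21c(bytes.fromhex("62010C00"))
print(f"opcode=0x{opcode:02X} v{register} index=0x{index:04X}")
```

The same layout is shared by const-string (opcode 1A), with the index pointing into the string table instead of the field table.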

Then let's look at the Opcode we want to parse

62 00 00 00 1A 01 00 00 6E 20 01 00 10 00 0E 00

Parse the first four bytes first

62 00 00 00

The specific meanings are as follows

62 is the opcode sget-object
00 is the register number, v0
0000 is the field table index 0

Next, find the 0th field in the field table

java.io.PrintStream java.lang.System.out

Field 0 is System.out. Written as a smali field reference, it is

Ljava/lang/System;->out:Ljava/io/PrintStream;

So the first four bytes

62 00 00 00

parse into the smali code

sget-object v0, Ljava/lang/System;->out:Ljava/io/PrintStream;

Then look up 1A. The example in the table is

1A08 0000

which resolves to

const-string v8, "" // string@0000

Store a reference to string@0000 (string table entry 0) in v8.

The bytes to be resolved are

1A 01 00 00

Next, find the string at index 0 of the string table, which is Hello World!

So this instruction parses into the smali code

const-string v1, "Hello World!"

Next, look up 6E

Look directly at the example

6E53 0600 0421 - invoke-virtual {v4, v0, v1, v2, v3}, Test2.method5:(IIII)V // method@0006

This instruction is more complex. It is 6 bytes long, of which

6E---> invoke-virtual
5---> number of argument registers
3---> v3, the fifth argument register
0600---> method@0006
0421---> v4, v0, v1, v2

The encoding of the argument list is unusual. The first four argument registers are packed as 4-bit values into the last two bytes. If there are more than four, the fifth register is stored in the low 4 bits of the byte that follows the opcode byte; the high 4 bits of that byte hold the argument count.
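The nibble packing described above is easier to follow in code. The sketch below decodes the 6-byte layout under the assumption that the last two bytes form one little-endian 16-bit word whose nibbles, lowest first, are the first four registers; decode_35c is an illustrative name taken from the "35c" label used in the Dalvik instruction format documentation.

```python
import struct


def decode_35c(code: bytes, pos: int = 0):
    """Decode a 6-byte invoke instruction: (opcode, method index, registers)."""
    opcode = code[pos]
    count = code[pos + 1] >> 4        # high nibble: number of argument registers
    fifth = code[pos + 1] & 0x0F      # low nibble: fifth register, if used
    index, packed = struct.unpack_from("<HH", code, pos + 2)
    # Nibbles of the last word, lowest first, are registers one to four.
    registers = [(packed >> (4 * i)) & 0x0F for i in range(4)] + [fifth]
    return opcode, index, registers[:count]


opcode, index, registers = decode_35c(bytes.fromhex("6E5306000421"))
print(hex(opcode), index, ["v%d" % r for r in registers])
```

For the example bytes this gives method index 6 and the registers v4, v0, v1, v2, v3, in the same order as the smali listing.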

So the bytes we want to parse

6E 20 01 00 10 00

can be resolved as

6E---> invoke-virtual
2---> number of argument registers
01 00---> method@0001
10 00---> v0, v1 (the low 4 bits of the byte come first)

This translates to

invoke-virtual {v0, v1}, method@0001

Find the method with index 1 in the method table

void java.io.PrintStream.println(java.lang.String)

Convert to Smali code

Ljava/io/PrintStream;->println(Ljava/lang/String;)V

The complete smali instruction is

invoke-virtual {v0, v1}, Ljava/io/PrintStream;->println(Ljava/lang/String;)V

Finally, look at 0E

Opcode 0E is return-void: the method returns without a value. The 00 byte after it only pads the instruction to a full 16-bit code unit.

62 00 00 00 1A 01 00 00 6E 20 01 00 10 00 0E 00

The complete parsed smali code is

sget-object v0, Ljava/lang/System;->out:Ljava/io/PrintStream;

const-string v1, "Hello World!"

invoke-virtual {v0, v1}, Ljava/io/PrintStream;->println(Ljava/lang/String;)V

return-void

This is identical to the instructions in the HelloWorld.smali source.
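The whole manual walk can be reproduced with a small loop that dispatches on the opcode byte and advances by the instruction's length. This is a sketch that handles only the four opcodes used in this file; the FIELDS, STRINGS, and METHODS dictionaries are stand-ins that hold just the table entries resolved by hand above, not a real table parser.

```python
import struct

# Only the table entries looked up by hand in this walkthrough.
FIELDS = {0: "Ljava/lang/System;->out:Ljava/io/PrintStream;"}
STRINGS = {0: "Hello World!"}
METHODS = {1: "Ljava/io/PrintStream;->println(Ljava/lang/String;)V"}


def disassemble(code: bytes) -> list:
    """Translate the bytecode of this example back into smali lines."""
    lines = []
    pos = 0
    while pos < len(code):
        op = code[pos]
        if op in (0x62, 0x1A):                      # 4-byte instructions
            reg = code[pos + 1]
            idx, = struct.unpack_from("<H", code, pos + 2)
            if op == 0x62:
                lines.append(f"sget-object v{reg}, {FIELDS[idx]}")
            else:
                lines.append(f'const-string v{reg}, "{STRINGS[idx]}"')
            pos += 4
        elif op == 0x6E:                            # 6-byte invoke-virtual
            count = code[pos + 1] >> 4
            idx, packed = struct.unpack_from("<HH", code, pos + 2)
            regs = [(packed >> (4 * i)) & 0x0F for i in range(4)]
            regs.append(code[pos + 1] & 0x0F)
            args = ", ".join(f"v{r}" for r in regs[:count])
            lines.append(f"invoke-virtual {{{args}}}, {METHODS[idx]}")
            pos += 6
        elif op == 0x0E:                            # 2-byte return-void
            lines.append("return-void")
            pos += 2
        else:
            raise ValueError(f"opcode 0x{op:02X} is not handled in this sketch")
    return lines


for line in disassemble(bytes.fromhex("62 00 00 00 1A 01 00 00 6E 20 01 00 10 00 0E 00")):
    print(line)
```

A real disassembler would take the instruction lengths from the full opcode table and resolve the indexes through the parsed tables of the file.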

Added by unknown101 on Tue, 25 Jan 2022 00:12:09 +0200