Start with a hello world
Smali code and the dex format are closely linked. We start with the simplest possible dex file, a hello world, and use it to learn the structure of a whole dex file.
.class public LHelloWorld;
.super Ljava/lang/Object;

.method public static main([Ljava/lang/String;)V
    .registers 2
    sget-object v0, Ljava/lang/System;->out:Ljava/io/PrintStream;
    const-string v1, "Hello World!"
    invoke-virtual {v0, v1}, Ljava/io/PrintStream;->println(Ljava/lang/String;)V
    return-void
.end method
Take the smali code above as an example. First, assemble it into a dex file:
D:\Android\tools>java -jar smali.jar -o classes.dex HelloWorld.smali
Then package the resulting classes.dex into a zip archive named HelloWorld.zip.
Make sure that adb is connected to the emulator.
Then upload the zip file to the emulator.
Finally, execute the dex file; you can see that "Hello World!" is printed successfully.
Dex file structure
Dex files can be divided into three main blocks:
- Dex header
- Index tables for the various kinds of data: strings, types, prototypes, fields, and methods
- Class data
File header
First, let's look at the header.
The header is 0x70 bytes in total. Four of its fields are especially important:
- dex_magic: the magic number, a fixed signature string that identifies the file as a DEX file
- checksum: an Adler-32 checksum of the file, computed from the third field (signature) to the end of the file
- signature: a SHA-1 hash of the file, computed from the fourth field (file_size) to the end of the file
- file_size: the size of the whole file in bytes
Besides these four, the header contains other fields:
- header_size: the size of the dex header (0x70)
- endian_tag: the byte order of the data; the value 0x12345678 indicates little-endian
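The header also records the size (number of entries) and file offset of each table:
- string_ids_size and string_ids_off, the size and offset of the string table
- type_ids_size and type_ids_off, the size and offset of the type table
- proto_ids_size and proto_ids_off, the size and offset of the prototype table
- field_ids_size and field_ids_off, the size and offset of the field table
- method_ids_size and method_ids_off, the size and offset of the method table
- class_defs_size and class_defs_off, the size and offset of the class data table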
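The header layout described above can be sketched in Python. This is a minimal sketch, not a full parser: the function names parse_dex_header and verify_dex are hypothetical, and the field order assumes the standard 0x70-byte header (an 8-byte magic, the checksum, the 20-byte signature, then twenty little-endian 32-bit fields).

```python
import hashlib
import struct
import zlib

HEADER_SIZE = 0x70
# magic[8], checksum (u32), signature[20], then 20 little-endian u32 fields.
HEADER_FORMAT = "<8sI20s20I"
U32_FIELDS = [
    "file_size", "header_size", "endian_tag", "link_size", "link_off",
    "map_off", "string_ids_size", "string_ids_off", "type_ids_size",
    "type_ids_off", "proto_ids_size", "proto_ids_off", "field_ids_size",
    "field_ids_off", "method_ids_size", "method_ids_off",
    "class_defs_size", "class_defs_off", "data_size", "data_off",
]


def parse_dex_header(data):
    """Unpack the fixed-size dex header into a dict (hypothetical helper)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than a dex header")
    unpacked = struct.unpack_from(HEADER_FORMAT, data, 0)
    header = {
        "magic": unpacked[0],
        "checksum": unpacked[1],
        "signature": unpacked[2],
    }
    header.update(zip(U32_FIELDS, unpacked[3:]))
    return header


def verify_dex(data):
    """Return (checksum_ok, signature_ok) for the whole file contents."""
    header = parse_dex_header(data)
    # The checksum covers everything after magic (8 bytes) and checksum (4).
    checksum_ok = (zlib.adler32(data[12:]) & 0xFFFFFFFF) == header["checksum"]
    # The signature covers everything after magic, checksum and signature.
    signature_ok = hashlib.sha1(data[32:]).digest() == header["signature"]
    return checksum_ok, signature_ok
```

To try it on the file built earlier, read classes.dex in binary mode and pass the bytes to verify_dex; both values should be True for an unmodified file.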
Array of various data
The second part of the dex file consists of index tables for the various kinds of data: strings, types, prototypes, fields, and methods.
String table
Each string table entry is the offset of a string's data; the offset points to a string_data structure. The string_data structure has two fields:
Field 1: the string length. Its data type is uleb128, a variable-length integer encoding that stores 7 bits of the value in each byte and is used throughout the dex format
Field 2: the string data itself, terminated by a 0 byte
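The two fields can be read as in the following sketch. The helper names read_uleb128 and read_string_data are hypothetical. One simplifying assumption: dex strings are stored as MUTF-8, and decoding them as UTF-8 here is only guaranteed to be correct for plain ASCII text such as "Hello World!".

```python
def read_uleb128(data, offset):
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        # The low 7 bits carry data, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the value.
        if byte & 0x80 == 0:
            return result, offset
        shift += 7


def read_string_data(data, offset):
    """Read a string_data structure: uleb128 length, then 0-terminated bytes."""
    length, pos = read_uleb128(data, offset)
    end = data.index(b"\x00", pos)
    # Assumption: UTF-8 decoding is good enough for ASCII-only strings.
    text = data[pos:end].decode("utf-8", errors="replace")
    return length, text
```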
Type table
Each entry of the type table stores an index into the string table.
For example, an entry with the value 3 refers to the string with subscript 3 in the string table, which here is Ljava/lang/Object;.
Prototype table
The prototype table stores the description of each part of a function prototype. Each entry contains shorty_idx, return_type_idx, and parameters_off; shorty_idx is an array subscript into the string table.
Note: return_type_idx, the return type, is an index into the type table.
Field table
It stores field information, including the class where the field is located (class_idx), the type of the field (type_idx), and the name of the field (name_idx).
class_idx and type_idx are both indexes into the type table, and name_idx is an array subscript into the string table.
Method table
The method table stores the information of each method, including the class where the method is located (class_idx), the prototype of the method (proto_idx), and the name of the method (name_idx).
Here class_idx is an index into the type table, proto_idx is an index into the prototype table, and name_idx is an array subscript into the string table.
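How these indexes chain together can be shown with a toy model. The table contents below are hypothetical stand-ins chosen to resemble the hello world file, not values read from the real classes.dex, and the parameter list is inlined where a real prototype entry would store parameters_off, an offset to a separate type list.

```python
# Toy string table (hypothetical contents).
strings = [
    "Hello World!", "LHelloWorld;", "Ljava/io/PrintStream;",
    "Ljava/lang/Object;", "Ljava/lang/String;", "Ljava/lang/System;",
    "V", "VL", "main", "out", "println", "[Ljava/lang/String;",
]
# Type table: each entry is an index into the string table.
type_ids = [1, 2, 3, 4, 5, 6, 11]
# Prototype table: (shorty_idx, return_type_idx, parameter type indexes).
proto_ids = [(7, 5, [3]), (7, 5, [6])]
# Field table: (class_idx, type_idx, name_idx).
field_ids = [(4, 1, 9)]
# Method table: (class_idx, proto_idx, name_idx).
method_ids = [(0, 1, 8), (1, 0, 10)]


def type_name(type_idx):
    """Type index -> string index -> type descriptor."""
    return strings[type_ids[type_idx]]


def field_to_smali(field_idx):
    class_idx, type_idx, name_idx = field_ids[field_idx]
    return f"{type_name(class_idx)}->{strings[name_idx]}:{type_name(type_idx)}"


def method_to_smali(method_idx):
    class_idx, proto_idx, name_idx = method_ids[method_idx]
    _shorty_idx, return_type_idx, params = proto_ids[proto_idx]
    param_text = "".join(type_name(p) for p in params)
    return (f"{type_name(class_idx)}->{strings[name_idx]}"
            f"({param_text}){type_name(return_type_idx)}")
```

Every lookup ends in the string table, which is why the type, prototype, field, and method tables can all be small fixed-size records.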
Class data
Class data is also an array, and each element holds the information for one class. The file analyzed here has only one class, so there is only one entry.
Each entry stores the data of a class: class name index, access flags, superclass index, interfaces offset, source file index, annotations offset, and class data offset.
The data of the whole class is in the class_data_item structure that the class data offset points to.
method_list is the list of all methods in the class. Because the current file has only one method, main, the list contains a single structure. The structure holds the basic information of the method: method index, access flags, code offset, and the code information.
Here code_item holds the information of the whole method body, in which two fields are particularly important:
insns_size: the length of the instruction array, counted in 16-bit units
ushort insns[8]: the instruction array
This array stores the Dalvik virtual machine instructions that the smali code was assembled into, that is, the opcodes and their operands.
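A sketch of reading a code_item follows. It assumes the standard layout in which four 16-bit fields (registers_size, ins_size, outs_size, tries_size) and two 32-bit fields (debug_info_off, insns_size) precede the instruction array; parse_code_item is a hypothetical helper name.

```python
import struct


def parse_code_item(data, offset=0):
    """Unpack the fixed part of a code_item and its instruction array."""
    (registers_size, ins_size, outs_size, tries_size,
     debug_info_off, insns_size) = struct.unpack_from("<4H2I", data, offset)
    # The instruction array starts right after the 16-byte fixed part and
    # holds insns_size little-endian 16-bit units.
    insns = struct.unpack_from(f"<{insns_size}H", data, offset + 16)
    return {
        "registers_size": registers_size,
        "ins_size": ins_size,
        "outs_size": outs_size,
        "tries_size": tries_size,
        "debug_info_off": debug_info_off,
        "insns_size": insns_size,
        "insns": insns,
    }
```

Note that ins_size (the number of incoming argument words) and insns_size (the length of the instruction array) are different fields with similar names.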
Parsing Smali code manually
62 00 00 00 1A 01 00 00 6E 20 01 00 10 00 0E 00
Next, copy these hexadecimal bytes and manually parse them into smali code.
For this we also need a reference document, the (Chinese) Dalvik opcode table, which lists every opcode together with its meaning and an example.
First, look up 62. The instruction 62 reads a static object reference field, selected by field ID, into vx. To understand the encoding, look at the example in the table:
6201 0C00
The code parsed as smali is
sget-object v1, Test3.os1:Ljava/lang/Object; // field@000c
It reads the static object reference field os1 (entry 0xC of the field table) into v1.
In other words, this instruction has a total of 4 bytes:
- 62: the opcode sget-object
- 01: the register with sequence number 1, v1
- 0C00: the field with field table index 0x000C (stored little-endian)
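The 4-byte layout above (opcode, register, 16-bit index) can be decoded with a few lines of Python. This is a sketch covering only the two opcodes this walkthrough needs; decode_21c is a hypothetical name, borrowed from the "21c" label that the Dalvik documentation uses for this instruction format.

```python
import struct

# Only the two opcodes needed here; the real opcode table is much larger.
OPCODES_21C = {0x62: "sget-object", 0x1A: "const-string"}


def decode_21c(code):
    """Split a 4-byte instruction into (mnemonic, register, index)."""
    # One opcode byte, one register byte, one little-endian 16-bit index.
    opcode, register, index = struct.unpack_from("<BBH", code, 0)
    return OPCODES_21C[opcode], register, index
```

The index is stored little-endian, which is why the bytes 0C 00 mean index 0x000C.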
Now let's look at the bytecode we want to parse
62 00 00 00 1A 01 00 00 6E 20 01 00 10 00 0E 00
Parse the first four bytes first
62 00 00 00
The specific meanings are as follows:
- 62: the opcode sget-object
- 00: the register with sequence number 0, v0
- 0000: the field with field table index 0
Next, find entry 0 of the field table
java.io.PrintStream java.lang.System.out
Entry 0 is the out object. Translated into smali code, this field is
Ljava/lang/System;->out:Ljava/io/PrintStream;
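The translation from the dotted Java-style name to a smali field reference is mechanical, as the following sketch shows. Both helper names are hypothetical, and the input format ("type owner.name") is assumed to match the way the field is displayed above.

```python
PRIMITIVES = {
    "void": "V", "boolean": "Z", "byte": "B", "short": "S", "char": "C",
    "int": "I", "long": "J", "float": "F", "double": "D",
}


def to_descriptor(java_type):
    """Convert a Java type name such as java.lang.String[] to a descriptor."""
    dims = 0
    while java_type.endswith("[]"):
        java_type = java_type[:-2]
        dims += 1
    if java_type in PRIMITIVES:
        base = PRIMITIVES[java_type]
    else:
        # Class types become L<name with slashes>;
        base = "L" + java_type.replace(".", "/") + ";"
    return "[" * dims + base


def field_decl_to_smali(declaration):
    """Convert 'type owner.name' into 'Lowner;->name:Ltype;'."""
    field_type, qualified_name = declaration.split()
    owner, name = qualified_name.rsplit(".", 1)
    return f"{to_descriptor(owner)}->{name}:{to_descriptor(field_type)}"
```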
So the first four bytes
62 00 00 00
parse into the smali code
sget-object v0, Ljava/lang/System;->out:Ljava/io/PrintStream;
Then look up 1A. The example in the table is
1A08 0000
which resolves as
const-string v8, "" // string@0000
It stores a reference to string@0000 (entry 0 of the string table) into v8.
The bytes to be resolved are
1A 01 00 00
Next, find the string with string table index 0
Then this instruction is parsed into the smali code
const-string v1, "Hello World!"
Next, look up 6E
Look directly at the example
6E53 0600 0421 - invoke-virtual { v4, v0, v1, v2, v3}, Test2.method5:(IIII)V // method@0006
This instruction is more complex, with a total of 6 bytes, of which
- 6E: invoke-virtual
- 5: the number of arguments
- 3: v3, the fifth argument
- 0600: method@0006
- 0421: v4, v0, v1, v2
The encoding of the argument list is unusual. If there are more than 4 arguments, the fifth one is stored in the low 4 bits of the byte that follows the opcode byte.
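The nibble packing is easier to follow in code. The sketch below assumes the 6-byte layout just described: the opcode, one byte holding the argument count (high 4 bits) and the fifth register (low 4 bits), a little-endian 16-bit method index, then two bytes holding the first four registers, low nibble first. decode_35c is a hypothetical name, borrowed from the "35c" label that the Dalvik documentation uses for this format.

```python
import struct


def decode_35c(code):
    """Split a 6-byte invoke instruction into (opcode, registers, index)."""
    opcode, count_and_g, index, dc, fe = struct.unpack_from("<BBHBB", code, 0)
    count = count_and_g >> 4          # number of arguments
    reg_g = count_and_g & 0x0F        # fifth argument register
    # Within each byte the low nibble comes first in argument order.
    all_registers = [dc & 0x0F, dc >> 4, fe & 0x0F, fe >> 4, reg_g]
    return opcode, all_registers[:count], index
```

Applied to the bytes 6E 20 01 00 10 00, this yields two registers, v0 and v1, and method index 1.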
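So the bytes we want to parse
6E 20 01 00 10 00
can be resolved as
- 6E: invoke-virtual
- 2: the number of arguments
- 01 00: method@0001
- 10 00: v1 and v0; the low 4 bits of the byte 10 hold the first argument (v0) and the high 4 bits hold the second (v1)
Translated as
invoke-virtual {v0, v1}, method@0001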
Find the method with subscript 1 in the method table
void java.io.PrintStream.println(java.lang.String)
Converted to smali code, this is
Ljava/io/PrintStream;->println(Ljava/lang/String;)V
The complete smali code is
invoke-virtual {v0, v1}, Ljava/io/PrintStream;->println(Ljava/lang/String;)V
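Finally, look up 0E
It is return-void: the method returns without a value. The 00 that follows it is an unused byte that fills out the 16-bit instruction unit.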
62 00 00 00 1A 01 00 00 6E 20 01 00 10 00 0E 00
The complete parsed smali code is
sget-object v0, Ljava/lang/System;->out:Ljava/io/PrintStream;
const-string v1, "Hello World!"
invoke-virtual {v0, v1}, Ljava/io/PrintStream;->println(Ljava/lang/String;)V
return-void
This is identical to the body of the main method in the HelloWorld.smali source.
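The whole manual walkthrough can be condensed into a tiny disassembler. This is only a sketch: it handles just the four opcodes that appear in this method, and the three lookup dictionaries are hypothetical stand-ins for the string, field, and method tables of the real file.

```python
import struct

# Hypothetical stand-ins for the tables parsed from the dex file.
STRINGS = {0: "Hello World!"}
FIELDS = {0: "Ljava/lang/System;->out:Ljava/io/PrintStream;"}
METHODS = {1: "Ljava/io/PrintStream;->println(Ljava/lang/String;)V"}


def disassemble(code):
    """Turn the bytecode of the hello world method into smali lines."""
    lines = []
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        if opcode == 0x62:      # sget-object vAA, field@BBBB (4 bytes)
            reg, idx = struct.unpack_from("<BH", code, pos + 1)
            lines.append(f"sget-object v{reg}, {FIELDS[idx]}")
            pos += 4
        elif opcode == 0x1A:    # const-string vAA, string@BBBB (4 bytes)
            reg, idx = struct.unpack_from("<BH", code, pos + 1)
            lines.append(f'const-string v{reg}, "{STRINGS[idx]}"')
            pos += 4
        elif opcode == 0x6E:    # invoke-virtual {...}, method@BBBB (6 bytes)
            ag, idx, dc, fe = struct.unpack_from("<BHBB", code, pos + 1)
            regs = [dc & 0xF, dc >> 4, fe & 0xF, fe >> 4, ag & 0xF][:ag >> 4]
            args = ", ".join(f"v{r}" for r in regs)
            lines.append(f"invoke-virtual {{{args}}}, {METHODS[idx]}")
            pos += 6
        elif opcode == 0x0E:    # return-void (2 bytes, second byte unused)
            lines.append("return-void")
            pos += 2
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02X}")
    return lines


if __name__ == "__main__":
    raw = bytes.fromhex("62 00 00 00 1A 01 00 00 6E 20 01 00 10 00 0E 00")
    print("\n".join(disassemble(raw)))
```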