Article introduction
- This article explains how Java code is compiled into bytecode and executed on the Java virtual machine (JVM). Understanding this process is important because it helps you understand what happens to your program at run time.
- This understanding not only gives you a logical grasp of language features, but also helps you see their trade-offs and side effects when discussing them in detail.
In bytecode, the number before each instruction (or opcode) is the byte position (offset) of that instruction.
- For example, an instruction such as 1: iconst_1 is only one byte long and has no operands, so the next instruction is at position 2.
- As another example, the instruction 1: bipush 5 occupies two bytes: one byte for the opcode bipush and one byte for the operand 5.
- In that case the next instruction is at position 3, because the operand occupies position 2.
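To make the offset arithmetic concrete, here is a minimal sketch that assigns an offset to each instruction from its encoded length. The lengths are passed in by hand, which is an assumption of the sketch; a real disassembler decodes them from each opcode and its operands.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetSketch {
    // Returns the byte offset of each instruction, given its encoded length in bytes.
    static List<Integer> offsets(int startOffset, int... lengths) {
        List<Integer> result = new ArrayList<>();
        int offset = startOffset;
        for (int length : lengths) {
            result.add(offset);   // this instruction starts here
            offset += length;     // the next one starts right after its last byte
        }
        return result;
    }

    public static void main(String[] args) {
        // 1: iconst_1 (1 byte), then bipush 5 (2 bytes), then istore_0 (1 byte)
        System.out.println(offsets(1, 1, 2, 1)); // [1, 2, 4]
    }
}
```

The bipush instruction at offset 2 pushes the following instruction to offset 4, exactly as in the two-byte example above.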
The Java virtual machine is a stack-based architecture. Each time a method is invoked, including the initial main method, a frame is created on the thread's stack, and the method's local variables are stored in that frame.
variable
local variable
The local variable array contains all variables used during method execution, including the reference this (for instance methods), all method parameters and any variables defined in the method body.
- For class methods (static methods), parameters start at slot 0.
- For instance methods, slot 0 holds this, so parameters start at slot 1.
Local variable type
- boolean
- byte
- char
- long
- short
- int
- float
- double
- reference
- returnAddress
All types except long and double occupy one slot in the local variable array. long and double occupy two consecutive slots because they are 64-bit types.
When a new variable is created, its value is first placed on the operand stack and then stored in the corresponding slot of the local variable array.
If the variable is not of a primitive type, its slot stores a reference to the object, and the object itself is stored on the heap.
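The slot rules above (this in slot 0 for instance methods, two slots for long and double) can be captured in a short sketch. The "argN:type" keys are hypothetical labels invented for the illustration; the compiler itself does not name slots this way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SlotSketch {
    // Assigns local variable slots to parameters following the rules described above.
    // Keys such as "arg1:long" are hypothetical labels used only for this illustration.
    static Map<String, Integer> assignSlots(boolean isStatic, String... paramTypes) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        int next = 0;
        if (!isStatic) {
            slots.put("this", next); // instance methods reserve slot 0 for this
            next += 1;
        }
        for (int i = 0; i < paramTypes.length; i++) {
            String type = paramTypes[i];
            slots.put("arg" + i + ":" + type, next);
            boolean wide = type.equals("long") || type.equals("double");
            next += wide ? 2 : 1;    // 64-bit types take two consecutive slots
        }
        return slots;
    }

    public static void main(String[] args) {
        // Instance method m(int a, long b, float c)
        System.out.println(assignSlots(false, "int", "long", "float"));
        // {this=0, arg0:int=1, arg1:long=2, arg2:float=4}
    }
}
```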
for example
int i = 5;
Compiled to bytecode:
0: bipush 5 (two bytes)
2: istore_0
bipush
Pushes a byte to the operand stack as an integer. In this example, 5 is pushed to the operand stack.
istore_0
istore_0 is one of the istore_<n> family of opcodes, which pop an integer from the operand stack and store it in the local variable array.
Here n is the index in the local variable array and can only be 0, 1, 2 or 3. For higher indexes a separate opcode, istore, is used; it takes the index as an operand. In both cases the value is popped from the operand stack and placed in the corresponding slot of the local variable array, as described in more detail later.
The above code is executed in memory as follows:
Each method in the class file also has a local variable table. If this code is part of a method, the local variable table for that method in the class file contains the following entry:
LocalVariableTable:
Start  Length  Slot  Name  Signature
0      1       1     i     I
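The two instructions for int i = 5; can be mimicked with an explicit operand stack and local variable array. This is an illustrative sketch, not how a real JVM is implemented, and it assumes the variable lives in slot 0 as istore_0 implies (that is, a static method with no parameters).

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BipushIstoreSketch {
    // Simulates "0: bipush 5" followed by "2: istore_0" and returns the local variable array.
    static int[] run() {
        int[] locals = new int[1];                  // one slot, index 0
        Deque<Integer> operandStack = new ArrayDeque<>();

        byte operand = 5;                           // bipush's one-byte operand
        operandStack.push((int) operand);           // bipush: sign-extend to int and push

        locals[0] = operandStack.pop();             // istore_0: pop the int into slot 0

        if (!operandStack.isEmpty()) {
            throw new IllegalStateException("operand stack should be empty");
        }
        return locals;
    }

    public static void main(String[] args) {
        System.out.println(run()[0]); // 5
    }
}
```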
Member variable (class variable)
A member variable (field) is stored on the heap as part of a class instance (object). Information about the member variable is recorded in the field_info fields[] array of the class file, which has the following structure:
ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count - 1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
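The fixed-size items at the start of this structure can be read directly. The sketch below reads u4 magic, u2 minor_version and u2 major_version from a byte array; the hand-built header in main is made up for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ClassHeaderSketch {
    // Reads u4 magic, u2 minor_version and u2 major_version.
    // DataInputStream reads big-endian values, matching the class file format.
    static int[] readHeader(byte[] classBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        int magic = in.readInt();            // u4
        int minor = in.readUnsignedShort();  // u2
        int major = in.readUnsignedShort();  // u2
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        return new int[] {minor, major};
    }

    public static void main(String[] args) throws IOException {
        // Hand-built header: magic 0xCAFEBABE, minor 0, major 52 (Java 8)
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        int[] version = readHeader(header);
        System.out.println("major " + version[1] + ", minor " + version[0]); // major 52, minor 0
    }
}
```

To try it on a compiled class, pass the bytes returned by java.nio.file.Files.readAllBytes for a .class file.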
In addition, if this variable is initialized, the bytecode for initialization will be added to the instance constructor.
When the following code is compiled:
public class SimpleClass {
    public int simpleField = 100;
}
javap shows that an entry for this member variable is added to the field_info array:
public int simpleField;
  Signature: I
  flags: ACC_PUBLIC
The bytecode for initialization is added to the constructor as follows:
public SimpleClass();
  Signature: ()V
  flags: ACC_PUBLIC
  Code:
    stack=2, locals=1, args_size=1
     0: aload_0
     1: invokespecial #1 // Method java/lang/Object."<init>":()V
     4: aload_0
     5: bipush 100
     7: putfield #2 // Field simpleField:I
    10: return
aload_0
Pushes an object reference stored in a slot of the local variable array onto the top of the operand stack.
Although the source code above does not define a constructor, the compiler creates a default constructor, and that constructor contains the code that initializes the member variables.
- The first local variable (slot 0) holds this.
- The aload_0 opcode pushes the reference this onto the operand stack.
- aload_0 is one of the aload_<n> family of opcodes, which push an object reference onto the operand stack.
- n is the index of the object reference in the local variable array and can only be 0, 1, 2 or 3.
- The similar opcodes iload_<n>, lload_<n>, fload_<n> and dload_<n> load values rather than object references: i stands for int, l for long, f for float and d for double.
- If the local variable index is greater than 3, the opcodes iload, lload, fload, dload and aload are used instead. They take a single operand that specifies the index of the local variable to load.
invokespecial
The invokespecial instruction is used to call instance initialization methods (constructors), private methods and methods of the superclass of the current class.
The opcodes used to invoke methods include:
- invokedynamic (MethodHandle, lambda)
- invokeinterface (interface method)
- invokespecial (constructor, parent method, private method)
- invokestatic (static method)
- invokevirtual (instance method)
The invokespecial instruction is used in this code to call the constructor of the parent class.
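Because the constructor calls the superclass constructor (invokespecial) before the putfield that initializes the field, a superclass constructor that calls an overridable method can observe the field before it is initialized. The class names below are hypothetical; the sketch only makes that ordering visible.

```java
class InitOrderDemo {
    public static void main(String[] args) {
        InitSub s = new InitSub();
        System.out.println(s.seenBySuperConstructor); // 0: read before putfield ran
        System.out.println(s.simpleField);            // 100: set by putfield afterwards
    }
}

class InitBase {
    int seenBySuperConstructor;

    InitBase() {
        // Runs inside the subclass constructor's invokespecial call,
        // before the subclass has executed its own putfield.
        seenBySuperConstructor = currentValue();
    }

    int currentValue() {
        return -1;
    }
}

class InitSub extends InitBase {
    int simpleField = 100; // compiled to aload_0, bipush 100, putfield in the constructor

    @Override
    int currentValue() {
        return simpleField;
    }
}
```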
bipush
Pushes a byte to the operand stack as an integer. In this example, 100 is pushed to the operand stack.
putfield
putfield takes an operand, #2, which refers to a member variable in the runtime constant pool (cp_info); in this example the member variable is simpleField. putfield assigns the value to the member variable and pops both the value and the object that contains the member variable from the operand stack.
The earlier aload_0 instruction pushes the object that contains this member variable, and the bipush instruction pushes 100, onto the top of the operand stack. putfield then pops both of them. The end result is that the simpleField member variable of this object is updated to 100.
The above code is executed in memory as follows:
(Figure: bytecode execution for class variable creation)
The putfield opcode has a single operand pointing to the second location in the constant pool.
The JVM maintains a constant pool, a runtime data structure similar to a symbol table, but contains more data.
Bytecode needs data. Because this data is often too large to be stored directly in the bytecode, it is placed in the constant pool, and the bytecode holds a reference (an index) into the constant pool. When a class file is created, part of it is the constant pool, as shown below:
Constant pool:
   #1 = Methodref    #4.#16  // java/lang/Object."<init>":()V
   #2 = Fieldref     #3.#17  // SimpleClass.simpleField:I
   #3 = Class        #13     // SimpleClass
   #4 = Class        #19     // java/lang/Object
   #5 = Utf8         simpleField
   #6 = Utf8         I
   #7 = Utf8         <init>
   #8 = Utf8         ()V
   #9 = Utf8         Code
  #10 = Utf8         LineNumberTable
  #11 = Utf8         LocalVariableTable
  #12 = Utf8         this
  #13 = Utf8         SimpleClass
  #14 = Utf8         SourceFile
  #15 = Utf8         SimpleClass.java
  #16 = NameAndType  #7:#8   // "<init>":()V
  #17 = NameAndType  #5:#6   // simpleField:I
  #18 = Utf8         LSimpleClass;
  #19 = Utf8         java/lang/Object
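The entries refer to each other by index: a Fieldref points to a Class and a NameAndType, which in turn point to Utf8 strings. The sketch below models a subset of the pool above as a map (a simplification; the real pool is a binary table with a tag byte per entry) and resolves an index to the text that javap prints in its comments, except that javap additionally quotes "<init>".

```java
import java.util.HashMap;
import java.util.Map;

class ConstantPoolSketch {
    // Hypothetical in-memory model of part of the constant pool above:
    // each entry is {tag, operand...}; Utf8 entries hold text, the others hold indexes.
    static final Map<Integer, String[]> POOL = new HashMap<>();

    static {
        POOL.put(1, new String[] {"Methodref", "4", "16"});
        POOL.put(2, new String[] {"Fieldref", "3", "17"});
        POOL.put(3, new String[] {"Class", "13"});
        POOL.put(4, new String[] {"Class", "19"});
        POOL.put(5, new String[] {"Utf8", "simpleField"});
        POOL.put(6, new String[] {"Utf8", "I"});
        POOL.put(7, new String[] {"Utf8", "<init>"});
        POOL.put(8, new String[] {"Utf8", "()V"});
        POOL.put(13, new String[] {"Utf8", "SimpleClass"});
        POOL.put(16, new String[] {"NameAndType", "7", "8"});
        POOL.put(17, new String[] {"NameAndType", "5", "6"});
        POOL.put(19, new String[] {"Utf8", "java/lang/Object"});
    }

    // Follows index references until only Utf8 text remains.
    static String resolve(int index) {
        String[] entry = POOL.get(index);
        if (entry == null) {
            throw new IllegalArgumentException("no entry #" + index);
        }
        switch (entry[0]) {
            case "Utf8":
                return entry[1];
            case "Class":
                return resolve(Integer.parseInt(entry[1]));
            case "NameAndType":
                return resolve(Integer.parseInt(entry[1])) + ":" + resolve(Integer.parseInt(entry[2]));
            case "Fieldref":
            case "Methodref":
                return resolve(Integer.parseInt(entry[1])) + "." + resolve(Integer.parseInt(entry[2]));
            default:
                throw new IllegalArgumentException("unsupported tag " + entry[0]);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(2)); // SimpleClass.simpleField:I
        System.out.println(resolve(1)); // java/lang/Object.<init>:()V
    }
}
```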
Constant (class constant)
A variable modified by final is called a constant, and it is marked with ACC_FINAL in the class file.
For example:
public class SimpleClass {
    public final int simpleField = 100;
    public int simpleField2 = 100;
}
An ACC_FINAL flag is added to the field description:
public final int simpleField = 100;
  Signature: I
  flags: ACC_PUBLIC, ACC_FINAL
  ConstantValue: int 100
However, the initialization operation in the constructor is not affected:
4: aload_0
5: bipush 100
7: putfield #2 // Field simpleField2:I
Static variable
Variables modified by static are called static class variables and are marked with ACC_STATIC in the class file, as follows:
public static int simpleField;
  Signature: I
  flags: ACC_PUBLIC, ACC_STATIC
There is no bytecode in the instance constructor that initializes static variables. Static variables are initialized in the class constructor, which uses the putstatic opcode instead of putfield:
static {};
  Signature: ()V
  flags: ACC_STATIC
  Code:
    stack=1, locals=0, args_size=0
    0: bipush 100
    2: putstatic #2 // Field simpleField:I
    5: return
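The ACC_PUBLIC, ACC_STATIC and ACC_FINAL bits that javap prints have the same values (0x0001, 0x0008, 0x0010) as the constants in java.lang.reflect.Modifier, so they can also be inspected at run time. A minimal sketch, using a hypothetical Fields class that mirrors the examples above:

```java
import java.lang.reflect.Modifier;

class FieldFlagsDemo {
    // Hypothetical class mirroring the field examples above.
    static class Fields {
        public int simpleField2 = 100;
        public final int simpleField = 100;
        public static int staticField = 100;
    }

    // Reads a field's modifier bits and names the flags discussed above.
    static String flags(String fieldName) throws NoSuchFieldException {
        int m = Fields.class.getDeclaredField(fieldName).getModifiers();
        StringBuilder sb = new StringBuilder();
        if (Modifier.isPublic(m)) sb.append("ACC_PUBLIC ");  // 0x0001
        if (Modifier.isStatic(m)) sb.append("ACC_STATIC ");  // 0x0008
        if (Modifier.isFinal(m)) sb.append("ACC_FINAL ");    // 0x0010
        return sb.toString().trim();
    }

    public static void main(String[] args) throws NoSuchFieldException {
        System.out.println(flags("simpleField2")); // ACC_PUBLIC
        System.out.println(flags("simpleField"));  // ACC_PUBLIC ACC_FINAL
        System.out.println(flags("staticField"));  // ACC_PUBLIC ACC_STATIC
    }
}
```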
Conditional statement
Conditional flow control, such as if-else and switch statements, works at the bytecode level by using an instruction to compare two values and then branch to another bytecode.
- for and while loops are implemented in a similar way, except that they usually contain a goto instruction to make the code loop.
- do-while loops do not require a goto instruction, because their conditional branch is at the end of the bytecode. For more details about loops, see the loop section.
Some opcodes can compare two integers or two references and branch in a single instruction. Comparisons between other types, such as double, long or float, take two steps.
First, the comparison pushes 1, 0 or -1 onto the top of the operand stack. Next, a branch is taken depending on whether that value is greater than, less than or equal to 0.
Let's start with the if-else statement as an example; the other branch instructions are covered in the explanation that follows.
if-else
The following code shows a simple if else statement to compare the size of two integers.
public int greaterThen(int intOne, int intTwo) {
    if (intOne > intTwo) {
        return 0;
    } else {
        return 1;
    }
}
This method is compiled into the following bytecode:
0: iload_1
1: iload_2
2: if_icmple 7
5: iconst_0
6: ireturn
7: iconst_1
8: ireturn
- First, iload_1 and iload_2 push the two parameters onto the operand stack.
- Then, if_icmple compares the two values at the top of the operand stack.
- If intOne is less than or equal to intTwo, execution branches to the bytecode at position 7.
Note that the test in the bytecode is the opposite of the test in the Java if condition. In bytecode, if the test succeeds, the else block is executed; in Java code, if the test in the if condition succeeds, the if block is executed.
In other words, if_icmple tests whether the if condition is false and, if so, skips the if block. The body of the if block is the bytecode at positions 5 and 6, and the body of the else block is the bytecode at positions 7 and 8.
(Figure: if-else bytecode execution)
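A tiny interpreter makes the inverted test visible: it executes the greaterThen(int, int) bytecode instruction by instruction, using javap's byte offsets as the program counter. It is an illustrative sketch that handles only the opcodes in this method.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IfElseInterpreter {
    // Executes the greaterThen(int, int) bytecode shown above.
    static int run(int intOne, int intTwo) {
        int[] locals = {0, intOne, intTwo};   // slot 0 stands in for this
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            switch (pc) {
                case 0: stack.push(locals[1]); pc = 1; break;  // iload_1
                case 1: stack.push(locals[2]); pc = 2; break;  // iload_2
                case 2: {                                      // if_icmple 7
                    int value2 = stack.pop();                  // top of stack is the second operand
                    int value1 = stack.pop();
                    // Branch when value1 <= value2; otherwise fall through to offset 5,
                    // because if_icmple occupies 3 bytes (opcode + 2-byte branch offset).
                    pc = (value1 <= value2) ? 7 : 5;
                    break;
                }
                case 5: stack.push(0); pc = 6; break;          // iconst_0
                case 6: return stack.pop();                    // ireturn
                case 7: stack.push(1); pc = 8; break;          // iconst_1
                case 8: return stack.pop();                    // ireturn
                default: throw new IllegalStateException("no instruction at " + pc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(5, 3)); // 0: the if block
        System.out.println(run(3, 5)); // 1: the else block
    }
}
```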
The following code example shows a slightly more complex example, which requires a two-step comparison:
public int greaterThen(float floatOne, float floatTwo) {
    int result;
    if (floatOne > floatTwo) {
        result = 1;
    } else {
        result = 2;
    }
    return result;
}
This method generates the following bytecode:
0: fload_1
1: fload_2
2: fcmpl
3: ifle 11
6: iconst_1
7: istore_3
8: goto 13
11: iconst_2
12: istore_3
13: iload_3
14: ireturn
In this example, fload_1 and fload_2 first push the two parameters onto the top of the operand stack. This example differs from the previous one in that it needs a two-step comparison: fcmpl first compares floatOne and floatTwo and then pushes the result onto the top of the operand stack, as follows:
floatOne > floatTwo -> 1
floatOne = floatTwo -> 0
floatOne < floatTwo -> -1
floatOne or floatTwo = NaN -> -1
Next, if the result of fcmpl is <= 0, ifle jumps to the bytecode at position 11.
- The difference between this example and the previous example is that there is only a single return statement at the end of this method, and there is a goto instruction at the end of the if statement block to prevent the else statement block from being executed.
- The goto branches to iload_3 at position 13, which pushes the result stored in slot 3 of the local variable table onto the top of the operand stack so that it can be returned by the ireturn instruction.
(Figure: if-else bytecode with the extra goto)
Just as there are opcodes for comparing numbers, there are opcodes for comparing references for equality (==), for comparing with null (== null and != null), and for testing the type of an object (instanceof).
- if_icmp<cond>: this group of opcodes compares the two integers at the top of the operand stack and jumps to a new bytecode. The possible conditions are:
eq: equal to
ne: not equal to
lt: less than
le: less than or equal to
gt: greater than
ge: greater than or equal to
- if_acmpeq / if_acmpne: these two opcodes test whether two references are equal (eq) or not equal (ne) and then jump to a new bytecode specified by the operand.
- ifnull / ifnonnull: these opcodes test whether a reference is null or not null and then jump to a new bytecode specified by the operand.
- lcmp: compares the two long values at the top of the operand stack and pushes a value onto the operand stack, as follows:
if value1 > value2 -> push 1
if value1 = value2 -> push 0
if value1 < value2 -> push -1
- fcmpl / fcmpg / dcmpl / dcmpg: this group of opcodes compares two float or double values and pushes a value onto the operand stack, as follows:
if value1 > value2 -> push 1
if value1 = value2 -> push 0
if value1 < value2 -> push -1
The difference between the l and g variants is how they handle NaN.
- If either value is NaN (Not a Number), fcmpg and dcmpg push the int value 1 onto the operand stack, while fcmpl and dcmpl push -1. This ensures that the test does not succeed when one of the two values is NaN.
- For example, for x > y (where x and y are both double) with one of x and y being NaN, the dcmpl instruction pushes -1 onto the operand stack.
- The next opcode is then an ifle instruction, which branches if the value at the top of the stack is less than or equal to 0. As a result, if either x or y is NaN, ifle skips the if block, so the code in the if block is not executed.
- instanceof: this opcode pushes the int value 1 onto the operand stack if the object at the top of the operand stack is an instance of the given class. Its operand specifies the class as an index into the constant pool. If the object is null or is not an instance of the specified class, the int value 0 is pushed onto the operand stack.
- if<cond> (ifeq, ifne, iflt, ifle, ifgt, ifge): these opcodes compare the value at the top of the operand stack with 0 and, if the comparison succeeds, jump to the bytecode specified by the operand.
These instructions are used for conditional logic that cannot be completed with a single instruction, for example, testing the result of a method call.
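The result conventions above can be restated in Java. This sketch also shows how pairing fcmpl with ifle (for x > y) and fcmpg with ifge (for x < y) keeps a NaN operand from passing either test. The method names are hypothetical, and the x < y pairing reflects typical javac output rather than anything shown earlier in this article.

```java
class CompareSketch {
    // lcmp: pushes 1, 0 or -1 for two long values.
    static int lcmp(long value1, long value2) {
        if (value1 > value2) return 1;
        if (value1 == value2) return 0;
        return -1;
    }

    // fcmpl: like lcmp for floats, but pushes -1 if either value is NaN.
    static int fcmpl(float value1, float value2) {
        if (Float.isNaN(value1) || Float.isNaN(value2)) return -1;
        return value1 > value2 ? 1 : (value1 == value2 ? 0 : -1);
    }

    // fcmpg: like fcmpl, but pushes 1 if either value is NaN.
    static int fcmpg(float value1, float value2) {
        if (Float.isNaN(value1) || Float.isNaN(value2)) return 1;
        return value1 > value2 ? 1 : (value1 == value2 ? 0 : -1);
    }

    // x > y: fcmpl then ifle, which skips the if block when the result is <= 0.
    static boolean greaterThan(float x, float y) {
        return !(fcmpl(x, y) <= 0);
    }

    // x < y: fcmpg then ifge, which skips the if block when the result is >= 0.
    static boolean lessThan(float x, float y) {
        return !(fcmpg(x, y) >= 0);
    }

    public static void main(String[] args) {
        System.out.println(greaterThan(Float.NaN, 1f)); // false
        System.out.println(lessThan(Float.NaN, 1f));    // false
        System.out.println(lcmp(7L, 3L));               // 1
    }
}
```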
switch
The type of a Java switch expression can be char, byte, short, int, Character, Byte, Short, Integer, String or an enum type.
To support switch statements, the Java virtual machine uses two special instructions, tableswitch and lookupswitch, both of which operate only on integer values. Using only integer values is not a problem, because char, byte, short and enum types can all be converted internally to int.
Support for strings, added in Java 7, is also implemented using integers. tableswitch is faster, but usually takes up more memory.
tableswitch works by listing every possible case value between the minimum and maximum case values. The minimum and maximum values are also provided, so if the switch value is outside that range, the JVM jumps immediately to the default block. Values in the range that have no case statement in the Java code are listed too, but they point to the default block, so that every value between the minimum and the maximum is listed.
For example, consider the following switch statement:
public int simpleSwitch(int intOne) {
    switch (intOne) {
        case 0:
            return 3;
        case 1:
            return 2;
        case 4:
            return 1;
        default:
            return -1;
    }
}
This code generates the following bytecode:
0: iload_1
1: tableswitch {
     default: 42
     min: 0
     max: 4
     0: 36
     1: 38
     2: 42
     3: 42
     4: 40
   }
36: iconst_3
37: ireturn
38: iconst_2
39: ireturn
40: iconst_1
41: ireturn
42: iconst_m1
43: ireturn
The tableswitch instruction has entries for the values 0, 1 and 4, which match the case statements in the Java code, and each entry points to the bytecode of the corresponding code block. It also has entries for the values 2 and 3, which have no case statement in the Java code; both point to the default block. When the instruction is executed, the value at the top of the operand stack is checked against the minimum and maximum values. If it is outside that range, execution jumps to the default branch, which is at position 42 in the example above. To make sure the default branch can always be found, its offset is stored in the first bytes of the tableswitch instruction (after any required alignment padding). If the value is within the range, it is used as an index into the table inside tableswitch to find the bytecode to jump to.
For example, if the value is 1, execution jumps to the bytecode at position 38. The following figure shows how this bytecode is executed:
(Figure: tableswitch bytecode execution)
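The dispatch step can be sketched as a range check followed by one array lookup, using the min, max, default and jump offsets from the bytecode above. The array is a model of the instruction's layout for illustration, not JVM source code.

```java
class TableSwitchSketch {
    // Jump table from the tableswitch above: min = 0, max = 4, default target 42.
    static final int MIN = 0;
    static final int MAX = 4;
    static final int DEFAULT_TARGET = 42;
    static final int[] TARGETS = {36, 38, 42, 42, 40}; // indexed by value - MIN

    // Returns the bytecode offset that tableswitch would jump to for a value.
    static int target(int value) {
        if (value < MIN || value > MAX) {
            return DEFAULT_TARGET;       // outside the range: go straight to default
        }
        return TARGETS[value - MIN];     // inside the range: one index, no search
    }

    public static void main(String[] args) {
        System.out.println(target(1)); // 38
        System.out.println(target(3)); // 42 (no case 3, so the entry points to default)
        System.out.println(target(9)); // 42 (out of range)
    }
}
```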
If the case values are far apart (that is, sparse), this approach is not desirable because the table would take up too much memory. When the cases of a switch are sparse, lookupswitch is used instead of tableswitch. lookupswitch lists the branch target for each case value, but does not list all possible values in between.
- When lookupswitch is executed, the value at the top of the operand stack is compared with the values listed in lookupswitch to find the correct branch address. With lookupswitch the JVM has to search for a match in this list, which takes time; with tableswitch the JVM can locate the correct entry directly.
- When a switch statement is compiled, the compiler must make a trade-off between memory and performance to decide which instruction to use. For the following code, the compiler uses lookupswitch:
public int simpleSwitch(int intOne) {
    switch (intOne) {
        case 10:
            return 1;
        case 20:
            return 2;
        case 30:
            return 3;
        default:
            return -1;
    }
}
The bytecode generated by this code is as follows:
0: iload_1
1: lookupswitch {
     default: 42
     count: 3
     10: 36
     20: 38
     30: 40
   }
36: iconst_1
37: ireturn
38: iconst_2
39: ireturn
40: iconst_3
41: ireturn
42: iconst_m1
43: ireturn
To allow a more efficient search than a linear search, lookupswitch records the number of match values and stores the values sorted. The following figure shows how the code above is executed:
(Figure: lookupswitch bytecode execution)
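Because the match values are sorted, a binary search is one way to find the target. How a particular JVM searches is an implementation detail, so the sketch below is an assumption-labeled illustration using the keys and offsets from the lookupswitch above.

```java
import java.util.Arrays;

class LookupSwitchSketch {
    // Sorted match values and their jump offsets from the lookupswitch above.
    static final int[] KEYS = {10, 20, 30};
    static final int[] TARGETS = {36, 38, 40};
    static final int DEFAULT_TARGET = 42;

    // Binary search over the sorted keys; a miss goes to the default target.
    static int target(int value) {
        int i = Arrays.binarySearch(KEYS, value); // negative when not found
        return i >= 0 ? TARGETS[i] : DEFAULT_TARGET;
    }

    public static void main(String[] args) {
        System.out.println(target(20)); // 38
        System.out.println(target(15)); // 42
    }
}
```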
String switch
In Java 7, switch statements gained support for the String type. Although the existing switch opcodes only support int values, no new opcodes were added. Instead, a String switch is compiled in two parts. First, the hash code of the value at the top of the operand stack is compared with the hash code of each case value. This step can be done with either lookupswitch or tableswitch, depending on how sparse the hash values are.
Each hash match then leads to bytecode that calls String.equals() to make an exact match and records the index of the matching case in a local variable. A second tableswitch instruction uses that index to jump to the code of the correct case statement.
public int simpleSwitch(String stringOne) {
    switch (stringOne) {
        case "a":
            return 0;
        case "b":
            return 2;
        case "c":
            return 3;
        default:
            return 4;
    }
}
This string switch statement will generate the following bytecode:
0: aload_1
1: astore_2
2: iconst_m1
3: istore_3
4: aload_2
5: invokevirtual #2 // Method java/lang/String.hashCode:()I
8: tableswitch {
     default: 75
     min: 97
     max: 99
     97: 36
     98: 50
     99: 64
   }
36: aload_2
37: ldc #3 // String a
39: invokevirtual #4 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
42: ifeq 75
45: iconst_0
46: istore_3
47: goto 75
50: aload_2
51: ldc #5 // String b
53: invokevirtual #4 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
56: ifeq 75
59: iconst_1
60: istore_3
61: goto 75
64: aload_2
65: ldc #6 // String c
67: invokevirtual #4 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
70: ifeq 75
73: iconst_2
74: istore_3
75: iload_3
76: tableswitch {
     default: 110
     min: 0
     max: 2
     0: 104
     1: 106
     2: 108
   }
104: iconst_0
105: ireturn
106: iconst_2
107: ireturn
108: iconst_3
109: ireturn
110: iconst_4
111: ireturn
The class that contains this bytecode also contains the following constant pool entries, which the bytecode references. For more about constant pools, see the runtime constant pool section of the article on JVM internals.
Constant pool:
   #2 = Methodref    #25.#26  // java/lang/String.hashCode:()I
   #3 = String       #27      // a
   #4 = Methodref    #25.#28  // java/lang/String.equals:(Ljava/lang/Object;)Z
   #5 = String       #29      // b
   #6 = String       #30      // c
  #25 = Class        #33      // java/lang/String
  #26 = NameAndType  #34:#35  // hashCode:()I
  #27 = Utf8         a
  #28 = NameAndType  #36:#37  // equals:(Ljava/lang/Object;)Z
  #29 = Utf8         b
  #30 = Utf8         c
  #33 = Utf8         java/lang/String
  #34 = Utf8         hashCode
  #35 = Utf8         ()I
  #36 = Utf8         equals
  #37 = Utf8         (Ljava/lang/Object;)Z
Note that executing this switch requires two tableswitch instructions as well as several invokevirtual instructions that call String.equals(). For more details about invokevirtual, see the method invocation section of the next article. The following figure shows how this bytecode is executed when the input is "b":
If different cases have the same hash value, for example the strings "FB" and "Ea", which both have the hash code 2236, the equals() branching is adjusted slightly, as follows. Note that when the equals() test fails at position 34 (ifeq 42), execution goes to another String.equals() call instead of going straight to the second switch, as it did in the previous example, which had no hash collisions.
public int simpleSwitch(String stringOne) {
    switch (stringOne) {
        case "FB":
            return 0;
        case "Ea":
            return 2;
        default:
            return 4;
    }
}
The bytecode generated by the above code is as follows:
0: aload_1
1: astore_2
2: iconst_m1
3: istore_3
4: aload_2
5: invokevirtual #2 // Method java/lang/String.hashCode:()I
8: lookupswitch {
     default: 53
     count: 1
     2236: 28
   }
28: aload_2
29: ldc #3 // String Ea
31: invokevirtual #4 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
34: ifeq 42
37: iconst_1
38: istore_3
39: goto 53
42: aload_2
43: ldc #5 // String FB
45: invokevirtual #4 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
48: ifeq 53
51: iconst_0
52: istore_3
53: iload_3
54: lookupswitch {
     default: 84
     count: 2
     0: 80
     1: 82
   }
80: iconst_0
81: ireturn
82: iconst_2
83: ireturn
84: iconst_4
85: ireturn
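The two-step translation can be written back as ordinary Java. The sketch below is a hand-written equivalent of the bytecode above, not actual javac output: it switches on hashCode(), confirms the match with equals(), records a case index, and then switches on that index.

```java
class StringSwitchSketch {
    // Hand-written equivalent of the two-step bytecode for the "FB"/"Ea" switch.
    static int simpleSwitch(String stringOne) {
        String s = stringOne;  // astore_2: copy of the switch value
        int index = -1;        // iconst_m1, istore_3: no case matched yet

        // Step 1: switch on the hash code, then confirm with equals().
        switch (s.hashCode()) {
            case 2236:         // "Ea".hashCode() == "FB".hashCode() == 2236
                if (s.equals("Ea")) {
                    index = 1;
                } else if (s.equals("FB")) {
                    index = 0;
                }
                break;
            default:
                break;
        }

        // Step 2: switch on the case index chosen above.
        switch (index) {
            case 0: return 0;  // case "FB"
            case 1: return 2;  // case "Ea"
            default: return 4;
        }
    }

    public static void main(String[] args) {
        System.out.println("FB".hashCode() + " " + "Ea".hashCode()); // 2236 2236
        System.out.println(simpleSwitch("Ea")); // 2
        System.out.println(simpleSwitch("FB")); // 0
        System.out.println(simpleSwitch("Fb")); // 4
    }
}
```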
loop
- Conditional flow control, such as if-else and switch statements, works by using an instruction to compare two values and then jump to the corresponding bytecode. For more details about conditional statements, see the conditional statement section.
- Loops, including for and while loops, are implemented in a similar way, except that they usually use a goto instruction to make the bytecode loop. do-while loops do not require a goto instruction, because their conditional branch is at the end of the bytecode.
- Some bytecodes can compare two integers or two references and branch in a single instruction. Comparisons between other types, such as double, long or float, take two steps. First, a comparison is performed and 1, 0 or -1 is pushed onto the top of the operand stack. Next, a branch is taken depending on whether the value at the top of the operand stack is greater than, less than or equal to 0. For more details about branch instructions, see above.
while Loop
A while loop consists of a conditional branch instruction, such as if_icmpge or if_icmplt (described above), and a goto instruction. If the loop condition is not met, the conditional branch instruction jumps to the instruction immediately after the loop, which terminates the loop. The last instruction in the loop is a goto, which jumps back to the beginning of the loop code, so the loop repeats until the conditional branch is taken, as shown below:
public void whileLoop() {
    int i = 0;
    while (i < 2) {
        i++;
    }
}
Compiled into:
0: iconst_0
1: istore_1
2: iload_1
3: iconst_2
4: if_icmpge 13
7: iinc 1, 1
10: goto 2
13: return
The if_icmpge instruction tests whether the local variable in slot 1 is greater than or equal to 2. If it is, the instruction jumps to the bytecode at position 13, ending the loop. The goto instruction makes the bytecode loop until the if_icmpge condition holds at some point. Once the loop ends, execution branches immediately to the return instruction. The iinc instruction is one of the few instructions that update a local variable directly, without loading and storing values on the operand stack. In this example, iinc adds 1 to the value of local variable 1.
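Java source has no goto, but the shape of this bytecode (test at the top, unconditional jump back at the bottom) can be mirrored with an endless for (;;) loop and a break. In this sketch the bound is a parameter named limit and the method returns i so the result can be checked; both are additions for illustration, since the original method hard-codes 2 and returns void.

```java
class WhileLoopSketch {
    // Mirrors the bytecode of whileLoop(), with the constant 2 replaced by limit.
    static int whileLoopShape(int limit) {
        int i = 0;              // 0: iconst_0, 1: istore_1
        for (;;) {              // 2: loop head, the target of goto 2
            if (i >= limit) {   // 2: iload_1, 3: iconst_2 (here: limit), 4: if_icmpge 13
                break;          // jump to 13: return
            }
            i++;                // 7: iinc 1, 1
        }                       // 10: goto 2 (reaching the end of the body loops back)
        return i;
    }

    // The same loop written as an ordinary while loop, for comparison.
    static int whileLoop(int limit) {
        int i = 0;
        while (i < limit) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(whileLoopShape(2) + " " + whileLoop(2)); // 2 2
        System.out.println(whileLoopShape(0) + " " + whileLoop(0)); // 0 0
    }
}
```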
for loop
The for loop and the while loop use exactly the same pattern at the bytecode level. This is not surprising, because any while loop can be rewritten as an equivalent for loop. The simple while loop above can be rewritten as the following for loop, which produces exactly the same bytecode:
public void forLoop() {
    for (int i = 0; i < 2; i++) {
    }
}
do-while Loop
The do-while loop is also very similar to the for and while loops, except that it does not need a goto instruction: the conditional branch is the last instruction, and it jumps back to the beginning of the loop.
public void doWhileLoop() {
    int i = 0;
    do {
        i++;
    } while (i < 2);
}
The generated bytecode is as follows:
0: iconst_0
1: istore_1
2: iinc 1, 1
5: iload_1
6: iconst_2
7: if_icmplt 2
10: return
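The same mirroring technique shows why a do-while needs no goto: the body comes first and the only branch, at the bottom, jumps back while the condition holds. As in the while sketch, the parameter limit and the returned value are additions for illustration; note that the body runs once even when the condition is false from the start.

```java
class DoWhileSketch {
    // Mirrors doWhileLoop(): body first, then a single backward conditional branch.
    static int doWhileShape(int limit) {
        int i = 0;              // 0: iconst_0, 1: istore_1
        for (;;) {
            i++;                // 2: iinc 1, 1
            if (!(i < limit)) { // 5: iload_1, 6: iconst_2 (here: limit), 7: if_icmplt 2
                break;          // not taken: fall through to 10: return
            }
        }
        return i;
    }

    // The same loop written as an ordinary do-while, for comparison.
    static int doWhile(int limit) {
        int i = 0;
        do {
            i++;
        } while (i < limit);
        return i;
    }

    public static void main(String[] args) {
        System.out.println(doWhileShape(2) + " " + doWhile(2)); // 2 2
        System.out.println(doWhileShape(0) + " " + doWhile(0)); // 1 1 (the body runs once)
    }
}
```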