How to analyze Java problems from the perspective of bytecode

Preface

One day, while browsing Zhihu, I came across this question: why is the final value of i in the following code 8?

public static void main(String[] args) {
    int i = 1;
    i += i += ++i + 2.6 + i;
}

These are two very simple lines of code. If you ran into this question, how would you explain it clearly? Would you break the expression down by Java operator precedence and evaluate it step by step, or use some other method?

After thinking for a while, I decided to see how these two lines of code work through bytecode instructions.

Copy the two lines of code into the main method of a class in Test.java, then run the following commands to compile the Java source and disassemble the resulting class file into bytecode:

javac Test.java
javap -c Test.class

If you have not worked with bytecode before, you can look up a bytecode instruction reference, or turn to "Appendix B: Bytecode Instruction List" in the book "Understanding the Java Virtual Machine".

The bytecode output for main, with each instruction annotated, is as follows:

public static void main(java.lang.String[]);
    Code:
       0: iconst_1  // Push the int constant 1. Stack (top first): 1
       1: istore_1  // Pop the 1 and store it in slot 1 of the local variable table, i.e. i = 1. Stack: empty
       2: iload_1   // Push i from the slot (left operand of the outer +=). Stack: 1
       3: iload_1   // Push i from the slot again (left operand of the inner +=). Stack: 1, 1
       4: i2d       // Convert the int at the top of the stack to double. Stack: 1.0, 1
       5: iinc      // ++i increments the slot in place. The value of i in the slot is now 2. Remember, it is 2. Stack unchanged
       8: iload_1   // Push i from the slot. Stack: 2, 1.0, 1
       9: i2d       // Convert the int at the top of the stack to double. Stack: 2.0, 1.0, 1
      10: ldc2_w    // Push the double constant 2.6. Stack: 2.6, 2.0, 1.0, 1
      13: dadd      // Pop the two doubles at the top and push their sum (2.0 + 2.6). Stack: 4.6, 1.0, 1
      14: iload_1   // Push i from the slot. Stack: 2, 4.6, 1.0, 1
      15: i2d       // Convert the int at the top of the stack to double. Stack: 2.0, 4.6, 1.0, 1
      16: dadd      // Pop the two doubles at the top and push their sum (4.6 + 2.0). Stack: 6.6, 1.0, 1
      17: dadd      // Pop the two doubles at the top and push their sum (1.0 + 6.6). Stack: 7.6, 1
      18: d2i       // Convert the double at the top to int; 7.6 is truncated to 7. Stack: 7, 1
      19: dup       // Duplicate the value at the top of the stack. Stack: 7, 7, 1
      20: istore_1  // Pop 7 and store it in the slot: this is the inner i = (int)(i + (++i + 2.6 + i)), so i is 7. Stack: 7, 1
      21: iadd      // Pop the two ints at the top and push their sum (1 + 7). Stack: 8
      22: istore_1  // Pop 8 and store it in the slot: this is the outer i = i + (i = ...), so i is 8. Stack: empty
      23: return    // Return from main. The method is void, so no value is returned; i ended up as 8

The annotated bytecode above is my answer: it breaks the evaluation down one step at a time.
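To double-check the trace, the same evaluation can be replayed in plain Java. The following is a minimal sketch, not part of the original question: the class name CompoundAssignDemo and the temporary variables outerLeft, innerLeft and sum are made up for illustration and stand in for the values that sit on the operand stack.

```java
public class CompoundAssignDemo {

    // The statement exactly as it appears in the question.
    static int original() {
        int i = 1;
        i += i += ++i + 2.6 + i;
        return i;
    }

    // The same evaluation written out in the order the bytecode performs it.
    static int expanded() {
        int i = 1;
        int outerLeft = i;            // offset 2: iload_1, saved left operand of the outer +=
        double innerLeft = i;         // offsets 3-4: iload_1 + i2d, left operand of the inner +=
        i = i + 1;                    // offset 5: iinc, i is now 2
        double sum = i + 2.6 + i;     // offsets 8-16: 2.0 + 2.6 + 2.0, roughly 6.6
        i = (int) (innerLeft + sum);  // offsets 17-20: roughly 7.6, truncated to 7
        i = outerLeft + i;            // offsets 21-22: 1 + 7
        return i;
    }

    public static void main(String[] args) {
        System.out.println(original()); // prints 8
        System.out.println(expanded()); // prints 8
    }
}
```

The key point is that both left operands are read before ++i runs, so they keep the old value 1 even though the slot already holds 2.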

Stack frame

What are the local variable table and the slots mentioned above?

To answer that, I have to bring up the stack frame. When a method is executed, the virtual machine creates a stack frame for it and pushes it onto the top of the thread-private virtual machine stack. The stack frame is therefore the data structure behind method invocation and execution, and it includes the local variable table, the operand stack, dynamic linking information, and so on.

The lifetime of a method call, from invocation to completion, corresponds to a stack frame being pushed onto and popped off the virtual machine stack.

Local variable table

The local variable table is the space that stores a method's parameters and local variables, and it is made up of slots. Its size is fixed when the code is compiled into a class file. The 64-bit long and double types occupy two slots each; every other data type occupies one slot.
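The slot rule above can be sketched as a small calculation. This is only an illustration: LocalSlotSketch, slotsFor and localSlots are hypothetical names, and the sketch assumes that no slot is reused between variables and that an instance method reserves slot 0 for this (a detail the text above does not cover).

```java
public class LocalSlotSketch {

    // Slots taken by one value of the given type: 2 for long and double, 1 otherwise.
    static int slotsFor(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    // Slots needed for the listed parameters and locals.
    // Assumptions: no slot reuse; instance methods keep `this` in slot 0.
    static int localSlots(boolean isStatic, Class<?>... types) {
        int slots = isStatic ? 0 : 1;
        for (Class<?> type : types) {
            slots += slotsFor(type);
        }
        return slots;
    }

    public static void main(String[] args) {
        // static main(String[] args) with one int local: args in slot 0, i in slot 1
        System.out.println(localSlots(true, String[].class, int.class));           // prints 2
        // a long and a double take two slots each, the int takes one
        System.out.println(localSlots(true, long.class, double.class, int.class)); // prints 5
    }
}
```

This also explains why the first example uses istore_1 and iload_1: in a static main, slot 0 holds args, so i lands in slot 1.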

Operand stack

While a method executes, bytecode instructions write data to and read data from the operand stack, that is, they push values onto it and pop values off it. All operations on data go through the operand stack. For example, iadd pops the two int values at the top of the stack, adds them, and pushes the result.
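The push and pop behavior described above can be imitated with an ordinary stack. The following is a toy sketch, not how the JVM is implemented: OperandStackSketch and its methods are hypothetical names, and only int values are modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperandStackSketch {

    private final Deque<Integer> stack = new ArrayDeque<>();

    // Models instructions that push an int, such as iconst_1, bipush or iload_1.
    void push(int value) {
        stack.push(value);
    }

    // Models iadd: pop two ints, push their sum.
    void iadd() {
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    // Models the pop performed by a store instruction such as istore_1.
    int pop() {
        return stack.pop();
    }

    int depth() {
        return stack.size();
    }

    // Replays the last steps of the first example: the stack holds 7 on top of 1.
    static int onePlusSeven() {
        OperandStackSketch frame = new OperandStackSketch();
        frame.push(1);
        frame.push(7);
        frame.iadd();
        return frame.pop();
    }

    public static void main(String[] args) {
        System.out.println(onePlusSeven()); // prints 8
    }
}
```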

Dynamic linking

Each stack frame contains a reference into the runtime constant pool for the method that the frame belongs to. The frame holds this reference to support dynamic linking during method calls. Resolving symbolic references into direct references at run time is what is called dynamic linking.

Method return address

A method exits in one of two ways. It exits normally when execution reaches a return bytecode instruction, which, depending on the method, may or may not pass a return value back to the caller. It exits abnormally when an exception is thrown and is not caught by a try block inside the method.

Either way, execution must resume at the position from which the method was called. Some information is saved in the stack frame for this purpose, so that the execution state of the calling method can be restored.
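The two kinds of exit can be observed from the caller's side. This is a small sketch with made-up names (MethodExitDemo, normal, abrupt, call); it only shows that the caller resumes after a normal return and that an uncaught exception leaves the method without producing a value.

```java
public class MethodExitDemo {

    // Normal exit: a return instruction hands a value back to the caller.
    static int normal() {
        return 42;
    }

    // Abnormal exit: the exception is not caught inside this method.
    static int abrupt() {
        throw new IllegalStateException("boom");
    }

    // The caller continues after each call, whichever way the callee exited.
    static String call() {
        StringBuilder log = new StringBuilder();
        log.append(normal());
        try {
            log.append(abrupt()); // never appends: abrupt() produces no value
        } catch (IllegalStateException e) {
            log.append(" caught ").append(e.getMessage());
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(call()); // prints 42 caught boom
    }
}
```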

Extended application

A question has been popular online recently: for Integer objects, why does 100 == 100 return true while 200 == 200 returns false? As we all know, for objects == compares the addresses of the two objects, not their values. So how can two objects have the same address? Let's explore:

The source code is as follows:

public static void main(String[] args) {
    Integer a = 100;
    Integer b = 100;
    Integer c = 200;
    Integer d = 200;
    System.out.println(a == b);
    System.out.println(c == d);
}

Output results:

true
false

The bytecode is as follows:

public static void main(java.lang.String[]);
    Code:
       0: bipush        100
       2: invokestatic  #2     // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
       5: astore_1
       6: bipush        100
       8: invokestatic  #2    // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      11: astore_2
      12: sipush        200
      15: invokestatic  #2    // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      18: astore_3
      19: sipush        200
      22: invokestatic  #2    // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      25: astore        4
      27: getstatic     #3    // Field java/lang/System.out:Ljava/io/PrintStream;
      30: aload_1
      31: aload_2
      32: if_acmpne     39
      35: iconst_1
      36: goto          40
      39: iconst_0
      40: invokevirtual #4    // Method java/io/PrintStream.println:(Z)V
      43: getstatic     #3    // Field java/lang/System.out:Ljava/io/PrintStream;
      46: aload_3
      47: aload         4
      49: if_acmpne     56
      52: iconst_1
      53: goto          57
      56: iconst_0
      57: invokevirtual #4   // Method java/io/PrintStream.println:(Z)V
      60: return

From the bytecode, we can see that each assignment to a, b, c and d calls the Integer.valueOf() method through the invokestatic bytecode instruction.

The difference is in how the constant is pushed. For a and b the instruction is bipush, which pushes a single-byte integer constant (-128 to 127) onto the top of the operand stack. For c and d the instruction is sipush, which pushes a two-byte short constant (-32768 to 32767). In both cases the value ends up on the stack as an int.

Both are the same Integer type, so why is one constant 1 byte and the other 2 bytes? The compiler simply picks the shortest instruction that can encode the constant, so this difference alone does not explain the two == results.
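The choice of push instruction depends only on how large the constant is. The sketch below mirrors the selection javac makes for int constants; the class and method names (PushInstructionSketch, pushInstructionFor) are hypothetical, and for large constants it reports ldc without distinguishing it from ldc_w.

```java
public class PushInstructionSketch {

    // Returns the mnemonic of the shortest instruction that can push the given int constant.
    static String pushInstructionFor(int value) {
        if (value >= -1 && value <= 5) {
            // Dedicated one-byte instructions with no operand.
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush"; // one-byte operand, -128 to 127
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush"; // two-byte operand, -32768 to 32767
        }
        return "ldc";        // constant is loaded from the constant pool
    }

    public static void main(String[] args) {
        System.out.println(pushInstructionFor(1));     // prints iconst_1
        System.out.println(pushInstructionFor(100));   // prints bipush
        System.out.println(pushInstructionFor(200));   // prints sipush
        System.out.println(pushInstructionFor(40000)); // prints ldc
    }
}
```

Whichever instruction is chosen, the value on the operand stack is an int, so the instruction itself cannot be what makes a == b true and c == d false.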

Let's explore the valueOf() method of Integer:

This method calls the overloaded valueOf(), and the code is as follows:

As shown above, IntegerCache is a static nested class of Integer. valueOf() checks the value you pass in: when it lies between low and high, i.e. -128 to 127, no new Integer object is allocated on the heap and an existing Integer object is returned directly from the cache array, so a == b is true. The value 200 falls outside that range, so each call creates a new object and c == d is false.
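In practice this means boxed values should be compared by value rather than with ==. The following is a minimal sketch with made-up names (IntegerCompareDemo, sameReference, sameValue); it relies only on Integer.valueOf, equals and intValue from the standard library.

```java
public class IntegerCompareDemo {

    // Compares the references returned by two separate valueOf calls.
    static boolean sameReference(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second;
    }

    // Compares the wrapped values, which works for any int.
    static boolean sameValue(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first.equals(second) && first.intValue() == second.intValue();
    }

    public static void main(String[] args) {
        System.out.println(sameReference(100)); // prints true: 100 is within the cached range
        System.out.println(sameValue(200));     // prints true: values are compared, not addresses
    }
}
```

The sketch deliberately does not print sameReference(200): the result is false with default settings, as in the article, but it depends on how large the cache is configured to be.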

The source code of IntegerCache is as follows:

It can be seen that the cache array is filled by a for loop in the static initializer block.
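The caching scheme described above can be modeled in a few lines. This is a simplified sketch and not the JDK source: BoxCacheSketch and Boxed are hypothetical stand-ins for Integer and its cache, and the range is hard-coded to -128 to 127.

```java
public class BoxCacheSketch {

    // Hypothetical stand-in for java.lang.Integer.
    static final class Boxed {
        final int value;

        Boxed(int value) {
            this.value = value;
        }
    }

    static final int LOW = -128;
    static final int HIGH = 127;
    static final Boxed[] CACHE = new Boxed[HIGH - LOW + 1];

    // The cache array is filled once, in a static initializer block.
    static {
        for (int k = 0; k < CACHE.length; k++) {
            CACHE[k] = new Boxed(LOW + k);
        }
    }

    // Values in [LOW, HIGH] come from the cache; anything else is a fresh object.
    static Boxed valueOf(int i) {
        if (i >= LOW && i <= HIGH) {
            return CACHE[i - LOW];
        }
        return new Boxed(i);
    }

    public static void main(String[] args) {
        System.out.println(valueOf(100) == valueOf(100)); // prints true
        System.out.println(valueOf(200) == valueOf(200)); // prints false
    }
}
```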

Epilogue

This article does not describe the stack frame in great detail. Its main aim is to give you a rough understanding of what the stack frame does and to show what bytecode can be used for. When some piece of code is hard to understand, looking at it from another angle may be enlightening.

Keywords: Java Interview Programmer architecture

Added by echoninja on Tue, 30 Nov 2021 07:41:38 +0200