Java 8 lambda expressions in detail: how they are implemented

Why does Java need lambda expressions?

They make code more concise and readable.

For example, converting a list into another list or into a map is a common requirement in everyday development.

Before lambdas, the code usually looked like this:

List<Long> idList = Arrays.asList(1L, 2L, 3L);
List<Person> personList = new ArrayList<>();
for (long id : idList) {
    personList.add(getById(id));
}

Once this pattern has been repeated often enough, we abstract the common code into a reusable library.

The requirement above can be abstracted as: apply a conversion function to each element of a list and return the list of results.

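// The type parameters belong on the interface, so that it can be
// implemented as Function<Long, Person> in the code that follows.
interface Function<T, R> {
    R fun(T input);
}
<T, R> List<R> map(List<T> inputList, Function<T, R> function) {
    List<R> mappedList = new ArrayList<>();
    for (T t : inputList) {
        mappedList.add(function.fun(t));
    }
    return mappedList;
}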

With this abstraction, the initial code can be "simplified" into

List<Long> idList = Arrays.asList(1L, 2L, 3L);
List<Person> personList = map(idList, new Function<Long, Person>() {
    @Override
    public Person fun(Long input) {
        return getById(input);
    }
});

Although there is less implementation logic, the number of lines of code has regrettably increased.

In Java, functions cannot be passed to methods as parameters; a function can only be expressed inside a class. To pass a function to a method, we are forced to use an anonymous inner class, which adds a lot of redundant code.

In languages that support functional programming (such as Python, Scala, and Kotlin), functions are first-class citizens: they can be passed as parameters and returned as results.

For example, in Kotlin the code above becomes very short. It contains only the essential content and no redundant information.

val personList = idList.map { id -> getById(id) }

This gap in productivity led some Java users to move to other languages. JDK 8 finally added lambda expressions to support passing functions in this way.

List<Person> personList = map(idList, input -> getById(input));
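To make the comparison concrete, here is a minimal, self-contained sketch that puts the pieces above side by side. It rests on a few assumptions: `Converter` is a hypothetical rename of the article's Function interface (to avoid confusion with java.util.function.Function), and `getById` is a stand-in that only builds a string, since the real Person lookup is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapWithLambdaDemo {
    // Same shape as the article's Function interface; the name is hypothetical.
    @FunctionalInterface
    interface Converter<T, R> {
        R fun(T input);
    }

    // Applies the converter to every element and collects the results.
    static <T, R> List<R> map(List<T> inputList, Converter<T, R> converter) {
        List<R> mappedList = new ArrayList<>();
        for (T t : inputList) {
            mappedList.add(converter.fun(t));
        }
        return mappedList;
    }

    // Hypothetical stand-in for the article's getById.
    static String getById(long id) {
        return "person-" + id;
    }

    // Pre-Java 8 style: the function is wrapped in an anonymous inner class.
    static List<String> viaAnonymousClass(List<Long> ids) {
        return map(ids, new Converter<Long, String>() {
            @Override
            public String fun(Long input) {
                return getById(input);
            }
        });
    }

    // Java 8 style: the same function written as a lambda expression.
    static List<String> viaLambda(List<Long> ids) {
        return map(ids, input -> getById(input));
    }

    public static void main(String[] args) {
        List<Long> idList = Arrays.asList(1L, 2L, 3L);
        System.out.println(viaAnonymousClass(idList));
        System.out.println(viaLambda(idList));
    }
}
```

Both variants produce the same list; only the amount of ceremony around the call to getById differs.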

Are lambda expressions just syntactic sugar for anonymous inner classes?

If we wanted to implement lambda expressions in the Java language, javac could simply translate the arrow syntax into an anonymous inner class, because the two are essentially equivalent in function (IntelliJ IDEA often suggests converting one into the other).

But anonymous inner classes have some disadvantages.

  1. Every anonymous inner class produces a corresponding class at compile time, with its own class file. At run time the class therefore has to go through loading, verification, preparation, resolution, and initialization.
  2. Every evaluation creates a new instance of the anonymous inner class, whether it is stateful (capturing, that is, it captures variables from its context) or non-capturing.
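Both costs can be observed from plain Java. The following is a small sketch, with the class and method names invented for illustration: it shows that an anonymous inner class gets its own compiler-generated class (named with a `$1` suffix by javac) and that every evaluation of the `new` expression yields a distinct object, even though nothing is captured.

```java
class AnonymousClassCostDemo {
    // Each call evaluates the anonymous class expression once.
    static Runnable newTask() {
        return new Runnable() {
            @Override
            public void run() {
                // Intentionally empty: this class captures nothing.
            }
        };
    }

    // Two evaluations give two distinct objects, although they hold no state.
    static boolean createsNewInstanceEachTime() {
        return newTask() != newTask();
    }

    // The compiler emitted a separate class (and class file) for the anonymous class.
    static String generatedClassName() {
        return newTask().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(generatedClassName());
        System.out.println(createsNewInstanceEachTime());
    }
}
```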

invokedynamic introduction

It would be nice to have a function reference or function pointer, but the JVM has no representation for function types.

Is there an object in Java that represents a reference to a function? Reflection has the Method object, but its problem is performance: every invocation is subject to an access check, and because arguments are passed as Object, primitive values must be boxed.

Is there another way to represent a function reference? Yes: MethodHandle, a feature introduced in JDK 7 together with the invokedynamic instruction.

However, if lambdas were represented directly as MethodHandle, methods that accept them could not be overloaded, because the type MethodHandle carries no signature information. Moreover, calling MethodHandle's invoke method is not necessarily faster than an ordinary bytecode call.
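For readers who have not used either API, here is a brief sketch of the two ways to hold a reference to a method that are discussed above. It uses String.concat purely as a convenient example target; the class and method names of the demo itself are made up.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

class FunctionReferenceDemo {
    // Reflection: arguments and the result travel as Object.
    static String concatViaReflection(String a, String b) {
        try {
            Method concat = String.class.getMethod("concat", String.class);
            return (String) concat.invoke(a, b);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // MethodHandle: access is checked at lookup time, and the handle has an exact MethodType.
    static String concatViaMethodHandle(String a, String b) {
        try {
            MethodHandle concat = MethodHandles.lookup().findVirtual(
                    String.class, "concat", MethodType.methodType(String.class, String.class));
            // The receiver is passed as the first argument: (String, String) -> String.
            return (String) concat.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concatViaReflection("foo", "bar"));
        System.out.println(concatViaMethodHandle("foo", "bar"));
    }
}
```

Note that the variable `concat` in the second method has the static type MethodHandle regardless of the signature it wraps, which is the overloading problem mentioned above.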

Background of invokedynamic

Implementing dynamic typing for dynamic languages on the JVM (such as JRuby and Groovy) used to be troublesome.

Here is a brief explanation of dynamic typing, as opposed to static typing.

Static typing: the types of all variables are determined at compile time, and type checking is performed then.

Dynamic typing: the type of a variable cannot be determined at compile time; it can only be determined and checked at run time.

For example, in the following dynamic-language code the types of a and b are unknown, so the method that a.append(b) refers to is unknown as well.

def add(val a, val b)
    a.append(b)

In Java, the types of a and b can be determined at compile time.

SimpleString add(SimpleString a, SimpleString b) {
    return a.append(b);
}

The compiled bytecode is as follows. invokevirtual calls, on variable a, the method with signature (LSimpleString;)LSimpleString;.

0: aload_1
1: aload_2
2: invokevirtual #2 // Method SimpleString.append:(LSimpleString;)LSimpleString;
5: areturn

Before invokedynamic, the JVM had four bytecode instructions for method calls.

invokestatic - invokes a static method

invokeinterface - invokes an interface method

invokevirtual - invokes an instance method that is not an interface method, dispatching on the receiver's class

invokespecial - invokes the remaining cases: private methods, constructors, and super calls

For all of these instructions, the method to call is fixed at compile time: each one takes a symbolic reference to a specific method in the constant pool, and the types are checked. An object that does not satisfy the declared type cannot be used as the receiver, even if its class happens to have a method with the same signature.

What invokedynamic does

Because of this limitation, implementers of dynamic languages on the JVM had to fall back on reflection, which performs poorly, to implement dynamic typing.
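The reflective workaround can be sketched roughly as follows. This is only an illustration of the idea, not how any particular language runtime does it: `add` mirrors the dynamic-language example above, and `SimpleString` is a hypothetical class that merely happens to have an append method.

```java
import java.lang.reflect.Method;

class ReflectiveDispatchDemo {
    // Hypothetical type that happens to have an append method.
    static final class SimpleString {
        final String value;

        SimpleString(String value) {
            this.value = value;
        }

        public SimpleString append(SimpleString other) {
            return new SimpleString(value + other.value);
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Looks up "append" at run time from the actual classes of a and b.
    // The lookup and the access checks are repeated on every call.
    static Object add(Object a, Object b) {
        try {
            Method append = a.getClass().getMethod("append", b.getClass());
            return append.invoke(a, b);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two unrelated receiver types, resolved purely by method name and argument type.
        System.out.println(add(new StringBuilder("a"), "b"));
        System.out.println(add(new SimpleString("x"), new SimpleString("y")));
    }
}
```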

In other words, the bytecode had no direct support for this kind of dynamic dispatch. What can be done? The familiar saying applies once more: "All problems in computer science can be solved by another level of indirection."

Since the target cannot be decided at compile time, the decision is postponed until run time, and user-defined code tells the JVM which method to execute.

JDK 7 introduced the invokedynamic instruction, together with the java.lang.invoke package, to solve this problem.

Most developers are unfamiliar with this instruction because, unlike invokestatic and the others, it has no directly corresponding construct in the Java language.

The key concepts are as follows.

  1. invokedynamic instruction: the first time the JVM executes it, the JVM links it by calling the bootstrap method specified by the user, which determines the method to execute. Later executions skip this linking step. The location where an invokedynamic instruction appears is also called a dynamic call site.
  2. Bootstrap method: a method that users write with their own logic and that finally returns a CallSite object.
  3. CallSite: holds a MethodHandle, which is returned by its getTarget() method.
  4. MethodHandle: a reference to the method to be executed.
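The relationship between these four concepts can be sketched in plain Java. Java source cannot emit an invokedynamic instruction directly, so this sketch calls the bootstrap method by hand, playing the part the JVM plays during linking. The names `BootstrapSketch`, `greet`, and `linkAndCall` are invented for the example.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BootstrapSketch {
    // The method that the call site should end up invoking.
    static String greet(String name) {
        return "hello " + name;
    }

    // Shaped like a bootstrap method: it receives the three standard
    // arguments and returns a CallSite whose target is a MethodHandle.
    static CallSite bootstrap(MethodHandles.Lookup caller, String invokedName, MethodType invokedType)
            throws ReflectiveOperationException {
        MethodHandle target = caller.findStatic(BootstrapSketch.class, invokedName, invokedType);
        return new ConstantCallSite(target);
    }

    // Stands in for the JVM: link once via the bootstrap method, then call the target.
    static String linkAndCall(String name) {
        try {
            CallSite site = bootstrap(MethodHandles.lookup(), "greet",
                    MethodType.methodType(String.class, String.class));
            return (String) site.getTarget().invokeExact(name);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(linkAndCall("world"));
    }
}
```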

Putting these together:

An invokedynamic instruction starts out in the unlinked state. At this point the instruction does not know which target method it will call.

The first time the JVM executes an invokedynamic instruction at a given location, it must link it first.

Linking calls the bootstrap method, passing in information about the current call site. The bootstrap method returns a CallSite that holds a reference to a MethodHandle, which is the target of the CallSite.

The invokedynamic instruction is then linked to this CallSite and delegates all calls to the CallSite's current target MethodHandle. Depending on whether the target may change, a CallSite is a ConstantCallSite, a MutableCallSite, or a VolatileCallSite. With the latter two, the method being called can be changed dynamically by switching the target MethodHandle.
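Switching the target can be tried out without writing any bytecode. The sketch below, with invented names, uses MutableCallSite.dynamicInvoker() as a stand-in for a linked invokedynamic instruction: the invoker always delegates to the call site's current target.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class MutableCallSiteSketch {
    static int plusOne(int x) {
        return x + 1;
    }

    static int timesTwo(int x) {
        return x * 2;
    }

    // Returns the result of calling through the same call site before and after retargeting.
    static int[] callBeforeAndAfterRetarget(int arg) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(int.class, int.class);

            MutableCallSite site = new MutableCallSite(
                    lookup.findStatic(MutableCallSiteSketch.class, "plusOne", type));
            // The invoker plays the role of the linked invokedynamic instruction.
            MethodHandle invoker = site.dynamicInvoker();

            int before = (int) invoker.invokeExact(arg);
            // Swap the target; the invoker itself is unchanged.
            site.setTarget(lookup.findStatic(MutableCallSiteSketch.class, "timesTwo", type));
            int after = (int) invoker.invokeExact(arg);

            return new int[] {before, after};
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        int[] results = callBeforeAndAfterRetarget(5);
        System.out.println(results[0] + " then " + results[1]);
    }
}
```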

How lambda expressions are really implemented

Let's look directly at how Java currently implements lambdas.

Take the following code as an example:

public class RunnableTest {
    void run() {
        Function<Integer, Integer> function = input -> input + 1;
        function.apply(1);
    }
}

After compilation, check the generated bytecode through javap

void run();
    descriptor: ()V
    flags:
    Code:
      stack=2, locals=2, args_size=1
         0: invokedynamic #2,  0              // InvokeDynamic #0:apply:()Ljava/util/function/Function;
         5: astore_1
         6: aload_1
         7: iconst_1
         8: invokestatic  #3                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
        11: invokeinterface #4,  2            // InterfaceMethod java/util/function/Function.apply:(Ljava/lang/Object;)Ljava/lang/Object;
        16: pop
        17: return
      LineNumberTable:
        line 12: 0
        line 13: 6
        line 14: 17
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      18     0  this   Lcom/github/liuzhengyang/invokedyanmic/RunnableTest;
            6      12     1 function   Ljava/util/function/Function;
      LocalVariableTypeTable:
        Start  Length  Slot  Name   Signature
            6      12     1 function   Ljava/util/function/Function<Ljava/lang/Integer;Ljava/lang/Integer;>;

private static java.lang.Integer lambda$run$0(java.lang.Integer);
    descriptor: (Ljava/lang/Integer;)Ljava/lang/Integer;
    flags: ACC_PRIVATE, ACC_STATIC, ACC_SYNTHETIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: invokevirtual #5                  // Method java/lang/Integer.intValue:()I
         4: iconst_1
         5: iadd
         6: invokestatic  #3                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
         9: areturn
      LineNumberTable:
        line 12: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      10     0 input   Ljava/lang/Integer;

The bytecode corresponding to the line Function<Integer, Integer> function = input -> input + 1; is

0: invokedynamic #2,  0              // InvokeDynamic #0:apply:()Ljava/util/function/Function;
5: astore_1

Let's review the steps of invokedynamic.

  1. The first time the JVM executes the instruction, it calls the user-defined bootstrap method.
  2. The bootstrap method returns a CallSite.
  3. The MethodHandle obtained from the CallSite represents the method to call.
  4. On later calls, the JVM does not link again. The instruction is bound directly to the CallSite and calls its target MethodHandle, which can then be inlined.

In the first line, invokedynamic is followed by two operands. The second, 0, carries no meaning and is always 0. The first, #2, points to a constant of type CONSTANT_InvokeDynamic_info in the constant pool.

#2 = InvokeDynamic      #0:#32         // #0:apply:()Ljava/util/function/Function;

In #0:#32, the #32 part is a NameAndType constant that gives the name and method type of the dynamic call made by this invokedynamic instruction.

#32 = NameAndType        #43:#44        // apply:()Ljava/util/function/Function;

The #0 part is the index of the bootstrap method in the BootstrapMethods table, which appears at the end of the javap output:

BootstrapMethods:
  0: #28 invokestatic java/lang/invoke/LambdaMetafactory.metafactory:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;
    Method arguments:
      #29 (Ljava/lang/Object;)Ljava/lang/Object;
      #30 invokestatic com/github/liuzhengyang/invokedyanmic/RunnableTest.lambda$run$0:(Ljava/lang/Integer;)Ljava/lang/Integer;
      #31 (Ljava/lang/Integer;)Ljava/lang/Integer;

The JVM specification describes the BootstrapMethods attribute as follows.

BootstrapMethods_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 num_bootstrap_methods;
    {   u2 bootstrap_method_ref;
        u2 num_bootstrap_arguments;
        u2 bootstrap_arguments[num_bootstrap_arguments];
    } bootstrap_methods[num_bootstrap_methods];
}

bootstrap_method_ref
The value of the bootstrap_method_ref item must be a valid index into the constant_pool table. The constant_pool entry at that index must be a CONSTANT_MethodHandle_info structure

bootstrap_arguments[]
Each entry in the bootstrap_arguments array must be a valid index into the constant_pool table. The constant_pool entry at that index must be a CONSTANT_String_info, CONSTANT_Class_info, CONSTANT_Integer_info, CONSTANT_Long_info, CONSTANT_Float_info, CONSTANT_Double_info, CONSTANT_MethodHandle_info, or CONSTANT_MethodType_info structure

CONSTANT_MethodHandle_info
The CONSTANT_MethodHandle_info structure is used to represent a method handle

The BootstrapMethods attribute therefore tells us which bootstrap method the invokedynamic instruction needs, along with the number and types of its arguments.

#28 is the bootstrap_method_ref:

#28 = MethodHandle       #6:#40         // invokestatic java/lang/invoke/LambdaMetafactory.metafactory:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;

According to the JVM specification, a bootstrap method receives three standard parameters plus some custom ones. The standard parameters are as follows.

  1. The caller parameter, of type MethodHandles.Lookup. Through this object, methods can be looked up in a reflection-like manner with the access rights of the class that executes the invokedynamic instruction. For example, private methods of other classes cannot be called. The JVM pushes this argument onto the stack.
  2. The invokedName parameter, of type String, is the name of the method that invokedynamic is to implement. Here it is apply, the name of the method the lambda expression implements. The JVM pushes this argument onto the stack.
  3. The invokedType parameter, of type MethodType, is the type of the call site that invokedynamic is to implement. Here it is ()Function: no arguments, returning a Function. The JVM pushes this argument onto the stack.

#29, #30, and #31 are the additional user-defined arguments:

#29 = MethodType         #41            //  (Ljava/lang/Object;)Ljava/lang/Object;
#30 = MethodHandle       #6:#42         // invokestatic com/github/liuzhengyang/invokedyanmic/RunnableTest.lambda$run$0:(Ljava/lang/Integer;)Ljava/lang/Integer;
#31 = MethodType         #21            //  (Ljava/lang/Integer;)Ljava/lang/Integer;

The signature of java.lang.invoke.LambdaMetafactory#metafactory shows what these arguments mean:

public static CallSite metafactory(MethodHandles.Lookup caller,
        String invokedName,
        MethodType invokedType,
        MethodType samMethodType,
        MethodHandle implMethod,
        MethodType instantiatedMethodType)

The first three have already been introduced. The rest are:

MethodType samMethodType: SAM stands for single abstract method. Here it is #29 = MethodType #41 // (Ljava/lang/Object;)Ljava/lang/Object;, the type of the interface method to implement. It carries no generic information, hence (Ljava/lang/Object;)Ljava/lang/Object;.

MethodHandle implMethod: the method to actually execute. Here it is invokestatic com.github.liuzhengyang.invokedyanmic.RunnableTest.lambda$run$0:(Integer)Integer, a method that javac generates when it compiles the lambda. It is described below.

MethodType instantiatedMethodType: essentially the same as samMethodType, but with the generic information filled in: (Ljava/lang/Integer;)Ljava/lang/Integer;
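To see how the six arguments fit together, the bootstrap call can be reproduced by hand. The sketch below passes the same kinds of values that appear in the constant pool above; `lambdaBody` is a hand-written stand-in for the javac-generated lambda$run$0, and the class and method names are hypothetical.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

class ManualMetafactoryDemo {
    // Plays the role of the javac-generated lambda$run$0.
    private static Integer lambdaBody(Integer input) {
        return input + 1;
    }

    // Does by hand what linking the invokedynamic instruction would do.
    @SuppressWarnings("unchecked")
    static Function<Integer, Integer> link() {
        try {
            MethodHandles.Lookup caller = MethodHandles.lookup();
            MethodHandle implMethod = caller.findStatic(ManualMetafactoryDemo.class, "lambdaBody",
                    MethodType.methodType(Integer.class, Integer.class));

            CallSite site = LambdaMetafactory.metafactory(
                    caller,                                               // caller
                    "apply",                                              // invokedName
                    MethodType.methodType(Function.class),                // invokedType: ()Function
                    MethodType.methodType(Object.class, Object.class),    // samMethodType (erased)
                    implMethod,                                           // implMethod
                    MethodType.methodType(Integer.class, Integer.class)); // instantiatedMethodType

            // Calling the target yields the Function object, as the linked instruction would.
            return (Function<Integer, Integer>) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(link().apply(1));
    }
}
```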

private static java.lang.Integer lambda$run$0(java.lang.Integer); is the method javac generates by desugaring the lambda expression. If the lambda expression uses variables from its context, it is stateful and is called a capturing lambda; the captured variables are passed as additional parameters of the generated method. If it has no state, it is non-capturing.

In addition, if you use Java 8 method reference syntax, such as Main::run, a directly callable method already exists, so no intermediate method needs to be generated.
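The difference between the two forms can be checked with reflection. The sketch below assumes compilation with javac, whose generated methods are named with a lambda$ prefix as in the javap output above; that naming is a compiler implementation detail, and the class names here are invented for the example.

```java
import java.lang.reflect.Method;
import java.util.function.Function;

class DesugarDemo {
    // Contains one lambda expression: javac adds a lambda$... method for its body.
    static class UsesLambda {
        static Function<String, Integer> parser() {
            return s -> Integer.parseInt(s);
        }
    }

    // Contains one method reference: Integer.parseInt can be targeted directly.
    static class UsesMethodRef {
        static Function<String, Integer> parser() {
            return Integer::parseInt;
        }
    }

    // Counts the methods whose names follow javac's lambda$ naming scheme.
    static int countLambdaMethods(Class<?> type) {
        int count = 0;
        for (Method method : type.getDeclaredMethods()) {
            if (method.getName().startsWith("lambda$")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("lambda:           " + countLambdaMethods(UsesLambda.class));
        System.out.println("method reference: " + countLambdaMethods(UsesMethodRef.class));
    }
}
```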

Next, look at 5: astore_1. This instruction stores the object reference on top of the operand stack into local variable slot 1, that is, it assigns it to the function variable.

This means that after invokedynamic #2, 0 executes, an object of type Function has been pushed onto the operand stack.

To see how that happens, we need to look at the implementation of LambdaMetafactory#metafactory.

mf = new InnerClassLambdaMetafactory(caller, invokedType,
                                        invokedName, samMethodType,
                                        implMethod, instantiatedMethodType,
                                        false, EMPTY_CLASS_ARRAY, EMPTY_MT_ARRAY);
mf.validateMetafactoryArgs();
return mf.buildCallSite();

It creates an InnerClassLambdaMetafactory and then calls buildCallSite to return the CallSite.

The documentation of InnerClassLambdaMetafactory says: Lambda metafactory implementation which dynamically creates an inner-class-like class per lambda callsite.

What is going on? After this long detour, we still end up creating an inner class! Don't panic. Read on first; the differences from an ordinary inner class are analyzed at the end.

Constructing InnerClassLambdaMetafactory mostly consists of assigning and initializing parameters.

buildCallSite is more complex. Its documentation says: Build the CallSite. Generate a class file which implements the functional interface, define the class, if there are no parameters create an instance of the class which the CallSite will return, otherwise, generate handles which will call the class' constructor.

In other words, it generates a class file that implements the functional interface and defines the class. If there are no parameters (the non-capturing case), it creates one instance of the class, and the CallSite always returns that instance. Otherwise, the CallSite creates a new object through the constructor on every call.

Compared with an ordinary inner class, this saves memory: a stateless lambda uses a single shared object.
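This behavior can be observed from ordinary code, with one caveat: the identity of lambda objects is an implementation detail rather than something the language guarantees, so the comparisons below describe what the implementation discussed here does and may differ on other JVMs. The class and method names are invented for the sketch.

```java
import java.util.function.Function;

class LambdaInstanceDemo {
    // Non-capturing: uses only its own parameter.
    static Function<Integer, Integer> nonCapturing() {
        return input -> input + 1;
    }

    // Capturing: closes over the parameter offset.
    static Function<Integer, Integer> capturing(int offset) {
        return input -> input + offset;
    }

    public static void main(String[] args) {
        // Each comparison evaluates the same lambda expression twice.
        // Whether the two results are the same object depends on the JVM implementation.
        System.out.println("non-capturing reused: " + (nonCapturing() == nonCapturing()));
        System.out.println("capturing reused:     " + (capturing(1) == capturing(1)));
    }
}
```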

The first step of the method is to call spinInnerClass(), which uses ASM to generate the bytecode of a class implementing the functional interface, then loads and returns the class.

Only the key code is kept here:
cw.visit(CLASSFILE_VERSION, ACC_SUPER + ACC_FINAL + ACC_SYNTHETIC, lambdaClassName, null, JAVA_LANG_OBJECT, interfaces);
for (int i = 0; i < argDescs.length; i++) {
    FieldVisitor fv = cw.visitField(ACC_PRIVATE + ACC_FINAL, argNames[i], argDescs[i], null, null);
    fv.visitEnd();
}
generateConstructor();
if (invokedType.parameterCount() != 0) {
    generateFactory();
}
// Forward the SAM method
MethodVisitor mv = cw.visitMethod(ACC_PUBLIC, samMethodName, samMethodType.toMethodDescriptorString(), null, null);
mv.visitAnnotation("Ljava/lang/invoke/LambdaForm$Hidden;", true);
new ForwardingMethodGenerator(mv).generate(samMethodType);

byte[] classBytes = cw.toByteArray();

return UNSAFE.defineAnonymousClass(targetClass, classBytes, null);

The generation steps are:

  1. Declare the interface to implement.
  2. Create fields for the captured arguments.
  3. Generate a constructor. If there are arguments, also generate a static factory method.
  4. Implement the functional interface's method by forwarding to implMethod, that is, the method generated by javac or the method that the method reference points to.
  5. After generation, call ClassWriter.toByteArray() to get the class bytecode as a byte array.
  6. Define the class with UNSAFE.defineAnonymousClass(targetClass, classBytes, null). defineAnonymousClass is special: the anonymous class it creates is attached to the host class targetClass and is loaded with the host class's class loader, but it is not registered in the SystemDictionary. The SystemDictionary maps a class loader object plus a class name to the address of the Klass. Because the class is not placed in this dictionary, a class with the same name can be defined repeatedly, which makes some dynamic language features easier to implement and prevents some memory leaks.

This is rather abstract, so it is more intuitive to look at the generated result:

// $FF: synthetic class
final class RunnableTest$Lambda$1 implements Function {
    private RunnableTest$Lambda$1() {
    }

    @Hidden
    public Object apply(Object var1) {
        return RunnableTest.lambda$run$0((Integer)var1);
    }
}

What if there are captured values, for example a non-static field of the enclosing class and a local variable?

private int a;
void run() {
    int b = 0;
    Function<Integer, Integer> function = input -> input + 1 + a + b;
    function.apply(1);
}

The corresponding result is

final class RunnableTest$Lambda$1 implements Function {
    private final RunnableTest arg$1;
    private final int arg$2;

    private RunnableTest$Lambda$1(RunnableTest var1, int var2) {
        this.arg$1 = var1;
        this.arg$2 = var2;
    }

    private static Function get$Lambda(RunnableTest var0, int var1) {
        return new RunnableTest$Lambda$1(var0, var1);
    }

    @Hidden
    public Object apply(Object var1) {
        return this.arg$1.lambda$run$0(this.arg$2, (Integer)var1);
    }
}

After creating the inner class, buildCallSite generates the required CallSite. If there are no arguments, it creates one instance of the inner class, creates a MethodHandle that always returns that instance, and wraps it in a ConstantCallSite.

If there are arguments, it returns a MethodHandle to the factory method, which creates a new instance of the functional interface on every call, again wrapped in a ConstantCallSite.

This completes the bootstrap process. Once invokedynamic is linked, later calls go straight to the corresponding MethodHandle, which either returns the fixed inner class instance or creates a new one each time.
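The two shapes of CallSite can be modeled with hand-written classes. This is only a sketch of the idea, not the JDK's code: `PlusOne` and `PlusOffset` stand in for the generated classes, and `create` plays the role of the generated get$Lambda factory.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

class CallSiteShapesSketch {
    // Stand-in for a generated class with no captured state.
    static final class PlusOne implements Function<Integer, Integer> {
        @Override
        public Integer apply(Integer input) {
            return input + 1;
        }
    }

    // Stand-in for a generated class with one captured int.
    static final class PlusOffset implements Function<Integer, Integer> {
        private final int offset;

        private PlusOffset(int offset) {
            this.offset = offset;
        }

        // Plays the role of the generated get$Lambda factory method.
        static Function<Integer, Integer> create(int offset) {
            return new PlusOffset(offset);
        }

        @Override
        public Integer apply(Integer input) {
            return input + offset;
        }
    }

    // Non-capturing: the target is a constant handle that always returns one instance.
    static CallSite nonCapturingSite() {
        return new ConstantCallSite(MethodHandles.constant(Function.class, new PlusOne()));
    }

    // Capturing: the target is the factory method, so each call builds a new object.
    static CallSite capturingSite() {
        try {
            MethodHandle factory = MethodHandles.lookup().findStatic(
                    PlusOffset.class, "create", MethodType.methodType(Function.class, int.class));
            return new ConstantCallSite(factory);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean nonCapturingReusesInstance() {
        try {
            MethodHandle target = nonCapturingSite().getTarget();
            Object first = target.invoke();
            Object second = target.invoke();
            return first == second;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static boolean capturingCreatesNewInstance() {
        try {
            MethodHandle target = capturingSite().getTarget();
            Object first = target.invoke(5);
            Object second = target.invoke(5);
            return first != second;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nonCapturingReusesInstance());
        System.out.println(capturingCreatesNewInstance());
    }
}
```

In both cases the CallSite itself is constant; only what its target does on each call differs.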

Advantages of invokedynamic over compiling directly to anonymous inner classes

Why does Java 8 go through all this? Lambda expressions need no dynamic dispatch (the method to call is known), so why use invokedynamic?

A basic guarantee of the JVM is that class files compiled for an older version also run on newer JVMs, while the JVM itself keeps optimizing and improving performance with each release.

Compiling directly to an inner class is simple, but then the compiled bytecode (including third-party jars) is fixed: create an inner class object, then call it with invoke{virtual, static, special, interface}.

Future performance gains could then come only from making object creation and those invoke instructions faster. In other words, the implementation strategy would be hard-coded.

With invokedynamic, javac leaves enough information in the class file for the JVM to decide at run time how to implement the lambda. The implementation of lambda expressions can keep being optimized while remaining compatible, which leaves more possibilities open for the future.

Summary

This article summarizes what I learned while studying lambdas. It covers why lambda expressions were introduced, how they are implemented, and how the different implementation approaches compare. I have only skimmed some of the code and materials, so please point out any errors or unclear passages.

Added by louie35 on Mon, 13 Dec 2021 07:32:47 +0200