Rust asynchronous code from the perspective of generators

(Ruihang Xia)

Currently working on an edge time-series data storage engine project

This article is 6992 words long, about an 18-minute read

Preface

As an important feature of the 2018 edition, Rust's asynchronous programming is now widely used. Anyone who uses it will inevitably wonder how it works. This article explores it from the perspective of generators and variable capture, and then introduces a scenario encountered while developing the embedded time-series storage engine ceresdb-helix.

Given the limits of the author's knowledge, mistakes and omissions are inevitable. Please leave a comment if you find any.

PART. 1 async/.await, coroutine and generator

The async/.await syntax reached the stable channel in version 1.39 [1], making asynchronous code easy to write:

```rust
async fn asynchronous() {
    // snipped
}

async fn foo() {
    let x: usize = 233;
    asynchronous().await;
    println!("{}", x);
}
```

In the example above, the local variable x can be used directly after an asynchronous call (fn asynchronous), just as in synchronous code. Before this, asynchronous code was generally written with combinators, as in futures 0.1 [2]. Local variables needed by the next asynchronous step (such as and_then()) had to be passed along the chain explicitly and by hand, as closure parameters, which was not a pleasant experience.

What async/.await actually does is transform the code into generator/coroutine [3] form. A coroutine can be suspended to let something else run and then resume execution later; in today's Rust, that suspension point is written as .await. Taking the code above as an example, the asynchronous procedure foo() calls another asynchronous procedure, asynchronous(). At that .await, execution of foo() is suspended, and it resumes once it can make progress again.

Resuming execution may require some earlier information. For example, foo() uses the earlier value x in its println! call. In other words, an async procedure must be able to save some of its local state so that the state can still be used after the .await: local variables that may be used after a yield have to be saved in the generator state. This is where the Pin [4] mechanism comes in, to solve the self-reference problem that can arise; it is not covered further here.
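The suspend-and-resume behaviour described above can be observed by polling such a future by hand. The following is a minimal sketch using only the standard library; YieldOnce, NoopWake and poll_foo_twice are hypothetical helpers introduced for illustration, and YieldOnce stands in for the snipped asynchronous() so that the first poll is guaranteed to suspend.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical helper: a future that is pending on the first poll
/// and ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

async fn foo() -> usize {
    let x: usize = 233;
    YieldOnce(false).await; // suspension point
    x // still available after resuming
}

/// A waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `foo()` twice and returns what each poll reported.
fn poll_foo_twice() -> (Poll<usize>, Poll<usize>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(foo());
    let first = fut.as_mut().poll(&mut cx);
    let second = fut.as_mut().poll(&mut cx);
    (first, second)
}

fn main() {
    let (first, second) = poll_foo_twice();
    println!("{:?} then {:?}", first, second);
}
```

The first poll stops at the .await and the second one picks up from there, with x read back from the state saved inside the future.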

PART. 2 visualizing the generator via MIR

We can see what the generator mentioned above looks like through MIR [5]. MIR is an intermediate representation of Rust based on the control flow graph (CFG) [6]. A CFG shows intuitively what a program looks like when it executes, so MIR can help when you are not sure what your Rust code turns into.

There are several ways to get the MIR of your code. If you have a Rust toolchain available, you can pass a flag to rustc through an environment variable like this and then build with cargo to generate the MIR:

RUSTFLAGS="--emit mir" cargo build

If the build succeeds, a .mir file is generated in the target/debug/deps/ directory. Alternatively, you can use https://play.rust-lang.org/ to get the MIR: select MIR in the overflow menu next to Run.

The MIR generated by a nightly toolchain from 2021-08 looks roughly like this. It contains many unfamiliar things that you don't need to worry about. It is enough to know the following:

_0, _1 and so on are variables.
Much of the syntax is similar to Rust, such as type annotations, function definitions, calls and comments.

fn future_1() -> impl Future {
    let mut _0: impl std::future::Future; // return place in scope 0 at src/anchored.rs:27:21: 27:21
    let mut _1: [static generator@src/anchored.rs:27:21: 27:23]; // in scope 0 at src/anchored.rs:27:21: 27:23

    bb0: {
        discriminant(_1) = 0; // scope 0 at src/anchored.rs:27:21: 27:23
        _0 = from_generator::<[static generator@src/anchored.rs:27:21: 27:23]>(move _1) -> bb1; // scope 0 at src/anchored.rs:27:21: 27:23
                                         // mir::Constant
                                         // + span: src/anchored.rs:27:21: 27:23
                                         // + literal: Const { ty: fn([static generator@src/anchored.rs:27:21: 27:23]) -> impl std::future::Future {std::future::from_generator::<[static generator@src/anchored.rs:27:21: 27:23]>}, val: Value(Scalar(<ZST>)) }
    }

    bb1: {
        return; // scope 0 at src/anchored.rs:27:23: 27:23
    }
}

fn future_1::{closure#0}(_1: Pin<&mut [static generator@src/anchored.rs:27:21: 27:23]>, _2: ResumeTy) -> GeneratorState<(), ()> {
    debug _task_context => _4; // in scope 0 at src/anchored.rs:27:21: 27:23
    let mut _0: std::ops::GeneratorState<(), ()>; // return place in scope 0 at src/anchored.rs:27:21: 27:23
    let mut _3: (); // in scope 0 at src/anchored.rs:27:21: 27:23
    let mut _4: std::future::ResumeTy; // in scope 0 at src/anchored.rs:27:21: 27:23
    let mut _5: u32; // in scope 0 at src/anchored.rs:27:21: 27:23

    bb0: {
        _5 = discriminant((*(_1.0: &mut [static generator@src/anchored.rs:27:21: 27:23]))); // scope 0 at src/anchored.rs:27:21: 27:23
        switchInt(move _5) -> [0_u32: bb1, 1_u32: bb2, otherwise: bb3]; // scope 0 at src/anchored.rs:27:21: 27:23
    }

    bb1: {
        _4 = move _2; // scope 0 at src/anchored.rs:27:21: 27:23
        _3 = const (); // scope 0 at src/anchored.rs:27:21: 27:23
        ((_0 as Complete).0: ()) = move _3; // scope 0 at src/anchored.rs:27:23: 27:23
        discriminant(_0) = 1; // scope 0 at src/anchored.rs:27:23: 27:23
        discriminant((*(_1.0: &mut [static generator@src/anchored.rs:27:21: 27:23]))) = 1; // scope 0 at src/anchored.rs:27:23: 27:23
        return; // scope 0 at src/anchored.rs:27:23: 27:23
    }

    bb2: {
        assert(const false, "`async fn` resumed after completion") -> bb2; // scope 0 at src/anchored.rs:27:21: 27:23
    }

    bb3: {
        unreachable; // scope 0 at src/anchored.rs:27:21: 27:23
    }
}

This demo crate contains other code as well, but the source corresponding to the MIR above is quite simple:

async fn future_1() {}

It is just an empty asynchronous function, yet the generated MIR expands considerably. With a little more content it becomes hard to read as text, so we can specify the output format of the MIR and visualize it instead.

The steps are as follows:

RUSTFLAGS="--emit mir -Z dump-mir=F -Z dump-mir-dataflow -Z unpretty=mir-cfg" cargo build > mir.dot
dot -T svg -o mir.svg mir.dot

mir.svg can then be found in the current directory. Opening it shows something like a flowchart (another similar diagram is omitted; those interested can generate one themselves with the method above).

Here the MIR is organized into basic blocks (bb), its basic unit. The original information is all still there, and the jumps between basic blocks are drawn as edges. In the figure we can see four basic blocks: one is the starting point and the other three are end points. The starting block bb0 begins with a switch (a match in Rust) on the variable _5, branching to different blocks according to its value. Imagine code like this:

match _5 {
    0 => jump(bb1),
    1 => jump(bb2),
    _ => unreachable(),
}

_5 can be regarded as the state of the generator, with each value being a different state. Written out, the state of future_1 looks roughly like this:

enum Future1State {
    Start,
    Finished,
}

For async fn foo() from PART. 1, there would be one more variant to represent the yield. Thinking back to the earlier question, it now follows naturally how variables that span different stages of the generator are saved:

enum FooState {
    Start,
    Yield(usize),
    Finished,
}
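To make the role of the Yield(usize) variant concrete, here is a hand-written sketch of a future driven by such a state enum. It is only an illustration of the idea, not what the compiler actually generates: the Foo struct, NoopWake and run_foo are hypothetical names, the awaited asynchronous() is simulated by simply suspending once, and the real generated state also has to hold the inner future.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written stand-in for the generator state of `foo()`.
enum FooState {
    Start,
    Yield(usize), // `x` is parked here while `foo()` is suspended
    Finished,
}

struct Foo {
    state: FooState,
}

impl Future for Foo {
    type Output = usize;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        let this = self.get_mut(); // fine here because `Foo` is `Unpin`
        match this.state {
            FooState::Start => {
                let x: usize = 233;
                // Simulate `asynchronous().await` not being ready yet:
                // save `x` into the state and suspend.
                this.state = FooState::Yield(x);
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            FooState::Yield(x) => {
                // Resumed: `x` comes back out of the saved state.
                this.state = FooState::Finished;
                Poll::Ready(x)
            }
            FooState::Finished => panic!("`async fn` resumed after completion"),
        }
    }
}

/// A waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the hand-written future twice and returns both results.
fn run_foo() -> (Poll<usize>, Poll<usize>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Foo { state: FooState::Start };
    let first = Pin::new(&mut fut).poll(&mut cx);
    let second = Pin::new(&mut fut).poll(&mut cx);
    (first, second)
}

fn main() {
    let (first, second) = run_foo();
    println!("{:?} then {:?}", first, second);
}
```

The match on this.state plays the same role as the switchInt on the discriminant in the MIR: each poll dispatches on the current state, does some work, and records the next state before returning.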

PART. 3 generator captures

Let's call the variables that are saved in the generator state and remain usable in later stages across .await/yield captured variables. Can we find out which variables are actually captured? Let's try. First, write a slightly more complex asynchronous function:

async fn complex() {
    let x = 0;
    future_1().await;
    let y = 1;
    future_1().await;
    println!("{}, {}", x, y);
}

The generated MIR and SVG are fairly complex. An excerpt is placed in the appendix; you can try generating the full output yourself.

Browsing the generated output, we can see that one long type appears again and again, for example:

[static generator@src/anchored.rs:27:20: 33:2]
// or
(((*(_1.0: &mut [static generator@src/anchored.rs:27:20: 33:2])) as variant#3).0: i32)

Comparing against the locations in our code, the two file positions in this type are the opening and closing braces of our asynchronous function complex(). This type is tied to the asynchronous function as a whole.

Exploring further, we can guess that the first line of the fragment above is an anonymous type (struct) that implements the Generator trait [7], and that "as variant#3" is an operation in MIR: a projection, probably generated as Projection::Downcast [8]. The projection applied after this downcast has type i32, which we recognize. Combined with other similar fragments, we can infer that this anonymous type resembles the generator state described above: each variant is a tuple for a different state, and projecting into that N-tuple yields the captured local variables.
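A rough way to observe captures without reading MIR is to compare the sizes of the anonymous future types, since a captured variable has to be stored inside them. The sketch below assumes nothing beyond the standard library; pause, held_across_await, dropped_before_await and state_sizes are hypothetical names chosen for this illustration.

```rust
use std::mem::size_of_val;

async fn pause() {}

/// `buf` is used after the `.await`, so it has to be stored in the state.
async fn held_across_await() -> u8 {
    let buf = [1u8; 1024];
    pause().await;
    buf[0]
}

/// `buf` lives in its own block and is gone before the `.await`.
async fn dropped_before_await() -> u8 {
    let first = {
        let buf = [1u8; 1024];
        buf[0]
    };
    pause().await;
    first
}

/// Returns the sizes in bytes of the two (unpolled) futures.
fn state_sizes() -> (usize, usize) {
    (
        size_of_val(&held_across_await()),
        size_of_val(&dropped_before_await()),
    )
}

fn main() {
    let (held, dropped) = state_sizes();
    println!("held across await: {} bytes", held);
    println!("dropped before await: {} bytes", dropped);
}
```

The first future must be at least as large as the 1024-byte buffer it carries across the await point, while the second only needs room for a single u8 and some bookkeeping.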

PART. 4 anchored

Knowing which variables will be captured helps us understand our code, and this information also enables some applications.

Let's first mention a special part of the Rust type system: auto traits [9]. The most common ones are Send and Sync. An auto trait is automatically implemented for every type unless the type explicitly opts out with a negative impl, and negative impls propagate: a struct containing an Rc, which is !Send, is itself !Send. Through auto traits and negative impls, we can constrain certain types and have the compiler check those constraints.

For example, the anchored [10] crate is a small tool built on auto traits and the generator capture mechanism. It prevents specified variables in an asynchronous function from crossing an .await point. One scenario where this is useful is obtaining interior mutability for variables in asynchronous code.

Generally, interior mutability is provided through an asynchronous lock such as tokio::sync::Mutex. If the variable does not cross an .await point and so is not captured by the generator state, a synchronous lock such as std::sync::Mutex, or a RefCell, can be used as well. If you want higher performance and want to avoid the runtime overhead of both, you can also consider UnsafeCell or other unsafe means, but that is somewhat dangerous. With anchored, we can keep the unsafe factors in this scenario under control and provide interior mutability safely. Mark the variable with the ZST anchored::Anchored and add an attribute to the whole async fn, and the compiler can confirm for us that nothing is mistakenly captured and carried across an .await, which could otherwise lead to catastrophic data races.
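The propagation of !Send through a containing struct can be checked on stable Rust. Negative impls themselves are not available on stable, so the sketch below only observes the result, using a well-known associated-constant trick to ask whether a concrete type is Send. Probe, NotSend, Plain and HoldsRc are hypothetical names introduced here and are not part of any library.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

/// Hypothetical probe type, used only at the type level.
#[allow(dead_code)]
struct Probe<T: ?Sized>(PhantomData<T>);

/// Fallback: every type gets `IS_SEND = false` through this trait.
trait NotSend {
    const IS_SEND: bool = false;
}
impl<T: ?Sized> NotSend for T {}

/// Inherent constant: it exists only when `T: Send`, and an inherent
/// item takes priority over the trait fallback above.
impl<T: ?Sized + Send> Probe<T> {
    const IS_SEND: bool = true;
}

#[allow(dead_code)]
struct Plain {
    id: u32,
}

#[allow(dead_code)]
struct HoldsRc {
    inner: Rc<u32>, // Rc is !Send, so HoldsRc is !Send as well
}

fn plain_is_send() -> bool {
    <Probe<Plain>>::IS_SEND
}

fn holds_rc_is_send() -> bool {
    <Probe<HoldsRc>>::IS_SEND
}

fn main() {
    println!("Plain is Send: {}", plain_is_send());
    println!("HoldsRc is Send: {}", holds_rc_is_send());
}
```

Nothing was written for HoldsRc by hand: the compiler derives its !Send status from the Rc field, which is the property that tools built on auto traits rely on.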

For example, this compiles:

#[unanchored]
async fn foo(){
    {
        let bar = Anchored::new(Bar {});
    }
    async_fn().await;
}

The following, in contrast, leads to a compilation error:

#[unanchored]
async fn foo(){
    let bar = Anchored::new(Bar {});
    async_fn().await;
    drop(bar);
}
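The same scoping discipline is what makes a synchronous lock usable inside an async fn, as mentioned earlier for std::sync::Mutex. The following is a minimal sketch under stated assumptions: pause, bump, NoopWake and the busy-polling block_on are hypothetical helpers written for this illustration, and a real program would use an executor from an async runtime instead.

```rust
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

async fn pause() {}

/// Increments the counter, awaits, then reads the counter back.
async fn bump(counter: &Mutex<u64>) -> u64 {
    {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
    } // guard released here, so it never becomes part of the generator state
    pause().await;
    let value = *counter.lock().unwrap();
    value
}

/// A waker that does nothing; enough for a busy-polling loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical minimal executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    let counter = Mutex::new(0u64);
    println!("after first bump: {}", block_on(bump(&counter)));
    println!("after second bump: {}", block_on(bump(&counter)));
}
```

Because each guard is dropped before the next await point, the lock is never held while the task is suspended.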

For common types such as std's Mutex, Ref and RefMut, clippy provides two lints [11], which are also implemented by analyzing the generator's type. Like anchored, they share a drawback: unless the variable is placed explicitly in a separate block as above, false positives occur [12]. Because local variables are also recorded in other forms [13], the information is polluted.

At present anchored still lacks some ergonomic interfaces, and there are some problems when the attribute macro interacts with other tools in the ecosystem. Anyone interested is welcome to take a look: https://github.com/waynexia/a...

Documentation: https://docs.rs/anchored/0.1....
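The idea of checking captures through the generator's type can be tried with Send and Rc, which is !Send. In the sketch below the Rc is confined to its own block, the form described above as reliable, and the resulting future is accepted by a function that demands Send. The names pause, scoped, require_send, NoopWake and run_scoped are hypothetical and exist only for this illustration.

```rust
use std::future::Future;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

async fn pause() {}

async fn scoped() -> u32 {
    let n = {
        let rc = Rc::new(7u32); // Rc is !Send
        let value = *rc;
        value
    }; // `rc` is dropped here, before the await point
    pause().await;
    n
}

/// Only accepts futures whose anonymous type is Send.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

/// A waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Passes the future through the Send check and polls it once.
fn run_scoped() -> Poll<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(require_send(scoped()));
    let out = fut.as_mut().poll(&mut cx);
    out
}

fn main() {
    println!("{:?}", run_scoped());
}
```

If rc were instead declared outside the block and used after pause().await, it would be captured, the future would no longer be Send, and the call to require_send would be rejected at compile time.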

"Reference"

[1] https://blog.rust-lang.org/20...

[2] https://docs.rs/futures/0.1.2...

[3] https://github.com/rust-lang/...

[4] https://doc.rust-lang.org/std...

[5] https://blog.rust-lang.org/20...

[6] https://en.wikipedia.org/wiki...

[7] https://doc.rust-lang.org/std...

[8] https://github.com/rust-lang/...

[9] https://doc.rust-lang.org/bet...

[10] https://crates.io/crates/anch...