C++ asyncio hands-on experience

asyncio is a C++20 library for writing concurrent code with the async/await syntax, modeled on Python's asyncio. These are personal study notes only.

Installation

Environment setup

  1. Compiling on a small machine may run out of memory; add swap space first
    See: How to increase virtual memory on a Linux system

  2. You need the new features of C++20. The g++ shipped with Ubuntu is very old; install version 11 or 12
    To switch versions, you can symlink to a different version (e.g. via update-alternatives)
    See: Upgrading gcc and g++ to 10 on Ubuntu

  3. Remove the built-in CMake and replace it with a newer release
    See: Compiling and installing CMake

Compiling the project source code

  1. Build
$ git clone --recursive https://github.com/netcan/asyncio.git
$ cd asyncio
$ mkdir build
$ cd build
$ cmake ..
$ make -j
  2. Test
$ cd asyncio
$ cmake .
$ make
  3. Run
    In the test folder you can see the generated test programs: hello_world, echo_client, echo_server, etc.

Examples

Hello World

Task<> hello_world() {
    fmt::print("hello\n");
    co_await asyncio::sleep(1s);
    fmt::print("world\n");
}

int main() {
    asyncio::run(hello_world());
}

output:

hello
world

Dump callstack

Prints the current coroutine call stack:

Task<int> factorial(int n) {
    if (n <= 1) {
        co_await dump_callstack();
        co_return 1;
    }
    co_return (co_await factorial(n - 1)) * n;
}

int main() {
    fmt::print("run result: {}\n", asyncio::run(factorial(10)));
    return 0;
}

output:

[0] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:17
[1] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[2] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[3] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[4] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[5] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[6] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[7] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[8] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[9] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20

run result: 3628800

TCP Echo

Client

Task<> tcp_echo_client(std::string_view message) {
    auto stream = co_await asyncio::open_connection("127.0.0.1", 8888);

    fmt::print("Send: '{}'\n", message);
    co_await stream.write(Stream::Buffer(message.begin(), message.end()));

    auto data = co_await stream.read(100);
    fmt::print("Received: '{}'\n", data.data());

    fmt::print("Close the connection\n");
    stream.close(); // unneeded, just imitate python
}

int main(int argc, char** argv) {
    asyncio::run(tcp_echo_client("hello world!"));
    return 0;
}

output:

Send: 'hello world!'
Received: 'hello world!'
Close the connection

Server

Task<> handle_echo(Stream stream) {
    auto& sockinfo = stream.get_sock_info();
    auto sa = reinterpret_cast<const sockaddr*>(&sockinfo);
    char addr[INET6_ADDRSTRLEN] {};

    auto data = co_await stream.read(100);
    fmt::print("Received: '{}' from '{}:{}'\n", data.data(),
               inet_ntop(sockinfo.ss_family, get_in_addr(sa), addr, sizeof addr),
               get_in_port(sa));

    fmt::print("Send: '{}'\n", data.data());
    co_await stream.write(data);

    fmt::print("Close the connection\n");
    stream.close(); // unneeded, just imitate python
}

Task<void> amain() {
    auto server = co_await asyncio::start_server(
            handle_echo, "127.0.0.1", 8888);

    fmt::print("Serving on 127.0.0.1:8888\n");

    co_await server.serve_forever();
}

int main() {
    asyncio::run(amain());
    return 0;
}

output:

Serving on 127.0.0.1:8888
Received: 'Hello World!' from '127.0.0.1:49588'
Send: 'Hello World!'
Close the connection

Gather

auto factorial(std::string_view name, int number) -> Task<int> {
    int r = 1;
    for (int i = 2; i <= number; ++i) {
        fmt::print("Task {}: Compute factorial({}), currently i={}...\n", name, number, i);
        co_await asyncio::sleep(500ms);
        r *= i;
    }
    fmt::print("Task {}: factorial({}) = {}\n", name, number, r);
    co_return r;
};

auto test_void_func() -> Task<> {
    fmt::print("this is a void value\n");
    co_return;
};

int main() {
    asyncio::run([&]() -> Task<> {
        auto&& [a, b, c, _void] = co_await asyncio::gather(
            factorial("A", 2),
            factorial("B", 3),
            factorial("C", 4),
            test_void_func());
        assert(a == 2);
        assert(b == 6);
        assert(c == 24);
    }());
}

output:

Task A: Compute factorial(2), currently i=2...
Task B: Compute factorial(3), currently i=2...
Task C: Compute factorial(4), currently i=2...
this is a void value
Task C: Compute factorial(4), currently i=3...
Task A: factorial(2) = 2
Task B: Compute factorial(3), currently i=3...
Task B: factorial(3) = 6
Task C: Compute factorial(4), currently i=4...
Task C: factorial(4) = 24

FAQ

How are cancelled coroutine handles dealt with?

Q: Technically, cancel_handle can be given a handle that no longer exists in the event loop queue. In that case, doesn't a cancelled event become a dangling-handle problem?

void cancel_handle(Handle& handle) {
    cancelled_.insert(&handle);
}

A: In some cases it may leak memory, but it is safe. The cancelled_ set stores the destroyed handle; when that handle becomes ready, the event loop notices it, skips it, and removes it from the cancelled_ set.
A: You're right, though: I found a bug in the released version. When a handle is destroyed it is inserted into the cancelled_ set, and a newly created coroutine can then end up at the same address as the destroyed coroutine handle, so the loop would delete the newly created coroutine. Fixed by this patch: https://github.com/netcan/asyncio/commit/23e6a38f5d00b55037f9560845c4e44948e41709

How do coroutines perform compared with other approaches?

Q: First of all, great work! But when should I use coroutines and when not? They are quite new; how do they perform compared with other approaches?
A: Good question. From my point of view, a coroutine is just syntactic sugar over callbacks. In other words, any scenario requiring a callback interface can be replaced by a coroutine. Typical asynchronous programming involves a large number of callbacks, so coroutine code is more readable than the callback style.
A: In terms of performance, a coroutine is just a resumable function: it supports suspension and resumption. I measured coroutine call/suspend/resume, and each takes only tens of nanoseconds; compared with callback programming it can even have negative overhead. For more detail, see the talk by the designer of C++ coroutines: https://www.youtube.com/watch?v=_fu0gx-xseY

Why are async primitives (async mutexes / async condition variables) needed even in single-threaded mode?

Q: I'm curious, can you share what these primitives (async mutexes, async condition variables) are for?
A: To build a fully asynchronous application, we should not block any thread in the thread pool. Traditional synchronization primitives that go through the operating system scheduler (such as std::mutex and std::condition_variable) are useless in this case; we need primitives that cooperate with the application's own internal scheduler.
Q: OK, but forget that case; I'm curious about this library, where everything runs on a single thread.
A: You still need them. Even with only one thread (you just don't have to worry about data races), consider: how do you wait on a condition variable and notify it from elsewhere? How do you join multiple asynchronous tasks?
A: These primitives are needed. For example, in a game scenario, the server must wait until it has collected all commands from the clients before the coroutine continues with the game logic; that requires a condition variable.

io_uring is better than epoll

Q: These results are impressive, but depending on the situation IO latency can be on the order of milliseconds. Using io_uring or a user-space network stack, you can get IO in the microsecond/nanosecond range. The best ping-pong program I have seen has a time-to-first-byte of 1.2 microseconds, which includes: the NIC receiving the bytes, the PCI bus carrying them to the CPU, the CPU reading the request and writing the response, the PCI bus carrying the bytes back to the NIC, and the NIC sending them.
The main cost of IO today is system-call overhead. If you cannot avoid system calls entirely, io_uring is the most economical alternative and can give an order-of-magnitude speedup.
A: If I remember correctly, a system call costs about 100ns (an empty epoll_wait benchmark), but io_uring may well be faster than epoll; I have seen some related comparisons too.

Why does python asyncio perform so well?

Q: Why does python asyncio perform so well?
A: Because experienced developers have been optimizing Python's asyncio for many years. It will always be slower than C++ in the sense of fixed overhead, but it should be close to optimal in terms of scalability and edge-case handling.

How to print the call stack of a coroutine?

Q: In one example you printed the call stack. Am I right that this "async call stack" is different from the traditional call stack? How did you get this information? I'm curious because I've been trying to implement a tool to help with this.
A: Yes, it's the async call stack. The key is the await_transform() function on the coroutine promise type, which can capture the std::source_location of the co_await expression; in other words, whenever the user writes co_await, the location of that await is saved. Dumping the backtrace is then simple: just recursively print each coroutine's source_location and related information.

Added by moboter on Fri, 31 Dec 2021 16:41:36 +0200