Use ebpf to monitor node Time consuming of JS event loop

Powerful ebpf is more and more widely used and can do more and more things, especially the elegant way without invasion is a good choice for technology selection. This article describes how to use ebpf to monitor nodes JS, so as to understand node JS event loop. However, this is only coarse-grained monitoring. You want to understand node in detail JS operation, there is still a lot to do.

On node In JS, we can know the execution time of JS through the cpupprofile of V8 Inspector, but cpupprofile can't see the execution time of C and C + + code. Generally, we can use perf tool to determine the time consumption of C and C + + code, but this is an exploration through ebpf. First, let's take a look at the monitoring of the poll io phase. First, define a structure to record the time consumption.

struct event 
    __u64 start_time;
    __u64 end_time; 

Then write the bpf program.

#include <linux/bpf.h>#include <linux/ptrace.h>#include <bpf/bpf_helpers.h>#include <bpf/bpf_tracing.h>#include "uv.h"#include "uv_uprobe.h"
char LICENSE[] SEC("license") = "Dual BSD/GPL";#define MAX_ENTRIES 10240 / / used to record data struct {_uint (type, bpf_map_type_hash);
    __uint(max_entries, MAX_ENTRIES);
    __type(key, __u32);
    __type(value, const char *);} values SEC(".maps");
// Used to input data to user layer struct {_uint (type, bpf_map_type_perf_event_array);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));} events SEC(".maps");static __u64 id = 0;SEC("uprobe/uv__io_poll")int BPF_KPROBE(uprobe_uv__io_poll, uv_loop_t* loop, int timeout){
    __u64 current_id = id;
    __u64 time = bpf_ktime_get_ns();
    bpf_map_update_elem(&values, &current_id, &time, BPF_ANY);
    return 0;}
SEC("uretprobe/uv__io_poll")int BPF_KRETPROBE(uretprobe_uv__io_poll){   
    __u64 current_id     __u64 current_id = id;
    __u64 *time = bpf_map_lookup_elem(&values, &current_id);
    if (!time) {
        return 0;
    struct event e;
    // Record the start time and end time
    e.start_time = *time;
    e.end_time = bpf_ktime_get_ns();
    // Output to user layer
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &e, sizeof(e));
    bpf_map_delete_elem(&values, &current_id);
    return 0;}

Finally, write the code using ebpf program, and only list the core code.

#include <errno.h>#include <stdio.h>#include <unistd.h>#include <sys/resource.h>#include <bpf/libbpf.h>#include "uv_uprobe.skel.h"#include "uprobe_helper.h"#include <signal.h>#include <bpf/bpf.h>#include "uv_uprobe.h" / / output the result function static void handle_event(void *ctx, int cpu, void *data, __u32 data_sz){    const struct event *e = (const struct event *)data;
    printf("%s %llu\n", "poll io", (e->end_time - e->start_time) / 1000 / 1000);}
int main(int argc, char **argv){    struct uv_uprobe_bpf *skel;
    long base_addr, uprobe_offset;
    int err, i;
    struct perf_buffer_opts pb_opts;
    struct perf_buffer *pb = NULL;
    // Which node to monitor JS process
    char * pid_str = argv[1];
    pid_t pid = (pid_t)atoi(pid_str);
    char execpath[500];
    // Find node according to pid JS executable
    int ret = get_pid_binary_path(pid, execpath, 500);
    // Functions to be monitored, uv__io_poll is a function that handles the poll io phase
    char * func = "uv__io_poll";
    // Get the address of the function from the executable file
    uprobe_offset = get_elf_func_offset(execpath, func);
    // Load bpf program into kernel
    skel = uv_uprobe_bpf__open();
    err = uv_uprobe_bpf__load(skel);
    // Mount monitoring point
    skel->links.uprobe_uv__io_poll = bpf_program__attach_uprobe(skel->progs.uprobe_uv__io_poll,
                            false /* not uretprobe */,
    skel->links.uretprobe_uv__io_poll = bpf_program__attach_uprobe(skel->progs.uretprobe_uv__io_poll,
                               true /* uretprobe */,
                               -1 /* any pid */,
    // Set the callback to handle the output of bpf
    pb_opts.sample_cb = handle_event;
    pb_opts.lost_cb = handle_lost_events;
    pb = perf_buffer__new(bpf_map__fd(skel->, PERF_BUFFER_PAGES,
    printf("%-7s %-7s\n", "phase", "interval");           
    for (i = 0; ; i++) {
        // Wait for the output of bpf, and then execute callback processing, which is implemented based on epoll
        perf_buffer__poll(pb, PERF_POLL_TIMEOUT_MS);

Compile the above code and start a node JS process, and then put node Execute the above code with the pid of the JS process as a parameter, and you can see the time-consuming of the poll io phase. Generally, if node No task in JS will block epoll_wait, so we can't observe the time-consuming. We just need to write a timer in the code.

setInterval(() => {}, 3000);

We can see that the poll io takes about 3s, because when there is a timer, the poll io will return after waiting for 3s at most, that is, the time of the whole poll io stage. After understanding the basic implementation, let's monitor the time consumption of each stage of the whole event cycle. The principle is similar. First define a macro that handles multiple stages.

#define PHASE(uprobe) \
    uprobe(uv__run_timers) \ 
    uprobe(uv__run_pending) \
    uprobe(uv__run_idle) \
    uprobe(uv__run_prepare) \
    uprobe(uv__io_poll) \
    uprobe(uv__run_check) \

Then change the bpf code.

#define PROBE(type) \
SEC("uprobe/" #type) \
int BPF_KPROBE(uprobe_##type) \
{ \
    char key[20] = #type; \
    __u64 time = bpf_ktime_get_ns(); \
    bpf_map_update_elem(&values, &key, &time, BPF_ANY); \
    return 0; \
} \
SEC("uretprobe/" #type) \
int BPF_KRETPROBE(uretprobe_##type) \
{   \
    char key[20] = #type; \
    __u64 *time = bpf_map_lookup_elem(&values, &key); \
    if (!time) { \
        return 0; \
    } \
    struct event e = { \
        .name=#type \
    }; \
    e.start_time = *time; \
    e.end_time = bpf_ktime_get_ns(); \
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &e, sizeof(e)); \
    bpf_map_delete_elem(&values, key); \
    return 0; \

We can see that the code is the same as the previous bpf code, but it is convenient to define multiple stages through macros to avoid repeated code. This paper mainly introduces some knowledge of using C# A is equal to "a", a##b is equal to AB, and "a" and "B" are equal to "ab" (there is a space between "a" and "B"). Similarly, after writing the bpf code, change the code of the main program.

    #define ATTACH_UPROBE(type)  \
        do \
        {   char * func_##type = #type; \
            uprobe_offset = get_elf_func_offset(execpath, func_##type); \
            if (uprobe_offset == -1) { \
                fprintf(stderr, "invalid function &s: %s\n", func_##type); \
                break; \
            } \
            fprintf(stderr, "uprobe_offset: %ld\n", uprobe_offset);\
            skel->links.uprobe_##type = bpf_program__attach_uprobe(skel->progs.uprobe_##type,\
                                    false /* not uretprobe */,\
            skel->links.uretprobe_##type = bpf_program__attach_uprobe(skel->progs.uretprobe_##type,\
                                    true /* uretprobe */,\
                                    pid /* any pid */,\
        } while(false); 


Similarly, the code is the same, but it becomes a macro definition, and then the duplicate code is defined through PHASE(ATTACH_UPROBE). The reason why do while(false) is used here is that if there is a problem in the processing of a certain stage, it will be ignored. Because we cannot return directly, do while is a better implementation. Because when I tested, two phases failed because I couldn't find the address of the corresponding function. Finally, write a test code.

function compute() {
    let sum = 0;
    for(let i = 0; i < 10000000; i++) {
        sum += i;
    }}setInterval(() => {    compute();
    setImmediate(() => {
    });}, 10000)

See output after execution.

Postscript: This paper roughly introduces the implementation of node. Net based on ebpf JS event cycle time-consuming monitoring, this is only a very preliminary exploration, if you have good ideas, welcome to communicate.

Code warehouse:

Added by rizi on Tue, 21 Dec 2021 17:59:54 +0200