dpdk-16.04: a demo and an implementation analysis of detecting interrupts by monitoring the uio file

Preface

In the article "EAL: Error reading from file descriptor 33: Input/output error", I described the Input/output error that is printed when a dpdk program uses the 82545EM virtual network card in a VMware environment.

That problem was finally solved by patching the igb_uio code. After the fix, I could not help wondering how the user-mode side works. I knew roughly that the uio file is monitored through epoll, but the specific process was not clear to me.

In this article, I use a demo that simulates the dpdk-16.04 interrupt thread to study the key steps by which dpdk monitors network card interrupt events through the uio file.

A demo of detecting interrupts by monitoring the uio file

Kernel information of the machine running the demo:

longyu@debian:~/epoll$ uname -a
Linux debian 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64 GNU/Linux

Network card binding information:

longyu@debian:~/epoll$ sudo python ../dpdk-16.04/tools/dpdk_nic_bind.py -s

Network devices using DPDK-compatible driver
============================================
0000:02:05.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio unused=e1000

To fix a compilation problem on this kernel, the dpdk-16.04 igb_uio.c code was modified as follows:

--- lib/librte_eal/linuxapp/igb_uio/igb_uio.c  
+++ lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -442,7 +442,7 @@
        case RTE_INTR_MODE_MSIX:
                /* Only 1 msi-x vector needed */
                msix_entry.entry = 0;
-               if (pci_enable_msix(dev, &msix_entry, 1) == 0) {
+               if (pci_enable_msix_range(dev, &msix_entry, 1, 1) == 0) {

The demo program is extracted from dpdk-16.04 and simplified. The source code is as follows:

#include <stdio.h>
#include <stdarg.h>
#include <errno.h>
#include <sys/epoll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

static void eal_intr_handle_interrupts(int pfd, unsigned totalfds);

#define rte_panic(...) rte_panic_(__func__, __VA_ARGS__, "dummy")
#define rte_panic_(func, format, ...) __rte_panic(func, format "%.0s", __VA_ARGS__)

/* call abort(), it will generate a coredump if enabled */
static void __rte_panic(const char *funcname, const char *format, ...)
{
  va_list ap;

  va_start(ap, format);
  vprintf(format, ap);
  va_end(ap);
  abort();
}

static void epoll_uio_file(int fd)
{
  struct epoll_event ev;

  for (;;) {
    unsigned numfds = 0;

    /* create epoll fd */
    int pfd = epoll_create(1);
    if (pfd < 0)
      rte_panic("Cannot create epoll instance\n");

    ev.events = EPOLLIN | EPOLLPRI;
    ev.data.fd = fd;

    if (epoll_ctl(pfd, EPOLL_CTL_ADD, fd, &ev) < 0){
      rte_panic("Error adding fd %d epoll_ctl, %s\n",
                fd, strerror(errno));
    } else {
      numfds++;
    }

    /* serve the interrupt */
    eal_intr_handle_interrupts(pfd, numfds);

    /**
     * when we return, we need to rebuild the
     * list of fds to monitor.
     */
    close(pfd);
  }
}

#define EAL_INTR_EPOLL_WAIT_FOREVER -1

static void
eal_intr_handle_interrupts(int pfd, unsigned totalfds)
{
  struct epoll_event events[totalfds];
  int nfds = 0;
  int bytes_read;
  char buf[1024];

  for(;;) {
    nfds = epoll_wait(pfd, events, totalfds,
                      EAL_INTR_EPOLL_WAIT_FOREVER);
    /* epoll_wait fail */
    if (nfds < 0) {
      if (errno == EINTR)
        continue;
      printf("epoll_wait returns with fail\n");
      return;
    }
    /* epoll_wait timeout, never happens here */
    else if (nfds == 0)
      continue;

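    /*
     * epoll_wait has at least one fd ready to read. The demo asks for a
     * single byte, matching the strace output below; on a working uio
     * device, read() expects a 4-byte buffer for the interrupt count.
     */
    bytes_read = 1;
    bytes_read = read(events[0].data.fd, buf, bytes_read);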

    if (bytes_read < 0) {
      if (errno == EINTR || errno == EWOULDBLOCK)
        continue;

      printf("Error reading from file "
              "descriptor %d: %s\n",
              events[0].data.fd,
              strerror(errno));
    }
  }
}

#define UIO_PATH "/dev/uio0"

int main(void)
{
  int fd;

  fd = open(UIO_PATH, O_RDWR);

  if (fd < 0) {
    rte_panic("open %s failed\n", UIO_PATH);
  }

  epoll_uio_file(fd);

  return 0;
}

The key process of the above demo is as follows:

  1. Open the uio file that the uio framework generates for the network interface bound to igb_uio
  2. Call the epoll_uio_file function with the fd obtained in step 1 as its argument
  3. epoll_uio_file creates an epoll instance and adds the incoming fd to the monitoring list
  4. epoll_uio_file then calls eal_intr_handle_interrupts, which calls epoll_wait to wait for events. When an event occurs, it calls read to read the event content

Demo run output

Running the demo prints the following log:

Error reading from file descriptor 3: Input/output error
Error reading from file descriptor 3: Input/output error
Error reading from file descriptor 3: Input/output error
Error reading from file descriptor 3: Input/output error
Error reading from file descriptor 3: Input/output error
Error reading from file descriptor 3: Input/output error

The output shows the same problem as "EAL: Error reading from file descriptor 33: Input/output error".

The strace output is as follows:

openat(AT_FDCWD, "/dev/uio0", O_RDWR)   = 3
epoll_create(1)                         = 4
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLPRI, {u32=3, u64=3}}) = 0
epoll_wait(4, [{EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=3, u64=3}}], 1, -1) = 1
read(3, 0x7ffcdaac3480, 1)              = -1 EIO (Input/output error)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
brk(NULL)                               = 0x562f29f41000
brk(0x562f29f62000)                     = 0x562f29f62000
write(1, "Error reading from file descript"..., 57) = 57
epoll_wait(4, [{EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=3, u64=3}}], 1, -1) = 1
read(3, 0x7ffcdaac3480, 1)              = -1 EIO (Input/output error)
write(1, "Error reading from file descript"..., 57) = 57
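
The strace output above reports EPOLLERR and EPOLLHUP alongside EPOLLIN|EPOLLPRI on every wake-up, but the demo never looks at the returned mask because it goes straight to read(). As a minimal sketch, the mask could be decoded before reading so that error conditions show up explicitly in the log. The helper names format_epoll_events and epoll_events_failed are hypothetical and are not part of dpdk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/*
 * Hypothetical helper, not part of dpdk: render the epoll event bits
 * seen in the strace output as an "A|B|C" string.
 * Returns out so the call can be used directly inside printf().
 */
static const char *format_epoll_events(uint32_t events, char *out, size_t len)
{
  static const struct { uint32_t bit; const char *name; } names[] = {
    { EPOLLIN,  "EPOLLIN"  },
    { EPOLLPRI, "EPOLLPRI" },
    { EPOLLERR, "EPOLLERR" },
    { EPOLLHUP, "EPOLLHUP" },
  };
  size_t used = 0;

  if (len == 0)
    return out;
  out[0] = '\0';

  for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
    if (!(events & names[i].bit))
      continue;
    int n = snprintf(out + used, len - used, "%s%s",
                     used ? "|" : "", names[i].name);
    if (n < 0 || (size_t)n >= len - used)
      break; /* output truncated, stop appending */
    used += (size_t)n;
  }
  return out;
}

/* Nonzero when the mask carries an error or hang-up condition. */
static int epoll_events_failed(uint32_t events)
{
  return (events & (EPOLLERR | EPOLLHUP)) != 0;
}
```

In the demo loop, passing events[0].events to format_epoll_events before the read() call would print the same mask that strace shows, and epoll_events_failed would flag the fd before the read fails with EIO.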

Features and implementation of interrupt detection through the uio file in dpdk-16.04

1. One interface supports registering multiple interrupt callbacks

Interrupt sources are chained together in a linked list, and each interrupt source has its own linked list of interrupt callbacks. An interrupt callback is defined as a callback function plus its argument.

The interrupt callback and interrupt source structures are defined as follows:

struct rte_intr_callback {
	TAILQ_ENTRY(rte_intr_callback) next;
	rte_intr_callback_fn cb_fn;  /**< callback address */
	void *cb_arg;                /**< parameter for callback */
};

struct rte_intr_source {
	TAILQ_ENTRY(rte_intr_source) next;
	struct rte_intr_handle intr_handle; /**< interrupt handle */
	struct rte_intr_cb_list callbacks;  /**< user callbacks */
	uint32_t active;
};

dpdk-16.04 does not check interrupt callbacks for uniqueness, so the same callback can be registered more than once.
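
The two-level list can be illustrated with a small user-space sketch built on sys/queue.h. It is a simplified stand-in, not the dpdk code: the struct and function names (intr_source, intr_callback_register, intr_dispatch, count_interrupt) are hypothetical, and a plain fd replaces rte_intr_handle. Like the behavior described above, registration appends without checking for duplicates.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

typedef void (*intr_callback_fn)(int fd, void *arg);

/* Simplified stand-ins for rte_intr_callback and rte_intr_source. */
struct intr_callback {
  TAILQ_ENTRY(intr_callback) next;
  intr_callback_fn cb_fn;        /* callback address */
  void *cb_arg;                  /* parameter for callback */
};
TAILQ_HEAD(intr_cb_list, intr_callback);

struct intr_source {
  TAILQ_ENTRY(intr_source) next;
  int fd;                        /* assumption: stands in for intr_handle */
  struct intr_cb_list callbacks; /* user callbacks */
};
TAILQ_HEAD(intr_source_list, intr_source);

static struct intr_source_list intr_sources =
  TAILQ_HEAD_INITIALIZER(intr_sources);

/*
 * Append a callback to the source that owns fd, creating the source
 * on first use. No uniqueness check is made.
 * Returns 0 on success, -1 on allocation failure.
 */
static int intr_callback_register(int fd, intr_callback_fn fn, void *arg)
{
  struct intr_source *src;
  struct intr_callback *cb = calloc(1, sizeof(*cb));

  if (cb == NULL)
    return -1;
  cb->cb_fn = fn;
  cb->cb_arg = arg;

  /* src is NULL after the loop when no source matches fd */
  TAILQ_FOREACH(src, &intr_sources, next)
    if (src->fd == fd)
      break;

  if (src == NULL) {
    src = calloc(1, sizeof(*src));
    if (src == NULL) {
      free(cb);
      return -1;
    }
    src->fd = fd;
    TAILQ_INIT(&src->callbacks);
    TAILQ_INSERT_TAIL(&intr_sources, src, next);
  }
  TAILQ_INSERT_TAIL(&src->callbacks, cb, next);
  return 0;
}

/* Run every callback registered for fd; returns how many were called. */
static int intr_dispatch(int fd)
{
  struct intr_source *src;
  struct intr_callback *cb;
  int called = 0;

  TAILQ_FOREACH(src, &intr_sources, next) {
    if (src->fd != fd)
      continue;
    TAILQ_FOREACH(cb, &src->callbacks, next) {
      cb->cb_fn(fd, cb->cb_arg);
      called++;
    }
  }
  return called;
}

/* Example callback: increment the counter passed as the argument. */
static void count_interrupt(int fd, void *arg)
{
  (void)fd;
  (*(int *)arg)++;
}
```

Registering count_interrupt twice for the same fd and then dispatching once runs the callback twice, which is the duplicate-registration case mentioned above.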

2. Efficient event monitoring that captures and handles interrupt events promptly

dpdk-16.04 uses epoll to monitor interrupt events. When an interrupt is registered, the file descriptor of the uio file generated by binding the PCI network card to igb_uio is added to the epoll set. After registration completes, epoll_wait monitors whether an interrupt has been triggered.
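
Once epoll_wait reports the uio fd as readable, the pending interrupt has to be consumed with read(). The sketch below assumes a working uio device, where, per the Linux UIO documentation, each read returns a 32-bit interrupt count. The helper name read_intr_count is hypothetical, and the function is written against a generic fd so it can be tried without a uio device.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one 32-bit interrupt counter from fd.
 * Retries when interrupted by a signal.
 * Returns 0 on success, -1 on error or on a short read.
 */
static int read_intr_count(int fd, int32_t *count)
{
  ssize_t n;

  do {
    n = read(fd, count, sizeof(*count));
  } while (n < 0 && errno == EINTR);

  return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

On the 82545EM setup shown earlier this call would still return -1, because the read itself fails with EIO.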

3. Dynamic registration and destruction of interrupt events

dpdk-16.04 creates a pipe that is used to rebuild the set of monitored interrupt events. The read end of the pipe is also added to the epoll set. After an interrupt is registered, data is written to the write end of the pipe. When the interrupt handling thread sees data on the read end, it rebuilds the monitored event list.
Similarly, when an interrupt event is destroyed, data is written to the write end of the pipe to notify the interrupt handling thread, which then rebuilds the event list.
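
The pipe-based notification can be sketched in a few functions. This is an illustration of the technique under stated assumptions, not the dpdk source: the names wakeup_pipe, build_epoll_set, request_rebuild and wait_for_rebuild are hypothetical, and only the pipe's read end is placed in the epoll set to keep the example short.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical wrapper around the two ends of the notification pipe. */
struct wakeup_pipe {
  int readfd;
  int writefd;
};

/* Create the pipe; returns 0 on success, -1 on failure. */
static int wakeup_pipe_init(struct wakeup_pipe *p)
{
  int fds[2];

  if (pipe(fds) < 0)
    return -1;
  p->readfd = fds[0];
  p->writefd = fds[1];
  return 0;
}

/* Build an epoll instance that watches the pipe's read end.
 * Returns the epoll fd, or -1 on failure. */
static int build_epoll_set(const struct wakeup_pipe *p)
{
  struct epoll_event ev = { .events = EPOLLIN, .data.fd = p->readfd };
  int pfd = epoll_create1(0);

  if (pfd < 0)
    return -1;
  if (epoll_ctl(pfd, EPOLL_CTL_ADD, p->readfd, &ev) < 0) {
    close(pfd);
    return -1;
  }
  return pfd;
}

/* Called after a register or unregister: wake the interrupt thread. */
static int request_rebuild(const struct wakeup_pipe *p)
{
  return write(p->writefd, "1", 1) == 1 ? 0 : -1;
}

/*
 * Wait up to timeout_ms for a rebuild request.
 * Returns 1 when the pipe fired (the caller should rebuild its fd list),
 * 0 on timeout or an unrelated event, -1 on error.
 */
static int wait_for_rebuild(int pfd, const struct wakeup_pipe *p,
                            int timeout_ms)
{
  struct epoll_event ev;
  char drain[16];
  int n = epoll_wait(pfd, &ev, 1, timeout_ms);

  if (n < 0)
    return -1;
  if (n == 0 || ev.data.fd != p->readfd)
    return 0;
  /* Drain the pipe so the next epoll_wait blocks again. */
  if (read(p->readfd, drain, sizeof(drain)) < 0)
    return -1;
  return 1;
}
```

In a full interrupt thread, a return value of 1 would be the signal to close the epoll fd and build a new one from the current source list, which matches the outer loop of epoll_uio_file in the demo.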

Keywords: epoll dpdk

Added by pl_harish on Sun, 06 Mar 2022 10:41:20 +0200