Code debugging: an introduction, from practice to principle

In the previous article, we analyzed why coredumps occur in online services and touched on gdb, the tool used to analyze them. Since then, readers have been asking for an article on gdb debugging. This article shares some of the debugging experience gathered in our work, in the hope that it helps.

Foreword

In my own career, I developed on Windows for a few years and debugged with Visual Studio. It is a sharp tool: every kind of breakpoint can be set with a click of the mouse. Around 2012 I switched to Linux development, and since then my debugging has been based on GDB. I originally wanted this article to cover debugging on Windows as well, but I have not used it for many years and work has been busy, so this article only covers GDB debugging on Linux. My apologies to Windows developers 😃.

This article is fairly comprehensive. It summarizes the gdb debugging experience I have gathered in recent years (all of it basic stuff 😁) and the debugging techniques I use most often, and I hope it is helpful to anyone doing Linux development.

Background

For a C/C++ developer, keeping the program running correctly is the most basic and most important goal, and debugging is the most basic means of achieving it. Being familiar with these debugging methods lets us locate problems faster and develop more efficiently.

During development, if the program's results do not meet expectations, the first step is to open GDB, set breakpoints in the relevant places, and analyze the cause. When an online service has a problem, first check whether the process is still running. If it is not, check whether a coredump file was generated. If so, debug the coredump file with GDB; otherwise, analyze the kernel log with dmesg to find the cause.

Concept

GDB is a powerful command-line debugger for UNIX/Linux systems, released by the GNU project.

GDB supports breakpoints, single stepping, printing variables, watching variables, and inspecting registers and the stack. In Linux software development, GDB is the main tool for debugging C and C++ programs (it also supports Go and other languages).

Common commands

Breakpoints

Breakpoints are a feature we use all the time when debugging. Once a breakpoint is set at a given location, the program pauses when it reaches that location. We can then inspect variables, the stack, and so on to help us debug the program.

Commands to set breakpoints are divided into the following categories:

breakpoint
watchpoint
catchpoint

breakpoint

Breakpoints can be generated according to line numbers, functions and conditions. The following are related commands and corresponding function descriptions:

command	effect
break [file:]function	Set a breakpoint at the entry of function (optionally in file)
break [file:]line	Set a breakpoint at line (optionally in file)
info breakpoints	List all breakpoints
break [+-]offset	Set a breakpoint offset lines after (+) or before (-) the current position
break *addr	Set a breakpoint at address addr
break ... if expr	Set a conditional breakpoint that only stops when expr is true
ignore n count	Ignore the next count hits of breakpoint number n
clear	Delete the breakpoints at the next instruction to be executed in the selected stack frame
clear function	Delete the breakpoints set at the entry of function
delete	Delete all breakpoints
delete n	Delete breakpoint number n
enable n	Enable breakpoint number n
disable n	Disable breakpoint number n
save breakpoints file	Save the breakpoint definitions to file
source file	Load the breakpoint definitions saved in file
break	Set a breakpoint at the next instruction to be executed
clear [file:]line	Delete the breakpoints on line

watchpoint

Watchpoint is a special type of breakpoint, similar to normal breakpoint. It is a command that requires GDB to pause program execution. The difference is that the watchpoint does not reside in a line of source code, but instructs GDB to pause the execution whenever the value of an expression changes.

Watchpoints can be implemented in hardware or in software. The former requires hardware support. The latter works by checking whether the value has changed after every single step, which is much slower. When GDB creates a watchpoint (data breakpoint), it tries the hardware implementation first and falls back to the software implementation if that fails.

command	effect
watch variable	Stop when the value of variable changes (is written)
watch var1 + var2	Stop when the value of the expression changes
rwatch variable	Stop when variable is read; hardware implementation only
awatch variable	Stop when variable is read or written; hardware implementation only
info watchpoints	List all watchpoints
set can-use-hw-watchpoints 0	Force the software implementation

When using watchpoints, note the following:

When the watched variable is a local variable, the watchpoint is deleted as soon as the variable goes out of scope
When a pointer variable p is involved, watch *p watches the memory that p points to, while watch p watches whether the pointer itself changes

The most common use of watchpoints is to find out when a member of a structure on the heap gets modified. Because the pointer to that structure is usually a local variable, the watchpoint would be lost when the pointer goes out of scope. There are generally two ways around this: watch the memory address directly, or use watch -l.

command	effect
print &variable	Print the memory address of variable
watch *(type *)address	Set the watchpoint indirectly through the memory address
watch -l variable	Watch the memory location that the expression refers to (-l is short for -location)
watch variable thread 1	Stop only when thread number 1 modifies variable

catchpoint

As the name suggests, a catchpoint catches an event in the program and stops there. Typical events are a C++ throw, the loading of a shared library, or a system call.

command	meaning
catch fork	Interrupt when program calls fork
tcatch fork	The set breakpoint is triggered only once and then deleted automatically
catch syscall ptrace	Set breakpoints for ptrace system calls

Put a breakpoint number after the commands command to define the operations that run whenever that breakpoint is triggered. This is useful in some advanced automated debugging scenarios.

Command line

command	effect
run arglist	Run the program with arglist as its argument list
set args arglist	Set the command-line arguments used at startup
set args	Set an empty argument list
show args	Show the command-line arguments

Program stack

command	effect
backtrace [n]	Print the stack frames (only the innermost n if n is given)
frame [n]	Select stack frame n. Without n, print the currently selected stack frame
up n	Select the stack frame numbered current + n
down n	Select the stack frame numbered current - n
info frame [addr]	Describe the currently selected stack frame
info args	Print the arguments of the current stack frame
info locals	Print the local variables of the current stack frame

Multi process, multi thread

Multi process

GDB only tracks the parent process by default when debugging multi process programs (including fork calls). You can use command settings to track only the parent process or child process, or debug the parent process and child process at the same time.

command	effect
info inferiors	List the processes being debugged
attach pid	Attach to the process with ID pid
inferior num	Switch to process number num
print $_exitcode	Print the exit code of the program after it exits
set follow-fork-mode child	Follow the child process after fork
set follow-fork-mode parent	Follow the parent process after fork
set detach-on-fork on	Debug only one of the two processes after fork; the other is detached
set detach-on-fork off	Keep both the parent and the child process under GDB's control after fork

When debugging a multi-process program, by default only the current process runs and the other processes stay suspended. If the other processes should keep running while you debug the current one, use set schedule-multiple on.

Multithreading

Multithreaded development is very common in daily development work, so it is necessary to master multithreaded debugging skills.

By default, when debugging multiple threads, all threads will be suspended once the program is interrupted. If you continue to execute the current thread at this time, other threads will also execute at the same time.

command	effect
info threads	List all threads
print $_thread	Print the number of the current thread
set scheduler-locking on	Only the current thread runs; other threads stay paused
set scheduler-locking off	No locking; other threads run at the same time as the current thread
set scheduler-locking step	Other threads do not run during step, but may still run with other commands such as next and continue

If you only care about the current thread, it is recommended to temporarily set scheduler-locking to on. This keeps other threads from running at the same time, hitting other breakpoints and distracting you.

Printout

Usually, during debugging, we need to check the value of a variable to analyze whether it meets the expectations. At this time, we need to print out the variable value.

command	effect
whatis variable	Print the type of variable
ptype variable	Print the detailed type definition of variable
info variables var	Show where global and static variables matching var are defined. Local variables are not supported

Print string

Use the x/s command to print an ASCII string. For a wide-character string, first check the size of a wide character with print sizeof(wchar_t).

If the size is 2, print with x/hs; if the size is 4, print with x/ws.

command	effect
x/s str	Print the string
set print elements 0	Remove the limit on how many string characters or array elements are printed
call printf("%s\n",xxx)	Print the string through the program's printf, without the extra escape characters
printf "%s\n",xxx	Same as above, using GDB's built-in printf

Print array

command	effect
print *array@10	Print 10 consecutive elements starting from the first element of array
print array[60]@10	Print 10 elements starting at index 60, i.e. array[60] to array[69]
set print array-indexes on	Print the index of each element along with its value

Print pointer

command	effect
print ptr	Print the type and address of the pointer
print *(struct xxx *)ptr	Print the contents of the structure that ptr points to

Print the value at a specified memory address

Use the x command to print memory. Its format is x/nfu addr: starting at addr, print n units of memory of size u in format f.

n: Number of units to print
f: Output format, for example x for hexadecimal and o for octal; the default is x
u: Size of one unit: b is 1 byte, h is 2 bytes (halfword), w is 4 bytes (word), and g is 8 bytes (giant word)

command	effect
x/8xb array	Print the first 8 bytes of array in hexadecimal
x/8xw array	Print the first 8 words (32 bytes) of array in hexadecimal

Print local variables

command	effect
info locals	Print the local variables of the current function
backtrace full	Print every stack frame together with its local variables. Can be abbreviated as bt full
bt full n	Print the innermost n stack frames and their local variables
bt full -n	Print the outermost n stack frames and their local variables

Print structure

command	effect
set print pretty on	Print each member of the structure on its own line
set print null-stop	Stop printing a character array at the first '\000'

Function jump

command	effect
set step-mode on	Do not skip functions without debugging information; step stops at their first instruction so the assembly can be inspected
finish	Run until the current function returns, print the return value, then stop
return 0	Return from the current function immediately without executing the rest of it; a return value can be specified
call printf("%s\n", str)	Call printf to print the string (either call or print can be used to call a function)
print func()	Call the function func (either call or print can be used)
set var variable=xxx	Set the value of variable to xxx
set {type}address = xxx	Store xxx, as a value of type type, at memory address address
info frame	Show information about the selected stack frame (frame address, saved instruction pointer, etc.)

Other

Graphical

TUI is short for Text User Interface. You can enter it by passing the -tui option at startup, or toggle it during debugging with Ctrl+X followed by A.

command	meaning
layout src	Display source code window
layout asm	Show assembly window
layout split	Display source code + assembly window
layout regs	Display register + source code or assembly window
winheight src +5	Source window height increased by 5 lines
winheight asm -5	Reduce the assembly window height by 5 lines
winheight cmd +5	Increase console window height by 5 lines
winheight regs -5	Reduce the height of the register window by 5 lines

Assembly

command	meaning
disassemble function	Show the assembly code of function
disassemble /mr function	Show the source code of function interleaved with its assembly code (/m) and the raw instruction bytes (/r)

Debug and save core files

command	meaning
file exec_file	Load the symbol table of the executable file
core core_file	Load a core dump file
gcore core_file	Generate a core dump file that records the current state of the process

Start mode

GDB debugging can be started in the following ways:

gdb filename: debug an executable
gdb attach pid (or gdb -p pid): debug a running process by attaching to its process ID
gdb filename -c coredump_file: debug a coredump file generated by the executable

The following sections walk through each of these methods with examples.

Debugging

Executable file

Single thread

First, let's look at a piece of code:

#include <stdio.h>

void print(int xx, int *xxptr) {
  printf("In print():\n");
  printf("   xx is %d and is stored at %p.\n", xx, (void *)&xx);
  printf("   xxptr points to %p which holds %d.\n", (void *)xxptr, *xxptr);
}

int main(void) {
  int x = 10;
  int *ptr = &x;
  printf("In main():\n");
  printf("   x is %d and is stored at %p.\n", x, (void *)&x);
  printf("   ptr points to %p which holds %d.\n", (void *)ptr, *ptr);
  print(x, ptr);
  return 0;
}

This code is relatively simple. Let's start debugging:

gdb ./test_main
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/test_main...done.
(gdb) r
Starting program: /root/./test_main
In main():
   x is 10 and is stored at 0x7fffffffe424.
   ptr points to 0x7fffffffe424 which holds 10.
In print():
   xx is 10 and is stored at 0x7fffffffe40c.
   xxptr points to 0x7fffffffe424 which holds 10.
[Inferior 1 (process 31518) exited normally]
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64

In the commands above, we start debugging with gdb ./test_main and then run the program to completion with r (short for run). In other words, this is a complete run of an executable under gdb, using only the r command. Next, we use this program to introduce several common commands.

breakpoint

(gdb) b 15
Breakpoint 1 at 0x400601: file test_main.cc, line 15.
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000400601 in main() at test_main.cc:15
(gdb) r
Starting program: /root/./test_main
In main():
   x is 10 and is stored at 0x7fffffffe424.
   ptr points to 0x7fffffffe424 which holds 10.

Breakpoint 1, main () at test_main.cc:15
15	  print(x, ptr);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64
(gdb)

backtrace

(gdb) backtrace
#0  main () at test_main.cc:15
(gdb)

The backtrace command lists all frames in the current stack. In the above example, there is only one frame on the stack, numbered 0, belonging to the main function.

(gdb) step
print (xx=10, xxptr=0x7fffffffe424) at test_main.cc:4
4	  printf("In print():\n");
(gdb)

Next, we execute the step command, that is, enter the function. Next, we continue to view the stack frame information through the backtrace command.

(gdb) backtrace
#0  print (xx=10, xxptr=0x7fffffffe424) at test_main.cc:4
#1  0x0000000000400612 in main () at test_main.cc:15
(gdb)

From the above output results, we can see that there are two stack frames. Frame 1 belongs to the main function and frame 0 belongs to the print function.

Each stack frame lists the parameters of the function. From the above, we can see that the main function has no parameters, while the print function has parameters and displays the value of its parameters.

One thing that may be confusing: in the first backtrace the stack frame of main was numbered 0, while in the second backtrace main is frame 1 and print is frame 0. This follows the rule that the stack grows downward. We only need to remember that the frame with the smallest number belongs to the most recently called function.

frame

Stack frames are used to store information such as variable values of functions. By default, GDB is always located in the context of the stack frame corresponding to the currently executing function.

In the previous example, GDB is in the context of frame 0 because it is currently executing in the print() function. You can obtain the frame of the currently executing context through the frame command.

(gdb) frame
#0  print (xx=10, xxptr=0x7fffffffe424) at test_main.cc:4
4	  printf("In print():\n");
(gdb)

Next, we print the parameters visible in the current stack frame with the print command:

(gdb) print xx
$1 = 10
(gdb) print xxptr
$2 = (int *) 0x7fffffffe424
(gdb)

What if we want to see the contents of other stack frames? For example, what about the information of x and ptr in the main function? If you print these two values directly, you will get the following:

(gdb) print x
No symbol "x" in current context.
(gdb) print ptr
No symbol "ptr" in current context.
(gdb)

Here, we can use frame num to switch stack frames, as follows:

(gdb) frame 1
#1  0x0000000000400612 in main () at test_main.cc:15
15	  print(x, ptr);
(gdb) print x
$3 = 10
(gdb) print ptr
$4 = (int *) 0x7fffffffe424
(gdb)

Multithreading

To facilitate the demonstration, we create a simple example with the following code:

#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

int fun_int(int n) {
  std::this_thread::sleep_for(std::chrono::seconds(10));
  std::cout << "in fun_int n = " << n << std::endl;
  
  return 0;
}

int fun_string(const std::string &s) {
  std::this_thread::sleep_for(std::chrono::seconds(10));
  std::cout << "in fun_string s = " << s << std::endl;
  
  return 0;
}

int main() {
  std::vector<int> v;
  v.emplace_back(1);
  v.emplace_back(2);
  v.emplace_back(3);

  std::cout << v.size() << std::endl;

  std::thread t1(fun_int, 1);
  std::thread t2(fun_string, "test");

  std::cout << "after thread create" << std::endl;
  t1.join();
  t2.join();
  return 0;
}

The code above is straightforward:

The function fun_int sleeps for 10s and then prints its parameter
The function fun_string sleeps for 10s and then prints its parameter
The main function creates two threads that run these two functions

The following is a complete debugging process:

(gdb) b 27
Breakpoint 1 at 0x4013d5: file test.cc, line 27.
(gdb) b test.cc:32
Breakpoint 2 at 0x40142d: file test.cc, line 32.
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000004013d5 in main() at test.cc:27
2       breakpoint     keep y   0x000000000040142d in main() at test.cc:32
(gdb) r
Starting program: /root/test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, main () at test.cc:27
(gdb) c
Continuing.
3
[New Thread 0x7ffff6fd2700 (LWP 44996)]
in fun_int n = 1
[New Thread 0x7ffff67d1700 (LWP 44997)]

Breakpoint 2, main () at test.cc:32
32	  std::cout << "after thread create" << std::endl;
(gdb) info threads
  Id   Target Id         Frame
  3    Thread 0x7ffff67d1700 (LWP 44997) "test" 0x00007ffff7051fc3 in new_heap () from /lib64/libc.so.6
  2    Thread 0x7ffff6fd2700 (LWP 44996) "test" 0x00007ffff7097e2d in nanosleep () from /lib64/libc.so.6
* 1    Thread 0x7ffff7fe7740 (LWP 44987) "test" main () at test.cc:32
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff6fd2700 (LWP 44996))]
#0  0x00007ffff7097e2d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7097e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff7097cc4 in sleep () from /lib64/libc.so.6
#2  0x00007ffff796ceb9 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /lib64/libstdc++.so.6
#3  0x00000000004018cc in std::this_thread::sleep_for<long, std::ratio<1l, 1l> > (__rtime=...) at /usr/include/c++/4.8.2/thread:281
#4  0x0000000000401307 in fun_int (n=1) at test.cc:9
#5  0x0000000000404696 in std::_Bind_simple<int (*(int))(int)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x609080)
    at /usr/include/c++/4.8.2/functional:1732
#6  0x000000000040443d in std::_Bind_simple<int (*(int))(int)>::operator()() (this=0x609080) at /usr/include/c++/4.8.2/functional:1720
#7  0x000000000040436e in std::thread::_Impl<std::_Bind_simple<int (*(int))(int)> >::_M_run() (this=0x609068) at /usr/include/c++/4.8.2/thread:115
#8  0x00007ffff796d070 in ?? () from /lib64/libstdc++.so.6
#9  0x00007ffff7bc6dd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff70d0ead in clone () from /lib64/libc.so.6
(gdb) c
Continuing.
after thread create
in fun_int n = 1
[Thread 0x7ffff6fd2700 (LWP 45234) exited]
in fun_string s = test
[Thread 0x7ffff67d1700 (LWP 45235) exited]
[Inferior 1 (process 45230) exited normally]
(gdb) q

In the debugging session above:

b 27 sets a breakpoint on line 27
b test.cc:32 sets a breakpoint on line 32 (same effect as b 32)
info b lists all breakpoints
r starts the program, which pauses at the first breakpoint
c continues execution until the second breakpoint; the two threads t1 and t2 are created between the first and second breakpoints
info threads lists all threads. The output shows three threads in total: the main thread, t1 and t2
thread 2 switches to thread 2
bt prints the stack of thread 2
c continues until the program ends
q exits gdb

Multi process

As above, we still use an example to simulate multi process debugging. The code is as follows:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork error");
        return -1;
    }

    if (pid == 0) { // Child process
        int num = 10;
        // Loop until the debugger changes num, which leaves time to attach
        while (num == 10) {
            sleep(10);
        }
        printf("this is child,pid = %d\n", getpid());
    } else { // Parent process
        printf("this is parent,pid = %d\n", getpid());
        wait(NULL); // Wait for the child process to exit
    }
    return 0;
}

In the code above there are two processes: the parent process (the main process) and the child process created by fork().

By default, GDB only debugs the main process of a multi-process program. No matter how many times the program calls fork() and how many child processes it creates, GDB only debugs the parent process. To support multi-process debugging, GDB 7.0 and later can debug a single process (parent or child) or several processes at the same time.

So how do we debug a child process? We can use the following methods.

attach

First, both parent and child processes can be debugged by attaching gdb with the attach command. The operating system assigns a unique ID, the process ID, to every running program. If we know the process ID, we can debug the process with the attach command.

In the code above, the child process created by fork() first sleeps in a while loop and only calls printf after the loop. This serves two purposes:

It gives us time to find the process ID and attach to it
When gdb attaches, the real code (here, the printf call) has not run yet, so the child process can be debugged from the beginning

You may wonder: once the child enters the while loop, it seems it can never reach the printf below. This is where gdb shows its strength. We can change the value of num with a gdb command so that the loop exits.

Compile the code into the executable test_process with the following command:

g++ -g test_process.cc -o test_process

Now let's try to start debugging.

gdb -q ./test_process
Reading symbols from /root/test_process...done.
(gdb)

Note the -q option, which suppresses the introductory and copyright messages. q is short for quiet.

(gdb) r
Starting program: /root/./test_process
Detaching after fork from child process 37482.
this is parent,pid = 37478
[Inferior 1 (process 37478) exited normally]
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libstdc++-4.8.5-36.el7.x86_64
(gdb) attach 37482
//Symbol class output, omitted here
(gdb) n
Single stepping until exit from function __nanosleep_nocancel,
which has no line number information.
0x00007ffff72b3cc4 in sleep () from /lib64/libc.so.6
(gdb)
Single stepping until exit from function sleep,
which has no line number information.
main () at test_process.cc:8
8	      while(num==10){
(gdb)

In the commands above, we execute n (short for next) until control returns to the condition of the while loop.

(gdb) set num = 1
(gdb) n
12	      printf("this is child,pid = %d\n",getpid());
(gdb) c
Continuing.
this is child,pid = 37482
[Inferior 1 (process 37482) exited normally]
(gdb)

To leave the while loop, we use the set command to change num to 1 (set var num = 1 is the unambiguous form), so that the loop condition no longer holds and the following printf() runs. Finally, we execute c (short for continue) to let the program run until it exits.

If a running program deadlocks, you can find its process ID with ps, attach to it with gdb attach pid, and then inspect the stack of each thread.

Specify process

By default, GDB only debugs the parent process of a multi-process program. GDB provides two settings, follow-fork-mode and detach-on-fork, to control whether the parent or the child process is debugged.

follow-fork-mode

This command can be used as follows:

(gdb) set follow-fork-mode mode

mode has the following two options:

parent: follow the parent process; this is the default
child: follow the child process. It tells gdb to debug the child instead of the parent after the program calls fork. This choice exists because on Linux a successful fork() call returns twice, once in the parent process and once in the child process

(gdb) show follow-fork-mode
Debugger response to a program call of fork or vfork is "parent".
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /root/./test_process
[New process 37830]
this is parent,pid = 37826

^C
Program received signal SIGINT, Interrupt.
[Switching to process 37830]
0x00007ffff72b3e10 in __nanosleep_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libstdc++-4.8.5-36.el7.x86_64
(gdb) n
Single stepping until exit from function __nanosleep_nocancel,
which has no line number information.
0x00007ffff72b3cc4 in sleep () from /lib64/libc.so.6
(gdb) n
Single stepping until exit from function sleep,
which has no line number information.
main () at test_process.cc:8
8	      while(num==10){
(gdb) show follow-fork-mode
Debugger response to a program call of fork or vfork is "child".
(gdb)

In the commands above, we did the following:

show follow-fork-mode: shows the current mode. The output shows that GDB is in parent mode
set follow-fork-mode child: tells GDB to debug the child process
r: runs the program. GDB follows the child process, which enters the while loop
ctrl + c: sends SIGINT, so GDB interrupts the program inside the while loop
n (next): continues stepping until control returns to the condition of the while loop
show follow-fork-mode: run again, the output now shows child mode

detach-on-fork

If you decide up front whether to debug the child or the parent process, follow-fork-mode is all you need. But what if you want to switch back and forth between the parent and the child during a debugging session?

GDB provides another command:

(gdb) set detach-on-fork mode

mode has the following two values:

on: the default. Only one process is debugged, either the child or the parent, and the other is detached

off: every process stays under GDB's control, so we can debug all of them

If you turn detach-on-fork off, GDB keeps control of all forked processes, that is, you can debug every one of them. Use the info inferiors command to list all processes under GDB's control, and the inferior command to switch from one process to another.

info inferiors: prints the list of all processes under GDB's control, including the GDB-assigned number and the process ID of each
inferior num: switches to the process with number num, where num is the number assigned by GDB as shown by info inferiors

coredump

When developing or using a program, what we fear most is an unexplained crash. To make the cause analyzable, the operating system dumps the memory of the process (including the stack and other state at the moment of the crash) to a file when the program crashes. By default this file is named core.pid, where pid is the process ID. This dump is called a coredump (core dump). We can then open the file in a debugger to reconstruct the scene of the crash.

Before looking at how to debug a coredump file with gdb, we need to produce one. For simplicity, we use the following example:

#include <stdio.h>

void print(int *v, int size) {
  for (int i = 0; i < size; ++i) {
    printf("elem[%d] = %d\n", i, v[i]);
  }
}

int main() {
  int v[] = {0, 1, 2, 3, 4};
  print(v, 1000);
  return 0;
}

Compile and run the program:

g++ -g test_core.cc -o test_core
./test_core

The output is as follows:

elem[775] = 1702113070
elem[776] = 1667200115
elem[777] = 6648431
elem[778] = 0
elem[779] = 0
Segmentation fault (core dumped)

As expected, the program crashes, but no coredump file is generated. This is because coredump generation is turned off by default, so it has to be enabled first.

For coredumps generated by multithreaded programs, the stack information alone is sometimes not enough to find the cause, and other methods are needed.

We had an online failure in 2018. Everything was normal in the test environment, but the service coredumped in production. Debugging the coredump with gdb only showed that the crash was inside libcurl; it did not reveal the cause. After about two days we found that the coredump only occurred when a request timed out: because the test machines were poorly configured, the timeout there was set to 20 ms, while online it was 5 ms. Knowing the trigger, we narrowed the scope of the code step by step and finally located a bug in libcurl. Locating online problems therefore often requires choosing a method that fits the actual situation.

configuration

Coredump generation can be enabled temporarily (the setting is lost when the terminal session exits) or permanently.

temporary

ulimit -a shows whether coredump generation is currently enabled:

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0

In the output above, the value after core file size is 0, which means no core dump file is generated. It can be changed with the following command:

ulimit -c size

Here size is the maximum size of the coredump file that may be generated. It is generally set as large as possible so that the coredump is not truncated. The author usually sets it to unlimited:

ulimit -c unlimited

Note that with this temporary configuration the coredump is written, by default, to the directory from which the command was executed. The path can be changed through the kernel configuration.

permanent

The settings above only enable the core dump function. By default the kernel writes the core file to the current working directory of the crashed process, under the fixed name core. Obviously, if several programs generate core files there, or the same program crashes several times, the same core file is overwritten again and again.

The kernel parameter core_pattern controls where the core file is stored and how it is named. The following commands create a storage directory, make the limit permanent and set the name pattern. In the pattern, %e is the executable name, %p the process ID, %h the host name and %t the time of the dump in seconds since the epoch. A value written to /proc/sys/kernel/core_pattern lasts until the next reboot; to keep it across reboots, set kernel.core_pattern in /etc/sysctl.conf as well.

mkdir -p /www/coredump/
chmod 777 /www/coredump/

# append to /etc/profile
ulimit -c unlimited

# append to /etc/security/limits.conf
*          soft     core   unlimited

echo "/www/coredump/core-%e-%p-%h-%t" > /proc/sys/kernel/core_pattern

debugging

Now we run the program again, and the coredump file is generated as expected:

./test_core

elem[955] = 1702113070
elem[956] = 1667200115
elem[957] = 6648431
elem[958] = 0
elem[959] = 0
Segmentation fault (core dumped)

Then use the following command for coredump debugging:

gdb ./test_core -c /www/coredump/core_test_core_1640765384_38924 -q

The output is as follows:

#0  0x0000000000400569 in print (v=0x7fff3293c100, size=1000) at test_core.cc:5
5	    printf("elem[%d] = %d\n", i, v[i]);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libstdc++-4.8.5-36.el7.x86_64
(gdb)

The program crashed at line 5 of test_core.cc. At this point we can view the stack backtrace with the where command.

where prints the call stack, including each function name and its argument values. It is the most basic and most useful command when debugging a coredump.

Here the backtrace shows the crash at line 5 with size=1000, far more than the five elements the array holds, so reading the source is enough to locate the cause.

Note that in a multithreaded program the faulty code is not necessarily in the current thread. This requires some understanding of the code to judge which parts are safe; then switch threads with thread num and inspect each stack with bt or where to locate the cause of the coredump.

principle

In the previous sections we covered the GDB commands and their role in debugging, and demonstrated them with examples. As C/C++ developers we should know not only how something works but also why, so this section discusses the principle behind GDB debugging.

gdb takes over the execution of a process through the ptrace system call. ptrace lets one process observe and control the execution of another, and inspect and change its memory image and registers. It is mainly used to implement breakpoint debugging and system call tracing.

ptrace system call is defined as follows:

#include <sys/ptrace.h>
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);

pid_t pid: the process to be traced by ptrace
void *addr: an address in the memory of the traced process; its meaning depends on the request
void *data: data passed along with the request, for example the signal to deliver with PTRACE_CONT
enum __ptrace_request request: determines what the system call does. The main requests are:
- PTRACE_TRACEME: indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to it stops the process, and the parent blocked in wait() is woken up. A call to exec() inside the child makes it receive a SIGTRAP signal, which lets the parent take full control of the child before the new program starts running
- PTRACE_ATTACH: attach to the specified process and make it a child traced by the current process; the child then behaves as if it had performed PTRACE_TRACEME. Note that although the current process becomes the tracing parent, getppid() in the child still returns the pid of its original parent
- PTRACE_CONT: resume a child process that was stopped. A signal to be delivered to the child can be specified at the same time

Debugging principle

Run and debug new processes

A new process is run and debugged as follows:

Run gdb exe
Enter the run command, upon which gdb performs the following operations:
- Creates a new process with the fork() system call
- Calls ptrace(PTRACE_TRACEME, 0, 0, 0) in the newly created child process
- Loads the specified executable in the child process with the execv() system call

attach the running process

A running process can be debugged with gdb attach pid. gdb then performs ptrace(PTRACE_ATTACH, pid, 0, 0) on the specified process.

Note that attaching to a process ID may fail with the following error:

Attaching to process 28849
ptrace: Operation not permitted.

This happens because you do not have permission for the operation. Run gdb as the user who started the process or as root. On some distributions the kernel's Yama ptrace_scope setting additionally restricts attaching to processes that are not your own children.

Breakpoint principle

Implementation principle

When we set a breakpoint with b or break, gdb inserts a breakpoint instruction at the specified position. When the debugged program reaches the breakpoint, a SIGTRAP signal is generated. gdb catches the signal and performs the breakpoint hit judgment.

Setting principle

Setting a breakpoint in a program means first saving the original instruction at that location and then writing int 3 there. When int 3 executes, a software interrupt occurs and the kernel sends a SIGTRAP signal to the child process; because the child is being traced, the signal is reported to the parent process (gdb). gdb then puts the saved instruction back in place of int 3 and waits for the command to resume execution.

Hit judgment

gdb stores all breakpoint locations in a linked list. Hit judgment compares the address at which the debugged program stopped with the breakpoint addresses in the list, to determine whether the signal was caused by a breakpoint.

Conditional judgment

A conditional breakpoint adds one more step: after the program stops at the breakpoint, gdb evaluates the condition expression, and only if it is true is the breakpoint reported as triggered; otherwise the program is resumed. Because the condition has to be checked on every pass, a conditional breakpoint costs performance whether or not it finally triggers. The x86 platform also supports hardware breakpoints. Instead of writing int 3 into the code, the breakpoint address is placed in a dedicated debug register; when the program reaches an address, the CPU compares it with the contents of that register and raises the trap itself on a match. Therefore, when a breakpoint location is passed very frequently by the program, try a hardware breakpoint (hbreak in gdb), which may help performance.

Single step principle

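Single stepping is implemented with the call ptrace(PTRACE_SINGLESTEP, pid, ...), which lets the traced process execute exactly one instruction and then stops it again.

#include <sys/ptrace.h>
#include <sys/wait.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Attach to a running process and single-step it until it exits.
void single_step(pid_t pid)
{
    printf("attaching to PID %d\n", pid);
    if (ptrace(PTRACE_ATTACH, pid, nullptr, nullptr) != 0)
    {
        perror("attach failed");
        exit(1);
    }
    int waitStat = 0;
    // Wait for the stop caused by the attach.
    int waitRes = waitpid(pid, &waitStat, WUNTRACED);
    if (waitRes != pid || !WIFSTOPPED(waitStat))
    {
        printf("unexpected waitpid result!\n");
        exit(1);
    }

    int64_t numSteps = 0;
    // Execute one instruction per iteration and wait for the next stop.
    while (ptrace(PTRACE_SINGLESTEP, pid, nullptr, nullptr) == 0 &&
           waitpid(pid, &waitStat, 0) == pid && WIFSTOPPED(waitStat))
    {
        ++numSteps;
    }
    printf("executed %lld steps\n", static_cast<long long>(numSteps));
}

The code above takes a pid, attaches to it, and then calls ptrace repeatedly to single-step the process, waiting after each step for the process to stop again.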

other

To close, this section briefly introduces some other commands and tools the author uses at work.

pstack

This command displays the stack trace of every thread in a process. pstack must be run by the owner of the process or by root. It can be used to determine where a process is hung. The only argument it accepts is the PID of the process to inspect.

This command is very useful for troubleshooting process problems. For example, if a service appears to be stuck (hung, or spinning in an endless loop), the problem can often be located with this command: run pstack several times over a period of time, and if the stack always stops at the same location, that location deserves attention and is likely to be the problem.

Taking the multithreaded code as an example, if its process ID is 45707 (on the author's machine), then

the output of pstack 45707 is as follows:

Thread 3 (Thread 0x7f07aaa69700 (LWP 45708)):
#0  0x00007f07aab2ee2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f07aab2ecc4 in sleep () from /lib64/libc.so.6
#2  0x00007f07ab403eb9 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /lib64/libstdc++.so.6
#3  0x00000000004018cc in void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&) ()
#4  0x00000000004012de in fun_int(int) ()
#5  0x0000000000404696 in int std::_Bind_simple<int (*(int))(int)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) ()
#6  0x000000000040443d in std::_Bind_simple<int (*(int))(int)>::operator()() ()
#7  0x000000000040436e in std::thread::_Impl<std::_Bind_simple<int (*(int))(int)> >::_M_run() ()
#8  0x00007f07ab404070 in ?? () from /lib64/libstdc++.so.6
#9  0x00007f07ab65ddd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f07aab67ead in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f07aa268700 (LWP 45709)):
#0  0x00007f07aab2ee2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f07aab2ecc4 in sleep () from /lib64/libc.so.6
#2  0x00007f07ab403eb9 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /lib64/libstdc++.so.6
#3  0x00000000004018cc in void std::this_thread::sleep_for<long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&) ()
#4  0x0000000000401340 in fun_string(std::string const&) ()
#5  0x000000000040459f in int std::_Bind_simple<int (*(char const*))(std::string const&)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) ()
#6  0x000000000040441f in std::_Bind_simple<int (*(char const*))(std::string const&)>::operator()() ()
#7  0x0000000000404350 in std::thread::_Impl<std::_Bind_simple<int (*(char const*))(std::string const&)> >::_M_run() ()
#8  0x00007f07ab404070 in ?? () from /lib64/libstdc++.so.6
#9  0x00007f07ab65ddd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f07aab67ead in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f07aba80740 (LWP 45707)):
#0  0x00007f07ab65ef47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f07ab403e37 in std::thread::join() () from /lib64/libstdc++.so.6
#2  0x0000000000401455 in main ()

The output lists the call stack of every thread in the process, which makes it easier to analyze the problem.

ldd

A build often fails with an error saying that a function definition cannot be found, or the build succeeds but the program fails at run time (often because it picks up an unexpected version of a library). In such cases ldd shows which libraries the executable depends on and the paths they are resolved to.

ldd lists the shared libraries a program requires. It is often used to solve problems where a program cannot run because a library file is missing.

Again taking the executable test_thread as an example, its dependent libraries are as follows:

ldd -r ./test_thread
	linux-vdso.so.1 =>  (0x00007ffde43bc000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8c5e310000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8c5e009000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f8c5dd07000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8c5daf1000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f8c5d724000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f8c5e52c000)

In the above output:

Column 1: the libraries the program depends on
Column 2: the library file on the system that satisfies each dependency
Column 3: the address at which the library is loaded

Sometimes ldd reports that a library cannot be found, as follows:

ldd -r test_process
	linux-vdso.so.1 =>  (0x00007ffc71b80000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe4badd5000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fe4baad3000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe4ba8bd000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fe4ba4f0000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe4bb0dc000)
	 liba.so => not found

In the last line above, liba.so cannot be found. We then need to know where liba.so lives, for example /path/to/liba.so. There are two ways to make it visible:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/

After this, ldd can find the library, but the setting is temporary: after exiting the terminal, ldd will again report that the library cannot be found. The other way is to edit /etc/ld.so.conf and append the required path at the end of the file, i.e.

include ld.so.conf.d/*.conf
/path/to/

Then run the following command for the change to take effect permanently:

 /sbin/ldconfig

c++filt

Because C++ supports function overloading, compilers use a name mangling mechanism that renames functions so that each overload gets a unique symbol.

We use the strings command to look at the function information in test_thread (only entries related to fun_ are shown):

strings test_thread | grep fun_
in fun_int n =
in fun_string s =
_GLOBAL__sub_I__Z7fun_inti
_Z10fun_stringRKSs

We can see _Z10fun_stringRKSs. To find out which function this symbol stands for, use the c++filt command, as follows:

 c++filt _Z10fun_stringRKSs
fun_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)

Through the above output, we can restore the function name generated by the compiler to the function name in our code, namely fun_string.

epilogue

GDB is an essential debugging tool for Linux development. How it is used depends on the specific requirement or the specific problem at hand. In daily development work, skillful use of GDB can get twice the result with half the effort.

Starting from some simple commands, this article used examples to debug executables (single-threaded, multithreaded and multi-process scenarios), coredump files and other scenarios, so that the use of GDB can be understood more intuitively. GDB is very powerful, and the author uses only some very basic functions at work. To understand GDB in depth, read the documentation on the official website.

This article took about three weeks from conception to completion. Writing it was painful (materials had to be sorted out and the various scenarios built and reproduced) but also rewarding: it further deepened the author's understanding of the underlying principles of GDB.

Author: High Performance Architecture Exploration
This article was first published on the official account [high performance architecture].
Personal technology blog: High Performance Architecture Exploration

Reposted from https://www.cnblogs.com/gaoxingnjiagoutansuo/p/15820753.html

Keywords: C++

Added by richza on Mon, 24 Jan 2022 13:49:24 +0200
