Principle of memory allocation in linux Environment

0. There are several key concepts of virtual memory management in Linux

  • How is the Linux virtual address space distributed?
  • How do malloc and free allocate and free memory?
  • How to view the fragmentation of memory in the heap?
  • Since the in heap memory brk and sbrk cannot be released directly, why not use mmap to allocate them all and munmap to release them directly?

Linux virtual memory management has several key concepts:

  • Each process has an independent virtual address space, and the virtual address accessed by the process is not the real physical address;
  • The virtual address can be mapped with the physical address through the page table on each process (in the kernel virtual address space of each process) to obtain the real physical address;
  • If the physical address corresponding to the virtual address is not in the physical memory, a page missing interrupt is generated, the physical address is actually allocated, and the page table of the process is updated at the same time; If the physical memory is exhausted at this time, some pages will be eliminated to the physical disk according to the memory replacement algorithm

1. How is the Linux virtual address space distributed?

Linux uses virtual address space, which greatly increases the addressing space of the process, from low address to high address:

  • 1. Read only segment: this part of the space can only be read and cannot be written; (including: code segment, rodata segment (C constant string and #define defined constant))
  • 2. Data segment: space for saving global variables and static variables;
  • 3. Heap: it is usually referred to as dynamic memory. malloc/new mostly comes from this. The position of the stack top can be dynamically adjusted by functions brk and sbrk.
  • 4. File mapping area: memory mapped to physical space, such as dynamic library and shared memory, which is generally the virtual address space allocated by mmap function.
  • 5. Stack: used to maintain the context space of function calls, generally 8M, which can be viewed through ulimit – s.
  • 6. Kernel virtual space: the memory area invisible to the user code, which is managed by the kernel (the page table is stored in the kernel virtual space).

The following figure shows the typical virtual address space distribution of 32-bit system (from deep understanding of computer system).

The 32-bit system has 4G address space:
0x08048000~0xbfffffff are user space and 0xc00000000 ~ 0xFFFFFFFF are kernel space, including kernel code and data, process related data structures (such as page table, kernel stack), etc. In addition,%esp executes the stack top and changes to the low address direction; The brk/sbrk function controls the top of the heap_ edata changes in the direction of high address.

What about the 64 bit system? Does the 64 bit system have an address space of 2 ^ 64?
In fact, the virtual address space division of 64 bit system has changed:

  • 1. The address space size is not 2 ^ 32 or 2 ^ 64, but generally 2 ^ 48
    Because it does not need such a large addressing space as 2 ^ 64, too much space will only lead to a waste of resources. 64 bit Linux generally uses 48 bits to represent the virtual address space and 40 bits to represent the physical address, which can be viewed through * * #cat /proc/cpuinfo * *:
  • 2. Among them, 0x0000000000000000 ~ 0x00007fffffffffffffffff represents user space, 0xffff8000000000 ~ 0xffffffffffffffffff represents kernel space, providing a total of 256TB(2^48) addressing space.
    The characteristic of these two intervals is that bit 47 is the same as bits 48 ~ 63. If these bits are 0, it means user space, otherwise it means kernel space.
  • 3. User space from low address to high address is still read-only segment, data segment, heap, file mapping area and stack;

2. How do malloc and free allocate and free memory?

How to check the number of page breaks in a process?
View with * * # ps -o majflt,minflt -C program * * command

majflt stands for major fault and its Chinese name is big mistake. minflt stands for minor fault and its Chinese name is small mistake.
These two values represent the number of page breaks that have occurred since a process was started.
You can use the command ps -o majflt minflt -C program to view the value of majflt and minflt of the process. These two values are cumulative values, which are accumulated from the start of the process. We can pay more attention to these two values when we stress test programs with high performance requirements.

If a process uses mmap to map large data files to the virtual address space of the process, we need to focus on the value of majflt, because compared with minflt, majflt is fatal to performance. It takes a few milliseconds to read the disk randomly, while minflt only affects performance in a large number of times.

What operations are performed after the page missing interrupt is sent?
When a process is interrupted due to missing pages, the process will fall into the kernel state and perform the following operations:

  • 1. Check whether the virtual address to be accessed is legal
  • 2. Find / assign a physical page
  • 3. Fill the contents of the physical page (read the disk, or set it to 0 directly, or do nothing)
  • 4. Establish mapping relationship (virtual address to physical address)

Re execute the instruction in which the page break occurred
If you need to read the disk in step 3, the page loss interrupt is majflt, otherwise it is minflt.

Principle of memory allocation
From the perspective of operating system, there are two ways for processes to allocate memory, which are completed by two system calls: brk and mmap (regardless of shared memory).

  • 1. brk is the highest address pointer to the data segment (. data)_ edata pushes to the high address;
  • 2. mmap is to find a piece of free virtual memory in the process's virtual address space (in the middle of the heap and stack, called the file mapping area).

Both methods allocate virtual memory without allocating physical memory. When accessing the allocated virtual address space for the first time, a page missing interrupt occurs. The operating system is responsible for allocating physical memory, and then establishing the mapping relationship between virtual memory and physical memory.

In the standard C library, malloc/free functions are provided to allocate and release memory. The bottom layers of these two functions are implemented by brk, mmap and munmap system calls.

The following is an example to illustrate the principle of memory allocation:

Case 1: if malloc is less than 128k memory, brk is used to allocate memory_ edata pushes to the high address (only allocate virtual space, not physical memory (so it is not initialized). When reading / writing data for the first time, the kernel will be interrupted due to missing pages, and then the kernel will allocate the corresponding physical memory, and then establish a mapping relationship with the virtual address space), as shown in the following figure:

  • 1. When a process starts, the initial layout of its (virtual) memory space is shown in Figure 1.
    Among them, the mmap memory mapping file is in the middle of the heap and stack (such as libc-2.2.93.so, other data files, etc.). For simplicity, the memory mapping file is omitted.

_ The edata pointer (defined in glibc) points to the highest address of the data segment.

  • 2. After the process calls A=malloc(30K), the memory space is shown in Figure 2:
    malloc function will call brk system call, and_ The edata pointer pushes 30K to the high address to complete the virtual memory allocation.

You may ask: just put_ edata+30K completes the memory allocation?
The truth is_ edata+30K only completes the allocation of virtual addresses. There is still no physical page corresponding to memory A. when the process reads and writes memory a for the first time, a page missing interrupt occurs. At this time, the kernel allocates the physical page corresponding to memory a. In other words, if malloc is used to allocate the content of a, and then it is never accessed, the physical page corresponding to a will not be allocated.

  • 3. After the memory (malloc) space is 40K, as shown in the figure.

Case 2: for memory with malloc greater than 128k, use mmap to allocate memory, and find a free memory allocation between heap and stack (corresponding to independent memory and initialized to 0), as shown in the following figure:

  • 4. After the process calls C=malloc(200K), the memory space is shown in Figure 4:
    By default, the malloc function allocates memory. If the requested memory is greater than 128K (adjustable by the M_MMAP_THRESHOLD option), it is not pushed_ Instead of using the edata pointer, the MMAP system call is used to allocate a piece of virtual memory from the middle of the heap and stack.
    This is mainly because:
    The memory allocated by brk can only be released after the high address memory is released (for example, A cannot be released before B is released, which is the reason for memory fragmentation. When to tighten, see the following), while the memory allocated by mmap can be released separately.

Of course, there are other advantages and disadvantages. More specifically, interested students can see the malloc code in glibc.

  • 5. After the process calls D=malloc(100K), the memory space is shown in Figure 5;
  • 6. Process call free © After that, the virtual memory corresponding to C is released together with the physical memory.
  • 7. After the process calls free(B), as shown in Figure 7:
    The virtual memory and physical memory corresponding to B are not released because there is only one_ If the edata pointer is pushed back, what about the memory of D?
    Of course, the memory of B can be reused. If there is another 40K request at this time, malloc is likely to return the memory of B.
  • 8. After the process calls free(D), as shown in Figure 8:
    B and D are connected and become a piece of 140K free memory.
  • 9. By default:
    When the free memory of the highest address space exceeds 128K (which can be adjusted by the M_TRIM_THRESHOLD option), execute the (trim) memory compression operation. In the previous step free, it is found that the free memory of the highest address exceeds 128K, so the memory compression becomes as shown in Fig. 9.

the case is entirely cleared
Having finished the principle of memory allocation, the reason for the high cpu consumption of the tested module in the kernel state is very clear: malloc allocates 2M memory every time the request comes. By default, malloc calls mmap to allocate memory. When the request ends, it calls munmap to release memory. Assuming that each request requires 6 physical pages, each request will generate 6 page missing interrupts. Under the pressure of 2000, more than 10000 page missing interrupts are generated per second. These page missing interrupts do not need to be solved by reading the disk, so they are called minflt; Page missing interrupts are executed in the kernel state, so the cpu consumption of the process in the kernel state is very large. Page missing interrupts are scattered throughout the processing of requests, so the proportion of allocation statement time (10us) to the processing time of the whole request (1000us) is very small.

terms of settlement
Change the dynamic memory to static allocation, or allocate it for each thread with malloc at startup, and then save it in threaddata. However, due to the particularity of this module, static allocation or allocation at startup is not feasible. In addition, the default stack size limit under Linux is 10M. If a few M of memory is allocated on the stack, there is a risk.

malloc is prohibited from calling mmap to allocate memory and memory compression is prohibited.
When the process starts, add the following two lines of code:
mallopt(M_MMAP_MAX, 0); // Prohibit malloc from calling mmap to allocate memory
mallopt(M_TRIM_THRESHOLD, -1); // Prohibit memory compression
Effect: after adding these two lines of code, observe with ps command. After the pressure is stable, both majlt and minflt are 0. The system state cpu of the process is reduced from 20 to 10.

3. How to view the fragmentation of memory in the heap?

glibc provides the following structures and interfaces to view the usage of in heap memory and mmap.

struct mallinfo {
  int arena;            /* non-mmapped space allocated from system */
  int ordblks;         /* number of free chunks */
  int smblks;          /* number of fastbin blocks */
  int hblks;             /* number of mmapped regions */
  int hblkhd;           /* space in mmapped regions */
  int usmblks;        /* maximum total allocated space */
  int fsmblks;         /* space available in freed fastbin blocks */
  int uordblks;        /* total allocated space */
  int fordblks;         /* total free space */
  int keepcost;       /* top-most, releasable (via malloc_trim) space */
};

Returns the memory usage of heap(main_arena) in the form of mallinfo structure

struct mallinfo mallinfo();

Output the usage of heap and mmap to stderr

void malloc_stats();

mallinfo and malloc can be verified by the following example_ Stats output results.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <malloc.h>

size_t  heap_malloc_total, heap_free_total,mmap_total, mmap_count;

void print_info() {
    struct mallinfo mi = mallinfo();
	printf("count by itself:\n");
    printf("\theap_malloc_total=%lu heap_free_total=%lu heap_in_use=%lu\n\tmmap_total=%lu mmap_count=%lu\n",
              heap_malloc_total*1024, heap_free_total*1024, heap_malloc_total*1024-heap_free_total*1024,
              mmap_total*1024, mmap_count);
	printf("count by mallinfo:\n");
	printf("\theap_malloc_total=%lu heap_free_total=%lu heap_in_use=%lu\n\tmmap_total=%lu mmap_count=%lu\n",
             mi.arena, mi.fordblks, mi.uordblks,
             mi.hblkhd, mi.hblks);
	printf("from malloc_stats:\n");
	malloc_stats();
}

#define ARRAY_SIZE 200
int main(int argc, char** argv) {
    char** ptr_arr[ARRAY_SIZE];
    int i; 
    for( i = 0; i < ARRAY_SIZE; i++) {
            ptr_arr[i] = malloc(i * 1024); 
            if ( i < 128) {	//glibc uses mmap above 128k by default
            	heap_malloc_total += i;
            }  else {
            	mmap_total += i;
                mmap_count++;
            }
    } 
    print_info();
    for( i = 0; i < ARRAY_SIZE; i++) {
           if ( i % 2 == 0) {
                continue;
           }
           free(ptr_arr[i]);
           if ( i < 128) {
           		heap_free_total += i;
           } else {
                mmap_total -= i;
                mmap_count--;
           }
    } 
    printf("\nafter free\n");
    print_info();
    return 1;
}

The first loop of this example allocates a memory block of index position (KB) size for each member of the pointer array, and counts the heap and mmap memory allocation respectively through 128 as the boundary;

The second loop is the free index subscript to odd items, while updating the count. Counting through the program and mallinfo / malloc_ The results obtained by stats interface are compared and printed_ Print info to the terminal.

Here is an execution result:

count by itself:
        heap_malloc_total=8323072 heap_free_total=0 heap_in_use=8323072
        mmap_total=12054528 mmap_count=72
count by mallinfo:
        heap_malloc_total=8327168 heap_free_total=2032 heap_in_use=8325136
        mmap_total=12238848 mmap_count=72

from malloc_stats:
Arena 0:
system bytes     =    8327168
in use bytes     =    8325136
Total (incl. mmap):
system bytes     =   20566016
in use bytes     =   20563984
max mmap regions =         72
max mmap bytes   =   12238848

after free
count by itself:
        heap_malloc_total=8323072 heap_free_total=4194304 heap_in_use=4128768
        mmap_total=6008832 mmap_count=36

count by mallinfo:
        heap_malloc_total=8327168 heap_free_total=4197360 heap_in_use=4129808
        mmap_total=6119424 mmap_count=36

from malloc_stats:
Arena 0:
system bytes     =    8327168
in use bytes     =    4129808
Total (incl. mmap):
system bytes     =   14446592
in use bytes     =   10249232
max mmap regions =         72
max mmap bytes   =   12238848

It can be seen from the above that the program statistics are basically consistent with the information obtained by mallinfo, including heap_free_total represents the sum of memory fragments released in the heap.

If you want to know how many fragments there are in the heap, you can get through the values of fsmblks, smblks and ordblks in the mallinfo structure. These values represent the total number of fragments in different size intervals, which are 0 ~ 80 bytes, 80 ~ 512 bytes and 512~128k respectively. If the values of fsmblks and smblks are too large, the fragmentation problem may be serious.

However, a fatal problem of mallinfo structure is that its member definitions are all int s. In a 64 bit environment, uordblks/fordblks/arena/usmblks in the structure can easily lead to overflow. It should be a problem left over from history. Pay attention to it when using it!

4. Since the in heap memory brk and sbrk cannot be released directly, why not use mmap to allocate them all and munmap to release them directly?

Since the fragments in the heap cannot be released directly, which leads to the suspected "memory leak" problem, why doesn't malloc use mmap to realize it all (the memory allocated by mmap can be free through munmap to realize the real release)? Instead, use mmap only for large blocks of memory larger than 128k?

In fact, the interfaces sbrk/mmap/munmap used by the process to apply for and release the address space from the OS are system calls. Frequent system calls consume system resources. Moreover, after the memory requested by mmap is munmap, reapplication will produce more page missing interrupts. For example, when using mmap to allocate 1M space, a large number of page missing interrupts (1M/4K times) are generated in the first call. When allocating 1M space again after munmap, a large number of page missing interrupts will be generated again. Page missing interrupt is a kernel behavior, which will lead to large CPU consumption in kernel state. In addition, if you use mmap to allocate small memory, it will lead to more fragmentation of address space and greater management burden of the kernel.

At the same time, the heap is a continuous space, and the fragments in the heap have not been returned to the OS. If the reusable fragments access the memory again, it is likely that there is no need to generate any system call and page missing interrupt, which will greatly reduce the consumption of CPU. Therefore, in the malloc implementation of glibc, the differences and advantages and disadvantages between sbrk and MMAP are fully considered. MMAP is used to obtain the address space by allocating large memory (128k) by default. This critical value can also be modified through mallopt(M_MMAP_THRESHOLD).

5. How to view the missing page interrupt information of the process?

You can view the missing page interrupt information through the following command

ps -o majflt,minflt -C <program_name>
ps -o majflt,minflt -p <pid>

among
majflt stands for major fault and refers to major error;
minflt stands for minor fault and refers to small error.
These two values represent the number of page breaks that have occurred since a process was started.

majflt differs from minflt in that:
majflt indicates that the disk needs to be read and written. It may be that the page corresponding to the memory needs to be load ed into the physical memory in the disk, or it may be that the physical memory is insufficient at this time and some physical pages need to be eliminated into the disk.

6. Besides malloc/free of glibc, are there any other third-party implementations?

In fact, many people began to criticize the implementation of glibc memory management, especially the problems of high concurrency, low performance and memory fragmentation. Therefore, some third-party tools have emerged to replace the implementation of glibc, the most famous of which are google's tcmalloc and facebook's jemalloc.

There are many resources on the Internet that you can check by yourself (only use the third-party library, and you can use malloc in the third-party library without modifying the code).

7. References

Refer to dzqdevin's blog
Refer to Baidu Forum's blog

8. Program test code

#include <malloc.h>
#include <string.h>
#include <stdlib.h>
#include <iostream>

static void display_mallinfo(void) {
	struct mallinfo mi;
	mi = mallinfo();
    printf("Total non-mmapped bytes (arena):       %d\n", mi.arena);
    printf("# of free chunks (ordblks):            %d\n", mi.ordblks);
    printf("# of free fastbin blocks (smblks):     %d\n", mi.smblks);
    printf("# of mapped regions (hblks):           %d\n", mi.hblks);
    printf("Bytes in mapped regions (hblkhd):      %d\n", mi.hblkhd);
    printf("Max. total allocated space (usmblks):  %d\n", mi.usmblks);
    printf("Free bytes held in fastbins (fsmblks): %d\n", mi.fsmblks);
    printf("Total allocated space (uordblks):      %d\n", mi.uordblks);
    printf("Total free space (fordblks):           %d\n", mi.fordblks);
    printf("Topmost releasable block (keepcost):   %d\n", mi.keepcost);
}

int main(int argc, char *argv[]) {
#define MAX_ALLOCS 2000000
	char *alloc[MAX_ALLOCS];
    int numBlocks, j, freeBegin, freeEnd, freeStep;
    size_t blockSize;
    if (argc < 3 || strcmp(argv[1], "--help") == 0) {
		printf("%s num-blocks block-size [free-step [start-free " "[end-free]]]\n", argv[0]);
        return 0;
    }
    numBlocks	= atoi(argv[1]);
    blockSize	= atoi(argv[2]);
    freeStep	= (argc > 3) ? atoi(argv[3]) : 1;
    freeBegin	= (argc > 4) ? atoi(argv[4]) : 0;
    freeEnd		= (argc > 5) ? atoi(argv[5]) : numBlocks;
	
	printf("============== Before allocating blocks ==============\n");
	display_mallinfo();
	for (j = 0; j < numBlocks; j++) {
		if (numBlocks >= MAX_ALLOCS) {
			printf("Too many allocations\n");
		}
        alloc[j] = (char *)malloc(blockSize);
        if (alloc[j] == NULL) {
            printf("malloc\n");
        }
    }
    
    printf("\n============== After allocating blocks ==============\n");
    display_mallinfo();
    for (j = freeBegin; j < freeEnd; j += freeStep) {
        free(alloc[j]);
    }
    
    printf("\n============== After freeing blocks ==============\n");
    display_mallinfo();
	exit(EXIT_SUCCESS);
}

9. Other

1. Find the stack start address of main through gdb (consider adding a global variable to record its address when it calls the constructor.)
The address allocation of the operating system stack is that each program allocates 127T(64 bit) virtual memory. What the program sees is only the virtual address. The entry of any program thread stack is close to 0x7fffffffff (0x7fffffffffff ff plus a random value).

The stack pointer when entering the main function is not the real stack start address, because other preparation code processing added by the compiler has consumed part of the stack space before calling main.

2. Further consideration
Due to the existence of virtual memory, it is possible for the system to tidy up memory. Analysis: even after the system memory is sorted and adjusted, the address of the virtual memory will not change, and the stack memory of each thread should not change, but the physical memory corresponding to each memory page will change.

After the process is started, add the starting address printing of each thread stack, which should be used to locate the stack free problem after some core s.

Keywords: Linux memory management

Added by penkomitev on Wed, 02 Feb 2022 01:26:59 +0200