malloc, free, calloc, realloc principles

1. malloc and related functions

In Linux, malloc and its related functions are defined as follows:

#include <stdlib.h>
void *malloc(size_t size);
void free(void *ptr);
void *calloc(size_t nmemb, size_t size);
void *realloc(void *ptr, size_t size);
void *reallocarray(void *ptr, size_t nmemb, size_t size);

In Linux, use the command man malloc to view the relevant information as follows:

If the allocation is successful: the pointer to the allocated memory space is returned;
If allocation fails: return NULL pointer;
At the same time, when the memory is no longer used, the free() function should be used to release the memory block.

About: void *, indicating pointer of undetermined type. C/C + + stipulates that void * type can be forcibly converted to any other type of pointer.
malloc returns a void pointer to the allocated space, or NULL if there is insufficient memory available. To return a pointer to a type other than void, use a type cast on the return value. The storage space pointed to by the return value is guaranteed to be suitably aligned for storage of any type of object. If size is 0, malloc allocates a zero-length item in the heap and returns a valid pointer to that item. Always check the return from malloc, even if the amount of memory requested is small.

Other statements about void *:

void * p1;
int *p2;
p1 = p2; 

That is, any other type can be directly assigned to it without strong conversion, but the opposite is not possible.

malloc:
malloc allocates at least the number of bytes of memory specified by the size parameter
The return value of malloc is a pointer to the starting address of a section of available memory
The address allocated by multiple calls to malloc cannot overlap unless the address allocated by malloc is released
malloc should complete the memory allocation and return as soon as possible (NP hard memory allocation algorithm cannot be used)
When malloc is implemented, the memory size adjustment and memory release functions (realloc and free) should be implemented at the same time

2. malloc and new

new returns a pointer of the specified type, and can automatically calculate the required size.

int *p;
p = new int;   //The return type is int * and the allocated size is sizeof(int)
p = new int[100];    //The return type is int * and the allocated size is sizeof(int) * 100

malloc, on the other hand, must be calculated by us and forcibly converted to a pointer of the actual specified type when returning.

int *p;
p = (int *)malloc(sizeof(int));
1,malloc Your return is void *,If we write it: p = malloc(sizeof(int));
	Indirect description (will) void *Converted to int *,(this is unreasonable)
2,malloc The argument to is sizeof(int),It is used to indicate the size required for shaping data. If we write:
	p = (int*)malloc(1);It can be seen that only one byte of space is applied. If an integer is stored in it,
	It will occupy an additional 3 bytes and may change the data in the original memory space
3,malloc Just allocating memory does not initialize it, so the value of a new piece of memory will be random. In a general sense:
	We habitually initialize it as NULL. of course,It can also be used memset Function.

Simply put:
Malloc function is actually to find a space of a specified size in memory, and then give the first address of this space to a pointer variable. The pointer variable here can be a separate pointer or the first address of an array, depending on the specific content of the parameter size in malloc function. The memory space allocated by malloc here is logically continuous, but physically discontinuous. As programmers, we focus on logical continuity. Other operating systems will help us deal with it.

3. malloc specific implementation mechanism

3.1 Linux memory management

3.1.1 virtual memory address and physical memory address

For simplicity, virtual memory address technology is widely used in modern operating systems when dealing with memory address. That is, at the assembler (or machine language) level, when the memory address is involved, the virtual memory address is used. With this technology, each process seems to own a piece of 2N bytes of memory, where N is the number of machine bits. For example, under 64 bit CPU and 64 bit operating system, the virtual address space of each process is 264Byte.

The function of this virtual address space is mainly to simplify the preparation of programs and facilitate the isolation and management of inter process memory by the operating system. The real process is unlikely (and can not use) such a large memory space, and the actual memory that can be used depends on the size of physical memory.

Because the virtual address is adopted at the machine language level, when the actual machine code program involves memory operation, the virtual address needs to be converted into physical memory address according to the actual context of the current process, so as to realize the operation of real memory data. This conversion is generally completed by a hardware called MMU (Memory Management Unit).

3.1.2 page and address composition

In modern operating systems, both virtual memory and physical memory are not managed in bytes, but in pages. A memory Page is the general name of a continuous memory address with a fixed size. Specifically, in Linux, the typical memory Page size is 4096Byte (4K).

Therefore, the memory address can be divided into page number and in page offset. Taking 64 bit machine, 4G physical memory and 4K page size as an example, the composition of virtual memory address and physical memory address is as follows:

Above is the virtual memory address and below is the physical memory address. Because the page size is 4K, the low price in the page is represented by the lower 12 bits, while the remaining high address represents the page number.

MMU mapping unit is not bytes, but pages. This mapping is realized by looking up a page table of data structure resident in memory. Now the specific memory address mapping of the computer is more complex. In order to speed up the speed, a series of caching and optimization mechanisms will be introduced, such as TLB. The following is a simplified schematic diagram of memory address translation. Although it has been simplified, the basic principle is consistent with the real situation of modern computers.

3.1.3 memory page and disk page

We know that memory is generally regarded as the cache of disk. Sometimes when MMU is working, it will find that a page table indicates that a memory page is not in physical memory. At this time, a Page Fault will be triggered. At this time, the system will load the disk page into memory at the corresponding place in the disk, and then re execute the machine instructions that failed due to page missing. As this part can be seen as transparent to malloc implementation, it will not be described in detail. If you are interested, please refer to the relevant chapters of in-depth understanding of computer system.

Finally, a more consistent process of real address translation found in Wikipedia is attached for your reference. This figure adds the process of TLB and page missing exception picture source

3.2 Linux process level memory management

3.2.1 memory layout

After understanding the relationship between virtual memory and physical memory and the related mapping mechanism, let's take a look at how memory is arranged in a process.

Take Linux 64 bit system as an example. Theoretically, the available space of 64bit memory address is 0x0000000000000000 ~ 0xFFFFFFFFFFFFFFFF, which is a huge space. In fact, Linux only uses a small part (256T).

according to Linux kernel related documents Description: the Linux 64 bit operating system only uses the lower 47 bits and the upper 17 bits for expansion (only all 0 or all 1). Therefore, the address space actually used is 0x0000000000000000 ~ 0x00007fffffffffffff and 0xffff8000000000 ~ 0xffffffffffffff, where the front is User Space and the latter is Kernel Space. The diagram is as follows:

For users, the main focus is User Space. After zooming in the User Space, you can see that it is mainly divided into the following sections:

  • Code: This is the lowest address part of the whole user space. It stores instructions (that is, the executable machine code compiled by the program)
  • Data: the initialized global variables are stored here
  • BSS: uninitialized global variables are stored here
  • Heap: heap, which is the focus of this article. The heap grows from low address to high address. The brk related system call to be mentioned later is to allocate memory from here
  • Mapping Area: This is the area related to mmap system call. Most actual malloc implementations consider allocating larger memory areas through mmap, which is not discussed in this article. This area grows from high address to low address
  • Stack: This is the stack area, growing from high address to low address
    Next, we mainly focus on the operation of the Heap area. Students interested in the whole Linux memory layout can refer to other materials.

3.2.2 Heap memory model

Generally speaking, the memory applied by malloc is mainly allocated from the Heap area (this paper does not consider applying for large memory through mmap).

As we know from the above, the virtual memory address space faced by the process can be used only if it is mapped to the physical memory address by page. Due to the limitation of physical storage capacity, the virtual memory space of the whole heap cannot be mapped to the actual physical memory. Linux heap management is shown as follows:

Linux maintains a break pointer that points to an address in the heap space. The address space from the start address of the heap to break is mapped and can be accessed by the process; From break up, it is an unmapped address space. If you access this space, the program will report an error.

3.2.3, brk and sbrk

As we know from the above, to increase the actual available heap size of a process, you need to move the break pointer to the high address. Linux operates the break pointer through brk and sbrk system calls. The prototypes of the two system calls are as follows:

#include <unistd.h>
int brk(void *addr);
void *sbrk(intptr_t increment);

Use the man brk command in linux to view relevant information:

brk sets the break pointer directly to an address, while sbrk moves the break from the current position by the increment specified by increment. brk returns 0 when the execution is successful, otherwise - 1 is returned and errno is set to ENOMEM; When sbrk succeeds, it returns the address pointed to by break before moving. Otherwise, it returns (void *)-1.

A trick is that if you set increment to 0, you can get the address of the current break.

In addition, it should be noted that since Linux performs memory mapping by page, if break is set to not align by page size, the system will actually map a complete page at the end, so the mapped memory space is larger than the place pointed to by break. However, using the address after the break is very dangerous (although there may be a small memory address available after the break).

3.2.4 resource limitation and rlimit

The resources allocated by the system to each process are not unlimited, including mappable memory space. Therefore, each process has an rlimit, indicating the upper limit of resources available to the current process. This limit can be obtained through the getrlimit system call. The following code obtains the rlimit of the virtual memory space of the current process:

int main() {
	struct rlimit *limit = (struct rlimit *)malloc(sizeof(struct rlimit));
	getrlimit(RLIMIT_AS, limit);
	printf("soft limit: %ld, hard limit: %ld\n", limit->rlim_cur, limit->rlim_max);
}

Where rlimit is a structure:

struct rlimit {
	rlim_t rlim_cur; /* Soft limit */
	rlim_t rlim_max; /* Hard limit (ceiling for rlim_cur) */
};

Each resource has soft and hard restrictions, and you can conditionally set rlimit through setrlimit. The hard limit is the upper limit of the soft limit, and the non privileged process can only set the soft limit and cannot exceed the hard limit.

4. Implement malloc

4.1. Simple implementation

Use sbrk() to implement a simple malloc without guarantee:

/* A dummy malloc */
#include <sys/types.h>
#include <unistd.h>
void* malloc(size_t size) {
	void* p;
	p = sbrk(0);
	if (sbrk(size) == (void*)-1) {
		return NULL;
	}
	return p;
}

This malloc increases the number of bytes specified by size based on the current break every time, and returns the address of the previous break. This malloc cannot be used in real scenes because it lacks records of the allocated memory and is not convenient for memory release.

4.2 formal realization

4.2.1 data structure

First, we need to determine the data structure. A simple and feasible scheme is to organize the heap memory space in the form of blocks. Each Block is composed of a meta area and a data area. The meta area records the meta information of the data Block (data area size, free flag bit, pointer, etc.). The data area is the memory area actually allocated, and the first byte address of the data area is the address returned by malloc.

You can define a block with the following structure:

typedef struct s_block *t_block;
struct s_block {
	size_t size; /* Data area size */
	t_block next; /* Pointer to the next block */
	int free; /* Is it a free block */
	int padding; /* Fill in 4 bytes to ensure that the length of the meta block is a multiple of 8 */
	char data[1] /* This is a virtual field that represents the first byte of the data block, and the length should not be included in the meta */
};

Since we only consider 64 bit machines, for convenience, we fill in an int at the end of the structure so that the length of the structure itself is a multiple of 8 for memory alignment. The schematic diagram is as follows:

About arrays with length 1 Reference blog

4.2.2. Find a suitable block

Now consider how to find the appropriate block in the block chain. Generally speaking, there are two search algorithms:

  • First fit: from the beginning, use the first block whose data area size is larger than the required size, which is called the allocated block this time
  • Best fit: start from scratch, traverse all blocks, and use the block with the data area size greater than size and the smallest difference as the block allocated this time
    The two methods have their own advantages. best fit has higher memory utilization (higher payload), while first fit has better operation efficiency. Here we use the first fit algorithm.
/* First fit */
t_block find_block(t_block *last, size_t size) {
	t_block b = first_block;
	while(b && !(b->free && b->size >= size)) {
		*last = b;
		b = b->next;
	}
	return b;
}

find_block from frist_ Starting with block, find the first qualified block and return the starting address of block. If it is not found, return NULL. Here, a pointer called last will be updated during traversal, which always points to the currently traversed block. This is to open up a new block if you can't find a suitable block, which will be used in the next section.

4.2.3. Open up a new block

If the existing blocks cannot meet the size requirements, you need to open up a new block at the end of the linked list. The key here is how to create a struct using sbrk only:

#define BLOCK_SIZE 24 / * due to the existence of virtual data field, sizeof cannot correctly calculate the length of meta, which is set manually here*/
t_block extend_heap(t_block last, size_t s) {
	t_block b;
	b = sbrk(0);
	if(sbrk(BLOCK_SIZE + s) == (void *)-1) {
		return NULL;
	}
	b->size = s;
	b->next = NULL;
	if(last) {
		last->next = b;
	}
	b->free = 0;
	return b;
}

4.2.4 splitting blocks

First fit has a fatal disadvantage, that is, it may make a small size occupy a large block. At this time, in order to improve the payload, it should be split into a new block when the remaining data area is large enough, as shown below:

Implementation code:

void split_block(t_block b, size_t s) {
	t_block new;
	new = b->data + s;
	new->size = b->size - s - BLOCK_SIZE ;
	new->next = b->next;
	new->free = 1;
	b->size = s;
	b->next = new;
}

4.2.5 implementation of malloc

With the above code, we can integrate them into a simple but initially available malloc. Note that first we need to define the first header of a block linked list_ Block, initialized to NULL; In addition, we need at least block in the remaining space_ Size + 8 to perform splitting operation.
Since we want the data area allocated by malloc to be aligned according to 8 bytes, when the size is not a multiple of 8, we need to adjust the size to a multiple greater than the minimum 8 of size:

size_t align8(size_t s) {
	if(s & 0x7 == 0) {
		return s;
	}
	return ((s >> 3) + 1) << 3;
}

#define BLOCK_SIZE 24
void *first_bloc k= NULL;
/* other functions... */
void* malloc(size_t size) {
	t_block b, last;
	size_t s;
	/* Align address */
	s = align8(size);
	if(first_block) {
		/* Find the appropriate block */
		last = first_block;
		b = find_block(&last, s);
		if(b) {
		/* Split if possible */
		if ((b->size - s) >= ( BLOCK_SIZE + 8)) {
			split_block(b, s);
		}
		b->free = 0;
		} else {
			/* Without a suitable block, open up a new one */
			b = extend_heap(last, s);
			if(!b) {
				return NULL;
			}
		}
	} else {
		b = extend_heap(NULL, s);
		if(!b) {
			return NULL;
		}
		first_block = b;
	}
	return b->data;
}

4.2.6 implementation of calloc

With malloc, there are only two steps to implement calloc:

  • malloc a section of memory
  • Set the content of the data area to 0
    Since our data area is aligned by 8 bytes, in order to improve efficiency, we can set 0 every 8 bytes instead of one byte. We can create a new size_t pointer to force the memory area as size_t type.
void* calloc(size_t number, size_t size) {
	size_t* new;
	size_t s8, i;
	new = malloc(number * size);
	if(new) {
		s8 = align8(number * size) >> 3;
		for(i = 0; i < s8; i++) {
			new[i] = 0;
		}
	}
	return new;
}

4.2.7 implementation of free

The implementation of free is not as simple as it seems. Here we need to solve two key problems:

  • How to verify that the incoming address is a valid address, that is, it is indeed the first address of the data area allocated through malloc
  • How to solve the fragmentation problem
    First, we need to ensure that the incoming free address is valid. This validity includes two aspects:
  • The address should be within the area previously assigned by malloc, that is, at first_block and current break pointer range
  • This address was indeed previously assigned through our own malloc
    The first problem is easy to solve. Just compare the addresses. The key is the second problem.
    There are two solutions:
    One is to bury a magic number field in the structure. Before free, check whether the value of a specific position is the magic number set by us through relative offset;
    Second, add a magic pointer in the structure. This pointer points to the first byte of the data area (that is, the address passed in when free is legal). We check whether the magic pointer points to the address referred to by the parameter before free.
    Here we adopt the second scheme:
    First, we add magic pointer to the structure (and modify BLOCK_SIZE at the same time):
typedef struct s_block *t_block;
struct s_block {
	size_t size; /* Data area size */
	t_block next; /* Pointer to the next block */
	int free; /* Is it a free block */
	int padding; /* Fill in 4 bytes to ensure that the length of the meta block is a multiple of 8 */
	void *ptr; /* Magic pointer,Point to data */
	char data[1] /* This is a virtual field that represents the first byte of the data block, and the length should not be included in the meta */
};

Then we define the function to check the legitimacy of the address:

t_block get_block(void *p) {
	char *tmp;
	tmp = p;
	return (p = tmp -= BLOCK_SIZE);
}
 
int valid_addr(void *p) {
	if(first_block) {
		if(p > first_block && p < sbrk(0)) {
			return p == (get_block(p))->ptr;
		}
	}
	return 0;
}

After multiple malloc and free, the whole memory pool may generate many fragment blocks. These blocks are very small and often can not be used, and even many fragments are connected together. Although they can meet the requirements of this malloc, they cannot fit because they are divided into multiple small blocks. This is the fragment problem.

A simple solution: when a block is free, if it is found that its adjacent blocks are also free, merge the block and adjacent blocks. In order to meet this implementation, we need to put s_ Change block to bidirectional linked list. The modified block structure is as follows:

typedef struct s_block *t_block;
struct s_block {
`
};

The consolidation method is as follows:

t_block fusion(t_block b) {
	if (b->next && b->next->free) {
		b->size += BLOCK_SIZE + b->next->size;
		b->next = b->next->next;
		if(b->next) {
			b->next->prev = b;
		}
	}
	return b;
}

With the above methods, the implementation idea of free is relatively clear: first, check the legitimacy of the parameter address. If it is not legal, do nothing; Otherwise, mark the free of this block as 1 and merge it with the following block if possible. If the current block is the last one, the break pointer will be rolled back to release the process memory. If the current block is the last one, the break pointer will be rolled back and the first will be set_ Block is NULL. The implementation is as follows:

void free(void *p) {
	t_block b;
	if(valid_addr(p)) {
        b = get_block(p);
        b->free = 1;
        if(b->prev && b->prev->free) {
            b = fusion(b->prev);
        }
        if(b->next) {
            fusion(b);
        } else {
            if(b->prev) {
                b->prev->prev = NULL;
            } else {
                first_block = NULL;
            }
            brk(b);
        }
	}
}

4.2.8 implementation of realloc

In order to implement realloc, we first need to implement a memory replication method. Like calloc, for efficiency, we copy in 8 bytes:

void copy_block(t_block src, t_block dst) {
	size_t *sdata, *ddata;
	size_t i;
	sdata = src->ptr;
	ddata = dst->ptr;
	for(i = 0; (i * 8) < src->size && (i * 8) < dst->size; i++) {
		ddata[i] = sdata[i];
	}
}

Then we started to implement realloc. A simple (but inefficient) approach is to malloc a piece of memory and then copy the data. However, we can be more efficient. We can consider the following aspects:

  • If the data area of the current block is greater than or equal to the size required by realloc, no operation will be done
  • If the new size becomes smaller, consider split
  • If the data area of the current block cannot meet the size, but the subsequent block is free and can meet the requirements after merging, consider merging

The following is the implementation of realloc:

void *realloc(void *p, size_t size) {
    size_t s;
    t_block b, new;
    void *newp;
    if (!p) {
        /* According to the standard library document, when p passes NULL, it is equivalent to calling malloc */
        return malloc(size);
    }
    if(valid_addr(p)) {
        s = align8(size);
        b = get_block(p);
        if(b->size >= s) {
            if(b->size - s >= (BLOCK_SIZE + 8)) {
                split_block(b, s);
            }
        } else {
            /* See if consolidation is possible */
            if(b->next && b->next->free && (b->size + BLOCK_SIZE + b->next->size) >= s) {
                fusion(b);
                if(b->size - s >= (BLOCK_SIZE + 8)) {
                    split_block(b, s);
                }
            } else {
                /* New malloc */
                newp = malloc (s);
                if (!newp) {
                    return NULL;
                }
                new = get_block(newp);
                copy_block(b, new);
                free(p);
                return(newp);
            }
        }
        return (p);
    }
    return NULL;
}

5. Remaining problems and optimization

The above is a relatively simple but initially available malloc implementation. There are many remaining possible optimization points, such as:

  • Compatible with both 32-bit and 64 bit systems
  • When allocating large amounts of fast memory, consider using mmap instead of sbrk, which is usually more efficient
  • You can consider maintaining multiple linked lists instead of a single one. The block size in each linked list is within a range, such as 8-byte linked list, 16 byte linked list, 24-32 byte linked list, etc. At this time, you can allocate to the corresponding linked list according to size, which can effectively reduce fragmentation and improve the speed of querying blocks
  • You can consider storing only free blocks in the linked list instead of allocated blocks, which can reduce the times of searching blocks and improve efficiency

6. Other references

1. This article has a lot of references A malloc Tutorial , some of the pictures and codes directly refer to the content of the article, which is specially pointed out here
2,Computer Systems: A Programmer's Perspective, 2/E There are many places worth referring to in this book
3. About the virtual memory model of Linux, Anatomy of a Program in Memory It is a good reference. In addition, the author has another article How the Kernel Manages Your Memory There is a good explanation for the part of virtual memory management in Linux kernel
4. For the implementation of malloc in the real world, you can refer to glibc Implementation of
5. A lot of references have been made in the writing process of this paper Wikipedia

Keywords: malloc

Added by c815902 on Wed, 02 Feb 2022 18:34:29 +0200