Effective C++ learning notes - customizing new and delete


Unless otherwise specified, everything these notes say about new also applies to new[].

Clause 49 - Understanding new_handler behavior

1. Global new_handler

We know that new throws a std::bad_alloc exception when an allocation fails. We can also make it return a null pointer instead of throwing by using the nothrow form, new (std::nothrow). Before doing either, however, new first calls a user-specified error-handling function, the new_handler. Its type and the function that installs it are declared as follows:

typedef void (*new_handler)();
std::new_handler set_new_handler(std::new_handler new_p) noexcept; // since C++11

When operator new cannot find enough memory, it calls the new_handler repeatedly until the allocation succeeds. Therefore a new_handler must do one of the following:

Make more memory available - so that the next allocation attempt inside operator new can succeed. One common approach is to allocate a large block of memory at program startup and release it back to the program the first time the new_handler is called.
Install another new_handler - if the current new_handler cannot free any more memory but knows of another new_handler that can, it can install that handler in its place by calling set_new_handler.
Uninstall the new_handler - that is, pass a null pointer to set_new_handler. With no handler installed, operator new throws an exception.
Throw bad_alloc (or an exception derived from it)
Do not return - typically by calling abort or exit
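
The first and third options can be combined in one handler. The following is only a sketch under assumptions: the names g_reserve, releaseReserveHandler and installReserve are made up for illustration, and it assumes the global operator new draws from the same heap as malloc, which is common but not guaranteed.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical emergency reserve, allocated once at startup.
static void* g_reserve = nullptr;

// new_handler: the first call gives the reserve back to the heap,
// any later call uninstalls the handler so that new throws bad_alloc.
void releaseReserveHandler()
{
    if (g_reserve != nullptr)
    {
        std::free(g_reserve);          // make more memory available
        g_reserve = nullptr;
        return;                        // operator new retries the allocation
    }
    std::set_new_handler(nullptr);     // nothing left to release: uninstall
}

// Allocates the reserve and installs the handler.
// Returns false if the reserve itself could not be allocated.
bool installReserve(std::size_t bytes)
{
    g_reserve = std::malloc(bytes);
    if (g_reserve == nullptr)
    {
        return false;
    }
    std::set_new_handler(releaseReserveHandler);
    return true;
}
```

A program would call installReserve once near the start of main, with a reserve size chosen for its own needs.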

An example copied from the reference manual:

#include <iostream>
#include <new>

void handler()
{
	std::cout << "Memory allocation failed, terminating\n";
	std::set_new_handler(nullptr);
}

int main()
{
	std::set_new_handler(handler);
	try {
		while (true) {
			new int[100000000ul];
		}
	}
	catch (const std::bad_alloc& e) {
		std::cout << e.what() << '\n';
	}
}

2. A new_handler for a specific class

Sometimes we want a specific new_handler to handle allocation failures for a specific class. Clearly, each such class must provide its own new_handler. We must also make sure that the class-specific new_handler does not affect other classes: it has to be installed just before the global operator new runs, and the global new_handler has to be restored afterwards, which means the class must provide its own operator new. Finally, we should not hard-code the new_handler; the handler installed inside operator new should be supplied by the client. So the class needs a set_new_handler function that clients call to set the handler for that class. It does not have to use this name, but it should behave like the set_new_handler provided by the standard library.

#include <iostream>
#include <new>   // new_handler, set_new_handler
using namespace std;

class CLS_Test
{
public:
	static void* operator new(size_t size) noexcept(false);
	static new_handler set_new_handler(new_handler _handler);
private:
	static new_handler m_curHandler;
};

new_handler CLS_Test::m_curHandler = nullptr;
new_handler CLS_Test::set_new_handler(new_handler _handler)
{
	new_handler oldHandler = m_curHandler;
	m_curHandler = _handler;
	return oldHandler;
}

void* CLS_Test::operator new(size_t size) noexcept(false)
{
	new_handler oldHandler = ::set_new_handler(m_curHandler);
	void *pRet = ::operator new(size);
	::set_new_handler(oldHandler);
	return pRet;
}

This basically achieves our goal, but it ignores one problem: the global operator new may throw. If it does, the line that restores the global new_handler is never reached. Following the RAII idea, we add a class whose destructor restores the original handler:

class CLS_HandlerHolder
{
public:
	explicit CLS_HandlerHolder(new_handler _handler) :
		m_handler(_handler)
	{
	}

	~CLS_HandlerHolder()
	{
		set_new_handler(m_handler);
	}

	CLS_HandlerHolder(const CLS_HandlerHolder&) = delete; 
	CLS_HandlerHolder& operator=(const CLS_HandlerHolder&) = delete;
private:
	new_handler m_handler;
};

With this class, the customized operator new becomes:

void* CLS_Test::operator new(size_t size) noexcept(false)
{
	CLS_HandlerHolder oldHandler(::set_new_handler(m_curHandler));
	return ::operator new(size);
}
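
To check that the handler really is restored on the exception path, the failure can be simulated by throwing bad_alloc by hand. This is a small sketch with hypothetical names (HandlerRestorer mirrors CLS_HandlerHolder, and temporaryHandler is an empty stand-in handler):

```cpp
#include <new>

// RAII guard: remembers a handler and reinstalls it on destruction.
class HandlerRestorer
{
public:
    explicit HandlerRestorer(std::new_handler saved) : m_saved(saved) {}
    ~HandlerRestorer() { std::set_new_handler(m_saved); }

    HandlerRestorer(const HandlerRestorer&) = delete;
    HandlerRestorer& operator=(const HandlerRestorer&) = delete;
private:
    std::new_handler m_saved;
};

// Stand-in for a class-specific handler; it is never actually called here.
void temporaryHandler() {}

// Returns true if the previous handler is back in place after an exception.
bool handlerRestoredAfterThrow()
{
    std::new_handler before = std::get_new_handler();
    try
    {
        HandlerRestorer guard(std::set_new_handler(temporaryHandler));
        throw std::bad_alloc();   // simulate a failed allocation
    }
    catch (const std::bad_alloc&)
    {
        // guard's destructor has already run during stack unwinding
    }
    return std::get_new_handler() == before;
}
```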

3. Using a template base class to implement a custom new_handler

To reuse this code as widely as possible, we can turn it into a mixin-style base class, that is, a template base class that other classes inherit from. As a base class, it gives every derived class the ability to set a class-specific new_handler. Because all of its data members and member functions are static, it causes no problems under multiple inheritance. Making it a template ensures that each derived class gets its own static m_curHandler; otherwise the new_handlers of all derived classes would share one variable and conflict.

class CLS_HandlerHolder
{
public:
	explicit CLS_HandlerHolder(new_handler _handler) :
		m_handler(_handler)
	{
	}

	~CLS_HandlerHolder()
	{
		set_new_handler(m_handler);
	}

	CLS_HandlerHolder(const CLS_HandlerHolder&) = delete; 
	CLS_HandlerHolder& operator=(const CLS_HandlerHolder&) = delete;
private:
	new_handler m_handler;
};

template<typename T>
class CLS_HandlerBase
{
public:
	static void* operator new(size_t size) noexcept(false);
	static new_handler set_new_handler(new_handler _handler);
private:
	static new_handler m_curHandler;
};

template<typename T>
new_handler CLS_HandlerBase<T>::m_curHandler = nullptr;

template<typename T>
new_handler CLS_HandlerBase<T>::set_new_handler(new_handler _handler)
{
	new_handler oldHandler = m_curHandler;
	m_curHandler = _handler;
	return oldHandler;
}

template<typename T>
void* CLS_HandlerBase<T>::operator new(size_t size) noexcept(false)
{
	CLS_HandlerHolder oldHandler(::set_new_handler(m_curHandler));
	return ::operator new(size);
}

Its usage is roughly as follows:

class CLS_HandlerTest : public CLS_HandlerBase<CLS_HandlerTest> 
{
	...
};

void myHandler()
{
	cout << "cannot allocate enough space" << endl;
	abort();
}

int main()
{
	CLS_HandlerTest::set_new_handler(myHandler);
	CLS_HandlerTest* pHandlerTest = new CLS_HandlerTest;
	string* pStr = new string;
	CLS_HandlerTest::set_new_handler(nullptr);
	CLS_HandlerTest* pHandlerTest2 = new CLS_HandlerTest;
	return 0;
}

Here's an interesting point: the template never uses its type parameter T, and the template argument we instantiate it with is the derived class itself. This is exactly because, as we said earlier, we only want each derived class to have its own instantiation of the base class. The technique is known as the curiously recurring template pattern (CRTP).
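
The effect of one instantiation per derived class can be seen in isolation. The sketch below uses hypothetical names (HandlerSlot, Widget, Gadget, widgetHandler) and keeps only the static handler storage, leaving out operator new:

```cpp
#include <new>

// Stripped-down stand-in for CLS_HandlerBase: only the per-class handler slot.
template <typename T>
class HandlerSlot
{
public:
    static std::new_handler set_new_handler(std::new_handler handler) noexcept
    {
        std::new_handler old = s_handler;
        s_handler = handler;
        return old;
    }
    static std::new_handler current() noexcept { return s_handler; }
private:
    static std::new_handler s_handler;   // one copy per instantiation
};

template <typename T>
std::new_handler HandlerSlot<T>::s_handler = nullptr;

// Each class instantiates the base with itself, so each gets its own slot.
class Widget : public HandlerSlot<Widget> {};
class Gadget : public HandlerSlot<Gadget> {};

void widgetHandler() {}
```

Setting a handler through Widget leaves the slot of Gadget untouched, because HandlerSlot<Widget> and HandlerSlot<Gadget> are unrelated classes.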

4. nothrow new

The global new provides a nothrow version, but the exception guarantee it offers is weak. A new expression does two things: it allocates memory (by calling operator new) and then it calls the constructor (a call generated by the compiler). The nothrow guarantee covers only the allocation, not the construction of the object, so the expression can still throw. In practice, therefore, we rarely need nothrow new.
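
A minimal sketch of that limitation, using a hypothetical type ThrowingCtor whose constructor always throws:

```cpp
#include <new>
#include <stdexcept>

// Hypothetical type whose constructor always fails.
struct ThrowingCtor
{
    ThrowingCtor() { throw std::runtime_error("constructor failed"); }
};

// Returns true if an exception escapes a nothrow new expression.
bool nothrowNewStillThrows()
{
    try
    {
        // The allocation itself cannot throw, but the constructor call can.
        ThrowingCtor* p = new (std::nothrow) ThrowingCtor;
        delete p;
        return false;
    }
    catch (const std::runtime_error&)
    {
        return true;
    }
}
```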

Clause 50 - Understand when it makes sense to replace new and delete

1. Common reasons for replacing new and delete

To detect and record usage errors - sometimes we try to delete memory that lives on the stack, or delete the same memory more than once. With custom allocation and deallocation functions we can keep a list of the pointers handed out by new and remove each pointer when it is deleted. In addition, since we manage the memory ourselves, code may write past the end of an allocated block (an overrun) or before its start (an underrun). A custom operator new can request extra memory and place a signature before and after the block handed to the client. On delete, we check whether the signatures are intact and either release the memory or log an error.
To improve performance - the operator new shipped with the compiler is a general-purpose allocator. It has to cope with a wide range of requirements: large and small blocks alike; allocation patterns ranging from a few blocks that live for the whole run of the program to constant allocation and release of huge numbers of short-lived objects; and heap fragmentation. Garbage collectors face the same concerns, and as other languages show, no single allocation and collection strategy can satisfy every need at once. By customizing new and delete we can tune for what matters to us, such as speed or memory utilization.
To collect usage statistics - by logging inside new and delete we can learn what block sizes our software tends to allocate, how those sizes are distributed, how long the blocks live, and so on.
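
As a rough illustration of the statistics case, a class can count its own allocations in a class-specific operator new and operator delete. The class name CountedNode and its counters are hypothetical:

```cpp
#include <cstddef>
#include <new>

// Hypothetical class that records how many of its objects are alive
// on the heap and how many bytes were requested for them in total.
class CountedNode
{
public:
    static void* operator new(std::size_t size)
    {
        void* p = ::operator new(size);   // may throw; counters stay untouched then
        ++s_live;
        s_bytesRequested += size;
        return p;
    }

    static void operator delete(void* p) noexcept
    {
        if (p != nullptr)
        {
            --s_live;
        }
        ::operator delete(p);
    }

    static std::size_t live() { return s_live; }
    static std::size_t bytesRequested() { return s_bytesRequested; }
private:
    static std::size_t s_live;
    static std::size_t s_bytesRequested;
    int m_payload = 0;
};

std::size_t CountedNode::s_live = 0;
std::size_t CountedNode::s_bytesRequested = 0;
```

The counters are not thread safe; a multithreaded program would need atomics or a lock.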

2. The alignment problem in a custom new

Conceptually, writing a custom operator new is simple. For example, here is one that adds signatures around the block:

#include <cstdlib>    // malloc
#include <iostream>
#include <new>        // bad_alloc
using namespace std;

static const int signature = 0XCAFEBABE;
void* operator new(size_t size)
{
	size_t sizeReal = size + 2 * sizeof(int);
	void* pMem = malloc(sizeReal);
	if (!pMem)
	{
		throw bad_alloc();
	}

	*(static_cast<int*>(pMem)) = signature;
	*(reinterpret_cast<int*>(static_cast<char*>(pMem) + sizeReal - sizeof(int))) = signature;

	return static_cast<char*>(pMem) + sizeof(int);
}

We ignore the call to the new_handler here. There is a subtler problem: alignment. Many computer architectures require that data of particular types be placed at particular kinds of addresses. For example, an architecture might require that pointers sit at addresses that are multiples of 4, or that doubles sit at addresses that are multiples of 8. Violating such a constraint can cause a hardware exception at run time. Even architectures that tolerate misaligned access usually run faster when the alignment requirements are met.
C++ requires every pointer returned by operator new to be suitably aligned for any data type. malloc works under the same requirement, so it is safe for operator new to return a pointer obtained from malloc directly (which is how the global operator new is typically implemented). Here, however, we do not return the pointer as is; we add an offset of sizeof(int). We can therefore no longer guarantee that the returned pointer is suitably aligned, so it may be unusable or slow. I ran the following test on my machine:

#include <cstdlib>    // malloc, free
#include <ctime>      // clock
#include <iostream>
#include <new>        // bad_alloc
using namespace std;

static const int signature = 0XCAFEBABE;
void* operator new(size_t size)
{
	size_t sizeReal = size + 2 * sizeof(int);
	void* pMem = malloc(sizeReal);
	if (!pMem)
	{
		throw bad_alloc();
	}

	*(static_cast<int*>(pMem)) = signature;
	*(reinterpret_cast<int*>(static_cast<char*>(pMem) + sizeReal - sizeof(int))) = signature;

	return static_cast<char*>(pMem) + sizeof(int);
}

void __CRTDECL operator delete(void* const block, size_t const)
{
	free(static_cast<char*>(block) - sizeof(int));
}

int main()
{
	clock_t tStart = clock();   // clock() returns clock_t, not time_t
	for (int i = 0; i < 1000000000; i++)
	{
		double* pD = new double;
		*pD = 1.0;
		delete pD;
	}
	clock_t tEnd = clock();
	cout << "time = " << (tEnd - tStart) << endl;
	return 0;
}


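[Figure: timing output of the test with the customized new and delete]

After removing the customized new and delete:

[Figure: timing output of the test with the default new and delete]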

Of course, we do not have to implement technical details like alignment ourselves. On the one hand, many compilers ship memory management functions with debugging and logging modes, and commercial products that can replace the compiler's memory manager are available on many platforms. On the other hand, we can turn to open source: the Boost library provides a Pool allocator designed for allocating large numbers of small objects.
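
If we do want to keep a header in front of the block, one possible fix (not the only one) is to make the header as large as the strictest fundamental alignment, so the pointer handed to the client stays aligned. The helper names below are hypothetical, and the sketch stores only a leading signature:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// True if p is a multiple of the given alignment.
bool isAlignedFor(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Header padded to the strictest fundamental alignment, so that
// (malloc result + kHeaderSize) is still aligned for any fundamental type.
constexpr std::size_t kHeaderSize = alignof(std::max_align_t);
constexpr unsigned int kSignature = 0xCAFEBABEu;

// Returns a block of `size` bytes preceded by a signed header, or nullptr.
void* allocWithAlignedHeader(std::size_t size)
{
    char* raw = static_cast<char*>(std::malloc(kHeaderSize + size));
    if (raw == nullptr)
    {
        return nullptr;
    }
    *reinterpret_cast<unsigned int*>(raw) = kSignature;   // raw itself is aligned
    return raw + kHeaderSize;
}

// Checks the leading signature of a block from allocWithAlignedHeader.
bool headerIntact(const void* block)
{
    const char* raw = static_cast<const char*>(block) - kHeaderSize;
    return *reinterpret_cast<const unsigned int*>(raw) == kSignature;
}

void freeWithAlignedHeader(void* block)
{
    if (block != nullptr)
    {
        std::free(static_cast<char*>(block) - kHeaderSize);
    }
}
```

The price is wasted space: on many platforms the header grows from 4 bytes to 16.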

3. Other occasions for using custom new and delete

To speed up allocation and deallocation - for example, if you write a single-threaded program but your compiler ships a thread-safe memory manager, you can write a memory manager without thread safety to make allocation faster.
To reduce the space overhead of the default memory manager
To compensate for suboptimal alignment in the default allocator - on some platforms the compiler's own new does not guarantee 8-byte alignment for dynamically allocated doubles, and a customized version can provide it.
To cluster related objects - if you know that the objects of a data structure are generally used together and you want to minimize page faults while working on that data, it makes sense to create a separate heap for the data structure. The objects then sit on as few memory pages as possible, and access to them gets faster.
To obtain unconventional behavior

Clause 51 - Write new and delete in the conventional way

1. new rule 1 - the new-handling function must be called when memory is insufficient

When memory is insufficient, operator new must call the new-handling function repeatedly until the allocation succeeds, the new_handler pointer is null (in which case operator new throws bad_alloc), or the handler itself throws an exception or does not return.

2. new rule 2 - processing zero memory requests

C++ requires operator new to return a legitimate pointer even when the client requests 0 bytes. This behavior simplifies things elsewhere in the language. The gcc implementation is as follows:

_GLIBCXX_WEAK_DEFINITION void *
operator new (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
{
  void *p;

  /* malloc (0) is unpredictable; avoid it.  */
  if (sz == 0)
    sz = 1;

  while (__builtin_expect ((p = malloc (sz)) == 0, false))
    {
      new_handler handler = std::get_new_handler ();
      if (! handler)
	_GLIBCXX_THROW_OR_ABORT(bad_alloc());
      handler ();
    }

  return p;
}

Here we can also see rule 1 at work, along with the things we learned earlier that a new_handler must do. If the handler does none of them, the function loops forever.
If we look at the msvc source code, we find that it does not treat zero-byte requests specially.

_CRT_SECURITYCRITICAL_ATTRIBUTE
void* __CRTDECL operator new(size_t const size)
{
    for (;;)
    {
        if (void* const block = malloc(size))
        {
            return block;
        }
		...
    }
}

So when does a zero-byte request occur? For example, when we try to create an array of length 0. According to the comment in the gcc source, 0 is changed to 1 to avoid the unpredictable behavior of malloc(0). I do not understand why msvc has no equivalent handling, though. Perhaps it has to do with how the underlying malloc is implemented. If anyone knows, please point it out.
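
Whatever the implementation does internally, the visible behavior can be checked by calling the global operator new with a size of 0. The function name below is hypothetical:

```cpp
#include <new>

// Requests two zero-byte blocks and reports whether both pointers are
// non-null and different from each other while both are still allocated.
bool zeroSizeRequestsAreDistinct()
{
    void* a = ::operator new(0);
    void* b = ::operator new(0);
    bool ok = (a != nullptr) && (b != nullptr) && (a != b);
    ::operator delete(a);
    ::operator delete(b);
    return ok;
}
```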

3. new rule 3 - avoid masking the normal form of new

An operator new declared in a class scope hides the global operator new. The reason is the name hiding rule we discussed earlier: a name declared in an inner scope hides the same name in outer scopes, just as a derived class function hides base class functions of the same name. As mentioned in Clause 50, a common reason for writing a custom memory manager is to optimize allocation for objects of one specific class, not for any of its derived classes. Once the operator is inherited, however, the compiler calls it to allocate derived class objects as well, instead of the global operator new!
To solve this, allocation requests for derived classes must be forwarded to the global operator new. The most direct way is to check the requested size inside the customized operator new:

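#include <iostream>
#include <new>
using namespace std;

class CLS_Base
{
public:
	static void* operator new(size_t size) noexcept(false)
	{
		if (size != sizeof(CLS_Base))
		{
			// Not a CLS_Base (for example a derived class): forward to the global operator new.
			return ::operator new(size);
		}

		// The allocation strategy tuned for CLS_Base goes here.
		// As a placeholder, this version also uses the global operator new.
		return ::operator new(size);
	}
};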

class CLS_Derived : public CLS_Base{};

We do not handle size == 0 explicitly here. sizeof(CLS_Base) can never be 0, so a zero-byte request fails the size check and is handed to the global operator new.
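
To see which branch each class takes, the two paths can be counted. This sketch uses hypothetical names (SizedBase, SizedDerived and the two counters), and both branches fall back to the global operator new:

```cpp
#include <cstddef>
#include <new>

class SizedBase
{
public:
    static void* operator new(std::size_t size)
    {
        if (size != sizeof(SizedBase))
        {
            ++s_forwarded;                // request for some other size
            return ::operator new(size);
        }
        ++s_custom;                       // request for exactly a SizedBase
        return ::operator new(size);      // placeholder for a tuned strategy
    }

    static void operator delete(void* p) noexcept { ::operator delete(p); }

    static int s_forwarded;
    static int s_custom;
private:
    int m_base = 0;
};

int SizedBase::s_forwarded = 0;
int SizedBase::s_custom = 0;

// The extra member makes sizeof(SizedDerived) larger than sizeof(SizedBase).
class SizedDerived : public SizedBase
{
    int m_extra = 0;
};
```

Note that the size check cannot tell a derived class apart if it happens to have the same size as the base class.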

4. new[] rule - allocate raw memory

Customizing operator new[] is much harder. All operator new[] can do is allocate a block of raw memory. On the one hand, we cannot tell from size whether the array will hold base class objects or derived class objects, so we cannot work out how many objects it will hold. On the other hand, size may be larger than the memory the objects themselves need, because the request may include extra space for bookkeeping such as the array length.
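
The size that actually reaches operator new[] can be recorded and compared with the size of the elements alone. The class ArrayItem below is hypothetical; whether any extra bytes are requested, and how many, depends on the implementation:

```cpp
#include <cstddef>
#include <new>

class ArrayItem
{
public:
    ~ArrayItem() {}   // non-trivial destructor: the element count must be remembered somewhere

    static void* operator new[](std::size_t size)
    {
        s_lastArrayRequest = size;        // may exceed count * sizeof(ArrayItem)
        return ::operator new[](size);
    }

    static void operator delete[](void* p) noexcept { ::operator delete[](p); }

    static std::size_t s_lastArrayRequest;
private:
    int m_value = 0;
};

std::size_t ArrayItem::s_lastArrayRequest = 0;
```

After new ArrayItem[4], s_lastArrayRequest is at least 4 * sizeof(ArrayItem); printing the difference shows the overhead on a given compiler.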

5. delete rule - deleting a null pointer is always safe

The free function in C already guarantees this, so a delete that calls free directly satisfies the rule. A customized operator delete is handled the same way as operator new: if the size passed to delete differs from the size of the current class, forward the request to the global operator delete. One more point is worth noting: when we delete a derived class object through a base class pointer and the base class destructor is not virtual, operator delete may not receive the correct size (formally, the behavior is undefined):

#include <iostream>
using namespace std;

class CLS_Base
{
	int m_iMem;
public:
	void* operator new(size_t const size) noexcept(false)
	{	
		void* ptr = ::operator new(size);
		cout << "new ptr = " << ptr << " size = " << size << endl;
		return ptr;
	}

	void operator delete(void* const block, size_t const size) noexcept(false)
	{
		cout << "delete ptr = " << block << " size = " << size << endl;
		::operator delete(block);
	}
};

class CLS_Derived : public CLS_Base
{
	int m_iMem;
};

int main()
{
	CLS_Base* pBase = new CLS_Derived;
	delete (pBase);
	CLS_Derived* pDerived = new CLS_Derived;
	delete (pDerived);
}
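
For comparison, here is a sketch of the same situation with a virtual destructor. The names VBase, VDerived and s_lastDeleteSize are hypothetical; the size passed to operator delete is recorded so it can be inspected afterwards:

```cpp
#include <cstddef>
#include <new>

class VBase
{
public:
    virtual ~VBase() = default;   // virtual: delete uses the dynamic type

    static void* operator new(std::size_t size) { return ::operator new(size); }

    static void operator delete(void* block, std::size_t size) noexcept
    {
        s_lastDeleteSize = size;  // remember what the runtime passed in
        ::operator delete(block);
    }

    static std::size_t s_lastDeleteSize;
private:
    int m_a = 0;
};

std::size_t VBase::s_lastDeleteSize = 0;

class VDerived : public VBase
{
    int m_b[8] = {};
};
```

Deleting a VDerived through a VBase* now passes sizeof(VDerived) to operator delete.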

Clause 52 - if you write placement new, you should also write placement delete

Placement new and placement delete. We previously learned how to use them in C++ Primer Plus.

1. Placement new

In the narrow sense, placement new is:

void* operator new(std::size_t size, void* pMemory) noexcept;

However, placement new in the broad sense does not restrict the type or number of extra parameters. We can therefore overload the operator freely, for example to allocate memory within a specified region or on behalf of a specified object.
A typical use of placement new is together with an allocator, to construct objects in contiguous memory that has already been allocated.
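
A minimal sketch of the narrow form, constructing an object in a buffer that already exists. The type PlacedPoint and the function name are hypothetical:

```cpp
#include <new>

struct PlacedPoint
{
    int x;
    int y;
    PlacedPoint(int x_, int y_) : x(x_), y(y_) {}
};

// Constructs a PlacedPoint inside a local buffer, reads it, and destroys it.
int sumViaPlacementNew()
{
    // Raw storage with the right size and alignment; no heap allocation involved.
    alignas(PlacedPoint) unsigned char storage[sizeof(PlacedPoint)];

    PlacedPoint* p = new (storage) PlacedPoint(3, 4);   // construct in place
    int sum = p->x + p->y;

    p->~PlacedPoint();   // destroy explicitly; there is nothing to delete
    return sum;
}
```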

2. Pairing new and delete

When we create an object with new and the constructor throws, the C++ runtime system is responsible for calling delete to release the memory that was just allocated. This only works if an operator delete whose extra parameters match those of the operator new exists. The following code therefore leaks memory:

#include <iostream>
#include <stdexcept>   // runtime_error
using namespace std;

class CLS_Test
{
public:
	CLS_Test()
	{
		throw runtime_error("test error");
	}

	void* operator new(size_t const size, void* pMemory) noexcept(false)
	{	
		void* ptr = ::operator new(size);
		cout << "new ptr = " << ptr << " size = " << size << endl;
		return ptr;
	}

	void operator delete(void* const block, size_t const size) noexcept(false)
	{
		cout << "delete ptr = " << block << " size = " << size << endl;
		::operator delete(block);
	}
};

int main()
{
	try
	{
		CLS_Test* pTest = new(nullptr) CLS_Test;
	}
	catch (exception &e)
	{
		cout << e.what() << endl;
	}
}


Although we caught the exception, the runtime system found no operator delete matching the operator new that was used, so nothing was called to release the memory, and it leaked. We can fix this by adding a matching operator delete:

void operator delete(void* const block, void* pMemory) noexcept(false)
{
	...
}
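
A complete sketch of the paired form follows. To keep it distinct from the standard void* form, the extra parameter is an int tag; the class FailingWidget, the tag and the counter are all hypothetical:

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>

class FailingWidget
{
public:
    FailingWidget() { throw std::runtime_error("constructor failed"); }

    // Placement new with an extra int parameter.
    static void* operator new(std::size_t size, int /*tag*/)
    {
        return ::operator new(size);
    }

    // Matching placement delete: called automatically if the constructor throws.
    static void operator delete(void* block, int /*tag*/) noexcept
    {
        ++s_placementDeletes;
        ::operator delete(block);
    }

    // Normal delete, used by ordinary delete expressions.
    static void operator delete(void* block) noexcept { ::operator delete(block); }

    static int s_placementDeletes;
};

int FailingWidget::s_placementDeletes = 0;

// Returns true if the constructor's exception reached the caller.
bool constructAndCatch()
{
    try
    {
        FailingWidget* w = new (7) FailingWidget;
        delete w;
        return false;
    }
    catch (const std::runtime_error&)
    {
        return true;
    }
}
```

After constructAndCatch returns, s_placementDeletes is 1, which shows that the runtime released the memory through the matching operator delete.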

3. Placement new allocation and normal delete

If construction throws no exception, we must release the memory ourselves when we are done with it. Which delete is used then? According to the manual, a delete expression calls the normal operator delete, that is, the version whose parameters match the global operator delete. The problem is that, under the name hiding rule, declaring a placement delete in the class hides that version. So the class must also declare the normal operator delete. Besides letting delete expressions find a matching operator, there may be a better reason to declare it: memory obtained through placement new often does not need to be released manually.

4. Placement operators and name hiding

Declaring placement new and delete in a class hides the global versions, as we have just seen. Inheritance offers a tidy solution:

class CLS_StandardNewDeleteForms
{
public:
	// normal new and delete
	static void* operator new(size_t const size) noexcept(false);
	static void operator delete(void* const block, size_t const size) noexcept(false);

	// placement new and delete
	static void* operator new(size_t const size, void* pMemory) noexcept;
	static void operator delete(void* const block, void* pMemory) noexcept;
	
	// nothrow new and delete
	static void* operator new(size_t const size, const std::nothrow_t& tag) noexcept;
	static void operator delete(void* const block, const std::nothrow_t& tag) noexcept;
};

class CLS_Test : public CLS_StandardNewDeleteForms
{
public:
	using CLS_StandardNewDeleteForms::operator new;
	using CLS_StandardNewDeleteForms::operator delete;
};

As for why we do not bring in the global new and delete directly with using: a using-declaration at class scope can only name members of a base class.
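
Put together with definitions, the idea might look like the sketch below. The names StandardForms and LoggedWidget and the int& counter parameter are hypothetical; every standard form simply forwards to its global counterpart:

```cpp
#include <cstddef>
#include <new>

// Base class that re-exposes the three standard forms at class scope.
class StandardForms
{
public:
    // normal new and delete
    static void* operator new(std::size_t size) { return ::operator new(size); }
    static void operator delete(void* block) noexcept { ::operator delete(block); }

    // placement new and delete
    static void* operator new(std::size_t size, void* where) noexcept { return ::operator new(size, where); }
    static void operator delete(void* block, void* where) noexcept { ::operator delete(block, where); }

    // nothrow new and delete
    static void* operator new(std::size_t size, const std::nothrow_t& tag) noexcept { return ::operator new(size, tag); }
    static void operator delete(void* block, const std::nothrow_t&) noexcept { ::operator delete(block); }
};

// Adds its own placement form without losing the standard ones.
class LoggedWidget : public StandardForms
{
public:
    using StandardForms::operator new;
    using StandardForms::operator delete;

    // Custom placement new: counts allocations in a caller-supplied counter.
    static void* operator new(std::size_t size, int& counter)
    {
        void* p = ::operator new(size);
        ++counter;
        return p;
    }

    // Matching placement delete, used only if the constructor throws.
    static void operator delete(void* block, int& counter) noexcept
    {
        --counter;
        ::operator delete(block);
    }

    int value = 0;
};
```

With the two using-declarations in place, new LoggedWidget, new (std::nothrow) LoggedWidget, new (buffer) LoggedWidget and new (counter) LoggedWidget all compile; without them only the last one would.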

Keywords: C++ memory management

Added by brandone on Mon, 31 Jan 2022 08:53:39 +0200