C + +: copy on write technology and Simulation Implementation of string class

preface

Depth copy (depth assignment)

In previous articles C + +: Six default member functions and deep and shallow copies of the class , we have discussed the problem of deep and shallow copy in depth. If you don't know the of deep and shallow copy, you can take a look at this article first.

Here we directly give the definition of deep and shallow copy.

Shallow copy:

Also known as bit copy, the compiler just copies the values in the object. If a resource is managed in an object, it will eventually lead to multiple objects sharing the same resource. When an object is destroyed, the resource will be released. At this time, other objects do not know that the resource has been released and think it is still valid. Therefore, when continuing to operate on the resource, an access violation will occur.

If a class involves resource management, its copy constructor, assignment operator overload and destructor must be displayed, which are generally provided in the way of deep copy.

Deep copy:

Allocate resources to each object independently to ensure that multiple objects will not be released multiple times due to sharing resources, resulting in program crash.

1. Some pits that may exist in deep copy( 🕳)

Let's simulate the implementation of a string class. Since its member variable is a pointer to char *, we need to use deep copy technology when implementing its copy constructor and assigning constructor.

The code is as follows:

#include <iostream>
#include <cstring>

using namespace std;

namespace mytest{
    class string{
        friend ostream& operator<<(ostream& out,const string& s);
        private:
            char* _arr;
        public:
            explicit string(): _arr(nullptr)
            {}
            string(const char* s = "")
            {
                _arr = new char[strlen(s)+1];
                strcpy(_arr,s);
            }

            string(const string& s)
            {
                _arr = new char[strlen(s._arr)+1];
                strcpy(_arr,s._arr);
            }

            string& operator=(const string& s)
            {
                if(this != &s)
                {
                    delete []_arr;
                    _arr = new char[strlen(s._arr)+1];
                    strcpy(_arr,s._arr);
                }
                return *this;
            }

            ~string()
            {
                delete[] _arr;
                _arr = nullptr;
            }


    };

     ostream& operator<<(ostream& out,const string& s)
     {
         out << s._arr;
         return out;
     }
}

This code has no problem at first glance. Most people may implement deep copy in the same way as me, and think it is no problem. However, there are still some abnormally unsafe problems in this code.

The so-called unsafe exception means that when an exception occurs in the program, the original pointer, space, memory and other states are unsafe or modified. So exception security can be said that even if an exception occurs, it will not disclose resources or allow any data structure damage, and the fallback of the program is clean.

Extension: thread safety: it means that multiple execution streams will not cause ambiguity in program results when accessing the same critical resource.

Then let's take a look at our code:

string& operator=(const string& s)
{    
    if(this != &s)                    
    {                    
        delete []_arr;                                        
        _arr = new char[strlen(s._arr)+1];           
        strcpy(_arr,s._arr);    
        }            
    return *this;
}

Suppose that the space on the heap of the current program is full, and there is no way to allocate more space to_ The ARR variable. At this time, our program will make an error. When the application for space fails, the new function will throw an exception, indicating that the application fails, but at this time, we have_ The original space of arr has been released_ Arr will become a wild pointer, which will cause the problem of abnormal insecurity of the program. When a new space fails, the fallback of the program is not clean.

So how to improve? There are two ways.

Improvement 1: take a temporary char * variable to apply for space. If the application is successful, assign the address of the space to_ arr pointer.

string& operator=(const string& s)
{   
    if(this != &s)
    {  
    	char* tmp = new char[strlen(s._arr)+1];  
        delete[] _arr; 
         _arr = tmp; 
        strcpy(_arr,s._arr); 
	} 
    return *this;
}

Improvement 2: create a temporary object with S_ Arr initializes the object, and then exchanges the current object's_ Of arr and temporary objects_ arr.

string& operator=(const string& s)    
{    
    if(this != &s)    
    {    
        string tmp(s);    
        std::swap(_arr,tmp._arr);
	}       
	return *this;
}

Comparing improvement 1 with improvement 2, we can find that improvement 2 is better! Because it calls the copy constructor to directly apply for a space, exchanges the points of their pointers through the swap function, and it is a temporary variable. When the function runs, it will automatically call its own destructor for deconstruction, so we don't need to worry about its memory release.

Code improvement:

#include <iostream>
#include <cstring>
#include <algorithm>

using namespace std;

namespace mytest{
    class string{
        friend ostream& operator<<(ostream& out,const string& s);
        private:
            char* _arr;
        public:
            explicit string(): _arr(nullptr)
            {}
            string(const char* s = "")
            {
                _arr = new char[strlen(s)+1];
                strcpy(_arr,s);
            }

            //The program is abnormally safe, because if there is insufficient space, the program will throw an exception, and the fallback process is clean
            string(const string& s):_arr(nullptr)
            {
                //Note that you must use a string here to receive
                //If char * TMP = s_ If arr writes it, it will cause Double Free problem
                //Because tmp points to s_ The space of arr, when s is released,
                //When the current object is released again, the_ arr that space is repeatedly released.
                string tmp(s._arr);
                std::swap(_arr,tmp._arr);
            }

            string& operator=(const string& s)
            {
                if(this != &s)
                {
                    //There are two advantages to receiving with a temporary variable
                    //1.tmp is initialized by calling the copy constructor, which is abnormally safe
                    //2. Because tmp is a temporary variable, after the assignment is overloaded,
                    //It will automatically call the destructor for deconstruction. We don't have to separate the original_ The arr space is released,
                    //Then they exchanged, which is killing three birds with one stone.
                    string tmp(s);
                    std::swap(_arr,tmp._arr);
                }
                return *this;
            }

            ~string()
            {
                if(_arr)
                {
                    delete[] _arr;
                    _arr = nullptr;
                }
            }


    };

     ostream& operator<<(ostream& out,const string& s)
     {
         out << s._arr;
         return out;
     }
}

2. Copy on write technology

We also talked about deep and shallow copy earlier, so the question is, is it better to use deep copy or shallow copy?

At this time, some people may say that when there are variables similar to pointers in the member variables of a class, deep copy is required during implementation. On the contrary, shallow copy is used when there is no such variable.

This answer can be no problem in a big way, but what if there is such a situation? Take the string class as an example. If my string class is implemented by deep copy, what if my operation on string is just an access operation? The object I defined will not call the copy constructor or assignment overloaded function, so it will not involve the problem of deep copy. At this time, will it feel overqualified to implement this class with deep copy? Because shallow copy can fully meet my requirements.

If you only use shallow copy, in case you encounter this form that needs to be copied and constructed, don't you have to re-establish a string class? Obviously, this is impossible. So what is the best thing to do?

At this time, there is a technology, cpoy on write technology, commonly known as copy only when writing technology. It is the product of the "lazy behavior" in the programming world - the procrastination strategy. For example, every time we write data to a file, we call the fwirte function, but it will not be written to the file immediately. Instead, it will be written to a piece of memory of a specific size (disk cache). It will be written to the disk only when we close the file (that's why if the file is not closed, the written things will be lost).

2.1 principle

Reference counting is the principle of copying only when writing in string class. When string class instantiates different objects, these objects will trigger the mechanism of copying only when writing when they are operating in shared memory.

When will shared memory be implemented? Generally speaking, there are two cases: one is to call other classes to construct itself (trigger copy constructor), and the other is to assign values with other classes (trigger assignment overloaded function).

When will the shared memory be operated? It refers to our usual operations such as + =, =, +, []. At this time, different objects will operate on the same shared memory. At this time, we need to trigger our write time copy technology, otherwise it will become shallow copy, causing the problem of Double Free.

So how does reference counting manage our shared memory?

2.2 management of shared memory by reference counting

Reference count: used to record the number of resource users. During construction, the count of resources is set to 1. Every time an object uses the resource, the count is increased by 1. When an object is destroyed, the count is first reduced by 1, and then check whether the resource needs to be released. If the count is 1, it indicates that the object is the last user of the resource and releases the resource; Otherwise, it cannot be released because there are other objects using the resource.

Suppose there is an object of string s1("abc"), when it is initialized, we can set its reference counter to 1, indicating that an object is in use in the current "abc" space. When we use string s2(s1) to copy and construct it, we only need to add 1 to the reference counter pointing to the "abc" space.

Then, during object deconstruction, we only need to subtract 1 from the reference counter of the space it points to. When the reference counter is reduced to 0, the object that last performed the subtraction operation will be responsible for recycling this space. To avoid memory leakage.

2.3 code implementation of copying only when writing

In order to ensure that different objects instantiated by string class have reference counters in different spaces, we need to encapsulate the reference counter into a class and use this class as the member variable of string class.

The code is as follows:

#include <iostream>
#include <cstring>
#include <algorithm>

using namespace std;

namespace mytest{
    class string;
    class stringRefC
    {
        friend ostream& operator<<(ostream& out,const stringRefC& s);
        friend class string;
        private:
        int ref_count;
        char* _arr;
        public:
        stringRefC(const char* s = "") : ref_count(0)
        {
            _arr = new char[strlen(s)+1];
            strcpy(_arr,s);
        }
        stringRefC(const stringRefC& s);


        void increaseCount()
        {
            ++ref_count;
        }

        void decraseCount()
        {
            if(--ref_count == 0)
            {
                //If the counter is 0, it calls its own destructor
                delete this;
            }
        }
        
        int use_count()const
        {
            return ref_count;
        }

        ~stringRefC()
        {
            if(ref_count == 0)
            {
                delete []_arr;
                _arr = nullptr;
            }
        }


    };
    class string{
        friend class stringRefC;
        friend ostream& operator<<(ostream& out,const string& s);
        private:
        stringRefC* _ref; 
        public:
        explicit string(): _ref(nullptr)
        {}
        string(const char* s= ""):_ref(new stringRefC(s))
        {
            _ref->increaseCount();
        }
        string(const string& s) : _ref(s._ref)
        {
            _ref->increaseCount();
        }
        string& operator=(const string& s)
        {
            if(this != &s)
            {
                _ref->decraseCount();
                _ref = s._ref;
                _ref->increaseCount();
            }
            return *this;
        }

        ~string()
        {
            _ref->decraseCount();
        }
    };
    
     ostream& operator<<(ostream& out,const string& s)
       {
           out << s._ref << ",ref_count = " << s._ref->use_count();
           return out;
       }
};

void test()
{
    mytest::string s("abc");
    cout << "s:" << s << endl;
    mytest::string r("def");
    cout << "r:" << r << endl;

    cout << "copyonwrite:r = s" << endl;
    r = s;
    cout << "r:" << r << endl;

    cout << "copyonwrite: string k(r)" << endl;
    mytest::string k(r);
    cout << "k:" << k << endl;
}

int main()
{
    test();
    return 0;
}

Operation results:

For more details about copying only when writing, you can view: COPY-ON-WRITE technology of C++ STL STRING.

3. Simulation implementation of string class

3.1 introduction to string class

String is a string class that represents a string
The interface of this class is basically the same as that of the regular container, and some regular operations specially used to operate string are added.
String at the bottom is actually: basic_ Alias of string template class, typedef basic_string<char, char_ traits, allocator> string;
Sequences of multibyte or variable length characters cannot be manipulated
Note that this class processes bytes independently of the encoding used: if it is used to process sequences of multibyte or variable length characters (such as UTF-8), all members of this class (such as length or size) and its iterators will still operate according to bytes (rather than actually encoded characters).

The underlying implementation principles of size() and length() methods are exactly the same. The reason why size() is introduced is to keep consistent with the interfaces of other containers. Generally, size() is basically used.
clear() just clears the valid characters in the string without changing the size of the underlying space.
Both resize(size_t n) and resize(size_t n, char c) change the number of valid characters in the string to N. the difference is that when the number of characters increases: resize(n) fills the extra element space with 0, and resize(size_t n, char c) fills the extra element space with C. Note: when resizing the number of elements, increasing the number of elements may change the size of the underlying capacity. If reducing the number of elements, the total size of the underlying space will remain unchanged.
reserve(size_t res_arg=0): reserve space for string without changing the number of valid elements. When the parameter of reserve is less than
When the total size of the underlying space of string is, the reserver will not change the capacity.

More detailed view string document introduction.

3.2 simulation code implementation

#include <iostream>
#include <assert.h>
#include <cstring>
#include <algorithm>

using namespace std;


namespace mytest{
    class string
    {
        friend ostream& operator<<(ostream& out,const string& s);
        public:
        explicit string() : _arr(nullptr),_size(0),_capacity(0)
        {}
        string(const char* s) : _arr(nullptr),_size(0),_capacity(0)
        {
            _size = strlen(s);
            _capacity = _size;
            _arr = new char[_size];
            strcpy(_arr,s);
        }
        string(const string& s):_arr(nullptr),_size(0),_capacity(0)
        {
            string tmp(s._arr);
            swap(tmp);
        }
        string& operator=(const string& s)
        {
            if(this != &s)
            {
                string tmp(s._arr);
                swap(tmp);
            }
            return *this;
        }
        ~string()
        {
            if(_arr)
            {
                delete []_arr;
                _arr = nullptr;
            }
        }
        public:
        typedef char* iterator;
        //iterator
        iterator begin() const
        {
            return _arr;
        }
        iterator end() const
        {
            return _arr+_size;
        }

        iterator insert(iterator pos,char c)
        {
            if(_size == _capacity)
            {
                size_t oldpos = pos - begin();
                size_t newcapacity = _capacity == 0? 1 : 2*_capacity;
                //Expand capacity first
                resever(newcapacity);

                pos = begin() + oldpos;
            }
            iterator it = end();
            while(it > pos)
            {
                *it = *(it-1);
                --it;
            }
            *pos = c;
            _size++;
            _arr[_size] = '\0';
            return pos;
        }
        string& insert(size_t pos,char c)
        {
            if(_size == _capacity)
            {
                //Capacity expansion
                size_t newcapacity = _capacity == 0 ? 1 :  2*_capacity;
                resever(newcapacity);
            }
            size_t sz = size();
            while(sz > pos)
            {
                _arr[sz] = _arr[sz-1];
                --sz;
            }
            _arr[pos] = c;
            ++_size;
            _arr[_size] = '\0';
            return *this;
        }

        string& insert(size_t pos, const char* str)
        {
            size_t len = strlen(str);
            if(_size == _capacity)
            {
                size_t newcapacity = _capacity + len;
                resever(newcapacity);
            }

            size_t sz = _size - 1;
            while(sz >= pos)
            {
                _arr[sz+len-1] = _arr[sz];
                --sz;
            }

            for(size_t i = 0; i < len && pos <= sz;++i)
            {
                _arr[pos++] = str[i];
            }
            _size = _size + len;
            _arr[_size] = '\0';
            return *this;
        }

        static const size_t npos = -1;
        string& erase(size_t pos=0 ,size_t len=npos)
        {
            size_t end = pos + len;
            if(len == npos)
                end = size();
            while(end < size())
            {
                _arr[pos++] = _arr[end++];
            }
            if(len== npos)
                _size = pos;
            else 
                _size = _size - len;
            _arr[_size] = '\0';
            return *this; 
        }

        //capacity
        size_t size() const
        {
            return _size;
        }
        size_t capacity() const
        {
            return _capacity;
        }
        size_t length()const
        {
            return _size;
        }
        void resever(size_t newcapacity)
        {
            if(_capacity < newcapacity)
            {
                //Capacity expansion
                char* tmp = new char[newcapacity];
                strcpy(tmp,_arr);
                std::swap(tmp,_arr);
                _capacity = newcapacity;
                delete []tmp;
            }
        }

        void resize(size_t newsize,char t = ' ')
        {
            if(newsize > _capacity)
            {
                //Capacity expansion
                resever(newsize);
                memset(_arr+_size,t,newsize - _size);
            }

            _arr[newsize] = '\0';
            _size = newsize;
        }


        public:
        //modify
        void clear()
        {
            erase();
        }
        void push_back(char c)
        {
            insert(end(),c);
        }

        string& operator+=(char c)
        {
            push_back(c);
            return *this;
        }
        void swap(string& s)
        {
            std::swap(_arr,s._arr);
            std::swap(_size,s._size);
            std::swap(_capacity,s._capacity);
        }
        void append(const char* str)
        {
            size_t len = strlen(str);

            if(_size + len > _capacity)
            {
                size_t newcapacity = _size + len;
                resever(newcapacity);
            }

            size_t i = 0;
            while(i < len)
            {
                _arr[_size++] = str[i]; 
                ++i;
            }
            _arr[_size] = '\0';
        }
        string& operator+=(const char* str)
        {
            append(str);
            return *this;
        }
        const char* c_str() const
        {
            return _arr;
        }
        //operator
        char& operator[](size_t index)
        {
            assert(index < _size);
            return *(_arr+index);
        }

        size_t find (char c, size_t pos = 0) const
        {
            while(pos < _size)
            {
                if(_arr[pos] == c)
                    return pos;
                pos++;
            }
            return npos;
        }

        private:
        char* _arr;
        size_t _size;
        size_t _capacity;

    };

    ostream& operator<<(ostream& out,const string& s)
    {
        for(const auto &e:s)
            out << e;
        return out;

    }
};


void TestMystring()
{
    mytest::string s1("hello");
    s1.push_back(' ');
    s1.push_back('b');
    //s1.append(1, 'i');
    //s1.append("acs");
    s1 += "tce";
    s1 += 't';
    //s1.clear();
    cout << s1 << endl;
    cout << s1.size() << endl;
    cout << s1.capacity() << endl;
    // Using iterators to print elements in string s
    mytest::string::iterator it = s1.begin();
    while (it != s1.end())
    {
        cout << *it<<" ";
        ++it;
    }
    cout << endl;

    // Here you can see that a class supports the scope for as long as it supports the basic iterator
    for(auto ch : s1)
        cout<<ch<<" ";
    cout<<endl;
}


int main()
{
    TestMystring();
    return 0;
}

Keywords: C++

Added by philwong on Thu, 10 Feb 2022 12:12:59 +0200

Programming VIP