The deep copy bug that bothered me for 48 hours, finally solved today

We received feedback from community users hoping that MMClassification would support k-fold cross-validation. The developers picked up the task right away and planned to support the feature within 24 hours. During development, however, a problem came up: the Config object produced by deep copy had no dump method. Printing the type of the object showed that the deep-copied object was not of type Config. So there could be only one truth: something was wrong with the deep copy. The following example illustrates the problem:

# https://github.com/open-mmlab/mmcv/blob/v1.4.5/mmcv/utils/config.py 
>>> from mmcv import Config 
>>> from copy import deepcopy 
>>> cfg = Config.fromfile("./tests/data/config/a.py") 
>>> new_cfg = deepcopy(cfg) 
>>> type(cfg) == type(new_cfg) 
False 
>>> type(cfg), type(new_cfg) 
(mmcv.utils.config.Config, mmcv.utils.config.ConfigDict) 

As you can see, new_cfg, the object generated by deep copy, is of type mmcv.utils.config.ConfigDict instead of the expected mmcv.utils.config.Config. In the end, the problem was solved and the new feature shipped successfully. I have heard a lot of feedback about deep copy problems before, so I am sharing the whole troubleshooting process here, hoping it helps you understand deep copy. To solve a deep copy problem, we must first understand what deep copy is and how it differs from shallow copy.

Shallow copy vs deep copy

When the copied object is immutable, such as a string or a tuple that contains only immutable elements, there is no difference between shallow copy and deep copy. Both return the original object itself; that is, no copy occurs.

>>> import copy 
>>> a = (1, 2, 3)  # The elements of tuples are immutable objects 
>>> b = copy.copy(a)  # Shallow copy 
>>> c = copy.deepcopy(a)  # Deep copy 
>>> id(a), id(b), id(c)  # View memory address 
(140093083446128, 140093083446128, 140093083446128) 

In the tuple example above, a, b and c have the same address, so no copy took place and all three names refer to the same object.

When the copied object is mutable, such as a dictionary or a list, shallow copy and deep copy behave differently. Shallow copy creates a new object and then copies the references held by the original object. Deep copy, by contrast, creates a new object and then recursively copies the values in the original object. (A tuple that contains mutable elements is a special case: the tuple itself is immutable, so shallow copy still returns the same tuple, and only deep copy builds a new one.) The following example shows that, for a list, both shallow copy and deep copy create a new object.

>>> import copy 
>>> a = [1, 2, 3] 
>>> b = copy.copy(a) 
>>> c = copy.deepcopy(a) 
>>> id(a), id(b), id(c) 
(140093084981120, 140093585550464, 140093085038592) 

In the list example above, a, b and c have different addresses and do not refer to the same object; that is, both shallow copy and deep copy created a new object. However, if a contains mutable objects, modifying those objects through a also affects b, but not c. The following example copies an object that contains a mutable object.

>>> import copy 
>>> a = [1, 2, [3, 4]] 
>>> b = copy.copy(a) 
>>> c = copy.deepcopy(a) 
>>> id(a), id(b), id(c) 
(140093082172288, 140093090759296, 140093081717760) 
>>> id(a[2]), id(b[2]), id(c[2]) 
(140093087982272, 140093087982272, 140093084980288)  # You can see that a[2] and b[2] point to the same object 
>>> a[2].append(5) 
>>> a, b, c 
([1, 2, [3, 4, 5]], [1, 2, [3, 4, 5]], [1, 2, [3, 4]]) 
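The same behaviour can be checked for a tuple that holds a mutable element. The sketch below is only an illustration with made-up values; it reuses the names a, b and c from the examples above. Because the outer tuple is immutable, the shallow copy is the original tuple itself, while the deep copy is a new tuple with its own nested list.

```python
import copy

# Illustrative values only: an immutable tuple holding a mutable list.
a = (1, 2, [3, 4])
b = copy.copy(a)      # shallow copy
c = copy.deepcopy(a)  # deep copy

# A tuple cannot change, so the shallow copy is the very same object.
print(b is a)  # True
# The deep copy is a new tuple, because the nested list had to be copied.
print(c is a, c[2] is a[2])  # False False

# Modify the mutable object inside a.
a[2].append(5)
print(a, b, c)  # (1, 2, [3, 4, 5]) (1, 2, [3, 4, 5]) (1, 2, [3, 4])
```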

As the example above shows, modifying the mutable object inside a also changes b, the object produced by shallow copy, while c, the object produced by deep copy, is unaffected.

How the problem arises

Now that the difference between shallow copy and deep copy is clear, let's return to the focus of this article. Why can't a Config object be deep-copied correctly? The answer is that Config does not implement the __deepcopy__ magic method. Does that mean every class that does not implement __deepcopy__ ends up with a mismatched type after deep copy? Let's start with an example.

>>> from copy import deepcopy 
>>> class HelloWorld:
...     def __init__(self):
...         self.attr1 = 'attribute1'
...         self.attr2 = 'attribute2'
...
>>> hello_world = HelloWorld() 
>>> new_hello_world = deepcopy(hello_world) 
>>> type(hello_world), type(new_hello_world) 
(__main__.HelloWorld, __main__.HelloWorld) 

As shown above, the deep-copied object new_hello_world has the same type as the original hello_world. This made me wonder: neither Config nor HelloWorld provides a __deepcopy__ method, so why does the type change after deep copy for the former but not for the latter? To find the reason, we need to read the source code of the copy module. The following is an abridged excerpt of the deep copy source code from the copy module.

# https://github.com/python/cpython/blob/3.10/Lib/copy.py#L128 
# _deepcopy_dispatch is a dictionary that records the deep copy function for each built-in type
_deepcopy_dispatch = d = {} 
 
def _deepcopy_atomic(x, memo): 
    return x 
 
# For immutable objects, the copied object is returned directly 
d[int] = _deepcopy_atomic 
d[float] = _deepcopy_atomic 
d[str] = _deepcopy_atomic 
 
# For mutable objects, an empty object is created first, and then the elements in the object are copied deeply 
def _deepcopy_list(x, memo, deepcopy=deepcopy): 
    y = [] 
    memo[id(x)] = y 
    append = y.append 
    for a in x: 
        append(deepcopy(a, memo)) 
    return y 
 
d[list] = _deepcopy_list 
 
def deepcopy(x, memo=None, _nil=[]): 
    """Deep copy operation on arbitrary Python objects. 
 
    See the module's __doc__ string for more info. 
    """ 
 
    if memo is None: 
        memo = {} 
 
    # If object x has been copied, the copied object y is returned 
    # Avoid circular recursive copies 
    d = id(x) 
    y = memo.get(d, _nil) 
    if y is not _nil: 
        return y 
 
    # Look up the type of x; if it is a built-in type, call the corresponding deep copy function
    cls = type(x) 
    copier = _deepcopy_dispatch.get(cls) 
    if copier is not None: 
        y = copier(x, memo) 
    else: 
        if issubclass(cls, type): 
            y = _deepcopy_atomic(x, memo) 
        else: 
            # If object x has a __deepcopy__ method, call it to perform the deep copy
            copier = getattr(x, "__deepcopy__", None) 
            if copier is not None: 
                y = copier(memo) 
            else: 
                # https://github.com/python/cpython/blob/3.10/Lib/copyreg.py 
                reductor = dispatch_table.get(cls) 
                if reductor: 
                    rv = reductor(x) 
                else: 
                    # __reduce_ex__ and __reduce__ are used for serialization
                    # They return a string or a tuple
                    # https://docs.python.org/3/library/pickle.html#object.__reduce__ 
                    reductor = getattr(x, "__reduce_ex__", None) 
                    if reductor is not None: 
                        rv = reductor(4) 
                    else: 
                        reductor = getattr(x, "__reduce__", None) 
                        if reductor: 
                            rv = reductor() 
                        else: 
                            raise Error( 
                                "un(deep)copyable object of type %s" % cls) 
                if isinstance(rv, str): 
                    y = x 
                else: 
                    # When rv is a tuple, call _reconstruct to create the object
                    y = _reconstruct(x, memo, *rv) 
 
    # If is its own copy, don't memoize. 
    if y is not x: 
        memo[d] = y 
        _keep_alive(x, memo) # Make sure x lives at least as long as d 
    return y 
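For a class that defines no __deepcopy__, the excerpt falls through to the __reduce_ex__ branch. The sketch below gives a rough idea of what that branch does for a plain class like HelloWorld. Note that rebuild is a hypothetical, simplified stand-in for copy._reconstruct (the real function also handles slots, list items and dict items); it is not CPython's code.

```python
import copy
import copyreg


class HelloWorld:
    def __init__(self):
        self.attr1 = 'attribute1'
        self.attr2 = 'attribute2'


def rebuild(rv, memo):
    """Hypothetical, simplified stand-in for copy._reconstruct."""
    func, args, state = rv[0], rv[1], rv[2]
    # func is copyreg.__newobj__, which calls HelloWorld.__new__(HelloWorld)
    # and therefore creates an empty instance of the original class.
    new_obj = func(*args)
    # Deep-copy the attribute dictionary into the new instance.
    new_obj.__dict__.update(copy.deepcopy(state, memo))
    return new_obj


hello_world = HelloWorld()
rv = hello_world.__reduce_ex__(4)

print(isinstance(rv, tuple))         # True
print(rv[0] is copyreg.__newobj__)   # True
print(rv[1] == (HelloWorld,))        # True

new_hello_world = rebuild(rv, {})
print(type(new_hello_world) is HelloWorld)  # True
```

Because the class itself is part of the reduce tuple, the rebuilt object always has the same type as the original, which is why the type is preserved on this path.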

For the HelloWorld object hello_world, copy.deepcopy(hello_world) first calls __reduce_ex__ to serialize the object and then calls _reconstruct to create the new object. For the Config object cfg, copy.deepcopy(cfg) would ideally call Config's __deepcopy__ method to copy the object. However, Config does not implement that method, so for getattr(x, "__deepcopy__", None) (line 50 of the copy.py excerpt above) the normal attribute lookup fails and Python falls back to Config's __getattr__(self, name) method, which returns the __deepcopy__ method of _cfg_dict (of type ConfigDict). As a result, the type of new_cfg = copy.deepcopy(cfg) is ConfigDict.

# https://github.com/open-mmlab/mmcv/blob/v1.4.4/mmcv/utils/config.py 
class Config: 
 
    def __getattr__(self, name): 
        return getattr(self._cfg_dict, name) 
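The delegation effect can be reproduced without MMCV. The following is a minimal sketch under stated assumptions: InnerDict and Wrapper are hypothetical stand-ins for ConfigDict and Config (before the fix), not the real MMCV classes. It only shows how a forwarding __getattr__ hands deep copy the inner object's __deepcopy__ method.

```python
from copy import deepcopy


class InnerDict(dict):
    """Hypothetical stand-in for ConfigDict: a dict with its own __deepcopy__."""

    def __deepcopy__(self, memo):
        other = InnerDict()
        memo[id(self)] = other
        for key, value in self.items():
            other[key] = deepcopy(value, memo)
        return other


class Wrapper:
    """Hypothetical stand-in for Config before the fix."""

    def __init__(self, data):
        self._inner = InnerDict(data)

    def __getattr__(self, name):
        # Only called when normal lookup fails, e.g. for '__deepcopy__'.
        return getattr(self._inner, name)


wrapper = Wrapper({'lr': 0.1, 'steps': [8, 11]})

# This is the lookup that copy.deepcopy performs internally.
copier = getattr(wrapper, '__deepcopy__', None)
print(copier.__self__ is wrapper._inner)  # True: the method belongs to the inner dict

new_wrapper = deepcopy(wrapper)
print(type(new_wrapper).__name__)  # InnerDict
```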

Solving the problem

To avoid calling the __deepcopy__ method of _cfg_dict, we need to add a __deepcopy__ method to Config. That way, copier = getattr(x, "__deepcopy__", None) finds Config's own __deepcopy__ method, which performs the deep copy of the object.

# https://github.com/open-mmlab/mmcv/blob/master/mmcv/utils/config.py
import copy

class Config:

    def __deepcopy__(self, memo):
        cls = self.__class__
        # Use __new__ to create an empty object without calling __init__
        other = cls.__new__(cls)
        # Register other in memo so that circular references reuse it
        # instead of being copied again
        # More about memo: https://pymotw.com/3/copy/
        memo[id(self)] = other

        # Initialize the new object by deep-copying every attribute
        for key, value in self.__dict__.items():
            super(Config, other).__setattr__(key, copy.deepcopy(value, memo))

        return other
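The same pattern can be tried on a small class outside MMCV. This is a sketch under stated assumptions: Wrapper is a hypothetical stand-in for Config, and a plain dict stands in for the inner ConfigDict. It checks that the copy keeps the wrapper type and that registering the new object in memo lets a circular reference resolve to the copy.

```python
import copy


class Wrapper:
    """Hypothetical stand-in for Config after the fix."""

    def __init__(self, data):
        self._inner = dict(data)

    def __getattr__(self, name):
        # Still forwards unknown attributes to the inner dict.
        return getattr(self._inner, name)

    def __deepcopy__(self, memo):
        cls = self.__class__
        other = cls.__new__(cls)  # empty instance, __init__ is not called
        memo[id(self)] = other    # register before copying the attributes
        for key, value in self.__dict__.items():
            object.__setattr__(other, key, copy.deepcopy(value, memo))
        return other


wrapper = Wrapper({'lr': 0.1, 'steps': [8, 11]})
wrapper._inner['owner'] = wrapper  # circular reference back to the wrapper

new_wrapper = copy.deepcopy(wrapper)
print(type(new_wrapper) is Wrapper)                 # True
print(new_wrapper._inner is wrapper._inner)         # False
print(new_wrapper._inner['owner'] is new_wrapper)   # True: the cycle points at the copy
```

Because __deepcopy__ is now defined on the class, normal attribute lookup finds it and __getattr__ is never consulted for that name.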

The developer submitted a PR to MMCV (https://github.com/open-mmlab/mmcv/pull/1658), and the problem was finally solved. The following example comes from the PR description.

  • Before the PR was merged (MMCV version <= 1.4.5)
>>> from mmcv import Config 
>>> from copy import deepcopy 
>>> cfg = Config.fromfile("./tests/data/config/a.py") 
>>> new_cfg = deepcopy(cfg) 
>>> type(cfg) == type(new_cfg) 
False 
>>> type(cfg), type(new_cfg) 
(mmcv.utils.config.Config, mmcv.utils.config.ConfigDict) 

As you can see, the Config object copied with copy.deepcopy becomes a ConfigDict, which is not what we expect.

  • After the PR was merged (MMCV version > 1.4.5)
>>> from mmcv import Config 
>>> from copy import deepcopy 
>>> cfg = Config.fromfile("./tests/data/config/a.py") 
>>> new_cfg = deepcopy(cfg) 
>>> type(cfg) == type(new_cfg) 
True 
>>> type(cfg), type(new_cfg) 
(mmcv.utils.config.Config, mmcv.utils.config.Config) 
>>> print(cfg._cfg_dict == new_cfg._cfg_dict) 
True 
>>> print(cfg._cfg_dict is new_cfg._cfg_dict) 
False 

After the PR was merged, the copied Config object is as expected.

Keywords: Machine Learning

Added by sametch on Tue, 22 Feb 2022 12:41:25 +0200