[Medium Python] fourth sentence: what is the relationship between class attributes and instance attributes?

Preface

No discussion of programming is complete without object-oriented programming. It is a well-worn topic: people have argued about its merits for decades. From a macro point of view, a large program is essentially a simulation of the entities in a business domain and the relationships between them. Relationships between entities can be handled conveniently with procedural code and composition, but abstracting the entities themselves still rests on object-oriented foundations. Object-oriented programming is therefore an important part of how we think about programs, and it cannot be casually dismissed or ignored.

The heart of object orientation is the concept of the class, and Python supports classes too. Python does not impose the strong constraints that statically typed languages do, but it is fully capable if you want to organize your code in an object-oriented way. Questions about classes come up often in everyday Python discussions and in interviews; one of the most common is to explain the relationship between classes and instances. The most intuitive way into that relationship is through class attributes and instance attributes, so this article covers a few things about class attributes and instance attributes in Python.

Accessing attributes in classes and instances

First, the test code:

def print_seg(msg):
    print('\n'.join(['=' * 40, str(msg), '=' * 40 ]))


class HeadmasterDescriptor:
    headmaster: str = 'Dumbledore'

    def __get__(self, obj, cls):
        return '<HeadmasterDescriptor %s>' % self.headmaster

    def __set__(self, obj, value):
        print('<HeadmasterDescriptor> set new headmaster: %s' % value)
        self.headmaster = str(value)


class Student:
    headmaster = HeadmasterDescriptor()
    students = set()
    teacher = 'Snape'

    def __init__(self, name: str, gender: str, age: int):
        assert gender in ['male', 'female']
        self.name = name
        self.age = age
        self.gender = gender

        # add to students
        self.students.add(name)

    def __del__(self):
        print('<Student> remove student %s' % self.name)
        self.students.remove(self.name)


def main():
    # ============================== test #1 start ==================================
    student_1 = Student(name='conan', gender='male', age=18)
    student_2 = Student(name='saki', gender='female', age=15)
    print_seg('test #1 start')

    # students
    print('[student-1] students: %s' % student_1.students)
    print('[student-2] students: %s' % student_2.students)
    print('[Student] students: %s' % Student.students)

    # dir
    print('[student-1] dir: %s' % dir(student_1))
    print('[Student] dir: %s' % dir(student_1))

    # instance attributes
    print('[student-1] name: %s' % student_1.name)
    print('[student-2] gender: %s' % student_2.gender)
    print('[student-2] age: %s' % getattr(student_2, 'age'))

    # headmaster
    print('[student-1] old headmaster: %s' % getattr(student_1, 'headmaster'))
    print('[student-2] old headmaster: %s' % student_2.headmaster)
    print('[Student] new headmaster: %s' % Student.headmaster)
    print('%s, %s' % (id(student_2.headmaster), id(Student.headmaster)))
    student_1.headmaster = 'Alan Tam'
    print('[student-1] new headmaster: %s' % student_1.headmaster)
    print('[student-2] new headmaster: %s' % student_2.headmaster)
    print('[Student] new headmaster: %s' % getattr(Student, 'headmaster'))

    # remove student
    del student_1
    print('[student-2] students: %s' % student_2.students)
    print('[Student] students: %s' % Student.students)

    # set teacher
    print('[student-2] teacher: %s' % student_2.teacher)
    print('[Student] teacher: %s' % Student.teacher)
    student_2.teacher = 'Jodie'
    print('[student-2] teacher: %s' % student_2.teacher)
    print('[Student] teacher: %s' % Student.teacher)

    print_seg('test#1 end')
    # ============================== test #1 end ==================================


if __name__ == '__main__':
    main()

This code sets up the following scenario:

  • First, create two students. During creation, each student's name is added to the set students
  • Print the attributes and methods of the instance student_1 with dir (the second dir call in the code also passes student_1 rather than Student, so the two listings are identical), then access instance attributes with dot notation or getattr
  • Replace the headmaster
  • Delete student_1
  • Replace the teacher

We see the printed result as follows:

========================================
test #1 start
========================================
[student-1] students: {'saki', 'conan'}
[student-2] students: {'saki', 'conan'}
[Student] students: {'saki', 'conan'}
[student-1] dir: ['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'age', 'gender', 'headmaster', 'name', 'students', 'teacher']
[Student] dir: ['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'age', 'gender', 'headmaster', 'name', 'students', 'teacher']
[student-1] name: conan
[student-2] gender: female
[student-2] age: 15
[student-1] old headmaster: <HeadmasterDescriptor Dumbledore>
[student-2] old headmaster: <HeadmasterDescriptor Dumbledore>
[Student] new headmaster: <HeadmasterDescriptor Dumbledore>
1999801558000, 1999801558000
<HeadmasterDescriptor> set new headmaster: Alan Tam
[student-1] new headmaster: <HeadmasterDescriptor Alan Tam>
[student-2] new headmaster: <HeadmasterDescriptor Alan Tam>
[Student] new headmaster: <HeadmasterDescriptor Alan Tam>
<Student> remove student conan
[student-2] students: {'saki'}
[Student] students: {'saki'}
[student-2] teacher: Snape
[Student] teacher: Snape
[student-2] teacher: Jodie
[Student] teacher: Snape
========================================
test#1 end
========================================
<Student> remove student saki

You can see:

  • When the two students are created, their names are added to the class attribute students. Accessing students through the class or through either instance gives the same result.
  • Attributes of classes and instances can be read either with getattr or with dot notation.
  • Replacing the headmaster:
    • headmaster is a descriptor with a getter and a setter.
    • A descriptor can be understood simply as a special attribute whose behavior depends on how it is accessed. Reading headmaster calls the descriptor's __get__ method. When a new value is assigned to headmaster through an instance and the class attribute is a descriptor with __set__, that method is triggered instead, and it changes the value held inside the descriptor stored in the class attribute headmaster. That is why the change is also visible from student_2 and from Student.
    • Otherwise, if the class attribute is not a descriptor, assigning through an instance only changes the value held by that instance, as with teacher below.
  • Deleting student_1 triggers the __del__ method defined in the class (in CPython this happens immediately because it was the last reference to the object), and student_1's name is removed from students.
  • Replacing the teacher on student_2 only changes that instance's own attribute. The class attribute keeps its original value.

In behavior, a class is more like a template for its instances. Each instance is like a fork of the class that then adds its own attributes in __init__. When an attribute is accessed on an instance and it was not set on the instance (for example in __init__), Python next looks it up on the instance's class. If it is still not found, the search continues in the parent classes, which brings in the topic of (multiple) inheritance.
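The fallback from instance to class can be observed directly through __dict__. The following is a minimal sketch using a cut-down Student class (not the full test class above), showing that assignment through an instance only writes to that instance's own __dict__:

```python
class Student:
    teacher = 'Snape'          # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name       # instance attribute, stored in self.__dict__


a = Student('conan')
b = Student('saki')

# Reading falls back to the class when the instance has no entry of its own.
print(a.teacher, 'teacher' in a.__dict__)   # Snape False

# Assigning through the instance creates an entry in a.__dict__ only.
a.teacher = 'Jodie'
print(a.__dict__)          # {'name': 'conan', 'teacher': 'Jodie'}
print(b.teacher)           # Snape
print(Student.teacher)     # Snape

# Removing the instance entry exposes the class attribute again.
del a.teacher
print(a.teacher)           # Snape
```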

So why does it behave this way? Let's dig into the source code to find out.
Reading an attribute compiles to the LOAD_ATTR bytecode. After a few layers of calls, it ends up in _PyObject_GenericGetAttrWithDict:

// object.c

PyObject *
_PyObject_GenericGetAttrWithDict(PyObject *obj, PyObject *name,
                                 PyObject *dict, int suppress)
{
    /* Make sure the logic of _PyObject_GetMethod is in sync with
       this method.

       When suppress=1, this function suppress AttributeError.
    */

    PyTypeObject *tp = Py_TYPE(obj);
    PyObject *descr = NULL;
    PyObject *res = NULL;
    descrgetfunc f;
    Py_ssize_t dictoffset;
    PyObject **dictptr;

    if (!PyUnicode_Check(name)){
        PyErr_Format(PyExc_TypeError,
                     "attribute name must be string, not '%.200s'",
                     Py_TYPE(name)->tp_name);
        return NULL;
    }
    Py_INCREF(name);

    if (tp->tp_dict == NULL) {
        if (PyType_Ready(tp) < 0)
            goto done;
    }

    descr = _PyType_Lookup(tp, name);

    f = NULL;
    if (descr != NULL) {
        Py_INCREF(descr);
        f = Py_TYPE(descr)->tp_descr_get;
        if (f != NULL && PyDescr_IsData(descr)) {
            res = f(descr, obj, (PyObject *)Py_TYPE(obj));
            if (res == NULL && suppress &&
                    PyErr_ExceptionMatches(PyExc_AttributeError)) {
                PyErr_Clear();
            }
            goto done;
        }
    }

    if (dict == NULL) {
        /* Inline _PyObject_GetDictPtr */
        dictoffset = tp->tp_dictoffset;
        if (dictoffset != 0) {
            if (dictoffset < 0) {
                Py_ssize_t tsize = Py_SIZE(obj);
                if (tsize < 0) {
                    tsize = -tsize;
                }
                size_t size = _PyObject_VAR_SIZE(tp, tsize);
                _PyObject_ASSERT(obj, size <= PY_SSIZE_T_MAX);

                dictoffset += (Py_ssize_t)size;
                _PyObject_ASSERT(obj, dictoffset > 0);
                _PyObject_ASSERT(obj, dictoffset % SIZEOF_VOID_P == 0);
            }
            dictptr = (PyObject **) ((char *)obj + dictoffset);
            dict = *dictptr;
        }
    }
    if (dict != NULL) {
        Py_INCREF(dict);
        res = PyDict_GetItemWithError(dict, name);
        if (res != NULL) {
            Py_INCREF(res);
            Py_DECREF(dict);
            goto done;
        }
        else {
            Py_DECREF(dict);
            if (PyErr_Occurred()) {
                if (suppress && PyErr_ExceptionMatches(PyExc_AttributeError)) {
                    PyErr_Clear();
                }
                else {
                    goto done;
                }
            }
        }
    }

    if (f != NULL) {
        res = f(descr, obj, (PyObject *)Py_TYPE(obj));
        if (res == NULL && suppress &&
                PyErr_ExceptionMatches(PyExc_AttributeError)) {
            PyErr_Clear();
        }
        goto done;
    }

    if (descr != NULL) {
        res = descr;
        descr = NULL;
        goto done;
    }

    if (!suppress) {
        PyErr_Format(PyExc_AttributeError,
                     "'%.50s' object has no attribute '%U'",
                     tp->tp_name, name);
    }
  done:
    Py_XDECREF(descr);
    Py_DECREF(name);
    return res;
}

From this code and the functions it calls, we can see that attribute lookup follows this priority:

  • First, search the class's inheritance chain for an attribute with the given name. If it is a descriptor that has both __get__ and __set__ (a data descriptor, as checked by PyDescr_IsData), the descriptor takes priority and its __get__ is called
  • Second, look for the attribute in the instance's __dict__
    • Interested readers can print the __dict__ of the instances and the class in the test code above and look at the output
  • Third, if the attribute found on the class has __get__ (but no __set__), obtain the value through its __get__ method
  • Finally, if the attribute found on the class has no __get__, it is just an ordinary class attribute rather than a descriptor (such as Student.teacher), and the value is returned directly

The inheritance chain of a class is available through its __mro__ attribute, which is computed by the C3 linearization algorithm. Interested readers can look into the principle and implementation behind it; Python's multiple inheritance is built on this mechanism.
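As a small illustration of the inheritance chain, the sketch below builds a diamond hierarchy (the class names Base, Left, Right, and Child are made up for this example) and prints the resulting __mro__. Class-level lookup walks this list from left to right:

```python
class Base:
    who = 'Base'


class Left(Base):
    who = 'Left'


class Right(Base):
    who = 'Right'


class Child(Left, Right):
    pass


# The linearized inheritance chain that attribute lookup walks.
mro_names = [klass.__name__ for klass in Child.__mro__]
print(mro_names)      # ['Child', 'Left', 'Right', 'Base', 'object']

# Child defines no `who`, so the first class in the MRO that has it wins.
print(Child().who)    # Left
```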
The earlier example confirms this: for a descriptor with both __get__ and __set__, access through the class and access through an instance go through the same descriptor.
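The lookup priority listed above can be approximated in pure Python. This is only a sketch of the order of the steps, not CPython's actual implementation: generic_getattr and _MISSING are hypothetical names introduced here, and details such as error suppression and reference counting are left out.

```python
_MISSING = object()  # sentinel: distinguishes "not found" from a stored None


def generic_getattr(obj, name):
    """Hypothetical pure-Python approximation of the C lookup order."""
    tp = type(obj)

    # 1. Walk the MRO looking for a class-level entry (like _PyType_Lookup).
    descr = _MISSING
    for klass in tp.__mro__:
        if name in vars(klass):
            descr = vars(klass)[name]
            break

    getter = None
    if descr is not _MISSING:
        getter = getattr(type(descr), '__get__', None)
        is_data = (hasattr(type(descr), '__set__')
                   or hasattr(type(descr), '__delete__'))
        if getter is not None and is_data:
            return getter(descr, obj, tp)       # data descriptor wins

    # 2. The instance's own __dict__.
    instance_dict = getattr(obj, '__dict__', {})
    if name in instance_dict:
        return instance_dict[name]

    # 3. Non-data descriptor (has __get__ only), e.g. a plain function.
    if getter is not None:
        return getter(descr, obj, tp)

    # 4. Ordinary class attribute.
    if descr is not _MISSING:
        return descr

    raise AttributeError('%r object has no attribute %r' % (tp.__name__, name))


class Headmaster:
    def __get__(self, obj, cls):
        return 'Dumbledore'

    def __set__(self, obj, value):
        pass


class Student:
    headmaster = Headmaster()
    teacher = 'Snape'

    def hello(self):
        return 'hi'


s = Student()
# Write straight into the instance dict to bypass the descriptor's __set__.
s.__dict__.update(headmaster='ignored', teacher='Jodie')

print(generic_getattr(s, 'headmaster'))        # Dumbledore
print(generic_getattr(s, 'teacher'))           # Jodie
print(generic_getattr(s, 'hello')())           # hi
print(generic_getattr(Student(), 'teacher'))   # Snape
```

The first call shows the data descriptor beating the instance __dict__, while the second shows the instance __dict__ beating an ordinary class attribute.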

Similarly, setting an attribute goes through _PyObject_GenericSetAttrWithDict (also in object.c; the source is not reproduced here, so look it up if you are interested). The priorities are:

  • Search the inheritance chain for the attribute; if it is a descriptor with __set__, call the descriptor's __set__ directly
  • Otherwise, set the attribute directly in the instance's __dict__
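The two assignment paths can be seen side by side in a short sketch. Recorder and Room are hypothetical names made up for this example; Recorder is a data descriptor that remembers every value written through it:

```python
class Recorder:
    """Data descriptor that records every value written through it."""

    def __init__(self):
        self.values = []

    def __get__(self, obj, cls):
        return self.values[-1] if self.values else None

    def __set__(self, obj, value):
        self.values.append(value)


class Room:
    tracked = Recorder()   # descriptor with __set__
    plain = 0              # ordinary class attribute


room = Room()
room.tracked = 1   # routed to Recorder.__set__, nothing stored on the instance
room.plain = 5     # stored in room.__dict__, class attribute untouched

print(room.__dict__)                      # {'plain': 5}
print(Room.__dict__['tracked'].values)    # [1]
print(Room.plain)                         # 0
```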

Accessing instance attributes in class methods

We usually define various functions (methods) in a class, and inside them we access many attributes on self. But Python is a dynamic language, and self in a function definition is not necessarily an instance of that class. You can pass in an instance of a subclass, and you can also pass in an instance of any other class that has the attributes the function accesses.

Readers who have used Go or know the ideas of composition and ECS will recognize this. If something has tawny skin with no pattern, a large body, runs fast, has sharp teeth, eats meat, and has a golden mane, it is presumably a male lion. Other animals might also meet these conditions, but if these characteristics are all we care about, we can treat them the same way. The same goes for the methods of a class: if there is no constraint on the instance's type, only on the instance's attributes, then any instance with those attributes can be passed as the argument. Let's take an example:

class Lion:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age


class Student:
    def __init__(self, name: str, gender: str, age: int):
        assert gender in ['male', 'female']
        self.name = name
        self.gender = gender
        self.age = age

    def output(self):
        print('class: %s, name: %s, age: %s' % (
            self.__class__.__name__,
            self.name,
            self.age
        ))


if __name__ == '__main__':
    s = Student(name='haha', gender='male', age=18)
    s.output()
    Student.output(s)
    Student.output(Lion(name='simba', age=5))

The key is the last statement: Student.output(Lion(name='simba', age=5)). It is valid, and the printed output shows no error. Also, s.output() and Student.output(s) have the same effect.

If we print the bytecode for s.output() and Student.output(s) (with the code above changed slightly, for example the instance s renamed to student), we can see the difference:

 29          26 LOAD_FAST                0 (student)
             28 LOAD_METHOD              2 (output)
             30 CALL_METHOD              0
             32 POP_TOP

 30          34 LOAD_GLOBAL              0 (Student)
             36 LOAD_METHOD              2 (output)
             38 LOAD_FAST                0 (student)
             40 CALL_METHOD              1
             42 POP_TOP

For student.output(), CALL_METHOD comes directly after LOAD_METHOD. For Student.output(student), an argument, student (which could also be some other instance), has to be loaded before the call. The bytecode matches the source code.
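A listing like the one above can be produced with the standard dis module. The sketch below uses a cut-down Student class and a hypothetical helper call_both; note that opcode names and offsets vary between CPython versions, so the output on your interpreter may not show LOAD_METHOD and CALL_METHOD under exactly those names.

```python
import dis


class Student:
    def __init__(self, name):
        self.name = name

    def output(self):
        return 'name: %s' % self.name


def call_both(student):
    # The two call forms compared in the text.
    first = student.output()
    second = Student.output(student)
    return first, second


# Print the disassembly; compare the instructions generated for each call.
dis.dis(call_both)

# Collect the instructions that load the attribute named 'output'.
output_loads = [ins.opname for ins in dis.get_instructions(call_both)
                if ins.argval == 'output']
print(output_loads)
```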

Let's first look at the implementation of LOAD_METHOD:

case TARGET(LOAD_METHOD): {
    /* Designed to work in tandem with CALL_METHOD. */
    PyObject *name = GETITEM(names, oparg);
    PyObject *obj = TOP();
    PyObject *meth = NULL;

    int meth_found = _PyObject_GetMethod(obj, name, &meth);

    if (meth == NULL) {
        /* Most likely attribute wasn't found. */
        goto error;
    }

    if (meth_found) {
        /* We can bypass temporary bound method object.
                   meth is unbound method and obj is self.

                   meth | self | arg1 | ... | argN
                 */
        SET_TOP(meth);
        PUSH(obj);  // self
    }
    else {
        /* meth is not an unbound method (but a regular attr, or
                   something was returned by a descriptor protocol).  Set
                   the second element of the stack to NULL, to signal
                   CALL_METHOD that it's not a method call.

                   NULL | meth | arg1 | ... | argN
                */
        SET_TOP(NULL);
        Py_DECREF(obj);
        PUSH(meth);
    }
    DISPATCH();
}

LOAD_METHOD delegates to _PyObject_GetMethod:

int
_PyObject_GetMethod(PyObject *obj, PyObject *name, PyObject **method)
{
    PyTypeObject *tp = Py_TYPE(obj);
    PyObject *descr;
    descrgetfunc f = NULL;
    PyObject **dictptr, *dict;
    PyObject *attr;
    int meth_found = 0;

    assert(*method == NULL);

    if (Py_TYPE(obj)->tp_getattro != PyObject_GenericGetAttr
            || !PyUnicode_Check(name)) {
        *method = PyObject_GetAttr(obj, name);
        return 0;
    }

    if (tp->tp_dict == NULL && PyType_Ready(tp) < 0)
        return 0;

    descr = _PyType_Lookup(tp, name);
    if (descr != NULL) {
        Py_INCREF(descr);
        if (_PyType_HasFeature(Py_TYPE(descr), Py_TPFLAGS_METHOD_DESCRIPTOR)) {
            meth_found = 1;
        } else {
            f = Py_TYPE(descr)->tp_descr_get;
            if (f != NULL && PyDescr_IsData(descr)) {
                *method = f(descr, obj, (PyObject *)Py_TYPE(obj));
                Py_DECREF(descr);
                return 0;
            }
        }
    }

    dictptr = _PyObject_GetDictPtr(obj);
    if (dictptr != NULL && (dict = *dictptr) != NULL) {
        Py_INCREF(dict);
        attr = PyDict_GetItemWithError(dict, name);
        if (attr != NULL) {
            Py_INCREF(attr);
            *method = attr;
            Py_DECREF(dict);
            Py_XDECREF(descr);
            return 0;
        }
        else {
            Py_DECREF(dict);
            if (PyErr_Occurred()) {
                Py_XDECREF(descr);
                return 0;
            }
        }
    }

    if (meth_found) {
        *method = descr;
        return 1;
    }

    if (f != NULL) {
        *method = f(descr, obj, (PyObject *)Py_TYPE(obj));
        Py_DECREF(descr);
        return 0;
    }

    if (descr != NULL) {
        *method = descr;
        return 0;
    }

    PyErr_Format(PyExc_AttributeError,
                 "'%.50s' object has no attribute '%U'",
                 tp->tp_name, name);
    return 0;
}

_PyObject_GetMethod contains several checks. student.output and Student.output take different branches:

  • For student.output, _PyType_Lookup(tp, name) finds the output function (an instance of PyFunction_Type) on the class. Function objects are flagged with Py_TPFLAGS_METHOD_DESCRIPTOR, so meth_found is set to 1 and, since the instance's __dict__ has no output entry, the function itself is returned without calling its tp_descr_get
    • Back in LOAD_METHOD, execution takes the if (meth_found) branch
    • The top of the stack is the student instance, and below it is the output method
  • For Student.output, Student is a type, so Py_TYPE(obj)->tp_getattro is type_getattro (from PyType_Type) rather than PyObject_GenericGetAttr. Even though the name is a valid string, the first check succeeds and PyObject_GetAttr(obj, name) is used directly to find the output method
    • Back in LOAD_METHOD, execution takes the else branch of if (meth_found)
    • The top of the stack is the output method, and below it is NULL
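At the Python level, the difference between the two access paths shows up as a plain function versus a bound method. The following sketch, using a cut-down Student class, shows that the object stored in the class __dict__ is an ordinary function and that binding it to an instance is what the function's __get__ does:

```python
class Student:
    def __init__(self, name):
        self.name = name

    def output(self):
        return 'name: %s' % self.name


s = Student('haha')
raw = Student.__dict__['output']     # the function stored on the class

# Access through the class returns the plain function itself.
print(Student.output is raw)         # True

# Access through an instance yields a bound method wrapping that function.
bound = s.output
print(type(bound).__name__)                         # method
print(bound.__func__ is raw, bound.__self__ is s)   # True True

# Binding by hand through the descriptor protocol gives the same behavior.
manual = raw.__get__(s, Student)
print(manual() == s.output() == Student.output(s))  # True
```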

LOAD_METHOD is followed by CALL_METHOD. Remember that Student.output(student) takes an argument, so one more value, student, is pushed onto the stack first. Now let's look at CALL_METHOD:

case TARGET(CALL_METHOD): {
    /* Designed to work in tamdem with LOAD_METHOD. */
    PyObject **sp, *res, *meth;

    sp = stack_pointer;

    meth = PEEK(oparg + 2);
    if (meth == NULL) {
        /* `meth` is NULL when LOAD_METHOD thinks that it's not
                   a method call.

                   Stack layout:

                       ... | NULL | callable | arg1 | ... | argN
                                                            ^- TOP()
                                               ^- (-oparg)
                                    ^- (-oparg-1)
                             ^- (-oparg-2)

                   `callable` will be POPed by call_function.
                   NULL will will be POPed manually later.
                */
        res = call_function(tstate, &sp, oparg, NULL);
        stack_pointer = sp;
        (void)POP(); /* POP the NULL. */
    }
    else {
        /* This is a method call.  Stack layout:

                     ... | method | self | arg1 | ... | argN
                                                        ^- TOP()
                                           ^- (-oparg)
                                    ^- (-oparg-1)
                           ^- (-oparg-2)

                  `self` and `method` will be POPed by call_function.
                  We'll be passing `oparg + 1` to call_function, to
                  make it accept the `self` as a first argument.
                */
        res = call_function(tstate, &sp, oparg + 1, NULL);
        stack_pointer = sp;
    }

    PUSH(res);
    if (res == NULL)
        goto error;
    DISPATCH();
}

CALL_METHOD inspects the stack. If all goes well, the stack from top to bottom looks like this (see the comments in LOAD_METHOD and CALL_METHOD):

  • student.output: argument N ~ argument 1 (if any), student instance, output method
  • Student.output: argument N+1 ~ argument 2 (if any), student instance (argument 1), output method, NULL

Both paths end up in call_function, and the net effect of each is Student.output(student, *args, **kwargs).
So if an instance of another class is passed to Student.output and the method reads attributes from it, the lookup on that instance still goes through _PyObject_GenericGetAttrWithDict. That function is generic and not tied to any single class, which explains why Student.output(Lion(name='simba', age=5)) works (of course, writing this in real code is not recommended; there are better approaches that keep the code readable).
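The article does not say which alternatives it has in mind. One possible shape, offered purely as an assumption, is a module-level function whose parameter is annotated with a typing.Protocol (available from Python 3.8), so that the attribute requirements are spelled out instead of being hidden behind Student.output. The names Named and describe are hypothetical.

```python
from typing import Protocol


class Named(Protocol):
    """Structural type: anything with a name and an age qualifies."""
    name: str
    age: int


def describe(entity: Named) -> str:
    # Only the attributes declared in Named are used here.
    return 'class: %s, name: %s, age: %s' % (
        type(entity).__name__, entity.name, entity.age)


class Lion:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age


class Student:
    def __init__(self, name: str, gender: str, age: int):
        self.name = name
        self.gender = gender
        self.age = age


print(describe(Student(name='haha', gender='male', age=18)))
# class: Student, name: haha, age: 18
print(describe(Lion(name='simba', age=5)))
# class: Lion, name: simba, age: 5
```

The runtime behavior is the same duck typing as before; the Protocol only documents the expectation and lets a static type checker verify it.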

An application of descriptors: property

The last two sections mentioned descriptors many times. In everyday Python programming we rarely deal with descriptors directly, but inside Python's implementation they are a core piece. You could say the descriptor is a mechanism built specifically to fit Python's class attribute access interface, so it is worth every Python learner's time to understand it.

For example, some common built-ins used in class definitions, such as property, classmethod, and staticmethod, appear in our programs as decorators, but underneath they are descriptors. This is part of what makes descriptors so powerful.
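This can be checked by looking at the raw objects stored in a class's __dict__. The sketch below uses a made-up Demo class and reports, for each decorated attribute, its type and whether that type defines __get__ and __set__:

```python
class Demo:
    @property
    def prop(self):
        return 'from property'

    @classmethod
    def make(cls):
        return cls.__name__

    @staticmethod
    def helper():
        return 'static'


raw = vars(Demo)   # raw class dict, before any descriptor is invoked
for name in ('prop', 'make', 'helper'):
    attr = raw[name]
    print(name, type(attr).__name__,
          hasattr(type(attr), '__get__'), hasattr(type(attr), '__set__'))
# prop property True True
# make classmethod True False
# helper staticmethod True False

# Invoking the descriptor protocol by hand gives the usual results.
print(raw['prop'].__get__(Demo(), Demo))    # from property
print(raw['make'].__get__(None, Demo)())    # Demo
print(raw['helper'].__get__(None, Demo)())  # static
```

property defines both __get__ and __set__, which makes it a data descriptor; classmethod and staticmethod define only __get__.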

This article takes property as an example to analyze how descriptors are applied. First, the test code:

import pprint


class Human:
    def __init__(self, first_name='', last_name=''):
        self.__first_name = first_name
        self.__last_name = last_name

    @property
    def first_name(self):
        return self.__first_name

    @first_name.setter
    def first_name(self, value):
        print('[%s] change first name to %s' % (id(self), value))
        self.__first_name = str(value)

    @property
    def last_name(self):
        return self.__last_name

    @last_name.setter
    def last_name(self, value):
        print('[%s] change last name to %s' % (id(self), value))
        self.__last_name = str(value)

    @property
    def full_name(self):
        return '%s %s' % (self.first_name, self.last_name)


def main():
    h = Human(first_name='James', last_name='Bond')
    h1 = Human(first_name='Anatoli', last_name='Todorov')
    print('first name: %s' % h.first_name)
    print('last name: %s' % h.last_name)
    print('full name: %s' % h.full_name)
    h.first_name = 'Jiss'
    h1.last_name = 'Toledo'
    print('[h] first name: %s' % h.first_name)
    print('[h] full name: %s' % h.full_name)
    print('[h1] last name: %s' % h1.last_name)
    print('[h1] full name: %s' % h1.full_name)
    print('[Human] first name: %s' % Human.first_name)
    print('[Human] last name: %s' % Human.last_name)
    print('[Human] full name: %s' % Human.full_name)
    print('[h] dict: %s' % h.__dict__)
    print('[h1] dict: %s' % h1.__dict__)
    print('[Human] dict: %s' % pprint.pformat(Human.__dict__))


if __name__ == '__main__':
    main()

The printed result is:

first name: James
last name: Bond
full name: James Bond
[1661162208128] change first name to Jiss
[1661162208032] change last name to Toledo
[h] first name: Jiss
[h] full name: Jiss Bond
[h1] last name: Toledo
[h1] full name: Anatoli Toledo
[Human] first name: <property object at 0x00000182C4FB7B80>
[Human] last name: <property object at 0x00000182C4FB7BD0>
[Human] full name: <property object at 0x00000182C4F9E220>
[h] dict: {'_Human__first_name': 'Jiss', '_Human__last_name': 'Bond'}
[h1] dict: {'_Human__first_name': 'Anatoli', '_Human__last_name': 'Toledo'}
[Human] dict: mappingproxy({'__dict__': <attribute '__dict__' of 'Human' objects>,
              '__doc__': None,
              '__init__': <function Human.__init__ at 0x00000182C4FB3B80>,
              '__module__': '__main__',
              '__weakref__': <attribute '__weakref__' of 'Human' objects>,
              'first_name': <property object at 0x00000182C4FB7B80>,
              'full_name': <property object at 0x00000182C4F9E220>,
              'last_name': <property object at 0x00000182C4FB7BD0>})

property is often used to regulate access to private variables. From the instance's __dict__ we can also see that the private variables with a double-underscore prefix were quietly renamed at compile time (interested readers can look up _Py_Mangle). On the class, every property-backed attribute is a property object; on the instances, the values behind those properties are independent of each other and each instance holds its own.
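Both points can be reproduced in a few lines. This sketch uses a cut-down Human class with a getter only, so it also shows that a property without a setter rejects assignment, and that an entry forced into the instance __dict__ does not override the property:

```python
class Human:
    def __init__(self, first_name):
        self.__first_name = first_name   # stored as _Human__first_name

    @property
    def first_name(self):
        return self.__first_name


h = Human('James')

# The double-underscore name was mangled when the class body was compiled.
print(h.__dict__)             # {'_Human__first_name': 'James'}
print(h._Human__first_name)   # James

# No setter was defined, so assignment through the property fails.
try:
    h.first_name = 'Jiss'
except AttributeError:
    print('first_name is read-only')

# A property is a data descriptor, so it beats an instance dict entry.
h.__dict__['first_name'] = 'shadow'
print(h.first_name)           # James
```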

How does property achieve this effect? We need to go into the source code to find out.

First, let's look at PyProperty_Type, the type definition corresponding to property:

PyTypeObject PyProperty_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "property",                                 /* tp_name */
    sizeof(propertyobject),                     /* tp_basicsize */
    0,                                          /* tp_itemsize */
    /* methods */
    property_dealloc,                           /* tp_dealloc */
    0,                                          /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    0,                                          /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    0,                                          /* tp_hash */
    0,                                          /* tp_call */
    0,                                          /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
        Py_TPFLAGS_BASETYPE,                    /* tp_flags */
    property_init__doc__,                       /* tp_doc */
    property_traverse,                          /* tp_traverse */
    (inquiry)property_clear,                    /* tp_clear */
    0,                                          /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    property_methods,                           /* tp_methods */
    property_members,                           /* tp_members */
    property_getsetlist,                        /* tp_getset */
    0,                                          /* tp_base */
    0,                                          /* tp_dict */
    property_descr_get,                         /* tp_descr_get */
    property_descr_set,                         /* tp_descr_set */
    0,                                          /* tp_dictoffset */
    property_init,                              /* tp_init */
    PyType_GenericAlloc,                        /* tp_alloc */
    PyType_GenericNew,                          /* tp_new */
    PyObject_GC_Del,                            /* tp_free */
};

From the definition, we can see that both tp_descr_get and tp_descr_set of PyProperty_Type have callback functions, so every instance of property can be treated as a descriptor.

We create property instances with decorator syntax, which desugars to something like property(function). So we also need to look at how a property instance is created.
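To make the desugaring concrete, the sketch below builds the same kind of attribute without decorator syntax. The helper names _get_first_name and _set_first_name are hypothetical; the point is that @property is a call to property(function) and that .setter returns a new property object carrying both functions.

```python
class Human:
    def __init__(self, first_name=''):
        self._first_name = first_name

    def _get_first_name(self):
        return self._first_name

    def _set_first_name(self, value):
        self._first_name = str(value)

    # Equivalent to decorating a getter with @property ...
    first_name = property(_get_first_name)
    # ... and then decorating a setter with @first_name.setter.
    first_name = first_name.setter(_set_first_name)


prop = Human.__dict__['first_name']
print(prop.fget is Human.__dict__['_get_first_name'])   # True
print(prop.fset is Human.__dict__['_set_first_name'])   # True

h = Human('James')
h.first_name = 42
print(repr(h.first_name))    # '42'

# .setter does not mutate the original property; it returns a copy.
getter_only = property(lambda self: 1)
with_setter = getter_only.setter(lambda self, value: None)
print(with_setter is getter_only, getter_only.fset)     # False None
```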

property itself is a type object. Creating an instance via property(function) amounts to calling property like a function. So let's first look at PyType_Type, the definition of the type of type objects:

PyTypeObject PyType_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "type",                                     /* tp_name */
    sizeof(PyHeapTypeObject),                   /* tp_basicsize */
    sizeof(PyMemberDef),                        /* tp_itemsize */
    (destructor)type_dealloc,                   /* tp_dealloc */
    offsetof(PyTypeObject, tp_vectorcall),      /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    (reprfunc)type_repr,                        /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    0,                                          /* tp_hash */
    (ternaryfunc)type_call,                     /* tp_call */
    0,                                          /* tp_str */
    (getattrofunc)type_getattro,                /* tp_getattro */
    (setattrofunc)type_setattro,                /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
    Py_TPFLAGS_BASETYPE | Py_TPFLAGS_TYPE_SUBCLASS |
    Py_TPFLAGS_HAVE_VECTORCALL,                 /* tp_flags */
    type_doc,                                   /* tp_doc */
    (traverseproc)type_traverse,                /* tp_traverse */
    (inquiry)type_clear,                        /* tp_clear */
    0,                                          /* tp_richcompare */
    offsetof(PyTypeObject, tp_weaklist),        /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    type_methods,                               /* tp_methods */
    type_members,                               /* tp_members */
    type_getsets,                               /* tp_getset */
    0,                                          /* tp_base */
    0,                                          /* tp_dict */
    0,                                          /* tp_descr_get */
    0,                                          /* tp_descr_set */
    offsetof(PyTypeObject, tp_dict),            /* tp_dictoffset */
    type_init,                                  /* tp_init */
    0,                                          /* tp_alloc */
    type_new,                                   /* tp_new */
    PyObject_GC_Del,                            /* tp_free */
    (inquiry)type_is_gc,                        /* tp_is_gc */
};

It is easy to see that when a type object is used like a function (strictly speaking, as a callable), the function in its tp_call slot, type_call, is triggered. type_call is not listed here, because only two of its steps are critical:

  • obj = type->tp_new(type, args, kwds)
  • res = type->tp_init(obj, args, kwds)
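The two steps can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not CPython's actual type_call: type_call_sketch and getter are hypothetical names introduced here, and the real function also handles errors and special cases.

```python
def type_call_sketch(cls, *args, **kwargs):
    # Step 1: tp_new allocates a new, uninitialized object.
    obj = cls.__new__(cls, *args, **kwargs)
    # Step 2: tp_init fills it in (only if tp_new returned an instance of cls).
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj


def getter(self):
    return 42


# Behaves like property(getter) for this simple case.
p = type_call_sketch(property, getter)
print(type(p).__name__, p.fget is getter)
```

Calling property(getter) directly goes through the same two slots, which is why the sections below look at tp_new and tp_init of property in turn.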

That is, to create a property instance, the function in property's tp_new slot is called first to obtain an empty instance, and then the function in its tp_init slot is called to initialize that instance. From the definition of PyProperty_Type, property's tp_new is PyType_GenericNew, which simply allocates memory, and its tp_init is property_init. The real implementation of property_init is property_init_impl, so let's look directly at the definition of property_init_impl.

static int
property_init_impl(propertyobject *self, PyObject *fget, PyObject *fset,
                   PyObject *fdel, PyObject *doc)
/*[clinic end generated code: output=01a960742b692b57 input=dfb5dbbffc6932d5]*/
{
    if (fget == Py_None)
        fget = NULL;
    if (fset == Py_None)
        fset = NULL;
    if (fdel == Py_None)
        fdel = NULL;

    Py_XINCREF(fget);
    Py_XINCREF(fset);
    Py_XINCREF(fdel);
    Py_XINCREF(doc);

    Py_XSETREF(self->prop_get, fget);
    Py_XSETREF(self->prop_set, fset);
    Py_XSETREF(self->prop_del, fdel);
    Py_XSETREF(self->prop_doc, doc);
    self->getter_doc = 0;

    /* if no docstring given and the getter has one, use that one */
    if ((doc == NULL || doc == Py_None) && fget != NULL) {
        _Py_IDENTIFIER(__doc__);
        PyObject *get_doc;
        int rc = _PyObject_LookupAttrId(fget, &PyId___doc__, &get_doc);
        if (rc <= 0) {
            return rc;
        }
        if (Py_IS_TYPE(self, &PyProperty_Type)) {
            Py_XSETREF(self->prop_doc, get_doc);
        }
        else {
            /* If this is a property subclass, put __doc__
               in dict of the subclass instance instead,
               otherwise it gets shadowed by __doc__ in the
               class's dict. */
            int err = _PyObject_SetAttrId((PyObject *)self, &PyId___doc__, get_doc);
            Py_DECREF(get_doc);
            if (err < 0)
                return -1;
        }
        self->getter_doc = 1;
    }

    return 0;
}

This initializes the property instance. The parameters passed in are fget, fset, fdel and doc. From property_init_impl it is easy to see that the four parameters end up in prop_get, prop_set, prop_del and prop_doc. In general, a function wrapped with the property decorator corresponds to fget.
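The stored functions can be inspected from Python through the fget, fset and fdel attributes of a property object. A small sketch, where get_name and set_name are hypothetical functions made up for illustration:

```python
def get_name(self):
    """The name of the object."""
    return self._name


def set_name(self, value):
    self._name = value


prop = property(get_name, set_name)

print(prop.fget is get_name)   # the getter is stored as given
print(prop.fset is set_name)   # so is the setter
print(prop.fdel)               # no deleter was passed
print(prop.__doc__)            # no doc was passed, so the getter's docstring is used
```

The last line matches the "if no docstring given and the getter has one, use that one" branch in property_init_impl.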

In the earlier discussion of attribute access, we learned that when a class instance accesses an attribute, the first priority is to check whether the inheritance chain contains a data descriptor for that name (one that defines both __get__ and __set__). In the example above, this lookup finds first_name and the other property objects defined in the class. The type of a property object, PyProperty_Type, has both tp_descr_get and tp_descr_set, so a property counts as a data descriptor. _PyObject_GenericGetAttrWithDict therefore goes straight to f(descr, obj, (PyObject *)Py_TYPE(obj)) and returns the result. Substituting the variables involved, the call becomes tp_descr_get(property instance, class instance, class).
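The priority of a data descriptor over the instance dictionary can be observed directly. The following is a minimal sketch; the Wizard class is a hypothetical stand-in, not the article's test code:

```python
class Wizard:
    @property
    def house(self):
        return "from property"


w = Wizard()
# Write into the instance dict directly, bypassing the property.
w.__dict__["house"] = "from instance dict"

# The property is a data descriptor on the class, so it wins the lookup.
print(w.house)

# The same lookup spelled out: tp_descr_get(property instance, class instance, class).
descr = type(w).__dict__["house"]
explicit = type(descr).__get__(descr, w, type(w))
print(explicit)
```

Both prints show the value produced by the property, even though the instance dict holds an entry with the same name.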

property's tp_descr_get slot corresponds to property_descr_get. Let's look at its definition:

static PyObject *
property_descr_get(PyObject *self, PyObject *obj, PyObject *type)
{
    if (obj == NULL || obj == Py_None) {
        Py_INCREF(self);
        return self;
    }

    propertyobject *gs = (propertyobject *)self;
    if (gs->prop_get == NULL) {
        PyErr_SetString(PyExc_AttributeError, "unreadable attribute");
        return NULL;
    }

    return PyObject_CallOneArg(gs->prop_get, obj);
}

We can see that the getter logic ends up calling the property object's prop_get with obj as the argument. This prop_get is the fget function wrapped by property, and obj is the class instance. Since the first parameter of our fget function is self, fget reads the attribute directly from the class instance.
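The getter logic above can be approximated in pure Python. This is only a sketch that mirrors property_descr_get; PropertySketch and Circle are hypothetical names, and the real property supports more (setter, deleter, doc handling).

```python
class PropertySketch:
    """Simplified read-only stand-in for property (illustration only)."""

    def __init__(self, fget=None):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed through the class: return the descriptor itself.
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        # Same as PyObject_CallOneArg(gs->prop_get, obj).
        return self.fget(obj)

    def __set__(self, obj, value):
        # Defining __set__ makes this a data descriptor, like property.
        raise AttributeError("can't set attribute")


class Circle:
    def __init__(self, radius):
        self._radius = radius

    @PropertySketch
    def radius(self):
        return self._radius


c = Circle(2)
print(c.radius)                                   # goes through __get__
print(isinstance(Circle.radius, PropertySketch))  # class access returns the descriptor
```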

At this point we can also see that, in the Python test code above, h.first_name and Human.first_name.fget(h) are equivalent expressions. By extension, suppose there is an instance of another class, called fakehuman, that has an attribute named _Human__first_name (as mentioned above, double-underscore names are mangled). Then the expression Human.first_name.fget(fakehuman) works too!
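A quick way to check this claim is shown below. The Human class here is an assumed minimal reconstruction (the article's full test code may differ), and FakeHuman is a hypothetical class added for the experiment:

```python
class Human:
    def __init__(self, first_name):
        # Name mangling stores this as _Human__first_name.
        self.__first_name = first_name

    @property
    def first_name(self):
        return self.__first_name


class FakeHuman:
    def __init__(self, first_name):
        # Not mangled: the name starts with a single underscore.
        self._Human__first_name = first_name


h = Human("Harry")
fakehuman = FakeHuman("Tom")

print(h.first_name == Human.first_name.fget(h))
print(Human.first_name.fget(fakehuman))
```

The getter only needs an object that carries the mangled attribute; it never checks that the object is actually a Human.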

The setter of a property follows the same pattern and ends up in property_descr_set:

static int
property_descr_set(PyObject *self, PyObject *obj, PyObject *value)
{
    propertyobject *gs = (propertyobject *)self;
    PyObject *func, *res;

    if (value == NULL)
        func = gs->prop_del;
    else
        func = gs->prop_set;
    if (func == NULL) {
        PyErr_SetString(PyExc_AttributeError,
                        value == NULL ?
                        "can't delete attribute" :
                "can't set attribute");
        return -1;
    }
    if (value == NULL)
        res = PyObject_CallOneArg(func, obj);
    else
        res = PyObject_CallFunctionObjArgs(func, obj, value, NULL);
    if (res == NULL)
        return -1;
    Py_DECREF(res);
    return 0;
}

From property_descr_set we can see that when value is NULL (the C NULL, not Python's None), the deleter stored in prop_del is called. The del statement is used to delete an attribute; its opcode is DELETE_ATTR, which essentially sets the attribute to NULL. When value is non-NULL, the setter stored in prop_set is called. In this way a single slot serves both the setter and the deleter of a property.
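The dispatch between setter and deleter can be observed from Python. In this sketch the Account and ReadOnly classes are hypothetical examples; note that assigning None still goes to the setter, and only del reaches the deleter.

```python
class Account:
    def __init__(self):
        self._balance = 0
        self.log = []

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        self.log.append(("set", value))
        self._balance = value

    @balance.deleter
    def balance(self):
        self.log.append(("del",))
        del self._balance


class ReadOnly:
    @property
    def x(self):
        return 1


a = Account()
a.balance = 5       # value is non-NULL: prop_set
a.balance = None    # None is still a value: prop_set again
del a.balance       # value is NULL: prop_del
print(a.log)

try:
    ReadOnly().x = 2    # prop_set is NULL, so AttributeError is raised
    set_failed = False
except AttributeError:
    set_failed = True
print(set_failed)
```

The exact wording of the AttributeError message differs between Python versions, so the sketch only checks the exception type.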

From the implementation of property, we can see that the descriptor acts as a bridge between the attribute itself and user access. It is a very flexible and ingenious idea. classmethod and staticmethod are also, in essence, non-data descriptors (they have no __set__). Interested readers can study them in more depth.
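As a starting point for that study, the following sketch checks which descriptor methods each type defines and calls __get__ by hand. The Tools class is a hypothetical example:

```python
class Tools:
    @staticmethod
    def add(a, b):
        return a + b

    @classmethod
    def name(cls):
        return cls.__name__


# classmethod and staticmethod define __get__ but not __set__ (non-data descriptors);
# property defines both (data descriptor).
kinds = {k.__name__: (hasattr(k, "__get__"), hasattr(k, "__set__"))
         for k in (classmethod, staticmethod, property)}
print(kinds)

# Fetch the raw descriptor objects from the class dict and bind them manually.
raw_static = Tools.__dict__["add"]
raw_class = Tools.__dict__["name"]
print(raw_static.__get__(None, Tools)(1, 2))
print(raw_class.__get__(None, Tools)())

# A non-data descriptor can be shadowed by an entry in the instance dict.
t = Tools()
t.__dict__["add"] = "shadowed"
print(t.add)
```

The last print contrasts with property: a data descriptor takes priority over the instance dict, while a non-data descriptor does not.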

summary

This article analyzes the relationship between Python classes and instances through class and instance attributes, and introduces an important concept in Python's internal implementation: the descriptor. Three months ago, I didn't even know what a descriptor was. After this period of study, I have a new understanding of how Python implements class and instance attributes. I hope you get something out of reading this article.

Keywords: Python Class OOP

Added by pspeakman on Sat, 06 Nov 2021 13:16:51 +0200