Analysis of iOS principle: load and initialize methods from the perspective of source code

1, Introduction

In iOS development, NSObject is the root class of almost every class, and it plays a central role in the Objective-C class hierarchy. It declares two well-known methods: load and initialize.

+ (void)load;
+ (void)initialize;

When these two methods come up, your first reaction is probably that they are old news. Their call timing and purpose have become an almost mandatory iOS interview question. The call timing itself is simple:

1. The load method is called before main starts, and it is called only once for each class.

2. The initialize method is called before the class, or one of its subclasses, receives its first message.

Both statements are correct, but there are many further questions worth studying, such as:

1. In what order are the load methods of a subclass and its parent class called?

2. In what order are the load methods of classes and categories called?

3. If a subclass does not implement load, is the parent class's load called for it?

4. What happens when multiple categories implement the load method?

5. In what order are the load methods of unrelated classes called?

6. In what order are the initialize methods of parent and child classes called?

7. If a subclass implements initialize, is the parent class's initialize still called?

8. What happens when multiple categories implement the initialize method?

9. ...

Can you give a clear answer to every question above? The load and initialize methods have many interesting properties. In this post we analyze how the two methods are implemented by reading the Objective-C runtime source code. If you are not yet sure about load and initialize and cannot answer all of the questions above, this post should be worth your time. Whether these methods come up in a future interview or in your daily work, understanding their implementation from the source code will help.

2, Practice leads to true knowledge: a first look at the load method

Before starting the analysis, we can create a test project and run a simple experiment on the execution of the load method. First, we create an Xcode command-line project and add some classes, subclasses, and categories to it for testing. The directory structure is shown in the following figure:

MyObjectOne and MyObjectTwo inherit from NSObject, MySubObjectOne is a subclass of MyObjectOne, and MySubObjectTwo is a subclass of MyObjectTwo. We also created three categories. Each class implements the load method and prints a log line, as follows:

+ (void)load {
    NSLog(@"load:%@", [self className]);
}

The categories implement load in a similar way:

+ (void)load {
    NSLog(@"load-category:%@", [self className]);
}

Finally, we add a log statement to the main function:

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSLog(@"Main");
    }
    return 0;
}

Run the project, and the print results are as follows:

2021-02-18 14:33:46.773294+0800 KCObjc[21400:23090040] load:MyObjectOne
2021-02-18 14:33:46.773867+0800 KCObjc[21400:23090040] load:MySubObjectOne
2021-02-18 14:33:46.773959+0800 KCObjc[21400:23090040] load:MyObjectTwo
2021-02-18 14:33:46.774008+0800 KCObjc[21400:23090040] load:MySubObjectTwo
2021-02-18 14:33:46.774052+0800 KCObjc[21400:23090040] load-category:MyObjectTwo
2021-02-18 14:33:46.774090+0800 KCObjc[21400:23090040] load-category:MyObjectOne
2021-02-18 14:33:46.774127+0800 KCObjc[21400:23090040] load-category:MyObjectOne
2021-02-18 14:33:46.774231+0800 KCObjc[21400:23090040] Main

The output shows that the load methods are called before the main function starts. In terms of order, the load methods of the classes are called first, followed by the load methods of the categories. Within the class hierarchy, the load method of the parent class is called before that of the child class.

Next, let's look at the source code to see how the system calls the load method.

3, Analyze the call of load method from the source code

To deeply study the load method, we first need to start with the initialization function of Objective-C:

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
    cache_init();
    _imp_implementationWithBlock_init();

    // We don't need to pay attention to anything else, just this line of code
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}

The _objc_init function is defined in objc-os.mm. It initializes the Objective-C runtime and is called by the system's startup code very early in the launch process, long before any application code runs, so developers never notice it. Inside _objc_init, the environment, the runtime, the caches, and so on are initialized. The important step is the call to _dyld_objc_notify_register, which registers load_images as a callback; dyld then invokes load_images for the images it loads.

Calling the load methods is one step of the image loading process. First, let's look at the implementation of the load_images function:

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    if (!didInitialAttachCategories && didCallDyldNotifyRegister) {
        didInitialAttachCategories = true;
        loadAllCategories();
    }

    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

Filter out the parts we don't care about. The core related to the load method call is as follows:

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    // Return immediately if the image contains no +load methods
    if (!hasLoadMethods((const headerType *)mh)) return;
    {
        // Collect the +load methods that need to be called
        prepare_load_methods((const headerType *)mh);
    }
    // Call the collected +load methods
    call_load_methods();
}

The core consists of two steps: preparing the load methods and calling them. Let's take them one at a time, starting with the preparation (irrelevant parts removed):

void prepare_load_methods(const headerType *mhdr)
{
    size_t count, i;
    // Get the list of non-lazy classes in this image
    classref_t const *classlist = 
        _getObjc2NonlazyClassList(mhdr, &count);
    for (i = 0; i < count; i++) {
        // Schedule the +load method of each class
        schedule_class_load(remapClass(classlist[i]));
    }
    // Get the list of non-lazy categories in this image
    category_t * const *categorylist = _getObjc2NonlazyCategoryList(mhdr, &count);
    for (i = 0; i < count; i++) {
        category_t *cat = categorylist[i];
        // Schedule the +load method of each category
        add_category_to_loadable_list(cat);
    }
}

Now we have a lead. The calling order of the load methods is essentially fixed during this collection step. We can also see that the load methods of classes and those of categories are collected independently, so we can infer that they are also called independently. First, let's look at schedule_class_load, the function that collects the load methods of classes (irrelevant code removed):

static void schedule_class_load(Class cls)
{
    // Return if the class does not exist or has already been scheduled
    if (!cls) return;
    if (cls->data()->flags & RW_LOADED) return;

    // Recurse into the superclass first to guarantee the order
    schedule_class_load(cls->superclass);
    // Add the +load method of the current class to the list
    add_class_to_loadable_list(cls);
    // Mark the current class as scheduled for +load
    cls->setInfo(RW_LOADED); 
}
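The recursion in schedule_class_load can be sketched with a small stand-alone model. This is plain C, not runtime code: ModelClass, its has_load and scheduled fields, and the fixed-size loadable array are hypothetical stand-ins (scheduled plays the role of RW_LOADED, and has_load stands in for the check made inside add_class_to_loadable_list).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime class; not the real objc_class layout. */
typedef struct ModelClass {
    const char *name;
    struct ModelClass *superclass;
    bool has_load;   /* does this class itself implement +load? */
    bool scheduled;  /* plays the role of the RW_LOADED flag */
} ModelClass;

enum { MAX_LOADABLE = 16 };
static const ModelClass *loadable[MAX_LOADABLE];
static size_t loadable_used = 0;

/* Superclass first, then the class itself, each class at most once. */
static void schedule_class_load_model(ModelClass *cls)
{
    if (cls == NULL || cls->scheduled) return;
    schedule_class_load_model(cls->superclass);
    if (cls->has_load) {
        assert(loadable_used < MAX_LOADABLE);
        loadable[loadable_used++] = cls;
    }
    cls->scheduled = true;
}
```

In this model, scheduling a leaf class whose direct parent has no load method still places the root class ahead of the leaf, and scheduling the root again afterwards adds nothing, because the flag is already set.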

As you can see, schedule_class_load recurses up the inheritance chain, which ensures that the load method of a parent class is always scheduled before that of its child classes. add_class_to_loadable_list is the function that actually records the load method, as follows (irrelevant code removed):

void add_class_to_loadable_list(Class cls)
{
    IMP method;
    // Read the class's own +load implementation
    method = cls->getLoadMethod();
    if (!method) return; // The class does not implement +load, so there is nothing to record
    // Grow the list when it is full
    if (loadable_classes_used == loadable_classes_allocated) {
        loadable_classes_allocated = loadable_classes_allocated*2 + 16;
        loadable_classes = (struct loadable_class *)
            realloc(loadable_classes,
                              loadable_classes_allocated *
                              sizeof(struct loadable_class));
    }
    // Append a loadable_class entry that stores the class and its +load implementation
    loadable_classes[loadable_classes_used].cls = cls;
    loadable_classes[loadable_classes_used].method = method;
    // Advance the count of used entries
    loadable_classes_used++;
}

The loadable_class structure is defined as follows:

struct loadable_class {
    Class cls;  // may be nil
    IMP method;
};
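The growth rule used for the list (new capacity = old capacity * 2 + 16) can be tried out in isolation. The sketch below is a simplified C model under assumptions: ModelEntry and ModelList are hypothetical names, and unlike the excerpt it checks the result of realloc before using it.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { const char *name; } ModelEntry;  /* hypothetical payload */

typedef struct {
    ModelEntry *items;
    size_t used;        /* number of stored entries */
    size_t allocated;   /* current capacity */
} ModelList;

/* Append one entry, growing with the same "double plus 16" rule.
   Returns 0 on success, -1 if memory could not be obtained. */
static int model_list_append(ModelList *list, ModelEntry entry)
{
    assert(list != NULL);
    if (list->used == list->allocated) {
        size_t new_allocated = list->allocated * 2 + 16;
        ModelEntry *grown = realloc(list->items, new_allocated * sizeof *grown);
        if (grown == NULL) return -1;   /* the old buffer stays valid */
        list->items = grown;
        list->allocated = new_allocated;
    }
    list->items[list->used++] = entry;
    return 0;
}
```

Starting from an empty list, the capacity becomes 16 on the first append and 48 on the seventeenth, so the number of reallocations grows only logarithmically with the number of entries.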

The getLoadMethod function retrieves the implementation of the load method from the class, as follows:

IMP 
objc_class::getLoadMethod()
{
    // Get the base method list of the metaclass (the methods compiled into the class itself)
    const method_list_t *mlist;
    mlist = ISA()->data()->ro()->baseMethods();
    if (mlist) {
        // Scan for a method named "load" and return its implementation
        for (const auto& meth : *mlist) {
            const char *name = sel_cname(meth.name);
            if (0 == strcmp(name, "load")) {
                return meth.imp;
            }
        }
    }
    return nil;
}
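One consequence of getLoadMethod is that it scans only the class's own method list by name and never walks up to the superclass. A minimal C model of that scan is sketched below; ModelMethod, LoadModelClass, and sample_parent_load are hypothetical names introduced for illustration, not runtime types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*ModelIMP)(void);

typedef struct {
    const char *name;
    ModelIMP imp;
} ModelMethod;

typedef struct LoadModelClass {
    const struct LoadModelClass *superclass;  /* deliberately never consulted */
    const ModelMethod *base_methods;
    size_t base_method_count;
} LoadModelClass;

/* Sample implementation used only to have a function pointer to store. */
static void sample_parent_load(void) {}

/* Return the class's own "load" implementation, or NULL if it has none. */
static ModelIMP get_load_method_model(const LoadModelClass *cls)
{
    assert(cls != NULL);
    for (size_t i = 0; i < cls->base_method_count; i++) {
        if (strcmp(cls->base_methods[i].name, "load") == 0) {
            return cls->base_methods[i].imp;
        }
    }
    return NULL;
}
```

In this model a child class with an empty method list yields NULL even when its parent implements load, which matches the early return in add_class_to_loadable_list: a subclass without its own load method is simply not recorded.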

The preparation logic for class load methods is now clear. The load methods of all classes end up in a list named loadable_classes, with parent classes ahead of their child classes. Keep the name loadable_classes in mind; we will meet it again later.

Next, let's look at how the load methods of categories are prepared. It is very similar to the class case above. The add_category_to_loadable_list function is simplified as follows:

void add_category_to_loadable_list(Category cat)
{
    IMP method;
    // Get the +load implementation of the current category
    method = _category_getLoadMethod(cat);
    if (!method) return;
    // List creation and expansion logic
    if (loadable_categories_used == loadable_categories_allocated) {
        loadable_categories_allocated = loadable_categories_allocated*2 + 16;
        loadable_categories = (struct loadable_category *)
            realloc(loadable_categories,
                              loadable_categories_allocated *
                              sizeof(struct loadable_category));
    }
    // Store the category together with its +load implementation
    loadable_categories[loadable_categories_used].cat = cat;
    loadable_categories[loadable_categories_used].method = method;
    loadable_categories_used++;
}

As you can see, the load methods of categories end up in the loadable_categories list.

With the load methods collected, let's look at how they are executed. The core of the call_load_methods function is as follows:

void call_load_methods(void)
{
    bool more_categories;
    do {
        // First drain loadable_classes; loadable_classes_used is the number of entries in the list
        while (loadable_classes_used > 0) {
            call_class_loads();
        }

        // Then call the +load methods of the categories
        more_categories = call_category_loads();
       
    } while (loadable_classes_used > 0  ||  more_categories);
 
}

The simplified implementation of the call_class_loads function is as follows:

static void call_class_loads(void)
{
    int i;
    // Take a local reference to the loadable_classes list
    struct loadable_class *classes = loadable_classes;
    // Number of load methods to be executed
    int used = loadable_classes_used;
    // Detach the global list, so entries scheduled during the calls below go into a fresh list
    loadable_classes = nil;
    loadable_classes_allocated = 0;
    loadable_classes_used = 0;
    // Call the +load methods from front to back
    for (i = 0; i < used; i++) {
        // Get class
        Class cls = classes[i].cls;
        // Get the corresponding load method
        load_method_t load_method = (load_method_t)classes[i].method;
        if (!cls) continue; 
        // Execute load method
        (*load_method)(cls, @selector(load));
    }
}
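The reason call_class_loads detaches the list before iterating, and the reason call_load_methods wraps it in a while loop, is easier to see in a small model. The C sketch below is an illustration under assumptions: the pending array, the trace buffer, and the three sample load functions are all hypothetical, and load_first schedules another entry to imitate a load method that causes more load methods to become pending.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ModelLoad)(void);

enum { MAX_PENDING = 8 };
static ModelLoad pending[MAX_PENDING];   /* models loadable_classes */
static size_t pending_used = 0;          /* models loadable_classes_used */

static char trace[16];                   /* records the call order */
static size_t trace_len = 0;

static void note(char c)
{
    assert(trace_len + 1 < sizeof trace);
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

static void schedule(ModelLoad fn)
{
    assert(pending_used < MAX_PENDING);
    pending[pending_used++] = fn;
}

static void load_late(void)   { note('L'); }
static void load_first(void)  { note('A'); schedule(load_late); }
static void load_second(void) { note('B'); }

/* Copy the current batch, reset the shared list, then run the batch. */
static void call_class_loads_model(void)
{
    ModelLoad batch[MAX_PENDING];
    size_t used = pending_used;
    for (size_t i = 0; i < used; i++) batch[i] = pending[i];
    pending_used = 0;   /* detach: new arrivals start a new batch */
    for (size_t i = 0; i < used; i++) batch[i]();
}

/* Keep draining until no batch is left, like the inner while loop. */
static void call_all_class_loads_model(void)
{
    while (pending_used > 0) call_class_loads_model();
}
```

If load_first and load_second are scheduled in that order, the first batch runs both of them, and the entry added by load_first runs in a second batch afterwards, so nothing that is scheduled mid-iteration is lost or run out of turn.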

The call_category_loads function is more complex. Simplified, it looks like this:

static bool call_category_loads(void)
{
    int i, shift;
    bool new_categories_added = NO;
    
    // Take the current loadable_categories list and detach the global one
    struct loadable_category *cats = loadable_categories;
    int used = loadable_categories_used;
    int allocated = loadable_categories_allocated;
    loadable_categories = nil;
    loadable_categories_allocated = 0;
    loadable_categories_used = 0;

    // Call the +load methods from front to back
    for (i = 0; i < used; i++) {
        Category cat = cats[i].cat;
        load_method_t load_method = (load_method_t)cats[i].method;
        Class cls;
        if (!cat) continue;
        cls = _category_getClass(cat);
        if (cls  &&  cls->isLoadable()) {
            (*load_method)(cls, @selector(load));
            cats[i].cat = nil;
        }
    }
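    // The real function also compacts the list and reports whether new categories
    // became loadable; that logic is omitted here, so this version always returns NO
    return new_categories_added;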
}

By now you should understand why the load methods of classes are called before those of categories, and why a parent class's load method is called before its child classes'. One point is still unclear, though: what determines the order among unrelated classes or among categories? From the source code, the class list comes from the _getObjc2NonlazyClassList function and the category list comes from the _getObjc2NonlazyCategoryList function. The order of the classes or categories returned by these two functions follows the compilation order of the source files, as shown in the following figure:

You can see that the execution order of the printed load method is consistent with the compilation order of the source code.
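Putting the two phases together, the overall ordering rule can be written as a short C model. It is only a sketch under assumptions: OrderClass, the index-based superclass link, and the calls array are hypothetical, every listed class and category is assumed to implement load, and the input arrays stand for the compile-order lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_CALLS = 16 };

typedef struct {
    const char *name;
    int superclass;      /* index into the class table, or -1 for a root class */
    bool scheduled;
} OrderClass;

static const char *calls[MAX_CALLS];   /* labels in the order they would be called */
static size_t call_count = 0;

static void record(const char *label)
{
    assert(call_count < MAX_CALLS);
    calls[call_count++] = label;
}

/* Record the superclass chain first, then the class, each only once. */
static void schedule(OrderClass *classes, int index)
{
    if (index < 0 || classes[index].scheduled) return;
    schedule(classes, classes[index].superclass);
    record(classes[index].name);
    classes[index].scheduled = true;
}

/* Phase 1: all classes (superclasses first). Phase 2: all categories, in list order. */
static void run_load_order_model(OrderClass *classes, size_t class_count,
                                 const char *const *categories, size_t category_count)
{
    for (size_t i = 0; i < class_count; i++) schedule(classes, (int)i);
    for (size_t i = 0; i < category_count; i++) record(categories[i]);
}
```

Even if MySubObjectOne is listed before MyObjectOne in the class table, the model records MyObjectOne first; the categories follow all classes and keep their list order regardless of the class hierarchy.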

4, initialize method analysis

We can test the initialize method with the same strategy we used for load. First, add an implementation of initialize to all classes in the test project. If you run the project now, nothing is printed to the console, because initialize is only executed when a class receives its first message. Write the following test code in the main function:

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        [MySubObjectOne new];
        [MyObjectOne new];
        [MyObjectTwo new];
        NSLog(@"------------");
        [MySubObjectOne new];
        [MyObjectOne new];
        [MyObjectTwo new];
    }
    return 0;
}

Run the code, and the console prints the following:

2021-02-18 21:29:55.761897+0800 KCObjc[43834:23521232] initialize-cateOne:MyObjectOne
2021-02-18 21:29:55.762526+0800 KCObjc[43834:23521232] initialize:MySubObjectOne
2021-02-18 21:29:55.762622+0800 KCObjc[43834:23521232] initialize-cate:MyObjectTwo
2021-02-18 21:29:55.762665+0800 KCObjc[43834:23521232] ------------

All the output appears before the separator line, which means that once the initialize method of a class has been called, sending further messages to that class does not call it again. Note also that when a message is sent to a child class, the initialize method of the parent class is called first, and then that of the child class. In addition, if a category implements initialize, it overrides the class's own implementation, and when several categories implement it, the one loaded later takes precedence over the earlier ones. Let's analyze these characteristics of the initialize method through the source code.

First, consider how the runtime finds the implementation of a class method. One entry point is the class_getClassMethod function, implemented in the source code as follows:

Method class_getClassMethod(Class cls, SEL sel)
{
    if (!cls  ||  !sel) return nil;
    return class_getInstanceMethod(cls->getMeta(), sel);
}

The source code shows that looking up a class method of a class is really looking up an instance method of its metaclass. The getMeta function returns the metaclass of the class; we won't go into how classes and metaclasses are organized here. What we need to focus on is the class_getInstanceMethod function, which is also very simple:

Method class_getInstanceMethod(Class cls, SEL sel)
{
    if (!cls  ||  !sel) return nil;

    // Search the method lists and give the method resolver a chance to run
    lookUpImpOrForward(nil, sel, cls, LOOKUP_RESOLVER);

    // Get method from class object
    return _class_getMethod(cls, sel);
}

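In class_getInstanceMethod, _class_getMethod is the function that finally returns the method. Before that, lookUpImpOrForward does some preliminary work, and it is also where the call to initialize is triggered when its behavior argument includes LOOKUP_INITIALIZE. Note that class_getInstanceMethod passes only LOOKUP_RESOLVER, so this particular entry point does not itself trigger initialize; an ordinary message send reaches the same function through the slow path of objc_msgSend, which does request initialization. With the irrelevant logic removed, the core of lookUpImpOrForward is as follows: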

IMP lookUpImpOrForward(id inst, SEL sel, Class cls, int behavior)
{
    IMP imp = nil;
    // Key check: if LOOKUP_INITIALIZE is requested and the class is not yet initialized (!cls->isInitialized()), call initializeAndLeaveLocked
    if (slowpath((behavior & LOOKUP_INITIALIZE) && !cls->isInitialized())) {
        cls = initializeAndLeaveLocked(cls, inst, runtimeLock);
    }

    return imp;
}

initializeAndLeaveLocked simply calls the initializeAndMaybeRelock function, as follows:

static Class initializeAndLeaveLocked(Class cls, id obj, mutex_t& lock)
{
    return initializeAndMaybeRelock(cls, obj, lock, true);
}

The initializeAndMaybeRelock function performs the class initialization logic in a thread-safe way. The core code is as follows:

static Class initializeAndMaybeRelock(Class cls, id inst,
                                      mutex_t& lock, bool leaveLocked)
{
    // If it has been initialized, return directly
    if (cls->isInitialized()) {
        return cls;
    }
    // Find the non-metaclass that corresponds to cls
    Class nonmeta = getMaybeUnrealizedNonMetaClass(cls, inst);
    // Perform initialization
    initializeNonMetaClass(nonmeta);

    return cls;
}

The initializeNonMetaClass function will recursively query up the inheritance chain to find all uninitialized parent classes for initialization. The core implementation is simplified as follows:

void initializeNonMetaClass(Class cls)
{
    Class supercls;
    // Flag: whether this call should actually run +initialize
    bool reallyInitialize = NO;
    // If the superclass exists and has not been initialized, initialize it first (recursively)
    supercls = cls->superclass;
    if (supercls  &&  !supercls->isInitialized()) {
        initializeNonMetaClass(supercls);
    }
    
    SmallVector<_objc_willInitializeClassCallback, 1> localWillInitializeFuncs;
    {
        // If the class is neither initialized nor currently being initialized
        if (!cls->isInitialized() && !cls->isInitializing()) {
            // Mark the class as initializing
            cls->setInitializing();
            // Remember that +initialize must be called
            reallyInitialize = YES;
        }
    }
    // Call +initialize only if this call claimed the class above
    if (reallyInitialize) {
        @try
        {
            // Call initialization function
            callInitialize(cls);
        }
        @catch (...) {
            @throw;
        }
        return;
    }
}

The callInitialize function ultimately uses objc_msgSend to send the initialize message to the class, as follows:

void callInitialize(Class cls)
{
    ((void(*)(Class, SEL))objc_msgSend)(cls, @selector(initialize));
    asm("");
}

Note that the biggest difference between initialize and load is that initialize is ultimately invoked through objc_msgSend. Every class that has not yet been initialized is sent an initialize message this way. Therefore, if a subclass does not implement initialize, the objc_msgSend message mechanism finds the parent class's implementation along the inheritance chain and calls it. As a result, an initialize implementation is not guaranteed to run only once: if a parent class implements it and has several subclasses that do not, the parent's implementation is called again each time one of those subclasses receives its first message. This is very important and must be kept in mind in practical development.
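The repeated call is easy to reproduce in a model of message dispatch. The C sketch below is an illustration under assumptions: InitClass, find_initialize, and ensure_initialized are hypothetical names, the three classes are made up, and the counter parent_guarded_runs imitates the common defensive pattern of comparing the receiver with the implementing class before doing one-time work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct InitClass InitClass;
typedef void (*InitIMP)(InitClass *receiver);

struct InitClass {
    const char *name;
    InitClass *superclass;
    InitIMP initialize;     /* NULL when the class does not implement it */
    bool initialized;
};

static int parent_body_runs = 0;     /* times the parent's implementation ran */
static int parent_guarded_runs = 0;  /* times it ran with the parent as receiver */

static void parent_initialize(InitClass *receiver);

static InitClass parent_class = {"Parent", NULL, parent_initialize, false};
static InitClass child_a = {"ChildA", &parent_class, NULL, false};
static InitClass child_b = {"ChildB", &parent_class, NULL, false};

static void parent_initialize(InitClass *receiver)
{
    assert(receiver != NULL);
    parent_body_runs++;
    if (receiver == &parent_class) parent_guarded_runs++;  /* guarded one-time work */
}

/* Models message dispatch: walk up the chain to the nearest implementation. */
static InitIMP find_initialize(const InitClass *cls)
{
    for (; cls != NULL; cls = cls->superclass) {
        if (cls->initialize != NULL) return cls->initialize;
    }
    return NULL;
}

/* Superclass first; each class is sent "initialize" exactly once. */
static void ensure_initialized(InitClass *cls)
{
    if (cls == NULL || cls->initialized) return;
    ensure_initialized(cls->superclass);
    cls->initialized = true;
    InitIMP imp = find_initialize(cls);
    if (imp != NULL) imp(cls);   /* receiver is cls, not the implementing class */
}
```

In this model, initializing child_a and child_b runs the parent's implementation three times in total (once for the parent and once for each child), while the guarded branch runs only once, which is why one-time setup inside an inherited initialize is usually protected by such a receiver check.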

5, Conclusion

load and initialize are two simple and commonly used methods in iOS development, but compared with ordinary methods they have some special behavior. Reading the source code gives us a deeper understanding of the reasons and principles behind that behavior. Programming is a discipline that takes practice: we should know not only what happens but also why. Let's encourage each other.

Added by Cugel on Wed, 05 Jan 2022 16:44:07 +0200