Latency analysis of the Linux write function

I. Background

When an embedded device writes to an SD card, the write call occasionally stalls. Kernel version: linux-3.4.y.

II. Linux kernel I/O flow

1. The application calls write, traps into the kernel, and executes vfs_write, which copies the data into the page cache (each cached page contains several buffers). Before the copy, two checks are made: (1) whether the page is under writeback; if so, the process is suspended and is woken once the writeback flag is cleared; (2) whether the page's buffer is locked; if so, the process is suspended until the buffer is unlocked.
2. A resident kernel thread creates a writeback thread for each bdi. That thread periodically checks whether writeback is needed and, if so, submits a bio so that the driver writes the data to the SD card.
3. When the bio completes, its callback runs and clears the page's writeback flag.
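
A stall in step 1 shows up in user space as a single write call that takes far longer than its neighbours. The following is a minimal sketch for observing that, not the measurement method used in this article: timed_write and elapsed_ms are hypothetical helper names, and the sketch assumes a POSIX system with clock_gettime and CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC samples. */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0 +
           (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/*
 * Hypothetical helper: perform one write(2) and report how long the
 * call blocked. Returns whatever write(2) returned; *ms receives the
 * duration of the call in milliseconds.
 */
ssize_t timed_write(int fd, const void *buf, size_t len, double *ms)
{
    struct timespec t0, t1;
    ssize_t n;

    assert(ms != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = write(fd, buf, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *ms = elapsed_ms(t0, t1);
    return n;
}
```

Calling timed_write in a loop on a file on the SD card and logging every call above some threshold (for example 100 ms, an arbitrary choice) gives a picture of how often and for how long write blocks.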

III. Analysis of the relevant functions (the main functions are recorded to make the source easier to trace)

1. Writing to the page cache
(1) Important structures:
const struct file_operations fat_file_operations = {
    .llseek        = generic_file_llseek,
    .read        = do_sync_read,
    .write        = do_sync_write,
    .aio_read    = generic_file_aio_read,
    .aio_write    = generic_file_aio_write,
    .mmap        = generic_file_mmap,
    .release    = fat_file_release,
    .unlocked_ioctl    = fat_generic_ioctl,
#ifdef CONFIG_COMPAT
    .compat_ioctl    = fat_generic_compat_ioctl,
#endif
    .fsync        = fat_file_fsync,
    .splice_read    = generic_file_splice_read,
};

struct address_space_operations {
    int (*writepage)(struct page *page, struct writeback_control *wbc);
    int (*readpage)(struct file *, struct page *);
    int (*sync_page)(struct page *);
    int (*writepages)(struct address_space *, struct writeback_control *);
    int (*set_page_dirty)(struct page *page);
    int (*readpages)(struct file *filp, struct address_space *mapping,
                    struct list_head *pages, unsigned nr_pages);
    int (*write_begin)(struct file *, struct address_space *mapping,
                            loff_t pos, unsigned len, unsigned flags,
                            struct page **pagep, void **fsdata);
    int (*write_end)(struct file *, struct address_space *mapping,
                            loff_t pos, unsigned len, unsigned copied,
                            struct page *page, void *fsdata);
    sector_t (*bmap)(struct address_space *, sector_t);
    int (*invalidatepage) (struct page *, unsigned long);
    int (*releasepage) (struct page *, int);
    void (*freepage)(struct page *);
    ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
                    loff_t offset, unsigned long nr_segs);
    struct page* (*get_xip_page)(struct address_space *, sector_t,
                    int);
    /* migrate the contents of a page to the specified target */
    int (*migratepage) (struct page *, struct page *);
    int (*launder_page) (struct page *);
    int (*error_remove_page) (struct address_space *mapping, struct page *page);
    int (*swap_activate)(struct file *);
    int (*swap_deactivate)(struct file *);
};
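
Both structures above are tables of function pointers: the VFS does not know which filesystem it is talking to and simply calls through whatever the filesystem registered. The sketch below shows that dispatch pattern in user-space C. Every name beginning with toy_ is hypothetical and exists only for illustration; the real structures carry many more hooks and different signatures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_file;

/* Toy counterpart of struct file_operations with a single hook. */
struct toy_file_operations {
    long (*write)(struct toy_file *file, const char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_operations *f_op; /* filled in by the filesystem */
    size_t cached_bytes;                    /* stands in for the page cache */
};

/* Toy "generic" write: only accounts for the bytes it would cache. */
long toy_generic_write(struct toy_file *file, const char *buf, size_t len)
{
    (void)buf;
    file->cached_bytes += len;
    return (long)len;
}

/* Toy vfs_write: knows nothing about the filesystem, only the table. */
long toy_vfs_write(struct toy_file *file, const char *buf, size_t len)
{
    assert(file != NULL);
    if (file->f_op == NULL || file->f_op->write == NULL)
        return -1; /* no write hook registered */
    return file->f_op->write(file, buf, len);
}
```

A filesystem in this toy model would set f_op to a table containing toy_generic_write, much as fat_file_operations points .aio_write at generic_file_aio_write.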


(2) Function call flow:
vfs_write-->
    do_sync_write-->
        f_op->aio_write(generic_file_aio_write)-->(mm/filemap.c)
             __generic_file_aio_write-->
                 generic_file_buffered_write-->
                     generic_perform_write-->(important function)
                         a_ops->write_begin(block_write_begin)-->(fs/buffer.c) the time-consuming calls are listed below
                             grab_cache_page_write_begin-->
                                 wait_on_page_writeback-->(If the page is under writeback when the application reaches this point, the current process is suspended until the writeback finishes.)(main source of latency)
                             __block_write_begin
                                 wait_on_buffer-->(Buffers are allocated for the page. If the requested buffer is locked, the process is suspended until the buffer is unlocked.)(secondary source of latency)


static inline void wait_on_buffer(struct buffer_head *bh)
{
    might_sleep();
    if (buffer_locked(bh))
        __wait_on_buffer(bh);
}
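
The order of the two waits in the call flow above can be restated as a small user-space model. This is only a sketch under simplifying assumptions: struct toy_page, enum stall_reason and first_stall are hypothetical names, one flag stands in for all of a page's buffers, and nothing here actually sleeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the two page states that can stall a writer. */
struct toy_page {
    bool under_writeback; /* models the page's writeback flag */
    bool buffer_locked;   /* models buffer_locked(bh) on one of its buffers */
};

enum stall_reason {
    STALL_NONE,       /* the write can proceed immediately */
    STALL_WRITEBACK,  /* would sleep in wait_on_page_writeback */
    STALL_BUFFER_LOCK /* would sleep in wait_on_buffer */
};

/* Return the first reason the writer would sleep, in call-flow order. */
enum stall_reason first_stall(const struct toy_page *page)
{
    assert(page != NULL);
    if (page->under_writeback)
        return STALL_WRITEBACK;   /* checked first, in grab_cache_page_write_begin */
    if (page->buffer_locked)
        return STALL_BUFFER_LOCK; /* checked later, in __block_write_begin */
    return STALL_NONE;
}
```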

(3) References:
https://www.cnblogs.com/children/p/3420430.html
https://www.jianshu.com/p/d33ec2707e7f
http://blog.chinaunix.net/uid-14528823-id-4289180.html
https://my.oschina.net/u/2475751/blog/535859
https://blog.csdn.net/wh8_2011/article/details/51787282
https://www.cnblogs.com/honpey/p/4931962.html
https://blog.csdn.net/ctoday/article/details/37966233

2. Kernel writeback threads
(1) Function analysis
From Linux 3.2 onward there is a resident kernel thread, bdi_forker_thread, which creates a bdi_writeback_thread for each bdi object. It also watches these threads and destroys any bdi_writeback_thread that has been idle for a long time.
The bdi_writeback_thread function (fs/fs-writeback.c) runs a while loop: it checks whether writeback is needed, then calls the scheduler and sleeps until it is woken. The kernel wakes the thread at a fixed interval, which can be read from /proc/sys/vm/dirty_writeback_centisecs.
bdi_writeback_thread calls wb_do_writeback to perform the writeback.
wb_do_writeback processes the writeback work queued on bdi->work_list. It also checks on two grounds whether page cache needs to be written back: whether some pages have been dirty for too long, and whether the proportion of dirty pages has reached the configured limit. The corresponding files are /proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_background_ratio.
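
The two triggers just described (age of dirty data and share of dirty pages) can be sketched as a pair of predicates. This is a simplified model, not the kernel's code: struct dirty_state and all three function names are hypothetical, and the kernel's real accounting (per-inode dirty times, how the dirtyable page count is derived) is reduced to three numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the writeback thread looks at. */
struct dirty_state {
    unsigned long oldest_dirty_age_cs; /* age of the oldest dirty data, centisecs */
    unsigned long dirty_pages;         /* pages currently dirty */
    unsigned long dirtyable_pages;     /* pages that could be dirtied */
};

/* Time trigger: something has been dirty for dirty_expire_centisecs or more. */
bool expired_trigger(const struct dirty_state *s, unsigned long expire_cs)
{
    return s->dirty_pages > 0 && s->oldest_dirty_age_cs >= expire_cs;
}

/* Ratio trigger: the dirty share exceeds dirty_background_ratio (percent). */
bool background_trigger(const struct dirty_state *s, unsigned int ratio_pct)
{
    assert(ratio_pct <= 100);
    /* Compare dirty/dirtyable with ratio/100 without floating point. */
    return s->dirty_pages * 100UL > s->dirtyable_pages * (unsigned long)ratio_pct;
}

/* Writeback is started when either trigger fires. */
bool needs_writeback(const struct dirty_state *s,
                     unsigned long expire_cs, unsigned int ratio_pct)
{
    return expired_trigger(s, expire_cs) || background_trigger(s, ratio_pct);
}
```

With 50 dirty pages out of 1000 and a ratio of 10, only the time trigger can fire; raising the dirty count to 150 makes the ratio trigger fire regardless of age.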
wb_do_writeback-->
    wb_writeback-->
        writeback_sb_inodes-->
            writeback_single_inode-->
                do_writepages-->(mm/page-writeback.c)
                    mapping->a_ops->writepages-->
For FAT32, the registered mapping->a_ops->writepages is fat_writepages (fs/fat/inode.c). fat_writepages calls mpage_writepages (fs/mpage.c), and mpage_writepages calls __mpage_writepage.
[copy]--------------------------------------
__mpage_writepage is the core interface for writing file pages. The code flow is as follows. If the page has buffer_heads, the disk mapping is completed from them; the code only supports pages in which all buffers are dirty, unless the non-dirty buffers sit at the end of the page, so the dirty part of the page must be contiguous. If the page has no buffer_head, the whole page is treated as dirty inside this interface. If all blocks are contiguous, the page goes straight into the bio request path; otherwise it falls back to the writepage mapping path.

//page_has_buffers determines whether the current page has buffer_heads (bh). If it does, page_buffers returns the page's first buffer_head, and bh->b_this_page is followed to walk every bh of the page, with buffer_locked(bh) checked on each one (the code expects no bh to be locked here). first_unmapped records the first bh that has no mapping. A dirty bh without a mapping, or a mapped bh that follows an unmapped one, sends the page to the confused path; besides being mapped, every bh must also be dirty and uptodate. After the walk, if first_unmapped is non-zero (the page has at least one mapped block), control goes to page_is_mapped, since the page is already mapped; otherwise it goes to confused.

//If the current page has no buffer_head (bh), the page has to be mapped to disk here. The mapping is held in the buffer_head variable map_bh, which serves as the bridge between buffer_head and bio.

In the page_is_mapped path, if a bio already exists and the disk block numbers of the current page and the previous page are found to be discontiguous (the code checks bio && mpd->last_block_in_bio != blocks[0] - 1, where blocks[0] is the page's first disk block), mpage_bio_submit is called to submit the accumulated bio, which writes the preceding contiguous blocks to the device. Execution then continues into the alloc_new path.

In the alloc_new path, if bio is NULL (meaning a bio has just been submitted), mpage_alloc is used to allocate a new bio, and bio_add_page then adds the current page to it. If the bio does not have room for the whole page, the bio is submitted with the data it already holds via mpage_bio_submit, and control returns to alloc_new to allocate a bio for the remaining data. If the whole page fits into the bio in one go and the page has buffers, the dirty flag is cleared on all of them. set_page_writeback puts the page into the writeback state and unlock_page unlocks it. If the bh has its boundary flag set, or the current page is not contiguous with the previous one, the accumulated bio of contiguous blocks is submitted first. Otherwise all blocks of the current page are contiguous, and contiguous with the blocks of the preceding pages; no bio needs to be submitted, only mpd->last_block_in_bio is updated to the last block number of the current page. The function then returns and the next page is checked for contiguity, until a discontinuity triggers the bio submission.

In the confused path the pending bio is still submitted, but a mapping error is set.
[end]----------------------------------------
//In short, __mpage_writepage calls mpage_bio_submit to submit the bio, which makes the driver write the dirty pages to the SD card; the page is protected by its writeback flag while this is in progress. When the bio completes, the callback registered as bio->bi_end_io = mpage_end_io clears the page's writeback flag.
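
The contiguity rule described in the page_is_mapped and alloc_new paths decides how many bios a run of dirty pages turns into. The sketch below models only that rule. It is a simplification under stated assumptions: struct page_blocks and count_bios are hypothetical names, blocks inside one page are assumed contiguous, and the bio size limit and the boundary flag are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Disk blocks backing one page; assumed contiguous within the page. */
struct page_blocks {
    unsigned long first; /* plays the role of blocks[0] */
    unsigned long last;  /* last disk block of the page */
};

/*
 * Count how many bios a sequence of pages would produce when a bio is
 * submitted only when disk contiguity breaks between two pages.
 */
size_t count_bios(const struct page_blocks *pages, size_t npages)
{
    bool bio_open = false;
    unsigned long last_block_in_bio = 0;
    size_t submitted = 0;

    assert(pages != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++) {
        /* Same test as: bio && last_block_in_bio != blocks[0] - 1 */
        if (bio_open && last_block_in_bio != pages[i].first - 1) {
            submitted++;       /* models mpage_bio_submit */
            bio_open = false;
        }
        bio_open = true;       /* models mpage_alloc plus bio_add_page */
        last_block_in_bio = pages[i].last;
    }
    if (bio_open)
        submitted++;           /* the last accumulated bio is submitted too */
    return submitted;
}
```

Three pages laid out back to back on disk collapse into one bio, while a single gap between the first and second page splits them into two. Fewer, larger bios mean fewer round trips to the SD card, but each page stays under writeback until its bio completes.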

(2) References:
https://blog.csdn.net/asmxpl/article/details/21548129
http://blog.sina.com.cn/s/blog_6f5549150102vaoz.html
http://blog.chinaunix.net/uid-7494944-id-3833328.html
https://blog.csdn.net/zhufengtianya/article/details/42145985

IV. Supplement
1. Locking and unlocking of buffer heads has not been analyzed yet.

2. When write copies data into the page cache, it checks the page's writeback flag. If the page is under writeback, the process is suspended and waits to be woken, so write blocks. When the bio finishes, its callback clears the page's writeback flag and the application is woken.
As noted above, the kernel performs a writeback check at a fixed interval (/proc/sys/vm/dirty_writeback_centisecs), and writeback normally starts once the proportion of dirty pages reaches /proc/sys/vm/dirty_background_ratio. Lowering both parameters together reduced the peak bio time:


dirty_writeback_centisecs (as seconds)   dirty_background_ratio (%)   bio_max_time (ms)
5                                        10                           6000
2                                        5                            4800
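
To repeat the experiment, the two tunables can be read and changed through procfs. The helpers below are a minimal sketch, not part of the original analysis: read_vm_tunable, write_vm_tunable and centisecs_to_seconds are hypothetical names, writing under /proc/sys/vm requires root, and the files hold centiseconds, so the 5 s row of the table corresponds to the value 500 in dirty_writeback_centisecs.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one integer tunable, e.g. /proc/sys/vm/dirty_background_ratio.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_vm_tunable(const char *path, long *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Write one integer tunable; needs root for files under /proc/sys/vm. */
int write_vm_tunable(const char *path, long value)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = (fprintf(f, "%ld\n", value) > 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* The *_centisecs files are in hundredths of a second. */
double centisecs_to_seconds(long centisecs)
{
    return (double)centisecs / 100.0;
}
```

For the second row of the table, this would mean calling write_vm_tunable with 200 for /proc/sys/vm/dirty_writeback_centisecs and 5 for /proc/sys/vm/dirty_background_ratio.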

Keywords: C Linux

Added by TreColl on Sun, 06 Oct 2019 03:24:27 +0300