linux virtual file system source code analysis (detailed explanation)


Virtual file system is a huge architecture. If you want to analyze everything, it will be particularly complex and clumsy. If you look at it, you won't know what to say (of course, it's mainly the author's food). Therefore, this blog tries to analyze the operation mechanism of VFS file system with the open() function as the starting point. The code of this paper comes from Linux3 four point two

Basic knowledge

First, let's look at a picture

From this figure, we can see that the system call function does not directly operate the real file system, but through an intermediate layer, that is, the virtual file system. Why is there a virtual file system?

There are three common file systems in Linux: disk based file system; Memory based file system; Network file system, (these three types of file systems coexist in the file system layer and provide storage services for different types of data. The formats of these three types of file systems are different, that is to say, if you directly read the real file system without using the virtual file system, you have to write several corresponding read functions for different types of file systems), Therefore, the appearance of virtual file (VFS) is to operate any file in Linux by using the same set of file I/O system calls, without considering the specific file system format

Data structure of VFS

VFS relies on four main data structures and some auxiliary data structures to describe its structure information. These data structures behave like objects; Each main object contains operation objects composed of operation function tables, which describe the operations that the kernel can perform on these main objects.

1. Superblock object

Storing the control information of an installed file system, representing an installed file system; Every time an actual file system is installed, the kernel will read some control information from a specific location on the disk to fill the superblock object in memory. An installation instance corresponds to a superblock object one by one. The super block passes through a domain s in its structure_ Type records the file system type to which it belongs.

struct super_block { //Super block data structure
        struct list_head s_list;                /*Pointer to the super block linked list*/
        struct file_system_type  *s_type;       /*file system type*/
       struct super_operations  *s_op;         /*Super block method*/
        struct list_head         s_instances;   /*This type of file system*/

struct super_operations { //Super block method
        //This function creates and initializes a new inode object under a given superblock
        struct inode *(*alloc_inode)(struct super_block *sb);
        //This function reads the inode from the disk and dynamically fills the rest of the corresponding inode object in memory
        void (*read_inode) (struct inode *);

2. Inode object

The inode object stores information about the file and represents an actual physical file on the storage device. When a file is accessed for the first time, the kernel will assemble the corresponding inode object in memory to provide the kernel with all the information necessary to operate a file; Some of this information is stored in a specific location on the disk, and the other part is dynamically populated during loading.

struct inode {//Index node structure
      struct inode_operations  *i_op;     /*Inode operation table*/
     struct file_operations   *i_fop;  /*The file operation set of the file corresponding to the inode*/
     struct super_block       *i_sb;     /*Related superblocks*/

struct inode_operations { //Index node method
     //This function creates a new index node for the file corresponding to the dentry object, which is mainly called by the open() system call
     int (*create) (struct inode *,struct dentry *,int, struct nameidata *);

     //Find the index node corresponding to the dentry object in a specific directory
     struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *);

3. Catalog item object

The introduction of the concept of directory entry is mainly for the purpose of facilitating the search of files. Each component of a path, whether it is a directory or an ordinary file, is a directory item object. For example, in the path / home / source / test C, directory /, home, source and file test C corresponds to a directory item object. Different from the previous two objects, the directory item objects have no corresponding disk data structure. VFS parses them into directory item objects one by one in the process of traversing the path names.

struct dentry {//Directory item structure
     struct inode *d_inode;           /*Related inodes*/
    struct dentry *d_parent;         /*Directory entry object of parent directory*/
    struct qstr d_name;              /*The name of the directory entry*/
     struct list_head d_subdirs;      /*Subdirectory*/
     struct dentry_operations *d_op;  /*Table of contents item operations*/
    struct super_block *d_sb;        /*File superblock*/

struct dentry_operations {
    //Judge whether the directory item is valid;
    int (*d_revalidate)(struct dentry *, struct nameidata *);
    //Generate hash values for directory items;
    int (*d_hash) (struct dentry *, struct qstr *);

*3. File object

File object is the representation of open files in memory. It is mainly used to establish the corresponding relationship between processes and files on disk. It consists of sys_open() field created by = = sys_close() = = destroy. The relationship between file objects and physical files is a bit like the relationship between processes and programs. When we look at VFS in user space, it seems that we only need to deal with file objects without caring about superblocks, inodes or directory entries. Because multiple processes can open and operate the same file at the same time, there may be multiple corresponding file objects in the same file. The file object simply represents the open file from the process point of view, which in turn points to the directory entry object (in turn to the index node). The file object corresponding to a file may not be unique, but its corresponding inode and directory item objects are undoubtedly unique.

struct file {
     struct list_head        f_list;        /*File object linked list*/
    struct dentry          *f_dentry;       /*Related catalog item objects*/
    struct vfsmount        *f_vfsmnt;       /*Related installation file system*/
    struct file_operations *f_op;           /*File operation table*/

struct file_operations {
    //File read operation
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    //File write operation
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    int (*readdir) (struct file *, void *, filldir_t);
    //File open operation
    int (*open) (struct inode *, struct file *);

Main article

After introducing the basic knowledge points, we begin to explore how to find the corresponding data of the file stored on the hardware inside linux when we try to open a file through open()

First, let's take a look at the above two pictures, files_struct is mainly an array of file pointers. We usually say that the file descriptor is an integer, and this integer can be used as a subscript, so as to get from files_ Get the file structure from struct. task_struct is a process descriptor, which represents the action of opening a file. Here I want to express the knowledge point: when the file is opened for the first time (successfully opened), a connection as shown in the above figure will be established, and the returned fd file descriptor is connected with the underlying storage structure. fd is used as the file descriptor and the file is used as the carrier of data, We can understand them as the relationship between the password and the safe. The first time we open the file is to set the password during initialization (establish the connection between the password and the safe). When we need to take things in the safe in the future, we can operate the safe process only through the password set for the first time.

In the kernel, there is a file descriptor table corresponding to each process, which represents all the files opened by this process. Each item in the file description table is a pointer to a data block used to describe the open file - file object. The file object describes the opening mode, reading and writing location and other important information of the file. When the process opens a file, the kernel will create a new file object. It should be noted that the file object is not exclusive to a process. The pointers in the file descriptor table of different processes can point to the same file object to share the open file. The file object has a reference count, which records the number of file descriptors referencing the object. The kernel destroys the file object only when the reference count is 0. Therefore, a process closes the file and does not affect the process sharing the same file object with it

Let's analyze the specific code.

Application layer:

Before operating any file, the application must first call open() to open the file, that is, inform the kernel to create a new structure representing the file, and return the descriptor (an integer) of the file, which is unique in the process. The function used is open()

int open(const char * pathname,int oflag, mode_t mode )
    /*pathname:Represents the file name of the file to be opened;

       oflag: Identification indicating open (read only open / write only open / read write open.........)
      mode: When creating a new file, you need to specify the mode parameter (set permissions)

Kernel layer:

* when the open() system call enters the kernel, the final function called is syscall_ Define3 (open, const char _user, filename, int, flags, int, mode) (as for how open calls sys... E3, I suggest not to explore it. The author has turned over the source code for a long time and didn't understand it very well, but if you look at the early version (0.11 Linux), it's easier to see that it is through an embedded assembly code to jump directly to sys_open()...), which is located in FS / open C, the specific implementation process will be analyzed below.

First of all, I will give a complete function call relationship as follows. I will explain the more important functions


                            error = dir->i_op->atomic_open()
                            f->f_op = fops_get(inode->i_fop);
                            open = f->f_op->open;
                            error = open(inode, f);
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, int, mode)
	long ret;
	//Judge whether the system supports large files, that is, judge the number of bits of long. If 64, it means that large files are supported;	
	if (force_o_largefile())
		flags |= O_LARGEFILE;
	//Complete the main work, AT_FDCWD means to search from the current directory
	ret = do_sys_open(AT_FDCWD, filename, flags, mode);
	/* avoid REGPARM breakage on x86: */
	asmlinkage_protect(3, ret, filename, flags, mode);
	return ret;

This function mainly calls do_sys_open() to complete the opening work, do_ sys_ The code analysis of open() is as follows.

long do_sys_open(int dfd, const char__user *filename, int flags, int mode)
	//Copy the file name to be opened to the kernel. The analysis of this function is shown below;
	char *tmp = getname(filename);
	int fd = PTR_ERR(tmp);

	if (!IS_ERR(tmp)) {
		//Find a free file table pointer from the file table of the process. If there is an error, it will be returned. See the description below;
		fd = fd = get_unused_fd();
		if (fd >= 0) {
			//Perform the opening operation, see the description below, dfd=AT_FDCWD;
			struct file *f = do_filp_open(dfd, tmp, flags, mode, 0);
			if (IS_ERR(f)) {
				fd = PTR_ERR(f);
			} else {
				fsnotify_open(f);//The function is to open the monitoring point of filp and add it to the monitoring system
				//Add the open file table f to the file table array of the current process, as described below;
				fd_install(fd, f);
	return fd;

From the analysis of code and flow chart, we know that, How do fd and file connect (the file object contains a pointer to the dentry object. The dentry object represents an independent file path. If a file path is opened multiple times, multiple file objects will be established, but they all point to the same dentry object. The dentry object also contains a pointer to the inode object. The inode object represents an independent file. Because there are hard links and symbols Link, so different dentry objects can point to the same inode object Inode object contains all the information needed for the final operation of the file, such as file system type, file operation method, file permission, access date, etc.)

Let's think in reverse. Now that we have obtained fd, how to find the corresponding file? In the current process, we keep the file descriptor, the file descriptor (files_structures), and the file descriptor (fatable). Through the item of fd corresponding to the pointer array of file type in the file descriptor table, we can find file

The node object contains all the information required for the final operation of the file, such as file system type, file operation method, file permission, access date, etc. * * = =)

Let's think in reverse. Now that we have obtained fd, how to find the corresponding file? In the current process, we keep the file descriptor, the file descriptor (files_structures), and the file descriptor (fatable). Through the item of fd corresponding to the pointer array of file type in the file descriptor table, we can find file

Even if this article is finished here, there is still one unfinished work, such as do_ filp_ How does open (DFD, TMP, flags, mode) get a file?, This is a very complicated process. I will try to analyze it when I am free in the future

Keywords: Linux

Added by Patch^ on Mon, 28 Feb 2022 15:02:08 +0200