Linux block device driver: simulating a disk with a ramdisk

Preface

I wrote this article to record my learning process, so that I can review and refer back to the relevant points in the future.

1, Block device driver framework

1. block_device structure

The Linux kernel uses block_device to represent a block device. block_device is a structure defined in include/linux/fs.h, as follows:

struct block_device {
	dev_t			bd_dev;  /* not a kdev_t - it's a search key */
	int			bd_openers;
	struct inode *		bd_inode;	/* will die */
	struct super_block *	bd_super;
	struct mutex		bd_mutex;	/* open/close mutex */
	struct list_head	bd_inodes;
	void *			bd_claiming;
	void *			bd_holder;
	int			bd_holders;
	bool			bd_write_holder;
#ifdef CONFIG_SYSFS
	struct list_head	bd_holder_disks;
#endif
	struct block_device *	bd_contains;
	unsigned		bd_block_size;
	struct hd_struct *	bd_part;
	/* number of times partitions within this device have been opened. */
	unsigned		bd_part_count;
	int			bd_invalidated;
	struct gendisk *	bd_disk;	/* Describes the structure of a disk, gendisk */
	struct request_queue *  bd_queue;
	
	........
};

In the block_device structure, pay particular attention to the bd_disk member, which is a pointer to a gendisk structure. The kernel uses block_device to represent a specific block device object, such as a hard disk or a partition. For a hard disk, bd_disk points to the generic disk structure, gendisk.

Register a block device
As with a character device driver, a new block device must be registered with the kernel and a device number requested. The registration function is register_blkdev, with the following prototype:
int register_blkdev(unsigned int major, const char *name)
The parameters and return value are as follows:
major: the major device number.
name: the block device name.
Return value: if major is between 1 and 255, it is a major device number chosen by the caller; 0 is returned on success and a negative value on failure. If major is 0, the kernel assigns a major device number automatically and returns it (1 to 255); a negative return value again means that registration failed.

Unregister a block device
When a block device is no longer needed, it must be unregistered. The function is unregister_blkdev, with the following prototype:
void unregister_blkdev(unsigned int major, const char *name)
The parameters and return value are as follows:
major: the major device number of the block device to unregister.
name: the name of the block device to unregister.
Return value: none.


2. gendisk structure

The Linux kernel uses gendisk to describe a disk device. It is a structure defined in include/linux/genhd.h, as follows:

struct gendisk {
	/* major, first_minor and minors are input parameters only,
	 * don't use directly.  Use disk_devt() and disk_max_parts().
	 */
	/* Major device number of the disk */
	int major;			/* major number of driver */
	/* First minor device number of the disk */
	int first_minor;
	/* Number of minor device numbers of the disk: one for the whole disk plus one per partition */
	int minors;                     /* maximum number of minors, =1 for
                                         * disks that can't be partitioned. */

	char disk_name[DISK_NAME_LEN];	/* name of major driver */
	char *(*devnode)(struct gendisk *gd, umode_t *mode);

	unsigned int events;		/* supported events */
	unsigned int async_events;	/* async events, subset of all */

	/* Array of pointers to partitions indexed by partno.
	 * Protected with matching bdev lock but stat and other
	 * non-critical accesses use RCU.  Always access through
	 * helpers.
	 */
	struct disk_part_tbl __rcu *part_tbl;	 /* Partition table corresponding to disk */
	struct hd_struct part0;
	
	const struct block_device_operations *fops;	/* Block device operation set */
	struct request_queue *queue;	/* Request queue corresponding to disk */
	void *private_data;	/* Private data */

	.......
};

The more important members are annotated with comments in the code. When writing a block device driver, you need to allocate and initialize a gendisk. The Linux kernel provides a set of functions for working with gendisk; the commonly used ones are as follows.

Allocate a gendisk
A gendisk must be allocated before it is used. The alloc_disk function allocates one; its prototype is as follows:
struct gendisk *alloc_disk(int minors)
The parameters and return value are as follows:
minors: the number of minor device numbers to reserve for the gendisk, one for the whole disk plus one for each possible partition.
Return value: the allocated gendisk on success, NULL on failure.

Delete a gendisk
To delete a gendisk, use the del_gendisk function, whose prototype is as follows:
void del_gendisk(struct gendisk *gp)
The parameters and return value are as follows:
gp: the gendisk to delete.
Return value: none.

Add a gendisk to the kernel
A gendisk allocated with alloc_disk cannot be used by the system right away. It must first be added to the kernel with the add_disk function, whose prototype is as follows:
void add_disk(struct gendisk *disk)
The parameters and return value are as follows:
disk: the gendisk to add to the kernel.
Return value: none.

Set the gendisk capacity
Every disk has a capacity, so it must be set when the gendisk is initialized. This is done with the set_capacity function, whose prototype is as follows:
void set_capacity(struct gendisk *disk, sector_t size)
The parameters and return value are as follows:
disk: the gendisk whose capacity is to be set.
size: the disk capacity. Note that this is a number of 512-byte sectors, not a number of bytes.
Return value: none.
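
Because set_capacity() counts sectors rather than bytes, a size in bytes has to be converted first. The following is a minimal user-space sketch of that arithmetic, assuming the 512-byte sector unit used by the kernel. The helper name bytes_to_sectors and the local SECTOR_SHIFT constant are illustrative and are not part of any driver shown in this article:

```c
#include <stdint.h>

#define SECTOR_SHIFT 9  /* assumption: 1 sector = 2^9 = 512 bytes */

/* Convert a size in bytes to a count of 512-byte sectors.
 * A trailing partial sector is dropped, since a disk cannot
 * expose a fraction of a sector. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
    return bytes >> SECTOR_SHIFT;
}
```

For example, a 2 MiB disk corresponds to bytes_to_sectors(2 * 1024 * 1024), which is 4096 sectors, and that is the value to pass as size.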

Adjust the gendisk reference count
The kernel uses the get_disk and put_disk functions to adjust the reference count of a gendisk. As the names suggest, get_disk increases the reference count and put_disk decreases it. The prototypes of the two functions are as follows:
struct kobject *get_disk(struct gendisk *disk)

void put_disk(struct gendisk *disk)

3. block_device_operations structure

Like file_operations for character devices, block devices have their own set of operations, the block_device_operations structure. It is defined in include/linux/blkdev.h, as follows:

struct block_device_operations {
	int (*open) (struct block_device *, fmode_t);
	int (*release) (struct gendisk *, fmode_t);
	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
	int (*direct_access) (struct block_device *, sector_t,
						void **, unsigned long *);
	unsigned int (*check_events) (struct gendisk *disk,
				      unsigned int clearing);
	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
	int (*media_changed) (struct gendisk *);
	void (*unlock_native_capacity) (struct gendisk *);
	int (*revalidate_disk) (struct gendisk *);
	int (*getgeo)(struct block_device *, struct hd_geometry *);
	/* this callback is with swap_lock and sometimes page table lock held */
	void (*swap_slot_free_notify) (struct block_device *, unsigned long);
	struct module *owner;
};

As you can see, the operations in block_device_operations are broadly similar to those in the character device file_operations, but reads and writes on a block device no longer go through read and write functions, which is why those two function pointers do not appear above. Besides open, release and ioctl, there is also getgeo, which is used in the driver written later in this article. It reports disk geometry such as the number of heads, cylinders and sectors. Its second parameter has type struct hd_geometry *, a pointer to an hd_geometry structure. This structure is declared in the linux/hdreg.h header, with the following contents:

/* Describe the structure of a disk */
struct hd_geometry {
      unsigned char heads;	/* head */
      unsigned char sectors;	/* Number of sectors in a track */
      unsigned short cylinders;	/* Cylinder, i.e. number of tracks */
      unsigned long start;	
};
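
The three geometry fields and the disk capacity are tied together by simple multiplication, which is worth checking whenever a driver invents a geometry for a device that has no real heads or cylinders. The sketch below is a user-space model under stated assumptions: struct geo_model is an illustrative stand-in for hd_geometry (without the start field), 512-byte sectors are assumed, and both helper names are hypothetical:

```c
#include <stdint.h>

/* User-space stand-in for the kernel's struct hd_geometry (illustrative only) */
struct geo_model {
    unsigned char  heads;      /* number of heads */
    unsigned char  sectors;    /* sectors per track */
    unsigned short cylinders;  /* number of cylinders */
};

/* Total capacity in bytes implied by a geometry with 512-byte sectors */
static uint64_t geo_capacity_bytes(const struct geo_model *g)
{
    return (uint64_t)g->heads * g->cylinders * g->sectors * 512u;
}

/* Sectors per track needed so that heads * cylinders * sectors * 512 == bytes.
 * Assumes bytes is an exact multiple of heads * cylinders * 512. */
static unsigned char geo_sectors_per_track(uint64_t bytes,
                                           unsigned heads, unsigned cylinders)
{
    return (unsigned char)(bytes / ((uint64_t)heads * cylinders * 512u));
}
```

Taking a 2 MiB disk with 2 heads and 32 cylinders as an example, geo_sectors_per_track gives 64 sectors per track, and 2 * 32 * 64 * 512 multiplies back to 2097152 bytes.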

4. Block device I/O request process

As mentioned above, block_device_operations contains no read or write functions, so how does a block device driver read data from and write data to the physical device? This is where three very important structures in block device drivers come in: request_queue, request and bio.
The relationship between them is shown in the figure below:

The kernel sends all reads and writes for a block device to its request queue, request_queue. A request_queue holds a large number of requests (request structures), and each request contains bios. A bio contains bvec_iter and bio_vec structures, which describe the start sector of the I/O request, the data direction (read or write) and the pages that hold the data. In other words, the bio stores the data related to the read or write. The bio-related structures are defined as follows:

bio structure

struct bio { 
	struct bio *bi_next; /* Next bio in the request queue */
	struct block_device *bi_bdev; /* Target block device */
	unsigned long bi_flags; /* bio status and other flags */
	unsigned long bi_rw; /* I/O operation: read or write */
	struct bvec_iter bi_iter; /* Current position of the I/O: sector, remaining bytes, bio_vec index */
	.......
	struct bio_vec *bi_io_vec; /* Array of bio_vec entries */
	struct bio_set *bi_pool;
	struct bio_vec bi_inline_vecs[0];
};

bvec_iter structure

struct bvec_iter { 
	sector_t bi_sector; /* Start sector on the device for the I/O request (512-byte units) */
	unsigned int bi_size; /* Remaining I/O size in bytes */
	unsigned int bi_idx; /* Current index into bi_io_vec */
	unsigned int bi_bvec_done; /* Number of bytes already processed in the current bio_vec */
};

bio_vec structure

struct bio_vec { 
	struct page *bv_page; /* Page holding the data */
	unsigned int bv_len; /* Length of the data in bytes */
	unsigned int bv_offset; /* Offset of the data within the page */
};

Operating on a physical storage device comes down to either writing data from RAM to the device, or reading data from the device into RAM for processing. A data transfer needs three things: the source, the length and the destination. In other words: which address on the physical storage device to read from, which address in RAM to read into, and how many bytes to read. Since bio is the smallest unit of data transfer for a block device, it has to describe all of this. The bi_iter member describes the device side, such as the sector address to operate on. bi_io_vec points to the first element of the bio_vec array, and the bio_vec array describes the RAM side: the page, the offset within the page and the length. The relationship among the bio, bvec_iter and bio_vec structures is shown in the following figure:
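
To make the split between the device side and the RAM side concrete, here is a small user-space model of carrying out one bio against a memory-backed disk. It is only a sketch under stated assumptions: the model_* types are simplified stand-ins invented for this example (a plain pointer replaces struct page, and an explicit count replaces the kernel's iterator bookkeeping), and model_bio_run is a hypothetical helper, not a kernel function:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for bio_vec: where the data lives in RAM */
struct model_bio_vec {
    unsigned char *page;   /* base address of the "page" */
    unsigned int   len;    /* bytes to transfer in this segment */
    unsigned int   offset; /* offset of the data inside the page */
};

/* Simplified stand-in for bvec_iter: where the data lives on the device */
struct model_bvec_iter {
    unsigned long long sector; /* start sector, in 512-byte units */
    unsigned int       size;   /* total bytes described by the segments */
};

/* Simplified stand-in for bio */
struct model_bio {
    int                    is_write; /* data direction: 1 = write, 0 = read */
    struct model_bvec_iter iter;
    struct model_bio_vec  *vecs;     /* array of segments */
    unsigned int           nvecs;    /* number of segments */
};

/* Carry out one bio against a memory-backed disk.
 * Returns the number of bytes transferred. */
static size_t model_bio_run(unsigned char *disk, const struct model_bio *bio)
{
    size_t pos = (size_t)bio->iter.sector << 9;   /* sector -> byte offset */
    size_t done = 0;
    unsigned int i;

    for (i = 0; i < bio->nvecs; i++) {
        const struct model_bio_vec *v = &bio->vecs[i];
        unsigned char *ram = v->page + v->offset;

        if (bio->is_write)
            memcpy(disk + pos, ram, v->len);      /* RAM -> device */
        else
            memcpy(ram, disk + pos, v->len);      /* device -> RAM */

        pos  += v->len;   /* segments are contiguous on the device */
        done += v->len;
    }
    return done;
}
```

The point of the model is that one start sector covers the whole bio, while each segment brings its own page, offset and length, so the device offset simply advances by the length of each segment as it is copied.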

5. APIs for handling request_queue, request and bio

Now that the relationship among request_queue, request and bio is clear, let's look at the API functions the kernel provides for handling them.

1. Initialize the request queue
struct request_queue *blk_init_queue(request_fn_proc *rfn, spinlock_t *lock)
The parameters and return value are as follows:
rfn: pointer to the request handling function. Every request_queue must have a request handling function, which the driver writer has to implement.
lock: pointer to a spin lock. The driver writer defines a spin lock and passes it in, and the request queue then uses this lock.
Return value: NULL on failure; on success, the address of the allocated request_queue.

2. Clean up the request queue
void blk_cleanup_queue(struct request_queue *q)

3. Allocate a request queue
struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
A ramdisk is a non-mechanical device with fully random access, so it needs no complex I/O scheduling. In that case you can bypass the I/O scheduler and bind the request queue to a "make request" function (make_request_fn) with the following function:

void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)

blk_alloc_queue() and blk_queue_make_request() are usually used together. The logic is generally:
xxx_queue = blk_alloc_queue(GFP_KERNEL);
blk_queue_make_request(xxx_queue, xxx_make_request_fn);

4. Peek at a request
struct request *blk_peek_request(struct request_queue *q)
This function returns the next request to be processed, or NULL if there is none.

5. Start a request
void blk_start_request(struct request *req)
After blk_peek_request has returned the next request to be processed, use this function to start processing it.

6. Fetch and start a request in one step
struct request *blk_fetch_request(struct request_queue *q)
blk_fetch_request() combines the two previous steps: it peeks at the next request and starts it.

7. Traverse bios and segments
__rq_for_each_bio(_bio, rq) traverses all bios of a request

#define __rq_for_each_bio(_bio, rq) \
 if ((rq->bio)) \
 for (_bio = (rq)->bio; _bio; _bio = _bio->bi_next)

bio_for_each_segment traverses all bio_vecs of a bio

#define bio_for_each_segment(bvl, bio, iter) \
 __bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)

rq_for_each_segment traverses all bio_vecs of all bios in a request

#define rq_for_each_segment(bvl, _rq, _iter)			\
	__rq_for_each_bio(_iter.bio, _rq)			\
		bio_for_each_segment(bvl, _iter.bio, _iter.iter)
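
The way these macros nest can be tried out in user space. The following sketch rebuilds the same pattern on deliberately tiny stand-in types: an outer for-loop macro walks a linked list of bios, an inner one walks the segments of each bio, and a third macro chains the two just as rq_for_each_segment does. Every mini_* name and the seg type are assumptions made for this example and do not exist in the kernel:

```c
#include <stddef.h>

/* Illustrative stand-ins: a request owns a singly linked list of bios,
 * and each bio owns an array of segments. */
struct seg      { unsigned int len; };
struct mini_bio { struct mini_bio *next; struct seg *segs; unsigned int nsegs; };
struct mini_rq  { struct mini_bio *bio; };

/* Same shape as __rq_for_each_bio: walk the bio list of a request */
#define mini_rq_for_each_bio(_bio, rq) \
    for (_bio = (rq)->bio; _bio; _bio = _bio->next)

/* Walk the segments of one bio, pointing s at each segment in turn */
#define mini_bio_for_each_segment(s, bio, i) \
    for (i = 0; i < (bio)->nsegs && ((s) = &(bio)->segs[i], 1); i++)

/* Nested form, mirroring how rq_for_each_segment chains the two macros */
#define mini_rq_for_each_segment(s, rq, b, i) \
    mini_rq_for_each_bio(b, rq)               \
        mini_bio_for_each_segment(s, b, i)

/* Sum the length of every segment of every bio in a request */
static size_t mini_rq_bytes(const struct mini_rq *rq)
{
    struct mini_bio *b;
    struct seg *s;
    unsigned int i;
    size_t total = 0;

    mini_rq_for_each_segment(s, rq, b, i)
        total += s->len;
    return total;
}
```

Because each macro expands to a bare for statement, the statement that follows the outermost macro becomes the body of the innermost loop, which is why a single line after rq_for_each_segment sees every segment of the request.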

8. Report completion
void __blk_end_request_all(struct request *rq, int error)
void blk_end_request_all(struct request *rq, int error)

These two functions report that a request has finished. An error of 0 indicates success, and a value less than 0 indicates failure. __blk_end_request_all must be called with the queue lock held.

If a "make request" function is used, that is, the I/O scheduler is bypassed and bios are handled directly, call bio_endio once a bio has been processed to signal completion:

void bio_endio(struct bio *bio, int error)

2, Practice: write a ramdisk driver

A ramdisk is a simulated disk whose data is actually stored in RAM. The driver simulates a disk with memory allocated by kzalloc() and exposes that memory as a block device.

The driver below uses request_mode to select whether the ramdisk is accessed through a request queue or without one.

ramdisk.c driver

#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/errno.h>
#include <linux/slab.h>         /* kzalloc, kfree */
#include <linux/string.h>       /* memcpy */
#include <linux/spinlock.h>
#include <linux/highmem.h>      /* atomic page mapping helpers */
#include <linux/fs.h>           /* register_blkdev, unregister_blkdev */
#include <linux/genhd.h>        /* struct gendisk */
#include <linux/blkdev.h>
#include <linux/blkpg.h>
#include <linux/bio.h>
#include <linux/hdreg.h>        /* struct hd_geometry */

#define RAMDISK_SIZE    (2 * 1024 * 1024)   /* Capacity: 2 MB */
#define RAMDISK_NAME    "ramdisk"
#define RAMDISK_MINORS  3                   /* 3 minor numbers: whole disk + 2 partitions */

#define RAMDISK_QUEUE   1                   /* Use the request queue */
#define RAMDISK_NOQUEUE 2                   /* Do not use the request queue */

const unsigned char request_mode = RAMDISK_QUEUE;   /* Select whether to use the request queue */

/* ramdisk device structure */
struct ramdisk_dev{
    int major;
    unsigned char *ramdiskbuf;  /* Memory that backs the simulated block device */
    spinlock_t lock;
    struct gendisk *gendisk;    /* gendisk describing the disk device */
    struct request_queue *queue;    /* Request queue */
};

struct ramdisk_dev ramdiskdev;  /* ramdisk device */

static int setup_device(struct ramdisk_dev *dev);


/* Copy len bytes between the ramdisk memory (at byte offset start) and buffer */
static void ramdisk_transfer(struct ramdisk_dev *dev, unsigned long start,
                             char *buffer, unsigned long len, int dir)
{
    if(dir == READ)
        memcpy(buffer, dev->ramdiskbuf + start, len);   /* ramdisk -> buffer */
    else
        memcpy(dev->ramdiskbuf + start, buffer, len);   /* buffer -> ramdisk */
}

/* Request handling function: processes every request in the request queue */
static void ramdisk_request_fn(struct request_queue *q)
{
    struct request *req;
    struct bio_vec bvec;        /* by value, as with the bvec_iter-based interface */
    struct req_iterator iter;

    req = blk_fetch_request(q);
#if 0
    /* Alternative: handle the request one segment at a time */
    while(req != NULL){
        struct ramdisk_dev *dev = req->rq_disk->private_data;

        ramdisk_transfer(dev, blk_rq_pos(req) << 9, bio_data(req->bio),
                         blk_rq_cur_bytes(req), rq_data_dir(req));

        /* Complete only the current segment; false means the request is finished */
        if(!__blk_end_request_cur(req, 0))
            req = blk_fetch_request(q);
    }
#endif

    while(req != NULL){
        struct ramdisk_dev *dev = req->rq_disk->private_data;
        unsigned long start = blk_rq_pos(req) << 9;   /* Start sector -> byte offset */

        rq_for_each_segment(bvec, req, iter){   /* Traverse every bio_vec in the request */
            char *kaddr = kmap_atomic(bvec.bv_page);   /* Map the page of this segment */

            ramdisk_transfer(dev, start, kaddr + bvec.bv_offset,
                             bvec.bv_len, rq_data_dir(req));
            kunmap_atomic(kaddr);
            start += bvec.bv_len;   /* Next segment continues where this one ended */
        }
        __blk_end_request_all(req, 0);   /* All segments copied: complete the request */
        req = blk_fetch_request(q);
    }
}

/* 
 * "Make request" function,
 * used when the request queue is not used.
 * Written against the bvec_iter-based bio interface described in section 4.
 */
static void ramdisk_make_request_fn(struct request_queue *q, struct bio *bio)
{
    struct ramdisk_dev *dev = q->queuedata;
    unsigned long offset;
    struct bio_vec bvec;
    struct bvec_iter iter;
    unsigned long len;

    offset = bio->bi_iter.bi_sector << 9;   /* Start sector -> byte offset */

    /* Process every bio_vec in the bio */
    bio_for_each_segment(bvec, bio, iter){
        char *kaddr = kmap_atomic(bvec.bv_page);   /* Map the page of this segment */
        char *ptr = kaddr + bvec.bv_offset;        /* Start of the data within the page */
        len = bvec.bv_len;

        if(bio_data_dir(bio) == READ)
            memcpy(ptr, dev->ramdiskbuf + offset, len);
        else if(bio_data_dir(bio) == WRITE)
            memcpy(dev->ramdiskbuf + offset, ptr, len);

        offset += len;
        kunmap_atomic(kaddr);
    }
    set_bit(BIO_UPTODATE, &bio->bi_flags);
    bio_endio(bio, 0);
}


static int ramdisk_open(struct block_device *bdev, fmode_t mode)
{
    //struct ramdisk_dev *dev = bdev->bd_disk->private_data;

    printk(KERN_EMERG "ramdisk open!\r\n");
    return 0;
}

static int ramdisk_release(struct gendisk *disk, fmode_t mode)
{
    //struct ramdisk_dev *dev = disk->private_data;

    printk(KERN_EMERG "ramdisk release!\r\n");

    return 0;
}

static int ramdisk_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{
    //struct ramdisk_dev *dev = bdev->bd_disk->private_data;

    geo->heads = 2;         /* head */
    geo->cylinders = 32;    /* Cylinder (i.e. number of tracks) */
    geo->sectors = RAMDISK_SIZE / (2*32*512);   /* Number of sectors per track */

    return 0;
    
}

static struct block_device_operations ramdisk_fops = {
    .owner = THIS_MODULE,
    .open = ramdisk_open,
    .release = ramdisk_release,
    .getgeo = ramdisk_getgeo,
};

/* Driver entry function */
static int __init ramdisk_init(void)
{
    int ret = 0;

    /* Allocate the ramdisk memory */
    ramdiskdev.ramdiskbuf = kzalloc(RAMDISK_SIZE, GFP_KERNEL);
    if(ramdiskdev.ramdiskbuf == NULL){  
        ret = -ENOMEM;  /* Failed to allocate memory */
        goto ram_fail;
    }

    /* Initialize the spin lock */
    spin_lock_init(&ramdiskdev.lock);

    /* Register the block device */
    ramdiskdev.major = register_blkdev(0, RAMDISK_NAME); /* Let the kernel assign the major device number */
    if(ramdiskdev.major < 0){
        ret = ramdiskdev.major;   /* Propagate the error code instead of returning 0 */
        goto register_blkdev_fail;
    }
    printk(KERN_EMERG "major = %d\r\n",ramdiskdev.major);

    /* setup_device() does its own cleanup on failure */
    ret = setup_device(&ramdiskdev);
    if(ret != 0)
        return ret;

    return 0;

register_blkdev_fail:
    kfree(ramdiskdev.ramdiskbuf); /* Free memory */
ram_fail:
    return ret;
}

/* Driver exit function */
static void __exit ramdisk_exit(void)
{
    del_gendisk(ramdiskdev.gendisk);   /* Remove the gendisk from the kernel first */
    put_disk(ramdiskdev.gendisk);      /* Then drop the reference (count - 1) */

    blk_cleanup_queue(ramdiskdev.queue);   /* Clean up the request queue */

    unregister_blkdev(ramdiskdev.major, RAMDISK_NAME); /* Unregister the block device */

    kfree(ramdiskdev.ramdiskbuf); /* Free memory */
}


static int setup_device(struct ramdisk_dev *dev)
{
    int ret = 0;

    switch(request_mode){
        case RAMDISK_NOQUEUE:   /* Do not use the request queue */
            dev->queue = blk_alloc_queue(GFP_KERNEL);
            if(dev->queue == NULL){
                ret = -ENOMEM;
                goto blk_queue_fail;  
            }
            blk_queue_make_request(dev->queue, ramdisk_make_request_fn);  /* Set the "make request" function */
            break;

        default:    /* Unknown mode: fall back to the request queue */
            printk(KERN_EMERG "Bad request mode %d, using request queue\r\n", request_mode);
            /* fall through */

        case RAMDISK_QUEUE:  /* Use the request queue */
            dev->queue = blk_init_queue(ramdisk_request_fn, &dev->lock);
            if (dev->queue == NULL){
                ret = -ENOMEM; 
                goto blk_queue_fail;
            }
            break;
    }

    dev->queue->queuedata = dev;    /* Private data */

    /* Allocate the gendisk */
    dev->gendisk = alloc_disk(RAMDISK_MINORS);
    if (dev->gendisk == NULL)
    {
        printk(KERN_EMERG "alloc_disk fail\r\n");
        ret = -ENOMEM;  /* Failed to allocate gendisk */
        goto gendisk_alloc_fail;
    }

    /* Initialize the gendisk and add it to the kernel */
    dev->gendisk->major = dev->major;
    dev->gendisk->first_minor = 0;
    dev->gendisk->fops = &ramdisk_fops;
    dev->gendisk->private_data = dev;   /* Private data */
    dev->gendisk->queue = dev->queue;
    snprintf(dev->gendisk->disk_name, DISK_NAME_LEN, "%s", RAMDISK_NAME);
    set_capacity(dev->gendisk, RAMDISK_SIZE/512);    /* Set the capacity in sectors */
    add_disk(dev->gendisk);  /* Add the gendisk to the kernel */

    return 0;

    /* Undo the steps in reverse order; each label releases only what already exists */
gendisk_alloc_fail:
    blk_cleanup_queue(dev->queue);   /* Release the request queue */
blk_queue_fail:
    unregister_blkdev(dev->major, RAMDISK_NAME);
    kfree(dev->ramdiskbuf);   /* Free memory */
    return ret;
}


module_init(ramdisk_init);
module_exit(ramdisk_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("xzj");

Test

Load the driver with insmod
Format the ramdisk ------- mkfs.vfat /dev/ramdisk

View all disk and partition information ------ fdisk -l

The ramdisk device is recognized, and its capacity and geometry match the values set in the driver above.

Mount the disk on the /tmp directory ----- mount /dev/ramdisk /tmp

Create a txt file in the /tmp directory to test whether ramdisk access works, then unmount the disk. Mount it again on the /mnt directory: the /mnt directory contains the txt file created earlier under /tmp, which shows that the ramdisk driver works normally.

Keywords: Linux ARM

Added by astribuncio on Fri, 28 Jan 2022 23:29:03 +0200