1. Foreword
Previously, we introduced some basics of the C language and standard input and output. In this post, we look at the UNIX system interface. The UNIX operating system provides its services through a set of system calls, which are functions inside the operating system that user programs can call. This post covers some of the most important system calls used in C programs.
2. File descriptor
Everything in Linux is a file, so we need to know about file descriptors.
In the UNIX operating system, all peripheral devices are treated as files in the file system, so all input and output is done by reading or writing files.
In the most general case, before you read or write a file you must inform the system of your intent to do so. This process is called opening the file.
If you are going to write to a file, you may also need to create it first or discard its previous contents. The system checks that you have the right to do so. If all is well, it returns a small non-negative integer to the program, which is called the file descriptor.
From then on, all input and output on the file is identified by the file descriptor, not by the file name.
Because most of the input / output is done through the keyboard and display, UNIX has made special arrangements for this for convenience. When the command interpreter runs a program, it will open three files. The corresponding file descriptors are 0, 1 and 2 respectively, indicating standard input, standard output and standard error in turn. If the program reads from file 0 and writes to 1 and 2, it can input / output without worrying about opening the file.
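As a small illustration of the three pre-opened descriptors, here is a sketch that writes a message straight to a given descriptor. The helper name put_line is made up for this example; STDOUT_FILENO and STDERR_FILENO are the names that unistd.h gives to descriptors 1 and 2.

```c
#include <string.h>
#include <unistd.h>

/* put_line (hypothetical helper): write msg to descriptor fd.
 * Returns the number of bytes written, or -1 on error. */
long put_line(int fd, const char *msg)
{
    return (long) write(fd, msg, strlen(msg));
}
```

Calling put_line(STDOUT_FILENO, "result\n") sends the text to standard output, while put_line(STDERR_FILENO, "warning\n") sends it to standard error. Neither call needs an open first, because the command interpreter has already opened both descriptors.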
3. Low level I/O – read and write
Input and output are performed by the read and write system calls, which C programs access through the functions read and write:
    int n_read = read(int fd, char *buf, int n);
    int n_written = write(int fd, char *buf, int n);
In these two functions, the first parameter is the file descriptor, the second parameter is the character array in which the read or write data is stored in the program, and the third parameter is the number of bytes to be transmitted.
Each call returns the number of bytes actually transferred.
When reading a file, the return value may be less than the number of bytes requested. A return value of 0 means that the end of the file has been reached, and a return value of -1 means that an error has occurred.
When writing a file, the return value is the number of bytes actually written. If the return value is not equal to the number of bytes requested to be written, an error has occurred.
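The following program uses these two calls to copy its input to its output:
    #include <stdio.h>      /* BUFSIZ */
    #include <unistd.h>     /* read, write */

    /* copy input to output */
    int main(void)
    {
        char buf[BUFSIZ];
        int n;

        /* read returns 0 at end of file and -1 on error */
        while ((n = read(0, buf, BUFSIZ)) > 0)
            write(1, buf, n);
        return 0;
    }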
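The copy loop above does not check the return value of write. As a hedged sketch of a more defensive version: on POSIX systems a write to a pipe or terminal can transfer fewer bytes than requested without having failed, so the helper below keeps calling write until everything is out. The names write_all and copy_fd are invented for this example, and the sketch does not retry after an interrupted call.

```c
#include <unistd.h>

/* write_all (hypothetical helper): call write repeatedly until all
 * n bytes of buf have been written. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, long n)
{
    while (n > 0) {
        long written = write(fd, buf, (size_t) n);
        if (written <= 0)
            return -1;          /* error, or no progress at all */
        buf += written;         /* skip the part already written */
        n -= written;
    }
    return 0;
}

/* copy_fd (hypothetical helper): copy everything readable from
 * descriptor in to descriptor out. Returns bytes copied, or -1. */
long copy_fd(int in, int out)
{
    char buf[4096];
    long n, total = 0;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write_all(out, buf, n) == -1)
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;  /* n == 0 means end of file */
}
```

With these helpers, the body of the copy program reduces to a single call, copy_fd(0, 1).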
4. open, creat, close and unlink
Apart from the default standard input, standard output and standard error, files must be explicitly opened before they can be read or written.
open is similar to fopen, which we discussed earlier, except that instead of a file pointer it returns a file descriptor, which is just an int. If an error occurs, open returns -1.
    #include <fcntl.h>

    int fd;
    int open(char *name, int flags, int perms);

    fd = open(name, flags, perms);
The parameter name is a string containing the file name. The second parameter flags is a value of type int, which describes how to open the file. The main values are as follows:
    O_RDONLY   open for reading only
    O_WRONLY   open for writing only
    O_RDWR     open for both reading and writing
Using open on a file that does not exist is an error. The creat system call creates a new file or overwrites an existing one, as shown below:
    int creat(char *name, int perms);

    fd = creat(name, perms);
If creat succeeds in creating the file, it returns a file descriptor; otherwise it returns -1. If the file already exists, creat truncates it to length 0 and discards the existing contents. Calling creat on a file that already exists is not an error.
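The truncating behaviour can be checked with a small experiment. The sketch below writes a few bytes to a scratch file, calls creat on it a second time, and then tries to read one byte back. The function name bytes_left_after_creat and the permission value 0600 are choices made for this example only, and the unlink call at the end simply removes the scratch file.

```c
#include <fcntl.h>
#include <unistd.h>

/* bytes_left_after_creat (hypothetical helper): returns how many bytes
 * a read finds in path after creat has been called on it twice.
 * The expected result is 0; -1 signals a failure along the way. */
long bytes_left_after_creat(const char *path)
{
    char c;
    long n;
    int fd = creat(path, 0600);          /* first call: file is created */

    if (fd == -1)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }
    close(fd);

    fd = creat(path, 0600);              /* second call: no error, contents dropped */
    if (fd == -1)
        return -1;
    close(fd);

    fd = open(path, O_RDONLY, 0);
    if (fd == -1)
        return -1;
    n = read(fd, &c, 1);                 /* 0 means end of file right away */
    close(fd);
    unlink(path);                        /* remove the scratch file */
    return n;
}
```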
If the file does not already exist, creat creates it with the permissions specified by the perms argument. The following program, a simplified version of the cp command, shows open and creat at work:
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define PERMS 0666    /* RW for owner, group, others */

    void error(char *, ...);    /* printf-style error reporter, defined elsewhere */

    /* cp: copy f1 to f2 */
    int main(int argc, char *argv[])
    {
        int f1, f2, n;
        char buf[BUFSIZ];

        if (argc != 3)
            error("Usage: cp from to");
        if ((f1 = open(argv[1], O_RDONLY, 0)) == -1)
            error("cp: can't open %s", argv[1]);
        if ((f2 = creat(argv[2], PERMS)) == -1)
            error("cp: can't create %s, mode %03o", argv[2], PERMS);
        /* copy block by block; a short write is treated as an error */
        while ((n = read(f1, buf, BUFSIZ)) > 0)
            if (write(f2, buf, n) != n)
                error("cp: write error on file %s", argv[2]);
        return 0;
    }
The output file created by the program has fixed permissions 0666.
Note that the function error, like printf, is called with a variable-length argument list.
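The post does not show the body of error, so here is one possible sketch built on stdarg.h and vfprintf. The "error: " prefix, the exit status 1, and the helper names vreport and report are assumptions made for this example. report is a non-exiting variant that is handy when trying the formatting out.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* vreport (hypothetical helper): print "error: ", the formatted message
 * and a newline to stderr. Returns the number of characters printed. */
static int vreport(const char *fmt, va_list args)
{
    int n = fprintf(stderr, "error: ");

    n += vfprintf(stderr, fmt, args);
    n += fprintf(stderr, "\n");
    return n;
}

/* report (hypothetical): like error, but returns instead of exiting */
int report(const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = vreport(fmt, args);
    va_end(args);
    return n;
}

/* error: print an error message and terminate the program */
void error(char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vreport(fmt, args);
    va_end(args);
    exit(1);
}
```

Because the argument list is handed on as a va_list, the formatting itself is left entirely to vfprintf, and error accepts the same conversions as printf.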
There is a limit on the number of files a program can have open at the same time (historically about 20; modern systems usually allow many more, but a limit still exists). A program that needs to process many files must therefore be prepared to reuse file descriptors. The function close(int fd) breaks the connection between a file descriptor and an open file and frees the descriptor for use with another file. It corresponds to fclose in the standard library, except that there is no buffer to flush. When a program terminates through exit or by returning from main, all of its open files are closed.
The function unlink(char *name) removes the file name from the file system. It corresponds to the standard library function remove.
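To see that close really does free the descriptor for reuse, the following sketch opens a scratch file twice and compares the two descriptors. It relies on the POSIX rule that open returns the lowest-numbered descriptor not currently in use, and assumes that nothing else opens or closes files in between. The function name descriptor_is_reused is made up for this example.

```c
#include <fcntl.h>
#include <unistd.h>

/* descriptor_is_reused (hypothetical helper): returns 1 if the descriptor
 * freed by close is handed out again by the next open, 0 if not,
 * and -1 if the scratch file could not be created or opened. */
int descriptor_is_reused(const char *path)
{
    int first, second;

    first = creat(path, 0600);
    if (first == -1)
        return -1;
    close(first);                    /* descriptor becomes available again */

    second = open(path, O_RDONLY, 0);
    if (second == -1) {
        unlink(path);
        return -1;
    }
    close(second);
    unlink(path);                    /* delete the scratch file */
    return first == second;
}
```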
5. Random access – lseek
Input / output is usually sequential: each time read and write are called, the position of read and write is immediately after the position of the previous operation. However, sometimes files need to be accessed in any order. The system call lseek can move anywhere in the file without actually reading or writing any data.
long lseek(int fd, long offset, int origin);
This sets the current position in the file whose descriptor is fd to offset, which is taken relative to the location specified by origin. Subsequent reading or writing begins at that position. origin can be 0, 1 or 2, meaning that offset is measured from the beginning of the file, from the current position, or from the end of the file, respectively.
With the lseek system call, you can treat a file more or less like a large array, at the price of slower access. For example, the following function reads any number of bytes from any position in a file. It returns the number of bytes read, or -1 if an error occurs.
    #include <unistd.h>

    /* get: read n bytes from position pos */
    int get(int fd, long pos, char *buf, int n)
    {
        if (lseek(fd, pos, 0) >= 0)    /* get to pos (origin 0: from the start) */
            return read(fd, buf, n);
        else
            return -1;
    }
The lseek system call returns a value of type long that gives the new position in the file, or -1 if an error occurs.
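Because lseek returns the new position, seeking to the end of a file is a simple way to learn its length. The sketch below uses the symbolic name SEEK_END, which unistd.h provides for the origin value 2. The function names file_size and last_byte are invented for this example.

```c
#include <fcntl.h>
#include <unistd.h>

/* file_size (hypothetical helper): length of the file in bytes, or -1 */
long file_size(const char *path)
{
    long size;
    int fd = open(path, O_RDONLY, 0);

    if (fd == -1)
        return -1;
    size = (long) lseek(fd, 0L, SEEK_END);   /* new position == length */
    close(fd);
    return size;
}

/* last_byte (hypothetical helper): final byte of the file,
 * or -1 if the file is empty or cannot be read */
int last_byte(const char *path)
{
    unsigned char c;
    int ok;
    int fd = open(path, O_RDONLY, 0);

    if (fd == -1)
        return -1;
    /* a negative offset from the end steps back into the file */
    ok = lseek(fd, -1L, SEEK_END) >= 0 && read(fd, &c, 1) == 1;
    close(fd);
    return ok ? c : -1;
}
```

In the same way, SEEK_SET and SEEK_CUR can be used in place of the origin values 0 and 1.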
6. Example – implementation of fopen and getc functions
Files in the standard library are described by file pointers rather than file descriptors.
A file pointer is a pointer to a structure that holds several pieces of information about the file:
- a pointer to a buffer, so that the file can be read in large chunks;
- a count of the number of characters left in the buffer;
- a pointer to the next character position in the buffer;
- the file descriptor;
- flags describing the read / write mode, error status, and so on.
The main job of fopen is to open the file and fill in one of these structures. The code is as follows:
    #include <fcntl.h>      /* open, creat, O_RDONLY, O_WRONLY */
    #include <stddef.h>     /* NULL */
    #include <unistd.h>     /* lseek */

    #define OPEN_MAX 20     /* max number of files open at once */
    #define PERMS 0666      /* RW for owner, group, others */

    typedef struct _iobuf {
        int cnt;        /* characters left */
        char *ptr;      /* next character position */
        char *base;     /* location of buffer */
        int flag;       /* mode of file access */
        int fd;         /* file descriptor */
    } FILE;

    enum _flags {
        _READ = 01,     /* file open for reading */
        _WRITE = 02     /* file open for writing */
    };

    extern FILE _iob[OPEN_MAX];

    /* fopen: open file, return file pointer */
    FILE *fopen(char *name, char *mode)
    {
        int fd;
        FILE *fp;

        if (*mode != 'r' && *mode != 'w' && *mode != 'a')
            return NULL;
        for (fp = _iob; fp < _iob + OPEN_MAX; fp++)
            if ((fp->flag & (_READ | _WRITE)) == 0)
                break;              /* found free slot */
        if (fp >= _iob + OPEN_MAX)  /* no free slots */
            return NULL;

        if (*mode == 'w')
            fd = creat(name, PERMS);
        else if (*mode == 'a') {
            if ((fd = open(name, O_WRONLY, 0)) == -1)
                fd = creat(name, PERMS);
            lseek(fd, 0L, 2);       /* append: move to end of file */
        } else
            fd = open(name, O_RDONLY, 0);
        if (fd == -1)               /* couldn't access name */
            return NULL;
        fp->fd = fd;
        fp->cnt = 0;
        fp->base = NULL;
        fp->flag = (*mode == 'r') ? _READ : _WRITE;
        return fp;
    }
7. Example – storage allocator
malloc does not allocate storage from a fixed-size array determined at compile time. Instead, it requests space from the operating system as needed. The space that malloc manages is therefore not necessarily contiguous, so the free storage is kept as a linked list of free blocks. Each block contains a size, a pointer to the next block, and the space itself. The blocks are kept in order of increasing storage address, and the last block points back to the first.
When there is an application request, malloc will scan the free block list until a large enough block is found.
If the block exactly matches the requested size, it is removed from the linked list and returned to the user. If the block is too large, it is divided into two parts: the block with appropriate size is returned to the user, and the rest is left in the free block list. If a large enough block cannot be found, apply to the operating system for a large block and add it to the free block list.
Freeing a block also begins with a search of the free list, this time to find the right place to insert the block being released. If the released block is adjacent to a free block on either side, the two are merged into a single larger block, so that storage does not become too fragmented. Because the free list is kept in order of increasing address, it is easy to tell whether adjacent blocks are free.
A free block contains a pointer to the next block in the list, a record of the block size, and then the free space itself. The control information at the beginning of the block is called the header. To simplify alignment, every block is a multiple of the header size, and the header itself is aligned properly. This is achieved with a union that contains the desired header structure and an instance of the type with the most restrictive alignment requirement.
In malloc, the requested size is rounded up to a whole number of header-sized units. The block actually allocated contains one more unit, for the header itself, and this total is the value recorded in the size field of the header. The pointer returned by malloc points to the free space, not to the header. The user can do anything with the space requested, but writing outside the allocated space is likely to corrupt the list.
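The header union and the rounding rule described above might be sketched as follows. The field names ptr, size and s match those used by the malloc code in this section. Using long as the most restrictive type (the Align typedef) is an assumption that holds on many machines but not all, and units_needed is a helper name invented for this example.

```c
typedef long Align;             /* assumption: long is the most restrictive type */

union header {                  /* block header */
    struct {
        union header *ptr;      /* next block if on free list */
        unsigned size;          /* size of this block, in header units */
    } s;
    Align x;                    /* never used, only forces alignment */
};

typedef union header Header;

/* units_needed (hypothetical helper): number of header-sized units needed
 * for a request of nbytes, including one extra unit for the header. */
unsigned units_needed(unsigned nbytes)
{
    return (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;
}
```

For example, a request for a single byte needs two units: one that is rounded up to hold the data and one for the header.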
The size field is necessary because the blocks controlled by malloc are not necessarily contiguous, so their sizes cannot be computed by pointer arithmetic.
The variable base is used to get the free list started. When malloc is called for the first time, freep is NULL, and a degenerate free list is created that contains a single block of size 0 which points to itself. In every case, the free list is then searched. The search for a block of adequate size begins at the point (freep) where the last block was found, a strategy that helps keep the list homogeneous. If the block found is too big, its tail end is returned to the user, so that the header of the original block only needs its size field adjusted. In every case, the pointer returned to the user points to the free space inside the block, which begins one unit beyond the header.
    /* Header is the union type described above: s.ptr, s.size and an alignment member */

    static Header base;             /* empty list to get started */
    static Header *freep = NULL;    /* start of free list */

    /* malloc: general-purpose storage allocator */
    void *malloc(unsigned nbytes)
    {
        Header *p, *prevp;
        Header *morecore(unsigned);
        unsigned nunits;

        /* round up to whole header units, plus one unit for the header */
        nunits = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;
        if ((prevp = freep) == NULL) {  /* no free list yet */
            base.s.ptr = freep = prevp = &base;
            base.s.size = 0;
        }
        for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr) {
            if (p->s.size >= nunits) {      /* big enough */
                if (p->s.size == nunits)    /* exactly */
                    prevp->s.ptr = p->s.ptr;
                else {                      /* allocate tail end */
                    p->s.size -= nunits;
                    p += p->s.size;
                    p->s.size = nunits;
                }
                freep = prevp;
                return (void *) (p + 1);
            }
            if (p == freep)                 /* wrapped around free list */
                if ((p = morecore(nunits)) == NULL)
                    return NULL;            /* none left */
        }
    }
The function morecore is used to request storage space from the operating system. Its implementation details vary from system to system.
Finally, let's look at the free function. Starting at freep, it scans the free list looking for the place to insert the block being freed. This is either between two existing free blocks or at one end of the list. In either case, if the freed block is adjacent to a neighbouring free block, the two blocks are merged. Merging is simple: it only requires setting the pointers to the right places and setting the correct block sizes.
    /* free: put block ap in free list */
    void free(void *ap)
    {
        Header *bp, *p;

        bp = (Header *)ap - 1;    /* point to block header */
        for (p = freep; !(bp > p && bp < p->s.ptr); p = p->s.ptr)
            if (p >= p->s.ptr && (bp > p || bp < p->s.ptr))
                break;    /* freed block at start or end of arena */

        if (bp + bp->s.size == p->s.ptr) {    /* join to upper nbr */
            bp->s.size += p->s.ptr->s.size;
            bp->s.ptr = p->s.ptr->s.ptr;
        } else
            bp->s.ptr = p->s.ptr;
        if (p + p->s.size == bp) {            /* join to lower nbr */
            p->s.size += bp->s.size;
            p->s.ptr = bp->s.ptr;
        } else
            p->s.ptr = bp;
        freep = p;
    }
8. Closing remarks
This concludes our introduction to C. From here on, practice is what matters: the material only sticks if you write a lot of code.