Python file operations

Basic file operations

Opening / closing a file
The built-in function open opens the file at the given path and returns a file object.
Its two most commonly used parameters are the file name (an absolute or relative path)
and the opening mode: 'r' / 'w' / 'a' / 'b', meaning read (the default) / write / append / binary. 'b' is combined with one of the others, for example 'rb'.

f = open('C:/Users/wyw15/Desktop/test.txt', 'r')
f.close()

It is easy to forget to close a file in time
def func():
    f = open('C:/Users/wyw15/Desktop/test.txt', 'r')
    # Perform file operations...
    x = 10
    if x == 10:
        return 
    f.close()
When the if statement is executed, the function returns before f.close() runs, so the file is left for the garbage collector to close.
We cannot determine when garbage collection will reclaim the object; that is unpredictable.
We cannot rely on this mechanism, so the file has to be closed manually.
Modified version:
def func():
    f = open('C:/Users/wyw15/Desktop/test.txt', 'r')
    # Perform file operations...
    x = 10
    if x == 10:
        f.close()
        return 
    f.close()
That is, close the file on every path that can return.
Disadvantages: with many conditions, the file has to be closed under each one.
Also, if a statement before the conditional raises an exception,
execution stops at that point and the closing statement is never executed.
The file then remains open.

Context manager

def func():
    with open('C:/Users/wyw15/Desktop/test.txt', 'r') as f:
        # Basic file operations
        pass 
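
To make the benefit concrete, here is a minimal sketch showing that a with block closes the file both on an early return and when an exception is raised. The scratch file name and the helper functions early_return and fail_midway are assumptions made up for this illustration.

```python
import os
import tempfile

# Scratch file for the demo; the name "demo.txt" is an assumption of this sketch.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")

opened = []  # keeps references so the file objects can be inspected afterwards


def early_return(p):
    with open(p, "r", encoding="utf-8") as f:
        opened.append(f)
        if f.readline():
            return "returned early"  # the with block still closes f
    return "reached the end"


def fail_midway(p):
    with open(p, "r", encoding="utf-8") as f:
        opened.append(f)
        raise ValueError("simulated failure")  # f is closed as the error propagates


result = early_return(path)
try:
    fail_midway(path)
except ValueError:
    pass

closed_flags = [handle.closed for handle in opened]
print(result, closed_flags)
```

Both entries of closed_flags are True, so no explicit close call is needed on either path.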

Reading a file
read: reads up to the specified number of characters (bytes in binary mode) and returns them as a string; with no argument it reads the whole file
readline: reads one line and returns it as a string, the same as one step of a for loop over the file
readlines: reads the entire file and returns a list. Each item in the list is a string holding the contents of one line
for line in f loops over the file line by line. The effect is similar to calling readline repeatedly
It keeps only one line in memory at a time. Compared with readlines it takes less memory, but it may issue more reads against the IO device, so it can be slower
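
The following sketch puts the three methods and the for loop side by side on a small scratch file (the file name and contents are assumptions for the demo).

```python
import os
import tempfile

# Scratch file with three lines; name and contents are assumptions of this sketch.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

with open(path, "r", encoding="utf-8") as f:
    chunk = f.read(3)            # up to 3 characters
    rest_of_line = f.readline()  # the remainder of the current line
    remaining = f.readlines()    # every line that is left, as a list

with open(path, "r", encoding="utf-8") as f:
    looped = [line for line in f]  # iterating the file yields one line at a time

print(repr(chunk), repr(rest_of_line), remaining, looped)
```

Each call continues from where the previous one stopped, because they all share the same file pointer.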

with open('C:/Users/wyw15/Desktop/test.txt','r') as f:
    print(f.readlines())
The output is garbled

Solution: use the same encoding on both sides
Common encodings for Chinese text: GBK and UTF-8
The file is stored in one encoding, and the code decodes it with an encoding too
The two must match, otherwise the text comes out garbled
How do I find out which encoding a file uses?
Open the file in a text editor -> File -> Save As -> the encoding is shown in the bottom row of the dialog, to the left of the Save button

with open('C:/Users/wyw15/Desktop/test.txt', 'r', encoding='utf-8') as f:
    print(f.readlines())
# ['Hello world']

Encoding comes into play in these situations:
1. File storage
2. File reading and parsing
3. Network transmission and data format conversion
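
A small sketch of an encoding mismatch, assuming a scratch file written as GBK: decoding it as UTF-8 fails, while decoding it with the encoding it was written in works. The file name and sample text are assumptions.

```python
import os
import tempfile

# Scratch file; the name and the sample text are assumptions of this sketch.
path = os.path.join(tempfile.mkdtemp(), "gbk.txt")
text = "你好"

# Store the text as GBK bytes.
with open(path, "w", encoding="gbk") as f:
    f.write(text)

# Reading with a different encoding does not match the stored bytes.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

# Reading with the encoding used for writing recovers the text.
with open(path, "r", encoding="gbk") as f:
    decoded = f.read()

print(type(mismatch_error).__name__, decoded)
```

A mismatch does not always raise an error; some byte sequences decode without complaint into the wrong characters, which is the garbled output described above.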

Another way to read files
with open('C:/Users/wyw15/Desktop/test.txt', 'r' ,encoding='utf-8') as f:
    for line in f:
        print(line)

Differences between the two ways of reading:
readlines reads the whole file into memory at once
The for loop reads one line at a time
That is, readlines reads once and the for loop reads many times
readlines is fast but uses a lot of memory, so it suits small files
The for loop is slower but uses less memory, so it suits large files

Note that readline and readlines keep the line break at the end of each line
So we often write code like this to strip it

for line in f.readlines():
    print(line.strip())
# or
data = [line.strip() for line in f.readlines()]  # Remember the list comprehension syntax

Writing a file

write: writes a string to a file
To write, the file must be opened in a writable mode such as 'w' or 'a'; otherwise the write fails

with open('C:/Users/wyw15/Desktop/test.txt', 'w' ,encoding='utf-8') as f:
    f.write('hehehe') 

writelines: the argument is a list (or any iterable) in which each element is a string
There is no writeline function, because it would be equivalent to calling write with '\n' appended to the string
Likewise, writelines does not add line breaks, so each element must already end with '\n'
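
A short sketch of that behaviour, using a scratch file and a sample list that are assumptions for the demo: without explicit line breaks the elements run together.

```python
import os
import tempfile

# Scratch file and sample rows; both are assumptions of this sketch.
path = os.path.join(tempfile.mkdtemp(), "rows.txt")
rows = ["alpha", "beta", "gamma"]

# writelines writes the strings exactly as given, with no separators.
with open(path, "w", encoding="utf-8") as f:
    f.writelines(rows)
with open(path, "r", encoding="utf-8") as f:
    run_together = f.read()

# Appending '\n' to each element gives one row per line.
with open(path, "w", encoding="utf-8") as f:
    f.writelines(row + "\n" for row in rows)
with open(path, "r", encoding="utf-8") as f:
    separated = f.read().splitlines()

print(run_together, separated)
```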

About read / write buffers

From Linux we know that the C library functions fread and fwrite do the same job as the read and write system calls, except that
fread/fwrite are buffered
Python's file reads and writes can be either buffered or unbuffered
The open function takes a third parameter, buffering, that specifies whether a buffer is used and how large
it is (see help(open) or print(open.__doc__))
Use the flush method to flush the buffer immediately
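
As a rough illustration, the sketch below writes to one scratch file through the default buffer and calls flush, then writes to another with buffering=0, which Python only allows in binary mode. The file names are assumptions.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()

# Default (buffered) text file: the data is guaranteed to be on disk after flush().
buffered_path = os.path.join(tmp_dir, "buffered.txt")
with open(buffered_path, "w", encoding="utf-8") as f:
    f.write("buffered")
    f.flush()
    size_after_flush = os.path.getsize(buffered_path)

# Unbuffered binary file: each write goes straight to the operating system.
raw_path = os.path.join(tmp_dir, "raw.bin")
with open(raw_path, "wb", buffering=0) as raw:
    raw.write(b"direct")
    size_unbuffered = os.path.getsize(raw_path)

print(size_after_flush, size_unbuffered)
```

Closing a file also flushes it, so an explicit flush is mainly useful when another process needs to see the data while the file is still open.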

Moving the file pointer

Files support random access, which is done by moving the file pointer
seek: moves the file pointer to a given position. It takes two parameters. The first, offset, is the offset in bytes
The second, whence, says where the offset is measured from: 0 means from the beginning of the file (the default), 1 means from the current position, and
2 means from the end of the file
tell: returns the current position of the file pointer as an offset from the beginning of the file
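
The three whence values can be tried out on a ten-byte scratch file (an assumption for this sketch). The file is opened in binary mode, since text mode restricts seeks relative to the current position or the end.

```python
import os
import tempfile

# Ten-byte scratch file; name and contents are assumptions of this sketch.
path = os.path.join(tempfile.mkdtemp(), "digits.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(3, os.SEEK_SET)       # whence 0: 3 bytes from the beginning
    from_start = f.read(2)
    position = f.tell()          # 3 + 2 bytes read
    f.seek(2, os.SEEK_CUR)       # whence 1: skip 2 bytes from the current position
    from_current = f.read(1)
    f.seek(-2, os.SEEK_END)      # whence 2: 2 bytes before the end
    from_end = f.read()
    end_position = f.tell()

print(from_start, position, from_current, from_end, end_position)
```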

with statement and context manager

As mentioned above, file objects should be closed as soon as they are no longer needed, otherwise handles may leak
But what if the logic is complicated, or we forget to call close?
C++ uses smart pointers to manage the release of memory / handles, relying on object constructors and destructors to complete the
release automatically
In Python, however, reclaiming objects depends on the GC mechanism, which is not as timely as in C++
Python introduces the context manager to solve this kind of problem

with open('out') as f:
    print(''.join(f.readlines()))

File operations are performed inside the with block. When execution leaves the with block, f is closed automatically
Only an object that supports the context management protocol can be used in a with statement. Such an object is called a context manager. Many
built-in objects in Python are context managers, such as file objects and thread lock objects
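
The protocol consists of two methods, __enter__ and __exit__. The class below is a hypothetical example written for this sketch (Recorder is not a standard library class); it only records the order in which things happen.

```python
class Recorder:
    """Hypothetical context manager that logs when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this value is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")  # runs even if the block raised
        return False  # False means: do not swallow the exception


recorder = Recorder()
try:
    with recorder as r:
        r.events.append("body")
        raise RuntimeError("simulated failure")
except RuntimeError:
    recorder.events.append("handled")

print(recorder.events)
```

The "exit" entry appears before "handled", which is the same guarantee that lets a file object close itself before an exception leaves the with block.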

Code example: build a large text file from a small one (constructing test data)
import sys

input_file_path = sys.argv[1]
output_file_path = sys.argv[2]
output_size = int(sys.argv[3]) * 1024 * 1024  # target size, given in MB

with open(input_file_path) as input_file:
    input_data = input_file.readlines()

with open(output_file_path, 'w') as output_file:
    index = 0
    total_size = 0
    # Cycle through the input lines until the target size is reached
    while total_size <= output_size:
        line = input_data[index % len(input_data)]
        output_file.write(line)
        total_size += len(line)
        index += 1

Basic file system operations

File path operations
The os.path module contains some practical path functions

import os.path

p = '/Users/wyw15/Desktop/test.txt'
result = os.path.dirname(p)
print(result)

Splitting
basename removes the directory path and returns the file name

import os.path
p = 'aaa/bbb/ccc.txt'
result = os.path.basename(p)
print(result)
# ccc.txt

dirname removes the file name and returns the directory path

import os.path
p = 'aaa/bbb/ccc.txt'
result = os.path.dirname(p)
print(result)
# aaa/bbb

join combines separate parts into one path name

import os.path
p = 'aaa/bbb/ccc.txt'
result = os.path.split(p)  # First split the path into parts
print(f'[{result}]')
# [('aaa/bbb', 'ccc.txt')]
result = os.path.join(*result)  # Then join the parts back together
print(result)
# aaa/bbb/ccc.txt (on Windows: aaa/bbb\ccc.txt)

split returns a (dirname, basename) tuple

import os.path
p = 'aaa/bbb/ccc.txt'
result = os.path.split(p)  # Split into directory and file name
print(f'[{result}]')
# [('aaa/bbb', 'ccc.txt')]

splitdrive returns a (drive, path) tuple
splitext returns a (root, extension) tuple -- important

import os.path
p = 'aaa/bbb/ccc.txt'
result = os.path.splitext(p)
print(f'[{result}]')
# [('aaa/bbb/ccc', '.txt')]
The result is a tuple. To get just the .txt part, unpack it:
import os.path
p = 'aaa/bbb/ccc.txt'
_, result = os.path.splitext(p)
print(f'[{result}]')
# [.txt]

File information
getatime returns the most recent access time
getctime returns the creation time on Windows and the time of the last metadata change on Unix
getmtime returns the most recent modification time
getsize returns the file size in bytes
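
The time functions return seconds since the epoch as a float, so they usually need formatting before display. A minimal sketch on a scratch file (name and contents are assumptions):

```python
import os
import tempfile
import time

# Five-byte scratch file; name and contents are assumptions of this sketch.
path = os.path.join(tempfile.mkdtemp(), "info.bin")
with open(path, "wb") as f:
    f.write(b"12345")

size = os.path.getsize(path)    # size in bytes
mtime = os.path.getmtime(path)  # seconds since the epoch, as a float
atime = os.path.getatime(path)

# Convert the raw timestamp into a readable local time.
stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
print(size, stamp)
```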

Queries
exists tells whether the path (file or directory) exists -- important
It returns a bool

import os.path
p = 'aaa/sss/cc.txt'
print(os.path.exists(p))
# False
pp = 'C:/Users/wyw15/Desktop/test.txt'
print(os.path.exists(pp))
# True

isabs tells whether the path is absolute
isdir tells whether the path exists and is a directory

import os.path
p = 'C:/Users/wyw15/Desktop/test.txt'
print(os.path.isdir(p))
# False
# This is an ordinary file, not a directory

isfile tells whether the path exists and is a regular file
islink tells whether the path exists and is a symbolic link
ismount tells whether the path exists and is a mount point

samefile tells whether two path names point to the same file
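
The query functions can be compared on a scratch directory containing one file. All names here are assumptions for the sketch.

```python
import os
import tempfile

# Scratch directory with one file; all names are assumptions of this sketch.
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "a.txt")
with open(file_path, "w", encoding="utf-8") as f:
    f.write("x")

missing = os.path.join(tmp_dir, "missing.txt")
alias = os.path.join(tmp_dir, ".", "a.txt")  # a different spelling of the same file

checks = {
    "exists(file)": os.path.exists(file_path),
    "exists(missing)": os.path.exists(missing),
    "isfile(file)": os.path.isfile(file_path),
    "isdir(file)": os.path.isdir(file_path),
    "isdir(dir)": os.path.isdir(tmp_dir),
    "islink(file)": os.path.islink(file_path),
    "isabs(file)": os.path.isabs(file_path),
    "samefile(file, alias)": os.path.samefile(file_path, alias),
}
for name, value in checks.items():
    print(name, value)
```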

Extension:
The module can be imported under a shorter name
import os.path as path

Common file system operations

The os module contains many basic operations on files / directories, listed below

File operations
mkfifo()/mknod() create a named pipe / create a file system node

remove()/unlink() delete a file

import os
p = 'C:/Users/wyw15/Desktop/test_walk/'
os.remove(p + '11.txt')
After this, the file 11.txt in test_walk is deleted

rename()/renames() rename a file
stat() returns file information
symlink() creates a symbolic link
utime() updates the timestamps
tmpfile() creates and opens ('w+b') a new temporary file (Python 2 only; in Python 3 use the tempfile module)
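
A few of these calls chained together, as a sketch on scratch files whose names are assumptions. tempfile.TemporaryFile stands in here for the Python 2 os.tmpfile.

```python
import os
import tempfile

# Scratch directory; file names are assumptions of this sketch.
tmp_dir = tempfile.mkdtemp()
old_path = os.path.join(tmp_dir, "old.txt")
new_path = os.path.join(tmp_dir, "new.txt")
with open(old_path, "wb") as f:
    f.write(b"abc")

os.rename(old_path, new_path)     # rename the file
info = os.stat(new_path)          # stat result with size, timestamps, mode...
size = info.st_size
os.remove(new_path)               # delete the file
left_over = os.listdir(tmp_dir)

# Temporary file opened in 'w+b' mode; it is removed when closed.
with tempfile.TemporaryFile() as tmp:
    tmp.write(b"scratch")
    tmp.seek(0)
    scratch = tmp.read()

print(size, left_over, scratch)
```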

walk() generates all file names under a directory tree -- important

import os
p = 'C:/Users/wyw15/Desktop/test_walk'
for item in os.walk(p):
    print(item)
# ('C:/Users/wyw15/Desktop/test_walk', ['a', 'b'], [])
# ('C:/Users/wyw15/Desktop/test_walk\\a', [], ['1.txt', '2.txt'])
# ('C:/Users/wyw15/Desktop/test_walk\\b', [], ['3.txt', '4.txt'])
Because each item is a tuple, we can unpack it
import os
p = 'C:/Users/wyw15/Desktop/test_walk'
for base, _, files in os.walk(p):
    for f in files:
        print(os.path.join(base, f))  # join inserts the path separator
# C:/Users/wyw15/Desktop/test_walk\a\1.txt
# C:/Users/wyw15/Desktop/test_walk\a\2.txt
# C:/Users/wyw15/Desktop/test_walk\b\3.txt
# C:/Users/wyw15/Desktop/test_walk\b\4.txt

Directories / folders
chdir()/fchdir() change the current working directory / do the same through a file descriptor
Equivalent to the cd command

chroot() changes the root directory of the current process

listdir() lists the entries in the specified directory
Equivalent to the ls command

import os
p = 'C:/Users/wyw15/Desktop/test_walk/'
print(os.listdir(p))
# ['b']

getcwd()/getcwdu() return the current working directory / the same, but as a Unicode object (getcwdu exists only in Python 2)
mkdir()/makedirs() create a directory / create nested directories

rmdir()/removedirs() delete a directory / delete nested directories
These can only delete empty directories; to delete a directory together with its contents, use shutil.rmtree

import shutil
p = 'C:/Users/wyw15/Desktop/test_walk/'
shutil.rmtree(p + 'a')
The directory a inside test_walk is deleted, together with its contents
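
The difference between rmdir and shutil.rmtree can be seen in a scratch directory tree (the directory and file names are assumptions for this sketch).

```python
import os
import shutil
import tempfile

# Scratch tree root/x/y/f.txt; all names are assumptions of this sketch.
root = tempfile.mkdtemp()
nested = os.path.join(root, "x", "y")
os.makedirs(nested)  # creates x and y in one call
with open(os.path.join(nested, "f.txt"), "w", encoding="utf-8") as f:
    f.write("data")

# rmdir refuses to delete a directory that still has entries.
try:
    os.rmdir(nested)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

shutil.rmtree(os.path.join(root, "x"))  # deletes x and everything below it
remaining = os.listdir(root)
os.rmdir(root)                          # root is empty now, so rmdir works
root_exists = os.path.exists(root)

print(rmdir_failed, remaining, root_exists)
```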

Access / permissions
access() checks access permissions
chmod() changes the permission mode
chown()/lchown() change the owner and group ID / do the same, but without following symbolic links
umask() sets the default permission mask

File descriptor operations
open() the low-level operating system open (for ordinary files, use the built-in open() function)
read()/write() read / write data through a file descriptor
dup()/dup2() duplicate a file descriptor / do the same, but copy onto a specified file descriptor
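
For completeness, a minimal sketch of the descriptor-level calls on a scratch file (the file name is an assumption). In everyday code the built-in open is the better choice; this only shows how the low-level functions fit together.

```python
import os
import tempfile

# Scratch file; the name is an assumption of this sketch.
path = os.path.join(tempfile.mkdtemp(), "fd.bin")

fd = os.open(path, os.O_RDWR | os.O_CREAT)  # returns an integer file descriptor
try:
    written = os.write(fd, b"hello")        # number of bytes written
    os.lseek(fd, 0, os.SEEK_SET)            # rewind before reading
    data = os.read(fd, 5)

    fd2 = os.dup(fd)                        # second descriptor for the same open file
    try:
        os.lseek(fd2, 0, os.SEEK_SET)
        data_via_dup = os.read(fd2, 5)
    finally:
        os.close(fd2)
finally:
    os.close(fd)                            # descriptors must be closed explicitly

print(written, data, data_via_dup)
```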

Keywords: Python Back-end

Added by gijs25 on Thu, 20 Jan 2022 21:55:15 +0200