Day 10 of learning Python basics -- Notes

File operation (IO Technology)
A complete program generally both stores and reads data. The programs we wrote earlier did not actually store their data, so it disappeared when the Python interpreter exited. In real development, we often need to read data from external storage media (hard disk, optical disc, USB flash drive, etc.), or save the data generated by the program to files to achieve "persistent" storage.

Text and binary files

  1. Text file: stores ordinary "character" text and can be opened with a program such as Notepad. In Python 3, strings are Unicode; when a text file is opened, characters are converted to and from bytes using the encoding passed to open() (if none is given, a platform-dependent default is used). Documents edited in Word are not text files.
  2. Binary file: stores the data content as raw "bytes" and cannot be read meaningfully in Notepad; special software must be used to decode it. Common examples: MP4 video files, MP3 audio files, JPG pictures, doc documents and so on.

Create file object (open)
The open() function is used to create file objects. The basic syntax is as follows:
open(filename[, mode])
Common modes are r (read), w (write, truncating any existing content), a (append), b (binary) and + (read and write).

Basic file write operation
Writing a text file generally takes three steps:

  1. Create file object
  2. Write data
  3. Close file object
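
The three steps above can be sketched as follows. This is only a minimal illustration: the scratch file name demo.txt and the temporary directory are assumptions made so the example does not touch real files.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w", encoding="utf-8")   # 1. create the file object
written = f.write("hello\n")            # 2. write data; returns the number of characters written
f.close()                               # 3. close the file object

print(written)   # 6
print(f.closed)  # True
```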

Introduction to common codes
ASCII:
This is the earliest and most common single-byte encoding in the world, mainly used to represent modern English text. ASCII uses 7 bits and defines only 2^7 = 128 characters. Since one byte holds 8 bits (256 possible values), the highest bit of an ASCII-encoded byte is always 0.

ISO8859-1:
Also known as Latin-1. It adds the letters and symbols of Western European languages on top of ASCII and is backward compatible with ASCII. Other parts of the ISO 8859 series cover scripts such as Greek, Thai, Arabic and Hebrew.

GB2312, GBK, GB18030:

GB2312:
Its full name is "Chinese Character Coded Character Set for Information Interchange". It was released in China in 1980 and is mainly used for Chinese character processing in computer systems. GB2312 contains 6763 Chinese characters and 682 symbols. It covers most everyday usage of Chinese characters, but it cannot handle rare characters such as those found in ancient Chinese texts, so encodings such as GBK and GB18030 appeared later.

GBK:
Built on GB2312 with more Chinese characters added. It contains a total of 21003 Chinese characters.

GB18030:
The latest of the three. It was released in 2000 and made mandatory in 2001. It includes the scripts of most ethnic minorities in China and more than 70000 Chinese characters. It uses single-byte, double-byte and four-byte character encodings.

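Unicode:
Unicode was originally designed as a fixed two-byte code in which every character is represented by 16 bits (2^16 = 65536 values), including English characters that previously occupied only 8 bits. The standard has since grown beyond 16 bits, with code points running from U+0000 to U+10FFFF.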

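UTF-8:
With a fixed-width two-byte encoding, even English letters need two bytes, which makes Unicode text wasteful to transmit and store. The UTF encodings were created to solve this. UTF-8 is a variable-length encoding: ASCII characters take 1 byte and other characters take 2 to 4 bytes (most Chinese characters take 3).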
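
To make the size differences between these encodings concrete, here is a small sketch that encodes one English letter and one Chinese character and counts the resulting bytes. The helper name encoded_size is hypothetical and only used for illustration.

```python
def encoded_size(text, encoding):
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

print(encoded_size("A", "ascii"))      # 1
print(encoded_size("A", "utf-8"))      # 1
print(encoded_size("A", "utf-16-le"))  # 2
print(encoded_size("中", "utf-8"))     # 3
print(encoded_size("中", "gbk"))       # 2

# ASCII only defines 128 characters, so it cannot encode a Chinese character.
try:
    "中".encode("ascii")
except UnicodeEncodeError:
    print("ASCII cannot encode this character")
```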

write()/writelines() write data
write(a): writes string a to the file.
writelines(b): writes a list of strings to the file without adding line breaks.
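
Because writelines() adds no line breaks, the strings end up back to back unless we append "\n" ourselves. A minimal sketch, assuming a hypothetical scratch file lines.txt in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical scratch file
words = ["aa", "bb", "cc"]

f = open(path, "w", encoding="utf-8")
f.writelines(words)                     # no separators added: writes "aabbcc"
f.write("\n")
f.writelines(w + "\n" for w in words)   # add the line breaks ourselves
f.close()

f = open(path, "r", encoding="utf-8")
content = f.read()
f.close()

print(repr(content))  # 'aabbcc\naa\nbb\ncc\n'
```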

close() closes the file stream:
Since the underlying file is managed by the operating system, a file object we open should be closed explicitly by calling its close() method, which flushes any buffered data and releases the file handle.

#Use the exception mechanism to manage the closing operation of file objects
f = None
try:
    f = open(r"a.txt","w")
    strs = ["aa\n","bb\n","cc\n"]
    f.writelines(strs)
except OSError as e:
    print(e)
finally:
    #f is still None if open() itself failed, so only close a file that was opened
    if f is not None:
        f.close()

with statement (context manager)
The with keyword works with a context manager to manage resources automatically. No matter how the with block is exited, whether normally or through an exception, the file is guaranteed to be closed correctly, so the state from before the block is restored once the block finishes.

with open(r"b.txt","a") as f:
    f.write("hahahaha")
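
As a quick check of this guarantee, the sketch below raises an exception on purpose inside a with block and then inspects f.closed. The scratch path and the simulated ValueError are assumptions for illustration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "b.txt")  # hypothetical scratch file

try:
    with open(path, "a", encoding="utf-8") as f:
        f.write("hahahaha")
        raise ValueError("simulated failure inside the with block")
except ValueError as e:
    print("caught:", e)

print(f.closed)  # True: the file was closed even though the block raised
```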

Reading of text file
The following three methods are generally used to read files:

  1. read([size]): reads size characters from the file and returns them. If size is omitted, the entire file is read. Reading at the end of the file returns an empty string.
  2. readline(): reads one line and returns it. Reading at the end of the file returns an empty string.
  3. readlines(): reads the whole text file, stores each line as a string in a list, and returns the list.
[Operation] read a file by line
with open(r"b.txt","r") as f:
    while True:
        fragment = f.readline()
        if not fragment:
            break
        else:
            print(fragment,end="")
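
For comparison, here is a small sketch that uses all three read methods on the same file, plus direct iteration over the file object, which is a common alternative to the readline() loop. The scratch file and its three-line content are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "b.txt")  # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

with open(path, "r", encoding="utf-8") as f:
    head = f.read(3)              # 'fir'
    rest_of_line = f.readline()   # 'st\n' (continues from the current position)
    remaining = f.readlines()     # ['second\n', 'third\n']
    at_end = f.read()             # '' because we are at the end of the file

# A file object is iterable and yields one line at a time
with open(path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

print(head, repr(rest_of_line), remaining, repr(at_end))
print(lines)  # ['first', 'second', 'third']
```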

Reading and writing of binary files
Binary files are processed in the same way as text files: first create the file object, but specify a binary mode ("rb", "wb", "ab" and so on) so that the data is read and written as bytes.

with open("jie.JPG","rb") as f:
    with open("jie_copy.JPG","wb") as w:
        #Binary data has no real "lines", so copy it in fixed-size chunks
        while True:
            chunk = f.read(4096)
            if not chunk:
                break
            w.write(chunk)
print("Picture copy complete")

Common properties and methods of file objects
File objects encapsulate file related operations. Earlier, we learned to read and write files through file objects.
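seek(offset[,whence]):
Moves the file pointer to a new position; offset is the number of bytes to move relative to whence.
Different values of whence have different meanings:
0: calculate from the start of the file (default)
1: calculate from the current position
2: calculate from the end of the file
For files opened in text mode, whence values 1 and 2 only work with an offset of 0; open the file in binary mode to seek relative to the current position or the end.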

with open("b.txt","r",encoding="utf-8") as f:
    print("file name{0}".format(f.name))
    print(f.tell())
    print("Read content:{0}".format(str(f.readline())))
    print(f.tell())
    f.seek(0,0)
    print("Read content:{0}".format(str(f.readline())))
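
The three whence values are easiest to see in binary mode, where every offset is allowed. A minimal sketch, assuming a hypothetical ten-byte scratch file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")  # hypothetical scratch file
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(3)          # whence=0: 3 bytes from the start
    a = f.read(2)      # b'34', pointer is now at 5
    f.seek(2, 1)       # whence=1: skip 2 bytes from the current position
    b = f.read(1)      # b'7'
    f.seek(-3, 2)      # whence=2: 3 bytes before the end
    c = f.read()       # b'789'
    end = f.tell()     # 10

print(a, b, c, end)
```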

Using pickle serialization

In Python, everything is an object, and an object is essentially a "memory block for storing data". Sometimes we need to save the data in that memory block to the hard disk, or send it to another computer over the network. To do this, we need to serialize and deserialize objects. Object serialization is widely used in distributed and parallel systems.
Serialization converts an object into a "serialized" form (a byte stream) that can be stored on the hard disk or transmitted over the network. Deserialization is the reverse process: converting the serialized data that was read back into an object.
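
A minimal sketch of this round trip with the standard pickle module, using pickle.dump() to serialize and pickle.load() to deserialize. The file name data.dat and the sample dictionary are assumptions for illustration.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.dat")  # hypothetical scratch file
original = {"name": "demo", "scores": [90, 85], "nested": ("a", 1)}

with open(path, "wb") as f:      # pickle produces bytes, so use binary mode
    pickle.dump(original, f)     # serialize the object into the file

with open(path, "rb") as f:
    restored = pickle.load(f)    # deserialize it back into a new object

print(restored == original)  # True: same content
print(restored is original)  # False: a different object in memory
```

Note that pickle.load() can execute arbitrary code while rebuilding objects, so it should only be used on data from a trusted source.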

Operation of CSV files
CSV (Comma Separated Values) is a comma-delimited text format, often used for data exchange and for importing and exporting Excel files and database data.
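
The standard library's csv module can write and read this format. The sketch below is one possible way to use it; the file name people.csv and the sample rows are made up. Passing newline="" to open() is what the csv documentation recommends so that line endings are handled by the csv module itself.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "people.csv")  # hypothetical scratch file
header = ["id", "name", "note"]
rows = [["1", "Ann", "likes a, b"], ["2", "Bob", ""]]

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header)    # write one row
    writer.writerows(rows)     # write several rows; fields containing commas are quoted

with open(path, "r", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    read_header = next(reader)   # first row
    read_rows = list(reader)     # remaining rows, each a list of strings

print(read_header)
print(read_rows)
```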

os.path module
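The os.path module provides path related operations (path judgment, path splitting, path joining, folder traversal), and the os module lets us call operating system features directly.

#Run an operating system command through os.system (Windows only: opens the registry editor)
import os
os.system("regedit")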

#Test the operation of files and directories in the os module
import os
print(os.name)  #Windows -> nt; Linux and Unix -> posix
print(os.sep)  #Windows -> \; Linux and Unix -> /
print(repr(os.linesep))  #Windows -> \r\n; Linux -> \n
print(os.getcwd())
os.chdir("d:/")  #Change the current working directory to the root of drive d:
os.mkdir("book")  #Create a directory; raises FileExistsError if it already exists
#os.rmdir("book")  #Relative paths are resolved against the current working directory

#Test path checks in the os.path module
import os.path
#Judge: absolute path, directory, file and file existence
print(os.path.isabs("d:/a.txt"))
print(os.path.isdir("d:/a.txt"))
print(os.path.isfile("d:/a.txt"))
print(os.path.exists("d:/a.txt"))
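
Path splitting and joining, mentioned above, can be sketched like this. The path projects/demo/notes.txt is hypothetical and does not need to exist, because these functions only manipulate strings.

```python
import os.path

path = os.path.join("projects", "demo", "notes.txt")   # joined with the platform separator

directory, filename = os.path.split(path)       # split into directory part and last component
stem, extension = os.path.splitext(filename)    # split off the extension

print(directory)                 # projects/demo on Linux, projects\demo on Windows
print(filename)                  # notes.txt
print(stem, extension)           # notes .txt
print(os.path.basename(path))    # notes.txt
print(os.path.isabs(path))       # False: this is a relative path
print(os.path.isabs(os.path.abspath(path)))  # True
```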

[example] list all .py files in the specified directory and output the file names

#List all .py files in the specified directory and output the file names
import os
path = os.getcwd()
file_list = os.listdir(path)
for filename in file_list:
    if filename.endswith(".py"):
        print(filename,end="\t")
print("##############")
#The same result with a list comprehension
file_list2 = [filename for filename in os.listdir(path) if filename.endswith(".py")]
for f in file_list2:
    print(f,end="\t")

walk() recursively traverses all files and directories
os.walk() method:
Returns a generator that yields one 3-element tuple (dirpath, dirnames, filenames) for each directory in the tree:
dirpath: the path of the directory currently being listed
dirnames: the names of all subfolders in that directory
filenames: the names of all files in that directory

#Test walk()
import os
path = os.getcwd()
list_files = os.walk(path)
for dirpath,dirnames,filenames in list_files:
    #dirnames and filenames hold bare names, not full paths
    for dirname in dirnames:
        print(dirname)
    for filename in filenames:
        print(filename)
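
Since dirnames and filenames contain bare names, a common pattern is to join them with dirpath to get usable paths. The sketch below builds a small throwaway directory tree (the names a.py, sub, b.py and c.txt are made up) and collects every .py file under it.

```python
import os
import tempfile

# Build a small hypothetical tree: root/a.py, root/sub/b.py, root/sub/c.txt
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for rel in ("a.py", os.path.join("sub", "b.py"), os.path.join("sub", "c.txt")):
    with open(os.path.join(root, rel), "w", encoding="utf-8") as f:
        f.write("")

py_files = []
for dirpath, dirnames, filenames in os.walk(root):
    for filename in filenames:
        if filename.endswith(".py"):
            # join the bare name with dirpath to get the full path
            py_files.append(os.path.join(dirpath, filename))

# Show the results relative to root so the output is easy to read
relative = sorted(os.path.relpath(p, root) for p in py_files)
print(relative)
```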

shutil module (copy and compression)
The shutil module is part of the Python standard library. It is mainly used to copy, move and delete files and folders; it can also compress and decompress files and folders.

import shutil
shutil.copyfile("b.txt","b_copy.txt")
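
The other operations mentioned above (copying a folder, compressing, decompressing, deleting) can be sketched as follows. All directory and file names here are hypothetical and live in a temporary directory, so nothing real is touched.

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp()                      # hypothetical scratch directory
src_dir = os.path.join(work, "movie")
os.mkdir(src_dir)
with open(os.path.join(src_dir, "b.txt"), "w", encoding="utf-8") as f:
    f.write("hahahaha")

# Copy a whole folder; the destination must not exist yet
copy_dir = os.path.join(work, "movie_copy")
shutil.copytree(src_dir, copy_dir)
copied = os.listdir(copy_dir)

# Compress the folder into movie.zip, then unpack it somewhere else
archive = shutil.make_archive(os.path.join(work, "movie"), "zip", root_dir=src_dir)
unpack_dir = os.path.join(work, "unpacked")
shutil.unpack_archive(archive, unpack_dir)

# Delete the copied folder and everything in it
shutil.rmtree(copy_dir)

print(os.path.basename(archive))   # movie.zip
print(os.listdir(unpack_dir))      # ['b.txt']
print(os.path.exists(copy_dir))    # False
```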

Recursive algorithm
Recursion is a common way to solve problems by gradually reducing a problem to simpler versions of itself. The basic idea of recursion is "call yourself": a function that uses recursion calls itself directly or indirectly, and needs a termination condition so that the calls eventually stop.
Using recursion, we can solve some complex problems with simple programs, for example the Fibonacci sequence, the Tower of Hanoi, and quicksort.

[Example] compute n! using recursion
#Test recursion: compute a factorial recursively
def factorial(n):
    #Termination condition: 0! and 1! are both 1
    if n<=1:
        return 1
    else:
        return n*factorial(n-1)
print(factorial(5))
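
The Fibonacci sequence mentioned above can be written the same way. This is a sketch rather than the only approach: functools.lru_cache is added here as an assumption, to avoid recomputing the same values over and over, and the sequence is taken to start with fib(0) = 0 and fib(1) = 1.

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # remember results so each fib(n) is computed only once
def fib(n):
    """Return the n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    if n < 2:              # termination condition
        return n
    return fib(n - 1) + fib(n - 2)   # the function calls itself

print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```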

[example] use recursive algorithm to traverse all files in the directory

#Test recursion, print directories and files
import os
allfiles=[]
def getAllFiles(path,level):
    childFiles = os.listdir(path)
    for file in childFiles:
        filepath = os.path.join(path,file)
        if os.path.isdir(filepath):
            getAllFiles(filepath,level+1)
        allfiles.append("\t"*level+filepath)
#Use a raw string so the backslash in the Windows path is not treated as an escape
getAllFiles(r"D:\BS",0)
for f in reversed(allfiles):
    print(f)

Keywords: Python Back-end

Added by blt2589 on Thu, 27 Jan 2022 18:18:29 +0200