Fundamentals of Python: notes from working through Liao Xuefeng's tutorial

While studying deep learning, I found that I had forgotten much of the Python I learned two years ago, and I was especially unfamiliar with the object-oriented part. So I worked through Liao Xuefeng's tutorial to fill in the gaps in my knowledge.

Reference website: https://www.liaoxuefeng.com/wiki/1016959663602400

1.struct

Python provides the struct module to convert between bytes and other binary data types.

demo1:

import struct
# pack() converts values to bytes: '>' is big-endian (network) order, 'I' is a 4-byte unsigned integer
print(struct.pack('>I', 10240099))
# unpack() converts bytes back to values: here 'I' (4-byte unsigned integer), then 'H' (2-byte unsigned integer)
print(struct.unpack('>IH', b'\xf0\xf0\xf0\xf0\x80\x80'))

out1:

b'\x00\x9c@c'
(4042322160, 32896)
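As a small follow-up sketch (the variable names are my own, not from the tutorial), the same value can be packed in both byte orders and read back. struct.calcsize() reports how many bytes a format string describes.

```python
import struct

value = 10240099

big = struct.pack('>I', value)      # big-endian: most significant byte first
little = struct.pack('<I', value)   # little-endian: least significant byte first
print(big)      # b'\x00\x9c@c'
print(little)   # b'c@\x9c\x00'

# calcsize() returns the number of bytes a format describes: 4 (I) + 2 (H)
size = struct.calcsize('>IH')
print(size)     # 6

# unpack() always returns a tuple, even for a single value
(restored,) = struct.unpack('<I', little)
print(restored == value)  # True
```

The byte order used for unpacking has to match the one used for packing, otherwise a different number comes back.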

2.hashlib

Python's hashlib provides common digest algorithms, such as MD5, SHA1, and so on.

What is a digest algorithm? A digest algorithm, also called a hash algorithm, uses a function to convert data of any length into a fixed-length string (usually represented as a hexadecimal string).

For example, suppose you wrote an article containing the string 'how to use python hashlib - by Michael' and published its digest, '2d73d4f15c0db7f5ecb321b6a65e5d6d'. If someone tampers with your article and publishes it as 'how to use python hashlib - by Bob', you can immediately point out that Bob tampered with it, because the digest calculated from 'how to use python hashlib - by Bob' differs from the digest of the original article.

In other words, a digest algorithm uses a digest function f() to compute a fixed-length digest for data of any length, so that we can detect whether the original data has been tampered with.

A digest algorithm can reveal tampering because the digest function is a one-way function. It is easy to calculate f(data), but very difficult to recover data from the digest. Moreover, changing even a single bit of the original data produces a completely different digest.

We take the common digest algorithm MD5 as an example and calculate the MD5 value of a string:

demo2:

import hashlib

md5 = hashlib.md5()
md5.update('how to use md5 in python hashlib?'.encode('utf-8'))

'''
#If the amount of data is large, you can call update() several times in blocks
md5.update('how to use md5 in '.encode('utf-8'))
md5.update('python hashlib?'.encode('utf-8'))
'''


print(md5.hexdigest())

out2:

d26a53750bc40b38b65a520292f69306
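To connect demo2 back to the tampering example, here is a minimal sketch. The helper md5_hex() is a name I made up for illustration. It checks two things: the two versions of the article give different digests, and feeding the data to update() in blocks gives the same digest as a single call.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of a str as a hexadecimal string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()

original = 'how to use python hashlib - by Michael'
tampered = 'how to use python hashlib - by Bob'

# Different content gives a different digest, so the tampering is detectable
same_digest = md5_hex(original) == md5_hex(tampered)
print(same_digest)  # False

# Calling update() several times is equivalent to hashing the joined data once
chunked = hashlib.md5()
for chunk in ('how to use md5 in ', 'python hashlib?'):
    chunked.update(chunk.encode('utf-8'))
matches_single_call = chunked.hexdigest() == md5_hex('how to use md5 in python hashlib?')
print(matches_single_call)  # True
```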

Another common digest algorithm is SHA1. Calling SHA1 works exactly like calling MD5:

demo3:

import hashlib

sha1 = hashlib.sha1()
sha1.update('how to use sha1 in '.encode('utf-8'))
sha1.update('python hashlib?'.encode('utf-8'))
print(sha1.hexdigest())

out3:

2c76b57293ce30acef38d98f6046927161b46a44

The SHA1 result is 160 bits (20 bytes) long, usually represented as a 40-character hexadecimal string.

SHA256 and SHA512 are more secure than SHA1, but the more secure algorithms are slower and also produce longer digests.
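The digest lengths can be checked directly. This is only a quick sketch: the sample data is arbitrary, and hashlib.new() is used so that the algorithm can be chosen by name.

```python
import hashlib

data = 'how to use python hashlib'.encode('utf-8')

sizes = {}
for name in ('md5', 'sha1', 'sha256', 'sha512'):
    h = hashlib.new(name, data)
    # digest_size is in bytes; the hex string uses two characters per byte
    sizes[name] = (h.digest_size * 8, len(h.hexdigest()))
    print(name, sizes[name])
```

Each pair is (length in bits, length of the hexadecimal string), so SHA1 shows up as (160, 40) and SHA512 as (512, 128).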

Can two different pieces of data produce the same digest? Yes, because every digest algorithm maps an infinite set of inputs onto a finite set of outputs. This is called a collision. For example, Bob might try to craft an article 'how to learn hashlib in python - by Bob' whose digest is exactly the same as that of your article. This is not impossible, but it is very difficult.

Because the MD5 values of common passwords are easy to precompute, we must make sure that the stored user passwords are not the plain MD5 of those common passwords. This is done by appending a complex string to the original password before hashing, commonly known as "salting":

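import hashlib

def calc_md5(password):
    # hash the password together with a fixed salt string
    return hashlib.md5((password + 'the-Salt').encode('utf-8')).hexdigest()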

With a salted MD5, as long as the salt is not known to attackers, it is difficult to recover the plaintext password from the MD5 value even if the user chose a simple password.

However, if two users use the same simple password, such as 123456, two identical MD5 values will be stored in the database, which reveals that the two users have the same password. Is there a way for users with the same password to store different MD5 values?

If we assume that users cannot change their login name, we can compute the MD5 with the login name as part of the salt, so that users with the same password still store different MD5 values.
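A minimal sketch of this idea follows. The function name, the parameter order, and the way the login name is concatenated are my own assumptions; the text only says that the login name becomes part of the salt.

```python
import hashlib

def calc_md5_for_user(username, password):
    # Hypothetical scheme: password + login name + fixed salt string
    salted = password + username + 'the-Salt'
    return hashlib.md5(salted.encode('utf-8')).hexdigest()

# Two users with the same simple password now store different values
michael = calc_md5_for_user('michael', '123456')
bob = calc_md5_for_user('bob', '123456')
print(michael == bob)  # False

# The same user with the same password always gets the same value,
# which is what makes login verification possible
print(michael == calc_md5_for_user('michael', '123456'))  # True
```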

3.hmac

With a hash algorithm, we can verify whether a piece of data is valid by comparing hash values. For example, to check whether a user's password is correct, we compare the password_md5 saved in the database with the result of md5(password). If they match, the password entered by the user is correct.

To prevent attackers from recovering the original password from the hash with a rainbow table, the hash should not be computed on the raw input alone. Adding a salt makes the same input produce different hashes, which makes the attack much harder.

If we generate the salt randomly ourselves, we usually compute md5(message + salt). But if we think of the salt as a "key", a salted hash really means: when computing the hash of a message, different keys produce different hashes. To verify the hash, you must also supply the correct key.

This is exactly the HMAC algorithm: Keyed-Hashing for Message Authentication. It uses a standard procedure to mix the key into the hash computation.

Unlike our custom salting scheme, HMAC works with any hash algorithm, whether MD5 or SHA-1. Using HMAC instead of a home-grown salting scheme makes the program more standard and more secure.

Python's built-in hmac module implements the standard HMAC algorithm. Let's see how to use it to compute a keyed hash.

First, we need to prepare the original message, a random key, and the hash algorithm to use; MD5 is used here. The code using hmac is as follows:

demo4:

import hmac
message = b'Hello, world!'
key = b'secret'
h = hmac.new(key, message, digestmod='MD5')
# If the message is long, you can call h.update(msg) multiple times
print(h.hexdigest())

out4:

fa4ee7d173f2d97ee79022d1a7355bcf

As you can see, using hmac is very similar to using an ordinary hash algorithm. The length of the hmac output is the same as that of the underlying hash algorithm. Note that both the key and the message must be bytes, so a str has to be encoded to bytes first.
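The verification side might look like the sketch below. The helpers sign() and verify() are names I made up. hmac.compare_digest() is the standard library function for comparing digests in a way that does not leak timing information.

```python
import hmac

def sign(key, message):
    """Return the HMAC-MD5 of message under key as a hex string."""
    return hmac.new(key, message, digestmod='MD5').hexdigest()

def verify(key, message, signature):
    """Recompute the HMAC and compare it with the given signature."""
    return hmac.compare_digest(sign(key, message), signature)

message = b'Hello, world!'
signature = sign(b'secret', message)

print(verify(b'secret', message, signature))     # True: the correct key
print(verify(b'wrong-key', message, signature))  # False: a different key gives a different hash
print(verify(b'secret', b'Hello, w0rld!', signature))  # False: the message was changed
```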

4.itertools

Python's built-in module itertools provides very useful functions for working with iterable objects.

First, let's look at several "infinite" iterators provided by itertools:

demo5:

>>> import itertools
>>> naturals = itertools.count(1)
>>> for n in naturals:
...     print(n)
...
1
2
3
...

Because count() creates an infinite iterator, the code above prints the sequence of natural numbers and never stops. You can only press Ctrl+C to exit.

cycle() will repeat an incoming sequence indefinitely:

demo6:

>>> import itertools
>>> cs = itertools.cycle('ABC') # Note that strings are also a kind of sequences
>>> for c in cs:
...     print(c)
...
A
B
C
A
B
C
...

This loop never stops either.

repeat() is responsible for repeating an element indefinitely. However, if you provide the second parameter, you can limit the number of repetitions:

>>> ns = itertools.repeat('A', 3)
>>> for n in ns:
...     print(n)
...
A
A
A

An infinite sequence only iterates indefinitely inside a for loop. Merely creating the iterator object does not generate infinitely many elements in advance; in fact, it would be impossible to create infinitely many elements in memory.

>>> print(ns)
repeat('A', 0)

Although an infinite sequence can iterate indefinitely, we usually cut a finite sequence out of it based on a condition, using functions such as takewhile():

demo7:

>>> naturals = itertools.count(1)
>>> ns = itertools.takewhile(lambda x: x <= 10, naturals)
>>> list(ns)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
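takewhile() stops on a condition. When I only want "the first n elements", itertools.islice() is another standard option. The following sketch uses both; the variable names are mine.

```python
from itertools import count, cycle, islice, takewhile

# islice() takes a fixed number of elements from an infinite iterator
first_five = list(islice(count(1), 5))
print(first_five)   # [1, 2, 3, 4, 5]

abc_seven = list(islice(cycle('ABC'), 7))
print(abc_seven)    # ['A', 'B', 'C', 'A', 'B', 'C', 'A']

# takewhile() stops at the first element that fails the condition;
# count(1, 3) counts upwards from 1 in steps of 3
stepped = list(takewhile(lambda x: x <= 10, count(1, 3)))
print(stepped)      # [1, 4, 7, 10]
```

In both cases the infinite iterator is only advanced as far as needed, so nothing infinite is ever built in memory.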

chain()

chain() concatenates a group of iterable objects into one larger iterator:

demo8:

>>> for c in itertools.chain('ABC', 'XYZ'):
...     print(c)
...
A
B
C
X
Y
Z

groupby()

groupby() picks out the adjacent repeating elements in the iterator and puts them together:

demo9:

>>> for key, group in itertools.groupby('AAABBBCCAAA'):
...     print(key, list(group))
...
A ['A', 'A', 'A']
B ['B', 'B', 'B']
C ['C', 'C']
A ['A', 'A', 'A']

In fact, the grouping rule is defined by a key function. Whenever the function returns equal values for two adjacent elements, the two elements are placed in the same group, and the return value of the function is used as the key of the group. If we want to group case-insensitively, we can make the elements 'A' and 'a' return the same key:

demo10:

>>> for key, group in itertools.groupby('AaaBBbcCAAa', lambda c: c.upper()):
...     print(key, list(group))
...
A ['A', 'a', 'a']
B ['B', 'B', 'b']
C ['c', 'C']
A ['A', 'A', 'a']
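Notice that 'A' appears as a key twice in both outputs, because groupby() only merges adjacent elements. If each key should appear once, the data can be sorted by the same key function first. A sketch with made-up sample data:

```python
from itertools import groupby

def first_letter(word):
    """Key function: the upper-cased first letter of a word."""
    return word[0].upper()

words = ['apple', 'Avocado', 'banana', 'blueberry', 'cherry', 'Apricot']

# Without sorting, the trailing 'Apricot' starts a second 'A' group
unsorted_groups = [(k, list(g)) for k, g in groupby(words, first_letter)]
print([k for k, _ in unsorted_groups])   # ['A', 'B', 'C', 'A']

# Sorting by the same key first puts equal keys next to each other
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(words, key=first_letter), first_letter)]
print(sorted_groups)
# [('A', ['apple', 'Avocado', 'Apricot']), ('B', ['banana', 'blueberry']), ('C', ['cherry'])]
```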

5.contextlib

The with statement is not limited to the file object returned by open(). In fact, any object can be used in a with statement as long as it correctly implements context management.

Context management is implemented through the two methods __enter__ and __exit__. For example, the following class implements these two methods:

demo11:

class Query(object):

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print('Begin')
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type:
            print('Error')
        else:
            print('End')
    
    def query(self):
        print('Query info about %s...' % self.name)


with Query('Bob') as q:
    q.query()

out11:

Begin
Query info about Bob...
End
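demo11 only exercises the 'End' branch. The sketch below is my own variation: it records the messages in a list called log (a hypothetical name) instead of printing them, and raises an exception inside the with block to reach the 'Error' branch.

```python
log = []

class Query:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        log.append('Begin')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the with block finished normally
        log.append('Error' if exc_type else 'End')
        # Returning None (falsy) lets the exception propagate to the caller

    def query(self):
        log.append('Query info about %s...' % self.name)

try:
    with Query('Bob') as q:
        q.query()
        raise ValueError('connection lost')
except ValueError as e:
    log.append('Caught: %s' % e)

print(log)
# ['Begin', 'Query info about Bob...', 'Error', 'Caught: connection lost']
```

__exit__ runs even though the block raised, and because it does not return a true value, the exception still reaches the surrounding try/except.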

@contextmanager

Writing __enter__ and __exit__ is still cumbersome, so contextlib in the Python standard library provides a simpler way. The code above can be rewritten as follows:

demo12:

from contextlib import contextmanager

class Query(object):

    def __init__(self, name):
        self.name = name

    def query(self):
        print('Query info about %s...' % self.name)

@contextmanager
def create_query(name):
    print('Begin')
    q = Query(name)
    yield q
    print('End')


# The @contextmanager decorator accepts a generator; the yield statement hands the value to the variable in `with ... as var`, and then the with statement works normally
with create_query('Bob') as q:
    q.query()

out12:

Begin
Query info about Bob...
End

Often we want to run specific code automatically before and after a block of code. This can also be implemented with @contextmanager. For example:

demo13:

from contextlib import contextmanager

@contextmanager
def tag(name):
    print("<%s>" % name)
    yield
    print("</%s>" % name)

with tag("a1"):
    print("hello")
    print("world")

out13:

<a1>
hello
world
</a1>

The execution order of the code is:

1. The with statement first executes the code before yield, so <a1> is printed;
2. The yield then runs all the statements inside the with block, so hello and world are printed;
3. Finally, the code after yield is executed, printing </a1>.
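One detail about step 3: if the with block raises an exception, the exception is re-raised at the yield, so code written plainly after yield is skipped. Wrapping the yield in try/finally guarantees the closing step. The sketch below is my own variation on demo13 that records the output in a list named log instead of printing.

```python
from contextlib import contextmanager

log = []

@contextmanager
def tag(name):
    log.append('<%s>' % name)
    try:
        yield
    finally:
        # Runs whether the with block finished normally or raised
        log.append('</%s>' % name)

try:
    with tag('a1'):
        log.append('hello')
        raise RuntimeError('stop early')
        log.append('world')  # never reached
except RuntimeError:
    pass

print(log)  # ['<a1>', 'hello', '</a1>']
```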

closing()

If an object does not implement the context management protocol, we cannot use it in a with statement. In that case, closing() can turn the object into a context manager, provided it has a close() method. For example, use urlopen() with the with statement:

demo14:

from contextlib import closing
from urllib.request import urlopen

with closing(urlopen('https://www.python.org')) as page:
    for line in page:
        print(line)
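To see what closing() does without touching the network, here is a sketch with a made-up Connection class that only has a close() method and no __enter__ or __exit__.

```python
from contextlib import closing

class Connection:
    """Hypothetical resource: it can be closed but is not a context manager."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

with closing(Connection()) as conn:
    open_inside_block = not conn.closed
    print(open_inside_block)   # True: still open inside the block

# closing() called conn.close() when the block ended
print(conn.closed)             # True
```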

6.urllib

urllib provides a series of functions for working with URLs.

Get
The request module of urllib makes it easy to fetch the content of a URL, that is, to send a GET request to the specified page and return the HTTP response.

For example, we can fetch a Douban URL, https://api.douban.com/v2/book/2129650, and return the response. If we want to simulate a browser sending a GET request, we need to use the Request object. By adding HTTP headers to the Request object, we can disguise the request as coming from a browser. For example, simulate an iPhone 6 requesting the Douban home page:

demo15:

from urllib import request

req = request.Request('http://www.douban.com/')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with request.urlopen(req) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))

out15:

Status: 200 OK
Date: Mon, 12 Jul 2021 03:51:13 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
X-Xss-Protection: 1; mode=block
X-Douban-Mobileapp: 0
Expires: Sun, 1 Jan 2006 01:00:00 GMT
Pragma: no-cache
Cache-Control: must-revalidate, no-cache, private
X-DAE-App: talion
X-DAE-Instance: default
Set-Cookie: bid=Z3E5GFmVKng; Expires=Tue, 12-Jul-22 03:51:13 GMT; Domain=.douban.com; Path=/
X-DOUBAN-NEWBID: Z3E5GFmVKng
Server: dae
Strict-Transport-Security: max-age=15552000
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Data: #Many contents have been deleted

Post
To send a request with POST, you only need to pass in the data parameter as bytes.

We simulate a Weibo login: first read the login email and password, then pass them in the format used by the weibo.cn login page, username=xxx&password=xxx:

demo16:

from urllib import request, parse

print('Login to weibo.cn...')
email = input('Email: ')
passwd = input('Password: ')
login_data = parse.urlencode([
    ('username', email),
    ('password', passwd),
    ('entry', 'mweibo'),
    ('client_id', ''),
    ('savestate', '1'),
    ('ec', ''),
    ('pagerefer', 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
])

req = request.Request('https://passport.weibo.cn/sso/login')
req.add_header('Origin', 'https://passport.weibo.cn')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')

with request.urlopen(req, data=login_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))

out16:

Login to weibo.cn...
Email: xxx #Hidden
Password: xxx #Hidden
Status: 200 OK
Server: WeiBo/LB
Date: Mon, 12 Jul 2021 04:00:26 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
Cache-Control: no-cache, must-revalidate
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Pragma: no-cache
Access-Control-Allow-Origin: https://passport.weibo.cn
Access-Control-Allow-Credentials: true
DPOOL_HEADER: 85-144-200-aliyun-core.jpool.sinaimg.cn
Data: {"retcode":50011002,"msg":"\u7528\u6237\u540d\u6216\u5bc6\u7801\u9519\u8bef","data":{"username":"xxx","errline":691}}
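The difference between the two kinds of request can be inspected without sending anything. In this sketch the User-Agent value and the credentials are placeholders I made up. Nothing goes over the network because urlopen() is never called.

```python
from urllib import request, parse

# A Request without data is sent as GET
get_req = request.Request('http://www.douban.com/')
get_req.add_header('User-Agent', 'Mozilla/5.0 (example)')
print(get_req.get_method())   # GET

# urlencode() builds the username=xxx&password=xxx form body and
# percent-encodes special characters ('@' becomes %40, a space becomes '+')
form = parse.urlencode([('username', 'user@example.com'), ('password', 'p w')])
print(form)                   # username=user%40example.com&password=p+w

# Passing data (as bytes) turns the request into a POST
post_req = request.Request('https://passport.weibo.cn/sso/login',
                           data=form.encode('utf-8'))
print(post_req.get_method())  # POST
```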

7.XML

DOM vs SAX
There are two ways to work with XML: DOM and SAX. DOM reads the whole XML document into memory and parses it into a tree, so it uses a lot of memory and parsing is slow; its advantage is that any node of the tree can be traversed freely. SAX is a streaming mode that parses while reading, so it uses little memory and parses quickly; its disadvantage is that we need to handle the events ourselves.

Normally, SAX is preferred because DOM takes up too much memory.

Parsing XML with SAX in Python is very concise. Usually the events we care about are start_element, end_element and char_data; prepare these three handler functions and you can parse XML.

demo17:

from xml.parsers.expat import ParserCreate

class DefaultSaxHandler(object):
    def start_element(self, name, attrs):
        print('sax:start_element: %s, attrs: %s' % (name, str(attrs)))

    def end_element(self, name):
        print('sax:end_element: %s' % name)

    def char_data(self, text):
        print('sax:char_data: %s' % text)

xml = r'''<?xml version="1.0"?>
<ol>
    <li><a href="/python">Python</a></li>
    <li><a href="/ruby">Ruby</a></li>
</ol>
'''

handler = DefaultSaxHandler()
parser = ParserCreate()
parser.StartElementHandler = handler.start_element
parser.EndElementHandler = handler.end_element
parser.CharacterDataHandler = handler.char_data
parser.Parse(xml)

out17:

sax:start_element: ol, attrs: {}
sax:char_data: 

sax:char_data:     
sax:start_element: li, attrs: {}
sax:start_element: a, attrs: {'href': '/python'}
sax:char_data: Python
sax:end_element: a
sax:end_element: li
sax:char_data: 

sax:char_data:     
sax:start_element: li, attrs: {}
sax:start_element: a, attrs: {'href': '/ruby'}
sax:char_data: Ruby
sax:end_element: a
sax:end_element: li
sax:char_data: 

sax:end_element: ol
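"Handling events ourselves" usually means keeping a little state between callbacks. As a sketch (the LinkCollector class and its attribute names are my own), the handler below collects the href and text of every <a> element from the same XML instead of printing each event.

```python
from xml.parsers.expat import ParserCreate

class LinkCollector:
    def __init__(self):
        self.links = []       # finished (href, text) pairs
        self._current = None  # [href, text] while inside an <a> element

    def start_element(self, name, attrs):
        if name == 'a':
            self._current = [attrs.get('href'), '']

    def char_data(self, text):
        # Text may arrive in several pieces, so accumulate it
        if self._current is not None:
            self._current[1] += text

    def end_element(self, name):
        if name == 'a' and self._current is not None:
            self.links.append(tuple(self._current))
            self._current = None

xml = r'''<?xml version="1.0"?>
<ol>
    <li><a href="/python">Python</a></li>
    <li><a href="/ruby">Ruby</a></li>
</ol>
'''

handler = LinkCollector()
parser = ParserCreate()
parser.StartElementHandler = handler.start_element
parser.EndElementHandler = handler.end_element
parser.CharacterDataHandler = handler.char_data
parser.Parse(xml, True)  # True marks this as the final chunk of input

print(handler.links)  # [('/python', 'Python'), ('/ruby', 'Ruby')]
```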

8.HTMLParser

Using HTMLParser, you can parse the text and images in a web page.

demo18:

from html.parser import HTMLParser
from html.entities import name2codepoint

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print('<%s>' % tag)

    def handle_endtag(self, tag):
        print('</%s>' % tag)

    def handle_startendtag(self, tag, attrs):
        print('<%s/>' % tag)

    def handle_data(self, data):
        print(data)

    def handle_comment(self, data):
        print('<!--', data, '-->')

    def handle_entityref(self, name):
        print('&%s;' % name)

    def handle_charref(self, name):
        print('&#%s;' % name)

parser = MyHTMLParser()
parser.feed('''<html>
<head></head>
<body>
<!-- test html parser -->
    <p>Some <a href=\"#\">html</a> HTML&nbsp;tutorial...<br>END</p>
</body></html>''')

out18:

<html>


<head>
</head>


<body>


<!--  test html parser  -->

    
<p>
Some 
<a>
html
</a>
 HTML tutorial...
<br>
END
</p>


</body>
</html>
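In practice I usually want to extract something rather than echo every tag. The sketch below (the class name and the sample HTML are made up) collects link targets and image sources. It relies on the fact that attrs is passed to handle_starttag() as a list of (name, value) pairs.

```python
from html.parser import HTMLParser

class LinkAndImageParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # list of (name, value) pairs -> dict
        if tag == 'a' and 'href' in attrs:
            self.links.append(attrs['href'])
        elif tag == 'img' and 'src' in attrs:
            self.images.append(attrs['src'])

parser = LinkAndImageParser()
parser.feed('<p>Some <a href="#">html</a> and <a href="/python">Python</a> '
            '<img src="/logo.png" alt="logo"></p>')
parser.close()

print(parser.links)   # ['#', '/python']
print(parser.images)  # ['/logo.png']
```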

Keywords: Python

Added by fitzbean on Wed, 19 Jan 2022 05:01:09 +0200