Python crawler basics: using the threading module

Use of threading module

Python's thread module (named _thread in Python 3) is the low-level module. The threading module wraps it and offers a more convenient, higher-level interface.

1. Creating a thread object with the threading module

Continuing the earlier tea-making case, we can use the time the program spends blocked to carry out the tasks that come next. This can be done with multithreading, using the threading module, as shown below:

import time
import threading


def work():
    """Only a function object can be handed to a thread."""
    print('5.Wash tea cups: 1min')
    time.sleep(1)
    print('6.Put tea: 1min')
    time.sleep(1)


start_time = time.time()
print('1.Washing pot: 1min')
time.sleep(1)
print('2.Cool water: 1min')
time.sleep(1)
print('3.Boil water: 1min')
time.sleep(1)
print('4.Wait for the water to boil: 3min')
# Create the thread object; target is the function itself, not a call to it
work_thread = threading.Thread(target=work)

# Start thread object
work_thread.start()
time.sleep(1)  # 5. Tea cup washing: 1min
time.sleep(1)  # 6. Put tea: 1min
time.sleep(1)
print('7.Make tea: 1min')
time.sleep(1)
print('It took a total of: ', time.time() - start_time)

The case above uses a single worker thread alongside the main thread. Note that the target handed to the threading module must be a function object (a callable), not the result of calling it. threading.Thread wraps an ordinary function object in a thread object.
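As a minimal sketch of the point about function objects, the snippet below passes a function to threading.Thread without calling it and supplies its argument through args. The names prepare_cups and log are hypothetical, and the short sleeps only stand in for the one-minute steps of the tea example.

```python
import threading
import time


def prepare_cups(log):
    """Hypothetical side task that runs while the main thread is waiting."""
    time.sleep(0.2)              # stands in for "wash tea cups"
    log.append("cups ready")


log = []
start = time.perf_counter()

# Pass the function object itself (no parentheses); args supplies its arguments.
worker = threading.Thread(target=prepare_cups, args=(log,))
worker.start()

time.sleep(0.2)                  # main thread "waits for the water to boil"
worker.join()                    # make sure the side task has finished

elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.2f}s, log: {log}")
```

Because the two waits overlap, the elapsed time should be close to a single wait rather than the sum of both. Writing target=prepare_cups(log) instead would run the function in the main thread and hand its return value (None) to the thread.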

2. Creating multiple threads

When a process starts, it creates a main thread by default, because a thread is the smallest unit of a program's execution flow. With multithreading, the main thread creates several child threads. By default in Python, the process does not exit as soon as the main thread has finished its own work: non-daemon child threads keep running until their own tasks are done, and only then does the program exit.

import time
import threading


def upload():
    print("Start uploading files...")
    time.sleep(2)
    print("Finished uploading files...")


def download():
    print("Start downloading files...")
    time.sleep(2)
    print("Finished downloading files...")


if __name__ == '__main__':
    upload_thread = threading.Thread(target=upload)
    upload_thread.start()
    upload_thread.join()  # block the main thread until the upload finishes
    download_thread = threading.Thread(target=download, daemon=True)
    download_thread.start()  # daemon thread: stopped when the main thread ends
    print('End of main thread')

In other words, the main thread creates child threads when it hands out tasks, and the progress of those child threads does not block the main thread. By default, however, the program waits for the child threads to finish before it exits, even though the main thread completes its own work first. Calling join() makes the main thread wait for a specific thread at that point. If you want the whole program to end as soon as the main thread finishes, mark the child thread as a daemon thread (daemon=True).
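The difference between waiting with join() and marking a thread as a daemon can be sketched as follows. The function background_task and the Event named done are hypothetical names used only for illustration, and the delays are shortened so the example finishes quickly.

```python
import threading
import time


def background_task(done):
    """Hypothetical task that signals when it has finished."""
    time.sleep(0.1)
    done.set()


done = threading.Event()

# A thread inherits the daemon flag of the thread that creates it;
# the main thread is non-daemon, so threads it creates are non-daemon by default.
regular = threading.Thread(target=background_task, args=(done,))

# daemon=True: this thread will not keep the program alive on its own.
helper = threading.Thread(target=time.sleep, args=(0.1,), daemon=True)

regular.start()
helper.start()

regular.join()                   # wait here until the regular thread is done
print("task finished:", done.is_set())
print("helper is daemon:", helper.daemon)
```

If the helper were still running when the main thread and all non-daemon threads had finished, the interpreter would exit without waiting for it.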

3. Passing arguments to threads

To pass arguments to a thread's target function, use args for positional arguments and kwargs for keyword arguments, as follows:

import threading


def get(url, header=None):
    print(url)
    print(header)


for url in ['https://www.baidu.com', 'https://www.soso.com', 'https://www.360.com']:
    # args must be a tuple, so a single argument needs a trailing comma
    get_thread = threading.Thread(target=get, args=(url,), kwargs={'header': {'user-agent': 'pythonrequests'}})
    get_thread.start()
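A small variation on the example above may make the mapping from args and kwargs to the function's parameters easier to verify. This is only a sketch: describe, results and the example.com URLs are hypothetical, and no network request is made.

```python
import threading


def describe(url, header=None, results=None):
    """Hypothetical stand-in for a request function; it only records its arguments."""
    results.append((url, header))


results = []
threads = []
for url in ["https://example.com/a", "https://example.com/b"]:
    t = threading.Thread(
        target=describe,
        args=(url,),             # one-element tuple -> positional parameter url
        kwargs={"header": {"user-agent": "demo"}, "results": results},
    )
    t.start()
    threads.append(t)

# Wait for every thread before reading the shared list.
for t in threads:
    t.join()

print(sorted(u for u, _ in results))
```

Each thread receives its own url through args, while every thread gets the same header dictionary and the same results list through kwargs.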

4. Resource contention between threads

First, let's look at a case:

import threading
import time
import random


def add1(n):
    for i in range(100):
        time.sleep(random.randint(1, 3))
        # Every thread appends to the same file
        with open('hello.txt', mode='a', encoding='utf-8') as f:
            f.write(f'{n} hello world !' + 'hello world !' * 1024)
            f.write('\n')


if __name__ == '__main__':
    for n in range(10):
        t1 = threading.Thread(target=add1, args=(n,))
        t1.start()
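When several threads append to the same file, as in the case above, their writes can interleave. One common way to prevent this is to guard the shared resource with threading.Lock. The sketch below is an illustration under assumptions, not the original author's solution: write_lock, add_lines, the temporary file path and the reduced loop counts are all hypothetical.

```python
import os
import tempfile
import threading

write_lock = threading.Lock()    # shared by every writer thread


def add_lines(n, path, count):
    """Append count lines to path; the lock lets only one thread write at a time."""
    for _ in range(count):
        with write_lock:
            with open(path, mode="a", encoding="utf-8") as f:
                f.write(f"{n} hello world !\n")


# Write to a temporary directory so the sketch leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

threads = [threading.Thread(target=add_lines, args=(n, path, 20)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(len(lines))
```

Because each open, write and close sequence happens while the lock is held, every line in the file comes from exactly one thread and is written whole.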

Keywords: Python Multithreading crawler

Added by johnmess on Tue, 01 Feb 2022 05:22:47 +0200