Daemon thread
In Python multithreading, when the main thread's code finishes while child threads are still running, the program does not exit; it waits for those child threads to finish. This creates a problem: if a child thread runs an infinite (or very long) loop, the whole Python program cannot end. Here is an example.
    import threading
    import time

    # Non-daemon thread
    def normal_thread():
        for i in range(10000):
            time.sleep(1)
            print(f'normal thread {i}')

    print(threading.current_thread().name, 'Thread start')
    thread1 = threading.Thread(target=normal_thread)
    thread1.start()
    print(threading.current_thread().name, 'Thread end')
Running this shows that although the main thread's code has finished, the child thread keeps running, and the program only truly ends once the child thread finishes. If you want unfinished threads to be terminated when the main thread ends, set them as daemon threads. When only daemon threads are left running, the Python program exits normally. The threading module provides two ways to set up daemon threads.
threading.Thread(target=daemon_thread, daemon=True)
thread.setDaemon(True), called before thread.start() (deprecated since Python 3.10 in favor of thread.daemon = True)
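As a minimal sketch of the two approaches, the snippet below uses the constructor argument and the daemon attribute (the modern spelling of setDaemon()). The worker function and the variable names are made up for illustration. It also shows that the flag can no longer be changed once the thread has been started.

```python
import threading
import time

def worker():
    # Hypothetical task, just long enough for the threads to be alive.
    time.sleep(0.1)

# Way 1: pass daemon=True to the constructor.
t1 = threading.Thread(target=worker, daemon=True)

# Way 2: set the attribute on a thread that has not been started yet.
t2 = threading.Thread(target=worker)
t2.daemon = True

t1.start()
t2.start()

# Changing the flag after start() is rejected with a RuntimeError.
try:
    t1.daemon = False
    late_change_error = None
except RuntimeError as exc:
    late_change_error = exc

print(t1.daemon, t2.daemon)
print(type(late_change_error).__name__)

t1.join()
t2.join()
```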
    import threading
    import time

    # Daemon thread (sleeps 1s before each print)
    def daemon_thread():
        for i in range(5):
            time.sleep(1)
            print(f'daemon thread {i}')

    # Non-daemon thread (no sleep)
    def normal_thread():
        for i in range(5):
            print(f'normal thread {i}')

    print(threading.current_thread().name, 'Thread start')
    thread1 = threading.Thread(target=daemon_thread, daemon=True)
    thread2 = threading.Thread(target=normal_thread)
    # thread1.setDaemon(True)  # alternative to daemon=True; must come before start()
    thread1.start()
    thread2.start()
    print(threading.current_thread().name, 'Thread end')
thread1 is set as a daemon thread above, so the program ends as soon as the non-daemon thread and the main thread have finished. The print statement in daemon_thread() therefore never gets a chance to run, because it sits behind a one-second sleep. The output may still show normal thread lines after the MainThread 'Thread end' line: thread2 is a non-daemon thread, so the interpreter waits for it to finish before it exits and stops the daemon thread.
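One way to observe this without relying on a screenshot is to run a small script in a separate interpreter and capture what it prints. The following is only a sketch: the embedded script and its messages are invented for illustration, and it assumes sys.executable points at a working Python interpreter.

```python
import subprocess
import sys
import textwrap

# A tiny program whose only child thread is a daemon thread.
script = textwrap.dedent('''
    import threading
    import time

    def daemon_thread():
        time.sleep(1)
        print('daemon thread woke up')

    threading.Thread(target=daemon_thread, daemon=True).start()
    print('main thread end')
''')

# Run it in a fresh interpreter and collect its standard output.
completed = subprocess.run(
    [sys.executable, '-c', script],
    capture_output=True,
    text=True,
)
print(completed.stdout)
```

The child process exits right after the main thread finishes, so the message from the daemon thread should not appear in the captured output.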
Inheritance of daemon threads
A new thread inherits the daemon attribute of the thread that creates it. The main thread is a non-daemon thread, so threads created in the main thread are non-daemon by default. A thread created inside a daemon thread, however, inherits that attribute and is itself a daemon thread unless its daemon flag is set explicitly.
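The inheritance rule can be checked with a short sketch. The function names and the flags dictionary are hypothetical; each parent thread creates a child without passing daemon and records the flag the child ends up with.

```python
import threading

flags = {}

def child():
    pass

def parent(label):
    # No daemon argument here, so the child inherits the flag
    # of the thread that is creating it.
    t = threading.Thread(target=child)
    flags[label] = t.daemon
    t.start()
    t.join()

daemon_parent = threading.Thread(target=parent, args=('from daemon',), daemon=True)
normal_parent = threading.Thread(target=parent, args=('from non-daemon',), daemon=False)

daemon_parent.start()
normal_parent.start()
daemon_parent.join()
normal_parent.join()

print(flags)
```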
join() blocking
In a multithreaded crawler, different pages are usually crawled by several threads at the same time, and the results are then analyzed, processed and stored together. The program therefore has to wait for all child threads to finish before it continues, which is what the join() method is for.
The join() method blocks the calling thread (usually the main thread) until the thread it is called on has finished; the calling thread then resumes. Threads that are already running are not affected. Here is an example.
    import threading
    import time

    def block(second):
        print(threading.current_thread().name, 'The thread is running')
        time.sleep(second)
        print(threading.current_thread().name, 'Thread end')

    print(threading.current_thread().name, 'The thread is running')
    thread1 = threading.Thread(target=block, name='thread test 1', args=[3])
    thread2 = threading.Thread(target=block, name='thread test 2', args=[1])
    thread1.start()
    thread1.join()
    thread2.start()
    print(threading.current_thread().name, 'Thread end')
The code above calls join() only on thread1. Note where the call sits: before thread2.start(). The main thread blocks at thread1.join() until thread1 has finished, so thread2 is not even started until then. Because thread2 is not a daemon thread, it keeps running after the main thread's code has finished.
Notice the problem? With this execution order the threads run one after another, and the program is effectively single-threaded. This is caused by calling join() in the wrong place. Let's change the code slightly.
    import threading
    import time

    def block(second):
        print(threading.current_thread().name, 'The thread is running')
        time.sleep(second)
        print(threading.current_thread().name, 'Thread end')

    print(threading.current_thread().name, 'The thread is running')
    thread1 = threading.Thread(target=block, name='thread test 1', args=[3])
    thread2 = threading.Thread(target=block, name='thread test 2', args=[1])
    thread1.start()
    thread2.start()
    thread1.join()
    print(threading.current_thread().name, 'Thread end')
Now the program is truly multithreaded. The join() call blocks only the main thread; thread2 runs concurrently with thread1, and the main thread resumes once thread1 has finished.
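The difference between the two orderings can be made visible by timing them. This is a rough sketch rather than the article's original code: the run() helper and its parameter are invented, the sleeps are shortened, and both threads are joined at the end so the total time can be measured.

```python
import threading
import time

def block(second):
    time.sleep(second)

def run(join_before_second_start):
    t1 = threading.Thread(target=block, args=[0.4])
    t2 = threading.Thread(target=block, args=[0.2])
    begin = time.perf_counter()
    t1.start()
    if join_before_second_start:
        # thread2 cannot start until thread1 is done.
        t1.join()
    t2.start()
    # Joining a thread that has already finished returns immediately.
    t1.join()
    t2.join()
    return time.perf_counter() - begin

serial = run(True)
concurrent = run(False)
print(f'join() before thread2.start(): {serial:.2f}s')
print(f'join() after both start():     {concurrent:.2f}s')
```

With join() placed before thread2.start(), the total should be close to the sum of the two sleeps; with both threads started first, it should be close to the longer sleep alone.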
Finally, note that join() behaves the same for every thread object: it blocks whichever thread calls it, regardless of whether the thread being joined is a daemon thread. If you want real multithreaded operation, start all child threads first and only then call join() on them; otherwise the program degrades to running one thread at a time!
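For the crawler scenario, the "start everything, then join everything" pattern might look like the sketch below. The fetch() function is a stand-in that only sleeps instead of downloading a page, and the page numbers, the results list and the lock are illustrative assumptions.

```python
import threading
import time

results = []
results_lock = threading.Lock()

def fetch(page):
    # Placeholder for a network request; a real crawler would download here.
    time.sleep(0.2)
    with results_lock:
        results.append(page)

pages = [1, 2, 3, 4, 5]
threads = [threading.Thread(target=fetch, args=(page,)) for page in pages]

begin = time.perf_counter()
# First start every thread...
for t in threads:
    t.start()
# ...then wait for all of them before processing the results.
for t in threads:
    t.join()
elapsed = time.perf_counter() - begin

print(sorted(results))
print(f'{elapsed:.2f}s')
```

Because all threads sleep at the same time, the elapsed time should be near a single 0.2s wait rather than five of them, and every page is in results by the time the second loop ends.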