169 Beauty Picture Network is an image site oriented toward health, beauty, youth, and fashion, showcasing photographs of contemporary young women for its visitors.
Source code:
```python
import requests
from pyquery import PyQuery as pq
import os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
}

# Download all the pictures in one gallery
def Download_the_module(file, the_url):
    count = 1
    # Request the gallery page
    The_second_request = requests.get(the_url, headers=headers).text
    The_doc = pq(The_second_request)
    Download_the_pictures = The_doc('.big_img')
    Take_out_the = pq(Download_the_pictures.html())
    Extract_the = Take_out_the.find('img').items()
    for i in Extract_the:
        save = i.attr('src')
        The_sponse = requests.get(save, headers=headers)
        Save_the_address = 'F:/picture/' + file
        # Create the save directory if it does not exist yet
        if not os.path.exists(Save_the_address):
            os.makedirs(Save_the_address)
        with open(Save_the_address + '/%s.jpg' % count, 'wb') as f:
            f.write(The_sponse.content)
        print('Downloaded picture %s' % count)
        count += 1

# Crawl one list page for gallery addresses
def Climb_to_address(page):
    URL = 'https://www.169tp.com/gaogensiwa/list_3_%s.html' % page
    sponse = requests.get(URL, headers=headers)
    sponse.encoding = 'gbk'
    doc = pq(sponse.text)
    extract = doc('.pic').items()
    for i in extract:
        # Gallery title, used as the folder name
        The_file_name = i.text()
        # Gallery URL
        The_url = i.attr('href')
        Download_the_module(The_file_name, The_url)

# There are 616 pages in total
a = int(input('Please enter the first page to crawl: '))
b = int(input('Please enter the last page to crawl: '))
for page in range(a, b + 1):
    Climb_to_address(page)
```
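The link-extraction step can be tried without any network access. The snippet below parses a small HTML fragment the way `Climb_to_address` parses the `.pic` list; pyquery is swapped for the standard library's `html.parser` here so the sketch runs anywhere, and the HTML fragment is made up for illustration.

```python
from html.parser import HTMLParser

# Collect (title, href) pairs from anchors with class "pic",
# mirroring what doc('.pic').items() does with pyquery.
class PicLinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
        self._in_pic = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'pic' in (attrs.get('class') or '').split():
            self._in_pic = True
            self._href = attrs.get('href')
            self._text = []

    def handle_data(self, data):
        if self._in_pic:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_pic:
            self.links.append((''.join(self._text).strip(), self._href))
            self._in_pic = False

# A made-up fragment shaped like one entry of the real list page
html = '''
<div>
  <a class="pic" href="https://www.169tp.com/gaogensiwa/123.html">Gallery one</a>
  <a class="pic" href="https://www.169tp.com/gaogensiwa/456.html">Gallery two</a>
</div>
'''
parser = PicLinkParser()
parser.feed(html)
print(parser.links)
```

Each tuple corresponds to one `(The_file_name, The_url)` pair that the crawler would hand to `Download_the_module`.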
One advantage of Python is that it can take over repetitive work for us, freeing up our time so we can do the things we actually enjoy (in other words, be lazy).
There are two things worth noting about this crawler. First, the target site has no anti-crawling mechanism, so you barely need to add anything to the headers, and requests makes session and cookie handling convenient, which keeps the code simple. Second, the program cannot resume: once it is interrupted, you have to start downloading from the beginning, so there should be a way to tell the crawler where to pick up. This is actually not hard to solve; consider it homework and try it when you have time!