Back when I first started learning to write crawlers, one of my projects was to scrape the pictures from a website: [Unsplash](https://unsplash.com/). That site came to mind again today while I was thinking about projects, so this time I want to use Selenium to scroll the page down and crawl content across multiple pages. This run grabbed 160 pictures (scrolling down 10 pages) in about 50 lines of code.
# coding: utf-8
from selenium import webdriver
import time
import requests
from bs4 import BeautifulSoup

driver = webdriver.Chrome()


def get_page(driver):  # Open the target homepage
    driver.get('https://unsplash.com/')


def scroll_page(driver):  # Scroll the page and return its source
    get_page(driver)
    # JavaScript snippet that scrolls to the bottom of the page
    js = 'window.scrollTo(0, document.body.scrollHeight);'
    for i in range(10):  # Scroll the page ten times
        # Execute the js snippet
        driver.execute_script(js)
        # Each scroll needs a pause so the newly loaded content can finish
        # loading. My connection is slow, so I wait ten seconds before
        # scrolling again (an adaptive alternative is sketched after the
        # listing).
        time.sleep(10)
    return driver.page_source  # Return the page source


# Parse the page and collect the download links
def parse_page(html):
    url_list = []
    soup = BeautifulSoup(html, 'lxml')
    divs = soup.find_all('div', class_="_1OvAL _2T3hc _27nWV")
    for i in divs:
        # I usually wrap scraping loops in try/except so one or two
        # malformed elements can't stop the whole program; this site's
        # markup is clean, so nothing is actually caught here.
        try:
            links = i.find_all('a', itemprop="contentUrl")
            try:
                for z in links:
                    url_raw = z.get('href')  # href already starts with '/'
                    url = 'https://unsplash.com' + url_raw + '/download?force=true'
                    url_list.append(url)
            except Exception as e:
                print('Inner link error', e)
        except Exception as f:
            print('Outer link error', f)
    return url_list


# Download the pictures
def download_pic(url_list):
    for i in range(len(url_list)):
        # The D:/Picture folder must already exist
        address = 'D:/Picture/{0}.png'.format(i)
        # verify=False skips TLS certificate checks (see the note at the end)
        html = requests.get(url_list[i], verify=False)
        with open(address, 'wb') as f:
            print('Downloading picture {0}'.format(i + 1))
            f.write(html.content)
            print('Picture {0} written successfully'.format(i + 1))


def main(driver):
    html = scroll_page(driver)
    urls_list = parse_page(html)
    download_pic(urls_list)


if __name__ == '__main__':
    main(driver)
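One note on the scrolling loop: the fixed time.sleep(10) is just a guess at how long loading takes. A minimal adaptive alternative, assuming that newly loaded photos grow document.body.scrollHeight, is to poll the height after each scroll. This is only a sketch that reuses the imports from the listing above; max_wait and poll are my own assumed values, not part of the original code.

# Sketch: scroll once, then wait until the page height grows or max_wait
# seconds pass. max_wait and poll are assumed values.
def scroll_once_and_wait(driver, max_wait=30, poll=0.5):
    last_height = driver.execute_script('return document.body.scrollHeight;')
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    waited = 0
    while waited < max_wait:
        time.sleep(poll)
        waited += poll
        height = driver.execute_script('return document.body.scrollHeight;')
        if height > last_height:
            return True  # new content has loaded
    return False  # height never grew, so we are probably at the real bottom

With this helper, the body of the for loop in scroll_page reduces to a single scroll_once_and_wait(driver) call, and a fast connection no longer pays the full ten seconds per scroll.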
The above is the complete source code. Since it pulls together quite a few topics at once (Selenium, JavaScript scrolling, BeautifulSoup, requests), it's probably not the friendliest example for complete beginners.
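One last note on the download step: verify=False turns off TLS certificate verification, which is handy when a local proxy breaks certificate checks but is otherwise better avoided. A more defensive variant, with verification on, streamed writes, and an explicit timeout, could look like the sketch below; the folder default, the timeout, and the chunk size are my own assumptions.

import os

# Sketch: download with certificate verification, streaming, and basic
# error handling. folder, timeout, and chunk_size are assumed values.
def download_pic_safe(url_list, folder='D:/Picture'):
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    for i, url in enumerate(url_list):
        response = requests.get(url, stream=True, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        address = '{0}/{1}.png'.format(folder, i)
        with open(address, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        print('Picture {0} written successfully'.format(i + 1))

Either version plugs into main() unchanged.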