I. Simple code example
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()  # Open a new page
    await page.goto('https://www.baidu.com/')  # Visit Baidu
    await page.screenshot({'path': 'baidu.png'})  # Take a screenshot and save it
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
II. Methods of the page object
Every method below is a coroutine, called as: await page.method(...)
1. Settings related
setUserAgent(str)
Set the User-Agent
setCookie(cookie1, cookie2, ...)
Set one or more cookies
Each cookie should be a dictionary containing these fields:
name (str): required
value (str): required
url (str)
domain (str)
path (str)
expires (number): Unix time in seconds
httpOnly (bool)
secure (bool)
sameSite (str): 'Strict' or 'Lax'
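As a rough sketch of the field list above, a small helper can assemble a cookie dictionary and reject unknown keys before it is handed to setCookie. The names make_cookie and apply_cookies and all example values are hypothetical; only the field names come from the list above, and `page` is assumed to be a pyppeteer page object.

```python
# Hypothetical helper: builds a cookie dict using only the fields listed above.
ALLOWED_FIELDS = {"name", "value", "url", "domain", "path",
                  "expires", "httpOnly", "secure", "sameSite"}


def make_cookie(name, value, **optional):
    """Return a cookie dict; name and value are required."""
    unknown = set(optional) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown cookie fields: {sorted(unknown)}")
    same_site = optional.get("sameSite")
    if same_site is not None and same_site not in ("Strict", "Lax"):
        raise ValueError("sameSite must be 'Strict' or 'Lax'")
    cookie = {"name": name, "value": value}
    cookie.update(optional)
    return cookie


async def apply_cookies(page):
    # `page` is assumed to be a pyppeteer page; the values are made up.
    await page.setCookie(
        make_cookie("session", "abc123", domain=".example.com", path="/"),
        make_cookie("theme", "dark", url="https://example.com/"),
    )
```

Validating the keys up front keeps a typo such as httponly from being silently passed along.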
2. Page related
goto(url)
Navigate to the given URL
reload()
Reload the current page
goBack()/goForward()
Go back / go forward in the page history
3. Executing JavaScript
evaluate(js_str)
Execute a JavaScript string in the page context and return its result
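A minimal sketch of evaluate in use, assuming `page` is a pyppeteer page object that has already navigated somewhere. The helper name page_summary is hypothetical. The JavaScript is written as an arrow function so that it can return an object, which comes back to Python as a dictionary.

```python
async def page_summary(page):
    """Run JavaScript in the page and return its result as a Python dict."""
    # The function body runs in the browser; the returned object is
    # serialized and handed back to Python.
    js = "() => ({title: document.title, links: document.links.length})"
    return await page.evaluate(js)
```

For a bare expression rather than a function, evaluate also accepts force_expr=True, as in page.evaluate('document.title', force_expr=True).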
4. Screenshots
screenshot(dict)
Keys in the dict:
path (str): the file path where the image is saved. The screenshot type is inferred from the file extension. (This key alone is usually enough.)
type (str): the screenshot type, either jpeg or png. The default is png.
quality (int): the image quality, between 0 and 100. Not applicable to png images.
fullPage (bool): if True, capture the full scrollable page. The default is False.
clip (dict): the clipping area of the page. It should contain the following fields:
    x (int): x coordinate of the upper left corner of the clipping area.
    y (int): y coordinate of the upper left corner of the clipping area.
    width (int): the width of the clipping area.
    height (int): the height of the clipping area.
omitBackground (bool): hide the default white background, which allows screenshots with transparency.
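The keys above can be assembled with a small builder. This is only a sketch: screenshot_options is a hypothetical helper, and the check that quality is used only with jpeg files mirrors the note in the list rather than anything enforced here by pyppeteer itself.

```python
def screenshot_options(path, full_page=False, clip=None, quality=None):
    """Build the dict passed to page.screenshot()."""
    options = {"path": path}
    if full_page:
        options["fullPage"] = True
    if quality is not None:
        # quality does not apply to png images
        if not path.lower().endswith((".jpg", ".jpeg")):
            raise ValueError("quality only applies to jpeg screenshots")
        if not 0 <= quality <= 100:
            raise ValueError("quality must be between 0 and 100")
        options["quality"] = quality
    if clip is not None:
        x, y, width, height = clip  # upper left corner plus size
        options["clip"] = {"x": x, "y": y, "width": width, "height": height}
    return options
```

With a page object in hand, it would be used as: await page.screenshot(screenshot_options('shot.jpg', quality=80)).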
5. Saving a PDF
pdf(dict)
Returns: the generated PDF as a bytes object.
Keys in the dict:
path (str): the file path where the PDF is saved.
scale (float): the scale of the web page rendering. The default is 1.
displayHeaderFooter (bool): display the header and footer. The default is False.
headerTemplate (str): HTML template for the print header. It should be valid HTML markup, and elements with the following classes have values injected into them:
    date: formatted print date
    title: document title
    url: document location
    pageNumber: current page number
    totalPages: total pages in the document
footerTemplate (str): HTML template for the print footer. It uses the same format as headerTemplate.
printBackground (bool): print background graphics. The default is False.
landscape (bool): paper orientation. The default is False.
pageRanges (str): the page ranges to print, for example "1-5,8,11-13". The default is an empty string, meaning all pages.
format (str): paper format. If set, it takes precedence over width and height. The default is Letter.
width (str): paper width, accepts values labeled with units.
height (str): paper height, accepts values labeled with units.
margin (dict): paper margins. The default is None.
    top (str): top margin, accepts values labeled with units.
    right (str): right margin, accepts values labeled with units.
    bottom (str): bottom margin, accepts values labeled with units.
    left (str): left margin, accepts values labeled with units.
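To make the template classes concrete, here is a sketch of an options dict with a header and footer. The helper name pdf_options, the inline styles, the A4 format and the margin values are all illustrative assumptions; only the key names and class names come from the list above.

```python
def pdf_options(path, landscape=False, page_ranges=""):
    """Build the dict passed to page.pdf(), with a simple header and footer."""
    style = 'style="font-size:8px; width:100%; text-align:center;"'
    # Elements with these classes have their values filled in when printing.
    header = (f'<div {style}><span class="title"></span> '
              f'<span class="date"></span></div>')
    footer = (f'<div {style}>Page <span class="pageNumber"></span> '
              f'of <span class="totalPages"></span></div>')
    return {
        "path": path,
        "format": "A4",  # takes precedence over width/height
        "landscape": landscape,
        "pageRanges": page_ranges,  # empty string means all pages
        "printBackground": True,
        "displayHeaderFooter": True,
        "headerTemplate": header,
        "footerTemplate": footer,
        # Margins leave room for the header and footer.
        "margin": {"top": "20mm", "right": "10mm",
                   "bottom": "20mm", "left": "10mm"},
    }
```

It would be used as: pdf_bytes = await page.pdf(pdf_options('out.pdf')). Note that PDF generation is only supported when Chrome runs in headless mode.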
6. Getting page content
content()
The full HTML of the page
To get only the text of the page, this works:
print(await page.evaluate('document.body.textContent', force_expr=True))
cookies()
Page cookies
title()
Title
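7. Getting elements
querySelector returns an ElementHandle, or None if nothing matches. querySelectorAll and xpath return a list of ElementHandle. The ...Eval methods return the result of the JavaScript.
print(await page.querySelector('CSS selector'))  # Get the first match
print(await page.querySelectorAll('CSS selector'))  # Get all matches
await page.querySelectorEval('CSS selector', 'js_str', args)  # Get the first match and run js_str on it; args are the arguments js_str needs
await page.querySelectorAllEval('CSS selector', 'js_str', args)  # Get all matches and run js_str on them; args are the arguments js_str needs
await page.xpath('XPath expression')  # Get all matches for an XPath expression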
8. Ways of waiting
await page.waitForXPath('//h3', timeout=300)  # Wait for an XPath match; timeout is in milliseconds
await page.waitForNavigation(waitUntil='networkidle0')  # Wait until navigation finishes and the network is idle
await page.waitForFunction('document.getElementsByTagName("h3").length > 0')  # Wait until the expression is truthy
await page.waitForSelector('.t')  # Wait for a CSS selector match
await page.waitFor('#t')  # A plain string is treated as a selector
await page.waitForFunction('document.querySelector("").innerText.length == 7')  # Replace "" with a real selector
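Conceptually, each of these waits boils down to "check a condition repeatedly until it holds or a timeout expires". The sketch below shows that idea in plain asyncio. It is not how pyppeteer implements its waits, and wait_until is a hypothetical name.

```python
import asyncio
import time


async def wait_until(predicate, timeout_ms=30000, interval_ms=100):
    """Poll an async predicate until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        result = await predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise asyncio.TimeoutError(
                f"condition not met within {timeout_ms} ms")
        await asyncio.sleep(interval_ms / 1000)
```

With a page object, it could be called as: element = await wait_until(lambda: page.querySelector('.t'), timeout_ms=5000). In practice the built-in wait methods are usually the better choice.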
9. Getting a property or the text of an ElementHandle
await (await ElementHandle_obj.getProperty('attribute')).jsonValue()  # A property, by name
await (await ElementHandle_obj.getProperty('textContent')).jsonValue()  # The text
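The double await above is easier to read when wrapped in a helper. A minimal sketch, assuming `page` is a pyppeteer page object; the names get_property and link_info are hypothetical.

```python
async def get_property(element, name):
    """Read one property from an ElementHandle as a plain Python value."""
    handle = await element.getProperty(name)  # a handle to the JS value
    return await handle.jsonValue()           # converted to a Python value


async def link_info(page, selector):
    """Return the text and href of the first element matching selector."""
    element = await page.querySelector(selector)
    if element is None:  # querySelector returns None when nothing matches
        return None
    return {
        "text": await get_property(element, "textContent"),
        "href": await get_property(element, "href"),
    }
```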
10. Interacting with the browser
See the render operation in requests_html.
requests_html wraps pyppeteer, so it is a useful reference.