Usage Summary of the Python pyppeteer Module

I. Simple code example

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()    # Open a new page
    await page.goto('https://www.baidu.com/')    # Visit Baidu
    await page.screenshot({'path': 'baidu.png'})    # Take a screenshot and save it
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

II. Methods of the page object

Call pattern: await page.method(...)

1. Settings related

setUserAgent(str)

Set the User-Agent string.

setCookie(cookie1, cookie2, ...)

Set cookies.

Each cookie should be a dictionary containing these fields:
name (str): required
value (str): required
url (str)
domain (str)
path (str)
expires (number): Unix time in seconds
httpOnly (bool)
secure (bool)
sameSite (str): 'Strict' or 'Lax'
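
As a small illustration of the settings above, here is a sketch that builds a cookie dictionary and applies it together with a User-Agent. The helper names make_cookie and prepare_page are made up for this example, and page is assumed to be a pyppeteer page obtained from browser.newPage() as in section I.

```python
# Field names accepted in a cookie dictionary, as listed above.
COOKIE_FIELDS = {"name", "value", "url", "domain", "path",
                 "expires", "httpOnly", "secure", "sameSite"}


def make_cookie(name, value, **extra):
    """Build a cookie dict (hypothetical helper), rejecting unknown fields."""
    unknown = set(extra) - COOKIE_FIELDS
    if unknown:
        raise ValueError(f"unknown cookie fields: {sorted(unknown)}")
    return {"name": name, "value": value, **extra}


async def prepare_page(page, user_agent, cookies):
    """Set the User-Agent and cookies on an already opened page."""
    await page.setUserAgent(user_agent)
    # setCookie takes each cookie dict as a separate positional argument.
    await page.setCookie(*cookies)
```

A call could look like await prepare_page(page, 'my-agent/1.0', [make_cookie('session', 'abc', url='https://www.baidu.com/')]). The url field is given here so the cookie is tied to a site even before the page has navigated anywhere.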

2. Page related

goto(url)

Navigate to the given URL.

reload()

Reload the current page.

goBack()/goForward()

Go back / go forward in the page history.
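
The navigation methods can be combined as in the following sketch. The function name browse and its two URL parameters are made up for illustration; page is assumed to be an open pyppeteer page.

```python
async def browse(page, first_url, second_url):
    """Visit two pages, step back and forward, then reload (illustrative)."""
    await page.goto(first_url)
    await page.goto(second_url)
    await page.goBack()        # history now points at first_url again
    url_after_back = page.url  # page.url is a plain property, not a coroutine
    await page.goForward()     # forward to second_url
    await page.reload()        # reload second_url
    return url_after_back, page.url
```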

3. Executing JavaScript

evaluate(js_str)

Execute a JavaScript string in the page context and return its result.
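
A short sketch of the three common ways to call evaluate: with a function, with a function plus arguments, and with a plain expression. The function name page_summary and the values it collects are only examples; page is assumed to be an open pyppeteer page.

```python
async def page_summary(page):
    """Collect a few values from the page with evaluate() (illustrative)."""
    # A JavaScript function is called inside the page; its return value
    # comes back to Python as a JSON-compatible value.
    title = await page.evaluate("() => document.title")
    # Extra positional arguments are passed on to the JavaScript function.
    area = await page.evaluate("(w, h) => w * h", 4, 5)
    # force_expr=True treats the string as an expression, not a function.
    link_count = await page.evaluate("document.links.length", force_expr=True)
    return {"title": title, "area": area, "link_count": link_count}
```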

4. Screenshots

screenshot(dict)

Keys in the dict:

path (str): the file path where the image is saved. The screenshot type is inferred from the file extension. In most cases this is the only key you need.

type (str): the screenshot type, either jpeg or png. The default is png.

quality (int): the image quality, from 0 to 100. Not applicable to png images.

fullPage (bool): if True, capture the full scrollable page. The default is False.

clip (dict): the clipping region of the page. This option should contain the following fields:

x (int): the x coordinate of the upper left corner of the clipping region.

y (int): the y coordinate of the upper left corner of the clipping region.

width (int): the width of the clipping region.

height (int): the height of the clipping region.

omitBackground (bool): hide the default white background so that screenshots can be captured with transparency.
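
To show how these keys fit together, here is a sketch that assembles the options dictionary and passes it to screenshot. The helper names screenshot_options and capture are made up for this example, and rejecting quality for non-jpeg paths is a choice of this sketch, not something pyppeteer is claimed to do.

```python
def screenshot_options(path, full_page=False, clip=None, quality=None):
    """Build the dict passed to page.screenshot (hypothetical helper)."""
    options = {"path": path}
    if full_page:
        options["fullPage"] = True
    if clip is not None:
        # clip is given here as an (x, y, width, height) tuple.
        x, y, width, height = clip
        options["clip"] = {"x": x, "y": y, "width": width, "height": height}
    if quality is not None:
        # quality does not apply to png, so this sketch only allows jpeg paths.
        if not path.lower().endswith((".jpg", ".jpeg")):
            raise ValueError("quality only applies to jpeg images")
        options["quality"] = quality
    return options


async def capture(page, path, **kwargs):
    """Take a screenshot of an open page and return the image bytes."""
    return await page.screenshot(screenshot_options(path, **kwargs))
```

For example, await capture(page, 'top.jpg', clip=(0, 0, 800, 200), quality=80) would save only the top strip of the page as a jpeg.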

5. Saving as PDF

pdf(dict)

Return: the generated PDF as a bytes object.

path (str): the file path where the PDF is saved.
scale (float): the scale of the web page rendering. The default is 1.
displayHeaderFooter (bool): display the header and footer. The default is False.
headerTemplate (str): HTML template for the print header. Should be valid HTML markup; elements with the following classes have the printed values injected into them:
 date: formatted print date
 title: document title
 url: document location
 pageNumber: current page number
 totalPages: total pages in the document
footerTemplate (str): HTML template for the print footer. Uses the same format as headerTemplate.
printBackground (bool): print background graphics. The default is False.
landscape (bool): paper orientation. The default is False.
pageRanges (str): the page ranges to print, for example "1-5,8,11-13". The default is an empty string, which means all pages.
format (str): paper format. If set, takes precedence over width or height. The default is Letter.
width (str): paper width; accepts values labeled with units.
height (str): paper height; accepts values labeled with units.
margin (dict): paper margins. The default is None.
 top (str): top margin; accepts values labeled with units.
 right (str): right margin; accepts values labeled with units.
 bottom (str): bottom margin; accepts values labeled with units.
 left (str): left margin; accepts values labeled with units.
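
The options above can be combined as in this sketch, which prints an A4 PDF with a title header and a page-number footer. The function name save_pdf, the margin values, and the inline font-size style are assumptions chosen for the example. Note that the pyppeteer documentation states that PDF generation is only supported when Chrome runs headless.

```python
async def save_pdf(page, path):
    """Save the current page as an A4 PDF and return the PDF bytes."""
    options = {
        "path": path,
        "format": "A4",
        "printBackground": True,
        "displayHeaderFooter": True,
        # The font size is set explicitly (an assumption of this sketch)
        # so that the header and footer text is large enough to read.
        "headerTemplate": (
            '<div style="font-size:10px; width:100%; text-align:center;">'
            '<span class="title"></span></div>'
        ),
        "footerTemplate": (
            '<div style="font-size:10px; width:100%; text-align:center;">'
            '<span class="pageNumber"></span> / '
            '<span class="totalPages"></span></div>'
        ),
        # Margins leave room for the header and footer.
        "margin": {"top": "20mm", "bottom": "20mm",
                   "left": "15mm", "right": "15mm"},
    }
    return await page.pdf(options)
```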

6. Getting content

content()

Full HTML source of the page.

To get only the text of the page body, this also works:

print(await page.evaluate('document.body.textContent', force_expr=True))

cookies()

Cookies of the current page.

title()

Page title.

7. Get elements

querySelector returns an ElementHandle, or None if nothing matches. querySelectorAll and xpath return a list of ElementHandle objects, which is empty if nothing matches.

print(await page.querySelector('CSS selector'))    #Get the first match
print(await page.querySelectorAll('CSS selector'))  #Get all matches

querySelectorEval('CSS selector', 'js_str', *args)  #Get the first match and run js on it; args are passed to the js function
querySelectorAllEval('CSS selector', 'js_str', *args)  #Get all matches and run js on them; args are passed to the js function

await page.xpath('XPath expression')
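
The sketch below puts the element lookups side by side. The function name heading_report and the tag parameter are made up for illustration; tag is assumed to be a plain tag name such as 'h3', so that it is valid both as a CSS selector and inside an XPath expression.

```python
async def heading_report(page, tag="h3"):
    """Look up elements by tag name in several ways (illustrative)."""
    first = await page.querySelector(tag)        # ElementHandle or None
    if first is None:
        # Check first: querySelectorEval raises an error when nothing matches.
        return {"count": 0, "first_text": None, "all_text": [], "xpath_count": 0}
    handles = await page.querySelectorAll(tag)   # list of ElementHandle
    # The js function receives the first matching element.
    first_text = await page.querySelectorEval(tag, "(el) => el.textContent")
    # The js function receives an array of all matching elements.
    all_text = await page.querySelectorAllEval(
        tag, "(els) => els.map(el => el.textContent)")
    by_xpath = await page.xpath(f"//{tag}")      # list of ElementHandle
    return {"count": len(handles), "first_text": first_text,
            "all_text": all_text, "xpath_count": len(by_xpath)}
```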

8. Waiting

# await page.waitForXPath('//h3', timeout=300)    #timeout is in milliseconds
# await page.waitForNavigation(waitUntil='networkidle0')
# await page.waitForFunction('document.getElementsByTagName("h3").length > 0')
# await page.waitForSelector('.t')
# await page.waitFor('#t')    #a plain string is treated as a selector
# await page.waitForFunction('document.querySelector("CSS selector").innerText.length == 7')
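
Two patterns that often come up when waiting are sketched below: turning a timeout into a True/False result, and waiting for the navigation triggered by a click. The function names are made up for this example, and page is assumed to be an open pyppeteer page. Catching asyncio.TimeoutError relies on pyppeteer's own TimeoutError being derived from it.

```python
import asyncio


async def wait_for_element(page, selector, timeout_ms=3000):
    """Return True once selector appears, False if the wait times out."""
    try:
        await page.waitForSelector(selector, timeout=timeout_ms)
        return True
    except asyncio.TimeoutError:
        return False


async def click_and_wait(page, selector):
    """Click an element and wait for the navigation it triggers."""
    # The wait is started together with the click, so a navigation that
    # finishes quickly is not missed.
    await asyncio.gather(
        page.waitForNavigation(waitUntil="networkidle0"),
        page.click(selector),
    )
```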

9. Getting a property or the text of an ElementHandle

await (await ElementHandle_obj.getProperty('property name')).jsonValue()
await (await ElementHandle_obj.getProperty('textContent')).jsonValue()  #Text
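
The two-step await above can be wrapped in a small helper, as in this sketch. The names read_property and link_table are made up for illustration; handle is assumed to be an ElementHandle and page an open pyppeteer page.

```python
async def read_property(handle, name="textContent"):
    """Read one DOM property of an ElementHandle as a Python value."""
    prop = await handle.getProperty(name)  # returns a JSHandle
    return await prop.jsonValue()          # converts it to a Python value


async def link_table(page):
    """Return (text, href) pairs for every link on the page."""
    rows = []
    for handle in await page.querySelectorAll("a"):
        text = await read_property(handle, "textContent")
        href = await read_property(handle, "href")
        # Guard against a missing text value before stripping whitespace.
        rows.append(((text or "").strip(), href))
    return rows
```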

10. Interacting with the browser

See the render operation in requests-html.

requests-html wraps pyppeteer and is a useful reference.

Added by danielson2k on Mon, 02 Dec 2019 15:30:10 +0200
