Do you understand all of these beginner Python questions? (learning path included)

Python basic syntax learning path

Python concept hierarchy

Expression → creates and processes objects

Statement → contains expressions

Logical unit → a function or class, composed of statements

Module → a .py code file

Package → a group of related files or modules (a package is a folder, a module is a file inside it, and the folder contains an __init__.py file)

Program → several packages plus several files
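
The first three levels can be seen in a few lines. This is only a minimal sketch; the names total, label, and describe are made up for illustration.

```python
# Expression: creates and processes objects, and evaluates to a value.
total = 2 + 3 * 4          # "2 + 3 * 4" is an expression; the whole line is a statement

# Statement: contains expressions and performs an action.
if total > 10:             # "total > 10" is an expression inside an if statement
    label = "large"
else:
    label = "small"

# Logical unit: a function (or class) composed of statements.
def describe(value):
    """Return a short description of a number."""
    size = "large" if value > 10 else "small"
    return f"{value} is {size}"

print(total, label)        # 14 large
print(describe(3))         # 3 is small
```

Saving these lines as a .py file would turn them into a module, the next level up.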

What are the ten common beginner mistakes?

1. Variable naming problem

2. The difference between numeric and string operations

3. The difference between a list and a dictionary

4. Sequence index problem

5. Dictionary index problem

6. range()

7. Two kinds of operators: assignment (=) and comparison (==)

8. Infinite loop problem

*9. The difference between f(x) and f(x)

10. The difference between return and print in function
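
Several of these points are easiest to see in code. The snippet below is a short sketch covering items 2 to 8 and 10; all variable and function names are invented for illustration.

```python
# 2. Numbers vs. strings: "+" adds numbers but concatenates strings.
print(1 + 2)            # 3
print("1" + "2")        # 12
print(int("1") + 2)     # 3 (convert the string first)

# 3-5. A list is indexed by position (starting at 0); a dictionary by key.
books = ["A", "B", "C"]
scores = {"A": 8.4, "B": 8.9}
print(books[0])              # A
print(books[-1])             # C
print(scores["A"])           # 8.4
print(scores.get("Z", 0))    # 0: .get() avoids a KeyError for a missing key

# 6. range(n) yields 0 .. n-1; the stop value itself is excluded.
print(list(range(3)))        # [0, 1, 2]
print(list(range(1, 4)))     # [1, 2, 3]

# 7. "=" assigns, "==" compares.
x = 5
print(x == 5)                # True

# 8. A while loop needs something that eventually makes its condition false.
count = 0
while count < 3:
    count += 1               # without this line the loop would never end
print(count)                 # 3

# 10. return hands a value back to the caller; print only displays it.
def add_and_return(a, b):
    return a + b

def add_and_print(a, b):
    print(a + b)             # no return statement, so the call evaluates to None

returned = add_and_return(1, 2)
printed = add_and_print(1, 2)   # prints 3
print(returned, printed)        # 3 None
```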

Implementing your first data crawler in Python

Let's build a crawler for Douban book ratings.

Import the modules:

import requests
from bs4 import BeautifulSoup

print('Module imported successfully')

The code comments are self-explanatory, so no further explanation is given here.

The code is as follows:

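# Request the page and parse it
# (assumption: same URL and 'lxml' parser as in get_data further down)
r = requests.get('https://book.douban.com/latest')
soup = BeautifulSoup(r.text, 'lxml')

# Extract tags
# print(soup.head)  # Header information
print(soup.title)  # Title
print(soup.a)  # First <a> tag found

Output:

<title>New book Express</title>
<a class="nav-login" href="https://accounts.douban.com/passport/login?source=book" rel="nofollow">Log in / register</a>
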
# Tag name, attributes, and text
print(soup.a.name, type(soup.a.name))
print(soup.a.attrs, type(soup.a.attrs))
print(soup.a.text, type(soup.a.text))

soup.a.attrs['href']  # Look up a single attribute value

Output:

a <class 'str'>
{'href': 'https://accounts.douban.com/passport/login?source=book', 'class': ['nav-login'], 'rel': ['nofollow']} <class 'dict'>
Log in / register <class 'str'>


# find_all() → find all matching tags
urls = soup.find('div', class_="grid-12-12 clearfix").find_all('a')
url_lst = []

# Save every second URL (urls[::2])
for url in urls[::2]:
    url_lst.append(url['href'])

print(len(url_lst))
print(url_lst[:5])

Output:

40
['https://book.douban.com/subject/30475767/', 'https://book.douban.com/subject/30488936/', 'https://book.
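
The loop above walks over urls[::2], a slice with step 2 that keeps only the elements at even positions. A plausible reason, which is an assumption about the page layout and not something the source states, is that each book contributes two links to the same address. A minimal sketch with made-up links:

```python
# Hypothetical link list: each book appears twice in a row.
links = [
    "https://example.com/book/1/", "https://example.com/book/1/",
    "https://example.com/book/2/", "https://example.com/book/2/",
    "https://example.com/book/3/", "https://example.com/book/3/",
]

every_second = links[::2]               # start at index 0, take every 2nd element
print(len(links), len(every_second))    # 6 3
print(every_second)

# An alternative that does not rely on the links alternating:
# dict.fromkeys() drops duplicates while keeping first-seen order.
unique_links = list(dict.fromkeys(links))
print(unique_links == every_second)     # True
```

If the page layout ever changes so that links no longer come in pairs, the slice would silently skip books, while the de-duplication variant would still return each address once.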
# Create a function that collects the page information
def get_data(ui):
    # Request the page and parse it
    ri = requests.get(url=ui)
    soupi = BeautifulSoup(ri.text, 'lxml')
    # Each book sits in a <div class="detail-frame">
    infors = soupi.find_all('div', class_="detail-frame")
    lst = []
    for i in infors:
        # The first three <p> tags hold the score, other information, and introduction
        paragraphs = i.find_all('p')
        dic = {}
        dic['Title'] = i.find('h2').text.replace('\n', '')
        dic['score'] = paragraphs[0].text.replace('\n', '').replace(' ', '')
        dic['Other information'] = paragraphs[1].text.replace('\n', '').replace(' ', '')
        dic['brief introduction'] = paragraphs[2].text.replace('\n', '').replace(' ', '')
        lst.append(dic)
    return lst

# Call the function to collect the data
url = 'https://book.douban.com/latest'
result = get_data(url)

result[:3]

Output:

[{'Title': 'Life sea',
  'score': '8.4',
  'Other information': 'Mai Jia/Beijing October literature and Art Press/2019-4-16',
  'brief introduction': 'A person goes through a life of struggle in his era; there is a sigh over life in the strange story, cruelty in daily life, and kindness in time. A new work by Mai Jia.'},
 {'Title': "Hesse's fairy tale for everyone",
  'score': '8.9',
  'Other information': '[Germany]Hermann·Hesse/Unread·Artists Beijing United Publishing Company/2019-4',
  'brief introduction': 'It contains 20 fairy tales by Hesse, from the first fairy tale "The Two Brothers", written when he was ten years old, to "Iris", dedicated to his wife, and then to "King You of Zhou", adapted from the legend of the beacon fires that fooled the princes.'},
 {'Title': '33 Revolutions',
  'score': '8.8',
  'Other information': "[Cuba]Canek·Sanchez·Guevara/Shanghai People's Publishing House/2019-4",
  'brief introduction': 'This collection of short stories is set in Cuba after the revolution. It compares society to a skipping record spinning at thirty-three revolutions per minute, played over and over. People face material and spiritual difficulties day after day. The author is the grandson of Guevara, the leader of the Cuban revolution.'}]
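
The same extraction logic can be tried offline on a small hand-written HTML snippet, which avoids network access while experimenting. This is only a sketch: the markup below is invented and merely mimics the detail-frame structure that get_data assumes, parse_books is a hypothetical helper name, and Python's built-in 'html.parser' is swapped in for 'lxml' so that no extra parser is needed. Unlike get_data, it skips blocks with fewer than three paragraphs instead of raising an IndexError, and it uses get_text(strip=True), which trims surrounding whitespace but keeps spaces between words.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real page layout may differ.
SAMPLE_HTML = """
<div class="detail-frame">
  <h2><a href="#">Sample Book</a></h2>
  <p class="rating">8.4</p>
  <p class="color-gray">Some Author / Some Press / 2019-4</p>
  <p class="detail">A short description.</p>
</div>
<div class="detail-frame">
  <h2><a href="#">Incomplete Entry</a></h2>
  <p class="rating">7.0</p>
</div>
"""

def parse_books(html):
    """Extract one dict per detail-frame block, skipping incomplete blocks."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for frame in soup.find_all("div", class_="detail-frame"):
        heading = frame.find("h2")
        paragraphs = frame.find_all("p")
        if heading is None or len(paragraphs) < 3:
            continue  # indexing paragraphs[2] would raise IndexError here
        books.append({
            "Title": heading.get_text(strip=True),
            "score": paragraphs[0].get_text(strip=True),
            "Other information": paragraphs[1].get_text(strip=True),
            "brief introduction": paragraphs[2].get_text(strip=True),
        })
    return books

books = parse_books(SAMPLE_HTML)
print(len(books))          # 1
print(books[0]["Title"])   # Sample Book
```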

Data conversion:

# Convert the data to a DataFrame
import pandas as pd

df = pd.DataFrame(result)
df

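Result: a DataFrame with one row per book and the columns 'Title', 'score', 'Other information', and 'brief introduction'.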

Source: the internet. For learning purposes only; the content will be removed on request in case of infringement.

Keywords: Programming Python Attribute network

Added by AliceG on Fri, 24 Apr 2020 12:29:47 +0300