Basic syntax of Python


1. Python interpreter types
1. CPython: implemented in C (the reference interpreter)
2. Jython (formerly JPython): implemented in Java
3. IronPython: implemented for .NET
2. Structure of a Python program
A program is composed of modules
A module contains statements, functions, classes, etc.
A statement contains expressions
An expression creates and processes data objects and returns a reference to the resulting object
3. Data types in Python
int, float, complex (e.g. 1+2j is a complex number) and bool (the boolean type: True/False)
4. Python arithmetic operators and expressions
Expression: composed of data (such as numbers) and operators
Function: makes the computer do something and returns a value
Operators: + addition, - subtraction, * multiplication, / division, // floor division (divide and round down), % remainder, ** exponentiation
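A short sketch (the operand values 7 and 2 are arbitrary) showing what each arithmetic operator returns:

```python
# Each arithmetic operator applied to the same two operands
a, b = 7, 2
print(a + b)    # 9
print(a - b)    # 5
print(a * b)    # 14
print(a / b)    # 3.5  true division always returns a float
print(a // b)   # 3    floor division: divide, then round down
print(a % b)    # 1    remainder
print(a ** b)   # 49   exponentiation
print(-a // b)  # -4   rounding down goes toward negative infinity, not toward 0
```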
5. Numeric functions
1. abs(x) returns the absolute value of a number
2. round(number[, ndigits])
number: the number to round
ndigits: how many decimal places to keep after the decimal point
In Python documentation, a parameter written inside [] is optional: it may be passed or omitted
3. pow(x, y, z=None)
returns x**y, or x**y % z when z is given
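A quick illustration of the three functions; note that `round()` rounds exact halves to the nearest even number, a detail the notes above do not mention:

```python
# abs(): absolute value
print(abs(-3.5))          # 3.5

# round(number[, ndigits]): ndigits is optional
print(round(3.14159, 2))  # 3.14
print(round(3.7))         # 4  (without ndigits an int is returned)
print(round(2.5))         # 2  (an exact half rounds to the even neighbour)

# pow(x, y[, z]): x**y, or (x**y) % z when z is given
print(pow(2, 10))         # 1024
print(pow(2, 10, 1000))   # 24, i.e. 1024 % 1000
```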
6. Variables
1. Concept
A variable is an identifier associated with an object
2. Function
It binds an object so that the object can be referenced and reused later
3. Naming rules of variables
A name must start with a letter or underscore, followed by any number of letters, underscores or digits
Python keywords cannot be used as variable names (about 35 in current Python 3 versions)
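The naming rules can be checked with the standard `keyword` module and `str.isidentifier()`; `is_valid_variable_name` is a hypothetical helper written for this sketch:

```python
import keyword

def is_valid_variable_name(name):
    """Hypothetical helper: True if name follows the identifier rules
    (letter/underscore first, then letters, digits, underscores)
    and is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_valid_variable_name("student_1"))  # True
print(is_valid_variable_name("_count"))     # True
print(is_valid_variable_name("1st_place"))  # False: starts with a digit
print(is_valid_variable_name("class"))      # False: a keyword
print(is_valid_variable_name("Class"))      # True: keywords are case-sensitive
print(len(keyword.kwlist))                  # keyword count of the running Python version
```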
7. Assignment statement
1. Syntax
Variable name = expression
2. Type
A variable itself has no type; its type is determined by the object it is bound to
3. Description
1. If the variable does not exist yet, it is created and bound to the object
2. If the variable already exists, its binding is changed to the new object
4. Attention
1. A variable can only be bound to one object at a time
2. An object can be bound to multiple variables
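A minimal sketch of binding: two variables sharing one object, then one of them being rebound to an object of a different type (the variable names are arbitrary):

```python
a = [1, 2, 3]
b = a                 # b is bound to the same list object as a
print(a is b)         # True: one object, two variables
b.append(4)           # changing the shared object is visible through a
print(a)              # [1, 2, 3, 4]
a = "hello"           # rebind a; its "type" is now that of a str object
print(type(a), b)     # <class 'str'> [1, 2, 3, 4]  (b still refers to the list)
```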
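8. Small integer object pool in Python
In CPython, the integers from -5 to 256 are created in advance and never released, so every use of such a number refers to the same object
9. Large integers in the same file (e.g. when run in PyCharm)
When a whole file is compiled at once (as when a .py file is run in PyCharm), CPython may store equal constants in the same code block, such as two identical large integers, as a single object to save memory. This is done by the CPython compiler rather than by PyCharm, and it is an implementation detail, not a language guarantee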
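10. Basic output function [print]
1. Function
Outputs a series of values to the standard output device (usually the screen)
2. Syntax
print(value, ..., sep=" ", end="\n")
value: the content to output (any number of values)
sep: the separator placed between the values, a space by default
end: the string written after the last value, a newline "\n" by default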
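A sketch of `sep` and `end` in action; the last part uses print's `file=` parameter with an `io.StringIO` buffer only so that the exact output text can be inspected:

```python
import io

print(1, 2, 3)              # 1 2 3   (default sep=" ", end="\n")
print(1, 2, 3, sep="-")     # 1-2-3
print("A", end="")          # no newline, so the next print continues this line
print("B")                  # AB

# file= sends the text to any file-like object instead of the screen
buf = io.StringIO()
print(2024, 1, 31, sep="/", end="!\n", file=buf)
print(repr(buf.getvalue()))  # '2024/1/31!\n'
```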

11. Standard input function [input]
1. Function: reads a line of characters from the standard input device (usually the keyboard)
2. Syntax: input("Prompt string")
3. Return type: str (string)
Comments start with #.

12. Type conversion functions
1. float(): converts a string or number to a floating-point number
2. int(): converts a string or number to an integer
int([obj], base=10)
obj: the data to convert; base: the base in which obj is written (only allowed when obj is a string)
For example, int("11", base=2) returns 3
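Because `input()` always returns a string, its result usually has to be converted. In this sketch a fixed string stands in for what the user would type:

```python
# What input("Enter a number: ") would return if the user typed 42
text = "42"

n = int(text)
print(n + 1)               # 43: arithmetic on the converted int
print(text + "1")          # 421: str + str concatenates instead
print(float("3.5"))        # 3.5
print(int("11", base=2))   # 3: "11" read as a binary number
print(int("ff", 16))       # 255: "ff" read as a hexadecimal number
print(int(3.9))            # 3: converting a float truncates toward 0
```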

13. Comparison operation

> >= < <= == != (each comparison evaluates to True or False)

1. Statements

1. Concept

A statement is the smallest unit of execution in a Python program

2. Explain

Multiple statements written on one line must be separated by semicolons (;), but this is not recommended

3. Line continuation [\]

1. Explicit continuation

A backslash [\] at the end of a line tells the interpreter that the next line belongs to the same statement

2. Implicit continuation

When a line break occurs inside parentheses (), brackets [] or braces {}, the interpreter automatically keeps reading the following lines until it finds the matching closing bracket

s=100+300+56+2345+10
s1=100+300+56+\
   2345+10
s2=(100+300+56+
    2345+10)
print(s)
print(s1)
print(s2)

2. if conditional judgment statement

1. Function

Let the program selectively execute one or some statements according to conditions

2. Syntax

if Truth expression 1:
    Statement block 1
elif Truth expression 2:
    Statement block 2
elif Truth expression 3:
    Statement block 3
else:
    Statement block 4

3. Explain

1. The truth expressions are evaluated from top to bottom. As soon as one of them is True, its statement block is executed and the if statement ends. If all of them are False, the statement block of the else clause is executed

2. There can be zero, one or more elif clauses

3. There can be at most one else clause (zero or one)

# n=int(input("please enter a number:"))
# if n%2==0:
#     print(n, "is even!")
# else:
#     print(n, "is odd!")

# n=int(input("please enter a number:"))
# if n>0:
#     print(n, "is a positive number!")
# if n<0:   # a separate if, not elif: for a positive n the else below also runs
#     print(n, "is negative!")
# else:
#     print("n is 0!")

month=int(input("Please enter the number of months:"))
if 1<=month<=12:
    # pass #placeholder 
    if month<=3:
        print("spring!")
    elif month<=6:
        print("summer!")
    elif month<=9:
        print("autumn!")
    else:
        print("Winter!")
else:
    print("The month you entered is incorrect, please re-enter!")

The difference between if and elif: with a series of separate if statements, every condition is checked in turn. With if/elif, as soon as the if condition or one of the elif conditions is met, its block is executed and the remaining elif and else clauses are skipped, which avoids redundant checks and improves efficiency
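Besides efficiency, the two forms can give different results when an else is attached to the last of several separate if statements. The function names below are hypothetical:

```python
def sign_with_ifs(n):
    """Separate if statements: every condition is checked."""
    result = []
    if n > 0:
        result.append("positive")
    if n < 0:
        result.append("negative")
    else:                          # this else belongs only to the second if
        result.append("zero")
    return result

def sign_with_elif(n):
    """if/elif/else: only the first matching branch runs."""
    if n > 0:
        return "positive"
    elif n < 0:
        return "negative"
    else:
        return "zero"

print(sign_with_ifs(5))    # ['positive', 'zero']  <- wrong for a positive number
print(sign_with_elif(5))   # positive
```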

practice

Write a program that reads a student's scores in three subjects and determines the highest score and the lowest score

a=int(input("Please enter the grade of the first subject:"))
b=int(input("Please enter the grade of the second subject:"))
c = int(input("Please enter the grade of the third subject:"))
# Method 1
# if c < a > b:
#     print(a, "is the highest score!")
# elif a<b>c:
#     print(b, "is the highest score!")
# else:
#     print(c, "is the highest score!")

# Method 2
m=a   # Suppose a is the highest score
if b>m:
    m=b  #Because b is higher than the highest score, b is assigned to the variable with the highest score
if c>m:
    m=c
print(m,"is the highest score!")
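The practice also asks for the lowest score. A sketch that extends Method 2 to track both, wrapped in a hypothetical function so it can be reused, plus the built-in `max()`/`min()` shortcut:

```python
def highest_and_lowest(a, b, c):
    """Hypothetical helper: return (highest, lowest) of three scores."""
    high = low = a            # assume a is both the highest and the lowest
    for score in (b, c):
        if score > high:      # found a higher score
            high = score
        if score < low:       # found a lower score
            low = score
    return high, low

print(highest_and_lowest(78, 92, 65))   # (92, 65)
print(max(78, 92, 65), min(78, 92, 65)) # 92 65, using the built-ins
```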

3. String [str]

1. Function

Used to record text information

2. Representation

Text enclosed in quotation marks is a string

'' single quotes

"" double quotes

''' ''' three single quotes

""" """ three double quotes

3. Notes on quotation marks

A double quote inside a single-quoted string does not end the string

A single quote inside a double-quoted string does not end the string

>>> s='hello world'
>>> type(s)
<class 'str'>
>>> s
'hello world'
>>> s="hello world"
>>> s
'hello world'
>>> s='I'm a student'
  File "<stdin>", line 1
    s='I'm a student'
         ^
SyntaxError: invalid syntax
>>> s="I'm a student"
>>> s
"I'm a student"

Uses of triple quotes:

A triple-quoted string can contain both single and double quotes

Line breaks inside a triple-quoted string are automatically stored as \n

Triple quotes are generally used for the docstring (documentation string) of a function or class

>>> s="""hello world
... my name is xx
... """
>>> s
'hello world\nmy name is xx\n'
>>> print(s)
hello world
my name is xx

4. Escape character

Use escape characters to represent special characters

In a string literal, a backslash \ followed by certain characters represents a special character

s='I\'m a student'   # \' represents a single quote '

Common escape characters

Symbol  Description
\'      single quotation mark
\"      double quotation mark
\n      newline
\\      a single backslash \
\r      carriage return (moves the cursor back to the beginning of the line)
\t      horizontal tab (Tab)
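A few of these escape sequences in use:

```python
s = 'I\'m a student'          # \' puts a single quote inside single quotes
print(s)                      # I'm a student
print("He said \"hi\"")       # He said "hi"
print("line 1\nline 2")       # printed on two lines
print("name\tage")            # a tab between name and age
print("C:\\temp")             # C:\temp
print(len("\n"), len("\\"))   # 1 1: each escape sequence is a single character
```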

5. Original string [raw]

1. Function

Disables escape sequences: every backslash \ is kept as an ordinary character

2. Syntax

r"string"

>>> s="C:\newfile\test.py"
>>> print(s)
C:
ewfile  est.py
>>> s=r"C:\newfile\test.py"
>>> print(s)
C:\newfile\test.py
>>>

6. String operation

1. Symbols

+  concatenates (joins) strings

+= joins the string on the right to the original string and binds the variable to the resulting new string

*  repeats a string, e.g. "ab"*3 gives "ababab"

*= repeats the string and binds the original variable to the repeated string
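A small sketch of the four string operators:

```python
s = "ab"
print(s + "cd")   # abcd
print(s * 3)      # ababab
s += "c"          # same as s = s + "c"
print(s)          # abc
s *= 2            # same as s = s * 2
print(s)          # abcabc
```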

7. String comparison operation

1. Symbols

> >= < <= == !=

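2. Comparison rules

Strings are compared character by character according to the Unicode code point of each character; the first differing character decides the result, and if one string is a prefix of the other, the shorter one is smaller

Unicode (the "universal code") assigns a number to every character; code points range from 0 to 1114111 (0x10FFFF), of which the first 65536 (0 to 65535) form the Basic Multilingual Plane

3. Functions

chr() returns the character corresponding to a Unicode code point

ord() returns the Unicode code point of a character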

>>> ord("徐")
24464
>>> ord("a")
97
>>> ord("A")
65
>>>
>>> chr(24464)
'徐'
>>> chr(55555)
'\ud903'
>>> chr(46783)
'뚿'
>>>
>>> "a">"A"
True
>>> "ABC">"abc"
False
>>> "ABC" > "Abc"
False
>>> "ABC" >"ABCD"
False
>>>

8. Index of string [index]

1. Function

The elements of a sequence can be accessed through their index

2. Syntax

String [integer expression]

3. Explain

The forward index starts from 0: the first element has index 0, the second has index 1, and so on. The index of the last element is the length of the string - 1

The reverse index starts from -1: the last element has index -1, the second-to-last has index -2, and so on. The index of the first element is the negative of the length of the string

A B C D E F G

0 1 2 3 4 5 6

-7 -6 -5 -4 -3 -2 -1

practice:

Enter any string and print out the first character and the last character of the string

If the length of the string is even, print out an @ symbol. If the length of the string is odd, print out the middle character

len() returns the length of a sequence

s=input("Please enter a string:")
print(s[0])
print(s[-1])
if len(s)%2==0:
    print("@")
else:
    mid_index=len(s)//2 # find the index corresponding to the middle character
    print(s[mid_index])

9. slice of string

1. Function

Extract consecutive or spaced elements from a string

2. Syntax

String [start index : end index : step]

3. Parameters

1. Start index: the position where the slice begins (included). 0 is the first element, 1 the second, and so on

2. End index: the position where the slice ends; the element at the end index itself is excluded

3. Step: how far, and in which direction, to move after taking each element. If the step is omitted it is 1, i.e. move one position to the right each time

When the step is positive, the slice runs forward (left to right)
When the step is negative, the slice runs in reverse (right to left)

With a step, the slice takes the element at the current index, then adds the step to that index to get the next index and takes the element there, and so on until the end index is reached

When the step is negative, the start index must refer to an element to the right of the end index; otherwise the result is an empty string

A B C D E F G

0 1 2 3 4 5 6

-7 -6 -5 -4 -3 -2 -1

>>> s="ABCDEFG"
>>> s[0:3]
'ABC'
>>> s[1:5]
'BCDE'
>>> s[1:-1]
'BCDEF'
>>> s[0:-1:2]
'ACE'
>>> s[-1:0:-2]
'GEC'
>>>
>>> s[0:-1:-2]
''

practice:

Enter a string and determine whether it is a palindrome (it reads the same forwards and backwards), for example:

ABCBA

上海自来水来自海上 ("Shanghai's tap water comes from the sea", a palindrome in Chinese)
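One possible solution, assuming a palindrome check should compare the string with its reverse; `s[::-1]` is a slice with step -1. Fixed sample strings replace `input()` so the sketch runs on its own:

```python
def is_palindrome(text):
    """Hypothetical helper: True if text reads the same backwards."""
    return text == text[::-1]   # [::-1] walks the whole string from right to left

# In the exercise, s would come from input("Please enter a string:")
for s in ["ABCBA", "上海自来水来自海上", "ABCD"]:
    if is_palindrome(s):
        print(s, "is a palindrome")
    else:
        print(s, "is not a palindrome")
```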

4. Format string

1. Function

Generate a formatted string

2. Syntax

format string % value

format string % (value 1, value 2, ...)

fmt="full name:%s,Age:%d"
name="Xiao Ming"
age=20
print(fmt%(name,age))   # full name:Xiao Ming,Age:20

Symbol  Description
%s      string placeholder
%d      decimal integer placeholder
%f      floating-point placeholder (6 decimal places by default)

3. Modifier

-  align left (right alignment is the default)

+  show the sign for positive numbers

0  pad with zeros on the left instead of spaces

width: the minimum width of the whole output

.precision: how many decimal places to keep, e.g. %.2f

>>> a=123
>>> a
123
>>> "%d"%123
'123'
>>> "%10d"%123
'       123'
>>> "%-10d"%123
'123       '
>>> "%-+10d"%123
'+123      '
>>> "%0+10d"%123
'+000000123'
>>> "%f"%123.456
'123.456000'
>>> "%.2f"%123.456
'123.46'
>>> "%2f"%123.456
'123.456000'
>>> "%f"%123.4567891
'123.456789'
>>> "%.7f"%123.4567891
'123.4567891'
>>>

5. while loop

1. Function

Let the program execute one or more statements repeatedly according to the conditions

2. Syntax

initial condition 
while Truth expression:
    Statement block 1
    Condition variation
else:
    Statement block 2

3. Process

1. Define an initial condition first

2. Evaluate the truth expression and check whether it is True or False

3. If it is True, execute statement block 1, then go back to step 2 and evaluate the truth expression again

4. If it is False, execute statement block 2 (the else clause, which is optional) and end the while statement

i=1  #A variable used to record the number of cycles

while i<=10:
    print("hello world")
    i+=1

4. Attention

1. Make sure the truth expression eventually becomes False, to prevent infinite (dead) loops

2. The loop condition is usually controlled by a loop variable in the truth expression

3. The loop variable usually has to be changed inside the loop body, which controls how many times the loop runs

6. Nesting of while loops

The while statement is itself a compound statement, so it can be nested inside another compound statement, including another while

1. Syntax

while Truth expression 1:
    Statement block 1
    while Truth expression 2:
        Statement block 2
    else:
        Statement block 3
else:
    Statement block 4

Example: print all integers from 1 to 20 on one line, separated by spaces

Print 10 such lines

j=1
while j<=10:
    i=1
    while i<=20:
        print(i,end=" ")
        i+=1
    else:
        print() #Print a line break
    j+=1

7. break statement

1. Function

Used in a while or for loop to terminate the loop that is currently executing

2. Explain

1. When break is executed, the remaining statements of the loop body after it are not executed

2. break is usually used together with an if statement

3. When break terminates a loop, the else clause of that loop is not executed

4. break only terminates the innermost loop that contains it; in nested loops it does not exit the outer loop

5. break can only be used inside a loop

i=1
while i<10:
    print("At the beginning of the cycle i=",i)
    if i==5:
        break
    print("At the end of the cycle i=",i)
    i+=1
else:
    print("else Statement executed!")
print("When the program exits i=",i)

practice:

Enter a positive integer and print whether the number is a prime number

n=int(input("Please enter a positive integer:"))
if n<=1:
    print("Not prime!")
elif n==2:
    print(n,"It's prime!")
else:
    i=2
    while i<n:
        if n%i==0:
            print("Not prime")
            break
        i+=1
    else:
        print("It's prime!")
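The loop above tries every divisor up to n-1. A common refinement, sketched here as a hypothetical `is_prime` function, stops at the square root of n: if n = p * q with p <= q, then p * p <= n, so any composite n has a divisor no larger than its square root.

```python
def is_prime(n):
    """Trial division, checking divisors i only while i * i <= n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False   # found a divisor: not prime
        i += 1
    return True            # no divisor up to sqrt(n): prime

for k in range(1, 30):
    if is_prime(k):
        print(k, end=" ")
print()
# 2 3 5 7 11 13 17 19 23 29
```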

8. for loop

1. Function

Used to traverse (visit one by one) the data elements of an iterable object

2. Syntax

for variable in iterable:
    Statement block 1
else:
    Statement block 2

3. Iterable object

An object whose elements can be obtained one after another, such as strings, lists, dictionaries, tuples and range objects (an empty one is still iterable; it simply yields no elements)

4. Explain

1. The variable is bound to each element provided by the iterable in turn, and statement block 1 is executed each time

2. When the iterable has no more elements, the else clause is executed and the loop ends (the else clause is skipped if the loop is ended by break)

3. The else clause can be omitted

s="ABCDEF"
for i in s:
    print(i,end=" ")
else:
    print()  # line break once all characters have been printed
    print("Loop terminated due to end of iteration!")

practice:

Write a program that reads a string and prints how many spaces it contains

s=input("Please enter a string:")
count=0 #Variable used to count the number of spaces
# i=0
# length=len(s)
# while i<length:
#     if s[i]==" ":
#         count+=1
#     i+=1
# print("The number of spaces entered is %d" % count)

for i in s:
    if i==" ":
        count+=1
print("The number of spaces entered is %d" % count)

9. Nesting of for loops

for x in "ABC":
    for y in "123":
        print(x+y)

10. range function

1. Function

Creates an iterable object that generates a series of integers (an integer sequence generator)

range(stop): starts from 0 and adds 1 each time, up to but not including stop

range(start, stop[, step]): starts from start and adds step each time (step defaults to 1 and may be negative), up to but not including stop

for i in range(10):
    print(i,end=" ")
print()


for i in range(1,20,2):
    print(i,end=" ")
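Converting ranges to lists makes the generated integers visible; note that stop is never included:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]
print(list(range(1, 10, 3)))   # [1, 4, 7]
print(list(range(10, 0, -2)))  # [10, 8, 6, 4, 2]  a negative step counts down
print(list(range(5, 1)))       # []  start is already past stop
```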

11. continue statement

1. Function

Used inside a loop: the statements after continue in the current iteration are skipped, and the next iteration starts

2. Explain

1. In a while loop, continue jumps straight back to the truth expression of the while statement, which re-checks the loop condition

2. In a for loop, continue takes the next element from the iterable, binds it to the variable and starts the next iteration

for i in range(5):
    if i==2:
        continue
    print(i)

12. random module

1,random.random()

Returns a random floating-point number x with 0.0 <= x < 1.0

2,random.randint(a,b)

Returns a random integer N with a <= N <= b (both ends included)

3,random.randrange(start,stop,step)

Returns a randomly chosen element of range(start, stop, step), so stop is excluded; e.g. randrange(1,100,2) gives a random odd number from 1 to 99

import random

a=random.random()
print(a)
b=random.randint(1,100)
print(b)
c=random.randrange(1,100,2)
print(c)

13. List [list]

1. Function

A container used to store any type of data

2. Concept

1. A list is a container that can store any type of data

2. A list is a mutable sequence: its contents can be changed

3. The elements are independent of each other, but they are stored in order

3. Representation

[]

4. List operation

+  concatenates lists and creates a new list, so the id (memory address) changes

+= extends the original list in place with the elements of the list on the right; no new list is created, so the id does not change

*  repeats the list and creates a new list, so the id changes

*= repeats the contents of the original list in place, so the id does not change

>>> L=[100,200,300,400]
>>> id(L)
1883498866184
>>> L=L+[500,600]
>>> L
[100, 200, 300, 400, 500, 600]
>>> id(L)
1883498866824
>>> L=[100,200,300,400]
>>> id(L)
1883498866184
>>> L+=[500,500]
>>> L
[100, 200, 300, 400, 500, 500]
>>> id(L)
1883498866184
>>>
>>> s="abc"
>>> id(s)
1883497905432
>>> s=s+"de"
>>> s
'abcde'
>>> id(s)
1883498848184
>>> s="abc"
>>> id(s)
1883497905432
>>> s+="de"
>>> s
'abcde'
>>> id(s)
1883498848128
>>>
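The ids above show the difference. A sketch that shows it without printing ids, by checking whether a second variable bound to the same object sees the change:

```python
a = [1, 2]
b = a              # b and a are bound to the same list
a = a + [3]        # + builds a new list and rebinds a; b is untouched
print(a, b)        # [1, 2, 3] [1, 2]

a = [1, 2]
b = a
a += [3]           # += extends the existing list in place
print(a, b)        # [1, 2, 3] [1, 2, 3]
print(a is b)      # True

s = "ab"
t = s
s += "c"           # strings are immutable, so += has to create a new string
print(s, t)        # abc ab
```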

5. List comparison operation

1. Symbols

> >= < <= == !=

2. Rules

Lists are compared element by element at corresponding positions; the first unequal pair of elements decides the result, and those elements must be comparable with each other (for example both numbers or both strings)

3. in / not in operators

Determine whether an object exists in the sequence
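A few comparisons and membership tests on lists (values chosen arbitrarily):

```python
print([1, 2, 3] < [1, 3])      # True: decided by the second elements, 2 < 3
print([1, 2] < [1, 2, 0])      # True: a list that is a prefix of another is smaller
print(["b"] > ["a", "z"])      # True: decided by the first elements, "b" > "a"
print(3 in [1, 2, 3])          # True
print("x" not in ["a", "b"])   # True
try:
    [1, 2] < ["a", "b"]        # 1 < "a": an int cannot be compared with a str
except TypeError as e:
    print("TypeError:", e)
```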

6. Index and slice of list

Indexing and slicing of lists follow exactly the same rules as for strings

7. Index assignment of list

A list is a mutable sequence, so its elements can be changed by index assignment

>>> L=[100,200,300]
>>> L[0]="hello"
>>> L
['hello', 200, 300]
>>>

8. Slice assignment of list

1. Function

Slice assignment can change the order of the original list and insert or modify data

It replaces the elements selected by the slice with new values

2. Syntax

list[slice] = iterable

>>> L=[100,200,300,400,500,600]
>>> L[0:1]
[100]
>>> L[0:1]=["A"]
>>> L
['A', 200, 300, 400, 500, 600]
>>> L[0:1]=["A","B"]
>>> L
['A', 'B', 200, 300, 400, 500, 600]
>>>
>>> L[0:3]
['A', 'B', 200]
>>> L[0:5:2]
['A', 200, 400]
>>> L[0:5:2]=["hello","world","name"]
>>> L
['hello', 'B', 'world', 300, 'name', 500, 600]
>>> L[0:5:2]=["hello","world","name",100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 4 to extended slice of size 3
>>> L[0:5:2]=["hello","world","name",]
>>> L
['hello', 'B', 'world', 300, 'name', 500, 600]
>>>
>>> L[0:0]
[]
>>> L[0:0]=[1000]
>>> L
[1000, 'hello', 'B', 'world', 300, 'name', 500, 600]

append() method

Append an element to the end of the list

L.append(x)
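A sketch combining `append()` with slice assignment to insert and delete elements:

```python
L = [100, 200, 300]
L.append(400)          # add one element at the end
print(L)               # [100, 200, 300, 400]
L[1:1] = [150]         # empty slice: insert before index 1
print(L)               # [100, 150, 200, 300, 400]
L[0:2] = []            # assigning an empty list deletes the slice
print(L)               # [200, 300, 400]
L.append([500, 600])   # append adds the whole list as ONE element
print(L)               # [200, 300, 400, [500, 600]]
```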

14. DICTIONARY [dict]

1. Concept

1. A dictionary is a mutable container that can store any type of data

2. Each item in a dictionary is stored as a key-value pair, in which the key is mapped to the value

3. Items in a dictionary are accessed by key, not by numeric index (subscript)

4. Keys cannot be repeated, and only immutable objects such as numbers, strings and tuples can be used as keys

5. Items are not accessed by position: a dictionary cannot be indexed or sliced by position (since Python 3.7, however, dictionaries do keep the insertion order of their keys)

2. Function

Looking up a value by its key is fast: on average it takes about the same time regardless of how large the dictionary is

3. Representation

Enclose the items in {}, separate key-value pairs with commas, and separate each key from its value with a colon (:)

>>> d={}
>>> d
{}
>>> type(d)
<class 'dict'>
>>> d={"name":"Xiao Ming","age":20}
>>> d
{'name': 'Xiao Ming', 'age': 20}
>>>

4. Dictionary index

1. Syntax

dict[key]

>>> d["name"]
'Xiao Ming'
>>> d["age"]
20
>>>

5. Basic operation of dictionary

1. Add key value pair

dict[key]=value

2. Modify key value pair

dict[key]=value

3. Delete key value pair

del dict[key]

6. in / not in operators

Determine whether a key exists in the dictionary
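The basic dictionary operations in one place; `dict.get()` with a default value is an extra method not covered in the notes:

```python
d = {"name": "Xiao Ming", "age": 20}
d["city"] = "Hangzhou"       # key not present -> add a key-value pair
d["age"] = 21                # key present -> modify its value
del d["city"]                # delete a key-value pair
print(d)                     # {'name': 'Xiao Ming', 'age': 21}
print("name" in d)           # True: in checks the keys
print("Xiao Ming" in d)      # False: values are not checked
print(d.get("phone", "unknown"))  # unknown: no KeyError for a missing key
```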

15. tuple

1. Concept

Tuples are immutable sequences

Tuples are containers that can store any type of data, and their elements are stored in order

2. Representation

Enclose the elements in (); a tuple with a single element needs a trailing comma, e.g. (100,), otherwise (100) is just the integer 100 in parentheses

>>> t=(100,200)
>>> t
(100, 200)
>>> type(t)
<class 'tuple'>
>>> t=(100)
>>> t
100
>>> type(t)
<class 'int'>
>>> t=(100,)
>>> t
(100,)
>>> type(t)
<class 'tuple'>

3. Operation of tuples

+ += * *= < <= > >= == != in not in

These operators follow the same rules as for lists (the operands must be of compatible types, e.g. a tuple can only be concatenated with another tuple)

4. Index and slice of tuples

Indexing and slicing of tuples follow the same rules as for strings; because tuples are immutable, they do not support index assignment or slice assignment
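A sketch showing that tuples support indexing, slicing and concatenation, while index assignment fails:

```python
t = (100, 200, 300)
print(t[0], t[-1], t[0:2])   # 100 300 (100, 200)
print(t + (400,))            # (100, 200, 300, 400): a new tuple
try:
    t[0] = 1                 # tuples are immutable
except TypeError as e:
    print("TypeError:", e)   # 'tuple' object does not support item assignment
print(t)                     # (100, 200, 300): unchanged
```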

16. String text parsing function

1,split()

Splits the string at the given separator and returns a list of the pieces; with no argument, it splits on whitespace

S.split(sep)

>>> s="www.baidu.com"
>>> s.split(".")
['www', 'baidu', 'com']
>>>

2,join()

S.join(iterable)

Joins the strings in iterable into one string, with S inserted between them

>>> L="hello"
>>> "-".join(L)
'h-e-l-l-o'
>>> " ".join(L)
'h e l l o'
>>>

17. Function

1. Concept

A function is a block of statements that can be repeated

A function can be regarded as a collection of program statements that is given a name

2. Function

A function can be called repeatedly, which improves the reusability of code

3. Syntax

def Function name(parameter list [formal parameters]):
    Statement block

4. Explain

1. The function name must be an identifier, following the same naming rules as variables

2. A function has its own namespace. To let the function process external data, pass the data in through the parameter list. If no data needs to be passed in, the parameter list can be empty, but the body cannot be empty; use a pass statement as a placeholder

3. The function name is a variable bound to the function object, so do not reassign it carelessly

def say_demo():
    print("hello world")
    print("welcome to Hangzhou")
    print("xxx")


say_demo()  #Function call

5. Function call

1. The call to a function is an expression

2. If there is no return statement, this function returns the None object after execution

3. If the function needs to return other objects, use the return statement

6. return statement

1. Syntax

return [expression]

2. Function

Used in a function to end the execution of the current function, go back to the place where the function was called, and return a reference to an object

3. Explain

1. The expression after the return statement can be omitted, which is equivalent to return None

2. If there is no return statement in the function, the function returns None after executing the last statement, which is equivalent to adding a return None statement at the end
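A minimal sketch of return values; the function names are hypothetical:

```python
def add(a, b):
    return a + b            # ends the function and hands the result back

def greet(name):
    print("hello", name)    # no return statement

def first_negative(numbers):
    for n in numbers:
        if n < 0:
            return n        # return also ends the loop immediately
    return None             # explicit, but the same as falling off the end

total = add(3, 4)
print(total)                        # 7
result = greet("Xiao Ming")         # prints: hello Xiao Ming
print(result)                       # None
print(first_negative([3, -1, -5]))  # -1
```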

day3
1. Ways of passing function arguments
	1. Positional arguments
	Arguments are matched to formal parameters in order, by position
	def func(a,b,c):
		print("a",a)
		print("b",b)
		print("c",c)

	func(100,200,300)
	
	2. Sequence arguments
	The sequence is unpacked with * and its elements are passed as positional arguments
	def func(a,b,c):
		print("a",a)
		print("b",b)
		print("c",c)
		
	L=[100,200,300]
	func(*L)
	
	3. Keyword arguments
	Formal parameters and arguments are matched by name
	def func(a,b,c):
		print("a",a)
		print("b",b)
		print("c",c)
		
	func(a=100,b=200,c=300)

	4. Dictionary keyword arguments
	When the argument is a dictionary, it is unpacked with ** and its items are passed as keyword arguments (each key must match a parameter name)
	def func(a,b,c):
		print("a",a)
		print("b",b)
		print("c",c)
	
	d={"a":100,"b":200,"c":300}
	func(**d)
2, Default parameters of functions
	1. Syntax
	def Function name(Parameter name 1=default value, ...): 
		pass
	Default parameters must be filled in from right to left: if a parameter has a default value, every parameter to its right must also have one
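A minimal sketch of a default parameter; `power` is a hypothetical function name:

```python
def power(base, exp=2):
    """exp falls back to 2 when the caller omits it."""
    return base ** exp

print(power(5))       # 25: exp takes its default value 2
print(power(2, 10))   # 1024: the default is overridden
print(power(2, exp=3))  # 8: a default parameter can also be passed by keyword

# def bad(a=1, b):    # SyntaxError: a parameter without a default
#     pass            # cannot follow one that has a default
```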
3, How to define the formal parameters of a function
	1. Positional parameters
	Receive arguments according to their position
	2. Asterisk tuple parameter
	Collects any extra positional arguments into a tuple
	def Function name(*args):
		print("The number of arguments is",len(args))
		print("args",args)
	3. Named keyword parameters
		1. Syntax
		def func(*, named keyword parameters):
			pass
		def func(*args, named keyword parameters):
			pass
		2. Effect
		All parameters after * or *args are named keyword parameters: they must be passed as keyword arguments or through ** dictionary unpacking
		def func(*, a, b):
			print("a", a, "b", b)
		func(a=100, b=200)    # OK
		# func(100, 200)      # TypeError: a and b can only be passed by keyword

	4. Double-asterisk dictionary parameter
	1. Syntax
	def func(**kwargs):
		pass
	2. Effect
	Collects any extra keyword arguments into a dictionary
	def func(**kwargs):
		print("The number of keyword arguments is:",len(kwargs))
		print("kwargs=",kwargs)
	func(a=100,b=200,c="300")

	Order of function parameters from left to right:
	positional parameters, asterisk tuple parameter (*args), named keyword parameters, double-asterisk dictionary parameter (**kwargs)
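A sketch with one parameter of each kind, in the order given above; the function simply returns what each parameter received:

```python
def func(a, b, *args, flag=False, **kwargs):
    """a, b: positional parameters
    *args: extra positional arguments, collected into a tuple
    flag: named keyword parameter (after *args, so only passable by name)
    **kwargs: extra keyword arguments, collected into a dict"""
    return a, b, args, flag, kwargs

print(func(1, 2))
# (1, 2, (), False, {})
print(func(1, 2, 3, 4, flag=True, x=10))
# (1, 2, (3, 4), True, {'x': 10})

L = [1, 2, 3]
d = {"flag": True, "y": 5}
print(func(*L, **d))   # sequence and dictionary unpacking combined
# (1, 2, (3,), True, {'y': 5})
```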
	
4, Variables in functions
	1. Local variables
	Variables defined inside a function are called local variables (the formal parameters of a function are also local variables)
	Local variables can only be used inside the function
	Local variables are created when the function is called and are destroyed automatically when the call ends
	
	2. Global variables
	Variables defined outside any function, at the top level of the module (the current .py file), are called global variables
	Global variables can be read directly by all functions in the module, but cannot be assigned directly inside a function (such an assignment creates a local variable instead; the global statement is needed to rebind a global)
	
	Description of local variables
		1. The first assignment to a variable inside a function creates a local variable; later assignments change the binding of that local variable
		2. Assignment statements inside a function do not affect global variables
		3. Local variables can only be accessed inside the function that defines them, while global variables can be accessed throughout the module (the current file)
		Be careful: if a variable is assigned anywhere in a function (it appears on the left of =), CPython treats it as a local variable throughout that function, so reading it before the assignment raises UnboundLocalError
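A sketch of the rules above, including the `global` statement used to rebind a global variable from inside a function; all function names are hypothetical:

```python
count = 0                  # global variable

def read_global():
    return count           # reading a global needs nothing special

def try_to_change():
    count = 100            # assignment creates a NEW local variable
    return count

def increment():
    global count           # refer to the module-level name
    count += 1

def broken():
    print(count)           # count is assigned below, so it is local here...
    count = 5              # ...and reading it first raises UnboundLocalError

print(read_global())       # 0
print(try_to_change())     # 100
print(count)               # 0: the global was not affected
increment()
print(count)               # 1
try:
    broken()
except UnboundLocalError as e:
    print("UnboundLocalError:", e)
```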
		
5, Object-oriented programming
	1. Object
	A concrete thing or instance in the real world
	2. Object-oriented programming
	Treat everything as an object, and build programs from objects and the relationships and behaviors between them
	An object can have attributes [nouns]
	An object can have behaviors [verbs]
	3. Class [class]
	Objects with the same attributes and behaviors are grouped into one kind, i.e. one type
	A class is a tool for describing objects; a class can create objects (instances) of its kind
	A class defines the attributes and methods common to every object in that group
	4. Syntax
	class Class name(inheritance list):
		"""Class docstring"""
		Instance methods
		Class variables
		Class methods
		Static methods
	5. Effect
		1. A class can create one or more objects (instances)
		2. Variables and methods defined in a class are available to the instances created by that class
	6. Explain
		1. The class name must be an identifier (same naming rules as variables); capitalizing the first letter is recommended
		2. A class name is essentially a variable bound to the class object
	class Car():
		"""This is a car"""
		print("Auto factory creation completed!")  # runs once, when the class body is executed

		def __init__(self,color):
			self.color=color

		def run(self,speed):  # self represents the instance created by the class
			"""Method that adds driving behavior to an instance"""
			self.speed = speed  # add a speed attribute to the instance
			print("The car is running at a speed of",self.speed)

		def set_logo(self,logo):
			self.logo=logo
			print("Brand of car:", self.logo)

	car=Car("red")
	car.run(100)
	car.set_logo("Mercedes-Benz")

	7. Instantiation of a class (calling the class)
		1. Syntax
		variable = Class name([argument list])
		2. Effect
		Creates an instance object of this class and returns a reference to it
		3. Explain
		Each instance has its own namespace in which instance variables (attributes) are created; an instance can call the methods defined in its class and access the class variables

	Add an attribute: car.color = "black"

	8. Instance methods
		1. Syntax
		class Class name():
			def instance_method_name(self, parameter1, parameter2, ...): 
				pass 
		2. Effect
		Describe the behavior of objects, so that every object of this class has this behavior
		
		3. Explain
			1. An instance method is essentially a function defined inside a class
			2. The first parameter of an instance method represents the instance that calls the method; it is conventionally named self
			3. Instance methods are attributes of the class
	
	9. Calling instance methods
	instance.instance_method_name(arguments)
	Class name.instance_method_name(instance, arguments)
	
	10. Class constructor (initialization method)
		1. Syntax
			class Class name():
				def __init__(self, parameter list): 
					pass
		2. Effect
			1. __init__ is a special method called the constructor or initialization method of the class; it is called automatically whenever an instance of the class is created
			2. self represents the instance and must appear as the first parameter in the definition, but no argument is passed for it when the instance is created
			3. If values should be passed in when an instance is created, define them as parameters of the constructor
			4. Purpose: add necessary resources such as attributes to the newly created object
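A sketch of a constructor and of the two ways to call an instance method; the `Student` class is hypothetical:

```python
class Student:
    def __init__(self, name, score):
        # called automatically by Student(...); self is the new instance
        self.name = name
        self.score = score

    def show(self):
        return self.name + ": " + str(self.score)

s = Student("Xiao Ming", 90)    # no argument is passed for self
print(s.show())                 # Xiao Ming: 90   instance.method()
print(Student.show(s))          # Xiao Ming: 90   Class.method(instance)
```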
		
	11. Inheritance
		1. Concept
			1. Inheritance: a new class takes over the features of an existing class
			2. Derivation: the new class adds new features on top of the existing class
		2. Effect
			1. With inheritance and derivation, features shared by several classes can be placed in a base class
			2. The behavior of a class can be extended or changed without modifying the parent class code, which enables code sharing
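A minimal sketch of inheritance and derivation; `Animal` and `Dog` are hypothetical classes, and `super().__init__()` is used to reuse the base class's constructor:

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return self.name + " makes a sound"

class Dog(Animal):                  # Dog inherits from Animal
    def __init__(self, name, breed):
        super().__init__(name)      # reuse the base-class initialization
        self.breed = breed          # derived: a new attribute

    def speak(self):                # derived: override the inherited behavior
        return self.name + " barks"

    def fetch(self):                # derived: a completely new method
        return self.name + " fetches the ball"

d = Dog("Buddy", "husky")
print(d.speak())                # Buddy barks
print(d.fetch())                # Buddy fetches the ball
print(isinstance(d, Animal))    # True: a Dog is also an Animal
```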
			
	12. File operations in Python
		A file is the basic unit of data storage and is usually used to keep data long-term
		1. Steps
			1. Open the file
			2. Read or write the file
			3. Close the file
		2. Usage
			1. Syntax
				open(filename, mode[, encoding])
				filename: the path or name of the file
				mode: the mode in which the file is opened
				encoding: the encoding of the file
			2. Notes
				Opens a file and returns a file object. If opening fails, an OSError is raised (IOError in Python 2)
			3. Closing a file
				f.close()
			4. Values of the mode parameter
				r: open for reading only (the default)
				w: open for writing; existing content is overwritten, and the file is created if it does not exist
				a: open for writing only; new content is appended after any existing content (the file is created if it does not exist)
				b: open the file in binary mode
				wb: open the file for writing in binary mode
				rb: open the file for reading in binary mode
				t: open the file in text mode (the default)
			5. Reading and writing text (line breaks must be written explicitly)
				1. Writing text
					1. f.write(string): write a string to the open file
					2. f.writelines(list of strings): write several strings to the open file
				2. Reading text
					1. f.readline(): read one line of text
					2. f.readlines(): read all remaining lines into a list
					3. f.read(n): read up to n characters (the whole file if n is omitted)
			6. The with statement
				1. Syntax
					with expression as variable:
						pass
				2. The file is closed automatically when the with block ends, even if an error occurs
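A short sketch that exercises the modes and read/write methods above inside with statements. It writes to a temporary directory so it does not touch real files; the file name demo.txt is arbitrary:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "w" creates the file (or overwrites it); line breaks must be written explicitly
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])

# "a" keeps the existing content and writes at the end
with open(path, "a", encoding="utf-8") as f:
    f.write("fourth line\n")

# Reading: the with statement closes the file when the block ends
with open(path, encoding="utf-8") as f:
    first = f.readline()     # one line, including its "\n"
    chunk = f.read(6)        # the next 6 characters
    rest = f.readlines()     # all remaining lines as a list

print(repr(first), repr(chunk), rest)
print(f.closed)              # True: closed automatically
```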
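7. Python web crawlers
	1. Web basics review
		URL: Uniform Resource Locator
		HTTP: default port 80
		HTTPS: default port 443
		GET: request parameters are shown in the URL
		POST: request parameters are sent in the request body instead of the URL, so they are not visible in the address (actual encryption still requires HTTPS)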

8. Crawler request modules
	1. Classification
		1. Python 2: urllib2, urllib3
		2. Python 3: urllib.request, requests
	2. Common methods
		1. urllib.request.urlopen(url)
			Purpose: send a request to the website and get the response
			Usage: bytes_data = res.read()  # get the response body as bytes
				  or: text = res.read().decode("utf-8")  # decode the bytes into a string
		2. req = urllib.request.Request(url, headers=headers)
			Purpose: create a request object (for example, to send custom headers such as User-Agent), then send it to the website to get the response
			Note: pass headers by keyword; the second positional parameter of Request is data
			Workflow:
				1. req = urllib.request.Request(url, headers=headers)  # create the request object
				2. res = urllib.request.urlopen(req)  # send the request to the website
				3. html = res.read().decode("utf-8")  # decode the response
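The workflow above can be wrapped in two small functions. This is a sketch under assumptions: the shortened User-Agent string and the function names build_request and fetch are hypothetical, and only build_request is exercised here because fetch needs network access:

```python
import urllib.request

HEADERS = {"User-Agent": "Mozilla/5.0"}  # shortened User-Agent, for illustration only


def build_request(url):
    # headers must be a keyword argument: Request(url, data=None, headers={})
    return urllib.request.Request(url, headers=HEADERS)


def fetch(url):
    """Send the request and decode the body as UTF-8 (requires network access)."""
    with urllib.request.urlopen(build_request(url)) as res:
        return res.read().decode("utf-8")


req = build_request("https://www.baidu.com/")
print(req.full_url, req.get_method())   # GET, because no data was given
print(req.get_header("User-agent"))     # urllib stores header names capitalized
```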
9. URL encoding module
	Non-ASCII characters in a URL are percent-encoded, as in this Baidu search URL:
	https://www.baidu.com/s?wd=%E8%94%A1%E5%BE%90%E5%9D%A4
	1. Module name
		urllib.parse: the URL encoding module
	2. Encoding method: urlencode
		1. Syntax
			 urllib.parse.urlencode(dictionary)
				import urllib.parse

				key = {"wd": "源"}  # "源" means "source"
				data = urllib.parse.urlencode(key)  # encode the dict as a query string
				print(data)  # wd=%E6%BA%90
	3. quote method: encoding
				import urllib.parse

				data = urllib.parse.quote("源")
				print(data)  # %E6%BA%90
	
	4. unquote method: decoding
				import urllib.parse

				data = urllib.parse.unquote("%E6%BA%90")
				print(data)  # 源
	

Item 1
Write a small crawler: when the program runs, the user enters any keyword in the terminal, and the program uses Baidu search to fetch the source code of the results page.
1. Read the search keyword
2. URL-encode the keyword
3. Concatenate the base URL and the encoded keyword to get the real URL
4. Request the real URL to get the response object
5. Decode the response and save it locally
import urllib.request
import urllib.parse

headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
base_url="https://www.baidu.com/s?" # Base URL of the Baidu search page
key=input("Please enter the information you want to query:")
key=urllib.parse.urlencode({"wd":key}) # URL-encode the keyword
url=base_url+key # Concatenate to get the real URL

req=urllib.request.Request(url,headers=headers) # Create the request object
res=urllib.request.urlopen(req) # Send the request to the website
html=res.read().decode("utf-8") # Decode the response bytes
with open("Baidu.html","w",encoding="utf-8") as f:
    f.write(html)

print("File written successfully!")

Item 2
Write a crawler program: when it runs, the user enters the name of a Baidu Tieba forum (a "post bar"), and the program crawls the source code of a specified range of pages from that forum (using object-oriented programming)

import urllib.request
import urllib.parse

class URL():
    def __init__(self):

        self.base_url= "https://tieba.baidu.com/f?"
        self.headers= {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}

    def get_html(self,url,filename): # Fetch the source code of one page
        req=urllib.request.Request(url,headers=self.headers)
        res=urllib.request.urlopen(req)
        html=res.read().decode("utf-8")
        self.save_html(html,filename)

    def get_page(self):
        name=input("Please enter the name of the forum to query:")
        begin=int(input("Please enter the first page to crawl:"))
        end=int(input("Please enter the last page to crawl:"))
        key= urllib.parse.urlencode({"kw":name}) # URL-encode the forum name once
        for i in range(begin,end+1): # end+1 so the last page is included
            pn=(i-1)*50 # pn selects the page; it advances by 50 per page
            url=self.base_url+key+"&pn="+str(pn) # Concatenate to get the real URL
            filename="page_"+str(i)+".html" # File name for this page
            self.get_html(url,filename) # Call the method above to get the source code

    def save_html(self,html,filename):
        with open(filename,"w",encoding="utf-8") as f:
            f.write(html)
        print("Data saved successfully")

tieba=URL()

tieba.get_page()
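The URL-building part of get_page can be isolated into a pure function, which makes the page arithmetic easy to check without any network access. A sketch: page_urls is a hypothetical helper, and the pn step of 50 is taken from the code above:

```python
import urllib.parse

BASE_URL = "https://tieba.baidu.com/f?"


def page_urls(name, begin, end):
    """Return (filename, url) pairs for pages begin..end, both inclusive."""
    key = urllib.parse.urlencode({"kw": name})
    pairs = []
    for i in range(begin, end + 1):      # end + 1 so the last page is included
        pn = (i - 1) * 50                # page 1 -> pn=0, page 2 -> pn=50, ...
        pairs.append(("page_%d.html" % i, BASE_URL + key + "&pn=" + str(pn)))
    return pairs


for filename, url in page_urls("python", 1, 3):
    print(filename, url)
```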

1. Parsing modules

1. Types of data

1. Structured data

Has a fixed format, e.g. HTML, JSON, XML

2. Unstructured data

Pictures, videos, and audio, which are generally stored as binary data

2. Regular expression module re

1. Metacharacters in regular expressions

Metacharacter    Meaning
.      Matches any character except a newline (\n)
\d     Matches any digit
\s     Matches any whitespace character
\S     Matches any non-whitespace character
\n     Matches a newline character
\w     Matches a letter, digit, or underscore
\W     Matches any character that is not a letter, digit, or underscore
*      Matches the preceding expression 0 or more times
+      Matches the preceding expression 1 or more times
?      Matches the preceding expression 0 or 1 times; placed after another quantifier (e.g. *?), it makes that quantifier non-greedy
{m}    Matches exactly m occurrences of the preceding expression
^      Matches the beginning of the string (of each line with re.M)
$      Matches the end of the string (of each line with re.M)
()     Matches the expression inside the parentheses and also marks it as a group
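A quick tour of the metacharacters in the table; the sample strings are made up, and raw strings (r"...") are used so Python does not interpret the backslashes itself:

```python
import re

text = "Tel: 010-1234 5678\nID: ab_9"

numbers = re.findall(r"\d+", text)                 # \d digit, + one or more
four = re.findall(r"\d{4}", text)                  # {4}: exactly four digits
words = re.findall(r"\w+", "ab_9 c-d")             # \w: letter, digit or underscore
spaces = re.findall(r"\s", "a b\tc")               # \s: any whitespace character
optional = re.findall(r"colou?r", "color colour")  # ?: 0 or 1 of the preceding u
stars = re.findall(r"ab*", "a ab abbb")            # *: 0 or more of the preceding b
dot = re.findall(r"a.c", "abc a\nc")               # . does not match a newline
starts = re.match(r"^Tel", text) is not None       # ^: beginning of the string
last = re.search(r"\d$", text).group()             # $: end of the string
groups = re.match(r"(\w+): (\d+)", text).groups()  # (): capture groups

print(numbers)   # ['010', '1234', '5678', '9']
print(four)      # ['1234', '5678']
print(words)     # ['ab_9', 'c', 'd']
print(spaces)    # [' ', '\t']
print(optional)  # ['color', 'colour']
print(stars)     # ['a', 'ab', 'abbb']
print(dot)       # ['abc']
print(starts, last, groups)  # True 9 ('Tel', '010')
```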

2. match()

Tries to match the regular expression starting at the beginning of the string. If it succeeds, it returns a match object; otherwise it returns None

import re

content="Hello 123 4567 cxk_ji ni tai mei"
result=re.match(r"^Hello\s\d\d\d\s\d{4}\s\w{3}",content) # Raw string keeps \s, \d, \w intact
print(result.group()) # Hello 123 4567 cxk
print(result.span()) # (0, 18)

3. Extracting matched parts (groups)

To extract part of the string, enclose the part you want in parentheses. Each pair of parentheses marks a subexpression (a group); groups are numbered from 1 in order, and result.group(n) returns the text matched by group n

import re

content="Hello 123 4567 cxk_ji ni tai mei"
result=re.match(r"^Hello\s(\d\d\d\s\d{4})\s(\w{3})",content)
print(result.group(1)) # 123 4567
print(result.group(2)) # cxk
print(result.span()) # (0, 18)

4. Matching anything (.*)

The pattern .* matches any string: the dot matches any single character except a newline, and the asterisk repeats the preceding character zero or more times, so together they match any sequence of characters up to a newline

import re

content="Hello 123 4567 cxk_ji ni tai mei"
result=re.match("^Hello.*$",content)
print(result.group())
print(result.span())

5. Greedy and non-greedy matching

1. Greedy matching (.*): matches as many characters as possible while the whole expression still matches

import re

content="Hello 1234567 cxk_ji ni tai mei"
result=re.match(r"^Hello.*(\d+).*$",content)  # Greedy: .* swallows 123456, so the group is only 7
print(result.group(1))
print(result.span())

2. Non-greedy matching (.*?): matches as few characters as possible while the whole expression still matches

import re

content="Hello 1234567 cxk_ji ni tai mei"
result=re.match(r"^Hello.*?(\d+).*$",content)  # Non-greedy: .*? stops early, so the group is the full 1234567
print(result.group(1))
print(result.span())

6. Modifier

Regular expressions can take optional flags (modifiers) that control how matching works; a flag is passed as an extra argument, such as re.S below

import re

content="""Hello 1234567 cxk_ji 
ni tai mei"""
result=re.match("^Hello.*$",content,re.S)  # re.S lets . also match newlines; without it this match returns None
print(result.group())
print(result.span())

Common modifiers

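Modifier   Description
re.I   Makes matching case-insensitive
re.L   Makes matching locale-dependent (in Python 3, only for bytes patterns)
re.M   Multiline matching: ^ and $ also match at the start and end of each line
re.S   Makes . match any character, including newlines
re.U   Interprets characters according to the Unicode character set (the default for str patterns in Python 3)
re.X   Verbose mode: whitespace and # comments in the pattern are ignored, so the regular expression can be laid out more readably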
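A sketch showing the effect of the most common flags on a made-up three-line string; flags can be combined with |:

```python
import re

text = "Hello\nhello\nHELLO"

plain = re.findall(r"hello", text)                # case-sensitive by default
ignore_case = re.findall(r"hello", text, re.I)    # re.I: ignore case
one_line = re.findall(r"^\w+$", text)             # ^/$ only at the ends of the whole string
multi_line = re.findall(r"^\w+$", text, re.M)     # re.M: ^/$ at every line
no_dotall = re.match(r"Hello.*HELLO", text)       # . stops at \n, so no match
dotall = re.match(r"Hello.*HELLO", text, re.S)    # re.S: . also matches \n
both = re.findall(r"^h\w+$", text, re.I | re.M)   # combine flags with |
# re.X: spaces and the trailing # comment in the pattern are ignored
verbose = re.match(r"(\d{4}) - (\d{1,2})  # year - month", "2019-7", re.X)

print(plain)                 # ['hello']
print(ignore_case)           # ['Hello', 'hello', 'HELLO']
print(one_line, multi_line)  # [] ['Hello', 'hello', 'HELLO']
print(no_dotall, dotall.group() == text)  # None True
print(both)                  # ['Hello', 'hello', 'HELLO']
print(verbose.groups())      # ('2019', '7')
```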

7. search()

match() fails as soon as the beginning of the string does not match the pattern; search() does not have this limitation

1. Purpose

Scans the entire string and returns the first successful match

2. Syntax

re.search(pattern,string,flags=0)

3. Parameters

1. pattern: the regular expression to match

2. string: the string to search

3. flags: flag bits (modifiers such as re.S)

import re

content="""Hello 1234567 cxk_ji 
ni tai mei"""
result=re.search("cxk.*$",content,re.S)  
print(result.group())
print(result.span())

8. findall()

To get all the text that matches a regular expression, use findall(): it searches the whole string and returns a list of every match
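A sketch of findall() on made-up strings. What it returns depends on the groups in the pattern: without groups you get the whole matches, with one group the group's text, and with several groups a tuple per match:

```python
import re

html = '<a href="/a.html">A</a> <a href="/b.html">B</a>'

all_numbers = re.findall(r"\d+", "a1b22c333")            # no groups: whole matches
links = re.findall(r'href="(.*?)"', html)                # one group: just that group
pairs = re.findall(r'href="(.*?)">(.*?)</a>', html)      # several groups: tuples

print(all_numbers)  # ['1', '22', '333']
print(links)        # ['/a.html', '/b.html']
print(pairs)        # [('/a.html', 'A'), ('/b.html', 'B')]
```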

sub()

Replaces the text matched by a regular expression, which is useful for cleaning up text

import re

string="12dskjdskj34kdslkds56lkds78"
content=re.sub(r"\d+","",string) # Remove all digits; the second argument is the replacement string
print(content) # dskjdskjkdslkdslkds

9. compile()

Compiles a regular expression string into a regular expression object that can be reused in later matches

import re

string1="2019-7-7 10:48"
string2="2019-7-8 11:48"
string3="2019-7-9 12:48"

pattern=re.compile(r"\d{2}:\d{2}")  # Compile the regular expression into a pattern object
result1=pattern.sub("",string1) # Remove the time of day
result2=pattern.sub("",string2)
result3=pattern.sub("",string3)
print(result1)
print(result2)
print(result3)

3. csv module

1. Import module

import csv

2. Open the csv file

with open("xx.csv","w",newline="",encoding="utf-8") as f:

newline="" must be passed; otherwise extra blank lines appear between rows (for example on Windows)

3. Initialize the writer object

writer=csv.writer(f)

4. Write data

writer.writerow(list_of_values)

import csv

with open("demo.csv","w",newline="",encoding="utf-8") as f:
    writer=csv.writer(f) #Initialize write object
    L=["Xiao Ming",20,"male"]
    L1=["Xiao Hong",30,"female"]
    writer.writerow(L) #Write data
    writer.writerow(L1)
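A sketch that writes several rows at once with writerows() and reads them back with csv.reader. It uses a temporary directory, and the header row is an illustrative addition. Note that every field comes back as a string:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.csv")
rows = [["name", "age", "sex"], ["Xiao Ming", 20, "male"], ["Xiao Hong", 30, "female"]]

with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)       # write all rows in one call

with open(path, newline="", encoding="utf-8") as f:
    data = list(csv.reader(f))          # each row becomes a list of strings

print(data)
```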

4. Maoyan movie project

Regular expression for extracting movie information:

'<div class="movie-item-info">.*?title="(.*?)".*?class="star">(.*?)</p>.*?class="releasetime">(.*?)</p>',re.S

Write a crawler program that crawls the name, starring actors, and release time of every film in the ranking, and saves them to a local CSV file (which can be opened in Excel)

1. Build the URL of each page

2. Write a function that gets the source code of a web page

3. Write a function that parses the useful data out of the source code

4. Write a function that writes the data to a local csv file
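Steps 3 and 4 can be sketched offline by applying the regular expression above to a small HTML fragment. The fragment, movie name, and actors below are hypothetical and only mimic the shape the pattern expects; parse_movies and save_csv are illustrative names, not part of any library:

```python
import csv
import os
import re
import tempfile

# The pattern from the notes above; it assumes the Maoyan board markup of that time
PATTERN = re.compile(
    r'<div class="movie-item-info">.*?title="(.*?)".*?class="star">(.*?)</p>'
    r'.*?class="releasetime">(.*?)</p>',
    re.S,
)


def parse_movies(html):
    """Return (name, stars, release_time) tuples with surrounding whitespace trimmed."""
    return [tuple(field.strip() for field in match) for match in PATTERN.findall(html)]


def save_csv(rows, filename):
    """Write the rows to a csv file that Excel can open."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical HTML fragment shaped like one ranking entry
sample = """
<div class="movie-item-info">
  <p class="name"><a href="/films/1" title="Movie A">Movie A</a></p>
  <p class="star">
      Starring: Actor 1, Actor 2
  </p>
  <p class="releasetime">Release date: 1993-01-01</p>
</div>
"""

movies = parse_movies(sample)
path = os.path.join(tempfile.mkdtemp(), "maoyan.csv")
save_csv(movies, path)
print(movies)
```

Because re.S is set, the non-greedy .*? parts can span the line breaks between tags.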

5. Module installation

python -m pip install requests

python -m pip install bs4

python -m pip install lxml

6. requests module

1. Common methods

1. res = requests.get(url, headers=headers)

Sends a request to the website and returns the response object

2. Attributes of the response object (res)

1. res.text: the response body decoded into a string

2. res.encoding = "utf-8": sets the encoding used to decode res.text

3. res.content: the raw binary byte stream (needed when downloading pictures, audio, and video)

4. res.status_code: the HTTP response status code

5. res.url: the URL the data was actually returned from (after any redirects)

import requests

url="https://www.baidu.com/"
headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}

res=requests.get(url,headers=headers)
res.encoding="utf-8"
print(res.text)
print(res.url) #Return the url address of the actual data
print(res.status_code) #Return HTTP response code
print(res.content) #Get content bytes

3. URL parameters (params)

params: a dictionary of query parameters

res = requests.get(url, params=params, headers=headers)

requests URL-encodes params automatically and appends them to the url as a query string

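A full set of request headers copied from the browser's developer tools, here for www.renren.com; the Cookie header carries the logged-in session, so passing this dict as headers lets requests fetch pages that require login:

headers = {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",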
"Accept-Encoding": "gzip, deflate",
"Accept-Language": "zh-CN,zh;q=0.9",
"Cache-Control": "max-age=0",
"Connection": "keep-alive",
"Cookie": "anonymid=jxsj5fy78nj65k; depovince=ZJ; _r01_=1; jebe_key=dacc7a51-8acd-46aa-ac34-9b0dcd0bf180%7C712e620fada2a7904696ec0f971e40cd%7C1562478178258%7C1%7C1562478176757; JSESSIONID=abcU3TNteZ-K-7OYbFlVw; ick_login=0feb451a-fdc7-4f21-9739-70aa6b149596; loginfrom=null; wp_fold=0; jebe_key=dacc7a51-8acd-46aa-ac34-9b0dcd0bf180%7C20431d0d28353673afdf82da213cc1fa%7C1562487391831%7C1%7C1562487390408; t=60525e04b0d5fb0611a37d12e64779fe7; societyguester=60525e04b0d5fb0611a37d12e64779fe7; id=964833547; xnsid=347dad61; jebecookies=5da84bdc-7000-4f30-8e5c-10fcff9ae57a|||||",
"Host": "www.renren.com",
"Upgrade-Insecure-Requests":"1" ,
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}

7. Beautiful Soup parser

1. Node selector

html = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

from bs4 import BeautifulSoup

soup=BeautifulSoup(html,"lxml")
print(soup.title)
print(soup.head)
print(soup.p.string)
print(type(soup.p))

2. Getting attributes

print(soup.p.attrs["name"])
print(soup.p.attrs)
print(soup.p["name"])

3. Nested selection

print(soup.head.title.string)

4. Method selectors

1. find_all()

Queries all elements that meet the given conditions: pass it a node name, attributes, or text, and it returns every matching element

html="""
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
"""

from bs4 import BeautifulSoup

soup=BeautifulSoup(html,"lxml")
# print(soup.find_all(name="ul")) # Find by node name
# print(type(soup.find_all(name="ul")[1])) # Each result is a bs4.element.Tag, which can itself be searched further

for i in soup.find_all(name="ul"):
    for li in i.find_all(name="li"):
        print(li.string)

5. attrs

Besides querying by node name, you can also query by attributes

print(soup.find_all(attrs={"id":"list-1"}))
print(soup.find_all(id="list-1"))
print(soup.find_all(class_="element")) # class is a Python keyword, so the parameter is class_
print(soup.find_all("ul",class_="list list-small"))

6. text

The text parameter matches a node's text; it accepts a string or a compiled regular expression (newer Beautiful Soup versions call this parameter string)

import re

print(soup.find_all(text=re.compile("Hello")))
print(soup.find_all(text="Hello"))

7. find()

Returns the first matching element

print(soup.find(name="ul"))

8. CSS selectors

To use a CSS selector, call the select() method and pass in the selector string

In a CSS selector, an id is written with # in front of it

A class name is written with . in front of it

print(soup.select(".panel .panel-heading"))
print(soup.select("ul li")) #Find the node information of li under ul
print(soup.select("#list-2 .element"))#Find the node information of class=element under id list-2
print(soup.select("div > ul > li"))#Find the node information of li under the ul tag under div

Nested selection

for ul in soup.select("ul"):
    print(ul["id"])
    print(ul.attrs["id"])

Get text

for li in soup.select("li"):
    print("get_text:",li.get_text())
    print("string:",li.string)
    print("text:",li.text)

Modules to be installed

python -m pip install jieba

python -m pip install wordcloud

python -m pip install matplotlib

python -m pip install imageio


Item 3. Crawl the short comments of a film on douban.com and save them to a local txt file (the number in baseurl is the film's subject ID)

import requests
from bs4 import BeautifulSoup

class Douban():
    def __init__(self):
        self.baseurl="https://movie.douban.com/subject/30171425/comments?start=" # 30171425 is the film's subject ID
        self.headers ={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
        self.start=0 # Offset of the first comment on the current page

    def get_html(self,url):
        html=requests.get(url,headers=self.headers).text
        self.get_comment(html)

    def get_comment(self,html):
        comment_list=[] # List of the extracted comments
        soup=BeautifulSoup(html,"lxml")
        comment=soup.select(".comment p")
        # comment=soup.find_all("span",class_="short")
        for i in comment:
            comment_list.append(i.text.strip())
        self.save_comment(comment_list)

    def save_comment(self,comment_list):
        with open("comment.txt","a",encoding="utf-8") as f: # "a" appends, so earlier pages are not overwritten
            for comment in comment_list:
                f.write(comment+"\n") # writelines() adds no line breaks, so write them explicitly
        print("Information storage completed!")

    def get_page(self):
        begin=int(input("Enter the first page to crawl:"))
        end=int(input("Enter the last page to crawl:"))
        for page in range(begin,end+1):
            self.start=(page-1)*10
            url=self.baseurl+str(self.start)
            self.get_html(url) # Call the method above to get the page source


douban=Douban()
douban.get_page()

Item 4. Crawl poems from gushiwen.org and save each one locally

import requests
from bs4 import BeautifulSoup
import os

class Gushi():
    def __init__(self):
        self.url = "https://so.gushiwen.org/shiwen/default_0AA2.aspx"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}

    def get_html(self):
        html=requests.get(self.url,headers=self.headers).text
        self.get_info(html)

    def get_info(self,html):
        soup=BeautifulSoup(html,"lxml")
        title=soup.select("div > p > a > b") # Poem titles
        content=soup.find_all("div",class_="contson") # Poem bodies
        self.save_info(title,content) # Call the save method

    def save_info(self,title,content):
        folder=os.path.join(os.getcwd(),"Ancient poetry") # Folder path; os.path.join avoids Windows-only backslashes
        os.makedirs(folder,exist_ok=True) # Create the folder if it does not exist
        for t,c in zip(title,content): # Pair each title with its body
            name=t.text
            with open(os.path.join(folder,"%s.txt"%name),"w",encoding="utf-8") as f:
                f.write(name+"\n") # Write the title
                for ch in c.text:
                    if ch!=" ": # Skip spaces
                        f.write(ch) # Write the content
                        if ch=="。": # Start a new line after each Chinese full stop
                            f.write("\n")


gushi=Gushi()
gushi.get_html()
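The character-by-character loop in save_info can also be written with str.replace. A minimal sketch: format_poem is a hypothetical helper, and the sample input is the well-known poem 静夜思 (Quiet Night Thoughts):

```python
def format_poem(title, text):
    """Return the title line plus the body with spaces removed and a line break after each 。."""
    body = text.replace(" ", "").strip()
    body = body.replace("。", "。\n")
    return title + "\n" + body


poem = format_poem("静夜思", "床前明月光，疑是地上霜。 举头望明月，低头思故乡。")
print(poem)
```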

Day 5

1. Jieba word segmentation

1. Segmentation modes

1. Accurate mode: splits the sentence as precisely as possible; suitable for text analysis

2. Full mode: quickly scans out every word that can be formed from the sentence; very fast, but it cannot resolve ambiguity

3. Search engine mode: based on accurate mode, long words are split again to improve recall; suitable for search engines

2. How segmentation works

For a long piece of text, segmentation roughly takes three steps:

1. First, regular expressions roughly split the Chinese text into sentences

2. Each sentence is turned into a directed acyclic graph (DAG) of all possible words, and the best split is found (the maximum-probability path based on word frequencies)

3. Finally, runs of consecutive single characters that are not in the dictionary are segmented again with an HMM model
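To make step 2 concrete, here is a much-simplified sketch of the DAG plus maximum-probability-path idea. It is not jieba's code: the tiny frequency dictionary FREQ is hypothetical, and the HMM step for unknown words is left out:

```python
import math

# Hypothetical toy dictionary: word -> frequency (jieba ships a large real one)
FREQ = {"研究": 5, "研究生": 2, "生命": 6, "命": 1, "的": 10, "起源": 4,
        "研": 1, "究": 1, "生": 1, "起": 1, "源": 1}
LOG_TOTAL = math.log(sum(FREQ.values()))


def build_dag(sentence):
    """Map each start index to the end indexes of dictionary words starting there."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]  # unknown character: keep it as a single-character word
    return dag


def segment(sentence):
    """Choose the path with the highest total log probability, solved right to left."""
    n = len(sentence)
    dag = build_dag(sentence)
    route = {n: (0.0, 0)}  # best (score, end index) from each position to the end
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1)) - LOG_TOTAL + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(segment("研究生命的起源"))  # ['研究', '生命', '的', '起源']
```

The DAG lists both 研究 and 研究生 as candidates at position 0; the frequencies make 研究 + 生命 more probable than 研究生 + 命.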

2. codecs

codecs.open() opens a file with a specified encoding: reading returns Unicode strings, and Unicode strings that are written are encoded with that encoding

In Python 2, the built-in open() has no encoding parameter and works with byte strings (str), so writing text means encoding and decoding by hand, which easily causes errors; codecs.open() handles the encoding for you. In Python 3, the built-in open() accepts encoding= directly

3. XPath parsing module

1. Concept

XPath stands for XML Path Language, a language for finding information in XML documents (lxml also applies it to HTML)

2. Usage

from lxml import etree

parseHTML=etree.HTML(html) # Parse the HTML string
result=parseHTML.xpath("xpath expression") # Usually returns a list

3. Common XPath rules

nodename   Selects all child nodes of this node
/          Selects direct child nodes of the current node
//         Selects descendant nodes of the current node
.          Selects the current node
..         Selects the parent node of the current node
@          Selects attributes
@Select Properties

4. All nodes

from lxml import etree

html=etree.HTML(text) # Convert the HTML text into an object that supports xpath
print(html.xpath('//*')) # Get all nodes

1. The result is a list in which each item is an Element object, shown with its node name

2. You can also specify a node name to get only those nodes

from lxml import etree

html=etree.HTML(text) # Convert the HTML text into an object that supports xpath
print(html.xpath('//li')) # Get all nodes with the given name

5. Child nodes

You can find child nodes with / and descendant nodes with //

print(html.xpath('//li/a')) # Find the a child nodes under li
print(html.xpath('//ul//a')) # Find the a descendant nodes under ul

6. Parent node

You can query the parent of the current node with ..

print(html.xpath('//a[@href="http://domestic.firefox.sina.com/"]/../@class')) # Class attribute of the parent of the current node
print(html.xpath('//a[@href="http://domestic.firefox.sina.com/"]/../../@id')) # Id attribute of the grandparent of the current node

7. Attribute matching

print(html.xpath('//li[@class="link1"]'))
print(html.xpath('//a[@id="channel"]'))

8. Text acquisition

Use the text() function in an XPath expression to get a node's text

print(html.xpath('//li[@class="link2"]/a/text()')) # Extract the text of the a node under li
print(html.xpath('//li[@class="link2"]/a/@href')) # Extract the href of the a node under li

9. Attribute multi value matching (contains)

Sometimes a node's attribute has more than one value (for example class="li list-1"), so an exact match on a single value fails; in that case use multi-value matching with contains()

text="""

<li class="li list-1"><a href="link.html">hello world</a></li>
<li class="li list-1"><a href="link.html">hello</a></li>
"""
from lxml import etree

html=etree.HTML(text)#Convert data to xpath objects
print(html.xpath('//li[@class="li list-1"]/a/text()'))
print(html.xpath('//li[contains(@class,"li")]/a/text()'))

With contains(), the first argument is the attribute and the second is a value; the node matches as long as the attribute value contains that string

10. Select in order

Sometimes a condition matches several nodes at once, but you only want one of them; in that case, select by position with an index in square brackets (XPath positions start at 1)

text="""
<div>
    <ul>
        <li class="item1"><a href="link1.html">cxk song</a></li>
        <li class="item2"><a href="link2.html">cxk dance</a></li>
        <li class="item1"><a href="link3.html">cxk rap</a></li>
        <li class="item2"><a href="link4.html">cxk basketball</a></li>
    </ul>
</div>
"""
from lxml import etree

html=etree.HTML(text)#Convert data to xpath objects
# print(html.xpath('//li[@class="li list-1"]/a/text()'))
# print(html.xpath('//li[contains(@class,"li")]/a/text()'))
print(html.xpath('//li[last()]/a/text()')) # The last li: ['cxk basketball']
print(html.xpath('//li[1]/a/text()')) # Text in the a tag of the first li: ['cxk song']
print(html.xpath('//li[position()<3]/a/text()')) # The first two li: ['cxk song', 'cxk dance']
print(html.xpath('//li[last()-2]/a/text()')) # The third to last li: ['cxk dance']
print(html.xpath('//li[@class="item1"][2]/a/text()')) # The second li with class item1: ['cxk rap']

1. Content of the project defense

1. A PPT introducing the project team members

2. A summary of the training

3. Source code and notes from regular classes (including the source code of the defense project)

2. Defense project

Choose any website and crawl its data (text, images, audio)

3. json module

1. Concept

JSON is based on JavaScript objects and arrays

Object: {"key": "value"}; strings in JSON must use double quotes

Array: [x1,x2,x3]

2. Role of the json module

Converts between JSON-formatted strings and Python data types

3. Reading JSON

json.loads():

Purpose: JSON format -> Python data type

JSON      Python

object    dict

array     list

import json

json_str="""
[{"name":"Xiao Ming"},{"age":"20"},{"sex":"male"}]
"""
print(type(json_str))
data=json.loads(json_str) # Convert the JSON string into Python data types
print(data)
print(type(data))
print(data[0]["name"])
print(data[1].get("age"))
print(data[0].get("address","Beijing")) # If the key does not exist, the default value is returned

4. Writing JSON

json.dumps()

Purpose: Python data type -> JSON format

Python    JSON

dict      object

list      array

tuple     array

Note:

1. json.dumps() escapes non-ASCII characters by default (ensure_ascii=True)
2. Pass ensure_ascii=False to keep them as they are
with open("data.json","w",encoding="utf-8") as f:
    f.write(json.dumps(data,indent=2,ensure_ascii=False)) # indent sets the number of spaces used for indentation

If the data contains Chinese, open the file with encoding="utf-8" and set ensure_ascii to False
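A small sketch showing what ensure_ascii changes, plus a round trip and the tuple-to-array conversion from the table; the dict contents are illustrative:

```python
import json

data = {"name": "小明", "age": 20}

escaped = json.dumps(data)                      # non-ASCII escaped as \uXXXX
readable = json.dumps(data, ensure_ascii=False) # characters kept as they are
restored = json.loads(escaped)                  # round trip back to a dict
from_tuple = json.dumps((1, 2))                 # a tuple becomes a JSON array

print(escaped)     # {"name": "\u5c0f\u660e", "age": 20}
print(readable)    # {"name": "小明", "age": 20}
print(restored == data, from_tuple)  # True [1, 2]
```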

4. Crawling web pages that load data dynamically with Ajax requests

1. Analyze how the page makes requests: open the browser's developer tools (Inspect), choose the XHR filter on the Network tab, scroll the page with the mouse wheel to see how new requests appear, and find the JS (Ajax) requests

2. After finding a request, click it, open the Preview tab, and check whether the returned data contains what we want

3. If it does, open the Headers tab and look at the Request URL to find the real URL of the data, then check its parameters; they are usually listed under Query String Parameters in the Headers tab

4. By combining the common part of the real URL with these parameters, we can request more data and get the JSON returned to the JS code

Get the list of videos: json["data"]["items"]
Then traverse it:
    for i in json["data"]["items"]:
        i["item"]["video_playurl"]  # download address of each video
import requests
import json
import string
import time
import random

class BilibiliVideo():
    def __init__(self):
        self.url="http://api.vc.bilibili.com/board/v1/ranking/top" # Common part of the dynamic request's URL
        self.headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
        self.all_str=string.punctuation+string.whitespace # All punctuation and whitespace characters


    def get_json(self):
        for offset in range(1,52,10): # Fetch several batches by changing offset in the loop
            params={"page_size":"10",
                    "next_offset":str(offset),
                    "tag":"小视频", # "small video" tag
                    "type":"tag_general_week",
                    "platform":"pc"}
            html=requests.get(self.url,params=params,headers=self.headers).text # These are Query String Parameters, so send them with GET
            data=json.loads(html) # Turn the JSON text into Python data types
            self.get_video_url(data) # Call the method that gets the video links

    def get_video_url(self,data): # Get the download address of each small video
        for video in data["data"]["items"]:
            video_url=video["item"]["video_playurl"] # Download address of the video
            video_name=video["item"]["description"] # Name of the video
            for char in video_name: # Traverse the characters of the video name
                if char in self.all_str: # If it is a special character
                    video_name=video_name.replace(char,"") # remove it

            video_name=video_name[:50] # Keep at most the first 50 characters of the name
            filename=video_name+".mp4" # Save the file in mp4 format
            video_content=requests.get(video_url,headers=self.headers).content # Request the video address and get the binary data

            with open(filename,"wb") as f:
                f.write(video_content)
                print("%s Download successful"%filename)
            time.sleep(random.randint(1,5)) # Wait 1 to 5 seconds between downloads


bilibili=BilibiliVideo()
bilibili.get_json()
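The JSON handling in get_video_url can be tested offline by separating it from the download. A sketch: extract_videos is a hypothetical helper, the sample payload is invented and trimmed to the fields used above, and str.translate replaces the character-by-character loop:

```python
import json
import string

# Characters to strip from file names (the same set as self.all_str above)
REMOVE = str.maketrans("", "", string.punctuation + string.whitespace)


def extract_videos(payload, max_len=50):
    """Return (filename, url) pairs from JSON shaped like data -> items -> item."""
    result = []
    for video in json.loads(payload)["data"]["items"]:
        item = video["item"]
        name = item["description"].translate(REMOVE)[:max_len]  # keep at most max_len characters
        result.append((name + ".mp4", item["video_playurl"]))
    return result


# Hypothetical sample response
sample = ('{"data": {"items": [{"item": {"description": "My cat, dancing!", '
          '"video_playurl": "http://example.com/1.mp4"}}]}}')
print(extract_videos(sample))  # [('Mycatdancing.mp4', 'http://example.com/1.mp4')]
```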

Keywords: Python

Added by vandutch on Thu, 10 Feb 2022 13:46:29 +0200