Crawler preparation - getting to know HTML & CSS

Before writing a crawler, you need to know some HTML and CSS in addition to Python.

The three technologies that make up a web page: HTML, CSS, JS

HTML (structure standard) - provides the content of the page (different tags provide different kinds of content)

CSS (style standard) - responsible for the style and layout of the page content

JS (behavior standard) - responsible for controlling how the page changes


HTML - HyperText Markup Language

A web page is an HTML document. The HTML code is usually written in an HTML file, which the browser can parse directly.

1) Basic HTML structure: one html tag that contains a head tag and a body tag

html - represents the entire page

head - holds what appears at the top of the browser tab, such as the page icon and title (it also holds settings that are not displayed)

body - holds the content displayed in the page
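
From a crawler's point of view, this structure is what a parser walks through. The following is a minimal sketch using Python's standard html.parser module; the page text and the class name OutlineParser are made up for illustration and are not part of the original notes.

```python
from html.parser import HTMLParser

# Hypothetical minimal page, used only for illustration.
PAGE = """<html>
<head><title>Demo page</title></head>
<body><p>Hello, crawler</p></body>
</html>"""


class OutlineParser(HTMLParser):
    """Record each opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag name)

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


parser = OutlineParser()
parser.feed(PAGE)

# Print the tag tree, indented by depth: html contains head and body.
for depth, tag in parser.outline:
    print("  " * depth + tag)
```

The depth counter only stays correct for well-formed pages where every tag is closed; real pages often need a more tolerant parser.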

2) Tag syntax

HTML provides different content through different tags. By structure, tags fall into two types:

a) Double tag: <tag-name attribute1="value1" attribute2="value2" ...>tag content</tag-name>

b) Single tag: <tag-name attribute1="value1" attribute2="value2" ...> or <tag-name attribute1="value1" attribute2="value2" ... />

Notes:

1) There must be no space between '<' (or '</') and the tag name

2) Whatever type of data an attribute value is, enclose it in quotation marks (double quotation marks by convention)

3) The content of a double tag can be anything: plain text, one or more tags, or a mixture of text and tags

4) Tag names are defined by HTML
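
To see how a parser reports the two tag types, here is a small sketch with Python's standard html.parser module. The snippet, the URL https://example.com and the class name TagLogger are assumptions chosen for the example. Note that every attribute value arrives as a string, whatever it looks like in the page.

```python
from html.parser import HTMLParser

# Hypothetical snippet: one double tag and one self-closed single tag.
SNIPPET = '<a href="https://example.com" id="link1">Example</a><br />'


class TagLogger(HTMLParser):
    """Log every parsing event in the order it happens."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values are always strings.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for single tags written in the <tag ... /> form.
        self.events.append(("single", tag, dict(attrs)))

    def handle_data(self, data):
        self.events.append(("text", data))


logger = TagLogger()
logger.feed(SNIPPET)
logger.close()

for event in logger.events:
    print(event)
```

A single tag written without the trailing slash, such as <br>, is reported through handle_starttag instead, so a crawler should not rely on the slash being present.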

Text tags
Heading tags
<h1>Level-1 heading</h1>
<h2>Level-2 heading</h2>
<h3>Level-3 heading</h3>
<h4>Level-4 heading</h4>
<h5>Level-5 heading</h5>
<h6>Level-6 heading</h6>
Paragraph tag
<p>Chu Shi Biao (Memorial on Dispatching the Troops)</p>
<p>The late Emperor passed away with his great undertaking not yet half complete. Today the realm is divided into three, and Yizhou is exhausted; this is truly a critical moment of survival or ruin. Yet the ministers of the guard do not slacken within the court, and loyal men risk their lives beyond it, because they remember the exceptional favor of the late Emperor and wish to repay it to Your Majesty. Your Majesty should listen widely, so as to glorify the late Emperor's legacy and lift the spirit of men of high purpose; you should not belittle yourself or cite improper precedents, and so block the path of loyal remonstrance. The palace and the chancellery form one body; rewards and punishments, praise and blame, should not differ between them. Anyone who commits a crime, or who acts loyally and well, should be referred to the proper officials for punishment or reward, to show Your Majesty's fair and enlightened rule; there should be no favoritism that makes the law differ inside and outside the palace. The palace attendants Guo Youzhi, Fei Yi and Dong Yun are all honest and reliable men of loyal and pure purpose, which is why the late Emperor selected them and left them to Your Majesty. In my humble opinion, all palace matters, large or small, should be discussed with them before being carried out; this will surely remedy oversights and bring broad benefit.</p>
Inline tags
<span>Inline text1</span>
<label>Inline text2</label>
<font>Inline Text3</font>
Tags and entities related to text effects
		<!--Bold: b, strong-->
		<b>I am inside a b tag, so I am bold</b>
		<strong>I am inside a strong tag, so I am bold</strong>
		<!--Italic: em-->
		<em>I am inside an em tag, so I am italic</em>
		<!--Line break: br-->
		<br />
		<p>There is a br line break above me</p>
		<!--Spaces: nbsp and emsp-->
		&nbsp;&nbsp;<span>There are two nbsp spaces in front of me</span>&emsp;<span>There is one emsp space in front of me; it is as wide as one full character</span>
div tag

1. Works as an ordinary double tag that provides content

2. Acts as a box that groups the content of a web page into blocks

input tag
        <!--Normal text box
        	value: the content of the input box
        	placeholder: the hint shown while the box is empty-->
        <input type="text" value="" placeholder="Come and search for me"/>
        <br />
        <!--radio button-->
        <input type="radio" name="sex" id="s1" value="" />
        <label for="s1">male</label>
        <input type="radio" name="sex" id="s2" value="" />
        <label for="s2">female</label>
        <br />
        <!--Checkboxes-->
        <input type="checkbox" name="hobby" id="c1" value="" />
        <label for="c1">strawberry</label>
        <input type="checkbox" name="hobby" id="c2" value="" />
        <label for="c2">Banana</label>
        <input type="checkbox" name="hobby" id="c3" value="" />
        <label for="c3">grapefruit</label>
        <br />
        <!--Normal buttons-->
        <input type="button" name="" id="" value="OK" />
        <button type="button">cancel</button>
        <button type="button"><img src="img/jd_logo.ico"/></button>
        <br />
        <!--Other functions-->
        <input type="color" name="" id="" value="" />
        <input type="file" name="" id="" value="" />
        <input type="date" name="" id="" value="" />
        <input type="datetime-local" name="" id="" value="" />
List tags
		<!--Unordered list-->
		<ul>
			<li>Unordered list option 1</li>
			<li>Unordered list option 2</li>
			<li>Unordered list option 3</li>
		</ul>
		<!--Ordered list-->
		<ol>
			<li>Ordered list option 1</li>
			<li>Ordered list option 2</li>
			<li>Ordered list option 3</li>
		</ol>
Pictures and hyperlinks
        	Picture tag: img
        	Attributes: src - sets the picture address (local or network)
        		  title - sets the picture title
        		  alt - sets the text shown when the picture cannot be loaded
		<!--Picture tag: img-->
		<h1>----------1, Picture tag----------</h1>
		<!--Show local pictures-->
		<p>Local picture</p>
		<img src="img/mmexport1573830159890.jpg" title="Lin Siyi" alt="Lin Siyi" />
		<p>Network picture</p>
		<img src="https://img2.baidu.com/it/u=1810171082,1266198879&fm=26&fmt=auto&gp=0.jpg" title="sister Bao" alt="sister Bao" />

			Hyperlink tag: a
		<h1>----------2, Hyperlink tag----------</h1>
		<a href=" COM / "> Baidu</a>
		<p>Picture hyperlink</p>
		<a href="">
			<img src="img/jd_logo.ico" />
		</a>
		<br />
		<a href="">
			<img src="img/jd_logo.ico"/>
		</a>
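
The href of a tags and the src of img tags are usually the first things a crawler collects. Below is a minimal sketch using only the standard library; BASE_URL, the sample page and the class name LinkCollector are hypothetical. Relative addresses are resolved against the page address with urllib.parse.urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://example.com/news/"  # hypothetical page address
PAGE = """
<a href="/about.html">About</a>
<a href="detail.html"><img src="img/logo.ico" alt="logo" /></a>
<img src="https://example.com/static/banner.jpg" title="banner" />
"""


class LinkCollector(HTMLParser):
    """Collect absolute URLs of hyperlinks and pictures."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # from <a href="...">
        self.images = []  # from <img src="...">

    def handle_starttag(self, tag, attrs):
        # Self-closed tags such as <img ... /> also end up here, because the
        # default handle_startendtag calls handle_starttag.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


collector = LinkCollector(BASE_URL)
collector.feed(PAGE)
print(collector.links)
print(collector.images)
```

Tags with an empty href or src are skipped, since an empty address gives the crawler nothing to fetch.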

Selectors - used to select the tags that a style applies to:

1. Tag selector (element selector) - use the tag name directly as the selector; selects all tags with that name

Example: a{} - selects all a tags

2. id selector - add # before the id attribute value as the selector; selects the tag whose id attribute has the specified value (an id is unique within a page)

Example: #a1{} - selects the tag whose id is a1

3. Class selector - add . before the class attribute value as the selector; selects all tags whose class attribute has the specified value

Example: .a1{} - selects all tags whose class is a1

4. Group selector - join several selectors with ',' into one selector; selects every tag selected by any of the selectors

Example: p, #a1, .a1{} - selects all p tags, the tag whose id is a1, and all tags whose class is a1

5. Descendant selector - several selectors separated by spaces

Example: div p{} - selects every p tag that is a descendant of a div (every p anywhere inside a div)

6. Child selector - several selectors separated by '>'

Example: div>p{} - selects every p tag that is a direct child of a div
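
Crawler libraries reuse these selectors to locate data in a page. As a rough sketch of what the rules above mean, the following standard-library code matches the six selector kinds against a tiny hypothetical page. All names here (Element, TreeBuilder, select and so on) are invented for the example, and the matcher is deliberately limited: descendant and child selectors accept exactly two parts, and the page must be well-formed.

```python
from html.parser import HTMLParser

# Hypothetical page; every tag is closed so the simple stack below works.
PAGE = """
<div id="box">
  <p class="a1">first</p>
  <section><p>second</p></section>
</div>
<p id="a1">third</p>
<a class="a1 big" href="#">link</a>
"""


class Element:
    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.id = attrs.get("id")
        self.classes = (attrs.get("class") or "").split()
        self.parent = parent
        self.text = ""


class TreeBuilder(HTMLParser):
    """Collect elements in document order and remember each one's parent."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.elements = []

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        element = Element(tag, dict(attrs), parent)
        self.elements.append(element)
        self.stack.append(element)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            self.stack[-1].text += data.strip()


def matches_simple(element, selector):
    if selector.startswith("#"):           # id selector
        return element.id == selector[1:]
    if selector.startswith("."):           # class selector
        return selector[1:] in element.classes
    return element.tag == selector         # tag selector


def ancestors(element):
    while element.parent is not None:
        element = element.parent
        yield element


def matches(element, selector):
    if ">" in selector:                    # child selector, two parts only
        parent_sel, child_sel = [s.strip() for s in selector.split(">")]
        return (matches_simple(element, child_sel)
                and element.parent is not None
                and matches_simple(element.parent, parent_sel))
    if " " in selector:                    # descendant selector, two parts only
        ancestor_sel, descendant_sel = selector.split()
        return (matches_simple(element, descendant_sel)
                and any(matches_simple(a, ancestor_sel) for a in ancestors(element)))
    return matches_simple(element, selector)


def select(elements, selector):
    result = []
    for part in selector.split(","):       # group selector
        for element in elements:
            if element not in result and matches(element, part.strip()):
                result.append(element)
    return result


builder = TreeBuilder()
builder.feed(PAGE)
for css in ["p", "#a1", ".a1", "p, #a1, .a1", "div p", "div > p"]:
    print(css, "->", [e.text for e in select(builder.elements, css)])
```

In a real crawler you would normally rely on a library that supports CSS selectors instead of a hand-written matcher like this one.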


1. CSS syntax:

CSS is used to add style and layout to tags

Selector {property1: value1; property2: value2; property3: value3; ...}


Selector - selects the tags you want to style (in an inline style, the selector and the {} are omitted)

Property - property names are defined by CSS; different properties control different aspects of the style

Common properties: color - text color; background-color - background color

font-size - font size; width - width; height - height; border - border
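
The rule syntax above is regular enough to take apart with plain string operations. The sketch below is only an illustration of the selector {property: value; ...} shape, not a real CSS parser: the function names are made up, and values that contain ';' or '}' (or comments and nested rules) would break it.

```python
def parse_declarations(text):
    """Turn 'color: red; width: 200px;' into {'color': 'red', 'width': '200px'}."""
    declarations = {}
    for item in text.split(";"):
        if ":" not in item:
            continue  # skip empty pieces, e.g. after the last ';'
        name, value = item.split(":", 1)
        declarations[name.strip()] = value.strip()
    return declarations


def parse_rule(rule):
    """Split 'selector {declarations}' into (selector, declarations dict)."""
    selector, _, body = rule.partition("{")
    return selector.strip(), parse_declarations(body.rstrip().rstrip("}"))


RULE = ".div1 {width: 200px; height: 200px; border: 5px solid plum;}"
selector, declarations = parse_rule(RULE)
print(selector)
print(declarations)

# An inline style has no selector and no braces, only the declarations.
inline = parse_declarations("color: pink; font-size: 25px;")
print(inline)
```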

2. Where CSS code can be written:

1) Inline style: write the CSS code in the style attribute of the tag

<p style="color: pink; font-family: 'imprint mt shadow'; font-size: 25px;">
			I was changed by the inline style
</p>

2) Internal style sheet: write the CSS code inside a style tag

<div class="div1">
			I was changed by the internal style
</div>
<style type="text/css">
			.div1{
				width: 200px;
				height: 200px;
				border: 5px solid plum;
				line-height: 200px;
				text-align: center;
				color: palegreen;
			}
</style>

3) External style sheet: write the CSS code in a css file and import it through the link tag

In the HTML file:

<div class="div2">
			I was changed by the external style
</div>

In the css file:

.div2{
	width: 200px;
	height: 200px;
	border: 5px solid peachpuff;
	line-height: 200px;
	text-align: center;
	color: powderblue;
	margin-top: 20px;
}
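
When inspecting a page before crawling it, it can help to see which of the three locations its styles come from. The following is a hedged sketch with the standard html.parser module; the sample page, the file name css/demo.css and the class name StyleFinder are assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical page that uses all three CSS locations.
PAGE = """<html><head>
<link rel="stylesheet" type="text/css" href="css/demo.css" />
<style type="text/css">.div1{color: palegreen;}</style>
</head><body>
<p style="color: pink;">inline</p>
<div class="div1">internal</div>
</body></html>"""


class StyleFinder(HTMLParser):
    """Sort the CSS found in a page by where it is written."""

    def __init__(self):
        super().__init__()
        self.inline = []    # (tag, value of its style attribute)
        self.internal = []  # text found inside <style> tags
        self.external = []  # href values of stylesheet <link> tags
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("style"):
            self.inline.append((tag, attrs["style"]))
        if tag == "style":
            self._in_style = True
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            self.external.append(attrs.get("href"))

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style and data.strip():
            self.internal.append(data.strip())


finder = StyleFinder()
finder.feed(PAGE)
print("inline:", finder.inline)
print("internal:", finder.internal)
print("external:", finder.external)
```

The external entry is only the address of the css file; a crawler that needs the rules themselves would still have to download that file.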

The above covers only some common front-end basics that are useful to know before writing a crawler.

Keywords: Python

Added by peter_anderson on Sat, 25 Dec 2021 10:34:13 +0200