Preface
Over the weekend I was at home with nothing to do, scrolling through WeChat on my phone, when I noticed that my WeChat avatar was due for a change, so I went online to look for a new one. Looking through the pictures, it occurred to me that, as a programmer, I could crawl these pictures and build a WeChat mini program out of them. So I did. I now have a basic grasp of how it works, and I have written it up to share with you.
Contents
- Install node and download the dependencies
- Build the server
- Request the page we want to crawl and return json
Install node
Start by installing node. You can download it from the official node website, nodejs.org/zh-cn/ . Once it is installed, check the version:
node -v
If the installation succeeded, the version number you installed is printed.
Next, use node to print hello world. Create a new file named index.js:
console.log('hello world')
Run this file
node index.js
It prints hello world to the console.
Build server
Create a new folder named node.
First, install the express dependency inside that folder:
npm install express
Create a new file named demo.js in the same folder, as shown in the figure below:
![img](https://p1-jj.byteimg.com/tos...)
Import the installed express in demo.js:
const express = require('express');
const app = express();

app.get('/index', function(req, res) {
  res.end('111');
});

var server = app.listen(8081, function() {
  var host = server.address().address;
  var port = server.address().port;
  console.log("Application instance, access address: http://%s:%s", host, port);
});
Run node demo.js to set up a simple service, as shown in the figure:
![img](https://p1-jj.byteimg.com/tos...)
Request the page we want to crawl
To request the page, first install the three dependencies the crawler needs:
npm install superagent
npm install superagent-charset
npm install cheerio
superagent is used to send the requests. It is a lightweight, progressive ajax API that is readable and easy to learn. In the nodejs environment it relies internally on the native nodejs request API. You can also send requests with node's built-in http module.
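For comparison, here is a minimal sketch of sending the same kind of request with only node's built-in http and https modules. The helper name fetchBuffer is hypothetical (it is not part of superagent or of node); it resolves with the raw response bytes so that the caller can decide how to decode them.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: GET a URL with node's built-in modules and
// resolve with the raw response body as a Buffer.
function fetchBuffer(url) {
  return new Promise(function (resolve, reject) {
    // Pick the module that matches the URL scheme
    const client = url.startsWith('https:') ? https : http;
    client.get(url, function (res) {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error('Unexpected status code: ' + res.statusCode));
        return;
      }
      const chunks = [];
      res.on('data', function (chunk) { chunks.push(chunk); });
      res.on('end', function () { resolve(Buffer.concat(chunks)); });
    }).on('error', reject);
  });
}

// Usage (not executed here):
// fetchBuffer('https://www.qqtn.com/').then(function (buf) {
//   console.log(buf.length + ' bytes received');
// });
```

superagent wraps this kind of plumbing behind a chainable API, which is why the rest of this post uses it.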
superagent-charset adds a .charset() method to superagent, so that responses in encodings other than UTF-8 are decoded correctly instead of coming back garbled.
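To see why the charset matters, here is a small sketch that decodes the same bytes two ways. It uses node's global TextDecoder rather than superagent-charset, purely for illustration, and it assumes a node build with full ICU support (otherwise the gb2312 label is not available). The sample bytes are the GB2312 encoding of "你好".

```javascript
// The string "你好" encoded as GB2312, as a GB2312 page would send it
const gb2312Bytes = Buffer.from([0xc4, 0xe3, 0xba, 0xc3]);

// Decoding with the wrong charset yields replacement characters
const garbled = gb2312Bytes.toString('utf8');

// Decoding with the charset the page actually uses recovers the text
const decoded = new TextDecoder('gb2312').decode(gb2312Bytes);

console.log(garbled === decoded); // false
console.log(decoded); // 你好
```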
cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. After installing the dependencies, you can import them:
var superagent = require('superagent');
var charset = require('superagent-charset');
charset(superagent);
const cheerio = require('cheerio');
After importing them, request the target address, https://www.qqtn.com/tx/weixi... , as shown in the figure:
![img](https://p1-jj.byteimg.com/tos...)
Declare address variable:
const baseUrl = 'https://www.qqtn.com/'
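The listing pages under this address follow the pattern tx/<type>tx_<page>.html, which is how the complete code builds its route from the type and page query parameters. Because those values come from the caller, it can be worth validating them before putting them into a URL. The following is a minimal sketch of that idea; the function name buildPageUrl and the rule that a type consists of letters only are assumptions made for illustration, not something the site documents.

```javascript
const baseUrl = 'https://www.qqtn.com/';

// Hypothetical helper: build the listing URL for an avatar type and page.
// Anything that is not a plain word or a positive integer falls back to
// the defaults, so query values cannot alter the path.
function buildPageUrl(type, page) {
  // Assumption: a valid type consists of letters only
  const safeType = /^[a-z]+$/i.test(String(type || '')) ? String(type) : 'weixin';
  const pageNumber = Number.parseInt(page, 10);
  const safePage = Number.isInteger(pageNumber) && pageNumber > 0 ? pageNumber : 1;
  return new URL(`tx/${safeType}tx_${safePage}.html`, baseUrl).href;
}

console.log(buildPageUrl('weixin', '2')); // https://www.qqtn.com/tx/weixintx_2.html
```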
With these settings in place, the request can be sent. Here is the complete code of demo.js:
var superagent = require('superagent');
var charset = require('superagent-charset');
charset(superagent); // adds the .charset() method to superagent requests
var express = require('express');
const cheerio = require('cheerio');

var baseUrl = 'https://www.qqtn.com/'; // You can enter any web address
var app = express();

app.get('/index', function(req, res) {
  // Set the CORS response headers so that other origins can call this route
  res.header('Access-Control-Allow-Origin', '*');
  res.header('Access-Control-Allow-Methods', 'PUT, GET, POST, DELETE, OPTIONS');
  res.header('Access-Control-Allow-Headers', 'X-Requested-With, Content-Type');

  // Avatar type and page number from the query string, with defaults
  var type = req.query.type || 'weixin';
  var page = req.query.page || '1';
  var route = `tx/${type}tx_${page}.html`;

  // This site's pages are encoded as GB2312, so use .charset('gb2312').
  // Most web pages are UTF-8; for those, use .charset('utf-8').
  superagent.get(baseUrl + route)
    .charset('gb2312')
    .end(function(err, sres) {
      var items = [];
      if (err) {
        console.log('ERR: ' + err);
        res.json({ code: 400, msg: err.message, data: items });
        return;
      }
      var $ = cheerio.load(sres.text);
      // Each matched link wraps one thumbnail image
      $('div.g-main-bg ul.g-gxlist-imgbox li a').each(function(idx, element) {
        var $element = $(element);
        var $subElement = $element.find('img');
        items.push({
          title: $element.attr('title'),
          href: $element.attr('href'),
          thumbSrc: $subElement.attr('src')
        });
      });
      res.json({ code: 200, msg: '', data: items });
    });
});

var server = app.listen(8081, function() {
  var host = server.address().address;
  var port = server.address().port;
  console.log('Application instance, access address: http://%s:%s', host, port);
});
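The href and thumbSrc values are taken straight from the page markup, so they may be relative or protocol-relative paths that a client such as a mini program cannot load directly. The following is a minimal sketch of resolving them against the site root before returning them. The helper names toAbsoluteUrl and normalizeItem and the sample item are hypothetical and only illustrate the idea.

```javascript
const baseUrl = 'https://www.qqtn.com/';

// Resolve a possibly relative link against the site root.
// Returns null for missing or unparsable values instead of throwing.
function toAbsoluteUrl(link) {
  if (!link) return null;
  try {
    return new URL(link, baseUrl).href;
  } catch (err) {
    return null;
  }
}

// Hypothetical post-processing step for one scraped item
function normalizeItem(item) {
  return {
    title: (item.title || '').trim(),
    href: toAbsoluteUrl(item.href),
    thumbSrc: toAbsoluteUrl(item.thumbSrc)
  };
}

// Made-up sample item, shaped like the objects pushed into items above
const sample = { title: ' Avatar ', href: '/tx/1.html', thumbSrc: '//pic.example.com/a.jpg' };
console.log(normalizeItem(sample));
```

Calling items.map(normalizeItem) just before res.json would apply this to every entry.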
Run node demo.js and request /index on port 8081 to get back the crawled data as json, as shown in the figure:
That completes a simple node crawler. If you found it useful, I hope you will give the project a star as recognition and support. Thank you.