On Writing: Using js to Count Article Views Across Major Blogs

Tallying article statistics by hand every day had become unmanageable, so I gradually started bringing in tools to do the counting.

In the previous article, I explored using the csv file format for article statistics. I thought it would last me a while, but I gave it up after just one day.

The reason comes down to laziness: I had to copy the article data, organize it into a specific csv format, and finally run the Java utility classes I had written to produce the statistics.

That makes three steps: copying the article content, organizing it into the required format, and writing the csv utility class.

So, can it be simpler? Being terminally lazy, I had to keep looking for a new solution.

For how to process the statistics with csv files, see https://snowdreams1006.github.io/static-semi-manual-with-csv.html

Results

imooc Notes

imooc Notes: https://www.imooc.com/u/52244...

Jianshu

Jianshu: https://www.jianshu.com/u/577...

Blog Garden

Blog Garden: https://www.cnblogs.com/snowd...

Tencent Cloud Community

Tencent Cloud Community: https://cloud.tencent.com/dev...

Scraping and analyzing the data with js

The following uses the Chrome browser as an example to show how to capture key data with the built-in developer console. This article assumes some basic jQuery knowledge.
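The snippets in this article assume that the page itself has loaded jQuery. In Chrome's console, when a page does not define $, the name $ is a built-in shortcut for document.querySelector, so calls such as .text() would fail. A minimal sketch of a check, assuming a hypothetical helper name describeJQuery and a made-up version string in the stand-in object:

```javascript
// Report whether a window-like object exposes jQuery, and which version.
// `describeJQuery` is a hypothetical helper name used only for this sketch.
function describeJQuery(win) {
  if (typeof win.jQuery === "function" && win.jQuery.fn && win.jQuery.fn.jquery) {
    return "jQuery " + win.jQuery.fn.jquery + " is loaded";
  }
  return "jQuery is not loaded on this page";
}

// In the browser console, pass the real window object:
// console.log(describeJQuery(window));

// Outside the browser, stand-in objects show both outcomes.
// The version string "3.4.1" is made up for illustration.
var standIn = function () {};
standIn.fn = { jquery: "3.4.1" };
console.log(describeJQuery({}));                  // jQuery is not loaded on this page
console.log(describeJQuery({ jQuery: standIn })); // jQuery 3.4.1 is loaded
```

If the check reports that jQuery is missing, the selectors still apply, but the jQuery calls have to be replaced with standard DOM calls.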

imooc Notes

Right-click the target page and choose Inspect to open the developer tools, click the arrow icon at the far left of the toolbar, and then click the key data on the page, such as the view count.

The developer tools then switch to the Elements tab with the matching node highlighted. Right-click that node, choose Copy, and then Copy selector. The copied selector points at the view-count node.

Switch to the Console tab and turn the copied selector into a jQuery call, that is, $("copied selector").text(). Run it in the console to see whether it returns the view count:

$("#articlesList > div:nth-child(1) > div.item-btm.clearfix > div > div:nth-child(1) > em").text();

We have now located one specific element, but we want the view counts of all articles, so we need a selector that matches all of them.

A quick look at the article markup alongside the selector shows that views, recommendations and comments share almost the same structure; the only difference is their order. To target the view count we select the first element; the recommendation count is the second, and so on.

<div class="r right-info">
    <div class="favorite l">
        <i class="icon sns-thumb-up-outline"></i><em> 83 browse</em>
    </div>
    <div class="favorite l">
        <i class="icon sns-thumb-up-outline"></i><em> 1 Recommend</em>
    </div>
    <div class=" l">
        <i class="icon sns-comment"></i><em> 0 comment</em>
    </div>    
</div>

With the basic document structure clear, we modify the selector so that it matches the view counts of all articles:

$("#articlesList div:nth-child(1) > em").text();

Just keep the head and the tail and remove the middle part, "> div:nth-child(1) > div.item-btm.clearfix > div >", and the selector matches the view counts of all articles. Simple, isn't it?
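The "keep the head and the tail" rule can also be written down as a small string operation. This is only a sketch: generalizeSelector is a hypothetical helper, and how many trailing segments to keep (two here) is an assumption that depends on the page.

```javascript
// Hypothetical helper: keep the first segment of a copied selector and its
// last `tailLength` segments, joined by a descendant combinator (a space).
function generalizeSelector(copiedSelector, tailLength) {
  var parts = copiedSelector.split(">").map(function (part) {
    return part.trim();
  });
  if (parts.length <= tailLength + 1) {
    return parts.join(" > "); // nothing in the middle to remove
  }
  var head = parts[0];
  var tail = parts.slice(parts.length - tailLength).join(" > ");
  return head + " " + tail;
}

var copied = "#articlesList > div:nth-child(1) > div.item-btm.clearfix > div > div:nth-child(1) > em";
console.log(generalizeSelector(copied, 2));
// #articlesList div:nth-child(1) > em
```

A looser selector can match more than intended, so it is worth checking the number of matched elements against the number of articles on the page.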

The console output was reassuring: it is exactly the view counts of all articles on the first page. Looking at the format of the output, we need to split the whole string into an array of strings, one entry per article. Each entry contains a space between the number and the word "browse", so splitting on every space would separate the two; instead we split only at the whitespace that comes right before a number.

Note that there is also a space at the very beginning of the string, so we remove it before splitting.

// Before removing spaces: " 83 browse 91 browse 114 browse 150 browse 129 browse 175 browse 222 browse 173 browse 225 browse 201 browse 217 browse 202 browse 229 browse 184 browse 155 browse 153 browse 211 browse"

$("#articlesList div:nth-child(1) > em").text().trim();

// After removing spaces: "83 browse 91 browse 114 browse 150 browse 129 browse 175 browse 222 browse 173 browse 225 browse 201 browse 217 browse 202 browse 229 browse 184 browse 155 browse 153 browse 211 browse"

Now let's split the whole string into an array of strings.

// Before splitting: "83 browse 91 browse 114 browse 150 browse 129 browse 175 browse 222 browse 173 browse 225 browse 201 browse 217 browse 202 browse 229 browse 184 browse 155 browse 153 browse 211 browse"

$("#articlesList div:nth-child(1) > em").text().trim().split(/\s+(?=\d)/);

// After splitting: ["83 browse", "91 browse", "114 browse", "150 browse", "129 browse", "175 browse", "222 browse", "173 browse", "225 browse", "201 browse", "217 browse", "202 browse", "229 browse", "184 browse", "155 browse", "153 browse", "211 browse"]

The whole string is now split into one small string per article. Next, we remove the "browse" from "83 browse", leaving only the number 83.

$.each($("#articlesList div:nth-child(1) > em").text().trim().split(/\s+(?=\d)/),function(idx,ele){
     console.log(ele.substr(0,ele.lastIndexOf("browse")).trim());
});

Now that we have the actual view counts, the rest is relatively simple: we just add them up. Note that each count is still a string at this point, so it must be converted to a number before it can be summed.

//Reading volume
var readCount = 0;
$.each($("#articlesList div:nth-child(1) > em").text().trim().split(/\s+(?=\d)/),function(idx,ele){
     readCount += parseInt(ele.substr(0,ele.lastIndexOf("browse")), 10);
});
console.log("Reading volume: " + readCount);
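Why the conversion matters can be seen without a browser. The sketch below uses three sample strings taken from the output above; the variable names are only illustrative.

```javascript
// Sample count strings, as they look after the label has been removed.
var counts = ["83", "91", "114"];

// Without conversion, += with a string operand concatenates text.
var wrong = 0;
counts.forEach(function (count) {
  wrong += count;
});
console.log(wrong); // 08391114

// With parseInt (radix 10), the values are added as numbers.
var total = 0;
counts.forEach(function (count) {
  total += parseInt(count, 10);
});
console.log(total); // 288
```

parseInt returns NaN for a string that does not start with a number, and a single NaN turns the whole sum into NaN, so an unexpected total is usually a sign that the label or the split did not match the page text.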

Summary

Using the Chrome browser as an example, we showed how to capture key data with the console: start from an analysis of the page structure, extract the valid data step by step, and finally go from a single value to many so that the data can be summed.

On the whole it is fairly simple and needs little background knowledge, but here is a short summary of the jQuery and JavaScript knowledge points involved:

  • Locate specific elements: $("copied selector")
  • Get the text content of the elements: $("copied selector").text()
  • Remove leading and trailing spaces: $("copied selector").text().trim()
  • Split the string into an array at the whitespace before each number: $("copied selector").text().trim().split(/\s+(?=\d)/)
  • Extract the part of the string before the label: ele.substr(0,ele.lastIndexOf("browse"))
  • Convert a string to a number: parseInt(ele.substr(0,ele.lastIndexOf("browse")), 10)
  • Accumulate into a variable: readCount += parseInt(ele.substr(0,ele.lastIndexOf("browse")), 10);

Complete examples:

//Reading volume
var readCount = 0;
$.each($("#articlesList div:nth-child(1) > em").text().trim().split(/\s+(?=\d)/),function(idx,ele){
     readCount += parseInt(ele.substr(0,ele.lastIndexOf("browse")), 10);
});
console.log("Reading volume: " + readCount);

//Recommended quantity
var recommendCount = 0;
$.each($("#articlesList div:nth-child(2) > em").text().trim().split(/\s+(?=\d)/),function(idx,ele){
     recommendCount += parseInt(ele.substr(0,ele.lastIndexOf("Recommend")), 10);
});
console.log("Recommended quantity: " + recommendCount);

//Comment quantity
var commentCount = 0;
$.each($("#articlesList div:nth-child(3) > em").text().trim().split(/\s+(?=\d)/),function(idx,ele){
     commentCount += parseInt(ele.substr(0,ele.lastIndexOf("comment")), 10);
});
console.log("Comment quantity: " + commentCount);
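The three blocks differ only in the selector and the label, so the string handling could be pulled into one function. A minimal sketch, assuming text in the form shown earlier (a number followed by a label, entries separated by whitespace); sumLabeled is a hypothetical helper, not part of jQuery.

```javascript
// Hypothetical helper: sum the numbers in text such as " 83 browse 91 browse".
function sumLabeled(text, label) {
  var total = 0;
  // Split only at whitespace that is followed by a digit, one entry per article.
  text.trim().split(/\s+(?=\d)/).forEach(function (entry) {
    var end = entry.lastIndexOf(label);
    // If the label is missing, fall back to parsing the whole entry.
    var digits = end === -1 ? entry : entry.substring(0, end);
    var value = parseInt(digits, 10);
    if (!isNaN(value)) {
      total += value; // entries without a leading number are skipped
    }
  });
  return total;
}

console.log(sumLabeled(" 83 browse 91 browse 114 browse", "browse")); // 288
console.log(sumLabeled(" 1 Recommend 0 Recommend", "Recommend"));     // 1

// In the browser console it would be called with the jQuery text, for example:
// sumLabeled($("#articlesList div:nth-child(1) > em").text(), "browse");
```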

Jianshu

In Jianshu's article list, some published articles have not earned any diamonds yet, so the position of the view count is not fixed, unlike imooc Notes above. However, every key figure on Jianshu is preceded by a small icon, so we can use the icon to locate the figure next to it.

Following the steps described above, we again locate the view count, but the copied selector "#note-44847909 > div > div > a:nth-child(2) > i" cannot be used directly: as just noted, Jianshu cannot be addressed by position, only with the icon as an anchor.

So first look at the document structure and try to select the small view icons directly.

After analyzing the structure, we can easily select all the view icons. Note that the result is a collection of elements, not an array of strings.

$("#list-container .ic-list-read")

Now that we can select every view icon, the next question is how to get from each icon to the actual view count beside it.

<div class="meta">
    <span class="jsd-meta">
      <i class="iconfont ic-paid1"></i> 0.2
    </span>
    <a target="_blank" href="/p/3441940065b5">
        <i class="iconfont ic-list-read"></i> 2
    </a>        
    <a target="_blank" href="/p/3441940065b5#comments">
      <i class="iconfont ic-list-comments"></i> 0
    </a>      
    <span><i class="iconfont ic-list-like"></i> 1</span>
    <span class="time" data-shared-at="2019-04-12T10:39:57+08:00">Yesterday 10:39</span>
</div>

Analyzing the markup, we find that the view count is part of the text content of the icon's parent node. So once we reach the parent node, we have the view count.

$("#list-container .ic-list-read").each(function(idx,ele){
    console.log($(ele).parent().text().trim());
});

Now that we can locate the view counts, summing them is straightforward.

//Reading volume
var readCount = 0;
$("#list-container .ic-list-read").each(function(idx,ele){
    readCount += parseInt($(ele).parent().text().trim());
});
console.log("Reading volume: " + readCount);
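The same icon-then-parent walk can be expressed with the standard DOM API, which may help on pages where jQuery is not available. This is a sketch rather than the method used in this article: sumParentText is a hypothetical helper, and it relies only on querySelectorAll, parentNode and textContent.

```javascript
// Hypothetical helper: for every icon matched by `iconSelector` under `root`,
// read the text of the icon's parent node and add it up.
function sumParentText(root, iconSelector) {
  var total = 0;
  var icons = root.querySelectorAll(iconSelector);
  Array.prototype.forEach.call(icons, function (icon) {
    var value = parseInt(icon.parentNode.textContent.trim(), 10);
    if (!isNaN(value)) {
      total += value; // skip parents whose text is not a number
    }
  });
  return total;
}

// In the browser console:
// console.log("Reading volume: " + sumParentText(document, "#list-container .ic-list-read"));
```

Passing the root as a parameter keeps the function independent of the page, so it can be tried against any object that offers querySelectorAll.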

Summary

First we analyzed the basic structure of the article list and found that Jianshu's view count has to be reached through the small view icon and then its parent node: the text content of the parent node is the actual view count.

Once the view count is located, everything else falls into place. The new jQuery knowledge point involved:

  • Locate the parent of the current node: $(ele).parent()

Complete examples:

//Reading volume
var readCount = 0;
$("#list-container .ic-list-read").each(function(idx,ele){
    readCount += parseInt($(ele).parent().text().trim());
});
console.log("Reading volume: " + readCount);

//Comment quantity
var commentCount = 0;
$("#list-container .ic-list-comments").each(function(idx,ele){
    commentCount += parseInt($(ele).parent().text().trim());
});
console.log("Comment quantity: " + commentCount);

//Like quantity
var recommendCount = 0;
$("#list-container .ic-list-like").each(function(idx,ele){
    recommendCount += parseInt($(ele).parent().text().trim());
});
console.log("Like quantity: " + recommendCount);

Blog Garden

Blog Garden's article list is rather retro: it uses a traditional table layout, the simplest of these platforms, so it hardly needs any introduction.

The copied selector for the view count is "#post-row-10694598 > td:nth-child(4)". Combined with the structure of the page, this gives us the view counts of all articles:

$("#post_list td:nth-child(4)")

Next, iterate over the result to check that every article on the current page is captured.

$("#post_list td:nth-child(4)").each(function(idx,ele){
    console.log($(ele).text().trim());
});

With the view counts captured, we now add up the view counts of all articles on the current page.

//Reading number
var readCount = 0;
$("#post_list td:nth-child(4)").each(function(idx,ele){
    readCount += parseInt($(ele).text().trim());
});
console.log("Reading number: " + readCount);

Summary

With a regular, traditional table layout, the cells only need to be located by position. Note that Blog Garden's article list is paginated: to count the views of all articles, you need to run the script on each page and add up the per-page totals manually.
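Adding up the per-page totals can itself be left to the console. A minimal sketch, assuming the total printed on each page has been noted down by hand; sumPages is a hypothetical helper and the numbers are made up for illustration.

```javascript
// Hypothetical helper: add up the totals collected from each page.
function sumPages(pageTotals) {
  return pageTotals.reduce(function (sum, pageTotal) {
    return sum + pageTotal;
  }, 0);
}

// Made-up per-page totals, one entry per page of the article list.
var pageTotals = [1530, 987, 412];
console.log("Reading number (all pages): " + sumPages(pageTotals)); // 2929
```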

Complete examples:

//Comment number
var commentCount = 0;
$("#post_list td:nth-child(3)").each(function(idx,ele){
    commentCount += parseInt($(ele).text().trim());
});
console.log("Comment number: " + commentCount);

//Reading number
var readCount = 0;
$("#post_list td:nth-child(4)").each(function(idx,ele){
    readCount += parseInt($(ele).text().trim());
});
console.log("Reading number: " + readCount);

Tencent Cloud Community

The article structure of Tencent Cloud Community is basically the same as Jianshu's. It can use icon-based positioning like Jianshu, or direct positional selection like imooc Notes and Blog Garden.

To locate the data more precisely, we use icon positioning here to get the view count.

#react-root > div:nth-child(1) > div.J-body.com-body.with-bg > section > div > section > div > div.com-log-list > section:nth-child(1) > section > div > div > span > span

Since we are positioning by icon, we need to analyze the relationship between the icon and the view count.

<div class="com-operations com-article-panel-operations">
    <span class="com-opt-link">
        <i class="com-i-view"></i>
        <span class="text">76</span>
    </span>
    <a href="javascript:;" class="com-opt-link">
        <i class="com-i-like"></i>
        <span class="text">3</span>
    </a>
</div>

The view count sits in the element immediately after the icon, so we adjust the code as follows to reach it.

$("#react-root .com-i-view").each(function(idx,ele){
    console.log($(ele).next().text().trim());
});

With the view count located, the next step is a simple accumulation.

//Reading volume
var readCount = 0;
$("#react-root .com-i-view").each(function(idx,ele){
    readCount += parseInt($(ele).next().text().trim());
});
console.log("Reading volume: " + readCount);
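For pages where jQuery is not exposed, the icon-then-next-element walk can be sketched with the standard DOM property nextElementSibling. The helper name tallyNextSibling and the shape of its argument are assumptions made for this example; it collects several counters in one pass.

```javascript
// Hypothetical helper: for each named icon selector, sum the number found in
// the element that directly follows each matched icon.
function tallyNextSibling(root, iconSelectors) {
  var totals = {};
  Object.keys(iconSelectors).forEach(function (name) {
    var total = 0;
    var icons = root.querySelectorAll(iconSelectors[name]);
    Array.prototype.forEach.call(icons, function (icon) {
      var sibling = icon.nextElementSibling;
      var value = sibling ? parseInt(sibling.textContent.trim(), 10) : NaN;
      if (!isNaN(value)) {
        total += value; // icons without a numeric sibling are skipped
      }
    });
    totals[name] = total;
  });
  return totals;
}

// In the browser console:
// tallyNextSibling(document, {
//   views: "#react-root .com-i-view",
//   likes: "#react-root .com-i-like"
// });
```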

Summary

Tencent Cloud Community, like Jianshu, appends each new page of articles to the list, so to count all articles just keep scrolling until every article is loaded.

The new jQuery knowledge point involved:

  • Get the next node of the current node: $(ele).next()

Complete examples:

//Reading volume
var readCount = 0;
$("#react-root .com-i-view").each(function(idx,ele){
    readCount += parseInt($(ele).next().text().trim());
});
console.log("Reading volume: " + readCount);

//Like quantity
var recommendCount = 0;
$("#react-root .com-i-like").each(function(idx,ele){
    recommendCount += parseInt($(ele).next().text().trim());
});
console.log("Like quantity: " + recommendCount);

Summary

This article scrapes the article data directly with jQuery, an approach that is simple, convenient, low-cost and quick to pick up.

The article lists on imooc Notes and Blog Garden are paginated. To count the total views of all articles, you need to add up the articles on each page until the last page.
Although the article lists on Jianshu and Tencent Cloud Community are also paged, each new page is appended to the list automatically, so to count all articles you only need to wait until every article is loaded and then run the js script once.

Okay, that is the end of this post. If you found this article helpful, you are welcome to share it so more people can see it. By the way, the previous article also solved the statistics problem, but it used Java to read csv files. If you are interested, have a look.

Keywords: Javascript JQuery React Java github

Added by Colleen78 on Sat, 18 May 2019 15:08:06 +0300