Implementing automatic comments on CSDN blog posts


Let's use Java code to crawl the CSDN blog site and then post comments automatically. It's a pretty slick trick. On to the code.

The first step is the login code, plenty of which can be found online. The code uses the Jsoup dependency to parse the HTML and pick out the elements we need; its selectors work like CSS selectors, and it is a very powerful third-party component.

/**
  * Log in to CSDN. Commenting, of course, requires being logged in.
  * 
  * @throws Exception
  */
 public static void loginCsdnPager() throws Exception {
   String html = HttpUtils.sendGet("https://passport.csdn.net/account/login?ref=toolbar");

   try {
     // Brief pause before parsing the returned page
     Thread.sleep(3000);
   } catch (InterruptedException e) {
     Thread.currentThread().interrupt();
     e.printStackTrace();
   }
   Document doc = Jsoup.parse(html);

   Element form = doc.select(".user-pass").get(0);
   String lt = form.select("input[name=lt]").get(0).val();
   String execution = form.select("input[name=execution]").get(0).val();
   String _eventId = form.select("input[name=_eventId]").get(0).val();

   List<NameValuePair> nvps = new ArrayList<NameValuePair>();
   nvps.add(new BasicNameValuePair("username", CSDNACCOUNT));
   nvps.add(new BasicNameValuePair("password", CSDNPASSWORD));
   nvps.add(new BasicNameValuePair("lt", lt));
   nvps.add(new BasicNameValuePair("execution", execution));
   nvps.add(new BasicNameValuePair("_eventId", _eventId));

   System.out.println(nvps);
   // Send the login request to the CSDN server. HttpUtils is a thin wrapper that returns the response body directly (a sketch follows this method).
   String ret = HttpUtils.sendPost("https://passport.csdn.net/account/login", nvps);

   System.out.println("ret is " + ret);
   // ret will contain one of the following markers, which we can check.
   if (ret.indexOf("redirect_back") > -1) {
     System.out.println("Login successful...");
   } else if (ret.indexOf("Sign in too often") > -1) {
     throw new Exception("Login too frequent, please try again later...");
   } else {
     throw new Exception("Login failed...");
   }
 }
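
The post never shows HttpUtils itself. Below is a minimal sketch of what such a helper might look like, assuming Apache HttpClient 4.x; the important detail is that one client and one cookie store are shared, so the session cookie obtained by the login call is automatically sent with the later comment requests. Only the method names match the calls above, everything else is an assumption.

import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpUtils {

  // One shared client + cookie store so the session from login is reused by later requests
  private static final BasicCookieStore COOKIES = new BasicCookieStore();
  private static final CloseableHttpClient CLIENT =
      HttpClients.custom().setDefaultCookieStore(COOKIES).build();

  /** GET a page and return its body as a string. */
  public static String sendGet(String url) throws Exception {
    HttpGet get = new HttpGet(url);
    try (CloseableHttpResponse resp = CLIENT.execute(get)) {
      return EntityUtils.toString(resp.getEntity(), StandardCharsets.UTF_8);
    }
  }

  /** POST a form body and return the response as a string. */
  public static String sendPost(String url, List<NameValuePair> nvps) throws Exception {
    HttpPost post = new HttpPost(url);
    post.setEntity(new UrlEncodedFormEntity(nvps, StandardCharsets.UTF_8));
    try (CloseableHttpResponse resp = CLIENT.execute(post)) {
      return EntityUtils.toString(resp.getEntity(), StandardCharsets.UTF_8);
    }
  }
}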

With the login code in place, we need to get the list of blog articles, which is the source for our crawler. We start from the blog homepage and crawl outward to the other pages:

https://blog.csdn.net

Think of the crawler as a spider: it crawls along the web from node A to node B until it reaches its destination.

First, open the homepage and grab the URLs of the category list in the homepage navigation; opening those URLs gives the articles under each category. Here we only take the article list of the first page under each category (we could also simulate the scroll-down paging to fetch more articles). A constant named FETCHPAGES holds the categories we want to crawl; a sketch of these constants follows.
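
CSDNACCOUNT, CSDNPASSWORD and FETCHPAGES are used in the snippets but never declared in the post. A minimal sketch of how they might be defined, assuming java.util.Arrays and java.util.List are imported; all values are placeholders:

// Hypothetical declarations; replace the placeholder values with your own.
private static final String CSDNACCOUNT = "your-csdn-username";
private static final String CSDNPASSWORD = "your-csdn-password";

// Homepage category tabs whose article lists we want to crawl (text must match the tab text exactly)
private static final List<String> FETCHPAGES = Arrays.asList("java", "python", "web前端");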


   String html = HttpUtils.sendGet("https://blog.csdn.net/");

   Document doc = Jsoup.parse(html);
   Elements as = doc.select(".nav_com").select("li").select("a");

   // Collect article a Tags
   List<Elements> blogList = Lists.newArrayListWithCapacity(as.size());
   for (Element a : as) {

     if (!FETCHPAGES.contains(a.text())) {
       continue;
     }

      String fetchUrl = "https://blog.csdn.net" + a.attr("href");
      System.out.println(fetchUrl);
      String blogHtml = HttpUtils.sendGet(fetchUrl);

     Document blogDoc = Jsoup.parse(blogHtml);

     Elements blogAs = blogDoc.select(".title").select("h2").select("a");

     System.out.println(blogAs);
     blogList.add(blogAs);
   }

After collecting the article list, we log in (if we log in before collecting, the list comes back incomplete; the exact reason is unknown). Logging in here is only needed for the commenting that follows.

   // Log in only after collecting the a tags; logging in earlier loses many of them. The exact reason is unknown.
   loginCsdnPager();

   BufferedOutputStream bos = null;
   // Comment success counter
   int count = 0;
   try {
      // Write the URLs of successfully commented articles to a file
      File file = new File("D:/tmp/successLog/success.log");
      // Make sure the parent directory exists before opening the stream
      file.getParentFile().mkdirs();
      bos = new BufferedOutputStream(new FileOutputStream(file));
     // Crawl all a Tags
     for (Elements blogs : blogList) {

       for (Element blog : blogs) {

         // Get the article url
         String href = blog.attr("href");

         // Get the ID after the article url, which needs to be used when commenting
         String commitSuffixUrl = href.substring(href.lastIndexOf("/") + 1);

         // Open article
         String blogHtml = HttpUtils.sendGet(href);
         System.out.println(blog.text() + "------------" + blog.attr("href"));

         Document blogDoc = Jsoup.parse(blogHtml);
         Elements titleAs = blogDoc.select(".title-box").select("a");

         System.out.println(titleAs);

         if (titleAs != null && !titleAs.isEmpty()) {
           // Comment request url prefix
           String commitPrefixUrl = titleAs.get(0).attr("href");
           //
           System.out.println(titleAs.text() + "-----------" + commitPrefixUrl);

           // Splicing comment request url
           String commitUrl = commitPrefixUrl + "/phoenix/comment/submit?id=" + commitSuffixUrl;

           System.out.println("commitUrl ==" + commitUrl);

           // Build the body required for comment request
           List<NameValuePair> nvps = new ArrayList<NameValuePair>();
           nvps.add(new BasicNameValuePair("replyId", ""));
           nvps.add(new BasicNameValuePair("content",
               "plus Wei letter ammlysouw Free collection java,python,Front end, Android, database, big data, IOS Etc."));

           // Comment on
           String postRequest = HttpUtils.sendPost(commitUrl, nvps);
           JSONObject jsonObj = JSONObject.parseObject(postRequest);

           System.out.println(postRequest);

           // Comment result, success is 1
           if (jsonObj.getInteger("result") == 1) {

              String articleUrl = commitPrefixUrl + "/article/details/" + commitSuffixUrl + "\n";
              System.out.println("success articleUrl is " + articleUrl);
              // Record the URL of the successfully commented article to the file
              bos.write(articleUrl.getBytes());
             bos.flush();
             count++;
           } else {
              // A failure means the server thinks we are commenting too fast, so sleep for two minutes. Articles whose comment failed are simply skipped here.
              try {
                Thread.sleep(2 * 60 * 1000);
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                e.printStackTrace();
              }
           }
         } else {
           continue;
         }
       }
     }
   } catch (IOException e) {
     System.out.println("error is " + e);
   } finally {

     if (bos != null) {
       try {
          // Record the total number of successful comments
          bos.write((count + "\n").getBytes());
          bos.flush();
          System.out.println("bos will close");
         bos.close();
       } catch (IOException e) {
         System.out.println("error is " + e);
       }
     }
   }

After logging in, we parse the collected article URLs, open each article, splice together the comment request URL and the request parameters, and fire the POST request. After about three comments in a row the site starts rate-limiting, signalling that we are commenting too fast, so the thread sleeps for two minutes before continuing. Finally, the URLs and the count of successful comments are written to a local file for easy review.
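
As noted in the code comments, an article whose comment request fails is simply skipped after the sleep. If we would rather not lose those articles, one option is a small retry wrapper around the POST call. This is only a sketch under the same assumptions as above (the HttpUtils helper and fastjson's JSONObject); commentWithRetry is a hypothetical helper, not part of the original post.

// Hypothetical retry helper: attempt the comment several times, backing off between
// attempts, instead of dropping an article whose first attempt fails.
private static boolean commentWithRetry(String commitUrl, List<NameValuePair> nvps, int maxAttempts)
    throws Exception {
  for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    String resp = HttpUtils.sendPost(commitUrl, nvps);
    JSONObject jsonObj = JSONObject.parseObject(resp);
    // The comment endpoint returns result == 1 on success (same check as in the loop above)
    Integer result = jsonObj.getInteger("result");
    if (result != null && result == 1) {
      return true;
    }
    // Back off before retrying; the server is signalling that we comment too fast
    Thread.sleep(2 * 60 * 1000L);
  }
  return false;
}

The inner loop could then call commentWithRetry(commitUrl, nvps, 3) instead of the single sendPost, and only skip an article after all attempts fail.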

