1. Zhixuan collection novel crawling and analysis software based on Java
Abstract
With the growth of the Internet, the number of electronic novels keeps increasing, and quickly extracting useful information from so many novels has become a real problem. Java is one of the most popular languages at present, which makes it a natural choice for building a web crawler. This Java web crawler mainly uses HttpClient, jsoup, a MySQL database and Java Swing to crawl the Zhixuan book collection novel website. Finally, the system analyzes the crawled data, stores it in MySQL, and provides the user with interfaces for weighted analysis and direct download. The system can list the latest novels, the novels with the most total hits and the novels with the most monthly hits. Users can add novels to a collection, and a strong filtering function helps them find the novels they want faster and more accurately.
Keywords: web crawler, Java, graphical interface, database, book
1.1 requirements overview
1.1.1 problem description
1) The following functions are implemented in the Java language:
Crawl and analyze online novels. The software can crawl every novel on the website and obtain its related information, such as the novel name, the author name and the various evaluation numbers. Users can filter according to their preferences, for example by weighted ranking of the evaluation numbers or by novel type. The system can also give a comprehensive recommendation degree based on each novel's real-time evaluation numbers for users' reference. Users can also get the latest novels, the novels with the most total hits and the novels with the most monthly hits, and directly download their favorite novels.
(1) The program has a login registration window.
(2) The program can crawl all novels on a novel website.
(3) Users can filter novels according to their preferences.
(4) Users can also pick out better novels through a weighted ranking of the novels' evaluation numbers.
(5) The program gives a comprehensive recommendation degree based on each novel's real-time evaluation numbers for users' reference.
(6) The program can get the latest novels, the novels with the most total hits and the novels with the most monthly hits.
(7) Users can add novels to their collection and update a novel's evaluation.
(8) Users can download their favorite novels directly on the software.
2) Write a course design report. The report should include the requirements analysis, overall design, detailed design, coding (with the programming steps written out in detail), test steps and contents, a course design summary, reference materials, etc.
1.1.2 subject requirements:
(1) Complete the application according to the software process of analysis, design, coding, debugging and testing.
(2) User information includes the user name, password and QQ number, which support login and password modification.
(3) The program includes login, registration and password retrieval windows. The main window consists of 9 sub-panels and a novel introduction window. The 9 sub-panels are 7 novel-type panels, an intelligent search panel and a user panel.
1.1.3 realization requirements:
(1) Users can log in, register and retrieve their passwords.
(2) The graphical interface realizes data input, output and analysis.
1.2 environment requirements
This course design has both hardware and software configuration requirements. The specific requirements are as follows:
(1) Hardware requirements: one computer.
(2) Software configuration: JDK 1.8 + IntelliJ IDEA.
1.3 knowledge points involved in this subject
- Use of classes
- Inheritance and polymorphism
- Collections and generics
- Graphical interface
- Event processing
- Multithreaded concurrent system
- HttpClient
- jsoup technology
- JDBC and database operation
Chapter II system design
2.1 functional module diagram
According to the requirements analysis, this course design is a novel crawling and analysis program built from classes and methods. Its functional modules are: login and registration, crawling all novels on the site, querying novels by type, analyzing the novels selected by the user, downloading the novels selected by the user, searching novels by name, filtering novels by user-defined weights, and exiting the system.
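The whole-site crawl module can be sketched as a fixed thread pool that fetches one novel page per task. This is a minimal sketch under stated assumptions, not the program's actual code: the names `ConcurrentCrawler`, `crawlAll` and `fetchPage` are hypothetical, novels are assumed to be addressable by consecutive numeric IDs, and the HTTP request and HTML parsing step is passed in as a function so that the sketch makes no claim about how HttpClient or jsoup is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical crawler skeleton: one pool task per novel ID.
final class ConcurrentCrawler {

    // Fetches IDs firstId..lastId (inclusive) on a fixed pool and returns the
    // successful results in ID order. fetchPage stands in for the real
    // "download page and extract novel info" step.
    static List<String> crawlAll(int firstId, int lastId, int threads,
                                 IntFunction<String> fetchPage) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int id = firstId; id <= lastId; id++) {
                final int novelId = id;
                Callable<String> task = () -> fetchPage.apply(novelId);
                pending.add(pool.submit(task));
            }
            List<String> pages = new ArrayList<>();
            for (Future<String> future : pending) {
                try {
                    pages.add(future.get());
                } catch (ExecutionException e) {
                    // One failed page should not abort the whole crawl; skip it.
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop collecting results.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return pages;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `ConcurrentCrawler.crawlAll(1, 100, 8, id -> download(id))` would crawl 100 pages on 8 threads, where `download` is whatever fetch-and-parse method the crawler provides. Collecting the futures in submission order keeps the results ordered by ID even though the pages finish in arbitrary order.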
Chapter III database design
3.1 establishment of database
Create the myfiction database.
/*
 Navicat MySQL Data Transfer

 Source Server         : admin
 Source Server Type    : MySQL
 Source Server Version : 80023
 Source Host           : localhost:3306
 Source Schema         : myfiction

 Target Server Type    : MySQL
 Target Server Version : 80023
 File Encoding         : 65001

 Date: 29/10/2021 23:34:27
*/

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for fiction
-- ----------------------------
DROP TABLE IF EXISTS `fiction`;
CREATE TABLE `fiction` (
  `name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `writer` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
  `class1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
  `class2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
  `xc` int NULL DEFAULT NULL,
  `lc` int NULL DEFAULT NULL,
  `gc` int NULL DEFAULT NULL,
  `kc` int NULL DEFAULT NULL,
  `dc` int NULL DEFAULT NULL,
  `id` int NOT NULL,
  `brief` varchar(2550) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
  `size` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
  PRIMARY KEY (`name`, `id`) USING BTREE,
  INDEX `id`(`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = DYNAMIC;

-- ----------------------------
-- Table structure for user
-- ----------------------------
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `username` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `password` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
  `qq` int NULL DEFAULT NULL,
  `address` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT NULL,
  PRIMARY KEY (`username`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = DYNAMIC;

-- ----------------------------
-- Table structure for user_fiction
-- ----------------------------
DROP TABLE IF EXISTS `user_fiction`;
CREATE TABLE `user_fiction` (
  `FCID` int NOT NULL,
  `UID` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  PRIMARY KEY (`FCID`, `UID`) USING BTREE,
  INDEX `FK_Reference_10`(`UID`) USING BTREE,
  CONSTRAINT `FK_Reference_10` FOREIGN KEY (`UID`) REFERENCES `user` (`username`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `FK_Reference_9` FOREIGN KEY (`FCID`) REFERENCES `fiction` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

SET FOREIGN_KEY_CHECKS = 1;
Table 3.1 user table
Field | Type | Nullable | Primary key | Description |
---|---|---|---|---|
username | varchar | no | √ | Login user name |
password | varchar | no | | Login password |
qq | int | no | | Bound QQ number |
address | varchar | no | | User default download path |
Table 3.2 fiction table
Field | Type | Nullable | Primary key | Description |
---|---|---|---|---|
name | varchar | no | √ | Novel name |
writer | varchar | no | | Novel writer |
class1 | varchar | no | | Novel type 1 |
class2 | varchar | no | | Novel type 2 |
xc | int | no | | "Fairy grass" evaluation count |
lc | int | no | | "Grain and grass" evaluation count |
gc | int | no | | "Hay" evaluation count |
kc | int | no | | "Withered grass" evaluation count |
dc | int | no | | "Poisonous weed" evaluation count |
id | int | no | √ | Novel ID |
brief | varchar | no | | Introduction to the novel |
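The five evaluation counters in Table 3.2 (xc, lc, gc, kc, dc) are the natural inputs to the weighted ranking and the comprehensive recommendation degree. The source does not give the formula, so the following is only a sketch under assumptions: the score is taken to be a weighted average of the five counters, the tiers are assumed to run from best (xc) to worst (dc), and the class names, method names and default weights are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value object mirroring the evaluation columns of the fiction table.
final class FictionRating {
    final String name;
    final int xc, lc, gc, kc, dc;

    FictionRating(String name, int xc, int lc, int gc, int kc, int dc) {
        this.name = name;
        this.xc = xc;
        this.lc = lc;
        this.gc = gc;
        this.kc = kc;
        this.dc = dc;
    }

    // Weighted average of the five counters; 0 when the novel has no evaluations.
    double score(double[] w) {
        if (w.length != 5) {
            throw new IllegalArgumentException("expected 5 weights");
        }
        long total = (long) xc + lc + gc + kc + dc;
        if (total == 0) {
            return 0.0;
        }
        double sum = w[0] * xc + w[1] * lc + w[2] * gc + w[3] * kc + w[4] * dc;
        return sum / total;
    }
}

final class WeightedRanking {
    // Assumed default weights: the best tier counts most, the worst counts against.
    static final double[] DEFAULT_WEIGHTS = {2.0, 1.0, 0.0, -1.0, -2.0};

    // Returns a new list sorted by descending score; the input list is not modified.
    static List<FictionRating> rank(List<FictionRating> novels, double[] weights) {
        List<FictionRating> sorted = new ArrayList<>(novels);
        sorted.sort(Comparator.comparingDouble(
                (FictionRating f) -> f.score(weights)).reversed());
        return sorted;
    }
}
```

Because the weights are a parameter, the same `rank` method can serve both a fixed recommendation degree and a ranking where the user supplies their own weights through the interface.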
Table 3.3 user_fiction table (collection table)
Field | Type | Nullable | Primary key | Description |
---|---|---|---|---|
FCID | int | no | √ | Novel ID (references fiction.id) |
UID | varchar | no | √ | User name (references user.username) |
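As an illustration of how the collection table links users to novels, the following sketch reads one user's collected novels over JDBC. The table and column names come from the schema above; the class name `CollectionDao` and the method `favoriteNames` are hypothetical, and obtaining the `Connection` (driver, URL, credentials) is left to the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical data-access helper for the user_fiction table.
final class CollectionDao {

    // Joins the collection table to fiction on the novel ID.
    static final String SELECT_FAVORITES =
            "SELECT f.name FROM user_fiction uf "
          + "JOIN fiction f ON f.id = uf.FCID "
          + "WHERE uf.UID = ? ORDER BY f.name";

    // Returns the names of the novels collected by the given user.
    static List<String> favoriteNames(Connection conn, String username)
            throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SELECT_FAVORITES)) {
            ps.setString(1, username); // bound parameter, never string concatenation
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }
}
```

Binding the user name through `PreparedStatement` rather than concatenating it into the SQL string keeps user input from the login window out of the query text.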
Chapter IV test and operation
Conclusion
The system supports user login, registration and password retrieval, crawls and analyzes online novels, lists the latest novels and the novels with the most total and monthly hits, and downloads novels directly. This Java-based online novel crawling and analysis system crawls a novel website with crawler technology and then performs a comprehensive analysis based on each novel's evaluation numbers, hits and publishing time. Finally, it gives users a recommendation so that they can quickly find the novels they like, which saves time and makes the search more efficient.
The development of this system mainly uses a graphical interface, file streams, multithreading, HttpClient, jsoup, and JDBC database operations. Novels are crawled concurrently by multiple threads to improve the program's efficiency. The program makes extensive use of the singleton pattern in its static-inner-class and static-constant forms. A private constructor ensures that an instance is never created more than once, which avoids repeatedly creating and destroying the same object and saves system resources.
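The static-inner-class form of the singleton can be sketched as follows. The class name `CrawlerConfig` and its single field are hypothetical and only serve to give the singleton something to hold; the pattern itself (private constructor plus a nested holder class) is the standard Java idiom.

```java
// Hypothetical holder for shared crawler settings; the name is illustrative.
final class CrawlerConfig {
    private final int threadCount;

    // Private constructor: no code outside this class can create an instance.
    private CrawlerConfig() {
        this.threadCount = 8; // assumed default, not taken from the project
    }

    // The JVM loads and initializes Holder on first use, and class
    // initialization is thread-safe, so INSTANCE is created exactly once.
    private static final class Holder {
        private static final CrawlerConfig INSTANCE = new CrawlerConfig();
    }

    static CrawlerConfig getInstance() {
        return Holder.INSTANCE;
    }

    int getThreadCount() {
        return threadCount;
    }
}
```

Every call to `CrawlerConfig.getInstance()` returns the same object, without explicit locking. The static-constant form differs only in that the `INSTANCE` field lives directly in the outer class and is therefore created when that class is first loaded.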
The project has five layers: the data access layer, entity layer, service layer, tool layer and visual layer. With this framework in place, maintaining the project and adding functions is clear and straightforward.
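How three of these layers fit together can be sketched with the user table. All class and method names below are hypothetical, the tool and visual layers are omitted, and an in-memory map stands in for the JDBC-backed data access class so that the sketch runs without MySQL.

```java
import java.util.HashMap;
import java.util.Map;

// Entity layer: mirrors a row of the user table.
final class User {
    final String username;
    final String password; // a real system should store a hash, not the plain text
    final long qq;

    User(String username, String password, long qq) {
        this.username = username;
        this.password = password;
        this.qq = qq;
    }
}

// Data access layer: the service depends only on this interface.
interface UserDao {
    User findByUsername(String username); // returns null when absent

    void save(User user);
}

// Stand-in for a JDBC implementation, used here so the sketch is runnable.
final class InMemoryUserDao implements UserDao {
    private final Map<String, User> rows = new HashMap<>();

    @Override
    public User findByUsername(String username) {
        return rows.get(username);
    }

    @Override
    public void save(User user) {
        rows.put(user.username, user);
    }
}

// Service layer: business rules only, no SQL and no Swing code.
final class UserService {
    private final UserDao dao;

    UserService(UserDao dao) {
        this.dao = dao;
    }

    // Returns false if the user name is already taken.
    boolean register(String username, String password, long qq) {
        if (dao.findByUsername(username) != null) {
            return false;
        }
        dao.save(new User(username, password, qq));
        return true;
    }

    boolean login(String username, String password) {
        User user = dao.findByUsername(username);
        return user != null && user.password.equals(password);
    }
}
```

Because `UserService` only sees the `UserDao` interface, the login window in the visual layer can call the service without knowing whether the data comes from MySQL or from a test stub.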