ElasticSearch practical learning notes (from introduction to mastery)


Author: CodingGorit

Date: October 22, 2020

Note: these notes were taken while following Kuangshen's ElasticSearch course on Bilibili

1, Learning outline

  1. Installation
  2. Ecosystem
  3. IK analyzer (word segmentation)
  4. Operating ES through RESTful requests
  5. CRUD
  6. SpringBoot integrates ElasticSearch (starting from an analysis of the principles!)
  7. Crawling data from JD.COM
  8. Hands-on project: simulated full-text retrieval

Use ES for search-related features (when the amount of data is large)

Lucene is an information retrieval toolkit (a jar package), not a complete search engine system! It includes the index structure, tools for reading and writing indexes, and sorting and search rules, all provided as utility classes. Relationship between Lucene and ElasticSearch: ElasticSearch wraps and enhances Lucene

2, ElasticSearch overview

Abbreviated as es

  • An open source and highly scalable distributed full-text search engine
  • Near real-time storage and retrieval of data
  • es is developed in Java and uses Lucene as its core to implement all indexing and search functions
  • Its purpose is to hide the complexity of Lucene behind a simple RESTful API, so as to make full-text search simple

3, ElasticSearch installation

  • Requires JDK 1.8
  1. Download and unzip
  2. Get familiar with the directory layout
bin: startup scripts
config: configuration files
	log4j2.properties: logging configuration
	jvm.options: JVM-related configuration
	elasticsearch.yml: ElasticSearch configuration file!
lib: related jar packages
logs: log files
modules: functional modules
plugins: plug-ins (for example ik)
  3. Start ES; it listens on port 9200
  4. Access test: localhost:9200

Install the elasticsearch-head visualization plug-in

  1. Download address: https://github.com/mobz/elasticsearch-head/
  2. Start it
npm install
npm run start

Configure cross-origin access (CORS) in elasticsearch.yml

http.cors.enabled: true
http.cors.allow-origin: "*"

Install kibana

  1. Download, unzip
  2. Localization (Chinese UI)

Find the kibana.yml file under config and change the last line to i18n.locale: "zh-CN"

4, ES core concepts

  1. Indexes
  2. Field type (mapping)
  3. documents

What are clusters, nodes, indexes, types, documents, shards, and mappings?

ElasticSearch is document-oriented. Here is an objective comparison between a relational database and ElasticSearch. Everything is JSON { }

Noun correspondence

ElasticSearch    Relational DB
indexes          databases
types            tables
documents        rows
fields           columns

An ElasticSearch cluster can contain multiple indexes (databases), each index can contain multiple types (tables), each type contains multiple documents (rows), and each document contains multiple fields (columns)

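Physical design

ElasticSearch runs as a cluster

Documents

A document is a single record, for example:

user
	zs: 15
	ls: 22

Types

Field types are detected automatically when they are not specified (string, ...)

Indexes

An index is comparable to a database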

5, IK analyzer plug-in

Put the downloaded package into the plugins directory

(Skipped here; see episode 8 of the course)

  • The elasticsearch-plugin command (elasticsearch-plugin list) shows the loaded plug-ins
  • ik_smart (coarsest split) and ik_max_word (most fine-grained split)
  • Test in Kibana
  • Custom dictionary for word segmentation

6, REST style description

Basic REST commands

method    url address                                              description
PUT       localhost:9200/index name/type name/document id           Create or update a document (with the given id). Resubmitting with the same id overwrites the previous data
POST      localhost:9200/index name/type name                       Create a document (random document id)
POST      localhost:9200/index name/type name/document id/_update   Modify a document
DELETE    localhost:9200/index name/type name/document id           Delete a document
GET       localhost:9200/index name/type name/document id           Query a document by document id
POST      localhost:9200/index name/type name/_search               Query all data
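
The paths in the table are the typed forms used in the course. The deprecation warnings that ES 7 prints (see the responses in 6.1) point to typeless endpoints instead. As a rough sketch, the helper below spells out a typeless request line for each row of the table. EsEndpoints is a hypothetical class written for these notes, not part of any ES library, and it only builds strings; it does not send anything.

```java
/** Hypothetical helper: typeless ES 7 request lines matching the rows of the table. */
final class EsEndpoints {
    private EsEndpoints() {}

    /** Create or overwrite a document with a known id. */
    static String index(String index, String id) {
        return "PUT /" + index + "/_doc/" + id;
    }

    /** Create a document and let ES generate the id. */
    static String create(String index) {
        return "POST /" + index + "/_doc";
    }

    /** Partial update of an existing document. */
    static String update(String index, String id) {
        return "POST /" + index + "/_update/" + id;
    }

    /** Delete a document by id. */
    static String delete(String index, String id) {
        return "DELETE /" + index + "/_doc/" + id;
    }

    /** Fetch a document by id. */
    static String get(String index, String id) {
        return "GET /" + index + "/_doc/" + id;
    }

    /** Search the whole index. */
    static String search(String index) {
        return "POST /" + index + "/_search";
    }

    public static void main(String[] args) {
        System.out.println(index("test", "1"));  // PUT /test/_doc/1
        System.out.println(update("test", "1")); // POST /test/_update/1
        System.out.println(search("test"));      // POST /test/_search
    }
}
```

The printed lines can be pasted into the Kibana console followed by a JSON body where one is needed.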

Basic test

6.1 index creation

  1. Create an index (by indexing a document into it)
PUT /index name/type name/document id   (the type name is deprecated in ES 7)
{
  "name":"Gorit",
  "age": 18,
  "gender": "male"
}

Return value (the data was added successfully)

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1", 
  "_version" : 1, // Version, increases with every modification
  "result" : "created", // Status
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
  2. Create an index with mapping rules (field types)
PUT /test1/
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      },
      "birthday": {
        "type": "date"
      }
    }
  }
}

Return value

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test1"
}

If no field types are specified, es assigns default types to the fields!

6.2 query

GET test

# result
{
  "test" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203146037",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "q47lWt_4ToOBo1rxQ1pPNw",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test"
      }
    }
  }
}


GET test1

{
  "test1" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "birthday" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203453667",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "a-upVXJwR7u7JZztTjyVGg",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test1"
      }
    }
  }
}

Extension: the _cat API exposes a lot of information about the current state of es

GET _cat/health

GET _cat/indices?v

6.3 modify index

Old method: submit a PUT again with the same id, which overwrites the document

Modify data

PUT /test/type1/1
{
  "name":"Gorit111",
  "age": 18,
  "gender": "male"
}

Modification results

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

New method: update with the POST _update command

POST /test/_doc/1/_update
{
  "doc": {
      "name":"Zhang San"
  }
}

// result (the document after the update, in the shape returned by GET /test/_doc/1)
{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Zhang San",
    "age" : 18,
    "gender" : "male"
  }
}

6.4 deleting indexes

Delete an index!!!

DELETE test

Deletion goes through the DELETE command. Whether an index or a single document is deleted depends on the request path

7, About document operations

7.1 basic operation (review and consolidation)

  1. Add data (add multiple records)
PUT /gorit/user/1
{
  "name": "CodingGorit",
  "age": 23,
  "desc": "An independent individual developer",
  "tags": ["Python","Java","JavaScript"]
}

PUT /gorit/user/2
{
  "name": "Loong",
  "age": 20,
  "desc": "Full Stack Developer ",
  "tags": ["Python","JavaScript"]
}

result:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "gorit",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
  2. Get data
GET /gorit/user/_search   # Query all data

GET /gorit/user/1 # Query a single document
  3. Update data with PUT
PUT /gorit/user/3
{
  "name": "Li Si 222",
  "age": 20,
  "desc": "Java Development Engineer",
  "tags": ["Python","Java"]
}

# If a PUT update does not send every field, the omitted fields are lost, because the document is overwritten
  4. POST _update, this method is recommended!
# POST without _update behaves like PUT: the document is overwritten and the omitted fields are lost
POST /gorit/user/1
{
  "doc": {
    "name": "coco"
  }
}

# With _update the other fields are not blanked; only the fields given under doc change, which is more convenient
POST /gorit/user/1/_update
{
  "doc": {
    "name": "coco"
  }
}
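
The difference between the two update styles can be modelled with plain maps. The sketch below is only an illustration of the semantics described above, not ES code: UpdateSemantics and its methods are hypothetical names, and the merge is shallow, whereas ES also merges nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of PUT (overwrite) versus POST _update (merge) on one document. */
final class UpdateSemantics {
    private UpdateSemantics() {}

    /** PUT: the request body becomes the whole document; the stored fields are discarded. */
    static Map<String, Object> put(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    /** _update: the fields under "doc" are merged into the stored document. */
    static Map<String, Object> update(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "CodingGorit");
        stored.put("age", 23);

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("name", "coco");

        System.out.println(put(stored, change));    // {name=coco}
        System.out.println(update(stored, change)); // {name=coco, age=23}
    }
}
```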

Simple search!

# Query a record
GET /gorit/user/1

# Query all
GET /gorit/user/_search

# Conditional query. If no mapping was defined for a field, ES maps a string dynamically as text with a keyword sub-field (see the GET test output in 6.2). A keyword field only matches on the complete value; a text field is analyzed, so matching on individual words works
GET /gorit/user/_search?q=name:coco

7.2 complex queries (sorting, paging, highlighting, fuzzy query, exact query)!

  1. Match query that returns only the specified fields (_source filtering)
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "Li Si"
    }
  },
  "_source": ["name","desc"]
}

7.3 sorting

GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "gorit"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ]
}

7.4 paging query

Paging uses from and size, which work exactly like LIMIT offset, pageSize in SQL.

  1. from: offset of the first hit to return (starts at 0)
  2. size: how many hits are returned
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "Li Si"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ],
  "from": 0,
  "size": 1
}
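
Because from is an offset rather than a page number, a page-based API has to convert. A minimal sketch, assuming 1-based page numbers (Paging and fromOffset are hypothetical names used only here):

```java
/** Hypothetical helper: converts a 1-based page number into the "from" offset. */
final class Paging {
    private Paging() {}

    static int fromOffset(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1);   // treat 0 or negative pages as page 1
        return (page - 1) * pageSize;     // page 1 starts at offset 0
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 10)); // 0
        System.out.println(fromOffset(3, 10)); // 20
    }
}
```

Passing the page number straight into from would skip pageNo hits instead of whole pages.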

7.5 filter range query

# Query by age range
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 1,
              "lte": 25
            }
          }
        }
      ]
    }
  }
}
  • gt greater than
  • gte is greater than or equal to
  • lt less than
  • lte less than or equal to
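
As a rough in-memory analogy (not ES code), the query above keeps a document only when the must clause and the range filter both hold, and gte / lte are inclusive bounds. BoolFilterSketch and its User class are hypothetical names for this illustration, and the name comparison is a simplification of what a match query does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical in-memory analogy of bool { must: match name, filter: range age }. */
final class BoolFilterSketch {
    static final class User {
        final String name;
        final int age;
        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    static List<String> search(List<User> users, String name, int gte, int lte) {
        Predicate<User> must = u -> u.name.equalsIgnoreCase(name);
        Predicate<User> range = u -> u.age >= gte && u.age <= lte; // both bounds inclusive
        List<String> hits = new ArrayList<>();
        for (User u : users) {
            if (must.and(range).test(u)) {
                hits.add(u.name + ":" + u.age);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("gorit", 16), new User("gorit", 25),
                new User("gorit", 30), new User("coco", 20));
        System.out.println(search(users, "gorit", 1, 25)); // [gorit:16, gorit:25]
    }
}
```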

7.6 Boolean query

must (and): all conditions must be met, like WHERE id = 1 AND name = xxx in SQL

# Boolean query
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        },{
          "match": {
            "age": "16"
          }
        }
      ]
    }
  }
}

7.7 matching multiple conditions

Match several terms in one query

# Multiple terms are separated by spaces. A document is returned as long as it matches one of them; the score then shows how well each hit matches
GET /gorit/user/_search
{
  "query": {
    "match": {
      "tags": "Java Python"
    }
  }
}
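
To make the behaviour of a space-separated match query concrete, here is a toy inverted index in plain Java. It is only a sketch under simplifying assumptions: the analyzer just lowercases and splits on whitespace, the score is simply the number of matching terms (ES uses a more elaborate relevance formula), and ToyInvertedIndex is a hypothetical class, not an ES or Lucene API. It also shows why an exact term lookup with different casing finds nothing in an analyzed field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical toy inverted index: term -> ids of the documents containing it. */
final class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Toy analyzer: lowercase, then split on whitespace. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Index a text value: every analyzed term points back to the document. */
    void addText(int docId, String value) {
        for (String t : analyze(value)) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    /** term query: exact lookup, the input is not analyzed. */
    Set<Integer> term(String exact) {
        return postings.getOrDefault(exact, Collections.emptySet());
    }

    /** match query: analyze the input, OR the terms, score = number of terms that hit. */
    Map<Integer, Integer> match(String query) {
        Map<Integer, Integer> scores = new TreeMap<>();
        for (String t : analyze(query)) {
            for (int docId : term(t)) {
                scores.merge(docId, 1, Integer::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.addText(1, "Python Java JavaScript");
        idx.addText(2, "Python JavaScript");
        System.out.println(idx.match("Java Python")); // {1=2, 2=1}
        System.out.println(idx.term("java"));         // [1]
        System.out.println(idx.term("Java"));         // []
    }
}
```

Document 2 is still returned for "Java Python" because it matches one of the two terms, only with a lower score.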

7.8 precise query

A term query looks up the given term directly in the inverted index, without analyzing the query text!

About tokenization

  • term: direct, exact lookup
  • match: uses the analyzer!! (the query text is analyzed first, and the resulting terms are then looked up!!!)

Two string types: text and keyword

Conclusion:

  • text is split into terms by the analyzer
  • keyword is not split; the whole value is indexed as a single term

7.9 highlight query

# Highlight query: matched terms in the search results are wrapped in highlight tags. The default tags can be used, or custom ones can be set with pre_tags and post_tags
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "Gorit"
    }
  },
  "highlight": {
    "pre_tags": "", 
    "post_tags": "", 
    "fields": {
      "name": {}
    }
  }
}

# Response results
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.6375021,
    "hits" : [
      {
        "_index" : "gorit",
        "_type" : "user",
        "_id" : "6",
        "_score" : 1.6375021,
        "_source" : {
          "name" : "Gorit",
          "age" : 16,
          "desc" : "Operation and maintenance engineer",
          "tags" : [
            "Linux",
            "c++",
            "python"
          ]
        },
        "highlight" : {
          "name" : [
            "Gorit"
          ]
        }
      }
    ]
  }
}
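
Conceptually, highlighting wraps each match in an opening and a closing tag. The sketch below imitates that with a regular expression; it is an assumption-laden illustration, not how ES does it: ToyHighlighter is a hypothetical class, the tags in the example are made up, and it matches substrings, whereas ES highlights whole analyzed terms.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy highlighter: wraps every case-insensitive occurrence of a term in tags. */
final class ToyHighlighter {
    private ToyHighlighter() {}

    static String highlight(String text, String term, String preTag, String postTag) {
        Pattern pattern = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        // $0 is the matched text itself, so the original casing is preserved.
        String replacement = Matcher.quoteReplacement(preTag) + "$0" + Matcher.quoteReplacement(postTag);
        return pattern.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(highlight("Gorit", "gorit", "<em>", "</em>")); // <em>Gorit</em>
    }
}
```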

MySQL can do all of the following as well, but less efficiently

  • matching
  • Match by criteria
  • Exact match
  • Interval range matching
  • Matching field filtering
  • Multi condition query
  • Highlight query
  • Inverted index

8, Integrating with SpringBoot

Consult the official documentation

Specific tests

  1. Create an index
  2. Check whether an index exists
  3. Delete an index
  4. Create documents
  5. Operate on documents
// Maven dependency
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
		</dependency>

// Core code
package cn.gorit;

import cn.gorit.pojo.User;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

/**
 * es 7.6.2 API test
 */
@SpringBootTest
class DemoApplicationTests {

	// Inject by bean name (matches the restHighLevelClient @Bean method in the configuration class)
	@Autowired
	@Qualifier("restHighLevelClient")
	private RestHighLevelClient client;

	@Test
	void contextLoads() {

	}
	// Index creation
	@Test
	void testCreateIndex() throws IOException {
		// 1. The index creation request, equivalent to PUT /gorit_index
		CreateIndexRequest request = new CreateIndexRequest("gorit_index");
		// 2. Execute the IndicesClient creation request, and get the response after the request
		CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
		System.out.println(response);
	}

	// Check whether the index exists
	@Test
	void testGetIndexExist() throws IOException {
		GetIndexRequest request = new GetIndexRequest("gorit_index");
		boolean exist = client.indices().exists(request,RequestOptions.DEFAULT);
		System.out.println(exist);
	}

	// Delete index
	@Test
	void testDeleteIndex() throws IOException {
		DeleteIndexRequest request = new DeleteIndexRequest("gorit_index");
		// delete
		AcknowledgedResponse delete	= client.indices().delete(request,RequestOptions.DEFAULT);
		System.out.println(delete.isAcknowledged());
	}

	// Add document
	@Test
	void testAddDocument() throws IOException {
		// Create the object
		User u = new User("Gorit",3);
		// Create the request
		IndexRequest request = new IndexRequest("gorit_index");

		// Rule: PUT /gorit_index/_doc/1
		request.id("1");
		// Two ways to set the timeout (TimeValue or string); the later call wins
		request.timeout(TimeValue.timeValueSeconds(3));
		request.timeout("1s");

		// Put the data into the request as JSON
		request.source(JSON.toJSONString(u), XContentType.JSON);
		// Send the request with the client
		IndexResponse response = client.index(request, RequestOptions.DEFAULT);

		System.out.println(response.toString());
		System.out.println(response.status());// Status of the response: CREATED
	}

	// Check whether a document exists: GET /index/_doc/1
	@Test
	void testIsExists() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");

		// Do not fetch the _source content; only existence matters here
		getRequest.fetchSourceContext(new FetchSourceContext(false));
		getRequest.storedFields("_none_");

		boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
		System.out.println(exists);
	}

	// Get document information
	@Test
	void testGetDocument() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");
		GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
		// Print the contents of the document
		System.out.println(getResponse.getSourceAsString());
		System.out.println(getResponse); // The full response, the same as what the REST command returns
	}

	// Update document information
	@Test
	void testUpdateDocument() throws IOException {
		UpdateRequest updateRequest = new UpdateRequest("gorit_index", "1");
		updateRequest.timeout("1s");

		User user = new User("CodingGoirt", 18);
		updateRequest.doc(JSON.toJSONString(user),XContentType.JSON);

		UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
		// Print the response status
		System.out.println(updateResponse.status());
		System.out.println(updateResponse); // The full response, the same as what the REST command returns
	}

	// Delete document record
	@Test
	void testDeleteDocument() throws IOException {
		DeleteRequest deleteRequest = new DeleteRequest("gorit_index", "1");
		deleteRequest.timeout("1s");

		DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
		// Print the response status
		System.out.println(deleteResponse.status());
		System.out.println(deleteResponse); // The full response, the same as what the REST command returns
	}

	// Bulk insert, which is what real projects typically use

	@Test
	void testBulkRequest() throws IOException {
		BulkRequest bulkRequest = new BulkRequest();
		bulkRequest.timeout("10s");

		ArrayList<User> userList = new ArrayList<>();
		userList.add(new User("Zhang San 1",1));
		userList.add(new User("Zhang San 2",2));
		userList.add(new User("Zhang San 3",3));
		userList.add(new User("Zhang San 4",4));
		userList.add(new User("Zhang San 5",5));
		userList.add(new User("Zhang san6",6));
		userList.add(new User("Zhang san7",7));

		// Batch request
		for (int i=0;i<userList.size();i++) {
			// For batch update or batch delete, add the corresponding request type here instead
			bulkRequest.add(new IndexRequest("gorit_index")
			.id(""+(i+1))
			.source(JSON.toJSONString(userList.get(i)),XContentType.JSON));
		}

		BulkResponse bulkItemResponses = client.bulk(bulkRequest, RequestOptions.DEFAULT);
		System.out.println(bulkItemResponses.hasFailures()); // Whether any item failed; false means all succeeded
		System.out.println(bulkItemResponses.status());

	}

	// Query
	// SearchRequest: the search request
	// SearchSourceBuilder: builds the search conditions
	// HighlightBuilder: builds highlighting
	// TermQueryBuilder: exact (term) query
	// MatchAllQueryBuilder: matches all documents
	// xxxQueryBuilder: one builder per query type
	@Test
	void testSearch() throws IOException {
		SearchRequest searchRequest = new SearchRequest("gorit_index");
		// Build search criteria
		SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
		/**
		 *   Query conditions are built with the QueryBuilders utility class
		 * 	 QueryBuilders.termQuery: exact match
		 * 	 QueryBuilders.matchAllQuery(): match all
		 */

		TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "gorit1");// Exact query
//		MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();

		sourceBuilder.query(termQueryBuilder);
		// Paging: the no-argument from()/size() calls only read the current values, so pass explicit ones
		sourceBuilder.from(0);
		sourceBuilder.size(10);
		// To enable highlighting, pass a HighlightBuilder to sourceBuilder.highlighter(...)
		sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

		// Build search
		searchRequest.source(sourceBuilder);

		SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
		System.out.println(JSON.toJSONString(searchResponse.getHits()));
		System.out.println("==========================================");
		for (SearchHit documentFields: searchResponse.getHits().getHits()) {
			System.out.println(documentFields.getSourceAsMap());
		}
	}

}
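
For comparison with the REST commands used earlier, the bulk test corresponds to the _bulk endpoint, whose body is newline-delimited JSON: one action line followed by one source line per document, each ending with a newline. The sketch below builds such a body as a string. BulkBodySketch is a hypothetical helper written for these notes; it assumes every document is already valid single-line JSON and that the index name needs no escaping.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds the newline-delimited body for POST /_bulk with index actions. */
final class BulkBodySketch {
    private BulkBodySketch() {}

    static String bulkBody(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < jsonDocs.size(); i++) {
            // Action line: index this document under id i + 1, as in the test above.
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(i + 1).append("\"}}\n");
            // Source line: the document itself; the trailing newline is required.
            body.append(jsonDocs.get(i)).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "{\"name\":\"Zhang San 1\",\"age\":1}",
                "{\"name\":\"Zhang San 2\",\"age\":2}");
        System.out.print(bulkBody("gorit_index", docs));
    }
}
```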

9, Hands-on project

Project dependencies
        
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.68</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.junit.vintage</groupId>
                    <artifactId>junit-vintage-engine</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

Crawler

Configuration class

package cn.gorit.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Steps when working with Spring
 * 1. Find the object
 * 2. Put it into the Spring container
 * 3. Analyze the source code
 *
 * @Classname ElasticSearchConfig
 * @Description TODO
 * @Date 2020/10/21 17:20
 * @Created by CodingGorit
 * @Version 1.0
 */
@Configuration // plays the role of an XML bean configuration file
public class ElasticSearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")
                )
        );
        return client;
    }

}

Crawl the content of a JD search

Utility class for parsing the page

package cn.gorit.util;

import cn.gorit.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/**
 * @Classname HtmlParseUtil
 * @Description TODO
 * @Date 2020/10/21 23:17
 * @Created by CodingGorit
 * @Version 1.0
 */
@Component
public class HtmlParseUtil {

//    public static void main(String[] args) throws Exception {
//        new HtmlParseUtil().parseJD("English").forEach(System.out::println);
//    }

    public List<Content> parseJD(String keyword) throws Exception {
        // Request url; the keyword is URL-encoded so that non-ASCII keywords work
        // Requires network access; data loaded via ajax cannot be fetched this way
        String url = "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8") + "&enc=utf-8";
        // Parse the web page (returns a Document object)
        Document document = Jsoup.parse(new URL(url), 30000);
        // Get the container node of the goods list
        Element element = document.getElementById("J_goodsList");
        List<Content> goodsList = new ArrayList<>();
        if (element == null) {
            // The container was not found in the page: nothing to parse
            return goodsList;
        }
        // Get all li elements
        Elements elements = element.getElementsByTag("li");
        // Extract the content of each element
        for (Element e: elements) {
            String img = e.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = e.getElementsByClass("p-price").eq(0).text();
            String title = e.getElementsByClass("p-name").eq(0).text();

            goodsList.add(new Content(title,img,price));
        }
        return goodsList;
    }
}

Service class

package cn.gorit.service;

import cn.gorit.pojo.Content;
import cn.gorit.util.HtmlParseUtil;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * @Classname ContentService
 * @Description TODO
 * @Date 2020/10/22 18:44
 * @Created by CodingGorit
 * @Version 1.0
 */
@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Cannot be run directly like this: restHighLevelClient is only injected when the class is managed by the Spring container
    public static void main(String[] args) throws Exception {
        new ContentService().parseContent("java");
    }

    // 1. Put the parsed data into the es index
    public Boolean parseContent (String keywords) throws Exception {
        // Get the information of the queried list
        List<Content> contents = new HtmlParseUtil().parseJD(keywords);
        // Put the queried data into es
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");

        for (int i=0;i < contents.size();++i) {
            bulkRequest.add(
                    new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)),XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulkResponse.hasFailures();
    }

    // 2. Search the stored data, with highlighting
    public List<Map<String,Object>> searchPagehighLight(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is an offset, not a page number: page 1 starts at offset 0
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact (term) match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));

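        // Highlight
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false); // highlight this field even if the query matched on another field
        // preTags/postTags take the opening and closing tags that wrap each match; the values are empty in these notes
        highlightBuilder.preTags("");
        highlightBuilder.postTags("");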
        builder.highlighter(highlightBuilder);

        // Perform search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String,Object>> list= new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {
            // Resolve highlighted fields
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String,Object> sourceAsMap = hit.getSourceAsMap();// Original result
            // Replace the original title with the highlighted version
            if (title != null) {
                Text[] fragments = title.fragments();
                StringBuilder nTitle = new StringBuilder();
                for (Text text:fragments) {
                    nTitle.append(text);
                }
                sourceAsMap.put("title",nTitle.toString());
            }
            list.add(sourceAsMap); // title now holds the highlighted text when there was a match
        }
        return list;
    }

    // 3. The same search without highlighting
    public List<Map<String,Object>> searchPage(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is an offset, not a page number: page 1 starts at offset 0
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact (term) match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));


        // Perform search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String,Object>> list= new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {
            list.add(hit.getSourceAsMap()); // Original content, no highlighting
        }
        return list;
    }
}

Controller

package cn.gorit.controller;

import cn.gorit.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;

/**
 * @Classname ContentController
 * @Description TODO
 * @Date 2020/10/22 18:45
 * @Created by CodingGorit
 * @Version 1.0
 */
@RestController
public class ContentController {

    @Autowired
    private ContentService service;

    /**
     * Add data to ES
     * @param keyword
     * @return
     * @throws Exception
     */
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws Exception {
        return service.parseContent(keyword);
    }

    /**
     * Query ES data
     * @param keyword
     * @param pageNo
     * @param pageSize
     * @return
     * @throws IOException
     */
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String,Object>> search(@PathVariable("keyword") String keyword,@PathVariable("pageNo") int pageNo, @PathVariable("pageSize") int pageSize) throws IOException {
        if (pageNo == 0) {
            pageNo = 1;
        }
        return service.searchPage(keyword, pageNo, pageSize);
    }
}

Front-end and back-end separation

Test with Postman

Search with highlighting

One back end can serve multiple client applications

10, Summary

  1. ElasticSearch basic usage
  2. Integrating ES with SpringBoot
  3. Hands-on search project

You are welcome to take a look at my personal open source project (coding with Java)

Added by phpbeginner0120 on Wed, 08 Dec 2021 21:22:26 +0200