Elasticsearch in Spring Boot, from Basics to Advanced Use: Combination Queries and Aggregation Queries

Link to the original text: https://blog.csdn.net/topdandan/article/details/81436141

1. Configure Elasticsearch in Spring Boot

1.1 Add the required jar packages to the project

1.1.1 Add the required dependencies to build.gradle

I created a Gradle project. For a Maven project the steps are the same: just add the equivalent dependencies.

// Spring Data Elasticsearch dependency
compile('org.springframework.boot:spring-boot-starter-data-elasticsearch')
 
// JNA dependency, which Java uses to access native libraries of the current operating system
compile('net.java.dev.jna:jna:4.3.0')

1.1.2 Add the Elasticsearch configuration to application.properties

# Cluster name. If you installed ES without changing it, the default name is "elasticsearch"
spring.data.elasticsearch.cluster-name=elasticsearch
# Comma-separated addresses of the cluster nodes. If none are specified, a client node is started. The default Java (transport) port is 9300
spring.data.elasticsearch.cluster-nodes=localhost:9300
# Connection timeout
spring.data.elasticsearch.properties.transport.tcp.connect_timeout=120s

1.2 Create the Document Entity Class

package site.wlss.blog.domain.es;
 
import java.io.Serializable;
import java.sql.Timestamp;
 
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldIndex;
 
import site.wlss.blog.domain.Blog;
 
 
/**
 * EsBlog document class.
 * 
 * @since  2018-08-05
 * @author wangli
 */
/* Some attributes of the @Document annotation, compared with MySQL:
    index    -> database (DB)
    type     -> table
    document -> row
*/
@Document(indexName = "blog", type = "blog")
public class EsBlog implements Serializable {
 
    private static final long serialVersionUID = 1L;
 
    @Id  // Primary key. Note that in Elasticsearch the id is a String, unlike the numeric ids we usually use.
    private String id;  // With @Id, this field maps to the document id in Elasticsearch, so a document can be fetched directly by its primary key.
    @Field(index = FieldIndex.not_analyzed)  // Not analyzed: excluded from full-text search
    private Long blogId; // id of the corresponding Blog entity
    private String title;
    private String summary;
    private String content;
    @Field(index = FieldIndex.not_analyzed)  // Not analyzed: excluded from full-text search
    private String username;  // assumed field: the author's username, read later via getUsername() and aggregated on "username"
    // ... remaining fields, constructors, getters and setters omitted
}

The above is part of my code. Note the @Document annotation on the entity class, the @Id annotation on the id field, and the @Field annotation, which describes how a field is mapped. These annotations are explained in detail below.

Explanation 1: the @Document annotation

The main attributes of @Document, compared with MySQL:
indexName --> name of the index. Naming it after the project is recommended; it is equivalent to a database (DB).
type --> document type. Naming it after the entity is recommended; it is equivalent to a table.
Document --> a single document, i.e. one concrete object, is equivalent to a row.

The attributes as declared in the annotation:

String indexName(); // Name of the index; naming it after the project is recommended
 
String type() default ""; // Type; naming it after the entity is recommended
 
short shards() default 5; // Default number of primary shards
 
short replicas() default 1; // Default number of replicas per shard
 
String refreshInterval() default "1s"; // Refresh interval
 
String indexStoreType() default "fs"; // Index storage type

Explanation 2: the @Id annotation

The field annotated with @Id maps to the document id in Elasticsearch, so a document can be retrieved directly by its primary key.

Explanation 3: the @Field annotation

public @interface Field {
 
    FieldType type() default FieldType.Auto;        // Field type, detected automatically by default
 
    FieldIndex index() default FieldIndex.analyzed; // Analyzed (tokenized) by default
 
    DateFormat format() default DateFormat.none;    // Date format
 
    String pattern() default "";                    // Custom date pattern
 
    boolean store() default false;                  // By default the original value is not stored separately (it remains available from _source)
 
    String searchAnalyzer() default "";             // Analyzer used when searching this field
 
    String indexAnalyzer() default "";              // Analyzer used when indexing this field
 
    String[] ignoreFields() default {};             // Fields to ignore
 
    boolean includeInParent() default false;        // For nested fields: also index them in the parent document
 
}

2. Create the Repository in the Spring Data JPA Style

Because we use Spring Data Elasticsearch, it follows the Spring Data repository interfaces; that is, Elasticsearch is operated in exactly the same way as with Spring Data JPA. We only need to define an interface that extends ElasticsearchRepository.

package site.wlss.blog.repository.es;
 
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
 
import site.wlss.blog.domain.es.EsBlog;
 
 
/**
 * EsBlog Repository interface.
 * @author Wang Li
 * @date 2018-08-05
 */
public interface EsBlogRepository extends ElasticsearchRepository<EsBlog, String> {
    // Two additional query methods, declared according to the Spring Data query-method naming conventions
    /**
     * Fuzzy (containing) query with deduplication: finds blogs whose title, summary, content or tags contain the given text
     * @param title
     * @param summary
     * @param content
     * @param tags
     * @param pageable
     * @return one page of matching blogs
     */
    Page<EsBlog> findDistinctEsBlogByTitleContainingOrSummaryContainingOrContentContainingOrTagsContaining(String title, String summary, String content, String tags, Pageable pageable);
    
    /**
     * Find the EsBlog by the id of the corresponding Blog entity
     * @param blogId
     * @return the matching EsBlog
     */
    EsBlog findByBlogId(Long blogId);
}

These are the two additional methods I declared following the Spring Data JPA naming conventions.

3. Query Documents Through the Repository

This works exactly like ordinary Spring Data JPA usage: the usual create, read, update and delete (CRUD) operations.

4. Advanced Elasticsearch Queries: Non-Aggregation Queries and Aggregation Queries

This is the main focus of this article.

4.1 Complex non-aggregation queries (the typical workflow)

public List<EsBlog> elasticSearchTest() {
    // 1. Create the QueryBuilder, i.e. set the query conditions. Here we build a combination query
    //    (also called a multi-condition or bool query); more query types are introduced in 4.2.
    /* Combination (bool) query clauses:
     *   must(QueryBuilder)    : AND
     *   mustNot(QueryBuilder) : NOT
     *   should(QueryBuilder)  : OR
     */
    BoolQueryBuilder builder = QueryBuilders.boolQuery();
    // must, mustNot and should correspond to AND, NOT and OR in SQL

    // Fuzzy query: the blog summary must contain a term similar to "Study"
    builder.must(QueryBuilders.fuzzyQuery("summary", "Study"));

    // The blog title must contain the keyword
    builder.must(new QueryStringQueryBuilder("man").field("title"));

    // Sort by number of comments, descending
    FieldSortBuilder sort = SortBuilders.fieldSort("commentSize").order(SortOrder.DESC);

    // Paging: first page, 10 items per page.
    // Page numbers start at 0, somewhat like LIMIT 0, 10 in SQL
    PageRequest pageRequest = new PageRequest(0, 10);

    // 2. Build the query
    NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
    // Set the query conditions
    nativeSearchQueryBuilder.withQuery(builder);
    // Set paging
    nativeSearchQueryBuilder.withPageable(pageRequest);
    // Set sorting
    nativeSearchQueryBuilder.withSort(sort);
    // Build the NativeSearchQuery
    NativeSearchQuery query = nativeSearchQueryBuilder.build();

    // 3. Execution method 1: through the repository
    Page<EsBlog> page = esBlogRepository.search(query);
    // Execution method 2: through ElasticsearchTemplate.
    // This requires injecting the template as a field of the enclosing class:
    // @Autowired
    // private ElasticsearchTemplate elasticsearchTemplate;
    List<EsBlog> blogList = elasticsearchTemplate.queryForList(query, EsBlog.class);

    // 4. Total number of hits (for front-end paging)
    int total = (int) page.getTotalElements();

    // 5. Content of the current page (returned to the front end)
    List<EsBlog> content = page.getContent();

    return content;
}
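
To make steps 1 to 5 concrete without a running cluster, here is a minimal plain-Java sketch of what such a query computes: keep the documents that satisfy every must condition, sort them by commentSize in descending order, and cut out the zero-based page. It is only an in-memory illustration of the semantics, not the Spring Data or Elasticsearch API; the class PagedSearchSketch and its Blog and Result types are hypothetical, and relevance scoring is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// In-memory illustration of "bool must + sort + paging"; NOT the Spring Data / Elasticsearch API.
class PagedSearchSketch {

    // Hypothetical stand-in for EsBlog with only the fields this example needs.
    static final class Blog {
        final String title;
        final int commentSize;

        Blog(String title, int commentSize) {
            this.title = title;
            this.commentSize = commentSize;
        }
    }

    // One page of results plus the total hit count (compare Page.getContent() / getTotalElements()).
    static final class Result {
        final List<Blog> content;
        final long total;

        Result(List<Blog> content, long total) {
            this.content = content;
            this.total = total;
        }
    }

    // Keeps documents matching every "must" predicate, sorts them by commentSize descending,
    // then cuts out the requested zero-based page.
    static Result search(List<Blog> docs, List<Predicate<Blog>> must, int page, int size) {
        List<Blog> hits = docs.stream()
                .filter(b -> must.stream().allMatch(p -> p.test(b)))
                .sorted(Comparator.comparingInt((Blog b) -> b.commentSize).reversed())
                .collect(Collectors.toList());
        int from = Math.min(page * size, hits.size()); // page 0 starts at offset 0
        int to = Math.min(from + size, hits.size());
        return new Result(new ArrayList<>(hits.subList(from, to)), hits.size());
    }

    public static void main(String[] args) {
        List<Blog> docs = Arrays.asList(
                new Blog("man page tips", 3),
                new Blog("spring man", 7),
                new Blog("unrelated", 9));
        Predicate<Blog> titleHasMan = b -> b.title.contains("man");
        Result result = search(docs, Collections.singletonList(titleHasMan), 0, 10);
        System.out.println(result.total + " hits, first: " + result.content.get(0).title);
        // prints "2 hits, first: spring man"
    }
}
```

The total is computed before the page is cut, which is why the front end can show "page 1 of N" from the same response.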

4.2 Examples of Building Query Conditions with QueryBuilders

Before moving on to aggregation queries, let's look at some common ways to create query conditions, i.e. QueryBuilder objects.

4.2.1 Exact Query (Must Match Exactly)

Single-value match: termQuery and matchQuery

// Term query (the value is not analyzed). Parameter 1: field name; parameter 2: value. Because the value is not analyzed, it can only match a single token: with the default analyzer that is one Chinese character or one English word.
QueryBuilder queryBuilder = QueryBuilders.termQuery("fieldName", "fieldValue");
// Match query (the value is analyzed with the field's analyzer, the default one unless configured otherwise)
QueryBuilder queryBuilder2 = QueryBuilders.matchQuery("fieldName", "fieldValue");

Multi-value and multi-field matches

// Terms query (not analyzed). Parameter 1: field name; the remaining parameters: values. Matches documents whose field contains any of the values; as above, each value matches a single token (one Chinese character or one English word).
QueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("fieldName", "fieldValue1", "fieldValue2");
// Multi-field match query (analyzed with the default analyzer). Parameter 1: query text; the remaining parameters: fields to search
QueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("fieldValue", "fieldName1", "fieldName2", "fieldName3");
// Match all documents, i.e. no query conditions are set
QueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
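
The difference between term/terms (value not analyzed) and match (value analyzed) is easiest to see on an analyzed text field. Below is a rough in-memory sketch, assuming a simplified analyzer that lowercases and splits on non-alphanumeric characters; it covers English words only and does not reproduce how the standard analyzer splits Chinese text into single characters. The class TermVsMatchSketch is hypothetical and illustrates the matching rules, not the Elasticsearch implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// In-memory illustration of term / terms / match semantics on an analyzed text field.
class TermVsMatchSketch {

    // Simplified stand-in analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // termQuery: the value is NOT analyzed and must equal one indexed token exactly.
    static boolean termQuery(String fieldText, String value) {
        return analyze(fieldText).contains(value);
    }

    // termsQuery: matches if any of the (unanalyzed) values equals an indexed token.
    static boolean termsQuery(String fieldText, String... values) {
        List<String> tokens = analyze(fieldText);
        return Arrays.stream(values).anyMatch(tokens::contains);
    }

    // matchQuery: the query text IS analyzed; with the default OR operator one shared token is enough.
    static boolean matchQuery(String fieldText, String queryText) {
        List<String> tokens = analyze(fieldText);
        return analyze(queryText).stream().anyMatch(tokens::contains);
    }

    public static void main(String[] args) {
        String title = "Spring Boot with Elasticsearch";
        System.out.println(termQuery(title, "Spring"));       // false: the indexed token is "spring"
        System.out.println(termQuery(title, "spring"));       // true
        System.out.println(matchQuery(title, "SPRING data")); // true: analyzed to [spring, data]
    }
}
```

This is why a term query with an uppercase value often returns nothing on an analyzed field, while the same text in a match query does.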

4.2.2 Fuzzy Query (Partial Matches)

// Five common kinds of fuzzy query
// 1. Query string query: the text is analyzed and may match anywhere in the field (left and right fuzzy matching, comparable to LIKE '%value%' in SQL at the term level)
QueryBuilders.queryStringQuery("fieldValue").field("fieldName");
// 2. More-like-this query, commonly used to recommend similar content. If no field name is given, all fields are used
QueryBuilders.moreLikeThisQuery(new String[] {"fieldName"}).addLikeText("text to match");
// 3. Prefix query: if the field is not analyzed, the prefix is matched against the whole field value
QueryBuilders.prefixQuery("fieldName", "fieldValue");
// 4. Fuzzy query: a term-level query that matches terms within a given edit distance of the query term. With Fuzziness.ONE, hotelName matches terms that differ from "tel" by one inserted, deleted or replaced character (for example "tell")
QueryBuilders.fuzzyQuery("hotelName", "tel").fuzziness(Fuzziness.ONE);
// 5. Wildcard query: * matches any sequence of characters, ? matches exactly one character
QueryBuilders.wildcardQuery("fieldName", "ctr*"); // Parameter 1: field name; parameter 2: pattern containing wildcards
QueryBuilders.wildcardQuery("fieldName", "c?r?");
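
The prefix, wildcard and fuzzy rules can be checked against a single unanalyzed term with a short sketch. It assumes fuzziness means plain Levenshtein distance (insert, delete, replace one character); Elasticsearch can additionally count a transposition of two adjacent characters as one edit, which this sketch does not model. FuzzyMatchSketch is a hypothetical class that mimics the matching rules, not Elasticsearch's implementation.

```java
import java.util.regex.Pattern;

// In-memory illustration of prefix, wildcard and fuzzy matching against a single term.
class FuzzyMatchSketch {

    // prefixQuery: the term starts with the given prefix.
    static boolean prefix(String term, String prefix) {
        return term.startsWith(prefix);
    }

    // wildcardQuery: '*' matches any sequence (including empty), '?' exactly one character;
    // every other character is matched literally.
    static boolean wildcard(String term, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return term.matches(regex.toString());
    }

    // Levenshtein distance: minimum number of single-character insertions,
    // deletions or substitutions that turn a into b (two-row dynamic programming).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // fuzzyQuery with Fuzziness.ONE: the term is at most one edit away from the query value.
    static boolean fuzzyOne(String term, String value) {
        return editDistance(term, value) <= 1;
    }

    public static void main(String[] args) {
        System.out.println(wildcard("ctrip", "ctr*"));  // true
        System.out.println(wildcard("cart", "c?r?"));   // true
        System.out.println(wildcard("carts", "c?r?"));  // false: five characters
        System.out.println(fuzzyOne("tell", "tel"));    // true: one insertion
        System.out.println(fuzzyOne("hotel", "tel"));   // false: two insertions
    }
}
```

Note that "hotel" is two edits away from "tel", so a fuzziness of one is not enough to match it.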

4.2.3 Range Query

// Closed interval: fieldValue1 <= x <= fieldValue2
QueryBuilder queryBuilder0 = QueryBuilders.rangeQuery("fieldName").from("fieldValue1").to("fieldValue2");
// Open interval: fieldValue1 < x < fieldValue2 (includeLower/includeUpper default to true, i.e. the bounds are included)
QueryBuilder queryBuilder1 = QueryBuilders.rangeQuery("fieldName").from("fieldValue1").to("fieldValue2").includeUpper(false).includeLower(false);
// Greater than
QueryBuilder queryBuilder2 = QueryBuilders.rangeQuery("fieldName").gt("fieldValue");
// Greater than or equal to
QueryBuilder queryBuilder3 = QueryBuilders.rangeQuery("fieldName").gte("fieldValue");
// Less than
QueryBuilder queryBuilder4 = QueryBuilders.rangeQuery("fieldName").lt("fieldValue");
// Less than or equal to
QueryBuilder queryBuilder5 = QueryBuilders.rangeQuery("fieldName").lte("fieldValue");
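
The bound rules above can be summarized in one small predicate. This is a hedged in-memory sketch over long values only (the real range query also handles strings and dates); RangeSketch is a hypothetical name, and a null bound stands for "unbounded on that side", as when only gt()/gte() or lt()/lte() is called.

```java
// In-memory illustration of range query bounds; NOT the Elasticsearch API.
class RangeSketch {

    // from/to == null means that side is unbounded.
    // includeLower/includeUpper mirror the builder flags (both default to true in the builder).
    static boolean inRange(long value, Long from, Long to, boolean includeLower, boolean includeUpper) {
        boolean lowerOk = from == null || (includeLower ? value >= from : value > from);
        boolean upperOk = to == null || (includeUpper ? value <= to : value < to);
        return lowerOk && upperOk;
    }

    public static void main(String[] args) {
        // from(10).to(20) with the defaults: closed interval [10, 20]
        System.out.println(inRange(20, 10L, 20L, true, true));   // true
        // includeLower(false).includeUpper(false): open interval (10, 20)
        System.out.println(inRange(20, 10L, 20L, false, false)); // false
        // gt(10): lower bound only, exclusive
        System.out.println(inRange(11, 10L, null, false, true)); // true
    }
}
```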

4.2.4 Combination Query/Multi-Conditional Query/Boolean Query

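// conditionA, conditionB and conditionC are hypothetical names for any QueryBuilder from the examples above
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.must(conditionA);    // Every must clause has to match, equivalent to AND
boolQueryBuilder.mustNot(conditionB); // No mustNot clause may match, equivalent to NOT
boolQueryBuilder.should(conditionC);  // Equivalent to OR: if the bool query has no must (or filter) clause, at least one should clause must match; otherwise should clauses only raise the relevance score (unless minimum_should_match is set)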
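
Which documents a bool query returns (ignoring scoring) can be sketched with predicates. The sketch assumes minimum_should_match is not set, so the default applies: with no must clause, at least one should clause has to match; with a must clause, should clauses are optional. Filter clauses are not modeled. BoolQuerySketch is a hypothetical in-memory illustration, not the Elasticsearch API.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// In-memory illustration of which documents a bool query matches (scoring ignored).
class BoolQuerySketch {

    static <T> boolean matches(T doc, List<Predicate<T>> must, List<Predicate<T>> mustNot,
                               List<Predicate<T>> should) {
        boolean allMust = must.stream().allMatch(p -> p.test(doc));        // AND
        boolean noMustNot = mustNot.stream().noneMatch(p -> p.test(doc));  // NOT
        // OR: only required when there is no must clause and at least one should clause exists
        boolean shouldOk = !must.isEmpty() || should.isEmpty()
                || should.stream().anyMatch(p -> p.test(doc));
        return allMust && noMustNot && shouldOk;
    }

    public static void main(String[] args) {
        Predicate<String> hasSpring = s -> s.contains("spring");
        Predicate<String> hasDraft = s -> s.contains("draft");
        List<Predicate<String>> none = Collections.emptyList();
        System.out.println(matches("spring boot", Collections.singletonList(hasSpring),
                Collections.singletonList(hasDraft), none)); // true
        System.out.println(matches("spring draft", Collections.singletonList(hasSpring),
                Collections.singletonList(hasDraft), none)); // false: excluded by mustNot
        System.out.println(matches("elasticsearch", none, none,
                Collections.singletonList(hasSpring)));     // false: should-only, nothing matches
    }
}
```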

4.3 Aggregation Query

Elasticsearch has a feature called aggregations, which lets you compute complex analytics and statistics over your data. It is similar to GROUP BY in SQL, but more powerful.

To master aggregations you only need to understand two main concepts (see https://blog.csdn.net/dm_vincent/article/details/42387161):

Buckets: a collection of documents that satisfy a certain condition.

Metrics: statistics calculated over the documents in a bucket.

That's it! Every aggregation is simply a combination of one or more buckets and zero or more metrics. Roughly translated into SQL:

SELECT COUNT(color) 
FROM table
GROUP BY color

Here COUNT(color) corresponds to a metric, and GROUP BY color corresponds to a bucket.

Buckets are conceptually similar to grouping in SQL, while metrics are similar to COUNT(), SUM(), MAX() and so on.
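
The SQL statement above can be mirrored in plain Java: each distinct color becomes a bucket, and the count of documents in it is the metric. This is only an in-memory analogy (BucketCountSketch is a hypothetical name), not how Elasticsearch computes aggregations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory analogy of SELECT color, COUNT(color) FROM table GROUP BY color.
class BucketCountSketch {

    // Each distinct color becomes a bucket; Collectors.counting() is the metric.
    // A TreeMap keeps the buckets sorted by key so the printed output is stable.
    static Map<String, Long> countByColor(List<String> colors) {
        return colors.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("red", "blue", "red", "green", "red", "blue");
        System.out.println(countByColor(colors)); // {blue=2, green=1, red=3}
    }
}
```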

Let's take a closer look at these concepts.

Buckets

A bucket is a collection of documents that satisfy certain conditions:

An employee belongs to either the male bucket or the female bucket.
The city of Albany belongs to the New York State bucket.
The date 2014-10-28 belongs to the October bucket.

As the aggregation runs, the values in each document are evaluated to determine whether they match a bucket's condition. If they match, the document is placed in that bucket and the aggregation continues.

Buckets can also be nested inside other buckets, giving you a hierarchy or conditional partitioning. For example, Cincinnati is placed in the Ohio State bucket, and the whole Ohio State bucket is placed in the United States bucket.

There are many types of buckets in ES that let you divide documents in many ways (by hour, by the most popular terms, by age range, by geographical location, and more). Fundamentally, though, they all work on the same principle: dividing documents according to conditions.

Metrics

Buckets let us divide documents into meaningful subsets, but ultimately what we want is to calculate some metrics over the documents in each bucket. Bucketing is a means to an end: it provides a way to group documents so that you can calculate the metrics you need.

Most metrics are simple mathematical operations (e.g., min, mean, max and sum) calculated from values in the documents. In practice, metrics let you calculate things such as the average salary, the maximum selling price, or the 95th percentile query latency.

Combining the two

An aggregation is a combination of buckets and metrics. An aggregation may have a single bucket, a single metric, or one of each. Buckets can even be nested inside other buckets. For example, we can divide documents into buckets by the country they belong to, and then calculate the average salary (a metric) for each bucket.

Because buckets can be nested, we can implement a more complex aggregation operation:

Documents are divided into buckets by country. (bucket)
Then each country bucket is divided into buckets by gender. (bucket)
Then each gender bucket is divided into buckets by age range. (bucket)
Finally, the average salary is calculated for each age range. (metric)
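
Two of these levels (country, then gender) plus the average-salary metric can be sketched with nested grouping in plain Java. This is an in-memory analogy only; NestedBucketSketch and its Employee type are hypothetical and exist just for this example. An age-range level would simply be one more nested groupingBy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory analogy of nested buckets (country -> gender) with an average-salary metric.
class NestedBucketSketch {

    // Hypothetical document type for this example only.
    static final class Employee {
        final String country;
        final String gender;
        final double salary;

        Employee(String country, String gender, double salary) {
            this.country = country;
            this.gender = gender;
            this.salary = salary;
        }
    }

    static Map<String, Map<String, Double>> avgSalaryByCountryAndGender(List<Employee> staff) {
        return staff.stream().collect(Collectors.groupingBy(
                (Employee e) -> e.country,                                   // outer bucket: country
                Collectors.groupingBy(
                        (Employee e) -> e.gender,                            // nested bucket: gender
                        Collectors.averagingDouble((Employee e) -> e.salary)))); // metric: average salary
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
                new Employee("US", "F", 100), new Employee("US", "F", 120),
                new Employee("US", "M", 90), new Employee("CN", "M", 80));
        Map<String, Map<String, Double>> result = avgSalaryByCountryAndGender(staff);
        System.out.println(result.get("US").get("F")); // 110.0
    }
}
```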

Aggregations are created with AggregationBuilders. Some common aggregations are as follows:

(Reference: http://blog.csdn.net/u010454030/article/details/63266035)

(1) Count the values of a field (value count)
ValueCountBuilder vcb = AggregationBuilders.count("count_uid").field("uid");
(2) Count the distinct values of a field (cardinality; the result is approximate, with a small error)
CardinalityBuilder cb = AggregationBuilders.cardinality("distinct_count_uid").field("uid");
(3) Filter aggregation (aggregate only the documents that match a query)
FilterAggregationBuilder fab = AggregationBuilders.filter("uid_filter").filter(QueryBuilders.queryStringQuery("uid:001"));
(4) Group by a field (terms)
TermsBuilder tb = AggregationBuilders.terms("group_name").field("name");
(5) Sum
SumBuilder sumBuilder = AggregationBuilders.sum("sum_price").field("price");
(6) Average
AvgBuilder ab = AggregationBuilders.avg("avg_price").field("price");
(7) Maximum
MaxBuilder mb = AggregationBuilders.max("max_price").field("price");
(8) Minimum
MinBuilder min = AggregationBuilders.min("min_price").field("price");
(9) Group by date interval (date histogram)
DateHistogramBuilder dhb = AggregationBuilders.dateHistogram("dh").field("date");
(10) Return the top matching documents inside each bucket (top hits)
TopHitsBuilder thb = AggregationBuilders.topHits("top_result");
(11) Nested aggregation (on fields of nested type)
NestedBuilder nb = AggregationBuilders.nested("nested_path").path("quests");
(12) Reverse nested aggregation (from nested documents back to their parent)
AggregationBuilders.reverseNested("res_nested").path("kps");
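
Several of the metric aggregations above (value count, sum, average, max, min, cardinality) can be mirrored in plain Java over the documents of one bucket. This is a hedged in-memory analogy: MetricsSketch and its Order type are hypothetical, the count equals value_count only when every document has exactly one value, and the distinct count here is exact, whereas the cardinality aggregation returns an approximate result.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

// In-memory analogy of metric aggregations computed over the documents of one bucket.
class MetricsSketch {

    // Hypothetical document type with a "uid" and a "price" field.
    static final class Order {
        final String uid;
        final double price;

        Order(String uid, double price) {
            this.uid = uid;
            this.price = price;
        }
    }

    // count / sum / avg / max / min of "price" in a single pass
    static DoubleSummaryStatistics priceStats(List<Order> bucket) {
        return bucket.stream().mapToDouble(o -> o.price).summaryStatistics();
    }

    // Exact number of distinct uid values (what the cardinality aggregation estimates)
    static long distinctUidCount(List<Order> bucket) {
        return bucket.stream().map(o -> o.uid).distinct().count();
    }

    public static void main(String[] args) {
        List<Order> bucket = Arrays.asList(
                new Order("001", 10.0), new Order("002", 30.0), new Order("001", 20.0));
        DoubleSummaryStatistics s = priceStats(bucket);
        System.out.println(s.getCount() + " " + s.getSum() + " " + s.getAverage()
                + " " + s.getMax() + " " + s.getMin()); // 3 60.0 20.0 30.0 10.0
        System.out.println(distinctUidCount(bucket));   // 2
    }
}
```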

The detailed steps of an aggregation query are as follows:

public void test() {
    // Goal: find the users who wrote the most blogs (each blog belongs to one user)
    // by counting how often each username occurs across the blog documents.
    // First create a list to hold the results
    List<String> userNameList = new ArrayList<>();
    // 1. Create the query conditions (QueryBuilder)
    QueryBuilder matchAllQuery = QueryBuilders.matchAllQuery(); // Match all documents, i.e. no query conditions
    // 2. Build the query
    NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
    // 2.0 Set the QueryBuilder
    nativeSearchQueryBuilder.withQuery(matchAllQuery);
    // 2.1 Set the search type. The default is QUERY_THEN_FETCH; see https://blog.csdn.net/wulex/article/details/71081042
    // QUERY_THEN_FETCH: each shard first returns only the ids and sort values of its matching documents;
    // these are merged and ranked, and then only the top `size` documents are fetched.
    nativeSearchQueryBuilder.withSearchType(SearchType.QUERY_THEN_FETCH);
    // 2.2 Specify the index and document type to query: the indexName and type set in @Document
    // on our entity (index names must be lowercase)
    nativeSearchQueryBuilder.withIndices("blog").withTypes("blog");
    // 2.3 The key step: specify the aggregation. Here a terms (group-by-field) aggregation is used
    // as an example; choose whichever aggregation your query needs.
    // It counts how often each value of the field (here username) occurs across all documents and
    // sorts the buckets by that count in descending order (often used for "most popular" rankings).
    TermsBuilder termsAggregation = AggregationBuilders.terms("username_agg").field("username").order(Terms.Order.count(false));
    nativeSearchQueryBuilder.addAggregation(termsAggregation);
    // 2.4 Build the query object
    NativeSearchQuery nativeSearchQuery = nativeSearchQueryBuilder.build();
    // 3. Execute the query (methods 1 and 2 are shown for comparison; in practice use one method)
    // 3.1 Method 1: through the repository, which returns the matching documents wrapped in a Page
    Page<EsBlog> search = esBlogRepository.search(nativeSearchQuery);
    List<EsBlog> content = search.getContent();
    for (EsBlog esBlog : content) {
        userNameList.add(esBlog.getUsername());
    }
    // From the returned documents we can read each author and work out the most active users ourselves.
    // 3.2 Method 2: through ElasticsearchTemplate.queryForList()
    List<EsBlog> queryForList = elasticsearchTemplate.queryForList(nativeSearchQuery, EsBlog.class);
    // 3.3 Method 3 (the common one): ElasticsearchTemplate.query() with a ResultsExtractor,
    // which gives access to the aggregation results
    Aggregations aggregations = elasticsearchTemplate.query(nativeSearchQuery, new ResultsExtractor<Aggregations>() {
        @Override
        public Aggregations extract(SearchResponse response) {
            return response.getAggregations();
        }
    });
    // Convert to a map keyed by aggregation name
    Map<String, Aggregation> aggregationMap = aggregations.asMap();
    // Look up our aggregation by the name given in step 2.3; its results are held in buckets
    StringTerms stringTerms = (StringTerms) aggregationMap.get("username_agg");
    // Traverse all buckets (an Iterator is only needed if you want to remove buckets while traversing)
    for (Terms.Bucket bucket : stringTerms.getBuckets()) {
        // Each bucket's key is one username; bucket.getDocCount() is its number of blogs
        String username = bucket.getKeyAsString(); // or bucket.getKey().toString()
        userNameList.add(username);
    }
    // Finally, query the users by the collected usernames (userService is an injected service)
    List<User> listUsersByUsernames = userService.listUsersByUsernames(userNameList);
}
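
What the terms aggregation in step 2.3 returns can be approximated in memory: count the documents per username, order by that count descending, and keep the first `size` buckets. This sketch breaks ties alphabetically, which is an assumption made here for deterministic output. On a single list the counts are exact, whereas Elasticsearch may report approximate terms counts when an index has several shards. TopUsernamesSketch is a hypothetical name, not part of the Spring Data or Elasticsearch API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory analogy of a terms aggregation on "username" ordered by document count, descending.
class TopUsernamesSketch {

    // Returns up to `size` usernames ordered by how many blogs each one wrote.
    // Ties are broken alphabetically (an assumption of this sketch, for determinism).
    static List<String> topUsernames(List<String> blogAuthors, int size) {
        Map<String, Long> docCountByUser = blogAuthors.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return docCountByUser.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(size)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("alice", "bob", "alice", "carol", "bob", "alice");
        System.out.println(topUsernames(authors, 2)); // [alice, bob]
    }
}
```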


Keywords: Spring ElasticSearch SQL Java

Added by sepodati on Wed, 21 Aug 2019 05:31:28 +0300
