ElasticSearch advanced: an overview of the implementation of various ES queries in Java

ElasticSearch multiple query operations

  • preface
  • 1 entry query
    • 1.1 equivalent query - term
    • 1.2 multi value query terms
    • 1.3 range query
    • 1.4 prefix query - prefix
    • 1.5 wildcard query - wildcard
  • 2 compound query
    • 2.1 Boolean query
    • 2.2 Filter query
  • 3 aggregate query
    • 3.1 maximum value, average value and summation
    • 3.2 de duplication query
    • 3.3 group aggregation
      • 3.3.1 single condition grouping
      • 3.3.2 multi condition grouping
    • 3.4 filter polymerization

preface

  • ElasticSearch Article 1: ElasticSearch Foundation: start with inverted index and quickly recognize ES

The theme of this blog post is ES query, so I sorted out as many ES query scenarios as possible to form the following figure:

This article is based on elastic search version 7.13.2. es has been greatly updated since 7.0. After 7.3, the TransportClient client has been deprecated and replaced by the Java High Level REST Client.

Sample data for testing

First, some test data in Mysql:

id

name

age

sex

address

sect

skill

power

create_time

modify_time

1

zhang wuji

18

male

Bright Summit

Ming religion

nine men's power

99

2021-05-14 16:50:33

2021-06-29 16:48:56

2

Zhou Zhiruo

17

female

Mount Emei

Emei sect

The nine Yin manual

88

2021-05-14 11:37:07

2021-06-29 16:56:40

3

Zhao Min

14

female

mostly

imperial court

nothing

40

2021-05-14 11:37:07

2021-06-29 15:22:24

A row of data in Mysql exists as a document in ES:

{
  "_index" : "person",
  "_type" : "_doc",
  "_id" : "4",
  "_score" : 1.0,
  "_source" : {
    "address" : "Mount Emei",
    "modifyTime" : "2021-06-29 19:46:25",
    "createTime" : "2021-05-14 11:37:07",
    "sect" : "Emei sect",
    "sex" : "male",
    "skill" : "Eighteen dragon subduing palms",
    "name" : "Song Qingshu",
    "id" : 4,
    "power" : 50,
    "age" : 21
  }
}

After briefly combing the relevant systems of ES Java API, those interested can study the source code themselves.

Next, we use more than a dozen examples to quickly start the query operation of ES. Each example will provide SQL statements, ES statements and Java code.

1 entry query

The so-called term query, that is, ES will not segment the query conditions. It will be queried only when the term matches the query string exactly.

1.1 equivalent query - term

Equivalent query is to filter out all records with a field equal to a specific value.

SQL:

select * from person where name = 'zhang wuji';

The ES query statement is very different (note that the query field is provided with keyword):

GET /person/_search
{
	"query": {
		"term": {
			"name.keyword": {
				"value": "zhang wuji",
				"boost": 1.0
			}
		}
	}
}

After ElasticSearch 5.0, the string type has been significantly changed. The string type is removed, and the string field is divided into two new data types: text for full-text search and keyword for keyword search.

Query results:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { // Slice information
    "total" : 1, // Total score pieces
    "successful" : 1, // Number of fragments successfully queried
    "skipped" : 0, // Number of slices skipped query
    "failed" : 0  // Number of slices failed to query
  },
  "hits" : { // Hit result
    "total" : {
      "value" : 1, // quantity
      "relation" : "eq"  // Relationship: equal to
    },
    "max_score" : 2.8526313,  // Highest score
    "hits" : [
      {
        "_index" : "person", // Indexes
        "_type" : "_doc", // type
        "_id" : "1",
        "_score" : 2.8526313,
        "_source" : {
          "address" : "Bright Summit",
          "modifyTime" : "2021-06-29 16:48:56",
          "createTime" : "2021-05-14 16:50:33",
          "sect" : "Ming religion",
          "sex" : "male",
          "skill" : "nine men's power",
          "name" : "zhang wuji",
          "id" : 1,
          "power" : 99,
          "age" : 18
        }
      }
    ]
  }
}

How to construct ES requests in Java: (only the build statements of SearchSourceBuilder are retained in subsequent examples)

/**
 * term Precise query
 *
 * @throws IOException
 */

@Autowired
private RestHighLevelClient client;

@Test
public void queryTerm() throws IOException {
	// Create query request based on index
    SearchRequest searchRequest = new SearchRequest("person");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Building query statements
    searchSourceBuilder.query(QueryBuilders.termQuery("name.keyword", "zhang wuji"));
    System.out.println("searchSourceBuilder=====================" + searchSourceBuilder);
    searchRequest.source(searchSourceBuilder);
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(JSONObject.toJSON(response));
}

If you carefully observe the query results, you will find that the ES query results will contain_ Score, es will score according to the matching degree of the results. Scoring will consume performance. If you confirm that your query does not need scoring, set the query statement to turn off scoring:

GET /person/_search
{
	"query": {
		"constant_score": {
			"filter": {
				"term": {
					"sect.keyword": {
						"value": "zhang wuji",
						"boost": 1.0
					}
				}
			},
			"boost": 1.0
		}
	}
}

Java build query statement:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// The query criteria constructed in this way will not be score calculated, so as to improve the query efficiency
searchSourceBuilder.query(QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("sect.keyword", "Ming religion")));

1.2 multi value query terms

Multi condition query is similar to IN query IN Mysql, for example:

select * from persons where sect in('Ming religion','Wudang sect');

ES query statement:

GET /person/_search
{
	"query": {
		"terms": {
			"sect.keyword": [
				"Ming religion",
				"Wudang sect"
			],
			"boost": 1.0
		}
	}
}

Java implementation:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
searchSourceBuilder.query(QueryBuilders.termsQuery("sect.keyword", Arrays.asList("Ming religion", "Wudang sect")));
}

1.3 range query

Range query is to query the records of a field in a specific interval.

SQL:

select * from pesons where age between 18 and 22;

ES query statement:

GET /person/_search
{
	"query": {
		"range": {
			"age": {
				"from": 10,
				"to": 20,
				"include_lower": true,
				"include_upper": true,
				"boost": 1.0
			}
		}
	}
}

Java build query criteria:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
searchSourceBuilder.query(QueryBuilders.rangeQuery("age").gte(10).lte(30));
}

1.4 prefix query - prefix

Prefix queries are similar to fuzzy queries in SQL.

SQL:

select * from persons where sect like 'Wudang%';

ES query statement:

{
	"query": {
		"prefix": {
			"sect.keyword": {
				"value": "Wudang",
				"boost": 1.0
			}
		}
	}
}

Java build query criteria:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
searchSourceBuilder.query(QueryBuilders.prefixQuery("sect.keyword","Wudang"));

1.5 wildcard query - wildcard

Wildcard query, similar to prefix query, belongs to the category of fuzzy query, but wildcard is obviously more powerful.

SQL:

select * from persons where name like 'Zhang%Avoid';

ES query statement:

{
	"query": {
		"wildcard": {
			"sect.keyword": {
				"wildcard": "Zhang*Avoid",
				"boost": 1.0
			}
		}
	}
}

Java build query criteria:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
searchSourceBuilder.query(QueryBuilders.wildcardQuery("sect.keyword","Zhang*Avoid"));

2 compound query

The previous examples are all single condition queries. In practical applications, we are likely to filter multiple values or fields. Let's look at a simple example:

select * from persons where sex = 'female' and sect = 'Ming religion';

Such a multi condition equivalent query needs to be borrowed from the combination filter, and its query statement is:

{
	"query": {
		"bool": {
			"must": [
				{
				    "term": {
						"sex": {
							"value": "female",
							"boost": 1.0
						}
					}
				},
				{
					"term": {
						"sect.keywords": {
							"value": "Ming religion",
							"boost": 1.0
						}
					}
				}
			],
			"adjust_pure_negative": true,
			"boost": 1.0
		}
	}
}

Java construct query statement:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
searchSourceBuilder.query(QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("sex", "female"))
        .must(QueryBuilders.termQuery("sect.keyword", "Ming religion"))
);

2.1 Boolean query

Boolean filter is a kind of compound filter. It can accept multiple other filters as parameters and combine these filters into various Boolean (logical) combinations.

There can be 4 seed conditions under bool filter, and any one or more of them can be selected. Filter is quite special. I won't talk about it here first.

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : [],
   }
}
  • Must: all statements must match, equivalent to '='.
  • must_not: all statements cannot match, and '! =' Or not in equivalent.
  • should: there are at least n statements to match, and N is controlled by parameters.

Accuracy control:

All must statements must match_ Not statements must not match, but how many should statements should match? By default, no should statement must match, with one exception: when there is no must statement, at least one should statement must match.

We can use minimum_ should_ The match parameter controls the number of should statements to be matched. It can be either an absolute number or a percentage:

GET /person/_search
{
	"query": {
		"bool": {
			"must": [
				{
					"term": {
						"sex": {
							"value": "female",
							"boost": 1.0
						}
					}
				}
			],
			"should": [
				{
					"term": {
						"address.keyword": {
							"value": "Mount Emei",
							"boost": 1.0
						}
					}
				},
				{
					"term": {
						"sect.keyword": {
							"value": "Ming religion",
							"boost": 1.0
						}
					}
				}
			],
			"adjust_pure_negative": true,
			"minimum_should_match": "1",
			"boost": 1.0
		}
	}
}

Java build query statement:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
searchSourceBuilder.query(QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("sex", "female"))
        .should(QueryBuilders.termQuery("address.word", "Mount Emei"))
        .should(QueryBuilders.termQuery("sect.keyword", "Ming religion"))
        .minimumShouldMatch(1)
);

Finally, let's take a more complex example and combine the clauses of bool:

select 
	*
from
	persons
where 
	sex = 'female'
and
	age between 30 and 40
and 
	sect != 'Ming religion'
and 
	(address = 'Mount Emei' OR skill = 'hidden weapon')

Elasticsearch is used to represent the above SQL example:

GET /person/_search
{
	"query": {
		"bool": {
			"must": [
				{
					"term": {
						"sex": {
							"value": "female",
							"boost": 1.0
						}
					}
				},
				{
					"range": {
						"age": {
							"from": 30,
							"to": 40,
							"include_lower": true,
							"include_upper": true,
							"boost": 1.0
						}
					}
				}
			],
			"must_not": [
				{
					"term": {
						"sect.keyword": {
							"value": "Ming religion",
							"boost": 1.0
						}
					}
				}
			],
			"should": [
				{
					"term": {
						"address.keyword": {
							"value": "Mount Emei",
							"boost": 1.0
						}
					}
				},
				{
					"term": {
						"skill.keyword": {
							"value": "hidden weapon",
							"boost": 1.0
						}
					}
				}
			],
			"adjust_pure_negative": true,
			"minimum_should_match": "1",
			"boost": 1.0
		}
	}
}

Build this query condition in Java:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("sex", "female"))
        .must(QueryBuilders.rangeQuery("age").gte(30).lte(40))
        .mustNot(QueryBuilders.termQuery("sect.keyword", "Ming religion"))
        .should(QueryBuilders.termQuery("address.keyword", "Mount Emei"))
        .should(QueryBuilders.rangeQuery("power.keyword").gte(50).lte(80))
        .minimumShouldMatch(1);  // At least several conditions must be met to set should

// Build BoolQueryBuilder into SearchSourceBuilder
searchSourceBuilder.query(boolQueryBuilder);

2.2 Filter query

The difference between query and filter: when querying, query will first compare query conditions, then calculate scores, and finally return document results; The filter first judges whether the query conditions are met. If they are not met, the query results will be cached (the document does not meet the results). If they are met, the results will be directly cached. The filter will not score the results, which can improve the query efficiency.

filter can be used in various ways. Here are a few examples to demonstrate it.

Mode 1: used alone:

{
	"query": {
		"bool": {
			"filter": [
				{
					"term": {
						"sex": {
							"value": "male",
							"boost": 1.0
						}
					}
				}
			],
			"adjust_pure_negative": true,
			"boost": 1.0
		}
	}
}

When used alone, the filter is basically the same as must, except that the filter does not calculate the score and is more efficient.

Java build query statement:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
searchSourceBuilder.query(QueryBuilders.boolQuery()
        .filter(QueryBuilders.termQuery("sex", "male"))
);

Mode 2, and must, must_not sibling, equivalent to subquery:

select * from (select * from persons where sect = 'Ming religion')) a where sex = 'female';

ES query statement:

{
	"query": {
		"bool": {
			"must": [
				{
					"term": {
						"sect.keyword": {
							"value": "Ming religion",
							"boost": 1.0
						}
					}
				}
			],
			"filter": [
				{
					"term": {
						"sex": {
							"value": "female",
							"boost": 1.0
						}
					}
				}
			],
			"adjust_pure_negative": true,
			"boost": 1.0
		}
	}
}

Java:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
searchSourceBuilder.query(QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("sect.keyword", "Ming religion"))
        .filter(QueryBuilders.termQuery("sex", "female"))
);

Method 3: set must, must_not is placed under filter, which is the most commonly used method:

{
	"query": {
		"bool": {
			"filter": [
				{
					"bool": {
						"must": [
							{
								"term": {
									"sect.keyword": {
										"value": "Ming religion",
										"boost": 1.0
									}
								}
							},
							{
								"range": {
									"age": {
										"from": 20,
										"to": 35,
										"include_lower": true,
										"include_upper": true,
										"boost": 1.0
									}
								}
							}
						],
						"must_not": [
							{
								"term": {
									"sex.keyword": {
										"value": "female",
										"boost": 1.0
									}
								}
							}
						],
						"adjust_pure_negative": true,
						"boost": 1.0
					}
				}
			],
			"adjust_pure_negative": true,
			"boost": 1.0
		}
	}
}

Java:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Building query statements
searchSourceBuilder.query(QueryBuilders.boolQuery()
        .filter(QueryBuilders.boolQuery()
                .must(QueryBuilders.termQuery("sect.keyword", "Ming religion"))
                .must(QueryBuilders.rangeQuery("age").gte(20).lte(35))
                .mustNot(QueryBuilders.termQuery("sex.keyword", "female")))
);

3 aggregate query

Next, we will use some cases to demonstrate ES aggregation query.

3.1 maximum value, average value and summation

Case: query the maximum age, minimum age and average age.

SQL:

select max(age) from persons;

ES:

GET /person/_search
{
	"aggregations": {
		"max_age": {
			"max": {
				"field": "age"
			}
		}
	}
}

Java:

@Autowired
private RestHighLevelClient client;

@Test
public void maxQueryTest() throws IOException {
	// Aggregate query criteria
    AggregationBuilder aggBuilder = AggregationBuilders.max("max_age").field("age");
    SearchRequest searchRequest = new SearchRequest("person");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Building aggregate query criteria into SearchSourceBuilder
    searchSourceBuilder.aggregation(aggBuilder);
    System.out.println("searchSourceBuilder----->" + searchSourceBuilder);

    searchRequest.source(searchSourceBuilder);
    // Execute the query to get the SearchResponse
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(JSONObject.toJSON(response));
}

Using aggregate query, only 10 document data will be returned in the result by default (of course, we are concerned about the aggregate result, not the document). How many pieces of data are returned can be controlled independently:

GET /person/_search
{
	"size": 20,
	"aggregations": {
		"max_age": {
			"max": {
				"field": "age"
			}
		}
	}
}

In Java, you only need to add the following statement:

searchSourceBuilder.size(20);

Similar to max, other statistical queries are simple:

AggregationBuilder minBuilder = AggregationBuilders.min("min_age").field("age");
AggregationBuilder avgBuilder = AggregationBuilders.avg("min_age").field("age");
AggregationBuilder sumBuilder = AggregationBuilders.sum("min_age").field("age");
AggregationBuilder countBuilder = AggregationBuilders.count("min_age").field("age");

3.2 de duplication query

Case: query how many sects there are.

SQL:

select count(distinct sect) from persons;

ES:

{
	"aggregations": {
		"sect_count": {
			"cardinality": {
				"field": "sect.keyword"
			}
		}
	}
}

Java:

@Test
public void cardinalityQueryTest() throws IOException {
	// Create a request for an index
    SearchRequest searchRequest = new SearchRequest("person");
    // query criteria
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Aggregate query
    AggregationBuilder aggBuilder = AggregationBuilders.cardinality("sect_count").field("sect.keyword");
    searchSourceBuilder.size(0);
    // Building aggregate queries into query criteria
    searchSourceBuilder.aggregation(aggBuilder);
    System.out.println("searchSourceBuilder----->" + searchSourceBuilder);

    searchRequest.source(searchSourceBuilder);
    // Execute the query to get the results
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(JSONObject.toJSON(response));
}

3.3 group aggregation

3.3.1 single condition grouping

Case: query the number of people in each sect

SQL:

select sect,count(id) from mytest.persons group by sect;

ES:

{
	"size": 0,
	"aggregations": {
		"sect_count": {
			"terms": {
				"field": "sect.keyword",
				"size": 10,
				"min_doc_count": 1,
				"shard_min_doc_count": 0,
				"show_term_doc_count_error": false,
				"order": [
					{
						"_count": "desc"
					},
					{
						"_key": "asc"
					}
				]
			}
		}
	}
}

Java:

SearchRequest searchRequest = new SearchRequest("person");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(0);
// Group by sect
AggregationBuilder aggBuilder = AggregationBuilders.terms("sect_count").field("sect.keyword");
searchSourceBuilder.aggregation(aggBuilder);

3.3.2 multi condition grouping

Case: how many men and women are there in each sect

SQL:

select sect,sex,count(id) from mytest.persons group by sect,sex;

ES:

{
	"aggregations": {
		"sect_count": {
			"terms": {
				"field": "sect.keyword",
				"size": 10
			},
			"aggregations": {
				"sex_count": {
					"terms": {
						"field": "sex.keyword",
						"size": 10
					}
				}
			}
		}
	}
}

3.4 filter polymerization

query is omitted from all the previous aggregation example requests, and the whole request is just an aggregation. This means that we aggregate all data, but in real applications, we often aggregate data in a specific range, such as the following example.

Case: query the maximum age in Mingjiao. This involves using aggregation with conditional queries.

SQL:

select max(age) from mytest.persons where sect = 'Ming religion';

ES:

GET /person/_search
{
	"query": {
		"term": {
			"sect.keyword": {
				"value": "Ming religion",
				"boost": 1.0
			}
		}
	},
	"aggregations": {
		"max_age": {
			"max": {
				"field": "age"
			}
		}
	}
}

Java:

SearchRequest searchRequest = new SearchRequest("person");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Aggregate query criteria
AggregationBuilder maxBuilder = AggregationBuilders.max("max_age").field("age");
// Equivalent query
searchSourceBuilder.query(QueryBuilders.termQuery("sect.keyword", "Ming religion"));
searchSourceBuilder.aggregation(maxBuilder);

There are also some more complex query examples.

Case: query the number of people above 0-20, 21-40, 41-60 and 61.

SQL:

select 
	sum(case when age<=20 then 1 else 0 end) ageGroup1,
	sum(case when age >20 and age <=40 then 1 else 0 end) ageGroup2,
	sum(case when age >40 and age <=60 then 1 else 0 end) ageGroup3,
	sum(case when age >60 and age <=200 then 1 else 0 end) ageGroup4
from 
	mytest.persons;

ES:

{
	"size": 0,
	"aggregations": {
		"age_avg": {
			"range": {
				"field": "age",
				"ranges": [
					{
						"from": 0.0,
						"to": 20.0
					},
					{
						"from": 21.0,
						"to": 40.0
					},
					{
						"from": 41.0,
						"to": 60.0
					},
					{
						"from": 61.0,
						"to": 200.0
					}
				],
				"keyed": false
			}
		}
	}
}

Java:

Query results:

"aggregations" : {
  "age_avg" : {
    "buckets" : [
      {
        "key" : "0.0-20.0",
        "from" : 0.0,
        "to" : 20.0,
        "doc_count" : 3
      },
      {
        "key" : "21.0-40.0",
        "from" : 21.0,
        "to" : 40.0,
        "doc_count" : 13
      },
      {
        "key" : "41.0-60.0",
        "from" : 41.0,
        "to" : 60.0,
        "doc_count" : 4
      },
      {
        "key" : "61.0-200.0",
        "from" : 61.0,
        "to" : 200.0,
        "doc_count" : 1
      }
    ]
  }
}

The above is all the contents of ElasticSearch query. It is rich and detailed, comparable to the operation manual. It is strongly recommended to collect it!

Added by drizzle on Thu, 04 Nov 2021 16:07:19 +0200