ElasticSearch multiple query operations
- preface
- 1 entry query
- 1.1 equivalent query - term
- 1.2 multi value query terms
- 1.3 range query
- 1.4 prefix query - prefix
- 1.5 wildcard query - wildcard
- 2 compound query
- 2.1 Boolean query
- 2.2 Filter query
- 3 aggregate query
- 3.1 maximum value, average value and summation
- 3.2 de duplication query
- 3.3 group aggregation
- 3.3.1 single condition grouping
- 3.3.2 multi condition grouping
- 3.4 filter polymerization
preface
- ElasticSearch Article 1: ElasticSearch Foundation: start with inverted index and quickly recognize ES
The theme of this blog post is ES query, so I sorted out as many ES query scenarios as possible to form the following figure:
data:image/s3,"s3://crabby-images/16535/165359e2f96d97d0b5b8b96f0e967049843dddb2" alt=""
This article is based on elastic search version 7.13.2. es has been greatly updated since 7.0. After 7.3, the TransportClient client has been deprecated and replaced by the Java High Level REST Client.
Sample data for testing
First, some test data in Mysql:
id | name | age | sex | address | sect | skill | power | create_time | modify_time |
---|---|---|---|---|---|---|---|---|---|
1 | zhang wuji | 18 | male | Bright Summit | Ming religion | nine men's power | 99 | 2021-05-14 16:50:33 | 2021-06-29 16:48:56 |
2 | Zhou Zhiruo | 17 | female | Mount Emei | Emei sect | The nine Yin manual | 88 | 2021-05-14 11:37:07 | 2021-06-29 16:56:40 |
3 | Zhao Min | 14 | female | mostly | imperial court | nothing | 40 | 2021-05-14 11:37:07 | 2021-06-29 15:22:24 |
A row of data in Mysql exists as a document in ES:
{ "_index" : "person", "_type" : "_doc", "_id" : "4", "_score" : 1.0, "_source" : { "address" : "Mount Emei", "modifyTime" : "2021-06-29 19:46:25", "createTime" : "2021-05-14 11:37:07", "sect" : "Emei sect", "sex" : "male", "skill" : "Eighteen dragon subduing palms", "name" : "Song Qingshu", "id" : 4, "power" : 50, "age" : 21 } }
After briefly combing the relevant systems of ES Java API, those interested can study the source code themselves.
data:image/s3,"s3://crabby-images/848b3/848b314d230e670efb13f8f452b11afb6d0bbb4b" alt=""
Next, we use more than a dozen examples to quickly start the query operation of ES. Each example will provide SQL statements, ES statements and Java code.
1 entry query
The so-called term query, that is, ES will not segment the query conditions. It will be queried only when the term matches the query string exactly.
1.1 equivalent query - term
Equivalent query is to filter out all records with a field equal to a specific value.
SQL:
select * from person where name = 'zhang wuji';
The ES query statement is very different (note that the query field is provided with keyword):
GET /person/_search { "query": { "term": { "name.keyword": { "value": "zhang wuji", "boost": 1.0 } } } }
After ElasticSearch 5.0, the string type has been significantly changed. The string type is removed, and the string field is divided into two new data types: text for full-text search and keyword for keyword search.
Query results:
{ "took" : 0, "timed_out" : false, "_shards" : { // Slice information "total" : 1, // Total score pieces "successful" : 1, // Number of fragments successfully queried "skipped" : 0, // Number of slices skipped query "failed" : 0 // Number of slices failed to query }, "hits" : { // Hit result "total" : { "value" : 1, // quantity "relation" : "eq" // Relationship: equal to }, "max_score" : 2.8526313, // Highest score "hits" : [ { "_index" : "person", // Indexes "_type" : "_doc", // type "_id" : "1", "_score" : 2.8526313, "_source" : { "address" : "Bright Summit", "modifyTime" : "2021-06-29 16:48:56", "createTime" : "2021-05-14 16:50:33", "sect" : "Ming religion", "sex" : "male", "skill" : "nine men's power", "name" : "zhang wuji", "id" : 1, "power" : 99, "age" : 18 } } ] } }
How to construct ES requests in Java: (only the build statements of SearchSourceBuilder are retained in subsequent examples)
/** * term Precise query * * @throws IOException */ @Autowired private RestHighLevelClient client; @Test public void queryTerm() throws IOException { // Create query request based on index SearchRequest searchRequest = new SearchRequest("person"); SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.termQuery("name.keyword", "zhang wuji")); System.out.println("searchSourceBuilder=====================" + searchSourceBuilder); searchRequest.source(searchSourceBuilder); SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT); System.out.println(JSONObject.toJSON(response)); }
If you carefully observe the query results, you will find that the ES query results will contain_ Score, es will score according to the matching degree of the results. Scoring will consume performance. If you confirm that your query does not need scoring, set the query statement to turn off scoring:
GET /person/_search { "query": { "constant_score": { "filter": { "term": { "sect.keyword": { "value": "zhang wuji", "boost": 1.0 } } }, "boost": 1.0 } } }
Java build query statement:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // The query criteria constructed in this way will not be score calculated, so as to improve the query efficiency searchSourceBuilder.query(QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("sect.keyword", "Ming religion")));
1.2 multi value query terms
Multi condition query is similar to IN query IN Mysql, for example:
select * from persons where sect in('Ming religion','Wudang sect');
ES query statement:
GET /person/_search { "query": { "terms": { "sect.keyword": [ "Ming religion", "Wudang sect" ], "boost": 1.0 } } }
Java implementation:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.termsQuery("sect.keyword", Arrays.asList("Ming religion", "Wudang sect"))); }
1.3 range query
Range query is to query the records of a field in a specific interval.
SQL:
select * from pesons where age between 18 and 22;
ES query statement:
GET /person/_search { "query": { "range": { "age": { "from": 10, "to": 20, "include_lower": true, "include_upper": true, "boost": 1.0 } } } }
Java build query criteria:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.rangeQuery("age").gte(10).lte(30)); }
1.4 prefix query - prefix
Prefix queries are similar to fuzzy queries in SQL.
SQL:
select * from persons where sect like 'Wudang%';
ES query statement:
{ "query": { "prefix": { "sect.keyword": { "value": "Wudang", "boost": 1.0 } } } }
Java build query criteria:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.prefixQuery("sect.keyword","Wudang"));
1.5 wildcard query - wildcard
Wildcard query, similar to prefix query, belongs to the category of fuzzy query, but wildcard is obviously more powerful.
SQL:
select * from persons where name like 'Zhang%Avoid';
ES query statement:
{ "query": { "wildcard": { "sect.keyword": { "wildcard": "Zhang*Avoid", "boost": 1.0 } } } }
Java build query criteria:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.wildcardQuery("sect.keyword","Zhang*Avoid"));
2 compound query
The previous examples are all single condition queries. In practical applications, we are likely to filter multiple values or fields. Let's look at a simple example:
select * from persons where sex = 'female' and sect = 'Ming religion';
Such a multi condition equivalent query needs to be borrowed from the combination filter, and its query statement is:
{ "query": { "bool": { "must": [ { "term": { "sex": { "value": "female", "boost": 1.0 } } }, { "term": { "sect.keywords": { "value": "Ming religion", "boost": 1.0 } } } ], "adjust_pure_negative": true, "boost": 1.0 } } }
Java construct query statement:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.boolQuery() .must(QueryBuilders.termQuery("sex", "female")) .must(QueryBuilders.termQuery("sect.keyword", "Ming religion")) );
2.1 Boolean query
Boolean filter is a kind of compound filter. It can accept multiple other filters as parameters and combine these filters into various Boolean (logical) combinations.
data:image/s3,"s3://crabby-images/45371/4537136b1e3a2af57f99ddbf1324e7bf85fedde0" alt=""
There can be 4 seed conditions under bool filter, and any one or more of them can be selected. Filter is quite special. I won't talk about it here first.
{ "bool" : { "must" : [], "should" : [], "must_not" : [], } }
- Must: all statements must match, equivalent to '='.
- must_not: all statements cannot match, and '! =' Or not in equivalent.
- should: there are at least n statements to match, and N is controlled by parameters.
Accuracy control:
All must statements must match_ Not statements must not match, but how many should statements should match? By default, no should statement must match, with one exception: when there is no must statement, at least one should statement must match.
We can use minimum_ should_ The match parameter controls the number of should statements to be matched. It can be either an absolute number or a percentage:
GET /person/_search { "query": { "bool": { "must": [ { "term": { "sex": { "value": "female", "boost": 1.0 } } } ], "should": [ { "term": { "address.keyword": { "value": "Mount Emei", "boost": 1.0 } } }, { "term": { "sect.keyword": { "value": "Ming religion", "boost": 1.0 } } } ], "adjust_pure_negative": true, "minimum_should_match": "1", "boost": 1.0 } } }
Java build query statement:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.boolQuery() .must(QueryBuilders.termQuery("sex", "female")) .should(QueryBuilders.termQuery("address.word", "Mount Emei")) .should(QueryBuilders.termQuery("sect.keyword", "Ming religion")) .minimumShouldMatch(1) );
Finally, let's take a more complex example and combine the clauses of bool:
select * from persons where sex = 'female' and age between 30 and 40 and sect != 'Ming religion' and (address = 'Mount Emei' OR skill = 'hidden weapon')
Elasticsearch is used to represent the above SQL example:
GET /person/_search { "query": { "bool": { "must": [ { "term": { "sex": { "value": "female", "boost": 1.0 } } }, { "range": { "age": { "from": 30, "to": 40, "include_lower": true, "include_upper": true, "boost": 1.0 } } } ], "must_not": [ { "term": { "sect.keyword": { "value": "Ming religion", "boost": 1.0 } } } ], "should": [ { "term": { "address.keyword": { "value": "Mount Emei", "boost": 1.0 } } }, { "term": { "skill.keyword": { "value": "hidden weapon", "boost": 1.0 } } } ], "adjust_pure_negative": true, "minimum_should_match": "1", "boost": 1.0 } } }
Build this query condition in Java:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery() .must(QueryBuilders.termQuery("sex", "female")) .must(QueryBuilders.rangeQuery("age").gte(30).lte(40)) .mustNot(QueryBuilders.termQuery("sect.keyword", "Ming religion")) .should(QueryBuilders.termQuery("address.keyword", "Mount Emei")) .should(QueryBuilders.rangeQuery("power.keyword").gte(50).lte(80)) .minimumShouldMatch(1); // At least several conditions must be met to set should // Build BoolQueryBuilder into SearchSourceBuilder searchSourceBuilder.query(boolQueryBuilder);
2.2 Filter query
The difference between query and filter: when querying, query will first compare query conditions, then calculate scores, and finally return document results; The filter first judges whether the query conditions are met. If they are not met, the query results will be cached (the document does not meet the results). If they are met, the results will be directly cached. The filter will not score the results, which can improve the query efficiency.
filter can be used in various ways. Here are a few examples to demonstrate it.
Mode 1: used alone:
{ "query": { "bool": { "filter": [ { "term": { "sex": { "value": "male", "boost": 1.0 } } } ], "adjust_pure_negative": true, "boost": 1.0 } } }
When used alone, the filter is basically the same as must, except that the filter does not calculate the score and is more efficient.
Java build query statement:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.boolQuery() .filter(QueryBuilders.termQuery("sex", "male")) );
Mode 2, and must, must_not sibling, equivalent to subquery:
select * from (select * from persons where sect = 'Ming religion')) a where sex = 'female';
ES query statement:
{ "query": { "bool": { "must": [ { "term": { "sect.keyword": { "value": "Ming religion", "boost": 1.0 } } } ], "filter": [ { "term": { "sex": { "value": "female", "boost": 1.0 } } } ], "adjust_pure_negative": true, "boost": 1.0 } } }
Java:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.boolQuery() .must(QueryBuilders.termQuery("sect.keyword", "Ming religion")) .filter(QueryBuilders.termQuery("sex", "female")) );
Method 3: set must, must_not is placed under filter, which is the most commonly used method:
{ "query": { "bool": { "filter": [ { "bool": { "must": [ { "term": { "sect.keyword": { "value": "Ming religion", "boost": 1.0 } } }, { "range": { "age": { "from": 20, "to": 35, "include_lower": true, "include_upper": true, "boost": 1.0 } } } ], "must_not": [ { "term": { "sex.keyword": { "value": "female", "boost": 1.0 } } } ], "adjust_pure_negative": true, "boost": 1.0 } } ], "adjust_pure_negative": true, "boost": 1.0 } } }
Java:
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building query statements searchSourceBuilder.query(QueryBuilders.boolQuery() .filter(QueryBuilders.boolQuery() .must(QueryBuilders.termQuery("sect.keyword", "Ming religion")) .must(QueryBuilders.rangeQuery("age").gte(20).lte(35)) .mustNot(QueryBuilders.termQuery("sex.keyword", "female"))) );
3 aggregate query
Next, we will use some cases to demonstrate ES aggregation query.
3.1 maximum value, average value and summation
Case: query the maximum age, minimum age and average age.
SQL:
select max(age) from persons;
ES:
GET /person/_search { "aggregations": { "max_age": { "max": { "field": "age" } } } }
Java:
@Autowired private RestHighLevelClient client; @Test public void maxQueryTest() throws IOException { // Aggregate query criteria AggregationBuilder aggBuilder = AggregationBuilders.max("max_age").field("age"); SearchRequest searchRequest = new SearchRequest("person"); SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Building aggregate query criteria into SearchSourceBuilder searchSourceBuilder.aggregation(aggBuilder); System.out.println("searchSourceBuilder----->" + searchSourceBuilder); searchRequest.source(searchSourceBuilder); // Execute the query to get the SearchResponse SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT); System.out.println(JSONObject.toJSON(response)); }
Using aggregate query, only 10 document data will be returned in the result by default (of course, we are concerned about the aggregate result, not the document). How many pieces of data are returned can be controlled independently:
GET /person/_search { "size": 20, "aggregations": { "max_age": { "max": { "field": "age" } } } }
In Java, you only need to add the following statement:
searchSourceBuilder.size(20);
Similar to max, other statistical queries are simple:
AggregationBuilder minBuilder = AggregationBuilders.min("min_age").field("age"); AggregationBuilder avgBuilder = AggregationBuilders.avg("min_age").field("age"); AggregationBuilder sumBuilder = AggregationBuilders.sum("min_age").field("age"); AggregationBuilder countBuilder = AggregationBuilders.count("min_age").field("age");
3.2 de duplication query
Case: query how many sects there are.
SQL:
select count(distinct sect) from persons;
ES:
{ "aggregations": { "sect_count": { "cardinality": { "field": "sect.keyword" } } } }
Java:
@Test public void cardinalityQueryTest() throws IOException { // Create a request for an index SearchRequest searchRequest = new SearchRequest("person"); // query criteria SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Aggregate query AggregationBuilder aggBuilder = AggregationBuilders.cardinality("sect_count").field("sect.keyword"); searchSourceBuilder.size(0); // Building aggregate queries into query criteria searchSourceBuilder.aggregation(aggBuilder); System.out.println("searchSourceBuilder----->" + searchSourceBuilder); searchRequest.source(searchSourceBuilder); // Execute the query to get the results SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT); System.out.println(JSONObject.toJSON(response)); }
3.3 group aggregation
3.3.1 single condition grouping
Case: query the number of people in each sect
SQL:
select sect,count(id) from mytest.persons group by sect;
ES:
{ "size": 0, "aggregations": { "sect_count": { "terms": { "field": "sect.keyword", "size": 10, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_key": "asc" } ] } } } }
Java:
SearchRequest searchRequest = new SearchRequest("person"); SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); searchSourceBuilder.size(0); // Group by sect AggregationBuilder aggBuilder = AggregationBuilders.terms("sect_count").field("sect.keyword"); searchSourceBuilder.aggregation(aggBuilder);
3.3.2 multi condition grouping
Case: how many men and women are there in each sect
SQL:
select sect,sex,count(id) from mytest.persons group by sect,sex;
ES:
{ "aggregations": { "sect_count": { "terms": { "field": "sect.keyword", "size": 10 }, "aggregations": { "sex_count": { "terms": { "field": "sex.keyword", "size": 10 } } } } } }
3.4 filter polymerization
query is omitted from all the previous aggregation example requests, and the whole request is just an aggregation. This means that we aggregate all data, but in real applications, we often aggregate data in a specific range, such as the following example.
Case: query the maximum age in Mingjiao. This involves using aggregation with conditional queries.
SQL:
select max(age) from mytest.persons where sect = 'Ming religion';
ES:
GET /person/_search { "query": { "term": { "sect.keyword": { "value": "Ming religion", "boost": 1.0 } } }, "aggregations": { "max_age": { "max": { "field": "age" } } } }
Java:
SearchRequest searchRequest = new SearchRequest("person"); SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // Aggregate query criteria AggregationBuilder maxBuilder = AggregationBuilders.max("max_age").field("age"); // Equivalent query searchSourceBuilder.query(QueryBuilders.termQuery("sect.keyword", "Ming religion")); searchSourceBuilder.aggregation(maxBuilder);
There are also some more complex query examples.
Case: query the number of people above 0-20, 21-40, 41-60 and 61.
SQL:
select sum(case when age<=20 then 1 else 0 end) ageGroup1, sum(case when age >20 and age <=40 then 1 else 0 end) ageGroup2, sum(case when age >40 and age <=60 then 1 else 0 end) ageGroup3, sum(case when age >60 and age <=200 then 1 else 0 end) ageGroup4 from mytest.persons;
ES:
{ "size": 0, "aggregations": { "age_avg": { "range": { "field": "age", "ranges": [ { "from": 0.0, "to": 20.0 }, { "from": 21.0, "to": 40.0 }, { "from": 41.0, "to": 60.0 }, { "from": 61.0, "to": 200.0 } ], "keyed": false } } } }
Java:
Query results:
"aggregations" : { "age_avg" : { "buckets" : [ { "key" : "0.0-20.0", "from" : 0.0, "to" : 20.0, "doc_count" : 3 }, { "key" : "21.0-40.0", "from" : 21.0, "to" : 40.0, "doc_count" : 13 }, { "key" : "41.0-60.0", "from" : 41.0, "to" : 60.0, "doc_count" : 4 }, { "key" : "61.0-200.0", "from" : 61.0, "to" : 200.0, "doc_count" : 1 } ] } }
The above is all the contents of ElasticSearch query. It is rich and detailed, comparable to the operation manual. It is strongly recommended to collect it!