Request Body Search: the Query Body in the Elasticsearch Search API

This article is fairly long and may require some patience. It introduces in detail Elasticsearch's three paging methods, sorting, from/size, source filtering, doc value fields, post filter, highlighting, rescoring, search type, scroll, preference, explain, version, index boost, min_score, named queries, inner hits, field collapsing, and search after.

You can pick the topics you are interested in by keyword. Java usage examples are given for most of the above.

This section describes in detail the query body of the Elasticsearch Search API and how to build customized query conditions.

1. Query

Query conditions in the body of a search request are defined using the Elasticsearch Query DSL. The query body is defined with the query element.

GET /_search
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

2. From / Size

Elasticsearch's basic paging syntax. The result set is paged with the from and size parameters: from sets the offset of the first result to return, and size the number of results. Since Elasticsearch is distributed by nature, data is partitioned horizontally across a configured number of primary shards, so a query request usually has to aggregate data from multiple back-end nodes (shards). This approach therefore runs into the classic distributed-database problem of deep paging. Elasticsearch offers another way of paging, the scroll API, which is analyzed in detail later. Note: from + size cannot exceed the value of the index.max_result_window setting, which defaults to 10,000.
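A minimal Java sketch of the same paging, assuming the SearchSourceBuilder/QueryBuilders imports used by the demos later in this article:

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery())
        .from(20)   // offset of the first result to return
        .size(10);  // page size; from + size must stay within index.max_result_window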

3. Sort

Like a traditional relational database, Elasticsearch supports sorting by one or more fields, in ascending (asc) or descending (desc) order. In addition, Elasticsearch can sort by _score (relevance score), which is the default sort. If sorting is used, the sort values of each document (for field sorts) are returned as part of the response.

3.1 Sorting Order

Elasticsearch provides two sort orders: SortOrder.ASC (asc), ascending, and SortOrder.DESC (desc), descending. If the sort field is _score, the default order is descending.

If the sort field is a regular field, the default order is ascending (asc).

3.2 Sort Mode (mode)

Elasticsearch supports sorting on array or multi-valued fields. The mode option controls which of the array values is selected to sort the document it belongs to. The mode option can take the following values:

  • min: sort using the smallest value in the array.
  • max: sort using the largest value in the array.
  • sum: sort using the sum of the values in the array.
  • avg: sort using the average of the values in the array.
  • median: sort using the median of the values in the array.

An example:
PUT /my_index/_doc/1?refresh
{
   "product": "chocolate",
   "price": [20, 4]
}
POST /_search
{
   "query" : {
      "term" : { "product" : "chocolate" }
   },
   "sort" : [
      {"price" : {"order" : "asc", "mode" : "avg"}}   // @1
   ]
} 

If an array value participates in sorting, a single sort value is usually computed from the array elements, such as the average, maximum, minimum, or sum. Elasticsearch specifies this with the sort mode option.
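In the Java high-level client, the sort mode above can be expressed with FieldSortBuilder and SortMode; a minimal sketch of the chocolate example:

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("product", "chocolate"))
        .sort(new FieldSortBuilder("price")
                .order(SortOrder.ASC)
                .sortMode(SortMode.AVG)); // sort by the average of the array values, as at @1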

3.3 Nested Field Sorting

Elasticsearch also supports sorting by fields inside one or more nested objects. Nested sorting supports the following options (parameters):

  • path
    Defines the nested object on which to sort. The sort field must be a direct (non-nested) field of that nested object, and it must exist.
  • filter
    Defines the filter context applied to the nested documents during sorting.
  • max_children
    The maximum number of child documents per root document to consider when sorting; unlimited by default.
  • nested
    The sort body supports further nesting.
"sort" : [
  {
    "parent.child.age" : {      // @1
        "mode" :  "min",
         "order" : "asc",
         "nested": {                // @2
            "path": "parent",
            "filter": {
                "range": {"parent.age": {"gte": 21}}
            },
            "nested": {                            // @3
                "path": "parent.child",
                "filter": {
                    "match": {"parent.child.name": "matt"}
                }
            }
         }
    }
  }
]

Code @1: the sort field name; dotted (cascading) field names are supported.
Code @2: nested sorting is defined by the nested attribute, where path specifies the current nested object and filter defines its filter context; a nested attribute can be nested again inside it, as at @3.
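The Java client models this with NestedSortBuilder (available since 6.1); a minimal sketch of the query above:

sourceBuilder.sort(new FieldSortBuilder("parent.child.age")           // @1
        .sortMode(SortMode.MIN)
        .order(SortOrder.ASC)
        .setNestedSort(new NestedSortBuilder("parent")                // @2
                .setFilter(QueryBuilders.rangeQuery("parent.age").gte(21))
                .setNestedSort(new NestedSortBuilder("parent.child")  // @3
                        .setFilter(QueryBuilders.matchQuery("parent.child.name", "matt")))));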

3.4 Missing Values

Because Elasticsearch indexes are schema-flexible, the fields under a type can grow dynamically as documents are indexed. If some documents do not contain the sort field, how is the order of those documents determined? Elasticsearch determines it with the missing attribute, whose optional values are:

  • _last
    The default; documents missing the field are sorted last.
  • _first
    Documents missing the field are sorted first.

3.5 Ignoring Unmapped Fields

By default, an exception is thrown if the sort field is unmapped. The exception can be avoided with unmapped_type, which specifies a type: it tells Elasticsearch that if no mapping is found for the field name, the field should be treated as the type given by unmapped_type, as if no document stored a value for the field.
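Both of these options (3.4 and 3.5) are plain setters on FieldSortBuilder; a minimal sketch, with an illustrative field name:

sourceBuilder.sort(new FieldSortBuilder("price")
        .order(SortOrder.ASC)
        .missing("_first")        // documents without the field sort first
        .unmappedType("long"));   // treat the field as long in indexes where it is unmapped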

3.6 Geo Sorting

Sorting by geo types, which will be explained in a later article on the geo type.

4. Field Filtering (_source and stored_fields)

By default, everything in the _source field is returned for each hit. The field filtering mechanism lets users return only the parts of _source they need. The filtering mechanism has already been described in detail in "Elasticsearch Document Get API Details, Principles and Examples" and is not repeated here.

5. Doc Value Fields

Usage is as follows:

GET /_search
{
    "query" : {
        "match_all": {}
    },
    "docvalue_fields" : [
        {
            "field": "my_date_field",   
            "format": "epoch_millis" 

        }
    ]
}

By specifying the fields and formats to convert via docvalue_fields, doc values can be returned even for fields mapped with store = false. Field names support wildcards, such as "field": "myfield*". Fields specified in docvalue_fields do not change the value of the _source field; their values are returned additionally under the fields key.

The Java example code snippet is as follows (the complete demo is given at the end of this article):

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("user", "dingw"))
        .sort(new FieldSortBuilder("post_date").order(SortOrder.DESC))
        .docValueField("post_date", "epoch_millis"); // returned additionally under "fields"

The results are as follows:

{
    "took":88,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":2,
        "max_score":null,
        "hits":[
            {
                "_index":"twitter",
                "_type":"_doc",
                "_id":"11",
                "_score":null,
                "_source":{
                    "post_date":"2009-11-19T14:12:12",
                    "message":"test bulk update",
                    "user":"dingw"
                },
                "fields":{
                    "post_date":[
                        "1258639932000"
                    ]
                },
                "sort":[
                    1258639932000
                ]
            },
            {
                "_index":"twitter",
                "_type":"_doc",
                "_id":"12",
                "_score":null,
                "_source":{
                    "post_date":"2009-11-18T14:12:12",
                    "message":"test bulk",
                    "user":"dingw"
                },
                "fields":{
                    "post_date":[
                        "1258553532000"
                    ]
                },
                "sort":[
                    1258553532000
                ]
            }
        ]
    }
}

6. Post Filter

A post filter filters the matched documents again after the query conditions have been executed.

GET /shirts/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "gucci" }      // @1
      }
    }
  },
  "post_filter": {     // @2
    "term": { "color": "red" }
  }
}

First the index is searched with condition @1 to obtain the matching documents; the results are then filtered again with the @2 post_filter condition.
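In the Java client this corresponds to SearchSourceBuilder.postFilter; a minimal sketch of the request above:

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.boolQuery()
                .filter(QueryBuilders.termQuery("brand", "gucci")))   // @1
        .postFilter(QueryBuilders.termQuery("color", "red"));        // @2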

7. Highlighting

7.1 Highlighters Supported by Elasticsearch

Highlighting marks up the query keywords in the query results, indicating which parts of each result matched the query, usually by rendering them in a different color.

Note: highlighters do not reflect the boolean logic of a query when extracting the terms to highlight. Therefore, for some complex boolean queries (such as nested boolean queries, or queries using minimum_should_match), highlighting may contain small errors.

Highlighting requires the actual content of the field. If the field is not stored (the mapping does not set store to true), the field content is extracted from _source.
Elasticsearch supports three highlighters, selected by specifying type for each field.

  • unified highlighter
    Uses the Lucene unified highlighter. It breaks the text into sentences and scores individual sentences with the BM25 algorithm as if they were documents in the corpus. It supports exact phrase and multi-term highlighting (fuzzy, prefix, regular expression). This is Elasticsearch's default highlighter.
  • plain highlighter
    Uses the standard Lucene highlighter. The plain highlighter is best suited to highlighting matches on a single field. To reflect the query logic accurately, it creates a small in-memory index and re-runs the original query through Lucene's query execution plan to obtain low-level match information for the current document. If multiple fields need highlighting, the unified highlighter or fields with term_vector are recommended.

The plain highlighter analyzes text in real time. That is, when a query hits the target docids, the engine loads the data of the highlighted field into memory and runs the field's analyzer over it; after analysis, a similarity algorithm computes the top-n highest-scoring fragments and returns them. Suppose users search large documents that must also be highlighted, displaying 40 results per page (on the order of 20 KB of text per result): even if similarity calculation and ranking are not time-consuming, highlighting can drag the whole query out to nearly two seconds.

The plain highlighter is therefore a real-time analysis highlighter. This real-time mechanism lets Elasticsearch use less I/O and less storage (an index without term vectors can save about half the storage space compared with the fvh mode). It spends CPU to relieve I/O pressure, and for shorter highlighted fields (such as an article title) it is faster; because it issues fewer I/O accesses and creates less I/O pressure, it helps improve system throughput.

Reference material: https://blog.csdn.net/kjsoftware/article/details/76293204

  • fast vector highlighter
    Uses the Lucene fast vector highlighter, based on term vectors; the field mapping must enable term_vector = with_positions_offsets.

To address highlighting performance on large text fields, the Lucene highlighting module provides a term-vector-based highlighter, the fast vector highlighter (also known as fvh). The fvh computes highlighted fragments directly from the term vectors saved at index time. Compared with the plain highlighter there is less real-time analysis; instead, the tokenization results are read from disk into memory for the computation. The precondition for using fvh is therefore storing term vectors that include token position and offset information.

Note: the fvh highlighter does not support span queries. If you need span query support, try another highlighter, such as the unified highlighter.

The fvh highlighting logic is as follows:
1. Parse the highlight query and extract the terms to highlight from the expression.
2. Read the set of term vectors for the document field from disk.
3. Traverse the term vector set and pick out the vectors that match the expression's terms.
4. Read the term frequency information for the extracted target terms, and obtain each position and offset from it.
5. Use a similarity algorithm to obtain the top-n highest-scoring highlight fragments.
6. Read the field content (multi-valued fields are joined with spaces) and cut out the highlighted fragments directly according to the extracted term vectors.
Reference material: https://blog.csdn.net/kjsoftware/article/details/76293204

7.2 Offsets Strategy

One of the keys to highlighting is locating the terms to highlight (their positions and offsets).

Elasticsearch provides three strategies for obtaining this offset information:

  • The postings list
    If index_options is set to offsets, the unified highlighter uses this information to highlight documents without re-analyzing the text. It re-runs the original query directly against the index and extracts the matching offsets from the index. This matters for large fields because it avoids re-analyzing the text to be highlighted, and it needs less disk space than the term_vector approach.
  • Term vectors
    If term_vector is set to with_positions_offsets in the field mapping, the unified highlighter automatically uses the term vectors to highlight the field. This is particularly suitable for large fields (> 1 MB) and for highlighting multi-term queries (such as prefix or wildcard), because it can access the term dictionary of each document. The fast vector highlighter only works if the field mapping sets term_vector to with_positions_offsets.
  • Plain highlighting
    Used by the unified highlighter when no other option is available. It creates a small in-memory index and re-runs the original query through Lucene's query execution plan to access the low-level match information of the current document. This is repeated for every field and every document that needs highlighting. The plain highlighter always works in this mode.

Note: plain highlighting may require a lot of time and memory for large texts. To protect against this, the next version of Elasticsearch will limit the number of analyzed characters to one million. The 6.x versions default to unlimited, but a limit can be set for specific indexes with the index setting index.highlight.max_analyzed_offset.

7.3 Highlight Configuration Items

Highlight settings made globally can be overridden at the field level.

  • boundary_chars
    Sets the set of boundary characters; defaults to .,!? \t\n
  • boundary_max_scan
    How far to scan for boundary characters. Defaults to 20.
  • boundary_scanner
    Specifies how highlighted fragments are broken up; optional values are chars, sentence, word:
  • chars
    Use the characters configured in boundary_chars as highlight boundaries. The scan distance for boundary characters is controlled by boundary_max_scan. This scanner is only applicable to the fast vector highlighter.
  • sentence
    Break highlighted fragments at the next sentence boundary, as determined by Java's BreakIterator. The locale can be specified with boundary_scanner_locale. This is the default behavior of the unified highlighter.
  • word
    Break highlighted fragments at the next word boundary, as determined by Java's BreakIterator.
  • boundary_scanner_locale
    Locale setting. The parameter takes the form of a language tag, e.g. "en-US", "fr", "ja-JP". More information can be found in the Locale Language Tag documentation. The default value is Locale.ROOT.
  • encoder
    Indicates whether the snippet should be HTML-encoded: default (no encoding) or html (HTML-escape the snippet text, then insert the highlight tags).
  • fields
    Specifies the fields to highlight; wildcards are supported. For example, you can specify comment_* to get highlights for all text and keyword fields starting with comment_.

Note: When you use wildcards, only text and keyword type fields are matched.

  • force_source
    Whether to force highlighting from _source; defaults to false. In fact, highlighting defaults to the _source content even if the field is stored separately.
  • fragmenter
    Specifies how text is split into highlight fragments; optional values are simple and span. Only applies to the plain highlighter. The default is span:
  • simple
    Splits text into fragments of the same size.
  • span
    Splits text into fragments of the same size, but tries to avoid splitting text between highlighted terms. This is useful for phrase queries.
  • fragment_offset
    Controls the margin from which highlighting starts; only applies to the fast vector highlighter.
  • fragment_size
    The size of highlighted fragments; defaults to 100.
  • highlight_query
    A highlight query other than the search query. This is especially useful with rescore queries, because highlighting does not take them into account by default. In general, the search query should be included in highlight_query.
  • matched_fields
    Combines matches on multiple fields to highlight a single field. This is most intuitive for multi-fields that analyze the same string in different ways. All matched_fields must set term_vector to with_positions_offsets, but only the field the matches are combined into is loaded, so it is recommended to set store to true for that field. Only applies to the fast vector highlighter.
  • no_match_size
    The amount of text to return from the beginning of the field if there is no matching fragment to highlight. Defaults to 0 (nothing returned).
  • number_of_fragments
    The maximum number of highlighted fragments to return. If set to 0, no fragments are returned. Defaults to 5.
  • order
    Defaults to none, returning highlights in the order of the fields; can be set to score (sorted by relevance).
  • phrase_limit
    Controls how many matching phrases in a document are considered, preventing the fast vector highlighter from analyzing too many phrases and consuming too much memory. When matched_fields is used, phrase_limit phrases per matched field are considered. Raising the limit increases query time and memory consumption. Only supported by the fast vector highlighter. Defaults to 256.
  • pre_tags
    The HTML tag(s) inserted before highlighted text, used together with post_tags; defaults to <em>.
  • post_tags
    The HTML tag(s) inserted after highlighted text, used together with pre_tags; defaults to </em>.
  • require_field_match
    By default, only fields containing a query match are highlighted. Set require_field_match to false to highlight all fields. Defaults to true.
  • tags_schema
    Defines a highlighting style; e.g. tags_schema=styled uses the built-in set of pre/post tags.
  • type
    Specifies the highlighter to use; optional values: unified, plain, fvh. Defaults to unified.
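Most of these options map one-to-one onto HighlightBuilder in the Java client; a minimal sketch showing a few of them (the field name context matches the demo below):

HighlightBuilder highlightBuilder = new HighlightBuilder()
        .preTags("<em>")              // pre_tags
        .postTags("</em>")            // post_tags
        .requireFieldMatch(false)     // require_field_match
        .field(new HighlightBuilder.Field("context")
                .highlighterType("unified")   // type
                .fragmentSize(100)            // fragment_size
                .numOfFragments(5));          // number_of_fragments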

7.4 Highlight demo

public static void testSearch_highlighting() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("map_highlighting_01");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
            //        QueryBuilders.matchAllQuery()
                    QueryBuilders.termQuery("context", "ID")
                    );
            
            HighlightBuilder highlightBuilder = new HighlightBuilder();
            highlightBuilder.field("context");
            
            sourceBuilder.highlighter(highlightBuilder);
            searchRequest.source(sourceBuilder);
            System.out.println(client.search(searchRequest, RequestOptions.DEFAULT));
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }

Its return value is as follows:

{
    "took":2,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":0.2876821,
        "hits":[
            {
                "_index":"map_highlighting_01",
                "_type":"_doc",
                "_id":"erYsbmcBeEynCj5VqVTI",
                "_score":0.2876821,
                "_source":{
                    "context":"Chengzhong Road can accept the processing of second generation identity cards from other places."
                },
                "highlight":{   // @1
                    "context":[
                        "Mid-west Road can accept the second generation from other places<em>ID</em>Processing."
                    ]
                }
             }
        ]
    }
}

A note on the highlight element in the response: each highlighted field returns a subset of the original content, at most number_of_fragments fragments (of fragment_size characters each) containing the matched keywords. When displaying the text on a page, the highlighted value should usually replace the original value to produce the highlight effect.

8. Rescoring

The rescoring mechanism: a query first finds documents using an efficient (cheap) algorithm, and a second query algorithm then rescores the top n documents returned. The second algorithm is usually less efficient but provides more accurate matching.

The total score is combined from the rescore query and the original query according to score_mode:

  • total
    Add the two scores. The default.
  • multiply
    Multiply the original score by the rescore query score. Useful for function query rescores.
  • avg
    Take the average of the two.
  • max
    Take the maximum of the two.
  • min
    Take the minimum of the two.
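In the Java client a rescore phase is attached with QueryRescorerBuilder; a minimal sketch (the queries and field name are illustrative):

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchQuery("message", "elasticsearch"))
        .addRescorer(new QueryRescorerBuilder(
                        QueryBuilders.matchPhraseQuery("message", "elasticsearch server"))
                .windowSize(50)                          // rescore the top 50 docs per shard
                .setScoreMode(QueryRescoreMode.Total));  // total: add the two scores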

9. Search Type

Query type, optional values: QUERY_THEN_FETCH, QUERY_AND_FETCH, DFS_QUERY_THEN_FETCH. Default value: query_then_fetch.

  • QUERY_THEN_FETCH: the request is first sent to the relevant shards (according to the routing algorithm); each shard returns only document IDs and the information needed for sorting. The coordinating node then merges and sorts the per-shard results, selects the requested number of documents (top n), and finally fetches the full document information from the shards by document ID.
  • QUERY_AND_FETCH: deprecated since 5.4.x. Data is requested directly from each shard node; each shard returns the number of documents requested by the client, and everything is aggregated and returned to the client. The amount of data returned is size * (number of shards hit by routing).
  • DFS_QUERY_THEN_FETCH: before the requests are sent to the nodes, a global term frequency / relevance pre-computation is performed; the rest of the process is the same as QUERY_THEN_FETCH. Document relevance is more accurate with this query type, but performance is worse than QUERY_THEN_FETCH.
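The search type is set on the SearchRequest itself; a minimal sketch:

SearchRequest searchRequest = new SearchRequest("twitter");
searchRequest.searchType(SearchType.DFS_QUERY_THEN_FETCH); // default is QUERY_THEN_FETCH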

10. Scroll

Scroll queries are Elasticsearch's second paging mechanism. While a search request returns a single "page" of results, the scroll API can retrieve a large number of results (even all of them) from a single search request, much like a cursor in a traditional database. The scroll API is not intended for real-time user requests, but for processing large amounts of data, such as re-indexing the contents of one index into a new index with a different configuration.

10.1 How to Use the Scroll API

The scroll API is used in three steps:

1. The first request specifies the scroll parameter, which sets how long the scroll (search context) is kept alive, similar to the lifetime of a database cursor.

POST /twitter/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

This request returns an important parameter: the scrollId.

2. The second step uses the scrollId to pull the next batch (the next page of data) from the Elasticsearch server.

POST  /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==" 
}

Repeating the second step in a loop, the data can be processed in batches.

3. The third step clears the scrollId, similar to closing a database cursor, releasing resources promptly.

DELETE /_search/scroll
{
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}

The following is a Java sample of the scroll API:

public static void testScroll() {
        RestHighLevelClient client = EsClient.getClient();
        String scrollId = null;
        try {
            System.out.println("step 1 start ");
            // step 1 start
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("map_highlighting_01");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
                    QueryBuilders.termQuery("context", "ID")
                    );
            searchRequest.source(sourceBuilder);
            searchRequest.scroll(TimeValue.timeValueMinutes(1));
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            scrollId = result.getScrollId();
            // step 1 end
            
            // step 2 start
            if(!StringUtils.isEmpty(scrollId)) {
                System.out.println("step 2 start ");
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                scrollRequest.scroll(TimeValue.timeValueMinutes(1));
                while (true) { // loop over the batches
                    SearchResponse scrollResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                    if (scrollResponse.getHits().getHits() == null ||
                            scrollResponse.getHits().getHits().length < 1) {
                        break;
                    }
                    scrollId = scrollResponse.getScrollId();
                    // work with documents
                    scrollRequest.scrollId(scrollId);
                }
                }
            // step 2 end    
            }
            System.out.println(result);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if(!StringUtils.isEmpty(scrollId)) {
                System.out.println("step 3 start ");
                // step 3 start
                ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
                clearScrollRequest.addScrollId(scrollId);
                try {
                    client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
                } catch (IOException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            // step 3 end
            }
        } 
        
    }

Note that the first query not only returns the scrollId but also the first batch of data.

10.2 Keeping the search context alive

The scroll parameter (passed on the search request and on each scroll request) tells Elasticsearch how long to keep the search context alive. Its value (e.g. 1m, see Time units) does not need to be long enough to process all the data -- only long enough to process the previous batch of results. Each scroll request (with a scroll parameter) sets a new expiration time. If a scroll request does not pass the scroll parameter, the search context is released as part of that request. Internally, scroll works like a snapshot: when the initial scroll request is received, a snapshot is taken of the results matched by the search context, and subsequent changes to documents are not reflected in the API's results.

10.3 Sliced Scroll

For scroll queries that return a large number of documents, the scroll can be split into multiple slices that can be consumed independently, specified with slice.

For example:

GET /twitter/_search?scroll=1m     // @1
{
    "slice": {                                      // @11
        "id": 0,                                    // @12
        "max": 2                                 // @13
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
GET /twitter/_search?scroll=1m        // @2
{
    "slice": {
        "id": 1,
        "max": 2
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

@1, @2: two parallel queries, each querying one slice.
@11: slice defines a sliced query.
@12: the id of this slice.
@13: the total number of slices.

This mechanism is very suitable for multithreaded data processing.

The slicing mechanism works as follows: the request is first forwarded to each shard node; on each node the matched documents are partitioned by hashcode(_uid) % number of slices, and the data is then returned to the coordinating node. That is, by default slicing is based on the _uid of the document. To make slicing more efficient, you can optimize it by specifying a custom slice field that satisfies the following:

  • The slice field is numeric.
  • doc_values is enabled on the field.
  • The field is indexed in every document.
  • The field value is assigned only at creation time and is never updated.
  • The field's cardinality should be high (comparable to database index selectivity), which ensures that each slice returns a comparable amount of data and that the data is evenly distributed.

Note that the number of slices per scroll is limited to 1024 by default; this limit can be changed with the index setting index.max_slices_per_scroll.

For example:

GET /twitter/_search?scroll=1m
{
    "slice": {
        "field": "date",
        "id": 0,
        "max": 10
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
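In the Java client a slice is attached with SliceBuilder; a minimal sketch of one of the two parallel slices above (each worker thread would use its own slice id):

SearchRequest searchRequest = new SearchRequest("twitter");
searchRequest.scroll(TimeValue.timeValueMinutes(1));
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchQuery("title", "elasticsearch"))
        .slice(new SliceBuilder(0, 2)); // id 0 of max 2; new SliceBuilder("date", 0, 10) for a custom field
searchRequest.source(sourceBuilder);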

11. Preference

Controls which shard copy (which replica within a replication group) a search prefers to execute on. By default, Elasticsearch selects among the available shard copies in an unspecified order; routing between replicas is described in more detail in the cluster chapter. This parameter lets you express a preference for which copy should execute the search.

Optional values of preference:

  • _primary
    Execute only on primary shards. Deprecated since 6.1.0 and will be removed in 7.x.
  • _primary_first
    Prefer execution on primary shards. Deprecated since 6.1.0 and will be removed in 7.x.
  • _replica
    Execute only on replica shards; if there are multiple replicas, the order is random. Deprecated since 6.1.0 and will be removed in 7.x.
  • _replica_first
    Prefer execution on replica shards; if there are multiple replicas, the order is random. Deprecated since 6.1.0 and will be removed in 7.x.
  • _only_local
    Execute only on shard copies allocated to the local node. The _only_local option guarantees that only local shard copies are used, which is sometimes useful for troubleshooting. All other options do not fully guarantee that any particular shard copy is used, and on a changing index this means that repeated searches executed against different shard copies in different refresh states may produce different results.
  • _local
    Prefer execution on shards of the local node.
  • _prefer_nodes:abc,xyz
    Prefer execution on shards of the nodes with the given IDs (here abc and xyz).
  • _shards:2,3
    Restrict the operation to the specified shards (here 2 and 3). This preference can be combined with other preferences, but it must appear first: _shards:2,3|_local.
  • _only_nodes:abc,xyz,...
    Restrict the operation to the nodes with the given IDs.
  • Custom (string) value
    A custom string, routed by hashcode(value) % the number of copies in the replication group. For example, web applications often use the sessionId as the preference value.
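In the Java client the preference is a simple setter on SearchRequest; a minimal sketch using a session id as a custom preference value:

SearchRequest searchRequest = new SearchRequest("twitter");
searchRequest.preference("session_abc123"); // a custom string; "_local", "_shards:2,3" etc. also work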

12. Explain

Explains how the score of each hit was calculated.

GET /_search
{
    "explain": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

13. Version

If set to true, the current version number of each hit document is returned.

GET /_search
{
    "version": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

14. Index Boost

When searching multiple indexes, a different boost level can be applied per index. This property is handy when hits from one index matter more than hits from another.

An example:

GET /_search
{
    "indices_boost" : [
        { "alias1" : 1.4 },
        { "index*" : 1.3 }
    ]
}

15. min_score

Specifies the minimum score of the returned documents; documents scoring below this value are not returned.

GET /_search
{
    "min_score": 0.5,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
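Sections 12 through 15 all correspond to simple setters on SearchSourceBuilder; a minimal combined sketch:

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("user", "kimchy"))
        .explain(true)               // 12: explain how scores are calculated
        .version(true)               // 13: return each hit's version number
        .indexBoost("alias1", 1.4f)  // 14: boost hits from this index
        .minScore(0.5f);             // 15: drop hits scoring below 0.5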

16. Named Queries

Each filter and query can accept a _name in its top-level definition. A matched_queries array is added to each matching document in the search response, recording the names of the queries the document matched. Note that naming queries and filters is only meaningful for bool queries.

The Java example is as follows:

public static void testNamesQuery() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("esdemo");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
                    QueryBuilders.boolQuery()
                        .should(QueryBuilders.termQuery("context", "fox").queryName("q1"))
                        .should(QueryBuilders.termQuery("context", "brown").queryName("q2"))
                    );
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }

The results are as follows:

{
    "took":4,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":2,
        "max_score":0.5753642,
        "hits":[
            {
                "_index":"esdemo",
                "_type":"matchquerydemo",
                "_id":"2",
                "_score":0.5753642,
                "_source":{
                    "context":"My quick brown as fox eats rabbits on a regular basis.",
                    "title":"Keeping pets healthy"
                },
                "matched_queries":[
                    "q1",
                    "q2"
                ]
            },
            {
                "_index":"esdemo",
                "_type":"matchquerydemo",
                "_id":"1",
                "_score":0.39556286,
                "_source":{
                    "context":"Brown rabbits are commonly seen brown.",
                    "title":"Quick brown rabbits"
                },
                "matched_queries":[
                    "q2"
                ]
            }
        ]
    }
}

As shown above, each matched document carries a matched_queries array indicating which query conditions the document matched.

17. Inner Hits

Defines how hits inside nested levels are returned. inner_hits supports the following options:

  • from: the paging offset within the inner hits.
  • size: the page size within the inner hits.
  • sort: the sort strategy within the inner hits.
  • name: the name defined for the inner hits section.

Examples of this feature will be covered in the next section.

18. Field Collapsing

Field collapsing allows search results to be collapsed by a field value. Collapsing keeps only the top-sorted document for each distinct collapse key. It is a bit like an aggregation group-by: the effect resembles grouping by a field, with the first level of the hit list holding the best hit for each value of that field. For example, the following query retrieves the best tweet for each user and sorts them by number of likes.

Let's start with an example to show the use of field collapsing.

1) First, query the tweets whose message contains elasticsearch:

GET /twitter/_search
{
    "query": {
        "match": {
            "message": "elasticsearch"
        }
    },
    "collapse" : {
        "field" : "user" 
    },
    "sort": ["likes"]
}

Returns the result:

{
    "took":8,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":5,
        "max_score":null,
        "hits":[
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OYnecmcB-IBeb8B-bF2X",
                "_score":null,
                "_source":{
                    "message":"to be a elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OonecmcB-IBeb8B-bF2q",
                "_score":null,
                "_source":{
                    "message":"to be elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OInecmcB-IBeb8B-bF2G",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is very high",
                    "user":"user1",
                    "likes":3
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"O4njcmcB-IBeb8B-Rl2H",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is high db",
                    "user":"user1",
                    "likes":1
                },
                "sort":[
                    1
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"N4necmcB-IBeb8B-bF0n",
                "_score":null,
                "_source":{
                    "message":"very likes elasticsearch",
                    "user":"user1",
                    "likes":1
                },
                "sort":[
                    1
                ]
            }
        ]
    }
}

First of all, the list above contains all matching tweets from all users. What if you want each user to show only one tweet, the highest ranked one, or only two tweets per user?
This is where field collapsing comes into play.
The Java demo is as follows:

public static void search_field_collapsing() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("mapping_field_collapsing_twitter");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
                    QueryBuilders.matchQuery("message","elasticsearch")
            );
            sourceBuilder.sort("likes", SortOrder.DESC);
            CollapseBuilder collapseBuilder = new CollapseBuilder("user");
            sourceBuilder.collapse(collapseBuilder);
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }

The results are as follows:

{
    "took":22,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":5,
        "max_score":null,
        "hits":[
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OYnecmcB-IBeb8B-bF2X",
                "_score":null,
                "_source":{
                    "message":"to be a elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user2"
                    ]
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OInecmcB-IBeb8B-bF2G",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is very high",
                    "user":"user1",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user1"
                    ]
                },
                "sort":[
                    3
                ]
            }
        ]
    }
}

The above example returns only the first document for each user. What if two documents per user are needed? This can be configured with inner_hits.

public static void search_field_collapsing() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("mapping_field_collapsing_twitter");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
                    QueryBuilders.matchQuery("message","elasticsearch")
            );
            sourceBuilder.sort("likes", SortOrder.DESC);
            CollapseBuilder collapseBuilder = new CollapseBuilder("user");
            
            InnerHitBuilder collapseHitBuilder = new InnerHitBuilder("collapse_inner_hit");
            collapseHitBuilder.setSize(2);
            collapseBuilder.setInnerHits(collapseHitBuilder);
            sourceBuilder.collapse(collapseBuilder);
            
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }

The results are as follows:

{
    "took":42,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":5,
        "max_score":null,
        "hits":[
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OYnecmcB-IBeb8B-bF2X",
                "_score":null,
                "_source":{
                    "message":"to be a elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user2"
                    ]
                },
                "sort":[
                    3
                ],
                "inner_hits":{
                    "collapse_inner_hit":{
                        "hits":{
                            "total":2,
                            "max_score":0.19363807,
                            "hits":[
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"OonecmcB-IBeb8B-bF2q",
                                    "_score":0.19363807,
                                    "_source":{
                                        "message":"to be elasticsearch",
                                        "user":"user2",
                                        "likes":3
                                    }
                                },
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"OYnecmcB-IBeb8B-bF2X",
                                    "_score":0.17225473,
                                    "_source":{
                                        "message":"to be a elasticsearch",
                                        "user":"user2",
                                        "likes":3
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OInecmcB-IBeb8B-bF2G",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is very high",
                    "user":"user1",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user1"
                    ]
                },
                "sort":[
                    3
                ],
                "inner_hits":{
                    "collapse_inner_hit":{
                        "hits":{
                            "total":3,
                            "max_score":0.2876821,
                            "hits":[
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"O4njcmcB-IBeb8B-Rl2H",
                                    "_score":0.2876821,
                                    "_source":{
                                        "message":"elasticsearch is high db",
                                        "user":"user1",
                                        "likes":1
                                    }
                                },
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"N4necmcB-IBeb8B-bF0n",
                                    "_score":0.2876821,
                                    "_source":{
                                        "message":"very likes elasticsearch",
                                        "user":"user1",
                                        "likes":1
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

Now the result has two levels: the first level is still the best message per user, with the inner_hits nested inside each hit.

19. Search After

Elasticsearch supports a third paging method, which does not allow skipping to arbitrary pages.

The paging methods of Elasticsearch covered so far:
1. from and size: deep paging becomes very expensive, so Elasticsearch provides the index setting index.max_result_window to cap (from + size); it defaults to 10,000, and exceeding it raises an error.
2. The scroll API: it works like a snapshot, so it is not real-time, and keeping the scroll context alive consumes resources.
This section introduces the third method, search after, which queries the next page of data based on the results of the previous page. The basic idea is to choose a set of sort fields that together are globally unique. The response to a sorted query returns a sort array containing the sort field values of each hit; the next page passes the last hit's sort values as the search_after condition, and Elasticsearch returns the next batch of matching data after that point.

The Java example is as follows:

public static void search_search_after() {
        RestHighLevelClient client = EsClient.getClient();
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("mapping_search_after");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.query(
                    QueryBuilders.termQuery("user","user2")
            );
            sourceBuilder.size(1);
            sourceBuilder.sort("id", SortOrder.ASC);
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
            if(hasHit(result)) { // If the data is matched this time
                // Omitting Data Processing Logic
                // Continue with the next batch of queries
                // result.getHits().
                int length = result.getHits().getHits().length;
                SearchHit aLastHit = result.getHits().getHits()[length - 1];
                //Start the next round of queries
                sourceBuilder.searchAfter(aLastHit.getSortValues());
                result = client.search(searchRequest, RequestOptions.DEFAULT);
                System.out.println(result);
            }
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            EsClient.close(client);
        }
    }
    private static boolean hasHit(SearchResponse result) {
        return !( result.getHits() == null ||
                result.getHits().getHits() == null ||
                result.getHits().getHits().length < 1 );
    }

This article introduced the three paging methods of Elasticsearch and covered sorting, from/size, source filtering, doc value fields, post filter, highlighting, rescoring, search type, scroll, preference, explain, version, index boost, min_score, named queries, inner hits, field collapsing, and search after.

Original release date: 2019-03-12
Author: Ding Wei
This article comes from the "Interest Circle of Middleware". For related material, follow Interest Circle of Middleware.
