Difference between the keyword and text string types in Elasticsearch

1, Background

Elasticsearch (ES) has many basic data types. This article focuses on the string types:
ES 2.x has neither text nor keyword, only the string type.
ES 5.x and later deprecate the string type and introduce the text and keyword types.

The basic data types vary slightly between ES versions. Refer to the official documentation for the version you use, for example: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/mapping-types.html

2, Difference between text and keyword

Any string field can be mapped either as the text type or as the keyword type.

The difference is that a text field is analyzed. By default the standard analyzer splits the value into terms, and those terms, rather than the original string, are stored in the index. You can also specify a different analyzer for the field.
A query on a text field does not simply report whether a document matches. It computes a relevance score and returns results from the highest score to the lowest. As a consequence, a document that we expected the query to return may not be found.
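
To make the analysis step concrete, here is a minimal sketch in plain Java. It is a toy stand-in, not Lucene's standard analyzer: the class name TextAnalysisSketch and the splitting rule (split on anything that is not a letter or digit, then lowercase) are assumptions chosen only to approximate the behavior for simple English input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy analyzer (assumption: NOT the real standard analyzer, only an
// approximation that is good enough for simple English strings).
final class TextAnalysisSketch {

    // Split on every run of non-letter, non-digit characters and lowercase
    // each piece. The resulting terms are what a text field would index.
    static List<String> analyze(String value) {
        List<String> terms = new ArrayList<>();
        for (String piece : value.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [registered, in, jilin, province]
        System.out.println(analyze("Registered in Jilin Province"));
    }
}
```

Because only the lowercased terms are indexed, looking up the original string "Jilin Province" as a single exact value would find nothing in such a field.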

A field mapped as keyword is not analyzed by default; the value is indexed as is, as a single term. Use the keyword type when a field needs to be filtered, sorted, or aggregated by its exact value.
A query on a keyword field compares the input directly against the indexed value, and a document is not returned when the two do not match exactly. The keyword type is therefore suited to exact matching.
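
The exact-value behavior can be sketched without a cluster. The following plain Java example is only an illustration under assumptions: the class KeywordSketch and its two helper methods are hypothetical, and a list of strings stands in for the values of a keyword field.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for a keyword field: values are kept whole and
// compared as complete strings, with no splitting or lowercasing.
final class KeywordSketch {

    // Exact filter: keep only values equal to the requested one.
    static List<String> filterExact(List<String> values, String wanted) {
        return values.stream().filter(wanted::equals).collect(Collectors.toList());
    }

    // Count documents per exact value, sorted by value. This is roughly the
    // information a terms aggregation on a keyword field reports.
    static Map<String, Long> countByValue(List<String> values) {
        return values.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> userPlaces = Arrays.asList("Jilin", "Jilin City", "Beijing", "Jilin");
        System.out.println(filterExact(userPlaces, "Jilin")); // prints [Jilin, Jilin]
        System.out.println(filterExact(userPlaces, "jilin")); // prints []
        System.out.println(countByValue(userPlaces));         // prints {Beijing=1, Jilin=2, Jilin City=1}
    }
}
```

Note that "Jilin City" and "jilin" are different values from "Jilin" here, which is the point of an exact match.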

For fuzzy queries in ES, see this other blog post:
https://blog.csdn.net/pony_maggie/article/details/113951893
In theory, a fuzzy query performs worse than term and match queries.
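
As background for that comparison: a fuzzy query accepts terms within a small Levenshtein edit distance of the input, whereas a term query looks up a single exact term. The sketch below only illustrates what edit distance means. It is plain dynamic programming under an assumed class name (EditDistanceSketch), it counts a swap of two adjacent characters as two edits, and it is not how Lucene evaluates fuzzy queries internally.

```java
// Illustration of Levenshtein edit distance (insert, delete, substitute).
// Assumption: a teaching sketch, unrelated to Lucene's actual implementation.
final class EditDistanceSketch {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("jilin", "jilim")); // prints 1
        System.out.println(levenshtein("test", "tester")); // prints 2
    }
}
```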

3, Code use

Example mapping structure. userName holds the user name, e.g. tester (text, fuzzy matching); userPlace holds the registered residence, e.g. Jilin (keyword, exact matching):

    {
      "mappings": {
        "example_test_type": {
          "dynamic": "false",
          "_all": {
            "enabled": false
          },
          "properties": {
            "userName": {
              "type": "text"
            },
            "userPlace": {
              "type": "keyword"
            },
            "createTime": {
              "type": "long"
            }
          }
        }
      }
    }

GET query body (successfully returns one record). The match_phrase clause matches when the analyzed userName contains the input as a phrase:

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "userPlace": {
              "value": "Jilin",
              "boost": 1.0
            }
          }
        },
        {
          "match_phrase": {
            "userName": {
              "query": "test",
              "slop": 0,
              "zero_terms_query": "NONE",
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "sort": [
    {
      "createTime": {
        "order": "desc"
      }
    }
  ]
}
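
The match_phrase clause above, with slop 0, requires the analyzed query terms to appear consecutively and in the same order among the terms of userName. A minimal sketch of that rule follows. The class PhraseMatchSketch and its toy analyzer (split on non-alphanumerics, lowercase) are assumptions for illustration and do not reproduce the real analyzer or scoring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model of match_phrase with slop 0 (assumption: simplified analyzer,
// no scoring, a single field value).
final class PhraseMatchSketch {

    // Simplified analysis: split on non-letter, non-digit runs and lowercase.
    static List<String> analyze(String value) {
        List<String> terms = new ArrayList<>();
        for (String piece : value.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // True when the query terms occur as one contiguous, ordered run
    // inside the field terms.
    static boolean matchPhrase(String fieldValue, String query) {
        List<String> queryTerms = analyze(query);
        if (queryTerms.isEmpty()) {
            return false; // mirrors "zero_terms_query": "NONE"
        }
        return Collections.indexOfSubList(analyze(fieldValue), queryTerms) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(matchPhrase("Test User 01", "test user"));      // prints true
        System.out.println(matchPhrase("user test account", "test user")); // prints false
        System.out.println(matchPhrase("tester", "test"));                 // prints false
    }
}
```

Matching works on whole terms, so under this whole-word tokenization the input test matches a name such as "test user" but not "tester". Whether part of a word matches in practice depends on the analyzer configured for the field.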

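Java code call (written against the ES 6.x RestHighLevelClient; UserBO and JsonUtil are this project's own classes, while the class name UserSearchService and the index name constant are assumed here for illustration):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.index.query.BoolQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.SearchHits;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.sort.FieldSortBuilder;
    import org.elasticsearch.search.sort.SortOrder;

    public class UserSearchService {

        // Assumed names; replace with the real index and mapping type.
        private static final String EXAMPLE_TEST_INDEX = "example_test_index";
        private static final String EXAMPLE_TEST_TYPE = "example_test_type";

        private final RestHighLevelClient restHighLevelClient;

        public UserSearchService(RestHighLevelClient restHighLevelClient) {
            this.restHighLevelClient = restHighLevelClient;
        }

        public List<UserBO> search(String userName, String userPlace) throws IOException {
            /*
             * 1, Query condition assembly
             */
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
            // Sort by creation time in descending order
            sourceBuilder.sort(new FieldSortBuilder("createTime").order(SortOrder.DESC));
            // User name: phrase match on the analyzed text field
            boolQueryBuilder.must(QueryBuilders.matchPhraseQuery("userName", userName));
            // Registered residence: exact match on the keyword field
            boolQueryBuilder.must(QueryBuilders.termQuery("userPlace", userPlace));
            sourceBuilder.query(boolQueryBuilder);

            /*
             * 2, Call es query
             */
            SearchRequest searchRequest = new SearchRequest(EXAMPLE_TEST_INDEX); // index
            searchRequest.types(EXAMPLE_TEST_TYPE); // mapping type (6.x API)
            searchRequest.source(sourceBuilder);
            SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

            /*
             * 3, Processing returned results
             */
            List<UserBO> resultList = new ArrayList<>();
            SearchHits hits = response.getHits();
            // totalHits is a long in the 6.x client
            if (hits == null || hits.totalHits <= 0) {
                return null;
            }
            // Convert results to objects
            for (SearchHit hit : hits.getHits()) {
                resultList.add(JsonUtil.parseObject(hit.getSourceAsString(), UserBO.class));
            }
            return resultList;
        }
    }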

In this article the string types are queried mainly with matchPhraseQuery (for the text field) and termQuery (for the keyword field).

Keywords: ElasticSearch

Added by cableuser on Sun, 16 Jan 2022 03:59:22 +0200