Elasticsearch Problems and Solutions

This article is based on Elasticsearch 7.10.

1. Updating a property value inside an object array with a script.

Background: the business side enabled CDN acceleration for product images, so the domain name in the existing image URLs had to be changed. The image URLs in the database can be updated with SQL statements. The catch is that the data in our ES index does not come from collecting the database binlog but from messages sent by the business side, so the image URL domain in ES has to be updated manually.

Goal: in the emails index, replace the old domain name with the new one in the imageUrl property of every entry in the images field.

Mapping of the emails index:

{
  "emails" : {
    "mappings" : {
      "dynamic" : "strict",
      "properties" : {
        "images" : {
          "properties" : {
            "imageUrl" : {
              "type" : "keyword"
            },
            "isMaster" : {
              "type" : "long"
            },
            "order" : {
              "type" : "long"
            }
          }
        }
      }
    }
  }
}

ES DSL statement:

POST emails/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": """ 
    def images = ctx._source.images;
    if (images != null) {
      for (def image : images) {
        // Skip entries without an imageUrl to avoid a null pointer error.
        def url = image['imageUrl'];
        if (url != null && url.startsWith('https://www.baidu.com')) {
          image['imageUrl'] = url.replace('https://www.baidu.com', 'https://www.toutiao.com');
        }
      }
    }
    """,
    "lang": "painless"
  }
}
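Because _update_by_query with match_all touches every document, it can help to check the rewrite logic locally first. The following is a minimal plain-Java sketch that mirrors the Painless script above; the class ImageUrlRewriter and its rewrite method are hypothetical helpers for illustration, not part of Elasticsearch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local mirror of the Painless script, for testing the logic only.
final class ImageUrlRewriter {
    // Rewrites imageUrl in place for entries that start with oldPrefix.
    // Returns the number of entries that were changed.
    static int rewrite(List<Map<String, Object>> images, String oldPrefix, String newPrefix) {
        if (images == null) {
            return 0;
        }
        int changed = 0;
        for (Map<String, Object> image : images) {
            if (image == null) {
                continue;
            }
            Object url = image.get("imageUrl");
            if (url instanceof String && ((String) url).startsWith(oldPrefix)) {
                // Same call as in the script: replaces every literal occurrence.
                image.put("imageUrl", ((String) url).replace(oldPrefix, newPrefix));
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> images = new ArrayList<>();
        Map<String, Object> image = new HashMap<>();
        image.put("imageUrl", "https://www.baidu.com/a.png");
        image.put("isMaster", 1L);
        images.add(image);

        int changed = rewrite(images, "https://www.baidu.com", "https://www.toutiao.com");
        System.out.println(changed + " " + images.get(0).get("imageUrl"));
        // prints: 1 https://www.toutiao.com/a.png
    }
}
```

Entries whose imageUrl is missing or uses another domain are left untouched, which matches the startsWith guard in the script.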

2. Appending values to an array field of a simple type.

Mapping of the emails index:

{
  "emails" : {
    "mappings" : {
      "dynamic" : "strict",
      "properties" : {
        "tagId" : {
          "type" : "long"
        }
      }
    }
  }
}

ES DSL statement:

POST emails/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": """
    def merged = new ArrayList();
    def oldTagIds = ctx._source.tagId;
    if (oldTagIds != null) {
      if (oldTagIds instanceof Collection) {
        // Existing array: copy its elements.
        merged.addAll(oldTagIds);
      } else {
        // Existing single value: add it as one element.
        merged.add(oldTagIds);
      }
    }
    merged.addAll(params.x);
    ctx._source.tagId = merged;
    """,
    "params": {
      "x": [
        1,
        2,
        3,
        4,
        5
      ]
    },
    "lang": "painless"
  }
}

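Be sure to check whether the old value is a collection, because a field mapped as long may hold either a single value or an array in _source. For a collection, use addAll(); with add() the old collection would be nested inside the new one as a single element instead of its elements being merged into it.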
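The difference between add() and addAll() can be seen in a small plain-Java sketch. Painless uses the same java.util collection methods, but the class TagIdMerge and its merge method below are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical local mirror of the merge logic in the script above.
final class TagIdMerge {
    // Normalizes the old value (null, single value, or collection) and appends the new ids.
    static List<Object> merge(Object oldTagIds, Collection<?> newTagIds) {
        List<Object> merged = new ArrayList<>();
        if (oldTagIds instanceof Collection) {
            merged.addAll((Collection<?>) oldTagIds);
        } else if (oldTagIds != null) {
            merged.add(oldTagIds);
        }
        merged.addAll(newTagIds);
        return merged;
    }

    public static void main(String[] args) {
        // What goes wrong without the type check: add() nests the old list.
        List<Object> wrong = new ArrayList<>();
        wrong.add(Arrays.asList(7L, 8L));
        wrong.addAll(Arrays.asList(1, 2));
        System.out.println(wrong); // prints [[7, 8], 1, 2]

        System.out.println(merge(Arrays.asList(7L, 8L), Arrays.asList(1, 2))); // prints [7, 8, 1, 2]
        System.out.println(merge(9L, Arrays.asList(1, 2)));                    // prints [9, 1, 2]
    }
}
```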

3. Possible problems when using the flattened field type.

Background: the business table has an extension field that is stored as a map. Keys in the map can be added or removed.

1) The following exception occurred.

Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=bytes can be at most 32766 in length; got 38881]

The cause of this exception is that the value of one key in the business extension map is very long. The official documentation provides the following information:

The flattened type has an ignore_above parameter. The official documentation describes it as follows:

Leaf values longer than this limit will not be indexed. By default, there is no limit and all values 
will be indexed. Note that this limit applies to the leaf values within the flattened object field,
and not the length of the entire field.


This option is also useful for protecting against Lucene's term byte-length limit of 32766.

The official documentation's recommendation for sizing ignore_above:

The value for ignore_above is the character count, but Lucene counts bytes. 
If you use UTF-8 text with many non-ASCII characters, you may want to set the 
limit to 32766 / 4 = 8191 since UTF-8 characters may occupy at most 4 bytes.
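The gap between character count and byte count can be checked with a short Java sketch. The class TermLengthCheck and its methods are hypothetical helpers; only the 32766-byte limit and the division by 4 come from the documentation quoted above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that compares character counts with UTF-8 byte counts.
final class TermLengthCheck {
    // Lucene's term byte-length limit, as quoted above.
    static final int LUCENE_MAX_TERM_BYTES = 32766;

    static int utf8Bytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean exceedsLimit(String value) {
        return utf8Bytes(value) > LUCENE_MAX_TERM_BYTES;
    }

    // Character count that stays under the limit even at 4 bytes per character.
    static int safeIgnoreAbove() {
        return LUCENE_MAX_TERM_BYTES / 4;
    }

    static String repeat(String s, int times) {
        StringBuilder sb = new StringBuilder(s.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = repeat("a", 11000);    // 1 byte per character in UTF-8
        String cjk = repeat("\u4e2d", 11000); // 3 bytes per character in UTF-8
        System.out.println(ascii.length() + " chars, " + utf8Bytes(ascii) + " bytes, too long: " + exceedsLimit(ascii));
        // prints: 11000 chars, 11000 bytes, too long: false
        System.out.println(cjk.length() + " chars, " + utf8Bytes(cjk) + " bytes, too long: " + exceedsLimit(cjk));
        // prints: 11000 chars, 33000 bytes, too long: true
        System.out.println("safe ignore_above: " + safeIgnoreAbove());
        // prints: safe ignore_above: 8191
    }
}
```

Two strings with the same character count can land on opposite sides of the byte limit, which is why a character-based ignore_above has to be set conservatively for non-ASCII data.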

Final solution: the values we search on do not need to be very long, so ignore_above is set to 1000.

POST emails/_mapping
{
  "properties": {
    "featureFlat": {
      "type": "flattened",
      "ignore_above":1000
    }
  }
}

2) If a key in a flattened field is deleted, writing the document with a partial update leaves the deleted data in place.

This comes down to the difference between a full document update and a partial document update in ES.

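A PUT request (the index API) replaces the whole document, while a POST request to _update is a partial update: the submitted fields are merged into the existing document, so keys that are missing from the request are kept.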

The final solution is to use a PUT request, that is, an IndexRequest (we use the Elasticsearch REST high-level client to operate ES).
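Why the deleted key survives a partial update can be illustrated with a simplified model in plain Java. This is only a sketch of the merge-versus-replace behaviour described above, not Elasticsearch code; the class UpdateSemantics and the keys color and size are hypothetical, while featureFlat is the flattened field from the mapping earlier.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of partial versus full document updates.
final class UpdateSemantics {
    // Partial update: merges changes into source, recursing into nested maps.
    // Keys that are absent from changes are kept.
    @SuppressWarnings("unchecked")
    static Map<String, Object> partialUpdate(Map<String, Object> source, Map<String, Object> changes) {
        Map<String, Object> result = new HashMap<>(source);
        for (Map.Entry<String, Object> entry : changes.entrySet()) {
            Object old = result.get(entry.getKey());
            Object incoming = entry.getValue();
            if (old instanceof Map && incoming instanceof Map) {
                result.put(entry.getKey(),
                        partialUpdate((Map<String, Object>) old, (Map<String, Object>) incoming));
            } else {
                result.put(entry.getKey(), incoming);
            }
        }
        return result;
    }

    // Full update: the new document replaces the old one entirely.
    static Map<String, Object> fullUpdate(Map<String, Object> source, Map<String, Object> newDoc) {
        return new HashMap<>(newDoc);
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Map<String, Object> oldFeature = new HashMap<>();
        oldFeature.put("color", "red");
        oldFeature.put("size", "L");
        Map<String, Object> stored = new HashMap<>();
        stored.put("featureFlat", oldFeature);

        // The business side removed the "size" key.
        Map<String, Object> newFeature = new HashMap<>();
        newFeature.put("color", "red");
        Map<String, Object> incoming = new HashMap<>();
        incoming.put("featureFlat", newFeature);

        Map<String, Object> partial = (Map<String, Object>) partialUpdate(stored, incoming).get("featureFlat");
        Map<String, Object> full = (Map<String, Object>) fullUpdate(stored, incoming).get("featureFlat");
        System.out.println(partial.containsKey("size")); // prints true: the deleted key remains
        System.out.println(full.containsKey("size"));    // prints false: the deleted key is gone
    }
}
```

With the REST high-level client, the full replacement corresponds to sending an IndexRequest rather than an UpdateRequest.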

That is all for today. This article will be updated continuously. Likes, comments and bookmarks are welcome. This is my first technical article; thank you for your support.

Motto: learn one more thing today than I knew yesterday.

Keywords: Big Data ElasticSearch search engine

Added by Termina on Thu, 16 Dec 2021 05:09:05 +0200