Elasticsearch: a beginner's article

Elasticsearch - introduction and installation

Full-text search is one of the most common requirements, and the open-source Elasticsearch (hereinafter "ES") is the first choice among full-text search engines. It can quickly store, search, and analyze massive amounts of data.

Under the hood, ES builds on the open-source library Lucene: it is a Lucene-based search server that exposes a distributed full-text search engine behind a RESTful web interface. ES is an open-source project written in Java and released under the Apache license, and it is currently the most popular enterprise search engine. It is widely used in cloud computing, supports near-real-time search, and is stable, reliable, and fast.


Chinese documentation: based on version 2.3; some of the content is outdated

Official documentation: when in doubt, the official documentation prevails

Related concepts

  • Index: an index is a collection of documents with similar characteristics, similar to a database in MySQL.

  • Type: a type is a logical category/partition within an index, generally a class of documents sharing a set of common fields, similar to a table in MySQL. Note: starting with Elasticsearch 6.0.0, an index can contain only one type, and the type concept is being removed.

    Reasons:

    1. In a relational database, tables are independent (stored independently), but in ES, different types within the same index are stored in the same Lucene index file. Therefore, fields with the same name in different types must have identical mappings.

    2. Storing different types of "records" in the same index also hurts Lucene's compression performance.

    ES 7: the type parameter in the URL is optional; for example, indexing a document no longer requires a document type.

    ES 8: the type parameter in the URL is no longer supported.

    Migration path: move from multi-type indices to single-type indices, giving each type of document its own index.

  • Document: a document is the basic unit of information that can be indexed. It is expressed in JSON, similar to a row in MySQL.

  • Shards: when an index stores a large amount of data, it may exceed the hardware limits of a single node. To solve this, Elasticsearch can subdivide an index into multiple shards. Sharding lets an index scale horizontally and allows operations to be distributed and parallelized across shards, improving performance and throughput.

  • Replicas: in a network environment where failures can occur, a failover mechanism is needed. Elasticsearch can replicate each index shard into one or more replicas, which keep the data available when some nodes fail.

  • Cluster: a cluster is a collection of one or more nodes that together hold the entire dataset and provide indexing and search across all nodes. Each cluster has its own unique cluster name, and nodes join a cluster by that name.

  • Node: a node is a single Elasticsearch instance belonging to a cluster; it stores data and participates in the cluster's indexing and search. A node can be configured to join a specific cluster by cluster name; by default, each node joins a cluster named elasticsearch.

Inverted index: see the linked reference
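
To make the idea concrete, here is a toy sketch of an inverted index in Python (purely illustrative; Lucene's real index also stores positions, term frequencies, and heavily compressed structures):

```python
# Toy inverted index: map each term to the sorted list of doc ids containing it.
from collections import defaultdict

def build_inverted_index(docs):
    """docs: dict of doc_id -> text. Returns term -> sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():   # crude "analysis": lowercase + split
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Mill Lane",
    2: "Mill Street",
    3: "Holmes Lane",
}
index = build_inverted_index(docs)
print(index["mill"])   # -> [1, 2]
print(index["lane"])   # -> [1, 3]
```

Searching for "mill" is then a dictionary lookup rather than a scan over every document, which is why full-text search on an inverted index is fast.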

Docker installation Elasticsearch and Kibana

1. Download the image file

# Storing and retrieving data
docker pull elasticsearch:7.4.2

# Visual interface for exploring the data
docker pull kibana:7.4.2

2. Configure the mounted data folder

# Create profile directory
mkdir -p /mydata/elasticsearch/config

# Create data directory
mkdir -p /mydata/elasticsearch/data

# Make everything under /mydata/elasticsearch/ readable and writable
chmod -R 777 /mydata/elasticsearch/

# Configure any machine to access elasticsearch
echo "http.host: 0.0.0.0" > /mydata/elasticsearch/config/elasticsearch.yml

3. Start Elasticsearch

The trailing \ continues the command on the next line; note that it must be preceded by a space.

docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2
  • -p 9200:9200 -p 9300:9300: exposes two ports; 9200 serves HTTP REST API requests, while 9300 is the inter-node communication port in a distributed cluster;
  • -e "discovery.type=single-node": runs ES as a single node;
  • -e ES_JAVA_OPTS="-Xms64m -Xmx512m": caps the memory used at startup; if unset, ES may consume all of the system's memory;
  • -v: mounts the container's configuration file, data directory, and plugin directory to local folders;
  • -d elasticsearch:7.4.2: runs the specified image in detached mode

Visit IP:9200; if JSON data is returned, the startup succeeded.

4. Set Elasticsearch to start with Docker

# Currently, Docker starts automatically, so ES is also started automatically
docker update elasticsearch --restart=always

5. Start visualization Kibana

docker run --name kibana \
-e ELASTICSEARCH_HOSTS=http://120.25.165.166:9200 \
-p 5601:5601 \
-d kibana:7.4.2
  • ELASTICSEARCH_HOSTS=http://120.25.165.166:9200: Set the IP address of your virtual machine here

Open 120.25.165.166:5601 in the browser to test:

6. Set Kibana to start with Docker

# Currently, Docker starts automatically, so kibana also starts automatically
docker update kibana --restart=always

Elasticsearch - getting started

Elasticsearch exposes its data through REST API endpoints. The following demonstrates their use through a few example requests. (The current virtual machine IP is 120.25.165.167.)

View cluster status - _cat

  1. /_cat/nodes: view all nodes

Interface: GET http://120.25.165.167:9200/_cat/nodes

  2. /_cat/health: view ES health

Interface: GET http://120.25.165.167:9200/_cat/health

  3. /_cat/master: view master node information

Interface: GET http://120.25.165.167:9200/_cat/master

  4. /_cat/indices: view all indices

Interface: GET http://120.25.165.167:9200/_cat/indices

Index (add) a document

That is, save a piece of data into a given index and type, specifying a unique identifier for it.

PUT request

Interface: PUT http://120.25.165.167:9200/customer/external/1

POST request

Interface: POST http://120.25.165.167:9200/customer/external/

Both PUT and POST work

  • A POST without an id auto-generates one; a POST with an existing id updates the data and bumps the version number;
  • PUT can create or update, but must always specify an id; because an id is required, PUT is generally used for updates, and omitting the id is an error.
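
These rules can be sketched as a tiny helper (illustrative only; `index_request` is a made-up name for this article, not part of any ES client library):

```python
# Sketch of the indexing rules described above.
def index_request(index, type_, doc_id=None):
    """Return (method, path) for indexing a document.
    - With an id: PUT (create, or full overwrite of that id).
    - Without an id: POST (ES auto-generates the id)."""
    if doc_id is not None:
        return "PUT", f"/{index}/{type_}/{doc_id}"
    return "POST", f"/{index}/{type_}/"

print(index_request("customer", "external", 1))  # ('PUT', '/customer/external/1')
print(index_request("customer", "external"))     # ('POST', '/customer/external/')
```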

View document

/index/type/id

Interface: GET http://120.25.165.167:9200/customer/external/1

Field interpretation:
{
    "_index": "customer",  # which index (database)
    "_type": "external",   # which type (table)
    "_id": "1",            # document id (row record)
    "_version": 5,         # version number
    "_seq_no": 4,          # concurrency-control field, incremented on every update; used as an optimistic lock
    "_primary_term": 1,    # similar to the above; changes when the primary shard is reassigned, e.g. after a restart
    "found": true,
    "_source": {           # the data itself
        "name": "zhangsan"
    }
}

# To carry the optimistic lock, update with ?if_seq_no=0&if_primary_term=1; the update fails if the carried values do not match the current ones
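
As a sketch (assuming the ES 7-style if_seq_no/if_primary_term parameters; the host is a placeholder), such a conditional-update URL can be built like this:

```python
# Build a conditional-update URL using ES optimistic concurrency control:
# the update only succeeds if the document's current _seq_no and
# _primary_term match the values we send.
from urllib.parse import urlencode

def conditional_update_url(base, index, type_, doc_id, seq_no, primary_term):
    params = urlencode({"if_seq_no": seq_no, "if_primary_term": primary_term})
    return f"{base}/{index}/{type_}/{doc_id}?{params}"

url = conditional_update_url("http://127.0.0.1:9200", "customer", "external", 1, 4, 1)
print(url)  # http://127.0.0.1:9200/customer/external/1?if_seq_no=4&if_primary_term=1
```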

Update document

/index/type/id/_update

Interface: POST http://120.25.165.167:9200/customer/external/1/_update

Differences between several updated documents

Besides _update, the two document-indexing (saving) requests above can also update documents:

  • A PUT request with an id whose data already exists updates the document;
  • A POST request with an id behaves like PUT: if data for that id already exists, the document is updated;

In both cases, if data already exists for the given id, an update is performed.

Comparison:

  • The request body format differs: with _update, the fields to modify must be wrapped under the doc key
  • _update does not update redundantly: if the data is unchanged, nothing happens and the version number does not change. The other two methods always re-update (overwriting the original data), and the version number changes
  • All of these methods can add attributes during an update; however, PUT with an id and POST with an id overwrite the original data entirely rather than merging new attributes into the existing ones
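
The difference in request bodies can be sketched side by side (illustrative; the zhangsan data comes from the example above):

```python
import json

doc = {"name": "zhangsan", "age": 20}

# PUT /customer/external/1 (and POST with an id): the body IS the document,
# and it fully replaces whatever was stored before.
put_body = json.dumps(doc)

# POST /customer/external/1/_update: the changed fields must be wrapped
# under the "doc" key; if nothing actually changes, the version stays put.
update_body = json.dumps({"doc": doc})

print(put_body)
print(update_body)
```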

Delete document & Index

remove document

Interface: DELETE http://120.25.165.167:9200/customer/external/1

Delete index

Interface: DELETE http://120.25.165.167:9200/customer

Batch operation data - bulk

Syntax format:

{action:{metadata}}\n   // e.g. index saves a record, update updates one
{request body}\n

{action:{metadata}}\n
{request body}\n
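
A sketch of assembling such an NDJSON body in Python (illustrative; `bulk_body` is a made-up helper for this article — note that the body must end with a newline):

```python
import json

def bulk_body(actions):
    """Build an NDJSON _bulk request body.
    actions: list of (action_line_dict, optional_source_dict) pairs."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:          # delete actions have no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"      # trailing newline is mandatory

body = bulk_body([
    ({"index": {"_id": "1"}}, {"name": "John Doe"}),
    ({"index": {"_id": "2"}}, {"name": "John Doe"}),
])
print(body)
```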

Batch operation specifying index and type

Interface: POST /customer/external/_bulk

Parameters:

{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}

Test batches using dev tools in Kibana:

Perform batch operations on all indexes

Interface: POST /_bulk

Parameters:

{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}

  • In a batch operation, one failed action does not stop the others from executing; the actions are independent of one another.
  • The bulk API executes all actions in order. If a single action fails for any reason, it continues to process the remaining actions after it.
  • The bulk API response reports the status of each action (in the same order they were sent), so you can check whether a given action failed.
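
A sketch of checking per-action status against a hand-written sample of a bulk response (the response shape follows the items/status fields described above):

```python
# Each item in a _bulk response reports its own status, in request order;
# pick out the actions that failed (HTTP status >= 400).
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"update": {"_id": "123", "status": 404,
                    "error": {"type": "document_missing_exception"}}},
    ],
}

failed = [
    (op, item[op]["_id"], item[op]["status"])
    for item in sample_response["items"]
    for op in item                       # each item has a single action key
    if item[op]["status"] >= 400
]
print(failed)  # [('update', '123', 404)]
```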

Elasticsearch - Advanced Search

Import sample test data

Prepare a fictional JSON document sample of customer bank account information. Each document has the following schema.

{
    "account_number": 1,
    "balance": 39225,
    "firstname": "Amber",
    "lastname": "Duke",
    "age": 32,
    "gender": "M",
    "address": "880 Holmes Lane",
    "employer": "Pyrami",
    "email": "amberduke@pyrami.com",
    "city": "Brogan",
    "state": "IL"
}

https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json

If that address cannot be accessed, use the mirror address. Then import the test data.

POST bank/account/_bulk

POST /bank/account/_bulk
{
  "index": {
    "_id": "1"
  }
}
{
  "account_number": 1,
  "balance": 39225,
  "firstname": "Amber",
  "lastname": "Duke",
  "age": 32,
  "gender": "M",
  "address": "880 Holmes Lane",
  "employer": "Pyrami",
  "email": "amberduke@pyrami.com",
  "city": "Brogan",
  "state": "IL"
}
...... (the remaining data is omitted)

Introduction to retrieval examples

The following requests are all run in Kibana Dev Tools.

The simplest search uses match_all, i.e. match everything. Request:

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": "asc"
    }
  ]
}
# query: the query criteria
# sort: the sort criteria

result

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [
          0
        ]
      },
      ...
    ]
  }
}

Response field interpretation

  • took – how long Elasticsearch took to run the query, in milliseconds
  • timed_out – whether the search request timed out
  • _shards – how many shards were searched, and how many succeeded, failed, or were skipped
  • max_score – the score of the most relevant document found
  • hits.total.value – how many matching documents were found
  • hits.sort – the document's sort position (when not sorting by relevance score)
  • hits._score – the document's relevance score (not applicable when using match_all)

Response result description

Elasticsearch returns 10 hits by default; it does not return all the data at once.

Description of request mode

ES supports two basic ways to send a search:

  • sending search parameters via the REST request URI (uri + search parameters);
  • sending them via the REST request body (uri + request body);

That is, besides the request-body style used in the example above, you can also search with GET request parameters:

GET bank/_search?q=*&sort=account_number:asc
# q=* queries everything
# sort=account_number:asc sorts by account_number in ascending order
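
A sketch of building that URI search with proper escaping (the encoded form decodes back to q=* and sort=account_number:asc on the server side):

```python
from urllib.parse import urlencode

# urlencode escapes the special characters * and : safely for a URL.
params = urlencode({"q": "*", "sort": "account_number:asc"})
url = f"/bank/_search?{params}"
print(url)
```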

Query DSL

This section refers to official documents: Query DSL

Elasticsearch provides a JSON-style DSL for executing queries, called the Query DSL. The query language is very comprehensive: with a simple JSON interface, Elasticsearch can deliver rich search functionality. It is used throughout the search operations below:

1. Basic grammar format

Typical structure of a query statement:

QUERY_NAME:{
   ARGUMENT:VALUE,
   ARGUMENT:VALUE,...
}

For a field, its structure is as follows:

{
  QUERY_NAME:{
     FIELD_NAME:{
       ARGUMENT:VALUE,
       ARGUMENT:VALUE,...
      }
   }
}

Request example:

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 5,
  "sort": [
    {
      "account_number": {
        "order": "desc"
      },
      "balance": {
      	"order": "asc"
      }
    }
  ]
}

# match_all query type: matches all documents; in ES, many query types can be combined inside query to build complex queries
# from + size implement paging: start from the from-th hit and return size hits per page
# sort: multi-field sorting; later fields break ties when earlier fields are equal, otherwise the earlier field's order prevails
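
The from/size arithmetic can be sketched as follows (illustrative; `page_query` is a made-up helper using 1-based page numbers):

```python
# from/size paging: "from" is the offset of the first hit to return.
def page_query(page, size):
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * size,
        "size": size,
    }

print(page_query(1, 5)["from"])  # 0  -> first page starts at offset 0
print(page_query(3, 5)["from"])  # 10 -> third page skips 2 * 5 hits
```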

2. Return some fields

Request example:

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 5,
  "sort": [
    {
      "account_number": {
        "order": "desc"
      }
    }
  ],
  "_source": ["balance","firstname"]
}

# _source specifies the field names to include in the returned results

Examples of results:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "999",
        "_score" : null,
        "_source" : {
          "firstname" : "Dorothy",
          "balance" : 6087
        },
        "sort" : [
          999
        ]
      },
    	...
    ]
 	}
}

3. match query

Exact query - basic data types (non-text)

GET bank/_search
{
  "query": {
    "match": {
      "account_number": 20
    }
  }
}
# Find documents whose account_number is 20; term is recommended for non-text data

Fuzzy query - text string

GET bank/_search
{
  "query": {
    "match": {
      "address": "mill lane"
    }
  }
}
# Find documents whose address contains mill or lane

match performs full-text search: the search field is analyzed (split into terms) and matched, and the results are scored and sorted by relevance (_score). The underlying mechanism is the inverted index.
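
A toy sketch of the difference between term-by-term matching (match) and phrase matching (match_phrase, covered next) — purely illustrative; real ES analysis and scoring are much richer:

```python
def analyze(text):
    """Crude stand-in for an ES analyzer: lowercase + whitespace split."""
    return text.lower().split()

def match(query, doc):
    """match: the document hits if it contains ANY analyzed query term."""
    doc_terms = analyze(doc)
    return any(t in doc_terms for t in analyze(query))

def match_phrase(query, doc):
    """match_phrase: the terms must appear adjacent and in order."""
    terms, doc_terms = analyze(query), analyze(doc)
    n = len(terms)
    return any(doc_terms[i:i + n] == terms for i in range(len(doc_terms) - n + 1))

print(match("mill lane", "990 Mill Road"))         # True  (contains "mill")
print(match_phrase("mill lane", "990 Mill Road"))  # False (no "mill lane" phrase)
print(match_phrase("mill lane", "198 Mill Lane"))  # True
```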

Exact match - text string

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "288 Mill Street"
    }
  }
}
# Find documents whose address is exactly 288 Mill Street.
# Matching on the keyword sub-field is an exact search: only exact matches are found.
# For phrase matching instead, use match_phrase

4. match_phrase phrase matching

The value to be matched is retrieved as a whole phrase (its terms must appear adjacent and in order)

GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill lane"
    }
  }
}
# Retrieves documents whose address matches the phrase mill lane

5. multi_match multi-field matching

GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill",
      "fields": [
        "city",
        "address"
      ]
    }
  }
}
# Retrieves documents where city or address contains mill; the query string is analyzed (split into terms)

6. bool composite query

Compound statements can combine any other query statements, including other compound ones. This means compound statements can nest within one another and express very complex logic.

  • must: all conditions listed in must must be satisfied.
  • must_not: none of the conditions listed in must_not may match.
  • should: conditions listed in should may optionally match; matching them raises the score (similar to OR in MySQL).
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match": {
            "address": "mill"
          }
        }
      ]
    }
  }
}
# Query the data whose gender is M and address contains mill

7. filter result filtering

Not every query needs to produce relevance scores, especially clauses used only for filtering documents. Elasticsearch automatically detects these situations and optimizes query execution so that scores are not calculated.

The filter clause filters the results without computing relevance scores.

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address": "mill"
          }
        }
      ],
      "filter": {
        "range": {
          "balance": {
            "gte": "10000",
            "lte": "20000"
          }
        }
      }
    }
  }
}
# First, match all documents whose address contains mill,
# then filter the results to 10000 <= balance <= 20000

In a boolean query, the must, should, and must_not elements are called query clauses. How well a document matches each must or should clause contributes to its relevance score; the higher the score, the better the document matches your search. By default, Elasticsearch returns documents sorted by these relevance scores.

Conditions in the must_not clause are treated as filters: they affect whether a document is included in the results, but do not affect its score. You can also explicitly use filter clauses to include or exclude documents based on structured data.

8. term exact search

Avoid using term to query text fields

By default, Elasticsearch analyzes (tokenizes) the values of text fields during indexing, splitting each value into terms, which makes it difficult to match a text field's whole value exactly.

If you want to query text field values, use match query instead.

https://www.elastic.co/guide/en/elasticsearch/reference/7.11/query-dsl-term-query.html

Section 3 (match query) above showed exact queries on non-text fields; Elasticsearch officially recommends term for exact retrieval on such non-text fields.

GET bank/_search
{
  "query": {
    "term": {
      "age": "28"
    }
  }
}
# Find data with age 28

9. Aggregation - perform aggregation

https://www.elastic.co/guide/en/elasticsearch/reference/7.11/search-aggregations.html

Aggregate syntax

GET /my-index-000001/_search
{
  "aggs": {
    "aggs_name": { # a name for this aggregation, to identify it in the result set
      "AGG_TYPE": { # the aggregation type (avg, terms, ...)
      }
    }
  }
}

Example 1 - search for the age distribution and average age of all people with mill in address

GET bank/_search
{
  "query": {
    "match": {
      "address": "Mill"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      "avg": {
        "field": "age"
      }
    },
    "balanceAvg": {
      "avg": {
        "field": "balance"
      }
    }
  },
  "size": 0
}
# "ageAgg": {              ---  the aggregation name is ageAgg
#   "terms": {             ---  the aggregation type is terms
#     "field": "age",      ---  the aggregation field is age
#     "size": 10           ---  keep the first ten buckets after aggregating
#   }
# },
# ------------------------
# "ageAvg": {              ---  the aggregation name is ageAvg
#   "avg": {               ---  the aggregation type is avg (average)
#     "field": "age"       ---  the aggregation field is age
#   }
# },
# ------------------------
# "balanceAvg": {          ---  the aggregation name is balanceAvg
#   "avg": {               ---  the aggregation type is avg (average)
#     "field": "balance"   ---  the aggregation field is balance
#   }
# }
# ------------------------
# "size": 0                ---  do not display hits; only show the aggregation results

result:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 38,
          "doc_count" : 2
        },
        {
          "key" : 28,
          "doc_count" : 1
        },
        {
          "key" : 32,
          "doc_count" : 1
        }
      ]
    },
    "ageAvg" : {
      "value" : 34.0
    },
    "balanceAvg" : {
      "value" : 25208.0
    }
  }
}

Example 2 - aggregate by age, then find the average balance within each age group

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 100
      },
      "aggs": {
        "ageAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

result:

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 31,
          "doc_count" : 61,
          "ageAvg" : {
            "value" : 28312.918032786885
          }
        },
        {
          "key" : 39,
          "doc_count" : 60,
          "ageAvg" : {
            "value" : 25269.583333333332
          }
        },
        {
          "key" : 26,
          "doc_count" : 59,
          "ageAvg" : {
            "value" : 23194.813559322032
          }
        },
        {
          "key" : 32,
          "doc_count" : 52,
          "ageAvg" : {
            "value" : 23951.346153846152
          }
        },
        {
          "key" : 35,
          "doc_count" : 52,
          "ageAvg" : {
            "value" : 22136.69230769231
          }
        },
        {
          "key" : 36,
          "doc_count" : 52,
          "ageAvg" : {
            "value" : 22174.71153846154
          }
        },
        {
          "key" : 22,
          "doc_count" : 51,
          "ageAvg" : {
            "value" : 24731.07843137255
          }
        },
        {
          "key" : 28,
          "doc_count" : 51,
          "ageAvg" : {
            "value" : 28273.882352941175
          }
        },
        {
          "key" : 33,
          "doc_count" : 50,
          "ageAvg" : {
            "value" : 25093.94
          }
        },
        {
          "key" : 34,
          "doc_count" : 49,
          "ageAvg" : {
            "value" : 26809.95918367347
          }
        },
        {
          "key" : 30,
          "doc_count" : 47,
          "ageAvg" : {
            "value" : 22841.106382978724
          }
        },
        {
          "key" : 21,
          "doc_count" : 46,
          "ageAvg" : {
            "value" : 26981.434782608696
          }
        },
        {
          "key" : 40,
          "doc_count" : 45,
          "ageAvg" : {
            "value" : 27183.17777777778
          }
        },
        {
          "key" : 20,
          "doc_count" : 44,
          "ageAvg" : {
            "value" : 27741.227272727272
          }
        },
        {
          "key" : 23,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 27314.214285714286
          }
        },
        {
          "key" : 24,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 28519.04761904762
          }
        },
        {
          "key" : 25,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 27445.214285714286
          }
        },
        {
          "key" : 37,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 27022.261904761905
          }
        },
        {
          "key" : 27,
          "doc_count" : 39,
          "ageAvg" : {
            "value" : 21471.871794871793
          }
        },
        {
          "key" : 38,
          "doc_count" : 39,
          "ageAvg" : {
            "value" : 26187.17948717949
          }
        },
        {
          "key" : 29,
          "doc_count" : 35,
          "ageAvg" : {
            "value" : 29483.14285714286
          }
        }
      ]
    }
  }
}

Example 3 - find the age distribution, the average balance of M and F within each age group, and the overall average balance of each age group

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 100
      },
      "aggs": {
        "genderAgg": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "ageBalanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}
# "field": "gender.keyword": gender is a text field, which cannot be aggregated directly; the .keyword sub-field must be used for exact values

result:

{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 31,
          "doc_count" : 61,
          "genderAgg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "M",
                "doc_count" : 35,
                "balanceAvg" : {
                  "value" : 29565.628571428573
                }
              },
              {
                "key" : "F",
                "doc_count" : 26,
                "balanceAvg" : {
                  "value" : 26626.576923076922
                }
              }
            ]
          },
          "ageBalanceAvg" : {
            "value" : 28312.918032786885
          }
        },
        {
          "key" : 39,
          "doc_count" : 60,
          "genderAgg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "F",
                "doc_count" : 38,
                "balanceAvg" : {
                  "value" : 26348.684210526317
                }
              },
              {
                "key" : "M",
                "doc_count" : 22,
                "balanceAvg" : {
                  "value" : 23405.68181818182
                }
              }
            ]
          },
          "ageBalanceAvg" : {
            "value" : 25269.583333333332
          }
        },
        ...
      ]
    }
  }
}

Elasticsearch - Mapping

Mapping - introduction; official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/7.11/mapping.html

1. Introduction to mapping

Mapping defines a document and how its fields are stored and indexed.

For example, mappings define:

  • which string attributes should be treated as full-text fields;
  • which attributes contain numbers, dates, or geographic locations;
  • whether all attributes of the document should be indexed (the _all configuration);
  • the format of dates;
  • custom rules for dynamically added attributes;

Viewing mapping information

GET bank/_mapping

{
  "bank" : {
    "mappings" : {
      "properties" : {
        "account_number" : {
          "type" : "long"
        },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

2. New version type removal

ElasticSearch 7 - removal of the type concept

  1. In a relational database, two tables are independent: columns with the same name in each do not affect one another. This is not the case in ES. Elasticsearch is a search engine built on Lucene, and in Lucene, fields with the same name under different types in the same ES index are ultimately handled as one and the same field.
     • Two user_name fields under two different types in the same ES index are actually treated as the same field. You would have to define identical field mappings in the two types; otherwise the same field name in different types would conflict during processing, degrading Lucene's efficiency.
     • Removing types therefore improves ES's data-processing efficiency.
  2. Elasticsearch 7.x: the type parameter in the URL is optional; for example, indexing a document no longer requires a document type.
  3. Elasticsearch 8.x: the type parameter in URLs is no longer supported.
  4. Solution: migrate from multi-type indices to single-type indices, giving each document type its own index; move the existing per-type data under an index to the new destination. See data migration for details

3. Attribute type

Reference: official documentation on field data types

Mapping operation

Reference: official documentation on creating mappings

1. Create index mapping

Create an index and specify the mapping rules for attributes (equivalent to creating a new table and specifying fields and field types)

PUT /my_index
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "email": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

result:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}
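The same index-creation request can be issued from any HTTP client. Below is a minimal Python sketch that builds the mapping body shown above; the host `http://localhost:9200` and the commented-out network call are illustrative assumptions, not part of the original example:

```python
import json

def my_index_mapping():
    # Mapping body identical to the PUT /my_index request above
    return {
        "mappings": {
            "properties": {
                "age":   {"type": "integer"},
                "email": {"type": "keyword"},
                "name":  {"type": "text"},
            }
        }
    }

body = json.dumps(my_index_mapping())

# To actually create the index (requires a running cluster; illustrative only):
# import urllib.request
# req = urllib.request.Request("http://localhost:9200/my_index", data=body.encode(),
#                              method="PUT", headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
print(body)
```

On success, Elasticsearch answers with the `"acknowledged": true` response shown above.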

2. Add fields to existing mappings

reference resources: Official documents

PUT /my_index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false
    }
  }
}

# "index": false here means the newly added field is not indexed and cannot be searched; the default is true
# https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-index.html

result:

{
  "acknowledged" : true
}
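To make the effect of `"index": false` concrete, here is a purely illustrative Python simulation of an inverted index (the mapping and document are invented for the example): a field mapped with `"index": False` stays in the stored document but is skipped when the searchable structure is built.

```python
# Toy simulation of "index": false - the value is stored with the
# document but never added to the inverted index, so it is unsearchable.
mapping = {
    "email":       {"type": "keyword", "index": True},
    "employee-id": {"type": "keyword", "index": False},
}

docs = {
    "1": {"email": "amber@example.com", "employee-id": "E-100"},
}

inverted = {}  # (field, term) -> set of doc ids
for doc_id, doc in docs.items():
    for field, value in doc.items():
        if not mapping[field].get("index", True):
            continue  # "index": false -> skip, not searchable
        inverted.setdefault((field, value), set()).add(doc_id)

print(("email", "amber@example.com") in inverted)   # True: searchable
print(("employee-id", "E-100") in inverted)         # False: not searchable
print(docs["1"]["employee-id"])                     # E-100: still stored
```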

3. View mapping

reference resources: Official documents

GET /my_index/_mapping
# View the mapping of a field
GET /my_index/_mapping/field/employee-id

result:

{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "email" : {
          "type" : "keyword"
        },
        "employee-id" : {
          "type" : "keyword",
          "index" : false
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}
# "index": false indicates that the field is not indexed and cannot be searched

4. Update mapping

reference resources: Official documents

An existing field mapping cannot be updated in place. To change a mapping, create a new index with the desired mapping and migrate the data into it.

5. Data migration

There are two ways to write the migration request (the endpoint is always POST _reindex): without a type, for 7.x indices where type has been removed, and with a type, for older multi-type indices.

Data migration without type

POST _reindex
{
  "source":{
      "index":"twitter"
   },
  "dest":{
      "index":"new_twitters"
   }
}

Data migration with type

POST _reindex
{
  "source":{
      "index":"twitter",
      "type":"tweet"
   },
  "dest":{
      "index":"new_twitters"
   }
}
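The two request bodies differ only in whether the source carries a type. A small Python helper can build either form; this is a sketch using the index names from the example above, and note that specifying a source type is deprecated in 7.x:

```python
def reindex_body(src_index, dest_index, src_type=None):
    """Build a _reindex request body. src_type is only for pre-7.x
    multi-type indices (deprecated in 7.x, removed in 8.x)."""
    source = {"index": src_index}
    if src_type is not None:
        source["type"] = src_type  # deprecated in 7.x
    return {"source": source, "dest": {"index": dest_index}}

# Without type (7.x style)
print(reindex_body("twitter", "new_twitters"))
# With type (migrating an old multi-type index)
print(reindex_body("twitter", "new_twitters", src_type="tweet"))
```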

6. Data migration instance

Our test data lives in the index bank, which still contains a type.

Now let's create a new index newbank and modify the types of some fields to demonstrate the data migration operation when the mapping needs to be updated.

① View the current field mapping type of index bank

GET /bank/_mapping
# result
{
  "bank" : {
    "mappings" : {
      "properties" : {
        "account_number" : {
          "type" : "long"
        },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

② Create a new index newbank and modify the field type

PUT /newbank
{
  "mappings": {
    "properties": {
      "account_number": {
        "type": "long"
      },
      "address": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "balance": {
        "type": "long"
      },
      "city": {
        "type": "keyword"
      },
      "email": {
        "type": "keyword"
      },
      "employer": {
        "type": "keyword"
      },
      "firstname": {
        "type": "text"
      },
      "gender": {
        "type": "keyword"
      },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": {
        "type": "keyword"
      }
    }
  }
}

③ Data migration

POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}

result:

#! Deprecation: [types removal] Specifying types in reindex requests is deprecated.
{
  "took" : 269,
  "timed_out" : false,
  "total" : 1000,
  "updated" : 0,
  "created" : 1000,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
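A quick sanity check on a _reindex response: every source document should be accounted for as created or updated, with no failures or version conflicts. A sketch against the response shown above (trimmed to the fields being checked):

```python
# The _reindex response above, trimmed to the fields we inspect
response = {
    "timed_out": False,
    "total": 1000,
    "updated": 0,
    "created": 1000,
    "version_conflicts": 0,
    "failures": [],
}

def reindex_ok(resp):
    """True if every source document was copied without conflict or failure."""
    return (not resp["timed_out"]
            and not resp["failures"]
            and resp["version_conflicts"] == 0
            and resp["created"] + resp["updated"] == resp["total"])

print(reindex_ok(response))  # True for the migration above
```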

④ View migrated data

GET /newbank/_search
# Result: after migration, the type is unified as "_doc" (type removed)
{
  "took" : 367,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "newbank",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        }
      },
      ...


Added by asifkamalzahid on Wed, 29 Dec 2021 08:12:53 +0200