Introduction:
Elasticsearch (ES) is a distributed, non-relational document database (a document is similar to a single record in a relational database). Every field of a document is indexed by default, so the data in every field can be searched. ES can scale horizontally to hundreds of servers to store and process petabytes of data. ES exposes a RESTful API, and clients interact with it through that API.
Use cases:
- ES can serve as the data store for a blog system: articles and other content are stored in ES and can be searched quickly by article content.
- When a relational database holds so much data that queries become slow, the data can be imported into ES and the queries served from ES instead.
- Deploy a large-scale logging stack: Logstash collects the logs, ES stores, searches, and analyzes the massive volume of events, and Kibana visualizes the results. This combination is the ELK stack: Elasticsearch + Logstash + Kibana.
Quick start project
In a Windows environment, download ES and Kibana. Kibana is an open-source analytics and visualization platform; its Dev Tools console can be used to run CRUD operations against ES data. Kibana and ES versions are released in step, so it is best to use the same version of both. Download links: es Download, kibana Download
After downloading, unzip each archive and, from cmd, run the .bat file in the bin directory of each extracted package. ES listens on port 9200 by default and Kibana on port 5601; visit localhost:5601 to open Kibana.
Concepts
Index: an index is similar to a database in a traditional relational database. It is the place where related documents are stored.
Type: similar to a table in a traditional relational database. In old versions of ES, one index could contain multiple types. Within an index, fields with the same name in different mapping types are backed by the same Lucene field, and when the types in one index share few or no fields, the resulting sparse data hurts query efficiency. For these reasons, from ES 6.0.0 an index can contain only one type, in 7.0.0 types are deprecated, and in 8.0.0 they are removed completely. In 7.x the type defaults to _doc.
Document: equivalent to a single record in a relational database.
Field: equivalent to a column in a relational database.
Inverted index:
A relational database speeds up data retrieval by adding an index, such as a B-tree index, to specified columns. Elasticsearch and Lucene use a structure called an inverted index for the same purpose. In ES, an inverted index is created for every field in a document by default; a field without an inverted index cannot be searched. See also: Why is es query fast?
Steps for building an inverted index:
- First, extract all the keywords contained in the documents
- Then save the mapping between each keyword and the documents that contain it
- Finally, index and sort the keywords themselves.
In this way, when a user searches for a keyword, ES first looks the keyword up in the keyword index and then finds the documents through the keyword-to-document mapping.
Take the following three documents as an example:
id | age | sex | name |
---|---|---|---|
1 | 18 | female | Peking University |
2 | 20 | male | Hebei University Youth |
3 | 18 | male | Patriotic youth |
Keywords and the documents that contain them:
No. | Keyword | Documents containing it |
---|---|---|
1 | Peking | 1 |
2 | Hebei | 2 |
3 | University | 1,2 |
4 | youth | 2,3 |
5 | Patriotic | 3 |
6 | 18 | 1,3 |
7 | 20 | 2 |
8 | male | 2,3 |
9 | female | 1 |
A keyword query can therefore retrieve the matching documents directly, without scanning every document.
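The three steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `tokenize` helper is a naive stand-in for a real analyzer (it just lowercases and splits on whitespace), and all fields share one index here, as in the table, whereas ES keeps a separate inverted index per field.

```python
from collections import defaultdict

# The three example documents from the table above.
documents = {
    1: {"age": 18, "sex": "female", "name": "Peking University"},
    2: {"age": 20, "sex": "male", "name": "Hebei University Youth"},
    3: {"age": 18, "sex": "male", "name": "Patriotic youth"},
}


def tokenize(value):
    """Naive analyzer (assumption): lowercase and split on whitespace."""
    return str(value).lower().split()


def build_inverted_index(docs):
    """Map each keyword to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, fields in docs.items():
        for value in fields.values():
            for term in tokenize(value):
                postings[term].add(doc_id)
    # Sorting the keywords mirrors the "index and sort the keywords" step.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, keyword):
    """Look the keyword up directly instead of scanning every document."""
    return index.get(keyword.lower(), [])


index = build_inverted_index(documents)
print(search(index, "university"))  # [1, 2]
print(search(index, "youth"))       # [2, 3]
print(search(index, "18"))          # [1, 3]
```

A lookup costs one dictionary access regardless of how many documents exist, which is the intuition behind the fast keyword queries described above.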
REST API
Method | URL | Description |
---|---|---|
PUT | localhost:9200/{index name}/{type name}/{document id} | Create a document (with a specified document id) |
POST | localhost:9200/{index name}/{type name} | Create a document (with a randomly generated document id) |
POST | localhost:9200/{index name}/{type name}/{document id}/_update | Modify a document |
DELETE | localhost:9200/{index name}/{type name}/{document id} | Delete a document |
GET | localhost:9200/{index name}/{type name}/{document id} | Query a document by document id |
POST | localhost:9200/{index name}/{type name}/_search | Query all data |
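As a rough sketch of how these endpoints look from client code, the Python below builds one HTTP request per row of the table using only the standard library. Several things are assumptions for illustration: an ES 7.x node at `localhost:9200`, an index named `goods`, the default `_doc` type, and the typeless 7.x update endpoint `/{index}/_update/{id}`. The helper names `build_request` and `send` are hypothetical, and `send` is defined but never called, so the block runs without a live cluster.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local single-node ES


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the ES REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send the request and decode the JSON response (needs a running ES)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# One request per row of the table, against the hypothetical index "goods".
examples = [
    build_request("PUT", "goods/_doc/1", {"goodsName": "Commodity 1"}),
    build_request("POST", "goods/_doc", {"goodsName": "Commodity 2"}),
    build_request("POST", "goods/_update/1", {"doc": {"goodsName": "Renamed"}}),
    build_request("DELETE", "goods/_doc/1"),
    build_request("GET", "goods/_doc/1"),
    build_request("POST", "goods/_search", {"query": {"match_all": {}}}),
]

for request in examples:
    print(request.get_method(), request.full_url)
```

With a node running, `send(examples[0])` would execute the first request; the order of the calls matters, since the GET above would find nothing after the DELETE.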
Main field data types
- String: text, keyword
- Numeric: long, integer, short, byte, double, float, half_float, scaled_float
- Date: date
- Date with nanosecond resolution: date_nanos
- Boolean: boolean
- Binary: binary
- Range: integer_range, float_range, long_range, double_range, date_range
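To show how several of these types appear together, here is a small sketch of a mapping body built as a Python dictionary. The `orders` scenario and every field name in it are invented for illustration; the only firm requirement shown is that `scaled_float` needs a `scaling_factor`.

```python
import json

# Hypothetical mapping; field names are invented for illustration only.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},          # analyzed for full-text search
            "status": {"type": "keyword"},      # exact-match values
            "quantity": {"type": "integer"},
            # scaled_float stores price * 100 as a long internally
            "price": {"type": "scaled_float", "scaling_factor": 100},
            "paid": {"type": "boolean"},
            "createTime": {"type": "date"},
            "validDays": {"type": "integer_range"},
        }
    }
}

# This JSON could be sent as the body of a "PUT <index name>" request.
print(json.dumps(mapping, indent=2))
```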
Shards
Shards are divided into primary shards and replica shards.
Sharding is similar to splitting databases and tables in MySQL. When creating an index, you can set how many primary shards and replica shards ES creates. In versions before 7.0 an index gets five primary shards by default (one since 7.0), and each primary shard has one replica shard. Each document is stored in exactly one primary shard. When a document is inserted, its _id (unique id) determines which primary shard it is stored in.
A replica shard is simply a copy of a primary shard. It protects against data loss caused by hardware failure and also serves read requests such as searches and document retrieval. Because both primary and replica shards can handle read requests, the more replicas there are, the greater the search throughput the cluster can handle.
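The routing rule can be sketched as `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document's `_id`. The Python below is a simplified illustration only: ES uses a Murmur3 hash internally, while this sketch substitutes `zlib.crc32` so that it runs with the standard library, so the shard numbers it prints will not match those of a real cluster. The function name `pick_primary_shard` is hypothetical.

```python
import zlib


def pick_primary_shard(routing_value, number_of_primary_shards):
    """Simplified routing: hash the routing value, then take the modulo.

    Assumption: crc32 stands in for the Murmur3 hash that ES really uses.
    """
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


# Document ids taken from the search results in the example section,
# plus a few short ids.
doc_ids = ["vhqYlX4BFS2tMwa2XXMK", "wRqYlX4BFS2tMwa2dXO9", "1", "2", "3"]
for doc_id in doc_ids:
    print(doc_id, "-> primary shard", pick_primary_shard(doc_id, 5))
```

Because the shard number depends on the modulo, changing the number of primary shards would send existing ids to different shards, which is why that number is fixed when the index is created.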
Example
A more comprehensive tutorial on querying es data with kibana
Create a goods index without specifying a type (the type defaults to _doc), then insert one product document without specifying an id:

    # mappings defines the field names and data types of the index,
    # similar to table structure information in MySQL.
    # goodsName is mapped as text for full-text search, with a keyword
    # sub-field (goodsName.keyword) for exact-match search.
    # keyword fields keep doc values by default, so sorting and
    # aggregating on them is efficient.
    # ignore_above: values longer than 256 characters are not indexed
    # in the keyword sub-field.
    PUT goods
    {
      "mappings": {
        "properties": {
          "goodsId": { "type": "integer" },
          "goodsName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "createTime": { "type": "date" }
        }
      }
    }

    # Insert one product document without specifying an id
    POST goods/_doc
    {
      "goodsId": 1,
      "goodsName": "Commodity 1",
      "createTime": "1643178793939"
    }
Query all data:

    GET goods/_search

Result (hits.total.value shows that there are two documents in total):

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "goods",
            "_type" : "_doc",
            "_id" : "vhqYlX4BFS2tMwa2XXMK",
            "_score" : 1.0,
            "_source" : {
              "goodsId" : 1,
              "goodsName" : "Commodity 1",
              "createTime" : "1643178793939"
            }
          },
          {
            "_index" : "goods",
            "_type" : "_doc",
            "_id" : "wRqYlX4BFS2tMwa2dXO9",
            "_score" : 1.0,
            "_source" : {
              "goodsId" : 2,
              "goodsName" : "Commodity 2",
              "createTime" : "1643178793939"
            }
          }
        ]
      }
    }
Query by goodsId:

    GET goods/_search
    {
      "query": {
        "match": {
          "goodsId": 1
        }
      }
    }

Result:

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "goods",
            "_type" : "_doc",
            "_id" : "vhqYlX4BFS2tMwa2XXMK",
            "_score" : 1.0,
            "_source" : {
              "goodsId" : 1,
              "goodsName" : "Commodity 1",
              "createTime" : "1643178793939"
            }
          }
        ]
      }
    }
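Client code usually needs only the matched documents rather than the whole response envelope. The sketch below pulls the total and the `_source` of each hit out of a response shaped like the one above. The response is hard-coded (trimmed from the goodsId query result) so that the block runs without a cluster, and the function name `extract_documents` is hypothetical. It assumes the 7.x response shape, in which `hits.total` is an object; in 6.x it was a plain integer.

```python
# Trimmed copy of the goodsId query result, as a Python dictionary.
response = {
    "took": 0,
    "timed_out": False,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {
                "_index": "goods",
                "_id": "vhqYlX4BFS2tMwa2XXMK",
                "_score": 1.0,
                "_source": {
                    "goodsId": 1,
                    "goodsName": "Commodity 1",
                    "createTime": "1643178793939",
                },
            }
        ],
    },
}


def extract_documents(search_response):
    """Return (total hit count, list of documents with their _id merged in)."""
    hits = search_response["hits"]
    total = hits["total"]["value"]  # 7.x shape: total is an object
    docs = [{"_id": hit["_id"], **hit["_source"]} for hit in hits["hits"]]
    return total, docs


total, docs = extract_documents(response)
print(total, docs[0]["goodsName"])  # 1 Commodity 1
```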