Elasticsearch (ES) introduction: learning notes

Introduction:

Elasticsearch (ES) is a distributed, non-relational document store (a document is similar to a single record in a relational database). Every field of a document is indexed by default, so the data in any field can be searched. ES scales horizontally to hundreds of servers and can store and process petabytes of data. It exposes a RESTful API, and clients interact with ES through that API.

Use cases:

ES can serve as the database of a blog system: articles and other content are stored in ES and can be searched quickly by their content.
When a relational database holds so much data that queries become slow, the data can be imported into ES and queried there.
ES can be the core of a large-scale logging stack: Logstash collects the logs, ES stores, searches, and analyzes the massive volume of events, and Kibana visualizes the results. This combination, Elasticsearch + Logstash + Kibana, is known as ELK.

es Official Guide

Quick start project

On Windows, download ES and Kibana. Kibana is an open source analysis and visualization platform; its Dev Tools console can be used to run CRUD operations against the data in ES. Kibana and ES versions are released in step, so it is best to use the same version of both. Download address: ES Download, Kibana Download

After downloading, unzip both archives and, from cmd, run the .bat file in each bin directory. ES listens on port 9200 by default and Kibana on port 5601; visit localhost:5601 to open Kibana.

Concepts

Index: similar to a database in a traditional relational database. It is a place to store related documents.

Type: similar to a table in a traditional relational database. In old versions of ES, an index could contain multiple types. Within one index, fields with the same name in different mapping types are backed by the same Lucene field. When the types in an index share few or no fields, the data becomes sparse, which hurts ES query efficiency. For this reason, from 6.0.0 an index can contain only one type, types are deprecated in 7.0.0, and they are removed completely in 8.0.0. The type in 7.x defaults to _doc.

Document: equivalent to a single record in a relational database

Field: equivalent to a column in a relational database

Inverted index:

A relational database can speed up data retrieval by adding an index, such as a B-tree index, to a given column. Elasticsearch and Lucene use a structure called an inverted index for the same purpose. By default, ES builds an inverted index for every field in a document; a field without an inverted index cannot be searched. The inverted index is the reason ES queries are fast.

Steps for building an inverted index:

First, extract all the keywords contained in the documents.
Then save the relationship between each keyword and the documents that contain it.
Finally, index and sort the keywords themselves.

In this way, when a user searches for a keyword, ES first looks the keyword up in the keyword index and then follows the keyword-to-document relationship to find the documents.

As shown in the following three documents:

id	age	sex	name
1	18	female	Peking University
2	20	male	Hebei University Youth
3	18	male	Patriotic youth

Keywords and the documents that contain them:

Serial number	Keyword	Documents containing it
1	Peking	1
2	Hebei	2
3	University	1, 2
4	youth	2, 3
5	Patriotic	3
6	18	1, 3
7	20	2
8	male	2, 3
9	female	1

A keyword query can then retrieve the matching documents directly from this table.
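The three steps above can be sketched in a few lines of Python. This is only an illustration, not how Lucene is implemented: the `build_inverted_index` and `search` helpers are hypothetical names, the "analyzer" simply lowercases and splits on whitespace, and all fields share one index, whereas ES keeps a separate inverted index per field.

```python
from collections import defaultdict

# The three sample documents used in this section.
documents = {
    1: {"age": 18, "sex": "female", "name": "Peking University"},
    2: {"age": 20, "sex": "male", "name": "Hebei University Youth"},
    3: {"age": 18, "sex": "male", "name": "Patriotic youth"},
}


def build_inverted_index(docs):
    """Map each lowercased keyword to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, fields in docs.items():
        for value in fields.values():
            # Assumed analyzer: lowercase the value and split it on whitespace.
            for token in str(value).lower().split():
                postings[token].add(doc_id)
    # Sorting the keywords mirrors the "index and sort the keywords" step.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, keyword):
    """Return the ids of the documents containing the keyword (empty list if none)."""
    return index.get(keyword.lower(), [])


inverted_index = build_inverted_index(documents)
print(search(inverted_index, "youth"))  # prints [2, 3]
```

A lookup is a single dictionary access followed by reading a ready-made list of ids, which is why a keyword search does not need to scan every document.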

REST API

method	url	description
PUT	localhost:9200/{index name}/{type name}/{document id}	Create a document (with a specified document id)
POST	localhost:9200/{index name}/{type name}	Create a document (with a random document id)
POST	localhost:9200/{index name}/{type name}/{document id}/_update	Modify a document
DELETE	localhost:9200/{index name}/{type name}/{document id}	Delete a document
GET	localhost:9200/{index name}/{type name}/{document id}	Query a document by document id
POST	localhost:9200/{index name}/{type name}/_search	Query all data
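As a rough sketch of how a client might issue these calls, the Python below builds the HTTP requests with the standard library only. `build_request` and `send` are hypothetical helpers, not part of any ES client. The URLs mirror the layout of the table, with the type defaulting to `_doc`, so the sketch assumes an ES version that still accepts a type segment in the path (7.x also accepts `POST /{index}/_update/{id}` for updates). Nothing is sent unless `send` is called against a running node.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # default ES port mentioned above


def build_request(method, index, type_name="_doc", doc_id=None, suffix=None, body=None):
    """Build (but do not send) a request following the URL layout of the table."""
    parts = [BASE_URL, index, type_name]
    if doc_id is not None:
        parts.append(str(doc_id))
    if suffix is not None:
        parts.append(suffix)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request("/".join(parts), data=data, method=method, headers=headers)


def send(request):
    """Send the request and decode the JSON reply (requires a running ES node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# One request per row of the table; the index name "goods" is only an example.
create = build_request("PUT", "goods", doc_id=1, body={"goodsId": 1})
create_auto_id = build_request("POST", "goods", body={"goodsId": 2})
update = build_request("POST", "goods", doc_id=1, suffix="_update",
                       body={"doc": {"goodsName": "Commodity 1"}})
delete = build_request("DELETE", "goods", doc_id=1)
get_by_id = build_request("GET", "goods", doc_id=1)
search_all = build_request("POST", "goods", suffix="_search",
                           body={"query": {"match_all": {}}})

print(create.get_method(), create.full_url)  # PUT http://localhost:9200/goods/_doc/1
```

Keeping request construction separate from `send` makes it easy to inspect the method and URL before anything touches the cluster.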

Main field data types

String: text, keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Date: date
Date nanoseconds: date_nanos
Boolean: boolean
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range

Shards

Shards are divided into primary shards and replica shards.

Sharding is similar to splitting databases and tables in MySQL. When creating an index, you can set how many primary shards and replica shards ES creates. Before 7.0 an index was assigned five primary shards by default; from 7.0 the default is one. In both cases each primary shard has one replica shard by default. Each document is stored in exactly one primary shard. When a document is inserted, its _id (unique id) determines which primary shard it is stored in.
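The way a document's _id selects a primary shard can be pictured as a hash of the id taken modulo the number of primary shards. The sketch below is a simplification under stated assumptions: `pick_primary_shard` is a hypothetical helper, and CRC32 stands in for the Murmur3 hash that ES applies to the routing value (which defaults to the _id), so the shard numbers it prints will not match a real cluster.

```python
import zlib


def pick_primary_shard(doc_id, number_of_primary_shards):
    """Return the primary shard number for a document id.

    Stand-in for ES routing: hash the id, then take it modulo the shard count.
    CRC32 is used here only because it is in the standard library.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


# Distribute a few example ids over five primary shards.
for doc_id in ["vhqYlX4BFS2tMwa2XXMK", "wRqYlX4BFS2tMwa2dXO9", "1", "2"]:
    print(doc_id, "->", pick_primary_shard(doc_id, 5))
```

Because the result depends on the shard count, changing the number of primary shards would send existing ids to different shards, which is one way to see why that number is fixed when the index is created.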

A replica shard is just a copy of a primary shard. It protects against data loss caused by hardware failure and also serves read requests such as searches or document retrieval. Because both primary and replica shards can handle read requests, the more replicas there are, the greater the search throughput the cluster can handle.

ES sharding in depth

Example

A more comprehensive tutorial on querying es data with kibana

Aggregate query

// Create a goods index without specifying a type. The default type is _doc
PUT goods
{
  // mappings defines the field names and data types of the index, similar to the table structure in MySQL.
  "mappings": {
    "properties": {
      "goodsId": {
        "type": "integer"
      },
      // goodsName is mapped as text for full-text search, with a keyword sub-field (goodsName.keyword) for exact-match search.
      // keyword values are not analyzed: each value is indexed as a single term, which suits exact matching, sorting, and aggregations
      "goodsName": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            // Maximum length of a value to index. Longer strings are not indexed by this sub-field
            "ignore_above": 256
          }
        }
      },
      "createTime": {
        "type": "date"
      }
    }
  }
}
// Insert a product document without specifying an id (ES generates one)
POST goods/_doc
{
  "goodsId":1,
  "goodsName":"Commodity 1",
  "createTime":"1643178793939"
}

// Query all data (a second product, goodsId 2, was inserted the same way)
GET goods/_search
// result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2, // Two documents matched in total
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "vhqYlX4BFS2tMwa2XXMK",
        "_score" : 1.0,
        "_source" : {
          "goodsId" : 1,
          "goodsName" : "Commodity 1",
          "createTime" : "1643178793939"
        }
      },
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "wRqYlX4BFS2tMwa2dXO9",
        "_score" : 1.0,
        "_source" : {
          "goodsId" : 2,
          "goodsName" : "Commodity 2",
          "createTime" : "1643178793939"
        }
      }
    ]
  }
}

// Query by goodsId
GET goods/_search
{
  "query": {
    "match": {
      "goodsId": 1
    }
  }
}
// result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "vhqYlX4BFS2tMwa2XXMK",
        "_score" : 1.0,
        "_source" : {
          "goodsId" : 1,
          "goodsName" : "Commodity 1",
          "createTime" : "1643178793939"
        }
      }
    ]
  }
}
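On the client side, the interesting parts of a search response are usually the hit count and the `_source` of each hit. A minimal sketch, assuming the 7.x response shape shown above (where `hits.total` is an object with a `value`); `total_hits` and `extract_sources` are hypothetical helper names, and `sample_response` is a trimmed copy of the result above.

```python
def total_hits(response):
    """Return the number of matching documents reported by the search."""
    return response["hits"]["total"]["value"]


def extract_sources(response):
    """Return the original documents (_source) of all returned hits."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Trimmed copy of the goodsId query result shown above.
sample_response = {
    "took": 0,
    "timed_out": False,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {
                "_index": "goods",
                "_type": "_doc",
                "_id": "vhqYlX4BFS2tMwa2XXMK",
                "_score": 1.0,
                "_source": {
                    "goodsId": 1,
                    "goodsName": "Commodity 1",
                    "createTime": "1643178793939",
                },
            }
        ],
    },
}

print(total_hits(sample_response))  # prints 1
for product in extract_sources(sample_response):
    print(product["goodsId"], product["goodsName"])  # prints 1 Commodity 1
```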

Keywords: Big Data ElasticSearch search engine

Added by daloss on Mon, 07 Feb 2022 03:25:07 +0200
