Elasticsearch installation, configuration and use

Elasticsearch (ES) earns its place in the Elastic Stack family through its powerful search capability: at its core, it is a distributed search engine.

Install

The installation in this article is based on an Ubuntu 20.04 LTS virtual machine and uses the .tar.gz archive (download the Linux x86_64 .tar.gz package from the Elasticsearch downloads page).

After extracting the downloaded archive with tar -xzf, open the configuration file config/elasticsearch.yml and make the edits listed below (a consolidated snippet follows the list).
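
For example (the archive name depends on the version you downloaded; 7.15.1 below is only a placeholder):

tar -xzf elasticsearch-7.15.1-linux-x86_64.tar.gz
cd elasticsearch-7.15.1/config
vim elasticsearch.yml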

  • Uncomment node.name: node-1 (delete the leading #)
  • network.host: 0.0.0.0 can also be set to the machine's own IP address; the default is localhost, which cannot be reached from other computers
  • cluster.initial_master_nodes: ["node-1"]; the name here must match node.name above
  • Append the following two lines to the end of the file so that browser-based tools can access ES across origins:
    http.cors.enabled: true
    http.cors.allow-origin: "*"
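
Putting those edits together, the changed lines of elasticsearch.yml look like this (node name and IP follow this article's setup):

node.name: node-1
network.host: 0.0.0.0
cluster.initial_master_nodes: ["node-1"]
http.cors.enabled: true
http.cors.allow-origin: "*"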

Because ES is running in a virtual machine, you may hit the error that vm.max_map_count is too low. The solution:
run vim /etc/sysctl.conf and add the line vm.max_map_count=655360 (ES requires at least 262144).
Save with :wq, then run sysctl -p to apply the change.
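
Equivalently, the same change can be made without opening an editor (assuming the shell user can use sudo):

echo 'vm.max_map_count=655360' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p                 # apply the new setting
sysctl vm.max_map_count        # verify the value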

ES is developed in Java, so a JDK must be available. Recent versions ship with a bundled JDK, and the ES version used here requires JDK 11 by default, so if JDK 8 is installed on your machine you may need to upgrade. The alternative is to use the bundled JDK.

Normally, when ES does not find a JDK configured in the environment variables (JAVA_HOME), it falls back to the bundled one. The command to delete an environment variable is: unset <variable name>, e.g. unset JAVA_HOME.

Finally, start ES: enter the bin directory and run the elasticsearch script. To run it in the background, use nohup ./elasticsearch &.
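
A minimal sketch of the start-up and verification steps (the directory name again depends on the version you extracted):

cd elasticsearch-7.15.1/bin
./elasticsearch                    # run in the foreground
nohup ./elasticsearch &            # or run in the background
curl http://192.168.37.128:9200    # should return the cluster info as JSON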

Use

In actual use, we create indices and data streams through templates.
ES has two types of templates:

  • index template
  • component template

When used, component templates are composed into an index template. When an index or data stream is created, if its name matches the pattern in an index template, the new index is configured with the settings from that template.

Another important point is how long ES retains data.
We can create a policy and reference it from the index template to configure index rollover.

With rollover, one logical index may produce multiple physical indices, but an application (calling the REST API) has no way to dynamically track the changing index name.

This is where aliases come in.
An alias can point to multiple indices. With rollover, you only configure the alias name in the index template; all subsequent operations go through the alias, which solves the problem above.

Now let's create the components the index template needs.
1. component template

You can create many component templates of different kinds and compose them into index templates as needed.
The REST API for creating a component template is:

PUT http://192.168.37.128:9200/_component_template/<custom component template name>
// The JSON below is the request body
{
   "template":{
       "mappings":{
           "properties":{
               "address":{
                   "type":"keyword"
               }
           }
       }
   }
}

192.168.37.128 is my local virtual machine's IP, and 9200 is ES's default external HTTP port. (9300 is the port used internally by the ES cluster.)

After successful creation, it will return:

{
    "acknowledged": true
}

Later, once the component is composed into an index template, the mappings defined in the component are applied to every index that matches that index template.
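
The index template created later composes two component templates, component_template1 and component_template2. A second one is created the same way; for illustration, here is a hypothetical one carrying index settings instead of mappings (name and setting are assumptions, not from the original setup):

PUT http://192.168.37.128:9200/_component_template/component_template2
// Request body
{
   "template":{
       "settings":{
           "index.refresh_interval": "5s"
       }
   }
}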

The API to list all component templates:

GET http://192.168.37.128:9200/_component_template/
2. policy

A policy uses ES's index lifecycle management (ILM) to manage data dynamically. When the data volume is not very large, we do not need to configure all of its phases.
ILM defines the lifecycle as five phases:

  • Hot: The index is actively being updated and queried.
  • Warm: The index is no longer being updated but is still being queried.
  • Cold: The index is no longer being updated and is queried infrequently. The information still needs to be searchable, but it's okay if those queries are slower.
  • Frozen: The index is no longer being updated and is queried rarely. The information still needs to be searchable, but it's okay if those queries are extremely slow.
  • Delete: The index is no longer needed and can safely be removed.

In general, we only need to define the Hot and Delete phases.

The REST API for creating a policy is:

PUT http://192.168.37.128:9200/_ilm/policy/<custom policy name>
// Request body
{
    "policy":{
        "phases":{
            "hot":{
                "actions":{
                    "rollover":{
                    //When each Index stores up to 3 pieces of data, start the rollover and enter the next configuration
                    //Setting stage: delete deletes the entire current index and generates the next index
                    //That is, even if there are 10 pieces of data in the current index, they will be deleted together
                        "max_docs":"3",  
                        "max_age":"7d" //If the above is not satisfied, the index has existed for 7 days. It will also trigger the rollover and enter the next stage.
                    }
                }
            },
            "delete":{
                "min_age":"0ms", //By default, the following deletion operations are performed immediately after triggering (delete the corresponding data in the Index that meets the configuration, such as the number of max_docs, or the storage time reaches 7 days)
                "actions":{
                    "delete":{}
                }
            }
        }
    }
}

result:

{
    "acknowledged": true
}

The rollover check runs periodically, every 10 minutes by default. So during a test you may find that the conditions are met, yet the index does not move to the next phase as expected until the next check runs.

The configuration above is for testing; you would not set such short limits in a production environment, but they make it easy to verify that the configuration behaves correctly.
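
While testing, the ILM explain API shows which phase each matching index is currently in (times-* matches the index template pattern used later in this article):

GET http://192.168.37.128:9200/times-*/_ilm/explain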

If needed during testing, the API to change the rollover check interval is:

PUT http://192.168.37.128:9200/_cluster/settings
// Request body
{
    "transient": {
        "indices.lifecycle.poll_interval": "20s" // run the check every 20 seconds
    }
}

We can also view the result of rollover with the cat shards API:

GET http://192.168.37.128:9200/_cat/shards/times-*   // times-* is the pattern configured in the index template

Example result (the replica shard shows UNASSIGNED because this is a single-node cluster with number_of_replicas set to 1):

times-000005 0 p STARTED    1 3.8kb 192.168.37.128 node-1
times-000005 0 r UNASSIGNED     

OK, the component template and policy we need are now in place. Next, create the index template.

3. index template

Adding "data_stream": {} to an index template determines whether the template applies to index creation or to data stream creation.
The API to create an index template is:

PUT http://192.168.37.128:9200/_index_template/<custom index template name>
// Request body
{
  // The template is applied whenever an index whose name starts with times- is created
  "index_patterns":["times-*"],
  // "data_stream": {},  // if present, the template applies to data streams; otherwise to indices
  "template":{
      "settings":{
          "number_of_shards":1,
          "number_of_replicas":1,
          "index.lifecycle.name": "times-policy", // the name of the policy just created
          "index.lifecycle.rollover_alias": "times" // the alias that stands in for the index name from now on
      }
  },
  // ES ships built-in templates with priority 100, and the template with the higher
  // priority wins: above 100 yours overrides them, below 100 the built-in ones win
  "priority":50,
  // The two component templates created earlier
  "composed_of":["component_template1","component_template2"],
  "version":1
}

Then, when creating the first index, you must mark it as the write index; this setting cannot be placed directly in the index template, or it will not take effect.

Create index:

// By convention, the first index name ends with 6 digits: five zeros and a one
PUT http://192.168.37.128:9200/times-000001
// Request body
{
    "aliases":{
        "times":{
            "is_write_index":true
        }
    }
}

Once this is done, all subsequent writes, deletions, updates, and queries against the index go through the alias (times here).
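
For example, indexing a document through the alias (the address field matches the mapping defined in the component template earlier; the value is illustrative):

POST http://192.168.37.128:9200/times/_doc
// Request body
{
    "address": "beijing"
}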

Also note:
When creating a data stream, you specify is_write_index directly in the index template.

{
  "index_patterns":["tests-*"],
  "data_stream":{},
  "template":{
      "settings":{
          "number_of_shards":1,
          "number_of_replicas":1,
          "index.lifecycle.name": "times-policy", 
          "index.lifecycle.rollover_alias": "tests" 
      },
      "aliases":{
        "tests":{
            "is_write_index":true
        }
    }
  },
  "priority":50,
  "composed_of":["component_template1","component_template2"],
  "version":1
}

Then create the data stream directly, and the configuration above is applied.
Create the data stream:

PUT http://192.168.37.128:9200/_data_stream/tests-000001
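
To confirm that the data stream exists and to see its backing indices:

GET http://192.168.37.128:9200/_data_stream/tests-000001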

Add data to data stream:

POST http://192.168.37.128:9200/tests/_bulk?refresh
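
A data stream only accepts the create action in _bulk, and every document must carry an @timestamp field; a minimal request body might look like this (values are illustrative):

{ "create":{} }
{ "@timestamp": "2021-11-11T00:00:00Z", "address": "beijing" }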

After data is successfully added to the data stream, the response shows the backing index it was written to, which looks like:
.ds-tests-000001-2021.11.11-000001; each rollover increments the final number by 1.

To view data in the data stream:

GET http://192.168.37.128:9200/tests/_search

Postscript:
Index names must comply with the following rules:

  • Lowercase only
  • Cannot contain \, /, *, ?, ", <, >, |, space, comma, or #
  • Cannot start with -, _, or +
  • Cannot be . or ..
  • Cannot be longer than 255 bytes
