A Hands-On Guide to Building an Elasticsearch Cluster

1, Why build an Elasticsearch cluster?

Elasticsearch is an open-source, highly scalable, distributed full-text search engine.

It hides the complexity of Lucene behind a simple RESTful API, which makes full-text search easy to use.

ES mainly solves the following problems:

  • Retrieving relevant data
  • Returning statistics
  • Doing both fast

1. High availability

High availability means minimizing the time during which the system cannot serve requests. Suppose a service node goes down at some point; that node is temporarily unavailable. If the system can keep serving requests regardless, its availability is 100%. Reducing the unavailable time of the Elasticsearch service is exactly what a cluster is for.

Of course, to ensure that ES data is not lost when a node goes down, the data must be backed up. This is where the concepts of shards and replicas come in.

Sharding: the data is split into multiple parts. All of the data logically belongs to the ES index, but the index itself only keeps track of its shards; each document is routed to one of them. For example, if an index has 5 shards, they are numbered 0 through 4, and the shard that stores a given document is chosen by the routing formula shard_num = hash(_routing) % number_of_primary_shards, which distributes the data evenly across the cluster.

If one or a few shards end up holding a disproportionately large amount of data, you can enable index routing partitions with index.routing_partition_size. Note the restrictions that come with it: a join field mapping can no longer be created, and _routing becomes a mandatory parameter when writing data.
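To make the routing parameter concrete, here is a minimal sketch of writing and reading a document with explicit routing (the index name my_index and the routing value user1 are made-up examples):

# Index a document with an explicit routing value (hypothetical index and value)
curl -X PUT "localhost:9200/my_index/_doc/1?routing=user1" -H 'Content-Type: application/json' -d '{"title": "hello"}'
# The same routing value must be supplied when fetching the document
curl -X GET "localhost:9200/my_index/_doc/1?routing=user1"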
Replica: a replica is a copy of a primary shard. The number of replicas can be adjusted flexibly to improve the fault tolerance of the whole cluster. Note that a replica shard is never allocated on the same node as its primary shard. In general, Elasticsearch tries to store the different shards of an index on different hosts and, as far as possible, to keep the replicas of a shard on different hosts as well, which improves fault tolerance and availability.
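Shard and replica counts are set per index at creation time; number_of_replicas can be changed later, while number_of_shards cannot (short of a _split or a reindex). A sketch, again with a hypothetical index name:

# Create an index with 5 primary shards and 1 replica per primary (hypothetical name)
curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'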

2. Health status

An Elasticsearch cluster has three health states (a single-node Elasticsearch instance also counts as a cluster); the API call after the list shows how to query them.

  • Green: the primary shards and replicas of every index are all allocated and available; the cluster is 100% operational;
  • Yellow: all primary shards are allocated and available, but not every replica is; a replica may be missing, or a replica of some index may not have been allocated yet. Shards can be allocated or moved to a specified node through the reroute API's move, cancel, and allocate commands, keeping in mind that a primary shard and its replica must never end up on the same node. In this state, fault tolerance and high availability are weakened;
  • Red: not all primary shards are available, which means there is likely a risk of data loss. Note that a single-node Elasticsearch instance can only be yellow at best, because its replicas can never be allocated;
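The state of a running cluster can be checked directly with the cluster health API:

# Query cluster health; the "status" field reports green, yellow, or red
curl -X GET "localhost:9200/_cluster/health?pretty"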

3. Storage space

An ES cluster with multiple nodes has more storage space than a single node. The data and log directories (including their mount points) can be configured in elasticsearch.yml.

2, ES cluster and node types

A cluster consists of multiple nodes. Normally each host is one node; in the pseudo-cluster built below, each ES instance on the host is a node.

Picture a cluster in which Node-1 is the master node; the master has the authority to control the entire cluster. Each node holds three shards: P0, P1, and P2 are the primary shards, and the shards whose names begin with R are the replicas of the corresponding primaries. There are three primary shards in total, and each primary has two replicas.

  • Master node: the master node is responsible for cluster-level operations, such as creating or deleting indexes, tracking which nodes are part of the cluster, and deciding which shards are allocated to which nodes. A stable master node is very important to cluster health. By default, any node in the cluster may be elected master. Indexing, search, and query operations consume a lot of CPU, memory, and IO, so separating the master role from the data role is the better choice for cluster stability. Although a master node can also act as a coordinator, routing searches and forwarding writes from clients to the data nodes, it is best not to burden dedicated master nodes with this; an important principle is that the master should do as little work as possible;
  • Data node: data nodes store the index data and serve document create, delete, update, query, and aggregation operations. They place heavy demands on CPU, memory, and IO; during tuning, monitor the data nodes' status and add new nodes to the cluster when resources run short;
  • Load balancing node: also known as the client or coordinating node. When a node is configured as neither master-eligible nor a data node, it can only route requests, handle the search reduce phase, distribute bulk indexing, and so on; in essence, it acts as an intelligent load balancer. An independent client node is very useful in a large cluster: by joining the cluster it learns the cluster state and can route requests directly according to that state, relieving the master and data nodes;
  • Preprocessing node: also called the ingest node; it can preprocess data before it is indexed. By default, every node supports ingest operations, or a node can be dedicated to ingest. A node may hold several of these roles at once; for example, one node can be master-eligible, a data node, and an ingest node at the same time, while a node that is neither master-eligible nor a data node is a load balancing node. The concrete roles are set in the configuration file, as sketched below;
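For reference, a minimal sketch of the role settings in elasticsearch.yml that produce a coordinating-only (load balancing) node; the same three flags, set differently, yield the other node types:

node.master: false # Not master-eligible
node.data: false   # Stores no index data
node.ingest: false # Runs no ingest pipelines
# With all three set to false, the node only routes requests and merges results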

When Elasticsearch processes a query request, the flow across nodes is as follows (a sample request follows the list):

  • The request arrives at a node, which acts as the coordinating node for that request;
  • The coordinating node broadcasts the request to a copy (primary or replica) of every shard of the index on the data nodes;
  • Each shard executes the search locally and returns its results;
  • The coordinating node merges and sorts the per-shard results and returns them to the user;
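As a minimal illustration (using the pseudo-cluster built below and a hypothetical index my_index), the same search can be sent to any node, for example the data node listening on port 9201, which then coordinates the scatter/gather across shards:

# Any node that receives the request acts as its coordinating node
curl -X GET "localhost:9201/my_index/_search?pretty" -H 'Content-Type: application/json' -d '{"query": {"match_all": {}}}'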

3, Build an Elasticsearch 7.x pseudo-cluster

1. Download ES

Here, we install ES under the /opt directory of the server.

# Use the Huawei Cloud mirror (very fast) and rename the archive to elasticsearch7.4-x86.tar.gz (x86 is the CPU instruction architecture)
$ wget -O elasticsearch7.4-x86.tar.gz https://mirrors.huaweicloud.com/elasticsearch/7.4.0/elasticsearch-7.4.0-linux-x86_64.tar.gz
# Extract the archive
$ tar -zxvf elasticsearch7.4-x86.tar.gz

2. Modify jvm.options and elasticsearch.yml configuration

# Enter es configuration directory
$ cd /opt/elasticsearch7.4-x86/config

# Modify jvm related parameters and adjust jvm heap memory size
$ vim jvm.options

# Configure es
$ vim elasticsearch.yml

jvm.options configuration

## Modify the JVM heap size; here we set it to 512 MB
-Xms512m
-Xmx512m

Configure elasticsearch.yml

# Do not set index-level configuration in this file
cluster.name: waybill-center # Set the cluster name; nodes with the same cluster.name join the same cluster
# ------------------------------------ Node ------------------------------------
node.name: es-master # Configure node name
node.master: true # Whether this node is eligible to be elected master; by default, the first master-eligible node to start becomes the master
node.data: false # Whether to store index data; the default is true. In a large cluster, one role per node is recommended; do not mix roles
node.ingest: false # By default, every node can act as an ingest node
# ----------------------------------- Paths ------------------------------------
#path.conf: /opt/elasticsearch7.4-x86/config # Set the storage path of the configuration file. The default is the config directory under the es root directory
path.data: /data/es-master/data # Set the index data storage path. The default is the data directory under the es root directory
path.logs: /data/es-master/log # Set the log file storage path. The default is the log directory under the es root directory
# ----------------------------------- Memory -----------------------------------
#bootstrap.memory_lock: true # Lock the process memory to prevent swapping, which would cripple the JVM when system memory runs low
# ---------------------------------- Network -----------------------------------
#network.host: 192.168.0.1 # Sets both bind_host and publish_host at once
network.bind_host: 0.0.0.0 # The IP the node binds to for HTTP access
network.publish_host: x.x.x.x # The IP other nodes use to reach this node; must be a real, reachable (e.g. intranet) IP
# Set a custom port for HTTP:
http.port: 9200 # Set external service http port
transport.tcp.port: 9300 # Set the tcp port for interaction between nodes
transport.tcp.compress: true # Set whether to compress data during tcp transmission. The default value is false
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["x.x.x.x:9300", "x.x.x.x:9301","x.x.x.x:9302"]
# List the cluster nodes by IP; hostnames such as els or els.shuaiguoxia.com can also be used, provided every node can resolve them
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["es-master"]
discovery.zen.minimum_master_nodes: 2  # To avoid split-brain, set this to (master-eligible nodes / 2) + 1, e.g. 3 / 2 + 1 = 2; deprecated and ignored in 7.x, where cluster.initial_master_nodes above is used instead
# For more information, consult the discovery and cluster formation module documentation.
# ---------------------------------- Gateway -----------------------------------
gateway.recover_after_nodes: 3 # Start state recovery only after N cluster nodes are up; the default is 1

3. Configure ES to use its bundled JDK (recommended)

Modify the elasticsearch launch script in the bin directory:

source "`dirname "$0"`"/elasticsearch-env # elasticsearch-env is sourced first; it sets up the environment
# Use the JDK bundled with ES
export JAVA_HOME=$ES_HOME/jdk/
export PATH=$JAVA_HOME/bin:$PATH

A look inside elasticsearch-env:

#!/bin/bash

set -e -o pipefail

CDPATH=""

SCRIPT="$0"

# SCRIPT might be an arbitrarily deep series of symbolic links; loop until we
# have the concrete path
while [ -h "$SCRIPT" ] ; do
  ls=`ls -ld "$SCRIPT"`
  # Drop everything prior to ->
  link=`expr "$ls" : '.*-> \(.*\)$'`
  if expr "$link" : '/.*' > /dev/null; then
    SCRIPT="$link"
  else
    SCRIPT=`dirname "$SCRIPT"`/"$link"
  fi
done

# determine Elasticsearch home; to do this, we strip from the path until we find
# bin, and then strip bin (there is an assumption here that there is no nested
# directory under bin also named bin)
ES_HOME=`dirname "$SCRIPT"` # ES_HOME is set here

# now make ES_HOME absolute
ES_HOME=`cd "$ES_HOME"; pwd`

while [ "`basename "$ES_HOME"`" != "bin" ]; do
  ES_HOME=`dirname "$ES_HOME"`
done
ES_HOME=`dirname "$ES_HOME"` 

4. Copy the current ES directory to create the other data nodes

Copy current ES directory

# Make two copies, one per data node
cp -r /opt/elasticsearch7.4-x86 /opt/es-data-node1 
cp -r /opt/elasticsearch7.4-x86 /opt/es-data-node2

Modify the configuration files of the two data nodes, starting with the first:

# vim /opt/es-data-node1/config/elasticsearch.yml
node.name: es-node1 # Configure node name
node.ingest: false # By default, every node can act as an ingest node
node.master: false # Not eligible to be elected master
node.data: true # Stores index data; the default is true
#path.conf: /opt/es-data-node1/config # Set the storage path of the configuration file. The default is the config directory under the es root directory
path.data: /data/es-node1/data # Set the index data storage path. The default is the data directory under the es root directory
path.logs: /data/es-node1/log # Set the log file storage path. The default is the log directory under the es root directory
http.port: 9201 # Set external service http port
transport.tcp.port: 9301 # Set the tcp port for interaction between nodes

Then modify the second data node:

# vim /opt/es-data-node2/config/elasticsearch.yml
node.name: es-node2 # Configure node name
node.master: false # Not eligible to be elected master
node.data: true # Stores index data; the default is true
#path.conf: /opt/es-data-node2/config # Set the storage path of the configuration file. The default is the config directory under the es root directory
path.data: /data/es-node2/data # Set the index data storage path. The default is the data directory under the es root directory
path.logs: /data/es-node2/log # Set the log file storage path. The default is the log directory under the es root directory
http.port: 9202 # Set external service http port
transport.tcp.port: 9302 # Set the tcp port for interaction between nodes

5. Create the ES data and log directories

# Create (or adjust) the paths to match each node's configuration file
mkdir -p /data/es-master/data
mkdir -p /data/es-node1/data
mkdir -p /data/es-node2/data
mkdir -p /data/es-master/log
mkdir -p /data/es-node1/log
mkdir -p /data/es-node2/log

6. ES cannot be started as root, so create a new user

groupadd es # Create the es user group
useradd es -g es # Create the es user and add it to the es group
passwd es # Set the user's password (e.g. 123123)

7. Give the es user ownership of the directories

chown -R es:es  /data/es-master/
chown -R es:es  /data/es-node1/
chown -R es:es  /data/es-node2/
# The es user must also own the install directories (ES writes the keystore and logs there on first start)
chown -R es:es /opt/elasticsearch7.4-x86 /opt/es-data-node1 /opt/es-data-node2

8. Start ES cluster

# Need to switch to es user
su es
# Start service
/opt/elasticsearch7.4-x86/bin/elasticsearch -d
/opt/es-data-node1/bin/elasticsearch -d
/opt/es-data-node2/bin/elasticsearch -d

The -d flag runs ES in the background; adding the -p flag records the process ID in a file:

/opt/elasticsearch7.4-x86/bin/elasticsearch -p /tmp/elasticsearch-pid -d

To stop the process:

# View the running pid
cat /tmp/elasticsearch-pid && echo
# Stop the process gracefully
kill -SIGTERM `cat /tmp/elasticsearch-pid`

4, Startup exceptions (system resource limits)

1. Maximum file descriptors exceeded

If the error mentions max file descriptors, modify limits.conf:

vim /etc/security/limits.conf # Edit the limits file

Append the following limits to the file:

*  soft nofile 65536
*  hard nofile 65536
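After logging in again as the es user, ulimit can confirm that the new limit took effect:

# Show the soft open-files limit for the current session
ulimit -n
# Show the hard limit
ulimit -Hn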

2. Maximum number of threads exceeded

If the error mentions max number of threads, modify limits.d/20-nproc.conf:

vim /etc/security/limits.d/20-nproc.conf

Modify the parameters:

*  soft    nproc   4096
*  hard    nproc   4096
# Or remove the limit entirely:
# root  soft  nproc  unlimited
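ulimit likewise reports the process limit (threads count against it) for the current session:

# Show the max user processes limit
ulimit -u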

3. Maximum virtual memory exceeded

If the error mentions max virtual memory areas, modify sysctl.conf:

vim /etc/sysctl.conf

Modify the parameter:

vm.max_map_count=262144
# Or apply the change immediately (non-persistent) with:
sysctl -w vm.max_map_count=262144
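The edit to /etc/sysctl.conf itself only takes effect after the file is reloaded:

# Reload /etc/sysctl.conf so the persistent setting takes effect
sysctl -p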

5, Verify on the master node machine

After the cluster is configured and started: because the master node exposes http.port and is mainly responsible for managing the cluster, it is recommended to point kibana at the master node only.

For verification, however, the following command can be used to check the cluster's setup:

curl -X GET http://localhost:9200/_cat/nodes # List the nodes in the cluster

# The response looks like the following; the asterisk marks the elected master
127.0.0.1 69 93 1 0.06 0.07 0.06 dilm * es-master
127.0.0.1 69 93 1 0.06 0.07 0.06 dil  - es-node1
127.0.0.1 56 93 1 0.06 0.07 0.06 dil  - es-node2 

 
