Elasticsearch tokenizers, Docker installation, Nginx

Brief introduction

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/analysis.html

A tokenizer receives a character stream, divides it into independent tokens (terms, usually individual words), and then outputs a token stream.

For example, the whitespace tokenizer splits text whenever it encounters whitespace characters: it turns the text "Quick brown fox!" into [Quick, brown, fox!].
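
You can try this yourself with the _analyze API:

POST _analyze
{
  "analyzer": "whitespace",
  "text": "Quick brown fox!"
}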

The tokenizer is also responsible for recording the order, or position, of each term (used for phrase and word-proximity queries), as well as the start and end character offsets of the original word each term represents (used for highlighting matched content).

Elasticsearch provides many built-in tokenizers that can be used to build custom analyzers.

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
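
The response lists each token with the position and character offsets described above. Abridged output for the request above:

{
  "tokens": [
    {"token": "the", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0},
    {"token": "2", "start_offset": 4, "end_offset": 5, "type": "<NUM>", "position": 1},
    {"token": "quick", "start_offset": 6, "end_offset": 11, "type": "<ALPHANUM>", "position": 2},
    ...
  ]
}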

The built-in tokenizers are designed mainly for English. For Chinese, we need to install an additional tokenizer to get proper word segmentation.

Install the IK tokenizer

Preparation:

The IK tokenizer is an Elasticsearch plugin, so it is installed into Elasticsearch's plugins directory. When we started Elasticsearch with Docker, this directory was mounted to /mydata/elasticsearch/plugins on the host.

The IK tokenizer version must match the Elasticsearch version; the version used here is 7.4.2. Download it from the project's GitHub releases page, or from a mirror.

1. Download

# Enter the mounted plug-in directory /mydata/elasticsearch/plugins
cd /mydata/elasticsearch/plugins

# Install the wget download tool (skip if already installed)
yum install -y wget

# Download the matching version of the IK tokenizer (7.4.2 here)
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip

The IK tokenizer archive is now in the mounted plugins directory. Let's go inside the es container and check that it arrived:

# Enter the container
docker exec -it elasticsearch /bin/bash

# View the es plugin directory
ls /usr/share/elasticsearch/plugins

# You should see elasticsearch-analysis-ik-7.4.2.zip

Therefore, we only need to work in the mounted host directory /mydata/elasticsearch/plugins.

2. Unzip

# Enter the plug-in directory of es
cd /mydata/elasticsearch/plugins

# Unzip it to the ik directory under the plugins directory
unzip elasticsearch-analysis-ik-7.4.2.zip -d ik

# Delete the downloaded archive
rm -f elasticsearch-analysis-ik-7.4.2.zip

# Relax folder permissions so the plugin is readable inside the container
chmod -R 777 ik/
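
As a quick sanity check, the unzipped ik directory should contain the plugin jar, its dependency jars, and a config folder, something like:

ls ik/
# commons-codec-*.jar  config/  elasticsearch-analysis-ik-7.4.2.jar  plugin-descriptor.properties  ...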

3. Check the installed ik plugin

# Enter es container
docker exec -it elasticsearch /bin/bash

# Enter the es bin directory
cd /usr/share/elasticsearch/bin

# List installed plugins; the output should include ik
elasticsearch-plugin list

# Exit container
exit

# Restart Elasticsearch
docker restart elasticsearch
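
If you want extra confirmation, the Elasticsearch startup log should mention the plugin being loaded (the exact wording varies by version):

# Search the container log for the plugin load message
docker logs elasticsearch | grep -i analysis-ik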

4. Test the IK tokenizer

Smart word segmentation (ik_smart)

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "I am Chinese,"
}

Maximum word segmentation (ik_max_word)

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "I am Chinese,"
}
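
For this sentence, ik_smart typically produces the coarse-grained tokens [我, 是, 中国人], while ik_max_word exhaustively enumerates sub-words such as [我, 是, 中国人, 中国, 国人]; that is the difference between the two modes.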

Custom extended dictionary

Some relatively new words are not recognized by the IK tokenizer, so we need to customize its dictionary.

We put a custom dictionary file on nginx and point the IK configuration file in es at it, so that IK fetches the extended dictionary remotely.

Note: by default, nginx serves requests from the html static directory (mounted at /mydata/nginx/html).

For nginx installation, see the section "Install Nginx with Docker" below.

1. Create a custom dictionary file in nginx

echo "Shang Silicon Valley" > /mydata/nginx/html/fenci.txt

The file's default request address is then http://IP:port/fenci.txt.

If you want to add a new word, just append it on a new line in the file and save.
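
To check that nginx actually serves the dictionary, request it from the host (the IP below is a placeholder; use your virtual machine's address):

curl http://192.168.56.10/fenci.txt
# Should print: 尚硅谷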

2. Configure the custom dictionary in es

# 1. Open and edit the ik plug-in configuration file
vim /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml

Uncomment the remote dictionary line in this file and point it at your own dictionary file.
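
After editing, the relevant part of IKAnalyzer.cfg.xml should look roughly like this (the IP is a placeholder for your nginx host):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Local extended dictionary and stopwords (unused here) -->
    <entry key="ext_dict"></entry>
    <entry key="ext_stopwords"></entry>
    <!-- Remote extended dictionary: the file served by nginx -->
    <entry key="remote_ext_dict">http://192.168.56.10/fenci.txt</entry>
</properties>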

3. Restart the elasticsearch container

docker restart elasticsearch

4. Test the custom dictionary

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "Shang Silicon Valley"
}

Result: with the extended dictionary loaded, 尚硅谷 is returned as a single token instead of being split apart.

Install Nginx with Docker

Here is how to install nginx using Docker. First we start a temporary nginx container and copy its configuration into the local directory we are going to mount; then we create the nginx container we will actually use.

1. Create configuration directory

# Create the directory that will hold the nginx config, html and logs
mkdir -p /mydata/nginx

2. Start the temporary nginx container

# Run a throwaway nginx just to copy its default configuration from
docker run -p 80:80 --name nginx -d nginx:1.10

3. Copy the configuration of the Nginx container

# Enter the mounted nginx directory
cd /mydata/nginx

# Copy the configuration files from the container into the nginx directory
# Resulting structure: /mydata/nginx/nginx/ containing the config files
docker container cp nginx:/etc/nginx .

# Rename the copied nginx folder to conf
# Final structure: /mydata/nginx/conf/ containing the config files
mv nginx conf
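
As a sanity check, the conf directory should now hold the standard nginx configuration files, something like:

ls /mydata/nginx/conf
# conf.d/  fastcgi_params  mime.types  nginx.conf  ...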

4. Delete temporary nginx container

# Stop running nginx container
docker stop nginx

# Delete nginx container
docker rm nginx

5. Start nginx container

# Map port 80 and mount the html, logs and conf directories prepared above
docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf/:/etc/nginx \
-d nginx:1.10

6. Set nginx to start with Docker

docker update nginx --restart=always

7. Test nginx

echo '<h1><a target="_blank" href="https://gitee.com/UnityAlvin/gulimall">gulimall</a></h1>' \
>/mydata/nginx/html/index.html
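
You can also verify from the command line before opening a browser (again, substitute your virtual machine's IP):

curl http://192.168.56.10
# Should return the <h1> line written above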

Visit your virtual machine's IP in a browser; if you see the page content above, the installation succeeded.
