Install the IK analyzer


When we created the index and queried data earlier, we used the default analyzer, and its segmentation of Chinese text is poor: it splits a text field into individual Chinese characters when indexing, and splits the search phrase the same way when searching. We therefore need a smarter analyzer, IK.
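To see this behavior for yourself, you can send text to Elasticsearch's `_analyze` API. The sketch below is a pair of hypothetical helpers (`analyze_body`, `es_analyze`). It assumes the cluster answers at `http://localhost:9200` (override with the `ES_URL` variable) and that the text contains no double quotes or backslashes, because the JSON is built with `printf` rather than a real JSON encoder.

```shell
# Hypothetical helpers for trying analyzers through the _analyze API.
# Assumes Elasticsearch listens on localhost:9200 unless ES_URL is set.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the JSON request body for _analyze.
# $1 = analyzer name, $2 = text (must not contain " or \, no escaping is done)
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# POST the text to _analyze and print the raw JSON response.
es_analyze() {
  curl -s -X POST "$ES_URL/_analyze" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$1" "$2")"
}
```

For example, `es_analyze standard '中华人民共和国'` should return one token per Chinese character with the default `standard` analyzer; once IK is installed, swap in `ik_smart` or `ik_max_word` to compare.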

1.1. Install the IK plugin online (slower)

Download page: https://github.com/medcl/elasticsearch-analysis-ik/releases
Select the release whose version matches your Elasticsearch version exactly. Here we use 7.12.1.
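Because the versions must match, one way to avoid typos is to build the download URL from the version number. This is a sketch that assumes other releases follow the same naming pattern as the 7.12.1 URL in the install command; `ik_url` is a hypothetical helper. Your cluster's version is reported as `version.number` by `curl http://localhost:9200` (assuming the default port).

```shell
# Hypothetical helper: build the IK release URL for a given Elasticsearch version.
# Assumes the naming pattern of the v7.12.1 release also holds for other versions.
ik_url() {
  local version="$1"
  printf 'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%s/elasticsearch-analysis-ik-%s.zip\n' \
    "$version" "$version"
}

# Example (inside the container):
# ./bin/elasticsearch-plugin install "$(ik_url 7.12.1)"
```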

# Open a shell inside the container
docker exec -it elasticsearch /bin/bash

# Download and install online
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip

# Leave the container
exit
# Restart the container
docker restart elasticsearch
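After the restart you can confirm that the plugin is registered with `elasticsearch-plugin list`, which prints one installed plugin name per line. The check below assumes the IK plugin is listed as `analysis-ik`; `has_ik_plugin` is a hypothetical helper name.

```shell
# Hypothetical check: succeed if "analysis-ik" appears as a whole line on stdin.
# Feed it the output of `elasticsearch-plugin list`.
has_ik_plugin() {
  grep -qx 'analysis-ik'
}

# Usage (container name "elasticsearch", as above):
# docker exec elasticsearch ./bin/elasticsearch-plugin list | has_ik_plugin && echo "IK installed"
```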

1.2. Install the IK plugin offline

1) Find the data volume directory

To install the plugin, you need to know where Elasticsearch's plugins directory is. Here it is mounted as a data volume, so look up the volume's location with the following command:

docker volume inspect es-plugins

Output:

[
    {
        "CreatedAt": "2022-05-06T10:06:34+08:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/es-plugins/_data",
        "Name": "es-plugins",
        "Options": null,
        "Scope": "local"
    }
]

This shows that the plugins directory is mounted at /var/lib/docker/volumes/es-plugins/_data.
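Rather than reading the JSON by eye, you can extract the path directly. `docker volume inspect` accepts a Go template through `--format`, so `docker volume inspect --format '{{ .Mountpoint }}' es-plugins` prints only the path. The fallback below parses the JSON with `sed` instead, assuming the `"Mountpoint": "<path>"` pair sits on one line as in the output above; `mountpoint_of` is a hypothetical helper.

```shell
# Hypothetical helper: read `docker volume inspect` JSON on stdin
# and print the value of the "Mountpoint" field.
mountpoint_of() {
  sed -n 's/.*"Mountpoint": *"\([^"]*\)".*/\1/p'
}

# Usage:
# PLUGINS_DIR="$(docker volume inspect es-plugins | mountpoint_of)"
# Equivalent, using Docker's own templating:
# PLUGINS_DIR="$(docker volume inspect --format '{{ .Mountpoint }}' es-plugins)"
```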

2) Unzip the IK analyzer installation package

Unzip the IK package you downloaded in advance and rename the extracted folder to ik.

3) Upload it to the plugin data volume of the Elasticsearch container

That is, copy the ik folder into /var/lib/docker/volumes/es-plugins/_data.
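Steps 2 and 3 can be scripted together. This sketch assumes the release zip keeps the plugin files (including `plugin-descriptor.properties`) at its top level, which is why it extracts straight into a new `ik` folder instead of renaming afterwards, and that `unzip` is installed on the host. `install_ik_offline` and `check_ik_layout` are hypothetical helpers.

```shell
# Hypothetical helper: extract the IK zip into <plugins_dir>/ik.
# $1 = path to elasticsearch-analysis-ik-<version>.zip, $2 = plugins directory
install_ik_offline() {
  local zip_file="$1" plugins_dir="$2"
  mkdir -p "$plugins_dir/ik"
  unzip -q -o "$zip_file" -d "$plugins_dir/ik"
}

# Hypothetical sanity check: Elasticsearch reads plugin-descriptor.properties
# directly inside each plugin folder, not from a nested subfolder.
check_ik_layout() {
  [ -f "$1/ik/plugin-descriptor.properties" ]
}

# Usage (as root on the Docker host):
# install_ik_offline ./elasticsearch-analysis-ik-7.12.1.zip /var/lib/docker/volumes/es-plugins/_data
# check_ik_layout /var/lib/docker/volumes/es-plugins/_data && echo "layout OK"
```

If the check fails, the archive most likely unpacked into an extra subfolder; move its contents up one level into `ik`.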

4) Restart the container

# Restart the container
docker restart elasticsearch
# View the Elasticsearch logs
docker logs -f elasticsearch
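Elasticsearch needs some time to start after a restart, so requests sent immediately may fail. A small retry loop avoids guessing how long to wait. This is a generic sketch: the name `wait_until`, the `WAIT_INTERVAL` default of 2 seconds, and the attempt count in the usage line are arbitrary choices, and the usage assumes the cluster answers on `localhost:9200`.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# $1 = maximum number of attempts; the remaining arguments are the command.
# Waits WAIT_INTERVAL seconds (default 2) between attempts.
wait_until() {
  local attempts="$1" i=1
  shift
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "${WAIT_INTERVAL:-2}"
  done
}

# Usage: after `docker restart elasticsearch`, wait up to about a minute.
# wait_until 30 curl -sf http://localhost:9200 >/dev/null && echo "Elasticsearch is up"
```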

The IK analyzer provides two modes:
 ik_smart: coarsest-grained segmentation
 ik_max_word: finest-grained segmentation, which exhaustively enumerates possible word combinations
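To compare the two modes, send the same text to `_analyze` with each analyzer and list the tokens; `ik_max_word` should produce more, overlapping tokens than `ik_smart`. The sketch assumes the cluster is at `localhost:9200` and that the response is compact JSON in which each token appears as `"token":"<value>"`. It uses `grep` and `cut` for brevity; a JSON tool such as `jq` would be more robust. `tokens_of` and `compare_ik_modes` are hypothetical helpers.

```shell
# Hypothetical helper: print one token per line from an _analyze response on stdin.
# Assumes compact JSON with no escaped quotes inside token values.
tokens_of() {
  grep -o '"token":"[^"]*"' | cut -d'"' -f4
}

# Send the same text to both IK modes and print their tokens one after the other.
# $1 = text (must not contain " or \); assumes Elasticsearch at localhost:9200.
compare_ik_modes() {
  local analyzer
  for analyzer in ik_smart ik_max_word; do
    echo "== $analyzer"
    curl -s -X POST 'http://localhost:9200/_analyze' \
      -H 'Content-Type: application/json' \
      -d "{\"analyzer\":\"$analyzer\",\"text\":\"$1\"}" | tokens_of
  done
}

# Usage: compare_ik_modes '中华人民共和国国歌'
```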

1.3. Extension dictionary

As the Internet develops, new words are coined all the time, and many of them, such as Internet slang, are missing from the original vocabulary.

Our vocabulary therefore needs regular updates, and the IK analyzer lets you extend its dictionary.

1) Open the IK analyzer's config directory.

2) Add the following to the IKAnalyzer.cfg.xml configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer Extended configuration</comment>
        <!--Users can configure their own extended dictionary here *** Add extended dictionary-->
        <entry key="ext_dict">ext.dic</entry>
</properties>

3) Create a new file named ext.dic in the config directory (you can copy an existing .dic file there and modify it), and add one word per line:

awesome
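Since each line of ext.dic holds one word, adding words can be scripted. This sketch appends only words that are not already present (exact whole-line match) and writes them from the shell, so no desktop editor touches the file. `add_dic_words` is a hypothetical helper, and the path in the usage line assumes the `ik` folder from section 1.2.

```shell
# Hypothetical helper: append words to a .dic file, one per line, skipping duplicates.
# $1 = dictionary file; the remaining arguments are the words to add.
add_dic_words() {
  local dic="$1" word
  shift
  touch "$dic"
  # Terminate an existing last line with a newline before appending.
  if [ -s "$dic" ] && [ "$(tail -c 1 "$dic")" != "" ]; then
    echo >> "$dic"
  fi
  for word in "$@"; do
    grep -qxF -- "$word" "$dic" || printf '%s\n' "$word" >> "$dic"
  done
}

# Usage:
# add_dic_words /var/lib/docker/volumes/es-plugins/_data/ik/config/ext.dic awesome
```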

4) Restart Elasticsearch

docker restart elasticsearch
# View the logs
docker logs -f elasticsearch

The log should show that ext.dic was loaded successfully.
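Instead of watching the scrolling output of `docker logs -f`, you can search the log for the dictionary file name. This sketch assumes, as described above, that the loaded file's name appears in the Elasticsearch log; `dict_loaded` is a hypothetical helper that matches the name as a fixed string anywhere in a line.

```shell
# Hypothetical helper: succeed if the given file name appears in the log text on stdin.
# $1 = dictionary file name, e.g. ext.dic or stopword.dic
dict_loaded() {
  grep -qF -- "$1"
}

# Usage: docker logs replays the container's stdout and stderr on the
# matching streams, so merge them before filtering.
# docker logs elasticsearch 2>&1 | dict_loaded ext.dic && echo "ext.dic loaded"
```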

Note: the file must be UTF-8 encoded. Do not edit it with Windows Notepad.
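One way to catch the Notepad problem: older versions of Windows Notepad save "UTF-8" files with a byte-order mark (BOM), which IK may treat as part of the first word, and other encodings produce bytes that are not valid UTF-8. The sketch below checks for both. It assumes `iconv` is available for the validity test and skips that part if it is not; `check_dic_encoding` is a hypothetical helper.

```shell
# Hypothetical check: fail if a .dic file starts with a UTF-8 BOM
# or (when iconv is installed) contains bytes that are not valid UTF-8.
check_dic_encoding() {
  local dic="$1"
  # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
  if [ "$(head -c 3 "$dic" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    echo "$dic: starts with a UTF-8 BOM" >&2
    return 1
  fi
  # Converting UTF-8 to UTF-8 fails on invalid byte sequences.
  if command -v iconv >/dev/null 2>&1 &&
     ! iconv -f UTF-8 -t UTF-8 "$dic" >/dev/null 2>&1; then
    echo "$dic: not valid UTF-8" >&2
    return 1
  fi
  return 0
}

# Usage: check_dic_encoding ext.dic && echo "encoding OK"
```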

1.4. Stop word dictionary

In Internet projects, information spreads very quickly, so some content, such as politically sensitive words, must not be circulated online. Such words should also be ignored when searching.

The IK analyzer also provides a stop word feature that ignores every word in the stop word list during indexing.

1) Add the following to the IKAnalyzer.cfg.xml configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer Extended configuration</comment>
        <!--Users can configure their own extended dictionary here-->
        <entry key="ext_dict">ext.dic</entry>
        <!--Users can configure their own extended stop word dictionary here *** Add stop word dictionary-->
        <entry key="ext_stopwords">stopword.dic</entry>
</properties>
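If you prefer not to edit the XML by hand, the whole configuration file can be written from the shell with a heredoc. This sketch overwrites the file with exactly the two entries shown above, so copy the original first if it holds other entries you need. It assumes you pass the IK `config` directory (for the volume layout in 1.2, `<mountpoint>/ik/config`); `write_ik_cfg` is a hypothetical helper.

```shell
# Hypothetical helper: (re)write IKAnalyzer.cfg.xml with an extension
# dictionary and a stop word dictionary. $1 = IK config directory.
write_ik_cfg() {
  cat > "$1/IKAnalyzer.cfg.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer Extended configuration</comment>
        <entry key="ext_dict">ext.dic</entry>
        <entry key="ext_stopwords">stopword.dic</entry>
</properties>
EOF
}

# Usage:
# write_ik_cfg /var/lib/docker/volumes/es-plugins/_data/ik/config
```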

2) Add stop words to stopword.dic, one word per line:

# Add stop words here
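After restarting, you can confirm that a stop word is dropped by analyzing a sentence that contains it and checking that the word is absent from the token list. The sketch assumes the cluster is at `localhost:9200`, a compact JSON response, and text without double quotes or backslashes; `absent_from_tokens` is a hypothetical helper, and the angle-bracket values in the usage lines are placeholders.

```shell
# Hypothetical check: succeed if WORD is NOT among the tokens on stdin
# (one token per line).
absent_from_tokens() {
  if grep -qxF -- "$1"; then
    return 1
  fi
  return 0
}

# Usage: extract tokens from the _analyze response, then check the stop word.
# curl -s -X POST 'http://localhost:9200/_analyze' -H 'Content-Type: application/json' \
#   -d '{"analyzer":"ik_smart","text":"<sentence containing the stop word>"}' \
#   | grep -o '"token":"[^"]*"' | cut -d'"' -f4 \
#   | absent_from_tokens '<stop word>' && echo "stop word filtered"
```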

3) Restart Elasticsearch

# Restart service
docker restart elasticsearch
docker restart kibana

# View log
docker logs -f elasticsearch

The log should show that stopword.dic was loaded successfully.

Note: the file must be UTF-8 encoded.


Added by Chrisww on Wed, 23 Feb 2022 06:17:07 +0200