Synchronize MongoDB data to Elasticsearch through Monstache

Installation and configuration of Monstache

Monstache is a Go daemon that synchronizes MongoDB data to Elasticsearch in real time.

Preparation

The MongoDB 4 (4.6) replica set environment is ready
The Elasticsearch 7 environment is ready
GitHub source code address of Monstache: https://github.com/rwynn/monstache

monstache    mongodb    elasticsearch
5            3.6+       7
6            3.6+       8

Step 1: install the monstache environment

1. Install Go and configure environment variables
(1) Download and unpack the Go installation package

wget https://dl.google.com/go/go1.14.4.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.14.4.linux-amd64.tar.gz

(2) Use the vim /etc/profile command to open the environment variable configuration file and write the following contents to the file

export PATH=$PATH:/usr/local/go/bin

(3) Apply the environment variables

source /etc/profile
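
Editing /etc/profile by hand works, but repeating the steps can leave duplicate export lines in the file. A minimal sketch of an idempotent alternative follows; append_once is a hypothetical helper name used only for illustration, not part of Go or Monstache.

```shell
# append_once LINE FILE: add LINE to FILE unless an identical line already exists.
# (append_once is a hypothetical helper name, not a standard command.)
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example (run as root, then reload with: source /etc/profile):
#   append_once 'export PATH=$PATH:/usr/local/go/bin' /etc/profile
```

Running the example twice leaves a single export line in the file.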

(4) View the Go environment

[root@localhost ~]# go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://goproxy.io,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build567870495=/tmp/go-build -gno-record-gcc-switches"

(5) The default GOPROXY points to a proxy hosted abroad that cannot be reached from mainland China, so a mirror such as goproxy.cn or goproxy.io (used here) is recommended as a substitute

export GO111MODULE=on
export GOPROXY=https://goproxy.io,direct
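
GOPROXY is a comma-separated list whose entries are proxy URLs or the keywords direct and off, so the separator before direct has to be a comma. As a quick sanity check, the sketch below prints each entry on its own line; goproxy_entries is a hypothetical helper name used only for illustration.

```shell
# goproxy_entries [VALUE]: print each comma-separated GOPROXY entry on its own line.
# Falls back to the GOPROXY environment variable when no argument is given.
goproxy_entries() {
    printf '%s\n' "${1:-$GOPROXY}" | tr ',' '\n'
}

goproxy_entries 'https://goproxy.io,direct'
# prints:
#   https://goproxy.io
#   direct
```

If only one line is printed, the value contains no comma and Go will treat the whole string as a single proxy address. Go 1.13 and later can also persist the setting with go env -w GOPROXY=https://goproxy.io,direct instead of exporting it in every shell.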

2. Install Monstache
(1) Enter the installation path

cd /usr/local

(2) Download the installation package, using either of the following methods
i. Get the source through git and select the version. Because ES 7 corresponds to Monstache 5, clone the rel5 branch

git clone -b rel5 https://github.com/rwynn/monstache.git

ii. Download the compressed package of the rel5 branch from GitHub and decompress it, then put the decompressed files under /usr/local through Xftp

(3) Enter the monstache path

cd monstache

(4) Install monstache

go install

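(5) View the version of monstache. go install places the binary in $GOPATH/bin (/root/go/bin in the go env output above), so that directory must be on PATH for the command to be found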

monstache -v
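
If the shell reports that monstache cannot be found, the binary is usually in the Go bin directory rather than on PATH. The sketch below looks in both places; find_monstache is a hypothetical helper name, and it assumes a single-entry GOPATH (Go falls back to $HOME/go when GOPATH is unset).

```shell
# find_monstache: print the path of the monstache binary, or return 1 if not found.
# Checks PATH first, then $GOPATH/bin (GOPATH defaults to $HOME/go when unset).
find_monstache() {
    if command -v monstache >/dev/null 2>&1; then
        command -v monstache
        return 0
    fi
    if [ -x "${GOPATH:-$HOME/go}/bin/monstache" ]; then
        printf '%s\n' "${GOPATH:-$HOME/go}/bin/monstache"
        return 0
    fi
    return 1
}

# Example:
#   "$(find_monstache)" -v
```

Adding the reported directory to PATH in /etc/profile makes the plain monstache command available in new shells.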

Step 2: configure real-time synchronization tasks

  1. Enter the Monstache installation directory, then create and edit the configuration file (config.toml in this article)
  2. Modify the configuration file by referring to the following example.
    This is a simple configuration example. For detailed configuration, see the Monstache usage documentation.
# settings

# connect to MongoDB using the following URL
mongo-url = "mongodb://192.168.10.134:27017/"
# connect to the Elasticsearch REST API at the following node URLs
elasticsearch-urls = ["http://192.168.2.128:9200"]

# frequently required settings

# if you need to seed an index from a collection and not just listen and sync changes events
# you can copy entire collections or views from MongoDB to Elasticsearch
direct-read-namespaces = ["lawdb.law"]
# direct-read-namespaces = ["lawdb.test","lawdb.law"]

# if you want to use MongoDB change streams instead of legacy oplog tailing use change-stream-namespaces
# change streams require at least MongoDB API 3.6+
# if you have MongoDB 4+ you can listen for changes to an entire database or entire deployment
# in this case you usually don't need regexes in your config to filter collections unless you target the deployment.
# to listen to an entire db use only the database name.  For a deployment use an empty string.
# change-stream-namespaces = ["lawdb.law"]

# additional settings

# if you don't want to listen for changes to all collections in MongoDB but only a few
# e.g. only listen for inserts, updates, deletes, and drops from mydb.mycollection
# this setting does not initiate a copy, it is only a filter on the change event listener
namespace-regex = '^lawdb\.law$'
# compress requests to Elasticsearch
#gzip = true
# generate indexing statistics
#stats = true
# index statistics into Elasticsearch
#index-stats = true
# use the following PEM file for connections to MongoDB
#mongo-pem-file = "/path/to/mongoCert.pem"
# disable PEM validation
#mongo-validate-pem-file = false
# use the following user name for Elasticsearch basic auth
elasticsearch-user = "elastic"
# use the following password for Elasticsearch basic auth
elasticsearch-password = "elasticsearch"
# use 4 go routines concurrently pushing documents to Elasticsearch
elasticsearch-max-conns = 4
# use the following PEM file for connections to Elasticsearch
#elasticsearch-pem-file = "/path/to/elasticCert.pem"
# validate connections to Elasticsearch
#elastic-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set the replay to true you may see version conflict messages
# in the log if you had synced previously. This just means that you are replaying old docs which are already
# in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
#replay = false
# resume processing from a timestamp saved in a previous run
resume = true
# do not validate that progress timestamps have been saved
#resume-write-unsafe = false
# override the name under which resume state is saved
#resume-name = "default"
# resume strategy: 0 = timestamps (the default), 1 = tokens
# tokens work with MongoDB API 3.6+ while timestamps work only with MongoDB API 4.0+
resume-strategy = 0
# exclude documents whose namespace matches the following pattern
#namespace-exclude-regex = '^mydb\.ignorecollection$'
# turn on indexing of GridFS file content
#index-files = true
# turn on search result highlighting of GridFS content
#file-highlighting = true
# index GridFS files inserted into the following collections
#file-namespaces = ["users.fs.files"]
# print detailed information including request traces
verbose = true
# enable clustering mode
cluster-name = 'es-cn-mp91kzb8m00******'
# do not exit after full-sync, rather continue tailing the oplog
#exit-after-direct-reads = false
[[mapping]]
namespace = "lawdb.law"
index = "newlaw_ver2"
type = "doc"

#[[mapping]]
#namespace = "lawdb.test"
#index = "newlaw_ver2"
#type = "doc"

Parameter: Explanation
mongo-url: The access address of the primary node of the MongoDB instance.
elasticsearch-urls: The access address of the ES instance.
direct-read-namespaces: Specifies the collections to synchronize in full. In this article it is the law collection in the lawdb database (the variant that also includes lawdb.test is shown commented out).
namespace-regex: Specifies the collections to listen to through a regular expression. Only changes to collections whose namespace matches the expression are synchronized.
elasticsearch-user: The user name used to access the ES instance. The default is elastic.
elasticsearch-password: The password of the corresponding user. The password of the elastic user is specified when the instance is created. If you forget it, you can reset it. For the precautions and steps, see the topic on resetting the instance access password.
elasticsearch-max-conns: Defines the number of concurrent connections to ES. The default is 4, that is, four goroutines synchronize data to ES at the same time.
dropped-collections: The default value is true, which means that when a MongoDB collection is dropped, the corresponding index in ES is deleted as well.
dropped-databases: The default value is true, which means that when a MongoDB database is dropped, the corresponding indexes in ES are deleted as well.
resume: The default is false. If it is set to true, Monstache writes the timestamp of each MongoDB operation that has been successfully synchronized to ES into the monstache.monstache collection. When Monstache stops unexpectedly, the synchronization task can be resumed from this timestamp to avoid data loss. If cluster-name is specified, this parameter is turned on automatically. See the resume option for details.
resume-strategy: Specifies the resume strategy. It only takes effect when resume is true. See the resume-strategy option for details.
verbose: The default is false, which means that debug logging is not enabled.
cluster-name: Specifies the cluster name. Once it is specified, Monstache enters high availability mode, and processes with the same cluster name coordinate with each other. See the cluster-name option for details.
mapping: Specifies the ES index mapping. By default, when data is synchronized from MongoDB to ES, the index name is the database name followed by a dot and the collection name. If you need a different index name, set this parameter. For details, see Index Mapping.
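
To see which namespaces a namespace-regex value selects, the pattern can be tried against a few sample database.collection names. The sketch below uses grep -E as an approximation: Monstache evaluates the expression with Go's regexp engine, which behaves the same for a simple anchored pattern like this one. The helper name matching_namespaces and the sample namespaces other than lawdb.law and lawdb.test are made up for illustration.

```shell
# matching_namespaces REGEX: print the namespaces read from stdin that match REGEX.
# (hypothetical helper; grep -E stands in for Go's regexp engine)
matching_namespaces() {
    grep -E -- "$1"
}

printf '%s\n' lawdb.law lawdb.test lawdb.lawyers otherdb.law \
    | matching_namespaces '^lawdb\.law$'
# prints: lawdb.law
```

The ^ and $ anchors keep lawdb.lawyers out, and the escaped dot keeps the pattern from matching an arbitrary character between the database and collection names.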

3. Run monstache

monstache -f config.toml
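
Before starting the daemon it can help to confirm that the configuration file exists and that the two connection settings this article relies on are not commented out. The following is a minimal sketch under that assumption; check_monstache_config is a hypothetical helper, not a Monstache command, and it only looks for the keys without validating the TOML.

```shell
# check_monstache_config FILE: return 0 only if FILE is readable and contains
# uncommented mongo-url and elasticsearch-urls settings.
check_monstache_config() {
    [ -r "$1" ] || { echo "cannot read config file: $1" >&2; return 1; }
    for key in mongo-url elasticsearch-urls; do
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$1"; then
            echo "missing setting: ${key}" >&2
            return 1
        fi
    done
    return 0
}

# Example:
#   check_monstache_config config.toml && monstache -f config.toml
```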

Keywords: Java Linux ElasticSearch MongoDB

Added by wildmanmatt on Tue, 18 Jan 2022 22:40:03 +0200