Installation and configuration of Monstache
Monstache is a Go daemon that synchronizes MongoDB data to Elasticsearch in real time.
Preparation
A MongoDB 4 (4.6) replica set environment is ready
The Elasticsearch 7 environment is ready
GitHub source code address of Monstache: https://github.com/rwynn/monstache
Monstache | MongoDB | Elasticsearch |
---|---|---|
5 | 3.6+ | 7 |
6 | 3.6+ | 8 |
Step 1: install the monstache environment
1. Install Go and configure environment variables
(1) Download and unpack the Go installation package
wget https://dl.google.com/go/go1.14.4.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.14.4.linux-amd64.tar.gz
(2) Open the environment variable configuration file with vim /etc/profile and append the following line
export PATH=$PATH:/usr/local/go/bin
(3) Apply the environment variables
source /etc/profile
(4) View the Go environment
[root@localhost ~]# go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://goproxy.io,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build567870495=/tmp/go-build -gno-record-gcc-switches"
(5) The default GOPROXY is an overseas proxy that may be unreachable from mainland China, so a mirror such as goproxy.cn or goproxy.io is recommended as a substitute
export GO111MODULE=on
export GOPROXY=https://goproxy.io,direct
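GOPROXY takes a comma-separated list: the go command tries each entry in order, and the keyword direct means fetching straight from the source repository. The sketch below splits a value into its entries so the list can be checked by eye, and wraps go env -w (available from Go 1.13) to persist the setting for the current user. The function names list_goproxy_entries and persist_goproxy are hypothetical helpers, not part of Go or Monstache.

```shell
#!/bin/sh
# Hypothetical helper: print each entry of a GOPROXY value on its own line.
list_goproxy_entries() {
    printf '%s\n' "$1" | tr ',' '\n'
}

# Hypothetical helper: persist the module settings in the per-user Go env file
# instead of exporting them in every new shell. Requires Go 1.13 or later.
persist_goproxy() {
    go env -w GO111MODULE=on GOPROXY="$1"
}

# A well-formed value yields one proxy URL followed by the keyword "direct".
list_goproxy_entries "https://goproxy.io,direct"
```

Calling persist_goproxy "https://goproxy.io,direct" once should make the setting survive new login shells; go env GOPROXY shows the value in effect.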
2. Install Monstache
(1) Enter the installation path
cd /usr/local
(2) Download installation package
i. Get the source code through git and switch versions. Because ES 7 corresponds to Monstache 5, switch to the rel5 branch
git clone https://github.com/rwynn/monstache.git
git -C monstache checkout rel5
ii. Alternatively, download the rel5 branch archive from GitHub and decompress it, then upload the decompressed files to /usr/local with Xftp
(3) Enter the monstache path
cd monstache
(4) Install monstache
go install
(5) View the version of monstache
monstache -v
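go install writes the binary to GOBIN or, when GOBIN is empty as in the go env output above, to the bin directory under GOPATH (/root/go/bin in this setup). Only /usr/local/go/bin was added to PATH earlier, so monstache -v may fail with "command not found" until that directory is on PATH as well. A minimal sketch, assuming GOPATH holds a single directory; go_bin_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the directory `go install` writes binaries to.
# Arguments: $1 = value of GOBIN (may be empty), $2 = value of GOPATH.
# Assumes GOPATH holds a single directory, as in the go env output above.
go_bin_dir() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s/bin\n' "$2"
    fi
}

# Extend PATH for the current shell, but only when go is actually installed.
if command -v go >/dev/null 2>&1; then
    PATH="$PATH:$(go_bin_dir "$(go env GOBIN)" "$(go env GOPATH)")"
    export PATH
fi
```

To keep the change across logins, the resulting directory can be appended to the PATH line in /etc/profile in the same way as /usr/local/go/bin.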
Step 2: configure real-time synchronization tasks
- Enter the Monstache installation directory, create and edit the configuration file
- Refer to the following example to modify the configuration file.
A simple configuration example follows. For detailed configuration, see Monstache Usage.
# settings

# connect to MongoDB using the following URL
mongo-url = "mongodb://192.168.10.134:27017/"
# connect to the Elasticsearch REST API at the following node URLs
elasticsearch-urls = ["http://192.168.2.128:9200"]

# frequently required settings

# if you need to seed an index from a collection and not just listen and sync changes events
# you can copy entire collections or views from MongoDB to Elasticsearch
direct-read-namespaces = ["lawdb.law"]
# direct-read-namespaces = ["lawdb.test","lawdb.law"]

# if you want to use MongoDB change streams instead of legacy oplog tailing use change-stream-namespaces
# change streams require at least MongoDB API 3.6+
# if you have MongoDB 4+ you can listen for changes to an entire database or entire deployment
# in this case you usually don't need regexes in your config to filter collections unless you target the deployment.
# to listen to an entire db use only the database name. For a deployment use an empty string.
# change-stream-namespaces = ["lawdb.law"]

# additional settings

# if you don't want to listen for changes to all collections in MongoDB but only a few
# e.g. only listen for inserts, updates, deletes, and drops from mydb.mycollection
# this setting does not initiate a copy, it is only a filter on the change event listener
namespace-regex = '^lawdb\.law$'
# compress requests to Elasticsearch
#gzip = true
# generate indexing statistics
#stats = true
# index statistics into Elasticsearch
#index-stats = true
# use the following PEM file for connections to MongoDB
#mongo-pem-file = "/path/to/mongoCert.pem"
# disable PEM validation
#mongo-validate-pem-file = false
# use the following user name for Elasticsearch basic auth
elasticsearch-user = "elastic"
# use the following password for Elasticsearch basic auth
elasticsearch-password = "elasticsearch"
# use 4 go routines concurrently pushing documents to Elasticsearch
elasticsearch-max-conns = 4
# use the following PEM file for connections to Elasticsearch
#elasticsearch-pem-file = "/path/to/elasticCert.pem"
# validate connections to Elasticsearch
#elastic-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set the replay to true you may see version conflict messages
# in the log if you had synced previously. This just means that you are replaying old docs which are already
# in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
#replay = false
# resume processing from a timestamp saved in a previous run
resume = true
# do not validate that progress timestamps have been saved
#resume-write-unsafe = false
# override the name under which resume state is saved
#resume-name = "default"
# use a custom resume strategy (tokens) instead of the default strategy (timestamps)
# tokens work with MongoDB API 3.6+ while timestamps work only with MongoDB API 4.0+
resume-strategy = 0
# exclude documents whose namespace matches the following pattern
#namespace-exclude-regex = '^mydb\.ignorecollection$'
# turn on indexing of GridFS file content
#index-files = true
# turn on search result highlighting of GridFS content
#file-highlighting = true
# index GridFS files inserted into the following collections
#file-namespaces = ["users.fs.files"]
# print detailed information including request traces
verbose = true
# enable clustering mode
cluster-name = 'es-cn-mp91kzb8m00******'
# do not exit after full-sync, rather continue tailing the oplog
#exit-after-direct-reads = false

[[mapping]]
namespace = "lawdb.law"
index = "newlaw_ver2"
type = "doc"

#[[mapping]]
#namespace = "lawdb.test"
#index = "newlaw_ver2"
#type = "doc"
Parameter | Description |
---|---|
mongo-url | The primary node access address of the MongoDB instance. |
elasticsearch-urls | The access address of the ES instance. |
direct-read-namespaces | Specifies the collections to synchronize. This article synchronizes the law collection in the lawdb database; the test collection appears in the example only as a commented-out alternative. |
namespace-regex | Specify the collection to listen to through regular expressions. This setting can be used to monitor changes in data in collections that conform to regular expressions. |
elasticsearch-user | The user name to access the ES instance. The default is elastic. |
elasticsearch-password | The password of the corresponding user. The password of elastic user is specified when creating an instance. If you forget it, you can reset it. For the precautions and operation steps of resetting the password, see resetting the instance access password. |
elasticsearch-max-conns | Defines the number of concurrent connections to ES. The default is 4, that is, four goroutines push data to ES at the same time. |
dropped-collections | The default value is true, which means that when the MongoDB collection is deleted, the corresponding index in ES will be deleted at the same time. |
dropped-databases | The default value is true, which means that when the MongoDB database is deleted, the corresponding index in ES will be deleted at the same time. |
resume | The default is false. If it is set to true, Monstache writes the timestamp of the MongoDB operations that have been successfully synchronized to ES into the monstache.monstache collection. When Monstache stops unexpectedly, the synchronization task can be resumed from this timestamp to avoid data loss. If cluster-name is specified, this parameter is turned on automatically. See resume for details. |
resume-strategy | Specify the recovery policy. It only takes effect when resume is true. See resume strategy for details. |
verbose | The default is false, which means that debugging logging is not enabled. |
cluster-name | Specifies the cluster name. Once it is specified, Monstache enters high availability mode, and processes with the same cluster name coordinate with each other. See cluster-name for details. |
mapping | Specifies the ES index mapping. By default, when data is synchronized from MongoDB to ES, the index name is automatically set to the database name and collection name joined by a dot. If you need to change the index name, set this parameter. For details, see Index Mapping. |
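The namespace-regex value is matched against namespaces of the form database.collection. The sketch below tries the pattern from the example configuration against a few candidate namespaces before it goes into config.toml. It uses grep -E as a stand-in for the matcher Monstache itself uses, which is an assumption that holds for simple anchored patterns like this one; matches_namespace is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed when namespace $2 matches the regex $1.
# grep -E stands in for Monstache's own regex engine (assumption).
matches_namespace() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

# Pattern taken from the example configuration.
pattern='^lawdb\.law$'

# The escaped dot matches only a literal ".", so lawdbXlaw is rejected.
for ns in lawdb.law lawdb.test lawdbXlaw; do
    if matches_namespace "$pattern" "$ns"; then
        echo "$ns: change events are processed"
    else
        echo "$ns: filtered out"
    fi
done
```

Of the three candidates, only lawdb.law passes the filter. As the configuration comments state, the filter applies to change events and does not start a copy of existing data.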
Step 3: run Monstache
monstache -f config.toml
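Started this way, Monstache runs in the foreground and stops when the terminal closes. The sketch below shows one way to keep it running in the background and to check that documents are arriving, using the Elasticsearch _count API on the index named in the mapping section. The helper names, the monstache.log file name, and the ES_USER and ES_PASS environment variables are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the _count URL for an index on an ES node.
# A trailing slash on the node URL is dropped to avoid a double slash.
count_url() {
    printf '%s/%s/_count\n' "${1%/}" "$2"
}

# Hypothetical helper: start Monstache in the background with the given
# config file, log to monstache.log, and print the process ID.
start_monstache() {
    nohup monstache -f "$1" > monstache.log 2>&1 &
    echo $!
}

# Hypothetical helper: ask Elasticsearch how many documents the index holds.
# ES_USER and ES_PASS are assumed to carry the basic-auth credentials.
check_count() {
    curl -s -u "$ES_USER:$ES_PASS" "$(count_url "$1" "$2")"
}

# URL that check_count would query for the example configuration.
count_url "http://192.168.2.128:9200" "newlaw_ver2"
```

A typical session would call start_monstache config.toml and then check_count http://192.168.2.128:9200 newlaw_ver2 a few times; a growing count suggests the direct read of lawdb.law is progressing.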