MySQL data synchronization to Elasticsearch in a Docker environment

Foreword

There are generally two approaches to synchronizing MySQL data to Elasticsearch:

  1. Polling the database with scheduled SQL queries.

    1. elasticsearch-jdbc
    2. logstash-input-jdbc, the official plug-in ( https://www.elastic.co/blog/l...
  2. Tailing the MySQL binlog.

    1. The well-known open-source project go-mysql-elasticsearch

This article focuses on logstash-input-jdbc. As an official project it is frequently updated and maintained, and therefore relatively safe to rely on, though it does have some drawbacks. Without further ado, let's get started.

Composing the Docker environment

Everything in this article is built in Docker, so you need Docker installed first; the author's version is 18.09.2. Since we have been developing with laradock, Elasticsearch and Logstash are added on top of that environment. MySQL itself is left unchanged, so refer to the laradock documentation for the MySQL setup. The compose definitions for Elasticsearch and Logstash are actually quite simple.

Composing Elasticsearch

The Elasticsearch Dockerfile:

FROM elasticsearch:7.3.0

EXPOSE 9200 9300

The docker-compose.yml service definition:

elasticsearch:
      build: ./elasticsearch
      volumes:
        - elasticsearch:/usr/share/elasticsearch/data
      environment:
        - cluster.name=laradock-cluster
        - node.name=laradock-node
        - bootstrap.memory_lock=true
        - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        - cluster.initial_master_nodes=laradock-node
      ulimits:
        memlock:
          soft: -1
          hard: -1
      ports:
        - "${ELASTICSEARCH_HOST_HTTP_PORT}:9200"
        - "${ELASTICSEARCH_HOST_TRANSPORT_PORT}:9300"
      depends_on:
        - php-fpm
      networks:
        - frontend
        - backend

Run docker-compose up -d elasticsearch mysql to start the containers. You can then verify that Elasticsearch and MySQL are up.
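A quick way to check both services is to list the running containers and hit the Elasticsearch HTTP endpoint. This sketch assumes the default port mapping from the compose file above (9200 exposed on the host):

```shell
# List the running containers; elasticsearch and mysql should both appear
docker-compose ps

# Elasticsearch answers with a JSON banner containing cluster_name,
# node name, and the version number
curl http://localhost:9200
```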

Composing Logstash

Since we use the logstash-input-jdbc plug-in to implement the synchronization, our focus is on composing Logstash.

logstash-input-jdbc connects to MySQL through Java (JDBC), so we first need the JDBC driver jar, which can be downloaded from the official MySQL website. Copy the resulting jar file into the Docker build directory; alternatively, you can have Docker download it during the build.

The layout of the author's Logstash directory:

The Dockerfile:

FROM logstash:7.3.0

USER root
RUN rm -f /usr/share/logstash/pipeline/logstash.conf
RUN curl -L -o /usr/share/logstash/lib/mysql-connector-java-5.1.47.jar https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.47/mysql-connector-java-5.1.47.jar
ADD ./pipeline/ /usr/share/logstash/pipeline/
ADD ./config /usr/share/logstash/config/
ADD mysql-connector-java-8.0.18.jar /usr/share/logstash/logstash-core/lib/jars/mysql/mysql-connector-java-8.0.18.jar

RUN logstash-plugin install logstash-input-jdbc

The synchronization task configuration file, mysql/mysql.conf:

input {
    jdbc {
        # JDBC connection string. The host is "mysql" (not localhost) because
        # containers in this compose project reach each other over the bridged
        # Docker network by service name; adjust this for your own setup
        jdbc_connection_string => "jdbc:mysql://mysql:3306/koudai"
        jdbc_user => "root"
        jdbc_password => "root"
        # Path to the JDBC driver jar inside the container
        jdbc_driver_library => "/usr/share/logstash/config/mysql/mysql-connector-java-8.0.18.jar"
        # Driver class name
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "50000"
        jdbc_default_timezone => "Asia/Shanghai"
        # File containing the SQL statement to run
        statement_filepath => "/usr/share/logstash/config/mysql/task.sql"
        # Schedule in cron syntax: minute hour day-of-month month day-of-week;
        # "* * * * *" runs the query every minute
        schedule => "* * * * *"
        type => "user"
        # Whether to record the last run. If true, the value of the tracking
        # column from the last execution is saved in the file specified by
        # last_run_metadata_path
        record_last_run => true
        # Whether to track a specific column's value. If true (requires
        # record_last_run => true), the column named by tracking_column is
        # tracked; otherwise the default tracked value is the run timestamp
        use_column_value => true
        # The database column to track. It must be monotonically increasing,
        # e.g. an auto-increment primary key or a last-modified timestamp
        tracking_column => "lastmodifiedTime"
        tracking_column_type => "timestamp"
        last_run_metadata_path => "./last_record/logstash_article_last_time"
        # Whether to clear the last_run_metadata_path record. If true, every
        # run starts over and queries all records from the beginning
        clean_run => false
        # Whether to convert column names to lowercase
        lowercase_column_names => false
    }
}

output {
    elasticsearch {
        # As with MySQL above, use the compose service name so containers
        # reach each other over the bridged Docker network
        hosts => ["http://elasticsearch:9200"]
        index => "user"
        document_id => "%{uid}"
    }
}
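The type => "user" field set in the input becomes useful when several jdbc inputs share one pipeline: the output can route each type to its own index with a conditional. A sketch of that pattern (the second type, article, and its id column are hypothetical):

```
output {
    if [type] == "user" {
        elasticsearch {
            hosts => ["http://elasticsearch:9200"]
            index => "user"
            document_id => "%{uid}"
        }
    } else if [type] == "article" {
        elasticsearch {
            hosts => ["http://elasticsearch:9200"]
            index => "article"
            document_id => "%{id}"
        }
    }
}
```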

The synchronization SQL, task.sql:

select * from kdgx_partner_charge_user
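As written, this full-table query re-syncs every row each minute. Since the input tracks lastmodifiedTime, logstash-input-jdbc exposes the last recorded value as the :sql_last_value placeholder, so task.sql can be limited to rows changed since the previous run. A sketch, assuming the table has the lastmodifiedTime column configured above:

```sql
SELECT *
FROM kdgx_partner_charge_user
-- :sql_last_value is replaced by logstash-input-jdbc with the value
-- saved in last_run_metadata_path after the previous run
WHERE lastmodifiedTime > :sql_last_value
```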

For more usage options, refer to the official logstash-input-jdbc plug-in documentation.

From this point on, new data in MySQL is synchronized to Elasticsearch.
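To spot-check the result, you can query the user index directly (again assuming port 9200 is mapped to the host):

```shell
# Count the documents synced into the user index
curl 'http://localhost:9200/user/_count?pretty'

# Fetch a few documents to inspect how MySQL columns map to fields
curl 'http://localhost:9200/user/_search?pretty&size=3'
```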


Added by Pinkmischief on Wed, 27 Nov 2019 18:47:58 +0200