Apache Druid installation and deployment manual

I. Apache Druid architecture

1. Coordinator

Monitors Historical processes and is responsible for assigning segments to specific Historical services so that historical data stays balanced across them

2. Overlord

Monitors MiddleManager processes and controls data loading into the Druid cluster; it assigns ingestion tasks to MiddleManagers and coordinates segment publishing

3. Broker

Handles queries from external clients, parsing them and routing sub-queries to the Historical and MiddleManager processes. The Broker receives the results of these sub-queries, merges them, and returns the combined result to the caller

4. Router

Provides a unified routing gateway in front of the Brokers, Overlords, and Coordinators

5. Historical

Stores historical data and serves queries over it. A Historical loads segments from deep storage and responds to the Broker's queries against those segments

6. MiddleManager

Ingests new data into the cluster; it reads from external data sources (new real-time data) and publishes new Druid segments. It is the worker node that executes submitted tasks. Tasks are handed off to Peons, each of which runs in its own JVM so that task resources and logs stay isolated. Each Peon runs only one task at a time, and a single MiddleManager manages multiple Peons

7. External dependencies

Deep storage: shared file storage accessible by every Druid process, for example a distributed file system such as HDFS or S3, or a network-attached file system; it holds all ingested data.
Metadata store: a shared metadata store, typically a relational database such as PostgreSQL or MySQL.
ZooKeeper: used for internal service discovery, coordination, and leader election.

II. Preparatory work

1. MySQL (as the Druid metadata store)

Create database:

CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

Grant privileges:

grant all privileges on druid.* to druid@'%' identified by 'druid';
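
The GRANT ... IDENTIFIED BY syntax above works on MySQL 5.x; on MySQL 8 the user must be created first. A minimal sketch for MySQL 8:

CREATE USER 'druid'@'%' IDENTIFIED BY 'druid';
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'%';
FLUSH PRIVILEGES;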

2. Cluster node planning

IP address          Node function    Deployed services
10.0.111.140        Master Server    Coordinator, Overlord
10.0.111.141~143    Data Server      Historical, MiddleManager
10.0.111.144        Query Server     Broker, Router

3. Druid access addresses

  • Coordinator
    http://10.0.111.140:8081
  • Router
    http://10.0.111.144:8888

III. Cluster configuration

1. Download the installation package

Download the installation package on the planned master server node and extract it into the /opt/tools directory:

wget https://mirrors.tuna.tsinghua.edu.cn/apache/druid/0.18.1/apache-druid-0.18.1-bin.tar.gz
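
For example, to extract into /opt/tools (assuming the archive was downloaded to the current directory):

tar -zxvf apache-druid-0.18.1-bin.tar.gz -C /opt/tools/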

The following operations are all carried out under /opt/tools/apache-druid-0.18.1/ (cd /opt/tools/apache-druid-0.18.1/).
Note the following parameter sizing rules:

druid.processing.numMergeBuffers = max(2, druid.processing.numThreads / 4)
druid.processing.numThreads =  Number of cores - 1 (or 1)
druid.server.http.numThreads = max(10, (Number of cores * 17) / 16 + 2) + 30
MaxDirectMemorySize >= druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)
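
As a worked example, the Historical settings used in section III.6 below follow from these rules for a 24-core data server (an assumption; adjust to your own hardware):

druid.processing.numThreads      = 24 - 1 = 23
druid.processing.numMergeBuffers = max(2, 23 / 4) = 5
druid.server.http.numThreads     = max(10, (24 * 17) / 16 + 2) + 30 = 27 + 30 = 57
MaxDirectMemorySize             >= 200000000 * (5 + 23 + 1) bytes ≈ 5.8 GB, so 8g is configured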

2. Modify common configuration

  • common.runtime.properties configuration file
    The Druid cluster uses MySQL as the metadata store and HDFS as deep storage.
vim conf/druid/cluster/_common/common.runtime.properties
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches","mysql-metadata-storage"]
druid.extensions.hadoopDependenciesDir=/opt/tools/druid/hadoop-dependencies
druid.host=hadoop10
druid.startup.logging.logProperties=true
druid.zk.service.host=hadoop1:2181,hadoop2:2181,hadoop3:2181,hadoop4:2181,hadoop5:2181
druid.zk.paths.base=/druid
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://10.0.111.134:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=druid
# For HDFS:
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://nameservice1/druid/segments
# For HDFS:
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=hdfs://nameservice1/druid/indexing-logs
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=logging
druid.emitter.logging.logLevel=info
druid.indexing.doubleStorage=double
druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]
druid.sql.enable=true
druid.lookup.enableLookupSyncOnStartup=false
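
Note that the mysql-metadata-storage extension does not bundle the MySQL JDBC driver, so the connector jar has to be added separately. A sketch, assuming mysql-connector-java 5.1.49 and the default extension layout of the distribution:

wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar
cp mysql-connector-java-5.1.49.jar extensions/mysql-metadata-storage/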

Copy the HDFS-related configuration files (e.g. core-site.xml and hdfs-site.xml) into the conf/druid/cluster/_common/ directory.
On each node, change druid.host to that node's own hostname or IP address.
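
For the HDFS files, a sketch assuming the Hadoop client configuration lives under /etc/hadoop/conf:

cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml conf/druid/cluster/_common/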

3. Synchronize the druid directory to other nodes
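
For example (a sketch, assuming passwordless SSH and the same /opt/tools path on every node):

for host in 10.0.111.141 10.0.111.142 10.0.111.143 10.0.111.144; do
  rsync -az /opt/tools/apache-druid-0.18.1/ ${host}:/opt/tools/apache-druid-0.18.1/
done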

4. Modify the master node configuration

  • jvm.config configuration file
vim conf/druid/cluster/master/coordinator-overlord/jvm.config
-server
-Xms8g
-Xmx8g
-XX:+ExitOnOutOfMemoryError
-XX:+UseG1GC
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dderby.stream.error.file=var/druid/derby.log
  • runtime.properties configuration file
vim conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid.service=druid/coordinator
druid.plaintextPort=8081
druid.coordinator.startDelay=PT10S
druid.coordinator.period=PT5S
druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord
druid.indexer.queue.startDelay=PT5S
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata

5. Modify the query node configuration file

  • broker-jvm.config configuration file
vim conf/druid/cluster/query/broker/jvm.config
-server
-Xms6g
-Xmx6g
-XX:MaxDirectMemorySize=8g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
  • broker-runtime.properties configuration file
vim conf/druid/cluster/query/broker/runtime.properties
druid.service=druid/broker
druid.plaintextPort=8082
druid.server.http.numThreads=60
druid.broker.http.numConnections=50
druid.broker.http.maxQueuedBytes=10000000
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=6
druid.processing.numThreads=1
druid.processing.tmpDir=var/druid/processing
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
  • router-jvm.config configuration file
vim conf/druid/cluster/query/router/jvm.config
-server
-Xms1g
-Xmx1g
-XX:+UseG1GC
-XX:MaxDirectMemorySize=128m
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
  • router-runtime.properties configuration file
vim conf/druid/cluster/query/router/runtime.properties
druid.service=druid/router
druid.plaintextPort=8888
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator
druid.router.managementProxy.enabled=true

6. Modify the data server node configuration file

  • historical-jvm.config configuration file
vim conf/druid/cluster/data/historical/jvm.config
-server
-Xms2g
-Xmx2g
-XX:MaxDirectMemorySize=8g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
  • historical-runtime.properties configuration file
vim conf/druid/cluster/data/historical/runtime.properties
druid.service=druid/historical
druid.plaintextPort=8083
druid.server.http.numThreads=57
druid.processing.buffer.sizeBytes=200000000
druid.processing.numMergeBuffers=5
druid.processing.numThreads=23
druid.processing.tmpDir=var/druid/processing
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":3000000000}]
druid.server.maxSize=3000000000
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=256000000
  • middleManager-jvm.config
vim conf/druid/cluster/data/middleManager/jvm.config
-server
-Xms128m
-Xmx128m
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
  • middleManager-runtime.properties
vim conf/druid/cluster/data/middleManager/runtime.properties
druid.service=druid/middleManager
druid.plaintextPort=8091
druid.worker.capacity=5
druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
druid.server.http.numThreads=57
druid.indexer.fork.property.druid.processing.numMergeBuffers=5
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
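
A rough per-node memory estimate under the data-server settings above (an approximation that ignores OS and page-cache overhead): the MiddleManager JVM itself is small, but each of the 5 Peons (druid.worker.capacity=5) may use 1g heap plus 1g direct memory, and the Historical uses 2g heap plus 8g direct memory:

MiddleManager JVM:               128 MB
Peons:        5 * (1g + 1g)  =    10 GB
Historical:        2g + 8g   =    10 GB
Total:                           ~20 GB per data server

Each Peon's processing buffers also fit inside its direct memory per the sizing rules in section III.1: 100000000 * (5 + 1 + 1) = 700 MB <= the 1g MaxDirectMemorySize set in druid.indexer.runner.javaOpts.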

IV. Cluster startup and shutdown

1. Background startup and shutdown

nohup ./bin/start-cluster-master-no-zk-server > /dev/null 2>&1 &    # on the master server (10.0.111.140)
nohup ./bin/start-cluster-query-server > /dev/null 2>&1 &           # on the query server (10.0.111.144)
nohup ./bin/start-cluster-data-server > /dev/null 2>&1 &            # on each data server (10.0.111.141~143)
./bin/service --down                                                # stop the services on the current node
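
To verify that the processes came up, the supervise logs under var/sv/ can be checked; a sketch, assuming the log file name matches the service name (e.g. coordinator-overlord on the master):

jps -l | grep org.apache.druid
tail -f var/sv/coordinator-overlord.log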

2. Service mode startup and shutdown

  • CentOS 7 service startup
vim /lib/systemd/system/druidmaster.service
[Unit]
Description=druidmaster
After=network.target
[Service]
Type=forking
EnvironmentFile=/home/path
WorkingDirectory=/opt/tools/apache-druid-0.20.2/
ExecStart=/opt/tools/apache-druid-0.20.2/bin/start-cluster-master-no-zk-server start
ExecStop=/opt/tools/apache-druid-0.20.2/bin/service --down
Restart=on-failure

[Install]
WantedBy=multi-user.target
vim /home/path
JAVA_HOME=/opt/tools/jdk
PATH=/opt/tools/jdk/bin:/opt/tools/jdk/jre/bin:/usr/local/jdk/bin:/usr/local/jdk/jre/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
source /home/path
chmod 764 /lib/systemd/system/druidmaster.service
chmod 764 /usr/lib/systemd/system/druidmaster.service
systemctl start druidmaster.service
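
After creating or editing the unit file, it may also be desirable to reload systemd and enable the service so it starts on boot:

systemctl daemon-reload
systemctl enable druidmaster.service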

V. Problems encountered

1. hdfs deep storage does not take effect

Solution: copy the HDFS-related configuration files (core-site.xml, hdfs-site.xml) into the conf/druid/cluster/_common/ directory

2. The service fails to start and a large number of hs_err_pid*.log files appear in the working directory

View the service log: tail -100 var/sv/historical.log
View the Java error log: tail -100 hs_err_pid2416.log
Solution: the server's available memory was lower than what the role's JVM configuration requested. Reduce the heap/direct-memory settings in the role's jvm.config and the service starts successfully

3. Druid services started in the background are killed when the Xshell session exits

The Druid servers are reached through a public-network server. When the Xshell session is closed abnormally, the correct termination signal is not delivered to the remote session, and the nohup-started processes do not get a clean detach, so they are terminated.
Solution: exit Xshell normally (close the session cleanly instead of just closing the window)

4. Prevent log files from growing without bound

Add the following property to each role's jvm.config, using that role's own name as the value, for example:
-Ddruidrole=coordinator-overlord
Then modify the log4j2.xml log configuration to roll files by size:

<Configuration status="WARN" monitorInterval="30">
    <Properties>
        <Property name="baseDir">var/log/druid</Property>
        <Property name="filename">${sys:druidrole}</Property>
    </Properties>
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
    </Console>
    <RollingFile name="RollingFile"
                 fileName="${baseDir}/${filename}.log"
                 filePattern="${baseDir}/${filename}.%i.log.gz">
        <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
        <Policies>
            <SizeBasedTriggeringPolicy size="200 MB"/>
        </Policies>
        <DefaultRolloverStrategy max="5"/>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="RollingFile"/>
    </Root>
  </Loggers>
</Configuration>