I. Apache Druid architecture
1. Coordinator
Monitors Historical processes and is responsible for assigning segments to specific Historical services, keeping historical data balanced across them
2. Overlord
Monitors MiddleManager processes and controls data loading into the Druid cluster; responsible for assigning ingestion tasks to MiddleManagers and for coordinating segment publishing
3. Broker
Handles queries from clients: it parses each query and routes the resulting sub-queries to Historicals and MiddleManagers. The Broker receives the sub-query results, merges them, and returns the combined result to the caller
4. Router
Provides a unified routing gateway in front of the Brokers, Overlords, and Coordinators
5. Historical
The workhorse for querying stored historical data. A Historical loads segments from deep storage and answers the Broker's queries against those segments
6. MiddleManager
Ingests new data into the cluster; responsible for reading from external data sources (new real-time data) and publishing new Druid segments. A MiddleManager is a worker node that executes submitted tasks by forwarding them to Peons. Each Peon runs in its own JVM so that task resources and logs stay isolated; a Peon runs only one task at a time, and one MiddleManager manages multiple Peons
7. External dependencies
Deep storage: shared file storage accessible to every Druid server, e.g. HDFS, S3, or a network-attached filesystem; Druid uses it to store all ingested data;
Metadata store: a shared metadata store, typically a relational database such as PostgreSQL or MySQL;
ZooKeeper: used for internal service discovery, coordination, and leader election;
II. Preparatory work
1. MySQL (as the Druid metadata store)
Create database:
CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Grant privileges:
grant all privileges on druid.* to druid@'%' identified by 'druid';
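A quick sanity check of the new account (a sketch; this assumes MySQL runs on 10.0.111.134, the host referenced in the common configuration later):
mysql -h 10.0.111.134 -u druid -pdruid -e "SHOW DATABASES LIKE 'druid';"
Druid creates its tables in this database automatically on first startup.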
2. Cluster node planning
IP address | Node role | Deployed services
---|---|---
10.0.111.140 | Master Server | Coordinator, Overlord
10.0.111.141~143 | Data Server | Historical, MiddleManager
10.0.111.144 | Query Server | Broker, Router
3. Druid access addresses
- Coordinator: http://10.0.111.140:8081
- Router: http://10.0.111.144:8888
III. Cluster configuration
1. Download the installation package
Download the installation package on the planned master server node and extract it into the /opt/tools directory:
wget https://mirrors.tuna.tsinghua.edu.cn/apache/druid/0.18.1/apache-druid-0.18.1-bin.tar.gz
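Extraction might look like this (a sketch, assuming the archive was downloaded into /opt/tools):
tar -zxvf /opt/tools/apache-druid-0.18.1-bin.tar.gz -C /opt/tools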
All of the following operations are carried out from the extracted directory: cd /opt/tools/apache-druid-0.18.1/
Note the following parameter sizing rules:
druid.processing.numMergeBuffers = max(2, druid.processing.numThreads / 4)
druid.processing.numThreads = number of cores - 1 (or 1)
druid.server.http.numThreads = max(10, (number of cores * 17) / 16 + 2) + 30
MaxDirectMemorySize >= druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)
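Worked through for a 24-core data node (an assumption, but one that matches the Historical settings used below), with integer arithmetic:
druid.processing.numThreads = 24 - 1 = 23
druid.processing.numMergeBuffers = max(2, 23 / 4) = max(2, 5) = 5
druid.server.http.numThreads = max(10, (24 * 17) / 16 + 2) + 30 = max(10, 27) + 30 = 57
MaxDirectMemorySize >= 200000000 * (5 + 23 + 1) = 5.8 GB, which the Historical's -XX:MaxDirectMemorySize=8g below satisfies.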
2. Modify common configuration
- common.runtime.properties configuration file
The Druid cluster uses MySQL as the metadata store and HDFS as deep storage.
vim conf/druid/cluster/_common/common.runtime.properties
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "mysql-metadata-storage"]
druid.extensions.hadoopDependenciesDir=/opt/tools/druid/hadoop-dependencies
druid.host=hadoop10
druid.startup.logging.logProperties=true
druid.zk.service.host=hadoop1:2181,hadoop2:2181,hadoop3:2181,hadoop4:2181,hadoop5:2181
druid.zk.paths.base=/druid
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://10.0.111.134:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=druid
# For HDFS:
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://nameservice1/druid/segments
# For HDFS:
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=hdfs://nameservice1/druid/indexing-logs
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=logging
druid.emitter.logging.logLevel=info
druid.indexing.doubleStorage=double
druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]
druid.sql.enable=true
druid.lookup.enableLookupSyncOnStartup=false
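Note that the mysql-metadata-storage extension does not ship with the MySQL JDBC driver for licensing reasons; the Connector/J jar must be added separately, for example (the version here is only illustrative):
cp mysql-connector-java-5.1.49.jar extensions/mysql-metadata-storage/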
Copy the HDFS-related configuration files (core-site.xml and hdfs-site.xml) into the conf/druid/cluster/_common/ directory.
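A sketch, assuming the Hadoop client configuration lives under /etc/hadoop/conf:
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml conf/druid/cluster/_common/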
Set druid.host in this file on each node to that node's own hostname or IP.
3. Synchronize the druid directory to other nodes
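A minimal sketch, assuming passwordless SSH as root to the other nodes in the plan above:
for ip in 10.0.111.141 10.0.111.142 10.0.111.143 10.0.111.144; do
  rsync -av /opt/tools/apache-druid-0.18.1/ root@$ip:/opt/tools/apache-druid-0.18.1/
done
Remember to set druid.host correctly on each node afterwards.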
4. Modify the master node configuration
- jvm.config configuration file
vim conf/druid/cluster/master/coordinator-overlord/jvm.config
-server
-Xms8g
-Xmx8g
-XX:+ExitOnOutOfMemoryError
-XX:+UseG1GC
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dderby.stream.error.file=var/druid/derby.log
- runtime.properties configuration file
vim conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid.service=druid/coordinator
druid.plaintextPort=8081
druid.coordinator.startDelay=PT10S
druid.coordinator.period=PT5S
druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord
druid.indexer.queue.startDelay=PT5S
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
Note that druid.coordinator.asOverlord.enabled=true runs the Overlord inside the Coordinator process, which is why a single coordinator-overlord configuration covers both roles.
5. Modify the query node configuration file
- broker-jvm.config configuration file
vim conf/druid/cluster/query/broker/jvm.config
-server
-Xms6g
-Xmx6g
-XX:MaxDirectMemorySize=8g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
- broker-runtime.properties configuration file
vim conf/druid/cluster/query/broker/runtime.properties
druid.service=druid/broker
druid.plaintextPort=8082
druid.server.http.numThreads=60
druid.broker.http.numConnections=50
druid.broker.http.maxQueuedBytes=10000000
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=6
druid.processing.numThreads=1
druid.processing.tmpDir=var/druid/processing
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
- router-jvm.config configuration file
vim conf/druid/cluster/query/router/jvm.config
-server
-Xms1g
-Xmx1g
-XX:+UseG1GC
-XX:MaxDirectMemorySize=128m
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
- router-runtime.properties configuration file
vim conf/druid/cluster/query/router/runtime.properties
druid.service=druid/router
druid.plaintextPort=8888
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator
druid.router.managementProxy.enabled=true
6. Modify the data server node configuration file
- historical-jvm.config configuration file
vim conf/druid/cluster/data/historical/jvm.config
-server
-Xms2g
-Xmx2g
-XX:MaxDirectMemorySize=8g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
- historical-runtime.properties configuration file
vim conf/druid/cluster/data/historical/runtime.properties
druid.service=druid/historical
druid.plaintextPort=8083
druid.server.http.numThreads=57
druid.processing.buffer.sizeBytes=200000000
druid.processing.numMergeBuffers=5
druid.processing.numThreads=23
druid.processing.tmpDir=var/druid/processing
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":3000000000}]
druid.server.maxSize=3000000000
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=256000000
- middleManager-jvm.config
vim conf/druid/cluster/data/middleManager/jvm.config
-server
-Xms128m
-Xmx128m
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
- middleManager-runtime.properties
vim conf/druid/cluster/data/middleManager/runtime.properties
druid.service=druid/middleManager
druid.plaintextPort=8091
druid.worker.capacity=5
druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
druid.server.http.numThreads=57
druid.indexer.fork.property.druid.processing.numMergeBuffers=5
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
IV. Cluster startup and shutdown
1. Background startup and shutdown
On the master node:
nohup ./bin/start-cluster-master-no-zk-server start >/dev/null 2>&1 &
On the query node:
nohup ./bin/start-cluster-query-server start >/dev/null 2>&1 &
On each data node:
nohup ./bin/start-cluster-data-server start >/dev/null 2>&1 &
To shut down, on each node:
./bin/service --down
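To confirm the services came up, every Druid process exposes a /status endpoint that returns JSON once it is healthy; with the address plan above, for example:
curl http://10.0.111.140:8081/status
curl http://10.0.111.144:8888/status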
2. Service mode startup and shutdown
- CentOS 7 systemd service startup
vim /lib/systemd/system/druidmaster.service
[Unit]
Description=druidmaster
After=network.target

[Service]
Type=forking
EnvironmentFile=/home/path
WorkingDirectory=/opt/tools/apache-druid-0.20.2/
ExecStart=/opt/tools/apache-druid-0.20.2/bin/start-cluster-master-no-zk-server start
ExecStop=/opt/tools/apache-druid-0.20.2/bin/service --down
Restart=on-failure

[Install]
WantedBy=multi-user.target
vim /home/path
JAVA_HOME=/opt/tools/jdk
PATH=/opt/tools/jdk/bin:/opt/tools/jdk/jre/bin:/usr/local/jdk/bin:/usr/local/jdk/jre/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
source /home/path
chmod 764 /lib/systemd/system/druidmaster.service
chmod 764 /usr/lib/systemd/system/druidmaster.service
systemctl start druidmaster.service
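To also start the service at boot (standard systemd usage):
systemctl enable druidmaster.service
systemctl status druidmaster.service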
V. Problems encountered
1. HDFS deep storage does not take effect
Solution: copy the HDFS-related configuration files (core-site.xml and hdfs-site.xml) into the conf/druid/cluster/_common/ directory
2. A service fails to start, and a large number of hs_err_pid*.log files are generated in the working directory
View the service log: tail -n 100 var/sv/historical.log
View the Java fatal-error log: tail -n 100 hs_err_pid2416.log
Cause: the server's available memory was lower than what the role's JVM configuration requested. Solution: reduce the sizes in the role's jvm.config, after which the service starts successfully
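Before starting a role, its jvm.config sizes can be compared against the machine's free memory, for example on a data node:
free -h
grep -E 'Xm[sx]|MaxDirectMemorySize' conf/druid/cluster/data/historical/jvm.config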
3. All Druid services are started in the background, but when Xshell exits the Druid processes are killed
The Druid machines are logged into through a public-network jump server. Closing Xshell abruptly does not send the proper termination sequence to the server, so the nohup-started programs never receive a normal exit instruction and are killed along with the session.
Solution: exit Xshell cleanly (log out of the shell before closing the window)
4. Preventing log explosion
Add the following option to each role's jvm.config (set the value to that role's name; coordinator-overlord is shown here):
-Ddruidrole=coordinator-overlord
Modify the log4j2.xml configuration to set a rolling log policy:
<Configuration status="WARN" monitorInterval="30">
  <Properties>
    <Property name="baseDir">var/log/druid</Property>
    <Property name="filename">${sys:druidrole}</Property>
  </Properties>
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
    </Console>
    <RollingFile name="RollingFile" fileName="${baseDir}/${filename}.log" filePattern="${baseDir}/${filename}.%i.log.gz">
      <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
      <Policies>
        <SizeBasedTriggeringPolicy size="200 MB"/>
      </Policies>
      <DefaultRolloverStrategy max="5"/>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="RollingFile"/>
    </Root>
  </Loggers>
</Configuration>