Kafka cluster management -- configuration and tuning

A cluster is a group of loosely coupled computers (software and/or hardware) that cooperate so closely that, in many respects, they can be regarded as a single computer. Each computer in a cluster is called a node. Nodes are usually connected through a local area network, although other kinds of connection are possible. Clusters are typically used to improve on the computing speed and/or reliability of a single computer, and they usually offer a much better price/performance ratio than a single machine of comparable capability, such as a workstation or a supercomputer.

Characteristics of clusters

A cluster has two main characteristics:

Scalability: the performance of a cluster is not limited by any single service entity. New service entities can be added to the cluster dynamically to increase its performance.
High availability: when one node of the cluster fails, the applications running on it are automatically taken over by another node. Eliminating single points of failure is essential for improving the availability, accessibility, and reliability of data.

Cluster capabilities

Load balancing: tasks are distributed evenly across the computing and network resources of the cluster to improve throughput.
Error recovery: if a server in the cluster becomes unavailable because of a failure or maintenance, its resources and applications are moved to the remaining nodes. This process, in which another available node transparently takes over from the failed node and completes the task, is called error recovery.

Both load balancing and error recovery require that every service entity holds the resources needed to execute the same task, and that all of those resources see the same view of the information the task depends on.

1, Cluster usage scenario

Kafka is a distributed messaging system with high scalability and high throughput. A Kafka cluster has no fixed "central master node": all brokers are peers, although one of them is elected as the controller at runtime.

Broker

Each Broker is a Kafka service instance. Multiple brokers form a Kafka cluster. The messages published by producers will be saved in the Broker, and consumers will pull messages from the Broker for consumption.

Kafka cluster architecture diagram

As the figure shows, Kafka relies heavily on ZooKeeper to manage the cluster: the broker list, the mapping between partitions and brokers, the mapping between partitions and consumers, load balancing between producers and consumers, consumer offset records, consumer registration, and so on. To achieve high availability, ZooKeeper must therefore be deployed as a cluster as well.

2, Cluster construction

1. ZooKeeper cluster construction

Scenario

A real cluster is deployed across different servers, but for testing we cannot afford to start a dozen or more virtual machines at once. Instead, we build a pseudo cluster: all services run on one virtual machine and are distinguished by their ports.

We need to build a three-node ZooKeeper cluster (pseudo cluster).

Install JDK

Cluster directory

Create the ZooKeeper cluster directory and copy the unzipped ZooKeeper distribution into each of the following three directories:

itcast@Server-node:/mnt/d/zookeeper-cluster$ ll 
total 0 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 ./ 
drwxrwxrwx 1 dayuan dayuan 512 Aug 19 18:42 ../ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 zookeeper-1/ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 zookeeper-2/ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 zookeeper-3/ 
itcast@Server-node:/mnt/d/zookeeper-cluster$
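
The directory layout above can be created with a short script. This is a minimal sketch rather than the author's exact procedure: BASE_DIR defaults to a relative ./zookeeper-cluster directory (the article uses /mnt/d/zookeeper-cluster), and ZK_SRC is a hypothetical path to the unzipped ZooKeeper distribution.

```shell
#!/bin/sh
# Sketch: create the three instance directories of the pseudo cluster.
# BASE_DIR and ZK_SRC are assumptions; adjust them to your environment.
BASE_DIR="${BASE_DIR:-./zookeeper-cluster}"
ZK_SRC="${ZK_SRC:-./apache-zookeeper}"   # hypothetical location of the unzipped distribution

for i in 1 2 3; do
  instance="$BASE_DIR/zookeeper-$i"
  mkdir -p "$instance"
  # Copy the distribution only if it is actually present.
  if [ -d "$ZK_SRC" ]; then
    cp -R "$ZK_SRC/." "$instance/"
  fi
done

# Show the resulting layout.
ls "$BASE_DIR"
```

Running it with BASE_DIR=/mnt/d/zookeeper-cluster should reproduce the listing shown above.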

ClientPort settings

In the zoo.cfg of each ZooKeeper instance, set clientPort to 2181, 2182, and 2183 respectively:

# the port at which the clients will connect 
clientPort=2181

myid configuration

Create a myid file in the data directory of each ZooKeeper instance, containing 0, 1, and 2 respectively. This file records the ID of each server and must match the corresponding server.N entry in zoo.cfg.

dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$ cat temp/zookeeper/data/myid
0
dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$
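
Writing the three myid files by hand is error-prone, so the step can be scripted. The following is a minimal sketch under the same assumptions as the article's layout: instance directories zookeeper-1 to zookeeper-3 and the relative data directory temp/zookeeper/data. BASE_DIR is an assumed variable that defaults to ./zookeeper-cluster.

```shell
#!/bin/sh
# Sketch: write myid files with the IDs 0, 1 and 2 for instances 1, 2 and 3.
BASE_DIR="${BASE_DIR:-./zookeeper-cluster}"

for i in 1 2 3; do
  data_dir="$BASE_DIR/zookeeper-$i/temp/zookeeper/data"
  mkdir -p "$data_dir"
  # Instance N gets server ID N-1, matching server.0 to server.2 in zoo.cfg.
  echo "$((i - 1))" > "$data_dir/myid"
done
```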

zoo.cfg

In the zoo.cfg of each ZooKeeper instance, configure the client access port and the list of cluster servers.

dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$ cat conf/zoo.cfg 
# The number of milliseconds of each tick 
# (the heartbeat interval of the ZooKeeper server)
tickTime=2000 
# The number of ticks that the initial 
# synchronization phase can take 
initLimit=10 
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement 
syncLimit=5 
# the directory where the snapshot is stored. 
# do not use /tmp for storage, /tmp here is just 
# example sakes. 
#dataDir=/tmp/zookeeper 
dataDir=temp/zookeeper/data 
dataLogDir=temp/zookeeper/log 
# the port at which the clients will connect 
clientPort=2181 
# the maximum number of client connections. 
# increase this if you need to handle more clients 
#maxClientCnxns=60 
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge. 
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance 
#
# The number of snapshots to retain in dataDir 
#autopurge.snapRetainCount=3 
# Purge task interval in hours 
# Set to "0" to disable auto purge feature 
#autopurge.purgeInterval=1 

server.0=127.0.0.1:2888:3888 
server.1=127.0.0.1:2889:3889 
server.2=127.0.0.1:2890:3890 
dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$

Explanation: each entry has the form server.<server ID>=<server IP address>:<port for communication between servers>:<port for leader election between servers>
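
Because the three zoo.cfg files differ only in clientPort, they can be generated in a loop. The sketch below writes a trimmed version of the file shown above (comments and commented-out options omitted). BASE_DIR is an assumed variable; the ports and server entries are the ones used in this article.

```shell
#!/bin/sh
# Sketch: generate conf/zoo.cfg for each instance of the pseudo cluster.
BASE_DIR="${BASE_DIR:-./zookeeper-cluster}"

for i in 1 2 3; do
  conf_dir="$BASE_DIR/zookeeper-$i/conf"
  mkdir -p "$conf_dir"
  # Only clientPort changes per instance: 2181, 2182, 2183.
  cat > "$conf_dir/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=temp/zookeeper/data
dataLogDir=temp/zookeeper/log
clientPort=$((2180 + i))
server.0=127.0.0.1:2888:3888
server.1=127.0.0.1:2889:3889
server.2=127.0.0.1:2890:3890
EOF
done
```

Note that dataDir is a relative path here, as in the article, so each server has to be started from its own instance directory.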

Start cluster

Starting the cluster means starting each instance separately. Once the instances are running, we can query the status of each one:

itcast@Server-node:/mnt/d/zookeeper-cluster/zookeeper-1$ bin/zkServer.sh status 
ZooKeeper JMX enabled by default 
Using config: /mnt/d/zookeeper-cluster/zookeeper-1/bin/../conf/zoo.cfg 
Mode: leader

itcast@Server-node:/mnt/d/zookeeper-cluster/zookeeper-2$ bin/zkServer.sh status 
ZooKeeper JMX enabled by default 
Using config: /mnt/d/zookeeper-cluster/zookeeper-2/bin/../conf/zoo.cfg 
Mode: follower 

itcast@Server-node:/mnt/d/zookeeper-cluster/zookeeper-3$ bin/zkServer.sh status 
ZooKeeper JMX enabled by default 
Using config: /mnt/d/zookeeper-cluster/zookeeper-3/bin/../conf/zoo.cfg 
Mode: follower
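
The start and status calls can be wrapped in a small helper so that all three instances are handled in one go. This is a sketch, not the author's procedure: zk_all and BASE_DIR are names introduced here for illustration, and the helper skips any instance whose bin/zkServer.sh is missing.

```shell
#!/bin/sh
# Sketch: run one zkServer.sh action (start, status, stop) in every instance.
BASE_DIR="${BASE_DIR:-./zookeeper-cluster}"

zk_all() {
  action="$1"
  for i in 1 2 3; do
    instance="$BASE_DIR/zookeeper-$i"
    if [ -x "$instance/bin/zkServer.sh" ]; then
      # Run from the instance directory, as in the transcripts above,
      # so that the relative dataDir in zoo.cfg resolves correctly.
      (cd "$instance" && bin/zkServer.sh "$action")
    else
      echo "skip: $instance/bin/zkServer.sh not found"
    fi
  done
}

zk_all start
zk_all status
```

With a healthy three-node ensemble, one instance should report Mode: leader and the other two Mode: follower. The same helper can be used with stop to shut the ensemble down.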

2. Kafka cluster construction

Cluster directory

itcast@Server-node:/mnt/d/kafka-cluster$ ll 
total 0 
drwxrwxrwx 1 dayuan dayuan 512 Aug 28 18:15 ./ 
drwxrwxrwx 1 dayuan dayuan 512 Aug 19 18:42 ../ 
drwxrwxrwx 1 dayuan dayuan 512 Aug 28 18:39 kafka-1/ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 14:02 kafka-2/ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 14:02 kafka-3/ 
drwxrwxrwx 1 dayuan dayuan 512 Aug 28 18:15 kafka-4/ 
itcast@Server-node:/mnt/d/kafka-cluster$

server.properties

# The broker number must be unique in the cluster 
broker.id=1 
# host address 
host.name=127.0.0.1 
# port 
port=9092 
# Message log storage address 
log.dirs=/tmp/kafka/log/cluster/log3 
# ZooKeeper addresses; separate multiple addresses with commas
zookeeper.connect=localhost:2181,localhost:2182,localhost:2183
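
Each broker needs its own broker.id, port, and log directory, while zookeeper.connect stays the same. The sketch below generates a minimal server.properties per broker. It assumes three brokers, the standard config/ subdirectory of a Kafka distribution, and ports 9092 to 9094; BASE_DIR and the port scheme are choices made for this example, not taken from the article. The host.name and port keys are kept because the article uses them with Kafka 2.2.1; newer Kafka releases configure the address through listeners instead.

```shell
#!/bin/sh
# Sketch: generate a minimal config/server.properties for each broker.
BASE_DIR="${BASE_DIR:-./kafka-cluster}"

for i in 1 2 3; do
  conf_dir="$BASE_DIR/kafka-$i/config"
  mkdir -p "$conf_dir"
  # This overwrites the file; settings not listed fall back to Kafka's defaults.
  cat > "$conf_dir/server.properties" <<EOF
broker.id=$i
host.name=127.0.0.1
port=$((9091 + i))
log.dirs=/tmp/kafka/log/cluster/log$i
zookeeper.connect=localhost:2181,localhost:2182,localhost:2183
EOF
done
```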

Start cluster

Open a terminal in each Kafka instance directory and run the start command. The end of the startup log looks like this:

............................... 
[2019-07-24 06:18:19,793] INFO [Transaction Marker Channel Manager 2]: Starting (kafka.coordinator.transaction.TransactionMarkerChannelManager) 
[2019-07-24 06:18:19,793] INFO [TransactionCoordinator id=2] Startup complete. (kafka.coordinator.transaction.TransactionCoordinator) 
[2019-07-24 06:18:19,846] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread) 
[2019-07-24 06:18:19,869] INFO [SocketServer brokerId=2] Started data-plane processors for 1 acceptors (kafka.network.SocketServer) 
[2019-07-24 06:18:19,879] INFO Kafka version: 2.2.1 (org.apache.kafka.common.utils.AppInfoParser) 
[2019-07-24 06:18:19,879] INFO Kafka commitId: 55783d3133a5a49a (org.apache.kafka.common.utils.AppInfoParser) 
[2019-07-24 06:18:19,883] INFO [KafkaServer id=2] started (kafka.server.KafkaServer)
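
The article does not show the start command itself. A minimal sketch, assuming the standard layout of a Kafka distribution (bin/kafka-server-start.sh and config/server.properties inside each instance directory), could look like this. start_brokers and BASE_DIR are names introduced for this example, and three brokers are assumed.

```shell
#!/bin/sh
# Sketch: start every broker in the background with its own server.properties.
BASE_DIR="${BASE_DIR:-./kafka-cluster}"

start_brokers() {
  for i in 1 2 3; do
    instance="$BASE_DIR/kafka-$i"
    if [ -x "$instance/bin/kafka-server-start.sh" ]; then
      # -daemon detaches the broker from the terminal.
      (cd "$instance" && bin/kafka-server-start.sh -daemon config/server.properties)
    else
      echo "skip: $instance/bin/kafka-server-start.sh not found"
    fi
  done
}

start_brokers
```

Once the brokers are up, running bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids from one of the instance directories should list the registered broker IDs, which is a quick way to confirm that every broker has joined the cluster.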

3, Multi cluster synchronization

MirrorMaker solves cross-cluster synchronization for Kafka by creating a mirror cluster. The following figure shows how it works: the tool consumes messages from the source cluster and then produces them to the target cluster.

1. Configuration

Create a mirror

Creating a mirror with MirrorMaker is relatively simple. After the target Kafka cluster has been set up, you only need to start the MirrorMaker program. It requires one or more consumer configuration files and one producer configuration file; the whitelist and blacklist are optional. The consumer configuration specifies the ZooKeeper address of the source Kafka cluster, and the producer configuration specifies the ZooKeeper address (or broker.list) of the target cluster. In the bootstrap.servers-style configuration files shown below, the broker addresses take that role.

kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config sourceCluster1Consumer.config \
  --consumer.config sourceCluster2Consumer.config \
  --num.streams 2 \
  --producer.config targetClusterProducer.config \
  --whitelist=".*"

Consumer configuration file:

# format: host1:port1,host2:port2 ... 
bootstrap.servers=localhost:9092 

# consumer group id 
group.id=test-consumer-group 

# What to do when there is no initial offset in Kafka or if the current 
# offset does not exist any more on the server: latest, earliest, none 
#auto.offset.reset=

Producer configuration file:

# format: host1:port1,host2:port2 ... 
bootstrap.servers=localhost:9092 

# specify the compression codec for all data generated: none, gzip, snappy, lz4, zstd 
compression.type=none

2. Tuning

How do we make sure that no data is lost during synchronization? First, require an acknowledgement when sending to the target cluster: request.required.acks=1. Second, send in blocking mode, because otherwise data is discarded once the buffer is full: queue.enqueue.timeout.ms=-1
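
The two property names above belong to the legacy Scala producer. Assuming the Java producer that the bootstrap.servers-style configuration shown earlier targets, a roughly equivalent fragment might look like the following sketch. The mapping to acks and max.block.ms is an assumption of this example and is not taken from the article; the file name follows the MirrorMaker command shown earlier, and PRODUCER_CONFIG is a variable introduced here.

```shell
#!/bin/sh
# Sketch: append delivery settings to the producer configuration used by MirrorMaker.
PRODUCER_CONFIG="${PRODUCER_CONFIG:-./targetClusterProducer.config}"

cat >> "$PRODUCER_CONFIG" <<'EOF'
# Wait for the partition leader to acknowledge each record
# (acks=all is stricter and waits for all in-sync replicas).
acks=1
# Let send() block for up to this long when the buffer is full.
# Illustrative value: the largest 64-bit integer, i.e. effectively forever.
max.block.ms=9223372036854775807
EOF
```

Whether acks=1 is sufficient depends on how much loss the mirror can tolerate; if the leader fails before followers have replicated a record, that record can still be lost.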

Summary

This chapter covered Kafka clusters: the cluster usage scenario, building multi-node ZooKeeper and Kafka clusters, and synchronization across multiple clusters.
