1. Operating system
Deploy Kafka on Linux, for two reasons:
- I/O model
- Network transmission efficiency
1.1 I/O model
An I/O model describes how the operating system carries out I/O operations. There are five common models:
- Blocking I/O
- Non-blocking I/O
- I/O multiplexing
- Signal-driven I/O
- Asynchronous I/O
Roughly speaking, each model in this list is more efficient than the one before it. select belongs to the third category (I/O multiplexing), while epoll sits between the third and fourth.
Kafka's client networking layer is built on Java's Selector, which is implemented with epoll on Linux and with select on Windows. Kafka deployed on Linux therefore gets more efficient I/O.
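As a minimal sketch (not Kafka's actual networking code), this is the Selector-based polling loop Java NIO provides; the port number and the empty accept branch are placeholders:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SelectorSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();              // backed by epoll on Linux, select on Windows
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9092));         // Kafka's default port, used here only for illustration
        server.configureBlocking(false);                  // channels must be non-blocking to register
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                            // blocks until at least one channel is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    // accept the connection and register it for OP_READ here
                }
            }
            selector.selectedKeys().clear();              // the ready set must be cleared manually
        }
    }
}
```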
1.2 Network transmission efficiency
When data is transferred between disk and network, Kafka can take advantage of the speed, convenience, and efficiency of the zero-copy mechanism on Linux; this benefit is not available on Windows.
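In Java, this zero-copy path is exposed as FileChannel.transferTo, which delegates to sendfile(2) on Linux. A minimal sketch, with an assumed file name and destination address:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        // "segment.log" and the destination address are placeholders for illustration
        try (FileChannel file = FileChannel.open(Paths.get("segment.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long size = file.size();
            while (position < size) {
                // data flows from the page cache to the socket inside the kernel,
                // never copied through a user-space buffer
                position += file.transferTo(position, size - position, socket);
            }
        }
    }
}
```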
2. Disk
Ordinary mechanical hard disks are sufficient.
If the budget allows, you can build the disks into a RAID group; if funds are tight, read/write performance can still be guaranteed without RAID.
2.1 Choosing a mechanical hard disk or an SSD
Ordinary mechanical hard disks are fine. Kafka reads and writes its logs sequentially, and the main weakness of a mechanical disk is slow random access, so using one does not hurt Kafka's performance.
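A minimal sketch of the append-only write pattern behind this; the segment file name mimics Kafka's log naming, but the code is illustrative rather than Kafka's implementation:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SequentialAppend {
    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(Paths.get("00000000000000000000.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < 1000; i++) {
                // every write lands at the current end of the file, so a mechanical
                // disk never has to seek between writes
                log.write(ByteBuffer.wrap(("message-" + i + "\n").getBytes(StandardCharsets.UTF_8)));
            }
        }
    }
}
```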
2.2 RAID for the disk group
RAID offers redundant storage and load balancing.
However, Kafka ships with its own redundancy mechanism (replication) and achieves load balancing through its partition design. So if you can afford it, you can store data on a RAID group; if cost-effectiveness is the priority, skipping RAID entirely is fine.
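For example, Kafka's redundancy is configured per topic through the replication factor at topic creation (host name and topic are placeholders; the --bootstrap-server flag applies to Kafka 2.2 and later):

```
# Each partition of this topic is stored on 2 brokers, giving Kafka-level
# redundancy without RAID (host and topic names are placeholders).
bin/kafka-topics.sh --bootstrap-server hadoop102:9092 \
  --create --topic test --partitions 3 --replication-factor 2
```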
3. Disk capacity
Disk capacity depends on several factors:
- Number of messages per day
- Size of each message
- Number of replicas
- Message retention time
- Whether compression is enabled
How much storage space does Kafka need?
Design scenario: 100 million log messages are sent to Kafka every day; each message is kept in two copies to prevent data loss; data is retained for two weeks; and the average message size is 1 KB.
- 100 million 1 KB messages per day, with 2 copies each: 100,000,000 × 1 KB × 2 ÷ 1000 ÷ 1000 = 200 GB per day
- Kafka stores other kinds of data (such as indexes) besides the messages themselves, so add 10% redundancy: about 220 GB per day
- Over two weeks: 220 GB × 14 ≈ 3 TB
- If compression is enabled, with a compression ratio of about 0.75, plan for 3 TB × 0.75 = 2.25 TB of total storage
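The same arithmetic as a small, self-contained Java sketch; all figures come from the scenario above:

```java
public class DiskCapacityEstimate {
    public static void main(String[] args) {
        long messagesPerDay = 100_000_000L;  // 100 million messages per day
        long messageSizeKB = 1;              // average message size: 1 KB
        int replicas = 2;                    // copies of each message
        int retentionDays = 14;              // retained for two weeks
        double compressionRatio = 0.75;      // assumed ratio if compression is enabled

        double dailyGB = messagesPerDay * messageSizeKB * replicas / 1000.0 / 1000.0; // = 200 GB
        double withOverheadGB = dailyGB * 1.10;                  // +10% for indexes etc. = 220 GB
        double totalTB = withOverheadGB * retentionDays / 1000;  // ≈ 3 TB over two weeks
        double compressedTB = totalTB * compressionRatio;        // ≈ 2.25 TB with compression

        System.out.printf("daily: %.0f GB, two weeks: %.2f TB, compressed: %.2f TB%n",
                withOverheadGB, totalTB, compressedTB);
    }
}
```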
4. Bandwidth
- If the network has 10-gigabit bandwidth, it will not be the bottleneck. If the data volume is especially large, size the cluster with the design scenario below.
- If the network has 100-megabit or gigabit bandwidth, it becomes the bottleneck when processing large amounts of data. You can size the cluster with the traditional empirical formula below, or work through the design scenario and adapt it to your own production environment.
Empirical formula: number of servers = 2 × (producer peak production rate in MB/s × number of replicas ÷ 100) + 1
Design scenario: the machine room has gigabit bandwidth and we need to process 1 TB of data within one hour. How many Kafka servers do we need?
- Since the link is gigabit Ethernet (1000 Mbps = 1 Gbps), each server can receive at most 1000 Mb of data per second.
- Assume Kafka may occupy 70% of the server's network capacity (the other 30% is reserved for other services), i.e. 700 Mb/s. Kafka also should not run at its peak constantly, so leave two thirds (or even three quarters) of that as headroom: a single Kafka server should actually plan on about 700 Mb ÷ 3 ≈ 240 Mb per second.
- 1 TB must be processed in one hour: 1 TB = 1024 × 1024 × 8 Mb ≈ 8,000,000 Mb, so the required throughput is 8,000,000 Mb ÷ 3600 s ≈ 2330 Mb per second.
- The number of servers required is 2330 Mb ÷ 240 Mb ≈ 10.
- Accounting for replication: with 2 copies of each message, 20 servers are needed; with 3 copies, 30.
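The same sizing in code form; the constants come from the scenario, and the rounding matches the text above:

```java
public class BrokerCountEstimate {
    public static void main(String[] args) {
        double linkMbps = 1000;                    // gigabit link: 1000 Mb/s per server
        double kafkaShare = 0.70;                  // Kafka may use 70% of the NIC
        double headroom = 1.0 / 3.0;               // run at about a third of that share
        double usableMbps = linkMbps * kafkaShare * headroom;  // ≈ 233 Mb/s (text rounds to 240)

        double totalMb = 1024.0 * 1024.0 * 8;      // 1 TB expressed in megabits ≈ 8,000,000 Mb
        double requiredMbps = totalMb / 3600;      // ≈ 2330 Mb/s to finish within an hour

        int replicas = 2;                          // server count scales with the replication factor
        long servers = (long) Math.ceil(requiredMbps / usableMbps) * replicas;
        System.out.println("servers needed: " + servers);  // 10 × 2 = 20
    }
}
```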
5. Memory
It is recommended that a server node running Kafka have at least 16G of memory.
Kafka's memory consists of heap memory + page cache.
5.1 Heap memory configuration
A heap of 10G-15G per node is recommended.
Modify this in kafka-server-start.sh:
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then export KAFKA_HEAP_OPTS="-Xmx10G -Xms10G" fi
5.2 Checking Kafka's GC status
Find Kafka's process ID with the jps command:
```
[atguigu@hadoop102 kafka]$ jps
2321 Kafka
5255 Jps
1931 QuorumPeerMain
```
Check Kafka's GC status using that process ID:
```
[atguigu@hadoop102 kafka]$ jstat -gc 2321 1s 10
 S0C    S1C    S0U    S1U      EC       EU        OC         OU        MC       MU      CCSC    CCSU   YGC  YGCT   FGC  FGCT   GCT
 0.0   7168.0  0.0   7168.0 103424.0  60416.0 1986560.0  148433.5  52092.0  46656.1  6780.0  6202.2   13  0.531    0  0.000  0.531
 0.0   7168.0  0.0   7168.0 103424.0  60416.0 1986560.0  148433.5  52092.0  46656.1  6780.0  6202.2   13  0.531    0  0.000  0.531
 0.0   7168.0  0.0   7168.0 103424.0  60416.0 1986560.0  148433.5  52092.0  46656.1  6780.0  6202.2   13  0.531    0  0.000  0.531
 0.0   7168.0  0.0   7168.0 103424.0  60416.0 1986560.0  148433.5  52092.0  46656.1  6780.0  6202.2   13  0.531    0  0.000  0.531
 0.0   7168.0  0.0   7168.0 103424.0  60416.0 1986560.0  148433.5  52092.0  46656.1  6780.0  6202.2   13  0.531    0  0.000  0.531
 0.0   7168.0  0.0   7168.0 103424.0  61440.0 1986560.0  148433.5  52092.0  46656.1  6780.0  6202.2   13  0.531    0  0.000  0.531
 0.0   7168.0  0.0   7168.0 103424.0  61440.0 1986560.0  148433.5  52092.0  46656.1  6780.0  6202.2   13  0.531    0  0.000  0.531
 0.0   7168.0  0.0   7168.0 103424.0  61440.0 1986560.0  148433.5  52092.0  46656.1  6780.0  6202.2   13  0.531    0  0.000  0.531
 0.0   7168.0  0.0   7168.0 103424.0  61440.0 1986560.0  148433.5  52092.0  46656.1  6780.0  6202.2   13  0.531    0  0.000  0.531
```
S0C: capacity of survivor space 0
S1C: capacity of survivor space 1
S0U: used size of survivor space 0
S1U: used size of survivor space 1
EC: Eden space capacity
EU: Eden space used
OC: old generation capacity
OU: old generation used
MC: metaspace capacity
MU: metaspace used
CCSC: compressed class space capacity
CCSU: compressed class space used
YGC: number of young-generation GC events
YGCT: total young-generation GC time
FGC: number of full GC events
FGCT: total full GC time
GCT: total time spent in garbage collection
5.3 Checking Kafka's heap memory usage
Inspect Kafka's heap using the process ID:
```
[atguigu@hadoop102 kafka]$ jmap -heap 2321
Attaching to process ID 2321, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.212-b10

using thread-local object allocation.
Garbage-First (G1) GC with 8 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 2147483648 (2048.0MB)
   NewSize                  = 1363144 (1.2999954223632812MB)
   MaxNewSize               = 1287651328 (1228.0MB)
   OldSize                  = 5452592 (5.1999969482421875MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 1048576 (1.0MB)

Heap Usage:
G1 Heap:
   regions  = 2048
   capacity = 2147483648 (2048.0MB)
   used     = 246367744 (234.95458984375MB)
   free     = 1901115904 (1813.04541015625MB)
   11.472392082214355% used
G1 Young Generation:
Eden Space:
   regions  = 83
   capacity = 105906176 (101.0MB)
   used     = 87031808 (83.0MB)
   free     = 18874368 (18.0MB)
   82.17821782178218% used
Survivor Space:
   regions  = 7
   capacity = 7340032 (7.0MB)
   used     = 7340032 (7.0MB)
   free     = 0 (0.0MB)
   100.0% used
G1 Old Generation:
   regions  = 147
   capacity = 2034237440 (1940.0MB)
   used     = 151995904 (144.95458984375MB)
   free     = 1882241536 (1795.04541015625MB)
   7.471886074420103% used

13364 interned Strings occupying 1449608 bytes.
```
5.4 Kafka page cache
The page cache is memory managed by the Linux kernel on the server. We only need to ensure that about 25% of an active segment (1G by default) stays in the page cache.
5.5 Summary
Putting the above together (a 10G heap plus roughly 1G of page cache), Kafka needs at least 11G of memory to run smoothly and stably in a big-data scenario. It is therefore recommended that a server node running Kafka have at least 16G of memory.
6. CPU
It is recommended that a Kafka server have more than 32 CPU cores.
Looking at Kafka's thread-related configuration parameters:
| Parameter | Description | Default |
|---|---|---|
| num.network.threads | Number of threads the server uses to receive requests from the network and send responses to it | 3 |
| num.io.threads | Number of threads the server uses to process requests, which may include disk I/O | 8 |
| num.replica.fetchers | Number of replica fetcher threads; increasing this raises the parallelism of replica fetching | 1 |
| num.recovery.threads.per.data.dir | Number of threads per data directory used for log recovery at startup and flushing at shutdown | 1 |
| log.cleaner.threads | Number of background threads used for log cleaning | 1 |
| background.threads | Number of threads used for various background processing tasks | 10 |
Among these, num.recovery.threads.per.data.dir is used only during startup and shutdown, and log cleaning runs only at intervals, so the remaining parameters account for at least 3 + 8 + 1 + 10 = 22 resident threads.
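If more cores are available, the thread counts above are the usual knobs to raise. A hedged example for server.properties; the values below are illustrative assumptions for a 32-core broker, not defaults or official recommendations:

```
# server.properties -- illustrative tuning for a 32-core broker (assumed values)
num.network.threads=9
num.io.threads=16
num.replica.fetchers=3
```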
In a production environment, at least 16 CPU cores, and preferably more than 32, are recommended so that the Kafka cluster can process data smoothly in a big-data environment.