How to reallocate data after adding nodes to a Kafka cluster

After nodes are added, data is redistributed by reassigning existing partitions to the new nodes so that they share the load. For details of this approach, see: Pit avoidance Guide: scheme summary of rapid expansion of Kafka cluster_ Java theory and practice - CSDN blog

Steps to add a node

Connect to the server that will host the new node, copy the broker properties configuration file (server.properties) from an existing node, and then modify the following parameters:

broker.id
log.dirs
zookeeper.connect
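
The copy-and-modify step can be sketched in Python as below. This is a minimal illustration, not part of Kafka: the helper name override_properties, the sample file contents, and the path /data/kafka-logs are hypothetical. The broker.id must be unique within the cluster, and zookeeper.connect must point at the same ZooKeeper ensemble as the existing brokers.

```python
def override_properties(text: str, overrides: dict) -> str:
    """Return properties text with the given keys replaced (appended if missing)."""
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Only "key=value" lines that are not comments are candidates for replacement.
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in remaining and not stripped.startswith("#"):
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Keys that were absent from the copied file are appended at the end.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Hypothetical contents of the file copied from broker 1.
copied = (
    "broker.id=1\n"
    "log.dirs=/data/kafka-logs\n"
    "zookeeper.connect=zookeeper-001:2181\n"
    "num.partitions=3\n"
)

new_conf = override_properties(
    copied,
    {"broker.id": "6", "log.dirs": "/data/kafka-logs",
     "zookeeper.connect": "zookeeper-001:2181"},
)
print(new_conf)
```

All other settings in the copied file are left untouched, so the new broker inherits the cluster's existing defaults.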

Principle of data migration

  1. Only newly created topics will place data on the new node. To spread existing data onto the new node, you must migrate partitions of the existing topics to it.

  2. The data migration process is started manually, but it then runs fully automatically. Kafka adds the new node as a follower of the partition to be migrated and lets it replicate all existing data in that partition. Once the new node has fully replicated the partition and joined the in-sync replica set, one of the existing replicas deletes its copy of the partition's data.
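
The replica-set transition described in point 2 can be illustrated with a small Python model. This is only a sketch of the idea, not Kafka's internal code; the function name reassignment_phases is hypothetical, and the broker ids are taken from partition 0 of the example later in this article.

```python
def reassignment_phases(current, target):
    """Model the replica list of one partition before, during and after a move."""
    adding = [b for b in target if b not in current]      # new followers to create
    removing = [b for b in current if b not in target]    # old replicas to delete
    return {
        "before": list(current),
        # While data is being copied, old and new replicas coexist.
        "replicating": list(current) + adding,
        # After the new replicas are in sync, the surplus old ones are dropped.
        "after": list(target),
        "adding": adding,
        "removing": removing,
    }


phases = reassignment_phases(current=[5, 4, 1, 2, 3], target=[1, 3, 4, 5, 6])
for name, value in phases.items():
    print(name, value)
```

For this partition the model shows broker 6 joining as an extra replica first, and broker 2 giving up its copy only at the end, so the partition never has fewer replicas than before.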

Introduction to data migration tools

The partition reassignment tool can be used to move partitions between brokers. An ideal partition allocation ensures a uniform data load and partition size across all brokers. The tool cannot automatically study the data distribution in a Kafka cluster and move partitions around to even out the load. Therefore, you must work out yourself which topics or partitions should be moved.
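
Because the tool does not analyse the load for you, a small script can help decide what to move. The following Python sketch counts replicas per broker from an assignment in the JSON format the tool uses. The function name replicas_per_broker and the two-partition sample are hypothetical, and replica count is only a rough proxy for load since it ignores partition size and traffic.

```python
import json
from collections import Counter


def replicas_per_broker(plan_json: str, brokers) -> dict:
    """Count how many partition replicas each broker holds in an assignment."""
    plan = json.loads(plan_json)
    counts = Counter({b: 0 for b in brokers})  # keep brokers that hold nothing
    for partition in plan["partitions"]:
        counts.update(partition["replicas"])
    return dict(sorted(counts.items()))


# Hypothetical assignment: brokers 1-3 hold data, broker 4 was just added.
sample = json.dumps({
    "version": 1,
    "partitions": [
        {"topic": "test", "partition": 0, "replicas": [1, 2]},
        {"topic": "test", "partition": 1, "replicas": [2, 3]},
    ],
})

counts = replicas_per_broker(sample, brokers=[1, 2, 3, 4])
print(counts)
```

A broker with a count of zero, like broker 4 here, is a natural target for the next reassignment.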

The partition reassignment tool can run in three modes:

  • --generate: in this mode, given a list of topics and a list of brokers, the tool generates a candidate partition and replica reassignment plan that spreads all partitions of the specified topics across the given brokers. This option is only a convenient way to produce a reassignment plan; it does not move any data.

  • --execute: in this mode, the tool starts the reassignment of partitions according to the plan provided by the user (with the --reassignment-json-file option). This can be a custom plan written by hand by the administrator, or the plan produced by the --generate option.

  • --verify: in this mode, the tool verifies the reassignment status of all partitions listed in the last --execute. The status can be successfully completed, failed, or in progress.
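
To keep the three modes apart, the sketch below builds the argument list for each one in Python. The helper names are hypothetical. The flags are the ones used in this article; the --zookeeper argument is optional here on the assumption that only older Kafka releases need it, so check which connection option your version expects.

```python
from typing import List, Optional, Sequence

TOOL = "kafka-reassign-partitions"  # script name as used in this article


def _connection(bootstrap: str, zookeeper: Optional[str]) -> List[str]:
    args = [TOOL, "--bootstrap-server", bootstrap]
    if zookeeper:  # assumption: only passed on releases that still accept it
        args += ["--zookeeper", zookeeper]
    return args


def generate_cmd(bootstrap: str, topics_file: str, brokers: Sequence[int],
                 zookeeper: Optional[str] = None) -> List[str]:
    """Mode 1: propose a plan for the topics in topics_file across brokers."""
    return _connection(bootstrap, zookeeper) + [
        "--topics-to-move-json-file", topics_file,
        "--broker-list", ",".join(str(b) for b in brokers),
        "--generate",
    ]


def execute_cmd(bootstrap: str, plan_file: str,
                zookeeper: Optional[str] = None) -> List[str]:
    """Mode 2: start the reassignment described in plan_file."""
    return _connection(bootstrap, zookeeper) + [
        "--reassignment-json-file", plan_file, "--execute"]


def verify_cmd(bootstrap: str, plan_file: str,
               zookeeper: Optional[str] = None) -> List[str]:
    """Mode 3: report the status of the reassignment described in plan_file."""
    return _connection(bootstrap, zookeeper) + [
        "--reassignment-json-file", plan_file, "--verify"]


print(" ".join(generate_cmd("localhost:9092", "topics-to-move.json", range(1, 7))))
print(" ".join(execute_cmd("localhost:9092", "test-reassign.json")))
print(" ".join(verify_cmd("localhost:9092", "test-reassign.json")))
```

Each list can be passed to subprocess.run(cmd, check=True) on a host where the Kafka scripts are on the PATH.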

Example:

The 5 existing brokers have broker.id 1, 2, 3, 4 and 5; the new node has broker.id 6.

The topic test has 6 partitions, each with 5 replicas.

Create a file that lists the topics to migrate

topics-to-move.json

{"topics": [{"topic": "test"}],"version":1}
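
When several topics need to be moved, the file can be generated instead of typed by hand. A minimal Python sketch, assuming only the JSON layout shown above; the function names topics_to_move and render_topics_file are hypothetical.

```python
import json


def topics_to_move(topics) -> dict:
    """Build the structure expected by --topics-to-move-json-file."""
    return {"topics": [{"topic": name} for name in topics], "version": 1}


def render_topics_file(topics) -> str:
    """Serialize the structure as a single line of JSON."""
    return json.dumps(topics_to_move(topics))


rendered = render_topics_file(["test"])
print(rendered)
```

Redirect the printed line into topics-to-move.json, or write it with open("topics-to-move.json", "w").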

Generate reassignment plan

kafka-reassign-partitions --bootstrap-server localhost:9092 --zookeeper zookeeper-001:2181 --topics-to-move-json-file topics-to-move.json --broker-list "1,2,3,4,5,6" --generate

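The above command produces output like the following. Save the content under "Proposed partition reassignment configuration" as the file test-reassign.json. The content under "Current partition replica assignment" is worth keeping as well, since it can be used to roll back.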

Current partition replica assignment
 
{"version":1,"partitions":[{"topic":"test","partition":0,"replicas":[5,4,1,2,3],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":5,"replicas":[5,2,3,4,1],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":1,"replicas":[1,5,2,3,4],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":4,"replicas":[4,5,1,2,3],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":3,"replicas":[3,4,5,1,2],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":2,"replicas":[2,1,3,4,5],"log_dirs":["any","any","any","any","any"]}]}
 
Proposed partition reassignment configuration
 
{"version":1,"partitions":[{"topic":"test","partition":4,"replicas":[5,1,2,3,4],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":1,"replicas":[2,4,5,6,1],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":3,"replicas":[4,6,1,2,3],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":0,"replicas":[1,3,4,5,6],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":5,"replicas":[6,2,3,4,5],"log_dirs":["any","any","any","any","any"]},{"topic":"test","partition":2,"replicas":[3,5,6,1,2],"log_dirs":["any","any","any","any","any"]}]}
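
Before executing the plan, it helps to see exactly what it changes. The Python sketch below compares the current and proposed replica lists from the output above, copied into compact dictionaries keyed by partition number. The function name replica_changes is hypothetical. The first broker in each replica list is the preferred leader, so a change in order alone also matters.

```python
# Replica lists copied from the tool output above (partition -> brokers).
current = {
    0: [5, 4, 1, 2, 3], 1: [1, 5, 2, 3, 4], 2: [2, 1, 3, 4, 5],
    3: [3, 4, 5, 1, 2], 4: [4, 5, 1, 2, 3], 5: [5, 2, 3, 4, 1],
}
proposed = {
    0: [1, 3, 4, 5, 6], 1: [2, 4, 5, 6, 1], 2: [3, 5, 6, 1, 2],
    3: [4, 6, 1, 2, 3], 4: [5, 1, 2, 3, 4], 5: [6, 2, 3, 4, 5],
}


def replica_changes(current: dict, proposed: dict) -> dict:
    """For each partition, list brokers gained and lost and the preferred leader change."""
    changes = {}
    for partition in sorted(proposed):
        old, new = current[partition], proposed[partition]
        changes[partition] = {
            "added": sorted(set(new) - set(old)),
            "removed": sorted(set(old) - set(new)),
            "preferred_leader": (old[0], new[0]),  # (before, after)
        }
    return changes


changes = replica_changes(current, proposed)
for partition, change in changes.items():
    print(partition, change)
```

In this plan, five of the six partitions gain a replica on broker 6 and drop one old replica each, while partition 4 keeps the same brokers and only changes its preferred leader.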

 

Perform data migration

kafka-reassign-partitions --bootstrap-server localhost:9092 --zookeeper zookeeper-001:2181 --reassignment-json-file test-reassign.json --execute

Check the status of the reassigned partitions

kafka-reassign-partitions --bootstrap-server localhost:9092 --zookeeper zookeeper-001:2181 --reassignment-json-file test-reassign.json --verify


Keywords: Big Data kafka Distribution

Added by rodneykm on Sat, 05 Mar 2022 09:02:03 +0200