Construction of offline data warehouse acquisition channel

Construction of offline data warehouse - Installation of data acquisition tools

1, zookeeper installation and configuration

(1) zookeeper-3.5.9 installation

First download the zookeeper-3.5.9 installation package, then place it in the installation-package directory on flink102:

cd /opt/software

Once it is in place, unpack it:

tar -zxvf apache-zookeeper-3.5.9-bin.tar.gz -C /opt/module
# Then check the contents of /opt/module
cd /opt/module
ll # or ls
# Rename the directory for convenience
mv apache-zookeeper-3.5.9-bin zookeeper-3.5.9

(2) Modify zookeeper configuration file

Go into the zookeeper directory and look at its layout:

cd /opt/module/zookeeper-3.5.9
ll

Create a zkData directory to store snapshots and logs:

mkdir zkData

Modify the configuration file of zookeeper.
Rename the sample file first:

mv conf/zoo_sample.cfg conf/zoo.cfg
# Then edit it
vim conf/zoo.cfg

The modified configuration is as follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/opt/module/zookeeper-3.5.9/zkData
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
#########cluster#########
server.2=flink102:2888:3888
server.3=flink103:2888:3888
server.4=flink104:2888:3888

(3) Add zookeeper environment variable

In the previous article we created an environment-variable file; open it to add to it:

sudo vim /etc/profile.d/my_env.sh

Add ZOO_HOME:

# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# ZOO_HOME
export ZOO_HOME=/opt/module/zookeeper-3.5.9
export PATH=$PATH:$ZOO_HOME/bin

After saving, reload the environment:

source /etc/profile

(4) zookeeper startup

Start zookeeper service

cd /opt/module/zookeeper-3.5.9
bin/zkServer.sh start

Start zookeeper client

bin/zkCli.sh
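For a scripted health check rather than the interactive client, the "Mode" line of `zkServer.sh status` output can be extracted with a small helper. This is only a sketch; the `zk_mode` function name is my own, not part of zookeeper:

```shell
# Sketch: extract "leader" / "follower" / "standalone" from the
# output of `zkServer.sh status`. The function name is hypothetical.
zk_mode() {
  # $1 is the captured status output
  echo "$1" | awk -F': ' '/^Mode/{print $2}'
}

# On a live node:
#   zk_mode "$(/opt/module/zookeeper-3.5.9/bin/zkServer.sh status 2>&1)"
```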

(5) Cluster zookeeper configuration

The stand-alone setup is now complete. Next, distribute the files to nodes 103 and 104:

cd /opt/module
xsync zookeeper-3.5.9
sudo xsync /etc/profile.d/my_env.sh

Remember to run source /etc/profile on nodes 103 and 104 as well.
Next, create a myid file in the zkData folder and edit it:

cd /opt/module/zookeeper-3.5.9
vim zkData/myid
# Write the id matching this node's server.N entry in zoo.cfg (flink102 is server.2)
2

Distribute it to nodes 103 and 104:

xsync zkData/myid

Then change the value to 3 on node 103 and to 4 on node 104, matching the server.N entries in zoo.cfg.
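Editing myid by hand on every node is error-prone. As an alternative sketch (the `write_myid` helper is an assumption of mine, and the ssh loop assumes the hostname pattern used above), the ids can be written in one pass:

```shell
# Hypothetical helper: write the given id into DIR/myid,
# creating the directory if needed.
write_myid() {
  mkdir -p "$1"
  printf '%s\n' "$2" > "$1/myid"
}

# On the cluster this could be driven over ssh, matching the
# server.N entries in zoo.cfg (flink102->2, flink103->3, flink104->4):
#   for pair in flink102:2 flink103:3 flink104:4; do
#     ssh "${pair%%:*}" "echo ${pair##*:} > /opt/module/zookeeper-3.5.9/zkData/myid"
#   done
```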

(6) zookeeper cluster scripting

cd to our script directory

cd /home/flink/bin
vim zk.sh

The script is as follows:

#!/bin/bash
if [ $# -lt 1 ]
  then
  echo "No Args Input"
  exit
fi
 
case $1 in
"start")
  for i in flink102 flink103 flink104
  do
  echo "==================$i=================="
  ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh start
  done
  for i in flink102 flink103 flink104
  do
  echo "==================$i=================="
  ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh status
  done
;;
"stop")
  for i in flink102 flink103 flink104
  do
  echo "==================$i=================="
  ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh stop
  done
;;
"status")
  for i in flink102 flink103 flink104
  do
  echo "==================$i=================="
  ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh status
  done
;;
*)
 echo "Args Error"
;;
esac

Grant execute permission:

chmod 777 zk.sh

Now you can try starting the zookeeper cluster:

zk.sh start
# Check the background processes on every node
jpsall

2, kafka installation and configuration

(1) kafka installation

Here I chose kafka_2.11-2.4.1.tgz, which can be downloaded from the official website.
As with zookeeper, put the installation package into the installation-package directory:

cd /opt/software
# Unzip kafka
tar -zxvf kafka_2.11-2.4.1.tgz -C /opt/module
# Go to the module directory and modify the file name
cd /opt/module
mv kafka_2.11-2.4.1 kafka

(2) Modify kafka configuration file

Enter the kafka directory and look at the file structure:

cd kafka
ll
drwxr-xr-x.  3 flink flink  4096 Mar  3  2020 bin
drwxr-xr-x.  2 flink flink  4096 Dec 11 14:44 config
drwxrwxr-x. 20 flink flink  4096 Dec 11 16:33 datas
drwxr-xr-x.  2 flink flink  4096 Dec 11 10:12 libs
-rw-r--r--.  1 flink flink 32216 Mar  3  2020 LICENSE
drwxrwxr-x.  2 flink flink  4096 Dec 11 16:00 logs
-rw-r--r--.  1 flink flink   337 Mar  3  2020 NOTICE
drwxr-xr-x.  2 flink flink  4096 Mar  3  2020 site-docs

Enter the config directory to modify the configuration file

cd config
vim server.properties

The revised content is:

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/opt/module/kafka/datas

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=flink102:2181,flink103:2181,flink104:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
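Note the /kafka suffix on zookeeper.connect: it is a chroot, so all of Kafka's znodes are kept under /kafka instead of cluttering the zookeeper root. A small sketch (the `zk_chroot` function name is mine) shows how the chroot part splits off from the host list:

```shell
# Sketch: return the chroot part of a zookeeper.connect string,
# or "/" when none is given. The function name is hypothetical.
zk_chroot() {
  case "$1" in
    */*) echo "/${1#*/}" ;;
    *)   echo "/" ;;
  esac
}
```

For example, `zk_chroot "flink102:2181,flink103:2181,flink104:2181/kafka"` prints /kafka, while a connect string with no chroot yields /.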

(3) Configuring kafka environment variables

sudo vim /etc/profile.d/my_env.sh

Add the following:

# KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin

Remember to run source /etc/profile afterwards.

(4) Distribute to nodes 103 and 104

xsync /opt/module/kafka
sudo xsync /etc/profile.d/my_env.sh

Remember to run source /etc/profile on each node, then go into /opt/module/kafka/config and modify server.properties, setting broker.id to 1 on flink103 and to 2 on flink104.
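The broker.id edits can also be scripted with sed instead of editing by hand. A sketch, assuming the default server.properties layout (the `set_broker_id` helper name is made up for illustration):

```shell
# Hypothetical helper: rewrite the broker.id line of a
# server.properties file in place.
set_broker_id() {
  local file=$1 id=$2
  sed -i "s/^broker\.id=.*/broker.id=${id}/" "$file"
}

# On the cluster, driven over ssh:
#   ssh flink103 "sed -i 's/^broker.id=.*/broker.id=1/' /opt/module/kafka/config/server.properties"
#   ssh flink104 "sed -i 's/^broker.id=.*/broker.id=2/' /opt/module/kafka/config/server.properties"
```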

(5) kafka cluster script

First try starting the kafka service on node 102:

cd /opt/module/kafka
bin/kafka-server-start.sh -daemon config/server.properties

Write the kafka cluster script:

cd /home/flink/bin
vim kf.sh

The script is as follows:

#! /bin/bash

case $1 in
"start"){
   for i in flink102 flink103 flink104
   do
       echo " --------starting $i Kafka-------"
       ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
   done
};;
"stop"){
   for i in flink102 flink103 flink104
   do
       echo " --------stopping $i Kafka-------"
       ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh stop"
   done
};;
esac

Grant execute permission:

chmod 777 kf.sh

Try starting the kafka cluster and check the background processes with jpsall:

kf.sh start
jpsall
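With the brokers up, a quick smoke test is to create and list a topic. The kafka-topics.sh options below are standard for Kafka 2.4.x; the `bootstrap_list` helper is just my own convenience sketch for building the --bootstrap-server value:

```shell
# Hypothetical helper: join hostnames into a host:9092,host:9092,...
# list suitable for --bootstrap-server. Assumes the default port 9092.
bootstrap_list() {
  local out="" h
  for h in "$@"; do
    out="${out:+$out,}$h:9092"
  done
  echo "$out"
}

# Smoke test on the live cluster:
#   kafka-topics.sh --create --topic first --partitions 3 --replication-factor 3 \
#     --bootstrap-server "$(bootstrap_list flink102 flink103 flink104)"
#   kafka-topics.sh --list \
#     --bootstrap-server "$(bootstrap_list flink102 flink103 flink104)"
```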

3, flume installation and configuration

(1) flume installation

For flume I chose version 1.9.0, which can be downloaded from the official website.
Put the installation package into /opt/software and unpack it:

tar -zxvf /opt/software/apache-flume-1.9.0-bin.tar.gz -C /opt/module

Rename the directory:

cd /opt/module
mv apache-flume-1.9.0-bin flume-1.9.0

After flume is installed, you need to delete a guava jar to be compatible with Hadoop 3.1.3:

rm /opt/module/flume-1.9.0/lib/guava-11.0.2.jar
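To confirm the conflicting jar is really gone (so flume picks up Hadoop's newer guava on the classpath instead), the check can be scripted. `check_guava_removed` is a hypothetical helper name, not a flume tool:

```shell
# Hypothetical helper: report whether the conflicting guava jar
# is still present in the given lib directory.
check_guava_removed() {
  if [ -e "$1/guava-11.0.2.jar" ]; then
    echo "conflict"
  else
    echo "ok"
  fi
}

# On the real node:
#   check_guava_removed /opt/module/flume-1.9.0/lib
```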

At this point, all the tools needed for the data acquisition channel have been installed.

Added by Vasko on Fri, 14 Jan 2022 05:59:48 +0200