Construction of offline data warehouse - Installation of data acquisition tools
1, zookeeper installation and configuration
(1) zookeeper-3.5.9 installation
First download the zookeeper-3.5.9 installation package, then copy it into the installation-package directory on flink102
cd /opt/software
Once it is there, unpack the installation package
tar -zxvf apache-zookeeper-3.5.9-bin.tar.gz -C /opt/module
# Then check in /opt/module
cd /opt/module
ll    # or ls
# Rename the directory for later use
mv apache-zookeeper-3.5.9-bin zookeeper-3.5.9
(2) Modify zookeeper configuration file
Go to the zookeeper folder to view the file directory
cd /opt/module/zookeeper-3.5.9
ll
Create a zkData directory to store ZooKeeper's snapshots and transaction logs
mkdir zkData
Modify the configuration file of zookeeper
Rename the file first
mv conf/zoo_sample.cfg conf/zoo.cfg
# Then modify it
vim conf/zoo.cfg
The modified configuration is as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/module/zookeeper-3.5.9/zkData
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
#########cluster#########
server.2=flink102:2888:3888
server.3=flink103:2888:3888
server.4=flink104:2888:3888
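In server.N=host:port1:port2, N is the id that must appear in that host's myid file (created below), port 2888 is used by followers to communicate with the leader, and port 3888 is used for leader election. A quick way to confirm the cluster entries made it into the file:
grep '^server\.' /opt/module/zookeeper-3.5.9/conf/zoo.cfg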
(3) Add zookeeper environment variable
In the previous article we already created an environment variable file; open it again and append to it
sudo vim /etc/profile.d/my_env.sh
Add ZOO_HOME:
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

# ZOO_HOME
export ZOO_HOME=/opt/module/zookeeper-3.5.9
export PATH=$PATH:$ZOO_HOME/bin
source /etc/profile
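As an optional check that the new variable took effect in the current shell:
echo $ZOO_HOME        # should print /opt/module/zookeeper-3.5.9
which zkServer.sh     # should resolve to a path under $ZOO_HOME/bin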
(4) zookeeper startup
Start zookeeper service
cd /opt/module/zookeeper-3.5.9
bin/zkServer.sh start
Start zookeeper client
bin/zkCli.sh
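If the server came up correctly, jps should show a QuorumPeerMain process, and zkServer.sh status will report the node's mode (standalone for now, leader/follower once the cluster is configured):
jps
bin/zkServer.sh status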
(5) zookeeper cluster configuration
The stand-alone setup is now complete. Next, distribute the files to nodes flink103 and flink104
cd /opt/module
xsync zookeeper-3.5.9
sudo xsync /etc/profile.d/my_env.sh
Remember to run source /etc/profile on nodes 103 and 104 afterwards
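Note that xsync is a custom distribution script assumed from the earlier Hadoop setup article, not a standard command. In case you don't have it yet, a minimal sketch (assuming passwordless ssh and rsync installed on every node, and the hostnames used in this series) might look like this:
#!/bin/bash
# Minimal xsync sketch: rsync each given path to the other cluster nodes,
# preserving its absolute location.
if [ $# -lt 1 ]; then
    echo "Usage: xsync <file-or-dir> ..."
    exit 1
fi
for host in flink103 flink104; do
    echo "==================$host=================="
    for path in "$@"; do
        if [ -e "$path" ]; then
            pdir=$(cd -P "$(dirname "$path")"; pwd)   # absolute parent directory
            fname=$(basename "$path")
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$path does not exist"
        fi
    done
done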
Create a myid file in the zkData folder and edit it
cd /opt/module/zookeeper-3.5.9
vim zkData/myid
# Write the number 2 (on flink102 it must match the server.2 entry in zoo.cfg)
Distribute it to nodes 103 and 104
xsync zkData/myid
Then change the value on 103 and 104 to 3 and 4 respectively, so that each node's myid matches its server.N entry in zoo.cfg
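Since each myid has to match the corresponding server.N entry, you can also set and verify all three from flink102 in one go (a sketch, assuming passwordless ssh):
ssh flink102 "echo 2 > /opt/module/zookeeper-3.5.9/zkData/myid"
ssh flink103 "echo 3 > /opt/module/zookeeper-3.5.9/zkData/myid"
ssh flink104 "echo 4 > /opt/module/zookeeper-3.5.9/zkData/myid"
# Verify
for i in flink102 flink103 flink104; do ssh $i "cat /opt/module/zookeeper-3.5.9/zkData/myid"; done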
(6) zookeeper cluster scripting
cd to our script directory
cd /home/flink/bin
vim zk.sh
The script is as follows:
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No Args Input"
    exit
fi
case $1 in
"start")
    for i in flink102 flink103 flink104
    do
        echo "==================$i=================="
        ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh start
    done
    for i in flink102 flink103 flink104
    do
        echo "==================$i=================="
        ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh status
    done
;;
"stop")
    for i in flink102 flink103 flink104
    do
        echo "==================$i=================="
        ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh stop
    done
;;
"status")
    for i in flink102 flink103 flink104
    do
        echo "==================$i=================="
        ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh status
    done
;;
*)
    echo "Args Error"
;;
esac
Grant execute permission
chmod 777 zk.sh
Now you can try starting the zookeeper cluster
zk.sh start
# Check whether the cluster processes are running on every node
jpsall
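jpsall, like xsync, is a helper script from the earlier article that simply runs jps on every node; if you don't have it yet, a minimal sketch:
#!/bin/bash
# Minimal jpsall sketch: print the Java processes on every node.
# /etc/profile is sourced so that jps is on PATH in the non-interactive ssh session.
for host in flink102 flink103 flink104; do
    echo "==================$host=================="
    ssh "$host" "source /etc/profile; jps"
done
After zk.sh start, each node should show a QuorumPeerMain process.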
2, kafka installation and configuration
(1) kafka installation
Here I choose kafka_2.11-2.4.1.tgz, which can be downloaded from the official website
Just like zookeeper, put the installation package into the installation package directory
cd /opt/software
# Unzip kafka
tar -zxvf kafka_2.11-2.4.1.tgz -C /opt/module
# Go to the module directory and rename the directory
cd /opt/module
mv kafka_2.11-2.4.1 kafka
(2) Modify kafka configuration file
Enter the kafka folder to view the file structure
cd kafka
ll
drwxr-xr-x.  3 flink flink  4096 Mar  3  2020 bin
drwxr-xr-x.  2 flink flink  4096 Dec 11 14:44 config
drwxrwxr-x. 20 flink flink  4096 Dec 11 16:33 datas
drwxr-xr-x.  2 flink flink  4096 Dec 11 10:12 libs
-rw-r--r--.  1 flink flink 32216 Mar  3  2020 LICENSE
drwxrwxr-x.  2 flink flink  4096 Dec 11 16:00 logs
-rw-r--r--.  1 flink flink   337 Mar  3  2020 NOTICE
drwxr-xr-x.  2 flink flink  4096 Mar  3  2020 site-docs
Enter the config directory to modify the configuration file
cd config
vim server.properties
The revised content is:
############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/opt/module/kafka/datas

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=flink102:2181,flink103:2181,flink104:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
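Note that zookeeper.connect ends with /kafka, so all of Kafka's metadata is kept under a /kafka chroot instead of cluttering the ZooKeeper root. Once the brokers are running (they are started in step (5) below), you can confirm this from the ZooKeeper client, for example:
/opt/module/zookeeper-3.5.9/bin/zkCli.sh -server flink102:2181
# inside the client:
ls /kafka
ls /kafka/brokers/ids    # should list the registered broker ids, e.g. [0, 1, 2]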
(3) Configuring kafka environment variables
sudo vim /etc/profile.d/my_env.sh
Add the following:
# KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin
Remember to source /etc/profile afterwards
(4) Distribute to nodes 103 and 104
xsync /opt/module/kafka
sudo xsync /etc/profile.d/my_env.sh
Remember to run source /etc/profile on every node, then go to /opt/module/kafka/config on nodes 103 and 104 and edit server.properties, setting broker.id to 1 and 2 respectively
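If you prefer to make that edit from flink102 instead of opening vim on each node, a sed one-liner per host works too (a sketch, assuming the distributed file still has broker.id=0):
ssh flink103 "sed -i 's/^broker\.id=0/broker.id=1/' /opt/module/kafka/config/server.properties"
ssh flink104 "sed -i 's/^broker\.id=0/broker.id=2/' /opt/module/kafka/config/server.properties"
# Verify
for i in flink102 flink103 flink104; do ssh $i "grep '^broker.id' /opt/module/kafka/config/server.properties"; done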
(5) kafka cluster script
Try to start the kafka service on node 102 first
cd /opt/module/kafka
bin/kafka-server-start.sh -daemon config/server.properties
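If it started cleanly, jps on flink102 should now show a Kafka process; you can stop it again before moving on to the cluster script:
jps
bin/kafka-server-stop.sh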
Now write the kafka cluster script
cd /home/flink/bin
vim kf.sh
#!/bin/bash
case $1 in
"start"){
    for i in flink102 flink103 flink104
    do
        echo " --------start $i Kafka-------"
        ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
    done
};;
"stop"){
    for i in flink102 flink103 flink104
    do
        echo " --------stop $i Kafka-------"
        ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh stop"
    done
};;
esac
chmod 777 kf.sh
Try starting the kafka cluster and check the background processes with jpsall
kf.sh start
jpsall
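If every node now shows a Kafka process, a quick smoke test is to create and list a topic (a sketch using a hypothetical topic name "first"):
# Create a test topic with 3 partitions and 2 replicas
kafka-topics.sh --bootstrap-server flink102:9092 --create --topic first --partitions 3 --replication-factor 2
# List topics to confirm it exists
kafka-topics.sh --bootstrap-server flink102:9092 --list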
3, flume installation and configuration
(1) flume installation
For flume I chose version 1.9.0, which can be downloaded from the official website
Put the installation package in /opt/software
tar -zxvf /opt/software/apache-flume-1.9.0-bin.tar.gz -C /opt/module
Rename the directory
cd /opt/module
mv apache-flume-1.9.0-bin flume-1.9.0
After flume is installed, you need to delete a jar package to make it compatible with Hadoop 3.1.3
rm /opt/module/flume-1.9.0/lib/guava-11.0.2.jar
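The guava-11.0.2.jar bundled with Flume 1.9.0 conflicts with the newer guava that Hadoop 3.1.3 ships, which is why it is removed. To confirm Flume itself still runs:
cd /opt/module/flume-1.9.0
bin/flume-ng version    # should report Flume 1.9.0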