Hadoop HDFS 3.3.1 distributed storage setup

Preface

Seeing the title, you may wonder why the Kunlun distributed database team is dealing with a distributed file system.

Since version 0.9, the Kunlun distributed database has required an HDFS-based distributed file backup feature, so in this post I will share what I learned while building it at work.

HDFS (Hadoop Distributed File System) is the core subproject of the Hadoop project and the foundation of data storage management in distributed computing.

It was developed for accessing and processing large files in a streaming fashion, and it can run on cheap commodity servers.

It offers high fault tolerance, high reliability, high scalability, high availability and high throughput. It provides fault-tolerant storage for massive amounts of data and greatly simplifies applications that work with very large data sets.

HDFS is open source and stores the data to be processed by Hadoop applications. It looks similar to an ordinary Unix/Linux file system; the difference is that it implements the ideas of Google's GFS and is a scalable distributed file system suited to large-scale distributed data processing.

Next, we will walk through in detail how to build an HDFS distributed file storage system on two CentOS 8 virtual machines:

1. Configure the basic environment

1.1 Set the host name on each machine (reboot for it to fully take effect), then map both hosts in /etc/hosts

hostnamectl set-hostname centos8-0    # on the second machine: centos8-1

vim /etc/hosts
192.168.207.164         centos8-0
192.168.207.165         centos8-1

1.2 Stop the firewall and disable it at boot

systemctl stop firewalld.service
systemctl disable firewalld.service
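
To verify, the firewall daemon should now report that it is not running:

firewall-cmd --state    # expect "not running"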

1.3 Turn off SELinux and disable it at boot

setenforce 0
vim /etc/selinux/config
SELINUX=disabled
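
To verify the current session (setenforce 0 only switches SELinux to permissive mode; it reads Disabled after a reboot with the config change above):

getenforce    # expect "Permissive" now, "Disabled" after reboot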

Repeat steps 1.1-1.3 on the other machine.

1.4 Configure passwordless SSH login

1.4.1 Log in as root and generate the key pair:

ssh-keygen
Press Enter at every prompt to accept the defaults (the generated key files are id_rsa and id_rsa.pub under /root/.ssh).

1.4.2 Configure passwordless login to the machine itself:

ssh-copy-id centos8-0
(on the other machine: ssh-copy-id centos8-1). Enter yes at the prompt and type the root password (centos8-0 is the host name of your first machine).

1.4.3 Copy the key to the second host:

ssh-copy-id centos8-1
(on the other machine: ssh-copy-id centos8-0). Enter yes at the prompt, then type the root password of the other host.

1.4.4 After configuring passwordless login, test in both directions that you can log in without a password:

ssh centos8-0 (ssh centos8-1)
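
As an optional scripted check (a minimal sketch; BatchMode makes ssh fail instead of prompting, so any leftover password prompt shows up as an error):

for h in centos8-0 centos8-1; do
    ssh -o BatchMode=yes "$h" hostname    # should print each host name without asking for a password
done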

2. Install the Java environment

2.1 Download the Java binary package from https://www.oracle.com/

2.2 Extract it and move it to /usr/local

tar zxf jdk-8u131-linux-x64.tar.gz
mv jdk1.8.0_131 /usr/local

2.3 configuring environment variables

Open /etc/profile and add the following at the end of the file:

export JAVA_HOME=/usr/local/jdk1.8.0_131
export JRE_HOME=/usr/local/jdk1.8.0_131/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64:$JAVA_HOME/jre/lib/amd64/server

2.4 Make the environment variables take effect
source /etc/profile

2.5 Test whether the installation succeeded
java -version
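
If the JDK is on the PATH, you should see output similar to the following (exact build numbers may differ):

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)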
2.6 Copy to the second machine

scp -r /usr/local/jdk1.8.0_131 root@centos8-1:/usr/local/
scp /etc/profile root@centos8-1:/etc/

2.7 On the second virtual machine (centos8-1), reload the profile:
source /etc/profile

3. Hadoop installation

3.1 All Hadoop releases can be downloaded from:

https://archive.apache.org/dist/hadoop/common/

3.2 It is recommended to put Hadoop under /home; the root partition of a default CentOS 8 install is too small

mkdir /home/hadoop/
tar zxf hadoop-3.3.1.tar.gz -C /home/hadoop/    # extract into /home/hadoop so HADOOP_HOME below matches
mkdir -p /home/hadoop/tmp
mkdir -p /home/hadoop/dfs/data
mkdir -p /home/hadoop/dfs/name

3.3 Open /etc/profile and add the following at the end of the file:

export HADOOP_HOME=/home/hadoop/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

3.4 Make the environment variables take effect
source /etc/profile

3.5 Configure hadoop-env.sh

cd /home/hadoop/hadoop-3.3.1/etc/hadoop
vim hadoop-env.sh

Then add the following (Hadoop 3.x refuses to start HDFS daemons as root unless these *_USER variables are set):

export JAVA_HOME=/usr/local/jdk1.8.0_131
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HADOOP_SHELL_EXECNAME=root

3.6 Configure core-site.xml (in the same directory):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://centos8-0:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>
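
fs.defaultFS is the NameNode address that DataNodes and clients connect to. Once the cluster is configured, you can confirm the value Hadoop actually reads:

hdfs getconf -confKey fs.defaultFS    # should print hdfs://centos8-0:9000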

3.7 Configure hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>centos8-1:9000</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>
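
With dfs.replication set to 2, every block is stored on both DataNodes. Once the cluster is running (section 3.11), you can confirm that both DataNodes registered and see per-node capacity with the standard admin report:

hdfs dfsadmin -report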

3.8 Configure the workers file (it lists the hosts that run DataNodes) and add the following:
centos8-0
centos8-1

3.9 Copy everything to the other machine

scp -r /home/hadoop root@centos8-1:/home/
scp /etc/profile root@centos8-1:/etc/

On the second virtual machine (centos8-1), reload the profile:
source /etc/profile

3.10 Initialize (format) the NameNode metadata; run this on the second machine as well:
hdfs namenode -format

3.11 Start the services from the first machine:
start-dfs.sh
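
To confirm the daemons came up, jps (shipped with the JDK) lists the running Java processes; with the configuration above you would expect NameNode and DataNode on centos8-0, and SecondaryNameNode and DataNode on centos8-1:

jps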

3.12 access test:

The default web UI address is: http://192.168.207.164:9870/

Browse the files via Utilities -> Browse the file system. Uploading a file from the web page on a Windows client will fail until you add the host name mappings to the Windows hosts file:

C:\Windows\System32\drivers\etc\hosts
192.168.207.164         centos8-0
192.168.207.165         centos8-1
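
You can also probe from the shell of any machine that resolves the host names to confirm the NameNode web UI is answering (a simple check; an HTTP 200 response means the UI is up):

curl -sI http://centos8-0:9870/ | head -n 1    # expect HTTP/1.1 200 OK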

3.13 Command-line test

hadoop fs -ls /
hadoop fs -mkdir /mytest
hadoop fs -copyFromLocal test.txt /test.txt
hadoop fs -appendToFile test.txt /test.txt
hadoop fs -cat /test.txt
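
As a final sanity check (assuming the local test.txt used above), pull the file back out of HDFS and compare sizes; after the append, the HDFS copy should be exactly twice the size of the local file:

hadoop fs -get /test.txt /tmp/test_copy.txt
ls -l test.txt /tmp/test_copy.txt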

With that, the Hadoop HDFS 3.3.1 distributed storage cluster is up and running.

The KunlunDB project is open source:

GitHub: https://github.com/zettadb

Gitee: https://gitee.com/zettadb

END
