Preface
Seeing the title, you may be surprised: why is the Kunlun distributed database playing with a distributed file system?
Starting with version 0.9, Kunlun distributed database needs an HDFS-based distributed file backup feature, so I would like to share what I learned while working on it.
HDFS (Hadoop Distributed File System) is the core subproject of the Hadoop project and the foundation of data storage management in distributed computing.
It was developed to meet the need to access and process large files as streaming data, and it can run on inexpensive commodity servers.
It features high fault tolerance, high reliability, high scalability, high availability and high throughput. It provides fault-tolerant storage for massive amounts of data and makes working with large data sets much more convenient.
HDFS is open source and stores the data that Hadoop applications process. It looks similar to an ordinary Unix or Linux file system; the difference is that it implements the ideas of Google's GFS and is a scalable distributed file system suitable for large-scale distributed data processing applications.
Next, we will walk through in detail how to build an HDFS distributed file storage system on two CentOS 8 virtual machines:
1, Configure basic environment
1.1 Modify the host name (takes effect after a restart) and map both hosts in /etc/hosts
vim /etc/hosts
192.168.207.164 centos8-0
192.168.207.165 centos8-1
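If the host names themselves still need to be set, a minimal sketch (assuming the names centos8-0 and centos8-1 used throughout this article) is:
# on the first machine; run the same command with centos8-1 on the second one
hostnamectl set-hostname centos8-0
reboot   # the new name takes effect after a restart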
1.2 Stop the firewall and disable it at startup
systemctl stop firewalld.service
systemctl disable firewalld.service
1.3 Turn off SELinux and disable it at boot
setenforce 0
vim /etc/selinux/config
SELINUX=disabled
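To double-check that SELinux is really off, a quick sanity check (not strictly required) is:
getenforce   # should print Permissive now, and Disabled after the next reboot
sestatus     # shows the full SELinux status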
Repeat steps 1.1-1.3 on the other machine.
1.4 Configure passwordless login
1.4.1 Log in as the root user and generate the key:
ssh-keygen
Press Enter at every prompt to accept the defaults (the generated key files are id_rsa and id_rsa.pub under /root/.ssh).
1.4.2 Configure passwordless login to yourself:
ssh-copy-id centos8-0
(on the other machine, run ssh-copy-id centos8-1). Enter yes at the prompt and type the password (centos8-0 is the host name of your current first machine).
1.4.3 Copy the key file to the second host:
ssh-copy-id centos8-1
(on the other machine, run ssh-copy-id centos8-0). Enter yes at the prompt, then enter the root password of the other host when prompted.
1.4.4 After configuring passwordless login, test whether each machine can log in to the other without a password:
ssh centos8-1 (and on the other machine, ssh centos8-0)
2, Install the Java environment
2.1 Download the Java binary package: https://www.oracle.com/
2.2 Unpack it and move it to /usr/local
tar zxf jdk-8u131-linux-x64.tar.gz
mv jdk1.8.0_131 /usr/local
2.3 Configure environment variables
Open /etc/profile and add the following at the end of the file:
export JAVA_HOME=/usr/local/jdk1.8.0_131
export JRE_HOME=/usr/local/jdk1.8.0_131/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64:$JAVA_HOME/jre/lib/amd64/server
2.4 Make the environment variables take effect
source /etc/profile
2.5 Test whether the installation succeeded
java -version
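If the installation succeeded, the output should look roughly like this (the exact build numbers may differ):
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)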
2.6 Copy to the second machine
scp -r /usr/local/jdk1.8.0_131 root@centos8-1:/usr/local/
scp /etc/profile root@centos8-1:/etc/
2.7 On the second virtual machine centos8-1, make the variables take effect:
source /etc/profile
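To confirm the copy worked without logging in interactively, you can call the binary by its full path over SSH (a non-login shell does not read /etc/profile, so a plain java command may not be found that way):
ssh centos8-1 /usr/local/jdk1.8.0_131/bin/java -version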
3, Hadoop installation
3.1 Download address for all Hadoop releases:
https://archive.apache.org/dist/hadoop/common/
3.2 It is recommended to put Hadoop under /home, because the root partition of CentOS 8 is too small
mkdir -p /home/hadoop/
tar zxf hadoop-3.3.1.tar.gz -C /home/hadoop/
mkdir -p /home/hadoop/tmp
mkdir -p /home/hadoop/dfs/data
mkdir -p /home/hadoop/dfs/name
3.3 Open /etc/profile and add the following at the end of the file:
export HADOOP_HOME=/home/hadoop/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
3.4 Make the environment variables take effect
source /etc/profile
3.5 Configure hadoop-env.sh
cd /home/hadoop/hadoop-3.3.1/etc/hadoop
vim hadoop-env.sh
Then add:
export JAVA_HOME=/usr/local/jdk1.8.0_131
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HADOOP_SHELL_EXECNAME=root
3.6 Configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://centos8-0:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
3.7 Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>centos8-1:9000</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
3.8 Configure the workers file (in the same etc/hadoop directory) and add the following:
centos8-0
centos8-1
3.9 Copy to the other machine
scp -r /home/hadoop root@centos8-1:/home/
scp /etc/profile root@centos8-1:/etc/
On the second virtual machine centos8-1, make the variables take effect:
source /etc/profile
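As a quick sanity check before formatting, you can ask Hadoop which configuration values it actually picked up on each machine (hdfs getconf is a standard Hadoop utility):
hdfs getconf -confKey fs.defaultFS     # should print hdfs://centos8-0:9000
hdfs getconf -confKey dfs.replication  # should print 2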
3.10 Initialize (format) the NameNode; the second machine should also be initialized
hdfs namenode -format
3.11 Start the services on the first machine:
start-dfs.sh
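To verify that the daemons actually started, jps (shipped with the JDK) should show something like the following, given the layout described above:
# on centos8-0: NameNode, DataNode (plus Jps itself)
# on centos8-1: SecondaryNameNode, DataNode (plus Jps itself)
jps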
3.12 Access test:
The default web UI address is http://192.168.207.164:9870/
You can browse files via Utilities -> Browse the file system. Uploading a file from the web page on Windows will fail unless you add the following to C:\Windows\System32\drivers\etc\hosts:
192.168.207.164 centos8-0
192.168.207.165 centos8-1
3.13 Command line test
hadoop fs -ls /
hadoop fs -mkdir /mytest
hadoop fs -copyFromLocal test.txt /test.txt
hadoop fs -appendToFile test.txt /test.txt
hadoop fs -cat /test.txt
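You can also confirm that both DataNodes joined the cluster and that blocks are being replicated with the standard report command:
hdfs dfsadmin -report   # should list 2 live datanodes (centos8-0 and centos8-1)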
At this point, the Hadoop HDFS 3.3.1 distributed storage cluster has been built.
The KunlunDB project is open source:
GitHub: https://github.com/zettadb
Gitee: https://gitee.com/zettadb
END