Installing Hadoop and configuring a pseudo-distributed environment on Ubuntu 16.04

(class assignment)

1, Hadoop

Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the details of the underlying distributed layer, making full use of a cluster's power for high-speed computing and storage. Hadoop implements a distributed file system, HDFS (Hadoop Distributed File System). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware; it also provides high-throughput access to application data, which makes it suitable for applications with large data sets. HDFS relaxes some POSIX requirements and allows data in the file system to be accessed as a stream. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive data, while MapReduce provides computation over it.

2, Preparation before installation

1. JDK installation package (download the .tar.gz package for Linux)

(1) Website download: https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html
(2) Network disk self access: https://pan.baidu.com/s/1OfLQ8VtFJN648k-P7z3FpQ
Extraction code: yqwm
(I prepared jdk-8u301-linux-x64 here)

2. Hadoop installation package

(1) Website download: https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
(2) Network disk self access: https://pan.baidu.com/s/1vHuDktIdtBYDvZu37J575Q
Extraction code: a2yb

3. Xshell 7 and Xftp 7

4. Connect using xshell

tips:

ifconfig     #Display or configure network devices

If an SSH connection is not possible, the SSH server is probably not installed:

sudo apt-get install openssh-server    #Install SSH server

3, Install JDK

1. Update Ubuntu source

sudo apt-get update

2. Upload JDK installation package

Upload the JDK installation package you just downloaded using Xftp (part of Xshell).

Generally upload it to /home/ubuntu (here ubuntu is my account name).
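If you prefer the command line to Xftp, the package can also be copied over with scp from your local machine (the IP address below is just an example; use your own account name and server address):

scp jdk-8u301-linux-x64.tar.gz ubuntu@192.168.1.100:/home/ubuntu/   #Run on your local machine; replace the IP and account name with your own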

3. Unzip the JDK installation package to the /usr/local/ directory

sudo tar -zxvf jdk-8u301-linux-x64.tar.gz -C /usr/local/

4. Rename the extracted folder to jdk8

cd /usr/local/
sudo mv jdk1.8.0_301/ jdk8

5. Add to environment variable

cd /home/<account name>/
sudo gedit .bashrc

Add the following at the end of the file:

export JAVA_HOME=/usr/local/jdk8
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=.:$JAVA_HOME/bin:$PATH

Note: the added content must be exactly right; remember to save.

Make the changes take effect:

source .bashrc
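Optionally, check that the variable is now visible in the shell (a quick sanity check):

echo $JAVA_HOME   #Should print /usr/local/jdk8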

6. Verify that the installation is successful

java -version

If the Java version information appears, the JDK has been installed and added to the environment variables successfully.

4, Create hadoop user

sudo useradd -m hadoop -s /bin/bash  #Create a hadoop user and use /bin/bash as the shell
sudo passwd hadoop                   #Set the password for hadoop users and enter the password twice in a row
sudo adduser hadoop sudo             #Add administrator privileges for hadoop users
su hadoop                            #Switch the current user to hadoop
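Optionally, confirm that the new account exists and has been added to the sudo group (a quick check, not part of the original steps):

id hadoop   #The sudo group should appear in the group list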

5, Install Hadoop

1. Install SSH

sudo apt-get install ssh

2. Configure passwordless login to avoid authentication problems when using Hadoop

ssh-keygen -t rsa        #Press Enter at every prompt after running this command
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
ssh localhost            #You should now connect without entering a password
exit                     #Exit remote connection status
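If ssh localhost still asks for a password, overly loose permissions on the key files are a common cause; the following is a general troubleshooting step (not from the original):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys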

3. Upload hadoop installation package

Upload the Hadoop installation package you just downloaded using Xftp (part of Xshell).

Generally upload it to /home/<account name> (in my case, /home/ubuntu).

4. Unzip the hadoop installation package to the /usr/local directory, rename the folder to hadoop, and finally set the permissions

cd /home/ubuntu                 #Here ubuntu is my account name
sudo tar -zxvf hadoop-3.3.1.tar.gz -C /usr/local/
cd /usr/local
sudo mv hadoop-3.3.1/ hadoop
sudo chown -R ubuntu hadoop/   #Here ubuntu is my account name

5. Verify that the installation is successful

cd /usr/local/hadoop/bin
./hadoop version

If the Hadoop version information appears, the installation succeeded.

6. Set the JAVA_HOME environment variable

sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Find the original line (tip: gedit shows the line number in the lower right corner; the line is around line 54):

export JAVA_HOME=${JAVA_HOME}

Change to

export JAVA_HOME=/usr/local/jdk8

Also remove the leading comment marker (#) if the line is commented out, and remember to save.
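If you would rather not edit the file by hand, a sed one-liner can make the same change; this is just a sketch that assumes the JAVA_HOME line may be commented out, so check the file afterwards:

sudo sed -i 's|^#\? *export JAVA_HOME=.*|export JAVA_HOME=/usr/local/jdk8|' /usr/local/hadoop/etc/hadoop/hadoop-env.sh
grep "^export JAVA_HOME" /usr/local/hadoop/etc/hadoop/hadoop-env.sh   #Confirm the change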

7. Set Hadoop environment variables

sudo gedit /home/ubuntu/.bashrc   #Here ubuntu is my account name

Append at the end

export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=.:${JAVA_HOME}/lib:${HADOOP_HOME}/sbin:$PATH
export PATH=.:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH

Make the changes take effect:

source /home/ubuntu/.bashrc   #Here ubuntu is my account
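If the PATH change took effect, hadoop can now be run from any directory (an optional check):

hadoop version   #Should print the version without changing into /usr/local/hadoop/bin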

6, Configure pseudo distributed environment

Modify two configuration files (core-site.xml and hdfs-site.xml)

1. Modify core-site.xml

sudo gedit  /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following

<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
</property>

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>

</configuration>

Remember to save

2. Modify hdfs-site.xml

sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following

<configuration>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>

</configuration>

Remember to save
YARN is not configured here for now; configure it yourself if you need it (a minimal sketch follows below).
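For reference, a minimal YARN setup usually involves two more files in the same directory, mapred-site.xml and yarn-site.xml. The snippet below is a commonly used sketch and not part of this assignment; verify it against the Hadoop 3.3.1 documentation before relying on it.

mapred-site.xml:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-site.xml:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>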

3. Perform NameNode formatting

cd /usr/local/hadoop/bin
./hdfs namenode -format   

A message such as "Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted" indicates that formatting succeeded.

4. Start all Hadoop components

cd /usr/local/hadoop/sbin
./start-all.sh

Warnings may appear during startup; they can be ignored and do not affect normal use.

5. After a successful startup, you can open the web page to view NameNode and DataNode information, and also browse the files in HDFS online

http://<your IP>:9870/
or, in the Firefox browser on the machine itself:
http://localhost:9870/
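If no browser is available, a quick check from the shell also works (assuming curl is installed; 9870 is the default NameNode web port in Hadoop 3.x):

curl -s http://localhost:9870/ | head   #Should print HTML from the NameNode web UI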

6. View Hadoop related component processes

jps

You should see processes such as NameNode, DataNode and SecondaryNameNode (plus ResourceManager and NodeManager if YARN started), in addition to Jps itself.
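As an optional smoke test, you can create a directory in HDFS and list it (the path below is only an example):

hdfs dfs -mkdir -p /user/hadoop/test   #Create a directory in HDFS
hdfs dfs -ls /user/hadoop              #The new directory should be listed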

7. Close all Hadoop components

cd /usr/local/hadoop/sbin
./stop-all.sh
