Tencent Cloud Ubuntu 16.04.1 LTS 64-bit
Linux operation
Modify the password of root
sudo passwd root
Log out the current user
logout
Disable the firewall
ufw disable
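An optional quick check that the firewall is now inactive:
ufw status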
Uninstall iptables components
apt-get remove iptables
Install vim (for text editing)
apt-get install vim
Change the console font and character set
sudo dpkg-reconfigure console-setup
linux remote connection
- First: enable the ssh service on Linux
- Second: connect with an ssh client tool
Install the ssh server for the system
apt-get install openssh-server
Start ssh service
/etc/init.d/ssh start
Check the process to see if the specified service has been started
ps -e | grep sshd
ssh can only be used once the sshd process is running
ubuntu does not allow the root user to log in over ssh by default
Open the /etc/ssh/sshd_config file using vim
vim /etc/ssh/sshd_config
Then change the PermitRootLogin setting to yes
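A minimal sketch of the resulting line, followed by a restart of the ssh service so the change takes effect (the same service script used above):
PermitRootLogin yes
/etc/init.d/ssh restart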
Configuring ftp services
Install ftp components
apt-get install vsftpd
Set a password for the ftp user
passwd ftp
After the FTP service is installed, a directory is automatically created: /srv/ftp
cd /srv/ftp
Set this directory to full permissions
chmod 777 /srv/ftp
To make ftp work properly, you need to modify the configuration file "/etc/vsftpd.conf"
vim /etc/vsftpd.conf
Set the following options:
Disallow anonymous login (a valid username and password are required)
anonymous_enable=NO
Give users write permission
write_enable=YES
Allow local users to log in
local_enable=YES
Restrict all users to their home directory (remove the leading # to uncomment)
chroot_local_user=YES
Enable the restricted user list
chroot_list_enable=YES
Specify the list file (multiple accounts can be listed in it)
chroot_list_file=/etc/vsftpd.chroot_list
Set the PAM service name
pam_service_name=vsftpd
Edit the list file
vim /etc/vsftpd.chroot_list
Add the ftp user (one username per line), then save and exit
Modify /etc/pam.d/vsftpd
vim /etc/pam.d/vsftpd
Comment out the following
auth required pam_shells.so
Start the ftp service
service vsftpd start
service vsftpd restart # restart
Check to see if ftp service has been started
ps -e | grep vsftpd
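As a quick sanity check (a sketch, assuming a command-line ftp client on another machine; <server-ip> is a placeholder for this server's address), try logging in as the ftp user configured above:
ftp <server-ip>    # log in as user "ftp" with the password set earlier; "bye" quits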
hadoop installation and configuration
jdk installation and configuration
Download the jdk with wget, or upload it via ftp
wget http://download.oracle.com/otn-pub/java/jdk/8u191-b12-demos/2787e4a523244c269598db4e85c51e0c/jdk-8u191-linux-x64-demos.tar.gz
Unzip and save it in the /usr/local directory
tar xzvf jdk-8u191-linux-x64-demos.tar.gz -C /usr/local
Rename the extracted directory (inside /usr/local)
mv jdk1.8.0_191/ jdk
Edit the environment file for configuration
vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# By default, changes to the environment variables are read only after logging in again or restarting the system, but source can make the configuration take effect immediately:
source /etc/profile
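An optional quick check that the JDK is now on the PATH (both commands should report version 1.8.0_191):
java -version
javac -version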
Installing hadoop in linux
tar xzvf hadoop-2.8.5.tar.gz -C /usr/local
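The environment variables below assume Hadoop lives in /usr/local/hadoop, so rename the extracted directory accordingly (hadoop-2.8.5 is the directory name the 2.8.5 tarball typically extracts to):
mv /usr/local/hadoop-2.8.5 /usr/local/hadoop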
# Configure the corresponding environment variables in /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Let configuration take effect
source /etc/profile
hadoop relies on JDK support, so the JDK path to use must be defined in hadoop's environment file
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Set the JDK to be used by Hadoop
export JAVA_HOME=/usr/local/jdk
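An optional quick check that the hadoop command is available:
hadoop version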
# To test whether hadoop is installed and usable, you can use the example programs bundled with hadoop
For a word count, first prepare a file containing words
# Create an input directory under the Hadoop directory:
root@VM-0-3-ubuntu:/usr/local/hadoop# mkdir input
# Write a document
root@VM-0-3-ubuntu:/usr/local/hadoop# echo hello 6jj hello nihaoa > input/info.txt
Use "" to split each word
Distributed hadoop configuration
Configure ssh
The server's IP address must not change; otherwise this configuration has to be redone
For configuration convenience, set the host name for each computer
vim /etc/hostname
Change the localhost inside to "hadoopm"
You also need to modify the host mapping: edit the /etc/hosts file and add a mapping from the IP address to the hadoopm host name.
vim /etc/hosts
172.16.0.3 hadoopm
For the change to take effect, it is recommended to enter reboot and restart Linux
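After the reboot, an optional quick check that the host name resolves to the mapped address:
ping -c 3 hadoopm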
Throughout Hadoop's processing, ssh is used for communication, so even on a single machine ssh is recommended; the computer must therefore be configured for passwordless (key-based) ssh login.
Since the computer may already have an ssh configuration, it is recommended to delete the ".ssh" folder in the root user's home directory first.
cd ~
rm -rf ~/.ssh
Generate ssh Key on the host of Hadoop:
ssh-keygen -t rsa
At this point, logging in still requires a password; the public key must be appended to the authorized_keys authentication file.
cd ~
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
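Depending on the sshd settings (StrictModes), restrictive permissions on the key files may also be required; a commonly used safeguard:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys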
From then on, passwordless login is possible.
Log in with:
ssh root@hadoopm
The login is now a remote connection; use exit to leave the current connection.
Related configuration of hadoop
All configuration files are in the "/usr/local/hadoop/etc/hadoop/" directory
Configuration: "core-site.xml"
Defines Hadoop's core information, including the temporary directory and the access address
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/root/hadoop_tmp</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoopm:9000</value>
</property>
- The "hdfs://hadoopm:9000" information configured in this article describes the path of the page manager to be opened later. The default port of Hadoop version 2.X is 9000.
- The most important entry in this configuration is "/home/root/hadoop_tmp". If this temporary-file path is not configured, a "tmp" directory is generated inside the Hadoop folder ("/usr/local/hadoop/tmp"), and that directory is cleared on restart, which means the Hadoop environment becomes invalid. To make sure nothing goes wrong, create the "/home/root/hadoop_tmp" directory directly: mkdir -p /home/root/hadoop_tmp
Configuration: "yarn-site.xml"
It can be simply understood as configuring how the related jobs are processed
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoopm:8033</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoopm:8025</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoopm:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoopm:8050</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoopm:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>hadoopm:8090</value>
</property>
Configuration: "hdfs-site.xml"
Determines the number of file replicas and the paths of the data directories
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/dfs/data</value>
</property>
<property>
    <name>dfs.namenode.http-address</name>
    <value>hadoopm:50070</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoopm:50090</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
- "replication": the number of copies of a file, usually three copies of a file backup
- "dfs.namenode.name.dir": Define the name node path
- "dfs.datanode.data.dir": Define the data file node path
- "dfs.namenode.http-address": HTTP path access for name service
- "dfs.namenode.secondary.http-address": second name node
- "dfs.permissions": access to permissions, to avoid inaccessibility, set false
Because Hadoop is a distributed development environment, a cluster will need to be built later.
It is recommended to create a masters file in the /usr/local/hadoop/etc/hadoop/ directory containing the host name, here hadoopm (the host name defined in the hosts file earlier). In a stand-alone environment this file can also be left out.
vim masters
hadoopm
Modify slaves and add hadoopm
vim slaves
Since the NameNode and DataNode paths configured above are inside the hadoop directory, create them in advance inside the hadoop directory to be safe.
mkdir dfs dfs/name dfs/data
If Hadoop has a problem and needs to be reconfigured, remove the two folders
Format file system
hdfs namenode -format
If formatting succeeds, the message "INFO util.ExitUtil: Exiting with status 0" appears
If an error occurs, the message "INFO util.ExitUtil: Exiting with status 1" appears.
Hadoop can then be started with a simple command:
start-all.sh
Then use the jps command provided by the JDK to check the Java processes; if six processes are listed, the configuration was successful.
jps
Then you can test whether HDFS is working properly.
Close the services when needed with the stop-all.sh command.
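A minimal sketch of such a test (assuming the services are running): create a directory in HDFS, list the root, and open the NameNode web interface configured above at http://hadoopm:50070.
hdfs dfs -mkdir -p /test
hdfs dfs -ls /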