Hadoop 08: introduction to the HDFS recycle bin and safe mode

1. Recycle bin for HDFS

Windows has a recycle bin: if you want to restore a deleted file, you can recover it from there. HDFS has a recycle bin as well.

HDFS creates a recycle bin directory for each user: /user/<username>/.Trash/. Every file or directory the user deletes on the shell command line is moved into the corresponding recycle bin directory. Data in the recycle bin has a life cycle: if the user does not restore a file or directory within the configured time window, HDFS automatically deletes it permanently, and after that the user can no longer recover it.

By default, the HDFS recycle bin is disabled. It is enabled by adding the following configuration to core-site.xml. The value is in minutes; 1440 minutes means a life cycle of one day.

<property>
    <name>fs.trash.interval</name>
    <value>1440</value>
</property>
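
Relatedly, the standard property fs.trash.checkpoint.interval controls how often the trash checkpointing thread runs (also in minutes). If it is left at 0, Hadoop falls back to the value of fs.trash.interval, so the snippet above is enough on its own; the explicit form would look like this:

<property>
    <name>fs.trash.checkpoint.interval</name>
    <value>0</value>
</property>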

Before modifying the configuration, verify the delete operation. The output shows that the file is deleted directly:

[root@bigdata01 hadoop-3.2.0]# hdfs dfs -rm -r /NOTICE.txt
Deleted /NOTICE.txt

Now modify the recycle bin configuration. Operate on bigdata01 first and then synchronize the change to the other two nodes; stop the cluster before editing:

[root@bigdata01 hadoop-3.2.0]# sbin/stop-all.sh 
[root@bigdata01 hadoop-3.2.0]# vi etc/hadoop/core-site.xml 
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop_repo</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
</configuration>
[root@bigdata01 hadoop-3.2.0]# scp -rq etc/hadoop/core-site.xml bigdata02:/data/soft/hadoop-3.2.0/etc/hadoop/
[root@bigdata01 hadoop-3.2.0]# scp -rq etc/hadoop/core-site.xml bigdata03:/data/soft/hadoop-3.2.0/etc/hadoop/

Start the cluster and delete a file again:

[root@bigdata01 hadoop-3.2.0]# sbin/start-all.sh
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -rm -r /README.txt
2020-04-09 11:43:47,664 INFO fs.TrashPolicyDefault: Moved: 'hdfs://bigdata01:9000/README.txt' to trash at: hdfs://bigdata01:9000/user/root/.Trash/Current/README.txt

This time the log shows that the deleted file has been moved to a specific directory, which is in fact the recycle bin directory of the current user.

Files in the recycle bin can also be downloaded to the local machine. In fact, the recycle bin is just an HDFS directory with a special meaning.
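
For example, using the trash path shown in the log above, README.txt can be listed, restored, or downloaded like any other HDFS path (a minimal sketch; a file's location under .Trash/Current mirrors its original path). The -mv command restores the file to its original location, while -get downloads a copy to the local working directory:

[root@bigdata01 hadoop-3.2.0]# hdfs dfs -ls /user/root/.Trash/Current
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -mv /user/root/.Trash/Current/README.txt /README.txt
[root@bigdata01 hadoop-3.2.0]# hdfs dfs -get /user/root/.Trash/Current/README.txt .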

Note: if the file being deleted is too large to be moved into the recycle bin (for example, it would exceed a quota on the trash directory), you will be prompted that the deletion failed. In that case, specify the -skipTrash parameter: with it, the deleted file bypasses the recycle bin and is removed directly.

[root@bigdata01 hadoop-3.2.0]# hdfs dfs -rm -r -skipTrash /user.txt
Deleted /user.txt
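
You also do not have to wait for the life cycle to expire: the expunge command creates a checkpoint of the current trash contents and permanently deletes any checkpoints older than fs.trash.interval:

[root@bigdata01 hadoop-3.2.0]# hdfs dfs -expunge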

2. Safe mode of HDFS

When working with HDFS, you may occasionally run into this problem, especially when uploading or deleting files right after starting the cluster: the operation fails with an error saying that the NameNode is in safe mode.

This is HDFS safe mode. Every time the cluster restarts, HDFS checks whether the file information in the cluster is complete, for example whether any block replicas are missing, so modifications to the cluster are not allowed during this period. If you run into this situation, just wait a moment; HDFS exits safe mode automatically once its self-check is complete.

[root@bigdata01 hadoop-3.2.0]# hdfs dfs -rm -r /hadoop-3.2.0.tar.gz
2020-04-09 12:00:36,646 WARN fs.TrashPolicyDefault: Can't create trash directory: hdfs://bigdata01:9000/user/root/.Trash/Current
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/root/.Trash/Current. Name node is in safe mode.

At this time, the HDFS web UI shows the safe mode status: ON indicates that the NameNode is in safe mode, and OFF indicates that safe mode has ended.

You can also check the current status with the hdfs command:

[root@bigdata01 hadoop-3.2.0]# hdfs dfsadmin -safemode get
Safe mode is ON

If you want to leave safe mode quickly, you can force it out with a command. Under normal circumstances, though, it is recommended to let HDFS exit safe mode automatically after its self-check completes:

[root@bigdata01 hadoop-3.2.0]# hdfs dfsadmin -safemode leave
Safe mode is OFF
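
For scripts, there is a safer alternative to forcing the exit: the wait option blocks until the NameNode leaves safe mode on its own and then reports the final status:

[root@bigdata01 hadoop-3.2.0]# hdfs dfsadmin -safemode wait
Safe mode is OFF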

At this point, you can operate the files in HDFS again.
