"Everything is a file" is the basic design philosophy of UNIX. Files are organized into a directory tree according to their hierarchical relationships, which constitutes the basic form of the file system. When using the file system to save data, users do not need to care about how the data is stored underneath, and can access it accord ...
Added by waseembari1985 on Thu, 03 Mar 2022 12:19:16 +0200
1. Recycle bin for HDFS
Windows has a recycle bin from which deleted files can be restored. HDFS also has a recycle bin.
HDFS creates a recycle bin directory for each user: /user/<username>/.Trash/. Every file or directory that the user deletes on the Shell command line will enter the correspond ...
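As a minimal sketch of the per-user layout described above, the following Python helper builds the location where a deleted path would be expected to land. It assumes the usual HDFS arrangement in which deleted items are placed under a Current subdirectory of the user's trash with their original absolute path preserved. The function name trash_path is hypothetical, and nothing here talks to a real cluster.

```python
import posixpath


def trash_path(user: str, deleted_path: str) -> str:
    """Return where an HDFS path is expected to land in the user's trash.

    Assumption: the trash root is /user/<user>/.Trash and deleted items keep
    their original absolute path under the Current subdirectory.
    """
    if not deleted_path.startswith("/"):
        raise ValueError("deleted_path must be an absolute HDFS path")
    trash_root = posixpath.join("/user", user, ".Trash", "Current")
    # Strip the leading slash so join() does not discard the trash root.
    return posixpath.join(trash_root, deleted_path.lstrip("/"))


if __name__ == "__main__":
    print(trash_path("alice", "/data/logs/app.log"))
```

Restoring a file then amounts to moving it from that location back to its original path.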
Added by phillips321 on Wed, 02 Mar 2022 02:13:11 +0200
HDFS Architecture Overview
Hadoop Distributed File System (HDFS for short) is a distributed file system.
NameNode (nn): stores the metadata of files, such as the file name, the directory structure, file attributes (creation time, number of replicas, file permissions), the block list of each file, DataNo ...
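To make the kinds of metadata listed above concrete, here is a small Python sketch of a per-file record. It is only an illustration of the fields named in the text, not the NameNode's actual internal data structure; the class and field names (FileMetadata, BlockInfo, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockInfo:
    block_id: int
    length: int  # number of bytes stored in this block


@dataclass
class FileMetadata:
    name: str
    directory: str
    created_at: float  # creation time, seconds since the epoch
    replication: int   # number of replicas
    permissions: str   # e.g. "rw-r--r--"
    blocks: List[BlockInfo] = field(default_factory=list)

    def size(self) -> int:
        """Total file size, derived from the block list."""
        return sum(block.length for block in self.blocks)


if __name__ == "__main__":
    meta = FileMetadata(
        name="app.log",
        directory="/data/logs",
        created_at=1645600000.0,
        replication=3,
        permissions="rw-r--r--",
        blocks=[BlockInfo(1, 134217728), BlockInfo(2, 1024)],
    )
    print(meta.size())
```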
Infrastructure of a distributed system. It mainly solves the problems of massive data storage and distributed computing.
1.1 Three major releases of Hadoop
The original Apache version was released in 2006. Cloudera integrates many big data frameworks internally, and the corresponding product is CDH releas ...
Added by birwin on Wed, 23 Feb 2022 18:37:40 +0200
1. HDFS write process
1. The client asks the NameNode, through the DistributedFileSystem module, to upload a file. The NameNode checks whether the target file already exists, whether the path is correct, and whether the user has permission.
2. The NameNode tells the client whether it may upload, and returns three items at the ...
Added by kee1108 on Wed, 23 Feb 2022 10:12:13 +0200
Translate PySpark code that implements a piece of business logic into Spark SQL, then use Spark SQL to backfill the historical data for the past six months (run day by day)
1) Translate PySpark to Spark SQL; 2) Based on Spark SQL, backfill the historical data of the past half year (run day by day);
1) First, pyspark is tra ...
Added by crimsonmoon on Fri, 11 Feb 2022 03:30:23 +0200
Some experience on the use of HDFS
I've been working on big data at my company for some time, so I took some time to sort out the problems I encountered and some useful optimization methods.
1. HDFS multi-directory storage
1.1 Production server disks
1.2 Configure multiple directories in the hdfs-site.xml file, and pay attention t ...
Added by Soldier Jane on Fri, 28 Jan 2022 02:06:47 +0200
Part of the content is extracted from the training materials of Atguigu (Shang Silicon Valley), Itheima (Dark Horse) and others.
1. Hadoop Archive
HDFS is not good at storing small files, because each file has at least one block, and the metadata of each block will occupy memory in the NameNode. If there are a large number of small files ...
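The memory pressure described above can be estimated with some rough arithmetic. The Python sketch below assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (one object per file plus one per block); that figure is an assumption introduced here for illustration, not a number from this post, and real usage varies by Hadoop version and configuration. The function name is hypothetical.

```python
# Assumed rule of thumb, not a measured value: heap bytes per namespace object.
BYTES_PER_OBJECT = 150


def namenode_memory_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Rough NameNode heap estimate: one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    small_files = 10_000_000  # ten million small files, one block each
    estimate = namenode_memory_bytes(small_files)
    print(f"{estimate / 1024 ** 3:.2f} GiB")
```

Under these assumptions the estimate depends on the number of files and blocks, not on how much data they hold, which is why packing many small files into one archive reduces the load on the NameNode.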
Added by dstantdog3 on Tue, 25 Jan 2022 10:12:58 +0200
Hadoop HA high availability installation
Problems needing attention in this scheme
In the hdfs-site.xml file, the dfs.ha.fencing.methods parameter is set to shell instead of sshfence, because when the host of the primary node is down (the host itself is down, rather than the service being stopped), the switch cannot take place with sshfence. However, most articles related to Hadoop HA fro ...