Compatibility competition of shared file systems on the cloud
"Everything is a file" is the basic design philosophy of UNIX. Files are organized into a directory tree according to their hierarchical relationships, which constitutes the basic form of the file system. When using the file system to save data, users do not need to care about the underlying storage mode of the data, and can access it accord ...
Added by waseembari1985 on Thu, 03 Mar 2022 12:19:16 +0200
Hadoop 08: introduction to the HDFS recycle bin and safe mode
1, Recycle bin for HDFS
Windows has a recycle bin: if you want to restore deleted files, you can restore them from there. HDFS also has a recycle bin.
HDFS creates a recycle bin directory for each user: /user/<username>/.Trash/. Every file or directory deleted by the user on the Shell command line will enter the correspond ...
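As a rough illustration of the layout described above, the sketch below computes where a deleted path would land in a user's recycle bin. It assumes the usual HDFS layout, in which deleted items are moved under a `Current` subdirectory of the user's trash directory with their original absolute path preserved. The function name `trash_location` is made up for this example and is not part of any Hadoop API.

```python
import posixpath


def trash_location(username: str, deleted_path: str) -> str:
    """Return the assumed recycle-bin location of a path deleted via the shell."""
    if not deleted_path.startswith("/"):
        raise ValueError("deleted_path must be an absolute HDFS path")
    trash_root = posixpath.join("/user", username, ".Trash", "Current")
    # The original absolute path is kept below the trash root.
    return posixpath.join(trash_root, deleted_path.lstrip("/"))


print(trash_location("alice", "/data/logs/app.log"))
```

Restoring a file then amounts to moving it from that location back to its original path.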
Added by phillips321 on Wed, 02 Mar 2022 02:13:11 +0200
Getting started with Hadoop cluster configuration
Hadoop overview
Hadoop composition
HDFS Architecture Overview
Hadoop Distributed File System (HDFS for short) is a distributed file system.
NameNode (nn): stores file metadata, such as the file name, file directory structure, file attributes (creation time, number of replicas, file permissions), the block list of each file, DataNo ...
Added by Irap on Thu, 24 Feb 2022 08:51:49 +0200
Hadoop in simple terms -- getting started
Hadoop learning
1. Hadoop overview
Hadoop is the infrastructure of a distributed system. It mainly solves the problems of massive data storage and distributed computing.
1.1 Three major distributions of Hadoop
The original Apache version was released in 2006. Cloudera integrates many big data frameworks internally, and the corresponding product is CDH releas ...
Added by birwin on Wed, 23 Feb 2022 18:37:40 +0200
Hadoop principle and tuning
Hadoop principle
1. HDFS write process
1. The client asks the NameNode, through the DistributedFileSystem module, to upload a file. The NameNode checks whether the target file already exists, whether the path is correct, and whether the user has permission.
2. The NameNode tells the client whether it can upload, and returns three items at the ...
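The three checks in step 1 can be pictured with a toy in-memory model. This is only a sketch under stated assumptions, not how the NameNode is implemented: `ToyNameNode`, its fields, and `can_upload` are hypothetical names, and "the path is correct" is interpreted here as "the path is absolute and its parent directory exists".

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """In-memory stand-in for the NameNode checks; not a Hadoop API."""
    existing_files: set = field(default_factory=set)
    existing_dirs: set = field(default_factory=lambda: {"/"})
    # Maps a directory to the set of users allowed to write into it.
    writers: dict = field(default_factory=dict)

    def can_upload(self, user: str, target: str) -> tuple:
        """Return (allowed, reason) for an upload request."""
        parent = target.rsplit("/", 1)[0] or "/"
        if target in self.existing_files:
            return False, "target file already exists"
        if not target.startswith("/") or parent not in self.existing_dirs:
            return False, "path is not valid"
        if user not in self.writers.get(parent, set()):
            return False, "permission denied"
        return True, "ok"


nn = ToyNameNode(
    existing_files={"/data/old.txt"},
    existing_dirs={"/", "/data"},
    writers={"/data": {"alice"}},
)
print(nn.can_upload("alice", "/data/new.txt"))
print(nn.can_upload("alice", "/data/old.txt"))
```

Only when all three checks pass does the client get the go-ahead described in step 2.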
Added by kee1108 on Wed, 23 Feb 2022 10:12:13 +0200
Scheduling Spark tasks in parallel with Python
Background
Translate PySpark code that implements a piece of business logic into Spark SQL, then use that Spark SQL to backfill the historical data for the past six months (one run per day).
Key points
1) Translate PySpark to Spark SQL; 2) using Spark SQL, backfill the historical data of the past half year (one run per day).
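The day-by-day backfill in point 2 could be driven from Python roughly as follows. This is a minimal sketch, not the post's own code: `run_for_day` is a hypothetical placeholder for whatever submits the Spark SQL for one day, "six months" is assumed to mean 180 days, and a thread pool is just one way to run several days at once.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def days_before(end: date, num_days: int) -> list:
    """Return the num_days dates immediately before `end`, oldest first."""
    return [end - timedelta(days=offset) for offset in range(num_days, 0, -1)]


def run_for_day(day: date) -> str:
    """Hypothetical per-day job; a real version would submit the Spark SQL for `day`."""
    return f"done {day.isoformat()}"


def backfill(end: date, num_days: int, job=run_for_day, max_workers: int = 4) -> dict:
    """Run `job` once per day, with at most max_workers days in flight."""
    days = days_before(end, num_days)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps results in the same order as `days`.
        results = list(pool.map(job, days))
    return dict(zip(days, results))


if __name__ == "__main__":
    # About six months of daily runs, assumed here to be 180 days.
    summary = backfill(date(2022, 2, 11), 180)
    print(len(summary), "daily runs finished")
```

Threads are enough here because each worker would mostly wait on an external Spark job; `max_workers` caps how many days hit the cluster at the same time.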
Implementation
1) First, PySpark is tra ...
Added by crimsonmoon on Fri, 11 Feb 2022 03:30:23 +0200
Some experience with using Hadoop
Some experience with using HDFS
Foreword:
I've been working on big data at my company for some time, so I'm taking the time to sort out the problems I've encountered and some of the better optimization methods.
1. HDFS multi-directory storage
1.1 Production server disks
1.2 Configure multiple directories in the hdfs-site.xml file, and pay attention t ...
Added by Soldier Jane on Fri, 28 Jan 2022 02:06:47 +0200
Hadoop ecosystem - HDFS small file solution
Preface
Part of the content is drawn from the training materials of Shang Silicon Valley (Atguigu), Dark Horse (Itheima), and others.
1. Hadoop Archive
HDFS is not good at storing small files, because each file has at least one block, and the metadata of each block will occupy memory in the NameNode. If there are a large number of small files ...
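To get a feel for the scale of the problem, here is a back-of-the-envelope estimate. It is a sketch under two assumptions that are not from the excerpt: a commonly quoted rule of thumb of roughly 150 bytes of NameNode memory per file or block object, and a block size of 128 MiB. `namenode_bytes` is a made-up helper name.

```python
import math

BYTES_PER_OBJECT = 150          # assumed rule of thumb, not an exact figure
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size of 128 MiB


def namenode_bytes(num_files: int, file_size: int) -> int:
    """Estimate NameNode memory used by num_files files of file_size bytes each."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    # One object for the file itself plus one per block.
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


small = namenode_bytes(10_000_000, 1024)        # ten million 1 KiB files
packed = namenode_bytes(1, 10_000_000 * 1024)   # the same bytes in a single file
print(small, packed)
```

The same amount of data costs the NameNode several orders of magnitude more memory when it is spread over millions of tiny files, which is the motivation for packing small files together.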
Added by dstantdog3 on Tue, 25 Jan 2022 10:12:58 +0200
Detailed installation of Hadoop full set of components in Li Jian collection -- taking you into the abyss of big data
Contents
Hadoop deployment
Deploy components
1, VMware deployment installation
2, Deployment and installation of Ubuntu 18.04.5
3, Installing VMware Tools
4, Configure ssh password free login
5, Java environment installation
Hadoop installation
MySQL installation and deployment
Hive installation and deployment
Sqoop installati ...
Added by Entanio on Sun, 23 Jan 2022 03:34:05 +0200
Hadoop HA high availability deployment
Hadoop HA high availability installation
Points to note in this scheme
In the hdfs-site.xml file, the dfs.ha.fencing.methods parameter is set to shell instead of sshfence, because with sshfence failover cannot happen when the host of the primary node goes down (the host itself is down, as opposed to the service being stopped). However, most articles related to Hadoop HA fro ...
Added by lou28 on Wed, 19 Jan 2022 17:21:01 +0200