A compatibility comparison of shared file systems in the cloud

"Everything is a file" is the basic design philosophy of UNIX. Files are organized into a directory tree according to their hierarchical relationships, which is the basic form of a file system. When storing data in a file system, users do not need to care how the data is physically stored underneath, and can access it accord ...

Added by waseembari1985 on Thu, 03 Mar 2022 12:19:16 +0200

Hadoop 08: introduction to the HDFS recycle bin and safe mode

1. The HDFS recycle bin. Windows has a Recycle Bin: if you want to restore deleted files, you can restore them from there. HDFS also has a recycle bin (the trash). HDFS creates a trash directory for each user, /user/<username>/.Trash/, and every file or directory the user deletes from the shell command line goes into the correspond ...

Added by phillips321 on Wed, 02 Mar 2022 02:13:11 +0200
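
To make the per-user trash layout described in the excerpt above concrete, here is a minimal Python sketch. It assumes Hadoop's default trash behaviour: a shell delete moves the path under /user/<user>/.Trash/Current/ while keeping its full original path, and the trash is only active when fs.trash.interval (in minutes, default 0) is greater than zero. The function names are hypothetical helpers, not part of any Hadoop API.

```python
import posixpath


def trash_location(user: str, deleted_path: str) -> str:
    """Where a shell delete of `deleted_path` is assumed to land in `user`'s trash.

    Assumed default layout: /user/<user>/.Trash/Current/ followed by the
    full original path of the deleted file or directory.
    """
    if not deleted_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    trash_current = posixpath.join("/user", user, ".Trash", "Current")
    # normpath drops redundant separators and "." segments before the
    # original absolute path is appended under the Current directory.
    return trash_current + posixpath.normpath(deleted_path)


def trash_enabled(fs_trash_interval_minutes: int) -> bool:
    """fs.trash.interval is in minutes; 0 (the default) disables the trash."""
    return fs_trash_interval_minutes > 0


if __name__ == "__main__":
    # Prints /user/alice/.Trash/Current/user/alice/logs/app.log
    print(trash_location("alice", "/user/alice/logs/app.log"))
    print(trash_enabled(1440))  # a one-day retention window
```

Restoring a file is then just a move from that trash location back to the original path.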

Getting started with Hadoop cluster configuration

Hadoop overview: Hadoop composition; HDFS architecture overview. The Hadoop Distributed File System (HDFS) is a distributed file system. NameNode (nn): stores file metadata, such as the file name, the directory structure, file attributes (creation time, replication factor, permissions), the block list of each file, the DataNo ...

Added by Irap on Thu, 24 Feb 2022 08:51:49 +0200
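
The "block list of each file" that the NameNode keeps, as mentioned in the excerpt above, can be illustrated with a small Python sketch. It assumes the Hadoop 2.x/3.x defaults of a 128 MiB dfs.blocksize and a dfs.replication of 3. The helper names are hypothetical, and the model ignores the real block IDs and DataNode locations.

```python
from typing import List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # assumed dfs.blocksize: 128 MiB default
REPLICATION = 3                 # assumed dfs.replication: 3 default


def block_list(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Split a file into (offset, length) blocks.

    Every block is full-sized except possibly the last, which holds only
    the remaining bytes (the last block is not padded on disk).
    """
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def raw_storage(file_size: int, replication: int = REPLICATION) -> int:
    """Bytes consumed across DataNodes: every block is stored `replication` times."""
    return file_size * replication


MiB = 1024 * 1024
# A 300 MiB file: two full 128 MiB blocks plus one 44 MiB block.
print([(off // MiB, length // MiB) for off, length in block_list(300 * MiB)])
# [(0, 128), (128, 128), (256, 44)]
```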

Hadoop in simple terms: getting started

Hadoop learning. 1. Hadoop overview: a distributed system infrastructure that mainly solves the problems of massive data storage and distributed computing. 1.1 The three major Hadoop distributions: the original Apache version, released in 2006; Cloudera, which integrates many big data frameworks internally and whose corresponding product is CDH, releas ...

Added by birwin on Wed, 23 Feb 2022 18:37:40 +0200

Hadoop principles and tuning

Hadoop principles. 1. The HDFS write process: (1) Through the DistributedFileSystem module, the client asks the NameNode to upload a file; the NameNode checks whether the target file already exists, whether the path is correct, and whether the user has permission. (2) The NameNode tells the client whether it may upload, and returns three items at the ...

Added by kee1108 on Wed, 23 Feb 2022 10:12:13 +0200
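
After the NameNode returns the three DataNodes mentioned in the excerpt above, the client writes each block through a pipeline. The following Python sketch is a deliberately simplified model of that pipeline. It assumes the 64 KiB default packet size, ignores checksums and headers, and lists events one packet at a time, whereas real HDFS keeps several packets in flight. The function and node names are hypothetical.

```python
from typing import List

PACKET_SIZE = 64 * 1024  # assumed packet size: 64 KiB default


def write_pipeline(data_len: int, datanodes: List[str]) -> List[str]:
    """Simplified event log for writing one block through an HDFS pipeline.

    The client sends each packet only to the first DataNode; each DataNode
    stores it and forwards it to the next one, and the acknowledgement
    travels back along the pipeline in reverse order.
    """
    events = []
    n_packets = max(1, (data_len + PACKET_SIZE - 1) // PACKET_SIZE)  # ceiling division
    hops = ["client"] + datanodes
    for p in range(n_packets):
        # Forward direction: client -> dn1 -> dn2 -> dn3
        for src, dst in zip(hops, hops[1:]):
            events.append(f"packet {p}: {src} -> {dst}")
        # Acks return in reverse: dn3 -> dn2 -> dn1 -> client
        for src, dst in zip(reversed(hops[1:]), reversed(hops[:-1])):
            events.append(f"ack {p}: {src} -> {dst}")
    return events


for event in write_pipeline(1, ["dn1", "dn2", "dn3"]):
    print(event)
```

Sending to only the first DataNode keeps the client's outbound traffic to one copy of the data, while the pipeline produces the other replicas.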

Scheduling Spark tasks in parallel with Python

Background: translate PySpark code that implements some business logic into Spark SQL, then use Spark SQL to backfill the historical data for the past six months, one run per day. Key points: 1) translate the PySpark code into Spark SQL; 2) use Spark SQL to backfill the past six months of history, day by day. Implementation: 1) First, the PySpark code is tra ...

Added by crimsonmoon on Fri, 11 Feb 2022 03:30:23 +0200
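
The day-by-day backfill described in the excerpt above lends itself to a bounded thread pool in Python. This is a minimal sketch, not the author's implementation: the table names, the partition column `dt`, and the SQL are hypothetical placeholders, and `run_sql` is whatever actually executes a statement (for example `subprocess.run(["spark-sql", "-e", sql]).returncode`). About 180 days stands in for "six months".

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, Dict, List


def backfill_dates(end: date, days: int) -> List[date]:
    """The `days` calendar days ending with (and including) `end`, oldest first."""
    return [end - timedelta(days=i) for i in range(days - 1, -1, -1)]


def build_sql(day: date) -> str:
    # Hypothetical statement: target_table/source_table/dt are placeholders.
    return (
        "INSERT OVERWRITE TABLE target_table PARTITION (dt='{d}') "
        "SELECT * FROM source_table WHERE dt='{d}'"
    ).format(d=day.isoformat())


def run_backfill(days: List[date], run_sql: Callable[[str], int],
                 max_workers: int = 4) -> Dict[date, int]:
    """Submit one Spark SQL job per day, at most `max_workers` at a time.

    `run_sql` should return 0 on success; exceptions it raises are
    re-raised when the results are collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {day: pool.submit(run_sql, build_sql(day)) for day in days}
    # Leaving the with-block waits for every job; collect each day's status.
    return {day: fut.result() for day, fut in futures.items()}


# Dry run: print three statements instead of executing them.
statuses = run_backfill(backfill_dates(date(2022, 2, 10), 3), lambda sql: print(sql) or 0)
```

Threads are enough here because each worker mostly waits on an external Spark job; `max_workers` caps how many jobs hit the cluster at once.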

Some experience using Hadoop

Some experience using HDFS. Preface: I've been working on big data at my company for a while, so I'm taking some time to sort out the problems I've run into and some better optimization methods. 1. Multiple HDFS storage directories. 1.1 Production server disks. 1.2 Configure multiple directories in hdfs-site.xml, and pay attention t ...

Added by Soldier Jane on Fri, 28 Jan 2022 02:06:47 +0200
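
Assuming the excerpt above refers to DataNode data directories, the relevant hdfs-site.xml property is dfs.datanode.data.dir, whose value is a comma-separated list of directories, ideally one per physical disk. The Python sketch below generates such a property. The mount points and the /hadoop/dfs/data subpath are hypothetical examples, not values from the original post.

```python
import xml.etree.ElementTree as ET
from typing import List


def data_dir_property(mount_points: List[str]) -> str:
    """Build an hdfs-site.xml <property> element for dfs.datanode.data.dir.

    The value is a comma-separated list; the DataNode spreads new blocks
    across all listed directories, so each entry should sit on its own disk.
    The /hadoop/dfs/data subdirectory is a hypothetical naming choice.
    """
    value = ",".join(f"file://{m.rstrip('/')}/hadoop/dfs/data" for m in mount_points)
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.datanode.data.dir"
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


print(data_dir_property(["/data1", "/data2", "/data3"]))
```

The generated element goes inside the `<configuration>` element of hdfs-site.xml on each DataNode.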

Hadoop ecosystem: HDFS small-file solutions

Preface: part of the content is taken from training materials by Shang Silicon Valley, Dark Horse and others. 1. Hadoop Archive: HDFS is not good at storing small files, because each file occupies at least one block, and the metadata of every block takes up memory in the NameNode. If there are a large number of small files ...

Added by dstantdog3 on Tue, 25 Jan 2022 10:12:58 +0200
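
Why small files hurt, as described in the excerpt above, can be shown by counting NameNode objects. This Python sketch assumes a 128 MiB block size and the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object. The real cost varies by version, so treat the heap figure as an estimate. It follows the excerpt in counting at least one block per file.

```python
BYTES_PER_OBJECT = 150            # rule-of-thumb heap cost per file or block object
BLOCK_SIZE = 128 * 1024 * 1024    # assumed dfs.blocksize (128 MiB)


def namenode_objects(file_sizes):
    """One object per file plus one per block (at least one block per file)."""
    return sum(1 + max(1, (size + BLOCK_SIZE - 1) // BLOCK_SIZE) for size in file_sizes)


def estimated_heap_bytes(file_sizes):
    return namenode_objects(file_sizes) * BYTES_PER_OBJECT


MiB = 1024 * 1024
small = [1 * MiB] * 10_000   # 10,000 files of 1 MiB each
large = [10_000 * MiB]       # the same bytes in a single file
print(namenode_objects(small))  # 20000
print(namenode_objects(large))  # 80 (1 file + 79 blocks)
```

The same data costs 250 times more NameNode objects when stored as small files, which is what packing them into a Hadoop Archive is meant to reduce.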

Detailed installation of the full set of Hadoop components (Li Jian collection): taking you into the abyss of big data

Contents: Hadoop deployment; components to deploy: 1. VMware installation; 2. Ubuntu 18.04.5 installation; 3. Installing VMware Tools; 4. Configuring passwordless SSH login; 5. Java environment installation; Hadoop installation; MySQL installation and deployment; Hive installation and deployment; Sqoop installati ...

Added by Entanio on Sun, 23 Jan 2022 03:34:05 +0200

Hadoop HA (high availability) deployment

Hadoop HA (high availability) installation. A point to note in this setup: in hdfs-site.xml, the dfs.ha.fencing.methods parameter is set to shell instead of sshfence. The reason is that when the active NameNode's host goes down (the whole host, not just the service), sshfence cannot connect to it, so the failover cannot happen. However, most articles on Hadoop HA fro ...

Added by lou28 on Wed, 19 Jan 2022 17:21:01 +0200
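
The fencing problem described in the excerpt above comes from how dfs.ha.fencing.methods is evaluated. The configured methods (newline-separated in hdfs-site.xml) are tried in order, and failover continues only once one of them reports success. The Python sketch below is a simplified model of that rule, not Hadoop code. It shows why sshfence alone blocks failover when the old active host is down, and why shell(/bin/true), which always exits 0, lets it proceed.

```python
from typing import Callable, Dict, List


def fence(methods: List[str], host_up: bool) -> str:
    """Simplified model of working through dfs.ha.fencing.methods.

    Methods are tried in order; the first success lets failover proceed,
    and if every method fails, failover is aborted.
    """
    outcomes: Dict[str, Callable[[], bool]] = {
        # sshfence has to SSH into the old active host to kill the
        # NameNode, so it cannot succeed when that host is down.
        "sshfence": lambda: host_up,
        # shell(/bin/true) always exits 0, so it always "succeeds".
        "shell(/bin/true)": lambda: True,
    }
    for method in methods:
        if outcomes[method]():
            return f"failover proceeds (fenced by {method})"
    return "failover aborted: no fencing method succeeded"


print(fence(["sshfence"], host_up=False))
print(fence(["sshfence", "shell(/bin/true)"], host_up=False))
```

Listing sshfence first and shell(/bin/true) second keeps real fencing when the host is reachable and still allows failover when it is not.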