HDFS Architecture Overview
The Hadoop Distributed File System (HDFS) is a distributed file system.
NameNode (nn): stores file metadata, such as the file name, directory structure, file attributes (creation time, number of replicas, file permissions), the block list of each file, DataNo ...
Some experience on the use of HDFS
I have been working on big data at my company for some time, so I took the time to sort out the problems I encountered and some better optimization methods.
1. HDFS multi-directory storage
1.1 Production server disks
1.2 Configure multiple directories in the hdfs-site.xml file, and pay attention t ...
Added by Soldier Jane on Fri, 28 Jan 2022 02:06:47 +0200
1. Environment introduction
Install an Ubuntu virtual machine using VirtualBox. Install Hadoop and the Eclipse 3.0 compiler in Ubuntu 8. Download and install the Java environment: download the JDK and complete the pseudo-distributed configuration of Hadoop. Import all the JAR packages required by the compiler into Eclipse. Start Had ...
Added by IRON FART on Tue, 04 Jan 2022 09:13:29 +0200
Preface - From Wan Junfeng Kevin
The average latency of our services is about 30 ms. One major prerequisite is that we make extensive use of MapReduce technology, so that even when a service calls many other services, its latency often depends only on the duration of the slowest request.
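To make the "slowest request" point concrete, here is a minimal Python sketch of the fan-out idea. It is not the author's implementation: the service names, the simulated latencies, and the helper functions `call_service` and `fan_out` are all hypothetical and exist only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical downstream services, each with a simulated latency in seconds.
SIMULATED_LATENCY = {"user": 0.05, "order": 0.10, "stock": 0.20}

def call_service(name):
    """Pretend to call a downstream service and return (name, reply)."""
    time.sleep(SIMULATED_LATENCY[name])
    return name, f"{name}-ok"

def fan_out(names):
    """Map: issue all calls concurrently. Reduce: merge the replies into one dict."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        replies = pool.map(call_service, names)
        return dict(replies)

start = time.perf_counter()
merged = fan_out(list(SIMULATED_LATENCY))
elapsed = time.perf_counter() - start
print(merged)
print(f"elapsed {elapsed:.2f}s, sequential would be about "
      f"{sum(SIMULATED_LATENCY.values()):.2f}s")
```

Because the three calls run at the same time, the elapsed time should be close to the slowest call rather than to the sum of all three.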
For your existing services, you do not need to opti ...
Added by freakuency on Sun, 02 Jan 2022 19:58:37 +0200
Install Hadoop 2.7.3 (on Linux)
Install MySQL (on Windows or Linux)
Install Hive (on Linux); reference: Hive installation configuration
Download search data from Sogou lab for analysis
The downloaded data contains 6 fields, and the data format is described as follows:
Access time user ID ...
Added by cheekychop on Sun, 12 Dec 2021 13:58:21 +0200
This article builds on the function that counts the total salary of employees in each department. If you have not implemented it yet, please refer to: Realize the function of counting the total salary of employees in each department
1. Use serialization
2. Implement partitioning
3. Map uses Combiner
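The combiner and partitioning steps listed above can be pictured with a small in-memory Python simulation. This is only a sketch and does not use the Hadoop API: the employee records, the modulo partition rule, and the function names are hypothetical, and serialization (handled in Hadoop through Writable types) is left out.

```python
from collections import defaultdict

# Hypothetical employee records: (name, department number, salary).
EMPLOYEES = [
    ("SMITH", 20, 800), ("ALLEN", 30, 1600), ("WARD", 30, 1250),
    ("JONES", 20, 2975), ("CLARK", 10, 2450), ("KING", 10, 5000),
]

def map_phase(records):
    """Emit one (department, salary) pair per employee record."""
    return [(dept, salary) for _name, dept, salary in records]

def combine(pairs):
    """Combiner: pre-sum salaries per department on the map side."""
    partial = defaultdict(int)
    for dept, salary in pairs:
        partial[dept] += salary
    return list(partial.items())

def partition(dept, num_reducers):
    """Partitioner: choose the reducer for a key (an assumed modulo rule)."""
    return (dept // 10) % num_reducers

def run_job(splits, num_reducers=3):
    """Run map + combine per input split, route by partition, then reduce."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for dept, subtotal in combine(map_phase(split)):
            reducer_inputs[partition(dept, num_reducers)][dept].append(subtotal)
    # Reduce: sum the partial totals that arrived for each department.
    return [{dept: sum(vals) for dept, vals in grouped.items()}
            for grouped in reducer_inputs]

splits = [EMPLOYEES[:3], EMPLOYEES[3:]]
for reducer_id, totals in enumerate(run_job(splits)):
    print(reducer_id, totals)
```

The combiner shrinks each split to at most one pair per department before the shuffle, and the partitioner decides which reducer (and therefore which output file in a real job) receives each department.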
You need to set up a Hadoop pseudo-distributed cluster platform, as described in this tutorial: Quick Start Tutorial for Hadoop Big Data Technology and Pseudo-Distributed Clustering
Eclipse Environment Configuration
Eclipse (Windows local system)
1. Install plug-ins:
Address: https:// ...
Added by bc2013 on Thu, 25 Nov 2021 20:05:01 +0200
1. Recap
The last article introduced how to call the MapReduce API and how to configure Eclipse. This time, we will use MapReduce to count the words in English article files!
Welcome to my previous article: MapReduce related eclipse configuration and Api call
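As a rough picture of what a word-count job computes, the following Python sketch mimics the map and reduce steps in memory. It is an illustration under stated assumptions, not the Hadoop MapReduce API the article uses: the sample lines, the word-splitting rule, and the function names are hypothetical.

```python
import re
from collections import Counter
from itertools import chain

def map_words(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]

def reduce_counts(pairs):
    """Shuffle + reduce: group the pairs by word and sum the ones."""
    counts = Counter()
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

# Hypothetical input standing in for the lines of an English article file.
lines = ["Hello Hadoop", "hello MapReduce", "Hadoop runs MapReduce jobs"]
word_counts = reduce_counts(chain.from_iterable(map_words(line) for line in lines))
print(sorted(word_counts.items()))
```

In a real job the map and reduce functions run on different nodes and the framework performs the grouping between them; here a single `Counter` stands in for that shuffle.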
Installation required | Download method
IDEA | O ...
Added by webdes03 on Wed, 10 Nov 2021 21:15:32 +0200