Distributed parallel computing experiment: WordCount

Test the WordCount function on a Hadoop cluster

Goal: build a Hadoop development environment using Eclipse+Maven, and compile and run the official WordCount source code.

Create the Hadoop project

Create a Maven project

Configure Maven before creating the project; at a minimum, switch the Maven mirror to a domestic (in-country) source.

In Eclipse, choose File > New > Maven Project.

Add the Hadoop dependencies

In the project's pom.xml, add the following under the project node (inside <project> ... </project>):

<properties>
    <hadoop.version>2.8.5</hadoop.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>

The Hadoop JAR packages have now been added to the project.

Implement the WordCount function

You can extract the WordCount source code from the official Hadoop installation package. Its path inside the archive is hadoop-2.8.5\share\hadoop\mapreduce\sources\hadoop-mapreduce-examples-2.8.5-sources.jar; use a decompression tool to extract WordCount.java directly from that JAR.

Part of the official source code:
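As a rough illustration of what WordCount computes, here is a minimal standalone sketch in plain Java. It is not the official Hadoop source: the class name WordCountSketch and its methods are hypothetical, and the Hadoop Mapper and Reducer classes are replaced by ordinary collections. It assumes words are split on whitespace only, which is consistent with the results shown later in this walkthrough.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-in for the MapReduce job; no Hadoop classes are used.
class WordCountSketch {

    // "Map" step: split one line on whitespace and emit a (word, 1) pair per token.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // "Reduce" step: sum the counts for each word. A TreeMap keeps the keys sorted.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run the map step over every line, then reduce all emitted pairs together.
    public static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("who are you!", "where are you from!");
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Because the sketch splits on whitespace only, punctuation stays attached to a word, so "you" and "you!" are counted as two different keys.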

Build the project

Right-click the project, select [Run As] > [Maven build...], and enter clean package in the Goals field.

Test the WordCount function in the cluster

Start the cluster

start-all.sh

Run jps to check the running processes; the output must include at least:

[root@hadoopnode1 ~]# jps
136 NameNode
252 ResourceManager
862 Jps

Create a test file (myword.txt) on the virtual machine

[root@hadoopnode1 ~]# mkdir -p /home/demo
[root@hadoopnode1 ~]# cd /home/demo
[root@hadoopnode1 demo]# vi myword.txt

Write the following into the file (this is only test data; use whatever data suits your needs):

this is a wordcount test! 
hello! my name is jerry. 
who are you! 
where are you from! 
the end!

Create an input folder on HDFS (-p creates any missing parent directories along the path):

[root@hadoopnode1 demo]# hdfs dfs -mkdir -p /wordcount/input

Upload the test file to HDFS:

[root@hadoopnode1 demo]# hdfs dfs -put myword.txt /wordcount/input

Upload the JAR package and run it:

Use an FTP tool to upload the packaged /bigdataprotrain/target/bigdataprotrain-0.0.1-SNAPSHOT.jar to the /home/demo directory on the cluster's NameNode node.

Command format: hadoop jar <JAR file name> <package name>.<class name> <input path> <output path>

/wordcount/input/ is the directory containing the input files; it must be created in advance.
/wordcount/output is the directory for the output files. It is created automatically and must not exist beforehand, otherwise an error occurs. If it already exists, delete it first.
com.issedu.bigdatapro.sample.WordCount is the package name plus the name of the class that contains the main method.

[root@hadoopnode1 demo]# hadoop jar bigdataprotrain-0.0.1-SNAPSHOT.jar com.issedu.bigdatapro.sample.WordCount /wordcount/input/ /wordcount/output
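Everything after the class name on that command line is handed to the program's main method as its argument array. The following is a minimal sketch of how a driver might interpret those arguments, assuming the convention that the last argument is the output directory and everything before it is an input path. JobArgsSketch and its fields are hypothetical names, and no Hadoop classes are involved.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that mirrors the "<input path> <output path>" layout of the command.
class JobArgsSketch {
    public final List<String> inputPaths;
    public final String outputPath;

    JobArgsSketch(List<String> inputPaths, String outputPath) {
        this.inputPaths = inputPaths;
        this.outputPath = outputPath;
    }

    // Assumption: the last argument is the output path, all earlier ones are input paths.
    public static JobArgsSketch parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("expected at least <input path> <output path>");
        }
        List<String> inputs = Arrays.asList(Arrays.copyOfRange(args, 0, args.length - 1));
        return new JobArgsSketch(inputs, args[args.length - 1]);
    }

    public static void main(String[] args) {
        // The two path arguments from the hadoop jar command above.
        JobArgsSketch parsed = parse(new String[] {"/wordcount/input/", "/wordcount/output"});
        System.out.println("inputs = " + parsed.inputPaths);
        System.out.println("output = " + parsed.outputPath);
    }
}
```

Validating the argument count up front lets a mistyped command fail immediately with a clear message instead of partway through a job.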

View output results:

[root@hadoopnode1 demo]# hdfs dfs -ls /wordcount/output

The result at this point:

Found 2 items
-rw-r--r-- 3 root supergroup 0 2020-03-18 09:42 /wordcount/output/_SUCCESS
-rw-r--r-- 3 root supergroup 120 2020-03-18 09:42 /wordcount/output/part-r-00000

Note: the _SUCCESS file is 0 bytes and has no content; it only marks the output as successful. The actual content is in part-r-00000, and there may be multiple such files with different serial numbers.
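Several part-r-NNNNN files appear when a job runs with more than one reduce task, since each reduce task writes its own file. The sketch below shows how a key could be assigned to one of those files, using the formula of Hadoop's default HashPartitioner (clear the sign bit of the hash, then take it modulo the number of reduce tasks). It is an illustration under assumptions: PartitionSketch is a hypothetical name, the reducer count of 3 is made up, and String.hashCode stands in for the hash of Hadoop's Text type, so the actual assignments on a cluster would differ.

```java
// Hypothetical illustration of how keys are spread over several reduce outputs.
class PartitionSketch {

    // Clear the sign bit so the result is non-negative, then take the modulus.
    // Assumption: String.hashCode is used here in place of Hadoop's Text.hashCode.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Reduce outputs are named part-r- followed by a five-digit, zero-padded index.
    public static String outputFileName(int partition) {
        return String.format("part-r-%05d", partition);
    }

    public static void main(String[] args) {
        int reducers = 3; // made-up value; the listing above shows only one part file
        for (String word : new String[] {"hello!", "jerry.", "wordcount"}) {
            System.out.println(word + " -> " + outputFileName(partitionFor(word, reducers)));
        }
    }
}
```

With a single reduce task every key maps to partition 0, which is why only part-r-00000 appears in the listing.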

Download the output and view it locally:

[root@hadoopnode1 demo]# hdfs dfs -get /wordcount/output/part* 
[root@hadoopnode1 demo]# cat part-r-00000

The results are as follows:

a 1
are 2
end! 1
from! 1
hello! 1
is 2
jerry. 1
my 1
name 1
test! 1
the 1
this 1
where 1
who 1
wordcount 1
you 1
you! 1

Keywords: Hadoop Maven Zookeeper mapreduce

Added by Bad HAL 9000 on Mon, 20 Sep 2021 19:14:38 +0300
