Distributed parallel computing experiment: WordCount

Test the WordCount function on a Hadoop cluster

Goal: build a Hadoop development environment using Eclipse+Maven, and compile and run the official WordCount source code.

  Create Hadoop project

    Create a Maven project
Before creating the Maven project, configure Maven first; at a minimum, switch Maven's repository mirror to a domestic one.
In Eclipse, choose File > New > Maven Project:



  Add Hadoop dependency

In the project's pom.xml, add the following content under the project node (between <project> and </project>):

The Hadoop JAR packages have now been added to the project.

  Implement WordCount function

You can extract the WordCount source code from the official Hadoop installation package. Its path inside the archive is: hadoop-sources.jar. Use a decompression tool to extract WordCount.java directly from that JAR.

  Excerpt from the official source code:

  Build project

Right-click the project and select [Run As] > [Maven build...], then enter clean package in the Goals field:


  Test the WordCount function in the cluster

Start cluster


Run jps to check the running processes; the output must include at least:

[root@hadoopnode1 ~]# jps
136 NameNode
252 ResourceManager
862 Jps

Create a test file (myword.txt) in the virtual machine

[root@hadoopnode1 ~]# mkdir -p /home/demo
[root@hadoopnode1 ~]# cd /home/demo
[root@hadoopnode1 demo]# vi myword.txt

Write the following into the file (this is only test data; use whatever data suits your needs):

this is a wordcount test! 
hello! my name is jerry. 
who are you! 
where are you from! 
the end! 
Create an input folder on HDFS (-p creates any missing parent directories along the path):
[root@hadoopnode1 demo]# hdfs dfs -mkdir -p /wordcount/input
Upload the test file to HDFS:
[root@hadoopnode1 demo]# hdfs dfs -put myword.txt /wordcount/input

Upload the jar package and run:

Use an FTP tool to upload the packaged /bigdataprotrain/target/bigdataprotrain-0.0.1-SNAPSHOT.jar to the /home/demo directory on the cluster's NameNode node.
Command format: hadoop jar <JAR file name> <package name>.<class name> <input path> <output path>
  • /wordcount/input/ is the directory containing the input file; it must be created in advance
  • /wordcount/output is the directory for the output files. It is created automatically and must not exist beforehand; otherwise the job fails with an error. If it already exists, delete it first.
  • com.issedu.bigdatapro.sample.WordCount is the package name plus the name of the class that contains the main method
[root@hadoopnode1 demo]# hadoop jar bigdataprotrain-0.0.1-SNAPSHOT.jar com.issedu.bigdatapro.sample.WordCount /wordcount/input/ /wordcount/output
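
Everything that follows the class name on the hadoop jar command line is handed to the main method as its argument array. The sketch below is plain Java with no Hadoop dependency; it only illustrates one plausible way to split those arguments, assuming the convention that the last argument is the output directory and everything before it is an input path. The class WordCountArgsSketch and its methods are hypothetical names invented for this illustration, not part of Hadoop or of the project above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): shows how the arguments that
// follow the class name on the `hadoop jar` command line could be split.
class WordCountArgsSketch {

    // Assumption: every argument except the last one is an input path.
    static List<String> inputPaths(String[] args) {
        requireTwoOrMore(args);
        return Arrays.asList(Arrays.copyOfRange(args, 0, args.length - 1));
    }

    // Assumption: the last argument is the output directory.
    static String outputPath(String[] args) {
        requireTwoOrMore(args);
        return args[args.length - 1];
    }

    // At least one input path and one output path are needed.
    private static void requireTwoOrMore(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: wordcount <in> [<in>...] <out>");
        }
    }

    public static void main(String[] args) {
        // The same two paths that appear in the command above.
        String[] demo = {"/wordcount/input/", "/wordcount/output"};
        System.out.println("input  = " + inputPaths(demo));
        System.out.println("output = " + outputPath(demo));
    }
}
```

Failing fast on fewer than two arguments gives a usage message before any job is submitted, rather than an error partway through the run.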

View output results:

[root@hadoopnode1 demo]# hdfs dfs -ls /wordcount/output

The listing at this point:

Found 2 items
-rw-r--r-- 3 root supergroup 0 2020-03-18 09:42
-rw-r--r-- 3 root supergroup 120 2020-03-18 09:42

Note:

The _SUCCESS file is 0 bytes and has no content; it only marks the job output as successful. The actual results are in part-r-00000, and there may be several such files with different sequence numbers.
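
Why several part files can appear is easier to see with a small sketch. Each reduce task writes one file whose name ends in its zero-padded partition number, and with hash partitioning a word goes to the partition given by its non-negative hash modulo the number of reduce tasks. The code below is plain Java written under stated assumptions: PartFileSketch is a hypothetical class, the reducer count of 3 is invented for the demonstration, and String.hashCode stands in for the hash Hadoop computes on its own key type, so the partition a given word lands in on a real cluster may differ.

```java
import java.util.Locale;

// Hypothetical illustration (not Hadoop code) of part-file naming and
// hash partitioning.
class PartFileSketch {

    // Reducer output name: "part-r-" plus a five-digit, zero-padded number.
    static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-r-%05d", partition);
    }

    // Hash partitioning: clear the sign bit, then take the remainder.
    // Assumption: String.hashCode stands in for the real key hash.
    static int partitionFor(String word, int numReduceTasks) {
        return (word.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3; // hypothetical number of reduce tasks
        for (String word : new String[] {"hello!", "is", "you"}) {
            System.out.println(word + " -> " + partFileName(partitionFor(word, reducers)));
        }
    }
}
```

With a single reduce task every word maps to partition 0, which matches the single part-r-00000 file in the listing.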
Download the result to the local machine and view it:
[root@hadoopnode1 demo]# hdfs dfs -get /wordcount/output/part* 
[root@hadoopnode1 demo]# cat part-r-00000

The results are as follows:

a 1
are 2
end! 1
from! 1
hello! 1
is 2
jerry. 1
my 1
name 1
test! 1
the 1
this 1
where 1
who 1
wordcount 1
you 1
you! 1
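
These counts can be cross-checked without a cluster. The sketch below is plain Java, not the Hadoop job itself: it assumes that words are split on whitespace only (which is why "you" and "you!" are counted separately) and that the output is ordered by key, and it stands in for the map step with a tokenizer and for the reduce step with a running sum. LocalWordCount is a hypothetical class name used only for this illustration.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the WordCount job (no Hadoop involved).
class LocalWordCount {

    // "Map": split each line on whitespace. "Reduce": sum the 1s per word.
    // A TreeMap keeps the words sorted, like the job output above.
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The contents of myword.txt from the steps above.
        Map<String, Integer> counts = count(
                "this is a wordcount test!",
                "hello! my name is jerry.",
                "who are you!",
                "where are you from!",
                "the end!");
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Because punctuation is not stripped, tokens such as "test!" and "jerry." keep their trailing characters, exactly as in the listing above.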

Keywords: Hadoop Maven Zookeeper mapreduce

Added by Bad HAL 9000 on Mon, 20 Sep 2021 19:14:38 +0300