Distributed parallel computing experiment: WordCount
Testing the WordCount program on a Hadoop cluster
Goal: build a Hadoop development environment with Eclipse + Maven, then compile and run the official WordCount source code.
Create a Hadoop project
Create a Maven project
Before creating the Maven project, please configure Maven first; at the very least, change the Maven mirror ...
Added by Bad HAL 9000 on Mon, 20 Sep 2021 19:14:38 +0300
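To make the data flow of the WordCount entry above concrete, here is a minimal single-process Python sketch of the map, shuffle/sort, and reduce phases. It only imitates the logic of the official example's TokenizerMapper and IntSumReducer classes; the function names are hypothetical and nothing here uses the Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Roughly what TokenizerMapper does: split on whitespace, emit (word, 1)."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Sort map output by key and group the values per key, as the framework does."""
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield word, [count for _, count in group]


def reducer(word, counts):
    """Roughly what IntSumReducer does: sum all counts for one word."""
    return word, sum(counts)


def word_count(lines):
    # Map every input line, then shuffle and reduce the combined output.
    pairs = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(word, counts) for word, counts in shuffle(pairs))
```

For example, word_count(["hello hadoop", "hello world"]) gives a count of 2 for "hello" and 1 each for "hadoop" and "world". On a real cluster, the shuffle step is what moves all pairs with the same key to the same reducer.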
Hive DML data operation (data import and export)
1. Data import
1. Load data into the table (load)
1.1 Syntax
hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,...)];
(1) load data: loads data into a table. (2) local: loads the data from the local file system into the Hive table; without it, the data is loaded from HDFS into the Hive table. (3) inp ...
Added by infyportalgroup on Sat, 18 Sep 2021 06:08:19 +0300
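The optional clauses in the Hive load syntax above can be shown by assembling the statement from its parts. This is a hypothetical Python helper, not part of Hive. It assumes string partition values are single-quoted, and it does no escaping, so it should not be used with untrusted input.

```python
def build_load_statement(path, table, local=False, overwrite=False, partition=None):
    """Assemble a Hive LOAD DATA statement (hypothetical helper, illustration only)."""
    parts = ["load data"]
    if local:
        parts.append("local")  # read from the local file system instead of HDFS
    parts.append(f"inpath '{path}'")
    if overwrite:
        parts.append("overwrite")  # replace the existing data instead of appending
    parts.append(f"into table {table}")
    if partition:
        # e.g. {"month": "201709"} -> partition (month='201709')
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"partition ({spec})")
    return " ".join(parts) + ";"
```

build_load_statement("/opt/module/datas/student.txt", "student", local=True) reproduces the local-file form of the example; leaving local=False switches the source to HDFS.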
Hadoop pseudo-distributed installation process
1. Create a virtual machine first. Be generous with its size settings, otherwise it won't be much fun; mine is 100 GB. Memory and CPU threads can then be configured to suit your own computer.
2. Next, edit the configuration files and install the JDK. The shell script written earlier can be used directly, or you can configure everything yourself.
3. Download the Hadoop installat ...
Added by chris_2001 on Thu, 09 Sep 2021 19:41:34 +0300
What exactly is the HDFS SecondaryNameNode for?
0-Preface
What does HDFS Secondary NameNode do?
This is a classic basic interview question. I have asked it many times as an interviewer (and, of course, been asked it many times). My impression is that about half of the candidates can't answer it correctly, and some even reply "isn't it just the NameNode's backup?". In order to save spa ...
Added by samusk on Thu, 09 Sep 2021 19:31:11 +0300
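The short answer to the SecondaryNameNode question above is checkpointing. It is not a hot standby. Instead, it periodically takes the NameNode's fsimage and edit log, replays the edits onto the image, and hands back a merged fsimage so the edit log does not grow without bound. The toy Python model below shows only that merge step. The dict-based image and the tuple-based edit operations are hypothetical simplifications, not HDFS's real formats.

```python
def apply_edit(image, op):
    """Apply one edit-log entry to a toy namespace {path: "dir" | "file"}."""
    kind = op[0]
    if kind == "mkdir":
        image[op[1]] = "dir"
    elif kind == "create":
        image[op[1]] = "file"
    elif kind == "delete":
        image.pop(op[1], None)  # toy model: not recursive
    elif kind == "rename":
        image[op[2]] = image.pop(op[1])
    else:
        raise ValueError(f"unknown edit: {kind}")


def checkpoint(fsimage, edits):
    """Replay the edits onto a copy of fsimage; return (new fsimage, empty edit log)."""
    merged = dict(fsimage)  # work on a copy, the NameNode keeps serving meanwhile
    for op in edits:
        apply_edit(merged, op)
    return merged, []
```

After the merge, the NameNode only needs the new fsimage plus the (now short) edits written since the checkpoint, which also shortens its restart time.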
MapReduce actual case, MapTask operation mechanism, ReduceTask operation mechanism, MapReduce execution process, hadoop data compression, implementation of Join algorithm
MapReduce actual case
Sorting by upstream traffic in descending order
Partitioning by mobile phone number
MapTask operation mechanism
Operation proces ...
Added by Fergal Andrews on Sat, 13 Jun 2020 09:39:07 +0300
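The two MapReduce cases listed above, partitioning by phone number and sorting by upstream traffic in descending order, can be simulated in plain Python. The prefix-to-partition table is a hypothetical example. Records are assumed to be (phone, up_flow, down_flow) tuples. In a real job, the routing lives in a custom Partitioner and the ordering comes from using the flow record as the map output key with a descending compareTo.

```python
# Hypothetical mapping: first three digits of the phone number -> partition number.
PREFIX_TO_PARTITION = {"136": 0, "137": 1, "138": 2, "139": 3}
OTHER_PARTITION = 4


def get_partition(phone):
    """Route a record by its phone prefix, like a custom Partitioner would."""
    return PREFIX_TO_PARTITION.get(phone[:3], OTHER_PARTITION)


def partition_and_sort(records):
    """records: (phone, up_flow, down_flow) tuples -> {partition: sorted records}."""
    buckets = {}
    for rec in records:
        buckets.setdefault(get_partition(rec[0]), []).append(rec)
    # Each partition corresponds to one reducer / one output file; within it,
    # order by upstream traffic, largest first.
    return {p: sorted(recs, key=lambda r: r[1], reverse=True)
            for p, recs in sorted(buckets.items())}
```

Each key in the result corresponds to one part-r-0000N output file when the job runs with five reduce tasks.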
Hadoop Learning Notes: Spark for TopN (Python)
Implementing TopN with Spark
Experiment requirements
Data preparation
Expected results
Related Classes and Operators
findspark
pyspark:
SparkContext:
parallelize(c, numSlices=None)
collect()
textFile(name, minPartitions=None, use_unicode=True)
map(f, preservesPartitioning=False)
cache( ...
Added by jdiver on Sun, 07 Jun 2020 05:11:10 +0300
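The TopN logic from the Spark entry above fits in a few lines of plain Python, so it can be checked without a cluster. The "key,value" line format is a hypothetical assumption because the entry's data preparation section is truncated. heapq.nlargest keeps only n items in a heap instead of sorting everything; on an RDD, rdd.map(parse).top(n, key=lambda kv: kv[1]) plays the same role.

```python
import heapq


def parse(line):
    """Parse a hypothetical "key,value" line into (key, int value)."""
    key, value = line.strip().split(",")
    return key, int(value)


def top_n(lines, n):
    """Return the n records with the largest values, largest first."""
    records = (parse(line) for line in lines if line.strip())  # skip blank lines
    return heapq.nlargest(n, records, key=lambda kv: kv[1])
```

For instance, top_n(["a,3", "b,10", "c,7"], 2) keeps only ("b", 10) and ("c", 7).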
CDH 6.3.2: integrating Phoenix with Kerberos enabled
Tags (space delimited): building large data platforms
1. Download and install Phoenix parcel
2. Install CSD files
3. Add the Phoenix service in Cloudera Manager (assuming the HBase service is already installed)
4. Configure HBase for Phoenix
5. Verify the Phoenix installation and run a smoke test
6. Data import validation test
7. Integration of Phoenix schema with ...
Added by lszanto on Thu, 04 Jun 2020 19:45:25 +0300
HDFS - operation and maintenance
1. Add a node
Operating system configuration: ① host name, network, firewall, SSH configuration
ssh-keygen -t rsa
At the same time, the SSH authorized_keys file of any node in the cluster can be distributed to the new node
Add the hostname mapping of this node to the /etc/hosts file on all nodes
Copy the NameNode's configuration files to ...
Added by Rayman3.tk on Wed, 06 May 2020 02:22:05 +0300
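The /etc/hosts step in the node-addition entry above has to be repeated on every node, so it helps if it is idempotent. The sketch below works on the file's text rather than the real file. The IP address and hostname in the usage note are hypothetical, and it does not handle a hostname that is already mapped to a different IP.

```python
def add_host_mapping(hosts_text, ip, hostname):
    """Return hosts_text with "ip<TAB>hostname" appended unless it is already mapped."""
    lines = hosts_text.splitlines()
    for line in lines:
        fields = line.split("#", 1)[0].split()  # ignore comments
        if fields and fields[0] == ip and hostname in fields[1:]:
            return hosts_text  # mapping already present: leave the text unchanged
    lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines) + "\n"
```

Running add_host_mapping(text, "192.168.1.104", "hadoop04") twice gives the same result as running it once, so it is safe to apply on nodes that already have the entry.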
Why is the number of hard links 2 after Linux creates a directory
Run ll in a directory:
[hadoop@hadoop0 test]$ ll
total 8
drwxrwxr-x. 2 hadoop hadoop 4096 Jan 17 22:28 a
drwxrwxr-x. 2 hadoop hadoop 4096 Jan 17 22:28 b
-rw-rw-r--. 1 hadoop hadoop 0 Jan 17 22:28 c
The hard link count of directories a and b is 2, while that of file c is 1.
Why is that?
Discovery:
Under d ...
Added by Accurax on Sun, 03 May 2020 22:44:32 +0300
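The question in the hard-link entry above can be checked directly with os.stat. On traditional Unix filesystems such as ext4, a new directory has two links: its name in the parent directory and its own "." entry. Each subdirectory adds one more link through its ".." entry. A regular file starts with one. Not every filesystem follows this convention for directories, so treat the numbers as typical rather than guaranteed.

```python
import os
import tempfile


def link_counts():
    """Return (empty dir links, dir links after one subdir, regular file links)."""
    with tempfile.TemporaryDirectory() as root:
        d = os.path.join(root, "a")
        os.mkdir(d)
        f = os.path.join(root, "c")
        open(f, "w").close()  # create an empty regular file
        empty_dir = os.stat(d).st_nlink
        file_links = os.stat(f).st_nlink
        os.mkdir(os.path.join(d, "sub"))  # sub/.. is another link to a
        with_subdir = os.stat(d).st_nlink
    return empty_dir, with_subdir, file_links
```

On ext4 this typically returns (2, 3, 1), matching the ll output for a and c above.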
Optimizations of Spark SQL over Hive SQL (window functions)
Sometimes, a select statement contains multiple window functions whose window definitions (OVER clauses) may be the same or different.
For identical windows, there is no need to partition and sort the data again; they can be merged into a single Window operator.
For example, take the case in "The realization principle of window functions in Spark and Hive":
select i ...
Added by serverman on Tue, 07 Apr 2020 17:52:21 +0300
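The merging idea in the Spark SQL entry above amounts to grouping window expressions by their window definition, so that each distinct (PARTITION BY, ORDER BY) pair costs one partition-and-sort pass. The Python sketch below shows that grouping only. The tuple representation of a window expression is a hypothetical simplification, not Spark's internal classes.

```python
def group_by_window_spec(window_exprs):
    """Group (alias, function, partition_by, order_by) tuples by window definition.

    Each key of the result stands for one Window operator, i.e. one
    partition + sort of the data; all functions under a key share it.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for alias, func, partition_by, order_by in window_exprs:
        spec = (tuple(partition_by), tuple(order_by))
        groups.setdefault(spec, []).append((alias, func))
    return groups
```

With row_number() and rank() both over (partition by dept order by salary desc) and a sum(salary) over (partition by dept), the three expressions collapse into two groups, so only two Window operators are needed instead of three.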