MapReduce practical cases, MapTask operation mechanism, ReduceTask operation mechanism, MapReduce execution process, Hadoop data compression, implementation of the Join algorithm
MapReduce practical cases
Sorting upstream traffic in descending order
Partitioning by cell phone number
MapTask operation mechanism
Operation proces ...
Added by Fergal Andrews on Sat, 13 Jun 2020 09:39:07 +0300
Implementing TopN with Spark
Related Classes and Operators
textFile(*name*, *minPartitions=None*, *use_unicode=True*)
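The TopN idea named in the title can be sketched without a cluster. The following is a minimal, Spark-free illustration, assuming the data is already split into partitions (plain Python lists stand in for them here). The function names `top_n_per_partition`, `merge_top_n` and `top_n` are hypothetical helpers for this sketch, not part of the Spark API, and the sample numbers are made up.

```python
import heapq
from functools import reduce


def top_n_per_partition(partition, n):
    """Keep only the n largest values of one partition."""
    return heapq.nlargest(n, partition)


def merge_top_n(left, right, n):
    """Merge two partial results and keep the n largest values."""
    return heapq.nlargest(n, left + right)


def top_n(partitions, n):
    """Compute a global TopN from per-partition partial results."""
    partials = [top_n_per_partition(p, n) for p in partitions]
    return reduce(lambda a, b: merge_top_n(a, b, n), partials, [])


# Hypothetical sample data: three "partitions" of integers.
partitions = [[5, 1, 9], [7, 3], [8, 2, 6]]
print(top_n(partitions, 3))  # prints [9, 8, 7]
```

Trimming each partition to n values before merging keeps the amount of data that has to be combined small, which is the main point of computing TopN in two stages.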
Added by jdiver on Sun, 07 Jun 2020 05:11:10 +0300
1. Download and install Phoenix parcel
2. Install CSD files
3. Add Phoenix service in Cloudera Manager (provided HBase service is installed)
4. Configure HBase for Phoenix
5. Verify Phoenix installation and smoke test
6. Import Data Validation Test
7. Integration of Phoenix schema with ...
Added by lszanto on Thu, 04 Jun 2020 19:45:25 +0300
1, Add node
Operating system configuration: ① host name, network, firewall, ssh configuration
ssh-keygen -t rsa
At the same time, distribute the SSH authorized_keys file from any node in the cluster to the new node
Add the domain name mapping of this node to the /etc/hosts file on all nodes
Copy the configuration file of namenode to ...
Added by Rayman3.tk on Wed, 06 May 2020 02:22:05 +0300
Execute ll in a directory
[hadoop@hadoop0 test]$ ll
drwxrwxr-x. 2 hadoop hadoop 4096 Jan 17 22:28 a
drwxrwxr-x. 2 hadoop hadoop 4096 Jan 17 22:28 b
-rw-rw-r--. 1 hadoop hadoop 0 Jan 17 22:28 c
The hard link count of directory a and of directory b is 2, while that of file c is 1
Why is that?
Under d ...
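The link counts shown in the listing can be inspected programmatically. The sketch below is a minimal illustration, assuming a traditional Unix filesystem (such as ext4) where a directory is referenced both by its entry in the parent and by its own `.` entry, and where each subdirectory adds one more reference through `..`. Some filesystems report directory link counts differently, so the directory values are printed rather than asserted. The helper name `link_counts` and the file names are hypothetical.

```python
import os
import tempfile


def link_counts(base):
    """Create a directory 'a' and a file 'c' under base and report st_nlink values."""
    dir_a = os.path.join(base, "a")
    file_c = os.path.join(base, "c")
    os.mkdir(dir_a)
    open(file_c, "w").close()

    counts = {
        "a": os.stat(dir_a).st_nlink,  # typically 2: parent entry plus "."
        "c": os.stat(file_c).st_nlink,  # 1: a single name points at the file
    }

    # A subdirectory's ".." entry is one more reference to directory a.
    os.mkdir(os.path.join(dir_a, "sub"))
    counts["a_after_subdir"] = os.stat(dir_a).st_nlink

    # An explicit hard link gives the file a second name.
    os.link(file_c, os.path.join(base, "c_hardlink"))
    counts["c_after_link"] = os.stat(file_c).st_nlink
    return counts


with tempfile.TemporaryDirectory() as tmp:
    print(link_counts(tmp))
```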
Added by Accurax on Sun, 03 May 2020 22:44:32 +0300
Sometimes, a select statement contains multiple window functions whose window definitions (OVER clauses) may be the same or different.
For identical windows, there is no need to partition and sort the data again; they can be merged into a single Window operator.
For example, take the case from "The realization principle of window function in Spark and Hive":
select i ...
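The merging idea described above can be sketched as a simple grouping step. This is a toy model, not Spark's or Hive's planner code: each window function call is reduced to its PARTITION BY and ORDER BY columns, and calls with an identical definition are collected into one group, which stands in for one Window operator. The `WindowCall` type, the `group_by_window` helper and the column names are all hypothetical.

```python
from collections import OrderedDict
from typing import NamedTuple, Tuple


class WindowCall(NamedTuple):
    """One window function call with a simplified OVER clause."""
    function: str
    partition_by: Tuple[str, ...]
    order_by: Tuple[str, ...]


def group_by_window(calls):
    """Group calls whose window definitions are identical.

    Each resulting group needs only one partition-and-sort pass.
    """
    groups = OrderedDict()
    for call in calls:
        key = (call.partition_by, call.order_by)
        groups.setdefault(key, []).append(call.function)
    return groups


# Hypothetical query with three window functions and two distinct windows.
calls = [
    WindowCall("rank()", ("id",), ("ct",)),
    WindowCall("row_number()", ("id",), ("ct",)),
    WindowCall("sum(ct)", ("id",), ("dt",)),
]

for (partition_by, order_by), functions in group_by_window(calls).items():
    print(partition_by, order_by, functions)
```

In this example the first two calls share a window and fall into one group, while the third has a different ordering and gets a group of its own.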
Added by serverman on Tue, 07 Apr 2020 17:52:21 +0300
This Hadoop mechanism is only available in Hadoop 2.x. Its implementation depends on a distributed component: ZooKeeper.
Brief introduction to ZooKeeper
ZooKeeper mainly provides distributed coordination services. Main functions: 1. Provide storage and management of a small amount of data. 2. Provide monitoring function for d ...
Added by THEMADGEEK on Mon, 06 Apr 2020 12:21:44 +0300
1, Introduction to Oozie framework
2, Main functions of Oozie
3, Oozie internal analysis
4, Horizontal and vertical scalability of Oozie
5, The Action execution model of Oozie
1, Introduction to Oozie framework
Definition of Oozie: tamer
An open source framework based on workfl ...
Added by Dragonfly on Mon, 16 Mar 2020 07:46:10 +0200
Data backup of HBase
1.1 backup the table based on the class provided by HBase
Use the class provided by HBase to export the data of a table in HBase to HDFS, and then import it into a test HBase table.
(1) Export from an HBase table to HDFS
[hadoop@node01 shells]$ hbase org.apache.hadoop.hbase.mapreduce.Export myuser /hbase_data/myuser_bak
Added by ShashidharNP on Mon, 24 Feb 2020 13:26:23 +0200
Compiling Flink 1.9.0 reports that flink-fs-hadoop-shaded cannot be found
1. Flink source code download
git clone git@github.com:apache/flink.git
Then you can switch to different branches of the project and execute the following command to switch the code to release-1.9 branch:
git checkout release-1.9 ...
Added by Sekka on Sat, 22 Feb 2020 16:58:52 +0200