MapReduce practical case, MapTask operation mechanism, ReduceTask operation mechanism, MapReduce execution process, Hadoop data compression, implementation of the Join algorithm

MapReduce practical case: sorting in reverse order of upstream traffic; partitioning by cell phone number. MapTask operation mechanism: operation proces ...

Added by Fergal Andrews on Sat, 13 Jun 2020 09:39:07 +0300

Hadoop Learning Notes: Spark for TopN (Python)

Spark implements TopN: experiment requirements, data preparation, expected results, related classes and operators: findspark, pyspark, SparkContext: parallelize(*c*, *numSlices=None*), collect(), textFile(*name*, *minPartitions=None*, *use_unicode=True*), map(*f*, *preservesPartitioning=False*), cache( ...
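The PySpark operators listed above boil down to a classic TopN over (key, count) pairs. A minimal pure-Python sketch of the same idea, without a Spark cluster (the sample data and the `top_n` helper are hypothetical, not from the article):

```python
import heapq

# Hypothetical (word, count) pairs standing in for the article's input data.
pairs = [("spark", 8), ("hadoop", 5), ("hdfs", 3), ("hive", 9), ("flink", 2)]

def top_n(pairs, n):
    """Return the n pairs with the largest counts, mirroring in spirit what
    rdd.top(n, key=...) or rdd.takeOrdered(...) would do in PySpark."""
    return heapq.nlargest(n, pairs, key=lambda kv: kv[1])

print(top_n(pairs, 3))  # [('hive', 9), ('spark', 8), ('hadoop', 5)]
```

In PySpark the same shape appears as `sc.parallelize(pairs).top(3, key=lambda kv: kv[1])`; the heap-based approach avoids fully sorting the data when only the top few elements are needed.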

Added by jdiver on Sun, 07 Jun 2020 05:11:10 +0300

CDH 6.3.2 with Kerberos enabled: integrating Phoenix

Tags (space delimited): building large data platforms. 1. Download and install the Phoenix parcel 2. Install CSD files 3. Add the Phoenix service in Cloudera Manager (provided the HBase service is installed) 4. Configure HBase for Phoenix 5. Verify the Phoenix installation and smoke test 6. Import data validation test 7. Integration of Phoenix schema with ...

Added by lszanto on Thu, 04 Jun 2020 19:45:25 +0300

HDFS - operation and maintenance

1. Add a node. Operating system configuration: ① host name, network, firewall, ssh configuration (ssh-keygen -t rsa). The ssh auth*-keys file of any node in the cluster can be distributed to the new node. Add the domain name mapping of this node in the /etc/hosts file of all nodes. Copy the configuration file of the namenode to ...

Added by Rayman3.tk on Wed, 06 May 2020 02:22:05 +0300

Why is a directory's hard link count 2 right after it is created on Linux?

Execute ll in a directory: [hadoop@hadoop0 test]$ ll total 8 drwxrwxr-x. 2 hadoop hadoop 4096 Jan 17 22:28 a drwxrwxr-x. 2 hadoop hadoop 4096 Jan 17 22:28 b -rw-rw-r--. 1 hadoop hadoop 0 Jan 17 22:28 c It is found that the hard link count of directories a and b is 2, while that of file c is 1. Why is that? Discovery: under d ...
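The link counts in the listing above can be reproduced programmatically. A small sketch (the paths are throwaway temp names): a plain file starts at 1 and a hard link raises it to 2, while a fresh directory is referenced both by its name in the parent and by its own `.` entry, which is why ext4-style filesystems report 2 for it (some filesystems, e.g. btrfs, count directories differently):

```python
import os
import tempfile

tmp = tempfile.mkdtemp()

# A regular file starts with a single directory entry -> link count 1.
f = os.path.join(tmp, "c")
open(f, "w").close()
file_links_before = os.stat(f).st_nlink      # 1

# os.link adds a second name for the same inode -> link count 2.
os.link(f, os.path.join(tmp, "c2"))
file_links_after = os.stat(f).st_nlink       # 2

# A new directory is referenced by its name in the parent AND by its own
# "." entry, so ext4-style filesystems report 2 here (may vary by FS).
d = os.path.join(tmp, "a")
os.mkdir(d)
dir_links = os.stat(d).st_nlink

print(file_links_before, file_links_after, dir_links)
```

Each subdirectory later created inside `a` would add one more link to it, via that subdirectory's `..` entry.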

Added by Accurax on Sun, 03 May 2020 22:44:32 +0300

Window-function optimizations in Spark SQL compared with Hive SQL

Sometimes a select statement contains multiple window functions whose window definitions (OVER clauses) may be the same or different. For identical windows there is no need to partition and sort again: we can merge them into a single Window operator. For example, the implementation principle of window functions in Spark and Hive. Case: select i ...
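The merging idea in this excerpt can be illustrated without Spark. A hedged pure-Python sketch (the sample rows are invented): two window functions, `SUM(v) OVER (PARTITION BY k)` and `AVG(v) OVER (PARTITION BY k)`, share one window definition, so the data is partitioned and sorted only once and both functions are evaluated over the same frame — which is exactly what collapsing them into a single Window operator buys:

```python
from itertools import groupby

# Hypothetical rows: (k, v) tuples standing in for a table with columns k, v.
rows = [("b", 4), ("a", 1), ("b", 2), ("a", 3)]

# Partition/sort ONCE by k, then evaluate both window functions per partition.
out = []
for k, grp in groupby(sorted(rows), key=lambda r: r[0]):
    vals = [v for _, v in grp]
    total, avg = sum(vals), sum(vals) / len(vals)
    # Every input row keeps its value and gains both window results.
    out.extend((k, v, total, avg) for v in vals)

print(out)
```

Had the two OVER clauses differed (say, a different PARTITION BY), the partitioning step could not be shared and each window would need its own shuffle and sort.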

Added by serverman on Tue, 07 Apr 2020 17:52:21 +0300

Hadoop 8-day course, day 5: the HA mechanism of Hadoop

The HA mechanism of Hadoop is only available in Hadoop 2.x; the implementation of this feature depends on a distributed component: ZooKeeper. Brief introduction to ZooKeeper: ZooKeeper mainly provides distributed coordination services. Main functions: 1. Provide storage and management of a small amount of data. 2. Provide monitoring function for d ...

Added by THEMADGEEK on Mon, 06 Apr 2020 12:21:44 +0300

[Oozie] Introduction to Oozie architecture and operation model

Article directory: 1. Introduction to the Oozie framework 2. Main functions of Oozie 3. Oozie internal analysis 4. Horizontal and vertical scalability of Oozie 5. The Action execution model of Oozie. 1. Introduction to the Oozie framework. Definition of Oozie: tamer. An open source framework based on workfl ...

Added by Dragonfly on Mon, 16 Mar 2020 07:46:10 +0200

HBase data backup case explained

Data backup of HBase. 1.1 Back up a table using the class provided by HBase: use the Export class shipped with HBase to export the data of an HBase table to HDFS, and then import it into a test HBase table. (1) Export from the HBase table to HDFS: [hadoop@node01 shells]$ hbase org.apache.hadoop.hbase.mapreduce.Export myuser /hbase_data/myuser_bak (2 ...

Added by ShashidharNP on Mon, 24 Feb 2020 13:26:23 +0200

Compiling Flink 1.9.0 reports that flink-fs-hadoop-shaded cannot be found

Compiling Flink 1.9.0 reports that flink-fs-hadoop-shaded cannot be found. 1. Flink source code download: git clone git@github.com:apache/flink.git Then you can switch to a different branch of the project; execute the following command to switch the code to the release-1.9 branch: git checkout release-1.9 ...

Added by Sekka on Sat, 22 Feb 2020 16:58:52 +0200