Distributed parallel computing experiment: WordCount

Test WordCount on a Hadoop cluster. Goal: build a Hadoop development environment using Eclipse + Maven, then compile and run the official WordCount source code. Create a Hadoop project. Create a Maven project: before creating the project, configure Maven first, at least changing the mirror ...
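The official WordCount is a Java MapReduce job; the following is only a minimal in-process Python sketch of the same map and reduce steps, to show what the job computes. The function names and the sample input are hypothetical, and nothing here uses the Hadoop API.

```python
from collections import Counter
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token in a line."""
    return [(word, 1) for word in line.split()]


def reduce_phase(pairs):
    """Sum the counts for each word, as a WordCount reducer would."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map and reduce phases in-process over an iterable of lines."""
    return reduce_phase(chain.from_iterable(map_phase(line) for line in lines))


if __name__ == "__main__":
    sample = ["hello hadoop", "hello world"]  # hypothetical input
    print(word_count(sample))
```

On a real cluster the map outputs are shuffled by key between nodes before reduction; here the shuffle is implicit in the single Counter.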

Added by Bad HAL 9000 on Mon, 20 Sep 2021 19:14:38 +0300

Hive DML data operation (data import and export)

I. Data import 1. Load data into a table (load) 1.1 Syntax: hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,...)]; (1) load data: loads data into the table. (2) local: loads data from the local filesystem into the Hive table; otherwise, the data is loaded from HDFS. (3) inp ...

Added by infyportalgroup on Sat, 18 Sep 2021 06:08:19 +0300

Hadoop pseudo-distributed installation process

1. Create a virtual machine first. Size it generously, or it will be no fun to use: my setting is 100 GB, with memory and threads configured to suit my own computer. 2. Then install the JDK and configure it. A shell script written earlier can be used directly, or you can configure it yourself. 3. Download the hadoop installat ...

Added by chris_2001 on Thu, 09 Sep 2021 19:41:34 +0300

What exactly is the SecondaryNameNode for in HDFS

0. Preface. What does the HDFS Secondary NameNode do? This is a classic basic interview question; I have asked it of interviewees many times (and, of course, been asked it many times). From what I recall, about half of the interviewees cannot answer correctly, and some even answer "isn't it just the NameNode's hot standby?". In order to save spa ...

Added by samusk on Thu, 09 Sep 2021 19:31:11 +0300

MapReduce practical cases, MapTask operation mechanism, ReduceTask operation mechanism, MapReduce execution process, Hadoop data compression, implementation of the Join algorithm

MapReduce practical cases: sorting upstream traffic in descending order; partitioning by phone number. MapTask operation mechanism: Operation proces ...

Added by Fergal Andrews on Sat, 13 Jun 2020 09:39:07 +0300

Hadoop Learning Notes: Spark for TopN (Python)

Spark implements TopN. Experiment requirements; Data preparation; Expected results; Related classes and operators: findspark; pyspark: SparkContext: parallelize(c, numSlices=None), collect(), textFile(name, minPartitions=None, use_unicode=True), map(f, preservesPartitioning=False), cache( ...
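The post's own PySpark code is not shown in this excerpt. As a rough illustration of how a distributed TopN can work, the sketch below takes the top n of each partition locally and then merges those candidates. It is plain Python with the standard heapq module, not the Spark API, and the function name and sample partitions are hypothetical.

```python
import heapq


def top_n_partitioned(partitions, n, key=None):
    """Return the n largest records across all partitions, largest first.

    Each partition is reduced to its own top n before merging, so at most
    n records per partition need to be combined in the final step.
    """
    local_tops = [heapq.nlargest(n, part, key=key) for part in partitions]
    candidates = [rec for part in local_tops for rec in part]
    return heapq.nlargest(n, candidates, key=key)


if __name__ == "__main__":
    # Hypothetical data, already split into three partitions.
    parts = [[5, 1, 9], [7, 3], [8, 2, 6]]
    print(top_n_partitioned(parts, 3))
```

The local step is safe because any record in the global top n must also be in the top n of its own partition.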

Added by jdiver on Sun, 07 Jun 2020 05:11:10 +0300

CDH 6.3.2 with Kerberos enabled: integrating Phoenix

Tags (space delimited): building big data platforms 1. Download and install the Phoenix parcel 2. Install CSD files 3. Add the Phoenix service in Cloudera Manager (provided the HBase service is installed) 4. Configure HBase for Phoenix 5. Verify the Phoenix installation and run a smoke test 6. Import data validation test 7. Integration of Phoenix schema with ...

Added by lszanto on Thu, 04 Jun 2020 19:45:25 +0300

HDFS - operation and maintenance

I. Add a node. Operating system configuration: ① host name, network, firewall, and ssh configuration: ssh-keygen -t rsa. The ssh auth*-keys file of any node in the cluster can then be distributed to the new node. Add the domain name mapping of this node to the /etc/hosts file of all nodes. Copy the configuration file of namenode to ...

Added by Rayman3.tk on Wed, 06 May 2020 02:22:05 +0300

Why is the hard link count 2 after Linux creates a directory

Execute ll in a directory:
[hadoop@hadoop0 test]$ ll
total 8
drwxrwxr-x. 2 hadoop hadoop 4096 Jan 17 22:28 a
drwxrwxr-x. 2 hadoop hadoop 4096 Jan 17 22:28 b
-rw-rw-r--. 1 hadoop hadoop 0 Jan 17 22:28 c
Directories a and b each have a hard link count of 2, while file c has a count of 1. Why is that? Discovery: Under d ...
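The same observation can be reproduced without ll. The sketch below, which assumes a traditional Linux filesystem such as ext4, reads st_nlink through os.stat for a new directory, a new file, and the directory again after a subdirectory is added. On such filesystems a directory is referenced by its entry in the parent and by its own "." entry, and each subdirectory's ".." adds one more; some filesystems report directory link counts differently, so no fixed output is claimed. The function name and the paths are hypothetical.

```python
import os
import tempfile


def link_counts():
    """Create a directory, a file and a subdirectory; report their link counts."""
    with tempfile.TemporaryDirectory() as root:
        dir_path = os.path.join(root, "a")
        file_path = os.path.join(root, "c")
        os.mkdir(dir_path)
        open(file_path, "w").close()  # empty regular file

        counts = {
            "new_dir": os.stat(dir_path).st_nlink,
            "new_file": os.stat(file_path).st_nlink,
        }
        # A subdirectory's ".." entry is another link to its parent.
        os.mkdir(os.path.join(dir_path, "sub"))
        counts["dir_after_subdir"] = os.stat(dir_path).st_nlink
        return counts


if __name__ == "__main__":
    print(link_counts())
```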

Added by Accurax on Sun, 03 May 2020 22:44:32 +0300

Optimizations of Spark SQL over Hive SQL (window functions)

Sometimes a select statement contains multiple window functions whose window definitions (OVER clauses) may be the same or different. For identical windows, there is no need to partition and sort again; they can be merged into a single Window operator. For example: how window functions are implemented in Spark and Hive. Case: select i ...
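The merging idea can be illustrated by grouping window functions on their window definition. This is a plain Python sketch under assumed inputs, not Spark's optimizer code: the function name, the tuple layout (function, partition columns, order columns), and the sample expressions are all hypothetical.

```python
from collections import defaultdict


def group_by_window(window_exprs):
    """Group window function names by their (partition_by, order_by) spec.

    Functions in the same group share one window definition, so a single
    partition-and-sort pass can serve all of them.
    """
    groups = defaultdict(list)
    for func, partition_by, order_by in window_exprs:
        groups[(tuple(partition_by), tuple(order_by))].append(func)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical window expressions from one select statement.
    exprs = [
        ("rank", ["dept"], ["salary"]),
        ("row_number", ["dept"], ["salary"]),
        ("sum", ["city"], ["ts"]),
    ]
    for spec, funcs in group_by_window(exprs).items():
        print(spec, funcs)
```

Here three window functions collapse into two groups, so only two partition-and-sort passes would be needed instead of three.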

Added by serverman on Tue, 07 Apr 2020 17:52:21 +0300