Sogou log query analysis (MapReduce + Hive + IDEA comprehensive experiment)
Prerequisites:
Install Hadoop 2.7.3 (on Linux)
Install MySQL (on Windows or Linux)
Install Hive (on Linux); reference: Hive installation configuration
Task:
Download search data from Sogou Labs for analysis
The downloaded data contains 6 fields, and the data format is described as follows:
Access time, user ID, ...
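A record with this layout could be validated before analysis roughly as follows. This is a minimal Java sketch, not the article's code: the class name `SogouLogLineSketch`, the tab delimiter, and the sample record are assumptions made for illustration. Only the field count of 6 and the first two field names come from the description above.

```java
import java.util.Optional;

final class SogouLogLineSketch {
    // The description above states that each record has 6 fields.
    static final int EXPECTED_FIELDS = 6;

    // Assumption: fields are tab-separated; the excerpt does not state the delimiter.
    static Optional<String[]> parse(String line) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        return fields.length == EXPECTED_FIELDS ? Optional.of(fields) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up record: only positions 0 (access time) and 1 (user ID) are named in the text.
        String sample = "00:00:00\tuser-001\tf3\tf4\tf5\tf6";
        parse(sample).ifPresent(f ->
            System.out.println("access time=" + f[0] + ", user ID=" + f[1]));
        // A record with the wrong number of fields is rejected instead of being analyzed.
        System.out.println("malformed record accepted: " + parse("too\tfew").isPresent());
    }
}
```

Dropping malformed records at parse time keeps later MapReduce or Hive steps from failing on a missing column.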
Added by cheekychop on Sun, 12 Dec 2021 13:58:21 +0200
Fully distributed Hadoop (ZooKeeper not configured)
I. Prepare resources (items 1-4 are free downloads; you can also obtain them yourself)
Operating system (CentOS 7)
(1) The desktop version is used as the master: Baidu Netdisk link: Click download, extraction code: wz4z
(2) The version without a desktop is used as a slave: Baidu Netdisk link: Click download, extraction code: gjyf
Hadoop-2.9.2 ...
Added by jayarsee on Fri, 10 Dec 2021 13:59:11 +0200
[Mastering Spark series] Is getting started the hardest part? This article makes it easy to get started with Spark
Author: "big data Zen"
Introduction: This article is part of a series on Spark. The column covers Spark from the basics to advanced topics, including an introduction to Spark, cluster setup, core components, RDDs, the use of operators, underlying principles, SparkCore, SparkSQL, SparkStreaming, etc. S ...
Added by stringman on Sun, 05 Dec 2021 18:32:19 +0200
MapReduce program 3 in a Maven project: computing the total salary of employees in each department (optimized)
This article builds on the earlier implementation that computes the total salary of employees in each department. If you have not implemented it yet, see: Realize the function of counting the total salary of employees in each department
Optimizations:
1. Use serialization
2. Implement partitioning
3. Use a Combiner on the map side
...
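The partitioning and Combiner ideas listed above can be illustrated without a cluster. The following is a plain-Java sketch under stated assumptions, not the article's Hadoop code: the class `SalaryCombinerSketch`, its method names, and the sample departments are hypothetical. The partition formula uses the same arithmetic as Hadoop's default `HashPartitioner`, applied here to `String.hashCode()` as a stand-in for a Hadoop key type.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SalaryCombinerSketch {

    // Partitioning: clear the sign bit of the hash, then take the remainder
    // by the number of reducers, so every key maps to a valid reducer index.
    static int partitionFor(String department, int numReducers) {
        return (department.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Combiner step: sum salaries per department on the map side so that
    // only one partial sum per department is sent on to the reducers.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> mapOutput) {
        Map<String, Long> partialSums = new TreeMap<>();
        for (Map.Entry<String, Long> record : mapOutput) {
            partialSums.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return partialSums;
    }

    public static void main(String[] args) {
        // Made-up (department, salary) pairs standing in for mapper output.
        List<Map.Entry<String, Long>> mapOutput = List.of(
            Map.entry("sales", 3000L),
            Map.entry("dev", 5000L),
            Map.entry("sales", 2500L));
        for (Map.Entry<String, Long> e : combine(mapOutput).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()
                + " (reducer " + partitionFor(e.getKey(), 2) + ")");
        }
    }
}
```

Summation is associative and commutative, which is why a sum can safely be pre-aggregated by a Combiner; an average, by contrast, could not be combined this way without also carrying counts.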
Added by baw on Sun, 05 Dec 2021 11:00:33 +0200
Scala process control
1. if else
1.1 single branch
Syntax structure:
if (expr) {
  // statements executed when expr is true
}
1.2 double branch
Syntax structure:
if (expr) {
  // statements executed when expr is true
} else {
  // statements executed when expr is false
}
1.3 multi branch
Syntax structure:
if (expr1) {
  // statements executed when expr1 is true
} ...
Added by famous58 on Fri, 03 Dec 2021 01:37:31 +0200
ZooKeeper command line client
Start client
Start the local ZooKeeper client: ./zkCli.sh
[root@node-02 bin]# ./zkCli.sh
Connecting to localhost:2181 # 2181 is the client listening port
...
[zk: localhost:2181(CONNECTED) 0]
Start a remote ZooKeeper client: ./zkCli.sh -server ip:port
[root@node-01 bin]# ./zkCli.sh -server node-02:21 ...
Added by lookee on Thu, 02 Dec 2021 23:55:17 +0200
Write HDFS data to ES via Map/Reduce, and ES data to HDFS
Environment preparation
System: CentOS 7
Java 1.8
Hadoop 2.7
ES 7.15.2 (for installing the stand-alone version of ES, refer to: https://blog.csdn.net/weixin_36340771/article/details/121389741)
Prepare a local Hadoop runtime environment
Get Hadoop files
Link: https://pan.baidu.com/s/1MGriraZ8ekvzsJyWdPssrw Extraction code: u4uc
Configure H ...
Added by groovything on Wed, 01 Dec 2021 21:01:13 +0200
Big data: platform building (hadoop+spark+zeppelin)
Zeppelin is an open source Apache incubator project. It is a web-based notebook tool that supports interactive data analysis. By plugging in various interpreters, users can run interactive queries in a specific language or data-processing back end and quickly visualize data. Zeppelin: query and analyze data and genera ...
Added by IanM on Fri, 26 Nov 2021 21:42:08 +0200
Hadoop deployment and configuration
Hadoop download address
https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/
I. Hadoop installation
1. Upload hadoop-3.1.3.tar.gz to the /opt/software directory on Linux
hadoop-3.1.3.tar.gz
2. Extract hadoop-3.1.3.tar.gz to /opt/server/
[linux@node1 software]$ tar -zxvf hadoop-3.1.3.tar.gz -C /opt/server/
3. Modify /etc/profile. ...
Added by the7soft.com on Fri, 26 Nov 2021 13:00:49 +0200
[Introduction to Cloud Computing Experiment 3] MapReduce programming
Prerequisite environment
You need to set up a pseudo-distributed Hadoop cluster; see the tutorial: Quick Start Tutorial for Hadoop Big Data Technology and Pseudo-Distributed Clustering
Eclipse Environment Configuration
Eclipse (local Windows system)
1. Install plug-ins:
hadoop-eclipse-plugin-2.7.3.jar
Address: https:// ...
Added by bc2013 on Thu, 25 Nov 2021 20:05:01 +0200