Sogou query log analysis (MapReduce + Hive + IDEA comprehensive experiment)

Prerequisites: install Hadoop 2.7.3 (on Linux), MySQL (on Windows or Linux), and Hive (on Linux); for reference, see: Hive installation configuration. Task: download search data from Sogou Lab for analysis. The downloaded data contains 6 fields, and the data format is described as follows: access time, user ID ...

Added by cheekychop on Sun, 12 Dec 2021 13:58:21 +0200

Fully distributed Hadoop (ZooKeeper not configured)

I. Prepare resources (items 1 to 4 are free downloads, or you can download them yourself). Operating system (CentOS 7): (1) the desktop version is used as the master: Baidu Netdisk link: Click download, extraction code: wz4z; (2) the version without a desktop is used as a slave: Baidu Netdisk link: Click download, extraction code: gjyf. Hadoop-2.9.2 ...

Added by jayarsee on Fri, 10 Dec 2021 13:59:11 +0200

[Proficient in Spark series] Is getting started always the hardest part? This article makes it easy to get started with Spark

Author: "big data Zen". Introduction: This article is part of a series on Spark. The column will cover Spark from the basics to advanced topics, including an introduction to Spark, cluster setup, core components, RDDs, the use of operators, underlying principles, SparkCore, SparkSQL, SparkStreaming, etc., S ...

Added by stringman on Sun, 05 Dec 2021 18:32:19 +0200

MapReduce program 3 in a Maven project: computing the total salary of employees in each department (optimization)

This article builds on an earlier implementation that computes the total salary of employees in each department. If you have not implemented it yet, please refer to: Realize the function of counting the total salary of employees in each department. Optimization items: 1. Use serialization 2. Implement a partitioner 3. Use a Combiner on the map side ...
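As a rough illustration of optimization items 2 and 3 in the entry above, the following plain-Java sketch simulates a map step, a map-side combiner, a partitioner, and a reduce step for per-department salary totals. It does not use the Hadoop API. The record layout `name,deptNo,salary`, the class name `DeptSalarySketch`, and the modulo partitioning rule are assumptions made for illustration and are not taken from the original article. Serialization (item 1) is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DeptSalarySketch {

    // Map step: parse one "name,deptNo,salary" line (assumed layout) into {deptNo, salary}.
    static int[] map(String line) {
        String[] fields = line.split(",");
        return new int[] {
            Integer.parseInt(fields[1].trim()),
            Integer.parseInt(fields[2].trim())
        };
    }

    // Partition step: pick the reducer index for a department key (assumed modulo rule).
    static int partition(int deptNo, int numReducers) {
        return Math.floorMod(deptNo, numReducers);
    }

    // Sum salaries per department. Addition is associative and commutative,
    // so the same logic can serve as both the combiner and the reducer.
    static Map<Integer, Long> sumByDept(List<int[]> pairs) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (int[] pair : pairs) {
            totals.merge(pair[0], (long) pair[1], Long::sum);
        }
        return totals;
    }

    // Run the simulated job: each inner list is one input split handled by one mapper.
    // Returns one totals map per reducer.
    static List<Map<Integer, Long>> run(List<List<String>> splits, int numReducers) {
        List<Map<Integer, Long>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new TreeMap<>());
        }
        for (List<String> split : splits) {
            List<int[]> mapped = new ArrayList<>();
            for (String line : split) {
                mapped.add(map(line));
            }
            // Combiner: pre-aggregate on the map side so fewer pairs are shuffled.
            Map<Integer, Long> combined = sumByDept(mapped);
            // Shuffle and reduce: route each partial sum to its reducer and add it in.
            for (Map.Entry<Integer, Long> e : combined.entrySet()) {
                int target = partition(e.getKey(), numReducers);
                reducers.get(target).merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("alice,10,3000", "bob,20,2500"),
            Arrays.asList("carol,10,4000", "dave,30,1800"));
        System.out.println(run(splits, 3));
    }
}
```

In a real Hadoop job these roles would typically be filled by classes extending `Mapper`, `Reducer`, and `Partitioner`, with the combiner registered on the job; the sketch only shows why a sum can be combined early without changing the final totals.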

Added by baw on Sun, 05 Dec 2021 11:00:33 +0200

Scala process control

1. if else 1.1 Single branch. Syntax: if (expr) { statements executed when expr is true } 1.2 Double branch. Syntax: if (expr) { statements executed when expr is true } else { statements executed when expr is false } 1.3 Multiple branches. Syntax: if (expr1) { statements executed when expr1 is true } ...

Added by famous58 on Fri, 03 Dec 2021 01:37:31 +0200

ZooKeeper command line client

Start the client. Start the local ZooKeeper client: ./zkCli.sh [root@node-02 bin]# ./zkCli.sh Connecting to localhost:2181 # 2181 is the client listening port ... [zk: localhost:2181(CONNECTED) 0] Start a remote ZooKeeper client: ./zkCli.sh -server ip:port [root@node-01 bin]# ./zkCli.sh -server node-02:21 ...

Added by lookee on Thu, 02 Dec 2021 23:55:17 +0200

Write HDFS data to ES through Map/Reduce, and ES data to HDFS

Environment preparation. System: CentOS 7, Java 1.8, Hadoop 2.7, ES 7.15.2 (for installing the stand-alone version of ES, refer to: https://blog.csdn.net/weixin_36340771/article/details/121389741 ). Prepare the Hadoop local running environment. Get the Hadoop files. Link: https://pan.baidu.com/s/1MGriraZ8ekvzsJyWdPssrw Extraction code: u4uc. Configure H ...

Added by groovything on Wed, 01 Dec 2021 21:01:13 +0200

Big data: platform building (hadoop+spark+zeppelin)

Zeppelin is an open source Apache incubation project. It is a web-based notebook tool that supports interactive data analysis. Through pluggable interpreters, users can run interactive queries in a specific language or against a data processing back end, and quickly visualize the results. Zeppelin: query and analyze data and genera ...

Added by IanM on Fri, 26 Nov 2021 21:42:08 +0200

Hadoop deployment and configuration

Hadoop download address: https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/ I. Hadoop installation 1. Upload hadoop-3.1.3.tar.gz to the /opt/software directory on Linux 2. Unzip hadoop-3.1.3.tar.gz to /opt/server/ [linux@node1 software]$ tar -zxvf hadoop-3.1.3.tar.gz -C /opt/server/ 3. Modify /etc/profile. ...

Added by the7soft.com on Fri, 26 Nov 2021 13:00:49 +0200

[Introduction to Cloud Computing Experiment 3] MapReduce programming

Prerequisites: You need a Hadoop pseudo-distributed cluster; to set one up, see the tutorial: Quick Start Tutorial for Hadoop Big Data Technology and Pseudo-Distributed Clustering. Eclipse environment configuration: Eclipse (local Windows system) 1. Install the plug-in: hadoop-eclipse-plugin-2.7.3.jar Address: https:// ...

Added by bc2013 on Thu, 25 Nov 2021 20:05:01 +0200