Hadoop cluster construction
1, Hadoop cluster construction
1. Install the virtual machine
1. Install VMware Tools
hadoop@ubuntu:~$ sudo apt-get install open-vm-tools-desktop -y
2. Install the vim editor
hadoop@ubuntu:~$ sudo apt install vim
2. Install the JDK
1. Unzip the installation package
hadoop@ubuntu:~$ sudo tar -zxvf jdk-8u171-linux-x64.tar.gz -C /usr/local
2. Modify e ...
Added by nano on Wed, 05 Jan 2022 03:46:35 +0200
ES series tutorial 02: a one-day tour of Elasticsearch
This article was first published on the Geek Barracks official account. The best way to learn Elasticsearch (hereinafter referred to as ES) is to practice a lot. Throughout this series of tutorials, I will use a small "online bookstore" project in every chapter. The background of this project is very simple. Ea ...
Added by delphi123 on Wed, 05 Jan 2022 03:04:48 +0200
[Spark] RDD action operators
An action operator is a method that triggers job execution.
reduce
Function signature: def reduce(f: (T, T) => T): T
Function description: aggregates all elements of the RDD. Data is first aggregated within each partition, and then the partition results are aggregated across partitions.
@Test
def reduce(): Unit = {
val rdd = sc.makeRDD(List(1,2,3,4)) ...
Added by nascarjunky on Wed, 05 Jan 2022 02:37:28 +0200
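The partition-then-merge behaviour of `reduce` described above can be modelled with plain Scala collections. This is a minimal sketch, not Spark code: the helper name `partitionedReduce` and the split into two partitions via `grouped(2)` are hypothetical choices for illustration, whereas the real `RDD.reduce` runs step 1 on the executors and step 2 on the driver.

```scala
// Hypothetical helper modelling RDD.reduce on local collections (not Spark's API):
// step 1 reduces inside each partition, step 2 reduces the per-partition results.
def partitionedReduce[T](partitions: Seq[Seq[T]])(f: (T, T) => T): T = {
  // Step 1: aggregate within each non-empty partition
  val perPartition: Seq[T] = partitions.filter(_.nonEmpty).map(_.reduce(f))
  // Step 2: aggregate the partial results across partitions
  perPartition.reduce(f)
}

// Same data as the example above, split into two assumed partitions: [1, 2] and [3, 4]
val partitions: Seq[Seq[Int]] = List(1, 2, 3, 4).grouped(2).toSeq
val sum = partitionedReduce(partitions)(_ + _)
println(sum) // prints 10
```

Because the grouping of elements into partitions is not fixed, the function passed to Spark's `reduce` should be commutative and associative; `+` and `max` qualify, while something like `-` would give partition-dependent results.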
Hive: window functions
1, What is a window function
2, Classification of window functions
1, Window functions for cumulative calculation
1, sum() over()
At work you often need to calculate a cumulative value up to a given month. In that case, use sum() to open a window. For example, given a transaction table (_trade): Now it is necessary to calcul ...
Added by fpyontek on Tue, 04 Jan 2022 17:38:22 +0200
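The cumulative `sum() over()` idea described above amounts to a running total ordered by month. A minimal sketch in plain Scala, assuming hypothetical monthly totals (the months and amounts below are made-up illustration data, not the original table); in HiveQL the corresponding window would be roughly `sum(amount) over (order by month)`, where `amount` and `month` are assumed column names.

```scala
// Hypothetical monthly transaction totals (illustration data only)
val monthly: Seq[(String, Int)] = Seq(
  "2021-01" -> 100,
  "2021-02" -> 50,
  "2021-03" -> 70
)

// Running total ordered by month, mirroring
// sum(amount) over (order by month) when there is one row per month
def cumulative(rows: Seq[(String, Int)]): Seq[(String, Int)] = {
  val sorted = rows.sortBy(_._1)                         // ORDER BY month
  val totals = sorted.map(_._2).scanLeft(0)(_ + _).tail  // running sums, dropping the initial 0
  sorted.map(_._1).zip(totals)
}

cumulative(monthly).foreach { case (m, total) => println(s"$m $total") }
// prints 2021-01 100, 2021-02 150, 2021-03 220 (one per line)
```

With `order by` and no explicit frame, Hive's default window runs from the first row up to the current row, which is what the `scanLeft` running sum reproduces when each month appears once.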
ELK - log collection system
1. Which logs should be collected?
① System logs – for monitoring
② Service logs – e.g. database (MySQL): slow query log, error log, and general log
③ Business logs – log4j (business logs must be collected)
Note: log4j is the logging library Java classes use to write business logs.
(1) Collection should be targeted ...
Added by recset on Tue, 04 Jan 2022 03:26:51 +0200
First understand the three installation modes of Hadoop
Features: high reliability (data is not easily lost), high efficiency (fast processing), and high fault tolerance.
PS: Hadoop version used:
Hadoop 2.8.5 is used below. Although Hadoop has moved on to 3.x, we always adhere to the view of "using the old instead of th ...
Added by chaser7016 on Tue, 04 Jan 2022 03:14:13 +0200
Elasticsearch installation and syntax learning
1, Introduction
(Adapted from the official website)
Introduction to Elasticsearch: "You know, for search (and analysis)." Elasticsearch is the distributed search and analytics engine at the core of the Elastic Stack. Logstash and Beats help you collect, aggregate, and enrich your data and store it in Elasticsearch. With Kibana, you can interactively explore, ...
Added by Hitman2oo2 on Tue, 04 Jan 2022 01:31:00 +0200
Deploying microservices on the k8s platform
Microservices involved
The demo involves three microservices:
Service registration and discovery: Eureka Server
Management service: Admin Service
User service: User Service
The Admin Service and the User Service both register with Eureka. When the add-user API of the Admin Service is called, the Admin Service in turn calls the add-user API of the U ...
Added by jcubie on Tue, 04 Jan 2022 00:19:23 +0200
Spark introduction, deployment, principles, and development environment setup
Introduction to Spark
Spark is a fast, general-purpose, and scalable in-memory engine for big data analytics and computation.
It is a general-purpose in-memory parallel computing framework developed by the AMP Lab (Algorithms, Machines, and People Lab) at the U ...
Added by benreisner on Mon, 03 Jan 2022 22:14:19 +0200
Seaborn visualization 01: covers almost all usage
Seaborn visualization (I)
Matplotlib tries to make easy things easy and hard things possible, while Seaborn makes hard things easier. Seaborn is designed for statistical plotting; generally speaking, it can meet 90% of the plotting needs of data analysis. Seaborn is actually a higher-level API built on top of matplotlib, which make ...
Added by renno on Mon, 03 Jan 2022 19:26:41 +0200