Hadoop cluster construction

1. Hadoop cluster construction
  1. Installing virtual machines
    1. Install vmtools: hadoop@ubuntu:~$ sudo apt-get install open-vm-tools-desktop -y
    2. Install the vim editor: hadoop@ubuntu:~$ sudo apt install vim
  2. Install the JDK
    1. Unzip the installation package: hadoop@ubuntu:~$ sudo tar -zxvf jdk-8u171-linux-x64.tar.gz -C /usr/local
    2. Modify e ...

Added by nano on Wed, 05 Jan 2022 03:46:35 +0200

ES tutorial series 02: A one-day tour of Elasticsearch

This article was first published on the official account "Geek Barracks". The best way to learn Elasticsearch (hereinafter referred to as ES) is to practice. In this series of tutorials, I will use a small project, an "online bookstore", throughout each chapter. The background of this project is very simple. Ea ...

Added by delphi123 on Wed, 05 Jan 2022 03:04:48 +0200

[Spark] RDD action operators

An action operator is a method that triggers job execution. reduce: Function signature: def reduce(f: (T, T) => T): T Function description: aggregates all elements in the RDD, first aggregating data within each partition and then aggregating across partitions. @Test def reduce(): Unit = { val rdd = sc.makeRDD(List(1,2,3,4)) ...
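The two-stage aggregation described above can be sketched in plain Python, without Spark. This is only an illustration under assumptions: `partitioned_reduce` and its contiguous chunking are hypothetical choices made here, not part of the Spark API, and real partitioning depends on how the RDD was created.

```python
from functools import reduce


def partitioned_reduce(data, num_partitions, f):
    """Reduce within each partition first, then across the partial results.

    Hypothetical stand-in for the behaviour described above; it does not use Spark.
    """
    if not data:
        raise ValueError("cannot reduce an empty collection")
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # Split the data into contiguous chunks that play the role of partitions.
    size = -(-len(data) // num_partitions)  # ceiling division
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    # Stage 1: aggregate inside each partition.
    partials = [reduce(f, part) for part in partitions]
    # Stage 2: aggregate the per-partition results.
    return reduce(f, partials)


result = partitioned_reduce([1, 2, 3, 4], 2, lambda a, b: a + b)
print(result)  # 10
```

Because elements are combined in a different order depending on how they are partitioned, the function passed in should be associative and commutative, as addition is here.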

Added by nascarjunky on Wed, 05 Jan 2022 02:37:28 +0200

Hive: window functions

1. What is a window function 2. Window function classification 1. Cumulative calculation window functions 1. sum() over() At work you often need to calculate the cumulative value up to a certain month. In that case, use sum() as a window function. For example, given a transaction form_ trade: Now it is necessary to calcul ...
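The idea of a running total per month can be sketched in plain Python. The rows and the function name `cumulative_by_month` are hypothetical, made up for illustration, and the sketch assumes one row per month; it mirrors what a sum() over an ordered window computes rather than reproducing Hive itself.

```python
from itertools import accumulate

# Hypothetical (month, amount) rows, deliberately out of order.
rows = [("2021-03", 50), ("2021-01", 100), ("2021-02", 250)]


def cumulative_by_month(rows):
    """Return (month, amount, running_total) tuples ordered by month."""
    ordered = sorted(rows, key=lambda row: row[0])  # plays the role of ORDER BY month
    totals = accumulate(amount for _, amount in ordered)  # running sum
    return [(month, amount, total) for (month, amount), total in zip(ordered, totals)]


for line in cumulative_by_month(rows):
    print(line)
```

Each output row keeps its own amount alongside the total of all months up to and including it, which is what distinguishes a window function from a plain GROUP BY aggregate.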

Added by fpyontek on Tue, 04 Jan 2022 17:38:22 +0200

ELK - log collection system

1. What logs do you want to collect? ① System logs: in preparation for monitoring ② Service logs: database (MySQL) slow query log, error log and general log ③ Business logs: log4j (business logs must be collected) Note: log4j produces the business logs of Java applications (1) Collect in a targeted way ...

Added by recset on Tue, 04 Jan 2022 03:26:51 +0200

First understand the three installation modes of Hadoop

Features: high reliability (data is not lost), high efficiency (fast processing), high fault tolerance. PS: the Hadoop version used below is Hadoop 2.8.5. Although Hadoop has since been updated to 3.x, we always adhere to the view of "using the old instead of th ...

Added by chaser7016 on Tue, 04 Jan 2022 03:14:13 +0200

Elasticsearch installation and syntax learning

1. Introduction From the official website's introduction to Elasticsearch: "You know, for search (and analysis)". Elasticsearch is the core distributed search and analytics engine of the Elastic Stack. Logstash and Beats help collect, aggregate and enrich your data and store it in Elasticsearch. With Kibana, you can interactively explore, ...

Added by Hitman2oo2 on Tue, 04 Jan 2022 01:31:00 +0200

Microservice deployment on k8s platform

Microservices involved: the demo involves three microservices. Service registration and discovery: Eureka server. Management service: Admin service. User service: User service. The Admin service and the User service register with Eureka. When accessing the add user API of the Admin service, the Admin service will call the add user API of the U ...

Added by jcubie on Tue, 04 Jan 2022 00:19:23 +0200

Spark introduction, deployment, principles and development environment setup

Introduction to Spark: Spark is a fast, general-purpose and scalable in-memory engine for big data analysis and computation. It is a general in-memory parallel computing framework developed by the AMP Laboratory (Algorithms, Machines, and People Lab) at the U ...

Added by benreisner on Mon, 03 Jan 2022 22:14:19 +0200

Seaborn visualization 01: covers almost all usage

Seaborn visualization (I) Matplotlib tries to make simple things easier and difficult things possible, while Seaborn makes difficult things easier. Seaborn is built for statistical plotting. Generally speaking, Seaborn can meet 90% of the plotting needs of data analysis. Seaborn is actually a higher-level API wrapper based on matplotlib, which make ...

Added by renno on Mon, 03 Jan 2022 19:26:41 +0200