Big data series: ELK cluster environment deployment
This article introduces the deployment and configuration of the ELK components, and uses the system syslog as the source data input to test and verify that Elasticsearch receives the data and Kibana displays it.
1. Introduction to basic concepts and environment configuration
1.1 Basic concepts of ELK
ELK is an o ...
Added by k4pil on Fri, 04 Mar 2022 23:52:58 +0200
Flink_09_CEP (personal summary)
Statement: 1 *** 2. Because this is a personal summary, the article is written as concisely as possible. 3. If anything is mistaken or inappropriate, please point it out.
Introduction ...
Added by amarquis on Fri, 04 Mar 2022 21:29:32 +0200
Python project practice: analyze big data with PySpark
Big data, as its name implies, is a very large amount of data, generally at the PB (petabyte) level or above. A PB is a unit of data storage capacity equal to 2 to the 50th power bytes, that is, 1024 TB (roughly 1000 TB). These data are characterized by a wide variety, including video, voice, pictu ...
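As a quick check of the petabyte arithmetic above, here is a minimal Python sketch. It assumes the binary convention, in which each unit is 1024 times the previous one; the constant names are illustrative only.

```python
# Binary storage units: each step up is a factor of 2**10 = 1024.
BYTES_PER_TB = 2 ** 40
BYTES_PER_PB = 2 ** 50

# Number of terabytes in one petabyte under this convention.
tb_per_pb = BYTES_PER_PB // BYTES_PER_TB

print(BYTES_PER_PB)  # 1125899906842624
print(tb_per_pb)     # 1024
```

So "about 1000 TB" is a rounding of 1024 TB.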
Added by ztealmax on Fri, 04 Mar 2022 19:19:29 +0200
Introduction to the core concepts of Elasticsearch: ES cluster index shard management
In the previous chapter, we built the ES cluster. Interested readers can refer to "Introduction to the core concept of elasticSearch (XIII): docker building ES cluster". Here we introduce shard management for ES cluster indices.
ES cluster index shard management
Introduction
Shard: because ES is a distributed search engine, t ...
Added by semtex on Fri, 04 Mar 2022 18:13:56 +0200
Redis - redis persistence
Introduction to persistence
Redis is an in-memory database. If the database state in memory is not saved to disk, it is lost once the server process exits. For this reason, Redis provides persistence.
What is persistence? The mechanism of using permanent storage media to save data and recover the saved ...
Added by Sanjib Sinha on Fri, 04 Mar 2022 08:54:45 +0200
Common commands for practical operation
preparation
Start hadoop cluster
[amelia@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
-help: print the usage and parameters of a command
[amelia@hadoop102 hadoop-2.7.2]$ hadoop fs -help rm
Create the /sanguo folder
[amelia@hadoop102 hadoop-2.7.2]$ hadoop fs -mkdir /sanguo
Check whether the /sanguo folder exists in Hadoop
2. Upload
-moveFromLocal: cut and ...
Added by balkan7 on Thu, 03 Mar 2022 08:08:04 +0200
Passenger express logistics big data project: initialize the Spark streaming program
Contents
Initialize Spark streaming program
1. Spark SQL parameter tuning settings
1. Set the session time zone
2. Set the maximum number of bytes a single partition can hold when reading a file
3. Set the threshold for merging small files
4. Set the number of partitions to use when shuffling data for a join or aggregation
5. Set the maximum ...
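The settings listed above might map onto Spark SQL configuration keys roughly as follows. This is only a sketch: the excerpt does not name the keys, so the mapping is an assumption (in particular, treating the "threshold for merging small files" as `spark.sql.files.openCostInBytes`), the time zone value is a placeholder, and the helper `apply_settings` is hypothetical. The numeric values shown are Spark's documented defaults, not tuned recommendations.

```python
# Assumed mapping from the listed settings to Spark SQL config keys.
SPARK_SQL_SETTINGS = {
    # 1. Session time zone (placeholder value; choose your own).
    "spark.sql.session.timeZone": "Asia/Shanghai",
    # 2. Maximum bytes packed into a single partition when reading files (128 MB).
    "spark.sql.files.maxPartitionBytes": str(128 * 1024 * 1024),
    # 3. Estimated cost of opening a file, which influences how small files
    #    are packed together into one partition (4 MB). Assumed mapping.
    "spark.sql.files.openCostInBytes": str(4 * 1024 * 1024),
    # 4. Number of partitions used when shuffling for joins or aggregations.
    "spark.sql.shuffle.partitions": "200",
}


def apply_settings(builder, settings):
    """Apply each key/value pair to any object exposing .config(key, value).

    Hypothetical helper; SparkSession.builder is one such object.
    """
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


# Usage with PySpark installed (not executed here):
# from pyspark.sql import SparkSession
# spark = apply_settings(SparkSession.builder.appName("demo"),
#                        SPARK_SQL_SETTINGS).getOrCreate()
```

Keeping the settings in a plain dictionary makes them easy to review and reuse across jobs before any Spark session is created.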
Added by ratcateme on Wed, 02 Mar 2022 22:37:24 +0200
Hadoop environment configuration (Linux virtual machine)
This semester, I chose the course on big data management and analysis, which mainly uses the Hadoop framework for data analysis and application development. First, I will configure the environment.
Note
It is best to put the JDK and Hadoop under /usr/local. When adding environment ...
Added by stargate03 on Mon, 28 Feb 2022 13:06:21 +0200
Introduction and test of allowed lateness in Flink
By default, once the watermark passes the end of a window, any data for that window that arrives afterward is dropped.
To avoid dropping some of this late data, the concept of allowed lateness was introduced.
In short, allowed lateness applies to event time. After the watermark exceeds the end of window, it is also allowed to wait for a pe ...
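The drop rule described above can be illustrated with a small pure-Python model. This is a simplified sketch, not Flink code: the function name is hypothetical, timestamps are in milliseconds, and it assumes a window is considered expired once the watermark reaches the window's last timestamp (end minus one) plus the allowed lateness.

```python
def is_window_expired(window_end, watermark, allowed_lateness=0):
    """Return True if data for this window would be dropped on arrival.

    window_end is exclusive, so the last timestamp inside the window
    is window_end - 1. All values are in milliseconds.
    """
    max_timestamp = window_end - 1
    return max_timestamp + allowed_lateness <= watermark


# A window covering [0, 10000) and a watermark that has already passed its end.
window_end = 10_000
watermark = 10_500

# Default behavior: no allowed lateness, so late data is dropped.
print(is_window_expired(window_end, watermark))         # True

# With 2 seconds of allowed lateness, the window still accepts the data.
print(is_window_expired(window_end, watermark, 2_000))  # False

# Once the watermark moves past end + lateness, data is dropped again.
print(is_window_expired(window_end, 12_000, 2_000))     # True
```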
Added by vargadanis on Thu, 24 Feb 2022 13:31:24 +0200
Hadoop in simple terms -- getting started
Hadoop learning
1. Hadoop overview
Infrastructure of a distributed system. It mainly solves the problems of massive data storage and distributed computing.
1.1 Three major releases of Hadoop
The original Apache version was released in 2006. Cloudera integrates many big data frameworks internally, and the corresponding product is CDH releas ...
Added by birwin on Wed, 23 Feb 2022 18:37:40 +0200