Big data series: ELK cluster environment deployment

This article covers deploying and configuring the ELK components, then uses the system syslog as source input to verify that Elasticsearch receives the data and Kibana displays it. 1. Basic concepts and environment configuration 1.1 Basic concepts of ELK ELK is an o ...

Added by k4pil on Fri, 04 Mar 2022 23:52:58 +0200

Flink_09_CEP (personal summary)

Statement: 1. *** 2. Because this is a personal summary, the article is written as concisely as possible. 3. If anything is wrong or inappropriate, please point it out. Introduction ...

Added by amarquis on Fri, 04 Mar 2022 21:29:32 +0200

Python project practice: analyze big data with PySpark

Big data, as the name implies, is a very large amount of data, generally at the PB level or above. A PB is a unit of storage capacity equal to 2 to the 50th power bytes, that is, 1024 TB (roughly 1000 TB). Such data comes in a wide variety of forms, including video, voice, pictu ...

Added by ztealmax on Fri, 04 Mar 2022 19:19:29 +0200
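The unit arithmetic in the PySpark entry above can be checked with a few lines of Python. This is only a sketch of the binary-unit convention the excerpt uses (1 PB = 2^50 bytes); the constant and function names are made up for illustration.

```python
# Binary storage units: each step up is a factor of 2**10 (1024).
BYTES_PER_TB = 2 ** 40
BYTES_PER_PB = 2 ** 50


def pb_to_tb(pb: float) -> float:
    """Convert petabytes to terabytes using power-of-two units."""
    return pb * BYTES_PER_PB / BYTES_PER_TB


print(BYTES_PER_PB)  # 1125899906842624
print(pb_to_tb(1))   # 1024.0
```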

Introduction to the core concepts of Elasticsearch: ES cluster index shard management

In the previous chapter we built an ES cluster; interested readers can refer to "Introduction to the core concepts of Elasticsearch (XIII): building an ES cluster with Docker". Here we introduce shard management for ES cluster indices. ES cluster index shard management Introduction Shard: because ES is a distributed search engine, t ...

Added by semtex on Fri, 04 Mar 2022 18:13:56 +0200

Redis - Redis persistence

Introduction to persistence Redis is an in-memory database. If the in-memory database state is not saved to disk, it is lost as soon as the server process exits, so Redis provides a persistence feature. What is persistence? A mechanism that uses permanent storage media to save data and recover the saved ...

Added by Sanjib Sinha on Fri, 04 Mar 2022 08:54:45 +0200

Common commands for practical operation

Preparation
Start the Hadoop cluster:
[amelia@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
-help: print the usage of a command:
[amelia@hadoop102 hadoop-2.7.2]$ hadoop fs -help rm
Create the /sanguo folder:
[amelia@hadoop102 hadoop-2.7.2]$ hadoop fs -mkdir /sanguo
Check whether the sanguo file exists in hadoop 2. Upload -moveFromLocal: cut and ...

Added by balkan7 on Thu, 03 Mar 2022 08:08:04 +0200

Passenger express logistics big data project: initialize the Spark streaming program

Contents Initialize the Spark streaming program I. Spark SQL parameter tuning settings 1. Set the session time zone 2. Set the maximum number of bytes a single partition can hold when reading files 3. Set the threshold for merging small files 4. Set the number of partitions used when shuffling data for joins or aggregations 5. Set the maximum ...

Added by ratcateme on Wed, 02 Mar 2022 22:37:24 +0200
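The first four tuning items listed in the entry above plausibly map to standard Spark SQL configuration properties. The sketch below is an assumption, not taken from the article: the mapping of "threshold for merging small files" to `spark.sql.files.openCostInBytes` is a guess, the time zone value is arbitrary, and the other three values are simply Spark's documented defaults. The helper `to_submit_args` is a hypothetical name; it only formats the settings as `--conf` arguments and does not need Spark installed.

```python
from typing import Dict, List

# Assumed mapping of the listed tuning items to Spark SQL properties.
SPARK_SQL_CONF: Dict[str, str] = {
    # 1. Session time zone (arbitrary example value)
    "spark.sql.session.timeZone": "Asia/Shanghai",
    # 2. Maximum bytes packed into one partition when reading files (128 MB)
    "spark.sql.files.maxPartitionBytes": str(128 * 1024 * 1024),
    # 3. Estimated cost of opening a file, which governs how small files
    #    are packed together (4 MB); assumed to be the "merge threshold"
    "spark.sql.files.openCostInBytes": str(4 * 1024 * 1024),
    # 4. Number of partitions used when shuffling for joins/aggregations
    "spark.sql.shuffle.partitions": "200",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a config mapping into a flat list of --conf key=value arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(SPARK_SQL_CONF)))
```

The same keys could instead be passed to a `SparkSession` builder through its `config(key, value)` method; the fifth item in the list is truncated in the excerpt, so it is left out here.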

Hadoop environment configuration (Linux virtual machine)

This semester I chose the course on big data management and analysis, which mainly uses the Hadoop framework for data analysis and application development. First, I will configure the environment. Notes: it is best to put the JDK and Hadoop under /usr/local. When adding environment ...

Added by stargate03 on Mon, 28 Feb 2022 13:06:21 +0200

Introduction and test of allowed lateness in Flink

By default, once the watermark has passed the end of a window, any data for that window that arrives afterwards is dropped. To avoid dropping some of this late data, the concept of allowed lateness was introduced. In short, allowed lateness applies to event time: after the watermark passes the end of the window, it is still allowed to wait for a pe ...

Added by vargadanis on Thu, 24 Feb 2022 13:31:24 +0200
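The rule described in the allowed-lateness entry above can be modelled in plain Python. This is a minimal sketch, not Flink code: `is_dropped` is a hypothetical helper, timestamps are in milliseconds, and it assumes a window's last valid timestamp is its end minus one and that an element is discarded once the watermark reaches that timestamp plus the allowed lateness.

```python
def is_dropped(window_end: int, watermark: int, allowed_lateness: int = 0) -> bool:
    """Return True if an element for the window ending at window_end
    (exclusive, in ms) would be discarded as late at the given watermark."""
    max_timestamp = window_end - 1  # last timestamp that belongs to the window
    return max_timestamp + allowed_lateness <= watermark


# Window [0, 10000) with the watermark already at 12000:
print(is_dropped(10_000, 12_000))         # True: no lateness allowed
print(is_dropped(10_000, 12_000, 5_000))  # False: still inside the grace period
print(is_dropped(10_000, 15_000, 5_000))  # True: grace period has expired
```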

Hadoop in simple terms -- getting started

Hadoop learning 1. Hadoop overview An infrastructure for distributed systems. It mainly solves the problems of massive data storage and distributed computing. 1.1 The three major distributions of Hadoop The original Apache version was released in 2006. Cloudera integrates many big data frameworks internally, and the corresponding product is CDH releas ...

Added by birwin on Wed, 23 Feb 2022 18:37:40 +0200