Big data ELK in 2021: collecting Apache Web server logs

The most detailed big data ELK article series on the web. Bookmarking and following it is strongly recommended! Newer articles include a directory of earlier articles to help you review the key points covered so far. Contents: Collect Apache Web server logs 1. Requirements 2. Prepare log data 3. Send logs to Logstas ...

Added by discorevilo on Sat, 11 Dec 2021 01:37:00 +0200

Installation and use of Flume, a data integration tool

Flume introduction 1. Flume 1. Flume is a distributed, reliable and highly available system for collecting, aggregating and transporting massive volumes of log data. It supports custom data senders in a logging system for collecting data; Flume can also perform simple processing on the data and write it to various data recipien ...

Added by Josien on Thu, 09 Dec 2021 17:38:58 +0200

Easy to understand: one article to get you acquainted with Kafka

This article is reposted from: Le byte. The article mainly explains: Kafka. For more Java-related information, follow the official account: 999. Asynchronous communication principle. Observer mode. Observer mode, also known as Publish/Subscribe mode. It defines a one-to-many dependency between objects, so that whenever an object cha ...

Added by Jurik on Thu, 09 Dec 2021 03:03:20 +0200
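The observer (Publish/Subscribe) mode mentioned in the Kafka entry above can be sketched in a few lines of Python. This is only an in-process illustration of the one-to-many dependency, not Kafka's API; the `Subject` class and its `subscribe` and `publish` methods are hypothetical names chosen for this sketch.

```python
from typing import Callable, List


class Subject:
    """Holds a list of observers and notifies each one of every event."""

    def __init__(self) -> None:
        self._observers: List[Callable[[str], None]] = []

    def subscribe(self, observer: Callable[[str], None]) -> None:
        # Register a callback that will receive every published event.
        self._observers.append(observer)

    def publish(self, event: str) -> None:
        # One event fans out to all subscribers (one-to-many dependency).
        for observer in list(self._observers):
            observer(event)


# Two independent subscribers that simply record what they receive.
received_a: List[str] = []
received_b: List[str] = []

subject = Subject()
subject.subscribe(received_a.append)
subject.subscribe(received_b.append)
subject.publish("state changed")
```

After the call to `publish`, both lists hold the same event, which is the behaviour the excerpt describes: one change, many dependents notified.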

[Data analysis and mining] Binary / multi-class classification prediction practice based on LightGBM, XGBoost and logistic regression (with datasets and code)

1. Classification prediction based on logistic regression 1 Introduction and application of logistic regression 1.1 Introduction to logistic regression Although logistic regression (LR) has the word "regression" in its name, it is actually a classification model and is widely used in various fields. Although deep learning is more popular t ...

Added by linkin on Wed, 08 Dec 2021 12:08:24 +0200
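To illustrate why logistic regression is a classification model despite its name, here is a minimal sketch in plain Python. It is not the code from the article above: the toy data, the learning rate, the epoch count and the function names (`sigmoid`, `predict_proba`, `predict_label`, `fit`) are all assumptions made for this example.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: List[float], bias: float, features: List[float]) -> float:
    # Linear score passed through the sigmoid gives P(label = 1).
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(weights: List[float], bias: float, features: List[float],
                  threshold: float = 0.5) -> int:
    # Thresholding the probability turns the model into a classifier.
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


def fit(samples: List[List[float]], labels: List[int],
        lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    # Plain stochastic gradient descent on the log loss.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Toy one-feature data set: small values are class 0, large values class 1.
samples = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
weights, bias = fit(samples, labels)
```

The model outputs a probability, and the 0.5 threshold in `predict_label` is what makes the final result a class rather than a continuous value.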

ElasticSearch dynamic mapping and static mapping_08

A mapping defines how a document and the fields it contains should be stored and indexed. It is therefore somewhat similar to a table definition in a relational database. Mapping classification Dynamic mapping As the name suggests, it is a mapping created automatically. ES automatically analyzes ...

Added by Topshed on Wed, 08 Dec 2021 07:03:05 +0200

Summary of integrating Spark with Hive

I won't say much about installing Spark here. Note that MySQL and Hive must be installed. Install the RPM package and download MySQL: sudo yum localinstall https://repo.mysql.com//mysql80-community-release-el7-1.noarch.rpm sudo yum install mysql-community-server Start the MySQL service and check its status: systemctl start mysqld.service service ...

Added by jaimitoc30 on Tue, 07 Dec 2021 22:24:11 +0200

Flink -- transform (keyed stream transformation operators)

Keyed stream transformation operator: keyBy. If you want to aggregate, you must group first, so keyBy is very important. The keyBy operator is special: it is not a processing step and not a real operator. It defines how data is transmitted between two tasks. keyBy groups records based on the defined keys. A repartition is performed based on ...

Added by 11Tami on Tue, 07 Dec 2021 09:19:32 +0200
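The idea behind keyBy in the Flink entry above (group by key, repartition, then aggregate) can be sketched in plain Python. This is not Flink's API and does not reproduce Flink's actual key distribution; the function names, the CRC32-based hash and the sample events are assumptions made purely for illustration.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]


def partition_for(key: str, parallelism: int) -> int:
    # A stable hash of the key decides which downstream partition gets the record.
    return zlib.crc32(key.encode("utf-8")) % parallelism


def key_by(records: List[Event], key_fn: Callable[[Event], str],
           parallelism: int) -> Dict[int, List[Event]]:
    # Repartition: every record with the same key lands in the same partition.
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for record in records:
        partitions[partition_for(key_fn(record), parallelism)].append(record)
    return partitions


def sum_per_key(partitions: Dict[int, List[Event]]) -> Dict[str, int]:
    # Aggregation only makes sense after grouping: each key is summed on its own.
    totals: Dict[str, int] = {}
    for records in partitions.values():
        for key, value in records:
            totals[key] = totals.get(key, 0) + value
    return totals


events: List[Event] = [("sensor_1", 3), ("sensor_2", 5), ("sensor_1", 4)]
partitions = key_by(events, lambda event: event[0], parallelism=2)
totals = sum_per_key(partitions)
```

Because the partition is derived only from the key, all values for one key are guaranteed to meet in one place, which is why grouping has to come before aggregation.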

Storm source code analysis

2021SC@SDUSC First, some background on the Worker, then an analysis of the code. About Worker Relationship among worker, executor and task: A worker is a process. A process contains one or more threads. A thread is an executor. A thread processes one or more tasks. A task is an instance ob ...

Added by capella07 on Tue, 07 Dec 2021 00:49:11 +0200

Scala -- collections explained, introduction to collection-related methods, using Traversable

1. Collections 1.1 Overview Anyone who has studied programming knows the saying "program = algorithm + data structure", put forward by the famous Swiss computer scientist Niklaus Wirth, winner of the 1984 Turing Award. An algorithm is a series of effective, general steps for computation. Algorit ...

Added by Deadman2 on Mon, 06 Dec 2021 07:19:09 +0200

ElasticJob ‐ Lite: Simple & Dataflow job

ElasticJob ‐ Lite: simple & dataflow job The following introduction to ElasticJob comes from the official documentation: ElasticJob is a distributed scheduling solution for the Internet ecosystem and massive numbers of tasks. It consists of two independent subprojects, ElasticJob Lite and ElasticJob Cloud. It creates a distributed scheduling solution suitab ...

Added by barrywood on Mon, 06 Dec 2021 06:54:16 +0200