[Flink from getting started to mastering 01] DataStream API
In the previous article, we introduced Flink's installation, deployment, and basic concepts. Today, let's look at the DataStream API, one of Flink's core APIs.
01 Fundamentals of distributed stream processing
In the figure above, the code is divided into three parts, which together form the basic model of distributed stream processing:
SourceTrans ...
Added by c-o-d-e on Sun, 20 Feb 2022 10:32:54 +0200
Kafka installation and deployment
Big data related knowledge points
1. Introduction to Kafka
Kafka is a high-throughput distributed publish-subscribe messaging system. It can handle all of the activity stream data of a consumer-scale website. It offers high performance, persistence, multi-replica backup, and horizontal scalability.
Distributed system, easy to scale out; At t ...
Added by kjelle392 on Sat, 19 Feb 2022 19:08:24 +0200
Kafka Basics - Kafka producer client
Preface
In the first section, we mentioned that on the Kafka server side, producers can be created and messages sent from the command line. In real development, however, producers are created and messages are sent from within a project using Java code. This section explains the Kafka producer based on the Java API.
1. Introduction to the Java API c ...
Added by cloudy243 on Thu, 17 Feb 2022 22:45:38 +0200
Introduction to RabbitMQ message middleware
RabbitMQ message middleware
1. Message Oriented Middleware
1. Introduction
**Message middleware, also known as a message queue, uses an efficient and reliable message-passing mechanism to exchange data in a platform-independent way, and integrates distributed systems on the basis of data communication.** In a distributed environment, mes ...
Added by notsleepy on Sat, 12 Feb 2022 03:53:51 +0200
3. Deep dive Kafka producer - Core Architecture
Deep dive kafka producer - Core Architecture
3. Deep dive KafkaProducer infrastructure
Kafka defines its own network protocol, which can be implemented in any language to push messages to, and pull messages from, a Kafka cluster. The clients module in the Kafka 2.8.0 source code is the official default imple ...
Added by cparekh on Wed, 09 Feb 2022 13:36:14 +0200
Java integrates Flink to stream data obtained from Kafka
The previous example (https://blog.csdn.net/xxkalychen/article/details/117149540?spm=1001.2014.3001.5502) set Flink's data source to a socket, which only serves to provide streaming data for testing and is not generally used in production. The standard approach is to obtain streaming data from a message queue. Flink provides a wrapper for connecting with Kaf ...
Added by moty66 on Wed, 09 Feb 2022 02:01:06 +0200
Consumer flow control and rebalance analysis in the Kafka Java client
Consumer flow control
To keep a sharp increase in Kafka traffic from hitting the consumer and overwhelming it, we need to rate-limit the consumer. For example, when the amount of data being processed reaches a certain threshold, consumption is paused, and when it is lower than the thresho ...
Added by wystan on Tue, 08 Feb 2022 16:06:17 +0200
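The pause-at-a-threshold idea in the flow-control excerpt above can be sketched without a broker. This is only an illustration under stated assumptions: the class name `FlowControlSketch`, its methods, and the in-memory backlog are hypothetical, and it assumes consumption resumes once the backlog drops back below the same threshold (the excerpt is cut off before saying so). With the real Java client, the two state changes would map to `KafkaConsumer.pause(...)` and `KafkaConsumer.resume(...)`.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, broker-free sketch of threshold-based consumer flow control.
class FlowControlSketch {
    private final int threshold;                        // assumed single pause/resume threshold
    private final Queue<String> backlog = new ArrayDeque<>();
    private boolean paused = false;

    FlowControlSketch(int threshold) {
        this.threshold = threshold;
    }

    // Called for each record handed over by a poll.
    void onRecord(String record) {
        backlog.add(record);
        updateState();
    }

    // Called when a worker finishes one record; returns null if nothing is pending.
    String processOne() {
        String record = backlog.poll();
        updateState();
        return record;
    }

    private void updateState() {
        if (!paused && backlog.size() >= threshold) {
            // Real client: consumer.pause(consumer.assignment());
            paused = true;
        } else if (paused && backlog.size() < threshold) {
            // Real client: consumer.resume(consumer.paused());
            paused = false;
        }
    }

    boolean isPaused() {
        return paused;
    }

    int backlogSize() {
        return backlog.size();
    }

    public static void main(String[] args) {
        FlowControlSketch fc = new FlowControlSketch(2);
        fc.onRecord("a");
        fc.onRecord("b");
        System.out.println("paused after two records: " + fc.isPaused());
        fc.processOne();
        System.out.println("paused after processing one: " + fc.isPaused());
    }
}
```

A production version would more likely use separate high and low watermarks so the consumer does not flip between paused and resumed on every record; the single threshold here simply mirrors the wording of the excerpt.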
Kafka 2.5.0 cluster installation (single machine pseudo cluster)
Installation of Zookeeper 3.6.3 single node
Download and unzip Zookeeper
Download address: https://zookeeper.apache.org/releases.html (download the binary version, which does not need to be compiled: apache-zookeeper-3.6.3-bin.tar.gz). Create the folder zookeeper under the / path and ...
Added by mr_zhang on Mon, 07 Feb 2022 23:03:15 +0200
Transaction and idempotency of Kafka producer
Background: sending messages with the Kafka client's producer API, and a simple source code analysis
Starting from Kafka 0.11, the Kafka producer supports two modes: the idempotent producer and the transactional producer. The idempotent producer strengthens Kafka's delivery semantics from at-least-once to exactly-once delivery. In particular, the retry of the produce ...
Added by grail on Sun, 06 Feb 2022 20:58:13 +0200
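As a rough illustration of the two producer modes named in the excerpt above, the settings that switch them on can be collected in plain `java.util.Properties`. The configuration keys (`enable.idempotence`, `acks`, `transactional.id`, `bootstrap.servers`) are standard Kafka producer settings; the class name, method names, broker address, and transactional id are hypothetical, and the serializer settings a real producer also needs are left out.

```java
import java.util.Properties;

// Hypothetical helper that only assembles configuration; it does not connect to Kafka.
class ProducerModesSketch {

    // Settings for an idempotent producer.
    static Properties idempotentProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all"); // idempotence requires acks=all
        return props;
    }

    // Settings for a transactional producer: the idempotent settings plus a transactional id.
    static Properties transactionalProps(String bootstrapServers, String transactionalId) {
        Properties props = idempotentProps(bootstrapServers);
        props.setProperty("transactional.id", transactionalId);
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and "demo-tx-1" are placeholder values for illustration.
        Properties props = transactionalProps("localhost:9092", "demo-tx-1");
        System.out.println("transactional.id = " + props.getProperty("transactional.id"));
    }
}
```

With the real client, these properties would be passed to a `KafkaProducer`, and the transactional variant would additionally call `initTransactions()` before `beginTransaction()` and `commitTransaction()`.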
Flink real-time data warehouse of big data project (DWM layer)
Design ideas
Previously, we split the data into separate Kafka topics through stream splitting and other processing. Next, when processing the data, we need to consider the metrics used in real-time computation. Timeliness is the goal of a real-time data warehouse, so in some scenarios it is not necessary to have a ...
Added by SteveMellor on Thu, 03 Feb 2022 21:34:05 +0200