[Flink from getting started to mastering 01] DataStream API

In the previous article, we covered the installation, deployment and basic concepts of Flink. Today, let's learn about the DataStream API, one of Flink's core APIs. 01 Distributed stream processing foundation: In the figure above, we divide the whole code into three parts, namely the basic model of distributed stream processing: SourceTrans ...
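The Source → Transformation → Sink model is easiest to see in code. Below is a minimal word-count sketch against the Flink 1.x DataStream API; the socket source on localhost:9999 and the word-count logic are illustrative assumptions, not the article's exact example.

```java
// Minimal word-count sketch of the Source -> Transformation -> Sink model (Flink 1.x DataStream API).
// The socket source on localhost:9999 and the word-count logic are illustrative assumptions.
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class DataStreamSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source: read lines from a socket (test data only)
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // Transformation: split lines into (word, 1) pairs, group by word, and sum the counts
        DataStream<Tuple2<String, Integer>> counts = lines
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.split("\\s+")) {
                            out.collect(Tuple2.of(word, 1));
                        }
                    }
                })
                .keyBy(value -> value.f0)
                .sum(1);

        // Sink: print the running counts to stdout
        counts.print();

        env.execute("DataStream sketch");
    }
}
```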

Added by c-o-d-e on Sun, 20 Feb 2022 10:32:54 +0200

Kafka installation and deployment

Big data related knowledge points. 1. Introduction to Kafka: Kafka is a high-throughput, distributed publish-subscribe messaging system. It can handle all the action stream data of consumer-scale websites. It offers high performance, persistence, multi-replica backup and horizontal scaling. A distributed system, easy to scale out; at t ...

Added by kjelle392 on Sat, 19 Feb 2022 19:08:24 +0200

Kafka basics - Kafka producer client

Preface: In the first section, we mentioned that on the Kafka server side we can create producers and send messages through commands. In actual development, however, we usually create producers and send messages from a project in Java. In this section, we will explain the Kafka producer based on the Java API. 1. Introduction to the Java API c ...
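As a companion to that entry, here is a minimal producer sketch using the standard kafka-clients Java API; the broker address localhost:9092, the topic test-topic and the key/value strings are placeholders.

```java
// Minimal KafkaProducer sketch based on the standard kafka-clients Java API.
// Broker address, topic name and message contents are placeholders.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send asynchronously; the callback reports the partition/offset on success or the error
            producer.send(new ProducerRecord<>("test-topic", "key", "hello kafka"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        } else {
                            System.out.printf("sent to %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        }
    }
}
```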

Added by cloudy243 on Thu, 17 Feb 2022 22:45:38 +0200

Introduction to RabbitMQ message middleware

RabbitMQ message middleware. 1. Message-oriented middleware. 1. Introduction: Message middleware, also known as a message queue, provides platform-independent data exchange through an efficient and reliable message transmission mechanism, and integrates distributed systems on the basis of data communication. In a distributed environment, mes ...

Added by notsleepy on Sat, 12 Feb 2022 03:53:51 +0200

3. Deep dive Kafka producer - Core Architecture

Deep dive Kafka producer - core architecture. 3. Deep dive into the KafkaProducer infrastructure: Kafka defines its own set of network protocols; as long as these protocols are implemented, any language can push messages to and pull messages from Kafka clusters. The clients module in the source code of Kafka 2.8.0 is the official default imple ...

Added by cparekh on Wed, 09 Feb 2022 13:36:14 +0200

Integrating Flink in Java to stream data obtained from Kafka

In the last example, https://blog.csdn.net/xxkalychen/article/details/117149540?spm=1001.2014.3001.5502, setting Flink's data source to a Socket was just a way to provide streaming data for testing; it is not generally used in production. The standard model is to obtain streaming data from a message queue. Flink provides an encapsulation for connecting with Kaf ...
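A minimal sketch of that pattern, assuming the flink-connector-kafka dependency and the FlinkKafkaConsumer class from Flink 1.x; the broker address, topic and group id are placeholders.

```java
// Sketch: replace the Socket source from the previous example with a Kafka source.
// Assumes the flink-connector-kafka dependency; topic, group id and broker address are placeholders.
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaToFlink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-demo");

        // Source: consume the topic as an unbounded stream of strings
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer<>("test-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("Kafka to Flink");
    }
}
```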

Added by moty66 on Wed, 09 Feb 2022 02:01:06 +0200

Consumer flow control and Rebalance analysis of the Kafka Java client

Consumer flow control: To prevent a sharp increase of traffic in Kafka from overwhelming and crushing the Consumer side, we need to throttle the Consumer. For example, when the amount of unprocessed data reaches a certain threshold, consumption is paused, and when it drops below the thresho ...
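One common way to implement that pause/resume pattern with the standard Java consumer is sketched below; the thresholds, topic and backlog counter are illustrative assumptions, not the article's actual code.

```java
// Sketch of pause/resume flow control with the standard kafka-clients consumer.
// Thresholds, topic and the backlog counter are illustrative assumptions.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FlowControlConsumer {
    private static final int PAUSE_THRESHOLD = 1000;   // pause when this many records are unprocessed
    private static final int RESUME_THRESHOLD = 100;   // resume once the backlog drains below this

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "flow-control-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        AtomicInteger backlog = new AtomicInteger();    // records handed off but not yet processed

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
                for (ConsumerRecord<String, String> record : records) {
                    backlog.incrementAndGet();
                    // hand the record to a worker; the worker calls backlog.decrementAndGet() when done
                }
                if (backlog.get() >= PAUSE_THRESHOLD) {
                    // keep calling poll() while paused so the group does not rebalance
                    consumer.pause(consumer.assignment());
                } else if (backlog.get() <= RESUME_THRESHOLD) {
                    consumer.resume(consumer.paused());
                }
            }
        }
    }
}
```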

Added by wystan on Tue, 08 Feb 2022 16:06:17 +0200

Kafka 2.5.0 cluster installation (single machine pseudo cluster)

Kafka 2.5.0 cluster installation (single-machine pseudo cluster). Installing a single Zookeeper 3.6.3 node: download and unzip Zookeeper. Download address: https://zookeeper.apache.org/releases.html. Download the binary release here so you don't need to compile: apache-zookeeper-3.6.3-bin.tar.gz. Create the folder zookeeper under the / path and ...

Added by mr_zhang on Mon, 07 Feb 2022 23:03:15 +0200

Transactions and idempotence of the Kafka producer

Background: sending messages with the Kafka client's producer API and a brief source code analysis. Starting from Kafka 0.11, the Kafka producer supports two modes: the idempotent producer and the transactional producer. The idempotent producer strengthens Kafka's delivery semantics from at-least-once to exactly-once delivery. In particular, retries by the produce ...
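A minimal sketch of the two modes with the standard kafka-clients producer; the broker address, topic and transactional.id are placeholders, not the article's example.

```java
// Sketch: enable.idempotence for exactly-once per-partition delivery, plus the
// transactional API for atomic multi-message writes. Broker, topic and transactional.id are placeholders.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");       // idempotent producer
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-tx-id");   // required for transactions

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("test-topic", "k1", "v1"));
            producer.send(new ProducerRecord<>("test-topic", "k2", "v2"));
            producer.commitTransaction();   // both messages become visible atomically
        } catch (Exception e) {
            // neither message is exposed to read_committed consumers; fatal errors
            // such as ProducerFencedException would instead require closing the producer
            producer.abortTransaction();
        } finally {
            producer.close();
        }
    }
}
```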

Added by grail on Sun, 06 Feb 2022 20:58:13 +0200

Flink real-time data warehouse of big data project (DWM layer)

Design ideas: Previously, we split the data into independent Kafka topics through stream splitting and other processing. Next, when processing the data, we need to consider the metrics used in real-time computation. Timeliness is what a real-time data warehouse pursues, so in some scenarios it is not necessary to have a ...
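One way to express that kind of stream splitting in Flink is with side outputs, sketched below; the tag name and routing condition are made-up assumptions, not the article's actual DWM logic.

```java
// Sketch: split one stream into a main stream and a side output, each of which
// could then be written to its own Kafka topic. Tag name and routing rule are assumptions.
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SplitStreamSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> source = env.socketTextStream("localhost", 9999);

        // Records that do not match the main flow are routed to a side output
        OutputTag<String> otherTag = new OutputTag<String>("other") {};

        SingleOutputStreamOperator<String> mainStream = source.process(
                new ProcessFunction<String, String>() {
                    @Override
                    public void processElement(String value, Context ctx, Collector<String> out) {
                        if (value.startsWith("order")) {
                            out.collect(value);            // main stream, e.g. order events
                        } else {
                            ctx.output(otherTag, value);   // everything else goes to the side output
                        }
                    }
                });

        DataStream<String> otherStream = mainStream.getSideOutput(otherTag);

        mainStream.print("main");
        otherStream.print("other");

        env.execute("Split stream sketch");
    }
}
```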

Added by SteveMellor on Thu, 03 Feb 2022 21:34:05 +0200