Does Flink Checkpoint support maintaining the Kafka consumption offset state?
Author: Wen naisong
When using Flink to consume Kafka data in real time, maintaining the offset state is involved. To ensure that a Flink job restart, or an operator-level failure retry during running, can achieve "breakpoint resumption" (picking up exactly where consumption left off), you need the support of Flink Checkpoint. The question is, if you sim ...
Added by halcyonalt on Sat, 15 Jan 2022 09:53:19 +0200
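To make the "breakpoint resumption" idea concrete, here is a minimal plain-Java simulation (not the Flink API — partition numbers and offsets are invented for illustration) of how a checkpointed offset snapshot lets a consumer resume from the last snapshot after a failure, rather than from zero or from unsnapshotted progress:

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual simulation of checkpointing Kafka offsets: on failure we
// discard in-flight progress and restore the last snapshot.
public class OffsetCheckpointDemo {
    private final Map<Integer, Long> currentOffsets = new HashMap<>();
    private Map<Integer, Long> lastCheckpoint = new HashMap<>();

    // "Consume" one record from a partition: advance its offset.
    public void consume(int partition) {
        currentOffsets.merge(partition, 1L, Long::sum);
    }

    // Snapshot the offsets, as a checkpoint barrier would trigger.
    public void checkpoint() {
        lastCheckpoint = new HashMap<>(currentOffsets);
    }

    // On failure, throw away in-flight progress and restore the snapshot.
    public void recover() {
        currentOffsets.clear();
        currentOffsets.putAll(lastCheckpoint);
    }

    public long offsetOf(int partition) {
        return currentOffsets.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        OffsetCheckpointDemo job = new OffsetCheckpointDemo();
        job.consume(0); job.consume(0);      // partition 0 at offset 2
        job.checkpoint();                    // snapshot: {0=2}
        job.consume(0);                      // offset 3, not checkpointed
        job.recover();                       // simulate a failure + restart
        System.out.println(job.offsetOf(0)); // resumes from 2, not 0 or 3
    }
}
```

In real Flink the same effect comes from the Kafka source storing its offsets in operator state during checkpoints; this sketch only shows the snapshot/restore contract.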
Hand-building a Flink connector for GBase8s
Brief introduction
This article first explains what a Flink connector and CDC are, then hand-builds a simple Flink connector for GBase8s with you and completes a practical project: synchronizing data to GBase8s in real time from MySQL CDC through that connector.
What is Flink connector
Flink has built-in basi ...
Added by tnewton on Fri, 14 Jan 2022 08:43:23 +0200
[Flink] [Chapter 6 Window]
Window overview
Streaming computation is a data processing engine designed for infinite data sets — data that keeps growing and is essentially unbounded — and a window is the means of cutting infinite data into finite blocks. Windows are the core of infinite data stream processing: a window splits an infinite stream into "buckets" of finit ...
Added by s_ainley87 on Thu, 13 Jan 2022 18:28:15 +0200
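The "cutting into buckets" can be shown without any Flink dependency: a tumbling window assigns an element to a finite bucket purely from its timestamp. The arithmetic below is essentially what Flink's TimeWindow does for a zero offset; the timestamps are illustrative:

```java
// Minimal sketch of tumbling-window bucketing: each timestamp maps to the
// start of the fixed-size window that contains it.
public class TumblingWindowDemo {
    // Start of the window (in ms) that contains the given timestamp.
    public static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    public static void main(String[] args) {
        long size = 5_000L; // 5-second tumbling windows
        // Both events fall into the same [5000, 10000) bucket.
        System.out.println(windowStart(5_200L, size));  // 5000
        System.out.println(windowStart(9_999L, size));  // 5000
        // The next event starts a new bucket.
        System.out.println(windowStart(10_000L, size)); // 10000
    }
}
```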
ListState usage of Flink state
1. Basic states of Flink
Flink has two basic kinds of state: operator state and keyed state. Their main difference is scope. Operator state is scoped to an operator task (all data processed by the current partition can access the state). With keyed state, not all data in the curren ...
Added by eulalyn09 on Sat, 08 Jan 2022 07:47:13 +0200
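The scoping difference can be modeled in plain Java (this is a conceptual stand-in, not Flink's ListState API): keyed ListState behaves like a map from key to an independent list, so each key only ever sees its own values. Key names below are invented:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model of keyed ListState: state is partitioned by key, and
// each key's list is invisible to every other key.
public class KeyedListStateDemo {
    private final Map<String, List<Long>> stateByKey = new HashMap<>();

    // In spirit, ListState#add for the current key.
    public void add(String key, long value) {
        stateByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // In spirit, ListState#get for the current key.
    public List<Long> get(String key) {
        return stateByKey.getOrDefault(key, new ArrayList<>());
    }

    public static void main(String[] args) {
        KeyedListStateDemo state = new KeyedListStateDemo();
        state.add("sensor-1", 35L);
        state.add("sensor-1", 37L);
        state.add("sensor-2", 20L);
        System.out.println(state.get("sensor-1")); // [35, 37]
        System.out.println(state.get("sensor-2")); // [20]
    }
}
```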
Flink multi parallelism and WaterMark
Recently, while reviewing Flink, I noticed that all the demos I had written before use a single parallelism. It suddenly occurred to me to ask: is window triggering under multiple parallelism the same as under single parallelism? That question led to the ones below.
First, I set the data delay time to 2s, and then set a ...
Added by craige on Thu, 06 Jan 2022 12:16:23 +0200
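The key mechanism behind the question is that a downstream operator's watermark is the minimum of the watermarks from all upstream parallel subtasks, so one slow subtask holds window triggering back. A small plain-Java sketch (the timestamps are invented; only the 2s delay comes from the article):

```java
import java.util.Arrays;

// Why multi-parallelism changes window triggering: the downstream
// watermark is the MINIMUM over all upstream subtask watermarks.
public class MultiParallelismWatermarkDemo {
    // Each upstream subtask emits watermark = maxSeenTimestamp - delay.
    public static long subtaskWatermark(long maxSeenTs, long delayMs) {
        return maxSeenTs - delayMs;
    }

    // Downstream takes the minimum across all input channels.
    public static long downstreamWatermark(long[] subtaskWatermarks) {
        return Arrays.stream(subtaskWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long delay = 2_000L; // the 2s delay mentioned in the article
        long w1 = subtaskWatermark(10_000L, delay); // fast subtask: 8000
        long w2 = subtaskWatermark(4_000L, delay);  // slow subtask: 2000
        // With parallelism 1 the window would see watermark 8000;
        // with 2 subtasks it only advances to 2000.
        System.out.println(downstreamWatermark(new long[]{w1, w2})); // 2000
    }
}
```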
Flink on YARN: specifying third-party jar packages
1. Background
When submitting a Flink task to YARN, we usually use the shade plugin to package all the required jars into one large jar, then submit it to YARN with the flink run command. However, if the code changes frequently, or many colleagues on the team need to develop multiple business modules in the same p ...
Added by rschneid on Mon, 03 Jan 2022 08:13:20 +0200
A complete walkthrough of the big-data Flink e-commerce real-time data warehouse hands-on project (V)
Premise summary: we previously implemented dynamic stream splitting, i.e., we separated dimension data from fact data through the TableProcessFunction1 class; next we write the data into an HBase table and a Kafka topic:
hbaseDS.addSink(new DimSink());
kafkaDS.addSink(kafkaSink);
At this time, the two streams produced by dynamic splitting are r ...
Added by Dagwing on Sun, 02 Jan 2022 02:56:07 +0200
Blink SQL time attribute
Time attribute
Flink supports three time concepts related to stream data processing: Processing Time, Event Time and Ingestion Time.
Blink SQL only supports two time types: Event Time and Processing Time:
Event Time: the event time (usually the original creation time of the data); Event Time must be provided in the data itself. Process ...
Added by ju8ular1 on Sun, 02 Jan 2022 02:04:20 +0200
On Java type erasure, the type information lost when using Lambda expressions in Flink, and Flink's type hint mechanism
Recently, while learning Flink, I found that because of Java type erasure, generic types cannot be detected when Lambda expressions are used in Flink; Flink's type hint mechanism is needed to solve this. Let's analyze it in depth!
What is Java generic erasure
This article does not introduce Java generics. Students who do not ...
Added by spider.nick on Thu, 30 Dec 2021 02:18:50 +0200
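The root cause can be reproduced in plain Java, with no Flink involved (class and method names here are illustrative): an anonymous subclass of a generic type keeps its type arguments in class metadata, but a lambda does not — which is exactly why Flink can extract types from anonymous function classes yet needs an explicit hint for lambdas.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Function;

// Anonymous subclasses retain generic arguments; lambdas are erased.
public class ErasureDemo {
    // A generic supertype whose type argument we try to recover.
    public abstract static class TypedFunction<T> implements Function<T, String> {}

    public static String describe(Function<?, String> f) {
        Type sup = f.getClass().getGenericSuperclass();
        if (sup instanceof ParameterizedType) {
            // Anonymous class: the actual type argument survives.
            return ((ParameterizedType) sup).getActualTypeArguments()[0].getTypeName();
        }
        return "erased"; // Lambda: no generic info left to inspect.
    }

    public static void main(String[] args) {
        Function<Integer, String> lambda = i -> "n=" + i;
        TypedFunction<Integer> anon = new TypedFunction<Integer>() {
            @Override public String apply(Integer i) { return "n=" + i; }
        };
        System.out.println(describe(lambda)); // erased
        System.out.println(describe(anon));   // java.lang.Integer
    }
}
```

Flink's `returns(...)` type hint plays the role of handing the extractor the information that the lambda's class file no longer carries.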
About Flink's Metrics monitoring instructions
1-Metrics introduction
Once a cluster is running it is hard to see what is actually happening inside — whether it runs slowly or fast, whether anything is abnormal, and so on — and developers cannot watch every Task's logs in real time, especially for large jobs or many jobs. At this point, Metrics can help developers understand the curr ...
Added by frankstr on Sun, 26 Dec 2021 01:41:41 +0200