A multi-source merge and downstream synchronization scheme with Flink CDC and Kafka
1. Foreword: This article addresses the problem that Flink SQL with Flink CDC cannot merge multiple databases and multiple tables from multiple sources, and shows how to keep downstream Kafka synchronized after the multi-source merge, because at present Flink SQL can only run a Flink CDC job against a single table, whic ...
Added by jefkin on Tue, 01 Feb 2022 23:22:40 +0200
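The scheme this teaser describes — tagging change records from many (database, table) sources and merging them into one downstream stream for Kafka — can be sketched in plain Python. This is not the Flink CDC API; the record shapes and source names below are invented for illustration:

```python
# Plain-Python illustration of multi-source merging: change records from
# several (database, table) sources are tagged with their origin and merged
# into one stream, so a single downstream consumer receives them all.
from itertools import chain

def tag(db, table, records):
    """Attach source metadata to each change record (names are examples)."""
    return ({"db": db, "table": table, **r} for r in records)

orders = [{"op": "INSERT", "id": 1}]
users = [{"op": "UPDATE", "id": 7}]

merged = list(chain(tag("db1", "orders", orders),
                    tag("db2", "users", users)))
# Every record now carries its origin, so the consumer can route it.
assert merged[0]["table"] == "orders" and merged[1]["db"] == "db2"
```

Tagging before merging is what makes a single downstream topic workable: the consumer can tell records from different tables apart without separate jobs per table.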
Flink: Table & SQL environment setup and program structure
I have always been interested in Flink Table, and I have now decided to skip some non-essential background and learn Flink Table directly. This article mainly introduces the architecture and interface implementation of Flink Table. Apache Flink has two relational APIs for unified stream-batch proces ...
Added by alexboyer on Mon, 31 Jan 2022 12:26:30 +0200
Practical data lake iceberg, Lesson 11: testing the complete partition-table workflow (generating data, creating tables, merging, deleting snapshots)
Table of contents for this series
Practical data lake iceberg, Lesson 1: introduction
Practical data lake iceberg, Lesson 2: iceberg's underlying data format based on Hadoop
Practical data lake iceberg, Lesson 3: in the SQL client, reading data from Kafka into iceberg
Practical data lake iceberg, Lesson 4: in the SQL client, reading data from Kafka into iceberg ...
Added by kpmonroe on Sat, 29 Jan 2022 11:28:53 +0200
Apache Flink learning notes: Windows in stream processing
Window concept
In most scenarios, the data streams we need to aggregate are unbounded, so we cannot wait until the entire stream terminates. Usually we only need statistics over a certain time or count range: for example, counting the hits on all goods over the past hour, every five minutes; or after every ...
Added by phpr0ck5 on Sat, 29 Jan 2022 09:53:54 +0200
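The sliding-window example in the excerpt — every 5 minutes, count the hits of the past hour — can be mimicked in plain Python. Timestamps are plain minutes and the windowing is computed eagerly; this is not the Flink window API:

```python
# Plain-Python sketch of a sliding count window (size 60 min, slide 5 min).
def sliding_counts(events, size=60, slide=5, horizon=120):
    """Map each window end time to the number of events in [end-size, end)."""
    counts = {}
    for end in range(slide, horizon + 1, slide):
        counts[end] = sum(1 for t in events if end - size <= t < end)
    return counts

hits = [1, 2, 30, 59, 61]            # event times in minutes
w = sliding_counts(hits)
assert w[60] == 4                    # window [0, 60) contains 1, 2, 30, 59
assert w[65] == 3                    # window [5, 65) contains 30, 59, 61
```

Because the windows overlap, one event (e.g. the hit at minute 30) is counted in many windows — exactly the behavior a sliding window gives and a tumbling window would not.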
3. Introduction to Flink programming
1 Introduction to Flink programming
1.1 Initialize a Flink project template
1.1.1 Preparations: Maven 3.0.4 or above and JDK 8 are required.
1.1.2 Create a Java project template with the Maven command. Execute the Maven command below; if the local Maven repository does not yet hold the required dependency jars, a network connection is needed.
mvn archetype:generate -DarchetypeGroupI ...
Added by dtasman7 on Thu, 27 Jan 2022 21:02:53 +0200
Basic steps of Flink programming and loading different types of data sources
Basic steps of Flink programming:
1. Create the stream execution environment: StreamExecutionEnvironment.getExecutionEnvironment() obtains the streaming environment.
2. Load data Source
3. Transformation
4. Output via a Sink: land the result in another data store, or print it directly.
Basic operations on Flink data fall into four categories:
operations on a single ...
Added by Oxymen on Wed, 26 Jan 2022 23:28:31 +0200
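The four steps above can be mimicked in a few lines of plain Python. This is illustrative only; in real Flink the environment is a StreamExecutionEnvironment, the transformations are operators such as map and filter, and the sink writes to an external system or prints:

```python
# A toy environment-source-transform-sink pipeline mirroring the four steps.
def run_pipeline(source, transforms, sink):
    stream = source()                 # step 2: load the data source
    for t in transforms:              # step 3: chain transformations
        stream = map(t, stream)       # map() binds t immediately (lazy-safe)
    for record in stream:             # step 4: drive each record into the sink
        sink(record)

out = []
run_pipeline(source=lambda: [1, 2, 3],
             transforms=[lambda x: x * 2, lambda x: x + 1],
             sink=out.append)
assert out == [3, 5, 7]
```

Note that nothing runs until the final loop drives the stream — a small echo of Flink's own model, where the job graph is built first and only executes on `env.execute()`.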
Flink SQL: joins between stream tables and dimension tables, and dual-stream joins
A dimension table is a data-warehouse concept: the dimension attributes in a dimension table describe the observed data and supplement the fact table. The real-time data warehouse also has the concepts of dimension table and fact table. The fact table is usually Kafka real-time stream data, while the dimension table is usually st ...
Added by devai on Fri, 21 Jan 2022 21:44:47 +0200
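The stream-to-dimension-table join described above — each fact record from the stream is enriched by a dimension lookup — can be sketched in plain Python. The table names and fields below are invented for illustration; this is not Flink SQL:

```python
# Toy stream-to-dimension-table join: facts (imagine them arriving from
# Kafka) are enriched with an attribute looked up in a dimension table.
dim_products = {101: {"name": "kettle"}, 102: {"name": "lamp"}}

def enrich(fact_stream, dim):
    for fact in fact_stream:
        d = dim.get(fact["product_id"], {})
        # keep the fact's own fields, add the dimension attribute
        yield {**fact, "product_name": d.get("name", "UNKNOWN")}

facts = [{"product_id": 101, "qty": 2}, {"product_id": 999, "qty": 1}]
joined = list(enrich(facts, dim_products))
assert joined[0]["product_name"] == "kettle"
assert joined[1]["product_name"] == "UNKNOWN"
```

The lookup-with-default mirrors a LEFT JOIN: a fact with no matching dimension row still flows through, carrying a placeholder instead of being dropped.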
Flink (14): Flink's Transformation operators
Contents
0. Links to related articles
1. union and connect operators
2. split, select and Side Outputs
3. rebalance partition
4. Other partition operators
0. Links to related articles
1. union and connect operators
API:
Union: the union operator merges multiple data streams of the same type and generates a data strea ...
Added by fourthe on Fri, 21 Jan 2022 16:32:09 +0200
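The difference between union and connect can be shown with a plain-Python analogy (not the Flink API): union merges streams of the same type into one, while connect keeps two differently typed streams side by side until a pair of map functions unifies them.

```python
# union: streams of the SAME type become one stream of that type.
from itertools import chain

ints_a, ints_b = [1, 2], [3, 4]
unioned = list(chain(ints_a, ints_b))
assert unioned == [1, 2, 3, 4]

# "connect": two differently typed streams are kept distinguishable,
# then a pair of per-side mappings (a co-map) unifies the types.
strings = ["5", "6"]
connected = [("left", x) for x in ints_a] + [("right", s) for s in strings]
unified = [x if side == "left" else int(x) for side, x in connected]
assert unified == [1, 2, 5, 6]
```

This is why Flink's union requires identical types while connect does not: with connect, the type unification is deferred to the co-map functions.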
The transform operators of the Flink stream-processing API
transform
Purpose: convert the source data into the required data
Common functions
map
The map operator is similar to map in Python: in Python, data is transformed by a lambda expression, while map in Flink is more general. By implementing a new MapFunction, the user-defi ...
Added by Kaitosoto on Thu, 20 Jan 2022 07:52:28 +0200
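The MapFunction pattern mentioned above can be mimicked in plain Python: instead of a bare lambda, the mapping logic lives in a small class with a map() method, as Flink's MapFunction interface does. The class and helper names here are invented:

```python
# A MapFunction-style class: the transformation is an object with a map()
# method, which can carry configuration or state that a lambda cannot.
class UpperCaseMap:
    def map(self, value):
        return value.upper()

def apply_map(fn, stream):
    """Apply a MapFunction-like object to every element of a stream."""
    return [fn.map(v) for v in stream]

assert apply_map(UpperCaseMap(), ["a", "bc"]) == ["A", "BC"]
```

Packaging the logic in a class is what lets Flink serialize and ship the function to the cluster, and lets richer variants hold open/close lifecycle hooks.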
Flink Tutorial: writing streaming data to files in ORC format with Flink 1.11
In flink, StreamingFileSink is an important sink for writing streaming data to the file system. It supports writing data in row format (json, csv, etc.) and column format (orc, parquet).
Hive is a widely used data store, and ORC, as a columnar storage format specially optimized for Hive, plays an important role among Hive's storage formats. Today ...
Added by ssmitra on Mon, 17 Jan 2022 10:46:53 +0200
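The row-versus-column distinction the excerpt draws (json/csv as row formats, ORC/Parquet as column formats) can be shown with a toy layout in plain Python. Nothing here is ORC itself; it only contrasts how the same records are laid out:

```python
# The same two records in a row-oriented and a column-oriented layout.
rows = [("a", 1), ("b", 2)]

row_format = ["%s,%d" % r for r in rows]        # row-oriented, csv-like:
assert row_format == ["a,1", "b,2"]             # each record stored whole

col_format = {"name": ["a", "b"], "n": [1, 2]}  # column-oriented, ORC-like:
assert col_format["n"] == [1, 2]                # each column stored together
```

Storing a column contiguously is what lets columnar formats read only the fields a query touches and compress each column with a codec suited to its type.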