Flink: Table & SQL environment construction and program structure


Introduction

  • I have always been interested in Flink Table. I decided to skip some unnecessary preliminaries and learn Flink Table directly. This article mainly introduces the architecture and interfaces of Flink Table.
  • Apache Flink has two relational APIs for unified stream and batch processing: the Table API and SQL. The Table API is a query API for the Scala and Java languages that composes relational operators such as selection, filtering, and join in a very intuitive way. Flink SQL is standard SQL implemented on top of Apache Calcite. Queries written in either API have the same semantics for a given DataSet or DataStream input, and produce the same results.
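  • As a sketch of that equivalence (assuming a TableEnvironment named tableEnv and a registered table Orders with fields user and amount, both illustrative), the same query can be written with either API:

// Needed for Table API expressions such as $("user")
import static org.apache.flink.table.api.Expressions.$;

// Table API: relational operators composed on a Table object
Table apiResult = tableEnv.from("Orders")
    .filter($("amount").isGreater(100))
    .select($("user"), $("amount"));

// SQL: the equivalent query, parsed and planned via Apache Calcite
// (`user` is quoted because it is a reserved keyword)
Table sqlResult = tableEnv.sqlQuery(
    "SELECT `user`, amount FROM Orders WHERE amount > 100");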

Background

Framework

  • Blink was contributed by Alibaba in Flink 1.9, and as of Flink 1.12 it implements the Table API and SQL functions by default.
    [Figure: Flink Table API & SQL architecture (original image unavailable)]

Advantages

  • Declarative: users only care about what to do, not how to do it.
  • High performance: support query optimization to obtain better execution performance.
  • Unified stream and batch: the same statistical logic can run in both streaming mode and batch mode.
  • Standard and stable: the semantics follow the SQL standard and rarely change.
  • Easy to understand: clear semantics, what you see is what you get.

Maven dependencies

Table API and SQL

  • Import different bridge packages for the Java and Scala languages.
<!-- java -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-api-java-bridge_2.11</artifactId>
  <version>1.12.3</version>
  <scope>provided</scope>
</dependency>
<!-- scala -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
  <version>1.12.3</version>
  <scope>provided</scope>
</dependency>

Local environment configuration

  • If you want to run and test programs locally in an IDE, you need to add one of the following modules; which one depends on whether you use the old Flink planner or the Blink planner.
<!-- Old Flink planner, the default before Flink 1.9 -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-planner_2.11</artifactId>
  <version>1.12.3</version>
  <scope>provided</scope>
</dependency>
<!-- Blink planner, the current default engine -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-planner-blink_2.11</artifactId>
  <version>1.12.3</version>
  <scope>provided</scope>
</dependency>
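
  • With the planner module on the classpath, the planner is selected when the TableEnvironment is created. A minimal sketch of both choices, following the Flink 1.12 EnvironmentSettings API:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

// Blink planner (the current default), streaming mode
EnvironmentSettings blinkSettings = EnvironmentSettings.newInstance()
        .useBlinkPlanner()
        .inStreamingMode()
        .build();
TableEnvironment blinkTableEnv = TableEnvironment.create(blinkSettings);

// Old Flink planner, streaming mode (created through the bridge API)
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings oldSettings = EnvironmentSettings.newInstance()
        .useOldPlanner()
        .inStreamingMode()
        .build();
StreamTableEnvironment oldTableEnv = StreamTableEnvironment.create(env, oldSettings);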

Additional dependencies

Scala support for streaming

  • Internally, some Table-related code is implemented in Scala, so the following dependency must be added regardless of whether the program is batch or streaming.
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-scala_2.11</artifactId>
  <version>1.12.3</version>
  <scope>provided</scope>
</dependency>

Custom formats and functions

  • If you need to implement a user-defined format to parse Kafka data, or user-defined functions for business logic, add the following dependency.
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-common</artifactId>
  <version>1.12.3</version>
  <scope>provided</scope>
</dependency>
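
  • As a minimal sketch of a user-defined function (assuming a TableEnvironment named tableEnv; the class name, registration name, and table/field names below are illustrative), extend ScalarFunction and register it on the TableEnvironment:

import org.apache.flink.table.api.Table;
import org.apache.flink.table.functions.ScalarFunction;

// Hypothetical UDF: masks all but the last four characters of a string
public static class MaskFunction extends ScalarFunction {
    public String eval(String input) {
        if (input == null || input.length() <= 4) {
            return input;
        }
        return "****" + input.substring(input.length() - 4);
    }
}

// Register the function and call it from SQL
tableEnv.createTemporarySystemFunction("MASK", MaskFunction.class);
Table masked = tableEnv.sqlQuery("SELECT MASK(card_no) FROM payments");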

Program structure

What is the difference between the Blink (new) and the Flink (old) planners

  1. Blink treats batch jobs as a special case of stream processing. Conversion between Table and DataSet is not supported; batch jobs are not translated into DataSet programs but into DataStream programs, the same as streaming jobs (see the conversion sketch after this list).
  2. Blink planner does not support BatchTableSource, but uses bounded StreamTableSource instead.
  3. The FilterableTableSource implementations in the old planner and the Blink planner are incompatible: the old planner pushes PlannerExpression down to FilterableTableSource, while the Blink planner pushes Expression down.
  4. String-based key-value configuration options are used only in the Blink planner (see the configuration documentation for details).
  5. The PlannerConfig implementations (CalciteConfig) in the two planners are different.
  6. The Blink planner optimizes multiple sinks into a single directed acyclic graph (DAG); this is supported by both TableEnvironment and StreamTableEnvironment. The old planner always optimizes each sink into a separate DAG, and all graphs are independent of each other.
  7. The old planner currently does not support catalog statistics, while the Blink planner does.
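
  • Point 1 in practice: with the Blink planner you move between the relational world and the low-level API via DataStream rather than DataSet. A minimal conversion sketch (the sample data is illustrative):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

// DataStream -> Table
DataStream<String> words = env.fromElements("flink", "table", "sql");
Table wordTable = tableEnv.fromDataStream(words);

// Table -> DataStream (toAppendStream works for insert-only results)
DataStream<Row> rows = tableEnv.toAppendStream(wordTable, Row.class);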

Program structure

  • Table API and SQL programs for batch and stream processing follow the same pattern: first create a TableEnvironment, then create tables, and operate on them through the Table API or SQL to implement the business logic. Example code:
// Create a TableEnvironment for stream or batch execution
TableEnvironment tableEnv = ...; 

// Create input table
tableEnv.executeSql("CREATE TEMPORARY TABLE table1 ... WITH ( 'connector' = ... )");
// Create output table
tableEnv.executeSql("CREATE TEMPORARY TABLE outputTable ... WITH ( 'connector' = ... )");

// Use the Table API to perform query operations
Table table2 = tableEnv.from("table1").select(...);
// Perform query operations using SQL
Table table3 = tableEnv.sqlQuery("SELECT ... FROM table1 ... ");

// Emit the query result to the output table via the Table API
TableResult tableResult = table2.executeInsert("outputTable");
tableResult...
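
  • For reference, a complete runnable sketch of this pattern using the built-in datagen and print connectors (the table names and schema are illustrative):

import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class TableDemo {
    public static void main(String[] args) {
        // Blink planner, streaming mode
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        // Input table backed by the datagen connector (generates random rows)
        tableEnv.executeSql(
                "CREATE TEMPORARY TABLE source_table (" +
                "  user_id BIGINT," +
                "  amount DOUBLE" +
                ") WITH (" +
                "  'connector' = 'datagen'," +
                "  'rows-per-second' = '1'" +
                ")");

        // Output table backed by the print connector (writes rows to stdout)
        tableEnv.executeSql(
                "CREATE TEMPORARY TABLE sink_table (" +
                "  user_id BIGINT," +
                "  amount DOUBLE" +
                ") WITH (" +
                "  'connector' = 'print'" +
                ")");

        // Query with the Table API and emit the result to the output table
        Table result = tableEnv.from("source_table")
                .select($("user_id"), $("amount"));
        result.executeInsert("sink_table");
    }
}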

summary

  • The addition of Table and SQL support reduces the difficulty of Flink development: if you understand SQL, you can build the functionality you need.
  • Interest is the best teacher. Beyond daily work, keep your inner enthusiasm alive; balance work and interest, and stay energized.

Keywords: Big Data, Flink

Added by alexboyer on Mon, 31 Jan 2022 12:26:30 +0200