Flink reads Kafka data and sinks to Clickhouse
In real-time stream processing, a common way to do real-time OLAP is to combine Flink with ClickHouse. The advantages of each are not repeated here. This post uses a case study to briefly introduce the overall process.
Overall process:
Import JSON-formatted data into specific Kafka topics. Wr ...
Added by dallasx on Thu, 23 Dec 2021 03:37:36 +0200
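The excerpt stops after the first step. As a rough sketch under stated assumptions (not the author's Flink job), the Python below serializes records as JSON message values, as they would be written to a Kafka topic, then parses them and buffers them into batches before handing each batch to a placeholder flush callback that stands in for a ClickHouse insert. Batching is shown because ClickHouse generally favors fewer, larger inserts. `to_kafka_value`, `BatchingSink`, and `batch_size` are hypothetical names, and no Kafka, Flink, or ClickHouse client is used.

```python
import json
from typing import Callable, Dict, List


def to_kafka_value(record: Dict) -> bytes:
    """Serialize one record as a compact JSON message value (UTF-8 bytes)."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


class BatchingSink:
    """Hypothetical sink: parses JSON message values and flushes them in batches.

    The flush callback is a placeholder for a real ClickHouse batch insert.
    """

    def __init__(self, flush: Callable[[List[Dict]], None], batch_size: int = 3) -> None:
        self._flush = flush
        self._batch_size = batch_size
        self._buffer: List[Dict] = []

    def write(self, message_value: bytes) -> None:
        # Decode the Kafka message value back into a row and buffer it.
        self._buffer.append(json.loads(message_value.decode("utf-8")))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand off the current batch and start a fresh buffer.
        if self._buffer:
            self._flush(self._buffer)
            self._buffer = []
```

For example, with `batch_size=2`, writing five records and calling `flush()` at the end produces batches of 2, 2, and 1 rows. A real job would also flush on a timer so that a partially filled batch is not held indefinitely.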
One SQL trick a day: how to use HQL to extract string elements from fixed positions [explaining Hive's string position lookup functions]
Contents
0 Problem description
1 Solution
2 Summary
0 Problem description
Extracting string elements from fixed positions with SQL: you have a string that contains a continuous piece of log data, and you want to parse it and extract some information from it. However, the information you need is not at a fixed position in t ...
Added by colbyg on Wed, 22 Dec 2021 20:23:21 +0200
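The excerpt is cut off before the solution. As an assumed illustration only (not the author's HQL), the Python sketch below shows the idea behind position-lookup extraction: a helper modeled on Hive's `locate(substr, str[, pos])`, which returns a 1-based position or 0 when the substring is absent, finds a `key=` marker and reads the value up to the next delimiter. The comma-separated `key=value` log format and the `extract_field` helper are hypothetical.

```python
from typing import Optional


def locate(substr: str, s: str, pos: int = 1) -> int:
    """Mimic Hive's locate(substr, str[, pos]): 1-based position of the first
    occurrence of substr at or after position pos, or 0 if it does not occur."""
    idx = s.find(substr, pos - 1)
    return idx + 1 if idx >= 0 else 0


def extract_field(log: str, key: str, delim: str = ",") -> Optional[str]:
    """Extract the value of a hypothetical `key=value` field from a log string
    whose fields are not at fixed offsets."""
    start = locate(key + "=", log)
    if start == 0:
        return None  # key not present in this log line
    value_start = start + len(key) + 1     # 1-based position just after "key="
    end = locate(delim, log, value_start)  # next delimiter after the value
    # Convert 1-based Hive-style positions back to Python slice indices.
    return log[value_start - 1:] if end == 0 else log[value_start - 1:end - 1]
```

For the sample line `ts=2021-12-22,user=alice,action=login`, `extract_field(line, "user")` returns `"alice"` and `extract_field(line, "action")` returns `"login"`. This simple search would also match a key that is a suffix of another key (for example `user=` inside `superuser=`); a stricter version would check the character before the match.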
Implementing point-to-point (Queue) and publish/subscribe (Topic) messaging with ActiveMQ
(Message Queue) JMS
Preface
JMS (Java Message Service) is a Java platform API for message-oriented middleware (MOM), used to send messages between two applications for asynchronous communication. JMS is a vendor-independent API for accessing messaging systems, similar to JDBC (Java Database Co ...
Added by Bac on Wed, 22 Dec 2021 07:36:45 +0200
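The difference between the two JMS messaging models in the title can be shown with a toy in-memory model. This is not the JMS or ActiveMQ API; `ToyQueue` and `ToyTopic` are hypothetical classes that only illustrate delivery semantics: a queue message is consumed by exactly one receiver, while a topic message is copied to every subscriber registered at publish time (non-durable subscribers only; JMS durable subscriptions are not modeled).

```python
from collections import deque
from typing import Deque, List, Optional


class ToyQueue:
    """Point-to-point: each message is delivered to exactly one consumer."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, msg: str) -> None:
        self._messages.append(msg)

    def receive(self) -> Optional[str]:
        # Removing the message means no other consumer can receive it.
        return self._messages.popleft() if self._messages else None


class ToyTopic:
    """Publish/subscribe: each message goes to every current subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Deque[str]] = []

    def subscribe(self) -> Deque[str]:
        # Each subscriber gets its own inbox; only later messages reach it.
        inbox: Deque[str] = deque()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, msg: str) -> None:
        for inbox in self._subscribers:
            inbox.append(msg)
```

With one queue message and two consumers, the first `receive()` gets the message and the second gets `None`; with a topic, two subscribers both receive the same message, and a subscriber added afterwards receives nothing.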
Introduction to canal: deployment, principles, and usage
Introduction to Alibaba canal: deployment, principles, and usage
Introduction to canal
What is canal?
Because of the nature of Alibaba's B2B business, sellers are concentrated mainly in China and buyers mainly abroad, which gave rise to the need for geographically separate data centers in Hangzhou and t ...
Added by ankit17_ag on Wed, 22 Dec 2021 06:28:20 +0200
Building a Hadoop distributed platform
Building a Hadoop distributed platform on Linux
First, if the Linux VM cannot connect to the network, click "Edit" in the VMware main window and select "Virtual Network Editor". Once inside, restore the default settings in the following two steps; this usually restores the connection. 1. Environmental ...
Added by Daveyz83 on Mon, 20 Dec 2021 15:04:22 +0200
Spark shared variables
By default, when an operator function uses an external variable, the variable's value is copied to each task, and each task can only operate on its own copy. This approach therefore does not work when multiple tasks need to share a variable.
Spark provides two kinds of shared variables for this purpose:
One is broadcast varia ...
Added by The Stewart on Sun, 19 Dec 2021 22:53:16 +0200
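The copy-per-task behavior described above, and why Spark's two shared-variable types (broadcast variables and accumulators) exist, can be illustrated with a toy model in plain Python. This is not Spark's API: `run_tasks`, `count_into_copy`, and `ToyAccumulator` are hypothetical names. Each simulated task mutates its own deep copy of a captured dictionary, so the driver's object never changes, while an accumulator-style merge of per-task partial results gives the driver the combined total.

```python
import copy
from typing import Callable, Dict, List


def run_tasks(partitions: List[List[int]],
              task_fn: Callable[[List[int], Dict[str, int]], int],
              captured: Dict[str, int]) -> List[int]:
    # Toy model (not Spark's API): each task works on its own deep copy of the
    # captured variable, as if the closure were serialized and shipped per task.
    return [task_fn(part, copy.deepcopy(captured)) for part in partitions]


def count_into_copy(part: List[int], counter: Dict[str, int]) -> int:
    # Updates only this task's private copy; the driver's object is untouched.
    counter["n"] += len(part)
    return counter["n"]


class ToyAccumulator:
    """Accumulator-style sharing: tasks produce partial values and the driver
    merges them, so the total is visible in one place."""

    def __init__(self, zero: int = 0) -> None:
        self.value = zero

    def merge(self, partials: List[int]) -> None:
        for p in partials:
            self.value += p


partitions = [[1, 2, 3], [4, 5], [6]]
driver_counter = {"n": 0}
partials = run_tasks(partitions, count_into_copy, driver_counter)
acc = ToyAccumulator()
acc.merge(partials)
print(driver_counter["n"], partials, acc.value)  # prints: 0 [3, 2, 1] 6
```

In PySpark, the real counterparts are created with `SparkContext.broadcast(...)` (a read-only value shipped once per executor rather than once per task) and `SparkContext.accumulator(...)` (tasks call `add`, and only the driver reads `value`).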
[Hardcore big data] Summary of Flink's enterprise applications in real-time computing platforms and real-time data warehouses
Blog home page: https://blog.csdn.net/u013411339. This article was originally written by [Wang Zhiwu] and first published on the CSDN blog. Reprinting without the permission of CSDN and the author is strictly prohibited.
This artic ...
Added by Adam_28 on Sun, 19 Dec 2021 13:25:10 +0200
Time series data analysis
Reference: Zhihu article "Time Series Data Analysis 101" by Li Jianyang.
In addition, this post adds a summary of evaluation methods for classification and clustering, with Python implementations.
1 Preparing and processing time series data
1.1 Preparing datasets
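The excerpt does not show which evaluation methods are covered. As an illustrative assumption, the sketch below computes common binary classification metrics (accuracy, precision, recall, F1) from true and predicted labels in plain Python; the function name and the choice of metrics are not taken from the original post.

```python
from typing import Dict, Hashable, Sequence


def binary_classification_metrics(y_true: Sequence[Hashable],
                                   y_pred: Sequence[Hashable],
                                   positive: Hashable = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard every ratio against a zero denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For `y_true = [1, 0, 1, 1, 0]` and `y_pred = [1, 0, 0, 1, 1]` there are 2 true positives, 1 false positive, and 1 false negative, giving accuracy 0.6 and precision, recall, and F1 all equal to 2/3.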
Looking for ready-made data in open source data ware ...
Added by manx on Sun, 19 Dec 2021 03:32:28 +0200
Hadoop optimization: troubleshooting HDFS
This post covers troubleshooting Hadoop HDFS, including NameNode fault handling, cluster safe mode, and disk repair. Corrections are welcome. Thanks!
NameNode (nn) fault handling
1. Scenario: the NameNode process crashes and its stored data is lost. How can the NameNode be recovered?
2. Fault simulation
(1) kill -9 NameNode proce ...
Added by Stressed on Sat, 18 Dec 2021 15:50:32 +0200
Big data: Spark Structured Streaming
1 Shortcomings of Spark Streaming
In 2016, Apache Spark launched the Structured Streaming project, a new stream computing engine based on Spark SQL, which allows users to write high-performance stream processing programs as easily as writing batch programs. Structured Streaming is not a simple improvement to Spark Streaming, but a new stre ...
Added by Gamic on Sat, 18 Dec 2021 07:26:14 +0200
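The claim that Structured Streaming lets users write streaming programs as easily as batch programs rests on its model of treating the input stream as an unbounded table to which new rows are continuously appended, with a result table updated on each trigger. The toy Python model below (not Spark's API; `ToyStreamingWordCount` and `batch_word_count` are hypothetical names) folds only each trigger's new rows into a running word count and emits the whole result, analogous to the "complete" output mode, then checks that the streaming result matches a batch count over the same input.

```python
from collections import Counter
from typing import Dict, List


class ToyStreamingWordCount:
    """Toy model of the "unbounded input table" idea, not Spark's API.

    Rows arriving in each trigger are treated as appended to an input table,
    and the result table is updated incrementally instead of recomputed.
    """

    def __init__(self) -> None:
        # Aggregation state carried across triggers; input rows are not kept.
        self._counts: Counter = Counter()

    def trigger(self, new_lines: List[str]) -> Dict[str, int]:
        # Fold only the newly appended rows into the state.
        for line in new_lines:
            self._counts.update(line.split())
        # Emit the full result table, like the "complete" output mode.
        return dict(self._counts)


def batch_word_count(lines: List[str]) -> Dict[str, int]:
    # Batch equivalent: count over the whole bounded input at once.
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)
```

After triggers with `["cat dog", "dog"]` and then `["dog owl"]`, the emitted result is `{"cat": 1, "dog": 3, "owl": 1}`, the same as `batch_word_count` over all three lines. That equivalence between incremental and batch results is the property the unbounded-table model is meant to provide.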