Big Data [Page 21] - Programming VIP - Very Interesting Programming

Big Data

Flink reads Kafka data and sinks to ClickHouse

In real-time stream processing, Flink combined with ClickHouse is a common way to do real-time OLAP. The advantages of the two are not repeated here; this article walks through the overall process with one example. Overall process: import JSON-format data into a specific Kafka topic Wr ...

Added by dallasx on Thu, 23 Dec 2021 03:37:36 +0200

One SQL trick a day: how to use HQL to extract string elements from fixed positions [Hive string position lookup functions explained]

Contents: 0 Problem description; 1 Solution; 2 Summary. 0 Problem description: extracting string elements from a fixed position with SQL. You have a string that contains a continuous piece of log data. You want to parse the string and extract some information from it. However, the information you need does not exist in the fixed position of t ...

Added by colbyg on Wed, 22 Dec 2021 20:23:21 +0200

ActiveMQ Message Queuing implements Point-to-Point (Queue) and Publish/Subscribe (Topic)

JMS (Message Queue). Preface: JMS, the Java Message Service application interface, is a Java platform API for message-oriented middleware (MOM), used to send messages between two applications for asynchronous communication. JMS is a vendor-independent API for accessing and receiving system messages, similar to JDBC (Java Database Co ...

Added by Bac on Wed, 22 Dec 2021 07:36:45 +0200
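
For the ActiveMQ entry above, the difference between the two JMS messaging models can be illustrated with a minimal in-memory sketch. This is plain Python, not the JMS or ActiveMQ API; the class names `PointToPointQueue` and `Topic` and the message strings are hypothetical and exist only to show the delivery semantics: a queue message is consumed by exactly one receiver, while a topic message is copied to every current subscriber.

```python
from collections import deque


class PointToPointQueue:
    """Hypothetical stand-in for a queue: each message goes to one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other consumer can receive it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Hypothetical stand-in for a topic: each subscriber gets its own copy."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        inbox = []
        self._subscribers.append(inbox)
        return inbox

    def publish(self, message):
        # Deliver to every subscriber registered at publish time.
        for inbox in self._subscribers:
            inbox.append(message)


# Point-to-point: two consumers compete for one message.
queue = PointToPointQueue()
queue.send("order-1")
first_consumer_got = queue.receive()
second_consumer_got = queue.receive()

# Publish/subscribe: both subscribers see the same message.
topic = Topic()
inbox_a = topic.subscribe()
inbox_b = topic.subscribe()
topic.publish("price-update")
```

Here the first consumer receives `"order-1"` and the second finds the queue empty, whereas both topic inboxes end up holding `"price-update"`.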

Introduction to canal and its deployment, principle and use

Introduction to Alibaba canal: deployment, principle, and use. What is canal? Because of the nature of Alibaba's B2B business, with sellers mainly concentrated in China and buyers mainly overseas, a need arose for remote data centers in Hangzhou and t ...

Added by ankit17_ag on Wed, 22 Dec 2021 06:28:20 +0200

Hadoop distributed platform construction

Building a Hadoop distributed platform on Linux. First, if the Linux network cannot connect, click "Edit" in the VMware main interface and select "Virtual Network Editor". Once inside, restore the default settings in the following two steps; this generally restores the connection. 1. Environmental ...

Added by Daveyz83 on Mon, 20 Dec 2021 15:04:22 +0200

Spark shared variable

By default, if an external variable is used in an operator function, its value is copied to each task, and each task can operate only on its own copy. If multiple tasks need to share a variable, this approach does not work. Spark provides two kinds of shared variables for this purpose: One is broadcast varia ...

Added by The Stewart on Sun, 19 Dec 2021 22:53:16 +0200
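
For the Spark shared variable entry above, the copy-per-task behaviour can be illustrated with a small simulation. This is plain Python, not the Spark API: `run_tasks`, `count_into_copy`, and the sample partitions are hypothetical names chosen for the sketch, and `copy.deepcopy` stands in for shipping a captured variable to each task. It shows why a task's update to a captured variable never reaches the driver, and how having tasks report partial results that the driver merges avoids the problem.

```python
import copy


def run_tasks(partitions, task, captured):
    """Simulate a driver shipping `captured` to each task as an independent copy."""
    results = []
    for part in partitions:
        local = copy.deepcopy(captured)  # each task receives its own copy
        results.append(task(part, local))
    return results


def count_into_copy(part, counter):
    # Mutates only the task-local copy; the driver's object is untouched.
    counter["n"] += len(part)
    return counter["n"]


partitions = [[1, 2, 3], [4, 5], [6]]
driver_counter = {"n": 0}

per_task = run_tasks(partitions, count_into_copy, driver_counter)

# The driver's counter was never updated by the tasks.
unchanged = driver_counter["n"]

# Merging the partial results on the driver gives the intended total.
total = sum(per_task)
```

After the run, `driver_counter["n"]` is still 0 even though every task incremented its copy, while the merged `total` is 6.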

[hard big data] summary of Flink's enterprise application in real-time computing platform and real-time data warehouse

Blog home page: https://blog.csdn.net/u013411339 Likes, bookmarks, comments, and discussion are welcome! This article was originally written by [Wang Zhiwu] and first published on the CSDN blog and the CSDN forum. Reprinting without permission from CSDN and the author is strictly prohibited! This artic ...

Added by Adam_28 on Sun, 19 Dec 2021 13:25:10 +0200

Time series data analysis

Reference: the Zhihu article "Time series data analysis 101" by Li Jianyang. This post also adds a summary of evaluation methods for classification and clustering, with a Python implementation. 1 Preparing and processing time series data. 1.1 Preparing data sets: looking for ready-made data in open source data ware ...

Added by manx on Sun, 19 Dec 2021 03:32:28 +0200

Hadoop optimization: HDFS troubleshooting

This post covers troubleshooting Hadoop HDFS, including NameNode fault handling, cluster safe mode, and disk repair. Corrections are welcome. Thanks! NameNode fault handling. 1. Scenario: the NameNode process has died and its stored data is lost. How can the NameNode be recovered? 2. Fault simulation: (1) kill -9 NameNode proce ...

Added by Stressed on Sat, 18 Dec 2021 15:50:32 +0200

Big data Spark Structured Streaming

1 Shortcomings of Spark Streaming. In 2016, Apache Spark launched the Structured Streaming project, a new stream computing engine based on Spark SQL, which lets users write high-performance stream processing programs as easily as batch programs. Structured Streaming is not a simple improvement to Spark Streaming, but a new stre ...

Added by Gamic on Sat, 18 Dec 2021 07:26:14 +0200
