Hive small case study: window functions, conditional statements, date conversion, and mean time calculation

Goal: over the full table, calculate the average start and end time of each task across seven days. 1. Data introduction: The data table is a full synchronization table, partitioned by date. It contains the start time, end time, total seconds of start time (seconds elapsed since midnight), and total seconds of end ...

Added by quicknik on Mon, 21 Feb 2022 16:58:37 +0200

Hive tuning example analysis

Hive tuning with distribute by for grouping. Group by fields in the table: set hive.auto.convert.join=true; set hive.auto.convert.join.noconditionaltask=true; set hive.auto.convert.join.noconditionaltask.size=10000000; set hive.mapjoin.smalltable.filesize=200000000; set hive.merge.mapfiles = true; set hive.merge.mapredfiles = false; --MR Small ...

Added by jaydeee on Thu, 17 Feb 2022 20:24:42 +0200

Kettle data synchronization, perfect version

Perfect version of incremental data synchronization with Kettle. Preface: Some time ago, Kettle was used to implement data synchronization, including installing and configuring Kettle, creating a job, creating a transformation, etc. At that time, a hard-coded time point was used (that is, the data wil ...

Added by EODC on Sat, 29 Jan 2022 21:55:08 +0200

Doris storage file format optimization

Doris storage file format optimization. File format: the file includes an 8-byte magic code at the beginning, which identifies the file format and version. Data Region: stores the data of each column; the data here is loaded by page on demand. Index Region: Doris uniformly stores the index data of each ...

Added by Napper on Thu, 27 Jan 2022 06:22:27 +0200

Basic steps of Flink programming and loading different types of data sources

Basic steps of Flink programming: 1. Create the stream execution environment: StreamExecutionEnvironment.getExecutionEnvironment() gets the stream environment. 2. Load the data Source. 3. Transformation. 4. Output to a Sink: write the results to another data store or print them directly. Basic operations on Flink data fall into four categories: Operation of a single ...

Added by Oxymen on Wed, 26 Jan 2022 23:28:31 +0200

Big data warehouse technology training task 3

Big data warehouse training, task 3: data analysis and prediction for Taobao Double 11. Case introduction: The Taobao Double 11 data analysis and prediction case covers the typical operations in the whole data processing workflow, such as data preprocessing, storage, query, and visual analysis, including the installatio ...

Added by mark_nsx on Fri, 21 Jan 2022 07:36:11 +0200

Chapter 2 Hive installation

Chapter 2 Hive installation. 2.1 Hive installation addresses: 1. Hive official website: http://hive.apache.org/ 2. Documentation: https://cwiki.apache.org/confluence/display/Hive/GettingStarted 3. Download: http://archive.apache.org/dist/hive/ 4. GitHub: https://github.com/apache/hive 2.2 Hive installation a ...

Added by weknowtheworld on Sun, 16 Jan 2022 01:17:49 +0200

On Hive advanced functions

Basic function operations. View the description of a specified function: desc function function_name; Display extended information for a function: desc function extended function_name; Typical advanced functions. Grouped sorting to take the TopN: to implement grouped sorting, you need the row_number function together with an over clause. row_ ...

Added by alexhard on Fri, 14 Jan 2022 10:47:21 +0200

Hive [environment setup 02] [hive-3.1.2: HiveServer2/beeline configuration and use]

Hive has built-in HiveServer and HiveServer2 services, both of which allow clients to connect using multiple programming languages. However, HiveServer cannot handle concurrent requests from multiple clients, so HiveServer2 was created. HiveServer2 (HS2) allows remote clients to submit requests to Hive and retrieve results in various programmi ...

Added by greenber on Sun, 02 Jan 2022 03:01:29 +0200

Hive tuning idea - knowledge summary

Hive tuning: Choosing an appropriate "storage format" and "compression method" for the data being analyzed can improve Hive's analysis efficiency. Data compression format: When selecting a compression algorithm, you need to consider whether it is splittable. If splitting is not supported (the integrity of a pi ...

Added by ZHarvey on Thu, 30 Dec 2021 02:06:19 +0200