Hive small case - combining window functions, conditional statements, date conversion and time-average calculation
What needs to be done: using a full table, calculate the average start and end times of a task over seven days
1. Data introduction
The data table is a fully synchronized table partitioned by date. It contains the start time, the end time, the start time in total seconds (seconds elapsed since midnight), and total seconds of end ...
Added by quicknik on Mon, 21 Feb 2022 16:58:37 +0200
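The seven-day average in the case above boils down to averaging "seconds since midnight" values over a date window and converting the mean back to a clock time. The following is a minimal Python sketch of that arithmetic, not the author's Hive SQL. It assumes each row is a hypothetical `(partition_date, start_seconds, end_seconds)` tuple and that the window is the seven partitions ending at a given day, inclusive.

```python
from datetime import date, timedelta

def seconds_to_hms(total: float) -> str:
    """Format seconds since midnight as HH:MM:SS, rounded to the nearest second."""
    s = int(round(total))
    h, rem = divmod(s, 3600)
    m, sec = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{sec:02d}"

def hms_to_seconds(hms: str) -> int:
    """Convert HH:MM:SS back to seconds since midnight."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def seven_day_average(rows, end_day: date):
    """rows: iterable of hypothetical (partition_date, start_seconds, end_seconds).
    Averages only partitions in the 7-day window ending at end_day (inclusive)."""
    start_day = end_day - timedelta(days=6)
    window = [r for r in rows if start_day <= r[0] <= end_day]
    if not window:
        return None  # no partitions in the window
    avg_start = sum(r[1] for r in window) / len(window)
    avg_end = sum(r[2] for r in window) / len(window)
    return seconds_to_hms(avg_start), seconds_to_hms(avg_end)

# Usage: eight daily partitions; the oldest one falls outside the 7-day window.
rows = [(date(2022, 2, 14), 0, 0)]
starts = [3600] * 6 + [4020]
for i, st in enumerate(starts):
    rows.append((date(2022, 2, 15) + timedelta(days=i), st, 7200))
print(seven_day_average(rows, date(2022, 2, 21)))
```

Averaging the numeric seconds column rather than the time strings keeps the mean well defined; the conversion to HH:MM:SS happens only once, at the end.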
Hive tuning example analysis
Tuning a Hive application that groups data with distribute by
Grouping by fields in the table
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=10000000;
set hive.mapjoin.smalltable.filesize=200000000;
set hive.merge.mapfiles = true;
set hive.merge.mapredfiles = false; --MR Small ...
Added by jaydeee on Thu, 17 Feb 2022 20:24:42 +0200
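The idea behind `distribute by` in the post above is that all rows with the same key are sent to the same reducer. Here is a minimal Python sketch of hash-based routing of that kind. It uses `zlib.crc32` purely as a stand-in hash; Hive's actual hash function and partitioner differ, and the function name `distribute_by` is hypothetical.

```python
import zlib
from collections import defaultdict

def distribute_by(rows, key, num_reducers):
    """Route each row to a reducer bucket by hash(key) % num_reducers.
    crc32 is a stand-in; Hive's real hash function is different."""
    buckets = defaultdict(list)
    for row in rows:
        k = str(row[key]).encode("utf-8")
        buckets[zlib.crc32(k) % num_reducers].append(row)
    return dict(buckets)

# Usage: every row for a given uid lands in exactly one bucket.
rows = [{"uid": u, "v": i} for i, u in enumerate(["a", "b", "a", "c", "b", "a"])]
print(distribute_by(rows, "uid", 3))
```

Because routing depends only on the key, a heavily repeated key (here `"a"`) always piles into one bucket. That is why skewed keys are a common concern when tuning `distribute by` jobs.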
Kettle data synchronization: perfected version
A perfected Kettle approach to incremental data synchronization
Preface
Some time ago, I used Kettle to implement data synchronization, covering the installation and configuration of Kettle, the creation of a job, the creation of a transformation, etc.
At that time, a hard-coded time point was used (that is, the data wil ...
Added by EODC on Sat, 29 Jan 2022 21:55:08 +0200
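The excerpt above contrasts a hard-coded time point with proper incremental synchronization. One common generic pattern for the latter is a stored "watermark": each run copies only rows updated after the last recorded timestamp, then advances the watermark. The following is a minimal Python sketch of that pattern under stated assumptions. It is not the post's Kettle job, and the field names `id` and `update_time` are hypothetical.

```python
from datetime import datetime

def incremental_sync(source_rows, target, last_watermark):
    """Upsert rows whose (hypothetical) update_time is newer than last_watermark
    into target (a dict keyed by the hypothetical 'id'); return the new watermark."""
    new_rows = [r for r in source_rows if r["update_time"] > last_watermark]
    for r in new_rows:
        target[r["id"]] = r  # upsert: insert or overwrite by primary key
    if new_rows:
        return max(r["update_time"] for r in new_rows)
    return last_watermark  # nothing new; keep the old watermark

# Usage: the first run picks up rows 2 and 3; a second run copies nothing.
src = [
    {"id": 1, "update_time": datetime(2022, 1, 1, 8)},
    {"id": 2, "update_time": datetime(2022, 1, 2, 8)},
    {"id": 3, "update_time": datetime(2022, 1, 3, 8)},
]
target = {}
wm = incremental_sync(src, target, datetime(2022, 1, 1, 12))
print(wm, sorted(target))
```

The strict `>` comparison avoids re-copying the boundary row. However, it can miss rows that share the watermark timestamp but are committed later, so real jobs often re-read a small overlap and rely on the upsert to stay idempotent.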
Doris storage file format optimization
File format
A file consists of:
An 8-byte magic code at the beginning of the file, which identifies the file format and version
Data Region: stores the data of each column; this data is loaded page by page on demand
Index Region: Doris uniformly stores the index data of each ...
Added by Napper on Thu, 27 Jan 2022 06:22:27 +0200
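The leading magic code described in the post above lets a reader reject files of the wrong format or version before parsing any regions. Here is a minimal Python sketch of writing and checking such an 8-byte header. The value `SEGMENT1` is a hypothetical placeholder, not Doris's real magic code, and the rest of the file is treated as an opaque payload rather than real Data and Index Regions.

```python
import io

MAGIC = b"SEGMENT1"  # hypothetical 8-byte magic code, NOT Doris's actual value
assert len(MAGIC) == 8

def write_file(buf, payload: bytes) -> None:
    """Write the magic header followed by an opaque payload."""
    buf.write(MAGIC)
    buf.write(payload)

def read_payload(buf) -> bytes:
    """Verify the first 8 bytes before reading the remainder of the file."""
    header = buf.read(8)
    if header != MAGIC:
        raise ValueError("unrecognized file format or version")
    return buf.read()

# Usage with an in-memory file.
buf = io.BytesIO()
write_file(buf, b"col-data")
buf.seek(0)
print(read_payload(buf))
```

Checking a fixed-size prefix is cheap and happens before any region is loaded. This fits the on-demand, page-by-page loading the post describes.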
Basic steps of Flink programming and loading different types of data sources
Basic steps of Flink programming:
1. Create the stream execution environment: StreamExecutionEnvironment.getExecutionEnvironment() returns the stream environment.
2. Load the data Source
3. Transformation
4. Output Sink: write the results to other data warehouses, or print them directly
Basic operations on Flink data fall into four categories
Operation of a single ...
Added by Oxymen on Wed, 26 Jan 2022 23:28:31 +0200
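The four steps listed above (environment, source, transformation, sink) can be mirrored in a toy pipeline. The following Python sketch is only an analogy to show the shape of a Flink program. `MiniStreamEnv` and its methods are hypothetical and are NOT the Flink or PyFlink API.

```python
from typing import Callable, Iterable, List

class MiniStreamEnv:
    """Toy stand-in for a stream execution environment (NOT the Flink API)."""

    def __init__(self):
        self.source: Iterable = []
        self.transforms: List[Callable] = []
        self.sink: Callable = print  # default sink: print each record

    def from_collection(self, data):  # step 2: load the source
        self.source = data
        return self

    def map(self, fn):  # step 3: transformation
        self.transforms.append(lambda it: (fn(x) for x in it))
        return self

    def filter(self, pred):  # step 3: transformation
        self.transforms.append(lambda it: (x for x in it if pred(x)))
        return self

    def add_sink(self, fn):  # step 4: output sink
        self.sink = fn
        return self

    def execute(self):
        """Nothing runs until execute() wires source -> transforms -> sink."""
        stream = iter(self.source)
        for t in self.transforms:
            stream = t(stream)
        for record in stream:
            self.sink(record)

# Usage: step 1 creates the environment; then source, transforms, sink, execute.
out = []
env = MiniStreamEnv()
env.from_collection([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15).add_sink(out.append)
env.execute()
print(out)
```

As in Flink, building the pipeline only records the steps. The data flows when `execute()` is called, which is why the final execute call is a required part of the program.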
Big data warehouse technology training task 3
Big data warehouse training - task 3
Data analysis and prediction of Taobao Double 11
Case introduction
The Taobao Double 11 data analysis and prediction case covers the typical operations across the whole data-processing workflow, such as data preprocessing, storage, querying and visual analysis, including the installatio ...
Added by mark_nsx on Fri, 21 Jan 2022 07:36:11 +0200
Chapter 2 Hive installation
2.1 Hive installation addresses
1. Hive official website address
http://hive.apache.org/
2. Documentation address
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
3. Download address
http://archive.apache.org/dist/hive/
4. GitHub address
https://github.com/apache/hive
2.2 Hive installation a ...
Added by weknowtheworld on Sun, 16 Jan 2022 01:17:49 +0200
On Hive advanced functions
Basic function operations
View the description of a specified function: desc function function_name;
Show a function's extended description: desc function extended function_name;
Typical advanced functions
Top-N within groups after sorting
To sort within groups, use the row_number function together with over. row_ ...
Added by alexhard on Fri, 14 Jan 2022 10:47:21 +0200
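The grouped Top-N pattern mentioned above keeps rows whose `row_number() over (partition by g order by v desc)` is at most N. The following Python sketch reproduces those semantics in memory for illustration. It is not Hive code, and the column names `cls` and `score` in the usage example are hypothetical.

```python
from collections import defaultdict

def top_n_per_group(rows, group_key, order_key, n):
    """Mimic: row_number() over (partition by group_key order by order_key desc) <= n.
    Adds the row number as 'rn' to each kept row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)  # partition by group_key
    result = []
    for members in groups.values():
        ranked = sorted(members, key=lambda r: r[order_key], reverse=True)
        for rn, row in enumerate(ranked[:n], start=1):  # row_number starts at 1
            result.append({**row, "rn": rn})
    return result

# Usage: top 2 scores per class.
rows = [
    {"cls": "a", "score": 90}, {"cls": "a", "score": 80}, {"cls": "a", "score": 70},
    {"cls": "b", "score": 60}, {"cls": "b", "score": 95},
]
print(top_n_per_group(rows, "cls", "score", 2))
```

Like `row_number`, this assigns distinct numbers even to tied values. Python's stable sort keeps ties in input order, whereas Hive makes no such guarantee, so tie order there can vary between runs.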
Hive [environment setup 02] [hive-3.1.2 HiveServer2/beeline configuration and use]
Hive has two built-in services, HiveServer and HiveServer2, both of which allow clients to connect using multiple programming languages. However, HiveServer cannot handle concurrent requests from multiple clients, which is why HiveServer2 was created. HiveServer2 (HS2) allows remote clients to submit requests to Hive and retrieve results in various programmi ...
Added by greenber on Sun, 02 Jan 2022 03:01:29 +0200
Hive tuning idea - knowledge summary
Hive tuning:
Choosing an appropriate "storage format" and "compression method" for the analyzed data can improve Hive's analysis efficiency.
Data compression format:
When selecting a compression algorithm, you need to consider whether it is splittable. If splitting is not supported (the integrity of a pi ...
Added by ZHarvey on Thu, 30 Dec 2021 02:06:19 +0200
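The splittability point in the post above matters because a non-splittable file is read by a single map task no matter how large it is. This Python sketch gives a simplified estimate of input splits for a raw compressed text file. It assumes a 128 MB HDFS block size and ignores min/max split-size settings. The codec table reflects commonly cited behavior for plain files (LZO only with an index; Snappy becomes splittable only inside container formats such as ORC, Parquet or SequenceFile), and the function name is hypothetical.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size (128 MB)

# Splittability of a single raw compressed text file (simplified assumption).
SPLITTABLE = {
    "none": True,          # uncompressed text
    "bzip2": True,
    "lzo_indexed": True,   # LZO only splits when an index has been built
    "gzip": False,
    "snappy": False,       # raw Snappy file; container formats differ
}

def estimate_input_splits(file_size: int, codec: str, block_size: int = BLOCK_SIZE) -> int:
    """Simplified estimate of how many map tasks read one file."""
    if SPLITTABLE[codec]:
        return max(1, math.ceil(file_size / block_size))
    return 1  # the whole file must be decompressed by a single task

# Usage: a 1 GB file.
one_gb = 1024 * 1024 * 1024
print(estimate_input_splits(one_gb, "gzip"), estimate_input_splits(one_gb, "bzip2"))
```

With 8 splits instead of 1, the bzip2 file can be read in parallel, which is the efficiency difference the post is pointing at when it weighs splittability against compression ratio and speed.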