Hive tuning example analysis
Hive tuning with distribute by and group by
Group by fields in the table
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=10000000;
set hive.mapjoin.smalltable.filesize=200000000;
set hive.merge.mapfiles = true;
set hive.merge.mapredfiles = false; --MR Small ...
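The `noconditionaltask` settings above control when Hive turns a common join into a map join. According to the Hive configuration documentation, when `hive.auto.convert.join.noconditionaltask` is on and the summed size of the n-1 smaller tables in an n-way join is smaller than `hive.auto.convert.join.noconditionaltask.size`, the join is converted directly into a map join with no conditional task. Below is a minimal Python sketch of that rule. It assumes the table sizes in bytes are known up front. The function and table names are hypothetical. The real planner also weighs statistics, join type, and which table can be streamed.

```python
from typing import Dict

# Value of hive.auto.convert.join.noconditionaltask.size from the settings above (bytes).
NOCONDITIONAL_SIZE = 10000000


def can_convert_to_mapjoin(table_sizes: Dict[str, int],
                           threshold: int = NOCONDITIONAL_SIZE) -> bool:
    """Simplified model: treat the largest table as the streamed (big) table
    and check whether the remaining n-1 tables together stay under the threshold.

    Hypothetical helper, not a Hive API; boundary handling is a simplification.
    """
    if len(table_sizes) < 2:
        return False  # nothing to join
    sizes = sorted(table_sizes.values())
    small_tables_total = sum(sizes[:-1])  # every table except the largest
    return small_tables_total < threshold


if __name__ == "__main__":
    # Hypothetical sizes: a large fact table joined with two small dimension tables.
    print(can_convert_to_mapjoin({"fact_orders": 5000000000,
                                  "dim_user": 4000000,
                                  "dim_item": 5000000}))  # 9,000,000 bytes -> True
    print(can_convert_to_mapjoin({"fact_orders": 5000000000,
                                  "dim_user": 4000000,
                                  "dim_item": 7000000}))  # 11,000,000 bytes -> False
```

Raising the threshold lets larger dimension tables qualify, at the cost of more memory per map task. That trade-off is why the threshold is set explicitly in the tuning settings above.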
Added by jaydeee on Thu, 17 Feb 2022 20:24:42 +0200
Building a Hadoop ecosystem cluster
Hadoop cluster setup (continuously updated)
Relevant resource files that are not used in this article (extraction code: eeee)
1: Preparations to complete before starting the setup
A Linux server that is already set up
The server can access the public network (ping www.baidu.com) and can itself be pinged
Xshell connection (optional)
Server version infor ...
Added by ijug.net on Thu, 17 Feb 2022 10:50:58 +0200
Python: scheduling Spark tasks in parallel
Background
Translate PySpark code that implements a piece of business logic into Spark SQL, then use Spark SQL to backfill the historical data for the past six months (run day by day).
Core points
1) Translate the PySpark code into Spark SQL;
2) Using Spark SQL, backfill the historical data for the past six months (run day by day).
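The core points above come down to running the same Spark SQL once per day over a six-month window, with several days in flight at once. Below is a minimal Python sketch of that scheduling pattern using `concurrent.futures.ThreadPoolExecutor`. `run_for_day` is a hypothetical placeholder for whatever actually submits the job, such as a `spark-sql` command or a `spark.sql(...)` call. The default of 4 workers is an assumption, not a value from the article.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import List


def day_range(start: date, end: date) -> List[str]:
    """Return every day from start to end (inclusive) as 'YYYY-MM-DD'."""
    days = []
    current = start
    while current <= end:
        days.append(current.isoformat())
        current += timedelta(days=1)
    return days


def run_for_day(ds: str) -> str:
    # Hypothetical placeholder: a real job would submit the translated
    # Spark SQL for partition ds here and return its status.
    return "done " + ds


def backfill(start: date, end: date, max_workers: int = 4) -> List[str]:
    """Run run_for_day for each day in parallel; results keep date order."""
    days = day_range(start, end)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order,
        # even though the calls themselves may finish out of order.
        return list(pool.map(run_for_day, days))


if __name__ == "__main__":
    for status in backfill(date(2022, 1, 30), date(2022, 2, 1)):
        print(status)
```

Threads are used rather than processes because each call mostly waits on an external Spark job. Concurrency is then bounded only by `max_workers`, so size it to what the cluster queue can absorb.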
Implementation
1) First, pyspark is tra ...
Added by crimsonmoon on Fri, 11 Feb 2022 03:30:23 +0200
CDH 6.1: Upgrading Impala to version 3.4 to enable automatic metadata refresh, and solutions to the problems encountered
We upgraded Impala on CDH 6.1 and enabled automatic metadata refresh. We ran into some problems along the way and eventually solved them by checking the logs and source code, searching Google, and so on. This article sorts them out as a way of giving back to the community.
The main reference ...
Added by gwydionwaters on Tue, 08 Feb 2022 02:43:35 +0200
Installing Atlas for big data components based on the Apache release
Detailed installation record of Atlas 2.1.0 for big data components based on the Apache open-source releases (test environment)
Note: this Atlas installation draws on a large amount of online material. This record is only for my own future reference. If anything in this article infringes on your rights, please contact me immediately.
Component versions
Component name Com ...
Added by Dark.Munk on Wed, 02 Feb 2022 06:26:47 +0200
Statistics problem on Ant Forest plant applications (Hive example)
Ant Forest plant application statistics
Create two tables:
user_low_carbon: records each user's daily low-carbon activities in Ant Forest
plant_carbon: the Ant Forest plant exchange table, which records the carbon emission reduction required to apply for each environmental-protection plant
Table structure
Table 1 table_name: user_low_carb ...
Added by TheSeeker on Sun, 30 Jan 2022 14:59:02 +0200
Chapter VII: Partitioned tables [single partition, multiple partitions, dynamic partitions, modifying partitions]
1. What is partitioning
   1. Partitions in Hive are subdirectories (holding data files): table = directory, partition = directory
2. Why create partitions (benefits of partitions)
   1. Data isolation and query optimization
3. Single partition
-- Single partition
-- Create a partitioned table (single partition)
create table home.ods_front_log_dd (
log_id strin ...
Added by Fife Club on Sun, 30 Jan 2022 07:59:34 +0200
Hive partition notes
Hive partitions
1. First-level partitions
A partition in Hive is a subdirectory. The idea is similar to input splits in the map phase: splitting raises parallelism, and partitioning separates a table's data into distinct directories. When a query specifies the partition in its filter, Hive reads only that partition's directory instead of scanning the whole table, so partitioning is an optimization.
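Each partition is just a `key=value` subdirectory under the table directory. A filter on the partition column can therefore be answered by listing only the matching directories. Below is a minimal Python sketch of that idea. The warehouse path assumes Hive's default `/user/hive/warehouse/<db>.db/<table>` layout. The `demo.db/events` table and the `dt`/`hour` columns are hypothetical. Real Hive also escapes special characters in partition values, which this sketch ignores.

```python
from typing import Dict, List

# Hive's default warehouse directory; the database and table names are hypothetical.
TABLE_DIR = "/user/hive/warehouse/demo.db/events"


def partition_path(table_dir: str, spec: Dict[str, str]) -> str:
    """Build a partition directory such as <table_dir>/dt=2022-01-29.

    Multi-level partitions nest in the order given, e.g. dt=.../hour=...
    (special-character escaping is deliberately not handled here).
    """
    segments = [key + "=" + value for key, value in spec.items()]
    return "/".join([table_dir.rstrip("/")] + segments)


def prune(paths: List[str], key: str, value: str) -> List[str]:
    """Keep only directories containing the key=value segment, mimicking
    how a WHERE filter on a partition column skips the other directories."""
    wanted = key + "=" + value
    return [p for p in paths if wanted in p.split("/")]


if __name__ == "__main__":
    all_partitions = [partition_path(TABLE_DIR, {"dt": d})
                      for d in ("2022-01-27", "2022-01-28", "2022-01-29")]
    # Only one directory needs to be read instead of the whole table.
    print(prune(all_partitions, "dt", "2022-01-28"))
```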
The pa ...
Added by jeff21 on Sat, 29 Jan 2022 17:12:37 +0200
Importing and processing business data in an offline data warehouse
I'm not sure how good this is, but I'll explain it as best I can.
Data synchronization
The previous article covered using Sqoop to transfer data between MySQL and HDFS in both directions.
This is the second chapter of the offline data warehouse series, covering the processing of business data.
The basic business data of the offline data warehouse is stored in M ...
Added by kurtsu on Thu, 27 Jan 2022 19:00:14 +0200
Detailed installation of the full set of Hadoop components (Li Jian collection) -- taking you into the abyss of big data
Contents
Hadoop deployment
Deploy components
1, VMware deployment and installation
2, Deployment and installation of Ubuntu 18.04.5
3, Installing VMware Tools
4, Configure passwordless SSH login
5, Java environment installation
Hadoop installation
MySQL installation and deployment
Hive installation and deployment
Sqoop installati ...
Added by Entanio on Sun, 23 Jan 2022 03:34:05 +0200