Hive tuning example analysis

Hive distribute by / group by application tuning. Group by fields in the table: set hive.auto.convert.join=true; set hive.auto.convert.join.noconditionaltask=true; set hive.auto.convert.join.noconditionaltask.size=10000000; set hive.mapjoin.smalltable.filesize=200000000; set hive.merge.mapfiles=true; set hive.merge.mapredfiles=false; -- MR Small ...

Added by jaydeee on Thu, 17 Feb 2022 20:24:42 +0200

Building a Hadoop cluster ecosystem

Hadoop cluster construction (continuously updated). The relevant resource files that are not used in this article; extraction code: eeee. 1: Preparations to complete before starting construction: a Linux server has been built; it can access the public network (ping www.baidu.com) and can itself be pinged; Xshell connection (optional); Server version infor ...

Added by ijug.net on Thu, 17 Feb 2022 10:50:58 +0200

Scheduling Spark tasks in parallel with Python

Background: Translate PySpark code that implements a piece of business logic into Spark SQL, then use the Spark SQL to backfill the historical data for the past six months (one run per day). Key points: 1) translate PySpark to Spark SQL; 2) using the Spark SQL, backfill the historical data of the past half year (one run per day). Implementation: 1) First, PySpark is tra ...

Added by crimsonmoon on Fri, 11 Feb 2022 03:30:23 +0200
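The "run by day" backfill described in the excerpt above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the post's actual code: `daily_partitions`, `run_day`, and `backfill` are hypothetical names, and `run_day` is a placeholder where a real script would submit that day's Spark SQL job.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def daily_partitions(start: date, end: date) -> list[str]:
    """Return every day from start to end (inclusive) as YYYY-MM-DD strings."""
    span = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(span + 1)]


def run_day(day: str) -> tuple[str, bool]:
    """Hypothetical placeholder for one day's job.

    A real version would submit the Spark SQL for `day` (for example through
    the cluster's own submit command) and report whether it succeeded.
    """
    return day, True


def backfill(start: date, end: date, max_workers: int = 4) -> dict[str, bool]:
    """Run one job per day, at most `max_workers` at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the dict is ordered by day.
        return dict(pool.map(run_day, daily_partitions(start, end)))


if __name__ == "__main__":
    results = backfill(date(2021, 8, 1), date(2021, 8, 10))
    failed = [day for day, ok in results.items() if not ok]
    print(f"{len(results)} days submitted, {len(failed)} failed")
```

A thread pool is assumed here because each worker would mostly wait on an external job; capping `max_workers` keeps the backfill from flooding the cluster with six months of jobs at once.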

CDH 6.1: Upgrading Impala to version 3.4 to enable automatic metadata refresh, and solutions to the problems encountered

On CDH 6.1, we upgraded Impala and enabled automatic metadata refresh. We ran into some problems along the way and eventually solved them by checking the logs, reading the source code, searching Google, and so on. This article sorts them out to give back to the community. The main reference ...

Added by gwydionwaters on Tue, 08 Feb 2022 02:43:35 +0200

Installing Atlas, a big data component, based on the Apache version

A detailed installation record for Atlas 2.1.0, a big data component, based on the Apache open source version (test environment). Note: this Atlas installation draws on a large amount of online material, and this record is kept only for future reference. If anything in this article infringes your rights, please contact me immediately. Component versions: Component name Com ...

Added by Dark.Munk on Wed, 02 Feb 2022 06:26:47 +0200

Statistics exercises on Ant Forest plant applications (Hive example)

Statistics on Ant Forest plant applications. Create two tables. user_low_carbon records each user's daily low-carbon activity in Ant Forest. plant_carbon is the Ant Forest plant exchange table, which records the carbon emission reduction required to apply for each environmental protection plant. Table structure: Table 1, table_name: user_low_carb ...

Added by TheSeeker on Sun, 30 Jan 2022 14:59:02 +0200

Chapter VII: Partition tables [single partition, multiple partitions, dynamic partitions, modifying partitions]

1. What is partitioning? 1. Partitions in Hive are subdirectories for data files (table = directory, partition = directory). 2. Why create partitions (benefits of partitions)? 1. Data isolation and query optimization. 3. Single partition: -- Single partition -- Create partition table (single partition) create table home.ods_front_log_dd ( log_id strin ...

Added by Fife Club on Sun, 30 Jan 2022 07:59:34 +0200

Hive partition notes

Hive partitions. 1. Primary partitions: A partition in Hive is a subdirectory. The idea is broadly similar to input splits in the map phase, which also exist to improve parallelism. Partitioning stores the table's data in separate locations, so when you query the table you can specify the partition and avoid scanning the whole table. It is an optimization technique. The pa ...

Added by jeff21 on Sat, 29 Jan 2022 17:12:37 +0200
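To make the "a partition is a subdirectory" point above concrete, here is a small sketch of the `key=value` directory layout Hive uses for partitions and of how naming a partition in a query narrows the directories that need to be read. The helper names and the `/warehouse/demo.db/logs` path are made up for illustration, and the sketch ignores the escaping Hive applies to special characters in partition values.

```python
def partition_path(table_dir: str, spec: dict[str, str]) -> str:
    """Build the directory for a partition: table_dir/key1=value1/key2=value2."""
    parts = [f"{key}={value}" for key, value in spec.items()]
    return "/".join([table_dir.rstrip("/"), *parts])


def parse_partition_spec(path: str, table_dir: str) -> dict[str, str]:
    """Recover the partition columns from a partition directory path."""
    relative = path[len(table_dir.rstrip("/")):].strip("/")
    return dict(segment.split("=", 1) for segment in relative.split("/") if segment)


def prune(paths: list[str], table_dir: str, wanted: dict[str, str]) -> list[str]:
    """Keep only the partition directories that match every wanted key/value."""
    kept = []
    for path in paths:
        spec = parse_partition_spec(path, table_dir)
        if all(spec.get(key) == value for key, value in wanted.items()):
            kept.append(path)
    return kept


if __name__ == "__main__":
    table = "/warehouse/demo.db/logs"  # hypothetical table directory
    dirs = [
        partition_path(table, {"dt": day, "hour": hour})
        for day in ("2022-01-28", "2022-01-29")
        for hour in ("00", "12")
    ]
    # Filtering on dt leaves two of the four directories to scan.
    for path in prune(dirs, table, {"dt": "2022-01-29"}):
        print(path)
```

The filtering step mirrors, in spirit only, what writing the partition column in a WHERE clause achieves: directories for other partitions are never opened.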

Import and processing of business data in an offline data warehouse

I am not sure how good this is, but I will do my best to explain it. Data synchronization: The previous article covered using Sqoop to move data between MySQL and HDFS in both directions. This is the second chapter on the offline warehouse, and it covers the processing of business data. The basic business data of the offline data warehouse is stored in M ...

Added by kurtsu on Thu, 27 Jan 2022 19:00:14 +0200

Detailed installation of the full set of Hadoop components in Li Jian collection: taking you into the abyss of big data

Contents: Hadoop deployment; Deploy components; 1. VMware deployment and installation; 2. Ubuntu 18.04.5 deployment and installation; 3. Installing VMware Tools; 4. Configuring passwordless SSH login; 5. Java environment installation; Hadoop installation; MySQL installation and deployment; Hive installation and deployment; Sqoop installati ...

Added by Entanio on Sun, 23 Jan 2022 03:34:05 +0200