Optimizations of Spark SQL over HiveQL (window functions)

A SELECT statement sometimes contains several window functions whose window definitions (OVER clauses) may be the same or different. Window functions that share the same definition do not need to be partitioned and sorted again; they can be merged into a single Window operator. For example, the case from "The implementation principle of window functions in Spark and Hive": select i ...
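The merging idea can be illustrated outside Spark with a small Python sketch: window functions are grouped by their window definition, and each group stands for one Window operator, that is, one partition-and-sort pass. The `WindowExpr` class, the `plan_window_operators` function and the column names are hypothetical, and this sketch treats two OVER clauses as "the same" when their PARTITION BY and ORDER BY lists match. It is not Spark's planner code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (PARTITION BY columns, ORDER BY columns) identifies a window definition here.
WindowSpec = Tuple[Tuple[str, ...], Tuple[str, ...]]


@dataclass(frozen=True)
class WindowExpr:
    """A window function call together with its OVER clause (hypothetical model)."""
    func: str                      # e.g. "rank()"
    partition_by: Tuple[str, ...]  # PARTITION BY columns
    order_by: Tuple[str, ...]      # ORDER BY columns


def plan_window_operators(exprs: List[WindowExpr]) -> List[List[str]]:
    """Group window functions that share a window definition.

    Each returned group corresponds to one Window operator, so the data is
    partitioned and sorted once per distinct definition, not once per function.
    Groups keep the order in which their definitions first appear.
    """
    groups: Dict[WindowSpec, List[str]] = {}
    for e in exprs:
        groups.setdefault((e.partition_by, e.order_by), []).append(e.func)
    return list(groups.values())


if __name__ == "__main__":
    exprs = [
        WindowExpr("rank()", ("dept",), ("salary",)),
        WindowExpr("row_number()", ("dept",), ("salary",)),  # same OVER clause as rank()
        WindowExpr("sum(salary)", ("region",), ()),           # a different window
    ]
    for i, funcs in enumerate(plan_window_operators(exprs), 1):
        print(f"Window operator {i}: {funcs}")
```

Three window functions with two distinct definitions need only two Window operators instead of three.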

Added by serverman on Tue, 07 Apr 2020 17:52:21 +0300

[Oozie] Introduction to the Oozie architecture and execution model

Contents: 1. Introduction to the Oozie framework; 2. Main functions of Oozie; 3. Internal analysis of Oozie; 4. Horizontal and vertical scalability of Oozie; 5. The Action execution model of Oozie. 1. Introduction to the Oozie framework. Definition of Oozie: "tamer". An open source framework based on workfl ...

Added by Dragonfly on Mon, 16 Mar 2020 07:46:10 +0200

Hadoop HDFS operation commands

View all commands supported by Hadoop HDFS: hadoop fs. List directory and file information: hadoop fs -ls. Recursively list directories, subdirectories and files: hadoop fs -lsr. Copy test.txt from the local file system to the /user/sunlight directory ...
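When these commands are scripted, it can help to build them as argument lists rather than shell strings. The sketch below assumes a Hadoop client is installed and on `PATH` when `run` is actually called; `hdfs_cmd` and `run` are hypothetical helpers. The copy command uses `-put`, which is an assumption because the excerpt is cut off before the author's command. On Hadoop 2 and later, `-lsr` is deprecated in favor of `-ls -R`.

```python
import shlex
import subprocess
from typing import List


def hdfs_cmd(*args: str) -> List[str]:
    """Build an argv list for the `hadoop fs` shell (hypothetical helper)."""
    return ["hadoop", "fs", *args]


def run(argv: List[str]) -> str:
    """Execute the command and return its stdout; needs a Hadoop client on PATH."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    commands = [
        hdfs_cmd(),                                      # print all supported commands
        hdfs_cmd("-ls", "/user"),                        # list a directory
        hdfs_cmd("-ls", "-R", "/user"),                  # recursive listing (replaces -lsr)
        hdfs_cmd("-put", "test.txt", "/user/sunlight"),  # assumed: copy a local file into HDFS
    ]
    for argv in commands:
        # Only print here; call run(argv) on a machine with Hadoop installed.
        print(shlex.join(argv))
```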

Added by True`Logic on Mon, 27 Jan 2020 14:52:31 +0200

Series: using Python + ANTLR to parse Hive SQL and extract data lineage

Goal: Part 3 of the series completed the basic AST traversal. Before extracting table and column names from SQL in depth, we need to solve the two practical problems left over from Part 3: semicolons and case. The semicolon problem: its symptom is that the automatically generated HiveParser ...
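The excerpt stops before the author's fix, so the following is only one common workaround, stated as an assumption: split a script into single statements before parsing each one, ignoring semicolons that appear inside quoted strings. `split_statements` is a hypothetical pure-Python helper, not part of ANTLR or Hive.

```python
from typing import List, Optional


def split_statements(script: str) -> List[str]:
    """Split a HiveQL script on ';' characters that are outside quoted strings."""
    statements: List[str] = []
    current: List[str] = []
    quote: Optional[str] = None  # the quote character of the open string, if any
    escaped = False              # True right after a backslash inside a string
    for ch in script:
        if quote:
            current.append(ch)
            if escaped:
                escaped = False      # this character is escaped; keep the string open
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None         # the string literal ends here
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:                 # skip empty statements such as ";;"
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:                         # a last statement without a trailing ';'
        statements.append(tail)
    return statements


if __name__ == "__main__":
    script = "use db; select 'a;b' from t;\nselect 1"
    for stmt in split_statements(script):
        print(stmt)
```

Each resulting statement can then be handed to the generated parser separately. This sketch does not handle `--` comments that contain semicolons.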

Added by mortal991 on Thu, 16 Jan 2020 08:58:24 +0200

Big data: installing Hive in detail

What is Hive? Open-sourced by Facebook to compute statistics over massive volumes of structured logs, Hive is a Hadoop-based data warehouse tool: it stores data in HDFS, maps structured data files to tables, and provides SQL-like queries. Underneath, it computes with MapReduce (MR); in essence, it translates HQL into ...

Added by calbolino on Tue, 10 Dec 2019 20:06:18 +0200

Hive lateral view and explode

explode (see the official documentation): explode is a UDTF (user-defined table-generating function) that converts a single input row into multiple output rows. It is generally used together with LATERAL VIEW, mainly in two ways (columns: output type | usage | description): T | explode(ARRAY<T> a) | explodes the array into multiple rows, returning a single column a ...
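To see which rows explode and LATERAL VIEW produce, their semantics can be mimicked in plain Python. Each array element becomes its own output row, and LATERAL VIEW joins those rows back to the source row. A plain LATERAL VIEW drops source rows whose array is empty or NULL, while LATERAL VIEW OUTER keeps them with a NULL value. This sketch covers only the array form. The function names and the sample rows are hypothetical, and this is not Hive's implementation.

```python
from typing import Any, Dict, Iterator, List, Optional

Row = Dict[str, Any]


def explode(values: Optional[List[Any]]) -> Iterator[Any]:
    """Like explode(ARRAY<T>): one output value per element; NULL (None) yields nothing."""
    yield from values or []


def lateral_view(rows: List[Row], column: str, alias: str, outer: bool = False) -> List[Row]:
    """Like `LATERAL VIEW [OUTER] explode(column) t AS alias`.

    Every generated value is joined with the columns of its source row.
    """
    out: List[Row] = []
    for row in rows:
        produced = list(explode(row[column]))
        if not produced and outer:
            produced = [None]  # OUTER keeps the source row, with NULL in the new column
        for value in produced:
            out.append({**row, alias: value})
    return out


if __name__ == "__main__":
    pages = [
        {"pageid": "front_page", "adid_list": [1, 2, 3]},
        {"pageid": "contact_page", "adid_list": []},
    ]
    for r in lateral_view(pages, "adid_list", "adid"):
        print(r["pageid"], r["adid"])
    for r in lateral_view(pages, "adid_list", "adid", outer=True):
        print(r["pageid"], r["adid"])
```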

Added by leony on Sun, 08 Dec 2019 13:56:09 +0200

Developing a Hive UDF in IDEA

Development environment: Windows 10 + Idea19-01 + Spring Boot 2.1.6 + JDK 1.8. Runtime environment for the jar: CentOS virtual machine + Hadoop 3.1.1 + Hive 3.1.1 + JDK 1.8. Create a new Spring Boot project in IDEA with only the basic dependencies; this project contains only the web package, as follows: pom.xml The ...

Added by tonyw on Wed, 09 Oct 2019 00:54:29 +0300

Using Python to Send Detailed Hive Data by Email

Link to the original article: https://www.jianshu.com/p/f13fb250369d. I. Requirements: every Monday, customers need to receive specific activity data, generated as an Excel or CSV file and sent to designated recipients by email. The p ...
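The requirement above (weekly data written to a CSV file and mailed to fixed recipients) can be sketched with the Python standard library alone. This is not the original article's code. The query result is stubbed as a list of rows, because fetching it from Hive is out of scope here. The helper names, addresses, SMTP host, port and file name are all placeholders.

```python
import csv
import io
import smtplib
from email.message import EmailMessage
from typing import List, Optional, Sequence


def rows_to_csv(header: Sequence[str], rows: List[Sequence[object]]) -> bytes:
    """Serialize query results as CSV; the UTF-8 BOM helps Excel detect the encoding."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8-sig")


def build_report_email(sender: str, recipients: List[str], subject: str,
                       csv_bytes: bytes, filename: str = "report.csv") -> EmailMessage:
    """Build a message with a short text body and the CSV as an attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content("Please find this week's activity data attached.")
    msg.add_attachment(csv_bytes, maintype="text", subtype="csv", filename=filename)
    return msg


def send(msg: EmailMessage, host: str = "smtp.example.com", port: int = 465,
         user: Optional[str] = None, password: Optional[str] = None) -> None:
    """Send over SMTP with implicit TLS; host and credentials are placeholders."""
    with smtplib.SMTP_SSL(host, port) as server:
        if user:
            server.login(user, password or "")
        server.send_message(msg)


if __name__ == "__main__":
    data = rows_to_csv(["activity", "users"], [["a1", 10], ["a2", 20]])  # stubbed rows
    msg = build_report_email("me@example.com", ["boss@example.com"], "Weekly activity data", data)
    print(msg["To"], msg["Subject"])
    # send(msg, user="me@example.com", password="...")  # run from a weekly scheduler
```

Scheduling the script for Mondays (for example with cron) is left to the deployment environment.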

Added by zvonko on Mon, 07 Oct 2019 17:21:37 +0300

Creating tables and importing data in Hive

Contents: 1. Table creation statement; 2. Creating tables (1. managed table, 2. external table, 3. partitioned table); 2. Importing data (1. LOAD DATA, 2. INSERT ... SELECT, 3. ALTER partition operations). 1. Table creation statement: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name -- build ...

Added by xt3mp0r~ on Wed, 02 Oct 2019 11:05:23 +0300

Hive in Practice: Analyzing the Automobile Sales Problem (Code + Analysis)

Data file: https://pan.baidu.com/s/1bud5O36RtSm4dNQ17h-wuA (extraction code: lq3a). 1. Create the table. Based on the data file, we can write the following table creation statement: create table cars( province string, --Province month int, --mon ...

Added by dirkdetken on Tue, 10 Sep 2019 16:04:54 +0300