Series: Using Python + ANTLR to parse Hive SQL and extract data lineage
Goal
Part 3 of this series completed the basic AST traversal.
Before digging deeper into extracting table names and column names from SQL, we need to solve the two practical problems left over from Part 3: semicolons and case.
Semicolon problem
The symptom of the semicolon problem is that the automatically generated HiveParser ...
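The excerpt is cut off before it shows the fix, so the following is only a minimal sketch of one possible workaround, not the author's solution. It assumes that the generated parser wants a single statement without a trailing semicolon and that the lexer only recognizes upper-case keywords. The helper names `scan`, `split_statements` and `upper_outside_quotes` are hypothetical, and SQL comments are not handled.

```python
def scan(sql):
    """Yield (char, inside_quote) pairs, tracking ', " and ` quoted regions."""
    quote = None
    i = 0
    while i < len(sql):
        ch = sql[i]
        if quote:
            if ch == "\\" and i + 1 < len(sql):
                # A backslash escapes the next character inside a quoted region.
                yield ch, True
                yield sql[i + 1], True
                i += 2
                continue
            if ch == quote:
                quote = None
            yield ch, True
        elif ch in "'\"`":
            quote = ch
            yield ch, True
        else:
            yield ch, False
        i += 1


def split_statements(script):
    """Split a script on semicolons that sit outside quoted regions."""
    statements, current = [], []
    for ch, quoted in scan(script):
        if ch == ";" and not quoted:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    return [s for s in statements if s]


def upper_outside_quotes(sql):
    """Upper-case everything except the contents of quoted regions."""
    return "".join(ch if quoted else ch.upper() for ch, quoted in scan(sql))


script = "select a from t where b = 'x;y'; Select 1;"
statements = split_statements(script)
normalized = [upper_outside_quotes(s) for s in statements]
```

Each entry of `normalized` could then be handed to the parser one statement at a time. Upper-casing unquoted identifiers relies on Hive treating them case-insensitively; string literals are left untouched.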
Added by mortal991 on Thu, 16 Jan 2020 08:58:24 +0200
Big data: installation details of Hive
What is Hive?
Open-sourced by Facebook to compute statistics over massive volumes of structured logs;
A data warehouse tool built on Hadoop: it maps structured data files stored in HDFS to tables and provides SQL-like queries. Under the hood it uses MapReduce (MR) for computation;
In essence, it translates HQL into ...
Added by calbolino on Tue, 10 Dec 2019 20:06:18 +0200
Hive lateral view and explode
explode (official documentation link)
explode is a UDTF (user-defined table-generating function) that converts a single input row into multiple output rows. It is generally used together with lateral view, mainly in two ways:
Input type | Usage | Description
T | explode(ARRAY<T> a) | Decompose the array into multiple rows, return a single column a ...
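To make the row expansion concrete, here is a small Python simulation of what explode combined with lateral view does to a table. It is an illustration of the semantics only, not Hive code; the function names and the sample columns `page` and `ad_ids` are made up for this sketch.

```python
def explode(array):
    """Mimic explode(ARRAY<T>): emit one output row per array element."""
    for item in array or []:  # a NULL (None) array yields no rows
        yield item


def lateral_view_explode(rows, array_col, alias):
    """Join every input row with the rows produced by exploding its array column.

    The array column itself is left out of the result, as if the query
    selected only the remaining columns plus the exploded alias.
    """
    for row in rows:
        for item in explode(row[array_col]):
            out = {k: v for k, v in row.items() if k != array_col}
            out[alias] = item
            yield out


rows = [
    {"page": "home", "ad_ids": [1, 2]},
    {"page": "about", "ad_ids": []},
]
result = list(lateral_view_explode(rows, "ad_ids", "ad_id"))
```

The row for "about" disappears because its array is empty, which matches a lateral view without the OUTER keyword.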
Added by leony on Sun, 08 Dec 2019 13:56:09 +0200
Developing a Hive UDF in IDEA
Code environment: Windows 10 + Idea19-01 + spring-boot 2.1.6 + JDK1.8
jar package running environment: CentOS virtual machine + Hadoop 3.1.1 + Hive 3.1.1 + JDK1.8
Create a new spring-boot project in IDEA with only the basics. This project includes only the web package. The pom.xml is as follows:
The ...
Added by tonyw on Wed, 09 Oct 2019 00:54:29 +0300
Using Python to Send Hive Detail Data by Email
Link to the original article: https://www.jianshu.com/p/f13fb250369d
I. Requirement Description
The customer needs to receive specific activity data every Monday: the data is exported to an Excel or CSV file and sent to the designated recipients by email. The p ...
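As a rough sketch of the mail half of this requirement, the snippet below turns rows that have already been fetched from Hive into a CSV attachment using only the Python standard library. The function name, addresses, subject and file name are placeholders assumed for illustration; how the original article queries Hive or builds the message is not visible in the excerpt.

```python
import csv
import io
from email.message import EmailMessage


def build_report_email(header, rows, sender, recipients, subject,
                       filename="activity.csv"):
    """Build an email that carries the given rows as a CSV attachment."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content("The weekly activity data is attached.")
    msg.add_attachment(
        buf.getvalue().encode("utf-8"),
        maintype="text",
        subtype="csv",
        filename=filename,
    )
    return msg


# Placeholder data standing in for a Hive query result.
msg = build_report_email(
    header=["id", "name"],
    rows=[(1, "a"), (2, "b")],
    sender="report@example.com",
    recipients=["customer@example.com"],
    subject="Weekly activity data",
)
```

The message could then be delivered with `smtplib.SMTP(...).send_message(msg)` against whatever mail server is available, and the weekly schedule left to a scheduler such as cron.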
Added by zvonko on Mon, 07 Oct 2019 17:21:37 +0300
Creating tables and importing data in Hive
Contents
1. CREATE TABLE statement
2. Creating tables
  1. Regular table
  2. External table
  3. Partitioned table
3. Importing data
  1. Load data
  2. Insert ... Select
  3. ALTER partition operations
1. CREATE TABLE statement
The CREATE TABLE statement:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name -- build ...
Added by xt3mp0r~ on Wed, 02 Oct 2019 11:05:23 +0300
Hive in Practice: Analyzing Automobile Sales Data (Code + Analysis)
Data files: https://pan.baidu.com/s/1bud5O36RtSm4dNQ17h-wuA
Extraction code: lq3a
1. Create tables
Based on the data file, we can write the following CREATE TABLE statement.
create table cars(
province string, --Province
month int, --mon ...
Added by dirkdetken on Tue, 10 Sep 2019 16:04:54 +0300
Programmers chatting downstairs: a JVM crash investigation
Downstairs at an office building on Dawang Road.
Ape A: Back at our company, we ran a lot of data processing jobs in the middle of the night, and the program often crashed.
Me: Oh? How did you deal with that?
Ape A: The architect at the time was not very solid. He said that we should adjust the ratio of "Eden" to ...
Added by hofmann777 on Mon, 09 Sep 2019 13:09:17 +0300
Installation and Configuration of Hive
To explore the mystery and greatness of Hive, we set out on the road of learning it. Whether the tool turns out to be good or bad, let's install Hive first...
We use MySQL to store Hive's Metastore metadata, so install MySQL first. The specific ...
Added by alpachino on Sat, 07 Sep 2019 15:05:02 +0300
Learning Flume (Including Installation)
1. What is Flume: Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating and transferring massive volumes of logs. Flume is based on a streaming architecture and is flexible and simple.
Flume Composition A ...
Added by SeenGee on Wed, 28 Aug 2019 12:43:54 +0300