Series: Using Python + ANTLR to Analyze Hive SQL and Obtain Data Lineage

Goal: Part 3 of this series completed the basic AST traversal. Before extracting table names and column names from SQL in depth, we need to solve the two practical problems left over from Part 3: semicolons and letter case. The semicolon problem: the symptom is that the automatically generated HiveParser ...

Added by mortal991 on Thu, 16 Jan 2020 08:58:24 +0200

Big Data: Hive Installation Details

What is Hive? Open-sourced by Facebook, it is used to compute statistics over massive structured logs. It is a data warehouse tool built on Hadoop that stores data in HDFS, maps structured data files to tables, and provides SQL-like queries. Underneath, it uses MapReduce (MR) for computation; in essence, it transforms HQL into ...

Added by calbolino on Tue, 10 Dec 2019 20:06:18 +0200

Hive lateral view and explode

explode (official website link): explode is a UDTF (table-generating function) that converts a single input row into multiple output rows. It is generally used together with lateral view, mainly in two ways. Input type | Usage | Description: T | explode(ARRAY<T> a) | Decomposes the array into multiple rows and returns a single column a ...

Added by leony on Sun, 08 Dec 2019 13:56:09 +0200

Working with Hive UDFs in IDEA

Code environment: Windows 10 + Idea19-01 + spring-boot 2.1.6 + JDK1.8. Jar runtime environment: CentOS virtual machine + Hadoop 3.1.1 + hive3.1.1 + JDK1.8. Create a new spring-boot project in IDEA with just the basics. This project contains only one web package, as follows: pom.xml The ...

Added by tonyw on Wed, 09 Oct 2019 00:54:29 +0300

Using Python to Send Detailed Hive Data by Email

Link to the original text: https://www.jianshu.com/p/f13fb250369d I. Requirement description: Customers need to receive specific activity data every Monday, generated as Excel or CSV files and sent to designated recipients by email. The p ...

Added by zvonko on Mon, 07 Oct 2019 17:21:37 +0300

Creating Tables and Importing Data in Hive

Contents: 1. Table creation statement 2. Creating tables 1. Regular table 2. External table 3. Partitioned table 2. Importing data 1. Load data 2. Insert ... Select 3. Alter partition operation. 1. Table creation statement: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name -- build ...

Added by xt3mp0r~ on Wed, 02 Oct 2019 11:05:23 +0300

Hive in Practice: Analyzing Automobile Sales (Code + Analysis)

Data files: https://pan.baidu.com/s/1bud5O36RtSm4dNQ17h-wuA Extraction code: lq3a 1. Create tables: Based on the data file, we can write the following table creation statement: create table cars( province string, --Province month int, --mon ...

Added by dirkdetken on Tue, 10 Sep 2019 16:04:54 +0300

Programmers chatting downstairs: a JVM crash investigation

Downstairs from an office building on Dawang Road. Programmer A: At our company, we ran a lot of data processing tasks in the middle of the night, and the program often crashed. Me: Oh? How did you deal with that? Programmer A: The architect at the time was not very rigorous and said that we should adjust the ratio of "Eden" to ...

Added by hofmann777 on Mon, 09 Sep 2019 13:09:17 +0300

Installation and Configuration of Hive

To explore the mystery and greatness of Hive, we set out on the road of learning it. Setting aside the tool's strengths and weaknesses for now, let's install Hive first... We use MySQL to store Hive's metadata (the Metastore), so install MySQL first. The specific ...

Added by alpachino on Sat, 07 Sep 2019 15:05:02 +0300

Flume Learning - Including Installation

1. What is Flume: Flume is a highly available, highly reliable, distributed system, provided by Cloudera, for collecting, aggregating, and transferring massive volumes of logs. Flume is built on a streaming architecture and is flexible and simple. Flume Composition A ...

Added by SeenGee on Wed, 28 Aug 2019 12:43:54 +0300