Introduction to big data
1, Introduction to big data
1. Data and data analysis
2. Role of data analysis
Current situation analysis, cause analysis, forecast analysis
3. Basic steps of data analysis
Clarify the purpose of analysis, data collection, data processing, data analysis, data presentation, report writing
4. Big data
What is big data, the challenge of massive data, Charac ...
Added by Monkeymatt on Sun, 21 Nov 2021 09:40:44 +0200
08 Functions in Hive
Hive built-in functions
In Hive, functions are mainly divided into two types: built-in functions and user-defined functions.
Viewing functions
show functions;
desc function functionName;
Date functions
1) Current system time functions: current_date(), current_timestamp(), unix_timestamp()
-- Function 1: current_date();
Current system date format: ...
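To make the three function names above concrete, here is a minimal Python sketch of the kind of value each one returns: a calendar date, a timestamp, and seconds since the Unix epoch. These are rough analogues using Python's standard library, not Hive calls, and the variable names are only illustrative; exact formatting and time zone handling in Hive may differ.

```python
import time
from datetime import date, datetime

# Rough Python analogues of the Hive functions named above.
# Assumption: these only mirror the shape of each result; they do not call Hive.

# Like current_date(): a calendar date in yyyy-MM-dd form.
current_date = date.today().isoformat()

# Like current_timestamp(): date and time with millisecond precision.
current_timestamp = datetime.now().isoformat(sep=" ", timespec="milliseconds")

# Like unix_timestamp(): whole seconds since 1970-01-01 00:00:00 UTC.
unix_timestamp = int(time.time())

print(current_date)
print(current_timestamp)
print(unix_timestamp)
```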
Added by amycrystal123 on Tue, 09 Nov 2021 09:41:28 +0200
Hive of big data foundation -- partition table and bucket table
Author: duktig
Blog: https://duktig.cn (first article)
Excellent people still work hard. May you enjoy what you give and enjoy what you get.
See github knowledge base for more articles: https://github.com/duktig666/knowledge
Background
After learning Hadoop, do you feel that writing a MapReduce program is very complex, and it requires a lot ...
Added by mridang_agarwal on Tue, 02 Nov 2021 05:48:10 +0200
Flink + Hudi lakehouse integration solution
Abstract: This article describes in detail how to build a prototype of a Flink + Hudi lakehouse integration scheme. The main contents are as follows:
Hudi
The new architecture and lakehouse integration
Best practices
Flink on Hudi
Flink CDC 2.0 on Hudi
Tips: FFA 2021 has officially launched. Click "read the original te ...
Added by benzrf on Mon, 18 Oct 2021 07:38:52 +0300
Hive, a data warehouse tool
1. What is Hive
1. Overview: Apache Hive is data warehouse software that provides querying and management of large data sets held in distributed storage. It is built on Apache Hadoop and mainly provides the following functions:
(1) It provides a series of tools that can be used to extract / transform / load data (ETL);
(2) It is a mechanism that can store, quer ...
Added by SleepyP on Sat, 16 Oct 2021 08:51:21 +0300
Hive environment setup + reading ES data into internal tables
Scenario:
A feature in the project needs to be optimized, so we need to compare whether querying the same data is more efficient from Hive or from ES. To do that, we synchronize all the data of one ES index to HDFS, then query the HDFS data through Hive and compare the efficiency of the two.
Step 1: preliminary pre ...
Added by Naez on Wed, 13 Oct 2021 00:02:23 +0300
Hive SQL programming interview questions
Question 1
Table structure: uid,subject_id,score
Task: find the students whose score in every subject is greater than the average score for that subject
The data set is as follows:
1001 01 90
1001 02 90
1001 03 90
1002 01 85
1002 02 85
1002 03 70
1003 01 70
1003 02 70
1003 03 85
1) Create t ...
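As a quick way to check the expected answer for Question 1, here is a minimal Python sketch that loads the nine rows above into an in-memory SQLite database and runs a plain SQL query. It assumes the question means "every score of the student is above the average of the corresponding subject". SQLite stands in for Hive here, and the table name score_info and the function name are hypothetical; this is not the article's own solution, which is truncated above.

```python
import sqlite3

# Rows copied from the data set above: (uid, subject_id, score).
ROWS = [
    ("1001", "01", 90), ("1001", "02", 90), ("1001", "03", 90),
    ("1002", "01", 85), ("1002", "02", 85), ("1002", "03", 70),
    ("1003", "01", 70), ("1003", "02", 70), ("1003", "03", 85),
]

# Assumption: the table name score_info is made up for this sketch.
QUERY = """
SELECT s.uid
FROM score_info AS s
JOIN (SELECT subject_id, AVG(score) AS avg_score
      FROM score_info
      GROUP BY subject_id) AS a
  ON s.subject_id = a.subject_id
GROUP BY s.uid
-- keep a student only if no subject is at or below that subject's average
HAVING SUM(CASE WHEN s.score > a.avg_score THEN 0 ELSE 1 END) = 0
ORDER BY s.uid
"""


def students_above_average(rows):
    """Return uids whose score beats the subject average in every subject."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE score_info (uid TEXT, subject_id TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO score_info VALUES (?, ?, ?)", rows)
    result = [uid for (uid,) in conn.execute(QUERY)]
    conn.close()
    return result


print(students_above_average(ROWS))  # ['1001']
```

Each subject averages about 81.67 on this data, so only student 1001 (90 in all three subjects) passes. The join against a grouped subquery is used instead of a window function only to keep the sketch portable across SQLite versions.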
Added by vinpkl on Fri, 08 Oct 2021 11:53:20 +0300
Learning to use Hadoop
1, The role of Hadoop
What is Hadoop?
Hadoop is an open source framework for writing and running distributed applications that process large-scale data. It is designed for offline and larg ...
Added by wittanthony on Tue, 05 Oct 2021 00:56:46 +0300
Hive SQL syntax summary
I've been doing Hive-related work these days. Fortunately, I had learned a little before and got started very quickly. Now that I have some free time, let's systematically review the syntax of Hive SQL.
Preface
Hive is a data warehouse tool used to process structured data in Hadoop. It is built on Hadoop and operates on the data th ...
Added by napier_matt on Sun, 26 Sep 2021 03:33:07 +0300
Big data Hive parameter configuration
1 CLIs and commands (clients and commands)
1.1 Hive CLI
$HIVE_HOME/bin/hive is a shell utility, usually called Hive's first-generation client or old client. It has two main functions: 1: It runs Hive queries in interactive or batch mode. Note that, as a client, it requires access to the Hive metastore service, not the hiveserve ...
Added by Kodak07 on Tue, 21 Sep 2021 14:17:31 +0300