Introduction to big data

1. Introduction to big data: 1. Data and data analysis. 2. Role of data analysis: current situation analysis, cause analysis, forecast analysis. 3. Basic steps of data analysis: clarify the purpose of the analysis, data collection, data processing, data analysis, data presentation, report writing. 4. Big data: what is big data, the challenge of massive data, Charac ...

Added by Monkeymatt on Sun, 21 Nov 2021 09:40:44 +0200

08 Functions in Hive

Hive built-in functions. In Hive, functions fall into two main types: built-in functions and user-defined functions. Viewing functions: show functions; desc function functionName; Date functions. 1) Current system time functions: current_date(), current_timestamp(), unix_timestamp(). -- Function 1: current_date(); Current system date format: ...
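As a rough illustration of what the three "current time" functions named above represent, here is a minimal Python sketch. The Python functions are hypothetical stand-ins that only borrow the Hive names; they are not Hive code. The output formats are those of Python's standard library, since the format stated in the excerpt is cut off.

```python
import time
from datetime import date, datetime


def current_date() -> str:
    """Stand-in for current_date(): today's local date as YYYY-MM-DD."""
    return date.today().isoformat()


def current_timestamp() -> str:
    """Stand-in for current_timestamp(): local date and time with milliseconds."""
    return datetime.now().isoformat(sep=" ", timespec="milliseconds")


def unix_timestamp() -> int:
    """Stand-in for unix_timestamp(): whole seconds since 1970-01-01 UTC."""
    return int(time.time())


if __name__ == "__main__":
    print(current_date())
    print(current_timestamp())
    print(unix_timestamp())
```

In Hive itself the same three values would be requested in a query, for example in a select list; the sketch only shows the kind of value each call yields.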

Added by amycrystal123 on Tue, 09 Nov 2021 09:41:28 +0200

Big data fundamentals: Hive partition tables and bucket tables

Author: duktig. Blog: https://duktig.cn (first published there). Even the excellent keep working hard. May you enjoy what you give and enjoy what you get. See the GitHub knowledge base for more articles: https://github.com/duktig666/knowledge Background: After learning Hadoop, do you feel that writing a MapReduce program is very complex, and it requires a lot ...

Added by mridang_agarwal on Tue, 02 Nov 2021 05:48:10 +0200

Flink + Hudi: an integrated lakehouse solution

Abstract: This article describes in detail how to build a prototype of the Flink + Hudi lakehouse integration scheme. The main contents are: Hudi; the new architecture and lakehouse integration; best practices; Flink on Hudi; Flink CDC 2.0 on Hudi. Tip: FFA 2021 has launched. Click "read the original te ...

Added by benzrf on Mon, 18 Oct 2021 07:38:52 +0300

Hive, a data warehouse tool

1. What is Hive? 1. Overview: The Apache Hive data warehouse software provides querying and management of large data sets held in distributed storage. It is built on Apache Hadoop and mainly provides the following functions: (1) a set of tools for extracting, transforming, and loading data (ETL); (2) a mechanism that can store, quer ...

Added by SleepyP on Sat, 16 Oct 2021 08:51:21 +0300

Hive environment setup + reading ES data into internal tables

Scenario: A feature in the project needs optimizing, so we want to find out whether querying the same data is more efficient from Hive or from ES. To do that, we synchronize all the data of one ES index to HDFS and query the HDFS data through Hive to compare the two. Step 1: preliminary pre ...

Added by Naez on Wed, 13 Oct 2021 00:02:23 +0300

Hive SQL programming interview questions

Hive SQL programming interview questions. Question 1. Table structure: uid, subject_id, score. Task: find the students whose scores in all subjects are greater than the average score of a certain subject. The data set is as follows: 1001 01 90; 1001 02 90; 1001 03 90; 1002 01 85; 1002 02 85; 1002 03 70; 1003 01 70; 1003 02 70; 1003 03 85. 1) Create t ...
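The logic behind Question 1 can be sketched outside Hive. The Python version below is only an illustration, not the article's solution. It assumes one common reading of the task: a student qualifies when each of their scores is greater than the average score of that same subject. The function name and the `ROWS` constant are hypothetical; the rows are the nine records listed in the excerpt.

```python
from collections import defaultdict

# Rows copied from the excerpt: (uid, subject_id, score).
ROWS = [
    ("1001", "01", 90), ("1001", "02", 90), ("1001", "03", 90),
    ("1002", "01", 85), ("1002", "02", 85), ("1002", "03", 70),
    ("1003", "01", 70), ("1003", "02", 70), ("1003", "03", 85),
]


def students_above_every_subject_average(rows):
    """Return uids whose score in each subject exceeds that subject's average."""
    # Step 1: per-subject average (the role of avg(score) grouped by subject_id).
    totals = defaultdict(lambda: [0, 0])  # subject_id -> [sum, count]
    for _, subject_id, score in rows:
        totals[subject_id][0] += score
        totals[subject_id][1] += 1
    averages = {subject: total / count for subject, (total, count) in totals.items()}

    # Step 2: collect every student, and those with at least one score
    # that is not above its subject average.
    all_uids = set()
    failed = set()
    for uid, subject_id, score in rows:
        all_uids.add(uid)
        if score <= averages[subject_id]:
            failed.add(uid)

    # Step 3: keep only students who never failed the comparison.
    return sorted(all_uids - failed)


if __name__ == "__main__":
    print(students_above_every_subject_average(ROWS))
```

In SQL the same three steps usually become a per-subject average, a per-row comparison flag, and a per-student aggregate over that flag.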

Added by vinpkl on Fri, 08 Oct 2021 11:53:20 +0300

Learning to use Hadoop

1. The role of Hadoop. What is Hadoop? Hadoop is an open source framework for writing and running distributed applications that process large-scale data. It is designed for offline and larg ...

Added by wittanthony on Tue, 05 Oct 2021 00:56:46 +0300

Hive SQL syntax summary

I've been doing Hive-related work these days. Fortunately, I had learned a little before and got started quickly. Now that I have some free time, let's systematically review Hive SQL syntax. Preface: Hive is a data warehouse tool used to process structured data in Hadoop. It is built on Hadoop and operates the data th ...

Added by napier_matt on Sun, 26 Sep 2021 03:33:07 +0300

Big data Hive parameter configuration

1 Clients and commands. 1.1 Hive CLI: $HIVE_HOME/bin/hive is a shell utility, usually called Hive's first-generation client or old client. It has two main functions: 1: It runs Hive queries in interactive or batch mode. Note that, as a client, it requires an accessible Hive metastore service, not the hiveserve ...

Added by Kodak07 on Tue, 21 Sep 2021 14:17:31 +0300