Big data warehouse technology training task 3
Big data warehouse training - task 3
Data analysis and prediction of Taobao Double 11
Case introduction
The Taobao Double 11 data analysis and prediction case study covers the typical operations in the whole data processing workflow, such as data preprocessing, storage, query, and visual analysis, including the installatio ...
Added by mark_nsx on Fri, 21 Jan 2022 07:36:11 +0200
Hive operation instructions
1. Create a table
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
create table if not exists test(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string> )
row format delimited ...
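As a rough illustration of how the sample rows map onto the column types above, the following Python sketch splits one row by hand. It assumes that fields are separated by `,`, collection items by `_`, and map keys from values by `:`. These delimiters are inferred from the sample data, because the row format clause is cut off in the excerpt, and `parse_row` is a hypothetical helper, not part of Hive.

```python
def parse_row(line):
    """Split one text row into the four columns of the test table.

    Assumed delimiters (inferred from the sample rows, not taken from
    the truncated row format clause): ',' between fields, '_' between
    collection items, ':' between map keys and values.
    """
    name, friends, children, address = line.split(",")
    street, city = address.split("_")
    return {
        "name": name,                      # string
        "friends": friends.split("_"),     # array<string>
        "children": {                      # map<string, int>
            key: int(value)
            for key, value in (pair.split(":") for pair in children.split("_"))
        },
        "address": {"street": street, "city": city},  # struct<street, city>
    }


row = parse_row("yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing")
print(row["friends"])
print(row["children"])
print(row["address"])
```

Under these assumptions, each nested type is just one more level of splitting on the same line of text.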
Added by rogair on Thu, 20 Jan 2022 06:25:28 +0200
Chapter 2 Hive installation
2.1 Hive installation addresses
1. Hive official website address
http://hive.apache.org/
2. Documentation address
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
3. Download address
http://archive.apache.org/dist/hive/
4. GitHub address
https://github.com/apache/hive
2.2 Hive installation a ...
Added by weknowtheworld on Sun, 16 Jan 2022 01:17:49 +0200
Hive's Learning Notes - Chapter 10 Hive Practice
1. Requirement Description
Compute the general metrics and various TopN metrics of the Silicon Valley video website:
Top 10 videos by view count
Top 10 video categories by popularity
Categories of the Top 20 videos by view count
Category ranking of the videos related to the Top 50 videos by view count
Top 10 videos by popularity in each category
Count vide ...
Added by dawnrae on Sat, 15 Jan 2022 04:50:44 +0200
Hive SQL: calculate the total number and average age of all users and of active users
The log is as follows. Write the code to get the total number and average age of all users and of active users. (Active users are users who have access records on two consecutive days.)
date,user,age
2019-02-11,test_1,23
2019-02-11,test_2,19
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-0 ...
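The logic of this exercise can be sketched in plain Python before writing the Hive SQL. This is a minimal sketch under stated assumptions, not the original post's solution: the 2019-02-11 rows are taken from the excerpt, the later rows are hypothetical (the log is truncated), each user is assumed to have a single age, and averages are taken over distinct users.

```python
from datetime import date, timedelta

# (date, user, age) records. The 2019-02-11 rows come from the excerpt;
# the later rows are hypothetical, added only so the example has
# consecutive days to detect.
logs = [
    ("2019-02-11", "test_1", 23),
    ("2019-02-11", "test_2", 19),
    ("2019-02-11", "test_3", 39),
    ("2019-02-11", "test_1", 23),
    ("2019-02-11", "test_3", 39),
    ("2019-02-11", "test_1", 23),
    ("2019-02-12", "test_2", 19),  # hypothetical
    ("2019-02-13", "test_1", 23),  # hypothetical
]

ages = {}        # user -> age (assumes one age per user)
visit_days = {}  # user -> set of distinct visit dates
for day, user, age in logs:
    ages[user] = age
    visit_days.setdefault(user, set()).add(date.fromisoformat(day))

# A user is active if some visit date is followed by a visit the next day.
active = {
    user
    for user, days in visit_days.items()
    if any(d + timedelta(days=1) in days for d in days)
}


def summarize(users):
    """Return (user count, average age) for a collection of users."""
    users = list(users)
    if not users:
        return 0, None
    return len(users), sum(ages[u] for u in users) / len(users)


all_count, all_avg = summarize(ages)
active_count, active_avg = summarize(active)
print(all_count, all_avg)
print(active_count, active_avg)
```

Deduplicating to one entry per user and day first is the key step: repeated visits on the same day must not count as consecutive days, and repeated rows must not skew the average age.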
Added by hinchcliffe on Fri, 14 Jan 2022 23:13:51 +0200
On Hive advanced functions
Basic function operations
View the description of a specified function: desc function function_name;
Show extended information for a function: desc function extended function_name;
Typical advanced functions
TopN per group (grouped sorting)
To sort within groups, use the row_number function together with an over clause. row_ ...
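To make the idea concrete, here is a small Python sketch of what numbering rows within each group and keeping the first N amounts to. It only mirrors the row_number-plus-over pattern; the rows, the column layout, and the helper `top_n_per_group` are all hypothetical, since the excerpt is cut off before its own example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (group, name, score) rows, for illustration only.
rows = [
    ("a", "x1", 90),
    ("a", "x2", 75),
    ("a", "x3", 82),
    ("b", "y1", 60),
    ("b", "y2", 88),
]


def top_n_per_group(rows, n):
    """Number rows within each group by descending score, keep rank <= n."""
    # Sort by group, then by score descending, so groupby sees each
    # group as one contiguous, already-ordered run.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    result = []
    for _, members in groupby(ordered, key=itemgetter(0)):
        for rank, row in enumerate(members, start=1):
            if rank > n:
                break
            result.append(row + (rank,))  # append the per-group row number
    return result


top2 = top_n_per_group(rows, 2)
for row in top2:
    print(row)
```

The sort corresponds to the partitioning and ordering inside the over clause, and the rank filter corresponds to the outer query that keeps only rows whose row number is at most N.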
Added by alexhard on Fri, 14 Jan 2022 10:47:21 +0200
Six stage big data -- day05 -- database creation and table creation / Hive query methods / Hive FAQ
---- Continued from the day04 notes ----
3.2 External tables:
Description of external tables:
Because an external table loads data from other HDFS paths into the table, Hive does not consider itself the sole owner of that data. Therefore, when the Hive table is dropped, the data is still stored in HDFS and will not be delete ...
Added by ramjai on Tue, 11 Jan 2022 18:22:09 +0200
Spark: Spark SQL basics, DataFrame, DataSet
Spark-SQL
Overview
Spark SQL is the Spark module for structured data processing.
For developers, SparkSQL simplifies RDD-based development, improves development efficiency, and executes quickly. Therefore, in practical work, SparkSQL is what is mostly used. In order to simplify the development of RDD and impr ...
Added by Asnom on Thu, 06 Jan 2022 08:03:44 +0200
Hive: window functions
1. What is a window function
2. Window function categories
1. Cumulative window functions
1. sum() over()
A common task at work is calculating the cumulative value up to a certain month. For this, you use sum() as a window function. For example, given a transaction table, trade: Now it is necessary to calcul ...
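The running total that sum() computes over an ordered window can be sketched in Python as follows. The months and amounts are hypothetical, because the excerpt is truncated before it shows the table's contents, and the two-column layout is an assumption.

```python
from itertools import accumulate

# Hypothetical (month, amount) rows, for illustration only.
monthly = [
    ("2021-01", 100),
    ("2021-02", 250),
    ("2021-03", 50),
]

# Order by month, then carry a running total forward: each month's
# cumulative value is the sum of all amounts up to and including it.
monthly.sort(key=lambda r: r[0])
running = list(accumulate(amount for _, amount in monthly))
cumulative = [
    (month, amount, total)
    for (month, amount), total in zip(monthly, running)
]

for row in cumulative:
    print(row)
```

Every input row is kept and gains one extra column, which is the main difference from a plain grouped sum that collapses rows.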
Added by fpyontek on Tue, 04 Jan 2022 17:38:22 +0200
Hive [environment setup 02] [hive-3.1.2 version HiveServer2/beeline configuration use]
Hive has built-in HiveServer and HiveServer2 services, both of which allow clients to connect using multiple programming languages. However, HiveServer cannot handle concurrent requests from multiple clients, so HiveServer2 was created. HiveServer2 (HS2) allows remote clients to submit requests to Hive and retrieve results in various programmi ...
Added by greenber on Sun, 02 Jan 2022 03:01:29 +0200