Big data warehouse technology training task 3

Big data warehouse training, task 3: data analysis and prediction for Taobao Double 11. Case introduction: the Taobao Double 11 data analysis and prediction case covers the typical operations of the whole data processing pipeline, such as data preprocessing, storage, query, and visual analysis, including the installatio ...

Added by mark_nsx on Fri, 21 Jan 2022 07:36:11 +0200

Hive operation instructions

1. Build table

Sample data:
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

Table definition:
create table if not exists test(
    name string,
    friends array<string>,
    children map<string, int>,
    address struct<street:string, city:string>
)
row format delimited ...
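
The sample rows suggest how the complex columns are laid out in text: commas between fields, underscores between collection items, and colons between map keys and values. The following is a minimal Python sketch under that assumption. The excerpt's `row format delimited` clause is cut off, so the delimiters actually declared for the table are not shown here, and `parse_row` is a hypothetical helper, not part of Hive.

```python
def parse_row(line):
    """Split one text row into the four columns of the `test` table.

    Assumed delimiters (inferred from the sample data, not from the DDL):
    ',' between fields, '_' between collection items, ':' between map key and value.
    """
    name, friends, children, address = line.split(",")
    street, city = address.split("_")
    return {
        "name": name,                                   # string
        "friends": friends.split("_"),                  # array<string>
        "children": {                                   # map<string, int>
            key: int(value)
            for key, value in (pair.split(":") for pair in children.split("_"))
        },
        "address": {"street": street, "city": city},    # struct<street, city>
    }


row = parse_row(
    "songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing"
)
print(row["friends"])
print(row["children"]["xiao song"])
print(row["address"]["city"])
```

Reading a row this way makes it easier to see which part of the line ends up in the array, the map, and the struct column.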

Added by rogair on Thu, 20 Jan 2022 06:25:28 +0200

Chapter 2 Hive installation

2.1 Hive installation addresses
1. Hive official website: http://hive.apache.org/
2. Documentation: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
3. Downloads: http://archive.apache.org/dist/hive/
4. GitHub: https://github.com/apache/hive
2.2 Hive installation a ...

Added by weknowtheworld on Sun, 16 Jan 2022 01:17:49 +0200

Hive Learning Notes - Chapter 10 Hive Practice

1. Requirement description. Compute the general metrics and various TopN metrics of the Silicon Valley video website: top 10 videos by view count; top 10 video categories by popularity; categories of the top 20 videos by view count; category rank of the videos associated with the top 50 videos by view count; top 10 videos by popularity in each category; Count vide ...

Added by dawnrae on Sat, 15 Jan 2022 04:50:44 +0200

Hive SQL: calculating the total number and average age of all users and active users

The log is as follows. Write code to get the total number and average age of all users and of active users. (Active users are users who have access records on two consecutive days.)

date,user,age
2019-02-11,test_1,23
2019-02-11,test_2,19
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-0 ...
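
One possible way to approach this task is a self-join that pairs each record with a record of the same user on the following day. The sketch below runs the idea in SQLite through Python's `sqlite3` module as a local stand-in for Hive. The table name `log`, the column names `dt`, `user_id`, and `age`, and every row dated after 2019-02-11 are assumptions made for illustration, since the excerpt's data is truncated. It also assumes a user's age is the same on every record. In Hive, the date arithmetic would use `date_add` rather than SQLite's `date(..., '+1 day')`.

```python
import sqlite3

# Hypothetical log rows: the 2019-02-11 rows mirror the excerpt,
# the later dates and the user test_4 are invented for this sketch.
log_rows = [
    ("2019-02-11", "test_1", 23),
    ("2019-02-11", "test_2", 19),
    ("2019-02-11", "test_3", 39),
    ("2019-02-11", "test_1", 23),
    ("2019-02-12", "test_1", 23),
    ("2019-02-12", "test_4", 30),
    ("2019-02-13", "test_2", 19),
    ("2019-02-13", "test_4", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (dt TEXT, user_id TEXT, age INTEGER)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", log_rows)

query = """
WITH all_users AS (
    -- one row per user, so repeated visits do not skew the average
    SELECT DISTINCT user_id, age FROM log
),
active_users AS (
    -- a user is active if some record has a matching record on the next day
    SELECT DISTINCT a.user_id, a.age
    FROM log a
    JOIN log b
      ON a.user_id = b.user_id
     AND b.dt = date(a.dt, '+1 day')
)
SELECT 'all' AS kind, COUNT(*) AS users, AVG(age) AS avg_age FROM all_users
UNION ALL
SELECT 'active', COUNT(*), AVG(age) FROM active_users
"""

# Map each kind ('all' / 'active') to (user count, average age).
stats = {kind: (users, avg_age) for kind, users, avg_age in conn.execute(query)}
print(stats)
```

With this invented data, `test_1` and `test_4` count as active because each has records on two consecutive days, while `test_2` visited on the 11th and the 13th only.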

Added by hinchcliffe on Fri, 14 Jan 2022 23:13:51 +0200

On Hive advanced functions

Basic function operations. View the description of a function: desc function <function name>; Show extended information about a function: desc function extended <function name>; Typical advanced functions. Grouped sorting to take the TopN: to rank rows within each group, use the row_number function together with an over clause. row_ ...
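
As an illustration of the grouped TopN pattern, here is a minimal sketch that numbers rows within each group with `row_number() over (partition by ... order by ...)` and keeps the first two per group. It runs in SQLite through Python's `sqlite3` module as a stand-in for Hive, which assumes the bundled SQLite is version 3.25 or later (window function support). The `sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("fruit", "apple", 50),
        ("fruit", "pear", 30),
        ("fruit", "plum", 40),
        ("veg", "kale", 20),
        ("veg", "leek", 60),
        ("veg", "corn", 10),
    ],
)

top2 = conn.execute(
    """
    SELECT category, item, amount
    FROM (
        SELECT category, item, amount,
               -- restart numbering at 1 for every category, largest amount first
               row_number() OVER (PARTITION BY category ORDER BY amount DESC) AS rn
        FROM sales
    ) t
    WHERE rn <= 2          -- keep the top 2 rows of each category
    ORDER BY category, rn
    """
).fetchall()

for row in top2:
    print(row)
```

The window function has to be computed in a subquery because the row number cannot be filtered in the same `WHERE` clause that produces it.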

Added by alexhard on Fri, 14 Jan 2022 10:47:21 +0200

Six stage big data -- day05 -- database creation and database table creation / hive query method / hive FAQ

---- Continued from the day04 notes ---- 3.2 External tables. Description: an external table references data stored at other HDFS paths, so Hive does not consider itself the exclusive owner of that data. Therefore, when the Hive table is dropped, the data remains in HDFS and will not be delete ...

Added by ramjai on Tue, 11 Jan 2022 18:22:09 +0200

Spark: Spark SQL basics, DataFrame, DataSet

Spark SQL overview. Spark SQL is the Spark module for structured data processing. For developers, Spark SQL simplifies RDD-based development, improves development efficiency, and executes quickly, so it is what is mostly used in practice. In order to simplify the development of RDD and impr ...

Added by Asnom on Thu, 06 Jan 2022 08:03:44 +0200

Hive: window functions

1. What is a window function 2. Window function classification 1. Cumulative window functions 1. sum() over(): a common task at work is calculating the cumulative value up to a given month, which calls for sum() used as a window function. For example, given a transaction table, trade: now it is necessary to calcul ...
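
To make the cumulative calculation concrete, the following sketch computes a running total per month with `sum(...) over (order by ...)`. It runs in SQLite through Python's `sqlite3` module as a stand-in for Hive and assumes SQLite 3.25 or later. The excerpt names a `trade` table but its columns are cut off, so the `month` and `amount` columns and all rows below are assumptions, with one row per month for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade (month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO trade VALUES (?, ?)",
    [("2021-01", 100), ("2021-02", 150), ("2021-03", 50)],
)

running = conn.execute(
    """
    SELECT month,
           amount,
           -- with ORDER BY, the window runs from the first row to the current one
           sum(amount) OVER (ORDER BY month) AS running_total
    FROM trade
    ORDER BY month
    """
).fetchall()

for row in running:
    print(row)
```

Each output row carries the month's own amount next to the total accumulated up to and including that month.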

Added by fpyontek on Tue, 04 Jan 2022 17:38:22 +0200

Hive [environment setup 02] [hive-3.1.2 version HiveServer2/beeline configuration use]

Hive has built-in HiveServer and HiveServer2 services, both of which allow clients to connect using multiple programming languages. However, HiveServer cannot handle concurrent requests from multiple clients, so HiveServer2 was created. HiveServer2 (HS2) allows remote clients to submit requests to Hive and retrieve results in various programmi ...

Added by greenber on Sun, 02 Jan 2022 03:01:29 +0200