Hive's Practical Analysis - Automobile Sales Problem (Code + Analysis)

Data files:

Extraction code: lq3a

1. Create tables

According to the data file, we can write the following statement for table building.

create  table cars(
province string, --Province
month int, --month
city string, --city 
county string, --District and county
year int, --year
cartype string,--Vehicle type
productor string,--Manufacturer
brand string, --brand
mold string,--Vehicle type
owner string,--ownership
nature string, --Nature of usage
number int,--Number
ftype string,--Engine Model
outv int,--displacement
power double, --power
fuel string,--Fuel types
length int,--Conductor
width int,--Vehicle width
height int,--Vehicle height
xlength int,--Chamber length
xwidth int,--Compartment width
xheight int,--Box Height
count int,--Axle number
base int,--Wheelbase
front int,--Front wheel pitch
norm string,--Tire specification
tnumber int,--Tyre number
total int,--Total mass
curb int,--curb weight
hcurb int,--Authenticated Load Quality
passenger string,--Authorized passenger carrying capacity
zhcurb int,--Quasi-traction quality
business string,--Chassis enterprise
dtype string,--Chassis brand
fmold string,--Chassis Model
fbusiness string,--Engine Enterprise
name string,--Vehicle name
age int,--Age
sex string --Gender
row format delimited
fields terminated by '\t';

2. Importing data

Here's the data file path. Write your own storage path.

load data local inpath '/usr/local/hive/Test/cars.txt' into table cars;

(Enter select * from cars limit 2; check for successful insertion)

Successful entry.

3. Query Analysis

(Remember to write after every query sentence;)

3.1 - Statistics of the number of passenger and commercial vehicles (i.e. non-operational and operational vehicles)

nature string, -- Use nature

select nature,count(number) from cars where nature!='' group by nature;

Parsing: Here count is a counting function, which counts the number of usage properties. Later, we restrict the use of properties. This column can not be empty. This is a simple data cleaning, cleaning out the garbage data. Then group by is divided according to the different content of the use properties column, and counts separately.

3.2 - Statistics on the proportion of car sales per month in Shanxi Province in 2013

Montint, - Month

select count(*) from cars;Total quantity

select month from (select month,count(*) from cars group by month);Quantity per month

select month,round(mei/zong,2)as resl from (select month,count(*) as mei from cars where month is not null group by month)as mm,(select count(*) as zong from cars)as zz;

According to our division above, we can see that this question has been split into three steps. In order to find out the proportion of each month, we first need to know the total number and the number of each month, and then use the stitching to find the proportion.

In the last step of integration, we use the round function, which rounds the numbers in (), the latter two means to keep two decimal numbers, and Mei / zone is a new variable that we use as a new variable according to our understanding. "

3.3 - Statistics on the proportion of men and women who buy cars

sex String, gender

select sex,count(*) from cars;Total number
select sex,count(*) from cars where sex is not null and sex !='' group by sex;men and women
select sex,round(nv/zong,2)as resl from (select sex,count(*) as nv from cars where sex is not null and sex !='' group by sex)as nnvv,(select sex,count(*) as zong from cars)as zz;

The sentences here are roughly the same as those above, so you can understand them as well.

3.4 - Statistics of ownership, models and types of vehicles

mold string, - vehicle type

owner string, - ownership

cartype string, - vehicle model

select owner,cartype,mold,count(*) from cars group by owner,cartype,mold;

This sentence is also very well understood, that is, to select three columns you need, output their content, and groupby to duplicate.

3.5 - Statistics of total monthly sales of different types of vehicles

mold string, - vehicle type


select mold,month,count(*) from cars group by mold,month;

Total monthly sales are calculated by type and monthly output and count ing

3.6 - Statistics engine models and fuel types through sales of different brands of vehicles


ftype string, - engine model

fuel string, - fuel type

collect_set() is the same as set in java. The contents stored in it can not be duplicated. It can achieve simple de-duplication function. Collecti_list is a list, and the contents can be duplicated.
select brand,collect_set(ftype),collect_set(fuel) from cars where brand is not null and brand != '' group by brand;

Here, collect_set is used to remove the weight, and the back seat is a simple data cleaning.

3.7 - Statistics of Wuling's monthly sales


select brand,month,count(*) from cars where brand='Wuling' and month is not null group by brand,month;

Keywords: hive Java

Added by dirkdetken on Tue, 10 Sep 2019 16:04:54 +0300