Data files: https://pan.baidu.com/s/1bud5O36RtSm4dNQ17h-wuA
Extraction code: lq3a
1. Create tables
According to the data file, we can write the following statement for table building.
create table cars( province string, --Province month int, --month city string, --city county string, --District and county year int, --year cartype string,--Vehicle type productor string,--Manufacturer brand string, --brand mold string,--Vehicle type owner string,--ownership nature string, --Nature of usage number int,--Number ftype string,--Engine Model outv int,--displacement power double, --power fuel string,--Fuel types length int,--Conductor width int,--Vehicle width height int,--Vehicle height xlength int,--Chamber length xwidth int,--Compartment width xheight int,--Box Height count int,--Axle number base int,--Wheelbase front int,--Front wheel pitch norm string,--Tire specification tnumber int,--Tyre number total int,--Total mass curb int,--curb weight hcurb int,--Authenticated Load Quality passenger string,--Authorized passenger carrying capacity zhcurb int,--Quasi-traction quality business string,--Chassis enterprise dtype string,--Chassis brand fmold string,--Chassis Model fbusiness string,--Engine Enterprise name string,--Vehicle name age int,--Age sex string --Gender ) row format delimited fields terminated by '\t';
2. Importing data
Here's the data file path. Write your own storage path.
load data local inpath '/usr/local/hive/Test/cars.txt' into table cars;
(Enter select * from cars limit 2; check for successful insertion)
Successful entry.
3. Query Analysis
(Remember to write after every query sentence;)
3.1 - Statistics of the number of passenger and commercial vehicles (i.e. non-operational and operational vehicles)
nature string, -- Use nature
select nature,count(number) from cars where nature!='' group by nature;
Parsing: Here count is a counting function, which counts the number of usage properties. Later, we restrict the use of properties. This column can not be empty. This is a simple data cleaning, cleaning out the garbage data. Then group by is divided according to the different content of the use properties column, and counts separately.
3.2 - Statistics on the proportion of car sales per month in Shanxi Province in 2013
Montint, - Month
select count(*) from cars;Total quantity select month from (select month,count(*) from cars group by month);Quantity per month select month,round(mei/zong,2)as resl from (select month,count(*) as mei from cars where month is not null group by month)as mm,(select count(*) as zong from cars)as zz;
According to our division above, we can see that this question has been split into three steps. In order to find out the proportion of each month, we first need to know the total number and the number of each month, and then use the stitching to find the proportion.
In the last step of integration, we use the round function, which rounds the numbers in (), the latter two means to keep two decimal numbers, and Mei / zone is a new variable that we use as a new variable according to our understanding. "
3.3 - Statistics on the proportion of men and women who buy cars
sex String, gender
select sex,count(*) from cars;Total number select sex,count(*) from cars where sex is not null and sex !='' group by sex;men and women select sex,round(nv/zong,2)as resl from (select sex,count(*) as nv from cars where sex is not null and sex !='' group by sex)as nnvv,(select sex,count(*) as zong from cars)as zz;
The sentences here are roughly the same as those above, so you can understand them as well.
3.4 - Statistics of ownership, models and types of vehicles
mold string, - vehicle type
owner string, - ownership
cartype string, - vehicle model
select owner,cartype,mold,count(*) from cars group by owner,cartype,mold;
This sentence is also very well understood, that is, to select three columns you need, output their content, and groupby to duplicate.
3.5 - Statistics of total monthly sales of different types of vehicles
mold string, - vehicle type
Montint,--Month
select mold,month,count(*) from cars group by mold,month;
Total monthly sales are calculated by type and monthly output and count ing
3.6 - Statistics engine models and fuel types through sales of different brands of vehicles
Branstring,--Brand
ftype string, - engine model
fuel string, - fuel type
collect_set() is the same as set in java. The contents stored in it can not be duplicated. It can achieve simple de-duplication function. Collecti_list is a list, and the contents can be duplicated.
select brand,collect_set(ftype),collect_set(fuel) from cars where brand is not null and brand != '' group by brand;
Here, collect_set is used to remove the weight, and the back seat is a simple data cleaning.
3.7 - Statistics of Wuling's monthly sales
Branstring,--Brand
Montint,--Month
select brand,month,count(*) from cars where brand='Wuling' and month is not null group by brand,month;