1, Background introduction
Today, the monthly test results of class 3 in Senior High School of pheasant university came out. Let's announce your test results first.

Then, I'll announce to you the living consumption of your classmates.

Let's make a simple analysis using mysql based on the above examination results and living consumption records.
Of course, it can be seen from the title of this article. This article is to tell you how to use SQL "window function" in combination with this data?

From the above figure, we should know the importance of this knowledge point. Including your future study of hive or oracle database, or data analysis interview, this will be a very important knowledge point.
2, Create table statements and insert data
Create table
create table exam_score( sname varchar(20), age int, subject varchar(20), score varchar(20) )charset=utf8; # ----------------------- # create table cost_fee( sname varchar(20), buydate varchar(20), buycost int )charset=utf8;
insert data
insert into exam_score values ('Zhang San' , 18, 'language' , 90), ('Zhang San' , 18, 'mathematics' , 80), ('Zhang San' , 18, 'English' , 70), ('Li Si' , 21, 'language' , 88), ('Li Si' , 21, 'mathematics' , 78), ('Li Si' , 21, 'English' , 71), ('Wang Wu' , 18, 'language' , 95), ('Wang Wu' , 18, 'mathematics' , 83), ('Wang Wu' , 18, 'English' , 71), ('Zhao Liu' , 19, 'language' , 98), ('Zhao Liu' , 19, 'mathematics' , 90), ('Zhao Liu' , 19, 'English' , 80); # ----------------------- # insert into cost_fee values ('Zhang San','2019-01-01',10), ('Zhang San','2019-03-03',23), ('Zhang San','2019-02-05',46), ('Li Si','2019-02-02',15), ('Li Si','2019-01-07',50), ('Li Si','2019-03-04',29), ('Wang Wu','2019-03-08',62), ('Wang Wu','2019-02-09',68), ('Wang Wu','2019-01-11',75), ('Zhao Liu','2019-02-08',55), ('Zhao Liu','2019-03-10',12), ('Zhao Liu','2019-01-12',80);
3, Introduction to window function classification
Before formally talking about the application of "window function", I will take you to sort out the basis of "window function". We can divide window functions into the following categories:
- Aggregate function + over() collocation;
- Sort function + over() collocation;
- ntile() function + over();
- Offset function + over() collocation;
What are the functions of each class? Observe the mind map below.

For over(), there are two commonly used keywords that must be described. As follows:
- partition by + field: you can imagine the group by keyword, which is used for "grouping";
- order by + field: This is easier to understand, which is the keyword used for "sorting";
4, Window function application
Above, we have introduced some commonly used "window functions". Here, using the data created at the beginning of the article, we will talk about the application of "window functions".
I hope you can summarize the meaning of each function through each case. I won't write it in detail here.
1. Aggregate function + over() collocation
① Calculate the score and average of each student
select sname ,subject ,score ,avg(score) over(partition by sname) as avg_score from exam_score
The results are as follows:

② Calculate the consumption and total consumption of each student from January to March
select sname ,buydate ,buycost ,sum(buycost) over(partition by sname) as sum_cost from cost_fee
The results are as follows:

③ Calculate the consumption and cumulative total consumption of each student from January to March
select sname ,buydate ,buycost ,sum(buycost) over(partition by sname order by buydate) as sum_cost from cost_fee
The results are as follows:

Note: combined with ② and ③, you can find that partition by combined with order by and without order by get completely different results. One is group summation (without order by); One is the cumulative sum of groups (plus order by).
2. Sort function + over() collocation
① Calculate the ranking of each subject. The same score is ranked differently, and the order is increased in turn
select sname ,subject ,score ,row_number() over(partition by subject order by score) rank1 from exam_score
The results are as follows:

② Calculate the ranking of each subject. The same score ranks the same, and the remaining ranking jumps
select sname ,subject ,score ,rank() over(partition by subject order by score) rank1 from exam_score
The results are as follows:

③ Calculate the ranking of each subject. The same scores rank the same, and the remaining ranking order increases
select sname ,subject ,score ,dense_rank() over(partition by subject order by score) rank1 from exam_score
The results are as follows:

3. ntile() function + over() collocation
The ntile() function is a bit cluttered. You don't know what kind to classify it. This function mainly uses "data segmentation". If this function is useful, it can also sort the data, similar to the row mentioned above_ Number() function.
① Yes, exam_score table to split the whole table
select sname ,subject ,score ,ntile(4) over() rank1 from exam_score
The results are as follows:

If you don't believe it, go down and try it. No matter what number is written in ntile(), it seems to be OK.
② Yes, exam_score table, grouped and segmented by subject
select sname ,subject ,score ,ntile(4) over(partition by subject) rank1 from exam_score
The results are as follows:

Even for grouping and segmentation, you will find that this is meaningless because the score is not sorted.
② Yes, exam_score table. After sorting scores, they are grouped and segmented according to subject s (the most useful)
select sname ,subject ,score ,ntile(4) over(partition by subject order by score) rank1 from exam_score
The results are as follows:

Note: if you carefully observe this usage, it is basically equivalent to row_number() function, the effect is the same.
4. Offset function + over() collocation
① Show your "last purchase time" and "next purchase time"
Note: for the first day, "first buy" is displayed; For the last day, "last buy" is displayed;
select sname ,buydate ,lag(buydate,1,'first day') over(partition by sname order by buydate) as Last purchase time ,lead(buydate,1,'last day') over(partition by sname order by buydate) as Next purchase time from cost_fee
The results are as follows:

② "First purchase time" and "last purchase time" of each student as of the current date
select sname ,buydate ,first_value(buydate) over(partition by sname order by buydate) as First purchase time ,last_value(buydate) over(partition by sname order by buydate) as Last purchase time from cost_fee
The results are as follows:

③ Show each student's "first purchase time" and "last purchase time"
Note: it doesn't say "as of the current date". Please pay attention to the difference between ② and ③. Different needs lead to different results.
select sname ,buydate ,first_value(buydate) over(partition by sname order by buydate) as First purchase time ,last_value(buydate) over(partition by sname ) as Last purchase time from cost_fee
The results are as follows:
