Oracle Advanced Query OVER (PARTITION BY.)

In order to facilitate learning and testing, all the examples are built under Oracle's own user Scott.

Note: The red order by in the title indicates that order by must be taken when using this method.

1. rank ()/ dense_rank () over (partition by... Order by...)

Now customers have such a need to query the information of the highest-paid employees in each department. I believe that students with certain oracle application knowledge can write the following SQL statements:
  1. select e.ename, e.job, e.sal, e.deptno  
  2.   from scott.emp e,  
  3.        (select e.deptno, max(e.sal) sal from scott.emp e group by e.deptno) me  
  4.  where e.deptno = me.deptno  
  5.    and e.sal = me.sal;  
In meeting customer needs, we should habitually think about whether there are other ways. This is for sure, using the rank() over(partition by...) or dense_rank() over(partition by...) grammar in the title of this section. The SQL is as follows:
  1. select e.ename, e.job, e.sal, e.deptno  
  2.   from (select e.ename,  
  3.                e.job,  
  4.                e.sal,  
  5.                e.deptno,  
  6.                rank() over(partition by e.deptno order by e.sal desc) rank  
  7.           from scott.emp e) e  
  8.  where e.rank = 1;  
  1. select e.ename, e.job, e.sal, e.deptno  
  2.   from (select e.ename,  
  3.                e.job,  
  4.                e.sal,  
  5.                e.deptno,  
  6.                dense_rank() over(partition by e.deptno order by e.sal desc) rank  
  7.           from scott.emp e) e  
  8.  where e.rank = 1;  
Why do you get the same result as the above statement? Here is a supplement to the rank()/dense_rank() over(partition by e.deptno order by e.sal desc) grammar.
over: On what conditions.
partition by e.deptno: by department number (partition).
order by e.sal desc: Rank from high to low (using rank()/dense_rank(), you have to take order by or it's illegal)
rank()/dense_rank():: Classification
The whole sentence means that on the basis of division by department, employees are graded according to their wages from high to low, and "rank" is expressed by numbers from small to large (the minimum must be 1).  

So what's the difference between rank() and dense_rank()?
rank(): Jump sort, if there are two first levels, then there is the third level.
dense_rank(): Continuous sorting, if there are two first levels, then it is still the second level.

Homework: Inquire about the minimum wage employees in the department.

2. min()/max() over(partition by...)

Now that we have inquired about the department's maximum/minimum wage, the customer's demand has returned. We have inquired about the employee's information and calculated the difference between the employee's wage and the department's maximum/minimum wage. This is still relatively simple, and it is revised on the basis of the groupby statement in the first section as follows:

  1. select e.ename,  
  2.          e.job,  
  3.          e.sal,  
  4.          e.deptno,  
  5.          e.sal - me.min_sal diff_min_sal,  
  6.          me.max_sal - e.sal diff_max_sal  
  7.     from scott.emp e,  
  8.          (select e.deptno, min(e.sal) min_sal, max(e.sal) max_sal  
  9.             from scott.emp e  
  10.            group by e.deptno) me  
  11.    where e.deptno = me.deptno  
  12.    order by e.deptno, e.sal;  
Above we used min() and max(), the former for the minimum and the latter for the maximum. What would happen if these two methods were used in conjunction with over(partition by...)? Let's look at the following SQL statements:
  1. select e.ename,  
  2.        e.job,  
  3.        e.sal,  
  4.        e.deptno,  
  5.        nvl(e.sal - min(e.sal) over(partition by e.deptno), 0) diff_min_sal,  
  6.        nvl(max(e.sal) over(partition by e.deptno) - e.sal, 0) diff_max_sal  
  7.   from scott.emp e;  
The query results of these two statements are the same. You can see that min() and max() actually seek the minimum and maximum, but only on the basis of partition by partition.

Homework: What would happen if order by was added to this example?

3. lead()/lag() over(partition by... order by...)

Chinese people like to compare, have good face and are famous all over the world. Customers are even better at this point, but they are not addicted after comparing with the maximum/minimum wage. This time, they put forward a more abnormal demand, calculating the difference between personal wage and one higher/one lower wage than themselves. This requirement really bothers me. I don't know how to implement it in the group by statement. However.... Now that we have over partition by..., everything seems so simple. As follows:

  1. select e.ename,  
  2.        e.job,  
  3.        e.sal,  
  4.        e.deptno,  
  5.        lead(e.sal, 1, 0) over(partition by e.deptno order by e.sal) lead_sal,  
  6.        lag(e.sal, 1, 0) over(partition by e.deptno order by e.sal) lag_sal,  
  7.        nvl(lead(e.sal) over(partition by e.deptno order by e.sal) - e.sal,  
  8.            0) diff_lead_sal,  
  9.        nvl(e.sal - lag(e.sal) over(partition by e.deptno order by e.sal), 0) diff_lag_sal  
  10.   from scott.emp e;   
After reading the above sentences, do you also feel a false alarm (a sudden freezing of chickens after a cold sweat, which is easy to catch a cold)? Let's talk about the two new methods used above.
Lead (column name, n,m): The default value of column name > in line n after the current record is m if there is no column name; if there is no parameter n,m, the default value is null if there is no column name > in the first row after the current record.
Lag (column name, n,m): The default value of < column name > in line n before the current record is m if there is no record; if there is no parameter n,m, the default value is null if there is no record < column name > in the first row before the current record.

Here are some common methods used in this grammar.

  1. select e.ename,  
  2.        e.job,  
  3.        e.sal,  
  4.        e.deptno,  
  5.        first_value(e.sal) over(partition by e.deptno) first_sal,  
  6.        last_value(e.sal) over(partition by e.deptno) last_sal,  
  7.        sum(e.sal) over(partition by e.deptno) sum_sal,  
  8.        avg(e.sal) over(partition by e.deptno) avg_sal,  
  9.        count(e.sal) over(partition by e.deptno) count_num,  
  10.        row_number() over(partition by e.deptno order by e.sal) row_num  
  11.   from scott.emp e;  

Important Note: After reading this article, you may have a misunderstanding that OVER (PARTITION BY.) is better than GROUP BY, but it is not so. The former can not replace the latter, and the former is not as efficient as the latter, but the former provides more functions, so we hope that you can choose according to the needs in use.

Keywords: SQL Oracle

Added by Dragonfly on Thu, 16 May 2019 05:30:30 +0300