[learning notes] Task05: window functions -- Tianchi Longzhu plan SQL training camp

Task05: window functions, etc. - Tianchi Longzhu program SQL training camp

Original tutorial link

Window function

Window functions are also called OLAP functions. OLAP is short for online analytical processing, which means real-time analysis and processing of database data.

  • Conventional SELECT statements are used to query the whole table, and the window function allows us to selectively summarize, calculate and sort some data.

    General form of window function:

    <Window function> OVER ([PARTITION BY <Listing>]
                         ORDER BY <Column name for sorting>)  
    
    • PARTITION BY is used for grouping, that is, to select which window to look at, which is similar to the grouping function of the GROUP BY clause. However, the PARTITION BY clause does not have the summary function of the GROUP BY clause and does not change the number of rows recorded in the original table.

      • If the partton by keyword is not added, the whole table is a window

        • Note that MySql is a window for the line and the previous behavior

    • ORDER BY is used to sort, that is, to determine the sort rule (field) in the window.

  • example:

    • SELECT product_name
             ,product_type
             ,sale_price
             ,RANK() OVER (PARTITION BY product_type
                               ORDER BY sale_price) AS ranking
        FROM product  
      
    • In this case, press product_type group, and then within the group, press sale_price sort. Note that press product here_ The type grouping does not merge the of a group into one row.

    • result:

Where window functions are used

Generally, it can only be used in the select clause

  • Syntactically, in addition to the SELECT clause, the ORDER BY clause or the SET clause of the UPDATE statement can also be used. However, there are few actual business examples, so at the beginning, we just need to remember that "it can only be used in the SELECT clause".

Why?

  • In DBMS, window function is used to operate the "result" processed by WHERE clause or group by clause

Window function classification

It can be divided into two categories:

  • First, aggregate functions such as SUM, MAX and MIN are used in window functions

  • Second, RANK and DENSE_RANK and other special window functions for sorting

Special window functions:

  • RANK

    • When there are 3 records in the first place: 1, 1, 1, 4
  • DENSE_RANK

    • When there are three records in the first place: 1, 1, 1, 2
  • ROW_NUMBER

    • When there are three records in the first place: 1, 2, 3 and 4
  • example:

    • SELECT  product_name
             ,product_type
             ,sale_price
             ,RANK() OVER (ORDER BY sale_price) AS ranking
             ,DENSE_RANK() OVER (ORDER BY sale_price) AS dense_ranking
             ,ROW_NUMBER() OVER (ORDER BY sale_price) AS row_num
        FROM product  
      

Application of aggregate function in window function

When the aggregate function is used in the window function, it calculates the aggregate of all data accumulated to the current row.

All aggregate functions can be used as window functions, and their syntax is exactly the same as that of special window functions.

example:

  • SELECT product_id, product_name, sale_price,
     SUM (sale_price) OVER (ORDER BY product_id) AS current_sum
    FROM Product;
    
    • Result: calculate the total value of the sales unit price of the goods with the commodity number "less than yourself".

Calculate moving average

You can specify the summary range in more detail for the rows contained in the window: the summary range is called the frame.

  • example

    • -- Code listing 8-6 Specify the nearest 3 rows as the summary object
      SELECT product_id, product_name, sale_price,
      	AVG (sale_price) OVER (ORDER BY product_id
      							ROWS 2 PRECEDING) AS moving_avg
      FROM Product;
      
    • Rows 2 predicting the first 2 rows and this row, or: the 3 rows closest up

      • Here, we use the keywords "ROWS" and "predicting" to specify the frame as "up to the previous ~ ROWS". Therefore, "ROWS 2 predicting" is to specify the frame as "up to the previous 2 ROWS", that is, to limit the records as summary objects to the following "closest 3 ROWS".
    • If the number in the condition is changed to "rows 5 predicting", it means "up to the previous 5 rows" (the nearest 6 rows)
  • Such a statistical method is called moving average

allied:

  • Specify the frame as 2 lines after the current record (the nearest 3 lines):

    • ROWS 2 FOLLOWING

  • If you want to use the front and back lines of the current record as summary objects, you can use the keywords "PRECEDING" and "FOLLOWING" at the same time.

    • SELECT product_id, product_name, sale_price,
      AVG (sale_price) OVER (ORDER BY product_id
      						ROWS BETWEEN 1 PRECEDING AND 
      						1 FOLLOWING) AS moving_avg
      FROM Product;
      
      • Take the interval of "1 predicting" (previous line) and "1 FOLLOWING" (subsequent line) as the summary object

        ● record of the previous line
        ● self (current record)
        ● record on the next line
  • be careful:

    • ORDER BY in the OVER clause is only used to determine the order in which window functions are calculated, and has no effect on the order of results.
    • If you want the records to be sorted according to the ORDER BY clause in the window function, you need to sort the results after the main query statement:

      • SELECT product_name, product_type, sale_price,
        RANK () OVER (ORDER BY sale_price) AS ranking
        FROM Product
        ORDER BY ranking;
        
      • Finally, ORDER BY ranking is added;

GROUPING operator

Only ROLLUP is introduced here. For the rest, please refer to the books recommended at the beginning

  • The GROUPING operator includes the following three types

    • ROLLUP
    • CUBE
    • GROUPING SETS
  • ROLLUP

    • Subtotal line and total line can be calculated at the same time

    • example:

      • SELECT product_type, regist_date, SUM(sale_price) AS sum_price
        FROM Product
        GROUP BY product_type, regist_date with ROLLUP;
        
  • For more details, it is recommended to refer to the books listed at the beginning of the article

Problem solving:

5.2

Continue to use the product table to calculate the total sales_price of each date in ascending order of registration date. The sorting is to rank the "sports T-shirt" record with NULL registration date in the first place (that is, it is regarded as earlier than other dates)

Answer:

-- incorrect null Answer for processing arranged in the first line:
SELECT product_type,regist_date,SUM(sale_price) AS sum_price 
FROM product 
GROUP BY regist_date,product_type WITH ROLLUP ;

Keywords: Database MySQL Big Data SQL

Added by Mahfouz on Sat, 25 Dec 2021 17:38:09 +0200