SQL training camp -- Task06: comprehensive exercises - 10 classic questions

This note is the learning content of Alibaba cloud Tianchi Longzhu plan SQL training camp. The link is: https://tianchi.aliyun.com/specials/promotion/aicampsql

preface

This note is the learning content of Alibaba cloud Tianchi Longzhu plan SQL training camp. The link is: https://tianchi.aliyun.com/specials/promotion/aicampsql

Dataset script

Create data table script: http://tianchi-media.oss-cn-beijing.aliyuncs.com/dragonball/SQL/create_table.sql
Insert data script: http://tianchi-media.oss-cn-beijing.aliyuncs.com/dragonball/SQL/data.zip

After downloading the script, run create in the MySQL environment_ Table.sql script, create a data table, and then unzip the downloaded data.zip. The unzipped directory is as follows:

8-10ccf_offline_stage1_train.sql
6-winequality-white.sql
5-8-10ccf_online_stage1_train.sql
4-macro industry.sql
3-ccf_offline_stage1_test_revised.sql
2-winequality-red.sql
1-9income statement.sql
1-9company operating.sql
1-7market data.sql

The serial number in front of the script file name indicates the serial number of the topic using the dataset. For example, 1-7market data.sql indicates that the dataset is used in questions 1 and 7.
Similarly, the sql script here is for you to insert data. You only need to open it and run it in MySQL environment to import the data into the data table.

Exercises and reference answers

Exercise 1

Please use the quarterly revenue forecast data sets "Income Statement.xls", "Company Operating.xlsx" and "Market Data.xlsx" of A-share listed companies, with Market Data as the main table and tick in the three tables_ Symbol information of 600383 and 600048 are combined. Only the following fields need to be displayed.
Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074


Reference answer:

SELECT M.TICKER_SYMBOL, M.END_DATE, M.CLOSE_PRICE, 		
		I.TICKER_SYMBOL, I.END_DATE, I.T_REVENUE, I.T_COGS, I.N_INCOME,
		C.TICKER_SYMBOL, C.INDIC_NAME_EN, C.END_DATE, C.VALUE
FROM `market data` AS M 
INNER JOIN `Income Statement` AS I ON M.TICKER_SYMBOL = I.TICKER_SYMBOL
INNER JOIN `Company Operating` AS C  ON I.TICKER_SYMBOL = C.TICKER_SYMBOL
WHERE M.TICKER_SYMBOL = 600383 OR M.TICKER_SYMBOL = 600048;

Exercise 2

Please use the Wine Quality Data set winequality red. CSV to find out all red wines with pH=3.03, and then rank their city acid in Chinese style (the next ranking in the same ranking should be the next consecutive integer value. In other words, there should be no "interval" between ranking).
Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=44

Table incoming statement

TICKER_SYMBOLEND_DATET_REVENUET_COGSN_INCOME

Table Market Data

TICKER_SYMBOLEND_DATECLOSE_PRICE

Table Company Operating

TICKER_SYMBOLINDIC_NAME_ENEND_DATEVALUE

Reference answer:

SELECT DENSE_RANK() OVER(ORDER BY W.`citric acid`) AS 'RANK', W.* 
FROM `winequality-white` AS W 
WHERE W.pH = 3.03;

Exercise 3

Data source: https://tianchi.aliyun.com/competition/entrance/231593/information

Using Coupon Usage Data for
In the data set ccf_offline_stage1_test_revised.csv in O2O, try to find out the businesses that issued the most total amount of coupons and the most coupons during July 2016.

Here, only the reduced amount is considered, not the coupons with a few discounts.

Reference answer:

select `Merchant_id` as 'Businesses that issue the largest number of coupons', count( `User_id`) as 'Number of sheets issued'
from `ccf_offline_stage1_test_revised` 
where year (`Date_received`) = 2016 
	and month(`Date_received`) = 07
group by `Merchant_id` 
order by  `Number of sheets issued` desc limit 1;

Exercise 4

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074

Please use the sheet index in macro & industry.xlsx, the data set in the quarterly revenue forecast of A-share listed companies_ Data, please calculate the power consumption of the whole society: primary industry: the value of the current month is in which month did the peak power consumption occur in 2015? And what percentage increase / decrease compared with the same period last year?

Reference answer:

SELECT SUM(DATA_VALUE) AS 'SUM', MONTH(PERIOD_DATE) AS 'MONTH' FROM `macro industry`
WHERE YEAR(PERIOD_DATE) = 2015
GROUP BY `MONTH`
ORDER BY `SUM` LIMIT 1;

Exercise 5

Data source: https://tianchi.aliyun.com/competition/entrance/231593/information

Using Coupon Usage Data for
According to the data set ccf_online_stage1_train.csv in O2O, what is the overall rejection rate of online coupons during June 2016? And find out the merchants with the highest rejection rate of coupons.

Abandonment rate = number of coupons received but not used / total number of coupons received

Reference answer:

Find the overall online coupon abandonment rate:

SELECT COUNT(*) AS 'Total number of coupons received',
	(SELECT COUNT(*) 
	FROM `ccf_online_stage1_train` 
    WHERE `Coupon_id` IS NOT NULL AND `Coupon_id` <> 'fixed'
    	AND `Date` IS NULL
		AND YEAR(`Date_received`) = 2016 AND MONTH( `Date_received`) = 06
		 	)AS 'Number of coupons received but not used'  
FROM `ccf_online_stage1_train` 
WHERE `Coupon_id` IS NOT NULL AND `Coupon_id` <> 'fixed'
AND YEAR(`Date_received`) = 2016 AND MONTH( `Date_received`) = 06;

Businesses seeking the highest rejection rate of coupons:

SELECT `Merchant_id`,
		`Total number of coupons received`, 
		COUNT(*) AS 'Number of coupons received but not used', 
		100*COUNT(*)/`Total number of coupons received` AS 'Coupon abandonment rate(%)' 
FROM `ccf_online_stage1_train`
INNER JOIN ( 
    SELECT `Merchant_id` AS 'M_G',  COUNT(*) AS 'Total number of coupons received' 
    FROM `ccf_online_stage1_train` 
    WHERE `Coupon_id` IS NOT NULL AND `Coupon_id` <> 'fixed' 
    	AND YEAR(`Date_received`) = 2016 AND MONTH( `Date_received`) = 06
    GROUP BY `Merchant_id` 
			) AS G
ON `Merchant_id` = G.`M_G`
WHERE `Coupon_id` IS NOT NULL AND `Date` IS NULL AND `Coupon_id` <> 'fixed' 
	AND YEAR(`Date_received`) = 2016 AND MONTH( `Date_received`) = 06
GROUP BY `Merchant_id` 
ORDER BY `Coupon abandonment rate(%)` DESC ;

Exercise 6

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=44

Please use the Wine Quality Data set winequality-white.csv to find out
All white wines with pH=3.63, and then rank their residual sugar in English (discontinuous ranking)

Reference answer:

SELECT RANK() OVER(ORDER BY W.`residual sugar`) AS 'RANK', ph, W.* 
FROM `winequality-red` AS W 
WHERE W.pH = 3.63;

Exercise 7

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074

Please use the sheet data in Market Data.xlsx, the data set in the quarterly revenue forecast of A-share listed companies,

By the end of 2018, what are the three industries with the largest market value? And what are the three companies with the largest market value in these three industries? (find the top three companies in each industry, i.e. a total of 9 companies)

Reference answer:

SELECT SUM(A.`MARKET_VALUE`) AS 'Total market value of the industry',B.* FROM `market data` AS A  
INNER JOIN (SELECT TYPE_NAME_CN ,TICKER_SYMBOL,`MARKET_VALUE`, 
            		RANK() 
            		OVER(PARTITION BY TYPE_NAME_CN 
                         ORDER BY `MARKET_VALUE`+0 DESC) AS 'RANK' 
            FROM `market data`
            WHERE END_DATE = '2018-05-31'
			 )AS B
ON A.TYPE_NAME_CN = B.TYPE_NAME_CN
WHERE END_DATE = '2018-05-31'
GROUP BY B.TYPE_NAME_CN, B.TICKER_SYMBOL, B.`MARKET_VALUE`
HAVING `RANK`<=3
ORDER BY `Total market value of the industry` DESC, `RANK` LIMIT 9;

Exercise 8

Data source: https://tianchi.aliyun.com/competition/entrance/231593/information

Using Coupon Usage Data for
The data sets ccf_online_stage1_train.csv and ccf_offline_stage1_train.csv in O2O try to find out the customers who used the most online and offline coupons during June 2016.

Reference answer:

SELECT A.`User_id`,  COUNT(*) AS 'Coupon usage times'
FROM (
		(SELECT OFF.User_id, OFF.Date_received, OFF.Date FROM `ccf_offline_stage1_train` AS OFF
		WHERE `Coupon_id` IS NOT NULL AND `Date` IS NOT NULL AND LEFT(`Date_received`,6)='201606') 
		UNION
		(SELECT `ON`.User_id, `ON`.Date_received, `ON`.Date FROM `ccf_online_stage1_train`  AS `ON`
		WHERE `Coupon_id` <> 'fixed' AND `Coupon_id` IS NOT NULL AND `Date` IS NOT NULL AND YEAR(`Date_received`) = 2016 AND MONTH(`Date_received`) = 6)
    ) AS A
GROUP BY A.`User_id`
ORDER BY COUNT(*) DESC
LIMIT 1;

Exercise 9

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074

Please use the sheet general in the quarterly revenue forecast data set Income Statement.xls of A-share listed companies
Business and sheet en in Company Operating.xlsx.

Find out the net profit corresponding to the quarter with the highest passenger throughput of Baiyun Airport in all years of the data set according to quarterly statistics? (note that it is the net profit corresponding to a single quarter, not cumulative net profit.)

Reference answer:

First, find out the quarter of the maximum passenger throughput - use the sum window function to make statistics year by year and quarter by quarter:

SELECT INDIC_NAME_EN, 
		SUM(`VALUE`) OVER( PARTITION BY `YEAR`, SEASON) AS S, 
		`YEAR`, 
		SEASON 
FROM
(SELECT INDIC_NAME_EN, `VALUE`,END_DATE, YEAR(END_DATE) AS 'YEAR', 
		CASE WHEN MONTH(END_DATE) IN (1,2,3) THEN 1 
				WHEN MONTH(END_DATE) IN (4,5,6) THEN 2
				WHEN MONTH(END_DATE) IN (7,8,9) THEN 3
				WHEN MONTH(END_DATE) IN (10,11,12) THEN 4
		END AS SEASON
FROM `company operating` 
WHERE INDIC_NAME_EN = 'Baiyun Airport:Passenger throughput') AS P
ORDER BY S DESC; 

SELECT `T_PROFIT`, A.* FROM `income statement` AS A
WHERE TICKER_SYMBOL = 600004 AND REPORT_TYPE = 'Q1' AND END_DATE = '2018-03-31';

Exercise 10

Data source: https://tianchi.aliyun.com/competition/entrance/231593/information

Using Coupon Usage Data for
The data sets "ccf_online_stage1_train.csv" and "ccf_offline_stage1_train.csv" in O2O try to find out the top three merchants who have been used with the most coupons online and offline during June 2016.

For example, if merchant A uses A 200 minus 50 coupon and consumer B uses A 30 minus 1 coupon, merchant A will be deducted by 51 yuan.

Reference answer:

SELECT `Merchant_id` , SUM(A) AS 'Accumulated amount of full reduction of used coupons' 
FROM ( 
	SELECT Merchant_id, 
    	RIGHT(Discount_rate,LENGTH( `Discount_rate`) - LOCATE(':',`Discount_rate` ) )AS A 
    FROM `ccf_online_stage1_train` 
	WHERE `Date_received` IS NOT NULL AND `Date` IS NOT NULL AND `Coupon_id` <> 'fixed'
	UNION 
	SELECT Merchant_id, 
    	RIGHT(Discount_rate,LENGTH( `Discount_rate`) - LOCATE(':',`Discount_rate` ) )AS A 
    FROM `ccf_offline_stage1_train` 
	WHERE `Date_received` IS NOT NULL AND `Date` IS NOT NULL) AA
GROUP BY `Merchant_id`
ORDER BY SUM(A) DESC
LIMIT 3;

Keywords: Database MySQL SQL

Added by deregular on Sun, 28 Nov 2021 15:28:26 +0200