Task06: comprehensive exercises - 10 classic questions - exercises

This note is the learning content of Alibaba cloud Tianchi Longzhu program SQL training camp. The link is: https://tianchi.aliyun.com/specials/promotion/aicampsql

Exercise 1

Please use the quarterly revenue forecast data sets "Income Statement.xls", "Company Operating.xlsx" and "Market Data.xlsx" of A-share listed companies, with Market Data as the main table and ticker in the three tables_ The information with symbol 600383 and 600048 is combined. Only the following fields need to be displayed.

Table nameField name
Income StatementTICKER_SYMBOL
Income StatementEND_DATE
Income StatementT_REVENUE
Income StatementT_COGS
Income StatementN_INCOME
Market DataTICKER_SYMBOL
Market DataEND_DATE_
Market DataCLOSE_PRICE
Company OperatingTICKER_SYMBOL
Company OperatingINDIC_NAME_EN
Company OperatingEND_DATE
Company OperatingVALUE

Exercise Code:

SELECT inst.TICKER_SYMBOL, inst.END_DATE, inst.T_REVENUE, inst.T_COGS, inst.N_INCOME,
md.TICKER_SYMBOL, md.END_DATE, md.CLOSE_PRICE,
co.TICKER_SYMBOL, co.INDIC_NAME_EN, co.END_DATE, co.VALUE
FROM `Market Data` AS md
JOIN `income statement` AS inst
ON md.TICKER_SYMBOL = inst.TICKER_SYMBOL
JOIN `Company Operating` AS co
ON co.TICKER_SYMBOL = co.TICKER_SYMBOL
WHERE md.TICKER_SYMBOL IN ('600383', '600048');

Exercise results:

Exercise 2

Please use Wine Quality Data set "winequality red. CSV" to find out all red wines with pH=3.03, and then rank their city acid in Chinese style (the next ranking in the same ranking should be the next consecutive integer value. In other words, there should be no "interval" between ranking)

Exercise Code:

SELECT *, DENSE_RANK() OVER (ORDER BY `citric acid`) AS RN
FROM `winequality-red` 
WHERE pH = '3.03';

Exercise results:

Exercise 3

Using the data set ccf_offline_stage1_test_revised.csv in Coupon Usage Data for O2O, try to find out the merchants with the largest total amount of coupons and the largest number of coupons during July 2016.

Here, only the amount of full reduction is considered, and coupons with a few discounts are not considered.

Exercise Code:

SELECT * FROM 
(SELECT SUBSTRING(Date_received, 1, 7) AS date_rec, 
Merchant_id, 
COUNT(User_id) AS coupon_cnt,
NULL AS `ttl_dis_amt`
FROM `ccf_offline_stage1_test_revised`
WHERE SUBSTRING(Date_received, 1, 7) = '2016-07'
GROUP BY Merchant_id
ORDER BY COUNT(User_id) DESC
LIMIT 1) temp1
UNION ALL
SELECT * FROM 
(SELECT SUBSTRING(Date_received, 1, 7) AS date_rec, 
Merchant_id,
NULL AS `coupon_cnt`,
SUM(SUBSTRING_INDEX(Discount_rate, ':', -1) + 0) AS ttl_dis_amt
FROM `ccf_offline_stage1_test_revised`
WHERE SUBSTRING(Date_received, 1, 7) = '2016-07'
GROUP BY Merchant_id
ORDER BY SUM(SUBSTRING_INDEX(Discount_rate, ':', -1) + 0) DESC
LIMIT 1) temp2;

Exercise results:

Exercise 4

Please use the sheet index in macro & industry.xlsx, the data set in the quarterly revenue forecast of A-share listed companies_ Data, please calculate the power consumption of the whole society: primary industry: the value of the current month is in which month did the peak of power consumption occur in 2015? And what percentage increase / decrease compared with the same period last year?

Exercise Code:

SELECT temp1.indic_id, temp1.name_cn, temp1.FREQUENCY_CD, temp1.PERIOD_DATE, temp1.DATA_VALUE,
temp2.indic_id, temp2.name_cn, temp2.FREQUENCY_CD, temp2.PERIOD_DATE, temp2.DATA_VALUE,
CONCAT(ROUND(((temp1.DATA_VALUE - temp2.DATA_VALUE) / temp2.DATA_VALUE * 100), 2) , '%') AS month_rate
FROM (
SELECT indic_id, name_cn, FREQUENCY_CD, PERIOD_DATE, DATA_VALUE,
ROW_NUMBER() OVER (ORDER BY DATA_VALUE DESC) AS RN
FROM `macro industry`
WHERE name_cn = 'Total Electricity Consumption: Primary Industry' 
AND SUBSTRING(PERIOD_DATE, 1, 4) = '2015'
ORDER BY DATA_VALUE DESC
LIMIT 1 ) temp1 
LEFT JOIN (
SELECT indic_id, name_cn, FREQUENCY_CD, PERIOD_DATE, DATA_VALUE
FROM `macro industry` 
WHERE name_cn = 'Total Electricity Consumption: Primary Industry' 
AND SUBSTRING(PERIOD_DATE, 1, 4) = '2014'
) temp2
ON SUBSTRING_INDEX(temp1.PERIOD_DATE, '-', -2) = SUBSTRING_INDEX(temp2.PERIOD_DATE, '-', -2);

Exercise results:

Exercise 5

Using the data set ccf_online_stage1_train.csv in Coupon Usage Data for O2O, try to calculate the overall rejection rate of online coupons during June 2016? And find out the merchants with the highest rejection rate of coupons.

Rejection rate = number of coupons received but not used / total number of coupons received

The size of the data set "ccf_online_stage1_train.csv" is 2.11GB, and the loading time is too long. Try to answer later.

Exercise 6

Please use Wine Quality Data set winequality-white.csv to find out all white wines with pH=3.63, and then rank their residual sugar in English (discontinuous ranking)

Exercise Code:

SELECT *,  RANK() OVER (ORDER BY `residual sugar`) AS RN
FROM `winequality-red`
WHERE pH = '3.63'

Exercise results:

Exercise 7

Please use the sheet data in the data set Market Data.xlsx in the quarterly revenue forecast of A-share listed companies,

What are the three industries with the largest market value by the end of 2018? And what are the three companies with the largest market capitalization in these three industries? (find out the top three companies in each industry, that is, a total of 9 companies)

Exercise Code:

SELECT a.TYPE_NAME_EN, a.TYPE_NAME_CN, a.TTL_MARKET_VALUE,
b.SECURITY_ID, b.TICKER_SYMBOL, b.MARKET_VALUE AS TICKER_MARKET_VALUE
FROM (
SELECT TYPE_NAME_EN, TYPE_NAME_CN, SUM(MARKET_VALUE + 0) AS TTL_MARKET_VALUE 
-- String type + 0 = value type
FROM `Market Data`
WHERE END_DATE = '2018-05-31'
GROUP BY TYPE_NAME_EN, TYPE_NAME_CN
ORDER BY SUM(MARKET_VALUE) DESC
LIMIT 3) a
LEFT JOIN (
SELECT SECURITY_ID, TICKER_SYMBOL, TYPE_NAME_EN, TYPE_NAME_CN, MARKET_VALUE,
ROW_NUMBER() OVER (PARTITION BY TYPE_NAME_CN ORDER BY MARKET_VALUE + 0 DESC) AS RN 
FROM `Market Data`
WHERE END_DATE = '2018-05-31'
) b
ON a.TYPE_NAME_EN = b.TYPE_NAME_EN
WHERE b.RN <= 3
ORDER BY a.TTL_MARKET_VALUE DESC;

Exercise results:

Exercise 8

Using the data sets ccf_online_stage1_train.csv and ccf_offline_stage1_train.csv in Coupon Usage Data for O2O, try to find out the customers who used the most online and offline coupons during June 2016.

The size of the data set "ccf_online_stage1_train.csv" is 2.11GB, and the loading time is too long. Try to answer later.

Exercise 9

Please use sheet general business in Income Statement.xls and sheet en in Company Operating.xlsx, the quarterly revenue forecast data set of A-share listed companies.

Find out the corresponding net profit of the quarter with the highest passenger throughput of Baiyun Airport in all years of the data set? (note that it is the net profit corresponding to a single quarter, not the cumulative net profit.)

Exercise Code:

SELECT a.TICKER_SYMBOL, a.INDIC_NAME_EN, a.DATE_YEAR, a.DATE_QUARTER, a.TTL_VALUE, b.REPORT_TYPE, b.END_DATE, b.T_PROFIT
FROM (
SELECT TICKER_SYMBOL, INDIC_NAME_EN, YEAR(END_DATE) AS DATE_YEAR, QUARTER(END_DATE) AS DATE_QUARTER, SUM(VALUE) AS TTL_VALUE
FROM `Company Operating`
WHERE INDIC_NAME_EN = 'Baiyun Airport:Passenger throughput'
GROUP BY TICKER_SYMBOL, INDIC_NAME_EN, YEAR(END_DATE), QUARTER(END_DATE)
ORDER BY SUM(VALUE) DESC
LIMIT 1) a
LEFT JOIN `Income Statement` b
ON a.TICKER_SYMBOL = b.TICKER_SYMBOL
WHERE REPORT_TYPE = 'Q1' AND END_DATE = '2018-03-31';

Exercise results:

Exercise 10

Using the data sets ccf_online_stage1_train.csv and ccf_offline_stage1_train.csv in Coupon Usage Data for O2O, try to find out the top three merchants who have been used with the most coupons online and offline during June 2016.

For example, if merchant A uses A 200 minus 50 coupon and consumer B uses A 30 minus 1 coupon, merchant A will be deducted by 51 yuan.

The size of the data set "ccf_online_stage1_train.csv" is 2.11GB, and the loading time is too long. Try to answer later.

Keywords: Database SQL

Added by planetsim on Sun, 20 Feb 2022 10:25:19 +0200