SQL exercise questions (real questions from big factories) - from Niuke question bully

For the real problems of large factories, it is necessary to have a good understanding of the problems. The problems with similar sources and actual projects are the most important to exercise thinking ability and programming skills.

Part1: a short audio and video

SQL1 average completion rate of each video

Level: simple

Title:

User video interaction table tb_user_video_log

(uid - user ID, video_id - VIDEO ID, start_time - start viewing time, end_time - end viewing time, if_follow - follow, if_like - like, if_retweet - forward, comment_id - Comment ID)

Short video message table tb_video_info

 

(video_id - VIDEO ID, author - creator ID, tag - category tag, duration - VIDEO duration (seconds), release_time - release time)



Question: calculate the completion rate of each video with playback records in 2021 (keep three decimal places for the results), and sort it in descending order according to the completion rate

Note: video completion rate refers to the proportion of completed playback times to the total playback times. For simplicity, when the difference between the end viewing time and the start playback time > = the video duration, the playback is considered to be completed.

 
Output example:
The results of the sample data are as follows:

Explanation:

In October 2021, the video 2001 was played three times, with the viewing duration of 30 seconds, 24 seconds and 34 seconds respectively, and the video duration of 30 seconds. Therefore, two times were considered to have been played, so the completion rate was 0.667;
The video 2002 was played twice in September and October 2021. The viewing duration was 42 seconds and 30 seconds respectively, and the video duration was 60 seconds, so the completion rate was 0.000.

Procedure:

SELECT
A.video_id_total AS video_id,
CAST(IFNULL(video_num_need, 0) / video_num_all AS DECIMAL(10,3)) AS avg_comp_play_rate
FROM
(SELECT
video_id AS video_id_total,
COUNT(*) AS video_num_all
FROM
tb_user_video_log
WHERE
YEAR(end_time) = 2021
GROUP BY
video_id) AS A
LEFT JOIN
(SELECT
B.video_id AS video_id,
B.release_time AS relsease_time,
COUNT(*) AS video_num_need
FROM
(SELECT
a.video_id AS video_id,
a.start_time AS start_time,
a.end_time AS end_time,
b.duration AS duartion,
b.release_time AS release_time,
a.end_time - a.start_time AS watch_time
FROM
tb_user_video_log AS a
INNER JOIN
tb_video_info AS b
ON
a.video_id = b.video_id
WHERE
YEAR(end_time) = 2021
AND
(a.end_time - a.start_time) >= b.duration) AS B
GROUP BY
B.video_id) AS C
ON
A.video_id_total = C.video_id
ORDER BY
avg_comp_play_rate DESC;

Video categories with SQL2 average playback progress greater than 60%

Level: simple

Title:

User video interaction table tb_user_video_log

(uid - user ID, video_id - VIDEO ID, start_time - start viewing time, end_time - end viewing time, if_follow - follow, if_like - like, if_retweet - forward, comment_id - Comment ID)


Short video message table tb_video_info
 
(video_id - VIDEO ID, author - creator ID, tag - category tag, duration - VIDEO duration, release_time - release time)


Problem: calculate the average playback progress of various videos and output categories with progress greater than 60%.

Note:
  • Playback progress = playback duration ÷ video duration * 100%. When the playback duration is greater than the video duration, the playback progress is recorded as 100%.
  • The results are kept to two decimal places and sorted in reverse order according to the playback progress.

Output example:
The output results of the sample data are as follows:
Explanation:
The film and television videos 2001 have been seen by users 101, 102 and 103. The playback progress is 30 seconds (100%), 21 seconds (70%) and 30 seconds (100%) respectively. The average playback progress is 90.00% (two decimal places are reserved);
The gourmet video 2002 has been seen by users 102 and 103. The playback progress is 30 seconds (50%) and 60 seconds (100%) respectively. The average playback progress is 75.00% (two decimal places are reserved);
Procedure:
SELECT
tag,
CONCAT(ROUND(avg_play_progress * 100, 2),'%') 
FROM
(
SELECT
tag,
AVG(
CASE WHEN TIMESTAMPDIFF(second, b.start_time, b.end_time) >= a.duration
THEN 1
ELSE TIMESTAMPDIFF(second, b.start_time, b.end_time) / a.duration 
END) AS avg_play_progress
FROM 
tb_video_info AS a 
INNER JOIN 
tb_user_video_log AS b
ON
a.video_id = b.video_id
GROUP BY
a.tag) AS A
WHERE 
A.avg_play_progress > 0.6
ORDER BY
A.avg_play_progress DESC

SQL3 forwarding volume / rate of each type of video in recent one month

Level: medium

User video interaction table tb_user_video_log

(uid - user ID, video_id - VIDEO ID, start_time - start viewing time, end_time - end viewing time, if_follow - follow, if_like - like, if_retweet - forward, comment_id - Comment ID)


Short video message table tb_video_info

(video_id - VIDEO ID, author - creator ID, tag - category tag, duration - VIDEO duration, release_time - release time)


Question: count the forwarding volume and forwarding rate of each type of video in the last month with user interaction (calculated according to the recent 30 days including the current day, for example, the data in the recent 30 days on October 31 is between 10.2 and 10.31) (keep 3 decimal places).

Note: forwarding rate = forwarding volume ÷ playback volume. The results are sorted in descending order by forwarding rate.

Output example:
The output of the sample data is as follows

Explanation:
From table tb_user_video_log data is available, and the date of data dump is October 1, 2021. In the past 30 days, film and television videos 2001 were played 3 times, and were forwarded 2 times, with a forwarding rate of 0.667; Food Video 2002 has been played twice and forwarded once, with a forwarding rate of 0.500.
Procedure:
SELECT
a.tag,
SUM(CASE WHEN b.if_retweet = 1 THEN 1 ELSE 0 END) AS retweet_cnt,
ROUND(SUM(CASE WHEN b.if_retweet = 1 THEN 1 ELSE 0 END) / COUNT(*), 3) AS retweet_rate
FROM
tb_video_info AS a
INNER JOIN
tb_user_video_log AS b
ON
a.video_id = b.video_id
WHERE
TIMESTAMPDIFF(
    DAY,
    DATE(b.start_time),
    DATE((
    SELECT
    MAX(start_time) 
    FROM tb_user_video_log))) <= 29
GROUP BY
a.tag
ORDER BY
retweet_rate DESC

SQL4 the monthly rise rate of each creator and the total number of fans as of now

Level: medium

User video interaction table tb_user_video_log


(uid - user ID, video_id - VIDEO ID, start_time - start viewing time, end_time - end viewing time, if_follow - follow, if_like - like, if_retweet - forward, comment_id - Comment ID)

Short video message table tb_video_info


(video_id - VIDEO ID, author - creator ID, tag - category tag, duration - VIDEO duration, release_time - release time)

Question: calculate the monthly rising rate of each creator in 2021 and the total number of fans by the end of the month

Note:
  • Powder rising rate = (powder adding amount - powder dropping amount) / playback amount. The results are sorted in ascending order by Creator ID and total fans.
  • if_follow - whether the user pays attention is 1, which means that the user pays attention to the video creator when watching the video, 0 means that the attention state has not changed before and after this interaction, and 2 means that the attention has been cancelled during this viewing.

Output example:
The output of the sample data is as follows

Explanation:
Table tb in sample data_ user_ video_ Only the playback records of videos 2001 and 2002 in the log are from the creator 901, and the playback time is in September and October 2021; In September, the amount of added powder was 1, the amount of dropped powder was 0, and the amount of broadcast was 2, so the powder rise rate was 0.500 (3 decimal places are reserved); in October, the amount of added powder was 2, the amount of dropped powder was 1, and the amount of broadcast was 4, so the powder rise rate was 0.250, and the total number of fans up to now was 2.
Procedure:
# For each creator, every month
SELECT
A.author AS author,
A.month AS month,
ROUND(fans_situation / total_play, 3) AS fans_growth_rate,
SUM(fans_situation) OVER(PARTITION BY author ORDER BY month) AS total_fans
FROM
(SELECT
b.author AS author,
DATE_FORMAT(a.start_time,'%Y-%m') AS month,
# Change of fans
SUM(
CASE 
WHEN a.if_follow = 1 THEN 1
WHEN a.if_follow = 2 THEN -1
ELSE 0
END) AS fans_situation,
# Playback volume
COUNT(*) AS total_play
FROM
tb_user_video_log AS a
INNER JOIN
tb_video_info AS b
ON
a.video_id = b.video_id
WHERE
YEAR(a.start_time) = 2021
AND YEAR(a.end_time) = 2021
GROUP BY
author, month) AS A
ORDER BY
author, total_fans

SQL5 likes and forwards of each type of video during the National Day

Level: difficult

User video interaction table tb_user_video_log

(uid - user ID, video_id - VIDEO ID, start_time - start viewing time, end_time - end viewing time, if_follow - follow, if_like - like, if_retweet - forward, comment_id - Comment ID)


Short video message table tb_video_info

(video_id - VIDEO ID, author - creator ID, tag - category tag, duration - VIDEO duration, release_time - release time)

Question: make statistics on the total likes of each type of video every day in the past week and the maximum forwarding volume in a single day in a week in the first three days of national day in 2021. The results are sorted by video category in descending order and date in ascending order. Assuming that there are enough data in the database, at least every day of the first three days of national day and the week before each category has playback records.

Output example:
The output of the sample data is as follows
Explanation:
From table tb_ user_ video_ According to the data in the log, only tourism videos can be played. The daily likes and forwards from September 25 to October 3, 2021 are as follows:

Therefore, in the first three days (10.01 ~ 10.03) of the national day, the total number of likes in the last seven days (9.25 ~ 10.01) of 10.01 is 5, and the maximum number of forwards in a single day is 2 (the largest on September 25); similarly, two indicators of 10.02 and 10.03 can be obtained.
Procedure:
SELECT
B.tag AS tag,
B.dt AS dt,
B.sum_like_cnt_7d AS sum_like_cnt_7d,
B.max_between_cnt_7d AS max_between_cnt_7d
FROM
(SELECT
A.tag AS tag,
A.dt AS dt,
SUM(A.like_cnt) OVER(PARTITION BY A.tag ORDER BY A.dt ROWS 6 PRECEDING) AS sum_like_cnt_7d,
MAX(A.retweet_cnt) OVER(PARTITION BY A.tag ORDER BY A.dt ROWS 6 PRECEDING) AS max_between_cnt_7d
FROM
(SELECT
b.tag AS tag,
DATE_FORMAT(a.start_time, '%Y-%m-%d') AS dt,
SUM(a.if_like) AS like_cnt,
SUM(a.if_retweet) AS retweet_cnt
FROM
tb_user_video_log AS a
INNER JOIN
tb_video_info AS b
ON
a.video_id = b.video_id
WHERE
DATE_FORMAT(a.start_time, '%Y-%m-%d') BETWEEN '2021-09-25' AND '2021-10-03'
AND DATE_FORMAT(a.end_time, '%Y-%m-%d') BETWEEN '2021-09-25' AND '2021-10-03'
GROUP BY
tag, DATE_FORMAT(a.start_time, '%Y-%m-%d')
) AS A
) AS B
WHERE
B.dt BETWEEN '2021-10-01' AND '2021-10-03'
ORDER BY
B.tag DESC, B.dt ASC;

Note:

This topic examines the sliding window. In this kind of problem of how many days to take, the window function can be designed according to the needs of the topic. Take this topic as an example

Taking the situation within a week, the idea can be sorted in ascending and descending order.

1) Ascending order: 6 lines ahead of the current line

partition by tag order by dt rows 6 preceding 
partition by tag order by dt rows between 6 preceding and CURRENT row

2) descending order: 6 lines from the current line

partition by tag order by dt desc rows between  CURRENT row and 6 following

Remember that this method is very helpful for solving similar problems.

The top 3 video with the highest popularity among the videos released by SQL6 in recent months

Level: difficulty

describe

 

Existing users - video interaction table tb_user_video_log

(uid - user ID, video_id - VIDEO ID, start_time - start viewing time, end_time - end viewing time, if_follow - follow, if_like - like, if_retweet - forward, comment_id - Comment ID)


Short video message table tb_video_info

(video_id - VIDEO ID, author - creator ID, tag - category tag, duration - VIDEO duration, release_time - release time)


Question: find out the top 3 videos with the highest popularity among the videos released in recent months.

Note:
  • Popularity = (a * video completion rate + b * likes + c * comments + d * forwards) * freshness;
  • Freshness = 1 / (no playing days recently + 1);
  • The currently configured parameters a, B, C and D are 100, 5, 3 and 2 respectively.
  • Last play date in end_time - the end viewing time shall prevail. Assuming T, the statistics of the last month shall be based on the closed interval of [T-29, T].
  • The heat in the results is kept as an integer and sorted in descending order of heat.

Output example:
The output of the sample data is as follows

Explanation:
The latest broadcast date is 2021-10-03, which is recorded as the date of the day; The videos released in the past month (2021-09-04 and later) include 2001, 2002, 2003 and 2004, but there is no broadcast record in 2004;
The completion rate of video 2001 is 1.0 (4 times played, 4 times completed), 3 times liked, 1 time commented, and 2 times forwarded. The number of days not played recently is 0, so the popularity is: (100 * 1.0 + 5 * 3 + 3 * 1 + 2 * 2) / (0 + 1) = 122
Similarly, the completion rate of video 2003 is 0, the number of likes is 1, the number of comments and forwarding is 0, and the number of days without playing recently is 3. Therefore, the popularity is: (100 * 0 + 5 * 1 + 3 * 0 + 2 * 0) / (3 + 1) = 1 (1.2 is reserved as an integer).
Procedure:
SELECT
video_id,
ROUND((100 * total_play_rate + 5 * like_num + 3 * comment_num + 2 * retweet_num) * (1 / (not_play_days + 1)),0) AS hot_index
FROM
(SELECT
a.video_id AS video_id,
# Video completion rate
AVG(CASE
WHEN TIMESTAMPDIFF(SECOND, a.start_time, a.end_time) < b.duration 
THEN 0
WHEN TIMESTAMPDIFF(SECOND, a.start_time, a.end_time) >= b.duration
THEN 1
END) AS total_play_rate,
# Number of likes
SUM(a.if_like) AS like_num,
# Number of comments
COUNT(a.comment_id) AS comment_num,
# Forwarding number
SUM(a.if_retweet) AS retweet_num,
# Recent days without broadcasting
DATEDIFF(
(
SELECT MAX(end_time) FROM tb_user_video_log),
MAX(a.end_time)
) AS not_play_days
FROM
tb_user_video_log AS a
INNER JOIN
tb_video_info AS b
ON
a.video_id = b.video_id
WHERE
DATEDIFF(
(
SELECT MAX(end_time) FROM tb_user_video_log),
b.release_time
) <= 29
GROUP BY
a.video_id) AS A
ORDER BY
hot_index DESC
LIMIT 3;

Unfinished to be continued...

Keywords: MySQL

Added by banzaimonkey on Sun, 02 Jan 2022 02:23:13 +0200