SQL | partial time function | taxi problem

The order information table records the information of Brazilian passengers using taxi software, including order call, response, cancellation and completion time.

-order_id: order ID,Call order ID
-passenger_id:passenger ID,Passenger identification number
-call_time:Call time, the time point when the passenger sends the request for car use from the application (Beijing time)
-grab_time:Response time: the time point when the driver clicks to receive the order (Beijing time)
-cancel_time:Cancellation time, the time when the driver or passenger cancels the order (Beijing time)
-finish_time:When the order is finished, the driver clicks the time point of arrival at the destination (Beijing time)

be careful:
(1) The time in the table is Beijing time. Brazil is 11 hours slower than China.
(2) If the data value of the response time column is "1970", it means that the order has not been answered by the driver, which is an invalid order.
Index meaning:
Response rate: the proportion of call orders answered
Completion rate: the proportion of completed orders of call orders
Call response time: the average time from the call to the answer of the answered order
1. Create table

CREATE TABLE IF NOT EXISTS login2
CREATE TABLE IF NOT EXISTS order2(
order_id INT,
passenger_id VARCHAR(10),
call_time DATETIME,
grab_time DATETIME,
cancel_time DATETIME,
finish_time DATETIME
);

2. Insert data

INSERT INTO order2 VALUES(1,'001','2020/3/20 18:08:00','2020/3/20 18:09:00','1971/1/01 00:00:00','2020/3/20 19:08:00'),
						 (2,'002','2020/3/20 18:08:00','2020/3/20 18:10:00','1971/1/01 00:00:00','2020/3/20 20:08:00'),
						 (3,'002','2020/3/20 18:08:00','2020/3/20 18:08:00','2020/3/20 18:09:00','1971/1/01 00:00:00'),
						 (4,'003','2020/3/24 18:08:00','1971/1/01 00:00:00','2020/3/24 18:09:00','1971/1/01 00:00:00'),
						 (5,'004','2020/3/24 18:08:00','1971/1/01 00:00:00','1971/1/01 00:00:00','1971/1/01 00:00:00'),
						 (6,'005','2020/3/27 18:08:00','2020/3/27 18:09:00','1971/1/01 00:00:00','2020/3/27 18:40:00');

3. Logical SQL
All the following questions are based on Brazilian time and should be converted to Brazilian time
First:
To reduce all the time in the table by 11 hours, DATE_SUB(datetime,INTERVAL 11 HOUR)

Second:
Time zone conversion, Beijing is Dongba District, GMT + 08:00. CONVERT_TZ(datetime,'+08:00','-03:00')

SELECT  order_id,
	    passenger_id,
	    DATE_SUB(call_time,INTERVAL 11 HOUR) AS call_time,
	    DATE_SUB(grab_time,INTERVAL 11 HOUR) AS grab_time,
	    DATE_SUB(cancel_time,INTERVAL 11 HOUR) AS cancel_time,
	    DATE_SUB(finish_time,INTERVAL 11 HOUR) AS finish_time
FROM order2;

or

SELECT  order_id,
	    passenger_id,
		CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
		CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
		CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
		CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
 FROM order2;

(1) Response rate and completion rate of orders
Response rate: the proportion of call orders answered, that is, the number of orders in COUNT year (grab_time) < > 1970 / the number of orders in all calls in COUNT
Order completion rate: the proportion of completed orders of call orders, that is, the number of orders of COUNT year (finish_time) < > 1970 / the number of orders of all calls of COUNT

mysql> WITH t1 AS (SELECT order_id,
	->   				passenger_id,
    ->    				DATE_SUB(call_time,INTERVAL 11 HOUR) AS call_time,
    ->    				DATE_SUB(grab_time,INTERVAL 11 HOUR) AS grab_time,
    ->    				DATE_SUB(cancel_time,INTERVAL 11 HOUR) AS cancel_time,
    ->    				DATE_SUB(finish_time,INTERVAL 11 HOUR) AS finish_time
    ->    		   FROM order2)
    -> SELECT COUNT(CASE WHEN YEAR(grab_time)=1970 THEN NULL ELSE grab_time END)/COUNT(*) AS 'Response rate',
    ->        COUNT(CASE WHEN YEAR(finish_time)=1970 THEN NULL ELSE grab_time END)/COUNT(*) AS 'Completion rate'
    -> FROM t1;
+-----------+-----------+
| Response rate    | Completion rate    |
+-----------+-----------+
|    0.6667 |    0.5000 |
+-----------+-----------+
1 row in set (0.01 sec)

The functions ROUND() and CONCAT() can be used to turn the display to

(2) How long is the call response time
Timestamp diff (unit, datetime_expr1, datetime_expr2) unit can be year, month, day, hour, minute, second, etc

It is different from DATEDIFF(date1,date2), which returns the days of difference

mysql> WITH t1 AS (SELECT   order_id,
    ->       				passenger_id,
    ->       				CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
    ->       				CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
    ->       				CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
    ->       				CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
    ->      		FROM order2)
    -> SELECT SUM(TIMESTAMPDIFF(minute,call_time,grab_time))/COUNT(*) AS 'Average response time'
    -> FROM  t1
    -> WHERE YEAR(grab_time)<>1970;
+--------------------+
| Average response time       |
+--------------------+
|             1.0000 |
+--------------------+
1 row in set (0.00 sec)

(3) According to the data of this week, which hour (local time) has the highest call volume?
Which hour (local time) has the least call volume?

There are generally several cases of sorting problems:
order by
rank three window functions
For the where sub query, the score of eg higher than Xiaoming is no more than 2

First: if you only understand each hour of the week, that is, don't consider every day, just consider each hour of the week
Considering the repeated call volume, dense is adopted_ Rank() over(), with parallel ranking and continuous sorting
According to the conditions to be met:
① This week, because the date 2020-3-27 is specified for the test data according to the data content, I understand that this week is seven days from the beginning of the day, so I use DATEDIFF. If it is understood as a certain week, BETWEEN specifies the date range
② Call volume per hour, grouped by HOUR(call_time), calculate the COUNT call volume
③ Highest call volume, DENSE_RANK() is sorted in descending order, with the rank of 1 taken
Similarly, the call volume is the lowest, DENSE_RANK() is sorted in ascending order, with the rank of 1 taken

mysql> WITH t1 AS (SELECT	 order_id,
    ->       				passenger_id,
    ->       				CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
    ->       				CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
    ->       				CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
    ->       				CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
    ->      		FROM order2)
    -> SELECT hour AS 'Highest call hour'
    -> FROM
    -> 		(SELECT HOUR(call_time) AS hour,
    			    COUNT(order_id) AS order_co, # ② Call volume of HOUR packet
    ->    		    DENSE_RANK() OVER(ORDER BY COUNT(order_id) DESC) AS rk  # ③ Sort, take the sort as 1
    -> 		FROM t1
    -> 		WHERE DATEDIFF('2020-3-27',call_time)<=7  # ① This week
    -> 		GROUP BY HOUR(call_time))t2 # ② Call volume of HOUR packet
    -> WHERE rk=1; # ③ Sort, take the sort as 1
+--------------------------+
| Highest call hour         |
+--------------------------+
|                        7 |
+--------------------------+
1 row in set (0.00 sec)

**Second: * * if you want to consider the hour with the highest / lowest average call volume per hour this week, in fact, thinking is purely a brain hole
According to the conditions to be met
① This week, ibid
② Call volume per hour. If you calculate the total call volume per hour / 7, it is no different from the above, except 7;
If there are no calls in a certain hour, it is not averaged. For example, there are 14 calls from 6 p.m. on Monday to Wednesday, with an average of 14 instead of 6
③ Sort, ditto, DENSE_RANK() sort by average
It's just a brain hole. Don't make it up. It's just as follows

mysql> WITH t1 AS (SELECT   order_id,
    ->       				passenger_id,
    ->       				CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
    ->       				CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
    ->       				CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
    ->       				CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
    ->      		FROM order2)
    -> SELECT hour AS 'Maximum call hours'
    -> FROM
    -> 		(SELECT hour,AVG(order_co) AS avg,DENSE_RANK() OVER(ORDER BY AVG(order_co) DESC ) AS rk
    -> 		FROM
    -> 			(SELECT DATE(call_time) AS date,HOUR(call_time) AS hour,COUNT(order_id) AS order_co
    -> 			FROM t1
    -> 			WHERE DATEDIFF('2020-3-27',call_time)<=7
    -> 			GROUP BY DATE(call_time),HOUR(call_time))t2
    -> 		GROUP BY hour)t3
    -> WHERE rk=1;
+--------------------------+
| Maximum call hours         |
+--------------------------+
|                        7 |
+--------------------------+
1 row in set (0.00 sec)

(3) What is the proportion of users who continue to call the next day

mysql> WITH t1 AS (SELECT   order_id,
    ->       				passenger_id,
    ->       				CONVERT_TZ(call_time,'+08:00','-03:00') AS call_time,
    ->       				CONVERT_TZ(grab_time,'+08:00','-03:00') AS grab_time,
    ->       				CONVERT_TZ(cancel_time,'+08:00','-03:00') AS cancel_time,
    ->       				CONVERT_TZ(finish_time,'+08:00','-03:00') AS finish_time
    ->      		FROM order2)
    -> SELECT DATE(t1.call_time),COUNT(DISTINCT CASE WHEN DATEDIFF(t1.call_time,t2.call_time)=1 THEN t1.passenger_id ELSE NULL END)/COUNT(DISTINCT t1.passenger_id) AS 'Retained the next day'
    -> FROM t1 t1 LEFT JOIN t1 t2 ON t1.passenger_id = t2.passenger_id
    -> GROUP BY DATE(t1.call_time);
+--------------------+--------------+
| DATE(t1.call_time) | Retained the next day     |
+--------------------+--------------+
| 2020-03-20         |       0.0000 |
| 2020-03-24         |       0.0000 |
| 2020-03-27         |       0.0000 |
+--------------------+--------------+
3 rows in set (0.00 sec)

Note: the title comes from the official account of monkey data analysis.

Keywords: SQL

Added by Atanu on Mon, 31 Jan 2022 07:04:46 +0200