Chapter 09_ Subquery

Subquery refers to a query in which one query statement is nested inside another query statement. This feature has been introduced since MySQL 4.1.

The use of SQL sub query greatly enhances the ability of SELECT query, because many times the query needs to obtain data from the result set, or calculate a data result from the same table, and then compare it with the data result (which may be a scalar or a set).

1. Demand analysis and problem solving

1.1 practical problems

Existing solutions:

#Mode 1:
SELECT salary
FROM employees
WHERE last_name = 'Abel';

SELECT last_name,salary
FROM employees
WHERE salary > 11000;

#Mode 2: self connection
SELECT e2.last_name,e2.salary
FROM employees e1,employees e2
WHERE e1.last_name = 'Abel'
AND e1.`salary` < e2.`salary`
#Method 3: sub query
SELECT last_name,salary
FROM employees
WHERE salary > (
		SELECT salary
		FROM employees
		WHERE last_name = 'Abel'
		);

1.2 basic usage of sub query

  • Basic syntax structure of subquery:

  • The sub query (internal query) is executed once before the main query.
  • The results of the sub query are used by the main query (external query).
  • matters needing attention
    • Subqueries should be enclosed in parentheses
    • Place the subquery to the right of the comparison criteria
    • Single row operators correspond to single row subqueries, and multi row operators correspond to multi row subqueries

1.3 classification of sub query

Classification method 1:

We return one or more records according to the results of internal query, and divide the sub query into single line sub query and multi line sub query.

  • single-row subqueries

  • Multiline subquery

Classification method 2:

We divide sub queries into related (or related) sub queries and irrelevant (or non related) sub queries according to whether the internal query has been executed multiple times.

The sub query queries the data result from the data table. If the data result is executed only once, and then the data result is executed as the condition of the main query, such a sub query is called irrelevant sub query.

Similarly, if the sub query needs to be executed multiple times, that is, in a circular way, start from the external query, pass in the sub query for query each time, and then feed back the results to the external, this nested execution method is called related sub query.

2. Single line sub query

2.1 single line comparison operator

Operator meaning
= equal to
> greater than
>= greater than or equal to
< less than
<= less than or equal to
<> not equal to

2.2 code example

Title: query the information of employees whose salary is greater than the salary of employee No. 149

Title: return to job_ The ID is the same as that of employee 141. Salary is more than that of employee 143_ ID and salary

SELECT last_name, job_id, salary
FROM   employees
WHERE  job_id =  
                (SELECT job_id
                 FROM   employees
                 WHERE  employee_id = 141)
AND    salary >
                (SELECT salary
                 FROM   employees
                 WHERE  employee_id = 143);

Title: return the last of the employee with the lowest salary in the company_ name,job_ ID and salary

SELECT last_name, job_id, salary
FROM   employees
WHERE  salary = 
                (SELECT MIN(salary)
                 FROM   employees);

Title: query the manager of employee 141 or 174_ ID and department_ Employees of other employees with the same ID_ id,manager_id,department_id

Implementation method 1: unpaired comparison

SELECT  employee_id, manager_id, department_id
FROM    employees
WHERE   manager_id IN
		  (SELECT  manager_id
                   FROM    employees
                   WHERE   employee_id IN (174,141))
AND     department_id IN 
		  (SELECT  department_id
                   FROM    employees
                   WHERE   employee_id IN (174,141))
AND	employee_id NOT IN(174,141);

Implementation mode 2: pairwise comparison

SELECT	employee_id, manager_id, department_id
FROM	employees
WHERE  (manager_id, department_id) IN
                      (SELECT manager_id, department_id
                       FROM   employees
                       WHERE  employee_id IN (141,174))
AND	employee_id NOT IN (141,174);

2.3 sub query in having

  • Execute the subquery first.
  • Returns results to the HAVING clause in the main query.

Title: query the Department id and the minimum wage of the Department whose minimum wage is greater than the minimum wage of No. 50 department

SELECT   department_id, MIN(salary)
FROM     employees
GROUP BY department_id
HAVING   MIN(salary) >
                       (SELECT MIN(salary)
                        FROM   employees
                        WHERE  department_id = 50);

2.4 sub query in case

To use a single column subquery in a CASE expression:

Title: explicit employee_id,last_name and location. Where, if the employee department_id and location_ Department with ID 1800_ If the ID is the same, the location is' Canada ', and the rest is' USA'.

SELECT employee_id, last_name,
       (CASE department_id
        WHEN
             (SELECT department_id FROM departments
	      WHERE location_id = 1800)           
        THEN 'Canada' ELSE 'USA' END) location
FROM   employees;

2.5 null value problem in sub query

SELECT last_name, job_id
FROM   employees
WHERE  job_id =
                (SELECT job_id
                 FROM   employees
                 WHERE  last_name = 'Haas');

The subquery does not return any rows

2.5 illegal use of sub query

SELECT employee_id, last_name
FROM   employees
WHERE  salary =
                (SELECT   MIN(salary)
                 FROM     employees
                 GROUP BY department_id);

Multi line subqueries use single line comparators

3. Multi line sub query

  • Also known as set comparison subquery
  • Multiple rows returned by internal query
  • Using multiline comparison operators

3.1 multiline comparison operator

Operator meaning
IN Equal to any one in the list
ANY It needs to be used together with the single line comparison operator to compare with a value returned by the sub query
ALL It needs to be used together with the single line comparison operator to compare with all the values returned by the sub query
SOME In fact, it is an alias of ANY, which has the same function. Generally, ANY is often used

Experience the difference between ANY and ALL

3.2 code examples

Title: return to other jobs_ Job in ID_ ID is' it '_ Prog 'employee number, name and job of any employee with low salary in the Department_ ID and salary

Title: return to other jobs_ Job in ID_ ID is' it '_ Prog 'employee number, name and job of all employees with low salary in the Department_ ID and salary

Title: query the Department id with the lowest average wage

#Mode 1:
SELECT department_id
FROM employees
GROUP BY department_id
HAVING AVG(salary) = (
			SELECT MIN(avg_sal)
			FROM (
				SELECT AVG(salary) avg_sal
				FROM employees
				GROUP BY department_id
				) dept_avg_sal
			)
#Mode 2:
SELECT department_id
FROM employees
GROUP BY department_id
HAVING AVG(salary) <= ALL (
				SELECT AVG(salary) avg_sal
				FROM employees
				GROUP BY department_id
)

3.3 null value problem

SELECT last_name
FROM employees
WHERE employee_id NOT IN (
			SELECT manager_id
			FROM employees
			);

4. Related sub query

4.1 related sub query execution process

If the execution of a sub query depends on an external query, it is usually because the tables in the sub query use external tables and are conditionally associated. Therefore, every time an external query is executed, the sub query must be recalculated. Such a sub query is called an associated sub query.

Related sub queries are executed in row by row order. Each row of the main query executes a sub query.

Description: the columns in the main query are used in the sub query

4.2 code examples

Title: query the last of employees whose salary is greater than the average salary of the Department_ Name, salary and its department_id

Method 1: related sub query

Method 2: use sub query in FROM

SELECT last_name,salary,e1.department_id
FROM employees e1,(SELECT department_id,AVG(salary) dept_avg_sal FROM employees GROUP BY department_id) e2
WHERE e1.`department_id` = e2.department_id
AND e2.dept_avg_sal < e1.`salary`;

From type sub query: the sub query is a part of from. The sub query should be led with (), and the sub query should be aliased,
Use it as a "temporary virtual table".

Use subquery in ORDER BY:

Title: query employee id,salary, according to department_name sort

SELECT employee_id,salary
FROM employees e
ORDER BY (
	  SELECT department_name
	  FROM departments d
	  WHERE e.`department_id` = d.`department_id`
	);

Title: if employees in the employees table_ ID and job_ Employee in history table_ The number of employees with the same ID is not less than 2, and the employees of these employees with the same ID are output_ id,last_ Name and its job_id

SELECT e.employee_id, last_name,e.job_id
FROM   employees e 
WHERE  2 <= (SELECT COUNT(*)
             FROM   job_history 
             WHERE  employee_id = e.employee_id);

4.3 EXISTS and NOT EXISTS keywords

  • The associated subquery is usually used together with the EXISTS operator to check whether there are qualified rows in the subquery.
  • If there are no qualified rows in the subquery:
    • Condition returns FALSE
    • Continue to find in subquery
  • If there are qualified rows in the subquery:
    • Do not continue to find in subquery
    • The condition returns TRUE
  • NOT EXISTS keyword indicates that if there is no condition, it returns TRUE; otherwise, it returns FALSE.

Title: query the employees of company managers_ id,last_name,job_id,department_id information

Mode 1:

SELECT employee_id, last_name, job_id, department_id
FROM   employees e1
WHERE  EXISTS ( SELECT *
                 FROM   employees e2
                 WHERE  e2.manager_id = 
                        e1.employee_id);

Mode 2: self connection

SELECT DISTINCT e1.employee_id, e1.last_name, e1.job_id, e1.department_id
FROM   employees e1 JOIN employees e2
WHERE e1.employee_id = e2.manager_id;

Mode 3:

SELECT employee_id,last_name,job_id,department_id
FROM employees
WHERE employee_id IN (
		     SELECT DISTINCT manager_id
		     FROM employees
		     
		     );

Title: query the departments of departments that do not exist in the employees table in the departments table_ ID and department_name

SELECT department_id, department_name
FROM departments d
WHERE NOT EXISTS (SELECT 'X'
                  FROM   employees
                  WHERE  department_id = d.department_id);

4.4 relevant updates

UPDATE table1 alias1
SET    column = (SELECT expression
                 FROM   table2 alias2
                 WHERE  alias1.column = alias2.column);

Use related subqueries to update data from one table to another.

Title: add a department in employees_ Name field, the data is the Department name corresponding to the employee

# 1)
ALTER TABLE employees
ADD(department_name VARCHAR2(14));

# 2)
UPDATE employees e
SET department_name =  (SELECT department_name 
	                       FROM   departments d
	                       WHERE  e.department_id = d.department_id);

4.4 relevant deletion

 DELETE FROM table1 alias1
 WHERE column operator (SELECT expression
                        FROM   table2 alias2
                        WHERE  alias1.column = alias2.column);

Use related subqueries to delete data from one table based on data from another.

Title: delete the table employees, which is the same as EMP_ Data in history table

DELETE FROM employees e
WHERE employee_id in  
           (SELECT employee_id
            FROM   emp_history 
            WHERE  employee_id = e.employee_id);

5. Throw a thinking question

Question: whose salary is higher than Abel's?

answer:

#Mode 1: self connection
SELECT e2.last_name,e2.salary
FROM employees e1,employees e2
WHERE e1.last_name = 'Abel'
AND e1.`salary` < e2.`salary`
#Method 2: sub query
SELECT last_name,salary
FROM employees
WHERE salary > (
		SELECT salary
		FROM employees
		WHERE last_name = 'Abel'
		);

Question: are the above two methods good or bad?

Answer: good self connection mode!

You can use subquery or self connection in the topic. In general, it is recommended that you use self connection, because in the processing of many DBMS, the processing speed of self connection is much faster than that of sub query.

It can be understood as follows: the sub query is actually the condition judgment after the query through the unknown table, while the self connection is the condition judgment through the known self data table. Therefore, the self connection processing is optimized in most DBMS.

Keywords: MySQL

Added by jbog91 on Tue, 25 Jan 2022 16:02:22 +0200