1, Database cognition
(1) What is a database
- Database:
1) A warehouse that organizes, stores, and manages data according to data structures
2) A container that holds organized data
3) A special file used to store data
(2) Type of database
- Commercial databases: Oracle, SQL Server, DB2, etc
- Open source databases: MySQL, PostgreSQL, etc
- Desktop database: represented by Microsoft Access, it is suitable for desktop applications
(3) Composition of database
- Column: a field in a table
- Row: a record in a table
- Table: table is structured information; Row and column composition table
- Database: multiple tables form a database
- How much data can be stored in MySQL:
1) MySQL can create up to 2 billion tables per database
2) A table allows 1024 columns (i.e. 1024 fields) to be defined
3) SQL has no direct limit on the number of rows in each table, but it is limited by the storage space of the database
(4) Introduction to SQL
- SQL: a database query and programming language for accessing data and querying, updating, and managing relational database systems
- The most commonly used data analysis languages are: data query language, data operation language and data definition language, that is, add, delete, modify and query
- The most commonly used is "query", that is, data retrieval
2, Data type
(1) Most commonly used data types
-
Data type summary
data type explain int integer float Decimal, floating point varchar text date date -
Numerical type
data type explain INT(size) Integer specifying the maximum number of digits in parentheses FLOAT(size,d) For numbers with floating decimal point, specify the maximum number of digits in parentheses, and d represents the number of digits after the decimal point DOUBLE(size,d) For numbers with floating decimal point, specify the maximum number of digits in parentheses, and d represents the number of digits after the decimal point -
Date type
data type explain YEAR() Year in 2-digit or 4-digit format DATE() Date format: YYYY-MM-DD TIME() Time format: HH:MM:SS DATETIME() Combined format of date and time: YYYY-MM-DD HH:MM:SS TIMESTAMP() time stamp -
Text type
data type explain char(n) Save a fixed length string (which can contain letters, numbers, and special characters) varchar() Save variable length strings (which can contain letters, numbers, and special characters)
3, SQL basic query language
-
Common SQL commands and writing order
SELECT * (required)
FROM table
WHERE conditions
GROUP BY field
HAVING conditions
ORDER BY field
LIMIT limit the number of data pieces
Note: SQL is not case sensitive; Commands are generally uppercase, table names and field names are generally lowercase -
Retrieve data (SELECT statement)
Syntax structure:SELECT * FROM Table name
1) Retrieve single column: select column name from table name
2) Retrieve multiple columns: separated by "," select column name 1, column name 2 from table name
3) Retrieve all columns: select*
4) LIMIT search results: LIMIT
5) De duplication retrieval: DISTINCT -
Qualification (WHERE statement)
1) Syntax structure:SELECT * FROM Table name WHERE condition
2) Common judgment commands in WHERE conditions
Value judgment: greater than (>), less than (<), equal to (=), not equal to (< >), greater than or equal to (> =), less than or equal to (< =), between and
Logical judgment: AND, OR, NOT, IN()
Fuzzy judgment: LIKE,%_ (underlined), etc., such as LIKE "% Cai%"3) Exercise 1:
Value judgment: retrieve the "investment information with investment amount greater than 100000" in the "investment table"SELECT * FROM investmenttab2015 WHERE investment > 100000;
4) Exercise 2:
Logical judgment: retrieve the "investment information with investment amount greater than 100000 and investment period of 360 days" in the "investment table"SELECT * FROM investmenttab2015 WHERE investment > 100000 AND investdays = 360;
5) Case operation:
Select the investment information with investment period of 270 days or 360 days, investment from January 1, 2018 to December 31, 2019 and investment amount of more than 100000 (including 100000) from the "investment table"SELECT * FROM investmenttab2015 WHERE investdays IN (270,360) AND investment >= 100000 AND investmentdate BETWEEN '2018-01-01' AND '2019-12-31';
-
Calculation field (addition, subtraction, multiplication and division)
1) Syntax structure:SELECT Field 1 + Field 2 AS 'total', Field 1 - Field 2 AS 'differ', Field 1 * Field 2 AS 'product', Field 1 / Field 2 AS 'divide' FROM Table name WHERE Condition (not required)
2) Case operation:
Add a new field in "investment table": investment income column incomeSELECT *, investment*(investrate/360)*investdays as income FROM investmenttab2015;
-
Splice field (concat)
SELECT concat(Field 1,Field 2) AS 'new field name' FROM Table name
Add a new splicing field in the "user table" and splice the province province and city together as the new splicing field
SELECT *, concat(province,city) FROM users; SELECT *, concat(province,"-",city) FROM users;
-
Data grouping (GROUP BY)
1) Syntax structure:SELECT field,Calculation field FROM Table name WHERE query criteria GROUP BY field
2) Case operation:
Calculate the investment amount of "investment products of various investment periods"SELECT investdays,sum(investment) FROM investmenttab2015 GROUP BY investdays;
-
Data filtering (HAVING)
1) Syntax structure:SELECT field,Calculation field FROM Table name WHERE query criteria GROUP BY field HAVING Filter condition
2) Case operation:
Calculate the investment amount of "various investment periods" and display the investment information with an investment amount greater than 150 million (1.5 billion)SELECT investdays,SUM(investment) FROM investmenttab2015 GROUP BY investdays HAVING SUM(investment) > 1500000000;
-
Result sorting (ORDER BY)
1) Syntax structure:SELECT field,Calculation field FROM Table name WHERE Query criteria (not required) GROUP BY field HAVING Filter condition (not required) ORDER BY field DESC
2) Sort:
Ascending: default ASC
Descending: DESC3) Case operation:
Calculate the investment amount of "various investment periods", display the investment information with an investment amount greater than 150 million (1.5 billion), and arrange it in descending order from high to lowSELECT investdays,SUM(investment) FROM investmenttab2015 GROUP BY investdays HAVING SUM(investment) > 1500000000 ORDER BY SUM(investment) DESC;
4, SQL function
(1) Numerical function
-
Functions and descriptions are shown in the following table:
function describe Sum (column name) Returns the sum of a column AVG (column name) Returns the average value of a column Min (column name) Returns the minimum value of a column Max (column name) Returns the maximum value of a column Count (column name) Returns the number of rows in a column (excluding NULL values) COUNT(*) Returns the number of selected rows COUNT(distinct column name) Returns the number of distinct results ABS (column name) Return absolute value QRT (column name) Return square root -
Case operation
Calculate the investment amount and quantity of "various investment term financial products"SELECT investdays,SUM(investment),COUNT(investmentlistid) FROM investmenttab2015 GROUP BY investdays;
(2) Time date function
-
Time format:
1) DATETIME - Format: YYYY-MM-DD HH:MM:SS
2) DATE - Format: YYYY-MM-DD
3) YEAR - Format: YYYY -
Functions and descriptions are shown in the following table:
function explain NOW() Returns the current date and time CURDATE() Returns the current date CURTIME() Returns the current time DATE() Extracts the date part of a date or date / time expression EXTRACT() Returns a separate part of the date / time DATE_ADD() Adds the specified time interval to the date DATE_SUB() Subtracts the specified time interval from the date DATEDIFF() Returns the number of days between two dates DATE_FORMAT() Display date / time in different formats -
Common date commands
SELECT NOW(), # Returns the current date and time DATE(NOW()), # Date format MONTH(NOW()), # month YEAR(NOW()), # particular year DATE_FORMATE(NOW(),'%Y-%m-%d'), # Date format DATE_FORMATE(NOW(),'%Y-%m'), # Year / month format, English in uppercase and Chinese in lowercase DATE_ADD(NOW(),INTERVAL 1 DAY), # Add date plus day / week DATE_SUB(NOW(),INTERVAL 1 DAY) # Date minus one day / week SELECT DATEDIFF('2019-08-31','2019-01-01'); # How many days is the date interval SELECT NOW(); # View current date and time SELECT CURDATE(); # View current date
-
Case operation:
1) View "investment end date"SELECT *,DATE_ADD(investmentdate,INTERVAL investdays DAY) AS end_date FROM investmenttab2015;
2) View "investment month"
SELECT *,DATE_FORMAT(investmentdate,"%m") AS investment_month FROM investmenttab2015;
(3) Textual function
- Functions and descriptions are shown in the following table:
function describe Left (column name) Returns the left character Right (column name) Returns the right character Length (column name) Returns the length of a field
(4) Control statement
- IF function
Syntax structure:IF(condition, true, false)
- Case operation:
Query the investment table and add a new column "investment quota type". The investment amount greater than or equal to 100000 is high investment, and the investment amount less than 100000 is low investmentSELECT *,IF(investment>=100000,'High input','Low input') AS "Investment quota type" FROM investmenttab2015;
- Conditional function (CASE WHEN)
1) Syntax structure:
2) Case operation:CASE WHEN Condition 1 THEN 'Result 1' WHEN Condition 2 THEN 'Result 2' WHEN Condition 3 THEN 'Result 3' ...... ELSE 'Default value' END AS 'new field name'
Query the investment table and add a new column "investment quota type". The investment amount greater than or equal to 100000 is high investment, the investment amount greater than or equal to 50000 but less than 100000 is medium investment, and the investment amount less than 50000 is low investmentSELECT *, CASE WHEN investment >= 100000 THEN "High input" WHEN investment >= 50000 AND investment < 100000 THEN "Medium input" ELSE "Low input" END AS "Investment quota type" FROM investmenttab2015;
- PivotTable report
Use CASE WHEN with GROUP B to complete the PivotTable functionSELECT investdays, COUNT(CASE WHEN investment >= 100000 THEN "High input" END) AS "High input", COUNT(CASE WHEN investment >= 50000 AND investment < 100000 THEN "Medium input" END) AS "Medium input", COUNT(CASE WHEN investment < 50000 THEN "Low input" END) AS "Low input" FROM investmenttab2015 GROUP BY investdays; # IF implementation idea SELECT investdays, COUNT(IF(investment >= 100000,"High input",null)) AS "High input", COUNT(IF(investment >= 50000 AND investment < 100000,"Medium input",null)) AS "Medium input", COUNT(IF(investment < 100000,"Low input",null)) AS "Low input" FROM investmenttab2015 GROUP BY investdays;
5, SQL advanced
(1) SQL statement execution order
- Execution sequence
Step 1: determine the data source
FROM
JOIN
ON
Step 2: filter data
WHERE
GROUP BY
HAVING
Step 3: query data
SELECT
Step 4: display data
DISTINCT
ORDER BY
LIMIT
(2) SQL multi table join query
-
UNION vertical connection
1) Why connect (related to the storage form of data): one table has some information and the other table has other information. We want to obtain all the information, so we need to associate the two tables
2) Purpose: combination
3) Syntax structure:
UNION is used to combine the results from many SELECT statements into a result set, also known as UNION querySELECT * FROM A UNION [ALL | DISTINCT] SELECT * FROM B UNION [ALL | DISTINCT] SELECT * FROM C
a. UNION ALL connect all
b. UNION(DISTINCT) de reconnection
4) Case operation:
Connect the investment table and investment table 2015 into a large table# De reconnection (when the data is consistent with the type, it is recognized as duplicate) SELECT * FROM investmenttab UNION SELECT * FROM investmenttab2015;
Note:
a. The number of fields must be consistent
b. The order of fields must be consistent -
Transverse connection
1) Classification
a. INNER JOIN: the same as INNER JOIN
Table 1 joins Table 2 and returns the rows common to table 1 and table 2
b. LEFT JOIN: Table 1 LEFT JOIN table 2, based on table 1, matches the relevant information of Table 2; Use the most; Even if there is no match in the right table, all rows are returned from the left table
c. RIGHT JOIN: Table 1 RIGHT JOIN table 2, based on table 2, matches the relevant information of Table 1; Even if there is no match in the left table, all rows are returned from the right table
d. Full connection: full connection cannot be realized directly in MySQL2) INNER JOIN
Case operation:
Connect the investment tab and users table into a large table to display data with the same useridSELECT * FROM investmenttab INNER JOIN users ON investmenttab.userid = users.userid; # using SELECT * FROM investmenttab INNER JOIN users USING(userid); # inner ellipsis SELECT * FROM investmenttab JOIN users ON investmenttab.userid = users.userid; # on can be selected as different fields. You must ensure that the contents of the fields are the same data
3) LEFT JOIN left join
The table on the left shall prevail:
Connect the investment tab and users into a large table, and the table on the left shall prevailSELECT * FROM investmenttab LEFT JOIN users ON investmenttab.userid = users.userid; # Some fields in the join table SELECT a.userid,b.age FROM investmenttab AS a LEFT JOIN users AS b ON a.userid = b.userid;
Case operation:
Case operation: query the number and amount of investment in each regionSELECT COUNT(DISTINCT a.userid),b.area,SUM(a.investment) FROM investmenttab AS a LEFT JOIN users AS b ON a.userid = b.userid GROUP BY b.area;
4) RIGHT JOIN right join
The table on the right shall prevail:
Connect the investment tab and users into a large table, and the table on the right shall prevailSELECT * FROM investmenttab RIGHT JOIN users ON investmenttab.userid = users.userid WHERE investmenttab.investmentdate='2016-12-31';
(3) Subquery (nested query)
-
Subquery
1) When a query is part of another query, we call the inner query a sub query (inner query) and the outer query a parent query (main query)
2) Multi table query can be realized through sub query. The query statement may include keywords such as in, any, all and exists. In addition, it may contain comparison operators -
Sub query classification
1) WHERE subquery
2) FROM subquery -
WHERE subquery
Syntax structure:SELECT * FROM t WHERE col > (SELECT XXX FROM t)
1) Use operator
Case operation 1:
Query the investment sheet information in the investment tab where the investment amount is lower than the doc No.'ph-000000001 'SELECT * FROM investmenttab WHERE investment < ( SELECT investment FROM investmenttab WHERE investmentlistid = 'ph-000000001' );
Case operation 2:
Query the investment sheet information whose investment amount and investment cycle are consistent with the doc no. 'ph-000000001' in the investment tabSELECT * FROM investmenttab WHERE (investment,investdays) = ( SELECT investment,investdays FROM investmenttab WHERE investmentlistid = 'ph-000000001' );
Note:
a. The field names after where must be in the same order as the field names in the sub query
b. The number of fields after where should be consistent with the number of fields in the sub query2) Use IN keyword
When the condition of the main query is the result of the sub query, you can use the keyword in to judge. On the contrary, if the condition of the main query is not the result of the sub query, you can use the keyword not in to judge
Case operation:
Query the investment information of users in North ChinaSELECT * FROM investmenttab WHERE userid IN ( SELECT userid FROM users WHERE area='North China' );
-
FROM subquery
Syntax structure:SELECT t1.* FROM (SELECT field FROM Table name) AS t1 WHERE condition
Case operation:
Query the total investment of each city, rank according to the total investment, and find the top 10 citiesSELECT * FROM ( SELECT b.city,SUM(investment),ROW_NUMBER() over(ORDER BY SUM(investment) DESC) AS rank_num FROM investmenttab AS a INNER JOIN users AS b ON a.userid = b.userid GROUP BY b.city ) AS t WHERE t.rank_num <= 10;
-
WITH temporary table
1) On mysql8 After 0, the WITH table function is added to enable corresponding sub queries
2) The WITH temporary table, also known as the sub query part, is used to define an SQL fragment that will be used by the whole SQL statement
3) The return result of the WITH clause is stored in the user's temporary table space. It is queried only once and used repeatedly to improve efficiency
4) Grammatical structureWITH t1 AS (SELECT field FROM Table name) SELECT * FROM t1
Case operation:
Query the total investment of each city, rank according to the total investment, and find the top 10 citiesWITH t AS ( SELECT b.city,SUM(investment),ROW_NUMBER() over(ORDER BY SUM(investment) DESC) AS rank_num FROM investmenttab AS a INNER JOIN users AS b ON a.userid = b.userid GROUP BY b.city ) SELECT * FROM t WHERE t.rank_num <= 10;
-
Sorting column problem
1) There is an order by statement in MySQL to realize sorting, but in practice, we need to add a new column to the sorting result as the sorting sequence, that is, the ranking of the sorting result needs to be displayed together
2) On mysql5 7, there is no corresponding windowing function, which needs to be compared by using programmed logic, that is, the idea of introducing variables
3) And on mysql8 0, the corresponding windowing function is provided, which can be used directly
4)MySQL8.0 sort
Syntax structure:Usage 1: sorting without grouping ROW_NUMBER() OVER(ORDER BY field DESC) For example: ROW_NUMBER() OVER(ORDER BY Student achievement DESC) Indicates that the grades of all students are ranked from high to low, regardless of class Usage 2: Group sorting ROW_NUMBER() OVER(PARTITION BY Field 1 ORDER BY Field 2 DESC) It means grouping according to field 1 and sorting according to field 2 within the group. The value calculated by this function represents the sequence number after internal sorting of each group For example: ROW_NUMBER() OVER(PARTITION BY class ORDER BY Student achievement DESC) This function indicates the grouping according to "class" and sorting according to "student achievement" within each "class". The value calculated by this function represents the sequence number after sorting within each "class"
5) Explain
ROW_NUMBER() functions as a number
PARTITION BY partitions the same data
ORDER BY causes the data to be sorted in a certain order
6) Three sorts
It will be sorted according to the user's investment amount, from large to small, and the sorted results will be displayed in the query results at the same time# ROW_NUMBER() simple sort # It will be sorted according to the user's investment amount, from large to small, and the sorted results will be displayed in the query results at the same time SELECT investment,ROW_NUMBER() over(ORDER BY investment DESC) AS 'row_number' FROM investmenttab; # DENSE_RANK() does not occupy the same position SELECT investment,DENSE_RANK() over(ORDER BY investment DESC) AS 'dense_rank' FROM investmenttab; # RANK() same ranking SELECT investment,RANK() over(ORDER BY investment DESC) AS 'rank' FROM investmenttab;
7) PARTITION BY grouping sort
Case operation: find the ten cities with the highest investment amount in each regionSELECT * FROM ( SELECT b.area,b.city,ROW_NUMBER() over(PARTITION BY b.area ORDER BY SUM(a.investment) DESC) AS 'rank' FROM investmenttab AS a INNER JOIN users AS b ON a.userid = b.userid GROUP BY b.area,b.city ) AS t WHERE t.rank <= 10;