Optimized case of semi-connection and anti-connection

Let's first understand some concepts:

Semi-join: Two tables are associated, returning only matched data and only one table. Semi-join generally refers to the occurrence of IN and EXISTS in sub-queries.

Anti-join: A connection between two tables that returns only the data of the main table and only the data that is not associated with the main table and the sub-table is called an anti-join. Anti-connection refers to NOT IN and NOT EXISTS.

Subquery expansion: The optimizer expands the nested subqueries into an equivalent JOIN, and then optimizes the JOIN. If it does not expand, every data retrieved from the main table is matched by substituting a sub-query. Generally speaking, the efficiency is relatively low. At this time, the key word FILTER will appear in the execution plan. When the sub-query does not expand, SQL can not optimize the whole query from the connection mode, connection order and so on. Because these are fixed.

Sort and merge join: Sort and merge join principle is to sort two table/row sources according to JOIN columns (of course, when sorting, kick out columns that do not meet where criteria), and then join. Sort-and-merge connections are only applicable to non-equivalent JOINs. According to the principle of sorting and merging, we know that sorting and merging joins are resource-intensive, because to sort two tables/result sets, two tables need to be placed in PGA, so in general, CBO will not choose SORT MERGE JOIN.

Case Background: A new SQL statement was added to a system, and the result could not run out....

INSERT INTO AAA_AAAA.AAA_MID_FFF
  SELECT S.AS_OF_DATE,
         S.CUST_ACCT_NO,
         S.GL_ACCT_ID,
         S.ORGIN_DT,
         S.DUE_DT,
         S.TRAN_METHOD,
         S.INTEREST_RATE,
         S.TRAN_END_RATE,
         S.INTEREST_RATE - S.TRAN_END_RATE MARGIN_RATE
    FROM AAA_AAAA.AAA_MID_INSTAAAAAA S
   WHERE S.CUST_ACCT_NO NOT IN
         (SELECT DISTINCT T.CUST_ACCT_NO FROM AAA_MID_FFF T)
     AND S.MA_ACCT_NO IN
         (SELECT
           MA_ACCT_NO
            FROM (SELECT R.MA_ACCT_NO,
                         ROW_NUMBER() OVER(PARTITION BY R.CUST_ACCT_NO ORDER BY R.ORGIN_DT) RN
                    FROM AAA_MID_FFF_RESULT R
                   WHERE R.USED_TRAN_METHOD = 'Repricing_Term') M
           WHERE M.RN = 1);
Plan hash value: 466080511
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT             |                    |     1 |   513 |   416   (1)| 00:00:05 |
|   1 |  LOAD TABLE CONVENTIONAL     | AAA_MID_FFF        |       |       |            |          |
|*  2 |   FILTER                     |                    |       |   513 |   416   (1)| 00:00:05 |
|*  3 |    HASH JOIN ANTI NA         |                    |     1 |   655 |   416   (1)| 00:00:05 |
|   4 |     TABLE ACCESS FULL        | AAA_MID_INSTAAAAAA |     1 |   495 |     2   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL        | AAA_MID_FFF        |  107K | 1893K |     2   (0)| 00:00:01 |
|*  6 |    VIEW                      |                    |     1 |   155 |     3  (34)| 00:00:01 |
|*  7 |     WINDOW SORT PUSHED RANK  |                    |     1 |   355 |     3  (34)| 00:00:01 |
|*  8 |      TABLE ACCESS FULL       | AAA_MID_FFF_RESULT |     1 |   355 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(EXISTS (SELECT 0 FROM (SELECT "S"."MA_ACCT_NO" "MA_ACCT_NO",ROW_NUMBER() OVER(
   					PARTITION BY "S"."CUST_ACCT_NO" ORDER BY "S"."ORGIN_DT") "RN" FROM
   					"ANNE_WORK"."AAA_MID_FFF_RESULT" "S" WHERE "S"."USED_TRAN_METHOD"='Repricing_Term') "M" WHERE
   					"MA_ACCT_NO"=:B1 AND "M"."RN"=1))
   3 - access("S"."CUST_ACCT_NO"="S"."CUST_ACCT_NO" AND
   					SUBSTR("CUST_ACCT_NO",1,16)=SUBSTR("S"."CUST_ACCT_NO",1,16))
   6 - filter("MA_ACCT_NO"=:B1 AND "M"."RN"=1)
   7 - filter(ROW_NUMBER() OVER ( PARTITION BY "R"."CUST_ACCT_NO" ORDER BY 
              "R"."ORGIN_DT")<=1)
   8 - filter("R"."USED_TRAN_METHOD"='Repricing_Term')

Execution plan Id =2 FILTER predicate information

2 - filter(EXISTS (SELECT 0 FROM (SELECT "S"."MA_ACCT_NO" "MA_ACCT_NO",ROW_NUMBER() OVER(
                       PARTITION BY "S"."CUST_ACCT_NO" ORDER BY "S"."ORGIN_DT") "RN" FROM
                       "ANNE_WORK"."AAA_MID_FFF_RESULT" "S" WHERE "S"."USED_TRAN_METHOD"='Repricing_Term') "M" WHERE "MA_ACCT_NO"=:B1 AND "M"."RN"=1))

Subqueries as a whole are not expanded. FILTER is called an improved nested loop, which can be optimized as NEST LOOP. First, look at the number of rows returned by the driver table. This driver table is the result set of HASH connection of AAA_MID_INSTAAAA and AAA_MID_FFF. The results of COUNT(*) tables are 4kw and 100k, respectively. The 100k tables are the counteracting tables (similar to the tables of the faulty pipelining records), that is to say, the tables are constructed for the purpose of not repeating the pipelining numbers. So backlinking to such a table does not reduce the magnitude of the entire result set. So the result set of HASH connection is close to the COUNT of AAA_MID_INSTAAAA, with 4 kW rows as the driving table of NEST LOOP and 4 kW cycles. As mentioned earlier, FILTER can not change the connection mode and order. The common method is rewriting (using WITH AS). Here we use HINT to expand subqueries

INSERT INTO AAA_AAAA.AAA_MID_FFF
  SELECT S.AS_OF_DATE,
         S.CUST_ACCT_NO,
         S.GL_ACCT_ID,
         S.ORGIN_DT,
         S.DUE_DT,
         S.TRAN_METHOD,
         S.INTEREST_RATE,
         S.TRAN_END_RATE,
         S.INTEREST_RATE - S.TRAN_END_RATE MARGIN_RATE
    FROM AAA_AAAA.AAA_MID_INSTAAAAAA S
   WHERE S.CUST_ACCT_NO NOT IN
         (SELECT DISTINCT T.CUST_ACCT_NO FROM AAA_MID_FFF T)
     AND S.MA_ACCT_NO IN
         (SELECT
          /*+ unnest */
           MA_ACCT_NO
            FROM (SELECT R.MA_ACCT_NO,
                         ROW_NUMBER() OVER(PARTITION BY R.CUST_ACCT_NO ORDER BY R.ORGIN_DT) RN
                    FROM AAA_MID_FFF_RESULT R
                   WHERE R.USED_TRAN_METHOD = 'Repricing_Term') M
           WHERE M.RN = 1);

Plan hash value: 2087877310
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT               |                       |     1 |   655 |   415   (1)| 00:00:05 |
|   1 |  LOAD TABLE CONVENTIONAL       | AAA_MID_FFF           |       |       |            |          |
|*  2 |   HASH JOIN ANTI NA            |                       |     1 |   655 |   415   (1)| 00:00:05 |
|   3 |    MERGE JOIN SEMI             |                       |     1 |   637 |     4  (50)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| AAA_MID_INSTAAAAAA    |     1 |   495 |     0   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN           | PK_AAA_MID_INSTAAAAAA |     1 |       |     0   (0)| 00:00:01 |
|*  6 |     SORT UNIQUE                |                       |     1 |   142 |     4  (50)| 00:00:01 |
|   7 |      VIEW                      | VW_NSO_1              |     1 |   142 |     3  (34)| 00:00:01 |
|*  8 |       VIEW                     |                       |     1 |   155 |     3  (34)| 00:00:01 |
|*  9 |        WINDOW SORT PUSHED RANK |                       |     1 |   355 |     3  (34)| 00:00:01 |
|* 10 |         TABLE ACCESS FULL      | AAA_MID_FFF_RESULT    |     1 |   355 |     2   (0)| 00:00:01 |
|  11 |    TABLE ACCESS FULL           | AAA_MID_FFF           |   107K|  1893K|   410   (1)| 00:00:05 |
--------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("S"."CUST_ACCT_NO"="T"."CUST_ACCT_NO" AND 
              SUBSTR("CUST_ACCT_NO",1,16)=SUBSTR("T"."CUST_ACCT_NO",1,16))
   6 - access("S"."MA_ACCT_NO"="MA_ACCT_NO")
       filter("S"."MA_ACCT_NO"="MA_ACCT_NO")
   8 - filter("M"."RN"=1)
   9 - filter(ROW_NUMBER() OVER ( PARTITION BY "R"."CUST_ACCT_NO" ORDER BY "R"."ORGIN_DT")<=1)
  10 - filter("R"."USED_TRAN_METHOD"='Repricing_Term')

Note the writing and location of this HINT (HINT is only valid for the current query block). The execution plan after adding HINT does eliminate the "FILTER" that causes performance problems, but the effect is not very good. It still takes 9000s + to return the results. According to common sense, this amount of data should normally respond within 10 minutes.

From the execution plan, we can see the MERGE JOIN "long time no see". Above, we talked about the principle of SORT MERGE JOIN. Here we talk about the applicable scenario of this connection mode: only for non-equivalent connections, which is the only applicable scenario for him. In other words, if it is an equivalent connection, HASH is much more efficient than SMJ unless the connection columns are sorted and memory is large enough. Of course, the bottleneck here is Id =4, 5, we found that there is no *, no filtering before 4, 5, in this case, the index INDEX FULL SCAN, and then all the table. Both operations are read in blocks. A rough estimate is that the data block of the original table is scanned twice and read in a single block. So it takes 128*2 times as long to scan the whole table of 4000w rows at a time. In this case, we only need FULL(S) to scan the whole table, plus the implementation plan after HINT.

INSERT INTO AAA_AAAA.AAA_MID_FFF
  SELECT /*+ FULL(S) */S.AS_OF_DATE,
         S.CUST_ACCT_NO,
         S.GL_ACCT_ID,
         S.ORGIN_DT,
         S.DUE_DT,
         S.TRAN_METHOD,
         S.INTEREST_RATE,
         S.TRAN_END_RATE,
         S.INTEREST_RATE - S.TRAN_END_RATE MARGIN_RATE
    FROM AAA_AAAA.AAA_MID_INSTAAAAAA S
   WHERE S.CUST_ACCT_NO NOT IN
         (SELECT DISTINCT T.CUST_ACCT_NO FROM AAA_MID_FFF T)
     AND S.MA_ACCT_NO IN
         (SELECT
          /*+ unnest */
           MA_ACCT_NO
            FROM (SELECT R.MA_ACCT_NO,
                         ROW_NUMBER() OVER(PARTITION BY R.CUST_ACCT_NO ORDER BY R.ORGIN_DT) RN
                    FROM AAA_MID_FFF_RESULT R
                   WHERE R.USED_TRAN_METHOD = 'Repricing_Term') M
           WHERE M.RN = 1);
           
Plan hash value: 258356950
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT             |                    |     1 |   655 |   416   (1)| 00:00:05 |
|   1 |  LOAD TABLE CONVENTIONAL     | AAA_MID_FFF        |       |       |            |          |
|*  2 |   HASH JOIN ANTI NA          |                    |     1 |   655 |   416   (1)| 00:00:05 |
|*  3 |    HASH JOIN SEMI            |                    |     1 |   637 |     5  (20)| 00:00:01 |
|   4 |     TABLE ACCESS FULL        | AAA_MID_INSTAAAAAA |     1 |   495 |     2   (0)| 00:00:01 |
|   5 |     VIEW                     | VW_NSO_1           |     1 |   142 |     3  (34)| 00:00:01 |
|*  6 |      VIEW                    |                    |     1 |   155 |     3  (34)| 00:00:01 |
|*  7 |       WINDOW SORT PUSHED RANK|                    |     1 |   355 |     3  (34)| 00:00:01 |
|*  8 |        TABLE ACCESS FULL     | AAA_MID_FFF_RESULT |     1 |   355 |     2   (0)| 00:00:01 |
|   9 |    TABLE ACCESS FULL         | AAA_MID_FFF        |   107K|  1893K|   410   (1)| 00:00:05 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("S"."CUST_ACCT_NO"="T"."CUST_ACCT_NO" AND 
              SUBSTR("CUST_ACCT_NO",1,16)=SUBSTR("T"."CUST_ACCT_NO",1,16))
   3 - access("S"."MA_ACCT_NO"="MA_ACCT_NO")
   6 - filter("M"."RN"=1)
   7 - filter(ROW_NUMBER() OVER ( PARTITION BY "R"."CUST_ACCT_NO" ORDER BY 
              "R"."ORGIN_DT")<=1)
   8 - filter("R"."USED_TRAN_METHOD"='Repricing_Term')

At this time, we found that the execution plan actually went back to the normal connection of HASH JOIN. Why?

Think about - -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Because in the case of equivalence join, if the associated columns are ordered, CBO considers that SMJ may be more efficient than HASH. In the case, the result set of the following analysis function has been sorted, the table join column above has an index, and the storage of the index is ordered, so it just meets the conditions that CBO considers to use SMJ. When we force S table full table scan, CBO can't get ordered data, so the connection mode is changed to HASH JOIN.

Keywords: SQL

Added by roneill on Sun, 12 May 2019 12:38:52 +0300