[12c] let you know the collection of Extended Statistics of Oracle 12c database

In the Oracle database, the optimizer needs to rely on the collection of statistical information of related objects in order to generate the optimal execution plan. The database can automatically collect the execution plan or use the package DBMS_STATS is collected manually. This article does not explain the collection of general statistics. It focuses on the collection of Extended Statistics.

1 Introduction to extended statistics

When executing SQL, the statistics of a single column after the where condition can easily determine the predicate selectivity of the corresponding column. However, if multiple columns of the same table are included as predicates after the where condition, the statistics of a single column cannot show the relationship between columns. At this time, the execution plan obtained by the optimizer based on the statistics of a single column may not be optimal Yes. This problem can be solved by column group. By collecting statistical information of column group, we can reflect the relationship between a group of columns, so that the optimizer can choose the optimal execution plan. The statistical information of column group is called extended statistical information. In Oracle, extended statistics include:

  • Column group statistics: when multiple columns of a table appear in a SQL statement at the same time, this type of extended statistics can improve cardinality estimation
  • Expression statistics: this type of statistics improves optimizer evaluation when expressions are used on predicates.

As shown in the figure: sh.customers Two columns cust in the table_ state_ Province and contry_ The statistics of the ID column, as well as the statistics of the column group composed of these two columns, the name of the column group is generated by the system.

 

2 use general statistics

1) Experimental environment

SQL> select * from v$version;



BANNER CON_ID

-------------------------------------------------------------------------------- ----------

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 0

PL/SQL Release 12.1.0.2.0 - Production 0

CORE 12.1.0.2.0 Production 0

TNS for Linux: Version 12.1.0.2.0 - Production 0

NLSRTL Version 12.1.0.2.0 - Production 0

2) Build test data

SQL> create table sh.customers_new as select * from sh.customers;



Table created.

SQL> exec dbms_stats.gather_table_stats('SH','CUSTOMERS_NEW',method_opt => 'for all columns size 1');



PL/SQL procedure successfully completed.

3) Query column statistics

SQL> select column_name,num_distinct,histogram

from dba_tab_col_statistics

where owner='SH' and table_name='CUSTOMERS_NEW' and column_name in('CUST_STATE_PROVINCE','COUNTRY_ID');



COLUMN_NAME NUM_DISTINCT HISTOGRAM

------------------------------ ------------ ---------------

CUST_STATE_PROVINCE 145 NONE

COUNTRY_ID 19 NONE

3) Total number of query tables and satisfying cust_ state_ Amount of data for province ='ca '

SQL> select count(1) from sh.customers_new;



COUNT(1)

----------

55500



SQL> select count(1) from sh.customers_new where cust_state_province='CA';



COUNT(1)

----------

3341

4) View the execution plan corresponding to a single column

SQL> explain plan for

2 select * from sh.customers_new where cust_state_province='CA';



Explained.

SQL> select * from table(dbms_xplan.display);



PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3410015392



-----------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 383 | 72387 | 423 (1)| 00:00:01 |

|* 1 | TABLE ACCESS FULL| CUSTOMERS_NEW | 383 | 72387 | 423 (1)| 00:00:01 |

-----------------------------------------------------------------------------------



Predicate Information (identified by operation id):

---------------------------------------------------



PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------



1 - filter("CUST_STATE_PROVINCE"='CA')



13 rows selected.

Look at the following calculation to match the number of rows estimated by the optimizer:

SQL> select 55500/145 from dual;

 

55500/145

----------

382.758621

5) View the execution plan corresponding to the combination condition

SQL> explain plan for

2 select * from sh.customers_new where cust_state_province='CA' and country_id='52790';



Explained.



SQL> select * from table(dbms_xplan.display);



PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3410015392



-----------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 20 | 3780 | 423 (1)| 00:00:01 |

|* 1 | TABLE ACCESS FULL| CUSTOMERS_NEW | 20 | 3780 | 423 (1)| 00:00:01 |

-----------------------------------------------------------------------------------



Predicate Information (identified by operation id):

---------------------------------------------------



PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------



1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)



13 rows selected.

The result shows that the optimizer estimates 20 rows, which is the same as the following calculation, but the actual number of rows is 3341. Why?

SQL> select 55500/145/19 from dual;

 

55500/145/19

------------

20.1451906

First of all, the optimizer does not know the relationship between the two columns and CA in country 52790, which greatly underestimates the returned data row;

3 use rank statistics

Through the above experiments, we can know that if there is a certain relationship between columns and the optimizer cannot accurately evaluate the number of rows returned when there is a combination condition in the where condition, then in this case, we can collect the statistical information of the combination columns.

1) Collect composite column statistics

begin
  dbms_stats.gather_table_stats('SH',
                                'CUSTOMERS_NEW',
                                method_opt => 'for all columns size 1,for columns (cust_state_province,country_id) size skewonly');
end;
 /



PL/SQL procedure successfully completed.

2) View the execution plan corresponding to the combination condition

explain plan for
  select * from sh.customers_new where cust_state_province='CA' and country_id='52790';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3410015392

-----------------------------------------------------------------------------------
| Id  | Operation	  | Name	  | Rows  | Bytes | Cost (%CPU)| Time	  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |		  |  3341 |   655K|   423   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS_NEW |  3341 |   655K|   423   (1)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------

   1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)

13 rows selected.

It can be seen that the estimated number of rows is 3341, while the actual number of rows is 3341, which is the same as the real result.

SQL> select count(1) from sh.customers_new where cust_state_province='CA' and country_id='52790';

 

COUNT(1)

----------

3341

3) Delete extended statistics

begin

dbms_stats.drop_extended_stats('sh',

'CUSTOMERS_NEW',

'(cust_state_province, country_id)');

end;

/



PL/SQL procedure successfully completed.

The above is a demonstration of column group statistics in extended statistics.

Keywords: Oracle SQL Database Linux

Added by libertyct on Wed, 27 May 2020 15:38:34 +0300