Oracle CDC Overview - not available

1. Introduction to Oracle CDC

CDC(Change Data Capture). Many people believe that as long as it involves database data replication and incremental data extraction, it is necessary to buy charging software. In fact, we can also realize database data replication and incremental data extraction through free tools such as CDC and LogMiner provided by Oracle. Various data replication software only makes it easier to obtain incremental data, or can support more extended functions (such as synchronization between heterogeneous databases, data cleaning and replacement of ETL process), However, in fact, Oracle itself supports CDC mechanism, but few people pay attention to it, the operation is somewhat complex, and it is rumored to be unstable. We often see some inexplicable problems on the forum.

2. Overview of incremental data acquisition

Data acquisition usually refers to the Extract data extraction part in the ETL process. In addition to ETL, different application systems usually need to transfer data. Under certain environmental conditions, data cannot be directly moved from one system to another. Data can only be transferred with the help of text as an intermediate medium, and the generation of text is limited by time window. Therefore, there are certain requirements for the performance of data acquisition, i.e. data extraction. The following methods are commonly used to increase data collection:

  • 1. Timestamps on rows
  • 2. Version Numbers on rows
  • 3. Status indicators on rows
  • 4. Mixed use of timestamp, version number and status indication (Time/Version/Status on rows)
  • 5. Triggers on tables
  • 6. Table differentiation
  • 7. Log scanners on databases

For incremental data collection, two main schemes are introduced in Oracle, one is the familiar materialized view, and the other is the CDC component (Change Data Capture) to be introduced in this paper.

The CDC feature is introduced into Oracle9i Database. CDC can help you identify the data that has changed since the last extraction. With CDC, you can extract the data while performing INSERT, UPDATE or DELETE operations on the source table, and the changed data is saved in the change table of the database. In this way, you can capture the changed data, and then use the database view to A controllable way is provided to the target system.

3. Publish subscribe model of CDC

The CDC architecture is based on the publisher / subscriber model. The publisher captures the change data and provides it to the subscriber. The subscriber uses the change data obtained from the publisher. Usually, the CDC system has one publisher and multiple subscribers. The publisher first needs to identify the source table required to capture the change data. Then, it captures the change data and saves it in a specially created change database It also enables subscribers to control access to change data. Subscribers need to know what change data they are interested in. A subscriber may not be interested in all the data published by the publisher. Subscribers need to create a subscriber view to access the change data authorized by the publisher.

CDC has several important basic concepts that need to be clarified first:

  • Source table, the source table of the business database that needs to capture data
  • The change table stores the change data captured from the source table (including data generated by various DML S)
  • A change set is a data set that ensures transaction consistency. A change set corresponds to multiple change tables
  • Subscription view, which provides a view for reading change table data
  • The subscription window defines the time range for viewing change data. It is like a sliding window for observing change data. After the change data processing is completed, you can clear the subscription window.

4. Oracle CDC mechanism

4.1 synchronous replication

Synchronous Change Data Capture Configuration

The principle is very simple. The original table and the target table must be the same library. The trigger mechanism (after setting the synchronization CDC, the trigger cannot be seen, but the actual operation mechanism is still the trigger mechanism) is used to copy the contents of the original table to another target table. This mechanism is not much different from building triggers for the table itself.

4.2 asynchronous online log CDC

Asynchronous HotLog Configuration

There is no trigger in this process, but Redo Log is used, but online log is used instead of archive log. In addition, the original table and target table must still be the same library. This mode is relatively simple. At the same time, this mode is generated only after Oracle 10, and 9i does not have this mechanism.

4.3 asynchronous distributed CDC

Asynchronous Distributed HotLog Configuration

In fact, this mode is an optimization of asynchronous online log CDC, which is easy to understand. It adds the DB-LINK mechanism so that the original table and the target table are not in the same database. In fact, it is no essential difference from asynchronous online log CDC.

4.4 asynchronous online log replication CDC

Asynchronous Autolog Online Change Data Capture Configuration

Asynchronous online log replication CDC mode is much more advanced. Using Standby Redo Log (hot standby database log) actually uses Oracle's hot standby mechanism to write the log to the hot standby database, and the target table can be established on the hot standby database, which further reduces the impact on the performance of the primary database.

4.5 archive log CDC

Asynchronous AutoLog Archive Change Data Capture Configuration

Archive log CDC mode is the most perfect mode, but it needs a mechanism to obtain archive logs (parallel file system technology), and then analyze archive logs at the target end for change data processing. Theoretically, this mode can almost not affect the performance of the original database.

Frankly speaking, I don't have a deep understanding of Oracle. I just read a little more to solve specific problems. If I encounter similar problems to be solved in real work, or students who are crazy about technology can study them. I pasted the specific setting steps of four modes. Although they are in English, they are still very clear. (I recommend the second one because the setting is relatively simple and the performance is standard. If there are no special requirements, you can use asynchronous online log CDC.

The following is the deployment test of the asynchronous online log CDC environment.

5. CDC related database objects (packages)

  • Package

n DBMS_CDC_PUBLISH, used to define publishing operations

n DBMS_CDC_SUBSCRIBE, used to define subscription operations

  • role

n EXECUTE_CATALOG_ROLE

n SELECT_CATALOG_ROLE

n CREATE TABLE and CREATE SESSION privileges

n EXECUTE on the DBMS_CDC_PUBLISH package

  • view

n ALL_SOURCE_TABLES source table

n ALL_PUBLISHED_COLUMNS

n All_Subscribed_Columns

n All_Subscriptions

n All_Subscribed_Tables

6. Implementation steps of CDC

The synchronous mode is captured in the form of trigger. This implementation is not as flexible as using trigger, and its impact on performance is greater than that of capturing change data from logs. Therefore, this paper uses asynchronous Hotlog in asynchronous capture to explain the specific implementation steps of CDC

6.1 preparing data and users

SQL>show user
scott
SQL>create table t1 ( a_num number , b_varchar varchar2(10),c_char char(10));#Test table
SQL>conn / as sysdba
SQL>create user cdcpub identified by cdcpub; #Publisher (publisher)
SQL>create user subscriber1 identified by subscriber1;#Subscriber

6.2 adjust database parameters (to be modified according to the actual situation)

compatible = 10.2.0
java_pool_size = 50000000
job_queue_processes = 2
parallel_max_servers = + 5
processes = + 7
sessions = + 2
streams_pool_size = + 21 MB
undo_retention = 3600

6.3 modify database log mode

#The database is in archivelog mode by default
SQL>show user
USER is "sys"
SQL>ALTER DATABASE FORCE LOGGING; #This step is optional but recommended;
SQL>ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;#This step needs to be performed;
SQL>ALTER TABLE scott.t1
ADD SUPPLEMENTAL LOG GROUP log_group_t1 (a_num,b_varchar,c_char) ALWAYS;

#To capture all columns in the t1 table, replace the alter table statement above with the following statement:

#ALTER TABLE scott.t1 ADD SUPPLEMENTAL LOG DATA (ALL) #COLUMNS;

6.4 authorization

ALTER USER cdcpub QUOTA UNLIMITED ON SYSTEM
QUOTA UNLIMITED ON SYSAUX;
GRANT CREATE SESSION TO cdcpub;
GRANT CREATE TABLE TO cdcpub;
GRANT CREATE TABLESPACE TO cdcpub;
GRANT UNLIMITED TABLESPACE TO cdcpub;
GRANT SELECT_CATALOG_ROLE TO cdcpub;
GRANT EXECUTE_CATALOG_ROLE TO cdcpub;
GRANT CREATE SEQUENCE TO cdcpub;
GRANT DBA TO cdcpub;
GRANT EXECUTE on DBMS_CDC_PUBLISH TO cdcpub;
EXECUTE DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(GRANTEE => 'cdcpub');

6.5 prepare source table

SQL>show user
USER is "SYS"
SQL>
BEGIN
DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION(TABLE_NAME => 'scott.t1');
END;

6.6 create change set

SQL>show user
USER is "cdcpub"
SQL>
BEGIN
DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
change_set_name => 'cdc_test_cs',
description => 'Change set for scott.t info',
change_source_name => 'HOTLOG_SOURCE',
stop_on_ddl => 'y'
);
END;

6.7 create change table

SQL>show user
USER is "cdcpub"
SQL>
BEGIN
DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
owner => 'cdcpub',
change_table_name => 't1_ct',
change_set_name => 'cdc_test_cs',
source_schema => 'scott',
source_table => 't1',
column_type_list =>'a_num number,b_varchar varchar2(10),c_char char(10)',
capture_values => 'both',
rs_id => 'y',
row_id => 'n',
user_id => 'n',
timestamp => 'n',
object_id => 'n',
source_colmap => 'n',
target_colmap => 'y',
options_string => '');
END;

6.8 activate change set

SQL>show user
USER is "cdcpub"
SQL>
BEGIN
DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(
change_set_name => 'cdc_test_cs',
enable_capture => 'y');
END;

6.9 authorization to subscribers

SQL>show user
USER is "cdcpub"
SQL> GRANT SELECT ON cdcpub.t1_ct TO subscriber1;
SQL>show user
USER is "SYS"
SQL>GRANT CREATE TABLE TO subscriber1;
SQL>GRANT CREATE SESSION TO subscriber1;
SQL>GRANT CREATE VIEW TO subscriber1;
SQL>GRANT UNLIMITED TABLESPACE TO subscriber1;

6.10 subscription change data

  • Create subscription set
SQL>show user
USER is "subscriber1"
SQL>
BEGIN
DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
change_set_name => 'cdc_test_cs',
description => 'Change data for t1',
subscription_name => 't1_sub');
END;
  • Subscription source table and related fields in the source table
SQL>show user
USER is "subscriber1"
SQL>
BEGIN
DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
subscription_name => 't1_sub',
source_schema => 'scott',
source_table => 't1',
column_list => 'a_num,b_varchar,c_char',
subscriber_view => 't1_view');
END;
  • Activate subscription
SQL>show user
USER is "subscriber1"
SQL>
BEGIN
DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(
subscription_name => 't1_sub');
END;
  • Extended subscription window
SQL>show user
USER is "subscriber1"
SQL>
BEGIN
DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(
subscription_name => 't1_sub');
END;
//extend_window function description: this process is to set the high-level boundary of the subscription window, as long as it is set
//New data can be seen only when new data is;
  • View subscription view content
SQL>show user
USER is "subscriber1"
SQL>select * from t1_view; #Because there is no data change in scott.t1 table, there is no data in the current view
no rows selected
SQL>conn scott/tiger
SQL> insert into t1 values(2,'cdc_test1','cdc');
SQL>commit;
SQL>conn subscriber1/subscriber1
SQL>
BEGIN
DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(
subscription_name => 't1_sub');
END;

SQL>select OPERATION$,a_num,b_varchar,c_char from t1_view
OP A_NUM B_VARCHAR C_CHAR
-- ---------- ---------- ----------
I 2 cdc_test1 cdc

  • Clear changeset data
    If the current changed data is no longer needed, it can be cleared to make room for the subsequently generated data;
BEGIN
DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(
subscription_name => 'SALES_SUB');
END;
  • Delete subscription:
    If the current subscription does not need to be reused, it can be deleted;
BEGIN
DBMS_CDC_SUBSCRIBE.DROP_SUBSCRIPTION(
subscription_name => 't1_sub');
END;

Keywords: Operation & Maintenance Database Oracle

Added by me1000 on Thu, 18 Nov 2021 18:34:25 +0200