A Data Guard standby database that does not apply logs

Over the past two days, one of our databases started reporting an error at login.

I checked the archive log destination and found that its space utilization had reached 100%, so I tried to delete the archived logs older than 30 days in RMAN:

DELETE ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-30';

RMAN returned the warning below. It turned out that this database is part of a Data Guard (DG) configuration, which I had overlooked in my haste. The warning means that these archived logs are still needed by the standby database, so RMAN refuses to delete them:

RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
archived log file name=... thread=1 sequence=...

I logged in to the standby server and found that its archive space was also 100% used. Was there a problem with synchronization?

After some poking around, it turned out that the standby database was not running at all, probably left down after an earlier shutdown for maintenance.

I then started the standby database to the mount state, started the listener, and started log apply:

alter database recover managed standby database using current logfile disconnect from session;

However, the MRP process was waiting for the log with sequence=61:

SQL> select process,status,client_process,thread#,sequence#,block#,active_agents,known_agents from  gv$managed_standby where process in('LNS','RFS','LGWR','MRP0') and thread# <>0;
PROCESS   STATUS       CLIENT_P    THREAD#  SEQUENCE#     BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
--------- ------------ -------- ---------- ---------- ---------- ------------- ------------
MRP0      WAIT_FOR_GAP N/A               1         61          0             9            9

Querying v$archived_log showed that a large range of logs had not been synchronized (sequences 61 through 316):

USERENV('INSTANCE')    THREAD#        LSQ        HSQ
------------------- ---------- ---------- ----------
                  1          1         61        316

I ran catalog start with in RMAN, which registers the backup pieces and archived log files found under the given path in the control file:

catalog start with '/archive';

After a while, MRP was waiting for sequence=317:

SQL> select process,status,client_process,thread#,sequence#,block#,active_agents,known_agents from  gv$managed_standby where process in('LNS','RFS','LGWR','MRP0') and thread# <>0;
PROCESS   STATUS       CLIENT_P    THREAD#  SEQUENCE#     BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
--------- ------------ -------- ---------- ---------- ---------- ------------- ------------
MRP0      WAIT_FOR_GAP N/A               1        317          0             9            9

It was still not moving, so I registered the log manually:

alter database register logfile '/archive/1_317_xxxxxxxxxx.dbf';

I repeated the catalog several times, but the alert log of the standby database still reported that the space was full:

Errors in file /oracle/app/oracle/diag/rdbms/conflundg/conflundg/trace/xxxxx_arc1_xxxxx.trc:
ORA-19502: write error on file "/archive/1_433_xxxxxxxxxx.dbf", block number xxxxxx (block size=512)
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 28: No space left on device

I identified the archived logs that could be deleted at this point and removed them at the OS level to free up archive space:

select 'rm -rf '||name from  v$archived_log where name like '%.dbf' and applied='YES' and completion_time<=sysdate-30;
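Generated rm -rf commands will delete whatever path they are given, so a more cautious variant may be worth considering. The sketch below assumes the query output has been spooled to a list file with one plain path per line (without the 'rm -rf ' prefix). The function name remove_listed_logs and the list file are hypothetical and not part of the original procedure. Note also that files removed at the OS level are still recorded in the control file, and the usual way to reconcile that afterwards is CROSSCHECK ARCHIVELOG ALL followed by DELETE EXPIRED ARCHIVELOG ALL in RMAN.

```shell
# Delete the archived log files listed one per line in file $1, but only if
# the path starts with the archive directory $2, ends in .dbf, contains no
# "/../" component, and is a regular file. Prints the number of files removed.
# (Hypothetical helper, shown as a sketch only.)
remove_listed_logs() {
    list_file="$1"
    archive_dir="$2"
    removed=0
    while IFS= read -r path; do
        # Ignore empty lines in the spooled output
        [ -n "$path" ] || continue
        case "$path" in
            */../*)
                echo "skip (suspicious path): $path" >&2
                continue ;;
            "$archive_dir"/*.dbf)
                ;;
            *)
                echo "skip (not an archived log under $archive_dir): $path" >&2
                continue ;;
        esac
        # Plain rm -f on regular files only, never a recursive delete
        if [ -f "$path" ]; then
            rm -f -- "$path" && removed=$((removed + 1))
        fi
    done < "$list_file"
    echo "$removed"
}

# Example call (paths are placeholders):
#   remove_listed_logs /tmp/applied_logs.lst /archive
```

Restricting the delete to regular files under one directory means a stray or malformed line in the list is skipped with a message instead of being removed.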

However, there were still many gaps in the archived logs on the standby database. Copying each missing log over with scp would have been very tedious, and the alternative was to follow the usual recommendation and rebuild the standby database from scratch. After a while, though, the logs began to synchronize on their own (I can only say that Oracle is really resilient), and the sequences of the primary and standby databases were almost the same:

SQL> select process,status,client_process,thread#,sequence#,block#,active_agents,known_agents from  gv$managed_standby where process in('LNS','RFS','LGWR','MRP0') and thread# <>0;
PROCESS   STATUS       CLIENT_P    THREAD#  SEQUENCE#     BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
--------- ------------ -------- ---------- ---------- ---------- ------------- ------------
MRP0      APPLYING_LOG N/A               1        537     181996             9            9
RFS       IDLE         Archival          1          0          0             0            0
RFS       IDLE         LGWR              1        537     181996             0            0

In fact, both the primary and standby databases had crontab jobs to delete archived logs regularly. Debugging showed, however, that the folder the script writes its execution logs to had been deleted, which caused the script to abort.

On the one hand, this case shows that the robustness of the task script can be improved. For example, checking whether a folder exists would at least keep a non-critical factor from breaking the whole logic. On the other hand, it also exposes a gap in monitoring coverage.
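As an illustration of the folder check, here is a minimal sketch of a more defensive wrapper for the cron job. It is not the script that was actually in use. The function names, the log directory, and the cleanup command are all hypothetical placeholders.

```shell
# Make sure the directory for execution logs exists and is writable.
# Recreates the directory if someone has deleted it.
ensure_log_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        mkdir -p "$dir" || return 1
    fi
    [ -w "$dir" ]
}

# Run the cleanup command given after the log directory argument.
# Output goes to a dated log file when the directory is usable; otherwise it
# falls back to stderr, so a missing folder never blocks the cleanup itself.
run_cleanup() {
    log_dir="$1"
    shift
    if ensure_log_dir "$log_dir"; then
        log_file="$log_dir/cleanup_$(date +%Y%m%d).log"
        "$@" >>"$log_file" 2>&1
    else
        echo "WARN: cannot use log dir $log_dir, logging to stderr" >&2
        "$@" >&2
    fi
}

# Example call from cron (both paths are placeholders):
#   run_cleanup /home/oracle/logs/arch_cleanup /home/oracle/scripts/delete_archivelog.sh
```

The design choice is simply that logging is treated as optional and the deletion as mandatory. run_cleanup returns the exit status of the cleanup command, so the caller can still detect a failed run.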

For the problems and scenarios above, the following two monitoring points can be added:

1. Monitor database availability, to catch a database that has shut down abnormally or has not been started.

2. Monitor the execution of the archive log deletion task, to catch failed runs and archived logs that are not being deleted.
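The second monitoring point could be roughly sketched as follows. It adds a check on archive space usage, since a full archive destination is what triggered this incident. The function names, the threshold, and the paths are assumptions for illustration, and a real setup would hook these checks into whatever alerting system is already in place.

```shell
# Print the usage percentage (integer, without the % sign) of the
# filesystem that holds path $1, based on POSIX "df -P" output.
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if the usage of the filesystem holding $1 is below $2 percent.
check_space() {
    pct=$(usage_pct "$1")
    [ -n "$pct" ] && [ "$pct" -lt "$2" ]
}

# Succeed if at least one file under directory $1 was modified within the
# last 24 hours, i.e. the cleanup job has written a log recently.
check_recent_run() {
    [ -d "$1" ] || return 1
    [ -n "$(find "$1" -type f -mtime -1 2>/dev/null | head -n 1)" ]
}

# Example checks (threshold and paths are placeholders):
#   check_space /archive 85 || echo "ALERT: archive space above 85%"
#   check_recent_run /home/oracle/logs/arch_cleanup || echo "ALERT: cleanup job has not run in 24h"
```

check_recent_run also fails when the log directory itself is missing, which is exactly the condition that silently broke the cron job in this case.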

In addition, this process exposed that my understanding of how the RMAN tool operates, and of the principles behind it, is still quite rusty and needs targeted study.
