I have encountered a set of Library login prompts these two days,

Check the current archive log path. The space utilization rate has reached 100%, so delete the archive log files 30 days ago in rman,
DELETE ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-30';
This error is prompted. It turns out that this is a set of DG, which is hasty. He means that these archived logs need to be prepared for the database, so they are not allowed to be deleted,
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process archived log file name=... thread=1 sequence=...
We log in to the standby database and find that the archive space is also 100% used. Is there a problem with synchronization?
After a random knock, it was found that the standby database was not started, which was probably caused by previous shutdown and maintenance.
Then, start to the mount state, start the listener at the same time, and execute the log application,
alter database recover managed standby database using current logfile disconnect from session;
However, when you see the log of MRP process waiting for sequence=61,
SQL> select process,status,client_process,thread#,sequence#,block#,active_agents,known_agents from gv$managed_standby where process in('LNS','RFS','LGWR','MRP0') and thread# <>0; PROCESS STATUS CLIENT_P THREAD# SEQUENCE# BLOCK# ACTIVE_AGENTS KNOWN_AGENTS --------- ------------ -------- ---------- ---------- ---------- ------------- ------------ MRP0 WAIT_FOR_GAP N/A 1 61 0 9 9
By retrieving v$archived_log, there are many logs that are not synchronized,
USERENV('INSTANCE') THREAD# LSQ HSQ ------------------- ---------- ---------- ---------- 1 1 61 316
Execute catalog start with in rman (import the latest backup set and archive log file list into the control document),
catalog start with '/archive';
After a while, MRP is now waiting for sequence=317,
SQL> select process,status,client_process,thread#,sequence#,block#,active_agents,known_agents from gv$managed_standby where process in('LNS','RFS','LGWR','MRP0') and thread# <>0; PROCESS STATUS CLIENT_P THREAD# SEQUENCE# BLOCK# ACTIVE_AGENTS KNOWN_AGENTS --------- ------------ -------- ---------- ---------- ---------- ------------- ------------ MRP0 WAIT_FOR_GAP N/A 1 317 0 9 9
Keep still, register the log manually,
alter database register logfile '/archive/1_317_xxxxxxxxxx.dbf';
The catalog is repeated several times, and the alert record of the standby database still indicates that the space is full,
Errors in file /oracle/app/oracle/diag/rdbms/conflundg/conflundg/trace/xxxxx_arc1_xxxxx.trc: ORA-19502: write error on file "/archive/1_433_xxxxxxxxxx.dbf", block number xxxxxx (block size=512) ORA-27061: waiting for async I/Os failed Linux-x86_64 Error: 28: No space left on device
Find out the archive logs that can be deleted at present, and delete them at the physical layer to free up the archive space,
select 'rm -rf '||name from v$archived_log where name like '%.dbf' and applied='YES' and completion_time<=sysdate-30;
However, it is found that there are still a lot of broken logs in the archive space of the standby database. Each scp is very tired. Or rebuild the standby database according to the recommendation and start over. After a while, we can only say that Oracle is really resistant. The logs begin to synchronize, and the sequence s of the primary and standby databases are almost the same,
SQL> select process,status,client_process,thread#,sequence#,block#,active_agents,known_agents from gv$managed_standby where process in('LNS','RFS','LGWR','MRP0') and thread# <>0; PROCESS STATUS CLIENT_P THREAD# SEQUENCE# BLOCK# ACTIVE_AGENTS KNOWN_AGENTS --------- ------------ -------- ---------- ---------- ---------- ------------- ------------ MRP0 APPLYING_LOG N/A 1 537 181996 9 9 RFS IDLE Archival 1 0 0 0 0 RFS IDLE LGWR 1 537 181996 0 0
In fact, both the primary and standby databases have created crontab to regularly delete archived logs. However, through debugging, it is found that the folder specified in the script to store execution logs has been deleted, resulting in execution interruption.
Through this case, on the one hand, it shows that the robustness of the task script can also be improved. For example, judging whether a folder exists will at least not lead to the whole logic error due to a non key factor. On the other hand, it also exposes the problem of monitoring coverage.
Therefore, for the above problems and scenarios, the following two monitoring point functions can be added,
1. Detection and monitoring of database availability to avoid abnormal shutdown and unopened of database.
2. Monitor the execution of archive log deletion task to avoid execution failure and archive log not deleted.
In addition, in this process, it is exposed that the understanding of the operation and principle of rman tool is still quite rusty and needs to be targeted.
Recently updated articles:
<Some methods of debugging Oracle database problems by Windows>
<Some problems encountered recently>
<Several common storage engines of MySQL>
<Two operations to create a PDB>
<hang occurs during truncate operation in Oracle>
Article classification and index:
Classification and index of 800 articles of official account>