案例介绍
环境介绍
- 操纵系统: Red Hat Enterprise Linux release 8.10 (Ootpa)
- 数据库版本: Oracle 19.23.0.0.0
上周五,系统管理员需要给Linux升级补丁,UAT环境下的一套DG,数据库没有正常关闭的环境下,操纵系统升级补丁后强制reboot了,周一早上处理的过程中遇到下面错误:
备库的告警日志有下面错误信息(GAP sequence提示信息):- PR00 (PID:145361): FAL: Failed to request gap sequence
- PR00 (PID:145361): GAP - thread 1 sequence 94-94
- PR00 (PID:145361): DBID 1790039322 branch 1173452819
- PR00 (PID:145361): FAL: All defined FAL servers have been attempted
- PR00 (PID:145361): -------------------------------------------------------------------------
- PR00 (PID:145361): Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
- PR00 (PID:145361): parameter is defined to a value that's sufficiently large
- PR00 (PID:145361): enough to maintain adequate log switch information to resolve
- PR00 (PID:145361): archived redo log gaps.
- PR00 (PID:145361): -------------------------------------------------------------------------
- 2024-07-22T10:26:06.796447+08:00
复制代码 主库查询- set linesize 720
- col name for a60
- col creator for a12
- select name,creator,sequence#,applied,completion_time from v$archived_log where sequence#=94;
- SQL> set linesize 720
- SQL> col name for a60
- SQL> col creator for a12
- SQL> select name,creator,sequence#,applied,completion_time from v$archived_log where sequence#=94;
- NAME CREATOR SEQUENCE# APPLIED COMPLETIO
- ------------------------------------------------------------ ------------ ---------- --------- ---------
- /gspdblog/gspprod_1173452819_1_94.arc ARCH 94 NO 19-JUL-24
- SQL>
复制代码 检查确认主库上的归档日志gspprod_1173452819_1_94.arc已经被删除了。
检查备库最后应用的归档日志信息:- SQL> set pages 1000 lines 1000
- SQL> SELECT al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied"
- 2 FROM (select thread# thrd, MAX(sequence#) almax
- 3 FROM v$archived_log
- 4 WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) al,
- 5 (SELECT thread# thrd, MAX(sequence#) lhmax
- 6 FROM v$log_history
- 7 WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) lh
- 8 WHERE al.thrd = lh.thrd;
- Thread Last Seq Received Last Seq Applied
- ---------- ----------------- ----------------
- 1 111 93
- SQL>
- SQL> col client_pid for a10
- SQL> select inst_id, thread#, process, pid, status, client_process, client_pid,
- 2 sequence#, block#, active_agents, known_agents from gv$managed_standby order by thread#, pid;
- INST_ID THREAD# PROCESS PID STATUS CLIENT_P CLIENT_PID SEQUENCE# BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
- ---------- ---------- --------- ------------------------ ------------ -------- ---------- ---------- ---------- ------------- ------------
- 1 0 DGRD 143439 ALLOCATED N/A N/A 0 0 0 0
- 1 0 DGRD 143441 ALLOCATED N/A N/A 0 0 0 0
- 1 0 RFS 147165 IDLE UNKNOWN 148946 0 0 0 0
- 1 0 RFS 147167 IDLE UNKNOWN 148942 0 0 0 0
- 1 0 RFS 147169 IDLE UNKNOWN 148944 0 0 0 0
- 1 1 ARCH 143437 CLOSING ARCH 143437 91 700416 0 0
- 1 1 ARCH 143443 CLOSING ARCH 143443 92 704512 0 0
- 1 1 ARCH 143445 CLOSING ARCH 143445 93 696320 0 0
- 1 1 ARCH 143447 CLOSING ARCH 143447 90 708608 0 0
- 1 1 MRP0 145359 WAIT_FOR_GAP N/A N/A 94 0 9 9
- 1 1 RFS 145885 IDLE Archival 148936 0 0 0 0
- 1 1 RFS 147161 IDLE LGWR 148948 112 321083 0 0
- 12 rows selected.
- SQL>
复制代码 因为UAT环境没有备份归档日志,而主库上都设置了一个作业清除两天前的归档日志(资源不足,需要定期清理归档日志),正常环境下,这个作业并不会带来什么问题,而由于上周五升级系统补丁,数据库停了2天,但是这个作业并没有停止(crontab作业),周一处理的时间,这个作业已经将两天前的归档日志给清理了。导致备用数据库无法获取序列号(SEQUENCE#)为94的归档日志。此时由于出现归档日志的GAP导致备用数据库无法同步数据,这种环境下, 我们计划用Oracle 18.1提供的新特性来恢复物理备库,如下所示- RMAN> RECOVER STANDBY DATABASE FROM SERVICE primary_connect_identifier;
- This command will internally keep track of standby file locations, refresh standby controlfile from primary,
- update the new standby controlfile with standby file names, perform incremental backup on primary, transfer
- the backup-pieces over network to standby and perform recovery on standby
复制代码 主要是这种新特性来恢复备用数据库非常方便,一条命令即可搞定,相比之前的增量备份/还原要简单很多。
操纵步骤:
- SQL> alter database recover managed standby database cancel;
- Database altered.
复制代码- SQL> shutdown immediate;
- Database closed.
- Database dismounted.
- ORACLE instance shut down.
- SQL> startup mount;
- ORACLE instance started.
- Total System Global Area 8589932424 bytes
- Fixed Size 13932424 bytes
- Variable Size 2348810240 bytes
- Database Buffers 6207569920 bytes
- Redo Buffers 19619840 bytes
- Database mounted.
复制代码- # 注意,不执行此命令,会遇到RMAN-05150
- SQL> recover managed standby database cancel;
- Media recovery complete.
- SQL>
复制代码 如果不执行上面命令,在RMAN做recover standby database时会遇到RMAN-05150错误,如下案例所示:- RMAN> recover standby database from service gsp;
- Starting recover at 22-JUL-24
- RMAN-00571: ===========================================================
- RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
- RMAN-00571: ===========================================================
- RMAN-03002: failure of recover command at 07/22/2024 10:46:14
- RMAN-05150: Managed Recovery Process must be disabled before running RECOVER STANDBY DATABASE.
复制代码 4: 备库执行修复命令,开始在线刷新备库- RMAN> recover standby database from service gsp;
复制代码 注意,这里RMAN毗连数据库的方式请选择账号暗码,不要利用系统认证方式,否则可能会遇到错误案例1.
正常环境下,你会看到雷同这样的输出信息- .............................
- starting media recovery
- media recovery complete, elapsed time: 00:00:00
- Finished recover at 22-JUL-24
- Finished recover at 22-JUL-24
复制代码 5:启动数据库恢复同步
在sqlplus中执行下面命令- alter databae open;
- alter pluggable database all open;
- alter database recover managed standby database using current logfile disconnect;
复制代码 下面还介绍一下,在操纵过程中轻易踩到的坑或错误:
错误案例1
利用RMAN恢复备库时,遇到ORA-17629: Cannot connect to the remote database server错误,如下所示- $ rman target /
- Recovery Manager: Release 19.0.0.0.0 - Production on Mon Jul 22 10:59:06 2024
- Version 19.23.0.0.0
- Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
- connected to target database: gsp (DBID=1790039322, not open)
- RMAN> recover standby database from service gsp;
- Starting recover at 22-JUL-24
- Oracle instance started
- Total System Global Area 8589932424 bytes
- Fixed Size 13932424 bytes
- Variable Size 2348810240 bytes
- Database Buffers 6207569920 bytes
- Redo Buffers 19619840 bytes
- contents of Memory Script:
- {
- restore standby controlfile from service 'gsp';
- alter database mount standby database;
- }
- executing Memory Script
- Starting restore at 22-JUL-24
- using target database control file instead of recovery catalog
- allocated channel: ORA_DISK_1
- channel ORA_DISK_1: SID=5 device type=DISK
- RMAN-00571: ===========================================================
- RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
- RMAN-00571: ===========================================================
- RMAN-03002: failure of recover command at 07/22/2024 10:59:33
- RMAN-03015: error occurred in stored script Memory Script
- ORA-17629: Cannot connect to the remote database server
- ORA-17627:
- ORA-17629: Cannot connect to the remote database server
- RMAN> exit
复制代码 遇到这个错误,是因为RMAN毗连数据库利用操纵系统认证,这种方式毗连远程数据库(remote database server)就会有问题,应该改成账号暗码认证方式毗连数据库。这样就不会遇到这个错误了。
正确方式- rman target sys/user_password
复制代码 错误方式错误案例2
还原恢复过程遇到下面一系列ORA错误。具体如下所示:- .......................................................................
- RMAN-00571: ===========================================================
- RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
- RMAN-00571: ===========================================================
- RMAN-03002: failure of recover command at 07/22/2024 11:08:40
- RMAN-03015: error occurred in stored script Memory Script
- ORA-19849: error while reading backup piece from service gsp
- ORA-19573: cannot obtain exclusive enqueue for datafile 18
- ORA-19890: data file already in use
- ORA-45909: restore, recover or block media recovery may be in progress
- ORA-19660: some files in the backup set could not be verified
- ORA-19661: datafile 18 could not be verified due to corrupt blocks
- ORA-19849: error while reading backup piece from service gsp
- ORA-19573: cannot obtain exclusive enqueue for datafile 18
- ORA-19890: data file already in use
- ORA-45909: restore, recover or block media recovery may be in progress
- RMAN>
复制代码 检查备库- select process,status,sequence#,thread# from gv$managed_standby where process like 'MRP%';
- SQL> select process,status,sequence#,thread# from gv$managed_standby where process like 'MRP%';
- PROCESS STATUS SEQUENCE# THREAD#
- --------- ------------ ---------- ----------
- MRP0 WAIT_FOR_GAP 94 1
复制代码 在dgmgr中执行命令- DGMGRL> edit database gspro set state='APPLY-OFF';
- Succeeded.
- DGMGRL>
复制代码 重新验证(备用数据库)MRP进程是否已经结束。- SQL> select process,status,sequence#,thread# from gv$managed_standby where process like 'MRP%';
- no rows selected
- SQL>
复制代码 如上所示,MRP进程已经不存在了,就可以重新举行还原恢复操纵。
参考资料:
- ORA-19573 when trying to restore to standby with incremental backup From Primary or During any RMAN restore operation (Doc ID 1646232.1)
- ORA-17629 with RMAN 'From Service' Command (Doc ID 2960469.1)
扫描上面二维码关注我如果你真心觉得文章写得不错,而且对你有所资助,那就不妨帮助“推荐"一下,您的“推荐”和”打赏“将是我最大的写作动力!本文版权归作者所有,欢迎转载,但未经作者同意必须保留此段声明,且在文章页面明显位置给出原文毗连.
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。 |