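Recently I ran into several projects where, for various reasons, the ownership or permissions of the GRID_HOME directory or the files under it had been changed, and CRS could no longer start normally: the start of ora.mdnsd stays in STARTING indefinitely. I had simulated this failure several times before and fixed it easily, and a fix in a production environment half a year ago also went smoothly, but this time a friend was out of luck: whatever he tried, the stack would not start. So here I test once more how to restore the file permissions in 11.2.0.3/11.2.0.4 environments. The steps below use the rootcrs.pl script. Starting with 11.2.0.3.6, Oracle added an -init option to it to address permission problems, but the option's effect is limited and it does not do everything its help text describes.
1 11.2.0.4 environment test
1.1 Resource status before the test
First check the resource status before the test and make sure every resource is in a normal state.
- [grid@rac112042 ~]$ crsctl status resource -t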
- --------------------------------------------------------------------------------
- NAME TARGET STATE SERVER STATE_DETAILS
- --------------------------------------------------------------------------------
- Local Resources
- --------------------------------------------------------------------------------
- ora.DATA.dg
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.LISTENER.lsnr
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.OCR.dg
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.asm
- ONLINE ONLINE rac112041 Started
- ONLINE ONLINE rac112042 Started
- ora.gsd
- ONLINE OFFLINE rac112041
- ONLINE OFFLINE rac112042
- ora.net1.network
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.ons
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.registry.acfs
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- --------------------------------------------------------------------------------
- Cluster Resources
- --------------------------------------------------------------------------------
- ora.LISTENER_SCAN1.lsnr
- 1 ONLINE ONLINE rac112041
- ora.cvu
- 1 ONLINE ONLINE rac112041
- ora.oc4j
- 1 ONLINE ONLINE rac112041
- ora.rac11204.db
- 1 ONLINE ONLINE rac112042 Open
- 2 ONLINE ONLINE rac112041 Open
- ora.rac112041.vip
- 1 ONLINE ONLINE rac112041
- ora.rac112042.vip
- 1 ONLINE ONLINE rac112042
- ora.scan1.vip
- 1 ONLINE ONLINE rac112041
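1.2 Change the directory owner to root
Change the ownership of everything under /oracle to root:root.
- [root@rac112042 ~]# chown -R root:root /oracle
1.3 Reboot the host and observe the errors
The host is rebooted directly for this test, because with the changed ownership CRS cannot be stopped cleanly.
- [root@rac112042 ~]# reboot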
- [root@rac112042 ~]#
- Broadcast message from root@rac112042
- (/dev/pts/0) at 18:39 ...
- The system is going down for reboot NOW!
- [grid@rac112042 ~]$ crsctl status resource -t -init
- An error occurred while attempting to change log file permissions. Logging may not be active for this process.
- CLSU-00100: Operating System function: scls_chmod failed with error data: 1
- CLSU-00101: Operating System error message: Operation not permitted
- CLSU-00103: error location: chmodfail
- CLSU-00104: additional error information: chmod operation failed
- An error occurred while attempting to change log file permissions. Logging may not be active for this process.
- CLSU-00100: Operating System function: scls_chmod failed with error data: 1
- CLSU-00101: Operating System error message: Operation not permitted
- CLSU-00103: error location: chmodfail
- CLSU-00104: additional error information: chmod operation failed
- --------------------------------------------------------------------------------
- NAME TARGET STATE SERVER STATE_DETAILS
- --------------------------------------------------------------------------------
- Cluster Resources
- --------------------------------------------------------------------------------
- ora.asm
- 1 ONLINE OFFLINE
- ora.cluster_interconnect.haip
- 1 ONLINE OFFLINE
- ora.crf
- 1 ONLINE OFFLINE
- ora.crsd
- 1 ONLINE OFFLINE
- ora.cssd
- 1 ONLINE OFFLINE
- ora.cssdmonitor
- 1 ONLINE OFFLINE
- ora.ctssd
- 1 ONLINE OFFLINE
- ora.diskmon
- 1 OFFLINE OFFLINE
- ora.drivers.acfs
- 1 ONLINE OFFLINE
- ora.evmd
- 1 ONLINE OFFLINE
- ora.gipcd
- 1 ONLINE OFFLINE
- ora.gpnpd
- 1 ONLINE OFFLINE
- ora.mdnsd
- 1 ONLINE OFFLINE STARTING
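As shown, CRS startup hangs at the ora.mdnsd resource: its state detail stays at STARTING, meaning the start action for this resource never completes. crsctl itself also prints errors (the CLSU-00100 to CLSU-00104 messages above).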
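To spot a stuck daemon such as ora.mdnsd without reading the whole table, the status output can be filtered for resources whose STATE differs from TARGET. The following is a minimal sketch and not part of the original test. The function name pending_resources is made up for illustration, and the sketch assumes the plain tabular output of crsctl status resource -t, without the leading "- " of the listings in this post.

```shell
#!/bin/sh
# Sketch: read `crsctl status resource -t [-init]` output on stdin and
# print every resource instance whose STATE differs from its TARGET.
# pending_resources is a hypothetical helper name, not an Oracle tool.
pending_resources() {
    awk '
        /^ora\./ { name = $1; next }            # a resource name line
        name != "" && NF >= 2 {
            # Cluster resources start with an instance number, local ones do not.
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1
            target = $i; state = $(i + 1)
            if ((target == "ONLINE" || target == "OFFLINE") && target != state) {
                rest = ""
                for (j = i + 2; j <= NF; j++) rest = rest " " $j
                print name, target, state rest  # server / state details, if any
            }
        }
    '
}

# Usage (on a cluster node):
#   crsctl status resource -t -init | pending_resources
```

Run against the -init listing above, this would report every lower-stack resource, with ora.mdnsd carrying the STARTING detail.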
1.3.1 Check the Grid alert log
- [ohasd(1964)]CRS-2112:The OLR service started on node rac112042.
- 2015-12-15 05:50:23.407:
- [ohasd(1964)]CRS-1301:Oracle High Availability Service started on node rac112042.
- 2015-12-15 05:50:23.430:
- [ohasd(1964)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
- 2015-12-15 05:50:28.420:
- [/oracle/app/11.2.0/grid/bin/orarootagent.bin(2186)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
- 2015-12-15 05:52:32.545:
- [ohasd(1964)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:2} in /oracle/app/11.2.0/grid/log/rac112042/ohasd/ohasd.log.
Here the start of the ora.mdnsd resource timed out, and the start is still being retried.
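When the alert log is long, the start timeouts can be pulled out together with their timestamps. The sketch below is an illustration under stated assumptions: it expects the two-line layout seen above (a timestamp line followed by the message line), reads the raw log without the "- " listing prefix, and the function name timeouts_from_alert is hypothetical.

```shell
#!/bin/sh
# Sketch: print "<timestamp> <resource>" for every CRS-2757 start timeout
# found in a Grid alert log given as file arguments or on stdin.
timeouts_from_alert() {
    awk -v q="'" '
        # Timestamp line, e.g. "2015-12-15 05:52:32.545:"
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            ts = $1 " " $2
            sub(/:$/, "", ts)
            next
        }
        /CRS-2757/ {
            res = "?"
            # Pull the name out of: resource <quote>name<quote>
            if (match($0, "resource " q "[^" q "]+" q))
                res = substr($0, RSTART + 10, RLENGTH - 11)
            print ts, res
        }
    ' "$@"
}

# Usage: timeouts_from_alert /path/to/alert.log
```

A resource that shows up repeatedly in this output is the one ohasd keeps retrying.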
1.3.2 Check the ohasd log
- [root@rac112042 ~]# tail -f /oracle/app/11.2.0/grid/log/rac112042/ohasd/ohasd.log
- 2015-12-15 05:52:34.274: [ AGFW][314713856]{0:0:2} Agfw Proxy Server replying to the message: AGENT_HANDSHAKE[Proxy] ID 20484:11
- 2015-12-15 05:52:34.277: [ AGFW][314713856]{0:11:2} Received the reply to the message: RESTYPE_ADD[ora.daemon.type] ID 8196:312 from the agent /oracle/app/11.2.0/grid/bin/oraagent_grid
- 2015-12-15 05:52:34.278: [ AGFW][314713856]{0:11:2} Received the reply to the message: RESTYPE_ADD[ora.asm.type] ID 8196:313 from the agent /oracle/app/11.2.0/grid/bin/oraagent_grid
- 2015-12-15 05:52:34.279: [ AGFW][314713856]{0:11:2} Received the reply to the message: RESTYPE_ADD[ora.evm.type] ID 8196:314 from the agent /oracle/app/11.2.0/grid/bin/oraagent_grid
- 2015-12-15 05:52:34.280: [ AGFW][314713856]{0:11:2} Received the reply to the message: RESTYPE_ADD[ora.gipc.type] ID 8196:315 from the agent /oracle/app/11.2.0/grid/bin/oraagent_grid
- 2015-12-15 05:52:34.282: [ AGFW][314713856]{0:11:2} Received the reply to the message: RESTYPE_ADD[ora.gpnp.type] ID 8196:316 from the agent /oracle/app/11.2.0/grid/bin/oraagent_grid
- 2015-12-15 05:52:34.283: [ AGFW][314713856]{0:11:2} Received the reply to the message: RESTYPE_ADD[ora.haip.type] ID 8196:317 from the agent /oracle/app/11.2.0/grid/bin/oraagent_grid
- 2015-12-15 05:52:34.284: [ AGFW][314713856]{0:11:2} Received the reply to the message: RESTYPE_ADD[ora.mdns.type] ID 8196:318 from the agent /oracle/app/11.2.0/grid/bin/oraagent_grid
- 2015-12-15 05:52:34.285: [ AGFW][314713856]{0:11:2} Received the reply to the message: RESOURCE_ADD[ora.asm 1 1] ID 4356:319 from the agent /oracle/app/11.2.0/grid/bin/oraagent_grid
- 2015-12-15 05:52:34.286: [ AGFW][314713856]{0:11:2} Received the reply to the message: RESOURCE_ADD[ora.mdnsd 1 1] ID 4356:320 from the agent /oracle/app/11.2.0/grid/bin/oraagent_grid
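1.4 Fix the file ownership and permissions
Here chown -R is used directly to reset the ownership. When Grid Infrastructure is installed, every file under $ORACLE_HOME is owned by grid:oinstall before root.sh is run (the actual installation user and group may differ, so adjust them for your environment).
- [root@rac112042 app]# pwd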
- /oracle/app
- [root@rac112042 app]# chown -R grid:oinstall 11.2.0
- [root@rac112042 app]# chown -R grid:oinstall grid
Use -init to reset the ownership and permissions of some of the files.
- [root@rac112042 app]# /oracle/app/11.2.0/grid/crs/install/rootcrs.pl -init
- Using configuration parameter file: /oracle/app/11.2.0/grid/crs/install/crsconfig_params
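Some files under GRID_HOME are owned by root, especially those related to ACFS, ASMFD and similar features. These have to be reset with -init; if those features are not in use, this step can be skipped.
Next, fix the permissions of the oracle binary, which needs mode 6751.
- [root@rac112042 app]# ls -l /oracle/app/11.2.0/grid/bin/oracle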
- -rwxr-x--x 1 grid oinstall 209914519 Aug 27 21:16 /oracle/app/11.2.0/grid/bin/oracle
- [root@rac112042 app]# chmod 6751 /oracle/app/11.2.0/grid/bin/oracle
- [root@rac112042 app]# ls -l /oracle/app/11.2.0/grid/bin/oracle
- -rwsr-s--x 1 grid oinstall 209914519 Aug 27 21:16 /oracle/app/11.2.0/grid/bin/oracle
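The before/after comparison with ls -l can also be done as a scripted check, which is handy when several nodes have to be verified. This is a minimal sketch assuming GNU stat (as found on Linux); check_mode is a made-up helper name, and the 6751 value is the one used in this post.

```shell
#!/bin/sh
# Sketch: compare a file's octal mode with an expected value.
# Prints one OK/MISMATCH line; returns 0 on match, 1 on mismatch, 2 on stat failure.
check_mode() {
    file=$1
    expected=$2
    actual=$(stat -c '%a' "$file") || return 2   # GNU stat: mode in octal
    if [ "$actual" = "$expected" ]; then
        echo "OK $file $actual"
    else
        echo "MISMATCH $file $actual (expected $expected)"
        return 1
    fi
}

# Usage:
#   check_mode /oracle/app/11.2.0/grid/bin/oracle 6751
```

A mismatch means the setuid/setgid bits were lost, which is what a recursive chown typically causes.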
1.5 Restart CRS
After restarting CRS, the resource status is as follows.
- [grid@rac112042 ~]$ crsctl status resource -t -init
- --------------------------------------------------------------------------------
- NAME TARGET STATE SERVER STATE_DETAILS
- --------------------------------------------------------------------------------
- Cluster Resources
- --------------------------------------------------------------------------------
- ora.asm
- 1 ONLINE ONLINE rac112042 Started
- ora.cluster_interconnect.haip
- 1 ONLINE ONLINE rac112042
- ora.crf
- 1 ONLINE ONLINE rac112042
- ora.crsd
- 1 ONLINE ONLINE rac112042
- ora.cssd
- 1 ONLINE ONLINE rac112042
- ora.cssdmonitor
- 1 ONLINE ONLINE rac112042
- ora.ctssd
- 1 ONLINE ONLINE rac112042 OBSERVER
- ora.diskmon
- 1 OFFLINE OFFLINE
- ora.drivers.acfs
- 1 ONLINE ONLINE rac112042
- ora.evmd
- 1 ONLINE ONLINE rac112042
- ora.gipcd
- 1 ONLINE ONLINE rac112042
- ora.gpnpd
- 1 ONLINE ONLINE rac112042
- ora.mdnsd
- 1 ONLINE ONLINE rac112042
- [grid@rac112042 ~]$ crsctl status resource -t
- --------------------------------------------------------------------------------
- NAME TARGET STATE SERVER STATE_DETAILS
- --------------------------------------------------------------------------------
- Local Resources
- --------------------------------------------------------------------------------
- ora.DATA.dg
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.LISTENER.lsnr
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.OCR.dg
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.asm
- ONLINE ONLINE rac112041 Started
- ONLINE ONLINE rac112042 Started
- ora.gsd
- ONLINE OFFLINE rac112041
- ONLINE OFFLINE rac112042
- ora.net1.network
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.ons
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- ora.registry.acfs
- ONLINE ONLINE rac112041
- ONLINE ONLINE rac112042
- --------------------------------------------------------------------------------
- Cluster Resources
- --------------------------------------------------------------------------------
- ora.LISTENER_SCAN1.lsnr
- 1 ONLINE ONLINE rac112041
- ora.cvu
- 1 ONLINE ONLINE rac112041
- ora.oc4j
- 1 ONLINE ONLINE rac112041
- ora.rac11204.db
- 1 ONLINE OFFLINE
- 2 ONLINE ONLINE rac112041 Open
- ora.rac112041.vip
- 1 ONLINE ONLINE rac112041
- ora.rac112042.vip
- 1 ONLINE ONLINE rac112042
- ora.scan1.vip
- 1 ONLINE ONLINE rac112041
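1.6 Trace the files modified by -init
The strace command below shows which files the -init option changes the ownership and permissions of. Note that strace writes its trace to stderr, so the redirection in this command captures only the script's own standard output; the trace lines themselves appear on the terminal.
- [root@rac112042 bin]# strace /oracle/app/11.2.0/grid/crs/install/rootcrs.pl -init >/tmp/rootcrs_init.txt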
- chmod("/oracle/app/11.2.0/grid/cfgtoollogs/crsconfig/rootcrs_rac112042.log", 0775) = 0
- chmod("/oracle/app/11.2.0/grid/cdata", 0775) = 0
- chmod("/oracle/app/11.2.0/grid/cdata/rac11204cluster", 0775) = 0
- chmod("/oracle/app/11.2.0/grid/cfgtoollogs", 0775) = 0
- chmod("/oracle/app/11.2.0/grid/cfgtoollogs/crsconfig", 0775) = 0
- chmod("/oracle/app/11.2.0/grid/log", 0775) = 0
- chmod("/oracle/app/11.2.0/grid/log/rac112042", 01755) = 0
- chmod("/oracle/app/11.2.0/grid/log/rac112042/crsd", 0750) = 0
- chmod("/oracle/app/11.2.0/grid/log/rac112042/ctssd", 0750) = 0
- chmod("/oracle/app/11.2.0/grid/log/rac112042/evmd", 0750) = 0
- ........
- chown("/oracle/app/11.2.0/grid/cfgtoollogs/crsconfig/rootcrs_rac112042.log", 501, 501) = 0
- chown("/oracle/app/11.2.0/grid/cdata", 501, 501) = 0
- chown("/oracle/app/11.2.0/grid/cdata/rac11204cluster", 501, 501) = 0
- chown("/oracle/app/11.2.0/grid/cfgtoollogs", 501, 501) = 0
- chown("/oracle/app/11.2.0/grid/cfgtoollogs/crsconfig", 501, 501) = 0
- chown("/oracle/app/11.2.0/grid/log", 501, 501) = 0
- chown("/oracle/app/11.2.0/grid/log/rac112042", 0, 501) = 0
- chown("/oracle/app/11.2.0/grid/log/rac112042/crsd", 0, 501) = 0
- chown("/oracle/app/11.2.0/grid/log/rac112042/ctssd", 0, 501) = 0
- chown("/oracle/app/11.2.0/grid/log/rac112042/evmd", 501, 501) = 0
- chown("/oracle/app/11.2.0/grid/log/rac112042/cssd", 501, 501) = 0
- chown("/oracle/app/11.2.0/grid/log/rac112042/mdnsd", 501, 501) = 0
- chown("/oracle/app/11.2.0/grid/log/rac112042/gpnpd", 501, 501) = 0
- .....
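The raw trace is hard to read, so it can help to reduce it to the successful chmod and chown calls. The sketch below is only an illustration: summarize_perm_calls is a hypothetical name, it assumes the trace was saved to a file (for example with strace -o), and it assumes lines in exactly the form shown above, without the "- " listing prefix and without the "[pid N]" prefix that strace -f would add.

```shell
#!/bin/sh
# Sketch: turn successful chmod()/chown() lines of an strace log into
# "chmod MODE PATH" and "chown UID:GID PATH" lines for review.
summarize_perm_calls() {
    sed -n -E \
        -e 's/^chmod\("([^"]+)", ([0-7]+)\) = 0$/chmod \2 \1/p' \
        -e 's/^chown\("([^"]+)", ([0-9]+), ([0-9]+)\) = 0$/chown \2:\3 \1/p' \
        "$@"
}

# Usage: summarize_perm_calls /tmp/rootcrs_init.trace
```

The numeric IDs are kept as they appear in the trace (501 and 0 in this environment), so the result reflects this node only.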
In fact, what -init changes is exactly what is listed in the following files.
- [grid@rac112041 utl]$ cat crsconfig_dirs |wc -l
- 147
- [grid@rac112041 utl]$ cat crsconfig_fileperms |wc -l
- 227
The -init option is only available from 11.2.0.3.6 onward. On earlier versions the ownership and permissions of these files have to be fixed manually; the commands below generate scripts for that from the two files.
- grep -v ^# crsconfig_dirs|grep -v ^$|awk '{print "chown " $3 ":" $4,$2}'>crsconfig_dirs_1.sh
- grep -v ^# crsconfig_dirs|grep -v ^$|awk '{print "chmod " $5,$2}'>crsconfig_dirs_2.sh
- # the same for crsconfig_fileperms
- grep -v ^# crsconfig_fileperms|grep -v ^$|awk '{print "chown " $3 ":" $4,$2}'>crsconfig_fileperms_1.sh
- grep -v ^# crsconfig_fileperms|grep -v ^$|awk '{print "chmod " $5,$2}'>crsconfig_fileperms_2.sh
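The four pipelines above can be folded into one helper that emits the chown and chmod for each entry side by side, which makes the generated script easier to review before running it as root. This is a sketch under the same assumption the commands above make, namely that field 2 is the path, fields 3 and 4 are owner and group, and field 5 is the mode; gen_perm_script is a made-up name.

```shell
#!/bin/sh
# Sketch: read crsconfig_dirs / crsconfig_fileperms style lines and print
# the matching chown and chmod commands. Nothing is executed here.
gen_perm_script() {
    awk '
        /^[[:space:]]*#/ || NF < 5 { next }      # skip comments and short/blank lines
        {
            printf "chown %s:%s %s\n", $3, $4, $2
            printf "chmod %s %s\n", $5, $2
        }
    ' "$@"
}

# Usage (review the output before running it):
#   gen_perm_script crsconfig_dirs crsconfig_fileperms > fix_perms.sh
```

Keeping generation and execution separate leaves room to check the paths and owners against a healthy node first.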
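1.7 Fix the permissions by running root.sh
The failure is simulated in the same way as above; this time the file permissions are fixed by running root.sh. The rough steps are as follows.
As can be seen here, the cluster starts normally.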
2 11.2.0.3 test
The recovery steps for the 11.2.0.3 test are given briefly below.
- [root@11rac1 app]# chown -R grid:dba 11.2.0
- [root@11rac1 app]# chown -R grid:dba grid
- [grid@11rac1 ~]$ ls -l $ORACLE_HOME/bin/oracle
- -rwxr-x--x 1 grid dba 204033598 Oct 17 2014 /u01/app/11.2.0/grid/bin/oracle
- [grid@11rac1 ~]$ chmod 6751 /u01/app/11.2.0/grid/bin/oracle
- [grid@11rac1 ~]$ ls -l $ORACLE_HOME/bin/oracle
- -rwsr-s--x 1 grid dba 204033598 Oct 17 2014 /u01/app/11.2.0/grid/bin/oracle
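3 Test summary
If the permissions have been changed, the recommended fix is to delete the node and add it back. Resetting the permissions by hand can work, but there is no guarantee that later patching or other operations will not run into other odd errors.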