--//生产系统同事使用toad连接经常出现ora-04030错误。
ORA-04030: out of process memory when trying to allocate 123416 bytes (QERHJ hash-joi,kllcqas:kllsltba)
--//仔细看joi确实不是join,开始猜测可能某个程序的sql语句选择hash-join,导致pga消耗太大。
--//同事给了我alert日志的截图,提示都是icare_s001_20087.trc的进程,难道全部使用共享连接模式报错!!
--//icare_s001_20087.trc的进程报如下错误,对应进程里面应该能看到sql语句。
ORA-04030: out of process memory when trying to allocate 82456 bytes (pga heap,control file i/o buffer)
SYS@192.168.100.41:1521/icare:DEDICATED> select * from v$instance;
INSTANCE_NUMBER INSTANCE_NAME HOST_NAME VERSION STARTUP_TIME STATUS PAR THREAD# ARCHIVE LOG_SWITCH_WAIT LOGINS SHU DATABASE_STATUS INSTANCE_ROLE ACTIVE_ST BLO
--------------- ---------------- --------------- ----------------- ------------------- ------------ --- ---------- ------- --------------- ---------- --- ----------------- ------------------ --------- ---
1 icare DZBL_DB_101.132 10.2.0.4.0 2021-08-24 08:24:17 OPEN NO 1 STARTED ALLOWED NO ACTIVE PRIMARY_INSTANCE NORMAL NO
--//出现问题后没有重启过机器!!顺便说明一下,这台机器很少访问,基本作为历史数据库,偶尔有应用连上就是查询数据.
2.分析:
SYS@192.168.100.41:1521/icare> show sga
Total System Global Area 6442450944 bytes
Fixed Size 2093584 bytes
Variable Size 2181041648 bytes
Database Buffers 4244635648 bytes
Redo Buffers 14680064 bytes
SYS@192.168.100.41:1521/icare> show parameter pga
PARAMETER_NAME TYPE VALUE
-------------------- ----------- -------
pga_aggregate_target big integer 3G
SYS@192.168.100.41:1521/icare> @ spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
---------- ---------- ------------ --------- ------ ------- ---------- --------------------------------------------------
855 22839 31462 SHARED 1811 18 237 alter system kill session '855,22839' immediate;
--//噢!以前管理的数据库不合理,选择的连接模式是SHARED模式,spid=1811.
--//这样可能更好理解为什么alert日志都是记录都是S001的进程.
$ grep -B1 "ORA-04030" alert_icare.log | grep Error | sort | uniq -c
1 Errors in file /opt/oracle/admin/icare/bdump/icare_s000_18301.trc:
1 Errors in file /opt/oracle/admin/icare/bdump/icare_s000_26348.trc:
1968 Errors in file /opt/oracle/admin/icare/bdump/icare_s001_20087.trc:
--//很明显问题问题出在共享连接的进程,也就是可以估计应用段的进程都是以共享模式连接的.
SYS@192.168.100.41:1521/icare> show parameter disp
PARAMETER_NAME TYPE VALUE
---------------- -------- ---------------
dispatchers string (PROTOCOL=TCP)
max_dispatchers integer
--//奇怪没有任何配置.难道同事建立的配置连接串没有(SERVER = DEDICATED)设置.缺省共享模式优先.
$ lsnrctl service
LSNRCTL for Linux: Version 10.2.0.4.0 - Production on 10-AUG-2023 08:25:53
Copyright (c) 1991, 2007, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1)))
Services Summary...
Service "icare" has 2 instance(s).
Instance "icare", status UNKNOWN, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:1419 refused:0
LOCAL SERVER
Instance "icare", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:1458 refused:0 state:ready
LOCAL SERVER
"D000" established:7472195 refused:0 current:109 max:1022 state:ready
DISPATCHER
(ADDRESS=(PROTOCOL=tcp)(HOST=DZBL_DB_101.132)(PORT=45587))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Service "icaredg4" has 1 instance(s).
Instance "icare", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:1458 refused:0 state:ready
LOCAL SERVER
"D000" established:7472195 refused:0 current:109 max:1022 state:ready
DISPATCHER
(ADDRESS=(PROTOCOL=tcp)(HOST=DZBL_DB_101.132)(PORT=45587))
Service "icaredg4_XPT" has 1 instance(s).
Instance "icare", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:1458 refused:0 state:ready
LOCAL SERVER
"D000" established:7472195 refused:0 current:109 max:1022 state:ready
DISPATCHER
(ADDRESS=(PROTOCOL=tcp)(HOST=DZBL_DB_101.132)(PORT=45587))
The command completed successfully
--//icare服务存在两者模式,缺省连接串没有配置,优先选择共享模式连接.
SYS@192.168.100.41:1521/icare> column Program format a30
SYS@192.168.100.41:1521/icare> @pga 10
NAME VALUE_MB
------------------------------ ----------
aggregate PGA target parameter 3072
total PGA inuse 17655.4385
total PGA allocated 19551.1279
over allocation count 18030955
--//检查内核参数/proc/sys/vm/max_map_count:
# cat /proc/sys/vm/max_map_count
65530
--//来自链接 https://www.josip-pojatina.com/en/bulk-collect-and-memory-limits/
To increase a number of the map entries from 4 GB (65530 x 64 / 1024 to get the amount in GB) into the 16 GB, you need
to execute the following command as a root user:
--//65530*64/1024 = 4095.62500000000000000000
--//2^16 = 65536
--//如何知道每个64K?
SYS@192.168.100.41:1521/icare:DEDICATED> alter system kill session '833,21702,@1' immediate -- ICARE@ (w3wp.exe);
alter system kill session '833,21702,@1' immediate -- ICARE@ (w3wp.exe)
*
ERROR at line 1:
ORA-00026: missing or invalid session ID
SYS@192.168.100.41:1521/icare:DEDICATED> alter system kill session '833,21702' immediate -- ICARE@ (w3wp.exe);
System altered.
--//昏!10g版本竟然不支持@1这样的写法.
SYS@192.168.100.41:1521/icare:DEDICATED> @ killi program='w3wp.exe'
COMMANDS_TO_VERIFY_AND_RUN
--------------------------------------------------------------------------------------
alter system kill session '1010,44238' immediate -- ICARE@WORKGROUP\DELL70 (w3wp.exe);
...
$ lsnrctl service
LSNRCTL for Linux: Version 10.2.0.4.0 - Production on 10-AUG-2023 10:16:38
Copyright (c) 1991, 2007, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1)))
Services Summary...
Service "icare" has 2 instance(s).
Instance "icare", status UNKNOWN, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:1419 refused:0
LOCAL SERVER
Instance "icare", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:3043 refused:0 state:ready
LOCAL SERVER
"D001" established:7 refused:0 current:3 max:1022 state:ready
DISPATCHER
(ADDRESS=(PROTOCOL=tcp)(HOST=DZBL_DB_101.132)(PORT=33030))
Service "icaredg4" has 1 instance(s).
Instance "icare", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:3043 refused:0 state:ready
LOCAL SERVER
"D001" established:7 refused:0 current:3 max:1022 state:ready
DISPATCHER
(ADDRESS=(PROTOCOL=tcp)(HOST=DZBL_DB_101.132)(PORT=33030))
Service "icaredg4_XPT" has 1 instance(s).
Instance "icare", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:3043 refused:0 state:ready
LOCAL SERVER
"D001" established:7 refused:0 current:3 max:1022 state:ready
DISPATCHER
(ADDRESS=(PROTOCOL=tcp)(HOST=DZBL_DB_101.132)(PORT=33030))
The command completed successfully
--//使用d001替换原来的d000.
5.附件脚本代码:
$ cat pga
column name format a30
column machine format a30
SELECT NAME, VALUE/1024/1024 VALUE_MB
FROM V$PGASTAT
WHERE NAME IN ( 'aggregate PGA target parameter',
'total PGA allocated',
'total PGA inuse')
union all
SELECT NAME, VALUE
FROM V$PGASTAT
WHERE NAME IN ('over allocation count');
SELECT name profile, cnt, decode(total, 0, 0, round(cnt*100/total)) percentage
FROM (SELECT name, value cnt, (sum(value) over ()) total
FROM V$SYSSTAT WHERE name like 'workarea exec%');
SELECT *
FROM ( SELECT p.spid,
s.sid,
s.serial#,
s.machine,
s.client_info,
DECODE (s.program, NULL, p.program, s.program) AS "Program",
p.pga_used_mem,
p.pga_alloc_mem,
p.pga_max_mem
FROM v$process p, v$session s
WHERE s.paddr = p.addr
ORDER BY p.pga_used_mem DESC)
WHERE ROWNUM