GreatSQL内存斲丧异常排查攻略:从系统到应用层面的深入分析
当 GreatSQL 数据库处于高并发高负载时,大概会发现 mysqld 历程的内存斲丧远远超出设置的 innodb_buffer_pool_size 时,偶然候甚至会高达甚至超过系统内存的90%,遇到这种问题时,心里经常会发慌,担心下一秒内存就会爆了发生 OOM,或者数据库hang死不响应。
本文和大家试着使用 GreatSQL 中的 sys schema 和 performance_schema 举行深入分析,找出内存斲丧大户的源头,并尽大概解决问题。
下面是详细的排查方法和步调。
1. 确认实际内存斲丧
1.1 操纵系统层面分析
先检查确认 mysqld 历程的内存详细斲丧占用环境,做到心里有数,避免真的下一秒就发生 OOM 的问题:- $ free -ht
- free -ht
- total used free shared buff/cache available
- Mem: 30Gi 28Gi 240Mi 33Mi 2.0Gi 1.7Gi
- Swap: 0B 0B 0B
- Total: 30Gi 28Gi 240Mi
- $ ps aux | grep mysqld
- mysql 51931 23.0 89.8 32100800 29008060 ? Ssl Nov22 949:41 /data/apps/GreatSQL-8.0.32-26-Linux-glibc2.28-x86_64/bin/mysqld
- $ top -p $(pidof mysqld) -n 1
- top - 05:36:37 up 4 days, 4:06, 1 user, load average: 5.56, 8.70, 10.87
- Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
- %Cpu(s): 8.4 us, 1.7 sy, 0.0 ni, 86.6 id, 3.4 wa, 0.0 hi, 0.0 si, 0.0 st
- MiB Mem : 31553.3 total, 265.6 free, 29148.6 used, 2139.2 buff/cache
- MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1903.3 avail Mem
- PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
- 51931 mysql 20 0 30.6g 27.7g 4656 S 80.0 89.8 949:51.99 mysqld
复制代码 在上述结果中重点关注几个指标:
- RES(Resident Memory):物理内存占用,约 27.7G。
- VIRT(Virtual Memory):虚拟地址空间大小,约 30.6G。
看到 mysqld 历程当前内存斲丧占比约 90%,还算可控,没到火烧眉毛的境地。
继续使用 pmap 检察 mysqld 历程中的内存分布环境:- $ pmap -x $(pidof mysqld) | sort -k3 -rn | head -n 20
- total kB 32100804 29029796 29016940
- 00007f8a484d8000 5368992 5361664 5361664 rw--- [ anon ]
- 00007f892533d000 4698892 4691952 4691952 rw--- [ anon ]
- 00007f87e961e000 4564872 4556784 4556784 rw--- [ anon ]
- 00007f86adbe0000 4296832 4290544 4290544 rw--- [ anon ]
- 00007f86298bb000 1023252 1015792 1015792 rw--- [ anon ]
- 00007f866c350000 979648 979632 979632 rw--- [ anon ]
- 00007f87b8112000 719800 712688 712688 rw--- [ anon ]
- 00007f89046d4000 451760 444400 444400 rw--- [ anon ]
- 0000000005afc000 286800 282640 282640 rw--- [ anon ]
- 00007f8ba7715000 200300 200276 200276 rw--- [ anon ]
- 00007f8578000000 131072 131072 131072 rw--- [ anon ]
- 00007f8570000000 131072 131072 131072 rw--- [ anon ]
- 00007f8568000000 131072 131072 131072 rw--- [ anon ]
- 00007f8560000000 131072 131072 131072 rw--- [ anon ]
- 00007f8558000000 131072 131072 131072 rw--- [ anon ]
- 00007f8550000000 131048 131048 131048 rw--- [ anon ]
- 00007f8438000000 130668 130668 130668 rw--- [ anon ]
- 00007f8b98000000 65536 65536 65536 rw--- [ anon ]
- 00007f8b94000000 65536 65536 65536 rw--- [ anon ]
复制代码 看到大量的匿名内存(anon)斲丧较多内存,这大概是由以下几个原因引起的:
- 动态分配的内存:
- GreatSQL 在运行过程中会频繁地举行内存分配和开释,这些内存通常是以匿名映射的形式存在于历程的虚拟地址空间中。
- 比方,GreatSQL 的线程池、缓存、临时表等都会动态分配内存。
- 缓冲区和缓存:
- GreatSQL 使用大量的缓冲区和缓存来进步性能,比方InnoDB Buffer Pool(以下简称 IBP)、InnoDB Log Buffer等。
- 这些缓冲区和缓存通常会占用大量的匿名内存。
- 线程堆栈:
- 每个线程都有自己的堆栈空间,这些堆栈空间也是匿名内存的一部门。
- 为了响应用户请求而创建了大量线程,那么这些线程的堆栈空间会占用不少内存。
- 临时文件:
- GreatSQL 在处理大查询或排序操纵时,大概会使用临时表、临时文件,其内存映射也会占用匿名内存。
- 内存泄漏:
- 如果 GreatSQL 存在内存泄漏问题,也会导致匿名内存不断增长。
可以针对上述各个模块/维度做进一步排查分析。
1.2 检查 IBP 内存相干设置
- 使用以下 SQL 命令查询 IBP 等内存斲丧较多的相干模块内存设置
- greatsql> SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
- +-------------------------+-------------+
- | Variable_name | Value |
- +-------------------------+-------------+
- | innodb_buffer_pool_size | 21474836480 |
- +-------------------------+-------------+
- greatsql> SHOW GLOBAL VARIABLES LIKE 'innodb_log_buffer_size';
- +------------------------+----------+
- | Variable_name | Value |
- +------------------------+----------+
- | innodb_log_buffer_size | 33554432 |
- +------------------------+----------+
复制代码 从上面可见 IBP 设置为 20G,但是 mysqld 历程的内存占用为 27.7G,超过 IBP 较多,这大概是由于用户的 SQL 请求(比如服从较低的慢查询 SQL)其他模块或线程引起。还必要继续排查。
2. 利用 Performance Schema 排查内存斲丧泉源
从 5.6.6 版本开始,Performance Schema 默认启用,是一个内置的性能诊断工具,用于实时监控和分析 GreatSQL 服务器的运行状态。它提供了详细的性能数据,包罗 内存分配的全局视图、SQL 语句的执行时间、线程活动、锁等候等详细信息,帮助开发者和 DBA 识别和解决性能瓶颈。
2.1 按内存模块检察占用
使用 memory_summary_global_by_event_name 按模块检察内存分配环境:- greatsql> USE performance_schema;
- greatsql> SELECT
- EVENT_NAME,
- CURRENT_NUMBER_OF_BYTES_USED AS memory_bytes,
- CURRENT_NUMBER_OF_BYTES_USED / 1024 / 1024 AS memory_mb
- FROM
- performance_schema.memory_summary_global_by_event_name
- WHERE
- CURRENT_NUMBER_OF_BYTES_USED > 0
- ORDER BY
- CURRENT_NUMBER_OF_BYTES_USED DESC
- LIMIT 10;
- +--------------------------------------------------------------------+--------------+----------------+
- | EVENT_NAME | memory_bytes | memory_mb |
- +--------------------------------------------------------------------+--------------+----------------+
- | memory/innodb/buf_buf_pool | 21957836800 | 20940.62500000 |
- | memory/group_rpl/GCS_XCom::xcom_cache | 1070853221 | 1021.24521351 |
- | memory/mysys/IO_CACHE | 84149952 | 80.25164795 |
- | memory/performance_schema/events_statements_summary_by_digest | 42240000 | 40.28320313 |
- | memory/innodb/log_buffer_memory | 33555440 | 32.00096130 |
- | memory/innodb/ut0link_buf | 25165888 | 24.00006104 |
- | memory/innodb/lock0lock | 22440096 | 21.40054321 |
- | memory/sql/TABLE | 15646883 | 14.92203045 |
- | memory/performance_schema/events_statements_history_long | 15040000 | 14.34326172 |
- | memory/performance_schema/events_errors_summary_by_thread_by_error | 14561280 | 13.88671875 |
- +--------------------------------------------------------------------+--------------+----------------+
- 10 rows in set (0.00 sec)
复制代码
- EVENT_NAME:详细内存分配的模块名称,如 memory/innodb/buffer_pool、memory/sql/temporary_table 等。
- CURRENT_NUMBER_OF_BYTES_USED:当前分配的内存总量。
比方:
- 如果 memory/innodb/buf_buf_pool 值较高,说明 InnoDB buffer pool 占用较高。其他几个包含 memory/innodb 关键字的,也都是和 InnoDB 存储引擎相干的内存模块。
- 其中 memory/mysys/MY_BITMAP::bitmap 重要用于管理位图数据结构 (bitmap) 的内存使用。它的设计初衷是为了实现高效的位存储与处理,重要用于存储和操纵必要标记位(bit flag)来跟踪或控制的数据。其详细存储的数据包罗但不限于:
- 索引或表分区的状态:位图被用来记录索引或分区的使用状态。比方,在分区表扫描时,通过位图可以高效管理哪些分区必要扫描或已经扫描。
- 事件或锁状态:记录事件的特定标记位或锁的使用状态,比如资源锁(resource locks)的分配状态。
- InnoDB 的内部操纵:位图被用于跟踪一些内部存储引擎的优化过程,比方自适应哈希索引、页的脏位(dirty bit)标记等。
- 线程管理:管理线程池中线程的分配和使用状态。
- 性能统计:在某些性能分析的场景下,位图用于记录启用或禁用的统计模块。
- 其中 memory/group_rpl/GCS_XCom::xcom_cache 是 MGR Xcom cache,在 GreatSQL 8.0.32-26 中初始默认值即为 1GB,详情参考 Xcom cache分配静态化。其他几个包含 group_rpl 关键字特征的,也是和 MGR 相干的模块。
- 其中 memory/mysys/IO_CACHE 是一个重要的内存管理模块,重要用于管理和优化文件I/O操纵。IO_CACHE 提供了一个高效的缓存机制,可以明显进步文件读写操纵的性能。重要存储的数据有:
- 临时文件数据:排序、分组、联接等操纵过程中生成的中间结果。
- 二进制日记:binlog 的写入和读取操纵中使用缓存。
- 文件块:GreatSQL 访问文件时,将数据块加载到缓存中,避免重复读取。
- 表数据:表扫描或索引扫描时,用于缓存表或索引的数据。
- 如果 memory/sql/temporary_table 值较高,说明内存被临时表斲丧。
- 如果 memory/innodb/hash_index 值较高,大概是 InnoDB 的自适应哈希索引占用内存。
上面的查询结果表明,memory/innodb/buf_buf_pool(IBP) 占用内存约 20G,memory/group_rpl/GCS_XCom::xcom_cache(MGR XCom Cache) 占用内存约 1G,都是符合预期的。但是 memory/mysys/IO_CACHE 占用的内存较高,必要重点排查。
2.2 跟踪各模块内存使用变化
可以每间隔一段时间重复执行下面的 SQL 请求,观察各个模块的内存斲丧变化,找出内存斲丧增长较快的模块,它们大概就是导致 mysqld 历程斲丧较大内存的“元凶”。- greatsql> USE performance_schema;
- greatsql> SELECT
- EVENT_NAME,
- SUM(SUM_NUMBER_OF_BYTES_ALLOC) / 1024 / 1024 AS total_memory_mb
- FROM
- performance_schema.memory_summary_global_by_event_name
- GROUP BY
- EVENT_NAME
- ORDER BY
- SUM_NUMBER_OF_BYTES_ALLOC DESC
- LIMIT 10;
- +---------------------------------------------+------------------+
- | EVENT_NAME | total_memory_mb |
- +---------------------------------------------+------------------+
- | memory/innodb/memory | 3688428.98232269 |
- | memory/mysys/MY_BITMAP::bitmap | 289065.08729172 |
- | memory/group_rpl/transaction_data | 219301.70309544 |
- | memory/group_rpl/Gcs_message_data::m_buffer | 219176.21560478 |
- | memory/mysys/IO_CACHE | 102064.87601471 |
- | memory/group_rpl/GCS_XCom::xcom_cache | 57685.34130669 |
- | memory/sql/Log_event | 47153.59659863 |
- | memory/group_rpl/write_set_encoded | 35822.83545971 |
- | memory/innodb/buf_buf_pool | 20940.62500000 |
- | memory/group_rpl/certification_data | 11146.79415703 |
- +---------------------------------------------+------------------+
复制代码 结合前面各模块当前占用的内存环境,从上述查询结果综合分析看,较大概率应该就是 memory/mysys/IO_CACHE 模块斲丧内存过大。
2.3 按线程检察内存占用
接着继续检察各线程内存占用环境,确认是否有个别线程(尤其是长连接线程)斲丧了过多内存资源。使用 memory_summary_by_thread_by_event_name 检察各线程的内存分配,同时关联查询 threads 视图,可以显示各线程当前正在执行的 SQL 请求及其执行耗时:- -- 1. 查看各线程当前的内存分配情况
- greatsql> USE performance_schema;
- greatsql> SELECT
- m.EVENT_NAME,
- m.COUNT_ALLOC,
- m.CURRENT_NUMBER_OF_BYTES_USED AS mem_sum,
- (m.CURRENT_NUMBER_OF_BYTES_USED / 1024 / 1024.0) AS mem_sum_mb,
- t.NAME,
- t.TYPE,
- t.PROCESSLIST_ID,
- LEFT(t.PROCESSLIST_INFO, 10)
- FROM
- memory_summary_by_thread_by_event_name m
- JOIN threads t
- USING (THREAD_ID)
- WHERE
- t.PROCESSLIST_ID != CONNECTION_ID()
- ORDER BY
- m.CURRENT_NUMBER_OF_BYTES_USED desc
- LIMIT 20;
- +-------------------------------+-------------+---------+------------+----------------------------------------------+------------+----------------+------------------------------+
- | EVENT_NAME | COUNT_ALLOC | mem_sum | mem_sum_mb | NAME | TYPE | PROCESSLIST_ID | LEFT(t.PROCESSLIST_INFO, 10) |
- +-------------------------------+-------------+---------+------------+----------------------------------------------+------------+----------------+------------------------------+
- | memory/innodb/memory | 13 | 21888 | 0.02087402 | thread/group_rpl/THD_applier_module_receiver | FOREGROUND | 12 | Group repl |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39893 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39894 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39895 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39896 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39897 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39898 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39899 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39900 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39901 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39902 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39903 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39904 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39905 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39906 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39907 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39908 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39909 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39910 | load data |
- | memory/sql/THD::main_mem_root | 3 | 20576 | 0.01962280 | thread/sql/one_connection | FOREGROUND | 39911 | load data |
- +-------------------------------+-------------+---------+------------+----------------------------------------------+------------+----------------+------------------------------+
- -- 2. 查看各线程汇总的内存分配情况
- greatsql> SELECT
- m.EVENT_NAME,
- m.COUNT_ALLOC,
- m.SUM_NUMBER_OF_BYTES_ALLOC AS mem_sum,
- (m.SUM_NUMBER_OF_BYTES_ALLOC / 1024 / 1024.0) AS mem_sum_mb,
- t.NAME,
- t.TYPE,
- t.PROCESSLIST_ID,
- LEFT(t.PROCESSLIST_INFO, 10)
- FROM
- memory_summary_by_thread_by_event_name m
- JOIN threads t
- USING (THREAD_ID)
- WHERE
- t.PROCESSLIST_ID != CONNECTION_ID()
- ORDER BY
- m.SUM_NUMBER_OF_BYTES_ALLOC desc
- LIMIT 20;
- +----------------------+-------------+-------------+----------------+----------------------------------------------+------------+----------------+------------------------------+
- | EVENT_NAME | COUNT_ALLOC | mem_sum | mem_sum_mb | NAME | TYPE | PROCESSLIST_ID | LEFT(t.PROCESSLIST_INFO, 10) |
- +----------------------+-------------+-------------+----------------+----------------------------------------------+------------+----------------+------------------------------+
- | memory/sql/Log_event | 818062681 | 36821553500 | 35115.76986313 | thread/group_rpl/THD_applier_module_receiver | FOREGROUND | 12 | Group repl |
- | memory/innodb/memory | 258356 | 266640048 | 254.28776550 | thread/sql/one_connection | FOREGROUND | 40222 | load data |
- | memory/innodb/memory | 255478 | 263811432 | 251.59018707 | thread/sql/one_connection | FOREGROUND | 40204 | load data |
- | memory/innodb/memory | 217298 | 224575448 | 214.17183685 | thread/sql/one_connection | FOREGROUND | 40209 | load data |
- | memory/innodb/memory | 212201 | 219160304 | 209.00755310 | thread/sql/one_connection | FOREGROUND | 40215 | load data |
- | memory/innodb/memory | 209052 | 215978440 | 205.97309113 | thread/sql/one_connection | FOREGROUND | 40212 | load data |
- | memory/innodb/memory | 203823 | 210364872 | 200.61957550 | thread/sql/one_connection | FOREGROUND | 40220 | load data |
- | memory/innodb/memory | 201921 | 208627128 | 198.96233368 | thread/sql/one_connection | FOREGROUND | 40224 | load data |
- | memory/innodb/memory | 195252 | 202055944 | 192.69556427 | thread/sql/one_connection | FOREGROUND | 40214 | load data |
- | memory/innodb/memory | 193319 | 199526048 | 190.28286743 | thread/sql/one_connection | FOREGROUND | 40208 | load data |
- | memory/innodb/memory | 192498 | 198820216 | 189.60973358 | thread/sql/one_connection | FOREGROUND | 40227 | load data |
- | memory/innodb/memory | 191717 | 198099104 | 188.92202759 | thread/sql/one_connection | FOREGROUND | 40205 | load data |
- | memory/innodb/memory | 191234 | 197764864 | 188.60327148 | thread/sql/one_connection | FOREGROUND | 40202 | load data |
- | memory/innodb/memory | 190012 | 196401888 | 187.30343628 | thread/sql/one_connection | FOREGROUND | 40216 | load data |
- | memory/innodb/memory | 189098 | 195217576 | 186.17398834 | thread/sql/one_connection | FOREGROUND | 40207 | load data |
- | memory/innodb/memory | 188670 | 195084304 | 186.04689026 | thread/sql/one_connection | FOREGROUND | 40223 | load data |
- | memory/innodb/memory | 187466 | 193563912 | 184.59693146 | thread/sql/one_connection | FOREGROUND | 40218 | load data |
- | memory/innodb/memory | 187045 | 193354488 | 184.39720917 | thread/sql/one_connection | FOREGROUND | 40217 | load data |
- | memory/innodb/memory | 186838 | 193196152 | 184.24620819 | thread/sql/one_connection | FOREGROUND | 40219 | load data |
- | memory/innodb/memory | 186465 | 192576408 | 183.65517426 | thread/sql/one_connection | FOREGROUND | 40210 | load data |
- +----------------------+-------------+-------------+----------------+----------------------------------------------+------------+----------------+------------------------------+
复制代码 从上面的查询结果可见,当前有较多的 LOAD DATA 请求正在运行,有大概是它们导致的内存占用较高的原因。
其中
- CURRENT_NUMBER_OF_BYTES_USED 表示当前分配但尚未开释的内存块的聚合大小。CURRENT_NUMBER_OF_BYTES_USED = SUM_NUMBER_OF_BYTES_ALLOC − SUM_NUMBER_OF_BYTES_FREE。
- SUM_NUMBER_OF_BYTES_ALLOC 表示已分配内存块的聚合大小。
- SUM_NUMBER_OF_BYTES_FREE 表示已开释内存块的聚合大小。
排查分析道这里,根本上可以推断是由于有大量并发 LOAD DATA 导入数据请求导致 mysqld 内存占用较高。
3. 利用 sys schema 简化分析
相对于用 Performance Schema 排查分析,接纳 sys schema 分析则更简朴省事。接下来介绍如何利用 sys schema 分析。
GreatSQL sys schema 是一组视图、存储过程和函数的集合,它基于 performance_schema 提供了更易读和易用的性能数据汇总。sys schema 通过简化复杂的性能指标,帮助数据库管理员和开发人员快速诊断和优化 GreatSQL 的性能问题。
3.1 检察全局及各模块内存分布
起首,检察当前全部内存分配环境:- greatsql> USE sys;
- greatsql> SELECT * FROM memory_global_total;
- +-----------------+
- | total_allocated |
- +-----------------+
- | 22.08 GiB |
- +-----------------+
复制代码 在 IBP 设置为 20G 的前提下,从 memory_global_total 查询到的内存分配总数并没有超过太多,说明较大大概性是由于用户的 SQL 请求(比如服从较低的慢查询 SQL)或其他模块引起。
继续查询内存使用的全局分布环境:- greatsql> SELECT
- *
- FROM
- sys.memory_global_by_current_bytes
- LIMIT 20;
- +-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
- | event_name | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
- +-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
- | memory/innodb/buf_buf_pool | 160 | 20.45 GiB | 130.88 MiB | 160 | 20.45 GiB | 130.88 MiB |
- | memory/group_rpl/GCS_XCom::xcom_cache | 4295 | 1018.00 MiB | 242.71 KiB | 463303 | 1.13 GiB | 2.55 KiB |
- | memory/mysys/IO_CACHE | 175 | 280.82 MiB | 1.60 MiB | 539 | 906.46 MiB | 1.68 MiB |
- | memory/performance_schema/events_statements_summary_by_digest | 1 | 40.28 MiB | 40.28 MiB | 1 | 40.28 MiB | 40.28 MiB |
- | memory/innodb/log_buffer_memory | 1 | 32.00 MiB | 32.00 MiB | 1 | 32.00 MiB | 32.00 MiB |
- | memory/innodb/ut0link_buf | 2 | 24.00 MiB | 12.00 MiB | 2 | 24.00 MiB | 12.00 MiB |
- | memory/innodb/lock0lock | 9893 | 21.40 MiB | 2.22 KiB | 9893 | 21.40 MiB | 2.22 KiB |
- | memory/sql/TABLE | 5796 | 17.49 MiB | 3.09 KiB | 5798 | 17.50 MiB | 3.09 KiB |
- | memory/performance_schema/events_statements_history_long | 1 | 14.34 MiB | 14.34 MiB | 1 | 14.34 MiB | 14.34 MiB |
- | memory/performance_schema/events_errors_summary_by_thread_by_error | 257 | 13.89 MiB | 55.33 KiB | 257 | 13.89 MiB | 55.33 KiB |
- | memory/performance_schema/events_statements_summary_by_thread_by_event_name | 1 | 13.66 MiB | 13.66 MiB | 1 | 13.66 MiB | 13.66 MiB |
- | memory/innodb/memory | 7583 | 12.28 MiB | 1.66 KiB | 8812 | 16.80 MiB | 1.95 KiB |
- | memory/performance_schema/file_instances | 4 | 11.00 MiB | 2.75 MiB | 4 | 11.00 MiB | 2.75 MiB |
- | memory/performance_schema/events_statements_history_long.digest_text | 1 | 9.77 MiB | 9.77 MiB | 1 | 9.77 MiB | 9.77 MiB |
- | memory/performance_schema/events_statements_summary_by_digest.digest_text | 1 | 9.77 MiB | 9.77 MiB | 1 | 9.77 MiB | 9.77 MiB |
- | memory/performance_schema/events_statements_history_long.sql_text | 1 | 9.77 MiB | 9.77 MiB | 1 | 9.77 MiB | 9.77 MiB |
- | memory/performance_schema/memory_summary_by_thread_by_event_name | 1 | 9.32 MiB | 9.32 MiB | 1 | 9.32 MiB | 9.32 MiB |
- | memory/performance_schema/table_handles | 1 | 9.06 MiB | 9.06 MiB | 1 | 9.06 MiB | 9.06 MiB |
- | memory/mysys/KEY_CACHE | 3 | 8.00 MiB | 2.67 MiB | 3 | 8.00 MiB | 2.67 MiB |
- | memory/innodb/sync0arr | 3 | 7.03 MiB | 2.34 MiB | 3 | 7.03 MiB | 2.34 MiB |
- +-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+------------+----------------+
复制代码 在 sys schema 中,大部门视图都同时存储原始数据以及格式化后可读性更强的两种视图。以是上面的 SQL 查询还可以改成查询原始未格式化的视图:- greatsql> SELECT
- *
- FROM
- sys.x$memory_global_by_current_bytes
- LIMIT 20;
- +-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+-------------+----------------+
- | event_name | current_count | current_alloc | current_avg_alloc | high_count | high_alloc | high_avg_alloc |
- +-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+-------------+----------------+
- | memory/innodb/buf_buf_pool | 160 | 21957836800 | 137236480.0000 | 160 | 21957836800 | 137236480.0000 |
- | memory/group_rpl/GCS_XCom::xcom_cache | 4068 | 1067435559 | 262398.1217 | 463303 | 1208663474 | 2608.7970 |
- | memory/mysys/IO_CACHE | 126 | 206147792 | 1636093.5873 | 539 | 950487072 | 1763426.8497 |
- | memory/performance_schema/events_statements_summary_by_digest | 1 | 42240000 | 42240000.0000 | 1 | 42240000 | 42240000.0000 |
- | memory/innodb/log_buffer_memory | 1 | 33555440 | 33555440.0000 | 1 | 33555440 | 33555440.0000 |
- | memory/innodb/ut0link_buf | 2 | 25165888 | 12582944.0000 | 2 | 25165888 | 12582944.0000 |
- | memory/innodb/lock0lock | 9893 | 22440096 | 2268.2802 | 9893 | 22440096 | 2268.2802 |
- | memory/sql/TABLE | 5796 | 18341476 | 3164.5059 | 5798 | 18351820 | 3165.1983 |
- | memory/performance_schema/events_statements_history_long | 1 | 15040000 | 15040000.0000 | 1 | 15040000 | 15040000.0000 |
- | memory/performance_schema/events_errors_summary_by_thread_by_error | 257 | 14561280 | 56658.6770 | 257 | 14561280 | 56658.6770 |
- | memory/performance_schema/events_statements_summary_by_thread_by_event_name | 1 | 14321664 | 14321664.0000 | 1 | 14321664 | 14321664.0000 |
- | memory/innodb/memory | 7562 | 12858512 | 1700.4115 | 8812 | 17615632 | 1999.0504 |
- | memory/performance_schema/file_instances | 4 | 11534336 | 2883584.0000 | 4 | 11534336 | 2883584.0000 |
- | memory/performance_schema/events_statements_history_long.digest_text | 1 | 10240000 | 10240000.0000 | 1 | 10240000 | 10240000.0000 |
- | memory/performance_schema/events_statements_summary_by_digest.digest_text | 1 | 10240000 | 10240000.0000 | 1 | 10240000 | 10240000.0000 |
- | memory/performance_schema/events_statements_history_long.sql_text | 1 | 10240000 | 10240000.0000 | 1 | 10240000 | 10240000.0000 |
- | memory/performance_schema/memory_summary_by_thread_by_event_name | 1 | 9768960 | 9768960.0000 | 1 | 9768960 | 9768960.0000 |
- | memory/performance_schema/table_handles | 1 | 9502720 | 9502720.0000 | 1 | 9502720 | 9502720.0000 |
- | memory/mysys/KEY_CACHE | 3 | 8390864 | 2796954.6667 | 3 | 8390864 | 2796954.6667 |
- | memory/innodb/sync0arr | 3 | 7373032 | 2457677.3333 | 3 | 7373032 | 2457677.3333 |
- +-----------------------------------------------------------------------------+---------------+---------------+-------------------+------------+-------------+----------------+
复制代码 从上面两个查询结果可知,除了 IBP 和 MGR 之外,模块 memory/mysys/IO_CACHE 占用的内存最高,是重点分析排查对象。
检察 sys.memory_global_by_current_bytes 视图定义,可知它的原始数据来自 performance_schema:- greatsql> SHOW CREATE VIEW sys.memory_global_by_current_bytes\G
- *************************** 1. row ***************************
- View: memory_global_by_current_bytes
- Create View: CREATE ALGORITHM=MERGE DEFINER=`mysql.sys`@`localhost` SQL SECURITY INVOKER VIEW
- `memory_global_by_current_bytes` (`event_name`,`current_count`,`current_alloc`,`current_avg_alloc`,
- `high_count`,`high_alloc`,`high_avg_alloc`)
- AS select `performance_schema`.`memory_summary_global_by_event_name`.`EVENT_NAME`
- AS `event_name`,`performance_schema`.`memory_summary_global_by_event_name`.`CURRENT_COUNT_USED`
- AS `current_count`,format_bytes(`performance_schema`.`memory_summary_global_by_event_name`.`CURRENT_NUMBER_OF_BYTES_USED`)
- AS `current_alloc`,format_bytes(ifnull((`performance_schema`.`memory_summary_global_by_event_name`.`CURRENT_NUMBER_OF_BYTES_USED` / nullif(`performance_schema`.`memory_summary_global_by_event_name`.`CURRENT_COUNT_USED`,0)),0))
- AS `current_avg_alloc`,`performance_schema`.`memory_summary_global_by_event_name`.`HIGH_COUNT_USED`
- AS `high_count`,format_bytes(`performance_schema`.`memory_summary_global_by_event_name`.`HIGH_NUMBER_OF_BYTES_USED`)
- AS `high_alloc`,format_bytes(ifnull((`performance_schema`.`memory_summary_global_by_event_name`.`HIGH_NUMBER_OF_BYTES_USED` / nullif(`performance_schema`.`memory_summary_global_by_event_name`.`HIGH_COUNT_USED`,0)),0))
- AS `high_avg_alloc` from `performance_schema`.`memory_summary_global_by_event_name`
- where (`performance_schema`.`memory_summary_global_by_event_name`.`CURRENT_NUMBER_OF_BYTES_USED` > 0)
- order by `performance_schema`.`memory_summary_global_by_event_name`.`CURRENT_NUMBER_OF_BYTES_USED` desc
- character_set_client: utf8mb4
- collation_connection: utf8mb4_0900_ai_ci
复制代码 从 performance_schema 中读取源数据,并举行格式化处理,大大提升了可读性。同理,其他视图也如此。
3.2 检察各线程内存分布
检察各线程的内存使用详情:- -- 按历史总消耗内存排序
- -- 这里因为要按 total_allocated 列排序,所以查询原始视图
- greatsql> SELECT
- *
- FROM
- sys.x$memory_by_thread_by_current_bytes
- ORDER BY
- total_allocated DESC
- LIMIT 20;
- +-----------+---------------------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+
- | thread_id | user | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
- +-----------+---------------------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+
- | 57 | group_rpl/THD_applier_module_receiver | 87 | 62603 | 719.5747 | 21888 | 35248068439 |
- | 33632 | root@localhost | 30 | 8592036 | 286401.2000 | 8388736 | 1450180050 |
- | 45 | innodb/clone_gtid_thread | 5530 | 1916646 | 346.5906 | 1242184 | 328052882 |
- | 34281 | root@localhost | 21 | 44051 | 2097.6667 | 20576 | 286781508 |
- | 34273 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 274540679 |
- | 34271 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 273058531 |
- | 34282 | root@localhost | 21 | 44003 | 2095.3810 | 20576 | 272966254 |
- | 34275 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 261564478 |
- | 34274 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 240307573 |
- | 34280 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 238306694 |
- | 34284 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 235438640 |
- | 34272 | root@localhost | 21 | 44051 | 2097.6667 | 20576 | 232405048 |
- | 34283 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 226022807 |
- | 34270 | root@localhost | 21 | 44051 | 2097.6667 | 20576 | 222124926 |
- | 34277 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 216611682 |
- | 34268 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 216088005 |
- | 34269 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 215724518 |
- | 34276 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 215354247 |
- | 34286 | root@localhost | 20 | 43707 | 2185.3500 | 20576 | 214817414 |
- | 34278 | root@localhost | 18 | 41387 | 2299.2778 | 20576 | 213726193 |
- +-----------+---------------------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+
- -- 按当前内存消耗排序
- -- 已默认按 current_allocated 排序,所以无需查询原始视图
- SELECT
- *
- FROM
- sys.memory_by_thread_by_current_bytes
- LIMIT 20;
- +-----------+---------------------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+
- | thread_id | user | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
- +-----------+---------------------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+
- | 44680 | root@localhost | 91 | 16.21 MiB | 182.42 KiB | 16.00 MiB | 408.03 MiB |
- | 45 | innodb/clone_gtid_thread | 5932 | 1.96 MiB | 346 bytes | 1.32 MiB | 327.43 MiB |
- | 1 | sql/main | 4938 | 1.30 MiB | 276 bytes | 427.63 KiB | 8.61 MiB |
- | 22 | innodb/log_writer_thread | 2347 | 293.38 KiB | 128 bytes | 293.38 KiB | 293.38 KiB |
- | 63 | group_rpl/THD_mysql_thread | 208 | 182.98 KiB | 900 bytes | 130.20 KiB | 378.95 KiB |
- | 57 | group_rpl/THD_applier_module_receiver | 87 | 61.14 KiB | 719 bytes | 21.38 KiB | 36.14 GiB |
- | 59 | sql/replica_sql | 68 | 59.56 KiB | 896 bytes | 16.04 KiB | 129.57 KiB |
- | 60 | sql/replica_worker | 31 | 44.04 KiB | 1.42 KiB | 16.04 KiB | 53.38 KiB |
- | 45888 | root@localhost | 22 | 43.31 KiB | 1.97 KiB | 20.09 KiB | 312.29 MiB |
- | 45897 | root@localhost | 22 | 43.31 KiB | 1.97 KiB | 20.09 KiB | 369.21 MiB |
- | 45899 | root@localhost | 22 | 43.31 KiB | 1.97 KiB | 20.09 KiB | 315.29 MiB |
- | 45905 | root@localhost | 22 | 43.31 KiB | 1.97 KiB | 20.09 KiB | 317.19 MiB |
- | 45908 | root@localhost | 22 | 43.31 KiB | 1.97 KiB | 20.09 KiB | 307.82 MiB |
- | 45890 | root@localhost | 21 | 43.02 KiB | 2.05 KiB | 20.09 KiB | 400.17 MiB |
- | 45891 | root@localhost | 21 | 43.02 KiB | 2.05 KiB | 20.09 KiB | 336.53 MiB |
- | 45886 | root@localhost | 21 | 43.02 KiB | 2.05 KiB | 20.09 KiB | 324.93 MiB |
- | 45889 | root@localhost | 21 | 43.02 KiB | 2.05 KiB | 20.09 KiB | 303.29 MiB |
- | 45907 | root@localhost | 21 | 43.02 KiB | 2.05 KiB | 20.09 KiB | 309.84 MiB |
- | 45911 | root@localhost | 21 | 43.02 KiB | 2.05 KiB | 20.09 KiB | 308.27 MiB |
- | 45919 | root@localhost | 21 | 42.97 KiB | 2.05 KiB | 20.09 KiB | 306.36 MiB |
- +-----------+---------------------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+
复制代码 同样地,还可以和 performance_schema.threads 关联查询,就可以找到相应线程/会话中大概正在运行的 SQL 请求。
从查询结果明显可知,是由于当前有大量 root@localhost 连接会话执行 LOAD DATA 导入数据,这些会话占用了较多内存。
3.3 检察各用户内存分配
如果怀疑是某个用户的查询导致内存斲丧过高,还可按用户分别统计:- greatsql> SELECT
- *
- FROM
- sys.memory_by_user_by_current_bytes;
- +-----------------+--------------------+-------------------+-------------------+-------------------+-----------------+
- | user | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
- +-----------------+--------------------+-------------------+-------------------+-------------------+-----------------+
- | background | 13993 | 3.99 MiB | 298 bytes | 1.33 MiB | 40.07 GiB |
- | root | 859 | 2.76 MiB | 3.29 KiB | 1.00 MiB | 3.95 TiB |
- | event_scheduler | 3 | 16.27 KiB | 5.42 KiB | 16.04 KiB | 16.27 KiB |
- +-----------------+--------------------+-------------------+-------------------+-------------------+-----------------+
复制代码 看到 root 用户汗青上总斲丧了 3.95 TB 内存,可见它的怀疑最大。执行 SHOW PROCESSLIST 可以看到当前 root 用户在反复执行并发导入大量数据,这个原因造成了内存总斲丧超过较大,等候导入完成后,天然就会接纳开释。
综合以上两种分析方法和过程,根本上可以排查定位是什么原因导致 mysqld 历程占用过多内存。
4. 检查内存分配的重要大概原因
4.1 InnoDB 内存相干设置
InnoDB 模块大概斲丧大量内存,以下参数必要关注:
- innodb_buffer_pool_size IBP缓冲池。
- innodb_log_buffer_size 事件日记缓冲区。
- innodb_adaptive_hash_index 自适应哈希索引,默认开启,大概占用额外内存。发起关闭。
- innodb_buffer_pool_instances 缓冲池分区数量,过多分区大概引起额外内存开销。
分别检查确认这些参数设置环境:- greatsql> SHOW GLOBAL VARIABLES LIKE 'innodb%';
复制代码 4.2 临时表内存
如果查询生成大量临时表,大概会占用内存。以下参数决定了临时表的大小和行为:
- tmp_table_size 和 max_heap_table_size:
- greatsql> SHOW GLOBAL VARIABLES LIKE 'tmp_table_size';
- greatsql> SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';
复制代码- greatsql> SHOW GLOBAL STATUS LIKE 'Created_tmp%';
复制代码 4.3 线程/会话内存
高并发会导致内存分配超标,尤其是以下参数:
- thread_stack 每个线程的栈大小,默认 256KB。
- read_buffer_size / read_rnd_buffer_size / join_buffer_size / sort_buffer_size 线程级分配的内存缓冲区。
4.4 复杂查询的内存斲丧
复杂的排序、联接、子查询等操纵会额外分配内存缓冲区,如果有较多的慢查询也表明大概存在一些斲丧较多内存的查询请求,可以通过查询以下变量确认斲丧:- greatsql> SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';
- greatsql> SHOW GLOBAL STATUS LIKE 'Select_full_join';
- greatsql> SHOW GLOBAL STATUS LIKE 'Slow_queries';
复制代码 4.5 表缓存与元数据缓存
表和源数据缓存 table_open_cache 和 table_definition_cache 也大概占用较多内存:- greatsql> SHOW VARIABLES LIKE 'table_open_cache';
- greatsql> SHOW VARIABLES LIKE 'table_definition_cache';
复制代码 5. 分析排查方法总结
- 确认内存使用是否超标:结合系统工具与 GreatSQL 内部视图分析。
- 确定详细内存分配模块:通过 performance schema 或 sys schema 系统视图查询。
- 检查确认内存、缓冲等相干参数是否设置合理:
- 如果临时表斲丧过高,降低 tmp_table_size 和 max_heap_table_size。
- 如果线程占用过多内存,调解 read_buffer_size 和 join_buffer_size。
- 如果 IBP 占用过多内存,则适当调低 innodb_buffer_pool_size,一般上限设置为物理内存的 70% 左右。
- 优化查询和索引设计,避免复杂查询和不必要的临时表创建。
- 优化慢查询 SQL 请求,避免低服从的 SQL 请求斲丧过多CPU、内存及磁盘 I/O 资源,并对其他 SQL 请求造成间接关联影响。
相信通过以上方法,根本上可以分析定位并解决 mysqld 历程内存占用异常的问题。
6. 如何避免 GreatSQL 斲丧过多内存
从上面的分析排查过程及思路中,也就知道了有哪些方法可以避免让 GreatSQL 在运行过程中斲丧太多内存,以下是几条发起:
- 接纳 jemalloc 代替 glibc 自带的 malloc 库,其优势在于减少内存碎片和提升高并发场景下内存的分配服从,进步内存管理服从的同时还能降低数据库运行时发生 OOM 的风险。在本案例中,原来 mysqld 历程最高跑到27.8G(占物理内存90%),改用 jemalloc 后最高只跑到24.2G(占物理内存78.7%),结果相当明显。
- 根据数据库负载以及业务特征,设置合适的 IBP 值,一般上限设置为物理内存的 70% 左右,设置过大轻易造成 OOM。同时也要根据实际环境,适当调解(或调低)会话级缓冲池,包罗 tmp_table_size / max_heap_table_size / read_buffer_size / read_rnd_buffer_size / join_buffer_size / sort_buffer_size / thread_stack 等多个参数,同时也要适当控制最大连接数参数 max_connection。
- 加强监控,实时发现并处理一些斲丧内存较大的 SQL 操纵,比如大事件(把大事件拆分成多个小事件)、长事件(长时间不提交的事件要做好监控并发出告警,甚至自动终止这些事件),以及全表扫描、分组、排序、多表联接(是否可以添加合适的索引)等范例,这种请求通常比力轻易产生临时表、临时文件,通常也是慢查询 SQL,必要重点关注。要定时巡查并优化这些慢查询 SQL。
重点做好上面这几点,根本上就能避免大部门轻易造成 mysqld 斲丧内存过多的环境,让 GreatSQL 运行的更丝滑安稳。
以上。
Enjoy GreatSQL
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。 |