SELECT ifNull(sum(t1.unTrackQty), 0) AS unTrackQtyFROM wms.wms_order_sku_local AS t1 FINAL PREWHERE t1.shipmentOrderCreateTime > '2021-11-17 11:00:00' AND t1.shipmentOrderCreateTime <= '2021-11-18 11:00:00' AND t1.gridStationNo = 'WG0000514' AND t1.warehouseNo NOT IN ('wms-6-979', 'wms-6-978', '6_979', '6_978') AND t1.orderType = '10'WHERE t1.ckDeliveryTaskStatus = '3'
复制代码
现在分析下,从上述日志中能够拿到什么信息,首先该查询语句没有使用主键索引,具体信息如下
2022.02.17 21:21:54.038239 [ 618 ] {ea8f56fe-cf2b-4260-8f44-a006458bdab3} wms.wms_order_sku_local (SelectExecutor): Key condition: unknown, unknown, and, unknown, unknown, and, and, unknown, unknown, and, and
同样也没有使用分区索引,具体信息如下
2022.02.17 21:21:54.038271 [ 618 ] {ea8f56fe-cf2b-4260-8f44-a006458bdab3} wms.wms_order_sku_local (SelectExecutor): MinMax index condition: unknown, unknown, and, unknown, unknown, and, and, unknown, unknown, and, and
此次查询一共扫描36个parts,9390个MarkRange,通过查询system.parts系统分区信息表发现当前表一共拥有36个活跃的分区,相当于全表扫描。
2022.02.17 21:44:58.012832 [ 1138 ] {f1561330-4988-4598-a95d-bd12b15bc750} wms.wms_order_sku_local (SelectExecutor): Selected 36 parts by date, 36 parts by key, 9390 marks by primary key, 9390 marks to read from 36 ranges
此次查询总共读取了73645604 行数据,这个行数也是这个表的总数据行数,读取耗时0.229100749s,共读取1.20GB的数据。
2022.02.17 21:21:54.265490 [ 618 ] {ea8f56fe-cf2b-4260-8f44-a006458bdab3} executeQuery: Read 73645604 rows, 1.20 GiB in 0.229100749 sec., 321455099 rows/sec., 5.22 GiB/sec.
此次查询语句消耗的内存最大为60.37MB
2022.02.17 21:21:54.265551 [ 618 ] {ea8f56fe-cf2b-4260-8f44-a006458bdab3} MemoryTracker: Peak memory usage (for query): 60.37 MiB.
最后汇总了下信息,此次查询总共耗费了0.267s,处理了7365W数据,共1.28GB,并且给出了数据处理速度。
1 rows in set. Elapsed: 0.267 sec. Processed 73.65 million rows, 1.28 GB (276.03 million rows/s., 4.81 GB/s.)
通过上述可以发现两点严重问题
从实践上看,设置成Nullable对性能影响也没有多大,可能是因为我们数据量比较小。不过官方已经明确指出尽量不要使用Nullable类型,因为Nullable字段不能被索引,而且Nullable列除了有一个存储正常值的文件,还会有一个额外的文件来存储Null标记。
Using Nullable almost always negatively affects performance, keep this in mind when designing your databases.
-- 创建query_log分布式表CREATE TABLE IF NOT EXISTS system.query_log_allON CLUSTER defaultAS system.query_logENGINE = Distributed(sht_ck_cluster_pro,system,query_log,rand());-- 查询语句select -- 执行次数 count(), -- 平均查询时间 avg(query_duration_ms) avgTime, -- 平均每次读取数据行数 floor(avg(read_rows)) avgRow, -- 平均每次读取数据大小 floor(avg(read_rows) / 10000000) avgMB, -- 具体查询语句 any(query), -- 去除掉where条件,用户group by归类 substring(query, positionCaseInsensitive(query, 'select'), positionCaseInsensitive(query, 'from')) as queryLimitfrom system.query_log_all/system.query_logwhere event_date = '2022-01-21' and type = 2group by queryLimitorder by avgRow desc;