加快无索引表引起的主从延长数据回放
一、场景
由于某些缘故原由,客户现场存在一张 8千万 的大表,而且该表上无任何索引(也无主键),平时该表上 UPDATE 或 DELETE 只操作几条数据。忽然有一天业务进行了某种操作,DELETE 2万 条数据,悲剧发生了,当在主库上执行了之后,传到从库上之后一直回放,当时评估了下可能会回放10天,后来在经过业务同意之后,对表进行操作,用于加快回放日志,处理该问题。
二、处理思路
停掉复制线程,关闭 Binlog 日志记载,添加索引,重启复制线程。
三、复现步骤
1、准备数据:
创建库:- greatsql> CREATE DATABASE qj;
- Query OK, 1 row affected (0.01 sec)
复制代码 准备数据,准备一张8千万数据的表:- $ sysbench --db-driver=mysql --mysql-host=192.168.139.230 --mysql-port=3307 --mysql-user=greatsql --mysql-password=GreatSQL@2024 --mysql-db=qj --table_size=80000000 --tables=1 oltp_write_only prepare
- sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
- Creating table 'sbtest1'...
- Inserting 80000000 records into 'sbtest1'
- Creating a secondary index on 'sbtest1'...
复制代码 主节点修改表布局,将表的索引去掉- greatsql> USE qj
- Database changed
- greatsqll> ALTER TABLE sbtest1 MODIFY id int not null;
- Query OK, 80000000 rows affected (36 min 29.13 sec)
- Records: 80000000 Duplicates: 0 Warnings: 0
- greatsql> ALTER TABLE sbtest1 DROP key `k_1`;
- Query OK, 0 rows affected (0.96 sec)
- Records: 0 Duplicates: 0 Warnings: 0
- greatsql> ALTER TABLE sbtest1 DROP PRIMARY KEY;
- Query OK, 80000000 rows affected (14 min 29.31 sec)
- Records: 80000000 Duplicates: 0 Warnings: 0
复制代码 2、模仿延长:
01、主节点:
模仿删除数据- greatsql> DELETE FROM qj.sbtest1 WHERE k<20000;
- Query OK, 19894 rows affected (4 min 20.40 sec)
复制代码 03、通过操作从节点加快日志回放
找到回放线程,KILL掉回放线程,关闭当前会话binlog日志记载,添加索引,重新启动复制回放线程- greatsql> SELECT sleep(300);
- +------------+
- | sleep(300) |
- +------------+
- | 0 |
- +------------+
- 1 row in set (5 min 0.01 sec)
- greatsql> SHOW SLAVE STATUS \G
- *************************** 1. row ***************************
- Slave_IO_State: Waiting for source to send event
- Master_Host: 172.17.139.230
- Master_User: greatsql
- Master_Port: 3307
- Connect_Retry: 60
- Master_Log_File: binlog.000078
- Read_Master_Log_Pos: 953750186
- Relay_Log_File: relaylog.000231
- Relay_Log_Pos: 949954084
- Relay_Master_Log_File: binlog.000078
- Slave_IO_Running: Yes
- Slave_SQL_Running: Yes
- Replicate_Do_DB:
- Replicate_Ignore_DB:
- Replicate_Do_Table:
- Replicate_Ignore_Table:
- Replicate_Wild_Do_Table:
- Replicate_Wild_Ignore_Table:
- Last_Errno: 0
- Last_Error:
- Skip_Counter: 0
- Exec_Master_Log_Pos: 949953874
- Relay_Log_Space: 953750683
- Until_Condition: None
- Until_Log_File:
- Until_Log_Pos: 0
- Master_SSL_Allowed: No
- Master_SSL_CA_File:
- Master_SSL_CA_Path:
- Master_SSL_Cert:
- Master_SSL_Cipher:
- Master_SSL_Key:
- Seconds_Behind_Master: 1345
- Master_SSL_Verify_Server_Cert: No
- Last_IO_Errno: 0
- Last_IO_Error:
- Last_SQL_Errno: 0
- Last_SQL_Error:
- Replicate_Ignore_Server_Ids:
- Master_Server_Id: 1000403307
- Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a
- Master_Info_File: mysql.slave_master_info
- SQL_Delay: 0
- SQL_Remaining_Delay: NULL
- Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
- Master_Retry_Count: 86400
- Master_Bind:
- Last_IO_Error_Timestamp:
- Last_SQL_Error_Timestamp:
- Master_SSL_Crl:
- Master_SSL_Crlpath:
- Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235
- Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163234
- Auto_Position: 1
- Replicate_Rewrite_DB:
- Channel_Name:
- Master_TLS_Version:
- Master_public_key_path:
- Get_master_public_key: 0
- Network_Namespace:
- 1 row in set, 1 warning (0.00 sec)
- greatsql> SHOW GLOBAL VARIABLES LIKE 'read_only';
- +---------------+-------+
- | Variable_name | Value |
- +---------------+-------+
- | read_only | OFF |
- +---------------+-------+
- 1 row in set (0.01 sec)
复制代码 四、思索
01、如何保持主从划一性
02、正在回放数据时,STOP SLAVE 是否会有问题
03、如果主从布局作为数据节点,上层还有计算节点该如何保持元数据的划一性即计算节点记载的表的布局和数据节点的表布局要划一
关于以上三个问题我的处理方案如下:
1、如何保持主从划一性- greatsql> SHOW PROCESSLIST;
- +--------+-----------------+----------------------+------+---------+---------+----------------------------------------------------------+------------------+------------+-----------+---------------+
- | Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined |
- +--------+-----------------+----------------------+------+---------+---------+----------------------------------------------------------+------------------+------------+-----------+---------------+
- | 5 | event_scheduler | localhost | NULL | Daemon | 2369454 | Waiting on empty queue | NULL | 2369454039 | 0 | 0 |
- | 17 | system user | connecting host | NULL | Connect | 2369447 | Waiting for source to send event | NULL | 2369446555 | 0 | 0 |
- | 543448 | greatsql | 172.17.136.93:48298 | NULL | Sleep | 110 | | NULL | 109832 | 0 | 0 |
- | 543500 | greatsql | 172.17.139.230:35198 | NULL | Sleep | 586 | | NULL | 585885 | 0 | 0 |
- | 543588 | greatsql | 172.17.136.93:57948 | NULL | Sleep | 50 | | NULL | 49878 | 0 | 0 |
- | 543663 | greatsql | 172.17.139.230:35726 | NULL | Query | 0 | init | show processlist | 0 | 0 | 0 |
- | 543666 | greatsql | 172.17.136.93:58908 | NULL | Sleep | 290 | | NULL | 289861 | 0 | 0 |
- | 543708 | system user | | NULL | Query | 477 | Replica has read all relay log; waiting for more updates | NULL | 476520 | 0 | 0 |
- | 543709 | system user | | qj | Query | 1383 | Applying batch of row changes (delete) | NULL | 26102 | 0 | 0 |
- | 543710 | system user | | NULL | Connect | 477 | Waiting for an event from Coordinator | NULL | 476570 | 0 | 0 |
- | 543711 | system user | | NULL | Connect | 477 | Waiting for an event from Coordinator | NULL | 476569 | 0 | 0 |
- | 543712 | system user | | NULL | Connect | 477 | Waiting for an event from Coordinator | NULL | 476568 | 0 | 0 |
- | 543714 | greatsql | 172.17.136.93:33582 | NULL | Sleep | 470 | | NULL | 469798 | 0 | 0 |
- +--------+-----------------+----------------------+------+---------+---------+----------------------------------------------------------+------------------+------------+-----------+---------------+
- 13 rows in set (0.00 sec)
- greatsql> KILL 543709;
- Query OK, 0 rows affected (0.00 sec)
- greatsql> SHOW PROCESSLIST;
- +--------+-----------------+----------------------+------+---------+---------+----------------------------------+------------------+------------+-----------+---------------+
- | Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined |
- +--------+-----------------+----------------------+------+---------+---------+----------------------------------+------------------+------------+-----------+---------------+
- | 5 | event_scheduler | localhost | NULL | Daemon | 2369473 | Waiting on empty queue | NULL | 2369473241 | 0 | 0 |
- | 17 | system user | connecting host | NULL | Connect | 2369466 | Waiting for source to send event | NULL | 2369465757 | 0 | 0 |
- | 543448 | greatsql | 172.17.136.93:48298 | NULL | Sleep | 129 | | NULL | 129034 | 0 | 0 |
- | 543500 | greatsql | 172.17.139.230:35198 | NULL | Sleep | 605 | | NULL | 605087 | 0 | 0 |
- | 543588 | greatsql | 172.17.136.93:57948 | NULL | Sleep | 9 | | NULL | 9132 | 0 | 0 |
- | 543663 | greatsql | 172.17.139.230:35726 | NULL | Query | 0 | init | show processlist | 0 | 0 | 0 |
- | 543666 | greatsql | 172.17.136.93:58908 | NULL | Sleep | 309 | | NULL | 309064 | 0 | 0 |
- | 543714 | greatsql | 172.17.136.93:33582 | NULL | Sleep | 489 | | NULL | 489000 | 0 | 0 |
- +--------+-----------------+----------------------+------+---------+---------+----------------------------------+------------------+------------+-----------+---------------+
- 8 rows in set (0.00 sec)
- greatsql> SHOW SLAVE STATUS \G
- *************************** 1. row ***************************
- Slave_IO_State: Waiting for source to send event
- Master_Host: 172.17.139.230
- Master_User: greatsql
- Master_Port: 3307
- Connect_Retry: 60
- Master_Log_File: binlog.000078
- Read_Master_Log_Pos: 953750186
- Relay_Log_File: relaylog.000231
- Relay_Log_Pos: 949954084
- Relay_Master_Log_File: binlog.000078
- Slave_IO_Running: Yes
- Slave_SQL_Running: No
- Replicate_Do_DB:
- Replicate_Ignore_DB:
- Replicate_Do_Table:
- Replicate_Ignore_Table:
- Replicate_Wild_Do_Table:
- Replicate_Wild_Ignore_Table:
- Last_Errno: 1317
- Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d0963e9-85d9-11ef-80e1-00163e28e06a:163235' at master log binlog.000078, end_log_pos 950241265. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
- Skip_Counter: 0
- Exec_Master_Log_Pos: 949953874
- Relay_Log_Space: 953750683
- Until_Condition: None
- Until_Log_File:
- Until_Log_Pos: 0
- Master_SSL_Allowed: No
- Master_SSL_CA_File:
- Master_SSL_CA_Path:
- Master_SSL_Cert:
- Master_SSL_Cipher:
- Master_SSL_Key:
- Seconds_Behind_Master: NULL
- Master_SSL_Verify_Server_Cert: No
- Last_IO_Errno: 0
- Last_IO_Error:
- Last_SQL_Errno: 1317
- Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d0963e9-85d9-11ef-80e1-00163e28e06a:163235' at master log binlog.000078, end_log_pos 950241265. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
- Replicate_Ignore_Server_Ids:
- Master_Server_Id: 1000403307
- Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a
- Master_Info_File: mysql.slave_master_info
- SQL_Delay: 0
- SQL_Remaining_Delay: NULL
- Slave_SQL_Running_State:
- Master_Retry_Count: 86400
- Master_Bind:
- Last_IO_Error_Timestamp:
- Last_SQL_Error_Timestamp: 241105 19:06:59
- Master_SSL_Crl:
- Master_SSL_Crlpath:
- Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235
- Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163234
- Auto_Position: 1
- Replicate_Rewrite_DB:
- Channel_Name:
- Master_TLS_Version:
- Master_public_key_path:
- Get_master_public_key: 0
- Network_Namespace:
- 1 row in set, 1 warning (0.00 sec)
- greatsql> SET sql_log_bin=0;
- Query OK, 0 rows affected (0.00 sec)
- greatsql> ALTER TABLE qj.sbtest1 ADD index tmp_k(k);
- Query OK, 0 rows affected (4 min 51.92 sec)
- Records: 0 Duplicates: 0 Warnings: 0
- greatsql> START SLAVE;
- Query OK, 0 rows affected, 1 warning (0.02 sec)
- greatsql> SHOW SLAVE STATUS \G
- *************************** 1. row ***************************
- Slave_IO_State: Waiting for source to send event
- Master_Host: 172.17.139.230
- Master_User: greatsql
- Master_Port: 3307
- Connect_Retry: 60
- Master_Log_File: binlog.000078
- Read_Master_Log_Pos: 953750186
- Relay_Log_File: relaylog.000231
- Relay_Log_Pos: 953750396
- Relay_Master_Log_File: binlog.000078
- Slave_IO_Running: Yes
- Slave_SQL_Running: Yes
- Replicate_Do_DB:
- Replicate_Ignore_DB:
- Replicate_Do_Table:
- Replicate_Ignore_Table:
- Replicate_Wild_Do_Table:
- Replicate_Wild_Ignore_Table:
- Last_Errno: 0
- Last_Error:
- Skip_Counter: 0
- Exec_Master_Log_Pos: 953750186
- Relay_Log_Space: 953750683
- Until_Condition: None
- Until_Log_File:
- Until_Log_Pos: 0
- Master_SSL_Allowed: No
- Master_SSL_CA_File:
- Master_SSL_CA_Path:
- Master_SSL_Cert:
- Master_SSL_Cipher:
- Master_SSL_Key:
- Seconds_Behind_Master: 0
- Master_SSL_Verify_Server_Cert: No
- Last_IO_Errno: 0
- Last_IO_Error:
- Last_SQL_Errno: 0
- Last_SQL_Error:
- Replicate_Ignore_Server_Ids:
- Master_Server_Id: 1000403307
- Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a
- Master_Info_File: mysql.slave_master_info
- SQL_Delay: 0
- SQL_Remaining_Delay: NULL
- Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
- Master_Retry_Count: 86400
- Master_Bind:
- Last_IO_Error_Timestamp:
- Last_SQL_Error_Timestamp:
- Master_SSL_Crl:
- Master_SSL_Crlpath:
- Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235
- Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235
- Auto_Position: 1
- Replicate_Rewrite_DB:
- Channel_Name:
- Master_TLS_Version:
- Master_public_key_path:
- Get_master_public_key: 0
- Network_Namespace:
- 1 row in set, 1 warning (0.00 sec)
复制代码 2、正在回放数据时,STOP SLAVE是否会有问题- #为保持主从表结构一致,主节点添加索引
- greatsql> SET sql_log_bin=0;
- greatsql> ALTER TABLE qj.sbtest1 ADD index tmp_k(k);
复制代码 3、如果主从布局作为数据节点,上层还有计算节点该如何保持元数据的划一性即计算节点记载的表的布局和数据节点的表布局要划一- greatsql> STOP SLAVE --当该GTID回放很长时间时,stop可能会很长时间,我们可以采用 kill 线程方式结束回放
- greatsql> SHOW PROCESSLIST; --找到会话ID
- greatsql> KILL 会话号
- greatsql> SHOW SLAVE STATUS \G
- -- 添加索引即可
复制代码 五、总结
加快无索引表的回放重要是针对该表上进行 DELETE 或 UPDATE 操作时有用,而且我们操作的时候要注意是否能记载binlog日志,保证主从数据划一性,当从节点作为数据节点即存储节点的角色时,要关注计算节点即代理层元数据与存储节点元数据的划一性或者是符合计算节点关于该表的定义规则。
Enjoy GreatSQL
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。 |