KingbbaseES V8R6集群维护案例之---集群之间数据迁移

打印 上一主题 下一主题

主题 915|帖子 915|积分 2749

案例说明:
生产环境是集群环境,测试环境是集群,现需要将生产环境的数据迁移到测试集群中运行,本文档详细介绍了从集群环境迁移数据的操作步骤,可以作为生产环境迁移数据的参考。
适用版本:
    KingbaseES  V8R6
本案例数据库版本(集群使用相同的版本):
  1. test=# select version();
  2.                                                        version
  3. ----------------------------------------------------------------------------------------------------------------------
  4. KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
  5. (1 row)
复制代码
生产集群节点信息:
  1. [kingbase@node1 bin]$ ./repmgr cluster show
  2. ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                   
  3. ----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
  4. 1  | node1 | primary | * running |          | default  | 100      | 1        | host=192.168.1.201 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
  5. 2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
复制代码
测试集群节点信息:
  1. [kingbase@node101 bin]$ ./repmgr cluster show
  2. ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                 
  3. ----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
  4. 1  | node101 | primary | * running |          | default  | 100      | 13       | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
  5. 2  | node102 | standby |   running | node101  | default  | 100      | 13       | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
复制代码
一、生产环境迁移数据前的准备
1、生产环境数据信息
  1. prod=# \l
  2.                              List of databases
  3.    Name    | Owner  | Encoding | Collate  |    Ctype    | Access privileges
  4. -----------+--------+----------+----------+-------------+-------------------
  5. esrep     | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  6. prod      | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  7. prod1     | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  8. prod2     | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  9. security  | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  10. template0 | system | UTF8     | ci_x_icu | zh_CN.UTF-8 | =c/system        +
  11.            |        |          |          |             | system=CTc/system
  12. template1 | system | UTF8     | ci_x_icu | zh_CN.UTF-8 | =c/system        +
  13.            |        |          |          |             | system=CTc/system
  14. test      | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  15. (8 rows)
  16. prod=# select count(*) from t2;
  17. count
  18. --------
  19. 100000
  20. (1 row)
复制代码
2、关闭生产集群
  1. [kingbase@node1 bin]$ ./sys_monitor.sh stop
  2. 2022-06-20 16:00:46 Ready to stop all DB ...
  3. .......
  4. 2022-06-20 16:01:02 DB on "[192.168.1.202]" stop success.
  5. 2022-06-20 16:01:02 begin to stop DB on "[192.168.1.101]".
  6. waiting for server to shut down....... done
  7. server stopped
  8. 2022-06-20 16:01:06 DB on "[192.168.1.201]" stop success.
  9. 2022-06-20 16:01:06 Done.
复制代码
二、迁移生产数据到测试环境
Tips:
1)将生产数据迁移到集群,需要停止生产数据库服务,根据data目录数据的大小,要估算停机窗口时间。
2)在生产数据库前,建议手工创建检查点,如果wal日志比较大,建议备份后,清理wal日志,只需要保留最近一天的日志到最近检查点后即可。
3)需要跨主机将生产库主库data目录拷贝到集群的主备库节点,需                                                                                                                                                                           根据网络带宽和节点数,估算整个拷贝时间。
1、关闭测试集群
  1. [kingbase@node101 bin]$ ./sys_monitor.sh stop
  2. 2022-06-20 16:10:46 Ready to stop all DB ...
  3. 2022-06-20 16:11:02 DB on "[192.168.1.102]" stop success.
  4. 2022-06-20 16:11:02 begin to stop DB on "[192.168.1.101]".
  5. waiting for server to shut down....... done
  6. server stopped
  7. 2022-06-20 16:11:06 DB on "[192.168.1.101]" stop success.
  8. 2022-06-20 16:11:06 Done.
复制代码
2、将测试库数据备份
[kingbase@node101 kingbase]$ mv data data.bk
3、拷贝生产集群主库data到测试集群主备库(所有节点)
1)拷贝生产数据到测试库
  1. [kingbase@node1 kingbase]$ scp -r data node101:/home/kingbase/cluster/R6HA/kha/kingbase/
  2. [kingbase@node1 kingbase]$ scp -r data node102:/home/kingbase/cluster/R6HA/kha/kingbase/
复制代码
2)备库创建standby.signal
[kingbase@node102 data]$ touch standby.signal
3)复制测试集群的数据库配置文件:(所有节点)
  1. [kingbase@node101 data]$ cp ../data.bk/kingbase.auto.conf ./
  2. [kingbase@node101 data]$ cp ../data.bk/kingbase.conf ./
  3. [kingbase@node101 data]$ cp ../data.bk/es_rep.conf ./
  4. # 查看kingbase.auto.conf
  5. [kingbase@node101 data]$ cat kingbase.auto.conf
  6. # Do not edit this file manually!
  7. # It will be overwritten by the ALTER SYSTEM command.
  8. enable_upper_colname = 'on'
  9. wal_retrieve_retry_interval = '5000'
  10. primary_conninfo = 'user=system connect_timeout=10 host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
  11. recovery_target_timeline = 'latest'
  12. primary_slot_name = 'repmgr_slot_1'
  13. synchronous_standby_names = ''
复制代码
三、重新注册集群节点
Tips:
因为data数据中存储的是原生产集群的节点信息(esrep库),所以要根据测试库的repmgr.conf文件重新注册节点。
1、启动主备库数据库服务
  1. [kingbase@node101 bin]$ ./sys_ctl restart -D ../data
  2. waiting for server to shut down.... done
  3. .......
  4. server started
复制代码
2、查看节点状态信息
  1. [kingbase@node101 bin]$ ./repmgr cluster show
  2. ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                              
  3. ----+-------+---------+---------------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
  4. 1  | node1 | primary | ? unreachable |          | default  | 100      | ?        | host=192.168.1.201 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
  5. 2  | node2 | standby | - failed      | node1    | default  | 100      | ?        | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
  6. WARNING: following issues were detected
  7.   - unable to connect to node "node1" (ID: 1)
  8.   - node "node1" (ID: 1) is registered as an active primary but is unreachable
  9.   - unable to connect to node "node2" (ID: 2)
  10. # 如上所示,因为repmgr.conf和esrep库的注册信息不一致,现在集群节点处于非正常状态。
复制代码
3、重新注册主备库
1)注册主库
  1. [kingbase@node101 bin]$ ./repmgr primary register --force
  2. INFO: connecting to primary database...
  3. INFO: "repmgr" extension is already installed
  4. NOTICE: primary node record (ID: 1) updated
  5. [kingbase@node101 bin]$ ./repmgr cluster show
  6. ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                 
  7. ----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
  8. 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
  9. 2  | node2   | standby | - failed  | node101  | default  | 100      | ?        | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
  10. WARNING: following issues were detected
  11.   - unable to connect to node "node2" (ID: 2)
复制代码
2)注册备库
  1. [kingbase@node102 bin]$ ./repmgr standby register --force
  2. INFO: connecting to local node "node102" (ID: 2)
  3. INFO: connecting to primary database
  4. INFO: standby registration complete
  5. NOTICE: standby node "node102" (ID: 2) successfully registered
  6. [kingbase@node102 bin]$ ./repmgr cluster show
  7. ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                 
  8. ----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
  9. 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
  10. 2  | node102 | standby |   running | node101  | default  | 100      | 1        | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
  11. # 如上所示,节点注册完成后,集群节点状态正常。
复制代码
四、查看流复制状态
Tips:
如果生产集群和测试集群的流复制复制槽名称不一致,可能需要重建复制槽。
  1. # 复制槽信息
  2. test=# select * from sys_replication_slots;
  3.    slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
  4. ---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
  5. repmgr_slot_2 |        | physical  |        |          | f         | t      |      26620 |  963 |              | 0/4E000FB8  |
  6. repmgr_slot_1 |        | physical  |        |          | f         | f      |            |      |              |             |
  7. (2 rows)
  8. # 流复制状态信息
  9. test=# select * from sys_stat_replication;
  10.   pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
  11. -------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+----
  12. 26620 |       10 | system  | node102          | 192.168.1.102 |                 |       49454 | 2022-06-20 17:08:15.347515+08 |          | streaming | 0/4E000FB8 | 0/4E000FB8 | 0/4E000FB8 | 0/4E000FB8 |           |           |            |             1 | sync       | 2022-06-20 17:11:44.277666+08
  13. (1 row)
复制代码
五、验证数据
1、查看迁移后数据
  1. test=# \l
  2.                              List of databases
  3.    Name    | Owner  | Encoding | Collate  |    Ctype    | Access privileges
  4. -----------+--------+----------+----------+-------------+-------------------
  5. esrep     | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  6. prod      | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  7. prod1     | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  8. prod2     | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  9. security  | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  10. template0 | system | UTF8     | ci_x_icu | zh_CN.UTF-8 | =c/system        +
  11.            |        |          |          |             | system=CTc/system
  12. template1 | system | UTF8     | ci_x_icu | zh_CN.UTF-8 | =c/system        +
  13.            |        |          |          |             | system=CTc/system
  14. test      | system | UTF8     | ci_x_icu | zh_CN.UTF-8 |
  15. (8 rows)
  16. prod=# select count(*) from t2;
  17. count
  18. --------
  19. 100000
  20. (1 row)
复制代码
2、重启集群
  1. [kingbase@node101 bin]$ ./sys_monitor.sh restart
  2. 2022-06-20 17:12:39 Ready to stop all DB ...
  3. .......
  4. 2022-06-20 17:13:14 repmgrd on "[192.168.1.102]" start success.
  5. ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
  6. ----+---------+---------+-----------+----------+---------+-------+---------+--------------------
  7. 1  | node101 | primary | * running |          | running | 28907 | no      | n/a
  8. 2  | node102 | standby |   running | node101  | running | 26280 | no      | 2 second(s) ago
  9. [2022-06-20 17:13:19] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log"
  10. [2022-06-20 17:13:22] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log"
  11. 2022-06-20 17:13:29 Done.
复制代码
六、总结
  1. 1、从集群环境迁移数据,如果需要保证数据一致,必须要将集群停库,对于生产环境,要考虑停机窗口。
  2. 2、如果需要将目标集群数据重新加载到新的集群,需要将目标集群数据做逻辑备份,但是在导入时如果有重复数据需注意处理。(如将测试数据再导入到迁移后的集群中,可能有许多数据会重复)。
  3. 3、申请停机窗口,要考虑源集群数据量的大小、主机间的网络带宽、集群节点数、集群配置时间、集群启动故障的处理时间等。
复制代码
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!
回复

使用道具 举报

0 个回复

正序浏览

快速回复

您需要登录后才可以回帖 登录 or 立即注册

本版积分规则

王國慶

金牌会员
这个人很懒什么都没写!

标签云

快速回复 返回顶部 返回列表