Hi~ This is ProXiao.
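I. Overview of repmgr
repmgr is an open-source suite of tools for enhancing and managing PostgreSQL's built-in replication and failover mechanisms. Its main functions are setting up standby servers, monitoring replication, and performing failover (automatically or manually) and switchover operations.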
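- Replication cluster: in repmgr, a "replication cluster" is a group of PostgreSQL servers connected by streaming replication. Data is replicated between these servers to keep them consistent and highly available.
- Node: a "node" is a single PostgreSQL server instance in the replication cluster. Each node acts as either a primary or a standby.
- Upstream node: for a standby server, the "upstream node" is the node it receives replication data from. This is usually the primary, but with cascading replication it may be another standby.
- Failover: a failover happens when the primary fails and a selected standby is promoted to become the new primary. The `repmgrd` daemon can be configured for automatic failover to minimize downtime.
- Switchover: a switchover is a controlled operation that deliberately moves the primary role to a standby. Unlike a failover, it is carried out while the primary is still healthy, for example during planned maintenance.
- Fencing: after a failover, a "fencing" strategy must be in place to prevent the former primary from unexpectedly rejoining the cluster and causing data conflicts (a situation known as split brain). Fencing keeps the former primary isolated from the rest of the cluster.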
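- Witness server: repmgr supports an optional "witness server", a PostgreSQL instance that does not take part in replication but holds a copy of the repmgr cluster metadata. During a failover it helps determine whether the primary is really down or merely unreachable because of a network split (for example between two data centers), so that standbys in another location do not promote themselves by mistake. This helps keep the cluster consistent and stable.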
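II. Components
repmgr consists of two main components:
- repmgr: a command-line tool for administrative tasks such as:
  - setting up standby servers
  - promoting a standby to be the new primary
  - switching over between the primary and a standby
  - displaying the status of the servers in the replication cluster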
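- repmgrd: a daemon that actively monitors the replication cluster and performs tasks such as:
  - monitoring replication performance and recording the data
  - performing failover by detecting a primary failure and automatically promoting the most suitable standby
  - sending notifications about cluster events to user-defined scripts, which can then do things like sending e-mail alerts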
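repmgr passes event details to such a script through placeholders in the `event_notification_command` setting of repmgr.conf, for example `%n` (node ID), `%e` (event type), `%s` (1 for success, 0 for failure), `%t` (timestamp) and `%d` (details). The following is a minimal sketch of a handler, assuming a hypothetical script path `/usr/local/bin/repmgr_event.sh`; it only formats a one-line message and leaves the actual alert channel to you.

```shell
#!/usr/bin/env bash
# Hypothetical repmgr event notification handler (sketch).
# Assumed setting in repmgr.conf:
#   event_notification_command='/usr/local/bin/repmgr_event.sh %n %e %s "%t" "%d"'
# Arguments: node ID, event type, success flag (1/0), timestamp, details.

format_event() {
  local node_id=${1:-} event=${2:-} success=${3:-} ts=${4:-} details=${5:-}
  local status="FAILED"
  if [ "$success" = "1" ]; then
    status="OK"
  fi
  printf '[%s] node %s: %s %s - %s\n' "$ts" "$node_id" "$event" "$status" "$details"
}

# Run only when executed directly with arguments (not when sourced for testing).
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  # Replace this with a real alert channel, e.g. piping into mail(1).
  format_event "$@"
fi
```

Keeping the formatting in a function makes it easy to test the handler by sourcing the script without triggering an alert.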
III. Installation and Deployment
1. Environment
Notes
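- Deployment on Windows is not supported
- All nodes in a cluster must run the same PostgreSQL major version
- repmgr must be installed on every node in the cluster, and all nodes must use the same repmgr version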
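To verify the PostgreSQL version requirement before installing repmgr, you can compare the version each node reports. This is a sketch that assumes the version strings come from `SHOW server_version`; the helper names `pg_major` and `same_major` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that all nodes report the same PostgreSQL major version.

# Print the major version: "15.4" -> "15", "9.6.24" -> "9.6"
# (releases before 10 used the first two components as the major version).
pg_major() {
  local v=${1%% *}   # drop suffixes such as " (Debian 15.4-1.pgdg120+1)"
  case $v in
    9.*) echo "$v" | cut -d. -f1-2 ;;
    *)   echo "${v%%.*}" ;;
  esac
}

# Return 0 if every argument has the same major version, 1 otherwise.
same_major() {
  local first v
  first=$(pg_major "$1")
  shift
  for v in "$@"; do
    if [ "$(pg_major "$v")" != "$first" ]; then
      echo "major version mismatch: $first vs $(pg_major "$v")" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `same_major "$(psql -h ip1 -Atc 'SHOW server_version')" "$(psql -h ip2 -Atc 'SHOW server_version')"` returns a non-zero status if the two nodes differ.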
Compatibility between repmgr and PostgreSQL versions (see the repmgr GitHub repository or the documentation):
| repmgr version | Supported PostgreSQL versions |
| --- | --- |
| repmgr 5.4 | 9.4, 9.5, 9.6, 10, 11, 12, 13, 14, 15, 16 |
| repmgr 5.3 | 9.4, 9.5, 9.6, 10, 11, 12, 13, 14, 15 |
| repmgr 5.2 | 9.4, 9.5, 9.6, 10, 11, 12, 13 |
| repmgr 5.1 | 9.3, 9.4, 9.5, 9.6, 10, 11, 12 |
| repmgr 5.0 | 9.3, 9.4, 9.5, 9.6, 10, 11, 12 |
| repmgr 4.x | 9.3, 9.4, 9.5, 9.6, 10, 11 |
| repmgr 3.x | 9.3, 9.4, 9.5, 9.6 |
| repmgr 2.x | 9.0, 9.1, 9.2, 9.3, 9.4 |
2. Installation
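Note: PostgreSQL is assumed to be installed already (if not, it can be installed with yum in one go).
- # Edit the PostgreSQL configuration file postgresql.conf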
- listen_addresses = '*'
- wal_level = logical
- wal_log_hints = on
- # Restart PostgreSQL (listen_addresses, wal_level and wal_log_hints only take effect after a restart)
- systemctl restart postgresql-15
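Before restarting, it can help to double-check the settings repmgr depends on. The sketch below parses a postgresql.conf file; it assumes the `name = value` form, ignores `postgresql.auto.conf` (ALTER SYSTEM) overrides and `#` inside quoted values, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check repmgr-related settings in a postgresql.conf file.

# Print the value of the last uncommented assignment of key $2 in file $1
# (as in PostgreSQL, a later entry overrides an earlier one).
conf_get() {
  local file=$1 key=$2
  grep -E "^[[:space:]]*${key}[[:space:]]*=" "$file" | tail -n 1 |
    sed -E "s/^[^=]*=[[:space:]]*//; s/[[:space:]]*#.*$//; s/^'(.*)'$/\1/" || true
}

# Return 0 if wal_level and wal_log_hints look suitable, 1 otherwise.
check_repmgr_settings() {
  local file=$1 rc=0 wl hints
  wl=$(conf_get "$file" wal_level)
  wl=${wl:-replica}   # default value since PostgreSQL 10
  case $wl in
    replica|logical) ;;
    *) echo "wal_level must be replica or logical, got '$wl'"; rc=1 ;;
  esac
  hints=$(conf_get "$file" wal_log_hints)
  if [ "$hints" != "on" ]; then
    echo "wal_log_hints should be on (used by pg_rewind), got '${hints:-off}'"
    rc=1
  fi
  return $rc
}
```

Only the literal value `on` is accepted for wal_log_hints here, although PostgreSQL also accepts spellings such as `true`; this keeps the check strict and simple.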
- # Create the repmgr user and database (run in psql as the postgres superuser)
- create user repmgr with superuser password 'repmgr123';
- create database repmgr owner repmgr;
- # Configure client authentication in pg_hba.conf
- # Allow user repmgr to make replication connections via the local socket, 127.0.0.1 and the 10.248.32.0/24 subnet
- local replication repmgr trust
- host replication repmgr 127.0.0.1/32 trust
- host replication repmgr 10.248.32.0/24 trust
- # Allow user repmgr to connect to the repmgr database via the local socket, 127.0.0.1 and the 10.248.32.0/24 subnet
- local repmgr repmgr trust
- host repmgr repmgr 127.0.0.1/32 trust
- host repmgr repmgr 10.248.32.0/24 trust
- # Reload PostgreSQL (changes to pg_hba.conf only need a reload)
- systemctl reload postgresql-15
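With more nodes, the pg_hba.conf entries can be generated instead of typed by hand. This hypothetical helper prints the same kind of rules for a list of node IPs, using /32 per node; `trust` mirrors the example above, while `scram-sha-256` combined with the .pgpass file configured later is the safer choice.

```shell
#!/usr/bin/env bash
# Sketch: print pg_hba.conf rules for the repmgr user.
# Usage: hba_entries <user> <database> <auth_method> <node_ip>...
hba_entries() {
  local user=$1 db=$2 method=$3 ip target
  shift 3
  # One group of rules for replication connections, one for the repmgr database.
  for target in replication "$db"; do
    printf 'local\t%s\t%s\t\t%s\n' "$target" "$user" "$method"
    printf 'host\t%s\t%s\t%s\t%s\n' "$target" "$user" "127.0.0.1/32" "$method"
    for ip in "$@"; do
      printf 'host\t%s\t%s\t%s/32\t%s\n' "$target" "$user" "$ip" "$method"
    done
  done
}
```

For example, `hba_entries repmgr repmgr trust 10.248.32.187 10.248.32.188` prints eight rules. pg_hba.conf is evaluated top to bottom and the first matching line wins, so the rules must appear before any broader `reject` line; reload PostgreSQL afterwards.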
- # On any one node, as the postgres user, generate a key pair (just press Enter at every prompt)
- ssh-keygen -t rsa -b 4096
- Generating public/private rsa key pair.
- Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):
- Created directory '/var/lib/postgresql/.ssh'.
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa.
- Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub.
- The key fingerprint is:
- SHA256:fokF65XAW82Z8xI1SJuPlmCKnEuchkj6uder8nVp+c4 postgres@cda1-032187-test-tb-postgresql-goodscenter
- The key's randomart image is:
- +---[RSA 4096]----+
- | ...o |
- | . o.* . |
- | . + + X |
- | o . + + O o B |
- |. . . O S + = o |
- | . . o + * o . |
- | o .o O o |
- | .....o + |
- | .+o... .E |
- +----[SHA256]-----+
- cat /var/lib/postgresql/.ssh/id_rsa.pub >/var/lib/postgresql/.ssh/authorized_keys
- # Copy the .ssh directory (key pair and authorized_keys) to the other node(s)
- scp -r /var/lib/postgresql/.ssh/ root@other-ip:/var/lib/postgresql/
- # On the other node(s), fix the permissions and ownership
- chmod 0700 /var/lib/postgresql/.ssh/
- chmod 0600 /var/lib/postgresql/.ssh/*
- chown postgres:postgres /var/lib/postgresql/.ssh/ -R
- # On all nodes, create ~/.pgpass for the postgres user (format: hostname:port:database:username:password)
- ip1:5432:repmgr:repmgr:repmgr123
- ip2:5432:repmgr:repmgr:repmgr123
- chmod 0600 ~/.pgpass
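In .pgpass, a `:` or `\` inside a field must be escaped with a backslash, which matters if a password contains either character. The sketch below builds escaped lines; `pgpass_line` is a hypothetical helper, and the host names and password are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build escaped ~/.pgpass lines.

# Escape backslashes first, then colons, so added backslashes are not doubled.
escape_pgpass() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/:/\\:/g'
}

# Usage: pgpass_line <host> <port> <database> <user> <password>
pgpass_line() {
  if [ "$#" -ne 5 ]; then
    echo "usage: pgpass_line host port database user password" >&2
    return 1
  fi
  local out="" sep="" field
  for field in "$@"; do
    out+="$sep$(escape_pgpass "$field")"
    sep=":"
  done
  printf '%s\n' "$out"
}
```

For example, generate one line per node for the `repmgr` database and one for `replication` (the database name that matches physical replication connections such as pg_basebackup), redirect the output into `~/.pgpass`, then run `chmod 0600 ~/.pgpass`; libpq ignores the file if group or others can access it.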
- # Register the primary node (IP1)
- cat /etc/repmgr.conf
- node_id=****
- node_name='****'
- conninfo='host=**** port=**** user=**** dbname=**** connect_timeout=****'
- data_directory='/pgdata/'
- ssh_options='-q -o ConnectTimeout=10'
- # Change the file owner
- chown postgres:postgres /etc/repmgr.conf
- # Switch to the postgres user and register the primary node
- su - postgres
- repmgr -f /etc/repmgr.conf primary register
- INFO: connecting to primary database...
- NOTICE: attempting to install extension "repmgr"
- NOTICE: "repmgr" extension successfully installed
- NOTICE: primary node record (ID: 1) registered
- # Verify the cluster
- repmgr -f /etc/repmgr.conf cluster show
- ID| Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
- ----+---------------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------
- **** | **** | primary | * running | | default | 100 | 1 | host=**** port=**** user=**** dbname=**** connect_timeout=****
- # The corresponding record in the repmgr metadata table (psql, connected to the repmgr database)
- repmgr=# select * from nodes;
- -[ RECORD 1 ]----+-------------------------------------------------------------------------
- node_id | ****
- upstream_node_id |
- active | t
- node_name | ****
- type | primary
- location | default
- priority | 100
- conninfo | host=**** port=**** user=**** dbname=**** connect_timeout=****
- slot_name |
- config_file | /etc/repmgr.conf
- # Write test data on pg1 (the primary)
- psql -c "create database demo01;"
- pgbench -i -s 20 demo01
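Each node's repmgr.conf differs only in the node-specific values (node_id, node_name and the host in conninfo), so the file can be generated from a template. This is a minimal sketch; `gen_repmgr_conf` is hypothetical, and the defaults (port 5432, user and database `repmgr`, `connect_timeout=2`, `/pgdata/`) are assumptions to replace with your own values.

```shell
#!/usr/bin/env bash
# Sketch: print a minimal repmgr.conf for one node.
# Usage: gen_repmgr_conf <node_id> <node_name> <host> [port] [data_dir]
gen_repmgr_conf() {
  local id=$1 name=$2 host=$3 port=${4:-5432} datadir=${5:-/pgdata/}
  # repmgr expects node_id to be a positive integer.
  case $id in
    ''|*[!0-9]*|0) echo "node_id must be a positive integer" >&2; return 1 ;;
  esac
  cat <<EOF
node_id=$id
node_name='$name'
conninfo='host=$host port=$port user=repmgr dbname=repmgr connect_timeout=2'
data_directory='$datadir'
ssh_options='-q -o ConnectTimeout=10'
EOF
}
```

For example, as root: `gen_repmgr_conf 1 ip1 <IP1> > /etc/repmgr.conf && chown postgres:postgres /etc/repmgr.conf`.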
- # Register the standby node (IP2)
- cat /etc/repmgr.conf
- node_id=****
- node_name='****'
- conninfo='host=**** port=**** user=**** dbname=**** connect_timeout=****'
- data_directory='/pgdata/'
- ssh_options='-q -o ConnectTimeout=10'
- # Change the file owner
- chown postgres:postgres /etc/repmgr.conf
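- # Use --dry-run to check whether the standby can be cloned. The main checks are:
- #   the data directory
- #   whether enough WAL senders (max_wal_senders) are available on the source node; 2 are required
- #   the wal_log_hints setting
- # If the checks pass, the actual clone runs pg_basebackup -l "repmgr base backup"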
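- systemctl stop postgresql-15
- # PostgreSQL must be stopped before running the steps below (if the data directory is not empty, add --force)
- repmgr -h ip -U user -d database -f /etc/repmgr.conf standby clone --dry-run
- # If the dry run reports no problems, run the actual clone (output below)
- repmgr -h ip -U user -d database -f /etc/repmgr.conf standby clone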
- NOTICE: destination directory "/pgdata" provided
- INFO: connecting to source node
- DETAIL: connection string is: host=**** port=**** user=**** dbname=****
- DETAIL: current installation size is 337 MB
- INFO: replication slot usage not requested; no replication slot will be set up for this standby
- NOTICE: checking for available walsenders on the source node (2 required)
- NOTICE: checking replication connections can be made to the source server (2 required)
- INFO: checking and correcting permissions on existing directory "/pgdata"
- NOTICE: starting backup (using pg_basebackup)...
- HINT: this may take some time; consider using the -c/--fast-checkpoint option
- INFO: executing:
- pg_basebackup -l "repmgr base backup" -D /pgdata -h ip -p port -U user -X stream
- NOTICE: standby clone (using pg_basebackup) complete
- NOTICE: you can now start your PostgreSQL server
- HINT: for example: pg_ctl -D /pgdata start
- HINT: after starting the server, you need to register this standby with "repmgr standby register"
- # Start the standby
- systemctl start postgresql-15
- # Register the standby node
- repmgr -f /etc/repmgr.conf standby register
- INFO: connecting to local node "ip2" (ID: 2)
- INFO: connecting to primary database
- WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
- INFO: standby registration complete
- NOTICE: standby node "ip2" (ID: 2) successfully registered
- # Show the cluster status
- repmgr -f /etc/repmgr.conf cluster show
- ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
- ----+---------------+---------+-----------+---------------+----------+----------+----------+--------------------------------------------------------------------------
- 1 | ip1 | primary | * running | | default | 100 | 1 | host=**** port=**** user=**** dbname=**** connect_timeout=****
- 2 | ip2 | standby | running | | default | 100 | 1 | host=**** port=**** user=**** dbname=**** connect_timeout=****
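- # Switchover: run this on the standby that should become the new primary (it relies on the passwordless SSH configured earlier)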
- repmgr -f /etc/repmgr.conf standby switchover
- repmgr -f /etc/repmgr.conf cluster show
- ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
- ----+---------------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------
- 1 | ip1 | standby | running | | default | 100 | 3 | host=**** port=**** user=**** dbname=**** connect_timeout=****
- 2 | ip2 | primary | * running | | default | 100 | 4 | host=**** port=**** user=**** dbname=**** connect_timeout=****
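The clone step and the final check can be scripted. The sketch below runs `standby clone` only when the `--dry-run` check succeeds, and reads the current primary's name from `cluster show` output; the function names are hypothetical, and the parsing assumes the pipe-separated layout shown above (Name in the second column, Role in the third).

```shell
#!/usr/bin/env bash
# Sketch: guarded standby clone plus a cluster-show parser.

# Usage: clone_standby <primary_host> <user> <dbname> [config]
clone_standby() {
  local host=$1 user=$2 db=$3 conf=${4:-/etc/repmgr.conf}
  if ! repmgr -h "$host" -U "$user" -d "$db" -f "$conf" standby clone --dry-run; then
    echo "dry run failed; not cloning" >&2
    return 1
  fi
  repmgr -h "$host" -U "$user" -d "$db" -f "$conf" standby clone
}

# Read "repmgr cluster show" output on stdin and print the Name of every
# row whose Role column is "primary".
current_primary() {
  awk -F'|' '{
    role = $3; gsub(/ /, "", role)
    if (role == "primary") { name = $2; gsub(/ /, "", name); print name }
  }'
}
```

After the switchover shown above, `repmgr -f /etc/repmgr.conf cluster show | current_primary` would be expected to print `ip2`. The parser does not look at the Status column, so a failed former primary that is still listed as primary would also be printed.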
IV. Summary
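After the steps above, repmgr manages a PostgreSQL cluster with one primary and one standby, but some items remain open:
- Automatic failover (detecting a failed PostgreSQL node and promoting a standby automatically) is not yet in place
- Procedures for adding, removing and replacing nodes
- Separate entry points for the primary and the standby (since these are cloud servers, the plan is to use an SLB rather than adding the complexity of pgpool)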
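For the first open item, automatic failover is handled by repmgrd. A minimal sketch of the required repmgr.conf additions follows; `failover`, `promote_command` and `follow_command` are repmgr parameters, whereas the binary path `/usr/pgsql-15/bin/repmgr` (PGDG RPM layout) and the helper name are assumptions. repmgrd also needs `shared_preload_libraries = 'repmgr'` in postgresql.conf, which takes effect after a restart.

```shell
#!/usr/bin/env bash
# Sketch: append repmgrd automatic-failover settings to a repmgr.conf file.
# Usage: add_failover_settings <repmgr.conf> [repmgr_binary]
add_failover_settings() {
  local conf=$1 bin=${2:-/usr/pgsql-15/bin/repmgr}   # assumed binary path
  if grep -q '^[[:space:]]*failover[[:space:]]*=' "$conf"; then
    echo "failover already configured in $conf" >&2
    return 1
  fi
  # %n in follow_command is replaced by repmgrd with the new primary's node ID.
  cat >> "$conf" <<EOF
failover='automatic'
promote_command='$bin standby promote -f $conf --log-to-file'
follow_command='$bin standby follow -f $conf --log-to-file --upstream-node-id=%n'
EOF
}
```

After adding the settings on every node, the daemon can be started with `repmgrd -f /etc/repmgr.conf`.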
Feel free to leave a comment and follow ProXiao.