GlusterFS cluster failure recovery

Take the simplest setup, a two-host replica volume, as the example. The installation steps are as follows: configure /etc/hosts and the hostnames, then install the Gluster service.
```shell
# OS: CentOS 7.4
cat > /etc/hosts <<EOF
192.168.0.198 master
192.168.0.199 slave
EOF
hostnamectl --static set-hostname master   # set-hostname slave on the other host
exec $SHELL
yum install centos-release-gluster -y
yum install -y glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma glusterfs-geo-replication glusterfs-devel
systemctl start glusterd && systemctl enable glusterd && systemctl status glusterd
```
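The nodes also need to reach each other on the Gluster ports. A minimal sketch, assuming firewalld is running; recent CentOS 7 firewalld ships a `glusterfs` service definition, and if yours lacks it, open glusterd's port 24007 and the brick port range instead:

```shell
# allow GlusterFS traffic between the nodes (assumes firewalld is active
# and provides the predefined "glusterfs" service)
firewall-cmd --permanent --add-service=glusterfs
firewall-cmd --reload
```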
Add the other node, then create and start the volume:

```shell
gluster peer probe slave
mkdir /data
gluster peer status
gluster volume create gfs-volume replica 2 transport tcp master:/data slave:/data force
ls -a /data/
gluster volume info
gluster volume start gfs-volume
gluster volume info gfs-volume
```
Mount the GlusterFS volume on the client hosts:

```shell
# append the entry (>> so the existing fstab is preserved)
cat >> /etc/fstab <<EOF
master:gfs-volume /opt glusterfs defaults,_netdev 0 0
EOF
mount -a
```
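A quick way to confirm replication is working, assuming the client mounted the volume at /opt as above (the file name `replica-check` is just an example):

```shell
# on the client: write a file through the mount point
echo hello > /opt/replica-check

# on master and on slave: the file should appear in each brick directory
ls -l /data/replica-check
```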
If the slave host goes down and cannot be brought back up, there are two ways to recover:

1. Build a new host with the same hostname and IP as the failed one.
2. Build a new host and replace the failed one with it.

Option 1: keep using 192.168.0.199 as the slave node. Install the GlusterFS packages on the new host first, but do not start the service yet.

Since slave is down, first find the GlusterFS UUID the slave host had. Check it on a healthy node, here the master:
```shell
# cat /var/lib/glusterd/peers/*
uuid=090f4559-e1a4-43ed-8a3d-2edd4042ce50
state=3
hostname1=slave
```
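For scripting this step, the UUID can be pulled out of the peer store with awk. A sketch: `PEER_DIR` is overridable so it can be tried against a copy of the files, and the `peer_uuid` function name is mine, not a gluster command:

```shell
# extract the uuid= value from the first peer file in the store
PEER_DIR=${PEER_DIR:-/var/lib/glusterd/peers}
peer_uuid() {
    awk -F= '/^uuid=/ { print $2; exit }' "$PEER_DIR"/* 2>/dev/null
}
# print it if the peer store exists on this host
if [ -d "$PEER_DIR" ]; then peer_uuid; fi
```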
On the rebuilt slave host, edit the GlusterFS configuration to use that UUID, then start the service:

```shell
# cat /var/lib/glusterd/glusterd.info
UUID=090f4559-e1a4-43ed-8a3d-2edd4042ce50
operating-version=1
# systemctl start glusterd
```
Join the cluster and check the status; if it is not OK, restart glusterd a few times:

```shell
gluster peer status
gluster peer probe master
gluster peer status
systemctl restart glusterd
```
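The "restart a few times" step can be automated. A sketch that retries until `gluster peer status` reports the peer as in the cluster:

```shell
# retry until the peer reaches "Peer in Cluster" state
until gluster peer status | grep -q 'Peer in Cluster'; do
    systemctl restart glusterd
    sleep 5
done
```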
Sync the volume configuration from the master:

```shell
gluster volume info
gluster volume sync master all
# the operating-version value will have changed:
cat /var/lib/glusterd/glusterd.info
```
Check the heal status:

```shell
gluster volume heal gfs-volume info
```
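To wait for the heal to finish rather than polling by hand, a sketch that loops until every brick reports zero pending entries:

```shell
# block until "Number of entries:" is 0 on all bricks
while gluster volume heal gfs-volume info | grep -q 'Number of entries: [1-9]'; do
    sleep 10
done
```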
Option 2: replace the failed host.

Create a new host, for example with hostname three and IP 192.168.0.200. On a healthy GlusterFS host, add the new host and replace the failed host's brick:

```shell
gluster peer probe three
gluster volume replace-brick gfs-volume slave:/data three:/data commit force
```
Check the heal status:

```shell
gluster volume heal gfs-volume info
```
Because the volume type here is replica, the recovery ends there. For a Distributed volume, a rebalance is also needed:

```shell
gluster volume rebalance gfs-volume fix-layout start
```
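The rebalance runs asynchronously; its progress can be checked with:

```shell
gluster volume rebalance gfs-volume status
```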
Appendix:

Remove a GlusterFS node (remove a brick):

```shell
gluster volume remove-brick gfs-volume replica 1 slave:/data start
```
This removes the slave host's data from the volume, keeping only one copy, the one on master.

Check the removal status:

```shell
gluster volume remove-brick gfs-volume replica 1 slave:/data status
```
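Per the volume-management docs, once the status shows the operation has completed, the removal still has to be committed. A sketch:

```shell
gluster volume remove-brick gfs-volume replica 1 slave:/data commit
```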
Add a replica back:

```shell
gluster volume add-brick gfs-volume replica 2 slave:/data
```
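After adding the brick back, the new copy starts empty; triggering a full self-heal repopulates it (a sketch):

```shell
# walk the whole volume and heal the new brick, then check progress
gluster volume heal gfs-volume full
gluster volume heal gfs-volume info
```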
docs:
Managing GlusterFS Volumes
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/
GlusterFS Architecture
https://docs.gluster.org/en/latest/Quick-Start-Guide/Architecture/
- Original work; reprints are welcome, please credit the source with a hyperlink: http://bbotte.com/server-config/glusterfs-cluster-failure-recovery/
- Article info: posted by bbotte at linux工匠 (ops automation | Python development | Linux HA clusters | database maintenance | performance tuning | system architecture)