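Introduction

Early distributed Redis deployment options:

	1) Client-side partitioning: the client program decides which Redis node each key is written to, so the client itself must handle key distribution, high availability, failover and so on.

	2) Proxy-based: a third-party Redis proxy sits between clients and servers. Clients connect to the proxy, which distributes keys across the Redis nodes. This is simple for clients, but adding or removing nodes is cumbersome, and the proxy itself is a single point of failure and a performance bottleneck.

Sentinel solves Redis high availability: when the master fails, a replica is automatically promoted to master so the service keeps working. It does not remove the single-machine write bottleneck, though: one Redis instance's write performance is limited by that machine's memory, concurrency, NIC bandwidth and similar factors. Redis therefore introduced Redis Cluster, a decentralized architecture, in version 3.0. Every node in the cluster stores its own data plus the state of the whole cluster and is connected to every other node. Its characteristics:

	1: All Redis nodes are interconnected and check each other with PING messages.
	2: A node counts as truly failed only when more than half of the master nodes in the cluster detect the failure.
	3: Clients connect to Redis directly, without a proxy; the application is given the addresses of all the Redis servers.
	4: Redis Cluster maps every Redis node onto slots 0-16383, and each read or write must go to the node that owns the relevant slot, so write capacity grows roughly with the number of master nodes.
	5: Redis Cluster pre-allocates 16384 slots. When a key-value pair is written, CRC16(key) mod 16384 decides which slot, and therefore which Redis node, the key goes to. This removes the single-machine bottleneck.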
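The slot rule in item 5 is easy to reproduce. Below is a minimal Python sketch, assuming the CRC16-CCITT (XMODEM) variant that Redis Cluster uses (polynomial 0x1021, initial value 0). `crc16_xmodem` and `key_slot` are hypothetical helper names, and hash tags (`{...}` inside key names), which Redis also honours, are deliberately ignored.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str, num_slots: int = 16384) -> int:
    """Map a key to a hash slot as CRC16(key) mod 16384.

    Simplification: real Redis hashes only the {hash tag} part of a key
    when one is present; this sketch always hashes the whole key.
    """
    return crc16_xmodem(key.encode("utf-8")) % num_slots


if __name__ == "__main__":
    for k in ("k1", "k2"):
        print(k, key_slot(k))
```

For the keys used in the test section later, this gives slot 12706 for k1 and 449 for k2, the same slots the cluster reports when it redirects those writes.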

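Why use a cluster:

Master-replica: no automatic failover, and a single master carries the entire data load.

Sentinel: failover is automatic, but a single master still carries the entire load.

How it works

The cluster divides the key space into 16384 hash slots and spreads them evenly across the master nodes. Each key is assigned to a slot with CRC16(key) mod 16384 (a hash-slot scheme, not consistent hashing), and the slot determines which master stores it. Every master has at least one replica holding a copy of its data; if a master becomes unavailable, its replica automatically promotes itself to master and the cluster keeps serving requests.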

Cluster architecture

Basic architecture

Suppose the three masters are A, B and C. When the 16384 hash slots are distributed among them, each master is responsible for this slot range:

Node A: 0-5460
Node B: 5461-10922
Node C: 10923-16383
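The ranges above come from dividing 16384 by the number of masters and rounding each boundary. The sketch below is an assumption about how redis-cli rounds, not a copy of its source, but it reproduces these exact ranges. `split_slots` is a hypothetical name.

```python
import math

TOTAL_SLOTS = 16384


def split_slots(num_masters: int, total: int = TOTAL_SLOTS) -> list[tuple[int, int]]:
    """Split slots 0..total-1 into contiguous, nearly equal ranges, one per master."""
    per_master = total / num_masters            # fractional share, e.g. 5461.33 for 3 masters
    ranges = []
    first, cursor = 0, 0.0
    for i in range(num_masters):
        last = math.floor(cursor + per_master - 1 + 0.5)   # round half up (values are positive)
        if i == num_masters - 1:
            last = total - 1                    # the last master takes whatever remains
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return ranges


if __name__ == "__main__":
    print(split_slots(3))
```

With three masters this prints `[(0, 5460), (5461, 10922), (10923, 16383)]`, which is why node B ends up with one slot more (5462) than A and C.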

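Master-replica architecture

Redis Cluster solves the write-concurrency problem but introduces a new one: how is each master kept highly available? The answer is to give every master at least one replica on a different server; when a master fails, its replica is promoted and takes over the master's slots.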

Deployment

Environment

Environment A: three servers, each running two Redis instances on ports 6379 and 6380.

192.168.66.140:6379/6380
192.168.66.141:6379/6380
192.168.66.142:6379/6380

One more server is reserved for testing node addition.

192.168.66.143:6379/6380

Environment B: for production, six separate servers are recommended.

192.168.66.140:6379
192.168.66.141:6379
192.168.66.142:6379
192.168.66.143:6379
192.168.66.144:6379
192.168.66.145:6379
192.168.66.146:6379			#reserved server
192.168.66.147:6379			#reserved server
192.168.66.148:6379			#reserved server

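Prerequisites for creating the cluster

1. Every Redis node uses the same hardware configuration, the same password and the same Redis version.

2. Every node must enable these parameters:

cluster-enabled yes 				     #required; once enabled, the redis process shows [cluster] in ps output
cluster-config-file nodes-6380.conf 	  #created and maintained automatically by Redis Cluster; never edit it by hand

3. No Redis server may hold any data: delete everything under /apps/redis/data (rm -rf is enough).

4. Start each instance as a standalone Redis with no keys or values.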

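Edit the configuration file

vim /apps/redis/etc/redis-6379.conf   (likewise redis-6380.conf, with nodes-6380.conf)

cluster-enabled yes                     #enable cluster mode
cluster-config-file nodes-6379.conf     #internal cluster state file
cluster-node-timeout 15000              #node timeout, in milliseconds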

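Note: if an instance was previously set up as a replica for master-replica replication, its config file ends with a replicaof line pointing at its master; delete that line.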
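Because each server here runs two instances, every instance needs its own cluster-config-file. Redis locks that file, so two nodes cannot share it. A small Python sketch that renders the cluster directives per port, assuming the nodes-<port>.conf naming used above. `cluster_directives` is a hypothetical helper, and the `port` line is added only so that the two instances' fragments differ.

```python
def cluster_directives(port: int, node_timeout_ms: int = 15000) -> str:
    """Return the cluster-related redis.conf lines for the instance on `port`."""
    lines = [
        f"port {port}",
        "cluster-enabled yes",                      # run this instance in cluster mode
        f"cluster-config-file nodes-{port}.conf",   # one state file per instance, never shared
        f"cluster-node-timeout {node_timeout_ms}",  # in milliseconds
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    for port in (6379, 6380):
        print(f"# redis-{port}.conf")
        print(cluster_directives(port))
        print()
```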

Start Redis

Start every instance as a standalone Redis first: six instances in total, two per server.

sudo -u redis /apps/redis/bin/redis-server /apps/redis/etc/redis-6379.conf

root@zhao:/apps/redis/etc# sudo -u redis /apps/redis/bin/redis-server /apps/redis/etc/redis-6379.conf 
root@zhao:/apps/redis/etc# sudo -u redis /apps/redis/bin/redis-server /apps/redis/etc/redis-6380.conf
root@zhao:/apps/redis/etc# ps aux | grep redis
redis  1340  0.0  0.1  65340  5092 ?      Ssl  14:52   0:00 /apps/redis/bin/redis-server 192.168.66.140:6379 [cluster]
redis  1347  0.1  0.1  65340  5256 ?      Ssl  14:52   0:00 /apps/redis/bin/redis-server 192.168.66.140:6380 [cluster]
root   1353  0.0  0.0  14436  1060 pts/0  S+   14:53   0:00 grep --color=auto redis

Note: [cluster] in the ps output marks an instance running in cluster mode.

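Create the cluster

The cluster can be created from any one of the Redis nodes.

redis-cli -a <password> --cluster create <host:port> [<host:port> ...] --cluster-replicas 1

Notes: -a supplies the Redis password; omit it if Redis has none.

--cluster create: the host:port addresses of the nodes that will form the cluster

--cluster-replicas 1: number of replicas per master; 1 means one replica for each master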
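When the node list is generated by a script, building the command from a list avoids typos such as a missing space in `--cluster-replicas 1`. This is a minimal sketch: `cluster_create_args` is a hypothetical helper, and the node-count check reflects redis-cli's requirement of at least 3 masters, so with N replicas per master at least 3 × (N + 1) addresses are needed.

```python
import shlex


def cluster_create_args(nodes, replicas=1, password=None):
    """Build the argv for `redis-cli --cluster create`.

    redis-cli needs at least 3 masters, so with `replicas` replicas per master
    the node list must contain at least 3 * (replicas + 1) addresses.
    """
    needed = 3 * (replicas + 1)
    if len(nodes) < needed:
        raise ValueError(f"need at least {needed} nodes for --cluster-replicas {replicas}")
    args = ["redis-cli"]
    if password is not None:
        args += ["-a", password]                # only when the instances have a password
    args += ["--cluster", "create", *nodes, "--cluster-replicas", str(replicas)]
    return args


if __name__ == "__main__":
    nodes = [f"192.168.66.{h}:{p}" for h in (140, 141, 142) for p in (6379, 6380)]
    print(shlex.join(cluster_create_args(nodes)))
```

The printed command matches the one run below. The argv list can also be passed to `subprocess.run`, although `--cluster create` still asks for a `yes` confirmation interactively.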

root@zhao:/apps/redis/etc# redis-cli --cluster create 192.168.66.140:6379 192.168.66.140:6380 192.168.66.141:6379 192.168.66.141:6380 192.168.66.142:6379 192.168.66.142:6380 --cluster-replicas 1
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 192.168.66.141:6380 to 192.168.66.140:6379
Adding replica 192.168.66.142:6380 to 192.168.66.141:6379
Adding replica 192.168.66.140:6380 to 192.168.66.142:6379
M: 1165c1bf292e4222f28f974df13735c6d92f32f1 192.168.66.140:6379			#M = master
   slots:[0-5460] (5461 slots) master								  #first and last slot of this master's range
S: 9386d35562a4357a6a8190a52c7d23f86c647dd2 192.168.66.140:6380			#S = replica (slave)
   replicates 2f3a189d1e958ac06dc7b893b88d54f8a420c5f7
M: 9868445a43dedf5458627176b08c8973e8af1b38 192.168.66.141:6379
   slots:[5461-10922] (5462 slots) master
S: 88925e3df3c32748a1e05321f3d41f1b77eba6a1 192.168.66.141:6380
   replicates 1165c1bf292e4222f28f974df13735c6d92f32f1
M: 2f3a189d1e958ac06dc7b893b88d54f8a420c5f7 192.168.66.142:6379
   slots:[10923-16383] (5461 slots) master
S: 2c8edf98b01cadf9991da42ce6219a7d06f1be71 192.168.66.142:6380
   replicates 9868445a43dedf5458627176b08c8973e8af1b38
Can I set the above configuration? (type 'yes' to accept): yes			#enter yes to create the cluster
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node 192.168.66.140:6379)
M: 1165c1bf292e4222f28f974df13735c6d92f32f1 192.168.66.140:6379			#master ID and address
   slots:[0-5460] (5461 slots) master								  #assigned slots
   1 additional replica(s)											 #one replica attached
S: 2c8edf98b01cadf9991da42ce6219a7d06f1be71 192.168.66.142:6380
   slots: (0 slots) slave											 #replicas own no slots
   replicates 9868445a43dedf5458627176b08c8973e8af1b38
S: 88925e3df3c32748a1e05321f3d41f1b77eba6a1 192.168.66.141:6380
   slots: (0 slots) slave
   replicates 1165c1bf292e4222f28f974df13735c6d92f32f1
M: 2f3a189d1e958ac06dc7b893b88d54f8a420c5f7 192.168.66.142:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 9868445a43dedf5458627176b08c8973e8af1b38 192.168.66.141:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 9386d35562a4357a6a8190a52c7d23f86c647dd2 192.168.66.140:6380
   slots: (0 slots) slave
   replicates 2f3a189d1e958ac06dc7b893b88d54f8a420c5f7
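[OK] All nodes agree about slots configuration.					   #all nodes agree on the slot assignment
>>> Check for open slots...									     #check for slots still being migrated or imported
>>> Check slots coverage...									     #check that every slot is covered
[OK] All 16384 slots covered.								     #all 16384 slots are assigned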

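Check the cluster state

If the Redis instances have a password (requirepass) but masterauth is not set, the replicas cannot authenticate to their masters and replication is not established, even though the cluster itself is running. In that case set masterauth on each replica's console with config set, and also write it into each Redis config file. It is best to set it at the console first and then add it to the config file.

Log in to one of the masters

info replication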

root@zhao:/apps/redis/etc# redis-cli -h 192.168.66.140 -p 6379		#connect to Redis
192.168.66.140:6379>
192.168.66.140:6379> info replication
# Replication
role:master														   #node role
connected_slaves:1													#one replica connected
slave0:ip=192.168.66.141,port=6380,state=online,offset=560,lag=0		#replica details
master_failover_state:no-failover
master_replid:8ac9fa8466724c5f71af6c9bcaf0d1bfb0839a9c
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:560
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:560

Log in to one of the replicas

info replication

root@zhao:/apps/redis/etc# redis-cli -h 192.168.66.140 -p 6380
192.168.66.140:6380>
192.168.66.140:6380> info replication
# Replication
role:slave								#node role
master_host:192.168.66.142				  #master address
master_port:6379						 #master port
master_link_status:up					 #replication link status (up = connected)
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:826
slave_repl_offset:826
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:35c8ae618e61a419d921fd70c8be614dc64e3ab3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:826
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:826
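The two fields that matter on a replica, role and master_link_status, are easy to check from a script, for example when verifying every replica after setting masterauth. This is a minimal sketch. It assumes the INFO text has already been fetched (for instance with redis-cli) and is passed in as plain INFO output, without the explanatory # comments added to the listing above. `parse_info` and `replica_link_ok` are hypothetical names.

```python
def parse_info(text: str) -> dict:
    """Parse INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                            # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def replica_link_ok(info_text: str) -> bool:
    """True if the node is a replica whose link to its master is up."""
    info = parse_info(info_text)
    return info.get("role") == "slave" and info.get("master_link_status") == "up"


if __name__ == "__main__":
    sample = "# Replication\nrole:slave\nmaster_host:192.168.66.142\nmaster_port:6379\nmaster_link_status:up\n"
    print(replica_link_ok(sample))
```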

View cluster-wide information

cluster info

root@zhao:/apps/redis/etc# redis-cli -h 192.168.66.140 -p 6379
192.168.66.140:6379>
192.168.66.140:6379> cluster info
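cluster_state:ok					#cluster state
cluster_slots_assigned:16384		 #number of slots assigned
cluster_slots_ok:16384				#slots whose owner is healthy
cluster_slots_pfail:0				#slots served by nodes in PFAIL (possibly failing) state
cluster_slots_fail:0				#slots served by nodes in FAIL state
cluster_known_nodes:6				#number of known nodes
cluster_size:3					    #number of masters serving slots (three masters, three replicas here)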
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:859
cluster_stats_messages_pong_sent:882
cluster_stats_messages_sent:1741
cluster_stats_messages_ping_received:877
cluster_stats_messages_pong_received:859
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:1741
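For monitoring, the fields worth alerting on are cluster_state and the slot counters. A minimal sketch, assuming the CLUSTER INFO text is fetched elsewhere (for example with redis-cli) and passed in as a string. `cluster_info_problems` is a hypothetical helper; an empty result means the cluster looks healthy.

```python
EXPECTED_SLOTS = 16384


def cluster_info_problems(text: str) -> list:
    """Return a list of problems found in CLUSTER INFO output (empty list = healthy)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value.strip()
    problems = []
    if info.get("cluster_state") != "ok":
        problems.append(f"cluster_state is {info.get('cluster_state')!r}")
    for key in ("cluster_slots_assigned", "cluster_slots_ok"):
        if int(info.get(key, 0)) != EXPECTED_SLOTS:       # every slot must be assigned and healthy
            problems.append(f"{key} is {info.get(key)}, expected {EXPECTED_SLOTS}")
    for key in ("cluster_slots_pfail", "cluster_slots_fail"):
        if int(info.get(key, 0)) != 0:                     # any slot on a failing node is a problem
            problems.append(f"{key} is {info[key]}")
    return problems
```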

View the master/replica mapping

cluster nodes

root@zhao:/apps/redis/etc# redis-cli -h 192.168.66.140 -p 6379
192.168.66.140:6379>
192.168.66.140:6379> cluster nodes
2c8edf....f1be71 192.168.66.142:6380@16380 slave 986844....af1b38 0 1649230348678 3 connected
88925e....eba6a1 192.168.66.141:6380@16380 slave 1165c1....2f32f1 0 1649230350000 1 connected
2f3a18....0c5f7 192.168.66.142:6379@16379 master - 0 1649230349000 5 connected 10923-16383
986844....af1b38 192.168.66.141:6379@16379 master - 0 1649230351708 3 connected 5461-10922
1165c1....2f32f1 192.168.66.140:6379@16379 myself,master - 0 1649230349000 1 connected 0-5460
9386d3....647dd2 192.168.66.140:6380@16380 slave 2f3a18....20c5f7 0 1649230350699 5 connected

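Note: requests are routed by this slot-to-node mapping; a client is redirected automatically to the node that owns a key's slot.

Output format reference: https://cloud.tencent.com/developer/section/1374002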
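Each cluster nodes line has a fixed field layout: id, ip:port@cport, flags, master id, ping-sent, pong-recv, config-epoch, link-state, then the slots. Below is a minimal parser sketch, assuming that field order. `ClusterNode` and `parse_node_line` are hypothetical names, and entries in square brackets (slots in the middle of a migration) are simply skipped.

```python
from typing import List, NamedTuple, Optional, Tuple


class ClusterNode(NamedTuple):
    node_id: str
    address: str                      # ip:port (the @cluster-bus port is dropped)
    flags: List[str]                  # e.g. ["myself", "master"]
    master_id: Optional[str]          # None for masters ("-" in the output)
    link_state: str                   # "connected" or "disconnected"
    slots: List[Tuple[int, int]]      # owned slot ranges, inclusive


def parse_node_line(line: str) -> ClusterNode:
    """Parse one line of CLUSTER NODES output."""
    f = line.split()
    slots = []
    for item in f[8:]:
        if item.startswith("["):      # migration/import marker, skipped in this sketch
            continue
        start, _, end = item.partition("-")
        slots.append((int(start), int(end or start)))   # a single slot has no "-end"
    return ClusterNode(
        node_id=f[0],
        address=f[1].split("@")[0],
        flags=f[2].split(","),
        master_id=None if f[3] == "-" else f[3],
        link_state=f[7],
        slots=slots,
    )
```

Applied to the 192.168.66.140:6379 line above, this yields flags `["myself", "master"]`, no master id, and the single range `(0, 5460)`.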

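Testing

A write through any node is assigned to a slot and automatically redirected to the node that owns that slot; reads are redirected the same way.

Connect to the cluster

Add -c when connecting so that redis-cli follows the redirections; any node in the cluster will do.

redis-cli -c -h 192.168.66.140 -p 6379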

root@zhao:/apps/redis/etc# redis-cli -c -h 192.168.66.140 -p 6379
192.168.66.140:6379>

Write data

192.168.66.140:6379> set k1 v2
-> Redirected to slot [12706] located at 192.168.66.142:6379
OK
192.168.66.142:6379> set k2 v2
-> Redirected to slot [449] located at 192.168.66.140:6379
OK

Read data

192.168.66.140:6379> get k1
-> Redirected to slot [12706] located at 192.168.66.142:6379
"v2"
192.168.66.142:6379> get k2
-> Redirected to slot [449] located at 192.168.66.140:6379
"v2"

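With -c, redis-cli sends each command to the node it is connected to and follows the MOVED redirection that comes back. Cluster-aware client libraries usually cache the slot-to-node table and go straight to the right node. A minimal sketch of that lookup, assuming the slot ranges from the cluster nodes output above. `SlotMap` is a hypothetical helper, and refreshing the table after a MOVED reply is left out.

```python
import bisect
from typing import Iterable, Tuple


class SlotMap:
    """Slot-range -> node lookup, like the table a cluster-aware client caches."""

    def __init__(self, ranges: Iterable[Tuple[int, int, str]]):
        self._ranges = sorted(ranges)                    # (start, end, node), sorted by start
        self._starts = [r[0] for r in self._ranges]

    def node_for(self, slot: int) -> str:
        i = bisect.bisect_right(self._starts, slot) - 1  # last range starting <= slot
        if i >= 0:
            start, end, node = self._ranges[i]
            if start <= slot <= end:
                return node
        raise KeyError(f"slot {slot} is not covered by any node")


if __name__ == "__main__":
    slots = SlotMap([
        (0, 5460, "192.168.66.140:6379"),
        (5461, 10922, "192.168.66.141:6379"),
        (10923, 16383, "192.168.66.142:6379"),
    ])
    print(slots.node_for(12706))   # slot of k1
    print(slots.node_for(449))     # slot of k2
```

The two lookups return 192.168.66.142:6379 and 192.168.66.140:6379, the same nodes redis-cli was redirected to above.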
Failover test

When a master in the cluster goes down, its replica automatically takes over as master.

Currently connected to 192.168.66.140:6379

192.168.66.140:6379> cluster nodes
2c8edf....f1be71 192.168.66.142:6380@16380 slave 986844....af1b38 0 1649230348678 3 connected
88925e....eba6a1 192.168.66.141:6380@16380 slave 1165c1....2f32f1 0 1649230350000 1 connected
2f3a18....0c5f7 192.168.66.142:6379@16379 master - 0 1649230349000 5 connected 10923-16383
986844....af1b38 192.168.66.141:6379@16379 master - 0 1649230351708 3 connected 5461-10922
1165c1....2f32f1 192.168.66.140:6379@16379 myself,master - 0 1649230349000 1 connected 0-5460
9386d3....647dd2 192.168.66.140:6380@16380 slave 2f3a18....20c5f7 0 1649230350699 5 connected

Take down 192.168.66.141:6379

root@zhao:/apps/redis/etc# ps aux | grep redis
redis      1253  0.1  0.1  65340  5272 ?        Ssl  14:54   0:00 /apps/redis/bin/redis-server 192.168.66.141:6379 [cluster]
redis      1260  0.1  0.1  65340  5264 ?        Ssl  14:54   0:00 /apps/redis/bin/redis-server 192.168.66.141:6380 [cluster]
root       1266  0.0  0.0  14436  1048 pts/0    S+   14:54   0:00 grep --color=auto redis
root@zhao:/apps/redis/etc# kill 1253

Check the master/replica mapping again

192.168.66.141:6380> cluster nodes
2f3a18....20c5f7 192.168.66.142:6379@16379 master - 0 1649237763776 5 connected 10923-16383
986844....f1b38 192.168.66.141:6379@16379 master,fail - 1649237691569 1649237688523 3 disconnected
1165c1....2f32f1 192.168.66.140:6379@16379 master - 0 1649237761740 1 connected 0-5460
2c8edf....f1be71 192.168.66.142:6380@16380 master - 0 1649237762757 7 connected 5461-10922
88925e....eba6a1 192.168.66.141:6380@16380 myself,slave 1165c1...2f32f1 0 1649237759000 1 connected
9386d3....47dd2 192.168.66.140:6380@16380 slave 2f3a18....20c5f7 0 1649237763000 5 connected

Note: this session is on 141:6380. The master 141:6379 is flagged fail, meaning it cannot be reached, and 142:6380 is now master for slots 5461-10922.

Restart the failed instance

192.168.66.141:6380> cluster nodes
2f3a18....20c5f7 192.168.66.142:6379@16379 master - 0 1649238686000 5 connected 10923-16383
986844....f1b38 192.168.66.141:6379@16379 slave 2c8edf....06f1be71 0 1649238685915 7 connected
1165c1....2f32f1 192.168.66.140:6379@16379 master - 0 1649238683000 1 connected 0-5460
2c8edf....f1be71 192.168.66.142:6380@16380 master - 0 1649238685000 7 connected 5461-10922
88925e....eba6a1 192.168.66.141:6380@16380 myself,slave 1165c1....2f32f1 0 1649238684000 1 connected
9386d3....647dd2 192.168.66.140:6380@16380 slave 2f3a18....20c5f7 0 1649238686932 5 connected

Note: 141:6379 has rejoined as a replica of 142:6380, and 142:6380 remains master.
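The failover just observed can be modelled as a change to the slot-ownership table. This Python sketch is a simplified model, not how Redis implements it. It leaves out failure detection (a node is marked FAIL only after a majority of masters agree) and the replica election, and `Node`, `fail_over` and `rejoin` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    role: str                                        # "master", "replica" or "fail"
    master: Optional[str] = None                     # replicas: the master they replicate
    slots: List[Tuple[int, int]] = field(default_factory=list)


def fail_over(nodes: Dict[str, Node], failed: str) -> str:
    """Mark `failed` as down and promote one of its replicas, which takes over its slots."""
    replicas = [name for name, n in nodes.items()
                if n.role == "replica" and n.master == failed]
    if not replicas:
        raise RuntimeError(f"{failed} has no replica; its slots become unavailable")
    promoted = replicas[0]                           # Redis elects one; the sketch picks the first
    nodes[promoted] = Node("master", None, nodes[failed].slots)
    for other in replicas[1:]:
        nodes[other].master = promoted               # remaining replicas follow the new master
    nodes[failed] = Node("fail")
    return promoted


def rejoin(nodes: Dict[str, Node], returning: str, new_master: str) -> None:
    """The restarted former master comes back as a replica of its replacement."""
    nodes[returning] = Node("replica", new_master)
```

Applied to the topology above, failing 141:6379 promotes 142:6380 with slots 5461-10922. Calling `rejoin` then turns 141:6379 into its replica, matching the two cluster nodes listings.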

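Scaling the cluster

How scaling works

Redis Cluster supports flexible scale-out and scale-in: nodes can be added to expand the cluster, or taken offline to shrink it, without interrupting service.

As the figure above shows, Redis Cluster can bring nodes online and take them offline flexibly. The underlying principle is that slots and their data move between nodes. Consider first the slot, data and node mapping of the cluster built earlier:

Each of the three masters maintains its own slots and their data. To scale out by adding one node, use the relevant commands to migrate part of the slots and data to the new node.

In the figure, each node migrates part of its slots and data to the new node 6385, so each node ends up responsible for fewer slots and less data than before, which scales the cluster out. The details of moving slots and data between nodes are deliberately left out here, so that the focus stays on how slots are assigned to nodes and on the principle of horizontal scaling: scaling a cluster means moving slots and data between nodes.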

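Scaling out

Prepare the new nodes

Prepare the new nodes in advance and run them in cluster mode. Their configuration should match the existing nodes, which keeps management uniform.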

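Edit the configuration file

vim /apps/redis/etc/redis-6379.conf   (likewise redis-6380.conf, with nodes-6380.conf)

cluster-enabled yes                     #enable cluster mode
cluster-config-file nodes-6379.conf     #internal cluster state file
cluster-node-timeout 15000              #node timeout, in milliseconds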

Start the nodes

root@zhao:~# sudo -u redis /apps/redis/bin/redis-server /apps/redis/etc/redis-6379.conf
root@zhao:~# sudo -u redis /apps/redis/bin/redis-server /apps/redis/etc/redis-6380.conf

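Note: once started, the new nodes run as orphans: no other node communicates with them yet.

Join the cluster

Use method two in production.

Method one

Add the new nodes to the existing cluster with the cluster meet command. Run it on any node in the cluster to bring in 192.168.66.143:6379 and 192.168.66.143:6380: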

root@zhao:/apps/redis/etc# redis-cli -c -h 192.168.66.141 -p 6379		#connect to the cluster
192.168.66.141:6379> 
192.168.66.141:6379> cluster meet 192.168.66.143 6379
OK
192.168.66.141:6379> cluster meet 192.168.66.143 6380
OK

After the old and new nodes have exchanged ping/pong messages for a while, every node discovers the new nodes and saves their state locally.

Verify

root@zhao:/apps/redis/etc# redis-cli -c -h 192.168.66.141 -p 6380
192.168.66.141:6380> cluster nodes
2f3a18....20c5f7 192.168.66.142:6379@16379 master - 0 1649242172377 5 connected 10923-16383
986844....af1b38 192.168.66.141:6379@16379 slave 2c8edf98b01cadf9991da42ce6219a7d06f1be71 0 1649242171000 7 connected
c59c6f....954adc 192.168.66.143:6379@16379 master - 0 1649242171367 0 connected
1165c1....2f32f1 192.168.66.140:6379@16379 master - 0 1649242170000 1 connected 0-5460
2c8edf....f1be71 192.168.66.142:6380@16380 master - 0 1649242171000 7 connected 5461-10922
88925e....eba6a1 192.168.66.141:6380@16380 myself,slave 1165c1....2f32f1 0 1649242170000 1 connected
9386d3....647dd2 192.168.66.140:6380@16380 slave 2f3a18....20c5f7 0 1649242172000 5 connected
370928....ebbe15 192.168.66.143:6380@16380 master - 0 1649242821006 8 connected

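A new node starts out as a master, but because it owns no slots it cannot serve any reads or writes. There are usually two options for a new node:

1. Migrate slots and data to it to scale out.

2. Make it a replica of another master for failover.

Method two

Run add-node against a node that is already in the cluster. Each invocation adds one new node, so repeat it for every node to be added.

redis-cli --cluster add-node <new_host:port> <existing_host:port>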

root@zhao:/apps/redis/etc# redis-cli --cluster add-node 192.168.66.143:6379 192.168.66.140:6379
>>> Adding node 192.168.66.143:6379 to cluster 192.168.66.140:6379
>>> Performing Cluster Check (using node 192.168.66.140:6379)
M: 1165c1bf292e4222f28f974df13735c6d92f32f1 192.168.66.140:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 2c8edf98b01cadf9991da42ce6219a7d06f1be71 192.168.66.142:6380
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 88925e3df3c32748a1e05321f3d41f1b77eba6a1 192.168.66.141:6380
   slots: (0 slots) slave
   replicates 1165c1bf292e4222f28f974df13735c6d92f32f1
M: 2f3a189d1e958ac06dc7b893b88d54f8a420c5f7 192.168.66.142:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 9868445a43dedf5458627176b08c8973e8af1b38 192.168.66.141:6379
   slots: (0 slots) slave
   replicates 2c8edf98b01cadf9991da42ce6219a7d06f1be71
S: 9386d35562a4357a6a8190a52c7d23f86c647dd2 192.168.66.140:6380
   slots: (0 slots) slave
   replicates 2f3a189d1e958ac06dc7b893b88d54f8a420c5f7
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.66.143:6379 to make it join the cluster.
[OK] New node added correctly.

Verify

root@zhao:/apps/redis/etc# redis-cli -c -h 192.168.66.141 -p 6380
192.168.66.141:6380> cluster nodes
2f3a18....20c5f7 192.168.66.142:6379@16379 master - 0 1649242172377 5 connected 10923-16383
986844....af1b38 192.168.66.141:6379@16379 slave 2c8edf98b01cadf9991da42ce6219a7d06f1be71 0 1649242171000 7 connected
c59c6f....954adc 192.168.66.143:6379@16379 master - 0 1649242171367 0 connected
1165c1....2f32f1 192.168.66.140:6379@16379 master - 0 1649242170000 1 connected 0-5460
2c8edf....f1be71 192.168.66.142:6380@16380 master - 0 1649242171000 7 connected 5461-10922
88925e....eba6a1 192.168.66.141:6380@16380 myself,slave 1165c1....2f32f1 0 1649242170000 1 connected
9386d3....647dd2 192.168.66.140:6380@16380 slave 2f3a18....20c5f7 0 1649242172000 5 connected
370928....ebbe15 192.168.66.143:6380@16380 master - 0 1649242821006 8 connected

As with method one, the new node joins as a master without slots: either migrate slots to it to scale out, or make it a replica of another master.

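Important

In production, add new nodes with redis-cli --cluster add-node (redis-trib.rb add-node on Redis versions before 5.0). It checks the new node first, and if the node already belongs to another cluster or contains data, it abandons the join and prints: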

[ERR] Node 127.0.0.1:6385 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

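If you run cluster meet by hand for a node that already belongs to another cluster, that node's cluster is merged into the existing one, causing data loss and corruption. The consequences are severe, so be very careful in production.

Migrating slots and data

Slots and data are migrated together, because the data lives in the slots. The migration should leave every master responsible for a similar number of slots, so that data stays evenly distributed.

	redis-cli --cluster reshard <new_host:port>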
root@zhao:~# redis-cli --cluster reshard 192.168.66.143:6379
>>> Performing Cluster Check (using node 192.168.66.143:6379)
M: ab93840c160911bfdac646e8987d9a2933c8bc2f 192.168.66.143:6379
   slots: (0 slots) master
S: b530bfac0c6960ede742f1d9ad3fb08c2cba4502 192.168.66.140:6380
   slots: (0 slots) slave
   replicates 39163a89a81a5ed96f1b69c03df5bdf0dc8b4427
S: bc529585af1b69adb74c9cad91fb480c1f89ce64 192.168.66.142:6380
   slots: (0 slots) slave
   replicates a08e7dc9d7f40d8ec050ccc6c75973aa2872229c
M: 39163a89a81a5ed96f1b69c03df5bdf0dc8b4427 192.168.66.142:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 07c9bc2ea9a6852b7730992ed33a591a4bcc73c1 192.168.66.141:6380
   slots: (0 slots) slave
   replicates dce9016f8c1935e65544d153b7d7b2c147a96e9f
M: dce9016f8c1935e65544d153b7d7b2c147a96e9f 192.168.66.140:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: a08e7dc9d7f40d8ec050ccc6c75973aa2872229c 192.168.66.141:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096			#slots to move: 16384 / 4 masters = 4096
What is the receiving node ID? ab93840c160911bfdac646e8987d9a2933c8bc2f			#ID of the node receiving the slots: enter the node ID of 192.168.66.143:6379
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: all 					#which masters give slots to 192.168.66.143:6379. all takes slots from every master automatically; when removing a host from the cluster, enter its ID here instead to move all of its slots to other masters
......
 Moving slot 1362 from dce9016f8c1935e65544d153b7d7b2c147a96e9f
    Moving slot 1363 from dce9016f8c1935e65544d153b7d7b2c147a96e9f
    Moving slot 1364 from dce9016f8c1935e65544d153b7d7b2c147a96e9f
Do you want to proceed with the proposed reshard plan (yes/no)? yes        #confirm
......
Moving slot 1361 from 192.168.66.140:6379 to 192.168.66.143:6379: 
Moving slot 1362 from 192.168.66.140:6379 to 192.168.66.143:6379: 
Moving slot 1363 from 192.168.66.140:6379 to 192.168.66.143:6379: 
Moving slot 1364 from 192.168.66.140:6379 to 192.168.66.143:6379: 
root@zhao:~# 														#done

Check that 143:6379 now has slots

root@zhao:~# redis-cli -c -h 192.168.66.140 -p 6379
192.168.66.140:6379> cluster nodes
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163....c8b4427 0 1649249044000 5 connected
dce901....a96e9f 192.168.66.140:6379@16379 myself,master - 0 1649249046000 1 connected 1365-5460
a08e7d....72229c 192.168.66.141:6379@16379 master - 0 1649249045000 3 connected 6827-10922
ab9384....c8bc2f 192.168.66.143:6379@16379 master - 0 1649249047955 7 connected 0-1364 5461-6826 10923-12287
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce90....a96e9f 0 1649249046937 1 connected
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7....72229c 0 1649249044905 3 connected
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649249045920 5 connected 12288-16383

Note: 143:6379 now owns slots, but not as one contiguous block: they are spread over 0-1364, 5461-6826 and 10923-12287.
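The three separate ranges come from the `all` option: every existing master gives up part of its slots. The sketch below rests on two assumptions. First, redis-cli asks each source for a share proportional to the slots it owns, rounding up for the largest source and down for the others. Second, each source hands over its lowest-numbered slots. Under those assumptions the sketch reproduces the ranges above. `reshard_plan` is a hypothetical name, and moving the keys inside each slot is not modelled.

```python
import math
from typing import Dict, List


def reshard_plan(sources: Dict[str, List[int]], num_slots: int) -> Dict[str, List[int]]:
    """Pick the slots each source master hands over when `num_slots` move to a new node.

    `sources` maps a master's address to the sorted list of slots it owns.
    """
    total = sum(len(slots) for slots in sources.values())
    by_size = sorted(sources.items(), key=lambda kv: len(kv[1]), reverse=True)
    plan = {}
    for i, (name, slots) in enumerate(by_size):
        share = num_slots / total * len(slots)       # proportional share of the request
        count = math.ceil(share) if i == 0 else math.floor(share)
        plan[name] = slots[:count]                   # take from the low end of its slots
    return plan


if __name__ == "__main__":
    masters = {
        "192.168.66.140:6379": list(range(0, 5461)),
        "192.168.66.141:6379": list(range(5461, 10923)),
        "192.168.66.142:6379": list(range(10923, 16384)),
    }
    for name, moved in reshard_plan(masters, 4096).items():
        print(name, f"{moved[0]}-{moved[-1]}", len(moved))
```

The largest source, 141:6379 with 5462 slots, gives 1366 slots (5461-6826), and the other two give 1365 each, for a total of exactly 4096.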

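Add a replica for the new master

Add one more standalone Redis instance to the cluster to protect the new master 192.168.66.143:6379 against a single-machine failure, giving it high availability. Start 143:6380 without assigning it any slots.

It must be made a replica of the master 143:6379 explicitly; otherwise its default role is master.

Nodes before the replica is assigned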

root@zhao:~# redis-cli -c -h 192.168.66.140 -p 6379
192.168.66.140:6379> cluster nodes
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163a....8b4427 0 1649249044000 5 connected
dce901....a96e9f 192.168.66.140:6379@16379 myself,master - 0 1649249046000 1 connected 1365-5460
a08e7d....72229c 192.168.66.141:6379@16379 master - 0 1649249045000 3 connected 6827-10922
ab9384....c8bc2f 192.168.66.143:6379@16379 master - 0 1649249047955 7 connected 0-1364 5461-6826 10923-12287
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce90....a96e9f 0 1649249046937 1 connected
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7....72229c 0 1649249044905 3 connected
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649249045920 5 connected 12288-16383
fe3693....8ad459 192.168.66.143:6380@16380 master - 0 1649249550000 0 connected

Note: 143:6380 joins the cluster as a master by default.

Log in to the newly added node

root@zhao:~# redis-cli -c -h 192.168.66.143 -p 6380
192.168.66.143:6380> 

Find the target master's ID

192.168.66.143:6380> cluster nodes
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163a....8b4427 0 1649249044000 5 connected
dce901....a96e9f 192.168.66.140:6379@16379 myself,master - 0 1649249046000 1 connected 1365-5460
a08e7d....72229c 192.168.66.141:6379@16379 master - 0 1649249045000 3 connected 6827-10922
ab9384....c8bc2f 192.168.66.143:6379@16379 master - 0 1649249047955 7 connected 0-1364 5461-6826 10923-12287
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce90....a96e9f 0 1649249046937 1 connected
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7....72229c 0 1649249044905 3 connected
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649249045920 5 connected 12288-16383
fe3693....8ad459 192.168.66.143:6380@16380 master - 0 1649249550000 0 connected

Note: the target is 143:6379, whose ID is ab93840c160911bfdac646e8987d9a2933c8bc2f

Add the replica

cluster replicate <master-node-ID>

192.168.66.143:6380> cluster replicate ab93840c160911bfdac646e8987d9a2933c8bc2f
OK

Verify

192.168.66.143:6380> cluster nodes
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163a....8b4427 0 1649249044000 5 connected
dce901....a96e9f 192.168.66.140:6379@16379 myself,master - 0 1649249046000 1 connected 1365-5460
a08e7d....72229c 192.168.66.141:6379@16379 master - 0 1649249045000 3 connected 6827-10922
ab9384....c8bc2f 192.168.66.143:6379@16379 master - 0 1649249047955 7 connected 0-1364 5461-6826 10923-12287
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce90....a96e9f 0 1649249046937 1 connected
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7....72229c 0 1649249044905 3 connected
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649249045920 5 connected 12288-16383
fe3693....8ad459 192.168.66.143:6380@16380 myself,slave ab9384....3c8bc2f 0 1649250010000 7 connected

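Scaling in

Adding a node means first adding it to the cluster and then assigning slots. Removing a node is the reverse: first migrate the slots on the Redis node being removed to the other Redis nodes, then delete it. If a node's slots have not all been migrated, deleting it fails with an error saying it still holds data.

Procedure:

1) First determine whether the node to be removed owns any slots. If it does, migrate them to other nodes so that the cluster's slot-to-node mapping stays complete after the node goes offline.

2) Once the node owns no slots, or is itself a replica, tell the other nodes in the cluster to forget it; once every node has forgotten it, it can be shut down normally.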
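Step 1 comes down to one question per node: does its cluster nodes line still list any slots? A minimal sketch, assuming the standard cluster nodes field layout, where slots start at the 9th field. The helper names are hypothetical. redis-cli --cluster del-node performs an equivalent check itself and refuses to remove a node that still owns slots.

```python
def owned_slot_fields(line: str) -> list:
    """Slot entries of a CLUSTER NODES line (everything after the 8th field)."""
    return line.split()[8:]


def ready_to_delete(line: str) -> bool:
    """A node can be forgotten once it owns no slots; replicas never own any."""
    return not owned_slot_fields(line)
```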

Before scaling in

Identify the node to be removed.

192.168.66.143:6380> cluster nodes
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163a....8b4427 0 1649249044000 5 connected
dce901....a96e9f 192.168.66.140:6379@16379 myself,master - 0 1649249046000 1 connected 1365-5460
a08e7d....72229c 192.168.66.141:6379@16379 master - 0 1649249045000 3 connected 6827-10922
ab9384....c8bc2f 192.168.66.143:6379@16379 master - 0 1649249047955 7 connected 0-1364 5461-6826 10923-12287
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce90....a96e9f 0 1649249046937 1 connected
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7....72229c 0 1649249044905 3 connected
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649249045920 5 connected 12288-16383
fe3693....8ad459 192.168.66.143:6380@16380 myself,slave ab9384....3c8bc2f 0 1649250010000 7 connected

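Note: determine the host:port and node ID of the node to remove; its slots will be distributed evenly to the other masters.

Migrate slots off the node

The node going offline must migrate its slots to other nodes; the mechanism is the same as slot migration during scale-out.

Migrate this master's slots to the other masters.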

The receiving masters (the "other masters" above) must not already hold the keys being migrated; otherwise the migration fails with an error and is aborted.

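redis-cli --cluster fix <host:port>			#if a migration fails, use this to repair the cluster

Scaling in moves slots in the opposite direction from scaling out: 143:6379 becomes the source and the other masters are the targets, and the source must spread its 4096 slots evenly across them. Each reshard run accepts only one target node, so run reshard three times, moving 1365, 1365 and 1366 slots.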
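The 1365/1365/1366 split is 4096 divided by the three remaining masters, with the remainder going to one of them. A tiny sketch of that arithmetic. `split_evenly` is a hypothetical name, and which master absorbs the extra slot is an arbitrary choice.

```python
from typing import Dict, List


def split_evenly(num_slots: int, targets: List[str]) -> Dict[str, int]:
    """Divide num_slots over targets as evenly as possible.

    The remainder goes to the last targets, one extra slot each.
    """
    base, extra = divmod(num_slots, len(targets))
    first_with_extra = len(targets) - extra
    return {t: base + (1 if i >= first_with_extra else 0) for i, t in enumerate(targets)}


if __name__ == "__main__":
    plan = split_evenly(4096, ["192.168.66.140:6379", "192.168.66.141:6379", "192.168.66.142:6379"])
    print(plan)
```

Each value is what you would enter at "How many slots do you want to move?" in one reshard run, with that master's ID as the receiving node and 143:6379's ID as the only source.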

	redis-cli --cluster reshard <master_host:port>			#any master other than the one being removed
root@zhao:~# redis-cli --cluster reshard 192.168.66.141:6379
>>> Performing Cluster Check (using node 192.168.66.141:6379)
M: a08e7dc9d7f40d8ec050ccc6c75973aa2872229c 192.168.66.141:6379
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
S: fe3693d596b90b35ca17bbca5de84114cc8ad459 192.168.66.143:6380
   slots: (0 slots) slave
   replicates ab93840c160911bfdac646e8987d9a2933c8bc2f
M: dce9016f8c1935e65544d153b7d7b2c147a96e9f 192.168.66.140:6379
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
M: 39163a89a81a5ed96f1b69c03df5bdf0dc8b4427 192.168.66.142:6379
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
S: b530bfac0c6960ede742f1d9ad3fb08c2cba4502 192.168.66.140:6380
   slots: (0 slots) slave
   replicates 39163a89a81a5ed96f1b69c03df5bdf0dc8b4427
M: ab93840c160911bfdac646e8987d9a2933c8bc2f 192.168.66.143:6379
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
   1 additional replica(s)
S: bc529585af1b69adb74c9cad91fb480c1f89ce64 192.168.66.142:6380
   slots: (0 slots) slave
   replicates a08e7dc9d7f40d8ec050ccc6c75973aa2872229c
S: 07c9bc2ea9a6852b7730992ed33a591a4bcc73c1 192.168.66.141:6380
   slots: (0 slots) slave
   replicates dce9016f8c1935e65544d153b7d7b2c147a96e9f
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
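How many slots do you want to move (from 1 to 16384)? 4096		#143:6379 holds 4096 slots; to balance them, split the move over three runs (1365, 1365, 1366). This run moves all 4096 to one master
What is the receiving node ID? dce9016f8c1935e65544d153b7d7b2c147a96e9f	  #ID of the receiving master. Any master works, but only one per run, so run the command three times to keep the masters even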
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: ab93840c160911bfdac646e8987d9a2933c8bc2f 		#the node to take the 4096 slots from (ID of 143:6379)
Source node #2: done 			#done: no more source nodes
......
 Moving slot 1362 from dce9016f8c1935e65544d153b7d7b2c147a96e9f
    Moving slot 1363 from dce9016f8c1935e65544d153b7d7b2c147a96e9f
    Moving slot 1364 from dce9016f8c1935e65544d153b7d7b2c147a96e9f
Do you want to proceed with the proposed reshard plan (yes/no)? yes        #proceed
......
Moving slot 1361 from 192.168.66.140:6379 to 192.168.66.143:6379: 
Moving slot 1362 from 192.168.66.140:6379 to 192.168.66.143:6379: 
Moving slot 1363 from 192.168.66.140:6379 to 192.168.66.143:6379: 
Moving slot 1364 from 192.168.66.140:6379 to 192.168.66.143:6379: 
root@zhao:~# 														#migration finished

Verify the slots

root@zhao:~# redis-cli -c -h 192.168.66.141 -p 6379
192.168.66.141:6379> cluster nodes
fe3693....8ad459 192.168.66.143:6380@16380 slave dce901....a96e9f 0 1649255949554 8 connected
dce901....a96e9f 192.168.66.140:6379@16379 master - 0 1649255951000 8 connected 0-6826 10923-12287
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649255951577 5 connected 12288-16383
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163a....8b4427 0 1649255949000 5 connected
a08e7d....72229c 192.168.66.141:6379@16379 myself,master - 0 1649255950000 3 connected 6827-10922
ab9384....c8bc2f 192.168.66.143:6379@16379 slave dce901....a96e9f 0 1649255950566 8 connected
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7d....72229c 0 1649255948000 3 connected
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce901....a96e9f 0 1649255948542 8 connected

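Note: 143:6379 no longer owns any slots and has automatically become a replica of 140:6379, the master that received them; its own replica 143:6380 now follows 140:6379 as well. Because this run sent all 4096 slots to 140:6379, that master now holds 0-6826 and 10923-12287. To keep the masters balanced, split the move over three reshard runs as described above.

Verify masters and replicas

Make sure that every master in the Redis cluster has at least one replica. A master may have several, but it needs at least one for data backup and high availability.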

root@zhao:~# redis-cli -c -h 192.168.66.141 -p 6379
192.168.66.141:6379> cluster nodes
fe3693....8ad459 192.168.66.143:6380@16380 slave dce901....a96e9f 0 1649255949554 8 connected
dce901....a96e9f 192.168.66.140:6379@16379 master - 0 1649255951000 8 connected 0-6826 10923-12287
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649255951577 5 connected 12288-16383
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163a....8b4427 0 1649255949000 5 connected
a08e7d....72229c 192.168.66.141:6379@16379 myself,master - 0 1649255950000 3 connected 6827-10922
ab9384....c8bc2f 192.168.66.143:6379@16379 slave dce901....a96e9f 0 1649255950566 8 connected
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7d....72229c 0 1649255948000 3 connected
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce901....a96e9f 0 1649255948542 8 connected

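Verify the master/replica placement

A Redis replica must never run on the same server as its master. Replicas must be placed across hosts, so that a single server failure cannot take down both master and replica. If a replica ends up on the same Redis host as its master, repeat the steps above to reassign replicas until no master shares a host with its replica.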

root@zhao:~# redis-cli -c -h 192.168.66.141 -p 6379
192.168.66.141:6379> cluster nodes
fe3693....8ad459 192.168.66.143:6380@16380 slave dce901....a96e9f 0 1649255949554 8 connected
dce901....a96e9f 192.168.66.140:6379@16379 master - 0 1649255951000 8 connected 0-6826 10923-12287
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649255951577 5 connected 12288-16383
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163a....8b4427 0 1649255949000 5 connected
a08e7d....72229c 192.168.66.141:6379@16379 myself,master - 0 1649255950000 3 connected 6827-10922
ab9384....c8bc2f 192.168.66.143:6379@16379 slave dce901....a96e9f 0 1649255950566 8 connected
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7d....72229c 0 1649255948000 3 connected
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce901....a96e9f 0 1649255948542 8 connected
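The same-host rule can be checked from the cluster nodes output as well. A minimal sketch, assuming the host is the part of the address field before the last ":". `replicas_on_master_host` is a hypothetical helper that returns every (replica, master) pair sharing a host.

```python
def host_of(address_field: str) -> str:
    """'192.168.66.140:6379@16379' -> '192.168.66.140'."""
    return address_field.split("@")[0].rsplit(":", 1)[0]


def replicas_on_master_host(cluster_nodes_output: str) -> list:
    """Return (replica_id, master_id) pairs where both run on the same host."""
    rows = [line.split() for line in cluster_nodes_output.strip().splitlines()]
    hosts = {f[0]: host_of(f[1]) for f in rows}
    return [
        (f[0], f[3])
        for f in rows
        if "slave" in f[2].split(",") and hosts.get(f[3]) == hosts[f[0]]
    ]
```

For the listing above the result is empty, since every replica runs on a different server from its master.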

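Remove the node from the cluster

The slots have been migrated, but the server is still registered in the cluster, so it must also be removed from the cluster.

Check the node list

Note down the node's ID.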

root@zhao:~# redis-cli -c -h 192.168.66.141 -p 6379
192.168.66.141:6379> cluster nodes
fe3693....8ad459 192.168.66.143:6380@16380 slave dce901....a96e9f 0 1649255949554 8 connected
dce901....a96e9f 192.168.66.140:6379@16379 master - 0 1649255951000 8 connected 0-6826 10923-12287
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649255951577 5 connected 12288-16383
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163a....8b4427 0 1649255949000 5 connected
a08e7d....72229c 192.168.66.141:6379@16379 myself,master - 0 1649255950000 3 connected 6827-10922
ab9384....c8bc2f 192.168.66.143:6379@16379 slave dce901....a96e9f 0 1649255950566 8 connected
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7d....72229c 0 1649255948000 3 connected
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce901....a96e9f 0 1649255948542 8 connected

Delete the node

redis-cli --cluster del-node <host:port> <node-ID>

root@zhao:~# redis-cli --cluster del-node 192.168.66.143:6379 ab9384....c8bc2f
>>> Removing node ab93840c160911bfdac646e8987d9a2933c8bc2f from cluster 192.168.66.143:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.

Verify that the node has been removed

root@zhao:~# redis-cli -c -h 192.168.66.141 -p 6379
192.168.66.141:6379> cluster nodes
fe3693....8ad459 192.168.66.143:6380@16380 slave dce901....a96e9f 0 1649255949554 8 connected
dce901....a96e9f 192.168.66.140:6379@16379 master - 0 1649255951000 8 connected 0-6826 10923-12287
39163a....8b4427 192.168.66.142:6379@16379 master - 0 1649255951577 5 connected 12288-16383
b530bf....ba4502 192.168.66.140:6380@16380 slave 39163a....8b4427 0 1649255949000 5 connected
a08e7d....72229c 192.168.66.141:6379@16379 myself,master - 0 1649255950000 3 connected 6827-10922
bc5295....89ce64 192.168.66.142:6380@16380 slave a08e7d....72229c 0 1649255948000 3 connected
07c9bc....cc73c1 192.168.66.141:6380@16380 slave dce901....a96e9f 0 1649255948542 8 connected