Analysis of Redis master-slave replication

How does Redis master-slave replication work? Do you know how to keep high performance while synchronizing data?

    • Reference: https://redis.io/topics/replication. Note that as of Redis 5, the term "slave" and the related configuration items have officially been renamed to "replica". Both names refer to the same kind of node.

Basic process of master-slave replication

# Master-Replica replication. Use replicaof to make a Redis instance a copy of
# another Redis server. A few things to understand ASAP about Redis replication.
#
#   +------------------+      +---------------+
#   |      Master      | ---> |    Replica    |
#   | (receive writes) |      |  (exact copy) |
#   +------------------+      +---------------+
#
# 1) Redis replication is asynchronous, but you can configure a master to
# stop accepting writes if it appears to be not connected with at least
# a given number of replicas.
# 2) Redis replicas are able to perform a partial resynchronization with the
# master if the replication link is lost for a relatively small amount of
# time. You may want to configure the replication backlog size (see the next
# sections of this file) with a sensible value depending on your needs.
# 3) Replication is automatic and does not need user intervention. After a
# network partition replicas automatically try to reconnect to masters
# and resynchronize with them.
#
# replicaof <masterip> <masterport>
The basic replication process between master and replica:

  • While the connection between the master and a replica is stable, the master continuously streams the write commands it executes to the replica (incremental synchronization). The replica applies them to its own dataset and reports the replication offset it has processed back to the master once per second with REPLCONF ACK <offset>.
  • If a replica is disconnected from the master and reconnects, it sends a PSYNC command to the master. If the conditions are met (for example, the replica references a replication history the master knows, and the backlog still holds the data the replica missed), a partial resync is triggered. Otherwise, the master performs a full resync to the replica.

From the basic process above, we can see that a network problem can lead to a full resync, which seriously slows down how quickly the replica catches up with the master. How can this be mitigated? There are two aspects: the master-slave response time strategy and the master-slave space accumulation strategy.

Master-slave response time strategy
  • 1. The master sends a PING to its replicas every repl-ping-replica-period seconds (10 by default), so that a replica can tell whether the master is down.
repl-ping-replica-period 10
  • 2. The replication timeout between the replica (slave) and the master is 60s by default. It covers three cases:
  • a) From the replica's perspective: no RDB data is received from the master during the bulk transfer of a full synchronization (SYNC).
  • b) From the replica's perspective: no data and no PING is received from the master.
  • c) From the master's perspective: no REPLCONF ACK is received from the replica.
When Redis detects that repl-timeout (60s by default) has elapsed, it closes the connection between master and replica, and the replica then tries to re-establish the connection.
repl-timeout 60
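
The two settings interact: the comments in redis.conf recommend keeping repl-timeout greater than repl-ping-replica-period, otherwise a timeout is detected whenever traffic between master and replica is low. The following is a minimal sketch of that rule and of the timeout check itself. The function names are hypothetical and this is not how Redis implements it internally.

```python
def timing_is_sane(repl_timeout: int, ping_period: int) -> bool:
    """Return True if repl-timeout is larger than repl-ping-replica-period."""
    return repl_timeout > ping_period


def link_timed_out(now: float, last_interaction: float, repl_timeout: int) -> bool:
    """Return True if nothing was received from the peer for more than repl-timeout seconds."""
    return now - last_interaction > repl_timeout


if __name__ == "__main__":
    # Values taken from the configuration shown above.
    print(timing_is_sane(repl_timeout=60, ping_period=10))
    # Last packet seen at t=30s, checked at t=100s: 70s of silence.
    print(link_timed_out(now=100.0, last_interaction=30.0, repl_timeout=60))
```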
Master-slave space accumulation strategy

After the master receives a write, it appends the data to the replication buffer (the output buffer used to transmit data to replicas) and also to the replication backlog. When a replica reconnects after a disconnection, it sends PSYNC with its replication ID and the offset it has processed so far. If the missing part of the history can still be found in the replication backlog, a partial resync is triggered. Otherwise, the master performs a full resync to the replica.
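
To make the offset check concrete, here is a toy model of the backlog as a fixed-size window over the replication stream. It is a sketch under simplifying assumptions: the class and method names are hypothetical, the replication ID comparison is left out, and real Redis uses a circular buffer in C rather than a Python bytearray.

```python
from typing import Tuple


class ReplicationBacklog:
    """Fixed-size window over the master's replication stream, addressed by byte offset."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.data = bytearray()  # holds at most `size` of the most recent bytes
        self.master_offset = 0   # offset of the last byte written so far

    def feed(self, payload: bytes) -> None:
        """Append replicated bytes and drop the oldest ones beyond the window."""
        self.data.extend(payload)
        self.master_offset += len(payload)
        overflow = len(self.data) - self.size
        if overflow > 0:
            del self.data[:overflow]

    @property
    def first_byte_offset(self) -> int:
        """Offset of the oldest byte still held in the backlog."""
        return self.master_offset - len(self.data) + 1

    def psync(self, processed_offset: int) -> Tuple[str, bytes]:
        """Decide between partial and full resync for a reconnecting replica.

        `processed_offset` is the last offset the replica has applied, so the
        next byte it needs is processed_offset + 1.
        """
        wanted = processed_offset + 1
        if self.first_byte_offset <= wanted <= self.master_offset + 1:
            start = wanted - self.first_byte_offset
            return "partial", bytes(self.data[start:])
        return "full", b""


if __name__ == "__main__":
    backlog = ReplicationBacklog(size=10)
    backlog.feed(b"abcdefgh")
    print(backlog.psync(5))   # the missing bytes are still in the window
    backlog.feed(b"ijklmnop")
    print(backlog.psync(5))   # the window has moved past offset 6
```

With a window of 10 bytes, a replica that stopped at offset 5 can still resume after 8 bytes were written, but not after 16, which is the situation a larger repl-backlog-size is meant to avoid.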

# Set the replication backlog size. The backlog is a buffer that accumulates
# replica data when replicas are disconnected for some time, so that when a replica
# wants to reconnect again, often a full resync is not needed, but a partial
# resync is enough, just passing the portion of data the replica missed while
# disconnected.
#
# The bigger the replication backlog, the longer the time the replica can be
# disconnected and later be able to perform a partial resynchronization.
#
# The backlog is only allocated once there is at least a replica connected.
#
# repl-backlog-size 1mb

Parameters related to the replication backlog:

# Incremental synchronization window
repl-backlog-size 1mb
# Seconds to wait after the last replica disconnects before the backlog is freed (0 means never)
repl-backlog-ttl 3600
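
A rough way to choose repl-backlog-size is to multiply the master's write throughput by the longest disconnection you want to survive without a full resync. The sketch below does that arithmetic. The function names, the safety factor, and the 50,000 bytes/s figure are illustrative assumptions, not values from Redis.

```python
import math


def backlog_bytes_needed(write_bytes_per_sec: float,
                         max_disconnect_sec: float,
                         safety_factor: float = 2.0) -> int:
    """Estimate a backlog size that covers a disconnection of the given length."""
    return math.ceil(write_bytes_per_sec * max_disconnect_sec * safety_factor)


def max_disconnect_seconds(backlog_bytes: int, write_bytes_per_sec: float) -> float:
    """How long a replica can stay away before the backlog window moves past it."""
    return backlog_bytes / write_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical workload: 50,000 bytes of replication traffic per second.
    print(backlog_bytes_needed(50_000, 60))
    # With the default 1mb backlog (1048576 bytes) at the same rate:
    print(max_disconnect_seconds(1048576, 50_000))
```

Under this assumed workload the default 1mb backlog covers only about 21 seconds of disconnection, so a larger value would be needed to ride out a one-minute network interruption.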

Full resync (full synchronization) workflow

The full synchronization proceeds as follows:

  • The replica sends PSYNC (assume here that the conditions for a partial resync are not met, so a full synchronization is needed).
  • The master handles the full synchronization with a child process: as with BGSAVE, it forks a child that writes a snapshot to dump.rdb. At the same time, the master starts buffering all new write commands received from clients in the replication buffer.
  • The master transmits the RDB data to the replica over the network.
  • The replica saves the RDB data to disk and then loads it into memory (it deletes its old data, and loading the new data is blocking). After that, the master sends the buffered write commands and incremental synchronization resumes.
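
The steps above can be sketched with plain dictionaries standing in for the datasets. This is only a toy model under stated assumptions: the function name is hypothetical, a dict copy stands in for the fork and RDB snapshot, and there is no real networking or disk I/O.

```python
from typing import Dict, List, Tuple


def full_resync(master_data: Dict[str, str],
                writes_during_transfer: List[Tuple[str, str]],
                replica_data: Dict[str, str]) -> None:
    """Simulate a full resync: snapshot, buffer new writes, load, then replay."""
    # Point-in-time snapshot (stands in for fork + RDB dump).
    snapshot = dict(master_data)

    # The master keeps accepting writes and buffers them for the replica.
    buffered: List[Tuple[str, str]] = []
    for key, value in writes_during_transfer:
        master_data[key] = value
        buffered.append((key, value))

    # The replica discards its old data and loads the snapshot.
    replica_data.clear()
    replica_data.update(snapshot)

    # Replaying the buffered writes brings the replica up to date.
    for key, value in buffered:
        replica_data[key] = value


if __name__ == "__main__":
    master = {"a": "1"}
    replica = {"stale": "x"}
    full_resync(master, [("b", "2"), ("a", "3")], replica)
    print(replica == master)
```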

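If the master's disk is slow and network bandwidth is plentiful, diskless mode can be used (note that it is marked experimental in Redis 5). In this mode the master's child process sends the RDB data directly over the socket without writing it to disk first:

# Change from the default (no) to yes to turn on diskless mode
repl-diskless-sync yes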
repl-diskless-sync-delay 5

By default, a replica keeps serving client requests (possibly with stale data) while a synchronization is in progress or while it is disconnected from the master.

replica-serve-stale-data yes

However, the replica blocks client connections during the time window in which it loads the RDB data into memory.

Allow writes only with N attached replicas

By default, Redis uses asynchronous replication: the master acknowledges a client's write without waiting for the replicas. The master can, however, be configured to accept a write only if at least N replicas are connected with a lag of no more than M seconds. Otherwise it returns an error to the client.

# Not enabled by default
min-replicas-to-write <number of replicas>
min-replicas-max-lag <number of seconds>
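
As a rough illustration of this rule, the sketch below counts the "good" replicas from the slaveN lines of INFO replication output and decides whether a write would be accepted. The function names are hypothetical, and the real check runs inside the master, not in client code.

```python
import re


def good_replicas(info_text: str, max_lag: int) -> int:
    """Count online replicas whose reported lag is at most max_lag seconds."""
    count = 0
    for line in info_text.splitlines():
        match = re.match(r"slave\d+:(.*)", line.strip())
        if not match:
            continue
        # Fields look like: ip=127.0.0.1,port=9011,state=online,offset=437,lag=1
        attrs = dict(item.split("=", 1) for item in match.group(1).split(","))
        if attrs.get("state") == "online" and int(attrs.get("lag", max_lag + 1)) <= max_lag:
            count += 1
    return count


def accepts_writes(info_text: str, min_replicas: int, max_lag: int) -> bool:
    """Mimic the min-replicas rule; a value of 0 for either option disables it."""
    if min_replicas == 0 or max_lag == 0:
        return True
    return good_replicas(info_text, max_lag) >= min_replicas


if __name__ == "__main__":
    sample = (
        "role:master\n"
        "connected_slaves:1\n"
        "slave0:ip=127.0.0.1,port=9011,state=online,offset=437,lag=1\n"
    )
    print(accepts_writes(sample, min_replicas=1, max_lag=10))
    print(accepts_writes(sample, min_replicas=2, max_lag=10))
```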

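In addition, a client can use the WAIT command, an ACK-like mechanism, to make sure that its write has been acknowledged by a specified number of replicas. WAIT takes the number of replicas and a timeout in milliseconds, and returns the number of replicas that acknowledged the write.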

127.0.0.1:9001> set a x
OK
127.0.0.1:9001> wait 1 1000
(integer) 1
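
WAIT returns the count it reached even when the timeout expires first, so the caller has to compare the result with the number it asked for. The sketch below models that comparison with plain offsets. The function names are hypothetical and no Redis connection is involved.

```python
from typing import Iterable


def replicas_acked(write_offset: int, ack_offsets: Iterable[int]) -> int:
    """Count replicas whose acknowledged offset covers the client's write."""
    return sum(1 for offset in ack_offsets if offset >= write_offset)


def write_confirmed(required: int, write_offset: int, ack_offsets: Iterable[int]) -> bool:
    """Return True if at least `required` replicas acknowledged the write."""
    return replicas_acked(write_offset, ack_offsets) >= required


if __name__ == "__main__":
    # One replica has caught up to offset 437, the other is still at 400.
    print(replicas_acked(437, [437, 400]))
    print(write_confirmed(2, 437, [437, 400]))
```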

Failover

The replication ID identifies the history of the dataset that originated from the current master. Each instance keeps two replication IDs: master_replid and master_replid2.

127.0.0.1:9001> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=9011,state=online,offset=437,lag=1
master_replid:9ab608f7590f0e5898c4574299187a52ad0db7ec
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:437
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:437

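When the master goes down and one of the replicas is promoted to master, the promoted instance starts a new history and generates a new replication ID as master_replid. At the same time, the old master_replid is stored in master_replid2, and second_repl_offset records the offset up to which that old ID is still valid.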

# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=9021,state=online,offset=34874,lag=0
slave1:ip=127.0.0.1,port=9001,state=online,offset=34741,lag=0
master_replid:dfa343264a79179c1061f8fb81d49077db8e4e5f
master_replid2:9ab608f7590f0e5898c4574299187a52ad0db7ec
master_repl_offset:34874
second_repl_offset:6703
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:34874

This way, the other replicas that connect to the new master do not need another full synchronization. They can present the old replication ID, continue from their current offset, and then follow the new history.
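
The ID check a new master applies to an incoming PSYNC can be sketched as follows. This is a simplified model with a hypothetical function name: it covers only the replication ID part, and the backlog range check still applies separately.

```python
def replid_allows_partial(req_replid: str,
                          req_offset: int,
                          master_replid: str,
                          master_replid2: str,
                          second_repl_offset: int) -> bool:
    """Return True if the requested (ID, offset) pair belongs to a history the master knows.

    `req_offset` is the next offset the replica asks for.
    """
    if req_replid == master_replid:
        return True
    # The previous ID is only valid up to the offset where the history switched.
    return req_replid == master_replid2 and req_offset <= second_repl_offset


if __name__ == "__main__":
    # IDs and offset taken from the INFO output of the promoted master above.
    new_id = "dfa343264a79179c1061f8fb81d49077db8e4e5f"
    old_id = "9ab608f7590f0e5898c4574299187a52ad0db7ec"
    print(replid_allows_partial(old_id, 6703, new_id, old_id, 6703))
    print(replid_allows_partial(old_id, 6704, new_id, old_id, 6703))
```

A replica that followed the old master up to the switch point is accepted. One that claims data beyond that point under the old ID holds writes the new master never saw, so it has to fall back to a full resync.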

How does replica handle expired keys?

  • A replica does not expire keys on its own. A key is removed on the replica only after the master expires it (either through its active expiry or eviction policy such as LRU, or because a client accessed the expired key) and sends a synthesized DEL command to the replica.
  • Because that DEL can arrive late, a replica may briefly hold keys that are already logically expired. To handle this, the replica uses its own logical clock: when a client tries to read such a key, the replica reports that it does not exist.
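
A toy model of this behavior: the replica hides a logically expired key on reads but removes it only when the DEL from the master arrives. The class and method names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Optional, Tuple


class ReplicaView:
    """Minimal stand-in for a replica's keyspace with expiry timestamps."""

    def __init__(self) -> None:
        # key -> (value, expire_at), where expire_at is None for keys without a TTL
        self.store: Dict[str, Tuple[str, Optional[float]]] = {}

    def apply_set(self, key: str, value: str, expire_at: Optional[float] = None) -> None:
        """Apply a replicated write from the master."""
        self.store[key] = (value, expire_at)

    def apply_del(self, key: str) -> None:
        """Apply the DEL that the master synthesizes when it expires a key."""
        self.store.pop(key, None)

    def get(self, key: str, now: float) -> Optional[str]:
        """Read a key using the replica's own clock; expired keys are hidden, not deleted."""
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expire_at = entry
        if expire_at is not None and now >= expire_at:
            return None
        return value


if __name__ == "__main__":
    replica = ReplicaView()
    replica.apply_set("session", "abc", expire_at=100.0)
    print(replica.get("session", now=50.0))    # still valid
    print(replica.get("session", now=100.0))   # hidden from the client
    print("session" in replica.store)          # but still stored until DEL arrives
    replica.apply_del("session")
    print("session" in replica.store)
```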

@SvenAugustus(https://www.flysium.xyz/)


Posted on Thu, 21 May 2020 03:19:07 -0700 by sri.sjc