What is the ghost of MySQL binlog failing to recover data based on time?

Problem phenomena

  • A customer feedback, when using binlog to do point-based recovery, often fails to correctly restore data (the probability is about 5%~10%).

  • The database environment provides MySQL 5.7.27 version of cascade replication topology (A->B->C), main library A provides write services, read services from library B, and cascade replication provides data backup and recovery services from library C. Among them, AB library is the production service library, and the data backup and recovery service is to take regular and manual snapshots of the data from library C by cascade replication, and then to restore the data based on time point with binlog from library C. When the data can not be recovered correctly, it means that the time point corresponding to the actual recovered data is much earlier than the time point specified by the mysqlbinlog command (for example, the main database data has reached 12:00, the snapshot time from the library C is 10:00, the deadline for executing the recovery operation command is 11:00, and the final recovery is achieved). The time point for the data to be recovered is 10:30; or the final data to be recovered is only snapshot data - that is, only 10:00 data without any binlog compensation.

Rough deduction and analysis

* The step-by-step replication of cascaded replication may result in increased replication latency step by step, or large transactions lead to replication latency, and then, problems caused by delay.
* The system logic of performing backup recovery has problems caused by BUG.
* Are mysqlbinlog parsing tools BUG, or are MySQL Server writing binlogs from the library causing problems with BUG, or are high loads causing MySQL's brain to break?  
* Some of the problems that have not been exposed are caused by the operation.

  • At first glance, it's a little confused, and several possibilities (not very reliable) are quickly deduced in my mind.


  • Obviously, through these simple information and rough deductive analysis, we can not determine the truth of the problem. To determine the truth of the problem, we need more detailed information and reproducible methods. Next, we will follow this routine and take you step by step to analyze the truth of the problem.

1. Information Collection

What information needs to be collected? Here's a general list of some information that might be useful for analyzing problems.

  • Collect data directly related to problem phenomena for intuitive perception of what the problem looks like.

    * What are the operations related to the phenomena of problems and how are they operated?
    * The original command text of binlog and the corresponding binlog original file are parsed from library C.
    * Copy status and delay information from library C.
    * Configure the template from my.cnf of Library C.
    * Transaction information, lock information, show process list thread status information from library C.

  • Collect the host load information from library C and the load information from database for judging some problems that may be caused by high load.

  • Collect MySQL error logs from library C, operating system routine logs, operating system security logs to determine whether there are some errors or BUG is triggered.

After collecting the information listed above, our analysis and investigation work will be more operable, but some of these information may be useless interference information, some may not be the cause of the problem, but can help to correct the direction of the investigation problem (e.g., proved not to be caused by it). After the problem, we will not go in this direction to continue in-depth analysis and investigation. Really useful information may be playing hide-and-seek with you most of the time. So we need to get rid of the false information and preserve the true information. Then, the probability of really useful information lies in the rest of the information. Following this principle, we will first filter out useful information from all the information collected.

PS: Note the Chinese comments in the code section below.

2. Analytical investigation

According to the collected host load and database load information, the host load and database load information from database C are relatively low, and the problems caused by high load are temporarily eliminated.

Judging from the various error logs collected, no error information was found from library C, and the problem caused by MySQL Server's BUG was temporarily excluded (but doubtful).

According to the collected data directly related to the problem phenomenon.

* Evet timestamps for FDE, Previous-GTIDs types are 19:09:10
* The event(GTID, Query, Table_map, Update_rows|Delete_rows|Update_rows, Xid) timestamp for recording data ranges from 19:03:30 to 19:12:34.
* Evet timestamp of Rotate type is 19:12:34

  • No obvious Sao operation was found, which excluded the possibility of human operation.

  • The average latency from the library is 5 to 20 hours (which should be noticed).

  • After parsing and viewing the original binlog file from library C through mysqlbinlog, it is found that most of the contents recorded in binlog are not abnormal, but the sequence of event timestamps in very few binlogs is out of order (which needs attention):

  • The stop-datetime timestamp specified by the mysqlbinlog parsing command is'19:08:10'(to be noted).

  • The configuration template found no exceptions and excluded the possibility of improper parameter configuration.

  • There is no abnormal thread state, no big transaction and no long lock waiting information in the database, which excludes the possibility of large transaction, lock waiting and long uncommitted transaction.

Based on the information gathered above, we can now roughly judge that the replication delay is caused by the replication delay. So, the question is, what is the relationship between the replication delay and the incorrect recovery of data based on time points? What happens to the event timestamp order in binlog? The mysqlbinlog parsing command specifies a timestamp of'19:08:10'. When it encounters an event timestamp of FDE and Previous-GTIDs type at 19:09:10, it is found that the time range condition is not satisfied and the subsequent event of parsing (that is, the event parsing of data section is skipped)? To clarify these three issues, we need to find a way to reproduce the problem phenomenon (because the customer site is not free to toss around, so we need to find their own environment to reproduce the phenomenon, according to the information collected above, reproducing the problem first needs to deliberately create replication delay).

PS: Verbose, how do we judge the validity of the above information? In addition to the need to have a certain level of knowledge, I personally believe that the factors that can make you feel a little uneasy need to be noted, not intentionally or unintentionally ignore some potential risk points.

3. Find the reason

In our own replication environment, we intentionally create replication delays to produce replication delay-related data from libraries (a master-slave replication topology, using sysbench 64 threads to continuously pressurize the libraries).

Stop the SQL thread from the library and manually execute the flush binary logs statement to scroll the log, recording the last binlog after scrolling.

#Stop the SQL thread
root@localhost : (none) 05:21:12> stop slave sql_thread;
Query OK, 0 rows affected (0.00 sec)

#To facilitate subsequent data viewing and avoid interference with previously recorded binlog content, switch binlog manually from the library first
root@localhost : (none) 05:21:31> flush binary logs;
Query OK, 0 rows affected (0.01 sec)

root@localhost : (none) 05:22:00> show binary logs;
| Log_name | File_size |
| mysql-bin.000001 | 60392080 |
| mysql-bin.000010 | 194 |  #After scrolling, the binlog file of the crime is mysql-bin.000010
10 rows in set (0.00 sec)

After about 10 minutes, we started the SQL thread from the library to catch up with the data. When we looked at the replication status, we found that the replication was delayed.

#Start the SQL thread
root@localhost : (none) 05:22:13> start slave sql_thread;
Query OK, 0 rows affected (0.00 sec)

#View replication status
root@localhost : (none) 05:30:08> show slave status\G
*************************** 1. row ***************************
           Slave_IO_State: Waiting for master to send event
              Master_User: repl
              Master_Port: 3306
            Connect_Retry: 10
          Master_Log_File: mysql-bin.000015
      Read_Master_Log_Pos: 375161305
           Relay_Log_File: mysql-relay-bin.000026
            Relay_Log_Pos: 148909788
    Relay_Master_Log_File: mysql-bin.000009
         Slave_IO_Running: Yes
        Slave_SQL_Running: Yes
      Exec_Master_Log_Pos: 148909575
          Relay_Log_Space: 3596749378
    Seconds_Behind_Master: 648  #The delay for successful manufacturing replication is 648 seconds (of course, this delay may not be accurate, but this is not the point here)

Use the mysqlbinlog command to parse and view the contents of the mysql-bin.000010 file from the library.

[root@node3 binlog]# ll
total 4836776
-rw-r----- 1 mysql mysql 536903179 Jul 29 17:31 mysql-bin.000010
-rw-r----- 1 mysql mysql 21004292 Jul 29 17:31 mysql-bin.000011
-rw-r----- 1 mysql mysql 572 Jul 29 17:31 mysql-bin.index

#Parse file mysql-bin.000010
[root@node3 binlog]# mysqlbinlog mysql-bin.000010 > a.sql
#Filter the data rows with server id in a.sql, that is, those text rows that mark event type
[root@node3 binlog]# grep -E 'server id|GTID_NEXT' a.sql > a.txt 

#View the size of two files
[root@node3 binlog]# ls -lh a.*
-rw-r--r-- 1 root root 878M Jul 29 17:32 a.sql
-rw-r--r-- 1 root root 322M Jul 29 17:44 a.txt

Open a.txt file from the library using vim to view all event timestamp information.

[root@node3 binlog]# vim a.txt
#The time stamps of event in binlog are found to be out of order, which are as follows (the time stamps of FDE, Previous-GTIDs are 190729 17:22:00, the GTID, Query, Table_map, Update_rows|Delete_rows|Update_rows, Xid time stamps are 190729 17:19:18 to 190729 17:20:57, and Rotate time stamps are 190729 17:20:57).
#190729 17:22:00 server id 33062 end_log_pos 123 CRC32 0xc3a9ea93 Start: binlog v 4, server v 5.7.27-log created 190729 17:22:00
#190729 17:22:00 server id 33062 end_log_pos 194 CRC32 0x22797305 Previous-GTIDs
#190729 17:19:18 server id 33061 end_log_pos 259 CRC32 0xd48537a6 GTID last_committed=0 sequence_number=1 rbr_only=yes
SET @@SESSION.GTID_NEXT= 'd9c5c228-b1e0-11e9-b14d-525400c33752:258161'/*!*/;
#190729 17:19:18 server id 33061 end_log_pos 327 CRC32 0x78818eeb Query thread_id=364 exec_time=650 error_code=0
#190729 17:19:18 server id 33061 end_log_pos 451 CRC32 0x6ca05182 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:19:18 server id 33061 end_log_pos 867 CRC32 0xce1dc10f Update_rows: table id 115 flags: STMT_END_F
#190729 17:19:18 server id 33061 end_log_pos 986 CRC32 0xf4d017ed Table_map: `sbtest`.`sbtest2` mapped to number 114
#190729 17:19:18 server id 33061 end_log_pos 1211 CRC32 0xbbaebd66 Delete_rows: table id 114 flags: STMT_END_F
#190729 17:19:18 server id 33061 end_log_pos 1541 CRC32 0x14f2e853 Table_map: `sbtest`.`sbtest2` mapped to number 114
#190729 17:19:18 server id 33061 end_log_pos 1766 CRC32 0xd9080627 Write_rows: table id 114 flags: STMT_END_F
#190729 17:19:18 server id 33061 end_log_pos 1797 CRC32 0x1d0265e4 Xid = 1084024
#190729 17:20:57 server id 33061 end_log_pos 536900936 CRC32 0x12796d01 GTID last_committed=260141 sequence_number=260157 rbr_only=yes
SET @@SESSION.GTID_NEXT= 'd9c5c228-b1e0-11e9-b14d-525400c33752:518317'/*!*/;
#190729 17:20:57 server id 33061 end_log_pos 536901004 CRC32 0x8eb6df8a Query thread_id=360 exec_time=622 error_code=0
#190729 17:20:57 server id 33061 end_log_pos 536901128 CRC32 0x3e09a82e Table_map: `sbtest`.`sbtest2` mapped to number 114
#190729 17:20:57 server id 33061 end_log_pos 536901544 CRC32 0xa97d84de Update_rows: table id 114 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 536901786 CRC32 0x35aec8ff Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:20:57 server id 33061 end_log_pos 536902202 CRC32 0x05064820 Update_rows: table id 115 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 536902321 CRC32 0x293c82e4 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:20:57 server id 33061 end_log_pos 536902546 CRC32 0xe4bfcf8b Delete_rows: table id 115 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 536902876 CRC32 0x1fbc35a7 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:20:57 server id 33061 end_log_pos 536903101 CRC32 0x45c6cc8e Write_rows: table id 115 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 536903132 CRC32 0xc6b80fb0 Xid = 2284911
#190729 17:20:57 server id 33062 end_log_pos 536903179 CRC32 0x23a1bcf6 Rotate to mysql-bin.000011 pos: 4

Happy + excited, quickly parse from the library with the mysqlbinlog command, here only need to specify -- stop-datetime timestamp less than 190729 17:22:00, but greater than 190729 17:19:18, and then parse the data to empty, confirm that the problem recurs.

#Here we specify -- stop-datetime timestamp 190729 17:20:00
[root@node3 binlog]# mysqlbinlog --stop-datetime='2019-07-29 17:20:00' mysql-bin.000010 > b.sql  

[root@node3 binlog]# ls -lh b.sql
-rw-r--r-- 1 root root 336 Jul 29 17:37 b.sql

In the slave library, use vim to open b.sql, hey, as it turns out, there is no data, and the problem phenomenon is confirmed and reproduced.

#There is no data related event.
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
# End of log file

so... Here, we succeeded in reproducing the phenomenon of the problem, but for the three problems mentioned above, now, it seems that none of them has been solved. Now, let's review the three problems.

* Now it seems that there is no way to solve this problem.

* Now we can take a step forward on this issue. Select one or two GTID transaction numbers from a.txt file (the GTIDs of the two transactions in the content code section of the aforementioned a.txt file are d9c5c228-b1e0-11e9-b14d-525400c33752:258161 and d9c5c228-b1e0-11e9-b14d-525400c33752:518317 respectively), and then parse the log records corresponding to the GTID transaction number in the main library to see the binlog records in the main library. What kind of timestamp does a transaction correspond to?

* To solve this problem, you can now determine which event to exit by looking at the source code of the mysqlbinlog command and debugging with gdb. (Many people may say that mysqlbinlog exits when parsing events of FDE and Previous-GTIDs type, which seems to be true, but... To be technically rigorous, there must be conclusive evidence, so it is not recommended to draw a hasty conclusion at this time.

  • First question: What is the relationship between replication latency and incorrect point-based recovery data?  

  • Second question: What happens to the event timestamp order disorder in binlog?  

  • The third question is: when the time stamp of the mysqlbinlog parsing command is'2019-07-29 17:20:00', does it happen that the event time stamp of FDE and Previous-GTIDs type is 190729 17:22:00, when the time range condition is not satisfied, the subsequent event of parsing is skipped directly? "

Now, let's start with the second problem, using the GTID string d9c5c228-b1e0-11e9-b14d-525400c33752:258161:518317 to parse the binlog of the main library.

#Embarrassingly, we don't know which binlog of the main library these two GTIDs are in. We can only start with the first binlog, parse the main binlog one by one, and look at the GTID SET in the Previous-GTIDs event in the first 50 rows. By parsing the binlog one by one, we find that the two GTIDs we need to find are hidden in the mysql-bi of the main library. N.000009 and mysql-bin.000010 binary log files
#Parsing mysql-bin.000009 file
[root@node2 binlog]# mysqlbinlog mysql-bin.000009 |head -50  
# at 123
#190729 17:19:01 server id 33061 end_log_pos 194 CRC32 0xf53f1220 Previous-GTIDs
# d9c5c228-b1e0-11e9-b14d-525400c33752:1-214532

#Parsing mysql-bin.000010 file
[root@node2 binlog]# mysqlbinlog mysql-bin.000010 |head -50
# at 123
#190729 17:20:41 server id 33061 end_log_pos 194 CRC32 0x27fe0eac Previous-GTIDs
# d9c5c228-b1e0-11e9-b14d-525400c33752:1-474622

#Parsing mysql-bin.000011 file
[root@node2 binlog]# mysqlbinlog mysql-bin.000011 |head -50 
# at 123
#190729 17:22:17 server id 33061 end_log_pos 194 CRC32 0x1457abd9 Previous-GTIDs
# d9c5c228-b1e0-11e9-b14d-525400c33752:1-731105

In the main library, mysql-bin.000009 and mysql-bin.000010 binary log files are parsed using MySQL inlog, and GTID is specified to contain d9c5c228-b1e0-11e9-b14d-525400c33752:258161:518317.

#Parsing binlog
[root@node2 binlog]# mysqlbinlog mysql-bin.000009 mysql-bin.000010 --include-gtids='d9c5c228-b1e0-11e9-b14d-525400c33752:258161:518317' > c.sql

#Filter out information rows of event event type
[root@node2 binlog]# grep -E 'server id|GTID_NEXT' c.sql > c.txt

#Check the sizes of c.sql and c.txt files
[root@node2 binlog]# ls -lh c.*
-rw-r--r-- 1 root root 102M Jul 29 18:16 c.sql
-rw-r--r-- 1 root root 3.3K Jul 29 18:18 c.txt

#vim Views the contents of the c.txt file
[root@node2 binlog]# vim c.txt 
#190729 17:19:01 server id 33061 end_log_pos 123 CRC32 0xfade248c Start: binlog v 4, server v 5.7.27-log created 190729 17:19:01
#190729 17:19:01 server id 33061 end_log_pos 194 CRC32 0xf53f1220 Previous-GTIDs

#The binlog event timestamp with GTID d9c5c228-b1e0-11e9-b14d-525400c33752:258161 is identical to the binlog event timestamp in the slave Library
#190729 17:19:18 server id 33061 end_log_pos 88394786 CRC32 0x7a13b001 GTID last_committed=43590 sequence_number=43629 rbr_only=yes
SET @@SESSION.GTID_NEXT= 'd9c5c228-b1e0-11e9-b14d-525400c33752:258161'/*!*/;
#190729 17:19:18 server id 33061 end_log_pos 88394865 CRC32 0x1cb7f558 Query thread_id=364 exec_time=0 error_code=0
#190729 17:19:18 server id 33061 end_log_pos 88394989 CRC32 0xf07fac6f Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:19:18 server id 33061 end_log_pos 88395405 CRC32 0x8a0adc66 Update_rows: table id 115 flags: STMT_END_F
#190729 17:19:18 server id 33061 end_log_pos 88395524 CRC32 0x60282fdb Table_map: `sbtest`.`sbtest2` mapped to number 114
#190729 17:19:18 server id 33061 end_log_pos 88395749 CRC32 0xe041108a Delete_rows: table id 114 flags: STMT_END_F
#190729 17:19:18 server id 33061 end_log_pos 88396079 CRC32 0x1d8f8fd2 Table_map: `sbtest`.`sbtest2` mapped to number 114
#190729 17:19:18 server id 33061 end_log_pos 88396304 CRC32 0x704462b2 Write_rows: table id 114 flags: STMT_END_F
#190729 17:19:18 server id 33061 end_log_pos 88396335 CRC32 0xbf367aab Xid = 5087302
#190729 17:20:41 server id 33061 end_log_pos 536892835 CRC32 0x864ece23 Rotate to mysql-bin.000010 pos: 4
#190729 17:20:41 server id 33061 end_log_pos 123 CRC32 0x489dccb5 Start: binlog v 4, server v 5.7.27-log created 190729 17:20:41
#190729 17:20:41 server id 33061 end_log_pos 194 CRC32 0x27fe0eac Previous-GTIDs

#The binlog event timestamp with GTID d9c5c228-b1e0-11e9-b14d-525400c33752:518317 is still identical to the binlog event timestamp in the slave Library
#190729 17:20:57 server id 33061 end_log_pos 91264585 CRC32 0xc7ea8f93 GTID last_committed=43656 sequence_number=43695 rbr_only=yes
SET @@SESSION.GTID_NEXT= 'd9c5c228-b1e0-11e9-b14d-525400c33752:518317'/*!*/;
#190729 17:20:57 server id 33061 end_log_pos 91264664 CRC32 0x7b4539d5 Query thread_id=360 exec_time=0 error_code=0
#190729 17:20:57 server id 33061 end_log_pos 91264788 CRC32 0xf11c5c66 Table_map: `sbtest`.`sbtest2` mapped to number 114
#190729 17:20:57 server id 33061 end_log_pos 91265204 CRC32 0x856ae255 Update_rows: table id 114 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 91265446 CRC32 0xdcd7ee6b Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:20:57 server id 33061 end_log_pos 91265862 CRC32 0x3a8a4aa2 Update_rows: table id 115 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 91265981 CRC32 0xf51f1fc2 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:20:57 server id 33061 end_log_pos 91266206 CRC32 0xc52d3e94 Delete_rows: table id 115 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 91266536 CRC32 0xff5e2784 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:20:57 server id 33061 end_log_pos 91266761 CRC32 0x708a7aaa Write_rows: table id 115 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 91266792 CRC32 0xcde1e798 Xid = 10289372

By comparing the binlog event timestamps of the two transactions in the binlog of the master and slave libraries, it is found that the binlog event timestamps of the two transactions in the slave libraries are the original timestamps of the transaction in the binlog of the independent libraries (the event timestamps corresponding to these transactions will not be changed by other database instances in the replication topology). But why does the FDE, Previous-GTIDs timestamp from the binlog file of the library outweigh the event timestamp of the transaction? Is this the case for every binlog from the repository itself, after delays in inventory? With doubt, continue to parse mysql-bin.000011, mysql-bin.000012 from the library.

#Parsing mysql-bin.000011
[root@node3 binlog]# mysqlbinlog mysql-bin.000011 |grep -E 'server id|GTID_NEXT' > d.txt

# vim looks at the contents of d.txt file and finds no case that the time stamps of FDE, Previous-GTIDs are larger than the binlog event time stamps of transactions (WTF..., is this a magic horse situation?).
[root@node3 binlog]# vim d.txt
#190729 17:20:57 server id 33062 end_log_pos 123 CRC32 0xa03c5bbc Start: binlog v 4, server v 5.7.27-log created 190729 17:20:57
#190729 17:20:57 server id 33062 end_log_pos 194 CRC32 0xfff79f0a Previous-GTIDs
#190729 17:20:57 server id 33061 end_log_pos 259 CRC32 0x37ef05b1 GTID last_committed=0 sequence_number=1 rbr_only=yes
SET @@SESSION.GTID_NEXT= 'd9c5c228-b1e0-11e9-b14d-525400c33752:518318'/*!*/;
#190729 17:20:57 server id 33061 end_log_pos 327 CRC32 0xe88d5be8 Query thread_id=332 exec_time=622 error_code=0
#190729 17:20:57 server id 33061 end_log_pos 451 CRC32 0x3f4748f8 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:20:57 server id 33061 end_log_pos 867 CRC32 0x6cab4c30 Update_rows: table id 115 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 986 CRC32 0x8f8db6cc Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:20:57 server id 33061 end_log_pos 1211 CRC32 0x49418317 Delete_rows: table id 115 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 1541 CRC32 0x6faf4972 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:20:57 server id 33061 end_log_pos 1766 CRC32 0x35b274e6 Write_rows: table id 115 flags: STMT_END_F
#190729 17:20:57 server id 33061 end_log_pos 1797 CRC32 0x70757956 Xid = 2284921
#190729 17:22:34 server id 33062 end_log_pos 536885932 CRC32 0xe8e5a7b3 Rotate to mysql-bin.000012 pos: 4

#Continue parsing mysql-bin.000012
[root@node3 binlog]# mysqlbinlog mysql-bin.000012 |grep -E 'server id|GTID_NEXT' > e.txt 

#vim looks at the contents of e.txt file and still finds no case where the time stamps of FDE, Previous-GTIDs are larger than the binlog event timestamps of transactions
[root@node3 binlog]# vim e.txt
#190729 17:22:34 server id 33062 end_log_pos 123 CRC32 0x55c719a6 Start: binlog v 4, server v 5.7.27-log created 190729 17:22:34
#190729 17:22:34 server id 33062 end_log_pos 194 CRC32 0x246c1b20 Previous-GTIDs
#190729 17:22:34 server id 33061 end_log_pos 259 CRC32 0x5dbbac77 GTID last_committed=0 sequence_number=1 rbr_only=yes
SET @@SESSION.GTID_NEXT= 'd9c5c228-b1e0-11e9-b14d-525400c33752:776005'/*!*/;
#190729 17:22:34 server id 33061 end_log_pos 327 CRC32 0x86ca88e5 Query thread_id=331 exec_time=593 error_code=0
#190729 17:22:34 server id 33061 end_log_pos 451 CRC32 0xc2c13e95 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:22:34 server id 33061 end_log_pos 867 CRC32 0xd95ec374 Update_rows: table id 115 flags: STMT_END_F
#190729 17:22:34 server id 33061 end_log_pos 1109 CRC32 0xc698d4ca Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:22:34 server id 33061 end_log_pos 1525 CRC32 0x893ec0b8 Update_rows: table id 115 flags: STMT_END_F
#190729 17:22:34 server id 33061 end_log_pos 1644 CRC32 0x503ef822 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:22:34 server id 33061 end_log_pos 1869 CRC32 0x56078d13 Delete_rows: table id 115 flags: STMT_END_F
#190729 17:22:34 server id 33061 end_log_pos 2199 CRC32 0xb134e795 Table_map: `sbtest`.`sbtest1` mapped to number 115
#190729 17:22:34 server id 33061 end_log_pos 2424 CRC32 0x0bf4a350 Write_rows: table id 115 flags: STMT_END_F
#190729 17:22:34 server id 33061 end_log_pos 2455 CRC32 0x4d065250 Xid = 3484259
#190729 17:24:12 server id 33062 end_log_pos 536888910 CRC32 0x029b5df4 Rotate to mysql-bin.000013 pos: 4

Is there a problem with the sequence of event s in the mysql-bin.000010 file for Mao Cong Library? And the timestamp in all subsequent binlog files is okay? What's the difference between them? Think about it, it seems to recall that mysql-bin.000010 is generated when the flush binary logs statement is manually executed to switch, while the subsequent other binlog files are automatically switched. How to switch automatically? When the binlog size reaches the max_binlog_size system variable setting, automatic switching will be triggered. Let's look at the size of the binlog file and the settings of the max_binlog_size system variable from the library.

#Setting of max_binlog_size system variables
root@localhost : (none) 06:56:20> show variables like '%max_binlog_size%';
| Variable_name | Value |
| max_binlog_size | 536870912 |  #Converting to M units is 512M
1 row in set (0.00 sec)

#View the size of the binlog file
[root@node3 binlog]# ls -lh                                                                                                                                                                                                                   
total 9.4G
-rw-r----- 1 mysql mysql 176615208 Jul 29 17:22 mysql-bin.000009  #In mysql-bin.000009, flush binary logs were executed to switch to mysql-bin.000010, so the file size of mysql-bin.000009 is much smaller.
-rw-r----- 1 mysql mysql 536903179 Jul 29 17:31 mysql-bin.000010  #Converting to M unit is 512M, which just meets the settings of max_binlog_size system variables
-rw-r----- 1 mysql mysql 536885932 Jul 29 17:32 mysql-bin.000011
-rw-r----- 1 mysql mysql 536888910 Jul 29 17:33 mysql-bin.000012
-rw-r----- 1 mysql mysql 536887458 Jul 29 17:34 mysql-bin.000013
-rw-r----- 1 mysql mysql 536888575 Jul 29 17:35 mysql-bin.000014
-rw-r----- 1 mysql mysql 536903940 Jul 29 17:37 mysql-bin.000015
-rw-r----- 1 mysql mysql 491243565 Jul 29 18:14 mysql-bin.000016
-rw-r----- 1 mysql mysql 832 Jul 29 17:37 mysql-bin.index

So far, we have gone a step further and learned that the time stamp disorder in the binlog file mysql-bin.000010 from the library is caused by the manual execution of the flush binary logs statement to switch logs, whereas other binlogs are automatically generated by triggering scroll. But... The time stamps of FDE and Previous-GTIDs in binlog generated by these two methods must be different. If we want to find out the time stamps of FDE and Previous-GTIDs in two cases directly, it may be more complicated. Here we use the counter-evidence method. In the case of delay in replication, we first put the replication thread from the library. Stop and execute flush binary logs; discover that FDE and Previous-GTIDs timestamps are the current timestamps of the library instance, and then restart the replication thread from the library so that the size of the binlog file from the library can automatically switch when max_binlog_size system variables are set. Find that FDE and Previous-GTIDs timestamps are from the last bin. The event timestamp corresponding to the last transaction in the log file (that is, the FDE and Previous-GTIDs timestamps in the automatically switched binlog follow the main library binlog event, the event timestamp in the whole binlog does not appear disorderly, and it is impossible to ask the event that cannot resolve the transaction when using mysqlbinlog to parse the binlog). Interested students can verify it by themselves, which will not be repeated here.

The second problem has been solved here. Now, how can we solve the third problem? Looking at the source code and doing the gdb debugging, we temporarily invited a mysterious friend to help us (at the end of the article, we also provided a reference link to the mysqlbinlog document). Here is the key evidence for the exit position of the mysqlbinlog command. FDE and Previous-GTIDs timestamps are 1908250:01:37, event timestamps are 190724:14:07:36, and mysqlbinlog parsing binlog specifies -- stop-datetime time is 2019-07-2500:00 (Friendship hint: The data in this figure is the environment of this mysterious friend, which does not correspond to the data in the context of this article).

  • The logical code for processing timestamps in the source file mysqlbinlog.cc is as follows.

  • The event data in the binlog of debugging test data is as follows.

  • The command to perform mysqlbinlog parsing is as follows.

  • The debugging information of gdb is as follows. It can be seen intuitively that when dealing with event of FORMAT_DESCRIPTION_EVENT (FDE), go to end exits directly.

Now that the third problem has been solved, the first one remains: What is the relationship between replication delay and incorrect point-based recovery? We have solved the second and third questions. The first question seems to be the answer. Let's reorganize the information above.

  • During sysbench manometry, we first stop the SQL thread, then manually execute the flush binary logs statement to switch the binlog. At this time, the switched binlog file is named mysql-bin.000010. The FDE and Previous-GTIDs timestamps in the binlog are from the current timestamp of the library (190729 17:22:00). At this time, the SQL thread is not started. That is to say, the binlog from the main library in relay log will not be applied, that is to say, no data event s will be written from the binlog of the library itself.

  • Then, after about 10 minutes, we started the SQL thread and started applying the binlog from the main library. The event timestamp related to the first transaction in the mysql-bin.000010 file was 190729 17:19:18 (it is the timestamp from the main library when writing the binlog, and it will not change because of any operation from the slave library). That is to say, this is the case. The event timestamp in mysql-bin.000010 is in a disordered state.

  • Then, when parsing binlog with mysqlbinlog, the time stamp of stop-datetime is designated as 2019-07-2917:20:00, which is less than the time stamp of FDE and Previous-GTIDs 190729:17:22:00. According to the logic of the debugging of gdb above, the time stamp specified is less than the time stamp of FDE, which does not satisfy the conditions and no longer scans backwards (even later). The timestamp of the continued event is 190729 17:19:18, and it's no use meeting the timestamp specified by mysqlbinlog. The mysqlbinlog command exits the parsing task.

So far, the third problem has been solved.

  • The reason for incorrect point-based data recovery is that the flush binary logs statement is manually executed to switch the binlog when replication delay occurs from the inventory (note here: in the case of replication delay from the inventory, if the process of the slave database is restarted, the timestamps in the binlog will still be out of order), which leads to bin. The timestamp in the log file is out of order, and then the mysqlbinlog tool specifies an awkward timestamp (stuck in a timestamp smaller than FDE and Previous-GTIDs but larger than the timestamp of data event), which causes the parsing task to exit and fails to parse the event of data correctly.

PS: So far, we have identified two direct causes of time stamp disorder from the library binlog through the above series of analysis and checking steps.

  • Manually execute the flush bianry logs statement.

  • Restart the slave database process.

4. Question Tracking

At this moment, is it possible to finish the work? If it's pure technology research, well, it shouldn't be a big problem, but it's not a simple technology problem, but a practical problem discovered by the customer site. We need to find out where the flush statement is executed in the actual environment or restart the process from the database, and study the preventive strategy. Therefore, we need to make a second confirmation with the customer site.

It has been confirmed again that the flush binary logs statement has not been executed manually and the slave process has been restarted once or twice. However, the one or two restart operations obviously do not match the probabilities of 5% to 10% of the problem phenomena. There must be some mystery among them, which is where the flush binary logs statement has been executed manually, but at this time, yes. Few factors have been suspected. In fact, where can be suspected, the first one should be the code of the database management system.

Feedback to the relevant R&D staff suggests checking whether flush binary logs or flush logs statements are executed in the code and whether unnecessary restart of the slave library process has been done. The next day, they did find a flush logs statement after the global read lock when snapshot the data volume in the code. This operation is completely unnecessary for the database to implement the function logic of restoring data based on time points, and flush statements can be deleted. But can we immediately suggest that we delete the flush statement from the management system code? As a rigorous technical practitioner, we should not intentionally or unintentionally ignore the potential risk points. You should advise developers to track code submissions to see when and who submitted them, then confirm why this action was added, and then evaluate whether flush statement-related code needs to be deleted.

About a day later, the research and development partners were surprised to find that the code related to the flush statement was submitted 3.5 years ago. The submission annotation said that it was only for the purpose of snapshot, but no one knew why the flush operation was added. What can we do? Finally, we suggest that we develop and delete the code related to flush statements. Then, the whole management system goes through the routine function test once, then goes through the stress test again, until all the tests are finished, until there is no problem that "time-based recovery data" can not recover data correctly, and no other new problems arise. . Then, I suggest that customers, unless necessary, do not arbitrarily execute flush statements from the library, do not restart the slave library at will, or what impact it may have, such as Barbara Barbara...

At this point, you can issue relevant report documents and submit your homework.

Reference link:

Eight Eccentrics in Chongqing (Gao Peng):




Author's Brief Introduction

Luo Xiaobo-Waukee Senior Database Technology Expert

IT has been working for many years, mainly responsible for MySQL product database support and after-sales support. I have participated in the design and compilation of version publishing system, lightweight monitoring system, operation and maintenance management platform and database management platform. I am familiar with MySQL architecture, Innodb storage engine, and like to specialize in open source technology. I have done many offline database topic sharing in public, and published many database-related research articles. .

Tags: MySQL mysqlbinlog SQL Database

Posted on Thu, 01 Aug 2019 20:37:55 -0700 by dougp23