Mysql master replication failed

Mysql master replication failed

Fault description

Cause description Because of the aging of the cabinet PDU, the whole cabinet power off.
Failure time 20160923-10:09
Discovery time 20160929-13:56

Architecture information

Tomcat Memcache Keepalive Mysql master replication

Node information

Serial number Node name IP address Wrong information
1 aipprd1 10.66.1.52 Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000084' at 91941417, the last event read from '/aip/mysql/data/log/mysql-bin.000084' at 91941783, the last byte read from '/aip/mysql/data/log/mysql-bin.000084' at 91942912.'
2 aipprd2 10.66.1.51 Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000082' at 6369026, the last event read from '/aip/mysql/data/log/mysql-bin.000082' at 6369026, the last byte read from '/aip/mysql/data/log/mysql-bin.000082' at 6369280.'

fault analysis

  • Because of the Mysql monitoring script of Zabbix, no trigger event was triggered, so it was not discovered until 20160 929 when Zabbix log was checked. At that time, Keepalive's VIP was on aipprd1, on this node, the database was served externally.
  • First of all, aipprd1 is the main component. First, aipprd2 is synchronized from the environment.
  • After aipprd2 is synchronized from the environment, aipprd1 is synchronized from the environment.

Synchronized slave environment for AIPPRD2

  • Check the Slave status of aipprd2
    According to the description of the code area, Last_Errno's error code is 106 2, requiring manual modification of Position.

    [root@aipprd2 ~]# mysql -uzabbixmoniter -ppassw0rd -hlocalhost -e "show slave status\G;"
    *************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 10.66.1.52
                  Master_User: root
                  Master_Port: 3388
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000082
          Read_Master_Log_Pos: 6369026
               Relay_Log_File: mysql-relay-bin.000257
                Relay_Log_Pos: 8999
        Relay_Master_Log_File: mysql-bin.000082
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
    Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1062
                   Last_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession (     sessionid, username, account,      createtime, loginip,userid,explorer,userDomain,computerName,computerUserName)   VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', 'Li Hua', 'XS003_4200',      '2016-09-23 12:10:33', , '10752920','MSIE 7.0','','','')'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 6367510
              Relay_Log_Space: 11099
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000082' at 6369026, the last event read from '/aip/mysql/data/log/mysql-bin.000082' at 6369026, the last byte read from '/aip/mysql/data/log/mysql-bin.000082' at 6369280.'
               Last_SQL_Errno: 1062
               Last_SQL_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession (     sessionid, username, account,      createtime, loginip,userid,explorer,userDomain,computerName,computerUserName)   VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', 'Li Hua', 'XS003_4200',      '2016-09-23 12:10:33', , '10752920','MSIE 7.0','','','')'
    Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
  • Modify on aipprd2 according to Pos file and PS in Last_IO_Error.
    After Pos file and PS are modified according to the prompt given by the error report, the error will still be reported.

    mysql> slave stop;
    mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root',     master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=6369026;
    mysql> slave start;
  • Check Mysql's log in aipprd2
    Mysql logs record the start time of Crash and give suggestions, Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log'mysql-bin.000082'position 6367510

    160923 12:35:22 [Note] Starting crash recovery...
    160923 12:35:22 [Note] Crash recovery finished.
    160923 12:36:27 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000082' position 6367510
  • On aipprd2, follow the Pos file and PS changes recommended in the log.
    After the modification of Pos file and PS recommended in the log, the error is still reported.

    mysql> slave stop;
    mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=6367510;
    mysql> slave start;
  • Check posfile on aipprd1
    First, check show slave status G; the pos given in the log, and find that it does not exist at all.

    [root@aipprd1 log]# mysqlbinlog --no-defaults --start-position=6369026 mysql-bin.000083 
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    
    # at 4
    
    
    #160923 12:38:59 server id 1  end_log_pos 107  Start: binlog v 4, server v 5.5.24-log           created 160923 12:38:59 at startup
    
    ROLLBACK/*!*/;
    BINLOG '
    47HkVw8BAAAAZwAAAGsAAAAAAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAADjseRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
    '/*!*/;
    ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 912208946, event_type:      71
    ERROR: Could not read entry at offset 6369026: Error in log format or read error.
    DELIMITER 
    
    # End of log file
    
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

    After checking the suggested pos in Mysql log, we found that this record exists in posfile, and the last pos of this posfile is 6368660, while show slave status G; pos is 6369026, which obviously does not exist in the log file.

    [root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position=6367510 mysql-bin.000082 
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    
    # at 4
    
    
    #160923 10:39:15 server id 1  end_log_pos 107  Start: binlog v 4, server v 5.5.24-log created 160923 10:39:15 at startup
    
    
    # Warning: this binlog is either in use or was not closed properly.
    
    ROLLBACK/*!*/;
    BINLOG '
    05XkVw8BAAAAZwAAAGsAAAABAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAADTleRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
    '/*!*/;
        # at 6367510
        # at 6367585
        # at 6367963
        # at 6367990
        # at 6368073
        # at 6368163
        # at 6368275
        # at 6368358
        # at 6368558
        # at 6368585
        # at 6368660
        # End of log file
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

    Finally, show slave status G is checked in mysql-bin.000083, suggesting that pos:6369026 does not exist.

    [root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position=6369026 mysql-bin.000083 
  • Re-initiate modification of posfile and pos operations on aipprd2
    Check the logs on aipprd1. Since no pos:6369026 exists at the end of mysql-bin.000082 and mysql-bin.000083 is the next log, the operation of modifying posfile and POS is re-initiated.

    [root@aipprd1 log]# ll
    total 526908
    -rw-rw---- 1 mysql mysql 104858346 Sep 19 23:04 mysql-bin.000079
    -rw-rw---- 1 mysql mysql 104859112 Sep 22 13:01 mysql-bin.000080
    -rw-rw---- 1 mysql mysql  42558345 Sep 23 10:23 mysql-bin.000081
    -rw-rw---- 1 mysql mysql   6369280 Sep 23 12:13 mysql-bin.000082
    -rw-rw---- 1 mysql mysql 104857953 Sep 26 15:02 mysql-bin.000083
    -rw-rw---- 1 mysql mysql  70844542 Sep 27 17:42 mysql-bin.000084
    -rw-rw---- 1 mysql mysql 104858026 Sep 29 15:48 mysql-bin.000085
    -rw-rw---- 1 mysql mysql    297019 Sep 29 15:54 mysql-bin.000086
    -rw-rw---- 1 mysql mysql       296 Sep 29 15:48 mysql-bin.log.index
    -rw-rw---- 1 mysql mysql       107 Sep 27 18:05 mysql-relay-bin.000182
    -rw-rw---- 1 mysql mysql        43 Sep 27 18:05 mysql-relay-bin.index

    Modify posfile to mysql-bin.000083 and POS to 0. After starting Slave, the synchronization is now normal.

    mysql> slave stop;
    mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000083', master_log_pos=0;
    mysql> slave start;
    mysql> show slave status\G;
    *************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.66.1.52
                  Master_User: root
                  Master_Port: 3388
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000086
          Read_Master_Log_Pos: 18585
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 8119286
        Relay_Master_Log_File: mysql-bin.000083
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
    Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 8119140
              Relay_Log_Space: 280580804
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 521370
    Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
    Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
    1 row in set (0.00 sec)
  • In the process of synchronization, several Last_SQL_Error: Error'Duplicate entry 1026's SQL Error is found. This is because the repeated primary key causes Slave to stop working and perform the following operations to solve (if there are multiple repeated primary keys, it needs to be executed many times):

    mysql> slave stop;
    mysql> set GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
    mysql> slave start;

    Another way is to modify the mysql configuration file/etc/my.cnf and add a line slave_skip_errors = 1062 under [mysqld]. After saving, restart mysql and mysql slave can be synchronized properly.

Synchronizing the slave environment of AIPPRD1

  • Check the Slave status of aipprd1

    mysql> show slave status\G;
    *************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 10.66.1.51
                  Master_User: root
                  Master_Port: 3388
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000084
          Read_Master_Log_Pos: 91941783
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 296
        Relay_Master_Log_File: mysql-bin.000084
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
    Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 91941783
              Relay_Log_Space: 452
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000084' at 91941417, the last event read from '/aip/mysql/data/log/mysql-bin.000084' at 91941783, the last byte read from '/aip/mysql/data/log/mysql-bin.000084' at 91942912.'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
    Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
    1 row in set (0.00 sec)
  • Check log files on aipprd2
    After checking show slave status G on aipprd1, it is suggested that posfile needs to be modified to mysql-bin.000084 and pos91942912, because when aipprd2 is synchronized, the actual synchronized data comes from aipprd1, which exists on aipprd1 itself.

    [root@aipprd2 log]# ll
    total 653832
    -rw-rw---- 1 mysql mysql 104857774 Sep 21 13:12 mysql-bin.000083
    -rw-rw---- 1 mysql mysql  91942912 Sep 23 10:21 mysql-bin.000084
    -rw-rw---- 1 mysql mysql   6295552 Sep 23 12:10 mysql-bin.000085
    -rw-rw---- 1 mysql mysql      8250 Sep 27 17:39 mysql-bin.000086
    -rw-rw---- 1 mysql mysql 104858014 Sep 29 17:20 mysql-bin.000087
    -rw-rw---- 1 mysql mysql 104857971 Sep 29 19:18 mysql-bin.000088
    -rw-rw---- 1 mysql mysql  55744926 Sep 29 20:23 mysql-bin.000089
    -rw-rw---- 1 mysql mysql       259 Sep 29 19:18 mysql-bin.log.index
    -rw-rw---- 1 mysql mysql  70844718 Sep 29 15:35 mysql-relay-bin.000005
    -rw-rw---- 1 mysql mysql 104857729 Sep 29 15:44 mysql-relay-bin.000006
    -rw-rw---- 1 mysql mysql       708 Sep 29 15:44 mysql-relay-bin.000007
    -rw-rw---- 1 mysql mysql       199 Sep 29 15:44 mysql-relay-bin.000008
    -rw-rw---- 1 mysql mysql  25188420 Sep 30 08:40 mysql-relay-bin.000009
    -rw-rw---- 1 mysql mysql       215 Sep 29 18:42 mysql-relay-bin.index

    Check show master status on aipprd2; then record Posfile as mysql-bin.000089, since aipprd1's data is up-to-date and aipprd2 has been synchronized since aipprd1 (by checking show slave status G; Seconds_Behind_Master:, if this value is small, it should be synchronized.), then the data on both sides should be similar.

    mysql> show master status;
    +------------------+-----------+--------------+------------------+
    | File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
    +------------------+-----------+--------------+------------------+
    | mysql-bin.000089 | 104341982 |              |                  |
    +------------------+-----------+--------------+------------------+
    1 row in set (0.00 sec)
    
  • Initiate modification of posfile and pos operations on aipprd1
    So here we use posfile's mysql-bin.000089 and pos's 0 to modify, start Slave, start synchronization.

    mysql> slave stop;
    Query OK, 0 rows affected (0.11 sec)
    mysql> CHANGE MASTER TO master_host='10.66.1.51', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000089', master_log_pos=0;
    Query OK, 0 rows affected (0.06 sec)
    mysql> slave start;
    Query OK, 0 rows affected (0.00 sec)
    mysql> show slave status\G;
    *************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.66.1.51
                  Master_User: root
                  Master_Port: 3388
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000089
          Read_Master_Log_Pos: 55744926
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000089
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
    Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 55744926
              Relay_Log_Space: 409
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
    Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
    1 row in set (0.00 sec)

Tags: MySQL Ruby Attribute mysqlbinlog

Posted on Sat, 23 Mar 2019 09:06:54 -0700 by mapleshilc