Analysis of gh-ost Principle

gh-ost principle

Introduction

The last article introduced gh-ost parameters and specific methods of use, as well as the core features - dynamic adjustable pause, dynamic modification parameters and so on. This paper explains gh-ost's execution process, data migration and handover details design from the source code.

Two principle

2.1 Implementation Process

This example is based on the core process of executing ddl records on the main library. The core code is

github.com/github/gh-ost/go/logic/migrator.go Of Migrate()

func (this *Migrator) Migrate() //Migrate executes the complete migration logic. This is the major gh-ost function.

1 Check the basic information of database instances

a test db Connectivity or not,
b Permission verification 
  show grants for current_user()
c Obtain binlog Related information,Include row Format and modification binlog Restart after formatting replicate
  select @@global.log_bin, @@global.binlog_format
  select @@global.binlog_row_image
d Is the original table storage engine innodb,Checklist-related foreign keys,Is there a trigger?,There are two ways to estimate the number of rows, one is through explain Read the execution plan and the other is select count(*) from table ,Encounter hundreds of G Big watch, the latter must be very slow.

  explain select /* gh-ost */ * from `test`.`b` where 1=1

2. Simulate slave, get current site information, and create binlog streamer to listen for binlog

2019-09-08T22:01:20.944172+08:00    17760 Query    show /* gh-ost readCurrentBinlogCoordinates */ master status
2019-09-08T22:01:20.947238+08:00    17762 Connect    root@127.0.0.1 on  using TCP/IP
2019-09-08T22:01:20.947349+08:00    17762 Query    SHOW GLOBAL VARIABLES LIKE 'BINLOG_CHECKSUM'
2019-09-08T22:01:20.947909+08:00    17762 Query    SET @master_binlog_checksum='NONE'
2019-09-08T22:01:20.948065+08:00    17762 Binlog Dump    Log: 'mysql-bin.000005'  Pos: 795282

3 Create log table xx_ghc and shadow table xx_gho and execute alter statement to change shadow table to target table structure. The following log records the process, and gh-ost records the core steps in _b_ghc.

2019-09-08T22:01:20.954866+08:00    17760 Query    create /* gh-ost */ table `test`.`_b_ghc` (
            id bigint auto_increment,
            last_update timestamp not null DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
            hint varchar(64) charset ascii not null,
            value varchar(4096) charset ascii not null,
            primary key(id),
            unique key hint_uidx(hint)
        ) auto_increment=256
2019-09-08T22:01:20.957550+08:00    17760 Query    create /* gh-ost */ table `test`.`_b_gho` like `test`.`b`
2019-09-08T22:01:20.960110+08:00    17760 Query    alter /* gh-ost */ table `test`.`_b_gho` engine=innodb
2019-09-08T22:01:20.966740+08:00    17760 Query 
   insert /* gh-ost */ into `test`.`_b_ghc`(id, hint, value)values (NULLIF(2, 0), 'state', 'GhostTableMigrated') on duplicate key update last_update=NOW(),value=VALUES(value)

4 insert into xx_gho select * from XX copy data

Get the current maximum primary key and minimum primary key and then insert the data into the shadow table according to the command line parameter chunk

Get the smallest primary key select `id` from `test`.`b` order by `id` asc limit 1;
//Get the maximum primary key soelect `id` from `test `.b `order by `id `desc limit 1;
//Get the first chunk:
select  /* gh-ost `test`.`b` iteration:0 */ `id` from `test`.`b` where ((`id` > _binary'1') or ((`id` = _binary'1'))) and ((`id` < _binary'21') or ((`id` = _binary'21'))) order by `id` asc limit 1 offset 999;

//Loop insertion into target table:
insert /* gh-ost `test`.`b` */ ignore into `test`.`_b_gho` (`id`, `sid`, `name`, `score`, `x`) (select `id`, `sid`, `name`, `score`, `x` from `test`.`b` force index (`PRIMARY`)  where (((`id` > _binary'1') or ((`id` = _binary'1'))) and ((`id` < _binary'21') or ((`id` = _binary'21')))) lock in share mode;

//Loop to the maximum id and then rely on incremental binlog synchronization     

Notice that

In the process of rowcopy, lock in share mode is added to the original table to prevent data from being modified in the process of copying. This is very important for subsequent understanding of the overall data migration. Because gh-ost does not modify this part of the data record in the process of copying. For INSERT, UPDATE, DELETE events obtained by parsing binlog, we only need to analyze log before copy and log after copy before copy. The overall data migration will be analyzed in detail later.

5 Incremental application binlog migration data

The core code is gh-ost/go/sql/builder.go, which mainly explains the DML transformation. Of course, there are other functions to do auxiliary work, such as database, table name checking and grammatical integrity checking.

Resolve to delete statement and convert to delete statement

func BuildDMLDeleteQuery(databaseName, tableName string, tableColumns, uniqueKeyColumns *ColumnList, args []interface{}) (result string, uniqueKeyArgs []interface{}, err error) {
   ....Ellipsis code...
    result = fmt.Sprintf(`
            delete /* gh-ost %s.%s */
                from
                    %s.%s
                where
                    %s
        `, databaseName, tableName,
        databaseName, tableName,
        equalsComparison,
    )
    return result, uniqueKeyArgs, nil
}

Resolve to insert statement and convert to replace into statement

func BuildDMLInsertQuery(databaseName, tableName string, tableColumns, sharedColumns, mappedSharedColumns *ColumnList, args []interface{}) (result string, sharedArgs []interface{}, err error) {
   ....Ellipsis code...
    result = fmt.Sprintf(`
            replace /* gh-ost %s.%s */ into
                %s.%s
                    (%s)
                values
                    (%s)
        `, databaseName, tableName,
        databaseName, tableName,
        strings.Join(mappedSharedColumnNames, ", "),
        strings.Join(preparedValues, ", "),
    )
    return result, sharedArgs, nil
}

Resolve to update statement and convert to statement

func BuildDMLUpdateQuery(databaseName, tableName string, tableColumns, sharedColumns, mappedSharedColumns, uniqueKeyColumns *ColumnList, valueArgs, whereArgs []interface{}) (result string, sharedArgs, uniqueKeyArgs []interface{}, err error) {
   ....Ellipsis code...
    result = fmt.Sprintf(`
             update /* gh-ost %s.%s */
                     %s.%s
                set
                    %s
                where
                     %s
         `, databaseName, tableName,
        databaseName, tableName,
        setClause,
        equalsComparison,
    )
    return result, sharedArgs, uniqueKeyArgs, nil
}

Data Consistency Analysis of Data Migration

There are three operations of gh-ost to original table and shadow table during ddl change: row copy to original table (we replace A operation), dml operation to original table (B) and apply binlog(C) to shadow table. And binlog is based on dml operation, so the application binlog of shadow table must follow the dml of the original table in the following order:

Through the analysis of the above combinations, we can see that the data is consistent in the end. Especially when the copy is over, only apply binlog is left, which is simpler.

6. cut-over switching between the original table and the shadow table after copying the data

gh-ost switching is atomic switching, which is basically accomplished by the operation of two sessions. The author has written three articles explaining the idea and switching algorithm of cut-over operation. Detailed ideas please move to the link below.

http://code.openark.org/blog/...

http://code.openark.org/blog/...

http://code.openark.org/blog/...

Here is an excerpt of the third article describing core switching logic. The principle is based on MySQL's internal mechanism: after being blocked by lock table, rename execution takes precedence over dml, that is, rename table is executed first, and then dml. Assuming that the session of gh-ost operation is c10 and c20, and that of DML requests for other services are c1-c9, c11-c19 and c21-c29.

1 Conversation c1..c9: Yes b Table normal execution DML Operation.
2 Conversation c10 : Establish_b_del Advance prevention rename Table, resulting in data loss.
      create /* gh-ost */ table `test`.`_b_del` (
            id int auto_increment primary key
        ) engine=InnoDB comment='ghost-cut-over-sentry'
        
3 Conversation c10 implement LOCK TABLES b WRITE, `_b_del` WRITE. 
4 Conversation c11-c19 New in dml perhaps select Request, but because of the table b There are locks and wait.
5 Conversation c20:Set the lock waiting time and execute it rename
    set session lock_wait_timeout:=1
    rename /* gh-ost */ table `test`.`b` to `test`.`_b_20190908220120_del`, `test`.`_b_gho` to `test`.`b`
  c20 Operational reasons c10 Lock the watch and wait.
  
6 c21-c29 For tables b The new request because lock table and rename table And wait.
7 Conversation c10 adopt sql Check session c20 In execution rename Operate and wait mdl Lock.
select id
            from information_schema.processlist
            where
                id != connection_id()
                and 17765 in (0, id)
                and state like concat('%', 'metadata lock', '%')
                and info  like concat('%', 'rename', '%')

8 c10 Based on step 7 drop table `_b_del` ,Delete the command and execute it. b The table still cannot be written. Be-all dml The requests were blocked.

9 c10 implement UNLOCK TABLES; here c20 Of rename The first command is executed. Other conversations c1-c9,c11-c19,c21-c29 The request can operate on a new table b. 

Key Points (Knock on the Blackboard)

1 The _b_del table is created to prevent cut-over from executing ahead of time, resulting in data loss.

2 The same session can drop the table after executing the write lock first.

3 Regardless of who executes rename table and dml operations first, rename table always takes precedence over dml when blocked.

You can execute gh-ost by yourself while opening general log to see the specific operation process.

2019-09-08T22:01:24.086734    17765    create /* gh-ost */ table `test`.`_b_20190908220120_del` (
            id int auto_increment primary key
        ) engine=InnoDB comment='ghost-cut-over-sentry'
2019-09-08T22:01:24.091869    17760 Query    lock /* gh-ost */ tables `test`.`b` write, `test`.`_b_20190908220120_del` write
2019-09-08T22:01:24.188687    17765    START TRANSACTION
2019-09-08T22:01:24.188817    17765      select connection_id()
2019-09-08T22:01:24.188931    17765      set session lock_wait_timeout:=1
2019-09-08T22:01:24.189046    17765      rename /* gh-ost */ table `test`.`b` to `test`.`_b_20190908220120_del`, `test`.`_b_gho` to `test`.`b`
2019-09-08T22:01:24.192293+08:00    17766 Connect    root@127.0.0.1 on test using TCP/IP
2019-09-08T22:01:24.192409    17766      SELECT @@max_allowed_packet
2019-09-08T22:01:24.192487    17766      SET autocommit=true
2019-09-08T22:01:24.192578    17766      SET NAMES utf8mb4
2019-09-08T22:01:24.192693    17766      select id
            from information_schema.processlist
            where
                id != connection_id()
                and 17765 in (0, id)
                and state like concat('%', 'metadata lock', '%')
                and info  like concat('%', 'rename', '%')
2019-09-08T22:01:24.193050    17766 Query    select is_used_lock('gh-ost.17760.lock')
2019-09-08T22:01:24.193194    17760 Query    drop /* gh-ost */ table if exists `test`.`_b_20190908220120_del`
2019-09-08T22:01:24.194858    17760 Query    unlock tables
2019-09-08T22:01:24.194965    17760 Query    ROLLBACK
2019-09-08T22:01:24.197563    17765 Query    ROLLBACK
2019-09-08T22:01:24.197594    17766 Query    show /* gh-ost */ table status from `test` like '_b_20190908220120_del'
2019-09-08T22:01:24.198082    17766 Quit
2019-09-08T22:01:24.298382    17760 Query    drop /* gh-ost */ table if exists `test`.`_b_ghc`

What happens if each step of the cut-over process fails to execute? In fact, nothing will happen except security.

If c10's create `b_del'fails, the gh-ost program exits.

If the lock statement in c10 fails, the gh-ost program exits because the table has not been locked, and the dml request can proceed normally.

If c10 has an exception before c20 executes rename
 
 The lock held by A. c10 is released, and the request to query c1-c9 and c11-c19 can be executed immediately in b.
 Because the `b_del'table exists, rename table B to `b_del' of c20 will fail.
 C. The whole operation failed, but nothing terrible happened. Some queries were blocked for some time. We need to retry.

If c10 fails to exit when c20 executes rename blocked, similar to the above, and the lock is released, c20 fails to execute rename because of the existence of the - b_old table, and all requests return to normal.

If c20 fails abnormally, gh-ost will not catch rename, session c10 will continue to run, release lock, and all requests will return to normal.

If both c10 and c20 fail, no problem: lock is cleared, rename lock is cleared. c1-c9, c11-c19 and c21-c29 can be executed normally on b.

The impact of the entire process on the application

Application writes to tables are blocked until the exchange of shadow tables succeeds or until the operation fails. If successful, the application continues to operate on the new table. If the switch fails, the application continues to operate on the original table.

Impact on replication

Because the lock statement will not be copied in the binlog file, slave can only use rename statement for atomic operation, which is not harmful to the copy.

7. Processing the Closing Work

The last part of the operation is actually related to the specific parameters. The most important and indispensable is

Close the binlog syncer connection

As for deleting intermediate tables, it is actually related to parameters -- initially-drop-ghost-table -- initially-drop-old-table.

Three summary

Throughout the execution process of GH OST, check the source algorithm design, especially the subtle design ideas of cut over, atomic operation, any exception will not have a serious impact on the business. Friends who have used the tool are welcome to share their problems, and those who have not used the tool are also welcome to make bold attempts.

Reference articles

https://www.cnblogs.com/mysql...

This public number has long focused on database technology and performance optimization, fault case analysis, knowledge sharing of database operation and maintenance technology, personal growth and self-management, etc. Welcome to sweep code.

Tags: MySQL Session Database github

Posted on Fri, 20 Sep 2019 02:08:10 -0700 by hehachris