MySql || Quickly create 100w records

Normally, each developer will discuss the optimization of sql when there is a large amount of data.But not everyone has 100w of data to use for actual combat, so today we'll start by simulating a table of 100w of data.

  • Creation principle

In fact, there are many ways to create, fast and slow.This blog is certainly not the fastest one to write, but it is really a good one to operate and understand.Let me first explain how it works: it uses the features of the MEMORY engine in mysql to quickly insert 100w of data into memory, and then sql into the target table.

  • Operation steps
  1. Create the table t_user, which is the table used to store the data.
 1 CREATE TABLE `t_user` (
 2   `id` int(11) NOT NULL AUTO_INCREMENT,
 3   `c_user_id` varchar(36) NOT NULL DEFAULT '',
 4   `c_name` varchar(22) NOT NULL DEFAULT '',
 5   `c_province_id` int(11) NOT NULL,
 6   `c_city_id` int(11) NOT NULL,
 7   `create_time` datetime NOT NULL,
 8   PRIMARY KEY (`id`) 9 ) ENGINE=InnoDB  DEFAULT CHARSET=utf8mb4;

2. Create the memory table t_user_memory, which is used to insert data quickly.

1 CREATE TABLE `t_user_memory` (
2   `id` int(11) NOT NULL AUTO_INCREMENT,
3   `c_user_id` varchar(36) NOT NULL DEFAULT '',
4   `c_name` varchar(22) NOT NULL DEFAULT '',
5   `c_province_id` int(11) NOT NULL,
6   `c_city_id` int(11) NOT NULL,
7   `create_time` datetime NOT NULL,
8   PRIMARY KEY (`id`)
9 ) ENGINE=MEMORY DEFAULT CHARSET=utf8mb4;

3. Create the random string function randStr() to be more random when assigning c_name.

delimiter $$
CREATE DEFINER=`root`@`%` FUNCTION `randStr`(n INT) RETURNS varchar(255) CHARSET utf8mb4
DETERMINISTIC
BEGIN
 DECLARE chars_str varchar(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
 DECLARE return_str varchar(255) DEFAULT '';
 DECLARE i INT DEFAULT 0;
 WHILE i<n DO
     SET return_str = concat(return_str, substring(chars_str, FLOOR(1 + RAND() * 62), 1));
        SET i=i+1;
END WHILE;
RETURN return_str;
END $$

The syntax of this function can be understood without explanation, but there are a few points to note.

  1. delimiter keyword, which is a keyword that declares a terminator.Simply put, prevent clashes with BEGIN and END statements because we want them to be executed continuously.
  2. CONCAT(str,str) links two strings to form a new string.
  3. FLOOR(num) function, rounded down.For example, floor(1.23) = 1, floor(-1.23) = -2.

 

4. Create a random time function randDataTime() to make the later time more random.

 1 delimiter $$
 2 CREATE DEFINER = `root`@`%` FUNCTION `randDataTime` (sd DATETIME, ed DATETIME) RETURNS datetime DETERMINISTIC
 3 BEGIN
 4     DECLARE
 5         sub INT DEFAULT 0 ; 
 6     DECLARE
 7             ret DATETIME ;
 8         SET sub = ABS(
 9             UNIX_TIMESTAMP(ed) - UNIX_TIMESTAMP(sd)
10         ) ;
11         SET ret = DATE_ADD(
12             sd,
13             INTERVAL FLOOR(1 + RAND() *(sub - 1)) SECOND
14         ) ; 
15     RETURN ret ; 
16 END $$

It is important to note that the DATE_ADD(dateTime,INTERVAL num SECOND) function, the first parameter is the time added, and the second parameter is a combination indicating the length added, such as: (INTERVAL 1 YEAR) indicating the time interval of one year added to the original basis.

5. Create the insert data stored procedure add_t_user_memory (int)

 1 delimiter $$
 2 CREATE DEFINER=`root`@`%` PROCEDURE add_t_user_memory (IN n INT)
 3 BEGIN
 4     DECLARE
 5         i INT DEFAULT 1 ;
 6     WHILE i < n DO
 7         INSERT INTO t_user_memory (
 8             c_user_id,
 9             c_name,
10             c_province_id,
11             c_city_id,
12             create_time
13         )
14     VALUES
15         (
16             uuid(),
17             randStr (20),
18             FLOOR(RAND() * 1000),
19             RAND() * 100,
20             NOW()
21         );
22 SET i = i + 1 ;
23 END
24 WHILE ;
25 END $$

 

  • Problems in execution

After completing the above statement, we can use CALL, add_t_user_memory (1000000) to create 100w of data, which takes about 20 minutes.However, a problem was found during the call, as shown in the following figure.

Originally't_user_memory'was full.It is the table I defined above that stores data in memory, and the default size in MYSQL is 16MB.This size can only fit more than 4w of data, so we'll try to enlarge it.Then you can execute the following statement

1 SET GLOBAL max_heap_table_size = 1024 * 1024 * 1024 * 1;
2 
3 #View the size of the current settings
4 select @@max_heap_table_size;

Note: Modifications require a re-link to mysql to update the changes.I am not responsible for the consequences of such a change in the production environment.

 

  • Perfect database
  1. Execute the following statement, and it will take about 10s to insert into t_user.
INSERT INTO t_user SELECT * FROM t_user_memory;

2. Disrupt the creation time.

#Update Annual Interval
UPDATE t_user SET create_time=date_add(create_time, interval FLOOR(1 + (RAND() * 4)) year);
#Update seconds interval
UPDATE t_user SET create_time=randDataTime(NOW(),create_time);
  • complete

In this way, our million-level database is created!Then, we can empty the table t_user_memory, because it is very memory-intensive, and it takes up as much memory as your data.Next time try SQL optimization with your business.

Tags: MySQL SQL Database Stored Procedure

Posted on Sat, 10 Aug 2019 21:42:21 -0700 by Rupo