99%SQL optimization encountered in your work, here are all solutions for you

-- Example table
CREATE TABLE `employees` (
  `name` varchar(24) NOT NULL DEFAULT '' COMMENT 'Full name',
  `age` int(20) NOT NULL DEFAULT '0' COMMENT 'Age',
  `position` varchar(20) NOT NULL DEFAULT '' COMMENT 'position',
  PRIMARY KEY (`id`),
  KEY `idx_name_age_position` (`name`,`age`,`position`) USING BTREE,
  KEY `idx_age` (`age`) USING BTREE

--Create 100,000 records
drop procedure if EXISTS insert_emp;
delimiter ;;
create procedure insert_emp()
    declare i int;
    set i=1;
    while(i < 100000)DO
        INSERT INTO employees(name,age,position) values(CONCAT('xiaoqiang',i),i,'coder');
        SET i=i+1;
    end WHILE;
delimiter ;
call insert_emp();

Paged queries sorted by self-increasing and continuous primary keys

select * from employees LIMIT 9999 ,5;

Represents the removal of five rows from table employees starting at 10,000 rows.It looks like there are only five records being queried. In fact, this SQL reads 10,000 records first, then discards the first 10,000 records, and then reads the next five desired data.No separate order by was added, meaning sort by primary key.
Therefore, to query the data behind a large table, the execution efficiency is very low.
Because the primary key is self-increasing and continuous, it can be rewritten to query five rows of data starting from 10001 according to the primary key, as follows:

select * from  employees WHERE id > 9999 limit 5;

You can see the execution plans for the two sqls. It is clear that the rewritten sql has gone indexed, and the number of rows scanned has been significantly reduced, resulting in more efficient execution.However, this overridden sql is not practical in many scenarios because it is possible that some records in the table will be deleted and the primary key will be blank, resulting in inconsistent results.
Delete a record first, then test the original and optimized sql:

select * from employees LIMIT 9999 ,5;

 select * from employees where id> 9999 limit 5;

The results of the two SQLs are different, so the method described above cannot be used if the primary key is not continuous.
In addition, since the original sql was an order by non-primary key field, the results of overwriting sql in the above way are inconsistent.So this rewriting satisfies the following two conditions:

  • Primary keys are self-increasing and continuous
  • The results are sorted by primary key

Paged queries sorted by non-primary key fields

select * from employees order by name limit 9000, 5;

 explain select * from employees order by name limit 9000, 5;

The value corresponding to the key field is null, and it is found that the index for the name field is not used.Because scanning the entire index and finding rows without an index can facilitate multiple index trees, which are more costly than scanning the entire table, the index optimizer discards the use of indexes.
The key to optimization is to make the sorting return as few fields as possible, so you can have the sorting and paging operations find the primary key first, and then the corresponding records based on the primary key.
Change to the following:

select * from employees as e inner join(select id from employees order by name limit 9000,5) as ed on e.id=ed.id;

You can see that the result is consistent with the original sql result, the execution time is reduced by more than half, and then the execution plan is compared:

The original sql used filesort sorting, while the optimized sql used index sorting.

in and exists optimization

Principle: Small tables drive large tables, that is, small tables'datasets drive large tables' datasets
In: when the dataset of table B is smaller than that of table A, in is due to exists

select * from A where id in(select id from B)
//Equivalent to
 for(select id from B){
     select * from A where A.id=B.id

exists: exits is better than in when the dataset of table A is smaller than that of table B
When data from Query A is placed in Subquery B for conditional validation, the result of validation (true or false) determines whether the queried data is preserved.

select * from A  exists(select 1 from B where A.id=B.id)

//Equivalent to
for(select * from A){
    select * from B where A.id=B.id

count(*) query optimization

explain select count(1) from employees;
explain select count(id) from employees;
explain select count(name) from employees;
explain select count(*) from employees;

The execution plans for the four SQLs are almost the same, and count(name) uses a federated index, mainly distinguishing rows of data where counting based on a field does not count values for which the field is null.
All count operations except count(name) use secondary indexes rather than primary key indexes because secondary indexes store less data and have better retrieval performance.

Not noticing my public number yet?

  • The two-dimensional code at the end of the scanner pays attention to the public number, which can be drawn as follows:
  • Learning materials: 1T video tutorials: covering front-end and back-end teaching videos of Java web, machine learning/artificial intelligence teaching videos, Linux system tutorial videos, IELTS video tutorials;
  • More than 100 books: Classic Must See Books, LeetCode Exercises Complete with C/C++, Java, Python programming languages;
  • Software Tools: Almost all the software you might use on the programming path;
  • Project source: 20 JavaWeb project sources.

Tags: MySQL SQL Java Programming less

Posted on Mon, 09 Sep 2019 18:08:40 -0700 by Monkey-Moejo