Some suggestions on writing high quality SQL

1. Try not to use select * to query SQL, but select specific fields.

Counter example:

select * from employee;

Positive example:

selectidnamefrom employee;

reason:

  • Only take the required fields, save resources and reduce network overhead.

  • When select * is used for query, it is likely that the overwrite index will not be used, resulting in a back table query.

2. If you know that there is only one query result or only one maximum / minimum record, it is recommended to use limit 1

Suppose you have an employee table now. Find a person named jay

CREATETABLE`employee` (
  `id`int(11) NOTNULL,
  `name`varchar(255) DEFAULTNULL,
  `age`int(11) DEFAULTNULL,
  `date` datetime DEFAULTNULL,
  `sex`int(1) DEFAULTNULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDBDEFAULTCHARSET=utf8;

Counter example:

selectidnamefrom employee wherename='jay'

Positive example

selectidnamefrom employee wherename='jay'limit1;

reason:

  • When limit 1 is added, as long as a corresponding record is found, it will not continue to scan down, and the efficiency will be greatly improved.

  • Of course, if name is the only index, there is no need to add limit 1. Because limit is mainly used to prevent full table scanning and improve performance. If a statement itself can predict that full table scanning is not needed, there is no difference in performance if there is a limit.

3. Try to avoid using or to join conditions in where clause

Create a new user table with a common index userId. The table structure is as follows:

CREATETABLE`user` (
  `id`int(11) NOTNULL AUTO_INCREMENT,
  `userId`int(11) NOTNULL,
  `age`int(11) NOTNULL,
  `name`varchar(255) NOTNULL,
  PRIMARY KEY (`id`),
  KEY`idx_userId` (`userId`)
) ENGINE=InnoDBDEFAULTCHARSET=utf8;

Suppose you need to query the user whose userid is 1 or the age is 18, it is easy to have the following sql

Counter example:

select * fromuserwhere userid=1or age =18

Positive example:

//Using union all
select * fromuserwhere userid=1
unionall
select * fromuserwhere age = 18

//Or separate themsqlWrite:
select * fromuserwhere userid=1
select * fromuserwhere age = 18

reason:

  • Using or may invalidate the index and scan the whole table.

In the case of age with or + without index, suppose it goes through the index of userId, but when it comes to the query condition of age, it still has to scan the whole table, that is to say, it needs three steps: scan the whole table + scan the index + merge. If it starts with scan the whole table, it will be finished by scanning directly once. mysql has an optimizer, which considers efficiency and cost. When encountering or condition, the index may fail, which seems reasonable.

4. Optimize limit paging

When we do paging requirements, we usually use limit, but when the offset is very large, the query efficiency becomes low.

Counter example:

selectidname,age from employee limit1000010

Positive example:

//Scheme 1: return the maximum record (offset) of the last query
selectidnamefrom employee whereid>10000limit10.

//Scheme II:orderby + Indexes
selectidnamefrom employee orderbyidlimit1000010

//Scheme 3: limit the number of pages when business allows:

reason:

  • When the offset is the largest, the query efficiency will be lower, because Mysql does not skip the offset to get the later data directly, but first discards the offset + the number of entries to get, and then discards the previous offset data before returning.

  • If you use optimization scheme 1, return the last maximum query record (offset), so that you can skip the offset and improve the efficiency.

  • Scheme 2 uses order by + index, which can also improve query efficiency.

  • For scheme 3, it is suggested to discuss with the business. Is it necessary to check the page after this. Because the vast majority of users will not turn back too many pages.

5. Optimize your like statement

In daily development, if you use fuzzy keyword query, it's easy to think of like, but like is likely to make your index invalid.

Counter example:

select userId,namefromuserwhere userId like'%123';

Positive example:

select userId,namefromuserwhere userId like'123%';

reason:

  • Put% in front without index, as follows:

  • If you put% after the keyword, you will still index it. As follows:

6. Use the where condition to limit the data to be queried and avoid returning redundant rows

Suppose the business scenario is this: query whether a user is a member. I've seen the old implementation code...

Counter example:

List<Long> userIds = sqlMap.queryList("select userId fromuserwhere isVip=1");
boolean isVip = userIds.contains(userId);

Positive example:

Long userId = sqlMap.queryObject("select userId fromuserwhere userId='userId'and isVip='1'")
boolean isVip = userId!=null;

reason:

  • If you need any data, you can check it to avoid returning unnecessary data and save costs.

7. Try to avoid using mysql's built-in functions on index columns

Business requirements: query the users who have logged in in the last seven days (assuming loginTime is indexed)

Counter example:

select userId,loginTime from loginuser whereDate_ADD(loginTime,Interval7DAY) >=now();

Positive example:

explainselect userId,loginTime from loginuser where  loginTime >= Date_ADD(NOW(),INTERVAL - 7DAY);

reason:

  • The built-in function of mysql is used on the index column, and the index is invalid

  • If the index column does not add built-in functions, the index will still go.

8. The expression operation on the field in the where clause should be avoided as far as possible, which will result in the system abandoning the use of index for full table scanning

Counter example:

select * fromuserwhere age-1 =10

Positive example:

select * fromuserwhere age =11

reason:

  • Although age added index, but because of its operation, index directly lost...

9. Inner join, left join and right join are preferred. If it is a left join, the result in the left table should be as small as possible

  • Inner join inner join. When two tables are queried for join, only the result set that exactly matches the two tables is retained

  • When left join queries two tables, it will return all the rows in the left table, even if there are no matching records in the right table.

  • right join returns all rows of the right table when two tables are queried, even if there are no matching records in the left table.

On the premise that all SQL requirements are met, it is recommended to use Inner join first. If you want to use left join, the data result of the left table should be as small as possible, and if you have conditions, it should be processed on the left as much as possible.

Counter example:

select * from tab1 t1 left join tab2 t2  on t1.size = t2.size where t1.id>2;

Positive example:

select * from (select * from tab1 where id >2) t1 left join tab2 t2 on t1.size = t2.size;

reason:

  • If the inner join is an equivalent join, it may return fewer rows, so the performance is better.

  • In the same way, the left join is used, the data results of the left table are as small as possible, and the conditions are put to the left for processing, which means that the number of rows returned may be relatively small.

10. Try to avoid using the! = or < > operators in the where clause, otherwise the engine will abandon the use of indexes for full table scanning.

Counter example:

select age,name  from user where age <>18;

Positive example:

//Consider two separate sql writes
select age,name  from user where age <18;
select age,name  from user where age >18;

reason:

  • Using! = and < > is likely to invalidate the index

 

11. When using union index, pay attention to the order of index columns, and generally follow the leftmost matching principle.

Table structure: (there is a union index idx ﹣ userId ﹣ age, with userId before and age after)

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userId` int(11) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_userid_age` (`userId`,`age`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

Counter example:

select * from user where age = 10;

Positive example:

//In line with the leftmost matching principle
select * from user where userid=10 and age =10;
//In line with the leftmost matching principle
select * from user where userid =10;

reason:

  • When we create a joint index, such as (k1,k2,k3), we create (k1), (k1,k2) and (k1,k2,k3), which is the leftmost matching principle.

  • The union index does not meet the leftmost principle, and the index will generally fail, but this is also related to the Mysql optimizer.

12. In order to optimize the query, we should consider to build indexes on the columns involved in where and order by, and try to avoid full table scanning.

Counter example:

select * from user where address ='Shenzhen' order by age ;

Positive example:

Add index
alter table user add index idx_address_age (address,age)

 

13. If too much data is inserted, consider bulk insertion.

Counter example:

for(User u :list){
 INSERT into user(name,age) values(#name#,#age#)   
}

Positive example:

//500 batch inserts at a time, in batches
insert into user(name,age) values
<foreach collection="list" item="item" index="index" separator=",">
    (#{item.name},#{item.age})
</foreach>

reason:

  • Good batch insertion performance, more time-saving

Make a metaphor: if you need to move 10000 bricks to the top of the building, you have an elevator. The elevator can put an appropriate amount of bricks at a time (500 at most). You can choose to transport one brick at a time, or 500 at a time. Which time do you think is a big consumption?

14. When appropriate, use the overlay index.

Overlay index can make your SQL statement do not need to return to the table, just access the index can get all the data needed, greatly improving the query efficiency.

Counter example:

// like fuzzy query, stop indexing
select * from user where userid like '%123%'

Positive example:

//If the id is the primary key, then it is a normal index, that is, the overwrite index is launched.
select id,name from user where userid like '%123%';

 

15. Be careful with distinct keywords

The distinct keyword is commonly used to filter duplicate records to return non duplicate records. When a field or few fields are queried, it will bring optimization effect to the query. But it can be used in many fields, which will greatly reduce the query efficiency.

Counter example:

SELECT DISTINCT * from  user;

Positive example:

select DISTINCT name from user;

reason:

  • The cpu time and elapsed time of statements with distinct are higher than those without distinct. Because when querying many fields, if you use distinct, the database engine will compare the data and filter out the duplicate data. However, the filtering process will take up system resources and cpu time.

16. Remove redundant and duplicate indexes

Counter example:

  KEY `idx_userId` (`userId`)
  KEY `idx_userId_age` (`userId`,`age`)

Positive example:

//Delete the userId index because the combined index (A, B) is equivalent to creating the (A) and (A, B) indexes
  KEY `idx_userId_age` (`userId`,`age`)

reason:

  • Duplicate indexes need to be maintained, and the optimizer needs to consider them one by one when optimizing queries, which will affect performance.

17. If the amount of data is large, optimize your modify / delete statements.

Avoid modifying or deleting too much data at the same time, because it will cause high cpu utilization, which will affect other people's access to the database.

Counter example:

//Delete 100000 or 1 million +?
delete from user where id <100000;
//Or single cycle operation, low efficiency and long time
for(User user: list){
   delete from user;
}

Positive example:

//Delete in batches, such as 500 at a time
delete user where id<500
delete product where id>=500 and id<1000

reason:

  • If too much data is deleted at one time, there may be an error of lock wait timeout exceed ed, so batch operation is recommended.

18. Consider using the default value instead of null in the where clause.

Counter example:

select * from user where age is not null;

Positive example:

//Set 0 as default
select * from user where age>0;

reason:

  • This is not to say that using is null or is not null will not leave the index, which is related to mysql version and query cost.

If the mysql optimizer finds that the cost of index walking is higher than that of no index walking, it will definitely give up the index. These conditions! =, > is null, is not null is often considered to invalidate the index, in fact, because in general, the query cost is high and the optimizer automatically gives up.

  • If you change the null value to the default value, it will make indexing possible in many cases. At the same time, the meaning of expression will be relatively clear.

19. Do not have more than 5 meter connections

  • The more tables you have, the more time and overhead you have to compile.

  • Split the connection table into smaller ones for execution, which is more readable.

  • If you have to join a lot of tables to get data, that means bad design.

20. Rational use of exist & in

Suppose that table A represents the employee table of an enterprise, and table B represents the Department table. Query all employees of all departments. It is easy to have the following SQL:

select * from A where deptId in (select deptId from B);

This is equivalent to:

Query department table B first

select deptId from B

Then the deptId of the Department queries the employees of A

select * from A where A.deptId = B.deptId

It can be abstracted into such a cycle:

   List<> resultSet ;
    for(int i=0;i<B.length;i++) {
          for(int j=0;j<A.length;j++) {
          if(A[i].id==B[j].id) {
             resultSet.add(A[i]);
             break;
          }
       }
    }

Obviously, in addition to using in, we can also use exists to implement the same query function, as follows:

select * from A where exists (select 1 from B where A.deptId = B.deptId);

Because the understanding of exists query is to execute the main query first, get the data, and then put it into the sub query for condition verification. According to the verification result (true or false), it determines whether the data result of the main query is satisfied with retention.

So, to write like this is equivalent to:

select * from A, cycle from table A first

select * from B where A.deptId = B.deptId, then loop from B table

Similarly, it can be abstracted into such a cycle:

   List<> resultSet ;
    for(int i=0;i<A.length;i++) {
          for(int j=0;j<B.length;j++) {
          if(A[i].deptId==B[j].deptId) {
             resultSet.add(A[i]);
             break;
          }
       }
    }

Database is the most difficult to release with the program link. Suppose we link twice, and each time we do a million times of data set query, we go after the query, so we only do it twice; on the contrary, we establish a million times of links, and the application link release is repeated repeatedly, so the system can't stand it. That is, mysql optimization principle, that is, small tables drive large tables, and small datasets drive large datasets, so as to improve performance.

Therefore, we need to select the one with the smallest outer loop, that is, if the data amount of B is less than A, it is suitable to use in; if the data amount of B is greater than A, it is suitable to select exist.

21. Try to replace union with union all

If there are no duplicate records in the retrieval results, it is recommended that union all replace Union.

Counter example:

select * from user where userid=1
union
select * from user where age = 10

Positive example:

select * from user where userid=1
union all
select * from user where age = 10

reason:

  • If you use union, whether the retrieval results are repeated or not, you will try to merge them, and then sort them before outputting the final results. If it is known that there are no duplicate records in the retrieval results, use union all instead of union, which will improve the efficiency.

22. Indexes should not be too many, generally less than 5.

  • Index is not the more the better. Although index improves the efficiency of query, it also reduces the efficiency of insert and update.

  • The index may be rebuilt during insert or update, so it needs to be considered carefully, depending on the specific situation.

  • It is better not to have more than 5 indexes in a table. If there are too many indexes, it is necessary to consider whether some indexes do not exist.

23. Try to use numerical field, and try not to design the field with only numerical information as character type

Counter example:

king_id` varchar(20) NOT NULL COMMENT 'guardian Id'

Positive example:

`king_id` int(11) NOT NULL COMMENT 'guardian Id'`

reason:

  • Compared with numeric fields, character type reduces query and connection performance and increases storage overhead.

24. Index is not suitable for fields with a large number of duplicate data, such as gender database fields.

Because the SQL optimizer optimizes the query based on the amount of data in the table. If there is a large amount of duplicate data in the index column, the Mysql query optimizer calculates that the cost of not walking the index is lower, and it is likely to give up the index.

25. Try to avoid returning too much data to the client.

Suppose the business requirement is that users request to view the live data they have watched in the last year.

Counter example:

//Query all data at once
select * from LivingInfo where watchId =useId and watchTime >= Date_sub(now(),Interval 1 Y)

Positive example:

//Paging query
select * from LivingInfo where watchId =useId and watchTime>= Date_sub(now(),Interval 1 Y) limit offset,pageSize

//For front-end paging, you can query the first two hundred records first, because the average user should not turn down too many pages,
select * from LivingInfo where watchId =useId and watchTime>= Date_sub(now(),Interval 1 Y) limit 200 ;

26. When connecting multiple tables in the SQL statement, please use the alias of the table and prefix the alias to each column, so that the semantics is clearer.

Counter example:

select  * from A inner
join B on A.deptId = B.deptId;

Positive example:

select  memeber.name,deptment.deptName from A member inner
join B deptment on member.deptId = deptment.deptId;

27. Try to use varchar/nvarchar instead of char/nchar.

Counter example:

  `deptName` char(100) DEFAULT NULL COMMENT 'Department name'

Positive example:

  `deptName` varchar(100) DEFAULT NULL COMMENT 'Department name'

reason:

  • Because the storage space of the variable length field is small, the storage space can be saved.

  • Secondly, for query, searching in a relatively small field is more efficient.

28. To improve the efficiency of the group by statement, you can filter out unnecessary records before executing the statement.

Counter example:

select job,avg(salary) from employee  group by job having job ='president'
or job = 'managent'

Positive example:

select job,avg(salary) from employee where job ='president'
or job = 'managent' group by job;

29. How to use quotation marks when the field type is string? Otherwise, the index will fail

Counter example:

select * from user where userid =123;

Positive example:

select * from user where userid ='123';

reason:

  • Why doesn't the first statement go without a single quotation mark? This is because when the single quotation mark is not added, it is a comparison between strings and numbers. Their types do not match. MySQL will do implicit type conversion, convert them to floating-point numbers, and then compare them.

30. Use explain to analyze your SQL plan

When developing and writing SQL, try to form a habit. Use explain to analyze the SQL you write, especially the index block.

explain select * from user where userid =10086 or age =18;

 

Article reference https://mp.weixin.qq.com/s/z5EYJPeT5pE3Y_kpSCIG2g

Tags: Database SQL MySQL less

Posted on Thu, 30 Apr 2020 19:55:20 -0700 by adi