[MySQL] Problems caused by different character sets

1. Different character sets in two tables make indexing unavailable

The development reflects a request for sql that is slow and takes more than 3s. sql is as follows:

SELECT
  tc.customer_name,
  cb.service_record_id~~~~,
  tc.customer_code,
  tc.customer_desc,
  tc.customer_account_id,
  tc.phone
FROM
  tur_customer tc
INNER JOIN tur_customer_bind cb ON tc.customer_account_id = cb.customer_account_id
WHERE
  cb.employee_account_id = '181'
AND tc.is_delete = 0
ORDER BY
  cb.service_record_id DESC
LIMIT 0,10

Table structure:

CREATE TABLE `tur_customer` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `customer_code` varchar(64) NOT NULL COMMENT 'Customer code',
  `customer_name` varchar(255) NOT NULL COMMENT 'Customer name',
  `customer_name_hash` varchar(128) DEFAULT NULL COMMENT 'Customer name Hash',
  `customer_account_id` varchar(32) NOT NULL COMMENT 'Unique Identification for Account System',
  `real_name_status` int(11) DEFAULT '10' COMMENT 'Authentication Status of Real Name System: 10 Uncertified, 20 Real name, 30 Certified',
  `sex` int(11) DEFAULT '1' COMMENT 'Gender;1 Male; 0 women',
  `phone` varchar(128) NOT NULL COMMENT 'Contact number, cell phone number or landline number',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_customercode` (`customer_code`) USING BTREE,
  UNIQUE KEY `uniq_customeraccountid` (`customer_account_id`) USING BTREE,
  KEY `idx_customernamehash` (`customer_name_hash`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=267031 DEFAULT CHARSET=utf8 COMMENT='Customer basic information form';
 
CREATE TABLE `tur_customer_bind` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `customer_account_id` varchar(32) NOT NULL COMMENT 'Customer account number',
  `employee_account_id` varchar(32) NOT NULL COMMENT 'Consultant account',
  `employee_account_name` varchar(32) NOT NULL COMMENT 'Name of consultant',
  `service_record_id` int(11) NOT NULL COMMENT 'Latest Service Records ID',
  `bind_time` datetime NOT NULL COMMENT 'Binding time',
  `store_code` varchar(32) DEFAULT NULL COMMENT 'Store code',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `uniq_account` (`customer_account_id`) USING BTREE COMMENT 'Unique Index of Customer Accounts',
  KEY `idx_employee_account_id` (`employee_account_id`) USING BTREE COMMENT 'Consultant Account Index'
) ENGINE=InnoDB AUTO_INCREMENT=14583 DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPACT COMMENT='Current customer relationship table';

The implementation plan of the analysis table is as follows:

+----+-------------+-------+------+--------------------------------------+-------------------------+---------+-------+-------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys                        | key                     | key_len | ref   | rows  | Extra                                                  |
+----+-------------+-------+------+--------------------------------------+-------------------------+---------+-------+-------+--------------------------------------------------------+
|  1 | SIMPLE      | cb    | ref  | uniq_account,idx_employee_account_id | idx_employee_account_id | 130     | const |     1 | Using index condition; Using temporary; Using filesort |
|  1 | SIMPLE      | tc    | ALL  | NULL                                 | NULL                    | NULL    | NULL  | 58697 | Using where; Using join buffer (Block Nested Loop)     |
+----+-------------+-------+------+--------------------------------------+-------------------------+---------+-------+-------+--------------------------------------------------------+

Both table correlation fields tur_customer.customer_account_id and tur_customer_bind.customer_account_id are indexed and of varchar type, but the index is not used. Looking at the character set, one table is utf8 and the other is utf8mb4.

The different character sets of the two tables result in the impossibility of using the index. The character set of the view library is utf8, and the sorting rule is utf8_general_ci.

After modifying the table tur_customer to utf8mb4 format, the execution time of the sql is only 0.1s. Look again at the execution plan as follows:


+----+-------------+-------+--------+--------------------------------------+-------------------------+---------+-----------------------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys                        | key                     | key_len | ref                         | rows | Extra                                              |
+----+-------------+-------+--------+--------------------------------------+-------------------------+---------+-----------------------------+------+----------------------------------------------------+
|  1 | SIMPLE      | cb    | ref    | uniq_account,idx_employee_account_id | idx_employee_account_id | 130     | const                       |    1 | Using index condition; Using where; Using filesort |
|  1 | SIMPLE      | tc    | eq_ref | uniq_customeraccountid               | uniq_customeraccountid  | 130     | test.cb.customer_account_id |    1 | Using where                                        |
+----+-------------+-------+--------+--------------------------------------+-------------------------+---------+-----------------------------+------+----------------------------------------------------+

2. Join join join table error reporting with different sorting rules of two tables

The error information is as follows:

SQL state [HY000]; error code [1267]; Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation '='; nested exception is java.sql.SQLException: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation '='"

We looked up the DDL of the following two tables, one is utf8mb4_unicode_ci and the other is utf8mb4_general_ci, which caused this problem.

Solution: Change the two tables to utf8mb4_unicode_ci or utf8mb4_general_ci.

If you do not modify the table structure, you can specify the character set encoding on the equal sign side when querying:

  • Format: WHERE column name collate utf8mb4 ABCD Unicode ABCD CI = nested statement
  • Or: WHERE column name = nested statement COLLATE utf8mb4_unicode_ci to make the character sets on both sides equal!

Tags: MySQL SQL Java encoding

Posted on Sat, 12 Oct 2019 08:02:19 -0700 by ponch9