马上注册,结交更多好友,享用更多功能,让你轻松玩转社区。
您需要 登录 才可以下载或查看,没有账号?立即注册
x
5.7 与 8.0 对相同文件的 LOAD DATA 语句结果差别
问题形貌
某客户现场支持,由MySQL 5.7.21升级MySQL 8.0.25后,通过LOAD DATA导入文件,当同一会话连续导入差别的编码(UTF8/GB18030)文件时会出现乱码。数据库版本未升级之前,相同的导入利用在MySQL 5.7.21未出现乱码。
问题分析
1)查看简化后的 LOAD DATA语句- greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_UTF8_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET UTF8MB4 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING UTF8MB4)),'');
- Query OK, 2 rows affected (0.01 sec)
- Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
- greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_GB18030_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET GB18030 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING GB18030)),'');
- Query OK, 2 rows affected (0.01 sec)
- Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
复制代码 2)查看表数据- +----------+------------------------------------------------------+
- | AUTO_INC | D_NAME |
- +----------+------------------------------------------------------+
- | 1 | xxx社会保险xxx |
- | 2 | xxx市路桥区xxx |
- | 4 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績 |
- | 5 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績 |
- +----------+------------------------------------------------------+
- 4 rows in set (0.00 sec)
复制代码 3)查抄业务表的字符集与校验集,发现字符集为 utf8mb4 、校验集为 utf8mb4_bin
4)查抄数据库的字符集与校验集- greatsql> SHOW GLOBAL VARIABLES LIKE '%char%';
- +--------------------------------------+--------------------------------+
- | Variable_name | Value |
- +--------------------------------------+--------------------------------+
- | character_set_client | utf8mb4 |
- | character_set_connection | utf8mb4 |
- | character_set_database | utf8mb4 |
- | character_set_filesystem | binary |
- | character_set_results | utf8mb4 |
- | character_set_server | utf8mb4 |
- | character_set_system | utf8mb3 |
- | character_sets_dir | /opt/mysql3301/share/charsets/ |
- | validate_password_special_char_count | 1 |
- +--------------------------------------+--------------------------------+
- 9 rows in set (0.01 sec)
- greatsql> SHOW GLOBAL VARIABLES LIKE '%coll%';
- +-------------------------------+--------------------+
- | Variable_name | Value |
- +-------------------------------+--------------------+
- | collation_connection | utf8mb4_bin |
- | collation_database | utf8mb4_bin |
- | collation_server | utf8mb4_bin |
- | default_collation_for_utf8mb4 | utf8mb4_general_ci |
- +-------------------------------+--------------------+
- 4 rows in set (0.00 sec)
复制代码 程序在MySQL 5.7.21跑了很长时间,一直没有问题,把数据库升级MySQL 8.0.25后,新导入的数据出现部分乱码, 由此怀疑,MySQL 8.0定长数据导入LOAD DATA @row 出现BUG。
BUG场景:同一个会话 LOAD DATA多种字符集文件,利用@暂时变量切割字段。将导致导入数据乱码,向MySQL官方提BUG,已证实为BUG(编号115824)
问题复现
MySQL: 8.0.25- greatsql> SELECT VERSION();+-----------+| version() |+-----------+| 8.0.25 |+-----------+1 row in set (0.00 sec)table ddl:CREATE TABLE `assp_sis_payres_imp_bak` ( `AUTO_INC` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '自增列', `D_NAME` varchar(210) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL, PRIMARY KEY (`AUTO_INC`)) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;greatsql> SHOW GLOBAL VARIABLES LIKE '%char%';
- +--------------------------------------+--------------------------------+
- | Variable_name | Value |
- +--------------------------------------+--------------------------------+
- | character_set_client | utf8mb4 |
- | character_set_connection | utf8mb4 |
- | character_set_database | utf8mb4 |
- | character_set_filesystem | binary |
- | character_set_results | utf8mb4 |
- | character_set_server | utf8mb4 |
- | character_set_system | utf8mb3 |
- | character_sets_dir | /opt/mysql3301/share/charsets/ |
- | validate_password_special_char_count | 1 |
- +--------------------------------------+--------------------------------+
- 9 rows in set (0.01 sec)
- greatsql> SHOW GLOBAL VARIABLES LIKE '%coll%';
- +-------------------------------+--------------------+
- | Variable_name | Value |
- +-------------------------------+--------------------+
- | collation_connection | utf8mb4_bin |
- | collation_database | utf8mb4_bin |
- | collation_server | utf8mb4_bin |
- | default_collation_for_utf8mb4 | utf8mb4_general_ci |
- +-------------------------------+--------------------+
- 4 rows in set (0.00 sec)greatsql> TRUNCATE TABLE assp_sis_payres_imp_bak;Query OK, 0 rows affected (0.03 sec)greatsql> SELECT charset(@row), @row;+---------------+------------+| charset(@row) | @row |+---------------+------------+| binary | NULL |+---------------+------------+1 row in set (0.00 sec)greatsql> LOAD DATA LOCAL INFILE '/root/dba_zc/load/TEST_UTF8_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET UTF8MB4 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING UTF8MB4)),'');Query OK, 2 rows affected (0.01 sec)Records: 2 Deleted: 0 Skipped: 0 Warnings: 0greatsql> SELECT charset(@row), @row; +---------------+------------------------+| charset(@row) | @row |+---------------+------------------------+| utf8mb4 | XXX路桥区社会保XXX |+---------------+------------------------+greatsql> LOAD DATA LOCAL INFILE '/root/dba_zc/load/TEST_GB18030_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET GB18030 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING GB18030)),'');Query OK, 2 rows affected (0.01 sec)Records: 2 Deleted: 0 Skipped: 0 Warnings: 0greatsql> SELECT charset(@row), @row; +---------------+-----------------------------------------+| charset(@row) | @row |+---------------+-----------------------------------------+| gb18030 | XXX路桥区社会保XXX |+---------------+-----------------------------------------+greatsql> SELECT * FROM ASSP_SIS_PAYRES_IMP_BAK;+----------+---------------------------------------------------------+| AUTO_INC | D_NAME |+----------+---------------------------------------------------------+| 1 | XXX路桥区社会保XXX || 2 | XXX路桥区社会保XXX || 4 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績 || 5 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績 |+----------+---------------------------------------------------------+4 rows in set (0.00 sec)
复制代码 MySQL 5.7.21- greatsql> SELECT VERSION();
- +------------+
- | version() |
- +------------+
- | 5.7.21-log |
- +------------+
- 1 row in set (0.01 sec)
- table ddl:
- CREATE TABLE `assp_sis_payres_imp_bak` (
- `AUTO_INC` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '自增列',
- `D_NAME` varchar(210) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
- PRIMARY KEY (`AUTO_INC`)
- ) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
- greatsql> SHOW GLOBAL VARIABLES LIKE '%char%';
- +--------------------------------------+--------------------------------+
- | Variable_name | Value |
- +--------------------------------------+--------------------------------+
- | character_set_client | utf8mb4 |
- | character_set_connection | utf8mb4 |
- | character_set_database | utf8mb4 |
- | character_set_filesystem | binary |
- | character_set_results | utf8mb4 |
- | character_set_server | utf8mb4 |
- | character_set_system | utf8 |
- | character_sets_dir | /opt/mysql3305/share/charsets/ |
- | validate_password_special_char_count | 1 |
- +--------------------------------------+--------------------------------+
- 9 rows in set (0.00 sec)
- greatsql> SHOW GLOBAL VARIABLES LIKE '%coll%';
- +----------------------+--------------------+
- | Variable_name | Value |
- +----------------------+--------------------+
- | collation_connection | utf8mb4_general_ci |
- | collation_database | utf8mb4_general_ci |
- | collation_server | utf8mb4_general_ci |
- +----------------------+--------------------+
- 3 rows in set (0.00 sec)
- greatsql> SELECT charset(@row), @row;
- +---------------+------------+
- | charset(@row) | @row |
- +---------------+------------+
- | binary | NULL |
- +---------------+------------+
- 1 row in set (0.00 sec)
- greatsql> LOAD DATA LOCAL INFILE '/root/dba_zc/load/TEST_UTF8_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET UTF8MB4 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING UTF8MB4)),'');
- Query OK, 2 rows affected (0.01 sec)
- Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
- greatsql> SELECT charset(@row), @row;
- +---------------+-----------------------+
- | charset(@row) | @row |
- +---------------+-----------------------+
- | utf8mb4 | XXX路桥区社会保XXX |
- +---------------+-----------------------+
- greatsql> LOAD DATA LOCAL INFILE '/root/dba_zc/load/TEST_GB18030_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET GB18030 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING GB18030)),'');
- Query OK, 2 rows affected (0.01 sec)
- Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
- greatsql> SELECT charset(@row), @row;
- +---------------+-----------------------+
- | charset(@row) | @row |
- +---------------+-----------------------+
- | gb18030 | XXX路桥区社会保XXX |
- +---------------+-----------------------+
- greatsql> SELECT * FROM ASSP_SIS_PAYRES_IMP_BAK;
- +---------------+-----------------------------+
- | AUTO_INC | D_NAME |
- +---------------+-----------------------------+
- | 1 | XXX路桥区社会保XXX |
- | 2 | XXX路桥区社会保XXX |
- | 4 | XXX路桥区社会保XXX |
- | 5 | XXX路桥区社会保XXX |
- +---------------+-----------------------------+
- 4 rows in set (0.00 sec)
复制代码 BUG规避方案
通过SELECT``charset(@row), @row; 可以看到@row在执行LOAD DATA后在5.7.21和8.0.25是一样的,但最终的影响不一样。固然MySQL官方确认此问题为BUG,但没有提供规避方案或者办理方案。通过万里工程师研究后,发现一种可行的规避方案。每次执行LOAD DATA下令前执行 [set @row=_binary'';] 进行规避。- greatsql> SELECT VERSION();
- +-----------+
- | version() |
- +-----------+
- | 8.0.25 |
- +-----------+
- 1 row in set (0.00 sec)
- greatsql> SET @row=_binary'';
- Query OK, 0 rows affected (0.00 sec)
- greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_UTF8_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET UTF8MB4 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING UTF8MB4)),'');
- Query OK, 2 rows affected (0.01 sec)
- Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
- greatsql> SET @row=_binary'';
- Query OK, 0 rows affected (0.00 sec)
- greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_GB18030_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET GB18030 IGNORE 0 LINES (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING GB18030)),'');
- Query OK, 2 rows affected (0.01 sec)
- Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
- greatsql> SELECT * FROM assp_sis_payres_imp_bak;
- +----------+--------------------------------------------------+
- | AUTO_INC | D_NAME |
- +----------+--------------------------------------------------+
- | 1 | XXX路桥区社会保XXX |
- | 2 | XXX路桥区社会保XXX |
- | 4 | XXX路桥区社会保XXX |
- | 5 | XXX路桥区社会保XXX |
- +----------+--------------------------------------------------+
- 4 rows in set (0.00 sec)
复制代码 问题总结
1.BUG原因
MySQL8.0重构定长数据导入LOAD DATA @row 出现BUG.同一个数据库会话,多次执行LOAD DATA @row下令,则第n次执行LOAD DATA @row 的字符集利用的是n-1次的字符集,当文件的字符集存在差别,比方先后处理GB18030、UTF8字符集的文件就会数据乱码。此问题MySQL官方已证实为BUG(编号115824)
2.BUG触发条件
触发条件:需同时满意以下三个条件才会触发此bug。
1)LOAD DATA下令利用类似 @row暂时变量 进行数据处理,比方对定长记录按字节切割出多个字段:- LINES (@row) SET COLUMN_NAME = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row),1,20)) USING GB18030))。
复制代码 2)在同一个毗连中,多次执行LOAD DATA下令,且先后处理的文件字符集存在差别(比方GB18030和UTF8)。
3)利用MySQL 8.0。
3.BUG规避办法
由万里工程师提出,与MySQL官方社区沟通证实,涉及到满意上述BUG触发条件的场景,通过在每次执行LOAD DATA下令前执行 [set @row=_binary'';] 进行规避。
参考:https://bugs.mysql.com/bug.php?id=115824
Enjoy GreatSQL
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。 |