A Brief Analysis of How MySQL Processes DECIMAL Data

慢吞云雾缓吐愁 · 2022-9-16 17:21:23

Note: the analysis in this article is based on MySQL 8.0.

Before starting, let's review what the official documentation says about the DECIMAL type:

The declaration syntax for a DECIMAL column is DECIMAL(M,D). The ranges of values for the arguments are as follows:
M is the maximum number of digits (the precision). It has a range of 1 to 65.
D is the number of digits to the right of the decimal point (the scale). It has a range of 0 to 30 and must be no larger than M.
If D is omitted, the default is 0. If M is omitted, the default is 10.
The maximum value of 65 for M means that calculations on DECIMAL values are accurate up to 65 digits. This limit of 65 digits of precision also applies to exact-value numeric literals, so the maximum range of such literals differs from before. (There is also a limit on how long the text of DECIMAL literals can be; see Section 12.25.3, “Expression Handling”.)

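The maximum precision and the scale mentioned above are the focus of this analysis:

The maximum precision is 65 digits
The scale is at most 30 digits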

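We start by analyzing how the MySQL server handles DECIMAL constants on input.
First, a few questions:

When a constant is queried with SELECT, for example SELECT 123456789.123;, how does MySQL process it?
What does each of the following two statements return in MySQL, and why?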
1. SELECT 111111111111111111111111111111111111111111111111111111111111111111111111111111111;
2. SELECT 1111111111111111111111111111111111111111111111111111111111111111111111111111111111;

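How MySQL parses constants

Consider the first question. When MySQL's lexer processes a SELECT on a constant, it picks a suitable type to hold the value based on the length of the digit string. The decision logic lives in int_token(const char *str, uint length) in sql_lex.cc; the relevant snippet is: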

static inline uint int_token(const char *str, uint length) {
  ...
  if (neg) {
    if (length == long_len) {
      cmp = signed_long_str + 1;
      smaller = NUM;      // If <= signed_long_str
      bigger = LONG_NUM;  // If >= signed_long_str
    } else if (length < signed_longlong_len)
      return LONG_NUM;
    else if (length > signed_longlong_len)
      return DECIMAL_NUM;
    else {
      cmp = signed_longlong_str + 1;
      smaller = LONG_NUM;  // If <= signed_longlong_str
      bigger = DECIMAL_NUM;
    }
  } else {
    if (length == long_len) {
      cmp = long_str;
      smaller = NUM;
      bigger = LONG_NUM;
    } else if (length < longlong_len)
      return LONG_NUM;
    else if (length > longlong_len) {
      if (length > unsigned_longlong_len) return DECIMAL_NUM;
      cmp = unsigned_longlong_str;
      smaller = ULONGLONG_NUM;
      bigger = DECIMAL_NUM;
    } else {
      cmp = longlong_str;
      smaller = LONG_NUM;
      bigger = ULONGLONG_NUM;
    }
  }
  // Compare digit by digit against the boundary string of the same length.
  while (*cmp && *cmp++ == *str++)
    ;
  return ((uchar)str[-1] <= (uchar)cmp[-1]) ? smaller : bigger;
}


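Continuing along the same line, the boundary strings and lengths that int_token compares the literal against are defined as follows: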

static const char *long_str = "2147483647";
static const uint long_len = 10;
static const char *signed_long_str = "-2147483648";
static const char *longlong_str = "9223372036854775807";
static const uint longlong_len = 19;
static const char *signed_longlong_str = "-9223372036854775808";
static const uint signed_longlong_len = 19;
static const char *unsigned_longlong_str = "18446744073709551615";
static const uint unsigned_longlong_len = 20;
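
The comparison that int_token performs can be illustrated with a small standalone sketch. It is not MySQL's code: TokenKind and classify_unsigned_literal are hypothetical names, it covers only non-negative literals, and it assumes that leading zeros were already stripped and that literals shorter than 10 digits map to NUM (that part is elided in the excerpt above). Two digit strings of equal length order the same way lexicographically and numerically, so a plain string comparison is enough once the lengths match.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical token kinds mirroring the names used in the excerpt.
enum class TokenKind { NUM, LONG_NUM, ULONGLONG_NUM, DECIMAL_NUM };

// Classify a non-negative, digits-only literal without leading zeros.
// Assumption: anything shorter than 10 digits is a plain NUM.
inline TokenKind classify_unsigned_literal(const std::string &digits) {
  static const std::string kLong = "2147483647";                 // 10 digits
  static const std::string kLongLong = "9223372036854775807";    // 19 digits
  static const std::string kULongLong = "18446744073709551615";  // 20 digits

  assert(!digits.empty());
  for (char c : digits) {
    assert(std::isdigit(static_cast<unsigned char>(c)));
    (void)c;  // keeps the loop warning-free when asserts are compiled out
  }

  const std::size_t len = digits.size();
  if (len < kLong.size()) return TokenKind::NUM;
  if (len == kLong.size())
    return digits <= kLong ? TokenKind::NUM : TokenKind::LONG_NUM;
  if (len < kLongLong.size()) return TokenKind::LONG_NUM;
  if (len == kLongLong.size())
    return digits <= kLongLong ? TokenKind::LONG_NUM
                               : TokenKind::ULONGLONG_NUM;
  if (len == kULongLong.size())
    return digits <= kULongLong ? TokenKind::ULONGLONG_NUM
                                : TokenKind::DECIMAL_NUM;
  return TokenKind::DECIMAL_NUM;  // longer than any 64-bit integer
}
```

Under these assumptions, any literal of 21 digits or more, such as the two statements in the second question, is classified as DECIMAL_NUM.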


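Once the parser receives token = DECIMAL_NUM, it creates an Item_decimal object to store the input value.
Before analyzing that code, here are the results of the two statements from the second question, followed by a few constant definitions: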

root@mysqldb 14:09: [(none)]> SELECT 111111111111111111111111111111111111111111111111111111111111111111111111111111111;
+-----------------------------------------------------------------------------------+
| 111111111111111111111111111111111111111111111111111111111111111111111111111111111 |
+-----------------------------------------------------------------------------------+
| 111111111111111111111111111111111111111111111111111111111111111111111111111111111 |
+-----------------------------------------------------------------------------------+
1 row in set (2.28 sec)
root@mysqldb 14:09: [(none)]> SELECT 1111111111111111111111111111111111111111111111111111111111111111111111111111111111;
+------------------------------------------------------------------------------------+
| 1111111111111111111111111111111111111111111111111111111111111111111111111111111111 |
+------------------------------------------------------------------------------------+
| 99999999999999999999999999999999999999999999999999999999999999999 |
+------------------------------------------------------------------------------------+
1 row in set, 1 warning (2.01 sec)


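DECIMAL_BUFF_LENGTH: the size of the buffer that holds a DECIMAL value, counted in buffer elements
DECIMAL_MAX_POSSIBLE_PRECISION: each buffer element stores 9 decimal digits, so the largest precision that can possibly be handled is 9 × 9 = 81
DECIMAL_MAX_PRECISION: caps the value of M in decimal(M,D) as described in the official documentation; it is also the length of the integer part returned when an oversized constant overflows
DECIMAL_MAX_SCALE: caps the value of D in decimal(M,D) as described in the official documentation

The grammar rule that creates the Item_decimal object for a DECIMAL_NUM token is: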

NUM_literal:
int64_literal
| DECIMAL_NUM
{
$$= NEW_PTN Item_decimal(@$, $1.str, $1.length, YYCSCL);
}
| FLOAT_NUM
{
$$= NEW_PTN Item_float(@$, $1.str, $1.length);
}
;


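The Item_decimal constructor calls str2my_decimal to process the input value and convert it into my_decimal data. The constants described above are defined as follows: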

/** maximum length of buffer in our big digits (uint32). */
static constexpr int DECIMAL_BUFF_LENGTH{9};
/** the number of digits that my_decimal can possibly contain */
static constexpr int DECIMAL_MAX_POSSIBLE_PRECISION{DECIMAL_BUFF_LENGTH * 9};
/**
maximum guaranteed precision of number in decimal digits (number of our
digits * number of decimal digits in one our big digit - number of decimal
digits in one our big digit decreased by 1 (because we always put decimal
point on the border of our big digits))
*/
static constexpr int DECIMAL_MAX_PRECISION{DECIMAL_MAX_POSSIBLE_PRECISION -
8 * 2};
static constexpr int DECIMAL_MAX_SCALE{30};
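
The arithmetic behind these constants can be checked with a short sketch. The k-prefixed names, words_for_digits, and fits_in_buffer are local to this example and do not exist in MySQL. The sketch only restates the definitions above, assuming 9 decimal digits per buffer element, and reads the `8 * 2` term as up to 8 unused digit positions at each end of the buffer, which is an interpretation of the source comment rather than a documented rule.

```cpp
#include <cassert>

constexpr int kDigitsPerWord = 9;  // decimal digits held by one buffer element
constexpr int kBuffLength = 9;     // mirrors DECIMAL_BUFF_LENGTH
constexpr int kMaxPossiblePrecision = kBuffLength * kDigitsPerWord;
constexpr int kMaxPrecision = kMaxPossiblePrecision - 8 * 2;

static_assert(kMaxPossiblePrecision == 81, "9 elements of 9 digits each");
static_assert(kMaxPrecision == 65, "matches the documented maximum for M");

// Number of buffer elements needed to hold `digits` decimal digits.
constexpr int words_for_digits(int digits) {
  return (digits + kDigitsPerWord - 1) / kDigitsPerWord;
}

static_assert(words_for_digits(81) == 9, "81 digits fill the buffer exactly");
static_assert(words_for_digits(82) == 10, "82 digits need one element more");

// True when an integer part of the given length fits into the buffer.
inline bool fits_in_buffer(int integer_digits) {
  assert(integer_digits >= 0);
  return words_for_digits(integer_digits) <= kBuffLength;
}
```

By this arithmetic an 81-digit integer literal still fits, while an 82-digit one does not, which is the boundary the second question probes.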


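str2my_decimal first converts the numeric string to a suitable character set and then calls string2decimal to convert it into decimal_t data (my_decimal derives from decimal_t, as the class diagram later in this article shows). The Item_decimal constructor that calls str2my_decimal is: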

Item_decimal::Item_decimal(const POS &pos, const char *str_arg, uint length,
const CHARSET_INFO *charset)
: super(pos) {
str2my_decimal(E_DEC_FATAL_ERROR, str_arg, length, charset, &decimal_value);
item_name.set(str_arg);
set_data_type(MYSQL_TYPE_NEWDECIMAL);
decimals = (uint8)decimal_value.frac;
fixed = true;
max_length = my_decimal_precision_to_length_no_truncation(
decimal_value.intg + decimals, decimals, unsigned_flag);
}


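The parsing process in string2decimal is roughly as follows:

Count how many characters the integer part and the fractional part each contain
Compute how many buffer elements the integer part and the fractional part each need
- If the integer part needs more than 9 buffer elements, the value overflows
- If the integer part and the fractional part together need more than 9 buffer elements, the fractional part must be truncated
  Because the integer part is parsed before the fractional part, an integer part that occupies every buffer element leaves no room for the fractional part, which is then cut off entirely.
Convert the integer part and the fractional part, 9 characters at a time, into integers and store each one in a buffer element

The str2my_decimal function that drives this process is: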

int str2my_decimal(uint mask, const char *from, size_t length,
                   const CHARSET_INFO *charset, my_decimal *decimal_value) {
  const char *end, *from_end;
  int err;
  char buff[STRING_BUFFER_USUAL_SIZE];
  String tmp(buff, sizeof(buff), &my_charset_bin);
  // Multi-byte character sets are converted to latin1 before parsing.
  if (charset->mbminlen > 1) {
    uint dummy_errors;
    tmp.copy(from, length, charset, &my_charset_latin1, &dummy_errors);
    from = tmp.ptr();
    length = tmp.length();
    charset = &my_charset_bin;
  }
  from_end = end = from + length;
  err = string2decimal(from, (decimal_t *)decimal_value, &end);
  if (end != from_end && !err) {
    /* Give warning if there is something other than end space */
    for (; end < from_end; end++) {
      if (!my_isspace(&my_charset_latin1, *end)) {
        err = E_DEC_TRUNCATED;
        break;
      }
    }
  }
  check_result_and_overflow(mask, err, decimal_value);
  return err;
}
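
The parsing steps listed above can be made concrete with a small sketch of the buffer layout. This is an illustration under stated assumptions, not string2decimal itself: pack_decimal, PackedDecimal, and ParseStatus are hypothetical names, and the input is restricted to plain digits with at most one decimal point (no sign, no exponent). Integer digits are grouped from the right and fractional digits from the left, so that the decimal point always falls on an element boundary.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

enum class ParseStatus { Ok, Truncated, Overflow };

struct PackedDecimal {
  ParseStatus status;
  std::vector<std::uint32_t> words;  // base-10^9 elements, most significant first
};

inline PackedDecimal pack_decimal(const std::string &literal,
                                  int buffer_len = 9) {
  const int kDigitsPerWord = 9;
  const std::size_t dot = literal.find('.');
  const std::string int_part = literal.substr(0, dot);
  std::string frac_part =
      (dot == std::string::npos) ? std::string() : literal.substr(dot + 1);
  for (char c : int_part + frac_part) {
    assert(std::isdigit(static_cast<unsigned char>(c)));
    (void)c;
  }

  // Steps 1 and 2: digit counts, then elements needed per part (rounded up).
  const int int_words =
      (static_cast<int>(int_part.size()) + kDigitsPerWord - 1) / kDigitsPerWord;
  int frac_words =
      (static_cast<int>(frac_part.size()) + kDigitsPerWord - 1) / kDigitsPerWord;

  PackedDecimal out{ParseStatus::Ok, {}};
  if (int_words > buffer_len) {  // integer part alone does not fit
    out.status = ParseStatus::Overflow;
    return out;
  }
  if (int_words + frac_words > buffer_len) {  // integer part wins
    frac_words = buffer_len - int_words;
    out.status = ParseStatus::Truncated;
  }

  // Step 3: integer digits, grouped from the right (only the first may be short).
  std::size_t pos = 0;
  while (pos < int_part.size()) {
    std::size_t take = kDigitsPerWord;
    if (pos == 0 && int_part.size() % kDigitsPerWord != 0)
      take = int_part.size() % kDigitsPerWord;
    out.words.push_back(
        static_cast<std::uint32_t>(std::stoul(int_part.substr(pos, take))));
    pos += take;
  }
  // Fraction digits: cut to the allowed length, then zero-pad the last group.
  frac_part.resize(static_cast<std::size_t>(frac_words) * kDigitsPerWord, '0');
  for (std::size_t i = 0; i < frac_part.size(); i += kDigitsPerWord)
    out.words.push_back(static_cast<std::uint32_t>(
        std::stoul(frac_part.substr(i, kDigitsPerWord))));
  return out;
}
```

In this sketch, "123456789.123" packs into two elements (123456789 and 123000000), and an integer literal of 82 digits reports Overflow because it would need a tenth element.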


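After string2decimal returns, check_result_and_overflow inspects the result. The relationship between the my_decimal and decimal_t types it works with is shown in this PlantUML class diagram: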

@startuml
class decimal_t
{
+ int intg, frac, len;
+ bool sign;
+ decimal_digit_t *buf;
}
class my_decimal
{
- decimal_digit_t buffer[DECIMAL_BUFF_LENGTH];
}
decimal_t <|-- my_decimal
@enduml


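If the processing before the call to check_result_and_overflow overflowed, the decimal cannot hold the complete value. In that case MySQL returns only the largest value of the default maximum precision. Since the constant definitions above give DECIMAL_MAX_PRECISION = 81 - 8 * 2 = 65, that value is 65 nines.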
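
The observable effect of this clamping on an integer literal can be sketched as follows. clamp_integer_literal is a hypothetical helper, not a MySQL function. It assumes a non-negative, digits-only literal without leading zeros, and it encodes only the behaviour described in this article: 9 buffer elements of 9 digits each, and 65 nines on overflow.

```cpp
#include <cassert>
#include <string>

// Returns the text that the literal is expected to evaluate to under the
// rules described above. Assumption: `digits` has no sign, no decimal point,
// and no leading zeros.
inline std::string clamp_integer_literal(const std::string &digits) {
  const std::size_t kCapacityDigits = 9 * 9;  // 9 elements x 9 digits
  const std::size_t kMaxPrecision = 65;       // mirrors DECIMAL_MAX_PRECISION
  assert(!digits.empty());
  if (digits.size() > kCapacityDigits) {
    // The integer part needs a tenth element: overflow, so clamp.
    return std::string(kMaxPrecision, '9');
  }
  return digits;  // fits into the buffer unchanged
}
```

An 81-digit literal comes back unchanged, while an 82-digit literal comes back as 65 nines, which is consistent with the two query results shown earlier.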

How DECIMAL data generated from an oversized constant differs from the DECIMAL column type

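The analysis above of how DECIMAL data is generated from an oversized constant also answers a third question: is such a value the same as a value of the DECIMAL column type? The two are different, in the following ways: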

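The DECIMAL column type has explicit limits on precision and scale: the integer part of a value inserted into a DECIMAL(M,D) column can be at most M-D digits long. DECIMAL data generated from an oversized constant instead gives implicit priority to the integer part. The integer part is processed first and the fractional part only afterwards; if the buffer runs out, the fractional digits are truncated, and if the buffer cannot even hold the integer part, the value becomes 65 nines.
In the MySQL server source code, the DECIMAL column type is handled by the Field_new_decimal class, whereas DECIMAL data generated from an oversized constant is handled by the Item_decimal class.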

Enjoy GreatSQL