摘要:MySQL,JSON类型,多值索引, 用户画像
MySQL是结构化数据存储,JSON黑白结构化格式,在MySQL中利用JSON类型可以打通关系型和非关系型数据的存储之间的界限,为业务提供更好的架构选择,以下内容包罗
- (1)JSON 数据类型
- (2)JSON类型创建插入数据
- (3)提取JSON内字段
- (4)JSON类型和字符串的区别
- (5)JSON类型数据修改
- (6)JSON类型利用索引
- (7)JSON类型其他常用函数
- (8)JSON ARRAY的多值索引
- (9)基于JSON类型的用户画像设计
(1)JSON 数据类型
JSON(JavaScript Object Notation)主要用于互联网应用服务之间的数据交换。MySQL 支持JSON 对象和JSON 数组两种类型,JSON 类型是从 MySQL 5.7 版本开始支持的功能,MySQL中利用JSON有以下好处
- 无须预定义字段:字段可以无限拓展,制止了ALTER ADD COLUMN的操作,利用更加灵活
- 处理希罕字段:制止了希罕字段的NULL值,制止冗余存储
- 支持索引:相比于字符串格式的JSON,JSON类型支持索引做特定的查询优化
总体而言,JSON 类型比较适合存储一些修改较少、相对静态的数据,或者说适合存储修改较少,且容忍希罕的聚合数据,好比存储用户的登录信息
(2)JSON类型创建插入数据
- create database test default charset utf8mb4;
- CREATE TABLE student (
- id BIGINT AUTO_INCREMENT PRIMARY KEY,
- info JSON DEFAULT NULL
- );
复制代码 插入数据
- mysql> INSERT student (info) VALUES ('{"sex": "F", "age": 13, "city": "beijing"}');
- Query OK, 1 row affected (0.13 sec)
- mysql> INSERT student (info) VALUES ('{"sex": "M", "age": 14, "city": "suzhou"}');
- Query OK, 1 row affected (0.10 sec)
- mysql> INSERT student (info) VALUES ('{"sex": "F", "age": 23, "city": "shenzhen"}');
- Query OK, 1 row affected (0.20 sec)
复制代码 查看数据
- mysql> select * from student;
- +----+---------------------------------------------+
- | id | info |
- +----+---------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "beijing"} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+---------------------------------------------+
复制代码 试一下插入JSON数组
- mysql> INSERT student (info) VALUES ('[1,2,3,4]');
- Query OK, 1 row affected (0.12 sec)
- mysql> INSERT student (info) VALUES ('[{"sex": "M"},{"sex":"F", "city":"nanjing"}]');
- Query OK, 1 row affected (0.11 sec)
- mysql> select * from student;
- +----+-------------------------------------------------+
- | id | info |
- +----+-------------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "beijing"} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- | 4 | [1, 2, 3, 4] |
- | 5 | [{"sex": "M"}, {"sex": "F", "city": "nanjing"}] |
- +----+-------------------------------------------------+
- 5 rows in set (0.00 sec)
复制代码
- 可以混合插入JSON数组和JSON对象,但是必须要符合JSON格式
- 此中整个字符串利用单引号,键值对利用双引号,
- JSON中可以答应有多余空格,MySQL会自动解析,输出的利用格式为符号后带有一个空格
(3)提取JSON内字段
由于支持了新的JSON类型,MySQL 配套提供了丰富的 JSON 字段处理函数,用于方便地操作 JSON 数据,最常见的就是函数 JSON_EXTRACT,它用来从 JSON 数据中提取所需要的字段内容
(1)提取JSON对象
主要是JSON_UNQUOTE和JSON_EXTRACT,JSON_EXTRACT作用是去除最外侧的双引号,JSON_EXTRACT根据键提取值
- mysql> SELECT
- -> id,
- -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.sex")) sex,
- -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.age")) age,
- -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.city")) city
- -> FROM student;
- +----+------+------+----------+
- | id | sex | age | city |
- +----+------+------+----------+
- | 1 | F | 13 | beijing |
- | 2 | M | 14 | suzhou |
- | 3 | F | 23 | shenzhen |
- +----+------+------+----------+
- 3 rows in set (0.00 sec)
复制代码 MySQL 还提供了 ->> 表达式,和上述 SQL 效果完全一样,->>也是去除最外面的引号,另有一种符号->,它也能得到提取结果但是不去除外面的符号,相当于->和JSON_EXTRACT对应
- mysql> SELECT
- -> id,
- -> info->>"$.sex" sex,
- -> info->>"$.age" age,
- -> info->>"$.city" city
- -> FROM student;
- +----+------+------+----------+
- | id | sex | age | city |
- +----+------+------+----------+
- | 1 | F | 13 | beijing |
- | 2 | M | 14 | suzhou |
- | 3 | F | 23 | shenzhen |
- +----+------+------+----------+
- 3 rows in set (0.00 sec)
复制代码 如果JSON对象中查询的键不存在,则返回为NULL
- mysql> SELECT
- -> id,
- -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.sex")) sex,
- -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.height")) height
- -> FROM student;
- +----+------+--------+
- | id | sex | height |
- +----+------+--------+
- | 1 | F | NULL |
- | 2 | M | NULL |
- | 3 | F | NULL |
- +----+------+--------+
- 3 rows in set (0.00 sec)
复制代码 (2)提取JSON数组
先创建JSON数组类型,插入数据
- mysql> CREATE TABLE student (
- -> id BIGINT AUTO_INCREMENT PRIMARY KEY,
- -> info JSON DEFAULT NULL
- -> );
- Query OK, 0 rows affected (0.42 sec)
- mysql> INSERT student (info) VALUES ('[1, 2, 3, 4]');
- Query OK, 1 row affected (0.09 sec)
- mysql> INSERT student (info) VALUES ('[2, 3, 4]');
- Query OK, 1 row affected (0.08 sec)
- mysql> INSERT student (info) VALUES ('[3, 1, -1]');
- Query OK, 1 row affected (0.13 sec)
- mysql> select * from student;
- +----+--------------+
- | id | info |
- +----+--------------+
- | 1 | [1, 2, 3, 4] |
- | 2 | [2, 3, 4] |
- | 3 | [3, 1, -1] |
- +----+--------------+
- 3 rows in set (0.00 sec)
复制代码 JOSN数组通过索引取对应的值,同样是利用JSON_EXTRACT,索引从0开始
- mysql> SELECT
- -> JSON_EXTRACT(info, '$[0]') first
- -> FROM student;
- +----------------------------+
- | first |
- +----------------------------+
- | 1 |
- | 2 |
- | 3 |
- +----------------------------+
复制代码 同样可以接纳->>符号,这种方式会去除双引号,如果JSON数组内的元素是双引号字符串,显示出来的时候也会被去除引号
- mysql> SELECT
- -> id,
- -> info->>"$[0]" first,
- -> info->>"$[1]" second
- -> FROM student;
- +----+-------+--------+
- | id | first | second |
- +----+-------+--------+
- | 1 | 1 | 2 |
- | 2 | 2 | 3 |
- | 3 | 3 | 1 |
- +----+-------+--------+
复制代码 可以提取数组中的多个值,此时JSON_EXTRACT后面传入多个参数,提取的次序和输出次序一致
- mysql> SELECT id, JSON_EXTRACT(info, "$[1]", "$[0]") a FROM student;
- +----+--------+
- | id | a |
- +----+--------+
- | 1 | [2, 1] |
- | 2 | [5, 2] |
- | 3 | [3, 1] |
- +----+--------+
- 3 rows in set (0.01 sec)
复制代码 如果提取的索引位置不存在则返回空
- mysql> SELECT
- -> id,
- -> info->>"$[2]" a
- -> FROM student;
- +----+------+
- | id | a |
- +----+------+
- | 1 | 3 |
- | 2 | 6 |
- | 3 | NULL |
- +----+------+
- 3 rows in set (0.00 sec)
复制代码 (3)提取嵌套数组
嵌套数组主要是JSON数组内包含多个JSON对象或者JSON数组包含多个JSON数组,先以包含多个JSON对象为例
- mysql> truncate table student;
- Query OK, 0 rows affected (0.70 sec)
- mysql> INSERT student (info) VALUES ('[{"no":1,"ent_name":"张家港市杨舍百桥士方园艺场","score":98.2}, {"no":2,"ent_name":"昆山市朱北苗圃有限公司","score":98.2}, {"no":3,"ent_name":"苏州市吴中区临湖现代渔业发展有限公司","score":98.2}]');
- Query OK, 1 row affected (0.08 sec)
- mysql> INSERT student (info) VALUES ('[{"no":1,"ent_name":"张家港市合力土石方挖掘服务部","score":98.1}, {"no":2,"ent_name":"中国石化销售有限公司江苏苏州养武加油站","score":97.6}, {"no":3,"ent_name":"中国石化销售有限公司江苏苏州太仓璜泾二站服务点","score":97.5}]');
- Query OK, 1 row affected (0.09 sec)
复制代码 - mysql> select * from student;
- +----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- | id | info |
- +----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- | 1 | [{"no": 1, "score": 98.2, "ent_name": "张家港市杨舍百桥士方园艺场"}, {"no": 2, "score": 98.2, "ent_name": "昆山市朱北苗圃有限公司"}, {"no": 3, "score": 98.2, "ent_name": "苏州市吴中区临湖现代渔业发展有限公司"}] |
- | 2 | [{"no": 1, "score": 98.1, "ent_name": "张家港市合力土石方挖掘服务部"}, {"no": 2, "score": 97.6, "ent_name": "中国石化销售有限公司江苏苏州养武加油站"}, {"no": 3, "score": 97.5, "ent_name": "中国石化销售有限公司江苏苏州太仓璜泾二站服务点"}] |
- +----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- 2 rows in set (0.00 sec)
复制代码 提取嵌套数组的值,首先要找到数组中对应的对象,然后根据对象的键找值
- mysql> SELECT
- -> id,
- -> JSON_UNQUOTE(JSON_EXTRACT(info -> "$[0]", "$.ent_name")) first_ent_name
- -> FROM student;
- +----+--------------------------------------------+
- | id | first_ent_name |
- +----+--------------------------------------------+
- | 1 | 张家港市杨舍百桥士方园艺场 |
- | 2 | 张家港市合力土石方挖掘服务部 |
- +----+--------------------------------------------+
- 2 rows in set (0.00 sec)
复制代码 对比一下之前的写法则只能提取第一个数组对象,可见JSON_UNQUOTE对于内侧的引号不删除,只删除外侧的引号
- mysql> SELECT
- -> id,
- -> JSON_UNQUOTE(JSON_EXTRACT(info, "$[0]")) first
- -> FROM student;
- +----+------------------------------------------------------------------------------------+
- | id | first |
- +----+------------------------------------------------------------------------------------+
- | 1 | {"no": 1, "score": 98.2, "ent_name": "张家港市杨舍百桥士方园艺场"} |
- | 2 | {"no": 1, "score": 98.1, "ent_name": "张家港市合力土石方挖掘服务部"} |
- +----+------------------------------------------------------------------------------------+
- 2 rows in set (0.00 sec)
复制代码 在看数组套数组的情况
- mysql> INSERT student (info) VALUES ('[1, 2, [3, 4]]');
- Query OK, 1 row affected (0.07 sec)
- mysql> select * from student;
- +----+----------------+
- | id | info |
- +----+----------------+
- | 1 | [1, 2, [3, 4]] |
- +----+----------------+
- 1 row in set (0.00 sec)
复制代码 提取此中嵌套的数组,第一个表现嵌套数据的位置索引,第二个*表现嵌套数据内取全部元素
- mysql> SELECT
- -> id,
- -> JSON_EXTRACT(info, "$[2][*]") a
- -> FROM student;
- +----+--------+
- | id | a |
- +----+--------+
- | 1 | [3, 4] |
- +----+--------+
- 1 row in set (0.00 sec)
复制代码 如果第二个参数不是*,也可以选取嵌套数据内的指定位置的元素
- SELECT
- id,
- JSON_EXTRACT(info, "$[2][1]") a
- FROM student;
复制代码 - mysql> SELECT
- -> id,
- -> JSON_EXTRACT(info, "$[2][1]") a
- -> FROM student;
- +----+------+
- | id | a |
- +----+------+
- | 1 | 4 |
- +----+------+
- 1 row in set (0.00 sec)
复制代码 (4)提取JSON后增加过滤 / 排序条件
提取JSON后不能用新定名的字段做筛选过滤,需要调用把JSON函数或者符号再写一遍
- mysql> select * from student;
- +----+---------------------------------------------+
- | id | info |
- +----+---------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "beijing"} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+---------------------------------------------+
- 3 rows in set (0.00 sec)
复制代码 筛选sex是F,age大于14的
- mysql> SELECT
- -> id,
- -> info
- -> FROM student WHERE info->>"$.age" > 14 and info->>"$.sex" = 'F';
- +----+---------------------------------------------+
- | id | info |
- +----+---------------------------------------------+
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+---------------------------------------------+
- 1 row in set (0.01 sec)
复制代码 根据age倒序排序取第一,只要city列
- mysql> SELECT
- -> id,
- -> info->>"$.city"
- -> FROM student WHERE info->>"$.age" > 14 and info->>"$.sex" = 'F';
- +----+-----------------+
- | id | info->>"$.city" |
- +----+-----------------+
- | 3 | shenzhen |
- +----+-----------------+
- 1 row in set (0.00 sec)
复制代码 (4)JSON类型和字符串的区别
除了JSON类型支持索引之外,看一下脚本语言对于JSON类型取值和字符串是否有区别,利用pymysql毗连测试
- >>> import pymysql
- >>> config = {"user": "root", "password": "123456", "database": "test", "host": "127.0.0.1", "port": 3306}
- >>> conn = pymysql.connect(**config)
- >>> cursor = conn.cursor()
- >>> cursor.execute("select info from student where id = 1")
- 1
- >>> res = cursor.fetchall() # (('{"age": 13, "sex": "F", "city": "beijing"}',),)
- >>> cursor.close()
- >>> conn.close()
- >>> json.loads(res[0][0])
- {'age': 13, 'sex': 'F', 'city': 'beijing'}
复制代码 可见结果是一个JSON格式的字符串,可以直接解析成JSON,所以脚本语言取出的JSON类型结果和字符串没有差别,就是JSON格式的字符串,另外指定JSON格式后,MySQL会对插入的字符串做检验,如果不符合JSON格式插入报错,这也是和传统Varchar或者TEXT的区别
- JSON格式相比于Varchar,TEXT支持索引
- JSON格式会对插入的字符串做JSON格式校验,不符合则报错
- JSON格式的输入输出都是字符串,如果利用Varchar或者TEXT格式人工保证字符串为JSON格式,效果是一致的
(5)JSON类型数据修改
修改数据主要是JSON_SET,JSON_INSERT,JSON_REPLACE三个方法
- JSON_SET:替换现有key的值,插入不存在的key的值
- JSON_INSERT:插入不存在的key的值,已经存在的不修改
- JSON_REPLACE:只替换已存在的key的值,不存在的不做插入
利用的时候联合update语句
- mysql> select * from student;
- +----+---------------------------------------------+
- | id | info |
- +----+---------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "beijing"} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+---------------------------------------------+
- 3 rows in set (0.01 sec)
复制代码 JSON_SET,不存在则插入,有则替换
- mysql> UPDATE student SET info = JSON_SET(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1;
- Query OK, 1 row affected (0.87 sec)
- Rows matched: 1 Changed: 1 Warnings: 0
- mysql> select * from student;
- +----+--------------------------------------------------------+
- | id | info |
- +----+--------------------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "wuxi", "height": 123} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+--------------------------------------------------------+
复制代码 JSON_INSERT,只会插入不存在的值
- mysql> UPDATE student SET info = JSON_INSERT(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1;
- Query OK, 1 row affected (0.11 sec)
- Rows matched: 1 Changed: 1 Warnings: 0
- mysql> select * from student;
- +----+-----------------------------------------------------------+
- | id | info |
- +----+-----------------------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "beijing", "height": 123} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+-----------------------------------------------------------+
- 3 rows in set (0.00 sec)
复制代码 JSON_REPLACE,只会替换已有值
- mysql> UPDATE student SET info = JSON_REPLACE(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1;
- Query OK, 1 row affected (0.06 sec)
- Rows matched: 1 Changed: 1 Warnings: 0
- mysql> select * from student;
- +----+---------------------------------------------+
- | id | info |
- +----+---------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "wuxi"} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+---------------------------------------------+
- 3 rows in set (0.00 sec)
复制代码 删除利用JSON_REMOVE,在JSON对象中指定key删除
- mysql> select * from student;
- +----+---------------------------------------------+
- | id | info |
- +----+---------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "wuxi"} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+---------------------------------------------+
- 3 rows in set (0.00 sec)
- mysql> UPDATE student SET info = JSON_REMOVE(info, "$.age") WHERE id = 1;
- Query OK, 1 row affected (0.45 sec)
- Rows matched: 1 Changed: 1 Warnings: 0
- mysql> select * from student;
- +----+---------------------------------------------+
- | id | info |
- +----+---------------------------------------------+
- | 1 | {"sex": "F", "city": "wuxi"} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+---------------------------------------------+
- 3 rows in set (0.00 sec)
复制代码 在JSON数组中指定下标删除
- mysql> select * from student;
- +----+----------------+
- | id | info |
- +----+----------------+
- | 1 | [1, 2, [3, 4]] |
- | 2 | [2, 5, 6] |
- | 3 | [1, 3] |
- +----+----------------+
- 3 rows in set (0.00 sec)
- mysql> UPDATE student SET info = JSON_REMOVE(info, "$[1]") WHERE id=2;
- Query OK, 1 row affected (0.72 sec)
- Rows matched: 1 Changed: 1 Warnings: 0
- mysql> select * from student;
- +----+----------------+
- | id | info |
- +----+----------------+
- | 1 | [1, 2, [3, 4]] |
- | 2 | [2, 6] |
- | 3 | [1, 3] |
- +----+----------------+
- 3 rows in set (0.00 sec)
复制代码 (6)JSON类型利用索引
当 JSON 数据量非常大,用户希望对 JSON 数据进行有效检索时,可以利用 MySQL 的函数索引功能对 JSON 中的某个字段进行索引,具体方式是先创建一个假造列,再对假造列创建索引
先看一下没有索引下,对JSON对象中某个key做条件检索的EXPLAIN计划,可见访问方式type是ALL全表扫面,利用的索引Key是NULL
- mysql> select * from student;
- +----+---------------------------------------------+
- | id | info |
- +----+---------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "beijing"} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+---------------------------------------------+
- 3 rows in set (0.00 sec)mysql> explain select * from student where info->>"$.age" = 13;+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+| 1 | SIMPLE | student | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where |+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+1 row in set, 1 warning (0.01 sec)
复制代码 创建假造列age,而且添加普通索引,可见表新增了一列(虽然是假造列,但是在SELECT和脚本语言取数都能取到新增的假造列),重新EXPLAIN之后Type改为ref代表普通索引,利用的Key是idx_age
- mysql> ALTER TABLE student ADD COLUMN age INT as (info->>"$.age");
- Query OK, 0 rows affected (0.54 sec)
- Records: 0 Duplicates: 0 Warnings: 0
- mysql> select * from student;
- +----+---------------------------------------------+------+
- | id | info | age |
- +----+---------------------------------------------+------+
- | 1 | {"age": 13, "sex": "F", "city": "beijing"} | 13 |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | 14 |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | 23 |
- +----+---------------------------------------------+------+
- 3 rows in set (0.00 sec)
- mysql> ALTER TABLE student ADD INDEX idx_age(age);
- Query OK, 0 rows affected (0.51 sec)
- Records: 0 Duplicates: 0 Warnings: 0
- mysql> explain select * from student where info->>"$.age" = 13;
- +----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+
- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
- +----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+
- | 1 | SIMPLE | student | NULL | ref | idx_age | idx_age | 5 | const | 1 | 100.00 | NULL |
- +----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+
- 1 row in set, 1 warning (0.00 sec)
复制代码 除了普通索引,同理还可以为JSON对象的其他Key值创建唯一索引等
除了再建表之后创建假造列增加索引,也可以在建表的时候就为JSON类型创建索引
- mysql> CREATE TABLE student (
- -> id BIGINT AUTO_INCREMENT PRIMARY KEY,
- -> info JSON DEFAULT NULL,
- -> age Int as (info->>"$.age"),
- -> index idx_age(age)
- -> );
复制代码 (7)JSON类型其他常用函数
除了上面的JSON_EXTRACT和JSON_UNQUOTE,常用函数如下
- MEMBER OF:只能对JSON数组利用,返回1元素存在数组中,0元素不存在数组中
- JSON_CONTAINS:可以对JSON数组和JSON对象利用,针对JSON数组检查一个元素或者多个元素是否存在,对于JSON对象检查指定路径下是否有某个值或者是否有某个路径(Key)
- JSON_OVERLAP:比较两个JSON数组是否至少有一个元素一致,如果是返回1,否则返回0,如果是JSON对象,判断是否是找有一对key value一致
- JSON_KEYS:返回JSON对象的Key,也可以是嵌套JSON对象
以上函数可以在前面加上NOT关键字就可以取反
MEMBER OF是MySQL 8.0.17新增的函数,查看元素或者子数组是否存在在JSON数组中,如果把MEMBER OF放在where条件后面则直接筛选结果是1的,可以省略写=1
- mysql> select * from student;
- +----+----------------+
- | id | info |
- +----+----------------+
- | 1 | [1, 2, [3, 4]] |
- | 2 | [2, 5, 6] |
- | 3 | [1, 3] |
- +----+----------------+
- 3 rows in set (0.01 sec)
- mysql> SELECT * FROM student WHERE 3 MEMBER OF(info);
- +----+--------+
- | id | info |
- +----+--------+
- | 3 | [1, 3] |
- +----+--------+
- 1 row in set (0.00 sec)
- mysql> SELECT * FROM student WHERE JSON_ARRAY(3, 4) MEMBER OF(info);
- +----+----------------+
- | id | info |
- +----+----------------+
- | 1 | [1, 2, [3, 4]] |
- +----+----------------+
- 1 row in set (0.00 sec)
复制代码 JSON_CONTAINS和MEMBER OF类似,但是JSON_CONTAINS可以作用与JSON对象,对于JSON数组两者也有区别,JSON_CONTAINS可以指定多个数组内的元素,相当于是且的关系,JSON_CONTAINS放在where后面默认筛选结果值是1的
- mysql> select * from student;
- +----+----------------+
- | id | info |
- +----+----------------+
- | 1 | [1, 2, [3, 4]] |
- | 2 | [2, 5, 6] |
- | 3 | [1, 3] |
- +----+----------------+
- 3 rows in set (0.00 sec)
- mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[2, 6]');
- +----+-----------+
- | id | info |
- +----+-----------+
- | 2 | [2, 5, 6] |
- +----+-----------+
- 1 row in set (0.00 sec)
- mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[2]');
- +----+----------------+
- | id | info |
- +----+----------------+
- | 1 | [1, 2, [3, 4]] |
- | 2 | [2, 5, 6] |
- +----+----------------+
- mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[3]');
- +----+----------------+
- | id | info |
- +----+----------------+
- | 1 | [1, 2, [3, 4]] |
- | 3 | [1, 3] |
- +----+----------------+
复制代码 当JSON_CONTAINS作用与JSON对象时,需要判断某个key-value对是否存,格式是(target,value,key),在指定value的时候,数值用单引号,字符串还要在内加一层双引号
- mysql> select * from student;
- +----+---------------------------------------------+
- | id | info |
- +----+---------------------------------------------+
- | 1 | {"age": 13, "sex": "F", "city": "beijing"} |
- | 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
- | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
- +----+---------------------------------------------+
- mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '"F"', '$.sex');+----+---------------------------------------------+| id | info |+----+---------------------------------------------+| 1 | {"age": 13, "sex": "F", "city": "beijing"} || 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |+----+---------------------------------------------+mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '13', '$.age');+----+--------------------------------------------+| id | info |+----+--------------------------------------------+| 1 | {"age": 13, "sex": "F", "city": "beijing"} |+----+--------------------------------------------+
复制代码 JSON_OVERLAP返回两个JSON数组或者JSON对象至少有一个/对元素一致则是1,房子啊where后面筛选出存在至少一个交集的数据
- mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[1, 5]');
- +----+--------------+
- | id | info |
- +----+--------------+
- | 1 | [1, 2, 5] |
- | 2 | [2, 5, 6] |
- | 3 | [1, 3, 8] |
- | 4 | [1, 2, 7, 8] |
- +----+--------------+
- 4 rows in set (0.00 sec)
- mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[1, 5, 6]');
- +----+--------------+
- | id | info |
- +----+--------------+
- | 1 | [1, 2, 5] |
- | 2 | [2, 5, 6] |
- | 3 | [1, 3, 8] |
- | 4 | [1, 2, 7, 8] |
- +----+--------------+
- 4 rows in set (0.00 sec)
- mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[5]');
- +----+-----------+
- | id | info |
- +----+-----------+
- | 1 | [1, 2, 5] |
- | 2 | [2, 5, 6] |
- +----+-----------+
- 2 rows in set (0.00 sec)
- mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[9]');
- Empty set (0.00 sec)
复制代码 (8)JSON ARRAY的多值索引
从MySQL8.0.17开始,InnoDB支持多值索引。多值索引是在存储JSON数组的列上定义的辅助索引,对于JSON ARRAY的MEMBER OF,JSON_CONTAINS,JSON_OVERLAPS可以利用多值索引进行性能优化
- mysql> select * from student;
- +----+--------------+
- | id | info |
- +----+--------------+
- | 1 | [1, 2, 5, 9] |
- | 2 | [2, 5, 6, 8] |
- | 3 | [5, 3, 8, 9] |
- | 4 | [1, 2, 7, 8] |
- +----+--------------+
- 4 rows in set (0.00 sec)
复制代码 先不创建多值索引,利用JSON_CONTAINS语句EXPLAIN查看实行计划
- mysql> EXPLAIN SELECT * FROM student WHERE JSON_CONTAINS(info, '[5, 3]');
- +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
- +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
- | 1 | SIMPLE | student | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Using where |
- +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
- 1 row in set, 1 warning (0.00 sec)
复制代码 Type为全表扫描,key为NULL,下一步给表增加多值索引,注意如果这个地方改为idx_info((cast((info->"$") as unsigned array))),则后续全部的函数都要是info->"$",否则走不了索引
- mysql> ALTER TABLE student ADD INDEX idx_info((cast(info as unsigned array)));
- Query OK, 0 rows affected (0.57 sec)
- Records: 0 Duplicates: 0 Warnings: 0
复制代码 查看创建多值索引之后的EXPLAIN计划,可见现在JSON操作函数都走了索引
- mysql> EXPLAIN SELECT * FROM student WHERE 5 MEMBER OF(info);
- +----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
- +----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
- | 1 | SIMPLE | student | NULL | ref | idx_info | idx_info | 9 | const | 1 | 100.00 | Using where |
- +----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
- 1 row in set, 1 warning (0.00 sec)
- mysql> EXPLAIN SELECT * FROM student WHERE JSON_CONTAINS(info, '[5, 3]');
- +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
- +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
- | 1 | SIMPLE | student | NULL | range | idx_info | idx_info | 9 | NULL | 4 | 100.00 | Using where |
- +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
- 1 row in set, 1 warning (0.00 sec)
- mysql> EXPLAIN SELECT * FROM student WHERE JSON_OVERLAPS(info, '[5, 3]');
- +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
- +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
- | 1 | SIMPLE | student | NULL | range | idx_info | idx_info | 9 | NULL | 4 | 100.00 | Using where |
- +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
- 1 row in set, 1 warning (0.00 sec)
复制代码 (9)基于JSON类型的用户画像设计
下面基于30万级别的数据,以MySQL的JSON做用户画像的存储和查询测试,画像值已经全部处理成罗列值,不加入数值型的字段,第一步梳理用户标签,定义一张标签表,记录标签值和标签号
- mysql> CREATE TABLE tags_info (
- -> tag_id bigint auto_increment primary key,
- -> tag_name varchar(255) not null,
- -> tag_value varchar(255) not null
- -> );
- Query OK, 0 rows affected (0.80 sec)
复制代码 插入全部的标签,预览表如下
- mysql> select * from tags_info order by rand() limit 10;
- +--------+--------------+--------------------+
- | tag_id | tag_name | tag_value |
- +--------+--------------+--------------------+
- | 24 | 渠道名称 | 自助收银 |
- | 38 | 类目范围 | 类目多样 |
- | 3 | 生日月份 | 1 |
- | 35 | RFM | 重要保持会员 |
- | 11 | 生日月份 | 9 |
- | 44 | 时间偏好 | 常客 |
- | 16 | 会员等级 | 员工卡 |
- | 27 | 会员状态 | 沉默 |
- | 25 | 渠道名称 | 闪电购 |
- | 43 | 时间偏好 | 周末客 |
- +--------+--------------+--------------------+
复制代码 下一步构建用户画像表,标签值替换为标签ID,将一个用户的全部标签值存储为JSON ARRAY,先构建一张用户画像结果表
- mysql> CREATE TABLE user_tag (
- -> user_id bigint not null primary key,
- -> user_tags JSON
- -> );
- Query OK, 0 rows affected (0.41 sec)
复制代码 插入数据查看预览结果
- mysql> select * from user_tag order by rand() limit 10;
- +---------+--------------------------------------+
- | user_id | user_tags |
- +---------+--------------------------------------+
- | 36978 | [14, 19, 23, 25, 32, 42] |
- | 28515 | [14, 19, 23, 27, 36, 42] |
- | 28683 | [14, 19, 28] |
- | 39368 | [14, 19, 24, 26, 36, 43] |
- | 22269 | [14, 19, 28] |
- | 42160 | [14, 19, 23, 26, 32, 43] |
- | 22321 | [14, 17, 27, 33, 40, 43, 44, 47, 49] |
- | 20407 | [14, 19, 20, 25, 34, 38, 43] |
- | 26167 | [14, 17, 23, 27, 35, 38, 43] |
- | 38082 | [14, 19, 25, 32, 42] |
- +---------+--------------------------------------+
- 10 rows in set (0.00 sec)
复制代码 下一步对用户画像进行标签筛选指定的人群做营销,实际上转化为SQL就是多个条件的与或非组合,看几个案例
(1)取用户品级是白银卡(17)或者银卡(20),且用户状态是活泼的(28),逻辑是(17∪20)∩28,33万里面筛选出265个人
- mysql> SELECT * FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[18, 17, 20]') AND JSON_CONTAINS(user_tags, '[28]') limit 5;
- +---------+------------------+
- | user_id | user_tags |
- +---------+------------------+
- | 3201 | [14, 19, 20, 28] |
- | 4183 | [14, 19, 20, 28] |
- | 4554 | [14, 19, 20, 28] |
- | 4890 | [14, 19, 20, 28] |
- | 6334 | [14, 17, 28] |
- +---------+------------------+
- mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[18, 17, 20]') AND JSON_CONTAINS(user_tags, '[28]');
- +----------+
- | count(*) |
- +----------+
- | 265 |
- +----------+
- 1 row in set (0.66 sec)
复制代码 (2)筛选母婴客群(47)且是周末客(43),但是过滤掉流失会员(29),逻辑是47∩43∩(not 29),终极结果筛选出11678人
- mysql> SELECT * FROM user_tag WHERE JSON_CONTAINS(user_tags, '[43, 47]') AND NOT JSON_CONTAINS(user_tags, '[29]') limit 10;
- +---------+----------------------------------------------+
- | user_id | user_tags |
- +---------+----------------------------------------------+
- | 3036 | [14, 17, 24, 27, 33, 37, 40, 43, 44, 47, 48] |
- | 3049 | [14, 17, 27, 33, 38, 40, 43, 44, 47, 48] |
- | 3072 | [14, 19, 20, 27, 33, 37, 40, 43, 44, 47, 48] |
- | 3099 | [14, 19, 21, 27, 33, 37, 40, 43, 44, 47] |
- | 3110 | [14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] |
- | 3127 | [14, 19, 20, 27, 33, 37, 40, 43, 44, 46, 47] |
- | 3132 | [14, 19, 27, 33, 40, 41, 43, 44, 47] |
- | 3147 | [14, 17, 27, 34, 40, 43, 46, 47, 48] |
- | 3157 | [14, 19, 23, 26, 34, 43, 47, 48] |
- | 3160 | [14, 19, 23, 27, 34, 40, 43, 44, 47, 48] |
- +---------+----------------------------------------------+
- 10 rows in set (0.00 sec)
- mysql> SELECT count(*) FROM user_tag WHERE JSON_CONTAINS(user_tags, '[43, 47]') AND NOT JSON_CONTAINS(user_tags, '[29]');
- +----------+
- | count(*) |
- +----------+
- | 11678 |
- +----------+
- 1 row in set (0.55 sec)
复制代码 (3)筛选RFM为重要开头的高价值客户(34,35,36,37),且是微信会员(15),逻辑是(34∪35∪36∪37)∩15,终极筛选出148人
- mysql> SELECT * FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]') limit 10;
- +---------+--------------------------------------------------------+
- | user_id | user_tags |
- +---------+--------------------------------------------------------+
- | 3110 | [14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] |
- | 6990 | [14, 15, 23, 27, 34, 37, 40, 43, 44, 47, 48, 49] |
- | 11214 | [14, 15, 24, 27, 33, 37, 40, 43, 44, 47] |
- | 13447 | [0, 2, 14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] |
- | 18356 | [14, 15, 25, 34, 38, 43] |
- | 22016 | [1, 2, 14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48, 49] |
- | 22392 | [14, 15, 23, 27, 33, 37, 40, 41, 43, 44, 48, 49] |
- | 22721 | [1, 7, 14, 15, 27, 33, 37, 40, 42, 44, 48] |
- | 22800 | [14, 15, 23, 27, 33, 37, 40, 43, 44, 48] |
- | 25122 | [14, 15, 26, 34, 37, 40, 43, 47, 48, 49] |
- +---------+--------------------------------------------------------+
- 10 rows in set (0.06 sec)
- mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]');
- +----------+
- | count(*) |
- +----------+
- | 148 |
- +----------+
- 1 row in set (0.75 sec)
复制代码 给这张用户画像表增加一下多值索引
- ALTER TABLE user_tag ADD INDEX idx_info((cast(user_tags as unsigned array)));
复制代码 结果是SQL不能得到准确结果,之前能检索到人群现在检索结果为符合条件的人为0,可能是在多值索引的情况下,不能组合多个JSON函数的缘故起因
- mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]');
- +----------+
- | count(*) |
- +----------+
- | 0 |
- +----------+
- 1 row in set (0.00 sec)
复制代码 另外在多次测试下,就算倒霉用JSON函数组合,单个利用JSON_CONTAINS,JSON_OVERLAPS也可能会导致索引失效,具体缘故起因不明,如果利用NOT条件多值索引直接失效
末了编辑于:2025-02-05 20:55:48 © 著作权归作者全部,转载或内容合作请联系作者
喜欢的朋侪记得点赞、收藏、关注哦!!!
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。 |