概述
本日做一个数据分析,其中有一列数据有些有数据有些没数据,因此我们需要把每数据的举行增补进来因此我们需要利用last_value 函数和over 结合利用,但是遇到一个比力奇葩的标题不能按照预期举行处理。
新说原因: 由于我们要处理的列,不是 null 而是空字符导致函数失效
遇到这个标题我思考的几个思绪:
- last_value 函数第一次用不熟悉,猜疑自己利用方法不对,因此官方文档, 百度各种找资料,最后确认自己明白的没标题
- 由于我们是用公司内部平台猜疑平台有标题,最后确认没标题
- 猜疑低版本的标题,咨询平台是hive 2.0,发现这个函数是0.1.1开始支持
- 自己造数据举行测试,发现没标题,自己造的数据用null,忽然想起来是null和空字符串的标题导致的,经过验证确实是
函数说明:
first_value:函数用于返回当前第一个值。可开启true下令,跳过null值
last_value:函数用于返回当前最后个值。可开启true下令,跳过null值
LAST_VALUE 是 SQL 中的一个窗口函数,用于获取窗口内最后一个值。窗口函数在数据库查询中用于在某个特定的窗口范围内盘算聚合、分析等操作。
- LAST_VALUE(column_name) OVER (PARTITION BY partition_expression ORDER BY sort_expression [ASC | DESC] ROWS BETWEEN start AND end)
复制代码 其中:
- column_name:要获取最后值的列名。
- PARTITION BY partition_expression:可选,指定分区表达式,用于将结果分成多个分区,类似于 GROUP BY。
- ORDER BY sort_expression:指定用于排序的列或表达式。
- ASC | DESC:可选,用于指定排序次序,默以为升序。
- ROWS BETWEEN start AND end:可选,用于指定窗口的范围。
last_value
数据:
- 1 a a null 202301 202301
- 1 b b null null 202302
- 1 null c null null 202303
- 1 d null null null 202304
- 2 a a null 202301 202301
- -- 预期实现
- 1 d c null 202301 202304
- 2 a a null 202301 202301
复制代码- select last_value(age) over(partition by a order by b,c desc)
复制代码- SELECT *
- FROM
- (SELECT id
- ,last_value(name,TRUE) OVER (PARTITION BY id ORDER BY up_time) name
- ,last_value(age,TRUE) OVER (PARTITION BY id ORDER BY up_time) age
- ,last_value(address,TRUE) OVER (PARTITION BY id ORDER BY up_time) address
- ,last_value(ct_time,TRUE) OVER (PARTITION BY id ORDER BY up_time) ct_time
- ,up_time
- ,row_number() over (partition by id order by up_time desc ) as rank
- FROM
- (select *
- from
- (select 1 as id,'a' as name ,'a' as age,null as address,202301 as ct_time,202301 as up_time
- union all
- select 1 as id,'b' as name ,'b' as age,null as address,null as ct_time, 202302 as up_time
- union all
- select 1 as id,null as name,'c' as age,null as address,null as ct_time, 202303 as up_time
- union all
- select 1 as id,'d' as name ,null as age,null as address,null as ct_time, 202304 as up_time
- union all
- select 2 as id,'a' as name ,'a' as age,null as address,202301 as ct_time, 202301 as up_time
- ) t
- )
- )
- WHERE rank=1
- ;
- SELECT *
- FROM
- (SELECT id
- ,last_value(name,TRUE) OVER (PARTITION BY id ORDER BY up_time ROWS BETWEEN unbounded preceding and unbounded following) name
- ,last_value(age,TRUE) OVER (PARTITION BY id ORDER BY up_time ROWS BETWEEN unbounded preceding and unbounded following) age
- ,last_value(address,TRUE) OVER (PARTITION BY id ORDER BY up_time ROWS BETWEEN unbounded preceding and unbounded following) address
- ,last_value(ct_time,TRUE) OVER (PARTITION BY id ORDER BY up_time ROWS BETWEEN unbounded preceding and unbounded following) ct_time
- ,up_time
- ,row_number() over (partition by id order by up_time desc ) as rank
- FROM
- (select *
- from
- (select 1 as id,'a' as name ,'a' as age,null as address,202301 as ct_time,202301 as up_time
- union all
- select 1 as id,'b' as name ,'b' as age,null as address,null as ct_time, 202302 as up_time
- union all
- select 1 as id,null as name,'c' as age,null as address,null as ct_time, 202303 as up_time
- union all
- select 1 as id,'d' as name ,null as age,null as address,null as ct_time, 202304 as up_time
- union all
- select 2 as id,'a' as name ,'a' as age,null as address,202301 as ct_time, 202301 as up_time
- ) t
- )
- )
- WHERE rank=1
- ;
- 在上述sql中,使用last_value函数对每一个列按照主键id分组,取一个最新值,如果遇见null值,使用参数true进行忽略,最后再使用窗口函数row_number进行分组排序取最大一条数据即可实现数据合并。
复制代码
first_value
- CREATE TABLE data (id INT, value INT);
-
- INSERT INTO data (id, value) VALUES
- (1, NULL),
- (2, NULL),
- (3, NULL),
- (4, NULL),
- (5, 99),
- (6, NULL),
- (7, NULL),
- (8, 101),
- (9, NULL),
- (10, NULL);
复制代码- select
- id,
- value,
- --1、取第一个值。null也正常取值
- first_value(value) over(order by id),
- --2、第一行到当前行,取第一个非null的值
- first_value(value,true) over(order by id ),
- --3、取当前行,到最后行,第一个非null值。
- --业务需求:取当前行之后的,最近的非null值
- first_value(value,true)
- over(order by id rows between current row and unbounded following)
- from data;
复制代码
- SELECT
- role_gid,
- nd, --值为1、2、3、、、、28
- level_raw,
- battle_level_raw,
- coalesce(
- battle_level_raw,
- --取后面行第一个值(最接近)
- --函数含义:分组排序取组内第一个值.
- 需求是:当前行的后面行的第一个值.true是忽略null值 否则跟原始值一样
- first_value(battle_level_raw,true )
- OVER(PARTITION BY role_gid ORDER BY nd
- ROWS BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING) ,
- 取前面行最后值(最接近)
- --函数含义:分组排序取组内最后一个值.
- 需求是:当前行的前面行的最后值.true是忽略null值 否则跟原始值一样
- last_value(battle_level_raw ,true )
- OVER(PARTITION BY role_gid ORDER BY nd
- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
- ) battle_level_notnull
- from nd_game_sd_sjmy.tmp_dws_user_adventure_role_bat_detail_di ;
复制代码 免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。 |