Oracle 系列数据库利用 listagg去重，删除重复数据的几种方法 ...

忿忿的泥巴坨 · 2024-8-2 12:10:34

listagg聚合之后很多重复数据，下面是解决重复数据问题

案例表

create table "dept_tag_info"
(
"tag_id" bigint not null,
"tag_code" varchar(200),
"tag_name" varchar(500),
"tag_level" varchar(200),
"parent_id" bigint,
"gmt_create" datetime(6),
"create_by" varchar(50),
"update_by" varchar(50),
"gmt_modified" datetime(6),
"del_flag" char(1),
"sort" integer,
"multiple_choice" char(1),
"relation_dept_category" varchar(500),
"tips_text" varchar(2000),
"remarks" varchar(500),
"enabled" char(1),
constraint "dept_tag_info_new_pk" not cluster primary key("tag_id")) storage(on "ctbiyi_data_v3", clusterbtr) ;
comment on table "dept_tag_info" is '企业标签基础信息表';
comment on column "dept_tag_info"."tag_id" is '主键';
comment on column "dept_tag_info"."tag_code" is '标签编码';
comment on column "dept_tag_info"."tag_name" is '标签名称';
comment on column "dept_tag_info"."tag_level" is '标签层级';
comment on column "dept_tag_info"."parent_id" is '父节点编码id';
comment on column "dept_tag_info"."gmt_create" is '创建时间';
comment on column "dept_tag_info"."create_by" is '创建人';
comment on column "dept_tag_info"."update_by" is '修改人';
comment on column "dept_tag_info"."gmt_modified" is '修改时间';
comment on column "dept_tag_info"."del_flag" is '删除标记 0-未删除 1-已删除';
comment on column "dept_tag_info"."sort" is '排序';
comment on column "dept_tag_info"."multiple_choice" is '多选（1是 0否）';
comment on column "dept_tag_info"."relation_dept_category" is '关联主体';

复制代码

为了方便大家看所以全部小写

select
t.tag_code,
t.tag_name,
listagg(t.tag_level, ',') within group(order by t.tag_level) as tag_levels
from
dept_tag_info t
group by
t.tag_code,
t.tag_name;

复制代码

第一种：利用wm_concat() + distinct去重聚合

select
t.tag_code,
t.tag_name,
wm_concat(distinct t.tag_level) as tag_levels
from
dept_tag_info t
group by
t.tag_code,
t.tag_name;

复制代码

第二种：利用listagg，先去重，再聚合

select
t.tag_code,
t.tag_name,
listagg(t.tag_level, ',') within group(order by t.tag_level) as tag_levels
from
(select distinct s.tag_code, s.tag_name, s.tag_level
from dept_tag_info s) t
group by
t.tag_code,
t.tag_name;

复制代码

第三种：xmlagg(xmlparse(content t.tag_level || ‘,’ wellformed) order by t.tag_level)：利用 xmlagg 和 xmlparse 函数将 tag_level 字段聚合为一个用逗号分隔的字符串，并按 tag_level 排序。
getclobval()：将 xml 类型的结果转换为 clob（character large object）。
rtrim(…, ‘,’)：去掉聚合结果末尾的逗号。
内部子查询 select distinct s.tag_code, s.tag_name, s.tag_level from dynamic_ctbiyi_v3.dept_tag_info s：
选择唯一的 tag_code、tag_name 和 tag_level

select
t.tag_code,
t.tag_name,
rtrim(
xmlagg(
xmlparse(content t.tag_level || ',' wellformed)
order by t.tag_level
).getclobval(),
','
) as tag_levels
from
(select distinct s.tag_code, s.tag_name, s.tag_level
from dept_tag_info s) t
group by
t.tag_code,
t.tag_name;

复制代码

listagg 的优缺点
优点：
简便和易用：LISTAGG 语法简单，易于明白和利用。
性能较好：在很多情况下，LISTAGG 的实行速度会快于 XMLAGG，尤其是在处理较少数据量时。
排序：支持在聚合过程中对字符串举行排序，利用 WITHIN GROUP 子句。
缺点：
字符串长度限定：LISTAGG 天生的字符串长度不能凌驾 4000 字符，如果凌驾这个限定，会抛出错误。
无格式化功能：LISTAGG 仅限于字符串毗连，不支持更复杂的格式化。
xmlagg 的优缺点
优点：
字符串长度更大：XMLAGG 可以处理比 LISTAGG 更大的字符串，因为天生的结果是 CLOB 类型，不受 4000 字符的限定。
灵活性：支持更复杂的 XML 处理和格式化功能，适合必要复杂字符串操作的场景。
缺点：
性能问题：在处理大量数据时，XMLAGG 大概比 LISTAGG 慢，因为涉及到 XML 解析和处理。
复杂性：语法相对复杂，利用起来不如 LISTAGG 简单。
利用 LISTAGG：当聚合后的字符串长度不凌驾 4000 字符时，并且只必要简单的字符串毗连和排序。
利用 XMLAGG：当聚合后的字符串长度大概凌驾 4000 字符，大概必要更复杂的格式化和处理时。
根据具体需求选择合适的函数可以在保证代码简便性和实行服从的同时，满足业务需求。
手动处理重复数据的一种快捷安全的方式

-- 查找重复记录
select "tag_id", count(*) as cnt
from dept_tag_info
group by "tag_id"
having count(*) > 1
order by cnt desc;

复制代码

主删除语句：

delete from dept_tag_info t
where t.rowid in (
select rid
from (
select t1.rowid as rid, row_number() over (partition by t1.tag_code, t1.tag_name order by 1) as rn
from dept_tag_info t1
) t2
where t2.rn > 1
);

复制代码

免责声明：如果侵犯了您的权益，请联系站长，我们会及时删除侵权内容，谢谢合作！更多信息从访问主页：qidao123.com:ToB企服之家，中国第一个企服评测及商务社交产业平台。

		自动登录	找回密码
密码			立即注册

Oracle 系列数据库利用 listagg去重，删除重复数据的几种方法 ...

本帖子中包含更多资源

0 个回复

快速回复

楼主热帖

标签云

浏览过的版块