Hive Partitions

飞不高 · 2024-8-12 22:11:20

1. Preparing the source table

create external table source_table (
    col1 type1,
    col2 type2,
    time date comment 'partition field'
)
row format delimited
fields terminated by ','
location 'path where the uploaded table files are stored';

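Loading the data:

In Linux:

hadoop fs -put 'local file path' 'HDFS destination path, usually the directory of the table to be mapped'

In DataGrip:

load data local inpath 'Linux file path' into table table_name;

-- Loading from HDFS moves the file (cut-and-paste effect)
load data inpath 'HDFS file path' into table table_name;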

2. Partitioned tables

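2.1 Creating a single-level partitioned table

When the table is queried, its columns are the declared columns plus the partition column.

create table table_name_partitioned (
    col1 type1,
    col2 type2
)
partitioned by (time date)
row format delimited
fields terminated by ',';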

Loading data into the partitioned table statically:

insert into table table_name_partitioned partition (time='2024-01-01') select col1, col2 from source_table where time='2024-01-01';
insert into table table_name_partitioned partition (time='2024-01-02') select col1, col2 from source_table where time='2024-01-02';
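
As a concrete sketch of the static load above: the tables orders_src and orders_part and their columns are hypothetical names chosen only for illustration, and the partition column is called dt rather than time. The point is that the partition value is fixed in the partition clause, so the partition column is filtered on but not selected.

```sql
-- Hypothetical tables, used only for illustration.
create table if not exists orders_src (order_id int, amount double, dt string)
row format delimited fields terminated by ',';

create table if not exists orders_part (order_id int, amount double)
partitioned by (dt string)
row format delimited fields terminated by ',';

-- Static partition: the value is fixed in the partition clause,
-- so dt appears in the where clause but not in the select list.
insert into table orders_part partition (dt='2024-01-01')
select order_id, amount from orders_src where dt = '2024-01-01';

-- List the partitions recorded in the metastore.
show partitions orders_part;

-- The partition column can be queried like an ordinary column;
-- filtering on it lets Hive read only the matching partition directory.
select order_id, amount, dt from orders_part where dt = '2024-01-01';
```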

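Loading data into the partitioned table dynamically:

Enable non-strict dynamic partitioning, which allows every partition column to be dynamic:

set hive.exec.dynamic.partition.mode=nonstrict;

Case 1: the partition value is taken directly from the time column:

insert into table table_name_partitioned partition (time) select col1, col2, time from source_table;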

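Case 2: the partition value is computed from the time column (this assumes a target table partitioned by year rather than by time):

insert into table table_name_partitioned partition (year) select col1, col2, year(time) as year from source_table;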
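
A minimal sketch of case 2 with concrete names may help. The tables orders_src and orders_by_year and the partition column yr are hypothetical. One detail worth noting is that Hive matches dynamic partition values by position: they are taken from the last column(s) of the select list, so the alias is for readability only.

```sql
-- Hypothetical tables, used only for illustration.
create table if not exists orders_src (order_id int, amount double, dt string)
row format delimited fields terminated by ',';

create table if not exists orders_by_year (order_id int, amount double)
partitioned by (yr string)
row format delimited fields terminated by ',';

-- Dynamic partitioning is usually on by default; it is set here for clarity.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The last select column supplies the value of yr for each row.
-- Hive creates one partition per distinct value it encounters.
insert into table orders_by_year partition (yr)
select order_id, amount, cast(year(dt) as string) as yr
from orders_src;

show partitions orders_by_year;
```

If a single statement creates a large number of partitions, the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may need to be raised.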

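2.2 Creating a multi-level partitioned table

The table's columns are the declared columns plus the partition columns.

create table table_name_partitioned (
    col1 type1,
    col2 type2
)
partitioned by (year string, month string, day string)
row format delimited
fields terminated by ',';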

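Loading data into the partitioned table dynamically:

Enable non-strict dynamic partitioning:

set hive.exec.dynamic.partition.mode=nonstrict;

insert into table table_name_partitioned partition (year, month, day)
select col1, col2,
       cast(year(time) as string) as year,
       cast(month(time) as string) as month,
       cast(day(time) as string) as day
from source_table;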
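
Static and dynamic partition columns can also be mixed in a multi-level table. The sketch below assumes hypothetical tables orders_src and orders_ymd; it fixes year statically and lets month and day be dynamic. Because at least one partition column is static, this form should also be accepted in strict mode. Static columns have to come before dynamic ones in the partition clause.

```sql
-- Hypothetical tables, used only for illustration.
create table if not exists orders_src (order_id int, amount double, dt string)
row format delimited fields terminated by ',';

create table if not exists orders_ymd (order_id int, amount double)
partitioned by (year string, month string, day string)
row format delimited fields terminated by ',';

-- year is static; month and day are dynamic and are read, in order,
-- from the last two columns of the select list.
insert into table orders_ymd partition (year='2024', month, day)
select order_id, amount,
       cast(month(dt) as string) as month,
       cast(day(dt) as string) as day
from orders_src
where year(dt) = 2024;

-- month() and day() return integers, so the directories are named
-- like year=2024/month=1/day=5 rather than month=01/day=05.
show partitions orders_ymd;
```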

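3. Modifying partitions

3.1 Adding partitions

Adding a partition only changes the metadata; it does not load any data. You must therefore make sure that the data files already exist under the partition's path, or load the data after the partition has been added.

Adding to a single-level partitioned table:

alter table table_name_partitioned add
partition (time='2024-01-01') location '/user/hive/warehouse/table_name_partitioned/time=2024-01-01'
partition (time='2024-01-02') location '/user/hive/warehouse/table_name_partitioned/time=2024-01-02';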

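Adding to a multi-level partitioned table (this example assumes a table partitioned by year and month; a value must be given for every partition column of the table):

alter table table_name_partitioned add
partition (year='2024', month='02') location '/user/hive/warehouse/table_name_partitioned/directory named like the other partitions'
partition (year='2024', month='03') location '/user/hive/warehouse/table_name_partitioned/directory named like the other partitions';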
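
To check that an added partition was registered, the metadata can be inspected afterwards. This is a sketch under assumptions: orders_part is a hypothetical table, and the locations assume the default warehouse directory /user/hive/warehouse.

```sql
-- Hypothetical table, used only for illustration.
create table if not exists orders_part (order_id int, amount double)
partitioned by (dt string)
row format delimited fields terminated by ',';

-- "if not exists" makes the statement safe to run more than once.
alter table orders_part add if not exists
  partition (dt='2024-01-01') location '/user/hive/warehouse/orders_part/dt=2024-01-01'
  partition (dt='2024-01-02') location '/user/hive/warehouse/orders_part/dt=2024-01-02';

-- Both partitions should now be listed, even if their directories are empty.
show partitions orders_part;

-- Shows the location and storage format recorded for one partition.
describe formatted orders_part partition (dt='2024-01-01');
```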

3.2 Renaming a partition

alter table table_name_partitioned partition old_partition_spec rename to partition new_partition_spec;

alter table table_name_partitioned partition (time='2024-01-01') rename to partition (time='2024-01-05');

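3.3 Dropping a partition

alter table table_name_partitioned drop if exists partition (time='2024-01-04') [purge];

With purge, the partition's data is deleted directly instead of being moved to the trash.

3.4 Altering a partition

alter table table_name_partitioned partition (time='2024-01-01') set fileformat file_format;  -- change the file storage format
alter table table_name_partitioned partition (time='2024-01-01') set location 'new partition location';  -- change the partition location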

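3.5 Repairing metadata with MSCK

The optional clause specifies which kind of partition change to repair; by default, added partitions are repaired.

msck repair table table_name [add/drop partitions];

When to use it:

After a partitioned table has been created and loaded, you may add or delete a partition directory directly from HDFS or from the Linux shell. Such a change is not recorded in the table's metadata, so data files placed in that partition directory afterwards are not visible to queries. MSCK is then needed to repair the table's metadata.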

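3.5.1 Repairing added partitions

# In Linux, create the partition directories with hdfs commands
hadoop fs -mkdir -p /user/hive/..../table_name_partitioned/time=2024-08-08
hadoop fs -mkdir -p /user/hive/..../table_name_partitioned/time=2024-08-09
# Upload the data files to the corresponding partition directories
hadoop fs -put filename0808.txt /user/hive/..../table_name_partitioned/time=2024-08-08
hadoop fs -put filename0809.txt /user/hive/..../table_name_partitioned/time=2024-08-09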

Querying the table shows no data for 0808 and 0809, so use MSCK to repair it:

msck repair table table_name_partitioned add partitions;
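
The effect of the repair can be observed by listing the partitions before and after. The sketch below assumes a hypothetical table orders_part partitioned by dt, and assumes that directories named dt=2024-08-08 and dt=2024-08-09 were created directly in HDFS under the table's location, as in the shell commands above.

```sql
-- Hypothetical table, used only for illustration.
create table if not exists orders_part (order_id int, amount double)
partitioned by (dt string)
row format delimited fields terminated by ',';

-- Before the repair: directories created directly in HDFS are not listed.
show partitions orders_part;

-- Scan the table location and register directories named dt=<value>
-- that the metastore does not know about yet (the default behaviour).
msck repair table orders_part;

-- After the repair: dt=2024-08-08 and dt=2024-08-09 should be listed.
show partitions orders_part;

-- Alternative when the partition is known in advance:
-- register just that one partition instead of scanning the whole table.
alter table orders_part add if not exists partition (dt='2024-08-08');
```

For a table with many partitions, adding the known partition explicitly is usually cheaper than a full MSCK scan.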

3.5.2 Repairing deleted partitions

# In Linux, delete the partition directories with hdfs commands
hadoop fs -rm -r /user/hive/..../table_name_partitioned/time=2024-08-08
hadoop fs -rm -r /user/hive/..../table_name_partitioned/time=2024-08-09

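Showing the table's metadata reveals that the metadata of these partitions has not been removed, so use MSCK to repair it:

msck repair table table_name_partitioned drop partitions;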

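4. Loading data into partitions

All of the statements below are run in DataGrip.

4.1 Loading data files into a partitioned table

load data [local] inpath 'HDFS [or Linux] file path' [overwrite] into table table_name partition (partition_col='partition value');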

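The file path can be relative, absolute, or a full URI:
Relative: relative to the current directory on Linux/HDFS, written as ./rest of the path
Absolute: /full path
Full URI: for HDFS, hdfs://namenode:port/file path; for Linux, file:///file path

overwrite: when the target is a specific partition of the table, only the data in that partition's directory is overwritten.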
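
A short sketch of both forms follows. The table orders_part, the file paths, and the namenode host and port 8020 are all placeholders. The data files are assumed to contain only the declared columns (for example a line such as 1,9.5), because the partition value comes from the partition clause, not from the file.

```sql
-- Hypothetical table, used only for illustration.
create table if not exists orders_part (order_id int, amount double)
partitioned by (dt string)
row format delimited fields terminated by ',';

-- Local file, absolute path. "overwrite" replaces only the contents
-- of partition dt=2024-01-01; other partitions are left untouched.
load data local inpath '/tmp/orders_0101.csv'
overwrite into table orders_part partition (dt='2024-01-01');

-- HDFS file, full URI. Without "overwrite" the file is appended to
-- the partition, and the source file is moved out of its original place.
load data inpath 'hdfs://namenode:8020/tmp/orders_0102.csv'
into table orders_part partition (dt='2024-01-02');
```

When the statement is sent through HiveServer2, as is typically the case from DataGrip, "local" refers to the file system of the server host rather than the machine running the client.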
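
4.2 Inserting source-table data into a partitioned table

-- Static partition
from source_table
insert overwrite table table_name partition (partition_col='value')
select columns ....   -- without the partition column, because its value is already specified
-- Dynamic partition
from source_table
insert overwrite table table_name partition (partition_col)
select columns, partition_col ....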

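For comparison, insert + select from a source table into an ordinary (non-partitioned) table:

-- Append
insert into table table_name select columns from source_table;
-- Overwrite
insert overwrite table table_name select columns from source_table;
-- One scan, multiple inserts
from source_table
insert into table table1 select columns
insert overwrite table table2 select columns;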
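
The one-scan, multiple-insert form can also target several partitions of the same partitioned table. The sketch below assumes the hypothetical tables orders_src and orders_part; each insert branch carries its own where clause, while the source table is read only once.

```sql
-- Hypothetical tables, used only for illustration.
create table if not exists orders_src (order_id int, amount double, dt string)
row format delimited fields terminated by ',';

create table if not exists orders_part (order_id int, amount double)
partitioned by (dt string)
row format delimited fields terminated by ',';

-- One scan of orders_src feeds two static partitions.
from orders_src
insert overwrite table orders_part partition (dt='2024-01-01')
  select order_id, amount where dt = '2024-01-01'
insert overwrite table orders_part partition (dt='2024-01-02')
  select order_id, amount where dt = '2024-01-02';
```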

5. Exporting data

-- Export to HDFS
insert overwrite directory 'HDFS directory for the exported files' row format delimited fields terminated by ',' select columns from table_name;
-- Export to Linux
insert overwrite local directory 'Linux directory for the exported files' row format delimited fields terminated by ',' select columns from table_name;
-- Export to several directories in one scan of the table
from source_table
insert overwrite [local] directory 'path1' select columns ....
insert overwrite [local] directory 'path2' select columns ....
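
Since this post is about partitions, here is a sketch of exporting a single partition. The table orders_part and the target directory are hypothetical. Note that the target directory is overwritten, so any files already in it are removed; a dedicated, empty directory is the safer choice.

```sql
-- Hypothetical table, used only for illustration.
create table if not exists orders_part (order_id int, amount double)
partitioned by (dt string)
row format delimited fields terminated by ',';

-- Export only partition dt=2024-01-01 to an HDFS directory as
-- comma-separated text. Existing contents of the directory are replaced.
insert overwrite directory '/tmp/export/orders_2024_01_01'
row format delimited fields terminated by ','
select order_id, amount
from orders_part
where dt = '2024-01-01';
```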
