Hive Study Notes: Linux Commands

熊熊出没 · 2024-12-19 21:55:33

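【Hive Walkthrough】

Run jps; a RunJar process in the output confirms that the metastore is running
Create a table
create table test (id int,name string);
A database named default exists out of the box, and the table is created there
Insert data
insert into test values(1,'tom'),(2,'jerry');
Query data
select name,count(*) as cnt from test group by name;
Under the hood Hive still operates on files in HDFS, by default under HIVE/warehouse; MySQL stores only the metadata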
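Start the metastore service
bin/hive --service metastore
Start the hiveserver2 service
bin/hive --service hiveserver2
Start it in the background
nohup bin/hive --service hiveserver2 >> logs.log 2>&1 &
Connect to Hive with the beeline client; third-party clients such as DataGrip and DBeaver also work
!connect jdbc:hive2://node1:10000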
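【Hive Basic Syntax】

Show database information
desc database myhive;
Specify the storage location when creating a database
create database myhive2 location '/myhive2';
Drop a database; append cascade (drop database myhive cascade;) to force-drop a database that still contains tables
drop database myhive;
Table operations
create table test (id int,name string,gender string);
Specify the field delimiter when creating a table
create table test (id int,name string) row format delimited fields terminated by '\t';
Create an external table; location points to a directory, not to a single file
create external table test (id int,name string) row format delimited fields terminated by '\t' location '/tmp/test';
Show the table type
desc formatted table_name;
Convert a managed (internal) table into an external table
alter table table_name set tblproperties('EXTERNAL'='TRUE');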
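【Hive Data Import and Export】

Load from a file: fast, because it is essentially a file move. Without local, the source file in HDFS is moved into the table directory; with local, the file is copied from the local file system
load data [local] inpath 'filepath' [overwrite] into table database.table_name;
Load data with insert ... select; this form launches a MapReduce job
insert [overwrite|into] table tablename1 [partition (partcol=val) [if not exists]] select name from name_table;
Export data with insert overwrite
insert overwrite [local] directory 'path' select * from table_name;
Specify the field delimiter of the exported files
insert overwrite [local] directory 'path' row format delimited fields terminated by '\t' select * from table_name;
Export a Hive table from the shell
bin/hive -e 'select * from table_name' > test.txt
bin/hive -f export.sql > export.txt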
【Partitioned Tables】

Create a single-level partitioned table, partitioned by month
create table test (id int,name string) partitioned by (month string) row format delimited fields terminated by '\t';
Load data into a partition
load data [local] inpath 'filepath' [overwrite] into table database.table_name partition(month='202005');
Create a multi-level partitioned table, partitioned by year, month and day
create table test (id int,name string) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';
load data [local] inpath 'filepath' [overwrite] into table database.table_name partition(year='2022',month='05',day='10');
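Each partition is stored as its own subdirectory named key=value under the table directory. The Python sketch below builds the directory that a partition spec would map to. It is an illustration only: the warehouse root /user/hive/warehouse is Hive's usual default and is assumed here, and partition_path is a hypothetical helper, not a Hive API.

```python
def partition_path(table_dir, spec):
    """Build the directory for a partition spec given as (key, value) pairs."""
    # Each partition column becomes one nested "key=value" directory level.
    parts = [f"{key}={value}" for key, value in spec]
    return "/".join([table_dir.rstrip("/")] + parts)


# Assumed table location; the real one depends on the warehouse configuration.
table_dir = "/user/hive/warehouse/test"

single = partition_path(table_dir, [("month", "202005")])
multi = partition_path(table_dir, [("year", "2022"), ("month", "05"), ("day", "10")])

print(single)
print(multi)
```

A query that filters on the partition columns only has to read the matching directories, which is why partition columns are usually chosen from the fields that appear most often in where clauses.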
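【Bucketed Tables】

Partitioning stores a table's data in separate directories; bucketing splits the table into a fixed number of files
Enable automatic bucketing enforcement
set hive.enforce.bucketing=true;
Create a bucketed table
create table test (id string,name string) clustered by (id) into 3 buckets row format delimited fields terminated by '\t';
Data cannot be loaded into a bucketed table with load data, only with insert ... select, so first create a staging table,
load the data into the staging table, then insert from it into the bucketed table
create table test_temp (id string,name string);
load data local inpath '/test' into table test_temp;
insert overwrite table test select * from test_temp cluster by (id);
Rows are assigned to buckets by hashing the bucketing column and taking the result modulo the number of buckets. load does not trigger a MapReduce job, so it cannot run the hash; it only moves files, which is why it cannot be used to populate a bucketed table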
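The hash-and-modulo rule can be sketched in Python. This assumes an integer bucketing column whose hash is the value itself, as in Hive's original bucketing scheme; other column types and newer Hive versions may hash differently, so the concrete bucket numbers are illustrative only. bucket_for is a hypothetical helper and the sample rows are made up.

```python
NUM_BUCKETS = 3  # matches "into 3 buckets" in the table definition


def bucket_for(value, num_buckets=NUM_BUCKETS):
    """Return the bucket index for an integer key (assumed hash = the value)."""
    # Mask to a non-negative 31-bit number, then take the remainder.
    return (value & 0x7FFFFFFF) % num_buckets


# Made-up rows: (id, name)
rows = [(1, "tom"), (2, "jerry"), (3, "spike"), (4, "tyke")]

buckets = {b: [] for b in range(NUM_BUCKETS)}
for row_id, name in rows:
    buckets[bucket_for(row_id)].append((row_id, name))

for b, members in sorted(buckets.items()):
    print(b, members)
```

Every row has to pass through this computation to land in the right file, which is the work that insert ... select performs and a plain file move cannot.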
【Table Operations】

Rename a table
alter table old_name rename to new_name;
Change a table property
alter table table_name set tblproperties('EXTERNAL'='FALSE');
Add a partition
alter table tablename add partition(month='201101');
Change a partition value
alter table tablename partition(month='200205') rename to partition(month='200305');
Drop a partition
alter table tablename drop partition(month='201105');
Add a column
alter table table_name add columns(v1 int);
Rename a column; the column type must be restated and is kept unchanged here
alter table test_change change v1 v1new int;
Remove all rows from a table
truncate table test;
truncate does not work on external tables, because Hive does not manage their data
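【Complex Data Types: Array】

For an array column, collection items terminated by ',' sets the delimiter between array elements
create table test(name array<string>)
row format delimited fields terminated by '\t'
collection items terminated by ',';
Fetch an element by index
select name[0] from test;
Filter rows by whether the array contains a value
select name from test where array_contains(name,'tom');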
【Complex Data Types: Map】

Key-value data
create table test(id int,name map<string,string>)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':';
id name
1 father:Tom,mother:Lucy,brother:Jim
Show each person's father's name
select id,name['father'] from test;
Get all keys; the return type is an array
select map_keys(name) from test;
Get the size of each map
select size(name) from test;
Check whether a given key or value is present
select * from test where array_contains(map_keys(name),'father');
select * from test where array_contains(map_values(name),'Tom');
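To make the three delimiters concrete, the Python sketch below splits a line in that layout: a tab between fields, a comma between map entries, and a colon between key and value. It only imitates the text layout and is not how Hive itself reads the file; parse_map_row is a hypothetical helper.

```python
def parse_map_row(line):
    """Split one text row into (id, dict) using tab, comma and colon."""
    id_field, map_field = line.split("\t")          # fields terminated by tab
    entries = (item.split(":", 1) for item in map_field.split(","))
    return int(id_field), {key: value for key, value in entries}


row_id, name = parse_map_row("1\tfather:Tom,mother:Lucy,brother:Jim")

print(name["father"])          # counterpart of name['father']
print(list(name.keys()))       # counterpart of map_keys(name)
print(len(name))               # counterpart of size(name)
print("Tom" in name.values())  # counterpart of array_contains(map_values(name),'Tom')
```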
【Complex Data Types: Struct】

create table test(id int,info struct<name:string,age:int>)
row format delimited fields terminated by '\t'
collection items terminated by '#';
id info
1 Tom#12
2 Jerry#13
Query a struct field
select id,info.name from test;
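A struct differs from a map in that its field names and types are fixed by the table definition, so the data file holds only the values, in order. A minimal Python sketch of that idea, assuming the tab and '#' delimiters above; Info and parse_struct_row are hypothetical names:

```python
from typing import NamedTuple


class Info(NamedTuple):
    """Mirrors struct<name:string,age:int>: names come from the schema."""
    name: str
    age: int


def parse_struct_row(line):
    """Split one text row into (id, Info) using tab and '#'."""
    id_field, struct_field = line.split("\t")
    name, age = struct_field.split("#")   # values appear in schema order
    return int(id_field), Info(name, int(age))


for line in ["1\tTom#12", "2\tJerry#13"]:
    row_id, info = parse_struct_row(line)
    print(row_id, info.name)   # counterpart of select id,info.name
```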
【Hive SQL Basic Queries】

Hive-specific keywords: cluster by, distribute by, sort by
Filter orders from Guangdong province
select * from test where useraddress like '%广东%';
Find the single largest order in Guangdong province
select * from test where useraddress like '%广东%' order by totalmoney desc limit 1;
Count the unpaid and paid orders
select ispay,count(*) as cnt from test group by ispay;
Among paid orders, find each user's largest purchase
select userid,max(totalmoney) from test where ispay=1 group by userid;
Average spend per user
select userid,avg(totalmoney) from test group by userid;
Keep only users whose average is above 10000
select userid,avg(totalmoney) as avg_money from test group by userid having avg(totalmoney)>10000;
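The last three queries share one evaluation order: where filters rows, group by collects them per user, the aggregate is computed, and having filters on the aggregate. A small Python sketch of that order follows; the order rows are made up for illustration and the helper names are hypothetical.

```python
from collections import defaultdict

# Made-up rows: (userid, totalmoney, ispay)
orders = [
    ("u1", 12000.0, 1),
    ("u1", 9000.0, 1),
    ("u2", 3000.0, 0),
    ("u2", 500.0, 1),
    ("u3", 15000.0, 1),
]


def group_amounts(rows):
    """group by userid: collect each user's order amounts."""
    groups = defaultdict(list)
    for userid, totalmoney, _ in rows:
        groups[userid].append(totalmoney)
    return groups


# where ispay=1 runs before grouping, then max() per group.
paid = [row for row in orders if row[2] == 1]
max_paid = {user: max(vals) for user, vals in group_amounts(paid).items()}

# avg() per group, then having keeps groups whose average exceeds 10000.
averages = {user: sum(vals) / len(vals) for user, vals in group_amounts(orders).items()}
big_spenders = {user: avg for user, avg in averages.items() if avg > 10000}

print(max_paid)
print(big_spenders)
```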
【Hive SQL RLIKE Regex Matching】

Hive regular expression reference table. Source: 【Bilibili: 黑马程序员】

Next post: Hive, a simple hands-on case study. Coming soon.
