Environment Setup


Chapter 4: Hadoop Configuration Files and Parameters

Lab 1: Hadoop Fully Distributed Configuration

1.1 Objectives

After completing this lab, you should be able to:

  • Master the fully distributed configuration of Hadoop
  • Master the fully distributed installation of Hadoop
  • Understand the meaning of the parameters in the Hadoop configuration files
1.2 Requirements


  • Be familiar with the fully distributed installation of Hadoop
  • Understand the purpose of the Hadoop configuration files
1.3 Procedure

1.3.1 Task 1: Install Hadoop on the Master Node

1.3.1.1 Step 1: Extract the hadoop-2.7.1.tar.gz package into the /usr/local/src directory
  1. [root@master ~]# tar zvxf jdk-8u152-linux-x64.tar.gz -C /usr/local/src/
  2. [root@master ~]# tar zvxf hadoop-2.7.1.tar.gz -C /usr/local/src/
1.3.1.2 Step 2: Rename the hadoop-2.7.1 folder to hadoop
  1. [root@master ~]# cd /usr/local/src/
  2. [root@master src]# ls
  3. hadoop-2.7.1  jdk1.8.0_152
  4. [root@master src]# mv hadoop-2.7.1/ hadoop
  5. [root@master src]# mv jdk1.8.0_152/ jdk
  6. [root@master src]# ls
  7. hadoop  jdk
1.3.1.3 Step 3: Configure the Hadoop environment variables

​ [root@master ~]# vi /etc/profile.d/hadoop.sh
Note: environment variables were already configured in Chapter 2 for the single-node Hadoop system; remove that earlier configuration first, then add the lines below. A quick way to locate leftovers is sketched after this note.
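A minimal check sketch, assuming the Chapter 2 variables were added to /etc/profile or another profile.d script (adjust the paths if you put them elsewhere):
  # list any previously configured JAVA_HOME/HADOOP_HOME lines so they can be removed
  grep -n -E 'JAVA_HOME|HADOOP_HOME' /etc/profile /etc/profile.d/*.sh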
  1. #Add the following lines
  2. export JAVA_HOME=/usr/local/src/jdk
  3. export HADOOP_HOME=/usr/local/src/hadoop
  4. export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
1.3.1.4 Step 4: Make the configured Hadoop environment variables take effect
  1. [root@master ~]# source /etc/profile.d/hadoop.sh
  2. [root@master ~]# echo $PATH
  3. /usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
1.3.1.5 Step 5: Run the following command to edit the hadoop-env.sh configuration file
  1. [root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh
  2. #Add the following line
  3. export JAVA_HOME=/usr/local/src/jdk
1.3.2 Task 2: Configure the hdfs-site.xml parameters

Run the following command to edit the hdfs-site.xml configuration file.
  1. [root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml
  2. #Add the following properties between the <configuration> and </configuration> tags
  3. <configuration>
  4.                 <property>
  5.                                 <name>dfs.namenode.name.dir</name>
  6.                                 <value>file:/usr/local/src/hadoop/dfs/name</value>
  7.                 </property>
  8.                 <property>
  9.                                 <name>dfs.datanode.data.dir</name>
  10.                                 <value>file:/usr/local/src/hadoop/dfs/data</value>
  11.                 </property>
  12.                 <property>
  13.                                 <name>dfs.replication</name>
  14.                                 <value>2</value>
  15.                 </property>
  16. </configuration>
  17. Create the directories
  18. [root@master ~]# mkdir -p /usr/local/src/hadoop/dfs/{name,data}
HDFS, Hadoop's distributed file system, normally stores data redundantly; the replication factor is usually 3, meaning each block is kept as three copies. Here the dfs.replication setting is changed so that HDFS keeps 2 replicas of each file, which can be confirmed with the quick check below.
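A quick verification sketch; hdfs getconf only reads the configuration files, so the cluster does not need to be running:
  # should print the value configured above (2)
  hdfs getconf -confKey dfs.replication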
1.3.3 Task 3: Configure the core-site.xml parameters
  1. [root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/core-site.xml
  2. #Add the following properties between the <configuration> and </configuration> tags
  3. <configuration>
  4.                 <property>
  5.                                 <name>fs.defaultFS</name>
  6.                                 <value>hdfs://master:9000</value>
  7.                 </property>
  8.                 <property>
  9.                                 <name>io.file.buffer.size</name>
  10.                 <value>131072</value>
  11.                 </property>
  12.                 <property>
  13.                                 <name>hadoop.tmp.dir</name>
  14.                                 <value>file:/usr/local/src/hadoop/tmp</value>
  15.                 </property>
  16. </configuration>
  17. #After saving the configuration above, create the directory
  18. [root@master ~]# mkdir -p /usr/local/src/hadoop/tmp
If hadoop.tmp.dir is not configured, the system falls back to the default temporary directory /tmp/hadoop-hadoop. That directory is removed when the Linux system reboots, so the Hadoop file system would have to be re-formatted after every restart or Hadoop would fail to run. The effective value can be checked as sketched below.
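A quick check sketch for the effective temporary directory:
  # prints the configured hadoop.tmp.dir, or /tmp/hadoop-${user.name} if the property were missing
  hdfs getconf -confKey hadoop.tmp.dir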
1.3.4 Task 4: Configure mapred-site.xml
  1. [root@master ~]# cd /usr/local/src/hadoop/etc/hadoop/
  2. [root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
  3. #Add the following properties between the <configuration> and </configuration> tags
  4. <configuration>
  5.                 <property>
  6.                                 <name>mapreduce.framework.name</name>
  7.                 <value>yarn</value>
  8.                 </property>
  9.                 <property>
  10.                                 <name>mapreduce.jobhistory.address</name>
  11.                                 <value>master:10020</value>
  12.                 </property>
  13.                 <property>
  14.                                 <name>mapreduce.jobhistory.webapp.address</name>
  15.                                 <value>master:19888</value>
  16.                 </property>
  17. </configuration>
1.3.5 Task 5: Configure yarn-site.xml
  1. [root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/yarn-site.xml
  2. #Add the following properties between the <configuration> and </configuration> tags
  3. <configuration>
  4.                 <property>
  5.                                 <name>yarn.resourcemanager.address</name>
  6.                                 <value>master:8032</value>
  7.                 </property>
  8.                 <property>
  9.                                 <name>yarn.resourcemanager.scheduler.address</name>
  10.                                 <value>master:8030</value>
  11.                 </property>
  12.                 <property>
  13.                                 <name>yarn.resourcemanager.webapp.address</name>
  14.                                 <value>master:8088</value>
  15.                 </property>
  16.                 <property>
  17.                                 <name>yarn.resourcemanager.resource-tracker.address</name>
  18.                                 <value>master:8031</value>
  19.                 </property>
  20.                 <property>
  21.                                 <name>yarn.resourcemanager.admin.address</name>
  22.                                 <value>master:8033</value>
  23.                 </property>
  24.                 <property>
  25.                                 <name>yarn.nodemanager.aux-services</name>
  26.                                 <value>mapreduce_shuffle</value>
  27.                 </property>
  28.                 <property>
  29.                           <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  30.                           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  31.                 </property>
  32. </configuration>
1.3.6 Task 6: Other Hadoop-related Configuration

1.3.6.1 Step 1: Configure the masters file
  1. #Edit the masters configuration file
  2. [root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/masters
  3. #Add the following line
  4. 10.10.10.128
1.3.6.2 Step 2: Configure the slaves file
  1. #Edit the slaves configuration file
  2. [root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/slaves
  3. #Remove localhost and add the following lines
  4. 10.10.10.129
  5. 10.10.10.130
1.3.6.3 Step 3: Create a user and change directory ownership
  1. #Create the user
  2. [root@master ~]# useradd hadoop
  3. [root@master ~]# echo 'hadoop' | passwd --stdin hadoop
  4. Changing password for user hadoop.
  5. passwd: all authentication tokens updated successfully.
  6. #Change the directory ownership
  7. [root@master ~]# chown -R hadoop.hadoop /usr/local/src/
  8. [root@master ~]# cd /usr/local/src/
  9. [root@master src]# ll
  10. total 0
  11. drwxr-xr-x 11 hadoop hadoop 171 Mar 27 01:51 hadoop
  12. drwxr-xr-x  8 hadoop hadoop 255 Sep 14  2017 jdk
1.3.6.4 Step 4: Configure passwordless SSH from master to all slave nodes
  1. [root@master ~]# ssh-keygen -t rsa
  2. Generating public/private rsa key pair.
  3. Enter file in which to save the key (/root/.ssh/id_rsa):
  4. Created directory '/root/.ssh'.
  5. Enter passphrase (empty for no passphrase):
  6. Enter same passphrase again:
  7. Your identification has been saved in /root/.ssh/id_rsa.
  8. Your public key has been saved in /root/.ssh/id_rsa.pub.
  9. The key fingerprint is:
  10. SHA256:Ibeslip4Bo9erREJP37u7qhlwaEeMOCg8DlJGSComhk root@master
  11. The key's randomart image is:
  12. +---[RSA 2048]----+
  13. |B.oo |
  14. |Oo.o |
  15. |=o=.  . o|
  16. |E.=.o  + o   |
  17. |.* BS|
  18. |* o =  o |
  19. | * * o+  |
  20. |o O *o   |
  21. |.=.+==   |
  22. +----[SHA256]-----+
  23. [root@master ~]# ssh-copy-id root@slave1
  24. /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
  25. The authenticity of host 'slave1 (10.10.10.129)' can't be established.
  26. ECDSA key fingerprint is SHA256:Z643OMlGh0yMEc5i85oZ7c21NHdkzSZD9hY9K39xzP4.
  27. ECDSA key fingerprint is MD5:e0:ef:47:5f:ad:75:9a:44:08:bc:f2:10:8e:d6:53:4a.
  28. Are you sure you want to continue connecting (yes/no)? yes
  29. /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  30. /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  31. root@slave1's password:
  32. Number of key(s) added: 1
  33. Now try logging into the machine, with:   "ssh 'root@slave1'"
  34. and check to make sure that only the key(s) you wanted were added.
  35. [root@master ~]# ssh-copy-id root@slave2
  36. /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
  37. The authenticity of host 'slave2 (10.10.10.130)' can't be established.
  38. ECDSA key fingerprint is SHA256:Z643OMlGh0yMEc5i85oZ7c21NHdkzSZD9hY9K39xzP4.
  39. ECDSA key fingerprint is MD5:e0:ef:47:5f:ad:75:9a:44:08:bc:f2:10:8e:d6:53:4a.
  40. Are you sure you want to continue connecting (yes/no)? yes
  41. /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  42. /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  43. root@slave2's password:
  44. Number of key(s) added: 1  
  45. Now try logging into the machine, with:   "ssh 'root@slave2'"
  46. and check to make sure that only the key(s) you wanted were added.
  47.    
  48. [root@master ~]# ssh slave1
  49. Last login: Sun Mar 27 02:58:38 2022 from master
  50. [root@slave1 ~]# exit
  51. logout
  52. Connection to slave1 closed.
  53. [root@master ~]# ssh slave2
  54. Last login: Sun Mar 27 00:26:12 2022 from 10.10.10.1
  55. [root@slave2 ~]# exit
  56. logout
  57. Connection to slave2 closed.
1.3.6.5 Step 5: Copy everything under /usr/local/src/ to all slave nodes
  1. [root@master ~]# scp -r /usr/local/src/* root@slave1:/usr/local/src/
  2. [root@master ~]# scp -r /usr/local/src/* root@slave2:/usr/local/src/
  3. [root@master ~]# scp /etc/profile.d/hadoop.sh root@slave1:/etc/profile.d/
  4. hadoop.sh                                   100%  151    45.9KB/s   00:00
  5.    
  6. [root@master ~]# scp /etc/profile.d/hadoop.sh root@slave2:/etc/profile.d/
  7. hadoop.sh                                   100%  151    93.9KB/s   00:00   
1.3.6.6 Step 6: Run the following commands on all slave nodes
  1. (1) On slave1
  2. [root@slave1 ~]# useradd hadoop
  3. [root@slave1 ~]# echo 'hadoop' | passwd --stdin hadoop
  4. Changing password for user hadoop.
  5. passwd: all authentication tokens updated successfully.
  6. [root@slave1 ~]# chown -R hadoop.hadoop /usr/local/src/
  7. [root@slave1 ~]# ll /usr/local/src/
  8. total 0
  9. drwxr-xr-x 11 hadoop hadoop 171 Mar 27 03:07 hadoop
  10. drwxr-xr-x  8 hadoop hadoop 255 Mar 27 03:07 jdk
  11. [root@slave1 ~]# source /etc/profile.d/hadoop.sh
  12. [root@slave1 ~]# echo $PATH
  13. /usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
  14. (2) On slave2
  15. [root@slave2 ~]# useradd hadoop
  16. [root@slave2 ~]# echo 'hadoop' | passwd --stdin hadoop
  17. Changing password for user hadoop.
  18. passwd: all authentication tokens updated successfully.
  19. [root@slave2 ~]# chown -R hadoop.hadoop /usr/local/src/
  20. [root@slave2 ~]# ll /usr/local/src/
  21. total 0
  22. drwxr-xr-x 11 hadoop hadoop 171 Mar 27 03:09 hadoop
  23. drwxr-xr-x  8 hadoop hadoop 255 Mar 27 03:09 jdk
  24. [root@slave2 ~]# source /etc/profile.d/hadoop.sh
  25. [root@slave2 ~]# echo $PATH
  26. /usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
Chapter 5: Running the Hadoop Cluster

Lab 1: Running the Hadoop Cluster

1.1 Objectives

After completing this lab, you should be able to:

  • Check the running state of Hadoop
  • Format the Hadoop file system
  • View the Hadoop Java processes
  • View the Hadoop HDFS report
  • View the status of the Hadoop nodes
  • Stop the Hadoop processes
1.2 Requirements


  • Know how to check the running state of Hadoop
  • Know how to stop the Hadoop processes
1.3 Procedure

1.3.1 Task 1: Format the Hadoop File System

1.3.1.1 Step 1: Format the NameNode

Formatting wipes the data on the NameNode. HDFS needs to be formatted only before its first start; do not format it again for later starts, otherwise the DataNode processes will be missing. In addition, once HDFS has been run, Hadoop's working directory (set to /usr/local/src/hadoop/tmp in this book) contains data; if you do need to re-format, delete the data under the working directory first, otherwise formatting will run into problems. A cleanup sketch for that case follows.
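A minimal cleanup sketch for a re-format, assuming the directories configured in Chapter 4:
  # stop HDFS, then clear the working directory and the NameNode data before formatting again
  stop-dfs.sh
  rm -rf /usr/local/src/hadoop/tmp/* /usr/local/src/hadoop/dfs/name/*
  # on each slave node, also clear the DataNode directory:
  # rm -rf /usr/local/src/hadoop/dfs/data/*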
Run the following commands to format the NameNode:
  1. [root@master ~]# su - hadoop
  2. Last login: Fri Apr  1 23:34:46 CST 2022 on pts/1
  3. [hadoop@master ~]$ cd /usr/local/src/hadoop/
  4. [hadoop@master hadoop]$ ./bin/hdfs namenode -format
  5. 22/04/02 01:22:42 INFO namenode.NameNode: STARTUP_MSG:
  6. /************************************************************
1.3.1.2 Step 2: Start the NameNode
  1. [hadoop@master hadoop]$ hadoop-daemon.sh start namenode
  2. namenode running as process 11868. Stop it first.
1.3.2 Task 2: Check the Java Processes

After startup, use the jps command to check whether it succeeded. jps is a Java utility that lists the PIDs of all running Java processes.
  1. [hadoop@master hadoop]$ jps
  2. 12122 Jps
  3. 11868 NameNode
1.3.2.1 Step 1: Switch to the hadoop user
  1. [hadoop@master ~]$ su - hadoop
  2. Password:
  3. Last login: Sat Apr  2 01:22:13 CST 2022 on pts/1
  4. Last failed login: Sat Apr  2 04:47:08 CST 2022 on pts/1
  5. There was 1 failed login attempt since the last successful login.
1.3.3 Task 3: View the HDFS Report
  1. [hadoop@master ~]$ hdfs dfsadmin -report
  2. Configured Capacity: 0 (0 B)
  3. Present Capacity: 0 (0 B)
  4. DFS Remaining: 0 (0 B)
  5. DFS Used: 0 (0 B)
  6. DFS Used%: NaN%
  7. Under replicated blocks: 0
  8. Blocks with corrupt replicas: 0
  9. Missing blocks: 0
  10. Missing blocks (with replication factor 1): 0
  11. -------------------------------------------------
1.3.3.1 Step 1: Generate an SSH key pair
  1. [hadoop@master ~]$ ssh-keygen -t rsa
  2. Generating public/private rsa key pair.
  3. Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
  4. Created directory '/home/hadoop/.ssh'.
  5. Enter passphrase (empty for no passphrase):
  6. Enter same passphrase again:
  7. Your identification has been saved in /home/hadoop/.ssh/id_rsa.
  8. Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
  9. The key fingerprint is:
  10. SHA256:nW/cVxmRp5Ht9TKGT61OmGbhQtkBdpHyS5prGhx24pI hadoop@master.example.com
  11. The key's randomart image is:
  12. +---[RSA 2048]----+
  13. |  o.oo +.|
  14. | ...o o.=|
  15. |   = o *+|
  16. | .o.* * *|
  17. |S.+= O =.|
  18. |   = ++oB.+ .|
  19. |  E +  =+o. .|
  20. |   . .o.  .. |
  21. |.o   |
  22. +----[SHA256]-----+
  23. [hadoop@master ~]$ ssh-copy-id slave1
  24. /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
  25. The authenticity of host 'slave1 (10.10.10.129)' can't be established.
  26. ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
  27. ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
  28. Are you sure you want to continue connecting (yes/no)? yes
  29. /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  30. /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  31. hadoop@slave1's password:
  32. Number of key(s) added: 1
  33. Now try logging into the machine, with:   "ssh 'slave1'"
  34. and check to make sure that only the key(s) you wanted were added.
  35. [hadoop@master ~]$ ssh-copy-id slave2
  36. /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
  37. The authenticity of host 'slave2 (10.10.10.130)' can't be established.
  38. ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
  39. ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
  40. Are you sure you want to continue connecting (yes/no)? yes
  41. /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  42. /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  43. hadoop@slave2's password:
  44. Number of key(s) added: 1
  45. Now try logging into the machine, with:   "ssh 'slave2'"
  46. and check to make sure that only the key(s) you wanted were added.
  47. [hadoop@master ~]$ ssh-copy-id master
  48. /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
  49. The authenticity of host 'master (10.10.10.128)' can't be established.
  50. ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
  51. ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
  52. Are you sure you want to continue connecting (yes/no)? yes
  53. /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  54. /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  55. hadoop@master's password:
  56. Number of key(s) added: 1
  57. Now try logging into the machine, with:   "ssh 'master'"
  58. and check to make sure that only the key(s) you wanted were added.
1.3.4 Task 4: Stop HDFS with stop-dfs.sh
  1. [hadoop@master ~]$ stop-dfs.sh
  2. Stopping namenodes on [master]
  3. master: stopping namenode
  4. 10.10.10.129: no datanode to stop
  5. 10.10.10.130: no datanode to stop
  6. Stopping secondary namenodes [0.0.0.0]
  7. The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
  8. ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
  9. ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
  10. Are you sure you want to continue connecting (yes/no)? yes
  11. 0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
  12. 0.0.0.0: no secondarynamenode to stop
1.3.4.1 Restart and verify
  1. [hadoop@master ~]$ start-dfs.sh
  2. Starting namenodes on [master]
  3. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
  4. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
  5. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  6. Starting secondary namenodes [0.0.0.0]
  7. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out
  8. [hadoop@master ~]$ start-yarn.sh
  9. starting yarn daemons
  10. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.example.com.out
  11. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
  12. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  13. [hadoop@master ~]$ jps
  14. 12934 NameNode
  15. 13546 Jps
  16. 13131 SecondaryNameNode
  17. 13291 ResourceManager
  18. If ResourceManager appears on master and NodeManager appears on the slaves, the startup was successful
  19. [hadoop@master ~]$ jps
  20. 12934 NameNode
  21. 13546 Jps
  22. 13131 SecondaryNameNode
  23. 13291 ResourceManager
  24. [root@slave1 ~]# jps
  25. 11906 NodeManager
  26. 11797 DataNode
  27. 12037 Jps
  28. [root@slave2 ~]# jps
  29. 12758 NodeManager
  30. 12648 DataNode
  31. 12889 Jps
  32. [hadoop@master ~]$ hdfs dfs -mkdir /input
  33. [hadoop@master ~]$ hdfs dfs -ls /
  34. Found 1 items
  35. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 05:18 /input
  36. [hadoop@master ~]$ mkdir ~/input
  37. [hadoop@master ~]$ vim ~/input/data.txt
  38. Hello World
  39. Hello Hadoop
  40. Hello Huasan
  41. ~
  42. [hadoop@master ~]$ hdfs dfs -put ~/input/data.txt
  43. .bash_logout       .bashrc            .oracle_jre_usage/ .viminfo           
  44. .bash_profile      input/             .ssh/              
  45. [hadoop@master ~]$ hdfs dfs -put ~/input/data.txt /input
  46. [hadoop@master ~]$ hdfs dfs -cat /input/data.txt
  47. Hello World
  48. Hello Hadoop
  49. Hello Huasan
  50. [hadoop@master ~]$ hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input/data.txt /output
  51. 22/04/02 05:31:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
  52. 22/04/02 05:31:21 INFO input.FileInputFormat: Total input paths to process : 1
  53. 22/04/02 05:31:21 INFO mapreduce.JobSubmitter: number of splits:1
  54. 22/04/02 05:31:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1648846845675_0001
  55. 22/04/02 05:31:22 INFO impl.YarnClientImpl: Submitted application application_1648846845675_0001
  56. 22/04/02 05:31:22 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1648846845675_0001/
  57. 22/04/02 05:31:22 INFO mapreduce.Job: Running job: job_1648846845675_0001
  58. 22/04/02 05:31:30 INFO mapreduce.Job: Job job_1648846845675_0001 running in uber mode : false
  59. 22/04/02 05:31:30 INFO mapreduce.Job:  map 0% reduce 0%
  60. 22/04/02 05:31:38 INFO mapreduce.Job:  map 100% reduce 0%
  61. 22/04/02 05:31:42 INFO mapreduce.Job:  map 100% reduce 100%
  62. 22/04/02 05:31:42 INFO mapreduce.Job: Job job_1648846845675_0001 completed successfully
  63. 22/04/02 05:31:42 INFO mapreduce.Job: Counters: 49
  64.     File System Counters
  65.             FILE: Number of bytes read=56
  66.             FILE: Number of bytes written=230931
  67.             FILE: Number of read operations=0
  68.             FILE: Number of large read operations=0
  69.             FILE: Number of write operations=0
  70.             HDFS: Number of bytes read=136
  71.             HDFS: Number of bytes written=34
  72.             HDFS: Number of read operations=6
  73.             HDFS: Number of large read operations=0
  74.             HDFS: Number of write operations=2
  75.     Job Counters
  76.             Launched map tasks=1
  77.             Launched reduce tasks=1
  78.             Data-local map tasks=1
  79.             Total time spent by all maps in occupied slots (ms)=5501
  80.             Total time spent by all reduces in occupied slots (ms)=1621
  81.             Total time spent by all map tasks (ms)=5501
  82.             Total time spent by all reduce tasks (ms)=1621
  83.             Total vcore-seconds taken by all map tasks=5501
  84.             Total vcore-seconds taken by all reduce tasks=1621
  85.             Total megabyte-seconds taken by all map tasks=5633024
  86.             Total megabyte-seconds taken by all reduce tasks=1659904
  87.     Map-Reduce Framework
  88.             Map input records=3
  89.             Map output records=6
  90.             Map output bytes=62
  91.             Map output materialized bytes=56
  92.             Input split bytes=98
  93.             Combine input records=6
  94.             Combine output records=4
  95.             Reduce input groups=4
  96.             Reduce shuffle bytes=56
  97.             Reduce input records=4
  98.             Reduce output records=4
  99.             Spilled Records=8
  100.             Shuffled Maps =1
  101.             Failed Shuffles=0
  102.             Merged Map outputs=1
  103.             GC time elapsed (ms)=572
  104.             CPU time spent (ms)=1860
  105.             Physical memory (bytes) snapshot=428474368
  106.             Virtual memory (bytes) snapshot=4219695104
  107.             Total committed heap usage (bytes)=284164096
  108.     Shuffle Errors
  109.             BAD_ID=0
  110.             CONNECTION=0
  111.             IO_ERROR=0
  112.             WRONG_LENGTH=0
  113.             WRONG_MAP=0
  114.             WRONG_REDUCE=0
  115.     File Input Format Counters
  116.             Bytes Read=38
  117.     File Output Format Counters
  118.             Bytes Written=34
  119. [hadoop@master ~]$ hdfs dfs -cat /output/part-r-00000
  120. Hadoop  1
  121. Hello   3
  122. Huasan  1
  123. World   1
Chapter 6: Hive Component Installation and Configuration

Lab 1: Hive Component Installation and Configuration

1.1. Objectives

After completing this lab, you should be able to:

  • Install and configure the Hive component
  • Initialize and start the Hive component
1.2. Requirements


  • Be familiar with installing and configuring the Hive component
  • Understand how to initialize and start the Hive component
1.3. Procedure

1.3.1. Task 1: Download and Extract the Installation File

1.3.1.1. Step 1: Base environment and installation preparation

The Hive component must be installed on top of a Hadoop system, so make sure Hadoop is running normally before installing Hive. This chapter installs the Hive component on the master node of the fully distributed Hadoop system deployed earlier.
The Hive deployment plan and package paths are as follows (a quick prerequisite check is sketched after the list):
(1) The fully distributed Hadoop system is already installed in the current environment.
(2) MySQL is installed locally (account root, password Password123$); its packages are under /opt/software/mysql-5.7.18.
(3) MySQL uses port 3306.
(4) The MySQL JDBC driver is /opt/software/mysql-connector-java-5.1.47.jar; it is used for the Hive metadata store.
(5) The Hive package is /opt/software/apache-hive-2.0.0-bin.tar.gz.
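A quick prerequisite check sketch, assuming the packages were placed under /opt/software as listed above:
  jps                                          # NameNode and ResourceManager should be running
  ls /opt/software/ | grep -Ei 'mysql|hive'    # the MySQL and Hive packages should be listed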
1.3.1.2. Step 2: Extract the installation file

(1) As the root user, extract the Hive package
/opt/software/apache-hive-2.0.0-bin.tar.gz into the /usr/local/src directory.
  1. [root@master ~]# tar -zxvf /opt/software/apache-hive-2.0.0-bin.tar.gz -C /usr/local/src/
(2) Rename the extracted apache-hive-2.0.0-bin folder to hive.
  1. [root@master ~]# mv /usr/local/src/apache-hive-2.0.0-bin/ /usr/local/src/hive/
(3) Change the owner and group of the hive directory to hadoop.
  1. [root@master ~]# chown -R hadoop:hadoop /usr/local/src/hive
1.3.2. Task 2: Set Up the Hive Environment

1.3.2.1. Step 1: Remove the MariaDB database

Hive stores its metadata in a MySQL database, so before deploying the Hive component, MySQL must be installed on the Linux system and its character set, security initialization, and remote-access permissions configured. Log in as the root user and perform the following steps:
(1) Stop the Linux firewall and disable it so that it does not start automatically at boot.
  1. [root@master ~]# systemctl stop firewalld
  2. [root@master ~]# systemctl disable firewalld
(2) Remove the MariaDB packages that ship with the Linux system.

  • First, check whether MariaDB is installed on the system.
    [root@master ~]# rpm -qa | grep mariadb
2) Remove the MariaDB packages.
Nothing is installed here, so there is nothing to remove. If the query above does return packages, a removal sketch follows.
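A removal sketch, only needed if the rpm query returned MariaDB packages:
  # remove every installed mariadb package; --nodeps avoids dependency errors from other packages
  rpm -qa | grep mariadb | xargs -r rpm -e --nodeps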
1.3.2.2. Step 2: Install the MySQL database

(1) Install the MySQL packages in the following order: mysql community common, then mysql community libs, then mysql community client.
  1. [root@master ~]# cd /opt/software/mysql-5.7.18/
  2. [root@master mysql-5.7.18]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm
  3. warning: mysql-community-common-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
  4. Preparing...  ################################# [100%]
  5. package mysql-community-common-5.7.18-1.el7.x86_64 is already installed
  6. [root@master mysql-5.7.18]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm
  7. warning: mysql-community-libs-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
  8. Preparing...  ################################# [100%]
  9. package mysql-community-libs-5.7.18-1.el7.x86_64 is already installed
  10. [root@master mysql-5.7.18]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm
  11. warning: mysql-community-client-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
  12. Preparing...  ################################# [100%]
  13. package mysql-community-client-5.7.18-1.el7.x86_64 is already installed
(2) Install the mysql community server package.
  1. [root@master mysql-5.7.18]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm
  2. warning: mysql-community-server-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
  3. Preparing...  ################################# [100%]
  4. package mysql-community-server-5.7.18-1.el7.x86_64 is already installed
(3) Modify the MySQL configuration by adding the options shown in Table 6-1 to the /etc/my.cnf file.

Add the following lines to /etc/my.cnf, directly below the symbolic-links=0 line.
  1. default-storage-engine=innodb
  2. innodb_file_per_table
  3. collation-server=utf8_general_ci
  4. init-connect='SET NAMES utf8'
  5. character-set-server=utf8
(4) Start the MySQL database.
  1. [root@master ~]# systemctl start mysqld
(5) Check the MySQL service status. If the mysqld process state is active (running), MySQL is running normally.
If the mysqld state is failed, MySQL failed to start; in that case, check the /etc/my.cnf file.
  1. [root@master ~]# systemctl status mysqld
  2. ● mysqld.service - MySQL Server
  3.    Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
  4.    Active: active (running) since Sun 2022-04-10 22:54:39 CST; 1h 0min ago
  5. Docs: man:mysqld(8)
  6.    http://dev.mysql.com/doc/refman/en/using-systemd.html
  7. Main PID: 929 (mysqld)
  8.    CGroup: /system.slice/mysqld.service
  9.    └─929 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/my...
  10. Apr 10 22:54:35 master systemd[1]: Starting MySQL Server...
  11. Apr 10 22:54:39 master systemd[1]: Started MySQL Server.
(6) Look up the default MySQL root password.
  1. [root@master ~]# cat /var/log/mysqld.log | grep password
  2. 2022-04-08T16:20:04.456271Z 1 [Note] A temporary password is generated for root@localhost: 0yf>>yWdMd8_
The default password is generated randomly when MySQL is installed, so it differs from one installation to the next; it can be pulled out of the log as sketched below.
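A small sketch for extracting the temporary password instead of reading it by eye:
  # the last field of the "temporary password" line is the generated root password
  grep 'temporary password' /var/log/mysqld.log | awk '{print $NF}'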
(7) Initialize the MySQL database.
Run the mysql_secure_installation command to initialize MySQL. During initialization you must set a login password for the database root user; it has to satisfy the password policy (upper- and lower-case letters, digits, and special characters), for example Password123$.
The following interactive confirmations appear during initialization:
1) "Change the password for root?" asks whether to change the root password; type y and press Enter.
2) "Do you wish to continue with the password provided?" asks whether to keep the password just entered; type y and press Enter.
3) "Remove anonymous users?" asks whether to delete the anonymous users; type y and press Enter.
4) "Disallow root login remotely?" asks whether to block remote root logins; type n and press Enter so that remote root login stays allowed.
5) "Remove test database and access to it?" asks whether to drop the test database; type y and press Enter.
6) "Reload privilege tables now?" asks whether to reload the privilege tables; type y and press Enter.
The mysql_secure_installation session looks like this:
  1. [root@master ~]# mysql_secure_installation
  2. Securing the MySQL server deployment.
  3. Enter password for user root:
  4. The 'validate_password' plugin is installed on the server.
  5. The subsequent steps will run with the existing configuration
  6. of the plugin.
  7. Using existing password for root.
  8. Estimated strength of the password: 100
  9. Change the password for root ? ((Press y|Y for Yes, any other key for No) : y
  10. New password:
  11. Re-enter new password:
  12. Estimated strength of the password: 100
  13. Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y
  14. By default, a MySQL installation has an anonymous user,
  15. allowing anyone to log into MySQL without having to have
  16. a user account created for them. This is intended only for
  17. testing, and to make the installation go a bit smoother.
  18. You should remove them before moving into a production
  19. environment.
  20. Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
  21. Success.
  1. Normally, root should only be allowed to connect from
  2. 'localhost'. This ensures that someone cannot guess at
  3. the root password from the network.
  4. Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n
  5. ... skipping.
  6. By default, MySQL comes with a database named 'test' that
  7. anyone can access. This is also intended only for testing,
  8. and should be removed before moving into a production
  9. environment.
  1. Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
  2. - Dropping test database...
  3. Success.
  4. - Removing privileges on test database...
  5. Success.
  6. Reloading the privilege tables will ensure that all changes
  7. made so far will take effect immediately.
  8. Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
  9. Success.
  10. All done!
(8) Grant the root user privileges to access the MySQL database both locally and from remote hosts.
  1. [root@master ~]# mysql -u root -p
  2. Enter password:
  3. Welcome to the MySQL monitor.  Commands end with ; or \g.
  4. Your MySQL connection id is 9
  5. Server version: 5.7.18 MySQL Community Server (GPL)
  6. Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
  7. Oracle is a registered trademark of Oracle Corporation and/or its
  8. affiliates. Other names may be trademarks of their respective
  9. owners.
  10. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
  11. mysql> grant all privileges on *.* to root@'localhost' identified by 'Password123$';
  12. Query OK, 0 rows affected, 1 warning (0.00 sec)
  13. mysql> grant all privileges on *.* to root@'%' identified by 'Password123$';
  14. Query OK, 0 rows affected, 1 warning (0.00 sec)
  15. mysql> flush privileges;
  16. Query OK, 0 rows affected (0.00 sec)
  17. mysql> select user,host from mysql.user where user='root';
  18. +------+-----------+
  19. | user | host  |
  20. +------+-----------+
  21. | root | % |
  22. | root | localhost |
  23. +------+-----------+
  24. 2 rows in set (0.00 sec)
  25. mysql> exit;
  26. Bye
1.3.2.3. Step 3: Configure the Hive component

(1) Set the Hive environment variables and apply them.
  1. [root@master ~]# vim /etc/profile
  2. export HIVE_HOME=/usr/local/src/hive
  3. export PATH=$PATH:$HIVE_HOME/bin
  4. [root@master ~]# source /etc/profile
(2) Modify the Hive configuration file.
Switch to the hadoop user for the following Hive configuration steps.
Copy the hive-default.xml.template file in /usr/local/src/hive/conf to hive-site.xml.
  1. [root@master ~]# su - hadoop
  2. Last login: Sun Apr 10 23:27:25 CS
  3. [hadoop@master ~]$ cp /usr/local/src/hive/conf/hive-default.xml.template  /usr/local/src/hive/conf/hive-site.xml
(3) Use the vi editor to modify hive-site.xml so that Hive connects to the MySQL database, and set the paths Hive uses for temporary files.
  1. [hadoop@master ~]$ vi /usr/local/src/hive/conf/hive-site.xml
1) Set the MySQL database connection string.
  1. <name>javax.jdo.option.ConnectionURL</name>
  2. <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  3. <description>JDBC connect string for a JDBC metastore</description>
2) Set the MySQL root password.
  1. <property>
  2. <name>javax.jdo.option.ConnectionPassword</name>
  3. <value>Password123$</value>
  4. <description>password to use against metastore database</description>
  5. </property>
3) Metastore schema version consistency check. It defaults to false, so no change is needed.
  1. <property>
  2. <name>hive.metastore.schema.verification</name>
  3. <value>false</value>
  4. <description>
  5. Enforce metastore schema version consistency.
  6. True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
  7. False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
  8. </description>
  9. </property>
4) Configure the database driver.
  1. <property>
  2. <name>javax.jdo.option.ConnectionDriverName</name>
  3. <value>com.mysql.jdbc.Driver</value>
  4. <description>Driver class name for a JDBC metastore</description>
  5. </property>
5) Set the database user name javax.jdo.option.ConnectionUserName to root.
  1. <property>
  2. <name>javax.jdo.option.ConnectionUserName</name>
  3. <value>root</value>
  4. <description>Username to use against metastore database</description>
  5. </property>
6) Replace every occurrence of ${system:java.io.tmpdir}/${system:user.name} in the following locations with the /usr/local/src/hive/tmp directory (or one of its subdirectories).
The following 4 properties need to be changed (a sed shortcut is sketched after the listing):
  1. <name>hive.querylog.location</name>
  2. <value>/usr/local/src/hive/tmp</value>
  3. <description>Location of Hive run time structured log file</description>
  4. <name>hive.exec.local.scratchdir</name>
  5. <value>/usr/local/src/hive/tmp</value>
  6. <name>hive.downloaded.resources.dir</name>
  7. <value>/usr/local/src/hive/tmp/resources</value>
  8. <name>hive.server2.logging.operation.log.location</name>
  9. <value>/usr/local/src/hive/tmp/operation_logs</value>
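Editing the four properties by hand works; as a shortcut, the templated path can also be rewritten with sed (a sketch: back up the file first and verify the four values afterwards, since the resources and operation_logs entries should end up pointing at subdirectories):
  cd /usr/local/src/hive/conf
  cp hive-site.xml hive-site.xml.bak
  # collapse the templated temp path, then re-check the four properties and point the
  # resources/operation_logs ones at their subdirectories by hand if needed
  sed -i 's#${system:java.io.tmpdir}/${system:user.name}#/usr/local/src/hive/tmp#g' hive-site.xml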
7) Create the temporary folder tmp under the Hive installation directory.
  1. [hadoop@master ~]$ mkdir /usr/local/src/hive/tmp
At this point, the Hive component installation and configuration is complete.
1.3.2.4. Step 4: Initialize the Hive metadata

1) Copy the MySQL JDBC driver (/opt/software/mysql-connector-java-5.1.46.jar) into the lib directory of the Hive installation.
  1. [hadoop@master ~]$ cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/
2) Restart Hadoop.
  1. [hadoop@master ~]$ stop-all.sh
  2. This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
  3. Stopping namenodes on [master]
  4. master: stopping namenode
  5. 10.10.10.129: stopping datanode
  6. 10.10.10.130: stopping datanode
  7. Stopping secondary namenodes [0.0.0.0]
  8. 0.0.0.0: stopping secondarynamenode
  9. stopping yarn daemons
  10. stopping resourcemanager
  11. 10.10.10.129: stopping nodemanager
  12. 10.10.10.130: stopping nodemanager
  13. no proxyserver to stop
  14. [hadoop@master ~]$ start-all.sh
  15. This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
  16. Starting namenodes on [master]
  17. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
  18. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
  19. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  20. Starting secondary namenodes [0.0.0.0]
  21. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
  22. starting yarn daemons
  23. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
  24. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  25. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
3) Initialize the database.
  1. [hadoop@master ~]$ schematool -initSchema -dbType mysql
  2. which: no hbase in (/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/src/hive/bin:/home/hadoop/.local/bin:/home/hadoop/bin)
  3. SLF4J: Class path contains multiple SLF4J bindings.
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  7. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  8. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  9. Metastore connection URL:jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false
  10. Metastore Connection Driver :com.mysql.jdbc.Driver
  11. Metastore connection User:   root
  12. Mon Apr 11 00:46:32 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  13. Starting metastore schema initialization to 2.0.0
  14. Initialization script hive-schema-2.0.0.mysql.sql
  15. Password123$
  16. Password123$
  17. No current connection
  18. org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
4) Start Hive.
  1. [hadoop@master hive]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  8. Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
  9. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  10. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  11. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  12. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  13. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  14. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  15. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  16. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  17. Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  18. hive>
Chapter 7: ZooKeeper Component Installation and Configuration

Lab 1: ZooKeeper Component Installation and Configuration

1.1. Objectives

After completing this lab, you should be able to:

  • Download and install ZooKeeper
  • Understand the ZooKeeper configuration options
  • Start ZooKeeper
1.2. Requirements


  • Understand the ZooKeeper configuration options
  • Be familiar with starting ZooKeeper
1.3. Procedure

1.3.1 Task 1: Configure Time Synchronization
  1. [root@master ~]# yum -y install chrony
  2. [root@master ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@master ~]# systemctl restart chronyd.service
  7. [root@master ~]# systemctl enable chronyd.service
  8. [root@master ~]# date
  9. Fri Apr 15 15:40:14 CST 2022
  1. [root@slave1 ~]# yum -y install chrony
  2. [root@slave1 ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@slave1 ~]# systemctl restart chronyd.service
  7. [root@slave1 ~]# systemctl enable chronyd.service
  8. [root@slave1 ~]# date
  9. Fri Apr 15 15:40:17 CST 2022  
  1. [root@slave2 ~]# yum -y install chrony
  2. [root@slave2 ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@slave2 ~]# systemctl restart chronyd.service
  7. [root@slave2 ~]# systemctl enable chronyd.service
  8. [root@slave2 ~]# date
  9. Fri Apr 15 15:40:20 CST 2022
1.3.2 Task 2: Download and Install ZooKeeper

The latest ZooKeeper release can be obtained from the official site (http://hadoop.apache.org/zookeeper/); the ZooKeeper version must be compatible with the Hadoop environment.
Note that the firewall must be stopped on every node, otherwise connection problems will occur (a quick sketch for all nodes follows).
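A quick sketch for making sure firewalld is off everywhere (run as root; the loop relies on the root SSH keys distributed in Chapter 4):
  systemctl stop firewalld && systemctl disable firewalld     # on master
  for host in slave1 slave2; do
      ssh root@${host} 'systemctl stop firewalld && systemctl disable firewalld'
  done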
1. The ZooKeeper package zookeeper-3.4.8.tar.gz has already been placed in the /opt/software directory of the Linux system.
2. Extract the package to the target location by running the following commands on the Master node.
  1. [root@master ~]# tar xf /opt/software/zookeeper-3.4.8.tar.gz -C /usr/local/src/
  2. [root@master ~]# cd /usr/local/src/
  3. [root@master src]# mv zookeeper-3.4.8/ zookeeper
1.3.3 Task 3: ZooKeeper Configuration Options

1.3.3.1 Step 1: Master node configuration

(1) Create the data and logs folders under the ZooKeeper installation directory.
  1. [root@master src]# cd /usr/local/src/zookeeper/
  2. [root@master zookeeper]# mkdir data logs
(2) Write each node's identifier into its myid file; every node gets a different number: master gets 1, slave1 gets 2, and slave2 gets 3.
  1. [root@master zookeeper]# echo '1' > /usr/local/src/zookeeper/data/myid
(3) Create and modify the zoo.cfg configuration file.
  1. [root@master zookeeper]# cd /usr/local/src/zookeeper/conf/
  2. [root@master conf]# cp zoo_sample.cfg zoo.cfg
Change the dataDir parameter as follows:
  1. [root@master conf]# vi zoo.cfg
  2. dataDir=/usr/local/src/zookeeper/data
(4) Append the following parameters to the end of zoo.cfg; they define the ports used by the three ZooKeeper nodes.
  1. server.1=master:2888:3888
  2. server.2=slave1:2888:3888
  3. server.3=slave2:2888:3888
(5) Change the owner of the ZooKeeper installation directory to the hadoop user.
  1. [root@master conf]# chown -R hadoop:hadoop /usr/local/src/
1.3.3.2 Step 2: Slave node configuration

(1) Copy the ZooKeeper installation directory from the Master node to both Slave nodes.
  1. [root@master ~]# scp -r /usr/local/src/zookeeper node1:/usr/local/src/
  2. [root@master ~]# scp -r /usr/local/src/zookeeper node2:/usr/local/src/
(2) On slave1, change the owner of the zookeeper directory to the hadoop user.
  1. [root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/
  2. [root@slave1 ~]# ll /usr/local/src/
  3. total 4
  4. drwxr-xr-x. 12 hadoop hadoop  183 Apr  2 18:11 hadoop
  5. drwxr-xr-x   9 hadoop hadoop  183 Apr 15 16:37 hbase
  6. drwxr-xr-x.  8 hadoop hadoop  255 Apr  2 18:06 jdk
  7. drwxr-xr-x  12 hadoop hadoop 4096 Apr 22 15:31 zookeeper
(3) On slave1, set the node's myid to 2.
  1. [root@slave1 ~]# echo 2 > /usr/local/src/zookeeper/data/myid
(4) On slave2, change the owner of the zookeeper directory to the hadoop user.
  1. [root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/
(5) On slave2, set the node's myid to 3.
  1. [root@slave2 ~]# echo 3 > /usr/local/src/zookeeper/data/myid
1.3.3.3 Step 3: System environment variable configuration

Add the environment variable configuration on all three nodes: master, slave1, and slave2.
  1. [root@master conf]# vi /etc/profile.d/zookeeper.sh
  2. export ZOOKEEPER_HOME=/usr/local/src/zookeeper
  3. export PATH=${ZOOKEEPER_HOME}/bin:$PATH
  4. [root@master ~]# scp /etc/profile.d/zookeeper.sh node1:/etc/profile.d/
  5. zookeeper.sh 100%   8742.3KB/s   00:00
  6. [root@master ~]# scp /etc/profile.d/zookeeper.sh node2:/etc/profile.d/
  7. zookeeper.sh 100%   8750.8KB/s   00:00
1.3.4 Task 4: Start ZooKeeper

ZooKeeper must be started as the hadoop user.
(1) Start ZooKeeper on the master, slave1, and slave2 nodes with the zkServer.sh start command (a loop that does this from master is sketched below; the per-node transcript follows).
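Instead of logging in to each node by hand, the three daemons can also be started from master in one loop (a sketch; it assumes the hadoop user's passwordless SSH set up in Chapter 5 and uses a login shell so the profile.d variables are loaded):
  for host in master slave1 slave2; do
      ssh hadoop@${host} "bash -lc 'zkServer.sh start'"
  done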
  1. [root@master ~]# su - hadoop
  2. Last login: Fri Apr 15 21:54:17 CST 2022 on pts/0
  3. [hadoop@master ~]$ jps
  4. 3922 Jps
  5. [hadoop@master ~]$ zkServer.sh start
  6. ZooKeeper JMX enabled by default
  7. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  8. Starting zookeeper ... STARTED
  9. [hadoop@master ~]$ jps
  10. 3969 Jps
  11. 3950 QuorumPeerMain
  12. [root@slave1 ~]# su - hadoop
  13. Last login: Fri Apr 15 22:06:47 CST 2022 on pts/0
  14. [hadoop@slave1 ~]$ jps
  15. 1370 Jps
  16. [hadoop@slave1 ~]$ zkServer.sh start
  17. ZooKeeper JMX enabled by default
  18. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  19. Starting zookeeper ... STARTED
  20. [hadoop@slave1 ~]$ jps
  21. 1395 QuorumPeerMain
  22. 1421 Jps
  23. [root@slave2 ~]# su - hadoop
  24. Last login: Fri Apr 15 16:25:52 CST 2022 on pts/1
  25. [hadoop@slave2 ~]$ jps
  26. 1336 Jps
  27. [hadoop@slave2 ~]$ zkServer.sh start
  28. ZooKeeper JMX enabled by default
  29. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  30. Starting zookeeper ... STARTED
  31. [hadoop@slave2 ~]$ jps
  32. 1361 QuorumPeerMain
  33. 1387 Jps
(2) Once all three nodes have started, check the ZooKeeper status on each of them.
  1. [hadoop@master conf]$ zkServer.sh status
  2. ZooKeeper JMX enabled by default
  3. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  4. Mode: follower
  5. [hadoop@slave1 ~]$ zkServer.sh status
  6. ZooKeeper JMX enabled by default
  7. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  8. Mode: leader
  1. [hadoop@slave2 conf]$ zkServer.sh status
  2. ZooKeeper JMX enabled by default
  3. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  4. Mode: follower
Chapter 8: HBase Component Installation and Configuration

Lab 1: HBase Component Installation and Configuration

1.1 Objectives

After completing this lab, you should be able to:

  • Install and configure HBase
  • Use the common HBase shell commands
1.2 Requirements


  • Understand how HBase works
  • Be familiar with the common HBase shell commands
1.3 Procedure

1.3.1 Task 1: Configure Time Synchronization
  1. [root@master ~]# yum -y install chrony
  2. [root@master ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@master ~]# systemctl restart chronyd.service
  7. [root@master ~]# systemctl enable chronyd.service
  8. [root@master ~]# date
  9. Fri Apr 15 15:40:14 CST 2022
  1. [root@slave1 ~]# yum -y install chrony
  2. [root@slave1 ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@slave1 ~]# systemctl restart chronyd.service
  7. [root@slave1 ~]# systemctl enable chronyd.service
  8. [root@slave1 ~]# date
  9. Fri Apr 15 15:40:17 CST 2022  
  1. [root@slave2 ~]# yum -y install chrony
  2. [root@slave2 ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@slave2 ~]# systemctl restart chronyd.service
  7. [root@slave2 ~]# systemctl enable chronyd.service
  8. [root@slave2 ~]# date
  9. Fri Apr 15 15:40:20 CST 2022
1.3.2 Task 2: Install and Configure HBase

1.3.2.1 Step 1: Extract the HBase package
  1. [root@master ~]# tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local/src/
1.3.2.2 Step 2: Rename the HBase installation folder
  1. [root@master ~]# cd /usr/local/src/
  2. [root@master src]# mv hbase-1.2.1 hbase
1.3.2.3 Step 3: Add the environment variables on all nodes
  1. [root@master ~]# cat /etc/profile
  2. # set hbase environment
  3. export HBASE_HOME=/usr/local/src/hbase
  4. export PATH=$HBASE_HOME/bin:$PATH   
  5. [root@slave1 ~]# cat /etc/profile
  6. # set hbase environment
  7. export HBASE_HOME=/usr/local/src/hbase
  8. export PATH=$HBASE_HOME/bin:$PATH
  9. [root@slave2 ~]# cat /etc/profile
  10. # set hbase environment
  11. export HBASE_HOME=/usr/local/src/hbase
  12. export PATH=$HBASE_HOME/bin:$PATH
1.3.2.4 Step 4: Apply the environment variables on all nodes
  1. [root@master ~]# source /etc/profile
  2. [root@master ~]# echo $PATH
  3. /usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/src/hive/bin:/root/bin:/usr/local/src/hive/bin:/usr/local/src/hive/bin  
  4. [root@slave1 ~]# source /etc/profile
  5. [root@slave1 ~]# echo $PATH
  6. /usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
  7. [root@slave2 ~]# source /etc/profile
  8. [root@slave2 ~]# echo $PATH
  9. /usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
1.3.2.5 Step 5: Change to the configuration file directory on the master node
  1. [root@master ~]# cd /usr/local/src/hbase/conf/
1.3.2.6 Step 6: Configure the hbase-env.sh file on the master node
  1. [root@master conf]# cat hbase-env.sh
  2. export JAVA_HOME=/usr/local/src/jdk
  3. export HBASE_MANAGES_ZK=true
  4. export HBASE_CLASSPATH=/usr/local/src/hadoop/etc/hadoop/
1.3.2.7 Step 7: Configure hbase-site.xml on the master node
  1. [root@master conf]# cat hbase-site.xml
  2. <configuration>
  3.         <property>
  4.                 <name>hbase.rootdir</name>
  5.                 <value>hdfs://master:9000/hbase</value>
  6.         </property>
  7.         <property>
  8.                 <name>hbase.master.info.port</name>
  9.                 <value>60010</value>
  10.         </property>
  11.         <property>
  12.                 <name>hbase.zookeeper.property.clientPort</name>
  13.                 <value>2181</value>
  14.         </property>
  15.         <property>
  16.                 <name>zookeeper.session.timeout</name>
  17.                 <value>120000</value>
  18.         </property>
  19.         <property>
  20.                 <name>hbase.zookeeper.quorum</name>
  21.                 <value>master,node1,node2</value>
  22.         </property>
  23.         <property>
  24.                 <name>hbase.tmp.dir</name>
  25.                 <value>/usr/local/src/hbase/tmp</value>
  26.         </property>
  27.         <property>
  28.                 <name>hbase.cluster.distributed</name>
  29.                 <value>true</value>
  30.         </property>
  31. </configuration>
1.3.2.8 Step 8: Edit the regionservers file on the master node
  1. [root@master conf]# cat regionservers
  2. node1
  3. node2
1.3.2.9 Step 9: Create the hbase.tmp.dir directory on the master node
  1. [root@master ~]# mkdir /usr/local/src/hbase/tmp
1.3.2.10 Step 10: Copy the HBase installation directory from master to node1 and node2
  1. [root@master ~]# scp -r /usr/local/src/hbase/ root@node1:/usr/local/src/
  2. [root@master ~]# scp -r /usr/local/src/hbase/ root@node2:/usr/local/src/
1.3.2.11 Step 11: Change the ownership of the hbase directory on all nodes
  1. [root@master ~]# chown -R hadoop:hadoop /usr/local/src/hbase/
  2. [root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/hbase/
  3. [root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/hbase/
1.3.2.12 Step 12: Switch to the hadoop user on all nodes
  1. [root@master ~]# su - hadoop
  2. Last login: Mon Apr 11 00:42:46 CST 2022 on pts/0
  3. [root@slave1 ~]# su - hadoop
  4. Last login: Fri Apr  8 22:57:42 CST 2022 on pts/0
  5. [root@slave2 ~]# su - hadoop
  6. Last login: Fri Apr  8 22:57:54 CST 2022 on pts/0
1.3.2.13 Step 13: Start the services HBase depends on

Start Hadoop first, then ZooKeeper, and finally HBase.
  1. [hadoop@master ~]$ start-all.sh
  2. [hadoop@master ~]$ jps
  3. 2130 SecondaryNameNode
  4. 1927 NameNode
  5. 2554 Jps
  6. 2301 ResourceManager
  7. [hadoop@slave1 ~]$ jps
  8. 1845 NodeManager
  9. 1977 Jps
  10. 1725 DataNode
  11. [hadoop@slave2 ~]$ jps
  12. 2080 Jps
  13. 1829 DataNode
  14. 1948 NodeManager
1.3.2.14 Step 14: Start HBase on the master node
  1. [hadoop@master conf]$ start-hbase.sh
  2. [hadoop@master conf]$ jps
  3. 2130 SecondaryNameNode
  4. 3572 HQuorumPeer
  5. 1927 NameNode
  6. 5932 HMaster
  7. 2301 ResourceManager
  8. 6157 Jps
  9. [hadoop@slave1 ~]$ jps
  10. 2724 Jps
  11. 1845 NodeManager
  12. 1725 DataNode
  13. 2399 HQuorumPeer
  14. 2527 HRegionServer
  15. [root@slave2 ~]# jps
  16. 3795 Jps
  17. 1829 DataNode
  18. 3529 HRegionServer
  19. 1948 NodeManager
  20. 3388 HQuorumPeer
复制代码
1.3.2.15 步骤十五:修改windows上的hosts文件

(C:\Windows\System32\drivers\etc\hosts)
把 hosts 文件拖到桌面上,编辑它,加入 master 的主机名与 IP 地址的映射关系(示例见下),保存后放回原目录;然后在浏览器中访问 http://master:60010 即可打开 HBase 的 Web 界面。
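以本文环境为例(由后文 ifconfig 与 start-all.sh 输出可知 master 为 10.10.10.128,slave1 为 10.10.10.129,slave2 为 10.10.10.130),hosts 文件中追加的映射大致如下,仅供参考:
10.10.10.128    master
10.10.10.129    slave1
10.10.10.130    slave2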
1.3.3 实验任务三:HBase常用Shell命令

1.3.3.1 步骤一:进入 HBase 命令行
  1. [hadoop@master ~]$ hbase shell
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  6. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  7. HBase Shell; enter 'help<RETURN>' for list of supported commands.
  8. Type "exit<RETURN>" to leave the HBase Shell
  9. Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
  10. hbase(main):001:0>  
复制代码
1.3.3.2 步骤二:创建表 scores,两个列簇:grade 和 course
  1. hbase(main):001:0> create 'scores','grade','course'
  2. 0 row(s) in 1.4400 seconds
  3. => Hbase::Table - scores
复制代码
1.3.3.3 步骤三:查看数据库状态
  1. hbase(main):002:0> status
  2. 1 active master, 0 backup masters, 2 servers, 0 dead, 1.5000 average load
复制代码
1.3.3.4 步骤四:查看数据库版本
  1. hbase(main):003:0> version
  2. 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
复制代码
1.3.3.5 步骤五:查看表
  1. hbase(main):004:0> list
  2. TABLE  
  3. scores
  4. 1 row(s) in 0.0150 seconds
  5. => ["scores"]
复制代码
1.3.3.6 步骤六:插入记录 1:jie,grade:146cloud
  1. hbase(main):005:0> put 'scores','jie','grade:','146cloud'
  2. 0 row(s) in 0.1060 seconds
复制代码
1.3.3.7 步骤七:插入记录 2:jie,course:math,86
  1. hbase(main):006:0> put 'scores','jie','course:math','86'
  2. 0 row(s) in 0.0120 seconds
复制代码
1.3.3.8 步骤八:插入记录 3:jie,course:cloud,92
  1. hbase(main):009:0> put 'scores','jie','course:cloud','92'
  2. 0 row(s) in 0.0070 seconds
复制代码
1.3.3.9 步骤九:插入记录 4:shi,grade:133soft
  1. hbase(main):010:0> put 'scores','shi','grade:','133soft'
  2. 0 row(s) in 0.0120 seconds
复制代码
1.3.3.10 步骤十:插入记录 5:shi,course:math,87
  1. hbase(main):011:0> put 'scores','shi','course:math','87'
  2. 0 row(s) in 0.0090 seconds
复制代码
1.3.3.11 步骤十一:插入记录 6:shi,course:cloud,96
  1. hbase(main):012:0> put 'scores','shi','course:cloud','96'
  2. 0 row(s) in 0.0100 seconds
复制代码
1.3.3.12 步骤十二:读取 jie 的记录
  1. hbase(main):013:0> get 'scores','jie'
  2. COLUMN  CELL   
  3. course:cloud   timestamp=1650015032132, value=92  
  4. course:mathtimestamp=1650014925177, value=86  
  5. grade: timestamp=1650014896056, value=146cloud
  6. 3 row(s) in 0.0250 seconds
复制代码
1.3.3.13 步骤十三:读取 jie 的班级
  1. hbase(main):014:0> get 'scores','jie','grade'
  2. COLUMN  CELL   
  3. grade: timestamp=1650014896056, value=146cloud
  4. 1 row(s) in 0.0110 seconds
复制代码
1.3.3.14 步骤十四:查看整个表记录
  1. hbase(main):001:0> scan 'scores'
  2. ROW  COLUMN+CELL  
  3. jie column=course:cloud, timestamp=1650015032132, value=92   
  4. jie column=course:math, timestamp=1650014925177, value=86
  5. jie column=grade:, timestamp=1650014896056, value=146cloud   
  6. shi column=course:cloud, timestamp=1650015240873, value=96   
  7. shi column=course:math, timestamp=1650015183521, value=87
  8. 2 row(s) in 0.1490 seconds
复制代码
1.3.3.15 步骤十五:按列查看表记录
  1. hbase(main):002:0> scan 'scores',{COLUMNS=>'course'}
  2. ROW  COLUMN+CELL  
  3. jie column=course:cloud, timestamp=1650015032132, value=92   
  4. jie column=course:math, timestamp=1650014925177, value=86
  5. shi column=course:cloud, timestamp=1650015240873, value=96   
  6. shi column=course:math, timestamp=1650015183521, value=87
  7. 2 row(s) in 0.0160 seconds
复制代码
1.3.3.16 步骤十六:删除指定记录
  1. hbase(main):003:0> delete 'scores','shi','grade'
  2. 0 row(s) in 0.0560 seconds
复制代码
1.3.3.17 步骤十七:删除后,执行scan 命令
  1. hbase(main):004:0> scan 'scores'
  2. ROW  COLUMN+CELL  
  3. jie column=course:cloud, timestamp=1650015032132, value=92   
  4. jie column=course:math, timestamp=1650014925177, value=86
  5. jie column=grade:, timestamp=1650014896056, value=146cloud   
  6. shi column=course:cloud, timestamp=1650015240873, value=96   
  7. shi column=course:math, timestamp=1650015183521, value=87
  8. 2 row(s) in 0.0130 seconds
复制代码
1.3.3.18 步骤十八:增加新的列簇
  1. hbase(main):005:0> alter 'scores',NAME=>'age'
  2. Updating all regions with the new schema...
  3. 1/1 regions updated.
  4. Done.
  5. 0 row(s) in 2.0110 seconds
复制代码
1.3.3.19 步骤十九:查看表结构
  1. hbase(main):006:0> describe 'scores'
  2. Table scores is ENABLED   
  3. scores
  4. COLUMN FAMILIES DESCRIPTION   
  5. {NAME => 'age', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', C
  6. OMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}  
  7. {NAME => 'course', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER'
  8. , COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}   
  9. {NAME => 'grade', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
  10. COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
  11. 3 row(s) in 0.0230 seconds
复制代码
1.3.3.20 步骤二十:删除列簇
  1. hbase(main):007:0> alter 'scores',NAME=>'age',METHOD=>'delete'
  2. Updating all regions with the new schema...
  3. 1/1 regions updated.
  4. Done.
  5. 0 row(s) in 2.1990 seconds
复制代码
1.3.3.21 步骤二十一:删除表
  1. hbase(main):008:0> disable 'scores'
  2. 0 row(s) in 2.3190 seconds
复制代码
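disable 'scores' 只是将表下线;如需真正删除表,还要在 disable 之后执行 drop 命令(以下仅为命令示例,本实验环境中未实际执行):
drop 'scores'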
1.3.3.22 步骤二十二:退出
  1. hbase(main):009:0> quit
复制代码
1.3.3.23 步骤二十三:关闭 HBase
  1. [hadoop@master ~]$ stop-hbase.sh
  2. stopping hbase.................
  3. master: stopping zookeeper.
  4. node2: stopping zookeeper.
  5. node1: stopping zookeeper.
复制代码
在 master 节点关闭 Hadoop。
  1. [hadoop@master ~]$ stop-all.sh
  2. This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
  3. Stopping namenodes on [master]
  4. master: stopping namenode
  5. 10.10.10.130: stopping datanode
  6. 10.10.10.129: stopping datanode
  7. Stopping secondary namenodes [0.0.0.0]
  8. 0.0.0.0: stopping secondarynamenode
  9. stopping yarn daemons
  10. stopping resourcemanager
  11. 10.10.10.129: stopping nodemanager
  12. 10.10.10.130: stopping nodemanager
  13. no proxyserver to stop
  14. [hadoop@master ~]$ jps
  15. 3820 Jps
  16. [hadoop@slave1 ~]$ jps
  17. 2220 Jps
  18. [root@slave2 ~]# jps
  19. 2082 Jps
复制代码
完结,撒花
第9章 Sqoop组件安装配置

实验一:Sqoop 组件安装与配置

1.1.实验目标

完成本实验,您应该能够:

  • 下载和解压 Sqoop
  • 配置Sqoop 环境
  • 安装Sqoop
  • Sqoop 模板命令
1.2.实验要求


  • 认识Sqoop 环境
  • 认识Sqoop 模板命令
1.3.实验过程

1.3.1.实验任务一:下载和解压 Sqoop

安装 Sqoop 组件需要与 Hadoop 环境适配。使用 root 用户在 Master 节点上进行部署,将 /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz 压缩包解压到 /usr/local/src 目录下。
  1. [root@master ~]# tar xf /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/
复制代码
将解压后生成的 sqoop-1.4.7.bin__hadoop-2.6.0 文件夹更名为 sqoop。
  1. [root@master ~]# cd /usr/local/src/
  2. [root@master src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop
复制代码
1.3.2.实验任务二:配置 Sqoop 环境

1.3.2.1.步骤一:创建 Sqoop 的配置文件 sqoop-env.sh。

复制 sqoop-env-template.sh 模板,并将模板重命名为 sqoop-env.sh。
  1. [root@master src]# cd /usr/local/src/sqoop/conf/
  2. [root@master conf]# cp sqoop-env-template.sh sqoop-env.sh
复制代码
1.3.2.2.步骤二:修改 sqoop-env.sh 文件,添加 Hadoop、HBase、Hive 等组件的安装路径。

注意,下面各组件的安装路径需要与实际环境中的安装路径保持一致。
  1. vim sqoop-env.sh
  2. export HADOOP_COMMON_HOME=/usr/local/src/hadoop
  3. export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
  4. export HBASE_HOME=/usr/local/src/hbase
  5. export HIVE_HOME=/usr/local/src/hive
复制代码
1.3.2.3.步骤三:配置 Linux 系统环境变量,添加 Sqoop 组件的路径。
  1. vim /etc/profile.d/sqoop.sh
  2. export SQOOP_HOME=/usr/local/src/sqoop
  3. export PATH=$SQOOP_HOME/bin:$PATH
  4. export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
  5. [root@master conf]# source /etc/profile.d/sqoop.sh
  6. [root@master conf]# echo $PATH
  7. /usr/local/src/sqoop/bin:/usr/local/src/hbase/bin:/usr/local/src/zookeeper/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/src/hive/bin:/root/bin
复制代码
1.3.2.4.步骤四:连接数据库

为了使 Sqoop 能够连接 MySQL 数据库,需要将 /opt/software/mysql-connector-java-5.1.46.jar 文件放入 sqoop 的 lib 目录中。该 jar 文件的版本需要与 MySQL 数据库的版本相对应,否则 Sqoop 导入数据时会报错(mysql-connector-java-5.1.46.jar 对应的是 MySQL 5.7 版本)。若该目录下没有该 jar 包,则使用第 6 章导入 home 目录的 jar 包。
  1. [root@master conf]# cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
复制代码
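可以顺手确认 jar 包已经放入 Sqoop 的 lib 目录(一个可选的检查示例):
[root@master conf]# ls /usr/local/src/sqoop/lib/ | grep mysql-connector
# 若输出 mysql-connector-java-5.1.46.jar 则说明驱动已就位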
1.3.3.实验任务三:启动Sqoop

1.3.3.1.步骤一:执行 Sqoop 前需要先启动 Hadoop 集群。

在 master 节点切换到 hadoop 用户执行 start-all.sh 命令启动 Hadoop 集群。
  1. [root@master conf]# su - hadoop
  2. Last login: Fri Apr 22 16:21:25 CST 2022 on pts/0
  3. [hadoop@master ~]$ start-all.sh
  4. This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
  5. Starting namenodes on [master]
  6. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
  7. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
  8. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  9. Starting secondary namenodes [0.0.0.0]
  10. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
  11. starting yarn daemons
  12. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
  13. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  14. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
复制代码
1.3.3.2.步骤二:检查 Hadoop 集群的运行状态。
  1. [hadoop@master ~]$ jps
  2. 1653 SecondaryNameNode
  3. 2086 Jps
  4. 1450 NameNode
  5. 1822 ResourceManager
  6. [root@slave1 ~]# jps
  7. 1378 NodeManager
  8. 1268 DataNode
  9. 1519 Jps
  10. [root@slave2 ~]# jps
  11. 1541 Jps
  12. 1290 DataNode
  13. 1405 NodeManager
复制代码
1.3.3.3.步骤三:测试Sqoop是否能够正常连接MySQL 数据库。

Sqoop 连接 MySQL 数据库(-P 选项为大写,执行后根据提示输入密码 Password123$)。
  1. [hadoop@master ~]$ sqoop list-databases --connect jdbc:mysql://master:3306 --username root -P
  2. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  3. Please set $HCAT_HOME to the root of your HCatalog installation.
  4. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  5. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  6. 22/04/29 15:25:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  7. Enter password:
  8. 22/04/29 15:25:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  9. Fri Apr 29 15:25:58 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  10. information_schema
  11. hive
  12. mysql
  13. performance_schema
  14. sys
复制代码
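上面输出中的 SSL 警告,可以通过在 JDBC 连接串中显式加上 useSSL=false 来消除,与后文 list-tables 命令的写法一致;下面仅为一个示例写法:
[hadoop@master ~]$ sqoop list-databases --connect "jdbc:mysql://master:3306/?useSSL=false" --username root -P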
1.3.3.4.步骤四:连接 hive

为了使 Sqoop 能够连接 Hive,需要将 hive 组件 /usr/local/src/hive/lib 目录下的 hive-common-2.0.0.jar 也放入 Sqoop 安装路径的 lib 目录中。
  1. [hadoop@master ~]$ cp /usr/local/src/hive/lib/hive-common-2.0.0.jar  /usr/local/src/sqoop/lib/
复制代码
1.3.4.实验任务四:Sqoop 模板命令

1.3.4.1.步骤一:创建MySQL数据库和数据表。

创建 sample 数据库,在 sample 中创建 student 表,在 student 表中插入了 3 条数据。
  1. # 登录 MySQL 数据库
  2. [hadoop@master ~]$ mysql -uroot -pPassword123$
  3. mysql: [Warning] Using a password on the command line interface can be insecure.
  4. Welcome to the MySQL monitor.  Commands end with ; or \g.
  5. Your MySQL connection id is 6
  6. Server version: 5.7.18 MySQL Community Server (GPL)
  7. Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
  8. Oracle is a registered trademark of Oracle Corporation and/or its
  9. affiliates. Other names may be trademarks of their respective
  10. owners.
  11. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
  12. # 创建 sample 库
  13. mysql> create database sample;
  14. Query OK, 1 row affected (0.00 sec)
  15. # 使用 sample 库
  16. mysql> use sample;
  17. Database changed
  18. # 创建 student 表,该数据表有number学号和name姓名两个字段
  19. mysql> create table student(number char(9) primary key, name varchar(10));
  20. Query OK, 0 rows affected (0.01 sec)
  21. # 向 student 表插入几条数据
  22. mysql>  insert into student values('01','zhangsan'),('02','lisi'),('03','wangwu');
  23. Query OK, 3 rows affected (0.01 sec)
  24. Records: 3  Duplicates: 0  Warnings: 0
  25. # 查询 student 表的数据
  26. mysql> select * from student;
  27. +--------+----------+
  28. | number | name     |
  29. +--------+----------+
  30. | 01     | zhangsan |
  31. | 02     | lisi     |
  32. | 03     | wangwu   |
  33. +--------+----------+
  34. 3 rows in set (0.00 sec)
  35. mysql> quit
  36. Bye
复制代码
1.3.4.2.步骤二:在Hive中创建sample数据库和student数据表。
  1. hive>
  2. > create database sample;
  3. OK
  4. Time taken: 0.528 seconds
  5. hive>  use sample;
  6. OK
  7. Time taken: 0.019 seconds
  8. hive>  create table student(number STRING,name STRING);
  9. OK
  10. Time taken: 0.2 seconds
  11. hive> exit;
  12. [hadoop@master conf]$
复制代码
1.3.4.3.步骤三:从 MySQL 导出数据,导入 Hive。
注意:由于前面在 Hive 中建 student 表时没有指定字段分隔符,而导入时使用了 --fields-terminated-by '|',因此查询结果中整行数据会落在第一个字段里、name 显示为 NULL,这是分隔符不一致造成的现象,可参考本步骤末尾的建表示例。
  1. [hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database sample --hive-table student
  2. hive>
  3.         > select * from sample.student;
  4. OK
  5. 01|zhangsan        NULL
  6. 02|lisi        NULL
  7. 03|wangwu        NULL
  8. Time taken: 1.238 seconds, Fetched: 3 row(s)
  9. hive>
  10.         > exit;
复制代码
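如果希望按 '|' 分隔导入的数据能够被 Hive 正确切分成两列,可以在 Hive 建表时显式指定字段分隔符。下面是一个可选的建表示例(本实验未采用这种写法):
hive> create table student(number STRING, name STRING) row format delimited fields terminated by '|';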
1.3.4.4.步骤四:sqoop常用命令
  1. #列出所有数据库
  2. [hadoop@master ~]$ sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password123$
  3. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  4. Please set $HCAT_HOME to the root of your HCatalog installation.
  5. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  6. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  7. 22/04/29 16:55:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  8. 22/04/29 16:55:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  9. 22/04/29 16:55:40 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  10. Fri Apr 29 16:55:40 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  11. information_schema
  12. hive
  13. mysql
  14. performance_schema
  15. sample
  16. sys
  17. # 连接 MySQL 并列出 sample 数据库中的表
  18. [hadoop@master ~]$ sqoop list-tables --connect "jdbc:mysql://master:3306/sample?useSSL=false" --username root --password Password123$
  19. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  20. Please set $HCAT_HOME to the root of your HCatalog installation.
  21. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  22. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  23. 22/04/29 16:56:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  24. 22/04/29 16:56:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  25. 22/04/29 16:56:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  26. student
  27. # 将关系型数据库中的表结构复制到 hive 中,只是复制表的结构,表中的内容没有复制过去
  28. [hadoop@master ~]$ sqoop create-hive-table --connect jdbc:mysql://master:3306/sample --table student --username root --password Password123$ --hive-table test
  29. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  30. Please set $HCAT_HOME to the root of your HCatalog installation.
  31. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  32. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  33. 22/04/29 16:57:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  34. 22/04/29 16:57:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  35. 22/04/29 16:57:42 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
  36. 22/04/29 16:57:42 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
  37. 22/04/29 16:57:42 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  38. Fri Apr 29 16:57:42 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  39. 22/04/29 16:57:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  40. 22/04/29 16:57:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  41. SLF4J: Class path contains multiple SLF4J bindings.
  42. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  43. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  44. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  45. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  46. 22/04/29 16:57:43 INFO hive.HiveImport: Loading uploaded data into Hive
  47. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
  48. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  49. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  50. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  51. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  52. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  53. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  54. 22/04/29 16:57:46 INFO hive.HiveImport:
  55. 22/04/29 16:57:46 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
  56. 22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  57. 22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  58. 22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  59. 22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  60. 22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  61. 22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  62. 22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  63. 22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  64. 22/04/29 16:57:50 INFO hive.HiveImport: OK
  65. 22/04/29 16:57:50 INFO hive.HiveImport: Time taken: 0.853 seconds
  66. 22/04/29 16:57:51 INFO hive.HiveImport: Hive import complete.
  67. # 如果执行以上命令之后显示hive.HiveImport: Hive import complete.则表示成功
  68. [hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database default --hive-table test
  69. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  70. Please set $HCAT_HOME to the root of your HCatalog installation.
  71. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  72. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  73. 22/04/29 17:00:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  74. 22/04/29 17:00:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  75. 22/04/29 17:00:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  76. 22/04/29 17:00:06 INFO tool.CodeGenTool: Beginning code generation
  77. Fri Apr 29 17:00:06 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  78. 22/04/29 17:00:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  79. 22/04/29 17:00:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  80. 22/04/29 17:00:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/src/hadoop
  81. Note: /tmp/sqoop-hadoop/compile/556af862aa5bc04a542c14f0741f7dc6/student.java uses or overrides a deprecated API.
  82. Note: Recompile with -Xlint:deprecation for details.
  83. 22/04/29 17:00:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/556af862aa5bc04a542c14f0741f7dc6/student.jar
  84. SLF4J: Class path contains multiple SLF4J bindings.
  85. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  86. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  87. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  88. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  89. 22/04/29 17:00:07 INFO tool.ImportTool: Destination directory student is not present, hence not deleting.
  90. 22/04/29 17:00:07 WARN manager.MySQLManager: It looks like you are importing from mysql.
  91. 22/04/29 17:00:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
  92. 22/04/29 17:00:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
  93. 22/04/29 17:00:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
  94. 22/04/29 17:00:07 INFO mapreduce.ImportJobBase: Beginning import of student
  95. 22/04/29 17:00:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  96. 22/04/29 17:00:07 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  97. 22/04/29 17:00:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
  98. Fri Apr 29 17:00:09 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  99. 22/04/29 17:00:09 INFO db.DBInputFormat: Using read commited transaction isolation
  100. 22/04/29 17:00:09 INFO mapreduce.JobSubmitter: number of splits:1
  101. 22/04/29 17:00:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1651221174197_0003
  102. 22/04/29 17:00:09 INFO impl.YarnClientImpl: Submitted application application_1651221174197_0003
  103. 22/04/29 17:00:09 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1651221174197_0003/
  104. 22/04/29 17:00:09 INFO mapreduce.Job: Running job: job_1651221174197_0003
  105. 22/04/29 17:00:13 INFO mapreduce.Job: Job job_1651221174197_0003 running in uber mode : false
  106. 22/04/29 17:00:13 INFO mapreduce.Job:  map 0% reduce 0%
  107. 22/04/29 17:00:17 INFO mapreduce.Job:  map 100% reduce 0%
  108. 22/04/29 17:00:17 INFO mapreduce.Job: Job job_1651221174197_0003 completed successfully
  109. 22/04/29 17:00:17 INFO mapreduce.Job: Counters: 30
  110.         File System Counters
  111.                 FILE: Number of bytes read=0
  112.                 FILE: Number of bytes written=134261
  113.                 FILE: Number of read operations=0
  114.                 FILE: Number of large read operations=0
  115.                 FILE: Number of write operations=0
  116.                 HDFS: Number of bytes read=87
  117.                 HDFS: Number of bytes written=30
  118.                 HDFS: Number of read operations=4
  119.                 HDFS: Number of large read operations=0
  120.                 HDFS: Number of write operations=2
  121.         Job Counters
  122.                 Launched map tasks=1
  123.                 Other local map tasks=1
  124.                 Total time spent by all maps in occupied slots (ms)=1731
  125.                 Total time spent by all reduces in occupied slots (ms)=0
  126.                 Total time spent by all map tasks (ms)=1731
  127.                 Total vcore-seconds taken by all map tasks=1731
  128.                 Total megabyte-seconds taken by all map tasks=1772544
  129.         Map-Reduce Framework
  130.                 Map input records=3
  131.                 Map output records=3
  132.                 Input split bytes=87
  133.                 Spilled Records=0
  134.                 Failed Shuffles=0
  135.                 Merged Map outputs=0
  136.                 GC time elapsed (ms)=35
  137.                 CPU time spent (ms)=1010
  138.                 Physical memory (bytes) snapshot=179433472
  139.                 Virtual memory (bytes) snapshot=2137202688
  140.                 Total committed heap usage (bytes)=88604672
  141.         File Input Format Counters
  142.                 Bytes Read=0
  143.         File Output Format Counters
  144.                 Bytes Written=30
  145. 22/04/29 17:00:17 INFO mapreduce.ImportJobBase: Transferred 30 bytes in 9.8777 seconds (3.0371 bytes/sec)
  146. 22/04/29 17:00:17 INFO mapreduce.ImportJobBase: Retrieved 3 records.
  147. 22/04/29 17:00:17 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table student
  148. Fri Apr 29 17:00:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  149. 22/04/29 17:00:17 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  150. 22/04/29 17:00:17 INFO hive.HiveImport: Loading uploaded data into Hive
  151. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
  152. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  153. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  154. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  155. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  156. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  157. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  158. 22/04/29 17:00:20 INFO hive.HiveImport:
  159. 22/04/29 17:00:20 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
  160. 22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  161. 22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  162. 22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  163. 22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  164. 22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  165. 22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  166. 22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  167. 22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  168. 22/04/29 17:00:24 INFO hive.HiveImport: OK
  169. 22/04/29 17:00:24 INFO hive.HiveImport: Time taken: 0.713 seconds
  170. 22/04/29 17:00:24 INFO hive.HiveImport: Loading data to table default.test
  171. 22/04/29 17:00:25 INFO hive.HiveImport: OK
  172. 22/04/29 17:00:25 INFO hive.HiveImport: Time taken: 0.42 seconds
  173. 22/04/29 17:00:25 INFO hive.HiveImport: Hive import complete.
  174. 22/04/29 17:00:25 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
  175. hive> show tables;
  176. OK
  177. test
  178. Time taken: 0.558 seconds, Fetched: 1 row(s)
  179. hive> exit;
复制代码
  1. # 从mysql中导出表内容到HDFS文件中
  2. [hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --num-mappers 1 --target-dir /user/test
  3. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  4. Please set $HCAT_HOME to the root of your HCatalog installation.
  5. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  6. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  7. 22/04/29 17:03:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  8. 22/04/29 17:03:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  9. 22/04/29 17:03:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  10. 22/04/29 17:03:13 INFO tool.CodeGenTool: Beginning code generation
  11. Fri Apr 29 17:03:14 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  12. 22/04/29 17:03:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  13. 22/04/29 17:03:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  14. 22/04/29 17:03:14 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/src/hadoop
  15. Note: /tmp/sqoop-hadoop/compile/eab748b8f3fb956072f4877fdf4bf23a/student.java uses or overrides a deprecated API.
  16. Note: Recompile with -Xlint:deprecation for details.
  17. 22/04/29 17:03:15 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/eab748b8f3fb956072f4877fdf4bf23a/student.jar
  18. 22/04/29 17:03:15 WARN manager.MySQLManager: It looks like you are importing from mysql.
  19. 22/04/29 17:03:15 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
  20. 22/04/29 17:03:15 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
  21. 22/04/29 17:03:15 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
  22. 22/04/29 17:03:15 INFO mapreduce.ImportJobBase: Beginning import of student
  23. SLF4J: Class path contains multiple SLF4J bindings.
  24. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  25. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  26. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  27. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  28. 22/04/29 17:03:15 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  29. 22/04/29 17:03:15 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  30. 22/04/29 17:03:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
  31. Fri Apr 29 17:03:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  32. 22/04/29 17:03:17 INFO db.DBInputFormat: Using read commited transaction isolation
  33. 22/04/29 17:03:17 INFO mapreduce.JobSubmitter: number of splits:1
  34. 22/04/29 17:03:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1651221174197_0004
  35. 22/04/29 17:03:17 INFO impl.YarnClientImpl: Submitted application application_1651221174197_0004
  36. 22/04/29 17:03:17 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1651221174197_0004/
  37. 22/04/29 17:03:17 INFO mapreduce.Job: Running job: job_1651221174197_0004
  38. 22/04/29 17:03:21 INFO mapreduce.Job: Job job_1651221174197_0004 running in uber mode : false
  39. 22/04/29 17:03:21 INFO mapreduce.Job:  map 0% reduce 0%
  40. 22/04/29 17:03:25 INFO mapreduce.Job:  map 100% reduce 0%
  41. 22/04/29 17:03:25 INFO mapreduce.Job: Job job_1651221174197_0004 completed successfully
  42. 22/04/29 17:03:25 INFO mapreduce.Job: Counters: 30
  43.         File System Counters
  44.                 FILE: Number of bytes read=0
  45.                 FILE: Number of bytes written=134251
  46.                 FILE: Number of read operations=0
  47.                 FILE: Number of large read operations=0
  48.                 FILE: Number of write operations=0
  49.                 HDFS: Number of bytes read=87
  50.                 HDFS: Number of bytes written=30
  51.                 HDFS: Number of read operations=4
  52.                 HDFS: Number of large read operations=0
  53.                 HDFS: Number of write operations=2
  54.         Job Counters
  55.                 Launched map tasks=1
  56.                 Other local map tasks=1
  57.                 Total time spent by all maps in occupied slots (ms)=1945
  58.                 Total time spent by all reduces in occupied slots (ms)=0
  59.                 Total time spent by all map tasks (ms)=1945
  60.                 Total vcore-seconds taken by all map tasks=1945
  61.                 Total megabyte-seconds taken by all map tasks=1991680
  62.         Map-Reduce Framework
  63.                 Map input records=3
  64.                 Map output records=3
  65.                 Input split bytes=87
  66.                 Spilled Records=0
  67.                 Failed Shuffles=0
  68.                 Merged Map outputs=0
  69.                 GC time elapsed (ms)=69
  70.                 CPU time spent (ms)=1050
  71.                 Physical memory (bytes) snapshot=179068928
  72.                 Virtual memory (bytes) snapshot=2136522752
  73.                 Total committed heap usage (bytes)=88604672
  74.         File Input Format Counters
  75.                 Bytes Read=0
  76.         File Output Format Counters
  77.                 Bytes Written=30
  78. 22/04/29 17:03:25 INFO mapreduce.ImportJobBase: Transferred 30 bytes in 10.2361 seconds (2.9308 bytes/sec)
  79. 22/04/29 17:03:25 INFO mapreduce.ImportJobBase: Retrieved 3 records.
  80. # 执行以上命令后,在浏览器中访问 master_ip:50070,点击 Utilities 下的 Browse the file system,若能看到 user 目录则表示导入成功
复制代码
  1. [hadoop@master ~]$ hdfs dfs -ls /user/test
  2. Found 2 items
  3. -rw-r--r--   2 hadoop supergroup  0 2022-04-29 17:03 /user/test/_SUCCESS
  4. -rw-r--r--   2 hadoop supergroup 30 2022-04-29 17:03 /user/test/part-m-00000
  5. [hadoop@master ~]$ hdfs dfs -cat /user/test/part-m-00000
  6. 01,zhangsan
  7. 02,lisi
  8. 03,wangwu
复制代码
第10章 Flume组件安装配置

实验一:Flume 组件安装配置

1.1. 实验目标

完成本实验,您应该能够:

  • 掌握下载和解压 Flume
  • 掌握 Flume 组件部署
  • 掌握使用 Flume 发送和接收信息
1.2. 实验要求


  • 了解 Flume 相关知识
  • 认识 Flume 功能应用
  • 认识 Flume 组件设置
1.3. 实验过程

1.3.1. 实验任务一:下载和解压 Flume

使用 root 用户解压 Flume 安装包到“/usr/local/src”路径,并修改解压后文件夹名为 flume。
  1. [root@master ~]# tar xf /opt/software/apache-flume-1.6.0-bin.tar.gz -C /usr/local/src/
  2. [root@master ~]# cd /usr/local/src/
  3. [root@master src]# mv apache-flume-1.6.0-bin/ flume
  4. [root@master src]# chown -R hadoop.hadoop /usr/local/src/
复制代码
1.3.2. 实验任务二:Flume 组件部署

1.3.2.1. 步骤一:使用 root 用户设置 Flume 环境变量,并使环境变量对全部用户生效。
  1. [root@master src]# vim /etc/profile.d/flume.sh
  2. export FLUME_HOME=/usr/local/src/flume
  3. export PATH=${FLUME_HOME}/bin:$PATH
复制代码
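按照前面配置 hadoop.sh、sqoop.sh 时的做法,可以执行 source 让该环境变量在当前会话立即生效(示例):
[root@master src]# source /etc/profile.d/flume.sh
[root@master src]# echo $FLUME_HOME
/usr/local/src/flume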
1.3.2.2. 步骤二:修改 Flume 相应配置文件。

首先,切换到 hadoop 用户,并将当前工作目录切换到 Flume 的配置文件夹。
  1. [hadoop@master ~]$ echo $PATH
  2. /usr/local/src/hbase/bin:/usr/local/src/zookeeper/bin:/usr/local/src/sqoop/bin:/usr/local/src/hive/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/flume/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/src/hive/bin:/home/hadoop/.local/bin:/home/hadoop/bin
复制代码
1.3.2.3. 步骤三:修改 hbase-env.sh 文件,注释掉 HBASE_CLASSPATH(避免与 Flume 的类路径冲突),然后验证 Flume 版本。
  1. [hadoop@master ~]$ vim /usr/local/src/hbase/conf/hbase-env.sh
  2. #export HBASE_CLASSPATH=/usr/local/src/hadoop/etc/hadoop/ #注释掉这一行的内容
  3. export JAVA_HOME=/usr/local/src/jdk
  4. [hadoop@master conf]$ start-all.sh
  5. [hadoop@master ~]$ flume-ng version
  6. Flume 1.6.0
  7. Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
  8. Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
  9. Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
  10. From source with checksum b29e416802ce9ece3269d34233baf43f
复制代码
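如果还想单独维护 Flume 自带的 flume-env.sh,可以参考下面的做法:从 conf 目录下的模板复制一份并写入 JAVA_HOME(这是一个假定使用模板文件 flume-env.sh.template 的示例,本文环境中并非必需):
[hadoop@master ~]$ cd /usr/local/src/flume/conf
[hadoop@master conf]$ cp flume-env.sh.template flume-env.sh
[hadoop@master conf]$ vi flume-env.sh
export JAVA_HOME=/usr/local/src/jdk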
1.3.3. 实验任务三:使用 Flume 发送和接收信息

通过 Flume 将服务器上的数据(本例为 Hadoop 日志目录中的文件)传输到 HDFS 中。
1.3.3.1. 步骤一:在 Flume 安装目录中创建 simple-hdfs-flume.conf 文件。
  1. [hadoop@master ~]$ cd /usr/local/src/flume/
  2. [hadoop@master ~]$ vi /usr/local/src/flume/simple-hdfs-flume.conf
  3. a1.sources=r1
  4. a1.sinks=k1
  5. a1.channels=c1
  6. a1.sources.r1.type=spooldir
  7. a1.sources.r1.spoolDir=/usr/local/src/hadoop/logs/
  8. a1.sources.r1.fileHeader=true
  9. a1.sinks.k1.type=hdfs
  10. a1.sinks.k1.hdfs.path=hdfs://master:9000/tmp/flume
  11. a1.sinks.k1.hdfs.rollSize=1048760
  12. a1.sinks.k1.hdfs.rollCount=0
  13. a1.sinks.k1.hdfs.rollInterval=900
  14. a1.sinks.k1.hdfs.useLocalTimeStamp=true
  15. a1.channels.c1.type=file
  16. a1.channels.c1.capacity=1000
  17. a1.channels.c1.transactionCapacity=100
  18. a1.sources.r1.channels = c1
  19. a1.sinks.k1.channel = c1
复制代码
1.3.3.2. 步骤二:使用 flume-ng agent 命令加载 simple-hdfs-flume.conf 配置信息,启动 flume 传输数据。
  1. [hadoop@master ~]$ flume-ng agent --conf-file simple-hdfs-flume.conf --name a1
复制代码
ctrl+c 退出 flume 传输
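如果希望在前台直接看到传输日志,可以在启动时附加 flume-ng 的日志参数(仅为一个可选的启动示例):
[hadoop@master ~]$ flume-ng agent --conf /usr/local/src/flume/conf --conf-file /usr/local/src/flume/simple-hdfs-flume.conf --name a1 -Dflume.root.logger=INFO,console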
1.3.3.3. 步骤三:查看 Flume 传输到 HDFS 的文件,若能查看到 HDFS 上/tmp/flume目录有传输的数据文件,则表示数据传输成功。
  1. [hadoop@master ~]$ hdfs dfs -ls /
  2. Found 5 items
  3. drwxr-xr-x   - hadoop supergroup          0 2022-04-15 22:04 /hbase
  4. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 18:24 /input
  5. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 18:26 /output
  6. drwxr-xr-x   - hadoop supergroup          0 2022-05-06 17:24 /tmp
  7. drwxr-xr-x   - hadoop supergroup          0 2022-04-29 17:03 /user
复制代码
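更直接的验证方式是列出 /tmp/flume 目录,确认其中已经生成了 Flume 写入的数据文件(检查命令示例,具体文件名会随时间变化):
[hadoop@master ~]$ hdfs dfs -ls /tmp/flume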

第13章 大数据平台监控命令

实验一:通过命令监控大数据平台运行状态

1.1. 实验目标

完成本实验,您应该能够:

  • 掌握大数据平台的运行状况
  • 掌握查看大数据平台运行状况的命令
1.2. 实验要求


  • 认识查看大数据平台运行状态的方式
  • 了解查看大数据平台运行状况的命令
1.3. 实验过程

1.3.1. 实验任务一:通过命令查看大数据平台状态

1.3.1.1. 步骤一: 查看 Linux 系统的信息( uname -a)
  1. [root@master ~]# uname -a
  2. Linux master 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
复制代码
1.3.1.2. 步骤二:查看硬盘信息

(1)查看全部分区(fdisk -l)
  1. [root@master ~]# fdisk -l
  2. Disk /dev/sda: 21.5 GB, 21474836480 bytes, 41943040 sectors
  3. Units = sectors of 1 * 512 = 512 bytes
  4. Sector size (logical/physical): 512 bytes / 512 bytes
  5. I/O size (minimum/optimal): 512 bytes / 512 bytes
  6. Disk label type: dos
  7. Disk identifier: 0x00096169
  8.    Device Boot      Start         End      Blocks   Id  System
  9. /dev/sda1   *        2048     2099199     1048576   83  Linux
  10. /dev/sda2         2099200    41943039    19921920   8e  Linux LVM
  11. Disk /dev/mapper/centos-root: 18.2 GB, 18249416704 bytes, 35643392 sectors
  12. Units = sectors of 1 * 512 = 512 bytes
  13. Sector size (logical/physical): 512 bytes / 512 bytes
  14. I/O size (minimum/optimal): 512 bytes / 512 bytes
复制代码
  1. Disk /dev/mapper/centos-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
  2. Units = sectors of 1 * 512 = 512 bytes
  3. Sector size (logical/physical): 512 bytes / 512 bytes
  4. I/O size (minimum/optimal): 512 bytes / 512 bytes
复制代码
(2)查看全部交换分区(swapon -s)
  1. [root@master ~]# swapon -s
  2. Filename                                Type                Size        Used        Priority
  3. /dev/dm-1                                      partition        2097148        0        -
复制代码
(3)查看文件系统占比(df -h)
  1. [root@master ~]# df -h
  2. Filesystem               Size  Used Avail Use% Mounted on
  3. /dev/mapper/centos-root   17G  4.8G   13G  28% /
  4. devtmpfs                 980M     0  980M   0% /dev
  5. tmpfs                    992M     0  992M   0% /dev/shm
  6. tmpfs                    992M  9.5M  982M   1% /run
  7. tmpfs                    992M     0  992M   0% /sys/fs/cgroup
  8. /dev/sda1               1014M  130M  885M  13% /boot
  9. tmpfs                    199M     0  199M   0% /run/user/0
复制代码
1.3.1.3. 步骤三: 查看网络 IP 地址( ifconfig)
  1. [root@master ~]# ifconfig
  2. ens32: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
  3.         inet 10.10.10.128  netmask 255.255.255.0  broadcast 10.10.10.255
  4.         inet6 fe80::af34:1702:3972:2b64  prefixlen 64  scopeid 0x20<link>
  5.         ether 00:0c:29:2e:33:83  txqueuelen 1000  (Ethernet)
  6.         RX packets 342  bytes 29820 (29.1 KiB)
  7.         RX errors 0  dropped 0  overruns 0  frame 0
  8.         TX packets 257  bytes 26394 (25.7 KiB)
  9.         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  10. lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
  11.         inet 127.0.0.1  netmask 255.0.0.0
  12.         inet6 ::1  prefixlen 128  scopeid 0x10<host>
  13.         loop  txqueuelen 1000  (Local Loopback)
  14.         RX packets 4  bytes 360 (360.0 B)
  15.         RX errors 0  dropped 0  overruns 0  frame 0
  16.         TX packets 4  bytes 360 (360.0 B)
  17.         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
复制代码
1.3.1.4. 步骤四:查看全部监听端口( netstat -lntp)
  1. [root@master ~]# netstat -lntp
  2. Active Internet connections (only servers)
  3. Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   
  4. tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      933/sshd            
  5. tcp6       0      0 :::3306                 :::*                    LISTEN      1021/mysqld         
  6. tcp6       0      0 :::22                   :::*                    LISTEN      933/sshd
复制代码
1.3.1.5. 步骤五:查看全部已经创建的连接( netstat -antp)
  1. [root@master ~]# netstat -antp
  2. Active Internet connections (servers and established)
  3. Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   
  4. tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      933/sshd            
  5. tcp        0     52 10.10.10.128:22         10.10.10.1:59963        ESTABLISHED 1249/sshd: root@pts
  6. tcp6       0      0 :::3306                 :::*                    LISTEN      1021/mysqld         
  7. tcp6       0      0 :::22                   :::*                    LISTEN      933/sshd      
复制代码
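在大数据平台上,也可以用 netstat 配合 grep 快速确认某个服务端口是否在监听,例如检查 HDFS 使用的 9000 端口(示例,需在相应服务已启动的前提下执行):
[root@master ~]# netstat -lntp | grep 9000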
1.3.1.6. 步骤六:实时显示进程状态( top ),该命令可以查看进程对 CPU 、内存的占比等。
  1. [root@master ~]# top
  2. top - 16:09:46 up 47 min,  2 users,  load average: 0.00, 0.01, 0.05
  3. Tasks: 115 total,   1 running, 114 sleeping,   0 stopped,   0 zombie
  4. %Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  5. KiB Mem :  2030172 total,  1575444 free,   281296 used,   173432 buff/cache
  6. KiB Swap:  2097148 total,  2097148 free,        0 used.  1571928 avail Mem
  7.    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                             
  8.   1021 mysql     20   0 1258940 191544   6840 S   0.3  9.4   0:01.71 mysqld                              
  9.      1 root      20   0  125456   3896   2560 S   0.0  0.2   0:00.96 systemd                             
  10.      2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd                           
  11.      3 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/0                        
  12.      5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                        
  13.      7 root      rt   0       0      0      0 S   0.0  0.0   0:00.02 migration/0                        
  14.      8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh                              
  15.      9 root      20   0       0      0      0 S   0.0  0.0   0:00.15 rcu_sched                           
  16.     10 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 lru-add-drain                       
  17.     11 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/0                          
  18.     12 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/1                          
  19.     13 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/1                        
  20.     14 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/1                        
  21.     16 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H                        
  22.     17 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/2                          
  23.     18 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/2                        
  24.     19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/2   
复制代码
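如果只想抓取一次快照而不进入交互界面，可以使用 top 的批处理模式（-b）配合 -n 指定刷新次数；下面是一个参考写法，其中以 mysqld 为例演示只观察指定进程：
[root@master ~]# top -b -n 1 | head -n 15                 # 以批处理方式输出一次结果，只看前 15 行
[root@master ~]# top -b -n 1 -p $(pgrep -d, mysqld)       # 只观察 mysqld 进程（需该进程存在）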
1.3.1.7. 步骤七：查看 CPU 信息( cat /proc/cpuinfo )
  1. [root@master ~]# cat /proc/cpuinfo
  2. processor        : 0
  3. vendor_id        : GenuineIntel
  4. cpu family        : 6
  5. model                : 158
  6. model name        : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  7. stepping        : 10
  8. microcode        : 0xb4
  9. cpu MHz                : 3191.998
  10. cache size        : 12288 KB
  11. physical id        : 0
  12. siblings        : 2
  13. core id                : 0
  14. cpu cores        : 2
  15. apicid                : 0
  16. initial apicid        : 0
  17. fpu                : yes
  18. fpu_exception        : yes
  19. cpuid level        : 22
  20. wp                : yes
  21. flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
  22. bogomips        : 6383.99
  23. clflush size        : 64
  24. cache_alignment        : 64
  25. address sizes        : 45 bits physical, 48 bits virtual
  26. power management:
  27. processor        : 1
  28. vendor_id        : GenuineIntel
  29. cpu family        : 6
  30. model                : 158
  31. model name        : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  32. stepping        : 10
  33. microcode        : 0xb4
  34. cpu MHz                : 3191.998
  35. cache size        : 12288 KB
  36. physical id        : 0
  37. siblings        : 2
  38. core id                : 1
  39. cpu cores        : 2
  40. apicid                : 1
  41. initial apicid        : 1
  42. fpu                : yes
  43. fpu_exception        : yes
  44. cpuid level        : 22
  45. wp                : yes
  46. flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
  47. bogomips        : 6383.99
  48. clflush size        : 64
  49. cache_alignment        : 64
  50. address sizes        : 45 bits physical, 48 bits virtual
  51. power management:
  52. processor        : 2
  53. vendor_id        : GenuineIntel
  54. cpu family        : 6
  55. model                : 158
  56. model name        : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  57. stepping        : 10
  58. microcode        : 0xb4
  59. cpu MHz                : 3191.998
  60. cache size        : 12288 KB
  61. physical id        : 1
  62. siblings        : 2
  63. core id                : 0
  64. cpu cores        : 2
  65. apicid                : 2
  66. initial apicid        : 2
  67. fpu                : yes
  68. fpu_exception        : yes
  69. cpuid level        : 22
  70. wp                : yes
  71. flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
  72. bogomips        : 6383.99
  73. clflush size        : 64
  74. cache_alignment        : 64
  75. address sizes        : 45 bits physical, 48 bits virtual
  76. power management:
  77. processor        : 3
  78. vendor_id        : GenuineIntel
  79. cpu family        : 6
  80. model                : 158
  81. model name        : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  82. stepping        : 10
  83. microcode        : 0xb4
  84. cpu MHz                : 3191.998
  85. cache size        : 12288 KB
  86. physical id        : 1
  87. siblings        : 2
  88. core id                : 1
  89. cpu cores        : 2
  90. apicid                : 3
  91. initial apicid        : 3
  92. fpu                : yes
  93. fpu_exception        : yes
  94. cpuid level        : 22
  95. wp                : yes
  96. flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
  97. bogomips        : 6383.99
  98. clflush size        : 64
  99. cache_alignment        : 64
  100. address sizes        : 45 bits physical, 48 bits virtual
  101. power management:
复制代码
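/proc/cpuinfo 输出较长，若只需要统计逻辑 CPU 个数或查看汇总信息，可参考以下命令：
[root@master ~]# grep -c '^processor' /proc/cpuinfo    # 统计逻辑 CPU 个数
[root@master ~]# lscpu                                 # 查看 CPU 核数、主频等汇总信息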
1.3.1.8. 步骤八:查看内存信息( cat /proc/meminfo ),该命令可以查看总内存、空闲内存等信息。
  1. [root@master ~]# cat /proc/meminfo
  2. MemTotal:        2030172 kB
  3. MemFree:         1575448 kB
  4. MemAvailable:    1571932 kB
  5. Buffers:            2112 kB
  6. Cached:           126676 kB
  7. SwapCached:            0 kB
  8. Active:           251708 kB
  9. Inactive:         100540 kB
  10. Active(anon):     223876 kB
  11. Inactive(anon):     9252 kB
  12. Active(file):      27832 kB
  13. Inactive(file):    91288 kB
  14. Unevictable:           0 kB
  15. Mlocked:               0 kB
  16. SwapTotal:       2097148 kB
  17. SwapFree:        2097148 kB
  18. Dirty:                 0 kB
  19. Writeback:             0 kB
  20. AnonPages:        223648 kB
  21. Mapped:            28876 kB
  22. Shmem:              9668 kB
  23. Slab:              44644 kB
  24. SReclaimable:      18208 kB
  25. SUnreclaim:        26436 kB
  26. KernelStack:        4512 kB
  27. PageTables:         4056 kB
  28. NFS_Unstable:          0 kB
  29. Bounce:                0 kB
  30. WritebackTmp:          0 kB
  31. CommitLimit:     3112232 kB
  32. Committed_AS:     782724 kB
  33. VmallocTotal:   34359738367 kB
  34. VmallocUsed:      180220 kB
  35. VmallocChunk:   34359310332 kB
  36. HardwareCorrupted:     0 kB
  37. AnonHugePages:    178176 kB
  38. CmaTotal:              0 kB
  39. CmaFree:               0 kB
  40. HugePages_Total:       0
  41. HugePages_Free:        0
  42. HugePages_Rsvd:        0
  43. HugePages_Surp:        0
  44. Hugepagesize:       2048 kB
  45. DirectMap4k:       63360 kB
  46. DirectMap2M:     2033664 kB
  47. DirectMap1G:           0 kB
复制代码
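/proc/meminfo 同样可以配合 grep 只提取关心的字段，或者直接用 free 命令查看汇总后的内存使用情况，例如：
[root@master ~]# grep -E 'MemTotal|MemFree|MemAvailable' /proc/meminfo
[root@master ~]# free -h        # 以人类可读单位显示总内存、已用内存和空闲内存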
1.3.2. 实验任务二:通过命令查看 Hadoop 状态

1.3.2.1. 步骤一:切换到 hadoop 用户

若当前的用户为 root,请切换到 hadoop 用户进行操作。
  1. [root@master ~]# su - hadoop
  2. Last login: Tue May 10 14:33:03 CST 2022 on pts/0
  3. [hadoop@master ~]$
复制代码
1.3.2.2. 步骤二:切换到 Hadoop 的安装目录
  1. [hadoop@master ~]$ cd /usr/local/src/hadoop/
  2. [hadoop@master hadoop]$
复制代码
1.3.2.3. 步骤三:启动 Hadoop
  1. [hadoop@master hadoop]$ start-all.sh
  2. This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
  3. Starting namenodes on [master]
  4. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
  5. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  6. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
  7. Starting secondary namenodes [0.0.0.0]
  8. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
  9. starting yarn daemons
  10. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
  11. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  12. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
  13. [hadoop@master hadoop]$ jps
  14. 1697 SecondaryNameNode
  15. 2115 Jps
  16. 1865 ResourceManager
  17. 1498 NameNode
复制代码
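master 上的 jps 只能看到本机进程，DataNode 和 NodeManager 运行在 slave 节点上。如果已为 hadoop 用户配置了到各 slave 的免密登录，可以参考下面的写法远程确认（slave1、slave2 为本文使用的主机名）：
[hadoop@master hadoop]$ for host in slave1 slave2; do echo "==== $host ===="; ssh $host jps; done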
1.3.2.4. 步骤四:关闭 Hadoop
  1. [hadoop@master hadoop]$ stop-all.sh
  2. This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
  3. Stopping namenodes on [master]
  4. master: stopping namenode
  5. 10.10.10.130: stopping datanode
  6. 10.10.10.129: stopping datanode
  7. Stopping secondary namenodes [0.0.0.0]
  8. 0.0.0.0: stopping secondarynamenode
  9. stopping yarn daemons
  10. stopping resourcemanager
  11. 10.10.10.129: stopping nodemanager
  12. 10.10.10.130: stopping nodemanager
  13. no proxyserver to stop
复制代码
实验二:通过命令监控大数据平台资源状态

2.1 实验目标

完成本实验,您应该能够:

  • 掌握大数据平台资源的运行状况
  • 掌握查看大数据平台资源运行状况的命令
2.2. 实验要求


  • 认识查看大数据平台资源运行状态的方式

  • 了解查看大数据平台资源运行状况的命令
2.3. 实验过程

2.3.1. 实验任务一：通过命令查看 YARN 状态

2.3.1.1. 步骤一：确认切换到目录 /usr/local/src/hadoop
  1. [hadoop@master ~]$ cd /usr/local/src/hadoop/
  2. [hadoop@master hadoop]$
复制代码
2.3.1.2. 步骤二：返回主机界面，在 Master 主机上执行 start-all.sh
  1. [hadoop@master ~]$ start-all.sh
  2. This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
  3. Starting namenodes on [master]
  4. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
  5. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slav1.out
  6. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  7. Starting secondary namenodes [0.0.0.0]
  8. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
  9. starting yarn daemons
  10. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
  11. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slav1.out
  12. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  13. [hadoop@master ~]$
  14. #master 节点启动 zookeeper
  15. [hadoop@master hadoop]$ zkServer.sh start
  16. ZooKeeper JMX enabled by default
  17. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  18. Starting zookeeper ... STARTED
  19. #slave1 节点启动 zookeeper
  20. [hadoop@slav1 hadoop]$ zkServer.sh start
  21. ZooKeeper JMX enabled by default
  22. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  23. Starting zookeeper ... STARTED
  24. #slave2 节点启动 zookeeper
  25. [hadoop@slave2 hadoop]$ zkServer.sh start
  26. ZooKeeper JMX enabled by default
  27. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  28. Starting zookeeper ... STARTED
复制代码
2.3.1.3. 步骤三:执行JPS命令,发现Master上有NodeManager进程和ResourceManager进程,则YARN启动完成。
  1. 2817 NameNode
  2. 3681 ResourceManager
  3. 3477 NodeManager
  4. 3909 Jps
  5. 2990 SecondaryNameNode
复制代码
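除了 jps，也可以直接使用 YARN 自带的命令行查看各 NodeManager 的注册情况，下面是一个参考示例：
[hadoop@master hadoop]$ yarn node -list        # 列出当前处于 RUNNING 状态的 NodeManager 节点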
2.3.2. 实验任务二:通过命令查看HDFS状态

2.3.2.1. 步骤一:目录操作

切换到 hadoop 目录,执行 cd /usr/local/src/hadoop 命令
  1. [hadoop@master ~]$ cd /usr/local/src/hadoop
  2. [hadoop@master hadoop]$
复制代码
查看 HDFS 目录
  1. [hadoop@master hadoop]$ ./bin/hdfs dfs -ls /
复制代码
2.3.2.2. 步骤二：查看 HDFS 的报告，执行命令：bin/hdfs dfsadmin -report
  1. [hadoop@master hadoop]$ bin/hdfs dfsadmin -report
  2. Configured Capacity: 36477861888 (33.97 GB)
  3. Present Capacity: 31767752704 (29.59 GB)
  4. DFS Remaining: 31767146496 (29.59 GB)
  5. DFS Used: 606208 (592 KB)
  6. DFS Used%: 0.00%
  7. Under replicated blocks: 0
  8. Blocks with corrupt replicas: 0
  9. Missing blocks: 0
  10. Missing blocks (with replication factor 1): 0
  11. -------------------------------------------------
  12. Live datanodes (2):
  13. Name: 10.10.10.129:50010 (node1)
  14. Hostname: node1
  15. Decommission Status : Normal
  16. Configured Capacity: 18238930944 (16.99 GB)
  17. DFS Used: 303104 (296 KB)
  18. Non DFS Used: 2379792384 (2.22 GB)
  19. DFS Remaining: 15858835456 (14.77 GB)
  20. DFS Used%: 0.00%
  21. DFS Remaining%: 86.95%
  22. Configured Cache Capacity: 0 (0 B)
  23. Cache Used: 0 (0 B)
  24. Cache Remaining: 0 (0 B)
  25. Cache Used%: 100.00%
  26. Cache Remaining%: 0.00%
  27. Xceivers: 1
  28. Last contact: Fri May 20 18:31:48 CST 2022
复制代码
  1. Name: 10.10.10.130:50010 (node2)
  2. Hostname: node2
  3. Decommission Status : Normal
  4. Configured Capacity: 18238930944 (16.99 GB)
  5. DFS Used: 303104 (296 KB)
  6. Non DFS Used: 2330316800 (2.17 GB)
  7. DFS Remaining: 15908311040 (14.82 GB)
  8. DFS Used%: 0.00%
  9. DFS Remaining%: 87.22%
  10. Configured Cache Capacity: 0 (0 B)
  11. Cache Used: 0 (0 B)
  12. Cache Remaining: 0 (0 B)
  13. Cache Used%: 100.00%
  14. Cache Remaining%: 0.00%
  15. Xceivers: 1
  16. Last contact: Fri May 20 18:31:48 CST 2022
复制代码
2.3.2.3. 步骤三：查看 HDFS 空间使用情况，执行命令：hdfs dfs -df
  1. [hadoop@master hadoop]$ hdfs dfs -df
  2. Filesystem                 Size    Used    Available  Use%
  3. hdfs://master:9000  36477861888  606208  31767146496    0%
复制代码
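hdfs dfs -df 默认以字节显示，加上 -h 选项可以换算成人类可读的单位，便于快速判断剩余空间，例如：
[hadoop@master hadoop]$ hdfs dfs -df -h /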
2.3.3. 实验任务三：通过命令查看 HBase 状态

2.3.3.1. 步骤一：启动运行 HBase

切换到 HBase 安装目录/usr/local/src/hbase,命令如下:
  1. [hadoop@master hadoop]$ cd /usr/local/src/hbase
  2. [hadoop@master hbase]$ hbase version
  3. HBase 1.2.1
  4. Source code repository git://asf-dev/home/busbey/projects/hbase revision=8d8a7107dc4ccbf36a92f64675dc60392f85c015
  5. Compiled by busbey on Wed Mar 30 11:19:21 CDT 2016
  6. From source with checksum f4bb4a14bb4e0b72b46f729dae98a772
复制代码
结果显示 HBase 1.2.1，说明 HBase 已正确安装并可以运行，版本号为 1.2.1。
2.3.3.2. 步骤二:查看HBase版本信息

执行命令hbase shell,进入HBase命令交互界面。
  1. [hadoop@master hbase]$ hbase shell
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  6. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  7. HBase Shell; enter 'help<RETURN>' for list of supported commands.
  8. Type "exit<RETURN>" to leave the HBase Shell
  9. Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
复制代码
输入version,查询 HBase 版本
  1. hbase(main):001:0> version
  2. 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
复制代码
结果显示 HBase 版本为 1.2.1
2.3.3.3. 步骤三：查询 HBase 状态，在 HBase 命令交互界面执行 status 命令
  1. 1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667
  2. average load
复制代码
我们还可以“简单”查询 HBase 的状态,执行命令 status 'simple'
  1. active master: master:16000 1589125905790
  2. 0 backup masters
  3. 3 live servers
  4. master:16020 1589125908065
  5. requestsPerSecond=0.0, numberOfOnlineRegions=1,
  6. usedHeapMB=28, maxHeapMB=1918, numberOfStores=1,
  7. numberOfStorefiles=1, storefileUncompressedSizeMB=0,
  8. storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
  9. readRequestsCount=5, writeRequestsCount=1, rootIndexSizeKB=0,
  10. totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
  11. totalCompactingKVs=0, currentCompactedKVs=0,
  12. compactionProgressPct=NaN, coprocessors=[MultiRowMutationEndpoint]
  13. slave1:16020 1589125915820
  14. requestsPerSecond=0.0, numberOfOnlineRegions=0,
  15. usedHeapMB=17, maxHeapMB=440, numberOfStores=0,
  16. numberOfStorefiles=0, storefileUncompressedSizeMB=0,
  17. storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
  18. readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0,
  19. totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
  20. totalCompactingKVs=0, currentCompactedKVs=0,
  21. compactionProgressPct=NaN, coprocessors=[]
  22. slave2:16020 1589125917741
  23. requestsPerSecond=0.0, numberOfOnlineRegions=1,
  24. usedHeapMB=15, maxHeapMB=440, numberOfStores=1,
  25. numberOfStorefiles=1, storefileUncompressedSizeMB=0,
  26. storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
  27. readRequestsCount=4, writeRequestsCount=0, rootIndexSizeKB=0,
  28. totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
  29. totalCompactingKVs=0, currentCompactedKVs=0,
  30. compactionProgressPct=NaN, coprocessors=[]
  31. 0 dead servers
  32. Aggregate load: 0, regions: 2
复制代码
显示更多的关于 Master、Slave1和 Slave2 主机的服务端口、请求时间等具体信息。
如果需要查询更多关于 HBase 状态,执行命令 help 'status'
  1. hbase(main):004:0> help 'status'
  2. Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
  3. default is 'summary'. Examples:
  4.   hbase> status
  5.   hbase> status 'simple'
  6.   hbase> status 'summary'
  7.   hbase> status 'detailed'
  8.   hbase> status 'replication'
  9.   hbase> status 'replication', 'source'
  10.   hbase> status 'replication', 'sink'
复制代码
结果显示出全部关于 status 的命令。
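如果希望在不进入交互界面的情况下查询 HBase 状态（例如写入巡检脚本），可以把命令通过管道传给 hbase shell，下面是一个参考写法：
[hadoop@master hbase]$ echo "status 'summary'" | hbase shell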
2.3.3.4. 步骤四：停止 HBase 服务

如需停止 HBase 服务，执行命令 stop-hbase.sh。
  1. [hadoop@master hbase]$ stop-hbase.sh
  2. stopping hbase.........
复制代码
2.3.4. 实验任务四：通过命令查看 Hive 状态

2.3.4.1. 步骤一：启动 Hive

切换到/usr/local/src/hive 目录,输入 hive,回车。
  1. [hadoop@master ~]$ cd /usr/local/src/hive/
     [hadoop@master hive]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  8. Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
  9. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  10. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  11. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  12. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  13. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  14. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  15. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  16. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  17. Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  18. hive>
复制代码
当显示 hive>时,表示启动成功,进入到了 Hive shell 状态。
2.3.4.2. 步骤二：Hive 操作基本命令

注意:Hive 命令行语句后面一定要加分号。
(1)查看数据库
  1. hive> show databases;
  2. OK
  3. default
  4. sample
  5. Time taken: 0.596 seconds, Fetched: 2 row(s)
  6. hive>
复制代码
结果显示当前有 default 和 sample 两个数据库，其中 default 为默认数据库。
(2)查看 default 数据库全部表
  1. hive> use default;
  2. OK
  3. Time taken: 0.018 seconds
  4. hive> show tables;
  5. OK
  6. test
  7. Time taken: 0.036 seconds, Fetched: 1 row(s)
  8. hive>
复制代码
结果显示 default 数据库中目前只有一张表 test。
(3)创建表 stu,表的 id 为整数型,name 为字符型
  1. hive> create table stu(id int,name string);
  2. OK
  3. Time taken: 0.23 seconds
  4. hive>
复制代码
(4)为表 stu 插入一条数据，id 为 1001，name 为 zhangsan
  1. hive> insert into stu values (1001,"zhangsan");
  2. WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  3. Query ID = hadoop_20220520185326_7c18630d-0690-4b35-8de8-423c9b901677
  4. Total jobs = 3
  5. Launching Job 1 out of 3
  6. Number of reduce tasks is set to 0 since there's no reduce operator
  7. Starting Job = job_1653042072571_0001, Tracking URL = http://master:8088/proxy/application_1653042072571_0001/
  8. Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1653042072571_0001
  9. Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
  10. 2022-05-20 18:56:05,436 Stage-1 map = 0%,  reduce = 0%
  11. 2022-05-20 18:56:11,699 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.47 sec
  12. MapReduce Total cumulative CPU time: 3 seconds 470 msec
  13. Ended Job = job_1653042072571_0001
  14. Stage-4 is selected by condition resolver.
  15. Stage-3 is filtered out by condition resolver.
  16. Stage-5 is filtered out by condition resolver.
  17. Moving data to: hdfs://master:9000/user/hive/warehouse/stu/.hive-staging_hive_2022-05-20_18-55-52_567_2370673334190980235-1/-ext-10000
  18. Loading data to table default.stu
  19. MapReduce Jobs Launched:
  20. Stage-Stage-1: Map: 1   Cumulative CPU: 3.47 sec   HDFS Read: 4138 HDFS Write: 81 SUCCESS
  21. Total MapReduce CPU Time Spent: 3 seconds 470 msec
  22. OK
  23. Time taken: 20.438 seconds
复制代码
按照以上操作,继续插入两条信息:id 和 name 分别为 1002、1003 和 lisi、wangwu。
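作为参考，这两条数据也可以用一条多行 VALUES 语句一次插入（写法仅供参考，Hive 2.0 支持该语法）：
hive> insert into stu values (1002,"lisi"),(1003,"wangwu");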
(5)插入数据后查看表的信息
  1. hive> show tables;
  2. OK
  3. stu
  4. test
  5. values__tmp__table__1
  6. Time taken: 0.017 seconds, Fetched: 3 row(s)
  7. hive>
复制代码
(6)查看表 stu 结构
  1. hive> desc stu;
  2. OK
  3. id                          int                                             
  4. name                        string                                          
  5. Time taken: 0.031 seconds, Fetched: 2 row(s)
  6. hive>
复制代码
(7)查看表 stu 的内容
  1. hive> select * from stu;
  2. OK
  3. 1001        zhangsan
  4. Time taken: 0.077 seconds, Fetched: 1 row(s)
  5. hive>
复制代码
2.3.4.3. 步骤三：通过 Hive 命令行界面查看文件系统和历史命令

(1)查看本地文件系统，执行命令 ! ls /usr/local/src;
  1. hive> ! ls /usr/local/src;
  2. apache-hive-2.0.0-bin
  3. flume
  4. hadoop
  5. hbase
  6. hive
  7. jdk
  8. sqoop
  9. zookeeper
复制代码
(2)查看 HDFS 文件系统,执行命令 dfs -ls /;
  1. hive> dfs -ls /;
  2. Found 5 items
  3. drwxr-xr-x   - hadoop supergroup          0 2022-04-15 22:04 /hbase
  4. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 18:24 /input
  5. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 18:26 /output
  6. drwxr-xr-x   - hadoop supergroup          0 2022-05-20 18:55 /tmp
  7. drwxr-xr-x   - hadoop supergroup          0 2022-04-29 17:03 /user
复制代码
(3)查看在 Hive 中输入的全部历史命令
进入到当前用户 Hadoop 的目录/home/hadoop,查看.hivehistory 文件。
  1. [hadoop@master ~]$ cd /home/hadoop
  2. [hadoop@master ~]$ cat .hivehistory
  3. create database sample;
  4. use sample;
  5. create table student(number STRING,name STRING);
  6. exit;
  7. select * from sample.student;
  8. exit;
  9. show tables;
  10. exit;
  11. show databases;
  12. use default;
  13. show tables;
  14. create table stu(id int,name string);
  15. insert into stu values (1001,"zhangsan");
  16. show tables;
  17. desc stu;
  18. select * from stu;
  19. ! ls /usr/local/src;
  20. dfs -ls /;
  21. exit
  22. ;
复制代码
结果显示,之前在 Hive 命令行界面下运行的全部命令(含错误命令)都显示了出来,有助于维护、故障排查等工作。
实验三：通过命令监控大数据平台服务状态

3.1. 实验目标

完成本实验,您应该能够:

  • 掌握大数据平台服务的运行状况
  • 掌握查看大数据平台服务运行状况的命令
3.2. 实验要求


  • 认识查看大数据平台服务运行状态的方式
  • 了解查看大数据平台服务运行状况的命令
3.3. 实验过程

3.3.1. 实验任务一: 通过命令查看 ZooKeeper 状态

3.3.1.1. 步骤一: 查看ZooKeeper状态,执行命令 zkServer.sh status,结果显示如下
  1. [hadoop@master ~]$ zkServer.sh status
  2. ZooKeeper JMX enabled by default
  3. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  4. Mode: follower
复制代码
以上结果中,Mode:follower 表示为 ZooKeeper 的跟随者。
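除了 zkServer.sh status，也可以利用 ZooKeeper 的四字命令（如 ruok、stat）快速探测服务是否存活，前提是系统中安装了 nc（netcat），示例如下：
[hadoop@master ~]$ echo ruok | nc localhost 2181      # 服务正常时返回 imok
[hadoop@master ~]$ echo stat | nc localhost 2181      # 查看连接数、模式（leader/follower）等统计信息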
3.3.1.2. 步骤二: 查看运行进程

QuorumPeerMain:QuorumPeerMain 是 ZooKeeper 集群的启动入口类,是用来加载配置启动 QuorumPeer线程的。
执行命令 jps 查看进程情况。
  1. [hadoop@master ~]$ jps
  2. 5029 Jps
  3. 3494 SecondaryNameNode
  4. 3947 QuorumPeerMain
  5. 3292 NameNode
  6. 3660 ResourceManager
复制代码
3.3.1.3. 步骤三：在成功启动 ZooKeeper 服务后，输入命令 zkCli.sh，连接到 ZooKeeper 服务。
  1. [hadoop@master ~]$ zkCli.sh
  2. Connecting to localhost:2181
  3. 2022-05-20 19:07:11,924 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.8--1, built on 02/06/2016 03:18 GMT
  4. 2022-05-20 19:07:11,927 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=master
  5. 2022-05-20 19:07:11,927 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_152
  6. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
  7. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/local/src/jdk/jre
  8. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/usr/local/src/zookeeper/bin/../build/classes:/usr/local/src/zookeeper/bin/../build/lib/*.jar:/usr/local/src/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/src/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/src/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/local/src/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/local/src/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/local/src/zookeeper/bin/../zookeeper-3.4.8.jar:/usr/local/src/zookeeper/bin/../src/java/lib/*.jar:/usr/local/src/zookeeper/bin/../conf::/usr/local/src/sqoop/lib
  9. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
  10. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
  11. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
  12. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
  13. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
  14. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.10.0-862.el7.x86_64
  15. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=hadoop
  16. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/hadoop
  17. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/hadoop
  18. 2022-05-20 19:07:11,930 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@69d0a921
  19. Welcome to ZooKeeper!
  20. 2022-05-20 19:07:11,946 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
  21. JLine support is enabled
  22. 2022-05-20 19:07:11,984 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@876] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
  23. 2022-05-20 19:07:11,991 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x180e0fed4990001, negotiated timeout = 30000
  24. WATCHER::
  25. WatchedEvent state:SyncConnected type:None path:null
  26. [zk: localhost:2181(CONNECTED) 0]
复制代码
3.3.1.4. 步骤四：使用 Watch 监听 /hbase 目录，一旦 /hbase 内容有变化将会有提示。打开监视，执行命令 get /hbase 1。
  1. cZxid = 0x100000002
  2. ctime = Thu Apr 23 16:02:29 CST 2022
  3. mZxid = 0x100000002
  4. mtime = Thu Apr 23 16:02:29 CST 2022
  5. pZxid = 0x20000008d
  6. cversion = 26
  7. dataVersion = 0
  8. aclVersion = 0
  9. ephemeralOwner = 0x0
  10. dataLength = 0
  11. numChildren = 16
  12. [zk: localhost:2181(CONNECTED) 1] set /hbase value-update
  13. WATCHER::cZxid = 0x100000002
  14. WatchedEvent state:SyncConnected type:NodeDataChanged
  15. path:/hbase
  16. ctime = Thu Apr 23 16:02:29 CST 2022
  17. mZxid = 0x20000c6d3
  18. mtime = Fri May 15 15:03:41 CST 2022
  19. pZxid = 0x20000008d
  20. cversion = 26
  21. dataVersion = 1
  22. aclVersion = 0
  23. ephemeralOwner = 0x0
  24. dataLength = 12
  25. numChildren = 16
  26. [zk: localhost:2181(CONNECTED) 2] get /hbase
  27. value-update
  28. cZxid = 0x100000002
  29. ctime = Thu Apr 23 16:02:29 CST 2022
  30. mZxid = 0x20000c6d3
  31. mtime = Fri May 15 15:03:41 CST 2022
  32. pZxid = 0x20000008d
  33. cversion = 26
  34. dataVersion = 1
  35. aclVersion = 0
  36. ephemeralOwner = 0x0
  37. dataLength = 12
  38. numChildren = 16
  39. [zk: localhost:2181(CONNECTED) 3] quit
复制代码
结果显示,当执行命令 set /hbase value-update 后,数据版本由 0 变成 1,说明/hbase 处于监控中。
3.3.2. 实验任务二:通过命令查看 Sqoop 状态

3.3.2.1. 步骤一：查询 Sqoop 版本号，验证 Sqoop 是否安装成功。

首先切换到 /usr/local/src/sqoop 目录，执行命令：./bin/sqoop-version
  1. [hadoop@master ~]$ cd /usr/local/src/sqoop
  2. [hadoop@master sqoop]$ ./bin/sqoop-version
  3. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  4. Please set $HCAT_HOME to the root of your HCatalog installation.
  5. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  6. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  7. 22/05/20 19:10:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  8. Sqoop 1.4.7
  9. git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
  10. Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
复制代码
结果显示 Sqoop 1.4.7，说明 Sqoop 版本号为 1.4.7，安装成功。
3.3.2.2. 步骤二: 测试 Sqoop 是否能够成功连接数据库

切换到 Sqoop 的目录，执行命令 bin/sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password123$，命令中“master:3306”为数据库主机名和端口。
  1. [hadoop@master sqoop]$ bin/sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password123$
  2. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  3. Please set $HCAT_HOME to the root of your HCatalog installation.
  4. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  5. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  6. 22/05/20 19:13:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  7. 22/05/20 19:13:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  8. 22/05/20 19:13:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  9. Fri May 20 19:13:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  10. information_schema
  11. hive
  12. mysql
  13. performance_schema
  14. sample
  15. sys
复制代码
结果显示,可以连接到 MySQL,并查看到 Master 主机中 MySQL 的全部库实例,如information_schema、hive、mysql、performance_schema 和 sys 等数据库。
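类似地，还可以用 list-tables 查看某个库中的表，进一步验证 Sqoop 与 MySQL 的连通性。下面以本文已有的 sample 库为例，-P 表示交互式输入密码，比在命令行明文写密码更安全：
[hadoop@master sqoop]$ bin/sqoop list-tables --connect jdbc:mysql://master:3306/sample --username root -P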
3.3.2.3. 步骤三: 执行命令sqoop help ,可以看到如下内容,代表Sqoop 启动成功。
  1. [hadoop@master sqoop]$ sqoop help
  2. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  3. Please set $HCAT_HOME to the root of your HCatalog installation.
  4. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  5. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  6. 22/05/20 19:14:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  7. usage: sqoop COMMAND [ARGS]
  8. Available commands:
  9.   codegen            Generate code to interact with database records
  10.   create-hive-table  Import a table definition into Hive
  11.   eval               Evaluate a SQL statement and display the results
  12.   export             Export an HDFS directory to a database table
  13.   help               List available commands
  14.   import             Import a table from a database to HDFS
  15.   import-all-tables  Import tables from a database to HDFS
  16.   import-mainframe   Import datasets from a mainframe server to HDFS
  17.   job                Work with saved jobs
  18.   list-databases     List available databases on a server
  19.   list-tables        List available tables in a database
  20.   merge              Merge results of incremental imports
  21.   metastore          Run a standalone Sqoop metastore
  22.   version            Display version information
  23. See 'sqoop help COMMAND' for information on a specific command.
复制代码
结果显示了 Sqoop 的常用命令及其功能，与上面 help 输出中 Available commands 部分列出的内容一致。

3.3.3. 实验任务三:通过命令查看Flume状态

3.3.3.1. 步骤一: 检查 Flume安装是否成功,执行flume-ng version 命令,查看 Flume的版本。
  1. [hadoop@master ~]$ cd /usr/local/src/flume
  2. [hadoop@master flume]$ flume-ng version
  3. Flume 1.6.0
  4. Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
  5. Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
  6. Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
  7. From source with checksum b29e416802ce9ece3269d34233baf43f
复制代码
3.3.3.2. 步骤二: 添加 example.conf 到/usr/local/src/flume
  1. [hadoop@master flume]$ cat /usr/local/src/flume/example.conf
  2. a1.sources=r1
  3. a1.sinks=k1
  4. a1.channels=c1
  5. a1.sources.r1.type=spooldir
  6. a1.sources.r1.spoolDir=/usr/local/src/flume/
  7. a1.sources.r1.fileHeader=true
  8. a1.sinks.k1.type=hdfs
复制代码
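上面的 example.conf 只给出了 source 和 sink 的类型，原文此处内容不完整。下面给出一个假设性的补全示例（hdfs.path 等取值均为示意，需按实际环境调整），并附上启动 agent 的参考命令：
# 以下为补全示例（非原文内容），各取值仅作演示
a1.sinks.k1.hdfs.path=hdfs://master:9000/flume        # 假设的 HDFS 输出路径
a1.sinks.k1.hdfs.fileType=DataStream
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
配置补全后，可参考如下命令启动 agent 进行验证：
[hadoop@master flume]$ flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console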
1.3.4 实验任务四:配置 mapred-site.xml
  1. [root@master ~]# cd /usr/local/src/hadoop/etc/hadoop/
  2. [root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
  3. #在文件中<configuration>和</configuration>一对标签之间追加以下配置信息
  4. <configuration>
  5.                 <property>
  6.                                 <name>mapreduce.framework.name</name>
  7.                 <value>yarn</value>
  8.                 </property>
  9.                 <property>
  10.                                 <name>mapreduce.jobhistory.address</name>
  11.                                 <value>master:10020</value>
  12.                 </property>
  13.                 <property>
  14.                                 <name>mapreduce.jobhistory.webapp.address</name>
  15.                                 <value>master:19888</value>
  16.                 </property>
  17. </configuration>
复制代码
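上面配置了 mapreduce.jobhistory.address 和 webapp.address，但 JobHistory 服务不会随 start-all.sh 自动启动。若需要通过 19888 端口查看作业历史，可参考以下命令在集群启动后以 hadoop 用户单独启动：
[hadoop@master ~]$ mr-jobhistory-daemon.sh start historyserver
[hadoop@master ~]$ jps | grep JobHistoryServer        # 确认 JobHistoryServer 进程已启动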
1.3.5 实验任务五:配置 yarn-site.xml
  1. [root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/yarn-site.xml
  2. #在文件中<configuration>和</configuration>一对标签之间追加以下配置信息
  3. <configuration>
  4.                 <property>
  5.                                 <name>yarn.resourcemanager.address</name>
  6.                                 <value>master:8032</value>
  7.                 </property>
  8.                 <property>
  9.                                 <name>yarn.resourcemanager.scheduler.address</name>
  10.                                 <value>master:8030</value>
  11.                 </property>
  12.                 <property>
  13.                                 <name>yarn.resourcemanager.webapp.address</name>
  14.                                 <value>master:8088</value>
  15.                 </property>
  16.                 <property>
  17.                                 <name>yarn.resourcemanager.resource-tracker.address</name>
  18.                                 <value>master:8031</value>
  19.                 </property>
  20.                 <property>
  21.                                 <name>yarn.resourcemanager.admin.address</name>
  22.                                 <value>master:8033</value>
  23.                 </property>
  24.                 <property>
  25.                                 <name>yarn.nodemanager.aux-services</name>
  26.                                 <value>mapreduce_shuffle</value>
  27.                 </property>
  28.                 <property>
  29.                           <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  30.                           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  31.                 </property>
  32. </configuration>
复制代码
1.3.6 实验任务六：Hadoop 其他相关配置

1.3.6.1 步骤一:配置 masters 文件
  1. #修改 masters 配置文件
  2. [root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/masters
  3. #加入以下配置信息
  4. 10.10.10.128
复制代码
1.3.6.2 步骤二:配置 slaves 文件
  1. #修改 slaves 配置文件
  2. [root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/slaves
  3. #删除 localhost,加入以下配置信息
  4. 10.10.10.129
  5. 10.10.10.130
复制代码
1.3.6.3 步骤三:新建用户以及修改目录权限
  1. #新建用户
  2. [root@master ~]# useradd hadoop
  3. [root@master ~]# echo 'hadoop' | passwd --stdin hadoop
  4. Changing password for user hadoop.
  5. passwd: all authentication tokens updated successfully.
  6. #修改目录权限
  7. [root@master ~]# chown -R hadoop.hadoop /usr/local/src/
  8. [root@master ~]# cd /usr/local/src/
  9. [root@master src]# ll
  10. total 0
  11. drwxr-xr-x 11 hadoop hadoop 171 Mar 27 01:51 hadoop
  12. drwxr-xr-x  8 hadoop hadoop 255 Sep 14  2017 jdk
复制代码
1.3.6.4 步骤四:配置master能够免密登录全部slave节点
  1. [root@master ~]# ssh-keygen -t rsa
  2. Generating public/private rsa key pair.
  3. Enter file in which to save the key (/root/.ssh/id_rsa):
  4. Created directory '/root/.ssh'.
  5. Enter passphrase (empty for no passphrase):
  6. Enter same passphrase again:
  7. Your identification has been saved in /root/.ssh/id_rsa.
  8. Your public key has been saved in /root/.ssh/id_rsa.pub.
  9. The key fingerprint is:
  10. SHA256:Ibeslip4Bo9erREJP37u7qhlwaEeMOCg8DlJGSComhk root@master
  11. The key's randomart image is:
  12. +---[RSA 2048]----+
  13. |B.oo |
  14. |Oo.o |
  15. |=o=.  . o|
  16. |E.=.o  + o   |
  17. |.* BS|
  18. |* o =  o |
  19. | * * o+  |
  20. |o O *o   |
  21. |.=.+==   |
  22. +----[SHA256]-----+
  23. [root@master ~]# ssh-copy-id root@slave1
  24. /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
  25. The authenticity of host 'slave1 (10.10.10.129)' can't be established.
  26. ECDSA key fingerprint is SHA256:Z643OMlGh0yMEc5i85oZ7c21NHdkzSZD9hY9K39xzP4.
  27. ECDSA key fingerprint is MD5:e0:ef:47:5f:ad:75:9a:44:08:bc:f2:10:8e:d6:53:4a.
  28. Are you sure you want to continue connecting (yes/no)? yes
  29. /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  30. /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  31. root@slave1's password:
  32. Number of key(s) added: 1
  33. Now try logging into the machine, with:   "ssh 'root@slave1'"
  34. and check to make sure that only the key(s) you wanted were added.
  35. [root@master ~]# ssh-copy-id root@slave2
  36. /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
  37. The authenticity of host 'slave2 (10.10.10.130)' can't be established.
  38. ECDSA key fingerprint is SHA256:Z643OMlGh0yMEc5i85oZ7c21NHdkzSZD9hY9K39xzP4.
  39. ECDSA key fingerprint is MD5:e0:ef:47:5f:ad:75:9a:44:08:bc:f2:10:8e:d6:53:4a.
  40. Are you sure you want to continue connecting (yes/no)? yes
  41. /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  42. /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  43. root@slave2's password:
  44. Number of key(s) added: 1  
  45. Now try logging into the machine, with:   "ssh 'root@slave2'"
  46. and check to make sure that only the key(s) you wanted were added.
  47.    
  48. [root@master ~]# ssh slave1
  49. Last login: Sun Mar 27 02:58:38 2022 from master
  50. [root@slave1 ~]# exit
  51. logout
  52. Connection to slave1 closed.
  53. [root@master ~]# ssh slave2
  54. Last login: Sun Mar 27 00:26:12 2022 from 10.10.10.1
  55. [root@slave2 ~]# exit
  56. logout
  57. Connection to slave2 closed.
复制代码
1.3.6.5 步骤五:同步/usr/local/src/目录下全部文件至全部slave节点
  1. [root@master ~]# scp -r /usr/local/src/* root@slave1:/usr/local/src/
  2. [root@master ~]# scp -r /usr/local/src/* root@slave2:/usr/local/src/
  3. [root@master ~]# scp /etc/profile.d/hadoop.sh root@slave1:/etc/profile.d/
  4. hadoop.sh                                   100%  151    45.9KB/s   00:00
  5.    
  6. [root@master ~]# scp /etc/profile.d/hadoop.sh root@slave2:/etc/profile.d/
  7. hadoop.sh                                   100%  151    93.9KB/s   00:00   
复制代码
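同步完成后，可以参考下面的命令在 master 上远程确认文件已经复制到各 slave 节点：
[root@master ~]# ssh root@slave1 'ls /usr/local/src/ && cat /etc/profile.d/hadoop.sh'
[root@master ~]# ssh root@slave2 'ls /usr/local/src/'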
1.3.6.6 步骤六:在全部slave节点执行以下命令
  1. (1)在slave1
  2. [root@slave1 ~]# useradd hadoop
  3. [root@slave1 ~]# echo 'hadoop' | passwd --stdin hadoop
  4. Changing password for user hadoop.
  5. passwd: all authentication tokens updated successfully.
  6. [root@slave1 ~]# chown -R hadoop.hadoop /usr/local/src/
  7. [root@slave1 ~]# ll /usr/local/src/
  8. total 0
  9. drwxr-xr-x 11 hadoop hadoop 171 Mar 27 03:07 hadoop
  10. drwxr-xr-x  8 hadoop hadoop 255 Mar 27 03:07 jdk
  11. [root@slave1 ~]# source /etc/profile.d/hadoop.sh
  12. [root@slave1 ~]# echo $PATH
  13. /usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
  14. (2)在slave2
  15. [root@slave2 ~]# useradd hadoop
  16. [root@slave2 ~]# echo 'hadoop' | passwd --stdin hadoop
  17. Changing password for user hadoop.
  18. passwd: all authentication tokens updated successfully.
  19. [root@slave2 ~]# chown -R hadoop.hadoop /usr/local/src/
  20. [root@slave2 ~]# ll /usr/local/src/
  21. total 0
  22. drwxr-xr-x 11 hadoop hadoop 171 Mar 27 03:09 hadoop
  23. drwxr-xr-x  8 hadoop hadoop 255 Mar 27 03:09 jdk
  24. [root@slave2 ~]# source /etc/profile.d/hadoop.sh
  25. [root@slave2 ~]# echo $PATH
  26. /usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
复制代码
第5章 Hadoop集群运行

实验一:hadoop 集群运行

1.1 实验目标

完成本实验,您应该能够:

  • 掌握 hadoop 的运行状态
  • 掌握 hadoop 文件系统格式化配置
  • 掌握 hadoop java 运行状态查看
  • 掌握 hadoop hdfs 报告查看
  • 掌握 hadoop 节点状态查看
  • 掌握停止 hadoop 进程操作
1.2 实验要求


  • 认识怎样查看 hadoop 的运行状态
  • 认识停止 hadoop 进程的操作
1.3 实验过程

1.3.1 实验任务一:配置 Hadoop 格式化

1.3.1.1 步骤一:NameNode 格式化

将 NameNode 上的数据清零，第一次启动 HDFS 时要进行格式化，以后启动无需再格式化，否则会缺失 DataNode 进程。另外，只要运行过 HDFS，Hadoop 的工作目录（本书设置为/usr/local/src/hadoop/tmp）就会有数据，如果需要重新格式化，则在格式化之前一定要先删除工作目录下的数据，否则格式化时会出问题。
执行如下命令,格式化 NameNode
  1. [root@master ~]# su - hadoop
  2. Last login: Fri Apr  1 23:34:46 CST 2022 on pts/1
  3. [hadoop@master ~]$ cd /usr/local/src/hadoop/
  4. [hadoop@master hadoop]$ ./bin/hdfs namenode -format
  5. 22/04/02 01:22:42 INFO namenode.NameNode: STARTUP_MSG:
  6. /************************************************************
复制代码
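如前所述，若此前已经运行过 HDFS，需要重新格式化时应先清空工作目录和数据目录，再执行格式化。下面是按本文目录约定给出的参考操作（会删除已有的 HDFS 数据，请谨慎执行；slave 节点上的 dfs/data 目录也需同步清理）：
[hadoop@master hadoop]$ rm -rf /usr/local/src/hadoop/tmp/*
[hadoop@master hadoop]$ rm -rf /usr/local/src/hadoop/dfs/name/* /usr/local/src/hadoop/dfs/data/*
[hadoop@master hadoop]$ ./bin/hdfs namenode -format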
1.3.1.2 步骤二:启动 NameNode
  1. [hadoop@master hadoop]$ hadoop-daemon.sh start namenode
  2. namenode running as process 11868. Stop it first.
复制代码
1.3.2 实验任务二:查看 Java 进程

启动完成后,可以使用 JPS 命令查看是否成功。JPS 命令是 Java 提供的一个显示当前全部 Java 进程 pid 的命令。
  1. [hadoop@master hadoop]$ jps
  2. 12122 Jps
  3. 11868 NameNode
复制代码
1.3.2.1 步骤一:切换到Hadoop用户
  1. [hadoop@master ~]$ su - hadoop
  2. Password:
  3. Last login: Sat Apr  2 01:22:13 CST 2022 on pts/1
  4. Last failed login: Sat Apr  2 04:47:08 CST 2022 on pts/1
  5. There was 1 failed login attempt since the last successful login.
复制代码
1.3.3 实验任务三:查看 HDFS 的报告
  1. [hadoop@master ~]$ hdfs dfsadmin -report
  2. Configured Capacity: 0 (0 B)
  3. Present Capacity: 0 (0 B)
  4. DFS Remaining: 0 (0 B)
  5. DFS Used: 0 (0 B)
  6. DFS Used%: NaN%
  7. Under replicated blocks: 0
  8. Blocks with corrupt replicas: 0
  9. Missing blocks: 0
  10. Missing blocks (with replication factor 1): 0
  11. -------------------------------------------------
复制代码
1.3.3.1 步骤一:生成密钥
  1. [hadoop@master ~]$ ssh-keygen -t rsa
  2. Generating public/private rsa key pair.
  3. Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
  4. Created directory '/home/hadoop/.ssh'.
  5. Enter passphrase (empty for no passphrase):
  6. Enter same passphrase again:
  7. Your identification has been saved in /home/hadoop/.ssh/id_rsa.
  8. Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
  9. The key fingerprint is:
  10. SHA256:nW/cVxmRp5Ht9TKGT61OmGbhQtkBdpHyS5prGhx24pI hadoop@master.example.com
  11. The key's randomart image is:
  12. +---[RSA 2048]----+
  13. |  o.oo +.|
  14. | ...o o.=|
  15. |   = o *+|
  16. | .o.* * *|
  17. |S.+= O =.|
  18. |   = ++oB.+ .|
  19. |  E +  =+o. .|
  20. |   . .o.  .. |
  21. |.o   |
  22. +----[SHA256]-----+
  23. [hadoop@master ~]$ ssh-copy-id slave1
  24. /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
  25. The authenticity of host 'slave1 (10.10.10.129)' can't be established.
  26. ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
  27. ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
  28. Are you sure you want to continue connecting (yes/no)? yes
  29. /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  30. /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  31. hadoop@slave1's password:
  32. Number of key(s) added: 1
  33. Now try logging into the machine, with:   "ssh 'slave1'"
  34. and check to make sure that only the key(s) you wanted were added.
  35. [hadoop@master ~]$ ssh-copy-id slave2
  36. /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
  37. The authenticity of host 'slave2 (10.10.10.130)' can't be established.
  38. ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
  39. ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
  40. Are you sure you want to continue connecting (yes/no)? yes
  41. /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  42. /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  43. hadoop@slave2's password:
  44. Number of key(s) added: 1
  45. Now try logging into the machine, with:   "ssh 'slave2'"
  46. and check to make sure that only the key(s) you wanted were added.
  47. [hadoop@master ~]$ ssh-copy-id master
  48. /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
  49. The authenticity of host 'master (10.10.10.128)' can't be established.
  50. ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
  51. ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
  52. Are you sure you want to continue connecting (yes/no)? yes
  53. /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
  54. /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
  55. hadoop@master's password:
  56. Number of key(s) added: 1
  57. Now try logging into the machine, with:   "ssh 'master'"
  58. and check to make sure that only the key(s) you wanted were added.
复制代码
1.3.4 实验任务四：停止 HDFS（stop-dfs.sh）
  1. [hadoop@master ~]$ stop-dfs.sh
  2. Stopping namenodes on [master]
  3. master: stopping namenode
  4. 10.10.10.129: no datanode to stop
  5. 10.10.10.130: no datanode to stop
  6. Stopping secondary namenodes [0.0.0.0]
  7. The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
  8. ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
  9. ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
  10. Are you sure you want to continue connecting (yes/no)? yes
  11. 0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
  12. 0.0.0.0: no secondarynamenode to stop
复制代码
1.3.4.1 重启并验证
  1. [hadoop@master ~]$ start-dfs.sh
  2. Starting namenodes on [master]
  3. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
  4. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
  5. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  6. Starting secondary namenodes [0.0.0.0]
  7. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out
  8. [hadoop@master ~]$ start-yarn.sh
  9. starting yarn daemons
  10. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.example.com.out
  11. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
  12. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  13. [hadoop@master ~]$ jps
  14. 12934 NameNode
  15. 13546 Jps
  16. 13131 SecondaryNameNode
  17. 13291 ResourceManager
  18. 如果在master上看到ResourceManager,并且在slave上看到NodeManager就表示成功
  19. [hadoop@master ~]$ jps
  20. 12934 NameNode
  21. 13546 Jps
  22. 13131 SecondaryNameNode
  23. 13291 ResourceManager
  24. [root@slave1 ~]# jps
  25. 11906 NodeManager
  26. 11797 DataNode
  27. 12037 Jps
  28. [root@slave2 ~]# jps
  29. 12758 NodeManager
  30. 12648 DataNode
  31. 12889 Jps
  32. [hadoop@master ~]$ hdfs dfs -mkdir /input
  33. [hadoop@master ~]$ hdfs dfs -ls /
  34. Found 1 items
  35. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 05:18 /input
  36. [hadoop@master ~]$ mkdir ~/input
  37. [hadoop@master ~]$ vim ~/input/data.txt
  38. Hello World
  39. Hello Hadoop
  40. Hello Huasan
  41. ~
  42. [hadoop@master ~]$ hdfs dfs -put ~/input/data.txt
  43. .bash_logout       .bashrc            .oracle_jre_usage/ .viminfo           
  44. .bash_profile      input/             .ssh/              
  45. [hadoop@master ~]$ hdfs dfs -put ~/input/data.txt /input
  46. [hadoop@master ~]$ hdfs dfs -cat /input/data.txt
  47. Hello World
  48. Hello Hadoop
  49. Hello Huasan
  50. [hadoop@master ~]$ hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input/data.txt /output
  51. 22/04/02 05:31:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
  52. 22/04/02 05:31:21 INFO input.FileInputFormat: Total input paths to process : 1
  53. 22/04/02 05:31:21 INFO mapreduce.JobSubmitter: number of splits:1
  54. 22/04/02 05:31:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1648846845675_0001
  55. 22/04/02 05:31:22 INFO impl.YarnClientImpl: Submitted application application_1648846845675_0001
  56. 22/04/02 05:31:22 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1648846845675_0001/
  57. 22/04/02 05:31:22 INFO mapreduce.Job: Running job: job_1648846845675_0001
  58. 22/04/02 05:31:30 INFO mapreduce.Job: Job job_1648846845675_0001 running in uber mode : false
  59. 22/04/02 05:31:30 INFO mapreduce.Job:  map 0% reduce 0%
  60. 22/04/02 05:31:38 INFO mapreduce.Job:  map 100% reduce 0%
  61. 22/04/02 05:31:42 INFO mapreduce.Job:  map 100% reduce 100%
  62. 22/04/02 05:31:42 INFO mapreduce.Job: Job job_1648846845675_0001 completed successfully
  63. 22/04/02 05:31:42 INFO mapreduce.Job: Counters: 49
  64.     File System Counters
  65.             FILE: Number of bytes read=56
  66.             FILE: Number of bytes written=230931
  67.             FILE: Number of read operations=0
  68.             FILE: Number of large read operations=0
  69.             FILE: Number of write operations=0
  70.             HDFS: Number of bytes read=136
  71.             HDFS: Number of bytes written=34
  72.             HDFS: Number of read operations=6
  73.             HDFS: Number of large read operations=0
  74.             HDFS: Number of write operations=2
  75.     Job Counters
  76.             Launched map tasks=1
  77.             Launched reduce tasks=1
  78.             Data-local map tasks=1
  79.             Total time spent by all maps in occupied slots (ms)=5501
  80.             Total time spent by all reduces in occupied slots (ms)=1621
  81.             Total time spent by all map tasks (ms)=5501
  82.             Total time spent by all reduce tasks (ms)=1621
  83.             Total vcore-seconds taken by all map tasks=5501
  84.             Total vcore-seconds taken by all reduce tasks=1621
  85.             Total megabyte-seconds taken by all map tasks=5633024
  86.             Total megabyte-seconds taken by all reduce tasks=1659904
  87.     Map-Reduce Framework
  88.             Map input records=3
  89.             Map output records=6
  90.             Map output bytes=62
  91.             Map output materialized bytes=56
  92.             Input split bytes=98
  93.             Combine input records=6
  94.             Combine output records=4
  95.             Reduce input groups=4
  96.             Reduce shuffle bytes=56
  97.             Reduce input records=4
  98.             Reduce output records=4
  99.             Spilled Records=8
  100.             Shuffled Maps =1
  101.             Failed Shuffles=0
  102.             Merged Map outputs=1
  103.             GC time elapsed (ms)=572
  104.             CPU time spent (ms)=1860
  105.             Physical memory (bytes) snapshot=428474368
  106.             Virtual memory (bytes) snapshot=4219695104
  107.             Total committed heap usage (bytes)=284164096
  108.     Shuffle Errors
  109.             BAD_ID=0
  110.             CONNECTION=0
  111.             IO_ERROR=0
  112.             WRONG_LENGTH=0
  113.             WRONG_MAP=0
  114.             WRONG_REDUCE=0
  115.     File Input Format Counters
  116.             Bytes Read=38
  117.     File Output Format Counters
  118.             Bytes Written=34
  119. [hadoop@master ~]$ hdfs dfs -cat /output/part-r-00000
  120. Hadoop  1
  121. Hello   3
  122. Huasan  1
  123. World   1
复制代码
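One practical note: MapReduce refuses to run if the output directory already exists. If you want to repeat the wordcount example, a quick way (a sketch reusing the same paths as above) is:
  # Remove the previous output directory, re-run the example job, then print the result
  hdfs dfs -rm -r -f /output
  hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input/data.txt /output
  hdfs dfs -cat /output/part-r-00000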
Chapter 6: Hive Component Installation and Configuration

Lab 1: Installing and Configuring the Hive Component

1.1. Objectives

After completing this lab, you should be able to:

  • Install and configure the Hive component
  • Initialize and start the Hive component
1.2. Requirements


  • Be familiar with Hive component installation and configuration
  • Understand Hive component initialization and startup
1.3. Procedure

1.3.1. Task 1: Download and extract the installation files

1.3.1.1. Step 1: Prerequisites and preparation

The Hive component runs on top of a Hadoop system, so make sure Hadoop is working normally before installing Hive. This chapter installs the Hive component on the master node of the fully distributed Hadoop cluster deployed earlier.
The deployment plan and package locations for the Hive component are as follows:
(1) A fully distributed Hadoop system is already installed in the current environment.
(2) MySQL is installed locally (account root, password Password123$); the packages are under /opt/software/mysql-5.7.18.
(3) MySQL listens on port 3306.
(4) The MySQL JDBC driver /opt/software/mysql-connector-java-5.1.47.jar is used for the Hive metastore.
(5) The Hive package is /opt/software/apache-hive-2.0.0-bin.tar.gz.
1.3.1.2. Step 2: Extract the installation file

(1) As the root user, extract the Hive package /opt/software/apache-hive-2.0.0-bin.tar.gz into /usr/local/src.
  1. [root@master ~]# tar -zxvf /opt/software/apache-hive-2.0.0-bin.tar.gz -C /usr/local/src/
复制代码
(2) Rename the extracted apache-hive-2.0.0-bin folder to hive.
  1. [root@master ~]# mv /usr/local/src/apache-hive-2.0.0-bin/ /usr/local/src/hive/
复制代码
(3) Change the owner and group of the hive directory to hadoop.
  1. [root@master ~]# chown -R hadoop:hadoop /usr/local/src/hive
复制代码
1.3.2. Task 2: Set up the Hive environment

1.3.2.1. Step 1: Remove the MariaDB database

Hive stores its metadata in a MySQL database, so before deploying the Hive component you must first install MySQL on the Linux system and configure its character set, security initialization, and remote-access permissions. Log in as the root user and perform the following steps:
(1) Stop the Linux firewall and disable it so that it does not start automatically at boot.
  1. [root@master ~]# systemctl stop firewalld
  2. [root@master ~]# systemctl disable firewalld
复制代码
(2) Remove the MariaDB packages that ship with the Linux system.

  • First check whether MariaDB is installed:
    [root@master ~]# rpm -qa | grep mariadb
  • Remove any MariaDB packages that are found. In this environment the query returns nothing, so there is nothing to uninstall.
1.3.2.2. Step 2: Install the MySQL database

(1) Install the MySQL packages in the following order: mysql-community-common, mysql-community-libs, mysql-community-client.
  1. [root@master ~]# cd /opt/software/mysql-5.7.18/
  2. [root@master mysql-5.7.18]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm
  3. warning: mysql-community-common-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
  4. Preparing...  ################################# [100%]
  5. package mysql-community-common-5.7.18-1.el7.x86_64 is already installed
  6. [root@master mysql-5.7.18]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm
  7. warning: mysql-community-libs-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
  8. Preparing...  ################################# [100%]
  9. package mysql-community-libs-5.7.18-1.el7.x86_64 is already installed
  10. [root@master mysql-5.7.18]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm
  11. warning: mysql-community-client-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
  12. Preparing...  ################################# [100%]
  13. package mysql-community-client-5.7.18-1.el7.x86_64 is already installed
复制代码
(2) Install the mysql-community-server package.
  1. [root@master mysql-5.7.18]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm
  2. warning: mysql-community-server-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
  3. Preparing...  ################################# [100%]
  4. package mysql-community-server-5.7.18-1.el7.x86_64 is already installed
复制代码
(3) Adjust the MySQL configuration by adding the following options to /etc/my.cnf.

Add the lines below directly underneath the symbolic-links=0 line in /etc/my.cnf.
  1. default-storage-engine=innodb
  2. innodb_file_per_table
  3. collation-server=utf8_general_ci
  4. init-connect='SET NAMES utf8'
  5. character-set-server=utf8
复制代码
(4) Start the MySQL service.
  1. [root@master ~]# systemctl start mysqld
复制代码
(5) Check the MySQL service status. If the mysqld service is active (running), MySQL is running normally.
If the mysqld service shows failed, MySQL did not start correctly; recheck the /etc/my.cnf file.
  1. [root@master ~]# systemctl status mysqld
  2. ● mysqld.service - MySQL Server
  3.    Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
  4.    Active: active (running) since Sun 2022-04-10 22:54:39 CST; 1h 0min ago
  5. Docs: man:mysqld(8)
  6.    http://dev.mysql.com/doc/refman/en/using-systemd.html
  7. Main PID: 929 (mysqld)
  8.    CGroup: /system.slice/mysqld.service
  9.    └─929 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/my...
  10. Apr 10 22:54:35 master systemd[1]: Starting MySQL Server...
  11. Apr 10 22:54:39 master systemd[1]: Started MySQL Server.
复制代码
(6) Look up the temporary root password generated by MySQL.
  1. [root@master ~]# cat /var/log/mysqld.log | grep password
  2. 2022-04-08T16:20:04.456271Z 1 [Note] A temporary password is generated for root@localhost: 0yf>>yWdMd8_
复制代码
The default password is generated randomly at install time, so it differs from one installation to the next.
(7) Initialize the MySQL database. (The temporary password obtained above: 0yf>>yWdMd8_)
Run the mysql_secure_installation command to initialize MySQL. During initialization you must set a password for the database root user; it has to satisfy the password policy (upper- and lower-case letters, digits, and special characters). Here the password is set to Password123$.
The following interactive prompts appear during initialization:
1) Change the password for root ? ((Press y|Y for Yes, any other key for No) — whether to change the root password; type y and press Enter.
2) Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) — whether to keep the password just entered; type y and press Enter.
3) Remove anonymous users? (Press y|Y for Yes, any other key for No) — whether to remove anonymous users; type y and press Enter.
4) Disallow root login remotely? (Press y|Y for Yes, any other key for No) — whether to forbid remote root logins; type n and press Enter, so that root can still log in remotely.
5) Remove test database and access to it? (Press y|Y for Yes, any other key for No) — whether to drop the test database; type y and press Enter.
6) Reload privilege tables now? (Press y|Y for Yes, any other key for No) — whether to reload the privilege tables; type y and press Enter.
The mysql_secure_installation session looks like this:
  1. [root@master ~]# mysql_secure_installation
  2. Securing the MySQL server deployment.
  3. Enter password for user root:
  4. The 'validate_password' plugin is installed on the server.
  5. The subsequent steps will run with the existing configuration
  6. of the plugin.
  7. Using existing password for root.
  8. Estimated strength of the password: 100
  9. Change the password for root ? ((Press y|Y for Yes, any other key for No) : y
  10. New password:
  11. Re-enter new password:
  12. Estimated strength of the password: 100
  13. Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y
  14. By default, a MySQL installation has an anonymous user,
  15. allowing anyone to log into MySQL without having to have
  16. a user account created for them. This is intended only for
  17. testing, and to make the installation go a bit smoother.
  18. You should remove them before moving into a production
  19. environment.
  20. Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
  21. Success.
复制代码
  1. Normally, root should only be allowed to connect from
  2. 'localhost'. This ensures that someone cannot guess at
  3. the root password from the network.
  4. Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n
  5. ... skipping.
  6. By default, MySQL comes with a database named 'test' that
  7. anyone can access. This is also intended only for testing,
  8. and should be removed before moving into a production
  9. environment.
复制代码
  1. Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
  2. - Dropping test database...
  3. Success.
  4. - Removing privileges on test database...
  5. Success.
  6. Reloading the privilege tables will ensure that all changes
  7. made so far will take effect immediately.
  8. Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
  9. Success.
  10. All done!
复制代码
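After the secure-installation run it is worth confirming both that the new password works and that the character-set options added to /etc/my.cnf took effect; a quick check (a sketch):
  # Log in with the new password and inspect the server character set (expected: utf8 / utf8_general_ci)
  mysql -uroot -pPassword123$ -e "SHOW VARIABLES LIKE 'character_set_server'; SHOW VARIABLES LIKE 'collation_server';"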
(8) Grant the root user access to MySQL tables from both localhost and remote hosts.
  1. [root@master ~]# mysql -u root -p
  2. Enter password:
  3. Welcome to the MySQL monitor.  Commands end with ; or \g.
  4. Your MySQL connection id is 9
  5. Server version: 5.7.18 MySQL Community Server (GPL)
  6. Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
  7. Oracle is a registered trademark of Oracle Corporation and/or its
  8. affiliates. Other names may be trademarks of their respective
  9. owners.
  10. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
  11. mysql> grant all privileges on *.* to root@'localhost' identified by 'Password123$';
  12. Query OK, 0 rows affected, 1 warning (0.00 sec)
  13. mysql> grant all privileges on *.* to root@'%' identified by 'Password123$';
  14. Query OK, 0 rows affected, 1 warning (0.00 sec)
  15. mysql> flush privileges;
  16. Query OK, 0 rows affected (0.00 sec)
  17. mysql> select user,host from mysql.user where user='root';
  18. +------+-----------+
  19. | user | host  |
  20. +------+-----------+
  21. | root | % |
  22. | root | localhost |
  23. +------+-----------+
  24. 2 rows in set (0.00 sec)
  25. mysql> exit;
  26. Bye
复制代码
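To confirm that the root@'%' grant really allows non-local connections, you can connect by hostname rather than through the local socket; a sketch (run on master, or on any node that has the mysql client installed):
  # Connecting to the hostname "master" goes over TCP and exercises the root@'%' grant
  mysql -h master -uroot -pPassword123$ -e "SELECT user,host FROM mysql.user WHERE user='root';"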
1.3.2.3. Step 3: Configure the Hive component

(1) Set the Hive environment variables and make them take effect.
  1. [root@master ~]# vim /etc/profile
  2. export HIVE_HOME=/usr/local/src/hive
  3. export PATH=$PATH:$HIVE_HOME/bin
  4. [root@master ~]# source /etc/profile
复制代码
(2) Modify the Hive configuration file.
Switch to the hadoop user to perform the following Hive configuration steps.
Copy the hive-default.xml.template file in /usr/local/src/hive/conf to hive-site.xml.
  1. [root@master ~]# su - hadoop
  2. Last login: Sun Apr 10 23:27:25 CS
  3. [hadoop@master ~]$ cp /usr/local/src/hive/conf/hive-default.xml.template  /usr/local/src/hive/conf/hive-site.xml
复制代码
(3) Edit hive-site.xml with vi to point Hive at the MySQL database and to set the paths Hive uses for temporary files.
  1. [hadoop@master ~]$ vi /usr/local/src/hive/conf/hive-site.xml
复制代码
1) Set the MySQL connection URL.
  1. <name>javax.jdo.option.ConnectionURL</name>
  2. <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  3. <description>JDBC connect string for a JDBC metastore</description>
复制代码
2) Set the password of the MySQL root user.
  1. <property>
  2. <name>javax.jdo.option.ConnectionPassword</name>
  3. <value>Password123$</value>
  4. <description>password to use against metastore database</description>
  5. </property>
复制代码
3) Metastore schema version consistency check. If the default is already false, no change is needed.
  1. <property>
  2. <name>hive.metastore.schema.verification</name>
  3. <value>false</value>
  4. <description>
  5. Enforce metastore schema version consistency.
  6. True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
  7. False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
  8. </description>
  9. </property>
复制代码
4) Set the JDBC driver class.
  1. <property>
  2. <name>javax.jdo.option.ConnectionDriverName</name>
  3. <value>com.mysql.jdbc.Driver</value>
  4. <description>Driver class name for a JDBC metastore</description>
  5. </property>
复制代码
5) Set the database user name javax.jdo.option.ConnectionUserName to root.
  1. <property>
  2. <name>javax.jdo.option.ConnectionUserName</name>
  3. <value>root</value>
  4. <description>Username to use against metastore database</description>
  5. </property>
复制代码
6) Replace every occurrence of ${system:java.io.tmpdir}/${system:user.name} with the /usr/local/src/hive/tmp directory or one of its subdirectories.
The following four properties need to be changed:
  1. <name>hive.querylog.location</name>
  2. <value>/usr/local/src/hive/tmp</value>
  3. <description>Location of Hive run time structured log file</description>
  4. <name>hive.exec.local.scratchdir</name>
  5. <value>/usr/local/src/hive/tmp</value>
  6. <name>hive.downloaded.resources.dir</name>
  7. <value>/usr/local/src/hive/tmp/resources</value>
  8. <name>hive.server2.logging.operation.log.location</name>
  9. <value>/usr/local/src/hive/tmp/operation_logs</value>
复制代码
7) Create the tmp folder inside the Hive installation directory.
  1. [hadoop@master ~]$ mkdir /usr/local/src/hive/tmp
复制代码
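Before moving on, a quick way to confirm that all the placeholder values were actually replaced (a sketch):
  # No output from grep means no ${system:java.io.tmpdir} placeholders are left in hive-site.xml
  grep -n 'system:java.io.tmpdir' /usr/local/src/hive/conf/hive-site.xml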
At this point, installation and configuration of the Hive component are complete.
1.3.2.4. Step 4: Initialize the Hive metastore

1) Copy the MySQL JDBC driver (/opt/software/mysql-connector-java-5.1.46.jar) into the lib directory of the Hive installation.
  1. [hadoop@master ~]$ cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/
复制代码
2) Restart Hadoop.
  1. [hadoop@master ~]$ stop-all.sh
  2. This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
  3. Stopping namenodes on [master]
  4. master: stopping namenode
  5. 10.10.10.129: stopping datanode
  6. 10.10.10.130: stopping datanode
  7. Stopping secondary namenodes [0.0.0.0]
  8. 0.0.0.0: stopping secondarynamenode
  9. stopping yarn daemons
  10. stopping resourcemanager
  11. 10.10.10.129: stopping nodemanager
  12. 10.10.10.130: stopping nodemanager
  13. no proxyserver to stop
  14. [hadoop@master ~]$ start-all.sh
  15. This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
  16. Starting namenodes on [master]
  17. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
  18. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
  19. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  20. Starting secondary namenodes [0.0.0.0]
  21. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
  22. starting yarn daemons
  23. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
  24. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  25. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
复制代码
3) Initialize the metastore database.
  1. [hadoop@master ~]$ schematool -initSchema -dbType mysql
  2. which: no hbase in (/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/src/hive/bin:/home/hadoop/.local/bin:/home/hadoop/bin)
  3. SLF4J: Class path contains multiple SLF4J bindings.
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  7. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  8. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  9. Metastore connection URL:jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&us eSSL=false
  10. Metastore Connection Driver :com.mysql.jdbc.Driver
  11. Metastore connection User:   root
  12. Mon Apr 11 00:46:32 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  13. Starting metastore schema initialization to 2.0.0
  14. Initialization script hive-schema-2.0.0.mysql.sql
  15. Password123$
  16. Password123$
  17. No current connection
  18. org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
复制代码
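The "Schema initialization FAILED" message above means the metastore tables were not created. A common cause is a malformed javax.jdo.option.ConnectionURL (note the stray space in "us eSSL" echoed in the log) or wrong credentials in hive-site.xml. After correcting the file, the connection can be checked and the initialization re-run; a sketch:
  # Confirm MySQL is reachable with the credentials configured in hive-site.xml
  mysql -h master -uroot -pPassword123$ -e "SHOW DATABASES;"
  # Re-run the metastore initialization
  schematool -initSchema -dbType mysql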
4) Start Hive.
  1. [hadoop@master hive]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  8. Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
  9. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  10. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  11. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  12. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  13. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  14. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  15. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  16. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  17. Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  18. hive>
复制代码
Chapter 7: ZooKeeper Component Installation and Configuration

Lab 1: Installing and Configuring the ZooKeeper Component

1.1. Objectives

After completing this lab, you should be able to:

  • Download and install ZooKeeper
  • Configure the ZooKeeper options
  • Start ZooKeeper
1.2. Requirements


  • Understand the ZooKeeper configuration options
  • Be familiar with starting ZooKeeper
1.3. Procedure

1.3.1 Task 1: Configure time synchronization
  1. [root@master ~]# yum -y install chrony
  2. [root@master ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@master ~]# systemctl restart chronyd.service
  7. [root@master ~]# systemctl enable chronyd.service
  8. [root@master ~]# date
  9. Fri Apr 15 15:40:14 CST 2022
复制代码
  1. [root@slave1 ~]# yum -y install chrony
  2. [root@slave1 ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@slave1 ~]# systemctl restart chronyd.service
  7. [root@slave1 ~]# systemctl enable chronyd.service
  8. [root@slave1 ~]# date
  9. Fri Apr 15 15:40:17 CST 2022  
复制代码
  1. [root@slave2 ~]# yum -y install chrony
  2. [root@slave2 ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@slave2 ~]# systemctl restart chronyd.service
  7. [root@slave2 ~]# systemctl enable chronyd.service
  8. [root@slave2 ~]# date
  9. Fri Apr 15 15:40:20 CST 2022
复制代码
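To verify that each node is actually synchronising against the configured server (rather than just running chronyd), the chronyc client can be queried on every node; a sketch:
  # The source marked with '*' is the one the node is currently synchronised to
  chronyc sources -v
  # Shows the current offset, stratum and reference server
  chronyc tracking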
1.3.2 Task 2: Download and install ZooKeeper

The latest ZooKeeper release can be obtained from the official site http://hadoop.apache.org/zookeeper/; the ZooKeeper version must be compatible with the Hadoop environment.
Note that the firewall must be disabled on every node, otherwise connection problems will occur.
1. The ZooKeeper package zookeeper-3.4.8.tar.gz has already been placed in the /opt/software directory.
2. Extract the package to the target directory by running the following commands on the master node.
  1. [root@master ~]# tar xf /opt/software/zookeeper-3.4.8.tar.gz -C /usr/local/src/
  2. [root@master ~]# cd /usr/local/src/
  3. [root@master src]# mv zookeeper-3.4.8/ zookeeper
复制代码
1.3.3 Task 3: ZooKeeper configuration options

1.3.3.1 Step 1: Configure the master node

(1) Create the data and logs folders inside the ZooKeeper installation directory.
  1. [root@master src]# cd /usr/local/src/zookeeper/
  2. [root@master zookeeper]# mkdir data logs
复制代码
(2) Write each node's identifier into its myid file; the number is different on every node: 1 on master, 2 on slave1, 3 on slave2.
  1. [root@master zookeeper]# echo '1' > /usr/local/src/zookeeper/data/myid
复制代码
(3) Create and modify the configuration file zoo.cfg.
  1. [root@master zookeeper]# cd /usr/local/src/zookeeper/conf/
  2. [root@master conf]# cp zoo_sample.cfg zoo.cfg
复制代码
Change the dataDir parameter as follows:
  1. [root@master conf]# vi zoo.cfg
  2. dataDir=/usr/local/src/zookeeper/data
复制代码
(4) Append the following lines to the end of zoo.cfg; they define the ports used by the three ZooKeeper nodes (see the consolidated zoo.cfg sketch after the listing).
  1. server.1=master:2888:3888
  2. server.2=slave1:2888:3888
  3. server.3=slave2:2888:3888
复制代码
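Putting the pieces together, the resulting zoo.cfg should contain at least the entries below (tickTime, initLimit, syncLimit and clientPort come from zoo_sample.cfg and are shown with the sample defaults; 2888 is the quorum port and 3888 the leader-election port):
  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/usr/local/src/zookeeper/data
  clientPort=2181
  server.1=master:2888:3888
  server.2=slave1:2888:3888
  server.3=slave2:2888:3888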
(5) Change the owner of the ZooKeeper installation directory to the hadoop user.
  1. [root@master conf]# chown -R hadoop:hadoop /usr/local/src/
复制代码
1.3.3.2 Step 2: Configure the slave nodes

(1) Copy the ZooKeeper installation directory from the master node to the two slave nodes.
  1. [root@master ~]# scp -r /usr/local/src/zookeeper node1:/usr/local/src/
  2. [root@master ~]# scp -r /usr/local/src/zookeeper node2:/usr/local/src/
复制代码
(2) On slave1, change the owner of the zookeeper directory to the hadoop user.
  1. [root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/
  2. [root@slave1 ~]# ll /usr/local/src/
  3. total 4
  4. drwxr-xr-x. 12 hadoop hadoop  183 Apr  2 18:11 hadoop
  5. drwxr-xr-x   9 hadoop hadoop  183 Apr 15 16:37 hbase
  6. drwxr-xr-x.  8 hadoop hadoop  255 Apr  2 18:06 jdk
  7. drwxr-xr-x  12 hadoop hadoop 4096 Apr 22 15:31 zookeeper
复制代码
(3) On slave1, set the node's myid to 2.
  1. [root@slave1 ~]# echo 2 > /usr/local/src/zookeeper/data/myid
复制代码
(4) On slave2, change the owner of the zookeeper directory to the hadoop user.
  1. [root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/
复制代码
(5) On slave2, set the node's myid to 3.
  1. [root@slave2 ~]# echo 3 > /usr/local/src/zookeeper/data/myid
复制代码
1.3.3.3 Step 3: Configure the system environment variables

Add the environment variable configuration on all three nodes: master, slave1, and slave2.
  1. [root@master conf]# vi /etc/profile.d/zookeeper.sh
  2. export ZOOKEEPER_HOME=/usr/local/src/zookeeper
  3. export PATH=${ZOOKEEPER_HOME}/bin:$PATH
  4. [root@master ~]# scp /etc/profile.d/zookeeper.sh node1:/etc/profile.d/
  5. zookeeper.sh 100%   8742.3KB/s   00:00
  6. [root@master ~]# scp /etc/profile.d/zookeeper.sh node2:/etc/profile.d/
  7. zookeeper.sh 100%   8750.8KB/s   00:00
复制代码
1.3.4 Task 4: Start ZooKeeper

ZooKeeper must be started as the hadoop user.
(1) Start ZooKeeper on master, slave1, and slave2 with the zkServer.sh start command.
  1. [root@master ~]# su - hadoop
  2. Last login: Fri Apr 15 21:54:17 CST 2022 on pts/0
  3. [hadoop@master ~]$ jps
  4. 3922 Jps
  5. [hadoop@master ~]$ zkServer.sh start
  6. ZooKeeper JMX enabled by default
  7. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  8. Starting zookeeper ... STARTED
  9. [hadoop@master ~]$ jps
  10. 3969 Jps
  11. 3950 QuorumPeerMain
  12. [root@slave1 ~]# su - hadoop
  13. Last login: Fri Apr 15 22:06:47 CST 2022 on pts/0
  14. [hadoop@slave1 ~]$ jps
  15. 1370 Jps
  16. [hadoop@slave1 ~]$ zkServer.sh start
  17. ZooKeeper JMX enabled by default
  18. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  19. Starting zookeeper ... STARTED
  20. [hadoop@slave1 ~]$ jps
  21. 1395 QuorumPeerMain
  22. 1421 Jps
  23. [root@slave2 ~]# su - hadoop
  24. Last login: Fri Apr 15 16:25:52 CST 2022 on pts/1
  25. [hadoop@slave2 ~]$ jps
  26. 1336 Jps
  27. [hadoop@slave2 ~]$ zkServer.sh start
  28. ZooKeeper JMX enabled by default
  29. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  30. Starting zookeeper ... STARTED
  31. [hadoop@slave2 ~]$ jps
  32. 1361 QuorumPeerMain
  33. 1387 Jps
复制代码
(2) After all three nodes have started, check the ZooKeeper running status on each of them.
  1. [hadoop@master conf]$ zkServer.sh status
  2. ZooKeeper JMX enabled by default
  3. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  4. Mode: follower
  5. [hadoop@slave1 ~]$ zkServer.sh status
  6. ZooKeeper JMX enabled by default
  7. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  8. Mode: leader
复制代码
  1. [hadoop@slave2 conf]$ zkServer.sh status
  2. ZooKeeper JMX enabled by default
  3. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  4. Mode: follower
复制代码
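Besides zkServer.sh status, the ensemble can be probed with ZooKeeper's four-letter commands or the bundled CLI; a quick sketch (assumes nc/netcat is installed):
  # A reply of "imok" means the server is alive and serving requests
  echo ruok | nc master 2181
  # List the root znode through the CLI and exit
  zkCli.sh -server master:2181 ls /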
Chapter 8: HBase Component Installation and Configuration

Lab 1: Installing and Configuring the HBase Component

1.1 Objectives

After completing this lab, you should be able to:

  • Install and configure HBase
  • Use the common HBase shell commands
1.2 Requirements


  • Understand how HBase works
  • Be familiar with the common HBase shell commands
1.3 Procedure

1.3.1 Task 1: Configure time synchronization
  1. [root@master ~]# yum -y install chrony
  2. [root@master ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@master ~]# systemctl restart chronyd.service
  7. [root@master ~]# systemctl enable chronyd.service
  8. [root@master ~]# date
  9. Fri Apr 15 15:40:14 CST 2022
复制代码
  1. [root@slave1 ~]# yum -y install chrony
  2. [root@slave1 ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@slave1 ~]# systemctl restart chronyd.service
  7. [root@slave1 ~]# systemctl enable chronyd.service
  8. [root@slave1 ~]# date
  9. Fri Apr 15 15:40:17 CST 2022  
复制代码
  1. [root@slave2 ~]# yum -y install chrony
  2. [root@slave2 ~]# cat /etc/chrony.conf
  3. # Use public servers from the pool.ntp.org project.
  4. # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  5. server time1.aliyun.com iburst
  6. [root@slave2 ~]# systemctl restart chronyd.service
  7. [root@slave2 ~]# systemctl enable chronyd.service
  8. [root@slave2 ~]# date
  9. Fri Apr 15 15:40:20 CST 2022
复制代码
1.3.2 Task 2: Install and configure HBase

1.3.2.1 Step 1: Extract the HBase package
  1. [root@master ~]# tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local/src/
复制代码
1.3.2.2 Step 2: Rename the HBase installation folder
  1. [root@master ~]# cd /usr/local/src/
  2. [root@master src]# mv hbase-1.2.1 hbase
复制代码
1.3.2.3 Step 3: Add the environment variables on all nodes
  1. [root@master ~]# cat /etc/profile
  2. # set hbase environment
  3. export HBASE_HOME=/usr/local/src/hbase
  4. export PATH=$HBASE_HOME/bin:$PATH   
  5. [root@slave1 ~]# cat /etc/profile
  6. # set hbase environment
  7. export HBASE_HOME=/usr/local/src/hbase
  8. export PATH=$HBASE_HOME/bin:$PATH
  9. [root@slave2 ~]# cat /etc/profile
  10. # set hbase environment
  11. export HBASE_HOME=/usr/local/src/hbase
  12. export PATH=$HBASE_HOME/bin:$PATH
复制代码
1.3.2.4 Step 4: Apply the environment variables on all nodes
  1. [root@master ~]# source /etc/profile
  2. [root@master ~]# echo $PATH
  3. /usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/src/hive/bin:/root/bin:/usr/local/src/hive/bin:/usr/local/src/hive/bin  
  4. [root@slave1 ~]# source /etc/profile
  5. [root@slave1 ~]# echo $PATH
  6. /usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
  7. [root@slave2 ~]# source /etc/profile
  8. [root@slave2 ~]# echo $PATH
  9. /usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
复制代码
1.3.2.5 Step 5: On the master node, change into the configuration directory
  1. [root@master ~]# cd /usr/local/src/hbase/conf/
复制代码
1.3.2.6 Step 6: On the master node, configure hbase-env.sh
  1. [root@master conf]# cat hbase-env.sh
  2. export JAVA_HOME=/usr/local/src/jdk
  3. export HBASE_MANAGES_ZK=true
  4. export HBASE_CLASSPATH=/usr/local/src/hadoop/etc/hadoop/
复制代码
1.3.2.7 Step 7: On the master node, configure hbase-site.xml
  1. [root@master conf]# cat hbase-site.xml
  2. <configuration>
  3.         <property>
  4.                 <name>hbase.rootdir</name>
  5.                 <value>hdfs://master:9000/hbase</value>
  6.         </property>
  7.         <property>
  8.                 <name>hbase.master.info.port</name>
  9.                 <value>60010</value>
  10.         </property>
  11.         <property>
  12.                 <name>hbase.zookeeper.property.clientPort</name>
  13.                 <value>2181</value>
  14.         </property>
  15.         <property>
  16.                 <name>zookeeper.session.timeout</name>
  17.                 <value>120000</value>
  18.         </property>
  19.         <property>
  20.                 <name>hbase.zookeeper.quorum</name>
  21.                 <value>master,node1,node2</value>
  22.         </property>
  23.         <property>
  24.                 <name>hbase.tmp.dir</name>
  25.                 <value>/usr/local/src/hbase/tmp</value>
  26.         </property>
  27.         <property>
  28.                 <name>hbase.cluster.distributed</name>
  29.                 <value>true</value>
  30.         </property>
  31. </configuration>
复制代码
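Two details in this file are easy to get wrong: hbase.rootdir must use the same NameNode host and port as fs.defaultFS in Hadoop's core-site.xml, and every hostname listed in hbase.zookeeper.quorum must resolve on all nodes. A quick sanity check (a sketch; node1/node2 are assumed to be aliases defined in /etc/hosts):
  # The value printed here must match the hdfs://master:9000 prefix used in hbase.rootdir
  grep -A1 'fs.defaultFS' /usr/local/src/hadoop/etc/hadoop/core-site.xml
  # All quorum hostnames must resolve on every node
  getent hosts master node1 node2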
1.3.2.8 Step 8: On the master node, edit the regionservers file
  1. [root@master conf]# cat regionservers
  2. node1
  3. node2
复制代码
1.3.2.9 Step 9: On the master node, create the hbase.tmp.dir directory
  1. [root@master ~]# mkdir /usr/local/src/hbase/tmp
复制代码
1.3.2.10 Step 10: Copy the HBase installation from master to node1 and node2
  1. [root@master ~]# scp -r /usr/local/src/hbase/ root@node1:/usr/local/src/
  2. [root@master ~]# scp -r /usr/local/src/hbase/ root@node2:/usr/local/src/
复制代码
1.3.2.11 Step 11: Change the ownership of the hbase directory on all nodes
  1. [root@master ~]# chown -R hadoop:hadoop /usr/local/src/hbase/
  2. [root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/hbase/
  3. [root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/hbase/
复制代码
1.3.2.12 Step 12: Switch to the hadoop user on all nodes
  1. [root@master ~]# su - hadoop
  2. Last login: Mon Apr 11 00:42:46 CST 2022 on pts/0
  3. [root@slave1 ~]# su - hadoop
  4. Last login: Fri Apr  8 22:57:42 CST 2022 on pts/0
  5. [root@slave2 ~]# su - hadoop
  6. Last login: Fri Apr  8 22:57:54 CST 2022 on pts/0
复制代码
1.3.2.13 Step 13: Start the prerequisite services

Start Hadoop first, then ZooKeeper, and finally HBase.
  1. [hadoop@master ~]$ start-all.sh
  2. [hadoop@master ~]$ jps
  3. 2130 SecondaryNameNode
  4. 1927 NameNode
  5. 2554 Jps
  6. 2301 ResourceManager
  7. [hadoop@slave1 ~]$ jps
  8. 1845 NodeManager
  9. 1977 Jps
  10. 1725 DataNode
  11. [hadoop@slave2 ~]$ jps
  12. 2080 Jps
  13. 1829 DataNode
  14. 1948 NodeManager
复制代码
1.3.2.14 Step 14: Start HBase on the master node
  1. [hadoop@master conf]$ start-hbase.sh
  2. [hadoop@master conf]$ jps
  3. 2130 SecondaryNameNode
  4. 3572 HQuorumPeer
  5. 1927 NameNode
  6. 5932 HMaster
  7. 2301 ResourceManager
  8. 6157 Jps
  9. [hadoop@slave1 ~]$ jps
  10. 2724 Jps
  11. 1845 NodeManager
  12. 1725 DataNode
  13. 2399 HQuorumPeer
  14. 2527 HRegionServer
  15. [root@slave2 ~]# jps
  16. 3795 Jps
  17. 1829 DataNode
  18. 3529 HRegionServer
  19. 1948 NodeManager
  20. 3388 HQuorumPeer
复制代码
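As an additional check, HBase should have created its root directory in HDFS once the HMaster is up; a sketch:
  # The /hbase directory is created automatically from the hbase.rootdir setting
  hdfs dfs -ls /hbase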
1.3.2.15 Step 15: Edit the hosts file on Windows

(C:\Windows\System32\drivers\etc\hosts)
Drag the hosts file to the desktop, edit it to add the mapping between the master hostname and its IP address (sample entries are shown below), put it back, and then open http://master:60010 in a browser to reach the HBase web UI.
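Based on the addresses used earlier for this cluster, the entries would look like the following (node1/node2 are optional aliases, assuming they map to the same slave machines as in the cluster's own /etc/hosts):
  10.10.10.128  master
  10.10.10.129  slave1  node1
  10.10.10.130  slave2  node2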
1.3.3 Task 3: Common HBase shell commands

1.3.3.1 Step 1: Enter the HBase shell
  1. [hadoop@master ~]$ hbase shell
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  6. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  7. HBase Shell; enter 'help<RETURN>' for list of supported commands.
  8. Type "exit<RETURN>" to leave the HBase Shell
  9. Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
  10. hbase(main):001:0>  
复制代码
1.3.3.2 Step 2: Create the table scores with two column families, grade and course
  1. hbase(main):001:0> create 'scores','grade','course'
  2. 0 row(s) in 1.4400 seconds
  3. => Hbase::Table - scores
复制代码
1.3.3.3 Step 3: Check the cluster status
  1. hbase(main):002:0> status
  2. 1 active master, 0 backup masters, 2 servers, 0 dead, 1.5000 average load
复制代码
1.3.3.4 Step 4: Check the HBase version
  1. hbase(main):003:0> version
  2. 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
复制代码
1.3.3.5 Step 5: List the tables
  1. hbase(main):004:0> list
  2. TABLE  
  3. scores
  4. 1 row(s) in 0.0150 seconds
  5. => ["scores"]
复制代码
1.3.3.6 Step 6: Insert record 1: jie, grade: 146cloud
  1. hbase(main):005:0> put 'scores','jie','grade:','146cloud'
  2. 0 row(s) in 0.1060 seconds
复制代码
1.3.3.7 Step 7: Insert record 2: jie, course:math, 86
  1. hbase(main):006:0> put 'scores','jie','course:math','86'
  2. 0 row(s) in 0.0120 seconds
复制代码
1.3.3.8 Step 8: Insert record 3: jie, course:cloud, 92
  1. hbase(main):009:0> put 'scores','jie','course:cloud','92'
  2. 0 row(s) in 0.0070 seconds
复制代码
1.3.3.9 Step 9: Insert record 4: shi, grade: 133soft
  1. hbase(main):010:0> put 'scores','shi','grade:','133soft'
  2. 0 row(s) in 0.0120 seconds
复制代码
1.3.3.10 Step 10: Insert record 5: shi, course:math, 87
  1. hbase(main):011:0> put 'scores','shi','course:math','87'
  2. 0 row(s) in 0.0090 seconds
复制代码
1.3.3.11 Step 11: Insert record 6: shi, course:cloud, 96
  1. hbase(main):012:0> put 'scores','shi','course:cloud','96'
  2. 0 row(s) in 0.0100 seconds
复制代码
1.3.3.12 Step 12: Read jie's records
  1. hbase(main):013:0> get 'scores','jie'
  2. COLUMN  CELL   
  3. course:cloud   timestamp=1650015032132, value=92  
  4. course:mathtimestamp=1650014925177, value=86  
  5. grade: timestamp=1650014896056, value=146cloud
  6. 3 row(s) in 0.0250 seconds
复制代码
1.3.3.13 Step 13: Read jie's grade column family
  1. hbase(main):014:0> get 'scores','jie','grade'
  2. COLUMN  CELL   
  3. grade: timestamp=1650014896056, value=146cloud
  4. 1 row(s) in 0.0110 seconds
复制代码
1.3.3.14 Step 14: Scan the whole table
  1. hbase(main):001:0> scan 'scores'
  2. ROW  COLUMN+CELL  
  3. jie column=course:cloud, timestamp=1650015032132, value=92   
  4. jie column=course:math, timestamp=1650014925177, value=86
  5. jie column=grade:, timestamp=1650014896056, value=146cloud   
  6. shi column=course:cloud, timestamp=1650015240873, value=96   
  7. shi column=course:math, timestamp=1650015183521, value=87
  8. 2 row(s) in 0.1490 seconds
复制代码
1.3.3.15 Step 15: Scan the table by column family
  1. hbase(main):002:0> scan 'scores',{COLUMNS=>'course'}
  2. ROW  COLUMN+CELL  
  3. jie column=course:cloud, timestamp=1650015032132, value=92   
  4. jie column=course:math, timestamp=1650014925177, value=86
  5. shi column=course:cloud, timestamp=1650015240873, value=96   
  6. shi column=course:math, timestamp=1650015183521, value=87
  7. 2 row(s) in 0.0160 seconds
复制代码
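For repeatable checks, the same statements can also be driven from bash instead of the interactive prompt by piping them into hbase shell; a sketch:
  # Run a scan non-interactively; the shell exits when stdin is exhausted
  echo "scan 'scores'" | hbase shell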
1.3.3.16 Step 16: Delete a specific cell
  1. hbase(main):003:0> delete 'scores','shi','grade'
  2. 0 row(s) in 0.0560 seconds
复制代码
1.3.3.17 Step 17: Run scan again after the delete
  1. hbase(main):004:0> scan 'scores'
  2. ROW  COLUMN+CELL  
  3. jie column=course:cloud, timestamp=1650015032132, value=92   
  4. jie column=course:math, timestamp=1650014925177, value=86
  5. jie column=grade:, timestamp=1650014896056, value=146cloud   
  6. shi column=course:cloud, timestamp=1650015240873, value=96   
  7. shi column=course:math, timestamp=1650015183521, value=87
  8. 2 row(s) in 0.0130 seconds
复制代码
1.3.3.18 Step 18: Add a new column family
  1. hbase(main):005:0> alter 'scores',NAME=>'age'
  2. Updating all regions with the new schema...
  3. 1/1 regions updated.
  4. Done.
  5. 0 row(s) in 2.0110 seconds
复制代码
1.3.3.19 Step 19: Describe the table structure
  1. hbase(main):006:0> describe 'scores'
  2. Table scores is ENABLED   
  3. scores
  4. COLUMN FAMILIES DESCRIPTION   
  5. {NAME => 'age', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', C
  6. OMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}  
  7. {NAME => 'course', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER'
  8. , COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}   
  9. {NAME => 'grade', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
  10. COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
  11. 3 row(s) in 0.0230 seconds
复制代码
1.3.3.20 Step 20: Delete a column family
  1. hbase(main):007:0> alter 'scores',NAME=>'age',METHOD=>'delete'
  2. Updating all regions with the new schema...
  3. 1/1 regions updated.
  4. Done.
  5. 0 row(s) in 2.1990 seconds
复制代码
1.3.3.21 Step 21: Disable the table (a table must be disabled before it can be dropped)
  1. hbase(main):008:0> disable 'scores'
  2. 0 row(s) in 2.3190 seconds
复制代码
1.3.3.22 Step 22: Exit the HBase shell
  1. hbase(main):009:0> quit
复制代码
1.3.3.23 Step 23: Stop HBase
  1. [hadoop@master ~]$ stop-hbase.sh
  2. stopping hbase.................
  3. master: stopping zookeeper.
  4. node2: stopping zookeeper.
  5. node1: stopping zookeeper.
复制代码
Stop Hadoop on the master node.
  1. [hadoop@master ~]$ stop-all.sh
  2. This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
  3. Stopping namenodes on [master]
  4. master: stopping namenode
  5. 10.10.10.130: stopping datanode
  6. 10.10.10.129: stopping datanode
  7. Stopping secondary namenodes [0.0.0.0]
  8. 0.0.0.0: stopping secondarynamenode
  9. stopping yarn daemons
  10. stopping resourcemanager
  11. 10.10.10.129: stopping nodemanager
  12. 10.10.10.130: stopping nodemanager
  13. no proxyserver to stop
  14. [hadoop@master ~]$ jps
  15. 3820 Jps
  16. [hadoop@slave1 ~]$ jps
  17. 2220 Jps
  18. [root@slave2 ~]# jps
  19. 2082 Jps
复制代码
All done!
Chapter 9: Sqoop Component Installation and Configuration

Lab 1: Installing and Configuring the Sqoop Component

1.1. Objectives

After completing this lab, you should be able to:

  • Download and extract Sqoop
  • Configure the Sqoop environment
  • Install Sqoop
  • Use the Sqoop template commands
1.2. Requirements


  • Be familiar with the Sqoop environment
  • Be familiar with the Sqoop template commands
1.3. Procedure

1.3.1. Task 1: Download and extract Sqoop

The Sqoop component must be compatible with the Hadoop environment. Deploy it as the root user on the master node: extract the /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz package into the /usr/local/src directory.
  1. [root@master ~]# tar xf /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/
复制代码
Rename the extracted sqoop-1.4.7.bin__hadoop-2.6.0 folder to sqoop.
  1. [root@master ~]# cd /usr/local/src/
  2. [root@master src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop
复制代码
1.3.2. Task 2: Configure the Sqoop environment

1.3.2.1. Step 1: Create the Sqoop configuration file sqoop-env.sh.

Copy the sqoop-env-template.sh template and rename the copy to sqoop-env.sh.
  1. [root@master src]# cd /usr/local/src/sqoop/conf/
  2. [root@master conf]# cp sqoop-env-template.sh sqoop-env.sh
复制代码
1.3.2.2. Step 2: Edit sqoop-env.sh and add the installation paths of the Hadoop, HBase, and Hive components.

Note that the paths below must match the actual installation paths in your environment.
  1. vim sqoop-env.sh
  2. export HADOOP_COMMON_HOME=/usr/local/src/hadoop
  3. export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
  4. export HBASE_HOME=/usr/local/src/hbase
  5. export HIVE_HOME=/usr/local/src/hive
复制代码
1.3.2.3. Step 3: Configure the Linux environment variables and add the Sqoop path.
  1. vim /etc/profile.d/sqoop.sh
  2. export SQOOP_HOME=/usr/local/src/sqoop
  3. export PATH=$SQOOP_HOME/bin:$PATH
  4. export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
  5. [root@master conf]# source /etc/profile.d/sqoop.sh
  6. [root@master conf]# echo $PATH
  7. /usr/local/src/sqoop/bin:/usr/local/src/hbase/bin:/usr/local/src/zookeeper/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/src/hive/bin:/root/bin
复制代码
1.3.2.4. Step 4: Connect to the database

To let Sqoop connect to MySQL, copy the /opt/software/mysql-connector-java-5.1.46.jar file into Sqoop's lib directory. The jar version must match the MySQL server version, otherwise Sqoop will fail when importing data (mysql-connector-java-5.1.46.jar matches MySQL 5.7). If the jar is not in that directory, use the copy imported into the home directory in Chapter 6.
  1. [root@master conf]# cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
复制代码
1.3.3. Task 3: Start Sqoop

1.3.3.1. Step 1: Start the Hadoop cluster before running Sqoop.

On the master node, switch to the hadoop user and start the Hadoop cluster with start-all.sh.
  1. [root@master conf]# su - hadoop
  2. Last login: Fri Apr 22 16:21:25 CST 2022 on pts/0
  3. [hadoop@master ~]$ start-all.sh
  4. This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
  5. Starting namenodes on [master]
  6. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
  7. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
  8. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  9. Starting secondary namenodes [0.0.0.0]
  10. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
  11. starting yarn daemons
  12. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
  13. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  14. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
复制代码
1.3.3.2. Step 2: Check that the Hadoop cluster is running.
  1. [hadoop@master ~]$ jps
  2. 1653 SecondaryNameNode
  3. 2086 Jps
  4. 1450 NameNode
  5. 1822 ResourceManager
  6. [root@slave1 ~]# jps
  7. 1378 NodeManager
  8. 1268 DataNode
  9. 1519 Jps
  10. [root@slave2 ~]# jps
  11. 1541 Jps
  12. 1290 DataNode
  13. 1405 NodeManager
复制代码
1.3.3.3. Step 3: Test whether Sqoop can connect to the MySQL database.

Sqoop connects to MySQL with the -P option (uppercase), which prompts for the password; enter Password123$.
  1. [hadoop@master ~]$ sqoop list-databases --connect jdbc:mysql://master:3306 --username root -P
  2. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  3. Please set $HCAT_HOME to the root of your HCatalog installation.
  4. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  5. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  6. 22/04/29 15:25:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  7. Enter password:
  8. 22/04/29 15:25:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  9. Fri Apr 29 15:25:58 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  10. information_schema
  11. hive
  12. mysql
  13. performance_schema
  14. sys
复制代码
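The repeated SSL warnings come from the MySQL 5.7 JDBC driver and are harmless here; as the list-tables example later in this section does, they can be silenced by adding useSSL=false to the JDBC URL, e.g. (a sketch):
  sqoop list-databases --connect "jdbc:mysql://master:3306/?useSSL=false" --username root -P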
1.3.3.4. Step 4: Connect to Hive

To let Sqoop connect to Hive, also copy hive-common-2.0.0.jar from the Hive component's /usr/local/src/hive/lib directory into the lib directory of the Sqoop installation.
  1. [hadoop@master ~]$ cp /usr/local/src/hive/lib/hive-common-2.0.0.jar  /usr/local/src/sqoop/lib/
复制代码
1.3.4. Task 4: Sqoop template commands

1.3.4.1. Step 1: Create the MySQL database and table.

Create the sample database, create the student table inside it, and insert 3 rows into the student table.
  1. # Log in to MySQL
  2. [hadoop@master ~]$ mysql -uroot -pPassword123$
  3. mysql: [Warning] Using a password on the command line interface can be insecure.
  4. Welcome to the MySQL monitor.  Commands end with ; or \g.
  5. Your MySQL connection id is 6
  6. Server version: 5.7.18 MySQL Community Server (GPL)
  7. Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
  8. Oracle is a registered trademark of Oracle Corporation and/or its
  9. affiliates. Other names may be trademarks of their respective
  10. owners.
  11. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
  12. # Create the sample database
  13. mysql> create database sample;
  14. Query OK, 1 row affected (0.00 sec)
  15. # Switch to the sample database
  16. mysql> use sample;
  17. Database changed
  18. # Create the student table with two columns: number (student ID) and name
  19. mysql> create table student(number char(9) primary key, name varchar(10));
  20. Query OK, 0 rows affected (0.01 sec)
  21. # Insert a few rows into the student table
  22. mysql>  insert into student values('01','zhangsan'),('02','lisi'),('03','wangwu');
  23. Query OK, 3 rows affected (0.01 sec)
  24. Records: 3  Duplicates: 0  Warnings: 0
  25. # Query the student table
  26. mysql> select * from student;
  27. +--------+----------+
  28. | number | name |
  29. +--------+----------+
  30. | 01 | zhangsan |
  31. | 02 | lisi         |
  32. | 03 | wangwu   |
  33. +--------+----------+
  34. 3 rows in set (0.00 sec)
  35. mysql> quit
  36. Bye
复制代码
1.3.4.2. Step 2: Create the sample database and student table in Hive.
  1. hive>
  2. > create database sample;
  3. OK
  4. Time taken: 0.528 seconds
  5. hive>  use sample;
  6. OK
  7. Time taken: 0.019 seconds
  8. hive>  create table student(number STRING,name STRING);
  9. OK
  10. Time taken: 0.2 seconds
  11. hive> exit;
  12. [hadoop@master conf]$
复制代码
1.3.4.3. Step 3: Export the data from MySQL and import it into Hive.
  1. [hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database sample --hive-table student
  2. hive>
  3.         > select * from sample.student;
  4. OK
  5. 01|zhangsan        NULL
  6. 02|lisi        NULL
  7. 03|wangwu        NULL
  8. Time taken: 1.238 seconds, Fetched: 3 row(s)
  9. hive>
  10.         > exit;
复制代码
1.3.4.4. Step 4: Common Sqoop commands
  1. # List all databases
  2. [hadoop@master ~]$ sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password123$
  3. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  4. Please set $HCAT_HOME to the root of your HCatalog installation.
  5. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  6. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  7. 22/04/29 16:55:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  8. 22/04/29 16:55:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  9. 22/04/29 16:55:40 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  10. Fri Apr 29 16:55:40 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  11. information_schema
  12. hive
  13. mysql
  14. performance_schema
  15. sample
  16. sys
  17. # Connect to MySQL and list the tables in the sample database
  18. [hadoop@master ~]$ sqoop list-tables --connect "jdbc:mysql://master:3306/sample?useSSL=false" --username root --password Password123$
  19. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  20. Please set $HCAT_HOME to the root of your HCatalog installation.
  21. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  22. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  23. 22/04/29 16:56:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  24. 22/04/29 16:56:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  25. 22/04/29 16:56:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  26. student
  27. # Copy the relational table's structure into Hive (only the schema is copied, not the data)
  28. [hadoop@master ~]$ sqoop create-hive-table --connect jdbc:mysql://master:3306/sample --table student --username root --password Password123$ --hive-table test
  29. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  30. Please set $HCAT_HOME to the root of your HCatalog installation.
  31. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  32. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  33. 22/04/29 16:57:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  34. 22/04/29 16:57:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  35. 22/04/29 16:57:42 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
  36. 22/04/29 16:57:42 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
  37. 22/04/29 16:57:42 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  38. Fri Apr 29 16:57:42 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  39. 22/04/29 16:57:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  40. 22/04/29 16:57:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  41. SLF4J: Class path contains multiple SLF4J bindings.
  42. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  43. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  44. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  45. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  46. 22/04/29 16:57:43 INFO hive.HiveImport: Loading uploaded data into Hive
  47. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
  48. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  49. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  50. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  51. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  52. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  53. 22/04/29 16:57:46 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  54. 22/04/29 16:57:46 INFO hive.HiveImport:
  55. 22/04/29 16:57:46 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
  56. 22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  57. 22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  58. 22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  59. 22/04/29 16:57:47 INFO hive.HiveImport: Fri Apr 29 16:57:47 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  60. 22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  61. 22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  62. 22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  63. 22/04/29 16:57:48 INFO hive.HiveImport: Fri Apr 29 16:57:48 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  64. 22/04/29 16:57:50 INFO hive.HiveImport: OK
  65. 22/04/29 16:57:50 INFO hive.HiveImport: Time taken: 0.853 seconds
  66. 22/04/29 16:57:51 INFO hive.HiveImport: Hive import complete.
67. # 如果执行以上命令之后显示 “hive.HiveImport: Hive import complete.”，则表示表结构复制成功
  68. [hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database default --hive-table test
  69. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  70. Please set $HCAT_HOME to the root of your HCatalog installation.
  71. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  72. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  73. 22/04/29 17:00:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  74. 22/04/29 17:00:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  75. 22/04/29 17:00:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  76. 22/04/29 17:00:06 INFO tool.CodeGenTool: Beginning code generation
  77. Fri Apr 29 17:00:06 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  78. 22/04/29 17:00:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  79. 22/04/29 17:00:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  80. 22/04/29 17:00:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/src/hadoop
  81. Note: /tmp/sqoop-hadoop/compile/556af862aa5bc04a542c14f0741f7dc6/student.java uses or overrides a deprecated API.
  82. Note: Recompile with -Xlint:deprecation for details.
  83. 22/04/29 17:00:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/556af862aa5bc04a542c14f0741f7dc6/student.jar
  84. SLF4J: Class path contains multiple SLF4J bindings.
  85. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  86. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  87. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  88. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  89. 22/04/29 17:00:07 INFO tool.ImportTool: Destination directory student is not present, hence not deleting.
  90. 22/04/29 17:00:07 WARN manager.MySQLManager: It looks like you are importing from mysql.
  91. 22/04/29 17:00:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
  92. 22/04/29 17:00:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
  93. 22/04/29 17:00:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
  94. 22/04/29 17:00:07 INFO mapreduce.ImportJobBase: Beginning import of student
  95. 22/04/29 17:00:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  96. 22/04/29 17:00:07 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  97. 22/04/29 17:00:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
  98. Fri Apr 29 17:00:09 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  99. 22/04/29 17:00:09 INFO db.DBInputFormat: Using read commited transaction isolation
  100. 22/04/29 17:00:09 INFO mapreduce.JobSubmitter: number of splits:1
  101. 22/04/29 17:00:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1651221174197_0003
  102. 22/04/29 17:00:09 INFO impl.YarnClientImpl: Submitted application application_1651221174197_0003
  103. 22/04/29 17:00:09 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1651221174197_0003/
  104. 22/04/29 17:00:09 INFO mapreduce.Job: Running job: job_1651221174197_0003
  105. 22/04/29 17:00:13 INFO mapreduce.Job: Job job_1651221174197_0003 running in uber mode : false
  106. 22/04/29 17:00:13 INFO mapreduce.Job:  map 0% reduce 0%
  107. 22/04/29 17:00:17 INFO mapreduce.Job:  map 100% reduce 0%
  108. 22/04/29 17:00:17 INFO mapreduce.Job: Job job_1651221174197_0003 completed successfully
  109. 22/04/29 17:00:17 INFO mapreduce.Job: Counters: 30
  110.         File System Counters
  111.                 FILE: Number of bytes read=0
  112.                 FILE: Number of bytes written=134261
  113.                 FILE: Number of read operations=0
  114.                 FILE: Number of large read operations=0
  115.                 FILE: Number of write operations=0
  116.                 HDFS: Number of bytes read=87
  117.                 HDFS: Number of bytes written=30
  118.                 HDFS: Number of read operations=4
  119.                 HDFS: Number of large read operations=0
  120.                 HDFS: Number of write operations=2
  121.         Job Counters
  122.                 Launched map tasks=1
  123.                 Other local map tasks=1
  124.                 Total time spent by all maps in occupied slots (ms)=1731
  125.                 Total time spent by all reduces in occupied slots (ms)=0
  126.                 Total time spent by all map tasks (ms)=1731
  127.                 Total vcore-seconds taken by all map tasks=1731
  128.                 Total megabyte-seconds taken by all map tasks=1772544
  129.         Map-Reduce Framework
  130.                 Map input records=3
  131.                 Map output records=3
  132.                 Input split bytes=87
  133.                 Spilled Records=0
  134.                 Failed Shuffles=0
  135.                 Merged Map outputs=0
  136.                 GC time elapsed (ms)=35
  137.                 CPU time spent (ms)=1010
  138.                 Physical memory (bytes) snapshot=179433472
  139.                 Virtual memory (bytes) snapshot=2137202688
  140.                 Total committed heap usage (bytes)=88604672
  141.         File Input Format Counters
  142.                 Bytes Read=0
  143.         File Output Format Counters
  144.                 Bytes Written=30
  145. 22/04/29 17:00:17 INFO mapreduce.ImportJobBase: Transferred 30 bytes in 9.8777 seconds (3.0371 bytes/sec)
  146. 22/04/29 17:00:17 INFO mapreduce.ImportJobBase: Retrieved 3 records.
  147. 22/04/29 17:00:17 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table student
  148. Fri Apr 29 17:00:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  149. 22/04/29 17:00:17 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  150. 22/04/29 17:00:17 INFO hive.HiveImport: Loading uploaded data into Hive
  151. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
  152. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  153. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  154. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  155. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  156. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  157. 22/04/29 17:00:20 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  158. 22/04/29 17:00:20 INFO hive.HiveImport:
  159. 22/04/29 17:00:20 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
  160. 22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  161. 22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  162. 22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  163. 22/04/29 17:00:21 INFO hive.HiveImport: Fri Apr 29 17:00:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  164. 22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  165. 22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  166. 22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  167. 22/04/29 17:00:23 INFO hive.HiveImport: Fri Apr 29 17:00:23 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  168. 22/04/29 17:00:24 INFO hive.HiveImport: OK
  169. 22/04/29 17:00:24 INFO hive.HiveImport: Time taken: 0.713 seconds
  170. 22/04/29 17:00:24 INFO hive.HiveImport: Loading data to table default.test
  171. 22/04/29 17:00:25 INFO hive.HiveImport: OK
  172. 22/04/29 17:00:25 INFO hive.HiveImport: Time taken: 0.42 seconds
  173. 22/04/29 17:00:25 INFO hive.HiveImport: Hive import complete.
  174. 22/04/29 17:00:25 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
  175. hive> show tables;
  176. OK
  177. test
  178. Time taken: 0.558 seconds, Fetched: 1 row(s)
  179. hive> exit;
复制代码
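导入完成后，还可以在 Hive 中查询 test 表做进一步验证，若能看到导入的 3 条记录，则说明数据确实写入了 Hive。以下为参考示例（假设数据已按上述命令导入 default 库的 test 表）:
[hadoop@master ~]$ hive
hive> select * from default.test;
hive> exit;
复制代码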
  1. # 从mysql中导出表内容到HDFS文件中
  2. [hadoop@master ~]$ sqoop import --connect jdbc:mysql://master:3306/sample --username root --password Password123$ --table student --num-mappers 1 --target-dir /user/test
  3. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  4. Please set $HCAT_HOME to the root of your HCatalog installation.
  5. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  6. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  7. 22/04/29 17:03:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  8. 22/04/29 17:03:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  9. 22/04/29 17:03:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  10. 22/04/29 17:03:13 INFO tool.CodeGenTool: Beginning code generation
  11. Fri Apr 29 17:03:14 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  12. 22/04/29 17:03:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  13. 22/04/29 17:03:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
  14. 22/04/29 17:03:14 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/src/hadoop
  15. Note: /tmp/sqoop-hadoop/compile/eab748b8f3fb956072f4877fdf4bf23a/student.java uses or overrides a deprecated API.
  16. Note: Recompile with -Xlint:deprecation for details.
  17. 22/04/29 17:03:15 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/eab748b8f3fb956072f4877fdf4bf23a/student.jar
  18. 22/04/29 17:03:15 WARN manager.MySQLManager: It looks like you are importing from mysql.
  19. 22/04/29 17:03:15 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
  20. 22/04/29 17:03:15 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
  21. 22/04/29 17:03:15 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
  22. 22/04/29 17:03:15 INFO mapreduce.ImportJobBase: Beginning import of student
  23. SLF4J: Class path contains multiple SLF4J bindings.
  24. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  25. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  26. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  27. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  28. 22/04/29 17:03:15 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  29. 22/04/29 17:03:15 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  30. 22/04/29 17:03:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
  31. Fri Apr 29 17:03:17 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  32. 22/04/29 17:03:17 INFO db.DBInputFormat: Using read commited transaction isolation
  33. 22/04/29 17:03:17 INFO mapreduce.JobSubmitter: number of splits:1
  34. 22/04/29 17:03:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1651221174197_0004
  35. 22/04/29 17:03:17 INFO impl.YarnClientImpl: Submitted application application_1651221174197_0004
  36. 22/04/29 17:03:17 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1651221174197_0004/
  37. 22/04/29 17:03:17 INFO mapreduce.Job: Running job: job_1651221174197_0004
  38. 22/04/29 17:03:21 INFO mapreduce.Job: Job job_1651221174197_0004 running in uber mode : false
  39. 22/04/29 17:03:21 INFO mapreduce.Job:  map 0% reduce 0%
  40. 22/04/29 17:03:25 INFO mapreduce.Job:  map 100% reduce 0%
  41. 22/04/29 17:03:25 INFO mapreduce.Job: Job job_1651221174197_0004 completed successfully
  42. 22/04/29 17:03:25 INFO mapreduce.Job: Counters: 30
  43.         File System Counters
  44.                 FILE: Number of bytes read=0
  45.                 FILE: Number of bytes written=134251
  46.                 FILE: Number of read operations=0
  47.                 FILE: Number of large read operations=0
  48.                 FILE: Number of write operations=0
  49.                 HDFS: Number of bytes read=87
  50.                 HDFS: Number of bytes written=30
  51.                 HDFS: Number of read operations=4
  52.                 HDFS: Number of large read operations=0
  53.                 HDFS: Number of write operations=2
  54.         Job Counters
  55.                 Launched map tasks=1
  56.                 Other local map tasks=1
  57.                 Total time spent by all maps in occupied slots (ms)=1945
  58.                 Total time spent by all reduces in occupied slots (ms)=0
  59.                 Total time spent by all map tasks (ms)=1945
  60.                 Total vcore-seconds taken by all map tasks=1945
  61.                 Total megabyte-seconds taken by all map tasks=1991680
  62.         Map-Reduce Framework
  63.                 Map input records=3
  64.                 Map output records=3
  65.                 Input split bytes=87
  66.                 Spilled Records=0
  67.                 Failed Shuffles=0
  68.                 Merged Map outputs=0
  69.                 GC time elapsed (ms)=69
  70.                 CPU time spent (ms)=1050
  71.                 Physical memory (bytes) snapshot=179068928
  72.                 Virtual memory (bytes) snapshot=2136522752
  73.                 Total committed heap usage (bytes)=88604672
  74.         File Input Format Counters
  75.                 Bytes Read=0
  76.         File Output Format Counters
  77.                 Bytes Written=30
  78. 22/04/29 17:03:25 INFO mapreduce.ImportJobBase: Transferred 30 bytes in 10.2361 seconds (2.9308 bytes/sec)
  79. 22/04/29 17:03:25 INFO mapreduce.ImportJobBase: Retrieved 3 records.
80. # 执行以上命令后，在浏览器中访问 master_ip:50070，点击 Utilities 下面的 Browse the file system，若能看到 user 目录则表示成功
复制代码
  1. [hadoop@master ~]$ hdfs dfs -ls /user/test
  2. Found 2 items
  3. -rw-r--r--   2 hadoop supergroup  0 2022-04-29 17:03 /user/test/_SUCCESS
  4. -rw-r--r--   2 hadoop supergroup 30 2022-04-29 17:03 /user/test/part-m-00000
  5. [hadoop@master ~]$ hdfs dfs -cat /user/test/part-m-00000
  6. 01,zhangsan
  7. 02,lisi
  8. 03,wangwu
复制代码
第10章 Flume组件安装配置

实验一:Flume 组件安装配置

1.1. 实验目标

完成本实验,您应该能够:

  • 掌握下载和解压 Flume
  • 掌握 Flume 组件部署
  • 掌握使用 Flume 发送和接收信息
1.2. 实验要求


  • 了解 Flume 相关知识
  • 认识 Flume 功能应用
  • 认识 Flume 组件设置
1.3. 实验过程

1.3.1. 实验任务一:下载和解压 Flume

使用 root 用户解压 Flume 安装包到“/usr/local/src”路径，并修改解压后的文件夹名为 flume。
  1. [root@master ~]# tar xf /opt/software/apache-flume-1.6.0-bin.tar.gz -C /usr/local/src/
  2. [root@master ~]# cd /usr/local/src/
  3. [root@master src]# mv apache-flume-1.6.0-bin/ flume
  4. [root@master src]# chown -R hadoop.hadoop /usr/local/src/
复制代码
1.3.2. 实验任务二:Flume 组件部署

1.3.2.1. 步骤一:使用 root 用户设置 Flume 环境变量,并使环境变量对全部用户生效。
  1. [root@master src]# vim /etc/profile.d/flume.sh
  2. export FLUME_HOME=/usr/local/src/flume
  3. export PATH=${FLUME_HOME}/bin:$PATH
复制代码
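以上只是写入了配置文件，若希望环境变量在当前会话立即生效，可以手动加载该文件并检查变量是否正确（示例）:
[root@master src]# source /etc/profile.d/flume.sh
[root@master src]# echo $FLUME_HOME
/usr/local/src/flume
复制代码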
1.3.2.2. 步骤二:修改 Flume 相应配置文件。

首先，切换到 hadoop 用户，并切换当前工作目录到 Flume 的配置文件夹；通过 echo $PATH 可以确认 Flume 的环境变量已经生效。
  1. [hadoop@master ~]$ echo $PATH
  2. /usr/local/src/hbase/bin:/usr/local/src/zookeeper/bin:/usr/local/src/sqoop/bin:/usr/local/src/hive/bin:/usr/local/src/hbase/bin:/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/src/flume/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/src/hive/bin:/home/hadoop/.local/bin:/home/hadoop/bin
复制代码
1.3.2.3. 步骤三:修改并配置 flume-env.sh 文件。
  1. [hadoop@master ~]$ vim /usr/local/src/hbase/conf/hbase-env.sh
  2. #export HBASE_CLASSPATH=/usr/local/src/hadoop/etc/hadoop/ #注释掉这一行的内容，避免 HBase 的 classpath 设置干扰 flume-ng 命令的执行
  3. export JAVA_HOME=/usr/local/src/jdk
  4. [hadoop@master conf]$ start-all.sh
  5. [hadoop@master ~]$ flume-ng version
  6. Flume 1.6.0
  7. Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
  8. Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
  9. Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
  10. From source with checksum b29e416802ce9ece3269d34233baf43f
复制代码
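另外，按照本步骤标题的要求，也可以基于模板文件生成并配置 flume-env.sh（以下为参考做法，假设 Flume 的 conf 目录下存在 flume-env.sh.template 模板文件）:
[hadoop@master ~]$ cd /usr/local/src/flume/conf
[hadoop@master conf]$ cp flume-env.sh.template flume-env.sh
[hadoop@master conf]$ vi flume-env.sh
# 在文件中写入以下信息
export JAVA_HOME=/usr/local/src/jdk
复制代码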
1.3.3. 实验任务三:使用 Flume 发送和接收信息

通过 Flume 将 Web 服务器中数据传输到 HDFS 中。
1.3.3.1. 步骤一:在 Flume 安装目录中创建 simple-hdfs-flume.conf 文件。
  1. [hadoop@master ~]$ cd /usr/local/src/flume/
  2. [hadoop@master ~]$ vi /usr/local/src/flume/simple-hdfs-flume.conf
  3. a1.sources=r1
  4. a1.sinks=k1
  5. a1.channels=c1
  6. a1.sources.r1.type=spooldir
  7. a1.sources.r1.spoolDir=/usr/local/src/hadoop/logs/
  8. a1.sources.r1.fileHeader=true
  9. a1.sinks.k1.type=hdfs
  10. a1.sinks.k1.hdfs.path=hdfs://master:9000/tmp/flume
  11. a1.sinks.k1.hdfs.rollSize=1048760
  12. a1.sinks.k1.hdfs.rollCount=0
  13. a1.sinks.k1.hdfs.rollInterval=900
  14. a1.sinks.k1.hdfs.useLocalTimeStamp=true
  15. a1.channels.c1.type=file
  16. a1.channels.c1.capacity=1000
  17. a1.channels.c1.transactionCapacity=100
  18. a1.sources.r1.channels = c1
  19. a1.sinks.k1.channel = c1
复制代码
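spooldir 类型的 source 要求被监控的目录必须已经存在，启动前可以先确认 /usr/local/src/hadoop/logs/ 目录中有日志文件（示例）:
[hadoop@master flume]$ ls /usr/local/src/hadoop/logs/
复制代码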
1.3.3.2. 步骤二:使用 flume-ng agent 命令加载 simple-hdfs-flume.conf 配置信息，启动 flume 传输数据。
  1. [hadoop@master ~]$ flume-ng agent --conf-file simple-hdfs-flume.conf --name a1
复制代码
ctrl+c 退出 flume 传输
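如果希望在控制台直接看到采集过程的日志，也可以参考以下写法，其中 --conf 用于指定 Flume 的配置目录，-Dflume.root.logger 为可选的日志参数（示例）:
[hadoop@master flume]$ flume-ng agent --conf ./conf/ --conf-file simple-hdfs-flume.conf --name a1 -Dflume.root.logger=INFO,console
复制代码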
1.3.3.3. 步骤三:查看 Flume 传输到 HDFS 的文件,若能查看到 HDFS 上/tmp/flume目录有传输的数据文件,则表示数据传输成功。
  1. [hadoop@master ~]$ hdfs dfs -ls /
  2. Found 5 items
  3. drwxr-xr-x   - hadoop supergroup          0 2022-04-15 22:04 /hbase
  4. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 18:24 /input
  5. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 18:26 /output
  6. drwxr-xr-x   - hadoop supergroup          0 2022-05-06 17:24 /tmp
  7. drwxr-xr-x   - hadoop supergroup          0 2022-04-29 17:03 /user
复制代码
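也可以进一步列出 /tmp/flume 目录，若能看到 Flume 生成的数据文件（文件名随时间戳变化），同样表示数据传输成功（示例命令）:
[hadoop@master ~]$ hdfs dfs -ls /tmp/flume
复制代码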

第13章 大数据平台监控命令

实验一:通过命令监控大数据平台运行状态

1.1. 实验目标

完成本实验,您应该能够:

  • 掌握大数据平台的运行状况
  • 掌握查看大数据平台运行状况的命令
1.2. 实验要求


  • 认识查看大数据平台运行状态的方式
  • 了解查看大数据平台运行状况的命令
1.3. 实验过程

1.3.1. 实验任务一:通过命令查看大数据平台状态

1.3.1.1. 步骤一: 查看 Linux 系统的信息( uname -a)
  1. [root@master ~]# uname -a
  2. Linux master 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
复制代码
1.3.1.2. 步骤二:查看硬盘信息

(1)查看全部分区(fdisk -l)
  1. [root@master ~]# fdisk -l
  2. Disk /dev/sda: 21.5 GB, 21474836480 bytes, 41943040 sectors
  3. Units = sectors of 1 * 512 = 512 bytes
  4. Sector size (logical/physical): 512 bytes / 512 bytes
  5. I/O size (minimum/optimal): 512 bytes / 512 bytes
  6. Disk label type: dos
  7. Disk identifier: 0x00096169
  8.    Device Boot      Start         End      Blocks   Id  System
  9. /dev/sda1   *        2048     2099199     1048576   83  Linux
  10. /dev/sda2         2099200    41943039    19921920   8e  Linux LVM
  11. Disk /dev/mapper/centos-root: 18.2 GB, 18249416704 bytes, 35643392 sectors
  12. Units = sectors of 1 * 512 = 512 bytes
  13. Sector size (logical/physical): 512 bytes / 512 bytes
  14. I/O size (minimum/optimal): 512 bytes / 512 bytes
复制代码
  1. Disk /dev/mapper/centos-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
  2. Units = sectors of 1 * 512 = 512 bytes
  3. Sector size (logical/physical): 512 bytes / 512 bytes
  4. I/O size (minimum/optimal): 512 bytes / 512 bytes
复制代码
(2)查看全部交换分区(swapon -s)
  1. [root@master ~]# swapon -s
  2. Filename                                Type                Size        Used        Priority
  3. /dev/dm-1                                      partition        2097148        0        -
复制代码
(3)查看文件系统占比(df -h)
  1. [root@master ~]# df -h
  2. Filesystem               Size  Used Avail Use% Mounted on
  3. /dev/mapper/centos-root   17G  4.8G   13G  28% /
  4. devtmpfs                 980M     0  980M   0% /dev
  5. tmpfs                    992M     0  992M   0% /dev/shm
  6. tmpfs                    992M  9.5M  982M   1% /run
  7. tmpfs                    992M     0  992M   0% /sys/fs/cgroup
  8. /dev/sda1               1014M  130M  885M  13% /boot
  9. tmpfs                    199M     0  199M   0% /run/user/0
复制代码
1.3.1.3. 步骤三: 查看网络 IP 地址( ifconfig)
  1. [root@master ~]# ifconfig
  2. ens32: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
  3.         inet 10.10.10.128  netmask 255.255.255.0  broadcast 10.10.10.255
  4.         inet6 fe80::af34:1702:3972:2b64  prefixlen 64  scopeid 0x20<link>
  5.         ether 00:0c:29:2e:33:83  txqueuelen 1000  (Ethernet)
  6.         RX packets 342  bytes 29820 (29.1 KiB)
  7.         RX errors 0  dropped 0  overruns 0  frame 0
  8.         TX packets 257  bytes 26394 (25.7 KiB)
  9.         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  10. lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
  11.         inet 127.0.0.1  netmask 255.0.0.0
  12.         inet6 ::1  prefixlen 128  scopeid 0x10<host>
  13.         loop  txqueuelen 1000  (Local Loopback)
  14.         RX packets 4  bytes 360 (360.0 B)
  15.         RX errors 0  dropped 0  overruns 0  frame 0
  16.         TX packets 4  bytes 360 (360.0 B)
  17.         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
复制代码
1.3.1.4. 步骤四:查看全部监听端口( netstat -lntp)
  1. [root@master ~]# netstat -lntp
  2. Active Internet connections (only servers)
  3. Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   
  4. tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      933/sshd            
  5. tcp6       0      0 :::3306                 :::*                    LISTEN      1021/mysqld         
  5. tcp6       0      0 :::22                   :::*                    LISTEN      933/sshd
复制代码
1.3.1.5. 步骤五:查看全部已经创建的连接( netstat -antp)
  1. [root@master ~]# netstat -antp
  2. Active Internet connections (servers and established)
  3. Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   
  4. tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      933/sshd            
  5. tcp        0     52 10.10.10.128:22         10.10.10.1:59963        ESTABLISHED 1249/sshd: root@pts
  6. tcp6       0      0 :::3306                 :::*                    LISTEN      1021/mysqld         
  7. tcp6       0      0 :::22                   :::*                    LISTEN      933/sshd      
复制代码
1.3.1.6. 步骤六:实时显示进程状态( top ),该命令可以查看进程对 CPU 、内存的占比等。
  1. [root@master ~]# top
  2. top - 16:09:46 up 47 min,  2 users,  load average: 0.00, 0.01, 0.05
  3. Tasks: 115 total,   1 running, 114 sleeping,   0 stopped,   0 zombie
  4. %Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  5. KiB Mem :  2030172 total,  1575444 free,   281296 used,   173432 buff/cache
  6. KiB Swap:  2097148 total,  2097148 free,        0 used.  1571928 avail Mem
  7.    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                             
  8.   1021 mysql     20   0 1258940 191544   6840 S   0.3  9.4   0:01.71 mysqld                              
  9.      1 root      20   0  125456   3896   2560 S   0.0  0.2   0:00.96 systemd                             
  10.      2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd                           
  11.      3 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/0                        
  12.      5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                        
  13.      7 root      rt   0       0      0      0 S   0.0  0.0   0:00.02 migration/0                        
  14.      8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh                              
  15.      9 root      20   0       0      0      0 S   0.0  0.0   0:00.15 rcu_sched                           
  16.     10 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 lru-add-drain                       
  17.     11 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/0                          
  18.     12 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/1                          
  19.     13 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/1                        
  20.     14 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/1                        
  21.     16 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H                        
  22.     17 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/2                          
  23.     18 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/2                        
  24.     19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/2   
复制代码
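top 默认显示全部进程，若只关心某个用户（例如 hadoop 用户）相关的进程，可以加 -u 参数进行过滤（示例）:
[root@master ~]# top -u hadoop
复制代码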
1.3.1.7. 步骤七:查看 CPU 信息( cat /proc/cpuinfo )
  1. [root@master ~]# cat /proc/cpuinfo
  2. processor        : 0
  3. vendor_id        : GenuineIntel
  4. cpu family        : 6
  5. model                : 158
  6. model name        : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  7. stepping        : 10
  8. microcode        : 0xb4
  9. cpu MHz                : 3191.998
  10. cache size        : 12288 KB
  11. physical id        : 0
  12. siblings        : 2
  13. core id                : 0
  14. cpu cores        : 2
  15. apicid                : 0
  16. initial apicid        : 0
  17. fpu                : yes
  18. fpu_exception        : yes
  19. cpuid level        : 22
  20. wp                : yes
  21. flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
  22. bogomips        : 6383.99
  23. clflush size        : 64
  24. cache_alignment        : 64
  25. address sizes        : 45 bits physical, 48 bits virtual
  26. power management:
  27. processor        : 1
  28. vendor_id        : GenuineIntel
  29. cpu family        : 6
  30. model                : 158
  31. model name        : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  32. stepping        : 10
  33. microcode        : 0xb4
  34. cpu MHz                : 3191.998
  35. cache size        : 12288 KB
  36. physical id        : 0
  37. siblings        : 2
  38. core id                : 1
  39. cpu cores        : 2
  40. apicid                : 1
  41. initial apicid        : 1
  42. fpu                : yes
  43. fpu_exception        : yes
  44. cpuid level        : 22
  45. wp                : yes
  46. flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
  47. bogomips        : 6383.99
  48. clflush size        : 64
  49. cache_alignment        : 64
  50. address sizes        : 45 bits physical, 48 bits virtual
  51. power management:
  52. processor        : 2
  53. vendor_id        : GenuineIntel
  54. cpu family        : 6
  55. model                : 158
  56. model name        : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  57. stepping        : 10
  58. microcode        : 0xb4
  59. cpu MHz                : 3191.998
  60. cache size        : 12288 KB
  61. physical id        : 1
  62. siblings        : 2
  63. core id                : 0
  64. cpu cores        : 2
  65. apicid                : 2
  66. initial apicid        : 2
  67. fpu                : yes
  68. fpu_exception        : yes
  69. cpuid level        : 22
  70. wp                : yes
  71. flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
  72. bogomips        : 6383.99
  73. clflush size        : 64
  74. cache_alignment        : 64
  75. address sizes        : 45 bits physical, 48 bits virtual
  76. power management:
  77. processor        : 3
  78. vendor_id        : GenuineIntel
  79. cpu family        : 6
  80. model                : 158
  81. model name        : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  82. stepping        : 10
  83. microcode        : 0xb4
  84. cpu MHz                : 3191.998
  85. cache size        : 12288 KB
  86. physical id        : 1
  87. siblings        : 2
  88. core id                : 1
  89. cpu cores        : 2
  90. apicid                : 3
  91. initial apicid        : 3
  92. fpu                : yes
  93. fpu_exception        : yes
  94. cpuid level        : 22
  95. wp                : yes
  96. flags                : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 ibpb ibrs stibp arat spec_ctrl intel_stibp arch_capabilities
  97. bogomips        : 6383.99
  98. clflush size        : 64
  99. cache_alignment        : 64
  100. address sizes        : 45 bits physical, 48 bits virtual
  101. power management:
复制代码
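/proc/cpuinfo 中每一个 processor 段对应一个逻辑 CPU，若只需要统计逻辑 CPU 的个数，可以配合 grep 使用（示例，本机输出为 4）:
[root@master ~]# grep -c "processor" /proc/cpuinfo
4
复制代码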
1.3.1.8. 步骤八:查看内存信息( cat /proc/meminfo ),该命令可以查看总内存、空闲内存等信息。
  1. [root@master ~]# cat /proc/meminfo
  2. MemTotal:        2030172 kB
  3. MemFree:         1575448 kB
  4. MemAvailable:    1571932 kB
  5. Buffers:            2112 kB
  6. Cached:           126676 kB
  7. SwapCached:            0 kB
  8. Active:           251708 kB
  9. Inactive:         100540 kB
  10. Active(anon):     223876 kB
  11. Inactive(anon):     9252 kB
  12. Active(file):      27832 kB
  13. Inactive(file):    91288 kB
  14. Unevictable:           0 kB
  15. Mlocked:               0 kB
  16. SwapTotal:       2097148 kB
  17. SwapFree:        2097148 kB
  18. Dirty:                 0 kB
  19. Writeback:             0 kB
  20. AnonPages:        223648 kB
  21. Mapped:            28876 kB
  22. Shmem:              9668 kB
  23. Slab:              44644 kB
  24. SReclaimable:      18208 kB
  25. SUnreclaim:        26436 kB
  26. KernelStack:        4512 kB
  27. PageTables:         4056 kB
  28. NFS_Unstable:          0 kB
  29. Bounce:                0 kB
  30. WritebackTmp:          0 kB
  31. CommitLimit:     3112232 kB
  32. Committed_AS:     782724 kB
  33. VmallocTotal:   34359738367 kB
  34. VmallocUsed:      180220 kB
  35. VmallocChunk:   34359310332 kB
  36. HardwareCorrupted:     0 kB
  37. AnonHugePages:    178176 kB
  38. CmaTotal:              0 kB
  39. CmaFree:               0 kB
  40. HugePages_Total:       0
  41. HugePages_Free:        0
  42. HugePages_Rsvd:        0
  43. HugePages_Surp:        0
  44. Hugepagesize:       2048 kB
  45. DirectMap4k:       63360 kB
  46. DirectMap2M:     2033664 kB
  47. DirectMap1G:           0 kB
复制代码
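/proc/meminfo 以 kB 为单位显示，若希望以更直观的方式查看总内存、已用内存和交换分区，也可以使用 free 命令（示例，-h 表示以人类可读的单位显示）:
[root@master ~]# free -h
复制代码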
1.3.2. 实验任务二:通过命令查看 Hadoop 状态

1.3.2.1. 步骤一:切换到 hadoop 用户

若当前的用户为 root,请切换到 hadoop 用户进行操作。
  1. [root@master ~]# su - hadoop
  2. Last login: Tue May 10 14:33:03 CST 2022 on pts/0
  3. [hadoop@master ~]$
复制代码
1.3.2.2. 步骤二:切换到 Hadoop 的安装目录
  1. [hadoop@master ~]$ cd /usr/local/src/hadoop/
  2. [hadoop@master hadoop]$
复制代码
1.3.2.3. 步骤三:启动 Hadoop
  1. [hadoop@master hadoop]$ start-all.sh
  2. This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
  3. Starting namenodes on [master]
  4. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
  5. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  6. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
  7. Starting secondary namenodes [0.0.0.0]
  8. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
  9. starting yarn daemons
  10. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
  11. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  12. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
  13. [hadoop@master hadoop]$ jps
  14. 1697 SecondaryNameNode
  15. 2115 Jps
  16. 1865 ResourceManager
  17. 1498 NameNode
复制代码
1.3.2.4. 步骤四:关闭 Hadoop
  1. [hadoop@master hadoop]$ stop-all.sh
  2. This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
  3. Stopping namenodes on [master]
  4. master: stopping namenode
  5. 10.10.10.130: stopping datanode
  6. 10.10.10.129: stopping datanode
  7. Stopping secondary namenodes [0.0.0.0]
  8. 0.0.0.0: stopping secondarynamenode
  9. stopping yarn daemons
  10. stopping resourcemanager
  11. 10.10.10.129: stopping nodemanager
  12. 10.10.10.130: stopping nodemanager
  13. no proxyserver to stop
复制代码
实验二:通过命令监控大数据平台资源状态

2.1 实验目标

完成本实验,您应该能够:

  • 掌握大数据平台资源的运行状况
  • 掌握查看大数据平台资源运行状况的命令
2.2. 实验要求


  • 认识查看大数据平台资源运行状态的方式

  • 了解查看大数据平台资源运行状况的命令
2.3. 实验过程

2.3.1. 实验任务一:通过命令查看 YARN 状态

2.3.1.1. 步骤一:确认切换到目录 /usr/local/src/hadoop
  1. [hadoop@master ~]$ cd /usr/local/src/hadoop/
  2. [hadoop@master hadoop]$
复制代码
2.3.1.2. 步骤二:返回主机界面，在 Master 主机上执行 start-all.sh
  1. [hadoop@master ~]$ start-all.sh
  2. This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
  3. Starting namenodes on [master]
  4. master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
  5. 10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slav1.out
  6. 10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
  7. Starting secondary namenodes [0.0.0.0]
  8. 0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
  9. starting yarn daemons
  10. starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
  11. 10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slav1.out
  12. 10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
  13. [hadoop@master ~]$
  14. #master 节点启动 zookeeper
  15. [hadoop@master hadoop]$ zkServer.sh start
  16. ZooKeeper JMX enabled by default
  17. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  18. Starting zookeeper ... STARTED
  19. #slave1 节点启动 zookeeper
  20. [hadoop@slav1 hadoop]$ zkServer.sh start
  21. ZooKeeper JMX enabled by default
  22. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  23. Starting zookeeper ... STARTED
  24. #slave2 节点启动 zookeeper
  25. [hadoop@slave2 hadoop]$ zkServer.sh start
  26. ZooKeeper JMX enabled by default
  27. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  28. Starting zookeeper ... STARTED
复制代码
2.3.1.3. 步骤三:执行 jps 命令，若发现 Master 上有 ResourceManager 进程和 NodeManager 进程，则 YARN 启动完成。
  1. 2817 NameNode
  2. 3681 ResourceManager
  3. 3477 NodeManager
  4. 3909 Jps
  5. 2990 SecondaryNameNode
复制代码
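NodeManager 和 DataNode 进程主要运行在从节点上，若需要确认从节点的进程情况，可以通过 ssh 远程执行 jps（示例，假设已配置 master 到从节点的免密登录，且 JDK 安装在 /usr/local/src/jdk）:
[hadoop@master hadoop]$ ssh slave1 /usr/local/src/jdk/bin/jps
[hadoop@master hadoop]$ ssh slave2 /usr/local/src/jdk/bin/jps
复制代码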
2.3.2. 实验任务二:通过命令查看HDFS状态

2.3.2.1. 步骤一:目录操作

切换到 hadoop 目录,执行 cd /usr/local/src/hadoop 命令
  1. [hadoop@master ~]$ cd /usr/local/src/hadoop
  2. [hadoop@master hadoop]$
复制代码
查看 HDFS 目录
  1. [hadoop@master hadoop]$ ./bin/hdfs dfs -ls /
复制代码
2.3.2.2. 步骤二:查看 HDFS 的报告，执行命令:bin/hdfs dfsadmin -report
  1. [hadoop@master hadoop]$ bin/hdfs dfsadmin -report
  2. Configured Capacity: 36477861888 (33.97 GB)
  3. Present Capacity: 31767752704 (29.59 GB)
  4. DFS Remaining: 31767146496 (29.59 GB)
  5. DFS Used: 606208 (592 KB)
  6. DFS Used%: 0.00%
  7. Under replicated blocks: 0
  8. Blocks with corrupt replicas: 0
  9. Missing blocks: 0
  10. Missing blocks (with replication factor 1): 0
  11. -------------------------------------------------
  12. Live datanodes (2):
  13. Name: 10.10.10.129:50010 (node1)
  14. Hostname: node1
  15. Decommission Status : Normal
  16. Configured Capacity: 18238930944 (16.99 GB)
  17. DFS Used: 303104 (296 KB)
  18. Non DFS Used: 2379792384 (2.22 GB)
  19. DFS Remaining: 15858835456 (14.77 GB)
  20. DFS Used%: 0.00%
  21. DFS Remaining%: 86.95%
  22. Configured Cache Capacity: 0 (0 B)
  23. Cache Used: 0 (0 B)
  24. Cache Remaining: 0 (0 B)
  25. Cache Used%: 100.00%
  26. Cache Remaining%: 0.00%
  27. Xceivers: 1
  28. Last contact: Fri May 20 18:31:48 CST 2022
复制代码
  1. Name: 10.10.10.130:50010 (node2)
  2. Hostname: node2
  3. Decommission Status : Normal
  4. Configured Capacity: 18238930944 (16.99 GB)
  5. DFS Used: 303104 (296 KB)
  6. Non DFS Used: 2330316800 (2.17 GB)
  7. DFS Remaining: 15908311040 (14.82 GB)
  8. DFS Used%: 0.00%
  9. DFS Remaining%: 87.22%
  10. Configured Cache Capacity: 0 (0 B)
  11. Cache Used: 0 (0 B)
  12. Cache Remaining: 0 (0 B)
  13. Cache Used%: 100.00%
  14. Cache Remaining%: 0.00%
  15. Xceivers: 1
  16. Last contact: Fri May 20 18:31:48 CST 2022
复制代码
2.3.2.3. 步骤三:查看 HDFS 空间使用情况，执行命令:hdfs dfs -df
  1. [hadoop@master hadoop]$ hdfs dfs -df
  2. Filesystem                 Size    Used    Available  Use%
  3. hdfs://master:9000  36477861888  606208  31767146496    0%
复制代码
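hdfs dfs -df 默认以字节为单位显示，加上 -h 参数可以换算成更直观的单位（示例）:
[hadoop@master hadoop]$ hdfs dfs -df -h /
复制代码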
2.3.3. 实验任务三:通过命令查看 HBase 状态

2.3.3.1. 步骤一:启动运行 HBase

切换到 HBase 安装目录/usr/local/src/hbase,命令如下:
  1. [hadoop@master hadoop]$ cd /usr/local/src/hbase
  2. [hadoop@master hbase]$ hbase version
  3. HBase 1.2.1
  4. Source code repository git://asf-dev/home/busbey/projects/hbase revision=8d8a7107dc4ccbf36a92f64675dc60392f85c015
  5. Compiled by busbey on Wed Mar 30 11:19:21 CDT 2016
  6. From source with checksum f4bb4a14bb4e0b72b46f729dae98a772
复制代码
结果显示 HBase 1.2.1，说明 HBase 安装正常，版本号为 1.2.1。
2.3.3.2. 步骤二:查看HBase版本信息

执行命令hbase shell,进入HBase命令交互界面。
  1. [hadoop@master hbase]$ hbase shell
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  6. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  7. HBase Shell; enter 'help<RETURN>' for list of supported commands.
  8. Type "exit<RETURN>" to leave the HBase Shell
  9. Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
复制代码
输入version,查询 HBase 版本
  1. hbase(main):001:0> version
  2. 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
复制代码
结果显示 HBase 版本为 1.2.1
2.3.3.3. 步骤三:查询 HBase 状态。在 HBase 命令交互界面，执行 status 命令
  1. 1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load
复制代码
我们还可以“简单”查询 HBase 的状态,执行命令 status 'simple'
  1. active master: master:16000 1589125905790
  2. 0 backup masters
  3. 3 live servers
  4. master:16020 1589125908065
  5. requestsPerSecond=0.0, numberOfOnlineRegions=1,
  6. usedHeapMB=28, maxHeapMB=1918, numberOfStores=1,
  7. numberOfStorefiles=1, storefileUncompressedSizeMB=0,
  8. storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
  9. readRequestsCount=5, writeRequestsCount=1, rootIndexSizeKB=0,
  10. totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
  11. totalCompactingKVs=0, currentCompactedKVs=0,
  12. compactionProgressPct=NaN, coprocessors=[MultiRowMutationEndpoint]
  13. slave1:16020 1589125915820
  14. requestsPerSecond=0.0, numberOfOnlineRegions=0,
  15. usedHeapMB=17, maxHeapMB=440, numberOfStores=0,
  16. numberOfStorefiles=0, storefileUncompressedSizeMB=0,
  17. storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
  18. readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0,
  19. totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
  20. totalCompactingKVs=0, currentCompactedKVs=0,
  21. compactionProgressPct=NaN, coprocessors=[]
  22. slave2:16020 1589125917741
  23. requestsPerSecond=0.0, numberOfOnlineRegions=1,
  24. usedHeapMB=15, maxHeapMB=440, numberOfStores=1,
  25. numberOfStorefiles=1, storefileUncompressedSizeMB=0,
  26. storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
  27. readRequestsCount=4, writeRequestsCount=0, rootIndexSizeKB=0,
  28. totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
  29. totalCompactingKVs=0, currentCompactedKVs=0,
  30. compactionProgressPct=NaN, coprocessors=[]
  31. 0 dead servers
  32. Aggregate load: 0, regions: 2
复制代码
显示更多的关于 Master、Slave1和 Slave2 主机的服务端口、请求时间等具体信息。
如果需要查询更多关于 HBase 状态的信息，执行命令 help 'status'
  1. hbase(main):004:0> help 'status'
  2. Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
  3. default is 'summary'. Examples:
  4.   hbase> status
  5.   hbase> status 'simple'
  6.   hbase> status 'summary'
  7.   hbase> status 'detailed'
  8.   hbase> status 'replication'
  9.   hbase> status 'replication', 'source'
  10.   hbase> status 'replication', 'sink'
复制代码
结果显示出全部关于 status 的命令。
2.3.3.4. 步骤四:停止 HBase 服务

若要停止 HBase 服务，则执行命令 stop-hbase.sh。
  1. [hadoop@master hbase]$ stop-hbase.sh
  2. stopping hbasecat.........
复制代码
2.3.4. 实验任务四:通过命令查看 Hive 状态

2.3.4.1. 步骤一:启动 Hive

切换到/usr/local/src/hive 目录,输入 hive,回车。
  1. [hadoop@master ~]$ cd /usr/local/src/hive/
  [hadoop@master hive]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  8. Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
  9. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  10. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  11. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  12. Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  13. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  14. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  15. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  16. Fri May 20 18:51:52 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  17. Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  18. hive>
复制代码
When the hive> prompt appears, Hive has started successfully and you are now in the Hive shell.
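Besides the interactive shell, Hive can run a single statement and exit, which is convenient for quickly checking that the service works; a minimal sketch using the standard hive -e option:
  1. # Run one HiveQL statement without entering the interactive shell.
  2. [hadoop@master hive]$ hive -e "show databases;"
复制代码
A printed database list followed by a return to the Linux prompt is enough to confirm that Hive and its metastore are reachable.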
2.4.4.2. Step 2: Basic Hive commands

Note: every statement typed at the Hive command line must end with a semicolon.
(1) List the databases
  1. hive> show databases;
  2. OK
  3. default
  4. sample
  5. Time taken: 0.596 seconds, Fetched: 2 row(s)
  6. hive>
复制代码
The output shows the default database default as well as the sample database created earlier.
(2) List all tables in the default database
  1. hive> use default;
  2. OK
  3. Time taken: 0.018 seconds
  4. hive> show tables;
  5. OK
  6. test
  7. Time taken: 0.036 seconds, Fetched: 1 row(s)
  8. hive>
复制代码
The default database currently contains a single table, test.
(3) Create a table stu with an integer column id and a string column name
  1. hive> create table stu(id int,name string);
  2. OK
  3. Time taken: 0.23 seconds
  4. hive>
复制代码
(4) Insert one row into table stu with id 1001 and name zhangsan
  1. hive> insert into stu values (1001,"zhangsan");
  2. WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  3. Query ID = hadoop_20220520185326_7c18630d-0690-4b35-8de8-423c9b901677
  4. Total jobs = 3
  5. Launching Job 1 out of 3
  6. Number of reduce tasks is set to 0 since there's no reduce operator
  7. Starting Job = job_1653042072571_0001, Tracking URL = http://master:8088/proxy/application_1653042072571_0001/
  8. Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1653042072571_0001
  9. Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
  10. 2022-05-20 18:56:05,436 Stage-1 map = 0%,  reduce = 0%
  11. 2022-05-20 18:56:11,699 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.47 sec
  12. MapReduce Total cumulative CPU time: 3 seconds 470 msec
  13. Ended Job = job_1653042072571_0001
  14. Stage-4 is selected by condition resolver.
  15. Stage-3 is filtered out by condition resolver.
  16. Stage-5 is filtered out by condition resolver.
  17. Moving data to: hdfs://master:9000/user/hive/warehouse/stu/.hive-staging_hive_2022-05-20_18-55-52_567_2370673334190980235-1/-ext-10000
  18. Loading data to table default.stu
  19. MapReduce Jobs Launched:
  20. Stage-Stage-1: Map: 1   Cumulative CPU: 3.47 sec   HDFS Read: 4138 HDFS Write: 81 SUCCESS
  21. Total MapReduce CPU Time Spent: 3 seconds 470 msec
  22. OK
  23. Time taken: 20.438 seconds
复制代码
Following the same steps, insert two more rows with id/name values 1002/lisi and 1003/wangwu, as sketched below.
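A sketch of the two remaining inserts, run here through hive -e instead of the interactive prompt (each statement launches a MapReduce job like the one shown above):
  1. [hadoop@master hive]$ hive -e 'insert into stu values (1002,"lisi");'
  2. [hadoop@master hive]$ hive -e 'insert into stu values (1003,"wangwu");'
复制代码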
(5) List the tables again after inserting the data
  1. hive> show tables;
  2. OK
  3. stu
  4. test
  5. values__tmp__table__1
  6. Time taken: 0.017 seconds, Fetched: 3 row(s)
  7. hive>
复制代码
(6) Show the structure of table stu
  1. hive> desc stu;
  2. OK
  3. id                          int                                             
  4. name                        string                                          
  5. Time taken: 0.031 seconds, Fetched: 2 row(s)
  6. hive>
复制代码
(7) Query the contents of table stu
  1. hive> select * from stu;
  2. OK
  3. 1001        zhangsan
  4. Time taken: 0.077 seconds, Fetched: 1 row(s)
  5. hive>
复制代码
2.4.4.3. Step 3: View the file systems and command history from the Hive CLI

(1) View the local file system by running ! ls /usr/local/src;
  1. hive> ! ls /usr/local/src;
  2. apache-hive-2.0.0-bin
  3. flume
  4. hadoop
  5. hbase
  6. hive
  7. jdk
  8. sqoop
  9. zookeeper
复制代码
(2) View the HDFS file system by running dfs -ls /;
  1. hive> dfs -ls /;
  2. Found 5 items
  3. drwxr-xr-x   - hadoop supergroup          0 2022-04-15 22:04 /hbase
  4. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 18:24 /input
  5. drwxr-xr-x   - hadoop supergroup          0 2022-04-02 18:26 /output
  6. drwxr-xr-x   - hadoop supergroup          0 2022-05-20 18:55 /tmp
  7. drwxr-xr-x   - hadoop supergroup          0 2022-04-29 17:03 /user
复制代码
(3) View all commands previously entered in Hive
Go to the hadoop user's home directory /home/hadoop and view the .hivehistory file.
  1. [hadoop@master ~]$ cd /home/hadoop
  2. [hadoop@master ~]$ cat .hivehistory
  3. create database sample;
  4. use sample;
  5. create table student(number STRING,name STRING);
  6. exit;
  7. select * from sample.student;
  8. exit;
  9. show tables;
  10. exit;
  11. show databases;
  12. use default;
  13. show tables;
  14. create table stu(id int,name string);
  15. insert into stu values (1001,"zhangsan");
  16. show tables;
  17. desc stu;
  18. select * from stu;
  19. ! ls /usr/local/src;
  20. dfs -ls /;
  21. exit
  22. ;
复制代码
The result shows every command previously run in the Hive CLI, including failed ones, which is helpful for maintenance and troubleshooting.
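Because .hivehistory is an ordinary text file, standard shell tools can search it; for example, to find every table-creation statement ever typed in the Hive shell:
  1. [hadoop@master ~]$ grep -n "create table" ~/.hivehistory
复制代码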
Experiment 3: Monitor the status of big data platform services from the command line

3.1. Experiment objectives

After completing this experiment, you should be able to:

  • Understand the running state of the big data platform services
  • Master the commands used to check the running state of the big data platform services
3.2. Experiment requirements


  • Be familiar with the ways of checking the running state of the big data platform services
  • Understand the commands used to check the running state of the big data platform services
3.3. Experiment procedure

3.3.1. Task 1: Check ZooKeeper status via commands

3.3.1.1. Step 1: Check the ZooKeeper status by running zkServer.sh status; the result is shown below
  1. [hadoop@master ~]$ zkServer.sh status
  2. ZooKeeper JMX enabled by default
  3. Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
  4. Mode: follower
复制代码
In the output above, Mode: follower indicates that this ZooKeeper node is running as a follower.
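ZooKeeper also answers simple four-letter-word commands on its client port, so the same check can be scripted for every node from master; a minimal sketch, assuming nc (netcat) is installed and the ensemble runs on master, slave1 and slave2 at port 2181:
  1. #!/bin/bash
  2. # Query liveness (ruok -> imok) and role (Mode: leader/follower) of each ZooKeeper node.
  3. for host in master slave1 slave2; do
  4.   state=$(echo ruok | nc "$host" 2181)
  5.   mode=$(echo srvr | nc "$host" 2181 | grep Mode)
  6.   echo "$host: $state $mode"
  7. done
复制代码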
3.3.1.2. Step 2: Check the running processes

QuorumPeerMain is the entry class for starting a ZooKeeper cluster node; it loads the configuration and then starts the QuorumPeer thread.
Run the jps command to check the running processes.
  1. [hadoop@master ~]$ jps
  2. 5029 Jps
  3. 3494 SecondaryNameNode
  4. 3947 QuorumPeerMain
  5. 3292 NameNode
  6. 3660 ResourceManager
复制代码
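Since passwordless SSH between the nodes was configured earlier, the same process check can be run on the slave nodes from master; a minimal sketch (node names slave1 and slave2, as used throughout these experiments):
  1. # List the Java daemons on every node of the cluster from master.
  2. for host in master slave1 slave2; do
  3.   echo "==== $host ===="
  4.   ssh "$host" 'source /etc/profile.d/hadoop.sh; jps | grep -v Jps'
  5. done
复制代码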
3.3.1.3. Step 3: After the ZooKeeper service has started successfully, run the command zkCli.sh to connect to the ZooKeeper service.
  1. [hadoop@master ~]$ zkCli.sh
  2. Connecting to localhost:2181
  3. 2022-05-20 19:07:11,924 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.8--1, built on 02/06/2016 03:18 GMT
  4. 2022-05-20 19:07:11,927 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=master
  5. 2022-05-20 19:07:11,927 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_152
  6. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
  7. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/local/src/jdk/jre
  8. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/usr/local/src/zookeeper/bin/../build/classes:/usr/local/src/zookeeper/bin/../build/lib/*.jar:/usr/local/src/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/src/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/src/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/local/src/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/local/src/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/local/src/zookeeper/bin/../zookeeper-3.4.8.jar:/usr/local/src/zookeeper/bin/../src/java/lib/*.jar:/usr/local/src/zookeeper/bin/../conf::/usr/local/src/sqoop/lib
  9. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
  10. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
  11. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
  12. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
  13. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
  14. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.10.0-862.el7.x86_64
  15. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=hadoop
  16. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/hadoop
  17. 2022-05-20 19:07:11,929 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/hadoop
  18. 2022-05-20 19:07:11,930 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@69d0a921
  19. Welcome to ZooKeeper!
  20. 2022-05-20 19:07:11,946 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
  21. JLine support is enabled
  22. 2022-05-20 19:07:11,984 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@876] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
  23. 2022-05-20 19:07:11,991 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x180e0fed4990001, negotiated timeout = 30000
  24. WATCHER::
  25. WatchedEvent state:SyncConnected type:None path:null
  26. [zk: localhost:2181(CONNECTED) 0]
复制代码
3.3.1.4. Step 4: Use a watch to monitor the /hbase znode; once the contents of /hbase change, a notification is raised. Enable the watch by running the command get /hbase 1.
  1. cZxid = 0x100000002
  2. ctime = Thu Apr 23 16:02:29 CST 2022
  3. mZxid = 0x100000002
  4. mtime = Thu Apr 23 16:02:29 CST 2022
  5. pZxid = 0x20000008d
  6. cversion = 26
  7. dataVersion = 0
  8. aclVersion = 0
  9. ephemeralOwner = 0x0
  10. dataLength = 0
  11. numChildren = 16
  12. [zk: localhost:2181(CONNECTED) 1] set /hbase value-update
  13. WATCHER::cZxid = 0x100000002
  14. WatchedEvent state:SyncConnected type:NodeDataChanged
  15. path:/hbase
  16. ctime = Thu Apr 23 16:02:29 CST 2022
  17. mZxid = 0x20000c6d3
  18. mtime = Fri May 15 15:03:41 CST 2022
  19. pZxid = 0x20000008d
  20. cversion = 26
  21. dataVersion = 1
  22. aclVersion = 0
  23. ephemeralOwner = 0x0
  24. dataLength = 12
  25. numChildren = 16
  26. [zk: localhost:2181(CONNECTED) 2] get /hbase
  27. value-update
  28. cZxid = 0x100000002
  29. ctime = Thu Apr 23 16:02:29 CST 2022
  30. mZxid = 0x20000c6d3
  31. mtime = Fri May 15 15:03:41 CST 2022
  32. pZxid = 0x20000008d
  33. cversion = 26
  34. dataVersion = 1
  35. aclVersion = 0
  36. ephemeralOwner = 0x0
  37. dataLength = 12
  38. numChildren = 16
  39. [zk: localhost:2181(CONNECTED) 3] quit
复制代码
The result shows that after running set /hbase value-update, a NodeDataChanged watch event fired and dataVersion changed from 0 to 1, confirming that /hbase is being watched.
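The same znode can also be read without entering the interactive client, which is convenient in scripts; a sketch, assuming the ZooKeeper 3.4 client accepts a command after the -server option as shown:
  1. # Read the /hbase znode once and exit instead of opening the interactive prompt.
  2. [hadoop@master ~]$ zkCli.sh -server localhost:2181 get /hbase
复制代码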
3.3.2. Task 2: Check Sqoop status via commands

3.3.2.1. Step 1: Query the Sqoop version number to verify that Sqoop works

First change to the /usr/local/src/sqoop directory, then run the command ./bin/sqoop-version
  1. [hadoop@master ~]$ cd /usr/local/src/sqoop
  2. [hadoop@master sqoop]$ ./bin/sqoop-version
  3. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  4. Please set $HCAT_HOME to the root of your HCatalog installation.
  5. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  6. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  7. 22/05/20 19:10:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  8. Sqoop 1.4.7
  9. git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
  10. Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
复制代码
The output line Sqoop 1.4.7 shows that the Sqoop version is 1.4.7 and that it runs correctly.
3.3.2.2. Step 2: Test whether Sqoop can connect to the database

Change to the Sqoop directory and run the command bin/sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password123$, where "master:3306" is the database host name and port.
  1. [hadoop@master sqoop]$ bin/sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password123$
  2. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  3. Please set $HCAT_HOME to the root of your HCatalog installation.
  4. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  5. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  6. 22/05/20 19:13:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  7. 22/05/20 19:13:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  8. 22/05/20 19:13:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  9. Fri May 20 19:13:21 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
  10. information_schema
  11. hive
  12. mysql
  13. performance_schema
  14. sample
  15. sys
复制代码
The result shows that Sqoop can connect to MySQL and list all the databases on the master host, such as information_schema, hive, mysql, performance_schema and sys.
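In the same way, the tables inside a single database can be listed; a sketch against the sample database shown above, with -P prompting for the password instead of putting it on the command line (as the warning in the output recommends):
  1. [hadoop@master sqoop]$ bin/sqoop list-tables --connect jdbc:mysql://master:3306/sample --username root -P
复制代码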
3.3.2.3. Step 3: Run the command sqoop help; output like the following indicates that Sqoop works correctly.
  1. [hadoop@master sqoop]$ sqoop help
  2. Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
  3. Please set $HCAT_HOME to the root of your HCatalog installation.
  4. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
  5. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  6. 22/05/20 19:14:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  7. usage: sqoop COMMAND [ARGS]
  8. Available commands:
  9.   codegen            Generate code to interact with database records
  10.   create-hive-table  Import a table definition into Hive
  11.   eval               Evaluate a SQL statement and display the results
  12.   export             Export an HDFS directory to a database table
  13.   help               List available commands
  14.   import             Import a table from a database to HDFS
  15.   import-all-tables  Import tables from a database to HDFS
  16.   import-mainframe   Import datasets from a mainframe server to HDFS
  17.   job                Work with saved jobs
  18.   list-databases     List available databases on a server
  19.   list-tables        List available tables in a database
  20.   merge              Merge results of incremental imports
  21.   metastore          Run a standalone Sqoop metastore
  22.   version            Display version information
  23. See 'sqoop help COMMAND' for information on a specific command.
复制代码
The output lists Sqoop's commonly used commands and their functions.
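As the last line of the output notes, detailed help is available for each command; for example:
  1. [hadoop@master sqoop]$ sqoop help list-databases
复制代码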


3.3.3. Task 3: Check Flume status via commands

3.3.3.1. Step 1: Verify the Flume installation by running the flume-ng version command to check the Flume version.
  1. [hadoop@master ~]$ cd /usr/local/src/flume
  2. [hadoop@master flume]$ flume-ng version
  3. Flume 1.6.0
  4. Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
  5. Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
  6. Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
  7. From source with checksum b29e416802ce9ece3269d34233baf43f
复制代码
3.3.3.2. Step 2: Add example.conf to /usr/local/src/flume
  1. [hadoop@master flume]$ cat /usr/local/src/flume/example.conf
  2. a1.sources=r1
  3. a1.sinks=k1
  4. a1.channels=c1
  5. a1.sources.r1.type=spooldir
  6. a1.sources.r1.spoolDir=/usr/local/src/flume/
  7. a1.sources.r1.fileHeader=true
  8. a1.sinks.k1.type=hdfs
  9. a1.sinks.k1.hdfs.path=hdfs://master:9000/flume
  10. a1.sinks.k1.hdfs.rollSize=1048760
  11. a1.sinks.k1.hdfs.rollCount=0
  12. a1.sinks.k1.hdfs.rollInterval=900
  13. a1.sinks.k1.hdfs.useLocalTimeStamp=true
  14. a1.channels.c1.type=file
  15. a1.channels.c1.capacity=1000
  16. a1.channels.c1.transactionCapacity=100
  17. a1.sources.r1.channels = c1
  18. a1.sinks.k1.channel = c1
复制代码
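In this configuration, source r1 watches the spooling directory /usr/local/src/flume/, channel c1 buffers events on disk, and sink k1 writes them to hdfs://master:9000/flume. Once the agent is started in the next step, any new file dropped into the spooling directory is delivered to HDFS; a minimal way to generate a test event (the file name test.log is arbitrary):
  1. # The running agent ingests the file and renames it with a .COMPLETED suffix when done.
  2. [hadoop@master flume]$ echo "hello flume" > /usr/local/src/flume/test.log
复制代码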
3.3.3.3. Step 3: Start Flume agent a1 with console logging
  1. [hadoop@master flume]$ /usr/local/src/flume/bin/flume-ng agent --conf ./conf --conf-file ./example.conf --name a1 -Dflume.root.logger=INFO,console
复制代码
3.3.3.4. Step 4: Check the result
  1. [hadoop@master flume]$ hdfs dfs -lsr /flume
  2. drwxr-xr-x   - hadoop supergroup          0 2022-05-20 15:16 /flume/20220520
  3. -rw-r--r--   2 hadoop supergroup         11 2022-05-20 15:16 /flume/20220520/events-.
复制代码
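The listing confirms that Flume created a dated directory under /flume and wrote an events file into it (the file name is truncated above). The collected content can be inspected directly; a sketch using a wildcard because the exact file name includes a generated suffix:
  1. [hadoop@master flume]$ hdfs dfs -cat '/flume/20220520/*'
复制代码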