Experiment 1: Flume Component Installation and Configuration
1. Download and Extract Flume
The Flume installation package can be downloaded from the official archive at https://archive.apache.org/dist/flume/1.6.0/

[root@master ~]# ls
anaconda-ks.cfg jdk-8u152-linux-x64.tar.gz
apache-flume-1.6.0-bin.tar.gz mysql
apache-hive-2.0.0-bin.tar.gz mysql-connector-java-5.1.46.jar
derby.log sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
hadoop-2.7.1.tar.gz zookeeper-3.4.8.tar.gz
hbase-1.2.1-bin.tar.gz
# As root, extract the Flume package into /usr/local/src and rename the extracted directory to flume.
[root@master ~]# tar xf apache-flume-1.6.0-bin.tar.gz -C /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# ls
apache-flume-1.6.0-bin hadoop hbase hive jdk sqoop zookeeper
[root@master src]# mv apache-flume-1.6.0-bin flume
[root@master src]# ls
flume hadoop hbase hive jdk sqoop zookeeper
2. Flume Component Deployment
Step 1: As root, set the Flume environment variables and make them take effect for all users.

[root@master ~]# vim /etc/profile.d/flume.sh
export FLUME_HOME=/usr/local/src/flume
export PATH=${FLUME_HOME}/bin:$PATH
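Files under /etc/profile.d/ are only read by new login shells, so the variables are not visible in the current session until it is reloaded. A minimal check, assuming the file created above:

# Load the new variables into the current shell and verify them
[root@master ~]# source /etc/profile.d/flume.sh
[root@master ~]# echo $FLUME_HOME
/usr/local/src/flume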
Step 2: Modify the Flume configuration files.
First, switch to the hadoop user and change the working directory to Flume's conf folder.

[root@master ~]# chown -R hadoop.hadoop /usr/local/src/
[root@master ~]# su - hadoop
Last login: Fri Apr 14 16:31:48 CST 2023 on pts/1
[hadoop@master ~]$ ls
derby.log input student.java zookeeper.out
[hadoop@master ~]$ cd /usr/local/src/flume/conf/
[hadoop@master conf]$ ls
flume-conf.properties.template flume-env.sh.template
flume-env.ps1.template log4j.properties
Copy flume-env.sh.template and rename the copy flume-env.sh.

[hadoop@master conf]$ cp flume-env.sh.template flume-env.sh
[hadoop@master conf]$ ls
flume-conf.properties.template flume-env.sh log4j.properties
flume-env.ps1.template flume-env.sh.template
Step 3: Edit and configure flume-env.sh.
Uncomment the JAVA_HOME variable and set it to the JDK installation path.

[hadoop@master conf]$ vi flume-env.sh
export JAVA_HOME=/usr/local/src/jdk
#export HBASE_CLASSPATH=/usr/local/src/hadoop/etc/hadoop
Run flume-ng version to verify the installation. If the Flume version is reported as 1.6.0, the installation succeeded.

[hadoop@master conf]$ flume-ng version
Error: Could not find or load main class org.apache.flume.tools.GetJavaProperty
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
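The "Could not find or load main class org.apache.flume.tools.GetJavaProperty" line does not stop Flume from working; it is commonly reported when the flume-ng script picks up HBase's classpath settings. One possible workaround, sketched here on the assumption that HBase is installed at /usr/local/src/hbase, is to comment out the HBASE_CLASSPATH export in HBase's environment file and re-run the check:

# Comment out the HBASE_CLASSPATH export in HBase's env file (assumed path)
[hadoop@master conf]$ vi /usr/local/src/hbase/conf/hbase-env.sh
# Re-run the version check; the error line should no longer appear
[hadoop@master conf]$ flume-ng version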
3. Sending and Receiving Data with Flume
Use Flume to transfer server data (in this example, the Hadoop log directory) into HDFS.
Step 1: Create an xxx.conf file in the Flume installation directory.

[hadoop@master flume]$ vi xxx.conf
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/usr/local/src/hadoop/logs
a1.sources.r1.fileHeader=true
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://master:9000/tmp/flume
a1.sinks.k1.hdfs.rollSize=1048760
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollInterval=900
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.channels.c1.type=file
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[hadoop@master flume]$ ls /usr/local/src/hadoop/
bin etc lib LICENSE.txt NOTICE.txt sbin tmp
dfs include libexec logs README.txt share
[hadoop@master flume]$ ls /usr/local/src/hadoop/logs/
hadoop-hadoop-namenode-master.example.com.log
hadoop-hadoop-namenode-master.example.com.out
hadoop-hadoop-namenode-master.example.com.out.1
hadoop-hadoop-namenode-master.example.com.out.2
hadoop-hadoop-namenode-master.example.com.out.3
hadoop-hadoop-namenode-master.example.com.out.4
hadoop-hadoop-namenode-master.example.com.out.5
hadoop-hadoop-secondarynamenode-master.example.com.log
hadoop-hadoop-secondarynamenode-master.example.com.out
hadoop-hadoop-secondarynamenode-master.example.com.out.1
hadoop-hadoop-secondarynamenode-master.example.com.out.2
hadoop-hadoop-secondarynamenode-master.example.com.out.3
hadoop-hadoop-secondarynamenode-master.example.com.out.4
hadoop-hadoop-secondarynamenode-master.example.com.out.5
SecurityAuth-hadoop.audit
yarn-hadoop-resourcemanager-master.example.com.log
yarn-hadoop-resourcemanager-master.example.com.out
yarn-hadoop-resourcemanager-master.example.com.out.1
yarn-hadoop-resourcemanager-master.example.com.out.2
yarn-hadoop-resourcemanager-master.example.com.out.3
yarn-hadoop-resourcemanager-master.example.com.out.4
yarn-hadoop-resourcemanager-master.example.com.out.5
Step 2: Use the flume-ng agent command to load the xxx.conf configuration and start Flume to transfer the data.

[hadoop@master flume]$ flume-ng agent --conf-file xxx.conf --name a1
Warning: No configuration directory set! Use --conf <dir> to override.
Info: Including Hadoop libraries found via (/usr/local/src/hadoop/bin/hadoop) for HDFS access
Info: Excluding /usr/local/src/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Including HBASE libraries found via (/usr/local/src/hbase/bin/hbase) for HBASE access
Info: Excluding /usr/local/src/hbase/lib/slf4j-api-1.7.7.jar from classpath
Info: Excluding /usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /usr/local/src/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Including Hive libraries found via (/usr/local/src/hive) for Hive access
...
23/04/21 16:02:35 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: r1. src.append.accepted == 0
23/04/21 16:02:35 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: r1. src.append.received == 0
23/04/21 16:02:35 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: r1. src.events.accepted == 17
23/04/21 16:02:35 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: r1. src.events.received == 17
23/04/21 16:02:35 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: r1. src.open-connection.count == 0
23/04/21 16:02:35 INFO source.SpoolDirectorySource: SpoolDir source r1 stopped. Metrics: SOURCE:r1{src.events.accepted=17, src.open-connection.count=0, src.append.received=0, src.append-batch.received=1, src.append-batch.accepted=1, src.append.accepted=0, src.events.received=17}
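The "No configuration directory set" warning can be avoided by pointing Flume at its conf directory, and console logging makes the transfer easier to follow. A fuller invocation, sketched with the same agent name and configuration file as above:

[hadoop@master flume]$ flume-ng agent --conf /usr/local/src/flume/conf --conf-file xxx.conf --name a1 -Dflume.root.logger=INFO,console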
Press Ctrl+C to stop the Flume transfer.

# All cluster nodes must be running first
[hadoop@master flume]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
192.168.88.201: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.example.com.out
192.168.88.200: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.example.com.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.example.com.out
192.168.88.200: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.example.com.out
192.168.88.201: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.example.com.out
[hadoop@master flume]$ ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 192.168.88.101:9000 *:*
LISTEN 0 128 *:50090 *:*
LISTEN 0 128 *:50070 *:*
LISTEN 0 128 *:22 *:*
LISTEN 0 128 ::ffff:192.168.88.101:8030 :::*
LISTEN 0 128 ::ffff:192.168.88.101:8031 :::*
LISTEN 0 128 ::ffff:192.168.88.101:8032 :::*
LISTEN 0 128 ::ffff:192.168.88.101:8033 :::*
LISTEN 0 80 :::3306 :::*
LISTEN 0 128 :::22 :::*
LISTEN 0 128 ::ffff:192.168.88.101:8088 :::*
Step 3: Check the files Flume transferred to HDFS. If data files are visible under /tmp/flume on HDFS, the transfer succeeded.

[hadoop@master flume]$ hdfs dfs -ls /
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2023-04-07 15:20 /hbase
drwxr-xr-x - hadoop supergroup 0 2023-03-17 17:33 /input
drwxr-xr-x - hadoop supergroup 0 2023-03-17 18:45 /output
drwx------ - hadoop supergroup 0 2023-04-21 16:02 /tmp
drwxr-xr-x - hadoop supergroup 0 2023-04-14 20:48 /user
[hadoop@master flume]$ hdfs dfs -ls /tmp/flume
Found 72 items
-rw-r--r-- 2 hadoop supergroup 1560 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143693
-rw-r--r-- 2 hadoop supergroup 1398 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143694
-rw-r--r-- 2 hadoop supergroup 1456 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143695
-rw-r--r-- 2 hadoop supergroup 1398 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143696
-rw-r--r-- 2 hadoop supergroup 1403 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143697
-rw-r--r-- 2 hadoop supergroup 1434 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143698
-rw-r--r-- 2 hadoop supergroup 1383 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143699
...
-rw-r--r-- 2 hadoop supergroup 1508 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143760
-rw-r--r-- 2 hadoop supergroup 1361 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143761
-rw-r--r-- 2 hadoop supergroup 1359 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143762
-rw-r--r-- 2 hadoop supergroup 1502 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143763
-rw-r--r-- 2 hadoop supergroup 1399 2023-04-21 16:02 /tmp/flume/FlumeData.1682064143764
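To confirm that the transferred files hold actual log content rather than empty events, one of them can be inspected directly. A quick check, using a file name from the listing above (names differ between runs):

# Print the first lines of one transferred file
[hadoop@master flume]$ hdfs dfs -cat /tmp/flume/FlumeData.1682064143693 | head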
Experiment 2: Monitoring the Big Data Platform's Running State with Commands
1. Checking Platform Status with Commands
Step 1: View Linux system information (uname -a)

[root@master ~]# uname -a
Linux master.example.com 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
The output shows that this node's hostname is master.example.com and its kernel release is 3.10.0-862.el7.x86_64.
Step 2: View disk information
(1) View all partitions (fdisk -l)

[root@master ~]# fdisk -l

Disk /dev/sda: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000af885

 Device Boot Start End Blocks Id System
/dev/sda1 * 2048 2099199 1048576 83 Linux
/dev/sda2 2099200 209715199 103808000 8e Linux LVM

Disk /dev/mapper/centos-root: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/centos-swap: 8455 MB, 8455716864 bytes, 16515072 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/centos-home: 44.1 GB, 44149243904 bytes, 86228992 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
The output shows that disk /dev/sda has a capacity of 107.4 GB.
(2) View all swap partitions (swapon -s)

[root@master ~]# swapon -s
Filename Type Size Used Priority
/dev/dm-1 partition 8257532 0 -1
The output shows one swap partition of 8,257,532 KB (about 8 GB), currently unused.
(3) View file system usage (df -h)

[root@master ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 50G 4.6G 46G 10% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 12M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mapper/centos-home 42G 36M 42G 1% /home
/dev/sda1 1014M 142M 873M 14% /boot
tmpfs 378M 0 378M 0% /run/user/0
The output shows that the "/" mount point has a capacity of 50 GB, of which 4.6 GB is used.
Step 3: View network IP addresses (ip a)

[root@master ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:20:cd:03 brd ff:ff:ff:ff:ff:ff
    inet 192.168.88.101/24 brd 192.168.88.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::75a3:9da2:ede2:5be7/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::1c1b:d0e3:f01a:7c11/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
The output shows that ens33 has IP address 192.168.88.101 with netmask 255.255.255.0, and that the loopback interface has address 127.0.0.1 with netmask 255.0.0.0.
Step 4: View all listening ports (netstat -lntp)

[root@master ~]# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 908/sshd
tcp6 0 0 :::3306 :::* LISTEN 1053/mysqld
tcp6 0 0 :::22 :::* LISTEN 908/sshd
The output shows that ports 22 (sshd) and 3306 (mysqld) are listening.
Step 5: View all established connections (netstat -antp)

[root@master ~]# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 908/sshd
tcp 0 36 192.168.88.101:22 192.168.88.1:61450 ESTABLISHED 1407/sshd: root@pts
tcp6 0 0 :::3306 :::* LISTEN 1053/mysqld
tcp6 0 0 :::22 :::* LISTEN 908/sshd
The output shows one established connection, the SSH session on local port 22; once the Hadoop services are running, established connections on ports such as 9000 and 8031 also appear here.
Step 6: Display process status in real time (top). This command shows, among other things, each process's CPU and memory usage.
(Screenshot of the top output omitted.)
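To capture the same information non-interactively (for example in a report or a script), top can also be run in batch mode. A small sketch:

# One batch-mode iteration of top, keeping the summary and the first processes
[root@master ~]# top -b -n 1 | head -n 15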
Step 7: View CPU information (cat /proc/cpuinfo)

[root@master ~]# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 80
model name : AMD Ryzen 7 5800U with Radeon Graphics
stepping : 0
cpu MHz : 1896.438
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl tsc_reliable nonstop_tsc extd_apicid eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext retpoline_amd vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero ibpb ibrs arat pku ospke overflow_recov succor
bogomips : 3792.87
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 45 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : AuthenticAMD
cpu family : 25
model : 80
model name : AMD Ryzen 7 5800U with Radeon Graphics
stepping : 0
cpu MHz : 1896.438
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl tsc_reliable nonstop_tsc extd_apicid eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext retpoline_amd vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero ibpb ibrs arat pku ospke overflow_recov succor
bogomips : 3792.87
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 45 bits physical, 48 bits virtual
Step 8: View memory information (cat /proc/meminfo). This command shows total memory, free memory, and other details.

[root@master ~]# cat /proc/meminfo
MemTotal: 3863564 kB
MemFree: 2971196 kB
MemAvailable: 3301828 kB
Buffers: 2120 kB
Cached: 529608 kB
SwapCached: 0 kB
Active: 626584 kB
Inactive: 130828 kB
Active(anon): 226348 kB
Inactive(anon): 11152 kB
Active(file): 400236 kB
Inactive(file): 119676 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 8257532 kB
SwapFree: 8257532 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 225716 kB
Mapped: 28960 kB
Shmem: 11816 kB
Slab: 59428 kB
SReclaimable: 30680 kB
SUnreclaim: 28748 kB
KernelStack: 4432 kB
PageTables: 3664 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 10189312 kB
Committed_AS: 773684 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 185376 kB
VmallocChunk: 34359310332 kB
HardwareCorrupted: 0 kB
AnonHugePages: 180224 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 73536 kB
DirectMap2M: 3072000 kB
DirectMap1G: 3145728 kB
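A more compact summary of the same counters is available from free, which reads /proc/meminfo. A quick sketch:

# Human-readable totals for memory and swap (total, used, free, available)
[root@master ~]# free -h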
2. Checking Hadoop Status with Commands
Step 1: Switch to the hadoop user.

[root@master ~]# su - hadoop
Last login: Fri Apr 21 15:26:05 CST 2023 on pts/0
Last failed login: Fri Apr 21 16:02:08 CST 2023 from slave1 on ssh:notty
There were 4 failed login attempts since the last successful login.
If the current user is root, switch to the hadoop user before carrying out the following operations.
Step 2: Switch to the Hadoop installation directory.

[hadoop@master ~]$ cd /usr/local/src/hadoop/
Step 3: Start Hadoop.

[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
192.168.88.201: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.example.com.out
192.168.88.200: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.example.com.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.example.com.out
192.168.88.200: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.example.com.out
192.168.88.201: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.example.com.out
Step 4: Stop Hadoop.

[hadoop@master hadoop]$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
192.168.88.201: stopping datanode
192.168.88.200: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
192.168.88.200: stopping nodemanager
192.168.88.201: stopping nodemanager
no proxyserver to stop
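After stopping, jps can confirm that no Hadoop daemons are left running; a quick check:

# NameNode, SecondaryNameNode, ResourceManager, etc. should no longer be listed
[hadoop@master hadoop]$ jps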
Experiment 3: Monitoring Platform Resource Status with Commands
1. Checking YARN Status with Commands
Step 1: Make sure the current directory is /usr/local/src/hadoop.

[hadoop@master hadoop]$ cd /usr/local/src/hadoop/
Step 2: Back in the terminal, start ZooKeeper on each node and then run start-all.sh on the master host.

# Start ZooKeeper on the master node
[hadoop@master hadoop]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

# Start ZooKeeper on the slave1 node
[root@slave1 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

# Start ZooKeeper on the slave2 node
[root@slave2 ~]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

# On the master node, start Hadoop
[hadoop@master hadoop]$ start-all.sh
Step 3: Run the jps command. If the ResourceManager process is present on master (with NodeManager processes running on the slave nodes), YARN has started; a way to check the slave-side processes is sketched after the output below.

[hadoop@master hadoop]$ jps
The result is as follows, showing that YARN has started.

[hadoop@master hadoop]$ jps
2642 QuorumPeerMain
2994 SecondaryNameNode
3154 ResourceManager
3413 Jps
2795 NameNode
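NodeManager and DataNode run on the slave nodes rather than on master. Assuming passwordless SSH between the nodes (already required by the Hadoop start scripts) and the JDK path used in this setup, they can be checked remotely:

# List Java processes on each slave; NodeManager and DataNode should appear
[hadoop@master hadoop]$ ssh slave1 /usr/local/src/jdk/bin/jps
[hadoop@master hadoop]$ ssh slave2 /usr/local/src/jdk/bin/jps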
2. Checking HDFS Status with Commands
Step 1: Directory operations
Change into the Hadoop directory with the command cd /usr/local/src/hadoop.

[hadoop@master hadoop]$ cd /usr/local/src/hadoop/
List the HDFS root directory.

[hadoop@master hadoop]$ ./bin/hdfs dfs -ls /
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2023-04-07 15:20 /hbase
drwxr-xr-x - hadoop supergroup 0 2023-03-17 17:33 /input
drwxr-xr-x - hadoop supergroup 0 2023-03-17 18:45 /output
drwx------ - hadoop supergroup 0 2023-04-21 16:02 /tmp
drwxr-xr-x - hadoop supergroup 0 2023-04-14 20:48 /user
Step 2: View the HDFS report with the command bin/hdfs dfsadmin -report.

[hadoop@master hadoop]$ bin/hdfs dfsadmin -report
Configured Capacity: 107321753600 (99.95 GB)
Present Capacity: 102855995392 (95.79 GB)
DFS Remaining: 102852079616 (95.79 GB)
DFS Used: 3915776 (3.73 MB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.88.201:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 1957888 (1.87 MB)
Non DFS Used: 2232373248 (2.08 GB)
DFS Remaining: 51426545664 (47.89 GB)
DFS Used%: 0.00%
DFS Remaining%: 95.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Apr 23 21:56:46 CST 2023


Name: 192.168.88.200:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 1957888 (1.87 MB)
Non DFS Used: 2233384960 (2.08 GB)
DFS Remaining: 51425533952 (47.89 GB)
DFS Used%: 0.00%
DFS Remaining%: 95.83%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Apr 23 21:56:46 CST 2023
Step 3: View HDFS space usage with the command hdfs dfs -df.

[hadoop@master hadoop]$ hdfs dfs -df /
Filesystem Size Used Available Use%
hdfs://master:9000 107321753600 3915776 102852079616 0%
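The same figures can be printed in human-readable units by adding the -h flag; a quick sketch:

# Sizes reported in GB/MB instead of raw bytes
[hadoop@master hadoop]$ hdfs dfs -df -h /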
3. Checking HBase Status with Commands
Step 1: Start HBase
Switch to the HBase installation directory /usr/local/src/hbase:

[hadoop@master hadoop]$ cd /usr/local/src/hbase/
[hadoop@master hbase]$ hbase version
HBase 1.2.1
Source code repository git://asf-dev/home/busbey/projects/hbase revision=8d8a7107dc4ccbf36a92f64675dc60392f85c015
Compiled by busbey on Wed Mar 30 11:19:21 CDT 2016
From source with checksum f4bb4a14bb4e0b72b46f729dae98a772
The output shows HBase 1.2.1, so the installed version is 1.2.1. If the HBase services are not yet running, start them with start-hbase.sh:

[hadoop@master hbase]$ start-hbase.sh
master: starting zookeeper, logging to /usr/local/src/hbase/logs/hbase-hadoop-zookeeper-master.example.com.out
slave1: starting zookeeper, logging to /usr/local/src/hbase/logs/hbase-hadoop-zookeeper-slave1.example.com.out
slave2: starting zookeeper, logging to /usr/local/src/hbase/logs/hbase-hadoop-zookeeper-slave2.example.com.out
starting master, logging to /usr/local/src/hbase/logs/hbase-hadoop-master-master.example.com.out
slave2: starting regionserver, logging to /usr/local/src/hbase/logs/hbase-hadoop-regionserver-slave2.example.com.out
slave1: starting regionserver, logging to /usr/local/src/hbase/logs/hbase-hadoop-regionserver-slave1.example.com.out
Step 2: Check HBase version information
Run hbase shell to enter the HBase interactive shell.

[hadoop@master hbase]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016

hbase(main):001:0>
Type version to query the HBase version.

hbase(main):001:0> version
1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
The output shows that the HBase version is 1.2.1.
Step 3: Query the HBase status by running the status command in the HBase shell.

hbase(main):002:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
The result shows 1 active master, 0 backup masters, and 2 live region servers, with an average load of 1.0000 (the average number of regions per region server).
我们还可以“简单”查询 HBase 的状态,执行命令 status 'simple'- hbase(main):003:0> status 'simple'<br>active master: master:16000 1682258321433<br>0 backup masters<br>2 live servers<br> slave1:16020 1682258322908<br> requestsPerSecond=0.0, numberOfOnlineRegions=2, usedHeapMB=19, maxHeapMB=440, numberOfStores=2, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=7, writeRequestsCount=4, rootIndexSizeKB=0 totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[MultiRowMutationEndpoint]<br> slave2:16020 1682258322896<br> requestsPerSecond=0.0, numberOfOnlineRegions=0, usedHeapMB=11, maxHeapMB=440, numberOfStores=0, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]<br>0 dead servers<br>Aggregate load: 0, regions: 2
This shows more detail about the master, slave1, and slave2 hosts, including service ports and per-server request statistics.
如果需要查询更多关于 HBase 状态,执行命令 help 'status'- hbase(main):004:0> help 'status'<br>Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The<br>default is 'summary'. Examples:<br><br> hbase> status<br> hbase> status 'simple'<br> hbase> status 'summary'<br> hbase> status 'detailed'<br> hbase> status 'replication'<br> hbase> status 'replication', 'source'<br> hbase> status 'replication', 'sink'<br> <br>hbase(main):005:0> quit<br>[hadoop@master hbase]$
The output lists all forms of the status command.
Step 4: Stop the HBase service
To stop the HBase service, run stop-hbase.sh.

[hadoop@master hbase]$ stop-hbase.sh
stopping hbase.................
slave1: no zookeeper to stop because no pid file /tmp/hbase-hadoop-zookeeper.pid
slave2: no zookeeper to stop because no pid file /tmp/hbase-hadoop-zookeeper.pid
master: no zookeeper to stop because no pid file /tmp/hbase-hadoop-zookeeper.pid
When no error is reported and the $ prompt returns, the HBase service has been stopped.
4、通过命令查看 Hive 状态
Step 1: Start Hive
Switch to the /usr/local/src/hive directory, type hive, and press Enter.

[hadoop@master hbase]$ cd /usr/local/src/hive/
[hadoop@master hive]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
When the hive> prompt appears, Hive has started successfully and you are in the Hive shell.
步骤二:Hive 操作基本命令
Note: Hive statements must end with a semicolon.
(1) List the databases.

hive> show databases;
OK
default
sample
Time taken: 0.973 seconds, Fetched: 2 row(s)
The output lists two databases: default and sample.
(2) List all tables in the default database.

hive> use default;
OK
Time taken: 0.02 seconds
hive> show tables;
OK
test
Time taken: 2.201 seconds, Fetched: 1 row(s)
The output shows that the default database currently contains one table, test.
(3) Create a table stu with an integer column id and a string column name.

hive> create table stu(id int,name string);
OK
Time taken: 0.432 seconds
(4) Insert one row into stu with id 001 and name liuyaling.

hive> insert into stu values (001,"liuyaling");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20230423222915_a95e9891-fdf5-4739-a63e-fcadecc85e28
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1682258121749_0001, Tracking URL = http://master:8088/proxy/application_1682258121749_0001/
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1682258121749_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2023-04-23 22:31:36,420 Stage-1 map = 0%, reduce = 0%
2023-04-23 22:31:43,892 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.72 sec
MapReduce Total cumulative CPU time: 2 seconds 720 msec
Ended Job = job_1682258121749_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://master:9000/user/hive/warehouse/stu/.hive-staging_hive_2023-04-23_22-30-32_985_529079703757687911-1/-ext-10000
Loading data to table default.stu
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.72 sec HDFS Read: 4135 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 720 msec
OK
Time taken: 72.401 seconds
Following the same procedure, insert two more rows, with ids 1002 and 1003 and names yanhaoxiang and tnt.
(5)插入数据后查看表的信息- hive> show tables;<br>OK<br>stu<br>test<br>values__tmp__table__1<br>values__tmp__table__2<br>Time taken: 0.026 seconds, Fetched: 4 row(s)
(6) View the structure of table stu.

hive> desc stu;
OK
id int
name string
Time taken: 0.041 seconds, Fetched: 2 row(s)
(7) View the contents of table stu.

hive> select * from stu;
OK
1 liuyaling
1002 yanhaoxiang
1002 yanhaoxiang
1003 tnt
Time taken: 0.118 seconds, Fetched: 4 row(s)
Step 3: View the file systems and the command history from the Hive command line
(1) View the local file system with the command ! ls /usr/local/src;

hive> ! ls /usr/local/src;
flume
hadoop
hbase
hive
jdk
sqoop
zookeeper
(2) View the HDFS file system with the command dfs -ls /;

hive> dfs -ls /;
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2023-04-23 21:58 /hbase
drwxr-xr-x - hadoop supergroup 0 2023-03-17 17:33 /input
drwxr-xr-x - hadoop supergroup 0 2023-03-17 18:45 /output
drwx------ - hadoop supergroup 0 2023-04-21 16:02 /tmp
drwxr-xr-x - hadoop supergroup 0 2023-04-14 20:48 /user

hive> exit;
(3) View all commands previously entered in Hive
Go to the hadoop user's home directory /home/hadoop and view the .hivehistory file.

[hadoop@master hive]$ cd /home/hadoop/
[hadoop@master ~]$ cat .hivehistory
show databases;
create database sample;
show databases;
use sample;
create table student(number string,name string);
exit
show databases;
exit;
use sample;
select * from student;
sqoop export --connect "jdbc:mysql://master:3306/sample?useUnicode=true&characterEncoding=utf-8" --username root --password Password@123! --table student --input-fields-terminated-by '|' --export-dir /user/hive/warehouse/sample.db/student/*
sqoop export --connect "jdbc:mysql://master:3306/sample?useUnicode=true&characterEncoding=utf-8" --username root --password Password@123! --table student --input-fields-terminated-by '|' --export-dir /user/hive/warehouse/sample.db/student/*;
sqoop export --connect "jdbc:mysql://master3306/sample?useUnicode=true&characterEncoding=utf-8" --username root --password Password@123! --table student --input-fields-terminated-by '|' --export-dir /user/hive/warehouse/sample.db/student/*
sqoop export --connect "jdbc:mysql://master3306/sample?useUnicode=true&characterEncoding=utf-8" --username root --password Password@123! --table student --input-fields-terminated-by '|' --export-dir /user/hive/warehouse/sample.db/student/*;
sqoop export --connect "jdbc:mysql://master3306/sample?useUnicode=true&characterEncoding=utf-8" --username root --password Password@123! --table student --input-fields-terminated-by '|' --export-dir /user/hive/warehouse/sample.db/student/*
exit
show databases;
use sample;
show tables;
use default;
show tables;
quit
'
select*from default.test;
quit;
show databases;
use default;
show tables;
insert into stu values (001,"liuyaling");
create table stu(id int,name string);
insert into stu values (001,"liuyaling");
insert into stu values (1002,"yanhaoxiang");
hive
insert into stu values (1003,"tnt");
quit
exit
exit;
quit;
show tables;
insert into stu values (1002,"yanhaoxiang");
insert into stu values (1003,"tnt");
show tables;
desc stu;
select * from stu;
! ls /usr/local/src;
dfs -ls /;
exit;
The output shows every command previously run in the Hive CLI, including mistyped ones, which is useful for maintenance and troubleshooting.
Experiment 4: Monitoring Platform Service Status with Commands
1. Checking ZooKeeper Status with Commands
Step 1: Check the ZooKeeper status by running zkServer.sh status. The result is shown below.

[hadoop@master ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower
In the output above, Mode: follower indicates that this node is a ZooKeeper follower.
Step 2: Check the running process
QuorumPeerMain is the entry class used to start a ZooKeeper cluster member; it loads the configuration and launches the QuorumPeer thread.
Run jps to check the processes.

[hadoop@master ~]$ jps
2642 QuorumPeerMain
2994 SecondaryNameNode
3154 ResourceManager
5400 Jps
2795 NameNode
The QuorumPeerMain process is running.
Step 3: With the ZooKeeper service running, run zkCli.sh to connect to it.

[hadoop@master ~]$ zkCli.sh
Connecting to localhost:2181
2023-04-23 22:39:17,093 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.8--1, built on 02/06/2016 03:18 GMT
2023-04-23 22:39:17,096 [myid:] - INFO [main:Environment@100] - Client environment:host.name=master
2023-04-23 22:39:17,096 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_152
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/local/src/jdk/jre
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/usr/local/src/zookeeper/bin/../build/classes:/usr/local/src/zookeeper/bin/../build/lib/*.jar:/usr/local/src/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/src/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/src/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/local/src/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/local/src/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/local/src/zookeeper/bin/../zookeeper-3.4.8.jar:/usr/local/src/zookeeper/bin/../src/java/lib/*.jar:/usr/local/src/zookeeper/bin/../conf:/usr/local/src/sqoop/lib:
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.0-862.el7.x86_64
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:user.name=hadoop
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/hadoop
2023-04-23 22:39:17,097 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/hadoop
2023-04-23 22:39:17,099 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@69d0a921
Welcome to ZooKeeper!
2023-04-23 22:39:17,122 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2023-04-23 22:39:17,208 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@876] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
2023-04-23 22:39:17,223 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x187ae88b8390000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
Step 4: Use a watch to monitor the /hbase node; once its content changes, a notification is issued. Set the watch by running get /hbase 1.

[zk: localhost:2181(CONNECTED) 0] get /hbase 1

cZxid = 0x500000002
ctime = Sun Apr 23 21:58:42 CST 2023
mZxid = 0x500000002
mtime = Sun Apr 23 21:58:42 CST 2023
pZxid = 0x500000062
cversion = 18
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 14

[zk: localhost:2181(CONNECTED) 1] quit
Quitting...
2023-04-23 22:40:16,816 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x187ae88b8390001 closed
2023-04-23 22:40:16,817 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@519] - EventThread shut down for session: 0x187ae88b8390001
[hadoop@master ~]$
The get /hbase 1 command registers a watch on /hbase, whose current dataVersion is 0. If the node is later modified, for example with set /hbase value-update, the dataVersion changes from 0 to 1 and the client receives a watch notification, showing that /hbase is being monitored.
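A sketch of how the watch could be triggered from a second zkCli.sh session (the value string is arbitrary and purely illustrative):

# In another zkCli.sh session connected to the same ensemble
[zk: localhost:2181(CONNECTED) 0] set /hbase value-update
# The watching session prints a NodeDataChanged event for /hbase,
# and a subsequent get /hbase shows dataVersion = 1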
2、通过命令查看 Sqoop 状态
Step 1: Query the Sqoop version to verify that Sqoop works correctly.
First switch to the /usr/local/src/sqoop directory and run ./bin/sqoop-version.

[hadoop@master ~]$ cd /usr/local/src/sqoop/
[hadoop@master sqoop]$ ./bin/sqoop-version
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
23/04/23 22:40:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
The output shows Sqoop 1.4.7, confirming the version number and that Sqoop runs correctly.
Step 2: Test whether Sqoop can connect to the database
In the Sqoop directory, run bin/sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password@123!, where "master:3306" is the database host name and port.

[hadoop@master sqoop]$ bin/sqoop list-databases --connect jdbc:mysql://master:3306/ --username root --password Password@123!
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
23/04/23 22:42:16 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
23/04/23 22:42:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
23/04/23 22:42:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Sun Apr 23 22:42:16 CST 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sample
sys
The output shows that Sqoop can connect to MySQL and list all database instances on the master host, such as information_schema, hive, mysql, performance_schema, sample, and sys.
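The same connection parameters can be reused to list the tables inside one database, for example the sample database shown above. A sketch, using the credentials from this environment:

# List the tables in the sample database (e.g. the student table)
[hadoop@master sqoop]$ bin/sqoop list-tables --connect jdbc:mysql://master:3306/sample --username root --password 'Password@123!'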
步骤三: 执行命令 sqoop help,可以看到如下内容,代表 Sqoop 启动成功。- [hadoop@master sqoop]$ sqoop help<br>Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.<br>Please set $HCAT_HOME to the root of your HCatalog installation.<br>Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.<br>Please set $ACCUMULO_HOME to the root of your Accumulo installation.<br>23/04/23 22:42:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7<br>usage: sqoop COMMAND [ARGS]<br><br>Available commands:<br> codegen Generate code to interact with database records<br> create-hive-table Import a table definition into Hive<br> eval Evaluate a SQL statement and display the results<br> export Export an HDFS directory to a database table<br> help List available commands<br> import Import a table from a database to HDFS<br> import-all-tables Import tables from a database to HDFS<br> import-mainframe Import datasets from a mainframe server to HDFS<br> job Work with saved jobs<br> list-databases List available databases on a server<br> list-tables List available tables in a database<br> merge Merge results of incremental imports<br> metastore Run a standalone Sqoop metastore<br> version Display version information<br><br>See 'sqoop help COMMAND' for information on a specific command.
The output shows Sqoop's common commands and their functions, summarized in the list below.
1. import: import data into the cluster
2. export: export data from the cluster
3. codegen: generate code to interact with database records
4. create-hive-table: create a Hive table
5. eval: view the result of a SQL statement
6. import-all-tables: import all tables of a database into HDFS
7. job: create and manage a saved job
8. list-databases: list all database names
9. list-tables: list all tables in a database
10. merge: merge data from different HDFS directories and store the result in a specified directory
11. metastore: record Sqoop job metadata; if no metastore instance is started, the default metadata directory is ~/.sqoop
12. help: print Sqoop help information
13. version: print Sqoop version information

3. Checking Flume Status with Commands
Step 1: Verify the Flume installation by running flume-ng version to check the Flume version.

[hadoop@master sqoop]$ cd /usr/local/src/flume/
[hadoop@master flume]$ flume-ng version
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
[hadoop@master flume]$
Step 2: Add example.conf under /usr/local/src/flume.

[hadoop@master flume]$ vim /usr/local/src/flume/example.conf
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/usr/local/src/flume/
a1.sources.r1.fileHeader=true
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://master:9000/flume
a1.sinks.k1.hdfs.rollSize=1048760
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollInterval=900
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.channels.c1.type=file
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Step 3: Start Flume agent a1 with console logging.

[hadoop@master flume]$ flume-ng agent --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
Warning: No configuration directory set! Use --conf <dir> to override.
Info: Including Hadoop libraries found via (/usr/local/src/hadoop/bin/hadoop) for HDFS access
Info: Excluding /usr/local/src/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Including HBASE libraries found via (/usr/local/src/hbase/bin/hbase) for HBASE access
Info: Excluding /usr/local/src/hbase/lib/slf4j-api-1.7.7.jar from classpath
Info: Excluding /usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar from classpath
...
23/04/23 22:52:56 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.connection.failed.count == 0
23/04/23 22:52:56 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.event.drain.attempt == 1918
23/04/23 22:52:56 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: k1. sink.event.drain.sucess == 1918
[hadoop@master flume]$ ^C
Step 4: View the result.

[hadoop@master flume]$ hdfs dfs -lsr /flume
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 2 hadoop supergroup 1499 2023-04-23 22:52 /flume/FlumeData.1682261559722
-rw-r--r-- 2 hadoop supergroup 1419 2023-04-23 22:52 /flume/FlumeData.1682261559723
-rw-r--r-- 2 hadoop supergroup 1468 2023-04-23 22:52 /flume/FlumeData.1682261559724
...
-rw-r--r-- 2 hadoop supergroup 1795 2023-04-23 22:52 /flume/FlumeData.1682261559817
-rw-r--r-- 2 hadoop supergroup 1841 2023-04-23 22:52 /flume/FlumeData.1682261559818
-rw-r--r-- 2 hadoop supergroup 1665 2023-04-23 22:52 /flume/FlumeData.1682261559819
-rw-r--r-- 2 hadoop supergroup 1439 2023-04-23 22:52 /flume/FlumeData.1682261559820