First Experience with Apache Hadoop: File Upload, Download, and Distributed Computing
Posted 2026-2-27 15:59:07

Previous post: A Pitfall-Free Guide to Building a Fully Distributed Apache Hadoop Cluster (CSDN blog)
In the previous post we built a complete Hadoop cluster. In this one we simply upload and download files through the cluster and run the distributed wordcount example. Later posts will dig deeper into distributed computing and distributed storage.
Upload and Download Test

Upload and download a file from the local Linux file system to verify that the HDFS cluster is working correctly.
  # Create a directory in HDFS
  hdfs dfs -mkdir -p /test/input
  # Create a file in the local home directory and write some content into it
  cd /root
  vim test.txt
  # Upload the local Linux file to HDFS
  hdfs dfs -put /root/test.txt /test/input
  # Download the file from HDFS to the local Linux machine (you can test this from another node)
  hdfs dfs -get /test/input/test.txt
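To confirm the round trip actually preserved the data, the downloaded copy can be compared byte-for-byte with the original. A minimal sketch, assuming the cluster from this walkthrough is running; the name test_from_hdfs.txt is illustrative, not from the original steps:

```shell
# Pull the file back under a different name and compare it with the
# original; identical files mean HDFS stored and returned it intact.
hdfs dfs -get /test/input/test.txt ./test_from_hdfs.txt
diff /root/test.txt ./test_from_hdfs.txt && echo "HDFS round-trip OK"
```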
Distributed Computing Test

Create a wcinput folder under the root directory of the HDFS file system.
  [root@hadoop01 hadoop-2.9.2]# hdfs dfs -mkdir /wcinput
Create a wc.txt file with the following content:
  hadoop mapreduce yarn
  hdfs hadoop mapreduce
  mapreduce yarn kmning
  kmning
  kmning
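The same file can be created non-interactively with a heredoc instead of an editor, which is handy for scripting the test. A small sketch:

```shell
# Write wc.txt in one step with a heredoc rather than opening vim
cat > wc.txt <<'EOF'
hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning
EOF
wc -l wc.txt   # reports 5 lines
```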
Upload wc.txt to the HDFS directory /wcinput:
  hdfs dfs -put wc.txt /wcinput
Run the MapReduce job:
  hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput/ /wcoutput
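One caveat worth knowing: MapReduce refuses to start if the output directory already exists, so re-running the job requires deleting the old output first. A sketch, assuming the same cluster and paths as above:

```shell
# The job fails with FileAlreadyExistsException if /wcoutput exists;
# remove the previous output before re-running.
hdfs dfs -rm -r -f /wcoutput
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput/ /wcoutput
```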
The job prints output like the following:
  24/07/03 20:44:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop03/192.168.43.103:8032
  24/07/03 20:44:28 INFO input.FileInputFormat: Total input files to process : 1
  24/07/03 20:44:28 INFO mapreduce.JobSubmitter: number of splits:1
  24/07/03 20:44:28 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
  24/07/03 20:44:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1720006717389_0001
  24/07/03 20:44:29 INFO impl.YarnClientImpl: Submitted application application_1720006717389_0001
  24/07/03 20:44:29 INFO mapreduce.Job: The url to track the job: http://hadoop03:8088/proxy/application_1720006717389_0001/
  24/07/03 20:44:29 INFO mapreduce.Job: Running job: job_1720006717389_0001
  24/07/03 20:44:45 INFO mapreduce.Job: Job job_1720006717389_0001 running in uber mode : false
  24/07/03 20:44:45 INFO mapreduce.Job:  map 0% reduce 0%
  24/07/03 20:44:57 INFO mapreduce.Job:  map 100% reduce 0%
  24/07/03 20:45:13 INFO mapreduce.Job:  map 100% reduce 100%
  24/07/03 20:45:14 INFO mapreduce.Job: Job job_1720006717389_0001 completed successfully
  24/07/03 20:45:14 INFO mapreduce.Job: Counters: 49
         File System Counters
                 FILE: Number of bytes read=70
                 FILE: Number of bytes written=396911
                 FILE: Number of read operations=0
                 FILE: Number of large read operations=0
                 FILE: Number of write operations=0
                 HDFS: Number of bytes read=180
                 HDFS: Number of bytes written=44
                 HDFS: Number of read operations=6
                 HDFS: Number of large read operations=0
                 HDFS: Number of write operations=2
         Job Counters
                 Launched map tasks=1
                 Launched reduce tasks=1
                 Data-local map tasks=1
                 Total time spent by all maps in occupied slots (ms)=9440
                 Total time spent by all reduces in occupied slots (ms)=11870
                 Total time spent by all map tasks (ms)=9440
                 Total time spent by all reduce tasks (ms)=11870
                 Total vcore-milliseconds taken by all map tasks=9440
                 Total vcore-milliseconds taken by all reduce tasks=11870
                 Total megabyte-milliseconds taken by all map tasks=9666560
                 Total megabyte-milliseconds taken by all reduce tasks=12154880
         Map-Reduce Framework
                 Map input records=5
                 Map output records=11
                 Map output bytes=124
                 Map output materialized bytes=70
                 Input split bytes=100
                 Combine input records=11
                 Combine output records=5
                 Reduce input groups=5
                 Reduce shuffle bytes=70
                 Reduce input records=5
                 Reduce output records=5
                 Spilled Records=10
                 Shuffled Maps =1
                 Failed Shuffles=0
                 Merged Map outputs=1
                 GC time elapsed (ms)=498
                 CPU time spent (ms)=3050
                 Physical memory (bytes) snapshot=374968320
                 Virtual memory (bytes) snapshot=4262629376
                 Total committed heap usage (bytes)=219676672
         Shuffle Errors
                 BAD_ID=0
                 CONNECTION=0
                 IO_ERROR=0
                 WRONG_LENGTH=0
                 WRONG_MAP=0
                 WRONG_REDUCE=0
         File Input Format Counters
                 Bytes Read=80
         File Output Format Counters
                 Bytes Written=44
View the result:
  [root@hadoop01 hadoop-2.9.2]# hdfs dfs -cat /wcoutput/part-r-00000
  hadoop  2
  hdfs    1
  kmning  3
  mapreduce       3
  yarn    2
As you can see, the program counted the occurrences of each word via distributed MapReduce computation.
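The result is easy to sanity-check locally: the map-shuffle-reduce logic of wordcount corresponds to a plain shell pipeline of split, sort, and count. A sketch that rebuilds wc.txt and reproduces the counts from part-r-00000:

```shell
# Recreate the input locally
cat > wc.txt <<'EOF'
hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning
EOF
# map: one word per line; shuffle: sort groups identical words;
# reduce: uniq -c counts each group
tr -s ' ' '\n' < wc.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# → hadoop 2, hdfs 1, kmning 3, mapreduce 3, yarn 2
```

This is only a single-machine analogy, of course; the point of the MapReduce job is that the same three phases run in parallel across the cluster.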
