First Experience with Apache Hadoop: File Upload, Download, and Distributed Computing
Posted 2026-2-27 15:59:07

Previous post: A Pitfall-Free Guide to Building a Fully Distributed Apache Hadoop Cluster (CSDN blog)
In the previous post we built a complete Hadoop cluster. In this one we simply upload and download files through the cluster and run the distributed wordcount example. Later posts will dig deeper into distributed computing and distributed storage.
Upload and Download Test

Upload and download a file from the local Linux file system to verify that the HDFS cluster is working correctly.
  # Create a directory in HDFS
  hdfs dfs -mkdir -p /test/input
  # Create a file in the local home directory and write some content into it
  cd /root
  vim test.txt
  # Upload the local Linux file to HDFS
  hdfs dfs -put /root/test.txt /test/input
  # Download the file from HDFS to the local Linux machine (you can test this from another node)
  hdfs dfs -get /test/input/test.txt
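To confirm the round trip actually preserved the data, the downloaded copy can be compared byte-for-byte with the original. A minimal sketch, assuming the cluster from this walkthrough is running; the name test_from_hdfs.txt is illustrative, not from the original steps:

```shell
# Pull the file back under a different name and compare it with the
# original; identical files mean HDFS stored and returned it intact.
hdfs dfs -get /test/input/test.txt ./test_from_hdfs.txt
diff /root/test.txt ./test_from_hdfs.txt && echo "HDFS round-trip OK"
```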
Distributed Computing Test

Create a wcinput folder under the root directory of the HDFS file system.
  [root@hadoop01 hadoop-2.9.2]# hdfs dfs -mkdir /wcinput
Create a wc.txt file with the following content:
  hadoop mapreduce yarn
  hdfs hadoop mapreduce
  mapreduce yarn kmning
  kmning
  kmning
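The same file can be created non-interactively with a heredoc instead of an editor, which is handy for scripting the test. A small sketch:

```shell
# Write wc.txt in one step with a heredoc rather than opening vim
cat > wc.txt <<'EOF'
hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning
EOF
wc -l wc.txt   # reports 5 lines
```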
Upload wc.txt to the HDFS directory /wcinput:
  hdfs dfs -put wc.txt /wcinput
Run the MapReduce job:
  hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput/ /wcoutput
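One caveat worth knowing: MapReduce refuses to start if the output directory already exists, so re-running the job requires deleting the old output first. A sketch, assuming the same cluster and paths as above:

```shell
# The job fails with FileAlreadyExistsException if /wcoutput exists;
# remove the previous output before re-running.
hdfs dfs -rm -r -f /wcoutput
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput/ /wcoutput
```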
The job prints output like the following:
  24/07/03 20:44:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop03/192.168.43.103:8032
  24/07/03 20:44:28 INFO input.FileInputFormat: Total input files to process : 1
  24/07/03 20:44:28 INFO mapreduce.JobSubmitter: number of splits:1
  24/07/03 20:44:28 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
  24/07/03 20:44:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1720006717389_0001
  24/07/03 20:44:29 INFO impl.YarnClientImpl: Submitted application application_1720006717389_0001
  24/07/03 20:44:29 INFO mapreduce.Job: The url to track the job: http://hadoop03:8088/proxy/application_1720006717389_0001/
  24/07/03 20:44:29 INFO mapreduce.Job: Running job: job_1720006717389_0001
  24/07/03 20:44:45 INFO mapreduce.Job: Job job_1720006717389_0001 running in uber mode : false
  24/07/03 20:44:45 INFO mapreduce.Job:  map 0% reduce 0%
  24/07/03 20:44:57 INFO mapreduce.Job:  map 100% reduce 0%
  24/07/03 20:45:13 INFO mapreduce.Job:  map 100% reduce 100%
  24/07/03 20:45:14 INFO mapreduce.Job: Job job_1720006717389_0001 completed successfully
  24/07/03 20:45:14 INFO mapreduce.Job: Counters: 49
         File System Counters
                 FILE: Number of bytes read=70
                 FILE: Number of bytes written=396911
                 FILE: Number of read operations=0
                 FILE: Number of large read operations=0
                 FILE: Number of write operations=0
                 HDFS: Number of bytes read=180
                 HDFS: Number of bytes written=44
                 HDFS: Number of read operations=6
                 HDFS: Number of large read operations=0
                 HDFS: Number of write operations=2
         Job Counters
                 Launched map tasks=1
                 Launched reduce tasks=1
                 Data-local map tasks=1
                 Total time spent by all maps in occupied slots (ms)=9440
                 Total time spent by all reduces in occupied slots (ms)=11870
                 Total time spent by all map tasks (ms)=9440
                 Total time spent by all reduce tasks (ms)=11870
                 Total vcore-milliseconds taken by all map tasks=9440
                 Total vcore-milliseconds taken by all reduce tasks=11870
                 Total megabyte-milliseconds taken by all map tasks=9666560
                 Total megabyte-milliseconds taken by all reduce tasks=12154880
         Map-Reduce Framework
                 Map input records=5
                 Map output records=11
                 Map output bytes=124
                 Map output materialized bytes=70
                 Input split bytes=100
                 Combine input records=11
                 Combine output records=5
                 Reduce input groups=5
                 Reduce shuffle bytes=70
                 Reduce input records=5
                 Reduce output records=5
                 Spilled Records=10
                 Shuffled Maps =1
                 Failed Shuffles=0
                 Merged Map outputs=1
                 GC time elapsed (ms)=498
                 CPU time spent (ms)=3050
                 Physical memory (bytes) snapshot=374968320
                 Virtual memory (bytes) snapshot=4262629376
                 Total committed heap usage (bytes)=219676672
         Shuffle Errors
                 BAD_ID=0
                 CONNECTION=0
                 IO_ERROR=0
                 WRONG_LENGTH=0
                 WRONG_MAP=0
                 WRONG_REDUCE=0
         File Input Format Counters
                 Bytes Read=80
         File Output Format Counters
                 Bytes Written=44
View the result:
  [root@hadoop01 hadoop-2.9.2]# hdfs dfs -cat /wcoutput/part-r-00000
  hadoop  2
  hdfs    1
  kmning  3
  mapreduce       3
  yarn    2
As you can see, the program counted the occurrences of each word via distributed MapReduce computation.
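The result is easy to sanity-check locally: the map-shuffle-reduce logic of wordcount corresponds to a plain shell pipeline of split, sort, and count. A sketch that rebuilds wc.txt and reproduces the counts from part-r-00000:

```shell
# Recreate the input locally
cat > wc.txt <<'EOF'
hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning
EOF
# map: one word per line; shuffle: sort groups identical words;
# reduce: uniq -c counts each group
tr -s ' ' '\n' < wc.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# → hadoop 2, hdfs 1, kmning 3, mapreduce 3, yarn 2
```

This is only a single-machine analogy, of course; the point of the MapReduce job is that the same three phases run in parallel across the cluster.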
