I. Required tools
JDK 8
Maven 3.6.3
Hadoop 3.2.2
IntelliJ IDEA 2022.3.3
Download link: https://pan.baidu.com/s/1x5-hLZXUP6oawGy4h693eQ?pwd=mona
Extraction code: mona
II. Install the JDK
(1) A word of advice: do not install on the C drive, and make sure the installation path contains no spaces. Otherwise errors appear later and the hadoop command cannot be invoked on Windows.
(2) Download JDK 8
My version is jdk-8u152-windows-x64; download whichever version suits you from the official site.
(3) Install the JDK
1. On the D drive, create a Java directory, and inside it create two subdirectories, jdk and jre.
2. Install
Right-click jdk-8u152-windows-x64.exe and choose "Run as administrator". Click "Next", change the installation directory to D:\Java\jdk, then click "Next" again.
When the JRE installation prompt appears, change its directory to D:\Java\jre and click "Next".
When the installation finishes, click "Close".
(4) Set the environment variables
1. Right-click "This PC", click "Properties", choose "Advanced system settings", then "Environment Variables".
2. Under "System variables", click "New".
3. Add JAVA_HOME, with the JDK installation directory as its value.
4. Add JAVA_HOME to Path
Select "Path" under "System variables", click "Edit", click "New", enter "%JAVA_HOME%\bin", and move the entry up. Moving it up makes sure the system matches the JDK we installed before any other.
5. Verify that the environment variables are set correctly. Press Win+R, enter cmd, and open a command prompt. Run java -version, java.exe, and javac.exe in turn; if each prints its output, the JDK is installed correctly.
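Why the position of the entry in Path matters can be sketched in a few lines of plain Java. The sketch below walks a PATH-style list in order and reports the first directory that contains the executable, which is roughly how the command prompt picks java.exe. The class and method names (PathLookup, splitPath, firstDirContaining) are made up for this illustration.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Sketch: find which directory on a PATH-style list supplies an executable first. */
class PathLookup {

    /** Splits a PATH-style string on the platform separator, skipping empty or invalid entries. */
    static List<Path> splitPath(String pathVar) {
        List<Path> dirs = new ArrayList<>();
        if (pathVar == null) {
            return dirs;
        }
        for (String entry : pathVar.split(Pattern.quote(File.pathSeparator))) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            try {
                dirs.add(Paths.get(trimmed));
            } catch (InvalidPathException e) {
                // Ignore entries that are not valid paths (for example, entries wrapped in quotes)
            }
        }
        return dirs;
    }

    /** Returns the first directory, in list order, that contains a regular file with the given name. */
    static Optional<Path> firstDirContaining(List<Path> dirs, String fileName) {
        for (Path dir : dirs) {
            if (Files.isRegularFile(dir.resolve(fileName))) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String exe = File.separatorChar == '\\' ? "java.exe" : "java";
        Optional<Path> dir = firstDirContaining(splitPath(System.getenv("PATH")), exe);
        System.out.println(dir.map(d -> exe + " resolves from " + d).orElse(exe + " not found on PATH"));
    }
}
```

If the directory printed is not %JAVA_HOME%\bin, another Java installation sits earlier in Path.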
III. Install Maven
(1) Download Maven; my version is 3.6.3.
(2) Install. Maven is a portable package: extracting it is the installation. Extract it to D:\Program Files\apache-maven-3.6.3
(3) Set the environment variables. As with the JDK, add MAVEN_HOME to the environment variables and add its bin directory to Path.
(4) Verify the installation. Press Win+R, enter cmd, open a command prompt, and run mvn (or mvn -v to print the version). If it prints its output, Maven is installed correctly.
(5) Local repository settings
1. On the D drive, create the local repository directory D:\maven\repository
2. Change the default repository location to D:\maven\repository. In D:\Program Files\apache-maven-3.6.3\conf, open settings.xml in an editor and, at around line 54, add <localRepository>D:\maven\repository</localRepository>. If you leave it unchanged, the repository defaults to the .m2\repository folder under your user directory on the C drive (C:\Users\T480s\.m2\repository on my machine), and the C drive fills up as projects accumulate.
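(6) Mirror repository settings
In D:\Program Files\apache-maven-3.6.3\conf, open settings.xml in an editor and go to around line 160. Configuring a mirror mainly improves the speed and reliability of dependency downloads for users in mainland China, and it also makes the setup easier to manage and maintain.
Add the following inside the <mirrors> element:
<mirror>
  <!-- Unique identifier of this mirror, used to tell mirror elements apart -->
  <id>nexus-aliyun</id>
  <!-- Which repository is mirrored, that is, which repository this one replaces -->
  <mirrorOf>central</mirrorOf>
  <!-- Mirror name -->
  <name>Nexus aliyun</name>
  <!-- Mirror URL -->
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>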
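The stock settings.xml keeps its sample localRepository and mirror entries inside XML comments, so it is easy to paste a new entry into a commented-out region by mistake. The following is a minimal sketch, using only the JDK's built-in XML parser, that prints the values Maven would actually see. The class name MavenSettingsCheck and its helper methods are made up for this illustration, and the default file path is the one used in this guide.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Sketch: print the effective localRepository and mirror entries of a Maven settings.xml. */
class MavenSettingsCheck {

    /** Parses the given settings file into a DOM document. */
    static Document load(File settingsXml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(settingsXml);
    }

    /** Returns the trimmed text of the first descendant element with this tag, or null if absent. */
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Default path follows this guide; pass another file as the first argument if yours differs
        File file = new File(args.length > 0
                ? args[0]
                : "D:\\Program Files\\apache-maven-3.6.3\\conf\\settings.xml");
        Element root = load(file).getDocumentElement();

        // Elements inside XML comments are not part of the DOM, so they never show up here
        System.out.println("localRepository = " + firstText(root, "localRepository"));

        NodeList mirrors = root.getElementsByTagName("mirror");
        for (int i = 0; i < mirrors.getLength(); i++) {
            Element mirror = (Element) mirrors.item(i);
            System.out.println("mirror " + firstText(mirror, "id")
                    + " mirrorOf=" + firstText(mirror, "mirrorOf")
                    + " url=" + firstText(mirror, "url"));
        }
    }
}
```

A localRepository of null, or no mirror lines at all, suggests the edits ended up inside a comment block.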
IV. Install Hadoop locally
(1) Download Hadoop; my version is 3.2.2.
(2) Extract it to D:\hadoop-3.2.2
(3) Add hadoop.dll and winutils.exe to the bin directory. Note that these two files must match the Hadoop version; if they were not built for 3.2.2, errors appear later when you use them.
(4) Also copy hadoop.dll into C:\Windows\System32
(5) Set the environment variables. As with the JDK, add HADOOP_HOME to the environment variables and add its bin directory to Path.
(6) Configure hadoop-env.cmd
1. Find hadoop-env.cmd under D:\hadoop-3.2.2\etc\hadoop
2. Change the JAVA_HOME line so that it reads JAVA_HOME=D:\Java\jdk
(7) Verify the installation. Press Win+R, enter cmd, open a command prompt, and run hadoop version. If it prints its output, Hadoop is installed correctly.
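Before running any job, a small pre-flight check can confirm that the two Windows helper files are where Hadoop looks for them. This is a sketch in plain Java, assuming HADOOP_HOME is set as described above. The class name HadoopWindowsCheck and the method missingNativeFiles are made up for this illustration. It checks only that the files exist, not that they match the Hadoop version.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: check that winutils.exe and hadoop.dll are present under HADOOP_HOME\bin. */
class HadoopWindowsCheck {

    /** Returns the names of the helper files that are missing from hadoopHome/bin. */
    static List<String> missingNativeFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        Path bin = hadoopHome.resolve("bin");
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!Files.isRegularFile(bin.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.trim().isEmpty()) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        List<String> missing = missingNativeFiles(Paths.get(home.trim()));
        System.out.println(missing.isEmpty()
                ? "winutils.exe and hadoop.dll found under " + home + "\\bin"
                : "Missing from " + home + "\\bin: " + missing);
    }
}
```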
V. Install IDEA
(1) Download the installer; mine is 2022.3.3.
(2) Install IDEA to D:\Program Files\JetBrains\IntelliJ IDEA 2022.3.3 and activate it. Anywhere off the C drive is fine.
VI. Trying out WordCount
(1) Open IDEA
(2) Create a new Maven project
1. Click File > New > Project
2. Name the project wordcount.mr, store it in D:\workspace (choose a directory that suits you), select Java as the language and Maven as the build system.
3. Select our own JDK version
4. Select the local Maven installation
(3) Edit the project
1. Edit pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.wordcount</groupId>
    <artifactId>wordcount-mr</artifactId>
    <version>1.0</version>
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <!-- Hadoop dependencies -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.2.2</version>
        </dependency>
        <!-- If your project really needs hadoop-mapreduce-client-core, keep it and update the version -->
        <!-- Usually hadoop-client already pulls in the required MapReduce dependencies -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.2.2</version>
        </dependency>
        <!-- Other dependencies, such as database connectors -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.32</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.2.0</version> <!-- Consider using a newer version -->
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>lib/</classpathPrefix>
                            <mainClass></mainClass> <!-- Replace with your main class name -->
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
2. Wait for the dependencies to download
3. Check that the dependencies downloaded successfully
4. Create the mapper
Add the following code:
package com.wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Mapper for WordCount: emits <word, 1> for every word in a line.
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    // Key/value pair emitted by the mapper: <word, 1>
    private final Text keyOut = new Text();
    private final static LongWritable valueOut = new LongWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line that was read on whitespace
        String[] words = value.toString().split("\\s+");
        // Iterate over the words
        for (String word : words) {
            // A line that starts with whitespace yields an empty first token; skip it
            if (word.isEmpty()) {
                continue;
            }
            keyOut.set(word);
            // Emit the word tagged with 1, reusing keyOut instead of allocating a new Text
            context.write(keyOut, valueOut);
        }
    }
}
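The mapper relies on String.split("\\s+"), which has one behavior worth knowing: a line that begins with whitespace produces an empty first token. The plain-Java sketch below, with no Hadoop classes, shows the raw split and a filtered version. The class name SplitDemo and the method tokens are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: how splitting on whitespace behaves, and how to drop empty tokens. */
class SplitDemo {

    /** Splits a line on runs of whitespace and keeps only the non-empty tokens. */
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "  hello  world";
        // The leading whitespace yields an empty first element
        System.out.println(Arrays.toString(line.split("\\s+"))); // prints [, hello, world]
        // After filtering, only real words remain
        System.out.println(tokens(line)); // prints [hello, world]
    }
}
```

Without the filter, indented lines would add an empty-string "word" to the final counts.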
5. Create the reducer
Add the following code:
package com.wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Reducer for WordCount: sums the counts emitted for each word.
 */
public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private final LongWritable result = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // Running total
        long count = 0;
        // Iterate over the group and read every value in it
        for (LongWritable value : values) {
            // The sum of all values is the total number of occurrences of the word
            count += value.get();
        }
        result.set(count);
        // Emit the final result: <word, total count>
        context.write(key, result);
    }
}
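To see what the mapper and reducer do together, the whole data flow can be imitated in memory with plain Java collections. This is only a sketch of the idea (map, group by key, reduce), not how Hadoop executes a job. The class WordCountFlow and its methods are made up for this illustration, and a TreeMap stands in for the sorted grouping that the framework performs between the two phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the WordCount data flow; no Hadoop classes are involved. */
class WordCountFlow {

    /** "Map" step: one (word, 1) pair per non-empty token of the line. */
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1L));
            }
        }
        return pairs;
    }

    /** "Shuffle" step: collect all values that share a key, with keys kept in sorted order. */
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    /** "Reduce" step: add up the values of one group. */
    static long reduce(List<Long> values) {
        long count = 0;
        for (long value : values) {
            count += value;
        }
        return count;
    }

    /** Runs the three steps over all lines and returns word -> total count. */
    static Map<String, Long> wordCount(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffle(pairs).entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```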
6. Create the driver
Enter the following code:
package com.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // Configuration object
        Configuration conf = new Configuration();
        // Create the job instance
        Job job = Job.getInstance(conf, WordCountDriver.class.getSimpleName());
        // Set the driver class of the job
        job.setJarByClass(WordCountDriver.class);
        // Set the mapper and reducer classes of the job
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Set the key and value types emitted by the map phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Set the key and value types emitted by the reduce phase, that is, the final output types of the program
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Set the input path of the job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Set the output path of the job
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // If the output path already exists, delete it
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(new Path(args[1]))) {
            fs.delete(new Path(args[1]), true);
        }
        // Submit the job and wait for it to finish
        boolean resultFlag = job.waitForCompletion(true);
        // Exit the program
        System.exit(resultFlag ? 0 : 1);
    }
}
7. Configure log4j.properties
Enter the following:
log4j.rootLogger=info,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.com.wordcount=DEBUG
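(4) Run the project locally
1. Add the path arguments
Run WordCountDriver first.
It reports an error, because we have not set the path arguments yet.
Set them in the run configuration of WordCountDriver: the program arguments are the input path followed by the output path, for example D:\wordcount\input D:\wordcount\output.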
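The error on the first run comes from the driver reading args[0] and args[1] when no arguments were passed, which fails with an ArrayIndexOutOfBoundsException. One possible way to make that failure easier to read is a small check at the top of main. The sketch below is an optional addition, not part of the driver above; the class DriverArgsCheck, the method requireTwoPaths, and the usage text are made up for this illustration.

```java
/** Sketch: validate that exactly an input path and an output path were supplied. */
class DriverArgsCheck {

    static final String USAGE = "Usage: WordCountDriver <input path> <output path>";

    /** Throws IllegalArgumentException with a usage message unless exactly two arguments are given. */
    static void requireTwoPaths(String[] args) {
        int count = args == null ? 0 : args.length;
        if (count != 2) {
            throw new IllegalArgumentException(USAGE + " (got " + count + " argument(s))");
        }
    }

    public static void main(String[] args) {
        try {
            requireTwoPaths(args);
            System.out.println("input = " + args[0] + ", output = " + args[1]);
        } catch (IllegalArgumentException e) {
            // Print the usage hint instead of a stack trace and exit with a non-zero code
            System.err.println(e.getMessage());
            System.exit(2);
        }
    }
}
```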
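2. Add a document named 1.txt to D:\wordcount\input. Note that the input directory must be created beforehand; the output directory does not need to exist.
3. Run
4. Check that the run succeeded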
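A successful MapReduce job writes an empty _SUCCESS marker and one or more part-r-NNNNN files into the output directory, with one "word<TAB>count" line per word. The sketch below reads such a directory back with plain Java. The class WordCountOutputReader and its methods are made up for this illustration, and the default path D:\wordcount\output is an assumption; pass your own output directory as the first argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: read the word counts written by a finished WordCount job on the local file system. */
class WordCountOutputReader {

    /** True if the job left its _SUCCESS marker in the output directory. */
    static boolean succeeded(Path outputDir) {
        return Files.exists(outputDir.resolve("_SUCCESS"));
    }

    /** Parses every part-r-* file; each line is expected to be "word<TAB>count". */
    static Map<String, Long> read(Path outputDir) throws IOException {
        Map<String, Long> counts = new TreeMap<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-r-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    int tab = line.lastIndexOf('\t');
                    if (tab < 0) {
                        continue; // Skip lines that do not have the expected shape
                    }
                    String word = line.substring(0, tab);
                    long count = Long.parseLong(line.substring(tab + 1).trim());
                    counts.merge(word, count, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default; replace with the output path you passed to WordCountDriver
        Path outputDir = Paths.get(args.length > 0 ? args[0] : "D:\\wordcount\\output");
        System.out.println("_SUCCESS present: " + succeeded(outputDir));
        read(outputDir).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```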
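(5) Run the project on a cluster
1. Add the main class to pom.xml
First copy the fully qualified main class name from WordCountDriver,
then add it at the corresponding place in pom.xml (the <mainClass> element).
2. Package
3. Inspect the jar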
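One thing worth checking in the packaged jar is that the manifest carries the main class, since the hadoop jar command used later does not name a class on the command line. A minimal sketch using the JDK's java.util.jar API follows. The class JarMainClass and the method mainClassOf are made up for this illustration, and the default path target/wordcount-mr-1.0.jar is an assumption based on the artifactId and version in the pom.xml above.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Sketch: print the Main-Class entry recorded in a jar's manifest. */
class JarMainClass {

    /** Returns the Main-Class manifest attribute, or null if the jar has no manifest or no such entry. */
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the path of your own jar as the first argument
        File jar = new File(args.length > 0 ? args[0] : "target/wordcount-mr-1.0.jar");
        System.out.println("Main-Class = " + mainClassOf(jar));
    }
}
```

With the classes from this guide, the expected value is com.wordcount.WordCountDriver.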
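4. Test on the cluster
4.1 Start the cluster
4.2 Create the data directory
hdfs dfs -mkdir -p /wordcount-mr/input
4.3 Upload 1.txt to /wordcount-mr/input in HDFS
4.4 First upload 1.txt to /root on the master node
4.5 Then put 1.txt from there into /wordcount-mr/input
4.6 Upload the jar to the master node
4.7 Run the jar (my jar happened to be named wordcont-mr-1.0.jar at the time; be sure to use the name of your own jar)
hadoop jar wordcont-mr-1.0.jar /wordcount-mr/input /wordcount-mr/output
4.8 View the result in HDFS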