光之使者 posted on 2024-12-6 13:24:12

Hands-on WordCount with JDK 8 + Maven 3.6.3 + Hadoop 3.2.2

I. Required tools
JDK 8
Maven 3.6.3
Hadoop 3.2.2
IntelliJ IDEA 2022.3.3
Download link: https://pan.baidu.com/s/1x5-hLZXUP6oawGy4h693eQ?pwd=mona
Extraction code: mona
 
II. Installing the JDK
(1) A word of warning: do not install to the C: drive, and make sure the path contains no spaces; otherwise you will hit errors later and be unable to run hadoop commands on Windows.
(2) Download JDK 8
I used jdk-8u152-windows-x64; download whichever build suits you from the official site.
(3) Install the JDK
1. Create a Java directory on the D: drive, with two subdirectories: jdk and jre.
https://i-blog.csdnimg.cn/direct/e1308428ab6644a6aca2ab8f4801bd6b.png
2. Run the installer
Right-click jdk-8u152-windows-x64.exe and run it as administrator. Click "Next", change the install directory to D:\Java\jdk, then continue.
https://i-blog.csdnimg.cn/direct/7c89d3ad07d14d7c8ce7b04ed7c3dd27.png
When the JRE installation prompt appears, change its install directory to D:\Java\jre and continue.
https://i-blog.csdnimg.cn/direct/42332813930041b6ba59816f5828a600.png
When the installation finishes, click Close.
https://i-blog.csdnimg.cn/direct/44dae1fe56014dab94fb6b7e3f67dbfd.png
(4) Set the environment variables
1. Right-click "This PC", choose "Properties", then "Advanced system settings", then "Environment Variables".
https://i-blog.csdnimg.cn/direct/a27532f309b54eef82cb7c4de64d8757.png
https://i-blog.csdnimg.cn/direct/489ee6ecc6ec4da3a0d4c16585fa255a.png
https://i-blog.csdnimg.cn/direct/8e67ca1e7eeb430089f6e8d482878e38.png
2. Under "System variables", click "New".
https://i-blog.csdnimg.cn/direct/16d21ce88eae4350a316ac8b69fbc1b7.png
3. Add JAVA_HOME, with its value set to the JDK install directory.
https://i-blog.csdnimg.cn/direct/317e4ba58ee346278037d16bbbc036bf.png
4. Add JAVA_HOME to Path
Under "System variables" select "Path", click "Edit", click "New", enter "%JAVA_HOME%\bin", and move this entry to the top. Moving it up ensures the system finds our JDK before any other.
https://i-blog.csdnimg.cn/direct/c9b65126150840308f46858a1755a818.png
https://i-blog.csdnimg.cn/direct/529060a7cc3a4582a08b5814f2ea0254.png
5. Verify the environment variables. Press Win+R, type cmd, and open a command prompt. Run java -version, then java.exe and javac.exe; if each prints its usual output, the JDK is installed correctly.
https://i-blog.csdnimg.cn/direct/7e530b1827ac41c8b0c61403b84610bb.png
https://i-blog.csdnimg.cn/direct/7420a00fcbd04cd8b9239e09cbfa0cf4.png
https://i-blog.csdnimg.cn/direct/0fc1df57049d42c6b536687360772778.png
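As an extra cross-check, the version of the JVM actually in use can also be read from Java itself. A minimal sketch (the class name is made up for illustration):

```java
public class JdkCheck {
    // Reads the version string of the running JVM, e.g. "1.8.0_152" on JDK 8
    public static String version() {
        return System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + version());
    }
}
```

If this prints a 1.8.x version, the JDK on your Path is the one you just installed.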
III. Installing Maven
(1) Download Maven; I used 3.6.3.
(2) Install. Maven is a portable package: unzipping it is the installation. Extract it to D:\Program Files\apache-maven-3.6.3.
https://i-blog.csdnimg.cn/direct/57d3dce6e55b49c7842045b512c7ed5a.png
(3) Set the environment variables. As with the JDK, add MAVEN_HOME and append its bin directory to Path.
https://i-blog.csdnimg.cn/direct/e0162c1c8dd24ce2bd978ab2565235f8.png
https://i-blog.csdnimg.cn/direct/31da51d80e8140c6a94cf6910a3e80f2.png
(4) Verify the installation. Press Win+R, type cmd, and open a command prompt. Run mvn; if it prints its usual output, Maven is installed correctly.
https://i-blog.csdnimg.cn/direct/0899b967efe641949daffeaba1e4c7f0.png
(5) Configure the local repository
1. Create the local repository directory D:\maven\repository on the D: drive.
https://i-blog.csdnimg.cn/direct/462a64f8f07c49618e608d81a6185372.png
2. Point the default repository location at D:\maven\repository. In D:\Program Files\apache-maven-3.6.3\conf, open settings.xml in an editor and, around line 54, add <localRepository>D:\maven\repository</localRepository>. If you skip this, the default location is C:\Users\T480s\.m2\repository, and as projects accumulate the C: drive will fill up.
https://i-blog.csdnimg.cn/direct/a3fe0610ae4f4b748315625c137cf2a0.png
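For reference, the relevant fragment of conf/settings.xml looks roughly like this (the path is the one chosen above; adjust it to your own drive):

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <!-- Store downloaded artifacts on D: instead of C:\Users\<user>\.m2\repository -->
  <localRepository>D:\maven\repository</localRepository>
</settings>
```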
(6) Configure a mirror repository
In D:\Program Files\apache-maven-3.6.3\conf, open settings.xml and edit around line 160. A mirror mainly improves download speed and reliability for users in mainland China, and makes dependencies easier to manage.
Add the following:
<mirror>
    <!-- Unique id distinguishing this mirror from others -->
    <id>nexus-aliyun</id>
    <!-- Which repository this mirrors, i.e. which repository it replaces -->
    <mirrorOf>central</mirrorOf>
    <!-- Mirror name -->
    <name>Nexus aliyun</name>
    <!-- Mirror URL -->
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
https://i-blog.csdnimg.cn/direct/d308fe5e286e43c2b010f4ec8bc82ca0.png
IV. Installing Hadoop locally
(1) Download Hadoop; I used 3.2.2.
(2) Extract it to D:\hadoop-3.2.2.
https://i-blog.csdnimg.cn/direct/ed6afdfcea9d41e79d2ce14f97b890d3.png
(3) Add hadoop.dll and winutils.exe to the bin directory. Important: these two files must match your Hadoop version; if yours is not 3.2.2, you will hit errors later when you use them.
https://i-blog.csdnimg.cn/direct/dd520b84a8874f97a8d2a35cfc7a9bd7.png
https://i-blog.csdnimg.cn/direct/ddc78283c8914be3b0df6b1edef2cd0f.png
(4) Also copy hadoop.dll into C:\Windows\System32.
https://i-blog.csdnimg.cn/direct/808a98ee248048cda624d4d221e7cf06.png
(5) Set the environment variables. As with the JDK, add HADOOP_HOME and append its bin directory to Path.
https://i-blog.csdnimg.cn/direct/c07733009270494f98cbd3b24b7d2c68.png
https://i-blog.csdnimg.cn/direct/3fa8abc0a71746fea127c958db00ee61.png
(6) Configure hadoop-env.cmd
1. Find hadoop-env.cmd in D:\hadoop-3.2.2\etc\hadoop.
https://i-blog.csdnimg.cn/direct/37e5a8ab0eb24544bd2c9f404321983b.png
2. Change it to set JAVA_HOME=D:\Java\jdk.
https://i-blog.csdnimg.cn/direct/dafcffccf3a145d28e1399588c7e3609.png
(7) Verify the installation. Press Win+R, type cmd, and open a command prompt. Run hadoop version; if it prints version information, Hadoop is installed correctly.
https://i-blog.csdnimg.cn/direct/fc893f3f81dc412489a08aa7a938df5c.png
V. Installing IntelliJ IDEA
(1) Download the installer; I used 2022.3.3.
(2) Install IDEA to D:\Program Files\JetBrains\IntelliJ IDEA 2022.3.3 and activate it with a valid license. Just avoid installing it on the C: drive.
VI. WordCount walkthrough
(1) Open IDEA
(2) Create a Maven project
1. Click File → New → Project
https://i-blog.csdnimg.cn/direct/3e9644abadb444068a0b96bfe98bf149.png
2. Name the project wordcount.mr and place it under D:\workspace (choose a directory that suits you); select Java as the language and Maven as the build system.
https://i-blog.csdnimg.cn/direct/d3dd3a51b25a4c3685fbf761fdf22f13.png
3. Select the JDK we installed.
https://i-blog.csdnimg.cn/direct/d8466421fbdd41889af50615b7d18d82.png
https://i-blog.csdnimg.cn/direct/3d584f52df994538a680bf65de42a2cf.png
https://i-blog.csdnimg.cn/direct/2d8606b7e24a4718bf4281a288eabffc.png
https://i-blog.csdnimg.cn/direct/319c750ea0124e12be5d32dde88e0256.png
https://i-blog.csdnimg.cn/direct/b5d32e4d6d5b406088d714372bfb95f4.png
4. Select the local Maven installation.
https://i-blog.csdnimg.cn/direct/94eafc1ac59d491d880d8cfe2a04868b.png
https://i-blog.csdnimg.cn/direct/dd9390e8db55443f8f9c11c79b16aae6.png
(3) Edit the project
1. Edit pom.xml
https://i-blog.csdnimg.cn/direct/c9d09b032a0d409b8d86add21281f257.png

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.wordcount</groupId>
    <artifactId>wordcount-mr</artifactId>
    <version>1.0</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- Hadoop dependencies -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.2.2</version>
        </dependency>
        <!-- hadoop-client usually already pulls in the MapReduce dependencies, -->
        <!-- but keep hadoop-mapreduce-client-core (at a matching version) if you rely on it directly -->

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.2.2</version>
        </dependency>

        <!-- Other dependencies, e.g. database connectors -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.32</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.2.0</version> <!-- consider a newer version -->
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>lib/</classpathPrefix>
                            <mainClass></mainClass> <!-- replace with your main class -->
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>
2. Wait for the dependencies to download
https://i-blog.csdnimg.cn/direct/100d1defa9ad437eaf21fae69d94d563.png
3. Check that the dependencies downloaded successfully
https://i-blog.csdnimg.cn/direct/463a60b332034b4583c56cf59ea1b45a.png
https://i-blog.csdnimg.cn/direct/8a70ae18e4b14b48aca3ead74db40ec2.png
4. Create the mapper
https://i-blog.csdnimg.cn/direct/5697dd1d50084e1f932b6a16a4ba74bb.png
https://i-blog.csdnimg.cn/direct/20d41bfd6b744b129de900d09f84a006.png
https://i-blog.csdnimg.cn/direct/a65a5756b53a4d9c9f5ae0a3e90143d5.png
https://i-blog.csdnimg.cn/direct/310e51416bf343568c92df4b484fff84.png
https://i-blog.csdnimg.cn/direct/4d1100d055bd4ac18b7902e6e8d07e9b.png
Add the following code:
package com.wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Mapper: emits a <word, 1> pair for every word on a line.
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    // Reused output key/value objects: <word, 1>
    private final Text keyOut = new Text();
    private final static LongWritable valueOut = new LongWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line read from the input on whitespace
        String[] words = value.toString().split("\\s+");
        // Emit each word in the array, tagged with a count of 1
        for (String word : words) {
            keyOut.set(word);
            context.write(keyOut, valueOut);
        }
    }
}
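Outside of Hadoop, the mapper's core logic is just whitespace tokenization. A minimal plain-Java sketch (the class is a hypothetical helper, not part of the project; unlike the mapper above, it also drops the empty token that `split("\\s+")` produces for leading whitespace):

```java
import java.util.ArrayList;
import java.util.List;

public class MapSketch {
    // Mirrors the mapper's split step: one line in, a list of words out
    public static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) out.add(w); // split yields "" when the line starts with whitespace
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello  hadoop hello")); // [hello, hadoop, hello]
    }
}
```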
5. Create the reducer
https://i-blog.csdnimg.cn/direct/00756f15799642d5aa4ebd1344e96578.png
https://i-blog.csdnimg.cn/direct/374c596fa94848c695b744b8d16b8ffd.png
https://i-blog.csdnimg.cn/direct/493c0baee6294ae6a8ad5f4da2607123.png
Add the following code:
package com.wordcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Reducer: sums the counts for each word and emits <word, total>.
 */
public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private final LongWritable result = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // Running total for this word
        long count = 0;
        // Walk the group and add up every value; the sum is the word's total count
        for (LongWritable value : values) {
            count += value.get();
        }
        result.set(count);
        // Emit the final <word, total> pair
        context.write(key, result);
    }
}
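Stripped of the Hadoop types, the reduce step is a plain summation per key. A sketch with a hypothetical helper class:

```java
public class ReduceSketch {
    // Mirrors the reducer: sum all the 1s emitted for one word
    public static long sum(long[] values) {
        long count = 0;
        for (long v : values) {
            count += v;
        }
        return count;
    }

    public static void main(String[] args) {
        // Three occurrences of a word, each mapped to 1, reduce to a total of 3
        System.out.println(sum(new long[]{1, 1, 1})); // 3
    }
}
```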

6. Create the driver
https://i-blog.csdnimg.cn/direct/2d438260b8e246d7b64bf46cd7775912.png
https://i-blog.csdnimg.cn/direct/9d275af9a2ec43e6be679dbc7cce2767.png
https://i-blog.csdnimg.cn/direct/382f0bd458e0490cb113cc090980235c.png
Enter the following code:
package com.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // Create the configuration object
        Configuration conf = new Configuration();
        // Create the job instance
        Job job = Job.getInstance(conf, WordCountDriver.class.getSimpleName());
        // Set the driver class
        job.setJarByClass(WordCountDriver.class);
        // Set the mapper and reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Set the key/value types of the mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Set the key/value types of the reducer output, i.e. the job's final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Input path: the first program argument
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Output path: the second program argument
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // If the output path already exists, delete it first
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(new Path(args[1]))) {
            fs.delete(new Path(args[1]), true);
        }
        // Submit the job and wait for it to finish
        boolean resultFlag = job.waitForCompletion(true);
        // Exit with 0 on success, 1 on failure
        System.exit(resultFlag ? 0 : 1);
    }
}

7. Configure log4j.properties
https://i-blog.csdnimg.cn/direct/7d6f2414da9b481d9fefc24a72805170.png
https://i-blog.csdnimg.cn/direct/16f2e73db9e04dafa0de1a26d2947ca8.png
https://i-blog.csdnimg.cn/direct/38d25fe0b9a5429ea90a0aac1ec55009.png
Enter the following:
log4j.rootLogger=info,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.com.codefutures=DEBUG


(4) Running the project locally
1. Add the path arguments
Run WordCountDriver first
https://i-blog.csdnimg.cn/direct/30d3e2f607dd42f18039eabbed19048e.png
It will fail, because we have not yet supplied the path arguments
https://i-blog.csdnimg.cn/direct/f268cb10cf774a7084bb62df78cbc275.png
https://i-blog.csdnimg.cn/direct/186722da2bd6461e9f59d90389d43900.png
Configure them like this
https://i-blog.csdnimg.cn/direct/909bec2942c54f629f4cc15d60d67275.png
https://i-blog.csdnimg.cn/direct/056503ece8094d87a2ee785668b19444.png
https://i-blog.csdnimg.cn/direct/5d2a3b401c9a4c4f8bf38c1f82bbca01.png
2. Add a file 1.txt under D:\wordcount\input. Note that the input directory must be created beforehand; the output directory need not exist.
https://i-blog.csdnimg.cn/direct/13ef38c272f34a78ba558f962698d863.png
https://i-blog.csdnimg.cn/direct/0653f31f1d4140abba75bf8a1d8f0952.png
https://i-blog.csdnimg.cn/direct/94e3097112144a6e84cb5ea010f9e76f.png
3. Run the job
https://i-blog.csdnimg.cn/direct/7f3659e82c9a4d4f9ba2a524405accd2.png
https://i-blog.csdnimg.cn/direct/19c3af1d9afa41329ddf34a3c33e16f3.png
4. Verify the run succeeded
https://i-blog.csdnimg.cn/direct/3aab8023cb754efea1966def74d62750.png
https://i-blog.csdnimg.cn/direct/7b48162d8fce4a69a0902261fe8f74d7.png
https://i-blog.csdnimg.cn/direct/5f27acf2880d412da07d97cb60ca3839.png
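Putting the two stages together, the local run above computes, in effect, the following (an in-memory sketch under the same whitespace-splitting rule; the class name is made up):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map + shuffle + reduce collapsed into one pass over the text.
    // TreeMap keeps keys sorted, matching the sorted part-r-00000 output.
    public static Map<String, Long> count(String text) {
        Map<String, Long> counts = new TreeMap<>();
        for (String w : text.split("\\s+")) {
            if (w.isEmpty()) continue;
            counts.merge(w, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each word and its total, tab-separated, like the job's output file
        count("hello hadoop hello world").forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```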
(5) Running the project on a cluster
1. Add the main class to pom.xml
First copy the fully qualified main class name from WordCountDriver
https://i-blog.csdnimg.cn/direct/3fdab6b013534663b94ae38862003e87.png
Then add it in the corresponding place in pom.xml
https://i-blog.csdnimg.cn/direct/47109dd295ca4ba7a4ba3f240da6a2c1.png
2. Build the jar
https://i-blog.csdnimg.cn/direct/97efe29982e14796a32f7dd55843bf9f.png
https://i-blog.csdnimg.cn/direct/c5f0bfaace1f4f04b351585051259825.png
https://i-blog.csdnimg.cn/direct/9e8b1d68342a4990a645f8e8bff117b3.png
3. Locate the jar
https://i-blog.csdnimg.cn/direct/2842e4cbf08a45f4b8406b8799d78ce3.png
4. Test on the cluster
4.1 Start the cluster
4.2 Create the data directory
hdfs dfs -mkdir -p /wordcount-mr/input
https://i-blog.csdnimg.cn/direct/7397b5d771454b97a2fd49674d9b88a2.png
4.3 Upload 1.txt to /wordcount-mr/input on HDFS
https://i-blog.csdnimg.cn/direct/4e70048bc3e34da5b4f53158f45cc82c.png
4.4 Copy 1.txt to /root on the master node
https://i-blog.csdnimg.cn/direct/f0dfa6ef6a0748618e0aff3f430281e7.png
4.5 Upload 1.txt to /wordcount-mr/input
https://i-blog.csdnimg.cn/direct/99ba60b3defc4a80a466612661ef3e7c.png
https://i-blog.csdnimg.cn/direct/f15101ef9a244eb8a3fa272eeea6cc5e.png
4.6 Upload the jar to the master node
https://i-blog.csdnimg.cn/direct/6289a60227f94bdeb318a3e186102e4b.png
4.7 Run the jar (mine happened to be named wordcont-mr-1.0.jar, so be sure to use your own jar's actual name)
hadoop jar wordcont-mr-1.0.jar /wordcount-mr/input /wordcount-mr/output
https://i-blog.csdnimg.cn/direct/1db5566a4e724f7db2beef36c8cd013d.png
4.8 View the results on HDFS
https://i-blog.csdnimg.cn/direct/de238fb4f7ec4f88aa230ea833f582b1.png
https://i-blog.csdnimg.cn/direct/7727a92ee9ae46d0ba6d66cf3f1b6ec8.png
https://i-blog.csdnimg.cn/direct/6ed959c6cdd0439da61294350976f4b7.png
