I. Connect to the Competition Node
1. Connect to the host's IP with an SSH tool
User: hadoop
Password: qweQWE123!@#
Log in as root, or switch to root.
1.1 Set a static IP
① Inspect the network configuration
- cd /etc/sysconfig/network-scripts/
② Edit the interface configuration file
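The file to edit isn't named in the original; on CentOS 7 it is the ifcfg file for your NIC (ens33 here is an assumption — check the actual name with ip addr):
- vi ifcfg-ens33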
③ Modify and add the following entries:
- BOOTPROTO=static
- ONBOOT="yes"
- IPADDR="192.168.200.131"
- NETMASK="255.255.255.0"
- GATEWAY="192.168.200.2"
- DNS1="192.168.200.2"
④ Restart the network service
- systemctl restart network
2. Edit the hosts file
2.1 Check the current hostname
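For example:
- hostname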
2.2 Edit the hostname-to-IP mapping file (/etc/hosts)
2.3 Contents
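The file contents aren't shown in the original; based on the static IP configured above and the hostname used in the Hadoop configs below (hdfs://hadoop:9000), the mapping is presumably:
- 192.168.200.131 hadoop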
2.4 Change the hostname
- hostnamectl set-hostname Hadoop
3. Reboot
4. Verify the new hostname and IP
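For example:
- hostname
- ip addr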
5. Ping the competition node
II. Disable the Firewall
1. Stop the firewall with systemctl (temporary; it starts again after a reboot)
- sudo systemctl stop firewalld
2. Check the firewall status
- sudo systemctl status firewalld
3. Disable the firewall permanently
To keep it off across reboots, disable the unit rather than just stopping the service:
- sudo systemctl disable firewalld
III. Configure Time Synchronization
1. Check the current time
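For example:
- date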
2. Install ntpdate
- sudo yum -y install ntpdate
3. Update the yum repository
3.1 Back up the existing repo file
- sudo mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
3.2 Download the Aliyun repo file
- sudo curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
3.3 Clear and rebuild the yum cache
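The usual commands for this step are:
- sudo yum clean all
- sudo yum makecache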
4. Install ntpdate (re-run after the repo update)
- sudo yum -y install ntpdate
5. Sync the time
- sudo ntpdate -u pool.ntp.org
IV. Install the Software
1. Switch into /usr/local and clear it out
- cd /usr/local
- # Delete the directory's contents (note /usr/local/*; rm -rf /usr/local would remove the directory itself)
- sudo rm -rf /usr/local/*
2. Install the JDK
2.1 Remove any pre-installed Java environment
① List the OpenJDK files that ship with the system
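One common way to list them (any equivalent query works):
- rpm -qa | grep -i java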
② Delete the files
- # Remove the whole JVM directory (no need to cd into it first)
- rm -rf /usr/lib/jvm
③ Uninstall the packages with yum
- yum -y remove java-1.7.0-openjdk*
- yum -y remove java-1.8.0-openjdk*
2.2 Install the JDK
① Extract the archive
- tar -zxvf jdk-8u171-linux-x64.tar.gz
② Set the environment variables
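Presumably in the system-wide profile, since step ④ reloads it:
- vi /etc/profile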
③ Contents
- #JAVA_HOME
- export JAVA_HOME=/usr/local/jdk1.8.0_171
- export PATH=$PATH:$JAVA_HOME/bin
④ Apply the changes
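Reload the profile:
- source /etc/profile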
⑤ Check the version
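For example:
- java -version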
3. Install Hadoop
3.1 Extract the archive
- tar -zxvf hadoop-2.9.2.tar.gz
3.2 Hadoop environment variables (edit /etc/profile again, as in 2.2)
3.3 Contents
- ##HADOOP_HOME
- export HADOOP_HOME=/usr/local/hadoop-2.9.2
- export PATH=$PATH:$HADOOP_HOME/bin
- export PATH=$PATH:$HADOOP_HOME/sbin
3.4 Apply the changes
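As before:
- source /etc/profile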
3.5 Check the version
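For example:
- hadoop version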
4. Configure the Hadoop Cluster
- # Change into the Hadoop configuration directory
- cd /usr/local/hadoop-2.9.2/etc/hadoop/
4.1 Edit hadoop-env.sh
Change the JAVA_HOME path:
- export JAVA_HOME=/usr/local/jdk1.8.0_171
4.2 Configure core-site.xml
---- inside <configuration> ----
- <property>
-     <name>fs.defaultFS</name>
-     <value>hdfs://hadoop:9000</value>
- </property>
- <property>
-     <name>hadoop.tmp.dir</name>
-     <value>/usr/local/hadoop-2.9.2/data/tmp</value>
- </property>
- <property>
-     <name>dfs.http.address</name>
-     <value>0.0.0.0:50070</value>
- </property>
4.3 Configure hdfs-site.xml
---- inside <configuration> ----
- <property>
-     <name>dfs.replication</name>
-     <value>1</value>
- </property>
4.4 Format the NameNode
- /usr/local/hadoop-2.9.2/bin/hdfs namenode -format
4.5 Configure yarn-env.sh
- export JAVA_HOME=/usr/local/jdk1.8.0_171
4.6 Configure yarn-site.xml
---- inside <configuration> ----
- <property>
-     <name>yarn.nodemanager.aux-services</name>
-     <value>mapreduce_shuffle</value>
- </property>
- <property>
-     <name>yarn.resourcemanager.hostname</name>
-     <value>hadoop</value>
- </property>
- <property>
-     <name>yarn.resourcemanager.webapp.address</name>
-     <value>0.0.0.0:8088</value>
- </property>
4.7 Configure mapred-env.sh
- export JAVA_HOME=/usr/local/jdk1.8.0_171
4.8 Rename the template
- mv mapred-site.xml.template mapred-site.xml
① Configure mapred-site.xml
- <property>
-     <name>mapreduce.framework.name</name>
-     <value>yarn</value>
- </property>
4.9 Start the cluster
- /usr/local/hadoop-2.9.2/sbin/start-all.sh
Type yes and enter the password when prompted.
4.10 Check that everything is running
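The JDK's jps tool lists the running Java daemons; on this single-node setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:
- jps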
Visit the NameNode web UI:
http://192.168.200.131:50070
Visit the ResourceManager web UI:
http://192.168.200.131:8088
V. Passwordless SSH Setup
1. Generate a key pair
**Press Enter four times.** Run this on every host.
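The command isn't shown in the original; the usual one is below — the Enter presses accept the default key file and an empty passphrase:
- ssh-keygen -t rsa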
2. Copy this host's public key to the other virtual machines
**The receiving hosts must be powered on.** Run this on every host.
Format: when distributing, type yes at the first prompt, then enter the target host's password.
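A sketch of the distribution command, with <target-hostname> as a placeholder for each of the other hosts:
- ssh-copy-id root@<target-hostname>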
3. Verify that passwordless login works
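For example (it should log in without asking for a password):
- ssh <target-hostname>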
4. Start the Hadoop cluster
- /usr/local/hadoop-2.9.2/sbin/start-all.sh
VI. Code
1. Windows environment variable
Variable name: HADOOP_HOME; value: the installation directory D:\JavaSoftware\hadoop-2.9.2
2. Append to the Path variable
%HADOOP_HOME%\bin
3. Create a Maven project
4. Import the Hadoop dependencies
- <dependencies>
-     <dependency>
-         <groupId>org.apache.logging.log4j</groupId>
-         <artifactId>log4j-core</artifactId>
-         <version>2.8.2</version>
-     </dependency>
-     <dependency>
-         <groupId>org.apache.hadoop</groupId>
-         <artifactId>hadoop-common</artifactId>
-         <version>2.9.2</version>
-     </dependency>
-     <dependency>
-         <groupId>org.apache.hadoop</groupId>
-         <artifactId>hadoop-client</artifactId>
-         <version>2.9.2</version>
-     </dependency>
-     <dependency>
-         <groupId>org.apache.hadoop</groupId>
-         <artifactId>hadoop-hdfs</artifactId>
-         <version>2.9.2</version>
-     </dependency>
- </dependencies>
- <!-- Maven packaging plugins -->
- <build>
-     <plugins>
-         <plugin>
-             <artifactId>maven-compiler-plugin</artifactId>
-             <version>2.3.2</version>
-             <configuration>
-                 <source>1.8</source>
-                 <target>1.8</target>
-             </configuration>
-         </plugin>
-         <plugin>
-             <artifactId>maven-assembly-plugin</artifactId>
-             <configuration>
-                 <descriptorRefs>
-                     <descriptorRef>jar-with-dependencies</descriptorRef>
-                 </descriptorRefs>
-             </configuration>
-             <executions>
-                 <execution>
-                     <id>make-assembly</id>
-                     <phase>package</phase>
-                     <goals>
-                         <goal>single</goal>
-                     </goals>
-                 </execution>
-             </executions>
-         </plugin>
-     </plugins>
- </build>
5. Add log4j.properties
- log4j.rootLogger=info, stdout
- log4j.appender.stdout=org.apache.log4j.ConsoleAppender
- log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
- log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
- log4j.appender.logfile=org.apache.log4j.FileAppender
- log4j.appender.logfile.File=target/spring.log
- log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
- log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
6. Overall approach
Map phase:
- In map(), convert the incoming data to a String
- Split the line into words on spaces
- Emit <word, 1>
Reduce phase:
- Aggregate the count per key (word) by iterating over the values and accumulating
- Emit the total count for each key
Driver:
- Get the Configuration object and a Job instance
- Specify the local path of the program jar
- Specify the Mapper/Reducer classes
- Specify the Mapper output key/value types
- Specify the final output key/value types
- Specify the input data path for the job
- Specify the job output path
- Submit the job
- Write the Mapper class
7. Example 1
7.1 Map phase
- package cn.bdqn;
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Mapper;
- import java.io.IOException;
- public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
- Text k = new Text();
- IntWritable v = new IntWritable(1);
- @Override
- protected void map(LongWritable key, Text value, Context context) throws
- IOException, InterruptedException {
- // 1 Get one line
- String line = value.toString();
- // 2 Split into words
- String[] words = line.split(" ");
- // 3 Emit <word, 1>
- for (String word : words) {
- k.set(word);
- context.write(k, v);
- }
- }
- }
7.2 Reduce phase
- package cn.bdqn;
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Reducer;
- import java.io.IOException;
- public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
- int sum;
- IntWritable v = new IntWritable();
- @Override
- protected void reduce(Text key, Iterable<IntWritable> values, Context
- context) throws IOException, InterruptedException {
- // 1 Accumulate the sum
- sum = 0;
- for (IntWritable count : values) {
- sum += count.get();
- }
- // 2 Emit the total
- v.set(sum);
- context.write(key, v);
- }
- }
7.3 Driver
- package cn.bdqn;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Job;
- import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
- import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
- import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
- import java.io.IOException;
- public class WordcountDriver {
- public static void main(String[] args) throws IOException,
- ClassNotFoundException, InterruptedException {
- // 1 Get configuration info and create the job
- Configuration configuration = new Configuration();
- Job job = Job.getInstance(configuration);
- // 2 Set the jar load path
- job.setJarByClass(WordcountDriver.class);
- // 3 Set the Mapper and Reducer classes
- job.setMapperClass(WordcountMapper.class);
- job.setReducerClass(WordcountReducer.class);
- job.setCombinerClass(WordcountReducer.class);
- // 4 Set the map output types
- job.setMapOutputKeyClass(Text.class);
- job.setMapOutputValueClass(IntWritable.class);
- // 5 Set the final output key/value types
- job.setOutputKeyClass(Text.class);
- job.setOutputValueClass(IntWritable.class);
- job.setInputFormatClass(CombineTextInputFormat.class);
- // 6 Set the input and output paths
- FileInputFormat.setInputPaths(job, new Path(args[0]));
- FileOutputFormat.setOutputPath(job, new Path(args[1]));
- // 7 Submit
- boolean result = job.waitForCompletion(true);
- System.exit(result ? 0 : 1);
- }
- }
8. Example 2
8.1 Serialization (custom Writable bean)
- package cn.bdqn.demo1;
- import org.apache.hadoop.io.Writable;
- import java.io.DataInput;
- import java.io.DataOutput;
- import java.io.IOException;
- // 1 Implement the Writable interface
- public class SpeakBean implements Writable {
- private long selfDuration;
- private long thirdPartDuration;
- private long sumDuration;
- // 2 Deserialization calls the no-arg constructor via reflection, so it must exist
- public SpeakBean() {
- }
- public SpeakBean(long selfDuration, long thirdPartDuration) {
- this.selfDuration = selfDuration;
- this.thirdPartDuration = thirdPartDuration;
- this.sumDuration = this.selfDuration + this.thirdPartDuration;
- }
- // 3 Serialization (write) method
- public void write(DataOutput out) throws IOException {
- out.writeLong(selfDuration);
- out.writeLong(thirdPartDuration);
- out.writeLong(sumDuration);
- }
- // 4 Deserialization method
- // 5 The read order must exactly match the write order above
- public void readFields(DataInput in) throws IOException {
- this.selfDuration = in.readLong();
- this.thirdPartDuration = in.readLong();
- this.sumDuration = in.readLong();
- }
- // 6 toString() makes later printing to a text file easier
- @Override
- public String toString() {
- return selfDuration +
- "\t" + thirdPartDuration +
- "\t" + sumDuration;
- }
- public long getSelfDuration() {
- return selfDuration;
- }
- public void setSelfDuration(long selfDuration) {
- this.selfDuration = selfDuration;
- }
- public long getThirdPartDuration() {
- return thirdPartDuration;
- }
- public void setThirdPartDuration(long thirdPartDuration) {
- this.thirdPartDuration = thirdPartDuration;
- }
- public long getSumDuration() {
- return sumDuration;
- }
- public void setSumDuration(long sumDuration) {
- this.sumDuration = sumDuration;
- }
- public void set(long selfDuration, long thirdPartDuration) {
- this.selfDuration = selfDuration;
- this.thirdPartDuration = thirdPartDuration;
- this.sumDuration = this.selfDuration + this.thirdPartDuration;
- }
- }
8.2 Map phase
- package cn.bdqn.demo1;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Mapper;
- import java.io.IOException;
- public class SpeakDurationMapper extends Mapper<LongWritable, Text, Text,SpeakBean> {
- SpeakBean v = new SpeakBean();
- Text k = new Text();
- @Override
- protected void map(LongWritable key, Text value, Context context)
- throws IOException, InterruptedException {
- // 1 Get one line
- String line = value.toString();
- // 2 Split the fields
- String[] fields = line.split("\t");
- // 3 Populate the output objects
- // Extract the device id
- String deviceId = fields[1];
- // Extract the self-owned and third-party duration fields
- long selfDuration = Long.parseLong(fields[fields.length - 3]);
- long thirdPartDuration = Long.parseLong(fields[fields.length - 2]);
- k.set(deviceId);
- v.set(selfDuration, thirdPartDuration);
- // 4 Write out
- context.write(k, v);
- }
- }
8.3 Reduce phase
- package cn.bdqn.demo1;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Reducer;
- import java.io.IOException;
- public class SpeakDurationReducer extends Reducer<Text, SpeakBean, Text, SpeakBean> {
- @Override
- protected void reduce(Text key, Iterable<SpeakBean> values, Context
- context) throws IOException, InterruptedException {
- long self_Duration = 0;
- long thirdPart_Duration = 0;
- // 1 Iterate over all beans, accumulating the self-owned and third-party durations separately
- for (SpeakBean sb : values) {
- self_Duration += sb.getSelfDuration();
- thirdPart_Duration += sb.getThirdPartDuration();
- }
- // 2 Wrap the totals in a result bean
- SpeakBean resultBean = new SpeakBean(self_Duration,
- thirdPart_Duration);
- // 3 Write out
- context.write(key, resultBean);
- }
- }
8.4 Driver
- package cn.bdqn.demo1;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Job;
- import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
- import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
- import java.io.IOException;
- public class SpeakerDriver {
- public static void main(String[] args) throws IllegalArgumentException,
- IOException, ClassNotFoundException, InterruptedException {
- // Set the input/output paths to match the actual paths on your machine
- //args = new String[]{"d:/input/input/speak.data", "d:/output222"};
- // 1 Get configuration info and create the job instance
- Configuration configuration = new Configuration();
- Job job = Job.getInstance(configuration);
- // 6 Specify the local path of this program's jar
- job.setJarByClass(SpeakerDriver.class);
- // 2 Specify the Mapper/Reducer classes this job uses
- job.setMapperClass(SpeakDurationMapper.class);
- job.setReducerClass(SpeakDurationReducer.class);
- // 3 Specify the mapper output key/value types
- job.setMapOutputKeyClass(Text.class);
- job.setMapOutputValueClass(SpeakBean.class);
- // 4 Specify the final output key/value types
- job.setOutputKeyClass(Text.class);
- job.setOutputValueClass(SpeakBean.class);
- // 5 Specify the job's input directory
- FileInputFormat.setInputPaths(job, new Path(args[0]));
- FileOutputFormat.setOutputPath(job, new Path(args[1]));
- // 7 Submit the job configuration and its jar of classes to YARN to run
- boolean result = job.waitForCompletion(true);
- System.exit(result ? 0 : 1);
- }
- }
VII. Package and Upload
1. Package
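The original doesn't show the command; with maven-assembly-plugin bound to the package phase, the usual invocation is:
- mvn clean package
This produces both the plain jar and the jar-with-dependencies under target/.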
2. Result
3. Create the input files on Linux
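For example, matching the file names used by the run commands below (contents are arbitrary test text):
- vi a.txt
- vi b.txt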
4. HDFS commands
4.1 Help
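For example:
- hadoop fs -help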
4.2 Create the target directory
- hadoop fs -mkdir -p /user/root
4.3 Upload
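A sketch, assuming the a.txt created above is in the current directory:
- hadoop fs -put a.txt /user/root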
4.4 Remove an empty directory
- hadoop fs -rmdir /user/root
4.5 Delete a file
- hadoop fs -rm -f /user/root/a.txt
4.6 Run the code
- hadoop jar <jar-file> <main-class> <input-path> <output-path>
Examples:
- hadoop jar Hadoop_Demo-1.0-SNAPSHOT.jar cn.bdqn.demo.WordcountDriver /b.txt /out1
- hadoop jar Hadoop_Demo-1.0-SNAPSHOT.jar cn.bdqn.demo1.SpeakerDriver /a.txt /out