Title: Hadoop Environment
Author: 万有斥力
Date: 2024-11-5 20:46
I. Connect to the competition node
1. Connect to the host's IP with a remote tool
User: hadoop
Password: qweQWE123!@#
Log in as root, or switch to root after logging in.
1.1 Set a static IP
① View the network configuration directory
cd /etc/sysconfig/network-scripts/
② Edit the interface file
vi ifcfg-ens33
③ Modify and add the following entries
BOOTPROTO=static
ONBOOT="yes"
IPADDR="192.168.200.131"
NETMASK="255.255.255.0"
GATEWAY="192.168.200.2"
DNS1="192.168.200.2"
④ Restart the network service
systemctl restart network
2. Modify the hosts file
2.1 View the current hosts file
cat /etc/hosts
2.2 Edit the hostname-to-IP mapping file
vi /etc/hosts
2.3 Content (a minimal example is sketched below)
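A minimal mapping for this single-node setup, using the static IP and hostname configured in this guide, is a single appended line:
192.168.200.131 hadoop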
2.4 Change the hostname
hostnamectl set-hostname Hadoop
3. Reboot
reboot
4. Verify (sample commands below)
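The hostname and IP can be checked with standard commands, for example:
hostname
ip addr show ens33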
5. Ping the competition node
ping hadoop
II. Turn off the firewall
1. Stop it with systemctl (temporary)
sudo systemctl stop firewalld
2. Check the firewall status
sudo systemctl status firewalld
3. Stop it with the service command
sudo service firewalld stop
Stopping the service (by either method) does not survive a reboot; to turn the firewall off permanently, disable the unit as shown below.
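To keep firewalld from starting again at boot (standard systemd usage):
sudo systemctl disable firewalld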
III. Set up time synchronization
1. Check the current time
date
2. Install ntpdate
sudo yum -y install ntpdate
3. Update the yum repository
3.1 Back up the existing repo file
sudo mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
3.2 Download the Aliyun repo file
sudo curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
3.3 Clear and rebuild the yum cache
sudo yum clean all
sudo yum makecache
4. Install ntpdate (re-run after switching repositories)
sudo yum -y install ntpdate
5. Synchronize the time
sudo ntpdate -u pool.ntp.org
IV. Install the software
1. Change into /usr/local
cd /usr/local
# remove the existing files under /usr/local
sudo rm -rf /usr/local/*
2. Install the JDK
2.1 Remove the existing Java environment
① List the OpenJDK packages that ship with the system
rpm -qa | grep java
② Delete the files
cd /usr/lib/jvm
rm -rf /usr/lib/jvm
③ Uninstall the packages
yum -y remove java-1.7.0-openjdk*
yum -y remove java-1.8.0-openjdk*
2.2 Install the JDK
① Extract the archive
tar -zxvf jdk-8u171-linux-x64.tar.gz
② Configure the environment variables
sudo vi /etc/profile
③ Append the following
#JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.8.0_171
export PATH=$PATH:$JAVA_HOME/bin
④ Reload the file so the changes take effect
source /etc/profile
⑤ Check the version
java -version
3. Install Hadoop
3.1 Extract the archive
tar -zxvf hadoop-2.9.2.tar.gz
3.2 Hadoop environment variables
sudo vi /etc/profile
3.3 Append the following
##HADOOP_HOME
export HADOOP_HOME=/usr/local/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
3.4 Reload the file so the changes take effect
source /etc/profile
3.5 Check the version
hadoop version
4. Configure the Hadoop cluster
# change into the Hadoop configuration directory
cd /usr/local/hadoop-2.9.2/etc/hadoop/
4.1 Edit hadoop-env.sh
vi hadoop-env.sh
Change the JAVA_HOME path:
export JAVA_HOME=/usr/local/jdk1.8.0_171
4.2 Configure core-site.xml
vi core-site.xml
---- inside <configuration> ----
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.9.2/data/tmp</value>
</property>
<property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
</property>
4.3 Configure hdfs-site.xml
vi hdfs-site.xml
---- inside <configuration> ----
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
4.4 Format the NameNode
/usr/local/hadoop-2.9.2/bin/hdfs namenode -format
4.5 Configure yarn-env.sh
vi yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_171
4.6 Configure yarn-site.xml
vi yarn-site.xml
---- inside <configuration> ----
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
</property>
4.7 Configure mapred-env.sh
vi mapred-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_171
4.8 Rename the template
mv mapred-site.xml.template mapred-site.xml
① Configure mapred-site.xml
vi mapred-site.xml
---- inside <configuration> ----
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
4.9 Start the cluster
/usr/local/hadoop-2.9.2/sbin/start-all.sh
Type yes and the password when prompted.
4.10 Check that the daemons are running
jps
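On this single-node setup the jps output would typically list the following processes (PIDs will vary):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps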
Open the NameNode web UI:
http://192.168.200.131:50070
Open the ResourceManager web UI:
http://192.168.200.131:8088
V. Passwordless SSH setup
1. Generate a key pair
Press Enter four times; run this on every host.
ssh-keygen
2. Copy the local public key to the other virtual machines
The receiving host must be powered on; run this on every host.
Format (when distributing, type yes first, then the target host's password); a concrete example follows below:
ssh-copy-id <hostname>
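For the single-node setup in this post, copying the key to the hadoop host configured earlier looks like:
ssh-copy-id hadoop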
3. Check that passwordless login works
ssh <hostname>
4. Start the Hadoop cluster
/usr/local/hadoop-2.9.2/sbin/start-all.sh
VI. Code
1. Windows environment variable
Variable name HADOOP_HOME, value set to the install directory D:\JavaSoftware\hadoop-2.9.2
2. Path variable
%HADOOP_HOME%\bin
3. Create a Maven project (a command-line sketch follows)
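One way to create the project from the command line (a sketch; the groupId and artifactId below simply match the package and jar names used later in this post):
mvn archetype:generate -DgroupId=cn.bdqn -DartifactId=Hadoop_Demo -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false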
4. Import the Hadoop dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.9.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.9.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.9.2</version>
    </dependency>
</dependencies>
<!-- Maven packaging plugins -->
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
5. Add log4j.properties
log4j.rootLogger=info, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
6. Overall approach
Map phase:
In map(), convert the incoming value to a String
Split the line into words on spaces
Emit <word, 1>
Reduce phase:
For each key (word), iterate over the values and accumulate the counts
Emit the total for each key
Driver:
Get the configuration object and a Job instance
Set the local path of the program's jar
Set the Mapper/Reducer classes
Set the Mapper output key/value types
Set the final output key/value types
Set the input path the job reads from
Set the output path the job writes to
Submit the job
Then write the Mapper class.
7. Example 1: word count
7.1 Map phase
package cn.bdqn;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    Text k = new Text();
    IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1 Get one line of input
        String line = value.toString();
        // 2 Split it into words on spaces
        String[] words = line.split(" ");
        // 3 Emit <word, 1> for every word
        for (String word : words) {
            k.set(word);
            context.write(k, v);
        }
    }
}
7.2 Reduce phase
package cn.bdqn;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    int sum;
    IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // 1 Sum the counts for this word
        sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        // 2 Emit <word, total>
        v.set(sum);
        context.write(key, v);
    }
}
7.3 Driver phase
package cn.bdqn;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordcountDriver {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // 1 Get the configuration and create the job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // 2 Set the jar load path
        job.setJarByClass(WordcountDriver.class);
        // 3 Set the map, reduce, and combiner classes
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);
        job.setCombinerClass(WordcountReducer.class);
        // 4 Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5 Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(CombineTextInputFormat.class);
        // 6 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7 Submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
8. Example 2: duration totals with a custom Writable
8.1 Serialization (the bean)
package cn.bdqn.demo1;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// 1 Implement the Writable interface
public class SpeakBean implements Writable {

    private long selfDuration;
    private long thirdPartDuration;
    private long sumDuration;

    // 2 Deserialization creates the bean reflectively, so a no-arg constructor is required
    public SpeakBean() {
    }

    public SpeakBean(long selfDuration, long thirdPartDuration) {
        this.selfDuration = selfDuration;
        this.thirdPartDuration = thirdPartDuration;
        this.sumDuration = this.selfDuration + this.thirdPartDuration;
    }

    // 3 Serialization method
    public void write(DataOutput out) throws IOException {
        out.writeLong(selfDuration);
        out.writeLong(thirdPartDuration);
        out.writeLong(sumDuration);
    }

    // 4 Deserialization method
    // 5 Fields must be read in exactly the same order they were written
    public void readFields(DataInput in) throws IOException {
        this.selfDuration = in.readLong();
        this.thirdPartDuration = in.readLong();
        this.sumDuration = in.readLong();
    }

    // 6 toString so the result prints cleanly to the output file
    @Override
    public String toString() {
        return selfDuration +
                "\t" + thirdPartDuration +
                "\t" + sumDuration;
    }

    public long getSelfDuration() {
        return selfDuration;
    }

    public void setSelfDuration(long selfDuration) {
        this.selfDuration = selfDuration;
    }

    public long getThirdPartDuration() {
        return thirdPartDuration;
    }

    public void setThirdPartDuration(long thirdPartDuration) {
        this.thirdPartDuration = thirdPartDuration;
    }

    public long getSumDuration() {
        return sumDuration;
    }

    public void setSumDuration(long sumDuration) {
        this.sumDuration = sumDuration;
    }

    public void set(long selfDuration, long thirdPartDuration) {
        this.selfDuration = selfDuration;
        this.thirdPartDuration = thirdPartDuration;
        this.sumDuration = this.selfDuration + this.thirdPartDuration;
    }
}
8.2 Map phase
package cn.bdqn.demo1;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class SpeakDurationMapper extends Mapper<LongWritable, Text, Text, SpeakBean> {

    SpeakBean v = new SpeakBean();
    Text k = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1 Get one line of input
        String line = value.toString();
        // 2 Split the fields on tabs
        String[] fields = line.split("\t");
        // 3 Build the output objects
        // Take the device id
        String deviceId = fields[1];
        // Take the self-owned and third-party duration fields
        long selfDuration = Long.parseLong(fields[fields.length - 3]);
        long thirdPartDuration = Long.parseLong(fields[fields.length - 2]);
        k.set(deviceId);
        v.set(selfDuration, thirdPartDuration);
        // 4 Emit <deviceId, bean>
        context.write(k, v);
    }
}
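The Mapper above expects tab-separated input where field 1 is the device id and the third-from-last and second-from-last fields are the self-owned and third-party durations. A purely hypothetical line matching that layout (values invented for illustration, fields separated by tabs) could look like:
01	a00df11a	120.196.100.99	384	33	200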
8.3 Reduce phase
package cn.bdqn.demo1;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class SpeakDurationReducer extends Reducer<Text, SpeakBean, Text, SpeakBean> {

    @Override
    protected void reduce(Text key, Iterable<SpeakBean> values, Context context)
            throws IOException, InterruptedException {
        long self_Duration = 0;
        long thirdPart_Duration = 0;
        // 1 Iterate over all beans and accumulate the self-owned and third-party durations
        for (SpeakBean sb : values) {
            self_Duration += sb.getSelfDuration();
            thirdPart_Duration += sb.getThirdPartDuration();
        }
        // 2 Build the result object
        SpeakBean resultBean = new SpeakBean(self_Duration, thirdPart_Duration);
        // 3 Emit <deviceId, totals>
        context.write(key, resultBean);
    }
}
8.4 Driver phase
package cn.bdqn.demo1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class SpeakerDriver {

    public static void main(String[] args)
            throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException {
        // Set the input and output paths to match your own machine when running locally
        //args = new String[]{"d:/input/input/speak.data", "d:/output222"};
        // 1 Get the configuration and a Job instance
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // 2 Set the local path of this program's jar
        job.setJarByClass(SpeakerDriver.class);
        // 3 Set the Mapper/Reducer classes used by this job
        job.setMapperClass(SpeakDurationMapper.class);
        job.setReducerClass(SpeakDurationReducer.class);
        // 4 Set the mapper output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(SpeakBean.class);
        // 5 Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(SpeakBean.class);
        // 6 Set the directories the job reads from and writes to
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7 Submit the job (its configuration plus the jar containing its classes) to YARN
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
VII. Package and upload
1. Package the project (sketch below)
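Packaging is normally done with Maven from the project root (where pom.xml lives):
mvn clean package
With the assembly plugin configured above, target/ should then contain Hadoop_Demo-1.0-SNAPSHOT.jar and a jar-with-dependencies variant.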
2. Result
3. Create the input files on Linux (a hypothetical example follows)
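The commands further below read /b.txt for the word-count job and /a.txt for the duration job, so as a purely illustrative example the files could be prepared like this (contents invented):
echo "hello hadoop hello world" > b.txt
# a.txt should follow the tab-separated layout described for Example 2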
4. HDFS commands
4.1 Help
hadoop fs -help
4.2 Create the target directory
hadoop fs -mkdir -p /user/root
4.3 Upload a file
hadoop fs -put a.txt /
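To confirm the upload, list the target directory:
hadoop fs -ls /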
4.4 Remove an empty directory
hadoop fs -rmdir /user/root
4.5 Delete a file
hadoop fs -rm -f /user/root/a.txt
4.6 Run the code
hadoop jar <jar file> <main class> <input path> <output path>
Examples
hadoop jar Hadoop_Demo-1.0-SNAPSHOT.jar cn.bdqn.WordcountDriver /b.txt /out1
hadoop jar Hadoop_Demo-1.0-SNAPSHOT.jar cn.bdqn.demo1.SpeakerDriver /a.txt /out
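Once a job completes, the reducer output lands in part-r-00000 files under the output directory; for the word-count run above (assuming the /out1 path), the counts can be viewed with:
hadoop fs -cat /out1/part-r-00000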