Introduction

1) Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. 2) It mainly solves two problems: storing massive data and performing analytical computation over massive data.

Hadoop HDFS provides distributed storage for massive data
Hadoop YARN provides distributed cluster resource management
Hadoop MapReduce provides distributed computation over massive data
Prerequisites

- Make sure you have completed the cluster environment preparation chapter
- That is: JDK installed, passwordless SSH, firewall disabled, hostname mappings configured, and so on
Hadoop Cluster Roles

The Hadoop ecosystem involves the following process roles:

- Hadoop HDFS management role: the Namenode process (only 1 needed; one manager is enough)
- Hadoop HDFS worker role: the Datanode process (multiple needed; workers, the more the better, one per machine)
- Hadoop YARN management role: the ResourceManager process (only 1 needed; one manager is enough)
- Hadoop YARN worker role: the NodeManager process (multiple needed; workers, the more the better, one per machine)
- Hadoop history server role: the HistoryServer process (only 1 needed; an auxiliary service, one is enough)
- Hadoop proxy server role: the WebProxyServer process (only 1 needed; an auxiliary service, one is enough)
- Zookeeper process: the QuorumPeerMain process (multiple needed; Zookeeper workers, one per machine, an odd number is recommended)
Role and Node Assignment

Roles are assigned as follows:

- node1: Namenode, Datanode, ResourceManager, NodeManager, HistoryServer, WebProxyServer, QuorumPeerMain
- node2: Datanode, NodeManager, QuorumPeerMain
- node3: Datanode, NodeManager, QuorumPeerMain
Installation

Adjusting virtual machine memory

As the role assignment above shows, node1 bears most of the load, while node2 and node3 also run quite a few processes.
To keep the cluster stable, the virtual machines' memory needs to be adjusted.

In VMware, give:

- node1 4 GB of memory or more
- node2 and node3 2 GB of memory or more

Big data software is inherently designed to run as a cluster (a fleet of servers).
Simulating a cluster with multiple virtual machines on one computer does put considerable pressure on memory.
Zookeeper Cluster Deployment

(Omitted)

Hadoop Cluster Deployment
- Download the Hadoop tarball, extract it, and create a symlink

```shell
# 1. Download
wget http://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

# 2. Extract
# Make sure the directory /export/server exists
tar -zxvf hadoop-3.3.0.tar.gz -C /export/server/

# 3. Create a symlink
ln -s /export/server/hadoop-3.3.0 /export/server/hadoop
```
- Edit the configuration file hadoop-env.sh

There are many places to change in Hadoop's configuration files, so proceed carefully.

cd into /export/server/hadoop/etc/hadoop; all the configuration files live in this folder.

hadoop-env.sh configures environment variables used by Hadoop.
These are temporary variables, effective while Hadoop is running.
To make them permanent, write them into /etc/profile as well.
```shell
# Add at the top of the file:
# Java installation path
export JAVA_HOME=/export/server/jdk
# Hadoop installation path
export HADOOP_HOME=/export/server/hadoop
# Hadoop HDFS configuration file path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN configuration file path
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN log folder
export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
# Hadoop HDFS log folder
export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs

# Users that start each Hadoop daemon
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export YARN_PROXYSERVER_USER=root
```
- Edit the configuration file core-site.xml

Clear the file and fill in the following:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:8020</value>
    <description>Default file system URI: the HDFS NameNode on node1, port 8020</description>
  </property>

  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>Buffer size in bytes used during read and write operations</description>
  </property>
</configuration>
```
- Edit the configuration file hdfs-site.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>700</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/nn</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
  </property>

  <property>
    <name>dfs.namenode.hosts</name>
    <value>node1,node2,node3</value>
    <description>List of permitted DataNodes.</description>
  </property>

  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
    <description>HDFS block size: 268435456 bytes = 256 MB</description>
  </property>

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
    <description>Number of NameNode RPC server threads</description>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/dn</value>
  </property>
</configuration>
```
- Edit the configuration file mapred-env.sh
```shell
# Add the following environment variables at the top of the file
export JAVA_HOME=/export/server/jdk
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
```
- Edit the configuration file mapred-site.xml
```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Run MapReduce jobs on YARN</description>
  </property>

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
    <description>JobHistory server IPC address</description>
  </property>

  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
    <description>JobHistory server web UI address</description>
  </property>

  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/data/mr-history/tmp</value>
    <description>Directory where history files of in-progress jobs are written</description>
  </property>

  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/data/mr-history/done</value>
    <description>Directory where history files of completed jobs are stored</description>
  </property>

  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>

  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>

  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>
```
- Edit the configuration file yarn-env.sh
```shell
# Add the following environment variables at the top of the file
export JAVA_HOME=/export/server/jdk
export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
```
- Edit the configuration file yarn-site.xml
```xml
<?xml version="1.0"?>
<configuration>

  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.log.server.url</name>
    <value>http://node1:19888/jobhistory/logs</value>
    <description>URL of the log server (the JobHistory web UI)</description>
  </property>

  <property>
    <name>yarn.web-proxy.address</name>
    <value>node1:8089</value>
    <description>Proxy server hostname and port</description>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Configuration to enable or disable log aggregation</description>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
    <description>Directory where aggregated application logs are stored</description>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
    <description>Hostname of the ResourceManager</description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    <description>Scheduler implementation to use; here, the Fair Scheduler</description>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/nm-local</value>
    <description>Comma-separated list of paths on the local filesystem where intermediate data is written.</description>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/nm-log</value>
    <description>Comma-separated list of paths on the local filesystem where logs are written.</description>
  </property>

  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
    <description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled.</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Shuffle service that needs to be set for MapReduce applications.</description>
  </property>
</configuration>
```
- Edit the workers file
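The original does not show the contents of the workers file. Based on the role assignment earlier (Datanode and NodeManager processes on node1, node2, and node3), the file would plausibly list all three hosts, one per line; this is a sketch under that assumption:

```shell
# Replace the contents of the workers file with the hostnames of the
# worker nodes (hosts assumed from the role assignment in this guide)
cat > /export/server/hadoop/etc/hadoop/workers <<'EOF'
node1
node2
node3
EOF
```

The start/stop scripts (start-dfs.sh, start-yarn.sh) read this file to decide which hosts to launch worker daemons on.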
- Distribute Hadoop to the other machines

```shell
# Run on node1
cd /export/server

scp -r hadoop-3.3.0 node2:`pwd`/
scp -r hadoop-3.3.0 node3:`pwd`/
```
- On node2 and node3, run:

```shell
# Create the symlink
ln -s /export/server/hadoop-3.3.0 /export/server/hadoop
```
- Create the required directories

On node1:

```shell
mkdir -p /data/nn
mkdir -p /data/dn
mkdir -p /data/nm-log
mkdir -p /data/nm-local
```

On node2:

```shell
mkdir -p /data/dn
mkdir -p /data/nm-log
mkdir -p /data/nm-local
```

On node3:

```shell
mkdir -p /data/dn
mkdir -p /data/nm-log
mkdir -p /data/nm-local
```
- Configure environment variables

On node1, node2, and node3, edit /etc/profile:

```shell
export HADOOP_HOME=/export/server/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Run source /etc/profile to apply the changes.
- Format the NameNode (run on node1)

The hadoop command comes from $HADOOP_HOME/bin.
Because PATH has been configured, you can run the hadoop command from any location.
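The formatting command itself is not shown above; a sketch of the standard invocation, assuming the paths configured earlier in this guide:

```shell
# One-time initialization of the NameNode metadata directory (/data/nn).
# Run on node1 only, before the first start of the cluster.
# WARNING: reformatting an existing cluster destroys HDFS metadata.
hadoop namenode -format
# On Hadoop 3.x this form is deprecated in favor of:
#   hdfs namenode -format
```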
- Start the HDFS cluster (run on node1)

```shell
start-dfs.sh

# To stop it, run
stop-dfs.sh
```

The start-dfs.sh command comes from $HADOOP_HOME/sbin.
Because PATH has been configured, you can run start-dfs.sh from any location.
- Start the YARN cluster (run on node1)

```shell
start-yarn.sh

# To stop it, run
stop-yarn.sh
```

- Start the history server
```shell
mapred --daemon start historyserver

# To stop it, replace start with stop
```

- Start the web proxy server
```shell
yarn-daemon.sh start proxyserver
# Note: yarn-daemon.sh is deprecated in Hadoop 3.x; the equivalent is
#   yarn --daemon start proxyserver

# To stop it, replace start with stop
```
Verifying the Hadoop Cluster

- On node1, node2, and node3, run jps to verify that all processes started successfully
- Verify HDFS: open http://node1:9870 in a browser
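As a concrete check of the jps verification, the process list on each node should match the role assignment from earlier. A sketch of what to expect (the role names below are how each daemon typically appears in jps output; PIDs will differ):

```shell
# On node1 (all seven roles):
jps
# Expect: NameNode, DataNode, ResourceManager, NodeManager,
#         JobHistoryServer, WebAppProxyServer, QuorumPeerMain (plus Jps itself)

# On node2 and node3:
jps
# Expect: DataNode, NodeManager, QuorumPeerMain (plus Jps itself)
```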
Create a file test.txt with arbitrary contents and run:

```shell
hadoop fs -put test.txt /test.txt

hadoop fs -cat /test.txt
```
- Verify YARN: open http://node1:8088 in a browser

Run:

```shell
# Create a file words.txt with the following contents
itheima itcast hadoop
itheima hadoop hadoop
itheima itcast

# Upload the file to HDFS
hadoop fs -put words.txt /words.txt

# Run the following command to verify that YARN works
hadoop jar /export/server/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount -Dmapred.job.queue.name=root.root /words.txt /output
```