Configuring fully distributed Hadoop with Docker (5 containers: two master nodes, three worker nodes)


Building a Hadoop cluster with Docker containers

Cluster plan

master1: master node, runs the NameNode and ResourceManager (hostname hadoop330 in docker-compose.yml)
master2: master node, runs the SecondaryNameNode
worker1, worker2, worker3: worker nodes, each runs a DataNode and a NodeManager
Step 1: Upload the JDK to the virtual machine and prepare the Dockerfile and the Hadoop configuration files in advance (create an etc/hadoop directory next to the Dockerfile and put the modified Hadoop configuration files there). I did this under /root.
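For reference, the build context described above looks roughly like this (a sketch; the tarball name follows the ARG default in the Dockerfile below):

/root/Dockerfile
/root/jdk-8u212-linux-x64.tar.gz
/root/etc/hadoop/core-site.xml
/root/etc/hadoop/hdfs-site.xml
/root/etc/hadoop/mapred-site.xml
/root/etc/hadoop/yarn-site.xml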

Contents of the Dockerfile

# syntax=docker/dockerfile:1
# Reference: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
FROM ubuntu:20.04
ARG TARBALL=hadoop-3.4.0.tar.gz
# Java 8 tarball downloaded in advance from: https://www.oracle.com/java/technologies/downloads/
ARG JAVA_TARBALL=jdk-8u212-linux-x64.tar.gz
ENV HADOOP_HOME /app/hadoop
ENV JAVA_HOME /usr/java
WORKDIR $JAVA_HOME
WORKDIR $HADOOP_HOME
RUN sed -i s@/archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources.list && \
    apt-get clean && \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
    wget \
    ssh
# Copy the JDK 8 tarball into the image
COPY ${JAVA_TARBALL} ${JAVA_HOME}/${JAVA_TARBALL}
RUN tar -zxvf /usr/java/${JAVA_TARBALL} --strip-components 1 -C /usr/java && \
    rm /usr/java/${JAVA_TARBALL} && \
    # Set the Java 8 environment variables
    echo export JAVA_HOME=${JAVA_HOME} >> ~/.bashrc && \
    echo export PATH=\$PATH:\$JAVA_HOME/bin >> ~/.bashrc && \
    echo export JAVA_HOME=${JAVA_HOME} >> /etc/profile && \
    echo export PATH=\$PATH:\$JAVA_HOME/bin >> /etc/profile && \
    # Download the Hadoop tarball
    wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/${TARBALL} && \
    # Unpack the Hadoop tarball
    tar -zxvf ${TARBALL} --strip-components 1 -C $HADOOP_HOME && \
    rm ${TARBALL} && \
    # Register the worker nodes
    echo "worker1\nworker2\nworker3" > $HADOOP_HOME/etc/hadoop/workers && \
    echo export HADOOP_HOME=${HADOOP_HOME} >> ~/.bashrc && \
    echo export PATH=\$PATH:\$HADOOP_HOME/bin >> ~/.bashrc && \
    echo export PATH=\$PATH:\$HADOOP_HOME/sbin >> ~/.bashrc && \
    echo export HADOOP_HOME=${HADOOP_HOME} >> /etc/profile && \
    echo export PATH=\$PATH:\$HADOOP_HOME/bin >> /etc/profile && \
    echo export PATH=\$PATH:\$HADOOP_HOME/sbin >> /etc/profile && \
    mkdir /app/hdfs && \
    # Symlink java onto the default PATH
    ln -s $JAVA_HOME/bin/java /bin/java
# Copy the Hadoop configuration files
COPY ./etc/hadoop/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./etc/hadoop/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml
COPY ./etc/hadoop/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./etc/hadoop/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml
# Set the Hadoop environment variables
RUN echo export JAVA_HOME=$JAVA_HOME >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HADOOP_MAPRED_HOME=$HADOOP_HOME >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HDFS_NAMENODE_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HDFS_DATANODE_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HDFS_SECONDARYNAMENODE_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export YARN_RESOURCEMANAGER_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export YARN_NODEMANAGER_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# Passwordless SSH setup
RUN echo "/etc/init.d/ssh start" >> ~/.bashrc && \
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
    chmod 0600 ~/.ssh/authorized_keys
# NameNode web UI port
EXPOSE 9870
# NameNode RPC port (fs.defaultFS)
EXPOSE 9000
# dfs.namenode.secondary.http-address
EXPOSE 9868
# dfs.datanode.http.address
EXPOSE 9864
# dfs.datanode.address
EXPOSE 9866


core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <!-- The HDFS endpoint exposed by the NameNode -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop330:9000</value>
    </property>
    <property>
        <!-- Whether to enable authorization checks -->
        <name>hadoop.security.authorization</name>
        <value>false</value>
    </property>
</configuration>
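Once the cluster is running (step 4), you can check from inside a container that this setting was picked up; a minimal check, assuming $HADOOP_HOME/bin is on the PATH as configured in the Dockerfile:

hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://hadoop330:9000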
hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <!-- Number of block replicas; the default is 3 -->
        <name>dfs.replication</name>
        <value>3</value>
<!--        <value>1</value>-->
    </property>
    <property>
        <!-- Whether HDFS permission checking is enabled; disabling it lets ordinary users write HDFS files and directories -->
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <!-- Directory where the NameNode stores its metadata -->
        <name>dfs.namenode.name.dir</name>
        <value>/app/hdfs/namenode</value>
    </property>
    <property>
        <!-- Directory where the DataNode stores its blocks -->
        <name>dfs.datanode.data.dir</name>
        <value>/app/hdfs/datanode</value>
    </property>
    <property>
        <!-- SecondaryNameNode HTTP address -->
        <name>dfs.namenode.secondary.http-address</name>
        <value>master2:50090</value>
    </property>
</configuration>
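The replication factor and SecondaryNameNode address can be read back the same way once the containers are up (again via hdfs getconf):

hdfs getconf -confKey dfs.replication                       # expected: 3
hdfs getconf -confKey dfs.namenode.secondary.http-address   # expected: master2:50090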
mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
        </value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/app/hadoop</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/app/hadoop</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/app/hadoop</value>
    </property>
</configuration>
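Once the cluster is started (step 4), this MapReduce configuration can be exercised with one of the bundled example jobs; a minimal smoke test, assuming the examples jar version matches the hadoop-3.4.0 tarball:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi 2 10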
yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>
            JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,
            CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME
        </value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop330:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop330:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop330:8031</value>
    </property>
</configuration>
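With all the ResourceManager addresses pointing at hadoop330, you can later confirm that the three NodeManagers registered; a quick check from any container after YARN is up:

yarn node -list
# expected: worker1, worker2 and worker3 listed as RUNNING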
Step 2: Once the Java 8 tarball and the Hadoop configuration files are ready, build the Hadoop base image from the Dockerfile

docker build -t hadoop .
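If the build succeeds, the image should be visible locally; a quick check:

docker image ls hadoop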

Step 3: Write the docker-compose.yml file and start the five containers with docker-compose

docker-compose.yml

version: '3'
services:
  master1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
    hostname: hadoop330
    ports:
      - "9000:9000"
      - "9870:9870"
      - "8088:8088"
  master2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker3:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash

!!! If docker-compose is not installed yet, run the following commands in order:

yum -y install epel-release
yum install python-pip
wget https://github.com/docker/compose/releases/download/1.14.0-rc2/docker-compose-Linux-x86_64
mv docker-compose-Linux-x86_64 /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
docker-compose --version

Start the five containers with docker-compose:

docker-compose up -d
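To confirm that all five containers are up and can resolve each other by service name, something like the following works (the container name root_master1_1 matches my project directory /root and may differ on your machine):

docker-compose ps
docker exec root_master1_1 getent hosts master2 worker1 worker2 worker3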

Step 4: Start the Hadoop cluster and verify it

Check your own container names; my master node container is named root_master1_1.


Enter the master node container:

docker exec -it root_master1_1 bash

First, format HDFS (the container's working directory is already $HADOOP_HOME):

./bin/hdfs namenode -format

Install sudo:

apt-get install sudo

Start the Hadoop cluster:

./sbin/start-all.sh

Check the processes in each container with jps:
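Based on the cluster plan at the top, the jps output on each node should look roughly like this (process IDs omitted):

# master1 (hadoop330)
NameNode
ResourceManager
Jps

# master2
SecondaryNameNode
Jps

# worker1 / worker2 / worker3
DataNode
NodeManager
Jps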

master1:


master2:


Run exit to leave, then enter worker1:


worker2:


worker3:


All the Hadoop processes are running; next, verify the web UIs.

Open localhost:8088 in a browser (the YARN ResourceManager UI); you should see three active worker nodes.


Open localhost:9870, the NameNode web UI.


This means all the components started successfully.
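As a last smoke test, you can write and read a file through HDFS from the master container; a minimal sketch:

hdfs dfs -mkdir -p /user/root
hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/root/
hdfs dfs -ls /user/root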

