Posted on 2024-7-14 18:36:19

Configuring fully distributed Hadoop with Docker (5 containers: two master nodes, three worker nodes)

Building a Hadoop cluster with Docker containers

Cluster plan

master1: master node, runs the NameNode and ResourceManager
master2: master node, runs the SecondaryNameNode
worker1, worker2, worker3: worker nodes, each runs a DataNode and a NodeManager
1. Upload the JDK to the virtual machine and prepare the Dockerfile and Hadoop configuration files in advance (create an etc/hadoop directory alongside the Dockerfile and put the Hadoop configuration files to be modified in it; I worked in /root).
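For reference, the build context under /root then looks like this (the JDK tarball name follows the Dockerfile's JAVA_TARBALL default below; adjust it if your download differs):

/root
├── Dockerfile
├── jdk-8u212-linux-x64.tar.gz
└── etc
    └── hadoop
        ├── core-site.xml
        ├── hdfs-site.xml
        ├── mapred-site.xml
        └── yarn-site.xml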

Contents of the Dockerfile

# syntax=docker/dockerfile:1

# Reference: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

FROM ubuntu:20.04

ARG TARBALL=hadoop-3.4.0.tar.gz
# JDK 8 tarball downloaded in advance from: https://www.oracle.com/java/technologies/downloads/
ARG JAVA_TARBALL=jdk-8u212-linux-x64.tar.gz

ENV HADOOP_HOME /app/hadoop
ENV JAVA_HOME /usr/java

WORKDIR $JAVA_HOME
WORKDIR $HADOOP_HOME

RUN sed -i s@/archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources.list && \
    apt-get clean && \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
    wget \
    ssh

# Copy the JDK 8 tarball
COPY ${JAVA_TARBALL} ${JAVA_HOME}/${JAVA_TARBALL}

RUN tar -zxvf /usr/java/${JAVA_TARBALL} --strip-components 1 -C /usr/java && \
    rm /usr/java/${JAVA_TARBALL} && \
    # Set the Java 8 environment variables
    echo export JAVA_HOME=${JAVA_HOME} >> ~/.bashrc && \
    echo export PATH=\$PATH:\$JAVA_HOME/bin >> ~/.bashrc && \
    echo export JAVA_HOME=${JAVA_HOME} >> /etc/profile && \
    echo export PATH=\$PATH:\$JAVA_HOME/bin >> /etc/profile && \
    # Download the Hadoop tarball (the stable/ directory tracks the latest stable release;
    # if it no longer contains 3.4.0, point the URL at the versioned hadoop-3.4.0/ directory instead)
    wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/${TARBALL} && \
    # Extract the Hadoop tarball
    tar -zxvf ${TARBALL} --strip-components 1 -C $HADOOP_HOME && \
    rm ${TARBALL} && \
    # Register the worker nodes
    echo "worker1\nworker2\nworker3" > $HADOOP_HOME/etc/hadoop/workers && \
    echo export HADOOP_HOME=${HADOOP_HOME} >> ~/.bashrc && \
    echo export PATH=\$PATH:\$HADOOP_HOME/bin >> ~/.bashrc && \
    echo export PATH=\$PATH:\$HADOOP_HOME/sbin >> ~/.bashrc && \
    echo export HADOOP_HOME=${HADOOP_HOME} >> /etc/profile && \
    echo export PATH=\$PATH:\$HADOOP_HOME/bin >> /etc/profile && \
    echo export PATH=\$PATH:\$HADOOP_HOME/sbin >> /etc/profile && \
    mkdir /app/hdfs && \
    # Symlink java so plain `java` works before the env files are sourced
    ln -s $JAVA_HOME/bin/java /bin/java

# Copy the Hadoop configuration files
COPY ./etc/hadoop/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./etc/hadoop/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml
COPY ./etc/hadoop/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./etc/hadoop/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# Set the Hadoop environment variables in hadoop-env.sh
RUN echo export JAVA_HOME=$JAVA_HOME >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HADOOP_MAPRED_HOME=$HADOOP_HOME >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HDFS_NAMENODE_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HDFS_DATANODE_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HDFS_SECONDARYNAMENODE_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export YARN_RESOURCEMANAGER_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export YARN_NODEMANAGER_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# Passwordless SSH setup (sshd is started from ~/.bashrc on each interactive login)
RUN echo "/etc/init.d/ssh start" >> ~/.bashrc && \
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
    chmod 0600 ~/.ssh/authorized_keys

# NameNode web UI port
EXPOSE 9870
# NameNode filesystem (RPC) port, see fs.defaultFS
EXPOSE 9000
# dfs.namenode.secondary.http-address
EXPOSE 9868
# dfs.datanode.http.address
EXPOSE 9864
# dfs.datanode.address
EXPOSE 9866
core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
<!-- The HDFS endpoint the NameNode serves -->
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop330:9000</value>
    </property>
    <property>
<!-- Whether to enable service-level authorization -->
      <name>hadoop.security.authorization</name>
      <value>false</value>
    </property>
</configuration>
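Later, once a cluster container is running (step 4), you can sanity-check that this file was picked up from inside the container, where $HADOOP_HOME/bin is on the PATH:

# Should print hdfs://hadoop330:9000
hdfs getconf -confKey fs.defaultFS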
hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
      <!-- Block replication factor; the default is 3 -->
      <name>dfs.replication</name>
      <value>3</value>
<!--      <value>1</value>-->
    </property>
    <property>
      <!-- Whether HDFS permission checking is enabled; when disabled, ordinary accounts can write to HDFS files and directories -->
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>
    <property>
      <!-- Directory where the NameNode stores its metadata -->
      <name>dfs.namenode.name.dir</name>
      <value>/app/hdfs/namenode</value>
    </property>
    <property>
      <!-- Directory where the DataNode stores its blocks -->
      <name>dfs.datanode.data.dir</name>
      <value>/app/hdfs/datanode</value>
    </property>
    <property>
      <!-- SecondaryNameNode HTTP address; 9868 is the Hadoop 3 default and matches the EXPOSE in the Dockerfile -->
      <name>dfs.namenode.secondary.http-address</name>
      <value>master2:9868</value>
    </property>
</configuration>
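Likewise, the SecondaryNameNode address can be confirmed from inside a running container:

# Should print master2:9868
hdfs getconf -confKey dfs.namenode.secondary.http-address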
mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <name>mapreduce.application.classpath</name>
      <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
    <property>
      <name>yarn.app.mapreduce.am.env</name>
      <value>HADOOP_MAPRED_HOME=/app/hadoop</value>
    </property>
    <property>
      <name>mapreduce.map.env</name>
      <value>HADOOP_MAPRED_HOME=/app/hadoop</value>
    </property>
    <property>
      <name>mapreduce.reduce.env</name>
      <value>HADOOP_MAPRED_HOME=/app/hadoop</value>
    </property>
</configuration>
yarn-site.xml

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>

    <!-- Site specific YARN configuration properties -->

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.env-whitelist</name>
      <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>hadoop330:8032</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>hadoop330:8030</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>hadoop330:8031</value>
    </property>
</configuration>
2. With the JDK 8 tarball and Hadoop configuration files in place, build the Hadoop base image from the Dockerfile

docker build -t hadoop .
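The build takes a while, since it downloads the Hadoop tarball. Afterwards, confirm the image exists:

docker images hadoop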
3. Write a docker-compose.yml and run docker-compose to start the 5 containers

docker-compose.yml

version: '3'
services:
  master1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
    hostname: hadoop330
    ports:
      - "9000:9000"
      - "9870:9870"
      - "8088:8088"
  master2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker3:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
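Before starting anything, you can have docker-compose validate the file; it prints the parsed configuration on success and an error otherwise:

docker-compose config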
!!! If docker-compose is not installed yet, run the following commands in order

yum -y install epel-release
yum install python-pip
wget https://github.com/docker/compose/releases/download/1.14.0-rc2/docker-compose-Linux-x86_64
mv docker-compose-Linux-x86_64 /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
docker-compose --version
Run docker-compose to start the 5 containers

docker-compose up -d
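Check that all five containers are up (compose names them <project>_<service>_1, where the project defaults to the current directory name, root here):

docker-compose ps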
4. Start the Hadoop cluster and verify it

Check your containers' names; my master container is named root_master1_1.

Enter the master container

docker exec -it root_master1_1 bash
First, format HDFS

./bin/hdfs namenode -format
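Format only once; on success the NameNode creates its metadata directory under dfs.namenode.name.dir, which you can verify from inside the container:

ls /app/hdfs/namenode/current/VERSION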
Install sudo

apt-get install sudo
Start the Hadoop cluster

./sbin/start-all.sh
Check the Java processes in each container with jps

Master1: jps should show the NameNode and ResourceManager.

Master2: jps should show the SecondaryNameNode.

Exit, then enter worker1, worker2, and worker3 in turn: jps should show a DataNode and a NodeManager on each.
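To check all five from the host instead of attaching to each container, a loop like the following works (a sketch; the container names follow the root_<service>_1 pattern above, and jps is invoked by full path because the image only symlinks java itself onto the default PATH):

for c in root_master1_1 root_master2_1 root_worker1_1 root_worker2_1 root_worker3_1; do
    echo "== $c =="
    docker exec "$c" /usr/java/bin/jps
done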
All the Hadoop processes are up; next, verify the web UIs.

Open localhost:8088 in a browser; you should see 3 active worker nodes.

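The same information is available from the YARN CLI inside the master container:

# Should report 3 nodes in RUNNING state
yarn node -list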
Open localhost:9870, the NameNode web UI.

This means all components started successfully.
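As a final smoke test, you can run the bundled MapReduce example job from the master container (the examples jar ships with Hadoop; the version in the file name must match the release that was actually downloaded):

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi 2 10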

