Building a Hadoop Cluster with Docker Containers

Cluster plan:
master: master node, runs the NameNode and ResourceManager
master2: master node, runs the SecondaryNameNode
worker1, worker2, worker3: worker nodes, run the DataNode and NodeManager

1. Copy the JDK to the virtual machine and prepare the Dockerfile and the Hadoop configuration files in advance (create the directory etc/hadoop under the Dockerfile's directory and put the Hadoop configuration files that need changes there). I worked in /root.

Contents of the Dockerfile:
```dockerfile
# syntax=docker/dockerfile:1
# Reference: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
FROM ubuntu:20.04

ARG TARBALL=hadoop-3.4.0.tar.gz
# Java 8 tarball downloaded in advance from: https://www.oracle.com/java/technologies/downloads/
ARG JAVA_TARBALL=jdk-8u212-linux-x64.tar.gz

ENV HADOOP_HOME /app/hadoop
ENV JAVA_HOME /usr/java
WORKDIR $JAVA_HOME
WORKDIR $HADOOP_HOME

RUN sed -i s@/archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources.list && \
    apt-get clean && \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
        wget \
        ssh

# Copy in the JDK 8 tarball
COPY ${JAVA_TARBALL} ${JAVA_HOME}/${JAVA_TARBALL}
RUN tar -zxvf /usr/java/${JAVA_TARBALL} --strip-components 1 -C /usr/java && \
    rm /usr/java/${JAVA_TARBALL} && \
    # Set the Java 8 environment variables
    echo export JAVA_HOME=${JAVA_HOME} >> ~/.bashrc && \
    echo export PATH=\$PATH:\$JAVA_HOME/bin >> ~/.bashrc && \
    echo export JAVA_HOME=${JAVA_HOME} >> /etc/profile && \
    echo export PATH=\$PATH:\$JAVA_HOME/bin >> /etc/profile && \
    # Download the Hadoop tarball
    wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/${TARBALL} && \
    # Unpack it
    tar -zxvf ${TARBALL} --strip-components 1 -C $HADOOP_HOME && \
    rm ${TARBALL} && \
    # List the worker nodes, one per line (printf, because plain echo
    # does not expand \n in every shell)
    printf 'worker1\nworker2\nworker3\n' > $HADOOP_HOME/etc/hadoop/workers && \
    echo export HADOOP_HOME=${HADOOP_HOME} >> ~/.bashrc && \
    echo export PATH=\$PATH:\$HADOOP_HOME/bin >> ~/.bashrc && \
    echo export PATH=\$PATH:\$HADOOP_HOME/sbin >> ~/.bashrc && \
    echo export HADOOP_HOME=${HADOOP_HOME} >> /etc/profile && \
    echo export PATH=\$PATH:\$HADOOP_HOME/bin >> /etc/profile && \
    echo export PATH=\$PATH:\$HADOOP_HOME/sbin >> /etc/profile && \
    mkdir /app/hdfs && \
    # Symlink java into /bin
    ln -s $JAVA_HOME/bin/java /bin/java

# Copy in the Hadoop configuration files
COPY ./etc/hadoop/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
COPY ./etc/hadoop/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml
COPY ./etc/hadoop/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
COPY ./etc/hadoop/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# Set the Hadoop environment variables
RUN echo export JAVA_HOME=$JAVA_HOME >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HADOOP_MAPRED_HOME=$HADOOP_HOME >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HDFS_NAMENODE_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HDFS_DATANODE_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export HDFS_SECONDARYNAMENODE_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export YARN_RESOURCEMANAGER_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo export YARN_NODEMANAGER_USER=root >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh

# Passwordless SSH setup
RUN echo "/etc/init.d/ssh start" >> ~/.bashrc && \
    mkdir -p ~/.ssh && \
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
    chmod 0600 ~/.ssh/authorized_keys

# NameNode web UI
EXPOSE 9870
# NameNode RPC port (fs.defaultFS)
EXPOSE 9000
# dfs.namenode.secondary.http-address
EXPOSE 9868
# dfs.datanode.http.address
EXPOSE 9864
# dfs.datanode.address
EXPOSE 9866
```
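One detail in the Dockerfile worth calling out: the workers file must contain one hostname per line, and `printf` writes it portably, whereas plain `echo` only expands `\n` in some shells (such as dash, Ubuntu's `/bin/sh`). A minimal sketch:

```shell
# Write the HDFS workers file, one hostname per line.
# printf expands \n in every POSIX shell; plain echo does not.
printf 'worker1\nworker2\nworker3\n' > workers
cat workers
```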
core-site.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
      http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <!-- The NameNode address through which HDFS is served -->
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop330:9000</value>
  </property>
  <property>
    <!-- Whether service-level authorization is enabled -->
    <name>hadoop.security.authorization</name>
    <value>false</value>
  </property>
</configuration>
```
hdfs-site.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
      http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <!-- Number of block replicas; the default is 3 -->
    <name>dfs.replication</name>
    <value>3</value>
    <!-- <value>1</value> -->
  </property>
  <property>
    <!-- Whether HDFS permission checking is enabled; disabling it lets
         ordinary accounts write to HDFS files and directories
         (dfs.permissions.enabled is the Hadoop 3 name of this property) -->
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <!-- Directory where the NameNode stores its metadata -->
    <name>dfs.namenode.name.dir</name>
    <value>/app/hdfs/namenode</value>
  </property>
  <property>
    <!-- Directory where the DataNode stores its blocks -->
    <name>dfs.datanode.data.dir</name>
    <value>/app/hdfs/datanode</value>
  </property>
  <property>
    <!-- SecondaryNameNode address; 9868 is the Hadoop 3 default and matches
         the EXPOSE in the Dockerfile (50090 was the Hadoop 2 port) -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>master2:9868</value>
  </property>
</configuration>
```
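Once the cluster is up, you can sanity-check that these values were actually picked up with `hdfs getconf` inside any container. The sketch below guards on `hdfs` being on the PATH so it degrades gracefully elsewhere:

```shell
# Print an effective HDFS setting; skips gracefully when hdfs is not on PATH.
if command -v hdfs >/dev/null 2>&1; then
  replication=$(hdfs getconf -confKey dfs.replication)
else
  replication="(hdfs not on PATH; run this inside a cluster container)"
fi
echo "dfs.replication = $replication"
```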
mapred-site.xml
```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
      http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/app/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/app/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/app/hadoop</value>
  </property>
</configuration>
```
yarn-site.xml
```xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
      http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop330:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop330:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop330:8031</value>
  </property>
</configuration>
```
2. With Java 8 and the Hadoop configuration files in place, build the Hadoop base image from the Dockerfile.
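A sketch of the build step, assuming the Dockerfile, the JDK tarball, and etc/hadoop/ all sit in /root as above; the image is tagged `hadoop`, which is the name the compose file references (guarded so it is a no-op where Docker is absent):

```shell
# Build the base image; docker-compose.yml references it as "hadoop".
build_dir=/root   # directory holding the Dockerfile (adjust as needed)
if command -v docker >/dev/null 2>&1; then
  docker build -t hadoop "$build_dir"
else
  echo "docker not on PATH; skipping build"
fi
```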
3. Write a docker-compose.yml and use docker-compose to start the five containers.

docker-compose.yml
```yaml
version: '3'
services:
  master1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
    hostname: hadoop330
    ports:
      - "9000:9000"
      - "9870:9870"
      - "8088:8088"
  master2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker3:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
```
If docker-compose is not installed yet, run the following commands:

```shell
yum -y install epel-release
yum install python-pip
wget https://github.com/docker/compose/releases/download/1.14.0-rc2/docker-compose-Linux-x86_64
mv docker-compose-Linux-x86_64 /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
docker-compose --version
```
Run docker-compose to start the five containers.
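A sketch of bringing the cluster up in the background and listing the containers, run from the directory containing docker-compose.yml (guarded so it skips where docker-compose is absent):

```shell
# Start the five containers in detached mode and list their status.
services="master1 master2 worker1 worker2 worker3"
if command -v docker-compose >/dev/null 2>&1; then
  docker-compose up -d
  docker-compose ps
else
  echo "docker-compose not on PATH; skipping"
fi
```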
4. Start the Hadoop cluster and verify it.

Check your container names; my master container is named root_master1_1.

Enter the master container:

```shell
docker exec -it root_master1_1 bash
```
First, format HDFS:

```shell
./bin/hdfs namenode -format
```
Install sudo.

Start the Hadoop cluster.
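Hadoop ships its start scripts under sbin; with HADOOP_HOME=/app/hadoop as set in the Dockerfile, starting HDFS and YARN from the master container looks like this (guarded so the sketch only runs where Hadoop is installed):

```shell
# Start HDFS (NameNode, DataNodes, SecondaryNameNode) and then YARN
# (ResourceManager, NodeManagers) from the master container.
HADOOP_HOME=${HADOOP_HOME:-/app/hadoop}
if [ -x "$HADOOP_HOME/sbin/start-dfs.sh" ]; then
  "$HADOOP_HOME/sbin/start-dfs.sh"
  "$HADOOP_HOME/sbin/start-yarn.sh"
else
  echo "Hadoop not found at $HADOOP_HOME; run this inside the master container"
fi
```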
Check the processes in each container with jps (screenshots omitted for Master1, Master2, worker1, worker2, and worker3).

All the Hadoop processes are running; next, verify the web UIs.
Open localhost:8088: you should see three active worker nodes.
Open localhost:9870: the NameNode web UI.
All components have started successfully.
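The two UIs can also be probed from the host with curl, using the ports published in docker-compose.yml (guarded so the sketch just reports when the cluster is not reachable):

```shell
# Probe the published web UIs (NameNode on 9870, ResourceManager on 8088).
ports="9870 8088"
for port in $ports; do
  if command -v curl >/dev/null 2>&1 && curl -fs "http://localhost:$port" >/dev/null 2>&1; then
    echo "port $port: UI reachable"
  else
    echo "port $port: not reachable (is the cluster running?)"
  fi
done
```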