Launch a Linux (Ubuntu 14.04) GPU-accelerated machine on AWS


TL;DR

Training a deep learning network requires a GPU-enabled machine. Fortunately, AWS provides GPU-accelerated instances.
https://aws.amazon.com/blogs/aws/new-g2-instance-type-with-4x-more-gpu-power/
Installation scripts:
Install NVIDIA drivers, cuDNN, Python, and TensorFlow on Ubuntu 16.04
Provision Machine



  • AMI
    Ubuntu Server 14.04 LTS (HVM), SSD Volume Type
  • Select Instance Type

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cluster_computing.html


  • Deploy it
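
The same provisioning steps can also be scripted with the AWS CLI. The following is only a sketch under assumptions: the AMI ID, the key pair name, and the g2.2xlarge instance type are hypothetical placeholders that are not taken from this post. The function only prints the command so that it can be reviewed before anything is launched.

```shell
#!/bin/sh
# Hypothetical placeholders: replace them with your own values.
AMI_ID="ami-xxxxxxxx"        # an Ubuntu Server 14.04 LTS (HVM) AMI in your region
INSTANCE_TYPE="g2.2xlarge"   # assumed GPU instance type from the G2 family
KEY_NAME="my-key"            # hypothetical EC2 key pair name

# Print (do not run) the AWS CLI command that would launch one instance.
build_launch_command() {
    printf 'aws ec2 run-instances --image-id %s --instance-type %s --key-name %s --count 1\n' \
        "$AMI_ID" "$INSTANCE_TYPE" "$KEY_NAME"
}

build_launch_command
```

Once the placeholders hold real values, the printed line can be pasted into a shell where the AWS CLI is configured.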

About CUDA Cores (2560)

NVIDIA GPU Product Matrix
Install TensorFlow with pip

manual
Use Python 3

    # ubuntu @ dagama in ~ [2:54:27] C:1
    $ cd /usr/local/bin
    # ubuntu @ dagama in /usr/local/bin [2:54:46]
    $ ls -l | grep pip
    -rwxr-xr-x 1 root root 204 Nov  7 11:08 pip
    -rwxr-xr-x 1 root root 204 Nov  7 11:08 pip2
    -rwxr-xr-x 1 root root 204 Nov  7 11:08 pip2.7
    $ sudo mv pip2 ~/bakup1
    $ sudo mv pip2.7 ~/bakup1
    # ubuntu @ dagama in /usr/local/bin [2:57:46]
    $ ls -l | grep pip
    -rwxr-xr-x 1 root root 204 Nov  7 11:08 pip
    ### Try installing a module with pip to check whether pip works ###
    $ pip install wheel
    Traceback (most recent call last):
      File "/usr/local/bin/pip", line 7, in <module>
        from pip import main
    ImportError: No module named 'pip'
    ### Presumably the Python 3 pip is what is needed; install it and upgrade pip ###
    $ sudo apt-get install python3-pip
    $ sudo pip install --upgrade pip
    $ pip --version
    pip 9.0.1 from /usr/local/lib/python3.4/dist-packages (python 3.4)
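
Since the whole point of the steps above is to make sure pip is bound to Python 3, the check can be automated. This is a minimal sketch: the function names are made up for illustration, and the parsing assumes the `pip --version` output format shown in the transcript above.

```shell
#!/bin/sh
# Extract the Python version from a `pip --version` line read on stdin.
pip_python_version() {
    sed -n 's/.*(python \([0-9][0-9.]*\)).*/\1/p'
}

# Succeed only when the pip on PATH is bound to the expected Python version.
pip_matches_python() {
    expected=$1   # e.g. 3.4
    actual=$(pip --version 2>/dev/null | pip_python_version)
    [ "$actual" = "$expected" ]
}
```

For example, `pip_matches_python 3.4 && echo "pip is bound to Python 3.4"` prints the message only when the versions agree.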
Install required packages

    sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
    # Installing scikit-learn directly with "pip install -U scikit-learn" fails with
    # "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 52: ordinal not in range(128)".
    # Upgrading setuptools first works around it:
    sudo pip install --upgrade setuptools
    sudo pip install -U scikit-learn  # installs successfully
Install TensorFlow 0.9.0 (Python 3.4)

    # Ubuntu/Linux 64-bit, GPU enabled, Python 3.4
    # Requires CUDA toolkit 7.5 and CuDNN v4. For other versions, see "Install from sources" below.
    $ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0-cp34-cp34m-linux_x86_64.whl
    # Python3
    $ sudo pip3 install --upgrade $TF_BINARY_URL
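
The wheel above only installs on a matching interpreter: the cp34 part of its file name is the Python tag for CPython 3.4. A small sketch of how to compare that tag with the local interpreter before installing; the function names are hypothetical and the parsing relies on the standard wheel file naming scheme (name-version-pythontag-abitag-platform.whl).

```shell
#!/bin/sh
# Print the Python tag (third dash-separated field) of a wheel path or URL.
wheel_python_tag() {
    basename "$1" .whl | cut -d- -f3
}

# Print the tag of the local python3 interpreter, e.g. cp34 for CPython 3.4.
local_python_tag() {
    python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}
```

If `wheel_python_tag "$TF_BINARY_URL"` and `local_python_tag` print different values, pip will refuse the wheel as unsupported on this platform.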
However, the pip installation does not include the 'configure' script that sits at the root of the TensorFlow source tree, so I cloned the TensorFlow repository as follows:
Clone the TensorFlow repository

    $ git clone https://github.com/tensorflow/tensorflow
Install Drivers

https://aws.amazon.com/blogs/aws/new-g2-instance-type-with-4x-more-gpu-power/
Install utilities

    sudo apt-get install wget zsh git curl ack-grep -yy
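
To confirm that the utilities actually landed on the PATH, a short check like the one below can help. It is a sketch only; `missing_tools` is a made-up helper name, not part of any package.

```shell
#!/bin/sh
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Running `missing_tools wget zsh git curl ack-grep` prints nothing when everything is installed, and one line per missing command otherwise.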
Installing NVIDIA Driver

manual

CUDA Driver

manual

    sudo dpkg -i cuda-repo-ubuntu1404_8.0.44-1_amd64.deb
    sudo apt-get update
    sudo apt-get install cuda
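
The prebuilt TensorFlow wheel mentioned earlier states that it requires CUDA toolkit 7.5, while the repository package here is 8.0.44, so it is worth confirming which toolkit release is really installed. The sketch below parses the "release" field from `nvcc --version` output; the function name is hypothetical and the /usr/local/cuda location is the one this post assumes.

```shell
#!/bin/sh
# Extract the toolkit release (e.g. 8.0) from `nvcc --version` output on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

A typical call would be `/usr/local/cuda/bin/nvcc --version | cuda_release`; the full path is used because PATH has not been extended yet at this point.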
Set up CUDA_HOME and PATH

Edit /etc/profile and add:
    export CUDA_HOME=/usr/local/cuda
    export PATH=$PATH:$CUDA_HOME/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64
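
Because /etc/profile is only read by login shells, the new variables take effect after logging in again. A quick way to verify them is to test whether a directory appears in a colon-separated list; this is a minimal sketch and `path_contains` is a made-up helper name.

```shell
#!/bin/sh
# Succeed when directory $1 is an entry of the colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, `path_contains /usr/local/cuda/bin "$PATH"` and `path_contains /usr/local/cuda/lib64 "$LD_LIBRARY_PATH"` should both succeed in a fresh login shell.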
cuDNN

Install cuDNN v5.
Uncompress and copy the cuDNN files into the toolkit directory. Assuming the toolkit is installed in /usr/local/cuda, run the following commands (edited to reflect the cuDNN version you downloaded):
    tar xvzf cudnn-8.0-linux-x64-v5.1.tgz
    sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
    sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
    cd /usr/local/cuda/lib64/
    # Recreate the symlink chain. The versioned file name below must match
    # the file actually present in this directory (check with ls).
    sudo rm -rf libcudnn.so libcudnn.so.5
    sudo ln -s libcudnn.so.5.0.5 libcudnn.so.5
    sudo ln -s libcudnn.so.5 libcudnn.so
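
The symlink step depends on the exact cuDNN file name, which changes with the downloaded version. The sketch below rebuilds the same chain (libcudnn.so -> libcudnn.so.MAJOR -> fully versioned file) from whatever versioned library is present. The function name and the libcudnn.so.X.Y.Z naming pattern are assumptions for illustration.

```shell
#!/bin/sh
# Recreate the libcudnn symlink chain inside directory $1.
link_cudnn() {
    dir=$1
    # Pick the fully versioned library, e.g. libcudnn.so.5.1.5 (assumed pattern).
    real=$(cd "$dir" && ls libcudnn.so.*.*.* 2>/dev/null | head -n 1)
    [ -n "$real" ] || { echo "no versioned libcudnn found in $dir" >&2; return 1; }
    major=${real#libcudnn.so.}   # strip the prefix, leaving e.g. 5.1.5
    major=${major%%.*}           # keep only the major version, e.g. 5
    (
        cd "$dir" &&
        ln -sf "$real" "libcudnn.so.$major" &&
        ln -sf "libcudnn.so.$major" libcudnn.so
    )
}
```

On the real toolkit directory this needs root privileges, for instance by calling `link_cudnn /usr/local/cuda/lib64` from a root shell.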
Install bazel

manual
For Ubuntu Trusty (14.04 LTS) users, since OpenJDK 8 is not available on Trusty, please install Oracle JDK 8:
    $ sudo add-apt-repository ppa:webupd8team/java
    $ sudo apt-get update
    $ sudo apt-get install oracle-java8-installer
Note: You might need to sudo apt-get install software-properties-common if you don’t have the add-apt-repository command. See here.
    # Assumes the Bazel apt repository has already been added as described in the Bazel manual.
    $ sudo apt-get update && sudo apt-get install bazel
    # Once installed, you can upgrade to a newer version of Bazel with:
    $ sudo apt-get upgrade bazel
Launch TensorFlow
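
A simple first launch is to check that the package can be imported by the Python 3 interpreter at all. The helper below is a sketch with a made-up name; it only reports whether an import succeeds and says nothing about whether the GPU is being used.

```shell
#!/bin/sh
# Succeed when python3 can import the module named in $1.
py_has_module() {
    python3 -c "import $1" >/dev/null 2>&1
}
```

For example, `py_has_module tensorflow && echo "TensorFlow is importable"`. If the import fails, running `python3 -c "import tensorflow"` directly shows the underlying error, such as a missing CUDA or cuDNN library.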


