```dockerfile
ARG REF=main
RUN conda activate ai && \
    cd && \
    git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF && \
    cd .. && \
    pip install --no-cache-dir ./transformers[deepspeed-testing] && \
    pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
```
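Note that `conda activate` only works inside `RUN` when the shell has been set up for conda earlier in the Dockerfile; one common pattern, assumed here since that part of the file is not shown, is to use a login shell:

```dockerfile
# Assumed earlier in the Dockerfile: a login shell so `conda activate` works in RUN
SHELL ["/bin/bash", "--login", "-c"]
```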
```dockerfile
# recompile apex
# pip uninstall -y apex
RUN git clone https://github.com/NVIDIA/apex
```
`MAX_JOBS=1` disables parallel building to avoid CPU memory OOM when building the image on GitHub Actions (standard) runners. TODO: check if there is an alternative way to install the latest apex.
```dockerfile
RUN cd apex && MAX_JOBS=1 python3 -m pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check .
```
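On a self-hosted runner with more memory, the same step can be parallelized by raising `MAX_JOBS`; a sketch with an illustrative value:

```dockerfile
# With ample RAM, allow e.g. 4 parallel compile jobs instead of 1
RUN cd apex && MAX_JOBS=4 python3 -m pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check .
```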
Pre-build the latest DeepSpeed so it is ready for testing (otherwise, the first DeepSpeed test will time out):

```dockerfile
RUN pip uninstall -y deepspeed
```
This has to be run (again) inside the GPU VMs running the tests. The installation works here, but some tests fail if we don't pre-build DeepSpeed again in the VMs running the tests. TODO: find out why the tests fail.
```dockerfile
RUN conda activate ai && \
    DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 \
    pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
```
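To confirm which ops actually got pre-built, DeepSpeed's environment report can be added as a build step; a minimal sketch, assuming the `ai` environment from above (`ds_report` is the CLI that DeepSpeed installs):

```dockerfile
# List which DeepSpeed ops are pre-compiled vs. JIT-compiled
RUN conda activate ai && ds_report
```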
When installing in editable mode, `transformers` is not recognized as a package. This line must be added for Python to be aware of `transformers`:
```dockerfile
RUN conda activate ai && \
    cd && \
    cd transformers && python3 setup.py develop
```
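A quick way to verify that the editable install is visible, as a sketch (any `transformers` import would do):

```dockerfile
# Confirm Python resolves the editable transformers package
RUN conda activate ai && python3 -c "import transformers; print(transformers.__version__)"
```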
The base image ships with `pydantic==1.8.2`, which does not work, i.e. the next command fails:
```dockerfile
RUN conda activate ai && \
    pip install -U --no-cache-dir "pydantic<2"
```
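To verify the pin took effect, the version can be printed; a sketch (`pydantic.VERSION` exists in both the 1.x and 2.x APIs):

```dockerfile
# Should print a 1.x version after the downgrade
RUN conda activate ai && python3 -c "import pydantic; print(pydantic.VERSION)"
```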
```dockerfile
# Sanity check: DeepSpeed must be importable
RUN conda activate ai && \
    python3 -c "from deepspeed.launcher.runner import main"
```
```dockerfile
# Clean up apt caches to keep the image small
RUN apt-get update && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```
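With the Dockerfile complete, the image can be built, with the `REF` build argument declared at the top pointing at any transformers branch or tag; a usage sketch (the image name is illustrative):

```bash
# Build the image, checking out a specific transformers ref
docker build --build-arg REF=main -t transformers-deepspeed-gpu .
```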
### Cache setup

Pretrained models are downloaded and cached locally at `~/.cache/huggingface/hub`. This is the default directory specified by the environment variable `TRANSFORMERS_CACHE`. On Windows, the default directory is `C:\Users\username\.cache\huggingface\hub`. You can change the environment variables below, in order of priority, to specify a different cache directory.
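For example, the cache location can be redirected from the shell; a sketch with illustrative paths, using the priority order transformers documents (`TRANSFORMERS_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME`):

```bash
export TRANSFORMERS_CACHE=/data/hf_cache   # 1. highest priority
export HF_HOME=/data/huggingface           # 2. hub cache becomes $HF_HOME/hub
export XDG_CACHE_HOME=/data/.cache         # 3. cache becomes $XDG_CACHE_HOME/huggingface
```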