Checking if build backend supports build_editable ... done
Building wheels for collected packages: llamafactory
Building editable for llamafactory (pyproject.toml) ... done
Created wheel for llamafactory: filename=llamafactory-0.8.4.dev0-0.editable-py3-none-any.whl size=20781 sha256=70c0480e2b648516e0eac3d39371d4100cbdaa1f277d87b657bf2adec9e0b2be
Stored in directory: /tmp/pip-ephem-wheel-cache-uhypmj_8/wheels/e9/b4/89/f13e921e37904ee0c839434aad2d7b2951c2c68e596667c7ef
Successfully built llamafactory
DEPRECATION: lmdeploy 0.1.0-git782048c.abi0.dtk2404.torch2.1. has a non-standard version number. pip 24.1 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of lmdeploy or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: mmcv 2.0.1-gitc0ccf15.abi0.dtk2404.torch2.1. has a non-standard version number. pip 24.1 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of mmcv or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lmdeploy 0.1.0-git782048c.abi0.dtk2404.torch2.1. requires transformers==4.33.2, but you have transformers 4.43.3 which is incompatible.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip
Building wheels for collected packages: llamafactory
Building editable for llamafactory (pyproject.toml) ... done
Created wheel for llamafactory: filename=llamafactory-0.8.4.dev0-0.editable-py3-none-any.whl size=20781 sha256=f874a791bc9fdca02075cda0459104b48a57d300a077eca00eee7221cde429c3
Stored in directory: /tmp/pip-ephem-wheel-cache-7vjiq3f3/wheels/e9/b4/89/f13e921e37904ee0c839434aad2d7b2951c2c68e596667c7ef
Successfully built llamafactory
Installing collected packages: llamafactory
Attempting uninstall: llamafactory
Found existing installation: llamafactory 0.8.4.dev0
Uninstalling llamafactory-0.8.4.dev0:
Successfully uninstalled llamafactory-0.8.4.dev0
Successfully installed llamafactory-0.8.4.dev0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting vllm==0.4.3
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/1a/1e/10bcb6566f4fa8b95ff85bddfd1675ff7db33ba861f59bd70aa3b92a46b7/vllm-0.4.3-cp310-cp310-manylinux1_x86_64.whl (131.1 MB)
Installing collected packages: vllm
Attempting uninstall: vllm
Found existing installation: vllm 0.3.3+git3380931.abi0.dtk2404.torch2.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip
III. Key Steps
1. Obtain an Access Token
Obtain an Access Token and log in to your Hugging Face account.
For the detailed steps, see another blog post: "How to speed up downloads of Hugging Face and ModelScope models/datasets".
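If you prefer doing this step from the terminal, a minimal sketch follows (the token value is a placeholder; the HF_ENDPOINT line is only needed if you route downloads through the hf-mirror endpoint that shows up in the logs below):

# Hypothetical example: authenticate with your Hugging Face Access Token
pip install -U "huggingface_hub[cli]"
huggingface-cli login --token hf_xxxxxxxxxxxxxxxx
# Optional: use the hf-mirror endpoint for faster downloads
export HF_ENDPOINT=https://hf-mirror.com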
No ROCm runtime is found, using ROCM_HOME='/opt/dtk'
/opt/conda/envs/llama_fct/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_hip.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
[2024-08-01 15:12:24,629] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[INFO|tokenization_utils_base.py:2533] 2024-08-01 19:06:45,563 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
08/01/2024 19:06:45 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/01/2024 19:06:45 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/01/2024 19:06:45 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Converting format of dataset (num_proc=16): 100%|███████████████████| 91/91 [00:00<00:00, 444.18 examples/s]
08/01/2024 19:06:47 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Converting format of dataset (num_proc=16): 100%|██████████████| 1000/1000 [00:00<00:00, 4851.17 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|███████████████| 1091/1091 [00:02<00:00, 375.29 examples/s]
[INFO|modeling_utils.py:4463] 2024-08-01 19:07:01,775 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-01 19:07:01,775 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
08/01/2024 19:07:01 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/01/2024 19:07:01 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/01/2024 19:07:01 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/01/2024 19:07:01 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/01/2024 19:07:01 - INFO - llamafactory.model.model_utils.misc - Found linear modules: q_proj,up_proj,v_proj,down_proj,k_proj,o_proj,gate_proj
08/01/2024 19:07:04 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:648] 2024-08-01 19:07:04,471 >> Using auto half precision backend
[INFO|trainer.py:2134] 2024-08-01 19:07:04,831 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-01 19:07:04,831 >> Num examples = 981
[INFO|trainer.py:2136] 2024-08-01 19:07:04,831 >> Num Epochs = 3
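For reference, a single-device LoRA SFT run like the one excerpted above is normally launched with LLaMA-Factory's CLI; a sketch, assuming the YAML shipped in the repo's examples (the exact path may differ across versions):

# Hypothetical launch command for the single-GPU LoRA SFT run
cd LLaMA-Factory
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml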
[2024-08-02 18:09:04,227] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
08/02/2024 18:09:13 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/02/2024 18:09:13 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
I0802 18:09:14.158418 95618 ProcessGroupNCCL.cpp:2780] Rank 2 using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
I0802 18:09:14.165846 95617 ProcessGroupNCCL.cpp:2780] Rank 1 using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[INFO|tokenization_utils_base.py:2533] 2024-08-02 18:09:14,684 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
08/02/2024 18:09:14 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 18:09:14 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/02/2024 18:09:14 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Converting format of dataset (num_proc=16): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 91/91 [00:00<00:00, 301.60 examples/s]
08/02/2024 18:09:16 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Converting format of dataset (num_proc=16): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 3399.93 examples/s]
I0802 18:09:18.295866 95616 ProcessGroupNCCL.cpp:2780] Rank 0 using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[INFO|modeling_utils.py:4463] 2024-08-02 18:09:34,552 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-02 18:09:34,552 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.misc - Found linear modules: gate_proj,down_proj,q_proj,o_proj,up_proj,v_proj,k_proj
08/02/2024 18:09:34 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
08/02/2024 18:09:34 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:648] 2024-08-02 18:09:34,983 >> Using auto half precision backend
08/02/2024 18:09:35 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
[INFO|trainer.py:2134] 2024-08-02 18:09:37,114 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-02 18:09:37,114 >> Num examples = 981
[INFO|trainer.py:2136] 2024-08-02 18:09:37,114 >> Num Epochs = 3
[INFO|configuration_utils.py:800] 2024-08-02 18:19:24,305 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
[INFO|tokenization_utils_base.py:2702] 2024-08-02 18:19:24,432 >> tokenizer config file saved in saves/llama3-8b/lora/sft/checkpoint-120/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-02 18:19:24,434 >> Special tokens file saved in saves/llama3-8b/lora/sft/checkpoint-120/special_tokens_map.json
[INFO|trainer.py:2394] 2024-08-02 18:19:24,832 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
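The multi-device run above (note the torch.distributed.run warning and the per-rank barrier messages) is typically launched through torchrun; a hedged sketch, assuming three visible devices and the same example YAML:

# Hypothetical multi-GPU launch; adjust device IDs and the YAML path to your setup
export CUDA_VISIBLE_DEVICES=0,1,2
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
# Roughly equivalent direct launch:
# torchrun --nproc_per_node=3 src/train.py examples/train_lora/llama3_lora_sft.yaml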
[INFO|tokenization_utils_base.py:2533] 2024-08-01 21:34:42,030 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
08/01/2024 21:34:42 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/01/2024 21:34:42 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
[INFO|modeling_utils.py:4463] 2024-08-01 21:34:43,324 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-01 21:34:43,324 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
08/01/2024 21:34:43 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/01/2024 21:40:34 - INFO - llamafactory.model.adapter - Merged 1 adapter(s).
08/01/2024 21:40:34 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/llama3-8b/lora/sft
08/01/2024 21:40:34 - INFO - llamafactory.model.loader - all params: 8,030,261,248
08/01/2024 21:40:34 - INFO - llamafactory.train.tuner - Convert model dtype to: torch.bfloat16.
[INFO|configuration_utils.py:472] 2024-08-01 21:40:34,700 >> Configuration saved in models/llama3_lora_sft/config.json
[INFO|configuration_utils.py:807] 2024-08-01 21:40:34,704 >> Configuration saved in models/llama3_lora_sft/generation_config.json
[INFO|modeling_utils.py:2763] 2024-08-01 21:40:49,039 >> The model is bigger than the maximum size per checkpoint (2GB) and is going to be split in 9 checkpoint shards. You can find where each parameters has been saved in the index located at models/llama3_lora_sft/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2702] 2024-08-01 21:40:49,046 >> tokenizer config file saved in models/llama3_lora_sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-01 21:40:49,048 >> Special tokens file saved in models/llama3_lora_sft/special_tokens_map.json
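The export log above corresponds to merging the LoRA adapter into the base model. A minimal sketch of that step, modeled on LLaMA-Factory's merge_lora example config (the field names and paths are assumptions and may differ in your version):

# Hypothetical merge/export config and command
cat > merge_llama3_lora_sft.yaml <<'EOF'
model_name_or_path: /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
export_dir: models/llama3_lora_sft
export_size: 2
export_legacy_format: false
EOF
llamafactory-cli export merge_llama3_lora_sft.yaml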
Output:
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models# tree -L 6 LLaMA-Factory/models/llama3_lora_sft/
[INFO|tokenization_utils_base.py:2533] 2024-08-02 22:08:52,818 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
08/02/2024 22:08:52 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 22:08:52 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
[INFO|modeling_utils.py:4463] 2024-08-02 22:09:01,148 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-02 22:09:01,148 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
No ROCm runtime is found, using ROCM_HOME='/opt/dtk'
/opt/conda/envs/llama_fct/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_hip.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
[2024-08-01 15:13:21,242] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
08/01/2024 15:13:24 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cpu, n_gpu: 0, distributed training: False, compute dtype: torch.bfloat16
[INFO|tokenization_auto.py:682] 2024-08-01 15:13:25,152 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
Traceback (most recent call last):
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://hf-mirror.com/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
resolved_file = hf_hub_download(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
return f(*args, **kwargs)
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1347, in _hf_hub_download_to_cache_dir
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 665, in get_tokenizer_config
resolved_config_file = cached_file(
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/utils/hub.py", line 425, in cached_file
raise EnvironmentError(
OSError: LLM-Research/Meta-Llama-3-8B-Instruct is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
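This 401/OSError usually means the gated meta-llama repository is being fetched without credentials. Two common remedies, sketched under the assumption that you either have a valid Access Token or prefer ModelScope (which the later runs in this post use):

# Option 1 (hypothetical): authenticate against the Hub / mirror
huggingface-cli login --token hf_xxxxxxxxxxxxxxxx
export HF_ENDPOINT=https://hf-mirror.com
# Option 2 (hypothetical): let LLaMA-Factory resolve the model from ModelScope
export USE_MODELSCOPE_HUB=1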
[2024-08-01 19:05:15,320] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
08/01/2024 19:05:18 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.bfloat16
Traceback (most recent call last):
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/extras/misc.py", line 219, in try_download_model_from_ms
    from modelscope import snapshot_download
ModuleNotFoundError: No module named 'modelscope'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/llama_factory_torch/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 44, in run_sft
    tokenizer_module = load_tokenizer(model_args)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 67, in load_tokenizer
    init_kwargs = _get_init_kwargs(model_args)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 52, in _get_init_kwargs
    model_args.model_name_or_path = try_download_model_from_ms(model_args)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/extras/misc.py", line 224, in try_download_model_from_ms
    raise ImportError("Please install modelscope via `pip install modelscope -U`")
ImportError: Please install modelscope via `pip install modelscope -U`
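The remedy is stated in the error message itself: install ModelScope into the active environment and rerun the command.

pip install modelscope -U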
Traceback (most recent call last):
  File "/opt/conda/envs/llama_fct_pytorch/bin/llamafactory-cli", line 5, in <module>
    from llamafactory.cli import main
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/__init__.py", line 38, in <module>
    from .cli import VERSION
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 21, in <module>
    from . import launcher
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/launcher.py", line 15, in <module>
    from llamafactory.train.tuner import run_exp
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/tuner.py", line 19, in <module>
    import torch
  File "/opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import *  # noqa: F403
ImportError: /opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/lib/libtorch_hip.so: undefined symbol: ncclCommInitRankConfig
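The undefined ncclCommInitRankConfig symbol suggests the torch wheel in this environment is not the DTK/ROCm build that libtorch_hip.so expects. A hedged way to check what is actually installed before reinstalling the vendor-provided wheel:

# Check which torch build the failing environment resolves to (does not import torch)
pip show torch | head -n 3
pip list 2>/dev/null | grep -Ei 'torch|vllm|lmdeploy'
# A DTK build typically carries a suffix like +git....abi0.dtk2404, as seen in the pip logs earlier.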
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
No ROCm runtime is found, using ROCM_HOME='/opt/dtk'
/opt/conda/envs/llama_fct/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_hip.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
[2024-08-01 17:49:08,805] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
08/01/2024 17:49:12 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cpu, n_gpu: 0, distributed training: False, compute dtype: torch.bfloat16
Downloading: 100%|██████████| 654/654 [00:00<00:00, 2.56kB/s]
Downloading: 100%|██████████| 48.0/48.0 [00:00<00:00, 183B/s]
Downloading: 100%|██████████| 187/187 [00:00<00:00, 759B/s]
Downloading: 100%|██████████| 7.62k/7.62k [00:00<00:00, 29.9kB/s]
Downloading: 100%|██████████| 4.63G/4.63G [01:33<00:00, 53.4MB/s]
Downloading: 100%|██████████| 4.66G/4.66G [01:02<00:00, 79.9MB/s]
Downloading: 100%|██████████| 4.58G/4.58G [01:00<00:00, 81.7MB/s]
Downloading: 100%|██████████| 1.09G/1.09G [00:22<00:00, 51.6MB/s]
Downloading: 100%|██████████| 23.4k/23.4k [00:00<00:00, 53.6kB/s]
Downloading: 100%|██████████| 36.3k/36.3k [00:00<00:00, 125kB/s]
Downloading: 100%|██████████| 73.0/73.0 [00:00<00:00, 293B/s]
Downloading: 100%|██████████| 8.66M/8.66M [00:00<00:00, 13.5MB/s]
Downloading: 100%|██████████| 49.8k/49.8k [00:00<00:00, 90.0kB/s]
Downloading: 100%|██████████| 4.59k/4.59k [00:00<00:00, 18.7kB/s]
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,510 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,511 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,511 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,511 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-01 17:53:53,854 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
08/01/2024 17:53:53 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/01/2024 17:53:53 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/01/2024 17:53:53 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Generating train split: 91 examples [00:00, 10580.81 examples/s]
Converting format of dataset (num_proc=16): 100%|██████████| 91/91 [00:00<00:00, 427.78 examples/s]
08/01/2024 17:53:56 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Generating train split: 1000 examples [00:00, 66788.28 examples/s]
Converting format of dataset (num_proc=16): 100%|██████████| 1000/1000 [00:00<00:00, 4688.60 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|██████████| 1091/1091 [00:03<00:00, 295.08 examples/s]
training example:
input_ids:
[128000, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-01 17:54:02,106 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 17:54:02,108 >> Model config LlamaConfig {
  "_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}
[INFO|modeling_utils.py:3631] 2024-08-01 17:54:02,139 >> loading weights file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-01 17:54:02,140 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-01 17:54:02,142 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00, 2.68it/s]
[INFO|modeling_utils.py:4463] 2024-08-01 17:54:03,708 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-01 17:54:03,709 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-01 17:54:03,712 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-01 17:54:03,713 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}
08/01/2024 17:54:03 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/01/2024 17:54:03 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/01/2024 17:54:03 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/01/2024 17:54:03 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/01/2024 17:54:03 - INFO - llamafactory.model.model_utils.misc - Found linear modules: q_proj,down_proj,o_proj,k_proj,gate_proj,up_proj,v_proj
08/01/2024 17:54:08 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:648] 2024-08-01 17:54:08,091 >> Using cpu_amp half precision backend
[INFO|trainer.py:2134] 2024-08-01 17:54:09,008 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-01 17:54:09,008 >> Num examples = 981
[INFO|trainer.py:2136] 2024-08-01 17:54:09,008 >> Num Epochs = 3
[INFO|trainer.py:2137] 2024-08-01 17:54:09,008 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-08-01 17:54:09,008 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2141] 2024-08-01 17:54:09,008 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-08-01 17:54:09,008 >> Total optimization steps = 366
[INFO|trainer.py:2143] 2024-08-01 17:54:09,012 >> Number of trainable parameters = 20,971,520
  0%|          | 0/366 [00:00<?, ?it/s