[WARNING|logging.py:313] 2024-07-09 09:24:49,392 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/09/2024 09:24:49 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
07/09/2024 09:24:49 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Converting format of dataset (num_proc=16): 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 91/91 [00:00<00:00, 187.32 examples/s]
07/09/2024 09:25:01 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Converting format of dataset (num_proc=16): 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 2385.01 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1091/1091 [00:43<00:00, 24.98 examples/s]
Setting ds_accelerator to npu (auto detect): indicates that the program will use the NPU (Neural Processing Unit) accelerator; the setting was detected automatically.
async_io requires the dev libaio .so object and headers but these were not found.: warns that the libaio library is missing, which may degrade asynchronous I/O performance.
If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.: suggests that if libaio is already installed, setting the CFLAGS and LDFLAGS environment variables so the library can be located should clear the warning.
Downloading and processing the datasets:
The many Downloading and Converting format of dataset lines indicate that the datasets are being downloaded and converted, a routine step before model training.
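For context, identity.json and alpaca_en_demo.json use the alpaca instruction format; the sketch below only illustrates the shape of such records (the values are placeholders, not the actual file contents):

```python
import json

# Illustrative alpaca-style records: each entry has an instruction, an optional
# input, and the expected output. This mirrors the layout of identity.json and
# alpaca_en_demo.json, but the contents here are made up.
records = [
    {
        "instruction": "Who are you?",
        "input": "",
        "output": "I am an AI assistant fine-tuned with LLaMA-Factory.",
    }
]

# Dumping such a list to JSON produces a dataset file in the same layout that
# LLaMA-Factory converts and tokenizes, as seen in the log lines above.
with open("my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```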
RuntimeError: call aclnnCast failed, detail:EZ1001: 2024-07-09-09:38:52.309.843 The param dtype not implemented for DT_BFLOAT16, should be in dtype support list [DT_FLOAT16,DT_FLOAT,DT_DOUBLE,DT_INT8,DT_UINT8,DT_INT16,DT_INT32,DT_INT64,DT_UINT16,DT_UINT32,DT_UINT64,DT_BOOL,DT_COMPLEX64,DT_COMPLEX128,].
This message shows that an attempt to cast data of type DT_BFLOAT16 failed: DT_BFLOAT16 is not in the list of dtypes supported by the aclnnCast operator invoked here, so the conversion fails and a supported compute dtype such as float16 has to be used instead.
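One way to avoid this failure is to probe whether the device can actually cast to bfloat16 and fall back to float16 otherwise. The snippet below is a minimal sketch assuming torch_npu is installed and registers the npu device; pick_compute_dtype is a hypothetical helper, not part of LLaMA-Factory or DeepSpeed:

```python
import torch
import torch_npu  # noqa: F401 -- registers the "npu" device with PyTorch (assumed available)

def pick_compute_dtype(device: str = "npu:0") -> torch.dtype:
    """Return bfloat16 if the device can cast to it, otherwise float16."""
    try:
        # A tiny cast that exercises the same path as the failing aclnnCast call.
        torch.ones(1, device=device).to(torch.bfloat16)
        return torch.bfloat16
    except RuntimeError:
        # bf16 casts are unsupported on this device, so train in fp16 instead.
        return torch.float16

if __name__ == "__main__":
    # E.g. pass the result as torch_dtype when loading the model with
    # AutoModelForCausalLM.from_pretrained(..., torch_dtype=pick_compute_dtype()).
    print(pick_compute_dtype())
```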
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/torch_npu/npu/amp/autocast_mode.py", line 113, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/deepspeed/runtime/zero/linear.py", line 59, in forward
output += bias
RuntimeError: call aclnnInplaceAdd failed, detail:EZ1001: 2024-07-09-10:40:00.116.800 the size of tensor selfRef [1,120] must match the size of tensor other [0].
TraceBack (most recent call last):
120 and 0 cannot broadcast.
the size of tensor selfRef [1,120] must match the size of tensor other [0].
After switching to Qwen2-7B for fine-tuning, a tensor size mismatch appeared. Below is an interpretation of the error log.
Judging from the log, the error is caused by a dimension mismatch in a tensor operation. Specifically, the message the size of tensor selfRef [1,120] must match the size of tensor other [0] means that during the aclnnInplaceAdd operation one tensor has shape [1,120] while the other has shape [0], so the two cannot be broadcast together. In this traceback the zero-sized tensor is the bias added in deepspeed/runtime/zero/linear.py (output += bias), so the failure is usually caused by an incorrectly shaped or empty input. A minimal reproduction, a detailed reading of the log, and possible solutions follow:
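To make the broadcast failure concrete, the following sketch reproduces the same class of error on the CPU with the shapes from the log; it is only an illustration and does not involve DeepSpeed or the NPU:

```python
import torch

# The in-place add from deepspeed/runtime/zero/linear.py fails because a
# [1, 120] output cannot be broadcast against an empty [0] bias tensor.
output = torch.zeros(1, 120)
bias = torch.empty(0)  # stands in for the empty bias seen in the log

try:
    output += bias  # mirrors `output += bias` in the traceback
except RuntimeError as err:
    # CPU PyTorch reports the same kind of mismatch:
    # "The size of tensor a (120) must match the size of tensor b (0) ..."
    print(err)
```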
Interpreting the error log:
RuntimeError: call aclnnInplaceAdd failed, detail:EZ1001: 2024-07-09-10:40:00.116.800 the size of tensor selfRef [1,120] must match the size of tensor other [0].
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/MindSpore/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/tmp/code/LLaMA-Factory/src/llamafactory/cli.py", line 110, in main
    run_exp()
  File "/tmp/code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 47, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/tmp/code/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 49, in run_sft
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
  File "/tmp/code/LLaMA-Factory/src/llamafactory/model/loader.py", line 160, in load_model
    model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
  File "/tmp/code/LLaMA-Factory/src/llamafactory/model/adapter.py", line 306, in init_adapter
    _setup_full_tuning(model, model_args, finetuning_args, is_trainable, cast_trainable_params_to_fp32)
  File "/tmp/code/LLaMA-Factory/src/llamafactory/model/adapter.py", line 59, in _setup_full_tuning
    param.data = param.data.to(torch.float32)
RuntimeError: NPU out of memory. Tried to allocate 2.03 GiB (NPU 0; 32.00 GiB total capacity; 29.19 GiB already allocated; 29.19 GiB current active; 412.09 MiB free; 30.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.