Open-Source Model Deployment in Practice: LoRA Fine-Tuning & Merging of qwen2-7b-instruct with ms-swift (Single Machine, Single V100 GPU)


1. Preface

    This article uses ms-swift to merge the weights of a model fine-tuned with LoRA. After reading it, you should have a firmer grasp of the key techniques involved and be able to apply them in your own projects.

2. Terminology

2.1. LoRA Fine-Tuning

    LoRA (Low-Rank Adaptation) is an efficient adaptation strategy for fine-tuning large language models (LLMs). It introduces no additional inference latency and, while preserving model quality, drastically reduces the number of trainable parameters for downstream tasks.
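A minimal sketch of the idea in plain PyTorch (hypothetical shapes, not the ms-swift implementation): the frozen weight W is augmented with a trainable low-rank product B @ A scaled by alpha/r, and merging simply folds that product back into W, so the merged model infers at exactly the cost of the original.

  import torch

  # Hypothetical dimensions; real LLM projection layers are of this order.
  d, k, r, alpha = 4096, 4096, 8, 32

  W = torch.randn(d, k)                              # frozen pretrained weight
  A = torch.nn.Parameter(0.01 * torch.randn(r, k))   # low-rank factor A, small random init
  B = torch.nn.Parameter(torch.zeros(d, r))          # low-rank factor B, zero init: training starts from the pretrained behavior

  # During training only A and B receive gradients; W stays frozen.
  # Merging is the closed-form step that `merge_lora` performs later:
  W_merged = W + (alpha / r) * (B @ A)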
2.2. Parameter-Efficient Fine-Tuning (PEFT)

    Only a small number of (extra) model parameters are fine-tuned while most of the pretrained LLM's parameters stay frozen, which greatly reduces compute and storage costs.
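A hedged sketch of what this looks like with the Hugging Face peft library (LoRA is one PEFT method; the target module names are an assumption based on typical Qwen2 attention projections):

  from peft import LoraConfig, get_peft_model
  from transformers import AutoModelForCausalLM

  model = AutoModelForCausalLM.from_pretrained("/data/model/qwen2-7b-instruct")

  # Wrap the frozen base model with small trainable LoRA adapters.
  config = LoraConfig(
      r=8,
      lora_alpha=32,
      target_modules=["q_proj", "k_proj", "v_proj"],  # assumed Qwen2 attention projections
      lora_dropout=0.05,
      task_type="CAUSAL_LM",
  )
  model = get_peft_model(model, config)
  model.print_trainable_parameters()  # prints the trainable fraction, typically well under 1%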
2.3. Qwen2-7B-Instruct

    An instruction-tuned model in the Tongyi Qianwen Qwen2 series. It is built on Qwen2-7B with additional instruction tuning to improve performance on specific tasks.
    Qwen2-7B-Instruct has the following features (a minimal loading sketch follows the list):


  • Strong performance: on several benchmarks, Qwen2-7B-Instruct rivals Llama-3-70B-Instruct.
  • Improved code and math: thanks to high-quality data and instruction tuning, its mathematics and coding abilities improved markedly.
  • Multilingual ability: training added high-quality data covering 27 additional languages.
  • Context length: all Instruct models in the Qwen2 series were trained on 32k contexts; Qwen2-7B-Instruct and Qwen2-72B-Instruct additionally support context lengths of up to 128k tokens.
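A minimal loading sketch with transformers (the local path follows this article's layout; the prompt is illustrative):

  from transformers import AutoModelForCausalLM, AutoTokenizer

  path = "/data/model/qwen2-7b-instruct"
  tokenizer = AutoTokenizer.from_pretrained(path)
  model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto", device_map="auto")

  # Qwen2-Instruct expects its chat template; build the prompt through it.
  messages = [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"},
  ]
  inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
  out = model.generate(inputs, max_new_tokens=128)
  print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))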
2.4. Model Merging

    Model merging integrates the weights or parameters of multiple models into a single new model; in this article it specifically means folding the LoRA adapter weights back into the base model.
    Uses of model merging (a merge sketch follows the list):


  • Better performance: combine the strengths of different models to improve accuracy, robustness, and other metrics
  • Better generalization: reduce the risk of overfitting so the model behaves more consistently across datasets
  • Smaller footprint: reduce the model's storage and compute requirements
  • Higher efficiency: improve inference efficiency
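For LoRA in particular, merging is the closed-form step W_merged = W + (alpha/r)·B·A, which is what ms-swift performs below. A hedged sketch of the same operation with peft, assuming the checkpoint is in peft adapter format (swift export has a to_peft_format option for conversion; paths follow this article's layout):

  from peft import PeftModel
  from transformers import AutoModelForCausalLM

  base = AutoModelForCausalLM.from_pretrained("/data/model/qwen2-7b-instruct", torch_dtype="auto")
  ckpt = "/data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873"

  # Attach the LoRA adapter produced by fine-tuning, then fold it into the base weights.
  merged = PeftModel.from_pretrained(base, ckpt).merge_and_unload()
  merged.save_pretrained(ckpt + "-merged")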

3. Prerequisites

 3.1. Base Environment and Prerequisites

     1. Operating system: CentOS 7
     2. GPU: NVIDIA Tesla V100 32GB, CUDA Version: 12.2

     3. Download the Qwen2-7B-Instruct model in advance, from either of the two sources below (ModelScope is the recommended one):
          Hugging Face: https://huggingface.co/Qwen/Qwen2-7B-Instruct/tree/main

          ModelScope: git clone https://www.modelscope.cn/qwen/Qwen2-7B-Instruct.git

Choose the SDK or Git download method as needed.

Example of downloading via Git (or via git-lfs); a sketch of the SDK route follows:
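A hedged sketch of the SDK route with the modelscope Python package (the cache_dir is an assumption matching this article's paths):

  from modelscope import snapshot_download

  # Downloads the full model repository and returns the local directory.
  model_dir = snapshot_download("qwen/Qwen2-7B-Instruct", cache_dir="/data/model")
  print(model_dir)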

3.2. Installing Anaconda

      See "Open-Source Model Deployment in Practice: the Right Way to Accelerate qwen-7b-chat Inference with vLLM (Part 1)".
3.3. Installing Dependencies

    Install from the command line:
  conda create --name swift python=3.10
  conda activate swift
  pip install 'ms-swift[all]' -U -i https://pypi.tuna.tsinghua.edu.cn/simple

  # To delete the environment later, if needed: conda env remove -n swift
    Install from source:
  git clone https://github.com/modelscope/swift.git
  cd swift
  pip install -e '.[llm]' -i https://pypi.tuna.tsinghua.edu.cn/simple
3.4. Complete the Model Fine-Tuning

See: Open-Source Model Deployment in Practice: LoRA Fine-Tuning qwen2-7b-instruct with ms-swift on a Single Machine, Single V100 (Part 12). A hedged sketch of that training step follows.
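For context only, a hedged reconstruction of the equivalent Python call (the dataset name and paths are taken from the argument dumps later in this article, not from the original training command):

  from swift.llm import sft_main, SftArguments

  # Assumption: mirrors the LoRA run from the referenced article.
  sft_main(SftArguments(
      model_type="qwen2-7b-instruct",
      model_id_or_path="/data/model/qwen2-7b-instruct",
      sft_type="lora",
      dataset=["qwen_zh_demo"],
      custom_dataset_info="/data/service/swift/data/custom_dataset_info.json",
      output_dir="/data/model/sft/qwen2-7b-instruct-sft",
  ))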
Log output:
  Train: 100%|██████████| 873/873 [09:34<00:00,  1.69it/s]{'eval_loss': nan, 'eval_acc': 0.02320291, 'eval_runtime': 1.6477, 'eval_samples_per_second': 4.855, 'eval_steps_per_second': 4.855, 'epoch': 0.92, 'global_step/max_steps': '800/873', 'percentage': '91.64%', 'elapsed_time': '8m 47s', 'remaining_time': '48s'}
  Val: 100%|██████████| 8/8 [00:01<00:00,  5.65it/s]
  [INFO:swift] Saving model checkpoint to /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v1-20240830-151000/checkpoint-873
  Train: 100%|██████████| 873/873 [09:36<00:00,  1.51it/s]
  [INFO:swift] last_model_checkpoint: /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v1-20240830-151000/checkpoint-873
  [INFO:swift] best_model_checkpoint: /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v1-20240830-151000/checkpoint-100
  [INFO:swift] images_dir: /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v1-20240830-151000/images
  [INFO:swift] End time of running main: 2024-08-30 15:20:25.615625
  {'eval_loss': nan, 'eval_acc': 0.02320291, 'eval_runtime': 1.6682, 'eval_samples_per_second': 4.796, 'eval_steps_per_second': 4.796, 'epoch': 1.0, 'global_step/max_steps': '873/873', 'percentage': '100.00%', 'elapsed_time': '9m 36s', 'remaining_time': '0s'}
  {'train_runtime': 576.7666, 'train_samples_per_second': 1.514, 'train_steps_per_second': 1.514, 'train_loss': 0.0, 'epoch': 1.0, 'global_step/max_steps': '873/873', 'percentage': '100.00%', 'elapsed_time': '9m 36s', 'remaining_time': '0s'}
The generated model weights:



4. Implementation

4.1. Merging at Inference Time

At inference time, merge the LoRA weights and save the merged model.
Start the merge:
  conda activate swift
  swift infer --ckpt_dir /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873 --load_dataset_config true --merge_lora true --infer_backend vllm --max_model_len 8192
If vLLM is enabled as the inference backend, the vllm dependency must be installed:
  pip install vllm -i https://pypi.tuna.tsinghua.edu.cn/simple
Configurable parameters:
  InferArguments(model_type='qwen2-7b-instruct', model_id_or_path='/data/model/qwen2-7b-instruct', model_revision='master', sft_type='lora', template_type='qwen', infer_backend='vllm', ckpt_dir='/data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v1-20240901-141800/checkpoint-873', result_dir=None, load_args_from_ckpt_dir=True, load_dataset_config=True, eval_human=False, seed=42, dtype='bf16', model_kwargs=None, dataset=['qwen_zh_demo'], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.01, show_dataset_sample=-1, save_result=True, system='You are a helpful assistant.', tools_prompt='react_en', max_length=None, truncation_strategy='delete', check_dataset_strategy='none', model_name=[None, None], model_author=[None, None], quant_method=None, quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, stop_words=[], rope_scaling=None, use_flash_attn=None, ignore_args_error=False, stream=True, merge_lora=True, merge_device_map='cpu', save_safetensors=True, overwrite_generation_config=True, verbose=None, local_repo_path=None, custom_register_path=None, custom_dataset_info='/data/service/swift/data/custom_dataset_info.json', device_map_config_path=None, device_max_memory=[], hub_token=None, gpu_memory_utilization=0.9, tensor_parallel_size=1, max_num_seqs=256, max_model_len=8192, disable_custom_all_reduce=True, enforce_eager=False, vllm_enable_lora=False, vllm_max_lora_rank=16, lora_modules=[], tp=1, cache_max_entry_count=0.8, quant_policy=0, vision_batch_size=1, self_cognition_sample=0, train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, model_cache_dir=None, merge_lora_and_save=None, custom_train_dataset_path=[], custom_val_dataset_path=[], vllm_lora_modules=None)
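The same merge-and-infer run can also be launched from Python; a hedged sketch using swift's infer_main (arguments mirror the CLI call above):

  from swift.llm import infer_main, InferArguments

  # Assumption: equivalent to the `swift infer` command above.
  infer_main(InferArguments(
      ckpt_dir="/data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873",
      load_dataset_config=True,
      merge_lora=True,
      infer_backend="vllm",
      max_model_len=8192,
  ))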
Merge output:
  INFO 08-30 17:16:03 model_runner.py:879] Starting to load model /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873-merged...
  INFO 08-30 17:16:03 selector.py:217] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
  INFO 08-30 17:16:03 selector.py:116] Using XFormers backend.
  Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
  Loading safetensors checkpoint shards:  25% Completed | 1/4 [01:07<03:23, 67.95s/it]
  Loading safetensors checkpoint shards:  50% Completed | 2/4 [01:21<01:12, 36.12s/it]
  Loading safetensors checkpoint shards:  75% Completed | 3/4 [01:22<00:20, 20.10s/it]
  Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:23<00:00, 12.29s/it]
  Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:23<00:00, 20.78s/it]
  INFO 08-30 17:17:27 model_runner.py:890] Loading model weights took 14.2487 GB
  INFO 08-30 17:17:29 gpu_executor.py:121] # GPU blocks: 13857, # CPU blocks: 4681
  INFO 08-30 17:17:32 model_runner.py:1181] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
  INFO 08-30 17:17:32 model_runner.py:1185] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
  INFO 08-30 17:17:59 model_runner.py:1300] Graph capturing finished in 27 secs.
  [INFO:swift] generation_config: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.3, top_p=0.7, top_k=20, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=False, spaces_between_special_tokens=True, truncate_prompt_tokens=None)
  [INFO:swift] system: You are a helpful assistant.
  Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 881/881 [00:00<00:00, 9162.50 examples/s]
  [INFO:swift] val_dataset: Dataset({
      features: ['query', 'response'],
      num_rows: 8
  })
  [INFO:swift] Setting args.verbose: True
  [INFO:swift] save_result_path: /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873-merged/infer_result/20240830-171759.jsonl
  [INFO:swift] End time of running main: 2024-08-30 17:20:33.961291
  [rank0]:[W830 17:20:34.309521566 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
The merged files:
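The checkpoint-873-merged directory is now a self-contained full model; a quick sanity check with transformers (a hedged sketch):

  from transformers import AutoModelForCausalLM, AutoTokenizer

  merged = "/data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873-merged"
  tokenizer = AutoTokenizer.from_pretrained(merged)
  # No PeftModel wrapper is needed: the adapter is already folded into the weights.
  model = AutoModelForCausalLM.from_pretrained(merged, torch_dtype="auto", device_map="auto")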

4.2. Standalone Merging

Start the merge:
  conda activate swift
  swift export --ckpt_dir /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873 --merge_lora true
Configurable parameters:
  ExportArguments(model_type='qwen2-7b-instruct', model_id_or_path='/data/model/qwen2-7b-instruct', model_revision='master', sft_type='lora', template_type='qwen', infer_backend='vllm', ckpt_dir='/data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873', result_dir=None, load_args_from_ckpt_dir=True, load_dataset_config=False, eval_human=True, seed=42, dtype='fp16', model_kwargs=None, dataset=[], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.01, show_dataset_sample=-1, save_result=True, system='You are a helpful assistant.', tools_prompt='react_en', max_length=None, truncation_strategy='delete', check_dataset_strategy='none', model_name=[None, None], model_author=[None, None], quant_method='awq', quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='fp16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, stop_words=[], rope_scaling=None, use_flash_attn=None, ignore_args_error=False, stream=True, merge_lora=True, merge_device_map='auto', save_safetensors=True, overwrite_generation_config=True, verbose=None, local_repo_path=None, custom_register_path=None, custom_dataset_info='/data/service/swift/data/custom_dataset_info.json', device_map_config_path=None, device_max_memory=[], hub_token=None, gpu_memory_utilization=0.9, tensor_parallel_size=1, max_num_seqs=256, max_model_len=None, disable_custom_all_reduce=True, enforce_eager=False, vllm_enable_lora=False, vllm_max_lora_rank=16, lora_modules=[], tp=1, cache_max_entry_count=0.8, quant_policy=0, vision_batch_size=1, self_cognition_sample=0, train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, model_cache_dir=None, merge_lora_and_save=None, custom_train_dataset_path=[], custom_val_dataset_path=[], vllm_lora_modules=None, to_peft_format=False, to_ollama=False, ollama_output_dir=None, gguf_file=None, quant_bits=0, quant_n_samples=256, quant_seqlen=2048, quant_device_map='cpu', quant_output_dir=None, quant_batch_size=1, push_to_hub=False, hub_model_id=None, hub_private_repo=False, commit_message='update files', to_megatron=False, to_hf=False, megatron_output_dir=None, hf_output_dir=None, pp=1)
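Equivalently from Python, a hedged sketch using swift's export_main (arguments mirror the CLI call above):

  from swift.llm import export_main, ExportArguments

  # Assumption: equivalent to the `swift export` command above.
  export_main(ExportArguments(
      ckpt_dir="/data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873",
      merge_lora=True,
  ))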
Merge output:
  run sh: `/usr/local/miniconda3/envs/swift/bin/python /usr/local/miniconda3/envs/swift/lib/python3.10/site-packages/swift/cli/export.py --ckpt_dir /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873 --merge_lora true`
  [INFO:swift] Successfully registered `/usr/local/miniconda3/envs/swift/lib/python3.10/site-packages/swift/llm/data/dataset_info.json`
  [INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
  [INFO:swift] Start time of running main: 2024-08-30 22:20:48.465926
  [INFO:swift] ckpt_dir: /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873
  [INFO:swift] Successfully registered `/data/service/swift/data/custom_dataset_info.json`
  [INFO:swift] Setting model_info['revision']: master
  [INFO:swift] Setting self.eval_human: True
  [INFO:swift] Setting overwrite_generation_config: True
  [INFO:swift] args: ExportArguments(model_type='qwen2-7b-instruct', model_id_or_path='/data/model/qwen2-7b-instruct', model_revision='master', sft_type='lora', template_type='qwen', infer_backend='vllm', ckpt_dir='/data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873', result_dir=None, load_args_from_ckpt_dir=True, load_dataset_config=False, eval_human=True, seed=42, dtype='fp16', model_kwargs=None, dataset=[], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.01, show_dataset_sample=-1, save_result=True, system='You are a helpful assistant.', tools_prompt='react_en', max_length=None, truncation_strategy='delete', check_dataset_strategy='none', model_name=[None, None], model_author=[None, None], quant_method='awq', quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='fp16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, stop_words=[], rope_scaling=None, use_flash_attn=None, ignore_args_error=False, stream=True, merge_lora=True, merge_device_map='auto', save_safetensors=True, overwrite_generation_config=True, verbose=None, local_repo_path=None, custom_register_path=None, custom_dataset_info='/data/service/swift/data/custom_dataset_info.json', device_map_config_path=None, device_max_memory=[], hub_token=None, gpu_memory_utilization=0.9, tensor_parallel_size=1, max_num_seqs=256, max_model_len=None, disable_custom_all_reduce=True, enforce_eager=False, vllm_enable_lora=False, vllm_max_lora_rank=16, lora_modules=[], tp=1, cache_max_entry_count=0.8, quant_policy=0, vision_batch_size=1, self_cognition_sample=0, train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, model_cache_dir=None, merge_lora_and_save=None, custom_train_dataset_path=[], custom_val_dataset_path=[], vllm_lora_modules=None, to_peft_format=False, to_ollama=False, ollama_output_dir=None, gguf_file=None, quant_bits=0, quant_n_samples=256, quant_seqlen=2048, quant_device_map='cpu', quant_output_dir=None, quant_batch_size=1, push_to_hub=False, hub_model_id=None, hub_private_repo=False, commit_message='update files', to_megatron=False, to_hf=False, megatron_output_dir=None, hf_output_dir=None, pp=1)
  [INFO:swift] Global seed set to 42
  [INFO:swift] replace_if_exists: False
  [INFO:swift] merged_lora_path: `/data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873-merged`
  [INFO:swift] merge_device_map: auto
  [INFO:swift] device_count: 1
  [INFO:swift] Loading the model using model_dir: /data/model/qwen2-7b-instruct
  [INFO:swift] model_kwargs: {'low_cpu_mem_usage': True, 'device_map': 'auto'}
  Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [02:05<00:00, 31.46s/it]
  [INFO:swift] model.max_model_len: 32768
  [INFO:swift] generation_config: GenerationConfig {
    "do_sample": true,
    "eos_token_id": 151645,
    "max_new_tokens": 2048,
    "pad_token_id": 151643,
    "temperature": 0.3,
    "top_k": 20,
    "top_p": 0.7
  }
  [INFO:swift] PeftModelForCausalLM: 7619.0572M Params (0.0000M Trainable [0.0000%]), 234.8828M Buffers.
  [INFO:swift] system: You are a helpful assistant.
  [INFO:swift] Merge LoRA...
  [INFO:swift] Saving merged weights...
  [INFO:swift] Successfully merged LoRA and saved in /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873-merged.
  [INFO:swift] Setting args.sft_type: 'full'
  [INFO:swift] Setting args.ckpt_dir: /data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873-merged
  [INFO:swift] End time of running main: 2024-08-30 22:24:23.811071
The merged files:
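The merged directory can then be served directly; a hedged sketch with vLLM's offline API (sampling values mirror the generation_config above; the prompt is illustrative):

  from vllm import LLM, SamplingParams

  merged = "/data/model/sft/qwen2-7b-instruct-sft/qwen2-7b-instruct/v0-20240830-152615/checkpoint-873-merged"
  llm = LLM(model=merged, max_model_len=8192)

  params = SamplingParams(temperature=0.3, top_p=0.7, top_k=20, max_tokens=2048)
  # In practice, format the prompt with the Qwen chat template first.
  print(llm.generate(["Hello"], params)[0].outputs[0].text)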

