络腮胡菲菲 · 2024-9-4 03:38:27

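LLMs / Llama 3: LoRA instruction fine-tuning of the llama-3-8b-Instruct-bnb-4bit model on the alpaca dataset with LLaMA-Factory on Colab (free T4 GPU), via the CLI or the point-and-click GUI (backed by the unsloth optimization framework: 5-30x faster training and 50% lower memory usage) → inference test → merging weights via the CLI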

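Contents
LoRA instruction fine-tuning of the llama-3-8b-Instruct-bnb-4bit model on the alpaca dataset with LLaMA-Factory on Colab (free T4 GPU), via the CLI or the point-and-click GUI (backed by the unsloth optimization framework: 5-30x faster training and 50% lower memory usage) → inference test → merging weights via the CLI
# 1. Install dependencies
# 1.1. Clone the LLaMA-Factory repository and install the required Python packages, including unsloth, xformers, and bitsandbytes.
# 1.2. Check the GPU environment to make sure Colab's Tesla T4 GPU is available.
# 2. Update the identity dataset
# 3. Fine-tune the model
# T1. Fine-tune the model via LLaMA Board
# T2. Fine-tune the model via the command line (training takes about 30 minutes)
# 4. Run inference
# 5. Merge the LoRA adapter and optionally upload the model
Implementation code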

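LoRA instruction fine-tuning of the llama-3-8b-Instruct-bnb-4bit model on the alpaca dataset with LLaMA-Factory on Colab (free T4 GPU), via the CLI or the point-and-click GUI (backed by the unsloth optimization framework: 5-30x faster training and 50% lower memory usage) → inference test → merging weights via the CLI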

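# 1. Install dependencies

# 1.1. Clone the LLaMA-Factory repository and install the required Python packages, including unsloth, xformers, and bitsandbytes.

# 1.2. Check the GPU environment to make sure Colab's Tesla T4 GPU is available.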
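
The GPU check can also report which device was assigned and how much memory it has, which helps when a Colab session hands out something other than a T4. The helper below is a minimal sketch, not part of the original notebook: `describe_gpu` is a hypothetical name, and the Colab menu path in the hint is an assumption about the current Colab UI.

```python
def describe_gpu() -> str:
    """Return a one-line summary of the first CUDA device, or a hint if none is usable."""
    try:
        import torch  # imported lazily so the helper also runs where PyTorch is absent
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        # Assumed menu path in Colab; adjust if the UI differs.
        return "No CUDA GPU detected; in Colab try Runtime > Change runtime type > T4 GPU"
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3  # bytes -> GiB
    return f"{props.name}: {total_gib:.1f} GiB"


print(describe_gpu())
```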

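# 2. Update the identity dataset

# Read identity.json and replace its placeholders with "Llama-3" and "LLaMA Factory".
# This step personalizes the training data so that the model produces replies consistent with that identity.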
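
To see what the placeholder substitution does without touching the file on disk, the sketch below applies the same `str.replace` chain to a few in-memory records. The two sample records are hypothetical and only mimic the alpaca-style layout (`instruction`, `input`, `output`); `fill_identity` is an illustrative helper name, not something LLaMA-Factory provides.

```python
import copy

NAME = "Llama-3"
AUTHOR = "LLaMA Factory"

# Hypothetical records in an alpaca-style layout, for illustration only.
samples = [
    {"instruction": "Who are you?", "input": "",
     "output": "I am {{name}}, an AI assistant developed by {{author}}."},
    {"instruction": "Who made you?", "input": "",
     "output": "{{author}} built me."},
]


def fill_identity(records, name, author):
    """Return a copy of the records with {{name}} and {{author}} filled in."""
    filled = copy.deepcopy(records)  # leave the input list untouched
    for record in filled:
        record["output"] = (
            record["output"].replace("{{name}}", name).replace("{{author}}", author)
        )
    return filled


filled = fill_identity(samples, NAME, AUTHOR)
for record in filled:
    print(record["output"])
```

Working on a copy keeps the template records reusable, so the same data can be re-personalized with a different name later.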

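# 3. Fine-tune the model

# T1. Fine-tune the model via LLaMA Board

# Use the llamafactory-cli command-line tool to launch the Web UI (LLaMA Board), where the fine-tuning run can be configured and monitored.

# T2. Fine-tune the model via the command line (training takes about 30 minutes)

# Define the fine-tuning arguments: the model, datasets, template, fine-tuning type (LoRA adapters here), output directory, batch size, learning-rate scheduler, logging interval, and so on.
# Start fine-tuning with the llamafactory-cli command-line tool.
# The techniques used in this step are LoRA adapters (to save memory), 4-bit quantization, the LoRA+ algorithm, and float16 mixed-precision training.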
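
The arguments interact in ways that are easy to miss, so the following sketch works through the arithmetic behind the values used in this post (batch size 2, gradient accumulation 4, 3 epochs, 500 samples per dataset, learning rate 5e-5, LoRA+ ratio 16). It assumes a single GPU and treats `max_samples` as an upper bound per dataset. The LoRA+ line reflects the usual definition of the ratio (learning rate of the B matrices divided by that of the A matrices), not a reading of LLaMA-Factory's source.

```python
import math

# Values taken from the training arguments in this post.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_train_epochs = 3
max_samples = 500          # cap per dataset
num_datasets = 2           # identity and alpaca_gpt4_en
learning_rate = 5e-5
loraplus_lr_ratio = 16.0
warmup_ratio = 0.1

# Examples consumed per optimizer step (single-GPU assumption).
effective_batch = per_device_train_batch_size * gradient_accumulation_steps

# Upper bound: a dataset with fewer than max_samples examples contributes less.
max_examples = max_samples * num_datasets
steps_per_epoch = math.ceil(max_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

# Assumption: LoRA+ trains the B matrices with ratio * base learning rate.
lora_b_lr = learning_rate * loraplus_lr_ratio

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps (upper bound): {total_steps}")
print(f"warmup steps (upper bound): {warmup_steps}")
print(f"LoRA+ learning rate for B matrices: {lora_b_lr:.1e}")
```

Under these assumptions the run uses an effective batch of 8 and at most 375 optimizer steps, so `save_steps=1000` never triggers an intermediate checkpoint and only the final adapter is written.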

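# 4. Run inference

# Run inference with the fine-tuned model to test and validate its behavior.
# Set the model arguments, load the LoRA adapter saved during fine-tuning, and initialize a ChatModel instance.
# Interact with the model through a CLI loop: type a query and receive the generated text.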
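
The CLI loop keeps the conversation as a list of role/content dictionaries and appends one user turn and one assistant turn per exchange. The sketch below shows that bookkeeping without loading any model: `fake_stream_chat` is a hypothetical stand-in that simply echoes the last user message in chunks, and `run_turn` is an illustrative helper name.

```python
from typing import Dict, Iterator, List

Message = Dict[str, str]


def fake_stream_chat(messages: List[Message]) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: echo the last user turn word by word."""
    last_user = messages[-1]["content"]
    for word in f"You said: {last_user}".split(" "):
        yield word + " "


def run_turn(messages: List[Message], query: str) -> str:
    """Append the user query, collect the streamed reply, and store it in the history."""
    messages.append({"role": "user", "content": query})
    response = ""
    for new_text in fake_stream_chat(messages):
        response += new_text  # a real CLI would also print each chunk as it arrives
    messages.append({"role": "assistant", "content": response})
    return response


history: List[Message] = []
run_turn(history, "hello")
run_turn(history, "who are you?")
print([m["role"] for m in history])
```

Because the whole history is passed on every turn, clearing the list is what resets the conversation context.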

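# 5. Merge the LoRA adapter and optionally upload the model

# Define the arguments for merging the LoRA adapter into the original model and specify the output directory.
# Run the export with the llamafactory-cli command-line tool.
# Note: because of the RAM limit on the free Colab tier, the 8B model cannot be merged in this environment.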
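
A back-of-the-envelope estimate makes the RAM limit concrete. The sketch below counts only the bytes needed to hold the weights, assuming a round 8 billion parameters and 2 bytes per parameter for the non-quantized 16-bit model; the actual merge needs extra working memory on top of that, which is in line with the notebook's figure of at least 18 GB. `weights_gib` is an illustrative helper name.

```python
def weights_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 8.0e9        # assumption: round figure for an "8B" model
COLAB_FREE_RAM_GIB = 12   # RAM figure quoted in the notebook's note

fp16_gib = weights_gib(NUM_PARAMS, 2)     # 16-bit weights, as loaded for merging
int4_gib = weights_gib(NUM_PARAMS, 0.5)   # 4-bit weights, as used for training here

print(f"16-bit weights: {fp16_gib:.1f} GiB (fits in free Colab RAM: {fp16_gib < COLAB_FREE_RAM_GIB})")
print(f"4-bit weights:  {int4_gib:.1f} GiB (fits in free Colab RAM: {int4_gib < COLAB_FREE_RAM_GIB})")
```

This is why the 4-bit model can be fine-tuned on the free tier while the merge, which loads the full-precision base model on the CPU, cannot.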

Implementation code

Source notebook: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing#scrollTo=kTESHaFvbNTr

'''
Finetune Llama-3 with LLaMA Factory
Please use a free Tesla T4 Colab GPU to run this!
April 22
Source notebook: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing#scrollTo=kTESHaFvbNTr
'''
# 1. Install dependencies
# 1.1. Clone the LLaMA-Factory repository and install the required Python packages, including unsloth, xformers, and bitsandbytes.
%cd /content/
%rm -rf LLaMA-Factory
!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
%ls
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers==0.0.25
!pip install .[bitsandbytes]
# 1.2. Check the GPU environment to make sure Colab's Tesla T4 GPU is available.
import torch
try:
    assert torch.cuda.is_available() is True
except AssertionError:
    print("Please set up a GPU before using LLaMA Factory: https://medium.com/mlearning-ai/training-yolov4-on-google-colab-316f8fff99c6")
# 2. Update the identity dataset
# Read identity.json and replace its placeholders with "Llama-3" and "LLaMA Factory".
# This step personalizes the training data so that the model produces replies consistent with that identity.
import json
NAME = "Llama-3"
AUTHOR = "LLaMA Factory"
with open("data/identity.json", "r", encoding="utf-8") as f:
    dataset = json.load(f)
for sample in dataset:
    # Fill the {{name}} and {{author}} placeholders in every answer.
    sample["output"] = sample["output"].replace("{{"+ "name" + "}}", NAME).replace("{{"+ "author" + "}}", AUTHOR)
with open("data/identity.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, indent=2, ensure_ascii=False)
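# 3. Fine-tune the model
# T1. Fine-tune the model via LLaMA Board
# Use the llamafactory-cli command-line tool to launch the Web UI (LLaMA Board), where the fine-tuning run can be configured and monitored.
%cd /content/LLaMA-Factory/
!GRADIO_SHARE=1 llamafactory-cli webui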
# T2. Fine-tune the model via the command line. It takes ~30min for training.
# Define the fine-tuning arguments: the model, datasets, template, fine-tuning type (LoRA adapters here), output directory, batch size, learning-rate scheduler, logging interval, and so on.
# Start fine-tuning with the llamafactory-cli command-line tool.
# The techniques used in this step are LoRA adapters (to save memory), 4-bit quantization, the LoRA+ algorithm, and float16 mixed-precision training.
import json
args = dict(
    stage="sft",                                                # do supervised fine-tuning
    do_train=True,
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # use bnb-4bit-quantized Llama-3-8B-Instruct model
    dataset="identity,alpaca_gpt4_en",                          # use alpaca and identity datasets
    template="llama3",                                          # use llama3 prompt template
    finetuning_type="lora",                                     # use LoRA adapters to save memory
    lora_target="all",                                          # attach LoRA adapters to all linear layers
    output_dir="llama3_lora",                                   # the path to save LoRA adapters
    per_device_train_batch_size=2,                              # the batch size
    gradient_accumulation_steps=4,                              # the gradient accumulation steps
    lr_scheduler_type="cosine",                                 # use cosine learning rate scheduler
    logging_steps=10,                                           # log every 10 steps
    warmup_ratio=0.1,                                           # use warmup scheduler
    save_steps=1000,                                            # save checkpoint every 1000 steps
    learning_rate=5e-5,                                         # the learning rate
    num_train_epochs=3.0,                                       # the epochs of training
    max_samples=500,                                            # use 500 examples in each dataset
    max_grad_norm=1.0,                                          # clip gradient norm to 1.0
    quantization_bit=4,                                         # use 4-bit QLoRA
    loraplus_lr_ratio=16.0,                                     # use LoRA+ algorithm with lambda=16.0
    use_unsloth=True,                                           # use UnslothAI's LoRA optimization for 2x faster training
    fp16=True,                                                  # use float16 mixed precision training
)
# Write the arguments to a JSON file that llamafactory-cli can read.
with open("train_llama3.json", "w", encoding="utf-8") as f:
    json.dump(args, f, indent=2)
%cd /content/LLaMA-Factory/
!llamafactory-cli train train_llama3.json
# 4. Run inference
# Run inference with the fine-tuned model to test and validate its behavior.
# Set the model arguments, load the LoRA adapter saved during fine-tuning, and initialize a ChatModel instance.
# Interact with the model through a CLI loop: type a query and receive the generated text.
try:
    # Package name in newer LLaMA-Factory releases.
    from llamafactory.chat import ChatModel
    from llamafactory.extras.misc import torch_gc
except ImportError:
    # Older releases expose the same modules under the name llmtuner.
    from llmtuner.chat import ChatModel
    from llmtuner.extras.misc import torch_gc
%cd /content/LLaMA-Factory/
args = dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # use bnb-4bit-quantized Llama-3-8B-Instruct model
    adapter_name_or_path="llama3_lora",                         # load the saved LoRA adapters
    template="llama3",                                          # same to the one in training
    finetuning_type="lora",                                     # same to the one in training
    quantization_bit=4,                                         # load 4-bit quantized model
    use_unsloth=True,                                           # use UnslothAI's LoRA optimization for 2x faster generation
)
chat_model = ChatModel(args)
messages = []
print("Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.")
while True:
    query = input("\nUser: ")
    if query.strip() == "exit":
        break
    if query.strip() == "clear":
        messages = []
        torch_gc()
        print("History has been removed.")
        continue
    messages.append({"role": "user", "content": query})
    print("Assistant: ", end="", flush=True)
    response = ""
    # Print tokens as they are generated and accumulate the full reply.
    for new_text in chat_model.stream_chat(messages):
        print(new_text, end="", flush=True)
        response += new_text
    print()
    messages.append({"role": "assistant", "content": response})
torch_gc()
# 5. Merge the LoRA adapter and optionally upload the model
# Define the arguments for merging the LoRA adapter into the original model and specify the output directory.
# Run the export with the llamafactory-cli command-line tool.
# NOTE: the Colab free version has merely 12GB RAM, where merging LoRA of a 8B model needs at least 18GB RAM, thus you cannot perform it in the free version.
# !huggingface-cli login
import json
args = dict(
    model_name_or_path="meta-llama/Meta-Llama-3-8B-Instruct",  # use official non-quantized Llama-3-8B-Instruct model
    adapter_name_or_path="llama3_lora",                        # load the saved LoRA adapters
    template="llama3",                                         # same to the one in training
    finetuning_type="lora",                                    # same to the one in training
    export_dir="llama3_lora_merged",                           # the path to save the merged model
    export_size=2,                                             # the file shard size (in GB) of the merged model
    export_device="cpu",                                       # the device used in export, can be chosen from `cpu` and `cuda`
    #export_hub_model_id="your_id/your_model",                 # the Hugging Face hub ID to upload model
)
# Write the arguments to a JSON file that llamafactory-cli can read.
with open("merge_llama3.json", "w", encoding="utf-8") as f:
    json.dump(args, f, indent=2)
%cd /content/LLaMA-Factory/
!llamafactory-cli export merge_llama3.json
