```bash
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run
```
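Before moving on, it is worth confirming that the GPU is actually visible from PyTorch. This is a minimal sanity-check sketch, assuming PyTorch has already been installed with CUDA support; it is not part of the original walkthrough.

```python
import torch

# Quick check that the CUDA install is usable from PyTorch.
print("CUDA available:", torch.cuda.is_available())
print("CUDA version seen by PyTorch:", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```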
Note: if you have already been granted access on Hugging Face, the manual download step can be skipped and you can fetch the model files directly through the Hugging Face API (see the sketch below). Once downloaded, the model is cached on the local file system, so it does not need to be downloaded again for later fine-tuning runs. In addition, after being granted access to a model, you also need to obtain an API key (access token) from Hugging Face and use it when accessing that model.
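As an illustration of the API route, the snippet below uses `snapshot_download` from the `huggingface_hub` library to pull a model repository into a local directory. The repo id, target directory, and the `"hf_..."` token are placeholders to replace with your own values; the token is only needed for gated models.

```python
from huggingface_hub import snapshot_download

# Download (or reuse the cached copy of) a model repository from Hugging Face.
# repo_id and local_dir are examples; the token is required only for gated models.
snapshot_download(
    repo_id="unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    local_dir="/mnt/d/02-LLM/LLM-APP/00-models/unsloth-llama-3.1-8b-bnb-4bit",
    token="hf_...",  # your Hugging Face access token; omit for public, non-gated models
)
```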
That said, downloading the model manually is fairly simple, so downloading it directly is recommended to keep the workflow straightforward.

Step 2: Loading the model
Create a new Jupyter notebook in an IDE such as VS Code or PyCharm (this makes debugging easier; while debugging you will be prompted about any libraries that are still missing and need to be installed), then copy in the following code:
```python
import json
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",           # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",         # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit",      # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",             # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",                # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",                 # Gemma 2x faster!
]  # More models at https://huggingface.co/unsloth

# Model to load: the directory below is where I saved the downloaded files on the local file system.
model_path = "/mnt/d/02-LLM/LLM-APP/00-models/unsloth-llama-3.1-8b-bnb-4bit"

# Load the model and tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_path,  # Choose any model from above list!
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...",  # use one if using gated models like meta-llama/Llama-2-7b-hf
)
```
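To confirm the model loaded correctly before starting fine-tuning, you can run a quick generation pass. The sketch below uses Unsloth's inference helper; the prompt text and generation settings are arbitrary examples, not part of the original walkthrough.

```python
# Switch the model into Unsloth's faster inference mode and try a short generation.
FastLanguageModel.for_inference(model)

inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```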