5. Building and Using llama.cpp

宝塔山 · 2024-8-10 21:53:31

Building and using llama.cpp

Download the source

llama.cpp

https://github.com/ggerganov/llama.cpp

ggml tensor library (llama.cpp already bundles a copy, so a separate checkout is optional)

https://github.com/ggerganov/ggml

Install dependencies

cmake, used for the build: a reasonably recent version is needed; mine is 3.22
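
A quick way to check the installed cmake before building is sketched below. The threshold of 3.22 is only the version reported in this post, not a documented minimum for llama.cpp, and `version_ge` is a hypothetical helper name; it relies on `sort -V` being available.

```shell
#!/bin/bash
# Succeeds when version $1 is at least version $2 (relies on sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.22"   # assumption: the version used in this post, not an official minimum
if command -v cmake >/dev/null 2>&1; then
    # First line of the output looks like "cmake version 3.22.1"
    installed="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_ge "$installed" "$required"; then
        echo "cmake $installed is new enough"
    else
        echo "cmake $installed is older than $required; consider upgrading"
    fi
else
    echo "cmake not found; install it before building"
fi
```
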

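Build

With CUDA support (newer llama.cpp releases renamed the LLAMA_CUBLAS option to GGML_CUDA, so use -DGGML_CUDA=ON there)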

cd llama.cpp
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
make -j8

When the build finishes, the binaries are in the build/bin directory.

Download the model

From Meta's official site (quite a hassle)

https://ai.meta.com/llama/

https://huggingface.co/meta-llama

Download from Hugging Face

https://huggingface.co/

Linly: an open-source model project from China

https://github.com/CVI-SZU/Linly

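Model quantization

The Python scripts for preparing the model are in the llama.cpp source directory. Quantization is only worth doing when hardware resources are limited.
The quantize binary is in build/bin (newer builds name it llama-quantize).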

Download the model

https://huggingface.co/meta-llama/Llama-2-7b-hf

Model conversion
Convert the 7B model to ggml FP16 format. By default the output, ggml-model-f16.bin, is written into the model directory.

python convert.py models/llama-2-7b-hf/

In newer versions the default output is ggml-model-f16.gguf instead.
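
Because the output name depends on the llama.cpp version, a script can look for either file before moving on. This is a minimal sketch; `find_f16_model` is a hypothetical helper, and the directory is the one used in the commands of this post.

```shell
#!/bin/bash
# Print the path of the converted FP16 model, preferring the newer .gguf name.
find_f16_model() {
    local dir="$1" ext
    for ext in gguf bin; do
        if [ -f "$dir/ggml-model-f16.$ext" ]; then
            echo "$dir/ggml-model-f16.$ext"
            return 0
        fi
    done
    return 1
}

model_dir="./models/llama-2-7b-hf"
if f16="$(find_f16_model "$model_dir")"; then
    echo "found converted model: $f16"
else
    echo "no ggml-model-f16.gguf or .bin in $model_dir; run the conversion first"
fi
```
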

Model quantization
Quantize the model to 4 bits (using the q4_0 method), taking the FP16 model one step further.

./quantize ./models/llama-2-7b-hf/ggml-model-f16.bin ./models/llama-2-7b-hf/ggml-model-q4_0.bin q4_0
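
To get a feel for what 4-bit quantization saves, the arithmetic can be sketched as follows. It assumes a round 7e9 parameters for "7B" and 4.5 bits per weight for q4_0 (32 four-bit weights plus one 16-bit scale per block); real files differ somewhat because of metadata and tensors kept at other precisions. `estimate_mb` is a hypothetical helper name.

```shell
#!/bin/bash
# Rough size in MB for a parameter count and bits per weight.
# Bits are passed in tenths (45 means 4.5) so integer arithmetic is enough.
estimate_mb() {
    local params="$1" tenth_bits="$2"
    echo $(( params * tenth_bits / 80 / 1000000 ))
}

params=7000000000   # assumption: "7B" taken as a round 7e9
echo "FP16: ~$(estimate_mb "$params" 160) MB"   # prints FP16: ~14000 MB
echo "q4_0: ~$(estimate_mb "$params" 45) MB"    # prints q4_0: ~3937 MB
```

Under these assumptions the quantized model needs a bit more than a quarter of the FP16 size, which is why the step matters mainly on machines with limited memory.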

Model inference

The main binary is in build/bin (newer builds name it llama-cli).

./main -ngl 30 -m ./models/llama-2-7b-hf/ggml-model-q4_0.bin --color -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.0

Linly models

These you have to process yourself (convert and quantize as above).

Running a test

Test script

#!/bin/bash
# llama inference
#./main -ngl 30 -m ./models/7B/ggml-model-alpaca-7b-q4_0.gguf --color -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.3
# Linly base model
#./main -ngl 30 -m ./models/7B/linly-ggml-model-q4_0.bin --color -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.0
# Linly ChatFlow model
./main -ngl 30 -m ./models/chatflow_7b/linly-chatflow-7b-q4_0.bin --color -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.0
# whisper + llama
#./whisper/talk-llama -l zh -mw ./models/ggml-small_q4_0.bin -ml ./models/7B/ggml-model-alpaca-7b-q4_0.gguf -p "lfrobot" -t 8 -c 0 -vth 0.6 -fth 100 -pe

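Parameter notes
The more important parameters (the defaults listed are those of the build used here and may differ in newer releases):

-ins starts a ChatGPT-style interactive conversation mode
-f specifies the prompt template; for alpaca models, load the prompts/alpaca.txt instruction template
-c sets the context length; larger values let the model draw on a longer conversation history (default: 512)
-n sets the maximum length of the generated reply (default: 128)
--repeat_penalty sets how strongly repeated text in the reply is penalized
--temp temperature; lower values make replies less random, higher values more random
--top_p, --top_k parameters that control sampling during decoding
-b sets the batch size (default: 512)
-t sets the number of threads (default: 8); can be raised as appropriate
-ngl number of model layers to offload to the GPU (not the number of CUDA cores)
-m specifies the model file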