qidao123.com Tech Community - IT Enterprise Service Reviews · App Marketplace

Title: 5. Building and using llama.cpp

Author: 宝塔山 Time: 2024-8-10 21:53

Building and using llama.cpp

Download the source
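
The command for this step did not survive in this copy of the post. As a minimal sketch, assuming the source comes from the upstream GitHub repository (the URL below is an assumption, not taken from the post), fetching it could look like the following. The run helper and the DRY_RUN variable are hypothetical names introduced only for illustration; with DRY_RUN=1 the command is printed instead of executed.

```shell
#!/bin/bash
# Assumption: upstream repository URL; the post's own command was lost.
REPO_URL="${REPO_URL:-https://github.com/ggerganov/llama.cpp}"
SRC_DIR="${SRC_DIR:-llama.cpp}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone only if the target directory is not already a git checkout.
fetch_source() {
    if [ -d "$SRC_DIR/.git" ]; then
        echo "source already present in $SRC_DIR"
    else
        run git clone "$REPO_URL" "$SRC_DIR"
    fi
}

# Usage: fetch_source && cd "$SRC_DIR"
```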


Install dependencies

Build

With CUDA support
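
The build command itself is missing from this copy of the post. A rough sketch of a CMake build with CUDA enabled is given here, run from the llama.cpp source directory. The name of the CMake option is an assumption that depends on the checkout: older trees used LLAMA_CUBLAS, newer ones use GGML_CUDA, so check the README of your version and override CUDA_OPTION if needed. The run helper and DRY_RUN are hypothetical names used only to make the commands inspectable.

```shell
#!/bin/bash
# Assumption: the CUDA switch is GGML_CUDA in newer trees and LLAMA_CUBLAS
# in older ones; override CUDA_OPTION to match your checkout.
CUDA_OPTION="${CUDA_OPTION:-GGML_CUDA}"
BUILD_DIR="${BUILD_DIR:-build}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Configure into $BUILD_DIR with CUDA enabled, then compile in Release mode.
build_llama() {
    run cmake -B "$BUILD_DIR" "-D${CUDA_OPTION}=ON" &&
    run cmake --build "$BUILD_DIR" --config Release
}

# Usage (from the llama.cpp source directory): build_llama
```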


Finally, the executables are generated in the build/bin directory.

Download the model


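Model quantization

The Python code used for model quantization can be found in the llama.cpp directory. Quantize the model only when hardware resources are limited.
The quantize executable is in build/bin.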


Model conversion
Convert the 7B model to ggml FP16 format. By default this generates the ggml model ggml-model-f16.bin in the current directory.

Newer versions generate ggml-model-f16.gguf by default.
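
Because the converted file may be named either ggml-model-f16.bin or ggml-model-f16.gguf depending on the version, a script that continues from this step may want to detect which one exists. The following is only a sketch; find_f16_model is a hypothetical helper name, not part of llama.cpp.

```shell
#!/bin/bash
# Print the path of the FP16 model produced by conversion, preferring the
# newer .gguf name and falling back to the older .bin name.
find_f16_model() {
    dir="$1"
    for ext in gguf bin; do
        if [ -f "$dir/ggml-model-f16.$ext" ]; then
            echo "$dir/ggml-model-f16.$ext"
            return 0
        fi
    done
    echo "no ggml-model-f16.gguf or ggml-model-f16.bin found in $dir" >&2
    return 1
}

# Usage: src="$(find_f16_model ./models/llama-2-7b-hf)" || exit 1
```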

./quantize ./models/llama-2-7b-hf/ggml-model-f16.bin ./models/llama-2-7b-hf/ggml-model-q4_0.bin q4_0
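
The command above takes the input file, the output file, and the quantization type, in that order. As a sketch of how the output name can be derived rather than typed by hand, the helper below swaps "f16" for the quantization type, following the naming in the command above. Both function names are hypothetical. The lookup of llama-quantize reflects the renaming of the tools in newer llama.cpp builds; which name you have depends on your checkout.

```shell
#!/bin/bash
# Derive the quantized file name by replacing "f16" in the base name
# with the quantization type (for example q4_0).
quantized_name() {
    src="$1"
    qtype="$2"
    dir="$(dirname "$src")"
    base="$(basename "$src")"
    echo "$dir/$(printf '%s' "$base" | sed "s/f16/$qtype/")"
}

# Assumption: the tool is called quantize in older builds and
# llama-quantize in newer ones; return whichever exists in the bin dir.
find_quantize() {
    bin_dir="${1:-build/bin}"
    for name in llama-quantize quantize; do
        if [ -x "$bin_dir/$name" ]; then
            echo "$bin_dir/$name"
            return 0
        fi
    done
    return 1
}

# Usage (runs the real tool, so it is left commented out here):
#   src=./models/llama-2-7b-hf/ggml-model-f16.bin
#   "$(find_quantize)" "$src" "$(quantized_name "$src" q4_0)" q4_0
```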


Model inference

The main executable is in build/bin.

./main -ngl 30 -m ./models/llama-2-7b-hf/ggml-model-q4_0.bin --color -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.0
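
For readers unfamiliar with the options in the command above, the same invocation is rebuilt here one flag at a time with a short comment on each. This is a sketch, not a replacement for the tool's own help output: run_chat and DRY_RUN are hypothetical names, the values mirror the post, and the binary name and flags are those of the version the post uses, so newer builds may differ.

```shell
#!/bin/bash
# Build the inference command for a given model file step by step, then
# print it (DRY_RUN=1) or execute it.
run_chat() {
    model="$1"
    set -- ./main
    set -- "$@" -ngl 30                               # layers offloaded to the GPU
    set -- "$@" -m "$model"                           # model file to load
    set -- "$@" --color                               # colorize the output
    set -- "$@" -f ./prompts/chat-with-vicuna-v0.txt  # initial prompt read from a file
    set -- "$@" -ins                                  # instruction mode
    set -- "$@" -c 2048                               # context size in tokens
    set -- "$@" --temp 0.2                            # sampling temperature
    set -- "$@" -n 4096                               # number of tokens to predict
    set -- "$@" --repeat_penalty 1.0                  # 1.0 disables the repeat penalty
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: run_chat ./models/llama-2-7b-hf/ggml-model-q4_0.bin
```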


Linly models

Process these models yourself.
Run a test

#!/bin/bash
# llama inference
#./main -ngl 30 -m ./models/7B/ggml-model-alpaca-7b-q4_0.gguf --color -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.3
# linly base model
#./main -ngl 30 -m ./models/7B/linly-ggml-model-q4_0.bin --color -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.0
# linly chatflow model
./main -ngl 30 -m ./models/chatflow_7b/linly-chatflow-7b-q4_0.bin --color -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.0
# whisper llama
#./whisper/talk-llama -l zh -mw ./models/ggml-small_q4_0.bin -ml ./models/7B/ggml-model-alpaca-7b-q4_0.gguf -p "lfrobot" -t 8 -c 0 -vth 0.6 -fth 100 -pe

