Artificial Intelligence: Deploying MiniCPM-V, the Strongest On-Device Multimodal Model (You'll Regret Skipping This)

忿忿的泥巴坨 posted on 2024-11-8 16:34:10


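MiniCPM-V is a powerful on-device multimodal large language model designed for efficient deployment on end devices.
The series currently includes MiniCPM-V 1.0, MiniCPM-V 2.0, and MiniCPM-Llama3-V 2.5.
MiniCPM-V 1.0: the first release in the series. It has basic multimodal capabilities and is the most lightweight version.
MiniCPM-V 2.0: this version offers efficient, advanced bilingual multimodal understanding on device. It can process high-resolution images of up to 1.8 million pixels, including images with an extreme 1:9 aspect ratio, encoding them efficiently and recognizing their content without loss.
It combines general multimodal ability, comprehensive OCR (optical character recognition), and handling of many types of data.
MiniCPM-Llama3-V 2.5: the latest of these versions, with 8 billion (8B) parameters, promoted as "the strongest on-device multimodal model". It was released and open-sourced on May 21, 2024, supports more than 30 languages, and is reported to outperform much larger multimodal models such as Gemini Pro and GPT-4V.
It reached the top of the trending lists on both HuggingFace and GitHub, which shows its influence and popularity in the open-source community.
MiniCPM-Llama3-V 2.5 emphasizes efficient inference on limited hardware (such as 8 GB of GPU memory), which makes it suitable for deployment on phones and other mobile devices.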
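The 8 GB figure above and the 19G of GPU memory requested by the web demo test later in this post can look contradictory. A back-of-the-envelope estimate of the weights alone suggests one possible explanation: at fp16 an 8B-parameter model already needs more than 8 GB, so the 8 GB claim most likely refers to quantized weights. The sketch below is only that arithmetic. The 4-bit case is an assumption for illustration, and the parameter count is the rounded "8B" from the text.

```python
# Rough, weights-only memory estimate. It ignores activations, the KV cache,
# and framework overhead, so real usage will be higher than these numbers.

GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


NUM_PARAMS = 8e9  # the rounded "8B" quoted in the post; the exact count may differ

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)  # assumption: a 4-bit quantized variant

print(f"fp16 weights: {fp16_gib:.1f} GiB")  # prints "fp16 weights: 14.9 GiB"
print(f"int4 weights: {int4_gib:.1f} GiB")  # prints "int4 weights: 3.7 GiB"
```

Under these assumptions the fp16 weights alone come to roughly 15 GiB, which is consistent with the demo asking for a card with about 19G once runtime overhead is added.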
GitHub project: https://github.com/OpenBMB/MiniCPM-V
I. Environment setup

1. Python environment
Python 3.10 or later is recommended.
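A quick way to confirm that the interpreter meets this recommendation before installing anything is to compare `sys.version_info` against the minimum. The helper name `python_is_supported` is made up for this sketch.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version recommended in this post


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when (major, minor) of the interpreter is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report on the interpreter that is running this snippet.
current = ".".join(str(part) for part in sys.version_info[:3])
if python_is_supported():
    print(f"Python {current} meets the 3.10+ recommendation")
else:
    print(f"Python {current} is older than the recommended 3.10")
```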
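2. Installing pip packages
The requirements.txt file belongs to the GitHub project above, so run the second command from inside a clone of that repository.
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 torchaudio==2.1.2 --extra-index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install jmespath -i https://pypi.tuna.tsinghua.edu.cn/simple
3. Model download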
(1) MiniCPM-V 1.0 model
git lfs install

git clone https://www.modelscope.cn/OpenBMB/MiniCPM-V.git
(2) MiniCPM-V 2.0 model
git lfs install

git clone https://www.modelscope.cn/OpenBMB/MiniCPM-V-2.git
(3) MiniCPM-V 2.5 (MiniCPM-Llama3-V 2.5) model
git lfs install

git clone https://www.modelscope.cn/OpenBMB/MiniCPM-Llama3-V-2_5.git
II. Functional testing
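The tests in this section can load the weights either from a directory cloned in the model download step or from a hub id, since `from_pretrained` accepts both. A minimal sketch of choosing between the two follows. The helper name `resolve_model_source` and the `./MiniCPM-Llama3-V-2_5` path are assumptions for illustration, and the check only looks for `config.json`, so it does not prove that git lfs fetched the full weight files.

```python
from pathlib import Path


def resolve_model_source(local_dir: str, hub_id: str) -> str:
    """Prefer a local clone when it contains config.json, else use the hub id."""
    path = Path(local_dir)
    if (path / "config.json").is_file():
        return str(path)
    return hub_id


# Hypothetical local path: wherever the git clone command was run.
source = resolve_model_source("./MiniCPM-Llama3-V-2_5", "openbmb/MiniCPM-Llama3-V-2_5")
print(f"Loading model from: {source}")
```

The returned string can then be passed as the first argument to `AutoModel.from_pretrained` and `AutoTokenizer.from_pretrained` in place of the hard-coded id.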

1. Web demo test
Testing the MiniCPM-V 2.5 model uses the first GPU, which needs at least 19 GB of memory.
CUDA_VISIBLE_DEVICES=0 python web_demo_2.5.py --device cuda
2. Python API test
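Because the fp16 model is loaded onto the first GPU, a short pre-flight check can catch an undersized card before a long model load fails. The sketch below reads the 19G figure quoted above as GiB, which is an assumption. The helper names are made up for illustration; the torch calls used are `torch.cuda.is_available` and `torch.cuda.get_device_properties`.

```python
GIB = 1024 ** 3
REQUIRED_GIB = 19  # figure quoted in this post; interpreting it as GiB is an assumption


def has_enough_vram(total_bytes: int, required_gib: float = REQUIRED_GIB) -> bool:
    """Return True when the card's total memory meets the requirement."""
    return total_bytes >= required_gib * GIB


def first_gpu_total_bytes():
    """Return the total memory of GPU 0 in bytes, or None without torch or CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    total = first_gpu_total_bytes()
    if total is None:
        print("No CUDA GPU is visible to PyTorch")
    elif has_enough_vram(total):
        print(f"GPU 0 has {total / GIB:.1f} GiB, enough for the fp16 model")
    else:
        print(f"GPU 0 has only {total / GIB:.1f} GiB, below the {REQUIRED_GIB} GiB target")
```

With `CUDA_VISIBLE_DEVICES=0` set, device index 0 inside the process is the first physical GPU, so the check looks at the same card the demo will use.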
from PIL import Image
import torch
from modelscope import AutoModel, AutoTokenizer

# Initialize the model and tokenizer
model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True, torch_dtype=torch.float16).to('cuda')
model.eval()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)

# Input image and question
image_path = 'example_image.jpg'
image = Image.open(image_path).convert('RGB')
query = 'Describe the scene depicted in this image.'

# Define the conversation messages
messages = [{'role': 'user', 'content': query}]

# Method 1: chat with the model (non-streaming)
response = model.chat(image=image, msgs=messages, tokenizer=tokenizer, sampling=True, temperature=0.5)
print("\nNon-streaming response:")
print(response)

# Method 2: chat with the model (streaming); with stream=True the call yields text chunks
print("\nStreaming response:")
stream_response = model.chat(image=image, msgs=messages, tokenizer=tokenizer, sampling=True, temperature=0.5, stream=True)
for text_chunk in stream_response:
    print(text_chunk, end='', flush=True)
print()
3. Test results
(1) Case 1
https://i-blog.csdnimg.cn/direct/8460c98e14654163b6be49c1f8c002fc.png
(2) Case 2
https://i-blog.csdnimg.cn/direct/3226fcd4065c4d258b42c9c0789847ac.png
(3) Case 3
https://i-blog.csdnimg.cn/direct/fdf11748cbff40a99861627b06b53683.png
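III. Summary

MiniCPM-V is an on-device multimodal large language model designed for vision-language understanding tasks.
It takes images and text together as input and generates text in response, which suits interactive applications such as virtual assistants, image captioning, and augmented reality.
Besides offering strong multimodal performance at low resource cost, MiniCPM-V is particularly well suited to running on the device itself (phones, embedded devices, and so on) without relying on cloud compute.
As the MiniCPM-V series continues to evolve, it is expected to play an important role in smart homes, wearables, mobile apps, autonomous driving, and other fields, helping to spread AI technology and new applications of it.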

https://i-blog.csdnimg.cn/direct/9550add1e2f2400a98819bbd4884dcf7.jpeg

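If you found this useful, please like and share. I will keep posting the latest R&D developments here.
If you would like to see more hands-on notes related to CogVLM2, feel free to leave a message to discuss.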
