Beyond GPT-3.5 performance, effectively unlimited context, and a powerful RAG trio: sharing the MiniCPM3-4B model


MiniCPM3-4B is a high-performance on-device AI model jointly developed by ModelBest (面壁智能) and the Tsinghua University Natural Language Processing Lab. It is the third generation of the MiniCPM series and has 4 billion parameters.
In benchmark performance, MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, and is competitive with a number of 7B-9B parameter models.
MiniCPM3-4B improves markedly on several fronts, including a larger vocabulary, more layers, and wider hidden dimensions, which strengthen its processing capability.
MiniCPM3-4B ships with a 32k context window and can, in principle, handle unbounded context, a major advantage for users who need to process large amounts of data and complex queries.
MiniCPM3-4B also supports more efficient code execution and function calling, letting developers implement complex tasks faster.
In addition, ModelBest released MiniCPM3-RAG-LoRA, a version fine-tuned for RAG scenarios, together with a RAG suite consisting of the MiniCPM-Embedding and MiniCPM-Reranker models.
GitHub project: https://github.com/OpenBMB/MiniCPM
I. Environment Setup

1. Python environment
Python 3.10 or later is recommended.
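As a quick sanity check (a minimal sketch, based only on the version recommendation above), the interpreter can be verified before installing anything:

import sys

# The examples below are written against Python 3.10+
assert sys.version_info >= (3, 10), f"Python 3.10+ recommended, found {sys.version.split()[0]}"
print(sys.version)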
2. pip installs
Install the CUDA 11.8 build of PyTorch first, then the project dependencies (requirements.txt ships with the MiniCPM repository linked above, so run this from a checkout of that repo):
pip install torch==2.3.0+cu118 torchvision==0.18.0+cu118 torchaudio==2.3.0 --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install datamodel_code_generator -i https://pypi.tuna.tsinghua.edu.cn/simple
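After the pip installs, a minimal verification sketch (not part of the original steps) confirms that the CUDA 11.8 build of PyTorch is active:

import torch

# Expect 2.3.0+cu118 and True on a correctly configured GPU machine
print(torch.__version__)
print(torch.cuda.is_available())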
3. Download the MiniCPM3-4B model
git lfs install
git clone https://modelscope.cn/models/OpenBMB/MiniCPM3-4B
4. Download the MiniCPM3-RAG-LoRA model
git lfs install
git clone https://modelscope.cn/models/OpenBMB/MiniCPM3-RAG-LoRA
5. Download the MiniCPM-Reranker model
git lfs install
git clone https://modelscope.cn/models/OpenBMB/MiniCPM-Reranker
6. Download the MiniCPM-Embedding model
git lfs install
git clone https://modelscope.cn/models/OpenBMB/MiniCPM-Embedding
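Alternatively, since the test script below already uses snapshot_download from modelscope, all four models can be fetched programmatically instead of via git lfs (a sketch using the same ModelScope repo IDs as above):

from modelscope import snapshot_download

# Each call downloads into the local ModelScope cache and returns the model directory
for repo in ["OpenBMB/MiniCPM3-4B", "OpenBMB/MiniCPM3-RAG-LoRA",
             "OpenBMB/MiniCPM-Reranker", "OpenBMB/MiniCPM-Embedding"]:
    print(snapshot_download(repo))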
II. Functional Testing

1. Running the tests
(1) Calling the models from Python
import torch
import torch.nn.functional as F
import numpy as np
from modelscope import AutoModelForCausalLM, AutoModel, AutoTokenizer, snapshot_download
from transformers import AutoModelForSequenceClassification
from peft import PeftModel
def MiniCPM3_4B_inference(message, model_path="OpenBMB/MiniCPM3-4B", device="cuda"):
    """Plain chat inference with the base MiniCPM3-4B model."""
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)
    messages = [{"role": "user", "content": message}]
    # add_generation_prompt=True appends the assistant turn marker so the model starts answering
    model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)
    model_outputs = model.generate(
        model_inputs,
        max_new_tokens=1024,
        top_p=0.7,
        temperature=0.7,
        repetition_penalty=1.02
    )
    # Drop the prompt tokens so only the newly generated text is decoded
    output_token_ids = [model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))]
    responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
    return responses
def MiniCPM3_RAG_LoRA_inference(instruction, passages_list, base_model_dir="OpenBMB/MiniCPM3-4B", lora_model_dir="OpenBMB/MiniCPM3-RAG-LoRA"):
    """RAG-style inference: base MiniCPM3-4B with the RAG LoRA adapter applied."""
    base_model_dir = snapshot_download(base_model_dir)
    lora_model_dir = snapshot_download(lora_model_dir)
    # trust_remote_code=True is required because MiniCPM3 is a custom architecture
    model = AutoModelForCausalLM.from_pretrained(base_model_dir, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True).eval()
    tokenizer = AutoTokenizer.from_pretrained(lora_model_dir)
    model = PeftModel.from_pretrained(model, lora_model_dir)
    # Retrieved passages are prepended to the instruction as background context
    passages = '\n'.join(passages_list)
    input_text = 'Background:\n' + passages + '\n\n' + instruction
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": input_text},
    ]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    # chat() is provided by the model's remote code
    outputs = model.chat(tokenizer, prompt, temperature=0.8, top_p=0.8)
    return outputs[0]
def MiniCPM_Embedding_inference(queries, passages, model_name="OpenBMB/MiniCPM-Embedding", device="cuda"):
    """Dense retrieval: embed queries and passages, return similarity scores."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # flash_attention_2 requires the flash-attn package; drop the argument if it is not installed
    model = AutoModel.from_pretrained(model_name, trust_remote_code=True, attn_implementation="flash_attention_2", torch_dtype=torch.float16).to(device)
    model.eval()

    def weighted_mean_pooling(hidden, attention_mask):
        # Position-weighted mean pooling: later tokens get linearly larger weights (via cumsum)
        attention_mask_ = attention_mask * attention_mask.cumsum(dim=1)
        s = torch.sum(hidden * attention_mask_.unsqueeze(-1).float(), dim=1)
        d = attention_mask_.sum(dim=1, keepdim=True).float()
        return s / d

    @torch.no_grad()
    def encode(input_texts):
        batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt', return_attention_mask=True).to(device)
        outputs = model(**batch_dict)
        hidden = outputs.last_hidden_state
        reps = weighted_mean_pooling(hidden, batch_dict["attention_mask"])
        # L2-normalize so the dot product below equals cosine similarity
        embeddings = F.normalize(reps, p=2, dim=1).detach().cpu().numpy()
        return embeddings

    INSTRUCTION = "Query: "
    queries = [INSTRUCTION + query for query in queries]
    embeddings_query = encode(queries)
    embeddings_doc = encode(passages)
    scores = embeddings_query @ embeddings_doc.T
    return scores.tolist()
def MiniCPM_Reranker_rerank(queries, passages, model_name='OpenBMB/MiniCPM-Reranker', device="cuda", max_len_q=512, max_len_d=512):
    """Cross-encoder reranking: score each (query, passage) pair jointly."""
    model_name = snapshot_download(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    tokenizer.padding_side = "right"
    # flash_attention_2 requires the flash-attn package; drop the argument if it is not installed
    model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True, attn_implementation="flash_attention_2", torch_dtype=torch.float16).to(device)
    model.eval()

    def tokenize_our(query, doc):
        # Concatenate as <bos> query <eos> doc, then pad to a fixed length
        input_id_query = tokenizer.encode(query, add_special_tokens=False, max_length=max_len_q, truncation=True)
        input_id_doc = tokenizer.encode(doc, add_special_tokens=False, max_length=max_len_d, truncation=True)
        pad_input = {"input_ids": [tokenizer.bos_token_id] + input_id_query + [tokenizer.eos_token_id] + input_id_doc}
        return tokenizer.pad(
            pad_input,
            padding="max_length",
            max_length=max_len_q + max_len_d + 2,
            return_tensors="pt",
        )

    @torch.no_grad()
    def rerank(input_query, input_docs):
        tokenized_inputs = [tokenize_our(input_query, input_doc).to(device) for input_doc in input_docs]
        # Batch the per-document encodings into single tensors
        input_ids = {
            "input_ids": [tokenized_input["input_ids"] for tokenized_input in tokenized_inputs],
            "attention_mask": [tokenized_input["attention_mask"] for tokenized_input in tokenized_inputs]
        }
        for k in input_ids:
            input_ids[k] = torch.stack(input_ids[k]).to(device)
        outputs = model(**input_ids)
        # The classification logit is the relevance score; higher means more relevant
        return outputs.logits.float().detach().cpu().numpy()

    INSTRUCTION = "Query: "
    queries = [INSTRUCTION + query for query in queries]
    scores = [rerank(query, docs) for query, docs in zip(queries, passages)]
    return np.array(scores)
def main():
    # Example use cases
    # The prompt asks (in Chinese) for five sightseeing spots in Beijing
    response_4B = MiniCPM3_4B_inference("推荐5个北京的景点。")
    print(f"MiniCPM3-4B Response: {response_4B}")

    instruction = "Q: What is the name of the lead character in the novel 'The Silent Watcher'?\nA:"
    passages_list = [
        "In the novel 'The Silent Watcher,' the lead character is named Alex Carter. Alex is a private detective who uncovers a series of mysterious events in a small town.",
        "Set in a quiet town, 'The Silent Watcher' follows Alex Carter, a former police officer turned private investigator, as he unravels the town's dark secrets.",
        "'The Silent Watcher' revolves around Alex Carter's journey as he confronts his past while solving complex cases in his hometown."
    ]
    response_RAG_LoRA = MiniCPM3_RAG_LoRA_inference(instruction, passages_list)
    print(f"MiniCPM3-RAG-LoRA Response: {response_RAG_LoRA}")

    queries = ["China capital?"]
    passages = ["beijing", "shanghai"]
    scores_embedding = MiniCPM_Embedding_inference(queries, passages)
    print(f"MiniCPM-Embedding Scores: {scores_embedding}")

    rerank_queries = ["China capital?"]
    rerank_passages = [["beijing", "shanghai"]]
    scores_reranker = MiniCPM_Reranker_rerank(rerank_queries, rerank_passages)
    print(f"MiniCPM-Reranker Scores: {scores_reranker}")

if __name__ == "__main__":
    main()
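A note on reading the output: because the embeddings are L2-normalized, the MiniCPM-Embedding scores are cosine similarities in [-1, 1], and for the reranker a higher logit means higher relevance, so "beijing" should outscore "shanghai" for the query above. Also, this script loads all four models in a single process, which requires a GPU with ample memory; running the four tests separately is a safer default.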
To be continued...
For more details, follow: 杰哥新技能

