Mistral AI open-sources yet another of its closed-source enterprise models: Mistral-Small-Instruct-2409


Not long after open-sourcing the Pixtral 12B vision multimodal model, Mistral has now open-sourced its enterprise-grade small model, Mistral-Small-Instruct-2409 (22B). It is Mistral AI's latest enterprise small model and an upgrade of Mistral Small v24.02. The model is available under the Mistral Research License and gives customers a flexible option: a cost-effective, fast, and reliable solution for translation, summarization, sentiment analysis, and other tasks that do not require a full general-purpose model.

Mistral Small was originally based on Mixtral-8X7B-v0.1 (46.7B), a sparse mixture-of-experts model with roughly 12B active parameters. The new model has stronger reasoning and broader capabilities: it can generate and reason about code, and it is multilingual, supporting English, French, German, Italian, and Spanish.
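As a back-of-the-envelope check on those numbers: the gap between 46.7B total and roughly 12-13B active parameters falls out of top-2 routing over 8 expert FFNs per layer, while attention and embeddings always run. A minimal sketch, using Mixtral-8x7B's publicly reported configuration values (treat the exact figures as assumptions, not an official count):

  # Rough parameter accounting for Mixtral-8x7B-v0.1; config values are the
  # publicly reported ones, so this is an approximation.
  n_layers, d_model, d_ff, vocab = 32, 4096, 14336, 32000
  n_experts, top_k = 8, 2
  n_heads, n_kv_heads, head_dim = 32, 8, 128

  expert = 3 * d_model * d_ff                     # gate/up/down projections per expert FFN
  attn = (d_model * n_heads * head_dim            # Wq
          + 2 * d_model * n_kv_heads * head_dim   # Wk, Wv (grouped-query attention)
          + n_heads * head_dim * d_model)         # Wo
  embed = 2 * vocab * d_model                     # input embeddings + LM head

  total = n_layers * (n_experts * expert + attn) + embed
  active = n_layers * (top_k * expert + attn) + embed  # only top-2 experts run per token
  print(f"total ~ {total / 1e9:.1f}B, active ~ {active / 1e9:.1f}B")  # ~46.7B / ~12.9B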
This is exciting; Mistral models consistently perform exceptionally well. We now have excellent coverage across most of the size gaps:


  • 8b- Llama 3.1 8b
  • 12b- Nemo 12b
  • 22b- Mistral Small
  • 27b- Gemma-2 27b
  • 35b- Command-R 35b 08-2024
  • 40-60b- GAP (I believe there are two new MoEs in this range, but last I checked llama.cpp doesn't support them)
  • 70b- Llama 3.1 70b
  • 103b- Command-R+ 103b
  • 123b- Mistral Large 2
  • 141b- WizardLM-2 8x22b
  • 230b- Deepseek V2/2.5
  • 405b- Llama 3.1 405b


With 22 billion parameters, Mistral Small v24.09 offers customers a convenient midpoint between Mistral NeMo 12B and Mistral Large 2, providing a cost-effective solution that can be deployed across a wide range of platforms and environments. As Mistral's benchmark comparison chart shows, the new small model delivers significant improvements over previous models in human alignment, reasoning, and code.


Mistral-Small-Instruct-2409 is an instruction-tuned version with the following characteristics (a quick sanity check against the published config follows the list):


  • 22B parameters
  • Vocabulary size of 32768
  • Supports function calling
  • 128k sequence length
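These figures can be verified against the repository's config (a minimal sketch, assuming `transformers` is installed and you have access to the model on the Hugging Face Hub; the expected values in the comments come from the list above):

  from transformers import AutoConfig

  # Pull only the config; no weights are downloaded.
  config = AutoConfig.from_pretrained("mistralai/Mistral-Small-Instruct-2409")
  print(config.vocab_size)               # expected: 32768
  print(config.max_position_embeddings)  # expected: on the order of 128k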
Usage

vLLM (recommended)

Install vLLM >= v0.6.1.post1
  pip install --upgrade vllm
Install mistral_common >= 1.4.1
  pip install --upgrade mistral_common
Offline
  from vllm import LLM
  from vllm.sampling_params import SamplingParams

  model_name = "mistralai/Mistral-Small-Instruct-2409"

  sampling_params = SamplingParams(max_tokens=8192)

  # Note that running Mistral-Small on a single GPU requires at least 44 GB of GPU RAM.
  # If you want to divide the GPU requirement over multiple devices, add e.g. `tensor_parallel_size=2`.
  llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")

  prompt = "How often does the letter r occur in Mistral?"

  messages = [
      {
          "role": "user",
          "content": prompt,
      },
  ]

  outputs = llm.chat(messages, sampling_params=sampling_params)

  print(outputs[0].outputs[0].text)
Server
  vllm serve mistralai/Mistral-Small-Instruct-2409 --tokenizer_mode mistral --config_format mistral --load_format mistral
Note: running Mistral-Small on a single GPU requires at least 44 GB of GPU RAM.
If you want to divide the GPU requirement over multiple devices, add a tensor-parallel flag, as shown below.
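For example, a two-GPU launch (the flag name follows vLLM's serve CLI):
  vllm serve mistralai/Mistral-Small-Instruct-2409 --tokenizer_mode mistral --config_format mistral --load_format mistral --tensor-parallel-size 2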
Client
  curl --location 'http://<your-node-url>:8000/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer token' \
  --data '{
      "model": "mistralai/Mistral-Small-Instruct-2409",
      "messages": [
        {
          "role": "user",
          "content": "How often does the letter r occur in Mistral?"
        }
      ]
  }'
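Because vLLM serves an OpenAI-compatible API, the same request also works from Python (a minimal sketch, assuming the `openai` package is installed; the URL and token are placeholders, as in the curl call):

  from openai import OpenAI

  # Point the standard OpenAI client at the vLLM server started above.
  client = OpenAI(base_url="http://<your-node-url>:8000/v1", api_key="token")
  response = client.chat.completions.create(
      model="mistralai/Mistral-Small-Instruct-2409",
      messages=[{"role": "user", "content": "How often does the letter r occur in Mistral?"}],
  )
  print(response.choices[0].message.content)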
Mistral-inference

Install mistral_inference >= 1.4.1
  pip install mistral_inference --upgrade
Download
  from huggingface_hub import snapshot_download
  from pathlib import Path

  mistral_models_path = Path.home().joinpath('mistral_models', '22B-Instruct-Small')
  mistral_models_path.mkdir(parents=True, exist_ok=True)

  snapshot_download(repo_id="mistralai/Mistral-Small-Instruct-2409", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
Chat
  mistral-chat $HOME/mistral_models/22B-Instruct-Small --instruct --max_tokens 256
Instruction following
  from mistral_inference.transformer import Transformer
  from mistral_inference.generate import generate
  from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
  from mistral_common.protocol.instruct.messages import UserMessage
  from mistral_common.protocol.instruct.request import ChatCompletionRequest

  tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
  model = Transformer.from_folder(mistral_models_path)

  completion_request = ChatCompletionRequest(messages=[UserMessage(content="How often does the letter r occur in Mistral?")])

  tokens = tokenizer.encode_chat_completion(completion_request).tokens

  out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
  result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

  print(result)
Function calling
  from mistral_common.protocol.instruct.tool_calls import Function, Tool
  from mistral_inference.transformer import Transformer
  from mistral_inference.generate import generate
  from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
  from mistral_common.protocol.instruct.messages import UserMessage
  from mistral_common.protocol.instruct.request import ChatCompletionRequest

  tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
  model = Transformer.from_folder(mistral_models_path)

  completion_request = ChatCompletionRequest(
      tools=[
          Tool(
              function=Function(
                  name="get_current_weather",
                  description="Get the current weather",
                  parameters={
                      "type": "object",
                      "properties": {
                          "location": {
                              "type": "string",
                              "description": "The city and state, e.g. San Francisco, CA",
                          },
                          "format": {
                              "type": "string",
                              "enum": ["celsius", "fahrenheit"],
                              "description": "The temperature unit to use. Infer this from the users location.",
                          },
                      },
                      "required": ["location", "format"],
                  },
              )
          )
      ],
      messages=[
          UserMessage(content="What's the weather like today in Paris?"),
      ],
  )

  tokens = tokenizer.encode_chat_completion(completion_request).tokens

  out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
  result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

  print(result)
Hugging Face Transformers

  from transformers import LlamaTokenizerFast, MistralForCausalLM
  import torch

  device = "cuda"
  tokenizer = LlamaTokenizerFast.from_pretrained('mistralai/Mistral-Small-Instruct-2409')
  tokenizer.pad_token = tokenizer.eos_token

  model = MistralForCausalLM.from_pretrained('mistralai/Mistral-Small-Instruct-2409', torch_dtype=torch.bfloat16)
  model = model.to(device)

  prompt = "How often does the letter r occur in Mistral?"

  messages = [
      {"role": "user", "content": prompt},
  ]

  model_input = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(device)
  gen = model.generate(model_input, max_new_tokens=150)
  dec = tokenizer.batch_decode(gen)
  print(dec)
Output
  <s>
    [INST]
    How often does the letter r occur in Mistral?
    [/INST]
    To determine how often the letter "r" occurs in the word "Mistral,"
    we can simply count the instances of "r" in the word.
    The word "Mistral" is broken down as follows:
      - M
      - i
      - s
      - t
      - r
      - a
      - l
    Counting the "r"s, we find that there is only one "r" in "Mistral."
    Therefore, the letter "r" occurs once in the word "Mistral."
  </s>
It looks like Mistral is trying to use CoT to fix the "strawberry" letter-counting problem.
