
Author: 火影    Time: 2024-9-25 02:30
Subject: [Large Language Models #1] Deploying a Qwen Model with vLLM
1. Model download:

              ModelScope (魔搭社区): https://www.modelscope.cn
              Hugging Face: https://huggingface.co/Qwen (see the download sketch below)
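For example, a minimal download sketch using huggingface_hub; the repo id and the local directory below are assumptions chosen to match the model path used in the launch command in step 3, so adjust them to your setup:

  # Download the Qwen2.5-Coder-7B-Instruct weights into the directory that
  # the vLLM launch command in step 3 points at (assumed path).
  from huggingface_hub import snapshot_download

  snapshot_download(
      repo_id="Qwen/Qwen2.5-Coder-7B-Instruct",
      local_dir="/root/qwen2.5/qwen2.5-coder-7b-instruct",
  )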

2. Set up the Python environment

             1. Install Python from the official python.org downloads page (version 3.8 or later is recommended).
             2. Install the vllm package, e.g. with pip install vllm (a quick post-install check is sketched below).
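After installation, a quick sanity check (a minimal sketch; it only confirms the interpreter version and that the vllm package imports cleanly):

  import sys

  import vllm

  # The interpreter should report 3.8 or later, and vllm should import without errors.
  print(sys.version)
  print(vllm.__version__)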


3. Launch the model

      
  CUDA_VISIBLE_DEVICES=0,1 /root/vendor/Python3.10.12/bin/python3.10 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 25010 --served-model-name mymodel --model /root/qwen2.5/qwen2.5-coder-7b-instruct/ --tensor-parallel-size 2 --max-model-len 8096
Here CUDA_VISIBLE_DEVICES=0,1 restricts the server to GPUs 0 and 1, --served-model-name mymodel sets the model name exposed by the API, --tensor-parallel-size 2 shards the model across the two GPUs, and --max-model-len 8096 caps the context length. Output like the following indicates the server started successfully:
  INFO 09-20 15:22:59 model_runner.py:1335] Graph capturing finished in 11 secs.
  (VllmWorkerProcess pid=101403) INFO 09-20 15:22:59 model_runner.py:1335] Graph capturing finished in 11 secs.
  INFO 09-20 15:22:59 api_server.py:224] vLLM to use /tmp/tmplc42ak3s as PROMETHEUS_MULTIPROC_DIR
  WARNING 09-20 15:22:59 serving_embedding.py:190] embedding_mode is False. Embedding API will not work.
  INFO 09-20 15:22:59 launcher.py:20] Available routes are:
  INFO 09-20 15:22:59 launcher.py:28] Route: /openapi.json, Methods: HEAD, GET
  INFO 09-20 15:22:59 launcher.py:28] Route: /docs, Methods: HEAD, GET
  INFO 09-20 15:22:59 launcher.py:28] Route: /docs/oauth2-redirect, Methods: HEAD, GET
  INFO 09-20 15:22:59 launcher.py:28] Route: /redoc, Methods: HEAD, GET
  INFO 09-20 15:22:59 launcher.py:28] Route: /health, Methods: GET
  INFO 09-20 15:22:59 launcher.py:28] Route: /tokenize, Methods: POST
  INFO 09-20 15:22:59 launcher.py:28] Route: /detokenize, Methods: POST
  INFO 09-20 15:22:59 launcher.py:28] Route: /v1/models, Methods: GET
  INFO 09-20 15:22:59 launcher.py:28] Route: /version, Methods: GET
  INFO 09-20 15:22:59 launcher.py:28] Route: /v1/chat/completions, Methods: POST
  INFO 09-20 15:22:59 launcher.py:28] Route: /v1/completions, Methods: POST
  INFO 09-20 15:22:59 launcher.py:28] Route: /v1/embeddings, Methods: POST
  INFO 09-20 15:22:59 launcher.py:33] Launching Uvicorn with --limit_concurrency 32765. To avoid this limit at the expense of performance run with --disable-frontend-multiprocessing
  INFO:     Started server process [101179]
  INFO:     Waiting for application startup.
  INFO:     Application startup complete.
  INFO:     Uvicorn running on http://0.0.0.0:25010
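The startup log also lists the available routes; one way to confirm the server is reachable is to query the /v1/models route shown above (a minimal sketch using only the standard library, assuming the host and port from the launch command):

  import json
  import urllib.request

  # /v1/models returns the models the server exposes, including the name set
  # by --served-model-name (here: mymodel).
  with urllib.request.urlopen("http://localhost:25010/v1/models") as resp:
      print(json.dumps(json.load(resp), indent=2, ensure_ascii=False))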
4. Test the API with a Python script

  from openai import OpenAI

  # Initialize the client against the local vLLM OpenAI-compatible endpoint
  client = OpenAI(base_url="http://localhost:25010/v1", api_key="EMPTY")

  print("Welcome to the Qwen Q&A bot! Type 'exit' to end the conversation.")
  while True:
      # Read user input
      print("You: ", end='', flush=True)
      user_input = input()
      if user_input.lower() in ['exit', 'quit', 'bye']:
          print("qwen: Goodbye! Looking forward to our next conversation.")
          break
      # Build the message list
      messages = [
          {"role": "system", "content": "You are an intelligent Q&A assistant named 'qwen'."},
          {"role": "user", "content": user_input}
      ]
      try:
          # Send the request and get the reply
          chat_completion = client.chat.completions.create(
              model="mymodel",
              messages=messages,
              # stop=["。"],
              stop=["<|endoftext|>", "<|im_end|>", "<|im_start|>"],
              stream=False,
          )
          # Print the model's reply
          print("qwen:", chat_completion.choices[0].message.content)
      except Exception as e:
          print(f"An error occurred: {e}")
          print("Please try again later or check your network connection and API configuration.")
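The script above uses stream=False; the same endpoint also supports token-by-token output with stream=True (a minimal sketch, assuming the same server address and served model name as above):

  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:25010/v1", api_key="EMPTY")

  # Request a streamed completion and print tokens as they arrive.
  stream = client.chat.completions.create(
      model="mymodel",
      messages=[{"role": "user", "content": "Write a hello-world program in Python."}],
      stream=True,
  )
  for chunk in stream:
      delta = chunk.choices[0].delta.content
      if delta:
          print(delta, end="", flush=True)
  print()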

