[Large Language Models, Part 5] Deploying embedding and rerank models with xinference

Posted by 王海鱼 on 2025-3-19 18:02:25

I. Install xinference

pip install xinference

II. Start xinference

./xinference-local --host=0.0.0.0 --port=5544

III. Register local models
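
Registration requests sent before the server has finished starting will be refused. A minimal sketch of a wait loop, assuming the server listens on port 5544 as started above and answers GET /v1/models once it is ready. The function name, retry count and delay are illustrative choices, not part of xinference:

```shell
# Hypothetical helper: poll the server until it answers or the attempts run out.
# $1 = base URL, $2 = max attempts, $3 = seconds to sleep between attempts.
wait_for_xinference() {
  base_url="${1:-http://localhost:5544}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f turns HTTP errors into a non-zero exit; --max-time bounds each try.
    if curl -sf --max-time 5 -o /dev/null "$base_url/v1/models"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Running wait_for_xinference http://localhost:5544 before the curl calls that follow returns 0 as soon as the server responds, and 1 if it never does.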

1. Register the embedding model
curl -X POST "http://localhost:5544/v1/models" \
-H "Content-Type: application/json" \
-d '{
"model_type": "embedding",
"model_name": "bce-embedding-base_v1",
"model_uid": "bce-embedding-base_v1",
"model_path": "/root/embed_rerank/bce-embedding-base_v1/"
}'

Verify:
curl -X POST "http://localhost:5544/v1/embeddings" \
-H "Content-Type: application/json" \
-d '{
"model": "bce-embedding-base_v1",
"input": ["Text to embed 1", "This is the second sentence"]
}'
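
A quick sanity check on the response is to count the returned vectors, which should match the number of input strings. The sketch below assumes the response follows the OpenAI-style layout, with one "embedding" key per item in "data". count_embeddings is a hypothetical helper name, and the grep pattern is a rough text match rather than a real JSON parser:

```shell
# Hypothetical helper: count the "embedding" keys in a response read from stdin.
# Assumes an OpenAI-style body such as {"data":[{"embedding":[...]}, ...]}.
count_embeddings() {
  # Match the key (followed by a colon), not the "object": "embedding" value.
  grep -o '"embedding"[[:space:]]*:' | wc -l | tr -d ' '
}
```

Piping the verification curl command into count_embeddings should print 2 for the two-sentence request, provided the layout assumption holds.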

2. Register the rerank model

curl -X POST "http://localhost:5544/v1/models" \
-H "Content-Type: application/json" \
-d '{
"model_type": "rerank",
"model_name": "bce-reranker-base_v1",
"model_uid": "bce-reranker-base_v1",
"model_path": "/root/embed_rerank/bce-reranker-base_v1"
}'
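
The embedding and rerank registrations differ only in model type, name and path, so they can share one helper. A minimal sketch, assuming the model name doubles as the model_uid as in both requests above. XINFERENCE_URL, model_payload and register_model are hypothetical names introduced here, and the payload builder does no JSON escaping:

```shell
# Illustrative setting, not something xinference itself reads.
XINFERENCE_URL="${XINFERENCE_URL:-http://localhost:5544}"

# Build the JSON body. Assumes the values contain no quotes or backslashes.
# $1 = model_type, $2 = model name (reused as model_uid), $3 = model_path
model_payload() {
  printf '{"model_type": "%s", "model_name": "%s", "model_uid": "%s", "model_path": "%s"}' \
    "$1" "$2" "$2" "$3"
}

# POST the body to /v1/models, the same endpoint the curl commands above use.
register_model() {
  curl -X POST "$XINFERENCE_URL/v1/models" \
    -H "Content-Type: application/json" \
    -d "$(model_payload "$1" "$2" "$3")"
}
```

With this in place, register_model rerank bce-reranker-base_v1 /root/embed_rerank/bce-reranker-base_v1 sends the same request as the rerank registration above.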

Verify:
curl -X POST "http://localhost:5544/v1/rerank" \
-H "Content-Type: application/json" \
-d '{
"model": "bce-reranker-base_v1",
"query": "What is Python?",
"documents": [
"Python is a programming language.",
"Java is another language.",
"Python is used for web development."
]
}'

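3. Run ./xinference list to view the running models (add --endpoint http://localhost:5544 when the server is not on the default port)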

IV. Delete a model

curl -X DELETE "http://localhost:5544/v1/models/bce-reranker-base_v1"

V. Notes

1. Running on the CPU

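* The server has a GPU, but you want to load the model on the CPU.
Before starting xinference, set:
export CUDA_VISIBLE_DEVICES=""

* If the server has no GPU, the model is loaded on the CPU automatically.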
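
The export above affects every later command in the same shell. A sketch of an alternative that scopes the empty CUDA_VISIBLE_DEVICES to the xinference process only. start_xinference_cpu and XINFERENCE_BIN are hypothetical names introduced for illustration:

```shell
# Hypothetical wrapper: hide all GPUs from one process without exporting
# the variable into the rest of the shell session.
# XINFERENCE_BIN is an illustrative override for the launcher path.
start_xinference_cpu() {
  CUDA_VISIBLE_DEVICES="" "${XINFERENCE_BIN:-./xinference-local}" "$@"
}
```

Calling start_xinference_cpu --host=0.0.0.0 --port=5544 would then start the server with the same options as before, while other commands in the shell still see the GPUs.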

2. Running on the GPU

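Before starting the server, set the environment variable:
export CUDA_VISIBLE_DEVICES="0,1"  # example value: expose GPUs 0 and 1; an empty value hides every GPU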
curl -X POST "http://localhost:5544/v1/models" \
-H "Content-Type: application/json" \
-d '{
"model_type": "embedding",
"model_name": "bce-embedding-base_v1",
"model_uid": "bce-embedding-base_v1",
"model_path": "/root/zml/embed_rerank/bce-embedding-base_v1/",
"gpu_idx": 1,
"n_gpu": 1
}'

Notes:
gpu_idx: index of the GPU to use
n_gpu: total number of GPUs to use
