一、Install xinference
二、Start xinference
- ./xinference-local --host=0.0.0.0 --port=5544
三、Register local models
- 1、Register the embedding model
- curl -X POST "http://localhost:5544/v1/models" \
- -H "Content-Type: application/json" \
- -d '{
- "model_type": "embedding",
- "model_name": "bce-embedding-base_v1",
- "model_uid": "bce-embedding-base_v1",
- "model_path": "/root/embed_rerank/bce-embedding-base_v1/"
- }'
- Verification:
- curl -X POST "http://localhost:5544/v1/embeddings" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "bce-embedding-base_v1",
- "input": ["需要嵌入的文本1", "这是第二个句子"]
- }'
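The same verification can be done from Python. Xinference exposes an OpenAI-compatible `/v1/embeddings` endpoint, so the response has the OpenAI shape (`{"data": [{"embedding": [...]}, ...]}`). The sketch below assumes the server and `model_uid` from the steps above; `embed` and `parse_embeddings` are hypothetical helper names.

```python
import json
import urllib.request

XINFERENCE_URL = "http://localhost:5544"  # server started above

def parse_embeddings(payload):
    """Extract one vector per input text from an OpenAI-style
    embeddings response: {"data": [{"embedding": [...]}, ...]}."""
    return [item["embedding"] for item in payload["data"]]

def embed(texts, model_uid="bce-embedding-base_v1"):
    """POST to /v1/embeddings and return the embedding vectors."""
    body = json.dumps({"model": model_uid, "input": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{XINFERENCE_URL}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_embeddings(json.load(resp))
```

`embed(["需要嵌入的文本1", "这是第二个句子"])` mirrors the curl call above and returns a list of float vectors.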
- 2、Register the rerank model
- curl -X POST "http://localhost:5544/v1/models" \
- -H "Content-Type: application/json" \
- -d '{
- "model_type": "rerank",
- "model_name": "bce-reranker-base_v1",
- "model_uid": "bce-reranker-base_v1",
- "model_path": "/root/embed_rerank/bce-reranker-base_v1"
- }'
- Verification:
- curl -X POST "http://localhost:5544/v1/rerank" \
- -H "Content-Type: application/json" \
- -d '{
- "model": "bce-reranker-base_v1",
- "query": "What is Python?",
- "documents": [
- "Python is a programming language.",
- "Java is another language.",
- "Python is used for web development."
- ]
- }'
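The `/v1/rerank` response is assumed here to follow the Cohere-style shape (`{"results": [{"index": ..., "relevance_score": ...}, ...]}`), where `index` points back into the submitted `documents` list. A small sketch for turning that response into a ranked document list; `rank_documents` is a hypothetical helper:

```python
def rank_documents(documents, rerank_response):
    """Pair each document with its relevance_score from a Cohere-style
    rerank response and return them ordered best-first."""
    ranked = sorted(
        rerank_response["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,  # highest relevance first
    )
    # r["index"] refers to the position in the original documents list
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]
```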
- 3、Run ./xinference list to see the models that are running
四、Delete a model
- curl -X DELETE "http://localhost:5544/v1/models/bce-reranker-base_v1"
五、Notes
1、Running on CPU
Before starting xinference, set:
export CUDA_VISIBLE_DEVICES=""
2、Running on GPU
Before starting the server, set the environment variable to the GPU indices you want to expose, e.g.:
export CUDA_VISIBLE_DEVICES=0
- curl -X POST "http://localhost:5544/v1/models" \
- -H "Content-Type: application/json" \
- -d '{
- "model_type": "embedding",
- "model_name": "bce-embedding-base_v1",
- "model_uid": "bce-embedding-base_v1",
- "model_path": "/root/zml/embed_rerank/bce-embedding-base_v1/",
- "gpu_idx": 1,
- "n_gpu": 1
- }'
- Notes:
- gpu_idx: the index of the GPU to use
- n_gpu: the total number of GPUs to use
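Since the hand-written JSON above is easy to break (a missing comma makes the whole request fail), it can help to build the launch body programmatically. A minimal sketch, assuming the same `POST /v1/models` body fields used throughout this post; `launch_payload` is a hypothetical helper:

```python
import json

def launch_payload(model_uid, model_path, model_type="embedding",
                   gpu_idx=None, n_gpu=None):
    """Build the JSON body for POST /v1/models, optionally pinning
    the model to specific GPUs via gpu_idx / n_gpu."""
    body = {
        "model_type": model_type,
        "model_name": model_uid,
        "model_uid": model_uid,
        "model_path": model_path,
    }
    if gpu_idx is not None:
        body["gpu_idx"] = gpu_idx  # which GPU index to use
    if n_gpu is not None:
        body["n_gpu"] = n_gpu      # total number of GPUs to use
    return json.dumps(body)
```

Passing the result as the `-d` argument of curl (or as the request body from Python) guarantees syntactically valid JSON.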