Elasticsearch: Using Ollama with the Inference API

Posted by 用多少眼泪才能让你相信 on 2025-2-18 14:23:14

Author: Jeffrey Rengifo, Elastic
https://i-blog.csdnimg.cn/direct/7cf22df1e582492d83700177a66b27c8.webp
The Ollama API is compatible with the OpenAI API, which makes integrating Ollama with Elasticsearch straightforward.

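In this article, we will learn how to use Ollama to connect a local model to the Elasticsearch inference model, and then use Playground to ask questions about our documents.
Elasticsearch lets users connect to LLMs through the open Inference API, which supports providers (as a service) such as Amazon Bedrock, Cohere, Google AI, Azure AI Studio, HuggingFace, and others.
Ollama is a tool that lets you download and run LLMs on your own infrastructure (a local machine or server). The Ollama model library lists the available compatible models.
Ollama is a good choice if you want to host and test different open-source models without worrying about each model needing a different setup, or about how to create an API to access the model's features, because Ollama handles all of that.
Because the Ollama API is compatible with the OpenAI API, we can easily integrate the inference model and create a RAG application with Playground.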

For further reading, see "Elasticsearch: Playing with DeepSeek R1 in Elastic to build a RAG application".

Prerequisites

[*]Elasticsearch 8.17
[*]Kibana 8.17
[*]Python

Steps

[*]Set up the Ollama LLM server
[*]Create the mappings
[*]Index the data
[*]Ask questions using Playground

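Set up the Ollama LLM server

We will set up an LLM server and connect it to our Playground instance using Ollama. We need to:

[*]Download and run Ollama.
[*]Use ngrok to make the local web server hosting Ollama accessible over the internet.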

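Download and run Ollama

To use Ollama, we first need to download it. Ollama supports Linux, Windows, and macOS, so just download the version that matches your operating system. Once Ollama is installed, we can pick a model from its list of supported LLMs. In this example, we will use llama3.2, a general-purpose multilingual model. Installation also sets up the Ollama command-line tool. Once the download finishes, run the following:
ollama pull llama3.2
This outputs: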
pulling manifest
pulling dde5aa3fc5ff... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 2.0 GB
pulling 966de95ca8a6... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 6.0 KB
pulling 56bb8bd477a5... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 96 B
pulling 34bb5ab01051... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏561 B
verifying sha256 digest
writing manifest
success
Once the model is installed, you can test it with the following command:
ollama run llama3.2
Let's ask a question:
https://i-blog.csdnimg.cn/direct/f051b5bcd5354eb88d15bd55aa0a0b98.gif
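While the model is running, Ollama exposes an API that listens on port 11434 by default. Following the official documentation, let's send a request to that API: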
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "What is the capital of France?"
}'
This is the answer we get:
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.152817532Z","response":"The","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.251884485Z","response":" capital","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.347365913Z","response":" of","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.446837322Z","response":" France","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.542367394Z","response":" is","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.644580384Z","response":" Paris","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.739865362Z","response":".","done":false}
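{"model":"llama3.2","created_at":"2024-11-28T21:48:42.834347518Z","response":"","done":true,"done_reason":"stop","context":[...],"total_duration":6948567145,"load_duration":4386106503,"prompt_eval_count":32,"prompt_eval_duration":1872000000,"eval_count":8,"eval_duration":684000000}
Note that the response from this endpoint is streamed: each line is a separate JSON object carrying one fragment of the answer, and the final object has "done": true.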
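Since the answer arrives as one JSON object per line, a client has to stitch the fragments back together. The following is a minimal sketch of that step, assuming the lines have already been read from the HTTP response; the helper name join_streamed_response is illustrative and not part of Ollama, and the sample lines are shortened copies of the output above.

```python
import json
from typing import Iterable


def join_streamed_response(lines: Iterable[str]) -> str:
    """Concatenate the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between JSON objects
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the stream
    return "".join(parts)


# Shortened copies of the streamed lines shown above (timestamps omitted).
sample = [
    '{"model":"llama3.2","response":"The","done":false}',
    '{"model":"llama3.2","response":" capital","done":false}',
    '{"model":"llama3.2","response":" of","done":false}',
    '{"model":"llama3.2","response":" France","done":false}',
    '{"model":"llama3.2","response":" is","done":false}',
    '{"model":"llama3.2","response":" Paris","done":false}',
    '{"model":"llama3.2","response":".","done":false}',
    '{"model":"llama3.2","response":"","done":true,"done_reason":"stop"}',
]

print(join_streamed_response(sample))  # prints: The capital of France is Paris.
```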

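Expose the endpoint to the internet with ngrok

Because our endpoint runs in a local environment, it cannot be reached over the internet from elsewhere, such as our Elastic Cloud instance. ngrok lets us expose a local port at a public address. Create an ngrok account and follow the official setup guide.
Note: this is similar to the "花生壳" service available in China.
Once the ngrok agent is installed and configured, we can expose the Ollama port with the following command:
ngrok http 11434 --host-header="localhost:11434"
Note: the --host-header="localhost:11434" flag ensures that the "Host" header of incoming requests matches "localhost:11434".
Running this command returns a public link that works for as long as ngrok and the Ollama server are running locally.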
Session Status             online
Account                   xxxx@yourEmailProvider.com (Plan: Free)
Version                   3.18.4
Region                      United States (us)
Latency                   561ms
Web Interface             http://127.0.0.1:4040
Forwarding                https://your-ngrok-url.ngrok-free.app -> http://localhost:11434

Connections                ttl opn rt1 rt5 p50 p90
                           0    0    0.00 0.00 0.00 0.00
Under "Forwarding" we can see that ngrok generated a URL. Save it for later use.
Let's try sending an HTTP request to the endpoint again, this time using the ngrok-generated URL:
curl https://your-ngrok-url.ngrok-free.app/api/generate -d '{
"model": "llama3.2",
"prompt": "What is the capital of France?"
}'
The response should be similar to the previous one.
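The same request can also be issued from Python. The sketch below uses only the standard library and sets "stream": false, an option of Ollama's /api/generate endpoint that returns the whole answer as a single JSON object instead of a stream. The function names are illustrative assumptions, and the base URL is the placeholder used above; replace it with your own ngrok URL or http://localhost:11434.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the answer text (needs a reachable Ollama server)."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

Calling generate("https://your-ngrok-url.ngrok-free.app", "llama3.2", "What is the capital of France?") should return the answer as one string, provided both ngrok and Ollama are running.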

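Create the mappings

ELSER endpoint

For this example, we will create an inference endpoint using the Elasticsearch Inference API. We will also use ELSER to generate the embeddings.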
PUT _inference/sparse_embedding/medicines-inference
{
"service": "elasticsearch",
"service_settings": {
"num_allocations": 1,
"num_threads": 1,
"model_id": ".elser_model_2_linux-x86_64"
}
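}
For this example, imagine you run a pharmacy that sells two types of medicine:

[*]Medicines that require a prescription.
[*]Medicines that do not require a prescription.
This information will be included in the description field of each medicine.
The LLM must interpret this field, so we will use the following data mappings: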
PUT medicines
{
"mappings": {
"properties": {
   "name": {
   "type": "text",
   "copy_to": "semantic_field"
   },
   "semantic_field": {
   "type": "semantic_text",
   "inference_id": "medicines-inference"
   },
   "text_description": {
   "type": "text",
   "copy_to": "semantic_field"
   }
}
}
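}
The text_description field stores the plain text of the description, while semantic_field, a semantic_text field type, stores the embeddings generated by ELSER.
The copy_to property copies the content of the name and text_description fields into the semantic field, so that embeddings are generated for those fields.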
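To make the effect of copy_to more concrete, here is a small plain-Python illustration of which values end up in semantic_field for a single document. It only mimics the routing described above under simplifying assumptions; it is not how Elasticsearch implements copy_to (the real copy happens at index time and does not modify _source), and the helper name apply_copy_to is made up for this sketch.

```python
from typing import Dict, List

# The "properties" section of the medicines mapping shown above.
MAPPING_PROPERTIES = {
    "name": {"type": "text", "copy_to": "semantic_field"},
    "semantic_field": {"type": "semantic_text", "inference_id": "medicines-inference"},
    "text_description": {"type": "text", "copy_to": "semantic_field"},
}


def apply_copy_to(doc: Dict[str, str], properties: Dict[str, dict]) -> Dict[str, List[str]]:
    """Return, per copy_to target, the list of values that would be copied into it."""
    targets: Dict[str, List[str]] = {}
    for field, config in properties.items():
        target = config.get("copy_to")
        if target and field in doc:
            targets.setdefault(target, []).append(doc[field])
    return targets


doc = {
    "id": 1,
    "name": "Paracetamol",
    "text_description": "An analgesic and antipyretic that does NOT require a prescription.",
}
print(apply_copy_to(doc, MAPPING_PROPERTIES))
```

Both the name and the description are routed to semantic_field, which is why each medicine later shows up in the retrieved context as two separate pieces of text.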

Index the data

Now, let's index the data using the _bulk API.
POST _bulk
{"index":{"_index":"medicines"}}
{"id":1,"name":"Paracetamol","text_description":"An analgesic and antipyretic that does NOT require a prescription."}
{"index":{"_index":"medicines"}}
{"id":2,"name":"Ibuprofen","text_description":"A nonsteroidal anti-inflammatory drug (NSAID) available WITHOUT a prescription."}
{"index":{"_index":"medicines"}}
{"id":3,"name":"Amoxicillin","text_description":"An antibiotic that requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":4,"name":"Lorazepam","text_description":"An anxiolytic medication that strictly requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":5,"name":"Omeprazole","text_description":"A medication for stomach acidity that does NOT require a prescription."}
{"index":{"_index":"medicines"}}
{"id":6,"name":"Insulin","text_description":"A hormone used in diabetes treatment that requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":7,"name":"Cold Medicine","text_description":"A compound formula to relieve flu symptoms available WITHOUT a prescription."}
{"index":{"_index":"medicines"}}
{"id":8,"name":"Clonazepam","text_description":"An antiepileptic medication that requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":9,"name":"Vitamin C","text_description":"A dietary supplement that does NOT require a prescription."}
{"index":{"_index":"medicines"}}
{"id":10,"name":"Metformin","text_description":"A medication used for type 2 diabetes that requires a prescription."}
Response:
{
"errors": false,
"took": 34732020848,
"items": [
{
   "index": {
   "_index": "medicines",
   "_id": "mYoeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 0,
   "_primary_term": 1,
   "status": 201
   }
},
{
   "index": {
   "_index": "medicines",
   "_id": "mooeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 1,
   "_primary_term": 1,
   "status": 201
   }
},
{
   "index": {
   "_index": "medicines",
   "_id": "m4oeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 2,
   "_primary_term": 1,
   "status": 201
   }
},
{
   "index": {
   "_index": "medicines",
   "_id": "nIoeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 3,
   "_primary_term": 1,
   "status": 201
   }
},
{
   "index": {
   "_index": "medicines",
   "_id": "nYoeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 4,
   "_primary_term": 1,
   "status": 201
   }
},
{
   "index": {
   "_index": "medicines",
   "_id": "nooeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 5,
   "_primary_term": 1,
   "status": 201
   }
},
{
   "index": {
   "_index": "medicines",
   "_id": "n4oeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 6,
   "_primary_term": 1,
   "status": 201
   }
},
{
   "index": {
   "_index": "medicines",
   "_id": "oIoeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 7,
   "_primary_term": 1,
   "status": 201
   }
},
{
   "index": {
   "_index": "medicines",
   "_id": "oYoeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 8,
   "_primary_term": 1,
   "status": 201
   }
},
{
   "index": {
   "_index": "medicines",
   "_id": "oooeMpQBF7lnCNFTfdn2",
   "_version": 1,
   "result": "created",
   "_shards": {
   "total": 2,
   "successful": 2,
   "failed": 0
   },
   "_seq_no": 9,
   "_primary_term": 1,
   "status": 201
   }
}
]
}
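The bulk request above alternates one action line with one document line, and the response reports a top-level "errors" flag plus a status per item. As a rough sketch, the body could be generated from a list of dictionaries and the response checked as follows; build_bulk_body and failed_items are hypothetical helper names, and actually sending the body to Elasticsearch is left out.

```python
import json
from typing import Dict, List


def build_bulk_body(index: str, docs: List[dict]) -> str:
    """Serialize documents as newline-delimited JSON for the _bulk API."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document line
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


def failed_items(bulk_response: Dict) -> List[dict]:
    """Return the items of a _bulk response whose status signals a failure."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        result = next(iter(item.values()))  # e.g. the object under "index"
        if result.get("status", 0) >= 300:
            failures.append(item)
    return failures


medicines = [
    {"id": 1, "name": "Paracetamol",
     "text_description": "An analgesic and antipyretic that does NOT require a prescription."},
    {"id": 3, "name": "Amoxicillin",
     "text_description": "An antibiotic that requires a prescription."},
]
print(build_bulk_body("medicines", medicines))
```

For the response shown above, failed_items would return an empty list, because "errors" is false and every item has status 201.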
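Ask questions using Playground

Playground is a Kibana tool that lets you quickly create a RAG system using Elasticsearch indices and an LLM provider. The Elastic documentation covers Playground in more detail.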

Connect the local LLM to Playground

First, we need to create a connector that uses the public URL we just created. In Kibana, go to Search > Playground and click "Connect to an LLM".
https://i-blog.csdnimg.cn/direct/f38fcdbb5b34440e97b034bac0a394f0.webp
This opens a menu on the left side of the Kibana interface. There, click "OpenAI".
https://i-blog.csdnimg.cn/direct/dbbbaaf0631247d5bb17a2a17eda1345.webp
We can now start configuring the OpenAI connector.
Go to "Connector settings" and, for the OpenAI provider, select "Other (OpenAI Compatible Service)":
https://i-blog.csdnimg.cn/direct/c4f69824bd344e49a1f850551935ab73.webp
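Now, let's configure the other fields. In this example, we name our model "medicines-llm". In the URL field, use the ngrok-generated URL followed by /v1/chat/completions. In the "Default model" field, select "llama3.2". We won't use an API key, so enter any random text to proceed: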
https://i-blog.csdnimg.cn/direct/daa0d6aeb0da462b821376d9d9ff285f.webp
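For reference, the value that goes into the connector's URL field, and the kind of JSON body an OpenAI-compatible client sends to it, can be sketched as below. The helper names are illustrative assumptions, the ngrok host is the placeholder used earlier, and the exact body Kibana sends may contain additional fields.

```python
def chat_completions_url(public_base_url: str) -> str:
    """Append the OpenAI-compatible chat path to the public ngrok URL."""
    return public_base_url.rstrip("/") + "/v1/chat/completions"


def build_chat_payload(model: str, system_prompt: str, question: str) -> dict:
    """Minimal chat-completions body: a system message followed by the user question."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    }


print(chat_completions_url("https://your-ngrok-url.ngrok-free.app/"))
# prints: https://your-ngrok-url.ngrok-free.app/v1/chat/completions
```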
Click "Save", then click "Add data sources" to add the medicines index:
https://i-blog.csdnimg.cn/direct/723e97797e7b4258984a7dd09442d03e.webp
https://i-blog.csdnimg.cn/direct/5f432fcc7e5b4d2bbddd8acb7a392e69.webp

Great! We can now access Playground using the LLM we are running locally as the RAG engine.
https://i-blog.csdnimg.cn/direct/efdcc080648c4fce805193a5fbdf048c.webp
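Before testing, let's add more specific instructions to the agent and increase the number of documents sent to the model to 10, so that the answer has as many documents available as possible. The context field will be semantic_field, which includes the name and description of each medicine thanks to the copy_to property.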
https://i-blog.csdnimg.cn/direct/c401405b04304c1395fc3e58bc38198f.webp
Now let's ask a question, Can I buy Clonazepam without a prescription?, and see what happens:
https://drive.google.com/file/d/1WOg9yJ2Vs5ugmXk9_K9giZJypB8jbxuN/view?usp=drive_link
As expected, we get the correct answer.

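Next steps

The next step is to create your own application! Playground provides a Python script that you can run on your own machine and customize to fit your needs, for example by putting it behind a FastAPI server to create a medicines QA chatbot consumed by your UI.
You can find this code by clicking the View code button in the top-right corner of Playground: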
https://i-blog.csdnimg.cn/direct/353ae4e2562c483894bfb5fc53007b65.webp
Use Endpoints & API keys to generate the ES_API_KEY environment variable required by the code.
For this particular example, the code is as follows:
## Install the required packages
## pip install -qU elasticsearch openai
import os

from elasticsearch import Elasticsearch
from openai import OpenAI

es_client = Elasticsearch(
    "https://your-deployment.us-central1.gcp.cloud.es.io:443",
    api_key=os.environ["ES_API_KEY"]
)

openai_client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)

index_source_fields = {
    "medicines": [
        "semantic_field"
    ]
}

def get_elasticsearch_results(query):
    ## Sparse-vector search over the chunks of the semantic_text field
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "semantic_field.inference.chunks",
                        "query": {
                            "sparse_vector": {
                                "inference_id": "medicines-inference",
                                "field": "semantic_field.inference.chunks.embeddings",
                                "query": query
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "medicines.semantic_field",
                            "_source": [
                                "semantic_field.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="medicines", body=es_query)
    return result["hits"]["hits"]

def create_openai_prompt(results):
    context = ""
    for hit in results:
        ## The first configured source field of the index names the inner_hits entry
        inner_hit_path = f"{hit['_index']}.{index_source_fields.get(hit['_index'])[0]}"
        ## For semantic_text matches, we need to extract the text from the inner_hits
        if 'inner_hits' in hit and inner_hit_path in hit['inner_hits']:
            context += '\n --- \n'.join(inner_hit['_source']['text'] for inner_hit in hit['inner_hits'][inner_hit_path]['hits']['hits'])
        else:
            source_field = index_source_fields.get(hit["_index"])[0]
            hit_context = hit["_source"][source_field]
            context += f"{hit_context}\n"
    prompt = f"""
Instructions:
- You are an assistant specializing in answering questions about the sale of medicines.
- Answer questions truthfully and factually using only the context presented.
- If you don't know the answer, just say that you don't know, don't make up an answer.
- You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
- Use markdown format for code examples.
- You are correct, factual, precise, and reliable.
Context:
{context}
"""
    return prompt

def generate_openai_completion(user_prompt, question):
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ]
    )
    ## choices is a list; the answer is in the first choice
    return response.choices[0].message.content

if __name__ == "__main__":
    question = "my question"
    elasticsearch_results = get_elasticsearch_results(question)
    context_prompt = create_openai_prompt(elasticsearch_results)
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)
To make this work with Ollama, you must change the OpenAI client so that it connects to the Ollama server instead of the OpenAI server. Ollama's documentation on OpenAI compatibility lists the OpenAI examples and the compatible endpoints.
openai_client = OpenAI(
    # you can use http://localhost:11434/v1/ if running this code locally.
    base_url='https://your-ngrok-url.ngrok-free.app/v1/',
    # required but ignored
    api_key='ollama',
)
Then change the model to llama3.2 when calling the completion method:
def generate_openai_completion(user_prompt, question):
    response = openai_client.chat.completions.create(
        model="llama3.2",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ]
    )
    return response.choices[0].message.content
Let's add a question, Can I buy Clonazepam without a prescription?, to the Elasticsearch query:
def get_elasticsearch_results():
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "semantic_field.inference.chunks",
                        "query": {
                            "sparse_vector": {
                                "inference_id": "medicines-inference",
                                "field": "semantic_field.inference.chunks.embeddings",
                                "query": "Can I buy Clonazepam without a prescription?"
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "medicines.semantic_field",
                            "_source": [
                                "semantic_field.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="medicines", body=es_query)
    return result["hits"]["hits"]
In addition, we print some output around the completion call, so we can confirm that we are sending the Elasticsearch results as part of the question context:
if __name__ == "__main__":
    question = "Can I buy Clonazepam without a prescription?"
    elasticsearch_results = get_elasticsearch_results()
    context_prompt = create_openai_prompt(elasticsearch_results)
    print("========== Context Prompt START ==========")
    print(context_prompt)
    print("========== Context Prompt END ==========")
    print("========== Ollama Completion START ==========")
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)
    print("========== Ollama Completion END ==========")
Now let's run the commands:
pip install -qU elasticsearch openai

python main.py
You should see something like this:
========== Context Prompt START ==========
Instructions:
- You are an assistant specializing in answering questions about the sale of medicines.
- Answer questions truthfully and factually using only the context presented.
- If you don't know the answer, just say that you don't know, don't make up an answer.
- You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
- Use markdown format for code examples.
- You are correct, factual, precise, and reliable.
Context:
Clonazepam
---
An antiepileptic medication that requires a prescription.A nonsteroidal anti-inflammatory drug (NSAID) available WITHOUT a prescription.
---
IbuprofenAn anxiolytic medication that strictly requires a prescription.
---
Lorazepam

========== Context Prompt END ==========
========== Ollama Completion START ==========
No, you cannot buy Clonazepam over-the-counter (OTC) without a prescription . It is classified as a controlled substance in the United States due to its potential for dependence and abuse. Therefore, it can only be obtained from a licensed healthcare provider who will issue a prescription for this medication.
========== Ollama Completion END ==========
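In the context printed above, chunks from different hits run together (for example "prescription.A nonsteroidal" and "IbuprofenAn anxiolytic"). This appears to happen because the separator is only placed between the chunks of a single hit, while the text of each hit is appended directly to the previous one. A possible variation, offered as a sketch rather than as the code Playground generates, collects every chunk first and joins once; build_context and the sample hits below are illustrative.

```python
from typing import Dict, List

SEPARATOR = "\n --- \n"


def build_context(hits: List[Dict], inner_hit_path: str) -> str:
    """Join all retrieved chunks with the same separator, within and across hits."""
    chunks: List[str] = []
    for hit in hits:
        inner = hit.get("inner_hits", {}).get(inner_hit_path)
        if inner:
            # semantic_text matches: take the chunk text from the inner hits
            chunks.extend(h["_source"]["text"] for h in inner["hits"]["hits"])
        else:
            # fallback: use the raw source of the hit
            chunks.append(str(hit.get("_source", "")))
    return SEPARATOR.join(chunks)


def _hit(texts: List[str]) -> Dict:
    """Build a fake search hit shaped like the Elasticsearch response (test data only)."""
    inner_hits = [{"_source": {"text": t}} for t in texts]
    return {
        "_index": "medicines",
        "inner_hits": {"medicines.semantic_field": {"hits": {"hits": inner_hits}}},
    }


sample_hits = [
    _hit(["Clonazepam", "An antiepileptic medication that requires a prescription."]),
    _hit(["A nonsteroidal anti-inflammatory drug (NSAID) available WITHOUT a prescription.", "Ibuprofen"]),
]
print(build_context(sample_hits, "medicines.semantic_field"))
```

With this change every chunk sits on its own block of the prompt, which may make it easier for the model to tell the medicines apart.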
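Conclusion

In this article, we saw the power and versatility of tools like Ollama when combined with the Elasticsearch Inference API and Playground.
After a few simple steps, we have a working RAG application that chats using an LLM running for free on our own infrastructure. This also gives us more control over resources and sensitive information, as well as access to a variety of models for different tasks.
Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!
Elasticsearch is packed with new features to help you build the best search solutions for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.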

Original article: Using Ollama with the Inference API - Elasticsearch Labs
