Elasticsearch: Using Ollama with the Inference API

用多少眼泪才能让你相信 · 2025-2-18 14:23:14

Author: Jeffrey Rengifo, Elastic

The Ollama API is compatible with the OpenAI API, so integrating Ollama with Elasticsearch is straightforward.

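In this article, we will learn how to use Ollama to connect a local model to the Elasticsearch inference model, and then use Playground to ask questions about our documents.
Elasticsearch lets users connect to LLMs through the open Inference API, which supports hosted providers such as Amazon Bedrock, Cohere, Google AI, Azure AI Studio, and HuggingFace, among others.
Ollama is a tool that lets you download and run LLMs on your own infrastructure (a local machine or server). Ollama maintains a list of the models it supports.
If you want to host and test different open-source models without worrying about each one needing a different setup, or about how to build an API to access the model's features, Ollama is a good choice because it handles all of that for you.
Because the Ollama API is compatible with the OpenAI API, we can easily integrate the inference model and build a RAG application using Playground.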

For further reading, see "Elasticsearch: Playing with DeepSeek R1 in Elastic to build a RAG application".

Prerequisites

Elasticsearch 8.17
Kibana 8.17
Python

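Steps

Set up the Ollama LLM server
Create the mappings
Index the data
Ask questions using Playground

Setting up the Ollama LLM server

We will set up an LLM server with Ollama and connect it to our Playground instance. We need to:

Download and run Ollama.
Use ngrok to make the local web server that hosts Ollama reachable over the internet.

Downloading and running Ollama

To use Ollama, we first need to download it. Ollama supports Linux, Windows, and macOS, so just download the version that matches your operating system. Once Ollama is installed, we can pick a model from its list of supported LLMs. In this example we will use llama3.2, a general-purpose multilingual model. The installation also sets up Ollama's command-line tool. Once the download finishes, you can run the following command: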

ollama pull llama3.2

This will output:

pulling manifest
pulling dde5aa3fc5ff... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 2.0 GB
pulling 966de95ca8a6... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 6.0 KB
pulling 56bb8bd477a5... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 96 B
pulling 34bb5ab01051... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 561 B
verifying sha256 digest
writing manifest
success

Once the model is installed, you can test it with the following command:

ollama run llama3.2

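Let's ask a question:

While the model is running, Ollama exposes an API that listens on port 11434 by default. Let's send a request to that API, following the official documentation: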

curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "What is the capital of France?"
}'

This is the response we get:

{"model":"llama3.2","created_at":"2024-11-28T21:48:42.152817532Z","response":"The","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.251884485Z","response":" capital","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.347365913Z","response":" of","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.446837322Z","response":" France","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.542367394Z","response":" is","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.644580384Z","response":" Paris","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.739865362Z","response":".","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.834347518Z","response":"","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,128009,128006,882,128007,271,3923,374,279,6864,315,9822,30,128009,128006,78191,128007,271,791,6864,315,9822,374,12366,13],"total_duration":6948567145,"load_duration":4386106503,"prompt_eval_count":32,"prompt_eval_duration":1872000000,"eval_count":8,"eval_duration":684000000}

Note that this endpoint streams its response: each line is a separate JSON object carrying one fragment of the answer.
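
As a minimal sketch of how a client might reassemble such a streamed reply, the snippet below parses each line as JSON and concatenates the "response" fragments until it sees "done": true. The function name join_stream_chunks is hypothetical, and the sample lines are abbreviated copies of the response shown above (only the model, response, and done fields are kept).

```python
import json


def join_stream_chunks(lines):
    """Concatenate the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object carries statistics, not more text
    return "".join(parts)


# Abbreviated sample taken from the response shown above.
sample = [
    '{"model":"llama3.2","response":"The","done":false}',
    '{"model":"llama3.2","response":" capital","done":false}',
    '{"model":"llama3.2","response":" of","done":false}',
    '{"model":"llama3.2","response":" France","done":false}',
    '{"model":"llama3.2","response":" is","done":false}',
    '{"model":"llama3.2","response":" Paris","done":false}',
    '{"model":"llama3.2","response":".","done":false}',
    '{"model":"llama3.2","response":"","done":true,"done_reason":"stop"}',
]
print(join_stream_chunks(sample))  # The capital of France is Paris.
```

If streaming is not needed, Ollama's API documentation also describes a "stream": false request option that returns the answer as a single JSON object.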

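Exposing the endpoint to the internet with ngrok

Because our endpoint runs in a local environment, it cannot be reached over the internet from elsewhere, such as our Elastic Cloud instance. ngrok lets us expose a local port through a publicly reachable address. Create an ngrok account and follow the official setup guide.
Note: this is somewhat similar to the "花生壳" (Peanut Shell) service offered in China.
Once the ngrok agent is installed and configured, we can expose the Ollama port with the following command: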

ngrok http 11434 --host-header="localhost:11434"

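Note: the --host-header="localhost:11434" flag ensures that the "Host" header of incoming requests matches "localhost:11434".
Running this command returns a public link that works for as long as ngrok and the Ollama server are running locally.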

Session Status                online
Account                       xxxx@yourEmailProvider.com (Plan: Free)
Version                       3.18.4
Region                        United States (us)
Latency                       561ms
Web Interface                 http://127.0.0.1:4040
Forwarding                    https://your-ngrok-url.ngrok-free.app -> http://localhost:11434
Connections                   ttl     opn     rt1     rt5     p50     p90
                              0       0       0.00    0.00    0.00    0.00

Under "Forwarding" we can see the URL that ngrok generated. Save it for later.
Let's send an HTTP request to the endpoint again, this time using the ngrok-generated URL:

curl https://your-ngrok-endpoint.ngrok-free.app/api/generate -d '{
"model": "llama3.2",
"prompt": "What is the capital of France?"
}'

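The response should be similar to the previous one.

Creating the mappings

ELSER endpoint

For this example, we will create an inference endpoint with the Elasticsearch inference API, using ELSER to generate the embeddings.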

PUT _inference/sparse_embedding/medicines-inference
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".elser_model_2_linux-x86_64"
  }
}

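In this example, imagine you run a pharmacy that sells two types of medicines:

Medicines that require a prescription.
Medicines that do not require a prescription.

This information is included in each medicine's description field.
The LLM has to interpret this field, so we will use the following data mapping: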

PUT medicines
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "copy_to": "semantic_field"
      },
      "semantic_field": {
        "type": "semantic_text",
        "inference_id": "medicines-inference"
      },
      "text_description": {
        "type": "text",
        "copy_to": "semantic_field"
      }
    }
  }
}

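The text_description field stores the plain text of the description, while semantic_field, a semantic_text field, stores the embeddings generated by ELSER.
The copy_to property copies the contents of the name and text_description fields into the semantic field so that embeddings are generated for them.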
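
To make the copy_to relationship concrete, here is a small sketch that holds the same mapping as a Python dictionary and lists which fields feed semantic_field. The helper name copy_to_sources is hypothetical; it only inspects the mapping locally and does not call Elasticsearch.

```python
def copy_to_sources(mapping, target):
    """Return the names of fields whose copy_to points at `target`."""
    sources = []
    for name, props in mapping["mappings"]["properties"].items():
        copy_to = props.get("copy_to", [])
        if isinstance(copy_to, str):
            copy_to = [copy_to]  # copy_to accepts a single name or a list
        if target in copy_to:
            sources.append(name)
    return sources


# The same mapping as in the PUT medicines request above.
medicines_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text", "copy_to": "semantic_field"},
            "semantic_field": {
                "type": "semantic_text",
                "inference_id": "medicines-inference",
            },
            "text_description": {"type": "text", "copy_to": "semantic_field"},
        }
    }
}

print(copy_to_sources(medicines_mapping, "semantic_field"))
# ['name', 'text_description']
```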

Indexing the data

Now let's index the data using the _bulk API.

POST _bulk
{"index":{"_index":"medicines"}}
{"id":1,"name":"Paracetamol","text_description":"An analgesic and antipyretic that does NOT require a prescription."}
{"index":{"_index":"medicines"}}
{"id":2,"name":"Ibuprofen","text_description":"A nonsteroidal anti-inflammatory drug (NSAID) available WITHOUT a prescription."}
{"index":{"_index":"medicines"}}
{"id":3,"name":"Amoxicillin","text_description":"An antibiotic that requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":4,"name":"Lorazepam","text_description":"An anxiolytic medication that strictly requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":5,"name":"Omeprazole","text_description":"A medication for stomach acidity that does NOT require a prescription."}
{"index":{"_index":"medicines"}}
{"id":6,"name":"Insulin","text_description":"A hormone used in diabetes treatment that requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":7,"name":"Cold Medicine","text_description":"A compound formula to relieve flu symptoms available WITHOUT a prescription."}
{"index":{"_index":"medicines"}}
{"id":8,"name":"Clonazepam","text_description":"An antiepileptic medication that requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":9,"name":"Vitamin C","text_description":"A dietary supplement that does NOT require a prescription."}
{"index":{"_index":"medicines"}}
{"id":10,"name":"Metformin","text_description":"A medication used for type 2 diabetes that requires a prescription."}

Response:

{
"errors": false,
"took": 34732020848,
"items": [
{
"index": {
"_index": "medicines",
"_id": "mYoeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "medicines",
"_id": "mooeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "medicines",
"_id": "m4oeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "medicines",
"_id": "nIoeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 3,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "medicines",
"_id": "nYoeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 4,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "medicines",
"_id": "nooeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 5,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "medicines",
"_id": "n4oeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 6,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "medicines",
"_id": "oIoeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 7,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "medicines",
"_id": "oYoeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 8,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "medicines",
"_id": "oooeMpQBF7lnCNFTfdn2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 9,
"_primary_term": 1,
"status": 201
}
}
]
}
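
The bulk request and its response can also be handled from Python. The sketch below, under the assumption that you build the request body yourself rather than using a client helper, serializes documents into the newline-delimited format the _bulk API expects and then summarizes a response of the shape shown above. The function names build_bulk_body and summarize_bulk_response are hypothetical, and the response used here is a trimmed copy of the real one.

```python
import json


def build_bulk_body(index, docs):
    """Serialize docs into the newline-delimited body the _bulk API expects."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    return "\n".join(lines) + "\n"  # the body must end with a newline


def summarize_bulk_response(response):
    """Count item statuses and collect any per-item errors."""
    counts = {}
    failures = []
    for item in response["items"]:
        # Each item has a single key naming the action, e.g. "index".
        _action, result = next(iter(item.items()))
        counts[result["status"]] = counts.get(result["status"], 0) + 1
        if "error" in result:
            failures.append((result.get("_id"), result["error"]))
    return {
        "errors": response["errors"],
        "status_counts": counts,
        "failures": failures,
    }


docs = [
    {"id": 1, "name": "Paracetamol",
     "text_description": "An analgesic and antipyretic that does NOT require a prescription."},
    {"id": 3, "name": "Amoxicillin",
     "text_description": "An antibiotic that requires a prescription."},
]
body = build_bulk_body("medicines", docs)

# Trimmed response in the same shape as the one shown above.
response = {
    "errors": False,
    "items": [
        {"index": {"_index": "medicines", "_id": "mYoeMpQBF7lnCNFTfdn2", "status": 201}},
        {"index": {"_index": "medicines", "_id": "mooeMpQBF7lnCNFTfdn2", "status": 201}},
    ],
}
print(summarize_bulk_response(response))
```

Checking the top-level "errors" flag first is usually enough; the per-item pass is only needed to find out which documents failed.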

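Asking questions with Playground

Playground is a Kibana tool that lets you quickly build a RAG system from Elasticsearch indices and an LLM provider. You can read this article to learn more.

Connecting the local LLM to Playground

We first need to create a connector that uses the public URL we just generated. In Kibana, go to Search > Playground and click "Connect to an LLM".

This opens a menu on the left side of the Kibana interface. There, click "OpenAI".

We can now start configuring the OpenAI connector.
Go to "Connector settings" and, for the OpenAI provider, select "Other (OpenAI Compatible Service)":

Now let's fill in the other fields. In this example we name our model "medicines-llm". In the URL field, enter the ngrok-generated URL followed by /v1/chat/completions. In the "Default model" field, enter "llama3.2". We will not use an API key, so just type any random text to continue:

Click "Save", then click "Add data sources" to add the medicines index:

Great! We can now access Playground using the LLM running locally as the RAG engine.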
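
For reference, the connector URL is simply the ngrok base URL plus the OpenAI-compatible path /v1/chat/completions. The sketch below shows how that URL and an OpenAI-style chat request body could be assembled; it only builds the values and sends nothing. The function name build_chat_request is hypothetical, and the base URL is the placeholder used earlier in this article.

```python
import json


def build_chat_request(base_url, model, system_prompt, question):
    """Return the URL and JSON body for an OpenAI-style chat completion call."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    }
    return url, json.dumps(payload)


url, body = build_chat_request(
    "https://your-ngrok-url.ngrok-free.app",  # placeholder ngrok URL
    "llama3.2",
    "You are an assistant specializing in answering questions about the sale of medicines.",
    "Can I buy Clonazepam without a prescription?",
)
print(url)  # https://your-ngrok-url.ngrok-free.app/v1/chat/completions
```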

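Before testing, let's add more specific instructions for the agent and raise the number of documents sent to the model to 10, so that the answer has as many documents available as possible. The context field will be semantic_field, which includes both the name and the description of each medicine thanks to the copy_to property.

Now let's ask a question, "Can I buy Clonazepam without a prescription?", and see what happens:
https://drive.google.com/file/d/1WOg9yJ2Vs5ugmXk9_K9giZJypB8jbxuN/view?usp=drive_link
As expected, we got the correct answer.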

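Next steps

The next step is to build your own application! Playground provides a Python script that you can run on your own machine and customize to fit your needs, for example by putting it behind a FastAPI server to create a medicines QA chatbot consumed by your UI.
You can find this code by clicking the View code button at the top right of Playground:

Use Endpoints & API keys to generate the ES_API_KEY environment variable that the code requires.
For this particular example, the code is as follows: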

## Install the required packages
## pip install -qU elasticsearch openai
import os

from elasticsearch import Elasticsearch
from openai import OpenAI

es_client = Elasticsearch(
    "https://your-deployment.us-central1.gcp.cloud.es.io:443",
    api_key=os.environ["ES_API_KEY"]
)

openai_client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)

index_source_fields = {
    "medicines": [
        "semantic_field"
    ]
}


def get_elasticsearch_results():
    ## NOTE: `query` is not defined in this snippet as shown; the question text
    ## is filled in when the function is adapted later in this article.
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "semantic_field.inference.chunks",
                        "query": {
                            "sparse_vector": {
                                "inference_id": "medicines-inference",
                                "field": "semantic_field.inference.chunks.embeddings",
                                "query": query
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "medicines.semantic_field",
                            "_source": [
                                "semantic_field.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="medicines", body=es_query)
    return result["hits"]["hits"]


def create_openai_prompt(results):
    context = ""
    for hit in results:
        inner_hit_path = f"{hit['_index']}.{index_source_fields.get(hit['_index'])[0]}"
        ## For semantic_text matches, we need to extract the text from the inner_hits
        if 'inner_hits' in hit and inner_hit_path in hit['inner_hits']:
            context += '\n --- \n'.join(inner_hit['_source']['text'] for inner_hit in hit['inner_hits'][inner_hit_path]['hits']['hits'])
        else:
            source_field = index_source_fields.get(hit["_index"])[0]
            hit_context = hit["_source"][source_field]
            context += f"{hit_context}\n"
    prompt = f"""
Instructions:
- You are an assistant specializing in answering questions about the sale of medicines.
- Answer questions truthfully and factually using only the context presented.
- If you don't know the answer, just say that you don't know, don't make up an answer.
- You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
- Use markdown format for code examples.
- You are correct, factual, precise, and reliable.
Context:
{context}
"""
    return prompt


def generate_openai_completion(user_prompt, question):
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ]
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    question = "my question"
    elasticsearch_results = get_elasticsearch_results()
    context_prompt = create_openai_prompt(elasticsearch_results)
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)

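To make this work with Ollama, you have to change the OpenAI client so that it connects to the Ollama server instead of the OpenAI server. The full list of OpenAI examples and compatible endpoints is available in the Ollama documentation.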

openai_client = OpenAI(
    # you can use http://localhost:11434/v1/ if running this code locally.
    base_url='https://your-ngrok-url.ngrok-free.app/v1/',
    # required but ignored
    api_key='ollama',
)

Also change the model to llama3.2 when calling the completion method:

def generate_openai_completion(user_prompt, question):
    response = openai_client.chat.completions.create(
        model="llama3.2",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ]
    )
    return response.choices[0].message.content

Let's add the question, "Can I buy Clonazepam without a prescription?", to the Elasticsearch query:

def get_elasticsearch_results():
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "semantic_field.inference.chunks",
                        "query": {
                            "sparse_vector": {
                                "inference_id": "medicines-inference",
                                "field": "semantic_field.inference.chunks.embeddings",
                                "query": "Can I buy Clonazepam without a prescription?"
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "medicines.semantic_field",
                            "_source": [
                                "semantic_field.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="medicines", body=es_query)
    return result["hits"]["hits"]

We also print some output around the completion call so we can confirm that the Elasticsearch results are being sent as part of the question's context:

if __name__ == "__main__":
    question = "Can I buy Clonazepam without a prescription?"
    elasticsearch_results = get_elasticsearch_results()
    context_prompt = create_openai_prompt(elasticsearch_results)
    print("========== Context Prompt START ==========")
    print(context_prompt)
    print("========== Context Prompt END ==========")
    print("========== Ollama Completion START ==========")
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)
    print("========== Ollama Completion END ==========")

Now let's run the commands:

pip install -qU elasticsearch openai
python main.py

You should see something like this:

========== Context Prompt START ==========
Instructions:
- You are an assistant specializing in answering questions about the sale of medicines.
- Answer questions truthfully and factually using only the context presented.
- If you don't know the answer, just say that you don't know, don't make up an answer.
- You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
- Use markdown format for code examples.
- You are correct, factual, precise, and reliable.
Context:
Clonazepam
---
An antiepileptic medication that requires a prescription.A nonsteroidal anti-inflammatory drug (NSAID) available WITHOUT a prescription.
---
IbuprofenAn anxiolytic medication that strictly requires a prescription.
---
Lorazepam
========== Context Prompt END ==========
========== Ollama Completion START ==========
No, you cannot buy Clonazepam over-the-counter (OTC) without a prescription [1]. It is classified as a controlled substance in the United States due to its potential for dependence and abuse. Therefore, it can only be obtained from a licensed healthcare provider who will issue a prescription for this medication.
========== Ollama Completion END ==========
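
In the context printed above, text from different hits runs together ("...requires a prescription.A nonsteroidal...") because create_openai_prompt appends each hit's joined chunks to the context without a separator. The sketch below is a hypothetical variant, not the code Playground generates: it collects one block per hit and joins the blocks with a newline. The name build_context, the newline separator, and the sample hits (shaped like the search results used above) are assumptions for illustration.

```python
def build_context(results, index_source_fields):
    """Join inner-hit chunk texts, keeping each hit in its own block."""
    blocks = []
    for hit in results:
        field = index_source_fields[hit["_index"]][0]
        inner_hit_path = f"{hit['_index']}.{field}"
        inner_hits = hit.get("inner_hits", {})
        if inner_hit_path in inner_hits:
            chunks = inner_hits[inner_hit_path]["hits"]["hits"]
            blocks.append("\n --- \n".join(c["_source"]["text"] for c in chunks))
        else:
            # Fall back to the raw source field when there are no inner hits.
            blocks.append(str(hit["_source"][field]))
    return "\n".join(blocks)  # assumed separator between hits


def _hit(*texts):
    """Build a sample hit shaped like a semantic_text search result."""
    return {
        "_index": "medicines",
        "inner_hits": {
            "medicines.semantic_field": {
                "hits": {"hits": [{"_source": {"text": t}} for t in texts]}
            }
        },
    }


sample_hits = [
    _hit("Clonazepam", "An antiepileptic medication that requires a prescription."),
    _hit("A nonsteroidal anti-inflammatory drug (NSAID) available WITHOUT a prescription.", "Ibuprofen"),
]
context = build_context(sample_hits, {"medicines": ["semantic_field"]})
print(context)
```

With this variant each medicine's chunks stay grouped, which may make it easier for the model to cite the right document.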

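Conclusion

In this article, we saw how powerful and versatile tools like Ollama are when combined with the Elasticsearch inference API and Playground.
After a few simple steps, we had a working RAG application that chats using an LLM running on our own infrastructure at no cost. This also gives us more control over resources and sensitive information, along with access to a variety of models for different tasks.
Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!
Elasticsearch is packed with new features to help you build the best search solution for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.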

Original article: Using Ollama with the Inference API - Elasticsearch Labs
